[Electronics]  [Files]   [Links]

 
Electronics » the Z80 project » file system

With my IDE interface hardware and basic sector access software up and running, I started to think about implementing a file system. The possibilities ranged from the most basic "just save stuff to consecutive sectors and keep a list of filenames and a note to their start positions" to a full implementation of an existing file system such as Microsoft's "FAT".

After looking into the FAT disk structure I decided it was too complicated for my purposes. I wasn't interested in boot sectors, partitions and variable cluster sizes and the filename limit of "8.3" characters would be a bind (I wasn't about to go jumping through hoops to use MS's workarounds, expecially with only a few kilobytes free in my Z80's ROM space).

At the other end of the scale, the simple "dump and tag" approach was a little too basic, what with its lack of directories ("folders" in MS-speak) and general organization. What I needed was a proper file system that would provide most of the functionality of FAT but without the unnecessary stuff so I decided to make my own.. (It would be a bit of a shame that I wouldn't be able to read PC-format media natively but it wasn't that much of a concern as I could still import/export stuff via the Z80 project's serial interface where needed).

Interlude - Some general filesystem background:

In the simplest terms, a file system is a software interface (or "layer of abstraction") whereby files can be saved, loaded and deleted from a disk without the user having to worry about where the bits of the file are actually distributed - only a filename is required for access. In addition, a file system keeps track of which sectors are used and which are free as well as providing useful organisational features such as sub-directories.

Generally a file system divides up the storage area into "units of storage" and a map (stored in one or more of those units) keeps tabs on every storage unit on the disk, signifying whether it is empty / used / corrupt or whatever. PC hard drives (and Compact Flash cards in true IDE mode) inherently have sectors that are 512 bytes in length and these sectors are individually addressable - therefore the "granularity" is 512 bytes. In reality though you dont really want to individually tag each sector in your storage unit map: Modern hard drives have capacities of Gigabytes and to keep tabs on each 512 byte sector even on a 1GB drive you'd need 2 million entries in your map - pretty unwieldy!

Instead, what happens is that file system creators clump sectors together into what MS calls "clusters" and each cluster is given an entry in the disk map rather than each sector. On a large FAT32 drive, each cluster is 32KB long (in Windows, if you right click a file and select properties it'll show the actual file length and the amount of disk space it is using - always steps of 32KB). It can be seen that on smaller capacity disks, smaller cluster sizes are an advantage (they're less wasteful and as the capacity is smaller the unit tracking map wont be huge) and indeed this is what Windows FAT32 formatting does - sets the clusters to the most appropriate size, based on the capacity of the disk.

Meanwhile, back at the Z80 Project..

In designing my file system (which I called PQFS - "Phil's Quickie File System":) I tried to keep things as simple as possible. I fixed my cluster size to 32KB (64 sectors - which I call a Block, just to be different) no matter what. There were other trade-offs like the area of the drive that the file system could "see" and the number of entries allowed in a given directory, but these things were unlikely to be too confining.

In PQFS there are three types of block:

  • Type 01 - The Block Allocation Table (The "BAT")
  • Type 02 - A Directory Block
  • Type 03 - A File Block

The first sector of each block is a header that shows what type each block is and holds information such as the file/directory name. The other 63 sectors of each block hold the actual data for that type of block.

Upon a PQFS format, the BAT block is saved at block location 0 (LBA sector 0), and the root directory goes at block position 1 (LBA sector 64). The first free block is at block position 2 (LBA sector 128).

Structure of each PQFS block type:

The BAT block:

Sector 0:

 Byte Offset:$00      - Ascii "PQFS"
             $04      - Block ID (01 = BAT)
             $05-$1ff - Not used for BAT

Sectors $01-$3f:

 $7e00 bytes, 1 per disk block with values:

             $00 = block is free for use
             $01 = block is in use
             $02 = block contains bad sectors
Directory Blocks:
Sector 0:

 Byte Offset:$00      - Ascii "PQFS"
             $04      - Block ID (02 = dir)
             $05      - Not used
             $06      - Parent Block (0 = root)
             $08-$0f  - Not used for DIR block
             $10-$1f  - This directory name in ascii (padded with spaces)
             $20-$1ff - Not used

Sectors $01-3f:

 Each sector contains upto sixteen of the following 32-byte entries, 1 entry
 per file or subdirectory. Therefore there can be a maximum of 63*16=1008
 files or sub-directories in any directory.

 Byte Offset:$00     - File / Directory name Ascii (padded with spaces)
             $10     - Attributes:
                                   Bit 0    : 1 = entry is a dir
                                   Bits 1-7 : Not used at present
	
             $11     - Not used
             $12     - Block location of named file / directory
             $14     - Length of file (not used if bit 0 of $10 = 1 (ie: a dir))
             $18-$1f - Not used

File Blocks:
Sector 0:

 Byte Offset:$00      - Ascii "PQFS"
             $04      - Block ID (03 = file)
             $05      - Not used
             $06      - The directory block that this file belongs to
             $08      - Next block in chain for this file (if length >$7e00 bytes)
             $0a-$0f  - Not used
             $10-$1f  - File name Ascii (padded with spaces)
             $20-$1ff - Not used

Sectors $01-3f:

 $7e00 bytes of data for file 

Limitations of PQFS:

1. Only one block is allocated to the BAT and with one byte tagging each block, the maximum number of blocks that can be mapped is 63 * 512 = 32256. At 32KB per block the drive size is therefore capped at about 1GB (this could be easily expanded by using n-bit tags per block instead of a whole byte).

2. Only one block is allotted per directory. With each entry being 32 bytes, a directory can hold 63 * 512 / 32 = 1008 files (or subdirectories).

3. As the granularity is fixed at 32KB per block PQFS is wasteful, especially when used with smaller Compact Flash card and small files. Still, even a 16MB card could hold 500+ files or directories. What it does *not* limit is the size of files that can be saved (within the capacity of the disk of course:) File headers contain a file continuation link to the next block that contains data for a particular file (or zero if its the final block).

4. No checksums are implemented as yet - it totally relies on the hardware to report any faults. (There's plenty of places checksums could be stored in the block headers)

File system procedures:

There's number of key operations that are required for any file system: Format, Make directory, Delete directory, Change directory, Save file, Load file etc. Here I'll give give a brief outline of the procedures I used to achieve some of them in PQFS (Naturally all these procedures should actually check that a disk is present (and PQFS formatted - except "Format" obviously) before doing anything.)

"Format"

Very straightforward:

  1. Create a blank, default BAT map (all blocks free except for the first and second blocks - ie: the BAT itself and the ROOT block below) and write it to block 0.
  2. Form a new blank directory block called "ROOT" and write it to block 1.
(If you want to do a "long format" you typically fill each available block on the entire disk, check for bad sectors and make a note of them in the BAT).

"Make a directory"

  1. Scan the BAT for the location of the first free block, note its location (error if disk full)
  2. Check that the required directory name doesn't already exist in the current directory.
  3. Scan the current directory block for a free entry space to place the required name and put it there. (Give an error if the directory is full)
  4. Mark the free BAT location found at the start as taken.
  5. Create the header sector for the new directory.
  6. Write the header sector to the appropriate location.
  7. Zero the rest of the sectors in this block

"Change directory"

A file system can create the impression of branching tree of subdirectories by simply having a variable that holds the block position of the the current directory. When you change directory, the pointer is updated to the block location of the specified directory name. The "parent directory" function is achieved by updating to a pointer from the header of the current directory block (which was placed there when the directory was created.)

To change directory you'd do the following:

  1. Scan the curent directory block for a match to the name specified. Error if not found.
  2. Check the attributes of the match found - is it in fact a file not a dir? Error if apt.
  3. Copy the matched directory's location pointer to your current directory block variable.

(To move "up" a dir - check if the filename is ".." (or whatever) and update to the "parent block pointer" held in the header of the current directory block, giving an error if its already at the root directory).

"Delete a directory"

This isn't as straightforward as you might first think. If you simply remove the directory name entry from the current directory block and a free up it's block in the BAT, all the files and subdirectories that may have been in that directory will become "lost" but will still take up disk space. You therefore have two options, a) step through the entire sub-directory tree, deleting everything as you go.. or b) only allow empty directories to be deleted, forcing the user to do the donkey work.. I did the latter :)

  1. Scan the current directory block for a match to the name specified, error if not found.
  2. Check the attributes of the match found, is it in fact a file not a dir? Error if apt.
  3. Move to the block position of the directory match and check its entry list to make sure the directory in question is empty. Error if it contains stuff.
  4. Go back to the current directory block and delete the unwanted dir entry.
  5. In the BAT, mark the block position previously used by the directory as free.

"Save a file"

The operates much like the procedure for creating a directory, with the added complexitity of linking blocks for files longer than 63 * 512 bytes.

  1. Scan the BAT for the location of the first free block, note its location (error if disk full)
  2. Check that the required file name doesn't already exist in the current directory.
  3. Scan the current directory block for a free entry space to place the required name and put it there. (If the directory is full -> error.)
  4. Mark the free BAT location found at the start as taken.
  5. Create the header sector for the new file.
  6. Write out the header sector to the appropriate location and fill the block's following sectors with bytes from the file. Stop when all bytes written or the block is full.
  7. If all bytes have been saved, end the procedure.
  8. Scan the BAT for another free block, note its location.
  9. Mark new BAT block taken.
  10. Update the "continuation pointer" in the header of the previous file block to point to the new block
  11. Create a header sector for this next block.
  12. Go to to step 6

"Load a file"

Normally it's best to break up this procedure into two seperate routines. The first being "Open file" (which checks for the existence of the file etc) and the second "Get file data" (which actually loads in the data). This means that programs can get information about a file without actually loading the whole thing, plus you can modify the information that "Open file" produces to truncate the file length etc.

"Open File"

  1. Look for a match of the required file name in the current directory block (error if not found)
  2. Check that the entry is a file and not a directory (error if a dir).
  3. Note the starting block of the file.
  4. Load the header sector of the file and note all relevent info; file length etc.

"Get File Bytes"

  1. Obviously dont procede if "Open File" failed.
  2. Move to the block addresses saved by "Open File"
  3. Load in data bytes from the block to memory, decrementing the "file length" variable created by "Open File" as you go.
  4. If you reach the end of the last sector of the block before "file length" = 0, read in the header of this block and obtain the file continuation block address.
  5. Move to the new block address and goto step 3

"Delete a file"

Like "delete a directory" except you must follow any continuation links and free up their blocks in the BAT too.

  1. Scan the curent directory block for a match to the name specified, error if not found.
  2. Check the attributes of the match found, is it in fact a directory not a file? Error if apt.
  3. Make a note of the file's start block.
  4. Delete the unwanted directory entry.
  5. In the BAT, mark the block position previously used by the file block as free.
  6. Check the header of this block for a continuation link. End if it is zero.
  7. Goto step 5, using the continuation link just found as the file's block position.

And essentially, that's my quickie file system. Obviously it could be greatly improved but as I say simplicity and compactness were my goals. Note also that this file system does not offer any kind of wear-levelling*, so isn't particularly suitable for flash-memory based cards instead of hard disks. (Saying that, some cards allegedly feature automatic wear-levelling internally..) It was hardly likely to get hammered on my Z80 Project anyway :)

(* Spreading the writes across the entire storage area, instead of re-writing the same locations over and over. Flash memory has finite write endurance, and sectors such as those in the block allocation table are hit frequently, making them prime candidate for failure).