CS111 - Project 3A: File System Dump


Project 3 is expected to involve much more programming than any of the other projects.  You will design and implement programs to analyze file systems and diagnose corruption.  In part A, you will write a program to read the image of a file system, analyze it, and summarize its contents in several csv files. These csv files will be used in part B, where we will analyze the consistency of file system data structures and diagnose problems.

Part A can be broken into two major steps:

  1. Mount the provided image file on your own Linux and explore it with debugfs(8).
  2. Write a program to analyze the image file and output a summary to six csv files (describing the super block, cylinder groups, free-lists, inodes, indirect blocks, and directories).


This project more deeply explores the file and directory concepts described in Arpaci chapter 39.

This project is based on the same EXT2 file system that is discussed in sections 40.2-40.5.  Part B of this project goes much deeper than the introductory discussion of integrity presented in sections 42.1-2.



A single tarball (.tar.gz) containing:


Historically, file systems were almost always been implemented as part of the operating system, running in kernel mode.  Kernel code is expensive to develop, difficult to test, and prone to catastrophic failures.  Within the past 15 years or so, new developments have made it possible to implement file systems in user mode, improving maintainability … and in some cases delivering even better performance than could be achieved with kernel code.  All of this project will be done as user-mode software.

To ensure data security and integrity, disks are generally protected from access by ordinary applications.  Linux supports the creation, mounting, checking, and debugging of file systems stored in ordinary files.  In this project, we will provide EXT2 file system images in ordinary files.  Because they are in ordinary files (rather than protected disks) you can access/operate on those file system images with ordinary user mode code.

To complete this assignment, you will need to learn a few things:

Step 1 - study a provided file system image

Your program will be tested on multiple (broken) file systems. Here we provide one as an example. SEASnet servers do not support commands like sudo(8), mount(8), or debugfs(8), so to play with our provided file system image, you will have to install a Linux distribution (if you do not have one) on your own computer. For simplicity and convenience, you may choose to install Linux inside a virtual machine, for example VirtualBox (it’s free!).

Then download this image file, and mount it onto your own Linux, with the following commands:

mkdir fs
sudo mount -o loop disk-image fs
sudo chown -R
$USER fs        … where $USER is your user-name

Note:  this file is 200 Mbytes in size.

Now, you can navigate the file system, just like an ordinary directory, with commands like ls(1), cat(1), and cd(1). After you are done with it, you can unmount with the following command:

sudo umount fs

Before you start writing your C program to interpret the “disk-image” file, you can explore it further using debugfs(8) (in your own Linux). You may find many useful commands in its man page. Some particularly helpful ones are: stats, stat, bd, testi, and testb.  If you have problems interpreting parts of the file system, you can use the debugfs program to help you understand the correct contents.

To ensure you are correctly interpreting the file system image, we have included many unusual things, which would not likely be caught by an incorrect implementation:

Step 2 - write a program to summarize the file system’s contents

In this step, you will write a program called lab3a that:

Please note that, although you cannot mount the provided image file on departmental servers, your lab3a program should be able to run on departmental servers, just like all previous assignments.

There are to be six csv-format files, each summarizing a different part of the file system.  Remember, you can always check your program’s output against debugfs’s output. All the information required for the summary can be manually found and checked by using debugfs.

The file system images your program examines will, in many cases be damaged (as might result from missing or mis-directed writes).  This means that you will have to do some Defensive Programming in your analysis program: validating parameters and pointers before you use them.  In addition to correctly analyzing the file system and group parameters, inodes, block pointers, and directory entries, your program will also be expected to report (to stderr) any invalid parameters or addresses.

  1. super block

A basic set of file system parameters.



magic number


total number of inodes


total number of blocks


block size


fragment size


blocks per group


inodes per group


fragments per group


first data block


Sample correct csv output


Sanity checking

Sample error messages (to stderr):

Superblock - invalid magic: 0xdead

Superblock - invalid block size: 666

Superblock - invalid block count 200000 > image size 50000

Superblock - invalid first block 100000 > image size 50000

Superblock - 20000 blocks, 1050 blocks/group

Superblock - 1000 Inodes, 66 Inodes/group

It is important to validate superblock parameters, because they are used to determine the addresses of inodes and to validate block pointers.  If a file system fails any of these tests, log an error and exit without any further processing.

  1. group descriptor

Information about each cylinder group



number of contained blocks


number of free blocks


number of free inodes


number of directories


(free) inode bitmap block


(free) block bitmap block


inode table (start) block


Sample output (just for the first cylinder group)


Sanity checking

Sample error messages (to stderr):

Group 7: 100000 blocks, superblock says 50000

Group 7: blocks 100000-150000, free Inode map starts at 165000

Group 7: blocks 100000-150000, free block map starts at 165000

Group 7: blocks 100000-150000, Inode table starts at 165000

Group descriptor validation is also critical because this table tells us where the inodes and free bit-maps begin.  If a group descriptor fails any of these tests, log an error, and do not attempt to dump out bit-maps or inodes that are not within the group.

  1. free bitmap entry:

A list of free inodes and free blocks.

One line of output for each free inode or block in the bitmap.  No output should be produced for allocated inodes or blocks.



block number of the map


Free block/inode number


Sample output (just first five entries)






Sanity checking

  1. Inode

Key information for each allocated inode.

An allocated inode has a file type.  



inode number


file type








link count


creation time


modification time


access time


file size


number of blocks


block pointers * 15


Sample output (first four inodes only)





Sanity checking

It is important to validate all block pointers before we use them:

Sample error message (to stderr):

Inode 407 - invalid block pointer[9]: 3ab403

If any block address falls outside the reasonable range, include it in the output, log an error message.  But if an out-of-range block is an indirect block or contains directory entries, do not attempt to further interpret its contents.

  1. directory entry

The valid/allocated entries in each directory.

An allocated directory entry can be recognized by a non-zero inode number.  Any directory entry that has a zero inode number should be assumed to be empty and not included in the output.

The term parent inode number, below, refers to the inode number of the directory that contains all of the entries that are being listed.

The term entry number, below, is a number that starts at 0 for the first directory entry, and is incremented for each entry that is encountered (whether allocated or not).  If the first ten entries are allocated, the next is free, and the last is allocated, the output should describe entries: 0-9, and 11, with no output for entry 10.



parent inode number


entry number


entry length


name length


inode number of the file entry




Sample output (first three directories)




Sanity checking

If a group descriptor fails any of these tests, log an error.  If either the name or entry length is unreasonable, stop interpreting this directory:

Sample error message (to stderr):

Inode 204, block 3c05 - bad dirent: Inode = 107538

Inode 204, block 3c05 - bad dirent: len = 3040, namelen = 5240

  1. indirect block entry

These are all the non-zero block pointers in an indirect block. The blocks that contain indirect block pointers are included.



block number of the containing block


entry number within that block


block pointer value


Sample output (first three entries)




Sanity checking

It is important to validate all block numbers in indirect blocks for the same reason we had to validate block numbers in inodes: because we will follow those pointers to further interpret the file system.

Sample error message to stderr:

Indirect block 3c4 - invalid entry[103] = 304c8df

If any block number is out of range log an error message.  If a out-of-range block is an indirect block, do not attempt to dump its contents.

Output Files

For each different kind of entry, output its summary in a separate file. The names of your files should be:

In these csv files, each line represents one entry, and should be ended with a single new-line character ( ‘\n’, 0x0a).  The field orders in those csv files should be the same as listed above, and your program should use and only use a comma to separate each field.

For your convenience, here are the exact six csv files generated by our solution: super.csv, group.csv, bitmap.csv,inode.csv, directory.csv, and indirect.csv.


Please carefully check the Rubric section (especially the Code review subsection) and all (latest) footnotes before submitting your code.

Your tarball should have a name of the form lab3a-studentID.tar.gz and should be submitted via CCLE.

We will test your work on a SEASnet GNU/Linux server running RHEL 7 (this is on lnxsrv09). You would be well advised to test your submission on that platform before submitting it.


Value        Feature

        Packaging and build (10%)

3%        untars expected contents

3%        clean build w/default action (no warnings)

2%        Makefile has clean and dist targets

2%        reasonableness of README contents

        Code review (15%)

You can check header files like ext2_fs.h to have an idea about the variable types. HOWEVER, you shall write up your own code for the calculations like block/inode (number), indirect block pointer, various different kinds of offset, and so on. So in sum, do all the calculations with your own code, do not use macros/functions provided in header files like ext2fs.h.

5%        overall readability and reasonableness, correct program name and csv filenames

10%        image file investigation correct (use pread() and write your own code for calculations)


        Results (75%)

⅔ points shall go to outputting the required fields correctly, and ⅓ points go to correct lines (penalty will be given for extra/missing lines).

10%        super.csv

10%        group.csv

10%        bitmap.csv

15%        inode.csv

15%        directory.csv

15%        indirect.csv

[1] We only recognize a (small) subset of the file types: ‘f’ for regular file, ‘d’ for directory, and ‘s’ for symbolic links. For other types, use ‘?’.

[2] Double-quoted.  Don’t worry about how to print out non-graphical characters or embedded quotes.