Learning About NTFS
I spent most of the week studying NTFS in depth and really learning some of the things I glossed over before. This is something I had to do to make meaningful progress on increasing file sizes. It involved (re-re-re-)reading the three relevant chapters in File System Forensic Analysis.
Also, in order to retain and really understand the information, I've started making a standalone app that lists files on an NTFS partition. The app is maybe a third of the way there; it finds the master file table, which holds an entry for each file on the drive, and it starts to parse it but eventually fails.
I haven't spent that much time with the program, but I already understand NTFS much, much better than I did a few weeks ago!
In NTFS, everything is stored in a file, and every file has multiple attributes which store information associated with that file. For example, a file's name is stored in one attribute, and a file's data is stored in another. Each file has an entry in the master file table (MFT), and a file's attributes are listed in this entry. An attribute can be resident, meaning its data is stored entirely in the MFT entry, or non-resident, meaning its data resides elsewhere on disk.
Over the past few days, I've been working on extending files whose data is non-resident (those larger than about 700 bytes). A lot of data structures need to be updated just to do this:
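To make the resident/non-resident distinction concrete, here is a minimal sketch of reading one attribute header out of an MFT entry. It is written in Python rather than the driver's own language, the function name and the returned dictionary keys are made up for illustration, and the field offsets follow the commonly documented on-disk layout rather than anything specific to our code.

```python
import struct

def parse_attribute_header(buf, offset=0):
    """Parse one NTFS attribute header starting at `offset` in `buf`.

    Hypothetical helper for illustration only; the name and the returned
    dictionary keys are assumptions, not part of any driver.
    """
    # Fields shared by resident and non-resident attributes (16 bytes).
    (attr_type, length, non_resident, name_len,
     name_off, flags, attr_id) = struct.unpack_from("<IIBBHHH", buf, offset)
    info = {"type": attr_type, "length": length,
            "non_resident": bool(non_resident), "id": attr_id}
    if not non_resident:
        # Resident: the content lives inside the MFT entry itself.
        content_size, content_off = struct.unpack_from("<IH", buf, offset + 16)
        start = offset + content_off
        info["content"] = bytes(buf[start:start + content_size])
    else:
        # Non-resident: the header only describes where the data lives.
        (first_vcn, last_vcn, runlist_off, _compression_unit,
         allocated, data_size, initialized) = struct.unpack_from(
            "<QQHH4xQQQ", buf, offset + 16)
        info.update(first_vcn=first_vcn, last_vcn=last_vcn,
                    runlist_offset=runlist_off, allocated_size=allocated,
                    data_size=data_size, initialized_size=initialized)
    return info
```

For a resident attribute the sketch hands back the content directly; for a non-resident one it only reports the sizes and the offset of the run list, which is what you would then decode to find the clusters on disk.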
- The attribute header
- The file record (MFT Entry)
- The fixup array (AKA Update Sequence Array) has to be updated and applied
- The MFT Mirror has to be updated (not yet done)
The code for updating these structures is new.
The fixup array requires some explanation. It is one of NTFS' many built-in integrity-enhancing features. The fixup array is a small array contained in the file record, which holds the original last two bytes of each sector allocated to the file record. When the record is written, those two bytes in each sector are replaced with the fixup sequence number, and when the record is read back, the original bytes are restored from the array. The sequence number is advanced every time the file record is updated, so the fixup array has to be updated on every change too.
The advantage of this scheme is that if you lose power in the middle of updating a file record, you can tell when you next read the record, because the sequence number written to the end of the last sector will not match the expected value. The disadvantage for me is that this scheme has to be implemented to make any meaningful changes to a file record.
We already have a function that can read a file record from disk, which applies the fixup values transparently. To debug my implementation, I used this function to read an existing file record, then fixed up that file record using the same sequence number (not advancing the sequence number, as you usually would). When I had a file record in memory that exactly matched the structure I found on disk, I knew my fixup algorithm was working.
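The fixup scheme is easier to follow in code than in prose, so here is a rough sketch of both directions in Python. This is not the driver's code: the function names are invented for illustration, the 512-byte sector size is an assumption, and the header offsets (array offset at byte 4, entry count at byte 6) follow the commonly documented file record layout.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size for this sketch

def apply_fixups_for_write(record, new_usn=None):
    """Return the on-disk form of an in-memory file record (hypothetical helper)."""
    buf = bytearray(record)
    # Bytes 4-5: offset of the fixup array; bytes 6-7: entry count
    # (the sequence number itself plus one saved value per sector).
    usa_offset, usa_count = struct.unpack_from("<HH", buf, 4)
    (old_usn,) = struct.unpack_from("<H", buf, usa_offset)
    if new_usn is None:
        # Advance the sequence number, skipping zero when it wraps.
        new_usn = ((old_usn + 1) & 0xFFFF) or 1
    struct.pack_into("<H", buf, usa_offset, new_usn)
    for i in range(1, usa_count):
        tail = i * SECTOR_SIZE - 2      # last two bytes of sector i-1
        saved = usa_offset + 2 * i      # matching slot in the fixup array
        buf[saved:saved + 2] = buf[tail:tail + 2]   # save the real bytes
        struct.pack_into("<H", buf, tail, new_usn)  # stamp the sector
    return bytes(buf)

def remove_fixups_after_read(record):
    """Check every sector stamp, then restore the saved bytes."""
    buf = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", buf, 4)
    usn = bytes(buf[usa_offset:usa_offset + 2])
    for i in range(1, usa_count):
        tail = i * SECTOR_SIZE - 2
        if buf[tail:tail + 2] != usn:
            raise ValueError("sector %d stamp mismatch: torn write?" % (i - 1))
        saved = usa_offset + 2 * i
        buf[tail:tail + 2] = buf[saved:saved + 2]
    return bytes(buf)
```

Passing the record's existing sequence number as new_usn mirrors the debugging trick described above: the output should then be byte-for-byte identical to what was read from disk.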
What Works Now
You can increase the size of a non-resident $DATA attribute, provided the allocation size does not need to increase. My tester demonstrates this by starting with a 1 KB file and increasing its size by one byte at a time. ReactOS recognizes the changes to the file completely.
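As a rough illustration of the case that works, the following Python sketch bumps the data size in a non-resident attribute header only when the new size still fits in the existing allocation. The function name is hypothetical, the offsets (allocated size at byte 40, data size at byte 48) are the commonly documented ones, and the sketch deliberately touches nothing else, so it is not a complete picture of what the driver has to update.

```python
import struct

NON_RESIDENT_FLAG_OFFSET = 8   # 1 byte: nonzero means non-resident
ALLOCATED_SIZE_OFFSET = 40     # 8 bytes: space reserved on disk
DATA_SIZE_OFFSET = 48          # 8 bytes: logical size of the data

def extend_within_allocation(attr, new_size):
    """Grow the data size of a non-resident attribute in place.

    `attr` is a bytearray holding the attribute header. Returns the number
    of bytes the attribute grew by. Hypothetical helper for illustration.
    """
    if attr[NON_RESIDENT_FLAG_OFFSET] == 0:
        raise ValueError("resident attributes are not handled here")
    allocated, current = struct.unpack_from("<QQ", attr, ALLOCATED_SIZE_OFFSET)
    if new_size < current:
        raise ValueError("truncation is not handled here")
    if new_size > allocated:
        raise ValueError("new size needs a larger allocation")
    # Only the data size changes; the allocated size and the initialized
    # size field that follows are left untouched by this sketch.
    struct.pack_into("<Q", attr, DATA_SIZE_OFFSET, new_size)
    return new_size - current
```

Growing a 1 KB file one byte at a time, as the tester does, would call this repeatedly with 1025, 1026, and so on, and it would start raising an error once the size passes the allocated size.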
Not Yet Working
Windows does not fully accept the changes that have been made. Interestingly, Explorer reports the new size of the file, but Windows won't return any data past the point to which Windows itself last extended the file. Chkdsk has no effect on this situation.
This is the first day I've been able to extend the size of a file on disk and make it stick. Now I need to figure out and fix what Windows doesn't like about the situation, then expand this to start handling all the other cases* of changing a file size: extending allocation size, truncating a file, handling resident files, sparse files, etc.
*I mean within reason. Last week I used some phrase like, "Write to any file," and that's been keeping me up at night. I meant, files that we can already read. Not compressed files, encrypted files, or some other crazy-corner-case files.