Closed

FAT corruption

description

Hello community,

I have serious problems with the stability of the FAT filesystem ...

We have developed a NETMF application that acquires data and stores it on the FAT filesystem (on a NAND flash).
The device/application is intended to run stand-alone, i.e. without a permanent network connection, so the reliability of the filesystem is very important for our needs.

Unfortunately I had to realize that the filesystem doesn't seem to be very stable.

I found that occasionally some files and/or the filesystem itself became corrupted. In those cases, blocks of data within a file (usually multiples of 512 bytes) were filled with 0xFF. Sometimes the directory structure seems to be affected as well.
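To make the symptom above easier to quantify, a read-back buffer can be scanned for whole 512-byte sectors that contain only 0xFF (the value that erased NAND flash reads as). The following is a minimal desktop sketch in Python, not NETMF code; the function name `find_erased_sectors` and the fixed sector size are assumptions made for illustration.

```python
SECTOR_SIZE = 512  # assumption: matches the 512-byte granularity described above


def find_erased_sectors(data: bytes, sector_size: int = SECTOR_SIZE) -> list[int]:
    """Return the indexes of full sectors that consist only of 0xFF bytes."""
    erased = bytes([0xFF]) * sector_size
    hits = []
    # Only complete sectors are checked; a trailing partial sector is ignored.
    for offset in range(0, len(data) - sector_size + 1, sector_size):
        if data[offset:offset + sector_size] == erased:
            hits.append(offset // sector_size)
    return hits
```

Logging the returned sector indexes per file would show whether the 0xFF runs are sector-aligned, which may help to tell a driver-level problem from an application-level one.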

The problems were first encountered with framework version 4.0. In the meantime, I have run many tests on version 4.2, which shows problems as well. My application is multithreaded and does many other things besides writing files.
To isolate the potential causes of these file corruptions, I have implemented a simple test application, which only writes files and compares them afterwards. To be more exact, the test proceeds as follows:
  • Format the root volume
  • Until there is no more free space ...
    • write defined content to a FileStream (within a 'using' statement)
    • open file again, read content and compare
  • After the filesystem is full, start a second compare run over all files.
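The procedure above can be sketched roughly as follows. This is a desktop Python illustration under stated assumptions, not the attached NETMF test: it skips the format step, writes a fixed number of files instead of filling the volume, and the names `pattern`, `write_and_check` and `run_test` are hypothetical.

```python
import os


def pattern(index: int, size: int) -> bytes:
    """Deterministic content for file number `index` (hypothetical pattern).

    Values stay in 0..250, so a correct file never contains 0xFF.
    """
    return bytes((index + i) % 251 for i in range(size))


def write_and_check(path: str, content: bytes) -> bool:
    """Write `content`, then reopen the file and compare what is read back."""
    with open(path, "wb") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    with open(path, "rb") as f:
        return f.read() == content


def run_test(root: str, file_count: int, file_size: int, files_per_dir: int = 100):
    """Return (failures seen right after writing, failures seen in the second run)."""
    immediate_failures, later_failures = [], []
    paths = []
    for n in range(file_count):
        directory = os.path.join(root, f"dir_{n // files_per_dir}")
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, f"file_{n}")
        paths.append(path)
        if not write_and_check(path, pattern(n, file_size)):
            immediate_failures.append(path)
    # Second compare run over all files, after all writes are finished.
    for n, path in enumerate(paths):
        with open(path, "rb") as f:
            if f.read() != pattern(n, file_size):
                later_failures.append(path)
    return immediate_failures, later_failures
```

Keeping the two failure lists separate makes it visible when a file passes the immediate compare but fails the second run.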
The tests fail in different ways, and the failures are not reproducible in the sense that it is not always the n-th file after formatting that is affected.
  • sometimes the file creation or writing itself leads to an exception, e.g. ...

    Exception System.IO.IOException - CLR_E_FILE_IO (1)
    Message:
    Microsoft.SPOT.IO.NativeFileStream::Write [IP: 0000]
    System.IO.FileStream::Write [IP: 002a]

  • sometimes the direct compare after writing detects failures (larger blocks filled with 0xFF)
  • when there were failures, the second compare run sometimes detects more failures than were reported before. So it seems that files which were written correctly are corrupted afterwards.
  • after a reboot (following a failure), the FAT volume is sometimes reported as not formatted
  • in rare cases, the test hangs completely.
  • I just had a situation in which I was not able to format the volume. After erasing the filesystem region via MFDeploy, it worked again.
Now for my questions ...

Are these problems caused by any known bugs?
Is there any way to avoid or reduce these problems?
(Is there something to consider concerning file size, maximum number of files, formatting, FileStream usage, threading, ... )

I would appreciate ANY hint ...


Best regards,
Ron

Closed Feb 7, 2012 at 8:02 PM by ZachLibby

comments

lorenzte wrote Jan 23, 2012 at 4:49 PM

Hi Ron,

it would be helpful to get some more information regarding this issue. Are you seeing the corruption on a device, on the emulator or both? If you observed it on a device only, which device is that? Finally, could you send your test application?

Thank You
Lorenzo Tessiore

Ron_11 wrote Jan 24, 2012 at 3:49 PM

Hello Lorenzo,

We are using a device of our own design, which is based on the AT91SAM9261. We use the NETMF binary, version MF_4_2_0_99_BIN_RevA(CJ), from AUG Electronics, written for the AUG AMI DevKit.

We are in close contact with AUG and have already talked with them about this issue.

I have attached my test application. The test has to be initiated explicitly by sending 'start' on the DBGU interface. For other options, please take a look at the source.

Best regards,
Ron

ZachLibby wrote Jan 24, 2012 at 8:27 PM

Hello Ron,

There is a maximum number of files/folders you can add to the volume root. From my code inspection of your test, it appears that you are placing all of your files at the following path: \ROOT\dir_<x>\file_<y> (where <x> and <y> are increasing indexes). Please try modifying your test application to make a TEST directory in the volume root and add all other tests/folders to this directory (\ROOT\TEST\dir_<x>\file_<y>). My guess is that you are maxing out the allowable volume root indexes.

Thanks,
Zach

Ron_11 wrote Jan 25, 2012 at 1:04 PM

Hello Zach,

I have already heard about a limitation of 250 (?) files/folders per folder (even though the filesystem does not prevent me from creating more files). For that reason, I divided the files into subfolders. In my current test configuration, I put 100 files into one subdirectory (dir_<x>), and the number of subdirectories doesn't reach 250. These subdirectories are not located directly in the root ( \ ) of the volume but in a folder called 'ROOT'.
I can give it a try and add TEST to the path. But as far as I understand, I am not writing directly into the volume's root at the moment.

Thanks,
Ron

ZachLibby wrote Jan 25, 2012 at 6:16 PM

Actually \ROOT\ is not a folder. It declares the volume used to write the folder/file. This is a little different from standard file systems because we don't have drive letters. Instead we use the volume name to distinguish between volumes in the full path. This volume name is prepended in cases where the full path is not used. I am not 100% sure this is the problem you are seeing, but it is one that others (including myself) have come across.
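As an illustration of the path layout described above, the first path component can be treated as the volume name and everything after it as the location inside that volume. The snippet below is a small Python sketch, not a NETMF API; `split_volume` and `is_in_volume_root` are hypothetical helpers.

```python
def split_volume(path: str) -> tuple[str, list[str]]:
    """Split a full path into (volume name, remaining components)."""
    parts = [p for p in path.split("\\") if p]
    if not parts:
        raise ValueError("path has no volume component")
    return parts[0], parts[1:]


def is_in_volume_root(path: str) -> bool:
    """True if the entry named by `path` sits directly in the volume root."""
    _, rest = split_volume(path)
    return len(rest) <= 1
```

With this reading, \ROOT\dir_1 is an entry in the volume root, while \ROOT\TEST\dir_1 is one level below it.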

Thanks
Zach

Ron_11 wrote Jan 26, 2012 at 8:19 PM

I have extended the path, as recommended, to \ROOT\TEST\dir_<x>, but unfortunately the situation stays the same.

The tests I have been running over the last few days are not restricted to 4.2. I am testing in parallel on other devices with the 4.0 framework.

As far as I know, the two versions are based on different drivers. Unlike version 4.2, the driver in 4.0 does not deal with wear-leveling. Is that right?

As I noticed in my latest tests with 4.0, it seems that the corruptions aren't detected in the direct compare after creation but in the subsequent compare loop.
There could be two reasons for that effect:
1) While reading the file right after writing, the data is taken from some cache instead of the flash memory.
2) If there is no such cache, the corruption has to happen later on.

Can anybody tell me whether any caching or buffering is done while writing to the flash (especially in version 4.0!)? If the data is really taken from a cache, is there any way (explicit or as a side effect of some other operation) to avoid this?

Thanks for your support,
Ron

wrote Feb 7, 2012 at 8:02 PM

Resolved with changeset 16876.