View Post

Poster: Seaware Date: Oct 3, 2012 9:49pm
Forum: petabox Subject: How long does the data last?

I am interested to learn whether there are estimates of how many years the data will last before it needs to be replicated to a new drive. Also, is ECC used, so that if some data is lost over a given period it is still likely to be recoverable? It would be interesting to know whether this archive would really count as an archive from the perspective of 1000 years from now.

Reply

Poster: Coderjo Date: Oct 8, 2012 1:49pm
Forum: petabox Subject: Re: How long does the data last?

The data is stored on two separate hardware nodes as soon as it is uploaded to archive.org. As far as I know, the system does not do extra ECC (beyond what the hard drive does internally). However, one of the item's XML files stores a list of the item's files along with their checksums, which can be used to verify the files on each node.

Reply

Poster: Seaware Date: Oct 9, 2012 12:47am
Forum: petabox Subject: Re: How long does the data last?

Thanks. So if the half-life of the data on the disk is 100 years (for example), would the drive be powered on and the data checked at least once during that period, with the first failing checksum triggering replication to a fresh drive? Also, I hope you are using a CRC rather than a pure checksum, since a CRC is more likely to detect multi-bit errors.
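
To illustrate the CRC point, here is a minimal Python sketch. It is not taken from the petabox code; `additive_checksum` is a hypothetical stand-in for a "pure" checksum (a sum of bytes), and the payload is made up. Two bytes are corrupted so that the changes cancel out in a plain sum, while CRC32 (from the standard `zlib` module) still sees the difference.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Hypothetical 'pure' checksum: sum of all bytes modulo 2**16."""
    return sum(data) % 65536


original = b"petabox item payload"

# Corrupt two adjacent bytes so the errors cancel in a plain sum:
# +1 on the first byte, -1 on the second.
damaged = bytearray(original)
damaged[0] += 1
damaged[1] -= 1
corrupted = bytes(damaged)

# True means the check noticed that the data changed.
sum_detects = additive_checksum(original) != additive_checksum(corrupted)
crc_detects = zlib.crc32(original) != zlib.crc32(corrupted)

print("additive checksum detects corruption:", sum_detects)
print("crc32 detects corruption:", crc_detects)
```

The sum misses this corruption because it ignores byte order and position, whereas CRC32 is guaranteed to catch any error burst confined to 32 bits or fewer.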

Reply

Poster: Coderjo Date: Oct 10, 2012 11:04pm
Forum: petabox Subject: Re: How long does the data last?

I don't know low-level details, so I don't know if the data is scrubbed regularly. I also don't know the procedures that occur when a drive fails and needs to be replaced.

Currently, looking at the files.xml file for a random item, the system records sha1, md5, and crc32 values for each file. It also stores the file size and mtime.
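
For anyone who wants to try this kind of check, here is a rough Python sketch of verifying files against such a manifest. It is not the petabox implementation: the XML layout (a `<files>` root with `<file name="...">` entries holding `<size>`, `<md5>`, `<sha1>`, and `<crc32>` children), the hex formatting of the values, and the helper names `digests`, `build_manifest`, and `verify` are all assumptions made for illustration. File contents are held in memory as bytes to keep the example self-contained.

```python
import hashlib
import zlib
import xml.etree.ElementTree as ET


def digests(data: bytes) -> dict:
    """Compute the fields mentioned above for one file's contents."""
    return {
        "size": str(len(data)),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # Assumed formatting: 8 lowercase hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


def build_manifest(files: dict) -> str:
    """Build a manifest in the assumed files.xml-like layout."""
    root = ET.Element("files")
    for name, data in files.items():
        entry = ET.SubElement(root, "file", name=name)
        for field, value in digests(data).items():
            ET.SubElement(entry, field).text = value
    return ET.tostring(root, encoding="unicode")


def verify(manifest_xml: str, files: dict) -> dict:
    """Return {file name: [mismatching fields]}; empty dict means all good."""
    problems = {}
    for entry in ET.fromstring(manifest_xml).iter("file"):
        name = entry.get("name")
        if name not in files:
            problems[name] = ["missing"]
            continue
        actual = digests(files[name])
        bad = [f for f, v in actual.items() if entry.findtext(f) != v]
        if bad:
            problems[name] = bad
    return problems


stored = {"scan.txt": b"hello archive"}
manifest = build_manifest(stored)

clean = verify(manifest, stored)
damaged = verify(manifest, {"scan.txt": b"hellp archive"})  # one byte changed

print("clean copy:", clean)
print("damaged copy:", damaged)
```

A node holding a good copy reports no problems, while the copy with a single altered byte fails the hash and CRC fields even though its size is unchanged. Running the same check on both nodes would show which copy to replicate from.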