Tape files

Once the dump files have been created, they must be written to tape, because disk space is usually limited. The writing is done with the Unix tar(1) facility.

Each tape contains a series of tar files, the first and last of which are labels. The start and end labels are identical, and contain version information, the date the backup was taken, and an index which maps individual dump files to offsets on the tape. Thus the Coda backup tapes are self-identifying for easy sanity checks. The label is a tar file which contains only a single Unix file called TAPELABEL.
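As a rough illustration of the label structure described above, the following Python sketch builds a tar archive holding a single member named TAPELABEL. The text layout inside the label, the function name build_label_tar, and the reading of "offset" as a tar-file position on the tape are assumptions made for the example; the real label format is not specified here.

```python
import io
import tarfile


def build_label_tar(version, backup_date, index):
    """Return the bytes of a tar archive containing only TAPELABEL.

    index is a list of (dump_file_name, offset) pairs. The line-oriented
    text format used below is hypothetical, not the real Coda format.
    """
    lines = ["Version: %s" % version, "Date: %s" % backup_date]
    for name, offset in index:
        lines.append("%d %s" % (offset, name))
    payload = ("\n".join(lines) + "\n").encode("ascii")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="TAPELABEL")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

Because the start and end labels are identical, the same bytes could be written at both ends of the tape, and a sanity check could simply compare the two.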

The dump files are first sequenced by size. They are then broken down into groups, where the total size of each group must be larger than a certain size, currently .5 Megabytes. Each group is stored in a single tar file on the tape. These data tar files are the 2nd through (n-1)st records on the tape; the first and nth are tar files containing just the tape label.
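The grouping step can be sketched as follows. This is a minimal illustration, not the actual backup code: it assumes the files are sequenced in ascending order of size, that ".5 Megabytes" means 512 * 1024 bytes, and that any leftover files too small to form a group of their own are appended to the previous group. The names group_dumps and GROUP_MIN_BYTES are hypothetical.

```python
GROUP_MIN_BYTES = 512 * 1024  # assumed reading of ".5 Megabytes"


def group_dumps(dumps, min_bytes=GROUP_MIN_BYTES):
    """Split (name, size) pairs into groups whose total size exceeds min_bytes.

    Each returned group would become one tar file on the tape.
    """
    groups = []
    current = []
    total = 0
    # Assumption: "sequenced by size" means smallest first.
    for name, size in sorted(dumps, key=lambda d: d[1]):
        current.append((name, size))
        total += size
        if total > min_bytes:
            groups.append(current)
            current, total = [], 0
    # Assumption: a trailing undersized remainder joins the last group.
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups
```

For example, with a threshold of 250 bytes, files of 100, 200 and 300 bytes would be split into one group holding the first two files and a second group holding the third.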

This structure was chosen for several reasons. The first is that it is easy to implement. tar(1) has been used for many years and has proven to be reliable. The second is easy access to information on the tape. Using a single monolithic tar file would often require hours of waiting to retrieve a single dump file. With this layout, most of the data can be skipped using mt(1) and its fast-forward facility. Finally, it provides a simple and effective end-to-end check to validate that all the information has made it to tape.
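To make the skipping idea concrete, the sketch below assembles the commands one might run to pull a single dump file out of a given data tar file. It assumes the label is tape file 0, so the k-th data tar file is reached by skipping k files from the start of the tape, and that a non-rewinding tape device is used. The device path and the function name retrieval_commands are placeholders for the example, and the commands are only built, not executed.

```python
def retrieval_commands(device, group_number, dump_name):
    """Build the mt(1)/tar(1) command lines to extract one dump file.

    device must be a non-rewinding tape device (the path is site specific).
    group_number counts data tar files starting at 1; the label is file 0.
    """
    if group_number < 1:
        raise ValueError("data tar files are numbered from 1")
    return [
        ["mt", "-f", device, "rewind"],
        # Skip the label plus the preceding data tar files.
        ["mt", "-f", device, "fsf", str(group_number)],
        # Extract only the requested member from the tar file now under the head.
        ["tar", "-xf", device, dump_name],
    ]
```

Each list could be passed to subprocess.run once the device name has been adjusted for the local system.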

At CMU, we have created a convention for capturing sufficient information for reliability, while trying to avoid excess use of tapes. Full backups are taken once a week. However, since our staging disks are not large enough to hold full dumps for all the replicas of all the volumes, we stagger the full backups across the week.

There are three kinds of requests for restorations: those from users who have mistakenly trashed a file, those from users who lost data but didn't know it at the time, and those caused by bugs which require us to roll back to a substantially earlier state. The first class of restores can typically be handled by yesterday's state, which we keep on-line in the form of read-only backup clones. Thus almost all requests never reach the system administrator at all. To give users easy access to the previous day's backup, create a directory, OldFiles, in their coda directory, and mount each of the backups in the OldFiles directory.

If the user did not catch the loss of data immediately, it's reasonable to expect that they will catch it before a week has passed. We keep all incremental and full backups to guarantee we can restore state from any day in the last week. This requires 14 tapes, or two weekly sets. One week's worth is not sufficient, because the state from a later incremental relies on earlier incremental backups at a lower dumplevel in order to be restored. Thus as soon as the first incremental tape is overwritten (say Monday's), the state from the remainder of the last week is lost (last Tuesday's, Wednesday's, etc.).
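The dependency between incrementals can be illustrated with a small sketch. It assumes the conventional dump-level rule, in which a backup at level N records changes since the most recent earlier backup at a lower level, with level 0 being a full dump. The weekly schedule of levels used in the usage note is invented for the example, as are the names restore_chain and restorable.

```python
def restore_chain(history, target):
    """Return the labels of the backups needed to restore history[target].

    history is a list of (label, dumplevel) pairs, oldest first, covering
    every backup that was taken, whether or not its tape still exists.
    """
    label, level = history[target]
    chain = [label]
    # Walk backwards, picking up each earlier backup at a strictly lower level.
    for earlier_label, earlier_level in reversed(history[:target]):
        if level == 0:
            break
        if earlier_level < level:
            chain.append(earlier_label)
            level = earlier_level
    if level != 0:
        raise ValueError("no full (level 0) backup precedes the target")
    chain.reverse()
    return chain


def restorable(history, target, surviving):
    """True if every tape in the restore chain is still in `surviving`."""
    return all(label in surviving for label in restore_chain(history, target))
```

With a hypothetical history of Sunday at level 0 followed by Monday, Tuesday and Wednesday at levels 1, 2 and 3, restoring Wednesday needs all four tapes, so overwriting Monday's tape makes Tuesday and Wednesday unrestorable even though their own tapes survive.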

The third class of data loss is due either to infrequently used files or to catastrophe. (We've actually been forced to rely on the backup system to restore all Coda state while developing server software.) Since it's unreasonable to keep all the tapes around, we only save tapes containing full dumps. Weekly tapes are saved for a month, and monthly tapes are saved for eternity.