Chapter 10. The Backup System

Table of Contents
Introduction: Design of the Coda Backup Subsystem
Installing a Coda Backup Coordinator Machine
Incremental Dumps
Tape files
Restoring a backup clone
Backup Scripts

Introduction: Design of the Coda Backup Subsystem

As the use of the Coda file system increased, the need for a reliable, high-capacity backup storage system with minimal loss of service became apparent. A one-operation backup system was determined to be infeasible given the volume of data in Coda, the nature of a distributed file system, and the long downtime that would normally be required to back up the system in one operation.

In order to meet the goals of high availability and reliability inherent in Coda's design, and to make efficient use of backup hardware and materials, the volume was chosen as the unit of data, and 24 hours was chosen as the time unit for system management and administration. The result of these design considerations is a volume-by-volume backup mechanism that occurs in three phases:

Cloning. The cloning phase consists of freezing the (replicated) volume, creating a read-only clone of each of the replicas, and then unfreezing the volume. This allows mutating operations on the replicated volume to continue while a snapshot is kept for backup. Once the cloning phase has been completed, normal read-write services can be resumed without fear of data corruption due to mutating operations on an active file system.

Dumping to disk files on a backup spool machine. The dumping phase consists of converting the read-only volume clones to disk images stored as regular disk files on a spool machine. A dump can be either full (level 0), in which all files are dumped, or incremental (levels 1 through 9), in which only those files or directories that have changed since the last successful backup at the next lower level are included. So a level 1 dump includes changes since the last full dump, a level 2 dump includes changes since the last level 1 dump, and so on. This allows for a system in which only a subset of volumes needs a full backup at any one time (with incremental backups done between full backups), thus reducing the amount of off-line storage and network bandwidth needed at any one time. Combined with incremental dumps, it still allows data to be re-created at a granularity of 24 hours. Incremental dumps, however, are supported only for replicated volumes. Since there is little need for non-replicated volumes, only full dumps are supported for them.

Saving to media. The last phase consists of writing the dump files of the backed-up volumes from the local partitions to an archival medium such as tape. Any standard backup system can be used for this phase. At CMU, we use the BSD dump and restore utilities to write and retrieve the disk images of Coda volumes to tape.
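The dump-level rule from the dumping phase can be illustrated with a short sketch. It follows this chapter's wording (a level N incremental is taken relative to the most recent successful level N-1 dump) and is not the logic of the backup program itself. The Dump record, the integer day field, and the function name baseline_for are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dump:
    day: int    # day on which the dump succeeded (hypothetical time unit)
    level: int  # 0 = full dump, 1..9 = incremental dump


def baseline_for(history: List[Dump], level: int) -> Optional[Dump]:
    """Return the earlier dump that a new dump at `level` is relative to.

    A full dump (level 0) has no baseline. An incremental at level N is
    assumed to be relative to the most recent successful level N-1 dump.
    """
    if not 0 <= level <= 9:
        raise ValueError("dump level must be between 0 and 9")
    if level == 0:
        return None
    candidates = [d for d in history if d.level == level - 1]
    if not candidates:
        raise LookupError(
            f"no level {level - 1} dump to base a level {level} dump on"
        )
    # The most recent candidate is the baseline.
    return max(candidates, key=lambda d: d.day)
```

With a history of a full dump on day 1 and level 1 dumps on days 2 and 3, a new level 1 dump is relative to the day 1 full dump, and a new level 2 dump is relative to the day 3 level 1 dump.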

Practically, this system has been implemented as a series of tasks. Within a backup, the first two tasks are carried out by the backup program and the last by a Coda-independent Perl script (tape.pl).

  1. Backup

    1. Creating a read-only clone.

    2. Dumping the read-only clone to a local disk.

    3. Backing up the dumped data to a suitable archive medium.

  2. Restore

    1. Retrieving the appropriate full and incremental dumps from the archive media.

    2. Merging the full and incremental dumps up to the point in time being restored.

    3. Restoring the fully integrated backup to the Coda file system.
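The first two restore steps amount to choosing which dumps to retrieve and in what order to merge them. The sketch below shows one plausible selection rule, assuming that each incremental level is relative to the next lower level as described earlier in this chapter. It is not the behavior of any Coda tool; the (day, level) tuples and the function name restore_chain are hypothetical.

```python
from typing import List, Tuple

# One (day, level) pair per successful dump of a single volume.
# Day numbers are a hypothetical time unit used only in this example.
Dump = Tuple[int, int]


def restore_chain(dumps: List[Dump], target_day: int) -> List[Dump]:
    """Pick the full dump and the incrementals needed to rebuild `target_day`.

    The result is ordered for merging: the full dump first, then one dump
    per increasing level, each newer than the one before it.
    """
    usable = [d for d in dumps if d[0] <= target_day]
    fulls = [d for d in usable if d[1] == 0]
    if not fulls:
        raise LookupError("no full dump at or before the target day")
    chain = [max(fulls)]  # latest full dump; tuples compare by day first
    for level in range(1, 10):
        newer = [d for d in usable if d[1] == level and d[0] > chain[-1][0]]
        if not newer:
            break  # no dump at this level follows the chain so far
        chain.append(max(newer))
    return chain
```

For example, with a full dump on day 1, level 1 dumps on days 2 and 3, and a level 2 dump on day 4, restoring to day 4 selects the day 1 full dump, the day 3 level 1 dump, and the day 4 level 2 dump.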

Remember that, in practice, many restores are the result of users accidentally deleting or corrupting their own files. In this case, users may use the cfs mechanism to retrieve files as they were at the most recent backup, within the last 24 hours. For example, cfs mkmount OldFiles u.hmpierce.0.backup mounts the backup volume of user hmpierce from replica 0 at the OldFiles mount point. The file can then be copied out (backup volumes are read-only). Restores from tape are normally needed only if the data to be restored is more than 24 hours old or some catastrophic event outside the user's control occurs.
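The cfs example above can be wrapped in a small helper when many users need the same recovery step. This is a minimal sketch that only builds the command line. It assumes that backup volume names follow the pattern seen in the example (volume name, replica index, then "backup"), and the function name backup_mount_command is hypothetical.

```python
from typing import List


def backup_mount_command(mount_point: str, volume: str, replica: int) -> List[str]:
    """Build the argument list for mounting one replica's backup clone.

    Assumes the naming pattern <volume>.<replica>.backup, generalized
    from the single example u.hmpierce.0.backup.
    """
    if replica < 0:
        raise ValueError("replica index must be non-negative")
    backup_volume = f"{volume}.{replica}.backup"
    return ["cfs", "mkmount", mount_point, backup_volume]
```

On a machine with a running Coda client, the returned list could be passed to subprocess.run(..., check=True); the files under the mount point can then be copied out with ordinary tools.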

Several tools have been developed to help in the creation, analysis, and restoration of data backups. Some of these tools, those concerned with converting Coda FS data to local disk files, have been developed by the Coda team, such as backup and tape.pl (used to coordinate the efforts of backup, dump, etc.). Others employ off-the-shelf software, such as the traditional UNIX dump or tar, to transfer the disk images created in the dump phase to the backup media. Coda does, however, provide a Perl script front end to dump.