Repairing an Inconsistent Directory

Occasionally, a directory entry will become inconsistent. This happens when there is a conflict between file server replicas that Coda cannot resolve automatically, or when a reintegration fails because a local update conflicts with the global state. The most common causes are a partition between file servers during which a file is changed on more than one side of the partition, and a disconnected client updating a file that is also updated on the servers. When this happens, the directory containing the conflict looks like a symbolic link that points to its file identifier (fid). For example, if a directory named conflict is inconsistent, it appears as:

% ls -l conflict
lr--r--r--  1 root      27 Mar 23 14:52 conflict -> @@7f0000b3.00000005.0000011a

Most applications will return the error File not found when they try to open a file that is inconsistent. You need to resolve this conflict by using the repair(1) tool.
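
A script can recognize an inconsistent object by the shape of its link target. The following is a minimal sketch, not part of Coda: the function names are hypothetical, and the only assumption taken from the listing above is that the target is "@@" followed by three dot-separated hexadecimal fields.

```python
import os
import re

# Shape of the link target shown in the listing above: "@@" followed by
# three dot-separated hexadecimal fields. The meaning of each field is
# not interpreted here.
FID_TARGET = re.compile(r"^@@([0-9a-fA-F]+)\.([0-9a-fA-F]+)\.([0-9a-fA-F]+)$")


def parse_conflict_target(target):
    """Return the three fid fields as integers, or None if the
    string does not look like a fid link target."""
    match = FID_TARGET.match(target)
    if match is None:
        return None
    return tuple(int(part, 16) for part in match.groups())


def conflict_fid(path):
    """Return the fid fields if path is a symbolic link whose target
    looks like a fid (hypothetical helper), otherwise None."""
    if not os.path.islink(path):
        return None
    return parse_conflict_target(os.readlink(path))


if __name__ == "__main__":
    print(parse_conflict_target("@@7f0000b3.00000005.0000011a"))
```

Calling conflict_fid on each entry of a directory listing would flag the objects that need a repair session; the repair itself still has to be done with repair(1).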

Server/Server Conflicts

Once you run repair, you need to do a beginRepair on the inconsistent object. After beginRepair is issued, the inconsistent directory has one entry for each replica, named after the server that holds it. You can look at all of these to decide which copy you want, then use repair to copy the correct version and clear the inconsistency. In the following example the file conflict/example is replicated on three servers and has become inconsistent.

% ls -lL conflict
lr--r--r--  1 root           27 Dec 20 13:12 conflict -> @@7f0002ec.000000e3.000005d1
% repair
The repair tool can be used to manually repair files and directories 
that have diverging replicas.  You will first need to do a "beginRepair" 
which will expose the replicas of the inconsistent object as its children.


If you are repairing a directory, you will probably use the "compareDir" and "doRepair" commands.

For inconsistent files you will only need to use the "doRepair" command.

If you want to REMOVE an inconsistent object, use the "removeInc" command.

Help on individual commands can also be obtained using the "help" facility.
* begin conflict
a server-server-conflict repair session started
use the following commands to repair the conflict:
	comparedirs
	removeinc
	dorepair
* ^Z
Stopped
% ls conflict
gershwin.coda.cs.cmu.edu	schumann.coda.cs.cmu.edu
% ls conflict/*
conflict/gershwin.coda.cs.cmu.edu:
example

conflict/schumann.coda.cs.cmu.edu:
example
% fg
repair
compare
Pathname of Object in conflict? [conflict]
Pathname of repair file produced?  [] /tmp/fix

 
NAME/NAME CONFLICT EXISTS FOR example

-rw-r--r--  1 raiff           0 Dec 20 13:10 gershwin.coda.cs.cmu.edu/example
-rw-r--r--  1 -101            0 Dec 20 13:11 schumann.coda.cs.cmu.edu/example


/coda/project/coda/demo/basic/rep/conflict/gershwin.coda.cs.cmu.edu/example
	Fid: (0xb0.612) VV:(0 2 0 0 0 0 0 0)(0x8002f23e.30c6e9aa)
/coda/project/coda/demo/basic/rep/conflict/schumann.coda.cs.cmu.edu/example
	Fid: (0x9e.5ea) VV:(2 0 0 0 0 0 0 0)(0x8002ce17.30d56fb9)
Should /coda/project/coda/demo/basic/rep/conflict/gershwin.coda.cs.cmu.edu/example be removed?  [no] yes
Should /coda/project/coda/demo/basic/rep/conflict/schumann.coda.cs.cmu.edu/example be removed?   [no]  
Do you want to repair the name/name conflicts  [yes]  
Operations to resolve conflicts are in /tmp/fix
* do
Pathname of object in conflict?  [conflict]  
Pathname of fix file? [/tmp/fix]
OK to repair "conflict" by fixfile "/tmp/fix"?  [no] yes
SCHUMANN.CODA.CS.CMU.EDU  succeeded
GERSHWIN.CODA.CS.CMU.EDU  succeeded
* quit
% ls conflict
example
% exit
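
The VV fields printed by the compare step above differ in a way that suggests why the two replicas could not be resolved automatically. The sketch below assumes that each VV is a vector of per-replica update counters compared by the usual version-vector dominance rule; this section does not state that rule, and the function name compare_vv is hypothetical.

```python
def compare_vv(a, b):
    """Compare two equal-length version vectors.

    Returns "equal", "dominates" (a has every update b has, plus more),
    "dominated" (the reverse), or "conflict" (each side has an update
    the other lacks).
    """
    if len(a) != len(b):
        raise ValueError("version vectors must have the same length")
    a_ahead = any(x > y for x, y in zip(a, b))
    b_ahead = any(y > x for x, y in zip(a, b))
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "dominates"
    if b_ahead:
        return "dominated"
    return "equal"


# Values copied from the compare output in the session above.
gershwin = (0, 2, 0, 0, 0, 0, 0, 0)
schumann = (2, 0, 0, 0, 0, 0, 0, 0)

if __name__ == "__main__":
    print(compare_vv(gershwin, schumann))
```

Under this assumption neither vector dominates the other, so there is no copy that is simply newer, and the choice of which example to keep has to be made by hand.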

Local/Global Conflicts

Local/global conflicts are caused by reintegration failures: the mutations performed on the client while it was disconnected conflict with mutations that other clients performed on the servers during the disconnection. Objects involved in a local/global conflict are represented in the same way as in server/server conflicts, i.e., they become dangling symbolic links.

To start a local/global repair session for an object OBJ, invoke the repair tool and issue the beginrepair command with the pathname of OBJ as the argument. Once the repair session is started, both the local and global replicas of OBJ are visible: OBJ/local is read-only, and OBJ/global is mutable and serves as the workspace for storing the repair result for OBJ and its descendants.

The central process of repairing the local/global conflicts on OBJ is to iterate over the local-mutations-list, which contains all the local updates performed on OBJ or its descendants and can be displayed with the listlocal command. Each operation in the list must be accounted for, and the repair tool cooperates with Venus to keep track of the current-mutation being iterated. The checklocal command shows the conflict information between the current-mutation and the global server state. You advance the iteration to the next operation with either the preservelocal or the discardlocal command; preservelocal replays the current-mutation on the relevant global replicas before advancing. You can also use the preservealllocal and discardalllocal commands to speed up the iteration.

Because the global replica of OBJ is mutable, existing tools such as emacs can be used directly to make the necessary updates. The quit command either commits or aborts the repair session. The man page for the repair commands contains more detailed information, and the following simple example illustrates the main process of repairing a local/global conflict.

Suppose that during disconnection, a user creates a new directory /coda/usr/luqi/papers/cscw/figs and stores a new version of the file /coda/usr/luqi/papers/cscw/paper.tex. During the same disconnection, however, a co-author also creates a directory /coda/usr/luqi/papers/cscw/figs and stores some PS files in it. Upon reintegration, a local/global conflict is detected at /coda/usr/luqi/papers/cscw.

% ls -l /coda/usr/luqi/papers/cscw
lr--r--r--  1 root           27 Dec 20 00:36 cscw -> @@7f000279.00000df3.0001f027
% repair
* begin
Pathname of object in conflict?  [] /coda/usr/luqi/papers/cscw
a local-global-conflict repair session started
the conflict is caused by a reintegration failure
use the following commands to repair the conflict:
        checklocal
        listlocal
        preservelocal
        preservealllocal
        discardlocal
        discardalllocal
        setglobalview
        setmixedview
        setlocalview
a list of local mutations is available in the .cml file in the coda spool directory

* ls -l /coda/usr/luqi/papers/cscw
total 4
drwxr-xr-x  3 luqi         2048 Dec 20 00:51 global
drwxr-xr-x  3 luqi         2048 Dec 20 00:51 local

* listlocal
local mutations are:

Mkdir   /coda/usr/luqi/papers/cscw/local/figs
Store   /coda/usr/luqi/papers/cscw/local/paper.tex (length = 19603)

* checklocal
local mutation: mkdir /coda/usr/luqi/papers/cscw/local/figs
conflict: target /coda/usr/luqi/papers/cscw/global/figs exist on servers

* discardlocal
discard local mutation mkdir /coda/usr/luqi/papers/cscw/local/figs

* checklocal
local mutation: store /coda/usr/luqi/papers/cscw/local/paper.tex
no conflict

* preservelocal
store /coda/usr/luqi/papers/cscw/global/paper.tex succeeded

* checklocal
all local mutations processed

* quit
commit the local/global repair session?  [yes]
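
The iteration in the session above can be modeled with a small sketch. This is a toy model, not the repair tool or the Venus interface: the class RepairSession, its fields, and the rule that a mkdir conflicts when the name already exists globally are all assumptions made for illustration. Only the command names and the order of steps come from the session.

```python
from dataclasses import dataclass, field


@dataclass
class RepairSession:
    """Toy model of iterating the local-mutations-list (hypothetical)."""
    mutations: list          # pending local mutations as (operation, name)
    global_names: set        # names already present in the global replica
    position: int = 0        # index of the current-mutation
    log: list = field(default_factory=list)

    def _current(self):
        if self.position >= len(self.mutations):
            raise RuntimeError("all local mutations processed")
        return self.mutations[self.position]

    def checklocal(self):
        """Report whether the current-mutation conflicts with global state."""
        if self.position >= len(self.mutations):
            return "all local mutations processed"
        op, name = self._current()
        # Assumed rule: creating a directory whose name already exists
        # in the global replica is a conflict.
        if op == "mkdir" and name in self.global_names:
            return "conflict"
        return "no conflict"

    def preservelocal(self):
        """Replay the current-mutation on the global replica and advance."""
        op, name = self._current()
        self.global_names.add(name)
        self.log.append(("preserved", op, name))
        self.position += 1

    def discardlocal(self):
        """Drop the current-mutation without replaying it and advance."""
        op, name = self._current()
        self.log.append(("discarded", op, name))
        self.position += 1


if __name__ == "__main__":
    session = RepairSession(
        mutations=[("mkdir", "figs"), ("store", "paper.tex")],
        global_names={"figs", "paper.tex"},
    )
    print(session.checklocal())   # mkdir figs
    session.discardlocal()
    print(session.checklocal())   # store paper.tex
    session.preservelocal()
    print(session.checklocal())   # nothing left
```

Every mutation ends up in the log as either preserved or discarded, which mirrors the requirement that each operation in the list be accounted for before the session is committed.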