Server Problems

1. The server crashed and prints messages about AllocViaWrapAround.
2. Server doesn't start due to salvaging problems
3. How to restore a backup from tape
4. createvol_rep reports RPC2_NOBINDING.
5. RPC2_DUPLICATESERVER in the rpc2portmap/auth2 logs
6. Server crashed shortly after updating files in /vice/db.
7. Users cannot authenticate or created volumes are not mountable.

1. The server crashed and prints messages about AllocViaWrapAround.

This happens when a resolution log is full. The SrvLog will usually show which volume is affected; note its volume id (you may need to consult /vice/vol/VRList on the SCM to do this). Kill the dead (zombied) server and restart it. As soon as it is up, run:
# filcon isolate -s this_server
We need to prevent clients from overwriting the log again
# volutil setlogparms volid reson 4 logsize 16384
# filcon clear -s this_server
Unless you do "huge" things 16k will be plenty.
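
The three commands above can be wrapped in a small helper so that the server name and volume id are typed only once. This is a minimal sketch, not part of Coda: the function name reslog_fix and the RUN variable are hypothetical. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# reslog_fix SERVER VOLID [LOGSIZE]
# Hypothetical helper: prints the recovery commands listed above.
# With RUN=1 in the environment it executes them instead of printing.
reslog_fix() {
    server=$1
    volid=$2
    logsize=${3:-16384}   # 16k entries, the value suggested in the text
    for cmd in \
        "filcon isolate -s $server" \
        "volutil setlogparms $volid reson 4 logsize $logsize" \
        "filcon clear -s $server"
    do
        if [ "${RUN:-0}" = 1 ]; then
            # Stops at the first failure; if that happens after the isolate
            # step, remember to run "filcon clear" by hand.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}

# Example (dry run): reslog_fix this_server 0xdd000123
```

Printing first and executing second is deliberate: the window between restart and isolation is short, and a mistyped volume id is easier to spot in a dry run.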

2. Server doesn't start due to salvaging problems

If this happens you have several options. If the server crashed during salvaging, it will not come up simply by trying again; you must either repair the damaged volume or not attach that volume.

Not attaching the volume is done as follows. Find the volume id of the damaged volume in the SrvLog. Create a file named /vice/vol/skipsalvage with the lines:

1		(1)
0xdd000123	(2)
(1) indicates that a single volume is to be skipped.
(2) is the volume id of the replica that should not be attached. If this volume is a replicated volume, take all replicas offline, since otherwise the clients will get very confused.
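
A skipsalvage file in this layout can be generated rather than typed. The sketch below uses a hypothetical helper name, write_skipsalvage, and assumes that when several volumes are skipped the ids follow the count one per line; the text above only shows the single-volume case.

```shell
#!/bin/sh
# write_skipsalvage FILE VOLID...
# Hypothetical helper: writes the number of volumes to skip on the first
# line, then one volume id per line (multi-volume layout is an assumption).
write_skipsalvage() {
    out=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: write_skipsalvage FILE VOLID..." >&2
        return 1
    fi
    {
        echo "$#"
        for id in "$@"; do
            echo "$id"
        done
    } > "$out"
}

# Example: write_skipsalvage /vice/vol/skipsalvage 0xdd000123
```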

You can also try to repair the volume with norton. Norton is invoked as:

norton [LOG] [DATA] [DATA-SIZE]

These parameters can be found in /vice/srv.conf. See norton(8) for detailed information about norton's operation. Built-in help is also available while running norton.
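
To avoid copying the three parameters by hand, they can be pulled out of srv.conf. This sketch assumes the file carries them in the form "-rvm LOG DATA DATA-SIZE"; check your own srv.conf before relying on it. The helper name norton_cmd is hypothetical, and it only prints the command line rather than running norton.

```shell
#!/bin/sh
# norton_cmd [CONF]
# Hypothetical helper: prints a norton command line built from CONF.
# Assumed layout (verify locally): "... -rvm LOG DATA DATA-SIZE ..."
norton_cmd() {
    conf=${1:-/vice/srv.conf}
    awk '
        {
            for (i = 1; i + 3 <= NF; i++)
                if ($i == "-rvm") {
                    print "norton", $(i + 1), $(i + 2), $(i + 3)
                    found = 1
                    exit
                }
        }
        END { exit !found }   # non-zero status if no -rvm entry was seen
    ' "$conf"
}

# Example: norton_cmd /vice/srv.conf
```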

Note:

  1. Often corruption is replicated. This means that if you find a server has crashed and does not want to salvage a volume, your other replicas may suffer the same fate: the risk is that you may have to go back to tape (you do make tapes, right?). Therefore first copy out good data from the available replicas, then attend to repairing or skipping them in salvage.

  2. Very often you have to take both a volume and its most recent clone (generated during backup) offline, since corruption in a volume is inherited by the clone.

3. How to restore a backup from tape

On Tuesday I lost my email folder: the whole volume moose:braam.life was corrupted on server moose and it wouldn't salvage. Here is how I got it back.

  1. First I tried mounting moose.braam.life.0.backup but this was corrupted too.

  2. On the SCM in /vice/vol/VRList I found the replicated volume number f0000427 and the volume number ce000011 (fictitious) for the volume.

  3. I logged in as root to bison, our backup controller. I read the backup log for Tuesday morning in /vice/backuplogs/backuplog.DATE and saw that the incremental dump for August 31st had been fine. At the end of that log, I saw the name f0000427.ce000011 listed as dumped under /backup (a mere symlink), with /backup2 as the spool directory holding the actual file. The backup log shows almost exactly how to position the tape and invoke restore:
    # cd /backup2
    # mt -f /dev/nst0 rewind
    # restore -b 500 -f /dev/nst0 -s 3 -i
    The value after -s depends on which /backup[123] volume we are restoring from.
    restore> cd 31Aug1998
    restore> add viotti.coda.cs.cmu.edu-f0000427.ce000011
    restore> extract
    Specify volume #: 1

  4. In /vice/db/dumplist I saw that the last full backup had been on Friday, Aug 28. I went to the machine room and inserted that tape (recent tapes are above bison). This time f0000427.ce000011 was a 200MB file (the last full dump) in /backup3. I extracted the file as above.

  5. Then I merged the two dumps:
    # merge /restore/peter.mail /backup2/28Aug1998/f0000427.ce000011 \
    > /backup3/31Aug1998/f0000427.ce000011

  6. This took a minute or two to create /restore/peter.mail. Now all that was needed was to upload that to a volume:
    # volutil -h moose restore /restore/peter.mail /vicepa vio:braam.mail.restored

  7. Back to the SCM, to update the volume databases:
    # bldvldb.sh viotti

  8. Now I could mount the restored volume:
    # cfs mkm restored-mail vio:braam.mail.restored
    and copy it into a read-write volume using cpio or tar.
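
The VRList lookup in step 2 can be scripted. The sketch below assumes each VRList line has the form "name replicated-id count replica-id ..." with the replica ids starting in the fourth field; that layout is an assumption, so compare it with your own /vice/vol/VRList. The helper name vrlist_lookup is hypothetical.

```shell
#!/bin/sh
# vrlist_lookup VOLUME-NAME [VRLIST]
# Hypothetical helper: prints the replicated volume id followed by the
# replica ids for VOLUME-NAME.
# Assumed line layout: NAME REPLICATED-ID COUNT REPLICA-ID...
vrlist_lookup() {
    name=$1
    vrlist=${2:-/vice/vol/VRList}
    awk -v name="$name" '
        $1 == name {
            printf "%s", $2
            for (i = 0; i < $3; i++)
                printf " %s", $(4 + i)
            print ""
            found = 1
        }
        END { exit !found }   # non-zero status if the volume is not listed
    ' "$vrlist"
}

# Example: vrlist_lookup vio:braam.mail
```

The two ids printed for a singly replicated volume are the pair that appears in dump file names such as f0000427.ce000011.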

4. createvol_rep reports RPC2_NOBINDING.

If createvol_rep reports RPC2_NOBINDING when you try to create volumes, the server is not (yet) accepting connections.

It is useful to look at /vice/srv/SrvLog: the server performs the equivalent of fsck on startup, which might take some time. The server starts accepting incoming connections only once it has logged Fileserver Started in SrvLog.

Another reason could be that an old server is still around, blocking the new server from accessing the network ports.
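
Instead of rereading SrvLog by hand, a script can wait for the Fileserver Started message before running createvol_rep. This is a minimal sketch with a hypothetical helper name, wait_for_server. It assumes the log file does not still contain a Fileserver Started line from an earlier run; if it might, check the timestamp yourself.

```shell
#!/bin/sh
# wait_for_server [LOGFILE] [TIMEOUT-SECONDS]
# Hypothetical helper: polls LOGFILE once per second until it contains
# "Fileserver Started"; returns non-zero if TIMEOUT-SECONDS pass first.
wait_for_server() {
    log=${1:-/vice/srv/SrvLog}
    timeout=${2:-300}
    waited=0
    until grep -q "Fileserver Started" "$log" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example: wait_for_server /vice/srv/SrvLog 600 && echo "server is up"
```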

5. RPC2_DUPLICATESERVER in the rpc2portmap/auth2 logs

Some process has the UDP port open which rpc2portmap or auth2 is trying to obtain. In most cases this is an already running copy of rpc2portmap or auth2. Kill all running copies of the program in question and restart them.
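
A quick way to spot a stale copy is to count the processes with the daemon's name. The sketch below reads a list of command names on standard input, so it works with whatever ps invocation your system provides; count_named is a hypothetical helper name.

```shell
#!/bin/sh
# count_named NAME
# Hypothetical helper: reads one command name (or path) per line on stdin
# and prints how many of them are NAME.
count_named() {
    awk -v n="$1" '
        { sub(".*/", "", $1) }   # strip a leading directory, if any
        $1 == n { c++ }
        END { print c + 0 }
    '
}

# Example (ps options vary between systems):
#   ps -e -o comm= | count_named rpc2portmap
# A result greater than 1 means more than one copy is running.
```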

6. Server crashed shortly after updating files in /vice/db.

Servers can crash when they are given inconsistent or bad data files. Check whether updateclnt and updatesrv are both running on the SCM and on the machine that crashed; if in doubt, kill and restart them. Then restart codasrv, and it should come up.

7. Users cannot authenticate or created volumes are not mountable.

Check whether auth2, updateclnt, and updatesrv are running on all fileservers. Also check their logfiles for possible errors.
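
This check is easy to repeat on every fileserver with a small filter. The sketch below reads a process name list on standard input and prints the daemons that are absent; missing_daemons is a hypothetical helper name, and it assumes your ps prints bare command names, one per line.

```shell
#!/bin/sh
# missing_daemons NAME...
# Hypothetical helper: reads one command name per line on stdin and prints
# each NAME that does not appear in that list.
missing_daemons() {
    procs=$(cat)
    for d in "$@"; do
        printf '%s\n' "$procs" | grep -qx "$d" || echo "$d"
    done
}

# Example (ps options vary between systems):
#   ps -e -o comm= | missing_daemons auth2 updateclnt updatesrv
# No output means all three are running on this machine.
```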