If you have any comments or suggestions just let me know:
XFS is a journalling filesystem developed by SGI and used in SGI's IRIX operating system. It is now also available under the GPL for Linux. It is extremely scalable, using btrees extensively to support large and/or sparse files and extremely large directories. The journalling capability means no more waiting for fscks or worrying about meta-data corruption.
You could also join the linux-xfs mailing list, and we also have an IRC channel on irc.openprojects.net, #xfs.
After that you have two subtrees of importance: linux and cmd. The
first one - linux - is a normal Linux kernel source tree containing
the XFS code. It is updated to the latest available Linux kernel,
although it may lag a bit (if not much) behind the official release. Just build
your kernel the way you are used to doing it and don't forget to enable
XFS and pagebuf under filesystems.
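For example, the usual 2.4-era build cycle might look like this (a minimal sketch - adjust to your own configuration and bootloader setup):

    cd linux                   # the kernel tree from CVS
    make menuconfig            # enable XFS and pagebuf under "File systems"
    make dep bzImage modules
    make modules_install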
The second one - cmd - contains the userspace tools, in the cmd/xfsprogs, cmd/attr, ... directories. The tools also have man pages which you may consult for interesting options.
There is also another way to get an XFS-ready kernel - you may get kernel patches relative to an official kernel from:
and apply them to the kernel sources the patch is for. This is a good way for all the people who don't want to use CVS or do not have the bandwidth to check out the whole kernel tree.
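Applying such a patch might look like this (a hedged sketch - the tree and patch file names are just placeholders):

    cd /usr/src/linux-2.4.x          # the vanilla tree the patch was made for
    patch -p1 < /path/to/xfs-patch
    # then configure and build the kernel as described above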
A third way to get an XFS-ready system and kernel is to use the prepared RPMs and Red Hat Linux installer ISO images from SGI, which you can find in the download area.
where /dev/foo is the partition you want to use. You may have to use the -f option of mkfs -t xfs (which calls mkfs.xfs from the cmd directory, which you must have installed) if this partition already contains an old filesystem which you want to overwrite. Now you can mount the filesystem.
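Putting it together (a minimal sketch - /dev/foo and /mnt are placeholders):

    mkfs -t xfs /dev/foo         # add -f to overwrite an existing filesystem
    mount -t xfs /dev/foo /mnt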
The current XFS tree seems to work just fine on ppc now (aside from some trivial compile fixes). It also runs well and gets sporadically tested on alpha, sparc64 and ia64, although on those platforms it is not as well tested as on i386; so far no major problems are known on any of them. All in all it looks like XFS will soon be running fine across a lot of platforms (the platforms above cover 32/64-bit and little/big-endian architectures). If you run it on a platform not mentioned here, please let me know so that I can add it. An important note is that XFS is inherently platform independent in its on-disk layout - so it should be possible to move an XFS disk from one Linux platform to another out of the box.
To use quotas with XFS, you need to enable linux quota support, and
XFS quota support when you configure your kernel. You also need to specify
quota support when you mount.
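For example (a hedged sketch - see Documentation/filesystems/xfs.txt in your kernel sources for the exact option names; /dev/foo and /mnt are placeholders):

    mount -t xfs -o quota /dev/foo /mnt           # user quotas
    mount -t xfs -o quota,gquota /dev/foo /mnt    # user and group quotas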
The tape format is not the same as the classic Unix dump but should work fine with tools like Amanda. Dumps produced with other standard dump programs should be restorable onto an XFS filesystem using the corresponding restore program.
This depends on where you install LILO. For MBR installations: yes. For root partitions: no, because the XFS superblock goes where LILO would be installed. This is to maintain compatibility with the IRIX on-disk format and will not be changed. Installing LILO on the swap partition is reported to work but is not guaranteed.
Yes, XFS should run fine on top of LVM. If you plan to do so, please keep in mind that the 1.0 and 1.0.1 release XFS trees (and also the XFS 1.0 previews) contain LVM 0.9beta6. This has recently changed to 1.0.1rc4 in CVS; this code has some tweaks for XFS. The 1.0.2 installer ships with 1.0.1rc4 as well. Snapshotting should work; please report problems to the mailing list.
Yes. If you are using a 2.4.2 based XFS kernel you need to apply Jens Axboe's loop-xfs-7c fix for it to work (the fix is for a problem in 2.4.2 and has nothing really to do with XFS). You may get this patch from
You might also have a look at the Linuxcare Bootable Toolbox which also supports XFS starting from version 2.0
mount: wrong fs type, bad option, bad superblock on /dev/rd/c0d0p1,
       or too many mounted file systems

and from /var/log/messages:

This means that you cannot mount the filesystem due to corruption. You will need to run xfs_repair and hope it can be repaired. If you hit this you have serious problems. It can be anything from disks failing in mysterious ways, software RAID gone mad, or corruption through bad cables/drivers/DMA etc.
If the mount hangs you can use xfs_repair -L to zero the journal to let the system mount the disk again. To date this has only been observed on the /var filesystem. We do not know yet what is causing these hangs (01-06-2002). Please contact the mailing list when you observe this failure. It is a very rare problem, which makes it hard to debug.
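A minimal sketch of this last resort (/dev/foo is a placeholder; note that zeroing the log may lose the most recent metadata changes):

    umount /dev/foo              # the filesystem must not be mounted
    xfs_repair -L /dev/foo       # zero the log, then repair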
Yes - the current XFS tree contains everything you need to run XFS on top of any md RAID level. Note that write performance using XFS on top of software RAID level 5 is bad with anything lower than 2.4.18; using an external log device returns the performance to normal. You could solve this by making a separate md RAID 1 of about 50 MB on the disks and using the rest of the space for the RAID 5 volume. In this scenario you will have normal performance. In kernels >= 2.4.18 there are fixes which help performance a lot; using an external log is still faster, but the penalty is smaller.
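A hedged sketch of such a setup, assuming /dev/md0 is the small RAID 1 log device and /dev/md1 the RAID 5 data volume:

    mkfs -t xfs -l logdev=/dev/md0 /dev/md1
    mount -t xfs -o logdev=/dev/md0 /dev/md1 /mnt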
Yes. To get good performance, make sure to use an XFS tree from after
mid-March 2001 - there were some important fixes for usable NFS performance.
So far there are no more known problems with XFS and NFS since then.
An XFS filesystem may be enlarged within a partition using
xfs_growfs. You need to have free space after the partition to do so:
remove the partition, recreate it larger with the exact same starting point,
then run xfs_growfs to make the filesystem fill the larger partition. This operation is dangerous to your data -
back up your filesystem before using this tool.
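A rough sketch of the procedure (/dev/foo and /mnt are placeholders; again, back up first):

    # 1. delete the partition in fdisk and recreate it larger,
    #    keeping the exact same starting point
    mount -t xfs /dev/foo /mnt
    xfs_growfs /mnt              # grow the filesystem to fill the partition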
Yes. The on-disk format of XFS is the same on IRIX and Linux. Obviously, you should back up your data before trying to move it between systems. Filesystems must be "clean" when moved (ie unmounted correctly). If you plan to use IRIX disks on Linux, keep the following things in mind: the kernel needs to have SGI partition support enabled (found in the File systems -> Partition Types submenu of a "make menuconfig"); there is no XLV support in Linux, so you won't be able to read IRIX disks using the XLV volume manager; and not all blocksizes available on IRIX are available on Linux (only blocksize <= pagesize of the architecture - 4k for i386, ppc, ..., 8k for alpha, sparc, ... - is possible for now). Also make sure that the directory format is version 2 on the IRIX disks; Linux can only read v2 directories at the moment, and using v1 will probably fail in spectacular ways. Support for blocksize > pagesize requires rework of the Linux kernel; this might come somewhere in 2.5.x.
Yes. So far there were some problems reported with kernels built with
gcc 2.95.2, which were solved by compiling with egcs 2.91.66.
If you are using gcc 3.0 and it causes problems or does not compile, drop a note on
the list with the oops and ksymoops output. We will be working on getting XFS fully
functional with gcc 3.0. gcc 3.0.1 seems to produce correct kernels as well.
Some of the others are also interesting tools: db - xfs_db is an XFS filesystem debugger (working); copy - xfs_copy is a tool for efficiently copying one filesystem to another device (not yet ported, volunteer wanted); fsr - xfs_fsr is a defragmenter for XFS (working); repair - xfs_repair is the consistency checker for an XFS filesystem (working). As already mentioned earlier, the cmd structure changed a bit at the beginning of 2001 - now it all looks a bit clearer, I think (modeled a bit after the ext2 tools structure). The only other subdir is xfstests - the SGI XFS stress test suite.
You cannot run xfs_repair on a mounted filesystem, although support is available in CVS (08-02-2002) that lets you
run xfs_repair with the -n switch on a read-only mounted filesystem. You must not try to repair a mounted fs, since
attempting this will result in data loss and corruption.
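A minimal sketch of both variants (/dev/foo and /mnt are placeholders):

    umount /dev/foo              # the normal, safe way
    xfs_repair /dev/foo

    mount -o ro /dev/foo /mnt    # with the CVS code mentioned above
    xfs_repair -n /dev/foo       # -n checks only, modifies nothing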
This creates a bigger log space. You currently cannot resize the log with xfs_growfs; you will need to remake the filesystem to change this. Also, it is a good idea to mount meta-data intensive filesystems with a higher logbufs setting, e.g. logbufs=8.
Note that in some earlier versions you may only use 4 logbufs as a maximum. Using more logbufs can fail if your system does not have enough RAM - using 8 logbufs on a machine with 128MB RAM will probably fail. Also, since mid-March there are some changes in the tree which improve the overall dbench performance a bit. Have a look at the xfs.txt file in the Documentation/filesystems subdirectory of your kernel sources for these and other XFS mount options.
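For example (a hedged sketch - the log size is arbitrary and /dev/foo is a placeholder):

    mkfs -t xfs -l size=32768b /dev/foo       # create a larger log
    mount -t xfs -o logbufs=8 /dev/foo /mnt   # use more in-memory log buffers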
In general you should get about the same or better performance values with XFS in various benchmarks. One thing XFS is usually bad at is removing large numbers of files (rm -rf or bonnie++). Kernels >= 2.4.18 have an asynchronous delete patch which speeds up large deletes.
This is fixed in the current code (mid-March 2001). Even before that fix this was normal and harmless - it just took a bit of time (on boot and halt with SuSE startup scripts; only on halt with Red Hat based scripts).
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 11 for device 00:0c.0
scsi0: Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13

The Adaptec driver in 2.4.5 and later needs to have the following selected in order to work.
gcc -I/usr/include -ldb1 aicasm_gram.c aicasm_scan.c aicasm.c aicasm_symbol.c -o aicasm
aicasm_symbol.c:39: db1/db.h: No such file or directory
make: *** [aicasm] Error 1

The Adaptec driver in 2.4.2 and later kernels needs the db headers; these can be found in the db-devel packages. Make sure to run make mrproper after selecting this option. If you also copied your own .config into the tree, make sure to run make oldconfig afterwards.
Yes, XFS supports files larger than 2GB. Large file support (LFS) is
largely dependent on the C library of your system. Glibc 2.2 and higher has
full LFS support. If your C library does not support it you will get errors that
the value is too large for the defined data type.
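On glibc 2.2 an application gets a 64-bit off_t by defining the usual LFS macros at compile time - a hedged example (myprog.c is a placeholder):

    gcc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -o myprog myprog.c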
I would challenge any filesystem running on Linux on ia32 and using the page cache to get past the practical limit of 16 Tbytes using buffered I/O. At this point you run out of space to address pages in the cache, since the core kernel code uses a 32 bit number as the index of a page in the cache. As for XFS itself, this is a constant definition from the code:

#define XFS_MAX_FILE_OFFSET ((long long)((1ULL<<63)-1ULL))

So 2^63 bytes is theoretically possible. All of this is ignoring the current limitation of 2 Tbytes of address space for block devices (including logical volumes); the only way to get a file bigger than that is of course to have large holes in it, and to get past 16 Tbytes you have to use direct I/O. That would mean a theoretical file size of 8388608 TB. Large enough?
At least one user has experienced problems when resizing his LVM volume multiple times. The workaround is to run xfs_repair after each xfs_growfs. SGI has not been able to replicate this at the moment.
No, there is no relation. Newer utilities only contain fixes and checks that previous versions
might not have. These are the same utilities that have been used under IRIX for years, so they
are well developed.
XFS adds a huge amount of kernel code, which means that your kernel is probably
too big to fit on a boot floppy together with an initial ramdisk and SCSI drivers.
There is a patch available for mkbootdisk to properly format a floppy with a larger size.
Your mileage may vary when booting overformatted floppies of 1.68 or 1.72MB.
The patch can be found at
http://iserv.nl/files/xfs/mkbootdisk.large.patch - use /dev/fd0u1680 for making
a 1.68MB floppy.
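Usage might then look like this (a hedged example - mkbootdisk takes the kernel version as its argument):

    mkbootdisk --device /dev/fd0u1680 $(uname -r)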
Kelly Eicher has made a boot floppy set available on his homepage http://www.astro.umn.edu/~carde These are very helpful and easy to use in migrating or repairing a system.
You might also have a look at the Linuxcare Bootable Toolbox which also supports XFS starting from version 2.0
You can back up an XFS filesystem with utilities like xfsdump, or with standard tar for standard files. If you want to back up ACLs you will need to use xfsdump; it is the only tool at the moment that supports backing up ACLs. Support for XFS ACLs is underway in several commercial backup tools. xfsdump can be made to work with Amanda.
It may even be included at some point in the 2.4 series when some 2.5 features
are backported. There are more core kernel changes for 2.4, so the
patch is a bit more intrusive.
Still, the easiest way to get the "latest and greatest" is by checking
out the XFS development tree via CVS.
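A sketch of the checkout, following the pattern of SGI's published CVS instructions (the password for the anonymous pserver account is "cvs"):

    export CVSROOT=':pserver:cvs@oss.sgi.com:/cvs'
    cvs login
    cvs -z3 checkout linux-2.4-xfs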
XFS supports locking, but VMware does not know about it. This has been reported to VMware, but the status is unknown at this point. If you follow the instructions that VMware gives when starting up, you should be fine.
which might help you find a good and easy way through the sources. I plan to keep this tree automatically updated to the current SGI XFS CVS version on a daily basis. If anyone has pointers to other XFS related docs - just send me a mail (address - see above).
So for those of you with money to spare:
The original debian boot disks by Zoltan Kraus are mirrored at http://www.physik.tu-cottbus.de/~george/woody_xfs/
Yes, the nVIDIA drivers work fine on XFS systems. Be sure to use the
1.0-1251 release or later of the nVIDIA linux drivers.
If you are using the 1.0 release of XFS I suggest disabling devfs. Devfs can be disabled by editing your lilo.conf and inserting the following line:

append="devfs=nomount"

This will prevent devfs from interfering. Other people suggest inserting the following magic in /etc/rc.d/rc.local (on Red Hat systems):
major=195
for i in 0 1 2 3; do
    devfile="/dev/nvidia$i"
    rm -f $devfile
    if ! mknod $devfile c $major $i || ! chmod 0666 $devfile; then
        echo "Couldn't create device \"$devfile\"."
        exit 1
    fi
done
devfile=/dev/nvidiactl
rm -f $devfile
mknod $devfile c $major 255
chmod 0666 $devfile

This will create the nvidia devices on each boot.
(Note that these are obsolete now that Slackware officially supports XFS.)
There are multiple Slackware boot disks available. The first is at
http://village.flashnet.it/users/fn048069/linux-xfs.html; the page explains what to do and what the disks
contain. The author of these disks can be contacted at the following address: email@example.com
Patches are available for each formal XFS release. For the latest, please see ftp://oss.sgi.com/projects/xfs/download/latest/kernel_patches/. If you don't see the kernel version you want here, you may be interested in the snapshot patches.
Snapshot patches for getting a kernel with XFS can be found on the FTP server in
Most patches here are for the Linus tree; patches for the -ac
(Alan Cox) series are not available. Alan Cox can sometimes produce more
than three kernels a day, which is a pace the SGI people cannot keep up with.
If you want to make unofficial patches available for the -ac series and
think you can keep up with the pace drop us a note on the list.
The patches you can find here are provided for recent Linus kernels, to
either seed a CVS tree (which makes it faster to create your own local CVS
tree) or to patch a Linus tree into an XFS-capable kernel tree.
These patches are generally released for each new kernel version. Read the README file in the above URL for more information.
Follow the link below to see which version is in the CVS tree; it points at the top-level Makefile of the current development CVS version.
At mount time, there are really three options which will make a difference:

o biosize - in the released tree the default is 16 (i.e. 64K), in the development tree the default is 12 (i.e. 4K). Making this larger may help some applications, it will hinder others.

o osyncisdsync - indicates that O_SYNC is treated as O_DSYNC, which is the behaviour ext2 gives you by default. Without this option, O_SYNC file I/O will sync more metadata for the file.

o logbufs=4 or logbufs=8 - this increases (from 2) the number of in-memory log buffers. This means you can have more active transactions at once and can still perform metadata changes while the log is being synced to disk. The flip side of this is that the amount of metadata changes which may be lost on a crash is greater.
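A hedged example combining these options (/dev/foo and /mnt are placeholders):

    mount -t xfs -o biosize=16,osyncisdsync,logbufs=8 /dev/foo /mnt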
In the original 1.0 release the default biosize was 16 (64K), which could result in a _really_ large rpm database when rebuilding it. This has been fixed in kernels after 2.4.4 and will be fixed in the 1.0.1+ release. Note that using compression may give problems.
I have been running such a setup for several months already. This only works because Steve Best added a little tweak upon my request, since XFS modifies some type declarations that JFS depends on. I maintain patches with XFS plus JFS on my ftp server: ftp://ftp.uni-duisburg.de/linux/filesys/ - or see http://oss.sgi.com/projects/xfs/mail_archive/0107/msg00025.html
You are doing nothing wrong. XFS uses an extra ioctl to set the block size of the device, and it is not implemented for this device. However, a recent review of the code seems to show that we do not actually need the ioctl anymore. Since the filesystem was made anyway, this is a message you should be able to ignore; try mounting the filesystem and see what happens.
Things to include are which version of XFS you are using and, if this is
a CVS version, of what date, plus the version of the kernel. If you have
problems with userland packages, please report the version of the
package you are using. This also applies to which distribution you are running.
The error 990 stands for EFSCORRUPTED, which usually means XFS has
detected a metadata problem on the disk and has shut the filesystem down.
XFS will slow down doing allocations when it is really full - and you are nowhere near full until 99.x%. Basically XFS chops the filesystem into allocation groups (1 to 4 Gbytes each); free space is managed independently in each of these. The slowdown happens when you have to scan through lots of allocation groups looking for space to extend a file. There is an in-memory summary structure which tells you if it is worth even looking in an allocation group, so it is not a major slowdown - unless you have lots of parallel allocation calls going on at the same time.
There is no undelete in XFS; in fact, once you delete something, the chances are that the space it used to occupy is the first thing reused. Undelete is really something you have to design in from the start. Getting anything back after an accidental rm -rf is next to impossible.
xfs_force_shutdown(ide0(3,8),0x1) called from line 4069 of file xfs_bmap.c.  Return address = 0xc017fbcb
I/O Error Detected.  Shutting down filesystem: ide0(3,8)
Please umount the filesystem, and rectify the problem(s)

This error is common when XFS runs into an I/O error. It can be caused by either hardware or software failure; the messages say more specifically whether the corruption happened in memory or while writing to disk. The shutdown is there to protect your data. Most of the time people have run into a bad cluster on the disk.
This is not always fatal, since in some cases newer IDE drives will map a bad cluster out to a spare one. This is done until all spare clusters inside the disk are gone, after which the drive will produce errors. Most SCSI drives have had this feature for a long time.
Note: if you have S.M.A.R.T. on your IDE disk and controller, you can be notified when a drive is going bad. This does not always work right, since some disks will only start reporting errors when all the spare clusters are gone, while others start barking loudly and give warnings as soon as they start mapping out bad clusters. Each disk manufacturer's drives behave differently.
What can also happen is that a bad cluster is detected but is not restored until the first power cycle/reboot. This is something that has been observed but should not happen; it is a very rare case. Maybe the folks at linux-ide.org can tell you more.
If you have a SCSI system this will probably mean that a disk has gone bad. RAID systems should not see this error unless _very_ weird things happen or your driver is b0rken. If you are using software RAID or LVM this can sometimes be a software problem; this has been observed once up to now on a software RAID device. If you can replicate this error, please report the problem on the list with as much related info as you can. If it produces an Oops, please include the ksymoops output.
* NOTE: XFS 1.1 and kernels >= 2.4.18 have the asynchronous delete path, which
means that you will see a lot fewer of these problems. If you still have not updated
to the 1.1 release or later, now would be a good time!
The same applies to other metadata-only journaling filesystems. The current Linux kernel VM will write out the metadata after 1/60th of a second and the data after 30 seconds, so the possibility of losing data when pulling the power within 30 seconds is quite large. The only way to be sure that your data gets to the disk is to use fsync in the program or sync after closing the program.
You can find the Partition Image tool at http://www.partimage.org/ which can
create disk images to help speed up cloning systems or making snapshot images for backup.
There is an installer ISO for Red Hat 7.3 available on the
SGI FTP site at
ftp://oss.sgi.com/projects/xfs/download/Release-1.1/installer/. If you
need an installer for Red Hat 7.0 or 7.1 for compatibility requirements, you can find it under one
of the testing directories or use an older release.
Christoph Hellwig maintains a status document of what is needed for merging XFS into 2.4, 2.4-ac and 2.5.
maintains a page listing the most recent checkins into the CVS tree
since the last release (starting from 27 August 2002).