Posts Tagged ‘xfs’

btrfs gets very slow, metadata almost full

Thursday, March 7th, 2013

This is one of our storage servers that has had problems in the past. Originally it seemed like XFS was having a problem with the large filesystem, so we gambled and decided to use btrfs. After eight days of running, the machine has become extremely slow for disk I/O, to the point where backups that should take minutes were taking hours.

Switching the disk scheduler from cfq to noop and then to deadline appeared to have only short-term benefits, at which point the machine bogged down again.
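For reference, switching schedulers doesn’t require a reboot; the array shows up as /dev/sda on this box, so a change like the following takes effect on the fly (the cat shows the active scheduler in brackets):

echo deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler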

We’re running an Adaptec 31205 with 11 Western Digital 2.0 terabyte drives in hardware RAID 5, giving roughly 19 terabytes accessible on our filesystem. During the first few days of backups, we would easily hit 800mb/sec inbound, but after a few machines had been backed up to the server, 100mb/sec was optimistic, with 20-40mb/sec being more normal. We originally attributed this to rsync transferring thousands of smaller files rather than the large files moved from some of the earlier machines. Once we started overlapping machines to take their second generational backup, the problem was much more evident.

The Filesystem:

# df -h /colobk1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8        19T  8.6T  9.6T  48% /colobk1

# btrfs fi show
Label: none  uuid: 3cd405c7-5d7d-42bd-a630-86ec3ca452d7
	Total devices 1 FS bytes used 8.44TB
	devid    1 size 18.14TB used 8.55TB path /dev/sda8

Btrfs Btrfs v0.19

# btrfs filesystem df /colobk1
Data: total=8.34TB, used=8.34TB
System, DUP: total=8.00MB, used=940.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=106.25GB, used=104.91GB
Metadata: total=8.00MB, used=0.00

The machine:

# uname -a
Linux st1 3.8.0 #1 SMP Tue Feb 19 16:09:18 EST 2013 x86_64 GNU/Linux

# btrfs --version
Btrfs Btrfs v0.19

As it stands, we appear to be running out of metadata space. Since our used metadata (104.91GB of the 106.25GB allocated) is well over 75% of the total, updates are taking forever. The initial filesystem was not created with any special inode or leaf parameters, so it is using the defaults.

The btrfs wiki points to this particular tuning option, which seems like it might do the trick. Since you can run the balance while the filesystem is in use and check its status, we should be able to see whether it is making a difference.
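Newer btrfs-progs can also report on a balance while it runs; the exact subcommand may differ on this older v0.19 toolchain, but something along these lines should show progress:

btrfs balance status /colobk1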

I don’t believe it is going to make a difference, as we have only a single device exposed to btrfs, but here’s the command we’re told to use:

btrfs fi balance start -dusage=5 /colobk1

After a while, the box returned with:

# btrfs fi balance start -dusage=5 /colobk1
Done, had to relocate 0 out of 8712 chunks

# btrfs fi df /colobk1
Data: total=8.34TB, used=8.34TB
System, DUP: total=8.00MB, used=940.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=107.25GB, used=104.95GB
Metadata: total=8.00MB, used=0.00

So it added 1GB to the metadata allocation. At first glance, it is still taking considerable time to back up a single 9.7GB machine – over 2 hours and 8 minutes when the first backup took under 50 minutes. I would say the balance didn’t do anything positive, as we have a single device. I suspect that the leafsize and nodesize might be the difference here – which would require reformatting and copying the 8.6 terabytes of data back again. It took two and a half minutes to unmount the partition after it had bogged down and after running the balance.

mkfs -t btrfs -l 32768 -n 32768 /dev/sda8

# btrfs fi df /colobk1
Data: total=8.00MB, used=0.00
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=192.00KB
Metadata: total=8.00MB, used=0.00

# df -h /colobk1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8        19T   72M   19T   1% /colobk1

XFS took 52 minutes to back up the machine. XFS properly tuned took 51 minutes. Btrfs with the larger leaf and node size took 51 minutes. I suspect I need to run things for a week to get the extents close to filled again and then check it once more. In any case, it is a lot faster than it was with the default settings.

* Official btrfs wiki

Ext4, XFS and Btrfs benchmark redux

Tuesday, May 22nd, 2012

Since Linux 3.4 was just released and includes a number of btrfs filesystem changes, I felt it was worth retesting to see whether btrfs performance had improved.

$ /usr/sbin/bonnie++ -s 8g -n 512

ext4

mkfs -t ext4 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   582  98 59268   6 30754   3  3515  99 104817   4 306.1   1
Latency             15867us    1456ms     340ms    8997us   50112us     323ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 35092  55 520637  91  1054   1 35182  54 791080 100  1664   2
Latency              1232ms     541us   14112ms    1189ms      41us   11701ms
1.96,1.96,colo7,1,1337657098,8G,,582,98,59268,6,30754,3,3515,99,104817,4,306.1,1,512,,,,,35092,55,520637,91,1054,1,35182,54,791080,100,1664,2,15867us,1456ms,340ms,8997us,50112us,323ms,1232ms,541us,14112ms,1189ms,41us,11701ms

ext4 with tuning and mount options

mkfs -t ext4 /dev/sda9
tune2fs -o journal_data_writeback /dev/sda9
mount -o rw,noatime,data=writeback,barrier=0,nobh,commit=60 /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   587  97 64875   6 34046   4  3149  96 105157   4 317.2   4
Latency             13877us     562ms    1351ms   18692us   54835us     287ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 38127  59 525459  92  2118   2 37746  58 792967  99  1433   2
Latency               980ms     525us   14018ms    1056ms      46us   12355ms
1.96,1.96,colo7,1,1337661756,8G,,587,97,64875,6,34046,4,3149,96,105157,4,317.2,4,512,,,,,38127,59,525459,92,2118,2,37746,58,792967,99,1433,2,13877us,562ms,1351ms,18692us,54835us,287ms,980ms,525us,14018ms,1056ms,46us,12355ms

btrfs from ext4 partition

umount /mnt
fsck.ext3 -f /dev/sda9
btrfs-convert /dev/sda9
mount -t btrfs -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   462  98 62854   5 30782   4  3065  88 88883   8 313.1   7
Latency             63644us     272ms     206ms   38178us     241ms     409ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 36868  85 598431  98 26007  93 32002  73 756164  99 21975  84
Latency             15858us     427us    1003us     471us     157us    2161us
1.96,1.96,colo7,1,1337660385,8G,,462,98,62854,5,30782,4,3065,88,88883,8,313.1,7,512,,,,,36868,85,598431,98,26007,93,32002,73,756164,99,21975,84,63644us,272ms,206ms,38178us,241ms,409ms,15858us,427us,1003us,471us,157us,2161us

btrfs without conversion

mkfs -t btrfs /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   468  98 60274   4 29605   4  3629 100 89250   8 301.5   7
Latency             55633us     345ms     196ms    3767us     229ms    1119ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 26078  60 603783  99 26027  92 25617  58 754598  99 21935  84
Latency               452us     423us    1029us     426us      16us    2314us
1.96,1.96,colo7,1,1337661202,8G,,468,98,60274,4,29605,4,3629,100,89250,8,301.5,7,512,,,,,26078,60,603783,99,26027,92,25617,58,754598,99,21935,84,55633us,345ms,196ms,3767us,229ms,1119ms,452us,423us,1029us,426us,16us,2314us

xfs defaults

mkfs -t xfs /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G  1391  96 65559   5 31315   3  2984  99 103339   4 255.8   3
Latency              5625us   33224us     221ms   10524us     103ms     198ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512  7834  37 807425  99 14627  50  8612  41 790321 100  1169   4
Latency              1182ms     123us     837ms    2037ms      18us    7031ms
1.96,1.96,colo7,1,1337660479,8G,,1391,96,65559,5,31315,3,2984,99,103339,4,255.8,3,512,,,,,7834,37,807425,99,14627,50,8612,41,790321,100,1169,4,5625us,33224us,221ms,10524us,103ms,198ms,1182ms,123us,837ms,2037ms,18us,7031ms

xfs tuned

mkfs -t xfs -d agcount=32 -l size=64m /dev/sda9
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G  1413  96 64640   5 31226   3  2977  99 104762   4 246.8   3
Latency              5616us     370ms     235ms   10530us   62654us     206ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 14763  70 793694  98 23959  81 15104  72 790204  99  2290   8
Latency               482ms     118us     274ms     683ms      17us    5201ms
1.96,1.96,colo7,1,1337666959,8G,,1413,96,64640,5,31226,3,2977,99,104762,4,246.8,3,512,,,,,14763,70,793694,98,23959,81,15104,72,790204,99,2290,8,5616us,370ms,235ms,10530us,62654us,206ms,482ms,118us,274ms,683ms,17us,5201ms

btrfs with a snapshot

mkfs -t btrfs /dev/sda9
mount -o noatime,subvolid=0 /dev/sda9 /mnt
wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.4.tar.bz2
tar xjf linux-3.4.tar.bz2
btrfs subvolume snapshot /mnt/ /mnt/@_2012_05_22
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   469  98 58400   5 30092   4  2999  85 89761   8 321.1   3
Latency             17017us     267ms     240ms   22907us     300ms     359ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 31715  72 598360  98 25780  92 26411  59 756110  99 22058  84
Latency               102ms     424us     844us     472us      20us    2171us
1.96,1.96,colo7,1,1337664006,8G,,469,98,58400,5,30092,4,2999,85,89761,8,321.1,3,512,,,,,31715,72,598360,98,25780,92,26411,59,756110,99,22058,84,17017us,267ms,240ms,22907us,300ms,359ms,102ms,424us,844us,472us,20us,2171us

Deleted the kernel tree, left it in the snapshot, reran the test

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   469  98 63934   5 31244   4  3208  94 90227   8 296.3   7
Latency             17009us     282ms     217ms    3746us     224ms    1269ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 28758  66 596074  97 26185  93 25714  59 755464  99 21893  84
Latency             42108us     424us     993us     445us      17us    2245us
1.96,1.96,colo7,1,1337671128,8G,,469,98,63934,5,31244,4,3208,94,90227,8,296.3,7,512,,,,,28758,66,596074,97,26185,93,25714,59,755464,99,21893,84,17009us,282ms,217ms,3746us,224ms,1269ms,42108us,424us,993us,445us,17us,2245us

Updated results using some different parameters. Same hardware, same hard drive.

leaf and btree size of 16384

mkfs -t btrfs -l 16384 -n 16384 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   472  98 37514   2 14395   2  3135  89 80600   7 294.0   6
Latency             16820us     781ms     383ms   19736us     230ms     379ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 17447  46 621480  99 24345  94 14984  39 754999  99 19873  82
Latency               303us     494us     900us     412us     107us    3127us
1.96,1.96,colo7,1,1338411461,8G,,472,98,37514,2,14395,2,3135,89,80600,7,294.0,6,512,,,,,17447,46,621480,99,24345,94,14984,39,754999,99,19873,82,16820us,781ms,383ms,19736us,230ms,379ms,303us,494us,900us,412us,107us,3127us

leaf and btree size of 32768

mkfs -t btrfs -l 32768 -n 32768 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   468  97 26136   2 17256   2  3135  89 84450   7 306.5   7
Latency             43238us     923ms     330ms   12632us     367ms     986ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 17958  61 624570  99 19930  95 14506  50 753354  99 15976  80
Latency             15384us     514us     937us     431us     144us    4782us
1.96,1.96,colo7,1,1338409200,8G,,468,97,26136,2,17256,2,3135,89,84450,7,306.5,7,512,,,,,17958,61,624570,99,19930,95,14506,50,753354,99,15976,80,43238us,923ms,330ms,12632us,367ms,986ms,15384us,514us,937us,431us,144us,4782us

leaf and btree size of 65536

mkfs -t btrfs -l 65536 -n 65536 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   467  97 25097   2 17349   2  2845  87 86653   8 300.2   7
Latency             56046us     772ms     414ms    4101us     249ms     241ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 15372  68 626336  98 14723  96 13137  58 753463 100 11652  80
Latency             15825us     395us   77890us     428us      19us   15727us
1.96,1.96,colo7,1,1338410439,8G,,467,97,25097,2,17349,2,2845,87,86653,8,300.2,7,512,,,,,15372,68,626336,98,14723,96,13137,58,753463,100,11652,80,56046us,772ms,414ms,4101us,249ms,241ms,15825us,395us,77890us,428us,19us,15727us

Analysis

Last time I tested ext4, XFS and btrfs, deletions on btrfs really lagged behind. Now, btrfs looks quite a bit more robust. Additionally, there are better repair and recovery tools, which were basically missing before. btrfs no longer lags behind the way it used to, and while it is a little slower in some cases, it’s only by a few percent. It makes up for that with its random and sequential create and delete performance.

Rough analysis at this point – if you need a versioning filesystem and don’t mind being a bit on the bleeding edge, btrfs has made substantial strides.

Updated Analysis

For the hardware in question, it appears that the larger leaf and node sizes don’t benefit the Bonnie++ workload, but make sure you test with your own workload.
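If you want to repeat this against your own workload, a loop along these lines walks the same leaf/node sizes – the device and mount point match the test rig above, and the -u root flag assumes the run is done as root:

for sz in 4096 16384 32768 65536
do
  mkfs -t btrfs -l $sz -n $sz /dev/sda9
  mount -o noatime /dev/sda9 /mnt
  /usr/sbin/bonnie++ -d /mnt -s 8g -n 512 -u root > bonnie-btrfs-$sz.txt
  umount /mnt
done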

Test Equipment

  • Linux colo7 3.4.0 #1 SMP Mon May 21 00:29:58 EDT 2012 x86_64 GNU/Linux
  • Intel(R) Xeon(R) CPU X3220 @ 2.40GHz
  • WDC WD7500AACS-0 01.0 PQ: 0 ANSI: 5
  • ahci enabled
  • 100GB partition
  • machine rebooted between each test

XFS Filesystem Benchmarking thoughts due to real world situation

Tuesday, November 15th, 2011

I suspect I’m going to learn something about XFS today that is going to create a ton of work.

I’m writing a benchmark test (in bash) that uses bonnie++ along with two other real-world scenarios. I’ve thought a lot about how to replicate real-world testing, but benchmarks usually stress a worst-case scenario and rarely replicate real-world conditions.

Benchmarks shouldn’t really be published as the be-all and end-all of testing, and you have to make sure you’re testing the piece you’re looking at, not your benchmark tool.

I’ve tried to benchmark Varnish, Tux and Nginx multiple times, and I’ve seen numerous benchmarks that claim one is insanely faster than the others for this workload or that, but there are always potential issues in their tests. A benchmark should be published with every bit of information possible so that others can replicate the test in the same way and potentially point out configuration issues.

One benchmark I read recently showed Nginx serving a static file at 35krps while Varnish flatlined at 8krps. My first thought was: is Varnish serving from cache or talking to the backend? There was a snapshot of varnishstat supporting the notion that it was indeed cached, but was the shmlog mounted on a RAM-based tmpfs? Was varnishstat running while the benchmark was?

Benchmarks test particular workloads – workloads you may not see. What you learn from a benchmark is how this load is affected by that setting, so when you start to see symptoms, your benchmarking has taught you which knob to turn to fix things.

Based on my impressions of the filesystem issues we’re running into on a few production boxes, I am convinced lazy-count=0 is a problem. While I did benchmark it and received differing results, logic dictates that lazy-count=1 should be enabled for almost all workloads. Another value I’m looking at is -i size=256, the default inode size for XFS; I believe this should be larger, which would really help directories with tens of thousands of files. -b 8192 might be a good compromise since many of these sites hold small files, but the average file size is 5120 bytes – slightly over a 4096-byte block – meaning that each file written occupies two blocks and requires two metadata updates. The log size should also be increased on heavy-write machines, and I believe the default is too low even for normal workloads.
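To make those knobs concrete, here is a hypothetical mkfs/mount pass along the lines described above – the device, inode size, log size and mount point are placeholders rather than settings we’ve settled on:

# enable lazy-count and bump the inode and log sizes at mkfs time
mkfs -t xfs -f -i size=512 -l size=128m,lazy-count=1 /dev/sdX1
# larger in-memory log buffers at mount time
mount -o noatime,logbufs=8,logbsize=256k /dev/sdX1 /mnt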

With that in mind, I’ve got 600 permutations of filesystem tests, each of which needs to be run four times to check each set of mount options, and each of those again three times to check each I/O scheduler. A rough sketch of the harness loop I have in mind follows this paragraph.
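The parameter lists below are illustrative placeholders, not the full 600-permutation matrix, and the device names are whatever the test box exposes:

for sched in cfq deadline noop
do
  echo $sched > /sys/block/sda/queue/scheduler
  for lazy in 0 1
  do
    for isize in 256 512 1024
    do
      mkfs -t xfs -f -i size=$isize -l lazy-count=$lazy /dev/sda9
      for opts in noatime noatime,logbsize=262144,logbufs=8
      do
        mount -o $opts /dev/sda9 /mnt
        /usr/sbin/bonnie++ -d /mnt -s 8g -n 512 -u root >> xfs-$sched-$lazy-$isize.log
        umount /mnt
      done
    done
  done
done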

I’ll use the same methodology to test ext4, which is going to be a lot easier due to fewer knobs, but I believe XFS is still going to win based on some earlier testing I did.

In this quick test, I increased deletes from about 5.3k/sec to 13.4k/sec which took a little more than six minutes. I suspect this machine will be running tests for the next few days after I write the test script.

Versioning Filesystem choices using OSS

Tuesday, November 1st, 2011

One of our clusters uses DRBD between two machines with GFS2 mounted on top of DRBD in dual-primary mode. I’d played around with Gluster, Lustre, OCFS2, AFS and many others, and I’ve used NetApps in the past, but I’ve never been extremely happy with any of the distributed and clustered filesystems.

My recent thinking on using SetUID or SetGID modes to deal with particular problems led me to look at a versioning filesystem. Currently, that leaves ZFS and BtrFS.

I’ve used ZFS in the past on Solaris, and it is supported natively within FreeBSD. Since we use Debian, there is Debian’s K*BSD project, which puts the Debian userland on the FreeBSD kernel – making most of our in-house management processes easy to convert. Using ZFS under Linux requires FUSE, which could introduce performance issues.

The other option we have is BtrFS. BtrFS is less mature, but it also has the ability to handle in-place migrations from ext3/ext4. While this doesn’t really help much since we primarily run XFS, future machines could use ext4 until BtrFS is deemed stable enough, at which point they could be converted in place.

In testing, XFS and Ext4 have similar performance when well tuned, which means we shouldn’t see any significant difference with either. Granted, this disagrees with some current benchmarks, but those benchmarks didn’t appear to set the filesystem up correctly and didn’t adjust the mount parameters to allow more buffers to be used. When dealing with small files, XFS needs a little more RAM, and the journal log buffers need to be increased – keeping more of the log in RAM before it is committed to disk. Large-file performance is usually deemed superior with XFS, but by properly tuning Ext3 (and by inference, Ext4) we can change its performance characteristics and get about 95% of XFS’s large-file performance.

Currently we keep two generations of weekly machine backups. While this wouldn’t change, we could also do checkpointing and more frequent snapshots, so that a file that is uploaded and then modified or deleted has a much better chance of being restorable. One of the advantages of a versioning filesystem is the ability to take hourly or daily snapshots, which should allow us to reduce the data loss if a site is exploited or catastrophically damaged through a mistake.
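As a rough illustration of the kind of rotation I’m thinking about – assuming a btrfs subvolume mounted at /backups, with the path, naming and retention count all hypothetical – an hourly cron job could look like:

SNAPDIR=/backups/.snapshots
mkdir -p $SNAPDIR
# take a timestamped snapshot of the backup volume
btrfs subvolume snapshot /backups $SNAPDIR/`date +%Y%m%d%H%M`
# keep only the newest 24 snapshots
ls -d1 $SNAPDIR/2* | head -n -24 | while read old
do
  btrfs subvolume delete $old
done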

So, we’ve got three potential solutions in order of confidence that the solution will work:

* FreeBSD ZFS
* Debian/K*BSD ZFS
* Debian BtrFS

This weekend I’ll start putting the two Debian solutions through their paces to see if I feel comfortable with either. I’ve got a chassis swap to do this week and we’ll probably switch that machine from XFS to Ext4 in preparation as well. Most of the new machines we’ve been putting online now use Ext4 due to some of the issues I’ve had with XFS.

Ideally, I would like to start using BtrFS on every machine, but, if I need to move things over to FreeBSD, I would have to make some very tough decisions and migrations.

Never a dull moment.

Finding my XFS Bug

Thursday, October 6th, 2011

Recently one of our servers had some filesystem corruption – corruption that has occurred more than once over time. As we make heavy use of hardlinks with rsync and --link-dest, I’m reasonably sure the issue occurs due to the massive number of hardlinks and deletions that take place on that system.

I’ve written a small script to repeatedly test things and started it running a few minutes ago. My guess is that the problem should show up in a few days.

#!/bin/bash

# Repeatedly make hardlinked copies of a kernel tree with rsync --link-dest
# and prune old copies, to try to reproduce the corruption.
RSYNC=/usr/bin/rsync
REVISIONS=10

function rsync_kernel () {
  DATE=`date +%Y%m%d%H%M%S`

  BDATES=""
  loop=0
  for f in `ls -d1 /tmp/2011*`
  do
    BDATES[$loop]=$f
    loop=$(($loop+1))
  done

  CT=${#BDATES[*]}

  if (( $CT > 0 ))
  then
    RECENT=${BDATES[$(($CT-1))]}
    LINKDEST=" --link-dest=$RECENT"
  else
    RECENT="/tmp/linux-3.0.3"
    LINKDEST=" --link-dest=/tmp/linux-3.0.3"
  fi

  # hardlink unchanged files against the most recent copy; new copy lands in /tmp
  $RSYNC -aplxo $LINKDEST $RECENT/ /tmp/$DATE/

  if (( ${#BDATES[*]} >= $REVISIONS ))
  then
    DELFIRST=$(( ${#BDATES[*]} - $REVISIONS ))
    loop=0
    for d in ${BDATES[*]}
      do
        if (( $loop <= $DELFIRST ))
        then
          rm -rf $d
        fi
        loop=$(($loop+1))
      done
  fi
}

while true
do
  rsync_kernel
  echo .
  sleep 1
done
