Ext4, XFS and Btrfs benchmark redux

Tuesday, May 22nd, 2012

Linux 3.4 was just released and includes a number of btrfs changes, so I felt it was worth retesting to see whether btrfs performance had improved. Each filesystem below was tested with the same Bonnie++ run:

$ /usr/sbin/bonnie++ -s 8g -n 512

ext4

mkfs -t ext4 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   582  98 59268   6 30754   3  3515  99 104817   4 306.1   1
Latency             15867us    1456ms     340ms    8997us   50112us     323ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 35092  55 520637  91  1054   1 35182  54 791080 100  1664   2
Latency              1232ms     541us   14112ms    1189ms      41us   11701ms
1.96,1.96,colo7,1,1337657098,8G,,582,98,59268,6,30754,3,3515,99,104817,4,306.1,1,512,,,,,35092,55,520637,91,1054,1,35182,54,791080,100,1664,2,15867us,1456ms,340ms,8997us,50112us,323ms,1232ms,541us,14112ms,1189ms,41us,11701ms

ext4 with tuning and mount options

mkfs -t ext4 /dev/sda9
tune2fs -o journal_data_writeback /dev/sda9
mount -o rw,noatime,data=writeback,barrier=0,nobh,commit=60 /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   587  97 64875   6 34046   4  3149  96 105157   4 317.2   4
Latency             13877us     562ms    1351ms   18692us   54835us     287ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 38127  59 525459  92  2118   2 37746  58 792967  99  1433   2
Latency               980ms     525us   14018ms    1056ms      46us   12355ms
1.96,1.96,colo7,1,1337661756,8G,,587,97,64875,6,34046,4,3149,96,105157,4,317.2,4,512,,,,,38127,59,525459,92,2118,2,37746,58,792967,99,1433,2,13877us,562ms,1351ms,18692us,54835us,287ms,980ms,525us,14018ms,1056ms,46us,12355ms

btrfs from ext4 partition

umount /mnt
fsck.ext3 -f /dev/sda9
btrfs-convert /dev/sda9
mount -t btrfs -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   462  98 62854   5 30782   4  3065  88 88883   8 313.1   7
Latency             63644us     272ms     206ms   38178us     241ms     409ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 36868  85 598431  98 26007  93 32002  73 756164  99 21975  84
Latency             15858us     427us    1003us     471us     157us    2161us
1.96,1.96,colo7,1,1337660385,8G,,462,98,62854,5,30782,4,3065,88,88883,8,313.1,7,512,,,,,36868,85,598431,98,26007,93,32002,73,756164,99,21975,84,63644us,272ms,206ms,38178us,241ms,409ms,15858us,427us,1003us,471us,157us,2161us

btrfs without conversion

mkfs -t btrfs /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   468  98 60274   4 29605   4  3629 100 89250   8 301.5   7
Latency             55633us     345ms     196ms    3767us     229ms    1119ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 26078  60 603783  99 26027  92 25617  58 754598  99 21935  84
Latency               452us     423us    1029us     426us      16us    2314us
1.96,1.96,colo7,1,1337661202,8G,,468,98,60274,4,29605,4,3629,100,89250,8,301.5,7,512,,,,,26078,60,603783,99,26027,92,25617,58,754598,99,21935,84,55633us,345ms,196ms,3767us,229ms,1119ms,452us,423us,1029us,426us,16us,2314us

xfs defaults

mkfs -t xfs /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G  1391  96 65559   5 31315   3  2984  99 103339   4 255.8   3
Latency              5625us   33224us     221ms   10524us     103ms     198ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512  7834  37 807425  99 14627  50  8612  41 790321 100  1169   4
Latency              1182ms     123us     837ms    2037ms      18us    7031ms
1.96,1.96,colo7,1,1337660479,8G,,1391,96,65559,5,31315,3,2984,99,103339,4,255.8,3,512,,,,,7834,37,807425,99,14627,50,8612,41,790321,100,1169,4,5625us,33224us,221ms,10524us,103ms,198ms,1182ms,123us,837ms,2037ms,18us,7031ms

xfs tuned

mkfs -t xfs -d agcount=32 -l size=64m /dev/sda9
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G  1413  96 64640   5 31226   3  2977  99 104762   4 246.8   3
Latency              5616us     370ms     235ms   10530us   62654us     206ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 14763  70 793694  98 23959  81 15104  72 790204  99  2290   8
Latency               482ms     118us     274ms     683ms      17us    5201ms
1.96,1.96,colo7,1,1337666959,8G,,1413,96,64640,5,31226,3,2977,99,104762,4,246.8,3,512,,,,,14763,70,793694,98,23959,81,15104,72,790204,99,2290,8,5616us,370ms,235ms,10530us,62654us,206ms,482ms,118us,274ms,683ms,17us,5201ms

btrfs with a snapshot

mkfs -t btrfs /dev/sda9
mount -o noatime,subvolid=0 /dev/sda9 /mnt
wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.4.tar.bz2
tar xjf linux-3.4.tar.bz2
btrfs subvolume snapshot /mnt/ /mnt/@_2012_05_22
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   469  98 58400   5 30092   4  2999  85 89761   8 321.1   3
Latency             17017us     267ms     240ms   22907us     300ms     359ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 31715  72 598360  98 25780  92 26411  59 756110  99 22058  84
Latency               102ms     424us     844us     472us      20us    2171us
1.96,1.96,colo7,1,1337664006,8G,,469,98,58400,5,30092,4,2999,85,89761,8,321.1,3,512,,,,,31715,72,598360,98,25780,92,26411,59,756110,99,22058,84,17017us,267ms,240ms,22907us,300ms,359ms,102ms,424us,844us,472us,20us,2171us

Deleted the kernel tree from the live filesystem, left it in the snapshot, and reran the test

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   469  98 63934   5 31244   4  3208  94 90227   8 296.3   7
Latency             17009us     282ms     217ms    3746us     224ms    1269ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 28758  66 596074  97 26185  93 25714  59 755464  99 21893  84
Latency             42108us     424us     993us     445us      17us    2245us
1.96,1.96,colo7,1,1337671128,8G,,469,98,63934,5,31244,4,3208,94,90227,8,296.3,7,512,,,,,28758,66,596074,97,26185,93,25714,59,755464,99,21893,84,17009us,282ms,217ms,3746us,224ms,1269ms,42108us,424us,993us,445us,17us,2245us

Updated results using some different parameters. Same hardware, same hard drive.

leaf and btree size of 16384

mkfs -t btrfs -l 16384 -n 16384 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   472  98 37514   2 14395   2  3135  89 80600   7 294.0   6
Latency             16820us     781ms     383ms   19736us     230ms     379ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 17447  46 621480  99 24345  94 14984  39 754999  99 19873  82
Latency               303us     494us     900us     412us     107us    3127us
1.96,1.96,colo7,1,1338411461,8G,,472,98,37514,2,14395,2,3135,89,80600,7,294.0,6,512,,,,,17447,46,621480,99,24345,94,14984,39,754999,99,19873,82,16820us,781ms,383ms,19736us,230ms,379ms,303us,494us,900us,412us,107us,3127us

leaf and btree size of 32768

mkfs -t btrfs -l 32768 -n 32768 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   468  97 26136   2 17256   2  3135  89 84450   7 306.5   7
Latency             43238us     923ms     330ms   12632us     367ms     986ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 17958  61 624570  99 19930  95 14506  50 753354  99 15976  80
Latency             15384us     514us     937us     431us     144us    4782us
1.96,1.96,colo7,1,1338409200,8G,,468,97,26136,2,17256,2,3135,89,84450,7,306.5,7,512,,,,,17958,61,624570,99,19930,95,14506,50,753354,99,15976,80,43238us,923ms,330ms,12632us,367ms,986ms,15384us,514us,937us,431us,144us,4782us

leaf and btree size of 65536

mkfs -t btrfs -l 65536 -n 65536 /dev/sda9
mount -o noatime /dev/sda9 /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
colo7            8G   467  97 25097   2 17349   2  2845  87 86653   8 300.2   7
Latency             56046us     772ms     414ms    4101us     249ms     241ms
Version  1.96       ------Sequential Create------ --------Random Create--------
colo7               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 15372  68 626336  98 14723  96 13137  58 753463 100 11652  80
Latency             15825us     395us   77890us     428us      19us   15727us
1.96,1.96,colo7,1,1338410439,8G,,467,97,25097,2,17349,2,2845,87,86653,8,300.2,7,512,,,,,15372,68,626336,98,14723,96,13137,58,753463,100,11652,80,56046us,772ms,414ms,4101us,249ms,241ms,15825us,395us,77890us,428us,19us,15727us

Analysis

Last time I tested ext4, xfs and btrfs, deletions really lagged behind. Now btrfs looks quite a bit more robust, and it has better repair and recovery tools, which were basically missing before. btrfs doesn’t lag behind like it used to: block writes and rewrites are within a few percent of default ext4, although sequential block reads are still about 15% slower (roughly 89,000 K/sec against 105,000 K/sec). It makes up for that in some of the sequential and random creation and deletion tests, most clearly in deletions, where it manages over 20,000 files per second against 1,000 to 2,000 for default ext4.
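The deletion figures can be pulled straight from the CSV line that Bonnie++ prints at the end of each run. A minimal sketch, assuming the 1.96 CSV layout seen in the results above, where field 3 is the machine name, field 29 the sequential delete rate and field 35 the random delete rate. The function name delete_rates is made up for this example.

```shell
#!/bin/sh
# delete_rates: hypothetical helper that reads Bonnie++ 1.96 CSV lines on
# stdin and prints "<machine> <sequential deletes/sec> <random deletes/sec>".
# The field positions (3, 29, 35) are read off the CSV lines in this post,
# not taken from the Bonnie++ documentation.
delete_rates() {
    awk -F, '{ print $3, $29, $35 }'
}
```

Piping the default ext4 CSV line through delete_rates prints colo7 1054 1664, and the freshly created btrfs line prints colo7 26027 21935.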

Rough analysis at this point: if you need a versioning filesystem and don’t mind being a bit on the bleeding edge, btrfs has made substantial strides.

Updated Analysis

For the hardware in question, larger leaf and node sizes don’t appear to help under Bonnie++: sequential block writes drop from about 60,000 K/sec at the default size to between 25,000 and 38,000 K/sec. Make sure you test with your own workload.
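To put a number on that, here is a rough sketch that expresses each larger-size run as a percentage of the default btrfs run. The rates are the sequential block write figures copied from the results above, and pct_of_baseline is a hypothetical helper name.

```shell
#!/bin/sh
# pct_of_baseline: hypothetical helper; prints VALUE as a percentage of
# BASELINE with one decimal place.
pct_of_baseline() {
    awk -v v="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * v / b }'
}

# Sequential block write rates (K/sec) copied from the btrfs runs above:
# 60274 at the default size, then the 16384, 32768 and 65536 byte runs.
for rate in 37514 26136 25097; do
    pct_of_baseline "$rate" 60274
done
```

On these figures the three larger sizes come out at roughly 62%, 43% and 42% of the default block write rate.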

Test Equipment

  • Linux colo7 3.4.0 #1 SMP Mon May 21 00:29:58 EDT 2012 x86_64 GNU/Linux
  • Intel(R) Xeon(R) CPU X3220 @ 2.40GHz
  • WDC WD7500AACS-0 01.0 PQ: 0 ANSI: 5
  • AHCI enabled
  • 100 GB partition
  • machine rebooted between tests

Versioning Filesystem choices using OSS

Tuesday, November 1st, 2011

One of the clusters we have uses DRBD between two machines, with GFS2 mounted on DRBD in dual-primary mode. I’ve played around with Gluster, Lustre, OCFS2, AFS and many others, and I’ve used NetApps in the past, but I’ve never been extremely happy with any of the distributed and clustered filesystems.

My recent thinking on using SetUID or SetGID mode to deal with particular problems led me to look at a versioning filesystem. Currently that leaves ZFS and BtrFS.

I’ve used ZFS in the past on Solaris, and it is supported natively within FreeBSD. Since we use Debian, there is also Debian’s K*BSD project, which puts the Debian userland on the BSD kernel and would make most of our in-house management processes easy to convert. Using ZFS under Linux requires FUSE, which could introduce performance issues.

The other option we have is BtrFS. BtrFS is less mature, but it can handle in-place migrations from ext3/ext4. That doesn’t help much today since we primarily run XFS, but future machines could use ext4 until BtrFS is deemed stable enough, at which point they could be converted in place.
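A rough sketch of what that in-place conversion might look like, assuming an unmounted ext3/ext4 partition. The device name /dev/sdXN is a placeholder, and run and convert_to_btrfs are hypothetical helper names; with DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# convert_to_btrfs: hypothetical wrapper around an in-place conversion.
# The device must be unmounted. e2fsck -f forces a full check first, and
# the conversion is skipped unless that check exits cleanly.
convert_to_btrfs() {
    dev=$1
    run e2fsck -f "$dev" || return 1
    run btrfs-convert "$dev"
}

# Preview the steps for a placeholder device without touching anything:
# DRY_RUN=1 convert_to_btrfs /dev/sdXN
```

The dry run makes it easy to review the exact commands before pointing them at a real partition. btrfs-convert also has a -r option to roll back to the original filesystem, which should be tested on scratch data before relying on it.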

In testing, XFS and Ext4 have similar performance when well tuned, so we shouldn’t see a significant difference with either. Granted, this disagrees with some current benchmarks, but those benchmarks didn’t appear to set the filesystems up correctly and didn’t modify the mount parameters to allow more buffers to be used. When dealing with small files, XFS needs a little more RAM and the number of journal log buffers needs to be increased, which keeps more of the log in RAM before it is committed. Large file performance is usually deemed superior with XFS, but by properly tuning Ext3 (and, by inference, Ext4) we can change its performance characteristics and get about 95% of XFS’s large file performance.

Currently we keep two generations of weekly machine backups. That wouldn’t change, but we could also do checkpointing and more frequent snapshots, so that a file that was uploaded and then modified or deleted would have a much better chance of being restored. One of the attractions of a versioning filesystem is the ability to take hourly or daily snapshots, which should reduce the data loss if a site is exploited or catastrophically damaged through a mistake.
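As a sketch of how daily snapshots could be named and scheduled, assuming a BtrFS filesystem mounted at a placeholder path and date-stamped snapshot names in an @_YYYY_MM_DD style. Both snapshot_name and snapshot_cmd are hypothetical helpers, and the second one only prints the command it would run.

```shell
#!/bin/sh
# snapshot_name: hypothetical helper; turns a YYYY-MM-DD date (default:
# today) into a snapshot name in the @_YYYY_MM_DD style.
snapshot_name() {
    day=${1:-$(date +%Y-%m-%d)}
    printf '@_%s\n' "$day" | sed 's/-/_/g'
}

# snapshot_cmd: print, rather than run, the snapshot command for a mount
# point so the line can be reviewed before it goes into a cron job.
# $1 is the mount point, $2 an optional YYYY-MM-DD date.
snapshot_cmd() {
    mnt=$1
    echo "btrfs subvolume snapshot $mnt $mnt/$(snapshot_name "$2")"
}
```

Calling snapshot_cmd /mnt 2012-05-22 prints btrfs subvolume snapshot /mnt /mnt/@_2012_05_22. Hourly snapshots would need the hour added to the name so that names stay unique.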

So, we’ve got three potential solutions in order of confidence that the solution will work:

* FreeBSD ZFS
* Debian/K*BSD ZFS
* Debian BtrFS

This weekend I’ll start putting the two Debian solutions through their paces to see if I feel comfortable with either. I’ve got a chassis swap to do this week and we’ll probably switch that machine from XFS to Ext4 in preparation as well. Most of the new machines we’ve been putting online now use Ext4 due to some of the issues I’ve had with XFS.

Ideally, I would like to start using BtrFS on every machine, but if I need to move things over to FreeBSD, I will have some very tough decisions and migrations to make.

Never a dull moment.
