Recently I talked about versioning filesystems available for open-source systems. While most of our server farms use XFS, we have been moving a number of machines to Ext4. This wasn’t done as a precursor to BtrFS. We moved because of problems we’ve been having with XFS on very large filesystems. The fact that Ext4 can be converted to BtrFS in place (with btrfs-convert) is just a coincidental bonus.
While ZFS is still a consideration if we move to FreeBSD (Debian’s K*BSD project didn’t impress me enough to consider it stable for production), I felt BtrFS was worth a look. There is also CephFS, but it requires more infrastructure: you need to run a cluster, and it isn’t really meant for single-machine deployments.
We’re also going to make some assumptions and do things you might not want to do on a home system. Since the data center we’re in has a 100% power SLA, we can reasonably assume we won’t lose power and can be a little more aggressive (for example, disabling write barriers and using writeback journaling in one of the Ext4 runs). We also disable atime updates with noatime. This may cause problems for clients if the disk holds your mail spool, because some mail clients compare a mailbox’s access and modification times to detect new mail. Recent versions of XFS also handle atime updates quite differently, so on XFS the performance boost from noatime is negligible.
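Command used to test (-s 8g uses an 8 GB test file, four times the machine’s 2 GB of RAM, and -n 512 runs the file-creation tests with 512 × 1024 files). In every result below, "version" is the test machine’s hostname, not a filesystem version: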
/usr/sbin/bonnie++ -s 8g -n 512
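bonnie++ expects the -s file size to be at least twice the machine’s RAM so that the page cache can’t hold the whole test file, so a quick pre-flight check can save a wasted run. This is a minimal sketch, not part of the original setup. The function names min_size_mb and check_size are hypothetical, and the sketch assumes a Linux /proc/meminfo that reports MemTotal in kB.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): check that a bonnie++ -s size, in MB,
# is at least twice physical RAM. Reads /proc/meminfo by default; an
# alternate file can be passed in for testing.

# Print twice MemTotal (which /proc/meminfo reports in kB), converted to MB.
min_size_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 * 2 / 1024 }' "${1:-/proc/meminfo}"
}

# Usage: check_size SIZE_MB [MEMINFO_FILE]
# Prints a verdict and returns non-zero if the size is below 2x RAM.
check_size() {
    size_mb=$1
    min=$(min_size_mb "$2")
    if [ "$size_mb" -ge "$min" ]; then
        echo "ok: ${size_mb} MB >= ${min} MB"
    else
        echo "too small: ${size_mb} MB < ${min} MB (bonnie++ wants at least 2x RAM)"
        return 1
    fi
}
```

On the test machine, check_size 8192 should pass, because 8 GB is roughly four times its 2 GB of RAM.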
Ext4:
mkfs -t ext4 /dev/sda5
mount -o noatime /dev/sda5 /mnt
Results:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   332  99 54570  12 23512   5  1615  98 62905   6 131.8   3
Latency               24224us     471ms     370ms   13739us     110ms    5257ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                  24764  69 267757  98   2359   6  25084  67 388005  98   1403   3
Latency                 1258ms     1402us    11767ms     1187ms       66us    11682ms
1.96,1.96,version,1,1320193244,8G,,332,99,54570,12,23512,5,1615,98,62905,6,131.8,3,512,,,,,24764,69,267757,98,2359,6,25084,67,388005,98,1403,3,24224us,471ms,370ms,13739us,110ms,5257ms,1258ms,1402us,11767ms,1187ms,66us,11682ms
Ext4 with writeback journaling and tuned mount options:
mkfs -t ext4 /dev/sda5
tune2fs -o journal_data_writeback /dev/sda5
mount -o rw,noatime,data=writeback,barrier=0,nobh,commit=60 /dev/sda5 /mnt
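With this many options in play, it’s worth confirming they actually took effect. The kernel may report some options in a different form, or leave out ones it ignores, so treat a "missing" result as a prompt to check dmesg rather than proof of failure. Below is a minimal sketch that reads /proc/mounts. The helper names mount_opts and has_opts are hypothetical, and MOUNTS is an assumed override variable used only for testing.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): verify that requested mount options are in effect.
# Reads /proc/mounts unless MOUNTS points at another file (assumed test hook).

# Print the comma-separated option list (field 4) for the first entry matching a device.
mount_opts() {
    awk -v dev="$1" '$1 == dev { print $4; exit }' "${MOUNTS:-/proc/mounts}"
}

# Usage: has_opts DEVICE OPTION...
# Returns non-zero and names the first option that is not listed.
has_opts() {
    dev=$1
    shift
    # Wrap the list in commas so each option can be matched as a whole word.
    opts=",$(mount_opts "$dev"),"
    for o in "$@"; do
        case $opts in
            *",$o,"*) ;;
            *) echo "missing: $o"; return 1 ;;
        esac
    done
    echo "all requested options present on $dev"
}
```

After the mount above, for example, run has_opts /dev/sda5 noatime data=writeback commit=60. The default mount option set by tune2fs appears separately, on the "Default mount options" line of tune2fs -l /dev/sda5.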
Results:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   335  99 53396  11 25240   6  1619  99 62724   6 130.9   5
Latency               23955us     380ms     231ms   15962us     143ms   16261ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                  24253  65 266963  98   2341   6  24567  65 389243  98   1392   3
Latency                 1232ms     1405us    11500ms     1232ms      130us    11543ms
1.96,1.96,version,1,1320192213,8G,,335,99,53396,11,25240,6,1619,99,62724,6,130.9,5,512,,,,,24253,65,266963,98,2341,6,24567,65,389243,98,1392,3,23955us,380ms,231ms,15962us,143ms,16261ms,1232ms,1405us,11500ms,1232ms,130us,11543ms
XFS:
mkfs -t xfs -f /dev/sda5
mount -o noatime /dev/sda5 /mnt
Results:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   558  98 55174   9 26660   6  1278  96 62598   6 131.4   5
Latency               14264us     227ms     253ms   77527us   85140us     773ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                   2468  19 386301  99   4311  25   2971  22 375624  99    546   3
Latency                 1986ms      346us     1341ms     1580ms       82us     5904ms
1.96,1.96,version,1,1320194740,8G,,558,98,55174,9,26660,6,1278,96,62598,6,131.4,5,512,,,,,2468,19,386301,99,4311,25,2971,22,375624,99,546,3,14264us,227ms,253ms,77527us,85140us,773ms,1986ms,346us,1341ms,1580ms,82us,5904ms
XFS with tuned mount options:
mkfs -t xfs -f /dev/sda5
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda5 /mnt
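For a sense of scale: logbufs sets how many in-memory log buffers XFS keeps for the filesystem, and logbsize sets the size of each. Both are at their maximum values here (8 buffers of 262144 bytes, or 256 KiB, each), so the log buffers occupy:

```latex
8 \times 262144\ \text{bytes} = 2097152\ \text{bytes} = 2\ \text{MiB}
```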
Results:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   563  98 55423   9 26710   6  1328  99 62650   6 129.5   5
Latency               14401us     345ms     298ms   20328us     119ms     357ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                   3454  26 385552 100   5966  35   4459  34 375917  99    571   3
Latency                 1625ms      360us     1323ms     1243ms       67us     5060ms
1.96,1.96,version,1,1320196498,8G,,563,98,55423,9,26710,6,1328,99,62650,6,129.5,5,512,,,,,3454,26,385552,100,5966,35,4459,34,375917,99,571,3,14401us,345ms,298ms,20328us,119ms,357ms,1625ms,360us,1323ms,1243ms,67us,5060ms
XFS with tuned filesystem creation and mount options:
mkfs -t xfs -d agcount=32 -l size=64m -f /dev/sda5
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda5 /mnt
Results:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   561  97 54674   9 26502   6  1235  95 62613   6 131.4   5
Latency               14119us     346ms     247ms   94238us   76841us     697ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                   9576  73 383305 100  14398  85   9156  70 373557  99   2375  14
Latency                 1110ms      375us      301ms      850ms       36us     5772ms
1.96,1.96,version,1,1320198613,8G,,561,97,54674,9,26502,6,1235,95,62613,6,131.4,5,512,,,,,9576,73,383305,100,14398,85,9156,70,373557,99,2375,14,14119us,346ms,247ms,94238us,76841us,697ms,1110ms,375us,301ms,850ms,36us,5772ms
BtrFS:
mkfs -t btrfs /dev/sda5
mount -o noatime /dev/sda5 /mnt
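Also, make sure CONFIG_CRYPTO_CRC32C_INTEL is either built into the kernel or loaded as a module, and use an Intel CPU that supports SSE4.2. The crc32c-intel driver relies on the SSE4.2 crc32 instruction to accelerate BtrFS checksums.
Results: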
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   254  99 54778   9 23070   8  1407  92 59932  13 131.2   5
Latency               31553us     264ms     826ms   94269us     180ms   17963ms
Version  1.96       --------Sequential Create------- ----------Random Create---------
version              -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files                 /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
512                  17034  83 256486 100  13485  97  15282  76  38942  73   1472  23
Latency                  126ms     2162us    11295us    71992us    20713us    28647ms
1.96,1.96,version,1,1320204006,8G,,254,99,54778,9,23070,8,1407,92,59932,13,131.2,5,512,,,,,17034,83,256486,100,13485,97,15282,76,38942,73,1472,23,31553us,264ms,826ms,94269us,180ms,17963ms,126ms,2162us,11295us,71992us,20713us,28647ms
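Given the CONFIG_CRYPTO_CRC32C_INTEL note above, it’s worth checking whether hardware CRC32C was actually available during a BtrFS run. Below is a minimal sketch that assumes the usual Linux /proc/cpuinfo and /proc/crypto layouts. has_sse42 and crc32c_drivers are hypothetical helper names, and the optional file argument exists only so the helpers can be tested against sample data.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): is hardware-accelerated CRC32C available?
# Assumes the standard Linux /proc/cpuinfo and /proc/crypto formats.

# Succeed if any "flags" line in cpuinfo lists sse4_2, which crc32c-intel needs.
has_sse42() {
    awk '/^flags/ { for (i = 1; i <= NF; i++) if ($i == "sse4_2") found = 1 }
         END { exit (found ? 0 : 1) }' "${1:-/proc/cpuinfo}"
}

# Print every driver registered for the crc32c algorithm,
# e.g. crc32c-intel when accelerated or crc32c-generic for the software version.
crc32c_drivers() {
    awk -F': *' '/^name/ { name = $2 } /^driver/ && name == "crc32c" { print $2 }' \
        "${1:-/proc/crypto}"
}
```

If has_sse42 fails, or crc32c_drivers does not list crc32c-intel, BtrFS is computing its checksums in software. A Pentium 4 like the test machine has no SSE4.2, so the first check would fail there.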
Analysis
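Ext4 is considerably better than Ext3 was the last time we ran this check. Even against XFS with the allocation group and mount option tweaks we use, Ext4 holds up well. It creates files much faster (24,764 vs. 9,576 sequential creates/sec and 25,084 vs. 9,156 random creates/sec). XFS deletes files faster (14,398 vs. 2,359 sequential deletes/sec), and block throughput is essentially a tie. BtrFS, however, shows a significant drop in the Random Create Read benchmark: 38,942 reads/sec against 373,557 to 389,243 for every Ext4 and XFS run, roughly a tenth of the throughput. Note that the crc32c-intel driver depends on the SSE4.2 crc32 instruction, which the Pentium 4 test machine lacks. BtrFS was therefore most likely computing checksums in software during this run, not with Intel CRC hardware acceleration.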
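To put numbers on the Random Create Read drop, the CSV line bonnie++ prints after each run can be sliced directly. Below is a minimal sketch that assumes the bonnie++ 1.96 CSV field order seen in these results, where fields 25 through 36 hold the file-creation rates and their CPU percentages. create_metrics and ran_read_ratio are hypothetical helper names.

```shell
#!/bin/sh
# Sketch (hypothetical helpers) for bonnie++ 1.96 CSV lines.
# Assumed field positions: $25 seq create, $27 seq read, $29 seq delete,
# $31 random create, $33 random read, $35 random delete (all per second).

# Read CSV lines on stdin; print the six file-creation rates per line.
create_metrics() {
    awk -F, '{ print $25, $27, $29, $31, $33, $35 }'
}

# Usage: ran_read_ratio CSV_LINE_A CSV_LINE_B
# Print how many times higher line A's random create read rate is than line B's.
ran_read_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ","); split(b, y, ",")
        printf "%.1f\n", x[33] / y[33]
    }'
}
```

Fed the plain Ext4 and BtrFS CSV lines, create_metrics prints 24764 267757 2359 25084 388005 1403 and 17034 256486 13485 15282 38942 1472, respectively. ran_read_ratio prints 10.0: BtrFS managed about a tenth of Ext4’s Random Create Read rate.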
Based on the typical workload our machines see, I don’t believe our recent conversion to Ext4 is hurting performance.
I’ll keep working with BtrFS to figure out why that one benchmark performs so poorly, but some of the features it offers as a versioning filesystem will be quite useful.
Machine specs:
* Linux version 3.1.0 #3 SMP Tue Nov 1 16:23:42 EDT 2011 i686 GNU/Linux
* Pentium 4, 3.0 GHz, 2 GB RAM
* Western Digital SATA2 320 GB 7200 RPM drive