For a few years I’ve run a system using DRBD replication between two machines, with GFS2 on top running in dual-primary mode, to test a theory about a particular type of web hosting I’ve been developing.
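For reference, dual-primary DRBD is switched on in the resource’s net section. The fragment below is only an illustrative sketch in rough DRBD 8.x syntax — the resource name is a placeholder and the per-node sections are omitted, so it is not our actual config:

resource r0 {
    net {
        allow-two-primaries;      # let both nodes hold the Primary role at the same time
    }
    startup {
        become-primary-on both;   # promote both nodes when the resource comes up
    }
    # per-node "on <hostname> { device /dev/drbd1; disk ...; address ...; }" sections omitted
}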
For months the system will run fine; then, out of the blue, one of the nodes drops from the cluster and reboots, leaving nothing in the logs. It will run another 120–180 days without incident and then reboot again with no real indication of the problem. We knew it was a kernel panic or kernel oops, but the logs were never flushed to disk before the machine rebooted.
Imagine our luck when, two days in a row at roughly the same time of day, the node rebooted. Even though we have remote syslog set up, we had never caught it.
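The remote syslog forwarding on these boxes amounts to a single rule along these lines (the collector address below is a placeholder, not our actual log host):

# forward everything over UDP to the central log host
*.* @192.0.2.50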
We changed /etc/sysctl.conf so that panic_on_oops was set to 0, opened a number of terminal sessions from another machine tailing various logs, and waited for the problem to occur again.
/etc/sysctl.conf:
kernel.panic=5
kernel.panic_on_oops=0
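The new values can be loaded without a reboot; as root, something like:

# reload the settings from /etc/sysctl.conf
sysctl -p
# or set the one we care about directly
sysctl -w kernel.panic_on_oops=0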
At 6:25am, coincidentally during log rotation, the GFS2 partition was withdrawn and unmounted, but the machine didn’t reboot. Checking our terminals, we still had access to dmesg, and we had some logs:
GFS2: fsid=gamma:gfs1.0: fatal: invalid metadata block
GFS2: fsid=gamma:gfs1.0: bh = 1211322 (magic number)
GFS2: fsid=gamma:gfs1.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 401
GFS2: fsid=gamma:gfs1.0: about to withdraw this file system
GFS2: fsid=gamma:gfs1.0: telling LM to unmount
GFS2: fsid=gamma:gfs1.0: withdrawn
Pid: 18047, comm: gzip Not tainted 3.0.0 #1
Call Trace:
[] ? gfs2_lm_withdraw+0xd9/0x10a
[] ? gfs2_meta_check_ii+0x3c/0x48
[] ? gfs2_meta_indirect_buffer+0xf0/0x14a
[] ? gfs2_block_map+0x1a3/0x9fe
[] ? drive_stat_acct+0xf3/0x12e
[] ? do_mpage_readpage+0x160/0x49f
[] ? pagevec_lru_move_fn+0xab/0xc1
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? mpage_readpages+0xd0/0x12a
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? bit_waitqueue+0x14/0x63
[] ? gfs2_readpages+0x67/0xa8
[] ? sd_prep_fn+0x2c1/0x902
[] ? gfs2_readpages+0x3b/0xa8
[] ? __do_page_cache_readahead+0x11b/0x1c0
[] ? ra_submit+0x19/0x1d
[] ? generic_file_aio_read+0x2b4/0x5e0
[] ? do_sync_read+0xab/0xe3
[] ? vfs_read+0xa3/0x10f
[] ? sys_read+0x45/0x6e
[] ? system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
WARNING: at fs/buffer.c:1188 gfs2_block_map+0x2be/0x9fe()
Hardware name: PDSMi
VFS: brelse: Trying to free free buffer
Modules linked in:
Pid: 18047, comm: gzip Not tainted 3.0.0 #1
Call Trace:
[] ? gfs2_block_map+0x2be/0x9fe
[] ? warn_slowpath_common+0x78/0x8c
[] ? warn_slowpath_fmt+0x45/0x4a
[] ? gfs2_block_map+0x2be/0x9fe
[] ? drive_stat_acct+0xf3/0x12e
[] ? do_mpage_readpage+0x160/0x49f
[] ? pagevec_lru_move_fn+0xab/0xc1
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? mpage_readpages+0xd0/0x12a
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? bit_waitqueue+0x14/0x63
[] ? gfs2_readpages+0x67/0xa8
[] ? sd_prep_fn+0x2c1/0x902
[] ? gfs2_readpages+0x3b/0xa8
[] ? __do_page_cache_readahead+0x11b/0x1c0
[] ? ra_submit+0x19/0x1d
[] ? generic_file_aio_read+0x2b4/0x5e0
[] ? do_sync_read+0xab/0xe3
[] ? vfs_read+0xa3/0x10f
[] ? sys_read+0x45/0x6e
[] ? system_call_fastpath+0x16/0x1b
---[ end trace 54fad1a4877f173c ]---
BUG: unable to handle kernel paging request at ffffffff813b8f0f
IP: [] __brelse+0x7/0x26
PGD 1625067 PUD 1629063 PMD 12001e1
Oops: 0003 [#1] SMP
CPU 0
Modules linked in:
Pid: 18047, comm: gzip Tainted: G W 3.0.0 #1 Supermicro PDSMi/PDSMi+
RIP: 0010:[] [] __brelse+0x7/0x26
RSP: 0018:ffff880185d85800 EFLAGS: 00010286
RAX: 00000000e8df8948 RBX: ffff8801f3fb6c18 RCX: ffff880185d857d0
RDX: 0000000000000010 RSI: 000000000002ccee RDI: ffffffff813b8eaf
RBP: 0000000000000000 R08: ffff880185d85890 R09: ffff8801f3fb6c18
R10: 00000000000029e0 R11: 0000000000000078 R12: ffff880147986000
R13: ffff880147986140 R14: 00000000000029c1 R15: 0000000000001000
FS: 00007fb7d320f700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff813b8f0f CR3: 0000000212a73000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gzip (pid: 18047, threadinfo ffff880185d84000, task ffff880147dc4ed0)
Stack:
ffffffff811537f6 0000000000000000 00000000000022a1 0000000000000001
000000000000ffff 0000000000000000 ffffffff8102a7ce 0000000000000001
0000000100000008 00000000fffffffb 0000000000000001 ffff8801f3fb6c18
Call Trace:
[] ? gfs2_block_map+0x2be/0x9fe
[] ? warn_slowpath_common+0x7d/0x8c
[] ? printk+0x43/0x48
[] ? alloc_page_buffers+0x62/0xba
[] ? block_read_full_page+0x141/0x260
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? do_mpage_readpage+0x49b/0x49f
[] ? pagevec_lru_move_fn+0xab/0xc1
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? mpage_readpages+0xd0/0x12a
[] ? gfs2_unstuff_dinode+0x383/0x383
[] ? bit_waitqueue+0x14/0x63
[] ? gfs2_readpages+0x67/0xa8
[] ? sd_prep_fn+0x2c1/0x902
[] ? gfs2_readpages+0x3b/0xa8
[] ? __do_page_cache_readahead+0x11b/0x1c0
[] ? ra_submit+0x19/0x1d
[] ? generic_file_aio_read+0x2b4/0x5e0
[] ? do_sync_read+0xab/0xe3
[] ? vfs_read+0xa3/0x10f
[] ? sys_read+0x45/0x6e
[] ? system_call_fastpath+0x16/0x1b
Code: 31 00 45 31 f6 fe 85 88 00 00 00 48 89 df e8 a2 1a fc ff eb 03 45 31 f6 5b 4c 89 f0 5d 41 5c 41 5d 41 5e c3 8b 47 60 85 c0 74 05 ff 4f 60 c3 48 c7 c2 96 7b 4d 81 be a4 04 00 00 31 c0 48 c7
RIP [] __brelse+0x7/0x26
RSP
CR2: ffffffff813b8f0f
---[ end trace 54fad1a4877f173d ]---
As I suspected, log rotation appeared to trigger the problem and handed us the backtrace above.
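fsck.gfs2 has to be run with the filesystem unmounted on every node in the cluster, so the GFS2 mount has to come down on both machines before checking it from one of them. Roughly, with the mount point being a placeholder rather than our actual path:

# on both nodes: unmount the GFS2 filesystem before running fsck
umount /mnt/gfs1

With the filesystem offline everywhere, running fsck.gfs2 resulted in: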
# fsck.gfs2 -y /dev/drbd1
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Error: resource group 7339665 (0x6ffe91): free space (64473) does not match bitmap (64658)
The rgrp was fixed.
Error: resource group 7405179 (0x70fe7b): free space (64249) does not match bitmap (64299)
(50 blocks were reclaimed)
The rgrp was fixed.
Error: resource group 7470693 (0x71fe65): free space (65456) does not match bitmap (65464)
(8 blocks were reclaimed)
The rgrp was fixed.
...snip...
Ondisk and fsck bitmaps differ at block 133061348 (0x7ee5ae4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 133061349 (0x7ee5ae5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
RG #133061031 (0x7ee59a7) free count inconsistent: is 65232 should be 65508
Inode count inconsistent: is 37 should be 0
Resource group counts updated
Inode count inconsistent: is 1267 should be 1266
Resource group counts updated
Pass5 complete
The statfs file is wrong:
Current statfs values:
blocks: 188730628 (0xb3fcd04)
free: 176443034 (0xa844e9a)
dinodes: 644117 (0x9d415)
Calculated statfs values:
blocks: 188730628 (0xb3fcd04)
free: 177426468 (0xa935024)
dinodes: 493059 (0x78603)
The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete
The filesystem was remounted after a seven-minute fsck, and we’ll see if it happens again tomorrow.
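For completeness, bringing it back is just the usual GFS2 mount on each node once the check finishes; the mount point and options below are placeholders, since ours actually comes from fstab:

# on both nodes: bring the filesystem back online
mount -t gfs2 -o noatime /dev/drbd1 /mnt/gfs1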