Finding my XFS Bug

Recently one of our servers had filesystem corruption, and not for the first time. We rely heavily on hardlinks through rsync’s --link-dest option, so I’m reasonably sure the issue is triggered by the massive number of hardlink creations and deletions that take place on that system.

I’ve written a small script that repeatedly makes hardlinked copies of a kernel source tree and prunes the oldest copies, and started it running a few minutes ago. My guess is that the problem should show up within a few days.

#!/bin/bash

RSYNC=/usr/bin/rsync
REVISIONS=10

function rsync_kernel () {
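  # Timestamp used as the name of the new snapshot directory
  DATE=$(date +%Y%m%d%H%M%S)

  # Collect the existing snapshot directories, oldest first
  BDATES=()
  loop=0
  for f in /tmp/2011*
  do
    [ -d "$f" ] || continue
    BDATES[$loop]=$f
    loop=$(($loop+1))
  done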

  CT=${#BDATES[*]}

  if (( $CT > 0 ))
  then
    RECENT=${BDATES[$(($CT-1))]}
    LINKDEST=" --link-dest=$RECENT"
  else
    RECENT="/tmp/linux-3.0.3"
    LINKDEST=" --link-dest=/tmp/linux-3.0.3"
  fi

  # Write the new snapshot next to the existing ones in /tmp
  $RSYNC -aplxo $LINKDEST "$RECENT/" "/tmp/$DATE/"

  if (( ${#BDATES[*]} >= $REVISIONS ))
  then
    DELFIRST=$(( ${#BDATES[*]} - $REVISIONS ))
    loop=0
    for d in "${BDATES[@]}"
    do
      if (( $loop <= $DELFIRST ))
      then
        rm -rf "$d"
      fi
      loop=$(($loop+1))
    done
  fi
}

while true
do
  rsync_kernel
  echo .
  sleep 1
done
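
The test only exercises the suspected code path if rsync really is hardlinking files between snapshots rather than copying them. The following is a minimal sketch for checking that; check_hardlink and count_linked_files are hypothetical helper names and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helpers for spot-checking the snapshots; not part of the
# original test script.

# Print "linked" if both paths refer to the same inode on the same
# device, "separate" otherwise.
check_hardlink () {
  if [ "$1" -ef "$2" ]
  then
    echo linked
  else
    echo separate
  fi
}

# Count the regular files under a directory that have more than one
# hardlink pointing at them.
count_linked_files () {
  find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

Running check_hardlink on the same file in two consecutive snapshot directories should print "linked" if --link-dest is doing its job, and count_linked_files on a snapshot gives a rough idea of how many inodes are shared.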


3 Responses to “Finding my XFS Bug”

  1. cd34 Says:

    After 12 hours, no corruption yet.

    I’m curious whether the problem is in the inode64 code and won’t surface with the 32-bit inodes on this 1 GB partition.

  2. cd34 Says:

    23.5 hours later:

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:Oops: 0000 [#1] SMP

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:Process rsync (pid: 15871, ti=e9d34000 task=f4a691a0 task.ti=e9d34000)

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:Stack:

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:Call Trace:

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:Code: dd 60 00 00 89 d8 e8 87 5d 00 00 8b 54 24 34 c7 02 00 00 00 00 bd 05 00 00 00 89 e8 83 c4 10 5b 5e 5f 5d c3 57 56 53 89 d3 85 c0 <8b> b2 8c 00 00 00 75 14 85 f6 74 72 81 7e 1c 3c 12 00 00 75 69

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:EIP: [] xfs_trans_brelse+0x7/0x9a SS:ESP 0068:e9d35ce4

    Message from syslogd@test at Oct 7 01:31:56 …
    kernel:CR2: 000000001001008c
    Write failed: Broken pipe
    tsavo:~ mcd$

  3. cd34 Says:

    Ran into this issue again. The xfs_repair took 3 hours and I lost about 90 files.

    I need to compile a kernel with debugging enabled and redirect the console to another machine, since the bug appears to hang the machine and the log files never get written, even though they are on a different filesystem.
