Posts Tagged ‘kernel’

KVM guest extremely slow, Bug in Host Linux 3.2.2 kernel

Friday, March 22nd, 2013

A client upgraded a KVM instance today and rebooted it, and since then the machine has been extremely slow.

The instance is a Debian system running 3.1.0-1-amd64, which appears to have a timekeeping bug. As a result the machine responds to packets only sporadically, so nothing can be done without long delays. To make matters worse, he’s using a filesystem that the host does not support, so we can’t just mount the LVM partition and put an older kernel on the machine.

Transferring the 22 MB kernel package stalls at 55%-66%, and rsync --partial times out without ever getting the file across. So we’re stuck with moving the file over in pieces.

Enter the split command

split -b 1m linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

which results in a bunch of files named xaa through xaw. Now we can transfer these 1 MB at a time, which takes quite a while, but we do get them moved over.

Once all the pieces are on the guest, reassemble them and checksum the result:

cat xa* > linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
md5sum linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

After verifying the checksum is correct:

dpkg -i linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
reboot
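On a link this flaky it can help to checksum each piece as well as the whole file, so that a damaged chunk can be re-sent on its own. What follows is only a minimal sketch, assuming md5sum is available on both ends; the function names, the chunk. prefix and the chunks.md5 manifest name are made up for illustration and are not what was used here:

```shell
#!/bin/bash
# Sender side: cut a file into 1 MiB pieces and record one checksum per piece.
# make_chunks, the "chunk." prefix and chunks.md5 are hypothetical names.
make_chunks() {
  local src=$1 outdir=$2
  split -b 1048576 "$src" "$outdir/chunk."
  ( cd "$outdir" && md5sum chunk.* > chunks.md5 )
}

# Receiver side: verify every piece, and reassemble only if all of them match.
join_chunks() {
  local dir=$1 dest=$2
  ( cd "$dir" && md5sum -c chunks.md5 > /dev/null ) || return 1
  cat "$dir"/chunk.* > "$dest"
}
```

The manifest is small enough to copy first, and a failed check tells you which piece to transfer again instead of starting over.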

However, this didn’t seem to fix the issue. Even a fresh installation doesn’t get working networking. I was at least able to mount the partition in another VM, one that was on ext3, so I could copy over the ext4 filesystem and mount it. For now, I probably need to pull the other VMs off that machine and get to the root of the issue, as I suspect rebooting any of them will result in the same problem.

Networking on the bare metal works fine. Networking on each of the still-running VMs works too, but on the VM I restarted and the one I just created it does not, even though both use the same scripts as before.

As it turns out, the kernel issue is related to the host. A new kernel was compiled, the instances were moved off and the host was rebooted into the new kernel. Everything appears to be working fine and the machine came right up on reboot. I’m not 100% happy with the kernel config, but things are working. It is amazing that the bug hadn’t been hit in the 480 days the host was up, but now that it has been identified and fixed, I was also able to apply a few tweaks which should speed things up a bit with some of the enhanced virtio drivers.

Make sure your KVM host machine has the loop device and every filesystem you expect a client might mount. While we did have backups that were seven days old, there was still some data worth retrieving.
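One way to check the filesystem half of that advice is to look at /proc/filesystems on the host. A rough sketch follows; the function name fs_supported is hypothetical and the list of filesystems is just an example. Note that a filesystem built as a module only shows up there once the module is loaded, so a miss does not always mean the kernel lacks support.

```shell
#!/bin/bash
# fs_supported NAME [FILE]: succeed if NAME is listed in FILE
# (default /proc/filesystems). The function name is hypothetical.
fs_supported() {
  local fs=$1 list=${2:-/proc/filesystems}
  # Each line is "[nodev]<TAB>name", so the name is always the last field.
  awk -v fs="$fs" '$NF == fs { found = 1 } END { exit !found }' "$list"
}

# Example list only; substitute whatever your clients are likely to use.
for fs in ext3 ext4 xfs; do
  if fs_supported "$fs"; then
    echo "$fs: available"
  else
    echo "$fs: not registered (a module may still need loading)"
  fi
done
```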

Finding my XFS Bug

Thursday, October 6th, 2011

Recently one of our servers had some filesystem corruption, and it was not the first time. As we make heavy use of hardlinks through rsync’s --link-dest option, I’m reasonably sure the issue is caused by the massive number of hardlinks and deletions that take place on that system.

I’ve written a small script to repeatedly test things and started it running a few minutes ago. My guess is that the problem should show up in a few days.

#!/bin/bash

RSYNC=/usr/bin/rsync
REVISIONS=10

# Run this from /tmp: new snapshots are written relative to the current
# directory and found again through the /tmp/2011* glob.
function rsync_kernel () {
  DATE=$(date +%Y%m%d%H%M%S)

  # Collect existing snapshots; the glob expands in sorted order, oldest first.
  BDATES=()
  for f in /tmp/2011*
  do
    [ -e "$f" ] || continue   # the glob matched nothing
    BDATES+=("$f")
  done

  CT=${#BDATES[@]}

  # Copy from (and hardlink against) the newest snapshot,
  # or the pristine kernel tree on the first run.
  if (( CT > 0 ))
  then
    RECENT=${BDATES[CT-1]}
  else
    RECENT="/tmp/linux-3.0.3"
  fi
  LINKDEST="--link-dest=$RECENT"

  $RSYNC -aplxo "$LINKDEST" "$RECENT/" "$DATE/"

  # Once REVISIONS snapshots exist, remove the oldest ones.
  if (( CT >= REVISIONS ))
  then
    DELFIRST=$(( CT - REVISIONS ))
    loop=0
    for d in "${BDATES[@]}"
    do
      if (( loop <= DELFIRST ))
      then
        rm -rf "$d"
      fi
      loop=$((loop+1))
    done
  fi
}

while true
do
  rsync_kernel
  echo .
  sleep 1
done
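To confirm that a run really is producing hardlinks rather than fresh copies, two snapshots can be compared file by file. This is only a small sketch: same_inode and link_count are hypothetical helper names, and link_count assumes GNU stat.

```shell
#!/bin/bash
# same_inode A B: succeed if A and B refer to the same file
# (same device and inode), as unchanged files in two --link-dest snapshots do.
same_inode() {
  [ "$1" -ef "$2" ]
}

# link_count FILE: print the number of hard links to FILE (GNU stat assumed).
link_count() {
  stat -c %h "$1"
}
```

With ten revisions kept, an unchanged file in the kernel tree would be expected to report a link count around ten, plus one for the original tree if it was the first link target.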

unable to mount root fs on unknown-block(0,0)

Sunday, January 31st, 2010

I always prefer to run a kernel that we’ve tuned in-house, and the new backup servers, which use an Adaptec 31205 controller, were no exception.

Upon booting into the kernel I had built, I received:

unable to mount root fs on unknown-block(0,0)

Since the array was very large, the Debian Installer automatically created an EFI GUID partition table (GPT), which my kernel was not configured to read.

In the kernel’s make menuconfig, go to File Systems, Partition Types and enable Advanced partition selection. Near the bottom is EFI GUID Partition support. Enable that, recompile your kernel and you should be set.
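Before rebuilding, the two options can be checked in the kernel’s .config, where they correspond to CONFIG_PARTITION_ADVANCED and CONFIG_EFI_PARTITION. A quick sketch, with kconfig_enabled and check_gpt_support as made-up helper names and .config in the current directory as the assumed default:

```shell
#!/bin/bash
# kconfig_enabled SYMBOL [FILE]: succeed if SYMBOL is set to y in FILE.
# Helper names are hypothetical; FILE defaults to ./.config.
kconfig_enabled() {
  local sym=$1 cfg=${2:-.config}
  grep -q "^${sym}=y\$" "$cfg"
}

# check_gpt_support [FILE]: report both options needed to read a GPT disk.
check_gpt_support() {
  local cfg=${1:-.config} sym rc=0
  for sym in CONFIG_PARTITION_ADVANCED CONFIG_EFI_PARTITION; do
    if kconfig_enabled "$sym" "$cfg"; then
      echo "$sym: enabled"
    else
      echo "$sym: missing"
      rc=1
    fi
  done
  return $rc
}
```

Running check_gpt_support from the top of the kernel source tree should list both symbols as enabled once the menu change has been saved.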

One reboot later and voila:

st1:/colobk1# uname -a
Linux st1 2.6.32.7 #1 SMP Fri Jan 29 21:43:32 EST 2010 x86_64 GNU/Linux
st1:/colobk1# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             462M  232M  207M  53% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                   10M   60K   10M   1% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda8              19T  305G   18T   2% /colobk1
/dev/sda5             1.9G   55M  1.8G   3% /home
/dev/sda4             949M  4.2M  945M   1% /tmp
/dev/sda6             2.4G  204M  2.2G   9% /usr
/dev/sda7             9.4G  237M  9.1G   3% /var
