KVM guest extremely slow, Bug in Host Linux 3.2.2 kernel
Friday, March 22nd, 2013
A client upgraded a KVM instance today and rebooted it, and now the machine is extremely slow.
The instance is a Debian system running kernel 3.1.0-1-amd64, which appears to have a timekeeping bug. As a result, the machine responds to packets only sporadically, so nothing can be done without a long delay. To make matters worse, he’s using a filesystem that the host doesn’t support, so we can’t simply mount the LVM partition on the host and put an older kernel on the machine.
Transferring the 22 MB kernel package stalls at 55-66%, and using rsync --partial just results in timeouts without ever completing the transfer. So we’re stuck finding some other way to move files around.
Enter the split command
split -b 1m linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
which produces a set of 1 MB files named xaa through xaw. Transferring them one at a time takes quite a while, but they do get moved over. On the guest, we reassemble the pieces and checksum the result:
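Since the pieces still have to cross the same flaky link, a small retry wrapper can save re-running failed transfers by hand. This is a minimal sketch assuming a POSIX-style shell; retry and RETRY_DELAY are hypothetical names, not something used in the original fix, and the rsync destination in the example comment is a placeholder.

```shell
# Hypothetical helper (not from the original post): run a command up to MAX
# times, sleeping RETRY_DELAY seconds (default 5) between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
    max=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "retry: giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example: send each piece on its own, so a timeout only costs one piece.
# "guest:/root/" is a placeholder destination.
#   for piece in xa*; do
#       retry 5 rsync --partial --timeout=60 "$piece" guest:/root/ || break
#   done
```

Retrying per piece rather than per file is the point of splitting: a timeout near the end only costs one 1 MB piece instead of the whole 22 MB package.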
cat xa* > linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
md5sum linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
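Comparing the final md5sum against the original by eye works, but a mismatch doesn’t say which piece is damaged. One hedged alternative, assuming the sender also ran md5sum xa* FILE > pieces.md5 before the transfer and copied that manifest over with the pieces, is to check every piece first and only then reassemble. join_and_verify and the manifest name pieces.md5 are hypothetical, not part of the original procedure.

```shell
# Hypothetical helper (not from the original post): rebuild FILE from split
# pieces (xaa, xab, ...) and check it against a manifest assumed to have been
# created on the sending side, before transfer, with:
#   md5sum xa* FILE > pieces.md5
join_and_verify() {
    file=$1
    manifest=${2:-pieces.md5}
    # Check the pieces first; md5sum -c names any piece that needs re-sending.
    awk '$2 ~ /^xa[a-z]+$/' "$manifest" | md5sum -c - || return 1
    # The glob expands in sorted order (xaa, xab, ...), the order split wrote.
    cat xa* > "$file" || return 1
    # Finally check the rebuilt file against the original file's checksum.
    awk -v f="$file" '$2 == f' "$manifest" | md5sum -c - || return 1
}
```

Usage on the guest would be join_and_verify linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb pieces.md5, run in the directory holding the pieces.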
After verifying the checksum is correct:
dpkg -i linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
reboot
However, this didn’t fix the issue. Even a fresh installation can’t get the network working properly. I was, however, able to mount the partition in another VM (itself running ext3) that could mount the ext4 filesystem, so I could copy the data over. For now, I probably need to move the other VMs off that machine and get to the root of the issue, as I suspect rebooting either of them will result in the same problem.
Networking on the bare metal works fine, and networking on each of the still-running VMs works too. On the VM I restarted and on the one I just created, however, networking does not work properly, even though both use the same scripts that were used before.
As it turns out, the bug is in the host kernel. A new kernel was compiled, the instances were moved off, and the host was rebooted into the new kernel. Everything appears to be working, and the machine came right back up after the reboot. I’m not 100% happy with the kernel config, but things are working. It’s amazing that the bug hadn’t been hit in the 480 days the host was up, but now that it has been identified and fixed, I was also able to apply a few tweaks using some of the enhanced virtio drivers, which should speed things up a bit.
Make sure your KVM host has the loop device and support for every filesystem a client might use. While we did have backups, they were seven days old, and there was still some data worth retrieving.
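One way to act on that advice ahead of time is to check which filesystem types the host kernel currently has registered. This sketch reads /proc/filesystems; missing_filesystems and FS_LIST are hypothetical names. A filesystem built as a module only shows up there once the module is loaded (for example with modprobe ext4), so a missing entry means "not currently loaded", not necessarily "unsupported".

```shell
# Hypothetical helper (not from the original post): print each filesystem
# type given as an argument that the running kernel does not list in
# /proc/filesystems, and return 1 if any are missing. FS_LIST can point at
# another file, which is handy for testing.
missing_filesystems() {
    list=${FS_LIST:-/proc/filesystems}
    rc=0
    for fs in "$@"; do
        # Lines look like "nodev<TAB>sysfs" or "<TAB>ext4"; the type is the
        # last whitespace-separated field.
        if ! awk '{ print $NF }' "$list" | grep -qxF "$fs"; then
            echo "$fs"
            rc=1
        fi
    done
    return "$rc"
}

# Example: missing_filesystems ext3 ext4 xfs || echo "load or build the types listed above"
```

For the loop device, losetup -f prints the first unused loop device and exits with an error if none can be found, which makes a quick companion check.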