Posts Tagged ‘linux’

From Dedicated Server to KVM Virtualized Instance

Sunday, June 23rd, 2013

Recently we’ve had a few clients who wanted to downsize from a dedicated server to a virtualized instance on a private virtual server. In the past, that has been a very time-consuming process. While it would probably be better to upgrade these clients to 64-bit and take advantage of a fresh OS load, sometimes there are issues that preclude this. One potential problem is not having a recent enough kernel to work with KVM if you are using a kernel that was built specifically for your setup.

What follows is a general recipe for migrating these machines. Most of the process is ‘hurry up and wait’; the bulk of the time is simply waiting for data to be copied. This guide starts from bare metal and runs through migrating the first machine.

On the KVM box

  • install Linux
  • install KVM, virt tools, and any base utilities (our minimal install includes libvirt-bin qemu-kvm virtinst rsync bridge-utils cgroup-bin nvi)
  • Download the .iso for your initial build
  • Determine disk size for each instance
  • Install the base image to a COW file or LVM partition. From this, you’ll clone it to your new instances. Make sure the COW file is created with the same size as the resulting image; otherwise you’ll need to resize via LVM and your underlying filesystem. (See the sketch after this list.)
  • pvcreate /dev/sda2 (or /dev/md1 if you use software rather than hardware raid)
  • vgcreate -s 16M vg0 /dev/sda2
  • lvcreate -L 80G -n c1 vg0
  • virt-clone --original base_image --name newvirtualmachine --file=/dev/vg0/c1
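
For the base image install itself, something along these lines works. This is only a sketch: the volume name, memory, ISO path, and bridge are placeholder values, not the ones we used.

# Create an LVM volume for the base image, then install from the ISO.
# All names and sizes here are examples.
lvcreate -L 80G -n base vg0
virt-install \
  --name base_image \
  --ram 1024 \
  --vcpus 1 \
  --disk path=/dev/vg0/base \
  --cdrom /path/to/install.iso \
  --network bridge=br0 \
  --graphics vnc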

On the machine being moved

  • Install grub2 (if not already running grub2)
  • Edit /etc/default/grub and disable UUIDs (GRUB_DISABLE_LINUX_UUID=true)
  • run update-grub
  • Install new kernel
  • Paths changing? Modify /etc/fstab now, or plan on fixing it over VNC after the first boot. If done via VNC, note that the machine may come up in single-user mode because it will fail the fsck on devices that are no longer present.
  • rsync the data over (are current logs still needed?); see the sketch after this list
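
As a rough example of the data copy, assuming the new instance is already booted from the cloned base image and reachable over SSH at a temporary, placeholder address, something like this run from the dedicated server does the bulk of the work:

# Copy everything except pseudo-filesystems and anything you plan to recreate.
# The target address and exclude list are examples only.
rsync -aHxv --numeric-ids \
  --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp \
  / root@192.0.2.50:/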

Ready for the switch

On dedicated server

  • a secondary network interface is helpful; IPv6 on the primary interface also works
  • ifconfig the primary interface to a temporary IP and add a default route (see the sketch below)
  • restart firewall (if pinned to primary ethernet)
  • log out, log back in using temporary IP
  • remove ipv6
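
As a sketch of that temporary re-IP step, assuming eth0 is the primary interface and using placeholder addresses:

# Move the dedicated server to a temporary address so the production IP
# can follow the new VM. Interface, IP, and gateway are placeholders.
ifconfig eth0 192.0.2.10 netmask 255.255.255.0
route add default gw 192.0.2.1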

On KVM Machine

  • virsh start newinstancename
  • connect via vnc
  • clear arp on your routers
  • dpkg-reconfigure grub-pc (sometimes grub is not recognized on the QEMU hard drive)
  • verify swap (double check /etc/fstab)

After grub is reinstalled, reboot just to ensure the machine comes up with no issues.
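
A minimal sketch of the cutover on the KVM host follows; the instance name, bridge, and IP are placeholders, and arping is simply one way to refresh upstream ARP caches if you can’t clear them on the routers directly:

virsh start newinstancename
virsh vncdisplay newinstancename   # find the VNC display to connect to
# Gratuitous ARP from the host helps nudge stale ARP entries upstream.
arping -c 3 -U -I br0 192.0.2.20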

Some Tools

KVM Shell Scripts used when we migrated a number of machines.

Kernel Config Notes for KVM

If you are building your own kernels, here are some notes.

Make sure the following are enabled in your KVM host kernel

  • High-res timers (CONFIG_HIGH_RES_TIMERS, required for most of the virtualization options)
  • CPU task/time accounting (if desired)
  • Bridge support (CONFIG_BRIDGE)
  • CONFIG_VIRTIO
  • CONFIG_VIRTIO_NET
  • CONFIG_VIRTIO_BLK
  • CONFIG_SCSI_VIRTIO
  • CONFIG_VIRTIO_CONSOLE
  • CONFIG_VIRTIO_PCI
  • CONFIG_VIRTIO_BALLOON
  • CONFIG_VIRTIO_MMIO
  • CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES

Make sure the following are enabled in your guest kernel

  • sym53c8xx
  • virtio-pci
  • PIIX_IDE

Remember that your guest kernel is running on the underlying hardware of your KVM host. The guest kernel should have its CPU type set based on the host’s CPU to take advantage of any hardware optimizations the CPU may have.
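
If you want to sanity-check an existing config for these options, a quick grep is enough; the /boot/config path is the usual Debian location and is an assumption here:

# Check the running host kernel for the timer, bridge, and virtio options above.
grep -E 'CONFIG_HIGH_RES_TIMERS|CONFIG_BRIDGE=|CONFIG_VIRTIO' /boot/config-$(uname -r)
# Or check the .config of a kernel you are building for the guest.
grep -E 'CONFIG_VIRTIO|CONFIG_SCSI_VIRTIO|PIIX' .config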

KVM guest extremely slow, Bug in Host Linux 3.2.2 kernel

Friday, March 22nd, 2013

A client upgraded a KVM instance today and rebooted it, and the machine came back extremely slow.

The instance is a Debian system running 3.1.0-1-amd64, which appears to have a bug with timekeeping. This causes the machine to respond to packets very sporadically, which means nothing can be done without long delays. To make matters worse, he’s using a filesystem that is not supported on the host, so we can’t just mount the LVM partition and put an older kernel on the machine.

Transferring the 22MB kernel package stalls at 55%-66%, and using rsync --partial results in timeouts and never gets the file transferred. So we’re stuck with finding another way to move the file.

Enter the split command

split -b 1m linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

which results in a bunch of files named xaa through xaw. Now we can transfer the pieces 1MB at a time; it takes quite a while, but we get them moved over.
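
The transfer itself was just copying the pieces one at a time; as a rough sketch (the hostname is a placeholder):

# Copy each 1MB piece individually; a stalled piece can simply be re-sent.
for f in xa?; do
  scp $f root@guest.example.com:/root/
done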

cat xa* > linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
md5sum linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

After verifying the checksum is correct:

dpkg -i linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
reboot

However, this didn’t seem to fix the issue. Even a freshly created installation couldn’t get networking to work properly. I was, however, able to attach the partition to another VM (one running from an ext3 root) and mount the ext4 filesystem there so I could copy the data off. For now, I probably need to pull the other VMs off that machine and get to the root of the issue, as I suspect rebooting either of them will result in the same problem.

Networking on the bare metal works fine. Networking on each of the still-running VMs works as well, but on the VM I restarted and the one I just created, networking does not work properly, even though both use the same scripts that had been used before.

As it turns out, the kernel issue is related to the host. A new kernel was compiled, the instances were moved off, and the host was rebooted into the new kernel. Everything appears to be working fine and the machine came right up on reboot. I’m not 100% happy with the kernel config, but things are working. It’s amazing that the bug hadn’t been hit in the 480 days the host had been up, but now that it has been identified and fixed, I was also able to apply a few tweaks, including some of the enhanced virtio drivers, which should speed things up a bit.

Make sure your KVM host machine has the loop device and every filesystem you expect a client might mount. While we did have backups that were seven days old, there was still some data worth retrieving.
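
As a rough sketch of that host-side recovery, assuming the guest’s disk lives on /dev/vg0/c1 and the host kernel has loop, device-mapper, and ext4 support (the volume and partition names are placeholders):

# Map the partitions inside the guest's volume onto the host, then mount
# the root filesystem read-only to copy data off.
kpartx -av /dev/vg0/c1
mount -o ro /dev/mapper/vg0-c1p1 /mnt
# ...copy what is needed, then clean up...
umount /mnt
kpartx -d /dev/vg0/c1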

Changing Linux Mint to boot off an mdadm raidset

Sunday, October 14th, 2012

I installed Linux Mint on a machine but wanted to use RAID 1 for the drive. However, even in the custom installation with both drives in place, I saw no way to configure RAID during installation. Since we do this sort of thing quite frequently, I figured I would write a quick guide detailing the process.

apt-get install mdadm
cfdisk /dev/sdb

When you install mdadm, it’ll ask whether you want to boot if the primary boot partition is degraded, i.e. one of the drives has failed. You will want to answer Yes to this now; you can change it later. Since we are creating a RAID-1 array that starts out degraded, answering No would prevent your system from booting. Use cfdisk to create your partition on /dev/sdb, then create the array:

mdadm --create --run --metadata=0.90 --force --level=1 --raid-devices=1 /dev/md127 /dev/sdb1

We create the array using the old 0.90 metadata format just in case you ever boot a kernel without an initrd; 0.90 is the format the kernel’s RAID autodetection understands.

mkfs -t ext4 /dev/md127
mount /dev/md127 /mnt
rsync -aplx / /mnt/
rsync -aplx /dev/ /mnt/dev/
vi /etc/default/grub

Uncomment the line that says GRUB_DISABLE_LINUX_UUID=true so that it takes effect.

dpkg-reconfigure grub-pc

Make sure grub is written to the second drive.

vi /mnt/etc/fstab

and change the root entry from /dev/sda1 to /dev/md127.
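
For example, the root line in the copied fstab might end up looking something like this (the mount options shown are typical Debian/Mint defaults, not values copied from the original system):

# /mnt/etc/fstab on the copied system: root now lives on the md device
/dev/md127  /  ext4  errors=remount-ro  0  1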

Halt the machine, remove the primary drive, and make sure grub boots into the RAID volume properly; sdb will now become sda. If, on boot, grub complains that it cannot find the root disk or drops into rescue mode, type:

insmod normal
normal

Then it should boot. You might need to hit ‘e’ to edit the command line and set the kernel’s root option to root=/dev/md127.

After the system comes up, run

dpkg-reconfigure grub-pc

to get everything reconfigured properly. It’ll say it detected a drive that wasn’t in the boot sequence and prompt you to rewrite grub. Why Mint changes the device to md127, I don’t know.

After you’ve done that, halt, reconnect the old ‘sda’ as ‘sdb’ and bring the system up.

At this point, you are booting off the raid set and we just need to make a few changes to the raid configuration, then, add the other drive to the raid set.

cfdisk /dev/sdb

Change the partition type from 83 to FD (Linux raid autodetect). You might need to reboot if your controller doesn’t properly handle the ioctl to reread the partition table and/or tells you that /dev/sdb1 is too small to join the array.

mdadm --grow /dev/md127 --raid-disks=2 --force
mdadm --add /dev/md127 /dev/sdb1
echo 40000 > /proc/sys/dev/raid/speed_limit_min
echo 100000 > /proc/sys/dev/raid/speed_limit_max
watch cat /proc/mdstat
/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf
dpkg-reconfigure mdadm

The watch cat /proc/mdstat command will show you the progress as the array is mirrored, while the final two commands regenerate /etc/mdadm/mdadm.conf and let mdadm update its boot configuration. Once the resync finishes, you should be set.
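
To double-check the final state before rebooting, a quick look at the array is enough; this is just a generic verification, nothing Mint-specific:

# Both members should show as active and in sync once the rebuild completes.
cat /proc/mdstat
mdadm --detail /dev/md127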

Reboot your machine and it should come up running Raid 1 on the boot drive.

Finding my XFS Bug

Thursday, October 6th, 2011

Recently one of our servers had some filesystem corruption, corruption that has occurred more than once over time. As we use hardlinks a lot with rsync and --link-dest, I’m reasonably sure the issue is due to the massive number of hardlinks and deletions that take place on that system.

I’ve written a small script to repeatedly test things and started it running a few minutes ago. My guess is that the problem should show up in a few days.

#!/bin/bash
# Repeatedly create hardlink-heavy snapshots of a kernel tree with rsync
# --link-dest, pruning old copies, to try to reproduce the XFS corruption.

RSYNC=/usr/bin/rsync
REVISIONS=10

# Work in /tmp, where the dated snapshot directories live.
cd /tmp || exit 1

function rsync_kernel () {
  DATE=`date +%Y%m%d%H%M%S`

  # Gather the existing snapshot directories, oldest first.
  BDATES=()
  loop=0
  for f in `ls -d1 /tmp/2011*`
  do
    BDATES[$loop]=$f
    loop=$(($loop+1))
  done

  CT=${#BDATES[*]}

  if (( $CT > 0 ))
  then
    # Hardlink against the most recent snapshot.
    RECENT=${BDATES[$(($CT-1))]}
    LINKDEST=" --link-dest=$RECENT"
  else
    # First run: seed from an unpacked kernel tree.
    RECENT="/tmp/linux-3.0.3"
    LINKDEST=" --link-dest=/tmp/linux-3.0.3"
  fi

  $RSYNC -aplxo $LINKDEST $RECENT/ $DATE/

  # Prune the oldest snapshots once we exceed $REVISIONS copies.
  if (( ${#BDATES[*]} >= $REVISIONS ))
  then
    DELFIRST=$(( ${#BDATES[*]} - $REVISIONS ))
    loop=0
    for d in ${BDATES[*]}
      do
        if (( $loop <= $DELFIRST ))
        then
          rm -rf $d
        fi
        loop=$(($loop+1))
      done
  fi
}

while true
do
  rsync_kernel
  echo .
  sleep 1
done

Adaptec 31205 under Debian

Saturday, September 25th, 2010

We have a storage server with eleven 2TB drives in a RAID 5. During a recent visit we heard the alarm, but no red fault light was visible on any drive, nor was the light on the front of the chassis lit. Knowing it was a problem waiting to happen, but unable to see which drive had caused the array to fail, we scheduled a maintenance window that happened to coincide with a kernel upgrade.

In the meantime, we attempted to install the RPM and Java management application, to no avail, so we weren’t able to read the controller status to find out what the problem was.

When we rebooted the machine, the array status was degraded and it prompted us to hit Enter to accept the configuration or Ctrl-A to enter the admin. We entered the admin and looked at Manage Array; all drives were present and working. Immediately the array status changed to rebuilding, with no indication which drive had failed and was being re-added.

After we exited the admin and saved the config, the client said to pull the machine offline until it was fixed. This started what seemed like an endless process. We figured we would let it rebuild while it was online but disabled in the cluster. We installed a new kernel, 2.6.36-rc5, rebooted, and this is where the trouble started. On boot, the new kernel got an I/O error, the channel hung, it forced a reset, and then it sat there for about 45 seconds. After it continued, it panicked because it was unable to read /dev/sda1.

Rebooting and entering the admin, we were faced with an array that was marked offline. After identifying each of the drives through Disk Utils to make sure they were recognized, we forced the array back online and rebooted into the old kernel. As it turns out, something in our 2.6.36-rc5 build disables the array and sets it offline. It took 18 hours to rebuild the array and return it to Optimal status.

After the machine came up, we knew we had a problem with one of the directories on the system, and this seemed like an opportune time to run xfs_repair. About 40 minutes in, we ran into an I/O error with a huge block number and, bam, the array was offline again.

In Disk Util in the ROM we started the test on the first drive. It took 5.5 hours to run through the first disk, which put us at an estimated 60+ hours to check all 11 drives in the array. smartctl wouldn’t let us check the drives independently through the controller, so we fired up a second machine and mounted each of the drives, looking for any telltale signs in the S.M.A.R.T. data stored on the drives. Two drives showed some abnormal numbers, giving us an estimated 11 hours to check those disks. 5.5 hours later the first disk came back clean; less than 30 minutes after that, we had our culprit. Relocating a number of bad sectors caused the controller to hang again, yet there was no red fault light anywhere to be seen and no indication in the Adaptec manager that this drive was bad.
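
For reference, pulling the S.M.A.R.T. data on the second machine is just a matter of the following; the device name is whatever each drive shows up as there:

# Print the full SMART report; reallocated and pending sector counts are
# the usual telltale signs of a failing drive.
smartctl -a /dev/sdb
# Optionally run a long self-test and check the results later.
smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb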

Replacing the drive and going back into the admin showed us a greyed-out drive which immediately started reconstructing. We rebooted the system into the older kernel and started xfs_repair again. After two hours it had run into a number of errors, but no I/O errors.

It is obvious we’ve had some corruption for quite some time. We had a directory we couldn’t delete because it claimed it had files, yet no files were visible in it. We had two directories with files that we couldn’t do anything with and couldn’t even mv to an area outside our working directories. We figured it was an XFS bug we had hit due to the 18-terabyte size of the partition and guessed that an xfs_repair would fix it, so it remained a minor annoyance to the client until we could get to a maintenance interval. In reality, this should have been a sign that we had deeper issues, and we should have pushed the client harder to let us diagnose it much earlier. There is some data corruption, but this is the second of a pair of backup servers for their cluster, and resyncing the data from a known good source will fix it without too much difficulty.

After four hours, xfs_repair is reporting issues like:


bad directory block magic # 0 in block 0 for directory inode 21491241467
corrupt block 0 in directory inode 21491241467
        will junk block
no . entry for directory 21491241467
no .. entry for directory 21491241467
problem with directory contents in inode 21491241467
cleared inode 21491241467
        - agno = 6
        - agno = 7
        - agno = 8
bad directory block magic # 0 in block 1947 for directory inode 34377945042
corrupt block 1947 in directory inode 34377945042
        will junk block
bad directory block magic # 0 in block 1129 for directory inode 34973370147
corrupt block 1129 in directory inode 34973370147
        will junk block
bad directory block magic # 0 in block 3175 for directory inode 34973370147
corrupt block 3175 in directory inode 34973370147
        will junk block

It appears that we have quite a bit of data corruption due to a bad drive which is precisely why we use Raid.

The array failed, so why didn’t the Adaptec on-board manager know which drive had failed? Even had we gotten the Java application to run, I’m still not convinced it would have told us which drive was throwing the array into degraded status. Obviously the card knew something was wrong, since the alarm was on. Each drive has a fault light and an activity light, yet all of the drives allowed the array to be rebuilt and the status was reported as Optimal. During initialization the Adaptec does light the fault and activity lights for each drive, so it seems reasonable that when a drive encountered errors it could have lit that drive’s fault light so we knew which one to replace. And when xfs_repair received the I/O error on a block the controller couldn’t relocate, why didn’t the controller immediately fail the drive?

All in all, I’m not too happy with Adaptec right now. A 2TB hard drive failed, and it cost us roughly 60 hours to diagnose the failure and put the machine back into service. The failing drive should have been flagged and removed from the RAID set immediately. Even though the array was running in degraded mode, we shouldn’t have seen any corruption; yet xfs_repair is finding a considerable number of errors.

The drives report roughly 5600 hours online, which corresponds to the eight months we’ve had the machine in service. Based on the number of bad files xfs_repair is finding, I believe that drive had been failing for quite some time and Adaptec failed us. While we run a considerable number of Adaptec controllers, we’ve never seen a failure like this.
