KVM guest extremely slow, Bug in Host Linux 3.2.2 kernel

March 22nd, 2013

Client upgraded a KVM instance today, rebooted it and the machine is extremely slow.

The instance is a Debian system and running 3.1.0-1-amd64 which appears to have a bug with time. This causes the machine to respond to packets very sporadically which doesn’t allow anything to be done without a lot of delay. To make matters worse, he’s using a filesystem that is not supported on the host so we can’t just mount the LVM partition and put an older kernel on the machine.

Transferring the 22mb kernel stops at 55%-66%, using rsync –partial results in timeouts and never gets the file transferred. So, we’re stuck with trying to move files around.

Enter the split command

split -b 1m linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

which results in a bunch of files named xaa through xaw. Now we can transfer these 1mb at a time which takes quite a bit of time, but, we get them moved over.

cat xa* > linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb
md5sum linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

After verifying the checksum is correct:

dpkg -i linux-image-3.2.0-2-amd64_3.2.17-1_amd64.deb

However, this didn’t seem to fix the issue. Even creating a fresh installation doesn’t allow the network to work properly, but, I was able to mount the partition in another VM that was ext3 so I could copy over the ext4 filesystem and be able to mount it. For now, I need to probably pull the other VMs off that machine and get down to the root of the issue as I suspect rebooting either will result in the same problem.

Networking on the bare metal works fine. Networking on each of the still running VMs is working, but, on the VM I restarted and the one I just created, networking is not working properly, and, both are using the same scripts that had been used before.

As it turns out, the kernel issue is related to the host. A new kernel was compiled, instances moved off and the host was rebooted into the new kernel. Everything appears to be working fine and the machine came right up on reboot. I’m not 100% happy with the kernel config, but, things are working. Amazing that the bug hadn’t been hit in 480 days that the host was up, but, now that it was identified and fixed, I was also able to apply a few tweaks which should speed things up a bit with some of the enhanced virtio drivers.

Make sure your KVM host machine has the loop device and every filesystem you expect a client might mount. While we did have backups that were seven days old, there was still some data worth retrieving.

btrfs gets very slow, metadata almost full

March 7th, 2013

One of our storage servers that has had problems in the past. Originally it seemed like XFS was having a problem with the large filesystem, so, we gambled and decided to use btrfs. After eight days running, the machine has gotten extremely slow for disk I/O to the point where backups that should take minutes, were taking hours.

Switching the disk scheduler from cfq to noop to deadline appeared to have only short-term benefits at which point the machine bogged down again.

We’re running an Adaptec 31205 with 11 Western Digital 2.0 terabyte drives in hardware Raid 5 with roughly 19 terabytes accessible on our filesystem. During the first few days of backups, we would easily hit 800mb/sec inbound, but, after a few machines had been backed up to the server, 100mb/sec was optimistic with 20-40mb/sec being more normal. We originally attributed this to rsync of thousands of smaller files rather than the large files moved on some of the earlier machines. Once we started overlapping machines to get their second generational backup, the problem was much more evident.

The Filesystem:

# df -h /colobk1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8        19T  8.6T  9.6T  48% /colobk1

# btrfs fi show
Label: none  uuid: 3cd405c7-5d7d-42bd-a630-86ec3ca452d7
	Total devices 1 FS bytes used 8.44TB
	devid    1 size 18.14TB used 8.55TB path /dev/sda8

Btrfs Btrfs v0.19

# btrfs filesystem df /colobk1
Data: total=8.34TB, used=8.34TB
System, DUP: total=8.00MB, used=940.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=106.25GB, used=104.91GB
Metadata: total=8.00MB, used=0.00

The machine

# uname -a
Linux st1 3.8.0 #1 SMP Tue Feb 19 16:09:18 EST 2013 x86_64 GNU/Linux

# btrfs --version
Btrfs Btrfs v0.19

As it stands, we appear to be running out of Metadata space. Since our used metadata space is more than 75% of our total metadata space, updates are taking forever. The initial filesystem was not created with any special inode or leaf parameters, so, it is using the defaults.

The btrfs wiki points to this particular tuning option which seems like it might do the trick. Since you can run the balance while the filesystem is in use and check its status, we should be able to see whether it is making a difference.

I don’t believe it is going to make a difference as we have only a single device exposed to btrfs, but, here’s the command we’re told to use:

btrfs fi balance start -dusage=5 /colobk1

After a while, the box returned with:

# btrfs fi balance start -dusage=5 /colobk1
Done, had to relocate 0 out of 8712 chunks

# btrfs fi df /colobk1
Data: total=8.34TB, used=8.34TB
System, DUP: total=8.00MB, used=940.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=107.25GB, used=104.95GB
Metadata: total=8.00MB, used=0.00

So it added 1GB to the metadata size. At first glance, it is still taking considerable time to do the backup of a single machine of 9.7gb – over 2 hours and 8 minutes when the first backup took under 50 minutes. I would say that the balance didn’t do anything positive as we have a single device. I suspect that the leafsize and nodesize might be the difference here – requiring a format and backup of 8.6 terabytes of data again. It took two and a half minutes to unmount the partition after it had bogged down and after running the balance.

mkfs -t btrfs -l 32768 -n 32768 /dev/sda8

# btrfs fi df /colobk1
Data: total=8.00MB, used=0.00
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=192.00KB
Metadata: total=8.00MB, used=0.00

# df -h /colobk1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8        19T   72M   19T   1% /colobk1

XFS took 52 minutes to back up the machine. XFS properly tuned took 51 minutes. Btrfs tested with the leafnode set took 51 minutes. I suspect I need to run things for a week to get the extent’s close to filled again and check it again. In any case, it is a lot faster than it was with the default settings.

* Official btrfs wiki

uwsgi version 1.2.3, Debian, and Pyramid

February 23rd, 2013

First, we need to install uwsgi:

apt-get update
apt-get install uwsgi uwsgi-plugin-python

Then we need to create a .ini file that goes into /etc/uwsgi/apps-available:

uid = username
gid = username
workers = 4
buffer-size = 25000
chmod-socket = 666
single-interpreter = true
master = true
socket = /tmp/project.sock
plugin = python
pythonpath = /var/www/virtualenv/bin
paste = config:/var/www/virtualenv/project/production.ini
home = /var/www/virtualenv

Then create the symlink:

ln -s /etc/uwsgi/apps-available/project.ini /etc/uwsgi/apps-enabled/project.ini

Then restart uwsgi:

/etc/init.d/uwsgi restart

If it doesn’t work, a few things might need to be checked. First, you need to make sure your virtualenv is using Python 2.7.

If you still need to debug, you can do

uwsgi --ini /etc/uwsgi/apps-enabled/project.ini

which will give you some output at which point you should be able to determine the issue.

For Nginx, the config on your site should contain something like the following:

    location / {
        uwsgi_pass  unix:/tmp/project.sock;
        include     uwsgi_params;

and /etc/nginx/uwsgi_params should contain

uwsgi_param  QUERY_STRING       $query_string;
uwsgi_param  REQUEST_METHOD     $request_method;
uwsgi_param  CONTENT_TYPE       $content_type;
uwsgi_param  CONTENT_LENGTH     $content_length;

uwsgi_param  REQUEST_URI        $request_uri;
uwsgi_param  PATH_INFO		$document_uri;
uwsgi_param  DOCUMENT_ROOT      $document_root;
uwsgi_param  SERVER_PROTOCOL    $server_protocol;

uwsgi_param  REMOTE_ADDR        $remote_addr;
uwsgi_param  REMOTE_PORT        $remote_port;
uwsgi_param  SERVER_PORT        $server_port;
uwsgi_param  SERVER_NAME        $server_name;

uwsgi_param  SCRIPT_NAME	'';

Using temporary; Using filesort

December 2nd, 2012

Ahh the dreaded temporary table and filesort. This is one performance killer that is incredibly bad on a high traffic site and the cause is fairly easy to explain.

MySQL tries to keep a result set in memory. When the query plan optimizer checks the number of rows that might be returned, it also looks at the table structure. In this case, we have text fields, and, a lot of them. However, it only takes one for MySQL to decide to write to disk.

The fix for this is somewhat simple to explain, but, may be a little difficult to implement. In our case, we have 91000 lines of some very poorly written php code that ‘builds’ the command through string concatenation, allowing for unique prefixes and tablenames. Houdini would be proud at the misdirection in this application, but, we’ve found the query through the MySQL slow query log, and we can fix it there, then, figure out where to modify the code.

Heart of the problem

select * from tablea,tableb where tablea.a=1 and tablea.b=2 and tablea.c=3 and tablea.id=tableb.id;

Of course, the initial application had no indexes on the 35000 row table. If you’re interested in some blog posts I wrote about indexing, MySQL Query Optimization, MySQL 5.1’s Query Optimizer and Designing MySQL Indices.

What is the solution to dealing with queries that return text fields?

Creative use of Subqueries is needed.

SELECT * from tablea,tableb where tablea.id in (SELECT id from tablea where a=1 and b=2 and c=3) and tablea.id=tableb.id;

But wait, I need a limit clause in my subselect and MySQL says:

ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

Now we modify the query slightly to use a join:

select * from tablea where tablea.id join (SELECT id from tablea where a=1 and b=2 and c=3 order by c desc limit 15) subq on subq.id=tablea.id,tableb where tablea.id=tableb.id;

We’ve avoided the creation of the temporary table, we’ve avoided the filesort and we’ve saved ten seconds off this query which is loaded on every pageload.

Now to convince this person that they don’t need to regenerate the page on every pageload – only when they are adding content. But, that’s an argument for another day.

Changing Linux Mint to boot off an mdadm raidset

October 14th, 2012

I installed Linux Mint on a machine, but, wanted to use Raid 1 for the drive. However, even through the custom installation with both drives in place, I saw no way to configure Raid on installation. Since we do this sort of thing quite frequently, I figured I would write a quick guide detailing the proces.

apt-get install mdadm
cfdisk /dev/sdb

When you install mdadm, it’ll ask you if you want to boot if the primary boot partition is degraded, i.e. one of the primary drives has failed. You will want to answer Yes to this now, but, can change this later. Since we are creating a Raid-1 partition that is already degraded, it’ll prevent your system from booting. Create your partitions

mdadm --create --run --metadata=0.90 --force --level=1 --raid-devices=1 /dev/md127 /dev/sdb1

We create the raid partition using the old metadata 0.9.0 just in case you ever use a kernel that doesn’t have an initrd.

mkfs -t ext4 /dev/md127
mount /dev/md0 /mnt
rsync -aplx / /mnt/
rsync -aplx /dev/ /mnt/dev/
vi /etc/default/grub

Uncomment the line that says: GRUB_DISABLE_LINUX-UUID=true so that it is enabled.

dpkg-reconfigure grub-pc

Make sure grub is written to the second drive.

vi /mnt/etc/fstab

change reference from /dev/sda1 to /dev/md127

Halt the machine, remove the primary drive, make sure grub boots into the raid volume properly. sdb will now become sda. If on boot, grub complains that it cannot find the root disk, or, goes into rescue mode:

insmod normal

Then, it should boot. You might need to hit e to edit the command line to set the linux kernel option for root to root=/dev/md127

After the system comes up, run

dpkg-reconfigure grub-pc

to get everything reconfigured properly. It’ll say that it detected a drive that wasn’t in the boot sequence and prompt to rewrite grub. Why mint changes the device to md127, I don’t know.

After you’ve done that, halt, reconnect the old ‘sda’ as ‘sdb’ and bring the system up.

At this point, you are booting off the raid set and we just need to make a few changes to the raid configuration, then, add the other drive to the raid set.

cfdisk /dev/sdb

Change the partition type from 83 to FD. You might need to reboot if your controller doesn’t properly handle the ioctl change and/or tells you that /dev/sdb1 is too small to join the array.

mdadm --grow /dev/md127 --raid-disks=2 --force
mdadm --add /dev/md127 /dev/sdb1
echo 40000 > /proc/sys/dev/raid/speed_limit_min
echo 100000 > /proc/sys/dev/raid/speed_limit_max
watch cat /proc/mdstat
/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf
dpkg-reconfigure mdadm

The last command will show you the progress as the partition is being mirrored. Once this finishes, you should be set.

Reboot your machine and it should come up running Raid 1 on the boot drive.

Entries (RSS) and Comments (RSS).
Cluster host: li