Archive for the ‘Web Infrastructure’ Category

Debian pushes breaking changes… again.

Sunday, May 10th, 2015

My server backup script broke a while back – probably around December 14th, 2014, when a Python update Debian pushed broke my virtualenv. This isn’t the first time Debian has broken virtualenvs; my last post was about exactly that. In addition, the backup script filled the backup drive without triggering an exception, which is odd, because the source didn’t exceed the size of the destination drive even before things broke. The script just does a simple rsync, so it wouldn’t have duplicated symlinks.

It finished its backup last night in about 30 minutes (it usually takes 5-7 minutes) and now has 6 GB free, which matches my server.
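For what it’s worth, the failure is easy to guard against: the wrapper only needs to check rsync’s exit status and the free space left on the destination. A minimal sketch of that idea in Python – the paths are placeholders, not my actual script:

import shutil
import subprocess

SRC = "/srv/data/"          # placeholder source tree
DST = "/mnt/backup/data/"   # placeholder backup mount

# rsync exits non-zero on errors, including a destination that fills up
status = subprocess.call(["rsync", "-a", "--delete", SRC, DST])
if status != 0:
    raise RuntimeError("rsync exited with status %d" % status)

# sanity check: complain if the backup drive is nearly full
if shutil.disk_usage(DST).free < 5 * 1024 ** 3:   # less than ~5 GB left
    raise RuntimeError("backup destination is nearly full")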

I only recognized my backups were failing because Debian pushed another change in which Varnish overwrote the startup script without asking – and I had to reconstruct that file from the machine’s configuration.

At times, I wonder why OS upgrades push breaking changes without any mention. The conversion to systemd was no different.

Amazon EC2 + EBS and rsync as a quick backup/mirror

Wednesday, August 28th, 2013

A few days ago a client came to me and asked how he could back up a large amount of data on a nightly basis when only a few files change each day. We discussed several options – his current hosting provider’s offerings, Amazon’s S3 and Glacier – but none gave him the flexibility he was after. So the logical conclusion was Amazon EC2 + Elastic Block Store (EBS).

We created a Debian micro instance, logged in and added rsync, and created a volume large enough to hold his backup with some room for growth; with a little command-line magic he was able to mirror his data to EBS. Of course, the instance didn’t need to be running all the time, so he would have to start it, run the rsync, and shut it down.

After some thought I decided it would be easy enough to write a quick Python script using boto to start the instance, rsync the volume and stop the instance. If he needed access to the instance he could start it manually and log in when needed. Now, his backups could be run via cron on a regular basis.

I put the code on GitHub: https://github.com/cd34/spawncamping-octo-ninja

Most of the instance startup situations are handled and so far it seems robust.
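The script on GitHub handles the corner cases, but the core flow is small. Here is a minimal sketch of the same idea – not the actual script – written against boto3 (the newer incarnation of boto); the instance ID, region, and paths are placeholders:

import subprocess
import boto3

INSTANCE_ID = "i-xxxxxxxx"                            # placeholder backup instance
ec2 = boto3.client("ec2", region_name="us-east-1")    # assumed region

# start the backup instance and wait for it to come up
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

# look up its current public address (sshd may need a few more seconds)
desc = ec2.describe_instances(InstanceIds=[INSTANCE_ID])
host = desc["Reservations"][0]["Instances"][0]["PublicDnsName"]

# mirror the data onto the EBS volume mounted on the instance
subprocess.check_call(["rsync", "-a", "--delete", "/srv/data/",
                       "root@%s:/mnt/backup/" % host])

# shut the instance down so it only costs money while backing up
ec2.stop_instances(InstanceIds=[INSTANCE_ID])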

My EC2 instance… oops

Wednesday, August 28th, 2013

I created an EC2 instance a while back to test a theory and had some time this evening to take a look at it again. I went to start the instance and:

    Xen Minimal OS!
  start_info: 0xac4000(VA)
    nr_pages: 0x26700
  shared_inf: 0x7de16000(MA)
     pt_base: 0xac7000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x990000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda1 ro 4
  stack:      0x94f860-0x96f860
MM: Init
      _text: 0x0(VA)
     _etext: 0x5ffbd(VA)
   _erodata: 0x78000(VA)
     _edata: 0x80ae0(VA)
stack start: 0x94f860(VA)
       _end: 0x98fe68(VA)
  start_pfn: ad3
    max_pfn: 26700
Mapping memory range 0xc00000 - 0x26700000
setting 0x0-0x78000 readonly
skipped 0x1000
MM: Initialise page allocator for c01000(c01000)-26700000(26700000)
MM: done
Demand map pfns at 26701000-2026701000.
Heap resides at 2026702000-4026702000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x26701000.
Initialising scheduler
Thread "Idle": pointer: 0x2026702010, stack: 0x26640000
Initialising xenbus
Thread "xenstore": pointer: 0x20267027c0, stack: 0x26650000
Dummy main: start_info=0x96f960
Thread "main": pointer: 0x2026702f70, stack: 0x26660000
"main" "root=/dev/sda1" "ro" "4" 
vbd 2049 is hd0
******************* BLKFRONT for device/vbd/2049 **********


backend at /local/domain/0/backend/vbd/3617/2049
Failed to read /local/domain/0/backend/vbd/3617/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/3617/2049/feature-flush-cache.
16777216 sectors of 512 bytes
**************************

  Booting '3.9-1-amd64'



root (hd0)

 Filesystem type is ext2fs, using whole disk

kernel /boot/vmlinuz-3.9-1-amd64 root=/dev/xvda1 ro 

initrd /boot/initrd.img-3.9-1-amd64



ERROR Invalid kernel: xc_dom_probe_bzimage_kernel: unknown compression format

xc_dom_bzimageloader.c:394: panic: xc_dom_probe_bzimage_kernel: unknown compression format
ERROR Invalid kernel: xc_dom_find_loader: no loader found

xc_dom_core.c:536: panic: xc_dom_find_loader: no loader found
xc_dom_parse_image returned -1



Error 9: Unknown boot failure



Press any key to continue...

This happens when you use a kernel compressed with .xz and the Xen host your instance runs on has an older hypervisor that cannot handle .xz compression.

What you would normally do to fix this: take another instance in the same availability zone, detach the EBS volume from the broken instance, attach it to the working instance, fix grub or install a kernel the hypervisor can load, detach the volume from that instance, re-attach it to the original instance, and restart.
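That shuffle is also easy to script. A rough sketch with boto3 – the volume ID, instance IDs, region, and device name are placeholders, and the grub fix still happens by hand on the rescue instance:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

VOLUME_ID = "vol-xxxxxxxx"         # root volume of the broken instance (placeholder)
BROKEN_INSTANCE = "i-aaaaaaaa"     # placeholder
RESCUE_INSTANCE = "i-bbbbbbbb"     # working instance in the same availability zone

# detach the root volume from the broken instance
ec2.detach_volume(VolumeId=VOLUME_ID, InstanceId=BROKEN_INSTANCE)
ec2.get_waiter("volume_available").wait(VolumeIds=[VOLUME_ID])

# attach it to the rescue instance as a secondary device
ec2.attach_volume(VolumeId=VOLUME_ID, InstanceId=RESCUE_INSTANCE, Device="/dev/sdf")
ec2.get_waiter("volume_in_use").wait(VolumeIds=[VOLUME_ID])

# ...mount it, fix grub or drop in a loadable kernel, then reverse the
# steps: detach, re-attach to the broken instance as /dev/sda1, and start it.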

However, if you’re not using your own AMI, you might get the following message:

'vol-xxxxxxxx' with Marketplace codes may not be attached as a secondary device.

in which case I believe you’re stuck.

Raid 1… or maybe not

Friday, July 19th, 2013

We received a client machine a while back. At some point they needed more disk space and sent us a pair of drives with instructions for mirroring data from the existing machine to the new drives.

When I looked at the drive I noticed something odd:

md4 : active raid1 sdb8[1](S) sda8[0]
      1942145856 blocks super 1.2 [1/1] [U]

While this was nominally a Raid 1 set, notice that sdb8 is marked as a spare (S) and the array reports 1 of 1 devices present ([1/1] [U]) – so md doesn’t even consider it degraded. This usually happens when someone does an in-place migration to a larger drive, creates a Raid 1 set with a single device, and forces it online with the intention of adding the second drive later. Had the primary drive failed, they would have experienced data loss.

To fix it:

mdadm --remove /dev/md4 /dev/sdb8                 # drop the spare from the array
mdadm --grow /dev/md4 --raid-devices=2 --force    # tell md the array should have two active members
mdadm --add /dev/md4 /dev/sdb8                    # re-add the drive so it rebuilds as an active mirror

and the md status now shows the array expecting two devices, with the second slot empty until the re-added drive finishes rebuilding:

md4 : active raid1 sda8[0]
      1942145856 blocks super 1.2 [2/1] [U_]

After the reconstruction finishes, we have a properly configured Raid 1 set.
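If you manage more than a handful of machines, this condition is worth scanning for. A quick sketch that reads /proc/mdstat and flags raid1 arrays configured with fewer than two members (it assumes the usual blank-line-separated mdstat layout):

import re

# Flag raid1 arrays configured for fewer than two devices, e.g. "[1/1] [U]"
# with a spare attached -- a mirror that was never actually completed.
with open("/proc/mdstat") as f:
    blocks = f.read().split("\n\n")

for block in blocks:
    if "raid1" not in block:
        continue
    match = re.search(r"\[(\d+)/(\d+)\]", block)
    if match and int(match.group(1)) < 2:
        print("Suspect raid1 array:\n" + block)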

From Dedicated Server to KVM Virtualized Instance

Sunday, June 23rd, 2013

Recently we’ve had a few clients who wanted to downsize from a dedicated server to a virtual private server running a virtualized instance. In the past, that has been a very time-consuming process. While it would probably be better to upgrade these clients to 64-bit and take advantage of a fresh OS load, sometimes there are issues that preclude this. One potential problem is not having a recent enough kernel to work with KVM if you are running a kernel built specifically for your setup.

What follows is a general recipe for migrating these machines. Most of the tasks are ‘hurry up and wait’ – the bulk of the time is spent waiting for data to move. This guide starts from bare metal and runs through migrating the first machine.

On the KVM box

  • install Linux
  • install kvm, virt-tools, and any base utilities (our minimal install includes libvirt-bin qemu-kvm virtinst rsync bridge-utils cgroup-bin nvi)
  • Download the .iso for your initial build
  • Determine disk size for each instance
  • Install a base image to a COW file or LVM partition; you’ll clone your new instances from it. Make sure the COW file is created with the same size as the resulting image, or you’ll need to resize both the LVM volume and the underlying filesystem.
  • pvcreate /dev/sda2 (or /dev/md1 if you use software rather than hardware raid)
  • vgcreate -s 16M vg0 /dev/sda2
  • lvcreate -L 80G -n c1 vg0
  • virt-clone --original base_image --name newvirtualmachine --file=/dev/vg0/c1

On the machine being moved

  • Install grub2 (if not already running grub2)
  • Edit /etc/default/grub and disable UUID
  • update-grub
  • Install new kernel
  • Paths changing? Modify /etc/fstab now, or plan to do it via VNC. If done via VNC, note that the machine may come up in single-user mode because fsck will fail on devices that are no longer present.
  • rsync (logs needed?)

Ready for the switch

On dedicated server

  • a secondary network interface is helpful; IPv6 on the primary interface also works
  • ifconfig the primary interface to a temporary IP and add a default route
  • restart the firewall (if it is pinned to the primary ethernet interface)
  • log out, then log back in using the temporary IP
  • remove the IPv6 address

On KVM Machine

  • virsh start newinstancename
  • connect via VNC
  • clear the ARP entries on your routers
  • dpkg-reconfigure grub-pc (sometimes grub is not recognized on the QEMU hard drive)
  • verify swap (double-check /etc/fstab)

After grub is reinstalled, reboot just to ensure the machine comes up with no issues.
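If you end up cycling guests repeatedly during a migration, the virsh calls can also be scripted with the libvirt Python bindings; a minimal sketch, with the guest name as a placeholder:

import libvirt

# connect to the local KVM/QEMU hypervisor
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("newvirtualmachine")   # placeholder guest name

if not dom.isActive():
    dom.create()        # equivalent to: virsh start newvirtualmachine

# state() returns a (state, reason) pair from the libvirt API
print(dom.name(), dom.state())
conn.close()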

Some Tools

KVM Shell Scripts used when we migrated a number of machines.

Kernel Config Notes for KVM

If you are building your own kernels, here are some notes.

Make sure the following are enabled in your KVM host kernel

  • High Res Timers (required for most of the virtualization options)
  • CPU Task/Time accounting (if desired)
  • Bridge
  • VIRTIO_NET
  • CONFIG_VIRTIO_BLK
  • SCSI_VIRTIO
  • VIRTIO_CONSOLE
  • CONFIG_VIRTIO
  • VIRTIO_PCI
  • VIRTIO_BALLOON
  • VIRTIO_MMIO
  • VIRTIO_MMIO_CMDLINE_DEVICES

Make sure the following are enabled in your guest kernel

  • sym53c8xx
  • virtio-pci
  • PIIX_IDE

Remember that your guest kernel is running on the underlying hardware of your KVM host. The guest kernel should have its CPU type set to match the host’s CPU so it can take advantage of any hardware optimizations the CPU offers.
