Posts Tagged ‘debian’

Conversion to Dual Stack IPv6

Monday, April 25th, 2011

Over the weekend we took a large step and converted 25% of our network over to dual-stack IPv4/IPv6. We aren’t just saying that we’re ready for IPv6; we are actually using IPv6 internally and starting to move some public-facing sites so that they can serve both IPv4- and IPv6-enabled web surfers. We primarily run Debian, but have a few Windows, FreeBSD, and other machines. Our current effort is switching our Debian machines over, since they comprise 95% of our network. We run 2.6.38 with a few config changes rather than the default Debian kernel, but much of the testing was done with a stock Debian kernel.

We ran into a few minor issues, but backups for the first group of machines that we converted are now running over IPv6. Additionally, email handled by our MX servers is handed off to those machines over IPv6. Currently only the actual machines have IPv6 addresses, so we don’t have many public-facing sites running dual-stack, but the few that are announcing both IPv4 and IPv6 amount to almost 4% of our traffic. Clients that access the machines directly for SSH, FTP, POP3, IMAP, or SMTP will use IPv6 if they are able. Most clients don’t use the actual device name for FTP/POP3/IMAP/SMTP, so most won’t use IPv6 until their public-facing site is IPv6 enabled.

Our network is relatively flat, which makes our deployment a little easier. The basic structure is:

edge router -> chassis switch
                        chassis switch
                        chassis switch
                        ...
edge router -> chassis switch

We use VRRP to announce an x:x::1/64; each machine gets a /128 from that /64, and then, using static routes, we route a /64 to each machine. Due to an issue with OSPFv3 on our current network, we had to fall back to static routes. Each machine is allocated a /128 from our main network, and a /64 is routed to it for the client. On virtual webhost machines we might allocate a /80 to each virtual client out of that /64, but we haven’t made a firm decision on that. We’ve actually cheated and run IPv6 over its own connections to the chassis switches to make traffic and flow monitoring a little easier.
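
As an illustration of those static routes, here is a minimal sketch of routing a client /64 toward a machine’s /128 on a Linux box with iproute2; the prefixes are placeholders (our edge routers use their own configuration syntax):

ip -6 route add xxxx:xxxx:0:c1::/64 via xxxx:xxxx::c:1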

Our basic network now consists of every machine in a single /64, which cuts down on ARP/Neighbor Discovery chatter and VLAN issues, but requires a slightly different configuration than our existing IPv4 network, which used VLANs.

When we configure a machine, we need to add the admin IP to it and push the config changes using our management software. We haven’t automated putting the initial IP address on each machine, as it requires route entries in our edge routers. Once OSPFv3 is fixed later this week, I expect the process to be more automated.

The first step is to take a /128 out of the /64 for the ‘device’ network and assign it to the machine:


ifconfig eth0 add xxxx:xxxx::c:1/64
route --inet6 add default gateway xxxx:xxxx::1
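
The same assignment can be done with iproute2 instead of the net-tools commands above; a minimal sketch using the same placeholder addresses:

ip -6 addr add xxxx:xxxx::c:1/64 dev eth0
ip -6 route add default via xxxx:xxxx::1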

We opted to use ::ad:xxxx for admin machines and ::c:xxxx for client servers. Since you can use hexadecimal, you could actually assign cabinet numbers or switch numbers to make identifying the machine location a little quicker. Some identifier for the building, the cabinet/rack, the switch it is connected to, etc. could be used. For now, we’re using :c: and :ad: to signify client and admin. Our primary storage server is :ad:bac1, our development machine is :ad:de, etc. Our admin network is unlikely to exceed 65,536 machines, but there is a lot of flexibility if you want to get creative.
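
To make that concrete, a purely hypothetical scheme that encodes location in the host bits might look like this (none of these are our real addresses):

xxxx:xxxx::c:0102:0304    (client: building 01, cabinet 02, switch 03, port 04)
xxxx:xxxx::ad:0102:0001   (admin: same building and cabinet)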

Once we’ve added the initial IP, our management software inserts the following into /etc/network/interfaces:

iface eth0 inet6 static
        address xxxx:xxxx::c:1
        netmask 64
        gateway xxxx:xxxx::1

At this point, the AAAA record for the device is published, and we can access the machine over ssh using IPv6.
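
For illustration, publishing the AAAA record is a one-line addition in a BIND-style zone file; the hostname here is a made-up example and the address is the same placeholder used above:

host1.example.com.    IN    AAAA    xxxx:xxxx::c:1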

For Postfix, we needed to add the following to /etc/postfix/main.cf:

inet_protocols = ipv4, ipv6
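
If you’d rather not edit main.cf by hand, postconf can make the same change, and Postfix needs a reload to pick it up; a quick sketch:

postconf -e 'inet_protocols = ipv4, ipv6'
postfix reload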

Additionally, we needed to modify /etc/postfix/mynetworks.txt to add:

[xxxx:xxxx::]/64

which allows machines on our local network to relay mail through the server. On some machines the mynetworks parameter doesn’t reference a separate file, in which case the network needs to be added directly to the mynetworks line in /etc/postfix/main.cf:

mynetworks =
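
In that case the line ends up looking something like the following; the IPv4 networks are placeholders, and only the bracketed IPv6 network is the new addition:

mynetworks = 127.0.0.0/8, [::1]/128, xx.xx.xx.0/24, [xxxx:xxxx::]/64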

Dovecot required changes to /etc/dovecot/dovecot.conf:

listen=[::], *

Pure-FTPD had problems with IPv6 reverse lookups:


echo yes > /etc/pure-ftpd/conf/DontResolve
/etc/init.d/pure-ftpd restart

And of course, /etc/resolv.conf:


nameserver xx.xx.xx.xx
nameserver xx.xx.xx.xx
nameserver xxxx:xxxx::ad:1
nameserver xxxx:xxxx::ad:2

Customer impact during the conversions has been minor: we bounced, and lost, one email because of the missing mynetworks entry in Postfix. Debian’s version of Dovecot doesn’t listen on both IPv4 and IPv6 with listen=[::] as one might imagine from reading the documentation, but that was caught on a test machine and didn’t affect any clients.

Many config files, such as Apache’s and Varnish’s, require [] around IPv6 addresses. That is something to remember when you need to specify Listen directives on machines that have multiple services listening on port 80 on separate IPs.
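
For example, binding Apache to one specific address per protocol looks something like this (the addresses are placeholders):

Listen xx.xx.xx.xx:80
Listen [xxxx:xxxx::c:1]:80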

Most of the server software we’ve run across hasn’t had any issues. However, client software that uses char(15) to store IP addresses will need to be fixed; the text form of an IPv6 address can be up to 39 characters, or 45 for an IPv6 address with an embedded IPv4 address.

So far, I think we’ll be ready for World IPv6 Day with over 98% of our machines running dual-stack, and we’re shooting for 20% of our clients’ public-facing sites to have IPv6 support by June 8, 2011.

We have two machines running Tux that are stuck on kernel 2.4.37, and regrettably Tux appears to segfault when it receives IPv6 traffic. It’s a shame, since Varnish and Nginx are still outclassed by a webserver written twelve years ago.

So far, the conversion process has been quite straightforward, with the minor issues you would expect when introducing IPv6 to applications and server stacks that weren’t written to handle it. I suspect we’ll have 98% of our machines dual-stack by May 7, which will give us a month to get 20% of our client base educated and convinced to turn on IPv6.

Adaptec 31205 under Debian

Saturday, September 25th, 2010

We have a storage server with eleven 2 TB drives in a RAID 5. During a recent visit we heard the alarm, but no red light was visible on any drive, nor was the light on the front of the chassis lit. Knowing it was a problem waiting to happen, but without being able to see which drive had caused the array to fail, we scheduled a maintenance window that happened to coincide with a kernel upgrade.

In the meantime, we attempted to install the RPM and the Java management system, to no avail, so we weren’t able to read the controller status to find out what the problem was.

When we rebooted the machine, the array status was degraded and it prompted us to hit Enter to accept the configuration or Ctrl-A to enter the admin. We entered the admin and went to Manage Array; all drives were present and working. Immediately the array status changed to rebuilding, with no indication of which drive had failed and was being re-added.

Exiting the admin and saving the config, the client said to pull the machine offline until it was fixed. This started what seemed like an endless process. We figured we would let it rebuild while it was online but disable it from the cluster. We installed a new kernel, 2.6.36-rc5, and rebooted, and this is where the trouble started. On boot, the new kernel got an I/O error, the channel hung, it forced a reset, and then it sat there for about 45 seconds. After it continued, it panicked because it was unable to read /dev/sda1.

Rebooting and entering the admin, we were faced with an array that was marked offline. After identifying each of the drives through the disk utilities to make sure they were recognized, we forced the array back online and rebooted into the old kernel. As it turns out, something in our 2.6.36-rc5 build disables the array and sets it offline. It takes 18 hours to rebuild the array and return it to optimal status.

After the machine comes up, we know we have a problem with one of the directories on the system, and this seems like an opportune time to run xfs_repair. About 40 minutes in, we run into an I/O error with a huge block number and, bam, the array is offline again.
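
For anyone following along, xfs_repair has to run on an unmounted filesystem, and a read-only pass with -n first is cheap insurance; a sketch, assuming the filesystem is on /dev/sda1 as above:

umount /dev/sda1
xfs_repair -n /dev/sda1    # no-modify mode, report only
xfs_repair /dev/sda1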

In the disk utility in the ROM we start the test on the first drive. It takes 5.5 hours to run through the first disk, which puts us at an estimated 60+ hours to check all 11 drives in the array. smartctl doesn’t let us check the drives independently through the controller, so we fire up a second machine and mount each of the drives, looking for any telltale signs in the S.M.A.R.T. data stored on the drives. Two drives show some abnormal numbers, and we have an estimated 11 hours to check those disks. 5.5 hours later, the first disk is clean; less than 30 minutes later, we have our culprit. Relocating a number of bad sectors results in the controller hanging again, yet there is no red fault light anywhere to be seen and no indication in the Adaptec manager that this drive is bad.
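
On the second machine the S.M.A.R.T. check itself is quick; a sketch of the sort of check involved, assuming a drive shows up there as /dev/sdb and smartmontools is installed:

smartctl -a /dev/sdb | egrep -i 'reallocated|pending|uncorrectable|power_on'
smartctl -t long /dev/sdb    # extended self-test; this is the multi-hour part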

Replacing the drive and going back into the admin shows us a greyed-out drive which immediately starts reconstructing. We reboot the system into the older kernel and start xfs_repair again. After two hours, it has run into a number of errors, but no I/O errors.

It is obvious we’ve had some corruption for quite some time. We had a directory we couldn’t delete because it claimed it had files, yet no files were in the directory. We had two directories with files that we couldn’t do anything with and couldn’t even mv to an area outside our working directories. We figured it was an xfs bug we had hit due to the 18-terabyte size of the partition and guessed that an xfs_repair would fix it. It was a minor annoyance to the client until we could get to a maintenance interval, so we waited. In reality, this should have been a sign that we had issues, and we should have pushed the client harder to let us diagnose it much earlier. There is some data corruption, but this is the second in a pair of backup servers for their cluster; resyncing the data from a known good source will fix this without too much difficulty.

After four hours, xfs_repair is reporting issues like:


bad directory block magic # 0 in block 0 for directory inode 21491241467
corrupt block 0 in directory inode 21491241467
        will junk block
no . entry for directory 21491241467
no .. entry for directory 21491241467
problem with directory contents in inode 21491241467
cleared inode 21491241467
        - agno = 6
        - agno = 7
        - agno = 8
bad directory block magic # 0 in block 1947 for directory inode 34377945042
corrupt block 1947 in directory inode 34377945042
        will junk block
bad directory block magic # 0 in block 1129 for directory inode 34973370147
corrupt block 1129 in directory inode 34973370147
        will junk block
bad directory block magic # 0 in block 3175 for directory inode 34973370147
corrupt block 3175 in directory inode 34973370147
        will junk block

It appears that we have quite a bit of data corruption due to a bad drive, which is precisely the sort of failure RAID is supposed to protect against.

The array failed, so why didn’t the Adaptec on-board manager know which drive had failed? Had we gotten the Java application to run, I’m still not convinced it would have told us which drive was throwing the array into degraded status. Obviously the card knew something was wrong, as the alarm was on. Each drive has a fault light and an activity light, but all of the drives allowed the array to be rebuilt and the controller claimed the status was Optimal. During initialization, the Adaptec does light the fault and activity lights for each drive, so it seems reasonable that when a drive encountered errors, it could have lit that drive’s fault light so we knew which one to replace. And when xfs_repair received the I/O error where the controller couldn’t relocate the block, why didn’t the Adaptec immediately fail the drive?

All in all, I’m not too happy with Adaptec right now. A 2 TB hard drive failed, and it cost us roughly 60 hours to diagnose the problem and put the array back into service. The failing drive should have been flagged and removed from the RAID set immediately. Even running in degraded mode we shouldn’t have seen any corruption; however, xfs_repair is finding a considerable number of errors.

The drives report roughly 5,600 hours online, which corresponds to the eight months we’ve had the machine in service. Based on the number of bad files xfs_repair is finding, I believe that drive had been failing for quite some time, and Adaptec has failed us. While we run a considerable number of Adaptec controllers, we’ve never seen a failure like this.
