Archive for the ‘Web Infrastructure’ Category

Gracefully Degrading Site with Varnish and High Load

Saturday, July 16th, 2011

If you run Varnish, you might want to gracefully degrade your site when unexpected traffic arrives. There are other solutions listed on the net which maintain a Three State Throttle, but it seemed like this could be done easily within Varnish without needing too many external dependencies.

The first challenge was figuring out how we wanted to handle state. Our backend director is set up with a ‘level1’ backend which doesn’t do any health checks. We need at least one node that never fails the health check, since the ‘level2’ and ‘level3’ backends will go offline to signal to Varnish that we need to take action. This scenario treats the failure modes as a cascade (level2 fails, then, if load keeps climbing, level3 fails), but nothing prevents you from having separate failure modes and different VCL for those conditions.

You could have VCL that replaces the front page of your site with a ‘top news’ page during an event, linking to your secondary pages. You can write VCL to handle almost any condition, and you don't need to worry about doing a VCL load to update the configuration.

While maintaining three separate configurations might seem easier, that approach adds a few extra points of failure. If the load on the machine gets too high and the cron job or daemon that is supposed to update the VCL doesn't run quickly enough, or has issues with network congestion talking to Varnish, your site could run in a degraded mode much longer than needed. With this solution, in the event that there is too much network congestion or too much load for the backend to respond, Varnish automatically considers that a level3 failure and enacts those rules – without the backend needing to acknowledge the problem.

The basics

First, we set up the script that Varnish will probe. The script doesn't need to be PHP; it only needs to respond with a 404 error to signify to Varnish that the probe request has failed.

<?php
// Probe target for Varnish. The query string (?2 or ?3) selects which level
// is being checked.
$level = $_SERVER['QUERY_STRING'];
// First field of /proc/loadavg is the 1-minute load average; multiplying by 1
// coerces the string to a number.
$load = file_get_contents('/proc/loadavg') * 1;
if ( ($level == 2) and ($load > 10) ) {
  header("HTTP/1.0 404 Get the bilge pumps working!");
}
if ( ($level == 3) and ($load > 20) ) {
  header("HTTP/1.0 404 All hands abandon ship");
}
?>

Second, we need to have our backend pool configured to call our probe script:

backend level1 {
  .host = "66.55.44.216";
  .port = "80";
}
backend level2 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?2";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}
backend level3 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?3";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}

director crisis random {
  {
# base that should always respond so we don't get an Error 503
    .backend = level1;
    .weight = 1;
  }
  {
    .backend = level2;
    .weight = 1;
  }
  {
    .backend = level3;
    .weight = 1;
  }
}

Since both of our probes go to the same backend, it doesn’t matter which director we use or what weight we assign. We just need to have one backend configured that won’t fail the probe along with our level2 and level3 probes. In this example, when the load on the server is greater than 10, it triggers a level2 failure. If the load is greater than 20, it triggers a level3 failure.

In this case, when the backend probe request fails, we just rewrite the URL. Any VCL can be added, but, you will have some duplication. Since the VCL is compiled into the Varnish server, it should have negligible performance impact.

sub vcl_recv {
  set req.backend = level2;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level2.php";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level3.php";
  }
  set req.backend = crisis;
}

In this case, when we have a level2 failure, we change any URL requested to serve the file /level2.php. In vcl_fetch, we make a few changes to the object TTL to prevent the backend from getting hit too hard. We also change the Server header so that we can look at the response headers and see what level the server is currently running at. In Firefox, there is an extension called Header Spy which lets you keep track of a header. Oftentimes I'll track X-Cache, which I set to HIT or MISS to make sure Varnish is caching, but you could also track Server to know whether things are running properly.

sub vcl_fetch {
  set beresp.ttl = 0s;

  set req.backend = level2;
  if (!req.backend.healthy) {
    set beresp.ttl = 5m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 2 - Warning)";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    set beresp.ttl = 30m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 3 - Critical)";
  }
}
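
If you would rather check this from the command line than with a browser extension, a small script can print those headers. This is just an illustrative sketch using Python's standard library; the URL is a placeholder.

import urllib.request

# Fetch a page through Varnish and print the headers we care about: Server
# shows which degradation level is active (set in vcl_fetch above), and
# X-Cache, if you set it in vcl_deliver, shows whether the object came from
# the cache.
resp = urllib.request.urlopen("http://www.example.com/")
print("Server: ", resp.headers.get("Server"))
print("X-Cache:", resp.headers.get("X-Cache"))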

At this point, we've got a system that degrades gracefully even if the backend cannot respond or update Varnish's VCL, and it self-heals based on the load checks. Ideally you'll also want to add Grace timers and possibly run Saint mode to handle significant failures, but this should help your system protect itself from meltdown.

Complete VCL

backend level1 {
  .host = "66.55.44.216";
  .port = "80";
}
backend level2 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?2";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}
backend level3 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?3";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}

director crisis random {
  {
# base that should always respond so we don't get an Error 503
    .backend = level1;
    .weight = 1;
  }
  {
    .backend = level2;
    .weight = 1;
  }
  {
    .backend = level3;
    .weight = 1;
  }
}

sub vcl_recv {
  set req.backend = level2;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level2.php";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level3.php";
  }
  set req.backend = crisis;
}

sub vcl_fetch {
  set beresp.ttl = 0s;

  set req.backend = level2;
  if (!req.backend.healthy) {
    set beresp.ttl = 5m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 2 - Warning)";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    set beresp.ttl = 30m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 3 - Critical)";
  }

  if (req.url ~ "\.(gif|jpe?g|png|swf|css|js|flv|mp3|mp4|pdf|ico)(\?.*|)$") {
    set beresp.ttl = 365d;
  }
}

Updated WordPress VCL – still not complete, but, closer

Saturday, July 16th, 2011

Worked with a new client this week and needed to get the VCL working for their installation. They were running W3TC, but, this VCL should work for people running WP-Varnish or any plugin that allows Purging. This VCL is for Varnish 2.x.

There are still some tweaks, but, this appears to be working quite well.

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "10.0.1.100";
    "10.0.1.101";
    "10.0.1.102";
    "10.0.1.103";
    "10.0.1.104";
}

sub vcl_recv {
 if (req.request == "PURGE") {
   if (!client.ip ~ purge) {
     error 405 "Not allowed.";
   }
   return(lookup);
 }

  if (req.http.Accept-Encoding) {
#revisit this list
    if (req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$") {
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      remove req.http.Accept-Encoding;
    }
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*$", "");
  }
  if (req.http.cookie) {
    if (req.http.cookie ~ "(wordpress_|wp-settings-)") {
      return(pass);
    } else {
      unset req.http.cookie;
    }
  }
}

sub vcl_fetch {
# this conditional can probably be left out for most installations
# as it can negatively impact sites without purge support. High
# traffic sites might leave it, but, it will remove the WordPress
# 'bar' at the top and you won't have the post 'edit' functions onscreen.
  if ( (!(req.url ~ "(wp-(login|admin)|login)")) || (req.request == "GET") ) {
    unset beresp.http.set-cookie;
# If you're not running purge support with a plugin, remove
# this line.
    set beresp.ttl = 5m;
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    set beresp.ttl = 365d;
  }
}

sub vcl_deliver {
# multi-server webfarm? set a variable here so you can check
# the headers to see which frontend served the request
#   set resp.http.X-Server = "server-01";
   if (obj.hits > 0) {
     set resp.http.X-Cache = "HIT";
   } else {
     set resp.http.X-Cache = "MISS";
   }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "OK";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not cached";
  }
}
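
If you want to test purging by hand, a short script can issue the PURGE request. This is only a sketch and not part of the original configuration; the hostname and path are placeholders, and it has to run from one of the IPs listed in the purge ACL.

import http.client

# Ask Varnish to purge a single URL. With the VCL above, a cached object
# returns "200 OK" and an uncached one returns "404 Not cached".
conn = http.client.HTTPConnection("varnish.example.com", 80)
conn.request("PURGE", "/2011/07/some-post/", headers={"Host": "www.example.com"})
resp = conn.getresponse()
print(resp.status, resp.reason)
conn.close()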

When to Cache, What to Cache, How to Cache

Tuesday, June 21st, 2011

This post is a version of the slideshow presentation I did at Hack and Tell in Fort Lauderdale, Florida at The Collide Factory on Saturday, April 2, 2011. These are 5-minute talks where each slide auto-advances after fifteen seconds, which limits the amount of detail that can be conveyed.

A brief introduction

What makes a page load quickly? While we can look at various metrics, quite a few things impact pageloads. Even if the page is served quickly, the design of the page often affects how it is rendered in the browser, which can make a site appear sluggish. Here, however, we're going to focus on the mechanics of what it takes to serve a page quickly.

The Golden Rule – do as few calculations as possible to hand content to your surfer.

But my site is dynamic!

Do you really need to calculate the last ten posts entered on your blog every time someone visits the page? Surely you could cache that and purge the cache when a new post is entered. When someone adds a new comment, purge the cache and let it be recalculated once.
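
As a rough sketch of that idea (not from the original talk), assuming Redis and the redis-py client, with made-up function and key names:

import redis

r = redis.Redis()  # assumes a local Redis instance

def render_recent_posts():
    # Stand-in for the expensive "last ten posts" query and template step.
    return "<ul>...last ten posts...</ul>"

def recent_posts():
    """Serve the cached copy; recalculate only when the cache is empty."""
    cached = r.get("recent_posts")
    if cached is not None:
        return cached.decode("utf-8")
    html = render_recent_posts()
    r.set("recent_posts", html)
    return html

def on_new_post_or_comment():
    """Purge the cache when content changes; the next request recalculates it once."""
    r.delete("recent_posts")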

But my site has user personalization!

Can that personalization be broken into its own section of the webpage? Or is it created by a cacheable function within your application? Even if you don't support fragment caching on the edge, you can emulate it by caching your expensive SQL queries or even portions of your page.

Even writing generated output to a static file and letting your webserver serve that static file provides an enormous boost. This is what most of the caching plugins for WordPress do. However, they do page caching, not fragment caching, which means that the two most expensive queries WordPress executes, the Category list and the Tag Cloud, are generated each time a new page is hit until that page is cached.

One of the problems with high performance sites is the never-ending quest for a low Time to First Byte. Each load balancer or proxy in front adds some latency. It also means a page needs to be pre-constructed before it is served, or you need to do a little trickery. Unless you've got plenty of spare computing horsepower, this rules out doing much dynamic processing on the page if you want to hand a response back as quickly as possible.

With this, we’re left with a few options to have a dynamic site that has the performance of a statically generated site.

Amazon was one of the first to embrace the Page and Block method by using Mason, a mod_perl based framework. Each of the blocks on the page was generated ahead of time, and only the personalized blocks were generated ‘late’. This allowed the frontend to assemble these pieces, do minimal work to display the personal recommendations and present the page quickly.

Google took a different approach by having an immense amount of computing horsepower behind their results. Google’s method probably isn’t cost effective for most sites on the Internet.

Facebook developed bigpipe which generates pages and then dynamically loads portions of the page into the DOM units. This makes the page load quickly, but in stages. The viewer sees the rough page quickly as the rest of the page fills in.

The Presentation begins here

Primary Goal

Fast Pageloads – We want the page to load quickly and render quickly so that the websurfer doesn’t abandon the site.

Increased Scalability – Once we get more traffic, we want the site to be able to scale and provide websurfers with a consistent, fast experience while the site grows.

Metrics We Use

Time to First Byte – This is a measure of how quickly the site responds to an incoming request and starts sending the first byte of data. Some sites have to take time to analyze the request, build the page, etc before sending any data. This lag results in the browser sitting there with a blank screen.

Time to Completion – We want the entire page to load quickly enough that the web surfer doesn’t abandon. While we can do some tricky things with chunked encoding to fool websurfers into thinking our page loads more quickly than it really does, for 95% of the sites, this is a decent metric.

Number of Requests – The total number of requests for a page is a good indicator of overall performance. Since most browsers will only request a handful of static assets from a page per hostname, we can use a CDN, embed images in CSS or use Image Sprites to reduce the number of requests.
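
As an aside that isn't in the original talk, embedding a small image directly into CSS as a data: URI is one easy way to shave a request; a sketch (the filename is a placeholder):

import base64

# Turn a small image into a data: URI that can be pasted into a CSS
# background-image rule, removing one HTTP request from the page.
with open("icon.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")
print("background-image: url(data:image/png;base64,%s);" % encoded)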

Why Cache?

Expecting Traffic

When we have an advertising campaign or holiday promotion going on, we don’t know what our expected traffic level might be, so, we need to prepare by having the caching in place.

Receiving Traffic

If we receive unexpected publicity, or our site is listed somewhere, we might cache to allow the existing system to survive a flood of traffic.

Fighting DDOS

When fighting a Distributed Denial of Service attack, we might use caching to keep the backend servers from getting overloaded.

Expecting Traffic

There are several types of caching we can do when we expect to receive traffic.

* Page Cache – Varnish/Squid/Nginx provide page caching. A static copy of the rendered page is held and updated from time to time either by the content expiring or being purged from the cache.
* Query Cache – MySQL includes a query cache that can help on repetitive queries.
* Wrap Queries with functions and cache – We can take our queries and write our own caching using a key/value store, so we avoid hitting the database backend (see the sketch after this list).
* Wrap functions with caching – In Python, we can use Beaker to wrap a decorator around a function which does the caching magic for us. Other languages have similar facilities.
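
Here is the sketch mentioned above for wrapping a query with our own key/value caching. It assumes memcached and the python-memcached client; the key prefix, TTL and database handle are illustrative.

import hashlib
import memcache

mc = memcache.Client(["127.0.0.1:11211"])  # assumes a local memcached

def cached_query(db, sql, params=(), ttl=300):
    """Check memcached before hitting the database; cache the rows for ttl seconds."""
    key = "sql:" + hashlib.sha1(repr((sql, params)).encode()).hexdigest()
    rows = mc.get(key)
    if rows is None:
        cur = db.cursor()
        cur.execute(sql, params)
        rows = cur.fetchall()
        mc.set(key, rows, time=ttl)
    return rows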

Receiving Traffic

* Page Caching – When we’re receiving traffic, the easiest thing to do is to put a page cache in place to save the backend/database servers from getting overrun. We lose some of the dynamic aspects, but, the site remains online.

* Fragment Caching – With fragment caching, we can break the page into zones that have separate expiration times or can be purged separately. This can give us a little more control over how interactive and dynamic the site appears while it is receiving traffic.

DDOS Handling

* Slow Client/Vampire Attacks – Certain DDOS attacks cause problems with some webserver software. Recent versions of Apache and most event/poll driven webservers have protection against this.
* Massive Traffic – With some infrastructures, we’re able to filter out the traffic ahead of time – before it hits the backend.

Caching Easy, Purging Hard

Caching is scalable. We can just add more caching servers to the pool and keep scaling to handle increased load. The problem we run into is keeping a site interactive and dynamic as content needs to be updated. At that point, purging/invalidating cached pages or regions requires communication with each cache.

Page Caching

Some of the caching servers that work well are Varnish, Squid and Nginx. Each of these allows you to do page caching, specify expire times, and handle most requests without having to talk to the backend servers.

Fragment Caching

Edge Side Includes or a Page and Block construction allow you to cache pieces of the page, as shown in the diagram linked below. With this, we can individually expire pieces of the page and allow our front end cache, Varnish, to reassemble the pieces to serve to the websurfer.

http://www.trygve-lie.com/blog/entry/esi_explained_simple

Cache Methods

* Hardware – Hard drives contain caches as do many controller cards.
* SQL Cache – adding memory to keep the indexes in memory or enabling the SQL query cache can help.
* Redis/Memcached – Using a key/value store can keep requests from hitting rotational media (disks)
* Beaker/Functional Caching – Either method can use a key/value store, preferably using RAM rather than disk, to prevent requests from having to hit the database backend.
* Edge/Frontend Caching – We can deploy a cache on the border to reduce the number of requests to the backend.

OS Buffering/Caching

* Hardware Caching on drive – Most hard drives today have caches – finding one with a large cache can help.
* Caching Controller – If you have a large ‘hot set’ of data that changes, a caching controller lets you put a gigabyte or more of RAM in front of the disks so requests don't have to hit the platters. Make sure you get the battery backup card just in case your machine loses power – those disk writes are often reported as completed before they are physically written to the disk.
* Linux/FreeBSD/Solaris/Windows all use RAM for caching

MySQL Query Cache

The MySQL Query cache is simple yet effective. It isn’t smart and doesn’t cache based on query plan, but, if your code base executes queries where the arguments are in the same order, it can be quite a plus. If you are dynamically creating queries, assembling the queries to try and keep the conditions in the same order will help.
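
Since the query cache matches on the exact query text, dynamically built queries should assemble their conditions in a stable order. A small illustrative sketch (not from the original talk):

def build_where(filters):
    """Build a WHERE clause with the columns in sorted order so the generated
    SQL is byte-identical across requests and the query cache can hit."""
    columns = sorted(filters)
    clause = " AND ".join("%s = %%s" % col for col in columns)
    params = [filters[col] for col in columns]
    return clause, params

where, params = build_where({"status": "published", "author_id": 7})
sql = "SELECT id, title FROM posts WHERE " + where
# cursor.execute(sql, params) now sends the same statement text every time.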

Redis/Memcached

* Key Value Store – you can store frequently requested data in memory.
* Nginx can read rendered pages right from Memcached (see the sketch below).

Both methods use RAM rather than hitting slower disk media.
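
For instance, a backend can drop a rendered page into memcached under the key nginx is configured to look up (typically $uri via memcached_pass). A sketch assuming the python-memcached client:

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

# With "set $memcached_key $uri; memcached_pass 127.0.0.1:11211;" in nginx,
# the key is simply the request URI, so a page stored here is served straight
# from RAM without touching the application.
html = "<html><body>rendered page</body></html>"
mc.set("/about/", html, time=300)  # cache for five minutes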

Beaker/Functional Caching

With Python, we can use the Beaker decorator to specify caching. This insulates us from having to write our own handler.
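
A minimal sketch of what that looks like; the cache settings and names here are illustrative, not from the talk:

from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

# An in-memory cache; in production this could point at memcached instead.
cache = CacheManager(**parse_cache_config_options({"cache.type": "memory"}))

@cache.cache("tag_cloud", expire=300)
def tag_cloud(blog_id):
    # Stand-in for an expensive query/render step; Beaker caches the return
    # value per set of arguments for five minutes.
    return "tag cloud for blog %s" % blog_id

print(tag_cloud(1))  # computed
print(tag_cloud(1))  # served from the cache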

Edge/Front End Caching

* Define blocks that can be cached, portions of the templates.
* Page Caching
* JSON (CouchDB) – Even JSON responses can run behind Varnish.
* Bigpipe – Cache the page, and allow javascript to assemble the page.

Content Delivery Network (CDN)

When possible, use a Content Delivery Network to store static assets off net. This adds a separate hostname and sometimes a separate domain name which allows most browsers to fetch more resources at the same time. Preferably you want to use a separate domain name that won’t have any cookies set – which cuts down on the size of the request object sent from the browser to the server with the static assets.

Bigpipe

Facebook uses a technology called Bigpipe which caches the page template and the javascript required to build the page. Once that has loaded, Javascript fetches the data and builds the page. Some of the json data requested is also cached, leading to a very compact page being loaded and built while you’re viewing the page.

Google’s Answer

Google has spent many years building a tremendous distributed computer. When you request a site, their frontend servers use a deadline scheduler and request blocks from the advertising, personalization, search results and other subsystems. The page is then assembled and returned to the web surfer. If any block doesn't complete quickly enough, it is left out of the assembled page – which motivates the advertising department to make sure their block renders quickly.
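
As a toy illustration of that deadline idea (purely illustrative, not Google's code), using a thread pool where blocks that miss the deadline are simply dropped from the assembled page:

import concurrent.futures
import time

def assemble_page(block_renderers, deadline=0.2):
    """Render page blocks in parallel and keep only those that finish in time."""
    pool = concurrent.futures.ThreadPoolExecutor()
    futures = {pool.submit(render): name for name, render in block_renderers.items()}
    done, not_done = concurrent.futures.wait(futures, timeout=deadline)
    page = {futures[f]: f.result() for f in done}
    # Anything still in not_done missed the deadline and is left out of the page.
    pool.shutdown(wait=False)
    return page

# The slow "ads" block misses the 200ms deadline and is dropped.
page = assemble_page({
    "search": lambda: "search results",
    "ads": lambda: (time.sleep(1), "ads")[1],
    "sidebar": lambda: "personalization",
})
print(page)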

What else can we do?

* Reduce the number of calculations required to serve a page
* Reduce the number of disk operations
* Reduce the network traffic

In general, do as few calculations as possible while handing the page to the surfer.

Weekend, WRT54GS, OpenWRT, IPv6 through tunnelbroker.net

Saturday, April 30th, 2011

We've been doing a lot of work recently with IPv6, so I decided to see if I could reconfigure an older Linksys WRT54GS to run OpenWRT and use it to route IPv6 to the machines at the house, rather than using the entire /64 on my MacBook. This will also allow me to run IPv6 on other machines at the house.

First I ran into some issues flashing OpenWRT – which were fixed by upgrading the firmware on the machine to the latest version supplied by Cisco/Linksys, then, flashing the OpenWRT build from http://downloads.openwrt.org/snapshots/trunk/brcm47xx/.

Once you’ve done that, telnet to 192.168.1.1, type passwd, enter a new password, log out, ssh root@192.168.1.1 using the new password and you’re set.

Configuring wireless was simple enough, though I couldn't get WEP to work and had to move over to WPA2 (psk2). With WEP configured, using multiple different suggested configurations, OpenWRT would always respond with:

Configuration file: /var/run/hostapd-phy0.conf
Could not set WEP encryption.
Interface initialization failed
wlan0: Unable to setup interface.
rmdir[ctrl_interface]: No such file or directory
Failed to start hostapd for phy0

Changing the encryption type to psk2 and setting the key allowed me to type wifi which then recognized the configuration. A warning pops up:

root@OpenWrt:/etc/config# wifi
Configuration file: /var/run/hostapd-phy0.conf
Using interface wlan0 with hwaddr 00:12:17:3a:c6:4a and ssid 'ipv6'
random: Cannot read from /dev/random: Resource temporarily unavailable
random: Only 0/20 bytes of strong random data available from /dev/random
random: Not enough entropy pool available for secure operations
WPA: Not enough entropy in random pool for secure operations - update keys later when the first station connects

I set up a separate network and left the existing router online with its current config. That way, I'm not disrupting the main router and can keep testing on its own Wireless LAN. At this point, I've set 192.168.6.0/24 as the IPv4 network for the IPv6 wireless router, connected through it as my preferred Wireless LAN, and am now able to surf the internet.

Next, we need to set up our IPv6 configuration from http://www.tunnelbroker.net/, a free service provided by Hurricane Electric.

We need to install the IPv6 kernel modules, then activate IPv6 (or you can power cycle the router and the ipv6 modules will automatically be loaded).

opkg install kmod-ipv6
insmod ipv6
opkg install 6in4

We can verify that ipv6 is working by typing:

root@OpenWrt:/etc# ifconfig br-lan
br-lan    Link encap:Ethernet  HWaddr 00:12:17:3A:C6:48  
          inet addr:192.168.6.1  Bcast:192.168.6.255  Mask:255.255.255.0
          inet6 addr: fe80::212:17ff:fe3a:c648/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5338 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4690 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:574933 (561.4 KiB)  TX bytes:2397889 (2.2 MiB)

and we can see that the inet6 addr is set to a link-local address, which isn't routable. For troubleshooting, we'll install tcptraceroute6.

opkg install tcptraceroute6

From this thread, we take the script listed and name it /etc/init.d/ipv6:

NOTE: I've made minor changes, altering br0 to br-lan, as the original script was written for the White Russian release of OpenWRT and we're using Kamikaze.

#!/bin/sh /etc/rc.common

#Information from the "Tunnel Details" page
SERVER_v4=Server IPv4 Address
SERVER_v6=Server IPv6 Address

CLIENT_v4=Client IPv4 Address
CLIENT_v6=Client IPv6 Address

# Uncomment if you have a /48
#ROUTED_48=Your /48 netblock's gateway address, e.g., 2001:a:b::1
ROUTED_64=Your /64 netblock's gateway address, e.g., 2001:a:b:c::1

START=50

start() {
	echo "Starting he.net IPv6 tunnel: "
	ip tunnel add henet mode sit remote $SERVER_v4 local $CLIENT_v4 ttl 255
	ip link set henet up

	ip -6 addr add $CLIENT_v6/64 dev henet
	ip -6 ro add default via $SERVER_v6 dev henet

	ip -6 addr add $ROUTED_64/64 dev br-lan
	# Uncomment if you have a /48
        #ip -6 addr add $ROUTED_48/48 dev br-lan
	ip -f inet6 addr

	echo "Done."
}
stop() {
	echo -n "Stopping he.net IPv6 tunnel: "
	ip link set henet down
	ip tunnel del henet

	ip -6 addr delete $ROUTED_64/64 dev br-lan
	# Uncomment if you have a /48
        #ip -6 addr delete $ROUTED_48/48 dev br-lan

	echo "Done."
}
restart() {
	stop
	start
}

We fill in the information available to us from the tunnelbroker.net admin page which lists your existing tunnel configurations.

/etc/init.d/ipv6 start

root@OpenWrt:/etc/init.d# ping6 -c 5 ipv6.google.com
PING ipv6.google.com (2001:4860:8003::63): 56 data bytes
64 bytes from 2001:4860:8003::63: seq=0 ttl=55 time=89.572 ms
64 bytes from 2001:4860:8003::63: seq=1 ttl=55 time=88.701 ms
64 bytes from 2001:4860:8003::63: seq=2 ttl=55 time=121.524 ms
64 bytes from 2001:4860:8003::63: seq=3 ttl=55 time=87.989 ms
64 bytes from 2001:4860:8003::63: seq=4 ttl=55 time=88.010 ms

--- ipv6.google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 87.989/95.159/121.524 ms
root@OpenWrt:/etc/init.d#

And we have IPv6 routing on the router. After we’re sure things are working, we can do:

/etc/init.d/ipv6 enable

which will configure the router to run our script on startup. A slight configuration change on the laptop, and:

tsavo:~ mcd$ ping6 -c 5 ipv6.google.com
PING6(56=40+8+8 bytes) 2001:470:4:590::cd34 --> 2001:4860:8007::67
16 bytes from 2001:4860:8007::67, icmp_seq=0 hlim=54 time=91.914 ms
16 bytes from 2001:4860:8007::67, icmp_seq=1 hlim=54 time=90.727 ms
16 bytes from 2001:4860:8007::67, icmp_seq=2 hlim=54 time=91.214 ms
16 bytes from 2001:4860:8007::67, icmp_seq=3 hlim=54 time=94.121 ms
16 bytes from 2001:4860:8007::67, icmp_seq=4 hlim=54 time=90.975 ms

--- ipv6.l.google.com ping6 statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 90.727/91.790/94.121/1.231 ms
tsavo:~ mcd$

Compared to the tunnel script on the mac, I’ve shaved off about 51ms from each ping – which seems to indicate that the gif0 interface on the mac is a little resource heavy since I am routing through the WRT54GS through a WRT160Nv2 and still getting better ping times.

At this point, it would be wise to install ip6tables, shorewall6-lite or one of the other IPv6 capable firewalls. Configuring those is as easy as it would be on a normal machine, with shorewall probably being the easiest of them to configure.

Not bad for about 40 minutes of work – and now I can add other machines on my network and utilize IPv6.

Conversion to Dual Stack IPv6

Monday, April 25th, 2011

Over the weekend we took a large step and converted 25% of our network over to DualStack IPv4/IPv6. We aren’t just saying that we’re ready for IPv6, we are actually using IPv6 internally and starting to move some public facing sites so that they can serve both IPv4 and IPv6 enabled web surfers. We primarily run Debian, but, have a few Windows, FreeBSD and other OSs. Our current efforts are switching our Debian machines over since that comprises 95% of our network. We run 2.6.38 with a few config changes rather than the default Debian kernel, but, much of the testing was done with a stock Debian kernel.

We ran into a few minor issues, but currently backups for the first group of machines that we converted are running over IPv6. Additionally, email handled by our MX servers is handed off to the machines over IPv6. Currently, only the actual machines have IPv6 addresses, so we don't have many public facing sites running, but of the few that announce both IPv4 and IPv6, they account for almost 4% of our traffic. Clients that access the machines directly for SSH, FTP, POP3, IMAP or SMTP will use IPv6 if they are able. Most clients don't use the actual device name for FTP/POP3/IMAP/SMTP, so most won't use IPv6 until their public facing site is IPv6 enabled.

Our network is relatively flat which makes our deployment a little easier, but, the basic structure is:

edge router -> chassis switch
                        chassis switch
                        chassis switch
                        ...
edge router -> chassis switch

We use VRRP to announce an x:x::1/64; each machine gets a /128 from that /64, and then, using static routes, we route a /64 to each machine. Due to an issue with OSPF3 on our current network, we had to fall back to static routes. Each machine is allocated a /128 from our main network and a /64 for the client. For virtual webhost machines, we might allocate /80s to each virtual client out of that /64, but we haven't made a firm decision on that. We've actually cheated and run IPv6 over its own connections to the chassis switches to make traffic and flow monitoring a little easier.

Our basic network now consists of every machine in a single /64 which cuts down on arp requests and VLAN issues, but, requires a slightly different configuration than our existing IPv4 network which used VLANs.

When we configure a machine, we need to add the admin IP to it and push the config changes using our management software. We’ve not automated putting the initial IP address on each machine as it requires route entries into our edge routers. Once OSPF3 is fixed later this week, I expect the process to be more automated.

The first step is to take a /128 out of the /64 for the ‘device’ network and assign it to the machine:


ifconfig eth0 add xxxx:xxxx::c:1/64
route --inet6 add default gateway xxxx:xxxx::1

We opted to use ::ad:xxxx for admin machines, ::c:xxxx for client servers. Since you can use hexadecimal, you could actually assign cabinet numbers, or switch numbers to make identifying the machine location a little quicker. Perhaps some identifier for the building, cabinet/rack, switch it is connected to, etc. could be used. For now, we’re using :c: and :ad: to signify client and admin. Our primary storage server is :ad:bac1, our development machine is :ad:de, etc. Our admin network is unlikely to exceed 65536 machines, but, there is a lot of flexibility if you want to get creative.

Once we’ve added the initial IP, our management software inserts the following into /etc/network/interfaces:

iface eth0 inet6 static
        address xxxx:xxxx::c:1
        netmask 64
        endpoint any
        gateway xxxx:xxxx::1

At this point, the AAAA record for the device is published, and we can access the machine over ssh using IPv6.

For Postfix, we needed to add the following to /etc/postfix/main.cf:

inet_protocols = ipv4, ipv6

Additionally, we needed to modify /etc/postfix/mynetworks.txt to add:

[xxxx:xxxx::/64]

which allows machines on our local network to communicate with the server and ‘relay’. It is possible that the line to be modified doesn't reference a separate config file and the networks are instead specified directly in /etc/postfix/main.cf:

mynetworks =

Dovecot required changes to /etc/dovecot/dovecot.conf:

listen=[::], *

Pure-FTPD had problems with IPv6 reverse lookups:


echo yes > /etc/pure-ftpd/conf/DontResolve;/etc/init.d/pure-ftpd restart

And of course, /etc/resolv.conf:


nameserver xx.xx.xx.xx
nameserver xx.xx.xx.xx
nameserver xxxx:xxxx::ad:1
nameserver xxxx:xxxx::ad:2

We've had minor customer impact: one email bounced during the conversions because the mynetworks parameter was missing in Postfix. Debian's version of Dovecot doesn't listen on both protocols with listen=[::] as one might imagine from reading the documentation, but that was caught on a test machine and didn't affect any clients.

Many config files, such as Apache's and Varnish's, require [] around IPv6 addresses. That's something to remember when you need to specify Listen directives on machines that have multiple services listening on port 80 on separate IPs.

Most of the server software we’ve run across hasn’t had any issues. However, client software that uses char(15) to store IP addresses probably needs to be fixed.
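
As an illustration of why char(15) is too small (this example isn't from the original post): an IPv6 address needs up to 39 characters in its expanded text form, 45 for IPv4-mapped addresses, or 16 bytes in binary.

import ipaddress

addr = ipaddress.ip_address("2001:db8::ad:1")
print(addr.exploded)       # 2001:0db8:0000:0000:0000:0000:00ad:0001
print(len(addr.exploded))  # 39 characters, far more than char(15) can hold
print(len(addr.packed))    # 16 bytes if you store the binary form instead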

So far, I think we'll be ready for World IPv6 Day with over 98% of our machines running DualStack, and we're shooting for 20% of our clients' public facing sites to have IPv6 support by June 8, 2011.

We have two machines running Tux that are stuck on 2.4.37 and regrettably, Tux appears to segfault when it receives IPv6 traffic. It is a shame since Varnish and Nginx are still outclassed by a webserver written twelve years ago.

So far, the conversion process has been quite straightforward with the minor issues you would expect when introducing IPv6 to applications and server stacks that weren’t written to handle it. I suspect we’ll have 98% of our machines dualstack by May 7 which will give us a month to get 20% of our client base educated and convinced to turn on IPv6.
