Updated WordPress VCL – still not complete, but closer

July 16th, 2011

Worked with a new client this week and needed to get the VCL working for their installation. They were running W3TC, but this VCL should work for anyone running WP-Varnish or any plugin that supports purging. This VCL is for Varnish 2.x.

There are still some tweaks to make, but this appears to be working quite well.

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "10.0.1.100";
    "10.0.1.101";
    "10.0.1.102";
    "10.0.1.103";
    "10.0.1.104";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return(lookup);
  }

  if (req.http.Accept-Encoding) {
#revisit this list
    if (req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$") {
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      remove req.http.Accept-Encoding;
    }
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*$", "");
  }
  if (req.http.cookie) {
    if (req.http.cookie ~ "(wordpress_|wp-settings-)") {
      return(pass);
    } else {
      unset req.http.cookie;
    }
  }
}

sub vcl_fetch {
# this conditional can probably be left out for most installations
# as it can negatively impact sites without purge support. High
# traffic sites might leave it, but, it will remove the WordPress
# 'bar' at the top and you won't have the post 'edit' functions onscreen.
  if (!(req.url ~ "(wp-(login|admin)|login)") && (req.request == "GET")) {
    unset beresp.http.set-cookie;
# If you're not running purge support with a plugin, remove
# this line.
    set beresp.ttl = 5m;
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    set beresp.ttl = 365d;
  }
}

sub vcl_deliver {
# multi-server webfarm? set a variable here so you can check
# the headers to see which frontend served the request
#   set resp.http.X-Server = "server-01";
   if (obj.hits > 0) {
     set resp.http.X-Cache = "HIT";
   } else {
     set resp.http.X-Cache = "MISS";
   }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "OK";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not cached";
  }
}

Google+, Python, and mechanize

July 3rd, 2011

Since Google+’s release, I’ve wanted access to an API. I’m told soon. I couldn’t wait.

#!/usr/bin/env python

import mechanize

cj = mechanize.LWPCookieJar()
try:
    cj.load("cookies.txt")  # reuse session cookies from a previous run
except IOError:
    pass  # first run: cookies.txt doesn't exist yet

br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.810.0 Safari/535.1 cd34/0.9b')]
br.open('https://www.google.com/accounts/ServiceLogin?service=oz&passive=1209600&continue=https://plus.google.com/up/start/')

br.select_form(nr=0)

br.form.find_control("Email").readonly = False
br.form['Email'] = 'email@address.com'
br.form['Passwd'] = 'supersecretpasswordhere'

br.submit()

for l in br.links():
    print l

cj.save("cookies.txt")

When to Cache, What to Cache, How to Cache

June 21st, 2011

This post is a version of the slideshow presentation I did at Hack and Tell in Fort Lauderdale, Florida at The Collide Factory on Saturday, April 2, 2011. These are 5 minute talks where each slide auto-advances after fifteen seconds which limits the amount of detail that can be conveyed.

A brief introduction

What makes a page load quickly? While we can look at various metrics, quite a few things affect pageload times. Even when the server responds quickly, the design of the page often dictates how it renders in the browser, which can make a site appear sluggish. Here, however, we're going to focus on the mechanics of what it takes to serve a page quickly.

The Golden Rule – do as few calculations as possible to hand content to your surfer.

But my site is dynamic!

Do you really need to calculate the last ten posts entered on your blog every time someone visits the page? Surely you could cache that and purge the cache when a new post is entered. When someone adds a new comment, purge the cache and let it be recalculated once.

But my site has user personalization!

Can that personalization be broken into its own section of the webpage? Or is it created by a cacheable function within your application? Even if you don't support fragment caching on the edge, you can emulate it by caching your expensive SQL queries or even portions of your page.

Even writing generated output to a static file and allowing your webserver to serve that static file provides an enormous boost. This is what most of the caching plugins for WordPress do. However, they do page caching, not fragment caching, which means that the two most expensive queries WordPress executes – the Category list and the Tag Cloud – are generated each time a new page is hit, until that page is cached.
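
The static-file approach those plugins take can be sketched in a few lines of Python (the names and the render function here are hypothetical stand-ins, not any particular plugin's code):

```python
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stand-in for the plugin's cache directory

def render_page(slug):
    # stand-in for the expensive WordPress render (DB queries, plugins, templates)
    return "<html><body>post: %s</body></html>" % slug

def serve(slug):
    """Serve from the static copy if present; otherwise render once and save it."""
    path = os.path.join(CACHE_DIR, slug + ".html")
    if os.path.exists(path):
        with open(path) as f:
            return f.read()   # cache hit: a file read, no rendering work
    html = render_page(slug)
    with open(path, "w") as f:
        f.write(html)         # first hit pays the cost; later hits don't
    return html

def purge(slug):
    """Invalidate the static copy when a post or comment changes."""
    path = os.path.join(CACHE_DIR, slug + ".html")
    if os.path.exists(path):
        os.remove(path)
```

The first request for a page pays the rendering cost; every later request is a file read, until a purge forces regeneration.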

One of the problems with high-performance sites is the never-ending quest for a lower Time to First Byte. Each load balancer or proxy in front adds some latency. It also means a page needs to be pre-constructed before it is served, or you need to do a little trickery. Unless you've got plenty of spare computing horsepower, this rules out doing any dynamic processing on the page if you want to hand a response back as quickly as possible.

With this, we’re left with a few options to have a dynamic site that has the performance of a statically generated site.

Amazon was one of the first to embrace the Page and Block method by using Mason, a mod_perl based framework. Each of the blocks on the page was generated ahead of time, and only the personalized blocks were generated ‘late’. This allowed the frontend to assemble these pieces, do minimal work to display the personal recommendations and present the page quickly.

Google took a different approach by having an immense amount of computing horsepower behind their results. Google’s method probably isn’t cost effective for most sites on the Internet.

Facebook developed bigpipe which generates pages and then dynamically loads portions of the page into the DOM units. This makes the page load quickly, but in stages. The viewer sees the rough page quickly as the rest of the page fills in.

The Presentation begins here

Primary Goal

Fast Pageloads – We want the page to load quickly and render quickly so that the websurfer doesn’t abandon the site.

Increased Scalability – Once we get more traffic, we want the site to be able to scale and provide websurfers with a consistent, fast experience while the site grows.

Metrics We Use

Time to First Byte – This is a measure of how quickly the site responds to an incoming request and starts sending the first byte of data. Some sites have to take time to analyze the request, build the page, etc before sending any data. This lag results in the browser sitting there with a blank screen.

Time to Completion – We want the entire page to load quickly enough that the web surfer doesn’t abandon. While we can do some tricky things with chunked encoding to fool websurfers into thinking our page loads more quickly than it really does, for 95% of the sites, this is a decent metric.

Number of Requests – The total number of requests for a page is a good indicator of overall performance. Since most browsers will only request a handful of static assets from a page per hostname, we can use a CDN, embed images in CSS or use Image Sprites to reduce the number of requests.
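
Time to First Byte is easy to measure by hand: time the gap between sending the request and receiving the first response byte. This sketch does that against a throwaway local server so it is self-contained; a real measurement would point at your own host and path:

```python
import http.server
import socket
import threading
import time

def measure_ttfb(host, port, path="/"):
    """Seconds between sending a request and receiving the first response byte."""
    s = socket.create_connection((host, port))
    try:
        start = time.time()
        s.sendall(b"GET " + path.encode() + b" HTTP/1.0\r\nHost: " +
                  host.encode() + b"\r\n\r\n")
        s.recv(1)                     # block until the first byte arrives
        return time.time() - start
    finally:
        s.close()

# demo against a throwaway local server so the sketch runs anywhere
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
ttfb = measure_ttfb("127.0.0.1", server.server_address[1])
server.shutdown()
```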

Why Cache?

Expecting Traffic

When we have an advertising campaign or holiday promotion going on, we don’t know what our expected traffic level might be, so, we need to prepare by having the caching in place.

Receiving Traffic

If we receive unexpected publicity, or our site is listed somewhere, we might cache to allow the existing system to survive a flood of traffic.

Fighting DDOS

When fighting a Distributed Denial of Service attack, we might use caching to keep the backend servers from being overloaded.

Expecting Traffic

There are several types of caching we can do when we expect to receive traffic.

* Page Cache – Varnish/Squid/Nginx provide page caching. A static copy of the rendered page is held and updated from time to time either by the content expiring or being purged from the cache.
* Query Cache – MySQL includes a query cache that can help on repetitive queries.
* Wrap Queries with functions and cache – We can take our queries and write our own caching using a key/value store, avoiding us having to hit the database backend.
* Wrap functions with caching – In Python, we can use Beaker to wrap a decorator around a function which does the caching magic for us. Other languages have similar facilities.
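
The "wrap queries with functions and cache" idea can be sketched like this – a plain dict stands in for memcached/Redis, and the query function is a placeholder:

```python
import time

cache = {}  # stand-in for memcached/Redis

def cached(key, ttl, compute):
    """Return a cached value for `key`, recomputing at most once per `ttl` seconds."""
    hit = cache.get(key)
    if hit is not None and time.time() - hit[0] < ttl:
        return hit[1]                 # fresh enough: skip the database entirely
    value = compute()                 # the expensive SQL query lives here
    cache[key] = (time.time(), value)
    return value

def last_ten_posts():
    # pretend this is SELECT ... ORDER BY post_date DESC LIMIT 10
    return ["post-%d" % i for i in range(10)]

posts = cached("front:last10", 300, last_ten_posts)
```

On a post or comment, you would delete the key instead of waiting out the TTL – that is the purge-on-write pattern described above.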

Receiving Traffic

* Page Caching – When we’re receiving traffic, the easiest thing to do is to put a page cache in place to save the backend/database servers from getting overrun. We lose some of the dynamic aspects, but, the site remains online.

* Fragment Caching – With fragment caching, we can break the page into zones that have separate expiration times or can be purged separately. This can give us a little more control over how interactive and dynamic the site appears while it is receiving traffic.

DDOS Handling

* Slow Client/Vampire Attacks – Certain DDOS attacks cause problems with some webserver software. Recent versions of Apache and most event/poll driven webservers have protection against this.
* Massive Traffic – With some infrastructures, we’re able to filter out the traffic ahead of time – before it hits the backend.

Caching Easy, Purging Hard

Caching is scalable. We can just add more caching servers to the pool and keep scaling to handle increased load. The problem we run into is keeping a site interactive and dynamic as content needs to be updated: at that point, purging/invalidating cached pages or regions requires communication with each cache.

Page Caching

Some of the caching servers that work well are Varnish, Squid and Nginx. Each of these allows you to do page caching, specify expire times, and handle most requests without having to talk to the backend servers.

Fragment Caching

Edge Side Includes or a Page and Block construction allow you to cache pieces of the page, as shown in the diagram linked below. With this, we can individually expire pieces of the page and let our front-end cache, Varnish, reassemble the pieces to serve to the websurfer.

http://www.trygve-lie.com/blog/entry/esi_explained_simple
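
Conceptually, the edge cache's ESI assembly looks like this sketch, which substitutes cached fragments for <esi:include> tags (the cache contents and fragment paths are made up for illustration):

```python
import re

fragment_cache = {
    # hypothetical cached sidebar fragment
    "/esi/sidebar": "<ul><li>categories</li><li>tag cloud</li></ul>",
}

def render_fragment(src):
    """Fetch an ESI fragment from cache, falling back to a backend render."""
    if src in fragment_cache:
        return fragment_cache[src]
    return "<!-- backend render of %s -->" % src

def assemble(page):
    """Replace each <esi:include src="..."/> with its fragment, as the edge would."""
    return re.sub(r'<esi:include src="([^"]+)"\s*/>',
                  lambda m: render_fragment(m.group(1)), page)

page = ('<html><body><div id="content">post body</div>'
        '<esi:include src="/esi/sidebar"/></body></html>')
```

The page and the sidebar can now expire (or be purged) on independent schedules.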

Cache Methods

* Hardware – Hard drives contain caches as do many controller cards.
* SQL Cache – adding memory to keep the indexes in memory or enabling the SQL query cache can help.
* Redis/Memcached – Using a key/value store can keep requests from hitting rotational media (disks).
* Beaker/Functional Caching – Either method can use a key/value store, preferably using RAM rather than disk, to prevent requests from having to hit the database backend.
* Edge/Frontend Caching – We can deploy a cache on the border to reduce the number of requests to the backend.

OS Buffering/Caching

* Hardware Caching on drive – Most hard drives today have caches – finding one with a large cache can help.
* Caching Controller – If you have a large ‘hot set’ of data that changes, a caching controller lets you put a gigabyte or more of RAM in front of the disks. Make sure you get the battery-backup card in case your machine loses power – those disk writes are often reported as completed before they are physically written to the disk.
* Linux/FreeBSD/Solaris/Windows all use RAM for caching

MySQL Query Cache

The MySQL Query cache is simple yet effective. It isn’t smart and doesn’t cache based on query plan, but, if your code base executes queries where the arguments are in the same order, it can be quite a plus. If you are dynamically creating queries, assembling the queries to try and keep the conditions in the same order will help.
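
For example, sorting the condition keys before building the WHERE clause guarantees that identical filters produce byte-identical SQL regardless of the order the application supplied them in (a hypothetical helper, not WordPress code):

```python
def build_where(conditions):
    """Emit WHERE conditions in a fixed (sorted) order so identical filters
    always produce byte-identical SQL, which the MySQL query cache can reuse."""
    parts = ["%s = %r" % (col, val) for col, val in sorted(conditions.items())]
    return "WHERE " + " AND ".join(parts)

a = build_where({"status": "publish", "author": 3})
b = build_where({"author": 3, "status": "publish"})
# a and b are the same string, so the second query hits the cache
```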

Redis/Memcached

* Key Value Store – you can store frequently requested data in memory.
* Nginx can read rendered pages right from Memcached.

Both methods use RAM rather than hitting slower disk media.

Beaker/Functional Caching

With Python, we can use the Beaker decorator to specify caching. This insulates us from having to write our own handler.
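
Beaker's decorators work roughly like this hand-rolled stand-in, which memoizes a function per-arguments with a TTL using an in-process dict instead of Beaker's configurable backends:

```python
import functools
import time

def cache_for(ttl):
    """Decorator that caches a function's result per-arguments for `ttl` seconds.
    A rough stand-in for what Beaker's cache decorators do for us."""
    def wrap(fn):
        store = {}
        @functools.wraps(fn)
        def inner(*args):
            hit = store.get(args)
            if hit is not None and time.time() - hit[0] < ttl:
                return hit[1]
            value = fn(*args)
            store[args] = (time.time(), value)
            return value
        return inner
    return wrap

calls = []  # track how many times the "expensive" function really runs

@cache_for(60)
def tag_cloud(blog_id):
    calls.append(blog_id)
    return "cloud-%d" % blog_id
```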

Edge/Front End Caching

* Define blocks that can be cached, portions of the templates.
* Page Caching
* JSON (CouchDB) – Even JSON responses can run behind Varnish.
* Bigpipe – Cache the page, and allow javascript to assemble the page.

Content Delivery Network (CDN)

When possible, use a Content Delivery Network to store static assets off-net. This adds a separate hostname, and sometimes a separate domain name, which allows most browsers to fetch more resources at the same time. Preferably you want to use a separate domain name that won’t have any cookies set – which cuts down on the size of the requests the browser sends for static assets.

Bigpipe

Facebook uses a technology called Bigpipe which caches the page template and the javascript required to build the page. Once that has loaded, Javascript fetches the data and builds the page. Some of the json data requested is also cached, leading to a very compact page being loaded and built while you’re viewing the page.

Google’s Answer

Google has spent many years building a tremendous distributed computer. When you request a site, their frontend servers use a deadline scheduler and request blocks from their advertising, personalization, search results and other page blocks. The page is then assembled and returned to the web surfer. If any block doesn’t complete quickly enough, it is left out from assembly – which motivates the advertising department to make sure their block renders quickly.

What else can we do?

* Reduce the number of calculations required to serve a page
* Reduce the number of disk operations
* Reduce the network Traffic

In general, do as few calculations as possible while handing the page to the surfer.

WordPress, Varnish and ESI Plugin

June 5th, 2011

This post is a version of the slideshow presentation I did at Hack and Tell in Fort Lauderdale, Florida at The Whitetable Foundation on Saturday, June 4, 2011.

Briefly, I created a Plugin that enabled Fragment Caching with WordPress and Varnish. The problem we ran into with normal page caching methods was related to the fact that this particular client had people visiting many pages per visit, requiring the sidebar to be regenerated on uncached (cold) pages. By caching the sidebar and the page and assembling the page using Edge Side Includes, we can cache the sidebar which contains the most database intensive queries separately from the page. Thus, a visitor moving from one page to a cold page, only needs to wait for the page to generate and pull the sidebar from the cache.

What problem are we solving?

We had a high-traffic, very interactive site where surfers visited multiple pages. Surfers left a lot of comments, which meant we were constantly purging the page cache. This resulted in the sidebar being regenerated numerous times – even when it wasn’t truly necessary.

What are our goals?

First, we want that Time to First Byte to be as quick as possible – surfers hate to wait and if you have a site that takes 12 seconds before they see any visible indication that there is something happening, most will leave.

We needed to keep the site interactive, which meant purging pages from cache when posts were made.

We had to have fast pageloads – accomplished by caching the static version of the page and doing as few calculations as possible to deliver the content.

We needed fast static content loading. Apache does very well, but, isn’t the fastest webserver out there.

How does the WordPress front page work?

The image above is a simple representation of a page that has a header, an article section where three articles are shown, and a sidebar. Each of those elements is built from a number of SQL queries, assembled, and displayed to the surfer. Each plugin that is used – especially filter plugins that look at content and modify it before output – adds a little latency, resulting in a slower page display.

How does an Article page work?

An article page works very similarly to the front page, except our content block now contains only the contents of one post. Sometimes additional plugins are called to display the post content, dealing with comments, social media sharing icons, greetings based on where you’re visiting from (Google, Digg, Reddit, Facebook, etc) and many more. We also see the same sidebar on our site, which contains the site navigation, advertisements and other content.

What Options do we Have?

There are a number of existing caching plugins that I have benchmarked in the past. Notably we have:

* WP-Varnish
* W3 Total Cache
* WP Super Cache
* WordPress-Varnish-ESI
* and many others

Page Caching

With Page Caching, you take the entire generated page and cache it either in ram or on disk. Since the page doesn’t need to be generated from the database, the static version of the page is served much more quickly.

Fragment Caching

With Fragment Caching, we’re able to cache the page and a smaller piece that is often repeated, but, perhaps doesn’t change as often as the page. When a websurfer comments on a post, the sidebar doesn’t need to be regenerated, but, the page does.

WordPress and Varnish

Varnish doesn’t deal well with cookies, and WordPress uses a lot of cookies to maintain information about the current web surfer. Some plugins also add their own cookies to track things so that their plugin works.

Varnish can do domain-name normalization, which may or may not be desired. Many sites redirect the bare domain to www.domain.com. If you do this, you can modify your Varnish Configuration Language (VCL) to make sure it always hands back the proper host header.

There are other issues with Varnish that affect how well it caches. There are a number of situations where Varnish doesn’t work as you would expect, but, this can all be addressed with VCL.

Purging – caching is easy; purging is hard once you graduate beyond a single-server setup.
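
Purging across a farm is just a fan-out of PURGE requests, one per cache. This sketch builds the request for each frontend; the cache list is hypothetical, and the actual socket round trip is injected as `send` so the logic stands alone:

```python
# hypothetical frontend cache list; a real setup would read this from config
CACHES = ["10.0.1.100", "10.0.1.101", "10.0.1.102"]

def purge_url(host, path, send):
    """Issue a PURGE for one URL against every frontend cache.
    `send(cache, request)` performs the HTTP round trip and returns True
    on success; returns the list of caches that acknowledged the purge."""
    ok = []
    for cache in CACHES:
        request = "PURGE %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        if send(cache, request):
            ok.append(cache)
    return ok
```

Every cache you add is one more request per purge – which is why caching scales easily and purging doesn't.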

WordPress and Varnish with ESI

In this case, our plugin caches the page and the sidebar separately, and allows Varnish to assemble the page prior to sending it to the surfer. This is going to be a little slower than pure page caching, but in the long run, if you have a lot of page-to-page traffic, having that sidebar cached will make a significant impact.

Possible Solutions

You could hardcode templates and write modules to cache CPU or Database heavy widgets and in some cases, that is a good solution.

You could create a widget that handles the work to cache existing widgets. There is a plugin called Widget Cache, but, I didn’t find it to have much benefit when testing.

Many of the plugins could be rewritten to use client-side javascript. This way, caching would allow the javascript to be served and the actual computational work would be done on the client’s web browser.

Technical Problems

When the plugin was originally written, Varnish didn’t support compressing ESI assembled pages which resulted in a very difficult to manage infrastructure.

WordPress uses a lot of cookies which need to be dealt with very carefully in Varnish’s configuration.

What sort of Improvement?

Before the ESI Widget              After the ESI Widget
12 seconds time to first byte      .087 seconds time to first byte
.62 requests per second            567 requests per second
Huge number of elements            Moved some elements to a 'CDN' url

WordPress Plugin

In the above picture, we can see the ESI widget has been added to the sidebar, and we’ve added our desired widgets to the new ESI Widget Sidebar.

Varnish VCL – vcl_recv

sub vcl_recv {
    if (req.request == "BAN") {
       ban("req.http.host == " + req.http.host +
           " && req.url == " + req.url);
       error 200 "Ban added";
    }
    if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
      unset req.http.cookie;
      set req.url = regsub(req.url, "\?.*$", "");
    }
    if (!(req.url ~ "wp-(login|admin)")) {
      unset req.http.cookie;
    }
}

In vcl_recv, we set up rules to allow the plugin to purge content, we do a little manipulation to cache static assets and ignore some of the cache breaking arguments specified after the ? and we aggressively remove cookies.

Varnish VCL – vcl_fetch

sub vcl_fetch {
  if (!(req.url ~ "wp-(login|admin)") && (req.request == "GET")) {
    unset beresp.http.set-cookie;
  }
  set beresp.ttl = 12h;

  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    set beresp.ttl = 365d;
  } else {
    set beresp.do_esi = true;
  }
}

Here, we remove cookies set by the backend. We set our TTL to 12 hours, overriding any expire time. Since the widget purges cached content, we can set a longer expiration time – eliminating additional CPU and database work. For static assets, we set a one-year expiration time, and if it isn’t a static asset, we parse it for ESI. The ESI parsing rule needs to be refined considerably, as it currently parses objects that wouldn’t contain ESI.

Did Things Break?

Purging broke things and revealed a bug in PHP’s socket handling.

Posting Comments initially broke as a result of cookie handling that was a little too aggressive.

Certain plugins break that rely on being run on each pageload such as WP Greet Box and many of the Post Count and Statistics plugins.

Apache logs are rendered virtually useless, since most of the requests are handled by Varnish and never hit the backend. You can log from varnishncsa, but Google Analytics or some other web-bug statistics program is a little easier to use.

End Result

Varnish 3.0, currently in beta, allows compression of ESI assembled pages, and, now can accept compressed content from the backend – allowing the Varnish server to exist at a remote location, possibly opening up avenues for companies to provide Varnish hosting in front of your WordPress site using this plugin.

Varnish ESI-powered sites became much easier to deploy with 3.0. Before 3.0, you needed to run Varnish to do the ESI assembly, then chain into some other server like Nginx to compress the page before sending it to the surfer – or you would be stuck handing uncompressed pages to your surfers.

Other Improvements

* Minification/Combining Javascript and CSS
* Proper ordering of included static assets – i.e. include .css files before .js, use Async javascript includes.
* Spriting images – combining smaller images and using CSS to alter the display port resulting in one image being downloaded rather than a dozen tiny social media buttons.
* Inline CSS for images – if your images are small enough, they could be included inline in your CSS – saving an additional fetch for the web browser.
* Multiple sidebars – currently, the ESI widget only handles one sidebar.

How can I get the code?

http://code.google.com/p/wordpress-varnish-esi/

Apache mod_rewrite Performance issue discussion and fix

May 16th, 2011

This weekend I was with a client that was having some issues unrelated to this issue, but, it raised an interesting point. Apache’s handlers have a load order dependent on the modules installed and there are certain modules that slow down apache enough that you want to avoid them on production servers – mod_status being one of those.

The story behind this one is probably something you’ve run into. A WebApp is written for one machine, the client base grows, and it is time to expand. Moving from one server to two is infinitely harder than moving from two to three. However, you have a legacy to support – clients that won’t change the hyperlink pointing to some API you’ve designed – so you use mod_rewrite to fix the problem.

A simple mod_rewrite rule can redirect traffic from the old location to the new location so that you don’t need to worry about clients that aren’t going to change their HTML. Let’s also pretend this app was written before RESTful APIs were handy and we need to pass the query string as well.

RewriteEngine on
RewriteRule ^specialapi.php$ http://newserver.superapp.com/specialapi.php [R=301,L,QSA]

So, after some testing, we’re satisfied that things work as expected and we’re happy that we could split things effectively.
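
The rule can be sanity-checked offline with a small simulation of the match-and-substitute step, including the QSA behaviour of carrying the query string over (this mimics the rule; it is not how Apache evaluates it internally):

```python
import re

def rewrite(path, query=""):
    """Mimic: RewriteRule ^specialapi.php$ http://newserver.superapp.com/specialapi.php [R=301,L,QSA]
    Returns (status, target) when the rule fires, None when it doesn't."""
    if not re.match(r"^specialapi\.php$", path):
        return None                   # rule doesn't fire; request handled locally
    target = "http://newserver.superapp.com/specialapi.php"
    if query:
        target += "?" + query         # QSA: append the original query string
    return 301, target
```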

What happens in a request for that URL

Our original API does some processing of the request based on its query-string arguments and redirects the person elsewhere. When we make a normal request for this object, using strace, we get the following output:

accept(4, {sa_family=AF_INET6, sin6_port=htons(49632), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305564609, 151401}, NULL) = 0
gettimeofday({1305564609, 151686}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 435
gettimeofday({1305564609, 153050}, NULL) = 0
gettimeofday({1305564609, 153303}, NULL) = 0
gettimeofday({1305564609, 153521}, NULL) = 0
gettimeofday({1305564609, 153741}, NULL) = 0
gettimeofday({1305564609, 153933}, NULL) = 0
gettimeofday({1305564609, 154152}, NULL) = 0
gettimeofday({1305564609, 154317}, NULL) = 0
gettimeofday({1305564609, 154533}, NULL) = 0
gettimeofday({1305564609, 154722}, NULL) = 0
gettimeofday({1305564609, 154914}, NULL) = 0
gettimeofday({1305564609, 155103}, NULL) = 0
gettimeofday({1305564609, 155295}, NULL) = 0
gettimeofday({1305564609, 155483}, NULL) = 0
gettimeofday({1305564609, 156089}, NULL) = 0
gettimeofday({1305564609, 156279}, NULL) = 0
gettimeofday({1305564609, 156496}, NULL) = 0
gettimeofday({1305564609, 156685}, NULL) = 0
gettimeofday({1305564609, 156877}, NULL) = 0
gettimeofday({1305564609, 157065}, NULL) = 0
stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
rt_sigaction(SIGPROF, {0xb70c1a60, [PROF], SA_RESTART}, {0xb70c1a60, [PROF], SA_RESTART}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0
umask(077)                              = 022
umask(022)                              = 077
getcwd("/", 4095)                       = 2
chdir("/var/www/uc/test")               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
time(NULL)                              = 1305564609
open("/var/www/uc/test/api.php", O_RDONLY|O_LARGEFILE) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
mmap2(NULL, 22, PROT_READ, MAP_SHARED, 11, 0) = 0xb6de7000
munmap(0xb6de7000, 22)                  = 0
close(11)                               = 0
chdir("/")                              = 0
umask(022)                              = 022
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\247q\340\"", 4)              = 4
close(11)                               = 0
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\216\241*W", 4)               = 4
close(11)                               = 0
open("/dev/urandom", O_RDONLY)          = 11
read(11, "\270\267\22+", 4)             = 4
close(11)                               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
writev(10, [{"HTTP/1.1 200 OK\r\nDate: Mon, 16 M"..., 237}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"+I-.\1\0", 6}, {"\f~\177\330\4\0\0\0", 8}], 4) = 261
gettimeofday({1305564609, 174811}, NULL) = 0
gettimeofday({1305564609, 175003}, NULL) = 0
read(10, 0xb93489e0, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 214) = 214
write(8, "vhost_combined\n", 15)        = 15

Briefly, the request comes in for the asset http://testserver.com/test/api.php, as you can see from the read() call:

read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 435

Apache checks to see if the file exists:

stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

And does something odd:

open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)

Even though the file exists and isn’t a directory, Apache checks for a .htaccess file inside api.php as if it were a directory. This is where part of the issue comes to light.
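
For any request, the set of .htaccess probes is one per path component from the top of the tree down – including one "inside" the final filename, which is the ENOTDIR open() in the trace. A small helper reproduces the list (assuming AllowOverride is in effect from /var/www down, as in the trace):

```python
def htaccess_candidates(root, url_path):
    """List the .htaccess paths Apache probes for a request: one per path
    component, including one inside the final filename (the ENOTDIR case)."""
    current = root.rstrip("/")
    probes = [current + "/.htaccess"]
    for part in url_path.strip("/").split("/"):
        current += "/" + part
        probes.append(current + "/.htaccess")
    return probes
```

Every component costs a stat/open per request, which is why disabling .htaccess lookups (AllowOverride None) is a common production tuning step.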

Eventually, apache changes to the directory and serves the content:

chdir("/var/www/uc/test")               = 0
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
time(NULL)                              = 1305564609
open("/var/www/uc/test/api.php", O_RDONLY|O_LARGEFILE) = 11
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

So, a normal request works, and we’re able to see what Apache is doing. Now, let’s put our modified rule in place to redirect people to the new location:

accept(4, {sa_family=AF_INET6, sin6_port=htons(50286), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305565527, 718766}, NULL) = 0
gettimeofday({1305565527, 718990}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 435
gettimeofday({1305565527, 719683}, NULL) = 0
gettimeofday({1305565527, 719909}, NULL) = 0
gettimeofday({1305565527, 720127}, NULL) = 0
gettimeofday({1305565527, 720347}, NULL) = 0
gettimeofday({1305565527, 720539}, NULL) = 0
gettimeofday({1305565527, 720732}, NULL) = 0
gettimeofday({1305565527, 720921}, NULL) = 0
gettimeofday({1305565527, 721936}, NULL) = 0
gettimeofday({1305565527, 722127}, NULL) = 0
gettimeofday({1305565527, 722343}, NULL) = 0
gettimeofday({1305565527, 722533}, NULL) = 0
gettimeofday({1305565527, 722724}, NULL) = 0
gettimeofday({1305565527, 722913}, NULL) = 0
gettimeofday({1305565527, 723106}, NULL) = 0
gettimeofday({1305565527, 723295}, NULL) = 0
gettimeofday({1305565527, 723487}, NULL) = 0
gettimeofday({1305565527, 723676}, NULL) = 0
gettimeofday({1305565527, 723869}, NULL) = 0
gettimeofday({1305565527, 724058}, NULL) = 0
stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=72, ...}) = 0
read(11, "RewriteEngine on\nRewriteRule ^ap"..., 4096) = 72
read(11, "", 4096)                      = 0
close(11)                               = 0
open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
writev(10, [{"HTTP/1.1 301 Moved Permanently\r\n"..., 303}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"mP\301N\3030\f\275\367+LOpX\334\26\t!\224E\32k\21\2236\250D9p\364\32\263"..., 236}, {"\314\226,\242>\1\0\0", 8}], 4) = 557
gettimeofday({1305565527, 734362}, NULL) = 0
gettimeofday({1305565527, 734577}, NULL) = 0
read(10, 0xb93489e0, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 215) = 215
write(8, "vhost_combined\n", 15)        = 15

In this case, we see something that shouldn’t really happen. Even though our mod_rewrite rule has rewritten the URL, Apache is still checking whether api.php and api.php/.htaccess exist:

stat64("/var/www/uc/test/api.php", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0

open("/var/www/uc/test/api.php/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOTDIR (Not a directory)

So, even with the mod_rewrite rule passing the request over to another machine, Apache still tests for the existence of the file and of a directory named api.php containing a .htaccess file. The latter check is the one we’re going to fix.
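The RewriteRule itself is truncated in the trace above, so its exact target isn’t visible; a rule with this effect might look something like the following (the backend hostname here is purely illustrative, though the [R=301] flag matches the “301 Moved Permanently” seen in the writev() call):

```apache
# /var/www/uc/test/.htaccess -- sketch only; the real rule is
# truncated in the trace and its target host is an assumption.
RewriteEngine on
RewriteRule ^api\.php$ http://backend.example.com/test/api.php [R=301,L]
```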

accept(4, {sa_family=AF_INET6, sin6_port=htons(50516), inet_pton(AF_INET6, "2001:470:5:590::cd34", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 10
fcntl64(10, F_GETFD)                    = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
getsockname(10, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2604:3500::c:21", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl64(10, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1305565791, 419574}, NULL) = 0
gettimeofday({1305565791, 419798}, NULL) = 0
read(10, "GET /test/api.php HTTP/1.1\r\nHost"..., 8000) = 409
gettimeofday({1305565791, 420459}, NULL) = 0
gettimeofday({1305565791, 420687}, NULL) = 0
gettimeofday({1305565791, 420905}, NULL) = 0
gettimeofday({1305565791, 421126}, NULL) = 0
gettimeofday({1305565791, 421319}, NULL) = 0
gettimeofday({1305565791, 421603}, NULL) = 0
gettimeofday({1305565791, 421891}, NULL) = 0
gettimeofday({1305565791, 422112}, NULL) = 0
gettimeofday({1305565791, 422360}, NULL) = 0
gettimeofday({1305565791, 422585}, NULL) = 0
gettimeofday({1305565791, 422809}, NULL) = 0
gettimeofday({1305565791, 423063}, NULL) = 0
gettimeofday({1305565791, 423313}, NULL) = 0
gettimeofday({1305565791, 423567}, NULL) = 0
gettimeofday({1305565791, 423818}, NULL) = 0
gettimeofday({1305565791, 424071}, NULL) = 0
gettimeofday({1305565791, 424297}, NULL) = 0
stat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
lstat64("/var", {st_mode=S_IFDIR|S_ISGID|0755, st_size=148, ...}) = 0
lstat64("/var/www", {st_mode=S_IFDIR|S_ISGID|0711, st_size=78, ...}) = 0
open("/var/www/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
lstat64("/var/www/uc", {st_mode=S_IFDIR|S_ISGID|0755, st_size=4096, ...}) = 0
open("/var/www/uc/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
read(11, "ErrorDocument 404 /index.html\n", 4096) = 30
read(11, "", 4096)                      = 0
close(11)                               = 0
lstat64("/var/www/uc/test", {st_mode=S_IFDIR|S_ISGID|0755, st_size=48, ...}) = 0
open("/var/www/uc/test/.htaccess", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
fcntl64(11, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
fstat64(11, {st_mode=S_IFREG|0644, st_size=72, ...}) = 0
read(11, "RewriteEngine on\nRewriteRule ^ap"..., 4096) = 72
read(11, "", 4096)                      = 0
close(11)                               = 0
lstat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
writev(10, [{"HTTP/1.1 301 Moved Permanently\r\n"..., 303}, {"\37\213\10\0\0\0\0\0\0\3", 10}, {"mP\301N\3030\f\275\367+LOpX\334\26\t!\224E\32k\21\2236\250D9p\364\32\263"..., 236}, {"\314\226,\242>\1\0\0", 8}], 4) = 557
gettimeofday({1305565791, 435764}, NULL) = 0
gettimeofday({1305565791, 435986}, NULL) = 0
read(10, 0xb934a9e8, 8000)              = -1 EAGAIN (Resource temporarily unavailable)
write(7, "2001:470:5:590::cd34 - - [16/May"..., 215) = 215
write(8, "vhost_combined\n", 15)        = 15

So, in this case we’re left with:

stat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)
and
lstat64("/var/www/uc/test/api.php", 0xbf8e9bfc) = -1 ENOENT (No such file or directory)

And we’re no longer trying to open /var/www/uc/test/api.php/.htaccess, so we’ve made the process a little smoother.

Briefly: when you use mod_rewrite to redirect traffic away from an existing file, move the file out of the way to save the extra lookups.
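As a quick sketch (using a throwaway directory rather than the live paths from the trace):

```shell
# Once requests for api.php are redirected elsewhere, the local copy
# only triggers the extra stat()/open() calls seen in the first trace;
# moving it aside eliminates them.
mkdir -p /tmp/rewrite-demo/test
touch /tmp/rewrite-demo/test/api.php
mv /tmp/rewrite-demo/test/api.php /tmp/rewrite-demo/test/api.php.bak
```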

Additionally, you can move your mod_rewrite rules into your server config and set AllowOverride None, which prevents Apache from looking for .htaccess files in each directory along the path. If you serve a lot of static content, this helps considerably.
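In the vhost, that might look like this (the directives are standard Apache; the directory path mirrors the trace, and the backend host is again an assumption):

```apache
<Directory /var/www/uc>
    # With None, Apache stops stat()ing for .htaccess in every
    # directory component of the request path.
    AllowOverride None
</Directory>

# Rules formerly in .htaccess move into the vhost/server config.
# Note: in server context the pattern matches the full URL path.
RewriteEngine on
RewriteRule ^/test/api\.php$ http://backend.example.com/test/api.php [R=301,L]
```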
