Posts Tagged ‘Apache’

Varnish saves the day…. maybe

Tuesday, April 28th, 2009

We had a client with a machine where Apache was being overrun… or so we thought.  Everything pointed at one set of domains, and in particular two sites with 100+ elements on the page: images, CSS, JavaScript and iframes composed their main pages.  Apache was handling things reasonably well, but it was immediately obvious that it could be better.

The conversion to Varnish was quite simple to do, even on a live server: slight modifications to the Apache config so the domains in question listened on port 81, and a quick restart.  Varnish was configured to listen on port 80 of that particular IP, and the startup.vcl file got a small override:

sub vcl_fetch {
  if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    set obj.ttl = 3600s;
  }
}

A one-hour cache, overriding the default of two minutes, should be granular enough to do a bit more good on these sites.  After an hour it was evident that the sites did perform much more quickly, but we still had a load issue.  After digging further into things, some modifications to the Apache config alleviated some of the other load problems.
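For reference, the listener split described above amounts to something like the following; the IP address, VCL path and storage settings here are illustrative rather than the actual values from this server:

# Apache: move the affected vhosts to port 81 on this IP
Listen 66.55.44.33:81

# Varnish 2.x: answer port 80 on the same IP; the backend (port 81) is defined
# in startup.vcl along with the TTL override above
varnishd -a 66.55.44.33:80 -f /etc/varnish/startup.vcl -s file,/var/lib/varnish/storage.bin,1G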

After 5 hours, we ended up with the following statistics from varnish:

0+05:18:24                                                               xxxxxx
Hitrate ratio:       10      100     1000
Hitrate avg:     0.9368   0.9231   0.9156

62576         1.00         3.28 Client connections accepted
466684        57.88        24.43 Client requests received
411765        48.90        21.55 Cache hits
148         0.00         0.01 Cache hits for pass
32018         7.98         1.68 Cache misses
54761         8.98         2.87 Backend connections success
0         0.00         0.00 Backend connections failures
45411         7.98         2.38 Backend connections reuses
48598         7.98         2.54 Backend connections recycles

Varnish is doing a great job.  The site loads considerably faster, but it didn't solve the entire problem.  It reduced the number of Apache processes on that machine from roughly 450 to 170, freed up some RAM for cache, and made the server more responsive, but it probably only accounted for about half of the issue.  The rest was cleaning up some poorly written PHP code, modifying a few MySQL tables and adding some indexes so queries ran more quickly.

After we fixed the code problems, we debated removing Varnish from their configuration.  Varnish bought us time to fix the problem and does give surfers a better experience on the sites, but after the backend changes it is hard to tell whether it makes enough of an impact to justify keeping a non-standard configuration running.  Since Varnish is not caching the main page and is only serving the static elements (the site sets an expire time on each generated page), the only real benefit is removing the need for Apache to serve those static elements.

While testing another application, we were able to override hardcoded expire times and force a minimal cache time.  Even caching a generated page for two minutes can be the difference between a responsive server and a machine struggling to keep up.  Since WordPress, Joomla, Drupal and others set expire times using dates that have already passed, they ensure that the HTML they output is never cached.  Varnish lets us ignore that and set our own cache time, which could save a site that gets hit with a lot of traffic.

sub vcl_fetch {
  if (obj.ttl < 120s) {
    set obj.ttl = 120s;
  }
}

would give us a minimum two-minute cache, which would cut requests to a dynamically generated page considerably.

It is a juggling act.  Where do you make the tradeoff, and what do you accelerate?  Too often the solution to a website's performance problem is to throw more hardware at it.  At some point you have to split the load across multiple servers, which introduces new bottlenecks.  An application designed to run on a single machine is difficult to split across two or more, so we often do what we can to keep things running on one machine.

Nginx after one day and conversion of two more machines

Wednesday, April 8th, 2009

Nginx impressed me with the way it is written, and its performance has impressed me as well.

This client has three machines that ran Apache2-mpm-worker with PHP5 under FastCGI.  While page response time was good, the machines constantly sat at roughly 15% idle CPU, with only about 600-700MB of RAM used for cache.  All of the machines are quad-core with 4GB of RAM, have been running for quite a while, and have been tweaked and tuned along the way.

We started by converting one site on one machine.  The client was so impressed that we converted a second site on the same machine, which had nginx serving about 80mb/sec within minutes of deployment.  The next morning, after we glanced over everything and confirmed that nginx was holding up, we converted the rest of that machine.  Traffic grew almost 20% after that change.

We then looked at the other machines, one of which runs phpAdsNew for a relatively large network of his sites along with the banners served from two of the main sites.  Converting those two over to nginx moved another 50mb/sec of traffic off Apache.  He saw results immediately: faster pageloads on the sites that pull content from a central domain, and banner ads displaying more quickly.  After a few moments of analysis, we decided to swap the entire machine from Apache2 to Nginx.  That process took a few hours due to the number of virtual hosts and the lack of any real script to migrate the configurations, but response time on the sites was definitely faster.  After a little more discussion, rather than give that machine a day to settle in and see whether any problems turned up, we converted his third machine as well.

First response in the morning:

yesterday we sent 69.1k unique surfers to sponsors, that is the highest we have ever done.

Only one of the three machines ran Nginx for that entire 'day'; the second had about 8 hours under Nginx and the third only about 2.

Today the results are fairly clear: traffic is up overall and the machines are much more responsive.  Each machine is now roughly 80% idle and has roughly 2.4GB of memory used for cache.

[Bandwidth graphs from the three machines]

Backups are scheduled at 3am on the boxes, and a few rsync jobs keep some content directories synced between the machines.  You can see the impact on the first graph, where the right-hand side shows a bit more growth.  The machine in the last graph was already running nginx but struggled to push more than 85mb/sec or so.  The middle graph shows a decline, but they believe that is external to this change.  The sites are loading more quickly, they expect traffic to grow quite a bit, and so far they are reporting roughly an 18% increase in clicks to their sponsor.

Varnish and Apache2

Tuesday, April 7th, 2009

One client had some issues with Apache2 and a WordPress site.  WordPress isn't a great performer to begin with, and because this client had multiple domains on the same IP, dropping Nginx in didn't seem like the right way to solve the immediate problem.

First things first, we evaluated where WordPress itself was slow and installed db-cache and wp-cache-2.  (We had tried wp-super-cache but had seen issues with it in some configurations.)  Pageload time immediately dropped from 41 seconds to 11 seconds.  Since the machine was a quad-core with 4GB of RAM and was mostly idle, the only thing left was the 91 page elements being served.  Each pageload, even with pipelining, still seemed to cause some stress.  Two external JavaScripts and one external Flash object delayed rendering; the JavaScripts in particular held up page rendering, which made the site seem even slower than it was.  We made some minor modifications, but while Apache2 was configured to serve things as well as it could, we felt there was still room for improvement.

Having already tested Varnish in front of Apache2, I knew it would make an impact here because of the number of elements on the page and the amount of work Apache did to serve each request.  Varnish and its VCL eliminate a lot of that overhead and should give us capacity for roughly 70% better performance.  For this installation, we removed the problem domain's IP from Apache, ran Varnish on that IP, and used 127.0.0.1 port 80 as the backend.

Converting a live production site is not for the faint of heart, but here are a few notes.

For Apache you’ll want to add a line like this to make sure your logs show the remote IP rather than the IP of the Varnish server:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-A
gent}i\"" varnishcombined

Modify each of the VirtualHost configs to say:

<VirtualHost 127.0.0.1:80>

and change the line for the logfile to say:

CustomLog /var/log/apache2/domain.com-access.log varnishcombined

Add Listen directives so Apache no longer listens on port 80 of the IP address you want Varnish to answer, and comment out the default Listen 80:

#Listen 80
Listen 127.0.0.1:80
Listen 66.55.44.33:80

Configuration changes for Varnish:

backend default {
.host = "127.0.0.1";
.port = "80";
}

sub vcl_recv {
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|mp3|mp4|m4a|ogg|mov|avi|wmv)$") {
      lookup;
  }

  if (req.url ~ "\.(css|js)$") {
      lookup;
  }
}
sub vcl_fetch {
        if( req.request != "POST" )
        {
                unset obj.http.set-cookie;
        }

        set obj.ttl = 600s;
        set obj.prefetch =  -30s;
        deliver;
}

Shut down Apache, start it back up with the new configuration, then start Varnish.

tail -f the Apache logfile for one of the domains you have moved and go to the site.  Varnish will fetch everything from the backend the first time, but successive reloads shouldn't show requests for images, JavaScript or CSS.  For this client we opted to hold things in cache for 10 minutes (600 seconds).
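A quick way to sanity-check this (the log path matches the example above; the image URL is a placeholder):

tail -f /var/log/apache2/domain.com-access.log

# in another terminal: after the first fetch, repeat requests for a static
# element should be answered by Varnish (look for X-Varnish and Via headers
# in the response) and should stop appearing in the Apache log
curl -I http://www.domain.com/images/header.jpg
curl -I http://www.domain.com/images/header.jpg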

Overall, the process was rather seamless.  Unlike converting a site to Nginx, we did not have to change the rewrite rules or set up a FastCGI server to answer .php requests.  Clients do lose the ability to do some things, like denying hotlinking, but Varnish runs almost invisibly.  Short of pages loading considerably faster, the client was not aware we had made any server changes, and that is the true measure of success.

First Impressions of Nginx

Monday, April 6th, 2009

When I did the testing last week, I didn't expect overly dramatic results.  Replacing Apache with a FastCGI/PHP setup did seem to make sense, and nginx is clearly designed to handle this sort of load well.

Converting one virtual host on that machine hit a few minor snags.  Rewrite rules are a little different, and while our conversion of those rules mostly worked, a few syntax differences cropped up and needed slight adjustments.

RewriteRule ^external/([0-9]+)/? external.php?vid=$1 [L]

changes to

rewrite "^/external/([0-9]+)/?" /external.php?vid=$1 last;

The leading / is now required in both the pattern and the target, but the rule converts over fairly cleanly.

Performance tuning is tricky at best, since there aren't many documents that explain the different config directives, and very few that explain how to diagnose and tune.  Most of it is trial and error: watching the processes, watching the logs, seeing how the system reacts and making adjustments.
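For a rough idea of the knobs involved, the directives we end up adjusting most often look something like this; the numbers are illustrative starting points rather than the values used on these machines:

worker_processes  4;              # roughly one per core on a quad-core box

events {
        worker_connections  1024; # raise if many concurrent clients are expected
}

http {
        sendfile           on;    # let the kernel handle static file transfers
        keepalive_timeout  15;
        gzip               on;
}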

Virtual host configuration was a challenge at first, as the documentation assumes the machine will simply listen on port 80 on every IP.  When the machine shares IPs with Apache, which is also answering on port 80, and you're only moving a few sites over, you need to make some minor changes.

server {
        listen 66.55.44.33:80 default;
        server_name  _;

        access_log  /var/log/nginx/access.log;

        location / {
                root   /var/www/uc;
                index  index.html;
        }
}

server {
        listen 66.55.44.33:80;
        server_name  www.domain.com domain.com;
        ...
}

Binding each server block to a specific IP and port like this made virtual hosting work when nginx can only listen on a few of the machine's IPs.

Overall, I am reasonably impressed with Nginx.  The machine we upgraded was pushing about 65mb/sec with a load of 15 and roughly 15% idle CPU.  After moving two domains over to Nginx, it almost instantly climbed to 80mb/sec with a load of 2 and roughly 85% idle.  System cache went from 880MB to 2.7GB, and the number of Apache processes dropped from 350 to 40.  The machine is incredibly responsive now and pages load almost instantly.

I'll continue to monitor it, but at this point it looks like a userland process can finally challenge Tux's performance.

Apache, Varnish, nginx and lighttpd

Wednesday, April 1st, 2009

I've never been happy with Apache's performance.  It always seemed to have problems with high-volume sites; even extremely tweaked configurations performed well only up to a point, after which more hardware was required to keep going.  I had been a huge fan of Tux, but sadly Tux doesn't work well with Linux 2.6 kernels.

So the search was on.  I've used many webservers over the years, from AOLServer to Paster to Caudium, looking for a robust, high-performance solution.  I've debated caching servers in front of Apache, and a separate server just for static files with the web sites coded to use it, but I never really found the ultimate solution for these particular requirements.

The current problem is a PHP-driven site with roughly 100 page elements plus the generated page itself.  The site receives quite a bit of traffic, and we've had to tweak Apache well beyond our default configuration to keep the machine performing well.

Apache can be run many different ways.  Generally, when a site uses PHP we run mod_php because it is faster.  eAccelerator can help, though it creates a few small problems of its own; in general, Apache-mpm-prefork runs quite well.  On sites where we've had traffic issues, we've switched to Apache-mpm-worker with a FastCGI PHP process, which works quite well even though PHP scripts run slightly slower.

After considerable testing, I came up with three decent metrics to judge things by.  Almost all testing was done with ab (ApacheBench) running 10000 requests with keepalives and 50 concurrent sessions, from a dual quad-core Xeon machine to a gigE-connected Core2Quad machine on the same switch.  The first IP ran bare Apache, the second lighttpd, the third nginx and the fourth Varnish in front of Apache.  Everything was set up so that no daemon restarts would be needed between tests, and each test was run twice, with the second (generally higher) result being used.  The Linux kernel does some caching, and we're after the performance once the kernel has cached what it can, Apache has forked its processes and hasn't killed off the children, and so on.

First impressions: Apache-mpm-prefork handles PHP exceedingly well but has never had great performance with static files.  This is why Tux prevailed for us; Apache handled what it did best and Tux handled what it did best.  Regrettably, Tux didn't keep up with the 2.6 kernel and development ceased.  With new hardware, the 2.6 kernel and userland access to sendfile(), large-file transfer should be nearly identical for all of these servers, so startup latency on tiny files is what really seemed to hurt Apache.  Apache-mpm-worker with PHP running as FastCGI has always been a fallback for us to gain a little more serving capacity, as most sites have a relatively heavy ratio of static to dynamic requests.

But Apache seemed to have problems with the type of traffic our clients push, and we felt there had to be a better way.  I've read page after page of people complaining that their Drupal installation could only take 50 users until they upgraded to nginx or lighttpd and their site stopped running into swap.  If your server has problems with 50 simultaneous users under Apache, you have serious problems with your setup.  It is not uncommon for us to push a P4/3.0GHz with 2GB of RAM to 80mb/sec of traffic with MySQL running 1000 queries per second, where the Apache logfile for one domain reaches 6GB/day, not counting the other 30 domains configured on the machine.  vBulletin will easily run 350 online users and 250 guests on the same hardware without any difficulty, and the same goes for Joomla, Drupal and the other CMS products out there.  If you can't run 50 simultaneous users with any of those products, dig into the configs FIRST so that you are comparing a tuned configuration to a tuned configuration.

Uptime: 593254  Threads: 571  Questions: 609585858  Slow queries: 1680967  Opens: 27182  Flush tables: 1  Open tables: 2337  Queries per second avg: 1027.529
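To be concrete about what digging into the configs means, tuning usually starts with the MPM and keepalive settings.  A prefork sketch along these lines is typical; every number here is illustrative, not a recommendation for any particular machine:

<IfModule mpm_prefork_module>
    StartServers            20
    MinSpareServers         20
    MaxSpareServers         40
    MaxClients             350
    MaxRequestsPerChild   2000
</IfModule>

KeepAlive          On
KeepAliveTimeout   3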


Based on all of my reading, I expected Varnish in front of Apache2 to be the fastest, followed by nginx, lighttpd and bare Apache.  Lighttpd has some design issues that I believed would put it behind nginx, and I really expected Varnish to do well.  This client needed FLV streaming, so I knew I would be running nginx or lighttpd as the backend for the .flv files and contemplated running Varnish in front of whichever performed best.  Serving the .flv files from a different domain was no problem for this client, so we weren't locking ourselves into a solution we couldn't change later.

The testing methodology was numerous runs of ab while I tested and tweaked each setup.  I am reasonably sure that someone with deep knowledge of Varnish, nginx or lighttpd would not be able to change the results substantially.  Picking out the three or four valid pieces of information from all of that testing to give a generalized result was difficult.

The first metric was raw speed on a small 6.3KB file with keepalives enabled, which made a good starting point.  The second was a page that called phpinfo(); not an exceedingly difficult test, but it at least starts the PHP engine, processes a page and returns the result.  The third was downloading a 21MB .flv file.  All of the tests ran 10000 iterations with 50 concurrent threads, except the 21MB .flv test, which ran 100 iterations with 10 concurrent threads due to the time it took.
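A typical invocation for these runs, against hypothetical test URLs on the target machine, looks like this:

# small-file and phpinfo() tests: 10000 requests, 50 concurrent, keepalives on
ab -k -n 10000 -c 50 http://66.55.44.33/small.html
ab -k -n 10000 -c 50 http://66.55.44.33/info.php

# the 21MB .flv test: fewer iterations because of the transfer time
ab -k -n 100 -c 10 http://66.55.44.33/video.flv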

Server               Small file (req/sec)   phpinfo() (req/sec)   .flv (MB/sec)   Min/max time to serve .flv   Time to run ab (.flv test)
Apache-mpm-prefork   1000                   164                   11.5            10-26 seconds                182 seconds
Apache-mpm-worker    1042                   132                   11.5            11-25 seconds                181 seconds
Lighttpd             1333                   181                   11.4            13-23 seconds                190 seconds
nginx                1800                   195                   11.5            14-24 seconds                187 seconds
Varnish              1701                   198                   11.3            18-30 seconds                188 seconds

Granted, I expected more from Varnish, though its caching nature does shine through.  It is considerably more powerful than nginx thanks to internal features for load balancing, multiple backends and so on.  However, based on the results above, I have to conclude that in this case nginx wins.

A few things about the nginx documentation were confusing.  First, the examples use an inet (TCP) socket rather than a local Unix socket to talk to the php-cgi process; changing that alone bumped PHP up by almost 30 requests per second.  The documentation is sometimes very terse, and it took a bit more time to get things configured correctly.  While I have both PHP and Perl CGI working with nginx natively, some Perl CGI scripts still have minor issues that I'm working out.
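For reference, the Unix-socket variant of the PHP hand-off looks roughly like this; the socket path and document root are assumptions for illustration:

location ~ \.php$ {
        include         fastcgi_params;
        fastcgi_pass    unix:/var/run/php-fastcgi.sock;   # local socket rather than an inet socket such as 127.0.0.1:9000
        fastcgi_index   index.php;
        fastcgi_param   SCRIPT_FILENAME  /var/www/site$fastcgi_script_name;
}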

Lighttpd performed about as well as I expected.  Some of its backend design decisions made me believe it wouldn't be the top performer, and it is older and more mature than Nginx and Varnish, which use today's tricks to accomplish their magic.  File transfer speed is going to end up roughly the same across the board because the Linux kernel exposes APIs, such as sendfile(), that let a userspace application ask the kernel to handle the transfer, and every application tested takes advantage of this.

Given the choice between Varnish and Nginx for a project that didn't require .flv streaming, I might consider Varnish.  Lighttpd has one very interesting module that prevents hotlinking in a much different manner than usual; I'll be testing that for another application.  If you are used to Apache mod_rewrite rules, Nginx and Lighttpd structure theirs completely differently, although they work in almost the same way with some minor syntax changes.  Varnish runs as a cache in front of your site, so everything works the same way it does under Apache, since Varnish merely connects to your Apache backend and caches what it can, and its configuration language allows considerable control over the process.

Aside from a few minor configuration tweaks, this particular client will be getting nginx.

Overall, I don't believe you can take a one-size-fits-all approach to webservers.  Every client's requirements are different, and they don't all fit the same category.  If you run your own web server, you can make choices to ensure your site runs as well as it can.  Judging by the number of pages showing stellar performance gains from switching away from Apache, if most of those writers spent the same time debugging their Apache installation as they did migrating to a new web server, I imagine 90% of them would find that Apache meets their needs just fine.

The default out-of-the-box configuration of MySQL and Apache in most Linux distributions leaves a lot to be desired.  Comparing those configurations against the saner defaults shipped by the developers of competing products doesn't give a fair comparison.  I use Debian, and its default configurations for Apache, MySQL and a number of other applications are terrible for any sort of production use; even Red Hat ships fairly poor defaults for many of the applications you would use to serve your website.  Do yourself a favor and do a little performance tuning on your current setup before you start making changes.  You might find the time well invested.
