<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Random Musings of an Insane Mind &#187; Scalability</title>
	<atom:link href="http://cd34.com/blog/category/scalability/feed/" rel="self" type="application/rss+xml" />
	<link>http://cd34.com/blog</link>
	<description>This is my blog, there are many others like it but this one is mine.</description>
	<lastBuildDate>Tue, 29 Jun 2010 04:22:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>WordPress Cache Plugin Benchmarks</title>
		<link>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/</link>
		<comments>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 15:55:03 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=900</guid>
		<description><![CDATA[A lot of time and effort goes into keeping a WordPress site alive when it starts to accumulate traffic. While not every site has the same goals, keeping a site responsive and online is the number one priority. When a surfer requests the page, it should load quickly and be responsive. Each addon handles caching [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of time and effort goes into keeping a WordPress site alive when it starts to accumulate traffic.  While not every site has the same goals, keeping a site responsive and online is the number one priority.  When a surfer requests the page, it should load quickly and be responsive.  Each addon handles caching a little differently and should be used in different cases.</p>
<p>For many sites, page caching will provide decent performance.  Once your site starts receiving comments, or people log in, many cache solutions cache too heavily or not enough.  The sheer number of solutions available makes it obvious that WordPress underperforms in higher traffic situations.</p>
<p>The list of caching addons that we&#8217;re testing:</p>
<p>* <a href="http://wordpress.org/extend/plugins/db-cache/">DB Cache</a> (version 0.6)<br />
* <a href="http://wordpress.org/extend/plugins/db-cache-reloaded/">DB Cache Reloaded</a> (version 2.0.2)<br />
* <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3 Total Cache</a> (version 0.8.5.1)<br />
* <a href="http://wordpress.org/extend/plugins/wp-cache/">WP Cache</a> (version 2.1.2)<br />
* <a href="http://wordpress.org/extend/plugins/wp-super-cache/">WP Super Cache</a> (version 0.9.9)<br />
* <a href="http://wordpress.org/extend/plugins/wp-widget-cache/">WP Widget Cache</a> (version 0.25.2)<br />
* <a href="http://wordpress.org/extend/plugins/wp-file-cache/">WP File Cache</a> (version 1.2.5)<br />
* <a href="http://github.com/pkhamre/wp-varnish">WP Varnish</a> (in beta)<br />
* <a href="http://cd34.com/blog/scalability/wordpress-varnish-and-edge-side-includes/">WP Varnish ESI Widget</a> (in beta)</p>
<h2>What are we testing?</h2>
<p>* Frontpage hits<br />
* http_load through a series of URLs</p>
<p>We take two measurements.  The cold start measurement is taken after any plugin cache has been cleared and Apache2 and MySQL have been restarted.  A 30-second pause is inserted prior to starting the tests.  Using ApacheBench (ab), we request the front page 1000 times over 10 parallel connections.  We then repeat that test after Apache2 and the caching solution have had time to cache that page; this is the warm start measurement.  Afterwards, http_load requests a series of 30 URLs to simulate people surfing other pages.  Between those two measurements, we should have a pretty good indicator of how well a site is going to perform in real life.</p>
<h2>What does the Test Environment look like?</h2>
<p>* Debian Squeeze VPS<br />
* Linux Kernel 2.6.33<br />
* Single core of a Xen Virtualized Xeon X3220 (2.40GHz)<br />
* 2GB RAM<br />
* CoW file is written on a RAID-10 system using 4x1TB 7200RPM drives<br />
* Apache 2.2.14 mpm-prefork<br />
* PHP 5.3.1<br />
* <a href="http://svn.automattic.com/wpcom-themes/test-data.2008-12-22.xml">WordPress Theme Test Data</a><br />
* Tests are performed from a quad-core Xeon machine connected via 1000BASE-T on the same switch and /24 as the VPS machine</p>
<p>This setup is designed to replicate what most people might choose to host a reasonably popular WordPress site.</p>
<h2><a name="tldr">tl;dr Results</a></h2>
<p>If you aren&#8217;t using Varnish in front of your web site, the clear winner is W3 Total Cache using Page Caching &#8211; Disk (Enhanced), Minify Caching &#8211; Alternative PHP Cache (APC), Database Caching &#8211; Alternative PHP Cache (APC).  </p>
<p>If you can use Varnish, WP Varnish would be a very simple way to gain quite a bit of performance while maintaining interactivity.  WP Varnish purges the cache when posts are made, allowing the site to be more dynamic and not suffer from the long cache delay before a page is updated.</p>
<p>W3 Total Cache has a number of options and sometimes settings can be quite detrimental to site performance.  If you can&#8217;t use APC caching or Memcached for caching Database queries or Minification, turn both off.  W3 Total Cache&#8217;s interface is overwhelming but the plugin author has indicated that he&#8217;ll be making a new &#8216;Wizard&#8217; configuration menu in the next version along with Fragment Caching.</p>
<p>WP Super Cache isn&#8217;t too far behind and is also a reasonable alternative.</p>
<p>Either way, if you want your site to survive, you need to use a cache addon.  Going from 2.5 requests per second to 800+ requests per second makes a considerable difference in the usability of your site for visitors.  Logged in users and search engine bots still see uncached/live results, so you don&#8217;t need to worry that your site won&#8217;t be indexed properly.</p>
<h2>Results</h2>
<style>
.tborder { border:1px solid #000; }
.th { background-color: #ccc; }
.teven { background-color: #ddd; }
.tdnum { text-align: right; }
.trecommend { background-color: #cfc; }
.thonmen { background-color: #ffc; }
.tsmall { font-size: 8pt; }
</style>
<p>Sorted in ascending order of overall performance.  For each addon, the first row is the cold start and the second row is the warm start.</p>
<table class="tborder">
<tr class="th">
<td>Addon</td>
<td>Apachebench</td>
<td colspan="2">Cold Start<br />Warm Start</td>
<td>http_load</td>
<td colspan="2">Cold Start<br />Warm Start</td>
</tr>
<tr class="th">
<td></td>
<td>Req/Second</td>
<td>Time/Request (ms)</td>
<td>50% within x ms</td>
<td>Fetches/Second</td>
<td>Min First Response (ms)</td>
<td>Avg First Response (ms)</td>
</tr>
<tr>
<td>Baseline</td>
<td class="tdnum">4.97</td>
<td class="tdnum">201.006</td>
<td class="tdnum">2004</td>
<td class="tdnum">15.1021</td>
<td class="tdnum">335.708</td>
<td class="tdnum">583.363</td>
</tr>
<tr>
<td class="tdnum"></td>
<td class="tdnum">5.00</td>
<td class="tdnum">200.089</td>
<td class="tdnum">2000</td>
<td class="tdnum">15.1712</td>
<td class="tdnum">304.446</td>
<td class="tdnum">583.684</td>
</tr>
<tr class="teven">
<td>DB Cache</td>
<td class="tdnum">4.80</td>
<td class="tdnum">208.436</td>
<td class="tdnum">2087</td>
<td class="tdnum">15.1021</td>
<td class="tdnum">335.708</td>
<td class="tdnum">583.363</td>
</tr>
<tr class="teven">
<td class="tsmall">Cached all SQL queries</td>
<td class="tdnum">4.81</td>
<td class="tdnum">207.776</td>
<td class="tdnum">2091</td>
<td class="tdnum">15.1712</td>
<td class="tdnum">304.446</td>
<td class="tdnum">583.684</td>
</tr>
<tr>
<td>DB Cache</td>
<td class="tdnum">4.87</td>
<td class="tdnum">205.250</td>
<td class="tdnum">2035</td>
<td class="tdnum">14.1992</td>
<td class="tdnum">302.335</td>
<td class="tdnum">621.092</td>
</tr>
<tr>
<td class="tsmall">Out of Box config</td>
<td class="tdnum">4.94</td>
<td class="tdnum">202.624</td>
<td class="tdnum">2026</td>
<td class="tdnum">14.432</td>
<td class="tdnum">114.983</td>
<td class="tdnum">618.434</td>
</tr>
<tr class="teven">
<td>WP File Cache</td>
<td class="tdnum">4.95</td>
<td class="tdnum">201.890</td>
<td class="tdnum">2009</td>
<td class="tdnum">15.8869</td>
<td class="tdnum">158.597</td>
<td class="tdnum">549.176</td>
</tr>
<tr class="teven">
<td></td>
<td class="tdnum">4.99</td>
<td class="tdnum">200.211</td>
<td class="tdnum">2004</td>
<td class="tdnum">16.1758</td>
<td class="tdnum">99.728</td>
<td class="tdnum">544.107</td>
</tr>
<tr>
<td>DB Cache Reloaded</td>
<td class="tdnum">5.02</td>
<td class="tdnum">199.387</td>
<td class="tdnum">1983</td>
<td class="tdnum">15.0167</td>
<td class="tdnum">187.343</td>
<td class="tdnum">589.196</td>
</tr>
<tr>
<td class="tsmall">All SQL Queries Cached</td>
<td class="tdnum">5.03</td>
<td class="tdnum">200.089</td>
<td class="tdnum">1985</td>
<td class="tdnum">14.9233</td>
<td class="tdnum">150.145</td>
<td class="tdnum">586.443</td>
</tr>
<tr class="teven">
<td>DB Cache Reloaded</td>
<td class="tdnum">5.06</td>
<td class="tdnum">197.636</td>
<td class="tdnum">1968</td>
<td class="tdnum">14.9697</td>
<td class="tdnum">174.857</td>
<td class="tdnum">589.161</td>
</tr>
<tr class="teven">
<td class="tsmall">Out of Box config</td>
<td class="tdnum">5.08</td>
<td class="tdnum">196.980</td>
<td class="tdnum">1968</td>
<td class="tdnum">15.181</td>
<td class="tdnum">257.533</td>
<td class="tdnum">587.737</td>
</tr>
<tr>
<td>WP Widget Cache</td>
<td class="tdnum">6.667</td>
<td class="tdnum">149.903</td>
<td class="tdnum">1492</td>
<td class="tdnum">15.0264</td>
<td class="tdnum">245.332</td>
<td class="tdnum">602.039</td>
</tr>
<tr>
<td class="tdnum"></td>
<td class="tdnum">6.72</td>
<td class="tdnum">148.734</td>
<td class="tdnum">1487</td>
<td class="tdnum">15.1887</td>
<td class="tdnum">299.65</td>
<td class="tdnum">598.017</td>
</tr>
<tr class="teven">
<td>W3 Total Cache</td>
<td class="tdnum">153.45</td>
<td class="tdnum">65.167</td>
<td class="tdnum">60</td>
<td class="tdnum">133.1898</td>
<td class="tdnum">8.916</td>
<td class="tdnum">85.7177</td>
</tr>
<tr class="teven">
<td class="tsmall">DB Cache off, Page Caching with Memcached</td>
<td class="tdnum">169.46</td>
<td class="tdnum">59.011</td>
<td class="tdnum">57</td>
<td class="tdnum">188.4</td>
<td class="tdnum">9.107</td>
<td class="tdnum">50.142</td>
</tr>
<tr>
<td>W3 Total Cache</td>
<td class="tdnum">173.49</td>
<td class="tdnum">57.639</td>
<td class="tdnum">52</td>
<td class="tdnum">108.898</td>
<td class="tdnum">7.668</td>
<td class="tdnum">86.4077</td>
</tr>
<tr>
<td class="tsmall">DB Cache off, Minify Cache with Memcached</td>
<td class="tdnum">189.76</td>
<td class="tdnum">52.698</td>
<td class="tdnum">48</td>
<td class="tdnum">203.522</td>
<td class="tdnum">8.122</td>
<td class="tdnum">43.8795</td>
</tr>
<tr class="teven">
<td>W3 Total Cache</td>
<td class="tdnum">171.34</td>
<td class="tdnum">58.364</td>
<td class="tdnum">50</td>
<td class="tdnum">203.718</td>
<td class="tdnum">8.097</td>
<td class="tdnum">44.1234</td>
</tr>
<tr class="teven">
<td class="tsmall">DB Cache using Memcached</td>
<td class="tdnum">190.01</td>
<td class="tdnum">52.269</td>
<td class="tdnum">48</td>
<td class="tdnum">206.187</td>
<td class="tdnum">8.186</td>
<td class="tdnum">42.4438</td>
</tr>
<tr>
<td>W3 Total Cache</td>
<td class="tdnum">175.29</td>
<td class="tdnum">57.048</td>
<td class="tdnum">48</td>
<td class="tdnum">87.423</td>
<td class="tdnum">7.515</td>
<td class="tdnum">107.973</td>
</tr>
<tr>
<td class="tsmall">Out of Box config</td>
<td class="tdnum">191.15</td>
<td class="tdnum">52.314</td>
<td class="tdnum">47</td>
<td class="tdnum">204.387</td>
<td class="tdnum">8.288</td>
<td class="tdnum">43.217</td>
</tr>
<tr class="teven">
<td>W3 Total Cache</td>
<td class="tdnum">175.29</td>
<td class="tdnum">57.047</td>
<td class="tdnum">51</td>
<td class="tdnum">204.557</td>
<td class="tdnum">8.199</td>
<td class="tdnum">42.9365</td>
</tr>
<tr class="teven">
<td class="tsmall">Database Cache using APC</td>
<td class="tdnum">191.19</td>
<td class="tdnum">52.304</td>
<td class="tdnum">48</td>
<td class="tdnum">200.612</td>
<td class="tdnum">8.11</td>
<td class="tdnum">44.6691</td>
</tr>
<tr>
<td>W3 Total Cache</td>
<td class="tdnum">114.02</td>
<td class="tdnum">87.703</td>
<td class="tdnum">49</td>
<td class="tdnum">114.393</td>
<td class="tdnum">8.206</td>
<td class="tdnum">82.0678</td>
</tr>
<tr>
<td class="tsmall">Database Cache Disabled</td>
<td class="tdnum">191.76</td>
<td class="tdnum">52.150</td>
<td class="tdnum">49</td>
<td class="tdnum">203.781</td>
<td class="tdnum">8.095</td>
<td class="tdnum">42.558</td>
</tr>
<tr class="teven">
<td>W3 Total Cache</td>
<td class="tdnum">175.80</td>
<td class="tdnum">56.884</td>
<td class="tdnum">51</td>
<td class="tdnum">107.842</td>
<td class="tdnum">7.281</td>
<td class="tdnum">87.2761</td>
</tr>
<tr class="teven">
<td class="tsmall">Database Cache Disabled, Minify Cache using APC</td>
<td class="tdnum">192.01</td>
<td class="tdnum">52.082</td>
<td class="tdnum">50</td>
<td class="tdnum">205.66</td>
<td class="tdnum">8.244</td>
<td class="tdnum">43.1231</td>
</tr>
<tr>
<td>W3 Total Cache</td>
<td class="tdnum">104.90</td>
<td class="tdnum">95.325</td>
<td class="tdnum">51</td>
<td class="tdnum">123.041</td>
<td class="tdnum">7.868</td>
<td class="tdnum">74.5887</td>
</tr>
<tr>
<td class="tsmall">Database Cache Disabled, Page Caching using APC</td>
<td class="tdnum">197.55</td>
<td class="tdnum">50.620</td>
<td class="tdnum">46</td>
<td class="tdnum">210.445</td>
<td class="tdnum">7.907</td>
<td class="tdnum">41.4102</td>
</tr>
<tr class="teven">
<td>WP Super Cache</td>
<td class="tdnum">336.88</td>
<td class="tdnum">2.968</td>
<td class="tdnum">16</td>
<td class="tdnum">15.1021</td>
<td class="tdnum">335.708</td>
<td class="tdnum">583.363</td>
</tr>
<tr class="teven">
<td class="tsmall">Out of Box config, Half On</td>
<td class="tdnum">391.59</td>
<td class="tdnum">2.554</td>
<td class="tdnum">16</td>
<td class="tdnum">15.1712</td>
<td class="tdnum">304.446</td>
<td class="tdnum">583.684</td>
</tr>
<tr>
<td>WP Cache</td>
<td class="tdnum">161.63</td>
<td class="tdnum">6.187</td>
<td class="tdnum">12</td>
<td class="tdnum">15.1021</td>
<td class="tdnum">335.708</td>
<td class="tdnum">583.363</td>
</tr>
<tr>
<td></td>
<td class="tdnum">482.29</td>
<td class="tdnum">20.735</td>
<td class="tdnum">11</td>
<td class="tdnum">15.1712</td>
<td class="tdnum">304.446</td>
<td class="tdnum">583.684</td>
</tr>
<tr class="teven">
<td>WP Super Cache</td>
<td class="tdnum">919.11</td>
<td class="tdnum">1.088</td>
<td class="tdnum">3</td>
<td class="tdnum">190.117</td>
<td class="tdnum">1.473</td>
<td class="tdnum">47.9367</td>
</tr>
<tr class="teven">
<td class="tsmall">Full on, Lockdown mode</td>
<td class="tdnum">965.69</td>
<td class="tdnum">1.036</td>
<td class="tdnum">3</td>
<td class="tdnum">975.979</td>
<td class="tdnum">1.455</td>
<td class="tdnum">9.67185</td>
</tr>
<tr class="thonmen">
<td>WP Super Cache</td>
<td class="tdnum">928.45</td>
<td class="tdnum">1.077</td>
<td class="tdnum">3</td>
<td class="tdnum">210.106</td>
<td class="tdnum">1.468</td>
<td class="tdnum">43.8167</td>
</tr>
<tr class="thonmen">
<td class="tsmall">Full on</td>
<td class="tdnum">970.45</td>
<td class="tdnum">1.030</td>
<td class="tdnum">3</td>
<td class="tdnum">969.256</td>
<td class="tdnum">1.488</td>
<td class="tdnum">9.78753</td>
</tr>
<tr class="teven">
<td>W3 Total Cache</td>
<td class="tdnum">1143.94</td>
<td class="tdnum">8.742</td>
<td class="tdnum">2</td>
<td class="tdnum">165.547</td>
<td class="tdnum">0.958</td>
<td class="tdnum">56.7702</td>
</tr>
<tr class="teven">
<td class="tsmall">Page Cache using Disk Enhanced</td>
<td class="tdnum">1222.16</td>
<td class="tdnum">8.182</td>
<td class="tdnum">3</td>
<td class="tdnum">1290.43</td>
<td class="tdnum">0.961</td>
<td class="tdnum">7.15632</td>
</tr>
<tr class="trecommend">
<td>W3 Total Cache</td>
<td class="tdnum">1153.50</td>
<td class="tdnum">8.669</td>
<td class="tdnum">3</td>
<td class="tdnum">165.725</td>
<td class="tdnum">0.916</td>
<td class="tdnum">56.5004</td>
</tr>
<tr class="trecommend">
<td class="tsmall">Page Caching &#8211; Disk Enhanced, Minify/Database using APC</td>
<td class="tdnum">1211.22</td>
<td class="tdnum">8.256</td>
<td class="tdnum">2</td>
<td class="tdnum">1305.94</td>
<td class="tdnum">0.948</td>
<td class="tdnum">6.97114</td>
</tr>
<tr class="teven">
<td>Varnish ESI</td>
<td class="tdnum">2304.18</td>
<td class="tdnum">0.434</td>
<td class="tdnum">4</td>
<td class="tdnum">349.351</td>
<td class="tdnum">0.221</td>
<td class="tdnum">28.1079</td>
</tr>
<tr class="teven">
<td></td>
<td class="tdnum">2243.33</td>
<td class="tdnum">0.44689</td>
<td class="tdnum">4</td>
<td class="tdnum">4312.78</td>
<td class="tdnum">0.152</td>
<td class="tdnum">2.09931</td>
</tr>
<tr class="trecommend">
<td>WP Varnish</td>
<td class="tdnum">1683.89</td>
<td class="tdnum">0.594</td>
<td class="tdnum">3</td>
<td class="tdnum">369.543</td>
<td class="tdnum">0.155</td>
<td class="tdnum">26.8906</td>
</tr>
<tr class="trecommend">
<td></td>
<td class="tdnum">3028.41</td>
<td class="tdnum">0.330</td>
<td class="tdnum">3</td>
<td class="tdnum">4318.48</td>
<td class="tdnum">0.148</td>
<td class="tdnum">2.15063</td>
</tr>
</table>
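<p>As a rough cross-check of the ApacheBench columns, requests per second and time per request are two views of the same measurement.  The sketch below is not part of the original test harness.  The function names are made up for illustration, and it assumes ab&#8217;s usual definitions: the mean time per request is the concurrency divided by the throughput, and the figure quoted across all concurrent requests is simply the inverse of the throughput.</p>
<pre>
```python
def time_per_request_ms(req_per_sec, concurrency=10):
    """Mean time per request in ms as seen by one client connection."""
    return concurrency * 1000.0 / req_per_sec


def time_across_concurrent_ms(req_per_sec):
    """Mean time per request in ms averaged across all concurrent connections."""
    return 1000.0 / req_per_sec


def speedup(baseline_rps, cached_rps):
    """How many times more requests per second the cached setup serves."""
    return cached_rps / baseline_rps


if __name__ == "__main__":
    baseline = 4.97  # Baseline cold start, requests per second, from the table
    print(round(time_across_concurrent_ms(baseline), 1))
    print(round(time_per_request_ms(baseline), 0))
    print(round(speedup(baseline, 1211.22), 1))
```
</pre>
<p>With the baseline figure of 4.97 requests per second this gives about 201 ms across all connections and roughly two seconds per connection, which lines up with the Time/Request and 50% columns.  Some rows in the table appear to quote one of the two figures and some the other, so compare the Req/Second column when in doubt.</p>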
<h2>Test Script</h2>
<pre>
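#!/bin/bash
# bash rather than plain sh: "time" wrapped around a subshell is a bash keyword
# and a minimal /bin/sh such as dash may reject it

FETCHES=1000   # total requests per run
PARALLEL=10    # concurrent connections

# Cold start: plugin caches are assumed to have been cleared already.
# Restart Apache and MySQL, then give the system time to settle.
/usr/sbin/apache2ctl stop
/etc/init.d/mysql restart
/usr/sbin/apache2ctl start
echo Sleeping
sleep 30

# For each tool the first run is the cold start and the second is the warm start.
# ab hits the front page only; http_load walks the URL file "wordpresstest".
time ( \
echo First Run; \
ab -n $FETCHES -c $PARALLEL http://example.com/; \
echo Second Run; \
ab -n $FETCHES -c $PARALLEL http://example.com/; \
\
echo First Run; \
./http_load -parallel $PARALLEL -fetches $FETCHES wordpresstest; \
echo Second Run; \
./http_load -parallel $PARALLEL -fetches $FETCHES wordpresstest; \
)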
</pre>
<h2>URL File for http_load</h2>
<pre>

http://example.com/
http://example.com/2010/03/hello-world/
http://example.com/2008/09/layout-test/
http://example.com/2008/04/simple-gallery-test/
http://example.com/2007/12/category-name-clash/
http://example.com/2007/12/test-with-enclosures/
http://example.com/2007/11/block-quotes/
http://example.com/2007/11/many-categories/
http://example.com/2007/11/many-tags/
http://example.com/2007/11/tags-a-and-c/
http://example.com/2007/11/tags-b-and-c/
http://example.com/2007/11/tags-a-and-b/
http://example.com/2007/11/tag-c/
http://example.com/2007/11/tag-b/
http://example.com/2007/11/tag-a/
http://example.com/2007/09/tags-a-b-c/
http://example.com/2007/09/raw-html-code/
http://example.com/2007/09/simple-markup-test/
http://example.com/2007/09/embedded-video/
http://example.com/2007/09/contributor-post-approved/
http://example.com/2007/09/one-comment/
http://example.com/2007/09/no-comments/
http://example.com/2007/09/many-trackbacks/
http://example.com/2007/09/one-trackback/
http://example.com/2007/09/comment-test/
http://example.com/2007/09/a-post-with-multiple-pages/
http://example.com/2007/09/lorem-ipsum/
http://example.com/2007/09/cat-c/
http://example.com/2007/09/cat-b/
http://example.com/2007/09/cat-a/
http://example.com/2007/09/cats-a-and-c/
</pre>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Converting to a Varnish CDN with WordPress</title>
		<link>http://cd34.com/blog/scalability/converting-to-a-varnish-cdn-with-wordpress/</link>
		<comments>http://cd34.com/blog/scalability/converting-to-a-varnish-cdn-with-wordpress/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 18:57:49 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[cdn]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[website performance]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=813</guid>
		<description><![CDATA[While working with Varnish I decided to try an experiment. I knew that Varnish could assist sites, but, it has never been easy to run Varnish on a shared virtual or clustered virtual host. VPS or Dedicated servers are no problem because you can do some configuration. However, in this case, I wanted to see [...]]]></description>
			<content:encoded><![CDATA[<p>While working with Varnish I decided to try an experiment.  I knew that Varnish could assist sites, but it has never been easy to run Varnish on a shared virtual or clustered virtual host.  VPS or Dedicated servers are no problem because you can do some configuration.  However, in this case, I wanted to see if we could use Varnish to emulate a CDN and, if so, how difficult it would be to set up for WordPress.</p>
<p>As it turns out, WordPress has a particular capability built in that handles media uploads.  In the admin, under Settings, Miscellaneous, there are two values.  One that asks where uploads should be stored.  That path is a relative path under your blog&#8217;s home directory.  The second is the URL that points to that path.  In most cases you need to leave this blank, but, we can use that to point the URL for images to use the CDN.</p>
<p>Settings, Miscellaneous</p>
<p>Store uploads in this folder: wp-content/uploads<br />
Full URL path to files: http://cd34.colocdn.com/blog/wp-content/uploads</p>
<p>Second, all of the images that have been already posted need to have their URLs modified.  Since I am a command line guy, I executed the following command in MySQL.</p>
<pre>
update wp_posts set post_content=replace(post_content,'http://cd34.com/blog/wp-content/uploads/','http://cd34.colocdn.com/blog/wp-content/uploads/');
</pre>
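<p>The same substitution can be previewed outside the database before running the update.  The following is only a sketch: the helper name and the sample string are made up for illustration, and the two prefixes are the ones used in the query above.</p>
<pre>
```python
OLD_PREFIX = "http://cd34.com/blog/wp-content/uploads/"
NEW_PREFIX = "http://cd34.colocdn.com/blog/wp-content/uploads/"


def rewrite_upload_urls(post_content, old=OLD_PREFIX, new=NEW_PREFIX):
    """Mirror MySQL's replace(): swap every occurrence of old for new."""
    return post_content.replace(old, new)


if __name__ == "__main__":
    # hypothetical fragment of a post body that references an uploaded image
    sample = 'img src="http://cd34.com/blog/wp-content/uploads/2009/10/graph.png"'
    print(rewrite_upload_urls(sample))
```
</pre>
<p>Because only the uploads prefix is matched, links to other pages on the blog are left untouched.</p>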
<p>According to the Yahoo YSlow plugin, my blog went from a 72 to a 98 out of 100 with this and a few other modifications.   The site does appear to be much snappier as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/converting-to-a-varnish-cdn-with-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL 5.1&#8217;s query optimizer</title>
		<link>http://cd34.com/blog/scalability/mysql-5-1s-query-optimizer/</link>
		<comments>http://cd34.com/blog/scalability/mysql-5-1s-query-optimizer/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 06:28:51 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=781</guid>
		<description><![CDATA[While debugging an issue with an application that relies heavily on MySQL, an issue was brought up regarding the cardinality of the keys selected, and, the order in which the keys were indexed. With any relational database, in order to get the fastest performance, your query should reduce the result set as quickly as possible. [...]]]></description>
			<content:encoded><![CDATA[<p>While debugging an issue with an application that relies heavily on MySQL, an issue was brought up regarding the cardinality of the keys selected, and, the order in which the keys were indexed.  With any relational database, in order to get the fastest performance, your query should reduce the result set as quickly as possible.  Your data should have a high cardinality or variation in the data so that the B-Tree (or R-Tree) is more balanced.   If your data consists of:</p>
<p>One thousand records with the date 2009-01-01<br />
One thousand records with the date 2009-01-02<br />
&#8230;<br />
One thousand records with the date 2009-12-31</p>
<p>The cardinality or uniqueness of that column is low given the fact that you&#8217;ll have 365000 rows with blocks of one thousand having the same key.  If you consider 125 different IP addresses per day generating those same thousand records, the cardinality or uniqueness of the IP addresses will be very high.</p>
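<p>To make the idea concrete, the sketch below estimates the selectivity of a column, meaning the number of distinct values divided by the number of rows.  It is an illustration only: the data is a scaled-down synthetic sample rather than the table used later, the function name is made up, and one random address per row is an assumption chosen to keep the example short.</p>
<pre>
```python
import random


def selectivity(values):
    """Fraction of distinct values: near 0 is low cardinality, 1.0 is unique."""
    values = list(values)
    return len(set(values)) / float(len(values))


if __name__ == "__main__":
    random.seed(1)
    # 365 days with 10 rows each (scaled down from one thousand rows per day)
    dates = [day for day in range(365) for _ in range(10)]
    # one random 32-bit address per row
    ips = [random.randint(1, 4294967295) for _ in dates]
    print("date selectivity: %.4f" % selectivity(dates))
    print("ip selectivity:   %.4f" % selectivity(ips))
```
</pre>
<p>The date column comes out at one distinct value per ten rows, while the random addresses are close to unique, which is the contrast the rest of this post tests.</p>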
<p>In order to show the performance differences in multiple indexing schemes and representations, a table has been created with an unsigned int column for the IP address, a varchar(15) for the IP address, a date column, and a text column for some text data.  Because of the way the MySQL query processor works, it is possible to construct your query so that the results are answered from the index and the data file is never hit.  A test sample was created that will be used for all of the tests.  The table will be indexed, optimized, and the test run five times with the cumulative time used.  The sample data that generates the queries against the database includes 48000 of the ten million rows, plus 2000 randomly generated queries.  Those results are then shuffled and written to a file for the tests.  Testing hits versus misses emulates real world situations a little more accurately.  All of the code used to run these tests is included in this post.</p>
<h3>Test Setup</h3>
<p>Creation of the table:</p>
<pre>
CREATE TABLE `querytest` (
  `iip` int(10) unsigned DEFAULT NULL,
  `ipv` varchar(15) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `randomtext` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
</pre>
<p>Filling the table with data:</p>
<pre>
#!/usr/bin/python

import MySQLdb
import random
import datetime
import time

lipsum = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi gravida congue nisi, nec auctor leo placerat nec. In hac habitasse platea dictumst. In rutrum blandit velit et varius. Integer commodo ipsum ut diam placerat feugiat. Curabitur viverra erat ut felis cursus mollis. Sed tempus tempor faucibus. Etiam eget arcu massa, eget dictum sapien. Nullam euismod purus vitae risus ultrices tempus. Mauris semper rhoncus lectus, sit amet laoreet mauris tincidunt et. Duis ut mauris massa. Nam semper, enim id fermentum tristique, ligula velit suscipit lacus, vitae ultrices mi arcu sit amet felis. Ut sit amet tellus eget lorem gravida malesuada.

Integer nec massa quis mauris porta laoreet. Curabitur tincidunt nunc at mauris porttitor auctor. Mauris auctor faucibus tortor dignissim sodales. Sed ut tellus nisi, laoreet malesuada tortor. Vivamus blandit neque et nunc fringilla quis dignissim felis tincidunt. Nam nec varius orci. Duis pretium magna id urna fermentum commodo. Aliquam sollicitudin imperdiet leo eget ullamcorper. Quisque id mauris nec purus pulvinar bibendum. Fusce nunc metus, viverra in iaculis id, tempus nec neque. Aenean ac diam arcu, vitae condimentum lectus. Vivamus cursus iaculis tortor eget bibendum. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aenean elementum odio et nisl ornare at sodales eros porta. Duis mollis tincidunt neque, sed pulvinar enim ultrices a. Sed laoreet nunc ut nisl luctus a egestas quam luctus. Pellentesque non dui et neque ullamcorper condimentum ac ut turpis. Etiam a lectus odio, vitae bibendum arcu. Nulla egestas dolor ligula.

Quisque rhoncus neque ultrices mi lacinia tempus. Sed scelerisque libero dui, quis vulputate leo. Phasellus nibh ante, viverra sed cursus ac, dictum et lectus. Suspendisse potenti. Ut dapibus augue vitae sem convallis in iaculis nibh bibendum. Mauris eu sapien in lacus pharetra fermentum. Etiam eleifend vulputate velit, a tempor augue ultrices vitae. Vestibulum varius orci ac justo adipiscing quis dignissim odio porttitor. Nam ac metus leo. Ut a porttitor lectus. Nunc accumsan ante non eros feugiat suscipit.

Nulla facilisi. Nam molestie dignissim purus sed lacinia. Etiam tristique, eros vel condimentum fermentum, ipsum justo vulputate erat, sed faucibus nunc nisl id tellus. Aliquam a tempus leo. Nullam et sem nunc. Suspendisse potenti. Quisque ante lorem, aliquam sed aliquet vel, malesuada sit amet nisl. Vestibulum tristique velit pellentesque sapien ultrices non gravida ante blandit. Donec luctus nunc dictum felis feugiat sollicitudin. Nam lectus mi, porttitor sed adipiscing ac, pharetra a orci. Ut vitae eros vitae metus.
"""

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

length = len(lipsum)
jan_1_2009 = time.mktime((2009, 1, 1, 0, 0, 0, 0, 0, 0))

for i in range(1, 10000001):

  # generate a random IPv4 address as an unsigned 32-bit integer
  rand_ip = random.randint(1,4294967295)

  # pull a random slice of lipsum; slices that start near the end come out shorter
  start_pos = random.randint(0,length-1)
  end_pos = start_pos + random.randint(200,2000)
  random_text = lipsum[start_pos:end_pos]

  # pick a random date in 2009; localtime() pairs with the mktime() call above
  rand_date = time.strftime("%Y-%m-%d",time.localtime(jan_1_2009 + random.randint(0,365*60*60*24-1)))

  # inet_ntoa() lets MySQL derive the dotted-quad text from the same integer
  cursor.execute("insert into querytest (iip,ipv,date,randomtext) values (%s,inet_ntoa(%s),%s,%s)", (rand_ip, rand_ip, rand_date, random_text))

cursor.close ()
db.close ()
</pre>
<p>Generate test set:</p>
<pre>
#!/usr/bin/python

import MySQLdb
import random
import datetime
import time
import socket
import struct

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

jan_1_2009 = time.mktime((2009, 1, 1, 0, 0, 0, 0, 0, 0))

cursor.execute("select iip,ipv,date from querytest order by rand() limit 48000")

data = list(cursor.fetchall())

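for i in range(1, 2001):

  # generate a random IP address (almost certainly a miss against the table)
  rand_ip = random.randint(1,4294967295)

  # pick a random date in 2009; localtime() pairs with the mktime() call above
  rand_date = time.strftime("%Y-%m-%d",time.localtime(jan_1_2009 + random.randint(0,365*60*60*24-1)))

  # '!L' packs exactly four bytes in network order, matching MySQL's inet_ntoa()
  data.append((rand_ip, socket.inet_ntoa(struct.pack('!L',rand_ip)), rand_date))

# shuffle hits and misses together, then write one CSV line per query
random.shuffle(data)
for datum in data:
  print("%s,%s,%s" % (datum[0], datum[1], datum[2]))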

cursor.close ()
db.close ()
</pre>
<p>At this point we have created the table, filled it with ten million rows, and generated a fifty thousand row query set to run against the table.  Now we need to lay out the test cases to see whether cardinality plays as large a role as it used to.</p>
<h3>The following tests will be performed</h3>
<p>Index of iip,date</p>
<p>* Use the unsigned int representation of the IP address and the date<br />
* Use the text representation of the IP address passed through inet_aton() and the date</p>
<p>Index of ipv, date</p>
<p>* Use the text representation of the IP address and the date<br />
* Use the unsigned int representation of the IP address passed through inet_ntoa() and the date</p>
<p>Index of date,iip</p>
<p>* Use date and the unsigned int representation of the IP address<br />
* Use date and the text representation of the IP address passed through inet_aton()</p>
<p>Index of date,ipv</p>
<p>* Use date and the text representation of the IP address<br />
* Use date and the unsigned int representation of the IP address passed through inet_ntoa()</p>
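<p>Each of the above tests will be run twice, once with select * and once selecting only the indexed columns (for example, select ipv,date), so that the second form can be answered from the index.</p>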
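<p>The tests above rely on MySQL&#8217;s inet_aton() and inet_ntoa() to move between the two representations of an address.  For checking values on the client side, the same conversions can be sketched in Python.  The helper names are made up for illustration, and the "!L" format is used so that the integer is always four bytes in network byte order.</p>
<pre>
```python
import socket
import struct


def ip_to_int(dotted):
    """Dotted quad to unsigned 32-bit integer, network byte order."""
    return struct.unpack("!L", socket.inet_aton(dotted))[0]


def int_to_ip(number):
    """Unsigned 32-bit integer to dotted quad, network byte order."""
    return socket.inet_ntoa(struct.pack("!L", number))


if __name__ == "__main__":
    print(ip_to_int("192.168.1.10"))
    print(int_to_ip(3232235786))
```
</pre>
<p>Round-tripping a value through both helpers should return the original, which is a quick way to confirm that a query file and the table agree on byte order.</p>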
<h3>Benchmark Code</h3>
<pre>
#!/usr/bin/python

import MySQLdb
import random
import datetime
import time
import socket
import struct
import array

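def run_query(query, data, columna, columnb):
    # run the query once per test row; columna and columnb select which
    # fields of the row are bound to the two placeholders
    for datum in data:
        cursor.execute(query, (datum[columna], datum[columnb]))
        # fetch and discard the rows so the full result set is transferred
        result = cursor.fetchall()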

query_tests = [
               ['create index querytest on querytest (iip,date)',
                'select * from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select * from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select iip,date from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select iip,date from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select * from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select * from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select iip,date from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select iip,date from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select * from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select * from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select ipv,date from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select ipv,date from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select * from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select * from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (date,iip)',
                'select * from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select * from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip)',
                'select iip,date from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select iip,date from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip)',
                'select * from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select * from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip)',
                'select iip,date from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select iip,date from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select * from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select * from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select ipv,date from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select ipv,date from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select * from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select * from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv)',
                'select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
              ]

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

# Each line of testquery.txt holds one lookup as comma separated columns.
query_array = []
for query_data in open('testquery.txt'):
  query_array.append(query_data.rstrip('\n').split(','))

for test in query_tests:
  # Drop the index left by the previous test; on the first pass there may
  # be nothing to drop, so a MySQL error here is ignored.
  try:
    cursor.execute('alter table querytest drop index querytest')
  except MySQLdb.Error:
    pass
  cursor.execute(test[0])
  cursor.execute('optimize table querytest')

  print("Test: %s\n with Index: %s" % (test[1], test[0]))
  start_time = time.time()

  # Five passes over the query set.
  for loop in range(5):
    run_query(test[1], query_array, test[2], test[3])

  end_time = time.time()
  print("Duration: %f seconds\n" % (end_time - start_time))

cursor.close()
db.close()
</pre>
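<p>The thirty-two entries in query_tests follow a regular pattern: four index orders, two ways of supplying the IP, two select lists and two index declarations.  As a sketch only, and not the script that produced the results, the same list could be generated rather than typed out.  The names build_query_tests, IP_FORMS and INDEX_ORDERS are made up for this illustration; the column positions 0, 1 and 2 are taken from the benchmark code above.</p>

```python
from itertools import product

# Column positions in each row of testquery.txt, as used by the benchmark above.
IIP, IPV, DATE = 0, 1, 2

# For each IP column: the direct lookup, then the lookup through a conversion
# function, each paired with the data column that feeds the placeholder.
IP_FORMS = {
    'iip': [('%s', IIP), ('inet_aton(%s)', IPV)],
    'ipv': [('%s', IPV), ('inet_ntoa(%s)', IIP)],
}

INDEX_ORDERS = [('iip', 'date'), ('ipv', 'date'), ('date', 'iip'), ('date', 'ipv')]


def build_query_tests():
    """Return [index_sql, query_sql, first_column, second_column] entries."""
    tests = []
    for first, second in INDEX_ORDERS:
        ip_col = second if first == 'date' else first
        combos = product(IP_FORMS[ip_col], (True, False), (False, True))
        for (placeholder, ip_data), select_all, use_hash in combos:
            index_sql = 'create index querytest on querytest (%s,%s)' % (first, second)
            if use_hash:
                index_sql += ' using HASH'
            columns = '*' if select_all else '%s,date' % ip_col
            # Conditions and parameters follow the column order of the index.
            conditions = {ip_col: '%s=%s' % (ip_col, placeholder), 'date': 'date=%s'}
            params = {ip_col: ip_data, 'date': DATE}
            query_sql = 'select %s from querytest where %s and %s' % (
                columns, conditions[first], conditions[second])
            tests.append([index_sql, query_sql, params[first], params[second]])
    return tests


query_tests = build_query_tests()
print(len(query_tests))
```

<p>Generating the list makes it harder for one of the thirty-two entries to drift out of step with the others, at the cost of being less obvious to read than the literal list.</p>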
<h3>Miscellaneous notes</h3>
<p>P4/3.0GHz, 2GB RAM, Debian Squeeze, Linux 2.6.31.1, WD 7200RPM SATA drive, SuperMicro P4SCI Motherboard</p>
<p>There are multiple tests that could have been run without dropping the index, recreating it and optimizing the table.  When testing a more limited set, results were somewhat erratic because of the smaller test set and because portions of the table and index were held in the kernel cache.  To keep the results comparable, every test was run the same way.</p>
<h3>Benchmark Results</h3>
<pre>
Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 679.169198 seconds

Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 692.634291 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 179.039791 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 178.993962 seconds

Test: select * from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 672.836734 seconds

Test: select * from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 606.268787 seconds

Test: select iip,date from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 195.253512 seconds

Test: select iip,date from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 195.222058 seconds

Test: select * from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 741.876227 seconds

Test: select * from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 639.109309 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 167.049333 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 167.016152 seconds

Test: select * from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 578.565762 seconds

Test: select * from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 655.869390 seconds

Test: select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 181.555567 seconds

Test: select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 181.230911 seconds

Test: select * from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 655.928799 seconds

Test: select * from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 637.146124 seconds

Test: select iip,date from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 181.637912 seconds

Test: select iip,date from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 181.512190 seconds

Test: select * from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip)
Duration: 603.553238 seconds

Test: select * from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 605.363284 seconds

Test: select iip,date from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip)
Duration: 196.680399 seconds

Test: select iip,date from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 194.746056 seconds

Test: select * from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv)
Duration: 657.619028 seconds

Test: select * from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 686.560066 seconds

Test: select ipv,date from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv)
Duration: 172.222691 seconds

Test: select ipv,date from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 172.079220 seconds

Test: select * from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv)
Duration: 726.031732 seconds

Test: select * from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 678.099808 seconds

Test: select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv)
Duration: 185.415666 seconds

Test: select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 185.280880 seconds
</pre>
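<p>Thirty-two result blocks are hard to compare by eye.  The following is a minimal sketch of how the output format above could be parsed and averaged per group.  The function names are made up for this illustration, and SAMPLE holds only the first four results from the listing above; the full output would be fed in the same way.</p>

```python
import re
from collections import defaultdict

# First four result blocks copied from the benchmark output above.
SAMPLE = """\
Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 679.169198 seconds

Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 692.634291 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 179.039791 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 178.993962 seconds
"""

RECORD = re.compile(r'Test: (.+)\n with Index: (.+)\nDuration: ([\d.]+) seconds')


def parse_results(text):
    """Return (query, index statement, seconds) for each result block."""
    return [(query, index, float(seconds))
            for query, index, seconds in RECORD.findall(text)]


def summarize(results):
    """Average the durations per (select list, index declaration) group."""
    groups = defaultdict(list)
    for query, index, seconds in results:
        select_kind = 'select *' if query.startswith('select *') else 'select columns'
        index_kind = 'hash' if index.endswith('using HASH') else 'btree'
        groups[(select_kind, index_kind)].append(seconds)
    return dict((key, sum(values) / len(values)) for key, values in groups.items())


results = parse_results(SAMPLE)
summary = summarize(results)
for key in sorted(summary):
    print(key, summary[key])
```

<p>Grouping by select list and index declaration puts the two comparisons this benchmark cares about side by side: select * against a column list, and the default index against using HASH.</p>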
<h3>Conclusions</h3>
<p>Based on the data, B-Tree versus Hash doesn&#8217;t seem to make much difference.  Neither is consistently better, and since the data and query set are identical, the results don&#8217;t point to a clear winner.  That is what you would expect if the table uses MyISAM or InnoDB: both engines implement only B-Tree indexes, so MySQL accepts using HASH but builds a B-Tree anyway, and the spread between paired runs is more likely run-to-run noise than an index effect.  Avoiding select * and pulling only the required fields makes a difference, and if your result can be answered from the index rather than the data file, there is a substantial boost: the index-only queries took roughly 167 to 197 seconds, against 579 to 742 seconds for select *.  Analysis of the results suggests that cardinality isn&#8217;t as important as it used to be.  I am devising a method to further test cardinality, as I do believe that live data will have somewhat different results from data after an optimize table has been run.</p>
<p>The winner in this case is:</p>
<pre>
Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 167.049333 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 167.016152 seconds
</pre>
<p>I had actually expected the IP stored as an unsigned int to be the fastest.  However, there is probably a reasonable explanation for why these two queries are slower:</p>
<pre>
Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 179.039791 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 178.993962 seconds
</pre>
<p>Data in MySQL is stored as binary.  The IP stored as an unsigned int takes 4 bytes, and the date takes 3, so the key length in this case is 7 bytes.  The index on the IP stored as varchar(15) plus the date takes 18 bytes.  Even though the index in the second case is more than two and a half times the size of the first, the MySQL client library converts all binary data to ASCII when communicating to avoid endian issues.  That extra conversion makes the unsigned int case slightly slower, a difference that becomes measurable when you run 250000 queries against a 10 million record database.</p>
<p>A quick modification of the test shows the results of select *, versus select keyvaluea,keyvalueb and select data,keyvalueb.  As you can see from the results below, MySQL will answer queries from the index if it doesn&#8217;t need to hit the data file.</p>
<pre>
Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 637.420786 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 178.434477 seconds

Test: select ipv,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 690.804990 seconds

Test: select inet_ntoa(iip) as iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 183.817643 seconds
</pre>
<p>If you can structure your data well, there are significant performance gains to be had.</p>
<h3>What does this mean?</h3>
<p>Do you store IPs as unsigned int in the database?  If you use char(15) or varchar(15) instead, switching to an unsigned int saves 11 bytes per record for char(15) and between 4 and 12 bytes for varchar(15), depending on the length of each address, at the expense of some conversion time.  varchar uses 1 byte to store the length of the stored data plus the length of the data itself.  char is a fixed length based on the column length you specify.</p>
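<p>When the conversion is done in the application rather than with MySQL&#8217;s inet_aton() and inet_ntoa(), it can be sketched as follows using only the Python standard library.  The function names and the example address are arbitrary choices for this illustration, and the sizes in the last line assume a single-byte character set.</p>

```python
import socket
import struct


def ip_to_int(address):
    """Dotted-quad text to an unsigned 32-bit integer, like MySQL's inet_aton()."""
    return struct.unpack('!I', socket.inet_aton(address))[0]


def int_to_ip(number):
    """Unsigned 32-bit integer back to dotted-quad text, like MySQL's inet_ntoa()."""
    return socket.inet_ntoa(struct.pack('!I', number))


address = '192.168.1.1'           # arbitrary example address
number = ip_to_int(address)
print(number)                     # 3232235777
print(int_to_ip(number))          # 192.168.1.1

# Bytes per stored value: unsigned int is always 4, char(15) is always 15,
# varchar(15) is the text length plus one length byte.
print(4, 15, len(address) + 1)
```

<p>The &#8216;!I&#8217; format packs the value in network byte order, which is the same ordering inet_aton() uses, so numbers produced either way can be compared directly.</p>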
<p>Make sure you return only the columns that you need in your calculations &#8212; especially if you are running MySQL over a network.</p>
<p>Try to create your index to match the conditions that you are looking for, and, when possible, if you are searching for the result from a particular column, consider adding it to the index as well.</p>
<p>Always use count(*) rather than count(column) unless the column can contain NULL and you specifically want those rows left out of the count.</p>
<h3>The Effect of count(*) versus count(date)</h3>
<p>count(*) gives you the number of rows in the set that match the criteria you have set.  count(date) counts the number of rows in the set that match the criteria where the date is not null.  Many times, you&#8217;ll see someone do a count(id), where id is an auto_increment primary key and so can never be null.  Because count(column) must verify that the column is not null in every matching row, it has to check every matching key or, when the column isn&#8217;t part of the index, read the data file for each of those rows.  If the column being counted is one of the keys in the index, the performance change won&#8217;t be as dramatic.  When the counted column isn&#8217;t in the key and the data has to be read, count(column) is considerably slower.</p>
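<p>The difference in meaning is easy to see on a tiny table.  This sketch uses Python&#8217;s built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the hits table and its rows are invented for the example.  It shows only the semantics, not the performance cost measured below.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table hits (ip text, date text)')
conn.executemany('insert into hits values (?, ?)', [
    ('10.0.0.1', '2010-01-01'),
    ('10.0.0.2', '2010-01-01'),
    ('10.0.0.3', None),          # a row with no date recorded
])

# count(*) counts rows; count(date) skips rows where date is NULL.
count_all = conn.execute('select count(*) from hits').fetchone()[0]
count_date = conn.execute('select count(date) from hits').fetchone()[0]
print(count_all, count_date)
conn.close()
```

<p>The two counts differ by exactly the number of rows with a NULL date, which is the only case where count(column) tells you something count(*) does not.</p>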
<p>Results when the counted column is within the key and only 1 or 0 rows are expected:</p>
<pre>
Test: select count(*) as ct from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 175.727338 seconds

Test: select count(iip) as ct from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 176.495198 seconds
</pre>
<p>When the query has to count many matching rows, the effect is much more detrimental.  The first version of this test took so long that I shortened it to five iterations of 100 queries.  After 4 hours, with the run 18% complete, I shortened the test again to one iteration of ten queries.  The results clearly demonstrate the issue without taking 20+ hours to run a single simple benchmark.  Simply stated, unless you really have a valid reason to check your results to see if the column is null, DON&#8217;T!</p>
<pre>
Test: select count(*) as ct from querytest where date=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 0.408268 seconds

Test: select count(ipv) as ct from querytest where date=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 3085.770998 seconds
</pre>
<h3>The Fine Print</h3>
<p>* Index the columns used in your where conditions<br />
* B-Tree versus Hash doesn&#8217;t appear to materially affect results<br />
* Storing an IP as char(15) can be faster than storing it as an unsigned int if the data is being returned to the client.  If the IP is not fetched but only used in comparisons, unsigned int is probably the better choice.<br />
* Consider adding that extra column to your index to prevent MySQL from having to read the data file.  Answering your query from the index is significantly faster.<br />
* Use count(*) rather than count(column)</p>
<p>Live data will not act precisely as the benchmark &#8212; what live scenario ever does?  But, I believe the tests above should show some of the performance gains available by structuring your tables and queries.</p>
<p>While MySQL 4, 5.0 and 5.1 will reorder conditions to match the index key, there are some significant performance gains from 4.x to 5.0.  MySQL 5.1 didn&#8217;t show considerable gains from MySQL 5.0, but, there are some minor speed increases.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/mysql-5-1s-query-optimizer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mysql Query Optimization</title>
		<link>http://cd34.com/blog/scalability/mysql-query-optimization/</link>
		<comments>http://cd34.com/blog/scalability/mysql-query-optimization/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 04:16:30 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=745</guid>
		<description><![CDATA[I heard a comment from a developer the other day: You don&#8217;t need indexes on small tables. So I asked what the definition of a small table was. He said, anything with a few hundred rows. So I said, 2300 rows? Well&#8230;.. 24000 rows? Well&#8230;.. 292000 rows? That&#8217;s large. I showed him unindexed queries in [...]]]></description>
			<content:encoded><![CDATA[<p>I heard a comment from a developer the other day:</p>
<blockquote><p>You don&#8217;t need indexes on small tables.</p></blockquote>
<p>So I asked what the definition of a small table was.  He said, anything with a few hundred rows.  So I said, 2300 rows?  Well&#8230;.. 24000 rows? Well&#8230;.. 292000 rows?  That&#8217;s large.  I showed him unindexed queries in his application dealing with tables that had 2300, 24000 and 292000 rows.</p>
<h3>Avoid tablescans</h3>
<p>When MySQL deals with a query that is unindexed, it does a full tablescan to see if each record in the table meets the criteria specified.  On a small table, if the query is executed frequently, the MySQL query cache might be able to serve the query.  However, on a larger table, or a table with large rows, it must read every row, check the fields, possibly create a temporary table in RAM or on disk, and return the results.  On a small site you might not notice it, but on a large system, forcing tablescans on tables with even a few thousand rows will slow things down considerably:</p>
<blockquote><p>Uptime: 60016  Threads: 11  Questions: 105460332  Slow queries: 197769  Opens: 5819  Flush tables: 1  Open tables: 1320  Queries per second avg: 1757.204</p></blockquote>
<p>Slow queries are sometimes unavoidable, but, often, slow queries are missing an index.</p>
<h3>Use the slow-query log to find potential issues</h3>
<p>When analyzing a system to find problems, putting:</p>
<blockquote><p>log-queries-not-using-indexes</p></blockquote>
<p>in the my.cnf file and restarting mysql will log the unindexed queries to the slow query log.</p>
<h3>What can be indexed?</h3>
<p>The rule of thumb when writing indexes is to write your query in such a way that you reduce the result set as quickly as possible, with the highest cardinality possible.  What does this mean?</p>
<p>If you are collecting the IP address and the date, a query against an index on date,ip will actually be worse than one against ip,date.  Imagine receiving 40000 hits to your site on the same date.  If you were looking for the number of hits that a particular IP had today, an index on ip,date would search the 41 hits that IP has made over time and then find the 8 from today.  An index on date,ip would search the 40000 rows for today and then find those 8.</p>
<p>Each index you have adds extra overhead, and an index file should be as small as possible.  IP addresses can be represented in an unsigned int, which takes much less space than the varchar(15) usually used.  Remember that when you index a varchar field, indexing will space-pad the key to the full length.  If you have a variable length field you want indexed, you might be able to find the significant portion of that field by taking the average length and adding a few characters for good measure, then indexing fieldname(15) rather than the entire field.  If a value is longer than the 15 characters, you have still significantly reduced the number of rows that must be checked.</p>
<p>Cardinality refers to the uniqueness of the data.  The more unique the data, the lower the chance that you&#8217;ll have thousands of records that match the first criterion.  When the data is very similar, the index as built on disk will become imbalanced, resulting in slower queries.  Since MyISAM and InnoDB use a B-Tree index (or an R-Tree if you use a spatial index), data that is similar when inserted can create a very imbalanced tree, which leads to slower lookups.  An optimize table can re-sort and reindex the table to eliminate this, but you can&#8217;t do that on an extremely large, active table without impacting response times.</p>
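<p>One way to judge how long an indexed prefix needs to be is to measure how much uniqueness survives truncation.  The sketch below does this in Python on an invented list of values; prefix_selectivity is a name made up for the illustration.  Against a real table, the same idea could be expressed by comparing count(distinct left(fieldname, N)) with count(distinct fieldname).</p>

```python
def prefix_selectivity(values, length):
    """Fraction of the distinct values that stay distinct after truncation."""
    distinct_full = len(set(values))
    distinct_prefix = len(set(value[:length] for value in values))
    return float(distinct_prefix) / distinct_full


# Invented sample data: values that share a long common start.
sample = [
    'http://example.com/a',
    'http://example.com/b',
    'http://example.org/a',
    'http://example.net/',
]

for length in (7, 18, 20):
    print(length, prefix_selectivity(sample, length))
```

<p>A result close to 1.0 means the prefix distinguishes nearly as well as the full column, so a longer key buys very little.  A result close to 0 means most rows share the prefix and the index will not narrow the search much.</p>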
<blockquote><p># Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 3323<br />
SELECT * FROM websites_geo where (zoneid = '5135') LIMIT 1;</p></blockquote>
<p>In this case, zoneid is not indexed on the table websites_geo.  Adding an index on zoneid eliminates the tablescan on this query.</p>
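<p>The Rows_sent and Rows_examined figures in the slow query log header are the quickest hint that an index is missing.  As a rough sketch, a few lines of Python can pull them out and report how many rows were examined for each row returned.  The function name is made up, and the sample text is the log entry quoted above.</p>

```python
import re

HEADER = re.compile(r'Rows_sent: (\d+)\s+Rows_examined: (\d+)')


def rows_examined_per_row_sent(log_text):
    """Return (rows_sent, rows_examined, ratio) for each slow log header."""
    report = []
    for sent, examined in HEADER.findall(log_text):
        sent, examined = int(sent), int(examined)
        # Guard against division by zero when a query returned nothing.
        ratio = float(examined) / max(sent, 1)
        report.append((sent, examined, ratio))
    return report


log_text = """\
# Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 3323
SELECT * FROM websites_geo where (zoneid = '5135') LIMIT 1;
"""

for sent, examined, ratio in rows_examined_per_row_sent(log_text):
    print(sent, examined, ratio)
```

<p>A query that examines thousands of rows to return one is a strong candidate for an index on the column in its where clause.</p>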
<h3>Check for equality, not inequality.</h3>
<p>An index is most effective for equality checks (and, for a B-Tree, ranges).  A condition that checks whether values are not equal generally cannot use the index.</p>
<blockquote><p># Query_time: 0  Lock_time: 0  Rows_sent: 5  Rows_examined: 2548<br />
SELECT * FROM websites where (id = '1056692' &#038;&#038; status != 'd' &#038;&#038; status != 'n') order by rand() LIMIT 5;</p>
<p># Query_time: 0  Lock_time: 0  Rows_sent: 10  Rows_examined: 2544<br />
SELECT * FROM websites where (status != 'n' &#038;&#038; status != 'd' &#038;&#038; traffic &gt; 3000) order by added desc LIMIT 10;</p></blockquote>
<p>These two queries look different but share the same fundamental issue.  In the first, id is not indexed; an index would have at least limited the result set to 9 records rather than 2548.  The status check isn&#8217;t able to use an index.  In the second query, status is checked, followed by traffic.  There are other queries issued that check status,traffic,clicks_high.  When we look at status (which should be an enum or char(1) rather than varchar(1)), we find that there are only 4 values used.  By indexing on id,status and status,traffic,clicks_high, we could alter the queries as such:</p>
<blockquote><p>SELECT * FROM websites where (id = '1056692' &#038;&#038; status in ('g',' ')) order by rand() LIMIT 5;</p>
<p>SELECT * FROM websites where (status in ('g',' ') &#038;&#038; traffic &gt; 3000) order by added desc LIMIT 10;</p></blockquote>
<p>which would result in both queries using an index.  </p>
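<p>Turning an inequality into an equality check means listing every value that is still allowed.  A small helper can build that list so the application does not hard-code it.  This is only a sketch: the function name is made up, and the set of four status values is inferred from the queries above (the two excluded values plus the two that remain).</p>

```python
def in_clause_excluding(column, all_values, excluded):
    """Build 'column in (...)' covering every known value except the excluded ones."""
    wanted = [value for value in all_values if value not in excluded]
    placeholders = ','.join(['%s'] * len(wanted))
    return '%s in (%s)' % (column, placeholders), wanted


# Assumed full set of status values, inferred from the queries in the text.
ALL_STATUSES = ['g', ' ', 'd', 'n']

clause, params = in_clause_excluding('status', ALL_STATUSES, ['d', 'n'])
print(clause)   # status in (%s,%s)
print(params)   # ['g', ' ']
```

<p>The trade-off is that the list of known values has to be kept current.  If a fifth status is ever introduced, an in list silently leaves it out where the original != checks would have included it.</p>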
<h3>Choose your data types intelligently.</h3>
<p>As a secondary point, id (though it is numeric) happens to be a text field.  If you index id in this case, you would have to specify a key length.</p>
<pre>
mysql&gt; select max(length(id)) from websites;
+-----------------+
| max(length(id)) |
+-----------------+
|              22 |
+-----------------+
1 row in set (0.02 sec)

mysql&gt; select avg(length(id)) from websites;
+-----------------+
| avg(length(id)) |
+-----------------+
|          8.3315 |
+-----------------+
1 row in set (0.00 sec)
</pre>
<p>Based on this, we might decide to set the key length to 22 as it is a relatively small number and allows room to grow.  Personally, I would have opted to have the id be an unsigned int which would be much smaller, but, the application developer uses alphanumeric id&#8217;s which are exposed externally.  With sharding, you could use the id throughout the various tables, or, you could map the text id to a numeric id internally for all of the various tables.</p>
<p>There are a number of possible solutions to help any SQL engine perform better.  And your data set will dictate some of the things that you can do to make data access quicker.</p>
<h3>Helping MySQL Help You</h3>
<p>If you do <strong>select * from table where condition_a=1 and condition_b=2</strong> in one place and <strong>select * from table where condition_b=2 and condition_a=1</strong> in another, setting up a single index on condition_a,condition_b and reversing the conditions in your second query to match the order of the keys in the index will increase performance.</p>
<h3>Limit your results</h3>
<p>Another thing that will help considerably is using a limit clause.  So many times a programmer will do:  <strong>select * from table where condition_a=1</strong> which returns 2300 rows but only the first few rows are used.  A limit clause will prevent a lot of data from being fetched by MySQL and buffered waiting for the response.  <strong>select * from table where condition_a=1 limit 20</strong> would hand you the first 20 records.</p>
<h3>Avoid reading the data file, do all your work from the Index</h3>
<p>Additionally, if you have a table and only need three of the columns from the result, <strong>select fielda,fieldb,fieldc from table where condition_a=1</strong> will return only the three fields.  As an added boost, if the fields you are checking can be answered from the index, the query will never hit the actual data file and will be answered from the index.  Many times I&#8217;ve added a field that wasn&#8217;t needed in the index, just to eliminate the lookup of the key in the index then the corresponding read of the data file.</p>
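<p>Whether a query is answered from the index alone can be checked by asking the database for its query plan.  The sketch below uses Python&#8217;s built-in sqlite3 module as a stand-in so that it runs without a server; the table, index and plan helper are invented for the example.  SQLite labels an index-only read as a covering index, and in MySQL the comparable signal is &#8220;Using index&#8221; in the Extra column of explain.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table t (fielda integer, fieldb integer, fieldc text)')
conn.execute('create index t_a_b on t (fielda, fieldb)')


def plan(sql):
    """Return the detail text of SQLite's query plan for one statement."""
    rows = conn.execute('explain query plan ' + sql).fetchall()
    return ' '.join(row[-1] for row in rows)


# Both selected columns are in the index, so the table itself is not read.
covered = plan('select fielda, fieldb from t where fielda = 1')
# select * also needs fieldc, which forces a lookup in the table.
not_covered = plan('select * from t where fielda = 1')

print(covered)
print(not_covered)
conn.close()
```

<p>The first plan mentions a covering index and the second does not, which is the same distinction that separates the fast and slow groups in a select column list versus select * comparison.</p>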
<h3>Let MySQL do the work</h3>
<p>MySQL reads tables, filters results, and can do some calculations.  Going through 40000 records to pick the best 100 is still faster in MySQL than allowing PHP to fetch 40000 rows and do calculations and sorts to come up with that 100 rows.  Index, optimize, and allow MySQL to do the database work.</p>
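<p>The two approaches can be put side by side in a short sketch.  It uses Python and the built-in sqlite3 module in place of PHP and MySQL so that it is self-contained, and the pages table and its traffic values are generated purely for the example.  It shows that both routes produce the same 100 rows while the client-side route has to transfer all 40000 first; it does not measure timing.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table pages (id integer primary key, traffic integer)')
# 7919 and 40000 share no factors, so every traffic value is distinct.
conn.executemany('insert into pages (id, traffic) values (?, ?)',
                 [(i, (i * 7919) % 40000) for i in range(40000)])

# The database sorts and hands back only the 100 rows that are needed.
top_in_db = conn.execute(
    'select id, traffic from pages order by traffic desc, id limit 100').fetchall()

# The client fetches every row, then sorts and slices them itself.
all_rows = conn.execute('select id, traffic from pages').fetchall()
top_in_client = sorted(all_rows, key=lambda row: (-row[1], row[0]))[:100]

print(len(all_rows), len(top_in_db), top_in_db == top_in_client)
conn.close()
```

<p>Over a network connection the difference is larger than it looks here, because every one of the 40000 rows has to be serialized, sent and parsed before the client can discard most of them.</p>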
<h3>Summary</h3>
<p>Making MySQL work more efficiently goes a long way towards making your database driven site work better.  Adding six indexes to the system resulted in quicker response times and an increase in the transactions per second.</p>
<blockquote><p>Uptime: 32405  Threads: 1  Questions: 58729705  Slow queries: 64122  Opens: 2911  Flush tables: 1  Open tables: 295  Queries per second avg: 1812.366</p></blockquote>
<p>Previously, MySQL was generating about 3.3 slow queries per second (197769 slow queries over 60016 seconds of uptime).  Now we&#8217;re just beneath 2 slow queries per second and our system is processing 55 more transactions per second.  There is still a bit more analysis to do to identify the slow queries that are still running and to alter the queries to reverse the inequality checks, but even just adding indexes to a few tables has helped noticeably.  Once the developer is able to make some changes to the application, I&#8217;m sure we&#8217;ll see an additional speedup.</p>
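<p>The per-second figures come straight from the two status lines quoted in this post: slow queries divided by uptime.  A small sketch of that arithmetic follows; status_fields is a name made up for the illustration, and the two strings are copied from the status lines above.</p>

```python
import re


def status_fields(line):
    """Parse 'Name: value' pairs from a status line into a dict of floats."""
    pairs = re.findall(r'([A-Za-z ]+?):\s+([\d.]+)', line)
    return dict((name.strip(), float(value)) for name, value in pairs)


before = status_fields(
    'Uptime: 60016  Threads: 11  Questions: 105460332  Slow queries: 197769  '
    'Opens: 5819  Flush tables: 1  Open tables: 1320  Queries per second avg: 1757.204')
after = status_fields(
    'Uptime: 32405  Threads: 1  Questions: 58729705  Slow queries: 64122  '
    'Opens: 2911  Flush tables: 1  Open tables: 295  Queries per second avg: 1812.366')

slow_before = before['Slow queries'] / before['Uptime']
slow_after = after['Slow queries'] / after['Uptime']
qps_gain = after['Queries per second avg'] - before['Queries per second avg']

print('%.2f slow queries per second before' % slow_before)
print('%.2f slow queries per second after' % slow_after)
print('%.1f more queries per second' % qps_gain)
```

<p>Dividing by uptime matters because the two samples cover different periods; comparing the raw slow query counts would overstate the improvement.</p>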
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/mysql-query-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ESI Widget Issues in the Varnish, ESI, WordPress experiment</title>
		<link>http://cd34.com/blog/scalability/esi-widget-issues-in-the-varnish-esi-wordpress-experiment/</link>
		<comments>http://cd34.com/blog/scalability/esi-widget-issues-in-the-varnish-esi-wordpress-experiment/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 01:41:16 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[esi]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=729</guid>
		<description><![CDATA[The administration interface is quite simple. When the widget is installed, drag it to the Sidebar, then, drag any widgets that you want displayed to the ESI Widget Sidebar. Current issues: * When a user is logged in and comments on a post, their &#8216;login&#8217; information is left on the page if they are the [...]]]></description>
			<content:encoded><![CDATA[<p>The administration interface is quite simple.  When the widget is installed, drag it to the Sidebar, then, drag any widgets that you want displayed to the ESI Widget Sidebar.</p>
<p><a href="http://cd34.colocdn.com/blog/wp-content/uploads/2009/07/esi-widget.png"><img src="http://cd34.colocdn.com/blog/wp-content/uploads/2009/07/esi-widget-300x205.png" alt="esi-widget" title="esi-widget" width="300" height="205" class="aligncenter size-medium wp-image-730" /></a></p>
<p>Current issues:<br />
* When a logged-in user comments on a post, or visits a post page that hasn&#8217;t been cached yet, and they are the first person to hit that page, the html that shows their login status is cached along with the page.  New visitors then see that &#8216;login&#8217; information, though they lack the credentials to use it.</p>
<p>Addons that don&#8217;t work properly:<br />
* Any poll application (possible solution to wrap widget in an ESI block)<br />
* Any stat application (unless they convert to a webbug tracker, this probably cannot be fixed easily)<br />
* Any advertisement/banner rotator that runs internally.  OpenX will work, as will most non-plugin rotators<br />
* Any postcount/postviews addon<br />
* CommentLuv?<br />
* ExecPHP (will cache the output, but does work)<br />
* Manageable</p>
<p>Any plugin that does something at the time of the post or comment phase, and that isn&#8217;t dependent on the logged in data, should work without a problem.  A plugin that requires a login, or uses the IP address to determine whether a visitor has performed an action, will have a problem due to the aggressive caching.  For sites where the content needs to be served quickly and there aren&#8217;t many comments, ESI Widget would work well.</p>
<p>Because of the way Varnish works, you wouldn&#8217;t necessarily have to run Varnish on the server running WordPress.  Point the DNS at the Varnish server and set the backend for the host to your WordPress server&#8217;s IP address and you can have a Varnish server across the country caching your blog.</p>
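<p>As a rough sketch of that remote setup, the backend definition on the Varnish server might look like the following.  The address is a placeholder from the documentation range, not a real server, and this assumes the WordPress server answers on port 80.</p>
<pre>
# Hypothetical: 203.0.113.10 stands in for the WordPress server's public IP
backend default {
.host = "203.0.113.10";
.port = "80";
}
</pre>
<p>Presumably any purge requests sent by WordPress would then also have to be directed at the remote Varnish server rather than at the local machine.</p>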
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/esi-widget-issues-in-the-varnish-esi-wordpress-experiment/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WordPress, Varnish and Edge Side Includes</title>
		<link>http://cd34.com/blog/scalability/wordpress-varnish-and-edge-side-includes/</link>
		<comments>http://cd34.com/blog/scalability/wordpress-varnish-and-edge-side-includes/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 19:34:29 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[esi]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=723</guid>
		<description><![CDATA[While talking about WordPress and its abysmal performance in high traffic situations to a client, we started looking back at Varnish and other solutions to keep their machine responsive. Since most of the caching solutions generate a page, serve it and cache it, posts and comments tend to lag behind the cache. db-cache does work [...]]]></description>
			<content:encoded><![CDATA[<p>While talking to a client about WordPress and its abysmal performance in high traffic situations, we started looking back at Varnish and other solutions to keep their machine responsive.  Since most of the caching solutions generate a page, serve it and cache it, posts and comments tend to lag behind the cache.  db-cache does work around this by caching the query objects so that the pages can be generated more quickly and does expire the cache when tables are updated, but, its performance is still lacking.  Using APC&#8217;s opcode cache or memcached just seemed to add complexity to the overall solution.</p>
<p>Sites like <a href="http://perezhilton.com/">perezhilton.com</a> appear to run behind multiple servers running Varnish, use wp-cache, and move the images off to a CDN, which results in a 3 requests per second site with an 18 second pageload.  Varnish&#8217;s cache always shows an age of 0, meaning Varnish is acting more as a load balancer than a front-end cache.</p>
<p>Caching isn&#8217;t without its downside.  Your weblogs will not represent the true traffic.  Since Varnish intercepts and serves requests before they get to the backend, those hits never hit the log. Forget pageview/postview stats (even with addons) because the addon won&#8217;t get loaded except during caching.  Certain Widgets that rely on cookies or IP addresses will need to be modified.  A workaround is to use a Text Box Widget and do an ESI include of the widget.  For this client, we needed only some of the basic widgets.  The hits in the apache logs will come from an IP of 127.0.0.1.  Adjust your <a href="/blog/infrastructure/varnish-and-apache2/">apache configuration</a> to show the X-Forwarded-For IP address in the logs.  If you truly need statistics, you&#8217;ll need to use something like Google Analytics.  Put their code outside your page elements so that waiting for that javascript to load doesn&#8217;t slow down the rendering in the browser.</p>
<p>The test site, <a href="http://varnish.cd34.com/">http://varnish.cd34.com/</a> is running Varnish 2.0.4, Apache2-mpm-prefork 2.2.11, Debian/Testing, WordPress 2.8.2.  I&#8217;ve loaded the default .xml import for testing templates so that there were posts with varied dates and construction in the site.  To replicate the client&#8217;s site, the following Widgets were added to the sidebar:  Search, Archives, Categories, Pages, Recent Posts, Tag Cloud, Calendar.  Calendar isn&#8217;t in the existing site, but, since it is a very &#8216;expensive&#8217; SQL query to run, it made for a good benchmark.</p>
<p>The demo site is running on:</p>
<pre>
model name	: Intel(R) Celeron(R) CPU 2.40GHz
stepping	: 9
cpu MHz		: 2400.389
cache size	: 128 KB
</pre>
<p>with a Western Digital 80gb 7200RPM IDE drive.  Since all of the benchmarking was done on the same machine without any config changes taking place between tests, our benchmarks should represent as even a test base as we can expect.</p>
<p>Regrettably, without Varnish our underpowered machine couldn&#8217;t run the benchmark with 50 concurrent tests, nor could it run the benchmarks with the Calendar Widget enabled.  In order to get apachebench to run, we had to bump the number of requests down and reduce the number of concurrent tests.</p>
<p>These results are from Apache without Varnish.</p>
<pre>
Server Software:        Apache
Server Hostname:        varnish.cd34.com
Server Port:            80

Document Path:          /
Document Length:        43903 bytes

Concurrency Level:      10
Time taken for tests:   159.210 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      4408200 bytes
HTML transferred:       4390300 bytes
Requests per second:    0.63 [#/sec] (mean)
Time per request:       15921.022 [ms] (mean)
Time per request:       1592.102 [ms] (mean, across all concurrent requests)
Transfer rate:          27.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   7.0      0      25
Processing: 14785 15863 450.2  15841   17142
Waiting:     8209 8686 363.4   8517    9708
Total:      14785 15865 451.4  15841   17142

Percentage of the requests served within a certain time (ms)
  50%  15841
  66%  15975
  75%  16109
  80%  16153
  90%  16628
  95%  16836
  98%  17001
  99%  17142
 100%  17142 (longest request)
</pre>
<p>Normally we would have run the Varnish enabled test without the Calendar Widget, but, I felt confident enough to run the test with the widget in the sidebar.  Varnish was configured with a 12 hour cache (yes, I know, I&#8217;ll address that later) and the ESI Widget was loaded.</p>
<pre>
Server Software:        Apache
Server Hostname:        varnish.cd34.com
Server Port:            80

Document Path:          /
Document Length:        45544 bytes

Concurrency Level:      50
Time taken for tests:   18.607 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      457980000 bytes
HTML transferred:       455440000 bytes
Requests per second:    537.44 [#/sec] (mean)
Time per request:       93.034 [ms] (mean)
Time per request:       1.861 [ms] (mean, across all concurrent requests)
Transfer rate:          24036.81 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.8      0      42
Processing:     1   92  46.2    105     451
Waiting:        0   91  45.8    104     228
Total:          2   93  46.0    105     451

Percentage of the requests served within a certain time (ms)
  50%    105
  66%    117
  75%    123
  80%    128
  90%    142
  95%    155
  98%    171
  99%    181
 100%    451 (longest request)
</pre>
<p>As you can see, even with the aging hardware, we went from 0.63 requests per second to 537.44 requests per second, roughly 850 times the throughput.</p>
<p>But, more about that 12 hour cache.  The ESI Widget uses an Edge Side Include to include the sidebar into the template.  Rather than just cache the entire page, we instruct Varnish to cache the page and the sidebar separately and to include the sidebar when the page is served.  As a result, when a person surfs the site and goes from the front page to a post page, the sidebar doesn&#8217;t need to be regenerated when they go to the 2nd page.  With wp-cache, it would have regenerated the sidebar Widgets and then cached the resulting page.  Obviously, that 12 hour cache is going to affect the usability of the site, so, ESI widget purges the sidebar, front page and post page any time a post is updated or deleted or commented on.  Voila, even with a long cache time, we are presented with a site that is dynamic and not delayed until a page cache expires, as it would be with wp-cache.  As this widget is a concept, I&#8217;m sure a little intelligence can be added to prevent the excessive purging in some cases, but, it does handle things reasonably well.  There are some issues that the ESI Widget does not currently handle, including how to handle users that are logged in for comments.  With some template modifications, I think those pieces can be handled with ESI to provide a lightweight method for the authentication portion.</p>
<p>While I have seen other sites mention Varnish and other methods to keep your WordPress installation alive in high traffic, I believe this approach is a step in the right direction.  With the <a href="http://cd34.com/esi-widget/">ESI widget</a>, you can focus on your site, and let the server do the hard work.  This methodology is based on a CMS that I have contemplated writing for many years, though, using Varnish rather than static files.</p>
<p>It is a concept developed in roughly four hours including the time to write the widget and do the benchmarking.  It isn&#8217;t perfect, but does address the immediate needs of the one client.  I think we can consider this concept a success.</p>
<p>If you don&#8217;t have the ability to modify your system to run Varnish, then you would be limited to running wp-cache and db-cache.  If you can connect to a memcached server, you might consider running <a href="http://fairyfish.com/2008/03/13/enable-memcached-for-your-wordpress/">Memcached for WordPress</a> as it will make quite a difference as well.</p>
<p>This blog site, http://cd34.com/blog/, is not running behind Varnish.  To see the Varnish enabled site with ESI Widget, go to <a href="http://varnish.cd34.com/">http://varnish.cd34.com/</a></p>
<p>Software Mentioned:</p>
<p>* <a href="http://varnish.projects.linpro.no/">Varnish</a> <a href="http://varnish.projects.linpro.no/wiki/ESIfeatures">ESI</a> and <a href="http://varnish.projects.linpro.no/wiki/VCLSyntaxPurge">Purge</a> and Varnish&#8217;s suggestions for helping <a href="http://varnish.projects.linpro.no/wiki/VarnishAndWordPress">WordPress</a><br />
* <a href="http://wordpress.org/">WordPress</a><br />
* <a href="http://wordpress.org/extend/plugins/wp-cache/">wp-cache</a><br />
* <a href="http://wordpress.org/extend/plugins/db-cache/">db-cache</a></p>
<p>Sites used for reference:</p>
<p>* <a href="http://blog.darkhax.com/2009/06/08/supercharge-wordpress">Supercharge WordPress</a><br />
* <a href="http://jimmyg.org/blog/2009/ssi-memcached-nginx.html">SSI, Memcached and Nginx</a> (with mentions of a Varnish/ESI configuration)</p>
<p>Varnish configuration used for ESI-Widget:</p>
<pre>
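# Apache answers on port 81 of the same machine
backend default {
  .host = "127.0.0.1";
  .port = "81";
}

sub vcl_recv {
  # On a PURGE request, invalidate cached objects matching the requested URL
  if (req.request == "PURGE") {
    purge("req.url == " req.url);
  }

  # Static files never need cookies
  if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
    unset req.http.cookie;
  }
  # Strip cookies from everything except the login and admin pages
  # so that all other requests can be served from the cache
  if (!(req.url ~ "wp-(login|admin)")) {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  # Default to the 12 hour cache
  set obj.ttl = 12h;
  if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
    # Static files are kept for a full day
    set obj.ttl = 24h;
  } else {
    esi;  /* Do ESI processing */
  }
}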
</pre>
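<p>One caveat with the configuration above is that any client can send a PURGE request.  A minimal sketch of restricting purges to the local machine follows, assuming WordPress runs on the same host as Varnish.  The ACL name purgers is made up for this example and this has not been tested against the demo site.</p>
<pre>
# Hypothetical ACL: only these addresses may purge.
# If WordPress reaches Varnish on the server's public address,
# that address would need to be listed here as well.
acl purgers {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
      error 405 "Not allowed.";
    }
    purge("req.url == " req.url);
  }
}
</pre>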
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/wordpress-varnish-and-edge-side-includes/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Varnish proves itself against a DDOS</title>
		<link>http://cd34.com/blog/scalability/varnish-proves-itself-against-a-ddos/</link>
		<comments>http://cd34.com/blog/scalability/varnish-proves-itself-against-a-ddos/#comments</comments>
		<pubDate>Sat, 02 May 2009 20:22:13 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Varnish]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=637</guid>
		<description><![CDATA[I&#8217;ve worked a lot with Varnish over the last few weeks and we&#8217;ve had a rather persistent hacker that has been sending a small but annoying DDOS to a client on one of our machines. Usually we isolate the client and move their affected sites to a machine that won&#8217;t affect other clients. Then we [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve worked a lot with Varnish over the last few weeks and we&#8217;ve had a rather persistent hacker that has been sending a small but annoying DDOS to a client on one of our machines.  Usually we isolate the client and move their affected sites to a machine that won&#8217;t affect other clients.  Then we can modify firewall rules, find the issue, wait for the attack to end and move them back.  Usually this results in a bit of turmoil because not every client is easy to shuffle around.  Some have multiple databases and perhaps the application they are running takes a bit more horsepower to run due to the attack.</p>
<p>In this case, the application wasn&#8217;t too badly written and it was just a matter of firewalling certain types of packets and modifying the TCP settings to allow things to time out a bit more quickly while the attack persisted.  In order to do this seamlessly we had to move the physical IP that the client was using to another machine running Varnish.</p>
<p>We ended up running Varnish on a machine where we could freely firewall packets and turn on more verbose packet logging, and which pulled the requests from the original machine.  Apart from moving the IP address and making config changes on the existing machine, it was straightforward:</p>
<p>Original Machine<br />
* changed apache config to listen to a different IP address on port 81<br />
* modified the firewall to allow port 81<br />
* adjusted the apache config to listen to port 81 on that IP address<br />
* shut down the virtual ethernet interface<br />
* restarted apache</p>
<p>Varnish Machine<br />
* set up the backend to request files from port 81 on the new IP assigned from the old machine<br />
* copied the firewall rules from the Original Machine to the Varnish Machine<br />
* brought up the IP from the original machine<br />
* restarted varnish</p>
<p>Finally, we cleared the ARP cache in the switches that both machines were connected to.</p>
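<p>For illustration, the backend definition on the Varnish Machine would look something like the following.  The address 192.0.2.10 is a placeholder for the new IP assigned to the Original Machine, not the address we actually used.</p>
<pre>
# Hypothetical address: replace with the new IP Apache listens on
backend default {
.host = "192.0.2.10";
.port = "81";
}
</pre>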
<p>Within seconds, the load on the Original machine dropped to half of what it was before.  Varnish had been running on that machine, but, the DDOS was still hitting the firewall rules and causing apache to open connections.  Moving both of those pieces of the equation off the machine resulted in an immediate improvement on the Original Machine.  Since the same cpu horsepower is being used with the script &#8211; Varnish passes those requests through, and we&#8217;ve only removed some of the static files from being served from the machine, I believe we can safely conclude that it wasn&#8217;t the application that had the problems.  Apache has roughly the same number of processes as it had when we were running varnish on that machine, so, the load reduction appears to be mostly related to the firewall rules or the traffic that was still coming through.</p>
<p>Since moving the traffic over to the other machine, we see the same issues being exhibited there.  Since that machine isn&#8217;t doing anything but caching the apache responses, we can reasonably assume that the firewall is adding quite a bit of overhead to things.  The inbound traffic on the Original Machine was cut almost in half with a corresponding jump on the Varnish machine.  Since Varnish is dealing with inbound traffic from both the Original Machine and the DDOS attack, it is difficult to say with certainty how much of the inbound traffic on that machine is attack traffic.  However, based on the 90% cache hit rate and the size of the cached pages, the inbound traffic on that machine is higher than it should be, so, it is evident that the DDOS traffic moved.</p>
<p>After moving one set of sites and analyzing the Original Machine, it does appear that a second set of the client&#8217;s sites is also impacted.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/varnish-proves-itself-against-a-ddos/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Varnish saves the day&#8230;. maybe</title>
		<link>http://cd34.com/blog/scalability/varnish-saves-the-day-maybe/</link>
		<comments>http://cd34.com/blog/scalability/varnish-saves-the-day-maybe/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 04:41:09 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Varnish]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=631</guid>
		<description><![CDATA[We had a client that had a machine where apache was being overrun&#8230; or so we thought.  Everything pointed at this one set of domains owned by a client and in particular two sites with 100+ elements on the page.  Images, css, javascript and iframes composed their main page.  Apache was handling things reasonably well, [...]]]></description>
			<content:encoded><![CDATA[<p>We had a client that had a machine where apache was being overrun&#8230; or so we thought.  Everything pointed at this one set of domains owned by a client and in particular two sites with 100+ elements on the page.  Images, css, javascript and iframes composed their main page.  Apache was handling things reasonably well, but, it was immediately obvious that it could be better.</p>
<p>The conversion to <a href="http://varnish.projects.linpro.no/">Varnish</a> was quite simple to do even on a live server.  Slight modifications to the Apache config file to listen to port 81 on the set of domains in question, and a quick restart.  Varnish was configured to listen to port 80 on that particular IP and some minor modifications were made to the startup.vcl file to modify things slightly:</p>
<blockquote><p>sub vcl_fetch {<br />
&nbsp;&nbsp;if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;set obj.ttl = 3600s;<br />
&nbsp;&nbsp;}<br />
}</p></blockquote>
<p>A one hour cache should be granular enough to do a bit more good on these sites, overriding the default of two minutes.  After an hour, it was evident that the sites did perform much more quickly, but, we still had a load issue.  Some modifications of the apache config alleviated some of the other load problems after we dug further into things.</p>
<p>After 5 hours, we ended up with the following statistics from varnish:</p>
<pre>
0+05:18:24                                                               xxxxxx
Hitrate ratio:       10      100     1000
Hitrate avg:     0.9368   0.9231   0.9156

62576         1.00         3.28 Client connections accepted
466684        57.88        24.43 Client requests received
411765        48.90        21.55 Cache hits
148         0.00         0.01 Cache hits for pass
32018         7.98         1.68 Cache misses
54761         8.98         2.87 Backend connections success
0         0.00         0.00 Backend connections failures
45411         7.98         2.38 Backend connections reuses
48598         7.98         2.54 Backend connections recycles
</pre>
<p>Varnish is doing a great job.  The site does load considerably faster, but, it didn&#8217;t solve the entire problem.  It did reduce the number of apache processes on that machine from 450 to 170 or so, freed up some RAM for cache, and did make the server more responsive, but, it probably only solved about 50% of the issue.  The rest of it was cleaning up some poorly written PHP code, modifying a few MySQL tables and adding some indexes to make things work more quickly.</p>
<p>After we fixed the code problems, we debated removing Varnish from their configuration.  Varnish did buy us time to fix the problem and does result in a better experience for surfers on the sites, but, after the backend changes, it is hard to tell whether it makes enough impact to keep a non-standard configuration running.  Since it is not caching the main page of the site and is only serving the static elements (the site sets an expire time on each generated page), the only real benefit is that we are removing the need for apache to serve the static elements.</p>
<p>While testing another application, we were able to override hardcoded expire times and force a minimally cached page.  Even if we cached a generated page for two minutes, it could be the difference between a responsive server and a machine struggling to keep up.  Since WordPress, Joomla, Drupal and others set expire times using dates that have passed, they ensure that the site html being output is not cached.  Varnish allows us to ignore that, and to set our own cache time which could save a site hit with a lot of traffic.</p>
<blockquote><p>sub vcl_fetch {<br />
&nbsp;&nbsp;if (obj.ttl &lt; 120s) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;set obj.ttl = 120s;<br />
&nbsp;&nbsp;}<br />
}</p></blockquote>
<p>would give us a minimum two minute cache which would cut the requests to a dynamically generated page considerably.</p>
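<p>A sketch of the same idea that leaves login and admin pages alone is shown below.  The wp-(login|admin) pattern is an assumption about which URLs must stay uncached on a WordPress site, so adjust it for other applications.</p>
<pre>
sub vcl_fetch {
  # Assumption: only wp-login and wp-admin URLs must never be cached
  if (!(req.url ~ "wp-(login|admin)")) {
    if (obj.ttl &lt; 120s) {
      set obj.ttl = 120s;
    }
  }
}
</pre>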
<p>It is a juggling act.  Where do you make the tradeoff and what do you accelerate? Too many times the solution to a website&#8217;s performance problem is to throw more hardware at it.  At some point you have to split the load on multiple servers, adding new bottlenecks.  An application designed to run on a single machine becomes difficult to split to two or more machines, so, many times we do what we can to keep things running on a single machine.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/varnish-saves-the-day-maybe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nginx after one day and conversion of two more machines</title>
		<link>http://cd34.com/blog/scalability/nginx-after-one-day-and-conversion-of-two-more-machines/</link>
		<comments>http://cd34.com/blog/scalability/nginx-after-one-day-and-conversion-of-two-more-machines/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 20:16:59 +0000</pubDate>
		<dc:creator>cd34</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Webserver Software]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[openx]]></category>
		<category><![CDATA[phpadsnew]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=617</guid>
		<description><![CDATA[Nginx impressed me with the way it was written and its performance has impressed me as well. This one client has 3 machines that ran Apache2-mpm-worker with php5 running under fastcgi.  While page response time was good, the machines constantly ran at roughly 15% idle cpu time, with roughly 600mb-700mb of the ram used for [...]]]></description>
			<content:encoded><![CDATA[<p>Nginx impressed me with the way it was written and its performance has impressed me as well.</p>
<p>This one client has 3 machines that ran Apache2-mpm-worker with php5 running under fastcgi.  While page response time was good, the machines constantly ran at roughly 15% idle cpu time, with roughly 600mb-700mb of the ram used for cache.  All of the machines are quadcore with 4gb RAM and have been running for quite a while and have been tweaked and tuned along the way.</p>
<p>We started with the conversion of one site on one machine which resulted in the client being so impressed that we converted a second site on that machine which resulted in about 80mb/sec being served from nginx within minutes of deployment.  The next morning after we glanced over everything and confirmed that nginx was holding up, we converted the rest of that machine over to Nginx.  Traffic grew almost 20% after that change.</p>
<p>We started looking at the other machines, one of which runs phpadsnew on a relatively large network of his sites and the banners that are served from two of the main sites on one machine.  Converting those two over to nginx meant another 50mb/sec of traffic swapped from Apache.  Immediately he saw results with faster pageloads of his sites that pulled content from a central domain and with the banner ads being displayed more quickly.  After a few moments of analysis, it was decided to swap the entire machine from Apache2 to Nginx.  That process took a few hours due to the number of virtual hosts and the lack of any real script to migrate the configurations.  Response time on the sites was definitely faster.  After a little more discussion, rather than give that machine a day to settle in to see if we would find any problems, we converted his third machine.</p>
<p>First response in the morning:</p>
<blockquote><p>yesterday we sent 69.1k unique surfers to sponsors, that is the highest we have ever done.</p></blockquote>
<p>While only one of three machines was running Nginx for the entire day, one machine had about 8 hours under Nginx and the other about 2 hours under Nginx for that &#8216;day.&#8217;</p>
<p>Today, the results are somewhat clear.  Traffic is up overall, the machines are much more responsive.  Each machine is now roughly 80% idle and has roughly 2.4gb of memory reserved for cache.</p>
<p><a href="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/75.png"><img class="aligncenter size-medium wp-image-618" src="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/75-300x135.png" alt="75" width="300" height="135" /></a></p>
<p><a href="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/76.png"><img class="aligncenter size-medium wp-image-619" src="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/76-300x135.png" alt="76" width="300" height="135" /></a></p>
<p><a href="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/861.png"><img class="aligncenter size-medium wp-image-620" src="http://cd34.colocdn.com/blog/wp-content/uploads/2009/04/861-300x135.png" alt="861" width="300" height="135" /></a></p>
<p>Backups are scheduled at 3am on the boxes, and a few rsync jobs are run to keep some content directories synced between the machines.  Overall you can see the impact on the first graph as the right hand side shows a bit more growth.  The machine in the last graph was running nginx, but, struggled to push more than 85mb/sec or so.  The middle graph shows a decline, but, they believe that is external to the process.  The sites are loading more quickly and they expect that the sites will grow quite a bit.  So far, they are reporting roughly an 18% increase in clicks to their sponsor.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/nginx-after-one-day-and-conversion-of-two-more-machines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Varnish and Apache2</title>
		<link>http://cd34.com/blog/scalability/varnish-and-apache2/</link>
		<comments>http://cd34.com/blog/scalability/varnish-and-apache2/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 20:07:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Webserver Software]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://cd34.com/blog/?p=615</guid>
		<description><![CDATA[One client had some issues with Apache2 and a WordPress site. While WordPress isn&#8217;t really a great performer, this client had multiple domains on the same IP and dropping Nginx in didn&#8217;t seem like it would make sense to solve the immediate problem. First things first, we evaluated where the issue was with WordPress and [...]]]></description>
			<content:encoded><![CDATA[<p>One client had some issues with Apache2 and a WordPress site.  While WordPress isn&#8217;t really a great performer, this client had multiple domains on the same IP and dropping Nginx in didn&#8217;t seem like it would make sense to solve the immediate problem.</p>
<p>First things first, we evaluated where the issue was with WordPress and installed db-cache and wp-cache-2.  We had tried wp-super-cache but had seen some issues with it in some configurations.  Immediately the pageload time dropped from 41 seconds to 11 seconds.  Since the machine was running on a quadcore with 4gb ram and was running mostly idle, the only thing left was the 91 page elements being served.  Each pageload, even with pipelining still seemed to cause some stress.  Two external javascripts and one external flash object caused some delay in rendering the page.  The javascripts were actually responsible for holding up the page rendering which made the site seem even slower than it was.  We made some minor modifications, but, while apache2 was configured to serve things as best it could, we felt there was still some room for improvement.</p>
<p>Having tested <a href="/blog/infrastructure/apache-varnish-nginx-and-lighttpd/">Varnish in front of Apache2</a>, I knew it would make an impact in this situation due to the number of elements on the page and the fact that apache did a lot of work to serve each request.  Varnish and its VCL eliminate a lot of the overhead Apache has and should result in the capacity for roughly 70% better performance.  For this installation, we removed the one IP that was in use by the problem domain from Apache and ran Varnish on that IP, using 127.0.0.1 port 80 as the backend.</p>
<p>Converting a site that is in production and live is not for the fainthearted, but, here are a few notes.</p>
<p>For Apache you&#8217;ll want to add a line like this to make sure your logs show the remote IP rather than the IP of the Varnish server:</p>
<pre>
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined
</pre>
<p>Modify each of the VirtualHost configs to say:</p>
<pre>
&lt;VirtualHost 127.0.0.1:80>
</pre>
<p>and change the line for the logfile to say:</p>
<pre>
CustomLog /var/log/apache2/domain.com-access.log varnishcombined
</pre>
<p>Comment out the default Listen 80 and add explicit Listen directives for the addresses Apache should still answer on, so that Apache no longer listens to port 80 on the IP address that you want Varnish to answer:</p>
<pre>
#Listen 80
Listen 127.0.0.1:80
Listen 66.55.44.33:80
</pre>
<p>Configuration changes for Varnish:</p>
<pre>
backend default {
.host = "127.0.0.1";
.port = "80";
}

sub vcl_recv {
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|mp3|mp4|m4a|ogg|mov|avi|wmv)$") {
      lookup;
  }

  if (req.url ~ "\.(css|js)$") {
      lookup;
  }
}
sub vcl_fetch {
        if( req.request != "POST" )
        {
                unset obj.http.set-cookie;
        }

        set obj.ttl = 600s;
        set obj.prefetch =  -30s;
        deliver;
}
</pre>
<p>Shut down Apache, start Apache again, then start Varnish.</p>
<p>tail -f the logfile for Apache for one of the domains that you have moved.  Go to the site.  Varnish will load everything the first time, but, successive reloads shouldn&#8217;t show requests for images, javascript, css.  For this client we opted to hold things in cache for 10 minutes (600 seconds).</p>
<p>Overall, the process was rather seamless.  Unlike converting a site to Nginx, we are not required to make changes to the rewrite config or worry about setting up a fastcgi server to answer .php requests.  Clients will lose the ability to do some things like deny hotlinking, but, Varnish will run almost invisibly to the client.  Short of the page loading considerably quicker, the client was not aware we had made any server changes and that is the true measure of success.</p>
]]></content:encoded>
			<wfw:commentRss>http://cd34.com/blog/scalability/varnish-and-apache2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
