<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Random Musings of an Insane Mind</title> <atom:link href="http://cd34.com/blog/feed/" rel="self" type="application/rss+xml" /><link>http://cd34.com/blog</link> <description>This is my blog, there are many others like it but this one is mine.</description> <lastBuildDate>Fri, 05 Mar 2010 23:01:28 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.2</generator> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Wordpress Cache Plugin Benchmarks</title><link>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/</link> <comments>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/#comments</comments> <pubDate>Thu, 04 Mar 2010 15:55:03 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Scalability]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[Varnish]]></category> <category><![CDATA[wordpress]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=900</guid> <description><![CDATA[A lot of time and effort goes into keeping a Wordpress site alive when it starts to accumulate traffic.  While not every site has the same goals, keeping a site responsive and online is the number one priority.  When a surfer requests the page, it should load quickly and be responsive.  Each [...]]]></description> <content:encoded><![CDATA[<p>A lot of time and effort goes into keeping a Wordpress site alive when it starts to accumulate traffic.  While not every site has the same goals, keeping a site responsive and online is the number one priority.  
When a surfer requests the page, it should load quickly and be responsive.  Each addon handles caching a little differently and should be used in different cases.</p><p>For many sites, page caching will provide decent performance.  Once your site starts receiving comments, or people log in, many cache solutions cache too heavily or not enough.  The sheer number of solutions makes it obvious that Wordpress underperforms in higher traffic situations.</p><p>The list of caching addons that we&#8217;re testing:</p><p>* <a href="http://wordpress.org/extend/plugins/db-cache/">DB Cache</a> (version 0.6)<br /> * <a href="http://wordpress.org/extend/plugins/db-cache-reloaded/">DB Cache Reloaded</a> (version 2.0.2)<br /> * <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3 Total Cache</a> (version 0.8.5.1)<br /> * <a href="http://wordpress.org/extend/plugins/wp-cache/">WP Cache</a> (version 2.1.2)<br /> * <a href="http://wordpress.org/extend/plugins/wp-super-cache/">WP Super Cache</a> (version 0.9.9)<br /> * <a href="http://wordpress.org/extend/plugins/wp-widget-cache/">WP Widget Cache</a> (version 0.25.2)<br /> * <a href="http://wordpress.org/extend/plugins/wp-file-cache/">WP File Cache</a> (version 1.2.5)<br /> * <a href="http://github.com/pkhamre/wp-varnish">WP Varnish</a> (in beta)<br /> * <a href="http://cd34.com/blog/scalability/wordpress-varnish-and-edge-side-includes/">WP Varnish ESI Widget</a> (in beta)</p><h2>What are we testing?</h2><p>* Frontpage hits<br /> * http_load through a series of URLs</p><p>We take two measurements.  The cold start measurement is taken after any plugin cache has been cleared and Apache2 and MySQL have been restarted.  A 30 second pause is inserted prior to starting the tests.  We perform a frontpage hit 1000 times with 10 parallel connections.  We then repeat that test after Apache2 and the caching solution have had time to cache that page.  
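</p><p>The Req/Second figure for each Apachebench run can be pulled out of the ab output with a small filter.  This is only a sketch: the function name ab_rps is hypothetical and not part of the test script, and it assumes ab prints its usual "Requests per second:" summary line.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test script):
# read ab output on stdin and print the mean requests-per-second value.
ab_rps() {
    # The summary line looks like:
    #   Requests per second:    4.97 [#/sec] (mean)
    # so the number is the fourth whitespace-separated field.
    awk '/^Requests per second:/ { print $4 }'
}

# Usage, mirroring the frontpage runs described above:
#   ab -n 1000 -c 10 http://example.com/ | ab_rps
```

<p>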
Afterwards, http_load requests a series of 30 URLs to simulate people surfing other pages.  Between those two measurements, we should have a pretty good indicator of how well a site is going to perform in real life.</p><h2>What does the Test Environment look like?</h2><p>* Debian Squeeze VPS<br /> * Linux Kernel 2.6.33<br /> * Single core of a Xen Virtualized Xeon X3220 (2.40GHz)<br /> * 2GB RAM<br /> * CoW file is written on a Raid-10 System using 4&#215;1TB 7200RPM Drives<br /> * Apache 2.2.14 mpm-prefork<br /> * PHP 5.3.1<br /> * <a href="http://svn.automattic.com/wpcom-themes/test-data.2008-12-22.xml">Wordpress Theme Test Data</a><br /> * Tests are performed from a Quadcore Xeon machine connected via 1000 Base T on the same switch and /24 as the VPS machine</p><p>This setup is designed to replicate what most people might choose to host a reasonably popular wordpress site.</p><h2><a name="tldr">tl;dr Results</a></h2><p>If you aren&#8217;t using Varnish in front of your web site, the clear winner is W3 Total Cache using Page Caching &#8211; Disk (Enhanced), Minify Caching &#8211; Alternative PHP Cache (APC), Database Caching &#8211; Alternative PHP Cache (APC).</p><p>If you can use Varnish, WP Varnish would be a very simple way to gain quite a bit of performance while maintaining interactivity.  WP Varnish purges the cache when posts are made, allowing the site to be more dynamic and not suffer from the long cache delay before a page is updated.</p><p>W3 Total Cache has a number of options and sometimes settings can be quite detrimental to site performance.  If you can&#8217;t use APC caching or Memcached for caching Database queries or Minification, turn both off.  
W3 Total Cache&#8217;s interface is overwhelming but the plugin author has indicated that he&#8217;ll be making a new &#8216;Wizard&#8217; configuration menu in the next version along with Fragment Caching.</p><p>WP Super Cache isn&#8217;t too far behind and is also a reasonable alternative.</p><p>Either way, if you want your site to survive, you need to use a cache addon.  Going from 2.5 requests per second to 800+ requests per second makes a considerable difference in the usability of your site for visitors.  Logged in users and search engine bots still see uncached/live results, so, you don&#8217;t need to worry that your site won&#8217;t be indexed properly.</p><h2>Results</h2><style>.tborder{border:1px solid #000}.th{background-color:#ccc}.teven{background-color:#ddd}.tdnum{text-align:right}.trecommend{background-color:#cfc}.thonmen{background-color:#ffc}.tsmall{font-size:8pt}</style><p>Sorted in Ascending order in terms of higher overall performance</p><table class="tborder"><tr class="th"><td>Addon</td><td>Apachebench</td><td colspan="2">Cold Start<br />Warm Start</td><td>http_load</td><td colspan="2">Cold Start<br />Warm Start</td></tr><tr class="th"><td></td><td>Req/Second</td><td>Time/Request</td><td>50% within x ms</td><td>Fetches/Second</td><td>Min First Response</td><td>Avg First Response</td></tr><tr><td>Baseline</td><td class="tdnum">4.97</td><td class="tdnum">201.006</td><td class="tdnum">2004</td><td class="tdnum">15.1021</td><td class="tdnum">335.708</td><td class="tdnum">583.363</td></tr><tr><td class="tdnum"></td><td class="tdnum">5.00</td><td class="tdnum">200.089</td><td class="tdnum">2000</td><td class="tdnum">15.1712</td><td class="tdnum">304.446</td><td class="tdnum">583.684</td></tr><tr class="teven"><td>DB Cache</td><td class="tdnum">4.80</td><td class="tdnum">208.436</td><td class="tdnum">2087</td><td class="tdnum">15.1021</td><td class="tdnum">335.708</td><td class="tdnum">583.363</td></tr><tr class="teven"><td class="tsmall">Cached all 
SQL queries</td><td class="tdnum">4.81</td><td class="tdnum">207.776</td><td class="tdnum">2091</td><td class="tdnum">15.1712</td><td class="tdnum">304.446</td><td class="tdnum">583.684</td></tr><tr><td>DB Cache</td><td class="tdnum">4.87</td><td class="tdnum">205.250</td><td class="tdnum">2035</td><td class="tdnum">14.1992</td><td class="tdnum">302.335</td><td class="tdnum">621.092</td></tr><tr><td class="tsmall">Out of Box config</td><td class="tdnum">4.94</td><td class="tdnum">202.624</td><td class="tdnum">2026</td><td class="tdnum">14.432</td><td class="tdnum">114.983</td><td class="tdnum">618.434</td></tr><tr class="teven"><td>WP File Cache</td><td class="tdnum">4.95</td><td class="tdnum">201.890</td><td class="tdnum">2009</td><td class="tdnum">15.8869</td><td class="tdnum">158.597</td><td class="tdnum">549.176</td></tr><tr class="teven"><td></td><td class="tdnum">4.99</td><td class="tdnum">200.211</td><td class="tdnum">2004</td><td class="tdnum">16.1758</td><td class="tdnum">99.728</td><td class="tdnum">544.107</td></tr><tr><td>DB Cache Reloaded</td><td class="tdnum">5.02</td><td class="tdnum">199.387</td><td class="tdnum">1983</td><td class="tdnum">15.0167</td><td class="tdnum">187.343</td><td class="tdnum">589.196</td></tr><tr><td class="tsmall">All SQL Queries Cached</td><td class="tdnum">5.03</td><td class="tdnum">200.089</td><td class="tdnum">1985</td><td class="tdnum">14.9233</td><td class="tdnum">150.145</td><td class="tdnum">586.443</td></tr><tr class="teven"><td>DB Cache Reloaded</td><td class="tdnum">5.06</td><td class="tdnum">197.636</td><td class="tdnum">1968</td><td class="tdnum">14.9697</td><td class="tdnum">174.857</td><td class="tdnum">589.161</td></tr><tr class="teven"><td class="tsmall">Out of Box config</td><td class="tdnum">5.08</td><td class="tdnum">196.980</td><td class="tdnum">1968</td><td class="tdnum">15.181</td><td class="tdnum">257.533</td><td class="tdnum">587.737</td></tr><tr><td>Widgetcache</td><td class="tdnum">6.667</td><td 
class="tdnum">149.903</td><td class="tdnum">1492</td><td class="tdnum">15.0264</td><td class="tdnum">245.332</td><td class="tdnum">602.039</td></tr><tr><td class="tdnum"></td><td class="tdnum">6.72</td><td class="tdnum">148.734</td><td class="tdnum">1487</td><td class="tdnum">15.1887</td><td class="tdnum">299.65</td><td class="tdnum">598.017</td></tr><tr class="teven"><td>W3 Total Cache</td><td class="tdnum">153.45</td><td class="tdnum">65.167</td><td class="tdnum">60</td><td class="tdnum">133.1898</td><td class="tdnum">8.916</td><td class="tdnum">85.7177</td></tr><tr class="teven"><td class="tsmall">DB Cache off, Page Caching with Memcached</td><td class="tdnum">169.46</td><td class="tdnum">59.011</td><td class="tdnum">57</td><td class="tdnum">188.4</td><td class="tdnum">9.107</td><td class="tdnum">50.142</td></tr><tr><td>W3 Total Cache</td><td class="tdnum">173.49</td><td class="tdnum">57.639</td><td class="tdnum">52</td><td class="tdnum">108.898</td><td class="tdnum">7.668</td><td class="tdnum">86.4077</td></tr><tr><td class="tsmall">DB Cache off, Minify Cache with Memcached</td><td class="tdnum">189.76</td><td class="tdnum">52.698</td><td class="tdnum">48</td><td class="tdnum">203.522</td><td class="tdnum">8.122</td><td class="tdnum">43.8795</td></tr><tr class="teven"><td>W3 Total Cache</td><td class="tdnum">171.34</td><td class="tdnum">58.364</td><td class="tdnum">50</td><td class="tdnum">203.718</td><td class="tdnum">8.097</td><td class="tdnum">44.1234</td></tr><tr class="teven"><td class="tsmall">DB Cache using Memcached</td><td class="tdnum">190.01</td><td class="tdnum">52.269</td><td class="tdnum">48</td><td class="tdnum">206.187</td><td class="tdnum">8.186</td><td class="tdnum">42.4438</td></tr><tr><td>W3 Total Cache</td><td class="tdnum">175.29</td><td class="tdnum">57.048</td><td class="tdnum">48</td><td class="tdnum">87.423</td><td class="tdnum">7.515</td><td class="tdnum">107.973</td></tr><tr><td class="tsmall">Out of Box config</td><td 
class="tdnum">191.15</td><td class="tdnum">52.314</td><td class="tdnum">47</td><td class="tdnum">204.387</td><td class="tdnum">8.288</td><td class="tdnum">43.217</td></tr><tr class="teven"><td>W3 Total Cache</td><td class="tdnum">175.29</td><td class="tdnum">57.047</td><td class="tdnum">51</td><td class="tdnum">204.557</td><td class="tdnum">8.199</td><td class="tdnum">42.9365</td></tr><tr class="teven"><td class="tsmall">Database Cache using APC</td><td class="tdnum">191.19</td><td class="tdnum">52.304</td><td class="tdnum">48</td><td class="tdnum">200.612</td><td class="tdnum">8.11</td><td class="tdnum">44.6691</td></tr><tr><td>W3 Total Cache</td><td class="tdnum">114.02</td><td class="tdnum">87.703</td><td class="tdnum">49</td><td class="tdnum">114.393</td><td class="tdnum">8.206</td><td class="tdnum">82.0678</td></tr><tr><td class="tsmall">Database Cache Disabled</td><td class="tdnum">191.76</td><td class="tdnum">52.150</td><td class="tdnum">49</td><td class="tdnum">203.781</td><td class="tdnum">8.095</td><td class="tdnum">42.558</td></tr><tr class="teven"><td>W3 Total Cache</td><td class="tdnum">175.80</td><td class="tdnum">56.884</td><td class="tdnum">51</td><td class="tdnum">107.842</td><td class="tdnum">7.281</td><td class="tdnum">87.2761</td></tr><tr class="teven"><td class="tsmall">Database Cache Disabled, Minify Cache using APC</td><td class="tdnum">192.01</td><td class="tdnum">52.082</td><td class="tdnum">50</td><td class="tdnum">205.66</td><td class="tdnum">8.244</td><td class="tdnum">43.1231</td></tr><tr><td>W3 Total Cache</td><td class="tdnum">104.90</td><td class="tdnum">95.325</td><td class="tdnum">51</td><td class="tdnum">123.041</td><td class="tdnum">7.868</td><td class="tdnum">74.5887</td></tr><tr><td class="tsmall">Database Cache Disabled, Page Caching using APC</td><td class="tdnum">197.55</td><td class="tdnum">50.620</td><td class="tdnum">46</td><td class="tdnum">210.445</td><td class="tdnum">7.907</td><td class="tdnum">41.4102</td></tr><tr 
class="teven"><td>WP Super Cache</td><td class="tdnum">336.88</td><td class="tdnum">2.968</td><td class="tdnum">16</td><td class="tdnum">15.1021</td><td class="tdnum">335.708</td><td class="tdnum">583.363</td></tr><tr class="teven"><td class="tsmall">Out of Box config, Half On</td><td class="tdnum">391.59</td><td class="tdnum">2.554</td><td class="tdnum">16</td><td class="tdnum">15.1712</td><td class="tdnum">304.446</td><td class="tdnum">583.684</td></tr><tr><td>WP Cache</td><td class="tdnum">161.63</td><td class="tdnum">6.187</td><td class="tdnum">12</td><td class="tdnum">15.1021</td><td class="tdnum">335.708</td><td class="tdnum">583.363</td></tr><tr><td></td><td class="tdnum">482.29</td><td class="tdnum">20.735</td><td class="tdnum">11</td><td class="tdnum">15.1712</td><td class="tdnum">304.446</td><td class="tdnum">583.684</td></tr><tr class="teven"><td>WP Super Cache</td><td class="tdnum">919.11</td><td class="tdnum">1.088</td><td class="tdnum">3</td><td class="tdnum">190.117</td><td class="tdnum">1.473</td><td class="tdnum">47.9367</td></tr><tr class="teven"><td class="tsmall">Full on, Lockdown mode</td><td class="tdnum">965.69</td><td class="tdnum">1.036</td><td class="tdnum">3</td><td class="tdnum">975.979</td><td class="tdnum">1.455</td><td class="tdnum">9.67185</td></tr><tr class="thonmen"><td>WP Super Cache</td><td class="tdnum">928.45</td><td class="tdnum">1.077</td><td class="tdnum">3</td><td class="tdnum">210.106</td><td class="tdnum">1.468</td><td class="tdnum">43.8167</td></tr><tr class="thonmen"><td class="tsmall">Full on</td><td class="tdnum">970.45</td><td class="tdnum">1.030</td><td class="tdnum">3</td><td class="tdnum">969.256</td><td class="tdnum">1.488</td><td class="tdnum">9.78753</td></tr><tr class="teven"><td>W3 Total Cache</td><td class="tdnum">1143.94</td><td class="tdnum">8.742</td><td class="tdnum">2</td><td class="tdnum">165.547</td><td class="tdnum">0.958</td><td class="tdnum">56.7702</td></tr><tr class="teven"><td 
class="tsmall">Page Cache using Disk Enhanced</td><td class="tdnum">1222.16</td><td class="tdnum">8.182</td><td class="tdnum">3</td><td class="tdnum">1290.43</td><td class="tdnum">0.961</td><td class="tdnum">7.15632</td></tr><tr class="trecommend"><td>W3 Total Cache</td><td class="tdnum">1153.50</td><td class="tdnum">8.669</td><td class="tdnum">3</td><td class="tdnum">165.725</td><td class="tdnum">0.916</td><td class="tdnum">56.5004</td></tr><tr class="trecommend"><td class="tsmall">Page Caching &#8211; Disk Enhanced, Minify/Database using APC</td><td class="tdnum">1211.22</td><td class="tdnum">8.256</td><td class="tdnum">2</td><td class="tdnum">1305.94</td><td class="tdnum">0.948</td><td class="tdnum">6.97114</td></tr><tr class="teven"><td>Varnish ESI</td><td class="tdnum">2304.18</td><td class="tdnum">0.434</td><td class="tdnum">4</td><td class="tdnum">349.351</td><td class="tdnum">0.221</td><td class="tdnum">28.1079</td></tr><tr class="teven"><td></td><td class="tdnum">2243.33</td><td class="tdnum">0.44689</td><td class="tdnum">4</td><td class="tdnum">4312.78</td><td class="tdnum">0.152</td><td class="tdnum">2.09931</td></tr><tr class="trecommend"><td>WP Varnish</td><td class="tdnum">1683.89</td><td class="tdnum">0.594</td><td class="tdnum">3</td><td class="tdnum">369.543</td><td class="tdnum">0.155</td><td class="tdnum">26.8906</td></tr><tr class="trecommend"><td></td><td class="tdnum">3028.41</td><td class="tdnum">0.330</td><td class="tdnum">3</td><td class="tdnum">4318.48</td><td class="tdnum">0.148</td><td class="tdnum">2.15063</td></tr></table><h2>Test Script</h2><pre>
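#!/bin/bash
# bash is required: "time ( ... )" on a subshell is a bash keyword feature.

# Requests per run and number of parallel connections.
FETCHES=1000
PARALLEL=10

# Restart Apache and MySQL so the first run of each tool starts cold.
/usr/sbin/apache2ctl stop
/etc/init.d/mysql restart
/usr/sbin/apache2ctl start
echo Sleeping
sleep 30
time ( \
echo "ab: cold start"; \
ab -n $FETCHES -c $PARALLEL http://example.com/; \
echo "ab: warm start"; \
ab -n $FETCHES -c $PARALLEL http://example.com/; \
\
echo "http_load: cold start"; \
./http_load -parallel $PARALLEL -fetches $FETCHES wordpresstest; \
echo "http_load: warm start"; \
./http_load -parallel $PARALLEL -fetches $FETCHES wordpresstest; \
)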
</pre><h2>URL File for http_load</h2><pre>

http://example.com/

http://example.com/2010/03/hello-world/

http://example.com/2008/09/layout-test/

http://example.com/2008/04/simple-gallery-test/

http://example.com/2007/12/category-name-clash/

http://example.com/2007/12/test-with-enclosures/

http://example.com/2007/11/block-quotes/

http://example.com/2007/11/many-categories/

http://example.com/2007/11/many-tags/

http://example.com/2007/11/tags-a-and-c/

http://example.com/2007/11/tags-b-and-c/

http://example.com/2007/11/tags-a-and-b/

http://example.com/2007/11/tag-c/

http://example.com/2007/11/tag-b/

http://example.com/2007/11/tag-a/

http://example.com/2007/09/tags-a-b-c/

http://example.com/2007/09/raw-html-code/

http://example.com/2007/09/simple-markup-test/

http://example.com/2007/09/embedded-video/

http://example.com/2007/09/contributor-post-approved/

http://example.com/2007/09/one-comment/

http://example.com/2007/09/no-comments/

http://example.com/2007/09/many-trackbacks/

http://example.com/2007/09/one-trackback/

http://example.com/2007/09/comment-test/

http://example.com/2007/09/a-post-with-multiple-pages/

http://example.com/2007/09/lorem-ipsum/

http://example.com/2007/09/cat-c/

http://example.com/2007/09/cat-b/

http://example.com/2007/09/cat-a/

http://example.com/2007/09/cats-a-and-c/
</pre>]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Using Varnish to assist with AB Testing</title><link>http://cd34.com/blog/webserver/using-varnish-to-assist-with-ab-testing/</link> <comments>http://cd34.com/blog/webserver/using-varnish-to-assist-with-ab-testing/#comments</comments> <pubDate>Fri, 26 Feb 2010 01:48:49 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Webserver Software]]></category> <category><![CDATA[abtest]]></category> <category><![CDATA[analytics]]></category> <category><![CDATA[Varnish]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=896</guid> <description><![CDATA[While working with a recent client project, they mentioned AB Testing a few designs.  While I enjoy statistics, we looked at Google&#8217;s Website Optimizer to track trials and conversions.  After some internal testing, we opted to use Funnels and Goals rather than the AB or Multivariate test.  I had little control over [...]]]></description> <content:encoded><![CDATA[<p>While working with a recent client project, they mentioned AB Testing a few designs.  While I enjoy statistics, we looked at Google&#8217;s Website Optimizer to track trials and conversions.  After some internal testing, we opted to use Funnels and Goals rather than the AB or Multivariate test.  I had little control over the origin server, but I did have control over the front-end cache.</p><p>Our situation reminded me of a situation I encountered years ago.  A client had an inhouse web designer and a subcontracted web designer.  I felt the subcontracted web designer&#8217;s design would convert better.  The client wasn&#8217;t completely convinced, but agreed to running two designs head to head.  However, their implementation of the test biased the results.</p><h2>What went wrong?</h2><p>Each design was run for a week, in series.  
While this provided ample time for gathering data, the inhouse designer&#8217;s design ran during a national holiday with a three day weekend, and the subcontractor&#8217;s design ran the following week.  Internet traffic patterns, the holiday weekend, weather, sporting events, TV/Movie premieres, etc. added so many variables that the results should have been considered invalid.</p><p>Since Google&#8217;s AB Testing has session persistence and splits traffic between the AB tests, we need to emulate this behavior.  When people run AB tests in series rather than in parallel, or switch pages with a cron job or some other automated method, I cringe.  A test at 5pm EST and 6pm EST will yield different results.  At 5pm EST, your target audience could be driving home from work.  At 6pm EST they could be sitting down for dinner.</p><h2>How can Varnish help?</h2><p>If we allow Varnish to select the landing page/offer page outside the origin server&#8217;s control, we can run both tests at the same time.  An internet logjam in Seattle, WA would affect both tests evenly.  Likewise, a national or worldwide event would affect both tests equally.  Now that we know how to make sure the AB Test is fairly balanced, we have to implement it.</p><p>Redirection sometimes plays havoc on browsers and spiders, so we&#8217;ll rewrite the URL within Varnish using some Inline C and VCL.  Google uses javascript and a document.location call to send some visitors to the B/alternate page.  Users that have javascript disabled will only see the Primary page.</p><p>Our Varnish config file contains the following:</p><pre>
sub vcl_recv {
  if (req.url == "/") {
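    C{
      /* rand() and sprintf() need stdlib.h and stdio.h, which must be
         included in a C{ }C block at the top of the VCL file. */
      char buff[5];
      /* Pick variant 1 or 2 at random. */
      sprintf(buff,"%d",rand()%2 + 1);
      /* "\011" is the octal length (9) of the header name "X-ABtest:". */
      VRT_SetHdr(sp, HDR_REQ, "\011X-ABtest:", buff, vrt_magic_string_end);
    }C
    set req.url = "/" req.http.X-ABtest req.url;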
  }
}
</pre><p>We&#8217;ve placed our landing pages in /1/ and /2/ directories on our origin server.  The only page Varnish intercepts is the index page at the root of the site.  Varnish randomly chooses to serve the index.html page from /1/ or /2/, internally rewrites our URL and serves it from the cache or the origin server.  Since the URL rewriting is done within vcl_recv, the rewritten URL is what gets cached, so subsequent requests for either variant are served from Varnish&#8217;s cache rather than the origin.  The same method can be used to test landing pages that aren&#8217;t at the root of your site by modifying the if (req.url == "/") { condition.</p><p>You can test multipage offers by placing additional pages within the /1/ and /2/ directories on your origin along with the signup form.  Unlike Google&#8217;s AB Test, this Varnish configuration does not provide session persistence.  Reloading the root page may show the surfer either test page, since the variant is chosen at random on every request.  Subsequent pages need to be loaded from /1/ or /2/ based on which landing page was selected.</p><p>When doing any AB Test, change as few variables as possible, document the changes, and analyze the difference between the results.  Running at least 1000 views of each is an absolute minimum.  
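</p><p>One rough way to analyze the difference between the two pages is a two-proportion z-score on their conversion rates.  This is a minimal sketch, assuming you have already counted views and conversions for each variant and that at least one visitor converted and at least one did not; the function name ab_compare and the sample numbers are made up for illustration and do not come from the client project.</p>

```shell
#!/bin/sh
# Hypothetical helper: compare the conversion rates of two landing pages.
# Arguments: views_1 conversions_1 views_2 conversions_2
ab_compare() {
    awk -v n1="$1" -v c1="$2" -v n2="$3" -v c2="$4" 'BEGIN {
        p1 = c1 / n1                      # conversion rate of /1/
        p2 = c2 / n2                      # conversion rate of /2/
        p  = (c1 + c2) / (n1 + n2)        # pooled conversion rate
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        printf "%.2f%% %.2f%% z=%.2f\n", p1 * 100, p2 * 100, (p1 - p2) / se
    }'
}

# Made-up example: 1000 views each, 50 signups on /1/ and 70 on /2/.
ab_compare 1000 50 1000 70
```

<p>An absolute z value above roughly 1.96 corresponds to 95% confidence in a two-sided test, so in the made-up example the apparent gap between 5% and 7% is not yet conclusive at 1000 views each.</p><p>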
While Google&#8217;s Multivariate test provides a lot more options, a simple AB test between two pages or site tours can give some insight into what works rather easily.</p><p>If you cannot use Google&#8217;s AB Test or the Multivariate Test, using their Funnels and Goals tool will still allow you to do AB Testing.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/webserver/using-varnish-to-assist-with-ab-testing/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Varnish VCL, Inline C and a random image</title><link>http://cd34.com/blog/webserver/varnish-vcl-inline-c-and-a-random-image/</link> <comments>http://cd34.com/blog/webserver/varnish-vcl-inline-c-and-a-random-image/#comments</comments> <pubDate>Thu, 18 Feb 2010 23:48:35 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Webserver Software]]></category> <category><![CDATA[inline c]]></category> <category><![CDATA[Varnish]]></category> <category><![CDATA[vcl]]></category> <category><![CDATA[vcl_recv]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=884</guid> <description><![CDATA[While working with the prototype of a site, I wanted to have a particular panel image randomly chosen when the page was viewed.  While this could be done on the server side, I wanted to move this to Varnish so that Varnish&#8217;s cache would be used rather than piping the request through each time [...]]]></description> <content:encoded><![CDATA[<p>While working with the prototype of a site, I wanted to have a particular panel image randomly chosen when the page was viewed.  While this could be done on the server side, I wanted to move this to Varnish so that Varnish&#8217;s cache would be used rather than piping the request through each time to the origin server.</p><p>At the top of /etc/varnish/default.vcl</p><pre>C{
  #include &lt;stdlib.h&gt;
  #include &lt;stdio.h&gt;
}C
</pre><p>and our vcl_recv function gets the following:</p><pre>
  if (req.url ~ "^/panel/") {
    C{
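      char buff[5];
      /* Pick one of four panel images, numbered 0 to 3. */
      sprintf(buff,"%d",rand()%4);
      /* "\010" is the octal length (8) of the header name "X-Panel:". */
      VRT_SetHdr(sp, HDR_REQ, "\010X-Panel:", buff, vrt_magic_string_end);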
    }C
    set req.url = regsub(req.url, "^/panel/(.*)\.(.*)$", "/panel/\1.ZZZZ.\2");
    set req.url = regsub(req.url, "ZZZZ", req.http.X-Panel);
  }
</pre><p>The above code lets us reference the image in the HTML document as:</p><pre>
&lt;img src="/panel/random.jpg" width="300" height="300" alt="Panel Image"/&gt;
</pre><p>Since we have modified the request uri in vcl_recv before the object is cached, subsequent requests for the same modified URI will be served from Varnish&#8217;s cache, without requiring another fetch from the origin server.  Based on the other VCL and preferences, you can specify a long expire time, remove cookies, or do ESI processing.  Since the regexp passes the extension through, we could also randomly choose .html, .css, .jpg or any other extension you desire.</p><p>In the directory panel, you would need to have</p><pre>
/panel/random.0.jpg
/panel/random.1.jpg
/panel/random.2.jpg
/panel/random.3.jpg
</pre><p>which would be served by Varnish when the url /panel/random.jpg is requested.</p><p>Moving that process to Varnish should cut down on the load from the origin server while making your site look active and dynamic.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/webserver/varnish-vcl-inline-c-and-a-random-image/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>SEOProfilerBot, Amazon ECS, and poor programming</title><link>http://cd34.com/blog/infrastructure/seoprofilerbot-amazon-ecs-and-poor-programming/</link> <comments>http://cd34.com/blog/infrastructure/seoprofilerbot-amazon-ecs-and-poor-programming/#comments</comments> <pubDate>Mon, 08 Feb 2010 07:36:05 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Web Infrastructure]]></category> <category><![CDATA[Amazon ECS]]></category> <category><![CDATA[Cuill]]></category> <category><![CDATA[SEOProfilerBot]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=881</guid> <description><![CDATA[This morning a client&#8217;s machine alerted several times due to high load.  As the machine runs roughly 50 wordpress powered sites and rarely has issues, we did some investigation.  Evidently a bot called SEOProfiler was hitting the machine and causing problems. From SEOProfiler&#8217;s page, http://www.seoprofiler.com/bot/, The spbot is bandwidth-friendly. It tries to wait at [...]]]></description> <content:encoded><![CDATA[<p>This morning a client&#8217;s machine alerted several times due to high load.  As the machine runs roughly 50 wordpress powered sites and rarely has issues, we did some investigation.  Evidently a bot called SEOProfiler was hitting the machine and causing problems.</p><p>From SEOProfiler&#8217;s page, http://www.seoprofiler.com/bot/,</p><blockquote><p>The spbot is bandwidth-friendly. It tries to wait at least 5 minutes until it visits another page of your domain. 
In general, it takes days or weeks until spbot visits another page of your website.</p></blockquote><h2>Oh really?</h2><p>In a three and a half hour period on a machine with 50 domains:</p><pre>
# grep -l '+http://www.seoprofiler.com/bot/' *.log|wc -l
50
# grep '+http://www.seoprofiler.com/bot/' *.log|wc -l
375938
</pre><p>Three and a half hours is 12,600 seconds, so 375,938 requests works out to roughly 30 requests per second across the machine.</p><p>Let&#8217;s see how friendly they really are:</p><pre>
# grep seoprofiler.com xxxxxx.com-access.log | grep 'GET /robots.txt ' | wc -l
2005
</pre><p>2005 requests for robots.txt in three and a half hours, well, at least they are checking.</p><pre>
# grep seoprofiler.com xxxxxx.com-access.log | grep -v 'GET /robots.txt ' |wc -l
1883
</pre><p>1883 requests for documents in that same period.  They actually requested robots.txt more frequently than pages on this particular domain.  Here are the first 50 lines from one of the sites on this machine with robots.txt requests excluded:</p><pre>
67.202.41.44 - - [07/Feb/2010:06:38:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11857 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.214.118 - - [07/Feb/2010:06:38:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10214 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:38:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71830 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:38:45 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 20829 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.48.58 - - [07/Feb/2010:06:38:48 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 19576 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:39:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 73199 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:39:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60596 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.191.9 - - [07/Feb/2010:06:39:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21406 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:39:51 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 24076 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 29957 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:40:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 9871 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:40:40 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11748 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:40:43 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10781 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.197.161 - - [07/Feb/2010:06:40:44 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14995 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.93.177 - - [07/Feb/2010:06:40:45 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 72244 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.197.86 - - [07/Feb/2010:06:40:57 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 13103 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:40:58 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12032 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:41:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17798 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:41:22 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 38199 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:41:38 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17484 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.197.86 - - [07/Feb/2010:06:41:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 23264 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:41:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17145 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.42.173 - - [07/Feb/2010:06:41:48 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 23440 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.244.231 - - [07/Feb/2010:06:41:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 29496 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.214.118 - - [07/Feb/2010:06:41:52 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 69694 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:41:56 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14958 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12272 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:42:55 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60345 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.16.163 - - [07/Feb/2010:06:43:03 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 16470 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:43:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21739 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:43:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59288 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:43:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11407 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.42.173 - - [07/Feb/2010:06:43:09 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14459 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:43:31 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10561 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.93.177 - - [07/Feb/2010:06:43:46 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14947 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:43:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 19598 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:43:55 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12090 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:44:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11853 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:44:16 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11612 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.41.44 - - [07/Feb/2010:06:44:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71920 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:44:22 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14007 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.191.9 - - [07/Feb/2010:06:44:31 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 130288 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:45:01 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21739 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:45:26 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 18281 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:45:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59638 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:45:40 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12372 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:46:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14353 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:46:07 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 27416 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:46:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 22271 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.197.161 - - [07/Feb/2010:06:46:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14548 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
</pre><p>While we don&#8217;t see many duplicate IPs here, let&#8217;s analyze the one that has six hits:</p><pre>
174.129.65.79 - - [07/Feb/2010:06:38:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71830 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:39:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60596 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:40:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 9871 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:41:38 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17484 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:45:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59638 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:46:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14353 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
</pre><p>The longest delay between page fetches is 3 minutes, 54 seconds, with a minimum of 28 seconds.</p><p>In that same period of time, you can see that they used a number of Amazon EC2 instances:</p><pre>
  10655 67.202.0.47
  10454 204.236.242.36
  10353 174.129.103.67
  10343 75.101.254.111
  10295 204.236.197.86
  10128 174.129.65.79
   9908 174.129.191.9
   9883 75.101.214.118
   9835 72.44.54.185
   9833 72.44.42.173
   9769 174.129.136.94
   9718 75.101.197.161
   9290 174.129.106.91
   9063 72.44.48.77
   9017 174.129.152.208
   8850 204.236.212.138
   8712 174.129.93.177
   8423 174.129.140.41
   8415 67.202.41.44
   8302 67.202.16.163
   8116 72.44.57.92
   7923 204.236.245.5
   7633 75.101.219.131
   7519 67.202.48.58
   7510 174.129.72.66
   7429 67.202.2.164
   7356 174.129.155.12
   7335 174.129.172.253
   7036 75.101.214.102
   6998 67.202.42.161
   6835 174.129.159.143
   6109 204.236.244.231
   6002 174.129.127.87
   5961 75.101.168.14
   5841 174.129.84.116
   5201 174.129.163.50
   5114 72.44.49.238
   4744 174.129.153.52
   4654 75.101.241.159
   4615 204.236.241.141
   4585 75.101.179.97
   4463 174.129.61.74
   4387 75.101.179.141
   4379 72.44.56.37
   4332 75.101.187.208
   4169 67.202.56.227
   4106 204.236.211.119
   4075 174.129.93.123
   3722 204.236.242.141
   3332 67.202.11.26
   3276 67.202.0.31
   3097 174.129.171.75
   2360 75.101.234.148
   1837 174.129.136.47
   1689 67.202.56.158
    853 67.202.10.125
     67 75.101.204.87
     14 204.236.212.231
     12 174.129.144.34
      6 174.129.106.64
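</pre><p>Tallies like the ones above are easy to produce from an access log.  What follows is a minimal sketch in Python, not necessarily how the numbers above were generated: the function name and the three sample lines (with a made-up path) are only for illustration, and it assumes the client IP is the first field of each log line.</p><pre>
from collections import Counter

def hits_per_ip(lines):
    """Count requests per client IP (first whitespace-separated field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    # Busiest addresses first, like the listing above.
    return counts.most_common()

# Illustrative sample only; a real run would iterate over the log file.
sample = [
    '174.129.65.79 - - [07/Feb/2010:06:38:41 -0500] "GET /a.html HTTP/1.1" 200 71830',
    '72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /a.html HTTP/1.1" 200 29957',
    '174.129.65.79 - - [07/Feb/2010:06:39:47 -0500] "GET /a.html HTTP/1.1" 200 60596',
]

for ip, hits in hits_per_ip(sample):
    print(f"{hits:7d} {ip}")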
</pre><p>Even if we look at only one of the domains that was spidered:</p><pre>
    125 72.44.48.77
    123 174.129.140.41
    112 174.129.65.79
    109 75.101.254.111
    108 174.129.172.253
    104 75.101.197.161
    104 174.129.93.177
    104 174.129.103.67
    102 204.236.197.86
    102 174.129.136.94
    101 67.202.2.164
     99 75.101.214.118
     98 67.202.0.47
     96 67.202.48.58
     95 204.236.212.138
     93 174.129.106.91
     86 67.202.41.44
     85 72.44.54.185
     84 204.236.242.36
     82 75.101.219.131
     82 72.44.42.173
     76 67.202.42.161
     76 174.129.191.9
     75 174.129.152.208
     73 72.44.57.92
     73 67.202.16.163
     71 75.101.168.14
     71 174.129.159.143
     68 204.236.245.5
     68 174.129.72.66
     61 174.129.155.12
     60 204.236.244.231
     60 204.236.211.119
     59 174.129.153.52
     58 72.44.49.238
     54 72.44.56.37
     54 174.129.93.123
     54 174.129.61.74
     51 75.101.179.141
     51 174.129.163.50
     50 204.236.242.141
     47 174.129.127.87
     45 75.101.241.159
     44 75.101.214.102
     43 67.202.56.227
     42 174.129.171.75
     41 67.202.11.26
     40 67.202.0.31
     39 75.101.187.208
     39 204.236.241.141
     36 174.129.84.116
     32 75.101.179.97
     30 75.101.234.148
     22 174.129.136.47
     19 67.202.56.158
     12 67.202.10.125
</pre><p>While their goals stated on their page are admirable, it is clear that they lack some understanding of how EC2 works.  Writing code to run across distributed instances is not a simple process, so I can see where handing out spider assignments to nodes could run into problems.  But, looking at a single IP address, we can see that their bot probably doesn&#8217;t maintain state between fetches, since it fetches robots.txt prior to each URL and then violates their own &#8216;no more than one page every five minutes&#8217; rule.</p><pre>
72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET / HTTP/1.1" 200 29957 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:40 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 12272 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 16855 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 68020 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
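</pre><p>The gaps in the listing above can be checked mechanically.  This is a minimal sketch in Python, assuming the log format shown here; the helper name, the 300 second constant (my reading of their five minute rule) and the trimmed sample lines are illustrative and have nothing to do with how SEOProfiler&#8217;s bot is written.</p><pre>
from datetime import datetime

STAMP = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 07/Feb/2010:06:40:10 -0500
MIN_GAP = 300  # seconds; assumed from the five minute rule

def page_fetch_gaps(lines):
    """Seconds between consecutive page fetches, ignoring robots.txt."""
    times = []
    for line in lines:
        stamp = line.split("[", 1)[1].split("]", 1)[0]
        path = line.split('"')[1].split()[1]
        if path != "/robots.txt":
            times.append(datetime.strptime(stamp, STAMP))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# The eight requests from 72.44.48.77 above, with the user agent trimmed
# and the page paths shortened.
sample = [
    '72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /robots.txt HTTP/1.1" 200 2631',
    '72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET / HTTP/1.1" 200 29957',
    '72.44.48.77 - - [07/Feb/2010:06:42:40 -0500] "GET /robots.txt HTTP/1.1" 200 2631',
    '72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /x.html HTTP/1.1" 200 12272',
    '72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /robots.txt HTTP/1.1" 200 2631',
    '72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /x.html HTTP/1.1" 200 16855',
    '72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /robots.txt HTTP/1.1" 200 2631',
    '72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /x.html HTTP/1.1" 200 68020',
]

gaps = page_fetch_gaps(sample)
too_fast = [gap for gap in gaps if MIN_GAP > gap]
print(gaps, too_fast)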
</pre><p>Based on the times, I don&#8217;t believe they could have spun up a new EC2 instance on the same IP address, which leads me to believe that they are spidering links from the site and requesting robots.txt each time.</p><p>While I believe using cloud services is a good thing, companies like this that abuse it are going to cause problems for other people who adopt the same methods.  Amazon&#8217;s EC2 instances have already hit numerous anti-spam blacklists due to Amazon&#8217;s lax policy or inability to quickly track down spam.  While I have resisted the temptation to block EC2 instances for inbound email, this client requested that we block the IP addresses that SEOProfilerBot was coming from &#8211; which means that any other search engine that comes along and uses Amazon&#8217;s EC2 will not be able to reach his sites.</p><p>Cuill did the same thing to his sites a while back and we altered the robots.txt file, but that didn&#8217;t stop the constant pounding from their spiders that had already fetched the robots.txt.</p><p>At some point, Amazon EC2 and other cloud vendors will be firewalled from large portions of the net, limiting the usefulness of writing applications that run on the cloud.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/infrastructure/seoprofilerbot-amazon-ecs-and-poor-programming/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>unable to mount root fs on unknown-block(0,0)</title><link>http://cd34.com/blog/infrastructure/unable-to-mount-root-fs-on-unknown-block00/</link> <comments>http://cd34.com/blog/infrastructure/unable-to-mount-root-fs-on-unknown-block00/#comments</comments> <pubDate>Sun, 31 Jan 2010 16:56:09 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Web Infrastructure]]></category> <category><![CDATA[Adaptec 31205]]></category> <category><![CDATA[kernel]]></category> <category><![CDATA[linux]]></category><guid 
isPermaLink="false">http://cd34.com/blog/?p=879</guid> <description><![CDATA[After building a system for the new backup servers that utilized an Adaptec 31205 controller, I switched to a kernel that we&#8217;ve tuned in-house, as I always prefer to do. Upon booting into the kernel I had built, I received: unable to mount root fs on unknown-block(0,0) Since the drive size on the array was very large, the Debian Installer automatically created [...]]]></description> <content:encoded><![CDATA[<p>After building a system for the new backup servers that utilized an Adaptec 31205 controller, I switched to a kernel that we&#8217;ve tuned in-house, as I always prefer to do.</p><p>Upon booting into the kernel I had built, I received:</p><pre>unable to mount root fs on unknown-block(0,0)</pre><p>Since the drive size on the array was very large, the Debian Installer automatically created an EFI GUID Partition table, which my kernel was not set up for.</p><p>In the kernel&#8217;s make menuconfig, under <strong>File Systems</strong>, <strong>Partition Types</strong>, enable <strong>Advanced partition selection</strong>.  Near the bottom is <strong>EFI GUID Partition support</strong>.  Enable that, recompile your kernel and you should be set.</p><p>One reboot later and voila:</p><pre>
st1:/colobk1# uname -a
Linux st1 2.6.32.7 #1 SMP Fri Jan 29 21:43:32 EST 2010 x86_64 GNU/Linux
st1:/colobk1# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             462M  232M  207M  53% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                   10M   60K   10M   1% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda8              19T  305G   18T   2% /colobk1
/dev/sda5             1.9G   55M  1.8G   3% /home
/dev/sda4             949M  4.2M  945M   1% /tmp
/dev/sda6             2.4G  204M  2.2G   9% /usr
/dev/sda7             9.4G  237M  9.1G   3% /var
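</pre><p>For reference, the two menu options described above should correspond to the following symbols in the kernel .config.  These are the names as I know them from the 2.6 series; check them against your own tree before relying on them.</p><pre>
CONFIG_PARTITION_ADVANCED=y
CONFIG_EFI_PARTITION=y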
</pre>]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/infrastructure/unable-to-mount-root-fs-on-unknown-block00/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Django CMS to support Varnish and Akamai ESI</title><link>http://cd34.com/blog/framework/django-cms-to-support-varnish-and-akamai-esi/</link> <comments>http://cd34.com/blog/framework/django-cms-to-support-varnish-and-akamai-esi/#comments</comments> <pubDate>Fri, 18 Dec 2009 22:50:00 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Framework]]></category> <category><![CDATA[akamai]]></category> <category><![CDATA[cms]]></category> <category><![CDATA[django]]></category> <category><![CDATA[esi]]></category> <category><![CDATA[pylons]]></category> <category><![CDATA[Varnish]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=870</guid> <description><![CDATA[Many years ago I ran into a situation with a client where the amount of traffic they were receiving was crushing their dynamically created site.  Computation is always the enemy of a quick pageload, so, it is very important to do as little computation as possible when delivering a page. While there are many ways [...]]]></description> <content:encoded><![CDATA[<p>Many years ago I ran into a situation with a client where the amount of traffic they were receiving was crushing their dynamically created site.  Computation is always the enemy of a quick pageload, so, it is very important to do as little computation as possible when delivering a page.</p><p>While there are many ways to put together a CMS, high traffic CMS sites usually involve caching or lots of hardware.  Some write static files which are much less strenuous, but, you lose some of the dynamic capabilities.  Fragment caching becomes a method to make things a bit more dynamic as <a href="http://masonhq.com/">MasonHQ</a> does with their page and block structure. 
<a href="http://code.google.com/p/django-blocks/">Django-blocks</a> was surely influenced by this or reinvented this method.</p><p>In order to get the highest performance out of a CMS with a page and block method, I had considered writing a filesystem or inode linked list that would allow the webserver to assemble the page by following the inodes on the disk.  Obviously there are some issues here, but, if a block was updated by a process, the page would automatically be reassembled.  This emulates a write-through cache and would have provisions for dynamic content to be mixed in with the static content on disk.  Assembly of the page still takes more compute cycles than serving a static file, but significantly fewer than dynamically creating the page from multiple queries.</p><p>That design seriously limits the ability to deploy the system widely.  While I can control the hosting environment for personal projects, the CMS couldn&#8217;t gain wide acceptance with such a requirement.  Varnish is a rather simple piece of software to install; requiring it still limits deployability, but it provides a significant piece of the puzzle in Edge Side Includes (ESI).  If the CMS gets used beyond personal and small deployments, Akamai supports Edge Side Includes as well.</p><p>Rather than explain ESI myself, I&#8217;ll point to <a href="http://www.trygve-lie.com/blog/entry/esi_explained_simple">ESI Explained Simply</a>, which is about the best writeup I&#8217;ve seen to date of how ESI can be used.</p><p>The distinction here is using fragment caching controlled by ESI to represent different zones on the page.  As a simple example, let&#8217;s say our page template contains an article and a block with the top five articles on the site.  When a new post is added, we can expire the block that contains the top five articles so that it is requested on the next page fetch.  Since the existing article didn&#8217;t change, the interior ESI included block doesn&#8217;t need to be purged.  
This allows the page to be constructed on the Edge rather than on the Origin server.</p><p>As I have worked with a number of PHP frameworks, none really met my needs so I started using Python frameworks roughly two years ago.  For this CMS, I debated using Pylons or Django and ended up choosing <a href="http://www.djangoproject.com/">Django</a>.  Since both can be run behind WSGI compliant servers, we&#8217;ve opened ourselves up to a number of potential solutions.  Since we are running Varnish in front of our Origin server, we can run Apache2 with mod_wsgi, but, we&#8217;re not limited to that configuration.  At this point, we have a relatively generic configuration the CMS can run on, but, there are many other places we can adapt the configuration for our preferences.</p><p>Some of the potential caveats:<br /> * With <a href="http://varnish.projects.linpro.no/">Varnish</a> or <a href="http://www.akamai.com/">Akamai</a> as a frontend, we need to pay closer attention to X-Forwarded-For:<br /> * Web logs won&#8217;t exist because Varnish is serving and assembling the pages (There is a trick using ESI that could be employed if logging was critical)<br /> * ESI processed pages with Varnish are not compressed.  
This is on their <a href="http://varnish.projects.linpro.no/wiki/PostTwoShoppingList">wishlist</a>.</p><p>Features:<br /> * Content can exist in multiple categories or tags<br /> * Flexible URL mapping<br /> * Plugin architecture for Blocks and Elements<br /> * Content will maintain revisions and by default allow comments and threaded comments</p><p>Terms:<br /> * Template &#8211; the graphical layout of the page with minimal CMS markup<br /> * Element &#8211; the graphical template that is used to render a Block<br /> * Block &#8211; a module that generates the data rendered by an Element<br /> * Page &#8211; a Page determined by a Title, Slug and elements<br /> * Content &#8211; The actual data that is rendered by a Block</p><p>Goals:<br /> * Flexible enough to handle something as simple as a personal blog, but also capable of powering a highly trafficked site.<br /> * Data storage of common elements to handle publishing of content and comments with the ability to store information to allow threaded comments.  This would allow the CMS to handle a blog application, a CMS, or a forum.<br /> * A method to store ancillary data in a model so that upgrades to the existing database model will not affect developed plugins.<br /> * Block system to allow prepackaged css/templating while allowing local replacement without affecting the default package.<br /> * Upgrades through PyPI or easy_install.<br /> * Ability to add CDN/ESI without needing to modify templates.  The system will run without needing to be behind Varnish, but its full power won&#8217;t be realized without Varnish or Akamai in front of the origin server.<br /> * Seamless integration of affiliate referral tracking and conversion statistics</p><p>The question in my mind was whether to start with an existing project and adapt it or start from scratch.  
At this point, the closest Django CMS I could find was Django-Blocks and I do intend to look it over fairly closely, but, a cursory look showed the authors were taking it in a slightly different direction than I anticipated.  I&#8217;ll certainly look through the code again, but, the way I&#8217;ve envisioned this, I think there are some fundamental points that clash.</p><p>As I already have much of the database model written for an older PHP CMS that I wrote, I&#8217;m addressing some of the shortcomings I ran across with that design and modifying the models to be a little more generic.  While I am sure there are proprietary products that currently utilize ESI, I believe my approach is unique and flexible enough to power everything from a blog to a site or forums or even a classified ads site.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/framework/django-cms-to-support-varnish-and-akamai-esi/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Journalistic Responsibility</title><link>http://cd34.com/blog/boring/journalistic-responsibility/</link> <comments>http://cd34.com/blog/boring/journalistic-responsibility/#comments</comments> <pubDate>Mon, 14 Dec 2009 06:53:07 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Boring Stuff]]></category> <category><![CDATA[Facebook Pro]]></category> <category><![CDATA[journalism]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=855</guid> <description><![CDATA[A week or two ago, a story broke regarding a security upgrade in Windows.  In the race to scoop the story first, facts were not checked, the validity of the story was based on a blog post at a security company. Ed Bott @ Ziff Davis covered it in What the &#8220;Black screen [...]]]></description> <content:encoded><![CDATA[<p>A week or two ago, a story broke regarding a security upgrade in Windows.  
In the race to scoop the story first, facts were not checked, the validity of the story was based on a blog post at a security company.</p><p>Ed Bott @ Ziff Davis covered it in <a href="http://blogs.zdnet.com/Bott/?p=1575">What the &#8220;Black screen of death&#8221; story says about tech journalism</a>.</p><p>Even TechCrunch falls into this with a spoofed <a href="http://www.techcrunch.com/2009/12/06/eric-schmidt-twitter/">Eric Schmidt joins Twitter</a>.  Post first, ask later.  Rather than correct the incorrect article, let it run for the adviews.</p><p>Since the introduction of the Internet, journalistic accuracy has dropped substantially.  While spell-check should eliminate most of the errors, typographic errors occur frequently.  The number of journalists that get your and you&#8217;re confused or their and there is staggering.  Tribune Media, CNN/Turner, ABC, Fox and MSNBC are not immune.  Associated Press, Reuters and United Press International remain news leaders with accurate, verified and grammatically correct articles.  With the downturn in paper journalism, competent writers have been replaced with less expensive writers that are more interested in the number of bylines they can generate than the quality of their work.</p><p>To test a theory, a mock-up of a Facebook Beta application, a ruse posted on a few news sites with corroborating evidence and a &#8216;hot tip&#8217; to two media outlets resulted in 31 different locations picking up on the post, 2700 or so retweets and precisely one site validating the facts.</p><p>The first site it was posted to, Hacker News, suspected it was fake almost immediately.  However, they missed the significance of the names chosen, the times that the other comments were posted and the sequence of names.  Hackers indeed.  
A spoof post about a <a href="http://news.ycombinator.com/item?id=928054">hamster falling into the LHC</a> stayed within the top 210 posts for almost four days before enough &#8216;news&#8217; displaced it.</p><p>In the end, it took a post from a security person at Facebook before the thread was <a href="http://news.ycombinator.com/item?id=990454">subsequently killed</a>.  Did Facebook violate someone&#8217;s privacy to get to the bottom of this?  There sure wasn&#8217;t much red tape for the Facebook engineer to peer into someone&#8217;s profile to get to the bottom of it.</p><p><a href="http://thenextweb.com/appetite/2009/12/11/facebook-pro-tested-hoax/">TheNextWeb</a> suspected something was amiss and updated their post throughout the day, clearly indicating the updates.  Martin Bryant contacted me via email to ask quite directly whether the information was true.  This is good journalism.</p><p>I suppose most of the sites that ran the story are just pulling RSS feeds from somewhere with no editorial oversight.  A trusted syndicated source could distribute a hoax fairly widely and the remnants would be available on the web and search engines for years.</p><p>Do sites knowingly run with incorrect headlines in search of ad dollars associated with a hot story, hoax or not?  Three sites that picked up the story clearly wanted the hysteria and hype to drive adviews.</p><p>In the end, the glut of news available at our fingertips means that the overall quality of news has diminished.  Is there a solution?  With automation moving at breakneck speed, it is a problem we&#8217;re going to have to deal with for quite some time.  
Even Google&#8217;s news site presents stories without any editorial control and would be a difficult, but not impossible vector to exploit.</p><p>Peer reviewed news isn&#8217;t the answer as so many sites have proven and editorially controlled sites contain bias no matter how independent they claim to be.</p><p>Want to design the killer app of 2010?  Fix news distribution.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/boring/journalistic-responsibility/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Facebook Pro &#8211; Facebook&#8217;s Revenue Stream</title><link>http://cd34.com/blog/boring/facebook-pro-facebooks-revenue-stream/</link> <comments>http://cd34.com/blog/boring/facebook-pro-facebooks-revenue-stream/#comments</comments> <pubDate>Fri, 11 Dec 2009 20:56:30 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Boring Stuff]]></category> <category><![CDATA[facebook]]></category> <category><![CDATA[Facebook Pro]]></category> <category><![CDATA[Google Voice]]></category> <category><![CDATA[google wave]]></category> <category><![CDATA[linkedin]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=850</guid> <description><![CDATA[I&#8217;ve always been an early adopter of technology, social media and new websites that had a technological edge.  I read quite a few of the tech news websites and love to get in on early beta and beta offerings from companies.  One of my recent favorite betas that I was invited to was [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve always been an early adopter of technology, social media and new websites that had a technological edge.  I read quite a few of the tech news websites and love to get in on early beta and beta offerings from companies.  One of my recent favorite betas that I was invited to was <a href="http://lite.facebook.com/">lite.facebook.com</a>.  
On the surface, it seemed to lack a certain finesse, but, the biggest feature it had was that it was extremely quick, lacked the application spam and let me see 99% of what I was interested in.</p><p>I&#8217;ve loved Google Voice and was a fairly early adopter.  I had tried Grand Central, but, it didn&#8217;t replace enough functionality with what I had currently set up with the local phone company.  Google Wave and their Sandbox is another product that I find very intriguing.  I have worked with Wave Federations and I think once someone develops a killer app for Wave, it&#8217;ll gain wide acceptance.</p><p>But, this isn&#8217;t about Google, this is about Facebook.</p><p>I was an early adopter of FB Connect.  I&#8217;ve written a number of applications that I&#8217;ve not released to experiment with their API and have been generally impressed by their openness.  Some of the information an application is able to access is a privacy nightmare.  People complain day in and day out about Google and Privacy &#8211; perhaps because Google has to collect all of its market intelligence based on your surfing habits, and then Facebook finds a way to have you spend hours customizing your profile &#8211; giving Facebook precisely the information that makes their advertising system 10x more intrusive than Google could ever be.  Back to the point.</p><p>In August I received an email from Facebook asking if I would participate in another beta project.   I was warned that this one would entail a purchase from their store, but, in exchange, I would receive credit towards advertising.  It makes perfect sense to test the payment system ahead of time on a major release &#8211; something many new electronic stores fail to do.  I clicked the link saying I would be a part of their beta and waited.</p><p>And waited.</p><p>Last night, a very cryptic email arrived with a link to follow to read about this exciting new product Facebook had to offer.  
As I read the page, I was already pulling out my wallet to get my credit card because the service seemed perfect for me.  Having to maintain a LinkedIn profile and a Facebook Profile has always been an exercise in duplication.  Facebook doesn&#8217;t ask enough questions to really be useful in business and I suspect if they put their heads together, they could develop a new angle.</p><p>It appears they listened.</p><p>The page was very basic.  It talked about the benefits of a &#8216;Facebook Pro&#8217; account; pricing hadn&#8217;t been established, but they had set a test price of $29.95 for a 6 month recurring membership.</p><p>Some of the benefits listed included:</p><p>* Ability to store Work History<br /> * Ability to write Recommendations on profiles<br /> * Tighter control over Profile Security<br /> * Additional Contact Method fields<br /> * Certification badges<br /> * Digital Business cards</p><p><a href="http://cd34.colocdn.com/blog/wp-content/uploads/2009/12/facebook-pro-beta.png"><img src="http://cd34.colocdn.com/blog/wp-content/uploads/2009/12/facebook-pro-beta-300x181.png" alt="facebook pro beta" title="facebook pro beta" width="300" height="181" class="aligncenter size-medium wp-image-853" /></a></p><p>Once you get in, there is a small NDA that prevents screenshots of the interface, but it is obvious that there are hundreds of people in the beta.  Even as I have set up some business interests, it is listing profiles in a &#8216;Business Network&#8217; that are staggeringly accurate.  A refreshing change from the People You May Know lottery.</p><p>So far, the new options are quite intriguing and if the quality of the business contacts I&#8217;ve made in the beta is indicative of the trend, I think Facebook has a real winner here.</p><p>I found it interesting that a beta allowing tighter control over privacy was released the day after they released new privacy options that the masses are hailing as anti-privacy.  
Perhaps this is why Facebook chose this week to release the beta.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/boring/facebook-pro-facebooks-revenue-stream/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Upgraded GFS2 Cluster Tools from 2.2 to 3.0.4</title><link>http://cd34.com/blog/infrastructure/upgraded-gfs2-cluster-tools-from-2-2-to-3-0-4/</link> <comments>http://cd34.com/blog/infrastructure/upgraded-gfs2-cluster-tools-from-2-2-to-3-0-4/#comments</comments> <pubDate>Thu, 10 Dec 2009 18:05:28 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Web Infrastructure]]></category> <category><![CDATA[cman]]></category> <category><![CDATA[dlm_controld]]></category> <category><![CDATA[gfs2]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=861</guid> <description><![CDATA[With a few words of warning, we upgraded one of our clusters from 2.2 to 3.0.4.  While this is normally a seamless project, it needed to be coordinated with both storage nodes in the cluster since the changes from 2.2 to 3.0 in openais were incompatible.  Some minor changes to the cluster config [...]]]></description> <content:encoded><![CDATA[<p>With a few words of warning, we upgraded one of our clusters from 2.2 to 3.0.4.  While this is normally a seamless project, it needed to be coordinated with both storage nodes in the cluster since the changes from 2.2 to 3.0 in openais were incompatible.  Some minor changes to the cluster config file were needed, which resulted in a cleaner file, and a new dependency for rgmanager was added for the upgrade to 3.0.</p><p>This meant some downtime while openais was upgraded.  
Since we run behind a pair of load balancers, we were able to shut down the first filesystem, disconnect it from cman, upgrade one side, shut off the services on the other, bring this side up, bring up services, then upgrade the second node.</p><p>This should have worked, and cman on the primary node had no problem, but the secondary node refused to start dlm_controld.</p><pre>
Dec 10 12:29:20 dlm_controld dlm_controld 3.0.4 started
Dec 10 12:29:30 dlm_controld cannot find device /dev/misc/lock_dlm_plock with minor 58
</pre><p>For some odd reason, lock_dlm_plock was created in /dev rather than /dev/misc after the udev upgrade.  Moving it into place allowed cman to start on the second node and allowed the cluster to run in non-degraded mode.</p><p>Why lock_dlm_plock was in the wrong place on one node and in the correct place on the other node, I&#8217;m not sure.  I think prior to rgmanager being installed, the init script for cman didn&#8217;t stop when dlm couldn&#8217;t be loaded, and since the /dev/misc folder hadn&#8217;t been created, it created the device node in /dev.  Subsequent restarts of the machine have resulted in it coming up without an issue, so it appears to be an issue somewhere in one of the startup scripts.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/infrastructure/upgraded-gfs2-cluster-tools-from-2-2-to-3-0-4/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>No ESI processing, first char not &#8216;&lt;&#8217;</title><link>http://cd34.com/blog/infrastructure/no-esi-processing-first-char-not/</link> <comments>http://cd34.com/blog/infrastructure/no-esi-processing-first-char-not/#comments</comments> <pubDate>Wed, 02 Dec 2009 03:26:49 +0000</pubDate> <dc:creator>cd34</dc:creator> <category><![CDATA[Web Infrastructure]]></category> <category><![CDATA[edge side include]]></category> <category><![CDATA[esi]]></category> <category><![CDATA[Varnish]]></category><guid isPermaLink="false">http://cd34.com/blog/?p=837</guid> <description><![CDATA[After installing Varnish 2.0.5 on a machine, ESI Includes didn&#8217;t work.  When using varnishlog, the first error that occurred when debugging was: No ESI processing, first char not &#8216;&lt;&#8217; 12 SessionClose &#8211; timeout 12 StatSess     &#8211; 124.177.181.149 50662 4 0 0 0 0 0 0 0 [...]]]></description> <content:encoded><![CDATA[<p>After installing Varnish 2.0.5 on a machine, ESI Includes didn&#8217;t work.  
When debugging with varnishlog, the first error that appeared was:</p><p>No ESI processing, first char not &#8216;&lt;&#8217;</p><pre>
   12 SessionClose - timeout
   12 StatSess     - 124.177.181.149 50662 4 0 0 0 0 0 0 0
   12 SessionOpen  c 68.212.183.136 60087 66.244.147.44:80
   12 ReqStart     c 68.212.183.136 60087 409391565
   12 RxRequest    c GET
   12 RxURL        c /esi.html
   12 RxProtocol   c HTTP/1.1
   12 RxHeader     c Host: cd34.colocdn.com
   12 RxHeader     c User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4
   12 RxHeader     c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
   12 RxHeader     c Accept-Language: en-us,en;q=0.5
   12 RxHeader     c Accept-Encoding: gzip,deflate
   12 RxHeader     c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   12 RxHeader     c Keep-Alive: 115
   12 RxHeader     c Connection: keep-alive
   12 RxHeader     c X-lori-time-1: 1259718658980
   12 RxHeader     c Cache-Control: max-age=0
   12 VCL_call     c recv
   12 VCL_return   c lookup
   12 VCL_call     c hash
   12 VCL_return   c hash
   12 VCL_call     c miss
   12 VCL_return   c fetch
   12 Backend      c 14 cd34_com cd34_com
   12 ObjProtocol  c HTTP/1.1
   12 ObjStatus    c 200
   12 ObjResponse  c OK
   12 ObjHeader    c Date: Wed, 02 Dec 2009 01:50:59 GMT
   12 ObjHeader    c Server: Apache
   12 ObjHeader    c Vary: Accept-Encoding
   12 ObjHeader    c Content-Encoding: gzip
   12 ObjHeader    c Content-Type: text/html
   12 TTL          c 409391565 RFC 120 1259718659 0 0 0 0
   12 VCL_call     c fetch
   12 TTL          c 409391565 VCL 43200 1259718659
   12 ESI_xmlerror c No ESI processing, first char not '&lt;'
   12 TTL          c 409391565 VCL 0 1259718659
   12 VCL_info     c XID 409391565: obj.prefetch (-30) less than ttl (-1), ignored.
   12 VCL_return   c deliver
   12 Length       c 68
   12 VCL_call     c deliver
   12 VCL_return   c deliver
   12 TxProtocol   c HTTP/1.1
   12 TxStatus     c 200
   12 TxResponse   c OK
   12 TxHeader     c Server: Apache
   12 TxHeader     c Vary: Accept-Encoding
   12 TxHeader     c Content-Encoding: gzip
   12 TxHeader     c Content-Type: text/html
   12 TxHeader     c Content-Length: 68
   12 TxHeader     c Date: Wed, 02 Dec 2009 01:50:59 GMT
   12 TxHeader     c X-Varnish: 409391565
   12 TxHeader     c Age: 0
   12 TxHeader     c Via: 1.1 varnish
   12 TxHeader     c Connection: keep-alive
   12 TxHeader     c X-Cache: MISS
   12 ReqEnd       c 409391565 1259718659.088263512 1259718659.127703667 0.000059366 0.039401770 0.000038385
   12 Debug        c "herding"
</pre><p>ESI received significant performance enhancements in 2.0.4 and 2.0.5, so it seemed something was incompatible.  Downgrading to 2.0.3 and using the VCL from another machine still resulted in ESI not working.</p><p>In this case, mod_deflate running on the backend was causing the issue: the log above shows the object arriving with Content-Encoding: gzip, so the first byte Varnish saw was compressed data rather than &lt;.  However, in reading the source code, it appears that message could also occur if your ESI include wasn&#8217;t handing back properly formed XML/HTML content.  If your include doesn&#8217;t contain valid content and is only returning a small snippet, you might consider passing:</p><pre>
-p esi_syntax=0x1
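# Sketch of a full invocation; the listen address and VCL path are assumptions, not taken from this setup:
# varnishd -a :80 -f /etc/varnish/default.vcl -p esi_syntax=0x1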
</pre><p>on the command line that starts Varnish.</p><p>The changes in Varnish address the issue of ESI being enabled on binary content.  Since the first character isn&#8217;t a &lt; in almost all binary files (jpg, mpg, gif) and isn&#8217;t the start of most .css/.js files, Varnish doesn&#8217;t need to spend extra time checking those files for includes.  While you can and should selectively enable ESI processing, this is just an added safeguard and a performance boost to compensate for VCL that might have an esi directive on static/binary content.</p><p>Since Varnish 2.0.3 now worked properly with the new machine, we upgraded to Varnish 2.0.5, which introduced a very odd issue:</p><pre>
[Tue Dec 01 20:58:11 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.htmlt
[Tue Dec 01 20:58:13 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html7
[Tue Dec 01 20:58:24 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xfa
[Tue Dec 01 20:59:01 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xb5
[Tue Dec 01 20:59:06 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xe7
[Tue Dec 01 20:59:07 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xd4
[Tue Dec 01 20:59:08 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\x1c</pre><p>This generated 404s on the piece of the page that contained the ESI include.  Downgrading to 2.0.4 fixed the issue, which appears to already be fixed in trunk. <a href="http://varnish.projects.linpro.no/ticket/585">Varnish Ticket #585</a></p><p>Running Varnish 2.0.4 with mod_deflate disabled addressed the two issues that prevented ESI from working correctly on this new installation.</p> ]]></content:encoded> <wfw:commentRss>http://cd34.com/blog/infrastructure/no-esi-processing-first-char-not/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>