Archive for the ‘Web Infrastructure’ Category

SEOProfilerBot, Amazon EC2, and poor programming

Monday, February 8th, 2010

This morning a client’s machine alerted several times due to high load. As the machine runs roughly 50 WordPress-powered sites and rarely has issues, we did some investigation. A bot called SEOProfiler was hitting the machine and causing the problems.

From SEOProfiler’s page, http://www.seoprofiler.com/bot/,

The spbot is bandwidth-friendly. It tries to wait at least 5 minutes until it visits another page of your domain. In general, it takes days or weeks until spbot visits another page of your website.

Oh really?

In a three-and-a-half-hour period on a machine with 50 domains:

# grep -l '+http://www.seoprofiler.com/bot/' *.log|wc -l
50
# grep '+http://www.seoprofiler.com/bot/' *.log|wc -l
375938

Over a period of three and a half hours (12,600 seconds), I calculate that to be roughly 30 requests per second.
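The rate can be checked directly from the two figures quoted above. This is only a sketch of the arithmetic, assuming the 375938 matching log lines all fall inside the stated three and a half hours; it comes out at about 30 requests per second across the machine.

```python
# Figures quoted in the post; the window length is the stated three and a half hours.
total_requests = 375938
domains = 50
window_seconds = 3.5 * 60 * 60  # 12600 seconds

# Overall request rate across all 50 domains.
requests_per_second = total_requests / window_seconds

# Average rate seen by a single domain, in requests per minute.
per_domain_per_minute = total_requests / domains / (window_seconds / 60)

print(f"{requests_per_second:.1f} requests/second across the machine")
print(f"{per_domain_per_minute:.1f} requests/minute per domain on average")
```

Even averaged per domain, that is far from one page every five minutes.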

Let’s see how friendly they really are:

# grep seoprofiler.com xxxxxx.com-access.log | grep 'GET /robots.txt ' | wc -l
2005

That is 2005 requests for robots.txt in three and a half hours. Well, at least they are checking.

# grep seoprofiler.com xxxxxx.com-access.log | grep -v 'GET /robots.txt ' |wc -l
1883

1883 requests for documents in that same period. They actually requested robots.txt more frequently than pages on this particular domain. Here are the first 50 lines from one of the sites on this machine with robots.txt requests excluded:

67.202.41.44 - - [07/Feb/2010:06:38:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11857 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.214.118 - - [07/Feb/2010:06:38:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10214 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:38:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71830 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:38:45 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 20829 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.48.58 - - [07/Feb/2010:06:38:48 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 19576 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:39:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 73199 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:39:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60596 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.191.9 - - [07/Feb/2010:06:39:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21406 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:39:51 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 24076 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 29957 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:40:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 9871 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:40:40 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11748 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:40:43 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10781 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.197.161 - - [07/Feb/2010:06:40:44 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14995 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.93.177 - - [07/Feb/2010:06:40:45 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 72244 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.197.86 - - [07/Feb/2010:06:40:57 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 13103 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.172.253 - - [07/Feb/2010:06:40:58 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12032 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:41:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17798 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:41:22 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 38199 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:41:38 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17484 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.197.86 - - [07/Feb/2010:06:41:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 23264 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:41:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17145 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.42.173 - - [07/Feb/2010:06:41:48 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 23440 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.244.231 - - [07/Feb/2010:06:41:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 29496 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.214.118 - - [07/Feb/2010:06:41:52 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 69694 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:41:56 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14958 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12272 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:42:55 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60345 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.16.163 - - [07/Feb/2010:06:43:03 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 16470 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:43:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21739 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:43:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59288 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:43:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11407 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.42.173 - - [07/Feb/2010:06:43:09 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14459 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:43:31 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 10561 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.93.177 - - [07/Feb/2010:06:43:46 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14947 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:43:50 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 19598 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:43:55 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12090 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.140.41 - - [07/Feb/2010:06:44:05 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11853 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:44:16 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 11612 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.41.44 - - [07/Feb/2010:06:44:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71920 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
67.202.0.47 - - [07/Feb/2010:06:44:22 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14007 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.191.9 - - [07/Feb/2010:06:44:31 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 130288 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.254.111 - - [07/Feb/2010:06:45:01 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 21739 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
204.236.242.36 - - [07/Feb/2010:06:45:26 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 18281 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:45:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59638 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.103.67 - - [07/Feb/2010:06:45:40 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 12372 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:46:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14353 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.54.185 - - [07/Feb/2010:06:46:07 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 27416 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.152.208 - - [07/Feb/2010:06:46:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 22271 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
75.101.197.161 - - [07/Feb/2010:06:46:13 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14548 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"

While we don’t see many duplicate IPs here, let’s analyze the one that has six hits:

174.129.65.79 - - [07/Feb/2010:06:38:41 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 71830 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:39:47 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 60596 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:40:15 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 9871 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:41:38 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 17484 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:45:32 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 59638 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
174.129.65.79 - - [07/Feb/2010:06:46:04 -0500] "GET /xxxxx/xxxxx/xxxxx.html HTTP/1.1" 200 14353 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"

The longest delay between page fetches is 3 minutes, 54 seconds, with a minimum of 28 seconds.
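The gaps can be recomputed from the six timestamps listed above. A minimal sketch, assuming the standard Apache timestamp format shown in the log lines; the function name is just for illustration.

```python
from datetime import datetime

# Timestamps of the six page fetches from 174.129.65.79 listed above.
stamps = [
    "07/Feb/2010:06:38:41 -0500",
    "07/Feb/2010:06:39:47 -0500",
    "07/Feb/2010:06:40:15 -0500",
    "07/Feb/2010:06:41:38 -0500",
    "07/Feb/2010:06:45:32 -0500",
    "07/Feb/2010:06:46:04 -0500",
]

def fetch_gaps(stamps):
    """Return the seconds elapsed between consecutive log timestamps."""
    times = [datetime.strptime(s, "%d/%b/%Y:%H:%M:%S %z") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = fetch_gaps(stamps)
print(gaps)  # [66, 28, 83, 234, 32]
print("shortest:", min(gaps), "seconds; longest:", max(gaps), "seconds")
```

The longest gap of 234 seconds is the 3 minutes, 54 seconds quoted above, still short of the five minutes the bot claims to wait.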

In that same period of time, you can see that they used a number of Amazon EC2 instances. The number of requests from each IP address:

  10655 67.202.0.47
  10454 204.236.242.36
  10353 174.129.103.67
  10343 75.101.254.111
  10295 204.236.197.86
  10128 174.129.65.79
   9908 174.129.191.9
   9883 75.101.214.118
   9835 72.44.54.185
   9833 72.44.42.173
   9769 174.129.136.94
   9718 75.101.197.161
   9290 174.129.106.91
   9063 72.44.48.77
   9017 174.129.152.208
   8850 204.236.212.138
   8712 174.129.93.177
   8423 174.129.140.41
   8415 67.202.41.44
   8302 67.202.16.163
   8116 72.44.57.92
   7923 204.236.245.5
   7633 75.101.219.131
   7519 67.202.48.58
   7510 174.129.72.66
   7429 67.202.2.164
   7356 174.129.155.12
   7335 174.129.172.253
   7036 75.101.214.102
   6998 67.202.42.161
   6835 174.129.159.143
   6109 204.236.244.231
   6002 174.129.127.87
   5961 75.101.168.14
   5841 174.129.84.116
   5201 174.129.163.50
   5114 72.44.49.238
   4744 174.129.153.52
   4654 75.101.241.159
   4615 204.236.241.141
   4585 75.101.179.97
   4463 174.129.61.74
   4387 75.101.179.141
   4379 72.44.56.37
   4332 75.101.187.208
   4169 67.202.56.227
   4106 204.236.211.119
   4075 174.129.93.123
   3722 204.236.242.141
   3332 67.202.11.26
   3276 67.202.0.31
   3097 174.129.171.75
   2360 75.101.234.148
   1837 174.129.136.47
   1689 67.202.56.158
    853 67.202.10.125
     67 75.101.204.87
     14 204.236.212.231
     12 174.129.144.34
      6 174.129.106.64
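The post doesn’t show how the tally above was produced. One way to build a similar per-IP count is sketched here, assuming combined log format where the client IP is the first field; the function name and the shortened sample lines are made up for illustration.

```python
from collections import Counter

# Marker taken from the bot's user-agent string, as used in the grep commands earlier.
BOT_MARKER = "+http://www.seoprofiler.com/bot/"

def hits_per_ip(lines, marker=BOT_MARKER):
    """Count matching requests per client IP, busiest first.

    Assumes the client IP is the first space-separated field on each line.
    """
    counts = Counter()
    for line in lines:
        if marker in line:
            counts[line.split(" ", 1)[0]] += 1
    return counts.most_common()

# Illustrative, shortened log lines; only the first field and the marker matter.
sample = [
    '67.202.0.47 - - [07/Feb/2010:06:41:05 -0500] "GET /a HTTP/1.1" 200 1 "-" "spbot/1.0; +http://www.seoprofiler.com/bot/ "',
    '72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /b HTTP/1.1" 200 1 "-" "spbot/1.0; +http://www.seoprofiler.com/bot/ "',
    '67.202.0.47 - - [07/Feb/2010:06:43:31 -0500] "GET /c HTTP/1.1" 200 1 "-" "spbot/1.0; +http://www.seoprofiler.com/bot/ "',
    '10.0.0.1 - - [07/Feb/2010:06:43:32 -0500] "GET /d HTTP/1.1" 200 1 "-" "Mozilla/5.0"',
]

for ip, count in hits_per_ip(sample):
    print(f"{count:7d} {ip}")
```

For real logs, an open file handle can be passed instead of the sample list, since iterating a file yields its lines.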

Even if we look at only one of the domains that was spidered:

    125 72.44.48.77
    123 174.129.140.41
    112 174.129.65.79
    109 75.101.254.111
    108 174.129.172.253
    104 75.101.197.161
    104 174.129.93.177
    104 174.129.103.67
    102 204.236.197.86
    102 174.129.136.94
    101 67.202.2.164
     99 75.101.214.118
     98 67.202.0.47
     96 67.202.48.58
     95 204.236.212.138
     93 174.129.106.91
     86 67.202.41.44
     85 72.44.54.185
     84 204.236.242.36
     82 75.101.219.131
     82 72.44.42.173
     76 67.202.42.161
     76 174.129.191.9
     75 174.129.152.208
     73 72.44.57.92
     73 67.202.16.163
     71 75.101.168.14
     71 174.129.159.143
     68 204.236.245.5
     68 174.129.72.66
     61 174.129.155.12
     60 204.236.244.231
     60 204.236.211.119
     59 174.129.153.52
     58 72.44.49.238
     54 72.44.56.37
     54 174.129.93.123
     54 174.129.61.74
     51 75.101.179.141
     51 174.129.163.50
     50 204.236.242.141
     47 174.129.127.87
     45 75.101.241.159
     44 75.101.214.102
     43 67.202.56.227
     42 174.129.171.75
     41 67.202.11.26
     40 67.202.0.31
     39 75.101.187.208
     39 204.236.241.141
     36 174.129.84.116
     32 75.101.179.97
     30 75.101.234.148
     22 174.129.136.47
     19 67.202.56.158
     12 67.202.10.125

While the goals stated on their page are admirable, it is clear that they lack some understanding of how EC2 works. Writing code to run across distributed instances is not a simple process, so I can see where handing out spider assignments to nodes could run into problems. But, looking at a single IP address, we can see that their bot probably doesn’t maintain state between fetches, since it fetches robots.txt prior to each URL and then violates their stated policy of waiting at least five minutes between pages of a domain.

72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET / HTTP/1.1" 200 29957 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:40 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 12272 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 16855 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /robots.txt HTTP/1.1" 200 2631 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"
72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 68020 "-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"

Based on the times, I don’t believe they could have spun up a new EC2 instance on the same IP address, which leads me to believe that they are spidering links from the site and requesting robots.txt each time.
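The pattern in the eight lines above can be checked mechanically. This is a rough sketch, assuming combined log format; the regular expression and function name are illustrative, and the sample reuses the request lines for 72.44.48.77 with the paths masked as in the post.

```python
import re

# Client IP, timestamp, method and path from a combined-format log line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*"')

def robots_before_each_page(lines):
    """Return (page fetches, page fetches directly preceded by robots.txt from the same IP)."""
    last_path = {}
    pages = preceded = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, _when, _method, path = m.groups()
        if path != "/robots.txt":
            pages += 1
            if last_path.get(ip) == "/robots.txt":
                preceded += 1
        last_path[ip] = path
    return pages, preceded

UA = '"-" "Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )"'
sample = [
    f'72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET /robots.txt HTTP/1.1" 200 2631 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:40:10 -0500] "GET / HTTP/1.1" 200 29957 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:42:40 -0500] "GET /robots.txt HTTP/1.1" 200 2631 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:42:41 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 12272 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /robots.txt HTTP/1.1" 200 2631 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:49:26 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 16855 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /robots.txt HTTP/1.1" 200 2631 {UA}',
    f'72.44.48.77 - - [07/Feb/2010:06:53:11 -0500] "GET /xxxxxx/xxxxxx/xxxxxx.html HTTP/1.1" 200 68020 {UA}',
]

pages, preceded = robots_before_each_page(sample)
print(f"{preceded} of {pages} page fetches were directly preceded by a robots.txt fetch")
```

A crawler that cached robots.txt per host would show far fewer robots.txt requests than page requests.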

While I believe using cloud services is a good thing, companies like this that abuse them are going to cause problems for other people who adopt the same methods. Amazon’s EC2 instances have already hit numerous anti-spam blacklists due to Amazon’s lax policy or inability to quickly track down spam. While I have resisted the temptation to block EC2 instances for inbound email, this client requested that we block the IP addresses that SEOProfilerBot was coming from, which means that any other search engine that comes along using Amazon’s EC2 will not be able to reach his sites.

Cuill did the same thing to his sites a while back and we altered the robots.txt file, but, that didn’t stop the constant pounding from their spiders that had already fetched the robots.txt.

At some point, Amazon EC2 and other cloud vendors will be firewalled from large portions of the net, limiting the usefulness of writing applications that run on the cloud.


unable to mount root fs on unknown-block(0,0)

Sunday, January 31st, 2010

After building a system for the new backup servers around an Adaptec 31205 controller, I installed a kernel that we’ve tuned in-house, as I always prefer to do.

Upon booting into the kernel I had built, I received:

unable to mount root fs on unknown-block(0,0)

Since the drive size on the array was very large, the Debian Installer automatically created an EFI GUID Partition table, which my kernel was not set up for.

In the kernel’s make menuconfig, go to File systems, then Partition Types, and enable Advanced partition selection. Near the bottom is EFI GUID Partition support. Enable that, recompile your kernel and you should be set.
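Before rebuilding, the kernel .config can be checked for the two options. The sketch below assumes the menu entries correspond to the symbols CONFIG_PARTITION_ADVANCED and CONFIG_EFI_PARTITION (the help text in menuconfig shows the symbol for each entry, so this is worth confirming there); the function name and sample config text are illustrative.

```python
# Symbols assumed to correspond to "Advanced partition selection" and
# "EFI GUID Partition support"; confirm against the menuconfig help text.
REQUIRED = ("CONFIG_PARTITION_ADVANCED", "CONFIG_EFI_PARTITION")

def missing_gpt_options(config_text, required=REQUIRED):
    """Return the required options that are not built in (=y) in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [name for name in required if name not in enabled]

# Illustrative fragment of a .config without GPT support.
sample_config = """\
CONFIG_PARTITION_ADVANCED=y
CONFIG_MSDOS_PARTITION=y
# CONFIG_EFI_PARTITION is not set
"""

print(missing_gpt_options(sample_config))
```

An empty list means both options are enabled; for a real tree, read the .config file from the top of the kernel source directory and pass its contents in.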

One reboot later and voila:

st1:/colobk1# uname -a
Linux st1 2.6.32.7 #1 SMP Fri Jan 29 21:43:32 EST 2010 x86_64 GNU/Linux
st1:/colobk1# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             462M  232M  207M  53% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                   10M   60K   10M   1% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda8              19T  305G   18T   2% /colobk1
/dev/sda5             1.9G   55M  1.8G   3% /home
/dev/sda4             949M  4.2M  945M   1% /tmp
/dev/sda6             2.4G  204M  2.2G   9% /usr
/dev/sda7             9.4G  237M  9.1G   3% /var

Upgraded GFS2 Cluster Tools from 2.2 to 3.0.4

Thursday, December 10th, 2009

With a few words of warning, we upgraded one of our clusters from 2.2 to 3.0.4. While this is normally a seamless project, it needed to be coordinated across both storage nodes in the cluster since the changes from 2.2 to 3.0 in openais were incompatible. Some minor changes to the cluster config file were needed, which resulted in a cleaner file, and a new dependency for rgmanager was added for the upgrade to 3.0.

This meant some downtime while openais was upgraded. Since we run behind a pair of load balancers, we were able to shut down the first filesystem, disconnect it from cman, upgrade one side, shut off the services on the other, bring this side up, bring up services, then upgrade the second node.

This should have worked, and cman on the primary node had no problem, but the secondary node refused to start dlm_controld.

Dec 10 12:29:20 dlm_controld dlm_controld 3.0.4 started
Dec 10 12:29:30 dlm_controld cannot find device /dev/misc/lock_dlm_plock with minor 58

For some odd reason, lock_dlm_plock was created in /dev rather than /dev/misc after the udev upgrade. Moving it into place allowed cman to start on the second node and allowed the cluster to run in non-degraded mode.

Why lock_dlm_plock was in the wrong place on one node and in the correct place on the other node, I’m not sure. I think prior to rgmanager being installed, the init script for cman didn’t stop when dlm couldn’t be loaded, and since the /dev/misc folder hadn’t been created, it created the lock file in /dev. Subsequent restarts of the machine have resulted in it coming up without an issue, so, it appears to be an issue somewhere in one of the startup scripts.


No ESI processing, first char not ‘<’

Tuesday, December 1st, 2009

After installing Varnish 2.0.5 on a machine, ESI Includes didn’t work. When using varnishlog, the first error that occurred when debugging was:

No ESI processing, first char not '<'

   12 SessionClose - timeout
   12 StatSess     - 124.177.181.149 50662 4 0 0 0 0 0 0 0
   12 SessionOpen  c 68.212.183.136 60087 66.244.147.44:80
   12 ReqStart     c 68.212.183.136 60087 409391565
   12 RxRequest    c GET
   12 RxURL        c /esi.html
   12 RxProtocol   c HTTP/1.1
   12 RxHeader     c Host: cd34.colocdn.com
   12 RxHeader     c User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4
   12 RxHeader     c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
   12 RxHeader     c Accept-Language: en-us,en;q=0.5
   12 RxHeader     c Accept-Encoding: gzip,deflate
   12 RxHeader     c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   12 RxHeader     c Keep-Alive: 115
   12 RxHeader     c Connection: keep-alive
   12 RxHeader     c X-lori-time-1: 1259718658980
   12 RxHeader     c Cache-Control: max-age=0
   12 VCL_call     c recv
   12 VCL_return   c lookup
   12 VCL_call     c hash
   12 VCL_return   c hash
   12 VCL_call     c miss
   12 VCL_return   c fetch
   12 Backend      c 14 cd34_com cd34_com
   12 ObjProtocol  c HTTP/1.1
   12 ObjStatus    c 200
   12 ObjResponse  c OK
   12 ObjHeader    c Date: Wed, 02 Dec 2009 01:50:59 GMT
   12 ObjHeader    c Server: Apache
   12 ObjHeader    c Vary: Accept-Encoding
   12 ObjHeader    c Content-Encoding: gzip
   12 ObjHeader    c Content-Type: text/html
   12 TTL          c 409391565 RFC 120 1259718659 0 0 0 0
   12 VCL_call     c fetch
   12 TTL          c 409391565 VCL 43200 1259718659
   12 ESI_xmlerror c No ESI processing, first char not '<'
   12 TTL          c 409391565 VCL 0 1259718659
   12 VCL_info     c XID 409391565: obj.prefetch (-30) less than ttl (-1), ignored.
   12 VCL_return   c deliver
   12 Length       c 68
   12 VCL_call     c deliver
   12 VCL_return   c deliver
   12 TxProtocol   c HTTP/1.1
   12 TxStatus     c 200
   12 TxResponse   c OK
   12 TxHeader     c Server: Apache
   12 TxHeader     c Vary: Accept-Encoding
   12 TxHeader     c Content-Encoding: gzip
   12 TxHeader     c Content-Type: text/html
   12 TxHeader     c Content-Length: 68
   12 TxHeader     c Date: Wed, 02 Dec 2009 01:50:59 GMT
   12 TxHeader     c X-Varnish: 409391565
   12 TxHeader     c Age: 0
   12 TxHeader     c Via: 1.1 varnish
   12 TxHeader     c Connection: keep-alive
   12 TxHeader     c X-Cache: MISS
   12 ReqEnd       c 409391565 1259718659.088263512 1259718659.127703667 0.000059366 0.039401770 0.000038385
   12 Debug        c "herding"

ESI received significant performance enhancements in 2.0.4 and 2.0.5, so it seemed something was incompatible. Downgrading to 2.0.3 and using the VCL from another machine still resulted in ESI not working.

In this case, mod_deflate was running on the backend which was causing the issue. However, in reading the source code, it appears that message could also occur if your ESI include wasn’t handing back properly formed XML/HTML content. If your include doesn’t contain valid content and is only returning a small snippet, you might consider passing:

-p esi_syntax=0x1

on the command line that starts Varnish.

The changes in Varnish address the issue of ESI being enabled on binary content. Since the first character isn’t a < in almost all binary files (jpg, mpg, gif) and isn’t the start of most .css/.js files, Varnish doesn’t need to spend extra time checking those files for includes. While you can and should selectively enable ESI processing, this is just an added safeguard and a performance boost to compensate for VCL that might have an esi directive on static/binary content.
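The same first-character test explains why mod_deflate on the backend tripped it: a gzip-compressed body no longer starts with <. The following is a simplified stand-in for that check, not Varnish’s actual code; the function name and the /inc.html include path are made up for illustration.

```python
import gzip

def looks_like_markup(body: bytes) -> bool:
    """Simplified stand-in for the check described above: is the first byte '<'?"""
    return body[:1] == b"<"

# A small page containing an ESI include (the include path is hypothetical).
html = b'<html><body><esi:include src="/inc.html"/></body></html>'

# What the backend hands back when it compresses the response.
compressed = gzip.compress(html)

print(looks_like_markup(html))        # True
print(looks_like_markup(compressed))  # False
print(compressed[:2].hex())           # 1f8b, the gzip magic bytes
```

With the backend sending uncompressed HTML, the body starts with < again and ESI processing can proceed.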

Since Varnish 2.0.3 now worked properly with the new machine, we upgraded to Varnish 2.0.5 which introduced a very odd issue:

[Tue Dec 01 20:58:11 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.htmlt
[Tue Dec 01 20:58:13 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html7
[Tue Dec 01 20:58:24 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xfa
[Tue Dec 01 20:59:01 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xb5
[Tue Dec 01 20:59:06 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xe7
[Tue Dec 01 20:59:07 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\xd4
[Tue Dec 01 20:59:08 2009] [error] [client 66.244.147.40] File does not exist: /gfs/www/cd/cd34.com/index.html\x1c

This generated 404s on the piece of the page that contained the ESI include. Downgrading to 2.0.4 fixed the issue and the issue appears to already be fixed in Trunk. Varnish Ticket #585

Varnish 2.0.4 and mod_deflate disabled addressed the two issues that prevented ESI from working correctly on this new installation.


piix versus ahci

Sunday, September 13th, 2009

While working with some new motherboards, I decided to do some testing. Since most motherboards ship with AHCI disabled and we needed hotplug for a new project, I wanted to see whether there would be a performance hit from using the AHCI driver rather than the piix driver.

To keep the benchmark stable, the same machine was tested twice with no changes other than switching AHCI on in the BIOS. Granted, this is an older motherboard in a machine used for testing and development, but the results on other motherboards should be similar.

The differences in input/output and seeks are negligible. The file create and delete results are much improved under AHCI, in both the sequential and random tests, but the read results are virtually unchanged. This is probably a result of Native Command Queuing (NCQ), which is enabled in AHCI and isn’t present in the piix driver. Since the firmware on the disk can reorder requests based on the rotational position of the data it needs to access, there are some benefits.

Since it doesn’t appear to be detrimental to enable AHCI, and it does increase performance of two particular portions of the benchmark that may not really be exercised in a normal webserver environment, if you have the ability to run your hardware in AHCI mode rather than the native piix mode, I would suggest using AHCI.

The command line used: /usr/sbin/bonnie -n 384

piix results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kvm              4G   439  99 52686  15 27168   5  2039  99 62782   4 228.9   3
Latency             48125us    1645ms    1545ms   16144us     137ms     863ms
Version  1.96       ------Sequential Create------ --------Random Create--------
kvm                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                384 17299  43 379058  99   777   1 24781  49 522877 100   579   1
Latency              1773ms     169us   18519ms    1403ms      14us   14935ms
1.96,1.96,kvm,1,1252886317,4G,,439,99,52686,15,27168,5,2039,99,62782,4,228.9,3,384,,,,,17299,43,379058,99,777,1,24781,49,522877,100,579,1,48125us,1645ms,1545ms,16144us,137ms,863ms,1773ms,169us,18519ms,1403ms,14us,14935ms

AHCI results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kvm              4G   430  98 52639  14 27533   5  2066  99 62991   4 224.5   1
Latency             50870us    1532ms    1655ms   10203us   21423us     953ms
Version  1.96       ------Sequential Create------ --------Random Create--------
kvm                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                384 28173  56 373407  98  1645   2 31110  61 522376  99  1026   1
Latency              1363ms     157us   12877ms    1216ms      61us   11397ms
1.96,1.96,kvm,1,1252887577,4G,,430,98,52639,14,27533,5,2066,99,62991,4,224.5,1,384,,,,,28173,56,373407,98,1645,2,31110,61,522376,99,1026,1,50870us,1532ms,1655ms,10203us,21423us,953ms,1363ms,157us,12877ms,1216ms,61us,11397ms
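To put numbers on the difference, the file create/read/delete rates from the two result tables above can be compared directly. A small sketch; the dictionary keys are just labels chosen here, and the values are the per-second figures from the tables.

```python
# Files per second from the "Sequential Create" and "Random Create" sections above.
piix = {
    "seq create": 17299, "seq read": 379058, "seq delete": 777,
    "rand create": 24781, "rand read": 522877, "rand delete": 579,
}
ahci = {
    "seq create": 28173, "seq read": 373407, "seq delete": 1645,
    "rand create": 31110, "rand read": 522376, "rand delete": 1026,
}

def percent_change(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100

changes = {name: percent_change(piix[name], ahci[name]) for name in piix}

for name, change in changes.items():
    print(f"{name:12s} {piix[name]:7d} -> {ahci[name]:7d}  {change:+6.1f}%")
```

Creates and deletes improve by roughly 25% to 112% under AHCI, while both read results move by less than 2%.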

Output from lspci -vvnn with the piix driver selected:

00:1f.2 IDE interface [0101]: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller [8086:2820] (rev 02) (prog-if 8a [Master SecP PriP])
	Subsystem: Super Micro Computer Inc Device [15d9:8780]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

Output from lspci -vvnn with the ahci driver selected:

00:1f.2 SATA controller [0106]: Intel Corporation 82801HB (ICH8) 4 port SATA AHCI Controller [8086:2824] (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: Super Micro Computer Inc Device [15d9:8780]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

	Kernel driver in use: ahci

Technical Specs:

Debian/Squeeze/Sid (Testing)
Supermicro P4SBE Motherboard
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz


Google Voice

Friday, July 10th, 2009

A few years ago, a unique phone service called Grand Central was purchased by Google. As with most acquisitions that Google has made, the service was closed to new signups: existing clients maintained their current service level, but new clients weren’t added. Much like Picasa or Postini, you knew Google was going to take the service, twist it around, make it better and change the price model. Most of the other services that Google purchased were quickly revamped, branded and released. This wasn’t the case with Grand Central. Google announced Google Voice and allowed you to submit your email address to get put on the waiting list. After what seemed like many years, and after people on the Internet had started getting invites on June 26, 2009, I was pleasantly surprised when I opened up my email to see a notice from Google. Since I was somewhat familiar with Grand Central’s offering, I was excited to see what Google had done.

Voicemail becomes almost as easy to use as email. You can listen to voicemails, read them (if you have transcription turned on) and forward the messages to other email addresses. Once you have signed into Google Voice, you are presented with the Inbox:

[screenshot: inbox]

On the left menu, we are presented with a special inbox for voicemail and a number of other inboxes including SMS, Recorded, Placed, Received and Missed Calls.

[screenshot: voicemail]

If you send an SMS message to your Google Voice number, it is recorded in the Inbox and the SMS inbox, and forwarded to any phone you have tagged as able to accept SMS. You can also send SMS messages from within Google Voice by clicking the SMS button.

[screenshot: sms]

If a number is marked as spam, future calls from that number will be sent to voicemail immediately without ringing your numbers. You can unblock a number marked as spam later.

In the settings, you can set up how voicemail notifications should occur. You can choose to have the voicemail notification emailed to you and, optionally, have your mobile phone paged through SMS.

settings

Your email message will include a transcribed copy of the message. In several test calls, their transcription was fairly accurate. During playback, a green underline is displayed under each word as you listen to the message.

email

The Phones menu allows you to set up multiple phone numbers. When someone calls your Google Voice number, all of the phone numbers listed here ring at the same time. You can answer any of the phones and the first one answered receives the call.

phones

By default, when answering an incoming call, you receive a notification that Google Voice is calling along with the name of the caller. You can enter 1 to accept the call, 2 to send it to voicemail, 3 to send it to voicemail and listen, or 4 to accept the call and record it. When a call is recorded, a brief notification at the beginning of the call tells both sides that the call is being recorded. The recorded call can be accessed in the Recorded inbox. When someone calls your Google Voice number, they are told that Google Voice has answered the call and are asked for their name, which is played to you in presentation mode.

recorded

When you add a phone, Google places a call to the number you’ve added and asks for a two digit code to be keyed in.

verifyphone

There are also advanced settings:

addphoneadvanced

You can set up Call Groups and have different behaviors depending on what group the caller is in. In this case, Friends are put through immediately when the phone is answered, without me having the option to screen the call. The caller hears ringing while you are being located or are listening to the menu options during the incoming call.

groups

Once in groups, you can set which phones will be rung, define a special greeting and whether you want to use call presentation:

friends

Of course, you can edit your contact lists and change what group each contact is in. By default, Google Voice has already imported your Gmail contact list. There are several other import methods supported, so, importing your contact list should be easy.

contacts

Another nice feature is the Call Widget. This is a method for placing an icon on your website where a potential caller can click the graphic, enter their phone number and hit connect. Google then calls that number, establishes the connection, then proceeds to call your number. Your number is hidden within an encoded string making this a somewhat effective method for accepting callers without giving out your number.

widget1widget2widget3

The above 3 screens show the widget on a page, entering a name and phone number, and connecting the call. When the name is entered, Google uses text to speech to announce the caller. If you enter a two-word name, e.g. Bob Smith, the nature of the URL encoding shows through and the caller is announced as Bob plus Smith.
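The “Bob plus Smith” announcement is consistent with form-style URL encoding, where a space is written as a plus sign. The short Python sketch below shows that encoding. It assumes the widget submits the name as a form-encoded value; I haven’t confirmed how the widget actually builds its request.

```python
from urllib.parse import quote_plus, unquote_plus

# Form-style URL encoding writes a space as "+".
caller_name = "Bob Smith"
encoded = quote_plus(caller_name)
print(encoded)  # Bob+Smith

# Decoding restores the space; reading the encoded string aloud
# without decoding it first is what would produce "Bob plus Smith".
decoded = unquote_plus(encoded)
print(decoded)  # Bob Smith
```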

The last screen in the settings is for Billing. The prices for International calls are relatively aggressive compared to Vonage.

billing

My initial impression is quite positive. Phone calls connected through the service are extremely quick and sound great. When you want to change a message prompt, Google Voice calls your phone so that you don’t need to depend on your computer’s microphone, which results in a relatively good quality recording.

Irony? The Google Voice widget is a Flash widget and I haven’t been able to get Flash to install in Chrome. I haven’t been able to install Delicious for Chrome either, and of course, the Google Toolbar doesn’t work. The web interface for Google Voice is very AJAX intensive, loads very quickly and is very responsive. One annoyance: I prefer keyboard shortcuts over the trackpad on my laptop, and Google Voice captures some of the shortcuts I would normally use to switch tabs.

If you don’t have Google Voice and are looking for a good way to have a single phone number that rings your house, mobile and work numbers and allows some handy features, you might want to try applying for an Invite at Google Voice.

Google’s App Engine goof

Friday, July 3rd, 2009

While Google’s App Engine is a well-planned service and works incredibly well for what it does, sometimes things break due to resource limits, etc.

The App Engine platform itself is still running, so this appears to be an issue with this particular application’s committed resources. The App Gallery has exceeded its memory quota.

Google App Engine App Gallery

Data Center Hardware Upgrades

Wednesday, July 1st, 2009

Many hosting companies operate on razor-thin margins trying to capture as much market share as possible. Over the long haul, many $99/month dedicated servers can be absorbed into your existing bandwidth commitments without any incremental cost. Early on, one dedicated hosting provider dumped servers on the market for $99 with 700GB of transfer per month. At the time, they were undercutting other hosting providers and it was deemed impossible that they could fulfill the hosting world’s needs. In reality, they knew that their average client used 2.5GB of transfer per month, so what difference did it make if they handed their average client 700GB? With an ‘enormous’ cap, the average consumer wasn’t scared of overage charges, but companies that knew they would exceed that cap were forced elsewhere by the penalty rate structure. That hosting provider cherry-picked the clients that would make it the most money, even though it was a budget provider.

Later, they offered upgrades to the hardware and bandwidth commitments leaving many of those initial customers stuck on older hardware. There was no upgrade path to get from one machine to another except for the client moving the data themselves. The hosting company was only responsible for making sure the machine had power and network. However, there needs to be an upgrade path and there needs to be enough margin in the equation to facilitate hardware and network upgrades over time.

At some point the useful life of a machine is exceeded and one is faced with upgrading the machine, or, replacing components if the machine fails. Typically, CPU fans and hard drives will fail since they are moving parts. Other times, the client installs applications that require more CPU horsepower or runs into a situation where a machine needs more RAM. Depending on the age of the machine, those upgrade costs might exceed installing a new chassis.

With today’s hardware replacing yesterday’s hardware, there is often quite a disparity between the computing power of the existing machine and the replacement. Virtualization can allow you to put in a powerful machine and replace multiple older machines, sometimes at a much lower TCO than maintaining the older machines.

That conversion isn’t without its issues though. If you are measuring bandwidth, you can no longer use the SNMP statistics from your switch; you must use something that counts the flows. Device naming becomes an issue because you need to identify both the virtual machine and the physical chassis it runs on in case there is a hardware issue. Clients don’t always understand virtualization and want a ‘dedicated’ server, even though their CPU core can be pinned to their exclusive use, and if they need extra capacity and it is available on the chassis, they can utilize it. On the plus side, virtualization of a data center can significantly decrease power consumption. An older Pentium 4/3.0GHz CPU can easily reside on a single core of a 2.4GHz Xeon with room to spare. Considering the older infrastructure, you could easily fit eight Pentium 4/3.0GHz machines with 2GB RAM each on a single dual-CPU quad-core Xeon with 16GB RAM. An 8:1 consolidation of the lower-utilization machines can result in considerable density increases. The replacement might use roughly one sixth the power of the previous eight machines, so you can still increase the cores per rack, which can increase profitability. In a mixed infrastructure where you might be replacing single and dual core machines, you might lose some of the economies of scale, but the consolidation will still ultimately increase core density.
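The 8:1 figure above falls out of simple arithmetic: a host can carry as many guests as its cores or its RAM allow, whichever runs out first. Here is a rough Python sketch of that calculation. The function names and the 40-machine example are made up for illustration, and it ignores disk I/O, hypervisor overhead and everything else a real sizing exercise would consider.

```python
def vms_per_host(host_cores, host_ram_gb, vm_ram_gb, vms_per_core=1.0):
    """Return how many guests fit on one host, limited by cores or RAM."""
    by_cpu = int(host_cores * vms_per_core)
    by_ram = host_ram_gb // vm_ram_gb
    return min(by_cpu, by_ram)


def hosts_needed(old_machines, per_host):
    """Round up: a partly filled host is still a whole chassis."""
    return -(-old_machines // per_host)


# Figures from the paragraph above: dual quad-core Xeon (8 cores),
# 16GB RAM, each old Pentium 4 machine carrying 2GB RAM.
per_host = vms_per_host(host_cores=8, host_ram_gb=16, vm_ram_gb=2)
print(per_host)                    # 8, i.e. the 8:1 consolidation
print(hosts_needed(40, per_host))  # 5 hosts for a hypothetical 40 machines
```

Raising `vms_per_core` above 1.0 only helps when RAM keeps up: with 16GB of RAM and 2GB guests, the host is RAM-limited at eight guests no matter how far the cores are oversubscribed.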

Virtualization options include Xen, Citrix, KVM, Virtuozzo and VMware.

Intel has an interesting blog post about Optimizing Costs within the Data Center that talks about a 10:1 reduction in hardware when replacing single-core machines with virtualized instances.

In addition to the cost and power savings, they saw a processor savings as well. If you’re selling dedicated servers, it might be difficult to give someone less than a whole processor if they had been sold a single processor, but, in a corporate environment, as long as the machine has enough CPU horsepower to do its job, more than one virtual machine can be assigned per core. For example, you can install ten Virtual Machines on an eight core machine and probably still have excess CPU.

However, applications are taking more CPU time than they used to, so, even if you are able to keep a 4:1 ratio, you’re still ahead of the game.

User Interface Design – Presenting Data Intelligently

Friday, June 26th, 2009

User Interface Design is about usability and presentation. When you have data, presenting it in a usable manner that is easy to understand is much more valuable than putting the data onscreen and letting the user try to decipher it. As mentioned in my previous rant about User Interface Design, I have been faced with reworking dozens of pages to fix typos, spelling errors, bad grammar and poor interface design.

In the two pictures that follow, you’ll see the original report and the modified report. Very little HTML markup has been used in either case as the site needed to be functional and was slated to have a web designer make it look nice. Seven years later, the site still hasn’t had the facelift it needed.

In the first graphic, you can see a legend that details what the status updates for each step mean. We have information repeated that seems somewhat unnecessary. There isn’t any real reason to repeat the Transaction ID, Device Name, or Job Type. It is questionable whether Job Type is a very descriptive title.

oldtaskstatus

In the second graphic, we’ve consolidated the data that was duplicated in the prior graphic. The Task ID, Final Status, Task and Device are given once, and the individual status checks are given below with the status of each step enumerated. The task completion color is given at the top of the task rather than the bottom which I believe is more logical. The mind doesn’t want to look at the data that led to a status. You want to quickly look for tasks that failed and then look for the detail. Since you read top to bottom, searching for a red status would allow you to look at the results below.

newtaskstatus

In both cases, the important data is bold. I believe that is important as scanning the page draws the eye to the important data and the less important data can be analyzed if there is a need.

The two pictures that follow show how feng-gui.com believes the eye will react when looking at that page. Based on the 2nd screenshot, I believe we’ve reached the goal of making the page present the data clearly and concisely.

oldtaskheatmap

newtaskheatmap

One page down, two hundred to go. :)

User Interface Design

Wednesday, June 24th, 2009

Programmers are not designers. Technical people should not design User Interfaces.

* 810 source files
* 90658 lines of code
* 10213 lines of html

For an internal project handed to a series of programmers over the years without enough oversight, it is a mass of undocumented code in multiple programming styles. PHP allowed lazy programming and Smarty didn’t have some of the finesse required, so the User Interface suffered. It was functional but confusing to anyone who hadn’t worked intimately with the interface or been walked through it.

The truest statement is that it is easier for me to do things through the MySQL command line than through the application. Working that way tends to introduce typos, and that has altered SQL practices here.

update table set value=123 where othervalue=246;

could have an accidental typo of

update table set value=123 where othervalue-=246;

which would have completely unintended consequences. One typo altered the DNS entries for 48000 records. Shortly after that typo, it became ingrained in company policy that I never wanted to see a query written like that executed on the command line, regardless of how simple the command.

Even within code, the above command would be entered as:

update table set value=123 where othervalue in (246);

This prevented a number of potential typos. Limit clauses were also required on deletions to make sure things didn’t go too haywire.

With Python, indenting is mandatory, which results in multiple programmers’ code looking similar and being easier to troubleshoot. By using SQLAlchemy, which enforces bind variables when talking with the database engine, we’ve eliminated the potential for a typo updating too many records. Cascade deletes are enforced by SQLAlchemy even when running on top of MyISAM, which does not enforce foreign keys itself. With MVC, our data model is much better defined and we’re not tied down to remembering the relationship between two tables and possible dependencies. Conversion from the existing MySQL database to a DeclarativeBase model hasn’t been without issues, but a simple Python program allowed the generation of a simple model that took care of most of them. Hand tweaking the database model while developing the application has allowed for quite a bit of insight into issues that had been worked around rather than fixed with adjustments to the database.

Fundamental design issues in the database structure were worked around with code rather than fixed. Data that should have been retained was not, and relationships between tables were defined in code rather than in the database, leading to a painful conversion.
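To make the bind variable idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than SQLAlchemy or MySQL, purely so it runs without any dependencies. The `records` table and the `guarded_update` helper are hypothetical, not part of our application. The values are passed separately from the SQL text, and the change is rolled back if it touches more rows than expected.

```python
import sqlite3


def guarded_update(conn, new_value, match_value, max_rows):
    """Run the UPDATE with bound parameters; roll back if it touches too many rows."""
    cur = conn.execute(
        "UPDATE records SET value = ? WHERE othervalue IN (?)",
        (new_value, match_value),
    )
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(
            f"refusing to change {cur.rowcount} rows (limit {max_rows})"
        )
    conn.commit()
    return cur.rowcount


# Hypothetical table and data, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, value INTEGER, othervalue INTEGER)"
)
conn.executemany(
    "INSERT INTO records (value, othervalue) VALUES (?, ?)",
    [(1, 246), (2, 246), (3, 999)],
)
conn.commit()

changed = guarded_update(conn, new_value=123, match_value=246, max_rows=5)
print(changed)  # 2
```

The row-count check plays the same role as the limit clauses mentioned earlier: a mistake in the where clause gets stopped before it is committed rather than discovered afterwards.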

When it was decided to rewrite the application in Python using TurboGears, I wasn’t that familiar with the codebase nor the user interface. Initially it was envisioned that the templates would be copied and the backend engine would be written to power those templates. After a few hours running through the application, and attempting the conversion on a number of templates, I realized the application was functional but it was extremely difficult to use in its current state. So much for having a programmer design an interface.

Some functionality from the existing system was needed, so I peered into the codebase and was unprepared for what I found. At this point it became evident that a non-programmer had designed the interface. While Smarty was a decent template language, it was not a form tool, so methods were designed to give a consistent user experience when dealing with error handling. A single PHP file was responsible for display, form submission, validation and writing to the database for each ‘page’ in the application. The code inside should have been straightforward.

* Set up default CSS classes for each form field for an ‘ok’ result
* Validate any passed values and set the CSS class as ‘error’ for any value that fails validation
* Insert/Update the record if the validation passes
* Display the page

Some validation takes place numerous times throughout the application, and, for some reason one of the ‘coders’ decided that copy and paste of another function that used that same validation code was better than writing a function to do the validation. Of course when that validation method needed to be changed, it needed to be changed in eight places.
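The four steps above, with the validation pulled into one shared function instead of being pasted into each page, might look something like this in Python. This is only a sketch of the pattern, not the original PHP: the field names, `validate_hostname`, `VALIDATORS` and `handle_form` are all made up for illustration.

```python
import re


def validate_hostname(value):
    """Shared validator: written once, reused by every page that needs it."""
    return bool(re.fullmatch(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*", value or ""))


# Hypothetical form fields mapped to the check each one must pass.
VALIDATORS = {
    "hostname": validate_hostname,
    "owner": lambda v: bool(v and v.strip()),
}


def handle_form(submitted, save):
    # 1. Default every field to the 'ok' CSS class.
    css = {field: "ok" for field in VALIDATORS}
    # 2. Flag any value that fails validation.
    for field, check in VALIDATORS.items():
        if not check(submitted.get(field)):
            css[field] = "error"
    # 3. Insert/update only when everything passed.
    saved = all(c == "ok" for c in css.values())
    if saved:
        save(submitted)
    # 4. Hand the classes back for the template to display.
    return {"css": css, "saved": saved}


stored = []
good = handle_form({"hostname": "web01.example.com", "owner": "ops"}, stored.append)
print(good["saved"])  # True
bad = handle_form({"hostname": "bad host!", "owner": "ops"}, stored.append)
print(bad["css"]["hostname"])  # error
```

When a validation rule changes, it changes in `validate_hostname` once rather than in eight places.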

So, what should have been somewhat simple has changed considerably:

* Evaluate each page
* Redesign each page to make the process understandable
* Adjust terminology to make it understandable to the application’s users
* Modify the database model
* Rewrite the form and validation

A process that should have been simple has turned into quite a bit more work than anticipated. Basically, development boils down to looking at the page, figuring out what it should be, pushing the buttons to see what they do and rewriting from scratch.

TurboGears has added a considerable amount of efficiency to the process. One page that dealt with editing a page of information was reduced from 117 lines of code to 12 lines of code. Since TurboGears uses ToscaWidgets and Formencode, validation and form presentation are removed from the code, resulting in a controller that contains only the code that modifies the tables in the database with validated input. Since Formencode already has 95% of the validators that are needed for this project, we can rest assured that someone else has done the work to make sure that a field will be properly validated. Other validation methods can be maintained and self-tested locally, but defined in such a manner that they are reused throughout the application rather than being cut and pasted into each model that is validating data. In addition, bugs should be much less frequent as a result of a much-reduced codebase.

Due to the MVC framework and the libraries selected by the developers at TurboGears, I wouldn’t be surprised if the new codebase is 10%-15% the size of the existing application with greater functionality. The code should be more maintainable as Python enforces some structure, which will increase readability.

While I am not a designer, even using ToscaWidgets and makeform, the interface is much more consistent. Picking the right words, adding the appropriate help text to the fields and making sure things work as expected has resulted in a much cleaner, understandable interface.

While there are some aspects of ToscaWidgets that are a little too structured for some pages, our current strategy is to develop the pages using ToscaWidgets or makeform to make things as clear as possible making notes to overload the Widget class for our special forms at a later date.

While it hasn’t been a seamless transition, it did provide a good opportunity to rework the site and see a number of the problems that the application has had for a long time.
