Social Gaming Design Requirements

November 17th, 2009

Over the past few years, social gaming has become very popular. Many of the games don’t encompass the elements I believe should be present to make your game a larger success.

* Easy to learn. A game that takes minutes to learn the simple mechanics will attract clients. Design the game to have a moderate progression that can be sped up with the addition of virtual cash. Alternatively, the game can be a blitz version of a full-length game that can be purchased.

* Visibly show friend’s scores on the playscreen. Having a list of the top 10 Friends ordered to make it plainly visible where the player is compared to friends is a must.

* Long Tail Game. A game that is very engaging at the start, but, requires less time to maintain as time goes on. This can be done through game or item upgrades that take longer to obtain as the level increases. A penalty for abandoning their empire — even something as fatal as making the player start over after seven days of inactivity is enough to keep a somewhat casual user engaged. Don’t overdo it as someone that loses everything is unlikely to return. 90% of your revenue is earned within the first week of the player joining the game. However, in the event your game has multiple rounds, a player might play for free, learn some of the strategy and then make a purchase for the second round. A player that doesn’t understand the rules early on and makes mistakes also has a much higher chance of purchasing in-game currency now that they’ve figured out the game.

* Friend linking. Prevent progression without spending virtual cash or recruiting more friends. Limiting gameplay to a subset of the entire game or giving bonuses based on the number of friends recruited allows new players to be brought in which bring more potential income. Those new players have the same limits and must grow their pyramid in order to advance. Encouraging social play grows the game much more quickly. If you allow a solo player the ability to advance at an increased cost, you might entice them enough to invite friends.

* Make it easy to invite friends. Some games require you to link your real account with ‘friends’ in order to grow your social circle. While this appears to be beneficial, your real goal is to bring in the total number of users. Some people aren’t comfortable adding friends just to grow their social circle and will remove them. If you have a number of games, it might be advantageous to require the friend connection so that you can see other games that your ‘friends’ are playing. The downside to this is when a player cleans up their friend list and removes the friends they added specifically for the game.

* Notifications. Letting a user’s friends know that they have gained a level or gained some special item or promotion instills a sense of competitiveness when someone obtains a rare loot item or achievement while the friend hasn’t been playing.

While there are other aspects to Social Gaming, designing your game with these six points in mind will enable you to monetize your application more easily. Unless you’re building the game purely as a hobby, your goal is to make money by engaging users to purchase the in-game virtual currency. It’s a numbers game. A small percentage of your users will actually pay to play and you need to cater your game to increase that percentage as much as possible.

When you first deploy your game, you will need to know many things about your visitors. You want to know how many people hit the front page, how many hit the join page and where they came from. You want to track the number of people that actually sign up, how much activity each one has, and which people convert to a paid subscription. Track your adbuys, track views/impressions, clicks, and click to signup/click to paid subscription. Understand those numbers and learn how to manipulate them by changing your signup process, modifying the game or adjusting how your in-game currency is used.

Writing the game is a minor piece of the battle. Building marketshare without thinking about revenue is a fairly fast way to go into debt and leave yourself in a situation where you quickly grab any VC/Angel money and give up too much of the company. Think about how you will monetize it and start charging early on. If your clients are used to a ‘free to play’ environment and you suddenly switch, many of your clients will abandon the game.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Why do you use an Object Relational Mapping (ORM) System in Development?

October 12th, 2009

Here’s a programmer that is saying goodbye to ORMs at Hatful of Hollow.

And another site offering a tutorial of sorts dealing with ORMs Why should you use an ORM.

While both have their points, both have missed a fundamental benefit that an ORM hands you.

Most of my development is in Pylons. Django’s ORM and template language can do the same thing. A programmer that has used PHP/Smarty to develop large scale systems will likely resist ORMs. After working with a team to develop 90k+ lines of PHP/Smarty over a six year period, making the shift required a paradigm shift.

Let’s consider the following structure. We have a cp_ticket table and a cp_ticket_detail table. A Ticket can have multiple detail records. The output we wish to have is:

ticket id, ticket header information
         ticket detail line
         ticket detail line #2
ticket id, ticket header information
         ticket detail line
         ticket detail line #2
         ticket detail line #3
ticket id, ticket header information
         ticket detail line
         ticket detail line #2

Our model:

class cp_ticket(DeclarativeBase):
    __tablename__ = 'cp_ticket'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    priority = Column(mysql.MSEnum('1','2','3','4','5'), default = '3')

    ticket_detail = relation('cp_ticket_detail', order_by='cp_ticket_detail.ticket_detail_id')

class cp_ticket_detail(DeclarativeBase):
    __tablename__ = 'cp_ticket_detail'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), ForeignKey('cp_ticket.ticket_id'), default = '0')
    ticket_detail_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    stamp = Column(mysql.MSTimeStamp, PassiveDefault('CURRENT_TIMESTAMP'))
    detail = Column(mysql.MSLongText, default = '')

Our query to pass to our template:

        tickets = meta.Session.query(cp_ticket).filter(cp_ticket.client_id==1).all()

Compared with the query as you would write it without an ORM:

select * from cp_ticket,cp_ticket_detail where client_id=1 and cp_ticket.ticket_id=cp_ticket_detail.ticket_id;

Both are doing the same fundamental thing, but, the ORM maps the results almost identical to the way we want to display the data. This makes template design easy.

Using Mako, we use the following code to display the results:

<table border="1">
 <tr><th>Ticket ID</th><th>Status</th><th>Detail</th></tr>
%for ticket in tmpl_context.tickets:
  <tr>
    <td><strong>${ticket.ticket_id}</strong></td>
    <td><strong>${ticket.priority}</strong></td>
  </tr>
  %for detail in ticket.ticket_detail:
  <tr>
    <td></td>
    <td>${detail.stamp}</td>
    <td>${detail.detail}</td>
  </tr>
  % endfor
% endfor
</table>

To do the same thing without using an ORM, you need to revert to a control break structure similar to the following:

current_ticket=0
for ticket in tickets:
  if (current_ticket != ticket.ticket_id):
    #new row, print the header
    print "<tr><td>first piece</td></tr>"
    current_ticket = ticket.ticket_id
  # print our detail row
  print "<tr><td></td><td>stamp and detail</td></tr>"

Control Break structures require you to be able to set a variable within your template language. Some template languages don’t allow that. If your template language (in any language) can’t do variable assignments in the template, guess where your html generation logic needs to go?

With an ORM, the template contains your display logic. Your webmaster/design team can modify the template without having to modify html contained within your code. The loops are simple to understand and designers usually have little problem avoiding the lines that start with %.

Sure, you could wrap much of this logic in your template to do the control-break structure, but, as you get more complex data, deciding how to display the data requires a define or some other functionality.

An ORM adds some insulation to the process, but, the result is a much easier page structure when displaying related data. Granted there are some performance hits and SQLAlchemy appears to create some queries that are not optimal, unless there is a tremendous performance hit, I think the benefits of the ORM for developing a web application are tremendous.

Once you move into an environment where you are dealing with multiple developers, having a defined schema with comments is much easier than using reflection to figure out what the meaning of a status field as enum(‘U’,'A’,'P’,'C’,'R’,'S’).

However, as the original poster mentions, you can do raw SQL within SQLAlchemy and do all of your work with reflection as he has done with his ORM^H^H^H, abstraction. If he’s still using SQLAlchemy, he can selectively decide when to use it and when to avoid it.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Converting to a Varnish CDN with WordPress

October 11th, 2009

While working with Varnish I decided to try an experiment. I knew that Varnish could assist sites, but, it has never been easy to run Varnish on a shared virtual or clustered virtual host. VPS or Dedicated servers are no problem because you can do some configuration. However, in this case, I wanted to see if we could use Varnish to emulate a CDN, and if so, how difficult would it be for wordpress.

As it turns out, WordPress has a particular capability built in that handles media uploads. In the admin, under Settings, Miscellaneous, there are two values. One that asks where uploads should be stored. That path is a relative path under your blog’s home directory. The second is the URL that points to that path. In most cases you need to leave this blank, but, we can use that to point the URL for images to use the CDN.

Settings, Miscellaneous

Store uploads in this folder: wp-content/uploads
Full URL path to files: http://cd34.colocdn.com/blog/wp-content/uploads

Second, all of the images that have been already posted need to have their URLs modified. Since I am a command line guy, I executed the following command in MySQL.

update wp_posts set post_content=replace(post_content,'http://cd34.com/blog/wp-content/uploads/','http://cd34.colocdn.com/blog/wp-content/uploads/');

According to the Yahoo YSlow plugin, my blog went from a 72 to a 98 out of 100 with this and a few other modifications. The site does appear to be much snappier as well.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

mysql 5.1′s query optimizer

October 7th, 2009

While debugging an issue with an application that relies heavily on MySQL, an issue was brought up regarding the cardinality of the keys selected, and, the order in which the keys were indexed. With any relational database, in order to get the fastest performance, your query should reduce the result set as quickly as possible. Your data should have a high cardinality or variation in the data so that the B-Tree (or R-Tree) is more balanced. If your data consists of:

One thousand records with the date 2009-01-01
One thousand records with the date 2009-01-02

One thousand records with the date 2009-12-31

The cardinality or uniqueness of that column is low given the fact that you’ll have 365000 rows with blocks of one thousand having the same key. If you consider 125 different IP addresses per day generating those same thousand records, the cardinality or uniqueness of the IP addresses will be very high.

In order to show the performance differences in multiple indexing schemes and representations, a table has been created with an Unsigned Int column for the IP address, a varchar(15) for the IP address, a date column, and a varchar(80) for some text data. Because of the way the MySQL query processor works, it is possible to construct your query so that the results are answered from the index and the data file is never hit. A test sample was created that will be used for all of the tests. The file will be indexed, optimized, and the test run five times with the cumulative time used. The sample data that generates the queries against the database include 48000 of the ten million rows, plus 2000 randomly generated queries. Those results are then shuffled and written to a file for the tests. Testing hits versus misses emulates real world situations a little more accurately. All of the code used to run these tests is included in this post.

Test Setup

Creation of the table:

CREATE TABLE `querytest` (
  `iip` int(10) unsigned DEFAULT NULL,
  `ipv` varchar(15) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `randomtext` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Filling the table with data:

#!/usr/bin/python

import MySQLdb
import random
import datetime
import time

lipsum = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi gravida congue nisi, nec auctor leo placerat nec. In hac habitasse platea dictumst. In rutrum blandit velit et varius. Integer commodo ipsum ut diam placerat feugiat. Curabitur viverra erat ut felis cursus mollis. Sed tempus tempor faucibus. Etiam eget arcu massa, eget dictum sapien. Nullam euismod purus vitae risus ultrices tempus. Mauris semper rhoncus lectus, sit amet laoreet mauris tincidunt et. Duis ut mauris massa. Nam semper, enim id fermentum tristique, ligula velit suscipit lacus, vitae ultrices mi arcu sit amet felis. Ut sit amet tellus eget lorem gravida malesuada.

Integer nec massa quis mauris porta laoreet. Curabitur tincidunt nunc at mauris porttitor auctor. Mauris auctor faucibus tortor dignissim sodales. Sed ut tellus nisi, laoreet malesuada tortor. Vivamus blandit neque et nunc fringilla quis dignissim felis tincidunt. Nam nec varius orci. Duis pretium magna id urna fermentum commodo. Aliquam sollicitudin imperdiet leo eget ullamcorper. Quisque id mauris nec purus pulvinar bibendum. Fusce nunc metus, viverra in iaculis id, tempus nec neque. Aenean ac diam arcu, vitae condimentum lectus. Vivamus cursus iaculis tortor eget bibendum. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aenean elementum odio et nisl ornare at sodales eros porta. Duis mollis tincidunt neque, sed pulvinar enim ultrices a. Sed laoreet nunc ut nisl luctus a egestas quam luctus. Pellentesque non dui et neque ullamcorper condimentum ac ut turpis. Etiam a lectus odio, vitae bibendum arcu. Nulla egestas dolor ligula.

Quisque rhoncus neque ultrices mi lacinia tempus. Sed scelerisque libero dui, quis vulputate leo. Phasellus nibh ante, viverra sed cursus ac, dictum et lectus. Suspendisse potenti. Ut dapibus augue vitae sem convallis in iaculis nibh bibendum. Mauris eu sapien in lacus pharetra fermentum. Etiam eleifend vulputate velit, a tempor augue ultrices vitae. Vestibulum varius orci ac justo adipiscing quis dignissim odio porttitor. Nam ac metus leo. Ut a porttitor lectus. Nunc accumsan ante non eros feugiat suscipit.

Nulla facilisi. Nam molestie dignissim purus sed lacinia. Etiam tristique, eros vel condimentum fermentum, ipsum justo vulputate erat, sed faucibus nunc nisl id tellus. Aliquam a tempus leo. Nullam et sem nunc. Suspendisse potenti. Quisque ante lorem, aliquam sed aliquet vel, malesuada sit amet nisl. Vestibulum tristique velit pellentesque sapien ultrices non gravida ante blandit. Donec luctus nunc dictum felis feugiat sollicitudin. Nam lectus mi, porttitor sed adipiscing ac, pharetra a orci. Ut vitae eros vitae metus.
"""

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

length = len(lipsum)
jan_1_2009 = time.mktime((2009, 1, 1, 0, 0, 0, 0, 0, 0))

for i in range (1, 10000001):

  # generate a random IP address
  rand_ip = random.randint(1,4294967295)

  # pull a random piece of text from lipsum with a random length
  start_pos = random.randint(1,length)
  end_pos = start_pos + random.randint(200,2000)
  random_text = lipsum[start_pos:end_pos]

  # pick a random date in 2009
  rand_date = time.strftime("%Y-%m-%d",time.gmtime(jan_1_2009 + random.randint(1,365*60*60*24)))

  cursor.execute("insert into querytest (iip,ipv,date,randomtext) values (%s,inet_ntoa(%s),%s,%s)", (rand_ip, rand_ip, rand_date, random_text))

cursor.close ()
db.close ()

Generate test set:

#!/usr/bin/python

import MySQLdb
import random
import datetime
import time
import socket
import struct

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

jan_1_2009 = time.mktime((2009, 1, 1, 0, 0, 0, 0, 0, 0))

cursor.execute("select iip,ipv,date from querytest order by rand() limit 48000")

data = list(cursor.fetchall())

for i in range (1, 2001):

  # generate a random IP address
  rand_ip = random.randint(1,4294967295)

  # pick a random date in 2009
  rand_date = time.strftime("%Y-%m-%d",time.gmtime(jan_1_2009 + random.randint(1,365*60*60*24)))

  data.append((rand_ip, socket.inet_ntoa(struct.pack('L',rand_ip)), rand_date))

random.shuffle(data)
for datum in data:
  print "%s,%s,%s" % (datum[0], datum[1], datum[2])

cursor.close ()
db.close ()

At this point we have created the table, filled it with ten million rows, and generated a fifty thousand row query set to run against the table. Now, we need to categorize the theories to see whether cardinality plays as large a role as it used to.

The following tests will be performed

Index of iip,date

* Use the unsigned int representation of the IP address and the date
* Use the text representation of the IP address passed through inet_aton() and the date

Index of ipv, date

* Use the text representation of the IP address and the date
* Use the unsigned int representation of the IP address passed through inet_ntoa() and the date

Index of date,iip

* Use date and the unsigned int representation of the IP address
* Use date and the text representation of the IP address passed through inet_aton()

Index of date,ipv

* Use date and the unsigned int representation of the IP address
* Use date and the text representation of the IP address passed through inet_aton()

Each of the above tests will be run twice, once with select * and once with select ipv,date.

Benchmark Code

#!/usr/bin/python

import MySQLdb
import random
import datetime
import time
import socket
import struct
import array

def run_query(query, data, columna, columnb):
    for datum in data:
      cursor.execute(query, (datum[columna], datum[columnb]))
      result = cursor.fetchall()

query_tests = [
               ['create index querytest on querytest (iip,date)',
                'select * from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select * from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select iip,date from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select iip,date from querytest where iip=%s and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select * from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select * from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date)',
                'select iip,date from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (iip,date) using HASH',
                'select iip,date from querytest where iip=inet_aton(%s) and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select * from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select * from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select ipv,date from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select ipv,date from querytest where ipv=%s and date=%s',
                1,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select * from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select * from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date)',
                'select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (ipv,date) using HASH',
                'select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s',
                0,
                2
               ],
               ['create index querytest on querytest (date,iip)',
                'select * from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select * from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip)',
                'select iip,date from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select iip,date from querytest where date=%s and iip=%s',
                2,
                0
               ],
               ['create index querytest on querytest (date,iip)',
                'select * from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select * from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip)',
                'select iip,date from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,iip) using HASH',
                'select iip,date from querytest where date=%s and iip=inet_aton(%s)',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select * from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select * from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select ipv,date from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select ipv,date from querytest where date=%s and ipv=%s',
                2,
                1
               ],
               ['create index querytest on querytest (date,ipv)',
                'select * from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select * from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv)',
                'select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
               ['create index querytest on querytest (date,ipv) using HASH',
                'select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)',
                2,
                0
               ],
              ]

db = MySQLdb.connect(host="localhost", user="querytest", passwd="qt1qt1", db="querytest")
cursor = db.cursor()

queries = open('testquery.txt').readlines()

query_array = []
for query_data in queries:
  query_array.append(query_data.rstrip('\n').split(','))

for test in query_tests:
  try:
    cursor.execute('alter table querytest drop index querytest')
  except:
    pass
  cursor.execute(test[0])
  cursor.execute('optimize table querytest')

  print "Test: %s\n with Index: %s" % (test[1], test[0])
  start_time = time.time()

  for loop in range (1,6):
    run_query(test[1], query_array, test[2], test[3])

  end_time = time.time()
  print "Duration: %f seconds\n" % (end_time - start_time)

cursor.close ()
db.close ()

Miscellaneous notes

P4/3.0ghz, 2gb RAM, Debian 3/Squeeze, Linux 2.6.31.1, WD 7200RPM SATA drive, SuperMicro P4SCI Motherboard

There are multiple tests that could have been run without dropping the index, recreating the index and optimizing the table. When testing a more limited set, results were a little sporadic due to a smaller initial test set and portions of the table and index being cached in the kernel cache. To ensure more consistent test results, every test was run in a consistent manner.

Benchmark Results

Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 679.169198 seconds

Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 692.634291 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 179.039791 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 178.993962 seconds

Test: select * from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 672.836734 seconds

Test: select * from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 606.268787 seconds

Test: select iip,date from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 195.253512 seconds

Test: select iip,date from querytest where iip=inet_aton(%s) and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 195.222058 seconds

Test: select * from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 741.876227 seconds

Test: select * from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 639.109309 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 167.049333 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 167.016152 seconds

Test: select * from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 578.565762 seconds

Test: select * from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 655.869390 seconds

Test: select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 181.555567 seconds

Test: select ipv,date from querytest where ipv=inet_ntoa(%s) and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 181.230911 seconds

Test: select * from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 655.928799 seconds

Test: select * from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 637.146124 seconds

Test: select iip,date from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 181.637912 seconds

Test: select iip,date from querytest where date=%s and iip=%s
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 181.512190 seconds

Test: select * from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip)
Duration: 603.553238 seconds

Test: select * from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 605.363284 seconds

Test: select iip,date from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip)
Duration: 196.680399 seconds

Test: select iip,date from querytest where date=%s and iip=inet_aton(%s)
 with Index: create index querytest on querytest (date,iip) using HASH
Duration: 194.746056 seconds

Test: select * from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv)
Duration: 657.619028 seconds

Test: select * from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 686.560066 seconds

Test: select ipv,date from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv)
Duration: 172.222691 seconds

Test: select ipv,date from querytest where date=%s and ipv=%s
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 172.079220 seconds

Test: select * from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv)
Duration: 726.031732 seconds

Test: select * from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 678.099808 seconds

Test: select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv)
Duration: 185.415666 seconds

Test: select ipv,date from querytest where date=%s and ipv=inet_ntoa(%s)
 with Index: create index querytest on querytest (date,ipv) using HASH
Duration: 185.280880 seconds

Conclusions

Based on the data, I think we can say that the argument of B-Tree versus Hash doesn’t seem to make much difference. Neither is consistently better, and since the data and query test is identical, the results don’t really point to a clear winner. Avoiding Select * and pulling only the required fields makes a difference and if your result can be answered from the index rather than the data file, there is a substantial boost. Analysis of the results suggests that cardinality isn’t as important as it used to be. I am devising a method to further test cardinality as I do believe that live data will have somewhat different results from data after an optimize table has been run.

The winner in this case is:

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date)
Duration: 167.049333 seconds

Test: select ipv,date from querytest where ipv=%s and date=%s
 with Index: create index querytest on querytest (ipv,date) using HASH
Duration: 167.016152 seconds

I had actually expected int represented as unsigned int would be the fastest. However, there is probably a reasonable explanation why these two queries are slower:

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 179.039791 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date) using HASH
Duration: 178.993962 seconds

Data in MySQL is represented as binary. The IP stored as an unsigned int takes 4 bytes, and the date takes 3. The key length in this case would be 7 bytes versus the index on IP stored as varchar(15) and the date taking 18 bytes. Even though the index in the second case is almost three times the size of the unsigned int IP, the MySQL client library converts all binary data to ASCII when communicating to avoid endian issues. That extra conversion results in a slightly slower result — measurable when you do 250000 queries against a 10 million record database.

A quick modification of the test shows the results of select *, versus select keyvaluea,keyvalueb and select data,keyvalueb. As you can see from the results below, MySQL will answer queries from the index if it doesn’t need to hit the data file.

Test: select * from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 637.420786 seconds

Test: select iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 178.434477 seconds

Test: select ipv,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 690.804990 seconds

Test: select inet_ntoa(iip) as iip,date from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 183.817643 seconds

If you can structure your data well, there are significant performance gains to be had.

What does this mean?

Do you store IPs as unsigned int in the database? If you use varchar(15) or char(15), you’re talking about an eleven or ten byte savings per record at the expense of some conversion time. varchar uses 1 character to store the length of the stored data plus the length of the data. char is a fixed length based on the column length you specify.

Make sure you return only the columns that you need in your calculations — especially if you are running MySQL over a network.

Try to create your index to match the conditions that you are looking for, and, when possible, if you are searching for the result from a particular column, consider adding it to the index as well.

Always use count(*) rather than count(column) unless there is a valid reason for that column to contain NULL.

The Effect of count(*) versus count(date)

count(*) gives you the number of rows in the set that match the criteria you have set. count(date) counts the number of rows in the set that match the criteria where the date is not null. Many times, you’ll see someone do a count(id), and id by definition is a primary key, auto_increment and cannot be null. Because count(column) must read the table to ensure that the column specified is not null, it is forced to check every key, or, read the table for all of the matching rows to make sure the column retrieved doesn’t contain a null value. If the column being counted is one of the keys in the index, the performance change won’t be as dramatic. By counting a column that isn’t in the key and having to read the data, count(column) is considerably slower.

Results when the counted column is within the key and only 1 or 0 rows are expected:

Test: select count(*) as ct from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 175.727338 seconds

Test: select count(iip) as ct from querytest where iip=%s and date=%s
 with Index: create index querytest on querytest (iip,date)
Duration: 176.495198 seconds

When count returns more than one row, you can see the effect is much more detrimental. The first iteration of this test took so long that I shortened it to do five iterations of 100 queries. After 4 hours, and 18% complete, I shortened the test to do one iteration of ten queries. The results clearly demonstrate the issue without taking 20+ hours to run a single simple benchmark. Simply stated, unless you really have a valid reason to check your results to see if the column is null, DON’T!

Test: select count(*) as ct from querytest where date=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 0.408268 seconds

Test: select count(ipv) as ct from querytest where date=%s
 with Index: create index querytest on querytest (date,iip)
Duration: 3085.770998 seconds

The Fine Print

* Index columns used in your where conditions
* B-Tree versus Hash doesn’t appear to materially affect results
* storing IP as char(15) if the data is being returned to the client can be faster than storing an IP as an unsigned int. If the IP is not fetched but only used in comparisons, unsigned int is probably the better choice.
* Consider adding that extra column to your index to prevent MySQL from having to read the data file. Answering your query from the index is significantly faster.
* count(*) rather than count(column)

Live data will not act precisely as the benchmark — what live scenario ever does? But, I believe the tests above should show some of the performance gains available by structuring your tables and queries.

While MySQL 4, 5.0 and 5.1 will reorder conditions to match the index key, there are some significant performance gains from 4.x to 5.0. MySQL 5.1 didn’t show considerable gains from MySQL 5.0, but, there are some minor speed increases.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

AttributeError: ‘function’ object has no attribute ‘replace’

September 30th, 2009

While doing some coding to move a Turbogears2 application over to Pylons, I ran into an issue with Validation and ToscaWidgets.

AttributeError: ‘function’ object has no attribute ‘replace’

Validation in Pylons listed:

@validate(form=movie_form, error_handler=index)

for the decorator syntax, but, error_handler should be a name, not a function. The correct decorator should be:

@validate(form=movie_form, error_handler=’index’)

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

piix versus ahci

September 13th, 2009

While working with some new motherboards, I decided to do some testing. Since most motherboards ship with AHCI disabled and we needed hotplug for a new project, I wanted to do some testing to see if there was going to be a performance hit by using the AHCI drivers rather than the piix driver.

To make sure we have a very stable benchmark, the same machine, without any changes other than switching AHCI on through the BIOS was tested twice. Granted, this is an older motherboard in a machine used for testing and development, the results on other motherboards should be similar.

The difference in the input/output and seeks are negligible. The sequential create and delete results generally show much improved results for create/delete but the read results are virtually unchanged. This is probably a result of the Native Command Queuing (NCQ) enabled in AHCI that isn’t present in the piix driver. Since the firmware on the disk can reorder requests based on the rotational position of the data it needs to access, there are some benefits.

Since it doesn’t appear to be detrimental to enable AHCI, and it does increase performance of two particular portions of the benchmark that may not really be exercised in a normal webserver environment, if you have the ability to run your hardware in AHCI mode rather than the native piix mode, I would suggest using AHCI.

The command line used: /usr/sbin/bonnie -n 384

piix results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kvm              4G   439  99 52686  15 27168   5  2039  99 62782   4 228.9   3
Latency             48125us    1645ms    1545ms   16144us     137ms     863ms
Version  1.96       ------Sequential Create------ --------Random Create--------
kvm                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                384 17299  43 379058  99   777   1 24781  49 522877 100   579   1
Latency              1773ms     169us   18519ms    1403ms      14us   14935ms
1.96,1.96,kvm,1,1252886317,4G,,439,99,52686,15,27168,5,2039,99,62782,4,228.9,3,384,,,,,17299,43,379058,99,777,1,24781,49,522877,100,579,1,48125us,1645ms,1545ms,16144us,137ms,863ms,1773ms,169us,18519ms,1403ms,14us,14935ms

AHCI results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kvm              4G   430  98 52639  14 27533   5  2066  99 62991   4 224.5   1
Latency             50870us    1532ms    1655ms   10203us   21423us     953ms
Version  1.96       ------Sequential Create------ --------Random Create--------
kvm                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                384 28173  56 373407  98  1645   2 31110  61 522376  99  1026   1
Latency              1363ms     157us   12877ms    1216ms      61us   11397ms
1.96,1.96,kvm,1,1252887577,4G,,430,98,52639,14,27533,5,2066,99,62991,4,224.5,1,384,,,,,28173,56,373407,98,1645,2,31110,61,522376,99,1026,1,50870us,1532ms,1655ms,10203us,21423us,953ms,1363ms,157us,12877ms,1216ms,61us,11397ms

Output from lspci -vvnn with the piix driver selected:

00:1f.2 IDE interface [0101]: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller [8086:2820] (rev 02) (prog-if 8a [Master SecP PriP])
	Subsystem: Super Micro Computer Inc Device [15d9:8780]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

Output from lspci -vvnn with the ahci driver selected:

00:1f.2 SATA controller [0106]: Intel Corporation 82801HB (ICH8) 4 port SATA AHCI Controller [8086:2824] (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: Super Micro Computer Inc Device [15d9:8780]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

	Kernel driver in use: ahci

Technical Specs:

Debian/Squeeze/Sid (Testing)
Supermicro P4SBE Motherboard
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

mysql-python and Snow Leopard

September 1st, 2009

After the upgrade to Snow Leopard, mysql-python cannot be installed through easy_install.

* Install mysql’s x86_64 version from the .dmg file (I had a problem doing this when booted into the 64bit kernel, a 32bit kernel macbook had no problem) With the 64bit kernel, the system reported ‘no mountable file systems’ when trying to mount the .dmg file. A reboot into 32bit mode allowed the .dmg to be mounted and installed.
* Change into your virtual environment if desired, source bin/activate
* fetch MySQL-python-1.2.3c1

tar xzf MySQL-python-1.2.3c1.tar.gz
cd MySQL-python-1.2.3c1
ARCHFLAGS='-arch x86_64' python setup.py build
ARCHFLAGS='-arch x86_64' python setup.py install

If everything works, you should see:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

Some of the possible things you’ll encounter:

After python setup.py build:

ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture
ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture

This means that you have the i386 version of mysql installed. Or, if you have the x86_64 version, you have didn’t include the proper ARCHFLAGS command.

ImportError: dynamic module does not define init function (init_mysql)

this means that the easy_install or the build/install process was run which tried to build a ppc/i386/x86_64 combined build.

If you see messages like:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.pyc, but /Users/xxxxx/Python/django/MySQL-python-1.2.3c1 is being added to sys.path
>>>

Then you are still in the build directory. cd .. and try again.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Mysql Query Optimization

August 28th, 2009

I heard a comment from a developer the other day:

You don’t need indexes on small tables.

So I asked what the definition of a small table was. He said, anything with a few hundred rows. So I said, 2300 rows? Well….. 24000 rows? Well….. 292000 rows? That’s large. I showed him unindexed queries in his application dealing with tables that had 2300, 24000 and 292000 rows.

Avoid tablescans

When MySQL deals with a query that is unindexed, it does a full tablescan to see if each record in the table meets the criteria specified. On a small table, if the query is executed frequently, the MySQL query cache might be able to serve the query. However, on a larger table, or a table with large rows, it must read every row, check the fields, possibly create a temporary table in ram or disk, and return the results. On a small site, you might not notice it, but, on a large system, forcing tablescans on tables with even a few thousand rows will slow things down considerably:

Uptime: 60016 Threads: 11 Questions: 105460332 Slow queries: 197769 Opens: 5819 Flush tables: 1 Open tables: 1320 Queries per second avg: 1757.204

Slow queries are sometimes unavoidable, but, often, slow queries are missing an index.

Use the slow-query log to find potential issues

When analyzing a system to find problems, putting:

log-queries-not-using-indexes

in the my.cnf file and restarting mysql will log the unindexed queries to the slowquery log.

What can be indexed?

The rule of thumb when writing indexes is to write your query in such a way that you reduce the result set as quickly as possible, with the highest cardinality possible. What does this mean?

If you are collecting data of the IP address and the Date, your query against date,ip will actually be worse than ip,date. Imagine receiving 40000 hits to your site on the same date. If you were looking for the number of hits that a particular IP had, you would search the 41 hits they have made over time, and then the 8 that they had today. If you queried by date,ip, you would search 40000 rows then would receive the 8 they had today. Each index you have, adds extra overhead and an index file should be as small as possible. IP addresses can be represented in an unsigned int which takes much less space than the varchar(15) usually used. Remember when you index a varchar field, indexing will spacepad the key to the full length. If you have a variable length field you want indexed, you might be able to figure out the significant portion of that field by finding the average length and adding a few characters for good measure and indexing fieldname(15) rather than the entire field. If a query is longer than the 15 characters, you have still created a significant reduction in the number of rows that it must check.

Cardinality refers to the uniqueness of the data. The more unique the data, the lower the chance that you’ll have thousands of records that match the first criteria. When the data is very similar, the index as built on disk will become imbalanced resulting in slower queries. Since MyISAM and InnoDB use a B-Tree index (or R-Tree if you use a spatial index), data that is similar when inserted, can create a very imbalanced tree which leads to slower lookups. An optimize table can resort and reindex the table to eliminate this, but, you can’t do that on an extremely large, active table without impacting response times.

# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 3323
SELECT * FROM websites_geo where (zoneid = ’5135′) LIMIT 1;

In this case, zoneid is not indexed on the table websites_geo. Adding an index on zoneid eliminates the tablescan on this query.

Check for equality, not inequality.

An index can only check equality. A query checking to see if values are not equal, cannot be indexed.

# Query_time: 0 Lock_time: 0 Rows_sent: 5 Rows_examined: 2548
SELECT * FROM websites where (id = ’1056692′ && status != ‘d’ && status != ‘n’) order by rand() LIMIT 5;

# Query_time: 0 Lock_time: 0 Rows_sent: 10 Rows_examined: 2544
SELECT * FROM websites where (status != ‘n’ && status != ‘d’ && traffic > 3000) order by added desc LIMIT 10;

These two queries show two different issues, but, deal with the same fundamental issue. First, id is not indexed which would have at least limited the result set to 9 records rather than 2548. The status check isn’t able to use an index. On the second query, status is checked followed by traffic. There are other queries issued that check status,traffic,clicks_high. When we look at status (which should be an enum or char(1) rather than varchar(1)), we find that there are only 4 values used. By indexing on id,status and status,traffic,clicks_high, we could alter the queries as such:

SELECT * FROM websites where (id = ’1056692′ && status in (‘g’,’ ‘)) order by rand() LIMIT 5;

SELECT * FROM websites where (status in (‘g’,’ ‘) && traffic > 3000) order by added desc LIMIT 10;

which would result in both queries using an index.

Choose your data types intelligently.

As a secondary point, id (though it is numeric) happens to be a text field. If you index id in this case, you would have to specify a key length.

mysql> select max(length(id)) from websites;
+—————–+
| max(length(id)) |
+—————–+
| 22 |
+—————–+
1 row in set (0.02 sec)

mysql> select avg(length(id)) from websites;
+—————–+
| avg(length(id)) |
+—————–+
| 8.3315 |
+—————–+
1 row in set (0.00 sec)

mysql>

Based on this, we might decide to set the key length to 22 as it is a relatively small number and allows room to grow. Personally, I would have opted to have the id be an unsigned int which would be much smaller, but, the application developer uses alphanumeric id’s which are exposed externally. With sharding, you could use the id throughout the various tables, or, you could map the text id to a numeric id internally for all of the various tables.

There are a number of possible solutions to help any SQL engine perform better. And your data set will dictate some of the things that you can do to make data access quicker.

Helping MySQL Help You

If you do select * from table where condition_a=1 and condition_b=2 in one place, and select * from table where condition_b=2 and condition_a=1, setting up a single index on condition_a,condition_b and adjusting your second query, reversing the conditions to the same order as the keys on the index will increase performance.

Limit your results

Another thing that will help considerably is using a limit clause. So many times a programmer will do: select * from table where condition_a=1 which returns 2300 rows but only the first few rows are used. A limit clause will prevent a lot of data from being fetched by MySQL and buffered waiting for the response. select * from table where condition_a=1 limit 20 would hand you the first 20 records.

Avoid reading the data file, do all your work from the Index

Additionally, if you have a table and only need three of the columns from the result, select fielda,fieldb,fieldc from table where condition_a=1 will return only the three fields. As an added boost, if the fields you are checking can be answered from the index, the query will never hit the actual data file and will be answered from the index. Many times I’ve added a field that wasn’t needed in the index, just to eliminate the lookup of the key in the index then the corresponding read of the data file.

Let MySQL do the work

MySQL reads tables, filters results, can do some calculations. Going through 40000 records to pick the best 100 is still faster in MySQL than allowing PHP to fetch 40000 rows and do calculations and sorts to come up with that 100 rows. Index, optimize, and allow MySQL to do the database work.

Summary

Making MySQL work more efficiently goes a long way towards making your database driven site work better. Adding six indexes to the system resulted in quicker response times and an increase in the transactions per second.

Uptime: 32405 Threads: 1 Questions: 58729705 Slow queries: 64122 Opens: 2911 Flush tables: 1 Open tables: 295 Queries per second avg: 1812.366

Previously, MySQL was generating 3.26 slow queries per second. Now we’re just beneath 2 slow queries per second and our system is processing 55 more transactions per second. There is still a bit more analysis to do to identify the slow queries that are still running and to alter the queries to reverse the inequality checks, but, even just adding indexes to a few tables has helped noticeably. Once the developer is able to make some changes to the application, I’m sure we’ll see an additional speedup.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Rapid Application Development using Turbogears and Django

August 8th, 2009

For the last 14 months we’ve been developing an application to replace 90000 lines of PHP code. Rewriting the application from scratch to support I18N and many of the enhancements it needed was deemed to be a better long term solution.

When that project was first started, I spent a month with Django and a month with Turbogears writing the same test application so that I could compare the development cycle. Both have matured and I needed to do a rapid turnaround on another project. I decided to give Django another look since it had hit version 1.0 and had added quite a few features that were missing in my preliminary evaluation. What follows is a discussion of the major points from both of the frameworks.

Turbogears has excellent form handling. Except for the Forms Wizard in Django, working with forms is much easier in Turbogears. Validation of the form and the resulting database update methods are much cleaner in Turbogears. Django, while slightly more complex in handling form input, does handle things with a single function which might enhance readability in a large project.

Database handling through SQL Alchemy in Turbogears is much better than the database methods in Django. Yes, you can use SQL Alchemy in Django now, but, their default ORM has plenty of room for improvement.

Turbogears is true MVC. Their terminology and methods are true to the paradigm set forth by Smalltalk. Django is MVC, but they call it MTV, Model, Template, View. The differences are slight and the developers of both have made good decisions. Adapting to either project’s methods is quick and not a hindrance.

Django’s definitely wins with Authentication/Authorization. Methods to handle registration, user creation, login and forgotten passwords are built in and wrapped with very nice templates that can be overridden. For Turbogears, repoze.who and repoze.what have been pulled from Plone and put into place. While Turbogears works with repoze, the decisions made and the lack of full support behind it make it difficult to implement.

Django feels faster. Comparing 14 months of development in Turbogears on an application to an application written in Django this week, the template engine, database access and pageload time seemed faster. Django is a lighter weight framework and you are closer to the data. Turbogears puts a little more insulation in which makes some coding easier at the expense of performance.

Maintainability of code would have to go to Turbogears. IBM once stated that the maximum number of bugfree lines of code that could be written was 23. With Turbogears, much of the heavy lifting is handled by widgets and decorators and your model resulting in a smaller codebase. Django requires more code to do the same task unless you utilize some of the snippets. Turbogears makes certain assumptions and has wrapped many of the libraries that make development easy in the default installation. Django’s default installation lacks those decisions, but, you are not prevented from establishing your own middleware. If you were developing a number of Django projects, you would pick and choose snippets that would replicate the decisions that Turbogears has already made.

URL Mapping is much easier to implement with Django. While routes support in Turbogears is present, Django’s regexp mapping is much easier to manipulate.

Community, hands down, Django wins. With a much larger installed base, bugs are found and fixed much more quickly. While Turbogears was founded on loftier principles, execution and follow through are lacking. Development is done when it is needed by a client project in the core group of developers. There is a definite air of condescension when the project developers handle questions from potential developers. With Django, there are people of all experience levels willing to help on groups.google, IRC, and thorough documentation that far exceeds most of the open source documentation out there.

Documentation, again, Django. Well organized, well thought out and well documented examples on Django’s part show dedication to having a framework that is well understood and well used. Turbogears recently moved to Sphinx, but, documentation generated from poorly documented code still means poor documentation. The tutorials and examples have been improving, but, they have a long way to go.

Genshi and Mako are supported fairly well with Turbogears and are both very good engines. Jinja is also supported which is a bit faster than Genshi and is powerful and very easy to work with. Django’s template language is also very flexible, powerful and easy to work with. Django had some definite advantages with a simpler language, but, neither Django or Turbogears stood out as a clear winner.

If you were looking to write an extremely heavy database or form involved site, I think Turbogears would be a good solution. If you choose Turbogears, be prepared to delve into the code when you are faced with problems. Bugs are not dealt with very promptly even though upgrades are pushed out. Be prepared to patch upgraded packages or hold certain packages once you move into production.

On the other hand, if you are writing a less complicated application, Django would be a good choice.

All told, the application developed this week in Django took about 12 hours and had I been working with Django for the last 14 months, I would estimate the project to have taken roughly 8 hours. Developed in Turbogears, I probably could have written it in 6 hours. PHP, to mimic all of the functionality would have taken 14-16 hours even using one of the numerous frameworks.

There is a definite advantage to using a framework and Python frameworks do appear to offer Rapid Application Development even over most of the PHP frameworks. For me, new development will probably be done in Django unless there is a strong case to use Turbogears.

* Turbogears 2.0
* Django
* Django Snippets

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

ESI Widget Issues in the Varnish, ESI, WordPress experiment

July 26th, 2009

The administration interface is quite simple. When the widget is installed, drag it to the Sidebar, then, drag any widgets that you want displayed to the ESI Widget Sidebar.

esi-widget

Current issues:
* When a user is logged in and comments on a post, their ‘login’ information is left on the page if they are the first person to hit the page when Varnish caches the page. If someone is logged in and visits a post page and the page hasn’t been previously cached, the html that shows their login status is cached, though, new visitors see the information, but lack the credentials.

Addons that don’t work properly:
* Any poll application (possible solution to wrap widget in an ESI block)
* Any stat application (unless they convert to a webbug tracker, this probably cannot be fixed easily)
* Any advertisement/banner rotator that runs internal. OpenX will work, as will most non-plugin
* Any postcount/postviews addon
* CommentLuv?
* ExecPHP (will cache the output, but does work)
* Manageable

Any plugin that does something at the time of the post or comment phase, that isn’t dependent on the logged in data should work without a problem. If it requires a login, or uses the IP address to determine whether a visitor has performed an action, will have a problem due to the excessive caching. For sites where the content is needed to be served quickly and there aren’t many comments, ESI Widget would work well.

Because of the way Varnish works, you wouldn’t necessarily have to run Varnish on the server running WordPress. Point the DNS at the Varnish server and set the backend for the host to your WordPress server’s IP address and you can have a Varnish server across the country caching your blog.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter