Archive for the ‘Python’ Category

Why do you use an Object Relational Mapping (ORM) System in Development?

Monday, October 12th, 2009

Here’s a programmer that is saying goodbye to ORMs at Hatful of Hollow.

And another site offering a tutorial of sorts dealing with ORMs Why should you use an ORM.

While both have their points, both have missed a fundamental benefit that an ORM hands you.

Most of my development is in Pylons. Django’s ORM and template language can do the same thing. A programmer that has used PHP/Smarty to develop large scale systems will likely resist ORMs. After working with a team to develop 90k+ lines of PHP/Smarty over a six year period, making the shift required a paradigm shift.

Let’s consider the following structure. We have a cp_ticket table and a cp_ticket_detail table. A Ticket can have multiple detail records. The output we wish to have is:

ticket id, ticket header information
         ticket detail line
         ticket detail line #2
ticket id, ticket header information
         ticket detail line
         ticket detail line #2
         ticket detail line #3
ticket id, ticket header information
         ticket detail line
         ticket detail line #2

Our model:

class cp_ticket(DeclarativeBase):
    __tablename__ = 'cp_ticket'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    priority = Column(mysql.MSEnum('1','2','3','4','5'), default = '3')

    ticket_detail = relation('cp_ticket_detail', order_by='cp_ticket_detail.ticket_detail_id')

class cp_ticket_detail(DeclarativeBase):
    __tablename__ = 'cp_ticket_detail'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), ForeignKey('cp_ticket.ticket_id'), default = '0')
    ticket_detail_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    stamp = Column(mysql.MSTimeStamp, PassiveDefault('CURRENT_TIMESTAMP'))
    detail = Column(mysql.MSLongText, default = '')

Our query to pass to our template:

        tickets = meta.Session.query(cp_ticket).filter(cp_ticket.client_id==1).all()

Compared with the query as you would write it without an ORM:

select * from cp_ticket,cp_ticket_detail where client_id=1 and cp_ticket.ticket_id=cp_ticket_detail.ticket_id;

Both are doing the same fundamental thing, but, the ORM maps the results almost identical to the way we want to display the data. This makes template design easy.

Using Mako, we use the following code to display the results:

<table border="1">
 <tr><th>Ticket ID</th><th>Status</th><th>Detail</th></tr>
%for ticket in tmpl_context.tickets:
  <tr>
    <td><strong>${ticket.ticket_id}</strong></td>
    <td><strong>${ticket.priority}</strong></td>
  </tr>
  %for detail in ticket.ticket_detail:
  <tr>
    <td></td>
    <td>${detail.stamp}</td>
    <td>${detail.detail}</td>
  </tr>
  % endfor
% endfor
</table>

To do the same thing without using an ORM, you need to revert to a control break structure similar to the following:

current_ticket=0
for ticket in tickets:
  if (current_ticket != ticket.ticket_id):
    #new row, print the header
    print "<tr><td>first piece</td></tr>"
    current_ticket = ticket.ticket_id
  # print our detail row
  print "<tr><td></td><td>stamp and detail</td></tr>"

Control Break structures require you to be able to set a variable within your template language. Some template languages don’t allow that. If your template language (in any language) can’t do variable assignments in the template, guess where your html generation logic needs to go?

With an ORM, the template contains your display logic. Your webmaster/design team can modify the template without having to modify html contained within your code. The loops are simple to understand and designers usually have little problem avoiding the lines that start with %.

Sure, you could wrap much of this logic in your template to do the control-break structure, but, as you get more complex data, deciding how to display the data requires a define or some other functionality.

An ORM adds some insulation to the process, but, the result is a much easier page structure when displaying related data. Granted there are some performance hits and SQLAlchemy appears to create some queries that are not optimal, unless there is a tremendous performance hit, I think the benefits of the ORM for developing a web application are tremendous.

Once you move into an environment where you are dealing with multiple developers, having a defined schema with comments is much easier than using reflection to figure out what the meaning of a status field as enum(‘U’,'A’,'P’,'C’,'R’,'S’).

However, as the original poster mentions, you can do raw SQL within SQLAlchemy and do all of your work with reflection as he has done with his ORM^H^H^H, abstraction. If he’s still using SQLAlchemy, he can selectively decide when to use it and when to avoid it.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

mysql-python and Snow Leopard

Tuesday, September 1st, 2009

After the upgrade to Snow Leopard, mysql-python cannot be installed through easy_install.

* Install mysql’s x86_64 version from the .dmg file (I had a problem doing this when booted into the 64bit kernel, a 32bit kernel macbook had no problem) With the 64bit kernel, the system reported ‘no mountable file systems’ when trying to mount the .dmg file. A reboot into 32bit mode allowed the .dmg to be mounted and installed.
* Change into your virtual environment if desired, source bin/activate
* fetch MySQL-python-1.2.3c1

tar xzf MySQL-python-1.2.3c1.tar.gz
cd MySQL-python-1.2.3c1
ARCHFLAGS='-arch x86_64' python setup.py build
ARCHFLAGS='-arch x86_64' python setup.py install

If everything works, you should see:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

Some of the possible things you’ll encounter:

After python setup.py build:

ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture
ld: warning: in build/temp.macosx-10.6-universal-2.6/_mysql.o, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libmygcc.a, file is not of required architecture

This means that you have the i386 version of mysql installed. Or, if you have the x86_64 version, you have didn’t include the proper ARCHFLAGS command.

ImportError: dynamic module does not define init function (init_mysql)

this means that the easy_install or the build/install process was run which tried to build a ppc/i386/x86_64 combined build.

If you see messages like:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Users/xxxxx/Python/django/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-macosx-10.6-universal.egg/_mysql.pyc, but /Users/xxxxx/Python/django/MySQL-python-1.2.3c1 is being added to sys.path
>>>

Then you are still in the build directory. cd .. and try again.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Google’s App Engine goof

Friday, July 3rd, 2009

While Google’s App Engine is a well planned service and it does work incredibly well for what it does, sometimes things break due to resource limits, etc.

While the app engine platform is still running, it appears to be an issue with this particular application’s committed resources. The App Gallery has exceeded it’s memory quota.

Google App Engine App Gallery

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

User Interface Design

Wednesday, June 24th, 2009

Programmers are not designers. Technical people should not design User Interfaces.

* 810 source files
* 90658 lines of code
* 10213 lines of html

For an internal project tasked to a series of programmers throughout the years without enough oversight, it is a mass of undocumented code with multiple programming styles. PHP allowed lazy programming, Smarty didn’t have some of the finesse required, so, the User Interface suffered. Functional but confusing to anyone that hadn’t worked intimately with the interface or been walked through it.

The truest statement is that it is easier for me to do things through the MySQL command line than through the application. While this does have a tendency to introduce possible typos, it has altered SQL practices here.

update table set value=123 where othervalue=246;

could have an accidental typo of

update table set value=123 where othervalue-=246;

which would have completely unintended consequences. One typo altered the DNS entries for 48000 records. Shortly after that typo, ingrained in company policy was that I never wanted to ever see a query like that executed in the command line regardless of how simple the command.

Even within code, the above command would be entered as:

update table set value=123 where othervalue in (246);

This prevented a number of potential typos. Even limit clauses with deletions were enforced to make sure things didn’t go too haywire in an update.

With Python, indenting is mandatory which results in multiple programmer’s code looking similar and easier to troubleshoot. Utilizing SQLAlchemy which enforces bind variables when talking with the database engine, we’ve eliminated the potential for a typo updating too many records. Even cascade deletes are enforced in SQLAlchemy even when running on top of MyISAM. With MVC, our data model is much better defined and we’re not tied down to remembering the relationship between two tables and possible dependencies. Conversion from the existing MySQL database to a DeclarativeBase model hasn’t been without issues, but, a simple python program allowed the generation of a simple model that took care of most of the issues. Hand tweaking the database model while developing the application has allowed for quite a bit of insight into issues that had been worked around rather than making adjustments to the database.

Fundamental design issues in the database structure were worked around with code rather than fixed. Data that should have been retained was not, relationships between tables was defined in code rather than in the database leading to a painful conversion.

When it was decided to rewrite the application in Python using TurboGears, I wasn’t that familiar with the codebase nor the user interface. Initially it was envisioned that the templates would be copied and the backend engine would be written to power those templates. After a few hours running through the application, and attempting the conversion on a number of templates, I realized the application was functional but it was extremely difficult to use in its current state. So much for having a programmer design an interface.

Some functionality from the existing system was needed so I peered into the codebase and was unprepared for that surprise. At this point it became evident that a non-programmer had designed the interface. While Smarty was a decent template language, it was not a formtool, so, methods were designed to give a consistent user experience when dealing with error handling. A single php file was responsible for display, form submission and validation and writing to the database for each ‘page’ in the application. The code inside should have been straightforward.

* Set up default CSS classes for each form field for an ‘ok’ result
* Validate any passed values and set the CSS class as ‘error’ for any value that fails validation
* Insert/Update the record if the validation passes
* Display the page

Some validation takes place numerous times throughout the application, and, for some reason one of the ‘coders’ decided that copy and paste of another function that used that same validation code was better than writing a function to do the validation. Of course when that validation method needed to be changed, it needed to be changed in eight places.

So, what should have been somewhat simple has changed considerably:

* Evaluate each page
* Redesign each page to make the process understandable
* Adjust terminology to make it understandable to the application’s users
* modify the database model
* rewrite the form and validation

A process that should have been simple has turned into quite a bit more work than anticipated. Basically, development boils down to looking at the page, figuring out what it should be, pushing the buttons to see what they do and rewriting from scratch.

TurboGears has added a considerable amount of efficiency to the process. One page that dealt with editing a page of information was reduced from 117 lines of code to 12 lines of code. Since TurboGears uses ToscaWidgets and Formencode, validation and form presentation is removed from the code resulting in a controller that contains the code that modifies the tables in the database with validated input. Since Formencode already has 95% of the validators that are needed for this project, we can rest assured that someone else has done the work to make sure that field will be properly validated. Other validation methods can be maintained and self-tested locally, but, defined in such a manner that they are reused throughout the application rather than being cut and pasted into each model that is validating data. In addition, bugs should be much less frequent as a result of a much-reduced codebase.

Due to the MVC framework and the libraries selected by the developers at TurboGears, I wouldn’t be surprised if the new codebase is 10%-15% the size of the existing application with greater functionality. The code should be more maintainable as python enforces some structure which will increase readability.

While I am not a designer, even using ToscaWidgets and makeform, the interface is much more consistent. Picking the right words, adding the appropriate help text to the fields and making sure things work as expected has resulted in a much cleaner, understandable interface.

While there are some aspects of ToscaWidgets that are a little too structured for some pages, our current strategy is to develop the pages using ToscaWidgets or makeform to make things as clear as possible making notes to overload the Widget class for our special forms at a later date.

While it hasn’t been a seamless transition, it did provide a good opportunity to rework the site and see a number of the problems that the application has had for a long time.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Recursive Category Table in Python

Saturday, June 13th, 2009

While working with a project, the problem with the recursive category table being built came up. The table holds the parent_id of the category name and the result is a list with the id and the name of the category. The category name is prepended with spaces equivalent to the level of indentation.

model:

class AchievementCategory(DeclarativeBase):
	__tablename__ = 'achievement_categories'

	id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key = True)
	parent_id = Column(mysql.MSBigInteger(20, unsigned = True), default = 0)
	name = Column(Unicode(80))

code:

def get_cats(n = 0, c_list = [], level = 0):
	sql = DBSession.query(AchievementCategory).filter_by(parent_id = n).order_by(AchievementCategory.name).all()
	for e in sql:
		c_list.append([e.id, level * " " + e.name, level])
		get_cats(e.id, c_list, level+1)
	return c_list

print get_cats()
del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

TurboGears, Tableform and a callable option to a Widget

Wednesday, June 10th, 2009

While doing some TurboGears development I ran into an issue where I needed to generate a select field’s options from the database that was dependent on authentication. Since defining the query in the model results in a cached result when the class is instantiated, the query couldn’t be defined there. There are multiple mentions of using a callable to deal with this situation, but, no code example.

From this posting in Google Groups for TurboGears, we were able to figure out the code that made this work.

template:

<div xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      py:strip="">

${tmpl_context.form(value=value)}

</div>

controller:

    @expose('cp.templates.template')
    def form(self):
        c.form = TestForm()
        c.availips = [[3,3],[2,2]]
        return dict(template='form',title='Test Form',value=None)

model:

from pylons import c

def get_ips():
    return c.availips

class TestForm(TableForm):
    action = '/test/testadd'
    submit_text = 'Add test'

    class fields(WidgetsList):
        User = TextField(label_text='FTP Username', size=40, validator=NotEmpty())
        Password = PasswordField(label_text='FTP Password', size=40, validator=NotEmpty())
        ip = SingleSelectField(label_text="Assign to IP",options=get_ips)
del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Combined Web Site Logging splitter for AWStats

Friday, May 29th, 2009

AWStats has an interesting problem when working with combined logging. When you have 500 domains and combined logfiles at roughly 2 gigabytes a day, awstats spends a lot of time shuffling through all of the log files to return the results. The simple solution appeared to be a small python script that read the awstats config directory and split the logfile into pieces so that awstats could run on individual logfiles. It requires one loop through the combined logfile to create all of the logfiles, rather than looping through the 2 gigabyte logfile for each domain when awstats was set up with combined logging.

#!/usr/bin/python

import os,re
from string import split

dirs = os.listdir('/etc/awstats')

domainlist = {}

for dir in dirs:
  if (re.search('\.conf$',dir)):
    dom = re.sub('^awstats\.', '', dir)
    dom = re.sub('\.conf$', '', dom)
    domainlist[dom] = 1

loglist = open('/var/log/apache2/combined-access.log.1','r')
for line in loglist:
  (domain,logline) = line.split(None, 1)
  if (domain in domainlist):
    if (domainlist[domain] == 1):
      domainlist[domain] = open('/var/log/apache2/' + domain + '-access.log.1', 'w')
    domainlist[domain].write(logline)

While the code isn’t particularly earthshattering, it cut down log processing to roughly 20 minutes per day rather than the previous 16-30 hours per day.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Python, Perl and PHP interoperability with pack and unpack

Monday, April 27th, 2009

Perl has very powerful capabilities for dealing with structures.  PHP’s support of those structures was based on Perl’s wisdom.  Python went a different direction.

Perl pack/unpack definitions

PING_FORMAT => ‘(a4n2N2N/a*)@245′;
TASK_FORMAT => ‘a4NIN/a*a*’;
RETR_FORMAT => ‘a4N/a*N’;
ENCPAYLOAD_FORMAT => ‘Na*’;

PHP pack/unpack definitions

define(‘TASK_FORMAT’, ‘a4NINa*a*’);
define(“ENCPAYLOAD_FORMAT”,’Na*’);

For a communications package written in perl that communicates with 32 bit and 64 bit machines that may not share the same endian structure.  The problem I’ve run into now is that Python does not support the Perl method, and, I don’t know why they didn’t at least offer some compatibility.  pack and unpack give enormous power to communication systems between machines and their support of the perl methods allowed for reasonable interoperability between the two platforms.

Python on the other hand opted to not support some of the features, which was one issue, but, their requirement is that you cannot send variable length packets.

In Python, we’re able to replicate N, network endian Long by using !L:

>>> import struct
>>> print struct.unpack(‘!L’,'\0\0\1\0′);
(256,)

However, there is no method to support a variable length payload behind that value.  We’re able to set a fixed length like 5s, but, this means that we’ve got to know the length of the payload being sent.

>>> print struct.unpack(‘!L5s’,'\0\0\1\0abcde’);
(256, ‘abcde’)

If we overstate the size of the field, Python is more than happy to tell us that the payload length doesn’t match the length of the data.

>>> print struct.unpack(‘!L8s’,'\0\0\1\0abcde’);
Traceback (most recent call last):
File “<stdin>”, line 1, in <module>
File “/usr/lib/python2.5/struct.py”, line 87, in unpack
return o.unpack(s)
struct.error: unpack requires a string argument of length 12

The cheeseshop/pypi seems to show no suitable alternative which brings up a quandry.  For this particular solution, I’ll write a wrapper function to do the heavy lifting on the two unpack strings I need to deal with and then I’ll debate pulling the perl unpack/pack routines out of the perl source and wrapping it into an .egg, possibly for distribution.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Legacy Code Fix versus Code Rewrite

Saturday, February 28th, 2009

Python Frameworks use the DRY Method — Don’t Repeat Yourself.  That is a powerful mantra to follow when developing applications.

I am faced with a quandry.  I have an application that consists of roughly 40000 lines.  That code was written in php many years ago, with a handbuilt web framework implemented with smarty.  There are a number of issues with the existing code, including inline functions duplicated through multiple files, poorly validated input and bad structure.  Outsourcing some code development appeared to be cost effective.  In the end, the code quality churned out by that vendor was sub-par.  Maintenance of that codebase easily costs twice as much as it should.

This week, a few requirements cropped up which brought up an interesting question.  Knowing that the code consists of 40000 lines of poorly written, difficult to maintain code, I debated whether fixing the existing code would be quicker than rewriting the relevent portion and coding the addendum.  TurboGears, a python webapp framework would shrink the code considerably since it is thin middleware on top of a wsgi compliant server.

Where it took 45 lines of code to do a halfway decent job of validating a few input fields in php with smarty, the equivalent code in TurboGears consists of a model definition containing the validator segment and a few lines of code to define the page.  Updating the database becomes one line of code, replacing 8-12 lines of code.

I had planned to convert the application over to TurboGears eventually, but, the scope of this current project gives me an opportunity to convert a piece of the application over while adding the new features, and leaving the legacy code running as is.

The features I need to add will take roughly 150 lines of Python/TurboGears code, or perhaps 1500-2000 lines of php to achieve the same functionality.  I have debated using another PHP framework as a stopgap, but, I have yet to find a decent form library for it that works well.

If I had to pick a favorite, ad_form from openacs would top the list.  TurboGears and Genshi with formencode come in as a close second.

I believe rewriting the portions of the app I need to write will probably take roughly the same amount of time as it would take to patch the existing code.  The investment in time will put me closer to finishing the complete rewrite of the existing system.

An added advantage is that I can fix architectural issues with the existing package that couldn’t easily be reworked without considerable effort.  If the code you are maintaining is over five years old, you owe it to yourself to check out some of the other frameworks out there.  Prior to settling on TurboGears, I looked at Django, Catalyst, Mason and a number of other frameworks.  I even searched the PHP frameworks but didn’t find anything with the strengths I saw with TurboGears.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

If you could have it all….

Tuesday, June 17th, 2008

I’m a bit of a web performance nut.  I like technology when it is used to solve real challenges and won’t use technology for technology’s sake.  When you look at today’s scalability problems of all of the web 2.0 shops, one only needs to make one real generalization.

What is the failing point of today’s sites?  How many stories have you read in the media about some rising star that gets mentioned on yahoo or digg or slashdot?  Generally, their site crashes under the crushing load (I’ve had sites slashdotted, its not as big a deal as they would have you believe).  But, the problem we face is multifaceted.

Developer learns php.  Developer discovers MySQL.  Developer stumbles across concept.  Developer cobbles together code, buys hosting — sometimes on a virtual/shared hosting environment, sometimes on a VPS, sometimes a dedicated server.  But, the software that performs well for a few friends hitting the site and acting as beta testers is never really pushed.  While the pages look nice, the engine behind them is usually poorly conceived, or worse, designed thinking that the single server or dual server web/mysql combination is going to keep them alive.

95% of the software designed and distributed under Open Source Licenses doesn’t understand the unique challenges behind a site that needs to handle 20 visitors versus 20000 visitors per hour.  Tuning apache to handle high traffic, tuning mysql indexes and mysql’s configuration and writing applications designed for high traffic is not easy.  Debugging and repairing those applications after they’ve been deployed is even harder.  Repairing while maintaining backwards compatibility adds a whole new level of complexity.

Design with scalability in mind.  I saw a blog the other day where someone was replacing a 3 server setup behind a load balancer with a single machine because the complexity of 100% uptime made their job harder.  Oh really?

What happens when your traffic needs outgrow that one server?  Whoops, I’m back to that load balanced solution that I just left.

What are the phases that you need to look for?

Is your platform ready for 100,000 users a day?  If not, what do you need to do to make sure it is ready?  Where are your bottlenecks? Where does your software break down?  What is your expansion plan?  When do you split your mysql writers and readers?  Where does your appliction boundary start and end?  What do you think breaks next?  Where is our next bottleneck?

What happens with a digg or slashdot that crushes a site?  Usually, its a site that has all sorts of dynamic content with ill conceived mysql queries generated in realtime every pageload.  I can remember a CMS framework that did 54 sql queries to display the front page.   That is just rediculous and I dumped that framework 5 minutes after seeing that.  Pity, they did have a good concept.

So, with scalability in mind, how does one engineer a solution?  LAMP isn’t the answer.

You pick a framework that doesn’t use the usual paradigms of an application.  Why should you worry about a protocol, you should design the application divorced from the protocol.  You develop an application that faces the web rather than talking direct to the web because other applications might talk to your application.  When it comes time to scale, you add machines without having to worry about task distribution.  Google does it, you should too.

Mantissa solves that problem by being a framework that encompasses all of that.  If some of these Web 2.0 sites thought about their deployment like google did — expansion wouldn’t create much turmoil.  To grow, you just add more machines to the network.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter