Archive for the ‘Programming Languages’ Category

Legacy Code Fix versus Code Rewrite

Saturday, February 28th, 2009

Python frameworks embrace the DRY principle: Don't Repeat Yourself.  That is a powerful mantra to follow when developing applications.

I am faced with a quandary.  I have an application of roughly 40,000 lines, written in PHP many years ago on a hand-built web framework using Smarty templates.  There are a number of issues with the existing code, including inline functions duplicated across multiple files, poorly validated input, and bad structure.  Outsourcing some of the development appeared to be cost-effective, but in the end the code quality churned out by that vendor was sub-par.  Maintaining that codebase easily costs twice what it should.

This week, a few requirements cropped up that raised an interesting question.  Knowing that the application consists of 40,000 lines of poorly written, difficult-to-maintain code, I debated whether fixing the existing code would be quicker than rewriting the relevant portion and coding the addition.  TurboGears, a Python web framework, would shrink the code considerably, since it is thin middleware on top of a WSGI-compliant server.

Where it took 45 lines of code to do a halfway decent job of validating a few input fields in PHP with Smarty, the equivalent in TurboGears is a model definition containing the validators and a few lines of code to define the page.  Updating the database becomes one line of code, replacing 8-12 lines.
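To make the comparison concrete, here is a stdlib-only sketch of the declarative style of field validation that libraries like formencode enable.  The field names and rules are hypothetical, and this is not the actual application's schema; the point is that each field's rule is stated once, in one place, instead of being hand-coded per page:

```python
import re

# Each field's rule lives in one declarative table, DRY-style,
# instead of being repeated as inline checks on every page.
RULES = {
    'email': re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$'),
    'zipcode': re.compile(r'^\d{5}$'),
}

def validate(form):
    """Return a dict mapping each failing field to an error message."""
    errors = {}
    for field, pattern in RULES.items():
        value = form.get(field, '')
        if not pattern.match(value):
            errors[field] = 'invalid %s' % field
    return errors

errors = validate({'email': 'user@example.com', 'zipcode': '1234'})
```

A real formencode schema adds coercion and nested structures on top of this, but the shape is the same: declare once, apply everywhere.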

I had planned to convert the application over to TurboGears eventually, but the scope of this project gives me an opportunity to convert one piece of the application while adding the new features, leaving the legacy code running as is.

The features I need to add will take roughly 150 lines of Python/TurboGears code, versus perhaps 1,500-2,000 lines of PHP for the same functionality.  I have debated using another PHP framework as a stopgap, but I have yet to find one with a decent form library that works well.

If I had to pick a favorite form library, ad_form from OpenACS would top the list.  TurboGears and Genshi with formencode come in a close second.

I believe rewriting the portions of the app I need to change will take roughly the same amount of time as patching the existing code, and the investment puts me closer to finishing the complete rewrite of the existing system.

An added advantage is that I can fix architectural issues in the existing package that couldn't be reworked without considerable effort.  If the code you are maintaining is over five years old, you owe it to yourself to check out some of the newer frameworks.  Before settling on TurboGears, I looked at Django, Catalyst, Mason, and a number of others.  I even surveyed the PHP frameworks but didn't find anything with the strengths I saw in TurboGears.

Concurrent processing

Thursday, February 5th, 2009

It has been a while since I've written parallel or concurrent processing code.  Threaded programming is something that even the experts who wrote Apache and PHP have problems with, yet writing this type of code is somewhat enjoyable.

It started a few years ago when I replaced 112k lines of C code and libraries that never quite did what they were supposed to.  I communicated the idea to the coders, the coders wrote the design document, and they delivered a product that didn't even do what their own design document specified.

The code was scrapped and rewritten in Perl, and now comprises about 1,200 lines of code, not counting the CPAN libraries used.  The new code is faster, more reliable, and largely agnostic about its work: it has more capabilities but leaves more of the logic to the tasks it passes around, which lets the core focus on communications and dispatch.

Initial testing was rather thorough, informed by the bugs and issues encountered during the previous version's reign.  While we've run into minor glitches with the new code, it is considerably more reliable, to the point where it is now tasked with more work.  The dispatch method was rewritten, but concurrent task collisions weren't tested nearly enough.

And therein lies the problem.  The previous system accepted a task, opened a connection to the remote machine, and waited until the task completed.  Collisions couldn't occur because each task held its own connection for its entire lifetime.  For short tasks this wasn't a real issue, but longer tasks risked the socket timing out, and if 15 tasks were sent, 15 connections remained open until they all completed.

The replacement system hands off the task but doesn't wait for it to complete; the remote machine handles its packet and returns the task results.  The problem is that multiple tasks queued for the same machine occasionally collide.  Task order isn't important, but sometimes a task is fetched twice, or a task is missed and left in the queue.  A task left in the queue is simply redispatched, but the double-fetch issue has been difficult to debug.  Add the slightest amount of debugging code and, voilà, tasks are dispatched properly under every test that can be thrown at it.  Remove the debugging code and the error returns.

The file locking for the state machine has been double- and triple-checked, but I'm sure that once I dig into it, I'll find some logic error that leaves a stale lock or incorrectly clears one.
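This isn't the Perl code in question, but one common way to make a file-based queue immune to double fetches is to lean on an atomic rename rather than a separate lock: if two workers race for the same task, exactly one rename succeeds.  A minimal Python sketch, with hypothetical directory names:

```python
import os

def claim_task(queue_dir, work_dir, task_name):
    """Atomically claim a queued task file by renaming it into work_dir.

    On POSIX, a rename within one filesystem is atomic, so if two
    workers race for the same task, exactly one rename succeeds; the
    loser gets an OSError instead of a second copy of the task.
    """
    src = os.path.join(queue_dir, task_name)
    dst = os.path.join(work_dir, task_name)
    try:
        os.rename(src, dst)
        return dst    # we own the task now
    except OSError:
        return None   # another worker claimed it first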

I remember I used to love writing code like this, though I always dreaded debugging it.

If you could have it all….

Tuesday, June 17th, 2008

I’m a bit of a web performance nut.  I like technology when it solves real challenges, and I won’t use technology for technology’s sake.  When you look at the scalability problems of today’s Web 2.0 shops, one generalization covers nearly all of them.

What is the failing point of today’s sites?  How many stories have you read in the media about some rising star that gets mentioned on Yahoo, Digg, or Slashdot?  Generally, their site crashes under the crushing load (I’ve had sites slashdotted; it’s not as big a deal as they would have you believe).  But the problem we face is multifaceted.

Developer learns PHP.  Developer discovers MySQL.  Developer stumbles across a concept.  Developer cobbles together code and buys hosting, sometimes on a virtual/shared hosting environment, sometimes on a VPS, sometimes on a dedicated server.  But the software that performs well for a few friends acting as beta testers is never really pushed hard.  While the pages look nice, the engine behind them is usually poorly conceived, or worse, designed on the assumption that a single server or a dual web/MySQL setup will keep the site alive.

95% of the software designed and distributed under open source licenses doesn’t account for the unique challenges of a site that needs to handle 20,000 visitors per hour rather than 20.  Tuning Apache for high traffic, tuning MySQL’s indexes and configuration, and writing applications designed for high traffic is not easy.  Debugging and repairing those applications after they’ve been deployed is even harder, and repairing them while maintaining backwards compatibility adds a whole new level of complexity.
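For a sense of the knobs involved, the tuning in question looks roughly like the fragments below.  The directives are real Apache (prefork) and MySQL settings of the era, but the values are placeholders, not recommendations; the right numbers depend entirely on your memory, traffic, and workload:

```
# httpd.conf (Apache prefork) -- placeholder values
MaxClients        256
KeepAlive         On
KeepAliveTimeout  2

# my.cnf -- placeholder values
[mysqld]
key_buffer_size   = 256M
max_connections   = 300
query_cache_size  = 64M
```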

Design with scalability in mind.  I saw a blog the other day where someone was replacing a three-server setup behind a load balancer with a single machine because the complexity of maintaining 100% uptime made their job harder.  Oh really?

What happens when your traffic outgrows that one server?  Whoops, I’m back to that load-balanced solution I just left.

What are the phases that you need to look for?

Is your platform ready for 100,000 users a day?  If not, what do you need to do to make sure it is?  Where are your bottlenecks?  Where does your software break down?  What is your expansion plan?  When do you split your MySQL writers and readers?  Where does your application boundary start and end?  What do you think breaks next?  Where is your next bottleneck?
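Splitting writers and readers is mostly a routing decision: sends writes to the primary and spread reads across replicas.  A minimal sketch of the idea, with placeholder host names (a real router also has to account for replication lag and read-after-write consistency):

```python
class ConnectionRouter:
    """Sketch of MySQL read/write splitting: writes go to the primary,
    reads round-robin across the replicas.  Host names are placeholders."""

    def __init__(self, writer, readers):
        self.writer = writer
        self.readers = readers
        self._next = 0

    def route(self, sql):
        # Anything that isn't a SELECT is treated as a write.
        if sql.lstrip().upper().startswith('SELECT'):
            host = self.readers[self._next % len(self.readers)]
            self._next += 1
            return host
        return self.writer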

What happens when a Digg or Slashdot crushes a site?  Usually it’s a site with all sorts of dynamic content built from ill-conceived MySQL queries generated in real time on every pageload.  I can remember a CMS framework that ran 54 SQL queries to display the front page.  That is just ridiculous, and I dumped that framework five minutes after seeing it.  A pity; they did have a good concept.
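The standard antidote to a 54-query front page is to cache the rendered result so the queries run once per interval instead of once per hit.  A toy sketch of the idea (any of memcached, APC, or plain files would serve as the real store):

```python
import time

class PageCache:
    """Tiny TTL cache sketch: an expensive page (say, one assembled from
    dozens of queries) is regenerated at most once per `ttl` seconds."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self._store = {}   # key -> (rendered page, timestamp)

    def get_or_render(self, key, render):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[1] < self.ttl:
            return entry[0]           # still fresh: skip the queries
        page = render()               # the expensive path
        self._store[key] = (page, time.time())
        return page
```

Under a traffic spike, this turns thousands of identical query storms per minute into one.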

So, with scalability in mind, how does one engineer a solution?  LAMP isn’t the answer.

You pick a framework that doesn’t use the usual paradigms of an application.  Why worry about a protocol?  Design the application divorced from the protocol.  You develop an application that faces the web rather than talking directly to the web, because other applications might want to talk to your application too.  When it comes time to scale, you add machines without having to worry about task distribution.  Google does it; you should too.

Mantissa solves that problem by being a framework that encompasses all of this.  If some of these Web 2.0 sites thought about their deployment the way Google did, expansion wouldn’t create much turmoil.  To grow, you just add more machines to the network.

Rails… ugh.

Tuesday, June 17th, 2008

While I am not a fan of Ruby, and even less of Rails, there is a new project that does seem to raise the bar.  I am always concerned about application performance, and Rails usually falls pretty flat when hit with the thundering herd; Passenger does appear to remedy that, at least partially.

Turbogears 1.x

Tuesday, June 17th, 2008

TurboGears looks to be one of the next great frameworks.  While Django has a little more maturity, TurboGears understands MVC quite well and is quick and easy to work with.
