Posts Tagged ‘php’

Our best practices regarding a web hosting environment

Monday, July 9th, 2012

Over the years we’ve had to deal with persistent security scans from hosts around the world probing whether our installations were secure. After witnessing a competitor implode this morning as the result of a hack, I’m writing up a few of our best practices for virtual and dedicated web hosting. When he called, he was ready to close up shop and offered to let us clean up; we fixed quite a bit, but he still has a lot of work to do.

SSH

Some of the worst offenders are the SSH bots that come in and try 60,000 password combinations on your machine. Since SSH requires a secure handshake, each attempt uses a bit of CPU. As a result, when you have 200 IP addresses on a machine and the script works through them sequentially, you’re dealing with 12 million authentication attempts.

The first thing we did was limit our machines to answer SSH only on the primary IP address, which cut our exposure tremendously. You could also move SSH to an alternate port. If you do move it, make sure to use a port number lower than 1024: ports above 1024 can be bound by unprivileged users, and you don’t want someone who has crashed the SSH daemon (via the OOM killer, for instance) to start their own in its place. Accidentally accepting a new key when you log in under pressure results in a compromised account. Due to a number of issues, we left SSH answering on port 22, but used IPRECENT (the iptables ‘recent’ match) to allow six new connections from an IP address within a minute before that IP address is blocked. An attacker could work around this by making one authentication attempt every 12 seconds, or by using multiple proxy servers, but it is usually easier for them to let the automated scan run and move on when it doesn’t find anything quickly enough.

/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --set
/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 6 -j DROP

Another possibility is using Fail2ban. Fail2ban can report back to a central server and understands numerous SSH daemon log formats. Personally, I prefer IPRECENT for its ease of use with any service, without worrying about which daemon is running or how it logs, and for the fact that it is self-healing: blocks expire on their own.
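
If you do go with Fail2ban, a minimal SSH jail mirrors what the IPRECENT rules above do. This is a sketch assuming a Debian-style jail.local; the log path, thresholds, and ban time will vary by system:

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 6
bantime  = 600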

Depending on your client needs, you could also use port knocking: port 22 stays closed unless the firewall first sees SYN packets on two specific ports. Both an unlock and a lock sequence can be added, and it can be configured to open only for the IP address you’re currently connecting from. Handy if you’re on the road and need to connect in but don’t want to leave your connection wide open.
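
If you go this route, knockd is one way to implement it. A minimal sketch, assuming a default-drop rule already protects port 22; the knock sequences below are made up:

[options]
        logfile = /var/log/knockd.log

[openSSH]
        # two SYNs, in order, within 5 seconds opens port 22 for the knocking IP
        sequence    = 7000,8000
        seq_timeout = 5
        tcpflags    = syn
        command     = /sbin/iptables -I INPUT -s %IP% -p tcp --dport 22 -j ACCEPT

[closeSSH]
        # reverse sequence removes the rule
        sequence    = 8000,7000
        seq_timeout = 5
        tcpflags    = syn
        command     = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT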

POP3/IMAP

POP3 receives a ton of attempts, more so than IMAP, but both are usually served by the same daemon. You can use the IPRECENT rule above to limit connections, but some of the scan scripts are smart enough to pipeline requests over a single connection. Adjust your daemon to return failures after three attempts without checking the backend, and the scanner just collects a string of failed attempts.
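
The IPRECENT pattern above can be adapted for POP3 as well. A sketch; the --name option gives these rules their own tracking list so they don’t share state with the SSH rules:

/sbin/iptables -A INPUT -p tcp --dport pop3 -i eth0 -m state --state NEW -m recent --name pop3 --set
/sbin/iptables -A INPUT -p tcp --dport pop3 -i eth0 -m state --state NEW -m recent --name pop3 --update --seconds 60 --hitcount 6 -j DROP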

Make sure you support TLS, and consider disabling plaintext authentication entirely; almost every email client out there can use SSL/TLS connections.

FTP

For a while, we limited FTP to the primary machine IP due to a number of scans. IPRECENT doesn’t work well here either, as some FTP clients will open 10-50 connections to transfer data more quickly. Choosing a lightweight FTP daemon with some protections built in makes a difference. You can also have only the primary IP answer and set up a CNAME pointing ftp.theirdomain.com at the machine’s hostname to avoid some confusion. Plain FTP passes data over the wire unencrypted, so you should offer explicit FTP over TLS or, if possible, implicit FTPS; most FTP client software understands both.

www

With SQL injections, ZmEu scans, and everyone else looking for vulnerabilities and exploits, there are a number of things that can be done. The problem is that once attackers find a vulnerability, exploit code is usually left on the server. That code might run attacks against other machines, send spam, or give them a remote shell.

One method of finding out what is being executed, without Suhosin actually breaking things, is to run it in simulation mode with the following patch (around line 1600) in execute.c:

        } else if (SUHOSIN_G(func_blacklist) != NULL) {
                if (zend_hash_exists(SUHOSIN_G(func_blacklist), lcname, function_name_strlen+1)) {
                        suhosin_log(S_EXECUTOR, "function within blacklist called: %s()", lcname);
                      /*goto execute_internal_bailout;*/
                }
        }

You want to comment out the goto execute_internal_bailout; line so that a blacklist hit logs to syslog rather than blocking execution, because simulation mode does not actually behave as a simulation here. This way, you can add commands to your blacklist, run in simulation mode, and see what is actually being executed.

If you’ve never built it: untar it, phpize, ./configure, make (verify that it built sanely), make install, modify the config file, restart Apache, and check the error log for segfaults.
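
For reference, the build boils down to something like this; the version number is illustrative and the config file location varies by distribution:

tar xzf suhosin-0.9.33.tgz
cd suhosin-0.9.33
phpize
./configure
make              # verify that it built sanely
make install      # then add extension=suhosin.so to your PHP config and restart Apache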

Some somewhat sane defaults:

suhosin.simulation = on
suhosin.executor.func.blacklist = include,include_once,require_once,passthru,eval,system,popen,exec,shell_exec,preg_replace,mail
suhosin.log.syslog.facility = 5
suhosin.log.syslog.priority = 1

Why preg_replace?

There are a number of ways to execute a command, but PHP allows the e PCRE modifier, which evals the replacement string after substitution. The hex escapes below decode to eval(''), so nothing in the call looks like eval at first glance:

preg_replace("/.*/e", "\x65\x76\x61\x6C\x28\x27\x27\x29", "input"); // replacement string decodes to eval('')

You can run mod_security, but so many software developers have problems with it that the default .htaccess shipped with many applications simply disables mod_security altogether. It isn’t really security if it gets disabled.

One issue with WordPress is that it handles 404s itself, but it has to do a fair amount of work before it can even determine that a request is a 404. ZmEu scans multiple IP addresses and hammers away with a few thousand requests, most of which 404 when a WordPress site answers on the bare IP.

The quick solution is to create VirtualHost entries for the bare IPs that point somewhere harmless, e.g. a parking page.

#!/usr/bin/python

import os

DEFAULT_PATH = '/var/www/uc'

# Emit a VirtualHost stanza for every non-loopback IP bound to this machine
ips = os.popen('/sbin/ifconfig -a|grep "inet addr"|cut -f 2 -d ":"|cut -f 1 -d " "|grep -v 127.0.0.1')
for ip in ips:
    print """
<VirtualHost %s:80>
ServerName %s
DocumentRoot %s
</VirtualHost>""" % (ip.strip(), ip.strip(), DEFAULT_PATH)

Now, when ZmEu or any of the other scanners comes along, rather than hammering away at a WordPress or Joomla site, it is handed a parking page and the 404s can be handled more sanely. Briefly, this works because ZmEu scans the IP without sending a hostname in most cases. Once scanners start using hostnames, this method won’t provide as much utility.

Another issue years ago was having DocumentRoot set to /var/www while domains lived in /var/www/domain.com, /var/www/domainb.com, and so on. domain.com and domainb.com would each have a /cgi-bin/ directory set up with ScriptAlias, but if you visited the machine on a bare IP that didn’t match a VirtualHost, scanners could walk the DocumentRoot. Files saved in /cgi-bin/ that should have been executed usually wouldn’t execute there and would be handed back with Content-Type: text/plain, exposing data files. Make sure DocumentRoot in the base configuration points to a directory that is not the parent of multiple domain directories.

.htaccess

I don’t know how this one started or why it ever happened, but many of the tools on the internet that generate an .htaccess to password protect a directory contain one fatal flaw. I covered this in more depth in another post, but briefly, the site is protected with the following:

AuthUserFile .htpasswd
AuthName "Protected Area"
AuthType Basic

<Limit GET POST>
require valid-user
</Limit>

Since the directory is only protected against GET and POST requests, every other request method bypasses authentication. As long as PHP is serving the pages, a request using the verb GETS, or almost anything other than the two listed, will return results without ever being authenticated.
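
The fix is to drop the <Limit> wrapper entirely so the authentication requirement applies to every request method, and, while you’re in there, point AuthUserFile at an absolute path (the one below is hypothetical):

AuthUserFile /home/username/.htpasswd
AuthName "Protected Area"
AuthType Basic
Require valid-user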

WordPress

One of the most popular pieces of software our clients use is WordPress, followed by Joomla. Both have numerous updates throughout the year, bundling features with security patches, which makes upgrading somewhat painful for people. A recent pair of releases, 3.4 and 3.4.1, fixed security issues but broke a number of themes, causing a bit of pain for clients. Joomla isn’t immune either, having shipped multiple security upgrades bundled with features that break backwards compatibility.

If you run multiple WordPress sites on a single machine, take a look at this post which contains code to upgrade WordPress sites from the command line.

Plugins are the second issue. WordPress plugins often bundle third-party code, and when that bundled code has a security update, the plugin that bundled it frequently doesn’t get updated. timthumb.php comes to mind: we found it bundled in numerous plugins and had to do a little grep/awk magic to replace every copy. One plugin updated itself, but overwrote timthumb.php with an exploitable version, causing all sorts of discontent.
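
Something along these lines will locate every bundled copy so each can be checked against the current release; a sketch, assuming the sites live under /var/www:

find /var/www -name timthumb.php -exec grep -H "VERSION" {} \;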

SetUID versus www-data

I’ve got a much longer post regarding this. Suffice it to say that no one will ever agree one way or the other, but I feel that limiting what a site is able to write to on disk is better than allowing the webserver to write over the entire site. With www-data mode, an attacker who compromises a WordPress site may only be able to write files to the theme directory or the upload directory, eliminating the ability to damage most of the rest of the site. Joomla, however, leaves the FTP password in the clear in a config file: once a Joomla site running in www-data mode has been hacked, the FTP password has been exposed and can be used to modify the rest of the site.

Ok, so the site has been hacked, now what?

Restore backups and keep on running? Regrettably, that is what most hosting companies do. If you have done any of the above, particularly the Suhosin patch, you can look through the logs to see what was executed; even code run through Zend Optimizer will trigger a syslog entry, so you can at least see what may be happening. If you run in www-data mode, new .php files owned by www-data are prime suspects, especially in directories that shouldn’t contain executable code. Over the years, I’ve attempted to get WordPress to disable .php execution in the uploads directory, to no avail. Since a remote exploit may allow attackers to save a file, wp-content/uploads is a common dumping ground for files that later give them a remote shell. Sometimes they’ll get creative and put a file at /wp-content/uploads/2012/05/blueberry.jpg.php if you have a file named blueberry.jpg.
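
Since WordPress won’t do it for you, you can do it at the webserver level. A sketch for mod_php, placed in the main server configuration (the path is illustrative); anything under uploads is then served as plain content rather than executed:

<Directory /var/www/site/wp-content/uploads>
    # refuse to run PHP from the uploads tree
    php_admin_flag engine off
</Directory>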

There are a number of things to look for: modified times (though hackers sometimes reset file timestamps to match other files in the directory), .php files where they shouldn’t be, and so on. There are other tricks as well; sometimes they will modify .htaccess to run a .jpeg as .php, so you can’t always rely on extensions. If they can’t upload a remote exploit directly, they may be able to upload a .jpg that actually contains PHP code, which can then be remotely included from another exploited site. Since PHP doesn’t check the MIME type on an include, exploit code could be sitting on your server.

Even mime type validation of uploads isn’t always enough.

Javascript exploits are some of the toughest to ferret out of a site. Viewing the source of a document usually shows the pre-rendered version, and if the exploit was snuck onto the end of jquery.1.7.min.js, the page can be modified after it has loaded. Firefox has a ‘View Generated Source’ option which makes this a little easier to track down. Be prepared for some obfuscation, as the code will often be lightly encrypted. A Live HTTP Headers plugin can also give you a hostname to grep the contents of the site for.

One of the trickier WordPress exploits used wp_options _transient_feed_ variables to store exploit code. Cleaning that up was tricky to say the least: the payload was gzipped, rot13’d, and base64 encoded, stored as an option variable, with a loader inserted into a plugin very cleanly.

Some of the code found in the template:

$z = get_option("_transient_feed_8130f985e7c4c2eda46e2cc91c38468d");
$z = base64_decode(str_rot13($z));
if (strpos($z, "2A0BBFB0") !== false) {

and in wp_options:

| 2537 | 0 | _transient_feed_8130f985e7c4c2eda46e2cc91c38468d | s:50396:"nJLbVJEyMzyhMJDbWmWOZRWPExVjWlxcrj0XVPNtVROypaWipy9lMKOipaEcozpbZPx7VROcozyspzImqT9lMFtvp2SzMI9go2E

The trick to a cleanup is to be meticulous in your analysis. File dates, file ownership, what was changed, and what logging exists all provide clues. Sometimes you won’t find the original exploit on the first runthrough and will have to install a bit more logging.

If you still don’t have a good idea of what happened, you can do something like this (the .htaccess directives below assume mod_php):

.htaccess:

php_value include_path ".:/var/www/php"
php_value auto_prepend_file postlog.php

postlog.php

<?php
// Log every POST request (skipping our own monitoring IPs) for later forensics.
if ( (isset( $HTTP_RAW_POST_DATA ) || !empty( $_POST )) &&
     (strpos($_SERVER['REMOTE_ADDR'], '11.222.') === FALSE) ) {

    if ( !isset( $HTTP_RAW_POST_DATA ) ) {
        $HTTP_RAW_POST_DATA = file_get_contents( 'php://input' );
    }

    $buffer = "Date: " . date('M/d/Y H:i') . "\nSite: {$_SERVER[ 'HTTP_HOST' ]}\nURL: {$_SERVER[ 'REQUEST_URI' ]}\nPOST request from: {$_SERVER[ 'REMOTE_ADDR' ]}\n\nPOST DATA:: " . print_r( $_POST, 1 ) . "\nCOOKIES: " . print_r( $_COOKIE, 1 ) . "\nHTTP_RAW_POST_DATA: $HTTP_RAW_POST_DATA\nSERVER Data:" . print_r( $_SERVER, 1 ) . "\n----------\n";
    $tmp = explode('/', $_SERVER[ 'SCRIPT_NAME' ]);
    $fn = array_pop($tmp);

    // One log file per day per script name
    $fh = fopen('/home/username/postlog/' . date('Ymd') . '.' . $fn, 'a+');
    fwrite($fh, $buffer);
    fclose($fh);
}
?>

What this does is log every POST request to a file, one file per day per script. If the site gets exploited again, you have a forensic log to go back through.

The single biggest thing I can say is: keep your applications updated. I know most webhosts don’t pay attention to what is running on their machines, but it is usually easier to prevent things from breaking than to fix them after they’ve broken.

My competitor? They’re on hour 72 of cleanup, since they don’t maintain two generations of weekly backups.

Nginx to Apache?

Sunday, May 23rd, 2010

A few months ago we had a client who wanted to run Nginx/FastCGI rather than Apache because Nginx was known to be faster. While we’ve had extensive experience performance-tuning various webserver combinations, the proposed workload would really have been better served by Apache. We inherited this problem from another hosting company (he moved because they couldn’t fix the performance issues), and he maintained that Nginx/FastCGI was the fastest way to serve PHP because of all the benchmarks that had been run on the internet.

While converting from one server to another is usually painful, much of the pain can be avoided by running Apache on an alternate port, testing, then swapping the configurations around. The graph below shows when we changed from Nginx to Apache:

We made the conversion from Nginx to Apache on Friday. Once we made the conversion, there were issues with the machine, which was running an older kernel. After reviewing the workload, we migrated from 2.6.31.1 with the Anticipatory scheduler to 2.6.34 with the Deadline scheduler. Three other machines had been running 2.6.33.1 with the CFQ scheduler and showed no issues at the 10mb/sec mark, but we felt we should benchmark his workload using Deadline. We’ve run a number of high-end webservers with both Anticipatory and CFQ prior to 2.6.33, and for most of our workloads Anticipatory seemed to win. With 2.6.33, Anticipatory was removed, leaving NOOP, CFQ and Deadline. While we have a few MySQL servers running Deadline, this is probably the first heavy-use webserver that we’ve moved from CFQ/AS to Deadline.
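
For what it’s worth, you can check and flip the elevator on a running machine before committing to a kernel; a sketch, assuming the disk is sda:

cat /sys/block/sda/queue/scheduler      # the bracketed entry is active, e.g.: noop deadline [cfq]
echo deadline > /sys/block/sda/queue/scheduler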

The dips in the daily graph occurred while a cron job was running; the two final dips were during the kernel installation.

All in all, the conversion went well. The machine never really appeared to be slow before, but it is obviously handling more traffic now. The load averages are roughly the same as they were, and CPU utilization is roughly the same, but, more importantly, disk I/O is about half what it was and system time now hovers around 3-4%. During the hourly cron job, the machine no longer has the issues it had before.

Nginx isn’t always the best solution. In this case, 100% of the traffic is an 8k-21k PHP response served to each visitor; static content is served from another machine running Nginx.

While I do like Nginx, it is always best to use the right tool for the job. In this case, Apache happened to be the right tool.

Why do you use an Object Relational Mapping (ORM) System in Development?

Monday, October 12th, 2009

Here’s a programmer that is saying goodbye to ORMs at Hatful of Hollow.

And here’s another site offering a tutorial of sorts dealing with ORMs: Why should you use an ORM.

While both have their points, both have missed a fundamental benefit that an ORM hands you.

Most of my development is in Pylons; Django’s ORM and template language can do the same thing. A programmer who has used PHP/Smarty to develop large-scale systems will likely resist ORMs. After working with a team that developed 90k+ lines of PHP/Smarty over a six-year period, making the move required a paradigm shift.

Let’s consider the following structure. We have a cp_ticket table and a cp_ticket_detail table. A Ticket can have multiple detail records. The output we wish to have is:

ticket id, ticket header information
         ticket detail line
         ticket detail line #2
ticket id, ticket header information
         ticket detail line
         ticket detail line #2
         ticket detail line #3
ticket id, ticket header information
         ticket detail line
         ticket detail line #2

Our model:

# Imports assumed for a Pylons-era SQLAlchemy (0.5.x) setup;
# DeclarativeBase comes from the project's model (declarative_base()).
from sqlalchemy import Column, ForeignKey, PassiveDefault
from sqlalchemy.databases import mysql
from sqlalchemy.orm import relation

class cp_ticket(DeclarativeBase):
    __tablename__ = 'cp_ticket'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    client_id = Column(mysql.MSBigInteger(20, unsigned = True), default = '0')  # referenced by the query below
    priority = Column(mysql.MSEnum('1','2','3','4','5'), default = '3')

    ticket_detail = relation('cp_ticket_detail', order_by='cp_ticket_detail.ticket_detail_id')

class cp_ticket_detail(DeclarativeBase):
    __tablename__ = 'cp_ticket_detail'

    ticket_id = Column(mysql.MSBigInteger(20, unsigned = True), ForeignKey('cp_ticket.ticket_id'), default = '0')
    ticket_detail_id = Column(mysql.MSBigInteger(20, unsigned = True), primary_key=True, autoincrement = True)
    stamp = Column(mysql.MSTimeStamp, PassiveDefault('CURRENT_TIMESTAMP'))
    detail = Column(mysql.MSLongText, default = '')

Our query to pass to our template:

        tickets = meta.Session.query(cp_ticket).filter(cp_ticket.client_id==1).all()

Compared with the query as you would write it without an ORM:

select * from cp_ticket,cp_ticket_detail where client_id=1 and cp_ticket.ticket_id=cp_ticket_detail.ticket_id;

Both are doing fundamentally the same thing, but the ORM maps the results almost exactly the way we want to display the data, which makes template design easy.

Using Mako, we use the following code to display the results:

<table border="1">
 <tr><th>Ticket ID</th><th>Status</th><th>Detail</th></tr>
%for ticket in tmpl_context.tickets:
  <tr>
    <td><strong>${ticket.ticket_id}</strong></td>
    <td><strong>${ticket.priority}</strong></td>
    <td></td>
  </tr>
  %for detail in ticket.ticket_detail:
  <tr>
    <td></td>
    <td>${detail.stamp}</td>
    <td>${detail.detail}</td>
  </tr>
  % endfor
% endfor
</table>

To do the same thing without an ORM, you need to revert to a control-break structure similar to the following:

current_ticket = 0
for ticket in tickets:                      # rows from the flat SQL join above
    if current_ticket != ticket.ticket_id:
        # new ticket, print the header row
        print "<tr><td>first piece</td></tr>"
        current_ticket = ticket.ticket_id
    # print the detail row
    print "<tr><td></td><td>stamp and detail</td></tr>"

Control-break structures require you to be able to set a variable within your template language, and some template languages don’t allow that. If your template language (in any language) can’t do variable assignment in the template, guess where your HTML generation logic needs to go: back into your application code.

With an ORM, the template contains your display logic. Your webmaster/design team can modify the template without having to touch HTML embedded in your code, the loops are simple to understand, and designers usually have little problem avoiding the lines that start with %.

Sure, you could wrap much of this logic in your template to do the control-break structure, but as the data gets more complex, deciding how to display it requires a define or some other functionality.

An ORM adds some insulation to the process, but the result is a much simpler page structure when displaying related data. Granted, there are some performance hits, and SQLAlchemy appears to generate some queries that are not optimal, but unless there is a tremendous performance penalty, I think the benefits of an ORM for developing a web application are well worth it.

Once you move into an environment with multiple developers, having a defined schema with comments is much easier than using reflection to figure out the meaning of a status field declared as enum(‘U’,’A’,’P’,’C’,’R’,’S’).

However, as the original poster mentions, you can run raw SQL within SQLAlchemy and do all of your work with reflection, as he has done with his ORM^H^H^H abstraction. If he’s still using SQLAlchemy, he can selectively decide when to use the ORM and when to avoid it.
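
For what it’s worth, raw SQL can share the same session. A sketch against the ticket tables above; the bind-parameter style is what I’d expect SQLAlchemy of this vintage to accept:

from sqlalchemy import text

# flat joined rowset, same data the template example would have to control-break over
rows = meta.Session.execute(
    text("select * from cp_ticket, cp_ticket_detail "
         "where client_id = :client_id "
         "and cp_ticket.ticket_id = cp_ticket_detail.ticket_id"),
    {'client_id': 1}).fetchall()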

Rapid Application Development using Turbogears and Django

Saturday, August 8th, 2009

For the last 14 months, we’ve been developing an application to replace 90,000 lines of PHP code. Rewriting the application from scratch to support I18N and many of the enhancements it needed was deemed the better long-term solution.

When that project was first started, I spent a month with Django and a month with TurboGears writing the same test application so that I could compare the development cycles. Both have matured since, and I needed a rapid turnaround on another project, so I decided to give Django another look: it had hit version 1.0 and added quite a few features that were missing in my preliminary evaluation. What follows is a discussion of the major points of both frameworks.

TurboGears has excellent form handling. Except for the form wizard in Django, working with forms is much easier in TurboGears, and validation of the form and the resulting database update methods are much cleaner. Django, while slightly more complex in handling form input, does handle things within a single function, which might enhance readability in a large project.

Database handling through SQLAlchemy in TurboGears is much better than the database methods in Django. Yes, you can use SQLAlchemy in Django now, but Django’s default ORM has plenty of room for improvement.

TurboGears is true MVC; its terminology and methods are true to the paradigm set forth by Smalltalk. Django is MVC as well, but they call it MTV: Model, Template, View. The differences are slight, the developers of both have made good decisions, and adapting to either project’s methods is quick and not a hindrance.

Django definitely wins on authentication/authorization. Methods to handle registration, user creation, login, and forgotten passwords are built in and wrapped with very nice templates that can be overridden. For TurboGears, repoze.who and repoze.what have been pulled from Plone and put into place; while TurboGears works with repoze, the decisions made and the lack of full support behind it make it difficult to implement.

Django feels faster. Comparing 14 months of development in TurboGears on one application to an application written in Django this week, the template engine, database access, and page load times all seemed faster. Django is a lighter-weight framework and you are closer to the data; TurboGears adds a little more insulation, which makes some coding easier at the expense of performance.

Maintainability of code goes to TurboGears. IBM once stated that the maximum number of bug-free lines of code that could be written was 23. With TurboGears, much of the heavy lifting is handled by widgets, decorators, and your model, resulting in a smaller codebase. Django requires more code to do the same task unless you utilize some of the snippets. TurboGears makes certain assumptions and wraps many of the libraries that make development easy into the default installation; Django’s default installation lacks those decisions, but you are not prevented from establishing your own middleware. If you were developing a number of Django projects, you would pick and choose snippets to replicate the decisions that TurboGears has already made.

URL mapping is much easier in Django. While Routes support is present in TurboGears, Django’s regexp-based mapping is much easier to manipulate.
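
To illustrate, a Django 1.0-era urls.py; the app and view names are hypothetical:

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    # named groups in the regexp become keyword arguments to the view
    (r'^tickets/$', 'helpdesk.views.ticket_list'),
    (r'^tickets/(?P<ticket_id>\d+)/$', 'helpdesk.views.ticket_detail'),
)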

Community: hands down, Django wins. With a much larger installed base, bugs are found and fixed much more quickly. While TurboGears was founded on loftier principles, execution and follow-through are lacking; development happens when the core group of developers needs something for a client project, and there is a definite air of condescension when the project developers handle questions from potential developers. With Django, there are people of all experience levels willing to help on groups.google and IRC, and thorough documentation that far exceeds most of the open source documentation out there.

Documentation: again, Django. Well organized, well thought out, and well documented examples on Django’s part show dedication to having a framework that is well understood and well used. TurboGears recently moved to Sphinx, but documentation generated from poorly documented code still means poor documentation. The tutorials and examples have been improving, but they have a long way to go.

Genshi and Mako are both supported fairly well by TurboGears and are both very good engines; Jinja, which is a bit faster than Genshi, is also supported and is powerful and very easy to work with. Django’s template language is likewise flexible, powerful, and easy to work with. Django has a definite advantage in having a simpler language, but neither Django nor TurboGears stood out as a clear winner.

If you were looking to write an extremely database- or form-heavy site, I think TurboGears would be a good solution. If you choose TurboGears, be prepared to delve into the code when you face problems: bugs are not dealt with very promptly even though upgrades are pushed out, so be prepared to patch upgraded packages or hold certain packages once you move into production.

On the other hand, if you are writing a less complicated application, Django would be a good choice.

All told, the application developed this week in Django took about 12 hours, and had I been working with Django for the last 14 months, I would estimate it would have taken roughly 8 hours. In TurboGears, I probably could have written it in 6. In PHP, mimicking all of the functionality would have taken 14-16 hours, even using one of the numerous frameworks.

There is a definite advantage to using a framework, and the Python frameworks do appear to offer rapid application development even over most of the PHP frameworks. For me, new development will probably be done in Django unless there is a strong case for TurboGears.

* Turbogears 2.0
* Django
* Django Snippets

User Interface Design

Wednesday, June 24th, 2009

Programmers are not designers. Technical people should not design User Interfaces.

* 810 source files
* 90658 lines of code
* 10213 lines of html

For an internal project tasked to a series of programmers over the years without enough oversight, it is a mass of undocumented code in multiple programming styles. PHP allowed lazy programming and Smarty didn’t have some of the finesse required, so the user interface suffered: functional, but confusing to anyone who hadn’t worked intimately with it or been walked through it.

The truest statement is that it is easier for me to do things through the MySQL command line than through the application. While that has a tendency to introduce typos, it has also altered our SQL practices.

update table set value=123 where othervalue=246;

could have an accidental typo of

update table set value=123 where othervalue-=246;

which would have completely unintended consequences. One such typo altered the DNS entries for 48,000 records. Shortly afterward, it became ingrained in company policy that I never wanted to see a query like that executed on the command line again, regardless of how simple the command.

Even within code, the above command would be entered as:

update table set value=123 where othervalue in (246);

This prevented a number of potential typos. LIMIT clauses were also enforced on deletions to make sure things didn’t go too haywire in an update.

With Python, indenting is mandatory, which results in multiple programmers’ code looking similar and being easier to troubleshoot. By utilizing SQLAlchemy, which enforces bind variables when talking to the database engine, we’ve eliminated the potential for a typo updating too many records. Cascade deletes are enforced by SQLAlchemy even when running on top of MyISAM. With MVC, our data model is much better defined, and we’re not tied to remembering the relationships between tables and their dependencies. Conversion from the existing MySQL database to a DeclarativeBase model hasn’t been without issues, but a simple Python program generated a first-pass model that took care of most of them. Hand-tweaking the database model while developing the application has provided quite a bit of insight into issues that had previously been worked around rather than fixed.

Fundamental design issues in the database structure had been worked around with code rather than fixed. Data that should have been retained was not, and relationships between tables were defined in code rather than in the database, leading to a painful conversion.

When it was decided to rewrite the application in Python using TurboGears, I wasn’t that familiar with the codebase or the user interface. Initially it was envisioned that the templates would be copied and a backend engine would be written to power them. After a few hours running through the application and attempting the conversion on a number of templates, I realized the application was functional but extremely difficult to use in its current state. So much for having a programmer design an interface.

Some functionality from the existing system was needed, so I peered into the codebase and was unprepared for the surprise; at this point it became evident that a non-programmer had designed the interface. While Smarty was a decent template language, it was not a form tool, so methods had to be designed to give a consistent user experience when dealing with error handling. A single PHP file was responsible for display, form submission, validation, and writing to the database for each ‘page’ in the application. The code inside should have been straightforward:

* Set up default CSS classes for each form field for an ‘ok’ result
* Validate any passed values and set the CSS class as ‘error’ for any value that fails validation
* Insert/Update the record if the validation passes
* Display the page

Some validation takes place numerous times throughout the application, and for some reason one of the ‘coders’ decided that copying and pasting another function that contained the same validation code was better than writing a single validation function. Of course, when that validation method needed to change, it had to be changed in eight places.

So, what should have been somewhat simple has changed considerably:

* Evaluate each page
* Redesign each page to make the process understandable
* Adjust terminology to make it understandable to the application’s users
* Modify the database model
* Rewrite the form and validation

A process that should have been simple has turned into quite a bit more work than anticipated. Basically, development boils down to looking at a page, figuring out what it should be, pushing the buttons to see what they do, and rewriting it from scratch.

TurboGears has added a considerable amount of efficiency to the process. One page that dealt with editing a page of information went from 117 lines of code to 12. Since TurboGears uses ToscaWidgets and FormEncode, validation and form presentation are removed from the code, leaving a controller that modifies the tables in the database with validated input. Since FormEncode already has 95% of the validators this project needs, we can rest assured that someone else has done the work to make sure each field is properly validated. Custom validators can be maintained and self-tested locally, but defined in such a manner that they are reused throughout the application rather than cut and pasted into each model that validates data. In addition, bugs should be much less frequent as a result of a much-reduced codebase.
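
As a small example of the reuse, a FormEncode schema can be defined once and attached to every form that needs it. A sketch; the field names are hypothetical:

import formencode
from formencode import validators

class TicketForm(formencode.Schema):
    # ignore and discard fields we didn't declare (e.g. the submit button)
    allow_extra_fields = True
    filter_extra_fields = True
    subject  = validators.String(not_empty=True, max=100)
    email    = validators.Email(not_empty=True)
    priority = validators.OneOf(['1', '2', '3', '4', '5'])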

Due to the MVC framework and the libraries selected by the TurboGears developers, I wouldn’t be surprised if the new codebase ends up 10%-15% the size of the existing application with greater functionality. The code should also be more maintainable, as Python enforces some structure, which increases readability.

While I am not a designer, even using ToscaWidgets and makeform the interface is much more consistent. Picking the right words, adding appropriate help text to the fields, and making sure things work as expected has resulted in a much cleaner, more understandable interface.

While there are some aspects of ToscaWidgets that are a little too structured for some pages, our current strategy is to develop the pages using ToscaWidgets or makeform to make things as clear as possible, making notes to overload the Widget class for our special forms at a later date.

While it hasn’t been a seamless transition, it did provide a good opportunity to rework the site and see a number of the problems that the application has had for a long time.
