Archive for the ‘Web Infrastructure’ Category

User Interface Design

Wednesday, June 24th, 2009

Programmers are not designers. Technical people should not design User Interfaces.

* 810 source files
* 90658 lines of code
* 10213 lines of HTML

An internal project handed to a series of programmers over the years without enough oversight, it is a mass of undocumented code in multiple programming styles. PHP allowed lazy programming, Smarty lacked some of the finesse required, and so the user interface suffered: functional, but confusing to anyone who hadn't worked intimately with it or been walked through it.

The truest statement of the problem is that it is easier for me to do things through the MySQL command line than through the application. Working that way invites typos, though, and it has changed our SQL practices here.

update table set value=123 where othervalue=246;

could have an accidental typo of

update table set value=123 where othervalue-246;

MySQL accepts this without complaint: the expression othervalue-246 is treated as a boolean and is true for every row where othervalue is anything other than 246, so nearly the entire table gets updated. One such typo altered the DNS entries for 48,000 records. Shortly after that typo, it became ingrained company policy that I never wanted to see a query like that executed on the command line again, regardless of how simple the command.

Even within code, the above command would be entered as:

update table set value=123 where othervalue in (246);

This prevents a whole class of potential typos. LIMIT clauses were also required on deletes to make sure a single mistake couldn't go too haywire.

With Python, indenting is mandatory, which makes code from multiple programmers look similar and easier to troubleshoot. By using SQLAlchemy, which always sends bind variables to the database engine, we've eliminated the possibility of a typo updating too many records. Cascade deletes are enforced by SQLAlchemy even when running on top of MyISAM. With MVC, our data model is much better defined, and we're no longer relying on memory for the relationships and dependencies between tables. Converting the existing MySQL database to a DeclarativeBase model hasn't been without issues, but a small Python program generated a rough model that took care of most of them. Hand-tweaking the database model while developing the application has given quite a bit of insight into issues that had previously been worked around rather than fixed in the database.
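As a rough illustration of where this ends up (the table names, columns and connection string below are hypothetical, not our actual schema), the conversion script generates models along these lines, and any update issued through them goes out with bound parameters:

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Zone(Base):
    __tablename__ = 'zones'
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    # The relationship lives in the model rather than in someone's memory;
    # the ORM-level cascade applies even when the tables are MyISAM.
    records = relationship('DnsRecord', cascade='all, delete-orphan', backref='zone')

class DnsRecord(Base):
    __tablename__ = 'dns_records'
    id = Column(Integer, primary_key=True)
    zone_id = Column(Integer, ForeignKey('zones.id'), nullable=False)
    value = Column(String(255))

engine = create_engine('mysql://user:password@localhost/example')  # placeholder DSN
Session = sessionmaker(bind=engine)
session = Session()

# The ORM equivalent of "update ... where othervalue in (246)": the 246 is sent
# as a bound parameter, so a stray character can't widen the WHERE clause.
session.query(DnsRecord).filter(DnsRecord.id.in_([246])).update(
    {DnsRecord.value: '123'}, synchronize_session=False)
session.commit()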

Fundamental design issues in the database structure had been worked around in code rather than fixed. Data that should have been retained was not, and relationships between tables were defined in code rather than in the database, which made for a painful conversion.

When it was decided to rewrite the application in Python using TurboGears, I wasn't very familiar with the codebase or the user interface. Initially the plan was to copy the templates and write a new backend engine to power them. After a few hours of running through the application and attempting the conversion on a number of templates, I realized the application was functional but extremely difficult to use in its current state. So much for having a programmer design an interface.

Some functionality from the existing system was still needed, so I peered into the codebase and was unprepared for the surprise waiting there. At this point it became evident that a non-programmer had designed the interface. While Smarty was a decent template language, it was not a form tool, so methods had been invented to give a consistent user experience around error handling. A single PHP file was responsible for display, form submission, validation and writing to the database for each 'page' in the application. The code inside should have been straightforward:

* Set up default CSS classes for each form field for an ‘ok’ result
* Validate any passed values and set the CSS class to 'error' for any value that fails validation
* Insert/Update the record if the validation passes
* Display the page

Some validations occur numerous times throughout the application, and for some reason one of the 'coders' decided that copying and pasting from another function that used the same validation code was better than writing a single function to do the validation (something along the lines of the sketch below). Of course, when that validation logic needed to change, it had to be changed in eight places.
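The alternative is trivial with FormEncode: define the validator once and import it everywhere it's needed. A minimal sketch, with the validator and field names made up for illustration:

from formencode import FancyValidator, Invalid, Schema, validators

class HostnameValidator(FancyValidator):
    # Written once and imported by every form that accepts a hostname,
    # instead of pasting the same checks into eight places.
    def _to_python(self, value, state):
        value = value.strip().lower()
        if not value or len(value) > 255:
            raise Invalid('Invalid hostname', value, state)
        return value

class AddRecordSchema(Schema):
    hostname = HostnameValidator(not_empty=True)
    ttl = validators.Int(min=60, max=86400)

class EditRecordSchema(Schema):
    hostname = HostnameValidator(not_empty=True)  # same validator, reused
    value = validators.String(not_empty=True)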

So, what should have been somewhat simple has changed considerably:

* Evaluate each page
* Redesign each page to make the process understandable
* Adjust terminology to make it understandable to the application’s users
* Modify the database model
* Rewrite the form and validation

What should have been a simple port has turned into quite a bit more work than anticipated. Development basically boils down to looking at a page, figuring out what it should be, pushing the buttons to see what they do, and rewriting it from scratch.

TurboGears has added a considerable amount of efficiency to the process. One page that dealt with editing a page of information went from 117 lines of code to 12. Since TurboGears uses ToscaWidgets and FormEncode, validation and form presentation are removed from the code, leaving a controller that simply writes validated input to the database tables. FormEncode already supplies 95% of the validators this project needs, so we can rest assured that someone else has done the work of making sure each field is properly validated. The remaining validators can be maintained and tested locally, but defined in such a way that they are reused throughout the application rather than being cut and pasted into every model that validates data. In addition, bugs should be much less frequent simply because the codebase is so much smaller.
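For a sense of what those 12 lines look like, here is a sketch of a TurboGears 2 controller; the form, model objects and template name are hypothetical stand-ins, not the real application's. Presentation lives in the ToscaWidgets form, validation in FormEncode, and the controller only writes validated data:

from tg import expose, validate, redirect, tmpl_context
from myproject.lib.base import BaseController     # standard quickstart location
from myproject.model import DBSession, Record     # hypothetical model objects
from myproject.widgets import record_form         # hypothetical ToscaWidgets form

class RecordController(BaseController):

    @expose('myproject.templates.record_edit')
    def edit(self, record_id, **kw):
        # Hand the widget to the template; field layout, CSS classes and
        # error display all come from the widget, not from this controller.
        tmpl_context.form = record_form
        record = DBSession.query(Record).get(record_id)
        return dict(value=record)

    @validate(record_form, error_handler=edit)
    @expose()
    def save(self, record_id, **kw):
        # By the time we get here, FormEncode has validated everything in kw,
        # so all that's left is writing the validated values.
        DBSession.query(Record).filter_by(id=record_id).update(kw)
        redirect('/records')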

Between the MVC framework and the libraries the TurboGears developers selected, I wouldn't be surprised if the new codebase ends up at 10%-15% of the size of the existing application, with greater functionality. The code should also be more maintainable, since Python enforces some structure, which improves readability.

While I am not a designer, even just using ToscaWidgets and makeform the interface is much more consistent. Picking the right words, adding appropriate help text to the fields, and making sure things work as expected has resulted in a much cleaner, more understandable interface.

There are some aspects of ToscaWidgets that are a little too rigid for certain pages, so our current strategy is to build the pages with ToscaWidgets or makeform to make things as clear as possible, noting where we'll need to subclass the Widget class for our special forms at a later date.

It hasn't been a seamless transition, but it has provided a good opportunity to rework the site and surface a number of problems the application has had for a long time.

Nginx impresses yet again

Wednesday, April 22nd, 2009

The first three machines went without a hitch.  Another client machine was having issues with Apache performance; it was still running prefork rather than the mpm-worker/FastCGI PHP setup we normally use when a machine needs that extra push.

The client's application could be modified quickly to change the URLs used for images, so we ran nginx in more of a Content Delivery Network capacity: it overlaid their static image directories, so a tiny change to their code let the images be served by nginx while the rest of their code ran untouched on Apache.

I am amazed Apache held up as well as it did.  Within minutes of the conversion, Apache dropped from 740 active processes to roughly 300.  During normal peak times Apache is still handling about 400 processes, but the machine now has roughly 2GB of disk cache, up from about 600MB when running pure Apache.  That alone has to be helping things considerably.

Two minor issues showed up in the logs; both were fixed by raising ulimit -n for nginx and bumping worker_connections:

events {
    # raised from the 1024 we had been running with
    worker_connections  8192;
}

With those two changes, the machine has performed flawlessly.  Even with worker_connections at 1024 we only got a handful of warnings, and only in times of extreme traffic.

The load has dropped, the machine has much more idle CPU time, and it appears to have hit a new traffic record today.

Apache, Varnish, nginx and lighttpd

Wednesday, April 1st, 2009

I've never been happy with Apache's performance.  It always seemed to have problems with high-volume sites.  Even extremely tweaked configurations performed decently only up to a point, after which more hardware was required to keep going.  I had been a huge fan of Tux, but sadly Tux doesn't work well with Linux 2.6 kernels.

So the search was on.  I've used many webservers over the years, from AOLServer to Paster to Caudium, looking for a robust, high-performance solution.  I've debated caching servers in front of Apache, or a separate server just for static files with the sites coded to use it, but I never really found the ultimate solution for these particular requirements.

The current problem is a PHP-driven site with roughly 100 page elements plus the generated page itself.  The site receives quite a bit of traffic, and we've had to tweak Apache considerably from our default configuration to keep the machine performing well.

Apache can be run many different ways.  Generally, when a site uses PHP we'll run mod_php because it is faster.  eAccelerator can help sometimes, though it creates a few small problems of its own; in general, Apache-mpm-prefork runs quite well.  On sites where we've had issues with traffic, we've switched over to Apache-mpm-worker with a FastCGI PHP process.  That works quite well even though PHP scripts run slightly slower.

After considerable testing, I came up with three decent metrics to judge by.  Almost all testing was done with ab (ApacheBench) running 10000 requests with keepalives and 50 concurrent sessions, from a dual quad-core Xeon machine to a gigE-connected Core 2 Quad machine on the same switch.  The first IP on the test machine ran bare Apache, the second lighttpd, the third nginx, and the fourth Varnish in front of Apache.  Everything was set up so that no daemons needed to be restarted between tests.  Each test was run twice, and the second result, generally the higher of the two, was used: the Linux kernel does some caching, and we're after the performance once the kernel has warmed its caches and Apache has forked its processes and hasn't yet killed off the children.

First impressions of Apache-mpm-prefork were that it handled PHP exceedingly well but has never had great performance with static files.  This is why Tux prevailed for us: Apache handled what it did best and Tux handled what it did best.  Regrettably, Tux didn't keep up with the 2.6 kernel and development ceased.  With new hardware, the 2.6 kernel and userland access to sendfile, large file transfers should be nearly identical across all of these servers, so startup latency on the tiny files is what really seemed to hurt Apache.  Apache-mpm-worker with PHP running as FastCGI has always been a fallback for us to gain a little more serving capacity, as most sites have a relatively high ratio of static to dynamic content.

But Apache seemed to have problems with the type of traffic our clients put through it, and we felt there had to be a better way.  I've read page after page of people complaining that their Drupal installation could only take 50 users until they moved to nginx or lighttpd and their site stopped running into swap.  If your server has problems with 50 simultaneous users on Apache, you have serious problems with your setup.  It is not uncommon for us to push a P4/3.0GHz with 2GB of RAM to 80mb/sec of traffic with MySQL running 1000 queries per second, with an Apache logfile that reaches 6GB/day for one domain, not counting the other 30 domains configured on the machine.  vBulletin will easily run 350 online users and 250 guests on the same hardware without any difficulty, and the same goes for Joomla, Drupal and the other CMS products out there.  If you can't run 50 simultaneous users with any of those products, dig into the configs FIRST so that you are comparing a tuned configuration to a tuned configuration.  Here is the status from one such MySQL server:

Uptime: 593254  Threads: 571  Questions: 609585858  Slow queries: 1680967  Opens: 27182  Flush tables: 1  Open tables: 2337  Queries per second avg: 1027.529


Based on all of my reading, I expected Varnish in front of Apache 2 to be the fastest, followed by nginx, lighttpd and bare Apache.  Lighttpd has some design decisions that I believed would put it behind nginx, and I really expected Varnish to do well.  For this client we needed FLV streaming, so I knew I would be running nginx or lighttpd as the backend for the .flv files, and I contemplated running Varnish in front of whichever of those performed best.  Splitting things so that the .flv files were served from a different domain was no problem for this client, so we weren't locked into a solution we couldn't change later.

The testing methodology was based on numerous runs of ab while I tested and tweaked each setup.  I am reasonably sure that someone with vast knowledge of Varnish, nginx or lighttpd would not be able to change the results substantially.  Picking out the three or four valid pieces of information from all of that testing, to give a generalized result, was the difficult part.

The first test measured raw speed on a small 6.3KB file with keepalives enabled; that was a good starting point.  The second test requested a page that simply called phpinfo(); not an exceedingly difficult test, but it does at least start the PHP engine, process a page and return the result.  The third test downloaded a 21MB .flv file.  All of the tests were run with 10000 requests and 50 concurrent connections, except the 21MB .flv test, which ran 100 requests at 10 concurrent connections because of the time it took.
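Those three runs are easy to script; this is just a convenience wrapper around ab to make the parameters above concrete (the URLs are placeholders, and the original testing was simply repeated ab invocations by hand):

import re
import subprocess

# Placeholder URLs; each run was pointed at the IP bound to the server being measured.
TESTS = [
    ('small static file', 'http://192.0.2.10/small.html', 10000, 50),
    ('phpinfo() page',    'http://192.0.2.10/info.php',   10000, 50),
    ('21MB .flv file',    'http://192.0.2.10/video.flv',    100, 10),
]

for name, url, requests, concurrency in TESTS:
    # -k enables keepalives, matching the methodology described above.
    result = subprocess.run(
        ['ab', '-k', '-n', str(requests), '-c', str(concurrency), url],
        capture_output=True, text=True)
    match = re.search(r'Requests per second:\s+([\d.]+)', result.stdout)
    print('%-18s %s requests/sec' % (name, match.group(1) if match else 'n/a'))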

| Server             | Small file (req/sec) | phpinfo() (req/sec) | .flv throughput | Min/max time to serve .flv | Total ab time, .flv test |
|--------------------|----------------------|---------------------|-----------------|----------------------------|--------------------------|
| Apache-mpm-prefork | 1000                 | 164                 | 11.5 MB/sec     | 10-26 seconds              | 182 seconds              |
| Apache-mpm-worker  | 1042                 | 132                 | 11.5 MB/sec     | 11-25 seconds              | 181 seconds              |
| Lighttpd           | 1333                 | 181                 | 11.4 MB/sec     | 13-23 seconds              | 190 seconds              |
| nginx              | 1800                 | 195                 | 11.5 MB/sec     | 14-24 seconds              | 187 seconds              |
| Varnish            | 1701                 | 198                 | 11.3 MB/sec     | 18-30 seconds              | 188 seconds              |

Granted, I expected more from Varnish, though its caching nature does shine through.  It is considerably more powerful than nginx thanks to some of its internal features for load balancing, multiple backends and so on.  However, based on the results above, I have to believe that in this case nginx wins.

There are a number of things about the nginx documentation that were confusing.  The first was the use of an inet (TCP) socket rather than a local Unix socket to talk to the php-cgi process; changing that alone bumped PHP up by almost 30 requests per second.  The nginx documentation is sometimes very terse, and it took a bit more time to get things configured correctly.  While I do have both PHP and Perl CGI working natively under nginx, some Perl CGI scripts still have minor issues that I'm working out.

Lighttpd performed about as well as I expected.  Some of its backend design decisions made me believe it wouldn't be the top performer.  It is also older and more mature than nginx and Varnish, which use today's tricks to accomplish their magic.  File transfer speed is going to be capped at roughly the same point for all of them, because the Linux kernel exposes APIs that let a userspace application ask the kernel to handle the transfer, and every application tested takes advantage of this.

Given the choice of Varnish or nginx for a project that didn't require .flv streaming, I might consider Varnish.  Lighttpd did have one very interesting module that prevents hotlinking of files in a much different manner than usual; I'll be testing that for another application.  If you are used to Apache mod_rewrite rules, nginx and Lighttpd use a completely different structure for rewrites, though the two of them work in almost the same manner as each other with minor syntax changes.  Varnish runs as a cache in front of your site, so everything works the same way it does under Apache: Varnish merely connects to your Apache backend and caches what it can, and its configuration language allows considerable control over the process.

Short of a few minor configuration tweaks, this particular client will be getting nginx.

Overall, I don't believe you can take a one-size-fits-all approach to webservers.  Every client's requirements are different, and they don't all fit into the same category.  If you run your own web server, you can make choices to ensure your site runs as well as it can.  Judging from the number of pages showing stellar performance gains from switching away from Apache, if most of those writers had spent the same time tuning their Apache installation as they did migrating to a new web server, I imagine 90% of them would have found that Apache met their needs just fine.

The default, out-of-the-box configuration of MySQL and Apache in most Linux distributions leaves a lot to be desired.  Comparing those configurations against the saner defaults supplied by the developers of competing products doesn't give a fair comparison.  I use Debian, and its default configurations for Apache, MySQL and a number of other applications are terrible for any sort of production use.  Even Red Hat ships fairly poor default configurations for many of the applications you would use to serve your website.  Do yourself a favor and do a little performance tuning on your current setup before you start making changes.  You might find the time well spent.

Documentation Redux

Friday, June 27th, 2008

156 hours.

That's what it took to track down a solution to a problem with some open source software.  The software was written in the early 2000s; the last documentation update was in 2006.  The scenario we were designing for was documented on a page written in 2004.

The issue we ran into must be something others have stumbled over, because it involves a very basic piece of this software's operation.  Yet after perusing all of the available documentation, using Google to find any possible references, and combing through every FAQ, the committed code, the mailing lists and so on, the solution finally presented itself on a page last updated in January of 2000.

A three line mention.

That’s it.

The FAQ written in 2004 describes the process and documents every step, save for one very important part.  That part lives in a three-line mention in another FAQ: an email, coincidentally written by the same FAQ author, that was sent to a mailing list and ended up included in someone else's FAQ.

This is the inherent cost in Open Source.

We're not using this software in an odd manner; in fact, the feature we were trying to use is one of its three fundamental uses.  The software hasn't changed much in the last four years, but it just goes to show that documentation is easily forgotten in the open source world.

Would I have it any other way?  No.  I prefer open source because we can develop solutions that give us a competitive edge, and, if we need to, we can change the code to fix problems that the developers won’t fix.

Oftentimes our requirements are driven by a business case, which puts us at odds with some of the more purist open source coders.

Open Source Documentation, what?

Sunday, June 22nd, 2008

The technology is there to make it very easy for an open source project to document itself: wikis, blogs, web access to revision control.  Yet documentation is usually done as an afterthought or, worse, left up to people who may not completely understand the product.

I have written technical documentation many times, and I evaluate a lot of open source projects to see whether they fit into our organization.  I can tell you that a large percentage of the documentation for open source projects is extremely bad.

A project we recently started to evaluate had configs in its wiki-powered web documentation that directly contradicted the company's responses on its own mailing list.  It took me thirty seconds to correct the documentation to reflect the right information, but the mailing list post, three months old and directly referencing the incorrect wiki page, had never prompted an update.

Open source project maintainers: if you want people to use your product, you MUST provide good documentation.  Sample config files with quick comments about usage are a start if you're not going to document the required configuration completely, but a project with little to no documentation will not get adopted by the masses.

While I appreciate the fact that the coders don’t like to write documentation, if you are going to publish a project and expect people to use it, take some time to write some documentation.  When someone suggests changes or makes modifications to the wiki, be receptive rather than adversarial.

Your project will succeed much more quickly.

Also, if you offer a commercial support package, be aware that while I'm beta testing your software I am also testing your support team's attitude.  I know that the same people hanging out on IRC, monitoring the mailing list and responding to bug and feature inquiries are the people I'll be contacting for support.  Treat me wrong and I'll find another solution.

Monetizing GPLed software isn't easy; I know that.  But make it easy for those of us who will end up relying on your solution, and who are willing to pay for a support contract, to get the support we need.
