Archive for the ‘Scalability’ Category

Professional MySQL Developer?

Monday, March 9th, 2009

Once in a while I get to troubleshoot problems on machines specifically with MySQL.  Sometimes that task is difficult because the vendor written code has been Zend Encoded.  Perhaps in the name of performance, possibly to protect his code, or, to prevent me from seeing just how bad his code is.  Diagnosing MySQL issues forensically requires a few scripts, patience, and time.

This time, I modified /etc/mysql/my.cnf on two machines and added the following:

log        = /var/log/mysql/mysql.log
log-slow-queries= /var/log/mysql/mysql-slow.log
log-queries-not-using-indexes

This gave me three things.  A full log of all queries executed.  I have written a small program to normalize the queries and sort them so I can focus on the queries that take 90% of the time.  I then get queries that take more than 10 seconds — which can be altered by adjusting long_query_time, and, I get any query that doesn’t use an index.  Usually I will focus on the unindexed queries as those generally cause the problems.  Very few people understand the impact of table locks and that usually is where most problems occur.

The first example comes from an application which claims to be able to support 100,000 visitors per day.  Perhaps with enough hardware behind it can, but, in its current state, on a quad core xeon, about 40,000 per day is about all they can push.

Where do we start?  So few lines, so many errors.

CREATE TABLE `iplog` (
`ip` varchar(255) NOT NULL default ”,
`pid` bigint(20) NOT NULL default ’0′,
`viewed` varchar(255) NOT NULL default ”,
`locked` tinyint(1) NOT NULL default ’0′,
PRIMARY KEY  (`ip`,`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The above schema replicates functionality in the prior version of the software.  The old version of the system didn’t do things this way, so, I am unsure why the developer chose to alter his habits and break things in so many ways.

ip varchar(255)

IPv4 currently only takes 15 spaces.  char(15) would have been more efficient taking 16 bytes to store a 15 digit IP address including the ‘.’.  Even varchar(15) would have taken 20 bytes to store the 15 digit IP address.  But varchar(255) just takes the cake.  The proper way to store an IPv4 address is unsigned int.  Even barring the fact that varchar(15) or char(15) would have been more space efficient, using an unsigned int allows the BTree that MyISAM creates to be a lot more balanced.

Then, lets add an index to the IP field.  Knowing that indexes space pad keys to their full length, the Mysql index file on the disk is actually larger than the data file itself.

I might object to the use of InnoDB on a table with questionable value.  The overhead for InnoDB on data that is valuable only for a few minutes seems like overkill, but, we’ll give some credit for data integrity.

That really wasn’t the worst offender.  Another application written by another professional programmer included roughly 80 queries to display one page.  Once you dig deeper into the code, you find queries like this:

# Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 984
SELECT count(VID) AS total_videos FROM video WHERE type=’public’ AND ( keyword LIKE ‘%keyword%’  OR keyword LIKE ‘%keyword%’ OR keyword LIKE ‘%keyword%’ OR keyword LIKE ‘%keyword%’);

First, VID is auto_increment, not null.  This particular query causes mysql to scan the results from the above query, and the verify VID to see if it is null for any row.  Due to the table construction, it cannot be null, so, SELECT count(*) would have been faster.

Second, they disassembled the input, wrapped it in % so that they could do floating matches rather than using a fulltext index which would have been much faster.  As the table only has 984 rows now, once they start growing the site, they’ll start to see more performance issues.  A fulltext index would allow for quite a bit more growth.

While the following query isn’t optimizable, the results could be cached.

# Query_time: 0  Lock_time: 0  Rows_sent: 5  Rows_examined: 155
SELECT * from channel ORDER BY rand() LIMIT 5;

As a result of the ORDER BY rand(), MySQL will need to do a full table scan each and every time this query is called.  In this case, twice for each pageload.

Another particularly interesting query:

select tradeId from trades2 where tradeDomain = ‘noref’;

Taken on its own, you can see where tradeDomain may have been omitted in the indexes.  However, the table name gives you some indication of the mess.  Here’s another query that ran without an index:

select tradeId from trades28 where tradeDomain = ‘noref’;

There are a total of 50 trades tables.  This corresponds to the number of ‘sites’ in the sites table.  It seems our developer couldn’t quite figure out how to index on two fields.

In the last 4 hours, the unindexed queries from this one site have resulted in the following:

-rw-r—–  1 mysql adm  10869491 2009-03-09 10:53 mysql-slow.log

Roughly 54000 queries in 4 hours that had no index.  Compared to yesterday’s logs when I added a number of indices, it is a world of difference.

However, fixing the issues in the code is still more expensive than throwing hardware at the problem.  The problem you run into with using hardware as a crutch is that eventually you reach an impasse where the problem must be fixed, or, you face other bottlenecks.

Based on a few changes made by adding indexes, the load on those two machines has dropped considerably.  Each of the vendors that wrote the systems that I analyzed claim to handle well more traffic than either of the machines is currently handling.  The claims were written on their web sites roughly 3 years ago, and the machines that they are currently running on a much newer and much more powerful than they were at the time the software was installed.

One of the software vendors is out of the business due to too many support issues.  One software vendor continues churning out bad code and getting people to pay $600 for it.  The third developer disappeared and was replaced by a team of two programmers who appear to know what they are doing and are releasing version 5.0 (the current, latest version is 3.1) ‘soon’.  I know it will be soon because their web site, last updated in July 2008, says they are in limited beta.

I guess it is time to order some hardware.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

A Scalable, Commodity Data Center Network Architecture

Sunday, August 24th, 2008

Looks interesting…

Abstract:
Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance.
In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages

Saturday, August 16th, 2008

Pre-generating static files is an oldy but a goody, and as Thomas Brox Røst says, it’s probably an underused strategy today. At one time this was the dominate technique for structuring a web site. Then the age of dynamic web sites arrived and we spent all our time worrying how to make the database faster and add more caching to recover the speed we had lost in the transition from static to dynamic.

Static files have the advantage of being very fast to serve. Read from disk and display. Simple and fast. Especially when caching proxies are used. The issue is how do you bulk generate the initial files, how do you serve the files, and how do you keep the changed files up to date? This is the process Thomas covers in his excellent article Serving static files with Django and AWS – going fast on a budget”, where he explains how he converted 600K thousand previously dynamic pages to static pages for his site Eventseer.net, a service for tracking academic events.

Eventseer.net was experiencing performance problems as search engines crawled their 600K dynamic pages. As a solution you could imagine scaling up, adding more servers, adding sharding, etc etc, all somewhat complicated approaches. Their solution was to convert the dynamic pages to static pages in order to keep search engines from killing the site. As an added bonus non logged-in users experienced a much faster site and were more likely to sign up for the service.

The article does a good job explaining what they did, so I won’t regurgitate it all here, but I will cover the highlights and comment on some additional potential features and alternate implementations…

read more

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

This Word, “Scaling”

Monday, June 30th, 2008

It seems that everyone on the blogosphere, including Divmod, is talking about “scaling” these days.  I’d like to talk a bit about what we mean ­— and by “we” I mean both the Twisted community and Divmod, Inc., — when we talk about “scaling”.

[more]

One of the issues that I continually deal with is the scaling of an application and its platform.  I believe this author makes the point rather well.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Planning to fail

Thursday, June 19th, 2008

Years ago, I was taught that to avoid downtime, you bought the best router, the best switches, the best computers, multithousand dollar machines, the best storage servers and put everything together…. and then hoped.

Somewhere along the line, during a phone call at 3am talking with a tech from NetApp regarding a shelf failure, it seemed that the strategy had failed.  NetApp did dispatch before we even knew the shelf had failed — their system sent an email out with the status, NetApp had a shelf couriered to the facility and the phone call I received was from the facility asking if they should let the NetApp technician in.

NetApp handled the problem exceedingly well, but, we didn’t have 100% availability because the shelf failed and knocked out multiple disks in the array.  While NetApp did suggest we would have 100% uptime with the solution, based on the facts, I don’t believe there is any way we could have had 100%.  As it stood, 99.95% was what we ended up with.

The data center we were located in at the time had two 30+ minute unplanned power outages.  Multiple power feeds, single vault construction.  Another hosting company recently had a multi-day outage because they had a single power vault for their data center.  Even more ironic, a transformer failure 3 years earlier should have taught them that lesson.

So, what does one do?

HP used to want to sell really expensive hardware as did Sun when chasing five nines, 99.999%.  Google took a different approach using COTS (Commercial Off The Shelf) hardware.

Google planned to fail.  In fact, their entire distributed computer design handles failures very gracefully.  When a node crashes, the disk storage from that machine is replicated elsewhere on the network, its tasks are handed to other nodes.  This gives Google a distinct advantage.  Because their network is designed with cheaper hardware, they can put more of it online.  With more equipment comes more CPU capacity and this allows Google to do expensive calculations that other companies can only dream of.

You can buy one dual Xeon quadcore machine that is expensive and reliable for $4500, or, you can buy 3 dual Xeon quadcore machines for $4500.  Three machines on Google’s distributed computer will provide a primary and two backups for the same price, however, all three machines are available to provide Google with CPU and Disk space for the same price as the more expensive machine.

Because Google engineered their network expecting failure, their solution is obviously very robust and approaches the holy grail of five nines.  Geographically dispersed, fault tolerant network, fault tolerant distributed computer ensures you get the results you’re looking for quickly… every time.

Did Google start their mission thinking this way, or did they hire the right thinkers?  Its certainly helped our thought processes as we engineer solutions for our clients.

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

FaceStat’s Rousing Tale of Scaling Woe and Wisdom Won

Monday, June 9th, 2008

Lukas Biewald shares a fascinating slam by slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo’s main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper ready experience. Theo Schlossnagle eerily prophesized about such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat’s situation.

This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load.

How did our fine FaceStat fellowship fair against Yahoo’s onslaught?

read more

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

LinkedIn Architecture

Wednesday, June 4th, 2008

LinkedIn is the largest professional networking site in the world. LinkedIn employees presented two sessions about their server architecture at JavaOne 2008. This post contains a summary of these presentations.

Key topics include:

  • Up-to-date statistics about the LinkedIn user base and activity level
  • The evolution of the LinkedIn architecture, from 2003 to 2008
  • “The Cloud”, the specialized server that maintains the LinkedIn network graph
  • Their communication architecture
del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter

Biggest Under Reported Story: Google’s BigTable Costs 10 Times Less than Amazon’s SimpleDB

Saturday, May 31st, 2008

Why isn’t Google’s aggressive new database pricing strategy getting more pub? That’s what Bill Katz, instigator of the GAE Meetup and prize winning science fiction author is wondering:

It's surprising that the blogosphere hasn't picked up the biggest difference in pricing:
Google's datastore is less than a tenth of the price of Amazon's SimpleDB while offering a better API.

If money matters to you then the burn rate under GAE could be convincingly lower. Let’s compare the numbers:

read more

del.icio.us Digg Facebook Google Yahoo Buzz StumbleUpon Twitter