Ahh, the dreaded temporary table and filesort. This performance killer is incredibly bad on a high-traffic site, and the cause is fairly easy to explain.
MySQL tries to keep a result set in memory. When the query plan optimizer checks the number of rows that might be returned, it also looks at the table structure. In this case, we have text fields, and a lot of them. However, it only takes one for MySQL to decide to write to disk: in the MySQL versions of this era, in-memory temporary tables use the MEMORY storage engine, which cannot hold TEXT or BLOB columns.
The fix for this is somewhat simple to explain, but may be a little difficult to implement. In our case, we have 91,000 lines of some very poorly written PHP code that ‘builds’ the command through string concatenation, allowing for unique prefixes and table names. Houdini would be proud of the misdirection in this application, but we’ve found the query through the MySQL slow query log, so we can fix it there and then figure out where to modify the code.
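If the slow query log is not already on, something like the following should turn it on at runtime. This is a minimal sketch, assuming MySQL 5.1 or later (older versions set this in my.cnf instead) and an account with the SUPER privilege; the 2-second threshold is an arbitrary value chosen for illustration, not the one used on this site.

```sql
-- Turn on the slow query log without restarting the server (MySQL 5.1+).
SET GLOBAL slow_query_log = 'ON';

-- Log anything slower than 2 seconds (assumed threshold; tune to taste).
-- The new global value applies to connections opened after this point.
SET GLOBAL long_query_time = 2;

-- Find out which file the server is writing the log to.
SHOW VARIABLES LIKE 'slow_query_log_file';
```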
Heart of the problem
select * from tablea,tableb where tablea.a=1 and tablea.b=2 and tablea.c=3 and tablea.id=tableb.id;
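To check whether a query like this one is spilling to disk, it can help to look at the plan and at the session's temporary table counters. A rough sketch, assuming you can run the query by hand in a mysql client session and have the RELOAD privilege for FLUSH STATUS:

```sql
-- Reset the session status counters so they reflect only what follows.
FLUSH STATUS;

-- Inspect the plan. "Using temporary; Using filesort" in the Extra
-- column is the warning sign described above.
EXPLAIN select * from tablea,tableb
where tablea.a=1 and tablea.b=2 and tablea.c=3 and tablea.id=tableb.id;

-- Run the real query here, then compare the counters.
-- Created_tmp_disk_tables going up means a temporary table hit the disk.
-- SHOW STATUS uses an in-memory temporary table of its own, so
-- Created_tmp_tables alone is not a reliable signal.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```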
Of course, the initial application had no indexes on the 35,000-row table. If you’re interested, I’ve written some blog posts about indexing: MySQL Query Optimization, MySQL 5.1’s Query Optimizer, and Designing MySQL Indices.
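For the query above, a composite index on the three filtered columns is the obvious candidate. A minimal sketch; the index name idx_a_b_c is made up for illustration, and it assumes id is already the primary key on both tables:

```sql
-- Composite index covering the three equality filters in the WHERE clause.
-- idx_a_b_c is a hypothetical name; use whatever fits your naming scheme.
ALTER TABLE tablea ADD INDEX idx_a_b_c (a, b, c);

-- Confirm the index exists.
SHOW INDEX FROM tablea;
```

An index helps MySQL find the rows, but on its own it does not stop the text columns from forcing the temporary table onto disk.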
What is the solution to dealing with queries that return text fields?
Creative use of subqueries is needed.
SELECT * from tablea,tableb where tablea.id in (SELECT id from tablea where a=1 and b=2 and c=3) and tablea.id=tableb.id;
But wait, I need a limit clause in my subselect and MySQL says:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Now we modify the query slightly to join against the subquery as a derived table, where a limit clause is allowed:
select * from tablea join (SELECT id from tablea where a=1 and b=2 and c=3 order by c desc limit 15) subq on subq.id=tablea.id join tableb on tablea.id=tableb.id;
We’ve avoided the on-disk temporary table, we’ve avoided the filesort, and we’ve shaved ten seconds off this query, which runs on every page load.
Now to convince this person that they don’t need to regenerate the page on every page load, only when they are adding content. But that’s an argument for another day.