{"id":577,"date":"2009-03-09T11:54:56","date_gmt":"2009-03-09T15:54:56","guid":{"rendered":"http:\/\/cd34.com\/blog\/?p=577"},"modified":"2009-03-09T11:56:58","modified_gmt":"2009-03-09T15:56:58","slug":"professional-mysql-developer","status":"publish","type":"post","link":"https:\/\/cd34.com\/blog\/scalability\/professional-mysql-developer\/","title":{"rendered":"Professional MySQL Developer?"},"content":{"rendered":"<p>Once in a while I get to troubleshoot problems on machines specifically with MySQL.\u00a0 Sometimes that task is difficult because the vendor written code has been Zend Encoded.\u00a0 Perhaps in the name of performance, possibly to protect his code, or, to prevent me from seeing just how bad his code is.\u00a0 Diagnosing MySQL issues forensically requires a few scripts, patience, and time.<\/p>\n<p>This time, I modified \/etc\/mysql\/my.cnf on two machines and added the following:<\/p>\n<blockquote><p>log\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 = \/var\/log\/mysql\/mysql.log<br \/>\nlog-slow-queries= \/var\/log\/mysql\/mysql-slow.log<br \/>\nlog-queries-not-using-indexes<\/p><\/blockquote>\n<p>This gave me three things.\u00a0 A full log of all queries executed.\u00a0 I have written a small program to normalize the queries and sort them so I can focus on the queries that take 90% of the time.\u00a0 I then get queries that take more than 10 seconds &#8212; which can be altered by adjusting long_query_time, and, I get any query that doesn&#8217;t use an index.\u00a0 Usually I will focus on the unindexed queries as those generally cause the problems.\u00a0 Very few people understand the impact of table locks and that usually is where most problems occur.<\/p>\n<p>The first example comes from an application which claims to be able to support 100,000 visitors per day.\u00a0 Perhaps with enough hardware behind it can, but, in its current state, on a quad core xeon, about 40,000 per day is about all they can push.<\/p>\n<p>Where do we start?\u00a0 So few lines, so many errors.<\/p>\n<blockquote><p>CREATE TABLE `iplog` (<br \/>\n`ip` varchar(255) NOT NULL default &#8221;,<br \/>\n`pid` bigint(20) NOT NULL default &#8216;0&#8217;,<br \/>\n`viewed` varchar(255) NOT NULL default &#8221;,<br \/>\n`locked` tinyint(1) NOT NULL default &#8216;0&#8217;,<br \/>\nPRIMARY KEY\u00a0 (`ip`,`pid`)<br \/>\n) ENGINE=InnoDB DEFAULT CHARSET=latin1<\/p><\/blockquote>\n<p>The above schema replicates functionality in the prior version of the software.\u00a0 The old version of the system didn&#8217;t do things this way, so, I am unsure why the developer chose to alter his habits and break things in so many ways.<\/p>\n<blockquote><p>ip varchar(255)<\/p><\/blockquote>\n<p>IPv4 currently only takes 15 spaces.\u00a0 char(15) would have been more efficient taking 16 bytes to store a 15 digit IP address including the &#8216;.&#8217;.\u00a0 Even varchar(15) would have taken 20 bytes to store the 15 digit IP address.\u00a0 But varchar(255) just takes the cake.\u00a0 The proper way to store an IPv4 address is unsigned int.\u00a0 Even barring the fact that varchar(15) or char(15) would have been more space efficient, using an unsigned int allows the BTree that MyISAM creates to be a lot more balanced.<\/p>\n<p>Then, lets add an index to the IP field.\u00a0 Knowing that indexes space pad keys to their full length, the Mysql index file on the disk is actually larger than the data file itself.<\/p>\n<p>I might object to the use of InnoDB on a table with questionable value.\u00a0 The overhead for InnoDB on data that is valuable only for a few minutes seems like overkill, but, we&#8217;ll give some credit for data integrity.<\/p>\n<p>That really wasn&#8217;t the worst offender.\u00a0 Another application written by another professional programmer included roughly 80 queries to display one page.\u00a0 Once you dig deeper into the code, you find queries like this:<\/p>\n<blockquote><p># Query_time: 0\u00a0 Lock_time: 0\u00a0 Rows_sent: 1\u00a0 Rows_examined: 984<br \/>\nSELECT count(VID) AS total_videos FROM video WHERE type=&#8217;public&#8217; AND ( keyword LIKE &#8216;%keyword%&#8217;\u00a0 OR keyword LIKE &#8216;%keyword%&#8217; OR keyword LIKE &#8216;%keyword%&#8217; OR keyword LIKE &#8216;%keyword%&#8217;);<\/p><\/blockquote>\n<p>First, VID is auto_increment, not null.\u00a0 This particular query causes mysql to scan the results from the above query, and the verify VID to see if it is null for any row.\u00a0 Due to the table construction, it cannot be null, so, SELECT count(*) would have been faster.<\/p>\n<p>Second, they disassembled the input, wrapped it in % so that they could do floating matches rather than using a fulltext index which would have been much faster.\u00a0 As the table only has 984 rows now, once they start growing the site, they&#8217;ll start to see more performance issues.\u00a0 A fulltext index would allow for quite a bit more growth.<\/p>\n<p>While the following query isn&#8217;t optimizable, the results could be cached.<\/p>\n<blockquote><p># Query_time: 0\u00a0 Lock_time: 0\u00a0 Rows_sent: 5\u00a0 Rows_examined: 155<br \/>\nSELECT * from channel ORDER BY rand() LIMIT 5;<\/p><\/blockquote>\n<p>As a result of the ORDER BY rand(), MySQL will need to do a full table scan each and every time this query is called.\u00a0 In this case, twice for each pageload.<\/p>\n<p>Another particularly interesting query:<\/p>\n<blockquote><p>select tradeId from trades2 where tradeDomain = &#8216;noref&#8217;;<\/p><\/blockquote>\n<p>Taken on its own, you can see where tradeDomain may have been omitted in the indexes.\u00a0 However, the table name gives you some indication of the mess.\u00a0 Here&#8217;s another query that ran without an index:<\/p>\n<blockquote><p>select tradeId from trades28 where tradeDomain = &#8216;noref&#8217;;<\/p><\/blockquote>\n<p>There are a total of 50 trades tables.\u00a0 This corresponds to the number of &#8216;sites&#8217; in the sites table.\u00a0 It seems our developer couldn&#8217;t quite figure out how to index on two fields.<\/p>\n<p>In the last 4 hours, the unindexed queries from this one site have resulted in the following:<\/p>\n<blockquote><p>-rw-r&#8212;&#8211;\u00a0 1 mysql adm\u00a0 10869491 2009-03-09 10:53 mysql-slow.log<\/p><\/blockquote>\n<p>Roughly 54000 queries in 4 hours that had no index.\u00a0 Compared to yesterday&#8217;s logs when I added a number of indices, it is a world of difference.<\/p>\n<p>However, fixing the issues in the code is still more expensive than throwing hardware at the problem.\u00a0 The problem you run into with using hardware as a crutch is that eventually you reach an impasse where the problem must be fixed, or, you face other bottlenecks.<\/p>\n<p>Based on a few changes made by adding indexes, the load on those two machines has dropped considerably.\u00a0 Each of the vendors that wrote the systems that I analyzed claim to handle well more traffic than either of the machines is currently handling.\u00a0 The claims were written on their web sites roughly 3 years ago, and the machines that they are currently running on a much newer and much more powerful than they were at the time the software was installed.<\/p>\n<p>One of the software vendors is out of the business due to too many support issues.\u00a0 One software vendor continues churning out bad code and getting people to pay $600 for it.\u00a0 The third developer disappeared and was replaced by a team of two programmers who appear to know what they are doing and are releasing version 5.0 (the current, latest version is 3.1) &#8216;soon&#8217;.\u00a0 I know it will be soon because their web site, last updated in July 2008, says they are in limited beta.<\/p>\n<p>I guess it is time to order some hardware.<\/p>\n<div style=\"float:left;\">\n<div id=\"fb-root\"><\/div>\n<fb:like href=\"https:\/\/cd34.com\/blog\/scalability\/professional-mysql-developer\/\" width=\"250\" send=\"false\" show_faces=\"false\" layout=\"button_count\" action=\"recommend\"><\/fb:like>\n<\/div><div style=\"clear:both;\"><\/div>","protected":false},"excerpt":{"rendered":"<p>Once in a while I get to troubleshoot problems on machines specifically with MySQL.\u00a0 Sometimes that task is difficult because the vendor written code has been Zend Encoded.\u00a0 Perhaps in the name of performance, possibly to protect his code, or, to prevent me from seeing just how bad his code is.\u00a0 Diagnosing MySQL issues forensically [&hellip;]<\/p>\n<div style=\"float:left;\">\n<div id=\"fb-root\"><\/div>\n<fb:like href=\"https:\/\/cd34.com\/blog\/scalability\/professional-mysql-developer\/\" width=\"250\" send=\"false\" show_faces=\"false\" layout=\"button_count\" action=\"recommend\"><\/fb:like>\n<\/div><div style=\"clear:both;\"><\/div>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,12],"tags":[16],"class_list":["post-577","post","type-post","status-publish","format-standard","hentry","category-programming","category-scalability","tag-mysql"],"_links":{"self":[{"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/posts\/577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/comments?post=577"}],"version-history":[{"count":2,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/posts\/577\/revisions"}],"predecessor-version":[{"id":579,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/posts\/577\/revisions\/579"}],"wp:attachment":[{"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/media?parent=577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/categories?post=577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cd34.com\/blog\/wp-json\/wp\/v2\/tags?post=577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}