MySQL performance optimization
In this post, I’m not going to talk ad-nauseum about all the little tweaks you can do to improve MySQL performance (my.cnf memory optimizations, to getting an SSD (do it, it lowers random access time for table-row lookups), to the different ways to execute joins using keys). I assume everyone is familiar with SQL and EXPLAIN types to queries to get access to MySQL’s query plan. Oh also, I’m not going to talk about key & query caching, and hit ratios. There’s plenty of that stuff everywhere. Instead, I’m going to provide some tips about empirically identifying and locating inefficiencies that slow your server down.
First of all, make sure your multi-core CPU has about the max concurrency it can handle.
# my.cnf/[mysqld] section thread_concurrency =16
Now, the biggest indicator of slow or stalling server performance (and remember from my post “Economics of Ad Delivery“, that slow performance stalls out ad delivery and prevents traffic growth) is slow running queries. Now, mostly I never paid attention to this because in the past we use forward delivery caches for all the content, so 99% of the requests never our database server. But since I started doing split A/B testing of designs, it became necessary to deliver all the content as dynamic. And suddenly the main server went from 1% cpu to 150%.
Step 1: Turn slow query logging on & log those queries not using indexes.
log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 1 log-queries-not-using-indexes
After a half hour, I look for a snapshot of the slow running queries:
% cat /var/log/mysql/mysql-slow.log | grep select | sed 's/[0-9]+//g' | sort | uniq -c | sort | egrep -v " [1-9] "
Basically I’m replacing all numeric values in the select queries so that they can be grouped and counted by number of occurrences.
So in just a half hour… (field names anonymized)
... 607 select * from AllPics where XXX= and YYY= and ZZZ='studentgroup'; 1883 select count(*) from GraduateComments where XXX= and YYY!=; 5607 select A, B, C, D from SurveyData where XXX>(-) and XXX<(+); 6064 select XXX from University_Transactions where (XXX_FMT like "% //%" and YYY > (unix_timestamp(Now()) - )) or (XXX_FMT like "% //%" and YYY > (unix_timestamp(Now()) - ));
This immediately notifies me there are 4 queries that can be optimized, 2 of which dominating.
Using EXPLAIN to identify the query plan and which keys were being used, I was able to spend just 15 minutes constructing better keys — from more practical/runtime empirical data. All the tables listed had plenty of keys, but the number of queries that still didn’t use the existing keys were substantial. (SR runs over a BILLION queries per day. )
% mysqladmin status Uptime: 244789 Questions: 16008747549 Queries per second avg: 65398.149
Seriously, this is absurd. Someone must be attacking the server, or a crawler is hitting it hard.
After empirical optimization, one can measure the distribution of slow queries. I use:
% cat /var/log/mysql/mysql-slow.log | fgrep Query_time | sed 's/Lock_.*//' | sort | uniq -c | sort
Results over 8 hours. (After optimizing the above)
1 # Query_time: 10 1 # Query_time: 14 1 # Query_time: 19 1 # Query_time: 305 1 # Query_time: 34 1 # Query_time: 39 1 # Query_time: 48 1 # Query_time: 59 1 # Query_time: 70 1 # Query_time: 92 2 # Query_time: 18 3 # Query_time: 13 3 # Query_time: 9 4 # Query_time: 6 8 # Query_time: 12 10 # Query_time: 5 10 # Query_time: 7 50 # Query_time: 4 123 # Query_time: 3 868 # Query_time: 1 1023 # Query_time: 2 18442 # Query_time: 0
Locks are spectacularly important too. (possibly more so, since they tell you where other queries are being blocked). I’ll leave that to another post though.
Sooooo, continuing where I left off, which now has become to figure out why I’m generating 5 billion requests per day from only 200K paying pageviews… I turn on full query logging. Just for a few minutes, as surely it will fill the disk with logs. In 10 minutes, it’s generated 40MB.
# my.cnf/[mysqld] section log = /tmp/mysqld-query.log
To group and count the types of queries:
% cat /tmp/mysqld-query.log | grep select | sed 's/.*Query//' | sed 's/[0-9]+//g' | sed 's/"[^"]*"/""/g' | sed "s/'[^']*'/''/g" |sort | uniq -c | sort | egrep -v " [1-9][0-9]? "
... 17192 select count(*) from SurveyComments where XXX= and YYY= and ZZZ= 28885 select * from Responses where XXX='' and YYY= 31135 select count(*) from Responses where XXX='' and YYY= 68228 select count(*) as ugrad from SurveyData where XXX =
After some scrounging, I find that the originator of the 68,000 requests is a long-forgotten-about page that (due to poor programming) is generating factorial combinations of the URL string so that crawlers are stuck on it, and for every university, it runs 5 additional queries instead of a join. Seriously, it was written back in 2000 and long stopped being “live” for general users on the site, but crawlers still had the URL. A few fixes and joins queries later, it is no longer producing a plethora of queries. Unfortunately, I still have to figure out what to do about the Responses queries, since they are in use.