Fig 1)
Log (base 10) of the number of queries taking X seconds to execute. So 4.2 => 18,442 queries took 0 seconds to execute (they were logged only because no index was used), and 23 queries took 10 seconds or more.

MySQL performance optimization

In this post, I’m not going to talk ad nauseam about all the little tweaks you can do to improve MySQL performance: my.cnf memory settings, getting an SSD (do it, it lowers random-access time for table-row lookups), or the different ways joins get executed using keys. I assume everyone is familiar with SQL and with prefixing a query with EXPLAIN to see MySQL’s query plan. I’m also not going to talk about key & query caching and hit ratios; there’s plenty of that stuff everywhere. Instead, I’m going to provide some tips on empirically identifying and locating the inefficiencies that slow your server down.

First of all, make sure MySQL is configured for roughly the maximum concurrency your multi-core CPU can handle.

# my.cnf/[mysqld] section
thread_concurrency              =16
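For reference, thread_concurrency was only ever a hint (honored on a few platforms, and removed entirely in MySQL 5.7), and the conventional rule of thumb was twice the number of cores. A quick way to derive the value on a Linux box:

```shell
# Count online cores and apply the conventional 2x rule of thumb.
# (thread_concurrency is only a hint, ignored on most platforms,
# and was removed in MySQL 5.7 -- harmless but useless there.)
cores=$(getconf _NPROCESSORS_ONLN)
echo "thread_concurrency = $(( cores * 2 ))"
```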

Now, the biggest indicator of slow or stalling server performance (and remember from my post “Economics of Ad Delivery“ that slow performance stalls out ad delivery and prevents traffic growth) is slow-running queries. I mostly never paid attention to this because in the past we used forward delivery caches for all the content, so 99% of the requests never hit our database server. But since I started doing split A/B testing of designs, it became necessary to deliver all the content dynamically. And suddenly the main server went from 1% CPU to 150%.

Step 1: Turn slow query logging on & log those queries not using indexes.

log_slow_queries  = /var/log/mysql/mysql-slow.log
long_query_time  = 1
log-queries-not-using-indexes

After a half hour, I take a snapshot of the slow-running queries:

% cat /var/log/mysql/mysql-slow.log | grep select | sed -E 's/[0-9]+//g' | sort | uniq -c | sort -n | egrep -v " [1-9] "

Basically I’m stripping all numeric values from the select queries so that structurally identical queries can be grouped and counted by number of occurrences.
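To see what the grouping does, here’s the same pipeline run over a couple of fabricated query lines (note the -E flag, so sed treats + as a repetition operator):

```shell
# Two fabricated queries that differ only in their numeric literals;
# stripping digits collapses them to one shape, which uniq -c counts.
printf '%s\n' \
  "select * from AllPics where XXX=17 and YYY=42;" \
  "select * from AllPics where XXX=99 and YYY=3;" \
  | sed -E 's/[0-9]+//g' | sort | uniq -c | sort -n
# -> both collapse to:  2 select * from AllPics where XXX= and YYY=;
```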

So in just a half hour… (field names anonymized)

...
607 select * from AllPics where XXX= and YYY= and ZZZ='studentgroup';
1883 select count(*) from GraduateComments where XXX= and YYY!=;
5607 select A, B, C, D from SurveyData where XXX>(-) and XXX<(+);
6064 select XXX from University_Transactions where (XXX_FMT like "% //%" and YYY > (unix_timestamp(Now()) - )) or (XXX_FMT like "% //%" and YYY > (unix_timestamp(Now()) - ));

This immediately shows four queries that can be optimized, two of which dominate.

Using EXPLAIN to identify the query plan and which keys were being used, I was able to spend just 15 minutes constructing better keys, driven by real runtime data rather than guesswork. All the tables listed already had plenty of keys, but the number of queries that still didn’t use them was substantial. (SR runs over a BILLION queries per day.)
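The workflow itself is a short loop; as a sketch (using the anonymized names from above, so the index definition is purely illustrative): EXPLAIN shows which key, if any, the optimizer picks, and when the key column comes back NULL, a composite index over the filtered columns usually fixes it:

```sql
-- Hypothetical: inspect the plan for one of the dominant queries above.
EXPLAIN SELECT count(*) FROM GraduateComments WHERE XXX = 5 AND YYY != 2;

-- If 'key' is NULL (i.e. a full table scan), add a composite index
-- covering the columns in the WHERE clause:
ALTER TABLE GraduateComments ADD INDEX idx_xxx_yyy (XXX, YYY);
```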

% mysqladmin status

Uptime: 244789
Questions: 16008747549
Queries per second avg: 65398.149

≈ 5,650,400,000/day
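As a sanity check, the daily figure follows directly from the counters (integer shell arithmetic, so it truncates a little):

```shell
# Questions / Uptime = average QPS; times 86,400 seconds = queries/day.
questions=16008747549
uptime=244789
echo $(( questions / uptime * 86400 ))   # -> 5650387200, ~5.65 billion/day
```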

Seriously, this is absurd. Someone must be attacking the server, or a crawler is hitting it hard.

After empirical optimization, one can measure the distribution of slow queries. I use:

% cat /var/log/mysql/mysql-slow.log | fgrep Query_time | sed 's/Lock_.*//' | sort | uniq -c | sort -n
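Each entry in the slow log starts with a header line like `# Query_time: 2  Lock_time: 0 ...`, so stripping everything from Lock_ onward leaves just the Query_time bucket. On a few fabricated header lines:

```shell
# Group fabricated slow-log headers by Query_time and count them.
printf '%s\n' \
  '# Query_time: 2  Lock_time: 0 Rows_sent: 1' \
  '# Query_time: 0  Lock_time: 0 Rows_sent: 5' \
  '# Query_time: 0  Lock_time: 1 Rows_sent: 2' \
  | grep -F Query_time | sed 's/Lock_.*//' | sort | uniq -c | sort -n
# -> one bucket with count 1 (Query_time: 2), one with count 2 (Query_time: 0)
```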

Results over 8 hours. (After optimizing the above)

1 # Query_time: 10
1 # Query_time: 14
1 # Query_time: 19
1 # Query_time: 305
1 # Query_time: 34
1 # Query_time: 39
1 # Query_time: 48
1 # Query_time: 59
1 # Query_time: 70
1 # Query_time: 92
2 # Query_time: 18
3 # Query_time: 13
3 # Query_time: 9
4 # Query_time: 6
8 # Query_time: 12
10 # Query_time: 5
10 # Query_time: 7
50 # Query_time: 4
123 # Query_time: 3
868 # Query_time: 1
1023 # Query_time: 2
18442 # Query_time: 0

(Fig 1)

Locks are spectacularly important too (possibly more so, since they tell you where other queries are being blocked). I’ll leave that to another post, though.

Sooooo, continuing where I left off: the task now is to figure out why I’m generating 5 billion requests per day from only 200K paying pageviews. I turn on full query logging, but just for a few minutes, as it will surely fill the disk with logs. In 10 minutes, it generated 40MB.

# my.cnf/[mysqld] section
log            = /tmp/mysqld-query.log

To group and count the types of queries:

% cat /tmp/mysqld-query.log | grep select | sed 's/.*Query//' | sed -E 's/[0-9]+//g' | sed 's/"[^"]*"/""/g' | sed "s/'[^']*'/''/g" | sort | uniq -c | sort -n | egrep -v " [1-9][0-9]? "
...
  17192        select count(*) from SurveyComments where XXX= and YYY= and ZZZ=
  28885        select * from Responses where XXX='' and YYY=
  31135        select count(*) from Responses where XXX='' and YYY=
  68228        select count(*) as ugrad from SurveyData where XXX =
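The extra sed passes blank out double- and single-quoted string literals too, so queries differing only in a quoted value also collapse into one bucket. Demonstrated on fabricated lines (again with -E for the + quantifier):

```shell
# Strip digits, then empty out "..." and '...' literals, then group.
printf '%s\n' \
  "select * from Responses where XXX='alice' and YYY=7" \
  "select * from Responses where XXX='bob' and YYY=12" \
  | sed -E 's/[0-9]+//g' | sed 's/"[^"]*"/""/g' | sed "s/'[^']*'/''/g" \
  | sort | uniq -c | sort -n
# -> 2 select * from Responses where XXX='' and YYY=
```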

After some scrounging, I find that the originator of the 68,000 requests is a long-forgotten-about page that (due to poor programming) is generating factorial combinations of the URL string so that crawlers are stuck on it, and for every university, it runs 5 additional queries instead of a join. Seriously, it was written back in 2000 and long stopped being “live” for general users on the site, but crawlers still had the URL. A few fixes and joins queries later, it is no longer producing a plethora of queries. Unfortunately, I still have to figure out what to do about the Responses queries, since they are in use.