Tweaking Scooty Puff Sr., The Doombringer to 83,000 requests per second

Continuing on my quest for absolutely blistering web service performance (after repairing Doombringer's settings and drives), I had always wondered why my nginx performance was topping out at 15,000 requests per second (RPS) on Doombringer, with 18,000 RPS each on {Amy, Fry, & Leela}. Even though Sr. isn't hitting anywhere near this cap, and as an aggregate it is at the gigabit switch's capacity, that always felt a little low for a dual-CPU i7 (8-core) system at 2.2GHz. Then one day I ran across a benchmark comparing nginx vs. G-WAN, where nginx was hitting around 70,000 RPS on a slower processor. That told me it should be possible, and after futzing around with lots of settings files and Linux TCP optimization, I found (this link) that the changes needed for high-performance request handling were the TCP settings. So here they are.


Before: 15,000 RPS

New Settings:

# process semaphores
echo 500 512000 64 2048 > /proc/sys/kernel/sem

# tune net (reuse & recycle sockets, lower tcp timeout)
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
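
Those /proc writes vanish on reboot. A minimal sketch of making them stick, assuming the standard sysctl.conf mechanism (apply with `sysctl -p`):

```shell
# /etc/sysctl.conf equivalents of the /proc writes above
kernel.sem = 500 512000 64 2048
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
```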

After (testing with a 1 pixel gif, 1 packet, open connection, close connection):

# ab2 -c 10 -n 10000 -k -H "Accept-Encoding: gzip,deflate" "";

Server Software:        nginx/0.8.38
Server Hostname:
Server Port:            80

Document Path:          /1ptrans.gif
Document Length:        43 bytes

Concurrency Level:      10
Requests per second:    83423.01 [#/sec] (mean)
Time per request:       0.120 [ms] (mean)
Time per request:       0.012 [ms] (mean, across all concurrent requests)
Transfer rate:          29393.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       1
Waiting:        0    0   0.1      0       1
Total:          0    0   0.1      0       1

Percentage of the requests served within a certain time (ms)
50%      0
66%      0
75%      0
80%      0
90%      0
95%      0
98%      0
99%      0
100%      1 (longest request)

On Fry (forward/slower delivery nodes):

Requests per second:    55831.61 [#/sec] (mean)

This tweak is important for a pretty neat reason. With regular (slow) clients, connections may stay around for a long time anyway, so you may not see a big difference from reusing sockets. But Doombringer is the backend content generator: it passes content off to the forward nodes (Amy, Fry, Leela) to deliver, and those nodes open lots of high-speed connections to Doombringer over the local network, close them, and open them again. So having closed proxy connections hanging around slows request handling for the forward nodes, as Doombringer chills out waiting for old sockets to close and be reallocated instead of servicing new requests.
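
You can actually watch this happen. A hedged sketch, assuming `ss` is available (it degrades to zero if not): count the sockets stuck in TIME_WAIT while the forward nodes are hammering the backend, before and after the tweak.

```shell
# Count TCP sockets lingering in TIME_WAIT. With tcp_tw_reuse off,
# this number balloons on a backend taking rapid open/close proxy
# connections from the local forward nodes.
tw=$(ss -tan state time-wait 2>/dev/null | tail -n +2 | wc -l)
echo "TIME_WAIT sockets: $tw"
```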

Pretty rockin’ right? 83K RPS on the main, and 55K RPS on Amy/Fry/Leela each = 248K RPS total!  That’s 21.4 BILLION requests per day that the Doombringer system is capable of handling!!!!!!!!!!
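
The arithmetic, as a quick shell sanity check (using the rounded per-node figures above):

```shell
# Aggregate throughput: main node plus three forward nodes,
# then scaled to a day (86,400 seconds).
main=83000; per_node=55000; nodes=3
total=$((main + per_node * nodes))   # 248,000 RPS aggregate
per_day=$((total * 86400))           # ~21.4 billion requests/day
echo "total=$total RPS, per_day=$per_day"
```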

Of course, that’s total theoretical hoke, since the switch can only handle 83K/s, and the bandwidth cap I have is way lower than that too.