Nginx, tcp_nopush, sendfile, and memcache: The right configuration?

Feb 1 2010

Tuning Nginx ("engine-X") seems to be something of a black art. Today, I looked closely at the tcp_nopush, sendfile, and keepalive_requests settings for pages rendered by PHP running as FastCGI, and for memcached content. We discovered that with a little careful tuning, we could shave as much as 200-400 msec off each request.

I have been working on several speed improvements on the Condition Centers at Spine-Health.com. Initially, these pages were taking upwards of 3.5 seconds just to render the HTML. Through a series of optimizations that I will document in another article, we have the conditions page rendering in around 100 msec now.

Before we get going, let me mention a few details of our system:

  • We are running CentOS 5.3 (roughly equivalent to RHEL 5.3)
  • We are running Nginx 0.6, which is behind the current stable release but is the latest version in the Fedora EPEL repositories that we use.
  • Since these settings make use of low-level kernel facilities (like TCP_CORK), other platforms may differ.

## The Nginx Configuration with tcp_nopush, sendfile, and keepalive_requests ##

Among the other tweaks, I discovered that we could gain greater throughput by applying the following settings only to our PHP pages. Note that many of these pages are actually served out of various caches. On a cache miss, the render time can still be quite long. On a cache hit, though, we try to optimize every detail we can. Since cache hits greatly outnumber cache misses, and since 200 msec on a cache hit is acceptable, we have chosen some aggressive settings for Nginx:

# Inside of the location that handles PHP rendering:
sendfile on;
tcp_nopush off;
keepalive_requests 0;

sendfile and tcp_nopush

The sendfile option works very well with caching. The best explanation I've found for what it does and why it is good is an old TechRepublic article.

Normally, using tcp_nopush with sendfile is good. But in <a href="http://www.baus.net/on-tcpcork">some cases</a> (and pushing from a cache seems to be one of those), it can slow things down by as much as 200 msec -- the maximum amount of time a packet sits around waiting before it is flushed by the kernel.
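To make the interaction concrete, here is a rough Python sketch of what tcp_nopush combined with sendfile amounts to on Linux, as far as I understand it: set TCP_CORK, queue the headers, send the file, then clear the option so the kernel flushes whatever is left. The function name send_response and its arguments are made up for illustration. This is not Nginx's code, and TCP_CORK only exists on Linux, so the sketch simply skips it elsewhere.

```python
import socket


def send_response(conn, header_bytes, file_path):
    """Send headers plus a file body, corking the socket while both are queued.

    Hypothetical helper for illustration only.
    Returns the number of body bytes sent.
    """
    cork = getattr(socket, "TCP_CORK", None)  # Linux-only option; None elsewhere
    if cork is not None:
        # While corked, the kernel holds back partial frames (for up to ~200 msec).
        conn.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        conn.sendall(header_bytes)
        with open(file_path, "rb") as body:
            # socket.sendfile() uses os.sendfile() when it can, else plain send().
            return conn.sendfile(body)
    finally:
        if cork is not None:
            # Clearing the option flushes anything still queued.
            conn.setsockopt(socket.IPPROTO_TCP, cork, 0)
```

If the option is never cleared and the last frame is a partial one, that frame can sit in the kernel until the 200 msec ceiling expires, which is the kind of delay described above.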

Memcache + KeepAlives + Gzip = less-than-ideal

We are using Nginx 0.6's memcache module, along with gzip. To be good Web citizens, we were using KeepAlives on our HTTP requests as often as possible. We discovered, though, that combining the memcache module, the gzip module, and KeepAlives has a negative consequence.

The connection is left open too long.

Often, we were seeing 50+ milliseconds of time spent "doing nothing". It appears that what is happening is this:

  • With memcache + gzip, Nginx is not setting a Content-Length header.
  • The browser (Safari and Firefox, at least) reads the content, but -- not knowing how much content is left -- does not know when all content has been received. Apparently, it waits up to a few hundred milliseconds before the connection is closed. (And I didn't dig deeply enough to see whether it was the client or the server that closed the connection.)

At the root of the issue seems to be the fact that when the connection is kept alive but no Content-Length is sent, some time elapses during which the client doesn't have enough information to know what to do with the open connection.
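As a rough sketch of why a missing Content-Length matters, here is how a client decides where a response body ends, following the HTTP/1.1 message-length rules in simplified form (it ignores HEAD requests and bodiless statuses such as 204 and 304). The function name body_delimiter is made up for illustration, and this is not code from any browser.

```python
def body_delimiter(headers):
    """Return how a client finds the end of a response body.

    `headers` maps header names to values. Simplified: ignores HEAD
    requests and status codes that never carry a body.
    """
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}

    transfer_encoding = lowered.get("transfer-encoding")
    if transfer_encoding is not None:
        last = transfer_encoding.split(",")[-1].strip()
        # Chunked bodies end at the zero-length chunk; any other final
        # encoding leaves the client reading until the connection closes.
        return "chunked" if last == "chunked" else "connection-close"

    if "content-length" in lowered:
        return "content-length"  # body ends after exactly that many bytes

    return "connection-close"  # body ends only when the server closes
```

In the last case the client can only wait for the close, which would be consistent with the idle time described above. Dumping the response headers for an affected page and running them through a check like this is a quick way to see which case applies.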

The simple solution? Stop KeepAlives for pages served this way.

Setting keepalive_requests to 0 had the effect of changing the Connection header from keep-alive to close, so once the last packet is sent from Nginx, the connection is closed.
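One way to confirm the header change from outside is a small script along these lines. It is only a sketch built on Python's standard http.client module; the helper names are made up for illustration, and any host you pass in is your own choice rather than anything from our setup.

```python
import http.client


def connection_will_close(version, connection_header):
    """Decide from the response whether the server intends to close.

    `version` is 10 or 11, as reported by http.client responses.
    """
    tokens = {t.strip().lower() for t in (connection_header or "").split(",")}
    if "close" in tokens:
        return True
    if version >= 11:
        return False  # HTTP/1.1 connections persist unless told otherwise
    return "keep-alive" not in tokens  # HTTP/1.0 closes unless keep-alive is sent


def fetch_closes(host, path="/"):
    """Fetch one page and report whether the server signalled a close."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return connection_will_close(resp.version, resp.getheader("Connection"))
    finally:
        conn.close()
```

Calling fetch_closes with the host and path of a page served from the tuned location should return True once keepalive_requests is set to 0, and False for pages where keep-alive is still in effect.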

Did this work? Our own performance tests (run on the live servers) indicated a drop of as much as 400 msec on some pages. No, I'm not totally sure why we experienced that much of a drop. I would have expected 50-100 msec... but we're seeing much more.

Later: Fun Memcached Tricks

I've alluded to the fact that we have been using Memcache to get some impressive speed improvements. I'm working on a longer article detailing how we used Memcache to achieve some astounding performance numbers -- basically making it faster to serve out cached pages than static files, while at the same time reducing the CPU and disk load on our servers (and on our database).

The steps we took in this article are just a couple of the fine-tuning steps we've made while transitioning to our newer, faster front-end.