A 53,900% speedup: Nginx, Drupal, and Memcache bring concurrency up and page load time way down
With a clever hack utilizing Memcache, Nginx, and Drupal, we have been able to speed the delivery time of many of our major pages by 53,900% (from 8,100 msec to 15 msec, according to siege and AB benchmarks). Additionally, we went from being able to handle 27 concurrent requests to being able to handle 3,334 concurrent requests (a 12,248% increase).
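For readers who want to check those percentages, here is the arithmetic as a short Python sketch. The helper name `percent_increase` is ours, purely for illustration; the input figures are the ones quoted above.

```python
def percent_increase(smaller: float, larger: float) -> float:
    """Return how much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return (larger - smaller) / smaller * 100

# Page delivery time fell from 8,100 msec to 15 msec; the speedup is
# expressed relative to the new, faster time.
speedup = percent_increase(15, 8100)

# Concurrency rose from 27 to 3,334 simultaneous requests.
concurrency_gain = percent_increase(27, 3334)

print(f"{speedup:,.0f}% speedup")
print(f"{concurrency_gain:,.0f}% more concurrent requests")
```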
While we performed a long series of performance optimizations, this article is focused primarily on how we managed to serve data directly from Memcached, via Nginx, without invoking PHP at all.
Read on for the full explanation of how we achieved this huge speedup. <!--break-->
How it all got started
Spine-Health has been running Drupal for a few years. We are currently running D5 (though a D7 release is in the works). The majority of Spine-Health's content is public, and most of the site is high-read, low-write. Most of the interactivity between users occurs on the site's forums. While we have registered users, most of our content looks the same to logged-in users as it does to those who are not logged in. For this reason, there are some parts of the site that we can cache with high efficiency.
In January we began an aggressive push to take the Spine-Health site and improve performance. The Spine-Health home page was taking an average of just over 1.5 seconds to render, and some of our condition center pages took upwards of eight seconds. In fact, under high load, condition centers could take as long as 24 seconds. We began a series of performance tests which ran through 32 separate environmental configurations (with almost no code changes to Drupal itself). By the end of this battery of tests, we had a clear set of changes in mind, including tuning APC and MySQL, re-configuring page caching, and switching from Apache to a lighter and faster web server. We selected Nginx due to its stability, ease of configuration, and impressive benchmarks. (Lighttpd might have worked equally well, but that is one benchmark we did not perform.)
Adding Nginx and tuning Memcached
In our benchmarking efforts, we noticed that Apache/mod_php was one bottleneck point. Static files were served slowly, memory usage and processor usage were consistently high, and try as we might we just couldn't squeeze that many concurrent connections out of Apache -- even on static files.
We switched from Apache to Nginx ("engine X"), and the benefits became immediately obvious. Nginx consumes about 2.5M of memory, and this number seems to remain basically constant over the lifetime of the process. We configured PHP to run as a FastCGI (just using the built-in PHP FastCGI server). It consumes around 32M on average, though it may go substantially higher on certain rare but intensive tasks.
From Nginx, we turned our attention to Memcached. Spine-Health was already running Memcached, but in a suboptimal capacity. Most importantly, pages were not cached. And with a little testing, I soon learned why. Turning on page caching was harming performance, and we traced this back to a minor bug whereby pages were being expired from the cache immediately upon insertion. We fixed that.
With Nginx, Drupal, and Memcached all working in conjunction, we saw our average page render time for unauthenticated users go down to 100-200 msec. But that wasn't good enough for a few reasons:
- Our concurrency was still low, hovering around 334 concurrent requests.
- Authenticated users were not seeing most of the advantages introduced by our improved caching.
- Even cached pages were consuming too many system resources because PHP was invoked, and Drupal was bootstrapped. PHP's FastCGI is not as fast as we would have liked.
And I admit it... warmnoise and I wanted to make it go faster just to see if we could.
For a while we spent time performing micro-optimizations of Nginx, PHP, APC, and Memcached. We were making gains, but not substantial ones. Then inspiration struck.
The ah-ha moment: Direct interaction between Nginx and Memcached
While micro-tuning, we became very familiar with Nginx. This little webserver has an impressive array of built-in modules that allow robust interaction with HTTP headers, proxying of content, rewriting URIs, and interacting directly with Memcached. Wait... what was that last part?
Yes, that's right: Nginx can interact directly with Memcached. If you can tell Nginx how to transform data from an HTTP request into a Memcached key, it can try to fetch the data directly from the cache.
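To make the "request to key" idea concrete, here is a minimal Python sketch of such a transformation. Everything in it is an assumption for illustration: the `page:` prefix, the function name `memcached_key`, and the example host and path are hypothetical, and this is not the naming convention used on the real site.

```python
from urllib.parse import urlsplit

KEY_PREFIX = "page:"  # hypothetical prefix, chosen only for this example

def memcached_key(host: str, request_uri: str) -> str:
    """Build a cache key from the parts of a request that identify a page."""
    parts = urlsplit(request_uri)
    key = f"{KEY_PREFIX}{host.lower()}{parts.path}"
    if parts.query:
        # Keep the query string so that paged listings get distinct keys.
        key += f"?{parts.query}"
    return key

example_key = memcached_key("WWW.Example.com", "/conditions/sciatica?page=2")
print(example_key)
```

The important property is that the key depends only on data available in the request, so the web server and the application can compute the same key independently.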
We started reading up on this capability, and found cases where Ruby on Rails apps had been sped up using this technique.
Before long, we had a forked version of the memcache.inc file that ships with the Memcache Drupal module. Our minor modification stored a special version of the cached page (where by special I mean not stored as a serialized PHP object) in Memcache, under a key that we could easily configure Nginx to generate from a URI.
All told, we changed less than 25 lines of code, and had an instant and tremendous performance improvement.
How it works
Take another look at the diagram at the top of this posting.
- The first access is the "lightning access" in box #1 on that diagram. If a page is found by Nginx in Memcache, then the page is returned immediately from Nginx. In this case, it is never passed on to the FastCGI server, so PHP and Drupal are never directly used.
- If the key is not found in the cache, the request is passed on to Drupal (box #2 in the diagram), which handles the request in the usual way. As Drupal performs its rendering, it stores the rendered page in Memcached so that the next time the page is requested, it can be handled directly by Nginx.
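The two paths above can be sketched in a few lines of Python. This is a simplified model, not our configuration: a plain dict stands in for Memcached, `fake_drupal` is a hypothetical stand-in for the FastCGI backend, and a single `serve` function plays both roles (in the real setup Nginx does the lookup and Drupal does the storing).

```python
from typing import Callable, Dict, List

def serve(uri: str, cache: Dict[str, str], render: Callable[[str], str]) -> str:
    """Return the page for `uri`, rendering it only on a cache miss."""
    page = cache.get(uri)
    if page is not None:
        return page        # box #1: answered from the cache, backend untouched
    page = render(uri)     # box #2: the slow path through the backend
    cache[uri] = page      # stored so that the next request is a cache hit
    return page

calls: List[str] = []

def fake_drupal(uri: str) -> str:
    """Hypothetical backend that records each time it is invoked."""
    calls.append(uri)
    return f"<html>{uri}</html>"

cache: Dict[str, str] = {}
first = serve("/node/1", cache, fake_drupal)   # miss: backend renders the page
second = serve("/node/1", cache, fake_drupal)  # hit: backend is not called
```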
We can control which pages Nginx tries to fetch from the cache using Nginx's conditional request processing. In other words, we can basically tell Nginx, "If the request is for one of these URIs, serve the cached version." One of our goals was to keep some content speedy even for logged-in users, and we did this by configuring Nginx to serve cached content to authenticated users!
As things currently stand, for example, all of our articles are served from the cache. In fact, most of our content is served this way. Under our current configuration, logged-in users will see cached article pages and condition centers, but un-cached forum pages.
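As a rough illustration of that kind of URI-based rule, here is a Python sketch. The path prefixes are invented for the example (the real site's URL layout is not shown in this article), and `try_cache_first` is a hypothetical name.

```python
import re

# Hypothetical URL layout: articles and condition centers are cacheable,
# everything else (forums, for instance) goes straight to the backend.
CACHEABLE = re.compile(r"^/(articles|conditions)(/|$)")

def try_cache_first(path: str) -> bool:
    """True if the front end should look in the cache before calling Drupal."""
    return CACHEABLE.match(path) is not None

for path in ("/articles/back-pain", "/conditions", "/forum/topic/12"):
    print(path, try_cache_first(path))
```

Note that the decision looks only at the path and never at a session cookie, which is why logged-in users receive the same cached pages as anonymous visitors in this model.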
With this new configuration, we are seeing the following:
- Concurrency: We max out the network bandwidth (at about 3,334 requests) before we see Nginx/Memcached hit its limits.
- Load: With so much work offloaded from PHP and MySQL, we see the load average (as reported by top) on our webservers drop to about 1%, and our database load average sits around 2-3%.
- Memory: We have allocated around 8G of space to Memcache, spread over two servers. Memory consumption for the webserver is about 2.5M per nginx instance, and about 32M per PHP FastCGI instance.
- Response time: In our benchmarks, we see Nginx/Memcached responding in 15 msec. On our actual production machines, when hit from a nearby monitor on a different network, we see response times of 34 msec.
Drawbacks and Gotchas
According to the Grateful Dead, "Every silver lining has a touch of grey." We have had some mysteries to deal with since deploying this solution. Here are some of the big ones:
- Posted data: Drupal handles form processing in an interesting way. Data is almost always POSTed back to the URI of the current page. Nginx then generates an error, since it will not fetch from Memcache when the request is a POST. Consequently, we had to do some method checking in the Nginx configuration file, and make sure that Nginx simply passes POST requests through to Drupal.
- CSS synchronization: On rare occasions, a stale automatically generated CSS file is removed from the server while a cached version of a page still refers to it. When that happens, the user gets an unstyled page. We have a stopgap measure in place now, but in the future we'd like to find and fix the actual problem.
- HTTP 304 Responses: One nice thing about Drupal is its ability to generate HTTP 304 Not Modified responses. But when using such an aggressive cache, the ETag mechanism for 304 cannot be used. After some benchmarking, though, we realized that it took longer for Drupal to return a 304 than it did to just return the data from the cache.
- Delays on publication: Any caching solution results in delays for content propagation, and this one is no different. When a new page is published to the site, either the caches must be cleared (bad) or we must just wait for the content to be pushed to the front line. We sorta learned to be patient... and we sorta wrote a custom module to clear specific pages from the page cache.
While we've hit these minor drawbacks, the net result -- massive speedups and lower resource utilization -- has been well worth it.
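The method check mentioned under "Posted data" amounts to a very small routing rule. Here is a hedged Python sketch of the idea; the handler names and the `route` function are hypothetical, and treating HEAD like GET is our assumption rather than something stated above.

```python
# Methods that only read a page and can therefore be tried against the cache.
CACHE_READ_METHODS = {"GET", "HEAD"}

def route(method: str) -> str:
    """Pick a handler name ("cache" or "drupal") from the HTTP method alone."""
    return "cache" if method.upper() in CACHE_READ_METHODS else "drupal"

print(route("GET"))   # cache
print(route("POST"))  # drupal
```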
Gritty technical details
To cache a page in Memcached in such a way that Nginx can easily read it, we modified memcache.inc to do the following:
On a cache storage request for cache-page, we grab the page cache data object, retrieve the compressed HTML, and uncompress it. We then store the uncompressed HTML in Memcached under a key built according to a predefined naming convention, setting an explicit expiration date in Memcached itself (rather than having the application level handle cache expirations).
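The storage step can be modeled roughly as follows. This is a Python sketch of the idea, not the PHP in our memcache.inc: a dict stands in for Memcached, the compression is assumed to be gzip format, and the 300-second lifetime, key string, and function names are all hypothetical.

```python
import gzip
from typing import Dict, Optional, Tuple

PAGE_TTL_SECONDS = 300  # assumed lifetime; the real value is not given here

Cache = Dict[str, Tuple[bytes, float]]

def store_page(cache: Cache, key: str, gzipped_html: bytes, now: float) -> None:
    """Store the uncompressed page under `key` with an absolute expiry time."""
    html = gzip.decompress(gzipped_html)
    cache[key] = (html, now + PAGE_TTL_SECONDS)

def fetch_page(cache: Cache, key: str, now: float) -> Optional[bytes]:
    """Return the page bytes, or None if the entry is missing or expired."""
    entry = cache.get(key)
    if entry is None or entry[1] <= now:
        return None
    return entry[0]

cache: Cache = {}
store_page(cache, "page:/node/1", gzip.compress(b"<html>hi</html>"), now=1000.0)
fresh = fetch_page(cache, "page:/node/1", now=1100.0)    # within the lifetime
expired = fetch_page(cache, "page:/node/1", now=1300.0)  # at the expiry time
```

Storing plain HTML rather than a serialized object is what lets a reader that knows nothing about PHP hand the value straight back to the client.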
On the Nginx side, the server is configured to check the request URI. If the URI matches one of our predefined patterns, then Nginx tries to fetch a cached copy by constructing a cache key based on our predefined naming convention.
If data is found, Nginx streams the data out to the client. We have a chance here to gzip the data, which gives us good compression and thereby reduces network load and transfer time.
If the data is not found in the cache, Nginx simply passes the request off to the FastCGI handler and lets Drupal deal with it.
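Because the page sits in the cache uncompressed, compression happens on the way out. The following Python sketch shows one way to model that step; it is a simplification under stated assumptions (quality values such as `gzip;q=0` are ignored, and the function names and sample page are hypothetical), not a description of Nginx internals.

```python
import gzip
from typing import Dict, Tuple

def client_accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check: is gzip listed in the Accept-Encoding header?"""
    codings = [item.split(";")[0].strip().lower()
               for item in accept_encoding.split(",")]
    return "gzip" in codings

def encode_body(html: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Return the body to send plus any extra response headers."""
    if client_accepts_gzip(accept_encoding):
        return gzip.compress(html), {"Content-Encoding": "gzip"}
    return html, {}

page = b"<p>Lower back pain</p>" * 200
body, headers = encode_body(page, "gzip, deflate")
plain_body, plain_headers = encode_body(page, "identity")
```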
Update: Fixed a dumb spelling error in the title.