Optimizing MySQL and Apache for Low Memory Usage, Part 1

MySQL and Apache can consume quite a bit of memory if you’re not careful. This post discusses how to reduce the amount of memory they use without killing performance. The caveat, of course, is that these settings won’t let you run a site with a large database and a large amount of traffic. I’m going to try to explain the WHY more than the WHAT. All of this is in conjunction with my goal of reducing the amount of ram I use on my Xen-based virtual server, as discussed previously in Low Memory Computing.

Before I begin, I’d like to say that you should also look at the various system utilities that consume ram. Services like FTP and SMTP can and should be passed off to xinetd. Also, you should look at shells besides bash, such as dash. And if you’re really serious about low memory, you might look at something like BusyBox, which brings you into the realm of real embedded systems. Personally, I just want to get as much as I can out of a standard Linux distribution. If I need more horsepower, I want to be able to move to bigger, faster virtual machines and/or dedicated servers. For now, optimizing a small virtual machine will do.
First off, Apache. My first piece of advice: if you can avoid running Apache at all, do. Lighttpd and thttpd are both very good no-frills webservers, and you can run lighttpd with PHP. Even if you’re running a high-volume site, you can seriously gain some performance by passing off static content (usually images and javascript files) to a lightweight, super-fast server such as Lighttpd.

The biggest problem with Apache is the amount of ram it uses. I’ll discuss the following techniques for speeding up Apache and lowering the ram it needs.

  • Loading Fewer Modules
  • Handle Fewer Simultaneous Requests
  • Recycle Apache Processes
  • Use KeepAlives, but not for too long
  • Lower your timeout
  • Log less
  • Don’t Resolve Hostnames
  • Don’t use .htaccess

Loading Fewer Modules

First things first: get rid of unnecessary modules. Look through your config files and see what modules you’re loading. Are you using CGI? Perl? If you’re not using a module, by all means, don’t load it. That will save you some ram, but the BIGGEST impact is in how Apache handles multiple requests.
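
Modules are pulled in with LoadModule lines in your Apache config. As a sketch (the module names and file paths here are only examples; yours will vary by distribution):

#LoadModule cgi_module /usr/lib/apache2/modules/mod_cgi.so
#LoadModule perl_module /usr/lib/apache2/modules/mod_perl.so
LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so

Comment out anything you can live without, then restart Apache and check that your site still works.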

Handle Fewer Simultaneous Requests

The more processes Apache is allowed to run, the more simultaneous requests it can serve. As you increase that number, you increase the amount of ram Apache will take. Looking at top would suggest that each Apache process takes up quite a bit of ram, but a lot of that is shared libraries, so you can run some processes; you just can’t run a lot. With Debian 3.1 and Apache2, the following lines are the default:

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 20
MaxRequestsPerChild 0

I haven’t found much documentation on this, but prefork.c seems to be the multi-processing module that’s loaded by default with Apache2 on Debian 3.1. Other MPMs may or may not be more memory efficient, but I’m not digging that deep yet. I’d like to know more, though, so post a comment and let me know. Anyway, the settings that have worked for me are:

StartServers 1
MinSpareServers 1
MaxSpareServers 5
MaxClients 5
MaxRequestsPerChild 300

What I’m basically saying is, “set the maximum number of requests this server will handle at any one time to 5.” This is pretty low, and I wouldn’t try it on a high-volume server. However, there is something you can and should do on your webservers to get the most out of them, whether you’re going for low memory or not: tweak the KeepAlive timeout, which I’ll cover below.

Recycle Apache Processes

If you noticed, I changed the MaxRequestsPerChild variable from 0 to 300. This variable tells Apache how many requests a given child process can handle before it gets killed. You want processes to be recycled because page requests allocate memory: if a script allocates a lot of memory, the Apache process it runs under keeps that memory and won’t let it go. If you’re bumping up against the memory limit of your system, this can cause unnecessary swapping. Different people use different settings here; the right value is probably a function of the traffic you receive and the nature of your site. Use your brain on this one.

Use KeepAlives, but not for too long

Keepalives are a way to have a persistent connection between a browser and a server. Originally, HTTP was envisioned as being “stateless.” Prior to keepalive, every image, javascript, frame, etc. on your pages had to be requested using a separate connection to the server. When keepalives came into wide use with HTTP/1.1, web browsers were able to keep a connection to a server open, in order to transfer multiple files across that same connection. Fewer connections, less overhead, more performance. There’s one thing wrong, though. Apache, by default, keeps the connections open for a bit too long. The default seems to be 15 seconds, but you can get by easily with 2 or 3 seconds.
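
In your Apache config, that boils down to something like the following (the directives are standard; the 3-second value is just what has worked for me):

KeepAlive On
KeepAliveTimeout 3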

This is saying, “when a browser stops requesting files, wait X seconds before terminating the connection.” If your visitors are on decent connections, 3 seconds is more than enough time for the browser to make additional requests. The only reason I can think of for setting a higher KeepAliveTimeout is to keep a connection open for the NEXT page request. That is, the user downloads a page, it renders completely, and they click another link. A timeout of 15 would be appropriate for a site where people click from page to page very often. If you’re running a low-volume site where people click, read a while, then click again, you probably don’t have that pattern. You’re essentially taking one or more Apache processes and saying, “for the next 15 seconds, don’t listen to anyone but this one guy, who may or may not actually ask for anything.” The server is optimizing one case at the expense of all the other people who are hopefully hitting your site.

Lower Your Timeout

Also, just in case: since you’re limiting the number of processes, you don’t want one to be stuck in a long timeout, so I suggest you lower the “normal” Timeout directive as well.
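
Apache’s stock default is a generous 300 seconds; something in this neighborhood is plenty for a small site (30 is my pick, not an official recommendation):

Timeout 30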

Log Less

If you’re trying to maximize performance, you can definitely log less. Modules such as mod_rewrite will log debugging info; if you don’t need it, get rid of it. The rewrite log is controlled with the RewriteLog and RewriteLogLevel directives. Also, if you don’t care about certain statistics, you can choose not to log certain things, like the User-Agent or the Referer. I like seeing those things, but it’s up to you.
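
For example, leaving the rewrite log at its disabled default and switching from the usual “combined” log format to the shorter “common” format (which drops the Referer and User-Agent fields) looks something like this; the log path is just an example:

RewriteLogLevel 0
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /var/log/apache2/access.log common
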
Don’t Resolve Hostnames

This one’s easy: don’t do reverse DNS lookups inside Apache. I can’t think of a good reason to do it. Any self-respecting log parser can do this offline, in the background.

HostnameLookups Off

Don’t Use .htaccess

You’ve probably seen the AllowOverride None directive. This says, “don’t look for .htaccess files.” Using .htaccess causes Apache to 1) look for .htaccess files in every directory on the path to each requested file and 2) parse any file it finds, on every request. If you need per-directory changes, make the changes inside your main Apache configuration file, not in .htaccess.
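
As a sketch, with /var/www standing in for your document root:

<Directory /var/www>
    AllowOverride None
</Directory>
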
Well, that’s it for Part 1. I’ll be back soon with Part 2, where I’ll talk about MySQL optimization and possibly a few other things that crop up.

Credits:

I’d like to give credit to a few articles that were helpful in putting this information together. I’m not a master at this; I’m just trying to compile lots of bits and pieces into one place. Thanks go to:

Is Slashdot Irrelevant? (Digg vs. Slashdot)

I’ve been a podcast listener for quite a while now. Not since the beginning, but pretty darn close. In the past several months, I’ve been listening to Diggnation pretty religiously, and finally, a couple weeks ago, I started to actually use Digg. For those of you who don’t already know, Digg.com is a social tech news website, much like Slashdot. However, with Digg, all stories are submitted by users, and stories are voted on and promoted to the “front page” by the users. This causes two things to happen. First, there’s a larger volume of stories. Second, stories surface faster, which exposes the significant lag time associated with Slashdot postings.

Because every post on Slashdot is approved by an editor, a story has to be submitted, reviewed, etc., before it can go onto the front page. Slashdot believes there should be an editor. That’s fine. However, one of the reasons I loved Slashdot was that it was one of the places I could go and see news days before the mainstream media picked up certain stories. I felt “in the know” by using the site. Recently, well, Slashdot is getting scooped by Digg pretty much constantly.

Since I’ve been reading Digg, I continually have the sensation of “not finding anything new” on Slashdot. Well, at least not anything new that is *interesting* to me. What does this tell me? The crowd at Digg.com is pretty damn good at picking out stories that are interesting to me. It also tells me that while both sites are pretty much “covering all the bases,” Digg’s userbase finds things faster and promotes them faster, giving me more timely news.

In addition, with Digg, I think there are fewer dupes, or duplicate postings. The editors of Slashdot are almost infamous for posting things they’ve already posted. I don’t know whether this is because they lack editorial communication or are just forgetful, but it happens an awful lot. With Digg, the astute users almost always notice and don’t promote duplicate postings. There are even built-in mechanisms for finding duplicates. (Users can mark stories as duplicates, and when you submit a story, Digg searches its database to show you similar articles, helping you make sure you’re not posting a dupe.)

The one salvation for Slashdot is, for better or for worse, its community. It’s been recently reported that Digg has more pageviews than Slashdot, but I think Slashdot has a much higher number of comments per post, sparking more discussion.

In conclusion, if I want my news faster, I go to Digg.  If I want a second opinion or sanity check for a piece of news, I wait for it to show up on Slashdot.

Note: I do not advocate forming opinions solely based on that of Slashdot readership. That would be silly.

Star Trek Cribs

This has got to be one of the funniest commercials I’ve seen in a while.

I’m not sure exactly what G4 is doing with “Star Trek 2.0,” but the commercials they’re doing are pretty damn good.

Low Memory Computing

There is a seemingly unstoppable trend in computing to have ever more and more memory available to applications. When we run across performance bottlenecks, one of the easiest fixes is usually to “add more ram.” However, there is a trend towards getting virtual private servers to host websites. This trend isn’t new. Businesses have been consolidating production servers onto virtual servers for a while now. The “new thing” is that more and more people are able to get their own slice of a real server from hosting companies. I just moved this blog, Urban Pug, and Clean Your Microfiber to a small virtual server from Quantact.com for a very affordable price. We’re at the point where you can split a decent server up 80-100 ways and give everyone decent performance.

There is a problem, however. If you put 8 gigs of ram in a system with, say, two high-powered Xeon or Opteron processors, you can split the CPU cycles up and guarantee everyone a minimum amount of performance. This part is straightforward: the host system gives all guests as much CPU as they want, and when there’s contention, a “fair” mechanism ensures even distribution of CPU cycles. With RAM, though, you can guarantee a “limited” amount, but you can’t “burst” ram the way you can CPU cycles. The guest operating system can’t just be told, “hey, you have more ram now.”

That leaves us with the situation of having pretty damn fast virtual servers that are set to run with small amounts of ram. The problem THIS creates is that standard applications such as MySQL and Apache expect the amount of ram present in modern configurations. When that ram isn’t there, you can easily run out and start swapping to disk a lot. If you’re in this situation, you might actually be better off limiting the applications in some way to use less ram (and thus potentially be slower under certain conditions).

It’s not a “win-win” situation. If you need a big fast server, you’ll still have to get a big fast server. If you just need something medium-to-small, it’s possible to do this on the cheap and still get good performance.

So, how do we do this? Basically, it comes down to limiting the ram MySQL uses and limiting the ram and number of processes Apache uses (or using other applications altogether, like lighttpd). In my next few posts, I’ll discuss what I’ve learned from tweaking my own VPS, and hopefully get some feedback on how to do a better job.