Damn Interesting

Damn Interesting has got to be one of my favorite websites. It’s well written, esoteric, and intellectual. I mean really, just about every single post there is totally Digg-worthy.

I can’t believe there aren’t ads there. I’d click on them all the time just because I like the articles so much.


Rails and the MySQL Unique Index with validates_uniqueness_of

So I just found this feature of Rails’ ActiveRecord when I was adding some unique indexes to my MySQL tables. Basically, whenever you have a unique index, you also want some application-level code that says, “hey, you can’t do that,” instead of barfing a raw error message at the user.
In Rails, this is handled with validates_uniqueness_of.

Say we have a table, widgets, with two columns: id and name. id is, of course, our primary key, but say we want to ensure that name is unique at the database level, so we make name a unique key. Now we must do some checking to ensure that the code never tries to insert a non-unique name.

In rails, in your widget model, you’d simply add: validates_uniqueness_of :name.

This magic sauce makes Rails check for an existing record before inserting or updating. If you don’t want the overhead of an index, you could use this without one and it would mostly still work, though without the database-level constraint, two simultaneous requests can still sneak in duplicates, so keeping the unique index is the safer bet.

validates_uniqueness_of provides lots of other magic, most notably the ability to specify a scope for your uniqueness. This is the true power of the feature, letting you do some really complicated things.

Say we had the same table, but all widgets come in multiple sizes, so we have id, name, and size. We can have a widget named Fizzlecutter 3000, but it can come in Small, Medium, or Large. We want to ensure that Fizzlecutter 3000 can be inserted multiple times, but only with different sizes. We do this by saying:

validates_uniqueness_of :name, :scope => :size
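The semantics of a scoped uniqueness check can be sketched in plain Ruby. This is a conceptual illustration of what the validation does (look for an existing row matching both the attribute and the scope column before saving), not Rails internals; the in-memory “table” and helper name are made up for the example:

```ruby
# Conceptual sketch of a scoped uniqueness check: before saving,
# look for an existing row with the same name AND the same scope
# column (size). Plain Ruby with an in-memory "table".

widgets = [
  { name: "Fizzlecutter 3000", size: "Small" },
  { name: "Fizzlecutter 3000", size: "Medium" },
]

# Returns true if the candidate row passes the scoped uniqueness check.
def unique_within_scope?(rows, candidate, attribute:, scope:)
  rows.none? do |row|
    row[attribute] == candidate[attribute] && row[scope] == candidate[scope]
  end
end

# Same name, new size: allowed.
unique_within_scope?(widgets, { name: "Fizzlecutter 3000", size: "Large" },
                     attribute: :name, scope: :size)  # => true

# Same name, same size: rejected.
unique_within_scope?(widgets, { name: "Fizzlecutter 3000", size: "Small" },
                     attribute: :name, scope: :size)  # => false
```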

You can have multiple uniqueness validators to create complicated business rule sets.

Cool!

Update: I’ve corrected some of my misconceptions about using multiple scopes with one validates uniqueness of.  Read the article.

Optimizing Apache and MySQL for Low Memory Usage, Part 2

In Optimizing Apache and MySQL for Low Memory Usage, Part 1, I discussed some system tools and Apache optimization. I’ve also discussed mod_deflate, thttpd, and lighttpd in Serving Javascript and Images — Fast. Now, I’ll talk about MySQL.

Tweaking MySQL to use small amounts of memory is fairly straightforward. You just have to know what to tweak to get the most “bang for your buck,” so to speak. I’m going to try to show you the why instead of the what, so you can hopefully tweak things for your specific server.
We’ll look at the following MySQL settings:

  • Things We Can Disable
  • The Key Buffer
  • The Table Cache
  • The Query Cache
  • Max Connections

Roughly, the amount of memory MySQL uses is defined by a fairly simple formula: query_cache + key_buffer + max_connections * (other buffers). For a low-volume site, the query cache and key buffer are going to be the most important things, but for a larger site, you’re going to need to look at other things. Additionally, the key buffer and the query cache are AMAZING performance increasers. I’m only showing you how to lower the amount of RAM MySQL uses for cases where you’re trying to run a few smaller sites that don’t store hundreds of megs of data.
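The rough formula above makes a handy back-of-the-envelope calculator. Note the per-connection figure is something you’d sum yourself from your own settings (read_buffer_size, sort_buffer_size, and so on); every number below is illustrative, not a recommendation:

```ruby
# Back-of-the-envelope MySQL memory estimate, per the rough formula:
#   query_cache + key_buffer + max_connections * (per-connection buffers)
# All values are in bytes; the sample numbers are illustrative only.

def mysql_memory_estimate(query_cache:, key_buffer:, max_connections:, per_connection_buffers:)
  query_cache + key_buffer + max_connections * per_connection_buffers
end

mb = 1024 * 1024
estimate = mysql_memory_estimate(
  query_cache: 1 * mb,                # query_cache_size
  key_buffer: 2 * mb,                 # key_buffer_size
  max_connections: 20,                # max_connections
  per_connection_buffers: 512 * 1024  # assumed sum of read/sort/join buffers
)
puts "~#{estimate / mb} MB"  # prints "~13 MB"
```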

Things We Can Disable

First off, InnoDB requires about 10 megs of memory to run, so disable it. You shouldn’t need it if you’re going small. For those unfamiliar, InnoDB is a different storage engine within MySQL. It supports transactions and, most importantly (to me, at least), row-level locking. It’s a little bit slower than MyISAM, but it can greatly improve performance under concurrent load. Basic example: writing to a MyISAM table locks the entire table, so you can’t do any selects while you’re inserting. If you’re inserting a lot, this can be a problem. InnoDB lets you insert or update a row while still performing selects, because it locks just the rows you’re working with rather than the whole table.

You can disable InnoDB with “skip-innodb”.

You can also disable BDB (the Berkeley database, a deprecated alternative to InnoDB) and NDB, MySQL’s clustering engine. Do this with “skip-bdb” and “skip-ndbcluster”. I haven’t noticed skipping BDB and NDB reducing memory much, but if you’re not using them, it can’t hurt.

The last thing you can skip is networking, with “skip-networking”. I haven’t noticed this lowering my RAM utilization, but if you’re not accessing MySQL from a remote server, you should use the local Unix socket anyway for better performance and better security: if MySQL isn’t listening on a TCP port, you’re a lot less likely to get hacked. And for those of you worried about having to reconfigure PHP to connect over the local socket, there’s no need: if you specify localhost as your hostname in mysql_connect(), PHP automatically uses the local Unix socket.
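Put together, the skip options discussed above go in the [mysqld] section of my.cnf (option names as used in MySQL 4.x/5.0; exact spelling can vary by version):

```ini
[mysqld]
# Disable the InnoDB storage engine (~10 MB saved)
skip-innodb
# Disable the deprecated BDB engine and NDB clustering
skip-bdb
skip-ndbcluster
# Listen only on the local Unix socket, not TCP
skip-networking
```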

The Key Buffer

This is probably the single most important thing you can tweak to influence MySQL memory usage and performance. The MySQL Reference Manual says about the key buffer:

Index blocks for MyISAM tables are buffered and are shared by all threads. key_buffer_size is the size of the buffer used for index blocks. The key buffer is also known as the key cache.

The maximum allowable setting for key_buffer_size is 4GB. The effective maximum size might be less, depending on your available physical RAM and per-process RAM limits imposed by your operating system or hardware platform.

Increase the value to get better index handling (for all reads and multiple writes) to as much as you can afford. Using a value that is 25% of total memory on a machine that mainly runs MySQL is quite common. However, if you make the value too large (for example, more than 50% of your total memory) your system might start to page and become extremely slow. MySQL relies on the operating system to perform filesystem caching for data reads, so you must leave some room for the filesystem cache. Consider also the memory requirements of other storage engines.

In other words, MySQL tries to put everything that’s indexed into the key buffer. This is a huge performance speedup. If you can get every table column in a specific select statement to be indexed, and your entire index fits into the key buffer, the SQL statement in question will be served directly from RAM. It’s possible to take that kind of optimization overboard, but if you are going for speed (not memory), that’s one way to do it.

I can’t say what size you should make your key buffer, because only you know how much RAM you have free. You can probably get by with 2-3 megs here, bigger if you need it. If you want to play MySQL Memory Limbo (how low can you go!), you can check how much of your key buffer is actually being used. Essentially, you pull a few values out of the SHOW syntax and plug them into the following equation:

1 – ((Key_blocks_unused × key_cache_block_size) / key_buffer_size)

This yields the fraction of the key buffer in use. After restarting MySQL, let your site run a while so the key buffer has time to fill up (assuming it’s live; if not, simulate some use first). Then check the usage with the equation above. If you’re running below, say, 0.8 or so, you can probably safely lower your key buffer size.
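The three inputs come from SHOW STATUS (Key_blocks_unused) and SHOW VARIABLES (key_cache_block_size, key_buffer_size). A small sketch of the calculation, with made-up numbers:

```ruby
# Key buffer utilization, per the equation above:
#   1 - ((Key_blocks_unused * key_cache_block_size) / key_buffer_size)
# Inputs come from:
#   SHOW STATUS LIKE 'Key_blocks_unused';
#   SHOW VARIABLES LIKE 'key_cache_block_size';
#   SHOW VARIABLES LIKE 'key_buffer_size';
# The sample values below are illustrative.

def key_buffer_usage(key_blocks_unused:, key_cache_block_size:, key_buffer_size:)
  1.0 - (key_blocks_unused * key_cache_block_size).to_f / key_buffer_size
end

usage = key_buffer_usage(
  key_blocks_unused: 512,
  key_cache_block_size: 1024,       # 1 KB blocks (the default)
  key_buffer_size: 2 * 1024 * 1024  # a 2 MB key buffer
)
puts format("%.2f", usage)  # prints "0.75"
```

At 0.75, this hypothetical buffer is only three-quarters full, so per the rule of thumb above it could probably be shrunk.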

The Table Cache

MySQL’s documentation treats this as the second most important thing to tweak, and it is: it matters a lot for performance, though only marginally for memory usage. In a nutshell, every time you access a table, MySQL loads a reference to it as one entry in the table cache, and it does this for every concurrent access. So if you have 10 people accessing your website simultaneously, each on a page that joins across 3 tables, you’ll want your table cache set to at least 30. If it’s lower, MySQL has to keep closing and reopening tables, which costs you performance.

You can keep upping the table cache, but you’ll eventually hit a limit on the number of files your operating system can have open, so keep that in mind.

If table_cache is set a little low, you’ll see the Opened_tables status variable climb; it counts the number of times mysqld has had to open a table. If it stays low, you’re rarely having cache misses. If it grows quickly, you’re missing the cache and hitting the disk. In summary, hitting the disk occasionally is probably better than paging a lot, so find a balance: lower table_cache to the point where you’re not reopening tables on every query, but also not using up memory unnecessarily.

The Query Cache
The Query Cache is essentially a mapping of queries to results. If you run the same query twice in a row, and the result fits in the query cache, MySQL doesn’t have to execute it the second time. If you’re going for performance, this can be a huge benefit, but it can also eat up memory. If you’re not running a lot of identical queries, it won’t help as much, but chances are it will help some, and there’s probably a benefit to having 500-1000 KB of query cache even on a tight memory budget. Three variables influence how the query cache works.

  • query_cache_size – This is the total size of the query cache. This much memory will be used for storing the results of queries. You must allocate at least 40k to this before you get any benefit. There’s a 40k data structure overhead, so if you allocate 41k, it “works,” but you don’t have much space to actually get anything done.
  • query_cache_limit – This is the maximum size of an individual result that is cacheable. If you have a 10-megabyte query cache and a 1-megabyte query cache limit, roughly ten one-megabyte results can fit. This is extremely useful for preventing big queries from busting your cache. Benchmarking will help you decide what’s best here; use your judgement.
  • query_cache_type – Here, you can turn the query cache on or off entirely, or enable and disable it per query. If you want it on by default, leave it on and disable it for specific queries with something like “select sql_no_cache * from table”. Alternatively, if you want it off by default, set query_cache_type to “2” (or “DEMAND”) and write queries like “select sql_cache * from table”.

Maximum Number of Connections

This may or may not be a problem for you, but it’s one of the most important things when optimizing a MySQL installation for high usage. If you’re already limiting the number of Apache processes, you’ll be fine. If you’re not, and you need to handle thousands of users simultaneously, you need to increase this number. It’s the number of connections MySQL allows at once; if it’s not set high enough, you’ll get the dreaded “too many connections” MySQL error, and your users won’t be happy. Keep this number in sync with the maximum number of clients Apache allows, and budget extra RAM for the extra MySQL connections (see the rough formula above).
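Tying the settings above together, a low-memory my.cnf fragment might look something like this. Every number here is a hypothetical starting point to tune against your own measurements, not a recommendation:

```ini
[mysqld]
key_buffer_size   = 2M     # see "The Key Buffer"; check utilization before shrinking
table_cache       = 64     # concurrent connections x tables per join, plus headroom
query_cache_size  = 1M     # must be well above the 40K overhead to be useful
query_cache_limit = 256K   # keep one big result from busting the whole cache
max_connections   = 20     # keep in sync with Apache's process limit
```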
I’ll discuss a few more minor tweaks to MySQL in the next article, where I’ll discuss, among other things:

  • Persistent Connections
  • Other Buffers and Caches
  • Miscellaneous Options

Vanilla Forum Software

I’m thinking about getting some forums going for UrbanPug.com, and I’m leaning toward Vanilla. It looks pretty good, it’s about due for a 1.0 release, and it seems to have a promising community along with plugin support.

Serving Javascript and Images — Fast

Recently, I posted an article mainly about optimizing Apache for low memory usage. There, I noted that web servers like thttpd and lighttpd are really good at serving static content fast. I’ve been trying to optimize a site I’m playing with, and I’ve done a bit of analysis and work on using an alternative web server.

Lighttpd wasn’t immediately available in my Debian apt-get list, so I went with thttpd.

The site I’m playing with has lots of images, so I took my own advice and deployed thttpd to serve up the images, and while I was at it, I moved all css and javascript over, too.  I’m using Scriptaculous, which requires the download of a large amount of javascript.

My observations after implementing this: first, thttpd serves up the images a LOT faster, using almost no RAM; top barely even notices it’s running.
Second, not clogging the Apache processes with image requests frees more of them for serving users.

Third, thttpd doesn’t support output compression, so I moved the javascript files back to Apache, where they can be compressed with mod_deflate. Lighttpd *does* support output compression, PHP, URL rewriting, virtual hosts, and pretty much everything else I’d want. It really looks like an amazing product, and I’m going to have to give it a try to see if it lives up to the hype I’m giving it.
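For reference, the mod_deflate setup for compressing javascript and CSS can be as small as this. This is a sketch for Apache 2.x; adjust the MIME types to match what your server actually sends:

```apache
# Compress text-like responses with mod_deflate (Apache 2.x)
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
</IfModule>
```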

Oh, and in the end, I got the initial page load (across the internet, cache cleared) down from just over 8 seconds to 3 seconds using mod_deflate for the javascript and thttpd for the images. Subsequent page loads take about 1-2 seconds.