Some open source applications (ex. Request Tracker) have excellent practices when it comes to their database schemas, but a surprising number of them do not.

This probably stems from many open source authors being programmers first and DBAs second, so the procedural logic is closer to their design decisions than the database schemas. The other day, when I was reading the installation manual for a popular web calendar system, I came across the following:

Next, create the database user account
that will be used to access the database.
mysql –user=root mysql
mysql> GRANT ALL PRIVILEGES ON *.*
TO webcalendar@localhost
IDENTIFIED BY ‘webcal01′
WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> QUIT

Obviously giving root mysql access to a single web application is a grave mistake. This shows an absolute lack of the understanding of database administration.

So, with this example fresh in my head, I’m going to go through some practices that I’ve found to be helpful for maintaining extensible database schemas. As usual, the examples will be tailored to MySQL >4.1 and PHP 5, but there’s no reason they couldn’t be adapted to other languages. I’ve optimized these examples for robustness and auditability, without a lot of regard for disk usage. Frankly disk space is quite cheap, and throwing away information is rarely the best way to go.

This is in no particular order:

INSERT data, rather than UPDATEing, and UPDATE rather than DELETEing

If you’re building a CMS, when a user modifies a page, you can either add a new row for the modified page, or you can UPDATE the current row. The difference is that inserting a second row makes rolling back the old page easy, and also allows you to generate a log straight from the database (”User “y” modified page “z”. Diff: …). Of course, this makes the SELECTs a little bit uglier:

SELECT pagename, content, title FROM pages
WHERE pagename = "foo" ORDER BY timestamp DESC LIMIT 1

versus
SELECT pagename, content, title FROM pages
WHERE pagename = "foo"

However, he first option is auditable and rollback-ready, and with a few simple modifications could allow administrator approvals, etc. The second option requires the user who edits the page to get it right the first time.

Similiarly, if your goal is to remove a user, you can either DELETE that user’s row, or you can add a boolean Disabled field, and UPDATE them to Disabled instead. This way you don’t have to be nearly as careful with referential integrity, as the user’s primary key still exists,and thus any tables that reference that id will stay consistent.

When extending your application, add fields and tables, rather than altering existing ones

If you add a new tables and fields as your functionality grows, rather than modifying the older ones, you can roll back the application to the previous version, and it will
still work as it always did, since all of the data it expects will be in the form it was. Over time this can result in somewhat bloated database schemas, but good documentation and
cleanup can mitigate this.

Enumerate all of the fields you expect will be returned, rather than using wildcards

SELECT * from foo;
May break your application when you add extra fields to the database, or if the order of the fields changes, however:
SELECT field1, field2 from foo;
will continue to work even as the database schema evolves.

An alternate way of accomplishing the same design goal is to use a wildcard in the SQL but access the fields by name in your application code. For example:

$result = $mysqli->query('SELECT * from foo;');
$foo = $result->fetch_assoc;
do_stuff_with_fields1_and_2($foo['field1'], $foo['field2'];

This strategy results in even easier application extension, as you can immediately use new fields in your application logic without having to modify your SQL queries. The disadvantage is that you may be pulling more information out of your database than you plan to use, expending extra memory and loading down the DBMS.

Use prepared statements whenever possible

Here are two examples of PHP code that perform the same function (Error handling not included):


$searchstringescaped = $mysqiobject->escape_string($searchstring)
$result = $mysqli->query("SELECT foo, bar, bak WHERE bak like '$searchstringescaped'");
while(list($foo, $bar, $bak) = $result->fetch_row()) {
do_stuff_with_bar_and_bak()
}

Or, with prepared statements:

$stmt = $mysqli->prepare("SELECT foo, bar, bak WHERE bak like ?");
$stmt->bind_param('s', $searchstring);
$stmt->execute();
$stmt->bind_result($foo, $bar, $bak)
while($stmt->fetch()) {
do_stuff_with_bar_and_bak();
}

While the former may have fewer lines, the second one clearly separates the query logic from the parameters, and eliminates any possibility of SQL injection or double escaping.

Use Transactions

(This requires innodb)

Any database statement can fail. It can be because of programmer error, hardware failure, or any other host of issues. When executing a group of statements, it’s almost always better for all of them to fail than for only some of them to fail. For example, suppose that you want to transfer credits from one user’s account to another. The naive solution would be:

UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;

However, this code has two problems. First of all, if there were a power failure at precisely the right time, it would result in both users having the thousand credits. Second, for a short amount of time, the credits would be in both accounts, and a third client viewing the accounts between the two updates would see inconsistent data. The proper way to do this would be:

START TRANSACTION;
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
COMMIT;

Or, if you’re using PHP 5’s Mysqli extension, you can use the extension’s interface to transactions like:

$mysqli->autocommit(false);
//Run some mysqli queries
$mysqli->commit();

Finally, if something in the business logic causes you to change your mind about running the transaction, you can call a rollback, which will cancel the last transaction:


$mysqli->autocommit(false);
$mysqli->query('UPDATE users SET credits = credits + 1000 WHERE id = 1;');
$mysqli->query('UPDATE users SET credits = credits - 1000 WHERE id = 2;');
$result = $mysqli->query('SELECT credits from users WHERE id = 2');
list($giverbalance) = $result->fetch_row();
if($giverbalance < 0) {
//Ooops, the giver now has a negative balance.
$mysqli->rollback();
}

I hope that this helps, in a small way, to further the practices of database developers, and I’d love to hear any more suggestions or comments people have on the topic.

Update (Feb 02 2007): Fixed several grammatical errors

There’s a linux worm currently spreading rapidly that exploits web servers. Finjan estimates that about 10,000 servers are affected. Nobody has confirmed how it’s getting root access, but once it is in, it installs a dynamic apache module that randomly sends java script code to clients. The javascript code exploits vulnerabilities in Quicktime, Yahoo Messenger, and others. It attempts to install Rbot, a malware suite on computers that access the sites, using a host of exploits including ones found in Quicktime, Yahoo Messenger, and Windows Media player.

An immediate way to test if you’re affected is to see if you can create an entirely numeric directory, and if you run into a file not found error, or the directory isn’t actually created, it means that you’re infected. This is a bug in the rootkit, and there are some reports coming in that it’s already been fixed by the attackers. A more robust way to check for the exploit is to run the following command:

tcpdump -nAs 2048 src port 80 | grep "[a-zA-Z]\{5\}\.js'"

and if you see some lines printed, it means that your server is sending infected javascript files. If your web server is particularly low traffic, you may want to run:

ab -c 10 -n 100 http://www.yourdomain.com/somefile.html

This will generate some traffic on your web server, so that there are some requests for tcpdump to pick up on.

I’ll post more if I hear any news about the nature of the underlying vulnerability. In the meantime here’s some further reading:

I’ve recently discovered clusterssh, a tool that opens up many xterm sessions and binds them all to one keyboard input. I use it for updating my servers or reconfiguring them all in the same way. For example, since most of my boxes run the same OS version, they all need package updates at the same time, so after I get a flurry of “Update available” emails I have a quick look at an eratta site to see what problems I’m fixing, and then I fire up clusterssh and run (for example) sudo yum update. Here’s a quick screenshot to show it in action:
Clusterssh
It works with clusters of servers, so you’ll have a configuration file like (~/.csshrc):

clusters = web-servers all-servers special-servers
all-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
web-servers = erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
special-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com

Obviously, it works best if you have ssh keys set up to all of your servers. Then you can just run cssh , and it pops up ssh sessions with each of those servers. A simple tool that saves you a lot of time.

By now it’s been reported all over the internet that Sun is buying MySQL, AB for approximately $1 billion. This is either the largest or smallest news of the new year. As of 2005, MySQL’s revenue was $40 million, and according to Sun’s press release, they have about 400 employees. To me, this doesn’t seem that it justifies the $1 billion price tag, so Sun is buying them for reasons other than a straight return potential.

On the cynical side, they could be looking to build it up to a second oracle, with a small free version and requiring a purchase for any of the newer enterprise features. I doubt this is the case, as Sun should recognize that much of the MySQL appeal comes from its image as an open free program with the availability of rock solid support. However, I wouldn’t be surprised if we see MySQL Enterprise’s base prices (currently $600-$4000/server/year or $40,000/year for a site license ) go up a bit to help pay off Sun’s investment and close the gap with Oracle (~$40,000/CPU).

First of all, quite a few online disucssions have centered around “Why not postgres?” I think that this is the easiest question to answer: Regardless of the technical merits of each system, MySQL is currently the web application leader. Furthermore, Postgres isn’t a company, so it can’t be bought. Sure, Sun could announce that they are putting serious development resources behind postgres, and offering paid support options, or they could go the EnterpriseDB route and fork postgres into a commercial and commercially supported product, but neither of those give them the same control of a stable and highly adopted database product that owning MySQL AB does for them. More importantly, it doesn’t expose them to MySQL’s impressive customer portfolio.

By buying MySQL, Sun is showing that they’re interested in becoming more like Oracle and IBM, the enterprise consulting company. Since many of the larger and data intensive technology companies (Baidu, Google, Facebook, Ticketmaster, Dunn & Bradstreet) use MySQL, they are buying themselves ins to the who’s-who of technology, and just the companies that wouldn’t have previously been Sun customers (none have a need for Java, they tend to use whitebox clusters, etc) However, now that Sun owns MySQL, there’ll be the opportunity to sell complete solutions to enterprises based upon the LAMP stack.

Several concerns come to mind: Since Sun has a vested interest in Solaris and their own hardware, optimizations and new features may come to Solaris and Sun storage systems before it makes it to the other platforms, or we may end up with a whole set of features that only work with Solaris and/or on Sun hardware. Also, Mysql AB was a known. We could count on them to provide a steady stream of new features for both the paid and unpaid versions, reasonably responsive unpaid support via mailing lists, and excellent (if verbose) documentation. With Sun, we can expect all these things and more to be provided to paying customers, but it remains to be seen what will happen to the community version.

I’m actually happy to see Sun emerging as one more of the behemoth consulting companies, as that was starting to become somewhat of a duopoly, but I just hope it doesn’t come at the expense of fixing something that wasn’t broken (MySQL AB).