All about the system administration and application development behind a local Linux-based company
Some open source applications (e.g., Request Tracker) follow excellent practices in their database schemas, but a surprising number of them do not.
This probably stems from many open source authors being programmers first and DBAs second, so their design decisions center on the procedural code rather than on the database schema. The other day, while reading the installation manual for a popular web calendar system, I came across the following:
Next, create the database user account that will be used to access the database.
mysql --user=root mysql
mysql> GRANT ALL PRIVILEGES ON *.* TO webcalendar@localhost IDENTIFIED BY 'webcal01' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> QUIT
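Giving a single web application ALL PRIVILEGES on every database (*.*), plus the right to grant those privileges to others (WITH GRANT OPTION), effectively hands it root access to the MySQL server. That is a grave mistake, and it shows a fundamental lack of understanding of database administration.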
So, with this example fresh in my mind, I’m going to go through some practices I’ve found helpful for maintaining extensible database schemas. As usual, the examples are tailored to MySQL 4.1 and later and PHP 5, but there’s no reason they couldn’t be adapted to other languages. I’ve optimized them for robustness and auditability, without much regard for disk usage. Frankly, disk space is cheap, and throwing information away is rarely the best choice.
These are in no particular order:
If you’re building a CMS and a user modifies a page, you can either INSERT a new row for the modified page or UPDATE the current row. Inserting a new row makes rolling back to an old version easy, and it also lets you generate a log straight from the database ("User 'y' modified page 'z'. Diff: …"). Of course, it makes the SELECTs a little uglier:
SELECT pagename, content, title FROM pages
WHERE pagename = 'foo' ORDER BY timestamp DESC LIMIT 1
versus
SELECT pagename, content, title FROM pages
WHERE pagename = 'foo'
However, the first option is auditable and rollback-ready, and with a few simple modifications it could support administrator approvals and similar workflows. The second option requires the user who edits the page to get it right the first time.
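A minimal sketch of the insert-instead-of-update pattern, written in Python with the standard-library sqlite3 module so it runs without a MySQL server. The table layout, the `editor` column, and the helper names are hypothetical. It orders versions by an auto-incrementing `id` rather than a timestamp (an assumption of this sketch) so that two edits in the same second cannot tie. Rolling back re-saves an older version as a new row, which keeps the rollback itself in the history.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # One row per saved version; the highest id is the current version.
    conn.execute(
        "CREATE TABLE pages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " pagename TEXT NOT NULL,"
        " title TEXT,"
        " content TEXT,"
        " editor TEXT)"
    )
    return conn


def save_page(conn, pagename, title, content, editor):
    # INSERT rather than UPDATE, so earlier versions are never overwritten.
    with conn:
        conn.execute(
            "INSERT INTO pages (pagename, title, content, editor)"
            " VALUES (?, ?, ?, ?)",
            (pagename, title, content, editor),
        )


def current_page(conn, pagename):
    # Newest version wins, mirroring ORDER BY ... DESC LIMIT 1.
    return conn.execute(
        "SELECT pagename, content, title FROM pages"
        " WHERE pagename = ? ORDER BY id DESC LIMIT 1",
        (pagename,),
    ).fetchone()


def history(conn, pagename):
    # The audit trail comes straight from the table.
    return conn.execute(
        "SELECT editor, content FROM pages WHERE pagename = ? ORDER BY id",
        (pagename,),
    ).fetchall()


def roll_back(conn, pagename, editor):
    # Re-save the previous version as a new row, so the rollback is logged too.
    rows = conn.execute(
        "SELECT title, content FROM pages"
        " WHERE pagename = ? ORDER BY id DESC LIMIT 2",
        (pagename,),
    ).fetchall()
    if len(rows) < 2:
        return False  # nothing to roll back to
    title, content = rows[1]
    save_page(conn, pagename, title, content, editor)
    return True
```

For example, saving "v1" and then "v2" of page "foo" and calling `roll_back(conn, "foo", "admin")` makes `current_page` return the "v1" content again, while `history` lists all three edits.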
Similarly, to remove a user you can either DELETE that user’s row, or add a boolean Disabled field and UPDATE the user to Disabled instead. With the second approach you don’t have to be nearly as careful about referential integrity: the user’s primary key still exists, so any tables that reference that id stay consistent.
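A sketch of the Disabled-flag approach, again in Python’s sqlite3 standing in for MySQL; the `users` and `posts` tables and the function names are illustrative assumptions. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which the sketch turns on so the referential-integrity point is visible.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " disabled INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id),"
        " body TEXT)"
    )
    return conn


def disable_user(conn, user_id):
    # UPDATE instead of DELETE: the primary key survives, so posts.user_id
    # still points at a real row.
    with conn:
        conn.execute("UPDATE users SET disabled = 1 WHERE id = ?", (user_id,))


def active_users(conn):
    # Application queries must filter disabled users out explicitly.
    rows = conn.execute("SELECT name FROM users WHERE disabled = 0 ORDER BY id")
    return [name for (name,) in rows]


def posts_with_authors(conn):
    # Old posts still join to their author, disabled or not.
    return conn.execute(
        "SELECT posts.body, users.name, users.disabled"
        " FROM posts JOIN users ON users.id = posts.user_id ORDER BY posts.id"
    ).fetchall()
```

With a post referencing user 1, `disable_user(conn, 1)` leaves that post joined to its author, whereas `DELETE FROM users WHERE id = 1` would be rejected with `sqlite3.IntegrityError`. The cost is that every query for active users has to remember the `disabled = 0` filter.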
If you add new tables and fields as your functionality grows, rather than modifying the existing ones, you can roll the application back to a previous version and it will still work as it always did, since all the data it expects is still in the form it expects. Over time this can result in somewhat bloated database schemas, but good documentation and periodic cleanup can mitigate this.
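The idea can be sketched as a pair of migrations, shown here in Python’s sqlite3 as a stand-in for MySQL; the `email` column, the `user_prefs` table, and the function names are hypothetical. Version 2 only adds things, so functions written for version 1 keep working against the upgraded schema.

```python
import sqlite3


def migrate_v1(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def migrate_v2(conn):
    # Additive changes only: a defaulted column and a new table.
    # Nothing version 1 depends on is renamed, retyped, or dropped.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT DEFAULT ''")
    conn.execute(
        "CREATE TABLE user_prefs ("
        " user_id INTEGER PRIMARY KEY REFERENCES users(id),"
        " theme TEXT NOT NULL DEFAULT 'light')"
    )


def v1_add_user(conn, name):
    # Code written against the version 1 schema, naming its columns explicitly.
    with conn:
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid


def v1_get_user(conn, user_id):
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
```

Running both migrations and then calling the version 1 helpers still round-trips a user; the new `email` column simply takes its default.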
SELECT * from foo;
The query above may break your application when fields are added to the table or their order changes, for example if your code reads columns by position. By contrast, this query:
SELECT field1, field2 from foo;
will continue to work even as the database schema evolves.
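A small Python/sqlite3 illustration (the `foo` table matches the queries above; the function names are made up): code that unpacks `SELECT *` by position breaks as soon as a column is added, while the explicit column list does not.

```python
import sqlite3


def read_positional(conn):
    # Fragile: assumes SELECT * returns exactly two columns in this order.
    field1, field2 = conn.execute("SELECT * FROM foo").fetchone()
    return field1, field2


def read_explicit(conn):
    # Robust: the query, not the table definition, fixes the column list.
    field1, field2 = conn.execute("SELECT field1, field2 FROM foo").fetchone()
    return field1, field2
```

After `ALTER TABLE foo ADD COLUMN field3 TEXT`, `read_explicit` still returns the two values, but `read_positional` raises `ValueError` because it now receives three columns.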
An alternative way to accomplish the same design goal is to use a wildcard in the SQL but access the fields by name in your application code. For example:
$result = $mysqli->query('SELECT * FROM foo');
$foo = $result->fetch_assoc(); // array keyed by column name
do_stuff_with_fields1_and_2($foo['field1'], $foo['field2']);
This makes extending the application even easier, since you can use new fields in your application logic immediately without modifying your SQL queries. The disadvantage is that you may pull more data out of the database than you actually use, which wastes memory and puts extra load on the DBMS.
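The same name-based access can be sketched in Python, where the sqlite3 module’s `sqlite3.Row` row factory plays the role of `fetch_assoc()`. This is an analogue under that assumption, not the mysqli API, and the function names are hypothetical.

```python
import sqlite3


def _fetch_row_by_name(conn):
    # A cursor-level row factory avoids changing the connection's behavior.
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row
    return cur.execute("SELECT * FROM foo").fetchone()


def fetch_fields_1_and_2(conn):
    # Columns are read by name, so added or reordered columns do not matter.
    row = _fetch_row_by_name(conn)
    return row["field1"], row["field2"]


def fetch_all_fields(conn):
    # New columns show up here without any change to the SQL.
    row = _fetch_row_by_name(conn)
    return {key: row[key] for key in row.keys()}
```

`fetch_all_fields` picks up a newly added column with no change to its SQL, which is the extensibility benefit described above. It also shows the cost: every column is transferred whether or not the caller uses it.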
Here are two examples of PHP code that run the same query, first with manual escaping and then with a prepared statement (error handling omitted):
$searchstringescaped = $mysqli->escape_string($searchstring);
// 'items' is a placeholder table name
$result = $mysqli->query("SELECT foo, bar, bak FROM items WHERE bak LIKE '$searchstringescaped'");
while (list($foo, $bar, $bak) = $result->fetch_row()) {
    do_stuff_with_bar_and_bak($bar, $bak);
}
Or, with prepared statements:
$stmt = $mysqli->prepare("SELECT foo, bar, bak FROM items WHERE bak LIKE ?");
$stmt->bind_param('s', $searchstring); // 's' = bind as a string
$stmt->execute();
$stmt->bind_result($foo, $bar, $bak);
while ($stmt->fetch()) {
    do_stuff_with_bar_and_bak($bar, $bak);
}
While the former is shorter, the latter clearly separates the query logic from the parameters. Because the search string is sent as a bound parameter instead of being spliced into the SQL text, it rules out SQL injection through that value, as well as double escaping.
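For comparison, here is the same parameterized query in Python’s sqlite3 module, which uses `?` placeholders much like mysqli. The `items` table and the `search` function are hypothetical.

```python
import sqlite3


def search(conn, searchstring):
    # The ? placeholder sends searchstring as data, never as SQL text,
    # so there is no manual escaping to forget or to apply twice.
    return conn.execute(
        "SELECT foo, bar, bak FROM items WHERE bak LIKE ?", (searchstring,)
    ).fetchall()
```

Passing the string `' OR '1'='1` simply matches no rows, because the value is compared as data. Note that the bound value is still a LIKE pattern: `%` and `_` inside user input act as wildcards, so escape them separately if users should not be able to use them.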
(Transactions require a transactional storage engine such as InnoDB; MyISAM tables do not support them.)
Any database statement can fail, whether because of programmer error, hardware failure, or any host of other issues. When executing a group of related statements, it’s almost always better for all of them to fail than for only some of them to succeed. For example, suppose you want to transfer credits from one user’s account to another. The naive solution would be:
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
However, this code has two problems. First, if a power failure struck between the two statements, both users would end up with the thousand credits. Second, even without a failure, the credits briefly exist in both accounts, so a third client reading the accounts between the two updates would see inconsistent data. The proper way to do this is:
START TRANSACTION;
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
COMMIT;
Or, if you’re using PHP 5’s mysqli extension, you can use its transaction interface:
$mysqli->autocommit(false);
//Run some mysqli queries
$mysqli->commit();
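To see the all-or-nothing behavior without pulling the plug on a server, here is a sketch in Python’s sqlite3. The module opens a transaction before the first UPDATE, and using the connection as a context manager (`with conn:`) commits it on success or rolls it back if an exception escapes. The `fail_midway` flag is a hypothetical stand-in for a crash between the two UPDATEs.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount, fail_midway=False):
    # sqlite3 begins a transaction before the first UPDATE; "with conn"
    # commits it if the block completes and rolls it back on an exception.
    with conn:
        conn.execute(
            "UPDATE users SET credits = credits + ? WHERE id = ?", (amount, to_id)
        )
        if fail_midway:
            # Simulated failure between the two statements.
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE users SET credits = credits - ? WHERE id = ?", (amount, from_id)
        )


def balances(conn):
    return dict(conn.execute("SELECT id, credits FROM users ORDER BY id"))
```

After a simulated failure the balances are unchanged; a normal call moves the 1000 credits from user 2 to user 1.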
Finally, if something in the business logic makes you change your mind about a transaction, you can call rollback(), which discards every uncommitted change made since the transaction began:
$mysqli->autocommit(false);
$mysqli->query('UPDATE users SET credits = credits + 1000 WHERE id = 1');
$mysqli->query('UPDATE users SET credits = credits - 1000 WHERE id = 2');
$result = $mysqli->query('SELECT credits FROM users WHERE id = 2');
list($giverbalance) = $result->fetch_row();
if ($giverbalance < 0) {
    // Oops, the giver now has a negative balance: undo both updates.
    $mysqli->rollback();
} else {
    // The balance is fine: make both updates permanent.
    $mysqli->commit();
}
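The rollback-on-business-rule pattern looks much the same in Python’s sqlite3, sketched here with a hypothetical `transfer_checked` function. The module implicitly begins a transaction before the first UPDATE, the SELECT on the same connection sees the uncommitted balance, and `rollback()` or `commit()` decides the outcome.

```python
import sqlite3


def transfer_checked(conn, from_id, to_id, amount):
    """Move credits; roll back if the giver would go negative.

    Returns True if the transfer was committed.
    """
    try:
        conn.execute(
            "UPDATE users SET credits = credits + ? WHERE id = ?", (amount, to_id)
        )
        conn.execute(
            "UPDATE users SET credits = credits - ? WHERE id = ?", (amount, from_id)
        )
        # Same connection, so this sees the not-yet-committed balance.
        (giver_balance,) = conn.execute(
            "SELECT credits FROM users WHERE id = ?", (from_id,)
        ).fetchone()
        if giver_balance < 0:
            conn.rollback()  # undo both UPDATEs
            return False
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # never leave a half-applied transfer open
        raise
```

With user 2 holding 500 credits, a 1000-credit transfer returns `False` and leaves both balances untouched, while a 300-credit transfer commits.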
I hope this helps, in a small way, to improve database development practices, and I’d love to hear any other suggestions or comments on the topic.
Update (Feb 02 2007): Fixed several grammatical errors
May 18th, 2008 at 5:04 am
If for some reason you can’t or don’t want to use InnoDB, you can get some of the transactional behavior by updating both rows in a single query:
update users u1, users u2 set u1.credits=u1.credits + 1000, u2.credits=u2.credits-1000 where u1.id=1 and u2.id=2
This can work with more than two tables, and they can be different tables too.
Of course, you would not have the rollback functionality, but at least you know this query will only succeed if both rows are updated successfully, so the database will not be left in an inconsistent state in case of a hardware or power failure.
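SQLite has no MySQL-style multi-table `UPDATE users u1, users u2` syntax, so this Python sketch gets the same single-statement effect with a CASE expression. It assumes a connection opened with `isolation_level=None` (autocommit), so the statement runs as its own implicit transaction, and it relies on SQLite applying each statement atomically. The function name is hypothetical.

```python
import sqlite3


def transfer_one_statement(conn, from_id, to_id, amount):
    # Both rows change in one UPDATE: the CASE picks +amount for the
    # receiver and -amount for the giver.
    cur = conn.execute(
        "UPDATE users SET credits = credits + CASE id"
        " WHEN ? THEN ? WHEN ? THEN ? END"
        " WHERE id IN (?, ?)",
        (to_id, amount, from_id, -amount, to_id, from_id),
    )
    # Both rows should have been touched.
    return cur.rowcount == 2
```

Calling `transfer_one_statement(conn, 2, 1, 1000)` on balances of 0 and 5000 leaves user 1 with 1000 and user 2 with 4000.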
May 29th, 2008 at 1:18 pm
When performing an update across multiple tables without the ability to roll back, it is always advisable to incorporate the checking logic into the UPDATE’s WHERE clause. This avoids the server resources and memory a transaction would consume, which matters when the application needs to scale to potentially thousands of simultaneous user sessions, all doing updates and selects.
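The check-in-WHERE idea can be sketched in Python’s sqlite3 with a hypothetical `debit_if_sufficient` helper: the guard `credits >= ?` sits in the UPDATE itself, and the cursor’s `rowcount` reports whether it passed. The sketch assumes an autocommit connection (`isolation_level=None`).

```python
import sqlite3


def debit_if_sufficient(conn, user_id, amount):
    # The balance check lives in the WHERE clause, so the test and the
    # write happen in one statement, with no separate SELECT beforehand.
    cur = conn.execute(
        "UPDATE users SET credits = credits - ? WHERE id = ? AND credits >= ?",
        (amount, user_id, amount),
    )
    # rowcount is 1 if the guard let the update through, 0 otherwise.
    return cur.rowcount == 1
```

Because the test and the write are a single statement, no other client can spend the credits between a separate SELECT and the UPDATE.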