All about the system administration and application development behind a local linux-based company
Some open source applications (ex. Request Tracker) have excellent practices when it comes to their database schemas, but a surprising number of them do not.
This probably stems from many open source authors being programmers first and DBAs second, so the procedural logic is closer to their design decisions than the database schemas. The other day, when I was reading the installation manual for a popular web calendar system, I came across the following:
Next, create the database user account that will be used to access the database.mysql –user=root mysql mysql> GRANT ALL PRIVILEGES ON *.* TO webcalendar@localhost IDENTIFIED BY ‘webcal01′ WITH GRANT OPTION; mysql> FLUSH PRIVILEGES; mysql> QUIT
Obviously giving root mysql access to a single web application is a grave mistake. This shows an absolute lack of the understanding of database administration.
So, with this example fresh in my head, I’m going to go through some practices that I’ve found to be helpful for maintaining extensible database schemas. As usual, the examples will be tailored to MySQL >4.1 and PHP 5, but there’s no reason they couldn’t be adapted to other languages. I’ve optimized these examples for robustness and auditability, without a lot of regard for disk usage. Frankly disk space is quite cheap, and throwing away information is rarely the best way to go.
This is in no particular order:
If you’re building a CMS, when a user modifies a page, you can either add a new row for the modified page, or you can UPDATE the current row. The difference is that inserting a second row makes rolling back the old page easy, and also allows you to generate a log straight from the database (”User “y” modified page “z”. Diff: …). Of course, this makes the SELECTs a little bit uglier:
SELECT pagename, content, title FROM pages
WHERE pagename = "foo" ORDER BY timestamp DESC LIMIT 1
versus
SELECT pagename, content, title FROM pages
WHERE pagename = "foo"
However, he first option is auditable and rollback-ready, and with a few simple modifications could allow administrator approvals, etc. The second option requires the user who edits the page to get it right the first time.
Similiarly, if your goal is to remove a user, you can either DELETE that user’s row, or you can add a boolean Disabled field, and UPDATE them to Disabled instead. This way you don’t have to be nearly as careful with referential integrity, as the user’s primary key still exists,and thus any tables that reference that id will stay consistent.
If you add a new tables and fields as your functionality grows, rather than modifying the older ones, you can roll back the application to the previous version, and it will
still work as it always did, since all of the data it expects will be in the form it was. Over time this can result in somewhat bloated database schemas, but good documentation and
cleanup can mitigate this.
SELECT * from foo;
May break your application when you add extra fields to the database, or if the order of the fields changes, however:
SELECT field1, field2 from foo;
will continue to work even as the database schema evolves.
An alternate way of accomplishing the same design goal is to use a wildcard in the SQL but access the fields by name in your application code. For example:
$result = $mysqli->query('SELECT * from foo;');
$foo = $result->fetch_assoc;
do_stuff_with_fields1_and_2($foo['field1'], $foo['field2'];
This strategy results in even easier application extension, as you can immediately use new fields in your application logic without having to modify your SQL queries. The disadvantage is that you may be pulling more information out of your database than you plan to use, expending extra memory and loading down the DBMS.
Here are two examples of PHP code that perform the same function (Error handling not included):
$searchstringescaped = $mysqiobject->escape_string($searchstring)
$result = $mysqli->query("SELECT foo, bar, bak WHERE bak like '$searchstringescaped'");
while(list($foo, $bar, $bak) = $result->fetch_row()) {
do_stuff_with_bar_and_bak()
}
Or, with prepared statements:
$stmt = $mysqli->prepare("SELECT foo, bar, bak WHERE bak like ?");
$stmt->bind_param('s', $searchstring);
$stmt->execute();
$stmt->bind_result($foo, $bar, $bak)
while($stmt->fetch()) {
do_stuff_with_bar_and_bak();
}
While the former may have fewer lines, the second one clearly separates the query logic from the parameters, and eliminates any possibility of SQL injection or double escaping.
(This requires innodb)
Any database statement can fail. It can be because of programmer error, hardware failure, or any other host of issues. When executing a group of statements, it’s almost always better for all of them to fail than for only some of them to fail. For example, suppose that you want to transfer credits from one user’s account to another. The naive solution would be:
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
However, this code has two problems. First of all, if there were a power failure at precisely the right time, it would result in both users having the thousand credits. Second, for a short amount of time, the credits would be in both accounts, and a third client viewing the accounts between the two updates would see inconsistent data. The proper way to do this would be:
START TRANSACTION;
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
COMMIT;
Or, if you’re using PHP 5’s Mysqli extension, you can use the extension’s interface to transactions like:
$mysqli->autocommit(false);
//Run some mysqli queries
$mysqli->commit();
Finally, if something in the business logic causes you to change your mind about running the transaction, you can call a rollback, which will cancel the last transaction:
$mysqli->autocommit(false);
$mysqli->query('UPDATE users SET credits = credits + 1000 WHERE id = 1;');
$mysqli->query('UPDATE users SET credits = credits - 1000 WHERE id = 2;');
$result = $mysqli->query('SELECT credits from users WHERE id = 2');
list($giverbalance) = $result->fetch_row();
if($giverbalance < 0) {
//Ooops, the giver now has a negative balance.
$mysqli->rollback();
}
I hope that this helps, in a small way, to further the practices of database developers, and I’d love to hear any more suggestions or comments people have on the topic.
Update (Feb 02 2007): Fixed several grammatical errors
My goals for upgrading to PHP 5 are to do it without breaking any customer websites, and also to use few or no legacy options. Hopefully by being strict about legacy options I can prevent pidgin-php that requires a combination of current and depreciated features, as that would make the upgrade to PHP 6 much more of a nightmare. Specifically url_fopen and register_globals will be turned off, and zend v1 compatibility mode will not be turned on. I have decided to leave register_long_arrays on, because of the sheer number of scripts that use $HTTP_*_VARS style formatting, and because there isn’t much harm in that syntax being available. Register_long_vars will be removed from php 6, but I’m willing to deal with that problem when we start looking at PHP 6 (fortunately, a file search to find all of the scripts using that syntax will be trivial).
Because of these substantial changes, I need a way for my customers to preview their websites rendered by the new php, so we can both be confident that when it comes time to flip the switch there won’t be any issues. I’ve done this by installing a new php and apache on an alternate port, and we’ll be emailing customers shortly to have them check their websites on the alternate port to verify they work correctly, and to make support available to customers who aren’t sure about how to fix their pages ( I actually anticipate very few will call).
I’d like to be able to include links that are not pages in my pages menu, as most themes display the pages menu as the main navigation bar for the site. The general wisdom has been to hard-code those links into the theme, however I want to be able to intersperse links
with pages, and I want to be able to switch themes without patching all of the themes I use.
I accomplished this with a plugin that takes an array of html to insert at points in the pages menu. For example, in my blog I have a link for “computers” and a link for “other thoughts”, which are before any of the real pages are displayed. The configuration line of my plugin looks like the following:
//An array of items to insert. 0 makes it the first link, 2 makes the link appear after link 2, etc.$insert_array = array (0 => '<li><a href="http://erek.blumenthals.com/blog/category/linux" title="linux">Computers</a></li>',1 => '<li><a href="http://erek.blumenthals.com/blog/category/other" title="other">Other Thoughts</a></li>'To install this in your blog: download the file, customize the insert array with the links you’d like to add, drop the php file in your wp-content/plugins folder, and activate the plugin. Please drop me a line if you decide to use it, as I’m curious what people will do with it. Also feel free to shoot suggestions my way for improvements.
I’ll admit it. Like many other hosting companies, we’ve been remiss in upgrading to PHP 5 on our shared hosting servers. Apparently, we’re in good company. Nexen Services queried approximately 2 million servers and found the following (as of November 2007):

There may be some valid reasons for sticking with PHP 4, especially in highly controlled environments (think Yahoo), or in redistributable software, where you need to be able to reach the largest installed base without requiring architecture changes. However, as a web host to customers in non-controlled environments, and a developer who develops primarily within our own hosting environment, we have the luxury of being a cautious leader.
Also, I think that part of the reluctance to switch to PHP-5 is a fear of the new coding styles that it strongly encourages. The days of <?php include(’header.inc’) ?> are coming to an end, and are being replaced with MVC-esque coding structures. The general PHP community is taking time to come to terms with this.
It’s funny how realization of the need for change often comes out of a personal inconvenience. Last week I set out to write a substantial application to use on our hosting servers, developing it on my personal development box. I uploaded the code expectantly, and found that nothing worked. I had forgotten that PHP 4 lacks many of the magic object methods I’ve grown accustomed to using (__get, and __set, and __construct primarily).
I can either spend a day refactoring my code to work on PHP 4 (and make it much less elegant in the process) or I can take the plunge and get the ball rolling to upgrade all of our boxes to PHP 5. I’m going with the latter option. Implementation details to follow.