All about the system administration and application development behind a local linux-based company
Some open source applications (ex. Request Tracker) have excellent practices when it comes to their database schemas, but a surprising number of them do not.
This probably stems from many open source authors being programmers first and DBAs second, so the procedural logic is closer to their design decisions than the database schemas. The other day, when I was reading the installation manual for a popular web calendar system, I came across the following:
Next, create the database user account that will be used to access the database.mysql –user=root mysql mysql> GRANT ALL PRIVILEGES ON *.* TO webcalendar@localhost IDENTIFIED BY ‘webcal01′ WITH GRANT OPTION; mysql> FLUSH PRIVILEGES; mysql> QUIT
Obviously giving root mysql access to a single web application is a grave mistake. This shows an absolute lack of the understanding of database administration.
So, with this example fresh in my head, I’m going to go through some practices that I’ve found to be helpful for maintaining extensible database schemas. As usual, the examples will be tailored to MySQL >4.1 and PHP 5, but there’s no reason they couldn’t be adapted to other languages. I’ve optimized these examples for robustness and auditability, without a lot of regard for disk usage. Frankly disk space is quite cheap, and throwing away information is rarely the best way to go.
This is in no particular order:
If you’re building a CMS, when a user modifies a page, you can either add a new row for the modified page, or you can UPDATE the current row. The difference is that inserting a second row makes rolling back the old page easy, and also allows you to generate a log straight from the database (”User “y” modified page “z”. Diff: …). Of course, this makes the SELECTs a little bit uglier:
SELECT pagename, content, title FROM pages
WHERE pagename = "foo" ORDER BY timestamp DESC LIMIT 1
versus
SELECT pagename, content, title FROM pages
WHERE pagename = "foo"
However, he first option is auditable and rollback-ready, and with a few simple modifications could allow administrator approvals, etc. The second option requires the user who edits the page to get it right the first time.
Similiarly, if your goal is to remove a user, you can either DELETE that user’s row, or you can add a boolean Disabled field, and UPDATE them to Disabled instead. This way you don’t have to be nearly as careful with referential integrity, as the user’s primary key still exists,and thus any tables that reference that id will stay consistent.
If you add a new tables and fields as your functionality grows, rather than modifying the older ones, you can roll back the application to the previous version, and it will
still work as it always did, since all of the data it expects will be in the form it was. Over time this can result in somewhat bloated database schemas, but good documentation and
cleanup can mitigate this.
SELECT * from foo;
May break your application when you add extra fields to the database, or if the order of the fields changes, however:
SELECT field1, field2 from foo;
will continue to work even as the database schema evolves.
An alternate way of accomplishing the same design goal is to use a wildcard in the SQL but access the fields by name in your application code. For example:
$result = $mysqli->query('SELECT * from foo;');
$foo = $result->fetch_assoc;
do_stuff_with_fields1_and_2($foo['field1'], $foo['field2'];
This strategy results in even easier application extension, as you can immediately use new fields in your application logic without having to modify your SQL queries. The disadvantage is that you may be pulling more information out of your database than you plan to use, expending extra memory and loading down the DBMS.
Here are two examples of PHP code that perform the same function (Error handling not included):
$searchstringescaped = $mysqiobject->escape_string($searchstring)
$result = $mysqli->query("SELECT foo, bar, bak WHERE bak like '$searchstringescaped'");
while(list($foo, $bar, $bak) = $result->fetch_row()) {
do_stuff_with_bar_and_bak()
}
Or, with prepared statements:
$stmt = $mysqli->prepare("SELECT foo, bar, bak WHERE bak like ?");
$stmt->bind_param('s', $searchstring);
$stmt->execute();
$stmt->bind_result($foo, $bar, $bak)
while($stmt->fetch()) {
do_stuff_with_bar_and_bak();
}
While the former may have fewer lines, the second one clearly separates the query logic from the parameters, and eliminates any possibility of SQL injection or double escaping.
(This requires innodb)
Any database statement can fail. It can be because of programmer error, hardware failure, or any other host of issues. When executing a group of statements, it’s almost always better for all of them to fail than for only some of them to fail. For example, suppose that you want to transfer credits from one user’s account to another. The naive solution would be:
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
However, this code has two problems. First of all, if there were a power failure at precisely the right time, it would result in both users having the thousand credits. Second, for a short amount of time, the credits would be in both accounts, and a third client viewing the accounts between the two updates would see inconsistent data. The proper way to do this would be:
START TRANSACTION;
UPDATE users SET credits = credits + 1000 WHERE id = 1;
UPDATE users SET credits = credits - 1000 WHERE id = 2;
COMMIT;
Or, if you’re using PHP 5’s Mysqli extension, you can use the extension’s interface to transactions like:
$mysqli->autocommit(false);
//Run some mysqli queries
$mysqli->commit();
Finally, if something in the business logic causes you to change your mind about running the transaction, you can call a rollback, which will cancel the last transaction:
$mysqli->autocommit(false);
$mysqli->query('UPDATE users SET credits = credits + 1000 WHERE id = 1;');
$mysqli->query('UPDATE users SET credits = credits - 1000 WHERE id = 2;');
$result = $mysqli->query('SELECT credits from users WHERE id = 2');
list($giverbalance) = $result->fetch_row();
if($giverbalance < 0) {
//Ooops, the giver now has a negative balance.
$mysqli->rollback();
}
I hope that this helps, in a small way, to further the practices of database developers, and I’d love to hear any more suggestions or comments people have on the topic.
Update (Feb 02 2007): Fixed several grammatical errors
There’s a linux worm currently spreading rapidly that exploits web servers. Finjan estimates that about 10,000 servers are affected. Nobody has confirmed how it’s getting root access, but once it is in, it installs a dynamic apache module that randomly sends java script code to clients. The javascript code exploits vulnerabilities in Quicktime, Yahoo Messenger, and others. It attempts to install Rbot, a malware suite on computers that access the sites, using a host of exploits including ones found in Quicktime, Yahoo Messenger, and Windows Media player.
An immediate way to test if you’re affected is to see if you can create an entirely numeric directory, and if you run into a file not found error, or the directory isn’t actually created, it means that you’re infected. This is a bug in the rootkit, and there are some reports coming in that it’s already been fixed by the attackers. A more robust way to check for the exploit is to run the following command:
tcpdump -nAs 2048 src port 80 | grep "[a-zA-Z]\{5\}\.js'"
and if you see some lines printed, it means that your server is sending infected javascript files. If your web server is particularly low traffic, you may want to run:
ab -c 10 -n 100 http://www.yourdomain.com/somefile.html
This will generate some traffic on your web server, so that there are some requests for tcpdump to pick up on.
I’ll post more if I hear any news about the nature of the underlying vulnerability. In the meantime here’s some further reading:
I’ve recently discovered clusterssh, a tool that opens up many xterm sessions and binds them all to one keyboard input. I use it for updating my servers or reconfiguring them all in the same way. For example, since most of my boxes run the same OS version, they all need package updates at the same time, so after I get a flurry of “Update available” emails I have a quick look at an eratta site to see what problems I’m fixing, and then I fire up clusterssh and run (for example) sudo yum update. Here’s a quick screenshot to show it in action:
![]()
It works with clusters of servers, so you’ll have a configuration file like (~/.csshrc):
clusters = web-servers all-servers special-servers
all-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
web-servers = erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
special-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com
Obviously, it works best if you have ssh keys set up to all of your servers. Then you can just run cssh
By now it’s been reported all over the internet that Sun is buying MySQL, AB for approximately $1 billion. This is either the largest or smallest news of the new year. As of 2005, MySQL’s revenue was $40 million, and according to Sun’s press release, they have about 400 employees. To me, this doesn’t seem that it justifies the $1 billion price tag, so Sun is buying them for reasons other than a straight return potential.
On the cynical side, they could be looking to build it up to a second oracle, with a small free version and requiring a purchase for any of the newer enterprise features. I doubt this is the case, as Sun should recognize that much of the MySQL appeal comes from its image as an open free program with the availability of rock solid support. However, I wouldn’t be surprised if we see MySQL Enterprise’s base prices (currently $600-$4000/server/year or $40,000/year for a site license ) go up a bit to help pay off Sun’s investment and close the gap with Oracle (~$40,000/CPU).
First of all, quite a few online disucssions have centered around “Why not postgres?” I think that this is the easiest question to answer: Regardless of the technical merits of each system, MySQL is currently the web application leader. Furthermore, Postgres isn’t a company, so it can’t be bought. Sure, Sun could announce that they are putting serious development resources behind postgres, and offering paid support options, or they could go the EnterpriseDB route and fork postgres into a commercial and commercially supported product, but neither of those give them the same control of a stable and highly adopted database product that owning MySQL AB does for them. More importantly, it doesn’t expose them to MySQL’s impressive customer portfolio.
By buying MySQL, Sun is showing that they’re interested in becoming more like Oracle and IBM, the enterprise consulting company. Since many of the larger and data intensive technology companies (Baidu, Google, Facebook, Ticketmaster, Dunn & Bradstreet) use MySQL, they are buying themselves ins to the who’s-who of technology, and just the companies that wouldn’t have previously been Sun customers (none have a need for Java, they tend to use whitebox clusters, etc) However, now that Sun owns MySQL, there’ll be the opportunity to sell complete solutions to enterprises based upon the LAMP stack.
Several concerns come to mind: Since Sun has a vested interest in Solaris and their own hardware, optimizations and new features may come to Solaris and Sun storage systems before it makes it to the other platforms, or we may end up with a whole set of features that only work with Solaris and/or on Sun hardware. Also, Mysql AB was a known. We could count on them to provide a steady stream of new features for both the paid and unpaid versions, reasonably responsive unpaid support via mailing lists, and excellent (if verbose) documentation. With Sun, we can expect all these things and more to be provided to paying customers, but it remains to be seen what will happen to the community version.
I’m actually happy to see Sun emerging as one more of the behemoth consulting companies, as that was starting to become somewhat of a duopoly, but I just hope it doesn’t come at the expense of fixing something that wasn’t broken (MySQL AB).
I’ll be on vacation until January 15th. I probably won’t be updating this while I’m gone. Hope everyone had great holidays.
I always find our traffic patterns to be extremely interesting, from both an administrative and also a human behavior perspective. As could be expected, our traffic was way down for the holiday weekend.
This is an example traffic graph for one of our servers.

It’s even more pronounced if you drill it down to only smtp traffic.

Kudos to Tobi Oetiker for writing mrtg, smokeping, and rrdtool, one of which almost every admin uses, and some of us use all of them. If you use his stuff, consider getting him something off of his (in)famous wish-list
As originally reported by Artem Nosulchik The NSA recently released their security configuration guide for Redhat Enterprise 5. This is an excellent (and surprisingly practical) jumping off point for securely configuring a server.
It is written for the mid-level admin, so it expects knowledge of how to get around Linux but doesn’t expect you to be an selinux guru. It could prove especially helpful where you need a reference-able set of best practices you followed for a server that will be audited (i.e. bank web server or server that collects HIPAA-protected data) without spending a fortune on consulting. It certainly leaves a few things out (like reboots after kernel updates), and it also recommends a few things that makes life difficult or impossible to the mission, but in a whole it’s the best concise set of best practices I’ve found. Here’s a quick overview of some things it covers (by no means all inclusive):
They also have a bunch of service specific best practices that you definitely should have a look at. Everyone who reads it is bound to learn some things and be reminded of some things they’ve forgotten.
I was looking at host-based change discovery tools like aide and tripwire, and tested them out, but unfortunately I can’t afford the CPU cycles and hard drive bandwidth for an extra full read of the disks every night. However, it just occurred to me that the since I use incremental rsnapshot backups, I’m already checking the entire filesystem for changes. To determine the file integrity I simply have to find the difference between the most recent backup against the next oldest. By running this on the backup server I’m both offloading CPU cycles to a box that they’re less precious on, and also running the tests in a more secure environment.
Rsync normally checks file attributes to determine if there was a change, so a determined hacker could prevent rsync from picking up the change by making sure that the file size, creation, and modification times are the same. To be a functional tripwire replacement, one would need to enable –checksum in rsync_args, which causes rsync to physically checksum every file (and slow down the backup substantially). However, if you’re a little less paranoid and are interested in a general “change notification” scheme, than attribute-based difference detection is probably sufficient.
It turns out that rsnapshot already ships rsnapshot-diff, a tool for determining the difference between snapshots. It’s not quite suited for my application, but I should be able to whip up a wrapper pretty quickly that comes out with a useful file integrity report. I’ll post the script here when I make some progress.
Template for a place that another post will go
My goals for upgrading to PHP 5 are to do it without breaking any customer websites, and also to use few or no legacy options. Hopefully by being strict about legacy options I can prevent pidgin-php that requires a combination of current and depreciated features, as that would make the upgrade to PHP 6 much more of a nightmare. Specifically url_fopen and register_globals will be turned off, and zend v1 compatibility mode will not be turned on. I have decided to leave register_long_arrays on, because of the sheer number of scripts that use $HTTP_*_VARS style formatting, and because there isn’t much harm in that syntax being available. Register_long_vars will be removed from php 6, but I’m willing to deal with that problem when we start looking at PHP 6 (fortunately, a file search to find all of the scripts using that syntax will be trivial).
Because of these substantial changes, I need a way for my customers to preview their websites rendered by the new php, so we can both be confident that when it comes time to flip the switch there won’t be any issues. I’ve done this by installing a new php and apache on an alternate port, and we’ll be emailing customers shortly to have them check their websites on the alternate port to verify they work correctly, and to make support available to customers who aren’t sure about how to fix their pages ( I actually anticipate very few will call).