Xen Presentation Slides

May 17th, 2008

I just finished a presentation at the Niagara Frontier LUG on Xen virtualization and how it applies to a managed hosting infrastructure. Here is a copy of the presentation for anyone who’s interested (sorry, it’s MS PowerPoint. Yes, I do appreciate the irony).

NFLUG Xen Presentation

It goes over some of the pros and cons of virtualization in general, the different types of virtualization, strategies for achieving various levels of availability, and finally some Xen-specific configuration options and tips.

Edit: Mark asked me to clarify the benchmark parameters, so here’s an excerpt of an email that I sent to another reader about this.

as an FYI, the benchmarks were done with:
Core 2 Duo 1.8GHz
single 7200RPM 2.5″ drive

The image was done as a .img file on the same partition as the DomU, and the
partition was a separate DOS partition on the same drive. The only VMWare test
I ran used an image file. All of the OS installs were cloned from the same directory tree
(stay tuned for a future post on converting a Xen image to vmware)

The Moodle benchmarks were a snapshot of a production Moodle install,
where I copied the install to my test system, logged in and clicked a
few things, and replayed the session logs through JMeter several hundred
times with five concurrent threads.

The images benchmark is transfer speed of randomly selecting 1 of 2500
10kb-40kb images. 8 concurrent users and 1500 iterations.

The MySQL benchmark is the result of running all the tests in the
MySQL benchmark suite.

Of note, the image file generally performed a little bit better than the
raw partition. This is counter to what the Xen documentation and common
sense would say, and I think a lot of it has to do with my pretty
limited tests. The image file was ending up in memory cache, whereas
the block device wasn’t. I doubt the same comparative performance would
play out in a production system where a lot more’s going on.

I often run into mail delivery issues (user n can’t send email to domain n.com) that are difficult to troubleshoot with just the info in the logs. One thing that sometimes yields useful info is telnetting directly to port 25 of the offending mail server. That way, if the server is rejecting messages you can see the exact reject message, and you can also spot cases where it is violating the protocol in some creative way.

I just ran into a succinct protocol description at http://helpdesk.islandnet.com/pep/smtp.php

Here’s their transcript of a successful SMTP session. Be sure to notice the extra line break between the end of the headers and the beginning of the body.

telnet mail.islandnet.com 25
220 Islandnet.com ESMTP server ready
helo a.b.c
250 mail.islandnet.com Hello x [YOUR_IP_ADDRESS]
mail from:
250 is syntactically correct
rcpt to:
250 verified
data
354 Enter message, ending with “.” on a line by itself
From: Bugs Bunny
To: Daffy Duck
Subject: Loony Toons!

Hi there!
.
250 OK id=1778te-0009TT-00
quit
221 mail.islandnet.com closing connection
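
The same conversation can be scripted when you need to repeat the check against several servers. Here is a minimal sketch in Python using the standard smtplib module. The function names (probe_smtp, first_rejection), the HELO name probe.example.com, and the addresses in the usage note are placeholders of mine and not anything from the Islandnet page. It deliberately stops after RCPT TO, so no message is actually sent.

```python
import smtplib


def probe_smtp(host, sender, recipient,
               helo_name="probe.example.com", port=25, timeout=15):
    """Walk connect / HELO / MAIL FROM / RCPT TO and record every reply.

    Returns a list of (step, code, text) tuples and stops at the first
    4xx/5xx reply, which is usually the reject message you are hunting for.
    """
    replies = []
    with smtplib.SMTP(timeout=timeout) as server:
        # connect() raises SMTPConnectError if the banner is not a 220.
        code, banner = server.connect(host, port)
        replies.append(("connect", code, banner.decode(errors="replace")))
        steps = (
            ("helo", lambda: server.helo(helo_name)),
            ("mail from", lambda: server.mail(sender)),
            ("rcpt to", lambda: server.rcpt(recipient)),
        )
        for step, call in steps:
            code, reply = call()
            replies.append((step, code, reply.decode(errors="replace")))
            if code >= 400:
                break
    return replies


def first_rejection(replies):
    """Return the first (step, code, text) with a 4xx/5xx code, or None."""
    for step, code, text in replies:
        if code >= 400:
            return step, code, text
    return None
```

Something like first_rejection(probe_smtp("mail.example.com", "me@example.org", "someone@example.com")) would then give you the step and the server's exact wording if it refuses the sender or recipient. It won't show you protocol oddities the way a raw telnet session does, so treat it as a complement to telnet and not a replacement.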

CentOS Releases patched kernels

February 13th, 2008

Well, it’s all over with. I hope that everyone enjoyed the wild patching frenzy. Make sure you update your kernel to the CentOS-supported one at your earliest convenience. I’ll remove all the kernels except for the ones from the 53.1.4 tree soon to avoid confusion. Many thanks to everyone in the Red Hat and CentOS teams for their quick and diligent responses.

From Centos-Announce

The following updated files have been uploaded and are currently
syncing to the mirrors:

i386:
kernel-2.6.18-53.1.13.el5.i686.rpm
kernel-devel-2.6.18-53.1.13.el5.i686.rpm
kernel-doc-2.6.18-53.1.13.el5.noarch.rpm
kernel-headers-2.6.18-53.1.13.el5.i386.rpm
kernel-PAE-2.6.18-53.1.13.el5.i686.rpm
kernel-PAE-devel-2.6.18-53.1.13.el5.i686.rpm
kernel-xen-2.6.18-53.1.13.el5.i686.rpm
kernel-xen-devel-2.6.18-53.1.13.el5.i686.rpm

Red Hat has released updated RPMs for RHEL 5.1 uncharacteristically quickly, in recognition of the seriousness and internet coverage of the issue: RHSA-2008-0129. I expect we’ll see a release from CentOS soon as well. Of note, this release does not fix the NFS issues that were present in 2.6.18-53.1.6.

At the suggestion of a CentOS mailing list member, I’ll be posting RPMs from the 2.6.18-53.1.4 release soon, for people who need to run the earlier version because of NFS issues.

I’ve built the following RPMs for RHEL 5 that fix the vmsplice() exploit in RHEL machines. They are built off of the 2.6.18-53.1.6.el kernel, with the upstream patch from kernel.org.

I’ve tested them on i686 and x86_64 machines; however, be aware that they have not undergone extensive QA, so I’m not responsible if they blow up your machine. That said, I’m pretty confident that no one will have any problems with them, as they are literally a one-line difference.

Update: Reminder to install these with rpm -ivh and not rpm -Uvh. Otherwise you’ll remove your old kernels, which you may need to fall back to.

i686:

i686-PAE:

x86_64:

Source:

Xen and several other RPMs are available at: erek.blumenthals.com/vmsplicekernels. Note that the PAE and Xen kernels are entirely untested.

As this kernel is an odd-numbered release, yum should pick up the official upstream patch as soon as it’s available, but in case they do their numbering differently or pick the same release number that I did, it’d be a good idea to double check that yum picks up the latest.

Let me know any experiences with this, especially any confirmations that it’s safe with PAE or Xen.

See http://www.milw0rm.com/exploits/5092 for proof of concept code.

I’ve verified this to work:

[erek@centosmachine src]$ uname -a

[erek@centosmachine src]$ ./exploit
-----------------------------------
 Linux vmsplice Local Root Exploit
 By qaaz
-----------------------------------
[+] mmap: 0x0 .. 0x1000
[+] page: 0x0
[+] page: 0x20
[+] mmap: 0x4000 .. 0x5000
[+] page: 0x4000
[+] page: 0x4020
[+] mmap: 0x1000 .. 0x2000
[+] page: 0x1000
[+] mmap: 0xb7fad000 .. 0xb7fdf000
[+] root
[root@centos5machine src]# whoami
root
[root@centos5machine src]#

Ubuntu, CentOS 5, and most Fedoras seem to be vulnerable. CentOS 4 is not. I’m recompiling CentOS 5 and FC 3 kernel RPMs with the appropriate patches, and will post them here in an hour or two. These are using the upstream kernel patch and I’ll know soon whether they conflict with any of the RHEL-specific code. I doubt it does, as it’s a one-line patch.

And that’s the sound of 1000 admins running home from their Sunday afternoons to patch their boxes, and the sound of 1000 cell phones going off as their bosses read about this.

Update: Compiler is still going, and I’m heading out. I’ll post the rpms in the morning.

There’s a Linux worm currently spreading rapidly that exploits web servers. Finjan estimates that about 10,000 servers are affected. Nobody has confirmed how it’s getting root access, but once it is in, it installs a dynamic Apache module that randomly sends JavaScript code to clients. That code attempts to install Rbot, a malware suite, on computers that access the sites, using a host of exploits including ones found in QuickTime, Yahoo Messenger, and Windows Media Player.

An immediate way to test whether you’re affected is to try to create an entirely numeric directory. If you run into a file not found error, or the directory isn’t actually created, you’re infected. This is a bug in the rootkit, and there are some reports coming in that it’s already been fixed by the attackers, so a directory that gets created normally doesn’t prove you’re clean. A more robust way to check for the exploit is to run the following command:

tcpdump -nAs 2048 src port 80 | grep "[a-zA-Z]\{5\}\.js'"

and if you see some lines printed, it means that your server is sending infected javascript files. If your web server is particularly low traffic, you may want to run:

ab -c 10 -n 100 http://www.yourdomain.com/somefile.html

This will generate some traffic on your web server, so that there are some requests for tcpdump to pick up on.
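
The quick numeric-directory test mentioned earlier is easy to script if you have a lot of boxes to check. This is only a rough sketch in Python. The function name numeric_dir_can_be_created and the default /tmp location are my own choices, and as noted above, a passing result doesn’t rule out a newer version of the rootkit.

```python
import os
import random


def numeric_dir_can_be_created(parent="/tmp"):
    """Try to create, then remove, an all-digit directory under parent.

    Returns True if the directory really appeared (the normal case) and
    False if mkdir failed with "file not found" or silently did nothing,
    which is the symptom described for infected machines. Note that a
    missing parent directory also produces "file not found", so point
    this at a directory you know exists.
    """
    # Pick an all-digit name that is not already in use.
    while True:
        path = os.path.join(parent, str(random.randint(10000, 99999999)))
        if not os.path.lexists(path):
            break
    try:
        os.mkdir(path)
    except FileNotFoundError:
        return False
    created = os.path.isdir(path)
    if created:
        os.rmdir(path)  # clean up after ourselves
    return created
```

On a healthy machine numeric_dir_can_be_created() should return True. Anything else is a reason to go run the tcpdump check.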

I’ll post more if I hear any news about the nature of the underlying vulnerability. In the meantime here’s some further reading:

I’ve recently discovered clusterssh, a tool that opens up many xterm sessions and binds them all to one keyboard input. I use it for updating my servers or reconfiguring them all in the same way. For example, since most of my boxes run the same OS version, they all need package updates at the same time, so after I get a flurry of “Update available” emails I have a quick look at an errata site to see what problems I’m fixing, and then I fire up clusterssh and run (for example) sudo yum update. Here’s a quick screenshot to show it in action:
Clusterssh
It works with clusters of servers, so you’ll have a configuration file like (~/.csshrc):

clusters = web-servers all-servers special-servers
all-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
web-servers = erek@webserver1.blumenthals.com erek@webserver2.blumenthals.com
special-servers = erek@libra.blumenthals.com erek@aquarius.blumenthals.com erek@scorpio.blumenthals.com

Obviously, it works best if you have ssh keys set up to all of your servers. Then you can just run cssh , and it pops up ssh sessions with each of those servers. A simple tool that saves you a lot of time.
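
To make the mapping from cluster name to hosts concrete, here is a small sketch in Python that reads a config laid out like the one above and lists the hosts a given cluster expands to. It only understands the simple "name = values" lines shown here, which is an assumption on my part and not a description of clusterssh’s real parser. The function names parse_csshrc and cluster_hosts are made up for this example.

```python
def parse_csshrc(text):
    """Parse simple 'name = value value ...' lines into a dict of lists."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not 'key = value'.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.split()
    return entries


def cluster_hosts(entries, cluster):
    """Return the hosts for a cluster listed on the 'clusters' line."""
    if cluster not in entries.get("clusters", []):
        raise KeyError("cluster not declared: " + cluster)
    return entries[cluster]
```

It’s handy as a sanity check before you type a command into a dozen terminals at once: cluster_hosts(parse_csshrc(open(path).read()), "web-servers") shows exactly which boxes you’re about to touch, where path is wherever your .csshrc lives.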

Holiday traffic trends

December 26th, 2007

I always find our traffic patterns extremely interesting, from both an administrative and a human-behavior perspective. As could be expected, our traffic was way down for the holiday weekend.

This is an example traffic graph for one of our servers.
Christmas week traffic graph

It’s even more pronounced if you drill it down to only smtp traffic.
Christmas week traffic graph

Kudos to Tobi Oetiker for writing mrtg, smokeping, and rrdtool, one of which almost every admin uses, and some of us use all of them. If you use his stuff, consider getting him something off of his (in)famous wish-list.

I was looking at host-based change discovery tools like aide and tripwire, and tested them out, but unfortunately I can’t afford the CPU cycles and hard drive bandwidth for an extra full read of the disks every night. However, it just occurred to me that since I use incremental rsnapshot backups, I’m already checking the entire filesystem for changes. To determine file integrity I simply have to find the difference between the most recent backup and the next oldest. By running this on the backup server I’m both offloading CPU cycles to a box where they’re less precious and running the tests in a more secure environment.

Rsync normally checks file attributes to determine if there was a change, so a determined hacker could prevent rsync from picking up the change by making sure that the file size and modification time stay the same. To be a functional tripwire replacement, one would need to enable --checksum in rsync_args, which causes rsync to physically checksum every file (and slows down the backup substantially). However, if you’re a little less paranoid and are interested in a general “change notification” scheme, then attribute-based difference detection is probably sufficient.
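
To illustrate the trade-off between attribute-based and checksum-based detection, here is a rough sketch in Python that compares two snapshot directories. This is not rsnapshot’s or rsync’s own logic; snapshot_diff and its helpers are names I made up, and the comparison on size and whole-second modification time is only meant to approximate the cheap check, with an optional SHA-256 pass standing in for the expensive one.

```python
import hashlib
import os


def _sha256(path):
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _index(root):
    """Map each regular file's path relative to root to its full path."""
    files = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                files[os.path.relpath(full, root)] = full
    return files


def snapshot_diff(older, newer, checksum=False):
    """Report files added, removed, or modified between two snapshots.

    Without checksum, a file counts as modified only if its size or
    modification time differs. With checksum=True, files whose attributes
    match are also hashed, which catches tampering that preserved them.
    """
    old, new = _index(older), _index(newer)
    modified = []
    for rel in sorted(set(old) & set(new)):
        a, b = os.stat(old[rel]), os.stat(new[rel])
        if (a.st_size, int(a.st_mtime)) != (b.st_size, int(b.st_mtime)):
            modified.append(rel)
        elif checksum and _sha256(old[rel]) != _sha256(new[rel]):
            modified.append(rel)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": modified,
    }
```

Pointed at two snapshot directories on the backup server, the attribute-only mode touches nothing but metadata, while checksum=True reads every file that exists in both snapshots, twice. That is the same cost difference as turning on --checksum, just moved to a different box.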

It turns out that rsnapshot already ships rsnapshot-diff, a tool for determining the difference between snapshots. It’s not quite suited for my application, but I should be able to whip up a wrapper pretty quickly that comes out with a useful file integrity report. I’ll post the script here when I make some progress.