Xen + AoE = New Hotness

In my continued experimentation with hot migration of Xen environments, I think I’ve found a pretty awesome solution. It involves a protocol called ATA over Ethernet (AoE). AoE transmits ATA commands directly in Ethernet frames, so a remote disk can be treated like local block storage. The protocol was originally designed by a company called Coraid for use with their own proprietary disk arrays, but they also produced a piece of software that replicates the same functionality on a normal Linux machine.
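To make "ATA commands over Ethernet" a little more concrete, here is a rough sketch of how the fixed part of an AoE frame could be packed. The layout (EtherType 0x88A2, then version/flags, error, shelf, slot, command and tag) follows the published AoE header; the function name and the example addresses are made up for illustration, and a real ATA request carries more fields after this header.

```python
import struct

AOE_ETHERTYPE = 0x88A2   # EtherType registered for ATA over Ethernet
AOE_VERSION = 1          # protocol version, stored in the high nibble
CMD_ATA = 0              # issue an ATA command
CMD_QUERY_CONFIG = 1     # query config information (used for discovery)


def build_aoe_header(dst_mac, src_mac, shelf, slot, command, tag):
    """Pack the 14-byte Ethernet header plus the 10-byte AoE common header.

    Hypothetical helper for illustration only; it does not send anything.
    """
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be exactly 6 bytes")
    ver_flags = (AOE_VERSION << 4) & 0xF0  # request: all flag bits clear
    error = 0                              # only meaningful in responses
    return struct.pack(
        "!6s6sHBBHBBI",
        dst_mac, src_mac, AOE_ETHERTYPE,
        ver_flags, error,
        shelf,    # "major" address, 16 bits
        slot,     # "minor" address, 8 bits
        command,
        tag,      # opaque value echoed back in the response
    )


# Example: a broadcast config query aimed at shelf 0, slot 1.
frame = build_aoe_header(b"\xff" * 6, bytes(6), 0, 1, CMD_QUERY_CONFIG, 1)
```

Note there is no IP or TCP layer anywhere in there, which is why AoE traffic stays on the local Ethernet segment.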

I had been experimenting with NFS root filesystems, but there were a few things I didn’t like about that approach. First off, creating the kernel was a pain. With all of the effort I mentioned in my previous post on the subject, keeping the kernel updated would be a real chore if you were using CentOS 5 like I am. Second, the kernel didn’t seem to perform any caching of the NFS filesystems, so there was a large amount of traffic flowing over the network from all of the filesystem reads that the Xen environments were doing. Third, all of the root filesystem reads and writes were visible to the Xen instances as network traffic, so their bandwidth counters (and their associated graphs in my Cacti system) were skewed by a large amount.

These issues don’t seem to occur with AoE. The filesystems are imported on the host, so the stock CentOS Xen kernel doesn’t have to be modified in any way. This also makes the network traffic required to maintain the filesystems invisible to the Xen domains. And because the imported device acts as a normal block device, it is cached the same way a local disk is.

That’s not to say there weren’t issues. At first, the vblade daemon (the Linux ‘server’ component of the AoE system) seemed pretty unstable. It seemed to randomly lock up, causing all of my Xen domains to crash and forcing a reboot of the host server. I think it was just the way I was using it, though. I was running the vblade program and backgrounding it, instead of using the vbladed script that was provided. I think it was locking things up when I disconnected the terminal in which I started the vblade instances. When the controlling PTY died, the vblade instances lost their standard input and output channels and died in a bad way. The vbladed script takes care of all of the input and output paths, so there’s no worry if the terminal disconnects. Since I started using vbladed, about three weeks ago, I haven’t had a single failure.
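If my diagnosis is right, the important part is simply that vblade never inherits the terminal. As a rough sketch of the same idea (not a replacement for vbladed), this is how a process could be started with its standard streams pointed away from the PTY and in its own session. The function names, the interface and the device path are placeholders; the argument order (shelf, slot, interface, device) is the one vblade takes on its command line.

```python
import subprocess


def vblade_argv(shelf, slot, netif, device):
    """Build the vblade command line: shelf, slot, interface, block device."""
    return ["vblade", str(shelf), str(slot), netif, device]


def start_detached(argv):
    """Start a process that keeps running after the terminal goes away.

    stdin/stdout/stderr are attached to /dev/null instead of the PTY, and
    start_new_session=True puts the child in its own session so it has no
    controlling terminal to lose.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


# Hypothetical usage: export one logical volume as shelf 0, slot 1 on eth0.
# start_detached(vblade_argv(0, 1, "eth0", "/dev/vg0/xen1"))
```

This throws vblade’s output away entirely, so you lose any error messages it prints; send it to a log file instead if you want to keep them.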

I’m currently running vbladed against the LVM partitions I used with my NFS root filesystems. Off the bat, I thought this would come up a little short because I didn’t have a swap partition available to the Xen domains. Then I remembered that I could use a regular flat file as swap space, so the problem went away.
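For anyone who hasn’t used a flat file as swap before, the usual recipe is four commands: create a zero-filled file, lock down its permissions, format it, and enable it. A small sketch that just builds those command lines (the helper name, path and size are placeholders, and nothing is executed here):

```python
def swap_file_commands(path, size_mb):
    """Return the commands that would create and enable a swap file.

    Each entry is an argv list suitable for subprocess.run(); they need
    to be run as root, in order, inside the domain that wants the swap.
    """
    return [
        # Allocate the file by writing size_mb blocks of 1 MiB of zeros.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mb}"],
        # Swap contents can include sensitive memory, so restrict access.
        ["chmod", "600", path],
        # Write the swap signature into the file.
        ["mkswap", path],
        # Start using it immediately.
        ["swapon", path],
    ]


# Example: a 512 MB swap file at a made-up location.
for argv in swap_file_commands("/swapfile", 512):
    print(" ".join(argv))
```

To make it permanent you would also add a line for the file to /etc/fstab.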

Since the vblade server allows you to export any block device, be it a whole disk, a single partition, an LVM partition, or a whole RAID array, it opens up some interesting possibilities. On the remote system, you can access the exported block device as if it were a disk, partitioning it as you see fit, while on the exporting system it could be just one of many LVM partitions. This allows for the possibility of creating a “mini hard drive” for each Xen instance, each with its own root filesystem, swap space, and whatever else is deemed necessary. I haven’t implemented this because I want to be able to use my LVM partitions with NFS if stability becomes an issue, but it would be a pretty neat setup.
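As a sketch of what the “mini hard drive” setup might look like on the Xen side: the Linux AoE driver exposes each export as /dev/etherd/e<shelf>.<slot> on the importing host, and that device can be handed to a domain as a whole disk with a phy: entry in its config file. The helper names and the xvda guest device below are my own choices for illustration, not something I have running.

```python
def aoe_device_path(shelf, slot):
    """Path the Linux aoe driver gives an imported shelf/slot pair."""
    return f"/dev/etherd/e{shelf}.{slot}"


def xen_disk_spec(shelf, slot, guest_dev="xvda", mode="w"):
    """Build one entry for the disk = [...] list in a Xen domain config.

    Passing a whole-disk name such as xvda (rather than xvda1) lets the
    guest partition the device itself: root, swap, and anything else.
    """
    return f"phy:{aoe_device_path(shelf, slot)},{guest_dev},{mode}"


# Example: one AoE export per domain, shelf 0, slots 1 through 3.
for slot in (1, 2, 3):
    print(f"disk = ['{xen_disk_spec(0, slot)}']")
```

Since Xen domain config files are themselves Python syntax, the printed lines can be pasted straight into each domain’s config.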
