Monthly Archives: May 2007

Ridiculous

[$3.62 per gallon]

Taken today while waiting at a stop light heading into work. Sigh.

Hot Migration Action

As described in my last few posts, I’ve recently acquired a good amount of new server hardware. Well, everything is in my possession now except a few sticks of RAM, and it’s all set up at work. I ended up picking up the RAID enclosure I mentioned earlier, along with disks to fill it. It turned out to be quite a bargain: the enclosure, drive trays, and an external SCSI cable only cost around $50 plus shipping. Here’s all the new gear mounted in a rack at work… Mine is the white stuff in a sea of black servers.

[servers]

I’ve got the RAID enclosure connected to the dual P3 1.0GHz machine I bought (furthest away on the bottom), and combined, there are 18 drive bays available to the SCSI system. I’ve got fifteen 36GB drives (plus one hot spare) in a RAID5 storage array and two 18GB drives in RAID1 for the OS installation. The RAID5 array weighs in at about 500GB, so I have plenty of room to keep stuff that I don’t want to lose.
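
(That figure checks out: RAID5 gives up one disk’s worth of capacity for parity, so fifteen data disks yield 14 × 36GB, or roughly 504GB usable, not counting the hot spare.)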

I’m currently seeing how well my Xen domains function with NFS root filesystems. So far it looks pretty good. I’ve got the domains that host my web site (among other things) and the MySQL domain running off the RAID5 array via NFS, and I haven’t noticed any slowdowns whatsoever. The only unexpected thing I’ve come across is a few weird incompatibilities with the Gentoo init scripts, specifically when they try to bring up networking devices. The boot just hangs when the scripts try to initialize eth1, which is the interface the NFS root filesystem is accessed through. My firewall script also kills things, but I should be able to fix that.
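
I haven’t tracked down a proper fix for the eth1 hang yet, but a likely workaround (untested, and assuming Gentoo’s current /etc/conf.d/net syntax) is to tell the init scripts to leave eth1 alone entirely, since the kernel already brought it up for the NFS root:

# /etc/conf.d/net inside the domU
config_eth1=( "null" )    # don't (re)configure eth1; the kernel manages it for the NFS root

# and pull the interface's init script out of the default runlevel
rc-update del net.eth1 default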

Having things running over NFS allows for live migration of running domains. I tried it out a few hours ago, and it’s surprisingly painless, provided the appropriate functionality is enabled in the Xen daemon. One command sends a running domain between physical Xen hosts, which is pretty damned neat. I can see this being tremendously useful in a high-availability sort of environment. If a host machine needs maintenance, you can simply transfer the running child domain to another host, do your business, and transfer it back with only a fraction of a second of downtime.
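
For reference, here’s roughly what’s involved (option names are from the Xen 3.x xend-config.sxp; the domain name, host name, and allowed-hosts pattern are just placeholders):

# /etc/xen/xend-config.sxp on both hosts, then restart xend
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-hosts-allow '^192\.168\.3\.')

# push a running domU from this host to another
xm migrate --live mydomU otherhost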

Xen+NFS Root Filesystem Madness

As part of my continuing experimentation with Xen, I decided a while back to try running the child environments (domUs) from an NFS root filesystem, so I could play with hot-migrating domUs between Xen hosts. I just started playing with it a couple nights ago, and what a pain in the ass it’s been.

First off, I’m using CentOS 5 as the Xen host operating system (dom0) because it’s got Xen support built right in. Handy, right? Sure. It does not, however, have support for NFS root filesystems built into the Xen kernels it supplies. Not a big deal – I compile my own kernels all the time. I added the proper options to the kernel – IP autoconfiguration support, NFS client support, and NFS root filesystem support – and went on my way.
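
For anyone building their own domU kernel, these are the .config options I’m talking about (names as they appear in 2.6.18):

CONFIG_IP_PNP=y        # IP: kernel level autoconfiguration
CONFIG_IP_PNP_DHCP=y   # only needed if you want DHCP rather than a static ip= line
CONFIG_NFS_FS=y        # NFS client support, built in rather than as a module
CONFIG_ROOT_NFS=y      # root filesystem on NFS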

That wasn’t the end of the trouble. While I could get the domU to use the NFS share as its root filesystem, it wasn’t accessing it properly. The root user had no permission to write to anything, so everything was broken. This is typical of an NFS share with the “root_squash” option enabled, but I had specified that my share be exported with the opposite setting (“no_root_squash”). No matter what I did, I couldn’t figure out why root squashing was happening. I could mount the share just fine from another machine, and root squashing wasn’t happening there.
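
For the record, the export line on the server looks something like this (the path is the same one the domU config below points at; adjust the network to match yours):

/xen/domains/test   192.168.3.0/24(rw,sync,no_root_squash)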

I decided to look at the differences in mount parameters between my broken domU and a working system. There were a few differences, but the one causing problems was “sec=null”. That setting disables all authentication for the mount, and all access is mapped to the anonymous user specified on the NFS server.
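
If you want to compare the two yourself, the negotiated mount options are visible from inside each system; the exact output varies, but the part to look at is the sec= flag, something like this:

grep nfs /proc/mounts
# broken domU:    ... nfs rw,vers=3,...,sec=null,addr=192.168.3.10 ...
# working mount:  ... nfs rw,vers=3,...,sec=sys,addr=192.168.3.10 ...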

I had found my problem, but the solution eluded me. I tried every way I could think of to change the mount parameters, but nothing worked. Then I stumbled across this post to the Linux Kernel Mailing List. Apparently, something was broken in the NFS code in the 2.6.18 kernel release that had to do with properly identifying which NFS server version one is connecting to, and CentOS uses the 2.6.18 kernel. I tried applying the patch described in the post, and voila! Everything works!

With everything working, I was able to play with a few other things. I have two physical networks in my Xen boxes, one public and one private. All domUs are connected to the public network on eth0 and to the private network on eth1. I want to mount the NFS shares over the private network, but the default Xen configuration directives only seem to allow mounting NFS roots via eth0. I got around this by specifying the IP configuration in the “extra” directive instead of the ip, netmask, and gateway directives. Here’s the relevant portion of the config file.

nfs_root="/xen/domains/test"
nfs_server="192.168.3.10"
root="/dev/nfs"
extra="ip=192.168.3.150:192.168.3.10:192.168.3.4:255.255.255.0::eth1:"
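
For reference, the kernel parses that string as ip=<client-ip>:<server-ip>:<gateway>:<netmask>:<hostname>:<device>:<autoconf>, so the domU comes up as 192.168.3.150/255.255.255.0 on eth1, with 192.168.3.10 as the NFS server and 192.168.3.4 as the gateway; the empty fields just fall back to their defaults.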

Now for more experimentation!