Xen+NFS Root Filesystem Madness

As part of my continuing experimentation with Xen, I decided a while back to try running the child environments (domUs) from an NFS root filesystem, so I could play with hot-migrating domUs between Xen hosts. I just started playing with it a couple of nights ago, and what a pain in the ass it's been.

First off, I’m using CentOS 5 as the Xen host operating system (dom0) because it’s got Xen support built right in. Handy, right? Sure. It does not, however, have support for NFS root filesystems built into the Xen kernels it supplies. Not a big deal – I compile my own kernels all the time. I added the proper options into the kernel – IP autoconfiguration support, NFS client support, and NFS root filesystem support – and I went on my way.
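As a quick sanity check before rebuilding, the three options can be looked up in the kernel's .config. The sketch below assumes they correspond to the mainline symbols CONFIG_IP_PNP, CONFIG_NFS_FS, and CONFIG_ROOT_NFS, and that all three need to be built in ("=y") rather than modules, since the root filesystem has to be mounted before any modules can be loaded from it. The sample config text is made up for illustration.

```python
# Assumed symbol names for the three options mentioned above.
REQUIRED_OPTIONS = ("CONFIG_IP_PNP", "CONFIG_NFS_FS", "CONFIG_ROOT_NFS")


def missing_builtin_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not built in (=y) in config_text."""
    built_in = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_ROOT_NFS is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if value == "y":
            built_in.add(name)
    return [name for name in required if name not in built_in]


# Made-up sample: NFS client built as a module, NFS root not enabled.
sample_config = """\
CONFIG_IP_PNP=y
CONFIG_NFS_FS=m
# CONFIG_ROOT_NFS is not set
"""

print(missing_builtin_options(sample_config))
```

To check a real kernel tree, read its .config file and pass the contents in as config_text.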

That wasn’t the end of the trouble. While I could get the domU to use the NFS share as its root filesystem, it wasn’t accessing it properly. The root user had no permission to write to anything, so everything was broken. This is typical of an NFS share with the “root_squash” option enabled, but I had specified that my share be exported with the opposite setting (“no_root_squash”). No matter what I did, I couldn’t find out why root squashing was happening. I could mount the share just fine from another machine, and root wasn’t being squashed there.

I decided to compare the mount parameters on my broken domU with those on the working system. There were a few differences, but the one causing problems was “sec=null”. That setting disables all authentication for the mount, so all access is mapped to the anonymous user specified on the NFS server.
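A minimal sketch of that kind of comparison: take the option string for the NFS mount from /proc/mounts on each machine and report only the options that differ. The two option strings below are invented examples, not the actual output from either system.

```python
def parse_mount_options(option_string):
    """Turn 'rw,vers=3,sec=null' into {'rw': None, 'vers': '3', 'sec': 'null'}."""
    options = {}
    for item in option_string.split(","):
        key, _, value = item.partition("=")
        options[key] = value if value else None
    return options


def diff_mount_options(first, second):
    """Return {option: (value_in_first, value_in_second)} for differing options."""
    a = parse_mount_options(first)
    b = parse_mount_options(second)
    absent = "<absent>"
    differences = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, absent)
        right = b.get(key, absent)
        if left != right:
            differences[key] = (left, right)
    return differences


# Invented option strings standing in for the broken domU and the working machine.
broken_domu = "rw,vers=3,rsize=8192,wsize=8192,hard,proto=udp,sec=null,addr=10.0.0.1"
working_box = "rw,vers=3,rsize=8192,wsize=8192,hard,proto=tcp,sec=sys,addr=10.0.0.1"

for option, (broken, working) in diff_mount_options(broken_domu, working_box).items():
    print("%s: broken=%s working=%s" % (option, broken, working))
```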

I had found my problem, but the solution eluded me. I tried every way I could think of to change the mount parameters, but nothing worked. Then I stumbled across this post to the Linux Kernel Mailing List. Apparently, something was broken in the NFS kernel code in the 2.6.18 release, having to do with properly identifying which NFS server version one is connecting to. CentOS uses the 2.6.18 kernel. I tried applying the patch described in the post, and voila! Everything works!

With everything working, I was able to play with a few other things. I have two physical networks in my Xen boxes, one public and one private. All domUs are connected to the public network on eth0, and the private network is connected on eth1. I want to mount the NFS shares on the private network, but the default Xen configuration directives only seem to allow mounting NFS roots via eth0. I got around this by specifying the IP configuration stuff in the “extra” directive instead of the ip, netmask, and gateway directives. Here’s the relevant portion of the config file.
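The listing itself is not reproduced here. As a rough sketch of the approach only, an xm-style domU config (these files use Python syntax) could pass the network setup through "extra" using the kernel's documented ip= and nfsroot= parameters. Every name, path, bridge, and address below is a placeholder assumption, not a value from the actual setup.

```python
# Sketch of an xm-style domU config; all values are placeholders.
name = "nfs-domu"
memory = 256
kernel = "/boot/vmlinuz-2.6.18-custom-xenU"  # hypothetical custom kernel with NFS root support

# First vif shows up as eth0 (public), second as eth1 (private).
# The bridge names are assumptions.
vif = ["bridge=xenbr0", "bridge=xenbr1"]

root = "/dev/nfs"

# Kernel command-line parameters, per the kernel's nfsroot documentation:
#   ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
#   nfsroot=<server-ip>:<root-dir>
# The device field is what pins the NFS root to eth1; the gateway is left empty
# because the private network is assumed to need none.
extra = "ip=10.0.0.50:10.0.0.1::255.255.255.0:nfs-domu:eth1:off nfsroot=10.0.0.1:/exports/nfs-domu"
```

The ip, netmask, and gateway directives are left out on purpose so that the only IP configuration on the kernel command line is the one in "extra".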


Now for more experimentation!
