Longest Post About Xen Ever.

WARNING: Gratuitious technobabble ahead. Moms and non-linux types may want to just go on with their day and skip this one.

In my last post I alluded to doing some new fun stuff with AoE and Xen. One of the things that I like so much about Xen is its ability to combine the roles typically performed by multiple machines down into one machine. A few things that require specialized hardware, such as my MythTV recording backend machine, are not really feasible to put into virtualized environments, but there are quite a few tasks that work very well. One of my goals with my setups is to move everything I can onto a single “cluster,” with a single machine providing shared storage, and two (or more) machines running Xen environments. That way, I can add new instances with distinct functionality without adding (much) hardware.

One of the roles that has been resistant to my Xen transition is my firewall machine. My home network is divided into three separate logical networks, each with a different purpose. At one point, the firewall machine had four ethernet cards in it – one per network, plus one that connected to the internet. I was able to cut that down to two and eventually one physical interface by acquiring some VLAN-aware network gear. By using VLANs at the network level, I was able to push all four networks through one physical link. Having reached that goal, I knew it was feasible to run my firewall machine in a Xen instance. Actually, I knew it was possible since I’d read about other people doing it, but those stories came with very little in the way of implementation detail. I figured that if I could get the same VLAN information into a Xen instance, my goal would be attainable. In my ideal setup, I would be able to pass VLAN encapsulated traffic to an instance, traffic not encapsulated in a VLAN, or both.

Getting VLANs to work properly inside of a Xen instance was not as easy as I would have liked, however. I gave it a cursory attempt a few weeks ago, and it failed pretty badly, so I moved on to something prettier and shinier. I didn’t really dig much into why it failed, but it turns out that it was pretty simple. In my intial attempt, I tried to run the standard Xen network-bridge script against a VLAN interface defined by the 8021q kernel module. When I attempted to create the Xen network bridge on this interface, everything seemed to work, but in the end, only the bridge itself was present. The VLAN interface and its associated virtual interface were nowhere to be found. I tweaked a few things, with no change in the end result. I must not have been feeling particularly curious that day, and I let the attempts end in failure.

I decided to give things another shot last weekend. I searched around and found a few other people making the same attempts I was. One guy even published a few scripts that allowed him to connect VLAN interfaces to his network bridges so that each instance could have its own VLAN. While somewhat similar to my goal of passing both encapsulated and non-encapsulated traffic into my Xen instances, it wasn’t exactly the same. I looked over his scripts in the attempt to figure out what they were doing, and then looked at the default scripts that came with Xen. After an hour or two of digging through code, I realized why my previous attempt failed. When the network-bridge script does its thing, it takes an interface, puts it into an inactive state, renames it, gives its old name to one of the virtual ethernet interfaces provided by the netloop kernel module, and then attaches both of the aforementioned interfaces to a newly-created bridge interface. The deactivation of the physical interface is where things went awry. The script makes a call to the ‘ifdown’ script, which deactivates the interface. On a normal physical interface, there isn’t much to do short of just downing the interface, but with a VLAN interface, it actually destroys the interface in the process of disabling it. I had found my linchpin… so I thought.

I whipped up a quick patch to alter the behavior of the network-bridge script so it wouldn’t make the call to ifdown, which preserved the VLAN interface. I did a few tests, and things worked as I felt they should. I set up all of the VLAN interfaces in the domain0 environment, configured the network-bridge script to create a Xen bridge for each of them, and then let it rip. All of the interfaces that needed to be renamed were renamed, all the proper bridges were created, and everything was connected where it should have been connected. Everything looked great – except it completely didn’t work. After I changed the network port on my test machine to trunk all VLANs instead of just utilizing my main VLAN, everything stopped working, even though I had the right network stuff configured on the VLAN interfaces in the domain0 environment. I did some simple ping/tcpdump tests, and everything looked right, except the ping traffic was never making it into the dom0 like it should have. Outbound pings were visible in the proper VLAN interface, they egressed through the proper phsycial interface, made it to the destination machine, came back in through the proper physical interface… and then disappeared. The packets never made it back to the “physical” VLAN interface in the dom0, which prevented any kind of success.

After a good amount of expletives and some pacing around the apartment, I had an idea. My broken setup was connecting two separate types of virtual interfaces to the physical ethernet device in the whole initialization process – the VLAN interfaces and the bridge for trunked VLAN traffic. I decided to see if my missing packets were entering the bridge instead of the VLAN interface, and sure enough, they were. I had found another linchpin.

Seeing as how the traffic flow seemed to favor the bridge over the VLAN interfaces, I thought I was stuck. If the traffic is going into the bridge, its game over for the non-VLAN-encapsulated traffic, right? Wrong. The whole purpose of the network-bridge script is to masquerade the physical interface, replace it with a virtual one that is only visible to the dom0 environment, and connect them both to a bridge that is used pass data in and out of the dom0 and domU environments. The virtual interface acts as a normal interface in the dom0, so I figured it would be worth a try to configure the VLAN interfaces on the virtual interface instead of the physical one, and then build their bridges from that. I was a bit pessimistic at this point and really didn’t expect it to work, but it sure did. Traffic started flowing properly to the dom0 environment, which meant the domUs would probably work too. I brought up a test environment, and all of the networking stuff worked as intended. It could see both encapsulated and non-encapsulated VLAN traffic.

It was Miller Time at that point, and pretty late to boot. I had my celebratory beer and hit the sack soon after. I waited until this past weekend to attempt my goal of moving my firewall setup into a Xen instance. I’ve become pretty good at moving Linux environments between physical machines and virtual ones, so the actual move was pretty painless. Once everything was ready, I stopped the networking stuff on my physical firewall machine and brought it up in the virtual one. The only issue to speak of was my cable modem locking in on the MAC address of the ethernet card in my physical firewall machine. I changed the virtual MAC address inside of the firewall Xen instance to match the physical firewall interface, and everything fell into place. I now have a firewall instance that I can migrate between my Xen host machines with no interruption. Great success!

I know I promised diagrams, but I’m not going to put them here. This post is already long enough as it is. I plan on making a wiki entry for the actual technical details (not just the technical description) of the process, so stay tuned for that. My diagrams will be in there.

Leave a Comment

NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>