Thursday, July 19, 2012

Hot-detach disks with sanlock

Qemu/kvm allows virtual disks (logical volumes, files, ...) to be attached to and detached from a running domain, and it works great (with virtio).  However, when a lock manager is in the game to protect your virtual disks from being assigned to different domains, you might be surprised to end up losing all your disk locks from the lock manager for that virtual machine.

What's going on?

Libvirt has a plugin for the sanlock lock manager which protects your virtual disks from being corrupted by being accessed from multiple guests.  It works nicely, but hot-detaching has a flaw: the current libvirt code releases all sanlock resources (read: when removing 1 disk, protection for all disks gets lost)!

I wrote a patch to release only the specific resource that you want to hot-detach.  It can be found in the bug report.  The patch has not yet been reviewed or approved by the libvirt devs, but for me it works as expected, and it may help others who depend on it...
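
For those who want to verify the behaviour themselves, a rough check (the domain and target names are just placeholders) is to compare the sanlock resources before and after detaching a single disk from a running domain:

# sanlock client status
# virsh detach-disk <domain name> vdb
# sanlock client status

With the unpatched code, the second status listing should show that all resources for the domain are gone, not just the one belonging to the detached disk.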

Wednesday, July 18, 2012

Open vSwitch active-passive failover - unreachable guests

The current release of Open vSwitch (1.6.1) does not send learning packets when doing an active-passive bond failover. Switches connected to your network interfaces will not know about the network change when LACP is not used. Result: all your virtual machines become unreachable until your guests send out packets that update the MAC learning table of the uplink switches, or until the entry expires from the learning table.
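
For context: the bond in question is a plain active-backup bond on an Open vSwitch bridge, created with something along these lines (bridge and interface names will differ in your setup):

# ovs-vsctl add-bond br0 bond0 eth0 eth1 bond_mode=active-backup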

The next release (1.7?) will include a patch to send learning packets when a failover happens. I tested the patch by doing a manual failover on the host, with the interfaces connected to 2 different switches:

# ovs-appctl bond/show bond0
# ovs-appctl bond/set-active-slave bond0 eth1

Hooray! Not a single interruption in guest connectivity... like it should be :-)

SUSE KVM guest 100% cpu usage - lost network - wrong date

There is something wrong with the "kvm-clock" paravirtual clocksource for KVM guests when running the 2.6.32.12-0.7-default kernel of SLES 11 SP1.

Several times now, I have encountered unreachable virtual machines (lost network), 100% cpu usage of these guests as seen from the host, and when logging in to the guest console for further debugging:

  • wrong date, like Sun Feb   5 08:08:16 CET 2597
  • in dmesg: CE: lapic increasing min_delta_ns to 18150080681095805944 nsec

The fix is simple: update to the latest kernel in SLES 11 SP1, such as 2.6.32.54-0.3.1, which apparently provides a stable kvm-clock module.
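
To check which clocksource a guest is actually using (and which alternatives are available), the sysfs interface inside the guest can be queried:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource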

As a side note: I'm using ntpd on the guests.  Some resources report that you should, others say the opposite.  My experience is that clock drift may appear after live migrations and is either not corrected at all or corrected too slowly.  Ntpd handles this correctly.
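
A quick way to see whether ntpd is keeping the guest clock in check after a live migration is to look at the offset column (in milliseconds) it reports:

# ntpq -p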

Tuesday, July 17, 2012

KVM Live migration of memory intensive guests

Recently, I've had some trouble live-migrating a memory-intensive (jboss) application on a KVM virtual machine. It took ages (read: it failed after 1.5h) to migrate the VM online to another KVM host while a jvm with 4GB heap size (1GB young gen.) was constantly refilling the memory.

I got around this by increasing the migrate-setmaxdowntime value on the host while the domain was migrating away:

# virsh migrate-setmaxdowntime <domain name> 750

This allows the domain to be paused for 750ms while the remaining memory is synced to the new host. Smaller values can be tried first and changed on the fly...
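
To see whether the migration is actually converging, the job statistics of the domain can be watched on the source host; the remaining memory should keep shrinking:

# virsh domjobinfo <domain name>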

This behavior can also be simulated by using the "stress" utility on the guest:

# stress --cpu 1 --vm 8 --vm-bytes 128M --vm-hang 1 --timeout 900s
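
With the guest under load like this, the live migration itself can be started as usual; a typical invocation (the destination host name is just an example) looks like:

# virsh migrate --live --verbose <domain name> qemu+ssh://desthost/system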

If it takes too long, increase the maxdowntime parameter (or increase your network bandwidth). It can also be worthwhile to check whether the migration process really takes advantage of all the available bandwidth, with a utility like "iftop". If needed, increase the migration speed to match the available bandwidth:

# virsh migrate-setspeed <domain name> 1000
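
On the bandwidth check mentioned above: running iftop on the interface carrying the migration traffic (the interface name is an assumption here) quickly shows whether the link is saturated:

# iftop -i eth0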

After all, sometimes it's more acceptable to have a small hiccup than to fail all the way and have to do an offline migration. As long as your application can live with it...