Migrate Paravirtualized Xen to KVM under RHEL

Posted by Roel Gloudemans on 9 July 2009

Update July 11, 2009: Re-registering VMs at RHN uses an extra entitlement with RHEL5.4Beta
Update July 15, 2009: Swap usage, clock and disk cache of the virtual machine
Update July 16, 2009: Replace virsh create with virsh define & start to create a managed domain and not a transient one
Update September 2, 2009: Re-registering with RHN works
Update September 2, 2009: RHEL5.4 has been released. Added a note about services on the physical host
Update September 6, 2009: Updating TimeKeeping and Hugepages

RedHat Enterprise Linux version 5.4 is out. It heralds the arrival of KVM as RedHat's official hypervisor. RedHat will be supporting Xen for the rest of the RHEL5 life cycle, so for the moment, there is no need to migrate to KVM.

However, migrating to KVM has some advantages. For one, KVM looks simpler from the outside. For another, it works with a normal kernel, meaning that all drivers that work on a normal kernel work under KVM as well. This not only covers display drivers, but CPU scaling (dynamically adapting the speed of the CPU) too. That is not only very "green", it also makes a difference in your wallet, or your company's.

RedHat put a lot of work into making Xen easier to manage in RHEL5.0-5.3. As a result, Xen uses a single disk image from which it can boot. The format of this image is the same as for KVM. One would suspect that migrating from one hypervisor to another would be easy, and it is. This blog describes, step by step, how to do it.

The starting situation is a RHEL5.3 Physical host with RHEL5.3 paravirtualized guests. The guests have two networking interfaces, one bridged to the physical network interface, and one bridged to a dummy network interface for an internal host network. Note that the minimum requirement to run with virtio is RHEL5.3.

Note:
I had some trouble with SELinux in the RHEL 5.4 beta. It is related to the attributes on /var/lib/libvirt. I do not use this directory to store the images, but I use raw LVM volumes. To get my system running again, I just disabled SELinux.

How to create an internal network with Xen
Create a dummy network interface on dom0
echo alias dummy0 dummy >> /etc/modprobe.conf
and
cat > /etc/sysconfig/network-scripts/ifcfg-dummy0 <<EOF
DEVICE=dummy0
BOOTPROTO=static
BROADCAST=a.b.c.255
IPADDR=a.b.c.d
NETMASK=255.255.255.0
NETWORK=a.b.c.0
ONBOOT=yes
USERCTL=no
EOF

Create a virtual network using this interface
cat > /etc/xen/scripts/network-xen-custom <<'EOF'
#!/bin/sh
# network-xen-custom
# Exit if anything goes wrong
set -e
# First arg is operation.
OP=$1
shift
script=/etc/xen/scripts/network-bridge

case ${OP} in
start)
$script start vifnum=0 bridge=xenbr0 netdev=eth0
$script start vifnum=1 bridge=xenbr1 netdev=dummy0
;;
stop)
$script stop vifnum=0 bridge=xenbr0 netdev=eth0
$script stop vifnum=1 bridge=xenbr1 netdev=dummy0
;;
status)
$script status vifnum=0 bridge=xenbr0 netdev=eth0
$script status vifnum=1 bridge=xenbr1 netdev=dummy0
;;
*)
echo "Unknown command:${OP}"
echo 'Valid commands are: start, stop, status'
exit 1
esac
EOF

Edit /etc/xen/xend-config.sxp, find the line starting with (network-script and change it to (network-script network-xen-custom)

Now the internal interface can be added to virtual machines by changing one line in the virtual machine definition file from
vif = [ "mac = ab:cd:ef:gh:ij:kl,bridge=xenbr0" ]
to
vif = [ "mac = ab:cd:ef:gh:ij:kl,bridge=xenbr0", "mac = ab:cd:ef:gh:ij:km,bridge=xenbr1" ]
(change ab:cd:... to your own mac addresses, xenbr1 should have a different mac)

First we start by preparing the virtual machines. This can also be done afterward, because a Xen VM will boot under KVM as long as the disk is defined as an IDE disk in the KVM machine definition file. The image will not boot with the virtio drivers unless it is prepared first.

The machine image can also be changed when the virtual machine is not running by using the kpartx, vgscan, vgchange and mount commands, but why go through all this trouble?

Updating the guest image
Start by logging on to the virtual machine as root and installing a non-Xen kernel. Also make sure the system is up to date. I had some issues with newly installed RHEL5.3 kernels under KVM with virtio. Turn off kudzu in the process as well. We will be making all changes by hand.

yum update
yum install kernel
chkconfig --level 2345 kudzu off

Update the grub configuration
The next thing to do is to modify /boot/grub/menu.lst, not only to change the default boot kernel, but to make some modifications to serial console redirection and the system clock as well (the latter two are optional)

Change the default boot kernel: edit the line default=X, where X is the position of the non-Xen kernel in the list of title entries (the first entry is 0).
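
To find the right value for X, the title entries can be listed together with their index. This is only a minimal sketch; the function name list_grub_titles is made up for this example, and it assumes every boot entry starts with a "title" line as in the file shown in this post.

```shell
# Print every "title" entry of a grub menu file with its 0-based index,
# which is the number to use for default=X.
# $1: path of the grub menu file
list_grub_titles() {
    awk '/^[ \t]*title[ \t]/ {
        sub(/^[ \t]*title[ \t]+/, "")
        printf "%d: %s\n", n++, $0
    }' "$1"
}
```

Inside the guest it would be called as list_grub_titles /boot/grub/menu.lst.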

There are two ways to access the console, one is using VNC, virt-manager could be used for that. The other one is the serial console. For this to work a serial port must be added to the virtual machine definition. The console can then be accessed by first finding the ID of the virtual machine using "virsh list" and then doing "virsh console [ID]" (both on the physical system). Find a modified /boot/grub/menu.lst below:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
# initrd /initrd-version.img
#Change the line below to the correct kernel
default=0

timeout=5
# Splash image is of no use for virtual consoles. Comment it out
#splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu

# Redirect grub output to the serial console
serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
terminal --timeout=10 serial console

#### Kernel 0 ####
title Red Hat Enterprise Linux Server (2.6.18-128.1.16.el5)
root (hd0,0)
#This line has an added or changed serial console
#Your root device and kernel version is probably different
kernel /vmlinuz-2.6.18-128.1.16.el5 ro root=LABEL=/ rhgb quiet console=ttyS0,9600n8
initrd /initrd-2.6.18-128.1.16.el5.img
#### Kernel 1 ####
title Red Hat Enterprise Linux Server (2.6.18-128.1.16.el5xen)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.1.16.el5xen ro root=LABEL=/ console=xvc0
initrd /initrd-2.6.18-128.1.16.el5xen.img
#and maybe even more kernels below

Another grub configuration file needs to be modified as well. /boot/grub/device.map points to the system's root device. On a Xen system, this is /dev/xvda. On a KVM system with virtio, this is /dev/vda. So the file should contain:

(hd0)   /dev/vda
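
If you prefer not to edit the file by hand, the change can be scripted. A small sketch, assuming GNU sed (for -i) and a device map that mentions /dev/xvda; the function name fix_device_map is hypothetical.

```shell
# Point grub's device map at the virtio disk instead of the Xen disk.
# The original file is kept with a .xen suffix.
# $1: path of the device.map file
fix_device_map() {
    sed -i.xen 's|/dev/xvda|/dev/vda|' "$1"
}
```

In the guest: fix_device_map /boot/grub/device.map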

Update the serial console settings (optional, if the serial console was not configured under Xen)
To add a serial console to the system the /etc/inittab file must be edited. See if a line starting with "co:2345:respawn" is present. If it is, it is probably pointing to xvc0 (Xen Virtual Console). Change xvc0 to ttyS0. If the line is not present, add it.

Old contents of /etc/inittab
...
...
...
# Run gettys in standard runlevels
co:2345:respawn:/sbin/agetty xvc0 9600 vt100-nav
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
...
...
...

New contents of /etc/inittab
...
...
...
# Run gettys in standard runlevels
co:2345:respawn:/sbin/agetty ttyS0 9600 vt100-nav
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
...
...
...


Note that, when the system is booted, the console is also accessible via VNC.

Make the console on ttyS0 trusted:

echo ttyS0 >> /etc/securetty
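
The two console changes can be combined in one helper that is safe to run more than once. This is a sketch under a few assumptions: GNU sed is available, the console line uses the "co" id shown above, and the function name use_serial_console is my own invention.

```shell
# Move the getty console from the Xen console to the first serial port
# and trust that port for root logins.
# $1: inittab file, $2: securetty file
use_serial_console() {
    inittab=$1
    securetty=$2
    if grep -q '^co:2345:respawn:' "$inittab"; then
        # Existing console line: retarget it from xvc0 to ttyS0.
        sed -i '/^co:2345:respawn:/s/xvc0/ttyS0/' "$inittab"
    else
        # No console line yet: add one.
        echo 'co:2345:respawn:/sbin/agetty ttyS0 9600 vt100-nav' >> "$inittab"
    fi
    # Add ttyS0 to securetty only if it is not listed yet.
    grep -qx 'ttyS0' "$securetty" || echo ttyS0 >> "$securetty"
}
```

In the guest: use_serial_console /etc/inittab /etc/securetty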

Configure the virtio drivers
Open /etc/modprobe.conf in the editor. In our case /etc/modprobe.conf contains the following lines:

alias eth0 xennet
alias eth1 xennet
alias scsi_hostadapter xenblk

Change it to:

alias eth0 virtio_net
alias eth1 virtio_net
alias scsi_hostadapter virtio_blk
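
The same substitution as a scripted sketch, assuming GNU sed and alias lines of exactly the form shown above; the function name switch_to_virtio is hypothetical.

```shell
# Replace the Xen network and block driver aliases by their virtio
# counterparts. The original file is kept with a .xen suffix.
# $1: path of modprobe.conf
switch_to_virtio() {
    sed -i.xen \
        -e 's/^\(alias eth[0-9]* \)xennet[[:space:]]*$/\1virtio_net/' \
        -e 's/^\(alias scsi_hostadapter[0-9]* \)xenblk[[:space:]]*$/\1virtio_blk/' \
        "$1"
}
```

In the guest: switch_to_virtio /etc/modprobe.conf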

Now add the virtio drivers to the kernel boot image (modify this line to match the latest kernel version)

mkinitrd -f --with=virtio_blk --with=virtio_pci --builtin=xenblk initrd-2.6.18-128.1.16.el5.img 2.6.18-128.1.16.el5
Run the command in /boot, where grub expects the initrd image. The --builtin option is only necessary when currently running under a Xen kernel in paravirtualized mode.

Internal clock
The internal clock of KVM is less stable than the clock under Xen. Heavy loads have been known to cause clock drift. There are two workarounds:

  • Boot with divider=10 notsc on the kernel line in /boot/grub/menu.lst and start ntpd at boot (chkconfig --level 2345 ntpd on; configure ntp first)
  • Use the -no-kvm-pit-reinjection option with qemu-kvm. In the final RHEL5.4 release libvirt seems to add this option by default, so everything should work out of the box. You still need to start ntpd though.

Also see (https://bugzilla.redhat.com/show_bug.cgi?id=507834)
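
For the first workaround, the kernel options have to be appended to the kernel line of every non-Xen boot entry. A minimal sketch of how that could be scripted; the function name add_clock_args is hypothetical, and it assumes Xen kernels can be recognized by "xen" appearing on their kernel line, as in the grub file shown earlier.

```shell
# Append "divider=10 notsc" to every non-Xen kernel line of a grub menu
# file. Lines that already carry a divider= option are left alone, so the
# function can be run repeatedly.
# $1: path of the grub menu file
add_clock_args() {
    tmp=$(mktemp) || return 1
    awk '/^[ \t]*kernel[ \t]/ && !/xen/ && !/divider=/ {
             $0 = $0 " divider=10 notsc"
         }
         { print }' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

In the guest: add_clock_args /boot/grub/menu.lst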

Now shut down the virtual system (shutdown -h now)

Updating the host
The physical host needs some updating as well. Before you start, make sure all virtual systems are stopped (xm list) and that you are logged on as root. If RHEL5.4 has already been released, yum will update the system to this version automatically. If not, the system needs to be subscribed to the RHEL5.4 beta channel. You can do this at RedHat Network, if your system is registered with RHN. Also make sure the system has access to the Virtual Platform beta channel. Aside from the updates, some new packages need to be installed, and all virtualization services must be disabled at boot time until we are done with the configuration work.

yum clean all #for safety
yum update
yum install kernel kvm kvm-tools kmod-kvm kvm-qemu-img bridge-utils
chkconfig --level 2345 xend off
chkconfig --level 2345 xendomains off
chkconfig --level 2345 rhn-virtualization-host off

Edit /boot/grub/menu.lst and set the default boot kernel to the newest non-xen kernel (see example grub config)

Network configuration
By default, only a network that is connected via NAT to the outside world is created. There are three options: leave it as is (but check that the IP range does not conflict with anything on the local network), change the IP range, or convert it to a host-only network. I kept the network but adapted the IP range, and created a new network for host-only networking. Be sure to give the new network its own uuid. The format of the uuid must not change; just change any of the hex digits [0-9a-f] in the uuid string.

/etc/libvirt/qemu/networks/default.xml
<network>
<name>default</name>
<uuid>cc06c2a2-0766-45ee-baaa-896e04c7a3be</uuid>
<forward mode='nat'/>
<bridge name='virbr0' stp='on' forwardDelay='0' />
<ip address='a.b.c.d' netmask='255.255.255.0'>
<dhcp>
<range start='a.b.c.e' end='a.b.c.f' />
</dhcp>
</ip>
</network>

/etc/libvirt/qemu/networks/hostonly.xml
<network>
<name>hostonly</name>
<uuid>04255669-803e-d8f6-352a-086fa45ae09d</uuid>
<bridge name='virbr1' stp='on' forwardDelay='0' />
<ip address='a.b.g.h' netmask='255.255.255.0'>
<dhcp>
<range start='a.b.g.i' end='a.b.g.j' />
</dhcp>
</ip>
</network>
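
Instead of changing hex digits by hand, a fresh uuid can be generated. The sketch below writes a host-only definition like the one above with a new uuid. The function name write_hostonly_network is hypothetical, and the fallback to /proc/sys/kernel/random/uuid is my own addition for systems without uuidgen.

```shell
# Write a host-only libvirt network definition with a new UUID.
# $1: output file, $2: bridge IP address,
# $3: first DHCP address, $4: last DHCP address
write_hostonly_network() {
    # Prefer uuidgen; fall back to the kernel's random UUID source.
    uuid=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid) || return 1
    cat > "$1" <<EOF
<network>
<name>hostonly</name>
<uuid>$uuid</uuid>
<bridge name='virbr1' stp='on' forwardDelay='0' />
<ip address='$2' netmask='255.255.255.0'>
<dhcp>
<range start='$3' end='$4' />
</dhcp>
</ip>
</network>
EOF
}
```

For example (the addresses are placeholders only): write_hostonly_network /etc/libvirt/qemu/networks/hostonly.xml 192.168.100.1 192.168.100.128 192.168.100.254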


The host-only network should be started at boot, so ln -s /etc/libvirt/qemu/networks/hostonly.xml /etc/libvirt/qemu/networks/autostart. Note that this network will replace the network coupled to the dummy0 interface, so dummy0 should not start up after a reboot. To do this, move /etc/sysconfig/network-scripts/ifcfg-dummy0 to a safe location, or edit it and change the ONBOOT option from "yes" to "no".

Note:
If you run any services on the physical host that are bound to the network interface of the host-only network, you need to watch the boot order. Most services are started before libvirtd. The virtual bridges only exist after libvirtd has been started, so any service started before libvirtd will not be able to bind to the virbrX interface. Named (bind), for instance, binds to the interfaces. If you use the host-only network to access a nameserver on the physical host, you need to restart named after boot (of the physical host), or the guests cannot access the nameserver.

The bridged network is a bit more complex. Use the configuration file of eth0 as a basis: cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-br0. Then move the IP settings to the bridge. In ifcfg-eth0, remove the BOOTPROTO, BROADCAST, IPADDR, NETMASK and NETWORK lines and add BRIDGE=br0. In ifcfg-br0, change DEVICE to br0, remove the HWADDR line and add TYPE=Bridge. The resulting files:

/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=ab:cd:ef:gh:ij:kl
BRIDGE=br0
ONBOOT=yes

/etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
BOOTPROTO=static
BROADCAST=a.b.c.255
IPADDR=a.b.c.d
NETMASK=255.255.255.0
NETWORK=a.b.c.0
ONBOOT=yes
TYPE=Bridge


Now br0 can be used as a bridge interface. All traffic over the bridge interface is subject to filtering by iptables. I think this is a great feature, as it allows you to centralize firewalling on each host. Even better, the firewall rules are not susceptible to change if the virtual machine is ever compromised. However, Xen worked in a different fashion, and our Xen based images will have their own firewall rules. To let bridged traffic skip the firewall rules of the physical host do:

echo net.bridge.bridge-nf-call-ip6tables = 0 >> /etc/sysctl.conf
echo net.bridge.bridge-nf-call-iptables = 0 >> /etc/sysctl.conf
echo net.bridge.bridge-nf-call-arptables = 0 >> /etc/sysctl.conf
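
Plain echo >> appends a new line every time it is run, so repeating a step leaves duplicate entries in /etc/sysctl.conf. A small helper that replaces an existing entry instead is sketched below. It assumes GNU sed; the function name set_sysctl is hypothetical.

```shell
# Set "KEY = VALUE" in a sysctl.conf-style file, replacing an existing
# entry for the same key instead of appending a duplicate.
# $1: key, $2: value, $3: file
set_sysctl() {
    key=$1
    value=$2
    file=$3
    # Escape the dots so the key is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|$key = $value|" "$file"
    else
        echo "$key = $value" >> "$file"
    fi
}
```

For example: set_sysctl net.bridge.bridge-nf-call-iptables 0 /etc/sysctl.conf. The settings are read at boot; sysctl -p loads them right away.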

Swap usage and caching
If your physical machine is only running virtual machines and the memory is not oversubscribed (all VMs together use no more than 80-90% of total memory), you might want to limit swap usage. Since the kernel sees each VM as a process, the rules for processes apply to VMs as well. One of those rules is that pages that have not been referenced for a while are paged out to swap. The purpose is to free up memory for other processes or for cache, which speeds up the things that are actually being used. For a VM this is unwanted behavior. On a dedicated host nothing else runs, and I don't want the data of my VMs cached on the host, since that is already happening inside the VM. Double caching gives inconsistent performance, let alone the effects when the host crashes.

There are two ways to put a stop to paging and swapping. The first is not to create a swapfile at all. The second one is to set the kernel swappiness parameter to a low value. I've set it to 0.

echo vm.swappiness = 0 >> /etc/sysctl.conf


See the virtual machine configuration file below (the cache='none' attribute of the disk driver) for how to turn off disk caching for virtual machines.

Converting the virtual machine configuration file
There are two ways of converting to KVM. The easiest one is to use virt-manager and create a new virtual machine with exactly the same details as the old one, but point it to a different virtual disk (smallest possible) to prevent overwriting any existing data. Then stop the machine (no need to really install anything) and change the configuration file in /etc/libvirt/qemu by hand to point at the right disk image. This method requires you to reboot first, or else the configuration tools won't see the networks we just created.

The other method is to convert the virtual machine definition by hand. Below is an old Xen definition file (/etc/xen/test1):

name = "test1"
uuid = "4a07fde8-f244-2a6d-9603-85ff2179a9bb"
maxmem = 512
memory = 512
vcpus = 2
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/var/lib/xen/images/test1.img,xvda,w" ]
vif = [ "mac=00:16:3e:1a:d0:96,bridge=xenbr0", "mac=00:16:3e:1a:d0:97,bridge=xenbr1" ]


This information can be converted into a KVM configuration file (/etc/libvirt/qemu/test1.xml). Take care to use the same MAC addresses for the network interfaces, or else they won't be recognized when the virtual machine is booted. Also make sure the serial and console elements of different VMs do not point to the same serial port. You could use the output of virsh list and virsh dumpxml as a starting point. However, you must save that output while the host is still running Xen, that is, before starting with this howto.

<domain type='kvm'>
<name>test1</name>
<uuid>48156322-4e0c-b658-b80a-1bf3b608b49d</uuid>
<memory>524288</memory>
<currentMemory>524288</currentMemory>
<vcpu>2</vcpu>
<os>
<type arch='x86_64' machine='pc'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' cache='none'/>
<source file='/var/lib/xen/images/test1.img'/>
<target dev='vda' bus='virtio'/>
</disk>
<interface type='bridge'>
<mac address='00:16:3e:1a:d0:96'/>
<source bridge='br0'/>
<model type='virtio'/>
</interface>
<interface type='network'>
<mac address='00:16:3e:1a:d0:97'/>
<source network='hostonly'/>
<model type='virtio'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/2'/>
<target port='0'/>
</serial>
<console type='pty'>
<source path='/dev/pts/2'/>
<target port='0'/>
</console>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
</devices>
</domain>
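
One detail that is easy to get wrong in the conversion is the memory size: the Xen file gives it in megabytes (512), while the libvirt file above uses kilobytes (524288). A minimal sketch of that conversion, assuming a Xen file with simple "maxmem = N" and "memory = N" lines as in the example; the function name xen_memory_to_libvirt is hypothetical.

```shell
# Print the libvirt memory elements for a Xen configuration file.
# Xen stores megabytes; the libvirt elements hold the value times 1024.
# $1: path of the Xen configuration file
xen_memory_to_libvirt() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "maxmem" { max = $2 * 1024 }
        $1 == "memory" { cur = $2 * 1024 }
        END {
            # Without a maxmem line, use the memory value for both.
            if (!max) max = cur
            printf "<memory>%d</memory>\n", max
            printf "<currentMemory>%d</currentMemory>\n", cur
        }' "$1"
}
```

For the example above: xen_memory_to_libvirt /etc/xen/test1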


If you are using a partition as a virtual disk, the Xen configuration disk = [ "phy:/dev/vgvm/lvmyvolume,xvda,w" ] translates to:

<disk type='block' device='disk'>
<driver name='qemu' cache='none'/>
<source dev='/dev/vgvm/lvmyvolume'/>
<target dev='vda' bus='virtio'/>
</disk>


If you want to bind the virtual cpu to a physical one use the following vcpu syntax:

<vcpu cpuset='cpu1,cpu2,cpu3'>virtual cpus</vcpu>
for example
<vcpu cpuset='0,1'>4</vcpu>


Also see http://libvirt.org/formatdomain.html. If you want to verify that the XML file is correct, use the virt-xml-validate command.
Now reboot the host.

Starting the virtual machines
You can now start the virtual machines with the virsh command. Open a console directly after starting the domain to monitor boot progress. You might also want the machine to start automatically when the host boots. A domain that has been defined but not started has no ID yet, so use its name.

virsh define /etc/libvirt/qemu/[mymachine.xml]
virsh list --all
virsh start [mymachine's name]
virsh console [mymachine's name]
virsh autostart [mymachine's name]
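
With more than a few guests, the define, start and autostart steps can be looped over all definition files. This is only a sketch: the function name register_domains and the VIRSH variable are hypothetical, and it assumes each file is named after its domain (test1.xml defines test1).

```shell
# Command used to talk to libvirt. Set VIRSH=echo for a dry run that
# only prints the commands.
VIRSH=${VIRSH:-virsh}

# Define, start and mark as autostart every domain file in a directory.
# $1: directory holding the domain XML files
register_domains() {
    for xml in "$1"/*.xml; do
        [ -e "$xml" ] || continue
        # Assumption: the file name without .xml is the domain name.
        name=$(basename "$xml" .xml)
        $VIRSH define "$xml" &&
        $VIRSH start "$name" &&
        $VIRSH autostart "$name"
    done
}
```

A dry run would be: VIRSH=echo; register_domains /etc/libvirt/qemu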

Improving Performance with Hugepages
Note:
There could be some unwanted interaction with SELinux here. If you run into problems, either don't use Hugepages or turn SELinux off

KVM uses 4kB memory pages by default, just like any other process. One of the main differences between an average process and a KVM virtual machine process is the amount of memory allocated to it. Virtual machines normally use hundreds of megabytes or even gigabytes of memory. This means a lot of overhead when the CPU switches between virtual machines, since large memory tables need to be updated each time.

Linux also has Hugepages, special memory pages that are 1, 2 or 4MB in size, shortening the list of memory pages dramatically and improving performance up to 10%. Sadly, support for Hugepages hasn't been implemented in libvirt. There is work on it in Fedora 12, but I don't expect to see those developments in RHEL5. There is a way, however. First, let's start by reserving the Hugepages. The file /proc/meminfo shows the Hugepage size of the system (Hugepagesize) somewhere in its last lines.

Now calculate the number of Hugepages needed for the virtual machines and add at least 6 extra pages for each virtual machine. KVM uses some additional pages when starting up a VM, so if you do not reserve those extra pages, the last VM will not start. Add the total number of Hugepages to your kernel configuration by doing:

echo vm.nr_hugepages = XXXX >> /etc/sysctl.conf
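
The value for XXXX follows from the Hugepage size and the memory of each VM. A sketch of the arithmetic, using the 6 spare pages per VM mentioned above; the function name hugepages_needed is hypothetical.

```shell
# Print the number of hugepages to reserve.
# $1: hugepage size in kB (the Hugepagesize value from /proc/meminfo)
# remaining arguments: memory of each virtual machine in MB
hugepages_needed() {
    size_kb=$1
    shift
    total=0
    for mem_mb in "$@"; do
        # Round up to whole pages, then add the 6 spare pages per VM.
        pages=$(( (mem_mb * 1024 + size_kb - 1) / size_kb + 6 ))
        total=$(( total + pages ))
    done
    echo "$total"
}
```

For two VMs with 512 and 1024 MB on a host with 2048 kB Hugepages, hugepages_needed 2048 512 1024 gives 262 + 518 = 780 pages. The page size can be read with awk '/^Hugepagesize:/ {print $2}' /proc/meminfo.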


Make the Hugepages accessible to KVM

mkdir /hugepages
echo hugetlbfs /hugepages hugetlbfs defaults 0 0 >> /etc/fstab


Now the Hugepages are set up (they become accessible after a system reboot). Let's rig libvirt so the Hugepages are actually used. To do this we need to move the qemu-kvm binary and replace it with a script of our own. The binary is located in /usr/libexec. Execute mv /usr/libexec/qemu-kvm /usr/libexec/qemu-kvm2. Now create the script /usr/libexec/qemu-kvm with the following contents and make it executable (chmod 755 /usr/libexec/qemu-kvm):

#!/bin/bash
exec /usr/libexec/qemu-kvm2 -mem-path /hugepages "$@"


Now reboot the system and start your virtual machines as normal.
Note:
Be careful when updating the package that owns /usr/libexec/qemu-kvm (rpm -qf /usr/libexec/qemu-kvm shows which one). An update will overwrite our script, so you need to reapply the change after each such update.
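
Reapplying the change can be scripted so that it only acts when the wrapper is missing. A minimal sketch, assuming that a file starting with "#!" is our own script and anything else is a freshly installed binary; the function name install_hugepage_wrapper is hypothetical.

```shell
# Put the hugepages wrapper script in place of the qemu-kvm binary,
# unless the wrapper is already there.
# $1: path of the qemu-kvm binary (default /usr/libexec/qemu-kvm)
install_hugepage_wrapper() {
    bin=${1:-/usr/libexec/qemu-kvm}
    # A file that starts with "#!" is already our script: nothing to do.
    if [ "$(head -c 2 "$bin")" = '#!' ]; then
        return 0
    fi
    # Move the real binary aside and write the wrapper in its place.
    mv -f "$bin" "${bin}2" || return 1
    printf '#!/bin/bash\nexec %s2 -mem-path /hugepages "$@"\n' "$bin" > "$bin"
    chmod 755 "$bin"
}
```

Run install_hugepage_wrapper as root after an update has replaced the script.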