Thursday, March 13, 2014

Understanding resolvconf behavior on pxe-booted hosts.

I have been learning more than I ever cared to know about resolvconf, and what happens when you use it on a host with a read-only root filesystem. I have been preparing to roll out pxe-boot virtual machines at a second location. The PXE image has the following characteristics.

* NFS-mounted, read-only / filesystem.
* Local writeable disk for swap and /var.
* Unionfs, md-backed /etc (non-persistent, r/w)

I noticed that on the initial boot of a new VM, /etc/resolv.conf would be written correctly. However, all subsequent boots never see the NFS-supplied resolv.conf updated. After a frustrating afternoon of digging, I determined why resolvconf appears to stop working.

Resolvconf stores state data in /var/run/resolvconf. When dhclient is run for an interface, the dhclient-script script calls resolvconf with DNS particulars, resolvconf looks in the interfaces/ sub-directory for an entry named after the interface. If the file does not exist, or does not match the domain/nameserver options received by resolvconf, a new file is written, and appropriate changes are made to /etc/resolv.conf. If the options match what resolvconf already has, no changes are made. The below output shows the contents of the interfaces/ directory on my pxe host.

> ll /var/run/resolvconf/interfaces/
total 8
-rw-r--r--  1 root  wheel  80 Mar 13 18:00 vmx0:dhcp4
-rw-r--r--  1 root  wheel  76 Mar 13 18:00 vmx1

The problem with my pxe hosts lies in the volatile /etc. Every time the host reboots, the modified contents of /etc vanish. /etc/resolv.conf is replaced with the copy from NFS. In my case, this copy reflects the nameservers at the "original" datacenter. Since the state directory for resolvconf exists on the persistent /var, resolvconf sees the old [unchanged] lease data, and assumes everything is peachy with the resolv.conf file.

I don't need the extra features of resolvconf, so I can solve the problem by disabling it. I created a file in the pxe image, /etc/dhclient-enter-hooks, that contains the following.


My initial, more complicated fix, was to create an rc script to re-initialize the resolvconf state directory on every boot. This also worked flawlessly.

# Clean out the contents of the resolvconf state directory. Otherwise,
# /etc/resolv.conf never gets updated after the initial boot of a new pxe host.
# BEFORE: netif
# PROVIDE: clean_resolvconf

echo -n "Cleaning out resolvconf state directory: "

/sbin/resolvconf -I

if [ $? ]; then
echo "OK"
echo "FAILED"