Monday, September 18, 2017

VMFS version confusion and FreeBSD UNMAP

I recently started moving VMware guests to a new ESXi 6.5 host and ran into an unusual problem. Guests would get tied up in knots, endlessly sending the following error messages to the console.

(da0:mpt0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 02 68 00
(da0:mpt0:0:0:0): CAM status: SCSI Status Error
(da0:mpt0:0:0:0): SCSI status: Busy
(da0:mpt0:0:0:0): Retrying command

After a bunch of Google hits (mostly FreeNAS users) that didn't totally add up for me, I have a theory on the actual cause. In short, I think the issue shows up when all of the following conditions are present.

  1. A VMware vSphere 6.5 host, which supports both VMFS5 and VMFS6.
  2. A FreeBSD guest, using...
  3. A virtual disk that is thinly-provisioned, and stored on a VMFS5 filesystem.

VMFS6 supports the use of the UNMAP command, which allows the guest operating system to inform the hypervisor that it is no longer using a block. When the virtual disk is thin-provisioned, the host can then return the block to the pool of available disk space. FreeBSD has included support for this SCSI command for some time.

My theory is this. I think ESXi is lying to the guests. I think that when a thin guest is created on a VMFS5 filesystem, the UNMAP command is still exposed/permitted, even though the underlying filesystem doesn't actually support it. When the guest tries to send the UNMAP command, it gets a bogus response. In the case of FreeBSD, it retries the command perpetually, hanging up the system.
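One way to see what the guest thinks it can do is to look at the delete method FreeBSD picked for the disk. The sketch below assumes the disk is da0, as in the console output above; the helper name delete_method_oid is made up for illustration. It only builds the sysctl name, and the actual query is left as a comment to run on the guest.

```shell
# Hypothetical helper: map a CAM disk name such as "da0" to the sysctl OID
# that reports the delete (TRIM/UNMAP) method FreeBSD selected for it.
delete_method_oid() {
    unit=${1#da}    # strip the "da" prefix, leaving the unit number
    printf 'kern.cam.da.%s.delete_method\n' "$unit"
}

# On the FreeBSD guest (assumes the affected disk is da0):
#   sysctl "$(delete_method_oid da0)"
```

If the reported value is UNMAP, the guest believes the disk accepts the command, which would be consistent with the theory above.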

The search hits I found (linked above) mention switching the virtual disk to a SATA/IDE bus as a workaround. I suspect that this works because the UNMAP command does not exist on those buses, preventing the issue from occurring. I believe that the following solutions are less hacky. 

  1. When creating virtual disks for FreeBSD on VMFS5 filesystems, always thick provision them (I use eager zeroing; I haven't tested lazy). This seems to be the one-size-fits-all solution. It's also worth noting that the ESXi installer uses VMFS5 on the system disk, with no apparent way to use VMFS6.
  2. If you must use thin-provisioning, make sure that it is on a VMFS6 filesystem. I have not tested this extensively, but it seems to work.
  3. Thin-provisioned guests also seem to behave normally on NFS-backed storage.

In my testing, I have not had any issues on guests provisioned per #1.
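Since the fix depends on which VMFS version a datastore uses, it helps to check before placing a thin disk. A minimal sketch, assuming SSH access to the ESXi host and that esxcli storage filesystem list reports the version in its Type column as VMFS-5 or VMFS-6; the function name vmfs5_only is made up for illustration.

```shell
# Hypothetical filter: keep only the lines of a filesystem listing that
# mention VMFS-5, i.e. the datastores where thin FreeBSD disks look risky.
vmfs5_only() {
    grep -F 'VMFS-5'
}

# On the ESXi host:
#   esxcli storage filesystem list | vmfs5_only
```

Anything this prints is a datastore where option 1 (thick provisioning) is the safer choice.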

Saturday, August 26, 2017

Display the listening ports on CentOS 7

It's been a while since I was serious about Linux, but the fun new goodies have lured me back towards the fold. A lot of things have changed over the last few years, some for the better, some not, but that's way beyond the scope here.

One thing that has been removed (at least from CentOS) is netstat. I'm going to call that a win, because invoking netstat always required a trip to the man page, aside from the trusty netstat -lnp. The problem I have is that CentOS (and presumably RHEL) removed netstat, but decades' worth of Google indexing has entrenched netstat as the blessed method of pulling a list of listening sockets.

Installing net-tools seems like the wrong approach. There must be a better way... and there is! Buried in a blog post, I found a conversion reference for netstat functionality. From this, I learned that ss is the replacement for the functionality I need. As an added bonus, it appears that the basic syntax is similar to my beloved sockstat.

For example, my oft-used "what's listening on the network":

ss -l46
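For something closer to the old netstat -lnp habit, a few more ss flags can be combined. This is a rough sketch rather than an exact equivalent: -l is listening sockets, -n is numeric addresses and ports, -t and -u limit it to TCP and UDP, and -p adds the owning process. The wrapper name listening is made up for illustration.

```shell
# Hypothetical wrapper: listening TCP/UDP sockets, numeric, with owning process.
# Extra arguments are passed through to ss, e.g. "listening -4" for IPv4 only.
listening() {
    ss -lntup "$@"
}

# Usage (run as root to see processes owned by other users):
#   listening
#   listening -4
```

Dropping -t and -u brings back the other socket types, such as UNIX domain sockets, in the listing.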