ZFS / ZVOL

Some time ago (2012) I spent a fair bit of time with ZFS; indeed, I was using it as a base for virtual machines. There were a couple of issues I couldn't completely get to grips with, and as time passed I moved back onto Linux RAID10.

This year I ended up playing with ProxMox on some decent hardware and, as luck would have it, found myself running on ZFS again (albeit using Gluster for the VM backing). It worked really well and I was impressed by the apparent speed boost.

The issue that really turned me off in 2012 was the performance of ZVOL instances. Then (and now) they perform really badly in comparison to the underlying physical hardware. However, after a little experimentation with SSDs, it seems that problem can be overcome.

I've just set up a new server with 8T of disk and a 128G SSD, the SSD providing the boot partition (grub still isn't completely happy using root on ZFS). The disk layout looks like this;

# zpool status pool2
pool: pool2
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Nov  5 09:51:21 2015
config:

	NAME        STATE     READ WRITE CKSUM
	pool2       ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sda     ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	logs
	  sdg4      ONLINE       0     0     0
	cache
	  sdg6      ONLINE       0     0     0

errors: No known data errors
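
For the record, a pool with this layout only takes a couple of commands to build: first the raidz1 vdev across the four spindles, then the log and cache devices on the SSD partitions. A rough sketch using the same short device names as the status output above (in practice the persistent /dev/disk/by-id/ names are a safer bet than sdX):

# zpool create pool2 raidz1 sda sdb sdc sdd
# zpool add pool2 log sdg4
# zpool add pool2 cache sdg6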

From there I've created a 2T ZVOL instance, which shows up like this;

# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
pool1            300K  1.76T    96K  /pool1
pool2           2.06T   500G   140K  /pool2
pool2/owncloud  2.06T  2.48T  76.4G  -
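
For reference, creating the ZVOL itself is a one-liner, something along the lines of the following (add -s if you'd rather have a sparse volume than a fully reserved one):

# zfs create -V 2T pool2/owncloud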

So as far as virt-manager is concerned, it sees a virtual block device at /dev/zvol/pool2/owncloud. All well and good, but raw ZVOL performance isn't brilliant, so enter the ZIL and L2ARC cache options;

# zpool iostat -v pool2
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool2        105G  3.52T     28    618   223K  10.6M
  raidz1     105G  3.52T     28    401   223K  2.96M
    sda         -      -     15     19  83.1K  1.52M
    sdb         -      -      5     14  31.2K   824K
    sdc         -      -     15     19  88.1K  1.52M
    sdd         -      -      5     14  26.1K   826K
logs            -      -      -      -      -      -
  sdg4       272M  7.67G      0    217     76  7.59M
cache           -      -      -      -      -      -
  sdg6      9.12G  20.0G     21     16   167K  1.95M
----------  -----  -----  -----  -----  -----  -----

So we have 8G of SSD acting as a dedicated log device (SLOG) to soak up synchronous writes, and another 30G of SSD providing an L2ARC read cache in front of pool2. So essentially all sync writes and a good chunk of the reads are going via a 500MB/sec SSD ...
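
Worth noting is that, by default, only synchronous writes are funnelled through the log device; the sync property on the ZVOL is the knob that controls this. As a sketch, forcing every write through the SLOG (at the cost of extra wear on the SSD) would look something like:

# zfs get sync pool2/owncloud
# zfs set sync=always pool2/owncloud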

And this is the view from inside the VM that's sitting on the ZVOL;

# hdparm -t /dev/vda
/dev/vda:
 Timing buffered disk reads: 988 MB in  3.02 seconds = 327.46 MB/sec
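
Bear in mind that hdparm -t only exercises buffered sequential reads; for a rough feel of write performance from inside the guest, a crude dd to a scratch file (the path and size here are just examples) does the job, bypassing the guest's page cache with O_DIRECT. If compression is enabled on the ZVOL, remember that a stream of zeros will compress away and flatter the numbers:

# dd if=/dev/zero of=/root/ddtest bs=1M count=4096 oflag=direct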

That's not bad for 4 spindles; although I can get more from MD-RAID, I'll gladly sacrifice the additional performance in exchange for checksums, compression and snapshots.
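
Compression and snapshots are one-liners too; for example (lz4 being the usual choice, and the snapshot name here is purely illustrative):

# zfs set compression=lz4 pool2/owncloud
# zfs snapshot pool2/owncloud@pre-upgrade
# zfs list -t snapshot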

So, rock on, ZFS on Linux; all you need to do now is fix the grub problems affecting ZFS root filesystems and we're away! And BTRFS becomes ... well, we don't really need it, do we?