I've been running mail systems on the Internet since 1993 and seem to have tried just about every combination going; my current iteration, based on mailinabox, has just come to an end. It served me well, but I had a number of trivial-to-moderate issues with it, and eventually the lack of an upgrade path without a re-install, plus a little time on my hands, was the final nail in its coffin. So, what next?

I need (want?) control over my email, partly because I don't trust the faceless machines not to lose it, but mostly because I don't trust the faceless machines not to read it .. which I understand some do .. just to work out what I might be interested in for marketing purposes. After much browsing and tinkering I came back, once again, to Zimbra, which is by far the most comprehensive (best?) Open Source email system out there. The problem is that it's resource hungry and expensive to host, and of course, once the data is thrown in, potentially expensive to back up. What to do? Well, there is a configuration I've tried before that didn't last, for a couple of reasons: firstly my broadband was a tad too slow and unreliable, and secondly getting off-site backups off site over a sluggish broadband connection is a pain. However, as modern technology seems to have addressed these issues to some extent, it seemed worth trying again. So here's what we have now;

Zimbra has a nice feature where it can run either as a main server or in proxy mode. In proxy mode it simply accepts IMAP, SMTP and HTTP requests and forwards them as directed, so I can relocate the 'actual' Zimbra server to the end of a broadband line and, as it happens, run it as a virtual machine on a pre-existing server. The broadband up-link speed is around 10Mb/s, so you wouldn't really know it wasn't directly on the net, and after getting DKIM, DMARC, SPF and RBLs in a row, it seems thus far to be very happy.
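
For what it's worth, a quick way to sanity-check the DNS side of that once it's all set up; the domain name and DKIM selector here are placeholders rather than the real ones;

dig +short TXT example.com                      # SPF, e.g. "v=spf1 mx -all"
dig +short TXT default._domainkey.example.com   # DKIM public key (selector "default" assumed)
dig +short TXT _dmarc.example.com               # DMARC policy, e.g. "v=DMARC1; p=quarantine; ..."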

File Backups

While I'm at it, random rsyncs from cron are not an ideal way of backing up users' files, not least when someone requests the restore of a missing or corrupted file, so again, after much poking around things like "owncloud" and "nextcloud", I came across Seafile. This is the same sort of thing as "owncloud", but comes with a desktop "sync" client for Linux, macOS and Windows. This seemed to be the dream solution, if indeed it would work as described, and after installing it in a VM and connecting it to the office VPN, it too seems to be doing the job. Essentially you install the desktop sync client, tell it (for example) that you want to sync "Documents", "Desktop" and "Pictures" .. and off it goes. After the initial sync, each time you create, amend or delete a file in any of these locations, the change is mirrored on the server. You can then log into the server and see your files via the UI, including their change history; if you leave the defaults alone, that history should never expire, so on top of all else you also get a versioned history of all your work. (Then of course, once you do this on your laptop too, all your laptop and desktop files are in sync!)

The Seafile Server UI
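
Seafile also ships a command-line sync client for headless machines, which gives a flavour of what the desktop client is doing under the hood; the server URL, library ID, account and paths below are placeholders;

seaf-cli init -d ~/.seafile-client        # one-off: create the client's config/data directory
seaf-cli start                            # start the sync daemon
seaf-cli sync -l <library-id> -s https://seafile.example.com \
              -u alice@example.com -d ~/Documents
seaf-cli status                           # see what is currently syncing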

Cool .. but there is a minor fly in the ointment ...

Off-site backups

So this is always an issue. In this particular configuration, Zimbra (the main mail server) is running as a KVM virtual machine backed by a QCOW2 disk image, which means the entire system is contained within one file on the host server. Seafile is set up the same way, so effectively I have two files that contain everything for everyone on the network. (Well, people still have local copies of their files, but all mail and all file backups live in those two images.) Things we need to protect against here;

  1. Users inadvertently killing the files
  2. Files dying with a hard drive
  3. Files getting corrupted by bugs, hardware glitches, or lightning ... (I saw this once, not pretty; always make sure your lightning conductor ground isn't severed and isn't next to the phone line connected to the internal modem of your company's accounts server)

The first is relatively easy: a dedicated user account and strict file permissions. The second is also relatively trivial: just make sure you have a decent RAID array in the server. The third, well, that's more interesting, because I have two files with a total size of around 350GB which are changing all the time.
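
For the first point, something along these lines is all it takes; the owner shown here (libvirt-qemu) is an assumption, as the account QEMU runs under varies by distribution, so treat it as illustrative;

# keep the image files out of reach of ordinary users (owner varies by distro)
chown libvirt-qemu:kvm /var/lib/libvirt/images/*.qcow2
chmod 600 /var/lib/libvirt/images/*.qcow2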

Enter the Borg!

I'm not sure how I missed this; maybe after 20 years with rsync I just wasn't paying attention. However, there is a new kid on the block called BorgBackup, the name is very apt, and for backups it kind of wipes the floor with rsync.

Why? Well, rsync is good at incremental backups at the file level, i.e. only re-copying changed files, but when you're dealing with machine images things are not quite so straightforward. Sure, it can use deltas to only copy the parts of a file that have changed, but with backups I don't want just ONE backup, I want a new backup every day so I can recover from a week ago if need be .. so for my use-case, one week of daily copies already needs well over 2TB of remote storage. Also, for sensitive files (desktop files and email!) I really don't want a bunch of copies of my entire machine sat on an Internet server somewhere, just waiting for a hacker to come along ...

So .. Borg .. firstly, it maintains an archive repository at the server side of the connection, which provides the following facilities;

  1. Compression (always good)
  2. De-duplication (filesystem independent, say goodbye to needing ZFS!)

And on the client side, encryption: only encrypted data ever leaves the machine, so the data stored on the remote server is no good to anybody without your keys and passphrase.

So on a daily (or hourly?) basis, all we need to do is a Borg backup of our two machine image files to our remote Borg backup server. We back up the full machine image each time, and the server effectively stores a complete copy of it, but only the changes are transferred, and de-duplication means each unique block is stored only once .. i.e. effectively it's only storing the changes each time, which, relatively speaking, are tiny.
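
In Borg terms the flow looks roughly like this; the repository URL, passphrase handling, compression and retention settings are illustrative assumptions rather than anything this setup mandates;

export BORG_PASSPHRASE='********'          # better: BORG_PASSCOMMAND pointing at a secret store
# one-off: create an encrypted, de-duplicating repository on the remote server
borg init --encryption=repokey-blake2 ssh://user@backup.example.com/./repo
# each run: push the two machine images - only new/changed chunks cross the wire
borg create --stats --compression zstd \
     ssh://user@backup.example.com/./repo::{now:%Y-%m-%d_%H%M} \
     /var/lib/libvirt/images/zimbra.qcow2 \
     /var/lib/libvirt/images/seafile.qcow2
# keep a rolling window of archives rather than an ever-growing pile
borg prune --keep-daily 7 --keep-weekly 4 ssh://user@backup.example.com/./repo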

Just to make this even easier, I came across a company called BorgBase who provide a hosted Borg Server solution for literally peanuts, and as they only ever see encrypted data, this seems to be a very safe and neat way to tie up the entire package. Just to stick a bow on top, they even provide a new modern UI for managing your backups and repositories.

The BorgBase UI

No pause for thought ..

One last issue with backing up virtual machines: unless you're running on ZFS and backing up ZFS snapshots (which is cool, but I've lost or nearly lost too many ZFS filesystems after upgrades or system issues), you end up needing to stop or pause the virtual machine for the duration of the backup .. and we'd rather not pause the service at all, let alone for an indeterminate amount of time. Enter a newer feature of libvirt: the ability to hive off an external snapshot, run on that snapshot, then merge the snapshot back into the base image, all without taking a breath. While the guest is running on the snapshot, it's safe to back up the base image, which won't be changing under your feet while you copy it. Just for a flavour, this is my backup script thus far based on this mechanism; it seems to work;

#!/bin/bash
#
# Back up a running KVM guest with Borg without pausing it:
#   1. create an external, disk-only snapshot so the base image stops changing
#   2. back up the (now static) base image plus the domain XML with Borg
#   3. blockcommit the snapshot back into the base image and tidy up
#
# Usage: REPO=<borg-repository> ./vm-backup.sh <domain>

export BACKUP_NAME=$(date "+%A_%d_%B_%Y_%H_%M")
DOMAIN=$1

if [ -z "${DOMAIN}" ] || [ -z "${REPO}" ]
then
    echo "usage: REPO=<borg-repository> $0 <domain>"
    exit 1
fi

# path of the guest's primary disk image (assumes the disk is "vda")
IMAGE=$(virsh domblklist "${DOMAIN}" | grep vda | xargs | cut -d" " -f2)
BACKUP=${IMAGE/qcow2/backup.qcow2}

if [[ "${IMAGE}" == *".backup."* ]]
then
    echo "ERROR: snapshot already live - please fix!"
    exit 1
fi
if [ -f "${BACKUP}" ]
then
    echo "ERROR: snapshot file already exists - please delete!"
    exit 1
fi

# save the domain definition alongside the image so the guest can be re-created
virsh dumpxml --migratable "${DOMAIN}" > "${DOMAIN}.xml"

# redirect new writes into an external snapshot (--quiesce needs the QEMU guest agent)
virsh snapshot-create-as --domain "${DOMAIN}"  \
                         --name backup.qcow2   \
                         --no-metadata         \
                         --atomic              \
                         --quiesce             \
                         --disk-only           \
                         --diskspec vda,snapshot=external

# the base image is now static - ship it (and the XML) off to the Borg repository
borg create --stats "${REPO}::${BACKUP_NAME}" "${IMAGE}" "${DOMAIN}.xml"

# merge the snapshot back into the base image and pivot the guest onto it
virsh blockcommit "${DOMAIN}" vda --active --pivot

# make sure the pivot really happened before removing the snapshot file
IMAGE=$(virsh domblklist "${DOMAIN}" | grep vda | xargs | cut -d" " -f2)
if [[ "${IMAGE}" == *".backup."* ]]
then
    echo "ERROR: pivot did not succeed!"
    exit 1
fi
echo "Deleting... ${BACKUP}"
sudo rm "${BACKUP}"
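
To run this automatically, a couple of cron entries along these lines will do the job; the script path, guest names, repository URL and timings are all placeholders for whatever you actually use;

# /etc/cron.d/vm-backups -- nightly image backups (illustrative)
# the Borg passphrase also needs supplying, e.g. via BORG_PASSCOMMAND
0 2 * * * root REPO=ssh://xxxxxx@xxxxxx.repo.borgbase.com/./repo /usr/local/bin/vm-backup.sh zimbra
0 3 * * * root REPO=ssh://xxxxxx@xxxxxx.repo.borgbase.com/./repo /usr/local/bin/vm-backup.sh seafile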

All in all this seems to be a relatively tidy and maintainable solution. It'll probably need some tweaks, but so far it would seem to be a reasonable pattern for a small distributed office at an absolutely minimal cost: DigitalOcean fees for the two instances are around $10 a month, and 1TB of backup space from BorgBase comes in at $85 a YEAR. The solution is currently running 8 users with 10 mail domains (what can I say, we're all schizophrenic), but I'd guess it should be good for a few dozen users at least.

DNS is also running on DigitalOcean, and I have a dumb backup MX sat on another server, just to cover proxy reboots.