Gmail and G Suite Backup

Malware and phishing attacks continue to increase in scale, and they target Gmail and other infrastructure. Many people keep their email mostly in the cloud. Worse, client synchronization deletes from the client any mail that is removed from the mailbox, which makes the client ineffective as an offline backup. A system is therefore needed to ensure there is an adequate backup of email, contact, and calendar information. Several tools can help, but they only work if they are actually used.

Google Takeout

Google Takeout is a tool that makes it easy to download data from about two dozen services, including calendar, contacts, email, and Google Drive documents.

It is probably good to make at least two backups: one for calendar, contact, and mail data, and another for any Google Drive documents. See the next section for details on separating Google Drive documents from generic file sharing/storage on Google Drive.

Note that for large mailboxes, Google Takeout warns:

"Please note that archives may take a long time (hours or possibly days) to create. You will receive an email when your archive is complete."

Google Drive for Files vs. G Suite Apps

I personally use Google Drive as a file backup/sync tool. In addition, I use Google Drive as an editor for documents and spreadsheets that are shared with others. A key to sanity is to have a folder at the root of the Google Drive share named GDOCS or something like that, so that files in the Google Docs, Sheets, etc., formats are kept in that location only.

In this case, a Google Takeout backup can be made of the GDOCS folder, and a regular file-level backup of everything else can be made to something like Amazon S3 or Glacier (or local media storage).

Gmvault command line backup and restore

Gmvault is free and open source. It is a Python-based command line utility. The source code, active issues, and commits are hosted on GitHub. Gmvault can be installed and used on Linux, OS X, and Windows. Once it is installed, IMAP access to the folders needs to be enabled on the Gmail/G Suite mailboxes.

Gmvault has full and quick synchronization modes. A quick sync only goes back 7 days, so if backups have not been made regularly, a full sync (which compares the local backup with the mailbox) is needed. Gmvault can perform fairly quick backups and restores, both partial and full. For an example mailbox with about 15 GB of mail in 45,000 messages, parsing the messages takes about 3 minutes, and processing a day's worth of changes then takes about 15 seconds.
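
The choice between the two modes can be sketched as a small helper. This is only an illustration under stated assumptions: the 7-day window is the figure quoted above, and the names choose_sync_type and QUICK_WINDOW_DAYS are hypothetical, not part of Gmvault.

```python
from datetime import date, timedelta

# Assumption: the quick-sync window is the 7 days quoted in the text.
QUICK_WINDOW_DAYS = 7


def choose_sync_type(last_backup, today, window_days=QUICK_WINDOW_DAYS):
    """Return "quick" if the last backup is recent enough, otherwise "full".

    last_backup is a date, or None if no backup has ever been made.
    """
    if last_backup is None:
        return "full"
    age = today - last_backup
    # A backup dated in the future, or older than the window, forces a full sync.
    if timedelta(0) <= age < timedelta(days=window_days):
        return "quick"
    return "full"


if __name__ == "__main__":
    today = date.today()
    print(choose_sync_type(today - timedelta(days=2), today))
    print(choose_sync_type(today - timedelta(days=30), today))
```

A wrapper script could record the date of each successful run and pass it in as last_backup, falling back to a full sync whenever a run was missed.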

Gmvault uses OAuth2 for authorization (there is also an option to use passwords, for example when testing). It uses a single database for backup but can access multiple Gmail/G Suite mailboxes (though this breaks the ability to do deletions on the local backup). Gmvault can also restore to Gmail/G Suite mailboxes.

There are many configuration options in Gmvault, and it is a mature and robust application. The main directories of note are:

  • ~/.gmvault holds the configuration and the OAuth2 credentials
  • ~/gmvault-db/db is the location of the backup, which is a simple structure of emails stored as files in folders by date
    • by default mail is unencrypted, but it can be encrypted

Because of the simple structure, the entire backup can be archived at the db folder level and then backed up like any other file. Note that attachments are encoded and saved inside each message's .eml file.
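
Because the backup is just message files on disk, it can be inspected with standard tools. The following is a minimal sketch using Python's standard email module, assuming messages are stored as .eml files, or as gzip-compressed .eml.gz files if compression is turned on. The function names iter_messages and summarize are hypothetical, and the exact folder layout should be checked against a real backup.

```python
import gzip
from email import message_from_bytes
from email.policy import default
from pathlib import Path


def iter_messages(db_dir):
    """Yield (path, message) for every .eml or .eml.gz file under db_dir."""
    for path in sorted(Path(db_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(".eml.gz"):
            raw = gzip.decompress(path.read_bytes())
        elif path.name.endswith(".eml"):
            raw = path.read_bytes()
        else:
            continue  # skip anything that is not a message file
        yield path, message_from_bytes(raw, policy=default)


def summarize(db_dir):
    """Return a list of (subject, attachment filenames), one per message."""
    rows = []
    for _path, msg in iter_messages(db_dir):
        names = [part.get_filename() for part in msg.iter_attachments()]
        rows.append((str(msg.get("Subject", "")), names))
    return rows


if __name__ == "__main__":
    for subject, attachments in summarize(Path.home() / "gmvault-db" / "db"):
        print(subject, attachments)
```

Running this against a copy of the backup is a cheap way to confirm that the messages and their attachments are actually readable before relying on the archive.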

Daily backup script

A basic script for a full backup followed by archiving the backup folder, using Fish Shell. Note that this assumes the Google.Drive folder is at ~/Desktop/Google.Drive and that it contains a backup directory.

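#!/usr/bin/env fish
# Full sync of the mailbox into ~/gmvault-db
cd /Applications/gmvault
./gmvault sync user@domain.tld
# Archive the db folder under a dated name, computing the name only once
cd ~/gmvault-db
set archive "mail-backup-"(date +%Y-%m-%d)".tar.bz2"
tar -jcvf $archive db
# Move the archive into the synced Google.Drive backup folder
mv $archive ~/Desktop/Google.Drive/backup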

With this script, 7 GB of email (as reported by Postbox) is backed up into a single 2 GB file, which is then synced with Google.Drive.

Note that it is possible to have multiple database directories, one per mailbox, each with its own authentication. A script can then call Gmvault for each mailbox in turn, potentially backing up many different mailboxes in one run. Because Gmvault is a set of Python scripts, this can be done from a central server with a little configuring and testing.
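
A minimal sketch of such a multi-mailbox run follows. It assumes Gmvault's sync command accepts a -d option for the database directory (worth confirming with gmvault sync -h on the installed version). The mailbox addresses, the ~/gmvault-dbs root, and the function names are hypothetical placeholders.

```python
import subprocess
from pathlib import Path

# Hypothetical mailbox list and database root; adjust for a real setup.
MAILBOXES = ["alice@domain.tld", "bob@domain.tld"]
DB_ROOT = Path.home() / "gmvault-dbs"


def build_sync_command(mailbox, db_root=DB_ROOT, gmvault="gmvault"):
    """Build the command line that syncs one mailbox into its own database."""
    db_dir = Path(db_root) / mailbox.replace("@", "_at_")
    # Assumption: "-d" selects the database directory for this run.
    return [gmvault, "sync", "-d", str(db_dir), mailbox]


def sync_all(mailboxes, run=subprocess.run, **kwargs):
    """Run one sync per mailbox in turn and collect the exit codes."""
    results = {}
    for mailbox in mailboxes:
        completed = run(build_sync_command(mailbox, **kwargs))
        results[mailbox] = completed.returncode
    return results


if __name__ == "__main__":
    for mailbox, code in sync_all(MAILBOXES).items():
        print(mailbox, "ok" if code == 0 else "failed with exit code %d" % code)
```

Keeping one database directory per mailbox avoids the deletion problem described earlier for a shared database, and the collected exit codes make it easy to report which mailboxes failed.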

IMAP Client with Offline Support - Disadvantages

An IMAP client with offline support essentially holds a mirror of all email, probably contacts, and possibly calendar events. The challenge is that when something goes wrong in the mailbox on the server, regular synchronization simply duplicates the problem on the client. What is needed is a preserved image of the mailbox at a point in time. Hence, the configuration above is the best approach.
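
The difference between a mirror and a point-in-time image can be shown with a toy model. This is purely illustrative: the dictionaries stand in for a mailbox, and nothing here talks to a real IMAP server.

```python
import copy


def sync_mirror(server):
    """An offline client converges on the server state, deletions included."""
    return copy.deepcopy(server)


# Toy mailbox: message id -> subject.
server = {1: "invoice", 2: "contract", 3: "photos"}
snapshots = {}

# Day 1: the client syncs and a dated snapshot is taken.
mirror = sync_mirror(server)
snapshots["day-1"] = copy.deepcopy(server)

# A message is removed on the server, by malware or by mistake.
del server[2]

# Day 2: the client syncs again and another snapshot is taken.
mirror = sync_mirror(server)
snapshots["day-2"] = copy.deepcopy(server)

print("in mirror:", 2 in mirror)
print("in day-1 snapshot:", 2 in snapshots["day-1"])
```

After the second sync the mirror no longer holds the deleted message, while the day-1 snapshot still does, which is why the dated archives produced by the backup script matter.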

POP Client Backup - Limited and Not Recommended

A very simple approach is to use a POP email client and make sure it is set to leave mail on the server. However, POP doesn't support folders and doesn't support remote deletes (it only fetches new email). So POP is limited to essentially keeping a backup of new mail, and this only works if the client is running all the time, so that it fetches new mail before that mail is sorted or deleted.