Sensitive File Cloud Backups
I have a folder on my laptop that contains sensitive information that I’d like to back up and sync. I’m not worried about the security of the storage provider[1] I’m uploading to, but I don’t want them to be able to parse the files for any purpose.
My threat model is preventing the storage provider from reading the contents. I somewhat trust the cloud provider not to leak my files to external parties.
The usual advice is to encrypt the files with gpg or some other tool before uploading them. However, that advice ignores how to manage the complexity that comes with encryption: the choice of cryptography, key management, and continually syncing any changes to the sensitive files. For my purposes, I want to use age to encrypt and decrypt my files, since it uses modern, secure cryptography and has a much simpler command-line interface than gpg and other solutions.
Encryption:
I used age-keygen to generate the encryption key, optionally encrypting it with a passphrase. age only works on individual files and doesn’t support recursively encrypting a folder’s contents. I don’t want to archive the folder before encrypting it, since it is large and I continually edit and update its files.
I wrote a simple Python script that recursively encrypts all of the folder’s contents, and optionally the path names, into a target directory. It re-encrypts a file only if its contents have changed.
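The script itself is not reproduced here, so the following is only a minimal sketch of the approach described above, not the actual implementation. It assumes age is on the PATH and that encryption is done to a recipient public key with `age -r ... -o ...`. The names `encrypt_tree` and `age_encrypt`, the `.age` suffix, and the local JSON manifest of plaintext hashes are all assumptions made for illustration.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def age_encrypt(src: Path, dst: Path, recipient: str) -> None:
    """Encrypt one file to a recipient public key using the age CLI."""
    subprocess.run(["age", "-r", recipient, "-o", str(dst), str(src)], check=True)


def encrypt_tree(src_root, dst_root, recipient, manifest_path, encrypt=age_encrypt):
    """Mirror src_root into dst_root, encrypting only new or changed files.

    The manifest maps relative paths to plaintext hashes. It is an assumption
    of this sketch and should live outside both the source folder and the
    synced target, since plaintext hashes reveal information about the files.
    Returns the list of relative paths that were (re-)encrypted.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    updated = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root).as_posix()
        dst = dst_root / (rel + ".age")
        digest = sha256_of(src)
        if manifest.get(rel) == digest and dst.exists():
            continue  # unchanged since the last run
        dst.parent.mkdir(parents=True, exist_ok=True)
        encrypt(src, dst, recipient)
        manifest[rel] = digest
        updated.append(rel)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return updated
```

A call such as `encrypt_tree("sensitive", "encrypted", recipient, "manifest.json")`, where `recipient` is the public key printed by age-keygen, would then encrypt only what changed since the previous run. This sketch does not encrypt path names and does not remove encrypted copies of files that were deleted from the source folder.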
Key / Passphrase Backup:
A private key in its raw form should not be uploaded to a storage provider. Instead, back it up to private external storage (SSD, HDD, flash drive, etc.), print it on paper in raw form or as a QR code, or store it in your password manager, assuming your master password is strong.
A private key that’s encrypted with a passphrase can be backed up to the storage provider along with the encrypted files (assuming the passphrase is secure, which is the case if you let age generate it). The same methods above can be used to securely back up the passphrase as well.
Note: When backing up a key to a remote entity, its security is only as strong as the passphrase used to encrypt it. So it’s important to let age generate a strong passphrase, and to avoid giving age a weak, easily crackable passphrase or using a weak master password for your password manager. To test the strength of your passphrase, you can use zxcvbn.
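As a rough complement to a strength checker, the arithmetic behind "a generated passphrase is strong" can be sketched as follows. This is not zxcvbn's algorithm. It only computes the entropy of a passphrase whose words are picked uniformly at random, and the word count and wordlist size below are illustrative assumptions, not a statement of what age actually uses.

```python
import math


def passphrase_entropy_bits(word_count: int, wordlist_size: int) -> float:
    """Entropy in bits of a passphrase of words chosen uniformly at random."""
    return word_count * math.log2(wordlist_size)


# Illustrative values only (assumptions): 10 words from a 2048-word list.
bits = passphrase_entropy_bits(word_count=10, wordlist_size=2048)
print(f"{bits:.0f} bits")  # prints "110 bits"
```

The estimate only holds for randomly generated passphrases. A passphrase a person made up has far less entropy than its length suggests, which is where a tool like zxcvbn is the better check.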
Folder Syncing:
Most storage providers have a local directory, such as the Google Drive folder, whose contents are synced automatically. If the target directory passed to the Python script is a path within that synced folder, the provider’s default syncing mechanism kicks in.
If the target is a non-synced directory, an external drive, or a remote server, rsync (optionally over ssh) can be used to sync the encrypted directory.
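For the rsync case, a minimal sketch of how the sync step could be driven from Python is shown below. The function names, paths, and host are hypothetical. The flags used are standard rsync options: `--archive` preserves the directory structure and metadata, `--delete` removes files from the destination that no longer exist in the source, and `-e ssh` makes the remote shell explicit.

```python
import subprocess
from typing import List


def build_rsync_command(encrypted_dir: str, destination: str,
                        over_ssh: bool = False) -> List[str]:
    """Build an rsync command that mirrors encrypted_dir into destination."""
    cmd = ["rsync", "--archive", "--delete"]
    if over_ssh:
        cmd += ["-e", "ssh"]
    # A trailing slash on the source copies the directory's contents rather
    # than the directory itself.
    cmd += [encrypted_dir.rstrip("/") + "/", destination]
    return cmd


def sync(encrypted_dir: str, destination: str, over_ssh: bool = False) -> None:
    """Run rsync and raise if it exits with an error."""
    subprocess.run(build_rsync_command(encrypted_dir, destination, over_ssh),
                   check=True)
```

For example, `sync("/path/to/encrypted", "user@example.com:backup/", over_ssh=True)` would mirror the encrypted directory to a remote server. Only the encrypted directory should ever be passed as the source.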
Continuous Sync:
Since I regularly update the contents of the folder, I need to re-trigger encryption and sync whenever anything in the folder changes. There are several utilities that watch a folder or set of files and trigger a command when something changes. I’ve tried out a few and settled on watchexec, which I found the simplest to use. (I found watchman hard to use, and I could not get it to watch for any change.)
I set up watchexec to watch the folder and trigger my Python script on any change, then optionally sync with the remote target. The Python script takes care of encrypting only the changes.
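A minimal sketch of that setup, wrapped in Python for consistency with the rest of the tooling, might look like this. It assumes watchexec is on the PATH and uses its `--watch` option followed by `--` and the command to run. The script name `encrypt_folder.py` and the folder paths are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_watchexec_command(watch_dir: str, on_change: Sequence[str]) -> List[str]:
    """Build a watchexec command that runs on_change when watch_dir changes."""
    return ["watchexec", "--watch", watch_dir, "--", *on_change]


def watch(watch_dir: str, on_change: Sequence[str]) -> None:
    """Start watchexec in the foreground; blocks until it is interrupted."""
    subprocess.run(build_watchexec_command(watch_dir, on_change), check=True)


# Hypothetical script name and paths, shown here only to print the command.
command = build_watchexec_command(
    "sensitive", ["python", "encrypt_folder.py", "sensitive", "encrypted"]
)
print(shlex.join(command))
```

Calling `watch(...)` with the same arguments starts the watcher. Because the encryption step skips unchanged files, triggering it on every change stays cheap even for a large folder.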
Note: This can also be accomplished by implementing a virtual file system instead of using a file watcher.
Next steps:
- Add support for encrypting file and folder names.
- Add support for printing a QR code for the key and passphrase.
- Package the tool up so it’s easier for non-technical people to use. Mostly as a learning exercise.
- Research alternative solutions like Cryptomator, rcopy, and gocryptfs, and write up my thoughts about them, especially in relation to path/name obfuscation.
[1] I use Google Drive, but this can be applied to Dropbox, iCloud, etc.