IMAP Backup Script

If you absolutely must have SSL support, have a look at imapsave, the relevant tidbits of which I will eventually incorporate here.

As depicted here, I've been wrestling with the need to back up my IMAP archive server, and this is what I have come up with in the meantime. Since it's late and there isn't much point in dallying around, let's go straight to the legalese, shall we?

IN NO EVENT WILL I BE LIABLE FOR ANY DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, THOSE RESULTING FROM LOST PROFITS, LOST DATA, LOST REVENUE OR BUSINESS INTERRUPTION) ARISING OUT OF THE USE, INABILITY TO USE, OR THE RESULTS OF USE OF, THIS PROGRAM. WITHOUT LIMITING THE FOREGOING, I SHALL NOT BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES THAT MAY RESULT FROM THE USE OF THIS SCRIPT OR ANY PORTION THEREOF WHETHER ARISING UNDER CONTRACT, NEGLIGENCE, TORT OR ANY OTHER LAW OR CAUSE OF ACTION. I WILL ALSO PROVIDE NO SUPPORT WHATSOEVER, OTHER THAN ACCEPTING FIXES AND UPDATING THE SCRIPT AS IS DEEMED NECESSARY.

There. That means NO SUPPORT, get it?

Features:

  • Copies every single message from every single folder in your IMAP server to your disk.
  • Does incremental copying (i.e., tries very hard to not copy messages twice).
  • Tries to do everything as safely as possible (only performs read operations on IMAP).
  • Generates mbox formatted files that can be imported into Mail.app (just choose "Other" on the import dialog).
  • Is completely and utterly free (distributed under a BSD license).
  • I do not support it, or intend to reply to any e-mails regarding problems in it.

Download/Usage:

Grab the script from here and use it from a terminal like so (you can do python imapbackup.py if you don't want to rename the file or set it as executable):

$ imapbackup -?
Usage: imapbackup [OPTIONS]
-z --compress                   create/append to gzip compressed files (EXPERIMENTAL)
-s HOSTNAME --server=HOSTNAME   connect to HOSTNAME
-u USERNAME --username=USERNAME with USERNAME
-p PASSWORD --password=PASSWORD with PASSWORD (you will be prompted for one if missing)

Mailbox files will be created IN THE CURRENT WORKING DIRECTORY

$ imapbackup -s imap.local --username=me
Password:

\ IMAP: Scanning INBOX
...
  IMAP: Found 2440 messages in Personal/2005/Q3.
\ MERGE: Copying from Personal/2005/Q3 to Personal.2005.Q3.mbox
  APPEND: Appended 2440 messages to Personal.2005.Q3.mbox
  (752171854 bytes, of which the largest message was 20211381 bytes)
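
Judging from the sample run above, each folder ends up in a flat file named after its path, with the separator swapped for a dot. A minimal sketch of that mapping, assuming a hypothetical helper called mbox_filename (the name and signature are mine, not the script's):

```python
def mbox_filename(folder, separator="/", suffix=".mbox"):
    """Map an IMAP folder path to a flat file name for the current directory.

    Hypothetical helper: the separator default mirrors the '/' seen in the
    sample output and would need changing for servers that use another one.
    """
    # Replace the hierarchy separator with dots, as in the sample run.
    return folder.replace(separator, ".") + suffix


if __name__ == "__main__":
    print(mbox_filename("Personal/2005/Q3"))
```

Keeping the separator as a parameter is one way to live with servers that do not use '/' without touching the rest of the code.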

Caveats:

These are missing features, ideas for enhancements, or both.

  • EXPERIMENTAL: Outputting gzip compressed files is doable, even for appending, but takes a long time to perform Message-Id scanning due to internal Python mechanics (my guess is that it tries to seek() inside the file a lot). Since bzip2 only lets you create completely new files, I'm considering re-implementing this in a different way:
    • If the mailbox file does not exist, there is no need to scan it for existing messages, so I can use Python's native gzip or bzip2 file wrappers and create a new file.
    • If the mailbox file already exists and is compressed, it's probably best to have it be completely decompressed first (by invoking gzip or bzip2) and then scanned and appended to in the usual way, re-compressing it afterwards. This takes up a bit more disk space temporarily (especially in my case, with 0.5GB per mailbox), but lets me deal with gzip and bzip2 in precisely the same way and might be more efficient for large volumes of "new" messages (have to do some trial runs to figure out).
  • Code is fairly clean now, but patching socket bloated the script a bit too much for my taste.
  • The text-based spinner is a hack job (lifted the code from here, have to review it).
  • No provision for backing up subsets of folders (yet - am trying to figure out if I really need to do, say, full-blown Regular Expressions for folders, or a simple substring match plus an exclusion list).
  • No provision for injecting messages back into an IMAP server (remember, this is a backup utility, not imapsync). But since importing things into Mail.app tends to take a good while (especially when you have a lot of e-mail) and there's no point in importing into Mail.app only to drag things out to the server, I'm strongly considering implementing this (shouldn't be any trouble doing the actual copy, the command-line UI is the real pain).
  • May mark messages as read in some IMAP servers (very heavily dependent on the server itself and its support of PEEK, is NOT fixable for those servers that don't support it, so try to back up only when you've read all your mail...).
  • Does NOT back up IMAP flags (which also includes user-visible flags in Mail.app and Thunderbird), because - guess what - those aren't stored in the message headers. It does, however, back up MailTags data, even though MailTags may not pick up flags from restored backups without a little nudging.
  • SSL connections are NOT supported, which is sure to annoy those wanting to use this against .Mac (not a priority for me since I back up my IMAP server over a LAN, and it seems that I may have to patch socket or its SSL equivalent somewhere else).
  • Authentication is secured by whatever Python's imaplib feels like using - i.e., the password is not necessarily encrypted over the wire (again, stuff like CRAM-MD5 is not a priority for me for the same reasons as SSL, and a wontfix if I ever get SSL working).
  • Some sort of GUI might be possible by using EasyDialogs or the native Tcl/Tk bindings on the Mac (not a priority, but fun to tinker with).
  • There is no detection of IMAP path separators whatsoever - i.e., if your IMAP server doesn't use '/' in nested folders, you'll have to tweak a constant in the script (no intention of fixing this).
  • IMAP and disk files are scanned separately from the merge operation, which is nice, re-usable and safe. However, it may be nicer (in terms of reporting overall progress) to interleave IMAP scanning with merging after pre-scanning the disk files (not a priority until there is some sort of GUI).
  • FIXED in 1.0.1a: For some reason, the qmail/Google/Groups combination tends to add a second Message-Id header to some messages. Since IMAP seems to prefer the first one and PortableUnixMailbox the last one, repeated runs will result in duplicate messages in the .mbox file (non-critical, but annoying).
  • There are occasional instances where Message-Ids cannot be parsed when read from the disk. No data is lost, but this may result in duplicate messages. This is due to my dumping the IMAP RFC 2822 format verbatim into a file (which is the safe, high-fidelity approach) and not being clever enough at parsing Message-Ids from a file (it is easier to parse the IMAP values - which the server takes some care to format correctly - than to try to anticipate all the possible horrors that the originating MUA or MTA perpetrated on the headers).
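
The last two caveats come down to pulling one Message-Id out of a raw message on disk. The following is only a sketch of one way to do that in Python 3, not the script's actual parser: it looks at the header block only, unfolds headers that were wrapped across lines, and keeps the first Message-Id it finds (on the assumption, taken from the note above, that IMAP prefers the first one). The function name first_message_id is hypothetical.

```python
import re


def first_message_id(raw_message):
    """Return the first Message-Id in the headers of a raw message, or None.

    Hypothetical helper working on text; real mailboxes may need decoding
    from bytes first.
    """
    # The header block ends at the first blank line; ignore the body.
    headers = re.split(r"\r?\n\r?\n", raw_message, maxsplit=1)[0]
    # Unfold continuation lines (a line break followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", headers)
    for line in unfolded.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "message-id":
            value = value.strip()
            if value:
                return value
    return None


if __name__ == "__main__":
    sample = "Message-Id:\r\n <abc@example.com>\r\nSubject: hi\r\n\r\nbody\r\n"
    print(first_message_id(sample))
```

Anything this returns None for would still need a fallback identifier, otherwise the message gets copied again on the next run.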

FAQ:

  1. Does this FAQ mean you'll support the thing after all?

    No, merely that I care about you.

  2. I get some Python errors about (something)!

    Of course you do. If you're running Python 2.4 on a Mac, then you're running A NON-STANDARD CONFIGURATION and you ought to be able to figure it out yourself. Nevertheless, I took care to comment the script so that you can figure out precisely what I had to patch to get rid of MemoryError in 2.3.5.

  3. I get some Python errors and then something about AUTH!

    Of course you do. That's because the script does not support CRAM-MD5 authentication, and your server wants it to. Patches to do this are welcome, since I have no easy way to test this at the moment.

Revision History:

1.2e:

  • Fixed a minor issue with Microsoft Exchange where Exchange would reply to a query for RFC822.SIZE with more information than expected (it apparently uses FLAGS as markers for Calendar items, and insists on sending them even if not asked to).
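
One way to tolerate that kind of chatty reply is to search the response for the size item instead of assuming it is the only thing there. This is a sketch under assumptions, not the fix that went into the script: it assumes the standard RFC822.SIZE fetch item and response lines as bytes (which is what Python 3's imaplib hands back), and parse_size is a hypothetical name.

```python
import re

# Matches the size item wherever it appears in a FETCH response line.
SIZE_ITEM = re.compile(rb"RFC822\.SIZE\s+(\d+)", re.IGNORECASE)


def parse_size(fetch_line):
    """Return the message size from a FETCH response line, or None."""
    match = SIZE_ITEM.search(fetch_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Extra FLAGS data before the size is simply skipped over.
    print(parse_size(b"7 (FLAGS (\\Seen) RFC822.SIZE 20211381)"))
```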

1.2b:

  • Added bzip2 compression by popular demand. However, the unwashed masses will miss out on being able to append to existing .bz2 files (which is the nice thing about gzip). As a result, the code is now sprinkled with liberal warnings and a few more checks. Compression is likely to be done in a different way (by invoking external commands) if future tests show it to be faster.
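
For the record, the decompress, append and recompress route can be sketched as below. Note the swap: this uses Python 3's bz2 module rather than external commands, and the helper name and the temporary file suffixes are made up for illustration. It needs enough free disk space for the fully expanded mailbox.

```python
import bz2
import os
import shutil


def append_to_bz2(path, new_data):
    """Append bytes to a .bz2 file by expanding, appending and recompressing."""
    plain = path + ".tmp"   # hypothetical scratch file names
    packed = path + ".new"
    # Step 1: expand the existing archive (if any) into a plain scratch file.
    with open(plain, "wb") as out:
        if os.path.exists(path):
            with bz2.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
        # Step 2: append in the usual way.
        out.write(new_data)
    # Step 3: recompress next to the original, then swap it into place.
    with open(plain, "rb") as src, bz2.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(packed, path)
    os.remove(plain)
```

Writing to a sibling file and renaming it at the end means an interrupted run leaves the previous archive untouched.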

1.1b:

  • Added experimental gzip compression (at level 9). Works perfectly for new files, seems to work OK for appending to existing files. Makes for rather slow checking of pre-existing messages in local files, but is relatively fast for snapshots and saves a lot of disk space. Obviously, you'll have to decompress the files to import them into Mail.app.
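
Appending to a gzip file works because a gzip file may hold several compressed members back to back, and readers treat them as one stream. A minimal Python 3 sketch of that behaviour (the helper names are hypothetical, and this is not lifted from the script):

```python
import gzip


def append_gzip(path, data):
    """Append bytes to a gzip file at level 9, creating it if needed."""
    # Mode "ab" adds a new gzip member after whatever is already there.
    with gzip.open(path, "ab", compresslevel=9) as f:
        f.write(data)


def read_gzip(path):
    """Read the whole file back; all members come out as one stream."""
    with gzip.open(path, "rb") as f:
        return f.read()
```

Checking for existing messages still means decompressing everything written so far, which fits the slow scanning noted above.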

1.0.1a:

  • Test universe now spans over 8GB of e-mail messages (largest around 25MB).
  • Made Message-Id collection a bit more resilient by parsing around malformed headers and adding a bit more error handling (there was no data loss, but unintended dupes).
  • Fixed trailing newlines in some saved messages.
  • Possible fix for multiple Message-Ids (seems to work for the infamous qmail/Google/Groups combo).
  • Cleaned up some bits of code.

1.0:

  • Added command-line arguments, general code cleanup. First public version.

0.8 - 0.9:

  • Worked around Bug #1092502 on Mac OS X by patching socket._fileobject.read (with thanks to Bob Ippolito).
  • Removed PARTIAL fetches (no longer necessary after the fix).

0.3 - 0.7:

  • Implemented PARTIAL message fetches (very large messages are fetched in chunks to avoid wasting RAM).
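
To illustrate the idea of fetching in chunks, here is a sketch that only builds the fetch items. It assumes the IMAP4rev1 partial form BODY.PEEK[]<offset.length> rather than whatever these early versions actually sent, and the function name and default chunk size are made up.

```python
def partial_fetch_items(size, chunk_size=1 << 20):
    """Yield fetch items that cover `size` bytes in chunk_size pieces."""
    for offset in range(0, size, chunk_size):
        length = min(chunk_size, size - offset)
        # PEEK is meant to avoid setting the \Seen flag on servers that honour it.
        yield "BODY.PEEK[]<%d.%d>" % (offset, length)


if __name__ == "__main__":
    for item in partial_fetch_items(2500, 1000):
        # Each item would be passed to something like
        # imaplib's conn.fetch(message_number, "(%s)" % item).
        print(item)
```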

0.2:

  • Worked around extremely asinine Exchange behavior whereupon it will break Message-Id headers across several lines (probably caused by overlong headers and overzealous wrapping).

0.1:

  • Implemented IMAP and mailbox scanners to enumerate messages per Message-Id.
  • Implemented Message-Id generation for messages lacking it (like Drafts from some MUAs and .Mac news, which is directly injected into inboxes).
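
For messages that arrive without a Message-Id, the generated value has to come out the same on every run, or incremental copying breaks. One way to get that, shown purely as an assumption about how it could be done (the script's real scheme may differ, and the function name and domain are invented), is to hash the raw headers:

```python
import hashlib


def synthetic_message_id(raw_headers, domain="imapbackup.invalid"):
    """Build a repeatable Message-Id from the raw header bytes.

    Hypothetical scheme: the same headers always give the same id, so the
    message is recognised on the next run instead of being copied again.
    """
    digest = hashlib.sha1(raw_headers).hexdigest()
    return "<%s@%s>" % (digest, domain)


if __name__ == "__main__":
    print(synthetic_message_id(b"From: me\r\nSubject: draft\r\n"))
```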