Uploading any file to Google Docs with Python

I recently had a discussion with a client of mine who mentioned that they manually upload backups of their website to Google Docs, and that they had always wished there was a way to FTP them up to save time. That's an interesting idea, said I; I just happen to have written a Google Docs API interaction class in PHP which can be used to upload files to Docs. I wondered whether I could do the same thing in Python and automate the whole backup process…

It turns out that it's actually quite a simple thing to do using Google's GData Python library and a little patience with Google's API documentation.

I wrote a quick proof-of-concept file upload tool, so without further ado, here is the code:


#!/usr/bin/python -u
#
# Google Docs file upload tool
# (c)oded 2012 http://planzero.org/
#
# Requirements:
# gdata >= 2.0.15  http://code.google.com/p/gdata-python-client/
# magic >= 0.1     https://github.com/ahupp/python-magic (Note: this is NOT the
#                  same as the python-magic Ubuntu package, which is older!)
#

# Imports
from __future__ import division
import sys, time, os.path, urllib, magic, atom.data, gdata.client, gdata.docs.client, gdata.docs.data

# Settings
username = ''
password = ''

# Check arguments
if len(sys.argv) < 2:
    sys.exit('Usage: ' + sys.argv[0] + ' <filename> [collection]')

# Heads up
print 'Google Docs Upload Tool - v0.1 - http://planzero.org/'

# Set the filename and collection
filename = sys.argv[1]
collection = sys.argv[2] if len(sys.argv) >= 3 else None

# Open the file to be uploaded
try:
    fh = open(filename, 'rb')
except IOError, e:
    sys.exit('ERROR: Unable to open ' + filename + ': ' + e.strerror)

# Get file size and type
file_size = os.path.getsize(fh.name)
file_type = magic.Magic(mime=True).from_file(fh.name)

# Create a Google Docs client
docsclient = gdata.docs.client.DocsClient(source='planzero-gupload-v0.1')

# Log into Google Docs
print 'o Logging in...',
try:
    docsclient.ClientLogin(username, password, docsclient.source)
except (gdata.client.BadAuthentication, gdata.client.Error), e:
    sys.exit('ERROR: ' + str(e))
except:
    sys.exit('ERROR: Unable to login')
print 'success!'

# The default root collection URI
uri = 'https://docs.google.com/feeds/upload/create-session/default/private/full'

# If a collection name was set
if collection:

    # Get a list of all available resources (GetAllResources() requires >= gdata-2.0.15)
    print 'o Fetching collection ID...',
    try:
        resources = docsclient.GetAllResources(uri='https://docs.google.com/feeds/default/private/full/-/folder?title=' + urllib.quote(collection) + '&title-exact=true')
    except:
        sys.exit('ERROR: Unable to retrieve resources')

    # If no matching resources were found
    if not resources:
        sys.exit('ERROR: The collection "' + collection + '" was not found.')

    # Set the collection URI
    uri = resources[0].get_resumable_create_media_link().href
    print 'success!'

# Make sure Google doesn't try to do any conversion on the upload (e.g. convert images to documents)
uri += '?convert=false'

# Create an uploader and upload the file
# Hint: it should be possible to use UploadChunk() to allow display of upload statistics for large uploads
t1 = time.time()
print 'o Uploading file...',
uploader = gdata.client.ResumableUploader(docsclient, fh, file_type, file_size, chunk_size=1048576, desired_class=gdata.data.GDEntry)
new_entry = uploader.UploadFile(uri, entry=gdata.data.GDEntry(title=atom.data.Title(text=os.path.basename(fh.name))))
print 'success!'
print 'Uploaded', '{0:.2f}'.format(file_size / 1024 / 1024) + ' MiB in ' + str(round(time.time() - t1, 2)) + ' seconds'

Here's a sample of the output, where the file backup_2012-04-13.tar.bz2 was uploaded to the Backups collection:

$ ./gupload.py backup_2012-04-13.tar.bz2 Backups
Google Docs Upload Tool - v0.1 - http://planzero.org/
o Logging in... success!
o Fetching collection ID... success!
o Uploading file... success!
Uploaded 11.72 MiB in 17.75 seconds
$ 
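
As the hint in the script suggests, the ResumableUploader can also be driven one chunk at a time, which makes it easy to print progress for large uploads. Here's a rough, untested sketch of how the single UploadFile() call above could be replaced; the _InitSession()/UploadChunk() usage is based on my reading of the gdata 2.0.15 client, so treat the exact calls as assumptions:

# Sketch only: stands in for the uploader.UploadFile(...) line above, reusing
# the script's uri, fh, file_size and uploader variables. Assumes that
# UploadChunk() returns the new entry once the final chunk is accepted,
# and None for intermediate chunks.
entry = gdata.data.GDEntry(title=atom.data.Title(text=os.path.basename(fh.name)))
uploader._InitSession(uri, entry=entry)

start_byte = 0
new_entry = None
while start_byte < file_size:
    chunk = uploader.file_handle.read(uploader.chunk_size)
    new_entry = uploader.UploadChunk(start_byte, chunk)
    start_byte += len(chunk)
    # Report progress after each chunk (the division is exact thanks to the
    # __future__ import at the top of the script)
    print '  %.1f%% uploaded' % (min(start_byte, file_size) / file_size * 100)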

To use the script, all you need to do is fill in your username and password and make sure recent versions of the gdata library and the magic module are installed (see the comments at the top of the script for specifics). The collection name parameter is optional; if you don't provide it, the file will be uploaded to your root collection. Note that the collection name is case sensitive, so if you get an error about the collection not being found, double-check the case first!
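
If you have pip available, installing both dependencies should be a one-liner. The package names below are the PyPI names as far as I know (python-magic being ahupp's module, not the older Ubuntu package), so double-check them against your system:

$ pip install gdata python-magic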

It should be fairly easy from this base to incorporate OAuth authentication and other goodies. For my part, I'm going to expand the script to automatically create and upload a nice tarball containing a backup of important files on the server each night as part of a cron job. That should make my client's life a little easier :-)
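
In case it's useful, the nightly job doesn't need much: build a dated tarball and hand it to the upload tool. Here's a minimal sketch, where the paths, the 'Backups' collection name and the cron schedule are all placeholders for whatever your server needs:

#!/usr/bin/python
# Nightly backup sketch: bundle a few directories into a dated tarball and
# push it to Google Docs with the upload tool above.
#
# Example crontab entry to run it at 3am every night:
#   0 3 * * * /path/to/backup.py

import os, time, tarfile, subprocess

backup_name = '/tmp/backup_' + time.strftime('%Y-%m-%d') + '.tar.bz2'

# Create a bzip2-compressed tarball of the directories to back up
tar = tarfile.open(backup_name, 'w:bz2')
for path in ('/var/www', '/etc/apache2'):
    tar.add(path)
tar.close()

# Upload the tarball to the 'Backups' collection, then clean up
subprocess.call(['/path/to/gupload.py', backup_name, 'Backups'])
os.remove(backup_name)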

Enjoy!