22 May 2013

Google Sites is a popular host for web pages because of its cost (nothing), its integration with Google's suite of productivity tools, and its ease of use. To support automated updates of web pages and other administrative functions, Google offers a programmatic interface (API) to its web-based tools, called GData. Following is an example of authenticating to Google and updating a page on a Google Sites website.

1 Why update Google Sites pages with Python?

Many people have run their own local web servers since the dawn of the web, and have taken advantage of that ownership by programmatically updating pages within their websites. This was often done with automatically generated HTML pages, server-side includes, or local scripts run by the web server.

Now, however, many people are more interested in using cloud-based services for web pages: they leave all of the operations to someone else, they can often scale better than a local web server could, and they often have very friendly interfaces that allow updates by people who are not fluent in HTML or other web technologies. A common choice for web pages hosted on someone else's web server is Google Sites.

Google Sites is excellent for creating web pages with rich content (videos, images, text) and controlling access to that content. Local scripts and server-side includes are not permitted, but it is possible to programmatically update Google Sites pages.

2 How to update Google Sites pages with Python

Google provides a library of tools called GData that allows computer programs to read data from and write data to many of Google's services. The GData libraries are available in several languages (for more information see https://developers.google.com/gdata/), but the easiest for me to use was Python, even though I don't really know how to program in Python.

2.1 Installing GData

First, I got the GData Python client library from https://developers.google.com/gdata/ and installed it in my home directory by finding the setup.py in the GData distribution and typing the command:

python setup.py install --home=~/python/

I also ran the included tests to make sure it was all working.

GData comes with everything you need to work programmatically with information at Google.

2.2 Creating an API Project

In order for your Python program to talk to Google, you need to create an API Client ID, which you can do for free at https://code.google.com/apis/console. Creating one gives you a Client ID and a Client secret, both of which you'll need in your Python program.
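One way to keep the Client ID and Client secret out of the program text is to read them from the environment. This is only a sketch: load_client_credentials is a hypothetical helper, and GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET are variable names invented for this example, not anything defined by Google or GData.

```python
import os


def load_client_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET are hypothetical names
    chosen for this sketch; use whatever names suit your setup.
    """
    if env is None:
        env = os.environ
    try:
        return env['GOOGLE_CLIENT_ID'], env['GOOGLE_CLIENT_SECRET']
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise RuntimeError('Please set the environment variable %s'
                           % missing.args[0])
```

The two returned values could then be assigned to the MyClientId and MyClientSecret names that the authorization code uses later.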

2.3 The Beginning of my Python program

To get started, I imported the Python libraries I knew I'd need. I learned about the required gdata libraries from the API documentation.

import sys
import os
import time
# adjust the next line for your installation of gdata
sys.path.append('/Users/acaird/python/lib/python')
import atom.data
import gdata.sites.client
import gdata.sites.data
import gdata.gauth

This block of code imports the standard Python libraries sys, os, and time, and you'll see those used later (in the case of sys, not too much later).

Next I use the sys library to tell Python where I installed the gdata library with the sys.path.append function. You will almost certainly want to edit that. You can also use the PYTHONPATH environment variable.
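If you would rather not append a directory blindly, the same step can be made a little more defensive. The following is a minimal sketch, assuming the library was installed under ~/python as above; add_library_path is a hypothetical helper, not part of GData.

```python
import os
import sys


def add_library_path(path):
    """Append path to sys.path if it is a directory and not already listed.

    Returns True if the path was added, False otherwise.
    """
    path = os.path.expanduser(path)
    if os.path.isdir(path) and path not in sys.path:
        sys.path.append(path)
        return True
    return False


# Adjust this directory for your installation of gdata.
add_library_path('~/python/lib/python')
```

A False return value means the directory was missing or already on the path, which is a useful hint if the later import of gdata fails.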

Once the program can find the gdata libraries, I import the ones the documentation says I'll need.

At this point, I have all of the tools I need.

2.4 Authorization to edit pages

The next block of source code handles the authorization of the program to make changes to a Google Sites page. The authorization is done using OAuth, an open standard that is well supported in the GData library [2]. The flow of the code is:

  1. Set a location for cached credentials
  2. Try to open the file in that location
    1. If the file can be opened, try to read a gauth token from the file
    2. If the file cannot be opened, set the token to None
  3. If there isn't a token, talk to Google to get one. This process prints a URL to follow for authorization, asks for the code from that authorization to be entered, authorizes the client (this program, via the variable client), and then saves the credentials.
  4. If there is a token, it is used to authorize the client
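The caching flow in the list above can be sketched without GData at all. In this sketch load_cached_blob and get_blob are hypothetical helpers, and request_new_blob is a stand-in for the interactive OAuth exchange; in the real program the saved text comes from gdata.gauth.token_to_blob and is turned back into a token with gdata.gauth.token_from_blob.

```python
def load_cached_blob(path):
    """Return the cached text stored at path, or None if missing or empty."""
    try:
        with open(path, 'r') as f:
            blob = f.read()
    except IOError:
        # The cache file does not exist or cannot be read.
        return None
    # An empty file yields '', which is treated the same as no cache.
    return blob or None


def get_blob(path, request_new_blob):
    """Reuse the cached blob if there is one, else request and save a new one."""
    blob = load_cached_blob(path)
    if blob is None:
        blob = request_new_blob()
        with open(path, 'w') as f:
            f.write(blob)
    return blob
```

Passing the "get a new one" step in as a function keeps the caching logic easy to try out with a fake before involving Google.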

In this case, the client secret isn't a secret [3]. The user_agent can be anything meaningful to you, so that you can look at the logs and see when your Python program changed your web content and when a person changed it.

You'll notice that this code block creates the variable client; in creating it we also select the Google Site we want to edit, which in this case is confusingly called the same as my name, acaird. I suspect, but don't know for sure, that you could read the sites (as below) and select from a list programmatically. In my case I know the name of the site I want to update, so I just typed it in.

The scope in the gdata.gauth.OAuth2Token function call is specific for Google Sites. For a list of other scopes, see http://googlecodesamples.com/oauth_playground/.

WARNING: The file to which the token is written is important; protect it, or remove it if you aren't certain it can be kept safe.
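One way to protect the token file on a Unix-like system is to create it readable and writable only by its owner. This is a sketch under that assumption; write_private_file is a hypothetical helper and is not part of GData.

```python
import os


def write_private_file(path, text):
    """Write text to path, leaving the file accessible only to its owner."""
    # The 0o600 mode passed to os.open applies only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    # chmod as well, to tighten a file that already existed with looser bits.
    os.chmod(path, 0o600)
```

In the program below, a call such as write_private_file(token_cache_path, saved_blob_string) could take the place of the plain open and write.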


# MyClientId and MyClientSecret hold the Client ID and Client secret
# from the API project; they must be defined before this point.
token_cache_path = os.path.join(os.environ['HOME'], '.gdata-storage')
print "Token Cache: %s" % token_cache_path
try:
    with open(token_cache_path, 'r') as f:
        saved_blob_string = f.read()
    # read() returns an empty string (never None) for an empty file
    if saved_blob_string:
        token = gdata.gauth.token_from_blob(saved_blob_string)
    else:
        token = None
except IOError:
    token = None

if token is None:
    print "Getting a new token."
    token = gdata.gauth.OAuth2Token(client_id=MyClientId,
                                    client_secret=MyClientSecret,
                                    scope='https://sites.google.com/feeds/',
                                    user_agent='acaird-acexample-v1')
    url = token.generate_authorize_url(redirect_uri='urn:ietf:wg:oauth:2.0:oob')
    print 'Please go to the URL below and authorize this '
    print 'application, then enter the code it gives you.'
    print '   %s' % url
    code = raw_input("Code: ")
    token.get_access_token(code)
    client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
    token.authorize(client)
    saved_blob_string = gdata.gauth.token_to_blob(token)
    # the with block closes the file even if the write fails
    with open(token_cache_path, 'w') as f:
        f.write(saved_blob_string)
else:
    print "Using a cached token from %s" % token_cache_path
    client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
    token.authorize(client)

2.5 Reading data from Google Sites

feed = client.GetSiteFeed()
print 'Google Sites associated with your account: '
counter = 0
for entry in feed.entry:
  print '       %i   %s (%s)' % (counter,entry.title.text, entry.site_name.text)
  counter = counter + 1
print ' --- The End ---'

This section of code, when run on my account, produces this output:

Google Sites associated with your account:
       0   acaird (acaird)
       1   CD Squared Project (umcdsquared)
       2   U-M GPR Project (umichgpr)
       3   ORCI Project Site (umorciprojectsite)
       4   UM Projects (umprojectstruthkos)
 --- The End ---
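Choosing a site from that list programmatically might look something like the following. This is a sketch on stand-in data: the (title, site_name) pairs are copied from the output above, whereas the real program would take them from entry.title.text and entry.site_name.text. find_site_name is a hypothetical helper, and I have not tested feeding its result back into the client.

```python
# Stand-in data copied from the output above: (title, site_name) pairs.
sites = [
    ('acaird', 'acaird'),
    ('CD Squared Project', 'umcdsquared'),
    ('U-M GPR Project', 'umichgpr'),
    ('ORCI Project Site', 'umorciprojectsite'),
    ('UM Projects', 'umprojectstruthkos'),
]


def find_site_name(sites, title):
    """Return the site_name of the first site with this title, or None."""
    for site_title, site_name in sites:
        if site_title == title:
            return site_name
    return None


# enumerate() supplies the counter that the loop above keeps by hand.
for counter, (title, site_name) in enumerate(sites):
    print('       %i   %s (%s)' % (counter, title, site_name))
```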

Since we already selected the acaird Google Site when we initialized client, we can start fetching content from it.

The code below asks for the site's content feed, restricted to entries of kind webpage. At the end, old holds the entry for the first web page in the acaird Google Site, with its HTML in old.content.html, which was my goal.

kind = 'webpage'
print "Fetching only %s entries from '%s'..." % (kind, client.site)
# build the content feed URI, filtered to entries of the given kind
uri = '%s?kind=%s' % (client.MakeContentFeedUri(), kind)
feed = client.GetContentFeed(uri=uri)

# the first entry in the filtered feed is the page to update
old = feed.entry[0]

2.6 Writing to a Google Sites Page

To confirm that the web page is really being updated, I record the current date and time, print it, and later write it into the page, so the output on the screen can be compared with what is in the web page.

# use a new name so the time module is not shadowed
now = time.asctime()
print "Time: %s" % now

Then I create some new HTML, stored in old.content.html, which I could print out, but I've commented out that line.

Then I call client.Update with the feed.entry in old to update the page.

old.content.html = '''
<html:div xmlns:html="http://www.w3.org/1999/xhtml">
  <html:table cellspacing="0" border="1"
              class="sites-layout-name-one-column sites-layout-hbox">
    <html:tbody>
      <html:tr>
        <html:td class="sites-layout-tile sites-tile-name-content-1">
          <html:div dir="ltr">&#160;This is my web page.
                                    It was last updated on %s by <kbd>%s</kbd><br />
          </html:div>
        </html:td>
      </html:tr>
    </html:tbody>
  </html:table>
</html:div>
''' % (now,sys.argv[0])
# print old.content.html

updated_entry = client.Update(old)
print 'Web page updated.'
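Because the values are substituted straight into markup, a script name containing &, < or > could produce a malformed page. The following is a minimal sketch of escaping the values first, using the standard library's xml.sax.saxutils.escape; PAGE_TEMPLATE is an abbreviated stand-in for the full table above, and build_page_html is a hypothetical helper.

```python
import sys
import time
from xml.sax.saxutils import escape

# Abbreviated stand-in for the page markup used above.
PAGE_TEMPLATE = '''
<html:div xmlns:html="http://www.w3.org/1999/xhtml">
  <html:div dir="ltr">This is my web page.
    It was last updated on %s by <kbd>%s</kbd><br />
  </html:div>
</html:div>
'''


def build_page_html(timestamp, script_name):
    """Fill the template, escaping &, < and > in the inserted values."""
    return PAGE_TEMPLATE % (escape(timestamp), escape(script_name))


page_html = build_page_html(time.asctime(), sys.argv[0])
```

The resulting string could then be assigned to old.content.html before calling client.Update.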

Footnotes:

[2] https://developers.google.com/api-client-library/python/guide/aaa_oauth is a good reference for using the Python library version of GData's OAuth.

[3] According to https://developers.google.com/accounts/docs/OAuth2#installed "The client_id and client_secret obtained during registration are embedded in the source code of your application. In this context, the client_secret is obviously not treated as a secret."