Using Python to Update Google Sites Pages: oh how I miss SSIs
Google Sites is a popular host for web pages because of its cost (nothing), its integration with Google's suite of productivity tools, and its ease of use. To support automated updates of web pages and other administrative functions, Google offers a programmatic interface (API) to its web-based tools, called GData. What follows is an example of authenticating to Google and updating a page on a Google Sites website.
1 Why update Google Sites pages with Python?
Many people have run their own local web servers since the dawn of the web, and have taken advantage of owning those servers by programmatically updating pages within their websites. This was often done with automatically generated HTML pages, server-side includes (SSIs), or local scripts run by the web server.
Now, however, many people are more interested in using cloud-based services for web pages: they leave all of the operations to someone else, they can often scale better than a local web server could, and they often have very friendly interfaces that allow for updates by people who are not fluent in HTML or other web technologies. A common choice for web pages hosted on someone else's web server is Google Sites.
Google Sites are excellent for creating web pages with rich content (videos, images, text) and controlling access to that content. Local scripts or server-side includes are not permitted, but it is possible to programmatically update Google Sites pages.
2 How to update Google Sites pages with Python
Google provides a library of tools called GData [1] that allows computer programs to read data from and write data to many of Google's services. The GData libraries are available in several languages (for more information see https://developers.google.com/gdata/), but the easiest for me to use was the Python one, even though I don't really know how to program in Python.
2.1 Installing GData
First, I got the GData Python client library from https://developers.google.com/gdata/ and installed it in my home directory by finding setup.py in the GData distribution and typing the command:
    python setup.py install --home=~/python/
I also ran the included tests to make sure it was all working.
GData comes with everything you need to work programmatically with information at Google.
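Besides the bundled tests, a quick way to confirm that Python can find the library is to try importing it. The following is only a minimal sketch: the helper name module_available is my own and is not part of GData, and it assumes the library's top-level packages are gdata and atom, as the import lines later in this article suggest.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "gdata" and "atom" are assumed to be the packages the client library installs
    for name in ("gdata", "atom"):
        status = "found" if module_available(name) else "NOT found"
        print("%s: %s" % (name, status))
```

If either package is reported as not found, the install location is probably not yet on Python's module search path.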
2.2 Creating an API Project
In order for your Python program to talk to Google, you need to create an API project, which you can do for free at https://code.google.com/apis/console. Creating an API Client ID there gives you a Client ID and a Client secret, both of which you'll need in your Python program.
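The authorization code later in this article refers to the two values as MyClientId and MyClientSecret. One way to supply them without typing them into the program is to read them from the environment. This is a sketch under assumptions: the variable names GDATA_CLIENT_ID and GDATA_CLIENT_SECRET and the function load_client_credentials are hypothetical names I chose, not anything defined by Google or GData.

```python
import os


def load_client_credentials(environ=None):
    """Return (client_id, client_secret) read from environment variables.

    GDATA_CLIENT_ID and GDATA_CLIENT_SECRET are hypothetical names chosen
    for this sketch; nothing in GData or Google's console defines them.
    """
    if environ is None:
        environ = os.environ
    try:
        return environ["GDATA_CLIENT_ID"], environ["GDATA_CLIENT_SECRET"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise RuntimeError("Set %s before running this program" % missing.args[0])
```

In the program this would be used as MyClientId, MyClientSecret = load_client_credentials(). Simply assigning the two strings near the top of the program works just as well.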
2.3 The Beginning of my Python program
To get started, I imported the Python libraries I knew I'd need. I learned about the required gdata libraries from the API documentation.
    import sys
    import os
    import time

    # adjust the next line for your installation of gdata
    sys.path.append('/Users/acaird/python/lib/python')

    import atom.data
    import gdata.sites.client
    import gdata.sites.data
    import gdata.gauth
This block of code imports the standard Python libraries sys, os, and time, and you'll see those used later (in the case of sys, not too much later).
Next I use the sys library to tell Python where I installed the gdata library with the sys.path.append function. You will almost certainly want to edit that. You can also use the PYTHONPATH environment variable.
Once the program can find the gdata libraries, I import the ones the documentation says I'll need.
At this point, I have all of the tools I need.
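The search path step can be made a little more forgiving. The sketch below does the same job as the sys.path.append line but expands ~ and skips the append if the directory is already listed; add_library_dir is a hypothetical helper name, and the directory shown assumes the --home=~/python/ install location used earlier.

```python
import os
import sys


def add_library_dir(path, search_path=None):
    """Append path to the module search path unless it is already present.

    Returns True if the path was added, False if it was already there.
    """
    if search_path is None:
        search_path = sys.path
    path = os.path.expanduser(path)
    if path in search_path:
        return False
    search_path.append(path)
    return True


# Assumed install location; adjust it for your own installation of gdata.
add_library_dir("~/python/lib/python")
```

Setting PYTHONPATH to the same directory before starting Python achieves the same thing without editing the program.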
2.4 Authorization to edit pages
The next block of source code handles the authorization of the program to make changes to a Google Sites page. The authorization is done using OAuth, an open standard and one that is well supported in the GData library [2]. The flow of the code is:
- Set a location for cached credentials
- Try to open the file in that location
  - If the file can be opened, try to read a gauth token from the file
  - If the file cannot be opened, set the token to None
- If there isn't a token, talk to Google to get one. This process will print out a URL to be followed for authorization and ask for a key from the authorization to be entered, then authorize the client (this program, via the variable client), then save the credentials.
- If there is a token, it is used to authorize the client
In this case, the client secret isn't a secret [3]. The user_agent can be anything meaningful to you so you can look at the logs and see when your Python program changed your web content and when a person changed it.
You'll notice in this code block we create the variable client; in that creation we also select the Google Site we want to edit, in this case it is confusingly called the same as my name, acaird. I suspect, but don't know for sure, you could read the sites (as below) and select from a list programmatically. In my case I know the name of the site I want to update, so I just typed it in.
The scope in the gdata.gauth.OAuth2Token function call is specific for Google Sites. For a list of other scopes, see http://googlecodesamples.com/oauth_playground/.
WARNING: The file to which the token is written is important. Anyone who can read it can act as this program, so protect it, or remove it if you aren't certain it can be kept safe.
    # MyClientId and MyClientSecret must already hold the Client ID and
    # Client secret from the API project created in section 2.2.
    token_cache_path = os.environ['HOME'] + '/.gdata-storage'
    print "Token Cache: %s" % token_cache_path

    # Try to load a previously saved token
    try:
        with open(token_cache_path, 'r') as f:
            saved_blob_string = f.read()
            if saved_blob_string:
                token = gdata.gauth.token_from_blob(saved_blob_string)
            else:
                token = None
    except IOError:
        token = None

    if token is None:
        # No cached token: ask Google for one interactively
        print "Getting a new token."
        token = gdata.gauth.OAuth2Token(
            client_id=MyClientId,
            client_secret=MyClientSecret,
            scope='https://sites.google.com/feeds/',
            user_agent='acaird-acexample-v1')
        url = token.generate_authorize_url(redirect_uri='urn:ietf:wg:oauth:2.0:oob')
        print 'Please go to the URL below and authorize this '
        print 'application, then enter the code it gives you.'
        print ' %s' % url
        code = raw_input("Code: ")
        token.get_access_token(code)
        client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
        token.authorize(client)
        # Save the token for next time; "with" closes the file for us
        saved_blob_string = gdata.gauth.token_to_blob(token)
        with open(token_cache_path, 'w') as f:
            f.write(saved_blob_string)
    else:
        print "Using a cached token from %s" % token_cache_path
        client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
        token.authorize(client)
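The warning above can be acted on when the token is saved. A minimal sketch, assuming a POSIX-style system where file modes apply: write_private_file is a hypothetical helper of my own, not part of GData, that creates the cache file so only its owner can read or write it.

```python
import os


def write_private_file(path, data):
    """Write text to path, leaving the file readable and writable by its owner only."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # The mode passed to os.open only applies when the file is created,
    # so tighten the permissions explicitly in case it already existed.
    os.chmod(path, 0o600)
```

In the program above, write_private_file(token_cache_path, saved_blob_string) would take the place of the open and write calls that save the token.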
2.5 Reading data from Google Sites
    feed = client.GetSiteFeed()
    print 'Google Sites associated with your account: '
    counter = 0
    for entry in feed.entry:
        print ' %i %s (%s)' % (counter, entry.title.text, entry.site_name.text)
        counter = counter + 1
    print ' --- The End ---'
This section of code, when run on my account, produces this output:
    Google Sites associated with your account:
     0 acaird (acaird)
     1 CD Squared Project (umcdsquared)
     2 U-M GPR Project (umichgpr)
     3 ORCI Project Site (umorciprojectsite)
     4 UM Projects (umprojectstruthkos)
     --- The End ---
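A listing like this is enough to pick a site by its title instead of typing the short name by hand. The sketch below works on plain (title, site_name) pairs so it runs without GData; find_site_name is a hypothetical helper, and I have not verified how best to hand the result back to an existing client, so treat that part as an open question.

```python
def find_site_name(sites, title):
    """Return the site_name whose title matches, or None if there is no match.

    sites is a list of (title, site_name) pairs, such as could be built with
    [(e.title.text, e.site_name.text) for e in feed.entry].
    """
    for site_title, site_name in sites:
        if site_title == title:
            return site_name
    return None


# Sample data copied from the output above
sites = [
    ("acaird", "acaird"),
    ("CD Squared Project", "umcdsquared"),
    ("U-M GPR Project", "umichgpr"),
    ("ORCI Project Site", "umorciprojectsite"),
    ("UM Projects", "umprojectstruthkos"),
]
print(find_site_name(sites, "U-M GPR Project"))  # prints umichgpr
```

The returned name is the kind of value passed as site= when the client is created.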
Since we already selected the acaird Google Site when we initialized client, we can start fetching content from it.
I'm not sure what most of the code below does, but at the end, old contains the HTML of the first webpage in the acaird Google Site, which was my goal.
    kind = 'webpage'
    print 'Fetching only %s entries' % kind
    print "Fetching content feed of '%s'...\n" % client.site
    # Build the content feed URI, filtered to entries of the chosen kind
    uri = '%s?kind=%s' % (client.MakeContentFeedUri(), kind)
    # One filtered fetch is enough; the original fetched the feed three times
    feed = client.GetContentFeed(uri=uri)
    # Keep the first webpage entry; its HTML is in old.content.html
    old = feed.entry[0]
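The uri line is ordinary string formatting: it appends a kind query parameter to the address of the site's content feed, which, as I understand it, restricts the feed to web pages. A small sketch of the same construction with proper URL encoding follows. The function content_feed_uri is hypothetical, and the base address is only an assumed example of what client.MakeContentFeedUri() might return, so check the real value in your own program.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def content_feed_uri(base_uri, **params):
    """Return base_uri with the given query parameters appended."""
    if not params:
        return base_uri
    # Sort the parameters so the result is the same on every run
    query = urlencode(sorted(params.items()))
    return "%s?%s" % (base_uri, query)


# Assumed example address, not taken from a real MakeContentFeedUri() call
base = "https://sites.google.com/feeds/content/site/acaird"
print(content_feed_uri(base, kind="webpage"))
```

Using urlencode matters only when a parameter value contains spaces or punctuation; for kind=webpage the result matches the hand-built string.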
2.6 Writing to a Google Sites Page
To confirm that the update really reaches the web page, I record the current date and time. It is printed on the screen here and written into the page later, so the two can be compared.
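    # Note: this rebinds the name "time" to a string, so the time module
    # itself is no longer reachable under that name after this line.
    time = time.asctime()
    print "Time: %s" % time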
Then I create some new HTML, stored in old.content.html, which I could print out, but I've commented out that line.
Then I call client.Update with the feed.entry in old to update the page.
    old.content.html = '''
    <html:div xmlns:html="http://www.w3.org/1999/xhtml">
      <html:table cellspacing="0" border="1" class="sites-layout-name-one-column sites-layout-hbox">
        <html:tbody>
          <html:tr>
            <html:td class="sites-layout-tile sites-tile-name-content-1">
              <html:div dir="ltr">
                This is my web page. It was last updated on %s by <kbd>%s</kbd><br />
              </html:div>
            </html:td>
          </html:tr>
        </html:tbody>
      </html:table>
    </html:div>
    ''' % (time,sys.argv[0])
    # print old.content.html
    updated_entry = client.Update(old)
    print 'Web page updated.'
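Because the page content is built by substituting strings into markup, a value containing &, < or > (a script path, for example) could produce broken XHTML. The sketch below shows one way to guard against that with the standard library's escape function. PAGE_TEMPLATE is a trimmed-down stand-in for the markup above and render_page is a hypothetical helper; neither comes from GData.

```python
import time
from xml.sax.saxutils import escape

# Trimmed-down stand-in for the page markup used above
PAGE_TEMPLATE = (
    '<html:div xmlns:html="http://www.w3.org/1999/xhtml">'
    'This is my web page. It was last updated on %s by <kbd>%s</kbd>'
    '</html:div>'
)


def render_page(updated_at, program_name):
    """Fill the template, escaping &, < and > in the inserted values."""
    return PAGE_TEMPLATE % (escape(updated_at), escape(program_name))


now = time.asctime()  # named "now" so the time module stays usable
print(render_page(now, "update<sites>.py"))
```

The result of render_page is the kind of string that would be assigned to old.content.html before calling client.Update.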
Footnotes:
[2] https://developers.google.com/api-client-library/python/guide/aaa_oauth is a good reference for using the Python library version of GData's OAuth.
[3] According to https://developers.google.com/accounts/docs/OAuth2#installed "The client_id and client_secret obtained during registration are embedded in the source code of your application. In this context, the client_secret is obviously not treated as a secret."