<h1>For the Love of Suffering</h1>
<p>Andrew Caird (acaird@gmail.com), http://acaird.github.io, updated 2021-06-03T20:27:14+00:00</p>
<h1>Extracting Data from PDF Forms into CSV Files</h1>
<p>2021-05-24, http://acaird.github.io/computers/2021/05/24/pdf-forms-to-csv</p>
<p>
My friend has a cool job helping people <a href="https://www.washtenaw.org/CivicAlerts.aspx?AID=1483">stay out of jail in Washtenaw
County, Michigan, USA</a>. That lofty mission, like so many these days,
is underpinned by the lowly PDF form. I wanted to make her job easier
by automating the extraction of data from those forms into a summary
spreadsheet.
</p>
<p>
tl;dr: Everyone on the Internet is cool and lets you use their code
and you can put it all together to do other cool things. This is no
exception, so big thanks to: <a href="https://github.com/gen2brain">Milan Nikolic</a> for the very nice <code>dlgs</code>
dialog box code; the <a href="https://github.com/pdfcpu/pdfcpu">PDF CPU</a> project for the PDF parsing code; and
<a href="https://github.com/sirupsen">Simon Eskildsen</a> for his very friendly <code>logrus</code> logging code.
</p>
<div id="outline-container-orgedd7158" class="outline-2">
<h2 id="orgedd7158">The first problem: Parsing PDF Form Data</h2>
<div class="outline-text-2" id="text-orgedd7158">
<p>
The <a href="https://github.com/pdfcpu/pdfcpu">PDF CPU</a> library does a very nice job of returning a map of form
element labels and the data that was entered into the form. This is
done in the <code>parsePdfForm</code> routine.
</p>
<p>
One downside of this approach is that the form element labels, as
generated by Acrobat’s conversion of input files (like Word files)
to PDF forms, are by default grabbed seemingly at random from the
surrounding context, so they are sometimes not useful and can
apparently be up to 99 characters long. I use these labels as the
CSV column headings, so the headings can be confusing and very
annoying. I couldn’t think of a better way to do this in the
general case, so this is what you get. I did consider supporting a
file that maps between what’s in the form and what you want as
headings, or a GUI that lets you rename the fields before writing
the summary, but that seemed like a lot of extra work for version
1.0.0, so I skipped it.
</p>
</div>
</div>
<div id="outline-container-orgd3235e0" class="outline-2">
<h2 id="orgd3235e0">The second problem: Not all of the PDF files are the same</h2>
<div class="outline-text-2" id="text-orgd3235e0">
<p>
It is possible that the PDF forms will change over time with added
fields, changed fields, removed fields, or just be altogether
different forms. After parsing all of the PDF files in the target
directory, we have a set of form fields (the result of the “first
problem”). What do we write as the summary? Does the first file
read define the set of fields? Is the set the intersection of them,
so only those fields that appear in all of the forms also appear in
the summary? Is it the union of them, so every field that appears
in any form appears in the summary? Is there some other
configuration read from a file or developed interactively via a GUI
that is used to define the fields in the summary?
</p>
<p>
The easiest thing is to keep the union of all of the fields. This
can lead to weird output, though, and also requires that all of the
files are read and processed before the summary can begin to be
created. For example, say you have two kinds of PDF Form files, one
to order a hamburger and one to order a book. You could end up with
a summary file that looks like:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">File</th>
<th scope="col" class="org-right">Number of Patties</th>
<th scope="col" class="org-left">Wheat Bun</th>
<th scope="col" class="org-left">Title</th>
<th scope="col" class="org-left">Author</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">book.pdf</td>
<td class="org-right"> </td>
<td class="org-left"> </td>
<td class="org-left">The TeX Book</td>
<td class="org-left">D. Knuth</td>
</tr>
<tr>
<td class="org-left">hburg.pdf</td>
<td class="org-right">2</td>
<td class="org-left">On</td>
<td class="org-left"> </td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left">zbook.pdf</td>
<td class="org-right"> </td>
<td class="org-left"> </td>
<td class="org-left">CDB!</td>
<td class="org-left">W. Steig</td>
</tr>
</tbody>
</table>
<p>
In order to make the headings, the program has to know all of the
form fields from all of the forms. Then it has to go through each
set of data from each form and fill in the table, not compressing
the data upwards, but keeping it aligned with the file from which
it came. Annoying and error-prone, at least the way I first did it.
This is done in the function <code>structData</code>.
</p>
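<p>
The real work is done in Go by <code>structData</code>; purely as an
illustration of the union-and-align idea, here is a minimal sketch in
shell and awk. It assumes each form has already been dumped to a text
file of <code>field=value</code> lines (a made-up format for this example,
not something the program produces), and the function name
<code>summarize</code> is hypothetical too. It does no CSV quoting, so values
containing commas would need more care.
</p>

```shell
#!/bin/sh
# Sketch only: the real program does this in Go.
# Assumed input: one text file per form, each line "field=value".

summarize() {
    # Read every file first, remembering field names in first-seen order,
    # then print the header and one aligned row per file in the END block.
    awk -F= '
        FNR == 1 { files[++nfiles] = FILENAME }
        {
            value = substr($0, length($1) + 2)   # keep any "=" inside the value
            if (!($1 in seen)) { seen[$1] = 1; fields[++nfields] = $1 }
            data[FILENAME, $1] = value
        }
        END {
            printf "File"
            for (i = 1; i <= nfields; i++) printf ",%s", fields[i]
            printf "\n"
            for (f = 1; f <= nfiles; f++) {
                printf "%s", files[f]
                # A field this form never had prints as an empty cell.
                for (i = 1; i <= nfields; i++) printf ",%s", data[files[f], fields[i]]
                printf "\n"
            }
        }
    ' "$@"
}

# Demo with two stand-in "forms" in a scratch directory.
demo_dir=$(mktemp -d)
printf 'Title=The TeX Book\nAuthor=D. Knuth\n' > "$demo_dir/book.txt"
printf 'Number of Patties=2\nWheat Bun=On\n' > "$demo_dir/hburg.txt"
(cd "$demo_dir" && summarize book.txt hburg.txt)
# File,Title,Author,Number of Patties,Wheat Bun
# book.txt,The TeX Book,D. Knuth,,
# hburg.txt,,,2,On
rm -rf "$demo_dir"
```

<p>
Nothing can be printed until the last file has been read, which is
the same constraint described above: the header row depends on every
form.
</p>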
</div>
</div>
<div id="outline-container-org898f37d" class="outline-2">
<h2 id="org898f37d">The third problem: Icons</h2>
<div class="outline-text-2" id="text-org898f37d">
<p>
I really wanted icons for the program, and bundling it all together
for either Windows or MacOS is purportedly possible. I tried to do
this, but was only slightly successful (the MacOS App did get an
icon). The details are in the <a href="https://github.com/acaird/pdfform2csv#building">Building</a> section of the <a href="https://github.com/acaird/pdfform2csv#readme">ReadMe</a> file.
</p>
</div>
</div>
<div id="outline-container-org3e7582c" class="outline-2">
<h2 id="org3e7582c">No problem: The results</h2>
<div class="outline-text-2" id="text-org3e7582c">
<p>
In a <a href="https://golang.org">Go</a> program of just over 300 lines and some extra support files,
I have a cross-platform, single file (well, sort of, for the Mac)
application with a graphical interface that reads a directory
full of PDF Form files and writes to that same directory a CSV file
summarizing the data from the PDF files:
<a href="https://github.com/acaird/pdfform2csv">https://github.com/acaird/pdfform2csv</a>
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<tbody>
<tr>
<td class="org-left">MacOS</td>
<td class="org-left">Windows</td>
</tr>
<tr>
<td class="org-left"><img src="/assets/2021-05-24-ss-welcome-mac.png" alt="2021-05-24-ss-welcome-mac.png" /></td>
<td class="org-left"><img src="/assets/2021-05-24-ss-welcome-win.png" alt="2021-05-24-ss-welcome-win.png" /></td>
</tr>
<tr>
<td class="org-left"><img src="/assets/2021-05-24-ss-filedialog-mac.png" alt="2021-05-24-ss-filedialog-mac.png" /></td>
<td class="org-left"><img src="/assets/2021-05-24-ss-filedialog-win.png" alt="2021-05-24-ss-filedialog-win.png" /></td>
</tr>
<tr>
<td class="org-left"><img src="/assets/2021-05-24-ss-results-mac.png" alt="2021-05-24-ss-results-mac.png" /></td>
<td class="org-left"><img src="/assets/2021-05-24-ss-results-win.png" alt="2021-05-24-ss-results-win.png" /></td>
</tr>
</tbody>
</table>
</div>
</div>
<h1>Using Github Actions to Build and Push Images to Google Container Registry</h1>
<p>2020-02-11, http://acaird.github.io/computers/2020/02/11/github-google-container-cloud-run</p>
<p>
Even personal projects that make use of Google Cloud Run to minimize
costs deserve CI. And they can get it!
</p>
<p>
I was following Alex Olivier’s blog post <a href="https://alexolivier.me/posts/deploy-container-stateless-cheap-google-cloud-run-serverless">Deploy your side-projects at
scale for basically nothing - Google Cloud Run</a> (which, by the way, is
<i>really good</i> and you should read it if you want to run web-apps at
next to zero cost, for real). It uses Google’s Cloud Run, which is
Google’s <a href="https://knative.dev/">knative</a> service. Cloud Run requires that
the related Docker image is in the Google Container Registry (GCR)
(well, mostly requires that). This blog post is a follow-on to Alex’s,
so if you haven’t followed along with his steps, some of the
assumptions I make might not be valid (or, at least, might be
confusing). You can likely learn some things without reading Alex’s
post but, honestly, are you so busy you can only read one blog post?
</p>
<p>
My code is in Github and I wanted to be able to build images and push
them to GCR automatically so that I could later deploy them to Google
Cloud Run. There is some starting configuration at
<a href="https://github.com/actions/starter-workflows/blob/master/ci/google.yml">https://github.com/actions/starter-workflows/blob/master/ci/google.yml</a>
that is pretty good, but it includes code to deploy to the Google
Kubernetes Engine, which I didn’t need. It didn’t, however, include
some details that would have been handy, which I hope to expound on in
a useful way here…
</p>
<p class="verse">
Dearly beloved,<br />
We are gathered here today<br />
To get through this thing called “CI”<br />
Magic word, “CI”<br />
It means “continuous”, and that goes on for a long time<br />
But I’m here to tell you there is something else<br />
OK, not really. People really like CI in the late 20-teens, early 2020s.<br />
So go crazy.<br />
</p>
<p>
Sorry, your Purple-ness, that was unwarranted.
</p>
<div id="outline-container-orga06a3ef" class="outline-2">
<h2 id="orga06a3ef">Linking Github and Google</h2>
<div class="outline-text-2" id="text-orga06a3ef">
<p>
The docker image is built at Github, then pushed to the Google
Container Registry, so before even building the image, you should
give Github access to your project’s GCR bucket.
</p>
<p>
There are a lot of nice notes at Google’s page titled <a href="https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-console">Creating and
managing service account keys</a>, but if you’ve started with <a href="https://alexolivier.me/posts/deploy-container-stateless-cheap-google-cloud-run-serverless">Alex’s
blog post</a> you probably have the <code>gcloud</code> client installed on your
computer, and these steps should help.
</p>
<p>
First, authenticate your local <code>gcloud</code> installation to your Google
account:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud auth login
</pre>
</div>
<p>
which will open your web browser and prompt you to log in to Google.
If you’ve followed Alex’s post, you might already be logged in.
</p>
<p>
Second, you need to get the service account for your Google Cloud
Run project (remember, see Alex’s blog post):
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud iam service-accounts list --project [project_name]
</pre>
</div>
<p>
This will produce something like:
</p>
<pre class="example">
NAME EMAIL DISABLED
Default compute service account 999999999999-compute@developer.gserviceaccount.com False
</pre>
<p>
The value in the <code>EMAIL</code> column is the IAM account for which you
need to generate a key; that key will go into a GitHub secret. The
command to make that key is:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud iam service-accounts keys <span style="color: #A3BE8C;">\</span>
create ~/my_awesome_secret_key.json <span style="color: #A3BE8C;">\</span>
--iam-account 999999999999-compute@developer.gserviceaccount.com <span style="color: #A3BE8C;">\</span>
--project [project_name]
</pre>
</div>
<p>
This command will write a key to the file
<code>my_awesome_secret_key.json</code> in your home directory. Don’t publish
it anywhere someone might find it, and don’t put it in a Git
repository; it is a private key to your Google Cloud project.
</p>
<p>
The third and final step to connecting Github and your Google Cloud
project is to put a base-64 encoded copy of your private key in a
<a href="https://help.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets">Github secret</a>. From a Mac terminal, you can type:
</p>
<div class="org-src-container">
<pre class="src src-sh">cat ~/my_awesome_secret_key.json | base64 | pbcopy
</pre>
</div>
<p>
to “copy” the base-64-encoded secret key to the clipboard. On Linux
you can type:
</p>
<div class="org-src-container">
<pre class="src src-sh">cat ~/my_awesome_secret_key.json | base64 | xclip -selection clipboard
</pre>
</div>
<p>
to do the same. I don’t know how to base64-encode something or copy
it to the clipboard from the command line on Windows; I suggest
using <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">Windows Subsystem for Linux</a> if at all possible, but that’s a
whole other thing.
</p>
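<p>
Before pasting, it can be reassuring to check that the encoded text
decodes back to the original file. The sketch below is one way to do
that; the function names <code>encode_key</code> and <code>roundtrip_ok</code> are
made up for this example, and the file it encodes is a stand-in, not
a real key. GNU <code>base64</code> wraps its output at 76 columns by default,
so the sketch strips newlines to get a single line; I have not
checked whether the wrapped form causes any trouble downstream.
</p>

```shell
#!/bin/sh
# encode_key FILE: print FILE as a single line of base64.
encode_key() {
    base64 < "$1" | tr -d '\n'
}

# roundtrip_ok FILE: succeed only if decoding the encoded text
# gives back exactly the bytes in FILE.
roundtrip_ok() {
    encode_key "$1" | base64 --decode | cmp -s - "$1"
}

# Demo with a stand-in file; point it at your own key file instead.
key_file=$(mktemp)
printf '{"type": "service_account"}\n' > "$key_file"
if roundtrip_ok "$key_file"; then
    echo "encoding round-trips cleanly"
fi
rm -f "$key_file"
```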
<p>
Once you have the base64-encoded key file, go to the project’s page
at Github and, from the list of options along the top of the project
(below the title), click “Settings” then from the menu on the left
click “Secrets”. Click the “Add a New Secret” link in the middle of
the page.
</p>
<div class="figure">
<p><img src="/assets/2020-02-11-ggccr-add-new-secret.png" alt="2020-02-11-ggccr-add-new-secret.png" width="70%" />
</p>
<p><span class="figure-number">Figure 1: </span>The Github project page’s secrets management tab. Click the “Add a new secret” link.</p>
</div>
<p>
Set the Name to <code>GCR_KEY</code> and the Value to the base64-encoded
contents of your key file (<code>~/my_awesome_secret_key.json</code> in the
commands above).
</p>
<div class="figure">
<p><img src="/assets/2020-02-11-ggccr-adding-secret.png" alt="2020-02-11-ggccr-adding-secret.png" width="70%" />
</p>
<p><span class="figure-number">Figure 2: </span>Adding a secret to the Github project page. Name the secret and add the contents of the secret; in this case the contents should be base64-encoded. Part of the secret in this screenshot is blanked out; yours will look complete.</p>
</div>
<p>
Click the “Add Secret” button and you’ll see your secret’s name in
the list.
</p>
<div class="figure">
<p><img src="/assets/2020-02-11-ggccr-saving-secret.png" alt="2020-02-11-ggccr-saving-secret.png" width="70%" />
</p>
<p><span class="figure-number">Figure 3: </span>Listing the secrets associated with a Github project. This is the list of secrets for the Github project; here you can see the <code>GCR_KEY</code> secret listed.</p>
</div>
<p>
Now this Github repository can access the Google project matching
the key; there may be better security than this, but :shrug: If you
have suggestions, find me on Twitter <a href="https://twitter.com/acaird">@acaird</a>.
</p>
</div>
</div>
<div id="outline-container-org60628dd" class="outline-2">
<h2 id="org60628dd">Setting up Github to build your Docker image</h2>
<div class="outline-text-2" id="text-org60628dd">
<p>
Github offers a free tier of CI that includes 2,000 minutes of build
time per month; there are more details at the <a href="https://github.com/features/actions">Github Actions page</a>
and <a href="https://github.blog/2019-08-08-github-actions-now-supports-ci-cd/">this blog post</a>. For building Docker images for a small project,
this is almost certainly plenty.
</p>
<p>
I chose to trigger builds when I push a Git tag, but you have the
option of starting a build when you push a branch. For more
details, see Github’s documentation for <a href="https://help.github.com/en/actions/reference/events-that-trigger-workflows">events that trigger
workflows</a>.
</p>
<p>
This is all configured by adding a YAML file to the
<code>.github/workflows</code> directory in your Git repository. I called my
YAML file <code>google.yml</code> because that was the name of the example file
I started with. My YAML file looks something like:
</p>
<div class="org-src-container">
<pre class="src src-yaml"><span style="color: #D8DEE9;">name</span>: Build and Push to GCR
<span style="color: #D8DEE9;">on</span>:
  <span style="color: #D8DEE9;">push</span>:
    <span style="color: #D8DEE9;">tags</span>:
      - v*
<span style="color: #616e88;"># </span><span style="color: #616e88;">Environment variables available to all jobs and steps in this workflow</span>
<span style="color: #616e88;"># </span><span style="color: #616e88;">GKE_EMAIL: ${{ secrets.GKE_EMAIL }} </span>
<span style="color: #616e88;"># </span><span style="color: #616e88;">GKE_KEY: ${{ secrets.GKE_KEY }} </span>
<span style="color: #D8DEE9;">env</span>:
  <span style="color: #D8DEE9;">GITHUB_SHA</span>: ${{ github.sha }}
  <span style="color: #D8DEE9;">GITHUB_REF</span>: ${{ github.ref }}
  <span style="color: #D8DEE9;">IMAGE</span>: [IMAGE_NAME]
  <span style="color: #D8DEE9;">REGISTRY_HOSTNAME</span>: gcr.io
<span style="color: #D8DEE9;">jobs</span>:
  <span style="color: #D8DEE9;">setup-build-publish-deploy</span>:
    <span style="color: #D8DEE9;">name</span>: Setup, Build, and Publish
    <span style="color: #D8DEE9;">runs-on</span>: ubuntu-latest
    <span style="color: #D8DEE9;">steps</span>:
      - <span style="color: #D8DEE9;">name</span>: Checkout
        <span style="color: #D8DEE9;">uses</span>: actions/checkout@v2
      <span style="color: #616e88;"># </span><span style="color: #616e88;">Setup gcloud CLI</span>
      - <span style="color: #D8DEE9;">uses</span>: GoogleCloudPlatform/github-actions/setup-gcloud@master
        <span style="color: #D8DEE9;">with</span>:
          <span style="color: #D8DEE9;">version</span>: <span style="color: #A3BE8C;">'270.0.0'</span>
          <span style="color: #D8DEE9;">service_account_key</span>: ${{ secrets.GCR_KEY }}
      <span style="color: #616e88;"># </span><span style="color: #616e88;">Configure docker to use the gcloud command-line tool as a credential helper</span>
      - <span style="color: #D8DEE9;">run</span>: |
          <span style="color: #616e88;"># </span><span style="color: #616e88;">Set up docker to authenticate</span>
          <span style="color: #616e88;"># </span><span style="color: #616e88;">via gcloud command-line tool.</span>
          <span style="color: #A3BE8C;">gcloud auth configure-docker</span>
      <span style="color: #616e88;"># </span><span style="color: #616e88;">Build the Docker image</span>
      - <span style="color: #D8DEE9;">name</span>: Build
        <span style="color: #D8DEE9;">run</span>: |
          export TAG=`echo $GITHUB_REF | awk -F/ <span style="color: #A3BE8C;">'{print $NF}'</span>`
          <span style="color: #A3BE8C;">echo $TAG</span>
          docker build -t <span style="color: #A3BE8C;">"$REGISTRY_HOSTNAME"</span>/<span style="color: #A3BE8C;">"$IMAGE"</span>:<span style="color: #A3BE8C;">"$TAG"</span> \
            --build-arg GITHUB_SHA=<span style="color: #A3BE8C;">"$GITHUB_SHA"</span> \
            --build-arg GITHUB_REF=<span style="color: #A3BE8C;">"$GITHUB_REF"</span> .
      <span style="color: #616e88;"># </span><span style="color: #616e88;">Push the Docker image to Google Container Registry</span>
      - <span style="color: #D8DEE9;">name</span>: Publish
        <span style="color: #D8DEE9;">run</span>: |
          export TAG=`echo $GITHUB_REF | awk -F/ <span style="color: #A3BE8C;">'{print $NF}'</span>`
          <span style="color: #A3BE8C;">echo $TAG</span>
          docker push <span style="color: #A3BE8C;">"$REGISTRY_HOSTNAME"</span>/<span style="color: #A3BE8C;">"$IMAGE"</span>:<span style="color: #A3BE8C;">"$TAG"</span>
          docker tag <span style="color: #A3BE8C;">"$REGISTRY_HOSTNAME"</span>/<span style="color: #A3BE8C;">"$IMAGE"</span>:<span style="color: #A3BE8C;">"$TAG"</span> <span style="color: #A3BE8C;">"$REGISTRY_HOSTNAME"</span>/<span style="color: #A3BE8C;">"$IMAGE"</span>:latest
          docker push <span style="color: #A3BE8C;">"$REGISTRY_HOSTNAME"</span>/<span style="color: #A3BE8C;">"$IMAGE"</span>:latest
</pre>
</div>
<p>
The only change you need to make to this file is to change
<code>[IMAGE_NAME]</code> to be what you want to name your Docker image.
</p>
<p>
Stepping through this file:
</p>
<ol class="org-ol">
<li><p>
The first section determines what triggers the execution of the
rest of the file; in this case, pushing a tag that starts with
<code>v</code> runs the rest of the file, and any other push does
nothing. This is set by the lines
</p>
<pre class="example">
on:
  push:
    tags:
      - v*
</pre>
<p>
This will also match the tags <code>victory</code> and <code>vodka</code>; you could
tighten up the pattern (it is a glob, not a regular expression) if
you’re worried about that; I’m not (also, I might want to trigger
a build with the tag <code>vodka</code>, you don’t know).
</p></li>
<li>The second section sets some environment variables for use
elsewhere in the processing</li>
<li>Then we have one job with several steps, the steps are:
<ul class="org-ul">
<li>checkout our code</li>
<li>set up the <code>gcloud</code> environment using the secret we configured
in the previous section</li>
<li>set up the <code>gcloud</code> Docker environment</li>
<li>run <code>docker build</code> with some options (the <code>Build</code> step)</li>
<li>run <code>docker push</code> to push the image to the Google Container
Registry (the <code>Publish</code> step) twice, once with a tag that
matches the Git tag and once with the <code>latest</code> tag.</li>
</ul></li>
</ol>
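<p>
To get a feel for how a tighter tag pattern would behave, you can
try globs in the shell. This is only an approximation: shell globs
and Github’s filter patterns are close cousins, not the same thing,
so check Github’s documentation before relying on anything fancier
than <code>*</code> and <code>[0-9]</code>. The function name <code>would_trigger</code> is made
up for this sketch.
</p>

```shell
#!/bin/sh
# would_trigger PATTERN TAG: print "yes" if TAG matches the glob PATTERN,
# otherwise "no". PATTERN is left unquoted inside "case" on purpose so
# that the shell treats it as a pattern rather than a literal string.
would_trigger() {
    case "$2" in
        $1) echo "yes" ;;
        *)  echo "no" ;;
    esac
}

would_trigger 'v*' vodka        # yes: the loose pattern matches any v-word
would_trigger 'v[0-9]*' vodka   # no:  a digit is required after the v
would_trigger 'v[0-9]*' v0.81   # yes
```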
<p>
I wanted to use the Git tag as the tag for my Docker image, but
<code>github.ref</code> is the full reference; that is, if your Git tag is
<code>v0.81</code>, the ref is <code>refs/tags/v0.81</code> and that is not a valid (or
desired) Docker image tag. The <code>export TAG ...</code> line splits the
input on <code>/</code> and takes the last field which is, in this example,
<code>v0.81</code>; happily, this is the tag we wanted all along. Hooray.
</p>
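<p>
If you would rather not start <code>awk</code> for this, the shell’s own
parameter expansion can do the same split. A minimal sketch, with a
made-up function name (<code>tag_from_ref</code>):
</p>

```shell
#!/bin/sh
# tag_from_ref REF: print the part of REF after the last "/".
# "${1##*/}" removes the longest prefix that ends in a slash.
tag_from_ref() {
    printf '%s\n' "${1##*/}"
}

tag_from_ref refs/tags/v0.81   # prints v0.81
```

<p>
Like the <code>awk</code> version, this keeps only the last path component, so a
tag that itself contains a slash would be cut short.
</p>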
<p>
Once you have a YAML file in the right place, <code>git commit</code> it and
push it to Github. Whether that triggers a build or not, Github
will parse the YAML and make sure it is valid. To check this, go to
the “Actions” tab of your Github project and look at the list of
events. If there is an error in your YAML the name of the event
will be the file name, not the name of the job. Clicking that event
will show you your file with a message like:
</p>
<pre class="example">
Check failure on line 1 in .github/workflows/google.yml
GitHub Actions / .github/workflows/google.yml
Invalid Workflow File
You have an error in your yaml syntax on line 25
</pre>
<p>
If you commit the YAML file and there are no errors there won’t be
anything (or, if you’ve committed it before, anything new) listed in
the “Actions” tab.
</p>
</div>
</div>
<div id="outline-container-orgb78e9a4" class="outline-2">
<h2 id="orgb78e9a4">Initiating and Monitoring a Build and Push</h2>
<div class="outline-text-2" id="text-orgb78e9a4">
<p>
The initiation part is easy: simply push a tag (if you’re using the
example above) or a branch (if you switched the <code>on: push:</code> section
to <code>branches: <something></code>). Github will then start running the build
rules in the YAML file in your repository.
</p>
<p>
The monitoring is a little more involved, but not difficult. The
“Actions” tab in your Github project lists the jobs; click the top
one (the most recent one) and then, on the left, click the job
name. If you followed the example above, it is “Setup, Build, and
Publish”; if you changed <code>jobs: <jobname>: name:</code>, it will be what
you set that to. That opens what looks a little like a terminal
window listing the steps of the job, and each step gets a checkmark
as it succeeds. You can monitor the steps of the job in real time.
</p>
<div class="figure">
<p><img src="/assets/2020-02-11-ggccr-monitoring-build.png" alt="2020-02-11-ggccr-monitoring-build.png" width="70%" />
</p>
<p><span class="figure-number">Figure 4: </span>The Github Actions monitoring window. Here you can see each step with its expando-triangle; clicking the triangle will show the logs for that step of the job. On the left you can see the name of this Action (“Build and Push to GCR”) and the name of the job (“Setup, Build, and Publish”).</p>
</div>
<p>
To confirm that the image is actually pushed to the Google Container
Registry for your project, you can run:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud container images list --project MyProject
</pre>
</div>
<p>
which will report the images in <code>MyProject</code>, like:
</p>
<pre class="example">
NAME
gcr.io/MyProject/MyProject
Only listing images in gcr.io/MyProject. Use --repository to list images in other repositories.
</pre>
<p>
Once you have the list of images (from the <code>NAME</code> column) you can
then run <code>container images list-tags</code>:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud container images list-tags gcr.io/MyProject/MyProject
</pre>
</div>
<p>
to see what images are available. You’ll see something that looks like:
</p>
<pre class="example">
DIGEST TAGS TIMESTAMP
e3ae68fe03b8 latest,v0.84 2020-02-11T15:14:39
b7baca4e21ec v0.83 2020-02-11T15:09:08
</pre>
<p>
which confirms that properly tagged images are being pushed to GCR.
</p>
</div>
</div>
<div id="outline-container-orgc70ffe2" class="outline-2">
<h2 id="orgc70ffe2">Using the new image in your Google Cloud Run instance</h2>
<div class="outline-text-2" id="text-orgc70ffe2">
<p>
I don’t automatically switch to the new image in my Google Cloud Run
instance in the “push” step of the CI (but more on this later). I
guess I’m old and conservative and this new-fangled CD makes me a
little nervous. Also, these graphical web pages are not my
favorite. Also, get off my lawn.
</p>
<p>
To switch to your new image you can use the Google Cloud Run web
pages, and I’ve done this to good effect before. We aren’t doing
that now, because command line tools are better (for one, they can
go into shell scripts or Github actions YAML files).
</p>
<p>
Updating the image used by a Google Cloud Run project isn’t that
complicated; there are only a few steps:
</p>
<ol class="org-ol">
<li>Make sure you have an image tagged <code>latest</code>; if you’ve gotten
this far and followed the steps, you do. You don’t <i>really</i> have
to, but you should know what tag you want to use.</li>
<li><p>
Confirm the available images
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud container images list-tags gcr.io/MyProject/MyProject
</pre>
</div></li>
<li><p>
Find out what region your Google Cloud Run project is running in:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud run services list --platform managed
</pre>
</div></li>
<li><p>
Update the image to the one currently tagged <code>latest</code> by typing:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud run deploy MyProject <span style="color: #A3BE8C;">\</span>
--platform managed <span style="color: #A3BE8C;">\</span>
--region MyRegion <span style="color: #A3BE8C;">\</span>
--image gcr.io/MyProject/MyProject:latest
</pre>
</div></li>
</ol>
<p>
These last two steps can be done in a shell script that looks like:
</p>
<div class="org-src-container">
<pre class="src src-sh"><span style="color: #616e88;">#</span><span style="color: #616e88;">!/bin/</span><span style="color: #81A1C1;">bash</span>
<span style="color: #D8DEE9;">PROJECT</span>=MyProject
<span style="color: #616e88;"># </span><span style="color: #616e88;">One region per line, with the leading whitespace stripped</span>
<span style="color: #D8DEE9;">REGION</span>=$(<span style="color: #fa8072;">gcloud</span> run services list --platform managed --format=flattened | <span style="color: #A3BE8C;">\</span>
             grep metadata.labels.cloud.googleapis.com/location | <span style="color: #A3BE8C;">\</span>
             cut -d: -f2 | <span style="color: #A3BE8C;">\</span>
             sed <span style="color: #A3BE8C;">'s/^[[:space:]]*//'</span>)
<span style="color: #616e88;"># </span><span style="color: #616e88;">Quote $REGION so the newlines between regions survive</span>
<span style="color: #81A1C1;">if</span> [ <span style="color: #A3BE8C;">"$(echo "$REGION" | wc -l)"</span> -gt 1 ]; <span style="color: #81A1C1;">then</span>
    <span style="color: #81A1C1;">echo</span> <span style="color: #A3BE8C;">"Using the last region in the list of regions"</span>
    <span style="color: #D8DEE9;">REGION</span>=$(<span style="color: #fa8072;">echo</span> <span style="color: #A3BE8C;">"$REGION"</span> | tail -n 1)
<span style="color: #81A1C1;">fi</span>
gcloud run deploy <span style="color: #A3BE8C;">"$PROJECT"</span> <span style="color: #A3BE8C;">\</span>
   --platform managed <span style="color: #A3BE8C;">\</span>
   --region <span style="color: #A3BE8C;">"$REGION"</span> <span style="color: #A3BE8C;">\</span>
   --image <span style="color: #A3BE8C;">"gcr.io/${PROJECT}/${PROJECT}:latest"</span>
</pre>
</div>
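<p>
The text-processing part of that script can be tried without a Google
account by feeding it canned text. In the sketch below the sample
input is made up (real <code>--format=flattened</code> output has many more
keys and I am only assuming the general <code>key: value</code> shape), and
<code>pick_region</code> is a hypothetical name. It keeps the “use the last
region” behaviour.
</p>

```shell
#!/bin/sh
# pick_region: read flattened "key: value" text on stdin and print the
# value of the last location label it finds.
pick_region() {
    grep 'metadata.labels.cloud.googleapis.com/location' |
        cut -d: -f2 |
        sed 's/^[[:space:]]*//' |
        tail -n 1
}

# Made-up sample describing two services in two regions.
sample='metadata.name: MyProject
metadata.labels.cloud.googleapis.com/location: us-central1
metadata.name: OtherService
metadata.labels.cloud.googleapis.com/location: europe-west1'

printf '%s\n' "$sample" | pick_region   # prints europe-west1
```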
<p>
Such a shell script could be added as a step in the Github action
YAML file in a section that looks like:
</p>
<div class="org-src-container">
<pre class="src src-yaml">- <span style="color: #D8DEE9;">name</span>: Deploy
  <span style="color: #D8DEE9;">run</span>: |
    PROJECT=MyProject
    REGION=$(gcloud run services list --platform managed --format=flattened | \
      grep metadata.labels.cloud.googleapis.com/location | \
      cut -d: -f2 | \
      sed <span style="color: #A3BE8C;">'s/^[[:space:]]*//'</span>)
    if [ <span style="color: #A3BE8C;">"$(echo "$REGION" | wc -l)"</span> -gt 1 ]; then
      echo <span style="color: #A3BE8C;">"Using the last region in the list of regions"</span>
      REGION=$(echo <span style="color: #A3BE8C;">"$REGION"</span> | tail -n 1)
    fi
    gcloud run deploy <span style="color: #A3BE8C;">"$PROJECT"</span> \
      --platform managed \
      --region <span style="color: #A3BE8C;">"$REGION"</span> \
      --image <span style="color: #A3BE8C;">"gcr.io/${PROJECT}/${PROJECT}:latest"</span>
</pre>
</div>
<p>
After you’ve deployed the new Docker image, you can confirm that it
is being used:
</p>
<ol class="org-ol">
<li><p>
Check that there is a new revision of the service:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud run revisions list --region MyRegion --platform managed
</pre>
</div></li>
<li><p>
Once you know the name of the new revision, check what image it
is using:
</p>
<div class="org-src-container">
<pre class="src src-sh">gcloud run revisions describe <span style="color: #A3BE8C;">\</span>
--region MyRegion <span style="color: #A3BE8C;">\</span>
--platform managed <span style="color: #A3BE8C;">\</span>
--format=json MyProject-00004-hab | <span style="color: #A3BE8C;">\</span>
grep image
</pre>
</div>
<p>
and you should see that the SHA256 hash of the image, as reported in
<code>imageDigest</code>, matches that in the output of the <code>container
images list-tags</code> command.
</p></li>
</ol>
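<p>
Comparing hashes by eye gets old quickly. Here is a minimal sketch of
the comparison, assuming the <code>imageDigest</code> value contains
<code>sha256:</code> followed by the full hash and that <code>list-tags</code> shows
only its first characters, as in the output earlier in this post. The
function name <code>digest_matches</code> and the long hash in the demo are made
up.
</p>

```shell
#!/bin/sh
# digest_matches SHORT FULL: succeed if FULL contains "sha256:" followed
# immediately by SHORT (the truncated digest from list-tags).
digest_matches() {
    case "$2" in
        *"sha256:$1"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Made-up digest value for illustration only.
full='gcr.io/MyProject/MyProject@sha256:e3ae68fe03b80000000000000000'
if digest_matches e3ae68fe03b8 "$full"; then
    echo "revision is running the expected image"
fi
```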
<p>
Accessing the URL of your service will now reflect the changes you
committed many steps back.
</p>
</div>
</div>
<div id="outline-container-org077ce6b" class="outline-2">
<h2 id="org077ce6b">Summary</h2>
<div class="outline-text-2" id="text-org077ce6b">
<p>
While there is a lot here, there aren’t too many steps at a
high-level:
</p>
<ol class="org-ol">
<li>Grant GitHub the ability to push Docker images to your Google
project’s Container Registry</li>
<li>Write some instructions in YAML for GitHub to follow so it builds
a Docker image and pushes it to the appropriate container
registry</li>
<li>Run a few manual steps so your Google Cloud Run instance uses
your new image</li>
</ol>
</div>
</div>
Plotting Functions for Kids (and Adults) on an Apple Computer2019-03-24T00:00:00+00:00http://acaird.github.io/2019/03/24/function-plotting-for-kids<p>
To plot functions on the computer, you need to install a program
called <code>gnuplot</code>. See the section below called <a href="#org653682b">Installing gnuplot</a> for
how to do that.
</p>
<p>
To start <code>gnuplot</code> you first have to open the Terminal program on your
Mac. You will be able to type into that window. You should type
<code>gnuplot</code> and press Return. Then you’ll see:
</p>
<pre class="example">
G N U P L O T
Version 5.2 patchlevel 6 last modified 2019-01-01
Copyright (C) 1986-1993, 1998, 2004, 2007-2018
Thomas Williams, Colin Kelley and many others
gnuplot home: http://www.gnuplot.info
faq, bugs, etc: type "help FAQ"
immediate help: type "help" (plot window: hit 'h')
Terminal type is now 'qt'
gnuplot>
</pre>
<p>
Now you need to do a few things to set up the plot. To turn on grid
lines, type <code>set grid</code> and press Return. It will look like:
</p>
<pre class="example">
gnuplot> set grid
gnuplot>
</pre>
<p>
Next you need to define your functions. Their names can be anything
you want and they are what will appear in the legend, so choose
something that makes sense (it is possible to change the legend, but
it is easier to let <code>gnuplot</code> choose it for you). For example, we will
use the names <code>noshrink</code> and <code>mighty</code> from a recent math homework.
Formulae in <code>gnuplot</code> are not represented with <code>x</code> and <code>y</code>, but with
<code>f(x)</code> and <code>x</code>, where <code>f(x)</code> is just like <code>y</code> and the <code>f</code> part can be
any name. This will look like:
</p>
<pre class="example">
gnuplot> noshrink(x) = 4.5 * x
gnuplot> mighty(x) = x + 45
gnuplot>
</pre>
<p>
and this is the same as \(y = 4.5 \times x\) for <code>noshrink</code> and
\(y = x + 45\) for <code>mighty</code>.
</p>
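<p>
If you want to check the numbers before plotting, the same two formulas can
be written as a short Python sketch (Python is not needed for anything else
here; this is just a way to see the values). It prints a few points for each
function and works out where the two lines cross.
</p>

```python
def noshrink(x):
    return 4.5 * x


def mighty(x):
    return x + 45


# Print a small table of values, the same numbers gnuplot will draw.
for x in range(0, 61, 10):
    print(x, noshrink(x), mighty(x))

# The two lines cross where 4.5 * x = x + 45, which means 3.5 * x = 45.
crossing = 45 / 3.5
print(crossing, noshrink(crossing))
```

<p>
The crossing is a little before <code>x = 13</code>, so a range of 0 to 60 will
show it comfortably.
</p>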
<p>
At this point, you have done three things:
</p>
<ol class="org-ol">
<li>Turned on the grid in your plot</li>
<li>Defined the formula for <code>noshrink</code></li>
<li>Defined the formula for <code>mighty</code></li>
</ol>
<p>
You can define as few as one formula or as many as you want.
</p>
<p>
To plot your formulae and see them on the screen, you need to know
what range of <code>x</code> values you want to use. In the example
below we have chosen a range from 0 to 60 by typing <code>[x=0:60]</code> as part
of the <code>plot</code> command:
</p>
<pre class="example">
gnuplot> plot [x=0:60] noshrink(x), mighty(x)
gnuplot>
</pre>
<p>
and you will see a window pop up on the screen that looks like the
picture in Figure <a href="#org41b03d6">1</a>.
</p>
<div id="org41b03d6" class="figure">
<p><img src="/assets/2019-03-24-first-plot.png" alt="There is a lovely
plot here you
aren't seeing." />
</p>
<p><span class="figure-number">Figure 1: </span>A sample plot</p>
</div>
<p>
If the plot looks right, you can save it to a PDF file for printing or
emailing. Saving a plot to a PDF takes a few more steps that you have to
type at the <code>gnuplot></code> prompt. You need to set the output type (PDF)
and the name of the output file, then make the plot again, and
finally set the output back to the screen:
</p>
<pre class="example">
gnuplot> set term pdf
Terminal type is now 'pdfcairo'
Options are ' transparent enhanced fontscale 0.5 size 5.00in, 3.00in '
gnuplot> set output "my-plot.pdf"
gnuplot> replot
gnuplot> unset term
Terminal type is now 'qt'
gnuplot>
</pre>
<p>
You can change the name of the file from <code>my-plot.pdf</code> to anything you
want; it’s best to end it with <code>.pdf</code> though. You can open the PDF
file with Preview and print it if you want. To open the file in
Preview without exiting <code>gnuplot</code> you can type <code>!open my-plot.pdf</code>
(or whatever the name of your file is) at the <code>gnuplot></code> prompt,
like:
</p>
<pre class="example">
gnuplot> !open my-plot.pdf
gnuplot>
</pre>
<p>
and Preview will open and show you your file.
</p>
<p>
When you’re done, you can exit <code>gnuplot</code> by typing <code>exit</code> and pressing
Return.
</p>
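<p>
If you make the same kind of plot often, the commands typed above can be
collected into a script file instead. The sketch below is one way to build
that text with Python; the function name <code>gnuplot_script</code> and the
file name <code>my-plot.gp</code> are made up for this example. It uses only the
gnuplot commands already shown, plus <code>set output</code> with no file name,
which closes the PDF.
</p>

```python
def gnuplot_script(functions, x_min, x_max, pdf_name):
    """Build the text of a gnuplot script from {name: formula} pairs."""
    lines = ["set grid"]
    for name, formula in functions.items():
        lines.append("{}(x) = {}".format(name, formula))
    lines.append("set term pdf")
    lines.append('set output "{}"'.format(pdf_name))
    calls = ", ".join("{}(x)".format(name) for name in functions)
    lines.append("plot [x={}:{}] {}".format(x_min, x_max, calls))
    lines.append("set output")  # with no file name, this closes the PDF
    return "\n".join(lines) + "\n"


script = gnuplot_script(
    {"noshrink": "4.5 * x", "mighty": "x + 45"}, 0, 60, "my-plot.pdf"
)
print(script)
```

<p>
Save the printed text in a file such as <code>my-plot.gp</code> and run
<code>gnuplot my-plot.gp</code> in Terminal to make the PDF without typing each
line by hand.
</p>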
<div id="outline-container-org653682b" class="outline-2">
<h2 id="org653682b">Installing gnuplot</h2>
<div class="outline-text-2" id="text-org653682b">
<p>
You can install <code>gnuplot</code> two ways. The best way is to first
install <code>brew</code> by going to <a href="https://brew.sh/">https://brew.sh/</a> and following the
instructions; you will have to open the Terminal program to do this.
When <code>brew</code> is installed, in Terminal (it can be the same one or a
new one) type <code>brew install gnuplot</code> and wait for it to install.
Then you can follow the instructions above.
</p>
<p>
You can also go to <a href="http://www.gnuplot.info/">http://www.gnuplot.info/</a> and follow the
instructions there to install it. It really is easier to install it
using <code>brew</code>, though. If you aren’t using a Mac, you can still
install <code>gnuplot</code> and follow most of the instructions here, but some
things may look or act slightly different.
</p>
</div>
</div>
<div id="outline-container-orgaac2e49" class="outline-2">
<h2 id="orgaac2e49">Why would I want to do this?</h2>
<div class="outline-text-2" id="text-orgaac2e49">
<p>
I don’t know, I guess you want to plot functions without buying
software. You’ve read this far, though, so I can tell you why <i>I</i>
did this. My daughter is learning about functions in Math and as
part of their homework they have to plot them. She has a graphing
calculator and graph paper, but computers are better at making
useful plots than either of those things. <code>gnuplot</code> is free and
straightforward to use (and, if you start using it more, it can be
<b>very</b> powerful, so it’s not a bad thing to learn about) and runs on
Macs and Windows and Linux and probably many other kinds of
computers, so it was a better option than buying software. Also,
public school teachers with tiny budgets might find this helpful for
making their lessons or quizzes or examples or whatever.
</p>
<p>
There are many, many resources on the Internet on using Gnuplot.
There are also some good books about it, like <a href="https://www.manning.com/books/gnuplot-in-action-second-edition">Gnuplot in Action</a> from
<a href="http://www.manning.com">Manning Publications</a> (I don’t benefit from these links, I’m just a
satisfied customer).
</p>
</div>
</div>
A Simple Web App in Python, Flask, Docker, Kubernetes, Microsoft Azure, and GoDaddy2019-02-11T00:00:00+00:00http://acaird.github.io/2019/02/11/docker-web-app-with-azure<p>
You’re talking with your wife one d<a href="https://www.letterkenny.tv/">a</a>aayyy… and realize that you
could improve her life by letting her create printable calendars and
importable ICS files for events she has to schedule. And that should
be on the web, because while your wife is very smart, she is not very
technical.
</p>
<p>
The problem: we have a simple input and, from a web site, we want
printable PDF calendars and an <a href="https://en.wikipedia.org/wiki/ICalendar">iCalendar or ICS</a> file that can be
imported into calendar programs.
</p>
<p>
The input is just a column of dates and a column of events. An
example of it for scheduling people to work looks like:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<tbody>
<tr>
<td class="org-right">2019-02-01</td>
<td class="org-left">Alicia, Michelle, Dolly</td>
</tr>
<tr>
<td class="org-right">2019-02-02</td>
<td class="org-left">Jim, Stephen, Ian, and Bernard</td>
</tr>
<tr>
<td class="org-right">2019-02-03</td>
<td class="org-left">Wayne, Daryl, Katy</td>
</tr>
<tr>
<td class="org-right">2019-02-04</td>
<td class="org-left">Alicia, Jim, Wayne</td>
</tr>
<tr>
<td class="org-right">2019-02-05</td>
<td class="org-left">closed</td>
</tr>
<tr>
<td class="org-right">2019-02-06</td>
<td class="org-left">Michelle, Stephen, Daryl</td>
</tr>
<tr>
<td class="org-right">2019-02-07</td>
<td class="org-left">Dolly, Bernard, Katy</td>
</tr>
</tbody>
</table>
<p>
We will start with a Python program that can process that into
printable PDF calendars and iCalendar files, then turn that into a web
application using Flask. Once we have a Flask application, we will
package that into a Docker container and publish it to the Docker
Hub so it can be read by our cloud provider’s environment. I chose to
use Microsoft’s Azure cloud environment, which uses Kubernetes to
orchestrate the environment needed to instantiate our Docker container
and get web traffic to it. I had a domain at GoDaddy that I wanted to
use for this website, and that is the final step.
</p>
<p>
The GitHub project for this is at:
<a href="https://github.com/acaird/xlscal-to-pdf-ics">https://github.com/acaird/xlscal-to-pdf-ics</a>
</p>
<div id="outline-container-orgf2a8612" class="outline-2">
<h2 id="orgf2a8612">Python</h2>
<div class="outline-text-2" id="text-orgf2a8612">
<p>
Happily for us, Python can read that file as a <code>.csv</code> file (<a href="https://docs.python.org/3/library/csv.html">csv</a>) or
an Excel (<code>.xlsx</code>) file (<a href="https://github.com/python-excel/xlrd">xlrd</a>). Even more happily, it can output
PDF files using <a href="https://bitbucket.org/rptlab/reportlab/src/default/">ReportLab</a> and iCalendar files using <a href="https://icalendar.readthedocs.io/en/latest/">icalendar</a>.
</p>
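<p>
The two functions in this section take an <code>events</code> argument that is
indexed by date (they use <code>d.month</code>, <code>m.date()</code>, and
<code>events[m]</code>), so a dictionary keyed by <code>datetime</code> objects fits.
As a rough sketch of getting from the two-column input to that shape, assuming
a CSV file with ISO-formatted dates (the helper name <code>read_events</code> is
made up here, and the real project may parse its input differently):
</p>

```python
import csv
import io
from datetime import datetime


def read_events(csv_file):
    """Read rows of (date, event) into a dict keyed by datetime."""
    events = {}
    for row in csv.reader(csv_file):
        if len(row) != 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        day = datetime.strptime(row[0].strip(), "%Y-%m-%d")
        events[day] = row[1].strip()
    return events


# Two rows from the example table; names with commas must be quoted in CSV.
sample = io.StringIO(
    '2019-02-01,"Alicia, Michelle, Dolly"\n'
    "2019-02-05,closed\n"
)
events = read_events(sample)
for day in sorted(events):
    print(day.date(), events[day])
```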
<p>
The function for generating a PDF calendar looks like:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #81A1C1;">import</span> calendar
<span style="color: #81A1C1;">from</span> io <span style="color: #81A1C1;">import</span> BytesIO

<span style="color: #81A1C1;">from</span> reportlab.lib <span style="color: #81A1C1;">import</span> colors
<span style="color: #81A1C1;">from</span> reportlab.lib.pagesizes <span style="color: #81A1C1;">import</span> landscape, letter
<span style="color: #81A1C1;">from</span> reportlab.lib.styles <span style="color: #81A1C1;">import</span> getSampleStyleSheet
<span style="color: #81A1C1;">from</span> reportlab.lib.units <span style="color: #81A1C1;">import</span> inch
<span style="color: #81A1C1;">from</span> reportlab.platypus <span style="color: #81A1C1;">import</span> (
    PageBreak, Paragraph, SimpleDocTemplate, Table, TableStyle,
)


<span style="color: #81A1C1;">def</span> <span style="color: #88C0D0;">make_pdf_cals</span>(events):
    # events maps datetime objects to the text for that day
    <span style="color: #D8DEE9;">buf</span> = BytesIO()
    <span style="color: #D8DEE9;">stylesheet</span> = getSampleStyleSheet()
    <span style="color: #D8DEE9;">doc</span> = SimpleDocTemplate(buf, pagesize=letter)
    <span style="color: #D8DEE9;">doc.pagesize</span> = landscape(letter)
    <span style="color: #D8DEE9;">elements</span> = []
    # One page per (year, month) that actually has events, in date order;
    # looping over years and months separately would add empty months
    # when the events span more than one year.
    <span style="color: #D8DEE9;">year_months</span> = <span style="color: #81A1C1;">sorted</span>(<span style="color: #81A1C1;">set</span>((d.year, d.month) <span style="color: #81A1C1;">for</span> d <span style="color: #81A1C1;">in</span> events))
    <span style="color: #81A1C1;">for</span> year, month <span style="color: #81A1C1;">in</span> year_months:
        elements.append(
            Paragraph(
                <span style="color: #A3BE8C;">"{} {}"</span>.<span style="color: #81A1C1;">format</span>(calendar.month_name[month], year),
                stylesheet[<span style="color: #A3BE8C;">"Title"</span>],
            )
        )
        <span style="color: #D8DEE9;">cal</span> = [[<span style="color: #A3BE8C;">"Mon"</span>, <span style="color: #A3BE8C;">"Tue"</span>, <span style="color: #A3BE8C;">"Wed"</span>, <span style="color: #A3BE8C;">"Thu"</span>, <span style="color: #A3BE8C;">"Fri"</span>, <span style="color: #A3BE8C;">"Sat"</span>, <span style="color: #A3BE8C;">"Sun"</span>]]
        cal.extend(calendar.monthcalendar(year, month))
        # fill_cal is defined elsewhere in the project
        <span style="color: #D8DEE9;">cal</span> = fill_cal(cal, month, year, events)
        <span style="color: #D8DEE9;">table</span> = Table(cal, 7 * [1.25 * inch], <span style="color: #81A1C1;">len</span>(cal) * [0.8 * inch])
        table.setStyle(
            TableStyle(
                [
                    (<span style="color: #A3BE8C;">"FONT"</span>, (0, 0), (-1, -1), <span style="color: #A3BE8C;">"Helvetica"</span>),
                    (<span style="color: #A3BE8C;">"FONT"</span>, (0, 0), (-1, 0), <span style="color: #A3BE8C;">"Helvetica-Bold"</span>),
                    (<span style="color: #A3BE8C;">"FONTSIZE"</span>, (0, 0), (-1, -1), 8),
                    (<span style="color: #A3BE8C;">"INNERGRID"</span>, (0, 0), (-1, -1), 0.25, colors.black),
                    (<span style="color: #A3BE8C;">"BOX"</span>, (0, 0), (-1, -1), 0.25, colors.green),
                    (<span style="color: #A3BE8C;">"ALIGN"</span>, (0, 0), (-1, -1), <span style="color: #A3BE8C;">"LEFT"</span>),
                    (<span style="color: #A3BE8C;">"VALIGN"</span>, (0, 0), (-1, -1), <span style="color: #A3BE8C;">"TOP"</span>),
                ]
            )
        )
        elements.append(table)
        elements.append(PageBreak())
    doc.build(elements)
    <span style="color: #D8DEE9;">pdf</span> = buf.getvalue()
    <span style="color: #81A1C1;">return</span> pdf
</pre>
</div>
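<p>
The PDF function calls a helper named <code>fill_cal</code> that isn’t shown
in this post. The version below is not the project’s code, just a guess at
what such a helper could look like, assuming <code>events</code> is a dictionary
keyed by <code>datetime</code> and that each table cell should hold the day
number followed by that day’s text.
</p>

```python
import calendar
from datetime import date, datetime


def fill_cal(cal, month, year, events):
    """Replace day numbers with day-plus-event strings; blank out padding days."""
    lookup = {d.date(): text for d, text in events.items()}
    filled = [cal[0]]  # keep the row of weekday names
    for week in cal[1:]:
        row = []
        for day in week:
            if day == 0:  # monthcalendar() pads weeks with zeros
                row.append("")
                continue
            text = lookup.get(date(year, month, day), "")
            row.append("{}\n{}".format(day, text).strip())
        filled.append(row)
    return filled


events = {datetime(2019, 2, 5): "closed"}
cal = [["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]]
cal.extend(calendar.monthcalendar(2019, 2))
for row in fill_cal(cal, 2, 2019, events):
    print(row)
```

<p>
Returning plain strings keeps the result usable as the data for a ReportLab
<code>Table</code>, which is how the PDF function uses it.
</p>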
<p>
and the function for generating the iCalendar file looks like:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #81A1C1;">from</span> icalendar <span style="color: #81A1C1;">import</span> Calendar, Event


<span style="color: #81A1C1;">def</span> <span style="color: #88C0D0;">make_ics</span>(events):
    # events maps datetime objects to the text for that day
    <span style="color: #D8DEE9;">ical</span> = Calendar()
    <span style="color: #81A1C1;">for</span> m <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">sorted</span>(events):
        <span style="color: #D8DEE9;">event</span> = Event()
        event.add(<span style="color: #A3BE8C;">"summary"</span>, events[m])
        # a date (rather than a datetime) start makes an all-day event
        event.add(<span style="color: #A3BE8C;">"dtstart"</span>, m.date())
        ical.add_component(event)
    <span style="color: #81A1C1;">return</span> ical.to_ical()
</pre>
</div>
<p>
Once you have that, then it’s a matter of making it into a <a href="http://flask.pocoo.org/">Flask</a>
app; there are lots of tutorials on the web about that and you can
see the details in the final result.
</p>
</div>
</div>
<div id="outline-container-org1c75cb2" class="outline-2">
<h2 id="org1c75cb2">Docker</h2>
<div class="outline-text-2" id="text-org1c75cb2">
<p>
Once you have a Flask app, it needs to be packaged with everything
it needs to run so it is portable, and <a href="https://www.docker.com/">Docker</a> is a good way to do
that.
</p>
<p>
Once you have Docker installed on your system, it’s a simple matter
to create a Docker image.
</p>
<ol class="org-ol">
<li><p>
Create a file called <code>Dockerfile</code> that starts from a Python/Flask
image from the Docker Hub and adds what you need for your
application. Ours looks like:
</p>
<pre class="example">
FROM tiangolo/uwsgi-nginx-flask:python3.7
RUN pip install xlrd
RUN pip install python-dateutil
RUN pip install reportlab
RUN pip install icalendar
RUN pip install flask_bootstrap
COPY ./app /app
</pre>
<p>
Where we used Sebastián Ramírez’s image from
<a href="https://github.com/tiangolo/uwsgi-nginx-flask-docker">https://github.com/tiangolo/uwsgi-nginx-flask-docker</a> to start,
then added the Python packages we need.
</p>
<p>
Lastly we copied in the contents of our <code>app</code> directory, which
contains our Python script, called <code>app.py</code> and a <code>templates</code>
directory with some Flask templates.
</p></li>
<li><p>
Build the Docker image on your computer:
</p>
<pre class="example">
docker build -t acaird/xls2cal .
</pre>
<p>
but change the <code>acaird</code> to your own Docker Hub name (I think),
and don’t forget the <code>.</code> at the end so it knows to read the
<code>Dockerfile</code> from the current directory.
</p></li>
<li><p>
Run your Docker image locally:
</p>
<pre class="example">
docker run --rm -d --name xls2cal -p 80:80 acaird/xls2cal
</pre>
<p>
changing the tag at the end to match the tag you used in step 2.
The options are:
</p>
<dl class="org-dl">
<dt><code>run</code></dt><dd>runs a Docker container</dd>
<dt><code>--rm</code></dt><dd>removes the container after it exits</dd>
<dt><code>-d</code></dt><dd>runs the container in the background, without this the
<code>docker run</code> command will wait in your shell</dd>
<dt><code>--name xls2cal</code></dt><dd>gives the container a name; Docker will
assign one if you don’t give it one, but it is convenient to
have one</dd>
<dt><code>-p 80:80</code></dt><dd>publishes port 80 on the localhost (your computer)
and forwards its traffic to port 80 in the Docker
container; in our case, this is where <code>nginx</code> is
listening to send our requests to <code>uwsgi</code>, which
will send them to our Flask application</dd>
<dt><code>acaird/xls2cal</code></dt><dd>is the tag of the image that is to be
started in the container</dd>
</dl></li>
<li>Use your web browser to go to <code>http://localhost</code> and make sure
your web application is working in its containerized environment</li>
<li><p>
Once everything is working, create yourself a free Docker Hub
account at <a href="https://hub.docker.com/">https://hub.docker.com/</a> and log in to it from your
computer by typing:
</p>
<pre class="example">
docker login
</pre>
<p>
then push your image to the Docker Hub:
</p>
<pre class="example">
docker push acaird/xls2cal
</pre></li>
</ol>
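<p>
The browser check in step 4 can also be scripted, which is handy if you
rebuild the image often. This is a minimal sketch using only the Python
standard library; <code>is_up</code> is a name made up for this example, and it
only confirms that something answers on the port, not that the application
behaves correctly.
</p>

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5):
    """Return True if `url` answers an HTTP GET with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status in range(200, 400)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all land here.
        return False
```

<p>
With the container from step 3 running, <code>is_up("http://localhost/")</code>
should return <code>True</code>; after <code>docker stop xls2cal</code> it should
return <code>False</code>.
</p>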
</div>
</div>
<div id="outline-container-org35bc129" class="outline-2">
<h2 id="org35bc129">Azure, with a brief detour to Kubernetes</h2>
<div class="outline-text-2" id="text-org35bc129">
<p>
Now you have a functioning container that will run anywhere that
supports Docker images, and you need a place to run it.
</p>
<p>
The canonical cloud provider is Amazon Web Services (<a href="http://aws.amazon.com">AWS</a>) and, for
containers specifically, <a href="https://aws.amazon.com/fargate/">Fargate</a>, but I couldn’t follow their
documentation immediately, so I moved on.
</p>
<p>
The second cloud provider that might come to mind is Google and
their <a href="https://cloud.google.com/kubernetes-engine/">Kubernetes Engine</a>; <a href="https://cloud.google.com/cloud-build/docs/quickstart-docker">Google’s</a> <a href="https://console.cloud.google.com/kubernetes">documentation</a> was good, but when I
tried following their example, there were no resources:
</p>
<pre class="example">
ERROR: (gcloud.container.clusters.create) Operation [<Operation clusterConditions: [<StatusCondition code: CodeValueValuesEnum(GCE_STOCKOUT, 1)
message: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-b.'>]
detail: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-b.'
endTime: u'2019-02-07T01:52:24.015219227Z'
name: u'operation-1549504333886-880ea104'
nodepoolConditions: []
operationType: OperationTypeValueValuesEnum(CREATE_CLUSTER, 1)
selfLink: u'https://container.googleapis.com/v1/projects/180749766837/zones/us-central1-b/operations/operation-1549504333886-880ea104'
startTime: u'2019-02-07T01:52:13.886673043Z'
status: StatusValueValuesEnum(DONE, 3)
statusMessage: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-b.'
targetLink: u'https://container.googleapis.com/v1/projects/180749766837/zones/us-central1-b/clusters/xlsx-cal'
zone: u'us-central1-b'>] finished with error: Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-b.
</pre>
<p>
On searching the Internet for this error (ok, I Googled it), I found
that this is not uncommon. Being impatient, I moved on.
</p>
<p>
While Microsoft is a behemoth, they are newer to the cloud world
than Amazon or Google, so they were my third choice and who I ended
up using.
</p>
<p>
I started with their documentation <a href="https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough">Quickstart: Deploy an Azure
Kubernetes Service (AKS) cluster using the Azure CLI</a>. After
creating the account as instructed on that page, I installed the
command line tool for working with Azure cloud resources called <code>az</code>
(I do not like graphical interfaces). On my Mac, I use <a href="http://brew.sh">Brew</a> to
install software, and the Azure command line client is available
from there (for what it’s worth, I couldn’t find Google’s CLI
there), and typing:
</p>
<pre class="example">
$ brew info azure-cli
azure-cli: stable 2.0.57 (bottled), HEAD
Microsoft Azure CLI 2.0
https://docs.microsoft.com/cli/azure/overview
/usr/local/Cellar/azure-cli/2.0.57 (22,407 files, 100.5MB) *
Poured from bottle on 2019-02-07 at 06:45:38
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/azure-cli.rb
</pre>
<p>
gives me a useful link to the docs and typing <code>brew install
azure-cli</code> installed the <code>az</code> command for me.
</p>
<p>
I then followed their instructions and ran parts of the test
environment. I didn’t do all of it, because that seemed like a
lot. Also, I had my own Docker image ready and waiting.
</p>
<p>
I stripped down their Kubernetes manifest file to a simpler one that
looks like:
</p>
<pre class="example">
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xls2cal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: xls2cal
  template:
    metadata:
      labels:
        app: xls2cal
    spec:
      containers:
      - name: xls2cal
        image: acaird/xls2cal
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: xls2cal
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: xls2cal
</pre>
<p>
This Kubernetes manifest has two main parts: the Deployment, which
describes the container and assigns a label (<code>app: xls2cal</code>) to it,
and a Service, which exposes port 80 through a LoadBalancer, which is
provided by Azure. What you really care about is the line that
says:
</p>
<pre class="example">
image: acaird/xls2cal
</pre>
<p>
because that tells Kubernetes what Docker image to run.
</p>
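<p>
If you end up deploying more than one image this way, the manifest can be
generated rather than edited by hand. The sketch below builds the same two
objects as Python dictionaries and prints them as JSON, which
<code>kubectl apply -f</code> accepts as well as YAML. The function name
<code>manifest</code> is made up for this example, and the resource requests and
limits are left out to keep it short.
</p>

```python
import json


def manifest(name, image, port=80):
    """Build a Deployment and a LoadBalancer Service for one container image."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "ports": [{"port": port}],
            "selector": labels,
        },
    }
    # A v1 List lets one JSON document carry both objects.
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


print(json.dumps(manifest("xls2cal", "acaird/xls2cal"), indent=2))
```

<p>
The important detail is that the Service’s <code>selector</code> and the
Deployment’s pod <code>labels</code> come from the same dictionary, so they cannot
drift apart.
</p>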
<p>
The general steps once you have the Kubernetes manifest above and
have adjusted it to run your container are:
</p>
<ol class="org-ol">
<li><p>
Log in to Azure using the Azure command line interface (CLI)
</p>
<pre class="example">
az login
</pre></li>
<li><p>
Create a resource group:
</p>
<pre class="example">
az group create --name acaird-xls2cal --location eastus
</pre></li>
<li><p>
Create a single-node Kubernetes cluster:
</p>
<pre class="example">
az aks create \
--resource-group acaird-xls2cal \
--name xls2cal \
--node-count 1 \
--generate-ssh-keys
</pre></li>
<li><p>
Add the credentials for that Kubernetes cluster to your local
<code>kubectl</code> configuration:
</p>
<pre class="example">
az aks get-credentials --resource-group acaird-xls2cal --name xls2cal
</pre></li>
<li><p>
Check that you have a node, and learn a little about it:
</p>
<pre class="example">
kubectl get nodes -o wide
</pre></li>
<li><p>
Confirm that your <code>kubectl</code> is pointed at the correct context:
</p>
<pre class="example">
kubectl config get-contexts
</pre></li>
<li><p>
Send the Kubernetes manifest YAML file to the cluster:
</p>
<pre class="example">
kubectl apply -f xls2cal.yaml
</pre></li>
<li><p>
Start checking the state of what you have created:
</p>
<pre class="example">
kubectl get pods,svc -o wide
</pre>
<p>
after a while you should see the column <code>EXTERNAL-IP</code> go from
saying <code><pending></code> to being populated with an IP address
</p></li>
<li>Open your web browser to <code>http://YourAssignedExternalIP/</code> and you
should see your web application.</li>
<li>Do a little dance of thanks and amazement.</li>
</ol>
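<p>
Waiting for <code>EXTERNAL-IP</code> in step 8 can be automated by asking
<code>kubectl</code> for JSON (<code>kubectl get svc xls2cal -o json</code>) and
reading the address out of the Service’s status. The sketch below shows only
the parsing part; the two JSON strings are trimmed, hypothetical samples, the
IP address is a documentation-only address, and <code>external_ip</code> is a
name made up for this example.
</p>

```python
import json


def external_ip(service_json):
    """Return the load balancer IP from a Service's JSON, or None if pending."""
    service = json.loads(service_json)
    status = service.get("status", {})
    ingress = status.get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if "ip" in entry:
            return entry["ip"]
    return None


# Trimmed samples: before and after the load balancer gets its address.
pending = '{"status": {"loadBalancer": {}}}'
ready = '{"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}'
print(external_ip(pending))
print(external_ip(ready))
```

<p>
A small loop that runs the <code>kubectl</code> command every few seconds and
stops when this function returns something other than <code>None</code> would
replace re-running step 8 by hand.
</p>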
</div>
</div>
<div id="outline-container-org47f9bd3" class="outline-2">
<h2 id="org47f9bd3">DNS with GoDaddy</h2>
<div class="outline-text-2" id="text-org47f9bd3">
<p>
I have my domain registered at GoDaddy, and they will also provide
DNS services for hostnames in that domain.
</p>
<p>
Looking at the Azure tutorial titled <a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-web-tutorial-custom-domain">Map an existing custom DNS name
to Azure App Service</a>, the example in the <a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-web-tutorial-custom-domain#access-dns-records-with-domain-provider">Access DNS records with
domain provider</a> section matches GoDaddy’s interface, which was very convenient.
</p>
<p>
To add a hostname at GoDaddy, go to
<a href="https://dcc.godaddy.com/manage/YOURDOMAIN.TLD/dns">https://dcc.godaddy.com/manage/YOURDOMAIN.TLD/dns</a> (of course,
replacing <code>YOURDOMAIN.TLD</code> with your actual domain, like
<code>mysweetdomain.biz</code> or whatever), click “Add”, select “A” as the type
(an <a href="https://support.dnsimple.com/articles/a-record/">A record</a>), and type in the hostname of your choice and the IP
address from above.
</p>
<p>
If you are using macOS or Linux, you can type:
</p>
<pre class="example">
dig hostname.mysweetdomain.biz
</pre>
<p>
and you will see GoDaddy’s DNS servers respond with the hostname you
configured and the IP address that was assigned above.
</p>
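<p>
The same check can be done from Python, which is useful on systems without
<code>dig</code>. This is a small sketch using the standard library’s resolver;
<code>resolves_to</code> is a name made up for this example, and it asks
whatever DNS resolver your computer is configured to use, so a new record may
take a while to show up.
</p>

```python
import socket


def resolves_to(hostname, expected_ip):
    """Return True if `hostname` has an IPv4 address equal to `expected_ip`."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # includes socket.gaierror for unknown names
        return False
    return expected_ip in addresses
```

<p>
For example, <code>resolves_to("hostname.mysweetdomain.biz", "203.0.113.10")</code>
would return <code>True</code> once the A record is live, with your own hostname
and the IP address assigned above in place of these placeholders.
</p>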
<p>
Now you can point your web browser at
<a href="http://hostname.mysweetdomain.biz">http://hostname.mysweetdomain.biz</a> and you’ll see your web
application.
</p>
</div>
</div>
<div id="outline-container-orgc720895" class="outline-2">
<h2 id="orgc720895">Security</h2>
<div class="outline-text-2" id="text-orgc720895">
<p>
You should do your best to make sure your application is secure: it
was only a few minutes after mine went live that it was
probed by an <a href="https://nmap.org/"><code>nmap</code></a> scan.
</p>
</div>
</div>
<div id="outline-container-org0bd6507" class="outline-2">
<h2 id="org0bd6507">Summary</h2>
<div class="outline-text-2" id="text-org0bd6507">
<p>
You have to know a little bit about a lot of technologies to get all
of this going, but none of it is rocket science, and it is all
pretty independent and each one is a useful skill to know on its
own:
</p>
<ul class="org-ul">
<li>Learn some Python</li>
<li>Learn about the Python Flask library</li>
<li>Learn a little about Docker</li>
<li>Learn a little about Kubernetes</li>
<li>Get a Microsoft Azure account</li>
<li>Buy a domain name</li>
<li>Follow the Azure documentation links and tie it all together</li>
</ul>
<p>
And you’ll get a web page:
<img src="/assets/webpage.png" alt="webpage.png" />
</p>
<p>
And some PDF calendars:
<img src="/assets/pdfCal.png" alt="pdfCal.png" />
</p>
</div>
</div>
Dotplots with Matplotlib2016-06-24T00:00:00+00:00http://acaird.github.io/2016/06/24/dotplots-with-matplotlib<p>
I like the dotplots that R + ggplot2 can make. There are lots of
examples of this on the Internet. At least one is at <a href="http://www.r-bloggers.com/summarising-data-using-dot-plots/">r-bloggers</a>, but
Python is useful for many reasons, so I want to make a decent looking,
<a href="https://en.wikipedia.org/wiki/Chartjunk">chartjunk</a>-free dotplot using matplotlib.
</p>
<p>
Dotplots are a much better choice than pie charts for representing
most data and can take the place of most bar charts and present a much
cleaner looking graphic. Most bar charts do not use the width or area
of the bar to represent anything, so the size of the bar is, at best,
chart junk, or, at worst, misleading.
</p>
<p>
Making dotplots using Python and matplotlib is not well documented,
as far as I could find, so I figured it out myself with the help of many
Google results.
</p>
<p>
Some sample data is in Table <a href="#orgtable1">1</a>. If the columns of your data are
flipped, swap the <code>[0]</code> and <code>[1]</code> indexes on the point arrays (<code>p</code>)
in the code.
</p>
<table id="orgtable1" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption class="t-above"><span class="table-number">Table 1:</span> Fruit!</caption>
<colgroup>
<col class="org-right" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-right">Count</th>
<th scope="col" class="org-left">Type</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-right">10</td>
<td class="org-left">apple</td>
</tr>
<tr>
<td class="org-right">7</td>
<td class="org-left">pear</td>
</tr>
<tr>
<td class="org-right">2</td>
<td class="org-left">avocado</td>
</tr>
<tr>
<td class="org-right">8</td>
<td class="org-left">orange</td>
</tr>
<tr>
<td class="org-right">4</td>
<td class="org-left">peach</td>
</tr>
</tbody>
</table>
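<p>
The plotting code in this post reads the table through a variable named
<code>fruit</code>, which org-mode supplies. Outside org-mode you would have to
define it yourself; a reasonable guess at its shape, assuming one
<code>[count, type]</code> row per line of the table, is:
</p>

```python
# Rows from Table 1: [count, type]
fruit = [
    [10, "apple"],
    [7, "pear"],
    [2, "avocado"],
    [8, "orange"],
    [4, "peach"],
]

# Lists compare by their first element first, so this orders the rows by count.
d = sorted(fruit, reverse=False)
counts = [p[0] for p in d]
labels = [p[1] for p in d]
print(counts)
print(labels)
```

<p>
Sorting ascending puts the smallest count at the bottom of the plot and the
largest at the top, because the rows are drawn from y = 0 upward.
</p>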
<div class="figure">
<p><object type="image/svg+xml" data="/assets/hbarplot.svg" >
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
The following code will produce the plot above. Hopefully the
comments will help.
</p>
<div class="org-src-container">
<pre class="src src-python" id="orgsrcblock1"><span style="color: #a020f0;">import</span> matplotlib.pyplot <span style="color: #a020f0;">as</span> plt
<span style="color: #a0522d;">d</span> = fruit
<span style="color: #b22222;"># </span><span style="color: #b22222;">sort the data</span>
<span style="color: #a0522d;">d</span> = <span style="color: #483d8b;">sorted</span>(d, reverse=<span style="color: #008b8b;">False</span>)
<span style="color: #b22222;"># </span><span style="color: #b22222;">Get the plot aspect right for thinner bars that aren't too spread out</span>
<span style="color: #a0522d;">fig</span>, <span style="color: #a0522d;">ax</span> = plt.subplots(figsize=(12,2.5))
<span style="color: #b22222;"># </span><span style="color: #b22222;">Create the bars</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">The parameters are:</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the number of bars for the y-axis</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the values from the first column of data</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the thickness (height) of each bar, kept tiny so it reads as a line</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- color = the color of the bars</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- edgecolor = the color of the bars' borders</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- alpha = the transparency of the bars</span>
<span style="color: #a0522d;">bars</span> = ax.barh(<span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(d)), [p[0] <span style="color: #a020f0;">for</span> p <span style="color: #a020f0;">in</span> d], 0.001,
color=<span style="color: #8b2252;">"lightgray"</span>, edgecolor=<span style="color: #8b2252;">"lightgray"</span>, alpha=0.4)
<span style="color: #b22222;"># </span><span style="color: #b22222;">Create the points using normal x-y scatter coordinates</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">The parameters are:</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the x values from the first column of the data</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the y values, which are just the indices of the data</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">- the size of the points</span>
<span style="color: #a0522d;">points</span> = ax.scatter([p[0] <span style="color: #a020f0;">for</span> p <span style="color: #a020f0;">in</span> d], <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(d)), s=30)
<span style="color: #b22222;"># </span><span style="color: #b22222;">Create the ytic locations centered on the bars</span>
<span style="color: #a0522d;">yticloc</span> = [bar.get_y() + bar.get_height()/2. <span style="color: #a020f0;">for</span> bar <span style="color: #a020f0;">in</span> bars]
<span style="color: #b22222;"># </span><span style="color: #b22222;">Turn off all of the borders</span>
ax.spines[<span style="color: #8b2252;">'top'</span>].set_visible(<span style="color: #008b8b;">False</span>)
ax.spines[<span style="color: #8b2252;">'bottom'</span>].set_visible(<span style="color: #008b8b;">False</span>)
ax.spines[<span style="color: #8b2252;">'right'</span>].set_visible(<span style="color: #008b8b;">False</span>)
ax.spines[<span style="color: #8b2252;">'left'</span>].set_visible(<span style="color: #008b8b;">False</span>)
<span style="color: #b22222;"># </span><span style="color: #b22222;">set all of the ticks to 0 length</span>
ax.tick_params(axis=u<span style="color: #8b2252;">'both'</span>, which=u<span style="color: #8b2252;">'both'</span>,length=0)
<span style="color: #b22222;"># </span><span style="color: #b22222;">set the tic locations and labels</span>
ax.set_yticks(yticloc)
ax.set_yticklabels([p[1] <span style="color: #a020f0;">for</span> p <span style="color: #a020f0;">in</span> d])
<span style="color: #b22222;"># </span><span style="color: #b22222;">set the x- and y-axis limits a little bigger so things look nice</span>
ax.set_xlim([0,<span style="color: #483d8b;">max</span>([p[0] <span style="color: #a020f0;">for</span> p <span style="color: #a020f0;">in</span> d])+1.1])
ax.set_ylim([-0.7,<span style="color: #483d8b;">len</span>(d)])
<span style="color: #b22222;"># </span><span style="color: #b22222;">Turn on the X (vertical) gridlines</span>
ax.xaxis.grid(<span style="color: #008b8b;">True</span>)
<span style="color: #b22222;"># </span><span style="color: #b22222;">Re-wrap the figure so everything fits</span>
plt.tight_layout()
<span style="color: #b22222;"># </span><span style="color: #b22222;">Save the figure</span>
<span style="color: #a0522d;">filename</span> = <span style="color: #8b2252;">"hbarplot.svg"</span>
plt.savefig(filename)
<span style="color: #b22222;"># </span><span style="color: #b22222;">this is for org-mode, in general it produces a Python error</span>
<span style="color: #a020f0;">return</span> filename
</pre>
</div>
Event Density Plot2016-06-18T00:00:00+00:00http://acaird.github.io/computer/python/datascience/2016/06/18/event-density<p>
I want to visualize how many concurrent events exist in a time period
along with how frequently they start and end. I don’t need to read
numbers off the visualization, I just want to get a relative sense of
how many events are starting, ongoing, and ending over a time period
with some resolution. Something that looks like this:
</p>
<div class="figure">
<p><object type="image/svg+xml" data="/assets/edplot.svg" >
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
Looking at the plot, you can immediately see when:
</p>
<ul class="org-ul">
<li>the most events were starting (about in the middle of the time
range)</li>
<li>the most events were happening (about in the first third of the time range)</li>
<li>the most events were ending (about at the end of the first third of
the time range).</li>
</ul>
<p>
With that information the reader can ask the next questions in more
useful ways:
</p>
<ul class="org-ul">
<li>“why did we stop starting events about half way through the time
range?”</li>
<li>“why did we stop so many events after the first third of the time
range?”</li>
<li>“why was nothing at all happening for the last 5–10% of the time
range?”</li>
</ul>
<p>
Those questions aren’t about the data directly, but about the
application of the data, which is what data are for (despite people
sometimes loving data for their own sake), and they aren’t obvious from the
input data (Table <a href="#org34a87d8">1</a>).
</p>
<div id="outline-container-orgca660d7" class="outline-2">
<h2 id="orgca660d7">Practice Data</h2>
<div class="outline-text-2" id="text-orgca660d7">
<p>
To start, I create some fake data with this Python script, where all
times are between 1 and 100, there are 19 events, and the longest event
duration is 30. If it helps, you can think of these numbers as seconds
after 4:15am on Thursday, June 16th, 2016. Or days after January
1st, 2000. It doesn’t matter.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #81A1C1;">import</span> random
<span style="color: #81A1C1;">from</span> tabulate <span style="color: #81A1C1;">import</span> tabulate
<span style="color: #D8DEE9;">data</span> = []
<span style="color: #4c566a;"># </span><span style="color: #4c566a;">range(1,20) yields 19 events</span>
<span style="color: #81A1C1;">for</span> m <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">range</span>(1,20):
    <span style="color: #D8DEE9;">start</span> = random.randint(1,70)
    <span style="color: #D8DEE9;">end</span> = start + random.randint(1,30)
    data.append((start,end))
data.sort()
<span style="color: #81A1C1;">print</span>(tabulate(data, tablefmt=<span style="color: #A3BE8C;">"orgtbl"</span>, headers=([<span style="color: #A3BE8C;">"Start"</span>,<span style="color: #A3BE8C;">"End"</span>])))
</pre>
</div>
<table id="org34a87d8" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption class="t-above"><span class="table-number">Table 1:</span> Sample Event Start/End Data</caption>
<colgroup>
<col class="org-right" />
<col class="org-right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-right">Start</th>
<th scope="col" class="org-right">End</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-right">6</td>
<td class="org-right">11</td>
</tr>
<tr>
<td class="org-right">7</td>
<td class="org-right">27</td>
</tr>
<tr>
<td class="org-right">8</td>
<td class="org-right">35</td>
</tr>
<tr>
<td class="org-right">10</td>
<td class="org-right">11</td>
</tr>
<tr>
<td class="org-right">13</td>
<td class="org-right">37</td>
</tr>
<tr>
<td class="org-right">14</td>
<td class="org-right">35</td>
</tr>
<tr>
<td class="org-right">22</td>
<td class="org-right">34</td>
</tr>
<tr>
<td class="org-right">24</td>
<td class="org-right">36</td>
</tr>
<tr>
<td class="org-right">28</td>
<td class="org-right">51</td>
</tr>
<tr>
<td class="org-right">31</td>
<td class="org-right">59</td>
</tr>
<tr>
<td class="org-right">33</td>
<td class="org-right">34</td>
</tr>
<tr>
<td class="org-right">36</td>
<td class="org-right">47</td>
</tr>
<tr>
<td class="org-right">36</td>
<td class="org-right">58</td>
</tr>
<tr>
<td class="org-right">42</td>
<td class="org-right">51</td>
</tr>
<tr>
<td class="org-right">42</td>
<td class="org-right">51</td>
</tr>
<tr>
<td class="org-right">44</td>
<td class="org-right">66</td>
</tr>
<tr>
<td class="org-right">53</td>
<td class="org-right">74</td>
</tr>
<tr>
<td class="org-right">69</td>
<td class="org-right">95</td>
</tr>
<tr>
<td class="org-right">69</td>
<td class="org-right">96</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-org6697e9c" class="outline-2">
<h2 id="org6697e9c">Organizing the Data</h2>
<div class="outline-text-2" id="text-org6697e9c">
<p>
The next step is to see how many events are active, starting, and
ending at each time over all time (1–100 in our case).
</p>
<p>
This next bit of Python simply bins the data from the table above into
our 100 example time bins, which I won’t make you read through, but
you’ll need to bin your data in a similar way. The format of the data
is:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<tbody>
<tr>
<td class="org-left">Time</td>
<td class="org-left">Number of Events</td>
<td class="org-left">Number of Events</td>
<td class="org-left">Number of Events</td>
</tr>
<tr>
<td class="org-left"> </td>
<td class="org-left"><b>Ending</b> at this time</td>
<td class="org-left"><b>Ongoing</b> at this time</td>
<td class="org-left"><b>Starting</b> at this time</td>
</tr>
</tbody>
</table>
<p>
For example, if the frequency of your events is a few every minute,
your binned data might look like:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-right">Time</th>
<th scope="col" class="org-right">Ending</th>
<th scope="col" class="org-right">Ongoing</th>
<th scope="col" class="org-right">Starting</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-right">13:50</td>
<td class="org-right">4</td>
<td class="org-right">10</td>
<td class="org-right">3</td>
</tr>
<tr>
<td class="org-right">13:51</td>
<td class="org-right">2</td>
<td class="org-right">11</td>
<td class="org-right">1</td>
</tr>
<tr>
<td class="org-right">13:52</td>
<td class="org-right">0</td>
<td class="org-right">12</td>
<td class="org-right">4</td>
</tr>
<tr>
<td class="org-right">13:53</td>
<td class="org-right">8</td>
<td class="org-right">8</td>
<td class="org-right">2</td>
</tr>
<tr>
<td class="org-right">13:54</td>
<td class="org-right">1</td>
<td class="org-right">9</td>
<td class="org-right">4</td>
</tr>
</tbody>
</table>
<p>
although, since there is no data displayed for the x-axis (the time),
it is a lot easier to convert the time into relative time. In this
example, the times could be 49800, 49860, 49920, etc. Or if you have
a date, using the <a href="https://en.wikipedia.org/wiki/Unix_time">Unix epoch time</a> (seconds since 00:00:00 UTC 1
January 1970) makes things easy.
</p>
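<p>
A minimal sketch of that conversion, using only the standard library.
The helper names <code>seconds_since_midnight</code> and <code>to_epoch</code> are made up
for this illustration and are not part of the code in this post; the
date format string is also just an assumption about what your input
looks like.
</p>

```python
import calendar
from datetime import datetime


def seconds_since_midnight(hhmm):
    """Convert an "HH:MM" string to seconds after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def to_epoch(timestamp, fmt="%Y-%m-%d %H:%M"):
    """Convert a UTC date string to Unix epoch seconds."""
    parsed = datetime.strptime(timestamp, fmt)
    # timegm treats the time tuple as UTC, so the local timezone is ignored
    return calendar.timegm(parsed.timetuple())


print(seconds_since_midnight("13:50"))  # 49800
print(to_epoch("1970-01-02 00:00"))     # 86400
```

<p>
Either form gives plain integers, which are easier to bin than clock
strings.
</p>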
<div class="org-src-container">
<pre class="src src-python">
<span style="color: #4c566a;"># </span><span style="color: #4c566a;">timedata is assumed to hold the (start, end) pairs from Table 1</span>
<span style="color: #D8DEE9;">timebin</span> = <span style="color: #81A1C1;">dict</span>()
<span style="color: #D8DEE9;">startbin</span> = <span style="color: #81A1C1;">dict</span>()
<span style="color: #D8DEE9;">endbin</span> = <span style="color: #81A1C1;">dict</span>()
<span style="color: #81A1C1;">for</span> timeincr <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">range</span>(1, 101):
    <span style="color: #D8DEE9;">timebin</span>[timeincr] = 0
    <span style="color: #D8DEE9;">startbin</span>[timeincr] = 0
    <span style="color: #D8DEE9;">endbin</span>[timeincr] = 0
    <span style="color: #81A1C1;">for</span> s, e <span style="color: #81A1C1;">in</span> timedata:
        <span style="color: #81A1C1;">if</span> s == timeincr:
            <span style="color: #D8DEE9;">startbin</span>[timeincr] += 1
        <span style="color: #81A1C1;">if</span> e == timeincr:
            <span style="color: #D8DEE9;">endbin</span>[timeincr] += 1
        <span style="color: #81A1C1;">if</span> s &lt;= timeincr <span style="color: #81A1C1;">and</span> e &gt;= timeincr:
            <span style="color: #81A1C1;">if</span> timeincr <span style="color: #81A1C1;">in</span> timebin:
                <span style="color: #D8DEE9;">timebin</span>[timeincr] += 1
<span style="color: #81A1C1;">for</span> m <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">sorted</span>(timebin):
    <span style="color: #81A1C1;">print</span>(<span style="color: #A3BE8C;">"|{} | {} | {} | {}"</span>.<span style="color: #81A1C1;">format</span>(m, endbin[m],
                                      timebin[m], startbin[m]))
</pre>
</div>
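<p>
The loop above prints the rows as an Org table. As far as the code
shows, <code>makebarplot</code> in the next section reads a list called <code>bins</code> with
one (time, ending, ongoing, starting) row per time step. Here is a
sketch of a helper that builds such a list directly; <code>bin_events</code> and
its <code>first</code> and <code>last</code> parameters are hypothetical names, not part of the
original script, and the counting rules are copied from the loop above.
</p>

```python
def bin_events(events, first=1, last=100):
    """Count ending, ongoing, and starting events for each time step.

    events is a list of (start, end) pairs with integer times.
    Returns a list of (time, ending, ongoing, starting) tuples.
    """
    bins = []
    for t in range(first, last + 1):
        starting = sum(1 for s, e in events if s == t)
        ending = sum(1 for s, e in events if e == t)
        # an event counts as ongoing from its start through its end, inclusive
        ongoing = sum(1 for s, e in events if t in range(s, e + 1))
        bins.append((t, ending, ongoing, starting))
    return bins


# the first four rows of Table 1
sample = [(6, 11), (7, 27), (8, 35), (10, 11)]
for row in bin_events(sample, first=10, last=12):
    print(row)
```

<p>
For those four events, time 11 has two events ending and four ongoing,
because an event is still counted as ongoing at the moment it ends.
</p>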
</div>
</div>
<div id="outline-container-org3e1ceb0" class="outline-2">
<h2 id="org3e1ceb0">Plotting the Density of the Bins</h2>
<div class="outline-text-2" id="text-org3e1ceb0">
<p>
Once we have our bins, it’s a matter of making a density plot
over time for each of the three event types (starting, ongoing, and
ending).
</p>
<div class="org-src-container">
<pre class="src src-python" id="org2409373"><span style="color: #81A1C1;">import</span> matplotlib.pyplot <span style="color: #81A1C1;">as</span> plt
<span style="color: #81A1C1;">def</span> <span style="color: #88C0D0;">makebarplot</span>(bins):
    <span style="color: #D8DEE9;">time</span> = [b[0] <span style="color: #81A1C1;">for</span> b <span style="color: #81A1C1;">in</span> bins] <span style="color: #4c566a;"># </span><span style="color: #4c566a;">extract the x-axis data</span>
    <span style="color: #D8DEE9;">fig</span> = plt.figure() <span style="color: #4c566a;"># </span><span style="color: #4c566a;">get the matplotlib plot figure</span>
    fig.set_size_inches(8, 1) <span style="color: #4c566a;"># </span><span style="color: #4c566a;">set the size of the plot</span>
    <span style="color: #D8DEE9;">ax</span> = fig.add_subplot(1, 1, 1) <span style="color: #4c566a;"># </span><span style="color: #4c566a;">add a plot to the figure; Subplot</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">is confusing, though. The magical "(1, 1, 1)" here means there</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">will be one row, one column, and we are working with plot number</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">1, all of which is the same as just one plot. There is a little</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">more documentation on this at:</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot</span>
    fig.patch.set_visible(<span style="color: #81A1C1;">False</span>) <span style="color: #4c566a;"># </span><span style="color: #4c566a;">make the background transparent</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">turn off the borders (called spines)</span>
    ax.spines[<span style="color: #A3BE8C;">'top'</span>].set_visible(<span style="color: #81A1C1;">False</span>)
    ax.spines[<span style="color: #A3BE8C;">'bottom'</span>].set_visible(<span style="color: #81A1C1;">False</span>)
    ax.spines[<span style="color: #A3BE8C;">'right'</span>].set_visible(<span style="color: #81A1C1;">False</span>)
    ax.spines[<span style="color: #A3BE8C;">'left'</span>].set_visible(<span style="color: #81A1C1;">False</span>)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">set all of the ticks to 0 length</span>
    ax.tick_params(axis=u<span style="color: #A3BE8C;">'both'</span>, which=u<span style="color: #A3BE8C;">'both'</span>,length=0)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">hide everything about the x-axis</span>
    ax.axes.get_xaxis().set_visible(<span style="color: #81A1C1;">False</span>)
    <span style="color: #D8DEE9;">barwidth</span> = 1 <span style="color: #4c566a;"># </span><span style="color: #4c566a;">remove gaps between bars</span>
    <span style="color: #D8DEE9;">color</span> = [<span style="color: #A3BE8C;">"red"</span>, <span style="color: #A3BE8C;">"blue"</span>, <span style="color: #A3BE8C;">"green"</span>] <span style="color: #4c566a;"># </span><span style="color: #4c566a;">set the colors for the rows</span>
    <span style="color: #81A1C1;">for</span> row <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">range</span>(1, <span style="color: #81A1C1;">len</span>(color)+1): <span style="color: #4c566a;"># </span><span style="color: #4c566a;">make as many rows as colors</span>
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">extract the correct column</span>
        <span style="color: #D8DEE9;">ongoing</span> = [b[row] <span style="color: #81A1C1;">for</span> b <span style="color: #81A1C1;">in</span> bins]
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">scale the data to the maximum</span>
        <span style="color: #D8DEE9;">ongoing</span> = [c/<span style="color: #81A1C1;">float</span>(<span style="color: #81A1C1;">max</span>(ongoing)) <span style="color: #81A1C1;">for</span> c <span style="color: #81A1C1;">in</span> ongoing]
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">draw a black line at the left end</span>
        <span style="color: #D8DEE9;">left</span> = 10
        <span style="color: #D8DEE9;">border_width</span> = 20
        <span style="color: #D8DEE9;">d</span> = border_width
        ax.barh(row, d, barwidth, color=<span style="color: #A3BE8C;">"black"</span>,
                left=left, edgecolor=<span style="color: #A3BE8C;">"none"</span>,
                linewidth=0)
        <span style="color: #D8DEE9;">left</span> += d
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">fill in the horizontal bar with the right color density</span>
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">(alpha)</span>
        <span style="color: #81A1C1;">for</span> d, c <span style="color: #81A1C1;">in</span> <span style="color: #81A1C1;">zip</span>(time, ongoing):
            ax.barh(row, d, barwidth,
                    alpha=0.9*c+.01,
                    color=color[row-1],
                    left=left,
                    edgecolor=<span style="color: #A3BE8C;">"none"</span>,
                    linewidth=0)
            <span style="color: #D8DEE9;">left</span> += d
        <span style="color: #4c566a;"># </span><span style="color: #4c566a;">draw a black line at the right end</span>
        <span style="color: #D8DEE9;">d</span> = border_width
        ax.barh(row, d, barwidth,
                color=<span style="color: #A3BE8C;">"black"</span>,
                left=left, edgecolor=<span style="color: #A3BE8C;">"none"</span>,
                linewidth=0)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">label the rows</span>
    plt.yticks([1.5, 2.5, 3.5], [<span style="color: #A3BE8C;">'stopping'</span>, <span style="color: #A3BE8C;">'ongoing'</span>, <span style="color: #A3BE8C;">'starting'</span>], size=10)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">return the plot to __main__</span>
    <span style="color: #81A1C1;">return</span> plt
<span style="color: #4c566a;"># </span><span style="color: #4c566a;">do some housekeeping that makes it all go in OrgMode (and hence PDF</span>
<span style="color: #4c566a;"># </span><span style="color: #4c566a;">and HTML)</span>
<span style="color: #81A1C1;">if</span> <span style="color: #81A1C1;">__name__</span> == <span style="color: #A3BE8C;">"__main__"</span>:
    <span style="color: #D8DEE9;">plt</span> = makebarplot(bins)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">The file extension controls the output format; .png and .pdf are</span>
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">good choices along with .svg</span>
    <span style="color: #D8DEE9;">filename</span>=<span style="color: #A3BE8C;">"edplot.svg"</span>
    plt.savefig(filename)
    <span style="color: #4c566a;"># </span><span style="color: #4c566a;">this return is for org-mode; plain Python rejects a return outside a function</span>
    <span style="color: #81A1C1;">return</span> filename
</pre>
</div>
<div class="figure">
<p><object type="image/svg+xml" data="/assets/edplot.svg" >
Sorry, your browser does not support SVG.</object>
</p>
</div>
<p>
And now you can see the number of starting events in green, the number
of ongoing events in blue, and the number of ending events in red.
The darker the color, the more events of that type are happening at
that time, hence the name, <i>event density plot</i>.
</p>
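<p>
The darkness of each slice comes from the <code>alpha</code> value in the function
above: each count is divided by the largest count in its row and then
mapped to an opacity with <code>0.9*c+.01</code>. A small standalone sketch of that
mapping follows. The name <code>counts_to_alpha</code> is hypothetical, and the
guard for a row with no events at all is my addition; the function in
this post does not have one.
</p>

```python
def counts_to_alpha(counts):
    """Map event counts to opacities between 0.01 and 0.91."""
    peak = max(counts)
    if peak == 0:
        # an empty row would divide by zero; draw it almost transparent
        return [0.01 for _ in counts]
    return [0.9 * (c / float(peak)) + 0.01 for c in counts]


print(counts_to_alpha([0, 2, 4]))
```

<p>
A time step with no events is drawn nearly transparent, and the busiest
time step in each row is drawn nearly opaque, so the three rows are
each scaled independently.
</p>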
</div>
</div>
<div id="outline-container-orgfd97a1c" class="outline-2">
<h2 id="orgfd97a1c">The Future</h2>
<div class="outline-text-2" id="text-orgfd97a1c">
<p>
This could pretty readily be a Python class, and may be that
someday, but for now the <code>makebarplot</code> function is sufficient and
hopefully easy to understand and translate to the language of your
choice.
</p>
<p>
I would also like to include more examples, but thought that would
be as likely to add confusion as clarity.
</p>
</div>
</div>
<div id="outline-container-orgb27579f" class="outline-2">
<h2 id="orgb27579f">Addenda</h2>
<div class="outline-text-2" id="text-orgb27579f">
</div>
<div id="outline-container-orgc228573" class="outline-3">
<h3 id="orgc228573"><span class="timestamp-wrapper"><span class="timestamp">[2019-03-31 Sun]</span></span></h3>
<div class="outline-text-3" id="text-orgc228573">
<p>
As an example (thanks Daren!) of this, <a href="/assets/2016-06-18-event-density/eventDensityPlot.ipynb">here is an iPython
notebook</a>.
</p>
</div>
</div>
</div>
A List of Digital Leica M Cameras2016-05-11T00:00:00+00:00http://acaird.github.io/2016/05/11/leica<p>
There really should be a definitive list of digital Leica M cameras,
and I’m happy to turn this blog post into a link to that list. Until
then, I hope this helps someone.
</p>
<p>
I don’t know why Leica make it so hard on us to buy their cameras. I
can barely keep track of which camera I want, short of knowing that I
want a digital Leica M. I have a film M7 that is a freakish
combination of sculptural art work, exquisite functionality, and a
tool that makes photography enjoyable to the point of giggles and
produces results so good that they don’t make sense when compared to
the simplicity of the process. That said, I still want the digital
version because film really is a pain in the neck. If only I could
remember which digital version I wanted.
</p>
<p>
The film versions are pretty simple. The M7 has a meter and thus
modern<sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> conveniences like automatic shutter speed. Other Leica M
film cameras don’t have meters. Or some may. Whatever, I’ve already
got one and, as Monty Python’s French Soldier said, <i>Oh yes, it's very
nice!</i><sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> and Ferris was right: <i>It is so choice. If you have the
means, I highly recommend picking one up.</i> <sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>
</p>
<p>
But I don’t have a digital one. And as part of the long (yet almost
certainly inevitable) process of getting one, I needed some notes on
what the options are. If someone else has made this list, I couldn’t
find it, and I’m sorry for (a) not just linking to your information
and (b) spending the time I did making my own list when you
already had one.
</p>
<p>
So, here’s my table of digital Leica M-system cameras. If I’ve made
some mistake, let me know <a href="https://twitter.com/acaird">@acaird</a> <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-left" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-right" />
</colgroup>
<colgroup>
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Name</th>
<th scope="col" class="org-left">Price <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup></th>
<th scope="col" class="org-right">MP</th>
<th scope="col" class="org-right">Year</th>
<th scope="col" class="org-left">Comments</th>
<th scope="col" class="org-left">Top plate</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">M8</td>
<td class="org-left">ebay</td>
<td class="org-right">10</td>
<td class="org-right">2006</td>
<td class="org-left">not full-frame, while all the rest are</td>
<td class="org-left">brass</td>
</tr>
<tr>
<td class="org-left">M8.2</td>
<td class="org-left">ebay</td>
<td class="org-right">10</td>
<td class="org-right">2008</td>
<td class="org-left">M8 that is quieter, nicer LCD glass, more discreet</td>
<td class="org-left">brass</td>
</tr>
<tr>
<td class="org-left">M9</td>
<td class="org-left">$3,800</td>
<td class="org-right">18</td>
<td class="org-right">2009</td>
<td class="org-left"><a href="http://www.kenrockwell.com/leica/m9.htm">Ken says</a> it's the best camera ever made</td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left">M9-P</td>
<td class="org-left">$5,400</td>
<td class="org-right">18</td>
<td class="org-right">2011</td>
<td class="org-left">M9 with sapphire covered LCD and nicer cosmetics</td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left">M9 mono</td>
<td class="org-left">$5,000</td>
<td class="org-right">18</td>
<td class="org-right">2012</td>
<td class="org-left">b/w only; precursor to the typ 246</td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left">M-E (typ 220)</td>
<td class="org-left">$4,500</td>
<td class="org-right">18</td>
<td class="org-right">2012</td>
<td class="org-left">basically a less expensive M9 with no losses over the M9</td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left"><a href="https://en.wikipedia.org/wiki/Leica_M_(camera)">M (typ 240)</a></td>
<td class="org-left">$6,400</td>
<td class="org-right">24</td>
<td class="org-right">2012</td>
<td class="org-left">video</td>
<td class="org-left">brass</td>
</tr>
<tr>
<td class="org-left">M-P (typ 240)</td>
<td class="org-left">$7,000</td>
<td class="org-right">24</td>
<td class="org-right">2014</td>
<td class="org-left">video, more black and a bigger buffer than the M (typ 240)</td>
<td class="org-left">brass</td>
</tr>
<tr>
<td class="org-left">M-Mono (typ 246)</td>
<td class="org-left">$7,500</td>
<td class="org-right">24</td>
<td class="org-right">2015</td>
<td class="org-left">black-and-white only</td>
<td class="org-left"> </td>
</tr>
<tr>
<td class="org-left"><a href="https://en.wikipedia.org/wiki/Leica_M_(Typ_262)">M (typ 262)</a></td>
<td class="org-left">$5,200</td>
<td class="org-right">24</td>
<td class="org-right">2015</td>
<td class="org-left">no video</td>
<td class="org-left">aluminum</td>
</tr>
<tr>
<td class="org-left">M-D (typ 262)</td>
<td class="org-left">$6,000</td>
<td class="org-right">24</td>
<td class="org-right">2016</td>
<td class="org-left">A digital camera with no screen</td>
<td class="org-left"> </td>
</tr>
</tbody>
</table>
<p>
It seems to me that the M Typ 262 is the best option of that list, at
least for me. The <a href="http://www.theguardian.com/artanddesign/photography-blog/2012/dec/28/leica-m-e-rangefinder-photography">M-E</a> is probably just as effective and fun, but since
the savings of $700 would barely buy a suitable lens, it might not be
worth it<sup><a id="fnr.6" class="footref" href="#fn.6">6</a></sup>, plus everyone knows that megapixels are what define
good photos. I think it was Cartier-Bresson who said “Megapixels plus
Aperture Priority equal Art”, or something like that (it was in
French and I can’t be sure of the translation). As for video, I have
an iPhone for video, and it probably does a better job than either of
the <i>typ 240</i> cameras does in my hands.
</p>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup> <div class="footpara"><p class="footpara">
Modernity here defined as sometime after the 1940s. As if your
megapixels are really that cool… because Ken Rockwell asserts they
aren’t, and he’s got enough self-confidence for himself, you, me, and
a goodly fraction of the Internet, so there’s that.
</p></div></div>
<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup> <div class="footpara"><p class="footpara">
<a href="http://www.imdb.com/character/ch0091166/quotes">http://www.imdb.com/character/ch0091166/quotes</a>
</p></div></div>
<div class="footdef"><sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup> <div class="footpara"><p class="footpara">
<a href="http://www.imdb.com/title/tt0091042/quotes">http://www.imdb.com/title/tt0091042/quotes</a>
</p></div></div>
<div class="footdef"><sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup> <div class="footpara"><p class="footpara">
This page could change at any time; if you are really concerned
about the history of it, you’ll have to follow the changes via <code>git</code>
at <a href="https://github.com/acaird/acaird.github.io/tree/master/_posts">github</a>.
</p></div></div>
<div class="footdef"><sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup> <div class="footpara"><p class="footpara">
Prices change over time. Plus, eBay. So these are basically
meaningless. Sorry. <a href="http://www.google.com/">Google</a> some stuff, you’re a grown-up looking at
multi-thousand-dollar cameras for goodness’ sake.
</p></div></div>
<div class="footdef"><sup><a id="fn.6" class="footnum" href="#fnr.6">6</a></sup> <div class="footpara"><p class="footpara">
However, I’m open to advice.
</p></div></div>
</div>
</div>
A Simple GUI and Command-line Python Program2016-02-07T00:00:00+00:00http://acaird.github.io/2016/02/07/simple-python-gui<p>
I needed to make a simple GUI for translating comma-separated value
input into a reStructuredText table, and ended up writing a simple
Python program that might be a useful example for you of Tkinter,
<code>tkFileDialog</code>, and a combination command line and GUI program.
</p>
<div id="outline-container-orgheadline1" class="outline-2">
<h2 id="orgheadline1">What is this for?</h2>
<div class="outline-text-2" id="text-orgheadline1">
<p>
I needed a simple program to convert CSV files into <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a>
tables for a group of people who write in RST and don't want to be
bothered to create <a href="http://docutils.sourceforge.net/docs/user/rst/quickref.html#tables">RST tables</a> by hand (which really is a pain unless
you’re using Emacs and its <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Text-Based-Tables.html">text-based tables</a> package).
</p>
<p>
I started with the command-line version to get the functionality
then added the GUI elements.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">import</span> argparse
<span style="color: #a020f0;">import</span> csv
<span style="color: #a020f0;">from</span> tabulate <span style="color: #a020f0;">import</span> tabulate
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">write_table</span>(outputfile, table_contents):
    <span style="color: #8b2252;">""" Write out the .rst file with the table in it</span>
<span style="color: #8b2252;">    """</span>
    <span style="color: #a020f0;">with</span> <span style="color: #483d8b;">open</span>(outputfile, <span style="color: #8b2252;">"wb"</span>) <span style="color: #a020f0;">as</span> output_file:
        <span style="color: #a020f0;">try</span>:
            output_file.write(tabulate(table_contents,
                                       tablefmt=<span style="color: #8b2252;">"grid"</span>,
                                       headers=<span style="color: #8b2252;">"firstrow"</span>))
        <span style="color: #a020f0;">except</span>:
            <span style="color: #a020f0;">return</span> <span style="color: #008b8b;">False</span>
    <span style="color: #a020f0;">return</span> <span style="color: #008b8b;">True</span>
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">command_line</span>(args):
    <span style="color: #8b2252;">""" Run the command-line version</span>
<span style="color: #8b2252;">    """</span>
    <span style="color: #a020f0;">if</span> args.output <span style="color: #a020f0;">is</span> <span style="color: #008b8b;">None</span>:
        <span style="color: #a0522d;">args.output</span> = get_output_filename(args.<span style="color: #483d8b;">input</span>)
    <span style="color: #a0522d;">table_contents</span> = read_csv(args.<span style="color: #483d8b;">input</span>)
    <span style="color: #a020f0;">if</span> write_table(args.output, table_contents):
        <span style="color: #a020f0;">print</span> <span style="color: #8b2252;">"rst table is in file `{}'"</span>.<span style="color: #483d8b;">format</span>(args.output)
    <span style="color: #a020f0;">else</span>:
        <span style="color: #a020f0;">print</span> <span style="color: #8b2252;">"Writing file `{}' did not succeed."</span>.<span style="color: #483d8b;">format</span>(args.output)
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">read_csv</span>(filename):
    <span style="color: #8b2252;">""" Read the CSV file</span>
<span style="color: #8b2252;">    This fails pretty silently on any exception at all</span>
<span style="color: #8b2252;">    """</span>
    <span style="color: #a020f0;">try</span>:
        <span style="color: #a020f0;">with</span> <span style="color: #483d8b;">open</span>(filename, <span style="color: #8b2252;">'rb'</span>) <span style="color: #a020f0;">as</span> csvfile:
            <span style="color: #a0522d;">dialect</span> = csv.Sniffer().sniff(csvfile.read(1024))
            csvfile.seek(0)
            <span style="color: #a0522d;">reader</span> = csv.reader(csvfile, dialect)
            <span style="color: #a0522d;">r</span> = []
            <span style="color: #a020f0;">for</span> row <span style="color: #a020f0;">in</span> reader:
                r.append(row)
    <span style="color: #a020f0;">except</span>:
        <span style="color: #a020f0;">return</span> <span style="color: #008b8b;">None</span>
    <span style="color: #a020f0;">return</span> r
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">get_parser</span>():
    <span style="color: #8b2252;">""" The argument parser of the command-line version """</span>
    <span style="color: #a0522d;">parser</span> = argparse.ArgumentParser(description=(<span style="color: #8b2252;">'convert csv to rst table'</span>))
    parser.add_argument(<span style="color: #8b2252;">'--input'</span>, <span style="color: #8b2252;">'-F'</span>,
                        <span style="color: #483d8b;">help</span>=<span style="color: #8b2252;">'name of the input file'</span>)
    parser.add_argument(<span style="color: #8b2252;">'--output'</span>, <span style="color: #8b2252;">'-O'</span>,
                        <span style="color: #483d8b;">help</span>=(<span style="color: #8b2252;">"name of the output file; "</span> +
                              <span style="color: #8b2252;">"defaults to &lt;inputfilename&gt;.rst"</span>))
    <span style="color: #a020f0;">return</span> parser
<span style="color: #a020f0;">if</span> <span style="color: #483d8b;">__name__</span> == <span style="color: #8b2252;">"__main__"</span>:
    <span style="color: #8b2252;">""" Run as a stand-alone script """</span>
    <span style="color: #a0522d;">parser</span> = get_parser() <span style="color: #b22222;"># </span><span style="color: #b22222;">Start the command-line argument parsing</span>
    <span style="color: #a0522d;">args</span> = parser.parse_args() <span style="color: #b22222;"># </span><span style="color: #b22222;">Read the command-line arguments</span>
    <span style="color: #a020f0;">if</span> args.<span style="color: #483d8b;">input</span>: <span style="color: #b22222;"># </span><span style="color: #b22222;">If there is an argument,</span>
        command_line(args) <span style="color: #b22222;"># </span><span style="color: #b22222;">run the command-line version</span>
    <span style="color: #a020f0;">else</span>:
        gui() <span style="color: #b22222;"># </span><span style="color: #b22222;">otherwise run the GUI version</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-orgheadline2" class="outline-2">
<h2 id="orgheadline2">Tkinter</h2>
<div class="outline-text-2" id="text-orgheadline2">
<p>
There are many resources for Python’s Tk integration library
(<a href="https://wiki.python.org/moin/TkInter">Tkinter</a>) on the Internet, but the basics are:
</p>
<ul class="org-ul">
<li>create the widgets (buttons, labels, text entry fields, etc.) you
want, and use the <a href="http://effbot.org/tkinterbook/pack.htm">pack()</a> function to get them arranged.</li>
<li>add functions that are called when the buttons are pressed; these
are called <b>callback functions</b> because, once the loop in the
next step is running, your code only runs when the loop takes a
brief detour to call the function associated with a widget
(typically a button)</li>
<li>call the function <code>mainloop()</code></li>
</ul>
<blockquote>
<p>
As a side note, this is how most graphical user interfaces work: the
equivalent in Microsoft Windows is <code>GetMessage()</code>, in Mac OS X it is
<code>CFRunLoopRun()</code>, and in Android apps it is <code>android.os.Looper</code>.
</p>
</blockquote>
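<p>
To make the idea of a loop that detours into callbacks concrete, here
is a toy event loop in plain Python. It is only a sketch of the general
shape, not how Tk is implemented; the <code>ToyEventLoop</code> class and
its <code>bind</code> and <code>post</code> methods are made up for this
illustration.
</p>

```python
# A toy event loop: NOT how Tk is implemented, just the general shape.
from collections import deque


class ToyEventLoop:
    def __init__(self):
        self.callbacks = {}    # widget name -> callback function
        self.events = deque()  # pending events, oldest first

    def bind(self, widget_name, callback):
        """Register the function to call when widget_name is 'pressed'."""
        self.callbacks[widget_name] = callback

    def post(self, widget_name):
        """Queue an event, as the window system would on a button press."""
        self.events.append(widget_name)

    def mainloop(self):
        """Dispatch queued events; stop when the queue is empty.

        A real GUI loop blocks and waits for the next event instead.
        """
        while self.events:
            widget_name = self.events.popleft()
            callback = self.callbacks.get(widget_name)
            if callback is not None:
                callback()  # the brief detour out of the loop


pressed = []
loop = ToyEventLoop()
loop.bind("go", lambda: pressed.append("go pressed"))
loop.post("go")
loop.post("unbound-widget")  # no callback registered, so it is ignored
loop.mainloop()
print(pressed)
```

<p>
The only code of yours that runs after <code>mainloop()</code> starts is
whatever was registered as a callback, which is the same contract
Tkinter offers through the <code>command=</code> option of a button.
</p>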
<p>
With a text entry field and a <b>Go</b> button to process the file, this
little program could be considered complete.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">def</span> <span style="color: #0000ff;">gui</span>():
<span style="color: #8b2252;">"""make the GUI version of this command that is run if no options are</span>
<span style="color: #8b2252;"> provided on the command line"""</span>
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">button_go_callback</span>():
<span style="color: #8b2252;">""" what to do when the "Go" button is pressed """</span>
<span style="color: #a0522d;">input_file</span> = entry.get()
<span style="color: #a020f0;">if</span> input_file.rsplit(<span style="color: #8b2252;">"."</span>)[-1] != <span style="color: #8b2252;">"csv"</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Filename must end in `.csv'"</span>)
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a020f0;">return</span>
<span style="color: #a020f0;">else</span>:
<span style="color: #a0522d;">table_contents</span> = read_csv(input_file)
<span style="color: #a020f0;">if</span> table_contents <span style="color: #a020f0;">is</span> <span style="color: #008b8b;">None</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Error reading file `{}'"</span>.<span style="color: #483d8b;">format</span>(input_file))
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a020f0;">return</span>
<span style="color: #a0522d;">output_file</span> = get_output_filename(input_file)
<span style="color: #a020f0;">if</span> write_table(output_file, table_contents):
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Output is in {}"</span>.<span style="color: #483d8b;">format</span>(output_file))
message.configure(fg=<span style="color: #8b2252;">"black"</span>)
<span style="color: #a020f0;">else</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Writing file "</span>
<span style="color: #8b2252;">"`{}' did not succeed"</span>.<span style="color: #483d8b;">format</span>(output_file))
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a0522d;">root</span> = Tk()
<span style="color: #a0522d;">frame</span> = Frame(root)
frame.pack()
<span style="color: #a0522d;">statusText</span> = StringVar(root)
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Enter CSV filename, "</span>
<span style="color: #8b2252;">"then press the Go button"</span>)
<span style="color: #a0522d;">label</span> = Label(root, text=<span style="color: #8b2252;">"CSV file: "</span>)
label.pack()
<span style="color: #a0522d;">entry</span> = Entry(root, width=50)
entry.pack()
<span style="color: #a0522d;">button_go</span> = Button(root,
text=<span style="color: #8b2252;">"Go"</span>,
command=button_go_callback)
button_go.pack()
<span style="color: #a0522d;">message</span> = Label(root, textvariable=statusText)
message.pack()
mainloop()
</pre>
</div>
<p>
This code has two parts:
</p>
<ol class="org-ol">
<li>The main function that creates a text entry field and a <b>Go</b>
button then calls <code>mainloop()</code></li>
<li>A sub-function (or nested function or inner function) that
calls the same functions as the command-line version of the
program and updates the status line in the GUI as appropriate.</li>
</ol>
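<p>
One way to keep the nested callback small is to move its decisions into
a plain function that touches no widgets, so it can be tried without
opening a window. The sketch below assumes a hypothetical helper named
<code>check_csv_name</code>, which is not part of the program above. It
uses <code>os.path.splitext</code> rather than <code>rsplit</code>, so a
bare name such as <code>csv</code> with no dot is rejected.
</p>

```python
import os.path


def check_csv_name(input_file):
    """Return (ok, message, output_file) for a candidate input file name.

    Hypothetical helper: it mirrors the suffix check and the output-name
    rule used by the Go callback, but does no file or widget work.
    """
    root, ext = os.path.splitext(input_file)
    if ext != ".csv":
        return False, "Filename must end in `.csv'", None
    output_file = root + ".rst"
    return True, "Output will be in {}".format(output_file), output_file


print(check_csv_name("/tmp/data.csv"))
print(check_csv_name("notes.txt"))
```

<p>
The callback would then only need to call this function, set
<code>statusText</code> to the returned message, and pick red or black
for the label.
</p>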
<p>
Using this, however, means you have to know the path to the CSV file
and type it into the text entry box. That is not how most modern
applications work; usually there is a file browser…
</p>
</div>
<div id="outline-container-orgheadline3" class="outline-3">
<h3 id="orgheadline3">tkFileDialog</h3>
<div class="outline-text-3" id="text-orgheadline3">
<p>
The <code>tkFileDialog</code> module presents an OS-native file browsing and selection
dialog. This script takes the selected file name and populates the
text entry box for the CSV file with the full path to the selected
file. The nicest part of this is that it only takes a few lines of
Python to do this.
</p>
<p>
We add a <b>Browse</b> button and another nested function to be its
callback function. The callback function simply lets
<code>tkFileDialog.askopenfilename()</code> give us the name of the file the
user selected from the file browser and then fills in the entry
field (cleverly named <code>entry</code> in our program) with the full path
and file name.
</p>
<p>
When the <b>Browse</b> button is pressed, the <code>button_browse_callback</code>
function is called because the button was created with:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a0522d;">button_browse</span> = Button(root,
text=<span style="color: #8b2252;">"Browse"</span>,
command=button_browse_callback)
</pre>
</div>
<p>
and the file name entry field was created with:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a0522d;">entry</span> = Entry(root, width=50)
</pre>
</div>
<p>
then the filename comes from the <code>askopenfilename</code> function in
<code>tkFileDialog</code> and is used to populate the text entry field.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">def</span> <span style="color: #0000ff;">button_browse_callback</span>():
<span style="color: #8b2252;">""" What to do when the Browse button is pressed """</span>
<span style="color: #a0522d;">filename</span> = tkFileDialog.askopenfilename()
entry.delete(0, END)
entry.insert(0, filename)
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-orgheadline4" class="outline-2">
<h2 id="orgheadline4">What do we have now?</h2>
<div class="outline-text-2" id="text-orgheadline4">
<p>
Now we have a pretty simple Python program that:
</p>
<ul class="org-ul">
<li><p>
can be run from the command line using standard command line
options and offering a help menu when the command line option is
<code>--help</code>
</p>
<pre class="example">
$ ./makersttable.py --help
usage: makersttable.py [-h] [--input INPUT] [--output OUTPUT]
convert csv to rst table
optional arguments:
-h, --help show this help message and exit
--input INPUT, -F INPUT
name of the intput file
--output OUTPUT, -O OUTPUT
name of the output file; defaults to
<inputfilename>.rst
</pre></li>
<li>in the absence of command line options, a graphical application
is started that allows the user to type in a file name or select
one from a file browser, then click <b>Go</b></li>
<li>either option results in the creation of a file with a
reStructuredText table in it based on the contents of a file with
comma-separated values (CSV) in it</li>
</ul>
<p>
With a simple input file formatted like this:
</p>
<pre class="example">
Header Col 1, Header Col 2, Header Col3
This is the first value, This is the second value, This is the third value
Red, Blue, Green
42, 10, 1
</pre>
<p>
the Python program can be run from the command line like:
</p>
<pre class="example">
$ ./makersttable.py -F test.csv
rst table is in file `test.rst'
</pre>
<p>
and the resulting <code>test.rst</code> file looks like:
</p>
<pre class="example">
+-------------------------+--------------------------+-------------------------+
| Header Col 1 | Header Col 2 | Header Col3 |
+=========================+==========================+=========================+
| This is the first value | This is the second value | This is the third value |
+-------------------------+--------------------------+-------------------------+
| Red | Blue | Green |
+-------------------------+--------------------------+-------------------------+
| 42 | 10 | 1 |
+-------------------------+--------------------------+-------------------------+
</pre>
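<p>
The column widths in that table follow directly from the parsed rows.
As a rough sketch in Python 3, using only the standard library, the
sample input can be parsed and measured as shown here. The program above
lets <code>csv.Sniffer</code> work out the dialect; this sketch instead
assumes <code>skipinitialspace=True</code> is the setting that matters
for this particular sample.
</p>

```python
import csv
import io

SAMPLE = """Header Col 1, Header Col 2, Header Col3
This is the first value, This is the second value, This is the third value
Red, Blue, Green
42, 10, 1
"""

# skipinitialspace drops the blank that follows each comma in the sample
rows = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))
headers, body = rows[0], rows[1:]

# Width of each column: the longest cell found in that column
widths = [max(len(cell) for cell in column) for column in zip(*rows)]

print(headers)
print(widths)
```

<p>
Each width plus one space of padding on either side gives the number of
dashes between the <code>+</code> signs in the grid.
</p>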
<p>
Or, the Python program can be run like:
</p>
<pre class="example">
$ ./makersttable.py
</pre>
<p>
and you’ll get a graphical interface that looks like:
</p>
<div class="figure">
<p><img src="/assets/makersttable-1.png" alt="makersttable-1.png" />
</p>
<p><span class="figure-number">Figure 1:</span> Initial GUI Screen</p>
</div>
<p>
Pressing the <b>Browse</b> button will present a file dialog:
</p>
<div class="figure">
<p><img src="/assets/makersttable-2.png" alt="makersttable-2.png" />
</p>
<p><span class="figure-number">Figure 2:</span> The file browsing dialog box</p>
</div>
<p>
Selecting a file will populate the entry field:
</p>
<div class="figure">
<p><img src="/assets/makersttable-3.png" alt="makersttable-3.png" />
</p>
<p><span class="figure-number">Figure 3:</span> The selected filename shown in the entry field</p>
</div>
<p>
And pressing the <b>Go</b> button converts the file; the path to the
resulting file is shown in the status message area of the GUI:
</p>
<div class="figure">
<p><img src="/assets/makersttable-4.png" alt="makersttable-4.png" />
</p>
<p><span class="figure-number">Figure 4:</span> The path to the rST file is shown in the status area</p>
</div>
</div>
</div>
<div id="outline-container-orgheadline5" class="outline-2">
<h2 id="orgheadline5">The Whole Python Program</h2>
<div class="outline-text-2" id="text-orgheadline5">
<div class="org-src-container">
<pre class="src src-python"><span style="color: #b22222;">#</span><span style="color: #b22222;">!/usr/bin/env python</span>
<span style="color: #8b2252;">"""Convert CSV to reStructuredText tables</span>
<span style="color: #8b2252;">A command-line and PythonTk GUI program to do a simple conversion from</span>
<span style="color: #8b2252;">CSV files to reStructuredText tables</span>
<span style="color: #8b2252;">A. Caird (acaird@gmail.com)</span>
<span style="color: #8b2252;">2016</span>
<span style="color: #8b2252;">"""</span>
<span style="color: #a020f0;">import</span> argparse
<span style="color: #a020f0;">import</span> csv
<span style="color: #a020f0;">import</span> sys
<span style="color: #a020f0;">from</span> tabulate <span style="color: #a020f0;">import</span> tabulate
<span style="color: #a020f0;">import</span> tkFileDialog
<span style="color: #a020f0;">from</span> Tkinter <span style="color: #a020f0;">import</span> *
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">get_output_filename</span>(input_file_name):
<span style="color: #8b2252;">""" replace the suffix of the file with .rst """</span>
<span style="color: #a020f0;">return</span> input_file_name.rpartition(<span style="color: #8b2252;">"."</span>)[0] + <span style="color: #8b2252;">".rst"</span>
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">gui</span>():
<span style="color: #8b2252;">"""make the GUI version of this command that is run if no options are</span>
<span style="color: #8b2252;"> provided on the command line"""</span>
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">button_go_callback</span>():
<span style="color: #8b2252;">""" what to do when the "Go" button is pressed """</span>
<span style="color: #a0522d;">input_file</span> = entry.get()
<span style="color: #a020f0;">if</span> input_file.rsplit(<span style="color: #8b2252;">"."</span>)[-1] != <span style="color: #8b2252;">"csv"</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Filename must end in `.csv'"</span>)
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a020f0;">return</span>
<span style="color: #a020f0;">else</span>:
<span style="color: #a0522d;">table_contents</span> = read_csv(input_file)
<span style="color: #a020f0;">if</span> table_contents <span style="color: #a020f0;">is</span> <span style="color: #008b8b;">None</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Error reading file `{}'"</span>.<span style="color: #483d8b;">format</span>(input_file))
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a020f0;">return</span>
<span style="color: #a0522d;">output_file</span> = get_output_filename(input_file)
<span style="color: #a020f0;">if</span> write_table(output_file, table_contents):
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Output is in {}"</span>.<span style="color: #483d8b;">format</span>(output_file))
message.configure(fg=<span style="color: #8b2252;">"black"</span>)
<span style="color: #a020f0;">else</span>:
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Writing file "</span>
<span style="color: #8b2252;">"`{}' did not succeed"</span>.<span style="color: #483d8b;">format</span>(output_file))
message.configure(fg=<span style="color: #8b2252;">"red"</span>)
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">button_browse_callback</span>():
<span style="color: #8b2252;">""" What to do when the Browse button is pressed """</span>
<span style="color: #a0522d;">filename</span> = tkFileDialog.askopenfilename()
entry.delete(0, END)
entry.insert(0, filename)
<span style="color: #a0522d;">root</span> = Tk()
<span style="color: #a0522d;">frame</span> = Frame(root)
frame.pack()
<span style="color: #a0522d;">statusText</span> = StringVar(root)
statusText.<span style="color: #483d8b;">set</span>(<span style="color: #8b2252;">"Press Browse button or enter CSV filename, "</span>
<span style="color: #8b2252;">"then press the Go button"</span>)
<span style="color: #a0522d;">label</span> = Label(root, text=<span style="color: #8b2252;">"CSV file: "</span>)
label.pack()
<span style="color: #a0522d;">entry</span> = Entry(root, width=50)
entry.pack()
<span style="color: #a0522d;">separator</span> = Frame(root, height=2, bd=1, relief=SUNKEN)
separator.pack(fill=X, padx=5, pady=5)
<span style="color: #a0522d;">button_go</span> = Button(root,
text=<span style="color: #8b2252;">"Go"</span>,
command=button_go_callback)
<span style="color: #a0522d;">button_browse</span> = Button(root,
text=<span style="color: #8b2252;">"Browse"</span>,
command=button_browse_callback)
<span style="color: #a0522d;">button_exit</span> = Button(root,
text=<span style="color: #8b2252;">"Exit"</span>,
command=sys.<span style="color: #008b8b;">exit</span>)
button_go.pack()
button_browse.pack()
button_exit.pack()
<span style="color: #a0522d;">separator</span> = Frame(root, height=2, bd=1, relief=SUNKEN)
separator.pack(fill=X, padx=5, pady=5)
<span style="color: #a0522d;">message</span> = Label(root, textvariable=statusText)
message.pack()
mainloop()
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">write_table</span>(outputfile, table_contents):
<span style="color: #8b2252;">""" Write out the .rst file with the table in it</span>
<span style="color: #8b2252;"> """</span>
<span style="color: #a020f0;">with</span> <span style="color: #483d8b;">open</span>(outputfile, <span style="color: #8b2252;">"wb"</span>) <span style="color: #a020f0;">as</span> output_file:
<span style="color: #a020f0;">try</span>:
output_file.write(tabulate(table_contents,
tablefmt=<span style="color: #8b2252;">"grid"</span>,
headers=<span style="color: #8b2252;">"firstrow"</span>))
<span style="color: #a020f0;">except</span>:
<span style="color: #a020f0;">return</span> <span style="color: #008b8b;">False</span>
<span style="color: #a020f0;">return</span> <span style="color: #008b8b;">True</span>
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">command_line</span>(args):
<span style="color: #8b2252;">""" Run the command-line version</span>
<span style="color: #8b2252;"> """</span>
<span style="color: #a020f0;">if</span> args.output <span style="color: #a020f0;">is</span> <span style="color: #008b8b;">None</span>:
<span style="color: #a0522d;">args.output</span> = get_output_filename(args.<span style="color: #483d8b;">input</span>)
<span style="color: #a0522d;">table_contents</span> = read_csv(args.<span style="color: #483d8b;">input</span>)
<span style="color: #a020f0;">if</span> write_table(args.output, table_contents):
<span style="color: #a020f0;">print</span> <span style="color: #8b2252;">"rst table is in file `{}'"</span>.<span style="color: #483d8b;">format</span>(args.output)
<span style="color: #a020f0;">else</span>:
<span style="color: #a020f0;">print</span> <span style="color: #8b2252;">"Writing file `{}' did not succeed."</span>.<span style="color: #483d8b;">format</span>(args.output)
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">read_csv</span>(filename):
<span style="color: #8b2252;">""" Read the CSV file</span>
<span style="color: #8b2252;"> This fails pretty silently on any exception at all</span>
<span style="color: #8b2252;"> """</span>
<span style="color: #a020f0;">try</span>:
<span style="color: #a020f0;">with</span> <span style="color: #483d8b;">open</span>(filename, <span style="color: #8b2252;">'rb'</span>) <span style="color: #a020f0;">as</span> csvfile:
<span style="color: #a0522d;">dialect</span> = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
<span style="color: #a0522d;">reader</span> = csv.reader(csvfile, dialect)
<span style="color: #a0522d;">r</span> = []
<span style="color: #a020f0;">for</span> row <span style="color: #a020f0;">in</span> reader:
r.append(row)
<span style="color: #a020f0;">except</span>:
<span style="color: #a020f0;">return</span> <span style="color: #008b8b;">None</span>
<span style="color: #a020f0;">return</span> r
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">get_parser</span>():
<span style="color: #8b2252;">""" The argument parser of the command-line version """</span>
<span style="color: #a0522d;">parser</span> = argparse.ArgumentParser(description=(<span style="color: #8b2252;">'convert csv to rst table'</span>))
parser.add_argument(<span style="color: #8b2252;">'--input'</span>, <span style="color: #8b2252;">'-F'</span>,
<span style="color: #483d8b;">help</span>=<span style="color: #8b2252;">'name of the intput file'</span>)
parser.add_argument(<span style="color: #8b2252;">'--output'</span>, <span style="color: #8b2252;">'-O'</span>,
<span style="color: #483d8b;">help</span>=(<span style="color: #8b2252;">"name of the output file; "</span> +
<span style="color: #8b2252;">"defaults to <inputfilename>.rst"</span>))
<span style="color: #a020f0;">return</span> parser
<span style="color: #a020f0;">if</span> <span style="color: #483d8b;">__name__</span> == <span style="color: #8b2252;">"__main__"</span>:
<span style="color: #8b2252;">""" Run as a stand-alone script """</span>
<span style="color: #a0522d;">parser</span> = get_parser() <span style="color: #b22222;"># </span><span style="color: #b22222;">Start the command-line argument parsing</span>
<span style="color: #a0522d;">args</span> = parser.parse_args() <span style="color: #b22222;"># </span><span style="color: #b22222;">Read the command-line arguments</span>
<span style="color: #a020f0;">if</span> args.<span style="color: #483d8b;">input</span>: <span style="color: #b22222;"># </span><span style="color: #b22222;">If there is an argument,</span>
command_line(args) <span style="color: #b22222;"># </span><span style="color: #b22222;">run the command-line version</span>
<span style="color: #a020f0;">else</span>:
gui() <span style="color: #b22222;"># </span><span style="color: #b22222;">otherwise run the GUI version</span>
</pre>
</div>
</div>
</div>
Plotting Data in Org-Mode Tables with Python2015-09-04T00:00:00+00:00http://acaird.github.io/2015/09/04/plots-from-org-mode-tables<p>
Among the huge number of great things about Emacs’ Org-Mode is its
table editing and spreadsheet capabilities. Combined with OrgBabel
and Python, Org-Mode has a light-weight but capable data visualization
tool.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Org-Mode Tables and Spreadsheets</h2>
<div class="outline-text-2" id="text-1">
<p>
The <a href="http://orgmode.org/worg/org-tutorials/tables.html">table editing feature</a> of <a href="http://orgmode.org/">Org-Mode</a> is an amazing feat of
text-based formatting and managing, making tables in a text editor
not only possible, but, in many cases, better than the table editing
and display capabilities of graphical editors (I’m looking at you,
Microsoft Word and Google Docs).
</p>
<p>
In addition to the table editing, Org-Mode has a <a href="http://orgmode.org/worg/org-tutorials/org-spreadsheet-intro.html">simple spreadsheet</a>
built in, so taking sums or averages of columns, or applying
mathematical operations to two columns and putting the results in a
third is straightforward.
</p>
<p>
For more complicated spreadsheet-like operations, Org-Mode tables
can easily become data frames in R or lists (or lists of lists) in
Python (or a data structure in other languages) and those data
structures can be turned into tables in Org-Mode for export or
publication. This is the key to making plots.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Plottable data from table to image</h2>
<div class="outline-text-2" id="text-2">
<p>
Plotting in <a href="https://www.python.org/">Python</a> isn’t always straightforward, but <a href="http://matplotlib.org/">matplotlib</a>’s
<a href="http://matplotlib.org/api/pyplot_api.html">pyplot</a> makes it workable, so you’ll need Python and matplotlib
installed in addition to Emacs and Org-Mode. If you’re reading
this, that last sentence might have been insulting: I’m sorry.
</p>
<p>
For example, suppose you have a table in Org-Mode that has four columns of
numbers:
</p>
<table id="mySweetTable" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="right" />
<col class="right" />
<col class="right" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="right">One</th>
<th scope="col" class="right">Two</th>
<th scope="col" class="right">Three</th>
<th scope="col" class="right">Four</th>
</tr>
</thead>
<tbody>
<tr>
<td class="right">1</td>
<td class="right">2</td>
<td class="right">3</td>
<td class="right">4</td>
</tr>
<tr>
<td class="right">5</td>
<td class="right">6</td>
<td class="right">7</td>
<td class="right">8</td>
</tr>
<tr>
<td class="right">9</td>
<td class="right">10</td>
<td class="right">11</td>
<td class="right">12</td>
</tr>
<tr>
<td class="right">13</td>
<td class="right">14</td>
<td class="right">15</td>
<td class="right">16</td>
</tr>
<tr>
<td class="right">17</td>
<td class="right">18</td>
<td class="right">19</td>
<td class="right">20</td>
</tr>
<tr>
<td class="right">21</td>
<td class="right">22</td>
<td class="right">23</td>
<td class="right">24</td>
</tr>
<tr>
<td class="right">25</td>
<td class="right">26</td>
<td class="right">27</td>
<td class="right">28</td>
</tr>
<tr>
<td class="right">29</td>
<td class="right">30</td>
<td class="right">31</td>
<td class="right">32</td>
</tr>
<tr>
<td class="right">33</td>
<td class="right">34</td>
<td class="right">35</td>
<td class="right">36</td>
</tr>
</tbody>
</table>
<p>
First, you have to give it a name, putting the line (that won’t be
exported) <code>#+NAME: mySweetTable</code> immediately before the table. This
will allow you to refer to the table data in the Babel script that
will plot (or otherwise manipulate) the data.
</p>
<p>
The key to the plot is the very simple Python script:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">import</span> matplotlib.pyplot <span style="color: #a020f0;">as</span> plt
<span style="color: #8b2252;">'''If you have formatting lines on your table</span>
<span style="color: #8b2252;">(http://orgmode.org/manual/Column-groups.html) you need to remove them</span>
<span style="color: #8b2252;">"by hand" with a line like:</span>
<span style="color: #8b2252;">data = data[2:]</span>
<span style="color: #8b2252;">'''</span>
<span style="color: #8b2252;">'''Turn the table data into x and y data'''</span>
<span style="color: #a0522d;">x</span> = [a[0] <span style="color: #a020f0;">for</span> a <span style="color: #a020f0;">in</span> data]
<span style="color: #a0522d;">y1</span> = [a[1] <span style="color: #a020f0;">for</span> a <span style="color: #a020f0;">in</span> data]
<span style="color: #a0522d;">y2</span> = [a[2] <span style="color: #a020f0;">for</span> a <span style="color: #a020f0;">in</span> data]
<span style="color: #a0522d;">y3</span> = [a[3] <span style="color: #a020f0;">for</span> a <span style="color: #a020f0;">in</span> data]
<span style="color: #8b2252;">''' Plot the x and y data'''</span>
a, = plt.plot(x, y1, label=<span style="color: #8b2252;">"y1"</span>, marker=<span style="color: #8b2252;">'v'</span>)
b, = plt.plot(x, y2, label=<span style="color: #8b2252;">"y2"</span>, marker=<span style="color: #8b2252;">'o'</span>)
c, = plt.plot(x, y3, label=<span style="color: #8b2252;">"y3"</span>, marker=<span style="color: #8b2252;">'x'</span>)
<span style="color: #8b2252;">''' Set the x and y labels on the graph '''</span>
plt.xlabel(<span style="color: #8b2252;">"x axis label"</span>)
plt.ylabel(<span style="color: #8b2252;">"y axis label"</span>)
<span style="color: #8b2252;">''' Create the legend '''</span>
plt.legend(handles=[a,b,c],loc=<span style="color: #8b2252;">"upper left"</span>)
<span style="color: #8b2252;">''' Save the PNG file '''</span>
<span style="color: #a0522d;">filename</span> = <span style="color: #8b2252;">"mySweetPlot.png"</span>
plt.savefig(filename)
<span style="color: #8b2252;">''' Return the PNG file path to OrgMode '''</span>
<span style="color: #a020f0;">return</span>(filename)
</pre>
</div>
<div class="figure">
<p><img src="/assets/mySweetPlot.png" alt="mySweetPlot.png" />
</p>
</div>
<p>
The <code>#+BEGIN_SRC</code> line that starts the OrgBabel section for the
script looks like:
</p>
<pre class="example">
#+BEGIN_SRC python :results file :exports results :var data=mySweetTable
</pre>
<p>
The data is imported into the Python script by referring to the
table name (<code>mySweetTable</code>) in the <code>#+BEGIN_SRC</code> line’s <code>:var</code>
option as the variable that will be in the Python script (in this
case, the cleverly named <code>data</code> variable). And, as you can see, the
columns in the table show up in the Python script as <code>a[0]</code> through
<code>a[3]</code>. If you have more than one table, you can have more than one
<code>:var pythonVar=tableName</code> option on the <code>#+BEGIN_SRC</code> line.
</p>
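<p>
To see what the script is working with, here is a small sketch of the
shape of <code>data</code>: a list with one inner list per table row.
The values are the first three rows of <code>mySweetTable</code>, typed
in by hand on the assumption that the header row has already been
stripped. Transposing with <code>zip</code> is just an alternative to
the four list comprehensions, not what the script above does.
</p>

```python
# Assumed shape of the variable Org-Mode passes in: one list per table row.
data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]

# Same result as the four list comprehensions: turn rows into columns
x, y1, y2, y3 = (list(column) for column in zip(*data))

print(x)
print(y1)
print(y3)
```

<p>
With a different number of columns the unpacking on the left-hand side
would need to change, which is why the original script names each
column explicitly.
</p>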
<p>
Most of the Python is plotting code. When you type <code>C-c C-c</code> with
the cursor (point) in the Python code, Org-Mode will run it and put
a link to the image into the document. Move the cursor (point) onto the
link and press <code>C-c C-o</code>, and the image will open in either Emacs or the
program you have defined to open that file type. If you prefer
another file type, simply change the extension on the file name and
pyplot will save that file type; for details on the available types,
see the pyplot documentation.
</p>
<p>
While matplotlib can make publication quality plots, these are not
those; these plots are simply to quickly see relationships between
the data in your tables. For higher quality plots, see
<a href="https://github.com/olgabot/prettyplotlib">prettyplotlib</a>, <a href="https://github.com/jbmouret/matplotlib_for_papers">matplotlib for papers</a>, or any of the many other
<a href="http://bfy.tw/1gGD">resources on the Internet</a> on this topic.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">15 lines</h2>
<div class="outline-text-2" id="text-3">
<p>
The 15 lines of Python code (not counting comments) will let you
make plots of the data in a 4-column Org-Mode table. Using
Org-Mode’s <a href="http://orgmode.org/manual/Noweb-reference-syntax.html">noweb</a> options, you can include this code once in your
Org-Mode document and re-use it with different inputs throughout
your document for a quick look at the data you have in the tables.
</p>
<p>
For example, if you name the Python code above <code>plottingcode</code> by
putting the line <code>#+NAME: plottingcode</code> immediately before the
<code>#+BEGIN_SRC</code> line and then add <code>:noweb yes</code> to the end of the
<code>#+BEGIN_SRC</code> line, you can then later refer to that code in a code
block like:
</p>
<pre class="example">
#+BEGIN_SRC python :results file :exports both :var data=myOtherTable
<<plottingcode>>
#+END_SRC
</pre>
<p>
and running that by pressing <code>C-c C-c</code> with the point on or between
the <code>BEGIN</code> and <code>END</code> lines will evaluate the code from the other
location but with the new input—in this case using <code>myOtherTable</code>
as the <code>data</code> variable instead of the original <code>mySweetTable</code>.
</p>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">And there’s your in-Org-Mode data visualization</h2>
<div class="outline-text-2" id="text-4">
<p>
With 15 lines of re-usable Python and one useful Python package, you
can make simple plots out of your Org-Mode tables that you can see
in Emacs.
</p>
<div class="figure">
<p><img src="/assets/emacs-plot.png" alt="emacs-plot.png" width="60%" />
</p>
</div>
</div>
</div>
Time Series Plots with nvd32015-08-12T00:00:00+00:00http://acaird.github.io/2015/08/12/nvd3-time-series-plots<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<meta charset="utf-8" />
<link href="https://cdnjs.cloudflare.com/ajax/libs/nvd3/1.7.0/nv.d3.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/nvd3/1.7.0/nv.d3.min.js"></script>
<p>
I wrote this post on plotting time on the x-axis of a plot
produced using Python’s nvd3 library because when I Googled for it
(really, does anyone purposefully use Yahoo! or Bing to search for
anything?) I got no results. It wasn’t obvious to me, and no matter
how many times I Googled it, it wasn’t there. Why is there even an
Internet if it can’t help me figure out how to do exactly what I want
to do at the exact moment I want to do it?
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">The problem</h2>
<div class="outline-text-2" id="text-1">
<ul class="org-ul">
<li><a href="https://github.com/areski/python-nvd3">nvd3</a> is really handy
</li>
<li>many plots are time-series plots
</li>
<li>the <a href="http://python-nvd3.readthedocs.org/">Python nvd3 docs</a> don’t mention milliseconds, which is how it
wants time represented:
<div class="figure">
<p><img src="/assets/missing-milliseconds.png" alt="missing-milliseconds.png" />
</p>
</div>
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Getting your times in the right format</h2>
<div class="outline-text-2" id="text-2">
<p>
This is very simple, really. The most important thing is to realize
that the time that nvd3 wants is the number of milliseconds since
the epoch; that is, add three zeros to the number of seconds since the
epoch. So 12 August 2015 at 22:16 GMT, which is <b>1439417760</b>
seconds since the epoch, becomes <b>1439417760000</b>.
</p>
<p>
In Python, if your times are <i>datetime.datetime</i> types (when printed
they look like <i>2013-05-05 08:51:51</i>), you can convert them to
seconds since the Unix epoch in many different ways. One way to
convert them is:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">import</span> time
<span style="color: #a020f0;">import</span> datetime
<span style="color: #a0522d;">seconds_since_epoch</span> = time.mktime(myDatetimeTimeVariable.timetuple())
</pre>
</div>
<p>
Then simply multiply <code>seconds_since_epoch</code> by 1000 to get
milliseconds.
</p>
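<p>
One thing to watch for: <code>time.mktime</code> interprets the time tuple as
local time, so the result depends on the time zone of the machine running
the code. If your datetimes are meant as UTC, a minimal sketch of the same
conversion looks like the following; <code>to_epoch_millis_utc</code> is a
name made up for this illustration, not something nvd3 provides.
</p>

```python
import calendar
import datetime


def to_epoch_millis_utc(dt):
    """Milliseconds since the Unix epoch, treating a naive datetime as UTC."""
    # calendar.timegm is the UTC counterpart of time.mktime
    return calendar.timegm(dt.timetuple()) * 1000


stamp = datetime.datetime(2015, 8, 1, 13, 0)
print(to_epoch_millis_utc(stamp))  # 1438434000000
```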
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Plotting Time on the X-Axis of an nvd3 plot</h2>
<div class="outline-text-2" id="text-3">
<p>
You’ll need to populate some lists with your data, but once you have
them, the bits of Python you need to make a nice D3 plot are:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">from</span> nvd3 <span style="color: #a020f0;">import</span> lineChart
<span style="color: #a0522d;">chart</span> = lineChart(name=<span style="color: #8b2252;">"myChart"</span>, x_is_date=<span style="color: #008b8b;">True</span>,
date_format=<span style="color: #8b2252;">"%d %b %Y"</span>)
chart.set_graph_width(800)
chart.add_serie(y=myYdata, x=myXdata, name=<span style="color: #8b2252;">'My Awesome Data'</span>)
chart.buildhtml()
</pre>
</div>
<ol class="org-ol">
<li>The first <code>chart =</code> line contains two options that are important
for plotting date/time values: <code>x_is_date=True</code> and the string
format for the date set in <code>date_format</code>.
</li>
<li>The third line is actually <code>add_serie</code>, not <i>add_series</i> — it’s
French.
</li>
<li>The last line builds the HTML for the chart, placing the entire
HTML page in the <code>chart.htmlcontent</code> variable. For more on this,
see the <a href="http://python-nvd3.readthedocs.org/en/latest/classes-doc/NVD3Chart.html">python-nvd3 Chart Classes Reference</a>.
</li>
</ol>
<p>
Now you can print the HTML and open it in a browser, return it via
Flask or Bottle or whatever, etc.
</p>
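<p>
As a rough sketch of the "open it in a browser" route, you could write the
HTML string to a file. The helper name <code>save_chart_html</code> and the
file name are made up for this example, and a placeholder string stands in
for <code>chart.htmlcontent</code> so the sketch runs without nvd3
installed.
</p>

```python
import os
import tempfile


def save_chart_html(html, path):
    """Write an HTML string (for example chart.htmlcontent) to path."""
    with open(path, "w") as handle:
        handle.write(html)
    return path


# Placeholder for chart.htmlcontent (an assumption for this sketch)
html = "chart markup goes here"
out = save_chart_html(html, os.path.join(tempfile.gettempdir(), "mychart.html"))
print(out)
```

<p>
Opening the resulting file in a browser should then show the chart,
assuming the page can reach the D3 and NVD3 scripts it references.
</p>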
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">An Example</h2>
<div class="outline-text-2" id="text-4">
<div class="org-src-container">
<pre class="src src-python"><span style="color: #a020f0;">from</span> nvd3 <span style="color: #a020f0;">import</span> lineChart
<span style="color: #a020f0;">import</span> time
<span style="color: #a020f0;">import</span> datetime
<span style="color: #b22222;"># </span><span style="color: #b22222;">Set up our data</span>
<span style="color: #a0522d;">xdata</span> = [<span style="color: #8b2252;">'2015-08-01 09:00'</span>, <span style="color: #8b2252;">'2015-08-02 10:00'</span>, <span style="color: #8b2252;">'2015-08-03 11:00'</span>,
<span style="color: #8b2252;">'2015-08-04 12:00'</span>, <span style="color: #8b2252;">'2015-08-05 13:00'</span>, <span style="color: #8b2252;">'2015-08-06 14:00'</span>]
<span style="color: #a0522d;">ydata</span> = [10, 20, 30, 40, 25, 10]
<span style="color: #b22222;"># </span><span style="color: #b22222;">Convert xdata to datetime.datetime format</span>
<span style="color: #a0522d;">xdata</span> = [datetime.datetime.strptime(s, <span style="color: #8b2252;">"%Y-%m-%d %H:%M"</span>) <span style="color: #a020f0;">for</span> s <span style="color: #a020f0;">in</span> xdata]
<span style="color: #b22222;"># </span><span style="color: #b22222;">Convert datetime.datetime xdata to milliseconds since the epoch</span>
<span style="color: #a0522d;">xdata</span> = [time.mktime(s.timetuple()) * 1000 <span style="color: #a020f0;">for</span> s <span style="color: #a020f0;">in</span> xdata]
<span style="color: #b22222;"># </span><span style="color: #b22222;">create the lineChart with x_is_date and a date format string</span>
<span style="color: #a0522d;">chart</span> = lineChart(name=<span style="color: #8b2252;">"myChart"</span>, x_is_date=<span style="color: #008b8b;">True</span>,
date_format=<span style="color: #8b2252;">"%d %b %Y"</span>)
<span style="color: #b22222;"># </span><span style="color: #b22222;">set the width of the chart</span>
chart.set_graph_width(800)
<span style="color: #b22222;"># </span><span style="color: #b22222;">add the data to the chart</span>
chart.add_serie(y=ydata, x=xdata, name=<span style="color: #8b2252;">'My Awesome Data'</span>)
<span style="color: #b22222;"># </span><span style="color: #b22222;">build the HTML for the chart; you might prefer buildcontent() for an embeddable chart</span>
chart.buildhtml()
<span style="color: #b22222;"># </span><span style="color: #b22222;">print the HTML (the parentheses work in both Python 2 and 3)</span>
<span style="color: #a020f0;">print</span>(chart.htmlcontent)
</pre>
</div>
<div id="mychart"><svg style="width:800px;height:450px;"></svg></div>
<script>
data_mychart=[{"values": [{"y": 10, "x": 1438434000000.0}, {"y": 20, "x": 1438524000000.0}, {"y": 30, "x": 1438614000000.0}, {"y": 40, "x": 1438704000000.0}, {"y": 25, "x": 1438794000000.0}, {"y": 10, "x": 1438884000000.0}], "key": "My Awesome Data", "yAxis": "1"}];
nv.addGraph(function() {
var chart = nv.models.lineChart();
chart.margin({top: 30, right: 60, bottom: 20, left: 60});
var datum = data_mychart;
chart.xAxis
.tickFormat(function(d) { return d3.time.format('%d %b %Y')(new Date(parseInt(d))) }
);
chart.yAxis
.tickFormat(d3.format(',.02f'));
chart.showLegend(true);
d3.select('#mychart svg')
.datum(datum)
.transition().duration(500)
.attr('width', 800)
.attr('height', 450)
.call(chart);
});
</script>
</div>
</div>
Soul Seconds2015-02-28T00:00:00+00:00http://acaird.github.io/2015/02/28/soul-seconds<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<p>
People measure time in lots of ways, mostly seconds that tick by on
the clock, but there are other ways, and I think we should measure
more things in <i>soul-seconds</i>—the number of seconds of joy something
brings you in the future for what you did in the past.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Using time</h2>
<div class="outline-text-2" id="text-1">
<p>
The other day I received an email with three Microsoft Excel files
attached; each file was formatted to look like a calendar, with a row
of cells for dates in the week, then the following cells containing
events, one per cell. It looks sort of like this:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<colgroup>
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Sun</th>
<th scope="col" class="left">Mon</th>
<th scope="col" class="left">Tue</th>
<th scope="col" class="left">Wed</th>
<th scope="col" class="left">Thu</th>
<th scope="col" class="left">Fri</th>
<th scope="col" class="left">Sat</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">1</td>
<td class="left">2</td>
<td class="left">3</td>
<td class="left">4</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">event A1</td>
<td class="left">event B1</td>
<td class="left">event C1</td>
<td class="left">event D1</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left">5</td>
<td class="left">6</td>
<td class="left">7</td>
<td class="left">8</td>
<td class="left">9</td>
<td class="left">10</td>
<td class="left">11</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left">event A2</td>
<td class="left">event B2</td>
<td class="left">event C2</td>
<td class="left">event D2</td>
<td class="left">event E2</td>
<td class="left">event F2</td>
<td class="left">event G2</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">event C3</td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
</tr>
</tbody>
</table>
<p>
That format isn’t helpful, because it isn’t a calendar. I’m not
sure why the people who make the calendar don’t use Outlook, or
Google Calendar, or something else that would be, for them, a
calendar. I assume they maintain two copies, and email me the worse
of the two.
</p>
<p>
Of all of the items on each monthly calendar, I care about 8 of
them. Manually transcribing them onto my Google calendar takes
about (at most) a minute each, so that’s 24 minutes of effort every
quarter, which is, approximately, 0 minutes in the grand scheme of
life.
</p>
<p>
All of that is measured in “wall clock” time, or time that ticks by
on a clock on the wall.
</p>
<p>
In “soul time”, it is something like 50 hours. It’s awful. It’s
drudgery for no reason. It’s error prone. It has no redeeming
value. I couldn’t dislike it more. I dislike it so much that some
quarters I don’t do it, then I spend three months being shocked when
those events roll around, and annoyed when they conflict with
something else I’ve scheduled. The whole process is the worst.
</p>
<p>
Wall clock: 24 minutes. Soul clock: 50 hours. You see the
problem.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Investing in soul clock time</h2>
<div class="outline-text-2" id="text-2">
<p>
To solve the pain of my 24-minutes-per-three-month problem, I spent
6 hours writing a <a href="https://github.com/acaird/call-sched-parser">computer program</a> to read the Excel file and give
me back an <a href="http://en.wikipedia.org/wiki/ICalendar">ICS</a> file that I can import into Google calendar (or
Apple’s iCal, or Microsoft’s Outlook, or lots of other calendar
programs, but not other spreadsheet programs, because the data isn’t
fucking spreadsheet data).
</p>
<p>
At the end of my project, I was able to put the information from the
Excel files into my calendar in a few seconds.
</p>
<p>
However, it took me six hours. That is about 12 quarters of doing
this manually (rounding my 24 minutes up to 30 minutes per quarter,
which goes into 6 hours 12 times). That’s 3 years. There is no way
that was worth the time investment.
</p>
<p>
My soul time estimate for doing it manually was 50 hours per
quarter; in that case, my six hours was a great investment.
</p>
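<p>
The break-even arithmetic on both clocks can be sketched in a few lines of
Python. The function name is made up for this illustration, and charging
the six hours of programming at the same rate on both clocks is an
assumption.
</p>

```python
def quarters_to_break_even(build_hours, manual_minutes_per_quarter):
    """Number of quarters of manual work that the build time would have paid for."""
    return build_hours * 60 / manual_minutes_per_quarter


# Wall clock: 6 hours of programming versus about 30 minutes per quarter
wall_clock = quarters_to_break_even(6, 30)

# Soul clock: the same 6 hours versus 50 hours of drudgery per quarter
soul_clock = quarters_to_break_even(6, 50 * 60)

print(wall_clock)  # 12.0 quarters, or 3 years
print(soul_clock)  # about 0.12 quarters, so it pays off almost immediately
```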
<p>
It’s all about perspective, and the clock on the wall shouldn’t be
your only perspective.
</p>
<p>
Some people say that automation doesn’t save you time
<a href="http://imgs.xkcd.com/comics/automation.png"><img src="http://imgs.xkcd.com/comics/automation.png" alt="automation.png" /></a>
and that can be true (clearly not in most cases, or the foundations
of the industrial revolution are in significant peril), but it is
important for you (and other humans) to factor in the pay-off from
the investment of soul-time, not just the investment of wall-clock
time.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">What is soul time?</h2>
<div class="outline-text-2" id="text-3">
<p>
Soul time is what you spend to make yourself happy in an on-going,
or future-looking, way. Like anything you don’t “waste” time on,
it’s an investment in the future. Soul time’s investment, though,
isn’t the same as wall-clock time; often it looks like a bad
investment when compared to a clock on the wall, but it makes you
happy. In cases where it looks like a bad investment with respect
to wall-clock time, you will get accused of “wasting” time, which is
too bad, because whoever accuses you of that doesn’t understand: a)
that not all investments pay off the same way and b) what makes you
happy. Because those people don’t understand that, you are safe to
ignore them (just be sure it’s <i>your</i> wall-clock time you are
investing; if it is your employer’s, that decision isn’t up to you).
</p>
<p>
If you can find a task (or job, or hobby, or whatever) that has a
ratio of soul time to wall-clock time that is 1 or less, guard that
with everything you have, for you have won: it costs you no more
soul time than wall-clock time, and thus you come out ahead on
soul time while not spending (or, in some people’s eyes, “wasting”)
time.
</p>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Balance</h2>
<div class="outline-text-2" id="text-4">
<p>
Some things just take time and are no fun, and some things are fun but
not an investment in the future; both of those things are necessary
and part of life. One step to being happy is maximizing that slice
in the middle that is both: the balance between soul time and
wall-clock time.
</p>
<p>
And good luck getting to:
</p>
\begin{equation}
\frac{\mathrm{soul\ time}}{\mathrm{wallclock\ time}} \leq 1
\end{equation}
</div>
</div>
Running in the Heat2015-02-19T00:00:00+00:00http://acaird.github.io/running/2015/02/19/running-in-the-heat<p>
This month (February, 2015) I had the very good fortune to be able to
spend 5 days in San Juan, Puerto Rico, and I went for a run there
each of the 5 days. Part of the reason this was good fortune is that
my home town of Ann Arbor, Michigan is in the depths of some really
cold weather, although I had recently run 5 days there, too. The
running in San Juan seemed much more difficult, which I attributed to
the heat. I thought I’d look at my average heart rate over the runs
and see if there was anything noticeable.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Getting the Data</h2>
<div class="outline-text-2" id="text-1">
<p>
I use RunKeeper (<a href="http://www.runkeeper.com">http://www.runkeeper.com</a>) to track most of my
fitness activities, and they offer the most excellent feature of
allowing you to export your data.
</p>
<p>
To download your runs, log in to RunKeeper, click the settings gears
in the upper-right corner, and choose “Export Data” from the
left-hand list of options. Pick your date range and click the
“Export Data” button. After a few seconds or minutes you’ll get a
button that says “Download Now!”; click it and you’ll get a Zip file
of your data: the XML GPX files that this Python script reads and a
few CSV files with summary data.
</p>
<p>
I picked dates that let me make Table <a href="#runtimes">1</a>, and then I did a
little arithmetic by hand to come up with some average paces for
each location (Table <a href="#averagepace">2</a>).
</p>
<table id="runtimes" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption class="t-above"><span class="table-number">Table 1:</span> Runs</caption>
<colgroup>
<col class="right" />
<col class="right" />
<col class="left" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="right">Date</th>
<th scope="col" class="right">Time</th>
<th scope="col" class="left">Location</th>
<th scope="col" class="right">Pace</th>
</tr>
</thead>
<tbody>
<tr>
<td class="right">2015-01-31</td>
<td class="right">13:00</td>
<td class="left">AA</td>
<td class="right">8:12</td>
</tr>
<tr>
<td class="right">2015-02-03</td>
<td class="right">15:27</td>
<td class="left">AA</td>
<td class="right">8:33</td>
</tr>
<tr>
<td class="right">2015-02-07</td>
<td class="right">14:16</td>
<td class="left">AA</td>
<td class="right">8:07</td>
</tr>
<tr>
<td class="right">2015-02-08</td>
<td class="right">13:32</td>
<td class="left">AA</td>
<td class="right">8:09</td>
</tr>
<tr>
<td class="right">2015-02-10</td>
<td class="right">14:48</td>
<td class="left">AA</td>
<td class="right">8:34</td>
</tr>
<tr>
<td class="right">2015-02-15</td>
<td class="right">10:58</td>
<td class="left">SJ</td>
<td class="right">8:35</td>
</tr>
<tr>
<td class="right">2015-02-16</td>
<td class="right">09:40</td>
<td class="left">SJ</td>
<td class="right">9:06</td>
</tr>
<tr>
<td class="right">2015-02-17</td>
<td class="right">16:50</td>
<td class="left">SJ</td>
<td class="right">8:13</td>
</tr>
<tr>
<td class="right">2015-02-18</td>
<td class="right">15:50</td>
<td class="left">SJ</td>
<td class="right">8:29</td>
</tr>
<tr>
<td class="right">2015-02-19</td>
<td class="right">08:53</td>
<td class="left">SJ</td>
<td class="right">8:54</td>
</tr>
</tbody>
</table>
<table id="averagepace" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption class="t-above"><span class="table-number">Table 2:</span> Average Pace</caption>
<colgroup>
<col class="left" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Location</th>
<th scope="col" class="right">Average Pace (min/mile)</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Ann Arbor, MI</td>
<td class="right">8:19</td>
</tr>
<tr>
<td class="left">San Juan, PR</td>
<td class="right">8:39</td>
</tr>
</tbody>
</table>
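<p>
The by-hand arithmetic behind Table <a href="#averagepace">2</a> can be reproduced with a short
sketch, assuming a simple mean of the per-run paces (not weighted by
distance). The function name <code>average_pace</code> is made up for this
illustration.
</p>

```python
def average_pace(paces):
    """Average a list of 'M:SS' paces and return the result as 'M:SS'."""
    seconds = []
    for pace in paces:
        minutes, secs = pace.split(":")
        seconds.append(int(minutes) * 60 + int(secs))
    mean = int(round(sum(seconds) / float(len(seconds))))
    return "{}:{:02d}".format(mean // 60, mean % 60)


ann_arbor = ["8:12", "8:33", "8:07", "8:09", "8:34"]
san_juan = ["8:35", "9:06", "8:13", "8:29", "8:54"]
print(average_pace(ann_arbor))  # 8:19
print(average_pace(san_juan))   # 8:39
```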
<p>
The data I’m interested in, heart rate at each measurement, is
embedded in the GPX (<a href="http://en.wikipedia.org/wiki/GPS_Exchange_Format">GPS Exchange format</a>) files that RunKeeper
delivers. A GPX file from RunKeeper looks like:
</p>
<div class="org-src-container">
<pre class="src src-xml"><?<span style="color: #a020f0;">xml</span> <span style="color: #a0522d;">version</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">1.0</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">encoding</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">UTF-8</span><span style="color: #8b2252;">"</span>?>
<<span style="color: #0000ff;">gpx</span>
<span style="color: #a0522d;">version</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">1.1</span><span style="color: #8b2252;">"</span>
<span style="color: #a0522d;">creator</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">RunKeeper - http://www.runkeeper.com</span><span style="color: #8b2252;">"</span>
<span style="color: #483d8b;">xmlns</span>:<span style="color: #a0522d;">xsi</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">http://www.w3.org/2001/XMLSchema-instance</span><span style="color: #8b2252;">"</span>
<span style="color: #483d8b;">xmlns</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">http://www.topografix.com/GPX/1/1</span><span style="color: #8b2252;">"</span>
<span style="color: #483d8b;">xsi</span>:<span style="color: #a0522d;">schemaLocation</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd</span><span style="color: #8b2252;">"</span>
<span style="color: #483d8b;">xmlns</span>:<span style="color: #a0522d;">gpxtpx</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">http://www.garmin.com/xmlschemas/TrackPointExtension/v1</span><span style="color: #8b2252;">"</span>>
<<span style="color: #0000ff;">trk</span>>
<<span style="color: #0000ff;">name</span>><![<span style="color: #483d8b;">CDATA</span>[Running 2/19/15 8:53 am]]></<span style="color: #0000ff;">name</span>>
<<span style="color: #0000ff;">time</span>>2015-02-19T12:53:06Z</<span style="color: #0000ff;">time</span>>
<<span style="color: #0000ff;">trkseg</span>>
<<span style="color: #0000ff;">trkpt</span> <span style="color: #a0522d;">lat</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">18.441757000</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">lon</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">-66.018932000</span><span style="color: #8b2252;">"</span>><<span style="color: #0000ff;">ele</span>>9.0</<span style="color: #0000ff;">ele</span>><<span style="color: #0000ff;">time</span>>2015-02-19T12:53:06Z</<span style="color: #0000ff;">time</span>><<span style="color: #0000ff;">extensions</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>>85</<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>></<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>></<span style="color: #0000ff;">extensions</span>></<span style="color: #0000ff;">trkpt</span>>
<<span style="color: #0000ff;">trkpt</span> <span style="color: #a0522d;">lat</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">18.441755000</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">lon</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">-66.018906000</span><span style="color: #8b2252;">"</span>><<span style="color: #0000ff;">ele</span>>9.1</<span style="color: #0000ff;">ele</span>><<span style="color: #0000ff;">time</span>>2015-02-19T12:53:07Z</<span style="color: #0000ff;">time</span>><<span style="color: #0000ff;">extensions</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>>86</<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>></<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>></<span style="color: #0000ff;">extensions</span>></<span style="color: #0000ff;">trkpt</span>>
<<span style="color: #0000ff;">trkpt</span> <span style="color: #a0522d;">lat</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">18.441735000</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">lon</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">-66.018741000</span><span style="color: #8b2252;">"</span>><<span style="color: #0000ff;">ele</span>>9.2</<span style="color: #0000ff;">ele</span>><<span style="color: #0000ff;">time</span>>2015-02-19T12:53:13Z</<span style="color: #0000ff;">time</span>><<span style="color: #0000ff;">extensions</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>>90</<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>></<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>></<span style="color: #0000ff;">extensions</span>></<span style="color: #0000ff;">trkpt</span>>
[ ... ]
<<span style="color: #0000ff;">trkpt</span> <span style="color: #a0522d;">lat</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">18.442442000</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">lon</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">-66.018407000</span><span style="color: #8b2252;">"</span>><<span style="color: #0000ff;">ele</span>>8.8</<span style="color: #0000ff;">ele</span>><<span style="color: #0000ff;">time</span>>2015-02-19T13:38:23Z</<span style="color: #0000ff;">time</span>><<span style="color: #0000ff;">extensions</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>><<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>>165</<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">hr</span>></<span style="color: #483d8b;">gpxtpx</span>:<span style="color: #0000ff;">TrackPointExtension</span>></<span style="color: #0000ff;">extensions</span>></<span style="color: #0000ff;">trkpt</span>>
</<span style="color: #0000ff;">trkseg</span>>
</<span style="color: #0000ff;">trk</span>>
</<span style="color: #0000ff;">gpx</span>>
</pre>
</div>
<p>
and you can see the heart rate data embedded in the <code>gpxtpx</code> XML
namespace.
</p>
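<p>
Because the heart rate elements live in a namespace, <code>xml.etree</code>
needs the full namespace URI in braces when searching for them. Here is a
minimal sketch that builds a tiny stand-in track in memory (using the
first three heart rate values from the sample above) and pulls the values
back out; the helper name <code>heart_rates</code> is made up for this
illustration.
</p>

```python
from xml.etree import ElementTree

GPX = "http://www.topografix.com/GPX/1/1"
GPXTPX = "http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
HR_TAG = "{%s}hr" % GPXTPX  # ElementTree spells namespaced tags as {uri}name


def heart_rates(root):
    """Return every gpxtpx:hr value under root as an int."""
    return [int(node.text) for node in root.iter(HR_TAG)]


# Build a stand-in for a parsed GPX file with three track points
root = ElementTree.Element("{%s}gpx" % GPX)
for bpm in (85, 86, 90):
    point = ElementTree.SubElement(root, "{%s}trkpt" % GPX)
    extension = ElementTree.SubElement(point, "{%s}TrackPointExtension" % GPXTPX)
    ElementTree.SubElement(extension, HR_TAG).text = str(bpm)

print(heart_rates(root))  # [85, 86, 90]
```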
<p>
In addition, RunKeeper names the GPX files like
<code>YYYY-MM-DD-HHMM.gpx</code>.
</p>
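<p>
That naming pattern means a row of Table <a href="#runtimes">1</a> is enough to find the
matching file. A small sketch, with <code>gpx_filename</code> as a name
made up for this illustration:
</p>

```python
def gpx_filename(date, start_time):
    """Turn a date like '2015-02-19' and a time like '08:53' into a file name."""
    return "{}-{}.gpx".format(date, start_time.replace(":", ""))


print(gpx_filename("2015-02-19", "08:53"))  # 2015-02-19-0853.gpx
```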
<p>
Now that I have a table of <a href="#runtimes">run times</a> and some GPX files with heart
rate data, the only thing left is to make a plot of it and look for
a trend.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Looking for trends</h2>
<div class="outline-text-2" id="text-2">
<p>
Jumping straight to the plot, there is nothing that strongly bears
out my theory that I was working harder in the heat.
</p>
<div id="hrplot" class="figure">
<p><img src="/assets/running-hr-warm-cold.png" alt="running-hr-warm-cold.png" />
</p>
</div>
<p>
My heart rate climbs slightly faster in the heat, but the
difference probably isn’t significant given only five samples in
each location. My average pace (in Table <a href="#averagepace">2</a>) was a fair
bit slower in the heat, so that combined with the faster increase in
heart rate looks like the heat has an effect, but it’s not shown as
powerfully as I felt it.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Conclusions and Next Steps</h2>
<div class="outline-text-2" id="text-3">
<p>
The heart rate data that wasn’t normalized for pace doesn’t show a
terribly powerful effect from the heat. Thinking about heart rate
increases over time and pace (or, better, pace over time) in each
climate might demonstrate a clearer impact of temperature on my
running.
</p>
<p>
I could try to look at the data again with more factors, but that
seems like more work than it’s worth to me.
</p>
<p>
I think collecting more data would be useful, but I’d want to
collect it over a short period of time to minimize effects like
changes in fitness, injuries, conditions, etc., so I think
alternating weeks of running in Ann Arbor and San Juan for the
months of January and February is the best way to do this.
</p>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> Python source</h2>
<div class="outline-text-2" id="text-4">
<p>
The Python program that does this is below; I run it from within
Emacs <a href="http://orgmode.org/">Org mode</a>, so the data in Table <a href="#runtimes">1</a> is automatically
passed in as a variable; you would need to get it from the command
line or something if you extracted this script from Org mode.
</p>
<p>
There are three parts to this program: main, getHRs and plotHRs.
</p>
</div>
<div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1"><span class="section-number-3">4.1</span> <code>main</code></h3>
<div class="outline-text-3" id="text-4-1">
<p>
<code>main</code> imports some libraries and does a little data processing but
mostly calls the <code>getHRs</code> and <code>plotHRs</code> routines. It gets back a
<a href="http://matplotlib.org">Matplotlib</a> <code>fig</code> object and writes it to a file. The <code>return
(filename)</code> is an Org mode thing: Org mode needs to get back the
file name as a string so it can insert it into the document (yes, it’s weird;
see
<a href="http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-python.html">http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-python.html</a>
for more information).
</p>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2"><span class="section-number-3">4.2</span> <code>getHRs</code></h3>
<div class="outline-text-3" id="text-4-2">
<p>
<code>getHRs</code> takes the information from Table <a href="#runtimes">1</a> and turns that
into RunKeeper GPX filenames, reads each file and uses <code>xml.etree</code>
to parse out the heart rate data. It uses the (hard-coded<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup>)
location information from Table <a href="#runtimes">1</a> to determine whether I was
running in the cold or in the warm, then computes averages<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup> for each
point.
</p>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3"><span class="section-number-3">4.3</span> <code>plotHRs</code></h3>
<div class="outline-text-3" id="text-4-3">
<p>
<code>plotHRs</code> uses Python’s <a href="http://matplotlib.org">Matplotlib</a> to plot the heart rate data and
linear fit data computed using <a href="http://www.numpy.org/">NumPy</a>. Basic plotting isn’t
difficult, but all plotting is fussy (although Wilkinson’s <a href="http://www.springer.com/us/book/9780387245447">Grammar
of Graphics</a> helps, making R’s <a href="http://ggplot2.org/">ggplot2</a> nicer than Matplotlib, in my
opinion), so there are a bunch of lines of code to make the plot
look OK (and even so…)
</p>
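<p>
For a sense of what the linear fit is doing, here is a plain-Python
least-squares sketch. It stands in for the NumPy fit so that it runs
without any packages installed; a degree-1 <code>numpy.polyfit</code>
computes the same slope and intercept. The function name and the heart
rate values are made up for illustration and are not my data.
</p>

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up heart rates at five consecutive sample points (illustration only)
sample_index = [0, 1, 2, 3, 4]
made_up_hr = [100, 102, 104, 106, 108]
slope, intercept = linear_fit(sample_index, made_up_hr)
print(slope, intercept)  # 2.0 100.0
```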
</div>
</div>
<div id="outline-container-sec-4-4" class="outline-3">
<h3 id="sec-4-4"><span class="section-number-3">4.4</span> Python Source</h3>
<div class="outline-text-3" id="text-4-4">
<div class="org-src-container">
<pre class="src src-python" id="hranalysis"><span style="color: #a020f0;">def</span> <span style="color: #0000ff;">getHRs</span>(runtimes):
    <span style="color: #a0522d;">coldHR</span>=[]
    <span style="color: #a0522d;">warmHR</span>=[]
    <span style="color: #a0522d;">coldTot</span>=[]
    <span style="color: #a0522d;">warmTot</span>=[]
    <span style="color: #a020f0;">for</span> t <span style="color: #a020f0;">in</span> runtimes: <span style="color: #b22222;"># </span><span style="color: #b22222;">go through the elements in the table</span>
        <span style="color: #b22222;"># </span><span style="color: #b22222;">construct the path from the elements in the table</span>
        <span style="color: #a0522d;">path</span> = <span style="color: #8b2252;">"hr-heat/"</span>+t[0]+<span style="color: #8b2252;">"-"</span>+t[1].replace(<span style="color: #8b2252;">":"</span>,<span style="color: #8b2252;">""</span>)+<span style="color: #8b2252;">".gpx"</span>
        <span style="color: #b22222;"># </span><span style="color: #b22222;">open the GPX files and parse the XML</span>
        <span style="color: #a020f0;">with</span> <span style="color: #483d8b;">open</span>(path) <span style="color: #a020f0;">as</span> f:
            <span style="color: #a0522d;">tree</span> = ElementTree.parse(f)
        <span style="color: #b22222;"># </span><span style="color: #b22222;">extract the heart rate values from the XML tree into a list</span>
        <span style="color: #a0522d;">a</span> = [<span style="color: #483d8b;">int</span>(node.text) <span style="color: #a020f0;">for</span> node <span style="color: #a020f0;">in</span>
             <span style="color: #483d8b;">list</span>( tree.<span style="color: #483d8b;">iter</span>(<span style="color: #8b2252;">"{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}hr"</span>) )]
        <span style="color: #a020f0;">if</span> t[2] == <span style="color: #8b2252;">"AA"</span>: <span style="color: #b22222;"># </span><span style="color: #b22222;">if we're in Ann Arbor where it's cold</span>
            <span style="color: #a020f0;">if</span> <span style="color: #a020f0;">not</span> coldHR:
                <span style="color: #a0522d;">coldHR</span> = a
                <span style="color: #a0522d;">coldTot</span> = [1 <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> coldHR] <span style="color: #b22222;"># </span><span style="color: #b22222;">make the count '1' for all of the values</span>
            <span style="color: #a020f0;">else</span>:
                <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">min</span>(<span style="color: #483d8b;">len</span>(coldHR),<span style="color: #483d8b;">len</span>(a))):
                    <span style="color: #a0522d;">coldHR</span>[m] = (coldHR[m] + a[m])
                    <span style="color: #a020f0;">if</span> coldTot[m] == <span style="color: #008b8b;">None</span>:
                        <span style="color: #a0522d;">coldTot</span>[m] = 1 <span style="color: #b22222;"># </span><span style="color: #b22222;">extend the array (this might not actually work)</span>
                    <span style="color: #a020f0;">else</span>:
                        <span style="color: #a0522d;">coldTot</span>[m] += 1 <span style="color: #b22222;"># </span><span style="color: #b22222;">increment the count for averaging later</span>
        <span style="color: #a020f0;">elif</span> t[2] == <span style="color: #8b2252;">"SJ"</span>: <span style="color: #b22222;"># </span><span style="color: #b22222;">if we're in San Juan where it's warm, do all the same stuff</span>
            <span style="color: #a020f0;">if</span> <span style="color: #a020f0;">not</span> warmHR:
                <span style="color: #a0522d;">warmHR</span> = a
                <span style="color: #a0522d;">warmTot</span> = [1 <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> warmHR]
            <span style="color: #a020f0;">else</span>:
                <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">min</span>(<span style="color: #483d8b;">len</span>(warmHR),<span style="color: #483d8b;">len</span>(a))):
                    <span style="color: #a0522d;">warmHR</span>[m] = (warmHR[m] + a[m])
                    <span style="color: #a020f0;">if</span> warmTot[m] == <span style="color: #008b8b;">None</span>:
                        <span style="color: #a0522d;">warmTot</span>[m] = 1
                    <span style="color: #a020f0;">else</span>:
                        <span style="color: #a0522d;">warmTot</span>[m] += 1
        <span style="color: #a020f0;">else</span>: <span style="color: #b22222;"># </span><span style="color: #b22222;">we don't know where we are</span>
            <span style="color: #a020f0;">pass</span>
    <span style="color: #b22222;"># </span><span style="color: #b22222;">apply all of our averages</span>
    <span style="color: #a0522d;">coldHR</span> = [coldHR[m]/coldTot[m] <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(coldTot))]
    <span style="color: #a0522d;">warmHR</span> = [warmHR[m]/warmTot[m] <span style="color: #a020f0;">for</span> m <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(warmTot))]
    <span style="color: #a020f0;">return</span> (warmHR, coldHR)
<span style="color: #a020f0;">def</span> <span style="color: #0000ff;">plotHRs</span>(HRs):
    <span style="color: #a0522d;">cold</span>=[HRs[x][0] <span style="color: #a020f0;">for</span> x <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(HRs))]
    <span style="color: #a0522d;">warm</span>=[HRs[x][1] <span style="color: #a020f0;">for</span> x <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(HRs))]
    <span style="color: #a0522d;">x</span> = <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(cold))
    <span style="color: #a0522d;">fig</span> = plt.figure()
    fig.suptitle(<span style="color: #8b2252;">"Heart Rate in Warm and Cold Weather"</span>, fontsize=14, fontweight=<span style="color: #8b2252;">'bold'</span>)
    <span style="color: #a0522d;">ax</span> = plt.subplot(111)
    ax.set_ylim(0,180) <span style="color: #b22222;"># </span><span style="color: #b22222;">don't let autoscaling lie with plots</span>
    <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off a bunch of chartjunk</span>
    ax.set_xticklabels(<span style="color: #8b2252;">''</span>*<span style="color: #483d8b;">len</span>(x)) <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off the xticklabels, since they don't mean anything</span>
    ax.spines[<span style="color: #8b2252;">'top'</span>].set_visible(<span style="color: #008b8b;">False</span>) <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off top part of box (top spine)</span>
    ax.spines[<span style="color: #8b2252;">'right'</span>].set_visible(<span style="color: #008b8b;">False</span>) <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off right part of box (right spine)</span>
    ax.yaxis.set_ticks_position(<span style="color: #8b2252;">'left'</span>) <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off tick marks on right</span>
    ax.xaxis.set_ticks_position(<span style="color: #8b2252;">'none'</span>) <span style="color: #b22222;"># </span><span style="color: #b22222;">turn off tick marks on top and bottom</span>
    <span style="color: #b22222;"># </span><span style="color: #b22222;">http://matplotlib.org/examples/ticks_and_spines/spines_demo.html</span>
    <span style="color: #b22222;"># </span><span style="color: #b22222;">http://matplotlib.org/api/axis_api.html</span>
    <span style="color: #a0522d;">startSlopeCalc</span> = 75 <span style="color: #b22222;"># </span><span style="color: #b22222;">heuristically skip the ramp-up period when calculating slope</span>
    <span style="color: #a0522d;">mC</span>, <span style="color: #a0522d;">bC</span> = np.polyfit(x[startSlopeCalc:], cold[startSlopeCalc:], 1)
    <span style="color: #a0522d;">mW</span>, <span style="color: #a0522d;">bW</span> = np.polyfit(x[startSlopeCalc:], warm[startSlopeCalc:], 1)
    <span style="color: #b22222;"># </span><span style="color: #b22222;">overlay the fit lines</span>
    plt.plot(cold,<span style="color: #8b2252;">'b'</span>,label=<span style="color: #8b2252;">"Cold Weather, slope:"</span>+<span style="color: #483d8b;">str</span>(<span style="color: #483d8b;">round</span>(mC,2)))
    plt.plot(warm,<span style="color: #8b2252;">'r'</span>,label=<span style="color: #8b2252;">"Warm Weather, slope:"</span>+<span style="color: #483d8b;">str</span>(<span style="color: #483d8b;">round</span>(mW,2)))
    plt.legend(loc=3) <span style="color: #b22222;"># </span><span style="color: #b22222;">3=lower-left (see pydoc matplotlib.pyplot.legend)</span>
    plt.xlabel(<span style="color: #8b2252;">''</span>)
    plt.ylabel(<span style="color: #8b2252;">'heart rate (bpm)'</span>)
    <span style="color: #b22222;"># </span><span style="color: #b22222;">generate and plot y-values for fit lines</span>
    <span style="color: #a0522d;">yfitC</span>=[x*mC + bC <span style="color: #a020f0;">for</span> x <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(cold))]
    <span style="color: #a0522d;">yfitW</span>=[x*mW + bW <span style="color: #a020f0;">for</span> x <span style="color: #a020f0;">in</span> <span style="color: #483d8b;">range</span>(<span style="color: #483d8b;">len</span>(cold))]
    plt.plot(yfitC,<span style="color: #8b2252;">'b'</span>)
    plt.plot(yfitW,<span style="color: #8b2252;">'r'</span>)
    <span style="color: #a020f0;">return</span>(fig)
<span style="color: #a020f0;">if</span> <span style="color: #483d8b;">__name__</span> == <span style="color: #8b2252;">"__main__"</span>:
    <span style="color: #a020f0;">import</span> numpy <span style="color: #a020f0;">as</span> np
    <span style="color: #a020f0;">import</span> matplotlib
    <span style="color: #a020f0;">import</span> matplotlib.pyplot <span style="color: #a020f0;">as</span> plt
    <span style="color: #a020f0;">from</span> xml.etree <span style="color: #a020f0;">import</span> ElementTree
    (w,c) = getHRs(runtimes)
    <span style="color: #a0522d;">HRs</span> = <span style="color: #483d8b;">zip</span>(c,w) <span style="color: #b22222;"># </span><span style="color: #b22222;">put the cold and hot HR lists together, truncating to the shortest</span>
    <span style="color: #a0522d;">fig</span> = plotHRs(HRs)
    <span style="color: #a0522d;">filename</span> = <span style="color: #8b2252;">"assets/running-hr-warm-cold.png"</span>
    fig.savefig(filename, <span style="color: #483d8b;">format</span>=<span style="color: #8b2252;">'png'</span>)
    <span style="color: #a020f0;">return</span>(filename)
</pre>
</div>
</div>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
Because the GPX files have latitude data in them, it wouldn’t
be totally difficult to figure this out from the data, but hardcoding
it was suitable for me this time.
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> <p class="footpara">
The points don’t all line up an equal \(\Delta t\) away from each
other, but this whole thing is unscientific enough that I don’t think
that matters.
</p></div>
</div>
</div>
Soy Mozzarella2015-02-16T00:00:00+00:00http://acaird.github.io/food/2015/02/16/soycheese<p>
Trader Joe’s® soy mozzarella is a surprisingly good cheese substitute;
it’s no hand-made burrata or artisanal mozzarella, but it’s a fair
competitor in taste and texture to Kraft shredded mozzarella that
comes in a plastic bag. It also has no saturated fat, so yay for
heart health!
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Why soy cheese?</h2>
<div class="outline-text-2" id="text-1">
<p>
Why not just use regular part-skim mozzarella? Or actual good
mozzarella? I guess there are lots of reasons:
</p>
<ul class="org-ul">
<li>perhaps you are looking to reduce your fat intake or reduce your
calories
</li>
<li>perhaps you can’t find actual good mozzarella and, following the
principle of “don’t eat bad food” you figure you might as well eat
healthy food until you can find some high quality food
</li>
<li>maybe Trader Joe’s® is out of all of the other mozzarella
</li>
<li>maybe you are, like me, a terrible vegan, and don’t exactly count
milk proteins as non-vegan
</li>
</ul>
<p>
And probably other reasons. And your question is: can I tolerate
this non-cheese on my pizza?
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Trader Joe’s® Soy Cheese</h2>
<div class="outline-text-2" id="text-2">
<p>
Trader Joe’s® has a mozzarella flavor soy cheese alternative, which,
described that way, does not instill confidence in how close it will
be to actual mozzarella.
</p>
<div class="figure">
<div class="figure">
<p><img src="/assets/soycheese1.png" alt="soycheese1.png" align="right" width="30%" />
</p>
</div>
</div>
<p>
The good news is that the nutrition information is pretty
encouraging; for 1 ounce of “cheese”, you’ll get:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="right" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Total Fat</th>
<th scope="col" class="right"> </th>
<th scope="col" class="right">4g</th>
</tr>
<tr>
<th scope="col" class="left"> </th>
<th scope="col" class="right">Sat. Fat</th>
<th scope="col" class="right">0g</th>
</tr>
<tr>
<th scope="col" class="left"> </th>
<th scope="col" class="right">Trans. Fat</th>
<th scope="col" class="right">0g</th>
</tr>
<tr>
<th scope="col" class="left"> </th>
<th scope="col" class="right">Polyunsat. Fat</th>
<th scope="col" class="right">1g</th>
</tr>
<tr>
<th scope="col" class="left"> </th>
<th scope="col" class="right">Monounsat. Fat</th>
<th scope="col" class="right">3g</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Protein</td>
<td class="right"> </td>
<td class="right">7g</td>
</tr>
</tbody>
</table>
<p>
So for the things that, at least, I care about, that’s pretty good:
no saturated fat, respectable amounts of poly- and mono-unsaturated
fats, and a nice amount of protein.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Yes, but is it food?</h2>
<div class="outline-text-2" id="text-3">
<p>
Real cheese can be grated, has a good texture and flavor, melts on
pizza, and is generally cheese-like. What about this soy cheese?
</p>
<p>
Well, it grates pretty well.
</p>
<div class="figure">
<p><img src="/assets/soycheese2+3.png" alt="soycheese2+3.png" width="60%" align="center" />
</p>
</div>
<p>
And, on pizza, it melts pretty well.
</p>
<div class="figure">
<div class="figure">
<p><img src="/assets/soycheese4.png" alt="soycheese4.png" width="30%" align="right" />
</p>
</div>
</div>
<p>
So yes, it’s food. It’s not vegan (unless you’re a terrible vegan),
but it is better for you than mozzarella from a plastic bag, and it
tastes about the same.
</p>
<p>
Of course, given the choice between this and high quality
mozzarella, go with the cheese every time.
</p>
</div>
</div>
Using a Force Sensitive Resistor with a Raspberry Pi2015-01-07T00:00:00+00:00http://acaird.github.io/computers/2015/01/07/raspberry-pi-fsr<p>
Using a force sensitive resistor with a Raspberry Pi isn’t terribly
complicated, but I didn’t see it documented elsewhere on the Internet,
so here are my notes.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Ingredients</h2>
<div class="outline-text-2" id="text-1">
<p>
Assuming you have a <a href="http://www.adafruit.com/category/105">Raspberry Pi</a> and the desire to measure the
existence of some pressure, this is what else you’ll need:
</p>
<ul class="org-ul">
<li>a <a href="https://www.adafruit.com/products/2029">Pi Cobbler</a> that will let you attach the Pi to a breadboard with
a <a href="http://www.adafruit.com/products/1988">ribbon cable</a>
</li>
<li>a <a href="http://www.adafruit.com/products/239">breadboard</a> so you can plug everything in
</li>
<li>a <a href="http://www.adafruit.com/products/166">force sensitive resistor</a> (really, without this you can skip the
rest of this blog post)
</li>
<li>an MCP3008 <a href="http://www.adafruit.com/products/856">analog to digital converter</a> to convert the analog signals from
the FSR to digital signals that the Raspberry Pi can read
<p>
You can get a different ADC, but then these instructions will need
some interpretation.
</p>
</li>
<li>some <a href="http://www.adafruit.com/products/758">wires</a> to make connections on the breadboard
</li>
<li>a <a href="http://www.radioshack.com/10k-ohm-1-4-watt-carbon-film-resistor-5-pack/2711335.html">10k Ohm</a> resistor
</li>
</ul>
<p>
I’ve linked these to AdaFruit and RadioShack, but that’s just
because that’s what Google told me first, and that’s where I’ve
ordered most of my components; I have no affiliation with either of
them.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Wiring</h2>
<div class="outline-text-2" id="text-2">
<p>
Most of the wiring information I got was from AdaFruit’s
instructions at <a href="https://learn.adafruit.com/reading-a-analog-in-and-controlling-audio-volume-with-the-raspberry-pi/connecting-the-cobbler-to-a-mcp3008">Reading a Analog Input…</a>, which I’ll gratefully
paraphrase here.
</p>
<p>
First, attach the Pi Cobbler and the MCP3008 to the breadboard along
the center (see the diagram below).
</p>
<p>
Next, wire the Pi Cobbler to the MCP3008 following these
instructions:
</p>
<div class="figure">
<p><img src="/assets/mcp3008-cobbler.png" alt="mcp3008-cobbler.png" />
</p>
</div>
<p>
On the same unused location on the breadboard where Channel 0 is
connected, connect one of the leads from the FSR and connect that
same row on the breadboard to ground using the 10kOhm resistor. If
you leave this “pull-down resistor” out, the readings from the FSR
will just flap all around and be useless (at least, so says a friend
of mine who did that; you wouldn’t know him). Attach the other lead
of the FSR to the power rail (a resistor without a current isn’t
much good).
</p>
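<p>
The FSR and the pull-down resistor form a voltage divider, which is
why the reading rises as you press harder. Here is a rough sketch of
that arithmetic, assuming a 3.3 V supply that also serves as the ADC
reference and a 10-bit converter (full scale 1023); the function
names are made up for illustration.
</p>

```python
def divider_voltage(fsr_ohms, pulldown_ohms=10000.0, supply_volts=3.3):
    """Voltage on the Channel 0 row for a given FSR resistance."""
    return supply_volts * pulldown_ohms / (fsr_ohms + pulldown_ohms)


def fsr_resistance(reading, pulldown_ohms=10000.0, full_scale=1023):
    """Estimate the FSR resistance in ohms from a raw ADC reading.

    Assumes the ADC reference equals the supply voltage, so that
    reading / full_scale == pulldown / (fsr + pulldown).
    """
    if reading == 0:
        # No measurable current: the FSR is effectively an open circuit.
        return float("inf")
    return pulldown_ohms * (full_scale / reading - 1)


if __name__ == "__main__":
    # An FSR squeezed down to 10k ohms should sit at half the supply.
    print(divider_voltage(10000.0))
    print(fsr_resistance(511))
```

<p>
An unpressed FSR has a very high resistance, so the row sits near
ground; squeezing it lowers the resistance and pulls the row toward
the supply voltage.
</p>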
<p>
Finally, attach the <code>3v3</code> from the Pi Cobbler to the power rail of
the breadboard to deliver power to everything.
</p>
<p>
The end result should look something like this:
</p>
<div class="figure">
<p><img src="/assets/2015-01-07-rasp-pi-fsr-breadboard.png" alt="2015-01-07-rasp-pi-fsr-breadboard.png" />
</p>
</div>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Reading values from the FSR</h2>
<div class="outline-text-2" id="text-3">
<p>
Again, most of the script I use for this came from AdaFruit,
specifically from the <a href="https://learn.adafruit.com/reading-a-analog-in-and-controlling-audio-volume-with-the-raspberry-pi/script">Python script page</a> in the instructions for
<a href="https://learn.adafruit.com/reading-a-analog-in-and-controlling-audio-volume-with-the-raspberry-pi/connecting-the-cobbler-to-a-mcp3008">Reading a Analog Input…</a> that I mentioned above, although I did
make some changes. My script is available at
<a href="https://github.com/acaird/raspi-scale">https://github.com/acaird/raspi-scale</a>.
</p>
<p>
Running that script just prints out values between 0 and 1023,
depending on how hard you squeeze the FSR. What you do with that
data is up to you now. I plan to <a href="https://github.com/acaird/raspi-scale/blob/master/README.org">monitor my can of coffee beans</a>,
but you’ll have to check back later for the rest of that.
</p>
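<p>
For the curious, the 0 to 1023 range comes from the MCP3008 being a
10-bit converter. The sketch below shows how a single-ended reading
is requested and decoded over SPI. It is not the script linked above
(the AdaFruit-derived script toggles the GPIO pins itself); it
assumes the three-byte exchange you would pass to something like
<code>spidev</code>’s <code>xfer2</code> if the chip were wired to the Pi’s hardware
SPI pins, and the function names are hypothetical.
</p>

```python
def mcp3008_request(channel):
    """Build the 3-byte SPI request for a single-ended read."""
    if channel not in range(8):
        raise ValueError("MCP3008 channels are numbered 0 through 7")
    # Byte 0: start bit. Byte 1: single-ended flag (8) plus the channel
    # number, moved into the high nibble (times 16 is a 4-bit shift).
    # Byte 2: padding that clocks out the rest of the reply.
    return [1, (8 + channel) * 16, 0]


def mcp3008_decode(reply):
    """Extract the 10-bit value (0 to 1023) from the 3-byte reply."""
    # The low two bits of byte 1 are the top two bits of the result;
    # byte 2 holds the remaining eight bits.
    return (reply[1] % 4) * 256 + reply[2]


if __name__ == "__main__":
    print(mcp3008_request(0))
    print(mcp3008_decode([0, 3, 255]))
```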
</div>
</div>
The Insidious Nature of Rewards Points2014-11-24T00:00:00+00:00http://acaird.github.io/other/2014/11/24/points<p>
I am a sucker for rewards points; the marketing people totally have my
number. And, for the really good ones, I don’t even mind.
</p>
<p>
Lots of things I use have points systems: <a href="http://www.moosejaw.com">Moosejaw</a> for outerwear,
<a href="http://www.delta.com">Delta</a> for travel (although Delta calls them miles and makes them
impossible to actually use), <a href="http://www.zingermans.com">Zingerman</a>’s for mail-order food, and
<a href="http://www.americanexpress.com">American Express</a> for buying all sorts of stuff.
</p>
<p>
I always forget about my Moosejaw points, can never figure out how to
use my Delta points, and Zingerman’s takes care of my points for me by
occasionally sending me a gift card. In that pool, Zingerman’s wins,
because I don’t have to do anything other than give them lots of money
in exchange for excellent gifts for friends.
</p>
<p>
The one that tops them all, though, is American Express. We spend
enough on our AmEx that we accrue points at a rate that keeps me
engaged in buying things from their Rewards Points store. For a while
all I was buying were Bowers & Wilkins <a href="http://www.bowers-wilkins.com/Wireless-Speakers/AirPlay-Speakers/A5">A5</a> speakers, until I was
forbidden from buying any more (by then, though, I had enough for all
of the rooms I wanted them in). Then for a while I was using points
to pay parts of the monthly bill—that turns out to be <b>totally
boring</b> so I stopped doing that. So points started accruing again and
pretty soon I had enough to make things interesting…
</p>
<p>
Detour…
</p>
<p>
Lately I’ve been using in-ear <a href="http://www.amazon.com/Shure-SE115m-Sound-Isolating-Headset/dp/B0031RG33C/">Shure SE115m+</a> ear-buds more and more at
home (by the way, flying without these is torture—these are the
perfect airplane ear-buds), and <a href="http://thestatusaudio.com/collections/headphones/products/classic-1">Status Classic</a> headphones more and
more at work and I’ve noticed that people leave me alone more when I’m
wearing the bigger headphones, so another pair of headphones was in
order.
</p>
<p>
All headphone research starts out looking at headphones in the
$50–$100 range and escalates in about 0.5s to “monitors” in the
multiple hundreds of dollars range and then starts to involve
multi-hundred dollar <a href="http://schiit.com/products/valhalla-2">headphone tube amps</a> on top, then I get bummed and
stop.
</p>
<p>
<i>This time things were different… because of POINTS!</i>
</p>
<p>
… end of detour.
</p>
<p>
Now I have three facts in hand:
</p>
<ul class="org-ul">
<li>I have gobs of AmEx points
</li>
<li>AmEx sells Bowers & Wilkins stuff
</li>
<li>I want the B&W P7 headphones (well, <i>need</i>, really)
</li>
</ul>
<p>
So anyhow, some new headphones are on their way.
</p>
Docker Port-forwarding with boot2docker2014-11-16T00:00:00+00:00http://acaird.github.io/computers/2014/11/16/docker-virtualbox-host-networking<p>
Port-forwarding from an application in Docker to a host running
<code>boot2docker</code> involves three OS instances and two port forwards. This
document will describe how to get a simple web application in a Docker
container to be accessible from the Mac OS X or Microsoft Windows
host.
</p>
<p>
Running this on a Mac or Windows computer is a little complicated,
because the Docker container is running in a VirtualBox VM, not
natively on the host as it does with Linux, so there is another layer
to get through to get networking working.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">The Pile of Computers and Their Networks</h2>
<div class="outline-text-2" id="text-1">
<div class="figure">
<p><img src="/assets/docker-boot2docker-host.png" alt="docker-boot2docker-host.png" />
</p>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Building an Image</h2>
<div class="outline-text-2" id="text-2">
<p>
Build your image with the Dockerfile that is in this directory (<code>.</code>)
and give it the name (or tag) <code>acaird/flask</code> by typing:
</p>
<pre class="example">
docker build -t "acaird/flask" .
</pre>
<p>
The tag (the thing in the quotes) can be anything; see the Docker
documentation for more on naming conventions, versioning, etc.
</p>
<p>
The Dockerfile looks something like:
</p>
<pre class="example">
FROM centos:centos6
MAINTAINER Andrew Caird "acaird@gmail.com"
# Apply all the updates
RUN yum update -y
# Install Apache and mod_wsgi for our Flask app
RUN yum install httpd mod_wsgi -y
# Get the new packages and python27
RUN yum install centos-release-SCL -y
RUN yum install python27 -y
# Install pip then use it to install Flask and its dependencies
RUN (. /opt/rh/python27/enable && easy_install-2.7 pip && pip install flask)
# Copy in our flask-virthost config file
COPY webserver/flask-virthost.conf /etc/httpd/conf.d/
# Copy in our flask app and templates
ADD web-app-reports.py /var/www/softwarereport/
ADD templates/ /var/www/softwarereport/templates/
# Expose Flask's default port 5000
EXPOSE 5000
CMD cd /var/www/softwarereport && . /opt/rh/python27/enable && python2.7 web-app-reports.py
</pre>
<p>
The thing that matters most in this Dockerfile for the purpose of
networking is the <code>EXPOSE 5000</code> line; this isn’t technically
required (more later) but I think it’s nice to keep it here for
documentation.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Getting to the port on your Mac</h2>
<div class="outline-text-2" id="text-3">
<p>
Now you have to get the port (port 5000, in our example) forwarded
from your computer to the VirtualBox host’s port. These steps will
do that:
</p>
<ul class="org-ul">
<li>Open the VirtualBox GUI and select the computer called
<code>boot2docker-vm</code> from the list on the left.
</li>
<li>Then choose <b>Settings</b> from the <b>Machine</b> menu (or press
Command-S on a Mac).
</li>
<li>In the <b>Settings</b> window, choose the <b>Network</b> icon at the top,
then click the <b>Port Forwarding</b> button.
</li>
<li>In the table that is presented, click the weird looking little
<code>+</code> sign on the right to add a rule.
</li>
<li>You can name the rule anything, but type in <code>127.0.0.1</code> for the
<code>Host IP</code> column, and <code>5000</code> in both the <code>Host Port</code> and <code>Guest
Port</code> columns. You can leave the <code>Guest IP</code> field empty.
</li>
</ul>
<p>
You can do this while the <code>boot2docker-vm</code> image is running, if it’s
convenient for you.
</p>
<p>
Once this is done, VirtualBox will connect port 5000 on the Mac (or
Windows) computer to port 5000 on the <code>boot2docker-vm</code> server. This
is Link #2.
</p>
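<p>
If you would rather not click through the GUI, the same rule can
probably be added from the command line with <code>VBoxManage</code>. A small
sketch that builds the command; it assumes the VM’s NAT adapter is
adapter 1 (hence <code>natpf1</code>), and the rule name <code>flask</code> is made up.
</p>

```python
import subprocess


def natpf_command(vm_name="boot2docker-vm", rule_name="flask",
                  host_ip="127.0.0.1", host_port=5000, guest_port=5000):
    """Build a VBoxManage command that adds a NAT port-forwarding rule.

    The rule format is name,protocol,host IP,host port,guest IP,guest
    port; the guest IP is left empty, as in the GUI steps above.
    """
    rule = "{},tcp,{},{},,{}".format(rule_name, host_ip, host_port, guest_port)
    # "controlvm" changes a running VM; for a powered-off VM the
    # equivalent is "modifyvm <name> --natpf1 <rule>".
    return ["VBoxManage", "controlvm", vm_name, "natpf1", rule]


if __name__ == "__main__":
    command = natpf_command()
    print(" ".join(command))
    # Uncomment to actually apply the rule:
    # subprocess.check_call(command)
```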
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Running the Container</h2>
<div class="outline-text-2" id="text-4">
<p>
To start a container from the image and do the Docker container to
the <code>boot2docker-vm</code> host port-forwarding, type:
</p>
<pre class="example">
docker run -t -i -p 5000:5000 acaird/flask
</pre>
<p>
This will start an instance and forward port 5000 between the
VirtualBox host (<code>boot2docker-vm</code>) and the Docker container. If your
command is <code>/bin/bash</code>, the <code>-t -i</code> options will connect you to the
shell; if you are using the Dockerfile above, they will show you the
output from Flask.
</p>
<p>
This is Link #1.
</p>
<p>
Also, as promised, the reason the <code>EXPOSE</code> command in the Dockerfile
isn’t required is that the <code>-p 5000:5000</code> sets up the port
forwarding whether or not you have an <code>EXPOSE</code> statement; for more
see <a href="http://docs.docker.com/reference/builder/#expose">http://docs.docker.com/reference/builder/#expose</a>
</p>
<p>
If you are running a shell (and not the exact Dockerfile from
above), from here you can start the test Flask server by typing:
</p>
<pre class="example">
. /opt/rh/python27/enable
cd /var/www/softwarereport
python2.7 web-app-reports.py
</pre>
<p>
This loads the Python v2.7 environment and starts the Flask app with
Python’s web server. If all went well, you should see:
</p>
<pre class="example">
* Running on http://0.0.0.0:5000/
</pre>
<p>
And if everything went <i>super</i> well, you should be able to open a
web browser on your computer, go to <code>http://localhost:5000</code> and see
your Flask app’s <code>@app.route('/')</code> index page.
</p>
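<p>
If the browser just spins, it helps to find out which of the two
links is broken. A minimal sketch, using only Python’s standard
library, that checks whether anything is accepting connections on a
port; <code>port_is_open</code> is a hypothetical helper, not part of Docker or
boot2docker.
</p>

```python
import socket


def port_is_open(host="localhost", port=5000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    print(port_is_open("localhost", 5000))
```

<p>
Run it on the Mac to test both links together; run it inside the
<code>boot2docker-vm</code> (if Python is available there) to test only the
Docker link.
</p>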
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Recap</h2>
<div class="outline-text-2" id="text-5">
<p>
This isn’t supposed to be a Flask tutorial; it was just a handy way
to have a server. The goal is to point out that you have to get the
port of interest forwarded twice, once from the container to the
host (which is a virtual machine) and once from the virtual machine
to the physical machine that is running <code>boot2docker</code>. There are
lots of ways to do this; this is only one.
</p>
</div>
</div>
Salty vagrants, masters, and minions2014-09-30T00:00:00+00:00http://acaird.github.io/computers/2014/09/30/salt-vagrant<p>
There is some likelihood that the combination of Salt and Vagrant will
be useful to me in the near future, so I started to experiment with
it, and it’s all pretty nice.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">This file</h2>
<div class="outline-text-2" id="text-1">
<p>
This org-mode file exists on a pretend computer, called <i>minion</i>,
that is controlled by another pretend computer called <i>master</i>.
These two pretend computers are both running Linux. The two
pretend computers are running on a real MacBook Air and are managed
by Vagrant. The minion is controlled by the master via Salt.
</p>
<p>
There are literally billions of people in the world to whom the
previous paragraph makes no sense.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Making pretend masters and minions</h2>
<div class="outline-text-2" id="text-2">
<ul class="org-ul">
<li>Install <a href="https://www.virtualbox.org/">VirtualBox</a>
</li>
<li>Install <a href="https://www.vagrantup.com/">Vagrant</a>
</li>
<li>Install <a href="https://developer.apple.com/xcode/downloads/">Xcode</a>
</li>
<li>Follow the instructions at
<a href="https://github.com/dotless-de/vagrant-vbguest">https://github.com/dotless-de/vagrant-vbguest</a> to make sure your
VirtualBox guest additions match—this is important to make the
directory that is shared with the guests (<i>master</i> and <i>minion</i>)
work with the host. If you have Xcode and some luck on your
side, the command:
<pre class="example">
vagrant plugin install vagrant-vbguest
</pre>
<p>
will do the trick.
</p>
</li>
<li>Follow the instructions at
<a href="http://humankeyboard.com/saltstack/2014/saltstack-virtualbox-vagrant.html">http://humankeyboard.com/saltstack/2014/saltstack-virtualbox-vagrant.html</a>
but add <code>sudo</code> where needed, because they aren’t in the document
everywhere I expected them to be.
</li>
<li>Don’t worry about the bunch of Salt errors from the minion
provisioning, Salt still works. What matters is that <code>test.ping</code>
works from the master:
<pre class="example">
You@Your-Computer$ vagrant ssh master
[...]
vagrant@master:~$ sudo salt minion test.ping
minion:
True
</pre>
<p>
If you see that, Salt commands will work. I think if you had a
state tree, you wouldn’t see those errors, but I haven’t tested
that yet.
</p>
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Salting the minion</h2>
<div class="outline-text-2" id="text-3">
<p>
Obviously, you want to install Emacs and Org-mode on the minion. To
do that, create a file on the master called <code>/srv/salt/emacs.sls</code>
that contains:
</p>
<pre class="example">
emacs:
  pkg.installed
org-mode:
  pkg.installed
</pre>
<p>
Then type: <code>vagrant@master:~$ sudo salt minion state.sls emacs</code> and
wait. When it’s done, you’ll get a report of all of the things that
were installed (there are a lot of them), but then you can log into
the minion and run the Proper Command.
</p>
<pre class="example">
You@Your-Computer$ vagrant ssh minion
[...]
vagrant@minion:~$ emacs -f org-mode
</pre>
<p>
and you’ll be so happy.
</p>
</div>
</div>
Scalable Signature Images for Electronic Documents2014-08-29T00:00:00+00:00http://acaird.github.io/computers/2014/08/29/scalable-signatures<p>
Many people include an image of their signature in electronic
documents to make them look traditional, and because putting a PGP
signature block in a PDF file would be weird and not really that
useful. Most of these images don't look much like signatures—they
are bitmaps that don't scale well and look blocky and not really like
a signature at all. We can do better.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">The Bitmap to Vector Miracle called <code>potrace</code></h2>
<div class="outline-text-2" id="text-1">
<p>
A while ago <a href="https://github.com/trozamon">Alec Ten Harmsel</a> and I were talking about how cool it
would be if you could take a bitmap (a PNG or JPG file) and convert
it into a vector file (SVG or PostScript or PDF), and also how hard
it would be to do that. A little Googling turned up a program
that magically does this.
</p>
<p>
Peter Selinger's <a href="http://potrace.sourceforge.net">potrace</a> (<i>polygon trace</i>) takes a bitmap (a file in
PBM, PGM, PPM, or BMP format), applies a lot of <a href="http://potrace.sourceforge.net/potrace.pdf">math</a>, and writes out
a file in one of several formats (the most useful of which, for our
purposes, are Encapsulated PostScript (EPS) and PDF).
</p>
<p>
You will need to <a href="http://potrace.sourceforge.net/#downloading">download potrace</a> (and maybe some other tools) to
follow along, but the results will be worth it. If you are using an
Apple computer running OS X, you can use MacPorts or Homebrew to
install <code>potrace</code> and its dependencies.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Making an image file of your signature</h2>
<div class="outline-text-2" id="text-2">
<p>
There are three big steps to getting a scalable image file with your
signature in it.
</p>
<ol class="org-ol">
<li>Get a digital copy of your signature.
</li>
<li>Convert the file to one of <code>potrace</code>'s input formats.
</li>
<li>Use <code>potrace</code> to make an EPS or PDF file of your signature.
</li>
</ol>
</div>
<div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1">Getting a digital copy of your signature</h3>
<div class="outline-text-3" id="text-2-1">
<p>
There are countless ways to do this, but the four easiest ways are:
</p>
<ol class="org-ol">
<li>sign your name in a drawing app on a tablet, and email yourself
the image file.
</li>
<li>draw your signature in a drawing program on your computer, and
save the image file.
</li>
<li>sign your name on a piece of plain white paper and take a photo
of it with your camera phone and email the photo to yourself.
If you choose this way, make sure your signature is as dark as
possible and the paper is as white as possible; put the paper by
a window or other bright light source to take the photo.
</li>
<li>sign your name on a piece of white paper, scan it with a
document scanner, and email it to yourself. Even though the
scanner may send you a PDF, it isn't a scalable PDF, but a
bitmap wrapped in a PDF file.
</li>
</ol>
<p>
At the end of this process, you should have a file in PDF, JPG (or
JPEG), or PNG format that has your signature in it.
</p>
<p>
Once you have that file, you should crop it so that your signature
has a tight box around it. On a Mac, you can do this using
Preview.
</p>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2">Converting the file to one of <code>potrace</code>'s input formats</h3>
<div class="outline-text-3" id="text-2-2">
<p>
<code>potrace</code> takes a limited number of input formats (the
<code>potrace</code> FAQ explains the good reason why), so the next step is to
convert your signature file into one of them. On a Mac, you can
use the appropriate command line tool:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">If your file is</th>
<th scope="col" class="left">Convert it to</th>
<th scope="col" class="left"> </th>
</tr>
<tr>
<th scope="col" class="left">in this format</th>
<th scope="col" class="left">this format</th>
<th scope="col" class="left">By typing this</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">PNG</td>
<td class="left">PNM</td>
<td class="left"><code>pngtopnm file.png > file.pnm</code></td>
</tr>
<tr>
<td class="left">JPG / JPEG</td>
<td class="left">PNM</td>
<td class="left"><code>jpegtopnm file.jpg > file.pnm</code></td>
</tr>
<tr>
<td class="left">PDF</td>
<td class="left">PPM</td>
<td class="left"><code>pdftoppm file.pdf > file.ppm</code></td>
</tr>
</tbody>
</table>
<p>
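These tools (<code>pngtopnm</code> and <code>jpegtopnm</code> from Netpbm, <code>pdftoppm</code> from Poppler or Xpdf) are available on Macs via MacPorts or Homebrew, on Linux
via your preferred package manager, and probably also on Windows.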
</p>
<p>
At the end of this step, you should have a PNM or PPM version of
the file with the image of your signature in it.
</p>
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3">Use <code>potrace</code> to get a PDF file</h3>
<div class="outline-text-3" id="text-2-3">
<p>
<code>potrace</code> is a command-line tool (although graphical interfaces are
available) that takes many parameters; the two that matter for our
purposes are:
</p>
<ul class="org-ul">
<li><code>-b pdf</code> to specify that the output is to be in PDF format
</li>
<li><code>-o signature.pdf</code> to specify that the output file is to be
called <code>signature.pdf</code>; you can change that to anything you
want
</li>
</ul>
<p>
The complete command, assuming you have a PNM file called
<code>signature.pnm</code> and you want your output file to be called
<code>signature.pdf</code> is:
</p>
<div class="org-src-container">
<pre class="src src-bash">potrace -b pdf -o signature.pdf signature.pnm
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Results</h2>
<div class="outline-text-2" id="text-3">
<p>
You should now have a PDF file with a scalable version of your
signature in it. If you open the original PNG file and the
resulting PDF file and zoom in on them, you should see a big
difference:
</p>
<div class="figure">
<p><img src="/assets/sig-png-pdf-potrace.png" alt="sig-png-pdf-potrace.png" />
</p>
<p><span class="figure-number">Figure 1:</span> Comparing PNG input (top) and PDF output (bottom) files</p>
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Extra Bonus</h2>
<div class="outline-text-2" id="text-4">
</div><div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1">All in one command line</h3>
<div class="outline-text-3" id="text-4-1">
<p>
The command below will skip the generation of the PNM file and go
straight from the PNG file to the PDF file.
</p>
<div class="org-src-container">
<pre class="src src-bash">pngtopnm signature.png | potrace -b pdf -o signature.pdf
</pre>
</div>
<p>
On my system, this generates the warning
</p>
<pre class="example">
libpng warning: iCCP: known incorrect sRGB profile
</pre>
<p>
but that doesn't seem to have any effect on the output, so if you
see it, don't panic.
</p>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2">Using this signature in a LaTeX letter</h3>
<div class="outline-text-3" id="text-4-2">
<p>
If you are using a standard LaTeX letter format and pdflatex, you
should include the <code>graphicx</code> package and your signature line
should look like:
</p>
<div class="org-src-container">
<pre class="src src-latex"><span style="color: #a020f0;">\signature</span>{<span style="color: #a020f0;">\vspace</span>{-3em}<span style="color: #a020f0;">\includegraphics</span>[width=10em]{<span style="color: #483d8b;">sig.pdf</span>}<span style="color: #ff0000; font-weight: bold;">\\</span>Your Name}
</pre>
</div>
<p>
where <code>sig.pdf</code> is the PDF file containing the image of your
signature. You may have to adjust the <code>vspace</code> and the <code>width</code>
depending on the size of your signature and how you want it to
look.
</p>
<p>
If you are using the NewLFM LaTeX package and pdflatex for your
letters, you also need the <code>graphicx</code> package and your signature
line looks the same as above, but without the negative <code>vspace</code>:
</p>
<div class="org-src-container">
<pre class="src src-latex"><span style="color: #a020f0;">\signature</span>{<span style="color: #a020f0;">\includegraphics</span>[width=10em]{<span style="color: #483d8b;">sig.pdf</span>} <span style="color: #ff0000; font-weight: bold;">\\</span> Your Name}
</pre>
</div>
</div>
</div>
</div>
Org-Mode, MailMate, and Tables2014-06-27T00:00:00+00:00http://acaird.github.io/computers/2014/06/27/org-mailmate-tables<p>
MailMate supports writing emails in MarkDown and also offers an
extension to render tables; these get sent as a multi-part message
that is MarkDown and HTML, so other mail readers can render them,
too.
</p>
<p>
Org-Mode has, arguably, the best ASCII table editing environment, and
those tables can be written by hand or generated from SQL, R, Python,
etc. from within org-mode. Those tables can be rendered by org-mode
as HTML, LaTeX, plain text, markdown, and other formats.
</p>
<p>
Getting the org-mode tables rendered as an appropriate MarkDown table
that is suitable for insertion into MailMate is a great time saver for
the three people I know who might do that (Paul K., Matt B., and
me; if there is a fourth person, contact me, we can be
friends).
</p>
<p>
For example, this table looks pretty nice in HTML on this blog
(iidssm).
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Name</th>
<th scope="col" class="right">Age</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Andy</td>
<td class="right">43</td>
</tr>
<tr>
<td class="left">Michelle</td>
<td class="right">43</td>
</tr>
<tr>
<td class="left">Don</td>
<td class="right">69</td>
</tr>
<tr>
<td class="left">Maddie</td>
<td class="right">7</td>
</tr>
<tr>
<td class="left">Megan</td>
<td class="right">39</td>
</tr>
</tbody>
</table>
<p>
and what I typed was:
</p>
<pre class="example">
#+ORGTBL: SEND ages orgtbl-to-gfm
| Name | Age |
|----------+-----|
| Andy | 43 |
| Michelle | 43 |
| Don | 69 |
| Maddie | 7 |
| Megan | 39 |
</pre>
<p>
To go from the org table to the MarkDown table, these are the steps:
</p>
<ol class="org-ol">
<li>Paste the contents of <a href="https://gist.github.com/grafov/8244792">this Gist</a> into your <code>.emacs</code> file:
<script src="https://gist.github.com/grafov/8244792.js"></script>
</li>
<li>Type or generate your table in org-mode, including the
“<code>#+ORGTBL: SEND</code>” line.
</li>
<li>Create a place for the MarkDown table to go, perhaps into a
<code>#+BEGIN/END COMMENT</code> block, using the <code>BEGIN/END RECEIVE...</code>
lines.
</li>
<li>Switch from Org-mode to text mode: <code>M-x text-mode</code>
</li>
<li>Enter the orgtbl minor mode: <code>M-x orgtbl-mode</code>
</li>
<li>Move the point (cursor) to the <code>ORGTBL: SEND</code> line and press
<code>C-c C-c</code> to generate the markdown table
</li>
<li>Switch back to Org-mode: <code>M-x org-mode</code>
</li>
</ol>
<p>
That will produce this output
</p>
<pre class="example">
# BEGIN RECEIVE ORGTBL ages
| Name | Age |
|---|--:|
| Andy | 43 |
| Michelle | 43 |
| Don | 69 |
| Maddie | 7 |
| Megan | 39 |
# END RECEIVE ORGTBL ages
</pre>
<p>
Now you can copy the MarkDown table to your MailMate message and it
will render as a table.
</p>
<div class="figure">
<p><img src="/assets/mailmate-table.png" alt="mailmate-table.png" />
</p>
</div>
Me, in a hardhat2014-04-12T00:00:00+00:00http://acaird.github.io/computers/2014/04/12/hardhat<p>
This is me in a hardhat.
</p>
<div class="figure">
<p><img src="/assets/acaird-hardhat.jpg" alt="Me wearing an IBM hardhat." />
</p>
</div>
<p>
This has more to do with computing than you might think.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Nice Hat</h2>
<div class="outline-text-2" id="text-1">
<p>
I am cleaning out my office, in a commitment to minimalism. A lot
of my striving for minimalism was brought about by cloud-based
services; if I can't do it in <a href="http://www.orgmode.org">Emacs Org Mode</a> and put it in GitHub or
do it in a Google doc, I don't want to do it; if I can, all of that
is in the cloud and my very expensive, but very nice, MacBook Air is
just a cloud access device. All of that is a different blog post.
I am cleaning out my office. I found this hardhat that I've been
keeping for years. I put it on to wear to get rid of it when I ran
into one of my excellent co-workers, Delisa, and she asked why I had
a hardhat.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">The story of the hardhat</h2>
<div class="outline-text-2" id="text-2">
<p>
In the olden days of the mid-1990s I worked at <a href="http://caen.engin.umich.edu/">CAEN</a> (oh, wait, I
also work there now) as the only student employee of the only
full-time employee (Paul McClay) who was responsible for the
high-performance computing environment. Paul went back to school to
study <a href="https://www.si.umich.edu/">Information</a>, and I became the only full-time employee (at
which point I hired the excellent <code>rebshol</code> (Becky Hollenbeck), but
the excellence of <code>rebshol</code> and U-M Engineering students in general
is also another blog post).
</p>
<p>
High-performance computing systems have always placed extraordinary
demands on the data centers in which they are housed, even in the
mid-1990s. However, U-M was building a new building, the
Integrated Technology Information Center (ITIC), with a very nice
modern data center. And we had a big computer that needed a nice
modern data center, a 32-node <a href="http://en.wikipedia.org/wiki/IBM_Scalable_POWERparallel">IBM SP2</a> (yes, at the time that was
big). As with all things, it was <i>critical</i> and <i>urgent</i> that we
move this computer out of its current data center. So we moved it
into ITIC. Along with it, my office moved, making me the first
denizen of the building.
</p>
<p>
Hey, when does the hardhat come in? Is this just a story of the
move of some ancient computer?
</p>
<p>
The building wasn't done. The data center was done, some offices
were done, but the building wasn't. I wasn't allowed into my
office or to work on the computer because it was a construction site
with no Certificate of Occupancy. The computer needed a lot of
work. I needed to get to it.
</p>
<p>
After some negotiation about a scrawny kid who was clearly not a
construction worker, I was allowed to go to my office, but only if I
wore a hardhat to and from. I felt quite like a bad-ass doing
that. I wasn't, but that's beside the point. I had a hardhat.
Wicked.
</p>
<p>
A few months later the building was done, a bunch of other people
moved in, and I didn't need to wear a hardhat to work any longer.
Things were more boring.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Getting rid of the hardhat</h2>
<div class="outline-text-2" id="text-3">
<p>
This is the true story I told to Delisa, who then said I couldn't
get rid of the hat, because it was a good prop for a good story.
</p>
<p>
I'm hoping that this documentation will be enough so that she will
see her way clear to give me permission to retire the hat.
</p>
<p>
Minimalism vs. Memories.
</p>
<p>
This is why people keep things: fear of forgetting. A white piece
of plastic will bring those interesting old days to mind every time
I see it. I'm still young enough to be confident there are more
interesting days coming, so I'm OK with moving on, but only OK, not
certain. That is why minimalism is scary.
</p>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Some postscript notes</h2>
<div class="outline-text-2" id="text-4">
<p>
ITIC had its name changed before it was completed to the University
of Michigan Media Union (UMMU), because it is a place that was
supposed to bring together all sorts of media. Some years later,
Dr. James Duderstadt (go Nuclear Engineers!!) and his wife Anne made
a donation to the University which changed the name of that building
to the Duderstadt Center. Many people now call it <i>The Dude</i>. And
almost two decades and many jobs later, I do not need to wear a
hardhat to work, although some days it seems like I should.
</p>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Post-postscript</h2>
<div class="outline-text-2" id="text-5">
<p>
Delisa agreed that I've met my duty to memory, and could get rid of
the hat:
</p>
<p class="verse">
From: Delisa<br />
Date: Sun, Apr 13, 2014 at 2:36 PM<br />
Subject: Re: now can I get rid of the hat?<br />
To: Andrew Caird<br />
<br />
Awesome! I think it is now safe to get rid of the hat. :-)<br />
<br />
-Delisa<br />
</p>
</div>
</div>
Slopegraphs in R with ggplot22013-11-27T00:00:00+00:00http://acaird.github.io/computers/r/2013/11/27/slopegraphs-ggplot<p>
Slopegraphs have seen some recent attention on <a href="http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk">Edward Tufte's forum</a>
and in the data visualization community, especially Charlie Park's
<a href="http://charliepark.org/slopegraphs/">excellent treatment</a> of them. In this post, I make a simple slopegraph
using less than 20 lines of R and <code>ggplot2</code>.
</p>
<p>
Slopegraphs are very simple—there is no (as ET says) chartjunk.
</p>
<p>
There are examples of making slopegraphs in ggplot
(<a href="http://www.jameskeirstead.ca/blog/slopegraphs-in-r/">http://www.jameskeirstead.ca/blog/slopegraphs-in-r/</a>,
<a href="http://www.r-bloggers.com/slopegraphs-in-r-2/">http://www.r-bloggers.com/slopegraphs-in-r-2/</a>,
<a href="https://github.com/bobthecat/codebox/blob/master/table.graph.r">https://github.com/bobthecat/codebox/blob/master/table.graph.r</a>) all
over the web, but what's the web if it doesn't have one more example?
</p>
<p>
Plus, mine is simple—less than 20 lines of R code if you don't
include the initialization of the data.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Load the libraries</h2>
<div class="outline-text-2" id="text-1">
<p>
I obviously need the <code>ggplot2</code> library, otherwise I would be lying
in the title. I also load the <code>scales</code> library so I can label the
points nicely by putting commas in the numbers.
</p>
<div class="org-src-container">
<pre class="src src-R" id="libraries">library(ggplot2)
library(scales)
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Load the data</h2>
<div class="outline-text-2" id="text-2">
<p>
Likely you'll load the data from a file into a data frame, and it
doesn't matter too much what it looks like, as long as each row has
the start value, the end value, and the label somewhere in it. Or
it could be three different variables in other places.
</p>
<p>
You'll also need a variable for the x-axis. In my case, this is 24
months (see: <code>months<-24</code>)
</p>
<div class="org-src-container">
<pre class="src src-R" id="constants">months<-24
year1<-c(1338229205,5212325386,31725112511)
year3<-c(1372425378,8836570075,49574919628)
group<-c("Group C", "Group B", "Group A")
a<-data.frame(year1,year3,group)
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Set the arrays of labels</h2>
<div class="outline-text-2" id="text-3">
<p>
This makes the strings we'll use for the labels by combining the
<code>group</code> name with the value at each end of our slope line.
</p>
<p>
I also use the <code>comma_format()()</code> function from the <code>scales</code>
library here to make slightly cleaner looking labels.
</p>
<p>
If you prefer a more Tufte-esque label, you can omit the newline
from the <code>sep=</code> option and flip the order of the second set of
labels.
</p>
<div class="org-src-container">
<pre class="src src-R" id="labels">l11<-paste(a$group,comma_format()(round(a$year1/(3600*24*30.5))),sep="\n")
l13<-paste(a$group,comma_format()(round(a$year3/(3600*24*30.5))),sep="\n")
</pre>
</div>
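<p>
A note on the arithmetic, as a sketch: <code>3600*24*30.5</code> is the number
of seconds in a 30.5-day month, so (assuming the raw values are in
seconds) the labels show months. The snippet below checks one value
by hand; it uses base R's <code>format()</code> as a stand-in for
<code>comma_format()</code> so that it runs without <code>scales</code>.
</p>
<div class="org-src-container">
<pre class="src src-R">secs_per_month<-3600*24*30.5          # 2635200 seconds in a 30.5-day month
months_used<-round(49574919628/secs_per_month)
months_used                            # 18813
format(months_used, big.mark=",")      # "18,813"
</pre>
</div>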
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Draw the slopelines</h2>
<div class="outline-text-2" id="text-4">
<p>
This line is pretty simple: it draws the slopelines with
<code>geom_segment</code>, where the x-range is 0 to <code>months</code> and the y-range
comes from the <code>a</code> data frame values.
</p>
<div class="org-src-container">
<pre class="src src-R" id="slopelines">p<-ggplot(a) + geom_segment(aes(x=0,xend=months,y=year1,yend=year3),size=.75)
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Set the theme to be nothingness</h2>
<div class="outline-text-2" id="text-5">
<p>
These settings turn off all of the default <code>ggplot2</code> decorations
(chartjunk?).
</p>
<div class="org-src-container">
<pre class="src src-R" id="theme">p<-p + theme(panel.background = element_blank())
p<-p + theme(panel.grid=element_blank())
p<-p + theme(axis.ticks=element_blank())
p<-p + theme(axis.text=element_blank())
p<-p + theme(panel.border=element_blank())
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-6" class="outline-2">
<h2 id="sec-6">Set the axis labels and limits</h2>
<div class="outline-text-2" id="text-6">
<p>
The x label is empty because we'll use the column headings to
denote the time span (or whatever x-range you have). The y label is
still useful, and <code>element_text(vjust=X)</code> adjusts its
position; you'll want to play with this for your own
plot until it looks correct to you.
</p>
<p>
Also, I make the plot area a little bigger by setting <code>xlim</code> and
<code>ylim</code> to be bigger than the data would imply. This is also a bit
of art, so you'll want to look at this for your own chart, too.
</p>
<div class="org-src-container">
<pre class="src src-R" id="axisLabelsAndLimits">p<-p + xlab("") + ylab("Amount Used")
p<-p + theme(axis.title.y=element_text(vjust=3))
p<-p + xlim((0-12),(months+12))
p<-p + ylim(0,(1.2*(max(a$year3,a$year1))))
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-7" class="outline-2">
<h2 id="sec-7">Label the slopelines</h2>
<div class="outline-text-2" id="text-7">
<p>
Here we use the labels we created above in the <code>geom_text()</code>
function, once for each side of the slopeline. The first line
below is for the right side (<code>year3</code>) of the chart and the second
is for the left side (<code>year1</code>).
</p>
<p>
The <code>rep.int</code> command that sets <code>x</code> just repeats the
number—either <code>0</code> or <code>24</code>—to make the same number of elements
in <code>x</code> as there are in <code>y</code>.
</p>
<p>
Again, the <code>hjust</code> and <code>size</code> parameters will need some attention
to get your slopegraph to look just right.
</p>
<div class="org-src-container">
<pre class="src src-R" id="slopelineLabels">p<-p + geom_text(label=l13, y=a$year3, x=rep.int(months,nrow(a)),hjust=-0.2,size=3.5)
p<-p + geom_text(label=l11, y=a$year1, x=rep.int( 0,nrow(a)),hjust=1.2,size=3.5)
</pre>
</div>
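<p>
A small sketch of why the repeat count deserves a second look: for a
data frame, <code>length()</code> counts columns while <code>nrow()</code> counts
rows. In this post's data both happen to be 3, but they differ as
soon as the number of groups changes. The data frame <code>b</code> below is
made up for illustration.
</p>
<div class="org-src-container">
<pre class="src src-R">b<-data.frame(year1=c(10,20), year3=c(15,40), group=c("Group B","Group A"))
length(b)               # 3 (columns)
nrow(b)                 # 2 (rows)
rep.int(24, nrow(b))    # 24 24, one x value per row
</pre>
</div>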
</div>
</div>
<div id="outline-container-sec-8" class="outline-2">
<h2 id="sec-8">Label the columns</h2>
<div class="outline-text-2" id="text-8">
<p>
The column titles, or labels at the top of each side of the
slopegraph, are set here using <code>geom_text()</code> and setting <code>y</code> to
just a bit above the maximum value from the two lists of values
(<code>year1</code> and <code>year3</code>). Don't forget to pay attention to <code>hjust</code>
and <code>size</code> again.
</p>
<div class="org-src-container">
<pre class="src src-:exports" id="columnLabels">p<-p + geom_text(label="Year 1", x=0, y=(1.1*(max(a$year3,a$year1))),hjust= 1.2,size=5)
p<-p + geom_text(label="Year 3", x=months,y=(1.1*(max(a$year3,a$year1))),hjust=-0.1,size=5)
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-9" class="outline-2">
<h2 id="sec-9">Finally, some graphics</h2>
<div class="outline-text-2" id="text-9">
<p>
In the end, this R code produces this figure:
</p>
<div class="figure">
<p><img src="/assets/group-slopes.png" alt="group-slopes.png" />
</p>
</div>
</div>
</div>
<div id="outline-container-sec-10" class="outline-2">
<h2 id="sec-10">The entire script, suitable for copying-and-pasting is here</h2>
<div class="outline-text-2" id="text-10">
<div class="org-src-container">
<pre class="src src-R" id="wholeScript">library(ggplot2)
library(scales)
months<-24
year1<-c(1338229205,5212325386,31725112511)
year3<-c(1372425378,8836570075,49574919628)
group<-c("Group C", "Group B", "Group A")
a<-data.frame(year1,year3,group)
l11<-paste(a$group,comma_format()(round(a$year1/(3600*24*30.5))),sep="\n")
l13<-paste(a$group,comma_format()(round(a$year3/(3600*24*30.5))),sep="\n")
p<-ggplot(a) + geom_segment(aes(x=0,xend=months,y=year1,yend=year3),size=.75)
p<-p + theme(panel.background = element_blank())
p<-p + theme(panel.grid=element_blank())
p<-p + theme(axis.ticks=element_blank())
p<-p + theme(axis.text=element_blank())
p<-p + theme(panel.border=element_blank())
p<-p + xlab("") + ylab("Amount Used")
p<-p + theme(axis.title.y=element_text(vjust=3))
p<-p + xlim((0-12),(months+12))
p<-p + ylim(0,(1.2*(max(a$year3,a$year1))))
p<-p + geom_text(label=l13, y=a$year3, x=rep.int(months,nrow(a)),hjust=-0.2,size=3.5)
p<-p + geom_text(label=l11, y=a$year1, x=rep.int( 0,nrow(a)),hjust=1.2,size=3.5)
p<-p + geom_text(label="Year 1", x=0, y=(1.1*(max(a$year3,a$year1))),hjust= 1.2,size=5)
p<-p + geom_text(label="Year 3", x=months,y=(1.1*(max(a$year3,a$year1))),hjust=-0.1,size=5)
p
</pre>
</div>
</div>
</div>
R is for Running2013-08-16T00:00:00+00:00http://acaird.github.io/running/computers/2013/08/16/5k-trend<p>
I recently ran a small 5k race in Ann Arbor, MI called the <a href="http://www.uap-ppubcrawl.com/">UA
Plumbers and Pipefitters 5k</a>. It raised money for the <a href="http://semperfifund.org/">Semper Fi
Fund</a>, which is a great cause. It also had an amazing logo of a
running U-shaped trap pipe, and I really wanted the t-shirt and
medal with that logo on it.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">5k</h2>
<div class="outline-text-2" id="text-1">
<p>
This race had a 6:50pm start, which is unusual, but sort of a nice
time, if you ask me, and it was a nice evening—warm, but not too
hot, and humid, but not too humid.
</p>
<p>
I ended up having a nice race, which prompted me to look up my past
times:
</p>
<table id="5ktimes" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption class="t-above"><span class="table-number">Table 1:</span> 5k Times</caption>
<colgroup>
<col class="left" />
<col class="left" />
<col class="right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Event</th>
<th scope="col" class="left">Date</th>
<th scope="col" class="right">Time</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Turkey Trot</td>
<td class="left">11/25/2010</td>
<td class="right">23:09</td>
</tr>
<tr>
<td class="left">Turkey Trot</td>
<td class="left">11/24/2011</td>
<td class="right">22:46</td>
</tr>
<tr>
<td class="left">Turkey Trot</td>
<td class="left">11/22/2012</td>
<td class="right">21:09</td>
</tr>
<tr>
<td class="left">Gallup Gallop</td>
<td class="left">7/14/2013</td>
<td class="right">20:37</td>
</tr>
<tr>
<td class="left">Plumbers and Pipefitters</td>
<td class="left">8/12/2013</td>
<td class="right">19:19</td>
</tr>
</tbody>
</table>
<p>
Noticing that every time was faster, I thought I'd make a plot,
since it would show a trend—a trend that I liked, since I got
faster every time. This might be a different blog post if there were
other trends, one with “sample data”.
</p>
<div class="figure">
<p><img src="/assets/5k-times.svg" alt="5k-times.svg" />
</p>
<p><span class="figure-number">Figure 1:</span> 5k Times Trend</p>
</div>
<p>
That's a nice graph, if I do say so myself.
</p>
<p>
And my interpretation of the trend and spread is that I ran faster
than expected, which means I can run slower in my next 5k and still
maintain the trend. Yay for running slower!
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Making Graphs of Running Times with R</h2>
<div class="outline-text-2" id="text-2">
<p>
Because I like to make plots with the <a href="http://www.r-project.org">R</a> software for statistical
computing and graphics, that's what I used to make that plot, and
because this would be an <i>even more</i> self-centered blog post if I
didn't share something with you, following are the steps to make
that plot with your own running times.
</p>
</div>
<div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1">The one non-standard library and our data</h3>
<div class="outline-text-3" id="text-2-1">
<p>
The first step is to get the library we need— <code>ggplot2</code> —and
load the data:
</p>
<div class="org-src-container">
<pre class="src src-R" id="initialize">library(ggplot2)
r<-read.table("file.txt", header=TRUE, sep="|")
</pre>
</div>
<p>
Representing dates in R is pretty simple, but representing times is
a little trickier.
</p>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2">Getting the data just so</h3>
<div class="outline-text-3" id="text-2-2">
<p>
The next two statements convert the dates in the
table into dates that R understands and convert the times to
seconds for the sake of the plot.
</p>
<div class="org-src-container">
<pre class="src src-R" id="convert-dates-times">r$Date<-as.Date(r$Date,format='%m/%d/%Y')
r$Times<-(as.numeric(as.POSIXct(strptime(r$Time, format="%M:%OS"))) -
as.numeric(as.POSIXct(strptime("0", format="%S")))
)
</pre>
</div>
<p>
The second statement is the result of some Google searching and
StackExchange finding, but in the end it converts the <code>MM:SS</code>
formatted times into seconds and stores them in <code>Times</code> (note the
extra <code>s</code> to denote seconds).
</p>
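<p>
If the <code>strptime</code> arithmetic feels opaque, the same conversion can be
sketched by splitting each time on the colon. The helper
<code>mmss_to_secs</code> is hypothetical (it is not part of the script in this
post) and assumes every time has the form <code>MM:SS</code>.
</p>
<div class="org-src-container">
<pre class="src src-R"># hypothetical helper: convert "MM:SS" strings to seconds
mmss_to_secs<-function(x) {
  parts<-strsplit(as.character(x), ":", fixed=TRUE)
  sapply(parts, function(p) as.numeric(p[1])*60 + as.numeric(p[2]))
}
mmss_to_secs(c("23:09","19:19"))   # 1389 1159
</pre>
</div>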
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3">Setting up the y-labels</h3>
<div class="outline-text-3" id="text-2-3">
<p>
We want the y-labels back in our <code>MM:SS</code> format, and it would be
nice, for a small amount of data, to label the y-axis of every
point.
</p>
<div class="org-src-container">
<pre class="src src-R" id="make-y-labels">secs<-c(r$Times,seq(from=18*60, to=max(r$Times)+120, by=60*1))
labels<-paste((as.integer(secs/60)),
formatC(round((secs/60 - as.integer(secs/60)) * 60),
width=2,
flag="0"),
sep=":")
</pre>
</div>
<p>
First we make a vector called <code>secs</code> that has my run times,
converted to seconds, and then some “normal” times (19:00,
20:00, etc) converted to seconds. The line:
</p>
<div class="org-src-container">
<pre class="src src-R">seq(from=18*60, to=max(r$Times)+120, by=60*1)
</pre>
</div>
<p>
makes a sequence of numbers that starts at eighteen minutes (because I'm
confident I'll never run an 18:00 5k) and ends at two minutes more
than my slowest time (this leaves room on the plot for labels and
frames the times). The labels will be one minute apart
(<code>by=60*1</code>). That sequence defines the y-axis points, but would
make for non-intuitive labels.
</p>
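<p>
As a quick check of that sequence, here is a sketch that assumes the
slowest time is the 23:09 from Table 1 (1389 seconds); the names
<code>slowest</code> and <code>ticks</code> are only for illustration.
</p>
<div class="org-src-container">
<pre class="src src-R">slowest<-23*60 + 9                               # 1389 seconds
ticks<-seq(from=18*60, to=slowest+120, by=60*1)  # 1080, 1140, ..., 1500
ticks/60                                         # 18 19 20 21 22 23 24 25
</pre>
</div>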
<p>
The next line creates a vector called <code>labels</code> that converts the
seconds into the format <code>MM:SS</code> by <code>paste</code>-ing together minutes
and seconds separated by a colon (<code>sep=":"</code>). To get minutes, we
simply take the integer part of <code>secs</code> divided by 60, and that's
the first half of our paste. The second half of the paste also
needs to be padded with leading zeros if it isn't long enough
(otherwise your time might be 20:9 instead of 20:09), so we use
the <code>formatC</code> function with the options: our number, <code>width=2</code> (pad
to two characters), and <code>flag="0"</code> (pad with 0s). Our number is the
decimal part of (<code>secs</code> divided by 60), multiplied by 60 to get
seconds and rounded to the nearest integer.
</p>
<p>
At this point we have two vectors: <code>secs</code> and <code>labels</code> that match
each other—one has seconds and one has <code>MM:SS</code>, each in the same
location in the vector.
</p>
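<p>
A more compact way to build the same kind of label, offered as an
alternative sketch and not what the script above does, is <code>sprintf</code>
with integer division and modulo. It assumes the times are whole
seconds; the sample values are picked for illustration.
</p>
<div class="org-src-container">
<pre class="src src-R">secs_demo<-c(1269, 1159, 1080)                  # 21:09, 19:19, 18:00
labels_demo<-sprintf("%d:%02d",
                     as.integer(secs_demo %/% 60),   # whole minutes
                     as.integer(secs_demo %% 60))    # leftover seconds, zero-padded
labels_demo                                     # "21:09" "19:19" "18:00"
</pre>
</div>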
</div>
</div>
<div id="outline-container-sec-2-4" class="outline-3">
<h3 id="sec-2-4">Using the data to make a pretty plot</h3>
<div class="outline-text-3" id="text-2-4">
<p>
At this point, we have all the data we need in the R data frame (a
data frame is like one sheet in an Excel spreadsheet) called <code>r</code>,
some labels in <code>secs</code> and <code>labels</code> and all we have left to do is use
<code>ggplot2</code> to plot it.
</p>
<p>
<code>ggplot2</code> builds a plot piece by piece, which is nice for making
incremental changes, and also nice for explaining, since each piece
stands on its own.
</p>
<div class="org-src-container">
<pre class="src src-R" id="plot-data-and-steps">plot = ggplot(r, aes(x=Date,y=Times,label=Event)) + geom_step()
</pre>
</div>
<p>
This line creates the <code>plot</code> object (although you can call it
whatever you want, it's a normal R variable) and starts the
<code>ggplot</code> process by telling it: "We're using the <code>r</code> data frame, and
aesthetically we are going to use <code>Date</code> for the x-data and <code>Times</code>
(our Time converted to seconds) for the y-data, and we're going to
label it with the <code>Event</code> name." To draw a line, we want steps, not
a series of slopes, so we add <code>geom_step()</code> to the plot.
</p>
<p>
Next we add the text for the <code>label=</code> we specified above and set a
size (3) and a vertical adjustment so they are above the point
(<code>vjust=-0.5</code>):
</p>
<div class="org-src-container">
<pre class="src src-R" id="plot-add-text">plot = plot + geom_text(size=3,vjust=-0.5)
</pre>
</div>
<p>
The x-axis can be a little too short to leave room for the long
race names in the labels, so we add a little on each end, by
subtracting from the minimum date (<code>min(r$Date)</code>) and adding to the
maximum date (<code>max(r$Date)</code>). The amount added is a guess based on
the size of the labels of the first and latest races.
</p>
<div class="org-src-container">
<pre class="src src-R" id="plot-set-xlim">plot = plot + xlim((min(r$Date)-60),max(r$Date)+90)
</pre>
</div>
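<p>
The 60 and 90 are days, because adding a number to or subtracting a
number from an R <code>Date</code> moves it by that many days. A small sketch
with two made-up dates (the vector <code>d</code> is invented for
illustration):
</p>
<div class="org-src-container">
<pre class="src src-R"># two hypothetical race dates, parsed with the same format string as r$Date
d <- as.Date(c("04/01/2013", "07/04/2013"), format="%m/%d/%Y")
left <- min(d) - 60    # 60 days before the earliest date
right <- max(d) + 90   # 90 days after the latest date
print(c(left, right))  # "2013-01-31" "2013-10-02"
</pre>
</div>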
<p>
Then we add a point for each race along the step line and also a
fitted trend line with its confidence band (the gray area in the
plot) to get some sort of prediction of the range.
</p>
<div class="org-src-container">
<pre class="src src-R" id="plot-add-points-and-smooth">plot = plot + geom_point() + stat_smooth(method="glm")
</pre>
</div>
<p>
Lastly, we use the <code>secs</code> and <code>labels</code> from above to make y-axis
labels, and set the range of the y-axis to be between 18 minutes
(<code>60*18</code>) (since I don't think I'll break an 18-minute 5k) and the
slowest time (<code>max(r$Times)</code>); turn off the x-axis label, since the
fact that it is dates is pretty evident; and then call <code>plot</code> to
draw the plot.
</p>
<div class="org-src-container">
<pre class="src src-R" id="plot-xylabels-plot">plot = plot + scale_y_continuous(breaks=secs,
labels=labels,
limits=c(60*18,max(r$Times)))
plot = plot + xlab("")
plot
</pre>
</div>
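<p>
If you want to try the same piece-by-piece pattern without a data
file, here is a minimal sketch on a made-up three-race data frame.
Everything in <code>demo</code>, plus the names <code>brk</code>, <code>lab</code>, and <code>p</code>, is
invented for illustration, and the smoothing layer is left out because
three points are too few for it to mean much:
</p>
<div class="org-src-container">
<pre class="src src-R">library(ggplot2)
# hypothetical data: three races with times already in seconds
demo <- data.frame(Date=as.Date(c("2013-04-01", "2013-05-15", "2013-07-04")),
                   Times=c(1500, 1440, 1410),
                   Event=c("Race A", "Race B", "Race C"))
# y-axis breaks every minute from 23:00 to 26:00, with MM:SS labels
brk <- seq(from=23*60, to=26*60, by=60)
lab <- sprintf("%d:%02d", brk %/% 60, brk %% 60)
# data and aesthetics first, then one layer or setting per step
p <- ggplot(demo, aes(x=Date, y=Times, label=Event)) + geom_step()
p <- p + geom_point() + geom_text(size=3, vjust=-0.5)
p <- p + scale_y_continuous(breaks=brk, labels=lab, limits=range(brk))
p <- p + xlab("")
p  # printing the object draws the plot
</pre>
</div>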
</div>
</div>
<div id="outline-container-sec-2-5" class="outline-3">
<h3 id="sec-2-5">The final R script</h3>
<div class="outline-text-3" id="text-2-5">
<div class="org-src-container">
<pre class="src src-R">library(ggplot2)
r<-read.table("file.txt", header=TRUE, sep="|")
r$Date<-as.Date(r$Date,format='%m/%d/%Y')
r$Times<-(as.numeric(as.POSIXct(strptime(r$Time, format="%M:%OS"))) -
as.numeric(as.POSIXct(strptime("0", format="%S")))
)
secs<-c(r$Times,seq(from=18*60, to=max(r$Times)+120, by=60*1))
labels<-paste((as.integer(secs/60)),
formatC(round((secs/60 - as.integer(secs/60)) * 60),
width=2,
flag="0"),
sep=":")
plot = ggplot(r, aes(x=Date,y=Times,label=Event)) + geom_step()
plot = plot + geom_text(size=3,vjust=-0.5)
plot = plot + xlim((min(r$Date)-60),max(r$Date)+90)
plot = plot + geom_point() + stat_smooth(method="glm")
plot = plot + scale_y_continuous(breaks=secs,
labels=labels,
limits=c(60*18,max(r$Times)))
plot = plot + xlab("")
plot
</pre>
</div>
</div>
</div>
</div>
My Food and Wine Pairing Advice2013-08-02T00:00:00+00:00http://acaird.github.io/drinking/2013/08/02/wine-food-pairing<blockquote>
<p>
Leonardo da Vinci said "Simplicity is the ultimate
sophistication." I must be incredibly sophisticated.
</p>
</blockquote>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Traditional Food and Wine Pairing</h2>
<div class="outline-text-2" id="text-1">
</div><div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1">Books</h3>
<div class="outline-text-3" id="text-1-1">
<p>
According to Amazon, there are 1,328 books that are classified as
being about food and wine pairing. That is not simplicity.
</p>
<div class="figure">
<p><img src="/assets/amazon-food-wine-pairing.png" alt="amazon-food-wine-pairing.png"/></p>
</div>
</div>
</div>
<div id="outline-container-sec-1-2" class="outline-3">
<h3 id="sec-1-2">Pictures</h3>
<div class="outline-text-3" id="text-1-2">
<p>
<a href="http://shop.winefolly.com/products/basic-food-wine-pairing">Wine Folly</a> has a really nice-looking poster about food and wine
pairing. It's great art, but it's not simple.
<img src="/assets/prints-wine-and-food-pairing.jpg" alt="prints-wine-and-food-pairing.jpg"/>
</p>
</div>
</div>
<div id="outline-container-sec-1-3" class="outline-3">
<h3 id="sec-1-3">Rules</h3>
<div class="outline-text-3" id="text-1-3">
<p>
Traditionally, one could rely on the <i>red with meat, white with
fish and fowl</i> rule, which is pretty simple, but not yet
sophisticated.
</p>
</div>
</div>
<div id="outline-container-sec-1-4" class="outline-3">
<h3 id="sec-1-4">The Problem</h3>
<div class="outline-text-3" id="text-1-4">
<p>
This much complexity leads to suffering. And I'm here to help end
this suffering.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Sophisticated Food and Wine Pairing</h2>
<div class="outline-text-2" id="text-2">
</div><div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1">New rule</h3>
<div class="outline-text-3" id="text-2-1">
<p>
whatever food you like + whatever wine you have = perfect food and
wine pairing
</p>
<p>
If that isn't clear, I have a diagram for you:
</p>
<div class="figure">
<p><img src="/assets/pairing-food-with-wine.png" alt="pairing-food-with-wine.png"/></p>
</div>
<p>
Sometimes, the food is optional.
</p>
<div class="figure">
<p><img src="/assets/fruit-salad.png" alt="fruit-salad.png"/></p>
</div>
<p>
So.phis.ti.ca.tion.
</p>
</div>
</div>
</div>
Two out of Three Two Lads2013-07-21T00:00:00+00:00http://acaird.github.io/drinking/2013/07/21/two-lads-sparkling-wine<blockquote>
<p>
Two Lads Winery is a Michigan winery that makes a very nice sparkling
Pinot Grigio, but, like many lads, it doesn't keep the juice in the
bottle when it should.
</p>
</blockquote>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Another Michigan Sparkling</h2>
<div class="outline-text-2" id="text-1">
<div class="figure">
<p><img src="/assets/2lads-bottles.jpg" align="right" alt="2lads-bottles.jpg"/></p>
</div>
<p>
Earlier this week I was talking to my friend John about his trip to
California and the wines he and his girlfriend tried. Having been
to Sonoma earlier this year, it was nice to compare notes and hear
about different wineries.
</p>
<p>
Being in Michigan, and being True to the Mitten (even if the bottom
right cuff is <a href="http://www.nytimes.com/2013/07/19/us/detroit-files-for-bankruptcy.html">a bit frayed</a>), we got on the topic of Michigan wines.
</p>
<p>
I'm quite fond of sparkling wine, and was talking to John about the
<a href="http://www.lmawby.com/">Mawby</a> wines from Northern Michigan and our most excellent experience
at <a href="http://www.domainecarneros.com/">Domaine Carneros</a> in Sonoma.
</p>
<p>
While I've never been to the Northern Michigan wineries, despite
having lived just north of them for 15 years, John has been. He
mentioned that he really enjoyed the <a href="http://www.2lwinery.com/">Two Lads</a> sparkling Pinot Grigio
and that it was available in Ann Arbor at a few places (<a href="http://www.producestation.com/">The Produce
Station</a> and <a href="http://www.wholefoodsmarket.com/stores/annarbor">Whole Foods</a>, among others).
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">The Beginning of the End</h2>
<div class="outline-text-2" id="text-2">
<p>
Happily for me, my aunt works at The Produce Station, so I emailed
her to see if they had the 2 Lads Sparkling Pinot Grigio in stock.
</p>
<pre class="example">
Date: Jul 15
From: Gail
To: me
Yes, we have sparkling pinot grigio! gail
</pre>
<p>
Success!
</p>
<p>
After work I went to The Produce Station and picked up some proper
food and three bottles of wine. The bottles are quite pretty.
</p>
<p>
As a brief but relevant aside, it was very hot out—more than
90F—and my car was in the sun all day.
</p>
<p>
After loading my cherry tomatoes and baby carrots and sparkling wine
into the car, I headed home, looking forward to dinner and sharing
a glass of wine with my wife.
</p>
<p>
The distance, according to Google Maps, between The Produce Station
and my house is 3.1 miles. As it turns out, a fateful 3.1 miles.
</p>
<p>
At about 2 miles, after 6 turns, I heard from the back of my station
wagon the very upsetting sound <i>Pft! Shhhhhhhhhh!</i>. Crap. One of
the bottles had done something bad. But the bottle and its badness
were in the way back, and all I could tell was that the windows were
dry and there was no Bellagio-style fountain show evident in the
rear-view mirror. So that was promising. And the sparkling did
smell pretty good, which was also promising.
</p>
<p>
Being a mere mile from home, I made the call to leave the other two
(sealed) bottles back there, avoid pot-holes, and try to make it
home. Also, I was in denial about the state of the back of my car.
</p>
<p>
<i>Pft! Shhhhhhhhhh!</i>.
</p>
<p>
Crap.
</p>
<div class="figure">
<p><img src="/assets/2lads-caps.jpg" align="right" alt="2lads-caps.jpg"/></p>
</div>
<p>
I pulled over. I was not going to lose three bottles of wine in
3.1 miles. That's just mad.
</p>
<p>
The two bottles that burst hadn't really burst—the caps were
still on and 80% of the wine that 2 Lads had put in the bottles was
still there, leaving about 1/2 a bottle of sparkling wine in the
back of my car, but even so…
</p>
<p>
I took all three bottles from the way-back and put them on the
floor of the passenger's side and kept a close eye on them on the
way home. Little bastards. Seriously, would a cork have killed
you?
</p>
<p>
This is where the French, regardless of what you think of the rest
of the country, have their heads on right. If it's carbonated, and
pressurized, and delicious, take some damn care. Like using a cork,
and not just any cork, but a <a href="http://www.nytimes.com/2012/12/23/magazine/who-made-that-champagne-cork.html">special cork</a>, which is then further
controlled with a <a href="http://en.wikipedia.org/wiki/Muselet">muselet</a>, that wire thing that adds all of the
drama to opening a bottle of sparkling wine until the pop of the
cork, which then steals the show.
</p>
<p>
A <a href="http://en.wikipedia.org/wiki/Crown_cork">crown cork</a> is neither a crown nor a cork nor suitable for
sparkling wine. 2 Lads, please, a cork and muselet. For the love
of God and the back of my car and your delicious wine, seal those
bottles properly.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Yes, but the wine! How is the wine?!?</h2>
<div class="outline-text-2" id="text-3">
<p>
The sealed bottle got a place of honor among the Mawby and Domaine
Carneros bottles in the basement refrigerator; the other two were
in the upstairs refrigerator, at risk of being consumed. But
wait…
</p>
<p>
These bottles are tall. Really tall. Impractically tall.
</p>
<p>
I had to rearrange those door-shelf things so they would fit; they
weren't going on their sides after the little demonstration in the
back of my car. Fool me once, shame on you; waste even more wine,
shame on me.
</p>
<p>
After they were chilled, we opened the one with the least wine in
it.
</p>
<p>
Thumbs up.
</p>
<p>
It's a flavorful, dry sparkling that is very drinkable. It is
another Michigan sparkling wine success that has established its
place in my rotation of Michigan sparklers. Mawby, you still own
the price/performance, but watch out, these Lads have a good thing
going.
</p>
<p>
Now I <i>really</i> wish that missing 1/2 bottle wasn't soaked into the
way-back of my car.
</p>
</div>
</div>
Typesetting in 90 Minutes2013-07-08T00:00:00+00:00http://acaird.github.io/computer/2013/07/08/latex-class<blockquote>
<p>
On Wednesday, August 7, 2013 I'm going to give a quick introduction
to LaTeX. I'll be in the Johnson Rooms in the Lurie Building from
12:00pm–1:30pm.
</p>
<p>
I'll be updating this blog entry in the coming weeks; please check
back for updates, instructions, and notes.
</p>
</blockquote>
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">LaTeX in 90 minutes</h2>
<div class="outline-text-2" id="text-1">
<p>
In 90 minutes in the Johnson Rooms we'll go over how to install
LaTeX on computers running Microsoft Windows, Apple OS X, and
Ubuntu Linux, then how to write a simple document and produce a PDF
file, how to add a title, author, table of contents, equations, and
graphics. The last third of the time will be spent answering
specific questions about LaTeX.
</p>
<p>
If you have experience with LaTeX, the early part of the talk isn't
likely to be useful to you, but the later parts might be.
</p>
<p>
You should bring a laptop, as this will be a hands-on discussion.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">The Agenda for the Class</h2>
<div class="outline-text-2" id="text-2">
<dl class="org-dl">
<dt> 12:00p–12:30p </dt><dd>sorting out the installation of LaTeX and its
supporting programs on your laptops, and getting
it installed if you haven't done so already.
</dd>
<dt> 12:30p–1:00p </dt><dd>writing a first simple document and producing a
PDF file, then augmenting the document with a
title, author, equations, figures, and tables, plus a
table of contents and lists of figures and tables.
</dd>
<dt> 1:00p–1:30p </dt><dd>answering particular questions about LaTeX and
your use of it
</dd>
</dl>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Preparation for the Class</h2>
<div class="outline-text-2" id="text-3">
<p>
Before coming to class, please install LaTeX on your
laptop<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup> and download a copy of <a href="http://tobi.oetiker.ch/lshort/lshort-letter.pdf"><i>The Not So
Short Introduction to LaTeX</i></a> and bring an electronic or printed
copy.
</p>
</div>
<div id="outline-container-sec-3-1" class="outline-3">
<h3 id="sec-3-1">Installing LaTeX on Microsoft Windows</h3>
<div class="outline-text-3" id="text-3-1">
<p>
The LaTeX installation for Windows is called MiKTeX and can be
downloaded from <a href="http://www.miktex.org/about">http://www.miktex.org/about</a>.
</p>
<p>
The first few FAQ entries at <a href="http://docs.miktex.org/faq/faq.html">http://docs.miktex.org/faq/faq.html</a>
may be helpful in getting MiKTeX installed.
</p>
</div>
</div>
<div id="outline-container-sec-3-2" class="outline-3">
<h3 id="sec-3-2">Installing LaTeX on Apple OS X</h3>
<div class="outline-text-3" id="text-3-2">
<p>
The LaTeX installation for Mac OS X is called MacTeX and can be
downloaded from <a href="http://tug.org/mactex/">http://tug.org/mactex/</a>.
</p>
<p>
That same web page (<a href="http://tug.org/mactex/">http://tug.org/mactex/</a>) links to other helpful
documents about maintaining the MacTeX installation.
</p>
</div>
</div>
<div id="outline-container-sec-3-3" class="outline-3">
<h3 id="sec-3-3">Installing LaTeX on Ubuntu Linux</h3>
<div class="outline-text-3" id="text-3-3">
<p>
LaTeX can be installed on Ubuntu through the software center or
via the command line by typing <code>sudo apt-get install texlive</code> at
a shell prompt.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Writing a Simple Document</h2>
<div class="outline-text-2" id="text-4">
<p>
To test your LaTeX installation, you should create a simple
document<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup>.
</p>
</div>
<div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1">Contents of a simple LaTeX document</h3>
<div class="outline-text-3" id="text-4-1">
<p>
The contents of a very simple LaTeX document are:
</p>
<div class="org-src-container">
<pre class="src src-latex">\documentclass{article}

\begin{document}

This is my document. There isn't much to it.

This is the second paragraph of my document. This
paragraph is longer than the first paragraph because
I kept typing words in this paragraph, and didn't
type as many words in the preceding paragraph.

This is my third paragraph, the shortest yet.

\end{document}
</pre>
</div>
<p>
The first line, <code>\documentclass{article}</code>, selects the <code>article</code> class, the most common type of
document you'll write in LaTeX. The other common options are
<code>letter</code>, for a letter that you would mail to someone, and <code>book</code>,
which adds the option of chapters to your document. For anything
beyond <code>article</code> you will probably want to consider packages other
than <code>letter</code> and <code>book</code><sup><a id="fnr.3" name="fnr.3" class="footref" href="#fn.3">3</a></sup>.
</p>
<p>
The second and last lines, <code>\begin{document}</code> and <code>\end{document}</code>
bound your document—you always have to have these two lines and
nearly all of your content will be between those two lines.
</p>
<p>
Between the <code>\begin</code> and <code>\end</code> lines is the content for your
document. In this example it is three paragraphs. A paragraph is
created by leaving a blank line. You don't have to wrap the lines
yourself, they will be processed by LaTeX when you process your
document.
</p>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2">Typing the document</h3>
<div class="outline-text-3" id="text-4-2">
<p>
A LaTeX document is simply a text file—this is one of the
beauties of LaTeX: you'll always be able to read the text file.
</p>
<p>
If you are familiar with a text editor for programming, such as
<code>emacs</code> or <code>vi</code>, you can use that editor to write LaTeX
documents. If you prefer a more integrated environment for
typing your documents, MacTeX comes with TeXworks and TeXShop,
MiKTeX comes with TeXworks, and on Ubuntu there are TeXmaker,
Kile, <a href="http://gummi.midnightcoding.org/">Gummi</a>, and many others.
</p>
<p>
After opening your editor, you should be able to paste in the
sample document above, or start writing your own short document
between the <code>\begin{document}</code> and <code>\end{document}</code> lines.
</p>
<p>
Save the file as <code>sample.tex</code>.
</p>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3">Processing the document</h3>
<div class="outline-text-3" id="text-4-3">
<p>
When you are done writing, you can process the document to a PDF
file and see what you've created<sup><a id="fnr.4" name="fnr.4" class="footref" href="#fn.4">4</a></sup>.
</p>
<p>
If you have a <code>.tex</code> file, you can process it from the command
line in the Windows <code>command</code> shell, the Mac <code>Terminal</code>, or a
Linux <code>xterm</code> by typing:
</p>
<pre class="example">
pdflatex sample.tex
</pre>
<p>
and you should see several lines of output, the first few and last
few should be something like:
</p>
<pre class="example">
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
restricted \write18 enabled.
entering extended mode
(./sample.tex
[... many more lines ...]
LaTeX2e <2011/06/27>
Output written on sample.pdf (1 page, 16198 bytes).
Transcript written on sample.log.
</pre>
<p>
The LaTeX editors will have a menu item or button that will do the
processing for you.
</p>
<p>
If there is an error, it isn't always easy to decode, but some of
the editors will try to set the cursor at the point of the error.
Otherwise, carefully read the error and check your <code>.tex</code> file for
missing brackets, a <code>\begin</code> without its matching <code>\end</code>, and
other typos.
</p>
<p>
Once you are able to successfully process <code>sample.tex</code> into
<code>sample.pdf</code> and open <code>sample.pdf</code> and see it, we can start adding
to the document.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Adding to the document</h2>
<div class="outline-text-2" id="text-5">
<p>
Once we have a simple document, we can start adding more
interesting parts to it.
</p>
</div>
<div id="outline-container-sec-5-1" class="outline-3">
<h3 id="sec-5-1">Title and Author</h3>
<div class="outline-text-3" id="text-5-1">
<p>
Adding a title and author or authors to your document is
straightforward<sup><a id="fnr.5" name="fnr.5" class="footref" href="#fn.5">5</a></sup>.
</p>
<p>
Between the <code>\documentclass</code> and <code>\begin{document}</code> lines, you
should add:
</p>
<pre class="example">
\title{My Sample Document}
\author{Type Your Name Here}
</pre>
<p>
and immediately after the <code>\begin{document}</code> line add the line:
</p>
<pre class="example">
\maketitle
</pre>
<p>
After processing the updated <code>.tex</code> file, you will see a title,
author, and today's date added to the top of your PDF file.
</p>
</div>
</div>
<div id="outline-container-sec-5-2" class="outline-3">
<h3 id="sec-5-2">Sections and subsections</h3>
<div class="outline-text-3" id="text-5-2">
<p>
LaTeX supports numbering sections, subsections, and sub-subsections
by simply adding a line that reads <code>\section{My Section Title}</code> (or
<code>\subsection{My subsection Title}</code> or <code>\subsubsection{My
subsubsection Title}</code>) before the section<sup><a id="fnr.6" name="fnr.6" class="footref" href="#fn.6">6</a></sup>.
</p>
<p>
Editing <code>sample.tex</code> and adding <code>\section</code> before the first and
third paragraphs and <code>\subsection</code> before the second paragraph
brings us to a source document that looks like:
</p>
<div class="org-src-container">
<pre class="src src-latex">\documentclass{article}
\title{My Sample Document}
\author{Type Your Name Here}
\begin{document}
\maketitle
\section{My first section}
This is my document. There isn't much to it.
\subsection{A subsection for fun}
This is the second paragraph of my document. This
paragraph is longer than the first paragraph because
I kept typing words in this paragraph, and didn't
type as many words in the preceding paragraph.
\section{My very short third section}
This is my third paragraph, the shortest yet.
\end{document}
</pre>
</div>
<p>
Processing that file, assuming I didn't make any errors and you
didn't introduce any in its transcription, should result in a
document with a title, author, date, and two sections, one with a
subsection.
</p>
</div>
</div>
<div id="outline-container-sec-5-3" class="outline-3">
<h3 id="sec-5-3">Equations and Tables</h3>
<div class="outline-text-3" id="text-5-3">
</div><div id="outline-container-sec-5-3-1" class="outline-4">
<h4 id="sec-5-3-1">Equations</h4>
<div class="outline-text-4" id="text-5-3-1">
<p>
There are two types of basic equations in LaTeX: equations that
are part of the text, in-line math, and equations that are offset
from the text and numbered<sup><a id="fnr.7" name="fnr.7" class="footref" href="#fn.7">7</a></sup>.
</p>
<p>
Equations are typed as plain text, using LaTeX commands and the math symbols on your keyboard.
Pythagoras might have written:
</p>
<pre class="example">
\begin{equation}
a^2 + b^2 = c^2
\end{equation}
</pre>
<p>
to get
</p>
<div class="equation">
<p>
\(a^2 + b^2 = c^2\)
</p>
</div>
<p>
Another example is the equation for the area of a circle:
</p>
<pre class="example">
A = \pi r^2
</pre>
<p>
to get
</p>
<div class="equation">
<p>
\(A = \pi r^2\)
</p>
</div>
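<p>
The two kinds of equation can sit side by side in one file. As a
small sketch in standard LaTeX syntax (the <code>$...$</code> delimiters for
in-line math are not shown elsewhere on this page, and the sentence
wording is just an example):
</p>
<div class="org-src-container">
<pre class="src src-latex">\documentclass{article}
\begin{document}
% in-line math: the formula stays inside the sentence
A circle of radius $r$ has area $A = \pi r^2$.

% display math: set off from the text and numbered
\begin{equation}
A = \pi r^2
\end{equation}
\end{document}
</pre>
</div>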
</div>
</div>
<div id="outline-container-sec-5-3-2" class="outline-4">
<h4 id="sec-5-3-2">Tables</h4>
<div class="outline-text-4" id="text-5-3-2">
<p>
Tables in LaTeX comprise five items that describe a table:
</p>
<ul class="org-ul">
<li>a <code>\begin{table}</code> and matching <code>\end{table}</code> surrounding the
tabular environment<sup><a id="fnr.8" name="fnr.8" class="footref" href="#fn.8">8</a></sup>
</li>
<li>a <code>\begin{tabular}{<tableadvice>}</code> and matching
<code>\end{tabular}</code> around the table data
</li>
<li>the <code><tableadvice></code> which describes the alignment and vertical
lines in the table using <code>|</code> to describe vertical lines and
letters (<code>l</code>, <code>c</code>, <code>r</code>) to describe alignment
</li>
<li>the <code>&</code> character to define columns, <code>\\</code> to define rows, and
<code>\hline</code> to draw horizontal lines
</li>
<li>the caption for the table, using <code>\caption{My Caption}</code> after
the <code>tabular</code> environment and before the end of the <code>table</code>
environment.
</li>
</ul>
<p>
Taking the <b>Degrees Granted for Academic Year 2011-12</b> table from
<a href="http://www.engin.umich.edu/college/about/facts">http://www.engin.umich.edu/college/about/facts</a> we could represent
that table in LaTeX as:
</p>
<pre class="example">
\begin{table}
\begin{tabular}{l|r|r|r}
Degrees Granted for Academic Year 2011-12 & Bachelors & Masters & Doctoral \\ \hline
Degrees Granted & 1,348 & 1,093 & 258 \\
\% Women & 21\% & 21\% & 21\% \\
\% URM & 7\% & 8\% & 14\% \\ \hline
\end{tabular}
\caption{Engineering Degrees Granted for Academic Year 2011-12}
\end{table}
</pre>
<ul class="org-ul">
<li>The <code>|</code> characters in the <code>tabular</code> line instruct LaTeX to put
vertical lines between the columns, but not on the left or
right ends.
</li>
<li>The <code>\hline</code> commands at the ends of the lines instruct LaTeX
to put a horizontal line below the current line.
</li>
<li>The <code>\%</code> is required because <code>%</code> is a special character in
LaTeX that denotes a comment from it to the end of the line;
that's very useful in many cases, but not when you mean
percentage.
</li>
</ul>
<p>
This all produces a table that looks like:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="right"/>
<col class="right"/>
<col class="right"/>
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Degrees Granted for Academic Year 2011-12</th>
<th scope="col" class="right">Bachelors</th>
<th scope="col" class="right">Masters</th>
<th scope="col" class="right">Doctoral</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Degrees Granted</td>
<td class="right">1,348</td>
<td class="right">1,093</td>
<td class="right">258</td>
</tr>
<tr>
<td class="left">% Women</td>
<td class="right">21%</td>
<td class="right">21%</td>
<td class="right">21%</td>
</tr>
<tr>
<td class="left">% URM</td>
<td class="right">7%</td>
<td class="right">8%</td>
<td class="right">14%</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div id="outline-container-sec-5-4" class="outline-3">
<h3 id="sec-5-4">Figures from external files</h3>
<div class="outline-text-3" id="text-5-4">
<p>
While it is possible to create images in LaTeX directly using its
native but limited <code>picture</code> environment or the somewhat
complicated <a href="http://www.texample.net/tikz/">TikZ</a> package<sup><a id="fnr.9" name="fnr.9" class="footref" href="#fn.9">9</a></sup>, it is most common to create images
in another package: <a href="http://www.gnuplot.info/">gnuplot</a> or <a href="http://www.r-project.org/">R</a> for plots, Adobe Illustrator or
another drawing package for diagrams.
</p>
<p>
For this example, we'll make a plot using the statistical package
R and write the plot to a file called <code>mpg-weight.png</code>. The R
code for this is:
</p>
<div class="org-src-container">
<pre class="src src-R">library(ggplot2)
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5), labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1), labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8), labels=c("4cyl","6cyl","8cyl"))
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),
method="lm", formula=y~x, color=cyl,
main="Regression of MPG on Weight",
xlab="Weight", ylab="Miles per Gallon")
</pre>
</div>
<div class="figure">
<p><img src="assets/mpg-weight.png" alt="mpg-weight.png"/></p>
<p>Weight vs. MPG</p>
</div>
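<p>
The <code>qplot</code> call above draws the plot but does not write a file. One
way to get <code>mpg-weight.png</code> is ggplot2's <code>ggsave</code>. This is only a
sketch, not necessarily how the image on this page was made: it saves
a simpler plot stored in <code>p</code> (a name chosen here for illustration),
and the width and height are guesses.
</p>
<div class="org-src-container">
<pre class="src src-R">library(ggplot2)
# a plain weight-vs-mpg scatter plot from the built-in mtcars data
p <- ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()
# write it to a PNG file in the current directory; sizes are in inches
ggsave("mpg-weight.png", plot=p, width=6, height=4)
</pre>
</div>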
<p>
This file is included in the LaTeX output by adding the line:
</p>
<pre class="example">
\usepackage{graphicx}
</pre>
<p>
immediately after the <code>\documentclass{article}</code> line in your
<code>.tex</code> file and then including the lines<sup><a id="fnr.10" name="fnr.10" class="footref" href="#fn.10">10</a></sup>:
</p>
<pre class="example">
\begin{figure}
\includegraphics[width=.9\linewidth]{mpg-weight.png}
\caption{Weight vs. MPG}
\end{figure}
</pre>
<p>
If the image is stored somewhere else, give the path to <code>mpg-weight.png</code> in the
<code>\includegraphics</code> line; as written in this example, LaTeX will assume that
the image file is in the same directory as the <code>.tex</code> file.
</p>
<p>
You can download this image for use when processing <code>sample.tex</code>
by right-clicking on it and saving it to the same folder as
<code>sample.tex</code>.
</p>
<p>
The <code>graphicx</code> package combined with <code>pdflatex</code> can read PDF, PNG,
JPEG, and MetaPost graphic formats. Other formats should be
converted to one of these for inclusion. If you can export
directly to PDF, that is ideal.
</p>
</div>
</div>
<div id="outline-container-sec-5-5" class="outline-3">
<h3 id="sec-5-5">Tables of Contents, Tables, and Figures</h3>
<div class="outline-text-3" id="text-5-5">
<p>
Adding a table of contents, a list of tables, and a list of figures to the front matter
of your document is straightforward.
</p>
<p>
To add a table of contents, add the line <code>\tableofcontents</code> after
your <code>\maketitle</code> line. Because
LaTeX is a single-pass processor, it stores some of its
information in auxiliary files; this means that you may have to
run <code>pdflatex</code> more than once to get all of the references
resolved<sup><a id="fnr.11" name="fnr.11" class="footref" href="#fn.11">11</a></sup>.
</p>
<p>
To add a list of figures, insert the line <code>\listoffigures</code> after
the <code>\maketitle</code> or <code>\tableofcontents</code> lines in your <code>.tex</code> file.
</p>
<p>
To include a listing of tables at the front of your document,
there is the <code>\listoftables</code> command that is analogous to the
<code>\listoffigures</code> command.
</p>
<p>
Our current <code>sample.tex</code> file now looks like:
</p>
<div class="org-src-container">
<pre class="src src-latex">\documentclass{article}
\usepackage{graphicx}
\title{My Sample Document}
\author{Type Your Name Here}
\begin{document}
\maketitle
\tableofcontents
\listoftables
\listoffigures
\section{My first section}
This is my document. There isn't much to it.
\subsection{A subsection for fun}
This is the second paragraph of my document. This
paragraph is longer than the first paragraph because
I kept typing words in this paragraph, and didn't
type as many words in the preceding paragraph.
\section{My very short third section}
This is my third paragraph, the shortest yet.
However, this section has a table and a figure.
\begin{table}
\begin{tabular}{l|r|r|r}
Degrees Granted for Academic Year 2011-12 & Bachelors & Masters & Doctoral \\ \hline
Degrees Granted & 1,348 & 1,093 & 258 \\
\% Women & 21\% & 21\% & 21\% \\
\% URM & 7\% & 8\% & 14\% \\ \hline
\end{tabular}
\caption{Engineering Degrees Granted for Academic Year 2011-12}
\end{table}
\begin{figure}
\includegraphics[width=.9\linewidth]{mpg-weight.png}
\caption{Weight vs. MPG}
\end{figure}
\end{document}
</pre>
</div>
<p>
This file demonstrates sections, tables, figures, and
front-matter.
</p>
</div>
</div>
<div id="outline-container-sec-5-6" class="outline-3">
<h3 id="sec-5-6">Lists, Bibliographies, and everything else</h3>
<div class="outline-text-3" id="text-5-6">
<p>
<b>Lists</b>
</p>
<p>
Lists in LaTeX are environments similar to other environments
we've seen, like <code>\begin{figure}</code> … <code>\end{figure}</code>; the list
environment for a bulleted list is <code>itemize</code> and for a numbered
list is <code>enumerate</code>.<sup><a id="fnr.12" name="fnr.12" class="footref" href="#fn.12">12</a></sup>
</p>
<p>
Lists can be nested to create sub-items. Example LaTeX that shows
this is:
</p>
<pre class="example">
\begin{enumerate}
\item The first item in my list is this sentence; it's a
pretty long sentence that has several properties:
\begin{itemize}
\item it has 18 words
\item it has a semi-colon
\item it has one apostrophed word
\end{itemize}
\item This is the second item in my list
\item This is the end of my incredibly boring list
\end{enumerate}
</pre>
<ol class="org-ol">
<li>The first item in my list is this sentence; it's a pretty long
sentence that has several properties:
<ul class="org-ul">
<li>it has 18 words
</li>
<li>it has a semi-colon
</li>
<li>it has one apostrophed word
</li>
</ul>
</li>
<li>This is the second item in my list
</li>
<li>This is the end of my incredibly boring list
</li>
</ol>
<p>
<b>Bibliographies</b>
</p>
<p>
LaTeX can support simple bibliographies within a document using
the <code>\cite</code> command in the document for citations and the
<code>\bibitem</code> command in the <code>thebibliography</code> environment<sup><a id="fnr.13" name="fnr.13" class="footref" href="#fn.13">13</a></sup>.
</p>
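<p>
A minimal sketch of such an in-document bibliography follows. The
citation key <code>lamport94</code> is a made-up label for this example, and
the <code>{9}</code> argument only tells LaTeX how wide the widest label is
(one digit here).
</p>

```latex
\documentclass{article}
\begin{document}
% \cite looks up the key defined by the matching \bibitem below.
Lamport's book~\cite{lamport94} is a good place to start.

\begin{thebibliography}{9}
\bibitem{lamport94}
  Leslie Lamport,
  \emph{LaTeX: A Document Preparation System},
  2nd edition,
  Addison-Wesley, 1994.
\end{thebibliography}
\end{document}
```

<p>
As with the table of contents, citations are resolved through the
auxiliary files, so expect to run <code>pdflatex</code> twice before the
<code>[?]</code> placeholder turns into a number.
</p>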
<p>
For larger projects or areas of study, use <a href="http://www.bibtex.org/">BibTeX</a> to maintain a
bibliographic database that the <code>\cite</code> commands in several
documents can share.
</p>
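<p>
As a rough sketch of the BibTeX route, the file below writes a small
database and cites from it. The file name <code>refs.bib</code>, the key
<code>lamport94</code>, and the choice of the <code>plain</code> style are
assumptions made for this example; the <code>filecontents</code> environment
is used only to keep the example in one file, and in practice the
<code>.bib</code> file would usually be maintained separately.
</p>

```latex
% Write refs.bib next to this file (skipped if it already exists).
\begin{filecontents}{refs.bib}
@book{lamport94,
  author    = {Leslie Lamport},
  title     = {{LaTeX}: A Document Preparation System},
  edition   = {Second},
  publisher = {Addison-Wesley},
  year      = {1994}
}
\end{filecontents}
\documentclass{article}
\begin{document}
Lamport's book~\cite{lamport94} is a good place to start.

% Choose a formatting style, then name the database (without .bib).
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```

<p>
If this is saved as <code>sample.tex</code>, the usual sequence is
<code>pdflatex sample</code>, <code>bibtex sample</code>, and then
<code>pdflatex sample</code> twice more so the citations settle.
</p>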
<p>
<b>Debugging LaTeX documents</b>
</p>
<p>
Writing LaTeX documents is similar to writing a computer program,
with all of the debugging issues that go with computer
programming.
</p>
<p>
The error messages produced by LaTeX can be difficult to decode;
my preferred method of debugging is to localize the error without
worrying too much about the error message by processing the
document often, and especially after making significant additions.
</p>
<p>
If processing the document does result in errors that I cannot
understand from the message, I use <code>%</code> to comment out parts of
the LaTeX file and reprocess it until the error is gone, then look
in the commented-out section for the mistake.
</p>
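<p>
A small illustration of that approach, using a made-up document: the
suspect block (a list that is never closed) is commented out with
<code>%</code>, and the rest of the file still processes cleanly.
</p>

```latex
\documentclass{article}
\begin{document}
\section{My first section}
This paragraph is known to be fine, so it stays active.

% Suspect block, commented out while hunting for the error:
% \begin{itemize}
% \item an item in a list that is never closed
% \section{My second section}

This paragraph is also fine. If the file now processes without
errors, the mistake is somewhere in the commented lines above.
\end{document}
```

<p>
A related trick that may help in long files is to move
<code>\end{document}</code> earlier temporarily; LaTeX stops reading there,
so everything after it is ignored.
</p>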
</div>
</div>
</div>
<div id="outline-container-sec-6" class="outline-2">
<h2 id="sec-6">Other resources</h2>
<div class="outline-text-2" id="text-6">
<p>
LaTeX has been around for decades and has a lot of documentation
and support.
</p>
</div>
<div id="outline-container-sec-6-1" class="outline-3">
<h3 id="sec-6-1">Websites</h3>
<div class="outline-text-3" id="text-6-1">
<p>
The best web site for LaTeX support is <i>Google</i>, followed by
<a href="http://tex.stackexchange.com/">TeX Stack Exchange</a>. LaTeX-specific web sites are the website for the
<a href="http://www.latex-project.org/">LaTeX Project</a>, the <a href="http://www.tug.org">TeX Users Group</a>, and the <a href="http://en.wikibooks.org/wiki/LaTeX">LaTeX WikiBooks</a> site.
For additional packages for LaTeX, the
<a href="http://ctan.org/">Comprehensive TeX Archive Network</a> is the
definitive source; the <a href="http://ctan.org/pkg/umthesis">umthesis</a> class is available here, for
example<sup><a id="fnr.14" name="fnr.14" class="footref" href="#fn.14">14</a></sup>.
</p>
</div>
</div>
<div id="outline-container-sec-6-2" class="outline-3">
<h3 id="sec-6-2">Books</h3>
<div class="outline-text-3" id="text-6-2">
<p>
There are 82 books in the search results for <code>latex typesetting</code>
at <a href="http://www.amazon.com/s/ref=nb_sb_ss_c_0_8?url=search-alias%3Dstripbooks&field-keywords=latex+typesetting">Amazon</a>. The canonical books, in my opinion, are:
</p>
<ul class="org-ul">
<li><i>LaTeX: A Document Preparation System (2nd Edition)</i> by Leslie Lamport
</li>
<li><i>The LaTeX Graphics Companion</i> by Goossens, Mittelbach, Rahtz,
and Voss
</li>
<li><i>The LaTeX Companion (2nd edition)</i> by Mittelbach, Goossens,
Braams, Carlisle, and Rowley
</li>
</ul>
<p>
There are also a lot of other excellent books, many of which are
<a href="http://www.lib.umich.edu/mlibrary/search/mirlyn/latex">available in the U-M libraries</a>.
</p>
</div>
</div>
<div id="outline-container-sec-6-3" class="outline-3">
<h3 id="sec-6-3">Resources at the University of Michigan</h3>
<div class="outline-text-3" id="text-6-3">
<p>
Email sent to <a href="mailto:caen@umich.edu">caen@umich.edu</a> will reach me and others at <a href="http://www.engin.umich.edu/caen">CAEN</a>, and
we'll do our best to help you.
</p>
<div id="outline-container-sec-6-4" class="outline-3">
<h3 id="sec-6-4">This Document</h3>
<div class="outline-text-3" id="text-6-4">
<p>
This document is also available as a
<a href=/assets/2013-07-08-latex-class.pdf>PDF file</a> and a
<a href=/assets/latex-in-90minutes.mobi>Kindle-formatted ebook</a>.
</p>
<p>
The <kbd>sample.tex</kbd> file is also
<a href=/assets/sample.tex>available for download</a>
</p>
</div>
</div>
</div>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
This is also described in Appendix A of <i>The Not So Short
Introduction to LaTeX</i>
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> <p class="footpara">
See Chapter 1 of <i>The Not So Short Introduction to LaTeX</i>
(<i>TNSSItL</i>) for more detail
</p></div>
<div class="footdef"><sup><a id="fn.3" name="fn.3" class="footnum" href="#fnr.3">3</a></sup> <p class="footpara">
See sections 1.3–1.4 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.4" name="fn.4" class="footnum" href="#fnr.4">4</a></sup> <p class="footpara">
See section 1.5 of <i>TNSSItL</i>, with the exception that we will
always use <code>pdflatex</code> instead of the <code>latex</code>, <code>xdvi</code>, and <code>dvips</code>
commands in <i>TNSSItL</i>; for more on <code>pdflatex</code>, see Section 4.7 of
<i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.5" name="fn.5" class="footnum" href="#fnr.5">5</a></sup> <p class="footpara">
This is also described briefly in Section 1.5 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.6" name="fn.6" class="footnum" href="#fnr.6">6</a></sup> <p class="footpara">
See Section 2.7 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.7" name="fn.7" class="footnum" href="#fnr.7">7</a></sup> <p class="footpara">
See Chapter 3 of <i>TNSSItL</i> for much more detail
</p></div>
<div class="footdef"><sup><a id="fn.8" name="fn.8" class="footnum" href="#fnr.8">8</a></sup> <p class="footpara">
See Sections 2.1.2 and 2.11.6 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.9" name="fn.9" class="footnum" href="#fnr.9">9</a></sup> <p class="footpara">
See Chapter 5 of <i>TNSSItL</i> for more on the <code>picture</code>
environment and TikZ
</p></div>
<div class="footdef"><sup><a id="fn.10" name="fn.10" class="footnum" href="#fnr.10">10</a></sup> <p class="footpara">
See Section 4.1 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.11" name="fn.11" class="footnum" href="#fnr.11">11</a></sup> <p class="footpara">
See Sections 2.7 and 2.12 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.12" name="fn.12" class="footnum" href="#fnr.12">12</a></sup> <p class="footpara">
See Section 2.11.1 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.13" name="fn.13" class="footnum" href="#fnr.13">13</a></sup> <p class="footpara">
See Section 4.2 of <i>TNSSItL</i>
</p></div>
<div class="footdef"><sup><a id="fn.14" name="fn.14" class="footnum" href="#fnr.14">14</a></sup> <p class="footpara">
For an alternative, Emacs-based input format, take a look at
<a href="http://orgmode.org">Org-Mode</a>; it can export HTML and LaTeX. This document was written
with Org-Mode and exported to LaTeX and HTML to produce the web page
and PDF file.
</p></div>
</div>
</div>
Github Project Web Pages2013-06-19T00:00:00+00:00http://acaird.github.io/computer/2013/06/19/github-project-pages<p>
Github supports the publishing of web pages and blog posts using the
Jekyll rendering engine by simply creating a
<code>GitHubUserName.github.io</code> repository in your account. Github also
supports <a href="https://help.github.com/articles/what-s-the-difference-between-user-and-organization-accounts">organizations</a> that can support git repositories, groups of
users, and unified management of the two. Each repository in an
organization can have its own web pages at a URL like
<code>http://OrganizationName.github.io/ProjectName</code>. I'll describe how I
did this for <a href="http://www.engin.umich.edu/caen">CAEN's</a> Github projects.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Github Pages</h2>
<div class="outline-text-2" id="text-1">
<p>
Creating web pages for projects (or repositories) within a Github
organization is documented in many places on the web (you can Google
for it), but this is how I created web pages processed by Jekyll
for CAEN's Github Organization.
</p>
<p>
The general steps to create web pages for CAEN's repos, like
<a href="http://caen.github.io/hadoop">http://caen.github.io/hadoop</a>, are:
</p>
<ol class="org-ol">
<li>Create a special branch of the repository called <code>gh-pages</code>
that will hold all of the web content and none of the actual
project content
</li>
<li>Add the Jekyll files to this new branch and configure them
</li>
<li>Add some content, either plain old web pages or blog posts or
both
</li>
</ol>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Making a <code>gh-pages</code> branch</h2>
<div class="outline-text-2" id="text-2">
<p>
The first step is to create an <a href="http://git-scm.com/docs/git-checkout">orphan branch</a> in your Git repository
and remove all of your content from it. The steps to do this are:
</p>
<ol class="org-ol">
<li>First create the orphan branch called <code>gh-pages</code>
</li>
</ol>
<pre class="example">
git checkout --orphan gh-pages
</pre>
<p>
This will create the branch and switch to it. You can type
<code>git status</code> to make sure you are in the newly created branch.
</p>
<ol class="org-ol">
<li>Remove everything from the <code>gh-pages</code> branch in preparation for
adding your web content.
</li>
</ol>
<pre class="example">
git rm -rf .
</pre>
<ol class="org-ol">
<li>Commit the now-empty branch with <code>git commit --allow-empty</code>; a plain
<code>git commit</code> refuses to run when nothing is staged, and a branch with
no commits disappears as soon as you switch away from it.
You can confirm that your project content still exists by
switching back to the master branch (<code>git checkout master</code>) and
typing <code>ls</code>. After you've satisfied yourself that your
<code>git rm</code> didn't delete your work, switch back to the <code>gh-pages</code>
branch (<code>git checkout gh-pages</code>).
</li>
</ol>
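<p>
To rehearse these steps without touching a real project, something
like the following sketch can be run in a scratch directory. The
repository name <code>demo</code>, the file <code>README.txt</code>, and the
placeholder identity are all invented for the example, and the
<code>--allow-empty</code> flag is an addition here so that the empty branch
gets a first commit to hold on to.
</p>

```shell
#!/bin/sh
# Rehearse the orphan-branch steps in a throwaway repository.
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is "master"
git config user.name "Example User"        # placeholder identity for the commits
git config user.email "user@example.com"

# Stand-in for the real project content.
echo "project code" > README.txt
git add README.txt
git commit -q -m "project content"

# The steps from the text: new orphan branch, clear it out, commit.
git checkout -q --orphan gh-pages
git rm -rf -q .
git commit -q --allow-empty -m "start an empty gh-pages branch"
pages_files=$(git ls-files | wc -l | tr -d ' ')

# Confirm that master still has its files, then return to gh-pages.
git checkout -q master
master_files=$(git ls-files)
git checkout -q gh-pages
current_branch=$(git rev-parse --abbrev-ref HEAD)

echo "gh-pages tracks $pages_files files; master still has: $master_files"
```

<p>
Nothing here talks to Github, so it is safe to run as often as you
like before repeating the steps in your own repository.
</p>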
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Adding Jekyll to your branch</h2>
<div class="outline-text-2" id="text-3">
</div><div id="outline-container-sec-3-1" class="outline-3">
<h3 id="sec-3-1"><span class="section-number-3">3.1</span> Get the Jekyll files</h3>
<div class="outline-text-3" id="text-3-1">
<p>
Now that you have an empty directory, you can add the default
Jekyll files to it. The following example:
</p>
<ol class="org-ol">
<li>Clones the Jekyll Bootstrap code into the <code>gh-pages</code> branch
</li>
<li>Moves all of the Jekyll files from the Jekyll directory to the
top level directory of the <code>gh-pages</code> branch of your project
repository
</li>
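<li>Removes what is left of the <code>jekyll-bootstrap</code> directory (only hidden files, such as its own <code>.git</code> directory, which <code>*</code> does not match)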
</li>
<li>Adds all of the Jekyll files to this branch of your Git
repository
</li>
</ol>
<pre class="example">
$ git clone https://github.com/plusjade/jekyll-bootstrap.git
Cloning into 'jekyll-bootstrap'...
remote: Counting objects: 1813, done.
remote: Compressing objects: 100% (940/940), done.
remote: Total 1813 (delta 855), reused 1674 (delta 760)
Receiving objects: 100% (1813/1813), 524.41 KiB | 0 bytes/s, done.
Resolving deltas: 100% (855/855), done.
$ mv jekyll-bootstrap/* .
$ \rm -rf jekyll-bootstrap/
$ git add *
$ git commit -m "Adding Jekyll files to gh-pages branch"
</pre>
</div>
</div>
<div id="outline-container-sec-3-2" class="outline-3">
<h3 id="sec-3-2"><span class="section-number-3">3.2</span> Configuring Jekyll</h3>
<div class="outline-text-3" id="text-3-2">
<p>
The <code>_config.yml</code> configuration for Jekyll project pages within
an organization is different from the <a href="http://acaird.github.io/computers/2013/05/24/blogging-with-org-and-git/">user
configuration for Jekyll</a>. We'll use CAEN's <code>hadoop</code> Github
project as our example.
</p>
<p>
The first set of edits to <code>_config.yml</code> is advised for all Jekyll
configurations and sets the title, author, and email fields for use
by the themes.
</p>
<pre class="example">
# Themes are encouraged to use these universal variables
# so be sure to set them if your theme uses them.
title : CAEN Hadoop
tagline: Big Data, little data by little data
author :
  name : CAEN
  email : hadoop-support@umich.edu
</pre>
<p>
The next edit is to set the <code>production_url</code> variable by following
the instructions in the <code>_config.yml</code> file:
</p>
<pre class="example">
# Finally if you are pushing to a GitHub project page, include the project name at the end.
#
production_url : http://caen.github.io/hadoop
</pre>
<p>
Continuing to follow the instructions in the <code>_config.yml</code> file,
the <code>BASE_PATH</code> is set:
</p>
<pre class="example">
# A GitHub Project site exists in the `gh-pages` branch of one of your repositories.
# REQUIRED! Set BASE_PATH to: http://username.github.io/project-name
BASE_PATH : http://caen.github.io/hadoop
</pre>
<p>
I like to turn off comments, analytics, and sharing at the start,
only turning them back on when their supporting infrastructure is
prepared:
</p>
<pre class="example">
comments :
  provider : false
analytics :
  provider : false
sharing :
  provider : false
</pre>
<p>
After all of the edits are made, you should commit the changes to
<code>_config.yml</code> with
</p>
<pre class="example">
git commit _config.yml -m "local edits to _config.yml"
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> Creating and publishing the project pages</h2>
<div class="outline-text-2" id="text-4">
<p>
Once you have the configuration set, you should remove the sample
files that come with Jekyll:
</p>
<pre class="example">
git mv index.md index.md-orig
git rm core-samples/2011-12-29-jekyll-introduction.md
git commit -m "removed sample Jekyll files"
</pre>
</div>
<div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1"><span class="section-number-3">4.1</span> HTML Pages</h3>
<div class="outline-text-3" id="text-4-1">
<p>
Then you can simply add Jekyll+HTML files to that directory; those
files are of the format:
</p>
<pre class="example">
---
layout: page
title: CAEN Hadoop
tagline: <br>Big Data, little data by little data
---
<dl>
<dt><a href=hadoop-user.html>Hadoop User Documentation</a></dt>
<dd>This is the documentation for people to use Hadoop and its
friends like Hive, Pig, Sqoop, etc.</dd>
</dl>
</pre>
<p>
The HTML following the Jekyll header (between the dashed lines) is
all of the HTML that would be found between the <code><body></code> tags.
</p>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2"><span class="section-number-3">4.2</span> Blog Posts</h3>
<div class="outline-text-3" id="text-4-2">
<p>
Creating blog posts is described as part of <a href="http://acaird.github.io/computers/2013/05/24/blogging-with-org-and-git/">Blogging with Emacs
org-mode and Github Pages</a> and the same process applies here,
although you'll want to take a look at the original <code>index.md</code>
file that you renamed above to see the Jekyll code that
automatically includes blog posts from the <code>_posts</code> directory and
include that code in your <code>index.html</code> file.
</p>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3"><span class="section-number-3">4.3</span> Pushing pages to Github</h3>
<div class="outline-text-3" id="text-4-3">
<p>
Once you have your HTML pages and blog posts created, you should add
them to the repository with <code>git add *.html</code>, commit them with <code>git
commit</code>, and push them to github with <code>git push</code>. The first time
you push, you have to add the new branch with the command <code>git push
origin gh-pages</code>, but after that you can push with a simple <code>git
push</code>.
</p>
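<p>
The push sequence can be tried out locally with a sketch like the one
below, in which a bare repository named <code>origin.git</code> stands in
for Github. The directory names, file contents, and identity are
invented for the example; the <code>-u</code> flag on the first push is an
assumption about how you want to work, since it records the upstream
branch so that a bare <code>git push</code> knows where to go afterwards.
</p>

```shell
#!/bin/sh
# Practice the first and later pushes against a local stand-in for Github.
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q --bare origin.git              # plays the role of the Github repository
git clone -q origin.git work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity for the commits
git config user.email "user@example.com"

# Some project content on master.
echo "project code" > README.txt
git add README.txt
git commit -q -m "project content"
git push -q -u origin master

# An empty gh-pages branch with one HTML page in it.
git checkout -q --orphan gh-pages
git rm -rf -q .
echo "<p>Hello</p>" > index.html
git add *.html
git commit -q -m "first page"
git push -q -u origin gh-pages             # first push: name the branch explicitly

# Later changes need only a plain push.
echo "<p>Hello again</p>" > index.html
git commit -q -a -m "update page"
git push -q

remote_head=$(git --git-dir="$scratch/origin.git" rev-parse refs/heads/gh-pages)
local_head=$(git rev-parse HEAD)
echo "remote gh-pages is at $remote_head"
```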
<p>
After giving Github a few minutes to process the Jekyll into pages,
you can visit your pages at
<code>http://YourOrganization.github.io/YourRepo</code>; in the case of the
CAEN Hadoop pages, this is <a href="http://caen.github.io/hadoop">http://caen.github.io/hadoop</a>.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5"><span class="section-number-2">5</span> Working with Projects that already have Github pages</h2>
<div class="outline-text-2" id="text-5">
<p>
If your organization (in my case, CAEN) has projects that already
have Github pages set up, you simply <code>clone</code> the project then track
the <code>gh-pages</code> branch, which will allow you to see it, push the
<code>gh-pages</code> branch back to generate web content, etc.
</p>
<p>
First, identify a project with a <code>gh-pages</code> branch by either asking
the owner or looking on <code>github.com</code> at the branches for a
<code>gh-pages</code> branch. Once you find one, these steps will get you the
web content and Jekyll configuration:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="right"/>
<col class="left"/>
<col class="left"/>
</colgroup>
<tbody>
<tr>
<td class="right">1</td>
<td class="left"><code>git clone https://github.com/YourOrg/YourProj.git</code></td>
<td class="left">clone the base project</td>
</tr>
<tr>
<td class="right">2</td>
<td class="left"><code>cd YourProj</code></td>
<td class="left">go into the project directory</td>
</tr>
<tr>
<td class="right">3</td>
<td class="left"><code>git checkout --track origin/gh-pages</code></td>
<td class="left">track the <code>gh-pages</code> branch</td>
</tr>
</tbody>
</table>
<p>
Now you should see the HTML files in the top directory of the
<code>gh-pages</code> branch and the posts in the <code>_posts</code> directory. To
switch back to the project, check out the master branch with <code>git
checkout master</code>.
</p>
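<p>
The three steps can be tried against a local stand-in before using a
real project. In the sketch below, the <code>seed</code> repository plays
the part of an existing Github project with both branches; its name,
its files, and the identity are invented for the example, and the
clone is made from a local path rather than a Github URL.
</p>

```shell
#!/bin/sh
# Clone a project that already has a gh-pages branch and track that branch.
set -e

scratch=$(mktemp -d)
cd "$scratch"

# Build a stand-in project with a master branch and a gh-pages branch.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity for the commits
git config user.email "user@example.com"
echo "project code" > README.txt
git add README.txt
git commit -q -m "project content"
git checkout -q --orphan gh-pages
git rm -rf -q .
echo "<p>Docs</p>" > index.html
git add index.html
git commit -q -m "project pages"
git checkout -q master
cd ..

# The three steps from the table.
git clone -q seed YourProj
cd YourProj
git checkout -q --track origin/gh-pages

branch=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
files=$(git ls-files)
echo "on $branch (tracking $upstream) with: $files"
```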
</div>
</div>
Music in the Modern Era2013-06-07T00:00:00+00:00http://acaird.github.io/computers/2013/06/07/music-today<p>
The College of Engineering IT group (CAEN) has a weekly (mostly) lunch
gathering called SuperHappyDevLunch (SHDL), which could equally be
called MellowNerdsNattering.
</p>
<p>
I wanted to know what everyone did for their music listening these
days, so I asked.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">All of the Options</h2>
<div class="outline-text-2" id="text-1">
<p>
These are the options used by the SHDL members, sorted by me by
reasonability and premium cost.
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="left"/>
<col class="left"/>
<col class="right"/>
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Option</th>
<th scope="col" class="left">Reasonability</th>
<th scope="col" class="left">Cost</th>
<th scope="col" class="right">Premium Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Pandora</td>
<td class="left">high</td>
<td class="left">$0</td>
<td class="right">$35/year</td>
</tr>
<tr>
<td class="left">last.fm</td>
<td class="left">high</td>
<td class="left">$0</td>
<td class="right">$36/year</td>
</tr>
<tr>
<td class="left">Spotify</td>
<td class="left">high</td>
<td class="left">$0</td>
<td class="right">$120/year</td>
</tr>
<tr>
<td class="left">Songza</td>
<td class="left">high</td>
<td class="left">$0</td>
<td class="right">$0</td>
</tr>
<tr>
<td class="left">iTunes Match</td>
<td class="left">high</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">Grooveshark</td>
<td class="left">high</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">SiriusXM web stream</td>
<td class="left">high</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">sky.fm / di.fm</td>
<td class="left">high</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">Amazon's streaming thing</td>
<td class="left">medium</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">USB drive</td>
<td class="left">medium</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">broadcast FM radio</td>
<td class="left">low</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">carefully chosen fillings</td>
<td class="left">low</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
<tr>
<td class="left">crappy Xfinity Music channels</td>
<td class="left">low</td>
<td class="left"> </td>
<td class="right"> </td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Notes on some of the options</h2>
<div class="outline-text-2" id="text-2">
</div><div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1">Pandora</h3>
<div class="outline-text-3" id="text-2-1">
<ul class="org-ul">
<li>7/11 in the room use it
</li>
<li>phone, Roku, TiVo, computer
</li>
<li>channels, not tracks
</li>
<li>doesn't make you post crap to Facebook
</li>
<li>iPhone app
</li>
<li>audio ads unless you pay the nominal fee, you cheap bastard
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2">Songza</h3>
<div class="outline-text-3" id="text-2-2">
<ul class="org-ul">
<li>2/11 in the room use it
</li>
<li>phone, computer
</li>
<li>curated (by music experts) playlists
</li>
<li>iPhone app
</li>
<li>no audio ads
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3">Spotify</h3>
<div class="outline-text-3" id="text-2-3">
<ul class="org-ul">
<li>with some effort you don't have to post everything to Facebook
</li>
<li>computer, phone for premium
</li>
<li>tracks and playlists
</li>
<li>internet available playlists
</li>
<li>free on computer, 30 day premium trial, phone is not free
</li>
<li>banner and audio ads on computer
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-2-4" class="outline-3">
<h3 id="sec-2-4">last.fm</h3>
<div class="outline-text-3" id="text-2-4">
<ul class="org-ul">
<li>not actively used by many people here (1ish/11)
</li>
<li>you can SCROBBLE!!!
<ul class="org-ul">
<li>Rokus, Spotify, Pandora, and others can send listening habit
data to last.fm, which last.fm can then use to suggest
concerts, rat you out to the NSA, suggest stations, and make
you new BFFs nearby
</li>
</ul>
</li>
<li>has stations similar to Pandora, based on artist, song, etc.
</li>
<li>computer and phone
</li>
<li>tags
</li>
<li>graphics ads, no audio ads
</li>
</ul>
</div>
</div>
</div>
Why would 13.1 miles make you sad?2013-06-02T00:00:00+00:00http://acaird.github.io/running/2013/06/02/dxa2-2013<blockquote>
<p>
<i>Mom, why do some of the people running look sad?</i> – 4-year-old
spectator at 5.68 miles
</p>
</blockquote>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Hanging out at the start</h2>
<div class="outline-text-2" id="text-1">
<p>
I haven't run too many organized races—fewer than 10—but the
pre-race drill is always about the same for me: Not seeing anyone
I know, other than those people I came with; Waiting in line for
the bathroom; Wondering about who is fast, experienced, nervous,
hopeful, planning to race, or planning to finish.
</p>
<p>
After four 5k races, three 10k races, and two half-marathons, it's
still great to see the people at the start.
</p>
<p>
At one end of the spectrum are the super-fit, seemingly fast people
who never wear the tech shirt from the race, have really cool
sunglasses, and 13% body-fat.
</p>
<p>
At the other end of the spectrum are what I would call Normal
Americans: people who like the camaraderie of people doing the same
thing they are; people who wear the tech shirt from the race on race
day; people who have no expectation or plans of being anywhere other
than somewhere in the last three-quarters of the field; people with
30% body-fat<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup>.
</p>
<p>
And in the middle is everyone else, in all manner of dress, every
conceivable body type.
</p>
<p>
There is no "type" of person who runs these races. The people in
the running magazines and the cover of the sports sections and with
the stories on ESPN are the elite. Which is off-putting to lots of
people, naturally. The elite runners are the edge of human
athleticism, the top 0.0001%, a combination of evolution,
upbringing, physical prowess, mental ability, training, and luck.
By definition, you and I are unlikely to be them.
</p>
<p>
At any race, there are elite runners, because they like to win. And
then there is a cross-section of everyone else, slightly (but
only slightly) skewed toward the fit end of the spectrum.
</p>
<p>
The most elite runner and the probable last finisher pin their bibs
on the front of their shirts or shorts. The most elite runner and
the probable last finisher go to the bathroom twice before the race.
The most elite runner and the probable last finisher take off their
hats at the national anthem. The most elite runner and the probable
last finisher line up at the start, listen to the corny
announcements, and start moving when the gun (or, more likely, the
horn) sounds. The most elite runner and the last finisher both go
the same distance, hang out with old and new friends, and have just
as much fun.
</p>
<p>
Because why wouldn't both have just as much fun as each other?
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> 5.68 miles</h2>
<div class="outline-text-2" id="text-2">
<p>
Today's run was a lucky run. The weather was perfect and I had no
aches.
</p>
<p>
The perfect weather meant it was a nice day for a run but, more
importantly, it meant there would be lots of spectators. On the
route from Dexter to Ann Arbor along Huron River Drive are lots of
houses. They aren't that near each other, but there are lots of
them. And the good weather brought the families from those houses
to the end of their driveways to cheer.
</p>
<p>
I haven't asked too many runners about what they think about
spectators, but I love them. And the further I go, the loopier I
get, and the more I love them. It's a virtuous cycle of
exercise-induced loopy-ness and love.
</p>
<p>
Today, well before there would be much suffering or loopy-ness, at
5.68 miles from the start and 7.42 miles from the end, there was a
mom and her son—2 or 3 years-old—at the end of their driveway
cheering the runners on.
</p>
<p>
As I and the group I was with ran by, we all heard the little boy
say:
</p>
<blockquote>
<p>
<i>Mom, why do some of the people running look sad?</i>
</p>
</blockquote>
<p>
and, less than half-way there, we all laughed together.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Friends for 13.1 miles</h2>
<div class="outline-text-2" id="text-3">
<p>
That shared laughter at 5.68 miles is only one example of the
friends one makes over 13.1 miles or 5 kilometers or 10
kilometers.
</p>
<p>
Some of the friends are people you talk to: wryly wondering about
the wisdom of running some distance without being chased, but for
fun; saying excuse me when you turn into someone behind you;
sharing an encouraging word with someone who is passing you or whom
you are passing. If you see these folks at the end, they are the
people to whom you say "Thanks for the run, nice job!"
</p>
<p>
Some of the friends, though, are more imaginary: The girl in the
purple shirt and gray shorts you ran with from miles 2 to 6, before
you couldn't keep up any more, to whom you never spoke; The guy with
the Spartan helmet tattoo on his right tricep that you ran with for a
while, until he started to fade at mile 10, to whom you never spoke;
The man half-again your age that you caught at mile 12 and felt bad
passing out of respect for your elders and respect for anyone who
ran that fast for 12 miles, to whom you never spoke; The man in the
Vibram shoes and German-looking tech shirt who seemed to want to
race at the end, and who was a few seconds faster than you, and was
swallowed up in the finishing chute before you could thank him for
the good finish.
</p>
<p>
This experience has been true in every organized race I've ever
run. Half the reason I'm writing this down is so I can remember
it. The other half is to try to explain what happens over 3.1,
6.2, or 13.1 miles. I suppose it happens over 26.2 miles and every
distance shorter and longer than that, but I couldn't say for sure.
</p>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> Why this actually is fun</h2>
<div class="outline-text-2" id="text-4">
<p>
At the end of the race, whatever distance, everyone who ran it walks
around with their race bib still pinned to their shirt, their
finisher's medal around their neck. And everyone, other runners or
not, are happy and congratulatory and glad to be done and glad to
have their runners done and sweaty and proud and inspired.
</p>
<p>
Walking the few blocks back to my car after today's 13.1 miles,
finisher's medal around my neck, I passed a couple out for a Sunday
walk, and they said "Nice run!"
</p>
<p>
That is why this is fun. For all they knew, I was dead last,
walking my way across the finish. But, still, strangers who
weren't at the finish said "Nice run!". The camaraderie of
runners, their friends, and strangers nowhere near the finish is
fun.
</p>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
<a href="http://en.wikipedia.org/wiki/Body_fat_percentage">http://en.wikipedia.org/wiki/Body_fat_percentage</a>
</p></div>
</div>
</div>
Blogging with Emacs org-mode and Github Pages2013-05-24T00:00:00+00:00http://acaird.github.io/computers/2013/05/24/blogging-with-org-and-git<p>
Lately, I've become enamoured of writing things in text files and
avoiding proprietary formats as much as possible, but I still like to
be able to render the text into nicer formats for printing (PDF) or
sharing (HTML). There are many lightweight mark-up languages that
support this, <a href="http://daringfireball.net/projects/markdown/">markdown</a> and friends, but I prefer Emacs <a href="http://orgmode.org">org-mode</a> for
its power with including source code, literate programming, strong
LaTeX support, and table editing.
</p>
<p>
Not surprisingly, <code>org-mode</code> has a powerful publishing facility that,
when combined with <a href="http://jekyllrb.com/">Jekyll</a> and <a href="http://github.com">GitHub</a>, can produce a pretty
reasonable blog with categories, tags, archives, and comments, all
separate from the content.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Setting up Publishing and Source locations</h2>
<div class="outline-text-2" id="text-1">
<p>
Because the source material in <code>org-mode</code> can contain comments,
code-snippets you don't want published, or other content intended
for the author and collaborators, I keep the source separate from
the publishing location.
</p>
</div>
<div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1">Publishing: Jekyll and github.io</h3>
<div class="outline-text-3" id="text-1-1">
<p>
Although we won't use this content for a little while, it's
important to make sure this works before getting too far. Also,
the initial push of content to GitHub can take a little while to
be recognized by GitHub, so this will give GitHub some time while
we do other things.
</p>
<p>
To set up a <i>username</i>.github.io web page<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup> you need to get
a "blank" Jekyll set up and push it to GitHub. Later we'll
hard-code the local location of this repository in your <code>.emacs</code>
file, so you might as well choose a good location for it now. On
my MacOS computer, I have it in <code>~/Documents/</code>.
</p>
<p>
The following commands, plus one manual edit,
</p>
<ol class="org-ol">
<li>clone a blank Jekyll instance into the directory
<code>username.github.io</code>
</li>
<li>go into your new directory
</li>
<li>set the git repository to push to and pull from your GitHub
account
</li>
<li>(the manual edit) open the file called <code>_config.yml</code> in
that directory and fill in the fields that need to be
filled in
</li>
<li>push the default Jekyll site as a repository named
<i>username</i>.github.io to your GitHub account
</li>
</ol>
<pre class="example">
git clone https://github.com/plusjade/jekyll-bootstrap.git ${USER}.github.io
cd ${USER}.github.io
git remote set-url origin git@github.com:${USER}/${USER}.github.io.git
(edit _config.yml)
git push origin master
</pre>
<p>
If you wait a little while and then go to <a href="http://username.github.io">http://username.github.io</a>
(replacing <i>username</i> with your GitHub user name) you'll see the
default Jekyll pages. Don't worry, you'll replace them soon enough.
</p>
</div>
</div>
<div id="outline-container-sec-1-2" class="outline-3">
<h3 id="sec-1-2">Source: org-mode files</h3>
<div class="outline-text-3" id="text-1-2">
<p>
As mentioned, I like to keep my source files separate from the
published content, because <code>org-mode</code> can selectively export
during the publishing step, but anyone can read a text file.
</p>
<p>
I keep my source content in another Git repository, although I
keep that one at <a href="http://bitbucket.org">BitBucket</a> because they let me have private
repositories at no cost. You could also just keep them on your
hard drive as files, or in a local revision control system.
Anywhere that can be seen by Emacs at the same time as the
Git repository <code>username.github.io</code> will work.
</p>
<p>
For the sake of the examples below, I have all of the blog
postings in one directory (<code>blog/</code>) and all of the images in an
<code>images</code> directory in the <code>blog</code> directory (<code>blog/images/</code>).
As with the location of the cloned <code>username.github.io</code> directory,
these paths will be hard-coded in your <code>.emacs</code> file, so you
should choose a good location for them now.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Writing Blog Postings</h2>
<div class="outline-text-2" id="text-2">
<p>
Writing a blog posting is like writing any other document in
<code>org-mode</code> (which is what makes it so great), but there are three
details to which you should pay attention.
</p>
</div>
<div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1">The Jekyll Header</h3>
<div class="outline-text-3" id="text-2-1">
<p>
Jekyll, which is the formatting engine at GitHub that turns
the raw HTML from the <code>org-mode</code> export into formatted web pages,
does not use Org's <code>#+TITLE:</code>, <code>#+DESCRIPTION:</code>, or other export
template fields; it uses its own. However, Org does share those values, and
you can reference them by including things like <code>{ {
{title} } }</code> or <code>{ { {keywords} } }</code> (but without the extra spaces;
those are there to prevent expansion in my example)<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup>. <i>NB: This
expansion doesn't seem to work in Org-mode version 8.</i> The options
are described in the <a href="http://jekyllrb.com/docs/frontmatter/">Jekyll documentation</a> and must be exported when
Org exports as HTML. At the top of this document is a block that
looks like:
</p>
<pre class="example">
#+BEGIN_HTML
---
layout: page
title: { { {title} } }
tagline: Blogging the way you should
categories: computers
tags: web org-mode emacs org blog github bitbucket git
---
#+END_HTML
</pre>
<p>
The <code>layout</code> field is worth looking at. It has two options: <code>page</code>
and <code>post</code>. <code>page</code> makes it a standard web page—there are no
dates, comments, or other sort of bloggy things. <code>post</code> makes it a
more bloggy post, enabling comments if you have them enabled in
<code>_config.yml</code> and adding a date near the top of the file and
<i>Previous</i>, <i>Archive</i>, <i>Next</i> buttons at the bottom of the page.
</p>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2">Content</h3>
<div class="outline-text-3" id="text-2-2">
<p>
Once you have the Jekyll header (which, frankly, is a lot easier
to do than the preceding section merits), the rest of your
document is standard <code>org-mode</code> paragraphs, sections, links,
embedded code, etc.
</p>
<p>
The layout on the GitHub site will take the first paragraph or
section and put that in the index, saving the rest for the
click-through; I think it looks nice to have a paragraph before
the first section that your reader will see on the main page,
instead of a table of contents or a section heading and the first
part of the section.
</p>
<p>
The filename needs to be of the format
<code>YYYY-MM-DD-the-actual-title</code> for accurate parsing by Jekyll.
</p>
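<p>
Here is a small Python sketch of that naming rule, in case you'd
rather generate the name than type it. The function names are mine,
not part of Jekyll or <code>org-mode</code>, and the check only looks at the
shape of the name; it doesn't confirm that the date is a real one.
</p>
<div class="org-src-container">
<pre class="src src-python">import datetime
import re

def jekyll_post_name(title, day):
    """Return YYYY-MM-DD-the-actual-title for a post title and a date."""
    # Lower-case the title and turn each run of other characters into one hyphen
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return '%s-%s' % (day.strftime('%Y-%m-%d'), slug)

def looks_like_post_name(name):
    """True when name has a date-shaped YYYY-MM-DD- prefix and then a title."""
    return re.match(r'^\d{4}-\d{2}-\d{2}-.+$', name) is not None
</pre>
</div>
<p>
The result is the name without an extension, so the source file for a
post would be that name followed by <code>.org</code>.
</p>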
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3">Images and Other Assets</h3>
<div class="outline-text-3" id="text-2-3">
<p>
The publishing template that will be in your <code>.emacs</code> file will
move images around between your source content and the publishing
location. This can lead to some confusion between standard HTML
rendering, publishing to the Jekyll location, and LaTeX or other
export formats.
</p>
<p>
Jekyll can support images in its <code>assets/</code> directory that are referenced in
the blog posts like <code><img src=/assets/img-name.jpg /></code>, but the
"root-ed" path doesn't work well with normal <code>org-mode</code> exports.
</p>
<p>
While there are likely more elegant solutions, I set up different
export options depending on the type of export. For example:
</p>
<pre class="example">
#+BEGIN_HTML
<img src=/assets/YYYY-MM-DD-picture-name.jpg />
#+END_HTML
#+BEGIN_LATEX
\includegraphics[width=0.8\textwidth]{images/YYYY-MM-DD-picture-name.jpg}
#+END_LATEX
</pre>
<p>
Although the image file names don't need the <code>YYYY-MM-DD</code> format
the same way, they help me keep things organized.
</p>
<p>
Other files you'd like to reference can also be put in the
<code>/assets</code> directory.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Publishing Blog Postings</h2>
<div class="outline-text-2" id="text-3">
<p>
With a written blog posting, you can remove the examples from
<code>username.github.io/_posts</code> and leave the empty directory. It will
be re-populated when you publish your blog post.
</p>
</div>
<div id="outline-container-sec-3-1" class="outline-3">
<h3 id="sec-3-1">Configuring your .emacs file</h3>
<div class="outline-text-3" id="text-3-1">
<p>
<code>org-mode</code> has a "publish" facility<sup><a id="fnr.3" name="fnr.3" class="footref" href="#fn.3">3</a></sup> that is configured in
your <code>.emacs</code> file in two sections, one for the rendered HTML and
one for the static content (usually images).
</p>
<div class="org-src-container">
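<pre class="src src-emacs-lisp">;; org-mode version 8 and on ships this library as ox-publish;
;; before version 8 it was called org-publish
(require 'ox-publish)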
(setq org-publish-project-alist
<span id="coderef-name1" class="coderef-off"> '(("org-acaird" (name1)</span>
;; Path to your org files.
<span id="coderef-srcdir" class="coderef-off"> :base-directory "~/Documents/blog" (srcdir)</span>
<span id="coderef-extension" class="coderef-off"> :base-extension "org" (extension)</span>
;; Path to your Jekyll project.
<span id="coderef-destination" class="coderef-off"> :publishing-directory "~/Documents/acaird.github.io/_posts" (destination)</span>
:recursive t
;; this was for org-mode pre-version 8
;;:publishing-function org-publish-org-to-html
;; this is for org-mode version 8 and on
:publishing-function org-html-publish-to-html
:headline-levels 4
:html-extension "html"
<span id="coderef-body-only" class="coderef-off"> :body-only t ;; Only export section between <body> </body> (body-only)</span>
)
<span id="coderef-name2" class="coderef-off"> ("org-static-acaird" (name2)</span>
<span id="coderef-imgsrc" class="coderef-off"> :base-directory "~/Documents/blog/images" (imgsrc)</span>
<span id="coderef-imgext" class="coderef-off"> :base-extension "css\\|js\\|png\\|jpg\\|gif\\|pdf\\|mp3\\|ogg\\|swf\\|php" (imgext)</span>
<span id="coderef-imgdest" class="coderef-off"> :publishing-directory "~/Documents/acaird.github.io/assets" (imgdest)</span>
:recursive t
:publishing-function org-publish-attachment)
<span id="coderef-combo" class="coderef-off"> ("blog" :components ("org-acaird" "org-static-acaird")) (combo)</span>
))
</pre>
</div>
<p>
While this is mostly readable, there are a few things to point out
that you might want to edit.
</p>
<ul class="org-ul">
<li>Line <a href="#coderef-name1"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-name1');" onmouseout="CodeHighlightOff(this, 'coderef-name1');">name1</a> defines the name of the rendered section; it doesn't
much matter what it is called because it is included in the
definition of what you'll usually use (in line <a href="#coderef-combo"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-combo');" onmouseout="CodeHighlightOff(this, 'coderef-combo');">combo</a>)
</li>
<li>Line <a href="#coderef-srcdir"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-srcdir');" onmouseout="CodeHighlightOff(this, 'coderef-srcdir');">srcdir</a> defines the source directory; this is where your
<code>.org</code> files go
</li>
<li>Line <a href="#coderef-extension"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-extension');" onmouseout="CodeHighlightOff(this, 'coderef-extension');">extension</a> sets the extensions that the export will
consider; this is set to only look at files ending in <code>.org</code>, but
you can add to it following the pattern in line <a href="#coderef-imgext"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgext');" onmouseout="CodeHighlightOff(this, 'coderef-imgext');">imgext</a> if you
use files that end in <code>.txt</code> or something else
</li>
<li>Line <a href="#coderef-destination"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-destination');" onmouseout="CodeHighlightOff(this, 'coderef-destination');">destination</a> defines where the <code>.html</code> files will be written
on export
</li>
<li>Line <a href="#coderef-name2"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-name2');" onmouseout="CodeHighlightOff(this, 'coderef-name2');">name2</a> is similar to line <a href="#coderef-name1"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-name1');" onmouseout="CodeHighlightOff(this, 'coderef-name1');">name1</a>, but it defines the name
for the rules that handle static items (images and other things
not processed by <code>org-mode</code>)
</li>
<li>Line <a href="#coderef-imgsrc"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgsrc');" onmouseout="CodeHighlightOff(this, 'coderef-imgsrc');">imgsrc</a> is the directory where the images are; it is the
analog of line <a href="#coderef-srcdir"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-srcdir');" onmouseout="CodeHighlightOff(this, 'coderef-srcdir');">srcdir</a>
</li>
<li>Line <a href="#coderef-imgext"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgext');" onmouseout="CodeHighlightOff(this, 'coderef-imgext');">imgext</a> defines the extensions of the files that will be
moved from the directory defined in line <a href="#coderef-imgsrc"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgsrc');" onmouseout="CodeHighlightOff(this, 'coderef-imgsrc');">imgsrc</a> to the directory
defined in line <a href="#coderef-imgdest"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgdest');" onmouseout="CodeHighlightOff(this, 'coderef-imgdest');">imgdest</a>
</li>
<li>Line <a href="#coderef-combo"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-combo');" onmouseout="CodeHighlightOff(this, 'coderef-combo');">combo</a> combines the rendered and static sections into one
name (in this case, <code>blog</code>) to use for exporting
</li>
</ul>
<p>
Once your <code>.emacs</code> file has those lines in it and they are
evaluated, you can type <code>C-c C-e</code> to bring up the export menu.
From the menu, choose <code>X</code> and when prompted, enter the name
defined in line <a href="#coderef-combo"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-combo');" onmouseout="CodeHighlightOff(this, 'coderef-combo');">combo</a> above.
</p>
<p>
That will export the HTML to the <code>username.github.io/_posts</code>
directory (as defined in line <a href="#coderef-destination"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-destination');" onmouseout="CodeHighlightOff(this, 'coderef-destination');">destination</a> above) and copy the
images to the destination defined in line <a href="#coderef-imgdest"class="coderef" onmouseover="CodeHighlightOn(this, 'coderef-imgdest');" onmouseout="CodeHighlightOff(this, 'coderef-imgdest');">imgdest</a> above.
</p>
<p>
Once the files are in the proper locations in the
<code>username.github.io</code> directory, they need to be added to the
repository and pushed to GitHub for publishing.
</p>
</div>
</div>
<div id="outline-container-sec-3-2" class="outline-3">
<h3 id="sec-3-2">Using git commands to publish</h3>
<div class="outline-text-3" id="text-3-2">
<p>
To publish your newly created HTML files, go to your
<code>username.github.io</code> directory and <code>git add</code> and <code>git commit</code> your
new files and <code>git push</code> them to GitHub.
</p>
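<p>
If you publish often, the three git steps can be wrapped in a few
lines of Python. This is only a sketch: the function names are made up
for this example, <code>git add --all</code> is my assumption about which
files you want staged, and <code>origin</code> and <code>master</code> are the remote
and branch used in the commands earlier in this post.
</p>
<div class="org-src-container">
<pre class="src src-python">import subprocess

def publish_commands(message):
    """Return the git commands, in order, that publish the exported files."""
    return [
        ['git', 'add', '--all'],
        ['git', 'commit', '-m', message],
        ['git', 'push', 'origin', 'master'],
    ]

def publish(repo_dir, message):
    """Run the commands inside repo_dir, stopping at the first failure."""
    for command in publish_commands(message):
        subprocess.check_call(command, cwd=repo_dir)
</pre>
</div>
<p>
Calling <code>publish</code> with the path to your <code>username.github.io</code>
directory and a commit message runs the three commands in order.
</p>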
<p>
After waiting a few minutes for GitHub to process your file,
you'll see the title and the first bit of text at
<a href="http://username.github.io">http://username.github.io</a> and clicking the title will show you the
full post.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Advanced Topics</h2>
<div class="outline-text-2" id="text-4">
</div><div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1">Themes</h3>
<div class="outline-text-3" id="text-4-1">
<p>
Jekyll supports themes to change the look of your blog pages.
There are some sample themes at <a href="http://themes.jekyllbootstrap.com/">http://themes.jekyllbootstrap.com/</a>
with installation instructions.
</p>
<p>
Jekyll themes are installed into your <code>username.github.io</code>
directory and repository and when it is pushed to GitHub, the
theme you have chosen is applied.
</p>
<p>
Jekyll uses <code>rake</code> (Ruby mAKE) to install and manage themes;
<code>rake</code> comes with MacOS and, likely, with Linux. For Windows you
might have to install Ruby.
</p>
<p>
To install the theme called <code>the-minimum</code> from
<a href="http://themes.jekyllbootstrap.com/">http://themes.jekyllbootstrap.com/</a>, follow these steps
</p>
<ol class="org-ol">
<li><code>cd</code> into your <code>username.github.io</code> directory
</li>
<li>type <code>rake theme:install git="https://github.com/jekyllbootstrap/theme-the-minimum.git"</code>
</li>
<li>commit the new theme: <code>git commit -a</code>
</li>
<li>push the repository to GitHub, wait 5 minutes, and reload your
pages to see the new theme.
</li>
</ol>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2">Comments</h3>
<div class="outline-text-3" id="text-4-2">
<p>
Jekyll can use <a href="http://www.disqus.com">Disqus</a> to support blog comments. This is described
in the <code>_config.yml</code> file in your <code>username.github.io</code> directory.
</p>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3">Google Analytics</h3>
<div class="outline-text-3" id="text-4-3">
<p>
At <i>www.google.com/analytics</i> you can set up an analytics page for
your <code>github.io</code> pages; the end of the set-up results in a
tracking ID that is a 13 character (at least, mine is) string.
Add that string to the <code>analytics:</code> section of <code>_config.yml</code>
and push the changes to GitHub. After a few days, if anyone looks
at your blog, you'll see some nice statistics at Google.
</p>
<pre class="example">
analytics :
  provider : google
  google :
      tracking_id : 'AA-12345678-9'
</pre>
</div>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
Most of this came from <a href="http://jekyllbootstrap.com">http://jekyllbootstrap.com</a>
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> <p class="footpara">
More documentation on this in <code>org-mode</code> is at
<a href="http://orgmode.org/org.html#Macro-replacement">http://orgmode.org/org.html#Macro-replacement</a>
</p></div>
<div class="footdef"><sup><a id="fn.3" name="fn.3" class="footnum" href="#fnr.3">3</a></sup> <p class="footpara">
More on the <code>org-mode</code> publishing is at
<a href="http://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html">http://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html</a>
</p></div>
</div>
</div>
Using Python to Update Google Sites Pages2013-05-22T00:00:00+00:00http://acaird.github.io/computers/2013/05/22/gsites-with-python<p>
<a href="http://sites.google.com">Google Sites</a> is a popular host for web pages because of its cost
(nothing), its integration with Google's suite of productivity tools,
and its ease of use. To support automated updates of web pages and
other administrative functions, Google offers a programmatic
interface (API) to its web-based tools, called <a href="https://developers.google.com/gdata/docs/directory">GData</a>. Following is
an example of authenticating to Google and updating a page on a
Google Sites website.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Why update Google Sites pages with Python?</h2>
<div class="outline-text-2" id="text-1">
<p>
Many people have run their own local web servers since the dawn of the
web, and took advantage of their ownership of those web servers by
having programmatically updated web pages within their website.
This was often done by automatically generated HTML pages,
server-side includes, or local scripts run by the web server.
</p>
<p>
Now, however, many people are more interested in using cloud-based
services for web pages—they leave all of the operations to
someone else, often they can scale better than a local webserver
could be scaled, and they often have very friendly interfaces that
allow for updates by people who are not fluent in HTML or other web
technologies. A common choice for web pages that are hosted on
someone else's web server is Google Sites.
</p>
<p>
Google Sites are excellent for creating web pages with rich content
(videos, images, text) and controlling access to that content.
Local scripts or server-side includes are not permitted, but it is
possible to programmatically update Google Sites pages.
</p>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> How to update Google Sites pages with Python</h2>
<div class="outline-text-2" id="text-2">
<p>
Google provides a library of tools called GData<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup> that allows
computer programs to read data from and write data to many of the
Google sites. The GData libraries are available in several
languages (for more information see
<a href="https://developers.google.com/gdata/">https://developers.google.com/gdata/</a>), but the easiest for me to
use was Python, even though I don't really know how to program in
Python.
</p>
</div>
<div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1"><span class="section-number-3">2.1</span> Installing GData</h3>
<div class="outline-text-3" id="text-2-1">
<p>
First, I got the GData Python client library from
<a href="https://developers.google.com/gdata/">https://developers.google.com/gdata/</a> and installed it in my home
directory by finding the <code>setup.py</code> in the GData distribution and
typing the command:
</p>
<pre class="example">
python setup.py install --home=~/python/
</pre>
<p>
I also ran the included tests to make sure it was all working.
</p>
<p>
GData comes with everything you need to work programmatically with
information at Google.
</p>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2"><span class="section-number-3">2.2</span> Creating an API Project</h3>
<div class="outline-text-3" id="text-2-2">
<p>
In order for your Python program to talk to Google, you need to
create an API Client ID, which you can do for free at
<a href="https://code.google.com/apis/console">https://code.google.com/apis/console</a>. An API Client ID will give
you a <i>Client ID</i> and a <i>Client secret</i>, both of which you'll need
in your Python program.
</p>
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3"><span class="section-number-3">2.3</span> The Beginning of my Python program</h3>
<div class="outline-text-3" id="text-2-3">
<p>
To get started, I imported the Python libraries I knew I'd need.
I learned about the required <code>gdata</code> libraries from the API
documentation.
</p>
<div class="org-src-container">
<pre class="src src-python" id="imports">import sys
import os
import time
# adjust the next line for your installation of gdata
sys.path.append('/Users/acaird/python/lib/python')
import atom.data
import gdata.sites.client
import gdata.sites.data
import gdata.gauth
</pre>
</div>
<p>
This block of code imports the standard Python libraries <code>sys</code>, <code>os</code>,
and <code>time</code>, and you'll see those used later (in the case of <code>sys</code>,
not too much later).
</p>
<p>
Next I use the <code>sys</code> library to tell Python where I installed the
<code>gdata</code> library with the <code>sys.path.append</code> function. You will
almost certainly want to edit that. You can also use the
<code>PYTHONPATH</code> environment variable.
</p>
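<p>
A slightly more defensive version of that <code>sys.path.append</code> line
might look like the sketch below. <code>add_library_path</code> is a name I
made up; it only adds the directory when it exists and is not already
on the path, and it tells you whether it did.
</p>
<div class="org-src-container">
<pre class="src src-python">import os
import sys

def add_library_path(path):
    """Append path to sys.path if it is a directory not already listed."""
    path = os.path.expanduser(path)
    if os.path.isdir(path) and path not in sys.path:
        sys.path.append(path)
        return True
    return False
</pre>
</div>
<p>
A <code>False</code> result is a hint that the library is not where you think
it is, which is easier to debug than a failed <code>import</code> later on.
</p>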
<p>
Once the program can find the <code>gdata</code> libraries, I import the ones
the documentation says I'll need.
</p>
<p>
At this point, I have all of the tools I need.
</p>
</div>
</div>
<div id="outline-container-sec-2-4" class="outline-3">
<h3 id="sec-2-4"><span class="section-number-3">2.4</span> Authorization to edit pages</h3>
<div class="outline-text-3" id="text-2-4">
<p>
The next block of source code handles the authorization of the
program to make changes to a Google Sites page. The authorization
is done using OAuth, an open standard and one that is well
supported in the GData library<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup>. The flow of the code is:
</p>
<ol class="org-ol">
<li>Set a location for cached credentials
</li>
<li>Try to open the file in that location
<ol class="org-ol">
<li>If the file can be opened, try to read a <code>gauth</code> token from
the file
</li>
<li>If the file cannot be opened, set the token to <code>None</code>
</li>
</ol>
</li>
<li>If there isn't a token, talk to Google to get one.
This process prints a URL to follow for authorization, asks for
the code that the authorization page returns, authorizes the
client (this program, via the variable <code>client</code>), and then
saves the credentials.
</li>
<li>If there is a token, it is used to authorize the client
</li>
</ol>
<p>
In this case, the client secret isn't a secret.<sup><a id="fnr.3" name="fnr.3" class="footref" href="#fn.3">3</a></sup> The
<code>user_agent</code> can be anything meaningful to you so you can look at the
logs and see when your Python program changed your web content and
when a person changed it.
</p>
<p>
You'll notice in this code block we create the variable <code>client</code>; in
that creation we also select the Google Site we want to edit, which in this
case confusingly has the same name as I do, <code>acaird</code>. I
suspect, but don't know for sure, you could read the sites (as below)
and select from a list programmatically. In my case I know the name
of the site I want to update, so I just typed it in.
</p>
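<p>
I haven't tried it against the API, but choosing the site from that
list could be as simple as this sketch. <code>choose_site</code> is a
hypothetical helper, not part of GData; it works on plain
(title, site name) pairs, and the sample data is two of the sites from
my own account.
</p>
<div class="org-src-container">
<pre class="src src-python">def choose_site(sites, wanted_title):
    """Return the site name paired with wanted_title, or None if absent."""
    for title, site_name in sites:
        if title == wanted_title:
            return site_name
    return None

# With a live feed the pairs would come from the same attributes that
# the listing code in this post prints (not verified here):
#   sites = [(e.title.text, e.site_name.text) for e in feed.entry]
sites = [('acaird', 'acaird'), ('CD Squared Project', 'umcdsquared')]
site = choose_site(sites, 'CD Squared Project')
</pre>
</div>
<p>
The value in <code>site</code> is what would be passed as the <code>site</code> argument
when creating <code>client</code>.
</p>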
<p>
The <code>scope</code> in the <code>gdata.gauth.OAuth2Token</code> function call is
specific for Google Sites. For a list of other scopes, see
<a href="http://googlecodesamples.com/oauth_playground/">http://googlecodesamples.com/oauth_playground/</a>.
</p>
<p>
<b>WARNING</b>: The file to which the token is written is important. It
should be protected, or removed if you aren't certain it can be kept
safe.
</p>
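<p>
One way to protect the file is to create it so that only its owner can
read or write it. The sketch below uses only the Python standard
library; <code>write_private_file</code> is a hypothetical helper name, and the
permission bits mean what is described here on Unix-like systems such
as MacOS and Linux, not on Windows.
</p>
<div class="org-src-container">
<pre class="src src-python">import os

def write_private_file(path, text):
    """Write text to path, leaving it readable and writable by its owner only."""
    # Create (or truncate) the file with mode 0600 from the start
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    # A file that already existed keeps its old mode, so tighten it explicitly
    os.chmod(path, 0o600)
</pre>
</div>
<p>
A helper like this could take the place of a plain <code>open</code> and
<code>write</code> wherever the token is saved.
</p>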
<div class="org-src-container">
<pre class="src src-python"># Python 2 syntax (print statements, raw_input).
# MyClientId and MyClientSecret are the Client ID and Client secret
# from the API console; they must be defined before this point.
token_cache_path=os.environ['HOME']+'/.gdata-storage'
print "Token Cache: %s" % token_cache_path
try:
    with open(token_cache_path, 'r') as f:
        saved_blob_string=f.read()
    # An empty file holds no token, so treat it like a missing file
    if saved_blob_string:
        token = gdata.gauth.token_from_blob(saved_blob_string)
    else:
        token = None
except IOError:
    token = None
if token is None:
    print "Getting a new token."
    token = gdata.gauth.OAuth2Token( client_id=MyClientId,
                                     client_secret=MyClientSecret,
                                     scope='https://sites.google.com/feeds/',
                                     user_agent='acaird-acexample-v1')
    url = token.generate_authorize_url(redirect_uri='urn:ietf:wg:oauth:2.0:oob')
    print 'Please go to the URL below and authorize this '
    print 'application, then enter the code it gives you.'
    print ' %s' % url
    code = raw_input("Code: ")
    token.get_access_token(code)
    client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
    token.authorize(client)
    saved_blob_string = gdata.gauth.token_to_blob(token)
    # Save the token for the next run; the with block closes the file
    with open(token_cache_path, 'w') as f:
        f.write(saved_blob_string)
else:
    print "Using a cached token from %s" % token_cache_path
    client = gdata.sites.client.SitesClient(source='acaird-acexample-v1', site='acaird')
    token.authorize(client)
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-2-5" class="outline-3">
<h3 id="sec-2-5"><span class="section-number-3">2.5</span> Reading data from Google Sites</h3>
<div class="outline-text-3" id="text-2-5">
<div class="org-src-container">
<pre class="src src-python">feed = client.GetSiteFeed()
print 'Google Sites associated with your account: '
counter = 0
for entry in feed.entry:
    print ' %i %s (%s)' % (counter,entry.title.text, entry.site_name.text)
    counter = counter + 1
print ' --- The End ---'
</pre>
</div>
<p>
This section of code, when run on my account, produces this output:
</p>
<pre class="example">
Google Sites associated with your account:
0 acaird (acaird)
1 CD Squared Project (umcdsquared)
2 U-M GPR Project (umichgpr)
3 ORCI Project Site (umorciprojectsite)
4 UM Projects (umprojectstruthkos)
--- The End ---
</pre>
<p>
Since we already selected the <code>acaird</code> Google Site when we
initialized <code>client</code>, we can start fetching content from it.
</p>
<p>
I'm not sure what most of the code below does, but at the end, <code>old</code>
contains the HTML of the first webpage in the <code>acaird</code> Google Site,
which was my goal.
</p>
<div class="org-src-container">
<pre class="src src-python">kind = 'webpage'
print 'Fetching only %s entries' % kind
print "Fetching content feed of '%s'...\n" % client.site
# Build a content-feed URI restricted to entries of this kind;
# one fetch is enough, the earlier repeated fetches were redundant
uri = '%s?kind=%s' % (client.MakeContentFeedUri(), kind)
feed = client.GetContentFeed(uri=uri)
# The first entry in the filtered feed is the page to update
old=feed.entry[0]
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-2-6" class="outline-3">
<h3 id="sec-2-6"><span class="section-number-3">2.6</span> Writing to a Google Sites Page</h3>
<div class="outline-text-3" id="text-2-6">
<p>
To make sure we're really updating the web page, I record the current
date and time, so the output on this screen can be compared with what
ends up in the web page.
</p>
<div class="org-src-container">
<pre class="src src-python"># Use a new name so the time module is not overwritten
now = time.asctime()
print "Time: %s" % now
</pre>
</div>
<p>
Then I create some new HTML, stored in <code>old.content.html</code>, which I
could print out, but I've commented out that line.
</p>
<p>
Then I call <code>client.Update</code> with the <code>feed.entry</code> in <code>old</code> to update
the page.
</p>
<div class="org-src-container">
<pre class="src src-python">old.content.html = '''
<html:div xmlns:html="http://www.w3.org/1999/xhtml">
<html:table cellspacing="0" border="1"
class="sites-layout-name-one-column sites-layout-hbox">
<html:tbody>
<html:tr>
<html:td class="sites-layout-tile sites-tile-name-content-1">
<html:div dir="ltr">&#160;This is my web page.
It was last updated on %s by <kbd>%s</kbd><br />
</html:div>
</html:td>
</html:tr>
</html:tbody>
</html:table>
</html:div>
''' % (now,sys.argv[0])
# print old.content.html
updated_entry = client.Update(old)
print 'Web page updated.'
</pre>
</div>
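<p>
One caution with this kind of substitution: a value that contains an
ampersand or an angle bracket would be inserted into the page as
markup. Below is a minimal sketch of escaping the values first, using
the standard library's <code>xml.sax.saxutils.escape</code>; <code>fill_template</code>
is a hypothetical helper name, not something from GData.
</p>
<div class="org-src-container">
<pre class="src src-python">from xml.sax.saxutils import escape

def fill_template(template, *values):
    """Substitute values into template after escaping XML special characters."""
    return template % tuple(escape(str(v)) for v in values)
</pre>
</div>
<p>
The timestamp and the script name would be passed as the values, in
the same order as the <code>%s</code> markers in the template.
</p>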
</div>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
<a href="http://en.wikipedia.org/wiki/GData">http://en.wikipedia.org/wiki/GData</a>
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> <p class="footpara">
<a href="https://developers.google.com/api-client-library/python/guide/aaa_oauth">https://developers.google.com/api-client-library/python/guide/aaa_oauth</a>
is a good reference for using the Python library version of GData's OAuth.
</p></div>
<div class="footdef"><sup><a id="fn.3" name="fn.3" class="footnum" href="#fnr.3">3</a></sup> <p class="footpara">
According to
<a href="https://developers.google.com/accounts/docs/OAuth2#installed">https://developers.google.com/accounts/docs/OAuth2#installed</a> "The
client_id and client_secret obtained during registration are embedded
in the source code of your application. In this context, the
client_secret is obviously not treated as a secret."
</p></div>
</div>
</div>
Bourbon with Rob2013-05-14T00:00:00+00:00http://acaird.github.io/drinking/2013/05/14/mi-bourbon-with-rob<p>
Rob and his family came over tonight with a daughter who needed her
ankle looked at by an orthopaedic surgeon (not me) and a bottle of
Michigan-made bourbon (much more me).
</p>
<p>
He brought Traverse City Whiskey Co. Bourbon to compare to my bottle
of Michigan-made bourbon—New Holland Artisan Spirits Beer Barrel
Bourbon.
</p>
<img src=/assets/2013-05-14-michigan-bourbon.jpg align="right" />
<p>
We tried the New Holland Bourbon first. It had a pleasant sweetness
but a moderately sharp alcohol bite at first.
</p>
<p>
The Traverse City Whiskey Co. Bourbon was much less sweet and much
less sharp than the New Holland. It wasn't as complex, but was very
easy to drink.
</p>
<p>
Rob has a history of "blind taste tests of alcoholic drinks", so we
thought we'd try a simple version. I went into the living room and
he poured me a glass of one of the two. With a 50/50 chance, I tried
to taste for either the bite and sweetness of the New Holland or the
smoothness of the Traverse City. I guessed correctly that
he had poured me the New Holland.
</p>
Org in 4 Sections (plus a demo)2013-03-29T00:00:00+00:00http://acaird.github.io/computers/2013/03/29/intro-to-org<p>
Using Emacs org-mode to edit text files (as opposed to its intended purpose
as a task organizer) makes for very civilized files.
</p>
<p>
First, they are text files and can be emailed, read on any computer without
much software, <code>grep</code>'d, <code>sed</code>'d, included in a source code revision control
system (RCS, Hg, Git, SVN, etc.), and compressed a lot.
</p>
<p>
Second, org-mode has excellent support for output formats, dealing especially
well with HTML and LaTeX.
</p>
<p>
Third, the markup for basic editing can be learned very quickly; that's what
we'll do here.
</p>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Getting Org</h2>
<div class="outline-text-2" id="text-1">
</div><div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1"><span class="section-number-3">1.1</span> Emacs</h3>
<div class="outline-text-3" id="text-1-1">
<p>
Org is an Emacs mode, so you need Emacs and the Org package.
</p>
<dl class="org-dl">
<dt> Windows </dt><dd>get emacs from <a href="http://ftp.gnu.org/gnu/emacs/windows/">ftp.gnu.org/gnu/emacs/windows/</a>
</dd>
<dt> Mac </dt><dd>Use Mac Ports (<a href="http://www.macports.org">macports.org</a>) (<code>port install emacs</code>)
</dd>
<dt> Linux </dt><dd>get emacs via your favorite package manager
</dd>
</dl>
</div>
</div>
<div id="outline-container-sec-1-2" class="outline-3">
<h3 id="sec-1-2"><span class="section-number-3">1.2</span> Org-mode</h3>
<div class="outline-text-3" id="text-1-2">
<dl class="org-dl">
<dt> Windows </dt><dd>The Org-mode files come with the Emacs installation from
<code>ftp.gnu.org</code>
</dd>
<dt> Mac </dt><dd>Use Mac Ports and install the <code>org-mode</code> package (<code>port install org-mode</code>)
</dd>
<dt> Linux </dt><dd>Either the Org-mode files will come with the Emacs you install,
or you can get them from the package manager
</dd>
</dl>
</div>
</div>
<div id="outline-container-sec-1-3" class="outline-3">
<h3 id="sec-1-3"><span class="section-number-3">1.3</span> Useful configurations</h3>
<div class="outline-text-3" id="text-1-3">
<p>
It's all emacs, so configuration is done in your <code>~/.emacs</code> file.
</p>
<pre class="example">
(add-to-list 'auto-mode-alist '("\\.\\(org\\|txt\\)$" . org-mode))
(add-hook 'org-mode-hook 'turn-on-auto-fill)
(require 'org-latex)
</pre>
<p>
Those three lines will:
</p>
<ul class="org-ul">
<li>enter org-mode when you open a file ending in <code>.org</code> or <code>.txt</code>
</li>
<li>turn <code>auto-fill-mode</code> on in Emacs when you're in org-mode
</li>
<li>import <code>org-latex</code>
</li>
</ul>
<p>
You can also enter org-mode by typing <code>Esc-x</code> or <code>Meta-x</code> followed by typing
<code>org-mode</code>.
</p>
<p>
When typing in emacs, you can re-wrap lines of text with <code>Esc-q</code> or <code>Meta-q</code>.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Sections, Subsections, Lists, and Formatting</h2>
<div class="outline-text-2" id="text-2">
</div><div id="outline-container-sec-2-1" class="outline-3">
<h3 id="sec-2-1"><span class="section-number-3">2.1</span> Sections and Lists</h3>
<div class="outline-text-3" id="text-2-1">
<ul class="org-ul">
<li>It is plain text, so a section is set up with an asterisk <code>*</code> at the start
of a line, a subsection is two asterisks <code>**</code> at the start of a line, etc.
</li>
<li>A bulleted list is started with either a dash <code>-</code> or a plus <code>+</code>; sublists
are just indented.
</li>
<li>A numbered list is started with numbers. Numbered sub-lists are
indented one or more spaces.
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-2-2" class="outline-3">
<h3 id="sec-2-2"><span class="section-number-3">2.2</span> Example</h3>
<div class="outline-text-3" id="text-2-2">
<pre class="example">
* Section Title
Some text that I would like to add.
The second paragraph of the text I'd like to add.
# this is a comment
Here is a list
- item one
- item two
  - subitem one
** Sub-Section Title
1. item one
2. item two
   1. item two-sub-one
</pre>
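<p>
Because the markup is this regular, it is easy to process outside of
Emacs too. Here is a small Python sketch (my own illustration, not an
Org-mode tool) that finds the headings in a piece of Org text by counting
the leading asterisks; the <code>outline</code> function name is made up for
this example.
</p>

```python
import re

# One or more asterisks at the start of a line, whitespace, then the title.
HEADING = re.compile(r"^(\*+)\s+(.*)$")


def outline(text):
    """Return (level, title) pairs for each Org heading in text."""
    result = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            result.append((len(match.group(1)), match.group(2).strip()))
    return result


sample = """* Section Title
Some text.
- item one
** Sub-Section Title
1. item one
"""
print(outline(sample))
```

<p>
List items and ordinary text are ignored because they do not begin with
an asterisk followed by a space.
</p>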
</div>
</div>
<div id="outline-container-sec-2-3" class="outline-3">
<h3 id="sec-2-3"><span class="section-number-3">2.3</span> Formatting</h3>
<div class="outline-text-3" id="text-2-3">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="left"/>
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Format</th>
<th scope="col" class="left">What to Type</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left"><b>Bold</b></td>
<td class="left">*Bold*</td>
</tr>
<tr>
<td class="left"><i>Italic</i></td>
<td class="left">/Italic/</td>
</tr>
<tr>
<td class="left"><code>Fixed</code></td>
<td class="left">=Fixed=</td>
</tr>
<tr>
<td class="left"><span class="underline">underlined</span></td>
<td class="left">_underlined_</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Tables</h2>
<div class="outline-text-2" id="text-3">
</div><div id="outline-container-sec-3-1" class="outline-3">
<h3 id="sec-3-1"><span class="section-number-3">3.1</span> Tables</h3>
<div class="outline-text-3" id="text-3-1">
<pre class="example">
|--------------------+---------+------+----------|
| Car Make | Cost | City | X-factor |
| and Model | | MPG | |
|--------------------+---------+------+----------|
| Fiat 500 | $19,000 | 30 | 6 |
| VW Bug | $26,390 | 20 | 7 |
| Volvo C70 | $39,950 | 20 | 5 |
| Audi A5 Cabrio | $42,000 | 22 | 9 |
|--------------------+---------+------+----------|
</pre>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="left"/>
<col class="right"/>
<col class="right"/>
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Car Make</th>
<th scope="col" class="left">Cost</th>
<th scope="col" class="right">City</th>
<th scope="col" class="right">X-factor</th>
</tr>
<tr>
<th scope="col" class="left">and Model</th>
<th scope="col" class="left"> </th>
<th scope="col" class="right">MPG</th>
<th scope="col" class="right"> </th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Fiat 500</td>
<td class="left">$19,000</td>
<td class="right">30</td>
<td class="right">6</td>
</tr>
<tr>
<td class="left">VW Bug</td>
<td class="left">$26,390</td>
<td class="right">20</td>
<td class="right">7</td>
</tr>
<tr>
<td class="left">Volvo C70</td>
<td class="left">$39,950</td>
<td class="right">20</td>
<td class="right">5</td>
</tr>
<tr>
<td class="left">Audi A5 Cabrio</td>
<td class="left">$42,000</td>
<td class="right">22</td>
<td class="right">9</td>
</tr>
</tbody>
</table>
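<p>
Since an Org table is just lines of text with <code>|</code> separators, other
programs can read it with very little effort. The Python sketch below is
only an illustration under my own assumptions (the <code>parse_org_table</code>
name is hypothetical, and it ignores Org features such as formulas and
column groups); it skips the horizontal rule lines and splits the rest
into cells.
</p>

```python
def parse_org_table(text):
    """Split an Org table into rows of cell strings, skipping rule lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue  # not a table row, or a horizontal rule like |---+---|
        # Drop the outer pipes, then split the remainder on the inner pipes.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        rows.append([cell.strip() for cell in inner.split("|")])
    return rows


table = """
|-----------+---------+------+----------|
| Car Make  | Cost    | City | X-factor |
|-----------+---------+------+----------|
| Fiat 500  | $19,000 |   30 |        6 |
| VW Bug    | $26,390 |   20 |        7 |
|-----------+---------+------+----------|
"""
for row in parse_org_table(table):
    print(row)
```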
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> Exports</h2>
<div class="outline-text-2" id="text-4">
</div><div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1"><span class="section-number-3">4.1</span> Export Template</h3>
<div class="outline-text-3" id="text-4-1">
<p>
Move to the top of your file and type <code>Ctrl-c</code>, <code>Ctrl-e</code> then press
<code>t</code> to insert an export template that you can edit to create a title,
author, and set some export options.
</p>
<p>
The export template for the slide version of this document is these
lines (plus the default lines in the export template):
</p>
<pre class="example">
#+TITLE: Org in 4 Sections (plus a demo)
#+AUTHOR: Andrew Caird
#+startup: beamer
#+LaTeX_CLASS: beamer
#+BEAMER_FRAME_LEVEL: 2
#+latex_header: \mode&lt;beamer&gt;{\usetheme{Frankfurt}}
</pre>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2"><span class="section-number-3">4.2</span> PDF</h3>
<div class="outline-text-3" id="text-4-2">
<p>
Generating PDF requires a LaTeX installation.
</p>
<dl class="org-dl">
<dt> Windows </dt><dd><a href="http://miktex.org/download">http://miktex.org/download</a>
</dd>
<dt> Mac </dt><dd><a href="http://www.tug.org/mactex/">http://www.tug.org/mactex/</a>
</dd>
<dt> Linux </dt><dd>your favorite package manager
</dd>
</dl>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3"><span class="section-number-3">4.3</span> HTML</h3>
<div class="outline-text-3" id="text-4-3">
<p>
Type <code>Ctrl-c Ctrl-e</code> to see the export menu and choose one of these
options:
</p>
<pre class="example">
[h] export as HTML [H] to temporary buffer
[R] export region [b] export as HTML and open in browser
</pre>
</div>
</div>
<div id="outline-container-sec-4-4" class="outline-3">
<h3 id="sec-4-4"><span class="section-number-3">4.4</span> ASCII</h3>
<div class="outline-text-3" id="text-4-4">
<p>
Type <code>Ctrl-c Ctrl-e</code> to see the export menu and choose one of these
options:
</p>
<pre class="example">
[a/n/u] export as ASCII/Latin-1/UTF-8
[A/N/U] to temporary buffer
</pre>
</div>
</div>
<div id="outline-container-sec-4-5" class="outline-3">
<h3 id="sec-4-5"><span class="section-number-3">4.5</span> Pandoc</h3>
<div class="outline-text-3" id="text-4-5">
<p>
<b>Pandoc</b> isn't an Emacs or Org-mode tool, but it is a file conversion
tool that is aware of Org-mode and can convert to and from many, many
formats.
</p>
<p>
For more information, see <a href="http://johnmacfarlane.net/pandoc/">johnmacfarlane.net/pandoc/</a>.
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="left"/>
</colgroup>
<thead>
<tr>
<th scope="col" class="left">From</th>
<th scope="col" class="left">To</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">markdown</td>
<td class="left">HTML formats</td>
</tr>
<tr>
<td class="left">reStructuredText</td>
<td class="left">ODT, DOCX</td>
</tr>
<tr>
<td class="left">textile</td>
<td class="left">LaTeX</td>
</tr>
<tr>
<td class="left">HTML</td>
<td class="left">PDF</td>
</tr>
<tr>
<td class="left">LaTeX</td>
<td class="left">Markdown, RST, AsciiDoc</td>
</tr>
<tr>
<td class="left">etc.</td>
<td class="left">etc.</td>
</tr>
</tbody>
</table>
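<p>
If you want to drive Pandoc from a script, something like the Python
sketch below might do. It assumes <code>pandoc</code> is installed and on your
<code>PATH</code>; the helper names and the file names <code>notes.org</code> and
<code>notes.md</code> are made up for illustration. The only Pandoc options
used are <code>--from</code>, <code>--to</code>, and <code>--output</code>.
</p>

```python
import subprocess


def pandoc_command(source, target, from_format="org", to_format="markdown"):
    """Build the argument list for a pandoc conversion."""
    return ["pandoc", "--from", from_format, "--to", to_format,
            "--output", target, source]


def convert(source, target, **formats):
    """Run pandoc; raises CalledProcessError if pandoc reports a failure."""
    subprocess.run(pandoc_command(source, target, **formats), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(" ".join(pandoc_command("notes.org", "notes.md")))
```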
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5"><span class="section-number-2">5</span> Demo and Resources</h2>
<div class="outline-text-2" id="text-5">
</div><div id="outline-container-sec-5-1" class="outline-3">
<h3 id="sec-5-1"><span class="section-number-3">5.1</span> Keystrokes</h3>
<div class="outline-text-3" id="text-5-1">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left"/>
<col class="left"/>
</colgroup>
<tbody>
<tr>
<td class="left">esc-enter</td>
<td class="left">make another entry at the same level</td>
</tr>
<tr>
<td class="left">tab</td>
<td class="left">open or close a section</td>
</tr>
<tr>
<td class="left">esc-arrows</td>
<td class="left">indent or outdent a section</td>
</tr>
<tr>
<td class="left">Ctrl-c Ctrl-e</td>
<td class="left">export the current file to another format</td>
</tr>
<tr>
<td class="left">Ctrl-g</td>
<td class="left">cancel the current command</td>
</tr>
<tr>
<td class="left">Ctrl-x Ctrl-f</td>
<td class="left">open a file</td>
</tr>
<tr>
<td class="left">Ctrl-x Ctrl-s</td>
<td class="left">save the file</td>
</tr>
<tr>
<td class="left">Ctrl-x Ctrl-c</td>
<td class="left">quit Emacs</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-sec-5-2" class="outline-3">
<h3 id="sec-5-2"><span class="section-number-3">5.2</span> Resources</h3>
<div class="outline-text-3" id="text-5-2">
<p>
The best resource is <a href="http://orgmode.org">orgmode.org</a>, and from there the most useful
document is the 39-page <a href="http://orgmode.org/orgguide.pdf">The compact Org-mode Guide</a>; you
likely won't need all 39 pages. The full Org-mode manual is also
there, but it can be quite dense.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-6" class="outline-2">
<h2 id="sec-6"><span class="section-number-2">6</span> About this post</h2>
<div class="outline-text-2" id="text-6">
<p>
History: <a href="https://github.com/acaird/acaird.github.io/commits/master/_posts/2013-03-29-intro-to-org.html">https://github.com/acaird/acaird.github.io/commits/master/_posts/2013-03-29-intro-to-org.html</a>
</p>
</div>
</div>
Accessible Storage for Research Computing2013-03-14T00:00:00+00:00http://acaird.github.io/computers/2013/03/14/storage<div class="figure">
<p><img src="/assets/disks.png" alt="disks.png" />
</p>
<p><span class="figure-number">Figure 1:</span> Storage stack described in this document</p>
</div>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Research Storage Background</h2>
<div class="outline-text-2" id="text-1">
<p align=right>
<a href=/assets/2013-03-14-storage.pdf>Download PDF version</a>
</p>
<p>
Types:
</p>
<ul class="org-ul">
<li>Lustre / scratch
</li>
<li>NFS / working storage
</li>
<li>HDFS / Map-Reduce, Hadoop, HBase, columnar storage
</li>
<li>Web Object Storage / HTTP semantics for large chunks of data
</li>
</ul>
<p>
Characteristics:
</p>
<ul class="org-ul">
<li>storage
</li>
<li>speed
</li>
<li>availability
</li>
</ul>
<p>
Tiers:
</p>
<ul class="org-ul">
<li>Lustre types: /nobackup on Nyx and /scratch on Flux
</li>
<li>NFS types: Value Storage and research storage
</li>
<li>HDFS: generic and highly-tuned
</li>
<li>WOS: local and distant
</li>
</ul>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Implementation on campus</th>
<th scope="col" class="left">Avail.</th>
<th scope="col" class="left">Speed</th>
<th scope="col" class="left">Capacity</th>
<th scope="col" class="left">Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">/nobackup on Nyx</td>
<td class="left">low</td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Lustre</td>
</tr>
<tr>
<td class="left">/scratch on Flux</td>
<td class="left">medium</td>
<td class="left">100Gbps</td>
<td class="left"> </td>
<td class="left">Lustre</td>
</tr>
<tr>
<td class="left">Value Storage</td>
<td class="left">medium</td>
<td class="left">3Gbps</td>
<td class="left"> </td>
<td class="left">NFSv3, POSIX</td>
</tr>
<tr>
<td class="left">Research Working Storage</td>
<td class="left">high</td>
<td class="left">40Gbps</td>
<td class="left"> </td>
<td class="left">NFSv4, POSIX</td>
</tr>
<tr>
<td class="left">HDFS</td>
<td class="left">user-selected</td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">M-R, HBase, etc</td>
</tr>
</tbody>
</table>
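<p>
To give a feel for what the speeds in the table mean in practice, the
Python sketch below computes the ideal time to move one terabyte at each
listed rate. It assumes the figures are usable line rates, uses decimal
units, and ignores protocol overhead and contention, so real transfers
will be slower; the function name is only illustrative.
</p>

```python
# Speeds copied from the table above, in gigabits per second.
SPEEDS_GBPS = {
    "/scratch on Flux": 100,
    "Value Storage": 3,
    "Research Working Storage": 40,
}


def transfer_seconds(size_terabytes, speed_gbps):
    """Ideal time to move size_terabytes at speed_gbps (decimal units)."""
    bits = size_terabytes * 8e12  # 1 TB = 10**12 bytes = 8 * 10**12 bits
    return bits / (speed_gbps * 1e9)


for name, speed in SPEEDS_GBPS.items():
    print("%-26s %8.1f s per TB" % (name, transfer_seconds(1, speed)))
```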
<p>
Strategy:
</p>
<ul class="org-ul">
<li>multi-tier that is X, Y, and Z
</li>
</ul>
</div>
<div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1"><span class="section-number-3">1.1</span> Strategy Background</h3>
<div class="outline-text-3" id="text-1-1">
<p>
(This is from a Google document of similar name at <a href="http://goo.gl/qHnIC">http://goo.gl/qHnIC</a>.)
</p>
<p>
As I understand demand, this is the minimal set of storage options required
to address the scholarly and administrative data and management requirements
at the U-M:
</p>
<ol class="org-ol">
<li>Lustre (parallel storage for HPC)
</li>
<li>NFSv3 (file system based storage)
</li>
<li>NFSv4 (file system based storage)
</li>
<li>CIFS (file system based storage)
</li>
<li>HDFS (distributed storage supporting “big data”: map-reduce (Hadoop),
column-oriented data (NoSQL), etc.)
</li>
<li>Object Storage (something like Amazon S3, replacing or augmenting file
system storage)
</li>
<li>SQL (Oracle, MS SQL Server, MySQL, etc.)
</li>
<li>Backups (relatively short time to recovery)
</li>
<li>Cold storage (relatively long time to recovery)
</li>
</ol>
<p>
These categories encode / encapsulate provisioning and configuration decisions
regarding:
</p>
<ol class="org-ol">
<li>Media (SSD, spinning disk, tape, cloud)
</li>
<li>On-disk formats and file systems
</li>
<li>Network protocols and networking / fabric capacity
</li>
<li>Media management (NAS, SAN, HFS, appliance, DAS, cloud, etc.)
</li>
</ol>
<p>
These storage options have to be made available as accessible services at
acceptable cost / capability tiers (where storage capability is traditionally
expressed in terms of capacity, performance, and availability).
</p>
<p>
They all require implementation decisions.
</p>
<p>
These implementation decisions can be optimized along dimensions of interest:
the number of platforms, vendors, campus providers, products, feature sets,
etc., or otherwise expressed requirements from different communities.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Research Storage Strategy</h2>
<div class="outline-text-2" id="text-2">
<p>
The need for electronic storage of research data at U-M will be met with three
broad classes of storage:
</p>
<ol class="org-ol">
<li><b>very high-speed, temporary storage</b> that is tightly coupled with the
high-performance computing environment
</li>
<li><b>high-speed, secure, and safe storage</b> that is broadly available on campus
and appears via the network as local storage
</li>
<li><b>storage for distributed processing of large amounts of unstructured data</b>
that is scalable in capacity and performance on premise or via a
cloud-computing provider
</li>
</ol>
<p>
The research working storage service is of the second type: data storage that
provides speed of access, security of access, and safety of data—research
working storage. Research working storage (RWS) and its associated business
processes are well-suited to research data, although can be used for other
types of data or as the basis for other services.
</p>
<p>
The intent of the storage service is that it will be useful to a large
fraction<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup> of researchers who need a service that will support
a data lifecycle and also be useful as a basis for other services (such as
data curation) or entitlements to staff, students, or faculty.
</p>
<p>
To be as useful as possible to as many researchers as possible, the Research
Working Storage service will provide:
</p>
<ul class="org-ul">
<li>tools to put as much control as possible over the lifecycle of data
into the hands of the data owner
</li>
<li>an integrated data-archiving service that is pre-paid so that there are no
on-going costs for archived data
</li>
<li>a for-fee subscription service appropriate for storing data associated with
research that integrates active storage, backups, archives, business
processes, and IT processes
</li>
<li>a service that is accessible to both the researchers and IT systems that
need it
</li>
<li>a service that is presented securely to on-campus consumers in the broadest
possible way and to the most possible clients
</li>
<li>a storage service that matches the performance of the data-generation and
data-analysis systems available on campus and compares favorably with the
performance of locally provisioned storage
</li>
<li>a service that includes back-ups and archiving of data while ensuring that
the data owner has control over the back-ups, restores, and archives of
their data
</li>
<li>a way that the storage consumers can manage the initiation, alteration and
termination of active storage; the restore or deletion of backup copies of
the data; and storage into and retrieval from the archive location
</li>
<li>useful and actionable information to the consumer about the age and usage
of the storage to which they have subscribed
</li>
<li>a cost structure based on a flexible operation that can adopt the best
hardware-based or provider-based technology options without impacting the
delivered service, allowing the service manager to optimize the operations
for costs as the technologies and services change over time
</li>
<li>integration with U-M business practices so payment and billing is done in a
familiar environment
</li>
<li>integration with U-M IT practices so it can be used in a manner familiar
from other IT services
</li>
<li>professional IT and business operations and support
</li>
</ul>
<p>
The RWS service follows the model established for computing by Flux, where
there is a capital investment in the initial service and the unused capacity
before there are enough subscribers to reach sustainability. Following the
capital investment and aggregation of a subscriber base, the money recovered
by the rate would fund the replacement hardware, and the amount of money
recovered by the rate would inform the size of the next version.<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup>
</p>
<p>
Aggregation and abstraction are the key components of this service from an
administrative perspective, as they allow for cost management, economies of
scale, and vendor optimization. At the same time, performance, security, and
data protection are key components of this service from a researcher’s
perspective.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Implementation</h2>
<div class="outline-text-2" id="text-3">
<p>
There are three components to the RWS service, the integrated set of which
includes procedures for acquisition, monitoring, and termination that are
integrated into the U-M business processes and easily used by the research
community we support.
</p>
</div>
<div id="outline-container-sec-3-1" class="outline-3">
<h3 id="sec-3-1"><span class="section-number-3">3.1</span> High-speed Storage</h3>
<div class="outline-text-3" id="text-3-1">
<p>
The first of the three components is a high-speed storage service. This
service is what is presented to the researcher and is a proxy for the back-up
and archive services. The quantity of the storage for which the subscriber
pays can be varied over time. The subscriptions can be funded by different
sources with the storage presented to the subscriber as either an aggregated
amount across funding sources or as separate amounts between funding sources,
depending on the needs of the subscriber.
</p>
</div>
<div id="outline-container-sec-3-1-1" class="outline-4">
<h4 id="sec-3-1-1"><span class="section-number-4">3.1.1</span> Technical Details</h4>
<div class="outline-text-4" id="text-3-1-1">
<p>
The technical details of the high-speed storage are:
</p>
<ul class="org-ul">
<li>the storage system provides aggregate bandwidth on the order of
high-performance computing interconnects<sup><a id="fnr.3" name="fnr.3" class="footref" href="#fn.3">3</a></sup>; today this
is 40Gbps, but will increase as networking technology advances
</li>
<li>the storage system will provide snap-shots of data on disk for simple,
user-directed recovery of data that was changed or deleted and for
which a previous version is needed
</li>
<li>the protocols of the storage will be directed toward the systems that are
most likely to consume it; today this is NFS<sup><a id="fnr.4" name="fnr.4" class="footref" href="#fn.4">4</a></sup>
</li>
<li>the presentation of the storage will be to on-campus clients but will not
be restricted to managed systems in order to support as many researchers
and devices as possible; to ensure security with this broad presentation,
the storage will initially be offered via Version 4 of the NFS protocol
(NFSv4) and be integrated into the existing campus Kerberos infrastructure
to provide strong authentication
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-3-1-2" class="outline-4">
<h4 id="sec-3-1-2"><span class="section-number-4">3.1.2</span> Service Details</h4>
<div class="outline-text-4" id="text-3-1-2">
<p>
The service details of the storage are:
</p>
<ul class="org-ul">
<li>the storage will be sold in an “allocation model” where each 50GB, 6-month<sup><a id="fnr.5" name="fnr.5" class="footref" href="#fn.5">5</a></sup><sup>, </sup><sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup>
unit will have a start date, end date, and
funding source. Units of storage can optionally be combined in projects
so they are presented to the researcher as one aggregated pool of storage
</li>
<li>storage projects will be implemented by system quotas that will vary as
the allocation units come and go; this will also support usage reporting
to the researchers
</li>
<li>when an allocation unit expires, the disk quota will be set to the sum of
the remaining active allocations; if this quota is less than the total
data stored, no more data can be written, but data can be read
</li>
<li>if there are no additional allocations then the disk quota is set to zero.
At this point no data can be written to the disk space owned by the
project, but data can be read; if no new allocations are made after two
weeks<sup><a id="fnr.6" name="fnr.6" class="footref" href="#fn.6">6</a></sup>, the data is removed from active storage but kept in
the backups for the duration of the backup retention time. Backups can be
restored to active storage if a new storage allocation is created or they
can be archived to long-term storage if the project with which the data is
associated has remaining archive credits.
</li>
</ul>
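<p>
The quota rules above can be sketched in a few lines of Python. This is
only an illustration of the policy as described, not the production
implementation: the <code>Allocation</code> class and the function names are
hypothetical, and treating the end date as inclusive is my assumption.
</p>

```python
from dataclasses import dataclass
from datetime import date

UNIT_GB = 50  # size of one allocation unit, per the text


@dataclass
class Allocation:
    start: date
    end: date  # assumed inclusive
    units: int = 1  # number of 50GB units


def quota_gb(allocations, today):
    """The quota is the sum of the allocations that are active on `today`."""
    return sum(a.units * UNIT_GB for a in allocations
               if a.start <= today <= a.end)


def can_write(allocations, used_gb, today):
    """Writes are allowed only while usage is below the quota; reads always are."""
    return used_gb < quota_gb(allocations, today)


allocs = [
    Allocation(date(2013, 1, 1), date(2013, 6, 30), units=2),
    Allocation(date(2013, 4, 1), date(2013, 9, 30), units=1),
]
print(quota_gb(allocs, date(2013, 5, 1)))
print(can_write(allocs, 120, date(2013, 8, 1)))
```

<p>
In this example the project holds 120GB. Once the first allocation
expires the quota drops below that, so the data becomes read-only, and
when the second expires the quota is zero.
</p>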
</div>
</div>
</div>
<div id="outline-container-sec-3-2" class="outline-3">
<h3 id="sec-3-2"><span class="section-number-3">3.2</span> Backups</h3>
<div class="outline-text-3" id="text-3-2">
<p>
All of the data stored is also backed up and a set of backups are kept for a
reasonable period of time and can be restored, deleted, or archived by the
owner of each project.
</p>
<p>
The management of the backups is via a web-based presentation of the data
contained in the backups, command-line tools on some systems, and email-based
support.
</p>
</div>
<div id="outline-container-sec-3-2-1" class="outline-4">
<h4 id="sec-3-2-1"><span class="section-number-4">3.2.1</span> Technical Details</h4>
<div class="outline-text-4" id="text-3-2-1">
<p>
The technical details of the backups are:
</p>
<ul class="org-ul">
<li>the backups of the data are kept for a reasonable time and a reasonable
number of copies are kept; initially the backups will cover a time span of
one year, with copies from the previous day; the previous one, two, three,
and four weeks; and the previous one, two, three, four, five, six, eight, ten,
and twelve months (a total of fourteen
copies)<sup><a id="fnr.7" name="fnr.7" class="footref" href="#fn.7">7</a></sup><sup>, </sup><sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup>.
</li>
</ul>
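<p>
The retention schedule above can be written down as data, which also
confirms the count of fourteen copies. The Python sketch below is an
illustration only: it approximates a month as 30 days, and the names are
hypothetical rather than taken from any backup product.
</p>

```python
from datetime import date, timedelta

DAYS = [1]
WEEKS = [1, 2, 3, 4]
MONTHS = [1, 2, 3, 4, 5, 6, 8, 10, 12]


def retention_dates(today):
    """Approximate dates of the copies kept, newest first (30-day months)."""
    offsets = DAYS + [7 * w for w in WEEKS] + [30 * m for m in MONTHS]
    return [today - timedelta(days=d) for d in offsets]


copies = retention_dates(date(2013, 3, 14))
print(len(copies))
print(copies[0], copies[-1])
```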
</div>
</div>
<div id="outline-container-sec-3-2-2" class="outline-4">
<h4 id="sec-3-2-2"><span class="section-number-4">3.2.2</span> Service Details</h4>
<div class="outline-text-4" id="text-3-2-2">
<p>
The service details of the backups are:
</p>
<ul class="org-ul">
<li>the backups are only of data from the active storage and are not offered
as a general purpose backup system; there are already several on-campus
options for that service
</li>
<li>the backup system integrated with the RWS service is unique to it because
of the high performance required of it to back up the amount of data the
system is designed to store and because there is only one client from
which to back up data; other backup services on campus do not have the
same performance requirement and must support many
clients<sup><a id="fnr.8" name="fnr.8" class="footref" href="#fn.8">8</a></sup><sup>, </sup><sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup>
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-sec-3-3" class="outline-3">
<h3 id="sec-3-3"><span class="section-number-3">3.3</span> Archives</h3>
<div class="outline-text-3" id="text-3-3">
<p>
The archive portion of the research working storage service will use a
cloud-based data archive solution<sup><a id="fnr.9" name="fnr.9" class="footref" href="#fn.9">9</a></sup>. The process of depositing
and withdrawing data from the archive is researcher-directed and the level of
curation is at the discretion of the researcher. The style of archives in
this service can be called “data graveyard” or “data dumping ground” in
contrast with a curated archival solution that a data management group or
library<sup><a id="fnr.10" name="fnr.10" class="footref" href="#fn.10">10</a></sup> might offer.
</p>
</div>
<div id="outline-container-sec-3-3-1" class="outline-4">
<h4 id="sec-3-3-1"><span class="section-number-4">3.3.1</span> Technical Details</h4>
<div class="outline-text-4" id="text-3-3-1">
<p>
The technical details of the archives are:
</p>
<ul class="org-ul">
<li>the data archive will use a cloud-based data archive solution; a local
abstraction layer will allow U-M to choose appropriate cloud service
providers as the market changes over time<sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup>
</li>
<li>today the most likely cloud service provider of data archive services is
Amazon and their Glacier product <sup><a id="fnr.11" name="fnr.11" class="footref" href="#fn.11">11</a></sup>
<ul class="org-ul">
<li>archives will be initiated by the researcher via a web interface and
the data to be archived will be from a backup set, not from the
working set; this allows for stability of the data over the
potentially long duration of the archiving process<sup><a id="fnr.12" name="fnr.12" class="footref" href="#fn.12">12</a></sup>
</li>
</ul>
</li>
<li>restoring data from archive will be initiated by the researcher via a web
interface and the data to be restored will be restored to active storage
(the NFS tier); before the restore begins an allocation adequate to hold
the restored data must be acquired. The restore will be stopped if the
space is filled before the restore is complete
</li>
</ul>
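<p>
The restore rule in the last item above can be sketched as follows. This is a
minimal illustration, not a design: the <code>Allocation</code> class and the
<code>can_begin_restore</code> and <code>restore</code> functions are hypothetical names, and
sizes are tracked in whole gigabytes for simplicity.
</p>

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical model of an active-storage (NFS tier) allocation."""
    capacity_gb: int
    used_gb: int = 0

    def free_gb(self):
        return self.capacity_gb - self.used_gb


def can_begin_restore(archive_files, allocation):
    """Return True if the allocation can hold the whole archive.

    archive_files is a list of (name, size_gb) pairs.
    """
    total_gb = sum(size_gb for _name, size_gb in archive_files)
    return allocation.free_gb() >= total_gb


def restore(archive_files, allocation):
    """Restore files in order, stopping as soon as the space is filled.

    Returns (names_restored, completed).
    """
    restored = []
    for name, size_gb in archive_files:
        if size_gb > allocation.free_gb():
            # The space filled before the restore finished: stop here.
            return restored, False
        allocation.used_gb += size_gb
        restored.append(name)
    return restored, True
```

<p>
In this sketch a researcher with 60GB free who tries to restore a 70GB archive
fails the up-front check; if the restore were started anyway, it would stop
partway through rather than overfill the allocation.
</p>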
</div>
</div>
<div id="outline-container-sec-3-3-2" class="outline-4">
<h4 id="sec-3-3-2"><span class="section-number-4">3.3.2</span> Service Details</h4>
<div class="outline-text-4" id="text-3-3-2">
<p>
The business details of the archives are:
</p>
<ul class="org-ul">
<li>as part of the storage allocation, each project will receive tokens for
archiving and restoring data
</li>
<li>U-M will be able to assign archives (or parts of archives) to individuals
outside of U-M or will have a federation model so each researcher has an
identity at the cloud-based archive provider so they can restore their
data outside of the U-M environment by paying the cloud service provider
directly<sup><a id="fnr.12.100" name="fnr.12.100" class="footref" href="#fn.12">12</a></sup>
</li>
</ul>
</div>
</div>
</div>
</div>
<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> Possible Scenarios</h2>
<div class="outline-text-2" id="text-4">
<p>
Following are some scenarios that illustrate the concepts above integrated
into contrived but hopefully representative examples.
</p>
</div>
<div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1"><span class="section-number-3">4.1</span> Augmented base storage allocations</h3>
<div class="outline-text-3" id="text-4-1">
<p>
The College of Engineering has decided to make an allocation of 100GB an
entitlement to all of the researchers in the College. There are many research
groups in the College who require more than that to support their work, and
they are expected to augment their base allocation to suit their needs with
their own funds.
</p>
<p>
Dr. Smith has a research scientist and six graduate students and usually
manages two or three grants, although sometimes there are additional projects.
</p>
<p>
Her base storage allocation of 100GB provided by the College is held in a
project called <code>jmsmith_rs</code> (“Dr. J.M. Smith’s research storage”) which is
accessible by her six graduate students, her research scientist, and herself,
each with their own directories and some shared directories for collaboration.
She and her graduate students connect to this storage from their laptops and
workstations and it is also available on Flux and in computers in the U-M 3-D
Lab.
</p>
<p>
Dr. Smith makes a request for an additional allocation for that project of
400GB for 4 years, bringing the total disk space available in <code>jmsmith_rs</code> to
500GB (1/2TB). She does this knowing that she can reduce the amount of space at any
of the 6-month billing periods if she needs to, but by making it for 4 years,
she avoids the risk of forgetting to renew it.
</p>
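<p>
As a rough worked example of Dr. Smith’s request, the arithmetic can be
written out as below. The 50GB unit and 6-month period are the values this
proposal suggests for today and are policy decisions that may change; the
function name <code>allocation_summary</code> is only illustrative.
</p>

```python
UNIT_GB = 50        # size of one sales unit (assumed from this proposal)
PERIOD_MONTHS = 6   # length of one billing period (assumed from this proposal)


def allocation_summary(base_gb, extra_gb, years):
    """Summarize a base allocation plus a purchased augmentation."""
    # Round up: a partial unit still has to be bought as a whole unit.
    units = -(-extra_gb // UNIT_GB)
    periods = (years * 12) // PERIOD_MONTHS
    return {
        "total_gb": base_gb + extra_gb,
        "units_per_period": units,
        "billing_periods": periods,
        "unit_periods": units * periods,
    }


summary = allocation_summary(base_gb=100, extra_gb=400, years=4)
print(summary)
```

<p>
Under these assumptions the project holds 500GB in total, and the additional
400GB corresponds to 8 units in each of 8 billing periods.
</p>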
<p>
In addition to her main storage project, she is also working on a project
called <i>NexMent</i> with her research scientist and two of her own graduate
students and two graduate students from the Physics department in LSA. That
research project has its own funding source and storage requirements, so she
requests another project to provide storage to herself and the five other
people involved and pays for it with the grant money for that project. She now
has another project, called <code>jmsmith1_rs</code> with a similar internal structure to
her main project, but a different set of people who are using it.
</p>
<p>
When the <i>NexMent</i> project ends Dr. Smith initiates the process to archive the
data from it and ends the storage allocation. The long-term preservation of
the data, although slow to retrieve, fulfills a portion of the required data
management plan associated with the grant that funded the project and there
are no additional costs assigned to that grant.
</p>
<p>
As her career follows the typically successful arc of U-M faculty, her
resource requirements wax and wane and she can control her costs and adjust
her resources to follow that, all while knowing that her data is stored on
professionally managed, high-quality infrastructure.
</p>
</div>
</div>
<div id="outline-container-sec-4-2" class="outline-3">
<h3 id="sec-4-2"><span class="section-number-3">4.2</span> Researcher with intermediate funding</h3>
<div class="outline-text-3" id="text-4-2">
<p>
Dr. Jones’ research in the ethnography of music is very data intensive—tape
recorders in the field have been replaced with high-fidelity digital recording
devices—but the funding for data intensive work in the world of music is
sparse at best.
</p>
<p>
Dr. Jones has two main components to his research: field work collecting
samples of music before the limitless arm of iTunes reaches everyone; and
analysis, cataloging, and reporting on what he has collected. The funding for
these two activities can come together or each one can be funded individually.
</p>
<p>
Dr. Jones uses the U-M Research Storage Environment for both aspects of his
work.
</p>
<p>
After returning from the field with many gigabytes of audio on his digital
recorders, he copies it to the working storage where he knows it is backed up
and can be quickly accessed. He is able to search and play and work with his
data directly from the working storage without having to hold it all on his
laptop (which is good, because the 256GB SSD in his MacBook Air couldn’t hold
all of it). When the relatively short-term field-work grant ends, he archives
all of the data from the most recent back-up of his working data and works on
writing papers and more grants, knowing his data is safely archived and can be
re-called when he needs (and can afford) to have it.
</p>
<p>
When Dr. Jones is funded to analyze a particular sub-genre of music that he
has recorded on dozens of field trips over the years, he restores those from
the archives to active storage and is able to look at them as a unit. For
him, the ability to keep all of his data for a very long time allows for a
kind of research that would be impossible if he had to decide what to keep at
the end of a grant.
</p>
<p>
The ability to “warehouse” data at a low cost for long periods of time
allows Dr. Jones to use storage as a service instead of risking his data on
low-cost, low-performance, low-quality, or low-all-three hardware. The
ability to work on data over a high-speed connection allows Dr. Jones to save
time transferring data to his laptop and also allows him to work on much
bigger data sets than he could on his laptop, while still offering excellent
performance for his audio analysis tools.
</p>
</div>
</div>
<div id="outline-container-sec-4-3" class="outline-3">
<h3 id="sec-4-3"><span class="section-number-3">4.3</span> Researcher who leaves U-M</h3>
<div class="outline-text-3" id="text-4-3">
<p>
Dr. Robbins has been at the University of Michigan for twelve years, but has
decided to move to Minnesota to be closer to his parents. He has amassed data
in support of his research into disease transmission over the years that he
will need in his new job. Unfortunately, Minnesota State University does not
have a Research Storage Environment like U-M’s, but they will fund the
purchase of several USB hard drives for him.
</p>
<p>
Before Dr. Robbins leaves U-M, he makes his last archive from the backups
of his active storage and prints the web page with the instructions on
restoring archived data in a non-U-M environment.
</p>
<p>
When he arrives at Minnesota State University, he attaches his USB drives and
follows the instructions on restoring data from an archive, which include
paying the archive restoration and transfer fees from his funding in Minnesota,
so U-M does not incur any cost for this. U-M will continue to
maintain his data in the archive for 10 years after the last piece of data was
added to the archive, so when Dr. Robbins’ USB drives fail, he can pay for
another restore and transfer.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5"><span class="section-number-2">5</span> Path to the future</h2>
<div class="outline-text-2" id="text-5">
<p>
This Research Storage Environment positions U-M well to be as efficient as
possible in its support of research IT.
</p>
<ul class="org-ul">
<li>By enabling researchers to use services for their research computing
needs, U-M is positioned to either aggregate demand to one supply and
enjoy economies of scale on campus or use off-campus alternatives at lower
costs.
</li>
<li>Having an archive option of any sort, even if it is a “data graveyard”,
is an option that has not been available to researchers at U-M and has the
potential to change what types of research can be done.
</li>
<li>Having an archive option will support the option of a curated archive, and
because we would be using the same technology for both types of archive,
the cost would be lowered for both.
</li>
<li>As more and more workload moves to off-site cloud providers, we can enable
caching of data near the compute resources to ensure the data is available
where it is needed<sup><a id="fnr.13" name="fnr.13" class="footref" href="#fn.13">13</a></sup>, even if it is needed in two very distant
locations at once; when the balance of the workload shifts to off-campus,
we can start using cloud providers for the high-speed (in this example,
NFSv4) storage, and put the smaller caches on campus for local access.
</li>
</ul>
<p>
As mentioned, aggregation and abstraction are the key components of this
service from an administrative perspective, as they allow for cost management,
economies of scale, and vendor optimization. At the same time, performance,
security, and data protection are key components of this service from a
researcher’s perspective.
</p>
</div>
</div>
<div id="outline-container-sec-6" class="outline-2">
<h2 id="sec-6"><span class="section-number-2">6</span> Interaction with other on-campus storage services</h2>
<div class="outline-text-2" id="text-6">
<p>
The service proposed here is one of many different storage options available
to researchers at the University of Michigan, and interaction with all of
those is an important part of this service. In general, this is designed to
be fast enough, large enough, and scalable enough that it should present a
reasonable interface to other options.
</p>
</div>
<div id="outline-container-sec-6-1" class="outline-3">
<h3 id="sec-6-1"><span class="section-number-3">6.1</span> Scratch storage on Flux</h3>
<div class="outline-text-3" id="text-6-1">
<p>
Scratch storage on Flux is based on the Lustre parallel file
system<sup><a id="fnr.14" name="fnr.14" class="footref" href="#fn.14">14</a></sup>. Lustre is tightly integrated with Flux and is not
presented to hosts that are not managed by the Flux operators.
</p>
<p>
This level of integration is important to maintain the performance and
security of the file system. In addition, Lustre is only supported on
Linux—there are no Mac or Windows clients.
</p>
<p>
The research working storage service proposed here will provide a location for
long-term storage of large inputs or outputs that are best stored on Lustre
while the related computational jobs are running or are staged to run.
</p>
<p>
Because the Lustre implementation on Flux is a very high-speed (40-80Gb/s) and
very high-capacity (more than 600TB) filesystem, it has the ability to ingest,
store, and output large quantities of data, so a long-term storage location
for that data should be as fast as can be afforded, so that researchers don’t
spend any longer than necessary moving their data.
</p>
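<p>
To give a sense of scale, the sketch below estimates idealized transfer times
at the link speeds mentioned in this document (1Gb/s for typical local
storage, 40-80Gb/s for Lustre on Flux). The 10TB data set is an arbitrary
example, and the function <code>transfer_hours</code> is hypothetical; real transfers
will be slower because of protocol overhead, disk limits, and contention.
</p>

```python
def transfer_hours(size_tb, link_gbps):
    """Idealized hours to move size_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 8000 gigabits) and assumes the link is
    the only bottleneck, which is an optimistic simplification.
    """
    gigabits = size_tb * 8000
    return gigabits / link_gbps / 3600


for gbps in (1, 40, 80):
    hours = transfer_hours(10, gbps)
    print(f"10 TB at {gbps:2d} Gb/s: {hours:.2f} hours")
```

<p>
With these assumptions, 10TB takes about 22.22 hours at 1Gb/s, 0.56 hours at
40Gb/s, and 0.28 hours at 80Gb/s, which is why the speed of the long-term
storage location matters for data sets of this size.
</p>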
<p>
The research working storage service proposed here is a good complement to
Flux’s Lustre installation.
</p>
</div>
</div>
<div id="outline-container-sec-6-2" class="outline-3">
<h3 id="sec-6-2"><span class="section-number-3">6.2</span> ITS Value storage</h3>
<div class="outline-text-3" id="text-6-2">
<p>
The NFS service offered by ITS called Value Storage<sup><a id="fnr.15" name="fnr.15" class="footref" href="#fn.15">15</a></sup> is based
on NFSv3 and is available to anyone on campus. It was built as a low-cost,
reliable NFS service. It was not built specifically for high speed. Value
Storage includes an option to mirror data between two locations; the mirror
is updated daily, and there are snapshots of data on disk. Backups are not
included but are offered via ITS’ TSM service.
</p>
<p>
As we develop the components of the research working storage service, several
may be suitable for integration with Value Storage.
</p>
<p>
For researchers who don’t require the level of performance provided by the
research working storage service proposed here, Value Storage offers a good
alternative.
</p>
</div>
</div>
<div id="outline-container-sec-6-3" class="outline-3">
<h3 id="sec-6-3"><span class="section-number-3">6.3</span> Department or Lab storage</h3>
<div class="outline-text-3" id="text-6-3">
<p>
Many departments and research laboratories provision local storage and present
that to clients via NFSv3 (for Linux or Mac clients) or CIFS (for Windows or
Mac clients). Most of these storage services are small in capacity (less than
50TB) and low performance relative to the proposed research working storage
service.
</p>
<p>
The advantage offered by local storage services is that they are usually a
one-time cost that can be attributed to a grant as hardware. The
disadvantages are that they are often not operated by people with operational
experience in storage and that puts the data stored on these systems at some
risk; these systems typically provide slow access to data because of their
combination of networking (usually 1Gbps) and the number of disks in the
system (usually less than 12); and these systems are often not expandable
beyond a few tens of terabytes.
</p>
<p>
We expect that the combination of Value Storage, the proposed research working
storage service and its backup and archival components, IT Rationalization
with respect to staff, and the increasing requirements for long-term data
management will lead to fewer and fewer departments or research laboratories
providing local storage.
</p>
</div>
</div>
<div id="outline-container-sec-6-4" class="outline-3">
<h3 id="sec-6-4"><span class="section-number-3">6.4</span> Unstructured or Big Data storage</h3>
<div class="outline-text-3" id="text-6-4">
<p>
Much of the data at U-M that would fall under the new umbrella of “big data”
or “unstructured data” (as opposed to relational data that is typically
stored in relational database management systems like Oracle, MySQL, etc.) is
currently stored where it is processed. In some cases this is in a Hadoop
cluster, in other cases it is in NoSQL systems, and in other cases it is in flat
files.
</p>
<p>
The research working storage service will have the performance and capacity to
ingest, store, and archive data from these systems as the current data is no
longer needed but the space on the analysis platform is needed for the next
research project.
</p>
<p>
As a data management support system, the research working storage service is
an excellent complement to existing and future big data clusters.
</p>
</div>
</div>
<div id="outline-container-sec-6-5" class="outline-3">
<h3 id="sec-6-5"><span class="section-number-3">6.5</span> ITS TSM product</h3>
<div class="outline-text-3" id="text-6-5">
<p>
The ITS TSM product<sup><a id="fnr.16" name="fnr.16" class="footref" href="#fn.16">16</a></sup> offers tape backups of data from many sources, and
maintains two copies in separate geographic locations. This service has
historically been viewed as expensive, which it is, and a bad value, which,
for the right data, it is not. However, there is a class of data on campus
for which ITS’ TSM product is too richly featured and thus too expensive. The
backups included in this research storage proposal are a very local,
highly-integrated part of the service, and will not be offered as a generic
backup service separate from the research working storage service. There will
also be integration between the backups associated with the research working
storage service and the archive, which is likely not appropriate for the TSM
service.
</p>
<p>
In addition, we expect to make archive copies of data from backups, which is
not supported in TSM today.
</p>
</div>
</div>
<div id="outline-container-sec-6-6" class="outline-3">
<h3 id="sec-6-6"><span class="section-number-3">6.6</span> Web-based Data Sharing and Collaboration</h3>
<div class="outline-text-3" id="text-6-6">
<p>
In the College of Engineering researchers have expressed interest to us in a
web-based method of sharing data and collaborating with other researchers
(especially those from other institutions for whom getting U-M credentials is
inconvenient). The characteristics of this web-based data sharing, as we
understand them, are around all control of the service being held by the
researcher, including hardware and software selection (Windows, Linux, or
MacOS; a forum, a wiki, a file upload/download service), maintenance of the
access lists, data policies, and presentation.
</p>
<p>
The research working storage service described here would be suitable as the
backing storage for a service like this:
</p>
<ul class="org-ul">
<li>the performance of the storage would be sufficient to serve web-based
requests
</li>
<li>snapshots and backups would offer some insurance against mistakes that
would result in data loss were there only one copy
</li>
<li>the ability to archive the data at the end of the project without moving it
would be nice
</li>
<li>the ability to have multiple, segregated storage areas (or “projects”) will
help with data management
</li>
</ul>
<p>
While the option of a Mac Mini, a Drobo and a CrashPlan subscription is likely
to be less expensive than a service like this, the features offered by this
service may be worthwhile from the perspective of data security and external
data management requirements.
</p>
</div>
</div>
</div>
<div id="outline-container-sec-7" class="outline-2">
<h2 id="sec-7"><span class="section-number-2">7</span> Costs</h2>
<div class="outline-text-2" id="text-7">
<p>
For now, this is just the dumping ground of all of the places I mention
costs<sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup> elsewhere in the document, other than those in the Scenarios
section.
</p>
<ul class="org-ul">
<li>the storage will be sold in units of Quantity per Time, where Quantity and
Time will both vary as the technology, costs, and business operations
change over time; today this will be 50GB of storage for 6 months
</li>
<li>there is no separate cost for the backups; they are integrated into the
research working storage service
</li>
<li>These archives are intended to be a one-time cost for securely storing
data to minimize the costs of active storage allocations. Using the
archive service described here, the costs for active storage can be
minimized to zero and there are no on-going costs for the data kept in the
archive, only costs for storage and retrieval.
</li>
<li>behind the scenes the web-based archiving tools will be a set of web
services applications that will query the backup system and the archive
system, presenting options and costs via a web page where the researcher
(or other data manager) can initiate an archive, check on the progress of
an in-progress archive, and view statistics about completed archives
</li>
<li>behind the scenes the web-based archive restore tools will be a set of web
services applications that will query the archive system and active
storage system, presenting options and costs via a web page where the
researcher (or other data manager) can initiate a restore, check on the
progress of an in-progress restore, and view statistics about completed
restores
</li>
<li>if there are real cost differences between sending data to the archive and
restoring data from the archive, that will be reflected in the number of
tokens required for each action. Each 6-month, 50GB
allocation<sup><a id="fnr.5.100" name="fnr.5.100" class="footref" href="#fn.5">5</a></sup><sup>, </sup><sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup> will include enough tokens to archive
50GB two times and restore it once
</li>
<li>archives will be kept for 10 years at no cost to the researcher<sup><a id="fnr.2.100" name="fnr.2.100" class="footref" href="#fn.2">2</a></sup>
</li>
<li>While the costs aren’t yet firm and we haven’t surveyed potential
subscribers to the service, if we don’t think the service can be
financially sustainable without subsidies we will investigate other
options for storage appropriate for researchers.
</li>
<li>Because the backups associated with the research working storage service
are so constrained (a single client, no campus-wide networking), they
should be less expensive than TSM or any other option. In addition, we
need some access to the backups to support the user-driven archives.
</li>
</ul>
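<p>
The token and archive-cost items above can be illustrated with a small
calculation. This is only a sketch under stated assumptions: the token
quantities follow the “archive 50GB two times and restore it once” rule from
the list, and the archive price of 1 cent per gigabyte per month is the “as
little as” figure quoted for Amazon Glacier in the footnotes, not a negotiated
or current rate. The function names are hypothetical.
</p>

```python
UNIT_GB = 50            # size of one allocation unit (assumed policy value)
ARCHIVES_PER_UNIT = 2   # each unit's tokens cover archiving 50GB twice
RESTORES_PER_UNIT = 1   # and restoring 50GB once


def token_budget(units):
    """Gigabytes that can be archived and restored with the included tokens."""
    return {
        "archive_gb": units * UNIT_GB * ARCHIVES_PER_UNIT,
        "restore_gb": units * UNIT_GB * RESTORES_PER_UNIT,
    }


def archive_storage_cost_cents(size_gb, years, cents_per_gb_month=1):
    """Provider-side storage cost in cents, using integer arithmetic.

    cents_per_gb_month defaults to the quoted Glacier figure (an assumption).
    Retrieval and transfer fees are not included.
    """
    return size_gb * years * 12 * cents_per_gb_month


budget = token_budget(units=10)                      # a 500GB allocation
cost = archive_storage_cost_cents(size_gb=500, years=10)
print(budget, f"${cost / 100:.2f}")
```

<p>
For a 500GB allocation (10 units) this gives tokens for 1000GB of archiving and
500GB of restoring per period, and a storage cost of roughly $600 to keep 500GB
archived for the full 10 years at the assumed rate.
</p>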
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> <p class="footpara">
Once we have an understanding of the costs associated with
the service, we will survey faculty members about their ability and interest
in paying for the service and discuss cost-sharing options with Colleges.
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> <p class="footpara">
This could also be reflected in a section about costs.
</p></div>
<div class="footdef"><sup><a id="fn.3" name="fn.3" class="footnum" href="#fnr.3">3</a></sup> <p class="footpara">
This is the current speed of the InfiniBand network on
Flux.
</p></div>
<div class="footdef"><sup><a id="fn.4" name="fn.4" class="footnum" href="#fnr.4">4</a></sup> <p class="footpara">
We are aware that Windows protocols may be useful, and we
will try to choose a product that can support current Windows file system
protocols (SMB, CIFS, etc.)
</p></div>
<div class="footdef"><sup><a id="fn.5" name="fn.5" class="footnum" href="#fnr.5">5</a></sup> <p class="footpara">
This quantity and time combination is a policy decision that can
be refined based on costs, market analysis, and specifics of the service.
</p></div>
<div class="footdef"><sup><a id="fn.6" name="fn.6" class="footnum" href="#fnr.6">6</a></sup> <p class="footpara">
The expiration rule is a policy decision that can be refined
based on the needs of the community of subscribers or other information and
requirements.
</p></div>
<div class="footdef"><sup><a id="fn.7" name="fn.7" class="footnum" href="#fnr.7">7</a></sup> <p class="footpara">
The backup retention policy will be determined by
requirements and costs; the example here is just one option.
</p></div>
<div class="footdef"><sup><a id="fn.8" name="fn.8" class="footnum" href="#fnr.8">8</a></sup> <p class="footpara">
The cost and performance profiles for a general purpose
back-up system differ from those of a single purpose back-up system; as we
understand costs we will evaluate them against the costs of the available
options.
</p></div>
<div class="footdef"><sup><a id="fn.9" name="fn.9" class="footnum" href="#fnr.9">9</a></sup> <p class="footpara">
There are no on-campus archive options, so a cloud-based
option is likely the best option.
</p></div>
<div class="footdef"><sup><a id="fn.10" name="fn.10" class="footnum" href="#fnr.10">10</a></sup> <p class="footpara">
The U-M library could use this service as the technical
component or back-end to an archiving service they offer.
</p></div>
<div class="footdef"><sup><a id="fn.11" name="fn.11" class="footnum" href="#fnr.11">11</a></sup> <p class="footpara">
<a href="http://aws.amazon.com/glacier">http://aws.amazon.com/glacier</a> Amazon Glacier is an extremely
low-cost storage service that provides secure and durable storage for data
archiving and backup. In order to keep costs low, Amazon Glacier is optimized
for data that is infrequently accessed and for which retrieval times of
several hours are suitable. With Amazon Glacier, customers can reliably store
large or small amounts of data for as little as $0.01 per gigabyte per month.
</p></div>
<div class="footdef"><sup><a id="fn.12" name="fn.12" class="footnum" href="#fnr.12">12</a></sup> <p class="footpara">
This requirement informs either the selection process for a
cloud/archive service or the level of staffing for a locally-developed
solution.
</p></div>
<div class="footdef"><sup><a id="fn.13" name="fn.13" class="footnum" href="#fnr.13">13</a></sup> <p class="footpara">
Technically, this will be done with sophisticated NFS (or other
storage protocol) caching appliances or software.
</p></div>
<div class="footdef"><sup><a id="fn.14" name="fn.14" class="footnum" href="#fnr.14">14</a></sup> <p class="footpara">
<a href="http://www.lustre.org">http://www.lustre.org</a> Lustre is a parallel distributed file
system, generally used for large scale cluster computing. Lustre file systems
are scalable and can support tens of thousands of client systems, tens of
petabytes of storage, and hundreds of gigabytes per second of aggregate I/O
throughput.
</p></div>
<div class="footdef"><sup><a id="fn.15" name="fn.15" class="footnum" href="#fnr.15">15</a></sup> <p class="footpara">
<a href="http://www.itcs.umich.edu/storage/value">http://www.itcs.umich.edu/storage/value</a> Value storage is
designed to provide a cost-effective way for University researchers (and
others with large storage needs) to store large amounts of data in a
centralized location. Disk space can be purchased in terabyte increments.
</p></div>
<div class="footdef"><sup><a id="fn.16" name="fn.16" class="footnum" href="#fnr.16">16</a></sup> <p class="footpara">
<a href="http://www.itcs.umich.edu/tsm">http://www.itcs.umich.edu/tsm</a> The Tivoli Storage Manager (TSM) service
provides networked backup of data on server-level machines (such as
application and file servers, server-side databases, and research data
collections).
</p></div>
</div>
</div>