24 June 2016

I like the dotplots that R and ggplot2 can make. There are plenty of examples on the Internet, including at least one at r-bloggers. Python is useful for many other reasons, though, so I wanted to make a decent-looking, chartjunk-free dotplot using matplotlib.

Dotplots are a much better choice than pie charts for most data, and they can replace most bar charts with a much cleaner graphic. Most bar charts do not use the width or area of the bar to represent anything, so the bar's size is chartjunk at best and misleading at worst.

As far as I could find, making dotplots with Python and matplotlib is not well documented, so I figured it out myself with the help of many Google results.

Some sample data is in Table 1. If the columns in your data are flipped (type first, count second), swap the indices in the code: change every p[0] to p[1] and vice versa.

Table 1: Fruit!
Count Type
10 apple
7 pear
2 avocado
8 orange
4 peach
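
As a minimal sketch, Table 1 could be written in plain Python like this. The name fruit matches the variable the plotting code reads. counts_first is a hypothetical helper, not part of the original code, for data whose columns are flipped; it is an alternative to editing the p[0] and p[1] indices by hand.

```python
# Table 1 as a list of [count, type] rows, the layout the plotting code expects
fruit = [
    [10, "apple"],
    [7, "pear"],
    [2, "avocado"],
    [8, "orange"],
    [4, "peach"],
]


def counts_first(rows):
    """Hypothetical helper: turn [type, count] rows into [count, type] rows."""
    return [[count, label] for label, count in rows]


# Example: data that arrived with the columns flipped
flipped = [["apple", 10], ["pear", 7]]
normalized = counts_first(flipped)
```

With the rows in [count, type] order, sorted() orders them by count, which is what the plotting code relies on.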

[Figure: dotplot of the fruit counts in Table 1 (hbarplot.svg)]

The following code will produce the plot above. Hopefully the comments will help.

import matplotlib.pyplot as plt

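# `fruit` holds the rows of Table 1 as [count, type] pairs.
# org-mode passes it in from the table; in a plain script, define it first, e.g.
# fruit = [[10, "apple"], [7, "pear"], [2, "avocado"], [8, "orange"], [4, "peach"]]
d = fruit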

# sort the data by count, ascending, so the largest value ends up at the top
d = sorted(d)

# Get the plot aspect right for thinner bars that aren't too spread out
fig, ax = plt.subplots(figsize=(12,2.5))

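# Create the bars: very thin horizontal bars that act as guide lines
# running from the y-axis out to each point
# The parameters are:
#   - the y positions of the bars (one index per row of data)
#   - the bar lengths, which are the values from the first column of data
#   - the thickness (height) of each bar, kept tiny so it reads as a line
#   - color = the color of the bars
#   - edgecolor = the color of the bars' borders
#   - alpha = the transparency of the bars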
bars = ax.barh(range(len(d)), [p[0] for p in d], 0.001,
                color="lightgray", edgecolor="lightgray", alpha=0.4)

# Create the points using normal x-y scatter coordinates
# The parameters are:
#   - the x values from the first column of the data
#   - the y values, which are just the indices of the data
#   - the size of the points
points = ax.scatter([p[0] for p in d], range(len(d)), s=30)

# Create the ytick locations centered on the bars
yticloc = [bar.get_y() + bar.get_height() / 2. for bar in bars]

# Turn off all of the borders
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)

# set all of the ticks to 0 length
ax.tick_params(axis='both', which='both', length=0)

# set the tick locations and labels
ax.set_yticks(yticloc)
ax.set_yticklabels([p[1] for p in d])

# set the x- and y-axis limits a little bigger so things look nice
ax.set_xlim([0,max([p[0] for p in d])+1.1])
ax.set_ylim([-0.7,len(d)])

# Turn on the X (vertical) gridlines
ax.xaxis.grid(True)

# Re-wrap the figure so everything fits
plt.tight_layout()

# Save the figure
filename = "hbarplot.svg"
plt.savefig(filename)

# this is for org-mode; in a plain Python script a top-level return is a
# SyntaxError, so remove it there
return filename
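
Outside org-mode, the same steps could be collected into a function, where the return is legal. This is only a sketch under a few assumptions: the function name dotplot, its parameters, and the Agg backend (chosen so the file can be saved without a display) are picked for illustration and are not part of the code above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: write to a file, no window needed
import matplotlib.pyplot as plt


def dotplot(rows, filename="hbarplot.svg"):
    """Draw a dotplot from [value, label] rows and save it to `filename`."""
    d = sorted(rows)
    values = [p[0] for p in d]
    labels = [p[1] for p in d]
    ypos = list(range(len(d)))

    fig, ax = plt.subplots(figsize=(12, 2.5))

    # Very thin light-gray bars serve as guide lines out to each point
    bars = ax.barh(ypos, values, 0.001,
                   color="lightgray", edgecolor="lightgray", alpha=0.4)
    ax.scatter(values, ypos, s=30)

    # Remove the borders and the tick marks
    for side in ("top", "bottom", "left", "right"):
        ax.spines[side].set_visible(False)
    ax.tick_params(axis="both", which="both", length=0)

    # Label each row at the center of its bar
    ax.set_yticks([bar.get_y() + bar.get_height() / 2 for bar in bars])
    ax.set_yticklabels(labels)

    # Leave a little room around the data and show vertical gridlines
    ax.set_xlim(0, max(values) + 1.1)
    ax.set_ylim(-0.7, len(d))
    ax.xaxis.grid(True)

    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
    return filename
```

Calling dotplot(fruit) with the rows from Table 1 would write hbarplot.svg to the current directory and return the filename.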