Dotplots with Matplotlib because I couldn't find this elsewhere
I like the dotplots that R + ggplot2 can make, and there are lots of examples of them on the Internet, including at least one at r-bloggers. But Python is useful for many reasons, so I wanted to make a decent-looking, chartjunk-free dotplot using matplotlib.
Dotplots are a much better choice than pie charts for representing most data, and they can take the place of most bar charts while presenting a much cleaner-looking graphic. Most bar charts do not use the width or area of the bar to represent anything, so the size of the bar is, at best, chartjunk, or, at worst, misleading.
I could not find good documentation on making dotplots with Python and matplotlib, so I figured it out myself with the help of many Google results.
Some sample data is in Table 1. If the columns for your data are flipped, change the point arrays (p) in the code that are indexed with [0] to [1] and vice versa.
Count | Type |
---|---|
10 | apple |
7 | pear |
2 | avocado |
8 | orange |
4 | peach |
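
If your data arrives with the columns flipped, an alternative to editing the indices in the plotting code is to reorder the rows first. The following is only a sketch: the helper name `to_count_first` and the list `flipped` are made up for illustration, and it assumes every row has exactly two columns.

```python
# Hypothetical input whose columns are flipped relative to Table 1: (type, count)
flipped = [
    ("apple", 10),
    ("pear", 7),
    ("avocado", 2),
    ("orange", 8),
    ("peach", 4),
]


def to_count_first(rows):
    """Hypothetical helper: turn (type, count) rows into (count, type) rows."""
    return [(count, kind) for kind, count in rows]


# Rows are now in the same (count, type) order as Table 1
fruit = to_count_first(flipped)

# Sorting tuples compares the first column first, so this orders by count
smallest_first = sorted(fruit)
```

With the rows in (count, type) order, the plotting code can keep using index [0] for the counts and index [1] for the labels.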
The following code will produce the plot above. Hopefully the comments will help.

    import matplotlib.pyplot as plt

    # Rows from Table 1 as [count, type]; in org-mode, pass the table in as
    # the variable `fruit` instead of defining it here
    fruit = [[10, "apple"], [7, "pear"], [2, "avocado"],
             [8, "orange"], [4, "peach"]]
    d = fruit

    # sort the data (ascending by count)
    d = sorted(d, reverse=False)

    # Get the plot aspect right for thinner bars that aren't too spread out
    fig, ax = plt.subplots(figsize=(12, 2.5))

    # Create the bars
    # The parameters are:
    #   - the y positions of the bars (one index per row)
    #   - the values from the first column of data (how far each bar
    #     extends out to its point)
    #   - the height (thickness) of each bar, kept tiny so it draws as a line
    #   - color = the color of the bars
    #   - edgecolor = the color of the bars' borders
    #   - alpha = the transparency of the bars
    bars = ax.barh(range(len(d)), [p[0] for p in d], 0.001,
                   color="lightgray", edgecolor="lightgray", alpha=0.4)

    # Create the points using normal x-y scatter coordinates
    # The parameters are:
    #   - the x values from the first column of the data
    #   - the y values, which are just the indices of the data
    #   - the size of the points
    points = ax.scatter([p[0] for p in d], range(len(d)), s=30)

    # Create the ytic locations centered on the bars
    yticloc = [bar.get_y() + bar.get_height() / 2. for bar in bars]

    # Turn off all of the borders
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)

    # set all of the ticks to 0 length
    ax.tick_params(axis='both', which='both', length=0)

    # set the tic locations and labels
    ax.set_yticks(yticloc)
    ax.set_yticklabels([p[1] for p in d])

    # set the x- and y-axis limits a little bigger so things look nice
    ax.set_xlim([0, max([p[0] for p in d]) + 1.1])
    ax.set_ylim([-0.7, len(d)])

    # Turn on the X (vertical) gridlines
    ax.xaxis.grid(True)

    # Re-wrap the figure so everything fits
    plt.tight_layout()

    # Save the figure
    filename = "hbarplot.svg"
    plt.savefig(filename)

    # This is for org-mode; outside of an org-mode source block it is a
    # Python error, so it is commented out here
    # return filename
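
The numbers handed to matplotlib in the code above can be worked out with plain Python, which makes them easy to check before plotting. This is a rough sketch, not part of the original code: the function name `dotplot_geometry` and its parameters are invented here, and it assumes a non-empty list of (count, label) rows. Since `barh` centers each bar on its y position by default, the tick locations computed from the bars are simply the row indices.

```python
def dotplot_geometry(rows, x_pad=1.1, y_bottom=-0.7):
    """Hypothetical helper: derive plot inputs from (count, label) rows.

    Returns x values, y positions, tick labels, and the x/y axis limits,
    using the same padding values as the plotting code.
    """
    d = sorted(rows)                   # ascending by count
    xs = [p[0] for p in d]             # dot positions along the x-axis
    ys = list(range(len(d)))           # one row index per dot (also the tick locations)
    labels = [p[1] for p in d]         # y tick labels
    xlim = (0, max(xs) + x_pad)        # a little room to the right of the largest dot
    ylim = (y_bottom, len(d))          # a little room below and above the rows
    return xs, ys, labels, xlim, ylim


xs, ys, labels, xlim, ylim = dotplot_geometry(
    [(10, "apple"), (7, "pear"), (2, "avocado"), (8, "orange"), (4, "peach")]
)
```

The returned values map directly onto the calls above: `xs` and `ys` go to `ax.scatter`, `labels` to `ax.set_yticklabels`, and the two limits to `ax.set_xlim` and `ax.set_ylim`.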