Event Density Plot ... I wonder what this is really called ...
I want to visualize how many concurrent events exist in a time period along with how frequently they start and end. I don’t need to read numbers off the visualization; I just want a relative sense of how many events are starting, ongoing, and ending over a time period at some resolution. Something that looks like this:
Looking at the plot, you can immediately see when:
- the most events were starting (about in the middle of the time range)
- the most events were happening (about in the first third of the time range)
- the most events were ending (about at the end of the first third of the time range).
With that information the reader can ask the next questions in more useful ways:
- “why did we stop starting events about half way through the time range?”
- “why did we stop so many events after the first third of the time range?”
- “why was nothing at all happening for the last 5–10% of the time range?”
Those questions aren’t about the data directly but about the application of the data, which is what data are for (even if people sometimes love data for its own sake), and they aren’t obvious from the input data (Table 1).
Practice Data
To start, I create some fake data with this Python script, where all times are between 1 and 100, there are 19 events, and the longest event duration is 30. If it helps, you can think of these numbers as seconds after 4:15am on Thursday, June 16th, 2016. Or days after January 1st, 2000. It doesn’t matter.
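    import random
    from tabulate import tabulate

    data = []
    for m in range(1, 20):  # range(1, 20) yields 19 events
        start = random.randint(1, 70)           # start no later than 70
        end = start + random.randint(1, 30)     # so the end never exceeds 100
        data.append((start, end))
    data.sort()
    # print() with a single argument works in both Python 2 and Python 3
    print(tabulate(data, tablefmt="orgtbl", headers=(["Start", "End"])))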
Start | End |
---|---|
6 | 11 |
7 | 27 |
8 | 35 |
10 | 11 |
13 | 37 |
14 | 35 |
22 | 34 |
24 | 36 |
28 | 51 |
31 | 59 |
33 | 34 |
36 | 47 |
36 | 58 |
42 | 51 |
42 | 51 |
44 | 66 |
53 | 74 |
69 | 95 |
69 | 96 |
Organizing the Data
The next step is to see how many events are active, starting, and ending at each time over all time (1–100 in our case).
This next bit of Python simply bins the data from the table above into our 100 example time bins, which I won’t make you read through, but you’ll need to bin your data in a similar way. The format of the data is:
Time | Number of Events Ending at this time | Number of Events Ongoing at this time | Number of Events Starting at this time |
---|---|---|---|
For example, if the frequency of your events is a few every minute, your binned data might look like:
Time | Ending | Ongoing | Starting |
---|---|---|---|
13:50 | 4 | 10 | 3 |
13:51 | 2 | 11 | 1 |
13:52 | 0 | 12 | 4 |
13:53 | 8 | 8 | 2 |
13:54 | 1 | 9 | 4 |
Since no data is displayed for the x-axis (the time), though, it is a lot easier to convert the time into relative time. In this example, the times could be 49800, 49860, 49920, etc. (seconds since midnight). Or if you have a date, using the Unix epoch time (seconds since 00:00:00 UTC 1 January 1970) makes things easy.
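As a minimal sketch of that conversion, assuming the times arrive as "HH:MM" strings or as timezone-aware datetimes: the helper names `clock_to_seconds` and `to_epoch` are made up for this illustration and are not part of the scripts in this post.

```python
from datetime import datetime, timezone


def clock_to_seconds(hhmm):
    """Convert an 'HH:MM' string to seconds since midnight (hypothetical helper)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def to_epoch(dt):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(dt.timestamp())


relative = [clock_to_seconds(t) for t in ["13:50", "13:51", "13:52"]]
print(relative)  # [49800, 49860, 49920]

# One minute after the epoch is 60 seconds.
print(to_epoch(datetime(1970, 1, 1, 0, 1, tzinfo=timezone.utc)))  # 60
```

Either form gives plain integers that can be binned the same way as the 1–100 example times.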
    # timedata holds the (start, end) pairs from Table 1
    timebin = dict()   # events ongoing at each time
    startbin = dict()  # events starting at each time
    endbin = dict()    # events ending at each time
    for timeincr in range(1, 101):
        timebin[timeincr] = 0
        startbin[timeincr] = 0
        endbin[timeincr] = 0
        for s, e in timedata:
            if s == timeincr:
                startbin[timeincr] += 1
            if e == timeincr:
                endbin[timeincr] += 1
            if s <= timeincr and e >= timeincr:
                timebin[timeincr] += 1
    # print one table row per time: time, ending, ongoing, starting
    for m in sorted(timebin):
        print("|{} | {} | {} | {}".format(m, endbin[m], timebin[m], startbin[m]))
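If you are not passing tables between code blocks the way Org-mode does, the same binning can be sketched as a function that returns the rows directly. This is only an illustration: the name `bin_events` and its `first`/`last` parameters are assumptions of mine, and the row layout (time, ending, ongoing, starting) simply follows the column order described above.

```python
def bin_events(events, first=1, last=100):
    """Return one (time, ending, ongoing, starting) row per time step.

    `events` is a list of (start, end) pairs; an event counts as ongoing
    at every time from its start through its end, inclusive.
    """
    bins = []
    for t in range(first, last + 1):
        ending = sum(1 for s, e in events if e == t)
        ongoing = sum(1 for s, e in events if s <= t <= e)
        starting = sum(1 for s, e in events if s == t)
        bins.append((t, ending, ongoing, starting))
    return bins


# The first four rows of Table 1.
events = [(6, 11), (7, 27), (8, 35), (10, 11)]
bins = bin_events(events)
print(bins[10])  # row for time 11: (11, 2, 4, 0)
```

A list of rows like this is the shape the plotting step indexes into: column 0 is the time and columns 1 to 3 are the three counts.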
Plotting the Density of the Bins
Once we have our bins, it’s a matter of making a density plot over time for each of the three event states (starting, ongoing, and ending).
    import matplotlib.pyplot as plt

    def makebarplot(bins):
        time = [b[0] for b in bins]  # extract the x-axis data
        fig = plt.figure()  # get the matplotlib plot figure
        fig.set_size_inches(8, 1)  # set the size of the plot
        ax = fig.add_subplot(1, 1, 1)  # add a plot to the figure; Subplot
        # is confusing, though. The magical "(1, 1, 1)" here means there
        # will be one row, one column, and we are working with plot number
        # 1, all of which is the same as just one plot. There is a little
        # more documentation on this at:
        # http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot
        fig.patch.set_visible(False)  # make the background transparent
        # turn off the borders (called spines)
        ax.spines['top'].set_visible(False)
        ax.spines['bottom'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['left'].set_visible(False)
        # set all of the ticks to 0 length
        ax.tick_params(axis=u'both', which=u'both', length=0)
        # hide everything about the x-axis
        ax.axes.get_xaxis().set_visible(False)
        barwidth = 1  # remove gaps between bars
        color = ["red", "blue", "green"]  # set the colors for the rows
        for row in range(1, len(color)+1):  # make as many rows as colors
            # extract the correct column
            ongoing = [b[row] for b in bins]
            # scale the data to the maximum
            ongoing = [c/float(max(ongoing)) for c in ongoing]
            # draw a black line at the left end; align="edge" makes each bar
            # span row..row+1, so the labels at row+0.5 sit mid-bar (newer
            # matplotlib versions center bars by default)
            left = 10
            border_width = 20
            d = border_width
            ax.barh(row, d, barwidth, color="black", left=left,
                    edgecolor="none", linewidth=0, align="edge")
            left += d
            # fill in the horizontal bar with the right color density
            # (alpha); d is the bin's time value, used as the segment width
            for d, c in zip(time, ongoing):
                ax.barh(row, d, barwidth, alpha=0.9*c+.01,
                        color=color[row-1], left=left,
                        edgecolor="none", linewidth=0, align="edge")
                left += d
            # draw a black line at the right end
            d = border_width
            ax.barh(row, d, barwidth, color="black", left=left,
                    edgecolor="none", linewidth=0, align="edge")
        # label the rows
        plt.yticks([1.5, 2.5, 3.5], ['stopping', 'ongoing', 'starting'],
                   size=10)
        # return the plot to __main__
        return plt

    # do some housekeeping that makes it all go in OrgMode (and hence PDF
    # and HTML); bins is the binned table from the previous step, one
    # (time, ending, ongoing, starting) row per time
    if __name__ == "__main__":
        plt = makebarplot(bins)
        # The file extension controls the output format; .png and .pdf are
        # good choices along with .svg
        filename = "edplot.svg"
        plt.savefig(filename)
        # Run through Org-mode, this block ends with "return filename";
        # outside a function that line is a syntax error, so print instead
        print(filename)
And now you can see the number of starting events in green, the number of ongoing events in blue, and the number of ending events in red. The darker the color, the more events of that type are happening at that time, hence the name, event density plot.
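The count-to-darkness mapping is the heart of the plot, so here it is pulled out as a small standalone sketch. The function name `density_alphas` and its `floor`/`span` parameters are hypothetical; the constants 0.01 and 0.9 are the ones used in `makebarplot`, and the all-zero guard is an addition of mine that the plotting function does not have.

```python
def density_alphas(counts, floor=0.01, span=0.9):
    """Scale counts to alpha values: the largest count is darkest.

    Mirrors the 0.9*c + .01 expression in makebarplot, where c is the
    count divided by the row's maximum.
    """
    peak = max(counts)
    if peak == 0:
        # Assumption: with no events at all, draw every bin at the floor.
        return [floor for _ in counts]
    return [span * (c / float(peak)) + floor for c in counts]


# The "Ending" column from the 13:50-13:54 example table.
alphas = density_alphas([4, 2, 0, 8, 1])
print(alphas)
```

Because each row is scaled to its own maximum, the darkest cell in the red row and the darkest cell in the blue row can stand for very different counts; the plot gives a relative sense within a row, not a comparison across rows.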
The Future
This could pretty readily be a Python class, and may be that someday, but for now the makebarplot function is sufficient and hopefully easy to understand and translate to the language of your choice.
I would also like to include more examples, but thought that would be as likely to add confusion as clarity.