Interactive World Ocean Atlas

This quarter I’ve been taking Jeff Heer’s data visualization class at UW. For our third assignment we had to create an interactive visualization. I partnered up with fellow oceanographer Michelle Weirathmueller to visualize data from the World Ocean Atlas.

Click here to see the final product and click here to read about our development process on GitHub.


The human microbiome represented with bubble charts

The Human Microbiome Project was an NIH-sponsored initiative to characterize the healthy and diseased microbiome from the human mouth, gut, lung/nose, skin, and vagina. Metagenomic, whole genome, and 16S rRNA sequencing were used to generate taxonomic and functional data.

Taxonomic data was generated by the Human Microbiome Project from 16S rRNA gene sequencing. Mothur software was used to cluster the sequences into operational taxonomic units (OTUs).

I used this microbial community data to create a series of animated bubble charts. Each chart shows an averaged community profile from the healthy human microbiome. Circles are labeled by genus and coloured by class. Circle size represents the percentage of the community made up by each OTU, averaged across several samples.

Click image to view the charts
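
A minimal sketch of how the circle sizes could be computed, assuming each sample is available as a mapping from OTU label to read count. The function name and input layout are illustrative assumptions, not the code behind the charts:

```python
from collections import defaultdict


def mean_percent_abundance(samples):
    """Average each OTU's percentage share of the community across samples.

    `samples` is a list of dicts mapping an OTU label to its read count in
    one sample (a hypothetical layout). An OTU missing from a sample
    contributes 0% for that sample.
    """
    sums = defaultdict(float)
    n_samples = 0
    for counts in samples:
        total = sum(counts.values())
        if total == 0:
            continue  # an empty sample has no community profile
        n_samples += 1
        for otu, count in counts.items():
            sums[otu] += 100.0 * count / total
    if n_samples == 0:
        return {}
    return {otu: pct / n_samples for otu, pct in sums.items()}
```

Percentages are computed within each sample before averaging, so a deeply sequenced sample does not dominate the profile.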

Source material:
The Human Microbiome Project
Convert CSV to JSON
Bubble charts with d3
Animated bubble charts


How to represent sets and access Twitter’s API with Python

I followed much of the Canadian election over Twitter because I live in Seattle. This got me wondering whether there is a large overlap in Twitter followers between the party leaders or if people just follow a single party.

The day after the election I used Tweepy to collect a list of follower IDs from @JustinTrudeau, @pmharper, @ThomasMulcair, @ElizabethMay, and @GillesDuceppe.
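
Once the follower IDs are collected, each leader's followers fit naturally into a Python set. The sketch below assumes the IDs are already loaded into a dict of sets. It counts how many users fall into each exclusive combination of leaders, which are the regions a Venn diagram would need to draw. The function name and toy IDs are made up for illustration:

```python
from collections import Counter, defaultdict


def exclusive_regions(followers):
    """Count users by the exact combination of leaders they follow.

    `followers` maps a leader's handle to a set of follower IDs.
    Returns a Counter keyed by frozenset of handles.
    """
    membership = defaultdict(set)
    for leader, ids in followers.items():
        for user_id in ids:
            membership[user_id].add(leader)
    return Counter(frozenset(leaders) for leaders in membership.values())


# Toy data, not real follower IDs.
toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
regions = exclusive_regions(toy)
print(regions[frozenset({"A"})])       # 2 users follow only A
print(regions[frozenset({"A", "B"})])  # 1 user follows exactly A and B
```

With five leaders there are up to 31 non-empty combinations, which is why a proportional diagram made of circles cannot show them all.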

Sets are typically represented with Venn diagrams. The first option I came across was Ben Frederickson’s code for producing proportional Venn diagrams with D3. The result is really great! Unfortunately, it’s not really possible to represent five proportional overlapping sets with circles. You can see that overlaps between Duceppe, Mulcair, and May were excluded from this representation.

Click this image to see interactive tooltip

All possible intersections can be represented with an elliptical Venn diagram. I produced this plot with the VennDiagram package in R. However, the areas of the ellipses and their overlaps are not proportional to the number of followers.

Click here!

Ultimately, I like this chord diagram the best. This visualization was translated into D3 by Mike Bostock. While it doesn’t show all possible set intersections, it is easy to read and you can get a sense of the overlap between each leader’s followers. People do, however, get counted multiple times in this diagram if they are following more than two party leaders.

Click this image to open graphic
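
A chord diagram is drawn from a square matrix. Here is a rough sketch of how such a matrix could be built from follower sets. It assumes off-diagonal cells hold the size of each pairwise overlap and diagonal cells hold followers exclusive to one leader. That layout is an assumption for illustration, not necessarily how the graphic above was built:

```python
def chord_matrix(followers, order):
    """Build a square overlap matrix for the leaders listed in `order`.

    Cell [i][j] (i != j) is the number of users following both leaders.
    Cell [i][i] is the number of users following only leader i.
    """
    matrix = []
    for a in order:
        row = []
        for b in order:
            if a == b:
                others = set().union(*(followers[x] for x in order if x != a))
                row.append(len(followers[a] - others))
            else:
                row.append(len(followers[a] & followers[b]))
        matrix.append(row)
    return matrix


# Toy data: user 4 follows all three leaders.
toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
print(chord_matrix(toy, ["A", "B", "C"]))
# [[2, 2, 1], [2, 1, 1], [1, 1, 1]]
```

User 4 appears in the A-B, A-C, and B-C cells, which is the multiple counting described above.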

Mulcair has far fewer followers than I expected, considering he was the leader of the opposition for four years. Harper’s and Trudeau’s followers have remarkably similar affiliations, with a large share following only Harper, only Trudeau, or both.

Source material:
Venn Diagrams with D3.js by Ben Frederickson
draw.quintuple.venn {VennDiagram} by inside-R
Chord Diagram by Mike Bostock


Integrating D3 and MySQL

Databases can be used to access much larger, dynamic datasets and even to record interactions with users. I used data from the Fatal Encounters project to create a choropleth map showing fatal encounters with the police.

Click the map to open graphic.
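
The general pattern for feeding a chart from a database is to run a query on the server and hand the rows to the browser as JSON. The sketch below illustrates that idea with Python's built-in sqlite3 module standing in for MySQL. The table, columns, and rows are hypothetical, and the original map may well have used a different server-side language:

```python
import json
import sqlite3


def query_to_json(conn, sql, params=()):
    """Run a parameterized query and return the rows as a JSON array of objects."""
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])


# Hypothetical table and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (state TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO encounters VALUES (?, ?)",
    [("AA", "F"), ("AA", "M"), ("BB", "M")],
)

payload = query_to_json(
    conn,
    "SELECT state, COUNT(*) AS deaths FROM encounters "
    "WHERE gender = ? GROUP BY state ORDER BY state",
    ("M",),
)
print(payload)  # a JSON array with one object per state
```

Passing filter values as query parameters, rather than pasting them into the SQL string, keeps user-supplied filters from being interpreted as SQL.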

Police brutality and racial discrimination are some of the most important social issues in America right now. As the Las Vegas Review-Journal pointed out in its series Deadly Force (Nov. 28, 2011):

“The nation’s leading law enforcement agency [FBI] collects vast amounts of information on crime nationwide, but missing from this clearinghouse are statistics on where, how often, and under what circumstances police use deadly force. In fact, no one anywhere comprehensively tracks the most significant act police can do in the line of duty: take a life.”

The Fatal Encounters project aims to remedy this situation by compiling data on deaths occurring during encounters with the police. The general public can assist by submitting records through a form on the website, and the data is freely available.

The issue of discrimination by the police is complicated by correlations with crime and poverty. I believe that there is an appetite among media consumers to explore these types of data for themselves. The choropleth map that I created allows the user to filter the data by gender, race, mental illness, cause of death, and official disposition of the police. The data can also be viewed as total deaths per state or as deaths per million people to control for population size. Users can click on any count statistic to link to articles associated with that statistic.
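
The per-capita view is simple arithmetic: deaths divided by population, times one million. Here is a small sketch of filtering records and normalizing the counts, assuming each record is a dict with a `state` field. The field names and numbers are placeholders, not values from the Fatal Encounters data:

```python
from collections import Counter


def deaths_per_million(records, population, **filters):
    """Count records per state that match `filters`, scaled per million residents.

    `records` is a list of dicts with at least a "state" key.
    `population` maps a state code to its number of residents.
    """
    counts = Counter(
        r["state"]
        for r in records
        if all(r.get(key) == value for key, value in filters.items())
    )
    return {
        state: 1_000_000 * counts[state] / pop
        for state, pop in population.items()
        if pop > 0
    }


# Placeholder records and populations.
records = [
    {"state": "AA", "gender": "M"},
    {"state": "AA", "gender": "F"},
    {"state": "BB", "gender": "M"},
]
population = {"AA": 2_000_000, "BB": 500_000}
print(deaths_per_million(records, population))              # {'AA': 1.0, 'BB': 2.0}
print(deaths_per_million(records, population, gender="F"))  # {'AA': 0.5, 'BB': 0.0}
```

In this toy example, state BB has fewer deaths in total but a higher rate. This is the kind of difference the per-million view is meant to surface.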

The data set is still incomplete and there is some bias. For example, there are 84 deaths per million people in Nevada; that’s much higher than any other state. What’s going on? Are the police in Nevada particularly deadly? Is Nevada especially crime-ridden? More likely, the project’s author, D. Brian Burghart, an instructor at the University of Nevada, Reno, has focused on collecting data from his own state. This bias should diminish as more data is collected. Visualizations and projects like this can inspire people to think critically about stories in the media, investigate their own stories, and participate in data collection.

Since this map was created, a more polished visualization has been published by Fatal Encounters with Silk.

Source material:
Interactive Data Visualization for the Web by Scott Murray
Using a MySQL database as a source of data by D3noob
Graph data from a MySQL database in Python by modern data
Get Apache, MySQL, PHP and phpMyAdmin working on OSX 10.10 Yosemite, Coolest Guides on the Planet by Neil Gee


Learning D3

I started this project to learn D3. D3.js is a JavaScript library for manipulating documents based on data. Here is the plot I created:

Click on the plot to view graphic.

In 1951, the graphic designer Will Burtin published a plot visualizing the efficacy of 3 antibiotics against 16 different bacteria. Antibiotic efficacy is measured by the minimum concentration required to inhibit bacterial growth.

Will Burtin's plot

This plot was admired for its simplicity in comparing the effects of different antibiotics. But it does not clearly show how the bacteria group together in their response. Different visualizations can emphasize different aspects of the data. (See the article in American Scientist.)

I used Burtin’s data to create a simple bar plot that alternates between displays of different antibiotics. The plot shows more clearly that Gram-positive bacteria are more resistant to streptomycin and neomycin, while Gram-negative bacteria are more resistant to penicillin. It also shows which bacteria do not follow the trend. Gram staining uses violet and pink dyes to distinguish between groups of bacteria by cell wall structure.

Source material:
Interactive Data Visualization for the Web by Scott Murray