Bias in Data Visualization

Data is everywhere. Every smartphone consumes it whenever we receive e-mails and notifications, and we’re also producing it faster than ever. Every Facebook post, every Instagram picture, and every geotagged location produces even more data, but what does all of this information mean? As part of a commitment to teach myself something new every month, I decided to jump into the world of data visualization.

Data Visualization

Data visualization is an incredibly interesting field to me because it combines a lot of different fields and skills. I used Ben Fry’s Ph.D. dissertation on Computational Information Design to provide me with some background knowledge. In it, he describes data visualization as an intersection of:

  • Computer Science
  • Mathematics, Statistics, and Data Mining
  • Graphic Design
  • Information Visualization and Human Computer Interaction

Computer science is used to acquire data (often large, unstructured sets of it), parse it, and transform it into something usable. From there, it can be filtered based on specific data points, and algorithms can be applied to discern patterns and provide mathematical or statistical context. At that point, the raw data is transformed into a simple representation that is understandable by humans, such as a bar graph or a pie chart. This is where graphic design and usability become especially important, as a good visualization needs to tell a story or answer an important question. One complication of data visualization is in how the story is told. There is always some subjectivity in every visualization because you need to exclude some data to make the rest of it understandable. Is the excluded data truly irrelevant, or does leaving it out obscure a trend? That’s what I’m going to explore.
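As a rough illustration of those stages (parse, filter, represent), here is a minimal sketch in plain JavaScript. The CSV text, the column names, and the helper names `parseCsv`, `filterByMinimum`, and `toBars` are all made up for this example; none of them come from Fry’s dissertation or from a real data set.

```javascript
// Hypothetical raw input: made-up categories and values, for illustration only.
const rawCsv = "name,value\nalpha,12\nbeta,3\ngamma,7";

// Parse: turn CSV text into an array of objects.
// Assumes simple comma-separated text with no quoted fields.
function parseCsv(text) {
  const [header, ...rows] = text.trim().split("\n");
  const keys = header.split(",");
  return rows.map((row) => {
    const cells = row.split(",");
    return Object.fromEntries(keys.map((key, i) => [key, cells[i]]));
  });
}

// Filter: keep only rows at or above a threshold. This is the subjective
// step, since anything below the threshold never reaches the reader.
function filterByMinimum(records, minimum) {
  return records.filter((record) => Number(record.value) >= minimum);
}

// Represent: draw each record as a text bar, one "#" per unit of value.
function toBars(records) {
  return records.map(
    (record) => `${record.name.padEnd(6)} ${"#".repeat(Number(record.value))}`
  );
}

const records = parseCsv(rawCsv);
const bars = toBars(filterByMinimum(records, 5));
console.log(bars.join("\n"));
```

With a threshold of 5, the "beta" row is dropped before anything is drawn. Whether that is a harmless simplification or a hidden trend depends entirely on the question the chart is supposed to answer.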

Acquiring the Data

Before I started this project, I knew that I was going to use it as an excuse to learn D3.js and how to dynamically generate SVG, but I immediately found my first barrier. What am I going to visualize? Where am I going to get the data?

Those are important questions. My first thought was to use US Census data, but there were already so many visualizations of that data. I was looking through publicly available government data when I got the idea to do a visualization on government spending, specifically military spending. In recent years, there has been a lot of talk about the size of the US defense budget, which has led to decreased military spending, but I’m not here to talk about politics, so let’s jump straight into the data.

The first thing that I noticed about America’s staggering defense budget is this infographic:

This image shows that in 2011, the United States spent more on its defense budget than the next 13 countries combined. Whether it was intentional or not, this chart paints the United States of America as the premier warmonger in the entire world. Sure, it tells a story, but is it an honest one? Fortunately, the article points to the Stockholm International Peace Research Institute (SIPRI) as the source. Once I acquired the data set from their website, I noticed that by presenting the defense budget as the only data point, the chart hid the relationship between military spending and the relative wealth of each country. Although the United States defense budget is staggering, I wanted to know what it looks like in relation to our gross domestic product.

In the spirit of open data, GDP information can be obtained directly from the:

Visualizing the Data

Once I had the data, this part was actually pretty easy, despite never having used D3 before. When I first sat down to learn D3, I was a bit overwhelmed by the number of visualization options available. If you’re curious, this D3 gallery provides numerous examples of what can be done with the library. I originally wanted to do something cartographic using publicly available GeoJSON data, but for my first visualization project, I decided to stick to a bar chart:

Global Military Spending for 2013

Using the SIPRI data set for 2013, I present the 15 countries with the highest military spending, which clearly shows the United States as the leader (spending $640B on its military budget). However, my visualization also factors in the nominal GDP of each country and then sorts the military budgets as a percentage of GDP. After this calculation is performed, the information is dynamically re-sorted and shows that the United States is no longer the biggest military spender. With the addition of this information, the United States no longer appears as warmongering as it did in the earlier chart.
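The calculation behind that re-sorting is small enough to sketch in plain JavaScript. This is not the chart’s actual source code: the country names and dollar figures below are placeholders rather than SIPRI or GDP figures, and `withShareOfGdp` and `sortDescending` are names I made up for this example.

```javascript
// Placeholder rows: NOT real SIPRI or GDP figures. Values are in billions of
// US dollars and exist only to show how the ranking can change.
const countries = [
  { name: "Country A", military: 600, gdp: 16000 },
  { name: "Country B", military: 60, gdp: 700 },
  { name: "Country C", military: 90, gdp: 2000 },
];

// Add military spending as a percentage of nominal GDP to each row.
function withShareOfGdp(rows) {
  return rows.map((row) => ({
    ...row,
    shareOfGdp: (row.military / row.gdp) * 100,
  }));
}

// Return a new array sorted from largest to smallest on the given key.
// The input array is copied first so the original order is left untouched.
function sortDescending(rows, key) {
  return [...rows].sort((a, b) => b[key] - a[key]);
}

const bySpending = sortDescending(countries, "military");
const byShare = sortDescending(withShareOfGdp(countries), "shareOfGdp");

console.log(bySpending.map((row) => row.name));
console.log(byShare.map((row) => `${row.name}: ${row.shareOfGdp.toFixed(1)}%`));
```

In this made-up data, Country A ranks first by raw spending but last as a share of GDP. The rows are identical in both cases; only the sort key changes, which is exactly why the choice of key deserves scrutiny.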

While visualizations are never purely objective, I’d like to think that this picture paints a more honest look at the data.

If you have comments on my visualization or find discrepancies in the data, please let me know so that I can fix them.
