Comparisons

Materials for class on Tuesday, October 3, 2017

Slides

Download the slides from today’s lecture.

Comparing things

Sparklines

Small multiples

Lollipops

Download Lord of the Rings data

First, we load the libraries we’ll need for all the examples. We’ll use tidyverse in all of them; we’ll use forcats in the lollipop and bullet charts and ggrepel in the slopegraph.

We load the CSV file into R and save it as a variable, or object, named lotr. We use fct_inorder() inside mutate() to turn the Film variable into a factor whose levels follow the order in which the films first appear in the data, so that the films plot in the correct order.
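To see what fct_inorder() does on its own, here is a minimal sketch that uses only the three film titles instead of the real CSV. The films data frame is a stand-in for lotr, and the exact spelling of the titles is an assumption about what the file contains.

```r
library(tidyverse)
library(forcats)  # attached separately in case your tidyverse version does not include it

# Stand-in for the Film column of lotr, listed in the order the films appear
films <- tibble(Film = c("The Fellowship Of The Ring",
                         "The Two Towers",
                         "The Return Of The King"))

# A plain factor sorts its levels alphabetically, which puts
# "The Return Of The King" before "The Two Towers"
levels(factor(films$Film))

# fct_inorder() keeps the levels in order of first appearance instead
films_ordered <- films %>%
  mutate(Film = fct_inorder(Film))

levels(films_ordered$Film)
```

Note that fct_inorder() only changes the order of the levels; ggplot then uses that order for axes, legends, and facets.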

Next we summarize the data by race and gender. We can output the summarized table as a Markdown table with knitr::kable(). Remember that this syntax means that we’re using the kable() function inside the knitr package without actually loading it. Alternatively, you can run library(knitr) to load the package and then use kable() as needed without the double colon ::.

|Race   |Gender | total_words|
|:------|:------|-----------:|
|Elf    |Female |        1743|
|Elf    |Male   |        1994|
|Hobbit |Female |          16|
|Hobbit |Male   |        8780|
|Man    |Female |         669|
|Man    |Male   |        8043|
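The table above could be produced roughly like this. To keep the sketch self-contained, the per-film rows are typed in by hand rather than read from the CSV. They are an assumption: they follow the commonly shared tidy version of this data set, the column name Words is a guess, and the class file may be laid out differently. Their totals do match the table.

```r
library(tidyverse)

# ASSUMED per-film word counts (not taken from the class CSV itself)
lotr <- tribble(
  ~Film,                        ~Race,    ~Gender,  ~Words,
  "The Fellowship Of The Ring", "Elf",    "Female", 1229,
  "The Fellowship Of The Ring", "Elf",    "Male",    971,
  "The Fellowship Of The Ring", "Hobbit", "Female",   14,
  "The Fellowship Of The Ring", "Hobbit", "Male",   3644,
  "The Fellowship Of The Ring", "Man",    "Female",    0,
  "The Fellowship Of The Ring", "Man",    "Male",   1995,
  "The Two Towers",             "Elf",    "Female",  331,
  "The Two Towers",             "Elf",    "Male",    513,
  "The Two Towers",             "Hobbit", "Female",    0,
  "The Two Towers",             "Hobbit", "Male",   2463,
  "The Two Towers",             "Man",    "Female",  401,
  "The Two Towers",             "Man",    "Male",   3589,
  "The Return Of The King",     "Elf",    "Female",  183,
  "The Return Of The King",     "Elf",    "Male",    510,
  "The Return Of The King",     "Hobbit", "Female",    2,
  "The Return Of The King",     "Hobbit", "Male",   2673,
  "The Return Of The King",     "Man",    "Female",  268,
  "The Return Of The King",     "Man",    "Male",   2459
)

# Add up the words for each race/gender pair across all three films
lotr_summarized <- lotr %>%
  group_by(Race, Gender) %>%
  summarize(total_words = sum(Words)) %>%
  ungroup()

# Print as a Markdown table without attaching knitr
knitr::kable(lotr_summarized)
```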

We can plot this summarized data, mapping variables to different aesthetics in the plot. A few things to note:
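As a sketch of what such a plot might look like, the block below draws a lollipop chart from the summarized table. The choice of geom_pointrange(), the dodge width, and the axis label are assumptions for illustration, not necessarily what we used in class.

```r
library(tidyverse)

# Summarized totals, copied from the table above
lotr_summarized <- tribble(
  ~Race,    ~Gender,  ~total_words,
  "Elf",    "Female", 1743,
  "Elf",    "Male",   1994,
  "Hobbit", "Female",   16,
  "Hobbit", "Male",   8780,
  "Man",    "Female",  669,
  "Man",    "Male",   8043
)

lollipop <- ggplot(lotr_summarized,
                   aes(x = Race, y = total_words, color = Gender)) +
  # Each lollipop is a line from 0 up to the total with a dot at the end;
  # position_dodge() places the two genders side by side within each race
  geom_pointrange(aes(ymin = 0, ymax = total_words),
                  position = position_dodge(width = 0.5)) +
  labs(x = NULL, y = "Total words spoken") +
  # Flip so the lollipops run horizontally
  coord_flip()

lollipop
```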

We can also plot race and gender across the three films. Because we used fct_inorder() earlier, the levels of the Film column follow the order of the films in the data, so the panels should appear in the right sequence. Here we use the original lotr data frame instead of the summarized one, since we want the Film column too. We facet by film with facet_wrap(~ Film).
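Here is a hedged sketch of the faceted version. As before, the per-film rows are typed in by hand and are an assumption about the contents of the CSV (including the column name Words); the geom and dodge width are illustrative choices.

```r
library(tidyverse)
library(forcats)

# ASSUMED per-film word counts, in film order
lotr <- tibble(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Race = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
) %>%
  # Keep the films in the order they appear rather than alphabetical order
  mutate(Film = fct_inorder(Film))

faceted <- ggplot(lotr, aes(x = Race, y = Words, color = Gender)) +
  geom_pointrange(aes(ymin = 0, ymax = Words),
                  position = position_dodge(width = 0.5)) +
  coord_flip() +
  # One panel per film, ordered by the factor levels of Film
  facet_wrap(~ Film)

faceted
```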

Slopegraphs

Download General Conference “isms” data

This data comes from BYU’s LDS General Conference Corpus. I created a list of "*ism" words, divided between the 1950s/60s and the 1990s/2000s, and then copied and pasted the results into a CSV file.

First, we load the data. I already filtered, summarized, and tidied this data, based on original data that looked like this. You can see the R code I used to clean and tidy the data here.

|word      |decade  | count| permil|
|:---------|:-------|-----:|------:|
|Baptism   |coldwar |   596|  191.7|
|Baptism   |today   |   609|  207.8|
|Communism |coldwar |   216|   69.5|
|Communism |today   |     1|    0.3|
|Criticism |coldwar |    75|   24.1|
|Criticism |today   |    51|   17.4|
|Mormonism |coldwar |   135|   43.4|
|Mormonism |today   |    39|   13.3|
|Socialism |coldwar |    82|   26.4|
|Socialism |today   |     0|    0.0|

Just plotting the data as-is gives us a rudimentary and ugly slopegraph. Note the group aesthetic—without it, the lines will not plot across the coldwar and today columns.
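A minimal sketch of that rudimentary slopegraph, with the rows from the table typed in by hand so the example runs without the CSV:

```r
library(tidyverse)

isms <- tribble(
  ~word,       ~decade,   ~count, ~permil,
  "Baptism",   "coldwar", 596,    191.7,
  "Baptism",   "today",   609,    207.8,
  "Communism", "coldwar", 216,     69.5,
  "Communism", "today",     1,      0.3,
  "Criticism", "coldwar",  75,     24.1,
  "Criticism", "today",    51,     17.4,
  "Mormonism", "coldwar", 135,     43.4,
  "Mormonism", "today",    39,     13.3,
  "Socialism", "coldwar",  82,     26.4,
  "Socialism", "today",     0,      0.0
)

# group = word tells ggplot which two points belong to the same line;
# with a discrete x axis, ggplot would otherwise treat each decade as its own group
basic_slope <- ggplot(isms, aes(x = decade, y = permil, group = word)) +
  geom_line()

basic_slope
```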

We can add a bunch of columns to the original isms data frame to help with plotting. Here’s what’s happening:
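One way this step might look is sketched below. The column names (label_first, label_last, highlight) and the choice of which words to highlight are hypothetical, picked only to illustrate the idea of precomputing labels and flags with mutate().

```r
library(tidyverse)

isms <- tibble(
  word = rep(c("Baptism", "Communism", "Criticism", "Mormonism", "Socialism"),
             each = 2),
  decade = rep(c("coldwar", "today"), times = 5),
  permil = c(191.7, 207.8, 69.5, 0.3, 24.1, 17.4, 43.4, 13.3, 26.4, 0.0)
)

isms_plot <- isms %>%
  mutate(
    # Left-hand label: word plus its value, only on the coldwar rows
    label_first = ifelse(decade == "coldwar",
                         paste0(word, ": ", sprintf("%.1f", permil)), NA),
    # Right-hand label: just the value, only on the today rows
    label_last = ifelse(decade == "today", sprintf("%.1f", permil), NA),
    # Flag the lines we want to draw in a different color (hypothetical choice)
    highlight = word %in% c("Communism", "Socialism")
  )

isms_plot
```

Because every label is NA on the rows where it should not appear, each text layer in the plot can use the whole data frame and simply skip the missing values.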

We can use this enhanced data frame to add labels and color specific lines. Here’s the complete, final plot. A few things to note:
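The following is a sketch of such a plot rather than the exact one from class. The helper columns, colors, nudge amounts, and axis label are all assumptions made for illustration.

```r
library(tidyverse)
library(ggrepel)

isms_plot <- tibble(
  word = rep(c("Baptism", "Communism", "Criticism", "Mormonism", "Socialism"),
             each = 2),
  decade = rep(c("coldwar", "today"), times = 5),
  permil = c(191.7, 207.8, 69.5, 0.3, 24.1, 17.4, 43.4, 13.3, 26.4, 0.0)
) %>%
  mutate(
    # Hypothetical helper columns for labels and highlighting
    label_first = ifelse(decade == "coldwar",
                         paste0(word, ": ", sprintf("%.1f", permil)), NA),
    label_last = ifelse(decade == "today", sprintf("%.1f", permil), NA),
    highlight = word %in% c("Communism", "Socialism")
  )

fancy_plot <- ggplot(isms_plot,
                     aes(x = decade, y = permil, group = word, color = highlight)) +
  geom_line() +
  # Labels on the left; repel only vertically so they stay beside the axis
  geom_text_repel(aes(label = label_first), direction = "y",
                  nudge_x = -0.3, na.rm = TRUE) +
  # Labels on the right
  geom_text_repel(aes(label = label_last), direction = "y",
                  nudge_x = 0.3, na.rm = TRUE) +
  scale_color_manual(values = c("TRUE" = "#FF4136", "FALSE" = "grey60")) +
  labs(x = NULL, y = "Frequency (permil)") +
  theme_minimal() +
  theme(legend.position = "none")

fancy_plot
```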

Because we saved the plot to a variable (fancy_plot), we can do stuff with it like saving it to our computer with ggsave():
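A small sketch of saving a stored plot. The plot here is a bare-bones stand-in for fancy_plot, and the file names and dimensions are hypothetical; the files go to a temporary folder so the example does not clutter your project.

```r
library(tidyverse)

isms <- tibble(
  word = rep(c("Baptism", "Communism", "Criticism", "Mormonism", "Socialism"),
             each = 2),
  decade = rep(c("coldwar", "today"), times = 5),
  permil = c(191.7, 207.8, 69.5, 0.3, 24.1, 17.4, 43.4, 13.3, 26.4, 0.0)
)

# Stand-in for the finished plot
fancy_plot <- ggplot(isms, aes(x = decade, y = permil, group = word)) +
  geom_line()

# Hypothetical output paths; replace tempdir() with a folder of your own
pdf_file <- file.path(tempdir(), "isms_slopegraph.pdf")
png_file <- file.path(tempdir(), "isms_slopegraph.png")

# ggsave() picks the file type from the extension; width and height are in inches
ggsave(pdf_file, fancy_plot, width = 6, height = 4)
ggsave(png_file, fancy_plot, width = 6, height = 4, dpi = 300)
```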

Finally, just for fun, we can use custom fonts to make the graphic even nicer. We’ll use Roboto Condensed, which is free from Google Fonts.

With that, we can specify font families and font faces (bold, italic, plain, etc.) in geom_text_repel() and in theme():
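As a rough sketch, font settings can go in three places: as fixed parameters on geom_text_repel(), as the base family of the theme, and on individual theme elements. The plot title and the specific faces chosen here are assumptions, and the font has to be installed on your computer for it to show up when the plot is drawn.

```r
library(tidyverse)
library(ggrepel)

isms <- tibble(
  word = rep(c("Baptism", "Communism", "Criticism", "Mormonism", "Socialism"),
             each = 2),
  decade = rep(c("coldwar", "today"), times = 5),
  permil = c(191.7, 207.8, 69.5, 0.3, 24.1, 17.4, 43.4, 13.3, 26.4, 0.0)
)

font_family <- "Roboto Condensed"  # must be installed on your system

fancy_fonts <- ggplot(isms, aes(x = decade, y = permil, group = word)) +
  geom_line() +
  # family and fontface are set outside aes() because they are constants
  geom_text_repel(data = filter(isms, decade == "coldwar"),
                  aes(label = word),
                  family = font_family, fontface = "bold",
                  direction = "y", nudge_x = -0.3) +
  labs(title = "Isms in General Conference", x = NULL, y = "permil") +
  # base_family sets the default font for all text in the theme
  theme_minimal(base_family = font_family) +
  # Individual elements can override the face (plain, bold, italic)
  theme(plot.title = element_text(face = "bold"),
        axis.text = element_text(face = "italic"))
```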

We can save the plot with the custom fonts with ggsave() like always, but we have to use the Cairo graphics library to get the fonts to embed and to get a PNG with proper dimensions. Note the difference between the two—PDFs need device = cairo_pdf while PNGs need type = "cairo". It’s different and I don’t fully understand why but ¯\_(ツ)_/¯.
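The two calls might look like the sketch below. The file names are hypothetical, and the font is set to the generic "sans" so the example runs anywhere; swap in "Roboto Condensed" once it is installed. One caveat that is an assumption about newer setups: if the ragg package is installed, recent versions of ggplot2 may use it for PNGs and ignore type = "cairo" with a warning.

```r
library(tidyverse)

isms <- tibble(
  word = rep(c("Baptism", "Communism", "Criticism", "Mormonism", "Socialism"),
             each = 2),
  decade = rep(c("coldwar", "today"), times = 5),
  permil = c(191.7, 207.8, 69.5, 0.3, 24.1, 17.4, 43.4, 13.3, 26.4, 0.0)
)

font_family <- "sans"  # replace with "Roboto Condensed" once it is installed

fancy_plot <- ggplot(isms, aes(x = decade, y = permil, group = word)) +
  geom_line() +
  theme_minimal(base_family = font_family)

pdf_file <- file.path(tempdir(), "isms_fonts.pdf")
png_file <- file.path(tempdir(), "isms_fonts.png")

# Only try the Cairo devices if this copy of R was built with Cairo support
if (capabilities("cairo")) {
  # PDFs: hand ggsave() the Cairo PDF device itself
  ggsave(pdf_file, fancy_plot, width = 6, height = 4, device = cairo_pdf)

  # PNGs: keep the default PNG device but ask it to render with Cairo
  ggsave(png_file, fancy_plot, width = 6, height = 4, type = "cairo")
}
```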

Bullet charts

Download fake performance data

Bullet charts are just a bunch of bar charts stacked on top of each other with extra dots and lines. First we load a data frame I copied from Stephanie Evergreen’s book, and we make sure the region variable (here named measure) is a factor with its levels in order. We then reverse it with fct_rev() because coord_flip() does weird stuff to the ordering.

|measure  |  bad| satisfactory| good| target| value|
|:--------|----:|------------:|----:|------:|-----:|
|Region A | 33.3|         66.6|  100|     75|    70|
|Region B | 33.3|         66.6|  100|     65|    72|
|Region C | 33.3|         66.6|  100|     70|    78|
|Region D | 33.3|         66.6|  100|     65|    71|
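Here is a sketch of the factor handling, with the table typed in by hand instead of read from the CSV. The data frame name performance is an assumption.

```r
library(tidyverse)
library(forcats)

performance <- tribble(
  ~measure,   ~bad, ~satisfactory, ~good, ~target, ~value,
  "Region A", 33.3, 66.6,          100,   75,      70,
  "Region B", 33.3, 66.6,          100,   65,      72,
  "Region C", 33.3, 66.6,          100,   70,      78,
  "Region D", 33.3, 66.6,          100,   65,      71
) %>%
  # fct_inorder() fixes the levels as A, B, C, D; fct_rev() then flips them
  # so that Region A ends up at the top once the axes are flipped
  mutate(measure = fct_rev(fct_inorder(measure)))

levels(performance$measure)
```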

With the data in this form, it’s easy to plot. Here are a couple things to note:
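A hedged sketch of one way to build the chart: draw the three background bands as overlapping bars from widest to narrowest, add a thin bar for the actual value, and mark the target with a short line and a dot. The colors, widths, and the name performance are illustrative assumptions.

```r
library(tidyverse)
library(forcats)

performance <- tribble(
  ~measure,   ~bad, ~satisfactory, ~good, ~target, ~value,
  "Region A", 33.3, 66.6,          100,   75,      70,
  "Region B", 33.3, 66.6,          100,   65,      72,
  "Region C", 33.3, 66.6,          100,   70,      78,
  "Region D", 33.3, 66.6,          100,   65,      71
) %>%
  mutate(measure = fct_rev(fct_inorder(measure)))

bullet <- ggplot(performance, aes(x = measure)) +
  # Background bands: later layers are drawn over earlier ones,
  # so the longest (good) goes first and the shortest (bad) last
  geom_col(aes(y = good), fill = "grey85", width = 0.5) +
  geom_col(aes(y = satisfactory), fill = "grey65", width = 0.5) +
  geom_col(aes(y = bad), fill = "grey45", width = 0.5) +
  # Actual value as a thin dark bar
  geom_col(aes(y = value), fill = "black", width = 0.1) +
  # Target as a short line (an error bar with zero height) plus a dot
  geom_errorbar(aes(ymin = target, ymax = target), color = "red", width = 0.45) +
  geom_point(aes(y = target), color = "red", size = 2) +
  labs(x = NULL, y = NULL) +
  coord_flip() +
  theme_minimal()

bullet
```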

Feedback for today

Go to this form and answer these three questions (anonymously if you want):

  1. What new thing did you learn today?
  2. What was the most unclear thing about today?
  3. What was the most exciting thing you learned today?