1 Preface

When done right, graphs can be appealing, informative, and of considerable value to an academic article. Unfortunately, researchers generally suck at making good graphs. We surmise that this is because researchers have not fully mastered their graphing software, and they are either too lazy or too busy to remedy the situation. Consequently, the produced graph is often a severe distortion of the ideal Platonic graph that the researcher had in mind initially.

This compendium facilitates the creation of good graphs by presenting a set of concrete examples, ranging from the trivial to the advanced. The graphs can all be reproduced and adjusted by copy-pasting code into the R console. A note for R fans: the majority of our plots have been created in base R, but you will encounter some examples in ggplot.

Almost every example in this compendium is driven by the same philosophy: A good graph is a simple graph, in the Einsteinian sense that a graph should be made as simple as possible, but not simpler.

We close with a request and a piece of advice. The request: if you create a clean graph in R that you believe is a candidate for inclusion in this compendium, please do not hesitate to contact us at EJ.Wagenmakers@gmail.com or quentingronau@web.de. Your contribution will be acknowledged explicitly, alongside the code you provided. The advice: when you create a clean graph in R, put it on Flickr (public license) before you sign away your copyright to a publisher. For an example, see Figure 1 from this paper.

This work has profited greatly from interactions with our colleagues, many of whom have contributed graphs of their own.

2 Introduction

Producing clean graphs can be a challenging task. First, you have to decide how best to convey the information: a line graph, a histogram, a multi-panel plot? Such conceptual dilemmas are not dealt with in this compendium; instead, we refer the reader to the chapters on creating graphs in the excellent book by Briscoe (1996). Second, you have to use computer software to translate the conceptual graph into a publication-ready figure. This is the phase where this compendium may be useful, because it brings together R code for producing a set of clean, publication-ready figures. Hopefully this will make it easy to copy-paste and adjust the code to suit your own needs.

In our experience, many graphs can be dramatically improved by adhering to the following guidelines: (1) invest sufficient time and effort in the process; (2) omit needless graphical elements, that is, make every element count; (3) judge the relative impact of the graphical elements and ensure that they are in balance; (4) use large font sizes for all text; (5) deviate from the R default settings – with a little effort, you can do a lot better.

This compendium does not discuss figure headings. However, we will say that the main message of a figure should ideally be understood without having to read the main text. If possible, start your figure heading by stating what the figure is meant to demonstrate (i.e., its interpretation). For example, do not state “Popularity as a function of president height”; instead, state “Taller presidents are more popular”.

Finally, a note on color. Many graphs look better in color, but there are two complications. First, some academic journals do not publish manuscripts in color, at least not without charging a hefty price. Second, many readers and reviewers do not have a color printer. Below, some graphs use color, whereas others use only grey scales. Of course, this is one of the easiest things to adjust.

Based on this compendium, learning to create good graphs in R will be 80% copy-paste and 20% tinkering. Let’s go plot ourselves some graphs!

3 Correlations

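Whenever a researcher reports a correlation, it is imperative to plot the data. Anscombe’s quartet (plotted below) is a famous demonstration of this fact: four small data sets with nearly identical summary statistics, including the correlation, that look entirely different once plotted.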
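
The quartet ships with base R as the data frame anscombe (columns x1 to x4 and y1 to y4), so it can be plotted directly. The sketch below is our own minimal base-R version, not the code behind the original figure; the layout and styling choices are assumptions.

```r
# Anscombe's quartet is built into base R as the data frame `anscombe`
data(anscombe)

op <- par(mfrow = c(2, 2), mar = c(4.5, 4.5, 1, 1), cex.lab = 1.4, las = 1, bty = "n")
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 21, bg = "grey", cex = 1.8,
       xlim = c(3, 20), ylim = c(2, 14),
       xlab = bquote(x[.(i)]), ylab = bquote(y[.(i)]))
  abline(lm(y ~ x), lwd = 2)
  # Print the correlation inside each panel; it is almost the same in all four
  text(4, 13, sprintf("r = %.3f", cor(x, y)), adj = 0, cex = 1.2)
}
par(op)

# Correlations of the four data sets, for comparison
r <- sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))
round(r, 2)
```

Identical-looking numbers, very different pictures: which is exactly why the plot is needed.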

3.1 The Electoral Advantage of Being Tall

This plot shows the relation between the height ratio of US presidents and the percentage of the popular vote. Note the large circles for the data, the thick line for the linear relation, and the large font size for the axis labels. Also, note that the line does not touch the y-axis (a subtlety that requires deviating from the default).
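
A minimal base-R sketch of the same design choices, using simulated data that are purely hypothetical (they are not the presidential data). The key deviation from the default is drawing the regression line only over the observed range of x with lines(), rather than with abline(), which would extend across the whole plot region.

```r
# Hypothetical, simulated data for illustration only (NOT the presidential data)
set.seed(1)
height_ratio <- runif(40, min = 0.90, max = 1.20)
vote <- 50 + 40 * (height_ratio - 1) + rnorm(40, sd = 5)
fit <- lm(vote ~ height_ratio)

op <- par(mar = c(5, 5.5, 2, 2), cex.lab = 1.6, cex.axis = 1.3, bty = "n")
# Empty plot first; axes are added explicitly so they can be styled separately
plot(height_ratio, vote, type = "n", axes = FALSE,
     xlim = c(0.85, 1.2), ylim = c(20, 80),
     xlab = "Presidential height ratio", ylab = "Popular vote (%)")
axis(1)
axis(2, las = 1)
# Fitted line over the observed x-range only, so it stops short of the y-axis
xr <- range(height_ratio)
lines(xr, predict(fit, newdata = data.frame(height_ratio = xr)), lwd = 4)
# Large filled circles, drawn last so they sit on top of the line
points(height_ratio, vote, pch = 21, bg = "grey", cex = 2.2, lwd = 1.5)
par(op)
```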

4 Histograms

Histograms are relatively straightforward to create and to interpret. In fact, some people may even find them boring. Luckily, it is easy to increase the reader’s interest level by adding information to the plot. Below we illustrate various ways by which this may be accomplished.

4.1 Including “rug” Tick Marks

When in doubt, add tick marks that showcase the individual data points. This is particularly useful when the number of data points is small. The code below is courtesy of Helen Steingroever. Note that the rug tick marks are jittered.
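
As a rough base-R sketch (not Helen Steingroever's original code), rug() draws one tick per observation and jitter() nudges tied values apart so that they remain visible as separate ticks. The data below are hypothetical.

```r
set.seed(2)
x <- round(rnorm(25, mean = 100, sd = 15))  # hypothetical small sample with ties

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
hist(x, col = "grey", border = "white", main = "",
     xlab = "Score", ylab = "Frequency")
# jitter() adds a small random offset (here at most 0.5) so tied values do not overlap
rug(jitter(x, amount = 0.5), ticksize = 0.05, lwd = 2)
par(op)
```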

4.2 Including a Density Estimator

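In R, it is easy to include a nonparametric density estimator. This requires setting freq = FALSE in the histogram command, so that the bars show densities rather than counts and are therefore on the same scale as the estimator. Courtesy of Helen Steingroever.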
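
A minimal sketch with hypothetical data: the histogram is computed first so that the y-axis can be set to fit both the bars and the kernel density estimate returned by density() (Gaussian kernel and default bandwidth).

```r
set.seed(3)
x <- rnorm(100)  # hypothetical data

h <- hist(x, plot = FALSE)
d <- density(x)  # Gaussian kernel density estimate with the default bandwidth
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
# freq = FALSE puts the bars on the density scale, matching density()
plot(h, freq = FALSE, col = "grey", border = "white", main = "",
     xlab = "Value", ylab = "Density", ylim = c(0, max(h$density, d$y)))
lines(d, lwd = 3)
par(op)
```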

4.3 Including Numbers on Top

This example shows how to display the bar heights, using the function l_ply from the plyr package. Courtesy of Helen Steingroever and Quentin Gronau.
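
The same effect can also be sketched in base R without plyr, because text() is vectorized: one call places every label at the bar midpoints (h$mids) just above the bar heights (h$counts). This is an alternative technique, not the original l_ply code; the data are hypothetical.

```r
set.seed(4)
x <- rnorm(60, mean = 5)  # hypothetical data

h <- hist(x, plot = FALSE)
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
# Extra headroom on the y-axis so the labels on the tallest bar fit
plot(h, col = "grey", border = "white", main = "", xlab = "Value",
     ylab = "Frequency", ylim = c(0, 1.15 * max(h$counts)))
# Label only non-empty bars; pos = 3 places each label just above its bar
keep <- h$counts > 0
text(h$mids[keep], h$counts[keep], labels = h$counts[keep], pos = 3, cex = 1.2)
par(op)
```

For a quick version, hist(x, labels = TRUE) also prints the counts on top of the bars, with less control over their appearance.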

5 Line Plots

The line plot is one of the most common plots. Nevertheless, many researchers fail to realize that line plots deserve love and attention too.

5.1 Regular Line Plot

This graph plots error bars with a user-defined function. More to the point, the lines are thick, and they do not overlap with the symbols (type = "c", which draws only the connecting lines of type = "b", leaving gaps around the points). Note that the legend is not needed; the legend text could simply have been positioned near the associated graphical elements.
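
A minimal sketch of these ideas, assuming made-up condition means and standard errors. The helper error_bars() is hypothetical (it is not the original user-defined function) and simply wraps arrows() with flat heads. The lines use type = "c" so they stop short of the symbols, and direct text labels replace the legend.

```r
# Hypothetical condition means and standard errors for two groups
conditions <- 1:4
mean_a <- c(520, 560, 610, 650)
mean_b <- c(500, 515, 540, 560)
se_a <- c(12, 15, 14, 18)
se_b <- c(10, 11, 13, 12)

# Hypothetical helper: vertical error bars with flat heads (angle = 90, both ends)
error_bars <- function(x, y, se, length = 0.05, ...) {
  arrows(x, y - se, x, y + se, angle = 90, code = 3, length = length, ...)
}

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(conditions, mean_a, type = "n", xlim = c(0.8, 4.2), ylim = c(450, 700),
     xaxt = "n", xlab = "Condition", ylab = "Mean RT (ms)")
axis(1, at = conditions)
error_bars(conditions, mean_a, se_a, lwd = 2)
error_bars(conditions, mean_b, se_b, lwd = 2)
# type = "c": connecting lines only, with gaps where the points go
lines(conditions, mean_a, type = "c", lwd = 3)
lines(conditions, mean_b, type = "c", lwd = 3, lty = 2)
points(conditions, mean_a, pch = 21, bg = "black", cex = 2)
points(conditions, mean_b, pch = 21, bg = "white", cex = 2)
# Direct labels next to the data instead of a legend
text(4, mean_a[4] + se_a[4], "Group A", pos = 3, cex = 1.3)
text(4, mean_b[4] - se_b[4], "Group B", pos = 1, cex = 1.3)
par(op)
```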

5.2 Box Plot

Similar to the above, this plot shows the distribution of the data with a user-defined boxplot function.
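
A rough sketch using base R's boxplot() rather than a user-defined function: an empty plot is set up first, and boxplot(..., add = TRUE, at = ...) places narrow boxes at the condition positions, with a thick line connecting the medians. The reaction times are simulated and hypothetical.

```r
set.seed(5)
# Hypothetical RTs for four conditions (30 observations each)
rt <- lapply(c(520, 560, 610, 650), function(m) rnorm(30, mean = m, sd = 60))

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(NA, xlim = c(0.5, 4.5), ylim = range(unlist(rt)), xaxt = "n",
     xlab = "Condition", ylab = "RT (ms)")
axis(1, at = 1:4)
# add = TRUE draws the boxes onto the existing plot at the positions in `at`
boxplot(rt, at = 1:4, add = TRUE, axes = FALSE, boxwex = 0.4,
        col = "grey90", lwd = 1.5, outline = FALSE)
medians <- sapply(rt, median)
lines(1:4, medians, lwd = 3)
par(op)
```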

5.3 Violin Plot

By now this plot should look familiar. The distribution of the data is now indicated with a violin plot instead of a box plot. Courtesy of Henrik Singmann, who tweaked the results from the vioplot package. Warning: this is a lot of code.
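
For readers who want to see the core idea without the package, here is a much shorter hand-rolled sketch: a violin is a kernel density estimate mirrored around a vertical position and drawn with polygon(). The function violin() is hypothetical and is not the vioplot API; the data are simulated.

```r
# Hypothetical helper (not from vioplot): mirrored kernel density drawn with polygon()
violin <- function(x, at, width = 0.4, col = "grey80", ...) {
  d <- density(x)
  w <- d$y / max(d$y) * width  # scale so that the widest point equals `width`
  # Left edge bottom-to-top, then right edge top-to-bottom
  polygon(c(at - w, rev(at + w)), c(d$x, rev(d$x)), col = col, ...)
  points(at, median(x), pch = 21, bg = "white", cex = 1.5)  # median as a white dot
}

set.seed(6)
rt <- list(rnorm(50, 520, 60), rnorm(50, 600, 80), 400 + rexp(50, 1 / 150))
ylim <- range(sapply(rt, function(x) range(density(x)$x)))

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(NA, xlim = c(0.5, 3.5), ylim = ylim, xaxt = "n",
     xlab = "Condition", ylab = "RT (ms)")
axis(1, at = 1:3)
for (i in seq_along(rt)) violin(rt[[i]], at = i, lwd = 2)
par(op)
```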

5.4 Combined Line and Bar Plot

In many psychological experiments, there are two dependent variables for each participant: mean response time (RT) and mean proportion of errors. This plot shows them both – RTs are on the left y-axis, and errors are on the right y-axis.
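
A minimal sketch of the two-axis idea, assuming hypothetical condition means. The error proportions are drawn as bars against the right axis; then par(new = TRUE) overlays the RTs in a fresh coordinate system. Reusing the bar plot's x-range via par("usr") with xaxs = "i" keeps the RT points aligned with the bar midpoints.

```r
# Hypothetical condition means
conditions <- 1:4
rt  <- c(480, 520, 575, 640)      # mean RT (ms)
err <- c(0.03, 0.05, 0.08, 0.12)  # mean proportion of errors

op <- par(mar = c(5, 5.5, 2, 5.5), cex.lab = 1.4, cex.axis = 1.2, las = 1)
# Bars for the errors; barplot() returns the bar midpoints
mids <- as.vector(barplot(err, names.arg = conditions, ylim = c(0, 0.3),
                          col = "grey85", border = NA, axes = FALSE,
                          xlab = "Condition"))
axis(4)
mtext("Proportion of errors", side = 4, line = 3.5, cex = 1.4, las = 0)
# Overlay the RTs, reusing the exact x-range of the bar plot
xr <- par("usr")[1:2]
par(new = TRUE)
plot(mids, rt, type = "b", lwd = 3, pch = 21, bg = "black", cex = 2,
     xlim = xr, xaxs = "i", ylim = c(400, 700), axes = FALSE,
     xlab = "", ylab = "")
axis(2)
mtext("Mean RT (ms)", side = 2, line = 3.5, cex = 1.4, las = 0)
par(op)
```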

6 Bar Plots

Like their histogram cousin, bar plots are intrinsically boring.

6.1 Including Error Bars

The title says it all. Note that the error bars are added with the l_ply function. Courtesy of Helen Steingroever and Quentin Gronau.
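
A base-R sketch without plyr, assuming hypothetical means and standard errors: barplot() returns the bar midpoints, which give the x positions for vectorized arrows() calls with flat heads.

```r
# Hypothetical means and standard errors
means <- c(3.2, 4.1, 5.0)
ses   <- c(0.30, 0.25, 0.40)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
mids <- as.vector(barplot(means, names.arg = c("Low", "Medium", "High"),
                          ylim = c(0, 6), col = "grey80", border = NA,
                          xlab = "Condition", ylab = "Mean score",
                          cex.names = 1.3))
# One arrows() call draws all error bars; angle = 90 and code = 3 give flat caps
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.08, lwd = 2)
par(op)
```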

7 Densities

Densities are ubiquitous, particularly for those who have a predilection for Bayesian inference. As with the histogram and the bar plot, it is generally a good idea to add more information to the bare-bones plot.

7.1 Standard

This is a relatively standard plot. Note the thickness of the lines and the font size for the axis labels.
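
A minimal sketch of such a plot, using a standard normal density as a stand-in (the distribution and labels are assumptions): curve() draws the function, and thick lines and large axis labels come from lwd and cex.lab.

```r
op <- par(cex.lab = 1.6, cex.axis = 1.3, las = 1, bty = "n", mar = c(5, 5, 2, 2))
# curve() evaluates the expression in x at n points and returns them invisibly
cv <- curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4, n = 501, lwd = 4,
            xlab = "Effect size", ylab = "Density", ylim = c(0, 0.45))
par(op)
```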

7.2 With a Histogram on Top

This plot adds a histogram to the density plot, but without needlessly displaying the vertical histogram lines as well. In addition, the code defines the extent to which the lines are transparent, so that both the density and the histogram remain visible, and one does not completely block the other from view.
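
One way to get this effect in base R, sketched with hypothetical samples: border = NA suppresses the bar outlines, and adjustcolor(..., alpha.f = ...) makes the fill semi-transparent so that the density line remains visible. Transparency requires a graphics device that supports it (e.g., pdf or png).

```r
set.seed(7)
samples <- rnorm(5000, mean = 0.3, sd = 0.15)  # hypothetical posterior draws

h <- hist(samples, breaks = 40, plot = FALSE)
d <- density(samples)
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
# border = NA: no bar outlines; alpha.f = 0.3: 70% transparent fill
plot(h, freq = FALSE, col = adjustcolor("grey40", alpha.f = 0.3), border = NA,
     main = "", xlab = "Parameter value", ylab = "Density",
     ylim = c(0, max(h$density, d$y)))
lines(d, lwd = 3, col = adjustcolor("black", alpha.f = 0.8))
par(op)
```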

7.3 Including Text

This example adds text to the plot. Although this is generally trivial, this particular example contains a mathematical symbol that is tricky to display properly (unless, of course, you know how it works).
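
The usual mechanism is R's plotmath: expression() renders symbols such as θ or a hat accent, and bquote() splices a computed number into the expression. The sketch below is illustrative only; the Beta(8, 4) density and the annotated mode are assumptions, not the original example.

```r
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
# Illustrative density: Beta(8, 4), whose mode is (8 - 1) / (8 + 4 - 2)
curve(dbeta(x, 8, 4), from = 0, to = 1, n = 501, lwd = 3,
      xlab = expression(theta), ylab = "Density")
theta_mode <- (8 - 1) / (8 + 4 - 2)
abline(v = theta_mode, lty = 2)
# bquote() inserts the computed value; hat(theta) is plotmath for a hatted theta
text(theta_mode, dbeta(theta_mode, 8, 4),
     bquote(hat(theta) == .(round(theta_mode, 2))), pos = 4, cex = 1.4)
par(op)
```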

7.4 Another Example

This is another example, featuring a nice Greek letter. Seriously, what is important here is that the labels are positioned next to the associated graphical elements. This approach is more direct than a legend, which forces the reader to decode the legend first, keep the symbols in working memory, and only then turn attention to the graph itself. Bottom line: only use legends when you have to. Even then, you may find that the legend box almost never fulfills a useful function and can safely be omitted.
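
A minimal sketch of direct labeling, assuming a hypothetical normal prior and posterior for an effect size δ: each label is placed with text() right next to the curve it describes, and no legend() call is needed.

```r
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
# Hypothetical prior (dotted) and posterior (solid) for an effect size delta
curve(dnorm(x, 0, 1), from = -3, to = 3, n = 501, lwd = 3, lty = 3,
      ylim = c(0, 1.2), xlab = expression("Effect size" ~ delta), ylab = "Density")
curve(dnorm(x, 0.6, 0.35), add = TRUE, n = 501, lwd = 3)
# Labels sit beside the curves they describe, so the reader needs no legend
text(-1.6, dnorm(-1.6, 0, 1) + 0.08, "Prior", cex = 1.3)
text(1.15, dnorm(1.15, 0.6, 0.35) + 0.1, "Posterior", cex = 1.3, pos = 4)
par(op)
```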

7.5 Highlighting Specific Areas

It is cool to be able to highlight specific parts of a density by some color coding scheme. In this example, Ravi Selker shows how that can be done (hint: it’s the polygon function).
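
A minimal polygon() sketch (not Ravi Selker's code): the region under a standard normal beyond ±1.96 is filled by tracing the curve and closing the shape along the x-axis. The helper shade() is hypothetical.

```r
# Hypothetical helper: fill the area under dnorm() between a and b
shade <- function(a, b, col = adjustcolor("grey40", alpha.f = 0.5)) {
  xs <- seq(a, b, length.out = 200)
  # Start and end on the x-axis so the polygon closes along y = 0
  polygon(c(a, xs, b), c(0, dnorm(xs), 0), col = col, border = NA)
}

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
curve(dnorm(x), from = -4, to = 4, n = 501, lwd = 3, xlab = "z", ylab = "Density")
shade(-4, -1.96)
shade(1.96, 4)
curve(dnorm(x), from = -4, to = 4, n = 501, lwd = 3, add = TRUE)  # redraw on top
par(op)

tail_area <- 2 * pnorm(-1.96)  # total shaded probability, about 0.05
```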

7.6 More Highlighting of Specific Areas

Mijke Rhemtulla also likes to highlight specific parts of a density. This is the first plot in a series, taken from one of Mijke’s stats courses.

7.7 Still More Highlighting

Part 2…

7.8 Density Ratios

Part 3…
