When done right, graphs can be appealing, informative, and of considerable value to an academic article. Unfortunately, researchers generally suck at making good graphs. We surmise that this is because researchers do not completely master their graphing software, and they are either too lazy or too busy to remedy the situation. Consequently, the produced graph is often a severe distortion of the ideal Platonic graph that the researcher had in mind initially.
This compendium facilitates the creation of good graphs by presenting a set of concrete examples, ranging from the trivial to the advanced. The graphs can all be reproduced and adjusted by copy-pasting code into the R console. A note for R fans: the majority of our plots have been created in base R, but you will encounter some examples in ggplot.
Almost every example in this compendium is driven by the same philosophy: A good graph is a simple graph, in the Einsteinian sense that a graph should be made as simple as possible, but not simpler.
We close with a request and a piece of advice. The request: if you create a clean graph in R that you believe is a candidate for inclusion in this compendium, please do not hesitate to contact us at EJ.Wagenmakers@gmail.com or firstname.lastname@example.org. Your contribution will be acknowledged explicitly, alongside the code you provided. The advice: when you create a clean graph in R, put it on Flickr (public license) before you sign away your copyright to a publisher. For an example, see Figure 1 from this paper.
This work has profited greatly from interactions with our colleagues, many of whom have contributed graphs of their own.
Producing clean graphs can be a challenging task. First you have to consider the best way to convey the information: a line graph, a histogram, a multi-panel plot. Such conceptual dilemmas are not dealt with in this compendium; instead we refer the reader to the chapters on creating graphs in the excellent book by Briscoe (1996). Second, you have to use computer software to translate the conceptual graph to a publication-ready figure. This is the phase where this compendium may be useful, because it brings together R code for producing a set of clean, publication-ready figures. Hopefully this will make it easy to copy-paste and adjust the code to suit your own needs.
In our experience, many graphs can be dramatically improved by adhering to the following guidelines: (1) invest sufficient time and effort in the process; (2) omit needless graphical elements, that is, make every element count; (3) judge the relative impact of the graphical elements and ensure that they are in balance; (4) use large font sizes for all text; (5) deviate from the R default settings – with a little effort, you can do a lot better.
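As a rough illustration of guidelines (4) and (5), the sketch below collects a few non-default settings in one list and applies them with par(). The list name clean and all of its values are our own illustrative choices, not settings prescribed by the compendium, and the data are simulated.

```r
# Hypothetical "clean" settings; the name and the values are illustrative only.
clean <- list(cex.lab = 1.5, cex.axis = 1.3, font.lab = 2,  # larger, bold labels
              las = 1, bty = "n", lwd = 2,                   # horizontal ticks, no box
              mar = c(5, 5, 2, 2))
op <- par(clean)                      # apply; op remembers the previous values

set.seed(1)
x <- rnorm(30)                        # simulated data
y <- x + rnorm(30)
plot(x, y, pch = 21, bg = "grey", cex = 2,
     xlab = "Predictor", ylab = "Outcome")

par(op)                               # restore the previous settings
```

Saving the return value of par() and restoring it afterwards keeps these choices from leaking into later plots.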
This compendium does not discuss figure headings. However, we will say that it is clearly desirable to have the main message of a figure be understood without being forced to read the main text. If possible, start your figure heading by stating what the figure is meant to demonstrate (i.e., its interpretation). For example, do not state “Popularity as a function of president height”; instead, state “Taller presidents are more popular”.
Finally, a note on color. Many graphs look better in color, but there are two complications. First, some academic journals do not publish manuscripts in color, at least not without charging a hefty price. Second, many readers and reviewers do not have a color printer. Below, some graphs have color, whereas others only use grey-scales. Of course this is one of the easiest things to adjust.
Based on this compendium, learning to create good graphs in R will be 80% copy-paste and 20% tinkering. Let’s go plot ourselves some graphs!
Whenever a researcher reports a correlation, it is imperative to plot the data. Anscombe’s quartet (plotted below) is a famous demonstration of this fact.
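The point can be checked directly, because the quartet ships with base R as the data frame anscombe (columns x1 to x4 and y1 to y4). The sketch below is one possible way to plot it, not the compendium's own code; the panel layout and styling are assumptions.

```r
# Correlation in each of the four data sets (all are close to 0.82)
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(r, 3))

op <- par(mfrow = c(2, 2), mar = c(4.5, 4.5, 1, 1),
          cex.lab = 1.4, las = 1, bty = "n")
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 21, bg = "grey", cex = 1.8,
       xlim = c(3, 20), ylim = c(2, 14),
       xlab = paste0("x", i), ylab = paste0("y", i))
  fit <- lm(y ~ x)
  xr <- range(x)                      # draw the line over the observed range only
  lines(xr, predict(fit, data.frame(x = xr)), lwd = 3)
}
par(op)
```

The four correlations are nearly identical, yet the panels look nothing alike.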
This plot shows the relation between the height ratio of US presidents and the percentage of the popular vote. Note the large circles for the data, the thick line for the linear relation, and the large font size for the axis labels. Also, note that the line does not touch the y-axis (a subtlety that requires deviating from the default).
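One way to keep a regression line away from the axes is to draw it with lines() over the observed x-range instead of abline(), which spans the whole plot region. The sketch below uses simulated values, not the presidential data, and the variable names are our own.

```r
set.seed(42)
ratio <- runif(40, 0.9, 1.2)                        # simulated, NOT the real data
vote  <- 50 + 40 * (ratio - 1.05) + rnorm(40, sd = 4)
fit   <- lm(vote ~ ratio)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n",
          mar = c(5, 5, 1, 1))
plot(ratio, vote, pch = 21, bg = "grey", cex = 2,
     xlab = "Height ratio", ylab = "Popular vote (%)")
# abline(fit) would extend to the edges of the plot region;
# a segment over the observed range stays clear of the y-axis.
xr <- range(ratio)
lines(xr, predict(fit, data.frame(ratio = xr)), lwd = 4)
par(op)
```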
Histograms are relatively straightforward to create and to interpret. In fact, some people may even find them boring. Luckily, it is easy to increase the reader’s interest level by adding information to the plot. Below we illustrate various ways by which this may be accomplished.
When in doubt, add tick marks that showcase the individual data points. This is particularly useful when the number of data points is small. The code below is courtesy of Helen Steingroever. Note that the rug tick marks are jittered.
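A minimal version of the idea, using base R's rug() and jitter() on simulated scores; this is a sketch and not the contributed code itself.

```r
set.seed(3)
x <- round(rnorm(25, mean = 100, sd = 15))   # simulated scores, rounded so ties occur

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, mar = c(5, 5, 1, 1))
hist(x, main = "", xlab = "Score", col = "grey", border = "white")
xj <- jitter(x)                              # small random noise separates tied values
rug(xj, lwd = 2)                             # one tick per observation on the x-axis
par(op)
```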
In R, it is easy to include a nonparametric density estimator. This requires that freq = FALSE in the histogram command. Courtesy of Helen Steingroever.
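The reason for freq = FALSE is that the bars must be on the density scale (total area 1) for the kernel density estimate to be comparable. A short sketch with simulated data:

```r
set.seed(4)
x <- rnorm(200)                       # simulated data
d <- density(x)                       # kernel density estimate
h <- hist(x, plot = FALSE)            # computed first, only to find a suitable ylim

hist(x, freq = FALSE, main = "", xlab = "x",
     col = "grey", border = "white",
     ylim = c(0, max(h$density, d$y)),
     cex.lab = 1.5, cex.axis = 1.3, las = 1)
lines(d, lwd = 3)                     # overlay the density estimate
```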
This example shows how to display the bar heights, using the function l_ply. Courtesy of Helen Steingroever and Quentin Gronau.
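For readers who prefer not to depend on an extra package, the same effect can be sketched with base R's vectorized text(). Note that this swaps l_ply for text(); it is an alternative, not the contributed code. The data are simulated.

```r
set.seed(5)
x  <- rnorm(60)                                # simulated data
h0 <- hist(x, plot = FALSE)                    # counts are needed to leave headroom

h <- hist(x, main = "", xlab = "x", col = "grey", border = "white",
          ylim = c(0, 1.15 * max(h0$counts)),
          cex.lab = 1.5, cex.axis = 1.3, las = 1)
keep <- h$counts > 0                           # skip empty bins
text(h$mids[keep], h$counts[keep], labels = h$counts[keep],
     pos = 3, cex = 1.2)                       # pos = 3 places the label above the bar
```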
The line plot is one of the most standard plots. Nevertheless, many researchers fail to realize that line plots deserve love and attention too.
This graph plots error bars with a user-defined function. More to the point, the lines are thick, and they do not overlap with the symbols (type = "c"). Note that the legend is not needed; the legend text could simply have been positioned near the associated graphical elements.
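A compact sketch of both ingredients follows. The helper error_bars is hypothetical (its name and arguments are ours), and the means and standard errors are made up for illustration.

```r
# Hypothetical helper: vertical error bars drawn as flat-headed arrows.
error_bars <- function(x, m, se, ...) {
  arrows(x, m - se, x, m + se, angle = 90, code = 3, length = 0.05, ...)
}

x  <- 1:4
m  <- c(0.52, 0.61, 0.70, 0.74)       # made-up condition means
se <- c(0.04, 0.03, 0.05, 0.04)       # made-up standard errors

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n",
          mar = c(5, 5, 1, 1))
# type = "c" draws only the connecting lines, leaving gaps at the points
plot(x, m, type = "c", lwd = 3, xaxt = "n",
     ylim = range(m - se, m + se),
     xlab = "Condition", ylab = "Mean")
axis(1, at = x)
error_bars(x, m, se, lwd = 2)
points(x, m, pch = 21, bg = "grey", cex = 2)   # symbols drawn last, on top
par(op)
```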
Similar to the above, this plot shows the distribution of the data with a user-defined boxplot function.
By now this plot should look familiar. The distribution of the data is now indicated with a violin plot instead of a box plot. Courtesy of Henrik Singmann, who tweaked the results from the vioplot package. Warning: this is a lot of code.
In many psychological experiments, there are two dependent variables for each participant: mean response time (RT) and mean proportion of errors. This plot shows them both – RTs are on the left y-axis, and errors are on the right y-axis.
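A common base-R recipe for two y-axes is to overlay a second plot with par(new = TRUE) and draw its axis on side 4. The sketch below uses made-up numbers and is one possible approach, not necessarily the one used for the figure.

```r
cond <- 1:3
rt   <- c(620, 580, 540)              # made-up mean RTs (ms)
err  <- c(0.04, 0.07, 0.12)           # made-up error proportions

op <- par(mar = c(5, 5, 1, 5), cex.lab = 1.4, cex.axis = 1.2,
          las = 1, bty = "n")
plot(cond, rt, type = "b", lwd = 3, pch = 19, cex = 1.5, xaxt = "n",
     ylim = c(500, 650), xlab = "Condition", ylab = "Mean RT (ms)")
axis(1, at = cond)

par(new = TRUE)                       # next plot is drawn over the current one
plot(cond, err, type = "b", lwd = 3, lty = 2, pch = 1, cex = 1.5,
     axes = FALSE, xlab = "", ylab = "", ylim = c(0, 0.15))
axis(4)                               # right-hand axis for the error proportions
mtext("Proportion of errors", side = 4, line = 3.5, cex = 1.4, las = 0)
par(op)
```

The wider right margin in mar leaves room for the second axis label.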
Like their histogram cousin, bar plots are intrinsically boring.
The title says it all. Note that the error bars are added with the l_ply function. Courtesy of Helen Steingroever and Quentin Gronau.
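As a package-free alternative to l_ply, error bars can also be added with a single vectorized call to arrows(), using the bar midpoints that barplot() returns. The means and standard errors below are made up, and this sketch is not the contributed code.

```r
m  <- c(A = 3.1, B = 4.6, C = 2.4)    # made-up group means
se <- c(0.4, 0.5, 0.3)                # made-up standard errors

# barplot() returns the x-coordinates of the bar midpoints
mids <- barplot(m, ylim = c(0, 6), col = "grey", border = NA,
                ylab = "Mean rating",
                cex.lab = 1.5, cex.names = 1.3, cex.axis = 1.3, las = 1)
arrows(mids, m - se, mids, m + se,
       angle = 90, code = 3, length = 0.08, lwd = 2)
```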
Densities are ubiquitous, particularly for those who have a predilection for Bayesian inference. As for the histogram and the bar plot, it is generally a good idea to add more information to the bare-bones plot.
This is a relatively standard plot. Note the thickness of the lines and the font size for the axis labels.
This plot adds a histogram to the density plot, but without needlessly displaying the vertical histogram lines as well. In addition, the code defines the extent to which the lines are transparent, so that both the density and the histogram remain visible, and one does not completely block the other from view.
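One possible way to get this effect, sketched under our own assumptions: draw only the outline of the histogram as a step line, and make both colors semi-transparent with adjustcolor(). The data, colors, and alpha values are illustrative.

```r
set.seed(6)
x <- rgamma(300, shape = 3)                    # simulated data
h <- hist(x, plot = FALSE)
d <- density(x)

col_h <- adjustcolor("black", alpha.f = 0.4)   # semi-transparent histogram outline
col_d <- adjustcolor("red",   alpha.f = 0.6)   # semi-transparent density line

plot(d, main = "", xlab = "x", ylab = "Density", lwd = 3, col = col_d,
     ylim = c(0, max(h$density, d$y)),
     bty = "n", las = 1, cex.lab = 1.5, cex.axis = 1.3)
# Step outline: padded with zeros so the outline starts and ends on the x-axis;
# interior verticals appear only where the bar height changes.
ox <- c(h$breaks[1], h$breaks)
oy <- c(0, h$density, 0)
lines(ox, oy, type = "s", lwd = 3, col = col_h)
```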
This plot adds text to the plot. Although this is generally trivial, this particular example contains a mathematical symbol that is tricky to display properly (unless, of course, you know how it works).
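In base R, mathematical annotation goes through plotmath: expression() for fixed symbols, and bquote() when a computed value has to appear inside the formula. The sketch below shows both; the particular symbols are our own choice and may differ from those in the figure.

```r
s   <- 1.5                                     # illustrative standard deviation
lab <- bquote(sigma^2 == .(s^2))               # .() inserts the computed value

curve(dnorm(x, mean = 0, sd = s), from = -5, to = 5, lwd = 3,
      bty = "n", las = 1, cex.lab = 1.5, cex.axis = 1.3,
      xlab = expression(theta), ylab = "Density")
text(1.5, 0.22, lab, cex = 1.5, adj = 0)       # left-aligned at x = 1.5
```

Note the double equals sign: in plotmath, == is rendered as a single "=".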
This is another example, featuring a nice Greek letter. Seriously, what is important here is that the labels are positioned next to the associated graphical element. This approach is more direct than creating a legend, when the reader has to decode the legend first, keep the symbols in working memory, and then turn attention to the graph itself. Bottom line: only use legends when you have to. Even then, you may find that the legend box almost never fulfills a useful function, and can safely be omitted.
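To make the contrast with a legend concrete, here is a small sketch in which each curve is labeled directly above its peak with text(). The two normal densities and the label positions are illustrative assumptions.

```r
mus  <- c(0, 2)                                # illustrative means
peak <- dnorm(0)                               # height of a standard normal at its mode

curve(dnorm(x, mus[1], 1), from = -4, to = 6, lwd = 3,
      ylim = c(0, 0.48), bty = "n", las = 1,
      cex.lab = 1.5, cex.axis = 1.3, xlab = "x", ylab = "Density")
curve(dnorm(x, mus[2], 1), add = TRUE, lwd = 3, lty = 2)

# Labels sit just above the curve they describe; no legend is needed.
text(mus[1], peak + 0.03, expression(mu == 0), cex = 1.5)
text(mus[2], peak + 0.03, expression(mu == 2), cex = 1.5)
```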
It is cool to be able to highlight specific parts of a density by some color coding scheme. In this example, Ravi Selker shows how that can be done (hint: it’s the
Mijke Rhemtulla also likes to highlight specific parts of a density. This is the first plot in a series, taken from one of Mijke’s stats courses.
Part 2…
Part 3…
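A common base-R way to shade part of a density is polygon(). We do not know whether the examples above use it, so the following is only a sketch of that one approach, shading the upper tail of a standard normal beyond 1.96.

```r
xs <- seq(-4, 4, length.out = 400)
ys <- dnorm(xs)
cut  <- 1.96                                   # illustrative cutoff
keep <- xs >= cut                              # region to highlight

plot(xs, ys, type = "n", bty = "n", las = 1,
     cex.lab = 1.5, cex.axis = 1.3, xlab = "z", ylab = "Density")
# Close the shape along the x-axis: start at (left edge, 0), follow the curve,
# and end at (right edge, 0).
polygon(c(xs[keep][1], xs[keep], max(xs)),
        c(0, ys[keep], 0),
        col = "grey", border = NA)
lines(xs, ys, lwd = 3)                         # curve drawn last, on top of the shading

tail_area <- pnorm(cut, lower.tail = FALSE)    # area of the shaded region
```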