Misuse of bar charts to represent data
Although bar plots were designed for counts and proportions, they are still widely used to present continuous data, particularly in biology and psychology. This is problematic because a bar graph shows only a summary value (typically the mean) and an error bar, so it can conceal differences between the underlying distributions: many different datasets can produce the same bar graph.
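To make the last point concrete, here is a minimal base R sketch. It is not taken from the figures in this post: the helper `rescale()`, the sample names, and all parameter values are assumptions chosen for illustration. It forces a unimodal and a bimodal sample to share the same mean and SEM, so a bar chart would draw them identically even though their shapes differ.

```r
set.seed(1)
n <- 100

# Hypothetical helper: shift and scale x so its sample mean is m and sample SD is s
rescale <- function(x, m, s) (x - mean(x)) / sd(x) * s + m

# Standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# One unimodal and one bimodal sample, both forced to mean 10 and SD 5
unimodal <- rescale(rnorm(n), m = 10, s = 5)
bimodal  <- rescale(c(rnorm(n / 2, mean = -3), rnorm(n / 2, mean = 3)), m = 10, s = 5)

# What a bar chart would display: bar height (mean) and error bar (SEM)
summary_table <- data.frame(
  sample = c("unimodal", "bimodal"),
  mean   = c(mean(unimodal), mean(bimodal)),
  sem    = c(sem(unimodal), sem(bimodal))
)
print(summary_table)

# What a bar chart hides: the share of values lying within 2 units of the mean
near_mean <- c(
  unimodal = mean(abs(unimodal - 10) < 2),
  bimodal  = mean(abs(bimodal - 10) < 2)
)
print(near_mean)
```

The two rows of `summary_table` agree, while `near_mean` shows that the bimodal sample has far fewer observations close to the value the bar reports.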
Five different distributions produce nearly identical barplots and error bars (SEM)
The picture below shows five bars representing five different datasets. They all look alike; even the error bars displaying the SEM are similar.

R code to generate normal, uniform, exponential, gamma, and bimodal distributions and to plot them as bars
library(ggplot2)
# Create five datasets with similar means but different distributions
set.seed(123)
n <- 200
mu <- 10
sigma <- 5
# Normal distribution (mean mu, SD 2 * sigma)
data1 <- rnorm(n/4, mean = mu, sd = sigma*2)
# Uniform distribution (mean mu, SD 2 * sigma)
data2 <- runif(n/2, min = mu - sqrt(3) * sigma*2, max = mu + sqrt(3) * sigma*2)
# Exponential distribution (mean mu)
data3 <- rexp(n, rate = 1/mu)
# Gamma distribution (mean = shape / rate, about 10.8)
data4 <- rgamma(n, shape = 6, rate = 0.555)
# Bimodal distribution: two normal clusters, one above and one below mu
data5up <- rnorm(n/4, mean = mu + 6.5, sd = 1)
data5down <- rnorm(n/4, mean = mu - 6, sd = 1)
data5 <- c(data5up, data5down)
# The samples differ in length, so stack them into long format directly;
# cbind() would silently recycle the shorter vectors and duplicate points
samples <- list(Normal = data1, Uniform = data2, Exponential = data3,
                Gamma = data4, Bimodal = data5)
datIDmelt <- stack(samples)
colnames(datIDmelt) <- c("value", "distribution")
ggplot(datIDmelt, aes(x = distribution, y = value, fill = distribution)) +
  stat_summary(fun = mean, geom = "bar") +
  # mean_se draws mean +/- 1 SEM; mean_sdl would draw the SD and requires Hmisc
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  # Overlay the raw values; drop this layer for a bars-only figure
  geom_point() +
  labs(title = "Comparison of Distributions with Similar Means and Standard Errors") +
  theme_minimal()
I created an interactive web tool, ScatterPlot.Bar, that combines dotplots with bar charts, crossbars, violin plots, or boxplots and does not require coding. The following graph, which shows scatterplots of the five different types of data distributions, was made with that web app.


Journals including the Journal of Biological Chemistry, PLOS Biology, eLife, and Nature have taken steps to address this issue by implementing new guidelines that encourage or require authors to select figures that show the data distribution.
To sum up, barplots without dotplots can conceal important information about the data distribution. Yet a jitter display alone makes it difficult to read off the summary statistics of the distributions. Perhaps the most useful and trustworthy representation is a crossbar (or point) with a range, overlaid with jittered data points.
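For readers who prefer code to the web app, a plot of this kind can be sketched in ggplot2 as follows. This is only one possible version under stated assumptions: the data frame `df`, its two groups, and the choice of mean ± SEM as the range are made up for illustration, and another summary (for example the median with quartiles) may suit skewed data better.

```r
library(ggplot2)

set.seed(42)
# Hypothetical example data: one roughly symmetric group and one skewed group
df <- data.frame(
  group = rep(c("A", "B"), each = 40),
  value = c(rnorm(40, mean = 10, sd = 2), rexp(40, rate = 1 / 10))
)

p <- ggplot(df, aes(x = group, y = value)) +
  # Raw observations, spread horizontally only so the y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  # Crossbar: middle line at the mean, box edges at mean +/- 1 SEM
  stat_summary(fun.data = mean_se, geom = "crossbar", width = 0.4, colour = "red") +
  theme_minimal()
print(p)
```

Setting `height = 0` in `geom_jitter()` keeps the plotted values faithful to the data, and replacing `geom = "crossbar"` with `geom = "pointrange"` gives the point-with-range variant.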
