“Naked” barplots conceal data distribution

Misuse of bar charts to represent data

Bar plots were designed for counts and proportions, yet they are still widely accepted for presenting continuous data, especially in biology and psychology. This is problematic because a bar that shows only a mean and an error bar conceals the shape of the underlying distribution: many different datasets can produce the same bar graph.

Five different distributions produce nearly identical barplots and error bars (SEM)

The picture below shows five bars representing five different datasets. They all look the same; even the error bars, which display the SEM, are similar.

Comparison of distributions with similar means and standard errors

R code to generate normal, uniform, exponential, gamma, and bimodal distributions and to plot them as bars

library(reshape2)
library(ggplot2)

# Create five datasets with similar means and standard errors but different distributions
set.seed(123)
n <- 200
mu <- 10
sigma <- 5

# Normal distribution
data1 <- rnorm(n/4, mean = mu, sd = sigma*2)
# Uniform distribution
data2 <- runif(n/2, min = mu - sqrt(3) * sigma*2, max = mu + sqrt(3) * sigma*2)
# Exponential distribution
data3 <- rexp(n, rate = 1/mu)
# Gamma distribution
data4 <- rgamma(n, shape = 6, rate = 0.555)
# Bimodal distribution
data5up <- c(rnorm(n/4, mean = mu + 6.5, sd = 1))
data5down <- c(rnorm(n/4, mean = mu - 6, sd = 1))
data5 <- c(data5up, data5down)

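# The vectors have different lengths (50, 100, 200, 200, 100);
# cbind() recycles the shorter ones to 200 rows, so some values repeat
data <- cbind(data1, data2, data3, data4, data5)
datID <- as.data.frame(data)
colnames(datID) <- c("Normal", "Uniform", "Exponential", "Gamma", "Bimodal")
datID$id <- seq_len(nrow(datID))
# Reshape to long format: one row per (id, distribution) pair
datIDmelt <- melt(datID, id.vars = "id")
colnames(datIDmelt) <- c("id", "distribution", "value")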

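ggplot(datIDmelt, aes(x = distribution, y = value, fill = distribution)) +
  # Bar height is the group mean
  stat_summary(fun = mean, geom = "bar") +
  # mean_sdl with mult = 1 draws mean +/- 1 SD and needs the Hmisc package;
  # use fun.data = mean_se (no fun.args) for SEM error bars instead
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  # Overlay the raw values; drop this layer to get the "naked" bar plot
  geom_point() +
  labs(title = "Comparison of Distributions with Similar Means and Standard Errors") +
  theme_minimal()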
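
To see the numbers behind each bar, it can help to tabulate the mean, SD, and SEM per distribution. The following is a minimal base R sketch, not the code used for the figures: it assumes the same sample size for every group, and the names sd_target, sem, and summary_table are made up for illustration.

```r
# Sketch: numeric summary behind each bar (base R only).
set.seed(123)
n <- 200          # assumed: identical sample size for every group
mu <- 10
sd_target <- 10   # assumed: common SD for the normal and uniform samples

samples <- list(
  Normal      = rnorm(n, mean = mu, sd = sd_target),
  Uniform     = runif(n, min = mu - sqrt(3) * sd_target,
                      max = mu + sqrt(3) * sd_target),
  Exponential = rexp(n, rate = 1 / mu),
  Gamma       = rgamma(n, shape = 6, rate = 0.555),
  Bimodal     = c(rnorm(n / 2, mean = mu + 6.5, sd = 1),
                  rnorm(n / 2, mean = mu - 6, sd = 1))
)

# Standard error of the mean: SD divided by the square root of n
sem <- function(x) sd(x) / sqrt(length(x))

summary_table <- data.frame(
  distribution = names(samples),
  n    = vapply(samples, length, integer(1)),
  mean = vapply(samples, mean, numeric(1)),
  sd   = vapply(samples, sd, numeric(1)),
  sem  = vapply(samples, sem, numeric(1)),
  row.names = NULL
)
print(summary_table, digits = 3)
```

The table makes explicit what a bar with an error bar encodes: two numbers per group. Everything else about the shape of the data is lost unless the raw points are shown.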

I created an interactive web tool, ScatterPlot.Bar, that combines dot plots with bar charts, crossbars, violin plots, or boxplots and requires no coding. The following graph, which shows dot plots of five different types of data distributions, was made with that web app.

Dotplot with crossbar representing 5 distributions

Dot plot layer added to a bar plot representing five different distributions with SD error bars

Journals like the Journal of Biological Chemistry, PLOS Biology, eLife, and Nature, among others, have taken steps to address this issue by implementing new guidelines that encourage or require authors to select figures that show the data distribution.

To sum up, barplots without dotplots can conceal important information about the data distribution. Yet a jitter display alone makes it difficult to read off the summary statistics of the distributions. Perhaps the most useful and trustworthy representation is a crossbar (or point) with a range, overlaid with jittered data points.
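
For readers who prefer code, a crossbar with a range drawn over jittered points could be sketched in ggplot2 roughly as follows. This is an illustration under assumptions, not the web app's implementation: the two example groups, their parameters, and the helper mean_sd_range are hypothetical, and the range shown is mean +/- 1 SD.

```r
library(ggplot2)

set.seed(123)
n <- 100  # assumed sample size per group
df <- data.frame(
  distribution = rep(c("Normal", "Bimodal"), each = n),
  value = c(rnorm(n, mean = 10, sd = 6),
            rnorm(n / 2, mean = 16.5, sd = 1),
            rnorm(n / 2, mean = 4, sd = 1))
)

# Hypothetical helper: returns the mean and the mean +/- 1 SD range
# in the y / ymin / ymax format that stat_summary(fun.data = ...) expects
mean_sd_range <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(df, aes(x = distribution, y = value, colour = distribution)) +
  # Raw data, jittered horizontally only so the y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.5) +
  # Crossbar: middle line at the mean, box spanning mean +/- 1 SD
  stat_summary(fun.data = mean_sd_range, geom = "crossbar",
               width = 0.5, colour = "black") +
  theme_minimal() +
  theme(legend.position = "none")
print(p)
```

Because the crossbar is unfilled by default, the points remain visible underneath it, so the summary statistic and the shape of the data can be read from the same panel.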

February 28, 2023

Maxim Bespalov
