Visualise your experimental data

“Naked” barplots conceal data distribution

Misuse of bar charts to represent data

Bar plots were designed for counts and proportions, yet they are still widely used to present continuous data, especially in biology and psychology. This is problematic because a bar shows only a summary statistic (typically the mean) plus an error bar, so it conceals how the data are distributed: many different datasets can produce the same bar graph.
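To make the last point concrete, here is a minimal sketch in base R. It assumes nothing beyond the argument above: the helper `rescale_to()`, the sample names, the seed, and the target values (mean 10, SD 5) are all made up for illustration. Three samples with very different shapes are rescaled to share the same mean and SEM, so a bar with an error bar would look the same for each of them.

```r
# Hypothetical helper (not part of the original post): linearly rescale a
# sample so that it has exactly the requested mean and standard deviation.
rescale_to <- function(x, target_mean, target_sd) {
  (x - mean(x)) / sd(x) * target_sd + target_mean
}

# Standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

set.seed(42)  # arbitrary seed, for reproducibility only
n <- 50

# Three samples with different shapes, forced to mean 10 and SD 5
samples <- list(
  symmetric = rescale_to(rnorm(n), 10, 5),
  skewed    = rescale_to(rexp(n), 10, 5),
  bimodal   = rescale_to(c(rnorm(n / 2, mean = -3), rnorm(n / 2, mean = 3)), 10, 5)
)

# Mean and SEM agree across the samples (up to floating-point error)
summary_stats <- sapply(samples, function(x) c(mean = mean(x), sem = sem(x)))
print(round(summary_stats, 3))

# Other summaries, such as the median, are not constrained by the rescaling
print(round(sapply(samples, median), 2))
```

Because the rescaling is linear, it changes only the location and spread of each sample, not its shape, which is exactly the information a bar plot hides.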

Five different distributions produce nearly identical barplots and error bars (SEM)

The picture below shows five bars representing five different datasets. The bars all look the same; even the error bars, which display the SEM, are similar.

Comparison of distributions with similar means and standard errors

R code to generate normal, uniform, exponential, gamma, and bimodal distributions and to plot them as bars

library(reshape2)
library(ggplot2)

# Create five datasets with similar means and standard errors but different distributions
set.seed(123)
n <- 200
mu <- 10
sigma <- 5

# Normal distribution
data1 <- rnorm(n/4, mean = mu, sd = sigma*2)
# Uniform distribution
data2 <- runif(n/2, min = mu - sqrt(3) * sigma*2, max = mu + sqrt(3) * sigma*2)
# Exponential distribution
data3 <- rexp(n, rate = 1/mu)
# Gamma distribution
data4 <- rgamma(n, shape = 6, rate = 0.555)
# Bimodal distribution
data5up <- c(rnorm(n/4, mean = mu + 6.5, sd = 1))
data5down <- c(rnorm(n/4, mean = mu - 6, sd = 1))
data5 <- c(data5up, data5down)

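# Combine into one wide data frame. Note that cbind() recycles the shorter
# vectors (50 and 100 values) up to 200 rows, so those samples are repeated.
datID <- as.data.frame(cbind(data1, data2, data3, data4, data5))
colnames(datID) <- c("Normal", "Uniform", "Exponential", "Gamma", "Bimodal")
datID$id <- seq_len(nrow(datID))
# Reshape to long format: one row per (id, distribution) pair
datIDmelt <- melt(datID, id.vars = "id")
colnames(datIDmelt) <- c("id", "distribution", "value")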

ggplot(datIDmelt, aes(x = distribution, y = value, fill = distribution)) +
  # Bar height is the mean of each distribution
  stat_summary(fun = mean, geom = "bar") +
  # Error bars show mean +/- 1 SEM (mean_se is built into ggplot2)
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  # Raw data points; remove this layer to get the "naked" barplot
  geom_point() +
  labs(title = "Comparison of Distributions with Similar Means and Standard Errors") +
  theme_minimal()

I created an interactive web tool, ScatterPlot.Bar, that combines dotplots with bar charts, crossbars, violin plots, or boxplots and does not require coding. The following graph, which shows the individual data points of the five distributions, was made with that web app.

Dotplot with crossbar representing 5 distributions
Dot plot layer added to a bar plot representing five different distributions with SD error bars

Journals such as the Journal of Biological Chemistry, PLOS Biology, eLife, and Nature have taken steps to address this issue by implementing new guidelines that encourage or require authors to choose figures that show the data distribution.

To sum up, barplots without dotplots can conceal important information about the data distribution. Yet a jittered dotplot alone makes it hard to read off summary statistics such as the mean and the spread. Perhaps the most useful and trustworthy representation is a crossbar (or point) with a range, overlaid with jittered data points.
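For readers who prefer code to the web app, the recommended display can be sketched in ggplot2 roughly as follows. This is an illustration under my own assumptions rather than the exact figure above: the helper `mean_sd_range()`, the demo data, and the choice of a mean ± 1 SD range are hypothetical, and any other range (SEM, confidence interval) could be swapped in.

```r
library(ggplot2)

# Hypothetical helper (not part of the original post): mean with a +/- 1 SD
# range, in the y / ymin / ymax format that stat_summary(fun.data = ...) expects.
mean_sd_range <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

set.seed(1)  # arbitrary seed, for reproducibility only

# Small demo dataset: one unimodal and one bimodal sample, both centred near 10
demo <- data.frame(
  distribution = rep(c("Normal", "Bimodal"), each = 60),
  value = c(rnorm(60, mean = 10, sd = 5),
            rnorm(30, mean = 4, sd = 1), rnorm(30, mean = 16, sd = 1))
)

p <- ggplot(demo, aes(x = distribution, y = value, colour = distribution)) +
  # Jitter horizontally only, so the plotted y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  # Crossbar: middle line at the mean, box edges at mean +/- 1 SD
  stat_summary(fun.data = mean_sd_range, geom = "crossbar",
               width = 0.5, colour = "black") +
  theme_minimal()
print(p)
```

The crossbar carries the summary statistics, while the jittered points keep the shape of each distribution visible; setting `height = 0` matters because vertical jitter would misplace the actual values.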

Posted February 28, 2023 by Maxim Bespalov
