Visualise your experimental data

Combine Scatter Plots With Bar Plots or Box Charts.

If you do not want to work with R directly, try my scatter plot with bar maker, a versatile R-based data visualisation website that combines scatter or dot plots with bar charts, box plots and violin plots.

updated 5th Sept, 2023

Basic Graph Types in R and how to combine them

Introduction

R is a popular programming language for data analysis and visualization. It provides a wide range of libraries and tools for creating different types of visualizations. This article focuses on boxplots and bar plots, and on how to combine them with scatterplots or dot plots.

Box plots

A box plot is a graph that summarises the distribution of a numeric variable through its median, quartiles and outliers.

Creating a Box Plot in R

data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)
boxplot(data)
Box plot in base R

The body of the boxplot spans from the first quartile (Q1) to the third quartile (Q3). This is the interquartile range (IQR), which covers the middle 50% of the data. The single circle at the value 5 is an outlier, and the crossbar marks the median.
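The numbers behind the plot can be checked with boxplot.stats() from base R, which returns the five values that boxplot() draws together with any outliers. This is a minimal sketch on the same vector; the variable names s, iqr and lower_fence are only illustrative.

```r
data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)

# boxplot.stats() returns, in order:
# lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
s <- boxplot.stats(data)

s$stats   # 10 15 19 20 25
s$out     # 5, the point drawn as a circle

# Values further than 1.5 * IQR from the box are flagged as outliers
iqr <- s$stats[4] - s$stats[2]          # 20 - 15 = 5
lower_fence <- s$stats[2] - 1.5 * iqr   # 7.5, so the value 5 lies outside
```

Note that the lower whisker ends at 10, the smallest value that is still inside the fence, rather than at the fence itself.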

How to add scatterplot to boxplot in base R

data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)
boxplot(data)
points(x = rep(1, length(data)), y = data, col = "red")
Scatterplot combined with boxplot

However, it is not always optimal to display all points on one line. Several points of similar or identical value would overlap. Jittering can offset them slightly.

data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)
boxplot(data, outline = FALSE)
points(x = jitter(rep(1, length(data)), amount = 0.05), y = data, col = "red")
Dot plot with excluded outlier
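Two details of the snippet above can be inspected without drawing anything. The sketch below first shows that jitter() only shifts the x positions within the requested amount. It then shows why, in this example, the value 5 no longer appears: with outline = FALSE, boxplot() sets the y-axis from the whisker ends. The seed and the variable names are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed only so the random offsets are reproducible
data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)

# jitter() adds uniform noise in [-amount, amount]; y values are untouched
x_pos <- jitter(rep(1, length(data)), amount = 0.05)
range(x_pos)                       # stays between 0.95 and 1.05

# Compute the box plot statistics without plotting
bp <- boxplot(data, plot = FALSE)
whisker_range <- range(bp$stats)   # 10 25

# Observations beyond the whiskers fall outside the plotted y-range here
clipped <- data[data < whisker_range[1] | data > whisker_range[2]]
clipped                            # 5
```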

How to suppress boxplot outliers while preserving the plot range

data <- c(5, 10, 15, 20, 25, 21, 20, 19, 18)
bp <- boxplot(data, outline = FALSE, plot = FALSE)
plot(1, ylim = range(data), type = "n", ylab = "", xlab = "", xaxt = "n")
boxplot(bp$stats, add = TRUE, outline = FALSE, at = 1)
points(jitter(rep(1, length(data)), amount = 0.05), data, col = "blue")
Boxplot with an outlier as part of scatterplot

Here is a more detailed description of what was done. After creating a numeric vector, a box plot object is created with outline = FALSE and plot = FALSE. Then an empty plot is created with the full range of the data and the box plot is added back on top, followed by the points.
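The same steps can be wrapped in a small reusable function. The sketch below is one possible way to do it, not the only one. The helper name box_with_points and its arguments are hypothetical, and it draws the box with bxp(), the base R function that plots precomputed box plot statistics, instead of calling boxplot() on bp$stats.

```r
# Hypothetical helper: one box without outlier symbols, all raw points overlaid
box_with_points <- function(x, point_col = "blue", jitter_amount = 0.05) {
  # Step 1: compute the statistics only, nothing is drawn yet
  bp <- boxplot(x, plot = FALSE)

  # Step 2: empty canvas whose y-axis covers every observation
  plot(1, type = "n", xlim = c(0.5, 1.5), ylim = range(x),
       xlab = "", ylab = "", xaxt = "n")

  # Step 3: draw the box from the stored statistics, without outlier symbols
  bxp(bp, add = TRUE, at = 1, outline = FALSE)

  # Step 4: overlay every observation, outliers included, with a small x offset
  points(jitter(rep(1, length(x)), amount = jitter_amount), x, col = point_col)

  invisible(bp)
}

bp <- box_with_points(c(5, 10, 15, 20, 25, 21, 20, 19, 18))
bp$out  # the outlier is still recorded in the statistics, and drawn as a point
```

Returning the statistics invisibly keeps the function usable both for plotting and for inspecting the quartiles afterwards.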

In this section, I will show you how to combine a scatter plot with a boxplot using the mtcars dataset.

Loading the mtcars dataset:

Before we begin, we need to load the mtcars dataset, which is included in R. The rows represent car models and the two columns we are interested in are mpg and cyl.

data(mtcars)
# Set ylim on boxplot() itself: without add = TRUE it starts a new plot, and outline = FALSE would otherwise shrink the y-axis to the whiskers
boxplot(mpg ~ cyl, data = mtcars, outline = FALSE, ylim = range(mtcars$mpg))
points(jitter(rep(1:length(unique(mtcars$cyl)), times = table(mtcars$cyl))), mtcars[order(mtcars$cyl), "mpg"], col = "red")
Scatterplot overlaying boxplots, showing mpg by cyl
Scatterplot overlaying colored boxplots
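An alternative to computing the x positions by hand is stripchart() from base R, which can add jittered points per group to an existing plot. The sketch below is a variation on the example above rather than the method used for the figures. The axis labels and the jitter amount of 0.15 are assumptions.

```r
data(mtcars)

# Boxes without outlier symbols; ylim is set explicitly so that
# observations beyond the whiskers remain inside the plot
boxplot(mpg ~ cyl, data = mtcars, outline = FALSE,
        ylim = range(mtcars$mpg),
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# stripchart() places the groups at x = 1, 2, 3, matching the box positions
stripchart(mpg ~ cyl, data = mtcars, vertical = TRUE,
           method = "jitter", jitter = 0.15,
           pch = 1, col = "red", add = TRUE)
```

Because both calls use the same formula, the points line up with their boxes without any manual sorting of the data.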

Advantages of ggplot2 Compared to Base R Plotting Functions

ggplot2 offers a more flexible and customizable approach to data visualization. With ggplot2, one can create complex graphs with multiple layers, custom themes, and sophisticated colour palettes.

Creating a Boxplot with ggplot2

# Install once; skip this line if ggplot2 is already installed
install.packages("ggplot2")
library(ggplot2)
data(mtcars)
ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(mapping = aes(fill = factor(cyl))) +
  labs(title = "Miles per Gallon by Number of Cylinders", x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14), legend.position = "none")
Boxplot drawn with ggplot2

How to add scatter to boxplot with ggplot2

ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(mapping = aes(fill = factor(cyl))) +
  geom_point(position = position_jitter(width = 0.2), color = "black") +
  labs(title = "Miles per Gallon by Number of Cylinders", x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14), legend.position = "none")
Scatterplot added to boxplot
Box plots overlaid with scatter plot built with ScatterPlot.Bar
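When every observation is drawn as a point, the outlier symbols of geom_boxplot() show the same observations a second time. A possible refinement, sketched below, hides them with outlier.shape = NA and restricts the jitter to the horizontal direction so that the mpg values stay exact. The object name p and the labels are illustrative.

```r
library(ggplot2)
data(mtcars)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # outlier.shape = NA hides the box plot's own outlier symbols
  geom_boxplot(aes(fill = factor(cyl)), outlier.shape = NA) +
  # height = 0 keeps the y values unchanged; only x is jittered
  geom_jitter(width = 0.2, height = 0, color = "black") +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```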

Bar Plots in R

Bar plots, also known as bar charts, are a common way to visualize data in R. They are useful for comparing values across categories.

Creating a Bar Plot in base R language

data(mtcars)
mpg_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
barplot(mpg_means, main = "Mean Miles Per Gallon by Number of Cylinders", xlab = "Number of Cylinders", ylab = "Mean Miles Per Gallon")
Base barplot in R

Customizing a Bar Plot in base R

bar_colors <- c("blue", "blue", "blue")
mpg_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
barplot(mpg_means, main = "Mean Miles Per Gallon by Number of Cylinders", xlab = "Number of Cylinders", ylab = "Mean Miles Per Gallon", col = bar_colors, legend.text = "Mean MPG", beside = TRUE, width = c(0.8, 0.8, 0.8))
Customized base barplot
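Points can also be overlaid on a base R bar plot. barplot() returns the x coordinates of the bar midpoints, and these can be reused to position the points. The sketch below assumes a jitter amount of 0.15 and extends the y-axis so that the highest observations fit. The names bar_x and grp are illustrative.

```r
data(mtcars)
mpg_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# barplot() returns the midpoint of each bar on the x-axis
bar_x <- barplot(mpg_means,
                 ylim = c(0, max(mtcars$mpg) * 1.05),  # room for the raw points
                 col = "lightblue",
                 xlab = "Number of Cylinders", ylab = "Miles Per Gallon")

# Map each car to the midpoint of its cylinder group (1 = 4 cyl, 2 = 6, 3 = 8)
grp <- as.integer(factor(mtcars$cyl))
points(jitter(bar_x[grp], amount = 0.15), mtcars$mpg, col = "red")
```

With the default bar width and spacing, the midpoints are 0.7, 1.9 and 3.1, which is why plotting the points at x = 1, 2, 3 would not line up with the bars.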

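How to build a bar plot or column plot with ggplot2 in R

By default, geom_bar() counts the rows in each category. To plot values from the data instead, use stat = "identity", which stacks the raw values so that each bar shows their total, or stat = "summary", which plots a summary such as the mean.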

ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(fill = factor(cyl)), width = 0.7)
Number of rows by cylinder

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_bar(stat = "identity", aes(fill = factor(cyl)), width = 0.7)
Total mpg by cylinder

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_bar(stat = "summary", fun = mean, aes(fill = factor(cyl))) +
  labs(title = "Mean Miles per Gallon by Number of Cylinders", x = "Number of Cylinders", y = "Mean Miles per Gallon") +
  theme_minimal()
Barplot representing mean mpg values for each cylinder category
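The bar heights in the three plots above can be reproduced with base R, which makes the difference between the three stat settings concrete. This is a small check rather than part of the plotting code; the variable names are illustrative.

```r
data(mtcars)

# Default geom_bar(): number of rows per cylinder group
counts <- table(mtcars$cyl)

# stat = "identity": the mpg values are stacked, so each bar is their sum
totals <- tapply(mtcars$mpg, mtcars$cyl, sum)

# stat = "summary" with fun = mean: one mean per group
means <- tapply(mtcars$mpg, mtcars$cyl, mean)

summary_table <- data.frame(
  cyl   = names(means),
  count = as.vector(counts),      # 11, 7, 14
  total = as.vector(totals),      # 293.3, 138.2, 211.4
  mean  = round(as.vector(means), 2)  # 26.66, 19.74, 15.10
)
print(summary_table)
```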

To add a scatter plot to the existing barplot:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_bar(stat = "summary", fun = mean, aes(fill = factor(cyl))) +
  geom_point(position = position_jitter(width = 0.2), color = "black")
Barplot with mean and scatterplot

How to add error bars to a barplot in R ggplot

To draw standard error bars (mean ± SE), add the following layer to the bar plot above:

stat_summary(fun.data = mean_se, geom = "errorbar", width = .2)

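For standard deviation, another possibility is to use mean_sdl with geom = "errorbar". mean_sdl() is a ggplot2 wrapper around smean.sdl() from the Hmisc package, so Hmisc must be installed. Its default range is the mean ± 2 SD, which is why fun.args = list(mult = 1) is passed below to get ± 1 SD.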

library(Hmisc)
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = .2) +
  geom_point(position = position_jitter(width = 0.2), color = "black") +
  theme_minimal()
Barplot with SD error bars
Barplot with error bars and scatterplot
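If installing Hmisc is not an option, the mean and standard deviation can be computed beforehand and passed to geom_col() and geom_errorbar(). The sketch below is an alternative to the stat_summary() approach above, not a replacement for it. The data frame name stats and its column names are assumptions.

```r
library(ggplot2)
data(mtcars)

# One row per cylinder group with its mean and standard deviation
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
sds   <- tapply(mtcars$mpg, mtcars$cyl, sd)
stats <- data.frame(cyl  = factor(names(means)),
                    mean = as.vector(means),
                    sd   = as.vector(sds))

p <- ggplot(stats, aes(x = cyl, y = mean)) +
  geom_col(aes(fill = cyl), width = 0.7) +                # bars at the group means
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),  # mean ± 1 SD
                width = 0.2) +
  geom_point(data = mtcars, aes(x = factor(cyl), y = mpg),  # raw observations
             position = position_jitter(width = 0.2, height = 0)) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal()

print(p)
```

Computing the summary explicitly also makes it easy to switch to another measure of spread by changing a single column.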

Alternatively, for a no-code option one could use a free online dot plot generator combining scatter with bars, columns, box- or violin-plots. This web app can also generate a heatmap online and make line plots with standard deviations as error bars.

Posted September 7, 2022 by Maxim Bespalov
