Visualise your experimental data

10 Best Practices for Effective Data Visualization: Simplicity

updated March 8th, 2023

This is a long read on best practices in data visualisation, which will be updated periodically. I will try to supplement each practice with R code examples.

List of best practices:

  • Keep it simple

  • Reveal, not conceal

  • Highlight the essential information

  • Label and title the visualization

  • Provide context

  • Consider the audience

  • Test and iterate

  • Use appropriate scales

  • Provide interactivity

  • Use appropriate colours

First and foremost:

Keep it simple

Why Keeping Charts Simple is Critical for Effective Data Visualization

Introduction: The Importance of Simplicity in Data Representation

Data visualization is a powerful tool for communicating complex information to others. However, if done poorly, visualizations can be confusing and even misleading. One of the most critical principles of effective data visualization is to keep it simple. This means avoiding clutter and focusing on the key insights that the data can provide.

INFERIOR Example of categorical data visualisation: stacked bar plot

Consider this code and the resulting stacked bar plot.

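# Create a cluttered (in my view) bar chart
library(ggplot2)
data <- data.frame(
  region = c("North", "South", "East", "West"),
  sales = c(100, 200, 150, 175),
  profit = c(50, 100, 75, 80)
)
# Note: the two geom_col() layers are drawn on top of each other, so each
# Profit bar covers the lower part of its Sales bar. The result looks stacked,
# but the total bar height is Sales, not Sales + Profit. position = "dodge"
# has no effect here because each layer has only one bar per region.
ggplot(data, aes(x = region)) +
    geom_col(aes(y = sales, fill = "Sales"), position = "dodge") +
    geom_col(aes(y = profit, fill = "Profit"), position = "dodge") +
    scale_fill_manual(name = "", values = c("Sales" = "#F8766D", "Profit" = "#00BA38")) +
    labs(title = "Sales and Profit by Region") +
    theme_minimal() +
    theme(legend.position = "bottom")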

Example of a cluttered bar chart

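Cluttered labelling and stacked structures make comparisons harder: only the bottom segment of a stacked bar starts from a common baseline, so the segments above it have to be judged by length alone.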

A cleaner alternative
Example of a less cluttered bar chart

Removing unnecessary labels and grouping the categories visually helps the viewer focus on the message rather than on chart furniture.
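The code behind the cleaner chart is not shown in the post. As a minimal sketch, assuming the same regional sales figures and that only one measure needs to be shown, one way to declutter is to sort the bars, label them directly, and drop the grid lines and y-axis text that the labels make redundant. The styling choices (grey fill, bar width, label offset) are illustrative assumptions.

```r
library(ggplot2)

# Illustrative figures: the same regional sales values as in the example above
data <- data.frame(
  region = c("North", "South", "East", "West"),
  sales  = c(100, 200, 150, 175)
)

# Order regions from highest to lowest sales so the ranking reads left to right
data$region <- reorder(data$region, -data$sales)

p <- ggplot(data, aes(x = region, y = sales)) +
  geom_col(fill = "grey60", width = 0.6) +
  # Direct labels replace the need to read values off the y axis
  geom_text(aes(label = sales), vjust = -0.4) +
  # Leave a little head room above the tallest bar for its label
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Sales by Region", x = NULL, y = NULL) +
  theme_minimal() +
  theme(
    panel.grid  = element_blank(),  # no grid lines
    axis.text.y = element_blank()   # values are already printed on the bars
  )
p
```

Sorting by value means the viewer reads the ranking directly instead of scanning back and forth between bars.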

Another good way to visualise categorical data: grouped bar plots

There are several reasons to choose grouped bar plots. First, they let viewers compare the values of different variables within each group directly, because each variable gets its own bar starting from the same baseline. They are also often less cluttered and more visually appealing than stacked bar charts.

library(ggplot2)
library(reshape2)
data <- data.frame(
    region = c("North", "South", "East", "West"),
    Sales = c(100, 200, 150, 175),
    Profit = c(50, 100, 75, 80)
)
melted_data <- melt(data, id.vars = "region")
ggplot(melted_data, aes(x = region, y = value, fill = variable)) +
    geom_bar(stat = "identity", position = "dodge") +
    scale_fill_manual(values = c("#619CFF", "#00BA38")) +
    labs(title = "Sales and Profit by Region", x = "Region", y = "USD") +
    theme_minimal() +
    theme(legend.position = "bottom") +
    guides(fill = guide_legend(title = NULL))
Grouped graph made by ScatterPlot.Bar

Stacked bar charts can become crowded and difficult to read when there are too many variables, while grouped bar charts provide a clearer and more organized way to display the data.

Stacked bar chart example

Faceted plots are often a better choice than stacked bar charts because each facet shows one subset of the data (here, one variable) in its own panel, so every bar starts from a common baseline and is easy to compare.

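# Reshape the `data` data frame from the grouped bar plot example into long
# format: "group" is either Sales or Profit
library(ggplot2)
library(tidyr)
data_melt <- pivot_longer(data, cols = -region, names_to = "group", values_to = "value")

# One panel per group, with one bar per region in each panel
ggplot(data_melt, aes(x = region, y = value, fill = group)) +
    geom_col() +
    facet_wrap(~ group, nrow = 1)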
Faceted bar plots

Tailoring Visualizations to Underlying Data: Do Not Overcomplicate

Using overly complex visualization types for simple data is a common mistake that can hinder the audience’s ability to accurately interpret the information.
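As a hedged illustration (not taken from the original post), a handful of values such as four regional sales figures can be shown with a simple dot plot: one point per category, no 3D effects and no legend. The figures below are illustrative.

```r
library(ggplot2)

# Illustrative figures: four regional sales values
data <- data.frame(
  region = c("North", "South", "East", "West"),
  sales  = c(100, 200, 150, 175)
)

# Order regions by sales; discrete y levels are drawn bottom to top,
# so the largest value ends up at the top of the chart
data$region <- reorder(data$region, data$sales)

# A dot plot: one point per region, no 3D effects, no legend
p <- ggplot(data, aes(x = sales, y = region)) +
  geom_point(size = 3) +
  labs(title = "Sales by Region", x = "Sales (USD)", y = NULL) +
  theme_minimal()
p
```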

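Accessibility should be a key consideration when designing data visualizations, so that the information reaches all members of the intended audience: for example, choose colour palettes that remain distinguishable for people with colour-vision deficiency, and avoid encoding information in colour alone.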
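One common option is the Okabe-Ito palette, which was designed to stay distinguishable for the most common forms of colour-vision deficiency. The sketch below applies two of its colours to the grouped Sales/Profit chart; the choice of which colour goes with which measure is an arbitrary assumption.

```r
library(ggplot2)

# Illustrative data in long format: Sales and Profit for four regions
data <- data.frame(
  region  = rep(c("North", "South", "East", "West"), times = 2),
  measure = rep(c("Sales", "Profit"), each = 4),
  value   = c(100, 200, 150, 175, 50, 100, 75, 80)
)

# Two colours from the Okabe-Ito palette (dark blue and orange); they also
# differ in lightness, so they stay distinguishable in greyscale
okabe_ito <- c(Sales = "#0072B2", Profit = "#E69F00")

p <- ggplot(data, aes(x = region, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = okabe_ito) +
  labs(title = "Sales and Profit by Region", x = "Region", y = "USD", fill = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")
p
```

Naming the palette entries ties each colour to a measure explicitly, so the mapping does not depend on factor level order.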

Using Software Tools for Interactivity

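Tools designed specifically for interactive data visualization, such as plotly, let viewers hover over, zoom into and (for 3D plots) rotate a chart, which can make a dense visualization easier to explore. The following plotly call draws an interactive 3D surface from columns 2 to 7 of a data frame `temp_data`, with one row per city.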

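library(plotly)

# temp_data is assumed to be a data frame with a `city` column and numeric
# values in columns 2 to 7 (the dataset itself is not shown in this post)
plot_ly(z = as.matrix(temp_data[, 2:7]),   # surface heights: one row per city
        x = colnames(temp_data[, 2:7]),    # column names along the x axis
        y = temp_data$city,                # cities along the y axis
        type = "surface",
        colors = "RdYlBu")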
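Interactivity can also be added to a simple 2D chart. As an additional sketch (not from the original post), plotly's `ggplotly()` converts an existing ggplot object into an interactive widget with hover tooltips and zoom. The regional sales figures are illustrative.

```r
library(ggplot2)
library(plotly)

# Illustrative figures: four regional sales values
data <- data.frame(
  region = c("North", "South", "East", "West"),
  sales  = c(100, 200, 150, 175)
)

p <- ggplot(data, aes(x = region, y = sales)) +
  geom_col() +
  theme_minimal()

# Convert the static ggplot into an interactive plotly widget;
# by default, hovering over a bar shows its region and sales value
interactive_plot <- ggplotly(p)
interactive_plot
```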

Posted March 3, 2023 by Maxim Bespalov