Visualise your experimental data

GPT-4 does data analysis of a pasted dataset


I wondered whether ChatGPT could analyse a dataset if I copy-pasted it into the chat's text input field. One of the Gapminder datasets, "Mini", is available on Kaggle as a CSV file. I prompted GPT-4 to analyse the dataset, to make a summary table with the mean and SD of each column, and to provide its own conclusions.

Prompting GPT-4 to analyse the dataset and make a table

I then copy-pasted the contents of the CSV file into the ChatGPT text field, and it almost immediately started building the summary table, with means and SD as requested.

Pasted “Mini-Gapminder” dataset and the first two rows of a summary table

Some values in the resulting table are slightly off. The correct values, calculated in Excel, are as follows:

Excel-calculated “Mini-Gapminder” dataset means and SD
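To double-check a summary table like this, the same per-column means and SDs can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the function name `summarise_csv` and the column names and numbers in the sample are hypothetical placeholders, not the actual Mini-Gapminder data. Note that `statistics.stdev` returns the sample SD (n-1 denominator), which corresponds to Excel's `STDEV.S`; `statistics.pstdev` gives the population SD (`STDEV.P`). Which of the two the Excel values above use is not stated here, so it is worth matching them before comparing.

```python
import csv
import io
import statistics


def summarise_csv(text):
    """Return {column: (mean, sample SD)} for every numeric column in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        # Keep non-empty cells only; blank cells are treated as missing values.
        raw = [row[name].strip() for row in rows if row.get(name) and row[name].strip()]
        try:
            values = [float(v) for v in raw]
        except ValueError:
            continue  # text column, e.g. country or region names
        if len(values) >= 2:  # stdev needs at least two data points
            summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Hypothetical sample: column names and values are placeholders.
sample = """country,region,life_expectancy,gdp_per_capita
A,Europe,80,40000
B,Africa,60,2000
C,Asia,70,10000
"""

for name, (mean, sd) in summarise_csv(sample).items():
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
```

Text columns are skipped automatically, so the pasted CSV can be fed in as-is.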

When I asked why it had made mistakes, it replied: “As an AI language model, I do not explicitly perform calculations in the traditional sense. Instead, my responses are based on patterns and knowledge learned from a vast dataset of text during my training.”
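Since the model produces numbers from learned patterns rather than calculating them, its figures are worth checking against computed ones. Below is a minimal sketch of such a check, assuming both sets of numbers have already been collected into dictionaries keyed by statistic name. The function name `flag_discrepancies`, the 1% relative tolerance and the example numbers are hypothetical choices for illustration, not values from this post.

```python
def flag_discrepancies(reported, computed, rel_tol=0.01):
    """Return {name: relative error} for reported values off by more than rel_tol."""
    flagged = {}
    for name, true_value in computed.items():
        claimed = reported.get(name)
        if claimed is None:
            continue  # the model did not report this statistic
        if true_value == 0:
            error = abs(claimed)  # avoid dividing by zero; fall back to absolute error
        else:
            error = abs(claimed - true_value) / abs(true_value)
        if error > rel_tol:
            flagged[name] = error
    return flagged


# Placeholder numbers for illustration only, not values from this post.
computed = {"life_expectancy_mean": 70.0, "gdp_per_capita_mean": 17333.3}
reported = {"life_expectancy_mean": 70.4, "gdp_per_capita_mean": 18100.0}

for name, error in flag_discrepancies(reported, computed).items():
    print(f"{name}: off by {error:.1%}")
```

A relative tolerance is used so that columns on very different scales (life expectancy in years, GDP in dollars) are judged on the same footing.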

ChatGPT then proceeded to churn out its interpretation of the data, category by category.

The end of the summary table and first two conclusion pointers

ChatGPT made similar observations for all 9 categories in the dataset and drew an overall conclusion about disparities in health, economic and environmental factors across countries and regions. It is not a very deep analysis, but it is an automatic one.

Here is the video of the entire process. The text generation part was sped up 10x.

ChatGPT-4 analyses the “Mini-Gapminder” dataset.

Posted March 22, 2023 by Maxim Bespalov