Will ChatGPT replace data scientists? Let’s analyse Tom Cruise’s film career to find out

There has been a lot of discussion about how large language models could replace many white-collar jobs. OpenAI recently released a beta version of its Code Interpreter with little fanfare. To me as a data scientist, however, its features look like a significant development.

Reading this I had two thoughts:

I wanted to try getting ChatGPT to do some of the work of running an analysis and recreating a chart similar to the Nic Cage one I put together last year. With the upcoming release of Mission Impossible: Dead Reckoning Part I, it seems like a good time to build a chart for Tom Cruise's films.
Once I had a clean dataset, I could feed it to the new ChatGPT code interpreter, give it some very open prompts and see how the analysis and resulting charts compared. We can see if the results are good enough to put me out of a job.

Contestant 1: Me

The main question I’d like to explore is how the box office earnings of Tom Cruise's films correlate with their critical reception. I want the audience to be engaged with the analysis, and be able to ask and answer their own questions about the data.

A straightforward bar graph or scatter plot might give some good insights, but I decided to opt for a more interesting, visually compelling approach: a 'slice of pie' chart. The radius and colour of each slice will represent the critic scores, while the proportion of the circumference will show the global box office takings.

I needed two key data points for each of Tom Cruise's films: the global box office earnings and the Rotten Tomatoes critic scores. Last time, for Nic Cage, I scraped this data manually. This time I wanted to see if ChatGPT could help speed things up.

Interestingly, you can get ChatGPT to generate a CSV of the data, but the scores are ‘fictional examples’.

Instead, I was able to paste the unformatted data from Rotten Tomatoes:

I then prompted ChatGPT to reformat the data with the correct delimiter:

and number formatting:
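
For comparison, here is a minimal Python sketch of doing the same clean-up by hand. It assumes the raw text uses a pipe delimiter, a dollar-and-commas format for gross and a percent sign on the score. The column names match the dataset described later in this post, but the helper names (`parse_money`, `parse_percent`, `load_films`) and the rows themselves are placeholders I made up for illustration, not real box office figures.

```python
import csv
import io

# Placeholder rows, NOT real data: values are invented for illustration only.
RAW = """year|film|role|gross|rt_score
2001|Example Film A|Lead|$12,345,678|85%
2005|Example Film B|Lead|$1,250,000|40%
"""

def parse_money(text):
    """Turn a string like '$12,345,678' into the integer 12345678."""
    return int(text.replace("$", "").replace(",", "").strip())

def parse_percent(text):
    """Turn a string like '85%' into the integer 85."""
    return int(text.replace("%", "").strip())

def load_films(raw_text):
    """Read pipe-delimited text into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="|")
    films = []
    for row in reader:
        films.append({
            "year": int(row["year"]),
            "film": row["film"].strip(),
            "role": row["role"].strip(),
            "gross": parse_money(row["gross"]),
            "rt_score": parse_percent(row["rt_score"]),
        })
    return films

films = load_films(RAW)
for film in films:
    print(film)
```

A pipe is a convenient delimiter here because film titles can contain commas, which would otherwise need quoting in a comma-separated file.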

Once I had all the data, I created the 'slice of pie' chart in D3.js. Each film is represented by a slice. The slice's radius and colour intensity are based on the critic score: a longer, bluer slice means a higher score. The proportion of the pie's circumference that the slice occupies corresponds to the film's share of total box office earnings. In essence, a slice is wider if the film earned more.
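
The geometry behind that encoding can be sketched in a few lines. This is a Python illustration of the mapping described above, not the D3.js code used for the actual chart. The function name `pie_slices`, the `max_radius` parameter and the sample rows are assumptions made for the example, and the film values are placeholders rather than real figures.

```python
import math

def pie_slices(films, max_radius=100.0):
    """Map each film to an angular span (share of gross) and a radius (score)."""
    total_gross = sum(f["gross"] for f in films)
    slices = []
    start = 0.0
    for f in films:
        # Angular width is the film's share of total box office, in radians.
        sweep = 2 * math.pi * f["gross"] / total_gross
        slices.append({
            "film": f["film"],
            "start_angle": start,
            "end_angle": start + sweep,
            # Radius and colour intensity both scale linearly with the score.
            "radius": max_radius * f["rt_score"] / 100,
            "colour_intensity": f["rt_score"] / 100,
        })
        start += sweep
    return slices

# Placeholder data, NOT real box office figures.
sample = [
    {"film": "Example Film A", "gross": 100, "rt_score": 50},
    {"film": "Example Film B", "gross": 300, "rt_score": 90},
]

slices = pie_slices(sample)
for s in slices:
    print(s)
```

In D3.js the same numbers would typically be passed to an arc generator, with the start and end angles setting each slice's width and the radius setting its length.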

The result was a visually striking chart that revealed some fascinating insights at a glance. Not only could you see which of Tom Cruise's films were the most commercially successful, but you could observe how these box office hits correlated with critical reception. With a 'slice of pie' chart, a multi-variable comparison became a clear, visually intuitive exploration.

What does the chart tell us?

More recent films in Tom’s career have earned a lot at the box office and been critically well received, with the notable exceptions of ‘The Mummy’ and ‘Jack Reacher: Never Go Back’. There appears to be a general trend of box office takings increasing over time.

The first two Mission Impossible films were commercial hits but not critical ones, with scores of 66% and 56% respectively. Between them, they averaged $500 million USD per film at the box office.

Looking at the colour distribution of scores, most films appear to be positively reviewed. The median Rotten Tomatoes score is 69% for all of Tom’s films. His best reviewed film was ‘Mission: Impossible - Fallout’ with a score of 97%. The worst reviewed film of Tom’s career is ‘Cocktail’ at 9%.
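
Figures like the median, best and worst score are quick to reproduce. A minimal sketch using Python's standard `statistics` module, with placeholder film names and scores standing in for the real dataset:

```python
import statistics

# Placeholder (film, score) pairs, NOT the real Rotten Tomatoes data.
films = [
    ("Example Film A", 85),
    ("Example Film B", 40),
    ("Example Film C", 60),
]

scores = [score for _, score in films]

# Median is less sensitive to a few very low or very high scores than the mean.
median_score = statistics.median(scores)

# Best and worst reviewed films, picked by score.
best = max(films, key=lambda f: f[1])
worst = min(films, key=lambda f: f[1])

print("median:", median_score)
print("best:", best)
print("worst:", worst)
```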

Let’s see what ChatGPT and the new code interpreter can do with the same dataset:

Contestant 2: ChatGPT (Model: Code Interpreter)

The first step was uploading the data, and giving a short, non-specific prompt to write up an analysis:

Here’s what came back:

ChatGPT Response

It appears that the data in the CSV file is using a pipe ("|") as a separator, rather than the more typical comma. This is not a problem, but we will need to specify this when reading the file. Let's re-read the file with this separator and then display the first few rows again.

The dataset includes the following columns:

year: The year the film was released
film: The title of the film
role: The role Tom Cruise played in the film
gross: The box office gross earnings of the film (in USD)
rt_score: The Rotten Tomatoes score for the film

Let's conduct an initial analysis of the data. We can start by describing the numerical data to get a sense of the distribution of box office earnings and Rotten Tomatoes scores. We can also look at how many films are included and the range of years covered.

The dataset includes data on 46 unique films, spanning 42 years from 1981 to 2023.

Here are some statistics on the box office gross and Rotten Tomatoes scores:

Box Office Gross:
- The minimum box office gross is approximately $1.25 million.
- The maximum box office gross is approximately $1.48 billion.
- The average box office gross is approximately $266 million.
- The gross earnings show a significant spread, as indicated by a large standard deviation.
Rotten Tomatoes Scores:
- The minimum score is 9.
- The maximum score is 97.
- The average score is around 66.
- The scores also display a large spread, suggesting a wide range of critical reception for the films.

Next, let's create some visualizations to better understand the data. We can generate the following charts:

A line plot of the box office gross over time.
A scatter plot of Rotten Tomatoes scores against box office gross.
A histogram of Rotten Tomatoes scores to show the distribution of scores.

Here are the visualizations based on the dataset:

Box Office Gross Over Time: This chart shows a general upward trend in box office gross over time. This could be due to a variety of factors, including inflation, increasing popularity of Tom Cruise, or the growing global box office market.
Rotten Tomatoes Scores vs. Box Office Gross: This scatter plot does not show a clear correlation between the Rotten Tomatoes score and the box office gross. This suggests that a film's critical reception does not necessarily translate into box office success.
Histogram of Rotten Tomatoes Scores: This histogram shows that the distribution of Rotten Tomatoes scores is somewhat skewed to the right, meaning there are more high-scoring films than low-scoring ones. Most of the films have a score of around 70 or higher.

These findings illustrate that Tom Cruise's filmography is diverse in both financial success and critical reception. Despite the wide range in Rotten Tomatoes scores, the majority of his films have been financially successful, with an upward trend in box office gross over time.

Am I getting replaced by AI?

ChatGPT's results are extremely impressive. The entire analysis was completed in around 10 seconds. Most analysts would take at least an hour or two to load the data, generate summary statistics, create the charts and write up the findings. But there are a few caveats to what ChatGPT produced that are worth noting:

I provided a clean, properly formatted dataset. This never, ever happens in the real world. A large part of the job of a data scientist is being a ‘data janitor’, cleaning up the data to be usable and gathering multiple sources of data from different places.
The dataset here is trivially small. Most real-life datasets are stored in large SQL databases, and Code Interpreter isn’t (yet) able to connect to those databases and write queries. I’m sure this will be possible in the future, however.
In the beta version of Code Interpreter, I wasn’t able to ask any follow-up questions. Generally, a good piece of analysis by a data scientist inspires lots of questions from the audience, and being able to find the answers is important.
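
One follow-up a reader might reasonably ask is how strong the relationship between critic score and box office gross actually is. Here is a minimal sketch of answering that by hand, with a Pearson correlation coefficient written out in plain Python. The `pearson` helper and the sample numbers are illustrative assumptions, not values from the Tom Cruise dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values, NOT real scores or grosses.
scores = [40, 60, 85]
grosses = [1_000_000, 3_000_000, 2_000_000]

r = pearson(scores, grosses)
print(round(r, 3))
```

A value near 1 or -1 would indicate a strong linear relationship, while a value near 0 would fit a scatter plot with no clear pattern.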

Closing thoughts

While I was writing this post, I was struck by how useful ChatGPT can be when combined with traditional analysis. It speeds up data cleaning, eliminating the need to spend hours tweaking regular expressions to get the data just right. The summary statistics provided by ChatGPT offer a great starting point for deeper analysis. The charts show interesting trends, and ChatGPT even adds axis labels and titles, something that many real data scientists forget to do. I believe that using large language models will become an essential skill for data scientists, just like querying a database with SQL or writing Python.