Data Skunkworks

Data Visualisation

Posts

Jul 18, 2023

Will ChatGPT replace data scientists? Let’s analyse Tom Cruise’s film career to find out

There has been a lot of discussion about how large language models could replace many white-collar jobs. OpenAI recently released a beta version of its Code Interpreter with little fanfare. To me, as a data scientist, its features look like a significant development.

Read More →

May 4, 2022

Every James Bond vehicle, visualised

A critical component of successful interactive visualisations is orienting users towards what you want them to look at.

Read More →

Jan 1, 2022

D3.js 2022 Update: Nicolas Cage’s Film Career Visualised

In my original post, I used Tableau to build the visualisation showing the relationship between critic scores and the box office performance of Nicolas Cage’s films. For the latest update, I wanted to use D3.js, particularly for its d3.arc() and d3.pie() functions, which make building the chart slices simple and scalable.

Read More →

Feb 25, 2021

Learning D3.js

My main tools for drawing data visualisations in the past few years have been Tableau, then PowerBI, then matplotlib and Dash. All are very capable tools, but D3.js has always been the holy grail for interactivity, animations and overall flexibility.

Read More →

Mar 11, 2019

A song of nodes and edges – Network analysis in Game of Thrones

To demonstrate the concept of network analysis, I built an interactive, force-directed graph of character relationships for each of George R.R. Martin’s Game of Thrones novels using D3.js and Tableau.

Read More →

Data Engineering

Posts

Jul 10, 2020

Ranking the most viewed people on Wikipedia in 2020 (so far)

In the previous posts, we looked at loading all 1.2 TB of Wikipedia pageview data and ranking the most popular people by day. The final step involves bringing all the data together into a visualisation.

Read More →

Jul 10, 2020

Using Python to scrape Wikipedia for images of the most viewed people in 2020

Can we find the most viewed people on Wikipedia each day in 2020 and get a picture of each one?

Read More →

Jul 10, 2020

How to load and analyse 48 billion Wikipedia page views with Google BigQuery

This year, the English-language Wikipedia has averaged around 8 billion page views per month, making it one of the most visited websites in the world. The first half of 2020 has been incredibly eventful, and I was interested in building a dataset to see exactly which pages Wikipedia users have been most interested in.

Read More →

Machine Learning

Posts

Apr 30, 2020

Understanding XGBoost in five minutes

Some consider XGBoost something of a data science ‘black box’. The models are not quite as easy to interpret as a decision tree, or a set of manually coded rules. The aim of this post is to demystify XGBoost, and give a mostly ‘maths-free’ explanation of how it works.

Read More →

Mar 11, 2020

Smiling and dialling with machine learning: how to call the right customers

Uplift modelling is a powerful but relatively unknown technique in large organisations. How can businesses use data science and machine learning to market to the right customers?

Read More →

Risk Modelling

Posts

Sep 18, 2020

Building a quantitative cyber-risk model in Python

How can we use Python to build a cyber risk portfolio for an online retailer?

Read More →

Aug 17, 2020

Why companies should use quantitative modelling for cyber risk

Quantitative cyber risk modelling unlocks a whole new class of decision-making for C-level executives.

Read More →

Natural Language Processing

Posts

Feb 1, 2021

Topic modelling clickbait headlines with Python

I often come across business problems where a large amount of natural language data needs to be processed and analysed. One good starting point for those problems is to build a topic model.
