Failing Forwards — With Game Data

David Orn Johannesson
3 min read · Apr 2, 2021

Understanding your data before you ask questions.

Source: https://cdn.arstechnica.net/wp-content/uploads/2021/01/gamestop-stock-rise-800x450.jpg

In this article we look at a dataset containing sales information for video games. We try to decide whether the dataset “makes sense” and what information we can draw from it. We roll around in the hay trying to come up with informative results and present the data. The main question is which features a prospective stock buyer should look at.

The data originates from Kaggle [https://www.kaggle.com/gregorut/videogamesales]. As with most datasets, we can’t just assume it is correct; doing so would corrupt everything downstream. So let’s first look at the total sales over time and get a feel for the data.

Plotting the overall sales, per region, for the entire dataset shows that we really need to be careful about which questions we ask.
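The aggregation behind a plot like this can be sketched with the standard library alone. This is a minimal sketch, not the code from my repository: the column names (`Year`, `NA_Sales`, `EU_Sales`, `JP_Sales`, `Other_Sales`, `Global_Sales`) are assumptions about the CSV header, the function name `sales_by_year` is made up for illustration, and the sample rows are invented just to show the shape of the result.

```python
from collections import defaultdict

# Assumed column names; check them against the header of the actual CSV.
REGION_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]


def sales_by_year(rows, year_column="Year", region_columns=REGION_COLUMNS):
    """Sum sales per region for each release year.

    `rows` is any iterable of dicts, e.g. a csv.DictReader over the file.
    Rows whose year is missing or not numeric are skipped.
    """
    totals = defaultdict(lambda: dict.fromkeys(region_columns, 0.0))
    for row in rows:
        try:
            year = int(float(row[year_column]))
        except (KeyError, TypeError, ValueError):
            continue
        for column in region_columns:
            totals[year][column] += float(row.get(column) or 0.0)
    # Sort by year so the result can be plotted directly as a time series.
    return dict(sorted(totals.items()))


# Invented sample rows (not real sales figures), only to show the output shape.
sample = [
    {"Year": "2010", "NA_Sales": "0.5", "EU_Sales": "0.5", "JP_Sales": "0.0",
     "Other_Sales": "0.0", "Global_Sales": "1.0"},
    {"Year": "2008", "NA_Sales": "1.0", "EU_Sales": "0.5", "JP_Sales": "0.25",
     "Other_Sales": "0.25", "Global_Sales": "2.0"},
    {"Year": "2008", "NA_Sales": "2.0", "EU_Sales": "1.0", "JP_Sales": "0.5",
     "Other_Sales": "0.5", "Global_Sales": "4.0"},
    {"Year": "N/A", "NA_Sales": "9.0", "EU_Sales": "9.0", "JP_Sales": "9.0",
     "Other_Sales": "9.0", "Global_Sales": "36.0"},
]
yearly = sales_by_year(sample)
```

With the real file, `rows` could come from `csv.DictReader(open(path, newline=""))`, and each region column of `yearly` can then be drawn as one line over the years.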

Okay. Interesting. Our initial hope was to build an autoregressive model and predict forwards into the future. However, anyone who wants to tell me that sales of video games have crashed since 2010 hasn’t really been paying attention. Prediction on this dataset would not be interesting. Does that mean we can’t use the dataset?

Well, sort of. We need to come up with different questions that guide us towards either a better dataset or the features we should develop further in our search.

Rather than predict, let’s ask some analytical questions that this dataset can answer. The most important one is who is selling the most. Let’s also correlate the number of games released with revenue; this will tell us whether developing a lot of games is a predictor of making money (yes, this is a deeply flawed question since the answer will always be that more games result in more revenue, just stay with me for a while). First, let’s get the top 20 publishers of games and plot the overall revenue for each of them.
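Ranking publishers by total sales is a small group-and-sum. Here is a hedged sketch using only the standard library; the column names `Publisher` and `Global_Sales` are assumptions about the CSV header, `top_publishers` is a name made up for this example, and the sample rows use placeholder publishers with invented numbers.

```python
from collections import Counter


def top_publishers(rows, n=20, publisher_column="Publisher",
                   sales_column="Global_Sales"):
    """Return the n publishers with the highest summed sales, best first."""
    totals = Counter()
    for row in rows:
        totals[row[publisher_column]] += float(row[sales_column])
    # most_common sorts by the summed value, largest first.
    return totals.most_common(n)


# Placeholder publishers and invented figures, only for illustration.
sample = [
    {"Publisher": "A", "Global_Sales": "3.0"},
    {"Publisher": "B", "Global_Sales": "2.0"},
    {"Publisher": "A", "Global_Sales": "1.5"},
    {"Publisher": "C", "Global_Sales": "1.0"},
]
ranking = top_publishers(sample, n=2)
```

The returned list of `(publisher, total)` pairs is already in the order a bar chart would want.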

Plotting the values for units produced and the overall gross global sales.

The correlation between the number of games produced and overall sales is about 0.61, suggesting, unsurprisingly, that producing more games results in more sales. However, the correlation isn’t especially high, which suggests that a company can also succeed with a quality-over-quantity strategy.
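For readers who want to see what goes into a number like this, here is a minimal sketch of a Pearson correlation between games per publisher and total sales per publisher. I am assuming Pearson here; the column names `Publisher` and `Global_Sales`, both function names, and the sample rows are illustrative assumptions, so the value it produces on the sample says nothing about the real dataset.

```python
from collections import defaultdict
from math import sqrt


def publisher_counts_and_sales(rows, publisher_column="Publisher",
                               sales_column="Global_Sales"):
    """Return {publisher: (number_of_games, total_sales)}."""
    stats = defaultdict(lambda: [0, 0.0])
    for row in rows:
        entry = stats[row[publisher_column]]
        entry[0] += 1
        entry[1] += float(row[sales_column])
    return {publisher: (count, total) for publisher, (count, total) in stats.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder publishers and invented figures, only for illustration.
sample = [
    {"Publisher": "A", "Global_Sales": "3.0"},
    {"Publisher": "A", "Global_Sales": "1.5"},
    {"Publisher": "B", "Global_Sales": "2.0"},
    {"Publisher": "C", "Global_Sales": "1.0"},
    {"Publisher": "C", "Global_Sales": "1.0"},
    {"Publisher": "C", "Global_Sales": "1.0"},
]
stats = publisher_counts_and_sales(sample)
counts = [count for count, _ in stats.values()]
sales = [total for _, total in stats.values()]
r = pearson(counts, sales)
```

Keep in mind that a few very large publishers can dominate a coefficient like this, so it is worth looking at the scatter plot alongside the single number.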

Next, let’s understand which types of games, and which platforms, generate the most revenue. First, let’s look at the platforms. Remember that we don’t fully believe this dataset; at best, the results might point in a direction where someone should look further.

Looking at overall sales by platform. Remember that the dataset ends around 2015, so the actual increase in PS4 sales was far larger than what appears here. Needless to say, this points to Sony [PlayStation] being on the right track.

Finally, let’s plot sales as a function of genre. This gives us information about what types of games used to be high sellers.

The dataset is time-skewed, so further analysis is required to examine time trends in different genres.
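One possible starting point for that further analysis, which I have not run on the real data, is to split sales into multi-year periods and look at each genre's share within a period. Shares are less sensitive than raw totals to how many records each period happens to contain. The column names (`Year`, `Genre`, `Global_Sales`), the function name, and the five-year bucket size are all assumptions for illustration, and the sample rows are invented.

```python
from collections import defaultdict


def genre_share_by_period(rows, period_years=5, year_column="Year",
                          genre_column="Genre", sales_column="Global_Sales"):
    """Return {period_start_year: {genre: share of that period's sales}}.

    Rows whose year is missing or not numeric are skipped.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        try:
            year = int(float(row[year_column]))
        except (KeyError, TypeError, ValueError):
            continue
        # Round the year down to the start of its period, e.g. 2003 -> 2000.
        period_start = year - year % period_years
        totals[period_start][row[genre_column]] += float(row[sales_column])

    shares = {}
    for period_start, by_genre in sorted(totals.items()):
        period_total = sum(by_genre.values())
        if period_total == 0:
            continue
        shares[period_start] = {
            genre: sales / period_total for genre, sales in by_genre.items()
        }
    return shares


# Invented sample rows (not real sales figures), only to show the output shape.
sample = [
    {"Year": "2001", "Genre": "Action", "Global_Sales": "1.0"},
    {"Year": "2003", "Genre": "Sports", "Global_Sales": "3.0"},
    {"Year": "2011", "Genre": "Action", "Global_Sales": "2.0"},
    {"Year": "2012", "Genre": "Action", "Global_Sales": "2.0"},
    {"Year": "N/A", "Genre": "Puzzle", "Global_Sales": "5.0"},
]
shares = genre_share_by_period(sample)
```

Plotting one line per genre across the period starts would show whether a genre is gaining or losing ground, rather than just how large it was in total.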

Based on our limited exploration, we would want to look further into what Nintendo and EA are doing with action games.

Okay, this article hasn’t really given us anything groundbreaking or informative. Note that it’s part of a Udacity Nanodegree in Data Science and should be approached as such. Besides, it’s my first article and I am still getting used to the process and structure.

The code to generate the figures is available on my GitHub page and open to anyone who thinks it might help. There is also more technical material there.

Thank you for your time and have a great day.

The dataset is available at Kaggle: https://www.kaggle.com/gregorut/videogamesales

The relevant code is in my Git repo.

This post was written as part of the Udacity Nanodegree series, part 1 (introduction). For information about the course, go to Udacity.

For any other questions, feel free to contact me via

  • cudea0@gmail.com

David Orn Johannesson

Research Engineer with a background in mechatronics and electrical engineering. Working on computation and big datasets.