Failing Forwards — With Game Data
Understanding your data before you ask questions.
In this article we look at a dataset of video game sales. We try to decide whether the dataset “makes sense” and what information we can draw from it. We roll around in the hay trying to come up with informative results and present the data. The main question is which features a prospective stock buyer should look at.
The data originates from Kaggle [https://www.kaggle.com/gregorut/videogamesales]. As with most datasets, we can’t just assume it is correct; a wrong assumption here would corrupt everything downstream. So let’s first look at the total sales over time and get a feel for the data.
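A minimal sketch of that first sanity check, using only the Python standard library. It assumes the file has a `Year` column and a `Global_Sales` column, and that missing years are stored as text such as "N/A"; the rows below are invented toy data so the example runs on its own, and the `vgsales.csv` file name in the comment is an assumption too.

```python
import csv
import io
from collections import defaultdict

# Toy rows, invented for illustration only. For the real data, replace
# io.StringIO(SAMPLE_CSV) with open("vgsales.csv", newline="") (assumed name).
SAMPLE_CSV = """Name,Year,Global_Sales
Game A,2008,10.0
Game B,2008,2.5
Game C,2009,4.0
Game D,N/A,1.0
"""


def total_sales_by_year(rows):
    """Sum Global_Sales per release year, skipping rows without a usable year."""
    totals = defaultdict(float)
    for row in rows:
        try:
            year = int(float(row["Year"]))
        except ValueError:
            continue  # e.g. "N/A" or an empty cell
        totals[year] += float(row["Global_Sales"])
    return dict(sorted(totals.items()))


yearly = total_sales_by_year(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for year, sales in yearly.items():
    print(year, round(sales, 2))
```

Counting how many rows get skipped for a missing year is a cheap extra check on how far the dataset can be trusted.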
Okay. Interesting. Our initial hope was to build an autoregressive model and predict forwards into the future. However, anyone who wants to tell me that video game sales have crashed since 2010 hasn’t really been paying attention. The drop looks far more like a gap in the data than a change in the market, so prediction on this dataset would not be interesting. Does that mean we can’t use the dataset?
Well, sort of. We need to come up with different questions, ones that guide us either towards a better dataset or towards the features we should develop further in our search.
Rather than predict, let’s ask some analytical questions that this dataset can answer. The most important one is who is selling the most. Let’s also correlate the number of games each publisher has released with its revenue; this will tell us whether developing a lot of games is a predictor of making money (yes, this is a deeply flawed question, since the answer will always be that more games result in more revenue; just stay with me for a while). First, let’s get the top 20 publishers of games and plot the overall revenue for each of them.
The correlation between the number of games produced and overall sales is about 0.61, suggesting, unsurprisingly, that producing more games results in more sales. However, the correlation isn’t especially strong, which suggests that a company can also succeed with a quality-over-quantity strategy.
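For readers who want to see the calculation spelled out, here is a rough sketch of the per-publisher aggregation and the correlation. It assumes the Pearson coefficient and computes it over every publisher in the input; the article does not say which coefficient or which subset of publishers was used. The `(publisher, sales)` pairs are invented toy data, so the number it prints has nothing to do with the 0.61 above.

```python
import math
from collections import defaultdict

# Toy (publisher, global_sales) pairs, invented for illustration only.
GAMES = [
    ("Pub A", 5.0), ("Pub A", 3.0), ("Pub A", 1.0),
    ("Pub B", 6.0), ("Pub B", 2.0),
    ("Pub C", 4.0),
]


def publisher_stats(games):
    """Return {publisher: (number_of_games, total_sales)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for publisher, sales in games:
        counts[publisher] += 1
        totals[publisher] += sales
    return {p: (counts[p], totals[p]) for p in counts}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


stats = publisher_stats(GAMES)

# Publishers ranked by total sales; the slice keeps at most the top 20.
top = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)[:20]
print([name for name, _ in top])

# Correlation between games released and total sales, over all publishers.
n_games = [count for count, _ in stats.values()]
total_sales = [total for _, total in stats.values()]
r = pearson(n_games, total_sales)
print(round(r, 2))
```

Because total sales is a sum over games, some positive correlation with the number of games is built in, which is the flaw the question already admits to.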
Next, let’s understand which types of games, and which platforms, generate the most revenue. First, let’s look at the platforms. Remember that we don’t fully believe this dataset; we think the results might point in the right direction for someone who wants to look further.
Finally, let’s plot the genres as a function of sales. This will give us information about what types of games used to be high sellers.
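The platform and genre breakdowns both come down to the same grouping step, so one small helper can serve both. The sketch below assumes columns named `Platform`, `Genre` and `Global_Sales`; the rows are invented toy data, and reporting each category as a share of total sales is a choice made here for readability, not necessarily what the original figures show.

```python
from collections import defaultdict

# Toy rows, invented for illustration only.
GAMES = [
    {"Platform": "Wii", "Genre": "Sports", "Global_Sales": 10.0},
    {"Platform": "Wii", "Genre": "Action", "Global_Sales": 2.0},
    {"Platform": "PS2", "Genre": "Action", "Global_Sales": 4.0},
    {"Platform": "PS2", "Genre": "Racing", "Global_Sales": 4.0},
]


def sales_share_by(rows, column):
    """Return [(category, share_of_total_sales)], largest share first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row["Global_Sales"]
    grand_total = sum(totals.values())
    shares = {key: value / grand_total for key, value in totals.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


platform_shares = sales_share_by(GAMES, "Platform")
genre_shares = sales_share_by(GAMES, "Genre")
for label, shares in (("Platform", platform_shares), ("Genre", genre_shares)):
    print(label, [(name, round(share, 2)) for name, share in shares])
```

Shares make it easier to compare categories at a glance, but they hide the absolute totals, so it is worth keeping both around when plotting.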
Based on our limited exploration, we would want to look further into the development that Nintendo and EA are doing on action games.
Okay, this article hasn’t really given us anything groundbreaking or informative. Note that it’s part of a Udacity Nanodegree in Data Science and should be approached as such. Besides, it’s my first article and I am still getting used to the process and structure.
The code to generate the figures is available on my GitHub page and open to all who think it might help. There is also more technical material there.
Thank you for your time and have a great day.
The dataset is available at Kaggle:
- https://www.kaggle.com/gregorut/videogamesales
The relevant code is in my git repo:
- git@github.com:CuDeA0/udnanoDSC01.git
This post was written as part of the Udacity Nanodegree series, part 1 (introduction). For information about the course, go to Udacity.
For any other questions, feel free to contact me via:
- cudea0@gmail.com