Here is a project I did involving the collection of Data via an API for the massive video game DotA 2.
The field of statistics is broad. It is nearly impossible to find a single facet of life that can’t have statistical analysis applied to it. In my years of studying statistical analysis, I have also spent a fair amount of time playing a video game known as Defense of the Ancients 2, or DotA 2.
This game features a ton, so I wont go into detail, but the basic premise is a 5 VS 5 battle arena. 5 real people play as one of over 100 unique heroes vs a team of 5 other real people with their respective heroes.
The goal? Destroy the enemies base and their Ancient, a structure that gives them their power.
With so many variables and variations possible, I thought it would be a fun project to do a bit of analysis on my personal data. I have played thousands of games over the years. That is a lot of Data that I can perform analysis on.
This is made possible with OpenDota. Using the API function, I am able to get all my personal data from the beginning of when I started to play.
I start by importing Pandas and requests. The requests package simplifies working with https requests. The resulting files can then be read into pandas DataFrames with the pandas.read_json() function.
The data we requested is in Json form, which is similar to a dictionary. Pandas included function parses through it and makes a tidy DataFrame from it.
import pandas as pd
import requests
r = requests.get('https://api.opendota.com/api/players/96974530/matches')
df = pd.read_json(r.text)
df
So now we have the raw data from every match I have ever played in. I don’t want to dive too deep into what is possible, so why don’t I just compare something like how long a match is when compared to the hero I played. I can do this many ways, but I will be making a boxplot of the duration column with specific hero_id column values.
I will first make a subset of the data frame with just where hero_id = 18 and 90. These ID numbers represent the Heroes Sven and Keeper of the Light (KotL). Then I will plot the duration of both and compare how long the games lasted when I chose one or the other.
subset = df.loc[df['hero_id'].isin([18, 90])]
plot = subset.boxplot(column=['duration'], by=['hero_id'])
plot.set_title('Game Duration Comparison')
plot.set_xlabel('Sven KotL')
plot.set_ylabel('Duration (minutes)')
plt.show()
We can see that there is not much difference between the two, but there is a slight increase in duration when I choose KotL compared to when I choose Sven. But this is just one very simple analysis you can do. OpenDota allows for deep analysis of many variables, and as a statistician who loves the game I want to be able to better understand how I play and improve.
There are a ton of other things you can perform data analysis on, and it helps when you do it with something you love! I would suggest you find something that means a lot to you that you can perform some form of statistical analysis on to better understand not only what you love, but how statistics is used all around us to make connections!
If you want to learn more about the game, go to the DotA 2 website