Scraping Steam User Reviews using Steam Web API and Python

Introduction

Gaming is an industry that has been gaining a lot of traction in recent years. You might be surprised by how often you see or hear about shows, trends and news related to gaming. Recently, Microsoft even announced that it would buy Activision Blizzard, one of the biggest gaming companies, for almost 70 billion dollars. Gaming has become an important part of technology, and people who are interested in tech should, in my opinion, be aware of its impact, influence and potential. That is why I created this blog: to show some ways we can explore this rapidly growing world.

What is Steam?

Steam is a service for distributing and buying video games, and it is probably the biggest platform of its kind. Steam also has a web API that lets us get data on its applications (mainly games) and its users. That API, and how to use it, is the focus of this blog.

Steam Data Gathering

There are multiple sources that show how to get data through the Steam API, but for this blog I used the following:

Before we explore the API, it is important that you first register with Steam to get an API key on this site:

Certain methods require a key to work, so getting one gives you access to more of the API.

Scope

This blog will only cover methods that let us learn more about user reviews, the applications, and the users behind them. Specifically, these are the methods and their documentation:

If you're having a hard time understanding the documentation, the site https://steamapi.xpaw.me/# can show how to execute certain methods in the API. I'll also dedicate a section to each method.

Getting app user reviews

User reviews are a way for us to gauge how people feel about a certain product or service, and they are usually the first thing users see. This makes them significant for both users and distributors, as they can decide the future of the product. Getting this data can be crucial for a business's success.

Using the Steam API method for getting reviews, we can do just that for most video games on the platform. This is done by simply visiting

https://store.steampowered.com/appreviews/<appid>?json=1

where appid is the unique identifier assigned to each application on the Steam platform. An application's appid can easily be found by searching for the application on

https://steamdb.info/

Searching for Elden Ring there returns a lot of data, but near the top of the page you'll see its App ID, 1245620.

Changing <appid> to Elden Ring's appid then results in:

https://store.steampowered.com/appreviews/1245620?json=1

Clicking on the link will show you a bunch of reviews for Elden Ring. Looking closely, however, you can see that there are far fewer reviews than on Steam's review page (300,000+). This is because the API only returns 20 reviews at a time by default, and you must pass the 'cursor' as a parameter to get the next set of reviews. Let's show this through Python.

 

We'll use a helper function throughout the blog to send requests to the API. This function calls itself recursively until we get a response from the API. sleep_time introduces a degree of randomness between attempts so that the API sees us as a human rather than a computer. If the API detects that we've been using it too frequently, like a bot, it may block us from accessing it for a time.

Also make sure to provide your API key, as we're going to use it to access certain API methods. Here, I declared it as MY_API_KEY. You can also store it somewhere else and load it into a variable for more security.
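
A minimal sketch of such a helper could look like the following. It is an illustration rather than a definitive implementation: it uses Python's standard urllib instead of a third-party HTTP client, it caps the recursion with a max_retries argument instead of retrying forever, and the names sleep_range and opener, as well as the 1 to 5 second pause, are assumptions made for this example.

```python
import json
import random
import time
import urllib.parse
import urllib.request

MY_API_KEY = "YOUR_API_KEY"  # placeholder; load your real key from a safe place


def get_request(url, parameters=None, max_retries=5, sleep_range=(1.0, 5.0),
                opener=urllib.request.urlopen):
    """Return the decoded JSON from url, retrying recursively on failure."""
    query = urllib.parse.urlencode(parameters or {})
    full_url = f"{url}?{query}" if query else url
    try:
        with opener(full_url, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers network and HTTP errors raised by urllib;
        # ValueError covers a response body that is not valid JSON.
        if max_retries <= 0:
            raise
        # Pause for a random amount of time so the calls are not evenly spaced.
        sleep_time = random.uniform(*sleep_range)
        time.sleep(sleep_time)
        return get_request(url, parameters, max_retries - 1, sleep_range, opener)
```

Calling get_request("https://store.steampowered.com/appreviews/1245620", {"json": 1}) would then return the first page of Elden Ring reviews as a dictionary.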

A second function formats the reviews we receive and provides a column for each piece of author data.
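
As a rough sketch of that formatting step, the function below flattens each review so that the nested author fields become top-level columns. The function name format_reviews and the author_ prefix are assumptions made for this example. It returns plain dictionaries, which can be passed to something like pandas.DataFrame if you prefer a table.

```python
def format_reviews(reviews):
    """Flatten each review so every author field becomes its own column."""
    rows = []
    for review in reviews:
        # Copy every top-level field except the nested author dictionary.
        row = {key: value for key, value in review.items() if key != "author"}
        # Promote each author field to a column such as author_steamid.
        for key, value in review.get("author", {}).items():
            row[f"author_{key}"] = value
        rows.append(row)
    return rows
```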


To get all the reviews for the game "Elden Ring", I set certain parameters and assigned them to the dictionary params. You can check the documentation to understand these or explore other parameters here: https://partner.steamgames.com/doc/store/getreviews. Basically, with these params, I'm getting all the reviews without any filter (filter, review_type, purchase_type = all) from the release of the game until the current day (day_range).

cursor dictates the next set of reviews I can explore when it is set as a parameter. Every set of reviews has a single, unique cursor value. I keep track of all the cursors in the explored_cursors list, and once one repeats, meaning I've explored all the reviews, I stop the process.

The 'num_per_page' parameter seems to only work once you have a cursor value. cursor = False is just a placeholder value.
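
The cursor loop described above can be sketched roughly as follows. This is an illustration under a few assumptions: fetch stands for any callable that takes a URL and a dictionary of parameters and returns the parsed JSON (such as the helper function described earlier), the first request uses "*" as the cursor, which is the starting value given in the getreviews documentation, and max_pages is a hypothetical safety limit added for this example.

```python
def get_all_reviews(appid, fetch, params=None, max_pages=None):
    """Collect reviews page by page until a cursor repeats or a page is empty."""
    url = f"https://store.steampowered.com/appreviews/{appid}"
    params = dict(params or {})
    params.setdefault("json", 1)
    params.setdefault("cursor", "*")  # documented cursor for the first page

    reviews = []
    explored_cursors = set()
    pages = 0
    while params["cursor"] not in explored_cursors:
        if max_pages is not None and pages >= max_pages:
            break
        explored_cursors.add(params["cursor"])
        response = fetch(url, params)
        batch = response.get("reviews", [])
        if not batch:
            break  # nothing left to read
        reviews.extend(batch)
        params["cursor"] = response["cursor"]  # points at the next page
        pages += 1
    return reviews


# Illustrative parameters; add "day_range" as described above.
EXAMPLE_PARAMS = {
    "filter": "all",
    "review_type": "all",
    "purchase_type": "all",
    "num_per_page": 100,
}
```

Using a set for explored_cursors instead of a list keeps the repeat check fast even after thousands of pages.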

Getting Owned Games by Players

Now we have data on Elden Ring's reviews. We can, however, expand on this data by getting an idea of what type of user writes them. One of the fields returned by the appreviews method is 'steamid', which is a unique identifier for each Steam user. We can explore other methods, like GetOwnedGames, to get a better idea of what games each user prefers and provide more context for their review.

This data is not publicly available by default, but users can choose to make it so, and this method only returns the games of users who have done that.

Before moving forward, however, I took only the steamids of those who gave a negative review (voted_up = false). Elden Ring has a lot of reviews, and doing this narrows the search space.

As we're using a different method of the API this time, the parameters change with it. To understand them better, make sure to check the documentation here: https://partner.steamgames.com/doc/webapi/IPlayerService#GetOwnedGames. The parameters essentially make sure that I get all the games in a user's account if they decided to show them.
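
A small sketch of this filtering step is shown below. It assumes the reviews are still in the shape returned by appreviews, where each review has a voted_up flag and a nested author dictionary holding the steamid. The function name is made up for this example.

```python
def negative_reviewer_ids(reviews):
    """Return the steamids of users whose review has voted_up set to False."""
    seen = set()
    steamids = []
    for review in reviews:
        if review["voted_up"]:
            continue  # keep only negative reviews
        steamid = review["author"]["steamid"]
        if steamid not in seen:  # avoid listing the same user twice
            seen.add(steamid)
            steamids.append(steamid)
    return steamids
```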

A try except clause skips the users who keep their games private; for everyone else, their games and all their features are added to games_list.

Only 1,044 of the 19,458 users, about 5.37%, returned a response when I tried this on the negative reviewers. Even so, that was enough to give me 530,000+ data entries on 23,000 unique games, which is still enough to provide some insight.
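
The loop described above might look roughly like this sketch. It assumes fetch is any callable that takes a URL and a dictionary of parameters and returns the parsed JSON, and that a private profile comes back without a "games" entry, which is what triggers the except branch. The function name and the choice of include_appinfo and include_played_free_games are assumptions made for this example.

```python
OWNED_GAMES_URL = "https://api.steampowered.com/IPlayerService/GetOwnedGames/v1/"


def get_owned_games(steamids, api_key, fetch):
    """Return one entry per (user, game) pair for users with visible games."""
    games_list = []
    for steamid in steamids:
        params = {
            "key": api_key,
            "steamid": steamid,
            "include_appinfo": 1,            # also return the game names
            "include_played_free_games": 1,  # include free games the user played
        }
        try:
            games = fetch(OWNED_GAMES_URL, params)["response"]["games"]
        except KeyError:
            continue  # private profile: no "games" entry in the response
        for game in games:
            # Keep every feature of the game and remember who owns it.
            games_list.append({"steamid": steamid, **game})
    return games_list
```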

Get Player Summary

This is just a method to help us get more data on the user. More data gives more context for user reviews. There are other methods for this, but this one provides simple, basic data, and I chose it so as not to overcomplicate things. You can read more about it here: https://partner.steamgames.com/doc/webapi/ISteamUser#GetPlayerSummaries

The good thing about this method is that you can get data on 100 users at a time, compared to GetOwnedGames, which handles just one at a time. We can take full advantage of that by requesting the steamids in batches.

The parameter "steamids" can accept at most 100 steamids at a time. They need to be in a comma-separated format, so that is how I pass them. The min and max variables just make sure that I get 100 or fewer entries at a time, starting from the last 100 entries.
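
One way to sketch this batching is shown below. It is an illustration only: it walks the list from the front using slices instead of min and max variables, and it assumes fetch is any callable that takes a URL and a dictionary of parameters and returns the parsed JSON. The function name and batch_size argument are made up for this example.

```python
PLAYER_SUMMARIES_URL = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v2/"


def get_player_summaries(steamids, api_key, fetch, batch_size=100):
    """Request player summaries in comma-separated batches of up to 100 ids."""
    players = []
    for start in range(0, len(steamids), batch_size):
        batch = steamids[start:start + batch_size]  # the last batch may be shorter
        params = {
            "key": api_key,
            "steamids": ",".join(str(steamid) for steamid in batch),
        }
        response = fetch(PLAYER_SUMMARIES_URL, params)
        players.extend(response["response"]["players"])
    return players
```

With 250 steamids, this makes three requests of 100, 100 and 50 ids.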

Get App Details

If you remember, earlier we got all the games from some of the users who gave a negative review. We can explore those games further using this method. To check the documentation, you can go to: https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI#appdetails

Before we begin, I first made sure to get only the unique appids of the games owned by the users, as the same games tend to appear across many of them.

With the unique appids in hand, we can request the appdetails for each of them.
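
A minimal sketch of that deduplication, assuming games_list is a list of dictionaries that each carry an "appid" key (the function name is made up for this example):

```python
def unique_appids(games_list):
    """Return every appid that appears in games_list once, in sorted order."""
    return sorted({game["appid"] for game in games_list})
```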

Let's start with the params. The only parameter that really stands out here is filters. It dictates which values the method should return, specified as comma-separated values. Here I specified that I wanted the basic values, such as name, description, etc., and then categories, genres, platforms and the release date.

Even though the documentation says it can take 100 appids at a time, it doesn't seem to work at times, as verified by multiple users. This also happened at the time of writing this blog, so I had to get the appdetails one at a time. You can, however, try to do it 100 at a time by mimicking the approach used for GetPlayerSummaries.

Keep in mind that certain apps, for some reason, do not have their details available through the API. That's why I wrapped the request in a try except clause. I also limited the applications to games only, but if you want to include others you can simply remove that part of the code. Certain games also do not have values for certain features, which makes me question their integrity, so I just skipped those too. You can decide for yourself how you want to go about this. Finally, I adjusted categories and genres to be shown as a list per game.
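
Put together, the steps above could be sketched like this for a single appid. The sketch assumes fetch is any callable that takes a URL and a dictionary of parameters and returns the parsed JSON, that the response is keyed by the appid as a string, and that the "type" field is part of the basic values. The function name and the list of required fields are assumptions made for this example.

```python
APP_DETAILS_URL = "https://store.steampowered.com/api/appdetails"
FILTERS = "basic,categories,genres,platforms,release_date"
REQUIRED_FIELDS = ("name", "categories", "genres", "platforms", "release_date")


def get_app_details(appid, fetch):
    """Return the details of one game, or None if it should be skipped."""
    params = {"appids": appid, "filters": FILTERS}
    try:
        data = fetch(APP_DETAILS_URL, params)[str(appid)]["data"]
    except (KeyError, TypeError):
        return None  # details are not available through the API
    if data.get("type") != "game":
        return None  # keep games only; remove this check to include other apps
    if any(field not in data for field in REQUIRED_FIELDS):
        return None  # skip games with missing features
    details = dict(data)
    # Show categories and genres as plain lists of descriptions.
    details["categories"] = [item["description"] for item in data["categories"]]
    details["genres"] = [item["description"] for item in data["genres"]]
    return details
```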

Conclusion

And there you have it! You have more than enough data to investigate user reviews on a game. With the proper techniques you can now answer questions such as "What makes this game good?", "How do we make this game better?", "Who would want this game?", etc. There are a lot of other ways you can go about this, so make sure to explore the data. You can even dig through the API to find even more angles on this problem.

Hopefully you found something useful by reading this, and if you have any feedback, I'd love to hear and learn from it! Thank you!

  • Hurly Zade Christian Cabalan
  • Mar 27, 2022
