This repository contains a Scrapy spider for reviews on the Steam digital game platform.
| Name | Email |
|---|---|
| Carlos Humberto Carreño Díaz | cahucadi@uoc.edu |
| David Barrera Montesdeoca | dbarreram@uoc.edu |

First, you will need a Python 3 virtualenv.

After cloning the repository with:

    git clone git@github.com:cahucadi/GamesScraping.git

install the Python requirements with:

    pip install -r requirements.txt

Next, locate the game_scraping/game_url.txt file and use it to define the URL you want to crawl on the Steam Community page. This file must contain the game URL with the game ID (APP_ID) and the language (LANGUAGE) of the reviews you want (english, spanish, latam, etc.), in the following format:

    https://steamcommunity.com/app/APP_ID/reviews/?browsefilter=mostrecent&snr=1_5_100010_&filterLanguage=LANGUAGE

You can start the crawl with:

    scrapy crawl review_spider -o reviews.json

Next, you can generate a semicolon-separated .csv file with:

    python main.py

Beware: processing can take several hours.
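
The URL format above can be filled in by hand, but a small helper makes it harder to mistype. The following is a minimal sketch, not part of the repository: the function name `build_review_url` is hypothetical, and the app ID `440` is only an example value.

```python
from urllib.parse import urlencode


def build_review_url(app_id, language):
    """Return a Steam Community review URL for one game and one language.

    Hypothetical helper: it only reproduces the URL format documented above.
    """
    query = urlencode({
        "browsefilter": "mostrecent",
        "snr": "1_5_100010_",
        "filterLanguage": language,
    })
    return f"https://steamcommunity.com/app/{app_id}/reviews/?{query}"


# Example values only; replace them with the game and language you need.
url = build_review_url(440, "english")
print(url)
```

The printed URL can then be pasted into game_scraping/game_url.txt before starting the crawl.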

Most important files:

- main.py: used for .csv generation once you have the reviews.json file
- game_scraping/classes.py: contains the project's main classes for the Scrapy item structure, derived from the scrapy.Item class
- functions.py: contains the project's main helper functions (formatting, parsing, cleaning)
- functions.py: contains the Scrapy default configuration
- game_scraping/review_spider.py: contains the project's main spider, derived from the scrapy.Spider class
- util_functions.py: contains the functions the spider needs

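
To illustrate the kind of conversion main.py is described as doing (reviews.json to a semicolon-separated .csv), here is a rough sketch using only the standard library. It is not the repository's implementation: the function name `reviews_to_csv` and the field names in the sample data (`user`, `text`, `hours`) are assumptions made for the example.

```python
import csv
import io
import json


def reviews_to_csv(reviews, out_file):
    """Write a list of review dicts to out_file as semicolon-separated CSV."""
    # Collect column names in first-seen order so rows with missing keys still fit.
    fieldnames = []
    for review in reviews:
        for key in review:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(
        out_file, fieldnames=fieldnames, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer.writerows(reviews)


# Hypothetical sample standing in for the contents of reviews.json.
sample_json = '[{"user": "alice", "text": "Great; fun"}, {"user": "bob", "hours": 12}]'
buffer = io.StringIO()
reviews_to_csv(json.loads(sample_json), buffer)
print(buffer.getvalue())
```

Because review text can itself contain semicolons, the sketch relies on the csv module's quoting rather than joining strings by hand.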
The dataset is available at Zenodo with DOI:
10.5281/zenodo.4244834
And published at:
http://doi.org/10.5281/zenodo.4244834