
luxdotdev/dataset


Overwatch Match Data Dataset

This dataset contains anonymized Overwatch match data including game events, player statistics, and match outcomes.

Download

All files are available in the Google Drive folder: Parsertime Anonymized 12-1

Dataset Contents

This Google Drive folder contains:

  • SQL dump file (ptime-pscale-prod-anonymized-2025-12-01.sql) - Complete database with all tables and data
  • CSV exports - Individual tables exported as CSV files for easy analysis in Excel, Python, R, etc.
  • 23 event tables containing in-game events (kills, hero swaps, ultimate usage, etc.)
  • Anonymized player and team identifiers for privacy protection
  • Match metadata including map types, round information, and in-match timestamps

Prerequisites

  • For SQL dump: PostgreSQL 17.5 (recommended) or 16+
  • For CSV files: Any spreadsheet software (Excel, Google Sheets) or programming language (Python, R, etc.)
  • Basic familiarity with SQL (for PostgreSQL option)

Which Format Should I Use?

Use the SQL dump if you:

  • Want to run complex SQL queries with JOINs across tables
  • Need to preserve relationships between tables
  • Are comfortable with PostgreSQL
  • Want the complete relational database structure

Use the CSV files if you:

  • Want quick access without database setup
  • Need to analyze individual tables
  • Prefer working with spreadsheets or data analysis libraries (pandas, R)
  • Want to import into other tools (Tableau, Power BI, etc.)

Quick Start

Option 1: PostgreSQL Database (Full Dataset)

Use this option if you want to run SQL queries and have the complete relational database.

# 1. Create a new database
createdb -h localhost -p 5432 -U your_username overwatch_data

# 2. Restore the SQL dump
PGPASSWORD=your_password psql \
  --no-psqlrc \
  -h localhost \
  -p 5432 \
  -U your_username \
  -d overwatch_data \
  -f ptime-pscale-prod-anonymized-2025-12-01.sql

Note: The --no-psqlrc flag is required to avoid backslash command restrictions during restore.
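Before restoring, you may want to check which tables the dump defines. The Python sketch below is not part of the dataset's tooling. list_dump_tables is a hypothetical helper that reads the plain-text dump one line at a time, so the whole file is never held in memory, and collects table names from CREATE TABLE lines. It assumes pg_dump's usual one-line CREATE TABLE schema."Name" ( layout.

```python
import re

# Matches pg_dump-style table definitions such as:
#   CREATE TABLE public."Kill_anon" (
# The schema prefix and the double quotes are both optional.
CREATE_TABLE_RE = re.compile(r'CREATE TABLE\s+(?:"?\w+"?\.)?"?(\w+)"?\s*\(')


def list_dump_tables(dump_path: str) -> list[str]:
    """Return table names defined in a plain-text SQL dump, in file order."""
    names = []
    # Read line by line so a large dump is not loaded into memory at once.
    with open(dump_path, encoding="utf-8") as f:
        for line in f:
            match = CREATE_TABLE_RE.match(line)
            if match:
                names.append(match.group(1))
    return names
```

For example, list_dump_tables('ptime-pscale-prod-anonymized-2025-12-01.sql') should include Kill_anon and the other event tables, provided the dump follows the layout assumed above.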

Option 2: CSV Files (Simplified Analysis)

Use this option if you want to analyze the data without setting up PostgreSQL. Each table is available as a separate CSV file that you can:

  • Open in Excel or Google Sheets
  • Load into Python with pandas: pd.read_csv('Kill_anon.csv', na_values=['\\N'])
  • Import into R: read.csv('Kill_anon.csv', na.strings = c('', '\\N'))
  • Use with any other data analysis tool

CSV files are simpler to work with, but they don't carry the relationships between tables. To combine tables, you join them yourself on shared columns such as scrimId.
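When you load several CSVs at once, a small helper keeps the NULL handling in one place. The sketch below is hypothetical, because load_tables does not ship with the dataset. It assumes that all CSV exports sit in one folder and that each file is named after its table (e.g. Kill_anon.csv). The helper matters because this README states that NULLs are written as \N, and pandas does not treat that token as missing by default.

```python
from pathlib import Path

import pandas as pd


def load_tables(csv_dir: str) -> dict[str, pd.DataFrame]:
    """Load every CSV in csv_dir into a DataFrame keyed by file stem.

    \\N is added to na_values so it becomes NaN. Empty fields are still
    treated as missing because keep_default_na stays True.
    """
    tables = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        tables[path.stem] = pd.read_csv(path, encoding="utf-8", na_values=["\\N"])
    return tables
```

With a hypothetical folder named csv, tables = load_tables('csv') followed by tables['Kill_anon'] gives the same frame as the single-file examples, with NULLs already converted.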

Database Schema

Event Types

The dataset includes the following event types:

  • Combat Events: Kill, DefensiveAssist, OffensiveAssist
  • Hero Events: HeroSpawn, HeroSwap
  • Ultimate Events: UltimateCharged, UltimateStart, UltimateEnd
  • Objective Events: ObjectiveCaptured, ObjectiveUpdated, PayloadProgress, PointProgress
  • Match Events: MatchStart, MatchEnd, RoundStart, RoundEnd, SetupComplete
  • Hero-Specific Events: DvaRemech, RemechCharged, MercyRez, EchoDuplicateStart, EchoDuplicateEnd
  • Statistics: PlayerStat (comprehensive player performance metrics)
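You can turn the category list above into a lookup and use it to tally event volume per category. The following sketch is hedged in two ways. First, it assumes each event type's table uses the event name, optionally with an _anon suffix (e.g. Kill_anon, MatchEnd). Second, rows_per_category is a hypothetical helper that accepts any mapping of table names to sized objects, such as pandas DataFrames or lists of CSV rows.

```python
# Event categories exactly as listed in this README (23 event types).
# Export table names may carry an "_anon" suffix, so both spellings are checked.
EVENT_CATEGORIES = {
    "Combat Events": ["Kill", "DefensiveAssist", "OffensiveAssist"],
    "Hero Events": ["HeroSpawn", "HeroSwap"],
    "Ultimate Events": ["UltimateCharged", "UltimateStart", "UltimateEnd"],
    "Objective Events": ["ObjectiveCaptured", "ObjectiveUpdated", "PayloadProgress", "PointProgress"],
    "Match Events": ["MatchStart", "MatchEnd", "RoundStart", "RoundEnd", "SetupComplete"],
    "Hero-Specific Events": ["DvaRemech", "RemechCharged", "MercyRez", "EchoDuplicateStart", "EchoDuplicateEnd"],
    "Statistics": ["PlayerStat"],
}


def rows_per_category(tables):
    """Sum row counts per category.

    tables maps table names to anything with a length. Event types with no
    matching table contribute nothing.
    """
    totals = {}
    for category, events in EVENT_CATEGORIES.items():
        total = 0
        for event in events:
            # Prefer the plain name, fall back to the anonymized variant.
            for name in (event, f"{event}_anon"):
                if name in tables:
                    total += len(tables[name])
                    break
        totals[category] = total
    return totals
```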

Key Tables

  • MatchStart_anon: Match initialization data with map and team information
  • MatchEnd: Final match scores and outcomes
  • Kill_anon: Combat elimination events with attacker/victim details
  • PlayerStat_anon: Detailed player statistics per round
  • HeroSwap_anon: Hero selection changes during matches
  • Ultimate*_anon: Ultimate ability tracking
  • And more...

Example Queries

Get all kills in a specific match

PostgreSQL:

SELECT
  match_time,
  attacker_hero,
  attacker_name,
  victim_hero,
  victim_name,
  event_ability
FROM "Kill_anon"
WHERE "scrimId" = 1234
ORDER BY match_time;

Python (pandas):

import pandas as pd

kills = pd.read_csv('Kill_anon.csv', na_values=['\\N'])  # \N marks NULL values in the CSV exports
match_kills = kills[kills['scrimId'] == 1234].sort_values('match_time')
print(match_kills[['match_time', 'attacker_hero', 'victim_hero', 'event_ability']])
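You can also turn the same Kill_anon data into a per-player kill/death tally for a single scrimId. The sketch below is illustrative only, and kill_death_counts is a hypothetical helper. It counts each player's appearances as attacker_name and as victim_name. It does not try to separate out special cases such as environmental kills.

```python
import pandas as pd


def kill_death_counts(kills: pd.DataFrame, scrim_id: int) -> pd.DataFrame:
    """Count kills (as attacker) and deaths (as victim) per player for one scrimId.

    Expects a DataFrame shaped like Kill_anon.csv; only the scrimId,
    attacker_name and victim_name columns are used.
    """
    match = kills[kills["scrimId"] == scrim_id]
    kills_by = match["attacker_name"].value_counts().rename("kills")
    deaths_by = match["victim_name"].value_counts().rename("deaths")
    # Outer-align on player so those who only died (or only killed) still appear.
    table = pd.concat([kills_by, deaths_by], axis=1).fillna(0).astype(int)
    table.index.name = "player_name"
    return table.sort_values("kills", ascending=False)
```

For example, print(kill_death_counts(kills, 1234)) after loading kills as above.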

Calculate hero usage (spawn counts and total time played)

PostgreSQL:

SELECT
  player_hero,
  COUNT(*) as spawn_count,
  SUM(hero_time_played) as total_time_played
FROM "HeroSpawn_anon"
GROUP BY player_hero
ORDER BY total_time_played DESC;

Python (pandas):

import pandas as pd

# \N marks NULL values in the CSV exports
hero_spawns = pd.read_csv('HeroSpawn_anon.csv', na_values=['\\N'])
# Named aggregation mirrors the SQL: COUNT(*) -> size, SUM(...) -> sum
hero_usage = (
    hero_spawns.groupby('player_hero')
    .agg(spawn_count=('player_hero', 'size'),
         total_time_played=('hero_time_played', 'sum'))
    .sort_values('total_time_played', ascending=False)
)
print(hero_usage)
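The totals above are raw sums. To get a rate-style figure, you can express each hero's time as a share of all time played. Using playtime share as a stand-in for pick rate is a choice made in this sketch, not a definition from the dataset. playtime_share is a hypothetical helper.

```python
import pandas as pd


def playtime_share(hero_spawns: pd.DataFrame) -> pd.Series:
    """Each hero's share of total hero_time_played, as a percentage.

    Expects a DataFrame shaped like HeroSpawn_anon.csv (player_hero and
    hero_time_played columns). Missing play times are skipped by sum().
    """
    per_hero = hero_spawns.groupby("player_hero")["hero_time_played"].sum()
    return (per_hero / per_hero.sum() * 100).sort_values(ascending=False)
```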

Get player statistics summary

PostgreSQL:

SELECT
  player_name,
  player_hero,
  eliminations,
  deaths,
  hero_damage_dealt,
  healing_dealt,
  ultimates_used
FROM "PlayerStat_anon"
WHERE "scrimId" = 1234
ORDER BY eliminations DESC;

Python (pandas):

import pandas as pd

stats = pd.read_csv('PlayerStat_anon.csv', na_values=['\\N'])  # \N marks NULL values in the CSV exports
match_stats = stats[stats['scrimId'] == 1234].sort_values('eliminations', ascending=False)
print(match_stats[['player_name', 'player_hero', 'eliminations', 'deaths', 'hero_damage_dealt']])
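As a simple derived metric, you can add a kill/death ratio from the eliminations and deaths columns. PlayerStat_anon holds statistics per round, so this sketch computes the ratio per row. It deliberately does not aggregate across rounds, because this README does not say whether the values are per-round increments or running totals. add_kd_ratio is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def add_kd_ratio(stats: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a PlayerStat_anon-shaped frame with a kd_ratio column.

    Rows with zero deaths get NaN rather than infinity, so they can be
    filtered or filled explicitly.
    """
    out = stats.copy()
    out["kd_ratio"] = out["eliminations"] / out["deaths"].replace(0, np.nan)
    return out
```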

Data Privacy

All personally identifiable information has been anonymized:

  • Player names are replaced with hashed identifiers (e.g., P_6af4f2c6)
  • Team names are replaced with hashed identifiers (e.g., T_5568bd9d)
  • Original timestamps and sensitive metadata have been removed
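Before publishing derived results, you may want to confirm that a name column contains only anonymized identifiers. The patterns in this sketch are inferred from the two examples above (P_6af4f2c6, T_5568bd9d): a prefix, an underscore, then eight lowercase hex digits. That length and that character set are assumptions. non_anonymized is a hypothetical helper.

```python
import re

# Inferred from the README examples; adjust if real identifiers differ.
PLAYER_ID = re.compile(r"P_[0-9a-f]{8}")
TEAM_ID = re.compile(r"T_[0-9a-f]{8}")


def non_anonymized(values, pattern):
    """Return the values that do not fully match the anonymized-ID pattern."""
    flagged = []
    for v in values:
        # Skip missing values: None, or float NaN as produced by pandas (NaN != NaN).
        if v is None or v != v:
            continue
        if not pattern.fullmatch(str(v)):
            flagged.append(v)
    return flagged
```

For example, non_anonymized(stats['player_name'], PLAYER_ID) should return an empty list if every name follows the assumed format.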

Troubleshooting

Error: "backslash commands are restricted"

Make sure to use the --no-psqlrc flag when restoring:

PGPASSWORD=password psql --no-psqlrc -h localhost -p 5432 -U user -d db -f ptime-pscale-prod-anonymized-2025-12-01.sql

Error: "type EventType does not exist" or "relation does not exist"

The SQL dump includes all necessary schema definitions. If you encounter these errors, ensure you're restoring to a fresh database:

# Drop and recreate the database
dropdb overwatch_data
createdb overwatch_data

# Then restore again
PGPASSWORD=password psql --no-psqlrc -h localhost -p 5432 -U user -d overwatch_data -f ptime-pscale-prod-anonymized-2025-12-01.sql

Connection refused errors

Ensure PostgreSQL is running and accessible:

# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Or check the service status
# On macOS: brew services list
# On Linux: systemctl status postgresql

Port conflicts

If port 5432 is already in use by another PostgreSQL instance, you can either:

  • Stop the other instance, or
  • Run your PostgreSQL on a different port (e.g., 5433), then pass that port to createdb and psql (for example, -p 5433 instead of -p 5432)
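To check whether a port is already taken before you pick one, you can use a standard-library probe such as the sketch below; it behaves the same on macOS and Linux. It only shows that something is accepting TCP connections. It cannot tell you that the listener is PostgreSQL, and pg_isready remains the proper check for that. port_accepts_connections is a hypothetical helper.

```python
import socket


def port_accepts_connections(host: str = "localhost", port: int = 5432, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        # create_connection tries every address the host resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False
```

For example, if port_accepts_connections(port=5432) returns True while your own server is stopped, another process holds the port.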

Working with CSV files

If you're having trouble with the SQL dump, the CSV files provide an easier alternative. Most issues with CSVs involve:

  • Encoding: Files are UTF-8 encoded
  • NULL values: Represented as empty fields or \N
  • Delimiters: Standard comma-separated format
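If you would rather not install pandas, Python's built-in csv module can handle these conventions. In this sketch, read_csv_rows is a hypothetical helper that maps both NULL spellings (empty field and \N) to None. All other values stay strings, so you need to convert numeric columns yourself.

```python
import csv


def read_csv_rows(path: str) -> list[dict]:
    """Read one exported CSV into a list of dicts.

    Empty fields and \\N both become None. Note that this also turns
    genuinely empty strings into None.
    """
    rows = []
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark;
    # newline="" is what the csv module expects for correct quote handling.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for row in csv.DictReader(f):
            rows.append({k: (None if v in ("", "\\N") else v) for k, v in row.items()})
    return rows
```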

File Formats

  • SQL Dump: PostgreSQL 17.5 plain text format, includes schema and data
  • CSV Files: UTF-8 encoded, comma-delimited, NULL values represented as \N

Dataset Statistics

  • Event tables: 23
  • Total events: 92,000+
  • Matches: 1,900+
  • PostgreSQL version: 17.5

License

Copyright 2025 lux.dev.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Questions or Issues?

If you encounter any problems with this dataset, please contact the maintainer at lucas@lux.dev.

Acknowledgments

This dataset contains anonymized competitive Overwatch match data collected for research and analysis purposes.

About

An anonymized version of the Parsertime Overwatch 2 dataset.
