
An automated dashboard tracking LA Dodgers performance, continuously updated with game-by-game data from 1958 to today. Feeds https://dodgersdata.bot.


LA Dodgers team tracker

This repository feeds Dodgers Data Bot, a statistical dashboard about the LA Dodgers' performance.

The code executes an automated workflow to fetch, process and store the team's current standings along with historical game-by-game records dating back to 1958. It also collects batting and pitching data, among other statistics, for the same period.

These records are processed and used to bake out the site using the Jekyll static site generator, in concert with GitHub Pages, and D3.js for charts.

The data is sourced from the heroes at Baseball Reference and Baseball Savant and consolidated into unified datasets for analysis and visualization purposes only. The resulting site is a non-commercial fan project.

Automated tweets

In addition to the data processing scripts, the repository contains scripts that generate and post daily updates to an account on Twitter, @DodgersDataBot.

  • Daily summaries: The scripts/23_post_daily_summaries.py script fetches the latest team summary data and posts tweets about the team's overall performance, batting and pitching statistics. This is automated by the .github/workflows/post_summaries.yml workflow, which runs at different times throughout the day to provide timely updates.
  • Lineup and pitching matchup: The scripts/17_fetch_lineup.py script fetches the daily starting lineup and tweets the pitching matchup once it's announced. This is automated by the .github/workflows/tweet_lineup.yml workflow.
  • News roundup: The scripts/24_fetch_news.py script fetches the top Dodgers-related headlines from the LA Times, Dodgers Nation and MLB.com. It then formats these into a single tweet. This is automated by the .github/workflows/post_news.yml workflow, which runs every day at 1 p.m. PT.
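
As a rough illustration of the news roundup step, the sketch below joins headlines into a single post that fits a character limit. The function name `build_roundup`, the header text, the 280-character default and the placeholder headlines are all assumptions for illustration; the real scripts/24_fetch_news.py may format its tweet differently, and this plain character count ignores Twitter's own link-length rules.

```python
def build_roundup(headlines, limit=280):
    """Join (source, title) pairs into one post, stopping before the limit is exceeded."""
    lines = ["Dodgers news roundup:"]  # hypothetical header text
    for source, title in headlines:
        candidate = f"- {title} ({source})"
        # +1 accounts for the newline that would join this line to the rest.
        if len("\n".join(lines)) + 1 + len(candidate) > limit:
            break
        lines.append(candidate)
    return "\n".join(lines)


# Placeholder headlines, not real stories.
sample = [
    ("LA Times", "Placeholder headline one"),
    ("Dodgers Nation", "Placeholder headline two"),
    ("MLB.com", "Placeholder headline three"),
]
post = build_roundup(sample)
```

Stopping at the first headline that does not fit keeps the sources in their original order rather than skipping ahead to shorter items.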

How it works

The repository includes numerous Python scripts that run daily to fetch and process team standings, batting and pitching data by season.

Scripts:

  • League standings (reference for rankings): scripts/00_fetch_league_standings.py
  • Update Savant boxscores archive (discovers new games, fetches only new finals): scripts/02_update_boxscores_archive.py
  • League ranks (scraped): scripts/03_scrape_league_ranks.py
  • Latest and historical standings: scripts/04_fetch_process_standings.py
  • Team batting (figures and league ranks): scripts/05_fetch_process_batting.py
  • Team pitching (figures and league ranks): scripts/06_fetch_process_pitching.py
  • Dashboard summary statistics: scripts/07_create_toplines_summary.py
  • Team post-season history: scripts/08_fetch_process_season_outcomes.py
  • Run differential for current season (from Savant boxscores): scripts/09_build_wins_losses_from_boxscores.py
  • Past/present team batting performance: scripts/10_fetch_process_historic_batting_gamelogs.py
  • Team attendance (all teams): scripts/11_fetch_process_attendance.py
  • Past/present team pitching performance: scripts/12_fetch_process_historic_pitching_gamelogs.py
  • Team schedule: scripts/13_fetch_process_schedule.py
  • MLB batting (league-level tables): scripts/14_fetch_process_batting_mlb.py
  • xwOBA rolling windows (current season): scripts/15_fetch_xwoba.py
  • Shohei Ohtani season data: scripts/16_fetch_shohei.py
  • Roster: scripts/19_fetch_roster.py
  • Game pitch-by-pitch: scripts/20_fetch_game_pitches.py
  • Pitch summaries: scripts/21_summarize_pitch_data.py

Separate tweet/automation scripts (lineups, daily summaries, news, etc.) are documented in the "Automated tweets" and "GitHub Actions workflow" sections.

What they do:

  1. Fetch current season, batting and pitching data: Downloads the current season's game-by-game standings for the LA Dodgers from Baseball Reference. The latest season's batting statistics for each player are also fetched, as are the latest season's pitching statistics for each pitcher and the team as a whole. A to-date season summary with standings information and major batting statistics is also created.
  2. Process data: Cleans and formats the fetched standings and batting data for consistency with the historical dataset.
  3. Concatenate with historic data: Merges the current season's data for batting and standings with pre-existing datasets containing records for the 1958 to 2023 seasons.
  4. Save and export data: Outputs the combined datasets in CSV, JSON and Parquet formats.
  5. Upload to AWS S3: Uploads the files to an AWS S3 bucket for use and archiving.
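
The process, concatenate and export steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the repository's code: the helper names (`clean_row`, `combine`, `export`), the reduced column set, the de-duplication on (year, gm) and the sample rows are all made up here. Parquet output and the S3 upload are left out because they need third-party packages.

```python
import csv
import json
from pathlib import Path


def clean_row(raw):
    """Normalize one fetched game row to the (assumed) historical schema."""
    return {
        "gm": int(raw["gm"]),
        "game_date": raw["game_date"],
        "result": raw["result"].strip().upper()[:1],  # "w " -> "W"
        "r": int(raw["r"]),
        "ra": int(raw["ra"]),
        "year": str(raw["year"]),
    }


def combine(historic, current):
    """Append current rows to historic rows, keeping one row per (year, gm)."""
    merged = {(row["year"], row["gm"]): row for row in historic}
    # Later rows win, so re-running the fetch does not duplicate games.
    merged.update({(row["year"], row["gm"]): row for row in current})
    return sorted(merged.values(), key=lambda row: (row["year"], row["gm"]))


def export(rows, out_dir, stem):
    """Write the combined rows as CSV and JSON and return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    json_path.write_text(json.dumps(rows, indent=2))
    return csv_path, json_path


# Made-up sample rows for illustration only.
historic = [
    {"gm": 1, "game_date": "2023-03-30", "result": "W", "r": 8, "ra": 2, "year": "2023"},
]
fetched = [
    {"gm": "1", "game_date": "2024-03-20", "result": "w ", "r": "5", "ra": "2", "year": 2024},
]
combined = combine(historic, [clean_row(row) for row in fetched])
```

Calling `export(combined, "data", "standings")` would then write data/standings.csv and data/standings.json; both the directory and the file stem are hypothetical.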

GitHub Actions workflow

The repository uses GitHub Actions to automate the execution of the scripts each day, ensuring the datasets remain up-to-date throughout the baseball season. The key workflows include:

  • fetch.yml: This is the main data pipeline, running multiple times a day during the season. It executes all the Python scripts responsible for fetching, processing, and saving the core team and player statistics needed to build the site.
  • build_site.yml: Fetches and processes all core team data (standings, batting, pitching, etc.) and rebuilds the site. Runs daily.
  • post_summaries.yml: Posts statistical summaries to Twitter at 8am, 10am, and 12pm PT.
  • tweet_lineup.yml: Checks hourly for the day's lineup and posts the pitching matchup to Twitter once available.
  • post_news.yml: Fetches and posts a news roundup to Twitter at 1pm PT.
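
GitHub Actions evaluates `schedule` cron expressions in UTC, so Pacific times like those listed above have to be translated, and the offset shifts with daylight saving time. The sketch below shows one way to work out the UTC hour for a given date. The function name `cron_for_pacific_hour` is hypothetical, and how the repository's workflow files actually handle the offset is not shown here.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def cron_for_pacific_hour(hour, on_date):
    """Return a daily UTC cron expression matching `hour`:00 Pacific on `on_date`."""
    local = datetime.combine(on_date, time(hour), tzinfo=PACIFIC)
    utc = local.astimezone(timezone.utc)
    return f"0 {utc.hour} * * *"


# 8 a.m. PT during daylight saving time (UTC-7) and standard time (UTC-8).
summer = cron_for_pacific_hour(8, date(2024, 7, 1))
winter = cron_for_pacific_hour(8, date(2024, 1, 15))
```

Because a cron expression is fixed, a schedule tuned for the regular season will fire an hour earlier by the Pacific clock once standard time returns.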

Configuration and usage

To use this repository for your own tracking or analysis of the Dodgers or another team, follow these steps:

  1. Fork the repository: Create a copy of this repository under your own GitHub account.
  2. Configure secrets: Add the following secrets to your repository settings for secure AWS S3 uploads (optional):
    • AWS_ACCESS_KEY_ID: Your AWS Access Key ID.
    • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key.
  3. Adjust the scripts (optional): Modify the Python scripts as necessary to fit your specific team, data processing or analysis needs.
  4. Monitor actions: Check the "Actions" tab in your GitHub repository to see the workflow executions and ensure data is being updated as expected.
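
Because the S3 upload is optional, a script can check for the two secrets before attempting it. The sketch below is an assumption about how such a guard might look, not the repository's code; the function names are hypothetical. Note that GitHub Actions only exposes secrets to a step when the workflow maps them into its environment.

```python
import os

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_secrets(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def s3_upload_enabled(env=None):
    """Treat missing credentials as 'skip the upload' rather than an error."""
    return not missing_aws_secrets(env)


# Example with an explicit mapping instead of the real environment.
partial = {"AWS_ACCESS_KEY_ID": "example-id"}
missing = missing_aws_secrets(partial)
```

Passing a plain dictionary keeps the check easy to test without touching real credentials.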

Data storage and access

The processed datasets — which aren't all documented below yet — are uploaded to an AWS S3 bucket.

Standings

Latest season summary

Data structure: Each row represents a statistic for the latest point in the season

Stat Value Category
wins 15 standings
losses 11 standings
record 15-11 standings
win_pct 57% standings
win_pct_decade_thispoint 57% standings
runs 139 standings
runs_against 112 standings
run_differential 27 standings
home_runs 30 batting
home_runs_game 1.15 batting
home_runs_game_last 1.54 batting
home_runs_game_decade 1.36 batting
stolen_bases 16 batting
stolen_bases_game 0.62 batting
stolen_bases_decade_game 0.49 batting
batting_average .268 batting
batting_average_decade .253 batting
summary The Dodgers have played 26 games this season compiling a 15-11 record — a winning percentage of 57%. The team's last game was an 11-2 away win to the WSN in front of 26,298 fans. The team has won 5 of its last 10 games. standings
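
Several values in the table above follow directly from the raw counts. The sketch below recomputes them from the sample figures; the function name `season_toplines` is hypothetical, and truncating the winning percentage to a whole number is an assumption that happens to reproduce the 57% shown for a 15-11 record.

```python
def season_toplines(wins, losses, runs, runs_against, home_runs, stolen_bases):
    """Derive per-game and differential figures from season-to-date counts."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played yet")
    return {
        "record": f"{wins}-{losses}",
        # Truncated, not rounded (assumption): 15/26 is 57.69%, shown as 57%.
        "win_pct": f"{int(wins / games * 100)}%",
        "run_differential": runs - runs_against,
        "home_runs_game": round(home_runs / games, 2),
        "stolen_bases_game": round(stolen_bases / games, 2),
    }


# Counts taken from the sample table above.
toplines = season_toplines(
    wins=15, losses=11, runs=139, runs_against=112, home_runs=30, stolen_bases=16
)
```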

Game-by-game standings, 1958 to present (10,400+ rows):

Data structure: Each row represents a game in a specific season

column_name column_type column_description
gm int64 Game number of season
game_date datetime64[ns] Game date (%Y-%m-%d)
home_away object Game location ("home" vs. "away")
opp object Three-letter opponent abbreviation
result object Dodgers result ("W" vs. "L")
r int64 Dodgers runs scored
ra int64 Runs allowed by Dodgers
record object Dodgers season record after game
wins int64 Dodgers wins after game
losses int64 Dodgers losses after game
win_pct float64 Dodgers winning percentage after game
rank object Rank in division*
gb float64 Games back in division*
time object Game length
time_minutes int64 Game length, in minutes
day_night object Start time: "D" vs. "N"
attendance int64 Home team attendance
year object Season year

* Before divisional reorganization in the National League in 1969, these figures represented league standings.
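
Some columns in this table can be derived from others. The sketch below shows how `time_minutes` might be computed from `time`, and how `wins`, `losses`, `record` and `win_pct` accumulate from `result`. It assumes `time` is an "H:MM" string and that `win_pct` is stored as a fraction; the function names are hypothetical.

```python
def time_to_minutes(game_time):
    """Convert an "H:MM" game length such as "2:45" to whole minutes."""
    hours, minutes = game_time.split(":")
    return int(hours) * 60 + int(minutes)


def running_record(results):
    """Yield cumulative wins, losses, record and win_pct after each game."""
    wins = losses = 0
    for result in results:
        if result == "W":
            wins += 1
        elif result == "L":
            losses += 1
        decided = wins + losses
        yield {
            "wins": wins,
            "losses": losses,
            "record": f"{wins}-{losses}",
            "win_pct": round(wins / decided, 3) if decided else 0.0,
        }


# Illustrative three-game sequence.
after_three = list(running_record(["W", "L", "W"]))[-1]
```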

Batting

Season-by-season batting statistics, by player, 1958 to present:

Data structure: Each row represents a player in a specific season

column_name column_type column_description
rk object Rank order at output
pos object Position
name object Player name
age object Player age on June 30
g int64 Game appearances
pa int64 Plate appearances*
ab int64 At-bats*
r int64 Runs scored
h int64 Hits
2b int64 Doubles
3b int64 Triples
hr int64 Home runs
rbi int64 Runs batted in
sb int64 Stolen bases
cs int64 Caught stealing
bb int64 Walks
so int64 Strikeouts
ba float64 Batting average
obp float64 On-base percentage
slg float64 Slugging percentage
ops float64 OBP + SLG
ops_plus float64 OPS adjusted to player's home park
tb int64 Total bases
gdp int64 Double plays grounded into
hbp int64 Hit by pitch
sh int64 Sacrifice hits
sf int64 Sacrifice flies
ibb int64 Intentional walks
season object Season
bats object Player's batting side (right, left, unknown)

* An at-bat is when a player reaches base via a fielder's choice, hit or an error — not including catcher's interference — or when a batter is put out on a non-sacrifice. A plate appearance refers to each completed turn batting, regardless of the result.
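
The rate columns in this table follow from the counting columns using the standard formulas. The sketch below is an illustration with an invented stat line; it is not taken from the repository, and the function names are hypothetical.

```python
def total_bases(h, doubles, triples, hr):
    """Singles count once, doubles twice, triples three times, homers four."""
    singles = h - doubles - triples - hr
    return singles + 2 * doubles + 3 * triples + 4 * hr


def rate_stats(ab, h, bb, hbp, sf, tb):
    """Return ba, obp, slg and ops rounded to three decimals."""
    ba = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return {
        "ba": round(ba, 3),
        "obp": round(obp, 3),
        "slg": round(slg, 3),
        "ops": round(obp + slg, 3),
    }


# Invented stat line for illustration only.
tb = total_bases(h=150, doubles=30, triples=5, hr=20)
line = rate_stats(ab=500, h=150, bb=50, hbp=5, sf=5, tb=tb)
```

Note that sacrifice flies count in the on-base percentage denominator but are not at-bats, which is why OBP can occasionally be lower than batting average.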

Other current season player batting statistics:

Season-by-season batting at the team level, 1958 to present:

  • How the team ranks or ranked in the league by season

Data structure coming soon

  • Team aggregates by season for major batting stats: hits, homers, strikeouts, etc.

Data structure coming soon

xwOBA (current season)

Back end

  • scripts/15_fetch_xwoba.py fetches rolling 100 plate appearance xwOBA series per batter from Baseball Savant
  • Filters to a maintained allowlist of regular batters (ALLOWED_BATTERS) and normalizes names to match roster output
  • Emits player names as "First Last"
  • Writes outputs and uploads to S3
    • Current timeseries per allowed batter
    • League average xwOBA snapshot
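
The allowlist filter and name normalization described above might look roughly like this. The contents of `ALLOWED_BATTERS`, the assumption that incoming names arrive as "Last, First", the helper names and the xwOBA values are all illustrative rather than taken from scripts/15_fetch_xwoba.py.

```python
# Illustrative subset; the real allowlist is maintained in the script.
ALLOWED_BATTERS = {"Mookie Betts", "Freddie Freeman"}


def normalize_name(raw):
    """Turn "Last, First" into "First Last"; only tidy whitespace otherwise."""
    if "," not in raw:
        return " ".join(raw.split())
    last, first = (part.strip() for part in raw.split(",", 1))
    return f"{first} {last}"


def filter_allowed(rows, allowed=ALLOWED_BATTERS):
    """Keep rows for allowlisted batters, with names rewritten as "First Last"."""
    kept = []
    for row in rows:
        name = normalize_name(row["player_name"])
        if name in allowed:
            kept.append({**row, "player_name": name})
    return kept


# Made-up rows; the xwoba values are placeholders.
rows = [
    {"player_name": "Betts, Mookie", "xwoba": 0.400},
    {"player_name": "Player, Example", "xwoba": 0.300},
]
regulars = filter_allowed(rows)
```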

Columns (primary)

column_name description
rn Rolling window rank from most recent to oldest
rn_fwd Same as rn, preserved for plotting
xwoba Expected wOBA for the rolling window
player_name Batter name in "First Last"
player_id Savant player id
max_game_date Last game date in the window (Pacific time)
league_avg_xwoba MLB average xwOBA used for comparison

Front end

  • assets/js/dashboard.js reads dodgers_xwoba_current.json and renders a grid of small multiples on index.markdown
  • Each panel plots xwoba over up to the last 100 plate appearances (x axis from 100 → 1)
  • The title shows an up or down arrow colored against MLB average, and a dashed reference line marks the average; the label includes a halo for legibility

Pitching

Shohei Ohtani's pitches (current season):

Data structure: Each row represents a single pitch thrown by Shohei Ohtani

column_name column_description
x Horizontal location of the pitch (feet)
z Vertical location of the pitch (feet)
vel Pitch velocity (mph)
pitch_type_abbr Two-letter pitch type abbreviation (e.g., FF, ST)
gd Game date and timestamp
pid Unique pitch identifier

Shohei Ohtani's pitch mix (current season):

Data structure: Each row represents a pitch type in his arsenal

column_name column_description
pitchType Two-letter abbreviation
name Full name of the pitch type
percent Usage percentage
count Total number of pitches thrown
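
The pitch mix table can be thought of as an aggregation of the pitch-level rows described earlier. The sketch below counts pitches by `pitch_type_abbr` and computes usage percentages. The `PITCH_NAMES` lookup is a partial, assumed mapping, the function name `pitch_mix` is hypothetical, and the sample pitches are invented.

```python
from collections import Counter

# Partial lookup for illustration; unknown abbreviations fall back to themselves.
PITCH_NAMES = {"FF": "4-Seam Fastball", "ST": "Sweeper"}


def pitch_mix(pitches):
    """Summarize pitch-level rows into one row per pitch type, most used first."""
    counts = Counter(pitch["pitch_type_abbr"] for pitch in pitches)
    total = sum(counts.values())
    return [
        {
            "pitchType": abbr,
            "name": PITCH_NAMES.get(abbr, abbr),
            "percent": round(100 * count / total, 1),
            "count": count,
        }
        for abbr, count in counts.most_common()
    ]


# Invented pitches: three four-seamers and one sweeper.
sample_pitches = [
    {"pitch_type_abbr": "FF", "vel": 98.1},
    {"pitch_type_abbr": "FF", "vel": 97.4},
    {"pitch_type_abbr": "ST", "vel": 85.0},
    {"pitch_type_abbr": "FF", "vel": 99.0},
]
mix = pitch_mix(sample_pitches)
```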

Current season pitching:

  • Team aggregates for major pitching stats: runs, ERA, etc.

  • Team's league ranking for major pitching stats: runs, ERA, etc.

Data structure coming soon


Notes

This project started as a few scrapers and has since outgrown its original documentation. More to come soon. If you have questions in the meantime, please let me know.

Contributions

Contributions, suggestions and enhancements are welcome! Please open an issue or submit a pull request.

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.
