VintageFIPS

"VintageFIPS: Harmonizing Historical County‑Level FIPS Codes."

A helper class for FIPS code crosswalks. Update specifications for the SEER and MCD data are attached (AK and HI are not included).

Overview

VintageFIPS provides a programmatic way to:

- Load a pandas DataFrame containing historical FIPS codes and associated data.
- Define county-level operations (split, merge, create, rename) via simple mapping specifications.
- Apply these operations to update FIPS codes and, optionally, redistribute numeric columns by specified weights.

This is particularly useful when working with longitudinal datasets where county boundaries or codes change over time.

API Reference

Class: `VintageFIPS`

VintageFIPS(
    vintage_df: pd.DataFrame,
    vintage_fips: str,
    fips_new_name: str = "fips2020"
)

- vintage_df: Your input DataFrame, including a column of historical FIPS codes and any numeric columns to redistribute.
- vintage_fips: The name of the column in vintage_df containing the original FIPS codes (5-character strings).
- fips_new_name: The name of the new column to create for updated FIPS codes (default: "fips2020").
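Because vintage_fips is expected to hold 5-character strings, codes that were read as integers (and so lost their leading zeros) need padding before the class is instantiated. The helper below is a minimal sketch of that preparation step; `to_fips5` is a hypothetical name, not part of VintageFIPS, and it assumes the column has no missing values.

```python
import pandas as pd


def to_fips5(codes: pd.Series) -> pd.Series:
    """Return codes as zero-padded 5-character strings (hypothetical helper).

    Assumes every value is a whole number or a numeric string, with no NaNs.
    """
    as_int = pd.to_numeric(codes).astype("int64")
    return as_int.astype(str).str.zfill(5)


# Toy frame where the FIPS column was parsed as integers
raw = pd.DataFrame({"fips": [4027, 12086, 1001], "population": [10, 20, 30]})
raw["fips"] = to_fips5(raw["fips"])
print(raw["fips"].tolist())  # ['04027', '12086', '01001']
```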

Method: `update_data`

update_data(
    fips_specs: List[Dict[str, Any]],
    apply_to_data: bool = False
) -> pd.DataFrame

fips_specs: A list of mapping dictionaries. Each spec must include:
- mode: one of "split", "merge", "create", or "rename".
- span: a dict mapping a span variable to the list of vintage values on which to apply this spec (e.g. { 'year': ['1980', '1981', ...] }); use an empty list for all periods.
- from: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing source codes and optional weights.
- to: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing target codes and weights.
apply_to_data: If True, numeric columns in vintage_df are multiplied by normalized weights during splits/merges; if False, FIPS codes are updated without modifying values.

Returns a new DataFrame with:

- A column named fips_new_name containing updated FIPS codes.
- If apply_to_data=True, numeric columns redistributed per spec.
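To make the effect of apply_to_data concrete, the sketch below mimics a single split on a toy frame with plain pandas. It is not the library's implementation: `split_rows`, its arguments, and the toy weights are assumptions chosen for illustration, and span handling is left out.

```python
import pandas as pd


def split_rows(df, fips_col, new_col, source, targets, value_cols, apply_to_data):
    """Hypothetical stand-in for a split: one source code -> several targets."""
    shares = pd.DataFrame({new_col: list(targets), "share": list(targets.values())})
    shares["share"] = shares["share"] / shares["share"].sum()  # normalize weights

    # Duplicate each source row once per target code
    hit = df[df[fips_col] == source].merge(shares, how="cross")
    if apply_to_data:
        # Scale the numeric columns by each target's share of the source
        hit[value_cols] = hit[value_cols].mul(hit["share"], axis=0)

    # Rows that are not split simply carry their code over
    rest = df[df[fips_col] != source].assign(**{new_col: lambda d: d[fips_col]})
    return pd.concat([rest, hit.drop(columns="share")], ignore_index=True)


toy = pd.DataFrame({"fips": ["04910", "04001"], "population": [100.0, 50.0]})
weights = {"04027": 3, "04012": 1}

scaled = split_rows(toy, "fips", "fips2020", "04910", weights, ["population"], True)
unscaled = split_rows(toy, "fips", "fips2020", "04910", weights, ["population"], False)
print(scaled)
print(unscaled)
```

With apply_to_data=True the split rows carry 75 and 25, so the total stays at 150. With apply_to_data=False both split rows keep the original 100, which is only appropriate when the values should not be summed afterwards.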

Update Specification

Each element of the fips_specs list is a dictionary that instructs how to transform one or more vintage FIPS codes. The keys are:

mode (str): Operation to perform. One of:
- "split": One source code split into multiple target codes, redistributing values by weights.
- "merge": Multiple source codes combined into a single target code, summing weights.
- "create": Introduce a new code (no source) with full weight.
- "rename": Change code and/or name without redistributing numeric data.
span (Dict[str, List[str]]): Restricts when to apply this spec.
- Keys are vintage variables (e.g. "year", "vintage").
- Values are lists of vintage values (e.g. years or codes). Use [] for all periods.
from (List[Dict[str, Any]]): Source codes. Each dict contains:
- fips (str): Original FIPS code.
- name (str): Associated county name.
- wgt (float): Weight representing share of the source’s data (normalized across all sources for merge, or set to 1 for rename/create).
to (List[Dict[str, Any]]): Target codes, same structure as from:
- fips (str): Destination FIPS code.
- name (str): Destination county name.
- wgt (float): Weight for splitting; weights are normalized across all targets so that the target shares sum to the source’s total weight.
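The keys described above lend themselves to a quick sanity check before a long list of specs is passed to update_data. The function below is a sketch of such a check; `check_spec` is a hypothetical helper that is not part of VintageFIPS, and the rules it enforces (required keys, known modes, 5-digit string codes) are inferred from this section only.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {"mode", "span", "from", "to"}
VALID_MODES = {"split", "merge", "create", "rename"}


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = REQUIRED_KEYS - set(spec)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if spec.get("mode") not in VALID_MODES:
        problems.append(f"unknown mode: {spec.get('mode')!r}")
    for side in ("from", "to"):
        for entry in spec.get(side, []):
            fips = entry.get("fips")
            # FIPS codes are expected to be 5-character digit strings
            if not (isinstance(fips, str) and len(fips) == 5 and fips.isdigit()):
                problems.append(f"bad fips in {side!r}: {fips!r}")
    return problems


example = {
    "mode": "split",
    "span": {"year": ["1980", "1981"]},
    "from": [{"fips": "04910", "name": "La Paz & Yuma", "wgt": 1}],
    "to": [
        {"fips": "04027", "name": "Yuma County", "wgt": 127975},
        {"fips": "04012", "name": "La Paz County", "wgt": 15894},
    ],
}
print(check_spec(example))           # []
print(check_spec({"mode": "swap"}))  # two problems: missing keys, unknown mode
```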

Example Specification

{
    'mode': 'split',
    'span': {'year': ['1980', '1981']},
    'from': [
        {'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}
    ],
    'to': [
        {'fips': '04027', 'name': 'Yuma County',   'wgt': 127975},
        {'fips': '04012', 'name': 'La Paz County', 'wgt':  15894}
    ]
}

This spec applies only for vintage years 1980 and 1981, splitting code 04910 into two codes with weights proportional to the values provided.
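As a worked illustration of "weights proportional to the values provided", the snippet below computes the share each target would receive if the weights are normalized by dividing by their sum. That normalization rule is an assumption based on the description above, not a quote of the library's code.

```python
# Target entries copied from the example specification
targets = [
    {"fips": "04027", "name": "Yuma County", "wgt": 127975},
    {"fips": "04012", "name": "La Paz County", "wgt": 15894},
]

total = sum(t["wgt"] for t in targets)  # 143869
shares = {t["fips"]: t["wgt"] / total for t in targets}

for fips, share in shares.items():
    print(fips, round(share, 4))
# 04027 0.8895
# 04012 0.1105
```

Under that assumption, a value of 1,000 recorded for 04910 would be divided into roughly 889.5 for Yuma County and 110.5 for La Paz County.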

Usage Examples

1. SEER Population Data (`seer_population.py`)

import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load SEER population data
df_seer = pd.read_parquet(
    'data/interim/population_counties_seer_us_1969_2020.parquet'
)

# 2) (Optional) Filter out non‑county or unwanted states
# df_seer = df_seer[~df_seer['fips'].str[:2].isin(['00','99','02','15'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_seer,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Define SEER-specific mapping specs (see seer_population.py for full list)
specs = [
    {
        'mode': 'split',
        'span': {'year': [str(y) for y in range(1969, 1994)]},
        'from': [{'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}],
        'to': [
            {'fips': '04027', 'name': 'Yuma County',   'wgt': 127975},
            {'fips': '04012', 'name': 'La Paz County', 'wgt':  15894}
        ]
    },
    # ... additional specs ...
]

# 5) Apply updates
df_seer_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

# 6) Aggregate population by new FIPS and year
result = (
    df_seer_updated
    .groupby(['fips2020', 'year'], observed=True)
    ['population']
    .sum()
    .reset_index()
)

print(result.head())
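When values are redistributed with normalized weights, yearly totals should be the same before and after the update. The check below is a sketch of how that could be verified; `totals_match` is a hypothetical helper and the two toy frames stand in for the input and output of update_data.

```python
import pandas as pd


def totals_match(before, after, by, value, tol=1e-6):
    """True when per-group totals of `value` agree between the two frames."""
    lhs = before.groupby(by)[value].sum()
    rhs = after.groupby(by)[value].sum()
    # fill_value=0 makes a group present in only one frame count as a mismatch
    diff = lhs.sub(rhs, fill_value=0).abs()
    return bool((diff <= tol).all())


before = pd.DataFrame({
    "fips": ["04910", "04001"],
    "year": ["1980", "1980"],
    "population": [100.0, 50.0],
})
after = pd.DataFrame({
    "fips2020": ["04027", "04012", "04001"],
    "year": ["1980", "1980", "1980"],
    "population": [75.0, 25.0, 50.0],
})
print(totals_match(before, after, by="year", value="population"))  # True
```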

2. Multiple Cause-of-Death Vintage FIPS (`mcd_fips_vintage.py`)

import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load MCD data (pickled DataFrame)
df_mcd = pd.read_pickle(
    'data/interim/mcd_vintage_fips_year_month_death.pkl'
)

# 2) (Optional) Pre-filter out AK ('02') and HI ('15') and invalid county codes
# df_mcd = df_mcd[~df_mcd['fips'].str[:2].isin(['02','15'])]
# df_mcd = df_mcd[~df_mcd['fips'].str[2:].isin(['000','999'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_mcd,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Define MCD-specific mapping specs (see mcd_fips_vintage.py for full list)
specs = [
    {
        'mode': 'rename',
        'span': {'vintage': [str(y) for y in range(1982, 2003)]},
        'from': [{'fips': '12025', 'name': 'Miami', 'wgt': 1}],
        'to':   [{'fips': '12086', 'name': 'Miami-Dade', 'wgt': 1}]
    },
    # ... additional specs ...
]

# 5) Apply updates and redistribute death counts
df_mcd_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

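# 6) Post-process: drop unneeded columns and aggregate
df_mcd_trimmed = df_mcd_updated.drop(columns=['date', 'fips'], errors='ignore')
# Group by every remaining column except the death counts; building this list
# from the trimmed frame avoids a KeyError on the columns just dropped
group_cols = [c for c in df_mcd_trimmed.columns if c != 'deaths']
summary = (
    df_mcd_trimmed
    .groupby(group_cols, observed=True)
    .sum()
    .reset_index()
)
print(summary.head())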
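For intuition about what a span-restricted rename amounts to, the sketch below relabels a code only on rows whose span variable falls in the listed values, treating an empty list as "all periods". It uses plain pandas; `rename_in_span` is a hypothetical function and does not reproduce how VintageFIPS is implemented.

```python
import pandas as pd


def rename_in_span(df, fips_col, new_col, span, old, new):
    """Hypothetical rename: set new_col to `new` where fips_col == old in span."""
    out = df.copy()
    if new_col not in out.columns:
        out[new_col] = out[fips_col]  # start from the original codes
    mask = out[fips_col] == old
    for column, values in span.items():
        if values:  # an empty list means the spec applies to all periods
            mask &= out[column].astype(str).isin(values)
    out.loc[mask, new_col] = new
    return out


toy = pd.DataFrame({
    "fips": ["12025", "12025", "12001"],
    "vintage": ["1990", "2005", "1990"],
    "deaths": [5, 7, 2],
})
span = {"vintage": [str(y) for y in range(1982, 2003)]}
out = rename_in_span(toy, "fips", "fips2020", span, "12025", "12086")
print(out["fips2020"].tolist())  # ['12086', '12025', '12001']
```

The 2005 row keeps 12025 because it lies outside the span, and the death counts are untouched, which matches the description of rename as changing codes without redistributing numeric data.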

Citation Suggestion

Lee, Jaeseok Sean. 2025. “VintageFIPS: Harmonizing Historical County‑Level FIPS Codes.” https://github.com/SeanJSLee/VintageFIPS.

License

MIT License

References

- NIH SEER US population data: https://seer.cancer.gov/popdata
- CDC MCD US death data: https://wonder.cdc.gov/mcd.html or NBER: https://www.nber.org/research/data/mortality-data-vital-statistics-nchs-multiple-cause-death-data
- FIPS County Code Changes (vintage mappings), CENSUS: https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html
- FIPS County Code Changes (vintage mappings), David Dorn: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf
- SEER Rural‑Urban Continuum Codes: https://seer.cancer.gov/seerstat/variables/countyattribs/ruralurban.html
