VintageFIPS

"VintageFIPS: Harmonizing Historical County‑Level FIPS Codes."

A helper class for building FIPS code crosswalks. Update specifications for the SEER data and the MCD data are included (AK and HI are not covered).


Overview

VintageFIPS provides a programmatic way to:

  • Load a pandas DataFrame containing historical FIPS codes and associated data.
  • Define county‐level operations (split, merge, create, rename) via simple mapping specifications.
  • Apply these operations to update FIPS codes and, optionally, redistribute numeric columns by specified weights.

This is particularly useful when working with longitudinal datasets where county boundaries or codes change over time.


API Reference

Class: VintageFIPS

VintageFIPS(
    vintage_df: pd.DataFrame,
    vintage_fips: str,
    fips_new_name: str = "fips2020"
)
  • vintage_df: Your input DataFrame, including a column of historical FIPS codes and any numeric columns to redistribute.
  • vintage_fips: The name of the column in vintage_df containing the original FIPS codes (strings of length 5).
  • fips_new_name: The name of the new column to create for updated FIPS codes (default: "fips2020").
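Because vintage_fips is expected to hold 5-character strings, codes that were read as integers (and so lost their leading zeros) need to be padded first. The following is a minimal sketch with made-up data; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Toy input (values are made up): FIPS codes read as integers lose leading zeros.
raw = pd.DataFrame({
    "fips": [4027, 12086, 1001],
    "year": ["1990", "1990", "1990"],
    "population": [100, 200, 300],
})

# Convert to 5-character, zero-padded strings before passing the frame on.
raw["fips"] = raw["fips"].astype(str).str.zfill(5)

print(raw["fips"].tolist())  # ['04027', '12086', '01001']
```

The padded frame can then be passed as vintage_df with vintage_fips='fips'.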

Method: update_data

update_data(
    fips_specs: List[Dict[str, Any]],
    apply_to_data: bool = False
) -> pd.DataFrame
  • fips_specs: A list of mapping dictionaries. Each spec must include:

    • mode: one of "split", "merge", "create", or "rename".
    • span: a dict mapping a span variable (e.g. { 'year': ['1980', '1981', ...] }) to the list of vintage values on which to apply this spec; use an empty list for all periods.
    • from: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing source codes and optional weights.
    • to: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing target codes and weights.
  • apply_to_data: If True, numeric columns in vintage_df are multiplied by normalized weights during splits/merges; if False, FIPS codes are updated without modifying values.

Returns a new DataFrame with:

  • A column named fips_new_name containing updated FIPS codes.
  • If apply_to_data=True, numeric columns redistributed per spec.
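To make the role of span concrete, here is a rough pandas-only sketch of how a "rename" spec could be restricted to certain vintages. It is not the library's implementation: span_mask is a hypothetical helper, and the toy frame is invented for illustration.

```python
import pandas as pd

def span_mask(df: pd.DataFrame, span: dict) -> pd.Series:
    """Boolean mask of rows covered by a spec's span (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, values in span.items():
        if values:  # an empty list is read as "all periods"
            mask &= df[column].astype(str).isin(values)
    return mask

# Toy data: the same vintage code appears inside and outside the span.
df = pd.DataFrame({
    "fips": ["12025", "12025", "12086"],
    "vintage": ["1990", "2005", "2005"],
})

spec = {
    "mode": "rename",
    "span": {"vintage": ["1990"]},
    "from": [{"fips": "12025", "name": "Miami", "wgt": 1}],
    "to": [{"fips": "12086", "name": "Miami-Dade", "wgt": 1}],
}

# Start from the vintage codes, then overwrite only the rows the spec covers.
df["fips2020"] = df["fips"]
rows = span_mask(df, spec["span"]) & (df["fips"] == spec["from"][0]["fips"])
df.loc[rows, "fips2020"] = spec["to"][0]["fips"]

print(df["fips2020"].tolist())  # ['12086', '12025', '12086']
```

The second row keeps its vintage code because its vintage value falls outside the span.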

Update Specification

Each element of the fips_specs list is a dictionary that instructs how to transform one or more vintage FIPS codes. The keys are:

  • mode (str): Operation to perform. One of:

    • "split": One source code split into multiple target codes, redistributing values by weights.
    • "merge": Multiple source codes combined into a single target code, summing weights.
    • "create": Introduce a new code (no source) with full weight.
    • "rename": Change code and/or name without redistributing numeric data.
  • span (Dict[str, List[str]]): Restricts when to apply this spec.

    • Keys are vintage variables (e.g. "year", "vintage").
    • Values are lists of vintage values (e.g. years or codes). Use [] for all periods.
  • from (List[Dict[str, Any]]): Source codes. Each dict contains:

    • fips (str): Original FIPS code.
    • name (str): Associated county name.
    • wgt (float): Weight representing share of the source’s data (normalized across all sources for merge, or set to 1 for rename/create).
  • to (List[Dict[str, Any]]): Target codes, same structure as from:

    • fips (str): Destination FIPS code.
    • name (str): Destination county name.
    • wgt (float): Weight for splitting; normalized across all targets (sum to original source’s total weight).
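A long list of specs is easy to get subtly wrong (for example, a FIPS code typed as an integer). The sketch below shows one way to check a spec against the keys described above before calling update_data. check_spec is a hypothetical helper, not part of VintageFIPS, and the rules it enforces are assumptions drawn from this section.

```python
from typing import Any, Dict, List

VALID_MODES = {"split", "merge", "create", "rename"}

def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one spec (hypothetical helper)."""
    problems = []
    if spec.get("mode") not in VALID_MODES:
        problems.append(f"unknown mode: {spec.get('mode')!r}")
    if not isinstance(spec.get("span"), dict):
        problems.append("span must be a dict")
    for side in ("from", "to"):
        for entry in spec.get(side, []):
            fips = entry.get("fips")
            if not (isinstance(fips, str) and len(fips) == 5):
                problems.append(f"{side}: fips must be a 5-character string, got {fips!r}")
            wgt = entry.get("wgt", 1)
            if not isinstance(wgt, (int, float)) or wgt < 0:
                problems.append(f"{side}: wgt must be a non-negative number, got {wgt!r}")
    if not spec.get("to"):
        problems.append("to must list at least one target")
    return problems

spec = {
    "mode": "rename",
    "span": {"vintage": []},
    "from": [{"fips": "12025", "name": "Miami", "wgt": 1}],
    "to": [{"fips": "12086", "name": "Miami-Dade", "wgt": 1}],
}
print(check_spec(spec))  # []
```

The from list is allowed to be empty here because "create" specs have no source.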

Example Specification

{
    'mode': 'split',
    'span': {'year': ['1980', '1981']},
    'from': [
        {'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}
    ],
    'to': [
        {'fips': '04027', 'name': 'Yuma County',   'wgt': 127975},
        {'fips': '04012', 'name': 'La Paz County', 'wgt':  15894}
    ]
}

This spec applies only for vintage years 1980 and 1981, splitting code 04910 into two codes with weights proportional to the values provided.
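The weights in this spec normalize to shares of roughly 89% (Yuma) and 11% (La Paz). The sketch below works through that arithmetic in plain pandas and applies it to a single toy row. It illustrates the idea of a weighted split and is not the library's own code; the population value is made up.

```python
import pandas as pd

targets = [
    {"fips": "04027", "name": "Yuma County", "wgt": 127975},
    {"fips": "04012", "name": "La Paz County", "wgt": 15894},
]

# Normalize the target weights so that they sum to 1.
total = sum(t["wgt"] for t in targets)
shares = {t["fips"]: t["wgt"] / total for t in targets}

# Toy row for the combined code; the population value is made up.
source = pd.DataFrame({"fips": ["04910"], "year": ["1980"], "population": [1000.0]})

# One output row per target, each carrying its share of the source value.
pieces = [
    source.assign(fips2020=fips, population=source["population"] * share)
    for fips, share in shares.items()
]
split = pd.concat(pieces, ignore_index=True)

print(split[["fips2020", "year", "population"]])
```

Since the shares sum to 1, the two output rows together carry the same total as the original row.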



Usage Examples

1. SEER Population Data (seer_population.py)

import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load SEER population data
df_seer = pd.read_parquet(
    'data/interim/population_counties_seer_us_1969_2020.parquet'
)

# 2) (Optional) Filter out non-county codes and unwanted states (here AK and HI)
# df_seer = df_seer[~df_seer['fips'].str[:2].isin(['00','99','02','15'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_seer,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Define SEER-specific mapping specs (see seer_population.py for full list)
specs = [
    {
        'mode': 'split',
        'span': {'year': [str(y) for y in range(1969, 1994)]},
        'from': [{'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}],
        'to': [
            {'fips': '04027', 'name': 'Yuma County',   'wgt': 127975},
            {'fips': '04012', 'name': 'La Paz County', 'wgt':  15894}
        ]
    },
    # ... additional specs ...
]

# 5) Apply updates
df_seer_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

# 6) Aggregate population by new FIPS and year
result = (
    df_seer_updated
    .groupby(['fips2020', 'year'], observed=True)
    ['population']
    .sum()
    .reset_index()
)

print(result.head())
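After a weighted split, a quick sanity check is to confirm that yearly totals are unchanged. The sketch below compares per-year sums before and after an update using toy frames; totals_match is a hypothetical helper, and the check assumes the specs only move values between codes (splits, merges, renames) rather than adding or dropping rows.

```python
import pandas as pd

def totals_match(before, after, value_col="population", by="year", tol=1e-6):
    """True if per-`by` totals of `value_col` agree within `tol` (hypothetical helper)."""
    a = before.groupby(by, observed=True)[value_col].sum()
    b = after.groupby(by, observed=True)[value_col].sum()
    # Align on the union of groups so a group missing on one side counts as 0.
    a, b = a.align(b, fill_value=0.0)
    return bool(((a - b).abs() <= tol).all())

# Toy frames (values are made up): one combined code split into two targets.
before = pd.DataFrame({
    "fips": ["04910", "04910"],
    "year": ["1980", "1981"],
    "population": [1000.0, 1100.0],
})
after = pd.DataFrame({
    "fips2020": ["04027", "04012", "04027", "04012"],
    "year": ["1980", "1980", "1981", "1981"],
    "population": [890.0, 110.0, 978.5, 121.5],
})

print(totals_match(before, after))  # True
```

With real data, the same call would be totals_match(df_seer, df_seer_updated).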

2. Multiple Cause-of-Death Vintage FIPS (mcd_fips_vintage.py)

import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load MCD data (pickled DataFrame)
df_mcd = pd.read_pickle(
    'data/interim/mcd_vintage_fips_year_month_death.pkl'
)

# 2) (Optional) Pre-filter AK, HI, and invalid county codes
# df_mcd = df_mcd[~df_mcd['fips'].str[:2].isin(['02','15'])]
# df_mcd = df_mcd[~df_mcd['fips'].str[2:].isin(['000','999'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_mcd,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Define MCD-specific mapping specs (see mcd_fips_vintage.py for full list)
specs = [
    {
        'mode': 'rename',
        'span': {'vintage': [str(y) for y in range(1982, 2003)]},
        'from': [{'fips': '12025', 'name': 'Miami', 'wgt': 1}],
        'to':   [{'fips': '12086', 'name': 'Miami-Dade', 'wgt': 1}]
    },
    # ... additional specs ...
]

# 5) Apply updates and redistribute death counts
df_mcd_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

# 6) Post-process: drop unneeded columns, then sum deaths over the remaining ones
df_mcd_trimmed = df_mcd_updated.drop(columns=['date', 'fips'], errors='ignore')
group_cols = [c for c in df_mcd_trimmed.columns if c != 'deaths']
summary = (
    df_mcd_trimmed
    .groupby(group_cols, observed=True)['deaths']
    .sum()
    .reset_index()
)
print(summary.head())

Citation Suggestion

Lee, Jaeseok Sean. 2025. “VintageFIPS: Harmonizing Historical County‑Level FIPS Codes.” https://github.com/SeanJSLee/VintageFIPS.


License

MIT License

