"VintageFIPS: Harmonizing Historical County‑Level FIPS Codes."
A FIPS code crosswalk helper class. Update specifications for the SEER and MCD data are included (AK and HI are not covered).
VintageFIPS provides a programmatic way to:
- Load a pandas DataFrame containing historical FIPS codes and associated data.
- Define county‐level operations (split, merge, create, rename) via simple mapping specifications.
- Apply these operations to update FIPS codes and, optionally, redistribute numeric columns by specified weights.
This is particularly useful when working with longitudinal datasets where county boundaries or codes change over time.
VintageFIPS(
    vintage_df: pd.DataFrame,
    vintage_fips: str,
    fips_new_name: str = "fips2020"
)

- vintage_df: Your input DataFrame, including a column of historical FIPS codes and any numeric columns to redistribute.
- vintage_fips: The name of the column in vintage_df containing the original FIPS codes (strings of length 5).
- fips_new_name: The name of the new column to create for updated FIPS codes (default: "fips2020").
update_data(
    fips_specs: List[Dict[str, Any]],
    apply_to_data: bool = False
) -> pd.DataFrame

- fips_specs: A list of mapping dictionaries. Each spec must include:
  - mode: one of "split", "merge", "create", or "rename".
  - span: a dict mapping a span variable (e.g. { 'year': ['1980', '1981', ...] }) to the list of vintage values on which to apply this spec; use an empty list for all periods.
  - from: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing source codes and optional weights.
  - to: a list of dicts { 'fips': str, 'name': str, 'wgt': float } describing target codes and weights.
- apply_to_data: If True, numeric columns in vintage_df are multiplied by normalized weights during splits/merges; if False, FIPS codes are updated without modifying values.
Returns a new DataFrame with:
- A column named fips_new_name containing updated FIPS codes.
- If apply_to_data=True, numeric columns redistributed per spec.
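One way to sanity-check a redistributed result is to compare per-period totals before and after the update. The sketch below is not part of VintageFIPS: `totals_match` is a hypothetical helper, the two small frames are made-up illustrations, and it assumes that split and merge specs are meant to conserve totals (a "create" spec could legitimately change them).

```python
import pandas as pd


def totals_match(before: pd.DataFrame, after: pd.DataFrame,
                 value_col: str, by: str, tol: float = 1e-6) -> bool:
    """Return True if value_col sums to the same total per `by` group in both frames."""
    b = before.groupby(by)[value_col].sum()
    a = after.groupby(by)[value_col].sum()
    # Groups missing on either side are treated as zero.
    diff = b.sub(a, fill_value=0.0)
    return bool((diff.abs() <= tol).all())


# Made-up frames for illustration only (not real SEER values).
before = pd.DataFrame({
    "fips": ["04910", "04910"],
    "year": ["1980", "1981"],
    "population": [100.0, 200.0],
})
after = pd.DataFrame({
    "fips2020": ["04027", "04012", "04027", "04012"],
    "year": ["1980", "1980", "1981", "1981"],
    "population": [89.0, 11.0, 178.0, 22.0],
})

print(totals_match(before, after, "population", by="year"))
```

A check like this only makes sense with apply_to_data=True; with apply_to_data=False the values are left untouched by design.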
Each element of the fips_specs list is a dictionary that instructs how to transform one or more vintage FIPS codes. The keys are:
- mode (str): Operation to perform. One of:
  - "split": One source code split into multiple target codes, redistributing values by weights.
  - "merge": Multiple source codes combined into a single target code, summing weights.
  - "create": Introduce a new code (no source) with full weight.
  - "rename": Change code and/or name without redistributing numeric data.
- span (Dict[str, List[str]]): Restricts when to apply this spec.
  - Keys are vintage variables (e.g. "year", "vintage").
  - Values are lists of vintage values (e.g. years or codes). Use [] for all periods.
- from (List[Dict[str, Any]]): Source codes. Each dict contains:
  - fips (str): Original FIPS code.
  - name (str): Associated county name.
  - wgt (float): Weight representing share of the source’s data (normalized across all sources for merge, or set to 1 for rename/create).
- to (List[Dict[str, Any]]): Target codes, same structure as from:
  - fips (str): Destination FIPS code.
  - name (str): Destination county name.
  - wgt (float): Weight for splitting; normalized across all targets (sum to original source’s total weight).
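Because specs are plain dictionaries, a small structural check before calling update_data can catch typos early. The following is a minimal sketch, not part of the VintageFIPS API: `check_spec` is a hypothetical helper, and the rules it enforces (exactly one source for a split, exactly one target for a merge, 5-character FIPS strings) are assumptions read off the key descriptions above.

```python
from typing import Any, Dict, List

VALID_MODES = {"split", "merge", "create", "rename"}
REQUIRED_KEYS = {"mode", "span", "from", "to"}


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one spec (empty list if none)."""
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems: List[str] = []
    if spec["mode"] not in VALID_MODES:
        problems.append(f"unknown mode: {spec['mode']!r}")
    for side in ("from", "to"):
        for entry in spec[side]:
            fips = entry.get("fips")
            # FIPS codes are described as strings of length 5.
            if not (isinstance(fips, str) and len(fips) == 5):
                problems.append(f"{side}: bad fips {fips!r}")
    # Assumed cardinality rules, inferred from the mode descriptions.
    if spec["mode"] == "split" and len(spec["from"]) != 1:
        problems.append("split expects exactly one source")
    if spec["mode"] == "merge" and len(spec["to"]) != 1:
        problems.append("merge expects exactly one target")
    return problems


example_spec = {
    "mode": "split",
    "span": {"year": ["1980", "1981"]},
    "from": [{"fips": "04910", "name": "La Paz & Yuma", "wgt": 1}],
    "to": [
        {"fips": "04027", "name": "Yuma County", "wgt": 127975},
        {"fips": "04012", "name": "La Paz County", "wgt": 15894},
    ],
}
print(check_spec(example_spec))
```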
{
    'mode': 'split',
    'span': {'year': ['1980', '1981']},
    'from': [
        {'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}
    ],
    'to': [
        {'fips': '04027', 'name': 'Yuma County', 'wgt': 127975},
        {'fips': '04012', 'name': 'La Paz County', 'wgt': 15894}
    ]
}

This spec applies only for vintage years 1980 and 1981, splitting code 04910 into two codes with weights proportional to the values provided.
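To make "weights proportional to the values provided" concrete, the sketch below normalizes the two target weights from this spec and applies them to a single row. It illustrates the arithmetic only and is not the library's internal implementation; the source row and its population value are made up for the example.

```python
import pandas as pd

# Target weights taken from the split spec above.
targets = {"04027": 127975, "04012": 15894}

# Normalize so the shares sum to 1.
total = sum(targets.values())
shares = {fips: wgt / total for fips, wgt in targets.items()}

# One hypothetical source row; the population value is invented for illustration.
row = {"fips": "04910", "year": "1980", "population": 1000.0}

# Each target code receives its share of the source row's value.
split_rows = pd.DataFrame(
    [
        {
            "fips2020": fips,
            "year": row["year"],
            "population": row["population"] * share,
        }
        for fips, share in shares.items()
    ]
)
print(split_rows)
```

Because the shares are normalized, the two resulting rows add back up to the original value, so only the ratio between the wgt values matters, not their scale.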
import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load SEER population data
df_seer = pd.read_parquet(
    'data/interim/population_counties_seer_us_1969_2020.parquet'
)

# 2) (Optional) Filter out non‑county or unwanted states
# df_seer = df_seer[~df_seer['fips'].str[:2].isin(['00','99','02','15'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_seer,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Define SEER-specific mapping specs (see seer_population.py for full list)
specs = [
    {
        'mode': 'split',
        'span': {'year': [str(y) for y in range(1969, 1994)]},
        'from': [{'fips': '04910', 'name': 'La Paz & Yuma', 'wgt': 1}],
        'to': [
            {'fips': '04027', 'name': 'Yuma County', 'wgt': 127975},
            {'fips': '04012', 'name': 'La Paz County', 'wgt': 15894}
        ]
    },
    # ... additional specs ...
]

# 5) Apply updates
df_seer_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

# 6) Aggregate population by new FIPS and year
result = (
    df_seer_updated
    .groupby(['fips2020', 'year'], observed=True)
    ['population']
    .sum()
    .reset_index()
)
print(result.head())

import pandas as pd
from vintage_fips import VintageFIPS

# 1) Load MCD data (pickled DataFrame)
df_mcd = pd.read_pickle(
    'data/interim/mcd_vintage_fips_year_month_death.pkl'
)

# 2) Pre‑filter out Alaska ('02'), Hawaii ('15'), and invalid codes
# df_mcd = df_mcd[~df_mcd['fips'].str[:2].isin(['02','15'])]
# df_mcd = df_mcd[~df_mcd['fips'].str[2:].isin(['000','999'])]

# 3) Instantiate the updater
updater = VintageFIPS(
    vintage_df=df_mcd,
    vintage_fips='fips',
    fips_new_name='fips2020'
)

# 4) Load or define MCD-specific specs in mcd_fips_vintage.py
specs = [
    {
        'mode': 'rename',
        'span': {'vintage': [str(y) for y in range(1982, 2003)]},
        'from': [{'fips': '12025', 'name': 'Miami', 'wgt': 1}],
        'to': [{'fips': '12086', 'name': 'Miami-Dade', 'wgt': 1}]
    },
    # ... additional specs ...
]

# 5) Apply updates and redistribute death counts
df_mcd_updated = updater.update_data(
    fips_specs=specs,
    apply_to_data=True
)

# 6) Post‑process: drop unneeded columns, then sum deaths over the remaining columns.
# The grouping columns are taken after the drop so that 'date' and 'fips' are not
# used as group keys once they have been removed.
df_mcd_trimmed = df_mcd_updated.drop(columns=['date', 'fips'], errors='ignore')
group_cols = [c for c in df_mcd_trimmed.columns if c != 'deaths']
summary = (
    df_mcd_trimmed
    .groupby(group_cols, observed=True)
    .sum()
    .reset_index()
)
print(summary.head())

Lee, Jaeseok Sean. 2025. “VintageFIPS: Harmonizing Historical County‑Level FIPS Codes.” https://github.com/SeanJSLee/VintageFIPS.
MIT License
- NIH SEER US population data: https://seer.cancer.gov/popdata
- CDC MCD US death data: https://wonder.cdc.gov/mcd.html
  or NBER: https://www.nber.org/research/data/mortality-data-vital-statistics-nchs-multiple-cause-death-data
- FIPS County Code Changes (vintage mappings):
  - Census: https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html
  - David Dorn: https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf
- SEER Rural‑Urban Continuum Codes: https://seer.cancer.gov/seerstat/variables/countyattribs/ruralurban.html