Author: Mitchell Valdes-Bobes
Institution: University of Wisconsin-Madison
Status: Working Paper
Data Coverage: 2013-2025
This project investigates why remote work adoption increased dramatically during COVID-19 and then stabilized at levels significantly higher than pre-pandemic, rather than returning to baseline. Using a structural search-and-matching model with worker heterogeneity in remote work preferences and job heterogeneity in teleworkability, I decompose the persistence of remote work into: (1) technology improvements, (2) preference shifts, and (3) sorting mechanisms.
Remote work persistence is primarily driven by technology improvements in teleworkability (60%) rather than preference changes (30%) or improved sorting (10%). This suggests policy interventions targeting workplace infrastructure may be more effective than those targeting worker preferences.
- Statistical Matching: ML-based imputation to merge SIPP remote work intensity with CPS earnings data
- Structural Estimation: Simulated Method of Moments (SMM) with genetic algorithm optimization
- Data Pipeline: Integrated processing of 73GB multi-source data (SIPP, CPS, ATUS, O*NET)
- High-Performance Computing: Parallel estimation using 128 cores, 512 population GA
Data Sources Integrated:
- SIPP (Survey of Income and Program Participation): 131MB processed, remote work intensity measures
- CPS (Current Population Survey): 65GB processed, main analysis sample with earnings
- ATUS (American Time Use Survey): Pre-2022 telework validation
- O*NET: Occupation-level teleworkability scores
Statistical Matching:
- LightGBM-based machine learning imputation
- Harmonized variables across datasets (education, age, occupation, geography)
- Cross-validation: R² = 0.68 for remote work intensity prediction
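The matching step can be illustrated with a minimal, stdlib-only sketch. The actual pipeline trains a LightGBM model on harmonized covariates; here a nearest-neighbor hot-deck stands in for it, and the covariate names, toy records, and `harmonize` helper are all hypothetical.

```python
# Toy statistical matching: impute a remote-work intensity for CPS-style
# recipient records from SIPP-style donor records that carry the variable.
# Stand-in for the LightGBM imputation; feature names are illustrative.

def harmonize(rec):
    """Map a record to a comparable numeric feature vector (education, age, occupation)."""
    return (rec["educ"], rec["age"] / 10.0, rec["occ"])

def distance(a, b):
    """Squared Euclidean distance between two harmonized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def impute(recipients, donors):
    """Assign each recipient the remote-work intensity of its nearest donor."""
    out = []
    for r in recipients:
        fr = harmonize(r)
        best = min(donors, key=lambda d: distance(fr, harmonize(d)))
        out.append({**r, "remote_intensity": best["remote_intensity"]})
    return out

donors = [  # SIPP-style records with observed intensity (toy values)
    {"educ": 16, "age": 35, "occ": 2, "remote_intensity": 0.6},
    {"educ": 12, "age": 50, "occ": 7, "remote_intensity": 0.0},
]
recipients = [{"educ": 16, "age": 33, "occ": 2}]  # CPS-style record
matched = impute(recipients, donors)
print(matched[0]["remote_intensity"])  # -> 0.6, copied from the nearest donor
```

In the real pipeline the donor value is predicted by a trained model rather than copied, but the recipient/donor structure is the same.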
Model Features:
- Search and matching framework with worker-job heterogeneity
- Workers differ in remote work preferences (z ~ LogNormal)
- Jobs differ in teleworkability (ψ ~ distribution)
- Optimal choice of remote work intensity (α ∈ [0,1])
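The intensity choice can be sketched under stated assumptions. The model's actual surplus function lives in src/structural_model/; the quadratic payoff below is a hypothetical stand-in in which taste z rewards remote time and output falls once α exceeds the job's teleworkability ψ.

```python
# Stylized choice of remote-work intensity alpha in [0, 1].
# Hypothetical payoff: the worker values remote time according to taste z,
# while output falls quadratically once alpha exceeds teleworkability psi.

def payoff(alpha, z, psi):
    amenity = z * alpha                       # worker's taste for remote time
    output_loss = max(0.0, alpha - psi) ** 2  # penalty past teleworkability
    return amenity - output_loss

def optimal_alpha(z, psi, grid_size=1001):
    """Grid search for the payoff-maximizing intensity on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda a: payoff(a, z, psi))

# A high-taste worker on a fully teleworkable job chooses full remote:
print(optimal_alpha(z=1.0, psi=1.0))  # -> 1.0
# Low teleworkability caps the chosen intensity well below 1:
print(optimal_alpha(z=0.1, psi=0.3))
```

Interior solutions like the second case are what generates dispersion in α across worker-job pairs.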
Estimation Method:
- Genetic algorithm optimization (100 generations, 512 population)
- Simulated Method of Moments with bootstrap variance weighting
- 11 empirical moments matched (wage distribution, remote work shares, sorting patterns)
- ~30 minutes per estimation run on HPC cluster
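With a diagonal weight matrix, SMM with bootstrap variance weighting reduces to a weighted quadratic distance between simulated and empirical moments. A minimal sketch; the moment values and variances below are illustrative, not the paper's estimates.

```python
# SMM objective: weighted distance between simulated and empirical moments.
# With bootstrap variance weighting, each squared deviation is scaled by
# 1 / Var_bootstrap(moment). All numbers below are illustrative.

def smm_objective(simulated, empirical, bootstrap_var):
    """Sum of squared moment deviations, each weighted by its inverse bootstrap variance."""
    return sum(
        (s - e) ** 2 / v
        for s, e, v in zip(simulated, empirical, bootstrap_var)
    )

empirical = [0.0675, 0.8867, 1.42]   # e.g. mean alpha, in-person share, a wage moment
bootstrap_var = [1e-4, 4e-4, 9e-3]   # bootstrap variances of those moments
simulated = [0.0697, 0.9187, 1.40]   # model moments at a candidate parameter vector

print(smm_objective(simulated, empirical, bootstrap_var))
```

Precisely estimated moments (small bootstrap variance) thus pull hardest on the parameter search.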
Languages Used:
- Julia: Structural model, optimization, moment computation (47 files)
- Python: Data processing, statistical matching, ML imputation (20 files)
- Stata: Wage regressions, robustness checks (18 files)
Key Modules:
- src/structural_model/: Economic model implementation (9 modules)
- src/optimization/: GA and multi-start local search
- src/empirical/: Moments estimation and statistical matching
- src/data/: Multi-source data acquisition and processing
```
why_remote_work_stuck/
│
├── docs/
│   ├── technical/
│   │   ├── EMPIRICAL_PIPELINE.md      # Complete data processing documentation
│   │   ├── DATA_DICTIONARY.md         # Variable definitions and sources
│   │   └── STRUCTURAL_MODEL.md        # Model specification and estimation
│   ├── development_logs/
│   │   └── optimization_oct2025.md    # Parameter estimation session notes
│   └── outputs/
│       ├── main.pdf                   # Latest paper
│       └── slides.pdf                 # Latest presentation
│
├── src/
│   ├── data/                          # Data acquisition & processing (Python)
│   │   ├── sipp_process.py            # SIPP remote work metrics
│   │   ├── ipums_process.py           # CPS/ATUS processing
│   │   └── get_fred_q_theta.py        # Labor market tightness
│   ├── empirical/                     # Moments & statistical matching
│   │   ├── stat_matching/             # SIPP→CPS ML imputation
│   │   ├── data_moments_core.jl       # Moment computation
│   │   └── wfh_wage_facts/            # Wage-remote work regressions
│   ├── structural_model/              # Economic model (Julia)
│   │   ├── ModelSetup.jl              # Parameter initialization
│   │   ├── ModelSolver.jl             # Equilibrium computation
│   │   ├── ModelEconomics.jl          # Production & matching
│   │   └── GeneticAlgorithm.jl        # Custom GA implementation
│   ├── optimization/                  # Parameter estimation
│   │   ├── OptimizationObjective.jl   # SMM objective function
│   │   └── multi_start_local_search.jl
│   └── reporting/                     # Results & decomposition
│
├── data/                              # 73GB processed data (see below)
│   ├── processed/
│   │   ├── empirical/                 # Estimation-ready datasets
│   │   ├── cps/                       # CPS with imputed remote work
│   │   ├── sipp/                      # SIPP person-year files
│   │   └── harmonized/                # Cross-dataset harmonization
│   └── aux/                           # Crosswalks and auxiliary data
│
├── results/
│   ├── global_optimization/           # GA estimation outputs
│   │   └── final/
│   │       ├── *_3144287_best.yaml    # Best parameters (2019)
│   │       └── *_3144287.json         # Full optimization history
│   └── bootstrap_moments/             # Bootstrap variance estimates
│
├── manuscript/
│   └── final_document/
│       ├── main.pdf                   # Compiled paper
│       ├── main.tex                   # LaTeX source
│       └── figures/                   # Publication figures
│
└── presentations/
    └── QMW_10202025/
        ├── slides.pdf                 # Queen Mary Workshop slides
        └── slides.tex                 # Beamer source
```
- Paper PDF - Latest manuscript version
- Slides PDF - Queen Mary Workshop presentation
- Technical Documentation - Pipeline and data documentation
- Estimation Results - Parameter estimates and diagnostics
Processed data (73GB) is included in this repository at data/processed/. This ensures full reproducibility without re-running expensive data acquisition steps.
| File | Size | Description |
|---|---|---|
| data/processed/empirical/simulation_scaffolding_all_years.feather | 5.3GB | Estimation-ready simulation data |
| data/processed/cps/cps_processed.csv | 65GB | CPS with imputed ALPHA (remote work intensity) |
| data/processed/sipp/sipp_py_B.csv.gz | 131MB | SIPP person-year (worked-mass weights) |
| data/processed/empirical/cps_mi_ready.dta | - | Stata-ready CPS for analysis |
| data/aux/fred_q_theta.csv | - | Quarterly labor market tightness |
Note: For GitHub deployment, consider using Git LFS or hosting data separately (Zenodo, institutional server). See docs/DATA_AVAILABILITY.md for options.
Preference Parameters:
- μ_z = -1.926 (mean taste for remote work, LogNormal)
- σ_z = 0.744 (preference dispersion)
Technology Parameters:
- ψ₀ = 0.772 (baseline teleworkability)
- A₁ = 24.60 (skill productivity)
Model Fit:
- mean_alpha: 0.0697 vs data 0.0675 (3% error) ✓
- in-person share: 0.9187 vs data 0.8867
- Objective value: 6.97 (SMM weighted distance)
Estimation Details:
- Method: Genetic algorithm (100 generations, 512 population)
- Runtime: 30 minutes on 128 cores
- Convergence: 511/512 individuals per generation
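The estimation loop can be sketched as a plain genetic algorithm: keep an elite, recombine elite parents, and perturb with Gaussian mutation. The project's actual estimator is the Julia GeneticAlgorithm.jl with population 512 over 100 generations; this Python toy minimizes a stand-in objective and every tuning choice below is illustrative.

```python
import random

# Minimal genetic algorithm for parameter search (illustrative only).

def toy_objective(theta):
    """Stand-in for the SMM objective: minimized at theta = (0.5, -1.0)."""
    return (theta[0] - 0.5) ** 2 + (theta[1] + 1.0) ** 2

def evolve(objective, dim=2, pop_size=64, generations=100, seed=0):
    rng = random.Random(seed)
    # Random initial population in a wide box
    pop = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=objective)
        elite = scored[: pop_size // 4]          # keep the best quarter unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)          # two elite parents
            # Crossover (coordinate-wise average) plus Gaussian mutation
            children.append([(x + y) / 2 + rng.gauss(0, 0.1) for x, y in zip(a, b)])
        pop = elite + children
    return min(pop, key=objective)

best = evolve(toy_objective)
print(best)  # close to (0.5, -1.0)
```

Elitism makes the best objective value monotone across generations, which is why the full-scale runs can be stopped on a fixed generation budget.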
Python (3.10+):

```bash
pip install -r requirements/python_requirements.txt
# Key packages: polars, pandas, scikit-learn, lightgbm
```

Julia (1.8+):

```julia
using Pkg
Pkg.activate(".")
Pkg.instantiate()
# Key packages: DataFrames, Arrow, Optim, StatsBase
```

Stata (17+):
- See requirements/stata_requirements.txt for user-written packages: reghdfe, estout, gtools
- Storage: 73GB for processed data
- Memory: 32GB RAM recommended for full sample analysis
- Compute: HPC cluster recommended for estimation (uses SLURM)
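A SLURM submission for such a run might look like the following. This is a hypothetical sketch: the repository's actual job scripts are not shown in this README, so the job name, resource lines, and driver path (run_estimation.jl) are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=ga_smm            # hypothetical job name
#SBATCH --cpus-per-task=128          # matches the 128-core estimation runs
#SBATCH --mem=32G                    # RAM recommendation above
#SBATCH --time=01:00:00              # ~30 min per run, with headroom

# Launch the Julia estimation with one thread per allocated core.
# run_estimation.jl is a placeholder driver name, not a file in this repo.
julia --project=. --threads="$SLURM_CPUS_PER_TASK" run_estimation.jl
```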
```julia
using Arrow, DataFrames, Statistics

# Load estimation-ready simulation data
scaff = Arrow.Table("data/processed/empirical/simulation_scaffolding_all_years.feather") |> DataFrame

# Check remote work shares by year
by_year = combine(groupby(scaff, :year), :alpha => mean, :remote => mean)
```

```bash
# View best parameters from optimization
cat results/global_optimization/final/global_optimization_modified_2019_3144287_best.yaml
```

```julia
# Load model and estimated parameters
include("src/structural_model/ModelInterface.jl")

# Quick model test
include("test_model_quick.jl")

# Compute empirical moments
include("src/empirical/data_moments_core.jl")
```

- Large-scale data pipeline (73GB multi-source integration)
- Statistical matching across incompatible surveys
- ETL workflows with data validation
- Efficient storage formats (Arrow/Feather, compressed CSV)
- Simulated Method of Moments estimation
- Bootstrap variance estimation
- Machine learning imputation (LightGBM)
- Regression analysis with high-dimensional fixed effects
- Genetic algorithm implementation
- Parallel computing (multi-core optimization)
- High-performance Julia programming
- SLURM job scheduling and monitoring
- Modular code architecture
- Version control (Git)
- Reproducible research practices
- Comprehensive documentation
- Search and matching models
- Worker-job heterogeneity
- Equilibrium computation
- Counterfactual analysis
- 2023-Q1: Data acquisition and processing pipeline
- 2023-Q2: Statistical matching methodology development
- 2023-Q3: Structural model specification
- 2024-Q1: Initial parameter estimation
- 2024-Q2: Model refinement and robustness checks
- 2024-Q3: Decomposition analysis
- 2024-Q4: Manuscript preparation
- 2025-Q1: Optimization refinement (latest)
If you use this code or data, please cite:
```bibtex
@unpublished{valdesbobes2025remote,
  author      = {Valdes-Bobes, Mitchell},
  title       = {Why Remote Work Stuck: A Structural Analysis of Remote Work Persistence},
  institution = {University of Wisconsin-Madison},
  year        = {2025},
  note        = {Working Paper}
}
```

Or use the included CITATION.cff file.
- Code: MIT License (see LICENSE)
- Data: Original data sources have separate licenses (IPUMS, Census Bureau)
- Paper: All rights reserved
Mitchell Valdes-Bobes
Department of Economics
University of Wisconsin-Madison
For questions about the code or data, please open an issue in this repository.
This research uses data from:
- IPUMS CPS (Flood et al. 2024)
- Survey of Income and Program Participation (U.S. Census Bureau)
- O*NET (U.S. Department of Labor)
- FRED (Federal Reserve Economic Data)
Computational resources provided by the Center for High Throughput Computing at UW-Madison.
Version: 1.0
Last Updated: November 2, 2025
Repository: why_remote_work_stuck