mitchv34/why_remote_work_stuck
Why Remote Work Stuck: Structural Analysis of Remote Work Persistence

Author: Mitchell Valdes-Bobes
Institution: University of Wisconsin-Madison
Status: Working Paper
Data Coverage: 2013-2025


Research Overview

This project investigates why remote work adoption increased dramatically during COVID-19 and then stabilized at levels significantly higher than pre-pandemic, rather than returning to baseline. Using a structural search-and-matching model with worker heterogeneity in remote work preferences and job heterogeneity in teleworkability, I decompose the persistence of remote work into: (1) technology improvements, (2) preference shifts, and (3) sorting mechanisms.

Key Finding

Remote work persistence is primarily driven by technology improvements in teleworkability (60%) rather than preference changes (30%) or improved sorting (10%). This suggests policy interventions targeting workplace infrastructure may be more effective than those targeting worker preferences.

Methods

  • Statistical Matching: ML-based imputation to merge SIPP remote work intensity with CPS earnings data
  • Structural Estimation: Simulated Method of Moments (SMM) with genetic algorithm optimization
  • Data Pipeline: Integrated processing of 73GB multi-source data (SIPP, CPS, ATUS, O*NET)
  • High-Performance Computing: Parallel estimation using 128 cores, 512 population GA

Technical Highlights

1. Large-Scale Data Engineering (73GB Pipeline)

Data Sources Integrated:

  • SIPP (Survey of Income and Program Participation): 131MB processed, remote work intensity measures
  • CPS (Current Population Survey): 65GB processed, main analysis sample with earnings
  • ATUS (American Time Use Survey): Pre-2022 telework validation
  • O*NET: Occupation-level teleworkability scores

Statistical Matching:

  • LightGBM-based machine learning imputation
  • Harmonized variables across datasets (education, age, occupation, geography)
  • Cross-validation: R² = 0.68 for remote work intensity prediction

2. Structural Model Estimation

Model Features:

  • Search and matching framework with worker-job heterogeneity
  • Workers differ in remote work preferences (z ~ LogNormal)
  • Jobs differ in teleworkability (ψ, drawn from an estimated distribution)
  • Optimal choice of remote work intensity (α ∈ [0,1])
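A minimal sketch of the α choice, under an assumed (not the paper's) surplus function: output falls with α on less teleworkable jobs (low ψ), while a worker with taste z gains utility z·α. Because this toy surplus is linear in α, the optimum is a corner solution; the actual model would need curvature to generate interior (hybrid) intensities.

```python
# Toy illustration of choosing remote intensity alpha in [0, 1].
# The surplus function is an assumption for illustration only.
import numpy as np

def optimal_alpha(z, psi, A=1.0, grid=np.linspace(0, 1, 1001)):
    """Grid-search the alpha that maximizes joint match surplus."""
    output = A * (1.0 - (1.0 - psi) * grid)   # production loss from remote work
    amenity = z * grid                         # worker's taste for remote work
    return grid[np.argmax(output + amenity)]

# High teleworkability + strong preference -> full remote
print(optimal_alpha(z=0.5, psi=0.9))  # 1.0
# Low teleworkability + weak preference -> fully in person
print(optimal_alpha(z=0.1, psi=0.2))  # 0.0
```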

Estimation Method:

  • Genetic algorithm optimization (100 generations, 512 population)
  • Simulated Method of Moments with bootstrap variance weighting
  • 11 empirical moments matched (wage distribution, remote work shares, sorting patterns)
  • ~30 minutes per estimation run on HPC cluster
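The SMM criterion with bootstrap variance weighting can be sketched as below. The two moments and variances shown are placeholders (not the paper's 11-moment vector); the structure is the standard weighted distance g′Wg with W the inverse of the diagonal bootstrap variance matrix.

```python
# Sketch of an SMM objective with bootstrap variance weighting.
import numpy as np

def smm_objective(sim_moments, data_moments, boot_var):
    """Weighted distance g' W g with W = diag(1 / bootstrap variance)."""
    g = np.asarray(sim_moments) - np.asarray(data_moments)
    W = np.diag(1.0 / np.asarray(boot_var))
    return float(g @ W @ g)

data = np.array([0.0675, 0.8867])   # e.g. mean_alpha, in-person share
sim = np.array([0.0697, 0.9187])    # model-simulated counterparts
boot = np.array([1e-4, 4e-4])       # bootstrap variances (placeholders)
print(smm_objective(sim, data, boot))  # ~2.61
```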

3. Code Architecture

Languages Used:

  • Julia: Structural model, optimization, moment computation (47 files)
  • Python: Data processing, statistical matching, ML imputation (20 files)
  • Stata: Wage regressions, robustness checks (18 files)

Key Modules:

  • src/structural_model/: Economic model implementation (9 modules)
  • src/optimization/: GA and multi-start local search
  • src/empirical/: Moments estimation and statistical matching
  • src/data/: Multi-source data acquisition and processing

Repository Structure

why_remote_work_stuck/
│
├── docs/
│   ├── technical/
│   │   ├── EMPIRICAL_PIPELINE.md      # Complete data processing documentation
│   │   ├── DATA_DICTIONARY.md         # Variable definitions and sources
│   │   └── STRUCTURAL_MODEL.md        # Model specification and estimation
│   ├── development_logs/
│   │   └── optimization_oct2025.md    # Parameter estimation session notes
│   └── outputs/
│       ├── main.pdf                   # Latest paper
│       └── slides.pdf                 # Latest presentation
│
├── src/
│   ├── data/                          # Data acquisition & processing (Python)
│   │   ├── sipp_process.py            # SIPP remote work metrics
│   │   ├── ipums_process.py           # CPS/ATUS processing
│   │   └── get_fred_q_theta.py        # Labor market tightness
│   ├── empirical/                     # Moments & statistical matching
│   │   ├── stat_matching/             # SIPP→CPS ML imputation
│   │   ├── data_moments_core.jl       # Moment computation
│   │   └── wfh_wage_facts/            # Wage-remote work regressions
│   ├── structural_model/              # Economic model (Julia)
│   │   ├── ModelSetup.jl              # Parameter initialization
│   │   ├── ModelSolver.jl             # Equilibrium computation
│   │   ├── ModelEconomics.jl          # Production & matching
│   │   └── GeneticAlgorithm.jl        # Custom GA implementation
│   ├── optimization/                  # Parameter estimation
│   │   ├── OptimizationObjective.jl   # SMM objective function
│   │   └── multi_start_local_search.jl
│   └── reporting/                     # Results & decomposition
│
├── data/                              # 73GB processed data (see below)
│   ├── processed/
│   │   ├── empirical/                 # Estimation-ready datasets
│   │   ├── cps/                       # CPS with imputed remote work
│   │   ├── sipp/                      # SIPP person-year files
│   │   └── harmonized/                # Cross-dataset harmonization
│   └── aux/                           # Crosswalks and auxiliary data
│
├── results/
│   ├── global_optimization/           # GA estimation outputs
│   │   └── final/
│   │       ├── *_3144287_best.yaml    # Best parameters (2019)
│   │       └── *_3144287.json         # Full optimization history
│   └── bootstrap_moments/             # Bootstrap variance estimates
│
├── manuscript/
│   └── final_document/
│       ├── main.pdf                   # Compiled paper
│       ├── main.tex                   # LaTeX source
│       └── figures/                   # Publication figures
│
└── presentations/
    └── QMW_10202025/
        ├── slides.pdf                 # Queen Mary Workshop slides
        └── slides.tex                 # Beamer source

Quick Access

  • 📄 Paper PDF - Latest manuscript version
  • 📊 Slides PDF - Queen Mary Workshop presentation
  • 📖 Technical Documentation - Pipeline and data documentation
  • 💾 Estimation Results - Parameter estimates and diagnostics


Data Availability

Processed data (73GB) is included in this repository at data/processed/. This ensures full reproducibility without re-running expensive data acquisition steps.

Key Processed Files

| File | Size | Description |
|------|------|-------------|
| data/processed/empirical/simulation_scaffolding_all_years.feather | 5.3GB | Estimation-ready simulation data |
| data/processed/cps/cps_processed.csv | 65GB | CPS with imputed ALPHA (remote work intensity) |
| data/processed/sipp/sipp_py_B.csv.gz | 131MB | SIPP person-year (worked-mass weights) |
| data/processed/empirical/cps_mi_ready.dta | - | Stata-ready CPS for analysis |
| data/aux/fred_q_theta.csv | - | Quarterly labor market tightness |

Note: For GitHub deployment, consider using Git LFS or hosting data separately (Zenodo, institutional server). See docs/DATA_AVAILABILITY.md for options.


Estimation Results Summary

2019 Baseline Parameters (Job 3144287)

Preference Parameters:

  • μ_z = -1.926 (mean taste for remote work, LogNormal)
  • σ_z = 0.744 (preference dispersion)

Technology Parameters:

  • ψ₀ = 0.772 (baseline teleworkability)
  • A₁ = 24.60 (skill productivity)

Model Fit:

  • mean_alpha: 0.0697 vs data 0.0675 (3% error) ✅
  • in-person share: 0.9187 vs data 0.8867
  • Objective value: 6.97 (SMM weighted distance)
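The "3% error" on mean_alpha is the relative gap between the model-simulated and empirical moments, which rounds to roughly 3%:

```python
# Relative error between simulated and data moment for mean_alpha
rel_err = abs(0.0697 - 0.0675) / 0.0675
print(rel_err)  # ~0.033, i.e. about 3%
```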

Estimation Details:

  • Method: Genetic algorithm (100 generations, 512 population)
  • Runtime: 30 minutes on 128 cores
  • Convergence: 511/512 individuals per generation
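The GA loop can be sketched as follows. This is a generic elitist genetic algorithm in the spirit of the settings above, not the repository's `GeneticAlgorithm.jl`; `smm_loss` is a toy stand-in for the SMM objective with a known minimum, and the population size and generation count are scaled down so the sketch runs quickly.

```python
# Minimal elitist genetic-algorithm sketch (toy objective, illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def smm_loss(theta):
    # Toy objective with a known minimum at (-1.9, 0.75)
    return float(np.sum((theta - np.array([-1.9, 0.75])) ** 2))

def run_ga(pop_size=512, n_gen=50, n_params=2, elite_frac=0.1, sigma=0.3):
    pop = rng.uniform(-3, 3, size=(pop_size, n_params))
    for _ in range(n_gen):
        fitness = np.array([smm_loss(ind) for ind in pop])
        order = np.argsort(fitness)
        elites = pop[order[: int(elite_frac * pop_size)]]
        # Next generation: mutated copies of randomly chosen elites
        parents = elites[rng.integers(len(elites), size=pop_size)]
        pop = parents + rng.normal(0, sigma, size=pop.shape)
        pop[0] = elites[0]   # elitism: carry the best-so-far unmutated
        sigma *= 0.95        # anneal the mutation scale
    fitness = np.array([smm_loss(ind) for ind in pop])
    return pop[np.argmin(fitness)]

best = run_ga()
print(best)  # close to (-1.9, 0.75)
```

The production version evaluates each individual's full SMM objective (solving the model at that parameter vector), which is what makes the 128-core parallelism worthwhile.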

Requirements

Software Dependencies

Python (3.10+):

pip install -r requirements/python_requirements.txt
# Key packages: polars, pandas, scikit-learn, lightgbm

Julia (1.8+):

using Pkg
Pkg.activate(".")
Pkg.instantiate()
# Key packages: DataFrames, Arrow, Optim, StatsBase

Stata (17+):

  • See requirements/stata_requirements.txt for user-written packages
  • reghdfe, estout, gtools

System Requirements

  • Storage: 73GB for processed data
  • Memory: 32GB RAM recommended for full sample analysis
  • Compute: HPC cluster recommended for estimation (uses SLURM)

Quick Start

1. Explore Processed Data

using Arrow, DataFrames, Statistics

# Load estimation-ready simulation data
scaff = Arrow.Table("data/processed/empirical/simulation_scaffolding_all_years.feather") |> DataFrame

# Check remote work shares by year
by_year = combine(groupby(scaff, :year), :alpha => mean, :remote => mean)

2. View Estimation Results

# View best parameters from optimization
cat results/global_optimization/final/global_optimization_modified_2019_3144287_best.yaml

3. Run Model with Estimated Parameters

# Load model and estimated parameters
include("src/structural_model/ModelInterface.jl")

# Quick model test
include("test_model_quick.jl")

4. Reproduce Moments

# Compute empirical moments
include("src/empirical/data_moments_core.jl")

Skills Demonstrated

Data Engineering

  • Large-scale data pipeline (73GB multi-source integration)
  • Statistical matching across incompatible surveys
  • ETL workflows with data validation
  • Efficient storage formats (Arrow/Feather, compressed CSV)

Statistical & Econometric Methods

  • Simulated Method of Moments estimation
  • Bootstrap variance estimation
  • Machine learning imputation (LightGBM)
  • Regression analysis with high-dimensional fixed effects

Computational Methods

  • Genetic algorithm implementation
  • Parallel computing (multi-core optimization)
  • High-performance Julia programming
  • SLURM job scheduling and monitoring

Software Engineering

  • Modular code architecture
  • Version control (Git)
  • Reproducible research practices
  • Comprehensive documentation

Economic Modeling

  • Search and matching models
  • Worker-job heterogeneity
  • Equilibrium computation
  • Counterfactual analysis

Project Timeline

  • 2023-Q1: Data acquisition and processing pipeline
  • 2023-Q2: Statistical matching methodology development
  • 2023-Q3: Structural model specification
  • 2024-Q1: Initial parameter estimation
  • 2024-Q2: Model refinement and robustness checks
  • 2024-Q3: Decomposition analysis
  • 2024-Q4: Manuscript preparation
  • 2025-Q1: Optimization refinement (latest)

Citation

If you use this code or data, please cite:

@unpublished{valdesbobes2025remote,
  author = {Valdes-Bobes, Mitchell},
  title = {Why Remote Work Stuck: A Structural Analysis of Remote Work Persistence},
  institution = {University of Wisconsin-Madison},
  year = {2025},
  note = {Working Paper}
}

Or use the included CITATION.cff file.


License

  • Code: MIT License (see LICENSE)
  • Data: Original data sources have separate licenses (IPUMS, Census Bureau)
  • Paper: All rights reserved

Contact

Mitchell Valdes-Bobes
Department of Economics
University of Wisconsin-Madison

For questions about the code or data, please open an issue in this repository.


Acknowledgments

This research uses data from:

  • IPUMS CPS (Flood et al. 2024)
  • Survey of Income and Program Participation (U.S. Census Bureau)
  • O*NET (U.S. Department of Labor)
  • FRED (Federal Reserve Economic Data)

Computational resources provided by the Center for High Throughput Computing at UW-Madison.


Version: 1.0
Last Updated: November 2, 2025
Repository: why_remote_work_stuck
