Skip to content

replace SimpleNamespace configs with typed dataclasses and enums#24

Open
matthewcornell wants to merge 4 commits intomainfrom
mc/simple-namespace-refactor
Open

replace SimpleNamespace configs with typed dataclasses and enums#24
matthewcornell wants to merge 4 commits intomainfrom
mc/simple-namespace-refactor

Conversation

@matthewcornell
Copy link
Member

@matthewcornell matthewcornell commented Feb 23, 2026

This PR is part of Operational models refactoring ideas #42. It is paired with the operational-models PR [replace SimpleNamespace configs with typed dataclasses and enums #43].

This PR replaces SimpleNamespace configs with typed dataclasses and enums. It is the first step in Operational models refactoring ideas #42. Specifically we introduce a dedicated config.py module with concrete, typed classes replacing ad-hoc SimpleNamespace objects used for model and run configuration. Key changes:

  • Add DataSource, Disease, PowerTransform, and PoolingStrategy enums
  • Add abstract ModelConfig and RunConfig dataclasses with concrete subclasses: SARIXModelConfig, SARIXFourierModelConfig, GBQRModelConfig, SARIXRunConfig, and GBQRRunConfig
  • Export all config classes and models from __init__.py
  • Update gbqr.py and sarix.py to consume typed config objects
  • Update integration tests to construct typed configs instead of SimpleNamespace

We also:

  • Bump version to 1.3.0
  • Update pyproject.toml to pin pandas to 2.* (pandas 3.* introduced breaking changes)
  • Update uv.lock to use latest iddata commit

Copy link
Contributor

@lshandross lshandross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few comments, questions, and places where I think we should get Nick's input, but this looks so much better, Matt!

class Disease(str, Enum):
FLU = "flu"
COVID = "covid"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RSV should be added


class PowerTransform(str, Enum):
FOURTH_ROOT = "4rt"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add none

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should have a NONE option as well.

theta_pooling: PoolingStrategy = PoolingStrategy.NONE
sigma_pooling: PoolingStrategy = PoolingStrategy.NONE
x: list = field(default_factory=list)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check with @nickreich if these are indeed the defaults we want for the SARIX model class parameters (and we should do the same for the GBQR model class)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel that strongly about what the defaults are/should be. Defaulting to NONE seems reasonable to me.



@dataclass
class SARIXRunConfig(RunConfig):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed whether it made sense to move these parameters to the model class and decided on getting @nickreich 's opinion. I think it could make sense to eliminate SARIX and GBQR run config classes if possible since they store only a few parameters unique to the particular model

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! We could have documentation on some of the enums within the classes for users?

date = datetime.date.fromisoformat("2024-01-06")
fips_codes = ["US", "01", "02", "04", "05"] # fewer locs for faster testing
# model_config = create_test_sarix_model_config(main_source=["nhsn"], theta_pooling="shared", sigma_pooling="none")
# model_config = create_test_sarix_model_config(main_source=[DataSource.NHSN], theta_pooling="shared", sigma_pooling="none")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I had left this commented code in by mistake. We can get rid of it

date = datetime.date.fromisoformat("2024-01-06")
fips_codes = ["US", "01", "02", "04", "05"] # fewer locs for faster testing
# model_config = create_test_sarix_model_config(main_source=["nhsn"], theta_pooling="shared", sigma_pooling="none")
# model_config = create_test_sarix_model_config(main_source=[DataSource.NHSN], theta_pooling="shared", sigma_pooling="none")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think we can get rid of this commented code

Copy link
Contributor

@trobacker trobacker Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! One thing to keep in mind is that if model development goes through many iterations, we may end up having a lot of enums in the configs. I think we should keep some kind of a doc for their purpose and intention. E.g. what does "Pooling" mean to a user of the model (even if it may be obvious).

Copy link
Contributor

@trobacker trobacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I left a comment about documenting the enums for users of the code and supported some of @lshandross's comments. I'll leave approval for one more by Nick for now.

Copy link
Member

@nickreich nickreich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments

theta_pooling: PoolingStrategy = PoolingStrategy.NONE
sigma_pooling: PoolingStrategy = PoolingStrategy.NONE
x: list = field(default_factory=list)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel that strongly about what the defaults are/should be. Defaulting to NONE seems reasonable to me.

@nickreich
Copy link
Member

Re: whether to keep RunConfig subclasses or move inference params to ModelConfig

I think the inference/fitting parameters (num_warmup, num_samples, num_chains for SARIX; num_bags, bag_frac_samples for GBQR) belong on ModelConfig, not a model-specific RunConfig. A few reasons:

  1. They're model-specific. SARIX has warmup/samples/chains, GBQR has bags/bag_frac. These are different knobs for different estimation procedures — they don't belong in a shared RunConfig base class, and having model-specific RunConfig subclasses just to hold them feels like the wrong abstraction.

  2. The current code is already inconsistent. In operational-models, short_run mutates model_config.num_bags for GBQR but run_config.num_warmup for SARIX. Putting them all on ModelConfig resolves this.

  3. It simplifies RunConfig. If inference params move to ModelConfig, the remaining RunConfig fields (disease, ref_date, locations, quantiles, output paths) are truly model-agnostic. You can drop SARIXRunConfig and GBQRRunConfig entirely and have a single RunConfig class.

On formalizing short_run: Short runs are a core workflow (used in testing, CI, and development), so it's worth making them a documented part of the model config rather than leaving them as ad-hoc mutations in caller scripts. A lightweight way to do this: put production defaults on the ModelConfig fields, and add a short_run() method that each model subclass defines:

@dataclass
class SARIXModelConfig(ModelConfig):
    # ... structural params ...
    num_warmup: int = 2000
    num_samples: int = 2000
    num_chains: int = 1

    def short_run(self):
        """Return a copy with reduced inference intensity for testing."""
        return dataclasses.replace(self, num_warmup=100, num_samples=100)

@dataclass
class GBQRModelConfig(ModelConfig):
    # ... structural params ...
    num_bags: int = 100
    bag_frac_samples: float = 0.7

    def short_run(self):
        return dataclasses.replace(self, num_bags=10)

This way each model defines what "short run" means for itself, it's discoverable/documented, and the caller just does model_config.short_run() without needing to know which knobs to turn. The q_levels reduction for short runs would stay in the caller since that's an output format concern, not a model intensity concern.

@nickreich
Copy link
Member

The above was drafted (obviously, I think) with Claude. But with substantial steering from me, especially towards the part of thinking of the short_run flag as part of how a model is defined formally.

@matthewcornell
Copy link
Member Author

Thanks, @nickreich ! I've integrated your suggestions into the plan at this comment.

@matthewcornell
Copy link
Member Author

@nickreich Re: whether to keep RunConfig subclasses or move inference params to ModelConfig:

I think the inference/fitting parameters (num_warmup, num_samples, num_chains for SARIX; num_bags, bag_frac_samples for GBQR) belong on ModelConfig, not a model-specific RunConfig.

num_warmup, num_samples, num_chains are currently in SARIXRunConfig, and I agree they should be moved to SARIXModelConfig, and then delete SARIXRunConfig. This is as you outlined. However gbqr's num_bags and bag_frac_samples are already in GBQRModelConfig. I wonder if you meant that save_feat_importance should be moved from GBQRRunConfig to GBQRModelConfig and then delete GBQRRunConfig?

@nickreich
Copy link
Member

I mean, I think we need to move it to the GBQRModelConfig if we want to remove the RunConfig objects. It doesn't feel like the save_feat_importance "fits" there as well as the others since it's not really a model parameter, but I guess it's just ok and we should make the move.

@matthewcornell
Copy link
Member Author

I mean, I think we need to move [save_feat_importance] to the GBQRModelConfig if we want to remove the RunConfig objects. It doesn't feel like the save_feat_importance "fits" there as well as the others since it's not really a model parameter, but I guess it's just ok and we should make the move.

@lshandross , @trobacker What are your thoughts about save_feat_importance? I'm a little concerned that Nick's not seeing an obvious place for it to go.

@lshandross
Copy link
Contributor

Another suggestion would be to make it an optional argument (I don't know if that's the correct terminology) in the general run config that only gets used if the model is a GBQR. But I don't know if I have a strong preference between my suggestion and Nick's

…o corresponding *ModelConfig classes. made RunConfig concrete. added Disease.RSV, PowerTransform.NONE, updated gbqr.py to handle Disease.RSV and PowerTransform.NONE
@matthewcornell
Copy link
Member Author

I've updated the code to include suggested changes. Please review d74fe38 .

Copy link
Contributor

@lshandross lshandross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one comment but it's not blocking. Thank you for making the changes, Matt!


def create_test_sarix_run_config(ref_date, states, hsas, num, tmp_path):
run_config = SARIXRunConfig(
def create_test_sarix_run_config(ref_date, states, hsas, tmp_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this helper function should be taken out of this script and moved into a separate one (the ones to create sarix and gbqr model configs could also be moved here), as well as renamed to be just create_test_run_config which can be reused by both sarix and gbqr models. If others agree, then maybe we can file an issue for it since it doesn't really belong in this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants