350 mice #356

Open
RobertClay wants to merge 647 commits into development from 350_MICE

RobertClay (Collaborator) commented Dec 12, 2023

Adds rudimentary MICE (multiple imputation by chained equations) to the MINOS data pipeline.

Added a full_mice_dataset command that runs MICE on an arc4 node. It takes about 20 mins once the job loads.

The data pipeline is now split in two, producing separate populations for the transition data and the input data.

TODO in the future

  • Integrate MICE data into the transitions as well, using the with() command.
  • Much more detail on how imputation changes the distributions of key variables. This will need a lot of notebooks etc.
  • Validate the scaled populations under imputation as well. This has already sort of been started for scotpop.
  • Merging on non-imputed data rather than using cbind might be a bit less sensitive to the input data, but will induce missing data.
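
To illustrate the last point, here is a minimal pandas sketch of the trade-off between a cbind-style column paste and a merge on an identifier. It is not the pipeline's code: the toy data and the `region` and `income` columns are made up for illustration, and it only assumes `pidp` is the person identifier.

```python
import pandas as pd

# Hypothetical toy data; only `pidp` comes from the discussion above.
raw = pd.DataFrame({"pidp": [1, 2, 3, 4], "region": ["N", "S", "E", "W"]})

# Imputed output in a different row order, with pidp 5 in place of pidp 4.
imputed = pd.DataFrame({"pidp": [3, 1, 2, 5],
                        "income": [300.0, 100.0, 200.0, 500.0]})

# cbind-style: paste columns side by side by row position, ignoring pidp.
# Any reordering or mismatch in the input silently attaches values to the
# wrong person.
cbind_like = pd.concat(
    [raw.reset_index(drop=True),
     imputed.drop(columns="pidp").reset_index(drop=True)],
    axis=1,
)

# merge-style: align rows on pidp. Row order no longer matters, but people
# with no match in the imputed data end up with missing values.
merged = raw.merge(imputed, on="pidp", how="left")

print(cbind_like)
print(merged)
```

In this toy case the positional paste gives pidp 1 the income that belongs to pidp 3, while the merge keeps the values aligned and leaves pidp 4 with a missing income.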

RobertClay requested a review from ld-archer on December 12, 2023 15:16
RobertClay requested a review from ld-archer on February 9, 2024 15:44
ld-archer (Collaborator) left a comment

Have added the functionality to run the default simulation with MICE data locally, for testing and development. Handovers from this show that predicted income needs some work on scaling. Could perhaps try a log-normal LMM as in branch 231.

Also reverted the old_pidp stuff back to just pidp, as it broke all non-synthetic simulation runs. The idea is good and I understand the need, but the execution needs to work with both synthetic and non-synthetic runs.
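
One possible way to support both kinds of run is to pick the identifier column based on what the data actually contains. This is only a sketch: it assumes old_pidp is present only in synthetic populations, which this thread does not confirm, and `pick_id_column` and the `hh_income` column are hypothetical names rather than anything in the MINOS code.

```python
def pick_id_column(columns, preferred="old_pidp", fallback="pidp"):
    """Return the identifier column to use for the given column names.

    Uses `preferred` when it exists (assumed here to mean a synthetic
    population) and falls back to `fallback` otherwise.
    """
    available = set(columns)
    if preferred in available:
        return preferred
    if fallback in available:
        return fallback
    raise KeyError(f"neither {preferred!r} nor {fallback!r} found in columns")


# Hypothetical column sets for a synthetic and a non-synthetic population.
synthetic_cols = ["pidp", "old_pidp", "hh_income"]
real_cols = ["pidp", "hh_income"]

print(pick_id_column(synthetic_cols))
print(pick_id_column(real_cols))
```

Downstream code would then use the returned name instead of hard-coding either column, so a non-synthetic run never touches old_pidp.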

Can see that the data prep is done for cross-validation, but none of the simulation targets use this data, and the visualisation script is not set up to visualise anything but default non-MICE runs. CV is even more important here as we're separating the transition and input populations.

Nothing is set up for the SIPHER7 experiment. If we are going to shift all our default runs to a MICE-imputed population then we need an input population that works for both. I'm happy to do this myself once the above problems are sorted.
