
process_quantile differing number of rows output for rstan and cmdstanr #269


Open
micahwiesner67 opened this issue May 14, 2025 · 2 comments

Comments

@micahwiesner67
Collaborator

micahwiesner67 commented May 14, 2025

Fitting a model with the 'cmdstanr' backend and with the 'rstan' backend on the same simple test data (test_data.parquet), then passing the results through process_quantiles, yields a different number of output rows for the two backends.

Intuition tells me that if we are processing data for 5 reference_dates, the results should contain 55 rows: each variable that has a bound contributes 2 rows per reference_date (one for the 0.5 and one for the 0.95 quantile reporting), while processed_obs_cases has just 1 row per reference_date. The cmdstanr output has 57 rows, while the rstan output has the documented 55 rows, as expected.
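To make the expected count explicit, here is a minimal base R sketch of the arithmetic. Only the 5 reference_dates, the two quantile widths, and the totals 55 and 57 come from this issue; the number of bounded variables is not stated above and is inferred here by solving for it, so treat it as an assumption.

```r
# Facts from the issue: 5 reference dates, 2 interval widths (0.5 and 0.95),
# 1 unbounded variable (processed_obs_cases), expected total of 55 rows.
n_reference_dates <- 5
n_interval_widths <- 2
n_point_variables <- 1
expected_total    <- 55
observed_cmdstanr <- 57

# Assumption: total = n_dates * (n_widths * n_bounded + n_point).
# Solving for the number of bounded variables (not stated in the issue):
n_bounded_variables <-
  (expected_total / n_reference_dates - n_point_variables) / n_interval_widths

# Hypothetical helper that reproduces the expected row count.
expected_rows <- function(n_dates, n_bounded, n_widths = 2, n_point = 1) {
  n_dates * (n_widths * n_bounded + n_point)
}

surplus_rows <- observed_cmdstanr -
  expected_rows(n_reference_dates, n_bounded_variables)
```

Under that assumption the formula is consistent with 5 bounded variables, and the cmdstanr output carries 2 rows more than expected.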

It is unclear at this time whether this is due to code in process_quantiles, the extract_draws_from_fit function, or the underlying EpiNow2::epinow() function.

cmdstanr result - bad

Image

rstan result - good

Image

My current hypothesis is that the extract_draws_from_fit function is not the issue, and that the problem arises in the post_process_and_merge function code.

@zsusswein @kgostic putting here for documentation

@kgostic
Collaborator

kgostic commented May 14, 2025

Let me know if you want to jump on a call tomorrow. I'd be curious about the underlying values too.

  1. Let's take pp_nowcast_cases as a case study. It looks like in both cases, we get the number of rows we expect, but do we get the same values? I think that's one of the variables you're trying to plot that isn't rendering.
  2. Then focus on processed_obs_data. How do those values compare? What rows are present in the cmdstan result but not in the rstan?
  3. Same Q as 2 for growth rate.
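One way to run the comparisons in the list above is sketched below in base R. The two data frames are toy stand-ins for the rstan and cmdstanr outputs, and the column names (`variable`, `time_index`, `value`) are hypothetical, not the package's actual schema.

```r
# Toy stand-ins for the two outputs; column names are hypothetical.
rstan_out <- data.frame(
  variable   = c("pp_nowcast_cases", "pp_nowcast_cases", "growth_rate"),
  time_index = c(1, 2, 1),
  value      = c(10, 12, 0.1)
)
# Repeat one row on purpose to mimic a backend that emits an extra row.
cmdstanr_out <- rbind(rstan_out, rstan_out[3, ])

# 1. Row counts per variable, side by side.
count_rows <- function(df) {
  as.data.frame(table(variable = df$variable), responseName = "n")
}
counts <- merge(count_rows(rstan_out), count_rows(cmdstanr_out),
                by = "variable", suffixes = c("_rstan", "_cmdstanr"))
counts$diff <- counts$n_cmdstanr - counts$n_rstan

# 2. Keys present in one output but not the other.
key <- function(df) paste(df$variable, df$time_index, sep = "|")
only_in_cmdstanr <- setdiff(key(cmdstanr_out), key(rstan_out))
only_in_rstan    <- setdiff(key(rstan_out), key(cmdstanr_out))

# 3. Do the shared keys carry the same values?
both <- merge(unique(rstan_out), unique(cmdstanr_out),
              by = c("variable", "time_index"),
              suffixes = c("_rstan", "_cmdstanr"))
values_match <- isTRUE(all.equal(both$value_rstan, both$value_cmdstanr))
```

A nonzero `diff` with empty key differences, as in this toy case, points at repeated rows rather than genuinely new ones.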

@micahwiesner67
Collaborator Author

micahwiesner67 commented May 22, 2025

  1. We get the same values coming out of the summaries (excluding the duplicate rows for processed_obs_data and growth_rate).
  2. The processed_obs_data rows are duplicated, and the growth_rate rows are duplicated as well.
  3. The values that come out of both are the same, but growth_rate is actually missing two rows: both values for the final date (time index = 5) are dropped.
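Both symptoms described above, repeated rows and a missing final time index, can be checked with a few lines of base R. This is only a sketch on a toy table; the column names and the expected index range are assumptions for illustration.

```r
# Toy table with one repeated row per variable and a missing final index
# for growth_rate; column names are hypothetical.
draws <- data.frame(
  variable   = c(rep("processed_obs_data", 3), rep("growth_rate", 2)),
  time_index = c(1, 2, 2, 1, 1),
  value      = c(5, 7, 7, 0.1, 0.1)
)

# Rows that are exact copies of an earlier row.
dup_rows <- draws[duplicated(draws), ]

# Time indices each variable covers, compared against the expected range.
expected_index <- 1:2
covered <- split(draws$time_index, draws$variable)
missing_by_variable <- lapply(covered, function(x) setdiff(expected_index, x))
```

Here `dup_rows` lists the repeated rows and `missing_by_variable$growth_rate` reports the dropped final index.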

After further probing, this discrepancy does arise in the draws_table

Additionally, I just tested whether the installed cmdstanr version could be related to this issue; this does not seem to be the case. I removed the commit hash in the DESCRIPTION file and installed cmdstanr version 0.9.0 (with the same version of EpiNow2, 1.6.1), and the duplicate rows are still an issue.

@PatrickTCorbett can you look into a band-aid fix in the post-processing?
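As a rough idea of what such a stopgap could look like, the sketch below drops exact duplicate rows and fails loudly if copies of the same key disagree. The function name `dedupe_summaries` and the column names are hypothetical, not part of the package, and this would not restore the missing growth_rate rows.

```r
# Hypothetical stopgap: drop exact duplicate rows from a summary table.
dedupe_summaries <- function(df, key_cols) {
  out <- df[!duplicated(df), , drop = FALSE]
  # If keys still repeat, the copies carry different values; do not hide that.
  if (anyDuplicated(out[, key_cols, drop = FALSE]) > 0) {
    stop("duplicate keys with conflicting values remain")
  }
  rownames(out) <- NULL
  out
}

# Toy input with one exact duplicate; column names are hypothetical.
summaries <- data.frame(
  variable   = c("growth_rate", "growth_rate", "processed_obs_data"),
  time_index = c(1, 1, 1),
  value      = c(0.1, 0.1, 5)
)
cleaned <- dedupe_summaries(summaries, key_cols = c("variable", "time_index"))
```

Stopping on conflicting copies keeps the workaround from silently masking a real difference between backends.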
