Conversation
…or thread safety In parallel ODE solving, multiple threads reading/writing the shared op->badSolve flag creates a race condition where one thread's failed solve can incorrectly prevent other threads from continuing. This introduces a localBadSolve variable in ind_indLin0, ind_liblsoda0, and ind_lsoda0 that tracks bad-solve state per-individual, and removes the shared op->badSolve check from iniSubject in par_solve.h. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These shared fields are written from multiple threads during parallel ODE solving. Without synchronization this is undefined behavior in C/C++. Use #pragma omp atomic write to ensure thread-safe stores. Also refactor push*Dose allocation failure handling to use local flags instead of writing op->badSolve inside critical sections then reading it outside. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rf_errorcall is not thread-safe and will crash or corrupt R state when called from OpenMP worker threads. Replace with the same NA_REAL return pattern already used for backward==2 case, plus setting the badSolve flag atomically to signal the error after the parallel region. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The abort flag is read by all threads and written by the master thread without synchronization. Using volatile prevents the compiler from caching the value in a register, ensuring threads see updates promptly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rx_get_thread(mx) used <= which allowed index==mx (one past the end of per-thread arrays sized to mx). Changed to < for correct bounds. Also changed the overflow fallback from returning 0 (which collides with thread 0's slot) to clamping to mx-1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
linCmtB used a single global stan::math::linCmtStan object and associated Eigen matrices, making it unsafe for parallel execution. Refactored to use the same per-thread vector pattern as linCmtA: - Created linB_t struct holding all per-thread state - Changed __linCmtB from a single object to std::vector<linB_t> - Added ensureLinCmtB(nCores) called during solve setup - linCmtB() now indexes __linCmtB by omp_get_thread_num() - R-facing utility functions use element 0 (main thread) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all 16 file-scope static variables in dop853.c into a dop853_ctx_t struct defined in dop853.h. Each call to dop853() now allocates the context on the stack, making the solver reentrant and safe for concurrent use from multiple threads. Changes: - Add dop853_ctx_t struct to dop853.h containing nfcn/nstep/naccpt/nrejct, hout/xold/xout, nrds/indir, yy1/k1-k10, and rcont1-rcont8 - dopcor() takes dop853_ctx_t* as first parameter, uses local aliases for array pointers to minimize diff in the computation body - contd8() becomes static and takes dop853_ctx_t* (never called externally since all callers use iout=0) - Remove accessor functions (nfcnRead, nstepRead, etc.) which were never called from outside dop853.c - Public dop853() API is unchanged - callers need no modifications This enables future parallelization of par_dop() with OpenMP, which was previously blocked by the comment "This part CAN be parallelized, if dop is thread safe..." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in src/handle_evid.cpp by keeping our thread-safe version (atomic badSolve flag + NA_REAL return) over main's Rf_errorcall calls, which are not safe from OpenMP parallel regions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The macro `#define J lcb.J` causes `lcb.J` on line 683 to expand to `lcb.lcb.J`, which fails because linB_t has no member named lcb. Use the macro name `J` directly instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Some of these are more dangerous like the dop optimizations. Claude (at least for me), isn't exactly always doing the right things 😄 |
|
But they don't hurt. It just takes a review and testing I guess. |
|
Yeah, I looked at each and they seemed reasonable to me, but I'm not sure that they are. I have several projects now that need faster estimation, so I thought, what faster way than making nlmixr2 faster! 😄 |
|
Well all the pooled estimation methods are now multi-threaded in the development version. This issue with focei may be in The fixes here may make the threading more stable, but |
|
I think the key is to make tests such that you feel you're adequately plumbing the depths. That's what I've been doing. I'll take a function that I want Claude to work on an just build a battery of tests for it -- I have claude make the tests :). Then I have it do what I want with those tests run at the end to make sure it didn't break anything. I think the key is it may be difficult to do this with all those. Though, like I said, you can have Claude make them. |
|
For #991 I have re-prompted and it still doesn't get the code right yet... It has tests but it isn't producing the right behavior. |
|
Are you using copilot? If you can, try pulling it to your personal computer and use claude code it's indescribably better. Copilot at Pfizer is still telling me to use R functions that dont exist. |
|
I am using claude in the terminal. I will have to wait unitl tomorrow since I have maxed the usage again. |
|
(You can see that Bill is also using claude. In github it shows Bill and I are co-authoring with Claude). |
|
Sorry. You mentioned copilot in the meeting the other day. I'll try and see if I can help tomorrow after work. Can you point out the tests you're using? |
|
And FYI, the reason I came here is that a crash with some of the work I'm having Claude do in nlmixr2est noted a crash in rxode2 (mainly around the unsuccessful integration flag). So, I thought that I'd do a general check rather than just the targeted check for the one issue already identified. |
|
It seems more reasonable now, but it doens't pass the test suite. I am asking for it to fix it. We will see what happens. |
|
This passes, asking for copilot review. |
There was a problem hiding this comment.
Pull request overview
This PR is part of a broader effort to make rxode2 (and by extension nlmixr2 FOCEI estimation) more thread-safe for parallel ODE solving under OpenMP.
Changes:
- Refactored
dop853from file-scope static variables to a per-call context struct (dop853_ctx_t), making it fully reentrant. - Refactored
linCmtBfrom a single global object to a per-threadstd::vector<linB_t>, matching the existinglinCmtAthreading pattern. - Replaced reads of the shared
op->badSolveflag mid-solve with per-individuallocalBadSolvevariables, and protected writes toop->badSolve/op->naTimewith#pragma omp atomic write; replaced non-thread-safeRf_errorcallcalls in hot-path functions.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/rxomp.h |
Fixed off-by-one in thread index check (<= → <) and changed out-of-range fallback from slot 0 to last valid slot |
src/rxData.cpp |
Added call to ensureLinCmtB(nCores) before parallel solve to pre-allocate per-thread linCmtB state |
src/par_solve.h |
Removed early op->badSolve exit from iniSubject to prevent one thread's failure from blocking other threads' initialization |
src/par_solve.cpp |
Added localBadSolve variables, volatile int abort, and _Pragma("omp atomic write") in badSolveExit macro |
src/linCmt.cpp |
Replaced global __linCmtB singleton with a per-thread std::vector<linB_t>; added ensureLinCmtB |
src/handle_evid.cpp |
Replaced Rf_errorcall (not thread-safe) with atomic write to op->badSolve + return NA |
src/dop853.h |
Added dop853_ctx_t struct; removed declarations of deleted accessor functions |
src/dop853.c |
Moved all static solver state into a local dop853_ctx_t on the stack; deleted global accessor functions |
inst/include/rxode2parseHandleEvid.h |
Used allocFailed flag + #pragma omp atomic write instead of direct op->badSolve assignment inside/outside critical sections |
inst/include/rxode2parseGetTime.h |
Replaced direct writes to op->badSolve/op->naTime with atomic writes via temporary variables |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
inst/include/rxode2parseGetTime.h
Outdated
| if (ISNA(ret)) { | ||
| op->badSolve=1; | ||
| if (op->naTime == 0) { | ||
| op->naTime = 1 + 10*cmt; | ||
| } | ||
| int newBadSolve = 1; | ||
| int newNaTime = 1 + 10*cmt; | ||
| #pragma omp atomic write | ||
| op->badSolve = newBadSolve; | ||
| #pragma omp atomic write | ||
| op->naTime = newNaTime; |
There was a problem hiding this comment.
The old code had if (op->naTime == 0) { op->naTime = 1 + 10*cmt; } to preserve the first error's diagnostic code. The new code unconditionally overwrites op->naTime with every error occurrence. In parallel mode this means the error code used for the final error message (in rxode2_df.cpp:212-239) is non-deterministic — it may reflect a later/different error rather than the first one that triggered the bad solve. While op->badSolve just needs to be set once, the op->naTime diagnostic value ideally should preserve the first non-zero value to give a consistent error message. The same behavior change applies to getRate, getDur, and handleInfusionItem in this file.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
src/dop853.h
Outdated
| double *yy1, *k1, *k2, *k3, *k4, *k5, *k6, *k7, *k8, *k9, *k10; | ||
| double *rcont1, *rcont2, *rcont3, *rcont4; | ||
| double *rcont5, *rcont6, *rcont7, *rcont8; | ||
| } dop853_ctx_t; |
There was a problem hiding this comment.
The dop853.h documentation comment block (lines 160-171) still references the now-deleted accessor functions (xRead, hRead, nstepRead, naccptRead, nrejctRead, nfcnRead) and the comment at line 77 documents contd8(i,s) with the old two-argument signature, whereas the refactored contd8 is now static and takes three arguments (dop853_ctx_t *ctx, int ii, double x). This documentation section should be updated or removed to avoid confusion for future maintainers.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@mattfidler I've opened a new pull request, #994, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@mattfidler I've opened a new pull request, #995, to work on those changes. Once the pull request is ready, I'll request review from you. |
- Update contd8() signature from 2 to 3 arguments (ctx, i, s) - Replace deleted accessor functions (xRead, hRead, etc.) with equivalent dop853_ctx_t struct field documentation - Update "Remarks about C version" to mention context struct - Fix typo: 'stff' -> 'stiff' Co-authored-by: mattfidler <514778+mattfidler@users.noreply.github.com>
…-write pattern Co-authored-by: mattfidler <514778+mattfidler@users.noreply.github.com>
Preserve first error's diagnostic code in op->naTime during parallel solve
Fix stale documentation in dop853.h after thread-safe refactoring
This is part of an effort I'm trying to make focei estimation in nlmixr2 more thread safe.