Open
Conversation
- Break long lines in sandbox.py to fit 120-char limit - Add noqa comments for imports in __init__.py - Format tim_file_analyzer.py with black
…sts: parity + logs via savepar()
…sts: parity + logs via savepar()
- Protocol - Add hello proto_version=1.2 and capabilities: get_kind, dir, setitem, get_slice, path_access - Non-exceptional attribute discovery (get-kind) and optional dir RPC - Array semantics - Introduce write-through ArrayProxy for numpy-backed attrs (stoas, toaerrs, freqs) - Reads expose plain numpy via __array__; __repr__/__str__/__getattr__ delegate to ndarray - Writes route via setitem RPC; add get_slice RPC to avoid fetching whole arrays for reads - Guard __len__ for 0-d; support fancy/masked indexing; optional safe dtype cast on set - Dotted paths - Gate first-hop mapping access to mapping-like (__getitem__) objects only - Support psr['PAR'].val/err/fit/set via dotted-path resolution - Process lifecycle & IO - Popen: pass env, close_fds, start_new_session, Windows CREATE_NEW_PROCESS_GROUP when available - Group kill with POSIX killpg; Windows terminate/kill fallbacks - Thread-safe RPC framing with a per-worker send lock - Errors & logging - Stderr ring with optional tail included in exceptions; cap tail by bytes (16KiB) and lines - Tests - Add unit tests comparing sandbox vs native for parameter mapping and TOA edits+fit - Full suite green: array writes now update worker; residuals match native after fit
…ot installed. This fixes that bug.
Collaborator
Author
|
Hi @mattpitkin , It's been a while. I don't know how many people use this, but I do. I have fixed certain bugs now and then, and for me it's very usable. Especially in unit tests that would trip the github CI in other packages. You think we can merge it? |
mattpitkin
approved these changes
Mar 20, 2026
Collaborator
mattpitkin
left a comment
There was a problem hiding this comment.
Looks good to me. I'd be happy to merge this, especially as it is standalone from the normal usage and shouldn't affect anything else. However, I'd like to give @vallis a couple of days to comment before we hit merge.
Collaborator
Author
|
Yes, that's reasonable. And indeed, it's fully separate, so nothing will break. Thanks! |
Bug fixes: - Fix RSS calculation always returning 0 on Linux (integer division order) - Fix load_many silently producing empty results when given a generator - Fix _rpc retry not handling setitem (would misroute as call after crash) Structural improvements: - Decompose _worker_stdio_main (430→60 line loop) into handler functions - Only recycle worker on Crashed/Timeout, not on all Tempo2Error subclasses - Route all attribute access through _get_kind_safe for crash recovery - Extract _dispatch_rpc so try/except retry paths are identical - Factor duplicated venv search paths into _venv_search_paths() Cleanup: - Remove dead _ArrayProxy.__del__ (referenced non-existent self._wp) - Fix _rpc_lock TOCTOU race by moving init to _WorkerProc.__init__ - Remove double-retry in load_many (tempopulsar already retries internally) - Make policy an explicit __init__ parameter instead of kwargs.pop - Move threading/collections to module-level imports - Drop Windows compat (libstempo is Linux/macOS only) - Replace blanket flake8 noqa: E501 with project .flake8 config New tests (16): - load_many: generator input, bad files, empty input - Live crash recovery: SIGKILL worker + auto-recover, state preservation - _ArrayProxy arithmetic: mul, add, sub, div, np.asarray, len/shape - Explicit policy parameter
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add Sandbox Mode for Crash-Protected libstempo Usage
Summary
This PR introduces a comprehensive sandbox mode for libstempo that provides crash isolation and automatic retry capabilities. The sandbox runs each
tempopulsarinstance in a separate subprocess, preventing tempo2 crashes from affecting the main Python kernel. Especially when running in scripts, processing many pulsars for (I)PTA purposes those random non-deterministic crashes that tempo2 tends to cause can be a pain, and the sandbox provides a frictionless workaround.Notes
I have not tested this PR yet on a large number of devices, but it is fully operational. Test verifies against a native libstempo instance, and the github runners all finish. Demo notebooks work. Still, I am looking for bug reports! Only
add_gwbis not compatible, because that function callstempo2natively.Suitability for libstempo
I originally wrote this sandbox as part of another one of my projects (more IPTA-focused), which I will release soon. If the sandbox is deemed 'out of scope' for libstempo I'd be most happy to make it available through other means. To me the libstempo repo seems most sensible.
Key Features
🛡️ Crash Isolation
🔄 Automatic Retry & Recovery
🌍 Environment Flexibility
🚀 Proactive TOA Handling
Usage
Basic Usage (Drop-in Replacement)
Advanced Configuration
Bulk Processing
Performance Characteristics
Based on comprehensive performance testing with J1909-3744_NANOGrav_dfg+12 data:
residuals(), ~1.0x fordesignmatrix()The overhead is primarily due to inter-process communication, which is the price of process isolation. For heavy computations, the overhead becomes negligible relative to the actual work.
Implementation Details
Architecture
Tempo2Error,Tempo2Crashed,Tempo2Timeout)New Files Added
libstempo/sandbox.py(1,232 lines): Main sandbox implementationlibstempo/tim_file_analyzer.py(548 lines): TOA file analysis utilitiestests/test_sandbox.py(94 lines): Comprehensive test suiteREADME.mdwith sandbox documentationTesting
add_gwbis not supported because it calls tempo2 natively --> Won't fixWhen to Use
Use Sandbox when:
Use Direct when:
Backward Compatibility
This PR is made to be fully backward compatible. The sandbox is opt-in and doesn't affect existing code. All existing libstempo functionality remains unchanged.
Documentation
Some irony
The github CI occasionally fails on the regular tests because of random segmentation faults.