Add cloud dataset integration test for reading ingested data#53
Merged
stevevanhooser merged 52 commits intomainfrom Apr 1, 2026
Merged
Add cloud dataset integration test for reading ingested data#53stevevanhooser merged 52 commits intomainfrom
stevevanhooser merged 52 commits intomainfrom
Conversation
Test downloads the Carbon fiber dataset from cloud, opens its session, reads carbonfiber probe timeseries and stimulator probe data, and verifies values match expected results. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The test now authenticates explicitly via login() and passes the client to downloadDataset, matching the CI setup where TEST_USER_2_USERNAME and TEST_USER_2_PASSWORD secrets are mapped to these env vars. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The CI workflow runs all tests but was not setting NDI_CLOUD_USERNAME and NDI_CLOUD_PASSWORD, causing every cloud test to be skipped. Map the TEST_USER_2 secrets so cloud integration tests actually execute. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
… to warning - Compute tests (hello-world, zombie) now skip with pytest.skip() when the user lacks compute permissions instead of failing. - downloadDataset: silent failures (doc added without error but not in DB) are now a warning, not a RuntimeError. This is expected for older datasets that may have duplicate IDs or docs merged with internally-created session/dataset documents. Only hard failures (conversion errors, explicit add() exceptions) raise RuntimeError. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…docs The check now simply verifies that every document downloaded from the cloud is present in the local database. Extra local documents (e.g. session or session-in-a-dataset docs created internally) are expected and no longer flagged. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Missing remote documents now always print their document_class for diagnostics. Session/dataset document types are expected to be absent from the local DB (superseded by internally-created docs) and are logged as a note rather than raising an error. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The Carbon fiber dataset contains documents whose types are defined in NDIcalc-vis-matlab (calc/, neuro/, vision/ under ndi_common/). The installer now clones NDIcalc-vis-matlab and copies its database_documents and schema_documents into NDI-python's ndi_common so they are discoverable at runtime. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…types The Carbon fiber dataset includes a dataset_session_info document that gets superseded by the locally-created one during dataset init. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When a device epoch entry contains a single epochprobemap (not wrapped in a list), iterating over it fails with TypeError. Normalize the input to a list before iterating. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
NDI-compress-python is needed to decompress binary data files fetched from the cloud. Added as a pip dependency in pyproject.toml. Test assertions now give clearer messages when readtimeseries returns None or empty arrays (indicating binary files aren't accessible). https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
system_mfdaq.py: readchannels_epochsamples, samplerate, epochsamples2times, and epochtimes2samples now check _is_ingested(epochfiles) and route to the corresponding _ingested methods on the DAQ reader. Previously they always called the non-ingested methods, which tried to read raw disk files that don't exist for cloud-downloaded datasets. mfdaq.py: readchannels_epochsamples_ingested now falls back to session.database_openbinarydoc() when the data_file doesn't exist locally, triggering the ndic:// on-demand cloud fetch mechanism. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print epoch table, devinfo, and epochfiles to understand why readtimeseries returns None for cloud-ingested data. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Need to understand why readtimeseries returns None — print the probe's actual class (may not be timeseries_mfdaq), epoch table structure, and what getchanneldevinfo returns. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The readtimeseriesepoch method silently catches AttributeError/TypeError from epochtimes2samples and returns None. Add explicit error-propagating diagnostics to see the actual exception being swallowed. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
_resolve_device was looking up DAQ systems via getattr(session, 'daqsystem', []) which doesn't exist on ndi_session. The DAQ system is already stored in the epoch table entry's underlying_epochs by buildepochtable, so use it directly instead of re-searching. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
_get_daqsystems always created ndi_daq_system (base class) which lacks epochtimes2samples. Use session._document_to_object() instead, which checks the document's ndi_daqsystem_class and creates the correct subclass (ndi_daq_system_mfdaq for MFDAQ systems). https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Cloud-ingested daqsystem documents may not have the ndi_daqsystem_class field set. Previously this fell through to creating the base ndi_daq_system which lacks epochtimes2samples and other MFDAQ methods. Default to ndi_daq_system_mfdaq when the class name is empty, since most DAQ systems are MFDAQ. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
getepochfiles returns (file_list, epoch_id) tuple but all methods were passing the raw tuple to _is_ingested and the DAQ reader. Add _getepochfiles helper to consistently unpack the file list. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
epochtimes2samples_ingested and epochsamples2times_ingested failed when the probe's hardware channel numbers (e.g. 9-24) didn't match the ingested document's channel numbers (e.g. 1-16). The sample rate lookup returned NaN for all channels, causing 'Cannot handle different sample rates'. Now falls back to querying all available channels in the ingested document when the specific channel lookup finds no matches. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When channel-level sample rates are not available, try reading sample_rate directly from the ingested epochtable. Include diagnostic info in the error message to show what channels, sample rates, and epochtable keys are available. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print epochtable keys, channel count, and first channel's fields to understand why samplerate_ingested can't find matching channels. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The MATLAB-ingested data uses compressed segment files (ai_group*_seg.nbf_*) read via ndicompress, and channel metadata from channel_list.bin. The previous Python implementation tried to read a single VHSB data_file which doesn't exist for MATLAB-ingested cloud data. Key changes: - getchannelsepoch_ingested: reads channel_list.bin via database_openbinarydoc (triggers ndic:// cloud fetch) and parses with mfdaq_epoch_channel - samplerate_ingested: now returns (sr, offset, scale) tuple matching MATLAB, looks up channels by both type AND number - readchannels_epochsamples_ingested: reads compressed segment files using ndicompress.expand_ephys/expand_digital/expand_time, handles segment arithmetic and channel group decoding - epochsamples2times_ingested/epochtimes2samples_ingested: updated for new samplerate_ingested return signature https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- Add from_dict classmethod to ChannelInfo in mfdaq.py (the fallback path used it but it didn't exist) - Standardize channel types on both sides when matching in samplerate_ingested — the channel_list.bin may use abbreviations like 'ai' while the probe requests 'analog_in' - Include available channels in error message for debugging https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Implements ingested event reading for both derived digital events (dep/den/dimp/dimn) and native events/markers/text. For native events, reads evmktx_group*_seg.nbf_* compressed files via ndicompress. Routes system_mfdaq.readevents_epochsamples through _is_ingested check. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- Update test_daq.py mocks to handle samplerate_ingested returning (sr, offset, scale) tuple and database_openbinarydoc fallback - Add detailed diagnostics for channel_list.bin access: print ingested doc class, property keys, file_info structure, and exact error from database_openbinarydoc https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
CI summary only shows the fail message, not captured stdout. Collect all diagnostic info into the fail message so we can see epochfiles, doc_class, file_info structure, and channel_list.bin access result. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- open_session: propagate dataset's cloud_client to the recreated session so _try_cloud_fetch can download binary files via ndic:// - getchannelsepoch_ingested: raise with context when both channel_list.bin and JSON fallback fail, instead of returning empty list silently https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…iles MATLAB writes channel_list.bin as a tab-delimited struct array format (read via vlt.file.loadStructArray), not JSON. The Python readFromFile was using json.load() which failed on the binary data. Now tries loadStructArray first, falls back to JSON. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- readFromFile: try JSON first (Python-generated), fall back to vlt.file.loadStructArray (MATLAB tab-delimited). Previous order caused loadStructArray to misparse JSON files. - readchannels_epochsamples_ingested: log segment read failures as warnings instead of silently swallowing them. - Test: detect all-NaN data and fail with clear message. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
ndicompress.expand_ephys returns (data, error_signal) tuple, not a bare array. Extract data[0] from the tuple before using .shape. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print d1 shape, first values, t1[0], and the scale/offset/samplerate from channel info to diagnose why values don't match expected. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB's underlying2scaled does (d - offset) * scale, not d * scale + offset. With offset=32768 and scale=0.195, this converts raw Intan ADC values to microvolts correctly. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print t0_t1 epoch bounds to verify sample positioning. The scaled values will show whether the offset is a sample position issue. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB sorts epochs by epoch_id. Without sorting, Python's epoch 1 could map to t00002 while MATLAB's epoch 1 maps to t00001, causing readtimeseries to read from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
epochtimes2samples returns 1-based MATLAB indices. Convert to 0-based Python indices in readtimeseriesepoch (s0-1, s1-1) and propagate through readchannels_epochsamples_ingested segment arithmetic. The data was shifted by one sample because MATLAB arrays are 1-indexed but Python arrays are 0-indexed. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB uses 1-based sample indices (sample 1 = first sample). Python uses 0-based (sample 0 = first sample). All times2samples and samples2times functions now use 0-based indexing: Python: s = round((t - t0) * sr) t = t0 + s / sr MATLAB: s = 1 + round((t - t0) * sr) t = t0 + (s - 1) / sr Updated functions: - mfdaq.epochtimes2samples / epochsamples2times - mfdaq.epochtimes2samples_ingested / epochsamples2times_ingested - probe.timeseries.times2samples / samples2times - system_mfdaq.epochtimes2samples / epochsamples2times (docstrings) Updated all tests and bridge YAML files to document the difference. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Read a few samples around t=10 to see which position has the expected value 55.77 and determine the exact offset. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When the epochtable stores t0_t1 as a flat list [0, 2584.87], the code iterated over scalars and created (0, 0) and (2584.87, 2584.87) instead of the correct (0, 2584.87). Now detects flat pairs (2 scalar elements) and wraps as a single [(t0, t1)] tuple. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Need to check if the epoch sorting puts t00002 first (meaning there's no t00001) or if we're reading from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
3155b8f to
e757c27
Compare
The MATLAB channelgroupdecoding returns indices into the segment data columns (within the subset of channels matching the group and type). The Python version was returning the raw channel numbers instead, causing an off-by-one channel shift (e.g., reading channel 10's data when channel 9 was requested, because channel number 9 was used as a 0-based column index into data that starts at column 0 = channel 1). Now matches MATLAB: finds the channel's position within its group subset and returns that as a 0-based index. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Use absolute import 'ndi.daq.mfdaq' instead of relative '..daq.mfdaq' since the file is in ndi.file.type, not ndi.daq. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
channelgroupdecoding now returns 0-based indices within each group's channel subset, not channel numbers. Channel 1 at index 0 in group 1, channel 3 at index 0 in group 2. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator's readtimeseriesepoch was passing device_epoch_id (a string like 't00002') to dev.readevents_epochsamples() which expects an epoch_number (int). Added device_epoch_number to the base getchanneldevinfo return dict, and use it in the stimulator. Also: - Fix 1-based sample indices (s0 = 1 + ...) to 0-based - Log readevents_epochsamples errors instead of silently catching https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
All except Exception: pass/silent blocks now log warnings with the actual error message. This makes it visible when event reading, metadata reading, analog reading, devicestring parsing, or timeref creation fails instead of silently returning empty data. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print channeltype, channel, devepoch for the stimulator probe and try readevents_epochsamples directly to expose the actual error. Include ds/ts key sizes in failure message. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator was using device_epoch_id (string) instead of device_epoch_number (int) for DAQ system calls. Also add debug logging of parsed devicestring to diagnose channel detection. Print devicestring in test for visibility. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print timestamps/data shapes, first values, and handle dict returns to understand what readevents_epochsamples_ingested actually returns. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB's getchanneldevinfo iterates ALL epochprobemaps in the underlying epoch and extracts channels from every matching one. The Python version only looked at the single matching epm stored in the probe's epoch table entry. Also print all underlying epochprobemaps and their devicestrings in the test diagnostic to understand what channels are available. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
md channels are handled separately via getmetadata, not readevents. Print per-channel results from readevents to see the event data structure for mk1-3 and e1-3. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator's stimid can be a nested numpy array where stimid[0] is itself an array. Use np.asarray().ravel() to flatten before extracting the scalar value. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds a comprehensive integration test module for verifying that ingested datasets can be successfully downloaded from the cloud and read correctly. The test validates the NDI cloud orchestration workflow by downloading a Carbon fiber microelectrode dataset and verifying timeseries data integrity.
Key Changes
tests/test_cloud_read_ingested.pywith integration tests for cloud dataset operationsNDI_CLOUD_USERNAMEandNDI_CLOUD_PASSWORDenvironment variables are not setNotable Implementation Details
668b0539f13096e04f1feccd) as a stable test fixturehttps://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL