Add cloud dataset integration test for reading ingested data by stevevanhooser · Pull Request #53 · Waltham-Data-Science/NDI-python

stevevanhooser · 2026-03-30T17:51:28Z

Summary

This PR adds a comprehensive integration test module for verifying that ingested datasets can be successfully downloaded from the cloud and read correctly. The test validates the NDI cloud orchestration workflow by downloading a Carbon fiber microelectrode dataset and verifying timeseries data integrity.

Key Changes

New test module: tests/test_cloud_read_ingested.py with integration tests for cloud dataset operations
Dataset fixtures: Module-scoped fixtures for downloading and opening cloud datasets
Carbonfiber probe validation: Test that reads timeseries data from a carbon-fiber probe and verifies channel values match expected results (16 channels with specific numeric values)
Stimulator probe validation: Test that reads stimulator probe timeseries and verifies stimulation ID and timing parameters
Credential-based skipping: Tests automatically skip if NDI_CLOUD_USERNAME and NDI_CLOUD_PASSWORD environment variables are not set
Temporary directory handling: Uses temporary directories for dataset downloads to avoid persistent test artifacts

Notable Implementation Details

Uses the Carbon fiber microelectrode dataset (ID: 668b0539f13096e04f1feccd) as a stable test fixture
Validates numeric precision with appropriate tolerances (0.001 for floating-point comparisons)
Handles both scalar and array-like return values for stimulation timing parameters
Verifies exact session count (expects exactly 1 session in the dataset)
Tests both probe discovery by name and by type attributes

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Test downloads the Carbon fiber dataset from cloud, opens its session, reads carbonfiber probe timeseries and stimulator probe data, and verifies values match expected results. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The test now authenticates explicitly via login() and passes the client to downloadDataset, matching the CI setup where TEST_USER_2_USERNAME and TEST_USER_2_PASSWORD secrets are mapped to these env vars. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The CI workflow runs all tests but was not setting NDI_CLOUD_USERNAME and NDI_CLOUD_PASSWORD, causing every cloud test to be skipped. Map the TEST_USER_2 secrets so cloud integration tests actually execute. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

… to warning - Compute tests (hello-world, zombie) now skip with pytest.skip() when the user lacks compute permissions instead of failing. - downloadDataset: silent failures (doc added without error but not in DB) are now a warning, not a RuntimeError. This is expected for older datasets that may have duplicate IDs or docs merged with internally-created session/dataset documents. Only hard failures (conversion errors, explicit add() exceptions) raise RuntimeError. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

…docs The check now simply verifies that every document downloaded from the cloud is present in the local database. Extra local documents (e.g. session or session-in-a-dataset docs created internally) are expected and no longer flagged. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Missing remote documents now always print their document_class for diagnostics. Session/dataset document types are expected to be absent from the local DB (superseded by internally-created docs) and are logged as a note rather than raising an error. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The Carbon fiber dataset contains documents whose types are defined in NDIcalc-vis-matlab (calc/, neuro/, vision/ under ndi_common/). The installer now clones NDIcalc-vis-matlab and copies its database_documents and schema_documents into NDI-python's ndi_common so they are discoverable at runtime. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

…types The Carbon fiber dataset includes a dataset_session_info document that gets superseded by the locally-created one during dataset init. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

When a device epoch entry contains a single epochprobemap (not wrapped in a list), iterating over it fails with TypeError. Normalize the input to a list before iterating. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

NDI-compress-python is needed to decompress binary data files fetched from the cloud. Added as a pip dependency in pyproject.toml. Test assertions now give clearer messages when readtimeseries returns None or empty arrays (indicating binary files aren't accessible). https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

system_mfdaq.py: readchannels_epochsamples, samplerate, epochsamples2times, and epochtimes2samples now check _is_ingested(epochfiles) and route to the corresponding _ingested methods on the DAQ reader. Previously they always called the non-ingested methods, which tried to read raw disk files that don't exist for cloud-downloaded datasets. mfdaq.py: readchannels_epochsamples_ingested now falls back to session.database_openbinarydoc() when the data_file doesn't exist locally, triggering the ndic:// on-demand cloud fetch mechanism. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print epoch table, devinfo, and epochfiles to understand why readtimeseries returns None for cloud-ingested data. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Need to understand why readtimeseries returns None — print the probe's actual class (may not be timeseries_mfdaq), epoch table structure, and what getchanneldevinfo returns. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The readtimeseriesepoch method silently catches AttributeError/TypeError from epochtimes2samples and returns None. Add explicit error-propagating diagnostics to see the actual exception being swallowed. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

_resolve_device was looking up DAQ systems via getattr(session, 'daqsystem', []) which doesn't exist on ndi_session. The DAQ system is already stored in the epoch table entry's underlying_epochs by buildepochtable, so use it directly instead of re-searching. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

_get_daqsystems always created ndi_daq_system (base class) which lacks epochtimes2samples. Use session._document_to_object() instead, which checks the document's ndi_daqsystem_class and creates the correct subclass (ndi_daq_system_mfdaq for MFDAQ systems). https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Cloud-ingested daqsystem documents may not have the ndi_daqsystem_class field set. Previously this fell through to creating the base ndi_daq_system which lacks epochtimes2samples and other MFDAQ methods. Default to ndi_daq_system_mfdaq when the class name is empty, since most DAQ systems are MFDAQ. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

getepochfiles returns (file_list, epoch_id) tuple but all methods were passing the raw tuple to _is_ingested and the DAQ reader. Add _getepochfiles helper to consistently unpack the file list. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

epochtimes2samples_ingested and epochsamples2times_ingested failed when the probe's hardware channel numbers (e.g. 9-24) didn't match the ingested document's channel numbers (e.g. 1-16). The sample rate lookup returned NaN for all channels, causing 'Cannot handle different sample rates'. Now falls back to querying all available channels in the ingested document when the specific channel lookup finds no matches. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

When channel-level sample rates are not available, try reading sample_rate directly from the ingested epochtable. Include diagnostic info in the error message to show what channels, sample rates, and epochtable keys are available. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print epochtable keys, channel count, and first channel's fields to understand why samplerate_ingested can't find matching channels. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The MATLAB-ingested data uses compressed segment files (ai_group*_seg.nbf_*) read via ndicompress, and channel metadata from channel_list.bin. The previous Python implementation tried to read a single VHSB data_file which doesn't exist for MATLAB-ingested cloud data. Key changes: - getchannelsepoch_ingested: reads channel_list.bin via database_openbinarydoc (triggers ndic:// cloud fetch) and parses with mfdaq_epoch_channel - samplerate_ingested: now returns (sr, offset, scale) tuple matching MATLAB, looks up channels by both type AND number - readchannels_epochsamples_ingested: reads compressed segment files using ndicompress.expand_ephys/expand_digital/expand_time, handles segment arithmetic and channel group decoding - epochsamples2times_ingested/epochtimes2samples_ingested: updated for new samplerate_ingested return signature https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

- Add from_dict classmethod to ChannelInfo in mfdaq.py (the fallback path used it but it didn't exist) - Standardize channel types on both sides when matching in samplerate_ingested — the channel_list.bin may use abbreviations like 'ai' while the probe requests 'analog_in' - Include available channels in error message for debugging https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Implements ingested event reading for both derived digital events (dep/den/dimp/dimn) and native events/markers/text. For native events, reads evmktx_group*_seg.nbf_* compressed files via ndicompress. Routes system_mfdaq.readevents_epochsamples through _is_ingested check. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

- Update test_daq.py mocks to handle samplerate_ingested returning (sr, offset, scale) tuple and database_openbinarydoc fallback - Add detailed diagnostics for channel_list.bin access: print ingested doc class, property keys, file_info structure, and exact error from database_openbinarydoc https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

CI summary only shows the fail message, not captured stdout. Collect all diagnostic info into the fail message so we can see epochfiles, doc_class, file_info structure, and channel_list.bin access result. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

- open_session: propagate dataset's cloud_client to the recreated session so _try_cloud_fetch can download binary files via ndic:// - getchannelsepoch_ingested: raise with context when both channel_list.bin and JSON fallback fail, instead of returning empty list silently https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

…iles MATLAB writes channel_list.bin as a tab-delimited struct array format (read via vlt.file.loadStructArray), not JSON. The Python readFromFile was using json.load() which failed on the binary data. Now tries loadStructArray first, falls back to JSON. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

- readFromFile: try JSON first (Python-generated), fall back to vlt.file.loadStructArray (MATLAB tab-delimited). Previous order caused loadStructArray to misparse JSON files. - readchannels_epochsamples_ingested: log segment read failures as warnings instead of silently swallowing them. - Test: detect all-NaN data and fail with clear message. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

ndicompress.expand_ephys returns (data, error_signal) tuple, not a bare array. Extract data[0] from the tuple before using .shape. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print d1 shape, first values, t1[0], and the scale/offset/samplerate from channel info to diagnose why values don't match expected. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

MATLAB's underlying2scaled does (d - offset) * scale, not d * scale + offset. With offset=32768 and scale=0.195, this converts raw Intan ADC values to microvolts correctly. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print t0_t1 epoch bounds to verify sample positioning. The scaled values will show whether the offset is a sample position issue. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

MATLAB sorts epochs by epoch_id. Without sorting, Python's epoch 1 could map to t00002 while MATLAB's epoch 1 maps to t00001, causing readtimeseries to read from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

epochtimes2samples returns 1-based MATLAB indices. Convert to 0-based Python indices in readtimeseriesepoch (s0-1, s1-1) and propagate through readchannels_epochsamples_ingested segment arithmetic. The data was shifted by one sample because MATLAB arrays are 1-indexed but Python arrays are 0-indexed. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

MATLAB uses 1-based sample indices (sample 1 = first sample). Python uses 0-based (sample 0 = first sample). All times2samples and samples2times functions now use 0-based indexing: Python: s = round((t - t0) * sr) t = t0 + s / sr MATLAB: s = 1 + round((t - t0) * sr) t = t0 + (s - 1) / sr Updated functions: - mfdaq.epochtimes2samples / epochsamples2times - mfdaq.epochtimes2samples_ingested / epochsamples2times_ingested - probe.timeseries.times2samples / samples2times - system_mfdaq.epochtimes2samples / epochsamples2times (docstrings) Updated all tests and bridge YAML files to document the difference. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Read a few samples around t=10 to see which position has the expected value 55.77 and determine the exact offset. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

When the epochtable stores t0_t1 as a flat list [0, 2584.87], the code iterated over scalars and created (0, 0) and (2584.87, 2584.87) instead of the correct (0, 2584.87). Now detects flat pairs (2 scalar elements) and wraps as a single [(t0, t1)] tuple. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Need to check if the epoch sorting puts t00002 first (meaning there's no t00001) or if we're reading from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The MATLAB channelgroupdecoding returns indices into the segment data columns (within the subset of channels matching the group and type). The Python version was returning the raw channel numbers instead, causing an off-by-one channel shift (e.g., reading channel 10's data when channel 9 was requested, because channel number 9 was used as a 0-based column index into data that starts at column 0 = channel 1). Now matches MATLAB: finds the channel's position within its group subset and returns that as a 0-based index. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Use absolute import 'ndi.daq.mfdaq' instead of relative '..daq.mfdaq' since the file is in ndi.file.type, not ndi.daq. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

channelgroupdecoding now returns 0-based indices within each group's channel subset, not channel numbers. Channel 1 at index 0 in group 1, channel 3 at index 0 in group 2. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The stimulator's readtimeseriesepoch was passing device_epoch_id (a string like 't00002') to dev.readevents_epochsamples() which expects an epoch_number (int). Added device_epoch_number to the base getchanneldevinfo return dict, and use it in the stimulator. Also: - Fix 1-based sample indices (s0 = 1 + ...) to 0-based - Log readevents_epochsamples errors instead of silently catching https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

All except Exception: pass/silent blocks now log warnings with the actual error message. This makes it visible when event reading, metadata reading, analog reading, devicestring parsing, or timeref creation fails instead of silently returning empty data. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print channeltype, channel, devepoch for the stimulator probe and try readevents_epochsamples directly to expose the actual error. Include ds/ts key sizes in failure message. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The stimulator was using device_epoch_id (string) instead of device_epoch_number (int) for DAQ system calls. Also add debug logging of parsed devicestring to diagnose channel detection. Print devicestring in test for visibility. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print timestamps/data shapes, first values, and handle dict returns to understand what readevents_epochsamples_ingested actually returns. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

MATLAB's getchanneldevinfo iterates ALL epochprobemaps in the underlying epoch and extracts channels from every matching one. The Python version only looked at the single matching epm stored in the probe's epoch table entry. Also print all underlying epochprobemaps and their devicestrings in the test diagnostic to understand what channels are available. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

md channels are handled separately via getmetadata, not readevents. Print per-channel results from readevents to see the event data structure for mk1-3 and e1-3. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

The stimulator's stimid can be a nested numpy array where stimid[0] is itself an array. Use np.asarray().ravel() to flatten before extracting the scalar value. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

claude added 30 commits March 30, 2026 17:50

Add cloud readIngested integration test

0a7b3cb

Test downloads the Carbon fiber dataset from cloud, opens its session, reads carbonfiber probe timeseries and stimulator probe data, and verifies values match expected results. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Fix black formatting in test_cloud_read_ingested.py

c6a39a6

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Add dataset_session_info and session_in_a_dataset to allowed missing …

93aad30

…types The Carbon fiber dataset includes a dataset_session_info document that gets superseded by the locally-created one during dataset init. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Add diagnostic prints to carbonfiber test for debugging

2c2fafe

Print epoch table, devinfo, and epochfiles to understand why readtimeseries returns None for cloud-ingested data. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Add ingested document structure diagnostics to test

e54dd07

Print epochtable keys, channel count, and first channel's fields to understand why samplerate_ingested can't find matching channels. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

claude added 11 commits March 31, 2026 12:54

Unpack expand_ephys tuple return value

2c47352

ndicompress.expand_ephys returns (data, error_signal) tuple, not a bare array. Extract data[0] from the tuple before using .shape. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Add debug prints for raw data values and scale/offset

d70a268

Print d1 shape, first values, t1[0], and the scale/offset/samplerate from channel info to diagnose why values don't match expected. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Fix underlying2scaled formula: (data - offset) * scale

46fa1fc

MATLAB's underlying2scaled does (d - offset) * scale, not d * scale + offset. With offset=32768 and scale=0.195, this converts raw Intan ADC values to microvolts correctly. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print t0_t1 and scaled data values for debugging sample position

1dafeec

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Add t0_t1 diagnostic; keep underlying2scaled formula

fd24520

Print t0_t1 epoch bounds to verify sample positioning. The scaled values will show whether the offset is a sample position issue. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Sort epoch tables by epoch_id alphanumerically

b2cbeb0

MATLAB sorts epochs by epoch_id. Without sorting, Python's epoch 1 could map to t00002 while MATLAB's epoch 1 maps to t00001, causing readtimeseries to read from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Debug: read near t=10 to check sample alignment

58cc2c3

Read a few samples around t=10 to see which position has the expected value 55.77 and determine the exact offset. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Print all epoch IDs to verify epoch ordering

e757c27

Need to check if the epoch sorting puts t00002 first (meaning there's no t00001) or if we're reading from the wrong epoch. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

stevevanhooser force-pushed the claude/add-cloud-dataset-test-ENbIP branch from 3155b8f to e757c27 Compare March 31, 2026 23:55

claude added 11 commits April 1, 2026 00:21

Fix import path for standardize_channel_type in mfdaq_epoch_channel

2ef3e19

Use absolute import 'ndi.daq.mfdaq' instead of relative '..daq.mfdaq' since the file is in ndi.file.type, not ndi.daq. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Update channelgroupdecoding test for 0-based group indices

6ff702b

channelgroupdecoding now returns 0-based indices within each group's channel subset, not channel numbers. Channel 1 at index 0 in group 1, channel 3 at index 0 in group 2. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Improve stimulator readevents diagnostic output

46782ea

Print timestamps/data shapes, first values, and handle dict returns to understand what readevents_epochsamples_ingested actually returns. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Fix test to exclude md channels from readevents call

7a9a202

md channels are handled separately via getmetadata, not readevents. Print per-channel results from readevents to see the event data structure for mk1-3 and e1-3. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

Fix stimid extraction for multi-dimensional numpy array

6dce4b2

The stimulator's stimid can be a nested numpy array where stimid[0] is itself an array. Use np.asarray().ravel() to flatten before extracting the scalar value. https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

stevevanhooser merged commit bd3334b into main Apr 1, 2026
5 checks passed

stevevanhooser deleted the claude/add-cloud-dataset-test-ENbIP branch April 1, 2026 21:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cloud dataset integration test for reading ingested data#53

Add cloud dataset integration test for reading ingested data#53
stevevanhooser merged 52 commits intomainfrom
claude/add-cloud-dataset-test-ENbIP

stevevanhooser commented Mar 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stevevanhooser commented Mar 30, 2026

Summary

Key Changes

Notable Implementation Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants