Skip to content

Add cloud dataset integration test for reading ingested data#53

Merged
stevevanhooser merged 52 commits intomainfrom
claude/add-cloud-dataset-test-ENbIP
Apr 1, 2026
Merged

Add cloud dataset integration test for reading ingested data#53
stevevanhooser merged 52 commits intomainfrom
claude/add-cloud-dataset-test-ENbIP

Conversation

@stevevanhooser
Copy link
Copy Markdown
Contributor

Summary

This PR adds a comprehensive integration test module for verifying that ingested datasets can be successfully downloaded from the cloud and read correctly. The test validates the NDI cloud orchestration workflow by downloading a Carbon fiber microelectrode dataset and verifying timeseries data integrity.

Key Changes

  • New test module: tests/test_cloud_read_ingested.py with integration tests for cloud dataset operations
  • Dataset fixtures: Module-scoped fixtures for downloading and opening cloud datasets
  • Carbonfiber probe validation: Test that reads timeseries data from a carbon-fiber probe and verifies channel values match expected results (16 channels with specific numeric values)
  • Stimulator probe validation: Test that reads stimulator probe timeseries and verifies stimulation ID and timing parameters
  • Credential-based skipping: Tests automatically skip if NDI_CLOUD_USERNAME and NDI_CLOUD_PASSWORD environment variables are not set
  • Temporary directory handling: Uses temporary directories for dataset downloads to avoid persistent test artifacts

Notable Implementation Details

  • Uses the Carbon fiber microelectrode dataset (ID: 668b0539f13096e04f1feccd) as a stable test fixture
  • Validates numeric precision with appropriate tolerances (0.001 for floating-point comparisons)
  • Handles both scalar and array-like return values for stimulation timing parameters
  • Verifies exact session count (expects exactly 1 session in the dataset)
  • Tests both probe discovery by name and by type attributes

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL

claude added 30 commits March 30, 2026 17:50
Test downloads the Carbon fiber dataset from cloud, opens its session,
reads carbonfiber probe timeseries and stimulator probe data, and
verifies values match expected results.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The test now authenticates explicitly via login() and passes the client
to downloadDataset, matching the CI setup where TEST_USER_2_USERNAME and
TEST_USER_2_PASSWORD secrets are mapped to these env vars.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The CI workflow runs all tests but was not setting NDI_CLOUD_USERNAME
and NDI_CLOUD_PASSWORD, causing every cloud test to be skipped. Map
the TEST_USER_2 secrets so cloud integration tests actually execute.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
… to warning

- Compute tests (hello-world, zombie) now skip with pytest.skip() when
  the user lacks compute permissions instead of failing.
- downloadDataset: silent failures (doc added without error but not in DB)
  are now a warning, not a RuntimeError. This is expected for older datasets
  that may have duplicate IDs or docs merged with internally-created
  session/dataset documents. Only hard failures (conversion errors,
  explicit add() exceptions) raise RuntimeError.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…docs

The check now simply verifies that every document downloaded from the
cloud is present in the local database. Extra local documents (e.g.
session or session-in-a-dataset docs created internally) are expected
and no longer flagged.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Missing remote documents now always print their document_class for
diagnostics. Session/dataset document types are expected to be absent
from the local DB (superseded by internally-created docs) and are
logged as a note rather than raising an error.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The Carbon fiber dataset contains documents whose types are defined in
NDIcalc-vis-matlab (calc/, neuro/, vision/ under ndi_common/). The
installer now clones NDIcalc-vis-matlab and copies its database_documents
and schema_documents into NDI-python's ndi_common so they are
discoverable at runtime.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…types

The Carbon fiber dataset includes a dataset_session_info document that
gets superseded by the locally-created one during dataset init.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When a device epoch entry contains a single epochprobemap (not wrapped
in a list), iterating over it fails with TypeError. Normalize the input
to a list before iterating.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
NDI-compress-python is needed to decompress binary data files fetched
from the cloud. Added as a pip dependency in pyproject.toml.

Test assertions now give clearer messages when readtimeseries returns
None or empty arrays (indicating binary files aren't accessible).

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
system_mfdaq.py: readchannels_epochsamples, samplerate, epochsamples2times,
and epochtimes2samples now check _is_ingested(epochfiles) and route to
the corresponding _ingested methods on the DAQ reader. Previously they
always called the non-ingested methods, which tried to read raw disk
files that don't exist for cloud-downloaded datasets.

mfdaq.py: readchannels_epochsamples_ingested now falls back to
session.database_openbinarydoc() when the data_file doesn't exist
locally, triggering the ndic:// on-demand cloud fetch mechanism.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print epoch table, devinfo, and epochfiles to understand why
readtimeseries returns None for cloud-ingested data.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Need to understand why readtimeseries returns None — print the probe's
actual class (may not be timeseries_mfdaq), epoch table structure,
and what getchanneldevinfo returns.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The readtimeseriesepoch method silently catches AttributeError/TypeError
from epochtimes2samples and returns None. Add explicit error-propagating
diagnostics to see the actual exception being swallowed.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
_resolve_device was looking up DAQ systems via getattr(session,
'daqsystem', []) which doesn't exist on ndi_session. The DAQ system
is already stored in the epoch table entry's underlying_epochs by
buildepochtable, so use it directly instead of re-searching.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
_get_daqsystems always created ndi_daq_system (base class) which lacks
epochtimes2samples. Use session._document_to_object() instead, which
checks the document's ndi_daqsystem_class and creates the correct
subclass (ndi_daq_system_mfdaq for MFDAQ systems).

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Cloud-ingested daqsystem documents may not have the ndi_daqsystem_class
field set. Previously this fell through to creating the base
ndi_daq_system which lacks epochtimes2samples and other MFDAQ methods.
Default to ndi_daq_system_mfdaq when the class name is empty, since
most DAQ systems are MFDAQ.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
getepochfiles returns (file_list, epoch_id) tuple but all methods
were passing the raw tuple to _is_ingested and the DAQ reader.
Add _getepochfiles helper to consistently unpack the file list.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
epochtimes2samples_ingested and epochsamples2times_ingested failed when
the probe's hardware channel numbers (e.g. 9-24) didn't match the
ingested document's channel numbers (e.g. 1-16). The sample rate lookup
returned NaN for all channels, causing 'Cannot handle different sample
rates'. Now falls back to querying all available channels in the
ingested document when the specific channel lookup finds no matches.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When channel-level sample rates are not available, try reading
sample_rate directly from the ingested epochtable. Include diagnostic
info in the error message to show what channels, sample rates, and
epochtable keys are available.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print epochtable keys, channel count, and first channel's fields
to understand why samplerate_ingested can't find matching channels.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The MATLAB-ingested data uses compressed segment files (ai_group*_seg.nbf_*)
read via ndicompress, and channel metadata from channel_list.bin. The
previous Python implementation tried to read a single VHSB data_file
which doesn't exist for MATLAB-ingested cloud data.

Key changes:
- getchannelsepoch_ingested: reads channel_list.bin via database_openbinarydoc
  (triggers ndic:// cloud fetch) and parses with mfdaq_epoch_channel
- samplerate_ingested: now returns (sr, offset, scale) tuple matching MATLAB,
  looks up channels by both type AND number
- readchannels_epochsamples_ingested: reads compressed segment files using
  ndicompress.expand_ephys/expand_digital/expand_time, handles segment
  arithmetic and channel group decoding
- epochsamples2times_ingested/epochtimes2samples_ingested: updated for
  new samplerate_ingested return signature

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- Add from_dict classmethod to ChannelInfo in mfdaq.py (the fallback
  path used it but it didn't exist)
- Standardize channel types on both sides when matching in
  samplerate_ingested — the channel_list.bin may use abbreviations
  like 'ai' while the probe requests 'analog_in'
- Include available channels in error message for debugging

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Implements ingested event reading for both derived digital events
(dep/den/dimp/dimn) and native events/markers/text. For native events,
reads evmktx_group*_seg.nbf_* compressed files via ndicompress.
Routes system_mfdaq.readevents_epochsamples through _is_ingested check.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- Update test_daq.py mocks to handle samplerate_ingested returning
  (sr, offset, scale) tuple and database_openbinarydoc fallback
- Add detailed diagnostics for channel_list.bin access: print ingested
  doc class, property keys, file_info structure, and exact error from
  database_openbinarydoc

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
CI summary only shows the fail message, not captured stdout. Collect
all diagnostic info into the fail message so we can see epochfiles,
doc_class, file_info structure, and channel_list.bin access result.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- open_session: propagate dataset's cloud_client to the recreated
  session so _try_cloud_fetch can download binary files via ndic://
- getchannelsepoch_ingested: raise with context when both channel_list.bin
  and JSON fallback fail, instead of returning empty list silently

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
…iles

MATLAB writes channel_list.bin as a tab-delimited struct array format
(read via vlt.file.loadStructArray), not JSON. The Python readFromFile
was using json.load() which failed on the binary data. Now tries
loadStructArray first, falls back to JSON.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
- readFromFile: try JSON first (Python-generated), fall back to
  vlt.file.loadStructArray (MATLAB tab-delimited). Previous order
  caused loadStructArray to misparse JSON files.
- readchannels_epochsamples_ingested: log segment read failures as
  warnings instead of silently swallowing them.
- Test: detect all-NaN data and fail with clear message.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
claude added 11 commits March 31, 2026 12:54
ndicompress.expand_ephys returns (data, error_signal) tuple, not a
bare array. Extract data[0] from the tuple before using .shape.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print d1 shape, first values, t1[0], and the scale/offset/samplerate
from channel info to diagnose why values don't match expected.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB's underlying2scaled does (d - offset) * scale, not d * scale + offset.
With offset=32768 and scale=0.195, this converts raw Intan ADC values
to microvolts correctly.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print t0_t1 epoch bounds to verify sample positioning. The scaled
values will show whether the offset is a sample position issue.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB sorts epochs by epoch_id. Without sorting, Python's epoch 1
could map to t00002 while MATLAB's epoch 1 maps to t00001, causing
readtimeseries to read from the wrong epoch.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
epochtimes2samples returns 1-based MATLAB indices. Convert to 0-based
Python indices in readtimeseriesepoch (s0-1, s1-1) and propagate
through readchannels_epochsamples_ingested segment arithmetic.

The data was shifted by one sample because MATLAB arrays are 1-indexed
but Python arrays are 0-indexed.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB uses 1-based sample indices (sample 1 = first sample).
Python uses 0-based (sample 0 = first sample). All times2samples
and samples2times functions now use 0-based indexing:

  Python: s = round((t - t0) * sr)       t = t0 + s / sr
  MATLAB: s = 1 + round((t - t0) * sr)   t = t0 + (s - 1) / sr

Updated functions:
  - mfdaq.epochtimes2samples / epochsamples2times
  - mfdaq.epochtimes2samples_ingested / epochsamples2times_ingested
  - probe.timeseries.times2samples / samples2times
  - system_mfdaq.epochtimes2samples / epochsamples2times (docstrings)

Updated all tests and bridge YAML files to document the difference.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Read a few samples around t=10 to see which position has the
expected value 55.77 and determine the exact offset.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
When the epochtable stores t0_t1 as a flat list [0, 2584.87], the code
iterated over scalars and created (0, 0) and (2584.87, 2584.87) instead
of the correct (0, 2584.87). Now detects flat pairs (2 scalar elements)
and wraps as a single [(t0, t1)] tuple.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Need to check if the epoch sorting puts t00002 first (meaning there's
no t00001) or if we're reading from the wrong epoch.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
@stevevanhooser stevevanhooser force-pushed the claude/add-cloud-dataset-test-ENbIP branch from 3155b8f to e757c27 Compare March 31, 2026 23:55
claude added 11 commits April 1, 2026 00:21
The MATLAB channelgroupdecoding returns indices into the segment data
columns (within the subset of channels matching the group and type).
The Python version was returning the raw channel numbers instead,
causing an off-by-one channel shift (e.g., reading channel 10's data
when channel 9 was requested, because channel number 9 was used as
a 0-based column index into data that starts at column 0 = channel 1).

Now matches MATLAB: finds the channel's position within its group
subset and returns that as a 0-based index.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Use absolute import 'ndi.daq.mfdaq' instead of relative '..daq.mfdaq'
since the file is in ndi.file.type, not ndi.daq.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
channelgroupdecoding now returns 0-based indices within each group's
channel subset, not channel numbers. Channel 1 at index 0 in group 1,
channel 3 at index 0 in group 2.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator's readtimeseriesepoch was passing device_epoch_id (a
string like 't00002') to dev.readevents_epochsamples() which expects
an epoch_number (int). Added device_epoch_number to the base
getchanneldevinfo return dict, and use it in the stimulator.

Also:
- Fix 1-based sample indices (s0 = 1 + ...) to 0-based
- Log readevents_epochsamples errors instead of silently catching

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
All except Exception: pass/silent blocks now log warnings with the
actual error message. This makes it visible when event reading,
metadata reading, analog reading, devicestring parsing, or timeref
creation fails instead of silently returning empty data.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print channeltype, channel, devepoch for the stimulator probe and
try readevents_epochsamples directly to expose the actual error.
Include ds/ts key sizes in failure message.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator was using device_epoch_id (string) instead of
device_epoch_number (int) for DAQ system calls. Also add debug
logging of parsed devicestring to diagnose channel detection.
Print devicestring in test for visibility.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
Print timestamps/data shapes, first values, and handle dict returns
to understand what readevents_epochsamples_ingested actually returns.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
MATLAB's getchanneldevinfo iterates ALL epochprobemaps in the
underlying epoch and extracts channels from every matching one.
The Python version only looked at the single matching epm stored
in the probe's epoch table entry.

Also print all underlying epochprobemaps and their devicestrings
in the test diagnostic to understand what channels are available.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
md channels are handled separately via getmetadata, not readevents.
Print per-channel results from readevents to see the event data
structure for mk1-3 and e1-3.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
The stimulator's stimid can be a nested numpy array where stimid[0]
is itself an array. Use np.asarray().ravel() to flatten before
extracting the scalar value.

https://claude.ai/code/session_01A7rAxYf5pSvs19iVJe3ncL
@stevevanhooser stevevanhooser merged commit bd3334b into main Apr 1, 2026
5 checks passed
@stevevanhooser stevevanhooser deleted the claude/add-cloud-dataset-test-ENbIP branch April 1, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants