Querying by time and move data interpolating into experiment #85
Conversation
Pull Request Overview
Adds precision-scaled interpolation and higher-level data retrieval methods to Experiment, and refactors Dataset to use these methods.
- Introduce interpolate_precision → scale_precision in Experiment for consistent time truncation.
- Add get_data_for_interval and get_data_for_chunks to Experiment.
- Refactor datasets.py to delegate chunked interpolation to Experiment and remove duplicate time calculations.
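As context for the points below, here is a minimal usage sketch of the new chunk-based retrieval path, pieced together from the signatures and the ChunkDataset.__getitem__ call quoted later in this conversation; the import path, device names, and numeric values are illustrative assumptions rather than code from the PR:

# Hypothetical usage sketch (not from the PR): retrieve a fixed number of
# interpolated timesteps per device, starting at a given time.
from experanto.experiment import Experiment

experiment = Experiment(
    root_folder="path/to/experiment",    # assumed folder layout
    modality_config=my_modality_config,  # your modality configuration dict
)

# Returns one dict of data arrays and one dict of timestamps, keyed by device.
data, timestamps = experiment.get_data_for_chunks(
    start_time=12.5,
    chunk_sizes={"screen": 32, "responses": 32},
    target_sampling_rates={"screen": 30.0, "responses": 8.0},
    devices=["screen", "responses"],
)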
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| experanto/experiment.py | Added precision scaling; new get_data_for_interval and get_data_for_chunks methods; updated signature of interpolate. |
| experanto/datasets.py | Updated __getitem__ to use Experiment.get_data_for_chunks and cleaned up redundant time‐generation logic. |
Comments suppressed due to low confidence (4)
experanto/experiment.py:83
- The return type annotation -> tuple[np.ndarray, np.ndarray] does not match the method's implementation or docstring, which returns a dict[str, np.ndarray]. Update the signature accordingly.
def get_data_for_interval(
experanto/experiment.py:145
- The return type -> tuple[np.ndarray, np.ndarray] conflicts with the actual return of two dictionaries (out, timestamps). Consider annotating -> tuple[dict[str, np.ndarray], dict[str, np.ndarray]].
def get_data_for_chunks(
experanto/experiment.py:105
- Docstring for get_data_for_interval states it returns a dict but the signature says a tuple. Synchronize the documentation with the code behavior.
dict[str, np.ndarray]
experanto/experiment.py:115
- New methods get_data_for_interval and get_data_for_chunks should have unit tests covering edge cases such as missing sampling rates, single vs. multiple devices, and empty intervals.
devices = self._resolve_devices(devices)
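To make this concrete, a sketch of two such edge-case tests, reusing the create_experiment and get_default_config helpers that appear in the test changes quoted below; the sampling-rate config key and the exact exception type are assumptions to be checked against the implementation:

import pytest


def test_get_data_for_chunks_single_device():
    # Passing a single device name as a string; assumes arrays come back with
    # shape (chunk_size, ...) as the docstring promises.
    with create_experiment() as experiment_path:
        experiment = Experiment(
            root_folder=experiment_path, modality_config=get_default_config()
        )
        data, timestamps = experiment.get_data_for_chunks(
            0.0, chunk_sizes=10, devices="device_0"
        )
        assert set(data) == {"device_0"}
        assert data["device_0"].shape[0] == 10


def test_get_data_for_chunks_missing_sampling_rate_raises():
    # Assumes the per-device rate lives under a "sampling_rate" key and that a
    # missing rate raises ValueError, as the docstring states.
    with create_experiment() as experiment_path:
        modality_config = get_default_config()
        modality_config["device_0"].pop("sampling_rate", None)
        experiment = Experiment(
            root_folder=experiment_path, modality_config=modality_config
        )
        with pytest.raises(ValueError):
            experiment.get_data_for_chunks(0.0, chunk_sizes=10, devices="device_0")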
@pollytur @schewskone ready for review from my side
@reneburghardt why have you decided to move chunks to the Experiment level? What is the high-level motivation behind this decision? (I think I missed this discussion)
Retrieving data by chunk size and retrieving it by interval seemed to me to be on the same level; in other words, having the two options in different places felt odd. They also share a fair bit of code, such as precision scaling and resolving the sampling rate and offset. How do you see it, @pollytur?
[Follow-up from the meeting with Fabee] Let's keep interval querying at the experiment level, but chunking and discretization should live only at the dataset level.
…anto into querying_by_time
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 14 comments.
def test_resolve_chunk_sizes_with_dict():
    with create_experiment() as experiment_path:
        experiment = Experiment(
            root_folder=experiment_path, modality_config=get_default_config()
        )

        custom = {"device_0": 10.0, "device_1": 20.0}
        result = experiment._resolve_chunk_sizes(
            list(custom.keys()), chunk_sizes=custom
        )
        assert result == custom
Copilot AI (Jan 1, 2026)
The test for resolving chunk sizes with a dictionary (test_resolve_chunk_sizes_with_dict) passes float values (10.0, 20.0) in the dictionary, but the _resolve_chunk_sizes method doesn't convert dictionary values to integers. This means chunk_sizes could contain floats when integers are expected. The test should either verify that floats are converted to ints, or the test should use integer values to match the expected behavior.
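If the conversion-to-int behavior suggested further down in this review is adopted, a variant of the test could pin it down explicitly; this is a sketch only and assumes that fix is in place:

def test_resolve_chunk_sizes_with_dict_converts_floats():
    # Sketch: assumes _resolve_chunk_sizes casts dictionary values to int.
    with create_experiment() as experiment_path:
        experiment = Experiment(
            root_folder=experiment_path, modality_config=get_default_config()
        )
        custom = {"device_0": 10.0, "device_1": 20.0}
        result = experiment._resolve_chunk_sizes(
            list(custom.keys()), chunk_sizes=custom
        )
        assert result == {"device_0": 10, "device_1": 20}
        assert all(isinstance(v, int) for v in result.values())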
if shifts_per_signal:
    shifts = np.random.rand(n_signals) / meta["sampling_rate"] * 0.9
    np.save(sequence_root / "meta" / "phase_shifts.npy", shifts)
Copilot AI (Jan 1, 2026)
The variable 'shifts' is only defined inside the 'if shifts_per_signal' block, but it's referenced in the return statement on line 69 outside of this block. The conditional expression there only evaluates 'shifts' when 'shifts_per_signal' is True, so it won't actually raise at runtime, but the name's existence silently depends on the flag and is easy to break in a refactor. Initialize 'shifts = None' before the conditional block to ensure it's always defined.
# TODO: find better convention for image, video, color, gray channels. This makes the monkey data same as mouse.
if "screen" in out:
    if out["screen"].shape[-1] == 3:
        out["screen"] = out["screen"].permute(0, 3, 1, 2)
    if out["screen"].shape[0] == self.chunk_sizes["screen"]:
        out["screen"] = out["screen"].transpose(0, 1)
Copilot AI (Jan 1, 2026)
The screen-specific transformation logic (permute and transpose operations) has been moved outside the device loop. The condition on line 636 checks 'if out["screen"].shape[0] == self.chunk_sizes["screen"]' which determines whether to transpose. This logic assumes the chunk_size dimension is in position 0 after transforms, but this might not always be the case depending on the transforms applied. The previous code (removed) applied this logic immediately after transforms within the loop, which may have been more reliable. Verify that this reordering doesn't break the logic for different transform configurations.
Suggested change:
-    # TODO: find better convention for image, video, color, gray channels. This makes the monkey data same as mouse.
-    if "screen" in out:
-        if out["screen"].shape[-1] == 3:
-            out["screen"] = out["screen"].permute(0, 3, 1, 2)
-        if out["screen"].shape[0] == self.chunk_sizes["screen"]:
-            out["screen"] = out["screen"].transpose(0, 1)
+    # TODO: find better convention for image, video, color, gray channels. This makes the monkey data same as mouse.
+    if device_name == "screen":
+        if out["screen"].shape[-1] == 3:
+            out["screen"] = out["screen"].permute(0, 3, 1, 2)
+        if out["screen"].shape[0] == self.chunk_sizes["screen"]:
+            out["screen"] = out["screen"].transpose(0, 1)
elif isinstance(chunk_sizes, (int, float)):
    return {d: int(chunk_sizes) for d in devices}
else:
    return chunk_sizes
Copilot AI (Jan 1, 2026)
The _resolve_chunk_sizes method converts float values to int on line 227, but chunk_sizes should be integers representing the number of samples. If a dictionary is passed with float values (as demonstrated in test_resolve_chunk_sizes_with_dict on line 446), those float values won't be converted to integers. For consistency, the method should also convert dictionary values to int, similar to how it handles the single int/float case.
Suggested change:
-    return chunk_sizes
+    # Ensure all chunk sizes are integers, even when provided as a dict
+    return {device: int(size) for device, size in chunk_sizes.items()}
"""
Retrieve interpolated data for a fixed number of timesteps (chunk size) per device.

Parameters
----------
chunk_sizes : int or dict[str, int], optional
    Number of time steps to retrieve per device. If a single int is provided, it is used for all devices.
    If a dictionary is provided, it should map device names to their respective number of timesteps.
    If None, default chunk sizes defined in the configuration are used.
target_sampling_rates : float or dict[str, float], optional
    Target sampling rate(s) in Hz. If a single float is provided, it is applied to all devices.
    If a dictionary is provided, it should map device names to their respective sampling rates.
    If None or a device is not specified in the dictionary, the default sampling rate from the modality config is used.
devices : str or list of str, optional
    Devices to retrieve data for. If None, all available devices (`self.devices`) are used.
Copilot AI (Jan 1, 2026)
The docstring has incomplete or missing documentation for the 'start_time' parameter. The Parameters section jumps directly to documenting 'chunk_sizes' without first documenting 'start_time', which is the first parameter in the function signature. Add proper documentation for the 'start_time' parameter explaining that it represents the starting time point from which to retrieve data chunks.
def get_data_for_chunks(
    self,
    start_time: float,
    chunk_sizes: Optional[Union[int, dict[str, int]]] = None,
    target_sampling_rates: Optional[Union[float, dict[str, float]]] = None,
    devices: Optional[Union[str, List[str]]] = None,
) -> dict[str, np.ndarray]:
Copilot AI (Jan 1, 2026)
The return type annotation for 'get_data_for_chunks' indicates it returns 'dict[str, np.ndarray]', but the function actually returns a tuple of two dictionaries: '(out, timestamps)' as seen on line 204. The correct return type should be 'tuple[dict[str, np.ndarray], dict[str, np.ndarray]]'.
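Concretely, the annotation fix the comment asks for would look like this (sketch; body elided):

def get_data_for_chunks(
    self,
    start_time: float,
    chunk_sizes: Optional[Union[int, dict[str, int]]] = None,
    target_sampling_rates: Optional[Union[float, dict[str, float]]] = None,
    devices: Optional[Union[str, List[str]]] = None,
) -> tuple[dict[str, np.ndarray], dict[str, np.ndarray]]:
    ...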
def test_get_device_offset_applies_scale_precision():
    with create_experiment() as experiment_path:
        modality_config = get_default_config()
        modality_config["device_0"]["offset"] = 0.1234
        experiment = Experiment(
            root_folder=experiment_path,
            modality_config=modality_config,
            interpolate_precision=3,
        )

        offset = experiment._get_device_offset("device_0")
        assert offset == 123
Copilot AI (Jan 1, 2026)
The test expects the offset to be scaled by the precision (0.1234 * 10^3 = 123.4, rounded to 123), but there's a potential issue: the test doesn't verify that this is the correct behavior, and the implementation in _get_device_offset (line 232-234) uses round() which could produce unexpected results for edge cases. Consider adding assertions that verify the rounding behavior matches expectations, especially for negative offsets and values near rounding boundaries.
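A sketch of the kind of additional assertion meant here; whether scale_precision should follow Python's round() semantics for negative and tie values is an assumption that would need confirming against the implementation:

def test_get_device_offset_scale_precision_negative_offset():
    # Sketch: expected value assumes the offset is scaled with round(), so
    # -0.1234 * 10**3 = -123.4 rounds to -123.
    with create_experiment() as experiment_path:
        modality_config = get_default_config()
        modality_config["device_0"]["offset"] = -0.1234
        experiment = Experiment(
            root_folder=experiment_path,
            modality_config=modality_config,
            interpolate_precision=3,
        )
        assert experiment._get_device_offset("device_0") == -123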
@@ -85,11 +85,8 @@ def __init__(
        modality_config: dict = DEFAULT_MODALITY_CONFIG,
        seed: Optional[int] = None,
        safe_interval_threshold: float = 0.5,
Copilot AI (Jan 1, 2026)
The 'interpolate_precision' parameter was removed from ChunkDataset's init signature (previously on removed lines around line 87). However, Experiment now has this parameter with a default value of 5. If ChunkDataset users previously relied on setting a custom interpolate_precision, they can no longer do so through ChunkDataset. This is a breaking API change. Consider whether ChunkDataset should still accept and forward this parameter to Experiment for backward compatibility.
Suggested change:
-    safe_interval_threshold: float = 0.5,
+    safe_interval_threshold: float = 0.5,
+    interpolate_precision: int = 5,
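If backward compatibility is wanted, the forwarding could look roughly like this; it assumes ChunkDataset constructs its Experiment internally and stores it on self._experiment (as the __getitem__ snippet below suggests), so everything except interpolate_precision itself is illustrative:

# Sketch of forwarding the parameter from ChunkDataset to Experiment.
def __init__(
    self,
    root_folder: str,
    modality_config: dict = DEFAULT_MODALITY_CONFIG,
    seed: Optional[int] = None,
    safe_interval_threshold: float = 0.5,
    interpolate_precision: int = 5,
) -> None:
    ...
    self._experiment = Experiment(
        root_folder=root_folder,
        modality_config=modality_config,
        interpolate_precision=interpolate_precision,  # forwarded explicitly
    )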
out = {}
timestamps = {}
s = self._valid_screen_times[idx]
out, timestamps = self._experiment.get_data_for_chunks(
    s,
    chunk_sizes=self.chunk_sizes,
    target_sampling_rates=self.sampling_rates,
    devices=self.device_names,
)
Copilot AI (Jan 1, 2026)
The variable name 'out' is used to store both data and timestamps initially (line 615), but then 'out' is reassigned to only contain data while 'timestamps' is a separate variable. This creates confusion about what 'out' represents at different points in the code. Consider using more descriptive names like 'data' instead of 'out' from the beginning, or avoiding the initial assignment pattern altogether by directly unpacking: 'data, timestamps = self._experiment.get_data_for_chunks(...)'.
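A minimal version of the direct-unpacking pattern suggested above; the name data is the reviewer's suggestion, not a name used in the PR:

# Sketch: unpack directly, with no placeholder dicts and no reuse of "out".
s = self._valid_screen_times[idx]
data, timestamps = self._experiment.get_data_for_chunks(
    s,
    chunk_sizes=self.chunk_sizes,
    target_sampling_rates=self.sampling_rates,
    devices=self.device_names,
)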
import yaml
Copilot AI (Jan 1, 2026)
Import of 'yaml' is not used.
Suggested change:
-import yaml
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
if shifts_per_signal:
    shifts = np.random.rand(n_signals) / meta["sampling_rate"] * 0.9
    np.save(sequence_root / "meta" / "phase_shifts.npy", shifts)

fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape)
fp[:] = data[:]
fp.flush()  # Ensure data is written to disk
del fp
meta["dtype"] = str(data.dtype)
with open(sequence_root / "meta.yml", "w") as f:
    yaml.dump(meta, f)

if shifts_per_signal:
    shifts = np.random.rand(n_signals) / meta["sampling_rate"] * 0.9
    np.save(SEQUENCE_ROOT / "meta" / "phase_shifts.npy", shifts)
return timestamps, data, shifts if shifts_per_signal else None
Copilot AI (Jan 9, 2026)
The variable shifts is only defined when shifts_per_signal is True (line 63), but it's referenced in the ternary expression on line 69 (shifts if shifts_per_signal else None). Because a conditional expression only evaluates the selected branch, this does not actually raise a NameError when shifts_per_signal is False, but the name's existence then silently depends on the flag, which is fragile. Initialize shifts = None before line 62 so the returned value is always explicit.
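A minimal rewrite along those lines (sketch; only the lines quoted above change, the memmap writing and meta.yml dump stay as they are):

# Sketch: define shifts unconditionally so the returned value is explicit and
# the trailing ternary can be dropped.
shifts = None
if shifts_per_signal:
    shifts = np.random.rand(n_signals) / meta["sampling_rate"] * 0.9
    np.save(sequence_root / "meta" / "phase_shifts.npy", shifts)
# ... memmap writing and meta.yml dump unchanged ...
return timestamps, data, shifts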
-def interpolate(self, times: slice, device=None) -> tuple[np.ndarray, np.ndarray]:
+def interpolate(
+    self, times: slice, device: str = None
+) -> tuple[np.ndarray, np.ndarray]:
Copilot AI (Jan 9, 2026)
The return type annotation is incorrect. The method now returns only the interpolated values (either a dict or np.ndarray), not a tuple of (values, valid). The return type should be Union[dict[str, np.ndarray], np.ndarray] to match the actual return behavior.
Suggested change:
-    ) -> tuple[np.ndarray, np.ndarray]:
+    ) -> Union[dict[str, np.ndarray], np.ndarray]:
def get_data_for_chunks(
    self,
    start_time: float,
    chunk_sizes: Optional[Union[int, dict[str, int]]] = None,
    target_sampling_rates: Optional[Union[float, dict[str, float]]] = None,
    devices: Optional[Union[str, List[str]]] = None,
) -> dict[str, np.ndarray]:
    """
    Retrieve interpolated data for a fixed number of timesteps (chunk size) per device.

    Parameters
    ----------
    chunk_sizes : int or dict[str, int], optional
        Number of time steps to retrieve per device. If a single int is provided, it is used for all devices.
        If a dictionary is provided, it should map device names to their respective number of timesteps.
        If None, default chunk sizes defined in the configuration are used.
    target_sampling_rates : float or dict[str, float], optional
        Target sampling rate(s) in Hz. If a single float is provided, it is applied to all devices.
        If a dictionary is provided, it should map device names to their respective sampling rates.
        If None or a device is not specified in the dictionary, the default sampling rate from the modality config is used.
    devices : str or list of str, optional
        Devices to retrieve data for. If None, all available devices (`self.devices`) are used.

    Returns
    -------
    dict[str, np.ndarray]
        A dictionary mapping each device name to its corresponding interpolated data array
        of shape `(chunk_size, ...)`.

    Raises
    ------
    AssertionError
        If a specified device is not found in `self.devices`.
    ValueError
        If no sampling rate is specified or available for a device.
    """
Copilot AI (Jan 9, 2026)
The start_time parameter is missing from the documentation. While other parameters are documented, start_time (the first required parameter) is not described in the Parameters section.
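A possible wording for the missing entry; it assumes start_time is expressed in the same units as the experiment's timestamps (seconds), which should be confirmed against the implementation:

Parameters
----------
start_time : float
    Time point (in the experiment's time base, assumed to be seconds) at which
    retrieval starts; each device's chunk is interpolated forward from this
    point for its resolved chunk_size timesteps.
chunk_sizes : int or dict[str, int], optional
    ...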
if chunk_sizes is None:
    return {d: self.modality_config[d].get("chunk_size") for d in devices}
elif isinstance(chunk_sizes, (int, float)):
    return {d: int(chunk_sizes) for d in devices}
Copilot AI (Jan 9, 2026)
The _resolve_chunk_sizes method converts floats to ints, but chunk sizes should always be integers. Accepting floats as a valid type (in line 225) could lead to confusion. Consider removing float from the type check or add validation to ensure only integer values are provided.
Suggested change:
-    return {d: int(chunk_sizes) for d in devices}
+    if isinstance(chunk_sizes, float) and not chunk_sizes.is_integer():
+        raise ValueError(
+            f"chunk_sizes must be an integer number of timesteps, got {chunk_sizes!r}"
+        )
+    resolved_size = int(chunk_sizes)
+    return {d: resolved_size for d in devices}
out = {}
timestamps = {}
Copilot AI (Jan 9, 2026)
The variables out and timestamps are initialized as empty dictionaries on lines 612-613, but immediately overwritten on line 615 when unpacking the return value from get_data_for_chunks. The initialization is unnecessary and should be removed.
Suggested change:
-    out = {}
-    timestamps = {}
Still needs testing