Extension of BidsComponent.filter api #335

@pvandyken

Background

Currently, BidsComponentRow.filter supports a positional-only spec argument that directly takes the list of values to keep. Because a row has only one entity, explicitly naming that entity via a keyword argument is redundant.

When implementing the above, we raised the possibility of extending the spec filtering mechanism to BidsComponent and BidsPartialComponent (hereafter treated synonymously). I'd like to propose a starting point, not necessarily comprehensive, for this API.

Note: below, when I refer to "entries", I mean a specific combination of entity values in a component (e.g. one (subject, session, run) combination).

Proposal

Filtering with lists of tuples

Analogous to BidsComponentRow.filter taking a list of values, BidsComponent.filter shall take a list of tuples of strings:

class BidsComponent:
	def filter(self, spec: Sequence[tuple[str, ...]], /, ...): ...

Each tuple shall be of the same length, equal to the number of entities in the component. Each position in the tuple shall correspond to one of the component's entities, with the order matching the internal entity order of the component. For example, for a component with entities "subject", "session" (in that order), a filter spec may look like: [("001", "01"), ("001", "02"), ("002", "01"), ...].
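To make the intended semantics concrete, here is a minimal sketch in plain Python. It is not the snakebids implementation: `filter_by_tuples` is a hypothetical helper, and the component is modelled as a plain dict mapping entity names to equal-length lists of values (an assumption made for illustration only).

```python
from typing import Mapping, Sequence


def filter_by_tuples(
    zip_lists: Mapping[str, Sequence[str]],
    spec: Sequence[tuple[str, ...]],
) -> dict[str, list[str]]:
    """Keep only the entries whose entity values appear in ``spec``.

    Hypothetical helper: ``zip_lists`` stands in for a component, with
    entity order given by the mapping's key order.
    """
    entities = list(zip_lists)
    for item in spec:
        # Every tuple must supply exactly one value per entity.
        if len(item) != len(entities):
            raise ValueError(
                f"expected tuples of length {len(entities)}, got {item!r}"
            )
    allowed = {tuple(item) for item in spec}
    # Walk the entries row by row and keep those listed in the spec.
    kept = [entry for entry in zip(*zip_lists.values()) if entry in allowed]
    return {
        entity: [entry[i] for entry in kept]
        for i, entity in enumerate(entities)
    }


component = {
    "subject": ["001", "001", "002", "002"],
    "session": ["01", "02", "01", "02"],
}
filtered = filter_by_tuples(component, [("001", "01"), ("002", "02")])
```

Rejecting tuples of the wrong length up front is one possible design choice here; the proposal only states that the lengths shall match, not how a mismatch should be reported.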

Filtering with BidsComponents

Any BidsComponent may be filtered against another BidsComponent, which acts as a template. Only entries found in the template component shall be kept in the original component. Entities found in only one of the two components will not be considered. The logic would be identical to the consensus logic proposed for BidsDataset.expand().
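The matching rule described above could be sketched as follows, again in plain Python with dicts of equal-length lists standing in for components. `filter_by_component` is a hypothetical name, and the behaviour when the two components share no entities (returning the original unchanged) is an assumption, since the proposal does not specify it.

```python
from typing import Mapping, Sequence


def filter_by_component(
    original: Mapping[str, Sequence[str]],
    template: Mapping[str, Sequence[str]],
) -> dict[str, list[str]]:
    """Keep entries of ``original`` that also occur in ``template``.

    Only entities present in both mappings are compared; entities found
    in just one of them are ignored when matching.
    """
    shared = [entity for entity in original if entity in template]
    if not shared:
        # Assumption: with nothing to compare, leave the original as-is.
        return {entity: list(values) for entity, values in original.items()}
    # Value combinations of the shared entities present in the template.
    allowed = set(zip(*(template[entity] for entity in shared)))
    n_entries = len(original[shared[0]])
    keep = [
        i
        for i in range(n_entries)
        if tuple(original[entity][i] for entity in shared) in allowed
    ]
    return {
        entity: [values[i] for i in keep]
        for entity, values in original.items()
    }


original = {
    "subject": ["001", "001", "002"],
    "session": ["01", "02", "01"],
    "run": ["1", "1", "2"],
}
template = {
    "subject": ["001", "002"],
    "session": ["02", "01"],
    "acq": ["a", "b"],
}
result = filter_by_component(original, template)
```

Here "run" (only in the original) and "acq" (only in the template) play no part in the comparison, but "run" values are carried along for the entries that survive.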

Motivating Example

The above API would enable the following example:

# pandas dataframe containing metadata for the dataset, indexed by subject and session ids
df = get_metadata()

inputs = generate_inputs(...)
component = inputs["T1w"]

# Select only the subject/session combinations found in the metadata
component = component.filter(component["subject", "session"].filter(df.index))

This example is not easily achieved using a combination of snakebids and pure Python (my current workaround is to convert the snakebids component into a pandas dataframe). In particular, snakebids currently cannot filter on combinations of multiple entities (e.g. subject/session pairs) at the same time. The new API would enable this, as shown in the example.