Inefficient caching with polars #184

@prockenschaub

Description

Problem

__getitem__ does three full-DataFrame scans per call:

stay_id = self.outcome_df[self.vars["GROUP"]].unique()[idx]      # scan 1: outcome_df
window = self.features_df.filter(pl.col(GROUP) == stay_id)...    # scan 2: features_df (2.8M rows)
labels = self.outcome_df.filter(pl.col(GROUP) == stay_id)...     # scan 3: outcome_df

Every stay lookup re-examines the entire feature frame, which makes building the cache take a couple of minutes. For example, on the AKI dataset (~66k stays, ~2.8M feature rows), building the RAM cache calls __getitem__ ~66k times, each call scanning ~2.8M rows.
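A back-of-the-envelope estimate of the total work under the numbers above (the figures are the approximate ones from this issue, so treat the result as an order of magnitude, not a measurement):

```python
n_stays = 66_000          # ~number of stays in the AKI dataset
n_feature_rows = 2_800_000  # ~rows in features_df

# Each __getitem__ filters the full feature frame once,
# so total predicate evaluations scale as stays x rows.
rows_scanned = n_stays * n_feature_rows
print(f"{rows_scanned:.2e} row comparisons")  # ~1.85e11
```

A dict-based lookup replaces this with a single partitioning pass over each frame, i.e. ~2.8M rows touched once instead of ~66k times.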

Proposed solution

Partition both DataFrames by stay once at __init__, keep the per-stay slices as numpy arrays in dicts, and make __getitem__ do a dict lookup.
