8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -20,12 +20,12 @@ repos:
- id: check-toml

- repo: https://github.com/astral-sh/uv-pre-commit
-rev: 0.8.22
+rev: 0.10.10
hooks:
- id: uv-lock

- repo: https://github.com/astral-sh/ruff-pre-commit
-rev: 'v0.12.12'
+rev: 'v0.15.6'
hooks:
- id: ruff-check
args: [--fix, --unsafe-fixes, --exit-non-zero-on-fix]
@@ -34,7 +34,7 @@ repos:
types_or: [python, jupyter]

- repo: https://github.com/pre-commit/mirrors-mypy
-rev: v1.17.1
+rev: v1.19.1
hooks:
- id: mypy
entry: python3 -m mypy --config-file pyproject.toml
@@ -43,7 +43,7 @@ repos:
exclude: "data"

- repo: https://github.com/crate-ci/typos
-rev: v1.37.1
+rev: v1
hooks:
- id: typos
exclude: ^data/
8 changes: 5 additions & 3 deletions src/biothink/self_reflection/data_process/process_data.py
@@ -201,9 +201,11 @@ def process_dataset(dataset, remove_original_output=False):
# First, filter out rows with unwanted tokens to avoid multiprocessing issues
print("Filtering dataset to remove unwanted tokens...")
filtered = dataset.filter(
-lambda x: "[No Retrieval]" not in x.get("output", "")
-and "[Continue to Use Evidence]" not in x.get("output", "")
-and x.get("output", "").startswith("[Retrieval]"),
+lambda x: (
+"[No Retrieval]" not in x.get("output", "")
+and "[Continue to Use Evidence]" not in x.get("output", "")
+and x.get("output", "").startswith("[Retrieval]")
+),
num_proc=4,
)
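
The filter predicate in this hunk can also be read as a standalone function. The sketch below is only an illustration of the same three conditions on plain dictionaries; the name `keep_row` and the sample rows are hypothetical and do not appear in the PR.

```python
def keep_row(row: dict) -> bool:
    """Hypothetical named equivalent of the lambda passed to dataset.filter."""
    # A missing "output" key is treated as an empty string, as in the lambda.
    output = row.get("output", "")
    return (
        "[No Retrieval]" not in output
        and "[Continue to Use Evidence]" not in output
        and output.startswith("[Retrieval]")
    )


# Made-up sample rows, used only to exercise the predicate.
sample_rows = [
    {"output": "[Retrieval] cited passage"},
    {"output": "[No Retrieval] plain answer"},
    {"output": "[Retrieval] text [Continue to Use Evidence] more"},
    {"output": "Answer first, then [Retrieval]"},
    {},
]

kept = [row for row in sample_rows if keep_row(row)]
```

Assuming `dataset` is a Hugging Face `datasets.Dataset` (the diff does not say so explicitly), a function like this could be passed in place of the lambda, as in `dataset.filter(keep_row, num_proc=4)`.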
