Goals:
- Fetch annotated articles from the variantAnnotations stored in the PharmGKB API
- Create a general benchmark that outputs a score for any extraction system:
  - Given: an article and its ground-truth variants (manually extracted and recorded in var_drug_ann.tsv)
  - Input: extracted variants
  - Output: score
- Build a system for extracting drug-related variant annotations from an article, i.e., associations in which a variant affects drug dose, response, metabolism, etc.
- Continuously fetch new pharmacogenomic articles
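The benchmark goal above (given an article and its ground-truth variants, score a set of extracted variants) could be sketched as follows. This is a minimal illustration, not the repository's scorer: it assumes a set-based precision/recall/F1 over variant identifiers, assumes var_drug_ann.tsv has `PMID` and `Variant/Haplotypes` columns, and assumes multiple variants in one cell are comma-separated. The names `score_extraction` and `load_ground_truth` are hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass
class Score:
    precision: float
    recall: float
    f1: float


def normalize(variant: str) -> str:
    # Case-fold and trim so "RS1045642 " matches "rs1045642".
    return variant.strip().lower()


def score_extraction(ground_truth: set[str], extracted: set[str]) -> Score:
    """Hypothetical scorer: set-based precision/recall/F1 over variant IDs."""
    truth = {normalize(v) for v in ground_truth}
    pred = {normalize(v) for v in extracted}
    true_pos = len(truth & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, f1)


def load_ground_truth(tsv_path: str, pmid: str) -> set[str]:
    """Collect ground-truth variants for one PMID from var_drug_ann.tsv.

    Assumption: the TSV has "PMID" and "Variant/Haplotypes" columns, and a
    cell may list several comma-separated variants.
    """
    truth: set[str] = set()
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["PMID"] != pmid:
                continue
            for part in row["Variant/Haplotypes"].split(","):
                if part.strip():
                    truth.add(part.strip())
    return truth
```

For example, `score_extraction({"rs1045642", "rs4149056"}, {"rs1045642", "rs9923231"})` finds one true positive out of two predictions and two truths, giving precision, recall, and F1 of 0.5. F1 is only one reasonable choice; a real benchmark might also match on drug and phenotype, not just the variant.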
This repository contains Python scripts for building and running a pharmacogenomic agentic system that annotates and labels genetic variants based on their phenotypic associations reported in journal articles.
We manage a few repos externally:
- PubMed Downloader: This repo is used to download all the markdown files for the PMIDs represented in var_drug_ann.tsv
- Huggingface/AutoGKB: This converts the annotations and article text into a dataset format for benchmarking
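The conversion itself lives in the external AutoGKB repo; as a rough illustration of what pairing annotations with article text might look like, here is a sketch that writes one JSONL record per PMID. It is not AutoGKB's actual format: the `build_dataset` name, the `<PMID>.md` file layout, the `PMID` column name, and the record fields are all assumptions.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def build_dataset(tsv_path: str, markdown_dir: str, out_path: str) -> int:
    """Hypothetical converter: one JSONL record per PMID with article text
    and its annotation rows. Returns the number of records written."""
    # Group annotation rows by PMID (assumes a "PMID" column in the TSV).
    by_pmid: dict[str, list[dict[str, str]]] = defaultdict(list)
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            by_pmid[row["PMID"]].append(row)

    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for pmid, rows in sorted(by_pmid.items()):
            # Assumed layout: one markdown file per article, named "<PMID>.md".
            md_file = Path(markdown_dir) / f"{pmid}.md"
            if not md_file.exists():
                continue  # skip PMIDs whose article was not downloaded
            record = {
                "pmid": pmid,
                "article": md_file.read_text(encoding="utf-8"),
                "annotations": rows,
            }
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```

Skipping PMIDs without a downloaded article keeps every record usable for scoring; a stricter pipeline might instead report the missing PMIDs.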
Setup:
pixi run setup-repo
OR
pixi run gdown --id 1qtQWvi0x_k5_JofgrfsgkWzlIdb6isr9
unzip autogkb-data.zip