A comprehensive benchmarking system for evaluating the quality of genomic knowledge base annotations.
This repository contains two versions of the AutoGKB benchmark system:
- Benchmark V1: The original, comprehensive benchmark that evaluates four types of annotations.
- Benchmark V2: A newer, more modular benchmark that is currently focused on variant matching and sentence-level validation.
The original benchmark system evaluates four types of annotations:
- Drug Annotations (var_drug_ann): Drug-gene-variant associations
- Phenotype Annotations (var_pheno_ann): Phenotype-gene-variant associations
- Functional Analysis (var_fa_ann): Functional effects of variants
- Study Parameters (study_parameters): Study design and statistical parameters
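For orientation, a drug annotation record might look something like the following. Note that the field names here are hypothetical and chosen for illustration; consult the actual annotation files for the real schema.

```python
# Hypothetical example of a drug annotation (var_drug_ann) record;
# the actual field names in the benchmark data may differ.
drug_annotation = {
    "pmcid": "PMC5508045",
    "variant": "rs4244285",
    "gene": "CYP2C19",
    "drug": "clopidogrel",
    "association": "decreased response",
}
```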
```bash
# Run the full benchmark
PYTHONPATH=src pixi run python src/benchmark_v1/run_benchmark.py

# Run the benchmark on a single file
PYTHONPATH=src pixi run python src/benchmark_v1/run_benchmark.py --single_file PMC5508045

# Show mismatches for a single file
PYTHONPATH=src pixi run python src/benchmark_v1/run_benchmark.py \
    --single_file PMC5508045 \
    --show_mismatches
```

The overall score is a weighted average of the individual benchmark scores. Each field is scored with an appropriate comparison metric: exact match, semantic similarity, or numeric tolerance. The system also performs dependency validation to penalize logical inconsistencies.
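As an illustrative sketch of weighted aggregation (not the benchmark's actual implementation; the weights and score values below are made up):

```python
# Sketch of V1-style overall scoring: a weighted average over
# per-benchmark scores, each assumed to lie in [0, 1].
def overall_score(benchmark_scores: dict[str, float],
                  weights: dict[str, float]) -> float:
    """Weighted average of individual benchmark scores."""
    total_weight = sum(weights[name] for name in benchmark_scores)
    return sum(score * weights[name]
               for name, score in benchmark_scores.items()) / total_weight

scores = {"var_drug_ann": 0.9, "var_pheno_ann": 0.7}
weights = {"var_drug_ann": 2.0, "var_pheno_ann": 1.0}
result = overall_score(scores, weights)  # (0.9*2 + 0.7*1) / 3
```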
For more details on the V1 benchmark, please refer to the original README_BENCHMARK.md.
The V2 benchmark is a newer, more modular system designed for focused evaluations. It currently includes benchmarks for variant matching, sentence validation, and field extraction.
The variant benchmark (variant_bench.py) compares a list of proposed variants against a ground truth set, calculating match rates, misses, and extras.
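The core comparison can be sketched as simple set arithmetic; this is an illustrative stand-in, not the code in variant_bench.py:

```python
# Sketch of variant matching: proposed variants are compared against a
# ground-truth set to produce matches, misses, and extras.
def score_variants(proposed: list[str], truth: list[str]) -> dict:
    proposed_set, truth_set = set(proposed), set(truth)
    matches = sorted(proposed_set & truth_set)
    misses = sorted(truth_set - proposed_set)   # in ground truth, not proposed
    extras = sorted(proposed_set - truth_set)   # proposed, not in ground truth
    match_rate = len(matches) / len(truth_set) if truth_set else 0.0
    return {"match_rate": match_rate, "matches": matches,
            "misses": misses, "extras": extras}

result = score_variants(["rs4244285", "rs1045642"],
                        ["rs4244285", "rs9923231"])
# match_rate 0.5, one miss (rs9923231), one extra (rs1045642)
```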
```bash
# Score a single annotation file
PYTHONPATH=src pixi run python src/benchmark_v2/variant_bench.py score_annotation <path_to_annotation_file>

# Score all annotation files in a directory
PYTHONPATH=src pixi run python src/benchmark_v2/variant_bench.py score_all_annotations --annotations_dir <path_to_annotations_dir>

# Score a file of generated variants
PYTHONPATH=src pixi run python src/benchmark_v2/variant_bench.py score_generated_variants <path_to_generated_variants_file>
```

The sentence benchmark (sentence_bench.py) evaluates the quality of generated sentences against ground truth sentences from the literature.
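Sentence scoring in the actual benchmark likely relies on embedding similarity (sentence-transformers is a listed dependency); the sketch below substitutes a simple string-overlap ratio to illustrate the matching idea:

```python
from difflib import SequenceMatcher

# Lightweight stand-in for sentence matching: find the ground-truth
# sentence most similar to a generated one, with a similarity in [0, 1].
def best_match(generated: str, ground_truth: list[str]) -> tuple[str, float]:
    scored = [(gt, SequenceMatcher(None, generated.lower(), gt.lower()).ratio())
              for gt in ground_truth]
    return max(scored, key=lambda pair: pair[1])

sentence, score = best_match(
    "CYP2C19 variants reduce clopidogrel efficacy",
    ["CYP2C19 variants reduce clopidogrel efficacy in carriers",
     "An unrelated sentence about study enrollment"],
)
```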
The field extractor (field_extractor.py) is a utility for extracting specific fields from annotation files.
The V2 variant benchmark provides a JSON output with the following structure:
```json
{
  "timestamp": "",
  "run_name": "",
  "total_match_rate": 0.0,
  "per_annotation_scores": [
    {
      "pmcid": "PMC5508045",
      "title": "",
      "match_rate": 0.0,
      "matches": [],
      "misses": [],
      "extras": []
    }
  ]
}
```

The src/modules directory contains modular, multi-stage pipelines for developing and evaluating new methods for knowledge extraction. Each stage is designed to be run independently, with outputs from one stage feeding into the next.
The primary experimental pipelines are:
- Variant Finding: Extracts genetic variants from full-text articles.
- Sentence Generation: Generates sentences describing the clinical significance of each variant.
- Citation Finding: Identifies the source sentence from the original article that supports each generated sentence.
- Summary Generation: Creates a final, concise summary of the key pharmacogenomic findings in the article.
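The staged design above can be sketched as a chain of functions. Every function here is a trivial stub for illustration; none of these names correspond to the repository's actual module APIs:

```python
import re

def find_variants(text: str) -> list[str]:
    # Stage 1: variant finding (stub: match rsIDs in the text)
    return sorted(set(re.findall(r"rs\d+", text)))

def generate_sentences(variants: list[str]) -> list[str]:
    # Stage 2: sentence generation (stub: one template per variant)
    return [f"{v} is associated with altered drug response." for v in variants]

def find_citations(text: str, sentences: list[str]) -> list[str]:
    # Stage 3: citation finding (stub: source sentence naming the variant)
    source = text.split(". ")
    return [next((s for s in source if sent.split()[0] in s), "")
            for sent in sentences]

def summarize(sentences: list[str]) -> str:
    # Stage 4: summary generation (stub: join the findings)
    return " ".join(sentences)

article = "Patients carrying rs4244285 showed reduced clopidogrel response."
variants = find_variants(article)
sentences = generate_sentences(variants)
citations = find_citations(article, sentences)
summary = summarize(sentences)
```

The point of the staged structure is that each intermediate output can be inspected and benchmarked on its own before feeding the next stage.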
Each experiment directory contains a detailed README.md with instructions on how to run the specific pipeline, including example commands and descriptions of available methods. Please refer to these files for more information on each stage of the experimental process.
Required packages are managed with pixi and are listed in the pixi.toml file. Key dependencies include:
- sentence-transformers
- scikit-learn
- numpy
- pydantic
To install all dependencies, run:
```bash
pixi install
```

When adding new features:
- Add new benchmark modules to the src/benchmark_v2 directory.
- Ensure that new features are tested.
- Document new metrics and functionalities in this README.