
# Expert-Annotated Clinical Encounters: Dataset Usage

Scripts and prompts for working with the Expert-Annotated Clinical Encounters dataset, a benchmark for evaluating AI-generated medical documentation.

The dataset contains 513 clinical encounter cases with expert-authored evaluation rubrics. Each case includes a conversational transcript, longitudinal patient chart history, point-in-time encounter context, and three evaluation rubrics (two clinician-authored, one LLM-drafted).

**Dataset:** PhysioNet (link to be updated upon publication)

## Repository Structure

```text
dataset-usage/
  scripts/
    load_data.py             # Load and query cases and rubrics from the CSV files
    score_note.py            # Interactively score a clinical note against a rubric
  prompts/
    llm_rubric_generation.md # Prompt template for generating LLM rubrics
    note_scoring.md          # Prompt template for automated note scoring
```

## Getting Started

### 1. Download the Dataset

Download `cases.csv` and `rubrics.csv` from PhysioNet and place them in a local directory (e.g., `./data/`).

### 2. Load and Explore

Look up a case and its associated rubrics:

```bash
python scripts/load_data.py --data-dir ./data --case-id 457
```

Output:

```text
Case 457: patients_sleep_issues_and_dating_life_76746416-b
  Provenance: real_world
  Specialty: psychiatry
  Encounter type: follow-up, medication-review
  Acuity: moderate

  Rubrics (3):
    Rubric 9424 (clinician): 5 criteria, total weight 100.0
      [30] Reward for capturing all key patient symptoms and concerns...
      ...
    Rubric 9451 (clinician): 5 criteria, total weight 100.0
      ...
    Rubric 3744 (llm): 5 criteria, total weight 100.0
      ...
```

### 3. Score a Note

Generate a clinical note using your system, save it as a text file, then score it interactively against a rubric:

```bash
python scripts/score_note.py --data-dir ./data --rubric-id 9424 --note my_generated_note.txt
```

The tool displays each criterion, prompts you to mark it as met or not met, and then computes the weighted score.

### 4. Generate Your Own Rubrics

The `prompts/llm_rubric_generation.md` file contains the exact prompt template used to generate the LLM-drafted rubrics in the dataset. You can use it with any LLM to create rubrics for new cases or to experiment with different rubric generation strategies.

### 5. Automated Scoring

The `prompts/note_scoring.md` file contains the prompt template used for automated scoring of notes against rubrics. This is useful for building your own scoring pipeline at scale.
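An automated pipeline typically fills the template with the generated note and the rubric criteria before sending it to an LLM. A minimal sketch, assuming the template uses `{{NOTE}}` and `{{RUBRIC}}` placeholders (these names are illustrative; check `prompts/note_scoring.md` for the actual ones):

```python
from pathlib import Path


def build_scoring_prompt(template_path, note, criteria):
    """Fill the scoring prompt template with a note and rubric criteria.

    {{NOTE}} and {{RUBRIC}} are hypothetical placeholder names; substitute
    whatever markers the actual template in prompts/note_scoring.md uses.
    """
    template = Path(template_path).read_text(encoding="utf-8")
    # Number the criteria so the LLM can reference them in its verdict.
    rubric_block = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, 1))
    return template.replace("{{NOTE}}", note).replace("{{RUBRIC}}", rubric_block)
```

The filled prompt can then be sent to the LLM of your choice; the LLM call itself is out of scope here.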

## Dataset Schema

### `cases.csv` (513 rows, 20 columns)

| Field | Description |
| --- | --- |
| `case_id` | Unique case identifier |
| `name` | Descriptive slug |
| `provenance` | `real_world` or `synthetic` |
| `specialty` | Clinical specialty tag |
| `encounter_type` | Encounter type tag |
| `encounter_length` | `short`, `medium`, or `long` |
| `problem_count` | `single-problem` or `multi-problem` |
| `acuity` | `low`, `moderate`, or `high` |
| `demographics` | Patient demographics |
| `transcript` | JSON array of conversational turns |
| `current_medications` | Current medications (RxNorm coded) |
| `current_conditions` | Current conditions (ICD-10 coded) |
| `condition_history` | Condition history (ICD-10 coded) |
| `current_allergies` | Current allergies |
| `family_history` | Family history (ICD-10 coded) |
| `surgery_history` | Surgery history (SNOMED coded) |
| `current_goals` | Current patient goals |
| `staged_commands` | Pre-existing note context |
| `clinician_rubric_ids` | Comma-separated rubric IDs |
| `llm_rubric_id` | LLM rubric ID |
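Given this schema, a single case row can be loaded with the standard library alone; note that the `transcript` column stores its JSON array as a string and must be parsed. A minimal sketch (the `load_case` helper name is illustrative):

```python
import csv
import json


def load_case(cases_path, case_id):
    """Return the row for one case from cases.csv as a dict.

    The transcript column is stored as a JSON string, so decode it
    into a list of conversational turns before returning.
    """
    with open(cases_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["case_id"] == str(case_id):
                row["transcript"] = json.loads(row["transcript"])
                return row
    raise KeyError(f"case_id {case_id} not found")
```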

### `rubrics.csv` (7,415 rows, 6 columns)

| Field | Description |
| --- | --- |
| `rubric_id` | Groups criteria belonging to the same rubric |
| `case_id` | Foreign key to `cases.csv` |
| `author_type` | `clinician` or `llm` |
| `criterion_index` | Zero-based index within the rubric |
| `criterion` | Natural-language documentation requirement |
| `weight` | Clinical importance weight |

To reconstruct a full rubric, group rows by `rubric_id` and order by `criterion_index`. Each case has exactly 3 rubrics (2 clinician-authored, 1 LLM-drafted).
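The group-and-order step described above can be sketched with the standard library (the `load_rubrics` helper name is illustrative):

```python
import csv
from collections import defaultdict


def load_rubrics(rubrics_path):
    """Map rubric_id -> criteria rows, ordered by criterion_index."""
    rubrics = defaultdict(list)
    with open(rubrics_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rubrics[row["rubric_id"]].append(row)
    # CSV order is not guaranteed, so sort each rubric's criteria explicitly.
    for rows in rubrics.values():
        rows.sort(key=lambda r: int(r["criterion_index"]))
    return dict(rubrics)
```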

## Scoring Methodology

For a given clinical note and rubric:

1. Reconstruct the rubric by grouping rows by `rubric_id` and ordering by `criterion_index`.
2. Evaluate each criterion as met or not met.
3. Compute the score as the sum of weights for met criteria divided by the total rubric weight, yielding a normalized score from 0 to 100%.
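A minimal sketch of step 3, assuming the rubric has been reconstructed as a list of `(criterion, weight)` pairs with a parallel list of met/not-met verdicts:

```python
def weighted_score(criteria, met):
    """Compute the normalized rubric score as a percentage.

    criteria: list of (criterion_text, weight) pairs.
    met: parallel list of booleans, one verdict per criterion.
    """
    total = sum(weight for _, weight in criteria)
    earned = sum(weight for (_, weight), is_met in zip(criteria, met) if is_met)
    return 100.0 * earned / total
```

For example, meeting only a weight-30 criterion in a weight-100 rubric yields `weighted_score([("a", 30.0), ("b", 70.0)], [True, False])` = 30.0.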

## Requirements

Python 3.10+ (standard library only, no external dependencies).

## Citation

If you use this dataset in your research, please cite:

[Citation to be added upon publication]

## License

Scripts in this repository are released under the MIT License. The dataset itself is subject to the PhysioNet data use agreement.
