Clinical Variant Pathogenicity Analysis (GRCh38)

Assignment Overview

This assignment investigates the pathogenicity of six clinically significant genetic variants using publicly available genomic databases and standardized variant interpretation guidelines. The objective is to integrate clinical, computational, and genomic evidence into a structured and reproducible analysis workflow.

All genomic coordinates are based on the GRCh38 reference genome.

Selected Variants

Common Genetic Diseases

Disease	Gene	Variant (HGVS)	Genomic Position (GRCh38)	Clinical Significance
Duchenne Muscular Dystrophy	DMD	c.8713C>T (p.Arg2905Ter)	chrX:32389644	Pathogenic
Cystic Fibrosis	CFTR	c.1521_1523delCTT (p.Phe508del)	chr7:117548628	Pathogenic
Hereditary Breast & Ovarian Cancer	BRCA1	c.5266dupC (p.Gln1756Profs*74)	chr17:43093141	Pathogenic

Rare Genetic Diseases

Disease	Gene	Variant (HGVS)	Genomic Position (GRCh38)	Clinical Significance
Wilson Disease	ATP7B	c.3207C>A (p.His1069Gln)	chr13:52533870	Pathogenic
Gaucher Disease Type 1	GBA	c.1226A>G (p.Asn409Ser)	chr1:155235252	Pathogenic
Phenylketonuria	PAH	c.1222C>T (p.Arg408Trp)	chr12:102917337	Pathogenic

Variants were selected from ClinVar based on:

Pathogenic or Likely Pathogenic classification
Strong review status (≥2 stars when available)
Multiple independent submissions

Methodology

1. Variant Selection and Clinical Evidence (ClinVar)

For each disorder:

The disease was searched in ClinVar.
Variants classified as Pathogenic or Likely Pathogenic were filtered.
Review status and supporting studies were examined.
Gene name, HGVS nomenclature, genomic coordinates, and clinical assertions were extracted.
Evidence summaries were written in the Excel file.

2. Phenotype and Inheritance Analysis (OMIM)

For each gene:

The OMIM entry was reviewed.
Disease phenotype descriptions were extracted.
Major clinical features were summarized.
Mode of inheritance (Autosomal Dominant, Autosomal Recessive, etc.) was documented in the Excel file.

3. Pathogenicity Scoring (UCSC Genome Browser)

Each variant position was analyzed using the UCSC Genome Browser with the following tracks enabled:

AlphaMissense
REVEL

The following were recorded:

AlphaMissense classification
AlphaMissense score
REVEL score

Screenshots were captured and stored in the /Screenshots directory in the repository.

4. ACMG/AMP Variant Interpretation

Each variant was evaluated according to ACMG/AMP guidelines.

Criteria assessed included:

PVS1 (null variant in gene with established loss-of-function mechanism)
PS1 (same amino acid change as known pathogenic variant)
PM2 (absence or rarity in population databases)
PP3 (computational evidence supporting deleterious effect)

A final ACMG classification was assigned with justification in the Excel file.

5. VCF File Creation

A multi-variant VCF file was manually constructed using GRCh38 coordinates for all 6 variants across both common and rare disease categories.

The VCF file includes:

VCF v4.2 header with contig definitions
Chromosome (CHROM)
Position (POS)
Reference allele (REF)
Alternate allele (ALT)
Quality (QUAL)
Filter status (FILTER)
INFO annotations including GENE, PROT, CONSEQUENCE, CLNSIG, CLNDN, CLNREVSTAT, ALPHAMISSENSE, REVEL, ACMG criteria, and TRANSCRIPT

The file was compressed and indexed using bgzip and tabix in WSL. The tabix indexing step was used as a workaround to resolve contig header parsing issues encountered during sorting:

# Compress and index VCF
bgzip patient_variants.vcf
tabix -p vcf patient_variants.vcf.gz

# Sort and index
bcftools sort patient_variants.vcf.gz -Oz -o patient_variants_sorted.vcf.gz
bcftools index -t patient_variants_sorted.vcf.gz

ClinVar Annotation

Variants were annotated using the official GRCh38 ClinVar VCF dataset downloaded from NCBI ClinVar. A vcfanno configuration file (clinvar_config.toml) was created to extract the following ClinVar INFO fields and add them to the variant file: CLNSIG, CLNDN, CLNREVSTAT, CLNDISDB, and CLNHGVS.

The config file is available in the /Data directory in the repository.

Workflow (done in WSL):

# Install required tools
sudo apt update
sudo apt install bcftools tabix wget -y

# Download vcfanno
wget https://github.com/brentp/vcfanno/releases/download/v0.3.5/vcfanno_linux64
chmod +x vcfanno_linux64
sudo mv vcfanno_linux64 /usr/local/bin/vcfanno

# Download ClinVar GRCh38 dataset
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

# Compress and index VCF (tabix used to bypass contig header issue)
bgzip patient_variants.vcf
tabix -p vcf patient_variants.vcf.gz

# Sort and index
bcftools sort patient_variants.vcf.gz -Oz -o patient_variants_sorted.vcf.gz
bcftools index -t patient_variants_sorted.vcf.gz

# Annotate using vcfanno
vcfanno clinvar_config.toml patient_variants_sorted.vcf.gz > patient_variants_annotated.vcf

# Export summary as TSV
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/GENE\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/CLNREVSTAT\n' \
  patient_variants_annotated.vcf -o annotated_summary.tsv

The annotated,original and sorted VCF files are available in the /Data directory in this repository.

Assignment Submitted by:

Washma Sajjad
Ashna Abrar
Nawal Babar
Manaal Tufail

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
Data		Data
Excel File		Excel File
Screenshots		Screenshots
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Clinical Variant Pathogenicity Analysis (GRCh38)

Assignment Overview

Selected Variants

Common Genetic Diseases

Rare Genetic Diseases

Methodology

1. Variant Selection and Clinical Evidence (ClinVar)

2. Phenotype and Inheritance Analysis (OMIM)

3. Pathogenicity Scoring (UCSC Genome Browser)

4. ACMG/AMP Variant Interpretation

5. VCF File Creation

ClinVar Annotation

Assignment Submitted by:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Clinical Variant Pathogenicity Analysis (GRCh38)

Assignment Overview

Selected Variants

Common Genetic Diseases

Rare Genetic Diseases

Methodology

1. Variant Selection and Clinical Evidence (ClinVar)

2. Phenotype and Inheritance Analysis (OMIM)

3. Pathogenicity Scoring (UCSC Genome Browser)

4. ACMG/AMP Variant Interpretation

5. VCF File Creation

ClinVar Annotation

Assignment Submitted by:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages