💭
All problems in Computer Science can be solved with another level of indirection


Hi there, I’m Shraddha Piparia — Computational Biologist & ML Scientist 👋

Computational biologist and machine learning scientist working at the intersection of genomics, proteomics, single-cell biology, and reproducible computational pipelines.

I have a Ph.D. in Computer Science and currently work in computational biology, building ML methods and scalable workflows for disease heterogeneity, patient stratification, and treatment response.

  • Genomics + representation learning
  • Single-cell and multi-omics analysis
  • UK Biobank and large-scale proteomics
  • Reproducible bioinformatics pipelines

🌐 Personal site


Start Here

If you only look at a few projects, these are the best entry points:

  1. Genotype representation learning – LD-aware VAE + transformer framework for discovering latent genomic structure and asthma-related biology.
  2. Single-cell benchmarking – reproducible single-cell workflows and perturbation-aware cell-state scoring.
  3. Large-scale proteomics – Olink NPX and UK Biobank pipelines for Long COVID and disease subtype discovery.
  4. End-to-end reproducible genomics pipelines – Nextflow, Docker, and configuration-driven workflows from raw sequencing data to analysis.

Featured Projects

LD-aware representation learning for genotype data using VAEs and transformers.

  • Recovered asthma-related genomic structure without phenotype labels
  • Identified strong HLA class II and PDE4D signals from latent embeddings
  • Demonstrated improved latent organization versus baseline approaches

Key result: recovered known asthma biology with embedding structure achieving ARI = 0.999 and revealing multiple independent HLA haplotype axes.
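An ARI like the one above measures how well a clustering of the embeddings agrees with a reference grouping. The sketch below shows how such a score is typically computed. It assumes k-means clustering and scikit-learn's `adjusted_rand_score`, and it uses hypothetical toy embeddings. The reference partition and clustering method actually used in the project are not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def embedding_ari(embeddings: np.ndarray, reference_labels: np.ndarray,
                  n_clusters: int, seed: int = 0) -> float:
    """Cluster embeddings with k-means (an assumed choice) and score agreement via ARI."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    predicted = km.fit_predict(embeddings)
    # ARI ignores label permutations: 1.0 = identical partitions, ~0 = chance agreement
    return adjusted_rand_score(reference_labels, predicted)


# Hypothetical toy data: two well-separated groups in a 4-D latent space
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(50, 4))
group_b = rng.normal(loc=5.0, scale=0.1, size=(50, 4))
embeddings = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)

print(round(embedding_ari(embeddings, labels, n_clusters=2), 3))
```

Because ARI is corrected for chance, it stays near 0 for random assignments. Raw pair-counting agreement does not, so ARI is a stricter summary of latent structure.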

Benchmarking perturbation-aware cell-state scoring methods in single-cell RNA-seq data.

  • Reproducible Scanpy workflow from preprocessing through scoring and visualization
  • Compared multiple scoring approaches and control gene sets
  • Includes interferon-stimulated PBMC analysis and communication scoring

Key result: built an interpretable benchmark where interferon-response programs achieved near-perfect separation while preserving biologically meaningful cell-state differences.
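Control-gene-based scoring, as in Scanpy's `sc.tl.score_genes`, rests on one idea. Each cell's score is the mean expression of the program genes minus the mean expression of a control gene set. The sketch below is a simplified NumPy version of that idea. It assumes a dense cells × genes matrix and a pre-chosen control set. Scanpy additionally bins genes by average expression to sample matched controls, and that step is omitted here. The gene indices are hypothetical.

```python
import numpy as np


def program_score(expr: np.ndarray, program_idx: list[int],
                  control_idx: list[int]) -> np.ndarray:
    """Per-cell score: mean of program genes minus mean of control genes.

    expr: cells x genes matrix of (log-normalized) expression.
    """
    program_mean = expr[:, program_idx].mean(axis=1)
    control_mean = expr[:, control_idx].mean(axis=1)
    return program_mean - control_mean


# Hypothetical toy matrix: 4 cells x 6 genes; genes 0-2 stand in for an interferon program
expr = np.array([
    [5.0, 4.0, 6.0, 1.0, 1.0, 1.0],  # stimulated-like cell
    [6.0, 5.0, 5.0, 1.0, 2.0, 0.0],  # stimulated-like cell
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # unstimulated-like cell
    [0.0, 1.0, 2.0, 1.0, 1.0, 1.0],  # unstimulated-like cell
])
scores = program_score(expr, program_idx=[0, 1, 2], control_idx=[3, 4, 5])
print(scores)
```

Subtracting the control mean removes cell-wide shifts, such as differences in sequencing depth, that would otherwise inflate every program score in that cell.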

Scalable Olink NPX proteomics analysis for pediatric Long COVID and UK Biobank replication.

  • Disease subgroup discovery using WHO-defined symptom profiles
  • Spark + SQL workflows for tens of thousands of participants
  • Regression, enrichment, volcano plots, and replication analyses

Key result: developed reproducible pipelines to compare neurocognitive and non-neurocognitive Long COVID subtypes across pediatric and UK Biobank cohorts.

Educational but production-style genomics workflow from FASTQ through variant calling, GWAS, PRS, and biological interpretation.

  • Covers QC, alignment, variant calling, annotation, PCA, GWAS, PRS, and eQTL analysis
  • Uses a centralized Conda environment and documented module-level workflows
  • Excludes large/generated outputs from version control while preserving reproducible commands

Key result: provides a clean, reproducible template for moving from raw sequencing data to interpretable disease-associated variants.
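At its core, a polygenic risk score is a weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes: PRS_i = Σ_j β_j · G_ij. The sketch below computes this with NumPy using hypothetical dosages and betas. Real tools such as PLINK's `--score` also handle allele alignment, missing genotypes, and scaling options. SNP selection (for example, clumping and thresholding) is a separate step. All of these are omitted here.

```python
import numpy as np


def polygenic_risk_score(dosages: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """Compute PRS_i = sum_j beta_j * G_ij.

    dosages: individuals x SNPs matrix of effect-allele counts (0, 1, 2).
    betas: per-SNP effect sizes (e.g. log odds ratios) for the same effect alleles.
    """
    if dosages.shape[1] != betas.shape[0]:
        raise ValueError("number of dosage columns must match number of effect sizes")
    return dosages @ betas


# Hypothetical example: 3 individuals, 4 SNPs
dosages = np.array([
    [0, 1, 2, 0],
    [2, 2, 0, 1],
    [1, 0, 1, 1],
], dtype=float)
betas = np.array([0.10, -0.05, 0.20, 0.30])
prs = polygenic_risk_score(dosages, betas)
print(prs)
```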

Analysis of miRNA signatures associated with inhaled corticosteroid response in asthma.

  • Differential expression, enrichment, and subgroup analyses
  • Focus on miR-584-5p and neurocognitive phenotype differences
  • Includes publication-ready plots and reproducible analysis scripts

Key result: identified candidate miRNA signatures associated with treatment response heterogeneity.
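Testing whether a miRNA modifies treatment response is commonly framed as an interaction model: response ~ miRNA + ICS + miRNA × ICS. A nonzero interaction coefficient means the miRNA's association with response differs between ICS-treated and untreated participants. The sketch below fits this model by least squares in NumPy on hypothetical, noise-free data. The project itself is described as R-based, so this Python version only illustrates the model form. The variable names are assumptions.

```python
import numpy as np


def fit_interaction_model(mirna: np.ndarray, ics: np.ndarray,
                          response: np.ndarray) -> dict[str, float]:
    """OLS fit of response ~ 1 + mirna + ics + mirna:ics.

    A nonzero 'mirna:ics' coefficient indicates effect modification.
    """
    X = np.column_stack([np.ones_like(mirna), mirna, ics, mirna * ics])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return dict(zip(["intercept", "mirna", "ics", "mirna:ics"], coef))


# Hypothetical data: the ICS benefit shrinks as miRNA expression rises
mirna = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
ics = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
response = 1.0 + 0.2 * mirna + 2.0 * ics - 0.6 * mirna * ics
coefs = fit_interaction_model(mirna, ics, response)
print({k: round(float(v), 3) for k, v in coefs.items()})
```

In a real cohort, the fit would also include covariates and noise, and inference would focus on the standard error of the interaction term.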


Selected Technical Themes

Machine Learning & Modeling

Python · PyTorch · Representation learning · VAEs · Transformers · SHAP · NLP

Computational Biology

Genomics · GWAS/PLINK · Polygenic Risk Scores · Single-cell RNA-seq · Multi-omics · UK Biobank · Scanpy · Seurat · Olink NPX proteomics · NGS pipelines · WNN integration

Scalable & Reproducible Infrastructure

Nextflow · Docker · Conda · SLURM · GitHub Actions · Spark SQL · UK Biobank RAP

Data & Visualization

NumPy · Pandas · scikit-learn · Matplotlib · Seaborn · SQL · PySpark


Research and Writing


What I Care About

I enjoy building computational biology projects that are both scientifically meaningful and technically reproducible: methods that can move from raw data to interpretable biological insight, with clear workflows, versioned environments, and reusable code.

Pinned

  1. COVID-Radiology-Study (public)

    Pediatric CXR radiology impression ML (NLP features + Random Forest + SHAP)

    Jupyter Notebook

  2. miRNA_ics_interaction (public)

    Scalable R-based workflow for identifying miRNA modifiers of inhaled corticosteroid response in asthma using interaction models, reproducible renv environments, and publication-ready visualizations.

    R

  3. proteomics_npx_analysis (public)

    Scalable Olink NPX proteomics workflow for identifying neurocognitive Long COVID signatures in pediatric and UK Biobank cohorts using logistic regression, PySpark, and protein interaction analysis.

    Python

  4. blockbased-genotype-embedding-analysis (public)

    This repository explores LD-aware genotype embeddings for asthma by learning compact representations of genomic blocks instead of individual SNPs. The resulting subject-level embeddings are used fo…

    Python

  5. sc-cell-state-benchmark (public)

    Benchmarking cell-state scoring methods in single-cell RNA-seq using biological perturbations, negative controls, and robustness tests to distinguish real signal from technical noise.

    Python

  6. from-fastq-to-asthma-gwas (public)

    From raw sequencing reads to asthma-focused GWAS: small, practical genomics projects demonstrating QC, alignment, GATK variant calling, annotation, PCA, PRS, eQTL follow-up, and reproducible Nextfl…

    Python