Decoding Functional Specialization in GPCRs through Evolution-Guided Residue Profiling - Supplementary Information
This directory contains supplementary information for the manuscript "Decoding Functional Specialization in GPCRs through Evolution-Guided Residue Profiling".
Supplementary Information/
├── README.md # This file
├── Supplementary_Table.xlsx # Comprehensive data table with all annotations
├── Figures/ # Manuscript figures
│ ├── Figure1.png # Automated ortholog identification workflow
│ ├── Figure2.png # Class A, olfactory, and Class T analysis
│ ├── Figure3.png # Class B1 and B2 analysis
│ ├── Figure4.png # Class F analysis
│ ├── Figure5.png # Class C analysis
│ └── Figure6.png # Subtype-specific selective residues in ligand and transducer selectivity
└── code_and_data/ # Computational analysis files
    ├── benchmark/ # Interactive benchmark visualizations
    ├── ortholog_pipeline/ # Ortholog identification pipeline
    ├── CRs_and_SRs_on_structure/ # Structural visualizations
    └── scatter_plots_and_CR_SR_calculation/ # Scatter plot generation
Supplementary_Table.xlsx contains comprehensive data from the GPCR conservation analysis, including:
- Scatter Plot Information: Conservation percentages and entropy values for all residues
- Initial Annotations: Detailed functional annotations assigned during analysis
- Simplified Annotations: Streamlined functional categories for visualization
- CR/SR Information: Classification of residues as Common (CR) or Selective (SR)
- Class-Specific Data: Separate sheets for each GPCR class (A, B1, B2, C, F, T, Olfactory)
- Ligand: Residues involved in ligand binding
- Transducer: Residues involved in G protein/transducer binding
- Known Motifs: Conserved motifs (CWxP, NPxxY, PIF, DRY, Na+ Pocket)
- Disulfide Bridge: Cysteine residues forming disulfide bonds
- Cholesterol: Cholesterol binding site residues
- ICL2: Intracellular loop 2 residues (Class T specific)
- Tethered Agonist: Residues involved in tethered agonist binding (Class B2)
- VFT Ligand: Venus Flytrap domain ligand binding residues (Class C)
- Allosteric Modulator: Residues involved in allosteric modulation
- WNT Binding: Residues involved in WNT protein binding (Class F)
- Other: Residues without specific functional annotation
Each GPCR class has its own worksheet containing:
- Residue numbers (using class-specific reference proteins)
- GPCRdb numbers for the given reference receptor
- Family-wide conservation percentages across family members
- Family-wide entropy values for sequence variability
- Initial detailed annotations
- Simplified functional categories
- CR/SR classification with thresholds
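The per-residue conservation and entropy columns can be illustrated with a minimal sketch. It assumes conservation is the frequency of the most common residue in an alignment column, that entropy is Shannon entropy in bits, and that gaps are ignored; the manuscript's scripts may define these differently. The function names `column_stats` and `alignment_stats` are hypothetical.

```python
import math
from collections import Counter


def column_stats(column, gap_chars="-."):
    """Return (conservation_percent, entropy_bits) for one alignment column.

    Assumptions (not taken from the manuscript): gaps are ignored,
    conservation is the share of the most common residue, and entropy
    is Shannon entropy with log base 2.
    """
    residues = [aa.upper() for aa in column if aa not in gap_chars]
    if not residues:
        return 0.0, 0.0
    counts = Counter(residues)
    total = len(residues)
    conservation = 100.0 * counts.most_common(1)[0][1] / total
    # p * log2(1/p) summed over observed residues
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return conservation, entropy


def alignment_stats(sequences):
    """Apply column_stats to every column of equal-length aligned sequences."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_stats(column) for column in zip(*sequences)]


if __name__ == "__main__":
    toy_alignment = ["DRY", "DRY", "ERW", "D-Y"]
    for position, (cons, ent) in enumerate(alignment_stats(toy_alignment), start=1):
        print(f"column {position}: conservation={cons:.1f}% entropy={ent:.3f} bits")
```

A fully conserved column gives 100% conservation and zero entropy; higher entropy indicates more sequence variability at that position.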
The Figures/ directory contains all manuscript figures:
- Figure 1: Automated ortholog identification and conservation analysis workflow
- Figure 2: Family-wide conservation and structural mapping for Class A, olfactory, and Class T receptors
- Figure 3: Conservation and structural organization for Class B1 and adhesion (B2) GPCRs
- Figure 4: Evolutionary conservation patterns in Class F GPCRs across distinct domains
- Figure 5: Evolutionary conservation patterns in Class C GPCRs across different domains
- Figure 6: Subtype-specific selective residues in ligand and transducer selectivity
The code_and_data/ directory contains all computational analysis files:
benchmark/:
- Interactive HTML visualizations for benchmarking evolutionary predictions against experimental data
- Four visualization files:
  - FigS1_DMS_benchmark.html - Deep mutational scanning data (ADRB2, MC4R, GPR68, V2R)
  - FigS2_correlation_plots.html - Correlation analyses across experimental datasets
  - FigS3_evolutiıon_benchmark.html - Evolution-based functional predictions (OPSD, Sanders)
  - Fig2cde.html - Combined view (ClinVar, mutational intolerance, activation network)
- Consolidated mapping: A single unified residue mapping file that maps Class A receptors to HRH2
- Data folder: Experimental datasets including DMS studies, clinical variants, and evolutionary data
- Analysis scripts: Python scripts for calculating combined mutational intolerance scores
- See benchmark/README.md for detailed documentation
ortholog_pipeline/:
- Complete pipeline for identifying GPCR orthologs
- Python scripts for sequence processing and phylogenetic analysis
- Human protein sequences organized by GPCR class
- SLURM scripts for cluster execution
CRs_and_SRs_on_structure/:
- Structural visualizations of CR/SR mapping
- PyMOL session files for interactive 3D analysis
- AlphaFold2 models and experimental structures
- PNG images used as manuscript figures
scatter_plots_and_CR_SR_calculation/:
- Scripts for generating scatter plots
- CR/SR calculation algorithms
- Label files with simplified annotations
- Special analysis for Class F GPCRs
- Class alignments: Multiple sequence alignments used for receptor mapping and analysis
- Open Supplementary_Table.xlsx for comprehensive analysis data
- Navigate to specific class worksheets for detailed information
- Use conservation and entropy values for further analysis
- Open PNG files in any image viewer
- Figure numbering matches the manuscript
- Navigate to code_and_data/benchmark/
- Open any HTML file in a modern web browser (Chrome, Firefox, Edge)
- Explore interactive plots:
  - FigS1: Deep mutational scanning benchmarks (10 panels)
  - FigS2: Correlation analyses across datasets (20 panels)
  - FigS3: Evolution-based predictions (2 panels)
  - Fig2cde: Combined horizontal view (3 panels)
- Features: hover for details, click legend to toggle, pan and zoom
- All data loads locally - no internet connection required
- Navigate to code_and_data/ for computational scripts
- Follow README files in each subdirectory for specific instructions
- Use provided SLURM scripts for cluster execution
- Run benchmark/data/calculate_combined_significance.py to calculate combined mutational intolerance scores
The analysis uses representative sequences ("reps") and multiple sequence alignments to determine receptor sets and mapping:
- CD-HIT clustering identifies representative sequences from ortholog sets
- Top 5 largest clusters are selected as representatives
- Representatives provide evolutionary diversity while reducing computational complexity
- Initial Alignment: Representative sequences + human sequences aligned using MAFFT
- Human Extraction: Human sequences extracted from the alignment
- Final Analysis: Family-wide analysis performed on human-only alignments
- Full alignments: Complete receptor sequences with representatives
- Human-only alignments: Extracted human sequences for family-wide analysis
- Domain-specific alignments: Specific regions (VFT domain, CRD domain, etc.)
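The representative selection step described above (CD-HIT clustering followed by picking the five largest clusters) can be sketched as follows. This is an illustration, not the pipeline's actual code: it parses the standard CD-HIT `.clstr` text format and assumes that "representative" means the sequence CD-HIT marks with `*` in each cluster. The function names `parse_clstr` and `top_representatives` are hypothetical.

```python
import re


def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into a list of cluster dictionaries."""
    clusters = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = {"members": [], "representative": None}
            clusters.append(current)
            continue
        # Member lines look like: "0\t350aa, >seqA... *"
        match = re.search(r">(.+?)\.\.\.", line)
        if match is None or current is None:
            continue
        seq_id = match.group(1)
        current["members"].append(seq_id)
        if line.endswith("*"):  # CD-HIT marks the cluster representative
            current["representative"] = seq_id
    return clusters


def top_representatives(clusters, n=5):
    """Return representatives of the n largest clusters (ties keep file order)."""
    ranked = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
    return [c["representative"] for c in ranked[:n] if c["representative"]]


if __name__ == "__main__":
    example = [
        ">Cluster 0",
        "0\t350aa, >seqA... *",
        "1\t349aa, >seqB... at 98.00%",
        ">Cluster 1",
        "0\t351aa, >seqC... *",
    ]
    print(top_representatives(parse_clstr(example)))
```

Limiting the selection to a small number of large clusters keeps the downstream alignment tractable while still sampling divergent orthologs.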
Different alignments produce different outcomes:
- Receptor Set: Different representative sequences change receptor composition
- Mapping: Alignment positions determine residue numbering and conservation calculations
- Analysis Scope: Domain-specific alignments focus on particular functional regions
The class_alignments/ directory contains:
- Standard alignments with representatives
- Human-only extracted alignments
- Domain-specific alignments for particular regions
- Scripts for extracting human sequences from representative alignments
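For orientation, extracting human sequences from a representative alignment might look like the sketch below. It is not the repository's script. It assumes the alignment is in FASTA format and that human entries can be recognized by a `_HUMAN` suffix in the header (UniProt-style entry names); both are assumptions. Dropping columns that become gap-only after extraction is shown as an optional step, since whether the original scripts do this is not stated here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def extract_human(records, marker="_HUMAN"):
    """Keep records whose header contains the marker (assumed naming scheme)."""
    return [(header, seq) for header, seq in records if marker in header]


def drop_gap_only_columns(records, gap="-"):
    """Optionally remove columns that contain only gaps in the given records."""
    if not records:
        return []
    columns = zip(*(seq for _, seq in records))
    keep = [i for i, col in enumerate(columns) if any(c != gap for c in col)]
    return [(header, "".join(seq[i] for i in keep)) for header, seq in records]


if __name__ == "__main__":
    alignment = (
        ">sp|P25021|HRH2_HUMAN\nMA-PN\n"
        ">rep_cluster1\nMAGPN\n"
        ">sp|P07550|ADRB2_HUMAN\nMG-PG\n"
    )
    human = drop_gap_only_columns(extract_human(read_fasta(alignment)))
    for header, seq in human:
        print(f">{header}\n{seq}")
```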
All data and code are available at: https://github.com/CompGenomeLab/GPCR_Family_Divergence
For questions or issues with the data or code, please contact:
- Berkay Selcuk: selcuk.1@buckeyemail.osu.edu
- Ogün Adebali: oadebali@sabanciuniv.edu