IDG Interactome PPI Analysis Pipeline

The Interactome PPI Analysis Pipeline uses two different protein interaction identifying software – SAINT and CompPASS – to form a consensus prediction on protein-protein interactions. The goal of this pipeline is to better identify protein-protein interactions and filter through contaminants in Affinity Purification Mass Spectrometry (AP/MS) data. It currently adds GO and BioGrid annotations.

#Installation

MaxQuant
Python 3.9
R 4.0.3
Copy precompiled binary file for Saint Express from “external” folder to “bin” folder
Install cRomppass branch from the smarasolo github repository using:

library("devtools")
devtools::install_github("smarasolo/cRomppass")
library("cRomppass")

Install org.Hs.eg.db through the BiocManger package

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("org.Hs.eg.db")

#Input Files experimentalDesign.csv

5 columns
- Experiment name – the protein name and run number of the experiment
  - ** Has to match experiment name that was used in MaxQuant because it is matched with the MS/MS count column from MaxQuant for further analysis **
  - You can make sure the experiment name is the same by checking the experiment_name MS/MS count column in the MaxQuant results
Type – T, for test/experiments, or C, for controls
Bait – name of bait
Replicate – replicate number
Bait ID – Uniprot ID for Bait
- Exactly matches the identifier in the MaxQuant results so if a bait matches two Protein IDS, it is recorded in the Bait Id

proteinGroups.txt

#Usage Basic usage:

Open up a terminal (powershell, cmd, or cmder)
Navigate to the directory with your proteinGroups.txt and experimentDesign.csv file

>cd "C:\Users\majorlab.3225-WD-00001\Box\CellBio-MajorLab\Users\Dhaval\MS data searches\IDG\MiniTurbo-TurboID\2019022"

Execute the program. By default it will look for a proteinGroups.txt file and ExperimentalDesign.csv file. If you changed the names, you can specifiy their path when calling the program:

>score_APMS.exe --experimentalDesign ED_MA.csv

Full usage options:

>score_APMS.exe --help
usage: score_APMS.exe [-h] [--proteinGroups PROTEINGROUPS] [--experimentalDesign EXPERIMENTALDESIGN] [--outputPath OUTPUTPATH] [-qs {spc,LFQ,intensity}]
                      [-qc {spc,LFQ,intensity}] [-s {v2,express,q}] [--nburnin NBURNIN] [--niter NITER] [--lowMode {0,1}] [--minFold {0,1}] [--normalize {0,1}]
                      [--normalize-control {0,1}] [--compress-n-ctrl COMPRESS_N_CTRL] [--compress-n-rep COMPRESS_N_REP]

This is the entry point to the program. It will execute the requested tasks.

optional arguments:
  -h, --help            show this help message and exit
  --proteinGroups PROTEINGROUPS
                        path to MaxQuant ProteinGroups.txt
  --experimentalDesign EXPERIMENTALDESIGN
                        path to experimental design file
  --outputPath OUTPUTPATH
                        path for the output directory. If it already exists it will be overwritten.
  -qs {spc,LFQ,intensity}, --quantification-saint {spc,LFQ,intensity}
                        quantification values to use for SAINT
  -qc {spc,LFQ,intensity}, --quantification-comppass {spc,LFQ,intensity}
                        quantification values to use for CompPASS
  -s {v2,express,q}, --SAINT {v2,express,q}
                        SAINT version to use
  --nburnin NBURNIN     SAINT v2: number of burn-in iterations in MCMC. Default: 2000
  --niter NITER         SAINT v2: number of main iterations in MCMC. Default: 10000
  --lowMode {0,1}       SAINT v2: exclude extremely high counts in the model. - If baits are densely connected or dataset is small (few baits), use 1. - otherwise, use 0.
                        Default: 0
  --minFold {0,1}       SAINT v2: forcing separation between true and false distributions. - If user wishes to allow typical contaminants with significant differential
                        enrichment over control purifications, use 0. - otherwise, use 1. Default: 0
  --normalize {0,1}     SAINT v2: divide the counts by the total spectral counts in each IP. Default: 1
  --normalize-control {0,1}
                        SAINTq: normalize control intensities by multiplying a constant to all control intensities so that the average observed test intensities is equal
                        to the average control intensities. Default: 0
  --compress-n-ctrl COMPRESS_N_CTRL
                        SAINTq: the number of control baits used in calculations, with priority for baits with greatest intensities. Setting this number to a large number
                        makes the program use all available control data (recommend in cases with at most several controls. Default: 1000
  --compress-n-rep COMPRESS_N_REP
                        SAINTq: the number of test bait replicates used for scoring, with priotiy given to the baits with higher probability scores. If this number is
                        greater than or equal to the number of available replicates, then the scores will use the data from all replicates. Otherwise, the highest scoring
                        replicate scores wil lbe averaged to yield the final probability score. Default: 1000

#Output files Saint Input:

bait.txt – list of baits
interaction.txt – list of interactions
prey.txt – list of prey

Saint Output:

list.txt
- Output columns in list:
  - Spec
  - SpecSum
  - AvgSpec
  - NumReplicates
  - ctrlCounts – the number of spectra in the pry across all controls
  - AvgP – the average probability score for a bait and prey pair
  - MaxP – the largest probability score for a bait and prey pair across all replicate purifications
  - TopoAvgP
  - SaintScore - the probability the bait/prey interaction is a true interaction
  - logOddsScore
  - FoldChange
  - BFDR – A Bayesian false discovery rate SAINT calculates from a combined probability score from independent scoring of each replicate
  - Boosted_by

CompPASS Input:

to_CompPASS.csv – formats experimentalDesign to fit input needed for compPASS

CompPASS Output:

compPASS.csv – runs compPASS on data from MaxQuant
- Columns in compPASS
  - AvePSM
  - SumAPSM
  - Mean
  - SD
  - Little.N – Number of baits(including control) that had at least 1 spectral count
  - Little.P – Number of replicates for this particular bait that had at least 1 spectral count
  - Z - Z-score for each bait and prey pair
  - WD – WD-Score for each bait and prey pair as defined by Sowa et al.
  - Entropy – The Shannon Entropy for each spectral count

Final Output:

Merge_CompPASS_Saint.csv – Merges the compPASS and Saint outputs
Annotated_Merge_NO_FILTER.csv – Annotates the merged output with GO and Biogrid annotations, but no filter for Saint or CompPASS
Annotated_Merge_Saint_filter.csv – Takes the merged and annotated file from before and filters Saint for BFDR <= 0.05 and AvgP >= 0.7
Annotated_Merge_All_filtered.csv – Takes the merged, annotated, and filtered for Saint file and filters compPASS by taking the top 5% scored of each bait or the top 10 of each bait, if there are more than 10 in the top 5%
[bait name].csv – Every bait will have one of these files that includes the Entrez ID of any prey-prey interaction seen in that bait

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.idea		.idea
external		external
.gitignore		.gitignore
Cytoscape.R		Cytoscape.R
Merge_CompPASS_SAINT.R		Merge_CompPASS_SAINT.R
README.md		README.md
annotate.R		annotate.R
annotate_filter.R		annotate_filter.R
compPASS.R		compPASS.R
experimental_design.py		experimental_design.py
protein_groups.py		protein_groups.py
score_APMS.py		score_APMS.py
tab_to_JSON.py		tab_to_JSON.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IDG Interactome PPI Analysis Pipeline

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

IDG Interactome PPI Analysis Pipeline

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages