PPI network analysis for a list of specific human genes of interest

AAbasinejad/PPI-Network-Analysis

PPI Network Analysis

Introduction


Protein-Protein Interaction (PPI) network analysis for a list of specific human genes of interest.

Data and Code


This project carries out a PPI network analysis for a set of human seed genes using two sources: the Human Integrated Interactions Database (a.k.a. IID), an online database of detected and predicted protein-protein interactions (PPIs), and BioGRID, an interaction repository with data compiled through comprehensive curation efforts.

The BioGRID and IID datasets mentioned above are exactly the ones used in this project, but you can find other formats of the same BioGRID dataset here, or other PPI network databases provided by IID here.

Furthermore, all essential data that the databases above did not provide (or provided only partially) was fetched (Basic_info.py) from the National Center for Biotechnology Information (NCBI) website and also confirmed against the HUGO Gene Nomenclature Committee (a.k.a. HGNC) website.

To run the code, put all the required files in the same directory and run this command in a terminal:

python main.py <seed_genes.txt> <BIOGRID-ORGANISM-Homo_sapiens dataset> <IID dataset>

Note: The required files are main.py, Basic_info.py, Interactions.py, Network_Analysis.py, plus the two datasets mentioned above.
Note: This project was written in Python 3.x, so ... .
P.S.: Running this code during a lunch break is strongly recommended. :D

Specific Libraries (that you may need to install :)

import requests
import pandas as pd
from bs4 import BeautifulSoup
import networkx as nx
import markov_clustering
import community
from scipy.stats import hypergeom
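Since a full run takes a while, it can help to confirm that the imports above resolve before starting. The sketch below is not part of the repository: `missing_packages` is a hypothetical helper, and the mapping from import names to PyPI package names (in particular `bs4` to `beautifulsoup4` and `community` to `python-louvain`) is an assumption that should be checked against your own environment.

```python
import importlib.util

# Import name -> PyPI package name.
# ASSUMPTION: this mapping is not stated in the README; verify it before installing.
REQUIRED = {
    "requests": "requests",
    "pandas": "pandas",
    "bs4": "beautifulsoup4",
    "networkx": "networkx",
    "markov_clustering": "markov_clustering",
    "community": "python-louvain",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of the modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing libraries, try: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```

The check only looks up top-level module names, so it does not confirm that an installed package is the expected one or a compatible version.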

Modules


TBD soon...
A brief explanation of each module (both technical and conceptual) will be here ASAP. ;)

Terms


For a better understanding of this project, some terms and abbreviations are defined as follows:

PPI: Protein-Protein Interaction
Uniprot AC: the UniProt accession number of each gene (a.k.a. UniProt entry, e.g. P01344, P15502, etc.).
GeneSymbol: Scientific symbol of each gene (e.g. IGF2, ELN, PTPRC, etc.).
Note: In general GeneSymbol is more important for both practical and scientific purposes since it's more understandable.
SG: in the code, it stands for Seed Gene (in fact, it refers to the seed_genes list).
SGI: Seed Gene Interactions, i.e. interactions that involve seed genes only (from both DBs).
Union_Interactions: all interactions that involve at least one seed gene (from both DBs).
Intersection_Interactions: all interactions that involve at least one seed gene and are confirmed by both DBs.
Note: In both Union and Intersection interactions, the interactions between interactors that interact directly with a seed gene have also been considered.
Note: In the code, variables marked with sgi, u and I refer to SGI, Union_Interactions and Intersection_Interactions respectively.
p-value: a measure of under- or over-enrichment based on the cumulative distribution function (CDF) of the hypergeometric distribution.
Note: To compute a p-value (in general), you can use the Hypergeometric p-value calculator or hypergeom from scipy.stats in Python.
Note: In this project we often use the terms ‘gene’ and ‘protein’ as synonyms, even though they are not synonyms from a purely biological point of view.
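The interaction sets and the p-value defined above can be illustrated with a small sketch. It is not the code in Interactions.py or Network_Analysis.py. The function names are hypothetical, the partner proteins X1 and X2 are placeholders, and the edge lists are toy data rather than BioGRID or IID records. It reads "from both DBs" as "found in either DB", which is an assumption, and it leaves out the extra interactor-to-interactor edges mentioned in the note above. The p-value is written with the standard library so that the formula is visible; `hypergeom.sf(k - 1, M, n, N)` and `hypergeom.cdf(k, M, n, N)` from scipy.stats should give the same over- and under-enrichment values.

```python
from math import comb


def undirected(edges):
    """Store each (a, b) pair as a frozenset so that (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in edges}


def interaction_sets(seed_genes, biogrid_edges, iid_edges):
    """Return (SGI, Union_Interactions, Intersection_Interactions)."""
    seeds = set(seed_genes)
    biogrid, iid = undirected(biogrid_edges), undirected(iid_edges)
    either_db = biogrid | iid   # reported by at least one database
    both_dbs = biogrid & iid    # confirmed by both databases
    sgi = {e for e in either_db if e <= seeds}         # seed genes only
    union = {e for e in either_db if e & seeds}        # at least one seed gene
    intersection = {e for e in both_dbs if e & seeds}  # at least one seed, both DBs
    return sgi, union, intersection


def hypergeom_pmf(k, M, n, N):
    """P(X = k): k annotated genes in a sample of N, drawn from M genes of which n are annotated."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def enrichment_pvalues(k, M, n, N):
    """Return (over, under) = (P(X >= k), P(X <= k)) under the hypergeometric null."""
    lo, hi = max(0, N - (M - n)), min(n, N)  # support of the distribution
    over = sum(hypergeom_pmf(i, M, n, N) for i in range(max(k, lo), hi + 1))
    under = sum(hypergeom_pmf(i, M, n, N) for i in range(lo, min(k, hi) + 1))
    return over, under


# Toy data: seed symbols taken from the examples above, X1/X2 are placeholders.
seeds = ["IGF2", "ELN", "PTPRC"]
biogrid = [("IGF2", "ELN"), ("IGF2", "X1"), ("X1", "X2")]
iid = [("ELN", "IGF2"), ("PTPRC", "X1"), ("X1", "IGF2")]

sgi, union, intersection = interaction_sets(seeds, biogrid, iid)
print(len(sgi), len(union), len(intersection))

# 20 genes in total, 7 annotated with a term, 12 in our list, 6 of those annotated.
print(enrichment_pvalues(6, 20, 7, 12))
```

Storing edges as frozensets makes the two databases comparable even when they list the same pair in a different order, which is what the intersection relies on.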

Putative disease proteins using the DIAMOnD tool


Using the DIAMOnD tool, compute the putative disease protein list, using as the reference interactome (“network_file”) the latest BioGRID interactome already used to collect PPIs. The DIAMOnD.py file in this repository is compatible with Python 3.x, while a version compatible with Python 2.x is available on the DIAMOnD source page.
To find out how to use it, check the source page as well. ;)

Results and discussion


TBD ASAP... ;)
