Preparing STRING Files
STRING is a database of known and predicted protein-protein interactions. cMonkey uses these interactions to improve its clustering results: it builds a network from the interactions among the genes of the current organism, and this network is then used in cMonkey's network scoring component.
Because STRING is an enormous database, it is a good idea to prepare the input so that it contains only the data cMonkey's network scoring algorithm needs, namely the names of the two interacting genes and their interaction score.
The prepared STRING files are tab-delimited, with one interaction per line in the form:
Gene1<TAB>Gene2<TAB>Normalized Score
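To make the format concrete, here is a minimal Python sketch that reads such a three-column file and builds a simple gene network from it. It is an illustration only, not cMonkey's own loader: the function names `read_string_edges` and `build_network` are hypothetical, the gene names in the sample are made up, and treating each interaction as symmetric is an assumption of this sketch.

```python
import csv
import io
from collections import defaultdict


def read_string_edges(handle):
    """Parse 'Gene1<TAB>Gene2<TAB>Score' lines into (gene1, gene2, score) tuples."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        gene1, gene2, score = row
        try:
            edges.append((gene1, gene2, float(score)))
        except ValueError:
            continue  # skip lines whose score is not numeric, e.g. a header
    return edges


def build_network(edges):
    """Map each gene to its neighbors and scores.

    Assumption of this sketch: interactions are stored in both directions.
    """
    network = defaultdict(dict)
    for gene1, gene2, score in edges:
        network[gene1][gene2] = score
        network[gene2][gene1] = score
    return network


# Hypothetical sample data in the format described above.
sample = io.StringIO("geneA\tgeneB\t0.85\ngeneB\tgeneC\t0.4\n")
network = build_network(read_string_edges(sample))
print(dict(network["geneB"]))
```

Keeping only these three columns means the whole file for one organism can usually be held in memory as a plain dictionary.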
cmonkey-python provides the utility extract_string_links.sh, which writes the interactions for a single organism, given its KEGG code and a STRING database file (either gzip-compressed or plain).
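The idea of reading a database file that may or may not be compressed, and keeping only one organism's lines, can be sketched in Python as follows. This is not the implementation of extract_string_links.sh, whose arguments and internals are not described here. The helper names `open_maybe_gzip` and `filter_links` are hypothetical, and the sketch assumes each input line has three whitespace-separated fields whose identifiers begin with an organism-specific prefix.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file for reading, decompressing it if it is gzip-compressed."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def filter_links(path, prefix):
    """Yield (id1, id2, score) for lines where both identifiers start with prefix.

    Assumption of this sketch: three whitespace-separated fields per line.
    """
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            id1, id2, score = fields
            if id1.startswith(prefix) and id2.startswith(prefix):
                yield id1, id2, score
```

Checking the first two bytes instead of the file extension lets the same call handle both kinds of input, even when a compressed file lacks a `.gz` suffix.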