Preparing STRING Files

weiju edited this page Mar 15, 2012 · 2 revisions

STRING is a database of known and predicted protein-protein interactions. cMonkey uses these interactions to improve its clustering results: it builds a network from the interactions among the genes of the current organism, and this network is then used in cMonkey's network scoring component.

STRING is an enormous database, so it is a good idea to prepare the input so that it contains only the data that cMonkey's network scoring algorithm needs, namely the names of the two interacting genes and the score of their interaction.

The prepared STRING files are tab-delimited and contain entries of the form:

Gene1<TAB>Gene2<TAB>Normalized Score
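As an illustration of the format above, here is a minimal sketch of how such a file could be read into a network structure. It is not cMonkey's own loader: the function name `read_string_links`, the sample gene names, and the choice to treat each interaction as undirected are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def read_string_links(handle):
    """Parse Gene1<TAB>Gene2<TAB>score lines into an adjacency map.

    Hypothetical helper, not part of cmonkey-python. Each interaction is
    stored in both directions, which assumes the network is undirected.
    """
    network = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        gene1, gene2, score = row[0], row[1], float(row[2])
        network[gene1][gene2] = score
        network[gene2][gene1] = score
    return dict(network)


# Made-up sample data in the three-column format described above.
sample = io.StringIO(
    "VNG0001H\tVNG0002G\t0.85\n"
    "VNG0002G\tVNG0003C\t0.4\n"
)
network = read_string_links(sample)
print(network["VNG0002G"])
```

Looking up a gene in the resulting dictionary returns its interaction partners together with their scores, which is the only information the three-column format carries.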

cmonkey-python provides the utility extract_string_links.sh, which writes the interactions for a single organism, given the organism's KEGG code and a STRING database file (either gzip'ed or plain).
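If you read such a file from your own Python code, the same "gzip'ed or plain" flexibility can be had with the standard library. The sketch below is an assumption-laden illustration and does not show how extract_string_links.sh itself works: `open_maybe_gzip` is a hypothetical helper, and the file names and contents are made up.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open a file as text, whether or not it is gzip-compressed.

    Hypothetical helper: detects gzip by its two-byte magic number
    rather than by the file extension.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Demonstration with made-up content written once plain and once gzip'ed.
content = "GeneA\tGeneB\t0.5\n"
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "links.tab")
    gz_path = os.path.join(tmp, "links.tab.gz")
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write(content)
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write(content)

    with open_maybe_gzip(plain_path) as f:
        plain_text = f.read()
    with open_maybe_gzip(gz_path) as f:
        gz_text = f.read()
```

Checking the magic number instead of the `.gz` suffix keeps the helper working when a compressed file has been renamed.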
