David
This page looks at the possibility of using DAVID instead of, or in addition to, IDecoder.
Text files can be downloaded from the Request Knowledgebase Files page. You have to log in to download the files.
You choose the species and a central identifier, and you can also check some annotation categories, up to a maximum of 10. I have been checking everything under General Annotations and Main_Accessions, which gives the following annotations:
- Chromosome
- Cytoband
- Entrez_gene_summary
- Homologous_gene
- Official_Gene_Symbol
- PIR_Summary
- SP_Comment
- ENSEMBL_GENE_ID
- ENTREZ_GENE_ID
After clicking Submit, you get an email that allows you to download a zip file containing several text files.
For example, when I chose "AFFYMETRIX_EXON_GENE_ID" as the central identifier, I received the following files:
- AFFYMETRIX_EXON_GENE_ID2CHROMOSOME.txt
- AFFYMETRIX_EXON_GENE_ID2CYTOBAND.txt
- AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_SUMMARY.txt
- AFFYMETRIX_EXON_GENE_ID2HOMOLOGOUS_GENE.txt
- AFFYMETRIX_EXON_GENE_ID2OFFICIAL_GENE_SYMBOL.txt
- AFFYMETRIX_EXON_GENE_ID2PIR_SUMMARY.txt
- AFFYMETRIX_EXON_GENE_ID2SP_COMMENT.txt
- AFFYMETRIX_EXON_GENE_ID2ENSEMBL_GENE_ID.txt
- AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_ID.txt
- AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt
- AFFYMETRIX_EXON_GENE_ID2TAX_ID.txt
Note that there are two extra files: 11 files were received for 9 requested annotations. The extras are the DAVID_GENE_NAME and TAX_ID files.
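The comparison between requested annotations and received files can be checked mechanically. The sketch below assumes the file names follow the pattern `<central identifier>2<ANNOTATION>.txt`, which is inferred only from the listing above; `annotation_of` is a hypothetical helper, not part of any DAVID tooling.

```python
CENTRAL_ID = "AFFYMETRIX_EXON_GENE_ID"

# Annotation categories that were checked on the request form.
requested = [
    "Chromosome", "Cytoband", "Entrez_gene_summary", "Homologous_gene",
    "Official_Gene_Symbol", "PIR_Summary", "SP_Comment",
    "ENSEMBL_GENE_ID", "ENTREZ_GENE_ID",
]

# File names found in the downloaded zip file.
received = [
    "AFFYMETRIX_EXON_GENE_ID2CHROMOSOME.txt",
    "AFFYMETRIX_EXON_GENE_ID2CYTOBAND.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_SUMMARY.txt",
    "AFFYMETRIX_EXON_GENE_ID2HOMOLOGOUS_GENE.txt",
    "AFFYMETRIX_EXON_GENE_ID2OFFICIAL_GENE_SYMBOL.txt",
    "AFFYMETRIX_EXON_GENE_ID2PIR_SUMMARY.txt",
    "AFFYMETRIX_EXON_GENE_ID2SP_COMMENT.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENSEMBL_GENE_ID.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_ID.txt",
    "AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt",
    "AFFYMETRIX_EXON_GENE_ID2TAX_ID.txt",
]


def annotation_of(filename, central_id=CENTRAL_ID):
    """Strip the assumed '<central_id>2' prefix and '.txt' suffix."""
    prefix = central_id + "2"
    suffix = ".txt"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"unexpected file name: {filename}")
    return filename[len(prefix):-len(suffix)]


# File names are upper case, so compare against upper-cased category names.
requested_upper = {name.upper() for name in requested}
extras = sorted(
    annotation_of(name)
    for name in received
    if annotation_of(name) not in requested_upper
)
print(extras)  # ['DAVID_GENE_NAME', 'TAX_ID']
```

Stripping a known prefix, rather than splitting on the character "2", avoids trouble if an identifier or annotation name ever contains a "2" itself.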
The DAVID_GENE_NAME file is interesting. Its second column holds a name for the gene. However, the same name can match multiple Affymetrix IDs. I view the name as identifying the "connected component": it specifies a unique gene that can have many other IDs.
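As a minimal sketch of the "connected component" idea, the IDs can be grouped by the gene name in the second column. The file layout is an assumption here (tab-separated, identifier in the first column, no header row), `group_ids_by_gene_name` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from a real download.

```python
import io
from collections import defaultdict


def group_ids_by_gene_name(lines):
    """Map each gene name (column 2) to the list of IDs (column 1) sharing it."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        identifier, gene_name = fields[0], fields[1]
        groups[gene_name].append(identifier)
    return dict(groups)


# Made-up rows standing in for AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt.
sample = io.StringIO(
    "1000001\thypothetical gene A\n"
    "1000002\thypothetical gene A\n"
    "1000003\thypothetical gene B\n"
)

groups = group_ids_by_gene_name(sample)
for gene_name, ids in groups.items():
    print(gene_name, ids)
```

For a real file, the same function could be called with an open file handle, for example `group_ids_by_gene_name(open(path, encoding="utf-8"))`, provided the layout assumption holds.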
How could we use these files?
- Download files for every possible central identifier.
- Perhaps not every possible central identifier.
- Create a table DAVID_IDENTIFIERS with two columns, each unique:
  - DAVID_GENE_NAME varchar2
  - DAVID_ID number
- Create a table OTHER_IDENTIFIERS with four columns:
  - identifier varchar2
  - identifier_source varchar2
  - id_number number
  - david_id number

  The two columns identifier and identifier_source will together be unique. id_number will be unique. There may be many records with the same david_id. Those records with the same david_id should be considered to be the same gene.
- We'll have to figure out what to do if a user enters a gene ID that is not unique and corresponds to multiple david_ids.
- But the basic idea will be that each gene has a unique david_id.
- From the david_id, we can get all other corresponding IDs without a search.
- AFFYMETRIX_3PRIME_IVT_ID
- AFFYMETRIX_EXON_GENE_ID
- AFFYMETRIX_SNP_ID
- AGILENT_CHIP_ID
- AGILENT_ID
- AGILENT_OLIGO_ID
- ENSEMBL_GENE_ID
- ENSEMBL_TRANSCRIPT_ID
- ENTREZ_GENE_ID
- FLYBASE_GENE_ID
- FLYBASE_TRANSCRIPT_ID
- GENBANK_ACCESSION
- GENOMIC_GI_ACCESSION
- GENPEPT_ACCESSION
- ILLUMINA_ID
- IPI_ID
- MGI_ID
- OFFICIAL_GENE_SYMBOL
- PFAM_ID
- PIR_ID
- PROTEIN_GI_ACCESSION
- REFSEQ_GENOMIC
- REFSEQ_MRNA
- REFSEQ_PROTEIN
- REFSEQ_RNA
- RGD_ID
- SGD_ID
- TAIR_ID
- UCSC_GENE_ID
- UNIGENE
- UNIPROT_ACCESSION
- UNIPROT_ID
- UNIREF100_ID
- WORMBASE_GENE_ID
- WORMPEP_ID
- ZFIN_ID
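
The two-table design proposed earlier (DAVID_IDENTIFIERS and OTHER_IDENTIFIERS) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the intended database (the `varchar2` types suggest Oracle), the helper names `david_ids_for` and `identifiers_for` are hypothetical, and all rows are made up. Values of identifier_source would be drawn from identifier names such as those listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE david_identifiers (
    david_id        INTEGER PRIMARY KEY,
    david_gene_name TEXT NOT NULL UNIQUE
);
CREATE TABLE other_identifiers (
    id_number         INTEGER PRIMARY KEY,
    identifier        TEXT NOT NULL,
    identifier_source TEXT NOT NULL,
    david_id          INTEGER NOT NULL REFERENCES david_identifiers(david_id),
    UNIQUE (identifier, identifier_source)
);
CREATE INDEX other_identifiers_by_david_id ON other_identifiers(david_id);
""")

# Made-up rows: "1234" appears under two sources and points at two genes.
conn.executemany(
    "INSERT INTO david_identifiers VALUES (?, ?)",
    [(1, "hypothetical gene A"), (2, "hypothetical gene B")],
)
conn.executemany(
    "INSERT INTO other_identifiers VALUES (?, ?, ?, ?)",
    [
        (1, "1000001", "AFFYMETRIX_EXON_GENE_ID", 1),
        (2, "GENEA", "OFFICIAL_GENE_SYMBOL", 1),
        (3, "1234", "ENTREZ_GENE_ID", 1),
        (4, "1234", "AGILENT_ID", 2),
    ],
)


def david_ids_for(conn, identifier):
    """Return every david_id a user-supplied ID maps to (may be more than one)."""
    rows = conn.execute(
        "SELECT DISTINCT david_id FROM other_identifiers "
        "WHERE identifier = ? ORDER BY david_id",
        (identifier,),
    ).fetchall()
    return [row[0] for row in rows]


def identifiers_for(conn, david_id):
    """Return (identifier_source, identifier) pairs recorded for one gene."""
    return conn.execute(
        "SELECT identifier_source, identifier FROM other_identifiers "
        "WHERE david_id = ? ORDER BY id_number",
        (david_id,),
    ).fetchall()


print(david_ids_for(conn, "GENEA"))  # one gene
print(david_ids_for(conn, "1234"))   # ambiguous: two genes
print(identifiers_for(conn, 1))
```

When `david_ids_for` returns more than one value, the caller has hit the ambiguous case noted earlier; one option would be to ask the user for the identifier source, since identifier and identifier_source together are unique.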