walkerhound edited this page Sep 21, 2012 · 9 revisions

Looking at the possibility of using DAVID instead of, or in addition to, IDecoder.

We can download text files from here: Request Knowledgebase Files. You have to log in to download the files.

You can choose the species and a central identifier, and you can also check annotation categories. I have been checking everything under General Annotations and Main_Accessions. A maximum of 10 annotation categories can be checked, so I checked the following annotations:

  1. Chromosome
  2. Cytoband
  3. Entrez_gene_summary
  4. Homologous_gene
  5. Official_Gene_Symbol
  6. PIR_Summary
  7. SP_Comment
  8. ENSEMBL_GENE_ID
  9. ENTREZ_GENE_ID

After clicking Submit you get an email that allows you to download a zip file containing more files.

For example, when I chose the Central Identifier as "AFFYMETRIX_EXON_GENE_ID", I received the following files:

  1. AFFYMETRIX_EXON_GENE_ID2CHROMOSOME.txt
  2. AFFYMETRIX_EXON_GENE_ID2CYTOBAND.txt
  3. AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_SUMMARY.txt
  4. AFFYMETRIX_EXON_GENE_ID2HOMOLOGOUS_GENE.txt
  5. AFFYMETRIX_EXON_GENE_ID2OFFICIAL_GENE_SYMBOL.txt
  6. AFFYMETRIX_EXON_GENE_ID2PIR_SUMMARY.txt
  7. AFFYMETRIX_EXON_GENE_ID2SP_COMMENT.txt
  8. AFFYMETRIX_EXON_GENE_ID2ENSEMBL_GENE_ID.txt
  9. AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_ID.txt
  10. AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt
  11. AFFYMETRIX_EXON_GENE_ID2TAX_ID.txt

Note that there are two extra files: 11 files were received for 9 requested annotations. The extras are DAVID_GENE_NAME and TAX_ID.
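
The comparison between requested annotations and received files can be checked mechanically. The sketch below assumes, based only on the file names listed above, that each file is named `<central identifier>2<ANNOTATION>.txt`; the helper name `annotation_from_filename` is hypothetical.

```python
CENTRAL_ID = "AFFYMETRIX_EXON_GENE_ID"

# Annotation categories checked on the request form (from the list above).
REQUESTED = [
    "Chromosome", "Cytoband", "Entrez_gene_summary", "Homologous_gene",
    "Official_Gene_Symbol", "PIR_Summary", "SP_Comment",
    "ENSEMBL_GENE_ID", "ENTREZ_GENE_ID",
]

# File names found in the downloaded zip (from the list above).
RECEIVED_FILES = [
    "AFFYMETRIX_EXON_GENE_ID2CHROMOSOME.txt",
    "AFFYMETRIX_EXON_GENE_ID2CYTOBAND.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_SUMMARY.txt",
    "AFFYMETRIX_EXON_GENE_ID2HOMOLOGOUS_GENE.txt",
    "AFFYMETRIX_EXON_GENE_ID2OFFICIAL_GENE_SYMBOL.txt",
    "AFFYMETRIX_EXON_GENE_ID2PIR_SUMMARY.txt",
    "AFFYMETRIX_EXON_GENE_ID2SP_COMMENT.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENSEMBL_GENE_ID.txt",
    "AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_ID.txt",
    "AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt",
    "AFFYMETRIX_EXON_GENE_ID2TAX_ID.txt",
]


def annotation_from_filename(filename, central_id):
    """Return the annotation part of '<central_id>2<ANNOTATION>.txt'."""
    prefix = central_id + "2"
    suffix = ".txt"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"unexpected file name: {filename}")
    return filename[len(prefix):-len(suffix)]


# Compare case-insensitively: the form uses mixed case, the files upper case.
requested = {name.upper() for name in REQUESTED}
received = {annotation_from_filename(f, CENTRAL_ID) for f in RECEIVED_FILES}

extras = sorted(received - requested)
missing = sorted(requested - received)
print(extras)   # ['DAVID_GENE_NAME', 'TAX_ID']
print(missing)  # []
```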

The DAVID_GENE_NAME file is interesting. Its second column holds a name for the gene. However, the same name can match multiple Affymetrix IDs. I view this name as identifying the "connected component": it specifies a unique gene that can have many other IDs.
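
As a rough illustration of the "connected component" idea, the sketch below groups identifiers by the gene name in the second column. It assumes the file is tab-delimited with the central identifier in the first column, which has not been verified here; the function name `group_by_gene_name` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_by_gene_name(lines, delimiter="\t"):
    """Map each gene name (column 2) to the set of identifiers (column 1)."""
    groups = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(delimiter, 1)
        if len(parts) != 2:
            continue  # skip rows that do not have two columns
        identifier, gene_name = parts[0].strip(), parts[1].strip()
        groups[gene_name].add(identifier)
    return dict(groups)


# Made-up rows standing in for AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt.
sample = [
    "1001\tgene alpha\n",
    "1002\tgene alpha\n",
    "1003\tgene beta\n",
]
groups = group_by_gene_name(sample)

# Names shared by more than one identifier form multi-member components.
shared = {name: ids for name, ids in groups.items() if len(ids) > 1}
```

For a real download, the same function could be fed an open file handle, for example `group_by_gene_name(open(path, encoding="utf-8"))`.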

How could we use these files?

  • Download files for every possible central identifier.
    • Perhaps not every possible central identifier.
  • Create a table DAVID_IDENTIFIERS with two columns, each unique:
    1. DAVID_GENE_NAME varchar2
    2. DAVID_ID number
  • Create a table OTHER_IDENTIFIERS with four columns:
    1. identifier varchar2
    2. identifier_source varchar2
    3. id_number number
    4. david_id number
    The two columns identifier and identifier_source will together be unique. id_number will be unique. There may be many records with the same david_id; records with the same david_id should be considered to be the same gene.
  • We'll have to figure out what to do if a user enters an ID that is not unique and that corresponds to multiple david_ids.
  • But the basic idea is that each gene will have a unique david_id.
  • From the david_id, we can get all other corresponding IDs without a search.
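
The two-table idea above could be prototyped as follows. This is only a sketch: it uses SQLite through Python's `sqlite3` module rather than the `varchar2`/`number` types named above, and the helper names (`create_schema`, `add_identifier`, `david_ids_for`, `identifiers_for`) and the sample values are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create the two proposed tables (SQLite types stand in for varchar2/number)."""
    conn.executescript("""
        CREATE TABLE david_identifiers (
            david_id        INTEGER PRIMARY KEY,
            david_gene_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE other_identifiers (
            id_number         INTEGER PRIMARY KEY,
            identifier        TEXT NOT NULL,
            identifier_source TEXT NOT NULL,
            david_id          INTEGER NOT NULL
                              REFERENCES david_identifiers(david_id),
            UNIQUE (identifier, identifier_source)
        );
        CREATE INDEX idx_other_david_id ON other_identifiers(david_id);
    """)


def add_identifier(conn, gene_name, identifier, source):
    """Register an identifier under the david_id of its gene name."""
    conn.execute(
        "INSERT OR IGNORE INTO david_identifiers (david_gene_name) VALUES (?)",
        (gene_name,),
    )
    david_id = conn.execute(
        "SELECT david_id FROM david_identifiers WHERE david_gene_name = ?",
        (gene_name,),
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO other_identifiers "
        "(identifier, identifier_source, david_id) VALUES (?, ?, ?)",
        (identifier, source, david_id),
    )
    return david_id


def david_ids_for(conn, identifier):
    """All david_ids matching a bare identifier; more than one means ambiguity."""
    rows = conn.execute(
        "SELECT DISTINCT david_id FROM other_identifiers "
        "WHERE identifier = ? ORDER BY david_id",
        (identifier,),
    )
    return [row[0] for row in rows]


def identifiers_for(conn, david_id):
    """All (source, identifier) pairs recorded for one david_id."""
    return conn.execute(
        "SELECT identifier_source, identifier FROM other_identifiers "
        "WHERE david_id = ? ORDER BY identifier_source, identifier",
        (david_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
alpha = add_identifier(conn, "gene alpha", "1001", "AFFYMETRIX_EXON_GENE_ID")
add_identifier(conn, "gene alpha", "ENSG_X", "ENSEMBL_GENE_ID")
beta = add_identifier(conn, "gene beta", "1001", "ENTREZ_GENE_ID")

ambiguous = david_ids_for(conn, "1001")   # same ID string under two sources
alpha_ids = identifiers_for(conn, alpha)
```

In this sketch an ambiguous user-entered ID simply comes back as a list with more than one david_id, so the caller can ask the user to pick a source. That is one possible answer to the open question above, not a settled design.
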

List of all possible ID sources:
  1. AFFYMETRIX_3PRIME_IVT_ID
  2. AFFYMETRIX_EXON_GENE_ID
  3. AFFYMETRIX_SNP_ID
  4. AGILENT_CHIP_ID
  5. AGILENT_ID
  6. AGILENT_OLIGO_ID
  7. ENSEMBL_GENE_ID
  8. ENSEMBL_TRANSCRIPT_ID
  9. ENTREZ_GENE_ID
  10. FLYBASE_GENE_ID
  11. FLYBASE_TRANSCRIPT_ID
  12. GENBANK_ACCESSION
  13. GENOMIC_GI_ACCESSION
  14. GENPEPT_ACCESSION
  15. ILLUMINA_ID
  16. IPI_ID
  17. MGI_ID
  18. OFFICIAL_GENE_SYMBOL
  19. PFAM_ID
  20. PIR_ID
  21. PROTEIN_GI_ACCESSION
  22. REFSEQ_GENOMIC
  23. REFSEQ_MRNA
  24. REFSEQ_PROTEIN
  25. REFSEQ_RNA
  26. RGD_ID
  27. SGD_ID
  28. TAIR_ID
  29. UCSC_GENE_ID
  30. UNIGENE
  31. UNIPROT_ACCESSION
  32. UNIPROT_ID
  33. UNIREF100_ID
  34. WORMBASE_GENE_ID
  35. WORMPEP_ID
  36. ZFIN_ID

Go Back to Future Directions
