David

Looking at the possibility of using DAVID instead of, or in addition to, IDecoder.

We can download text files from the "Request Knowledgebase Files" page. You have to log in to download the files.

You can choose the species and a central identifier, and you can check some annotation categories. I have been checking all of General Annotations and Main_Accessions. You are allowed to check a maximum of 10 annotation categories, so I checked the following 9 annotations:

Chromosome
Cytoband
Entrez_gene_summary
Homologous_gene
Official_Gene_Symbol
PIR_Summary
SP_Comment
ENSEMBL_GENE_ID
ENTREZ_GENE_ID

After clicking Submit you get an email that allows you to download a zip file containing more files.

For example, when I chose the Central Identifier as "AFFYMETRIX_EXON_GENE_ID", I received the following files:

AFFYMETRIX_EXON_GENE_ID2CHROMOSOME.txt
AFFYMETRIX_EXON_GENE_ID2CYTOBAND.txt
AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_SUMMARY.txt
AFFYMETRIX_EXON_GENE_ID2HOMOLOGOUS_GENE.txt
AFFYMETRIX_EXON_GENE_ID2OFFICIAL_GENE_SYMBOL.txt
AFFYMETRIX_EXON_GENE_ID2PIR_SUMMARY.txt
AFFYMETRIX_EXON_GENE_ID2SP_COMMENT.txt
AFFYMETRIX_EXON_GENE_ID2ENSEMBL_GENE_ID.txt
AFFYMETRIX_EXON_GENE_ID2ENTREZ_GENE_ID.txt
AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt
AFFYMETRIX_EXON_GENE_ID2TAX_ID.txt

Note that there are two extra files: 11 files were returned for 9 requested annotations. The extras are DAVID_GENE_NAME and TAX_ID.
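The comparison between requested annotations and received files can be sketched in a few lines. This is only an illustration: it assumes every file in the zip is named `<central identifier>2<ANNOTATION>.txt`, as in the listing above, and the helper names `annotation_of` and `extra_annotations` are made up for this example.

```python
from pathlib import Path


def annotation_of(filename: str, central_id: str) -> str:
    """Return the annotation part of '<central_id>2<ANNOTATION>.txt'."""
    stem = Path(filename).stem
    prefix = central_id + "2"
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    return stem[len(prefix):]


def extra_annotations(filenames, central_id, requested):
    """List annotations that were received but never requested."""
    # The web form uses mixed case (e.g. Entrez_gene_summary) while the
    # file names are upper case, so compare in upper case.
    wanted = {name.upper() for name in requested}
    found = (annotation_of(f, central_id) for f in filenames)
    return sorted(a for a in found if a not in wanted)


central = "AFFYMETRIX_EXON_GENE_ID"
requested = [
    "Chromosome", "Cytoband", "Entrez_gene_summary", "Homologous_gene",
    "Official_Gene_Symbol", "PIR_Summary", "SP_Comment",
    "ENSEMBL_GENE_ID", "ENTREZ_GENE_ID",
]
# Rebuild the 11 file names from the listing above.
received = [f"{central}2{name.upper()}.txt" for name in requested]
received += [f"{central}2DAVID_GENE_NAME.txt", f"{central}2TAX_ID.txt"]

extras = extra_annotations(received, central, requested)
print(extras)
```

Stripping the known central identifier as a prefix, rather than splitting on the first "2", keeps the sketch safe for identifiers that might themselves contain a digit.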

The DAVID_GENE_NAME file is interesting. Its second column is a name for the gene. However, the same name can match multiple AFFYMETRIX IDs. I view this as identifying the "connected component": the name specifies a unique gene that can have many other IDs.
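A minimal sketch of that grouping, assuming the file is tab-separated with the central identifier in the first column and the gene name in the second (the delimiter and the absence of a header row are assumptions, and the sample rows below are invented for illustration, not taken from a real download):

```python
import io
from collections import defaultdict


def group_by_gene_name(lines):
    """Map each gene name to the set of identifiers that share it."""
    groups = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        identifier, gene_name = fields[0].strip(), fields[1].strip()
        groups[gene_name].add(identifier)
    return dict(groups)


# Invented sample data standing in for
# AFFYMETRIX_EXON_GENE_ID2DAVID_GENE_NAME.txt.
sample = io.StringIO(
    "1001\tgene alpha\n"
    "1002\tgene alpha\n"
    "1003\tgene beta\n"
)
groups = group_by_gene_name(sample)
for name, ids in sorted(groups.items()):
    print(name, sorted(ids))
```

Each key of `groups` then plays the role of one "connected component", and the set under it holds every identifier that DAVID assigns to that gene name.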

How could we use these files?

Download files for every possible central identifier.
- Perhaps not every possible central identifier.
Create a table DAVID_IDENTIFIERS with two columns, each of which is unique:

DAVID_GENE_NAME varchar2
DAVID_ID Number

Create a table OTHER_IDENTIFIERS with four columns.

identifier varchar2
identifier_source varchar2
id_number number
david_id number

The two columns identifier and identifier_source will together be unique. id_number will be unique. There may be many records with the same david_id. Those records with the same david_id should be considered to be the same gene.

We'll have to figure out what to do if a user enters an ID that is not unique and corresponds to multiple david_ids.
But the basic idea is that each gene will have a unique david_id.
From the david_id, we can get all other corresponding IDs without a search.
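The two tables and the lookups described above could be prototyped as follows. This is a sketch only: it uses Python's built-in `sqlite3` with TEXT and INTEGER columns in place of the varchar2 and number types named above, the helper functions are hypothetical, and the rows are invented sample data.

```python
import sqlite3


def build_schema(conn):
    """Create the two proposed tables (SQLite types stand in for Oracle's)."""
    conn.executescript("""
        CREATE TABLE david_identifiers (
            david_id        INTEGER PRIMARY KEY,
            david_gene_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE other_identifiers (
            id_number         INTEGER PRIMARY KEY,
            identifier        TEXT NOT NULL,
            identifier_source TEXT NOT NULL,
            david_id          INTEGER NOT NULL
                              REFERENCES david_identifiers (david_id),
            UNIQUE (identifier, identifier_source)
        );
        CREATE INDEX other_identifiers_by_david
            ON other_identifiers (david_id);
    """)


def add_identifier(conn, gene_name, identifier, source):
    """Register one identifier under the gene's david_id and return that id."""
    conn.execute(
        "INSERT OR IGNORE INTO david_identifiers (david_gene_name) VALUES (?)",
        (gene_name,),
    )
    (david_id,) = conn.execute(
        "SELECT david_id FROM david_identifiers WHERE david_gene_name = ?",
        (gene_name,),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO other_identifiers"
        " (identifier, identifier_source, david_id) VALUES (?, ?, ?)",
        (identifier, source, david_id),
    )
    return david_id


def david_ids_for(conn, identifier):
    """All david_ids an identifier maps to when the source is not given."""
    rows = conn.execute(
        "SELECT DISTINCT david_id FROM other_identifiers"
        " WHERE identifier = ? ORDER BY david_id",
        (identifier,),
    )
    return [row[0] for row in rows]


def identifiers_for(conn, david_id):
    """Every (source, identifier) pair recorded for one gene."""
    return conn.execute(
        "SELECT identifier_source, identifier FROM other_identifiers"
        " WHERE david_id = ? ORDER BY identifier_source, identifier",
        (david_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_schema(conn)
alpha = add_identifier(conn, "gene alpha", "1001", "AFFYMETRIX_EXON_GENE_ID")
add_identifier(conn, "gene alpha", "ALPHA1", "OFFICIAL_GENE_SYMBOL")
beta = add_identifier(conn, "gene beta", "1001", "ENTREZ_GENE_ID")

print(identifiers_for(conn, alpha))   # all IDs for one gene
print(david_ids_for(conn, "1001"))    # ambiguous: two genes share this string
```

In this sketch the ambiguous case arises when the same string appears under two different sources; `david_ids_for` returns both candidates so the caller can ask the user which source was meant.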

List of all possible ID sources:

AFFYMETRIX_3PRIME_IVT_ID
AFFYMETRIX_EXON_GENE_ID
AFFYMETRIX_SNP_ID
AGILENT_CHIP_ID
AGILENT_ID
AGILENT_OLIGO_ID
ENSEMBL_GENE_ID
ENSEMBL_TRANSCRIPT_ID
ENTREZ_GENE_ID
FLYBASE_GENE_ID
FLYBASE_TRANSCRIPT_ID
GENBANK_ACCESSION
GENOMIC_GI_ACCESSION
GENPEPT_ACCESSION
ILLUMINA_ID
IPI_ID
MGI_ID
OFFICIAL_GENE_SYMBOL
PFAM_ID
PIR_ID
PROTEIN_GI_ACCESSION
REFSEQ_GENOMIC
REFSEQ_MRNA
REFSEQ_PROTEIN
REFSEQ_RNA
RGD_ID
SGD_ID
TAIR_ID
UCSC_GENE_ID
UNIGENE
UNIPROT_ACCESSION
UNIPROT_ID
UNIREF100_ID
WORMBASE_GENE_ID
WORMPEP_ID
ZFIN_ID
