Da Wei Huang, Brad T Sherman, Qina Tan, Joseph Kir, David Liu, David Bryant, Yongjian Guo, Robert Stephens, Michael W Baseler, H Clifford Lane, Richard A Lempicki.
Abstract
All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies. The newly updated DAVID Bioinformatics Resources consists of the DAVID Knowledgebase and five integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification Tool, the DAVID Functional Annotation Tool, the DAVID Gene ID Conversion Tool, the DAVID Gene Name Viewer and the DAVID NIAID Pathogen Genome Browser. The expanded DAVID Knowledgebase now integrates almost all major, well-known public bioinformatics resources, centralized by the DAVID Gene Concept, a single-linkage method that agglomerates tens of millions of diverse gene/protein identifiers and annotation terms from a variety of public bioinformatics databases. For any uploaded gene list, the DAVID Resources now provides not only the typical gene-term enrichment analysis, but also new tools and functions that allow users to condense large gene lists into gene functional groups, convert between gene/protein identifiers, visualize many-genes-to-many-terms relationships, cluster redundant and heterogeneous terms into groups, search for interesting and related genes or terms, dynamically view genes from their lists on bio-pathways and more. With DAVID (http://david.niaid.nih.gov), investigators gain more power to interpret the biological mechanisms associated with large gene lists.
Year: 2007 PMID: 17576678 PMCID: PMC1933169 DOI: 10.1093/nar/gkm415
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
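The abstract refers to "the typical gene-term enrichment analysis" without spelling it out. A common way to score over-representation of a single annotation term in a gene list is a one-tailed Fisher exact (hypergeometric) test, sketched below with only the Python standard library. DAVID's own enrichment score, the EASE score, is a conservative variant that removes one gene from the list hits before testing; the `ease` flag mimics that adjustment. This is an illustrative sketch, not DAVID's implementation, and the function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits: int, list_total: int,
                       pop_hits: int, pop_total: int,
                       ease: bool = False) -> float:
    """One-tailed Fisher exact (hypergeometric) P-value for over-representation.

    list_hits  -- genes in the user's list annotated with the term
    list_total -- size of the user's gene list
    pop_hits   -- genes in the background population annotated with the term
    pop_total  -- size of the background population
    ease       -- if True, subtract one from list_hits first (EASE-style penalty)
    """
    if ease:
        list_hits = max(list_hits - 1, 0)
    # P(X >= list_hits) for X ~ Hypergeometric(pop_total, pop_hits, list_total);
    # math.comb returns 0 when asked for more items than exist.
    upper = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / comb(pop_total, list_total)


# Invented toy numbers: all 3 listed genes carry a term that only
# 3 of 10 background genes carry.
print(enrichment_p_value(3, 3, 3, 10))             # about 0.00833 (= 1/120)
print(enrichment_p_value(3, 3, 3, 10, ease=True))  # about 0.183 (= 22/120)
```

The EASE-style adjustment makes a term supported by very few list genes look much less significant, which is the intended conservatism.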
Over 22 types of gene identifiers integrated by the DAVID Gene Concept within the DAVID Knowledgebase
| Gene ID type | Total IDs | Unique clusters |
|---|---|---|
| AFFY_ID | 2254679 | 845117 |
| ENTREZ_GENE_ID | 1734858 | 1602339 |
| GENPEPT_ACCESSION | 4065385 | 2511637 |
| GENBANK_ACCESSION | 16828735 | 2409120 |
| GENEBANK_ID | 20291282 | 2358084 |
| PIR_ACCESSION | 282281 | 258079 |
| PIR_ID | 308092 | 266645 |
| PIR_NREF_ID | 3355759 | 2677404 |
| REFSEQ_GENOMIC | 1866800 | 1552597 |
| REFSEQ_MRNA | 645831 | 561447 |
| REFSEQ_PROTEIN | 1644632 | 1373467 |
| REFSEQ_RNA | 1364 | 852 |
| UNIGENE | 161138 | 158938 |
| UNIPROT_ACCESSION | 2864344 | 2097488 |
| UNIPROT_ID | 2789453 | 2096712 |
| UNIREF100_ID | 2552342 | 2088692 |
| OFFICIAL_GENE_SYMBOL | 1693151 | 1600906 |
| FLYBASE_ID | 27109 | 26642 |
| HAMAP_ID | 63925 | 63822 |
| HSSP_ID | 265000 | 258750 |
| TIGR_ID | 120117 | 111699 |
| WORMBASE_ID | 43675 | 21243 |
| RGD_ID | 25230 | 25060 |
| NOT SURE | ALL IDs | |
Any of the gene identifier types above can be cross-mapped to the DAVID Knowledgebase. ‘Not Sure’ is a new ID type specifically designed for the DAVID web site. For a given ‘not sure’ ID, all possible matching IDs will be systematically scanned across the entire DAVID collection.
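The 'Not Sure' option can be pictured as a lookup that tries the submitted identifier against every ID type instead of one user-chosen type. The sketch below assumes a hypothetical in-memory index mapping each ID type from the table to a dictionary of identifier-to-DAVID-gene keys; the index layout, the function name `resolve_not_sure` and all example identifiers are invented for illustration and do not describe how DAVID actually stores its Knowledgebase.

```python
from typing import Dict, List, Tuple

# Hypothetical index layout: ID type -> {identifier -> DAVID gene key}.
IdIndex = Dict[str, Dict[str, str]]


def resolve_not_sure(identifier: str, index: IdIndex) -> List[Tuple[str, str]]:
    """Scan every ID type for `identifier`; return all (id_type, gene_key) matches."""
    matches = []
    for id_type in sorted(index):  # deterministic order for reporting
        gene_key = index[id_type].get(identifier)
        if gene_key is not None:
            matches.append((id_type, gene_key))
    return matches


# Invented toy index: a bare numeric token can be valid under more than one
# ID type, which is the ambiguity a 'Not Sure' submission has to handle.
toy_index: IdIndex = {
    "ENTREZ_GENE_ID": {"12345": "DAVID_GENE_1"},
    "RGD_ID": {"12345": "DAVID_GENE_7"},
    "UNIGENE": {"Hs.0": "DAVID_GENE_2"},
}
print(resolve_not_sure("12345", toy_index))
# [('ENTREZ_GENE_ID', 'DAVID_GENE_1'), ('RGD_ID', 'DAVID_GENE_7')]
```

Returning every match, rather than the first, leaves it to the caller (or the user) to decide which interpretation of an ambiguous token is intended.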
Figure 1. A DAVID gene constructed by a single-linkage algorithm. Two UniRef100 clusters, two NRef 100 clusters and one Entrez Gene cluster were systematically found to share one or more protein identifiers with each other. Applied iteratively, the single-linkage rule agglomerates them into one DAVID gene. Thus, for this example of tyrosine-protein phosphatase non-receptor type 21 (PTPN21), the resulting DAVID gene collects and integrates gene/protein identifiers more comprehensively than any of the original gene clusters.
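The single-linkage rule in Figure 1 amounts to finding connected components: two source clusters end up in the same DAVID gene whenever they share an identifier, directly or through a chain of other clusters. A union-find (disjoint-set) structure is one standard way to compute this. The sketch below illustrates the idea under that assumption; it is not DAVID's actual code, the function name `single_linkage_merge` is hypothetical, and the cluster contents are placeholder identifiers rather than the real PTPN21 records.

```python
from typing import Dict, List, Set


def single_linkage_merge(clusters: List[Set[str]]) -> List[Set[str]]:
    """Merge clusters that share at least one identifier, transitively."""
    parent = list(range(len(clusters)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    owner: Dict[str, int] = {}  # identifier -> first cluster index seen with it
    for idx, cluster in enumerate(clusters):
        for ident in cluster:
            if ident in owner:
                union(owner[ident], idx)  # shared identifier links the clusters
            else:
                owner[ident] = idx

    merged: Dict[int, Set[str]] = {}
    for idx, cluster in enumerate(clusters):
        merged.setdefault(find(idx), set()).update(cluster)
    return list(merged.values())


# Placeholder clusters loosely shaped like Figure 1 (identifiers invented).
source_clusters = [
    {"P1", "P2"},  # e.g. a UniRef100 cluster
    {"P2", "P3"},  # e.g. an NRef cluster sharing P2
    {"P3", "G1"},  # e.g. an Entrez Gene cluster sharing P3
    {"Q9"},        # an unrelated cluster
]
print(sorted(sorted(c) for c in single_linkage_merge(source_clusters)))
# [['G1', 'P1', 'P2', 'P3'], ['Q9']]
```

The first and third clusters share no identifier directly, yet they are merged because each overlaps the second; that transitive chaining is what distinguishes single linkage from requiring a direct overlap.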
The wide-range collection of heterogeneous functional annotations in the DAVID Knowledgebase
| | | |
|---|---|---|
| GO_BIOLOGICAL PROCESS | BLOCKS_ID | ALIAS_GENE_SYMBOL |
| GO_MOLECULAR FUNCTION | COG_KOG_NAME | CHROMOSOME |
| GO_CELLULAR COMPONENT | INTERPRO_NAME | CYTOBAND |
| PANTHER_BIOLOGICAL PROCESS | PDB_ID | GENE_NAME |
| PANTHER_MOLECULAR FUNCTION | PFAM_NAME | GENE_SYMBOL |
| COG_KOG_ONTOLOGY | PIR_ALN | HOMOLOGOUS_GENE |
| PIR_HOMOLOGY_DOMAIN | ENTREZ_GENE_SUMMARY | |
| BIND | PIR_SUPERFAMILY_NAME | OMIM_ID |
| DIP | PRINTS_NAME | PIR_SUMMARY |
| MINT | PRODOM_NAME | PROTEIN_MW |
| NCICB_CAPATHWAY | PROSITE_NAME | REFSEQ_PRODUCT |
| TRANSFAC_ID | SCOP_ID | SEQUENCE_LENGTH |
| HIV_INTERACTION | SMART_NAME | SP_COMMENT |
| HIV_INTERACTION_CATEGORY | TIGRFAMS_NAME | |
| HPRD_INTERACTION | PANTHER_SUBFAMILY | PIR_SEQ_FEATURE |
| REACTOME_INTERACTION | PANTHER_FAMILY | SP_COMMENT_TYPE |
| SP_PIR_KEYWORDS | | |
| GENETIC_ASSOCIATION_DB | BioCarta | UP_SEQ_FEATURE |
| OMIM_DISEASE | KEGG_PATHWAY | |
| PANTHER_PATHWAY | GNF Microarray | |
| GENERIF_SUMMARY | PID | UNIGENE EST |
| PUBMED_ID | BBID | CGAP SAGE |
| HIV_INTERACTION_PUBMED_ID | KEGG_REACTION | CGAP EST |
Over 60 functional categories from dozens of independent public databases (see Supplementary File 2 for a complete list) are collected and integrated in the DAVID Knowledgebase.
Figure 2. An HTML report from the Functional Annotation Clustering. Annotation cluster 1 in this example groups together the GO term cytokine activity, the KEGG pathway cytokine–cytokine receptor interaction, the GO term receptor binding and other related terms. Thus, different aspects of the same underlying biology can be explored at the same time.
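Terms such as those in annotation cluster 1 end up together because they are annotated to largely the same genes. DAVID's documentation describes measuring this co-association with a kappa statistic over gene memberships; the sketch below computes Cohen's kappa between two terms' gene sets and lists term pairs at or above a threshold. It deliberately omits the heuristic grouping step DAVID applies afterwards, and the function names, gene sets and threshold value are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def kappa(genes_a: Set[str], genes_b: Set[str], universe: Set[str]) -> float:
    """Cohen's kappa agreement between two terms' memberships over `universe`."""
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")
    a, b = genes_a & universe, genes_b & universe
    both, only_a, only_b = len(a & b), len(a - b), len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement from each term's marginal yes/no frequencies.
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / (n * n)
    if expected == 1.0:
        return 1.0  # both terms cover all, or none, of the universe
    return (observed - expected) / (1.0 - expected)


def similar_term_pairs(term_genes: Dict[str, Set[str]], universe: Set[str],
                       threshold: float) -> List[Tuple[str, str, float]]:
    """All term pairs whose kappa is at least `threshold` (an arbitrary cut-off)."""
    pairs = []
    for t1, t2 in combinations(sorted(term_genes), 2):
        k = kappa(term_genes[t1], term_genes[t2], universe)
        if k >= threshold:
            pairs.append((t1, t2, k))
    return pairs


# Invented gene sets for three terms named after Figure 2.
genes = {f"g{i}" for i in range(1, 9)}
terms = {
    "cytokine activity": {"g1", "g2", "g3", "g4"},
    "cytokine-cytokine receptor interaction": {"g1", "g2", "g3", "g4"},
    "receptor binding": {"g1", "g2", "g3", "g5"},
}
for t1, t2, k in similar_term_pairs(terms, genes, threshold=0.4):
    print(f"{t1} ~ {t2}: kappa={k:.2f}")
```

With these toy sets, the two identical terms score a kappa of 1.00 and each of them scores 0.50 against receptor binding, so all three would fall into one group at this threshold. Unlike a raw overlap count, kappa corrects for the agreement expected by chance, so two very large terms are not rated similar merely because both cover most genes.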
Figure 3. A roadmap for choosing the appropriate DAVID functions and tools.