Felix Kokocinski, Nicolas Delhomme, Gunnar Wrobel, Lars Hummerich, Grischa Toedt, Peter Lichter.
Abstract
BACKGROUND: Interpreting the results of high-throughput experiments, such as those obtained from DNA microarrays, is often time-consuming because of the large number of data points that must be analyzed in parallel. Which of the possible approaches to functional analysis will be the most informative is usually unknown beforehand and has to be established through extensive testing.
Year: 2005 PMID: 15985174 PMCID: PMC1189078 DOI: 10.1186/1471-2105-6-161
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Layered architecture of the FACT framework. The database abstracts any experimental or annotation data into DataSets with DataFeatures, each originating from a specific DataSource for which DataTypes and Parameters have been defined. The core library (API) supplies all functionality for accessing the database and for operating the various modules, which are adaptors for specific DataSources or functions. The web interface and other applications use the FACT API functions.
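The layering described for Figure 1, where clients reach the modules only through the core API, can be sketched as a small registry. This is a hypothetical illustration rather than FACT's code: the class `CoreAPI`, its methods and the toy `count_ids` module are invented names, and a real module would query the database instead of working on its argument list alone.

```python
from typing import Callable, Dict, List

# Hypothetical module signature: takes gene IDs, returns a result dict.
Module = Callable[[List[str]], dict]


class CoreAPI:
    """Core layer: the only entry point that clients (e.g. a web UI) call."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        """Make a module available under a unique name."""
        if name in self._modules:
            raise ValueError(f"module {name!r} is already registered")
        self._modules[name] = module

    def available(self) -> List[str]:
        """List registered module names, e.g. for a menu in the UI."""
        return sorted(self._modules)

    def run(self, name: str, gene_ids: List[str]) -> dict:
        """Dispatch a client request to the named module."""
        if name not in self._modules:
            raise KeyError(f"unknown module {name!r}")
        return self._modules[name](gene_ids)


def count_ids(gene_ids: List[str]) -> dict:
    """Toy stand-in module; a real one would query the database."""
    return {"n_ids": len(gene_ids), "n_unique": len(set(gene_ids))}


api = CoreAPI()
api.register("count_ids", count_ids)
print(api.run("count_ids", ["TP53", "MYC", "TP53"]))  # {'n_ids': 3, 'n_unique': 2}
```

Keeping dispatch in one place means a new adaptor only has to be registered, while the web interface and other clients stay unchanged.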
Figure 2. FACT database schema. The schema reflects the generalized handling of heterogeneous data. At the definition layer, data sources are defined as experimental, annotation or analysis sources, and the types of data they use are specified. These types are linked to the individual sources, which are defined in the data source layer. The parameters that the source-handling functions accept are stored as well. The actual data, experimental as well as annotation, are saved as data sets and data features in the data set layer.
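A simplified, in-memory reading of the schema in Figure 2 might look as follows. The class and field names are hypothetical and collapse the definition and data source layers into plain Python objects; the actual FACT schema is a relational database whose tables are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SOURCE_KINDS = {"experimental", "annotation", "analysis"}


# Definition layer: which kinds of data exist.
@dataclass(frozen=True)
class DataType:
    name: str  # e.g. "Gene Symbol" or "GO Term"


# Data source layer: a concrete source, its data types and parameters.
@dataclass
class DataSource:
    name: str
    kind: str  # one of SOURCE_KINDS
    data_types: List[DataType] = field(default_factory=list)
    parameters: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in SOURCE_KINDS:
            raise ValueError(f"unknown source kind: {self.kind!r}")


# Data set layer: the actual values, each tagged with its data type.
@dataclass
class DataFeature:
    data_type: DataType
    value: str


@dataclass
class DataSet:
    source: DataSource
    features: List[DataFeature] = field(default_factory=list)

    def add(self, type_name: str, value: str) -> None:
        """Add a feature, accepting only types declared for the source."""
        for data_type in self.source.data_types:
            if data_type.name == type_name:
                self.features.append(DataFeature(data_type, value))
                return
        raise ValueError(f"{type_name!r} is not defined for {self.source.name!r}")

    def values(self, type_name: str) -> List[str]:
        """All feature values of one data type, in insertion order."""
        return [f.value for f in self.features if f.data_type.name == type_name]


source = DataSource("toy-annotation", "annotation",
                    [DataType("Gene Symbol"), DataType("GO Term")])
dataset = DataSet(source)
dataset.add("Gene Symbol", "TP53")
dataset.add("GO Term", "GO:0006915")
print(dataset.values("Gene Symbol"))  # ['TP53']
```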
Figure 3. Outline of the flow of information. Modular DataSource adapters handle data access and data transformation for heterogeneous sources, which makes FACT a flexible framework.
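The adapter idea in Figure 3 amounts to one translation class per external source, each emitting features in a shared internal form. The sketch below assumes two invented input formats (tab-separated lines and `key=value` records) and uses a `(data type, value)` tuple as the common form; FACT's own adapters and source formats are not described in this record.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

# Common internal form shared by all adapters: (data type, value).
Feature = Tuple[str, str]


class SourceAdapter(ABC):
    """Translates raw records from one external source into features."""

    @abstractmethod
    def to_features(self, raw: str) -> List[Feature]:
        ...


class TabAdapter(SourceAdapter):
    """Hypothetical source that delivers 'symbol<TAB>band' lines."""

    def to_features(self, raw: str) -> List[Feature]:
        features: List[Feature] = []
        for line in raw.splitlines():
            if not line.strip():
                continue  # skip blank lines
            symbol, band = line.split("\t")
            features += [("Gene Symbol", symbol), ("Chromosomal Location", band)]
        return features


class KeyValueAdapter(SourceAdapter):
    """Hypothetical source that delivers 'sym=...;loc=...' records."""

    FIELDS: Dict[str, str] = {"sym": "Gene Symbol", "loc": "Chromosomal Location"}

    def to_features(self, raw: str) -> List[Feature]:
        features: List[Feature] = []
        for pair in raw.strip().split(";"):
            key, sep, value = pair.partition("=")
            if sep and key in self.FIELDS:  # ignore malformed or unknown keys
                features.append((self.FIELDS[key], value))
        return features


def collect(inputs: Iterable[Tuple[SourceAdapter, str]]) -> List[Feature]:
    """Run each raw input through its adapter; callers see one format."""
    merged: List[Feature] = []
    for adapter, raw in inputs:
        merged.extend(adapter.to_features(raw))
    return merged


features = collect([
    (TabAdapter(), "S100A8\t1q21.3\n"),
    (KeyValueAdapter(), "sym=MYC;loc=8q24.21"),
])
```

Adding a new source then means writing one more `SourceAdapter` subclass; nothing downstream of `collect` has to know where the data came from.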
Data types and sources accessible by the current annotation modules.
| Source | Data types |
|---|---|
| European Bioinformatics Institute and Wellcome Trust Sanger Institute (GB) [8] | Ensembl ID, Gene Symbol, Gene Name, Chromosomal Location, Homologous Genes, InterPro Domains, RefSeq Accession Number, Affymetrix ID |
| Indiana University (USA) [10] | euGene ID, Gene Symbol, Gene Name, GDB ID, OMIM ID, Genomic Localization, Gene Ontology Terms, Protein Accession Numbers |
| Lawrence Livermore National Laboratory (USA) [28] | IMAGE Clone ID |
| National Institute on Aging, NIH (USA) [11] | Pathway Name and Image Link |
| Gene Ontology Consortium [2] | ID and Name of GO Term (Biological Process, Molecular Function, Cellular Component) |
| National Cancer Institute, NIH (USA) [29] | BioCarta Name, BioCarta Short Name, KEGG Pathway Name, KEGG Pathway ID, PFAM ID |
| NCBI/NIH (USA) [30] | A: LocusLink ID, Gene Symbol, Gene Name, Genomic Localization, Gene Ontology Terms, OMIM ID; B: Key References (PubMed links) |
| Jackson Laboratory (USA) [31] | MGI ID / Gene Symbol |
| Deutsches Krebsforschungszentrum, Div. Molecular Genetics (D), internal | General Information on Available Clones |
| University of California Santa Cruz (USA) | Calculated relative CpG content of genomic region |
| EMBL (D) [12] | Protein interaction data (computed and imported from other databases) |
| Affymetrix Inc. / FACT | Use of Affymetrix probe IDs |
| European Bioinformatics Institute (GB) [3] | Pathway information |
Current data analysis and display modules
| Module | Origin | Description |
|---|---|---|
| Simple Count | FACT | Counting and display of occurrences of annotation terms |
| GO-Term Comparison | In part from | Detection of significantly overrepresented GO terms in gene lists, based on hypergeometric tail probability |
| MedLiner | | Listing of publications with co-occurrences of terms |
| CGH database | Deutsches Krebsforschungszentrum, Div. Molecular Genetics (D) | Comparison of CGH results with archived data |
| goCluster | | Detection of significantly overrepresented GO terms (based on Fisher's exact test) in clusters built with the k-means algorithm |
| Hypergeometric Tail | In part from | Detection of significantly overrepresented terms of any kind, based on hypergeometric tail probability |
| CGH – Expression Comparison | FACT | Detection of correlation between genomic and expression data sets, based on two-sided t-tests |
| Chromosomal Plot | FACT | Display of values or occurrences in genomic context |
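The GO-Term Comparison and Hypergeometric Tail modules both rank terms by hypergeometric tail probability. A minimal sketch, assuming a reference set of N annotated genes of which K carry a given term, and a gene list of n genes of which k carry it, computes P(X >= k) exactly with `math.comb`. The function names are hypothetical, and this sketch applies no multiple-testing correction, which a real analysis over many terms would need.

```python
from collections import Counter
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn).

    k: genes in the list annotated with the term
    n: size of the gene list
    K: genes in the reference set annotated with the term
    N: size of the reference set
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent set sizes")
    # Sum exact probabilities of seeing i = k .. min(n, K) annotated genes.
    # math.comb returns 0 when n - i exceeds N - K, so impossible cases vanish.
    support = range(max(k, 0), min(n, K) + 1)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in support)
    return hits / comb(N, n)


def enriched_terms(genes: Iterable[str], annotations: Dict[str, Set[str]],
                   alpha: float = 0.01) -> List[Tuple[str, int, float]]:
    """Terms overrepresented in `genes`, using all annotated genes as reference."""
    N = len(annotations)
    selected = [g for g in set(genes) if g in annotations]
    n = len(selected)
    in_reference = Counter(t for terms in annotations.values() for t in terms)
    in_list = Counter(t for g in selected for t in annotations[g])
    results = []
    for term, k in in_list.items():
        p = hypergeom_upper_tail(k, n, in_reference[term], N)
        if p <= alpha:
            results.append((term, k, p))
    return sorted(results, key=lambda r: (r[2], r[0]))  # most significant first


# Toy reference: genes g0..g3 carry term "A", g4..g9 carry term "B".
reference = {f"g{i}": ({"A"} if i < 4 else {"B"}) for i in range(10)}
print(hypergeom_upper_tail(2, 3, 4, 10))  # 0.3333333333333333
print(enriched_terms(["g0", "g1", "g2"], reference, alpha=0.05))
```

For a 2×2 table of "in list / not in list" against "annotated / not annotated", this upper tail equals the one-sided Fisher's exact test p-value, so the hypergeometric modules and the Fisher-based goCluster approach are closely related.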
Figure 4. Application of the FACT system to the functional analysis of microarray data on the development of non-melanoma skin cancer. Occurrences of annotation terms are counted and displayed to draw the researcher's attention to potentially characteristic features of the data set. In this case, the genomic band 1q21 appears to play an important role in the experiment.
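The counting step behind the Simple Count module can be sketched with `collections.Counter`. This is an assumed reading in which each gene contributes at most once per term; the gene-to-band annotations in the example are toy data chosen for illustration, not results from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_terms(genes: Iterable[str], annotations: Dict[str, List[str]],
                top: int = 5) -> List[Tuple[str, int]]:
    """Count how many genes in the list carry each annotation term."""
    counts: Counter = Counter()
    for gene in set(genes):  # count each gene once
        counts.update(set(annotations.get(gene, [])))  # and each term once per gene
    # Most frequent first; ties broken alphabetically for a stable display.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy annotations for illustration only (gene -> cytogenetic band).
toy_bands = {"S100A8": ["1q21"], "S100A9": ["1q21"], "SPRR1A": ["1q21"], "MYC": ["8q24"]}
print(count_terms(["S100A8", "S100A9", "SPRR1A", "MYC", "UNKNOWN"], toy_bands))
# [('1q21', 3), ('8q24', 1)]
```

Genes without any annotation, such as `UNKNOWN` above, are simply skipped.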
Figure 5. Application of the FACT system in non-melanoma skin cancer research. Overrepresented terms from a Gene Ontology annotation are displayed in a chart. Using the GO system is the most common approach to the functional interpretation of gene lists.
Figure 6. Application of the FACT system in non-melanoma skin cancer research. A visual representation of the genomic distribution of the analyzed features highlights the involvement of genomic band 1q21 (first 5 chromosomes shown). Here, the locations of the human homologous genes corresponding to murine clones over- and under-expressed in squamous cell carcinoma are displayed.
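A chromosomal plot needs features in genomic order, and cytoband names do not sort correctly as plain strings: "10q22" sorts before "2q31", and p-arm bands are numbered outward from the centromere. One possible sort key, assuming ISCN-style band names such as `1q21.3` and treating each digit as one level of the band hierarchy, is sketched below; the helper names are hypothetical and not part of FACT.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# ISCN-style band: chromosome, arm, optional band digits such as "21.3".
_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)?$")


def _chrom_rank(chrom: str) -> int:
    """Autosomes 1-22 in numeric order, then X and Y."""
    special = {"X": 23, "Y": 24}
    return special[chrom] if chrom in special else int(chrom)


def band_key(band: str) -> Tuple[int, int, Tuple[int, ...]]:
    """Sort key placing cytobands in pter-to-qter order along the genome."""
    match = _BAND.match(band.strip())
    if match is None:
        raise ValueError(f"unrecognized cytoband: {band!r}")
    chrom, arm, digits = match.groups()
    # Each digit is one level of the hierarchy: region, band, sub-band, ...
    levels = tuple(int(d) for d in (digits or "").replace(".", ""))
    if arm == "p":
        # p-arm bands are numbered outward from the centromere, so the
        # pter-to-qter order visits them in decreasing order.
        return (_chrom_rank(chrom), 0, tuple(-d for d in levels))
    return (_chrom_rank(chrom), 1, levels)


def counts_in_genomic_order(bands: Iterable[str]) -> List[Tuple[str, int]]:
    """Count features per band and return them ready for a genome plot."""
    return sorted(Counter(bands).items(), key=lambda kv: band_key(kv[0]))


print(sorted(["2q31", "1q21.3", "1p13", "1p36.33", "1q21.1", "Xp22.3", "10q22"],
             key=band_key))
# ['1p36.33', '1p13', '1q21.1', '1q21.3', '2q31', '10q22', 'Xp22.3']
```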
Figure 7. Application of the FACT system in non-melanoma skin cancer research. FACT's simple automated literature screen displays publications mentioning groups of genes identified in the study (top 5 hits shown).
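A literature screen of this kind can be approximated by scoring each publication by the number of distinct gene symbols from the list that it mentions. The sketch assumes the abstracts are already available as a dictionary from identifier to text (`doc1` to `doc3` and their texts are placeholders) and uses case-sensitive whole-word matching; it is not MedLiner's actual algorithm.

```python
import re
from typing import Dict, List, Tuple


def screen_literature(abstracts: Dict[str, str], symbols: List[str],
                      top: int = 5, min_genes: int = 2) -> List[Tuple[str, List[str]]]:
    """Rank documents by how many distinct listed genes they co-mention."""
    # Whole-word, case-sensitive patterns so e.g. "S100A8" does not match "S100A89".
    patterns = {s: re.compile(r"\b" + re.escape(s) + r"\b") for s in set(symbols)}
    hits = []
    for doc_id, text in abstracts.items():
        found = sorted(s for s, pattern in patterns.items() if pattern.search(text))
        if len(found) >= min_genes:  # require a co-occurrence, not a single mention
            hits.append((doc_id, found))
    # Most co-mentioned genes first; ties broken by document ID.
    hits.sort(key=lambda h: (-len(h[1]), h[0]))
    return hits[:top]


toy_abstracts = {
    "doc1": "A study mentioning S100A8 and S100A9.",
    "doc2": "A study mentioning MYC only.",
    "doc3": "A study mentioning S100A8, S100A9 and SPRR1A.",
}
print(screen_literature(toy_abstracts, ["S100A8", "S100A9", "SPRR1A", "MYC"]))
```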