Olivier Tassy, Olivier Pourquié.
Abstract
The function of genes is often evolutionarily conserved, and comparing the annotation of orthologous genes in different model organisms has proved to be a powerful predictive tool to identify the function of human genes. Here, we describe Manteia, a resource available online at http://manteia.igbmc.fr. Manteia allows the comparison of embryological, expression, molecular and etiological data from human, mouse, chicken and zebrafish simultaneously to identify new functional and structural correlations and gene-disease associations. Manteia is particularly useful for the analysis of gene lists produced by high-throughput techniques such as microarrays or proteomics. Data can be easily analyzed statistically to characterize the function of groups of genes and to correlate the different aspects of their annotation. Sophisticated querying tools provide unlimited ways to merge the information contained in Manteia along with the possibility of introducing custom user-designed biological questions into the system. This makes it possible, for example, to connect all the animal experimental results and annotations to the human genome, and to take advantage of data not available for human when looking for candidate genes responsible for genetic disorders. Here, we demonstrate the predictive and analytical power of the system by predicting candidate genes responsible for human genetic diseases.
Year: 2013 PMID: 24038354 PMCID: PMC3964984 DOI: 10.1093/nar/gkt807
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the Manteia architecture. Data are collected from various databases for several species. They are then processed by the management tools and accessed with the online user interface. Over 30 exploration, graphical representation and statistical tools are available. All the tools are interoperable, which makes it possible to analyze the information, perform data mining and make predictions.
Figure 2. Graphical representations. Manteia uses interactive graphs to represent a result or simplify an annotation. (a) A word cloud computed from a gene file gives an overview of its functions. The more often a word is used in the page, the bigger it appears in the cloud. (b) Specific annotations based on ontologies such as GO or phenotypes are simplified using a radar chart showing the distribution of individual terms in broad annotation categories. A list of genes can be analyzed using dedicated graphs to represent their interactions (c) or the molecular complexes they form (d). GO, phenotype and protein motif annotations are represented using tree maps where each tile represents a keyword and its size the number of genes corresponding to this category.
Tools available from the ‘Refine’ interface
| Tool | Description |
| --- | --- |
| Expression data | |
| Digital differential display | Identifies genes differentially expressed in different samples using ESTs |
| EST count | Predicts gene expression levels using ESTs |
| Annotations | |
| GO | Gene functional annotation |
| Protein motif | Protein motif prediction |
| Phenotype | Phenotype description for mutated genes |
| OMIM | Human genetic disorder description |
| Chromosome location | Returns the genes contained in a given chromosomal region |
| SNP | Returns the genes associated with a given single-nucleotide polymorphism |
| Biological pathway | Returns the genes involved in a given biological pathway |
| Molecular complex | Returns the genes involved in a given molecular complex |
| Interactome | Returns the genes involved in a given molecular interaction |
| Transcription factors from DBD | Identifies the genes with transcription factor activity |
| Transcription targets | Returns the genes that regulate or are regulated by the given genes |
| Annotated with … | Returns the genes annotated with one of the data sets listed above |
| Species | |
| Orthology | Returns the corresponding orthologs of a gene |
| Species filter | Returns the genes that belong to a given species |
| Boolean tools | |
| Query builder | Addresses Boolean questions to the system using a mixture of data |
| Annotation distribution | Shows the distribution of genes in different annotation categories |
| Boolean list | Identifies shared or specific genes from two lists |
| Venn diagram | Creates a Venn diagram for up to four lists of genes |
| Export | |
| Create custom Ref for Statistics | Creates a custom reference to be used with statistics tools |
| Convert gene ID list | Converts a gene identifier into another identifier |
| Export gene ID list | Exports the current list of genes |
| Export annotation | Exports the annotation features of the current list of genes |
| Export corresponding probe sets | Exports the corresponding probe sets of the current list of genes |
Figure 3. Statistics tools. The statistics module of Manteia highlights the terms of an annotation that are enriched or depleted in a set of genes. Here the GO annotation enrichment is exemplified. The P-value column gives the significance of the enrichment. The two following columns correct this value for multiple testing. Blue indicates statistical significance. Terms that are related in the ontology can be highlighted to ease the analysis. Here GO terms related to the NOTCH and WNT signaling pathways are colored in beige and green, respectively. Statistics tools can be used in combination with other exploration tools to compute a correlation between different types of data. In this example, ‘Phenotype search’ (on top) is used to assemble a list of genes leading to an abnormal somite development. The resulting GO statistics are filtered with the keyword ‘signaling pathway’. The conditional probability module (right-hand side of the GO statistics screenshot) is used to compute the probability that a gene is annotated with a given GO term when it is already annotated with the ‘abnormal somite development’ phenotype (first column), and conversely (second column).
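The quantities named in the Figure 3 caption (an enrichment P-value, a multiple-testing correction and the two conditional probabilities) can be illustrated with a minimal sketch. The caption does not say which test or which corrections Manteia applies, so the one-sided hypergeometric test and the Benjamini-Hochberg adjustment below are assumptions chosen for illustration, and all function names and gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: genes in the reference, K: reference genes carrying the term,
    n: genes in the list, k: list genes carrying the term.
    (Assumed test; the source does not name the one Manteia uses.)
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def conditional_probabilities(term_genes, phenotype_genes):
    """Return (P(term | phenotype), P(phenotype | term)) from two gene sets."""
    both = len(term_genes & phenotype_genes)
    return both / len(phenotype_genes), both / len(term_genes)


# Hypothetical gene identifiers, for illustration only.
term = {"g1", "g2", "g3", "g4"}
phenotype = {"g1", "g2", "g5"}
p_term_given_pheno, p_pheno_given_term = conditional_probabilities(term, phenotype)
```

The two conditional probabilities differ because they are normalized by different gene sets, which is why the caption reports them in separate columns.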
Figure 4. Data integration tool. The ‘Querybuilder’ makes it possible to address a question to the system using several types of data. The user enters a keyword, selects the best matching suggestion from a list and builds a question using Boolean operators (and, or, not). Several questions can be addressed at the same time using a separator followed by a weight reflecting the relative importance of each query. Alternatively, Manteia can automatically create a query from an OMIM file to look for the best candidate genes for that disease. The matching genes are ordered according to their relevance. The column on the right-hand side indicates which queries match the gene annotation. Finally, an interactive chord diagram is generated to show how many genes are returned by each query and how many share the same features.
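The weighted Boolean querying described in the Figure 4 caption can be sketched as follows. This is not Manteia's query syntax or scoring rule: the combinators, the assumption that relevance is the sum of the weights of the matching queries, and the gene and annotation names are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Query = Callable[[Set[str]], bool]


def has(term: str) -> Query:
    """Match genes whose annotation contains the given keyword."""
    return lambda annotation: term in annotation


def all_of(*queries: Query) -> Query:
    """Boolean AND of several queries."""
    return lambda annotation: all(q(annotation) for q in queries)


def any_of(*queries: Query) -> Query:
    """Boolean OR of several queries."""
    return lambda annotation: any(q(annotation) for q in queries)


def negate(query: Query) -> Query:
    """Boolean NOT of a query."""
    return lambda annotation: not query(annotation)


def rank_genes(
    annotations: Dict[str, Set[str]],
    weighted_queries: Dict[str, Tuple[Query, float]],
) -> List[Tuple[str, float, List[str]]]:
    """Score each gene by the summed weights of the queries it matches
    (assumed relevance measure) and sort by decreasing score."""
    results = []
    for gene, annotation in annotations.items():
        matched = [name for name, (q, _) in weighted_queries.items() if q(annotation)]
        if matched:
            score = sum(weighted_queries[name][1] for name in matched)
            results.append((gene, score, matched))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


# Hypothetical annotations and queries, for illustration only.
annotations = {
    "geneA": {"somite", "notch"},
    "geneB": {"somite"},
    "geneC": {"heart"},
}
queries = {
    "q1": (all_of(has("somite"), negate(has("heart"))), 2.0),
    "q2": (has("notch"), 1.0),
}
ranking = rank_genes(annotations, queries)
```

Each result keeps the list of matching query names, mirroring the column that reports which queries match a gene's annotation.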
Figure 5. Annotation distribution. The ‘Annotation distribution’ tool generates a bar plot and a donut chart showing how many genes correspond to each Boolean query defined by the user. Here the search for genes responsible for the development or the abnormality of the cardiovascular, respiratory, skeletal and renal systems is exemplified. This way it is possible to see the relative importance of different annotation categories in a given data set and see how the distribution evolves over different experimental conditions. The Venn diagram generator can then be used to see the genes that are shared among the results returned by the queries.
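The counts behind such a bar plot, and the overlaps a Venn diagram would display, reduce to set sizes and set intersections. The sketch below assumes each query has already returned a set of gene identifiers; the function names and the data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def annotation_distribution(query_results: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of genes returned by each query (the bar heights)."""
    return {name: len(genes) for name, genes in query_results.items()}


def shared_genes(query_results: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Genes shared by every pair of queries (the pairwise Venn overlaps)."""
    names = sorted(query_results)
    return {(a, b): query_results[a] & query_results[b] for a, b in combinations(names, 2)}


# Hypothetical query results, for illustration only.
results = {
    "cardiovascular": {"g1", "g2", "g3"},
    "skeletal": {"g2", "g3", "g4"},
    "renal": {"g5"},
}
counts = annotation_distribution(results)
overlaps = shared_genes(results)
```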
Figure 6. Candidate gene prediction for OMIM diseases. Manteia ranks mouse candidate genes according to the number of phenotypic features they share with the human disease genes. (a) Ranking of mouse genes when searched using the E-Q method alone, together with the number of genes expected by chance. (b) Distribution obtained when genes are searched in an area of 5 (purple), 10 (red), 50 (green) and 100 Mb (yellow). Most known or suspected disease genes rank within the first 10 candidates. (c–f) Ranking distribution for each search area compared with the distribution expected by chance. The first positions, where most genes are found, are significant.
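The ranking idea in the Figure 6 caption, ordering candidates by the number of phenotypic features shared with the disease and optionally restricting the search to a genomic area, can be sketched as below. The `Gene` record, the choice to center the search area on a locus, the tie-breaking by name and all example values are assumptions made for illustration, not details taken from Manteia.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    position_mb: float           # position along the chromosome, in Mb
    phenotypes: FrozenSet[str]   # phenotypic features annotated to the gene


def rank_candidates(
    genes: List[Gene],
    disease_features: Set[str],
    chrom: Optional[str] = None,
    center_mb: Optional[float] = None,
    window_mb: Optional[float] = None,
) -> List[Tuple[str, int]]:
    """Rank genes by the number of features shared with the disease.

    If chrom is given, only genes on that chromosome are kept; if a center
    and window are also given, only genes within window_mb / 2 of the
    center are kept (assumed interpretation of the search area).
    """
    ranked = []
    for gene in genes:
        if chrom is not None:
            if gene.chrom != chrom:
                continue
            if (center_mb is not None and window_mb is not None
                    and abs(gene.position_mb - center_mb) > window_mb / 2):
                continue
        shared = len(gene.phenotypes & disease_features)
        if shared > 0:
            ranked.append((gene.name, shared))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical genes and features, for illustration only.
genes = [
    Gene("geneA", "chr1", 10.0, frozenset({"p1", "p2", "p3"})),
    Gene("geneB", "chr1", 12.0, frozenset({"p1"})),
    Gene("geneC", "chr1", 80.0, frozenset({"p1", "p2"})),
    Gene("geneD", "chr2", 10.0, frozenset({"p1", "p2", "p3"})),
]
disease = {"p1", "p2", "p3"}
genome_wide = rank_candidates(genes, disease)
in_10mb = rank_candidates(genes, disease, chrom="chr1", center_mb=11.0, window_mb=10.0)
```

Narrowing the window removes distant genes before ranking, so a gene with fewer shared features can move up the list once the search is restricted to a smaller area.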