Christian X Weichenberger, Hagen Blankenburg, Antonia Palermo, Yuri D'Elia, Eva König, Erik Bernstein, Francisco S Domingues.
Abstract
BACKGROUND: During the last decade, many valuable large-scale genomics and proteomics datasets have become available to the research community. In addition, falling costs of high-throughput sequencing experiments, and the option to outsource them, have contributed considerably to the growing number of researchers active in this field. Even though various computational approaches have been developed to analyze these data, analysis remains a laborious task that requires careful integration of many heterogeneous and frequently updated data sources. This creates a barrier for interested scientists who want to carry out their own analyses.
Year: 2015 PMID: 26691694 PMCID: PMC4687148 DOI: 10.1186/s12864-015-2279-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Hierarchical setup of the Dintor framework. At the top level, a Galaxy web server provides access to all available tools through a graphical user interface. The web interface is built on a collection of Unix shell command-line tools, which come with detailed help pages. These tools can be further divided into two large subgroups: one is dedicated to querying the Ensembl database and is written in Perl; the other contains the remaining modules, which are implemented in Python. All modules either access external or internal relational databases, such as Ensembl or the Gene Ontology database, or operate on locally stored text files provided with the distribution. For privacy or performance reasons, Dintor can be configured to access only local data. The associations of these animal drawings with the respective programming languages are protected trademarks of O’Reilly Media, Inc. Used with permission.
Fig. 2 Parkinson’s disease GWA annotation pipeline. Shown here is the workflow for processing the Parkinson’s disease (PD) genome-wide association (GWA) input table containing dbSNP identifiers. Gray boxes indicate tabular text files. Boxes with rounded corners and a blue background designate file-processing tools that accept a table as input and extend it by appending new data columns. These tool boxes are labeled with their respective Dintor tool names. Arrows indicate the workflow direction by connecting input and output data files, with the processing tool placed next to the arrow. The pipeline starts by converting the dbSNP identifiers in the original table to GRCh37 genome coordinates and ends with two invocations of Dintor’s fly gene identifier converter, DMGeneIdConverter, to retrieve fly annotation symbols (CG IDs) and VDRC transformant identifiers (Trf IDs).
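Each tool in Fig. 2 takes a table and appends new columns, which is what lets the tools be chained. The pattern can be sketched in Python as follows, assuming tab-separated text with a header row. `read_tsv`, `write_tsv`, `append_column` and the two lookup dictionaries are hypothetical stand-ins (the real Dintor tools query Ensembl and other sources), and the identifiers and values are placeholders, not real dbSNP or Ensembl data.

```python
import csv
import io
from typing import Callable, List, Optional

Row = List[str]


def read_tsv(text: str) -> List[Row]:
    """Parse tab-separated text into rows (first row is the header)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def write_tsv(rows: List[Row]) -> str:
    """Serialize rows back to tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def append_column(rows: List[Row], header: str, key_col: int,
                  lookup: Callable[[str], Optional[str]]) -> List[Row]:
    """Return a new table with one extra column filled by `lookup`.

    `lookup` stands in for a database query (hypothetical); rows without
    a match receive "NA" instead of being dropped.
    """
    out = [rows[0] + [header]]
    for row in rows[1:]:
        value = lookup(row[key_col])
        out.append(row + [value if value is not None else "NA"])
    return out


# Placeholder lookups, NOT real dbSNP/Ensembl data.
SNP_TO_POS = {"rsA": "1:1000", "rsB": "2:2000"}
POS_TO_GENE = {"1:1000": "GENE1"}

if __name__ == "__main__":
    table = read_tsv("snp_id\nrsA\nrsB\n")
    table = append_column(table, "position", 0, SNP_TO_POS.get)  # ID -> coordinate
    table = append_column(table, "gene", 1, POS_TO_GENE.get)     # coordinate -> gene
    print(write_tsv(table), end="")
```

Every step returns the input table plus one column, so steps compose as long as each one's key column already exists. In this sketch, unmatched rows get `NA` rather than being dropped, which keeps the row count stable along the pipeline.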
Fig. 3 Variant annotation pipeline. This figure illustrates the pipeline for processing three genetic variants that were identified by exome sequencing as potentially causative de novo point mutations in sporadic autism spectrum disorders. The symbols used in this figure are the same as in Fig. 2. The analysis starts by lifting the genomic coordinates of the three point mutations from the originally provided NCBI36 assembly to GRCh37. Conservation and variant consequence information is added before the affected genes are identified. Finally, pharmacological information is retrieved for each of the three proteins affected by the point mutations.
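The first step in Fig. 3 lifts coordinates from NCBI36 to GRCh37. The caption does not say how Dintor performs this step. A common approach maps positions through gap-free aligned blocks between two assemblies, as in UCSC chain files. The sketch below illustrates only that block-mapping idea. It assumes a single chromosome, forward strand only, 0-based coordinates, and hypothetical block values (not real NCBI36-to-GRCh37 alignments). `lift_position` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Each block: (source_start, target_start, length), 0-based, half-open,
# sorted by source_start. Values are hypothetical, not real alignment data.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based source position to the target assembly.

    Returns None if the position falls before the first block or in a gap
    between aligned blocks, i.e. it cannot be lifted.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src_start, tgt_start, length = blocks[i]
    if pos < src_start + length:
        return tgt_start + (pos - src_start)
    return None


BLOCKS = [(0, 100, 500), (600, 750, 1000)]

if __name__ == "__main__":
    print(lift_position(BLOCKS, 10))   # 110
    print(lift_position(BLOCKS, 550))  # None (falls in the gap)
    print(lift_position(BLOCKS, 700))  # 850
```

Returning `None` for unmappable positions, rather than guessing, mirrors how liftover tools typically report positions that cannot be converted.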
Results from the acute lymphoblastic leukemia gene set enrichment analysis based on the GO biological process ontology
| GO termᵃ | p-valueᶜ | GO term name |
|---|---|---|
|  | 2.01 × 10⁻⁵ | Regulation of apoptotic process |
| GO:0006915ᵇ | 6.29 × 10⁻⁵ | Apoptotic process |
|  | 1.95 × 10⁻⁴ | Positive regulation of cell death |
| GO:0008219ᵇ | 3.57 × 10⁻⁴ | Cell death |
|  | 5.23 × 10⁻⁴ | Positive regulation of apoptotic process |
| GO:0043402 | 2.49 × 10⁻³ | Glucocorticoid mediated signaling pathway |
| GO:1902532 | 4.78 × 10⁻³ | Negative regulation of intracellular signal transduction |
|  | 1.42 × 10⁻² | Response to organic substance |
| GO:2000271 | 2.61 × 10⁻² | Positive regulation of fibroblast apoptotic process |
| GO:0007517 | 2.82 × 10⁻² | Muscle organ development |
| GO:0090073ᵇ | 2.82 × 10⁻² | Positive regulation of protein homodimerization activity |
|  | 3.13 × 10⁻² | Negative regulation of signal transduction |
| GO:0007519 | 3.29 × 10⁻² | Skeletal muscle tissue development |
| GO:0014902 | 3.33 × 10⁻² | Myotube differentiation |
| GO:0009966 | 4.24 × 10⁻² | Regulation of signal transduction |
| GO:0043523 | 4.24 × 10⁻² | Regulation of neuron apoptotic process |
| GO:0045663 | 4.24 × 10⁻² | Positive regulation of myoblast differentiation |
| GO:0048011 | 4.24 × 10⁻² | Neurotrophin TRK receptor signaling pathway |
| GO:0048741 | 4.24 × 10⁻² | Skeletal muscle fiber development |
| GO:0002260 | 4.50 × 10⁻² | Lymphocyte homeostasis |
| GO:0046426 | 4.50 × 10⁻² | Negative regulation of JAK-STAT cascade |
| GO:1901216 | 4.50 × 10⁻² | Positive regulation of neuron death |
| GO:0021542 | 4.83 × 10⁻² | Dentate gyrus development |
| GO:0014070 | 4.88 × 10⁻² | Response to organic cyclic compound |

ᵃ GO terms emphasized in bold letters refer to terms that have been listed in Table 2 of [46]
ᵇ Enriched terms found by our GSE tool when carrying out the analysis with GO data from January 2012 (date of publication)
ᶜ Listed p-values are Benjamini-Hochberg adjusted and restricted to values lower than 0.05
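The table footnotes state that the listed p-values are Benjamini-Hochberg adjusted and filtered at 0.05, but the unadjusted test used by the GSE tool is not given here. The sketch below assumes a one-sided hypergeometric (over-representation) test per GO term, followed by Benjamini-Hochberg adjustment and the 0.05 cutoff. It also assumes the annotation sets are already propagated up the GO hierarchy. Function names, gene identifiers and term labels are hypothetical, and this is not Dintor's implementation.

```python
from math import comb
from typing import Dict, List, Sequence, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(study: Set[str], background: Set[str],
           annotations: Dict[str, Set[str]],
           alpha: float = 0.05) -> Dict[str, float]:
    """Return {term: BH-adjusted p-value} for terms with adjusted p < alpha."""
    study = study & background
    N, n = len(background), len(study)
    terms = sorted(annotations)
    pvals = []
    for term in terms:
        annotated = annotations[term] & background
        k = len(study & annotated)
        pvals.append(hypergeom_upper_tail(k, N, len(annotated), n))
    adjusted = bh_adjust(pvals)
    return {t: q for t, q in zip(terms, adjusted) if q < alpha}


if __name__ == "__main__":
    # Toy data with hypothetical gene and term names.
    background = {f"G{i}" for i in range(20)}
    study = {"G0", "G1", "G2", "G3"}
    annotations = {"TERM_A": {f"G{i}" for i in range(5)},
                   "TERM_B": {f"G{i}" for i in range(10, 15)}}
    print(enrich(study, background, annotations))
```

In the toy run, TERM_A contains all four study genes and survives the cutoff, while TERM_B shares none of them and gets an adjusted p-value of 1.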
Performance of the Dintor gene prioritization tool compared with results reported in [40]
| Tool nameᵃ | Response rate | TPR in top 5 % | TPR in top 10 % | TPR in top 30 % | Median |
|---|---|---|---|---|---|
| Candid [ | 100 % | 21.4 % | 33.3 % | 64.3 % | 18.11 |
| Dintor | 100 % | 31.0 % | 42.9 % | 59.5 % | 23.62 |
| Endeavour-CS [ | 100 % | 26.2 % | 42.9 % | 90.5 % | 11.16 |
| Endeavour-GW [ | 100 % | 28.6 % | 38.1 % | 71.4 % | 15.49 |
| GeneDistiller [ | 97.6 % | 26.2 % | 47.6 % | 78.6 % | 11.11 |
| GeneWanderer-DK [ | 88.1 % | 11.9 % | 21.4 % | 52.4 % | 22.97 |
| GeneWanderer-RW [ | 95.2 % | 16.7 % | 26.2 % | 61.9 % | 22.11 |
| Pinta-CS [ | 100 % | 28.6 % | 31.0 % | 71.4 % | 18.87 |
| Pinta-GW [ | 100 % | 26.2 % | 31.0 % | 71.4 % | 19.03 |
| ToppGene [ | 97.6 % | 35.7 % | 42.9 % | 52.4 % | 16.80 |
Tools reported to have a response rate lower than 80 % were not included. Abbreviations: CS, candidate set; GW, genome-wide; TPR, true positive rate.
ᵃ Tool names were taken from [40]; references associated with the tools are given in square brackets next to their names. The table is sorted alphabetically by tool name.
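The table does not define its columns precisely, and [40] may use different conventions. The sketch below assumes that each benchmark case yields the rank of the known disease gene as a percentage of its candidate list, where smaller is better. Under that assumption, the columns could be computed as follows:

- Response rate: the share of cases with any ranking.
- TPR in top x %: the share of all cases whose gene falls within the top x %.
- Median: the median rank percentage over answered cases.

All names and example numbers below are hypothetical.

```python
from statistics import median
from typing import Dict, Optional, Sequence


def rank_percentage(rank: int, n_candidates: int) -> float:
    """Rank of the known disease gene as a percentage of the list (1 = top)."""
    return 100.0 * rank / n_candidates


def summarize(percentages: Sequence[Optional[float]],
              cutoffs: Sequence[int] = (5, 10, 30)) -> Dict[str, float]:
    """Summarize one tool over all benchmark cases.

    Each entry is a rank percentage, or None if the tool returned no ranking
    for that case. TPR denominators use all cases (an assumption).
    """
    total = len(percentages)
    answered = [p for p in percentages if p is not None]
    summary = {"response_rate": 100.0 * len(answered) / total}
    for c in cutoffs:
        hits = sum(1 for p in answered if p <= c)
        summary[f"tpr_top_{c}"] = 100.0 * hits / total
    summary["median"] = median(answered) if answered else float("nan")
    return summary


if __name__ == "__main__":
    # Hypothetical benchmark: five cases, one without a ranking.
    cases = [rank_percentage(1, 100), rank_percentage(8, 100),
             rank_percentage(25, 100), None, rank_percentage(60, 100)]
    print(summarize(cases))
    # {'response_rate': 80.0, 'tpr_top_5': 20.0, 'tpr_top_10': 40.0,
    #  'tpr_top_30': 60.0, 'median': 16.5}
```

Counting unanswered cases in the TPR denominator penalizes tools with lower response rates. Dividing by answered cases only would instead report precision among responses, so the choice matters when comparing rows such as GeneWanderer-DK (88.1 % response) against tools with 100 %.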