Heonjong Han¹, Hongseok Shim¹, Donghyun Shin¹, Jung Eun Shim¹, Yunhee Ko¹, Junha Shin¹, Hanhae Kim¹, Ara Cho¹, Eiru Kim¹, Tak Lee¹, Hyojin Kim¹, Kyungsoo Kim¹, Sunmo Yang¹, Dasom Bae¹, Ayoung Yun¹, Sunphil Kim¹, Chan Yeong Kim¹, Hyeon Jin Cho¹, Byunghee Kang¹, Susie Shin¹, Insuk Lee¹.
Abstract
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcription factors (TFs) and their target genes from high-throughput data, and evaluating their performance requires gold-standard interactions. Here we present TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), a database of literature-curated human TF-target interactions that currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed to make manual curation of regulatory interactions from approximately 20 million Medline abstracts efficient. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also offers several useful features: i) information about the mode of regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inference of TFs that cooperate with a query TF; and v) prioritization of pathways and diseases associated with a query TF. TF-target pairs in TRRUST were highly enriched among the top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
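The abstract does not describe the sentence-based text-mining step in detail. As a rough illustration only, assume a sentence is passed to curators when it mentions a TF symbol, a different gene symbol, and a regulatory verb. The keyword list, symbol sets and naive sentence splitter below are hypothetical, not TRRUST's actual lexicon or pipeline; the final regulatory call (including mode of regulation) is still made by a human reader.

```python
import re

# Hypothetical regulatory keywords; the real curation lexicon is not given in the abstract.
REG_KEYWORDS = {
    "activate", "activates", "induce", "induces", "repress", "represses",
    "inhibit", "inhibits", "upregulate", "upregulates",
    "downregulate", "downregulates", "regulate", "regulates",
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_sentences(abstract, tf_symbols, gene_symbols):
    """Return (sentence, TFs, genes) for sentences that mention a TF,
    a different gene symbol, and at least one regulatory keyword."""
    hits = []
    for sentence in split_sentences(abstract):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        lowered = {t.lower() for t in tokens}
        tfs = tokens & tf_symbols                # case-sensitive symbol match
        genes = (tokens & gene_symbols) - tfs    # exclude the TF itself
        if tfs and genes and lowered & REG_KEYWORDS:
            hits.append((sentence, sorted(tfs), sorted(genes)))
    return hits
```

For example, in "TP53 activates CDKN1A transcription. BRCA1 is a tumor suppressor." only the first sentence would be kept as a candidate, because the second one names no target gene.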
Year: 2015 PMID: 26066708 PMCID: PMC4464350 DOI: 10.1038/srep11432
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Overview of how the TRRUST database was constructed by manually curating Medline abstracts with the aid of a sentence-based text-mining approach. GS, gold standard. (b) Venn diagram of the overlap among TF-target regulatory interactions from four literature-curated databases: TRRUST, TRED-LC (literature-curated interactions of TRED), HTRIdb-LC (literature-curated interactions of HTRIdb), and TFactS.
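The caption does not say how interactions were matched across databases. A minimal sketch, assuming each database is reduced to a set of directed (TF, target) gene-symbol pairs in a shared nomenclature, counts the pairs that fall in each exclusive region of a Venn diagram such as Figure 1b. The function name and example pairs are hypothetical.

```python
def venn_regions(db_sets):
    """Count interactions in each exclusive Venn region.

    db_sets maps a database name to a set of (TF, target) tuples.
    Returns {frozenset(database names): count}, where count is the number
    of pairs found in exactly that combination of databases.
    """
    membership = {}
    for name, pairs in db_sets.items():
        for pair in pairs:
            membership.setdefault(pair, set()).add(name)
    regions = {}
    for names in membership.values():
        key = frozenset(names)
        regions[key] = regions.get(key, 0) + 1
    return regions
```

Treating pairs as directed means that A regulating B and B regulating A count as two different interactions, which matches how TF-target databases are usually read.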
A summary of TRRUST and four other databases of literature-curated TF-target regulatory interactions in human.

| Database | TF genes | Non-TF genes | TF-target interactions |
|---|---|---|---|
| TRRUST | 748 | 1,975 | 8,015 |
| TFactS | 277 | 1,932 | 4,311 |
| TRED-LC | 119 | 1,582 | 3,332 |
| HTRIdb-LC | 282 | 1,358 | 2,284 |
| ORegAnno | 67 | 122 | 202 |
*Only literature-curated (LC) interactions in the database were considered for this study.
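The counts in the table can be reproduced from a flat list of interactions. A sketch, assuming a headerless tab-separated file with columns TF, target, mode of regulation and PMIDs, and assuming that a "TF gene" is any symbol in the TF column while a "non-TF gene" is a target never seen in that column. The original study may instead have used an external TF catalogue, so the numbers can differ.

```python
import csv
import io


def summarize(tsv_text):
    """Return (n_tf_genes, n_non_tf_genes, n_interactions).

    Assumed columns (no header): TF, target, mode of regulation, PMIDs.
    Rows that differ only in mode or PMID collapse into one interaction.
    """
    pairs = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.add((row[0], row[1]))
    tfs = {tf for tf, _ in pairs}
    non_tf_targets = {target for _, target in pairs} - tfs
    return len(tfs), len(non_tf_targets), len(pairs)
```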
Figure 2. (a) Network of TFs (red nodes) and non-TF genes (green nodes) built from the regulatory interactions in TRRUST. (b) Number of TFs in each of two classes defined by the modularity of their targets. Only TFs with more than five target genes were analyzed: 213 TFs have modular targets and 62 TFs have non-modular targets. (c) Number of target genes in each of two classes defined by the cooperativity of the TFs that regulate them. Only target genes regulated by more than five TFs were analyzed: 344 target genes are regulated by cooperative TFs and 53 by disjoint TFs.
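The caption does not state which statistical test separates modular from non-modular targets, or cooperative from disjoint TFs. One common way to ask the question is an empirical permutation test: count the network links inside a gene set and compare that with random sets of the same size. The sketch below assumes this approach; for modularity the set would be a TF's targets and the edges a functional network such as HumanNet, and for cooperativity the set would be the TFs of one target and the edges protein-protein interactions. Function names, the permutation count and any significance cutoff are hypothetical.

```python
import random


def count_links(genes, edges):
    """Number of edges whose two endpoints both lie in `genes`."""
    return sum(1 for a, b in edges if a in genes and b in genes)


def within_set_pvalue(gene_set, edges, universe, n_perm=1000, seed=0):
    """Empirical P value that a random gene set of equal size, drawn from
    `universe`, has at least as many internal links as `gene_set`."""
    rng = random.Random(seed)
    observed = count_links(set(gene_set), edges)
    pool = sorted(universe)
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(pool, len(gene_set)))
        if count_links(random_set, edges) >= observed:
            hits += 1
    # +1 pseudocount so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

Following the caption, only sets with more than five genes would be tested. A set would be called modular (or cooperative) when its P value falls below a chosen threshold.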
Figure 3. Selected screenshots of TRRUST search results for the example query gene BRCA1.
(a) Functional network of BRCA1 target genes based on HumanNet links. (b) Physical interaction network of the TFs that regulate BRCA1, based on literature-curated protein-protein interactions from major databases. (c) Network of TFs predicted to cooperate with BRCA1, based on literature-curated protein-protein interactions from major databases. (d) Disease Ontology terms prioritized for BRCA1. The top three associated diseases (breast carcinoma, prostate carcinoma, and malignant neoplasm of pancreas) are all supported by the literature.
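How diseases are prioritized for a query TF is not described here. A widely used option, assumed for this sketch, is a one-sided hypergeometric test: rank each annotation term (for example a Disease Ontology term) by how unlikely the overlap between its genes and the query TF's target genes would be by chance. The function names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, n_universe, n_annotated, n_query):
    """P(X >= k), where X is the overlap between a random query set of
    size n_query and an annotated set of size n_annotated."""
    total = comb(n_universe, n_query)
    upper = min(n_annotated, n_query)
    return sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_query - i)
        for i in range(k, upper + 1)
    ) / total


def rank_terms(query_genes, term_to_genes, universe):
    """Rank terms by enrichment of query genes; terms with no overlap are skipped."""
    query = set(query_genes) & universe
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        k = len(query & annotated)
        if k == 0:
            continue
        p = hypergeom_sf(k, len(universe), len(annotated), len(query))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])  # smallest P value first
```

The same routine could rank pathways instead of diseases by passing a pathway-to-genes mapping.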
Figure 4. Scatter plots of algorithm scores (x-axis) against the enrichment fold of TRRUST (a,b), TFactS (c,d), TRED-LC (e,f), and HTRIdb-LC (g,h) gene pairs (y-axis) in inferred human TRNs. TF-target interactions inferred from ChIP-chip/seq data in the hmChIP database were scored with the ChIPXpress algorithm (a), and those inferred from a series of microarray samples in the Gene Expression Omnibus database (GSE14764) were scored with the GENIE3 algorithm (b). The enrichment fold was measured for each successive bin of 1,000 links sorted by algorithm score. For all tested databases, a sigmoidal curve gave the best regression between algorithm scores and the enrichment of benchmark TF-target interactions. TRRUST shows substantially better correlation for the hmChIP-ChIPXpress (Fig. 4a, r = 0.74) and GSE14764-GENIE3 (Fig. 4b, r = 0.48) TRNs than the other databases do (Fig. 4c–h). All benchmarking analyses used the 100,000 most significant TF-target interactions, and the logarithm of the original ChIPXpress score was used because its distribution is heavily skewed toward the low range.
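The binning described in the Figure 4 caption can be sketched as follows. The caption does not define the enrichment fold, so this sketch assumes it is the fraction of gold-standard pairs in a bin divided by the fraction among all scored links. The optional log transform mirrors the ChIPXpress preprocessing and requires strictly positive scores. A sigmoidal fit of the resulting points (for example with `scipy.optimize.curve_fit`) is left out to keep the block dependency-free.

```python
import math


def binned_enrichment(scored_links, gold_pairs, bin_size=1000, log_scores=False):
    """Enrichment fold of gold-standard pairs in successive score-sorted bins.

    scored_links: list of (tf, target, score); gold_pairs: set of (tf, target).
    Returns [(mean_score, fold), ...] with the highest-scoring bin first.
    """
    links = sorted(scored_links, key=lambda x: x[2], reverse=True)
    hits = [1 if (tf, tg) in gold_pairs else 0 for tf, tg, _ in links]
    if not links or sum(hits) == 0:
        return []  # enrichment is undefined without any gold-standard hits
    background = sum(hits) / len(links)  # assumed baseline hit rate
    out = []
    for start in range(0, len(links), bin_size):
        chunk = links[start:start + bin_size]
        bin_hits = hits[start:start + bin_size]
        scores = [math.log(s) if log_scores else s for _, _, s in chunk]
        fold = (sum(bin_hits) / len(chunk)) / background
        out.append((sum(scores) / len(scores), fold))
    return out
```

With `bin_size=1000` and the top 100,000 interactions, this yields 100 points per benchmark database, one for each bin.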