Tzu-Hsien Yang, Chung-Yu Wang, Hsiu-Chun Tsai, Ya-Chiao Yang, Cheng-Tse Liu.
Abstract
Cells adapt to environmental stresses mainly through transcriptional reprogramming. Correct transcriptional control is mediated by interactions between transcription factors (TFs) and their target genes. These TF-gene associations can be probed by chromatin immunoprecipitation techniques and by knockout experiments, which reveal TF binding (TFB) and TF regulatory (TFR) evidence, respectively. However, most of this evidence remains scattered across the literature, and curating it requires substantial human effort. We developed YTLR (Yeast Transcription-regulation Literature Reader), the first pipeline to automate the extraction of TF-gene relations from the literature. YTLR first identifies articles that contain TFB or TFR evidence. It then extracts TF-gene binding pairs from the TFB articles and recognizes TF-gene regulatory associations in the TFR articles. On the assembled test sets, YTLR achieves an AUC of 98.8% in identifying articles with TFB evidence and 83.4% in extracting the detailed TF-gene binding pairs. Similarly, it achieves an AUC of 98.2% in identifying TFR articles and 80.4% in extracting the detailed TF-gene regulatory associations. YTLR also outperforms previous methods on both tasks. To help researchers extract TF-gene transcriptional relations from large sets of queried articles, we built an automated, easy-to-use software tool based on the YTLR pipeline. In summary, YTLR aims to make literature pre-screening easier for curators and to help researchers gather conclusions about yeast TF-gene transcriptional relations from articles in a high-throughput fashion. The YTLR software tool can be downloaded at https://github.com/cobisLab/YTLR/.
Keywords: BioBERT; Natural language processing; Transcriptional regulation
Year: 2022 PMID: 36090812 PMCID: PMC9449546 DOI: 10.1016/j.csbj.2022.08.041
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
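Phase II of YTLR scores candidate TF-gene pairs within an article. The abstract does not say how those candidates are enumerated, so the Python sketch below assumes one simple scheme: pair every TF name with every gene name that co-occurs in the same sentence. The lexicons `TF_NAMES` and `GENE_NAMES`, the naive sentence splitter, and `candidate_pairs` are hypothetical illustrations, not part of the YTLR tool.

```python
import re
from itertools import product

# Hypothetical toy lexicons; YTLR's actual name lists are not described in this record.
TF_NAMES = {"GCN4", "MSN2", "HAP4"}
GENE_NAMES = {"HIS3", "CTT1", "COX6", "GCN4"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def candidate_pairs(text: str) -> set[tuple[str, str]]:
    """Return (TF, gene) pairs whose names co-occur in at least one sentence."""
    pairs = set()
    for sentence in split_sentences(text):
        # Yeast standard names are upper-case tokens such as GCN4; match whole words.
        tokens = set(re.findall(r"\b[A-Z][A-Z0-9]+\b", sentence))
        for tf, gene in product(tokens & TF_NAMES, tokens & GENE_NAMES):
            if tf != gene:  # a name that is both a TF and a gene should not pair with itself
                pairs.add((tf, gene))
    return pairs


print(candidate_pairs("GCN4 binds the HIS3 promoter. MSN2 induces CTT1 under stress."))
```

The co-occurrence rule only proposes candidates; a realistic version would also need aliases and systematic ORF names, and deciding which candidates are actually supported is the job of the Phase II classifier.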
Fig. 1. Overview of the YTLR pipeline for identifying TF-gene binding pairs and TF-gene regulatory associations. Similar architectures were built and run separately in YTLR for the two types of transcriptional TF-gene relations (TF-gene binding relations from the TFB literature and TF-gene regulatory associations from the TFR literature). The two independent pipelines are denoted with brackets (i.e., pipelines for extracting TF-gene binding [regulatory] pairs from the TFB [TFR] articles).
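Fig. 1 describes two structurally similar pipelines, one for TFB and one for TFR evidence, each running Phase I (article identification) before Phase II (pair recognition). The Python sketch below expresses only that control flow. `RelationPipeline`, its 0.5 thresholds, and the keyword scorers are hypothetical stand-ins for YTLR's trained models (the record's keywords mention BioBERT), chosen so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

Pair = tuple[str, str]  # (TF, gene)


@dataclass
class RelationPipeline:
    """One relation type: a Phase I article filter followed by a Phase II pair filter."""
    name: str
    article_scorer: Callable[[str], float]     # hypothetical Phase I model score
    pair_scorer: Callable[[str, Pair], float]  # hypothetical Phase II model score
    article_threshold: float = 0.5
    pair_threshold: float = 0.5

    def run(self, article: str, candidates: set[Pair]) -> list[Pair]:
        # Phase I gates Phase II: an article lacking this evidence type yields no pairs.
        if self.article_scorer(article) < self.article_threshold:
            return []
        # Phase II keeps candidate pairs whose score reaches the threshold.
        return sorted(p for p in candidates
                      if self.pair_scorer(article, p) >= self.pair_threshold)


# Toy keyword stand-ins for trained classifiers (illustrative only).
def mentions_chip(article: str) -> float:
    return 1.0 if "ChIP" in article else 0.0


def mentions_deletion(article: str) -> float:
    return 1.0 if "deletion" in article else 0.0


def both_named(article: str, pair: Pair) -> float:
    tf, gene = pair
    return 1.0 if tf in article and gene in article else 0.0


# The same architecture is instantiated twice and run independently.
tfb_pipeline = RelationPipeline("TFB", mentions_chip, both_named)
tfr_pipeline = RelationPipeline("TFR", mentions_deletion, both_named)

article = "ChIP-chip data show GCN4 bound upstream of HIS3."
print(tfb_pipeline.run(article, {("GCN4", "HIS3")}))
print(tfr_pipeline.run(article, {("GCN4", "HIS3")}))
```

Keeping the two pipelines as separate instances of one class mirrors the caption's point that the architectures are similar but run independently: the binding-only toy article above yields a pair from `tfb_pipeline` and nothing from `tfr_pipeline`.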
The performance summary of YTLR Phase I in identifying articles with TFB and TFR evidence on the test sets.
| Phase I: evidence literature identification | AUC | F1 | Precision | Recall | Specificity |
|---|---|---|---|---|---|
| Articles with TFB evidence | 98.8% | 94.4% | 98.7% | 90.5% | 98.8% |
| Articles with TFR evidence | 98.2% | 92.4% | 93.2% | 91.6% | 93.4% |
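Precision, recall (sensitivity), specificity, and F1 are threshold-dependent metrics derived from a binary confusion matrix, whereas AUC summarizes performance across all thresholds. The Python helper below shows the standard definitions; the counts passed to it are made up for illustration and are not YTLR's test-set counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard threshold-dependent metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)    # fraction of predicted positives that are correct
    recall = tp / (tp + fn)       # fraction of actual positives recovered (sensitivity)
    specificity = tn / (tn + fp)  # fraction of actual negatives correctly rejected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```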
The performance summary of YTLR Phase II in recognizing the TF-gene binding pairs and regulatory relations on the test sets.
| Phase II: TF-gene association recognition | AUC | F1 | Precision | Recall | Specificity |
|---|---|---|---|---|---|
| TF-gene binding pairs (TFB articles) | 83.4% | 81.9% | 80.5% | 83.3% | 79.7% |
| TF-gene regulatory associations (TFR articles) | 80.4% | 80.7% | 75.4% | 86.8% | 71.8% |
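Since F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), each row of the two performance tables can be checked for internal consistency. The short Python script below recomputes F1 from the reported precision and recall; the row labels follow the AUC values given in the abstract, and each recomputed value matches the reported F1 to one decimal place.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent here)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, taken from the two performance tables.
REPORTED = {
    "Phase I, TFB articles": (98.7, 90.5, 94.4),
    "Phase I, TFR articles": (93.2, 91.6, 92.4),
    "Phase II, TF-gene binding pairs": (80.5, 83.3, 81.9),
    "Phase II, TF-gene regulatory associations": (75.4, 86.8, 80.7),
}

for task, (p, r, reported) in REPORTED.items():
    print(f"{task}: recomputed {f1(p, r):.1f}% vs reported {reported}%")
```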
Fig. 2. Comparison of test ROC curves among tools that identify articles with (a) TFB or (b) TFR evidence.
Fig. 3. Comparison of test ROC curves among methods for recognizing (a) TF-gene binding pairs in articles with TFB evidence and (b) TF-gene regulatory pairs in articles with TFR evidence.
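Figs. 2 and 3 compare methods by their ROC curves, and the AUC values in the tables are the areas under such curves. As a general sketch of how these quantities are computed (not YTLR's evaluation code), the Python below traces an ROC curve by sweeping the decision threshold over toy scores, integrates it with the trapezoidal rule, and cross-checks the result against the rank-based definition of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) points from sweeping the decision threshold from high to low scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together: one point per distinct score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(labels: list[int], scores: list[float]) -> float:
    """P(random positive outscores random negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not YTLR outputs): 1 = has the evidence, 0 = does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(round(auc_trapezoid(pts), 4), round(auc_rank(labels, scores), 4))  # prints 0.8889 0.8889
```

Grouping tied scores into a single step makes the trapezoidal area equal to the rank-based AUC with half credit for ties, which is why the two functions agree.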