Boris Jankovic, Takashi Gojobori.
Abstract
Identification of genomic signals as indicators for functional genomic elements is one of the areas that received early and widespread application of machine learning methods. With time, the methods applied grew in variety and generally exhibited a tendency to improve their ability to identify some major genomic and transcriptomic signals. The evolution of machine learning in genomics followed a similar path to applications of machine learning in other fields. These applications were shaped in a major way by three dominant developments: an enormous increase in the availability and quality of data, a significant increase in the computational power available to machine learning applications, and new machine learning paradigms, of which deep learning is the best-known example. It is not easy in general to distinguish the factors leading to improvements in the results of machine learning applications. This is even more so in the field of genomics, where the advent of next-generation sequencing and the increased ability to perform functional analysis of raw data have had a major effect on the applicability of machine learning in OMICS fields. In this paper, we survey the results from a subset of published work on the application of machine learning to the recognition of genomic signals and regions in the human genome and summarize some lessons learnt from this endeavor. There is no doubt that significant progress has been made in terms of both accuracy and reliability of models. Questions remain, however, as to whether the progress has been sufficient and what these developments bring to the field of genomics in general and human genomics in particular. Improving the usability, interpretability and accuracy of models remains an important open challenge for current and future research in the application of machine learning, and more generally of artificial intelligence methods, in genomics.
Keywords: Artificial intelligence; Deep learning; Genomic signals; Genomics; Machine learning; Sequence analysis
Year: 2022 PMID: 35180894 PMCID: PMC8855580 DOI: 10.1186/s40246-022-00376-1
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 Relationship between true signals and signal motifs
Performance of TIS prediction models (Se sensitivity, Sp specificity, Acc accuracy)
| Tool | Year | Se | Sp | Acc |
|---|---|---|---|---|
| Pedersen and Nielsen | 1997 | 65 | 82 | |
| Salzberg | 1997 | 74 | 68 | |
| Zien et al. | 2000 | 76 | 78 | |
| Zeng et al. | 2002 | 76 | 94 | 85 |
| Pertea and Salzberg | 2002 | 84 | | |
| Saeys et al. | 2007 | 80 | 81 | |
| Tikole | 2008 | 83 | 73 | 74 |
| iTIS-PseTNC | 2014 | 78 | | |
| TITER | 2017 | 81 | 90 | 85 |
| DeepGSR | 2018 | 94 | | |
| Goel et al. | 2020 | 77 | 98 | 97 |

Entries with no value are explained in the “Methods” section
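
The Se, Sp and Acc columns in these tables can be related to raw prediction counts. The sketch below assumes the standard confusion-matrix definitions; the individual studies surveyed may define the measures differently (some gene-prediction work, for instance, reports precision under the name specificity). The class name, function names and counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary signal / non-signal classifier (illustrative only)."""
    tp: int  # true signals predicted as signals
    fn: int  # true signals that were missed
    tn: int  # non-signals correctly rejected
    fp: int  # non-signals wrongly predicted as signals


def sensitivity(c: ConfusionCounts) -> float:
    # Se = TP / (TP + FN): fraction of true signals that were found
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Sp = TN / (TN + FP): fraction of non-signals that were rejected
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Acc = (TP + TN) / all predictions
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical balanced test set: 100 true signals and 100 non-signals.
counts = ConfusionCounts(tp=80, fn=20, tn=90, fp=10)

print(f"Se  = {100 * sensitivity(counts):.0f}")
print(f"Sp  = {100 * specificity(counts):.0f}")
print(f"Acc = {100 * accuracy(counts):.0f}")
```

On a test set with equal numbers of positive and negative examples, accuracy under these definitions equals the mean of sensitivity and specificity. On an unbalanced set it is weighted toward the larger class, so the three values are not interchangeable.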
Performance comparison for acceptor and donor site location prediction (Se sensitivity, Sp specificity, Acc accuracy)
| Tool | Year | Signal type | Se | Sp | Acc |
|---|---|---|---|---|---|
| GeneSplicer | 2001 | Acceptor | 69 | 97 | 83 |
| GeneSplicer | 2001 | Donor | 60 | 98 | 79 |
| SplicePredictor | 2004 | Acceptor | 84 | 92 | 88 |
| SplicePredictor | 2004 | Donor | 79 | 97 | 88 |
| Zhang | 2010 | Acceptor | 90 | 90 | |
| Zhang | 2010 | Donor | 93 | 93 | |
| Bari | 2012 | Acceptor | 77 | 89 | 89 |
| Bari | 2012 | Donor | 89 | 97 | 95 |
| Goel | 2015 | Acceptor | 94 | 94 | |
| Goel | 2015 | Donor | 91 | 94 | |
| Wen | 2017 | Acceptor | 93 | | |
| Wen | 2017 | Donor | 92 | | |
| DeepSS | 2018 | Acceptor | 95 | | |
| DeepSS | 2018 | Donor | 95 | | |
| SpliceRover | 2018 | Acceptor | 91 | 97 | 95 |
| SpliceRover | 2018 | Donor | 90 | 96 | 96 |
| Splice2Deep | 2020 | Acceptor | 98 | 95 | 97 |
| Splice2Deep | 2020 | Donor | 99 | 96 | 97 |

Entries with no value are explained in the “Methods” section
Performance evolution of poly(A) tail prediction models, adjusted values (Se sensitivity, Sp specificity, Acc accuracy)
| Tool | Year | Se | Sp | Acc |
|---|---|---|---|---|
| Polyadq | 1999 | 46 | 86 | 65 |
| PolyA Signal Miner | 2003 | 72 | 80 | |
| ERPIN | 2003 | 66 | 88 | 75 |
| PolyA_SVM | 2006 | 56 | 78 | 68 |
| PolyFd/PolyFud | 2009 | 72 | 80 | 78 |
| Polyapred | 2009 | 57 | 86 | |
| Polyar | 2010 | 57 | 50 | 53 |
| Chang et al. | 2011 | 56 | 90 | 75 |
| DPS-ANN | 2012 | 78 | | |
| HMM-SVM | 2013 | 80 | 87 | 81 |
| DSET | 2015 | 86 | 86 | 86 |
| Omni_PolyA | 2018 | 80 | | |
| DeepGSR | 2019 | 84 | | |
| DeeReCT-PolyA | 2019 | 84 | | |

Entries with no value are explained in the “Methods” section
Fig. 2 Performance of TIS prediction models
Fig. 3 Performance comparison for acceptor site location prediction
Fig. 4 Performance comparison for donor site location prediction
Fig. 5 Performance evolution of poly(A) tail prediction models