Mohsen Hesami, Milad Alizadeh, Andrew Maxwell Phineas Jones, Davoud Torkamaneh.
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of plant data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics). To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate these different omics datasets. Although recent advances in computational analysis pipelines have enabled efficient, high-quality exploration and exploitation of single-omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. Machine learning (ML) offers promising approaches for integrating large datasets and recognizing fine-grained patterns and relationships; nevertheless, these approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of ML, as well as the key challenges and solutions related to the big data derived from plant systems biology. We also provide in-depth insight into the principles of data integration using ML, along with the challenges and opportunities in different contexts, including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
• The key challenges and solutions related to the big data derived from plant systems biology have been highlighted.
• Different methods of data integration have been discussed.
• Challenges and opportunities of applying machine learning in plant systems biology have been highlighted and discussed.
Keywords: Big data; Data integration; Epigenomics; Multi-omics; Plant molecular biology; Prediction; Protein function; Transcription factor
Year: 2022 PMID: 35575915 DOI: 10.1007/s00253-022-11963-6
Source DB: PubMed Journal: Appl Microbiol Biotechnol ISSN: 0175-7598 Impact factor: 4.813