Edwin Baldwin, Jiali Han, Wenting Luo, Jin Zhou, Lingling An, Jian Liu, Hao Helen Zhang, Haiquan Li.
Abstract
Recent years have seen a growing tendency to measure a biological sample on multiple omics scales, for a comprehensive understanding of how biological activities at varying levels are perturbed by genetic variants, environments, and their interactions. This trend raises substantial challenges for data integration and fusion; the latter is a specific type of integration that applies a uniform method in a scalable manner to solve the biological problems that the multi-omics measurements target. Fusion-based analysis has advanced rapidly in the past decade, thanks to application drivers and theoretical breakthroughs in mathematics, statistics, and computer science. We briefly review these methods from methodological and mathematical perspectives and categorize them into three types of approaches: data fusion (a narrower definition than the general data fusion concept), model fusion, and mixed fusion. We present at least one typical example in each category to illustrate the characteristics, principles, and applications of the methods in general, and discuss the gaps and potential issues for future studies.
Keywords: Data fusion; Data integration; Model fusion; Multi-omics
Year: 2020; PMID: 32206210; PMCID: PMC7078495; DOI: 10.1016/j.csbj.2020.02.011
Source DB: PubMed; Journal: Comput Struct Biotechnol J; ISSN: 2001-0370; Impact factor: 7.271
Fig. 1. Relationship between data integration methods and principles of three types of fusion methods. Panel (a) shows the differences between fusion and non-fusion methods in dealing with their samples. Fusion methods are ideal for matched individual samples, although some of them (e.g., network-based model-fusion methods) may also work for different individuals (solid arrow across). While most non-fusion integrations were designed for applications with different individuals, many do work on matched individuals, overlooking the additional information of matched samples (dashed arrow across). Panels (b–d) are our categorization of fusion integration methods, showing the differences in data access and modeling.
Fig. 2. Categorization of fusion methods for multi-omics data integration. Methods are categorized by multiple levels and applied problems. Methods spanning two categories are shown across the corresponding boundaries (e.g., PARADIGM and stSVM), while methods usable for multiple problems are shown repeatedly in the same row (e.g., MNMF/SNMNMF and MFA).