Cezary Turek, Sonia Wróbel, Monika Piwowar.
Abstract
The large volume of fragmented biological data held in various databases, and the need to describe the relationships among these data with theoretical methods, drive the development of data integration methods. This paper presents omics data analysis that integrates biological knowledge with mathematical procedures, as implemented in the OmicsON R library. OmicsON is a tool for integrating two sets of data: transcriptomic and metabolomic. The library's workflow applies functional grouping followed by statistical analysis. Subgroups within the transcriptomic and metabolomic sets are created from the biological knowledge stored in the Reactome and STRING databases. This grouping makes it possible to analyze such data sets with multivariate statistical procedures such as Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). Integrating metabolomic and transcriptomic data with the OmicsON methodology makes it easier to obtain information on how the two data sets are connected. This information can significantly help in assessing the relationship between gene expression and metabolite concentrations, which in turn facilitates the biological interpretation of the analyzed process.
Year: 2020 PMID: 32726348 PMCID: PMC7390260 DOI: 10.1371/journal.pone.0235398
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The workflow of lipidomics (metabolomics) and transcriptomics data integration.
Light gray marks steps performed outside the OmicsON library; dark gray marks steps performed with it.
List of genes and fatty acids after functional grouping using Reactome and STRING.
| Lipid IDs (ChEBI) | Gene symbols (HGNC) |
|---|---|
| 28875,28875,73705, 17268,15756,28842, 35465,28716,16196, 36023,32425,36036, 17351,28661,72850, 15843,61205,61204, 27432,28364,28125 | ACAT1,ACAT2,APOB,APOC3,APOE,CBS,CIDEA,CPT2,CYP27A1,CYP27B1,CYP8B1,FAS,GK,LDLR, LPIN2,LPIN3,LPL,MTHFR,PDK4,PEX11A,PLTP,PPARA,PPARD,PPARG,RARA,RXRA,UCP2,UCP3, VDR,VLDLR,DBI,ACACA,ACACB,SSX2IP,ADSSL1,ALDH3A1,ACOX1,PSMB10,ABCB11,BCL3, PRG4,CAR1,COX1,COX2,CYP24A1,CYP26A1,CYP2B10,CYP2B13,CYP2C29,CYP3A11,CYP4A10, CYP4A14,CYP7A1,FAT1,NR1H4,H6PD,G6PC,GLUL,GSTA,GSTM1,GSTP2,HMGCR,Il2,FABP1,ELOVL6,COL2A1,NR1H3,NR1H2,LPIN1,CPT1A,ACADM,ABCB1B,ABCB4,ABCC6,MTR,NR4A1,NR4A2,SLC10A1,SLC22A5,NRF1,ECI2,PON1,NR1I2,RARB,RXRG,THRSP,PTPN6,ST3GAL4,SERPINA1A,SCARB1,LY6D,TRA2B,AATF,TMPO,HADHB,CDKN1A,TFAP2A,APOA1,FOS,ABCC2,EIF2S3X,ABCA1,FABP6,BAAT,FABP2,NOS2,ABCB8 |
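The functional grouping step can be pictured with a minimal sketch. It assumes that a group is simply a pathway that contains at least one measured lipid and at least one measured gene; this is a simplification for illustration, not the OmicsON procedure. The function name `functional_groups`, the pathway names, and all identifiers below are hypothetical placeholders, not content from Reactome or STRING.

```python
def functional_groups(pathways, measured_lipids, measured_genes):
    """Keep pathways that contain both measured lipids and measured genes.

    `pathways` maps a pathway name to {"lipids": set, "genes": set}.
    Assumption: a functional group is the overlap between pathway
    membership and what was measured in the experiment.
    """
    groups = {}
    for name, members in pathways.items():
        lipids = sorted(members["lipids"] & measured_lipids)
        genes = sorted(members["genes"] & measured_genes)
        if lipids and genes:  # a group needs variables from both data sets
            groups[name] = {"lipids": lipids, "genes": genes}
    return groups


# Hypothetical pathway annotations (placeholders, not real database content).
pathways = {
    "pathway_A": {"lipids": {"lipid_1", "lipid_2"}, "genes": {"GENE_1", "GENE_2"}},
    "pathway_B": {"lipids": {"lipid_3"}, "genes": {"GENE_3"}},
    "pathway_C": {"lipids": {"lipid_9"}, "genes": {"GENE_1"}},
}
measured_lipids = {"lipid_1", "lipid_3"}
measured_genes = {"GENE_1", "GENE_2", "GENE_4"}

groups = functional_groups(pathways, measured_lipids, measured_genes)
print(groups)
```

Only `pathway_A` survives here: `pathway_B` has no measured gene and `pathway_C` has no measured lipid, so neither can link the two data sets.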
List of genes and fatty acids after removing correlated variables within each set.
The correlation coefficient (r) cut-offs were chosen arbitrarily: r = 0.6 for genes and r = 0.7 for fatty acids.
| Gene symbols (HGNC) | Lipid IDs (ChEBI) |
|---|---|
| APOB, VLDLR, PSMB10, PRG4, CYP26A1, ECI2, NR1I2, RXRG, CDKN1A, APOA1, NOS2, ABCB8 | 73705, 17268, 15756, 28842, 36023, 32425, 36036, 28661, 61204, 27432, 28364, 28125 |
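The removal of correlated variables within one set can be sketched as follows. This is a minimal illustration, assuming a greedy rule: a variable is kept only if its absolute Pearson correlation with every variable already kept does not exceed the cut-off. The greedy rule, the helper names `pearson` and `drop_correlated`, and the toy measurements are assumptions for illustration; the source does not specify how OmicsON selects which variable of a correlated pair to keep.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, cutoff):
    """Greedy filter (assumed rule): keep a variable only if |r| with
    every already-kept variable is at or below `cutoff`.

    `columns` maps variable name -> measurements over the same samples.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Toy expression values (made up for illustration, not from the paper).
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to g1
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to g1
}
kept_genes = drop_correlated(genes, cutoff=0.6)  # gene cut-off from the text
print(kept_genes)
```

With a greedy rule the result depends on the order in which variables are visited, so a different ordering of the same data may retain a different representative of each correlated cluster.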
Fig 2. The canonical correlation results for one subset of transcriptomic and lipidomic data.
Positive correlations are represented by bars pointing toward the outside of the circle, and negative correlations by bars pointing toward the inside. The height of each bar indicates the strength of the association.
Fig 3. Cross-validated RMSEP curves for the fatty acid variables.
The smaller the prediction error, the better suited a given variable is to the predictive model.
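To make the quantity behind these curves concrete, the sketch below computes a leave-one-out cross-validated root mean squared error of prediction (RMSEP). It uses a one-predictor least-squares line as a stand-in for the PLS model, which is an assumption made to keep the example short; the helper names `fit_line` and `loo_rmsep` and the toy data are hypothetical and do not come from the paper.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def loo_rmsep(x, y):
    """Leave-one-out cross-validated RMSEP.

    Each sample is predicted by a model fitted without it; the squared
    prediction errors are averaged and the square root is taken.
    """
    sq_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        sq_errors.append((y[i] - (a + b * x[i])) ** 2)
    return sqrt(sum(sq_errors) / len(sq_errors))


# Toy data (made up): one predictor and two response variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_well = [2.0, 4.1, 5.9, 8.0, 10.1, 11.9]  # close to linear in x
y_poor = [3.0, 9.0, 1.0, 7.0, 2.0, 8.0]    # little linear relation to x

rmsep_well = loo_rmsep(x, y_well)
rmsep_poor = loo_rmsep(x, y_poor)
print(rmsep_well < rmsep_poor)
```

In this toy setting the response that follows the predictor closely yields the smaller RMSEP, which matches the reading of the figure: a lower curve indicates a variable the model predicts better.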