Shiquan Sun, Xifang Sun, Yan Zheng.
Abstract
BACKGROUND: Extensive studies have shown that gene expression levels are strongly affected by combinations of chromatin marks through at least two mechanisms, activation and repression. However, the combinatorial patterns of these marks remain unclear. To better understand the relationship between histone modifications and gene expression levels, we introduce a purely geometric higher-order representation, the tensor (also called a multidimensional array), which may exploit additional, as yet unknown interactions among chromatin states to predict gene expression levels.
Keywords: Chromatin states; Gene expression levels; Higher-order partial least squares; Histone modification; Tensor decomposition
Year: 2018 PMID: 29671394 PMCID: PMC5907142 DOI: 10.1186/s12859-018-2100-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The process of constructing the tensor data from the peak read enrichments near TSSs
Fig. 2 Schematic diagram of the data analysis process in our paper
Fig. 3 Comparison of four algorithms for predicting gene expression level. The left panel shows the correlation coefficient, averaged over 10 random splitting replicates, for varying values of the parameter β; the right panel shows the root mean square error, averaged over the same 10 replicates, for varying values of β. lr: linear regression; rf: random forest; svr: support vector regression; npls: N-way partial least squares
Fig. 4 The relationship between the number of factors and the performance on three species
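The tensor construction named in the Fig. 1 caption can be pictured as a genes × bins × marks array. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the per-base coverage input, the 100 bp bin width, the averaging within bins, and the names `build_tss_tensor`, `mark_A`, and `mark_B` are all hypothetical; only the bin counts (41 or 21) come from the results table.

```python
import numpy as np

def build_tss_tensor(signals, tss_positions, n_bins=41, bin_width=100):
    """Stack binned signal around each TSS into a genes x bins x marks tensor.

    signals: dict mapping a mark name to a 1-D array of per-base coverage
             (hypothetical input format, single chromosome).
    tss_positions: list of 0-based TSS coordinates on that chromosome.
    bin_width: assumed bin size in bases (not stated in the source).
    """
    marks = sorted(signals)
    half = (n_bins * bin_width) // 2          # window is centred on the TSS
    tensor = np.zeros((len(tss_positions), n_bins, len(marks)))
    for g, tss in enumerate(tss_positions):
        start = tss - half
        for m, mark in enumerate(marks):
            cov = signals[mark]
            for b in range(n_bins):
                lo = max(start + b * bin_width, 0)
                hi = min(start + (b + 1) * bin_width, len(cov))
                if hi > lo:                    # bins off the chromosome stay 0
                    tensor[g, b, m] = cov[lo:hi].mean()
    return tensor, marks

# Synthetic coverage for two placeholder marks; real data would come from
# aligned ChIP-seq reads.
rng = np.random.default_rng(0)
signals = {
    "mark_A": rng.poisson(2.0, 50_000).astype(float),
    "mark_B": rng.poisson(1.0, 50_000).astype(float),
}
X, marks = build_tss_tensor(signals, [5_000, 20_000, 49_900])
print(X.shape)  # (3, 41, 2)
```

Keeping the bin axis separate from the mark axis, rather than flattening both into one feature vector, is what lets a tensor method such as N-way partial least squares model position and mark jointly.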
The performance of different models on three species data sets
| | Linear model | Random forest | Support vector machine | NPLS (41 bins) | NPLS (21 bins) |
|---|---|---|---|---|---|
| Hum | 0.769(2.43) | 0.775(2.46) | 0.774(2.46) | 0.784(2.37) | 0.787(2.35) |
| Chi | 0.756(2.52) | 0.767(2.47) | 0.765(2.53) | 0.780(2.41) | 0.784(2.39) |
| Rhe | 0.760(2.52) | 0.761(2.51) | 0.765(2.54) | 0.774(2.46) | 0.778(2.43) |
Note: The number in parentheses following the average R is the RMSE averaged over 10-fold cross-validation (with 10 random splitting replicates). Hum: human data set; Chi: chimpanzee data set; Rhe: rhesus macaque data set
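The evaluation described in the note, average R and average RMSE over 10-fold cross-validation with 10 random splitting replicates, could be computed roughly as follows. This is a sketch under assumptions: R is taken to be the Pearson correlation between predicted and observed values, scores are averaged per fold (the source does not say whether predictions were pooled instead), an ordinary least-squares fit stands in for the models in the table, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def repeated_kfold_scores(X, y, n_splits=10, n_repeats=10, seed=0):
    """Mean Pearson R and mean RMSE of a least-squares fit over repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(y)), X])    # prepend an intercept column
    rs, rmses = [], []
    for _ in range(n_repeats):                     # one random split per replicate
        folds = np.array_split(rng.permutation(len(y)), n_splits)
        for test_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False                # hold out the current fold
            coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
            pred = X1[test_idx] @ coef
            rs.append(np.corrcoef(pred, y[test_idx])[0, 1])
            rmses.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(rs)), float(np.mean(rmses))

# Synthetic regression problem, only to exercise the function.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)
r_mean, rmse_mean = repeated_kfold_scores(X_demo, y_demo)
print(f"mean R = {r_mean:.3f}, mean RMSE = {rmse_mean:.3f}")
```

To apply the same loop to a genes × bins × marks tensor with a matrix-based model, the tensor would first be unfolded, for example with `X.reshape(X.shape[0], -1)`.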