Guibo Ye, Mengfan Tang, Jian-Feng Cai, Qing Nie, Xiaohui Xie.
Abstract
Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, the high noise of experimental measurements, and the insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets and show that the low-rank regularization improves the accuracy of gene expression prediction in all three.
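To make the linear framework described above concrete, here is a minimal sketch in Python. It assumes the rank constraint is relaxed to the nuclear norm (the standard convex surrogate for rank) and that the problem is solved by proximal gradient descent with singular value thresholding. The abstract states neither the exact objective nor the solver, so the function names `svt` and `fit_low_rank`, the penalty weight `lam`, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def svt(W, tau):
    """Soft-threshold the singular values of W (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fit_low_rank(X, Y, lam, n_iter=500):
    """Approximately minimize 0.5 * ||Y - X W||_F^2 + lam * ||W||_*.

    X: samples x regulators, Y: samples x targets,
    W: regulators x targets (the connectivity matrix).
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = svt(W - step * grad, step * lam)
    return W

# Synthetic example (assumed, not data from the paper):
# 2 hidden regulatory modules connect 20 regulators to 30 targets.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 30))
X = rng.normal(size=(100, 20))
Y = X @ W_true + 0.1 * rng.normal(size=(100, 30))

W_hat = fit_low_rank(X, Y, lam=30.0)
# Count singular values that survived thresholding.
est_rank = int(np.sum(np.linalg.svd(W_hat, compute_uv=False) > 1e-6))
print("estimated rank:", est_rank)
```

In this sketch, a larger `lam` removes more singular values and so yields fewer independent connectivity patterns. In practice the weight would be chosen by cross-validation.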
Year: 2013 PMID: 24358148 PMCID: PMC3866120 DOI: 10.1371/journal.pone.0082146
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Root-mean-squared error (RMSE) comparison among different models on the yeast gene expression data.
| Model | Training error | Testing error |
|---|---|---|
| SiMoNe | — | 1.0074±0.0650 |
| Lasso | 0.3897±0.0045 | 0.6124±0.0583 |
| Linear low-rank | 0.3488±0.0014 | 0.5750±0.0053 |
| Nonlinear low-rank | 0.3249±0.0063 | 0.5752±0.0054 |
“Lasso” represents model (2), “Linear low-rank” represents model (3), and “Nonlinear low-rank” represents model (6) with ANOVA kernel. SiMoNe is the model described by Chiquet et al. [16]. Both training and testing errors are measured in terms of RMSE averaged over all target genes. Shown here are mean ± standard deviation values of RMSEs in ten different runs.
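The error aggregation described in this note can be sketched as follows. This is an illustrative reading, not the authors' code: the helper names `mean_rmse` and `summarize_runs` are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the note does not say which estimator was used.

```python
import numpy as np

def mean_rmse(Y_true, Y_pred):
    """RMSE per target gene (one column each), averaged over all target genes."""
    per_gene = np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))
    return float(per_gene.mean())

def summarize_runs(run_errors):
    """Format per-run errors as 'mean±std' (sample std, an assumption)."""
    errs = np.asarray(run_errors, dtype=float)
    return f"{errs.mean():.4f}±{errs.std(ddof=1):.4f}"

# Toy values (not from the paper): one averaged RMSE per run.
print(summarize_runs([1.0, 2.0, 3.0]))  # prints 2.0000±1.0000
```

Each table entry would then correspond to `summarize_runs` applied to the ten per-run values of `mean_rmse`.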
Figure 1. Comparison of the testing performance of the linear low-rank regularization model vs. Lasso on the yeast gene expression dataset.
Each * indicates one target gene. X-axis represents the test RMSE of the Lasso model, whereas Y-axis represents the test RMSE of the linear low-rank model. The figure shows that the low-rank model yields lower testing error than Lasso for most target genes.
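The per-gene comparison in this scatter plot can be summarized numerically as the share of target genes that fall below the diagonal. The sketch below is only an illustration: `fraction_improved` is a hypothetical helper, and the RMSE values are made up rather than taken from the paper.

```python
import numpy as np

def fraction_improved(rmse_lasso, rmse_low_rank):
    """Share of target genes whose low-rank test RMSE is below the Lasso test RMSE."""
    rmse_lasso = np.asarray(rmse_lasso, dtype=float)
    rmse_low_rank = np.asarray(rmse_low_rank, dtype=float)
    return float(np.mean(rmse_low_rank < rmse_lasso))

# Made-up per-gene test RMSEs for four target genes.
lasso = [0.6, 0.5, 0.7, 0.4]
low_rank = [0.5, 0.55, 0.6, 0.3]
print(fraction_improved(lasso, low_rank))  # prints 0.75
```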
Root-mean-squared error (RMSE) comparison among different models on the human hematopoietic gene expression data.
| Model | Training error | Testing error |
|---|---|---|
| SiMoNe | — | 0.9987±0.0400 |
| Lasso | 0.2345±0.0024 | 0.3881±0.0265 |
| Linear low-rank | 0.1877±0.0030 | 0.3758±0.0265 |
| Nonlinear low-rank | 0.1783±0.0005 | 0.3767±0.0265 |
“Lasso” represents model (2), “Linear low-rank” represents model (3), and “Nonlinear low-rank” represents model (6) with ANOVA kernel. SiMoNe is the model described by Chiquet et al. [16]. Both training and testing errors are measured in terms of RMSE averaged over all target genes. Shown here are mean ± standard deviation values of RMSEs in ten different runs.
Figure 2. Comparison of the testing performance of the linear low-rank regularization model vs. Lasso on the human hematopoietic gene expression dataset.
Each * indicates one target gene. X-axis represents the test RMSE of the Lasso model, whereas Y-axis represents the test RMSE of the linear low-rank model. The figure shows that the low-rank model yields lower testing error than Lasso for most target genes.
Root-mean-squared error (RMSE) comparison among different models on the connectivity map gene expression data.
| Model | Training error | Testing error |
|---|---|---|
| Lasso | 0.4943±0.0008 | 0.7077±0.0134 |
| Linear low-rank | 0.5157±0.0004 | 0.7000±0.0123 |
| Nonlinear low-rank | 0.4025±0.0005 | 0.6772±0.0125 |
“Lasso” represents model (2), “Linear low-rank” represents model (3), and “Nonlinear low-rank” represents model (6) with ANOVA kernel. Both training and testing errors are measured in terms of RMSE averaged over all target genes. Shown here are mean ± standard deviation values of RMSEs in ten different runs.
CPU time of running the linear and nonlinear low-rank models.
| Dataset | Linear low-rank (min) | Nonlinear low-rank (min) |
|---|---|---|
| Yeast gene expression | 1.6 | 0.8 |
| Human hematopoietic gene expression | 1.0 | 0.4 |
| Connectivity map gene expression | 1067 | 869 |