Lei Ding, Gabriel E Zentner, Daniel J McDonald.
Abstract
Motivation: Methods for the global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives, or ignore any unknown grouping structures for the features.
Year: 2022 PMID: 35722206 PMCID: PMC9194947 DOI: 10.1093/bioadv/vbac033
Source DB: PubMed Journal: Bioinform Adv ISSN: 2635-0041
Illustration of the failure of Equation (1) on the AML data
| % sparsity of | 100 | 99.9 | 99.6 | 98.9 | 97.5 | 95.3 |
|---|---|---|---|---|---|---|
| % non-zero | 1.8 | 3.3 | 8.4 | 23.5 | 50.2 | 77.9 |
| False negative rate | 0.000 | 0.431 | 0.778 | 0.921 | 0.963 | 0.976 |
Fig. 1. Graphical depiction of Algorithm 1. Solid colors represent nonzero matrix entries.
Fig. 2. Performance of SuffPCR against alternatives when the features come from a row-sparse factor model under conditions favorable to SuffPCR. Boxplots and the ROC curve (far right panel) are computed over 50 replications. The other methods are omitted from the ROC curve for legibility, but their behavior is similar to that of lasso. TPR and FPR denote the true and false positive rates, respectively. As one would expect from the simulation conditions, SPC has the worst performance in terms of the ROC curve, while both SuffPCR and Elastic net have an AUC of almost 1.
Fig. 3. Performance of SuffPCR against alternatives when the features come from a row-sparse factor model extracted from the NSCLC data. Boxplots and the ROC curve (far right panel) are computed over 50 replications. In terms of the ROC curve, SPC and AIMER perform best, though SuffPCR is not far behind. Note, however, that SPC has much worse precision and recall.
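The feature-selection quantities named in the captions above (TPR, FPR, precision, recall) can be illustrated with a minimal sketch. The function name `selection_metrics` and the toy index sets are hypothetical and are not taken from the paper; the sketch only assumes that the truly relevant features are known, as they are in a simulation.

```python
def selection_metrics(true_support, selected, n_features):
    """Compare a selected feature set against the known relevant features.

    All names here are illustrative assumptions, not the authors' code.
    """
    true_support, selected = set(true_support), set(selected)
    tp = len(true_support & selected)   # relevant features that were selected
    fp = len(selected - true_support)   # irrelevant features that were selected
    fn = len(true_support - selected)   # relevant features that were missed
    tn = n_features - tp - fp - fn      # irrelevant features correctly left out
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,        # also called recall
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "fnr": fn / (tp + fn) if tp + fn else 0.0,
    }


# Toy example: 10 features, 4 truly relevant, 3 selected (2 of them correct).
metrics = selection_metrics(true_support=[0, 1, 2, 3], selected=[0, 1, 7], n_features=10)
print(metrics)
```

An ROC curve such as the ones in the figures traces the (FPR, TPR) pairs obtained as the selection threshold or tuning parameter is varied; the sketch evaluates a single such point.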
Prediction MSE and number of selected features for regression of survival time on gene expression measurements
| Method | Breast Cancer1 MSE | Breast Cancer1 Feature # | Breast Cancer2 MSE | Breast Cancer2 Feature # | DLBCL MSE | DLBCL Feature # | AML MSE | AML Feature # | NSCLC MSE | NSCLC Feature # |
|---|---|---|---|---|---|---|---|---|---|---|
| SuffPCR | | 80 | | 121 | 0.7073 | 48 | | 75 | | 27 |
| Lasso | 0.7141 | 7 | 0.4622 | 39 | 0.6992 | 31 | 2.0998 | 3 | 0.2263 | 4 |
| ElasticNet | 0.6845 | 41 | 0.4517 | 104 | | 87 | 2.0820 | 5 | 0.2332 | 20 |
| SPC | 0.6188 | 59 | 0.4179 | 823 | 0.7677 | 67 | 2.3237 | 62 | 0.2795 | 62 |
| ISPCA | 0.8647 | NA | 0.5882 | NA | 0.9441 | NA | 2.3109 | NA | 0.2408 | NA |
| AIMER | 0.6629 | 76 | 0.4192 | 795 | 0.7003 | 76 | 1.9737 | 36 | 0.2120 | 50 |
| SPCA | 17.0965 | 212 | 4.7239 | 38 | 2.5980 | 652 | 31.11 | 1043 | 0.9757 | 387 |
| DSPCA | 0.6132 | 4374 | 0.4557 | 7880 | 0.7249 | 1342 | 1.9781 | 2742 | 0.2041 | 305 |
Bolded text emphasizes the method with the lowest MSE.