Xinghao Yu1, Lishun Xiao1, Ping Zeng1,2, Shuiping Huang1,2.
Abstract
MOTIVATION: In the past few years, many prediction approaches have been proposed and widely employed in high-dimensional genetic data for disease risk evaluation. However, during model fitting those approaches typically ignore the important group structures that naturally exist in genetic data.
Year: 2019 PMID: 31089389 PMCID: PMC6476151 DOI: 10.1155/2019/2807470
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Sample sizes and the number of genes for each cancer in the TCGA dataset used in our analysis.
| Phenotypes | Initial gene expression data (N) | Initial gene expression data (G) | Initial clinical data (N) | Final data after quality control (N) | Final data after quality control (G) |
|---|---|---|---|---|---|
| BRCA | 1,218 | 20,531 | 1,247 | 1,083 | 17,675 |
| COAD | 329 | 20,531 | 551 | 275 | 17,493 |
| CRC | 434 | 20,531 | 736 | 367 | 17,510 |
| PAAD | 183 | 20,531 | 196 | 178 | 17,675 |
Note. N is the sample size and G denotes the number of genes. The average number of genes incorporated in each pathway for the seven phenotypes was 65 (ranging from 1 to 1,139), and about 21% of the genes belonged to multiple pathways. BRCA: breast cancer; CRC: colon and rectal cancer; COAD: colon cancer; PAAD: pancreatic cancer.
Figure 1. Comparison of predictive performance of four models with JMAP with PVE = 0.3. Performance is measured by R difference with respect to JMAP; therefore, a negative value (i.e., values below the horizontal line) indicates worse performance than JMAP. In each setting, five groups with nonzero effect sizes were selected; I represents the settings where all the genes in the five groups had nonzero effect sizes; II represents the settings where only the genes in the first two groups had nonzero effect sizes, and half of the genes in the last three groups had nonzero effect sizes; III represents the settings where the effect sizes of the first two groups were nonzero, and the proportion of nonzero effect sizes in the last three groups was 80%, 50%, or 20%; IV represents the settings where the proportion of nonzero effect sizes in the five groups was 90%, 70%, 50%, 30%, or 10%. The predictive performance was assessed across 100 replicates in each scenario.
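The four sparsity settings in the Figure 1 caption can be pictured as per-group proportions of genes with nonzero effects. The following is a minimal sketch of that reading, not the authors' simulation code. The names `SETTINGS` and `nonzero_mask`, the group size of 20 genes, and the random choice of which genes are nonzero are all assumptions made for illustration. It also assumes that in settings II and III the first two groups are fully nonzero, and that in setting III the last three groups take 80%, 50%, and 20% respectively.

```python
import numpy as np

# Proportion of genes with nonzero effects in each of the five selected groups.
# Settings II and III assume the first two groups are fully nonzero (our reading
# of the caption, not a confirmed detail of the original simulation).
SETTINGS = {
    "I": [1.0, 1.0, 1.0, 1.0, 1.0],
    "II": [1.0, 1.0, 0.5, 0.5, 0.5],
    "III": [1.0, 1.0, 0.8, 0.5, 0.2],
    "IV": [0.9, 0.7, 0.5, 0.3, 0.1],
}


def nonzero_mask(setting, group_size=20, seed=0):
    """Return a boolean vector marking which genes have nonzero effects.

    group_size is a hypothetical value; the caption does not state group sizes.
    """
    rng = np.random.default_rng(seed)
    masks = []
    for prop in SETTINGS[setting]:
        n_nonzero = int(round(prop * group_size))
        mask = np.zeros(group_size, dtype=bool)
        # Pick which genes in this group carry an effect, without replacement.
        chosen = rng.choice(group_size, size=n_nonzero, replace=False)
        mask[chosen] = True
        masks.append(mask)
    return np.concatenate(masks)


if __name__ == "__main__":
    for name in SETTINGS:
        mask = nonzero_mask(name)
        print(name, int(mask.sum()), "of", mask.size, "genes nonzero")
```

Under these assumptions, setting I marks every gene in the five groups as nonzero, while setting IV thins the groups progressively, which is the within-group sparsity the caption describes.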
Figure 2. Comparison of predictive performance of four models with JMAP for the four phenotypes from the TCGA datasets. Performance is measured by R difference with respect to JMAP; therefore, a negative value (i.e., values below the horizontal line) indicates worse performance than JMAP. The predictive performance was assessed across 100 MCCV replicates. BRCA: breast cancer; CRC: colon and rectal cancer; COAD: colon cancer; PAAD: pancreatic cancer.