Guifang Fu, Gang Wang, Xiaotian Dai.
Abstract
BACKGROUND: Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs is correlated with a particular complex trait and matters for predicting it. Selecting these influential SNPs efficiently and accurately from millions of candidates is in high demand but remains challenging. We propose a backward elimination iterative distance correlation (BE-IDC) procedure that selects the smallest subset of SNPs that guarantees sufficient prediction accuracy, while also resolving the unclear-threshold issue of traditional feature screening approaches.
Keywords: Backward elimination; FRIGIDA expression; Feature screening; Genomic selection
Year: 2017 PMID: 28403836 PMCID: PMC5389084 DOI: 10.1186/s12859-017-1617-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
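The abstract names distance correlation as the screening measure but does not define it. As a minimal sketch, assuming genotypes are coded numerically (for example as 0/1/2 allele counts, which the source does not state), the sample distance correlation between one SNP and a trait could be computed as below. The function names are hypothetical, and the code only ranks SNPs marginally; it is not the BE-IDC procedure itself.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so row means equal column means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sequence carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def rank_by_dcorr(snp_columns, trait):
    """Return SNP indices sorted from strongest to weakest dependence."""
    scores = [distance_correlation(col, trait) for col in snp_columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

This version builds full n-by-n distance matrices, so it is only practical for small samples; a genome-wide scan would need a vectorized or faster implementation.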
Strict and individual statistical power for methods using fixed or adaptive thresholds for Example 1
| Methods | Average threshold | Strict power | Individual power 1 | Individual power 2 | Individual power 3 | Individual power 4 | Average MSPE |
|---|---|---|---|---|---|---|---|
| DC-SIS | 74 | 0% | 100% | 100% | 100% | 0% | 6.91 |
| IDC-SIS | 74 | 100% | 100% | 100% | 100% | 100% | 1.82 |
| BE-IDC | 5.54 | 100% | 100% | 100% | 100% | 100% | 1.04 |
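The power columns summarize repeated simulations. The source does not spell out the definitions, so the sketch below assumes the usual ones: individual power is the share of replicates in which a given true predictor is selected, and strict power is the share in which all true predictors are selected together. The function name and the toy input are hypothetical.

```python
def selection_power(selected_sets, true_predictors):
    """Return (strict_power, individual_power) over simulation replicates.

    selected_sets   -- one set of selected predictor indices per replicate
    true_predictors -- indices of the truly influential predictors
    """
    true_predictors = list(true_predictors)
    n_rep = len(selected_sets)
    if n_rep == 0:
        raise ValueError("need at least one replicate")
    # Share of replicates that picked each true predictor.
    individual = {
        j: sum(j in s for s in selected_sets) / n_rep
        for j in true_predictors
    }
    # Share of replicates that picked every true predictor at once.
    strict = sum(
        all(j in s for j in true_predictors) for s in selected_sets
    ) / n_rep
    return strict, individual


# Toy input: four replicates, true predictors 1, 2 and 3.
replicates = [{1, 2, 3}, {1, 2}, {1, 3, 9}, {1, 2, 3, 7}]
strict, individual = selection_power(replicates, [1, 2, 3])
```

Under these definitions the strict power can never exceed the smallest individual power, which is the pattern the tabulated values follow.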
Strict power and average threshold for BE-IDC approach under different drop rates
| Drop rate | Average threshold | Strict power |
|---|---|---|
| 50% | 5.54 | 100% |
| 40% | 5.45 | 100% |
| 30% | 5.34 | 100% |
| 20% | 5.39 | 100% |
| 10% | 5.33 | 100% |
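To make the role of the drop rate concrete, here is a rough sketch under the assumption that the drop rate is the fraction of remaining candidates discarded at each backward-elimination step. The source does not describe the schedule, so the function below is a hypothetical illustration rather than the authors' algorithm.

```python
import math


def elimination_schedule(n_candidates, drop_rate, min_size=1):
    """List the candidate-set sizes visited when a fixed fraction is dropped per step."""
    if not 0.0 < drop_rate < 1.0:
        raise ValueError("drop_rate must lie strictly between 0 and 1")
    if n_candidates < min_size:
        raise ValueError("n_candidates must be at least min_size")
    sizes = [n_candidates]
    while sizes[-1] > min_size:
        keep = math.floor(sizes[-1] * (1.0 - drop_rate))
        # Always remove at least one candidate so the loop terminates.
        next_size = max(min_size, min(keep, sizes[-1] - 1))
        sizes.append(next_size)
    return sizes
```

A smaller drop rate visits more intermediate sizes and therefore costs more iterations, while a larger one shrinks the candidate set faster.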
Genetic effects of 5 assumed SNPs in Example 2
| Position | Additive effect | Dominant effect |
|---|---|---|
| 100 | 1.2 | 0.8 |
| 200 | 1.2 | 0.4 |
| 300 | 1.2 | 0.8 |
| 400 | 0.8 | 1.2 |
| 500 | 1.0 | 1.2 |
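The effects in this table can be turned into a simulated trait. The sketch below assumes a conventional quantitative-genetics coding that the source does not state: genotypes are allele counts 0/1/2, the additive covariate is the count minus one, and the dominance covariate equals 1 for heterozygotes only. The Gaussian noise term and all function names are likewise assumptions.

```python
import random

# Position -> (additive effect, dominant effect), copied from the table above.
EFFECTS = {
    100: (1.2, 0.8),
    200: (1.2, 0.4),
    300: (1.2, 0.8),
    400: (0.8, 1.2),
    500: (1.0, 1.2),
}


def additive_code(genotype):
    """Map an allele count 0/1/2 to -1/0/1 (assumed coding)."""
    return genotype - 1


def dominance_code(genotype):
    """Return 1 for a heterozygote, else 0 (assumed coding)."""
    return 1 if genotype == 1 else 0


def genetic_value(genotypes, effects=EFFECTS):
    """Sum additive and dominance contributions over the causal positions."""
    total = 0.0
    for pos, (add_eff, dom_eff) in effects.items():
        g = genotypes[pos]
        total += add_eff * additive_code(g) + dom_eff * dominance_code(g)
    return total


def simulate_trait(genotypes, noise_sd=1.0, rng=random):
    """Genetic value plus Gaussian noise (the noise model is an assumption)."""
    return genetic_value(genotypes) + rng.gauss(0.0, noise_sd)
```

For instance, an individual heterozygous at all five positions contributes only through the dominance effects, whereas a homozygote contributes only through the additive effects.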
Strict and individual statistical power for methods using fixed or adaptive thresholds for Example 2
| Methods | Average threshold | Strict power | Individual power 1 | Individual power 2 | Individual power 3 | Individual power 4 | Individual power 5 | Average MSPE |
|---|---|---|---|---|---|---|---|---|
| DC-SIS | 74 | 97% | 100% | 99% | 100% | 99% | 99% | 3.67 |
| IDC-SIS | 74 | 100% | 100% | 100% | 100% | 100% | 100% | 2.70 |
| BE-IDC | 15.04 | 100% | 100% | 100% | 100% | 100% | 100% | 1.92 |
Strict and individual statistical power for methods using fixed or adaptive thresholds for Example 3
| Methods | Average threshold | Strict power | Individual power 1 | Individual power 2 | Individual power 3 | Individual power 4 | Individual power 5 | Individual power 6 | Individual power 7 | Individual power 8 | Individual power 9 | Individual power 10 | Average MSPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DC-SIS | 37 | 12% | 100% | 83% | 93% | 100% | 100% | 100% | 81% | 99% | 100% | 22% | 4.23 |
| DC-SIS | 74 | 44% | 100% | 99% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 46% | 3.68 |
| IDC-SIS | 37 | 92% | 100% | 100% | 98% | 100% | 100% | 100% | 95% | 99% | 100% | 100% | 1.44 |
| IDC-SIS | 74 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 1.84 |
| BE-IDC | 21.35 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 1.17 |
Influential SNPs selected by BE-IDC based on AGI physical map (TAIR.org)
| Rank | Chr | SNP pos (bp) | Gene | Distance to gene (bp) |
|---|---|---|---|---|
| 1 | 4 | 268809 | | -217 |
| 2 | 4 | 276143 | | 0 |
| 3 | 4 | 275349 | | 0 |
| 4 | 4 | 269260 | | 0 |
Fig. 1 The MSPE plot. (a) The MSPE versus the number of SNPs on the interval [1, 216,130]; (b) magnification of the MSPE over the small interval [1, 11] surrounding the minimum-MSPE region. The red point marks the final threshold, which achieves the minimum MSPE of 0.34. The black solid curve is the traditional MSPE plot, and the blue dashed curves show the MSPE ± 1 standard error. The MSPE reaches its maximum value, 3.95, at a model size of 103.
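The quantities plotted in Fig. 1 can be sketched as follows. This assumes the MSPE is the mean of squared prediction errors on held-out observations and that the standard error is taken over those squared errors; the source does not say how either is computed, and the function names and numbers used here are made up for illustration.

```python
import math


def mspe_with_se(y_true, y_pred):
    """Return (MSPE, standard error) from observed and predicted trait values."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return mean, math.sqrt(var / n)


def best_model_size(mspe_by_size):
    """Pick the model size with the lowest MSPE, preferring smaller sizes on ties."""
    return min(mspe_by_size, key=lambda size: (mspe_by_size[size], size))


# Hypothetical MSPE values for a handful of candidate model sizes.
example_curve = {2: 0.8, 5: 0.5, 9: 0.5, 20: 1.3}
chosen_size = best_model_size(example_curve)
```

Breaking ties toward the smaller size matches the stated goal of keeping the smallest subset that still predicts well.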
Fig. 2 The Manhattan plot of FRI expression along the whole genome, showing the Dcorr measures of 216,130 SNPs against each SNP's chromosomal position. Chromosomes are shown in alternating colors. The top four SNPs, marked by yellow triangles, are those finally selected by the BE-IDC procedure.