Timothy M Beissinger, Guilherme J M Rosa, Shawn M Kaeppler, Daniel Gianola, Natalia de Leon.
Abstract
BACKGROUND: High-density genomic data are often analyzed by combining information over windows of adjacent markers. Interpreting data grouped in windows rather than at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, using information from adjacent markers can result in over- or under-smoothing, poorly placed window boundaries, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. The method first fits a cubic smoothing spline to the data and then identifies the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium and can therefore be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, no arbitrary choice of window size is needed, because window sizes are determined empirically and allowed to vary along the genome.
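The two steps described above (fit a cubic smoothing spline, then cut at its inflection points) can be sketched in Python. This is a minimal illustration, not the GenWin implementation: it assumes SciPy 1.10 or later and uses `scipy.interpolate.make_smoothing_spline`, a penalized cubic smoothing spline whose penalty `lam` is chosen by generalized cross-validation when left as `None`. Inflection points are located as sign changes of the spline's second derivative on an evenly spaced grid. The function name `spline_window_boundaries` and the `grid_size` parameter are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline  # requires SciPy >= 1.10


def spline_window_boundaries(pos, y, lam=None, grid_size=10_000):
    """Window boundaries at the inflection points of a cubic smoothing spline.

    pos: marker positions (no duplicates; sorted here).
    y:   per-marker statistic, e.g. F_ST.
    lam: smoothing penalty; None lets SciPy pick it by generalized
         cross-validation. grid_size is an assumed evaluation resolution.
    """
    pos = np.asarray(pos, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(pos)
    pos, y = pos[order], y[order]

    spline = make_smoothing_spline(pos, y, lam=lam)  # cubic B-spline fit

    # Evaluate strictly inside the data range: the penalized fit is a natural
    # cubic spline, so its second derivative is zero at the two end markers.
    grid = np.linspace(pos[0], pos[-1], grid_size + 2)[1:-1]
    second = spline.derivative(2)(grid)

    # An inflection point lies between consecutive grid points where the
    # second derivative changes sign; use the midpoint as the boundary.
    idx = np.nonzero(second[:-1] * second[1:] < 0)[0]
    inflections = (grid[idx] + grid[idx + 1]) / 2

    # Windows run from the first marker to the first inflection point,
    # between consecutive inflection points, and on to the last marker.
    return np.concatenate(([pos[0]], inflections, [pos[-1]]))


if __name__ == "__main__":
    x = np.linspace(0.5, 9.0, 400)
    print(spline_window_boundaries(x, np.sin(x), lam=1e-6))
```

With noise-free sin(x) data on [0.5, 9.0], the interior boundaries should fall close to π and 2π, the inflection points of sine. On real marker data the number and placement of windows depend strongly on `lam`, which is why leaving it to cross-validation is attractive.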
Year: 2015 PMID: 25928167 PMCID: PMC4404117 DOI: 10.1186/s12711-015-0105-9
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Figure 1. Depiction of the method. The spline-window method is presented step by step using a simulated set of 200 markers across a chromosome region. (A) Raw data ($F_{ST}$) computed from individual markers. (B) A cubic smoothing spline, indicated by the red line, is fitted to the data. (C) Inflection points of the spline are indicated by dashed vertical lines. (D) The inflection points are used to define window boundaries, and a statistic such as W is computed for each window.
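Panel D computes a window-level statistic once the boundaries are set. The caption does not define W, so the sketch below assumes, for illustration only, a standardized window mean: W = (window mean - overall mean) / (overall SD / √n), where n is the number of markers in the window. Markers are assigned to windows with `numpy.searchsorted`; the function name `window_stats` is hypothetical.

```python
import numpy as np


def window_stats(pos, y, boundaries):
    """Marker count, mean, and standardized mean W for each window.

    Window i covers [boundaries[i], boundaries[i+1]); the last window also
    includes its right boundary. Assumed definition (illustration only):
        W = (window mean - overall mean) / (overall SD / sqrt(n_window))
    """
    pos = np.asarray(pos, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.asarray(boundaries, dtype=float)
    n_windows = len(b) - 1

    # Index of the window containing each marker (markers outside the
    # boundary range are clipped into the first or last window).
    idx = np.clip(np.searchsorted(b, pos, side="right") - 1, 0, n_windows - 1)

    n = np.bincount(idx, minlength=n_windows)
    sums = np.bincount(idx, weights=y, minlength=n_windows)
    mu, sd = y.mean(), y.std(ddof=1)

    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / n                      # NaN for empty windows
        w = (means - mu) / (sd / np.sqrt(n))  # NaN for empty windows
    return n, means, w
```

Scaling by √n means that a modest shift sustained over many markers can score as highly as a large shift in a small window, which matters when window sizes vary along the genome.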
Method comparison using simulated data
| Method | Mean QTL detected (of 30) | Mean false positives | Detections / false positives |
|---|---|---|---|
| | 6.33 | 6.39 | 0.990610329 |
| | 13.29 | 4.16 | 3.194711538 |
| | 17.6 | 75.1 | 0.234354194 |
| | 18.38 | 232.58 | 0.079026572 |
| | 18.23 | 488.19 | 0.037342018 |
| | 18.41 | 2082.54 | 0.008840166 |
| | 19.51 | 9065.05 | 0.002152222 |
| | 7.42 | 0.12 | 61.83333333 |
| | 12.19 | 0.99 | 12.31313131 |
| | 16.75 | 7.67 | 2.183833116 |
| | 18.01 | 11.27 | 1.598047915 |
| | 17.73 | 9.52 | 1.862394958 |
| | 19.49 | 31.53 | 0.618141453 |
| | 18.34 | 28.47 | 0.644186863 |
| | 15.98 | 3.4 | 4.7 |
Results from applying an assortment of window methods to 100 simulated selection experiments involving 30 QTL, 30 generations of selection, and pooled sequencing at 1 000 000 markers to estimate allele frequencies. For each method evaluated, the table gives the mean number of QTL (out of 30) detected over the 100 simulations, the mean number of false positives, and the ratio of detections to false positives. Sliding- and Distinct- refer to sliding and distinct window methods with windows of the specified size, and Spline Windows refers to the method described here and implemented in GenWin, where window size is not restricted a priori.
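The caption contrasts sliding and distinct windows of fixed size. A minimal sketch of the two definitions, assuming that size is counted in markers (as in the 25-SNP windows used for the empirical data) and that the window statistic is a simple mean: sliding windows advance one marker at a time, so adjacent windows of size k share k - 1 markers, while distinct windows partition the markers into non-overlapping blocks. The shared markers are what make sliding-window statistics highly correlated. Both function names are hypothetical.

```python
import numpy as np


def sliding_window_means(y, size):
    """Mean of every run of `size` consecutive markers, advancing one marker at a time."""
    y = np.asarray(y, dtype=float)
    if not 1 <= size <= len(y):
        raise ValueError("size must be between 1 and the number of markers")
    # mode="valid" keeps only full windows: len(y) - size + 1 of them.
    return np.convolve(y, np.ones(size) / size, mode="valid")


def distinct_window_means(y, size):
    """Mean of non-overlapping blocks of `size` markers; a shorter final block is kept."""
    y = np.asarray(y, dtype=float)
    if size < 1:
        raise ValueError("size must be at least 1")
    starts = np.arange(0, len(y), size)
    counts = np.diff(np.append(starts, len(y)))  # markers per block
    return np.add.reduceat(y, starts) / counts
```

For n markers, sliding windows yield n - size + 1 tests against roughly n / size for distinct windows, which is consistent with the much larger false-positive counts that overlapping windows can accumulate.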
Application of sliding window and spline methods to empirical data
| Chromosome | 25-SNP sliding windows: start (bp) | 25-SNP sliding windows: end (bp) | Spline windows: start (bp) | Spline windows: end (bp) |
|---|---|---|---|---|
| 1 | 11 588 371 | 11 892 655 | 11 686 850 | 11 872 650 |
| 1 | - | - | 54 485 850 | 54 564 950 |
| 1 | 122 802 601 | 122 831 005 | 122 790 650 | 124 093 750 |
| 1 | 164 947 151 | 165 229 053 | - | - |
| 2 | 35 519 192 | 35 682 346 | 35 520 750 | 35 648 950 |
| 2 | 41 731 365 | 41 755 299 | 41 728 850 | 41 770 550 |
| 2 | 71 306 928 | 71 378 431 | 71 314 050 | 71 377 150 |
| 2 | 101 062 088 | 101 069 759 | 101 037 150 | 102 026 750 |
| 2 | 160 786 800 | 160 802 631 | - | - |
| 3 | 177 548 249 | 177 681 538 | 177 671 050 | 177 749 050 |
| 3 | - | - | 207 464 650 | 211 847 850 |
| 3 | 215 594 013 | 215 778 968 | - | - |
| 4 | 66 924 240 | 66 935 990 | - | - |
| 4 | 82 825 221 | 82 858 997 | 82 818 050 | 82 860 750 |
| 4 | 113 455 144 | 122 680 452 | 113 401 750; 120 298 350 | 114 347 650; 122 682 750 |
| 4 | - | - | 140 791 850 | 140 834 650 |
| 4 | 191 396 139 | 191 400 390 | - | - |
| 5 | - | - | 24 460 850 | 24 539 450 |
| 5 | 30 083 952 | 30 139 317 | 30 034 650 | 30 120 950 |
| 6 | 41 490 195 | 45 914 266 | 41 517 550 | 45 921 450 |
| 6 | 75 749 792 | 76 382 768 | 76 072 450 | 76 176 350 |
| 6 | - | - | 86 671 650 | 86 727 750 |
| 6 | 119 682 711 | 119 692 810 | 119 683 750 | 119 707 650 |
| 7 | 146 671 419 | 146 771 150 | - | - |
| 7 | 167 742 364 | 167 809 449 | - | - |
| 8 | 92 876 772 | 94 647 137 | 94 633 950 | 94 680 950 |
| 8 | 118 681 864 | 118 767 444 | - | - |
| 9 | 26 149 935 | 26 181 104 | 25 947 850 | 26 183 950 |
| 9 | 101 071 793 | 101 097 690 | - | - |
| 10 | 7 635 223 | 8 719 903 | 8 703 450 | 8 718 950 |
| 10 | 18 846 988 | 19 024 881 | - | - |
| 10 | 25 251 913 | 25 264 660 | - | - |
| 10 | 97 503 134 | 97 542 318 | - | - |
| 10 | - | - | 136 171 150 | 136 259 150 |

On chromosome 4, two spline windows correspond to a single sliding-window region; both are listed in order, separated by a semicolon.
A comparison of regions exceeding a 99.9% threshold using 25-SNP sliding windows and spline windows, based on empirical data. The data analyzed are from [5], a study of a 30-generation artificial selection experiment for maize ear number. Previously published outlying regions, identified with 25-SNP sliding windows as putatively controlling the number of ears per plant, are compared with those identified by applying the spline-window method to the same dataset.
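Because the two methods report regions with different boundaries, agreement between them is naturally assessed by interval overlap. The sketch below copies a few region pairs from the table and computes the number of shared base pairs, treating each region as a closed interval (an assumption; the table does not state its coordinate convention). The helper `overlap_bp` is hypothetical.

```python
def overlap_bp(a, b):
    """Base pairs shared by closed intervals a=(start, end) and b=(start, end); 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# (chromosome, 25-SNP sliding-window region, spline-window regions), copied from the table
pairs = [
    (2, (41_731_365, 41_755_299), [(41_728_850, 41_770_550)]),
    (4, (113_455_144, 122_680_452), [(113_401_750, 114_347_650), (120_298_350, 122_682_750)]),
    (6, (75_749_792, 76_382_768), [(76_072_450, 76_176_350)]),
    (8, (92_876_772, 94_647_137), [(94_633_950, 94_680_950)]),
]

for chrom, sliding, splines in pairs:
    shared = [overlap_bp(sliding, s) for s in splines]
    print(f"chr {chrom}: shared bp per spline window = {shared}")
```

A raw overlap count is easy to read but ignores region length; dividing by the length of the shorter region, or computing a Jaccard index, would make regions of very different sizes easier to compare.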
Figure 2. Window sizes. Histogram of window sizes, measured as the number of markers per window, obtained by applying the spline-window method to previously published maize data [5].
Figure 3. Comparison to previously published data. A comparison of regions exceeding a 99.9% threshold using 25-SNP sliding windows and spline windows, based on empirical data from a previous study of a 30-generation artificial selection experiment for maize ear number [5]. (A) Adapted from [5]: $F_{ST}$ values and their outlier threshold (red line) on maize chromosome 2, using 25-SNP sliding windows. (B) W statistics and their outlier threshold (red line) obtained with the spline-window method for the same dataset.
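Both panels flag windows whose statistic exceeds an outlier threshold. How that threshold was derived is not described here, so the sketch below assumes it is the empirical 99.9th percentile of the window statistics, and that consecutive flagged windows on one chromosome are merged into a single reported region (start of the first window, end of the last). The function `outlier_regions` and its parameters are hypothetical.

```python
import numpy as np


def outlier_regions(starts, ends, stat, q=0.999):
    """Flag windows whose statistic exceeds the empirical q-quantile and merge
    runs of consecutive flagged windows into (start, end) regions.

    Assumes windows from one chromosome, sorted by position.
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    stat = np.asarray(stat, dtype=float)
    threshold = np.quantile(stat, q)
    flagged = stat > threshold

    regions = []
    i = 0
    while i < len(stat):
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(stat) and flagged[j + 1]:
            j += 1  # extend the run of consecutive outlying windows
        regions.append((int(starts[i]), int(ends[j])))
        i = j + 1
    return threshold, regions
```

With overlapping 25-SNP sliding windows, a single strong signal typically pushes many neighboring windows over the threshold, so merging runs is what turns them into one reported region; spline windows rarely need this step because they do not overlap.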