Amidou N'Diaye, Jemanesh K Haile, D Brian Fowler, Karim Ammar, Curtis J Pozniak.
Abstract
Advances in sequencing and genotyping methods have enabled cost-effective production of high-throughput single nucleotide polymorphism (SNP) markers, making them the markers of choice for linkage mapping. As a result, many laboratories have developed high-throughput SNP assays and built high-density genetic maps. However, the number of markers may exceed, by orders of magnitude, the resolution of recombination for a given population size, so that only a minority of markers can be ordered accurately. Another issue associated with the so-called 'large p, small n' problem is that high-density genetic maps inevitably result in many markers clustering at the same position (co-segregating markers). While there are a number of related papers, none have addressed the impact of co-segregating markers on genetic maps. In the present study, we investigated the effects of co-segregating markers on high-density genetic map length and marker order using empirical data from two populations of wheat, Mohawk × Cocorit (durum wheat) and Norstar × Cappelle Desprez (bread wheat). The maps of both populations consisted of 85% co-segregating markers. Our study clearly showed that an excess of co-segregating markers can lead to map expansion, but has little effect on marker order. To estimate the inflation factor (IF), we generated a total of 24,473 linkage maps (8,203 maps for Mohawk × Cocorit and 16,270 maps for Norstar × Cappelle Desprez). Using seven machine learning algorithms, we were able to predict the map expansion due to the proportion of co-segregating markers with an accuracy of 0.7. For example, in Mohawk × Cocorit, with 10% and 80% co-segregating markers, the length of the map was inflated by 4.5% and 16.6%, respectively. Similarly, the map of Norstar × Cappelle Desprez expanded by 3.8% and 11.7% with 10% and 80% co-segregating markers.
With the increasing number of markers on SNP chips, the proportion of co-segregating markers in high-density maps will continue to increase, making map expansion unavoidable. Therefore, we suggest that developers improve linkage mapping algorithms for efficient analysis of high-throughput data. This study outlines a practical strategy to estimate the IF due to the proportion of co-segregating markers and a method to scale the length of the map accordingly.
Keywords: genetic map; high-density; inflation factor; machine learning; map expansion; prediction; single nucleotide polymorphism; wheat
Year: 2017 PMID: 28878789 PMCID: PMC5572363 DOI: 10.3389/fpls.2017.01434
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Features of the Mohawk × Cocorit linkage map.
| Chromosome | Full map: markers | Full map: size (cM) | Skeleton map: markers | Skeleton map: size (cM) | Co-segregating markers: number | Co-segregating markers: proportion (%) |
|---|---|---|---|---|---|---|
| 1A | 348 | 154.4 | 58 | 137.3 | 290 | 83 |
| 1B | 277 | 205.9 | 46 | 197.2 | 231 | 83 |
| 2A | 90 | 183.6 | 28 | 148.7 | 62 | 69 |
| 2B | 334 | 161.1 | 51 | 145.0 | 283 | 85 |
| 3A | 269 | 90.3 | 27 | 82.2 | 242 | 90 |
| 3B | 323 | 231.8 | 55 | 180.3 | 268 | 83 |
| 4A | 76 | 141.7 | 24 | 128.9 | 52 | 68 |
| 4B | 340 | 156.9 | 40 | 119.9 | 300 | 88 |
| 5A | 91 | 71.7 | 36 | 63.7 | 55 | 60 |
| 5B | 323 | 207.8 | 36 | 178.4 | 287 | 89 |
| 6A | 300 | 200.5 | 63 | 165.7 | 237 | 79 |
| 6B | 529 | 215.7 | 53 | 188.9 | 486 | 92 |
| 7A | 330 | 247.2 | 64 | 192.6 | 276 | 84 |
| 7B | 369 | 152.5 | 49 | 123.2 | 320 | 87 |
| Genome A | 1504 | 1089.4 | 300 | 919.1 | 1214 | 81 |
| Genome B | 2495 | 1331.7 | 330 | 1132.9 | 2175 | 87 |
| Total | 3999 | 2421.1 | 630 | 2052.0 | 3389 | 85 |
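
The derived columns of this table can be recomputed from the raw counts and map sizes. The sketch below uses the 1A and 3A rows. The proportion is simply the number of co-segregating markers divided by the number of markers on the full map. The `map_expansion` function is an assumption: it expresses the difference between the full and skeleton maps as a percentage of the skeleton map, which may not match the exact inflation factor definition used in the study. Function names are illustrative.

```python
def cosegregating_proportion(n_coseg: int, n_markers: int) -> float:
    """Percentage of full-map markers that co-segregate."""
    return 100.0 * n_coseg / n_markers


def map_expansion(full_cm: float, skeleton_cm: float) -> float:
    """Assumed definition: percent increase of the full-map length
    over the skeleton-map length."""
    return 100.0 * (full_cm - skeleton_cm) / skeleton_cm


# (markers, full map cM, skeleton map cM, co-segregating markers), from the table
rows = {
    "1A": (348, 154.4, 137.3, 290),
    "3A": (269, 90.3, 82.2, 242),
}

for chrom, (markers, full_cm, skel_cm, n_coseg) in rows.items():
    prop = cosegregating_proportion(n_coseg, markers)
    expansion = map_expansion(full_cm, skel_cm)
    print(f"{chrom}: {prop:.0f}% co-segregating, {expansion:.1f}% longer than skeleton")
```

Rounded to whole percentages, the proportions for these two rows agree with the last column of the table.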
Features of the Norstar × Cappelle Desprez linkage map.
| Chromosome | Full map: markers | Full map: size (cM) | Skeleton map: markers | Skeleton map: size (cM) | Co-segregating markers: number | Co-segregating markers: proportion (%) |
|---|---|---|---|---|---|---|
| 1A | 909 | 107.1 | 85 | 90.6 | 824 | 91 |
| 1B | 673 | 235.9 | 122 | 217.9 | 551 | 82 |
| 1D | 102 | 110.8 | 31 | 103.0 | 71 | 70 |
| 2A | 483 | 228.5 | 95 | 212.9 | 388 | 80 |
| 2B | 864 | 230.7 | 96 | 216.2 | 768 | 89 |
| 2D | 498 | 198.8 | 42 | 185.9 | 456 | 92 |
| 3A | 593 | 246.2 | 94 | 219.3 | 499 | 84 |
| 3B | 681 | 253 | 129 | 226.1 | 552 | 81 |
| 3D | 76 | 18.7 | 6 | 17.8 | 70 | 92 |
| 4A | 398 | 188.6 | 70 | 179.0 | 328 | 82 |
| 4B | 424 | 130.1 | 68 | 127.9 | 356 | 84 |
| 4D | 29 | 15.3 | 8 | 14.4 | 21 | 72 |
| 5A | 636 | 281.4 | 117 | 278.9 | 519 | 82 |
| 5B | 1049 | 225.9 | 126 | 211.6 | 923 | 88 |
| 5D | 107 | 27.9 | 13 | 19.8 | 94 | 88 |
| 6A | 437 | 170.6 | 66 | 156.7 | 371 | 85 |
| 6B | 937 | 185.0 | 101 | 170.3 | 836 | 89 |
| 6D | 103 | 47.5 | 15 | 15.8 | 91 | 88 |
| 7A | 641 | 216.9 | 114 | 200.7 | 527 | 82 |
| 7B | 471 | 144.6 | 70 | 133.6 | 401 | 85 |
| 7D | 43 | 72.1 | 20 | 71.6 | 23 | 53 |
| Genome A | 4097 | 1439.3 | 641 | 1338.1 | 3456 | 84 |
| Genome B | 5099 | 1405.2 | 712 | 1303.6 | 4387 | 86 |
| Genome D | 958 | 491.1 | 135 | 428.3 | 826 | 86 |
| Total | 10154 | 3335.6 | 1488 | 3070.0 | 8669 | 85 |
Spearman correlation coefficients of marker order between the sequential maps and the skeleton map in Mohawk × Cocorit and Norstar × Cappelle Desprez.
| Chromosomes | Mohawk × Cocorit | Norstar × Cappelle Desprez |
|---|---|---|
| 1A | 0.99 | 0.99 |
| 1B | 0.97 | 0.99 |
| 1D | | 0.99 |
| 2A | 0.94 | 0.99 |
| 2B | 0.99 | 0.99 |
| 2D | | 0.98 |
| 3A | 0.99 | 0.99 |
| 3B | 0.95 | 0.99 |
| 3D | | 0.97 |
| 4A | 0.97 | 0.99 |
| 4B | 0.96 | 0.99 |
| 4D | | 0.98 |
| 5A | 0.96 | 0.99 |
| 5B | 0.99 | 0.99 |
| 5D | | 0.99 |
| 6A | 0.97 | 0.99 |
| 6B | 0.98 | 0.99 |
| 6D | | 0.97 |
| 7A | 0.99 | 0.99 |
| 7B | 0.99 | 0.99 |
| 7D | | 0.99 |
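
As an illustration of how agreement in marker order can be quantified, the sketch below computes Spearman's rank correlation between two orderings of the same markers. It assumes each marker occupies a distinct rank (no ties), so the closed-form formula applies; maps with co-segregating markers would need a tie-aware version. The marker names and the function name are hypothetical, not taken from the study.

```python
def spearman_order_correlation(order_a, order_b):
    """Spearman's rho between two orderings of the same markers (no ties)."""
    n = len(order_a)
    if n < 2 or len(order_b) != n:
        raise ValueError("both orderings need the same length, at least 2")
    if len(set(order_a)) != n or set(order_a) != set(order_b):
        raise ValueError("orderings must contain the same unique markers")
    # Rank of each marker in the second ordering
    rank_b = {marker: i for i, marker in enumerate(order_b)}
    # Sum of squared rank differences
    d2 = sum((i - rank_b[marker]) ** 2 for i, marker in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical marker names: one adjacent pair is swapped
skeleton = ["m1", "m2", "m3", "m4", "m5"]
sequential = ["m1", "m3", "m2", "m4", "m5"]
print(spearman_order_correlation(skeleton, sequential))
```

A single swap of neighbouring markers lowers the coefficient only slightly, which is why values close to 1 indicate that order is largely preserved.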
Prediction accuracy of different models in the Mohawk × Cocorit and Norstar × Cappelle Desprez populations.
| Population | Model | RMSE | Accuracy |
|---|---|---|---|
| Mohawk × Cocorit | LR | 4.631 | 0.654 |
| | GLM | 4.631 | 0.654 |
| | KNN | 4.577 | 0.664 |
| | POLY2 | 4.584 | 0.664 |
| | POLY3 | 4.578 | 0.664 |
| | SVM | 4.632 | 0.661 |
| | CART | 4.694 | 0.638 |
| | RF | 4.577 | 0.664 |
| Norstar × Cappelle Desprez | LR | 2.234 | 0.737 |
| | GLM | 2.234 | 0.737 |
| | KNN | 2.225 | 0.742 |
| | POLY2 | 2.234 | 0.737 |
| | POLY3 | 2.229 | 0.739 |
| | SVM | 2.227 | 0.743 |
| | CART | 2.389 | 0.667 |
| | RF | 2.225 | 0.742 |
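
For readers unfamiliar with the RMSE column, a minimal sketch of the root mean square error is given below. The function name is illustrative and the input values are toy numbers chosen only to show the arithmetic; they are not data from the study. How the "Accuracy" column was computed is not stated in this record, so it is not reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (not from the study)
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(observed, predicted))
```

RMSE is expressed in the same units as the predicted quantity, so in this table it can be read as a typical prediction error in percentage points of inflation factor.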
Map inflation factor (mean ± standard deviation) relative to the proportion of co-segregating markers in the Mohawk × Cocorit and Norstar × Cappelle Desprez populations.
| Co-segregating markers (%) | Mohawk × Cocorit: number of maps | Mohawk × Cocorit: inflation factor (%) | Norstar × Cappelle Desprez: number of maps | Norstar × Cappelle Desprez: inflation factor (%) |
|---|---|---|---|---|
| 10 | 700 | 4.48 (±3.63) | 990 | 3.77 (±1.99) |
| 20 | 700 | 6.85 (±3.94) | 990 | 5.43 (±2.02) |
| 30 | 700 | 9.34 (±3.79) | 990 | 6.71 (±2.02) |
| 40 | 700 | 11.11 (±4.08) | 990 | 7.39 (±1.95) |
| 50 | 700 | 13.78 (±4.95) | 990 | 8.24 (±2.34) |
| 60 | 700 | 14.86 (±5.06) | 970 | 9.35 (±2.46) |
| 70 | 650 | 16.59 (±5.52) | 970 | 10.85 (±2.65) |
| 80 | 650 | 16.62 (±5.58) | 950 | 11.70 (±2.28) |
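
To illustrate how such a table could be used in practice, the sketch below fits an ordinary least-squares line to the mean inflation factors reported for Mohawk × Cocorit and then rescales a map length by a given inflation factor. Both steps are assumptions made for illustration: the study trained its models on individual maps rather than on the table means, and the scaling rule `length / (1 + IF/100)` is one plausible reading of "scaling the map", not necessarily the authors' method. Function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def scale_map_length(length_cm, inflation_pct):
    """Assumed scaling rule: divide out the estimated inflation."""
    return length_cm / (1.0 + inflation_pct / 100.0)


# Mean inflation factors for Mohawk × Cocorit, from the table above
coseg_pct = [10, 20, 30, 40, 50, 60, 70, 80]
mean_if = [4.48, 6.85, 9.34, 11.11, 13.78, 14.86, 16.59, 16.62]

intercept, slope = fit_line(coseg_pct, mean_if)
print(f"IF ~ {intercept:.2f} + {slope:.3f} * co-segregating %")

# Rescale the full Mohawk × Cocorit map (2421.1 cM) using the tabulated mean IF at 80%
print(f"{scale_map_length(2421.1, 16.62):.1f} cM")
```

The tabulated means flatten between 70% and 80%, so a straight line is only a rough summary and should not be extrapolated beyond the range shown.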