Dongyuan Liu, Chouxian Ma, Weiguo Hong, Long Huang, Min Liu, Hui Liu, Huaping Zeng, Dejing Deng, Huaigen Xin, Jun Song, Chunhua Xu, Xiaowen Sun, Xilin Hou, Xiaowu Wang, Hongkun Zheng.
Abstract
Linkage maps enable the study of important biological questions. Constructing high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the explosion in marker numbers and the genotyping errors in NGS data challenge both the computational efficiency of linkage mapping methods and the quality of the maps they produce. Here we report HighMap, a method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. Its singleton rate was less than one-ninth of that produced by JoinMap4.1, and its total map length of 5,908 cM was consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.
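HighMap orders markers with a Monte Carlo multipoint maximum likelihood search that, according to Figure 1, combines Gibbs sampling, spatial sampling, and simulated annealing. The sketch below is not that algorithm. It illustrates only the simulated annealing idea and swaps in a simpler objective, the sum of adjacent recombination fractions (SARF), for the multipoint likelihood. It assumes a precomputed symmetric matrix `rf` of pairwise recombination fractions; the function names, the segment-reversal move, and the cooling schedule are hypothetical choices.

```python
import math
import random

def sarf(order, rf):
    """Sum of adjacent recombination fractions (SARF) along a marker order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))

def anneal_order(rf, n_iter=50000, t0=0.5, cooling=0.9998, seed=0):
    """Search for a low-SARF marker order by simulated annealing (illustrative).

    Each move reverses a random segment of the current order. Worse orders are
    accepted with probability exp(-delta / t), and t shrinks geometrically.
    """
    n = len(rf)
    if n < 2:
        return list(range(n)), 0.0
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)                      # random starting order
    cost = sarf(order, rf)
    best, best_cost = order[:], cost
    t = t0
    for _ in range(n_iter):
        i, j = sorted(rng.sample(range(n), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_cost = sarf(cand, rf)
        delta = cand_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            order, cost = cand, cand_cost
            if cost < best_cost:            # remember the best order visited
                best, best_cost = order[:], cost
        t *= cooling
    return best, best_cost

# Hypothetical markers at known positions (cM), with Haldane recombination fractions.
positions = [0, 4, 9, 15, 22, 30, 37, 45]
rf = [[0.5 * (1 - math.exp(-2 * abs(a - b) / 100)) for b in positions]
      for a in positions]
best, best_cost = anneal_order(rf)
print(best, round(best_cost, 4))
```

For markers on a line, any order other than the true one (or its reverse) has a strictly larger SARF, so this toy objective has a well-defined target. Real NGS data add genotyping errors and missing calls, which is where a likelihood-based criterion and the error correction step matter.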
Year: 2014 PMID: 24905985 PMCID: PMC4048240 DOI: 10.1371/journal.pone.0098855
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Modules of the HighMap algorithm.
A: Marker loci are partitioned into linkage groups by single-linkage clustering based on a pairwise modified independence LOD score for the recombination frequency. B and B': The ordering module combines Gibbs sampling, spatial sampling, and a simulated annealing algorithm to order markers and estimate map distances. C: The error correction module identifies singletons according to the parental contribution of genotypes and eliminates them from the data using a k-nearest neighbor algorithm. To order markers correctly, ordering and error correction are carried out iteratively. D: Heat maps and haplotype maps are constructed to evaluate map quality.
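Module A can be illustrated with a minimal sketch, assuming genotypes are already coded as small integers per individual (None for missing). It uses the standard independence LOD, obtained from a G-test on the two-marker contingency table as LOD = G / (2 ln 10); HighMap's modification of this score is not reproduced. Single-linkage clustering at a fixed threshold is equivalent to taking connected components of the graph whose edges are marker pairs with LOD at or above that threshold, which the union-find below computes. The threshold of 3.0 and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def independence_lod(g1, g2):
    """Independence LOD = G / (2 ln 10) for the two-marker contingency table.

    Individuals missing either genotype (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    g = 0.0
    for (a, b), obs in joint.items():
        expected = row[a] * col[b] / n
        g += 2 * obs * math.log(obs / expected)
    return g / (2 * math.log(10))

def linkage_groups(genotypes, lod_threshold=3.0):
    """Single-linkage grouping: connected components of the LOD >= threshold graph."""
    parent = list(range(len(genotypes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(genotypes)), 2):
        if independence_lod(genotypes[i], genotypes[j]) >= lod_threshold:
            parent[find(i)] = find(j)       # merge the two clusters
    groups = {}
    for i in range(len(genotypes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because a single strong pair is enough to merge two clusters, single linkage can chain unrelated groups through one spurious high-LOD pair; in practice the threshold is usually tuned until the number of groups matches the expected chromosome count.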
Figure 2. Enhanced NGS data utilization by HighMap.
The X-axis represents the number of markers. The Y-axis represents the Spearman rank correlation coefficient between the estimated marker order and the true marker positions (A, B and C), the singleton rate (D, E and F), and the estimated genetic map distance (G, H and I).
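Figure 2 evaluates maps by Spearman's rank correlation between estimated and true marker positions and by the singleton rate. A minimal sketch of both metrics follows. The Spearman formula used here, 1 - 6 Σd² / (n(n² - 1)), assumes no tied positions. The singleton definition (a call that differs from both flanking markers while the two flanking markers agree) is a simplified assumption, since HighMap identifies singletons from the parental contribution of genotypes; the function names are hypothetical.

```python
def spearman_rho(estimated_positions, true_positions):
    """Spearman's rho for tie-free data: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(true_positions)

    def ranks(values):
        r = [0] * n
        for rank, idx in enumerate(sorted(range(n), key=lambda i: values[i])):
            r[idx] = rank
        return r

    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(estimated_positions), ranks(true_positions)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def singleton_rate(ordered_genotypes):
    """Fraction of interior, non-missing calls that look like singletons.

    ordered_genotypes[m][i] is individual i's call at the m-th marker in map
    order (None = missing). A singleton here is a call that disagrees with
    both neighbours while the neighbours agree with each other.
    """
    singletons = calls = 0
    for m in range(1, len(ordered_genotypes) - 1):
        left, mid, right = ordered_genotypes[m - 1:m + 2]
        for a, b, c in zip(left, mid, right):
            if a is None or b is None or c is None:
                continue
            calls += 1
            if a == c and b != a:
                singletons += 1
    return singletons / calls if calls else 0.0
```

A correctly ordered map gives a rho near 1 (or near -1 if the whole group is reversed, which is equally valid), and a low singleton rate, because misplaced markers and genotyping errors both show up as isolated double crossovers.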
Capability of HighMap for imputing missing genotypes and correcting genotyping errors.
| No. of markers | Genotyping error: initial rate (%) | Genotyping error: % detected | Genotyping error: accuracy (%) | Genotyping error: remaining rate (%) | Missing genotypes: initial rate (%) | Missing genotypes: % detected | Missing genotypes: accuracy (%) | Missing genotypes: remaining rate (%) |
|---|---|---|---|---|---|---|---|---|
| 100 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 200 | 2.37 | 75.74 | 97.63 | 0.88 | 2.50 | 85.10 | 99.53 | 0.37 |
| 300 | 4.61 | 82.88 | 97.65 | 1.53 | 5.00 | 96.33 | 98.34 | 0.18 |
| 400 | 6.69 | 79.15 | 96.93 | 2.39 | 7.50 | 96.90 | 97.39 | 0.23 |
| 500 | 6.32 | 81.80 | 97.52 | 1.99 | 7.00 | 97.47 | 97.60 | 0.18 |
| 600 | 8.40 | 77.98 | 97.14 | 3.16 | 10.00 | 97.42 | 94.82 | 0.26 |
| 700 | 10.25 | 69.42 | 94.88 | 5.40 | 12.86 | 94.96 | 89.13 | 0.65 |
| 800 | 11.90 | 65.19 | 93.40 | 6.91 | 15.63 | 93.46 | 85.75 | 1.02 |
| 900 | 13.20 | 60.10 | 91.07 | 9.16 | 18.33 | 93.27 | 79.73 | 1.23 |
| 1000 | 14.35 | 56.66 | 89.90 | 10.42 | 21.00 | 91.88 | 76.44 | 1.70 |
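HighMap's error correction module uses a k-nearest neighbor algorithm. The sketch below shows the general idea under simplifying assumptions that are not taken from the source: markers are already in map order, a marker's neighbors are the k markers closest to it in that order, and a call is replaced (or a missing call imputed) only when all non-missing neighbor calls in the same individual agree. The parameters `k` and `min_agreement` and the function name are hypothetical, and this is not HighMap's parental-contribution-based procedure.

```python
from collections import Counter

def knn_correct(ordered_genotypes, k=4, min_agreement=1.0):
    """Correct or impute calls from the k nearest markers in map order.

    ordered_genotypes[m][i] is the call of individual i at the m-th marker
    (None = missing). Votes always come from the ORIGINAL calls, so the
    result does not depend on the order in which markers are processed.
    """
    if not ordered_genotypes:
        return []
    n_markers = len(ordered_genotypes)
    n_ind = len(ordered_genotypes[0])
    corrected = [row[:] for row in ordered_genotypes]
    for m in range(n_markers):
        # The k markers closest to m in map order (ties broken by index).
        neighbours = sorted((j for j in range(n_markers) if j != m),
                            key=lambda j: (abs(j - m), j))[:k]
        for i in range(n_ind):
            votes = Counter(ordered_genotypes[j][i] for j in neighbours
                            if ordered_genotypes[j][i] is not None)
            if not votes:
                continue  # no information: leave the call (or gap) as is
            call, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= min_agreement:
                corrected[m][i] = call
    return corrected

# One individual, five markers in map order; marker 2 looks like a singleton.
print(knn_correct([[0], [0], [1], [0], [0]]))  # -> [[0], [0], [0], [0], [0]]
```

Requiring unanimous neighbors keeps genuine crossover breakpoints intact: at a breakpoint the neighbors split between the two parental haplotypes, so no call is changed. This is also why, in an iterative scheme, correction is only as good as the current marker order.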
Figure 3. Changes in linkage map quality as genotyping error increased.
The X-axis represents the genotyping error rate. The Y-axis represents the Spearman rank correlation coefficient between the estimated marker order and the true marker positions (A, B and C), the singleton rate (D, E and F), and the estimated genetic map distance (G, H and I). “Integrated”, “Female”, and “Male” denote the integrated, female, and male linkage maps, respectively. JoinMap4.0 failed to construct linkage maps once the error rate exceeded about 14%, because of its inefficiency in estimating linkage phases.
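Panels G, H and I report estimated genetic map distances. Distances in centimorgans are conventionally obtained from recombination fractions through a map function. The source does not state which map function HighMap applies, so the sketch below shows the two standard choices, Haldane and Kosambi, and sums adjacent intervals to obtain a total map length. The function names are hypothetical.

```python
import math

def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")

def haldane_cm(r):
    """Haldane map function (no interference): d = -50 ln(1 - 2r) cM."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map function (allows interference): d = 25 ln((1 + 2r) / (1 - 2r)) cM."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def map_length(adjacent_rfs, map_function=kosambi_cm):
    """Total map length (cM) as the sum of adjacent-interval distances."""
    return sum(map_function(r) for r in adjacent_rfs)

print(round(haldane_cm(0.1), 3), round(kosambi_cm(0.1), 3))
```

Because the total length is a sum over every adjacent interval, each uncorrected genotyping error that creates an apparent double crossover adds to it, which is one reason map distances are a useful quality indicator as the error rate grows.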
Genotyping error and missing rates of different segregation patterns in NGS data at different sequencing depths.
| Sequencing depth | ab×cd: error rate (%) | ab×cd: missing rate (%) | ef×eg: error rate (%) | ef×eg: missing rate (%) | hk×hk / nn×np / lm×ll: error rate (%) | hk×hk / nn×np / lm×ll: missing rate (%) |
|---|---|---|---|---|---|---|
| 1 | 34.1 | 43.2 | 24.7 | 58.5 | 17.4 | 44.8 |
| 2 | 31.7 | 31.2 | 23.7 | 47.6 | 15.6 | 31.2 |
| 3 | 25.2 | 17.8 | 21.2 | 36.9 | 13.6 | 17.8 |
| 4 | 21.3 | 11.0 | 17.6 | 33.6 | 10.3 | 9.2 |
| 5 | 17.5 | 6.8 | 12.6 | 29.4 | 8.1 | 6.6 |
| 6 | 14.0 | 3.4 | 8.9 | 28.1 | 6.4 | 3.7 |
| 7 | 9.9 | 2.5 | 8.4 | 26.9 | 5.2 | 2.6 |
| 8 | 7.6 | 1.4 | 5.7 | 25.9 | 4.2 | 2.3 |
| 9 | 5.1 | 1.0 | 3.3 | 25.8 | 2.6 | 1.3 |
| 10 | 4.3 | 0.9 | 3.4 | 26.0 | 2.0 | 0.6 |
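The error and missing rates above fall as sequencing depth rises because of read sampling. A deliberately simple model, assumed here and not taken from the source, makes this concrete: the number of reads at a site is Poisson-distributed with mean equal to the depth, sequencing errors are ignored, a genotype is missing when no read covers it, and a heterozygote is miscalled as homozygous when every read carries the same allele. This model will not reproduce the table's values, which depend on the segregation pattern and on the authors' genotype-calling pipeline; the function names are hypothetical.

```python
import math

def missing_rate(depth):
    """P(no reads at a site) when the read count ~ Poisson(depth)."""
    return math.exp(-depth)

def het_dropout_rate(depth, max_reads=200):
    """P(a heterozygote shows only one allele | at least one read); depth > 0.

    Given k reads, each drawn from either allele with probability 0.5,
    P(all reads carry the same allele) = 2 * 0.5**k = 0.5**(k - 1).
    """
    p0 = math.exp(-depth)
    pk = p0
    total = 0.0
    for k in range(1, max_reads + 1):
        pk *= depth / k                     # Poisson pmf: P(k) = P(k-1) * depth / k
        total += pk * 0.5 ** (k - 1)
    return total / (1.0 - p0)

def het_dropout_closed_form(depth):
    """Same quantity in closed form: 2 / (exp(depth / 2) + 1)."""
    return 2.0 / (math.exp(depth / 2.0) + 1.0)

for d in range(1, 11):
    print(d, round(missing_rate(d), 4), round(het_dropout_rate(d), 4))
```

Under these assumptions both quantities decay roughly exponentially with depth, which matches the qualitative trend in the table; the observed rates also include effects this toy model leaves out, such as uneven coverage and base-calling errors.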