David Mester¹, Yefim Ronin¹, Patrick Schnable², Srinivas Aluru³, Abraham Korol¹.
Abstract
Our aim was to develop a fast and accurate algorithm for constructing consensus genetic maps from chip-based SNP genotyping data with a high proportion of markers shared between mapping populations. Chip-based genotyping of SNP markers makes it possible to produce high-density genetic maps with a relatively standardized set of marker loci across different mapping populations. The availability of a standard high-throughput mapping platform simplifies consensus analysis: unique markers can be ignored at the consensus-mapping stage, which reduces the mathematical complexity of the problem and, in turn, allows larger mapping datasets to be analyzed using global rather than local optimization criteria. Our three-phase analytical scheme includes (1) automatic selection of ~100-300 of the most informative (resolvable by recombination) markers per linkage group, (2) building a stable skeletal marker order for each dataset and verifying it with jackknife re-sampling, and (3) consensus mapping analysis based on a global optimization criterion. The novel Evolution Strategy optimization algorithm with a global optimization criterion presented in this paper can generate high-quality, ultra-dense consensus maps with many thousands of markers per genome. The algorithm uses "potentially good orders" both in the initial solution and in the new mutation procedures that generate trial solutions, which makes it possible to obtain a consensus order in reasonable time. Tested on a wide range of simulated data and on real-world data (Arabidopsis), the algorithm outperformed two state-of-the-art algorithms in both mapping accuracy and computation time.
Year: 2015 PMID: 25867943 PMCID: PMC4395089 DOI: 10.1371/journal.pone.0122485
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
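The global criterion and evolution strategy described in the abstract can be illustrated with a minimal sketch. It assumes additive per-population distances (cM) on shared markers only, scores a candidate consensus order by the total length of all population maps (in the spirit of the LCM values in the tables), and, as a plainly swapped-in simplification, uses a random segment reversal in a (1+1) evolution strategy instead of the paper's mutation procedures. All function names and the toy data are hypothetical.

```python
import random

def line_distances(positions):
    """Hypothetical helper: additive distances (cM) between markers placed on a line."""
    return {frozenset((a, b)): abs(positions[a] - positions[b])
            for a in positions for b in positions if a < b}

def population_length(order, dist):
    """Length of one population's map when its markers follow `order`.
    Markers not scored in this population are skipped."""
    scored = {m for pair in dist for m in pair}
    seq = [m for m in order if m in scored]
    return sum(dist[frozenset((a, b))] for a, b in zip(seq, seq[1:]))

def criterion(order, populations):
    """Global criterion: total length of all population maps for one consensus order."""
    return sum(population_length(order, d) for d in populations)

def reverse_segment(order, rng):
    """Stand-in mutation (segment reversal), not the paper's SCMP/RCMP."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

def evolution_strategy(markers, populations, iterations=2000, seed=0):
    """(1+1)-ES: one parent; a child replaces it when it is not worse."""
    rng = random.Random(seed)
    parent = list(markers)
    rng.shuffle(parent)
    best = criterion(parent, populations)
    for _ in range(iterations):
        child = reverse_segment(parent, rng)
        cost = criterion(child, populations)
        if cost <= best:
            parent, best = child, cost
    return parent, best

# Toy example: two populations share five markers but differ in interval lengths.
pop_a = line_distances({"m1": 0, "m2": 10, "m3": 20, "m4": 30, "m5": 40})
pop_b = line_distances({"m1": 0, "m2": 12, "m3": 18, "m4": 33, "m5": 45})
order, total = evolution_strategy(["m1", "m2", "m3", "m4", "m5"], [pop_a, pop_b])
print(order, total)
```

With additive distances, no order can score below the sum of the population map spans (40 + 45 = 85 cM here), so the true order is a global optimum of this toy criterion.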
Fig 1. Representing a set of two individual maps as a directed acyclic graph (DAG).
Two single maps (map1 and map2) are joined into a DAG. The joint map contains two conflicting regions.
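The idea of joining individual maps into one graph can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: each map contributes an edge between every pair of adjacent markers, a joint order exists when Kahn's topological sort visits every marker, and an order conflict between maps shows up as a cycle. The hypothetical `conflicting_pairs` helper lists the shared marker pairs that two maps order differently.

```python
from collections import defaultdict, deque

def build_graph(maps):
    """Join marker orders into one directed graph: edge a -> b for each
    pair of markers adjacent in any single map."""
    succ = defaultdict(set)
    nodes = set()
    for order in maps:
        nodes.update(order)
        for a, b in zip(order, order[1:]):
            succ[a].add(b)
    return nodes, succ

def topological_order(nodes, succ):
    """Kahn's algorithm: a joint order, or None if the graph has a cycle."""
    indeg = {n: 0 for n in nodes}
    for a in list(succ):
        for b in succ[a]:
            indeg[b] += 1
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    result = []
    while queue:
        n = queue.popleft()
        result.append(n)
        for b in sorted(succ[n]):
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    return result if len(result) == len(nodes) else None

def conflicting_pairs(map1, map2):
    """Shared marker pairs whose relative order differs between two maps."""
    pos2 = {m: i for i, m in enumerate(map2)}
    shared = [m for m in map1 if m in pos2]
    return [(a, b) for i, a in enumerate(shared) for b in shared[i + 1:]
            if pos2[a] > pos2[b]]

map1 = ["A", "B", "C", "D", "E", "F"]
map2 = ["A", "C", "B", "D", "F", "E"]
print(topological_order(*build_graph([map1, map2])))  # None: the maps conflict
print(conflicting_pairs(map1, map2))  # [('B', 'C'), ('E', 'F')]
```

The toy maps contain two conflicting regions (B/C and E/F); compatible maps such as `["A", "B", "D"]` and `["B", "C", "D"]` instead yield the joint order `["A", "B", "C", "D"]`.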
Fig 2. Main features and fields of application of the two optimization methods (global heuristic and local exact) for solving multilocus consensus mapping problems.
Fig 3. The idea of the new mutation procedures based on the list of neighbors L.
(A) Sequential constructing mutation procedure (SCMP): SCMP generates a new random sequence of markers (i-k-m-j) starting from a randomly selected marker (i) of the solution vector g. (B) Reference-based constructing mutation procedure (RCMP): RCMP reinserts two randomly chosen markers (i) and (m) of g at other positions.
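The two procedures in Fig 3 can be sketched in a simplified form. The details are assumptions rather than the authors' implementation: L is taken to be each marker's k nearest markers under some distance, the SCMP-like move grows a chain through unused neighbors and splices it in at the start marker's position, and the RCMP-like move reinserts each of two random markers directly before or after one of its neighbors. Function names and the toy distance are hypothetical.

```python
import random

def neighbor_lists(markers, dist, k=3):
    """Hypothetical list L: the k closest markers to each marker."""
    return {m: sorted((x for x in markers if x != m), key=lambda x: dist(m, x))[:k]
            for m in markers}

def scmp_like(g, L, length, rng):
    """SCMP-style sketch: starting at a random marker i, walk through unused
    neighbors from L to build a chain i-k-m-..., then splice the chain into g
    where i used to be."""
    i = rng.choice(g)
    chain = [i]
    while len(chain) < length:
        options = [x for x in L[chain[-1]] if x not in chain]
        if not options:
            break
        chain.append(rng.choice(options))
    rest = [m for m in g if m not in chain]
    cut = sum(1 for m in g[:g.index(i)] if m not in chain)  # markers left of i
    return rest[:cut] + chain + rest[cut:]

def rcmp_like(g, L, rng):
    """RCMP-style sketch: remove two random markers and reinsert each one
    directly before or after one of its neighbors from L."""
    child = list(g)
    for m in rng.sample(g, 2):
        child.remove(m)
        anchors = [x for x in L[m] if x in child] or child
        anchor = rng.choice(anchors)
        child.insert(child.index(anchor) + rng.randint(0, 1), m)
    return child

def dist(a, b):
    """Toy distance: marker mj sits at position j."""
    return abs(int(a[1:]) - int(b[1:]))

markers = [f"m{j}" for j in range(8)]
L = neighbor_lists(markers, dist, k=2)
rng = random.Random(1)
g = markers[:]
rng.shuffle(g)
print(scmp_like(g, L, 4, rng))
print(rcmp_like(g, L, rng))
```

Both moves return a permutation of g; restricting the new adjacencies to neighbors from L is what biases the trial solutions toward "potentially good orders" in this sketch.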
Testing the proposed algorithm on datasets of Group 1.
| Name of dataset | NS | FRS | LCM, cM | K | CPU, sec | Number of errors |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Ex1-1 | 40.72 | 161.78 | 41.30 | 0.98 | 4 | 1 |
| Ex2-1 | 39.74 | 157.35 | 40.20 | 0.97 | 4 | 2 |
| Ex3-1 | 40.51 | 160.58 | 41.10 | 0.98 | 4 | 1 |
| Ex4-1 | 40.19 | 161.67 | 40.64 | 0.98 | 8 | 1 |
| Ex5-1 | 41.16 | 162.25 | 41.85 | 0.98 | 5 | 1 |
| Ex6-1 | 39.97 | 159.26 | 40.58 | 1.00 | 3 | 0 |
| Ex7-1 | 41.34 | 157.43 | 41.78 | 0.97 | 3 | 2 |
| Ex8-1 | 39.60 | 154.08 | 40.13 | 1.00 | 2 | 0 |
| Ex9-1 | 42.30 | 160.10 | 42.86 | 0.97 | 3 | 2 |
| Ex10-1 | 39.67 | 158.27 | 40.28 | 1.00 | 4 | 0 |
Each dataset of Group 1 contains five subsets of shared markers scored without errors, with different distributions of recombination rates and interference values along the chromosome.
In the table, NS is the sum of lengths of the non-synchronized maps, FRS is the sum of lengths of the initial (random) consensus solution, LCM is the sum of lengths of the optimal consensus maps, and K is the coefficient of recovery of the simulated marker order.
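NS, FRS and LCM are all sums of map lengths. As a hedged illustration of how such a sum can be computed for one candidate order, the sketch below converts the recombination fraction of each adjacent marker pair to cM with the Kosambi mapping function and adds the lengths up over populations. The document does not state which mapping function was used, so Kosambi is an assumption, as are the input format and the function names.

```python
import math

def kosambi_cm(r):
    """Kosambi mapping function: recombination fraction r (0 <= r < 0.5) to cM."""
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def map_length(order, rec_frac):
    """Length (cM) of one population's map with markers in `order`.
    rec_frac maps (a, b) marker pairs to recombination fractions and is
    assumed to cover every pair of markers scored in this population."""
    scored = {m for pair in rec_frac for m in pair}
    seq = [m for m in order if m in scored]
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        r = rec_frac[(a, b)] if (a, b) in rec_frac else rec_frac[(b, a)]
        total += kosambi_cm(r)
    return total

def total_length(order, populations):
    """Sum of population map lengths for one consensus order (an LCM-style total)."""
    return sum(map_length(order, rf) for rf in populations)

pop1 = {("a", "b"): 0.10, ("b", "c"): 0.10, ("a", "c"): 0.18}
pop2 = {("a", "b"): 0.05, ("b", "c"): 0.12, ("a", "c"): 0.16}
print(round(total_length(["a", "b", "c"], [pop1, pop2]), 2))
print(round(total_length(["a", "c", "b"], [pop1, pop2]), 2))
```

In this toy input the order a-b-c gives a shorter total than a-c-b, which is why minimizing the summed length favors orders consistent with the recombination data.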
Testing the algorithm on datasets of Group 2.
| Name of dataset | NS | FRS | LCM, cM | K | CPU, sec | Number of errors |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Ex1-2 | 32.5 | 118.6 | 32.5 | 0.79 | 6 | 13 |
| Ex2-2 | 31.7 | 116.9 | 31.8 | 0.87 | 7 | 7 |
| Ex3-2 | 32.9 | 117.4 | 32.9 | 0.84 | 18 | 10 |
| Ex4-2 | 32.7 | 118.9 | 32.8 | 0.87 | 7 | 7 |
| Ex5-2 | 33.2 | 120.7 | 33.3 | 0.86 | 50 | 8 |
| Ex6-2 | 32.3 | 118.7 | 32.4 | 0.90 | 5 | 6 |
| Ex7-2 | 33.2 | 116.4 | 33.3 | 0.86 | 3 | 8 |
| Ex8-2 | 31.8 | 114.8 | 31.9 | 0.87 | 19 | 7 |
| Ex9-2 | 34.5 | 118.5 | 34.7 | 0.84 | 7 | 10 |
| Ex10-2 | 31.4 | 118.0 | 31.4 | 0.84 | 5 | 10 |
Each dataset of Group 2 contains markers of five mapping populations, with a total of 100 shared markers scored with errors and missing data points, and with different distributions of recombination rates and interference values along the chromosome.
In the table, NS is the sum of lengths of the non-synchronized maps, FRS is the sum of lengths of the first (random) consensus solution, LCM is the sum of lengths of the optimal consensus maps, and K is the coefficient of recovery of the simulated marker order.
Results of consensus mapping on the simulated problems consisting of five sets of 200 (Ex3) and 500 (Ex4) shared markers.
| Name of dataset | Number of markers | Simulated map length, cM | LCM, cM (global optimization with RMP + SCMP + RCMP) | K (global optimization with RMP + SCMP + RCMP) |
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| Ex3 | 200 | 62.9 | 63.4 | 0.84 |
| Ex4 | 500 | 153.5 | 154.0 | 0.89 |
Datasets Ex3 and Ex4 contain markers scored with errors and missing data, with different distributions of recombination rates and interference values along the chromosome.
In the table, LCM is the sum of lengths of the optimal consensus maps, and K is the coefficient of recovery of the simulated marker order.
Comparative effectiveness of the initial solutions and the local search procedures on the simulated problems.
| Name of dataset | Size of the dataset | CPU time to reach the best solution, sec: 3M¹ | CPU time to reach the best solution, sec: Int² | CPU time to reach the best solution, sec: Int + 3M + LS³ |
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| Ex1-1 | 5×100 | 6.00 | 2.30 | 0.39 |
| Ex2-1 | 5×100 | 7.00 | 0.11 | 0.40 |
| Ex3-1 | 5×100 | 18.00 | 1.12 | 0.37 |
| Ex4-1 | 5×100 | 7.00 | 0.55 | 0.53 |
| Ex5-1 | 5×100 | 50.00 | 12.04 | 0.42 |
| Ex6-1 | 5×100 | 5.00 | 1.59 | 0.77 |
| Ex7-1 | 5×100 | 3.00 | 5.11 | 0.44 |
| Ex8-1 | 5×100 | 19.00 | 13.87 | 0.39 |
| Ex9-1 | 5×100 | 7.00 | 4.95 | 0.43 |
| Ex10-1 | 5×100 | 5.00 | 12.08 | 0.43 |
| Ex1-2 | 5×100 | 4.00 | 1.26 | 0.42 |
| Ex2-2 | 5×100 | 4.00 | 0.68 | 0.46 |
| Ex3-2 | 5×100 | 4.00 | 1.31 | 0.40 |
| Ex4-2 | 5×100 | 8.00 | 0.62 | 0.44 |
| Ex5-2 | 5×100 | 5.00 | 1.58 | 0.48 |
| Ex6-2 | 5×100 | 3.00 | 1.93 | 0.42 |
| Ex7-2 | 5×100 | 3.00 | 2.35 | 0.40 |
| Ex8-2 | 5×100 | 2.00 | 1.64 | 0.45 |
| Ex9-2 | 5×100 | 3.00 | 1.80 | 0.51 |
| Ex10-2 | 5×100 | 4.00 | 1.38 | 0.44 |
| Average | - | 8.35 | 3.51 | 0.42 |
| Ex3 | 5×200 | 151.00 | 28.00 | 1.98 |
| Ex4 | 5×500 | 8080.00 | 680.00 | 11.70 |
Using the initial solution step (column 4) and the local search (column 5) considerably reduces computation time on the test problems.
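To put "considerably" into numbers, the ratios between the timing columns can be computed directly from the table. The snippet below only does arithmetic on the published values (nothing is re-run); the dictionary name and row selection are illustrative.

```python
# CPU times (sec) copied from the table: columns 3 (3M), 4 (Int), 5 (Int + 3M + LS)
TIMES = {
    "Average": (8.35, 3.51, 0.42),
    "Ex3": (151.00, 28.00, 1.98),
    "Ex4": (8080.00, 680.00, 11.70),
}

def speedups(times):
    """Ratio of columns 3 and 4 to column 5 for each row."""
    return {name: (m3 / full, init / full) for name, (m3, init, full) in times.items()}

for name, (vs_3m, vs_int) in speedups(TIMES).items():
    print(f"{name}: {vs_3m:.0f}x vs 3M, {vs_int:.0f}x vs Int")
```

On the Average row the full scheme is roughly 20 times faster than the three mutation procedures alone, and on Ex4 (5×500) about 690 times faster, so the gain grows with problem size.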
1 The three mutation procedures are applied.
2 The initial solution is used.
3 The local search is used.
Comparing the efficiency of three consensus mapping algorithms.
| Problem | Synchronized-TSP errors | Synchronized-TSP K | Synchronized-TSP CPU, sec | ILPMap errors | ILPMap K | ILPMap CPU, sec | MergeMap errors | MergeMap K | MergeMap CPU, sec |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Ex1-1 | 1 | 0.98 | 0.39 | 1 | 0.98 | 1.0 | 4 | 0.92 | 357 |
| Ex2-1 | 2 | 0.97 | 0.40 | | | 1.0 | 7 | 0.87 | 38 |
| Ex3-1 | 1 | 0.98 | 0.37 | 1 | 0.98 | 1.0 | 6 | 0.88 | 102 |
| Ex4-1 | 1 | 0.98 | 0.53 | 1 | 0.98 | 1.0 | 6 | 0.88 | 132 |
| Ex5-1 | | | 0.42 | 7 | 0.87 | 1.0 | 12 | 0.80 | 240 |
| Ex6-1 | | | 0.77 | 4 | 0.92 | 1.0 | 5 | 0.90 | 90 |
| Ex7-1 | 2 | 0.97 | 0.44 | | | 1.0 | 2 | 0.96 | 87 |
| Ex8-1 | 0 | 1.00 | 0.39 | 0 | 1.00 | 1.0 | 5 | 0.90 | 94 |
| Ex9-1 | | | 0.43 | 4 | 0.92 | 1.0 | 9 | 0.85 | 70 |
| Ex10-1 | | | 0.43 | na³ | - | - | 9 | 0.85 | 60 |
| Ex1-2 | | | 0.42 | 15 | 0.70 | 1.0 | 14 | 0.73 | 6 |
| Ex2-2 | 7 | 0.87 | 0.46 | 7 | 0.87 | 1.0 | 7 | 0.87 | 280 |
| Ex3-2 | 10 | 0.84 | 0.40 | 10 | 0.84 | 1.0 | 14 | 0.73 | 2 |
| Ex4-2 | 7 | 0.87 | 0.44 | 7 | 0.87 | 1.0 | 8 | 0.86 | 7 |
| Ex5-2 | 8 | 0.86 | 0.48 | 8 | 0.86 | 1.0 | 14 | 0.73 | 29 |
| Ex6-2 | 6 | | 0.42 | 6 | 0.88 | 1.0 | 8 | 0.86 | 21 |
| Ex7-2 | | | 0.40 | na³ | - | - | 11 | 0.82 | 7 |
| Ex8-2 | | | 0.45 | 10 | 0.84 | 1.0 | 9 | 0.85 | 3 |
| Ex9-2 | 10 | 0.84 | 0.51 | 10 | 0.84 | 1.0 | 10 | 0.84 | 1 |
| Ex10-2 | 10 | 0.84 | 0.44 | 11 | 0.82 | 1.0 | 10 | 0.84 | 5 |
| Ex3 | | | 1.98 | 55 | 0.51 | 3.0 | 20 | 0.78 | 40 |
| Ex4 | | | 11.70 | 48 | 0.75 | 10.0 | 38 | 0.80 | 270 |
| Real data | | | | 1 | 0.96 | 5.0 | | | 200 |
| Average | | | | 9.8 | 0.873 | 1.65 | 9.9 | 0.85 | 93.1 |
| P-value⁴ | | | | 0.008 | 0.015 | 0.0005 | 0.0002 | 0.0002 | 0.00003 |
1 Two adjacent markers in erroneous order.
2 A sequence of 3–6 markers in erroneous order.
3 Not available: no solution was returned by ILPMap.
4 Compared with Synchronized-TSP using the Wilcoxon matched-pairs test [26].
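The significance values in the comparison table come from a Wilcoxon matched-pairs (signed-rank) test against Synchronized-TSP. The document does not state which variant (exact or normal approximation) or exactly which problems entered the test, so the sketch below uses the large-sample normal approximation with a tie correction and, as an assumption, only the 22 simulated problems whose CPU times are listed for both Synchronized-TSP and MergeMap. Its p-value is therefore not expected to reproduce the tabulated one; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon matched-pairs signed-rank test using the normal
    approximation with a tie correction (no continuity correction).
    Returns (W_plus, p_value) for the differences y - x."""
    d = [b - a for a, b in zip(x, y) if b != a]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tied group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, dk in zip(ranks, d) if dk > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

# CPU times (sec) of the 22 simulated problems (Ex1-1 ... Ex10-2, Ex3, Ex4), from the table
sync_tsp = [0.39, 0.40, 0.37, 0.53, 0.42, 0.77, 0.44, 0.39, 0.43, 0.43, 0.42,
            0.46, 0.40, 0.44, 0.48, 0.42, 0.40, 0.45, 0.51, 0.44, 1.98, 11.70]
mergemap = [357, 38, 102, 132, 240, 90, 87, 94, 70, 60, 6,
            280, 2, 7, 29, 21, 7, 3, 1, 5, 40, 270]
w_plus, p = wilcoxon_signed_rank(sync_tsp, mergemap)
print(w_plus, p)
```

Because MergeMap is slower on every one of these problems, all differences are positive and W+ takes its maximum value, 22·23/2 = 253, which gives a very small p-value under either variant of the test.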