Olivier Demeure, Frédéric Lecerf.
Abstract
BACKGROUND: The recent sequencing of full genomes has made many SNP markers available, which are very useful for mapping complex traits. In livestock production, there are still no commercial arrays, and many studies use home-made sets of SNPs. Because current SNP genotyping methodologies are still expensive, selecting which SNPs to use is a crucial step. Indeed, the main factors affecting the power of linkage analyses are the density of the genetic map and the heterozygosity of markers in the tested animal parents.
Year: 2008 PMID: 18710478 PMCID: PMC2525642 DOI: 10.1186/1756-0500-1-9
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Principles of MarkerSet and main parameters. a) MarkerSet selects markers in two windows separated by the Average Marker Interval (AMI), which is the whole genome size divided by the number of markers to select. The window size is a percentage of the AMI (20% by default). Iteratively shifting the windows by the AMI gives full genome coverage. Different sets are created by using all the possible starting points (x and y). b) Several parameters and options are available to improve the quality of the sets. The space_plus and space_resampling parameters are used to enlarge the window when informativity is low or absent. By default, space_plus is set to 50% of the window size on each side; this enlargement is performed automatically if the informativity of the available markers is lower than the defined informativity threshold. In resampling mode, space_resampling is used to enlarge the window iteratively (by default +1 cM on each side at each step) until markers with informativity higher than the defined resampling threshold are found.
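The window layout described in panel a) can be illustrated with a minimal sketch. It assumes a single linear map measured in cM and a single starting point; the function names, the 3840 cM genome size and the boundary handling are hypothetical choices made for illustration, not MarkerSet's actual code.

```python
def average_marker_interval(genome_size_cm, n_markers):
    """AMI: whole genome size divided by the number of markers to select."""
    return genome_size_cm / n_markers


def selection_windows(genome_size_cm, n_markers, start_cm=0.0, window_pct=20.0):
    """Return (left, right) windows whose width is window_pct% of the AMI.

    Consecutive windows are shifted by one AMI, starting at start_cm.
    Clipping at the end of the map is an assumption of this sketch.
    """
    ami = average_marker_interval(genome_size_cm, n_markers)
    width = ami * window_pct / 100.0
    windows = []
    k = 0
    while True:
        left = start_cm + k * ami  # integer step index avoids drift
        if left >= genome_size_cm:
            break
        windows.append((left, min(left + width, genome_size_cm)))
        k += 1
    return windows


# Hypothetical numbers: a 3840 cM genome and 384 requested markers.
windows = selection_windows(3840.0, 384)
print(len(windows), windows[:2])
```

Changing `start_cm` mimics the use of different starting points to build alternative sets. The window enlargement controlled by space_plus and space_resampling is not covered here.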
Figure 2. Working principle of MarkerSet.
Figure 3. Computation of informativity weight. The vertical axis represents the informativity weight and the horizontal axis the informativity values. Empirically, this sigmoid scale is obtained by applying the arctangent function to values between -5 and +5 (giving transformed informativity scores from -1.37 to +1.37). For each experimental design, the different informativity values are re-assigned to a -5 to +5 scale. Let X = {X_0, X_1, ..., X_n} denote the informativity status values, where n is the number of tested animals in one experimental design. Each informativity value is determined as X_i = X_{i-1} + 10/n, with X_0 = -5, so that the values fit a scale from -5 to +5. The informativity scores then range from -1.37 to +1.37 (the arctangents of -5 and +5, respectively). Finally, the scores are shifted to a 0 to 2.74 range so that all score values are positive.
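The weighting described in this caption can be sketched as follows. The function name is hypothetical, and the shift by arctan(5) is an assumption based on the stated 0 to 2.74 range; the sketch only follows the recurrence X_i = X_{i-1} + 10/n with X_0 = -5.

```python
import math


def informativity_weights(n_animals):
    """Weights for 0..n informative animals on the shifted arctangent scale."""
    offset = math.atan(5.0)  # about 1.37; shifts scores to a positive range
    weights = []
    for i in range(n_animals + 1):
        x = -5.0 + i * 10.0 / n_animals  # X_0 = -5, X_n = +5
        weights.append(math.atan(x) + offset)
    return weights


# Example: a design with 4 tested animals gives 5 informativity levels.
for level, weight in enumerate(informativity_weights(4)):
    print(level, round(weight, 2))
```

Because the arctangent is steep near zero and flat at the extremes, intermediate informativity levels are separated most strongly, while the lowest and highest levels saturate near 0 and about 2.74.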
Figure 4. Marker informativity distribution of the experimental designs. Each bar represents the number of markers for each informativity value in each experimental design.
Testing results for simulated data files, requesting 384 and 1536 markers.
| | | Score | Ratio | -r gain | Score | Ratio | -r gain |
|---|---|---|---|---|---|---|---|
| Basic | HI | 1002.86 | | | 1051.61 | | |
| | VI | 927.44 | | | 1047.9 | | |
| | LI | 317.92 | | | 500 | | |
| R | HI | 1008.3 | | 0.54% | 1051.61 | | 0% |
| | VI | 932.38 | | 0.53% | 1047.9 | | 0% |
| | LI | 320.59 | | 0.84% | 500 | | 0% |
| MD | MD | 1879.48 | | | 3513.19 | | |
| | HI | 2774.49 | 0.99 | | 3959.82 | 0.96 | |
| | VI | 1940.27 | 0.96 | | 3760.68 | 0.94 | |
| | LI | 428.14 | 0.80 | | 769.18 | 0.49 | |
| R + MD | MD | 2428.19 | | 29.19% | 3513.19 | | 0% |
| | HI | 3487.45 | 0.98 | 25.70% | 3959.82 | 0.96 | 0% |
| | VI | 2519.01 | 0.82 | 29.83% | 3760.68 | 0.94 | 0% |
| | LI | 539.56 | 0.53 | 26.02% | 769.18 | 0.49 | 0% |
Six files with different marker informativity status were generated by varying two variables. The first is the marker density (5K markers, LD for low density, or 40K markers, HD for high density, spread homogeneously over the genome). The second is the marker informativity distribution. Considering a total of 100 reference animals, the following conditions were explored: markers with heterozygosity values ranging from 50 to 100 (High Informativity, HI), 0 to 100 (Various Informativity, VI) or 0 to 50 (Low Informativity, LI). For each marker panel and condition, the maximal available informativity score (max info), the selected set score, the multidesign/monodesign ratio and the score gain obtained by using the resampling option (-r gain) are detailed. R and MD refer to activation of the resampling option and of the multidesign option, respectively. Scores depend on marker density and informativity distribution (higher with HI files and lower with LI files). Nevertheless, there is only a slight score difference between HI and VI, showing the efficiency of MarkerSet in selecting the most informative markers. The resampling option is more useful with LD files but can affect the loss of informativity (Ratio) in multidesign mode with the LI file.
Testing results for the real data set, requesting 384 and 1536 markers.
| | | Max info | Score | Ratio | -r gain | Dmax | Dmin | AveD | StD | Markers | 0 markers |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Basic | Exp1 | 3509.86 | 528.67 | | | 28.6 | 0.4 | 9.2 | 2.2 | 380 | 72 |
| | Exp2 | 5958.31 | 808.76 | | | 28.7 | 0.4 | 9.2 | 2.2 | 380 | 32 |
| | Exp3 | 5685.11 | 680.55 | | | 20.6 | 0.6 | 9.2 | 2 | 380 | 10 |
| | Exp4 | 6503.60 | 785.52 | | | 17.8 | 0.2 | 9.1 | 2.1 | 382 | 7 |
| | Exp5 | 5293.64 | 673.66 | | | 17.5 | 0.2 | 9.1 | 2 | 382 | 26 |
| R | Exp1 | | 605.03 | | 14.44% | 90.3 | 0.4 | 9.5 | 5.8 | 366 | 0 |
| | Exp2 | | 887.82 | | 9.78% | 20.6 | 0.3 | 9.2 | 2.5 | 383 | 0 |
| | Exp3 | | 701.85 | | 3.13% | 16.8 | 0.5 | 9.1 | 2.1 | 384 | 0 |
| | Exp4 | | 803.9 | | 2.34% | 16.6 | 0 | 9.1 | 2.1 | 384 | 0 |
| | Exp5 | | 713.43 | | 5.90% | 26.9 | 0.5 | 9.2 | 2.5 | 380 | 0 |
| MD | MD | 3581.64 | 979.58 | | | 11.7 | 0.1 | 2.5 | 0.9 | 1461 | 137 |
| | Exp1 | | 801.79 | 0.81 | | | | | | | |
| | Exp2 | | 1512.85 | 0.92 | | | | | | | |
| | Exp3 | | 1282.7 | 0.89 | | | | | | | |
| | Exp4 | | 1494.43 | 0.9 | | | | | | | |
| | Exp5 | | 1229.34 | 0.89 | | | | | | | |
| R + MD | MD | | 1114.33 | | 13.76% | 11.3 | 0.1 | 2.4 | 0.9 | 1483 | 0 |
| | Exp1 | | 898.48 | 0.47 | 12.06% | | | | | | |
| | Exp2 | | 1720.82 | 0.6 | 13.75% | | | | | | |
| | Exp3 | | 1446.47 | 0.72 | 12.77% | | | | | | |
| | Exp4 | | 1695.42 | 0.73 | 13.45% | | | | | | |
| | Exp5 | | 1400.07 | 0.63 | 13.89% | | | | | | |
The data file includes the genotypes of 9216 SNPs covering the whole genome for the 26 F1 sires of five real chicken F2 designs (4 in Exp1, 5 each in Exp3 and Exp5, and 6 each in Exp2 and Exp4). For each marker panel and condition, the following are detailed: the maximal available informativity score (max info), the selected set score, the multidesign/monodesign ratio, the score gain obtained by using the resampling option (-r gain), the maximal (Dmax), minimal (Dmin), average (AveD) and standard deviation (StD) of the distances between two markers, the number of selected markers, and the number of non-informative markers in the set. R and MD refer to activation of the resampling option and of the multidesign option, respectively. With the resampling option, the gain is inversely proportional to the maximum informativity, except for Exp2, because of an overrepresentation of markers heterozygous for 0 and 6 animals in this experimental design. The results for multidesign mode (1536 markers) are similar to those obtained with the 5K markers file: the ratio is about 0.90, and the resampling option increases the number of selected markers (and thus the final score) without significantly modifying the average distance or the standard deviation.
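As a worked check of the "-r gain" column, the sketch below assumes the gain is the relative increase of the set score when the resampling option is activated. This formula is inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
def resampling_gain_pct(score_basic, score_resampled):
    """Relative score increase (in %) when resampling is switched on."""
    return 100.0 * (score_resampled - score_basic) / score_basic


# Exp1, 384 markers: Basic score 528.67, R score 605.03 (table reports 14.44%).
print(round(resampling_gain_pct(528.67, 605.03), 2))

# Multidesign, 1536 markers: MD score 979.58, R + MD score 1114.33
# (table reports 13.76%).
print(round(resampling_gain_pct(979.58, 1114.33), 2))
```

Under this assumption, both computed values agree with the reported gains to two decimal places.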
Figure 5. Impact of window size on informativity score and standard deviation. The horizontal axis represents the percentage of the AMI used to define the window size (15 to 40%). The left vertical axis represents the best marker set score (full squares), and the right vertical axis the standard deviation (white diamonds). The simulation was performed on experimental design 1, with 384 markers requested and without the resampling option.