Ozren Polasek, Caroline Hayward, Celine Bellenguez, Veronique Vitart, Ivana Kolcić, Ruth McQuillan, Vanja Saftić, Ulf Gyllensten, James F Wilson, Igor Rudan, Alan F Wright, Harry Campbell, Anne-Louise Leutenegger.
Abstract
BACKGROUND: Genome-wide homozygosity estimation from genomic data is an increasingly active research topic. The aim of this study was to compare different methods for estimating individual homozygosity-by-descent from human genome-wide scans rather than from genealogies. We considered the four most commonly used methods and investigated their applicability to single-nucleotide polymorphism (SNP) data, both in a simulation study and in real human genotype data. A total of 986 inhabitants of the isolated island of Vis, Croatia (where inbreeding is present, but no pedigree-based inbreeding coefficient above F = 0.0625 was observed) were included in this study. All individuals were genotyped with the Illumina HumanHap300 array, which assays 317,503 SNP markers.
Year: 2010 PMID: 20184767 PMCID: PMC2848240 DOI: 10.1186/1471-2164-11-139
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
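The F > 0.0625 cut-off in the abstract is the inbreeding coefficient expected for a child of first cousins, and the simulated 1C, 2C and 3C scenarios correspond to progressively smaller expected values. A minimal sketch using Wright's path-counting rule, assuming full cousins who share one non-inbred ancestral couple; the function name is hypothetical and is not taken from the study.

```python
from fractions import Fraction


def expected_f_cousin_offspring(k: int) -> Fraction:
    """Wright's path-counting inbreeding coefficient for the child of full k-th cousins.

    Assumptions: the parents share one ancestral couple (two common ancestors),
    neither common ancestor is inbred (F_A = 0), and there is no other relatedness.
    Each loop through one common ancestor contains 2k + 3 individuals
    (k + 2 on each side, counting the shared ancestor once).
    """
    if k < 1:
        raise ValueError("k must be at least 1 (1 = first cousins)")
    individuals_in_loop = 2 * k + 3
    per_common_ancestor = Fraction(1, 2) ** individuals_in_loop
    # Two common ancestors, each contributing one loop of the same length.
    return 2 * per_common_ancestor


for k, label in [(1, "1C"), (2, "2C"), (3, "3C")]:
    f = expected_f_cousin_offspring(k)
    print(label, f, float(f))
```

Under these assumptions the function gives 1/16, 1/64 and 1/256 (0.0625, about 0.0156 and about 0.0039), close to the mean True HBD values of 0.062, 0.015 and 0.004 reported for the simulated offspring below.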
Grandparental birthplace clusters of examinees: seven groups with progressively decreasing expected individual genome-wide homozygosity (after removal of 63 individuals with more than 5% missing genotypes)
| Group | Description | N | % |
|---|---|---|---|
| I | All four grandparents from Okljucna | 17 | 1.8 |
| II | All four grandparents from Komiza | 244 | 26.4 |
| III | All four grandparents from the central villages | 68 | 7.4 |
| IV | All four grandparents from Vis | 115 | 12.5 |
| V | Mixed origin (at least one grandparent from the island) | 229 | 24.8 |
| VI | All four grandparents from the rest of Croatia | 200 | 21.7 |
| VII | All four grandparents from other countries | 50 | 5.4 |
| Total | | 923 | 100.0 |
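As a consistency check on this table, the group sizes sum to the 986 recruited inhabitants minus the 63 excluded individuals, and each percentage is the group size divided by that total. A minimal sketch; the variable names are illustrative only.

```python
# Group sizes (N) as listed in the grandparental-birthplace table.
group_sizes = {"I": 17, "II": 244, "III": 68, "IV": 115, "V": 229, "VI": 200, "VII": 50}

recruited = 986   # inhabitants included in the study
excluded = 63     # removed for more than 5% missing genotypes
analysed = recruited - excluded

total = sum(group_sizes.values())
# Percentage of the analysed sample in each group, rounded to one decimal as in the table.
percentages = {group: round(100 * n / total, 1) for group, n in group_sizes.items()}

print(total, analysed, percentages)
```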
Figure 1. Geographical position and settlements on Vis Island, Croatia.
Simulation results for offspring of first cousins (1C), second cousins (2C) and third cousins (3C).
| Relationship | Estimate | Mean [95% CI] | r with True HBD | r with FPLINK | r with FADC | r with FEstimSPT | r with FEstim |
|---|---|---|---|---|---|---|---|
| 1C | True HBD | 0.062 [0.019-0.121] | 1.00 | 0.82 | 0.82 | 0.86 | 0.91 |
| | FPLINK | 0.063 [0.005-0.129] | 0.82 | 1.00 | 1.00 | 0.95 | 0.78 |
| | FADC | 0.063 [0.006-0.129] | 0.82 | 1.00 | 1.00 | 0.95 | 0.78 |
| | FEstimSPT | 0.063 [0.004-0.128] | 0.86 | 0.95 | 0.95 | 1.00 | 0.82 |
| | FEstim | 0.063 [0.020-0.115] | 0.91 | 0.78 | 0.78 | 0.82 | 1.00 |
| 2C | True HBD | 0.015 [0.000-0.045] | 1.00 | 0.51 | 0.51 | 0.56 | 0.87 |
| | FPLINK | 0.018 [0.000-0.059] | 0.51 | 1.00 | 0.99 | 0.82 | 0.52 |
| | FADC | 0.018 [0.000-0.059] | 0.51 | 0.99 | 1.00 | 0.83 | 0.52 |
| | FEstimSPT | 0.016 [0.000-0.055] | 0.56 | 0.82 | 0.83 | 1.00 | 0.58 |
| | FEstim | 0.016 [0.000-0.045] | 0.87 | 0.52 | 0.52 | 0.58 | 1.00 |
| 3C | True HBD | 0.004 [0.000-0.020] | 1.00 | 0.22 | 0.22 | 0.26 | 0.77 |
| | FPLINK | 0.009 [0.000-0.041] | 0.22 | 1.00 | 0.97 | 0.70 | 0.25 |
| | FADC | 0.009 [0.000-0.040] | 0.22 | 0.97 | 1.00 | 0.71 | 0.26 |
| | FEstimSPT | 0.007 [0.000-0.035] | 0.26 | 0.70 | 0.71 | 1.00 | 0.30 |
| | FEstim | 0.004 [0.000-0.023] | 0.77 | 0.25 | 0.26 | 0.30 | 1.00 |
Negative FPLINK and FADC values were set to zero before computing these correlations, to allow comparison across methods, since FEstim does not produce negative values.
Correlations with the "True HBD" row compare each method with the true proportion of markers that are homozygous by descent.
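The footnotes above describe clipping negative moment-based estimates to zero before correlating them with FEstim. A minimal sketch of that step, assuming Pearson's r (the source does not state which correlation coefficient was used); the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def clip_negative(estimates):
    """Set negative F values (possible for FPLINK and FADC) to zero, matching FEstim's range."""
    return [max(0.0, f) for f in estimates]


def correlate_with_clipping(f_moment, other):
    """Correlate clipped moment-based estimates with another method's estimates."""
    return pearson(clip_negative(f_moment), other)


# Toy values for illustration only, not study data.
f_plink_toy = [-0.02, 0.0, 0.05, 0.10]
f_estim_toy = [0.0, 0.0, 0.05, 0.10]
print(correlate_with_clipping(f_plink_toy, f_estim_toy))
```

Clipping discards the information carried by the negative values, so a correlation computed on clipped estimates will generally differ from one computed on the raw values.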
Descriptive statistics of the homozygosity estimates for each method and marker set in the Vis Island dataset
| Method | Mean | St. deviation | Range | Minimum | Maximum |
|---|---|---|---|---|---|
| MLH | 0.354 | 0.005 | 0.040 | 0.325 | 0.365 |
| MLH, M0.1 | 0.390 | 0.006 | 0.044 | 0.360 | 0.405 |
| MLH, M0.05 | 0.360 | 0.005 | 0.042 | 0.329 | 0.371 |
| FPLINK | 0.009 | 0.014 | 0.116 | -0.021 | 0.094 |
| FPLINK, M0.1 | 0.009 | 0.014 | 0.112 | -0.021 | 0.090 |
| FPLINK, M0.05 | 0.009 | 0.016 | 0.110 | -0.026 | 0.084 |
| FADC | 0.009 | 0.015 | 0.112 | -0.021 | 0.091 |
| FADC, M0.1 | 0.009 | 0.014 | 0.113 | -0.023 | 0.083 |
| FADC, M0.05 | 0.008 | 0.016 | 0.109 | -0.026 | 0.083 |
| FEstimSPT | 0.007 | 0.011 | 0.083 | 0.000ᵃ | 0.083 |
| FEstimSPT, M0.1 | 0.008 | 0.012 | 0.079 | 0.000ᵃ | 0.079 |
| FEstimSPT, M0.05 | 0.008 | 0.011 | 0.084 | 0.000ᵃ | 0.084 |
| FEstim, M0.1 | 0.017 | 0.010 | 0.086 | 0.000ᵃ | 0.086 |
| FEstim, M0.05 | 0.009 | 0.011 | 0.080 | 0.000ᵃ | 0.080 |
ᵃ By construction, FEstimSPT and FEstim estimates lie between zero and one.
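The MLH and FPLINK rows can be related to simple per-individual genotype counts. A minimal sketch, assuming MLH denotes multilocus heterozygosity (the share of typed SNPs that are heterozygous) and that FPLINK is the method-of-moments estimate F = (O - E) / (N - E) reported by PLINK's `--het` option; PLINK's own implementation may differ in details such as how allele frequencies and expected counts are computed. FADC and FEstim are not covered, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Genotype coded as the count (0, 1 or 2) of one chosen allele; None marks a missing call.
Genotype = Optional[int]


def allele_frequencies(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Sample frequency of the counted allele at each SNP, ignoring missing calls."""
    n_snps = len(genotypes[0])
    freqs = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


def mlh_and_moment_f(row: Sequence[Genotype], freqs: Sequence[float]) -> Tuple[float, float]:
    """Return (MLH, F) for one individual.

    MLH: share of typed SNPs that are heterozygous.
    F:   (O - E) / (N - E), with O observed and E expected homozygous SNP counts
         under Hardy-Weinberg proportions, and N the number of typed SNPs.
    """
    n_typed = 0
    observed_hom = 0
    expected_hom = 0.0
    for g, p in zip(row, freqs):
        if g is None:
            continue  # skip missing genotypes for this individual
        n_typed += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    mlh = 1.0 - observed_hom / n_typed
    f = (observed_hom - expected_hom) / (n_typed - expected_hom)
    return mlh, f


# Toy genotype matrix (3 individuals x 4 SNPs), for illustration only.
toy = [[0, 1, 2, 1], [1, 1, 0, 0], [2, 0, 1, 2]]
freqs = allele_frequencies(toy)
for row in toy:
    print(mlh_and_moment_f(row, freqs))
```

F becomes negative whenever an individual carries fewer homozygous SNPs than expected, which is consistent with the negative minima listed for FPLINK and FADC.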
Correlation coefficients between the five methods in three marker selection sets in the Vis Island dataset
| Marker set | Method | MLH | FPLINK | FADC | FEstimSPT | FEstim |
|---|---|---|---|---|---|---|
| Full marker set | MLH | 1.00 | -1.00 | -1.00 | -0.67 | * |
| | FPLINK | -1.00 | 1.00 | 0.99 | 0.67 | * |
| | FADC | -1.00 | 0.99 | 1.00 | 0.66 | * |
| | FEstimSPT | -0.67 | 0.67 | 0.66 | 1.00 | * |
| | FEstim | * | * | * | * | * |
| M0.1 | MLH | 1.00 | -1.00 | -0.99 | -0.71 | -0.54 |
| | FPLINK | -1.00 | 1.00 | 0.99 | 0.74 | 0.60 |
| | FADC | -0.99 | 0.99 | 1.00 | 0.78 | 0.63 |
| | FEstimSPT | -0.71 | 0.74 | 0.78 | 1.00 | 0.55 |
| | FEstim | -0.54 | 0.60 | 0.63 | 0.55 | 1.00 |
| M0.05 | MLH | 1.00 | -1.00 | -0.99 | -0.73 | -0.64 |
| | FPLINK | -1.00 | 1.00 | 0.99 | 0.73 | 0.64 |
| | FADC | -0.99 | 0.99 | 1.00 | 0.75 | 0.64 |
| | FEstimSPT | -0.73 | 0.73 | 0.75 | 1.00 | 0.51 |
| | FEstim | -0.64 | 0.64 | 0.64 | 0.51 | 1.00 |
All correlations were significant at P < 0.001.
Negative FPLINK and FADC values were set to zero before computing these correlations, to allow comparison across methods, since FEstim does not produce negative values.
* FEstim was not calculated for the full marker set because of linkage disequilibrium (LD) between markers.
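The near-perfect negative correlation between MLH and FPLINK in every marker set can be sketched algebraically. This assumes MLH is the proportion of heterozygous markers and FPLINK is the method-of-moments estimate, where for individual i, N_i is the number of typed markers, O_i the observed number of homozygous markers and E_i the number expected under Hardy-Weinberg proportions:

```latex
\begin{aligned}
\mathrm{MLH}_i &= 1 - \frac{O_i}{N_i}, \qquad
F_i = \frac{O_i - E_i}{N_i - E_i} \\
F_i &= \frac{N_i\,(1 - \mathrm{MLH}_i) - E_i}{N_i - E_i}
     = 1 - \frac{N_i}{N_i - E_i}\,\mathrm{MLH}_i
\end{aligned}
```

If N_i and E_i were identical for every individual (no missing genotypes), F_i would be an exact decreasing linear function of MLH_i, so the correlation of the unclipped values would be exactly -1. Individual differences in missingness make N_i and E_i vary slightly, so in real data the relation is only approximately linear.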
Figure 2. Scatterplot of FEstim using the two marker selections, M0.1 and M0.05. The dashed line is the reference line y = x.
Figure 3. Grandparental birthplace clusters and their homozygosity estimates using MLH (full marker set). Numbers on the figure are P values of pairwise Mann-Whitney tests comparing the homozygosity estimates of neighbouring groups.
Figure 4. Grandparental birthplace clusters and their homozygosity estimates using FADC. Numbers on the figure are P values of pairwise Mann-Whitney tests comparing the homozygosity estimates of neighbouring groups. The plot for FPLINK is not shown because of its very high correlation with FADC.
Figure 5. Grandparental birthplace clusters and their homozygosity estimates using FEstim (M0.05 selection). Numbers on the figure are P values of pairwise Mann-Whitney tests comparing the homozygosity estimates of neighbouring groups.
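The figure captions report Mann-Whitney P values between neighbouring grandparental-birthplace groups (I vs II, II vs III, and so on). A minimal pure-Python sketch of that comparison scheme, assuming the tie-corrected normal approximation without continuity correction; the source does not say which variant was used, so exact tests or library routines may give slightly different P values. Function names and the toy values are hypothetical, not study data.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def neighbouring_group_tests(groups):
    """Test each group against the next one, in the given (ordered) sequence."""
    labels = list(groups)
    return {(a, b): mann_whitney(groups[a], groups[b]) for a, b in zip(labels, labels[1:])}


# Toy homozygosity estimates for three ordered groups, for illustration only.
toy_groups = {"I": [0.030, 0.041, 0.035], "II": [0.012, 0.020, 0.015], "III": [0.010, 0.008, 0.011]}
for pair, (u, p) in neighbouring_group_tests(toy_groups).items():
    print(pair, u, round(p, 3))
```

Testing only adjacent groups keeps the number of comparisons small and matches the ordering of the seven clusters by expected homozygosity.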