Daniel Kling, Thore Egeland, Petter Mostad.
Abstract
In a number of applications there is a need to determine the most likely pedigree for a group of persons based on genetic markers. Adequate models are needed to reach this goal. The markers used to perform the statistical calculations can be linked and there may also be linkage disequilibrium (LD) in the population. The purpose of this paper is to present a graphical Bayesian Network framework to deal with such data. Potential LD is normally ignored and it is important to verify that the resulting calculations are not biased. Even if linkage does not influence results for regular paternity cases, it may have substantial impact on likelihood ratios involving other, more extended pedigrees. Models for LD influence likelihoods for all pedigrees to some degree and an initial estimate of the impact of ignoring LD and/or linkage is desirable, going beyond mere rules of thumb based on marker distance. Furthermore, we show how one can readily include a mutation model in the Bayesian Network; extending other programs or formulas to include such models may require considerable amounts of work and will in many cases not be practical. As an example, we consider the two STR markers vWa and D12S391. We estimate probabilities for population haplotypes to account for LD using a method based on data from trios, while an estimate for the degree of linkage is taken from the literature. The results show that accounting for haplotype frequencies is unnecessary in most cases for this specific pair of markers. When doing calculations on regular paternity cases, the markers can be considered statistically independent. In more complex cases of disputed relatedness, for instance cases involving siblings or so-called deficient cases, or when small differences in the LR matter, independence should not be assumed. (The networks are freely available at http://arken.umb.no/~dakl/BayesianNetworks.)
Year: 2012 PMID: 22984448 PMCID: PMC3439468 DOI: 10.1371/journal.pone.0043873
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sample allele frequencies for STR loci vWa and D12S391, based on 444 unrelated Norwegian individuals.
| vWa | D12S391 |
| 0.08896 | |
| 0.0732 | 0.04392 |
| 0.21621 | 0.02252 |
| 0.30968 | 0.12387 |
| | 0.01351 |
| 0.1982 | 0.19369 |
| | 0.01351 |
| 0.10248 | 0.10698 |
| | 0.01126 |
| 0.10135 | 0.10811 |
| 0.00114 | 0.10248 |
| | 0.01149 |
| | 0.09234 |
| | 0.03829 |
| | 0.00901 |
| | 0.00338 |
| | 0.00225 |
Conditional allele probabilities for the alleles at D12S391 given the allele at vWa.
| 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 0.038049 | 0.030969 | 0.036497 | 0.043637 | 0.056745 | 0.054825 | 0.004392 | 0.021959 |
| 0.000282 | 0.030644 | 0.020842 | 0.029067 | 0.028376 | 0.011114 | 0.002252 | 0.011261 |
| 0.176548 | 0.062483 | 0.156082 | 0.112768 | 0.091095 | 0.131781 | 0.312387 | 0.061937 |
| 0.000169 | 0.000205 | 0.015614 | 0.018165 | 0.017026 | 0.011017 | 0.001351 | 0.006757 |
| 0.177421 | 0.199904 | 0.177169 | 0.207224 | 0.221433 | 0.154279 | 0.119369 | 0.096847 |
| 0.012669 | 0.015356 | 0.005251 | 0.014542 | 0.028325 | 0.000147 | 0.001351 | 0.006757 |
| 0.138837 | 0.062227 | 0.114544 | 0.09459 | 0.113599 | 0.131598 | 0.010698 | 0.053491 |
| 0.000141 | 0.015322 | 0.015602 | 0.018157 | 0.005713 | 0.000122 | 0.001126 | 0.005631 |
| 0.076351 | 0.168305 | 0.083462 | 0.090971 | 0.136204 | 0.142479 | 0.110811 | 0.054054 |
| 0.126281 | 0.153068 | 0.109339 | 0.098197 | 0.074025 | 0.09894 | 0.110248 | 0.051239 |
| 0.076437 | 0.122953 | 0.104222 | 0.14172 | 0.096694 | 0.109945 | 0.211487 | 0.057433 |
| 0.126154 | 0.062005 | 0.098924 | 0.083668 | 0.079618 | 0.109699 | 0.109234 | 0.546171 |
| 0.037979 | 0.061186 | 0.046831 | 0.032747 | 0.039764 | 0.022155 | 0.003829 | 0.019144 |
| 0.012613 | 0.015288 | 0.005228 | 0.010902 | 0.005701 | 0.010968 | 0.000901 | 0.004505 |
| 4.22E-05 | 5.12E-05 | 0.01038 | 1.22E-05 | 0.005669 | 3.67E-05 | 0.000338 | 0.001689 |
| 2.82E-05 | 3.41E-05 | 1.17E-05 | 0.003631 | 1.27E-05 | 0.010894 | 0.000225 | 0.001126 |
To account for unseen haplotypes, probabilities were estimated using a Dirichlet distribution. Each column indicates the allele at vWa, while each row indicates the allele at D12S391. The table should be interpreted as follows: for a given allele at vWa (top row), the column below it gives the corresponding conditional allele probabilities for D12S391.
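The following is a minimal sketch of how a Dirichlet prior can give nonzero conditional probabilities to unseen haplotypes. The prior chosen here (pseudo-counts proportional to the marginal D12S391 frequencies, with a hypothetical total weight `prior_weight`), the function name, and the toy counts are assumptions for illustration; they are not the authors' exact procedure or data.

```python
from typing import Dict


def conditional_probs(
    counts: Dict[str, int],
    marginal: Dict[str, float],
    prior_weight: float = 1.0,
) -> Dict[str, float]:
    """Posterior-mean estimate of P(allele at locus 2 | allele at locus 1).

    counts:       observed haplotype counts for one fixed allele at locus 1,
                  keyed by the allele at locus 2 (unseen alleles may be absent)
    marginal:     marginal allele frequencies at locus 2 (should sum to 1)
    prior_weight: total pseudo-count of the Dirichlet prior (assumed value)
    """
    n = sum(counts.values())
    return {
        allele: (prior_weight * freq + counts.get(allele, 0)) / (prior_weight + n)
        for allele, freq in marginal.items()
    }


# Toy data (hypothetical): three alleles at locus 2, allele "C" never observed.
marginal = {"A": 0.5, "B": 0.3, "C": 0.2}
counts = {"A": 3, "B": 1}
probs = conditional_probs(counts, marginal)
for allele, p in probs.items():
    print(allele, round(p, 4))
```

With few observations the estimate stays close to the marginal frequencies, and with many observations it approaches the observed haplotype proportions, so each column still sums to one.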
Figure 1. Bayesian network describing the basic layout for a paternity case.
Figure 2. Bayesian network describing a paternity case.
The Recombination node contains the probability for a recombination to occur, i.e., the recombination rate. The nodes P/M tell whether the vWa paternal or maternal allele is inherited. The LD node is connected to the paternal and maternal allele nodes and decides whether or not to use conditional allele probabilities. Furthermore, the node Is Father? contains the different hypotheses.
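As a rough illustration of what the Recombination and P/M nodes encode, the sketch below computes the probability that a parent transmits each two-locus haplotype for a given recombination rate. The function name and the example genotype are hypothetical, and mutation is ignored; this is a simplified stand-alone calculation, not the network itself.

```python
from typing import Dict, Tuple

Haplotype = Tuple[str, str]  # (allele at vWa, allele at D12S391)


def gamete_probs(hap1: Haplotype, hap2: Haplotype, r: float) -> Dict[Haplotype, float]:
    """Transmission probabilities for a parent carrying haplotypes hap1 and hap2.

    r is the recombination rate between the two loci (0 <= r <= 0.5).
    Each parental haplotype is passed on intact with probability (1 - r) / 2,
    each recombinant haplotype with probability r / 2.
    """
    (a1, b1), (a2, b2) = hap1, hap2
    outcomes = [
        ((a1, b1), (1 - r) / 2),  # non-recombinant
        ((a2, b2), (1 - r) / 2),  # non-recombinant
        ((a1, b2), r / 2),        # recombinant
        ((a2, b1), r / 2),        # recombinant
    ]
    probs: Dict[Haplotype, float] = {}
    for hap, p in outcomes:
        # Identical haplotypes (e.g. a homozygous locus) accumulate probability.
        probs[hap] = probs.get(hap, 0.0) + p
    return probs


# Hypothetical parent, with r = 0.1 for linked markers and r = 0.5 for unlinked ones.
linked = gamete_probs(("17", "18"), ("16", "20"), r=0.1)
unlinked = gamete_probs(("17", "18"), ("16", "20"), r=0.5)
print(linked)
print(unlinked)
```

At r = 0.5 all four haplotypes are equally likely, which is the independence assumption; smaller values of r favour the parental haplotypes.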
Figure 3. Bayesian network describing a sibling case.
The nodes P/M tell whether the vWa paternal or maternal allele is inherited. The P/M nodes connected to the D12S391 allele also contain the recombination frequency. The LD node is connected to the paternal and maternal allele nodes and decides whether or not to use conditional allele probabilities. Furthermore, the node Are Siblings? contains the different hypotheses.
Comparison of calculated likelihood ratios (LR) based on the genotype data from STR loci vWa and D12S391, on a selection of real cases.
| Case id | M1 | M2 | M3 | Comparison | LR(M1)/LR(Comparison) | LR(M1)/LR(M2) | LR(M2)/LR(M3) |
| | | | | | | | |
| 1 | 3.608 | 3.608 | 1.909 | 3.78 | 0.954 | - | 1.89 |
| 2 | 3.038 | 3.038 | 2.769 | 3.099 | 0.98 | - | 1.097 |
| 3 | 25.455 | 25.455 | 35.036 | 24.243 | 1.05 | - | 0.727 |
| 4 | 8.723 | 8.723 | 9.638 | 9.447 | 0.923 | - | 0.905 |
| 5 | 8.93 | 8.93 | 10.792 | 9.036 | 0.988 | - | 0.827 |
| 6 | 39.487 | 39.487 | 51.46 | 41.563 | 0.95 | - | 0.767 |
| 7 | 11.761 | 11.761 | 11.859 | 10.631 | 1.106 | - | 0.992 |
| 8 | 2.943 | 2.943 | 3.721 | 2.66 | 1.106 | - | 0.791 |
| 9 | 5.956 | 5.956 | 6.457 | 6.463 | 0.922 | - | 0.922 |
| 10 | 6.81 | 6.81 | 8.912 | 6.815 | 0.999 | - | 0.764 |
| WCS | 750.879 | 750.879 | 308.597 | 404 | 1.859 | - | 2.433 |
| | | | | | | | |
| 11 | 5.567 | 5.567 | 5.055 | 5.239 | 1.063 | - | 1.101 |
| 12 | 96.809 | 96.809 | 107.696 | 89.208 | 1.085 | - | 0.899 |
| 13 | 11.626 | 11.626 | 7.026 | 10.834 | 1.073 | - | 1.655 |
| 14 | 87.652 | 87.652 | 52.191 | 54.74 | 1.601 | - | 1.679 |
| 15 | 8.32 | 8.32 | 7.772 | 9.498 | 0.876 | - | 1.071 |
| 16 | 29.479 | 29.479 | 21.491 | 28.919 | 1.019 | - | 1.372 |
| 17 | 6.214 | 6.214 | 7.347 | 6.624 | 0.938 | - | 0.846 |
| 18 | 11.234 | 11.234 | 9.624 | 11.628 | 0.966 | - | 1.167 |
| 19 | 24.483 | 24.483 | 33.811 | 24.8 | 0.987 | - | 0.724 |
| 20 | 11.635 | 11.635 | 12.358 | 10.827 | 1.075 | - | 0.941 |
| WCS | 2917.855 | 2917.855 | 736.46 | 2130 | 1.37 | - | 3.962 |
| | | | | | | | |
| 21 | 9.917 | 7.097 | 9.732 | 9.766 | 1.015 | 1.397 | 0.729 |
| 22 | 0.264 | 0.287 | 0.296 | 0.405 | 0.652 | 0.92 | 0.97 |
| 23 | 38.841 | 62.98 | 71.993 | 38.331 | 1.013 | 0.617 | 0.875 |
| 24 | 0.351 | 0.339 | 0.314 | 0.34 | 1.032 | 1.035 | 1.08 |
| 25 | 1.331 | 1.584 | 1.439 | 1.331 | 1 | 0.84 | 1.101 |
| 26 | 0.46 | 0.621 | 0.633 | 0.455 | 1.011 | 0.741 | 0.981 |
| 27 | 0.378 | 0.354 | 0.363 | 0.38 | 0.995 | 1.068 | 0.975 |
| 28 | 0.83 | 0.622 | 0.612 | 0.815 | 1.018 | 1.334 | 1.016 |
| 29 | 8.61 | 10.962 | 11.92 | 9.1278 | 0.943 | 0.785 | 0.92 |
| 30 | 13.772 | 19.825 | 19.367 | 13.763 | 1.001 | 0.695 | 1.024 |
| WCS | 200.938 | 298.868 | 134.619 | 115.694 | 1.737 | 0.672 | 2.22 |
Three different methods have been used, denoted M1, M2 and M3. M1: 50% recombination rate, LD not considered; M2: 10% recombination rate, LD not considered; M3: 10% recombination rate, LD taken into consideration. The column Comparison is the LR obtained using the software Familias with the standard Norwegian population database. WCS abbreviates Worst Case Scenario and attempts to simulate a case where the likelihood ratios should differ the most due to linkage disequilibrium. The columns to the right display three relevant quotients for each case. Note that the LR calculated using M2 and the quotient LR(M1)/LR(M2) are only relevant in the non-paternity cases, since recombination alone will not affect the likelihoods for paternity cases.
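The three quotients are simple ratios of the tabulated likelihood ratios. The short sketch below recomputes them for case 21 of the table above; the function name and dictionary keys are hypothetical, and only the four input values come from the table.

```python
from typing import Dict


def lr_quotients(m1: float, m2: float, m3: float, comparison: float) -> Dict[str, float]:
    """Return the three quotients reported in the right-hand columns."""
    return {
        "M1/Comparison": m1 / comparison,  # network result vs. Familias
        "M1/M2": m1 / m2,                  # effect of linkage alone
        "M2/M3": m2 / m3,                  # effect of LD, given linkage
    }


# Case 21: M1 = 9.917, M2 = 7.097, M3 = 9.732, Comparison = 9.766
case_21 = lr_quotients(9.917, 7.097, 9.732, 9.766)
print({name: round(value, 3) for name, value in case_21.items()})
# {'M1/Comparison': 1.015, 'M1/M2': 1.397, 'M2/M3': 0.729}
```

A quotient near 1 means the two methods agree for that case; values far from 1 flag cases where linkage or LD changes the likelihood ratio noticeably.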
Comparison of calculated likelihood ratios (LR) based on the genotype data from STR loci D5S818 and CSF1PO, on a selection of cases.
| Case id | M1 | M2 | M3 | Comparison | LR(M1)/LR(Comparison) | LR(M1)/LR(M2) | LR(M2)/LR(M3) |
| | | | | | | | |
| 1 | 1.4632 | 1.4632 | 1.5058 | 1.4215 | 1.029 | - | 0.972 |
| 2 | 1.062 | 1.062 | 1.034 | 1.037 | 1.024 | - | 1.027 |
| 3 | 4.84 | 4.84 | 7.998 | 5.176 | 0.935 | - | 0.605 |
| 4 | 395.668 | 395.668 | 362.636 | 485.808 | 0.814 | - | 1.091 |
| 5 | 9.598 | 9.598 | 9.016 | 10.246 | 0.937 | - | 1.065 |
| 6 | 74.489 | 74.489 | 80.653 | 100.604 | 0.74 | - | 0.924 |
| 7 | 8.072 | 8.072 | 8.013 | 7.734 | 1.044 | - | 1.007 |
| 8 | 19.193 | 19.193 | 20.172 | 20.491 | 0.937 | - | 0.951 |
| 9 | 49.869 | 49.869 | 42.537 | 55.005 | 0.907 | - | 1.172 |
| 10 | 77.215 | 77.215 | 121.659 | 114.202 | 0.676 | - | 0.635 |
| WCS | 1520.143 | 1520.143 | 11036.52 | 3656 | 0.416 | - | 0.138 |
| | | | | | | | |
| 11 | 40.007 | 40.007 | 64.944 | 48.709 | 0.821 | - | 0.616 |
| 12 | 11.369 | 11.369 | 11.272 | 9.947 | 1.143 | - | 1.009 |
| 13 | 5.746 | 5.746 | 5.577 | 8.65 | 0.664 | - | 1.03 |
| 14 | 101.284 | 101.284 | 85.736 | 63.917 | 1.585 | - | 1.181 |
| 15 | 604.62 | 604.62 | 383.645 | 777.506 | 0.778 | - | 1.576 |
| 16 | 23.505 | 23.505 | 22.616 | 25.496 | 0.922 | - | 1.039 |
| 17 | 76.821 | 76.821 | 52.45 | 87.727 | 0.876 | - | 1.465 |
| 18 | 1838.249 | 1838.249 | 1964.408 | 2138.332 | 0.86 | - | 0.936 |
| 19 | 394.116 | 394.116 | 216.855 | 346.241 | 1.138 | - | 1.817 |
| 20 | 53.305 | 53.305 | 69.457 | 66.978 | 0.796 | - | 0.767 |
| WCS | 139.278 | 139.278 | 709.883 | 138.139 | 1.008 | - | 0.196 |
| | | | | | | | |
| 21 | 6.218 | 5.808 | 5.02 | 6.742 | 0.922 | 1.071 | 1.157 |
| 22 | 0.906 | 0.906 | 0.93 | 0.696 | 1.301 | 1 | 0.974 |
| 23 | 4.202 | 3.99 | 3.92 | 3.75 | 1.121 | 1.053 | 1.018 |
| 24 | 3.632 | 3.343 | 3.499 | 2.856 | 1.272 | 1.086 | 0.955 |
| 25 | 0.247 | 0.265 | 0.139 | 0.255 | 0.968 | 0.935 | 1.903 |
| 26 | 6.407 | 6.407 | 6.165 | 4.441 | 1.443 | 1 | 1.039 |
| 27 | 0.158 | 0.177 | 0.171 | 0.154 | 1.022 | 0.892 | 1.037 |
| 28 | 0.256 | 0.256 | 0.157 | 0.16 | 1.596 | 1 | 1.636 |
| 29 | 0.5 | 0.5 | 0.548 | 0.25 | 2.001 | 1 | 0.912 |
| 30 | 0.758 | 0.758 | 0.727 | 0.563 | 1.347 | 1 | 1.043 |
| WCS | 23254.65 | 24999 | 40649.41 | 93209.73 | 0.249 | 0.93 | 0.615 |
Three different methods have been used, denoted M1, M2 and M3. M1: 50% recombination rate, LD not considered; M2: 30% recombination rate, LD not considered; M3: 30% recombination rate, LD taken into consideration. The column Comparison is the LR obtained using the software Familias with the standard Norwegian population database. WCS abbreviates Worst Case Scenario and attempts to simulate a case where the likelihood ratios should differ the most due to linkage disequilibrium. The columns to the right display three relevant quotients for each case. Note that the LR calculated using M2 and the quotient LR(M1)/LR(M2) are only relevant in the non-paternity cases, since recombination alone will not affect the likelihoods for paternity cases.