Abstract
BACKGROUND: Measures of linkage disequilibrium (LD) play a key role in a wide range of applications, from disease association to demographic history estimation. The true population LD cannot be measured directly; it can only be inferred from genetic samples, which are unavoidably subject to measurement error. Previous results on r² (a measure of LD), such as its bias due to finite sample size and its variance, were derived for the special case in which the true population-wise LD is zero. These results generally do not hold for non-zero r² values, which are more common in real genetic data.
Keywords: Linkage disequilibrium; Maximum likelihood estimation; Sampling error
Year: 2020 PMID: 32102657 PMCID: PMC7045472 DOI: 10.1186/s12863-020-0818-9
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Expected genotypic frequencies under HWE
The expected frequency of genotypes given the haplotype frequencies under HWE [2]. All the expected frequencies add up to one
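As an illustration of the caption above, the sketch below derives the nine two-locus genotype frequencies from four haplotype frequencies under random mating. The allele labels (A/a, B/b), the function name and the example frequencies are assumptions made for illustration; the table itself is not reproduced in this text.

```python
from itertools import product


def expected_genotype_frequencies(p_AB, p_Ab, p_aB, p_ab):
    """Expected two-locus genotype frequencies under HWE.

    Returns a dict keyed by (genotype at locus 1, genotype at locus 2),
    e.g. ("Aa", "Bb"). Allele labels are hypothetical.
    """
    hap_freq = {
        ("A", "B"): p_AB,
        ("A", "b"): p_Ab,
        ("a", "B"): p_aB,
        ("a", "b"): p_ab,
    }
    if abs(sum(hap_freq.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    geno = {}
    # Under HWE an individual is an ordered pair of independently drawn haplotypes.
    for (h1, f1), (h2, f2) in product(hap_freq.items(), repeat=2):
        locus1 = "".join(sorted(h1[0] + h2[0]))  # "AA", "Aa" or "aa"
        locus2 = "".join(sorted(h1[1] + h2[1]))  # "BB", "Bb" or "bb"
        geno[(locus1, locus2)] = geno.get((locus1, locus2), 0.0) + f1 * f2
    return geno


freqs = expected_genotype_frequencies(0.4, 0.1, 0.2, 0.3)
total = sum(freqs.values())  # the nine frequencies add up to one
```

The double heterozygote ("Aa", "Bb") collects both phase configurations, AB/ab and Ab/aB, which is why phase cannot be read directly from unphased genotypes.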
Fig. 1 Plots of [Formula: see text] against [Formula: see text] under different sample sizes: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right). A linear regression (red line) is fitted to each plot and the estimates are reported in Table 2
Table 2. Slope and intercept estimates from phased data
| Sample size (n) | 1/(2n) | Intercept estimate | 1 − 1/(2n) | Slope estimate |
|---|---|---|---|---|
| 20 | 0.025 | 0.02357 [0.02096, 0.02619] | 0.975 | 0.96700 [0.95638, 0.97761] |
| 40 | 0.0125 | 0.01174 [0.00985, 0.01363] | 0.9875 | 0.99060 [0.98293, 0.99827] |
| 60 | 0.0083 | 0.00885 [0.00731, 0.01040] | 0.9917 | 0.99136 [0.98506, 0.99766] |
| 80 | 0.0063 | 0.00580 [0.00447, 0.00712] | 0.9937 | 0.99201 [0.98665, 0.99738] |
Slope and intercept estimates for the plots in Fig. 1. 95% C.I.s are reported in brackets
Fig. 2 Plots of [Formula: see text] against [Formula: see text] under different sample sizes: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right). A linear regression (red line) is fitted to each plot and the estimates are reported in Table 3. The simulation setting is described in the text
Table 3. Slope and intercept estimates from unphased data
| Sample size (n) | 1/n | Intercept estimate | 1 − 1/n | Slope estimate |
|---|---|---|---|---|
| 20 | 0.05 | 0.04740 [0.04451, 0.05029] | 0.95 | 0.93576 [0.92390, 0.94761] |
| 40 | 0.025 | 0.02243 [0.02038, 0.02447] | 0.975 | 0.96722 [0.95907, 0.97537] |
| 60 | 0.0167 | 0.01574 [0.01398, 0.01750] | 0.9833 | 0.97750 [0.97029, 0.98472] |
| 80 | 0.0125 | 0.01157 [0.01009, 0.01306] | 0.9875 | 0.98340 [0.97741, 0.98941] |
Slope and intercept estimates for the plots in Fig. 2. 95% C.I.s are reported in brackets
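The two tables above set each regression estimate beside a reference value: 1/(2n) and 1 − 1/(2n) for phased data, 1/n and 1 − 1/n for unphased data. The sketch below simply checks whether each reference value lies inside the reported 95% interval. The interval endpoints are copied from the tables; the function and variable names are hypothetical.

```python
# 95% confidence intervals copied from the two tables above:
# {sample size: ((intercept low, intercept high), (slope low, slope high))}
PHASED = {
    20: ((0.02096, 0.02619), (0.95638, 0.97761)),
    40: ((0.00985, 0.01363), (0.98293, 0.99827)),
    60: ((0.00731, 0.01040), (0.98506, 0.99766)),
    80: ((0.00447, 0.00712), (0.98665, 0.99738)),
}
UNPHASED = {
    20: ((0.04451, 0.05029), (0.92390, 0.94761)),
    40: ((0.02038, 0.02447), (0.95907, 0.97537)),
    60: ((0.01398, 0.01750), (0.97029, 0.98472)),
    80: ((0.01009, 0.01306), (0.97741, 0.98941)),
}


def covered(table, k):
    """For each n, report whether 1/(k*n) and 1 - 1/(k*n) fall in the CIs.

    k = 2 reproduces the phased reference values, k = 1 the unphased ones.
    Returns {n: (intercept covered, slope covered)}.
    """
    out = {}
    for n, ((i_lo, i_hi), (s_lo, s_hi)) in table.items():
        intercept = 1.0 / (k * n)
        slope = 1.0 - intercept
        out[n] = (i_lo <= intercept <= i_hi, s_lo <= slope <= s_hi)
    return out


phased_check = covered(PHASED, 2)
unphased_check = covered(UNPHASED, 1)
```

Each entry is a pair of booleans. The check is plain arithmetic on the reported numbers and does not reproduce the underlying simulation.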
Fig. 3 Plots of the variance of [Formula: see text] against [Formula: see text] under different sample sizes: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right). The red lines show the functional form of [Formula: see text]
Fig. 4 Plots of the variance of [Formula: see text] against [Formula: see text] under different sample sizes: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right). The red lines show the functional form of [Formula: see text]
Fig. 5 Plots of relative log-likelihood against relative tolerance for the two maximisation routines using unphased data: the EM algorithm (black circles) and Constrained ML (red crosses). Four different sample sizes were examined: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right). The global maximum of the log-likelihood has a relative value of 1
Fig. 6 Plots of I index against relative tolerance for the two maximisation routines using unphased data: the EM algorithm (black circles) and Constrained ML (red crosses). Four different sample sizes were examined: 20 (top left), 40 (top right), 60 (bottom left), and 80 (bottom right)
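For readers unfamiliar with the EM routine compared in these figures, here is a minimal sketch of the textbook EM iteration for two biallelic loci with unphased genotypes. It is not the authors' implementation: the genotype-count layout, the uniform starting point and the absolute (rather than relative) stopping tolerance are all assumptions made for illustration.

```python
def em_haplotype_frequencies(g, tol=1e-8, max_iter=1000):
    """Estimate haplotype frequencies from unphased two-locus genotype counts.

    g[i][j] is the number of individuals carrying i copies of allele A at
    locus 1 and j copies of allele B at locus 2 (i, j in 0..2).
    Returns a dict with keys "AB", "Ab", "aB", "ab".
    """
    n = sum(sum(row) for row in g)
    if n == 0:
        raise ValueError("no individuals")
    total = 2.0 * n  # number of haplotypes in the sample

    # Haplotype counts that are unambiguous (all genotypes except AaBb).
    base_AB = 2 * g[2][2] + g[2][1] + g[1][2]
    base_Ab = 2 * g[2][0] + g[2][1] + g[1][0]
    base_aB = 2 * g[0][2] + g[1][2] + g[0][1]
    base_ab = 2 * g[0][0] + g[1][0] + g[0][1]
    n_dh = g[1][1]  # double heterozygotes, whose phase is unknown

    p = (0.25, 0.25, 0.25, 0.25)  # assumed starting point
    for _ in range(max_iter):
        p_AB, p_Ab, p_aB, p_ab = p
        cis, trans = p_AB * p_ab, p_Ab * p_aB
        # E-step: expected share of double heterozygotes with phase AB/ab.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        new = (
            (base_AB + w * n_dh) / total,
            (base_Ab + (1 - w) * n_dh) / total,
            (base_aB + (1 - w) * n_dh) / total,
            (base_ab + w * n_dh) / total,
        )
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return dict(zip(("AB", "Ab", "aB", "ab"), p))


# Hypothetical sample: 4 aabb, 2 AaBb and 4 AABB individuals.
est = em_haplotype_frequencies([[4, 0, 0], [0, 2, 0], [0, 0, 4]])
```

EM can stall at a local maximum or on a flat region of the likelihood, so the result may depend on the starting point and the tolerance.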
Selected results from the analysis of APOE dataset
| Loci pair | Real count (row 1 / row 2) | MIDAS (row 1 / row 2) | CubeX 1st solution (row 1 / row 2) | CubeX 2nd solution (row 1 / row 2) | CML (row 1 / row 2) | Possible alternative solution? | CML alternative solution (row 1 / row 2) | LRT | CML Decision |
|---|---|---|---|---|---|---|---|---|---|
| 1–2 | 14, 116 / 0, 30 | 14, 116 / 0, 30 | 14, 116 / 0, 30 | NA | 14, 116 / 0, 30 | No | NA | NA | NA |
| 1–3 | 69, 61 / 17, 13 | 64, 66 / 22, 8 | 64, 66 / 22, 8 | NA | 64, 66 / 22, 8 | No | NA | NA | NA |
| 1–5 | 120, 10 / 30, 0 | 121, 9 / 29, 1 | 121, 9 / 29, 1 | 120, 10 / 30, 0 | 121, 9 / 29, 1 | Yes | 120, 10 / 30, 0 | 0.15 | Accept alternative |
| 1–9 | 9, 121 / 3, 27 | 9, 121 / 3, 27 | 9, 121 / 3, 27 | 12, 118 / 0, 30 | 9, 121 / 3, 27 | Yes | 12, 118 / 0, 30 | 1.84 | Accept alternative |
| 4–8 | 98, 3 / 59, 0 | 99, 2 / 58, 1 | 99, 2 / 58, 1 | 98, 3 / 59, 0 | 99, 2 / 58, 1 | Yes | 98, 3 / 59, 0 | 0.06 | Accept alternative |
| 5–7 | 141, 9 / 0, 10 | 141, 9 / 0, 10 | 141, 9 / 0, 10 | 131, 19 / 10, 0 | 141, 9 / 0, 10 | Yes | 131, 19 / 10, 0 | 45.91 | Reject alternative |
| 5–9 | 12, 138 / 0, 10 | 11, 139 / 1, 9 | 11, 139 / 1, 9 | 12, 138 / 0, 10 | 11, 139 / 1, 9 | Yes | 12, 138 / 0, 10 | 1.37 | Accept alternative |
The second column shows the real haplotype counts, which had been identified experimentally. MIDAS estimates are shown in the next column. CubeX 1st solution refers to the α or β solution set; CubeX 2nd solution refers to the γ solution set, should it exist. Constrained ML (CML) estimates are presented in the sixth column; for these, the log-likelihood was maximised over the entire feasible region. The next step is to decide whether a simpler solution is possible (e.g. there are only 3 haplotypes instead of 4). If we cannot rule out the possibility of a simpler solution, the log-likelihood is then maximised within the restricted range, with 2 free parameters. The reported LRT statistic equals 2 times the difference between the log-likelihoods of the two solutions. If the LRT statistic is greater than [Formula: see text], we reject the alternative (simpler) solution at the 5% significance level. Complete results are shown in Additional file 1
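The decision rule in the caption can be sketched as follows. The critical value is not recoverable from this text, so it is left as a caller-supplied parameter. The value 3.84 used in the example is only a placeholder (the 95% quantile of a chi-square distribution with 1 degree of freedom) and may not be the threshold the authors used. Function names are hypothetical.

```python
def lrt_statistic(loglik_full, loglik_restricted):
    """Twice the log-likelihood difference between the full (unrestricted)
    solution and the restricted (simpler) solution."""
    return 2.0 * (loglik_full - loglik_restricted)


def cml_decision(lrt_stat, critical_value):
    """Reject the simpler alternative when the statistic exceeds the threshold."""
    if lrt_stat > critical_value:
        return "Reject alternative"
    return "Accept alternative"


# LRT statistics as reported in the table above.
reported = {"1-5": 0.15, "1-9": 1.84, "4-8": 0.06, "5-7": 45.91, "5-9": 1.37}

PLACEHOLDER_CRITICAL_VALUE = 3.84  # assumption, see the note above
decisions = {
    pair: cml_decision(stat, PLACEHOLDER_CRITICAL_VALUE)
    for pair, stat in reported.items()
}
```

With this placeholder threshold only pair 5–7 is rejected, which matches the CML Decision column.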