Abstract
Hitchhiking and severe bottleneck effects affect the dynamics of a population's genetic diversity by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identifying and differentiating the signatures of such events from DNA sequence data at a single locus is challenging. This paper develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low-recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event are modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio-based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes to infer the evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50,000 or greater, as opposed to 10,000, and that the estimated times of the recent homogenization events agree with the "Out of Africa" hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversion. These estimates are contrasted with estimates derived as the midpoint between the last HIV-negative and the first HIV-positive screening tests. The results show that significant discrepancies can exist between the two sets of estimates.
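The abstract's model can be illustrated by simulation. Below is a minimal Monte Carlo sketch, not the paper's analytical expressions, for the number of polymorphic sites S after a recent homogenization event. It rests on several assumptions. First, the event reduces the locus to a single ancestral sequence t coalescent units ago, so only mutations on the sample genealogy since then are visible. Second, time is measured in units of 2N generations and θ = 4Nμ, so mutations fall on each lineage at rate θ/2. Third, all function names are hypothetical.

```python
import random
import statistics


def branch_length_since_event(n: int, t: float, rng: random.Random) -> float:
    """Total branch length (coalescent units) of a sample genealogy,
    truncated at a homogenization event t coalescent units in the past."""
    k, elapsed, length = n, 0.0, 0.0
    while k > 1:
        wait = rng.expovariate(k * (k - 1) / 2)  # waiting time while k lineages remain
        if elapsed + wait >= t:
            # The event is reached before the next coalescence: stop accumulating.
            return length + k * (t - elapsed)
        length += k * wait
        elapsed += wait
        k -= 1
    return length  # the sample reached its MRCA before the event


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Poisson(lam) variate: count rate-1 exponential arrivals in [0, lam]."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival < lam:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n: int, theta: float, t: float,
                               reps: int = 20000, seed: int = 1):
    """Monte Carlo mean and variance of S under infinite-sites mutation."""
    rng = random.Random(seed)
    draws = [poisson_draw(theta / 2 * branch_length_since_event(n, t, rng), rng)
             for _ in range(reps)]
    return statistics.mean(draws), statistics.variance(draws)


# Sanity check: when t lies far beyond the MRCA, the mean approaches theta * a_n.
mean_S, var_S = simulate_segregating_sites(n=10, theta=5.0, t=50.0)
a_n = sum(1 / i for i in range(1, 10))
print(f"simulated E[S] = {mean_S:.2f}, theta * a_n = {5.0 * a_n:.2f}")
```

When t is large, the truncation never binds, and the sketch reduces to the standard coalescent. The mean and variance of S should then approach Watterson's values θa_n and θa_n + θ²b_n. As t shrinks toward 0, S collapses toward 0, which is the homogenization signature the paper's tests exploit.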
Year: 2012 PMID: 22662176 PMCID: PMC3360760 DOI: 10.1371/journal.pone.0037588
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration of the numerical instability of expression (3).
The moment generating function is plotted over the same range of values of , using expressions (11) and (3) in red and blue dots, respectively. The numerical instability of expression (3) is evident because the values of must lie between 0 and 1 for any positive .
Summary of the DNA sequence data sets from loci 22q11.2, 17q23, Xq13.3, and YAP.
| Locus | Seq. length | | | |
| --- | --- | --- | --- | --- |
| 22q11.2 | 10 | 54 (40) | 44 (88) | 75 (128) |
| 17q23 | 20 | 57 (10) | 37 (12) | 63 (22) |
| Xq13.3 | 10 | 24 (23) | 17 (46) | 33 (69) |
| YAP | 2.6 | 3 (8) | 1 (7) | 3 (15) |
Sequence length in kilobases.
The number of polymorphic sites in a sample of sequences.
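For reference only, the polymorphic-site counts in this table can be turned into Watterson's estimate θ_W = S / a_n, where a_n = Σ_{i=1}^{n−1} 1/i. This standard infinite-sites estimator is not part of the paper's framework. The sketch assumes the parenthesised numbers are sample sizes. That reading is consistent with the third column's value equaling the sum of the first two for every locus. The third (pooled) column is used throughout.

```python
def watterson_a(n: int) -> float:
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i for a sample of size n."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(S: int, n: int) -> float:
    """Watterson's estimator of theta from S polymorphic sites in n sequences."""
    return S / watterson_a(n)


# Pooled column of the table: (locus, sequence length in kb, S, n).
POOLED = [
    ("22q11.2", 10.0, 75, 128),
    ("17q23", 20.0, 63, 22),
    ("Xq13.3", 10.0, 33, 69),
    ("YAP", 2.6, 3, 15),
]

for locus, kb, S, n in POOLED:
    theta = watterson_theta(S, n)
    print(f"{locus}: theta_W = {theta:.2f} per locus, {theta / kb:.2f} per kb")
```

Under the standard neutral model without a recent homogenization event, E[S] = θa_n, so θ_W is unbiased. A recent event would deflate S and therefore θ_W. This is why a raw count of polymorphic sites alone cannot separate a low mutation rate from a recent homogenization.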
Estimates for the elapsed times since a recent homogenization event for each of the four loci.
| Locus | | | | | | |
| --- | --- | --- | --- | --- | --- | --- |
| 22q11.2 | 1400 | 220 (120, 380) | 117 | 640 | 72 (44, 116) | 43 |
| 17q23 | 800 | 320 (220, 500) | 247 | 360 | 160 (105, 260) | 134 |
| Xq13.3 | 525 | 127 (71, 240) | 90 | 120 | 41 (22, 75) | 32 |
| YAP | 350 | 195 (40, | 125 | 94 | 55 (0, 450) | 30 |
| Combined | 1300 | 241 (183, 316) | 145 | 360 | 92 (67, 125) | 55 |
Elapsed times are given in thousands of years before present. The estimates are based on formula (13) with an effective population size of 10,000.
Estimates based on formula (13) with an effective population size of 50,000.
Under the “simple” model an estimator for is denoted as . It is equal to based on the data at a single locus; for data sets from independent loci, it is equal to .
Values of minus twice the log of the likelihood-ratio statistic for the data set from each of the four loci.
| Locus | | |
| --- | --- | --- |
| 22q11.2 | 19.8 (8.5e−6) | 48.6 (0.3e−11) |
| 17q23 | 11.2 (0.8e−3) | 19.8 (0.8e−5) |
| Xq13.3 | 20.3 (7e−6) | 46.6 (0.8e−11) |
| YAP | 2 (0.16) | 4.6 (0.03) |
Values of minus twice the log of the likelihood-ratio statistic for Test III.
| Compared loci | | |
| --- | --- | --- |
| (22q11.2, Xq13.3) | 1.7 (0.19) | 2 (0.15) |
| (17q23, 22q11.2) | 1.3 (0.25) | 5.6 (0.02) |
| (Xq13.3, 17q23) | 5.8 (0.02) | 12.5 (0.0004) |
| (YAP, 17q23) | 0.3 (0.6) | 0.9 (0.3) |
| (YAP, Xq13.3) | 0.2 (0.65) | 0.08 (0.8) |
| (YAP, 22q11.2) | 0.01 (0.9) | 0.06 (0.8) |
| (Xq13.3, 17q23, 22q11.2) | 6 (0.05) | 14 (0.001) |
| (YAP, Xq13.3, 17q23, 22q11.2) | 53 (1.8e−11) | 14 (0.003) |
Sets of compared loci.
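The parenthesised numbers in the likelihood-ratio tables appear to be p-values from a chi-square reference distribution. This is an inference from the numbers, not something the captions state. The reference distribution seems to have one degree of freedom for single loci and pairs of loci, and k − 1 degrees of freedom when k loci are compared in Test III. The sketch below checks that reading with an upper-tail chi-square function for integer degrees of freedom, written with the standard library only. The `df = k − 1` rule is an assumption.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: P = exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1 (erfc) and add one term per extra pair of df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# First statistic column of the Test III table: (loci, -2 log LR, reported p).
TEST_III = [
    (("22q11.2", "Xq13.3"), 1.7, 0.19),
    (("17q23", "22q11.2"), 1.3, 0.25),
    (("Xq13.3", "17q23"), 5.8, 0.02),
    (("Xq13.3", "17q23", "22q11.2"), 6.0, 0.05),
    (("YAP", "Xq13.3", "17q23", "22q11.2"), 53.0, 1.8e-11),
]

for loci, stat, reported in TEST_III:
    df = len(loci) - 1  # assumed: k loci compared -> k - 1 degrees of freedom
    print(f"{', '.join(loci)}: df={df}, p={chi2_sf(stat, df):.2g} (reported {reported})")
```

The same function with `df=1` also reproduces the p-values in the single-locus table, for example `chi2_sf(19.8, 1)` ≈ 8.6e−6 against the reported 8.5e−6.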
Summary of Shankarappa et al.'s [46] data.
| Patient number | Seroconversion time (years) | Sample size | Number of polymorphic sites | Viral load |
| --- | --- | --- | --- | --- |
| 1 | 0.28 | 7 | 7 | 6637 |
| 2 | 0.42 | 21 | 33 | 68706 |
| 3 | 0.35 | 10 | 9 | 598 |
| 5 | 0.25 | 22 | 41 | 7798 |
| 6 | 0.21 | 19 | 21 | 4709 |
| 7 | 0.2 | 19 | 32 | 6251 |
| 8 | 0.29 | 7 | 5 | 4045 |
| 9 | 0.25 | 10 | 31 | 145545 |
| 11 | 0.21 | 8 | 13 | 478 |
I used the same notation for the patients as in [46].
In [46], seroconversion times are estimated as the midpoint between the last HIV-negative screening test and the first HIV-positive screening test.
For each patient, the DNA sequences are sampled from the patient's HIV population at the time of the first HIV-positive screening test.
The number of polymorphic sites in the samples.
For each patient, viral load per milliliter is measured at the time of the first HIV-positive screening test.
Estimates of the seroconversion times (in coalescent units) for the nine patients.
| Patient | | | |
| --- | --- | --- | --- |
| 1 | 0.008 (0.003, 0.016) | 36.6 | 1.2e−9 |
| 2 | 0.0012 (0.0008, 0.0017) | 162.1 | 0 |
| 3 | 0.093 (0.04, 0.21) | 13.4 | 2.5e−4 |
| 5 | 0.013 (0.009, 0.018) | 75.3 | 0 |
| 6 | 0.013 (0.008, 0.02) | 65.6 | 5.5e−16 |
| 7 | 0.015 (0.01, 0.02) | 64.4 | 9.9e−16 |
| 8 | 0.009 (0.003, 0.02) | 33.5 | 7.1e−9 |
| 9 | 0.001 (0.00075, 0.0015) | 88.9 | 0 |
| 11 | 0.23 (0.08, 0.58) | 6.2 | 0.012 |
The maximum likelihood estimates of in coalescent units.
Figure 2. Comparison of two estimates of the seroconversion time for each of the nine patients.
The effective generation times in (A) and (B) are taken to be day and days, respectively. Maximum likelihood estimates and 95% confidence intervals for the time of HIV seroconversion, in years since the first HIV-positive screening test, are shown as filled dots and error bars, respectively. Open circles represent the midpoint estimates of the seroconversion times [46].
Figure 3. The likelihood function of the elapsed time for two values of .
For a sample of 15 DNA sequences with 25 polymorphic sites at a locus, the likelihood function of the elapsed time is plotted for the values of and in red and blue, respectively.
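A curve like the one in Figure 3 can be approximated numerically. The sketch below estimates the likelihood of the elapsed time t for the caption's sample of 15 sequences with 25 polymorphic sites. It uses Monte Carlo integration over truncated coalescent genealogies instead of the paper's analytical distribution. The sketch relies on several assumptions:

- time is in units of 2N generations;
- mutations fall at rate θ/2 per lineage;
- the value θ = 20 is hypothetical, because the parameter values used in the figure were lost in extraction;
- all function names are illustrative.

```python
import math
import random


def branch_length_since_event(n, t, unit_exps):
    """Genealogy length back to a homogenization event t units ago.
    unit_exps holds n - 1 Exp(1) draws, one per coalescence level."""
    k, elapsed, length = n, 0.0, 0.0
    for e in unit_exps:
        wait = e / (k * (k - 1) / 2)  # rescale to the rate with k lineages
        if elapsed + wait >= t:
            return length + k * (t - elapsed)  # truncated at the event
        length += k * wait
        elapsed += wait
        k -= 1
    return length


def poisson_pmf(s, lam):
    """P(S = s) for S ~ Poisson(lam), computed on the log scale."""
    if lam <= 0.0:
        return 1.0 if s == 0 else 0.0
    return math.exp(s * math.log(lam) - lam - math.lgamma(s + 1))


def likelihood(t, n, s, theta, genealogies):
    """Monte Carlo estimate of P(S = s | t): Poisson probability averaged over genealogies."""
    return sum(poisson_pmf(s, theta / 2 * branch_length_since_event(n, t, e))
               for e in genealogies) / len(genealogies)


n, s, theta = 15, 25, 20.0  # sample from the caption; theta is hypothetical
rng = random.Random(0)
genealogies = [[rng.expovariate(1.0) for _ in range(n - 1)] for _ in range(2000)]
grid = [0.02 * i for i in range(1, 101)]  # elapsed times 0.02 .. 2.0 coalescent units
curve = [likelihood(t, n, s, theta, genealogies) for t in grid]
t_hat = grid[curve.index(max(curve))]
print(f"grid MLE of t: {t_hat:.2f}")
```

The sketch draws the unit exponentials once and reuses them at every t, so each grid point sees the same random genealogies (common random numbers). This keeps the estimated curve smooth enough to locate its maximum. Averaging the exact Poisson probability, rather than simulating mutations, also reduces Monte Carlo noise.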