Gie Ken-Dror, Ian M Hastings.
Abstract
BACKGROUND: Haplotypes are important in anti-malarial drug resistance because genes encoding drug resistance may accumulate mutations at several codons in the same gene, each mutation increasing the level of drug resistance and, possibly, reducing the metabolic costs of previous mutations. Patients often have two or more haplotypes in their blood sample, which may make it impossible to identify exactly which haplotypes they carry, and hence to measure the type and frequency of resistant haplotypes in the malaria population.
Keywords: Expectation–maximization algorithm; Haplotype reconstruction; Markov chain Monte Carlo; Multiplicity of infection; Single nucleotide polymorphisms
Year: 2016 PMID: 27557806 PMCID: PMC4997664 DOI: 10.1186/s12936-016-1473-5
Source DB: PubMed Journal: Malar J ISSN: 1475-2875 Impact factor: 2.979
How malaria datasets are simulated
| Patient # | MOI | BIOMASS | f.BIOMASS | msp1 | msp2 | ta109 | Haplotype | Observed MOI | Observed genotype |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 5.29E+10 | 1.000 | 10 | 34 | 3 | 112 | | |
| 2 | 3 | 8.06E+09 | 0.100 | 24 | 23 | 5 | 112 | | |
| | | 6.48E+10 | 0.803 | 20 | 6 | 5 | 111 | | |
| | | 7.86E+09 | 0.097 | 16 | 27 | 5 | 112 | | |
| 3 | 2 | 5.06E+10 | 0.474 | 24 | 35 | 3 | 111 | | |
| | | 5.62E+10 | 0.526 | 1 | 34 | 4 | 111 | | |
| 4 | 2 | 5.52E+10 | 0.487 | 21 | 34 | 4 | 122 | | |
| | | 5.81E+10 | 0.513 | 18 | 33 | 4 | 111 | | |
| 5 | 3 | 3.16E+10 | 0.432 | 23 | 32 | 9 | 111 | | |
| | | 1.35E+09 | 0.018 | 21 | 28 | 7 | 112 | | |
| | | 4.03E+10 | 0.550 | 23 | 27 | 9 | 122 | | |
The “population” frequencies of the different MOI classes, polymorphic markers (msp1, msp2, ta109) and resistance haplotypes in the local malaria population are first defined. A number of patients are then simulated: five in this case, but more usually 100. For each patient, an MOI is first sampled according to the local “population” frequencies (which depend on local transmission intensity); this MOI determines the number of malaria clones in the patient. Each clone is then simulated in three steps. First, a biomass is assigned to the clone. Second, the clone's polymorphic markers are assigned at random according to the local true frequencies. Finally, a resistance haplotype is assigned to the clone, again sampled from the local true frequencies. This process is repeated for each clone in each patient and gives rise to the data shown in black font in the table. The genetic signal observed in each patient (last two columns) is then calculated as described in the main text. In this example, genetic signals are not detected if they constitute ≤10 % of the biomass (f.BIOMASS gives the relative biomass of each clone in a patient). What is actually observed, and available for analysis, is the information given in italics. Genotyping limits produce errors, and the erroneous data are indicated by an asterisk: they are the data available to the researcher but do not truly reflect the genetic data of the parasites in that patient.
Haplotype is the resistance haplotype for each clone. It is defined at three SNPs; for each SNP: 1 = wildtype, 2 = mutant. Observed genotype is the observed genotype for each patient. It is defined at three SNPs; for each SNP: 1 = wildtype alone, 2 = mutant alone, 3 = both wildtype and mutant genetic signals observed in the blood sample.
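The simulation steps described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the frequency tables, the number of alleles per marker, the log-uniform biomass distribution and the detection rule (an allele is seen only if the clones carrying it together exceed the limit of detection) are all assumptions made here, since the main text that defines them is not part of this page. Only the ≤10 % threshold and the 1/2/3 genotype coding come from the caption.

```python
import random

# Placeholder "population" frequencies (hypothetical, not from the paper).
MOI_FREQ = {1: 0.4, 2: 0.3, 3: 0.3}
HAPLOTYPE_FREQ = {"111": 0.60, "112": 0.25, "122": 0.15}
# Assumed number of equally common alleles per polymorphic marker.
MARKER_ALLELES = {"msp1": 25, "msp2": 35, "ta109": 10}
LOD = 0.10  # signals at <= 10 % of the biomass are not detected


def weighted_choice(rng, freq):
    """Sample one key of `freq` with probability proportional to its value."""
    keys = list(freq)
    return rng.choices(keys, weights=[freq[k] for k in keys], k=1)[0]


def add_relative_biomass(clones):
    """Store each clone's share of the patient's total biomass (f.BIOMASS)."""
    total = sum(c["biomass"] for c in clones)
    for c in clones:
        c["f_biomass"] = c["biomass"] / total


def simulate_patient(rng):
    """Sample an MOI, then a biomass, markers and haplotype for each clone."""
    clones = []
    for _ in range(weighted_choice(rng, MOI_FREQ)):
        clones.append({
            "biomass": 10 ** rng.uniform(9, 11),  # assumed log-uniform
            "markers": {m: rng.randint(1, n) for m, n in MARKER_ALLELES.items()},
            "haplotype": weighted_choice(rng, HAPLOTYPE_FREQ),
        })
    add_relative_biomass(clones)
    return clones


def detected_alleles(clones, allele_of):
    """Alleles whose summed relative biomass exceeds LOD (assumed rule)."""
    signal = {}
    for c in clones:
        allele = allele_of(c)
        signal[allele] = signal.get(allele, 0.0) + c["f_biomass"]
    return {a for a, f in signal.items() if f > LOD}


def observe(clones):
    """Return (observed MOI, observed genotype) under the assumed rule."""
    observed_moi = max(
        len(detected_alleles(clones, lambda c, m=m: c["markers"][m]))
        for m in MARKER_ALLELES
    )
    genotype = ""
    for snp in range(3):
        seen = detected_alleles(clones, lambda c, s=snp: c["haplotype"][s])
        # 3 = both wildtype and mutant signals detected at this SNP
        genotype += "3" if seen == {"1", "2"} else next(iter(seen))
    return observed_moi, genotype


if __name__ == "__main__":
    rng = random.Random(1)
    for patient in range(1, 6):  # five patients, as in the table
        clones = simulate_patient(rng)
        print(patient, len(clones), observe(clones))
```

Taking the observed MOI as the largest number of distinct alleles detected at any one marker is one common convention; the paper's own definition may differ, so the output of `observe` should not be read as a reconstruction of the table's last two columns.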
Fig. 1 The correlation (R) between population/sample and estimated haplotype frequency across statistical methods for LoDSNP/MOI = 0.00/0.00
Fig. 2 The change coefficient (C) between population/sample and estimated haplotype frequency across statistical methods for LoDSNP/MOI = 0.00/0.00
Fig. 3 The similarity index of the estimated haplotype frequencies compared to population/sample haplotype frequencies across statistical methods
Fig. 4 The MSE of the estimated haplotype frequencies compared to population/sample haplotype frequencies across statistical methods
Fig. 5 The validity of the methods, measured by whether the “population/sampled” frequencies fall outside the 95 % CI
Fig. 6 Computational time for five methods
Haplotype frequency estimates for the real data set (Swiss TPH), n = 82 individuals
| SNP 1 | SNP 2 | MHF | R-EM | Bayesian | EM | MCMC |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.928 (0.892–0.955) | 0.914 (0.858–0.970) | 0.936 (0.907–0.960) | 0.908 (0.821–0.961) | 0.914 (0.828–0.965) |
| 1 | 2 | 0.011 (0.002–0.030) | 0.016 (0.000–0.041) | 0.017 (0.006–0.037) | 0.013 (0.000–0.070) | 0.021 (0.002–0.083) |
| 2 | 1 | 0.004 (0.000–0.017) | 0.005 (0.000–0.018) | 0.007 (0.001–0.017) | 0.009 (0.000–0.064) | 0.008 (0.000–0.061) |
| 2 | 2 | 0.057 (0.033–0.089) | 0.065 (0.008–0.121) | 0.038 (0.017–0.064) | 0.069 (0.024–0.150) | 0.058 (0.017–0.135) |
SNP: 1 = wildtype alone, 2 = mutant alone; MHF: MalHaploFreq; R-EM: malaria.em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo
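As a quick way to read the table, the point estimates can be summed per method (they should total roughly 1, up to rounding) and collapsed into per-SNP mutant frequencies by adding the haplotypes that carry the mutant at that SNP. The sketch below copies the point estimates from the table; the helper name `mutant_frequency` is introduced here for illustration only, and confidence intervals are ignored.

```python
# Point estimates from the table; keys are (SNP 1, SNP 2), 1 = wildtype, 2 = mutant.
estimates = {
    "MHF":      {(1, 1): 0.928, (1, 2): 0.011, (2, 1): 0.004, (2, 2): 0.057},
    "R-EM":     {(1, 1): 0.914, (1, 2): 0.016, (2, 1): 0.005, (2, 2): 0.065},
    "Bayesian": {(1, 1): 0.936, (1, 2): 0.017, (2, 1): 0.007, (2, 2): 0.038},
    "EM":       {(1, 1): 0.908, (1, 2): 0.013, (2, 1): 0.009, (2, 2): 0.069},
    "MCMC":     {(1, 1): 0.914, (1, 2): 0.021, (2, 1): 0.008, (2, 2): 0.058},
}


def mutant_frequency(hap_freq, snp_index):
    """Sum the frequencies of all haplotypes that are mutant at one SNP."""
    return sum(f for hap, f in hap_freq.items() if hap[snp_index] == 2)


for method, hap_freq in estimates.items():
    total = sum(hap_freq.values())
    snp1 = mutant_frequency(hap_freq, 0)
    snp2 = mutant_frequency(hap_freq, 1)
    print(f"{method:9s} sum={total:.3f}  SNP 1 mutant={snp1:.3f}  SNP 2 mutant={snp2:.3f}")
```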