Gie Ken-Dror, Pankaj Sharma.
Abstract
BACKGROUND: Malaria patients can have two or more haplotypes in their blood sample, making it challenging to identify which haplotypes they carry. In addition, there are challenges in measuring the type and frequency of resistant haplotypes in populations. This study presents a novel statistical method, a Gibbs sampler algorithm, to investigate this issue.
Keywords: Gibbs sampler algorithm; Haplotype reconstruction; Malaria; Markov chain Monte Carlo; Multiplicity of infection; Single nucleotide polymorphisms
Year: 2021; PMID: 34246273; PMCID: PMC8272262; DOI: 10.1186/s12936-021-03841-9
Source DB: PubMed; Journal: Malar J; ISSN: 1475-2875; Impact factor: 2.979
The correlation (R) of the estimated haplotype frequencies with simulated population and sample haplotype frequencies across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| Population haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.949 | 0.913 | 0.955 | 0.977 | 0.973 | 0.961 |
| 0.10/0.05 | 0.953 | 0.932 | 0.960 | 0.978 | 0.980 | 0.970 |
| 0.20/0.10 | 0.962 | 0.949 | 0.960 | 0.977 | 0.981 | 0.975 |
| 0.30/0.15 | 0.957 | 0.962 | 0.960 | 0.974 | 0.976 | 0.977 |
| Sample haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.960 | 0.926 | 0.962 | 0.983 | 0.978 | 0.966 |
| 0.10/0.05 | 0.963 | 0.944 | 0.968 | 0.985 | 0.986 | 0.976 |
| 0.20/0.10 | 0.971 | 0.960 | 0.968 | 0.984 | 0.987 | 0.982 |
| 0.30/0.15 | 0.967 | 0.972 | 0.967 | 0.982 | 0.983 | 0.984 |
Higher value represents higher accuracy
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection
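As an illustration of the accuracy measure in the table above, the sketch below computes a correlation between estimated and simulated haplotype frequencies. It assumes R is the Pearson correlation coefficient, which this record does not state explicitly; the function name and the frequency values are hypothetical and are not study data.

```python
import math

def pearson_r(estimated, simulated):
    """Pearson correlation between two equal-length frequency vectors.

    Assumption: the table's R is a Pearson coefficient (not confirmed by the source).
    """
    if len(estimated) != len(simulated) or len(estimated) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_s = sum(simulated) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((e - mean_e) * (s - mean_s) for e, s in zip(estimated, simulated))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_e == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_e * var_s)

# Hypothetical frequencies for four haplotypes (illustrative only).
simulated = [0.50, 0.30, 0.15, 0.05]
estimated = [0.48, 0.33, 0.13, 0.06]
r = pearson_r(estimated, simulated)
print(f"R = {r:.3f}")
```

Values close to 1 indicate that the estimated frequencies track the simulated ones closely, matching the "higher is better" reading of the table.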
The similarity index (I) of the estimated haplotype frequencies with simulated population and sample haplotype frequencies across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| Population haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.906 | 0.879 | 0.910 | 0.938 | 0.915 | 0.919 |
| 0.10/0.05 | 0.911 | 0.894 | 0.913 | 0.943 | 0.938 | 0.934 |
| 0.20/0.10 | 0.905 | 0.909 | 0.909 | 0.940 | 0.945 | 0.940 |
| 0.30/0.15 | 0.889 | 0.918 | 0.898 | 0.930 | 0.930 | 0.932 |
| Sample haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.917 | 0.886 | 0.917 | 0.942 | 0.917 | 0.921 |
| 0.10/0.05 | 0.923 | 0.904 | 0.922 | 0.950 | 0.942 | 0.939 |
| 0.20/0.10 | 0.915 | 0.920 | 0.917 | 0.948 | 0.953 | 0.948 |
| 0.30/0.15 | 0.897 | 0.930 | 0.906 | 0.938 | 0.937 | 0.942 |
Higher value represents higher accuracy
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection
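This record does not give the formula for the similarity index. A definition often used when comparing haplotype frequency vectors is I = 1 - (1/2) Σ |estimated - simulated|; the sketch below assumes that definition, and the function name and data are hypothetical.

```python
def similarity_index(estimated, simulated):
    """Similarity index I = 1 - 0.5 * sum(|estimated_h - simulated_h|).

    Assumption: this common definition is the one behind the table; the
    source excerpt does not state the formula.
    """
    if len(estimated) != len(simulated):
        raise ValueError("frequency vectors must have the same length")
    total_abs_diff = sum(abs(e - s) for e, s in zip(estimated, simulated))
    return 1.0 - 0.5 * total_abs_diff

# Hypothetical frequencies for four haplotypes (illustrative only).
simulated = [0.50, 0.30, 0.15, 0.05]
estimated = [0.48, 0.33, 0.13, 0.06]
i_value = similarity_index(estimated, simulated)
print(f"I = {i_value:.3f}")
```

Under this definition, I equals 1 when the two frequency vectors are identical and falls toward 0 as they share less probability mass.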
The mean squared error (MSE) of the estimated haplotype frequencies with simulated population and sample haplotype frequencies across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| Population haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.103 | 0.178 | 0.091 | 0.050 | 0.085 | 0.097 |
| 0.10/0.05 | 0.108 | 0.138 | 0.088 | 0.043 | 0.047 | 0.065 |
| 0.20/0.10 | 0.120 | 0.103 | 0.108 | 0.051 | 0.040 | 0.050 |
| 0.30/0.15 | 0.177 | 0.086 | 0.141 | 0.072 | 0.076 | 0.062 |
| Sample haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 0.082 | 0.158 | 0.077 | 0.041 | 0.080 | 0.091 |
| 0.10/0.05 | 0.086 | 0.117 | 0.072 | 0.032 | 0.038 | 0.055 |
| 0.20/0.10 | 0.095 | 0.081 | 0.089 | 0.036 | 0.027 | 0.037 |
| 0.30/0.15 | 0.150 | 0.063 | 0.122 | 0.053 | 0.058 | 0.044 |
Lower value represents higher accuracy
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection
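A minimal sketch of a mean squared error between estimated and simulated haplotype frequencies follows. It uses the standard definition, the mean of squared differences across haplotypes. Any scaling applied to the tabulated values is not stated in this record, so the output is not expected to match the table's magnitude; the function name and data are hypothetical.

```python
def mean_squared_error(estimated, simulated):
    """Mean of squared differences between paired frequencies.

    Assumption: no extra scaling is applied; the source excerpt does not
    say how the tabulated MSE values were scaled.
    """
    if len(estimated) != len(simulated) or not estimated:
        raise ValueError("need two non-empty vectors of equal length")
    squared_errors = [(e - s) ** 2 for e, s in zip(estimated, simulated)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical frequencies for four haplotypes (illustrative only).
simulated = [0.50, 0.30, 0.15, 0.05]
estimated = [0.48, 0.33, 0.13, 0.06]
mse = mean_squared_error(estimated, simulated)
print(f"MSE = {mse:.5f}")
```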
The average change coefficient (C) of the estimated haplotype frequencies with simulated population and sample haplotype frequencies for haplotype frequency > 5% across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| Population haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 20.8 | 25.2 | 18.7 | 13.2 | 17.7 | 16.1 |
| 0.10/0.05 | 19.3 | 22.3 | 18.0 | 12.4 | 13.4 | 13.7 |
| 0.20/0.10 | 19.8 | 19.7 | 18.6 | 13.0 | 11.9 | 12.9 |
| 0.30/0.15 | 22.0 | 17.8 | 20.1 | 14.6 | 14.4 | 14.4 |
| Sample haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 18.8 | 23.7 | 17.4 | 12.0 | 17.2 | 15.4 |
| 0.10/0.05 | 17.0 | 20.5 | 16.3 | 10.9 | 12.4 | 12.5 |
| 0.20/0.10 | 17.7 | 17.5 | 17.1 | 11.4 | 10.4 | 11.3 |
| 0.30/0.15 | 20.6 | 15.4 | 18.8 | 13.0 | 13.1 | 12.8 |
Lower value represents higher accuracy
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection
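The formula for the change coefficient is not given in this record. The sketch below assumes one plausible reading: the average absolute relative difference, in percent, over haplotypes whose simulated frequency exceeds 5%. Both the formula and the choice to apply the threshold to the simulated frequency are assumptions, and the function name and data are hypothetical.

```python
def change_coefficient(estimated, simulated, min_freq=0.05):
    """Average of 100 * |estimated - simulated| / simulated.

    Assumptions (not confirmed by the source): the coefficient is a mean
    absolute relative change in percent, restricted to haplotypes whose
    simulated frequency is strictly greater than min_freq.
    """
    if len(estimated) != len(simulated):
        raise ValueError("frequency vectors must have the same length")
    relative_changes = [
        100.0 * abs(e - s) / s
        for e, s in zip(estimated, simulated)
        if s > min_freq
    ]
    if not relative_changes:
        raise ValueError("no haplotype exceeds the frequency threshold")
    return sum(relative_changes) / len(relative_changes)

# Hypothetical frequencies; the last haplotype (0.05) is excluded by the threshold.
simulated = [0.50, 0.30, 0.15, 0.05]
estimated = [0.48, 0.33, 0.13, 0.06]
c_value = change_coefficient(estimated, simulated)
print(f"C = {c_value:.1f}%")
```

Restricting to haplotypes above the threshold avoids dividing by very small simulated frequencies, where relative differences become unstable.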
Percentages of simulated sample haplotype frequencies and population haplotype frequencies that fall outside the confidence intervals of the estimated haplotype frequencies across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| Population haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 4.90 | 5.90 | 14.60 | 0.70 | 4.00 | 3.00 |
| 0.10/0.05 | 6.20 | 5.70 | 15.10 | 1.00 | 1.30 | 1.90 |
| 0.20/0.10 | 11.40 | 5.90 | 18.10 | 2.00 | 1.80 | 2.10 |
| 0.30/0.15 | 20.00 | 8.10 | 23.30 | 4.30 | 4.40 | 3.80 |
| Sample haplotype (LoDSNP/LoDMOI) | | | | | | |
| 0/0 | 6.90 | 8.30 | 14.80 | 4.50 | 8.00 | 6.40 |
| 0.10/0.05 | 7.50 | 7.50 | 14.50 | 4.40 | 5.10 | 4.80 |
| 0.20/0.10 | 11.50 | 6.80 | 17.10 | 4.40 | 4.40 | 4.60 |
| 0.30/0.15 | 20.30 | 8.20 | 21.60 | 5.40 | 5.40 | 5.20 |
Lower value represents higher accuracy
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection
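The percentage reported in the table above can be illustrated by counting how many simulated frequencies fall outside the interval estimated for the same haplotype. The sketch assumes each interval is a closed (lower, upper) pair; how the intervals were constructed for each method is not stated in this record, and the function name and data are hypothetical.

```python
def percent_outside_ci(true_values, intervals):
    """Percentage of true values lying outside their paired (lower, upper) interval.

    Assumption: intervals are closed, so a value equal to a bound counts as inside.
    """
    if len(true_values) != len(intervals) or not true_values:
        raise ValueError("need one interval per true value")
    outside = sum(
        1
        for value, (lower, upper) in zip(true_values, intervals)
        if not (lower <= value <= upper)
    )
    return 100.0 * outside / len(true_values)

# Hypothetical simulated frequencies and estimated intervals (illustrative only).
simulated = [0.50, 0.30, 0.15, 0.05]
intervals = [(0.45, 0.52), (0.31, 0.36), (0.10, 0.17), (0.03, 0.08)]
pct = percent_outside_ci(simulated, intervals)  # only 0.30 misses its interval
print(f"{pct:.2f}% outside")
```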
The computational time (seconds) of the estimated haplotype frequencies across statistical methods and four conditions of LoDSNP (30%, 20%, 10%, 0%) and LoDMOI (15%, 10%, 5%, 0%)
| LoDSNP/LoDMOI | MHF | R-EM | Bayesian | EM | MCMC | Gibbs |
|---|---|---|---|---|---|---|
| 0/0 | 29.8 | 517.5 | 3.5 | 19.9 | 5.4 | 25.5 |
| 0.10/0.05 | 31.1 | 353.6 | 3.4 | 1.9 | 5.2 | 19.2 |
| 0.20/0.10 | 25.6 | 227.5 | 3.3 | 1.3 | 4.8 | 13.1 |
| 0.30/0.15 | 23.4 | 127.7 | 3.1 | 0.9 | 4.4 | 8.7 |
Lower value represents faster calculation
MHF: MalHaploFreq; R-EM: malaria em; Bayesian: Bayesian statistic; EM: EM algorithm; MCMC: Markov chain Monte Carlo; Gibbs: Gibbs sampler; LoD: limit of detection; SNP: single nucleotide polymorphism; MOI: multiplicity of infection