John D O'Brien, Lucas Amenga-Etego, Ruiqi Li.
Abstract
BACKGROUND: The advent of whole-genome sequencing has generated increased interest in modelling the structure of strain mixture within clinical infections of Plasmodium falciparum. The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to the out-crossing rate across populations, making methods for measuring this process in situ central to understanding the genetic epidemiology of the disease.
Keywords: Balding–Nichols model; COI; F-statistics; Inbreeding coefficient; MOI
Year: 2016 PMID: 27634595 PMCID: PMC5025560 DOI: 10.1186/s12936-016-1531-z
Source DB: PubMed Journal: Malar J ISSN: 1475-2875 Impact factor: 2.979
Table 1. Notation for parameters used throughout the manuscript
| Parameter | Description |
|---|---|
| j | Index over SNPs |
| i | Index over samples |
| | Reference/non-reference read count data in sample i |
| | Read count data in sample i |
| | Population-level non-reference allele frequency for SNP j |
| | Within-sample non-reference allele frequency for SNP j in sample i (estimate) |
| | Inbreeding coefficient for sample i |
| | Observed heterozygosity for sample i |
| | Expected heterozygosity for bin |
| | Estimator of |
| | Vector of |
Table 2. Parameter values for simulated data sets
| Parameter | Description | Simulation values |
|---|---|---|
| M | Number of SNPs | 10, 50, 150, 500, 1500 |
| C | Total read counts per SNP | 10, 100, 1000, 10000 |
| f | Inbreeding coefficient | 0.01, 0.1, 0.5, 0.9, 0.99 |
| | Controls skew in allele frequency | 1, 10, 100, 1000 |

For each parameter set, 100 replicate data sets were generated.
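
The simulation design in the table above can be sketched in code. This is only an illustrative sketch, not the authors' simulation code: the grid values come from the table, but the key name `skew`, the helper names `parameter_sets` and `simulate_sample`, and the generative steps are all assumptions. In particular, it assumes that population allele frequencies are drawn from a Beta(1, skew) distribution, that within-sample frequencies follow the Balding–Nichols beta form with inbreeding coefficient f, and that non-reference reads are binomial draws out of C total reads.

```python
import itertools
import random

# Simulation grid copied from the table above. The key "skew" is a
# hypothetical name for the parameter whose symbol is not given there.
GRID = {
    "M": [10, 50, 150, 500, 1500],      # number of SNPs
    "C": [10, 100, 1000, 10000],        # total read counts per SNP
    "f": [0.01, 0.1, 0.5, 0.9, 0.99],   # inbreeding coefficient
    "skew": [1, 10, 100, 1000],         # controls skew in allele frequency
}
N_REPLICATES = 100  # replicate data sets per parameter set


def parameter_sets(grid=GRID):
    """Return every combination of grid values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*grid.values())]


def simulate_sample(M, C, f, skew, rng):
    """Simulate non-reference read counts at M SNPs for one sample.

    Assumptions (not taken from the source):
      * population frequency p ~ Beta(1, skew)
      * within-sample frequency q ~ Beta(p(1-f)/f, (1-p)(1-f)/f),
        the Balding-Nichols form with mean p and variance f*p*(1-p)
      * non-reference reads ~ Binomial(C, q)
    """
    eps = 1e-12                  # keeps both beta shape parameters positive
    scale = (1.0 - f) / f
    counts = []
    for _ in range(M):
        p = min(max(rng.betavariate(1.0, skew), eps), 1.0 - eps)
        q = rng.betavariate(p * scale, (1.0 - p) * scale)
        # Binomial(C, q) draw as a sum of C Bernoulli trials (stdlib only).
        counts.append(sum(rng.random() < q for _ in range(C)))
    return counts
```

Under this reading of the table, `len(parameter_sets())` is 5 × 4 × 5 × 4 = 400 parameter sets, so 100 replicates each would give 40,000 simulated data sets. A single replicate could be drawn with, for example, `simulate_sample(M=50, C=100, f=0.1, skew=10, rng=random.Random(0))`.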
Fig. 1. Inferred value over simulated value for each estimator across a range of parameter values. The vertical axis shows inferred/simulated value, with a dashed line at one. Specific simulated values can be found in Table 2. Each Tukey boxplot represents 100 replicate data sets with the same parameters.
Fig. 2. Bootstrap standard deviation for each estimator for the same parameter values as Fig. 1. Specific simulated values can be found in Table 2. Each boxplot represents 50 bootstrap samples, each with 100 replicate data sets.
Fig. 3. Correlation in inferred value for the four estimators across the set of 344 Ghanaian samples, with each sample represented as a point. Each panel shows the correlation between the two estimators on the corresponding diagonal position. For the Bayesian case, the MAP value is reported.
Fig. 4. Boxplot of the direct estimator for each of 344 Ghanaian samples, grouped by number of inferred strains using the complex mixture model of [8].
Fig. 5. Boxplot of for each sample grouped by country of origin for 12 countries from the PF3K, arranged from west to east. The more intuitive is used to emphasize where low and high levels of mixture are prevalent.