Sha Joe Zhu, Jacob Almagro-Garcia, Gil McVean.
Abstract
Motivation: The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analyzing and interpreting such data is challenging because of the high rate of multiple infections present.
Year: 2018 PMID: 28961721 PMCID: PMC5870807 DOI: 10.1093/bioinformatics/btx530
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Notation used in this article
| Symbol | Description |
|---|---|
| | Marker index |
| | Sample index |
| | Read count for reference allele |
| | Read count for alternative allele |
| | Population level allele frequency (PLAF) |
| | Number of strains within sample |
| | Sequence length |
| | Proportions of strains |
| | Log titer of strains |
| | Allelic states of |
| | Allelic state of parasite strain |
| | Observed within sample allele frequency (WSAF) |
| | Unadjusted expected WSAF |
| π | Adjusted expected WSAF |
| Ξ | Reference panel |
| | Allelic state of reference panel strain |
| | Scaling factor used for genetic map |
| | Probability of read error |
Fig. 1. Experimental validation and effective number of strains inferred by DEploid. We use Reference Panel V to deconvolve the same lab-mixed samples, assuming 3 and 5 strains within a sample. Each experiment is repeated without a reference panel. Black crosses indicate the true effective number of strains. Red crosses indicate median values obtained from 30 replicates when using a panel and assuming that the maximum number of strains is 5. The coloured dots show the inferred effective number of strains across replicates with intensity proportional to fraction.
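The caption refers to an effective number of strains without stating how it is computed. A minimal sketch follows, assuming the common inverse-sum-of-squares definition, 1 / Σ w_k², applied to the strain proportions w; both the formula and the function name `effective_number_of_strains` are assumptions made for illustration and are not taken from this record.

```python
def effective_number_of_strains(proportions):
    """Return 1 / sum(w_k^2) for strain proportions w (assumed definition)."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Normalise so the proportions sum to one before squaring.
    normalised = [p / total for p in proportions]
    return 1.0 / sum(p * p for p in normalised)


if __name__ == "__main__":
    # A balanced two-strain mixture behaves like two strains.
    print(effective_number_of_strains([0.5, 0.5]))
    # A mixture dominated by one strain has an effective number close to one.
    print(effective_number_of_strains([0.9, 0.1]))
```

Under this definition a sample with K equally abundant strains has an effective number of exactly K, and the value shrinks towards 1 as one strain dominates.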
Fig. 2. Comparison of true and inferred haplotypes for Chromosome 14 (2369 SNPs) in sample PG0396-C without linkage disequilibrium (top) and using Reference Panels I to IV (from the second panel to the bottom). Reference Panel V gives results equivalent to Panel IV, and Panel VI gives results similar to Panel I. Red bars mark wrongly inferred positions. The yellow, cyan and white backgrounds label the haplotype segments from strains 7G8, HB3 and Dd2, respectively. The switch errors are obtained by counting the changes of a strain segment mapped to reference strains; the genotype errors are the discordance between the strain and the mapped reference segments. These results demonstrate the value of including reference strains similar to those present in the sample being analyzed.
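The two error counts described in the caption can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes the mapping of each site of the inferred haplotype to a reference strain is already available as a per-site list of labels (`path`), and the names `count_errors`, `path` and `references`, as well as the toy alleles, are hypothetical.

```python
def count_errors(inferred, path, references):
    """Count switch and genotype errors for one inferred haplotype.

    inferred   -- list of 0/1 alleles, one per site
    path       -- list of reference strain names, one per site (assumed given)
    references -- dict mapping strain name to its list of 0/1 alleles
    """
    if len(inferred) != len(path):
        raise ValueError("inferred and path must have the same length")
    # A switch error is a change of the mapped reference strain between
    # two adjacent sites.
    switch_errors = sum(1 for prev, cur in zip(path, path[1:]) if prev != cur)
    # A genotype error is a site where the inferred allele disagrees with
    # the allele of the reference strain it is mapped to.
    genotype_errors = sum(
        1
        for site, (allele, strain) in enumerate(zip(inferred, path))
        if references[strain][site] != allele
    )
    return switch_errors, genotype_errors


if __name__ == "__main__":
    # Toy data for illustration only; these are not real genotypes.
    references = {
        "7G8": [0, 0, 1, 1, 0, 1],
        "HB3": [1, 0, 0, 1, 1, 0],
    }
    inferred = [0, 0, 1, 1, 1, 1]
    path = ["7G8", "7G8", "7G8", "HB3", "HB3", "HB3"]
    print(count_errors(inferred, path, references))
```

Taking the mapping as an input keeps the sketch short; producing that mapping (for example, by choosing the path that minimises the total number of errors) is a separate step that this example does not attempt.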
Fig. 3. Comparison of DEploid and existing tools (COIL, pfmix, BEAGLE and SHAPEIT). (a) Estimates for the number of strains present in each mixed infection (artificially mixed in the lab) as given by COIL and DEploid. (b) Comparison of the inferred effective number of strains of each mixture as given by pfmix and DEploid. (c) Relationship between strain proportions and haplotype inference accuracy in the experimental validation for DEploid and BEAGLE/SHAPEIT (only mixtures of two strains). We used Reference Panel V to deconvolve all 27 samples with default settings. Each point represents a deconvolved haplotype with 17 530 sites. Point shape refers to strain and colour indicates the method applied. We use LOESS smoothing to show the trend of error versus strain proportion. The top panel shows switch error rate, the middle panel indicates genotyping error rate and the bottom panel indicates genotyping error rate through strain dropout. Note that zero switch error is represented as points below one. In summary, we find that DEploid results for the number of strains and relative proportions in a mixture are comparable to those achieved by existing methods, while inferred haplotypes are considerably better than those from other methods.
Fig. 4. Histograms of switch error and genotype error across 78 simulated Pf3k samples. We excluded four cases out of the 100 experiments where simulated haplotypes were over 99% identical and 18 cases where average coverage was below 20.