estMOI: estimating multiplicity of infection using parasite deep sequencing data.

Samuel A Assefa, Mark D Preston, Susana Campino, Harold Ocholla, Colin J Sutherland, Taane G Clark.

Abstract

Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P.falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission.

Year:  2014        PMID: 24443379      PMCID: PMC3998131          DOI: 10.1093/bioinformatics/btu005

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 BACKGROUND

Multiplicity of infection (MOI) refers to the number of different parasite genotypes co-infecting a single host. It is an epidemiological measure that can improve the understanding of many areas of parasitology, including the dynamics of infections, pathogenesis, effect of transmission intensity, drug efficacy and parasite genetics (Ross ). In malaria endemic areas, MOI can be a useful indicator of the transmission level, where the latter is positively correlated with the average number of malaria parasite strains in an individual (Babiker ). The merozoite surface proteins (MSPs) are involved in erythrocyte invasion and affect parasite density and eventually severe pathology. Genotyping of the MSP2 gene is a standard method for assessing MOI in Plasmodium falciparum studies, as it is highly polymorphic in length and sequence (Ntoumi ). High-resolution genotyping of the MSP2 gene can distinguish between infections by detecting the presence of different alleles at a polymorphic marker. However, the number of infections may not be accurately counted, as parasites from multiple infections may carry the same allele. Several methods have been developed using observed allele frequencies (Ross et al., 2011), but they do not account for detectability. Polymerase chain reaction-based approaches struggle to detect parasites at low levels in samples, leading to model estimates referring only to infections that would have been counted if they had been distinguished in the genotyping. Approaches that consider the whole genome rather than candidate regions, and that exploit the potentially high depth of coverage from sequencing technologies to detect parasites at low levels, are likely to be more informative (Bowman ). Massively parallelizable sequencing technologies are providing whole-genome data on various parasites including P.falciparum (haploid genome, 14 chromosomes, size 23 Mb, 19% GC content) isolates to high coverage depth (Robinson ).
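The undercounting problem with single-marker genotyping can be illustrated with a minimal sketch. The allele labels and frequencies below are hypothetical, and the probability calculation assumes that strains are sampled independently from population allele frequencies, which is a simplification and not part of the methods cited above.

```python
def observed_moi(strain_alleles):
    """Number of distinct alleles seen at one marker.

    This is only a lower bound on the true number of strains,
    because strains that share an allele are counted once.
    """
    return len(set(strain_alleles))


def prob_shared_allele(allele_freqs):
    """Chance that two independently sampled strains carry the same allele
    (assumption: independent draws from the population frequencies)."""
    return sum(p * p for p in allele_freqs)


# Hypothetical infection with three strains, two of which share allele "A".
strains = ["A", "A", "B"]
print(observed_moi(strains))  # 2, although three strains are present

# Hypothetical allele frequencies at a single marker.
print(prob_shared_allele([0.5, 0.3, 0.2]))
```

The more polymorphic the marker, the smaller the chance of a shared allele, which is consistent with the use of highly polymorphic genes such as MSP2 for genotyping.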
In this setting, the presence of heterozygous genotypes not only provides evidence of MOI but also complicates population genetic and diversity analysis. For example, the calculation of heterozygosity and detection of signatures of recent positive selection assume clonal samples, and de novo assembly of genomes derived from multiclone infections can lead to potentially cryptic gene characterizations. A common approach to overcome the problem of multiplicity is to culture the parasite in the laboratory to (near-) clonality, but this is time-consuming and not feasible for large numbers of field isolates. In addition, parasites sampled after a long-term culture will not fully represent the genotypes that are circulating in the population at the time of sampling due to clone loss and chromosomal deletions. Here, we adopt a sequence-based genome-wide approach to estimating MOI, which considers all possible local estimates based on combinations of alleles from read pairs. Being genome-wide, it is possible to identify genic regions of high multiplicity, thereby informing the development of new assays for inference. Our approach has been implemented in the estMOI software package.

2 ALGORITHM AND APPLICATION

estMOI is a suite of Perl scripts that estimates the MOI locally in the genome and then overall to obtain a global estimate. The inputs are alignments (BAM files), variant regions in Variant Call Format (VCF) and an optional file of regions to exclude from analysis. Minimal multiplicity is inferred by considering the maximum number of distinct haplotypes formed by combinations of a user-specified number of single nucleotide polymorphisms (SNPs) on single or paired reads. The default setting is to consider SNPs on read pairs, as haplotypes formed using too many SNPs on single reads alone may lead to MOI artefacts due to sequencing errors. The local minimum haplotype frequency (default value 3) and total count (coverage, default 10) can be set to reduce the number of spurious estimates due to sequencing or mapping errors. In addition, a distributional cut-off can be set to exclude extreme values when estimating overall MOI (default 90th percentile). To evaluate estMOI, it was applied to several collections of P.falciparum isolates with Illumina deep sequencing data. All the corresponding raw sequence data [SRA Study ERP000190, Manske ; Robinson ] consist of paired-end reads of minimum length 54 bp and were mapped to the 3D7 P.falciparum reference genome (v3.0) using smalt (www.sanger.ac.uk/resources/software/smalt/). Accuracy of MOI estimates using whole-genome sequence data may be affected by low read coverage, and all isolates had at least 30-fold coverage. The alignments were processed as described previously (Robinson ) to construct VCF (v4.1) files consisting of SNPs and indels (with quality scores of 30 or more). An exclusion file consisting of sub-telomeric regions and highly variable gene families was also constructed. The software was executed using default values, and the average run time per sample on a standard desktop was ∼10 min, a process that is highly parallelizable.
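The counting idea described above can be sketched in Python. This is an illustration under stated assumptions, not the estMOI Perl implementation: the function names, the representation of a read pair as a dictionary from SNP position to observed base, the application of the coverage threshold before the haplotype-frequency filter, and the nearest-rank percentile rule are all choices made for this example.

```python
from collections import Counter


def haplotype_counts(read_pairs, snp_positions):
    """Count haplotypes over the given SNP positions.

    Each read pair is a dict {position: base}; only pairs covering
    every requested position contribute a haplotype.
    """
    counts = Counter()
    for alleles in read_pairs:
        if all(pos in alleles for pos in snp_positions):
            counts[tuple(alleles[pos] for pos in snp_positions)] += 1
    return counts


def local_moi(counts, min_hap_count=3, min_total=10):
    """Local estimate: number of distinct haplotypes with enough support.

    Defaults mirror the values quoted in the text (3 and 10). Returns
    None when coverage is too low or no haplotype passes the filter
    (assumption: coverage is checked on the unfiltered total).
    """
    if sum(counts.values()) < min_total:
        return None
    supported = sum(1 for c in counts.values() if c >= min_hap_count)
    return supported or None


def global_moi(local_estimates, percentile=90):
    """Overall estimate: nearest-rank percentile of the local estimates,
    so that extreme local values are excluded (assumed rule)."""
    values = sorted(v for v in local_estimates if v is not None)
    if not values:
        return None
    rank = max(1, (percentile * len(values) + 99) // 100)  # integer ceiling
    return values[rank - 1]


# Hypothetical read pairs over two SNPs at positions 100 and 180:
# two well-supported haplotypes plus one likely sequencing error.
reads = (
    [{100: "A", 180: "C"}] * 6
    + [{100: "G", 180: "T"}] * 5
    + [{100: "A", 180: "T"}]
)
print(local_moi(haplotype_counts(reads, (100, 180))))  # 2
print(global_moi([1, 1, 1, 1, 1, 1, 1, 1, 2, 5]))      # 2
```

In the toy data the single `("A", "T")` read pair falls below the minimum haplotype count and is ignored, and the outlying local estimate of 5 is excluded by the percentile cut-off.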
The algorithm was tested on resequencing data of the 3D7 reference genome, where, as expected, few SNPs were found and an MOI of 1 was confirmed. To assess its performance on other clonal samples, we used sequence data from four isolates that have been under long-term culture [DD2, GB4, HB3 and 7G8, (Sepulveda )]. The four clonal strains all had an MOI of 1. To evaluate estMOI in the case of mixed infections, we combined reads from two clonal isolates and confirmed an MOI of 2. As estMOI accuracy in a clinical sample setting may be affected by SNP density, we considered samples with at least 25K differences from the reference genome. We applied estMOI to sequence data for 54 clinical isolates from west Africa, where multiplicity information using MSP1 and MSP2 genotyping was available (Amambua-Ngwa ). The estimates for the presence of multiplicity from our method were 65% concordant with the MSP results. The 35% discordance arises where our method estimated the presence of multiple genotypes, whereas the MSP typing reported single infections. This difference may be explained by the high error rate of MSP-based genotyping and its potentially low detectability (Ross et al., 2011). To infer any relationship between the estimated MOI and transmission, we considered sequence data from Burkina Faso (n = 25, medium transmission), Cambodia (n = 25, low and cultured), Malawi (n = 25, medium to high), Mali (n = 25, medium), Thailand (n = 25, low and cultured) and Papua New Guinea (n = 11) (Preston ; M.Preston, submitted for publication). The MOI estimates for the clinical isolates varied according to transmission intensity and geographical origin of samples. Isolates from southeast Asia, where transmission is lower, had the lowest proportion of multiple infections (Cambodia 4%; Thailand 7%; Papua New Guinea 16%). Conversely, isolates from Africa had a higher proportion of multiple infections (Mali 44%; Malawi 48%; Burkina Faso 54%).
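The two summary statistics used in this comparison, the proportion of multiple infections in a population and the concordance between two typing methods on the presence of multiplicity, can be computed as in the following sketch. The function names and the MOI values are hypothetical and are not the study data.

```python
def proportion_multiple(moi_values):
    """Fraction of samples whose estimated MOI exceeds 1."""
    return sum(1 for m in moi_values if m > 1) / len(moi_values)


def concordance(moi_a, moi_b):
    """Fraction of samples on which two methods agree about the
    presence of multiplicity (single versus multiple infection)."""
    if len(moi_a) != len(moi_b):
        raise ValueError("both methods must cover the same samples")
    agree = sum((a > 1) == (b > 1) for a, b in zip(moi_a, moi_b))
    return agree / len(moi_a)


# Hypothetical per-sample estimates from two methods on the same five samples.
genome_moi = [1, 2, 3, 1, 2]
msp_moi = [1, 1, 2, 1, 2]
print(proportion_multiple(genome_moi))   # 0.6
print(concordance(genome_moi, msp_moi))  # 0.8
```

In this toy example the only disagreement is a sample called multiple by the genome-wide estimate and single by marker typing, the same direction of discordance reported above.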
A previous study in Malawi estimated MOI from a single locus circumsporozoite protein gene using deep sequencing of T-cell epitope haplotypes (mean MOI 2.31) and genotyping of the NANP repeat region (mean MOI 1.29). Our Malawi result (mean MOI 3.47) is higher than the former, supporting the potential utility of our approach and whole-genome estimates.

3 DISCUSSION

The number of whole-genome sequenced parasite samples taken directly from malaria patients is growing rapidly, primarily due to improvements in sequencing technology, throughput and multiplexing. Establishing the MOI of P.falciparum samples using sequence data will assist with understanding the dynamics of infections, pathogenesis, effect of transmission intensity, drug efficacy and parasite genetics. The estMOI software can rapidly determine an estimate of MOI, and we have demonstrated that these results correlate highly with both MSP2 genotyping and transmission intensity. A sufficient density of SNPs is required to provide local estimates, but too great a concentration of SNPs may be evidence of sequencing or mapping errors. Data filtering based on mapping qualities and coverage can help to minimize inflated MOI estimates. estMOI may be used to identify potentially informative regions with high MOI across multiple samples as suitable candidates for future genotyping. We identified 26 potentially MOI-informative genes (see Supplementary Materials). Further use may come from applying estMOI to other organisms, especially those that are highly polymorphic. The application to 26 publicly available Plasmodium vivax isolates (Auburn ) revealed multiple infections in 27% of the samples. In the future, developments in single molecule sequencing are likely to increase read length and improve MOI estimates. However, in mixtures of highly related genomes, it may not be possible to accurately estimate the MOI, even when using whole-chromosome sequences (because they segregate independently) (Nkhoma ). Nonetheless, technological developments will increase the read length and accuracy of polymorphic calls, making our approach more robust and sensitive.

1.  Genetic structure and dynamics of Plasmodium falciparum infections in the Kilombero region of Tanzania.

Authors:  H A Babiker; L C Ranford-Cartwright; D Walliker
Journal:  Trans R Soc Trop Med Hyg       Date:  1999-02       Impact factor: 2.184

2.  Age-dependent carriage of multiple Plasmodium falciparum merozoite surface antigen-2 alleles in asymptomatic malaria infections.

Authors:  F Ntoumi; H Contamin; C Rogier; S Bonnefoy; J F Trape; O Mercereau-Puijalon
Journal:  Am J Trop Med Hyg       Date:  1995-01       Impact factor: 2.345

3.  VarB: a variation browsing and analysis tool for variants derived from next-generation sequencing data.

Authors:  Mark D Preston; Magnus Manske; Neil Horner; Samuel Assefa; Susana Campino; Sarah Auburn; Issaka Zongo; Jean-Bosco Ouedraogo; Francois Nosten; Tim Anderson; Taane G Clark
Journal:  Bioinformatics       Date:  2012-09-13       Impact factor: 6.937

4.  Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

Authors:  Timothy Robinson; Susana G Campino; Sarah Auburn; Samuel A Assefa; Spencer D Polley; Magnus Manske; Bronwyn MacInnis; Kirk A Rockett; Gareth L Maslen; Mandy Sanders; Michael A Quail; Peter L Chiodini; Dominic P Kwiatkowski; Taane G Clark; Colin J Sutherland
Journal:  PLoS One       Date:  2011-08-11       Impact factor: 3.240

5.  Close kinship within multiple-genotype malaria parasite infections.

Authors:  Standwell C Nkhoma; Shalini Nair; Ian H Cheeseman; Cherise Rohr-Allegrini; Sittaporn Singlam; François Nosten; Tim J C Anderson
Journal:  Proc Biol Sci       Date:  2012-03-07       Impact factor: 5.349

6.  Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

Authors:  Magnus Manske; Olivo Miotto; Susana Campino; Sarah Auburn; Jacob Almagro-Garcia; Gareth Maslen; Jack O'Brien; Abdoulaye Djimde; Ogobara Doumbo; Issaka Zongo; Jean-Bosco Ouedraogo; Pascal Michon; Ivo Mueller; Peter Siba; Alexis Nzila; Steffen Borrmann; Steven M Kiara; Kevin Marsh; Hongying Jiang; Xin-Zhuan Su; Chanaki Amaratunga; Rick Fairhurst; Duong Socheat; Francois Nosten; Mallika Imwong; Nicholas J White; Mandy Sanders; Elisa Anastasi; Dan Alcock; Eleanor Drury; Samuel Oyola; Michael A Quail; Daniel J Turner; Valentin Ruano-Rubio; Dushyanth Jyothi; Lucas Amenga-Etego; Christina Hubbart; Anna Jeffreys; Kate Rowlands; Colin Sutherland; Cally Roper; Valentina Mangano; David Modiano; John C Tan; Michael T Ferdig; Alfred Amambua-Ngwa; David J Conway; Shannon Takala-Harrison; Christopher V Plowe; Julian C Rayner; Kirk A Rockett; Taane G Clark; Chris I Newbold; Matthew Berriman; Bronwyn MacInnis; Dominic P Kwiatkowski
Journal:  Nature       Date:  2012-07-19       Impact factor: 49.962

7.  Population genomic scan for candidate signatures of balancing selection to guide antigen characterization in malaria parasites.

Authors:  Alfred Amambua-Ngwa; Kevin K A Tetteh; Magnus Manske; Natalia Gomez-Escobar; Lindsay B Stewart; M Elizabeth Deerhake; Ian H Cheeseman; Christopher I Newbold; Anthony A Holder; Ellen Knuepfer; Omar Janha; Muminatou Jallow; Susana Campino; Bronwyn Macinnis; Dominic P Kwiatkowski; David J Conway
Journal:  PLoS Genet       Date:  2012-11-01       Impact factor: 5.917

8.  Estimating the numbers of malaria infections in blood samples using high-resolution genotyping data.

Authors:  Amanda Ross; Cristian Koepfli; Xiaohong Li; Sonja Schoepflin; Peter Siba; Ivo Mueller; Ingrid Felger; Thomas Smith
Journal:  PLoS One       Date:  2012-08-29       Impact factor: 3.240

9.  Comparative population structure of Plasmodium falciparum circumsporozoite protein NANP repeat lengths in Lilongwe, Malawi.

Authors:  Natalie M Bowman; Seth Congdon; Tisungane Mvalo; Jaymin C Patel; Veronica Escamilla; Michael Emch; Francis Martinson; Irving Hoffman; Steven R Meshnick; Jonathan J Juliano
Journal:  Sci Rep       Date:  2013       Impact factor: 4.379

10.  A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

Authors:  Nuno Sepúlveda; Susana G Campino; Samuel A Assefa; Colin J Sutherland; Arnab Pain; Taane G Clark
Journal:  BMC Genomics       Date:  2013-02-26       Impact factor: 3.969
