Alistair Miles, Zamin Iqbal, Paul Vauterin, Richard Pearson, Susana Campino, Michel Theron, Kelda Gould, Daniel Mead, Eleanor Drury, John O'Brien, Valentin Ruano Rubio, Bronwyn MacInnis, Jonathan Mwangi, Upeka Samarakoon, Lisa Ranford-Cartwright, Michael Ferdig, Karen Hayton, Xin-Zhuan Su, Thomas Wellems, Julian Rayner, Gil McVean, Dominic Kwiatkowski.
Abstract
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
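As an illustration of how a Mendelian error rate can serve as an accuracy indicator, the sketch below counts progeny genotype calls that match neither parent. It assumes haploid genotypes encoded as allele strings with None for a missing call; the function name and the data layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Optional, Sequence


def mendelian_error_rate(
    parent1: Sequence[Optional[str]],
    parent2: Sequence[Optional[str]],
    progeny: Sequence[Sequence[Optional[str]]],
) -> float:
    """Fraction of called progeny genotypes that match neither parent.

    Assumption: every clone is haploid, so a correct progeny call at a
    site must equal the allele carried by one of the two parents.
    """
    errors = 0
    called = 0
    for site, (a, b) in enumerate(zip(parent1, parent2)):
        if a is None or b is None:
            continue  # skip sites where either parent has no call
        for clone in progeny:
            genotype = clone[site]
            if genotype is None:
                continue  # missing progeny calls are not counted
            called += 1
            if genotype != a and genotype != b:
                errors += 1
    return errors / called if called else 0.0
```

A lower rate under a given set of variant filters suggests more accurate genotypes, which is how the abstract describes the statistic being used.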
Year: 2016 PMID: 27531718 PMCID: PMC5052046 DOI: 10.1101/gr.203711.115
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Summary of sequence and variation data generated in this study for the three crosses 3D7 × HB3, HB3 × Dd2, and 7G8 × GB4
Figure 1. Properties of indels. (A) Indel size distribution (size > 0 are insertions, size < 0 are deletions). Solid black bars represent the frequency of indels that are expansions or contractions of short tandem repeats (STR); solid white bars represent the frequency of non-STR indels. Most coding indels are size multiples of 3, preserving the reading frame. Most noncoding indels are size multiples of 2, reflecting the abundance of poly(AT) repeats in noncoding regions. (B) Amino acids inserted and deleted (relative to the 3D7 reference genome). (C) Indel diversity in intergenic regions relative to the position of core promoters predicted by Brick et al. (2008). Each point represents the mean indel diversity in a 50-bp window at a given distance from the center of a core promoter. Vertical bars represent the 95% confidence interval from 1000 bootstraps. The dashed line is at the mean intergenic diversity for the given indel class (STR/non-STR).
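The 95% confidence intervals from 1000 bootstraps mentioned in the caption can be sketched as follows. This is a minimal percentile bootstrap of a mean; the choice of the percentile method, the function name, and the seed are assumptions made for illustration, since the caption does not say which bootstrap variant was used.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample mean.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    # Take the empirical quantiles that bracket the central `level` mass.
    lo_idx = max(0, int((1 - level) / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot) - 1)
    return means[lo_idx], means[hi_idx]
```

Here values would hold the per-window indel diversities observed at one distance from the promoter center, and the returned pair would give the ends of the vertical bar at that distance.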
Figure 2. Variation in nucleotide diversity over the core genome. Nucleotide diversity is shown for each cross in 500-bp half-overlapping windows across the core genome (which excludes hypervariable regions containing var, rif, or stevor genes) using SNPs combined from both variant calling methods and passing all quality filters. The peak of nucleotide diversity on Chromosome 10 is expanded to show four distinct peaks due to genes encoding merozoite surface antigens MSP3, MSP6, DBLMSP, and DBLMSP2. All labeled loci (with the exception of AMA1) are sites of complex variation where assembly of sequence reads is required to determine the nonreference alleles.
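The half-overlapping windowing described in the caption can be sketched as below: each 500-bp window starts 250 bp after the previous one. The sketch assumes that diversity within a cross is summarized as the number of SNPs segregating between the two parents per base of window; that simplification, the half-open coordinates, and the function name are illustrative choices rather than the study's exact definition.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def windowed_diversity(
    snp_positions: Sequence[int],
    region_start: int,
    region_stop: int,
    size: int = 500,
) -> List[Tuple[int, int, float]]:
    """SNPs per base in half-overlapping windows over [region_start, region_stop)."""
    step = size // 2  # half-overlapping: consecutive windows share half their span
    positions = sorted(snp_positions)
    windows = []
    start = region_start
    while start + size <= region_stop:
        stop = start + size
        # Count SNPs falling in the half-open interval [start, stop).
        n = bisect_left(positions, stop) - bisect_left(positions, start)
        windows.append((start, stop, n / size))
        start += step
    return windows
```

Because adjacent windows overlap by half, a SNP away from the region edges contributes to two windows, which smooths the diversity profile along the chromosome.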
Figure 3. Crossover (CO) and non-crossover (NCO) recombination parameters. (A) Genetic map length by cross. For each cross, the red line shows the median map length averaged over progeny; boxes extend from lower to upper quartiles. (B) Map length by chromosome. Each point shows the mean map length for a single chromosome averaged over progeny, with an error bar showing the 95% confidence interval from 1000 bootstraps. The line shows a fitted linear regression model with shading showing the 95% bootstrap confidence interval. (C) CO recombination rate relative to centromere position as given by the genome annotation. Error bars show the 95% confidence interval from 1000 bootstraps. (D) NCO tract length distribution. The dashed line shows the distribution of minimal tract lengths that would be observed with the available markers if NCO tract lengths follow a geometric distribution with parameter φ = 0.9993. (E) Quantile-quantile plot of actual NCO minimal tract lengths versus the expected distribution of minimal tract lengths that would be observed with the given markers if NCO tract length is modeled as a geometric distribution with parameter φ = 0.9993. The data fit the model well except for an excess of tracts with minimal length greater than ∼3 kb. (F) NCO frequency by chromosome, adjusted for incomplete discovery of NCO events. Error bars and linear regression as in B.
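The relationship between true and minimal tract lengths in panels D and E can be illustrated with a short sketch. It assumes the geometric model is parameterized so that φ is the per-base probability that a tract extends, giving P(L = n) = φ^(n-1)(1 - φ) and a mean of 1/(1 - φ), roughly 1.4 kb for φ = 0.9993. That parameterization, the convention that a tract covering a single marker has minimal length 1, and both function names are assumptions, not details confirmed by the caption.

```python
import math
import random
from bisect import bisect_left
from typing import Optional, Sequence


def sample_tract_length(phi: float, rng: random.Random) -> int:
    """Draw a tract length from a geometric distribution on 1, 2, 3, ...

    Assumed parameterization: P(L = n) = phi**(n - 1) * (1 - phi).
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(phi)) + 1


def minimal_tract_length(
    start: int, length: int, markers: Sequence[int]
) -> Optional[int]:
    """Observable span of a tract covering [start, start + length).

    markers must be sorted. The minimal length is the distance between the
    first and last markers inside the tract; None means no marker was
    converted, so the event would go undetected.
    """
    i = bisect_left(markers, start)
    j = bisect_left(markers, start + length)
    if j == i:
        return None
    return markers[j - 1] - markers[i] + 1
```

Drawing many tracts, placing each at a random start, and collecting the non-None minimal lengths would give an expected distribution of the kind shown by the dashed line, which is always shifted toward shorter lengths than the true tracts because the flanks beyond the outermost markers are invisible.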
Figure 4. Copy number variation and recombination spanning the anti-folate resistance gene gch1 on Chromosome 12. (A) CNVs in the 3D7 and HB3(1) parental clones; α labels the segment amplified in HB3, β labels the segment amplified in 3D7. (B) CNV and recombination in clone C06, progeny of 3D7 × HB3. AB = fraction of aligned reads containing the first parent's allele. (C) CNV and recombination in clone C05, progeny of 3D7 × HB3. AB = fraction of aligned reads containing the first parent's allele. (D) CNVs in the HB3(2) and Dd2 parental clones; γ labels the segment amplified in Dd2. Note that the HB3(2) clone sequenced here appears to be a mixture, with a minor proportion of parasites carrying the amplification visible in HB3(1). (E) CNV and recombination in clone CH3_61, progeny of HB3 × Dd2. AB = fraction of aligned reads containing the first parent's allele. (F) CNVs in the 7G8 and GB4 parental clones. CN = copy number; markers show normalized read counts within 300-bp nonoverlapping windows, excluding windows where GC content was below 20%; solid black line is the copy number predicted by fitting a Gaussian hidden Markov model to the coverage data (Supplemental Information). DP = depth of coverage (number of aligned reads), FA = reads aligned facing away from each other (expected at boundaries of a tandem array), SS = reads aligned in the same orientation (expected at boundaries of a tandem inversion).
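The normalized read counts plotted as markers in the CN tracks can be approximated with the sketch below. It drops windows with GC content below 20%, as the caption states, and then divides each remaining count by the median count; normalizing by the median is an assumption made here so that single-copy sequence sits near 1, and the function name is hypothetical. The Gaussian hidden Markov model step is not reproduced.

```python
from statistics import median
from typing import List, Sequence, Tuple


def normalized_copy_number(
    read_counts: Sequence[int],
    gc_fraction: Sequence[float],
    min_gc: float = 0.20,
) -> List[Tuple[int, float]]:
    """Return (window index, normalized count) for windows passing the GC filter.

    read_counts[i] and gc_fraction[i] describe the i-th nonoverlapping window.
    """
    # Exclude windows whose GC content is below the threshold.
    kept = [
        (i, count)
        for i, (count, gc) in enumerate(zip(read_counts, gc_fraction))
        if gc >= min_gc
    ]
    if not kept:
        raise ValueError("no windows pass the GC filter")
    # Assumed baseline: the median count represents single-copy coverage.
    baseline = median(count for _, count in kept)
    if baseline == 0:
        raise ValueError("median read count is zero")
    return [(i, count / baseline) for i, count in kept]
```

In a tandem amplification such as those labeled α, β, and γ, windows inside the amplified segment would show normalized values near the integer copy number, while flanking windows stay near 1.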