Estimation of the spontaneous mutation rate in Heliconius melpomene.

Peter D Keightley, Ana Pinharanda, Rob W Ness, Fraser Simpson, Kanchon K Dasmahapatra, James Mallet, John W Davey, Chris D Jiggins.

Abstract

We estimated the spontaneous mutation rate in Heliconius melpomene by genome sequencing of a pair of parents and 30 of their offspring, based on the ratio of number of de novo heterozygotes to the number of callable site-individuals. We detected nine new mutations, each one affecting a single site in a single offspring. This yields an estimated mutation rate of 2.9 × 10⁻⁹ (95% confidence interval, 1.3 × 10⁻⁹ to 5.5 × 10⁻⁹), which is similar to recent estimates in Drosophila melanogaster, the only other insect species in which the mutation rate has been directly estimated. We infer that recent effective population size of H. melpomene is about 2 million, a substantially lower value than its census size, suggesting a role for natural selection reducing diversity. We estimate that H. melpomene diverged from its Müllerian comimic H. erato about 6 Ma, a somewhat later date than estimates based on a local molecular clock.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Heliconius; genome sequencing; mutation

Year:  2014        PMID: 25371432      PMCID: PMC4271535          DOI: 10.1093/molbev/msu302

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Understanding the process of spontaneous mutation is central for many of the most important questions in evolutionary genetics. The neutral nucleotide diversity expected within a species (θ) is proportional to the product of the spontaneous mutation rate per nucleotide site (μ) and the effective population size (Ne). Variation in the mutation rate is therefore expected to contribute to the large range of variation in neutral nucleotide diversity that has been observed in natural populations (Leffler et al. 2012). Conversely, if nucleotide diversity for putatively neutral sites of a population has been estimated, and the mutation rate is known, it is possible to estimate Ne. Effective population size is an important factor determining the effectiveness of natural selection, and selection effects on diversity at linked sites could limit diversity in the genome (Charlesworth B and Charlesworth D 2010). Species split times can be estimated using between-species neutral nucleotide divergence, which is also expected to be proportional to the mutation rate. This can be useful if fossil evidence-based dates of species divergence are not available. Estimates of the mutation rate for a range of species across the tree of life are therefore needed in order to better understand patterns of diversity in relation to Ne and the influence of natural selection on variation. However, at present, only a handful of direct mutation rate estimates are available, for a small number of model species. Mutation rate estimation has until recently depended on assaying rates of mutation at specific loci or on the between-species nucleotide divergence at putatively neutral sites, such as synonymous sites (Drake et al. 1998). There are, however, drawbacks to these approaches, including uncertainty about species divergence dates and nonneutral synonymous site evolution (Chamary et al. 2006). 
This has led to efforts to directly estimate the mutation rate by sequencing mutation accumulation (MA) lines or outbred parents and their offspring. The MA line approach suffers from potential difficulties, however. For example, recessive mutator alleles might become fixed by inbreeding in the MA line progenitor. Furthermore, its practical applicability is limited, because inbred lines cannot be produced for most species. Sequencing parents and their offspring and searching for de novo mutations in the offspring are more generally applicable, but to date experiments have only been carried out in humans (Roach et al. 2010; Conrad et al. 2011; Kong et al. 2012; Michaelson et al. 2012) and Drosophila melanogaster (Keightley et al. 2014). Both humans and D. melanogaster have “finished” genome sequences, but the genomes of most sequenced species are incomplete or “draft” and it is unknown whether parent–offspring sequencing can be applied in such cases. Widely used sequencing technologies produce short reads and spurious base calls arise due to the mismapping of paralogs. Here, we apply parent–offspring genome sequencing to a tropical butterfly species, Heliconius melpomene, whose genome sequence is currently draft (Heliconius Genome Consortium 2012). Heliconius melpomene has become a focal organism for genome-based studies of speciation and hybridization (Martin et al. 2013), and an accurate estimate of the mutation rate will have several immediate applications. We use deep sequencing of parents and 30 offspring to produce an estimate of the mutation rate that is close to one recently obtained by similar means in D. melanogaster (Keightley et al. 2014).

Results

Among the 30 offspring, we sequenced 13 focal offspring at a high depth (mean = 26.3, SD = 7.1; supplementary table S1, Supplementary Material online) and 17 “bait offspring” at a lower depth (mean = 12.7, SD = 3.9). Bait offspring were used to remove regions prone to alignment errors that generate false positives by excluding sites at which any of these individuals had an alternate base call. Mutations were not called in the bait offspring nor did they contribute to the number of site-individuals. In 4,309 scaffolds of the draft genome assembly, there are 2.70 × 10⁸ sites, of which 1.23 × 10⁸ (46%) were estimated to be callable, yielding 1.60 × 10⁹ site-individuals. We used the Genome Analysis Toolkit (GATK; DePristo et al. 2011) to call mutations. We assume that reads having the alternate allele at a site present in multiple individuals are due to mismapping. This arises when a paralogous locus present in the sample, but not in the reference genome, contributes reads that map to the wrong place (Li 2011). We assume that such mismapping is equally likely to occur at mutated and unmutated sites. We applied the mutation calling rules described in Materials and Methods to the GATK genotype calls, yielding 15 candidate mutations appearing as de novo heterozygotes in up to two focal offspring (supplementary table S2, Supplementary Material online). We first examined each candidate using the Integrative Genomics Viewer (Thorvaldsdóttir et al. 2012) to determine whether there are single nucleotide polymorphisms (SNPs) in complete association with the alternate base calls at the candidate sites, a characteristic of mismapped paralogs (Li 2011; Keightley et al. 2014). Two closely linked candidates 17 bp apart on contig HE668478 (individual 110; supplementary fig. S1, Supplementary Material online) and candidates HE671028 and HE669561 (individual 110; supplementary figs. S2 and S3, Supplementary Material online) met this criterion. 
Furthermore, in each case reads containing the alternate allele are truncated and have unmapped mate pairs. We judged these four candidates as likely false positives caused by mismapping. We then attempted to verify the 11 remaining candidates by Sanger sequencing. Ten gave clearly interpretable chromatograms, confirming that eight are genuine mutations, and that candidates HE671439 and HE672001 are false positives (supplementary table S1, Supplementary Material online). Further attempts at sequencing the remaining candidate (HE671010) were unsuccessful, but mutant-bearing reads are well aligned and have aligned mate pairs (supplementary fig. S4, Supplementary Material online), suggesting that it is genuine. Thus, there are nine apparently genuine de novo mutations, eight confirmed by Sanger sequencing (table 1), each one affecting a single site and present in a single focal individual.
Table 1.

Mutations Called and Depth of Sequencing Coverage Statistics for the Wild Type (WT) and Mutant (Mut) Bases in the Mutant Focal Offspring Along with Average Read Depth in the Parents and Focal Offspring.

Columns: Contig; Position; Individual; Base Call (WT, Mut); Depth (WT, Mut); Mean Depth (Parents, Offspring)
HE67127080778118AT51533.526.8
HE6703347159033TC13113328.7
HE67011816036103GA11141725.0
HE6708551085837AG251130.532.7
HE6688341893301GA26235835.6
HE6713841878684TA291740.530.8
HE672075836004118GA4121715.6
HE679870446333GC24135334.2
HE67101095914TA212124.522.0
Among the nine de novo mutations, the number of transitions exceeds the number of transversions, as is usually observed in eukaryotes. The number of mutations divided by twice the number of callable site-individuals yields an estimated mutation rate (uncorrected for false negatives) of 2.8 × 10⁻⁹ (95% confidence interval = 1.3 × 10⁻⁹ to 5.3 × 10⁻⁹, assuming that the number of mutations is Poisson distributed). To estimate the frequency of false negatives, we simulated synthetic mutations by modifying sequencing reads for randomly selected sites in the focal offspring. We realigned and analyzed the modified data using the same procedures as for the real data. Among 1,000 synthetic mutations, 456 occurred at callable sites where all other focal offspring, both parents and all bait offspring were pure. Of the callable mutations, 436 (96%) were called. The small proportion of uncalled mutations presumably reflects mutant-bearing reads mapping less frequently than reference reads (fig. 1). A corrected estimate for the mutation rate is therefore 2.9 × 10⁻⁹ (95% confidence interval = 1.3 × 10⁻⁹ to 5.5 × 10⁻⁹).
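The arithmetic behind these estimates can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an exact (Garwood-type) Poisson interval, which the text does not specify beyond stating that the count is Poisson distributed, and the function names are hypothetical. The inputs (9 mutations, 1.60 × 10⁹ site-individuals, 436 of 456 synthetic mutations called) are taken from the text.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def solve_lambda(k, target, hi=1000.0):
    """Bisection for lam with poisson_cdf(k, lam) == target (the CDF decreases in lam)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(count, alpha=0.05):
    """Exact two-sided interval for a Poisson mean (an assumed choice of method)."""
    lower = 0.0 if count == 0 else solve_lambda(count - 1, 1 - alpha / 2)
    upper = solve_lambda(count, alpha / 2)
    return lower, upper

n_mutations = 9
site_individuals = 1.60e9
denominator = 2 * site_individuals       # diploid: two copies per site-individual

mu_uncorrected = n_mutations / denominator
count_lo, count_hi = exact_poisson_interval(n_mutations)
mu_lo, mu_hi = count_lo / denominator, count_hi / denominator

recall = 436 / 456                       # fraction of callable synthetic mutations called
mu_corrected = mu_uncorrected / recall

print(f"uncorrected: {mu_uncorrected:.2e} ({mu_lo:.2e} to {mu_hi:.2e})")
print(f"corrected:   {mu_corrected:.2e}")
```

Under these assumptions the uncorrected point estimate and interval round to the values reported above, and dividing the point estimate by the recall gives the corrected rate.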
Fig. 1.

Examples of observed frequency distributions (red or gray) and binomial distributions with parameter 0.5 (blue or black) of alternate (i.e., nonreference) base number at heterozygous sites in the focal offspring having (A) 20 reads and (B) 40 reads.

Discussion

We estimated the mutation rate per base pair by genome sequencing of parents and offspring in H. melpomene. The incomplete state of the genome causes difficulties in identification of de novo mutations, because paralogous reads map more frequently to the wrong location, often yielding false heterozygote calls. Disregarding impure sites affecting any bait offspring and more than two focal offspring effectively addressed this problem. It has been estimated that approximately 20% of spontaneous recessive sex-linked lethal mutations in D. melanogaster males occur as premeiotic clusters (Woodruff and Thompson 1992). In the present experiment, we detected no mutation clusters affecting two focal offspring, but in view of the small number of mutation events detected, the sequences of many more individuals will be needed to accurately estimate the rate of premeiotic cluster mutations in H. melpomene. The draft state of the genome precluded the detection of large-scale de novo variants, such as rearrangements and duplications, which are particularly sensitive to mismapping. Autosomal nucleotide diversity (π) at 4-fold degenerate sites in H. melpomene is approximately 2.4% (Martin SH, unpublished data). Assuming neutral synonymous sites evolution, and equating π to 4Neμ, Ne for the species is therefore approximately 2 million. This will be an underestimate if selection reduces diversity at 4-fold sites. However, estimates of Ne for D. melanogaster based on this approach are of similar magnitude (Keightley et al. 2014), but they are orders of magnitude smaller than both species’ census population sizes. Similar diversities and effective population sizes are consistent with the small range of genetic diversity across eukaryotes (Leffler et al. 2012), suggesting a role for processes such as genetic draft limiting diversity (Maynard-Smith and Haigh 1974; Gillespie 2001; Leffler et al. 2012). 
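The effective population size calculation above is a one-line rearrangement of π = 4Neμ. A minimal sketch, using the values quoted in the text (the function name is hypothetical):

```python
def effective_population_size(pi, mu):
    """Solve pi = 4 * Ne * mu for Ne, assuming neutral diploid sites."""
    return pi / (4 * mu)

pi_4fold = 0.024   # autosomal diversity at 4-fold degenerate sites (2.4%)
mu = 2.9e-9        # corrected mutation rate per site per generation

ne = effective_population_size(pi_4fold, mu)
print(f"Ne is roughly {ne:.2e}")
```

With these inputs the result is a little over 2 million, in line with the figure given above.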
Estimates of μ can also be used to date species divergences, assuming that neutral nucleotide divergence d = 2μt, where t is the divergence time in generations. For example, synonymous divergence between H. melpomene and its Müllerian comimic H. erato corrected for diversity within H. melpomene is 14% (Martin SH, unpublished data), yielding t = 23 million generations, which will be an underestimate if selection reduces synonymous site divergence. Assuming four generations per year, the divergence date is approximately 6 Ma, which is somewhat more recent than that estimated from a fossil-calibrated phylogeny of 10–13 Ma (Kozak et al. 2014). Although our data suggest that current estimates of the age of the Heliconius radiation are approximately correct, further work will be required to reconcile these estimates. It remains to be seen whether the hypothesis that the early radiation of Heliconius coincided with a time of rapid uplift in the Andes about 10 Ma is supported. This is only the second direct estimate of the mutation rate per base pair in insects, and the first in Lepidoptera. There have been several direct estimates in D. melanogaster, by Denaturing High Performance Liquid Chromatography (Haag-Liautard et al. 2007), by whole-genome sequencing of MA lines (Keightley et al. 2009; Schrider et al. 2013), and most recently by parent–offspring sequencing (Keightley et al. 2014). There is significant variation among these estimates, but most are close to 3 × 10⁻⁹, which is remarkably close to our estimate of 2.9 × 10⁻⁹ for Heliconius. We have demonstrated that it is possible to estimate the mutation rate by offspring–parent genome sequencing for the case of a draft genome sequence. It should soon be possible to address the question of whether this lack of variation in the mutation rate extends to other arthropod groups whose draft genome sequences are now becoming available.
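The divergence dating at the start of this paragraph follows from d = 2μt. A minimal sketch with the rounded values quoted in the text (function name hypothetical); because the inputs are rounded, the generation count it produces is close to, but not exactly, the 23 million reported.

```python
def divergence_time_generations(d, mu):
    """Solve d = 2 * mu * t for t, the divergence time in generations."""
    return d / (2 * mu)

d_syn = 0.14              # synonymous divergence, corrected for within-species diversity
mu = 2.9e-9               # mutation rate per site per generation
generations_per_year = 4  # assumption stated in the text

t_generations = divergence_time_generations(d_syn, mu)
t_years = t_generations / generations_per_year

print(f"t is roughly {t_generations / 1e6:.0f} million generations")
print(f"divergence is roughly {t_years / 1e6:.0f} Ma")
```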

Materials and Methods

Cross Sequencing

The cross was previously used to produce chromosomal scaffolds of the H. melpomene genome (Heliconius Genome Consortium 2012, supplementary material section S4). After four generations of inbreeding, a male H. melpomene melpomene from the same lineage as for the H. melpomene reference genome was crossed with a female H. melpomene rosina from a laboratory strain established from Gamboa, Panama. DNA from two F1 parents and 30 of their F2 offspring was extracted using the DNeasy Blood and Tissue Kit (Qiagen). Illumina TruSeq libraries (300 bp insert size) were sequenced using 100-bp paired-end reads on an Illumina HiSeq2500.

Alignment to Reference Genome

Reads for parents and offspring were aligned to version 1.1 of the H. melpomene genome (available on Ensembl and from http://butterflygenome.org, last accessed November 5, 2014) using SMALT version 0.7.0.1 with default options. Output sequence alignment/map (SAM) files were converted to binary format (BAM) files, sorted and annotated with read groups using Picard version 1.84 (http://picard.sourceforge.net/, last accessed November 5, 2014).

Genotype Assignment

Each individual’s BAM file was processed to remove duplicates using Picard tools, then to realign indels using GATK. SNPs were called using the GATK UnifiedGenotyper across all individuals simultaneously to produce a variant call format (VCF) file, assuming a heterozygosity parameter of 0.01. For high read depth, genotype calls are insensitive to this parameter (DePristo et al. 2011; Ness et al. 2012).

Mutation Calling

We processed the VCF by an algorithm similar to that described previously (Keightley et al. 2014), filtering sites as follows:

1.  Not marked as low quality (GATK LowQual).
2.  Read depth of both parents ≥10, both homozygous reference, containing no alternate allele reads.
3.  None of the 17 bait offspring contains alternate allele reads.
4.  The genotypes of all 13 focal offspring are defined (i.e., are called by GATK).
5.  At most two focal offspring are called as heterozygous by GATK.
6.  No other focal offspring contains alternate allele reads.

Our method excludes sites containing alternate alleles in either parent, which precludes the identification of mutations at polymorphic sites. Assuming that mutations are not more frequent at polymorphic sites, this should reduce the number of mutations and callable sites proportionally. There was no filtering carried out on read depth of the focal or bait offspring. Heterozygotes called among the focal offspring were marked as candidate mutations.
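The site filters listed in this section can be sketched as a single predicate. This is an illustration only: the `Call` and `Site` records and the function name are hypothetical stand-ins for parsed VCF fields, not part of GATK or of the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Call:
    genotype: Optional[str]  # "0/0", "0/1", "1/1", or None when undefined
    depth: int               # total reads covering the site
    alt_reads: int           # reads carrying the alternate allele

@dataclass
class Site:
    low_qual: bool           # True when the site carries the LowQual flag
    parents: List[Call]
    bait: List[Call]
    focal: Dict[str, Call]   # focal offspring keyed by individual name

def candidate_mutations(site, min_parent_depth=10, max_hets=2):
    """Return the focal individuals called heterozygous, or [] if the site is filtered."""
    if site.low_qual:
        return []
    for parent in site.parents:
        if (parent.depth < min_parent_depth
                or parent.genotype != "0/0"
                or parent.alt_reads > 0):
            return []
    if any(b.alt_reads > 0 for b in site.bait):
        return []
    if any(c.genotype is None for c in site.focal.values()):
        return []
    hets = [name for name, c in site.focal.items() if c.genotype == "0/1"]
    if not hets or len(hets) > max_hets:
        return []
    # Every focal offspring not called heterozygous must carry no alternate reads.
    if any(c.alt_reads > 0 for name, c in site.focal.items() if name not in hets):
        return []
    return hets

# Toy site: 13 focal offspring, one of them a de novo heterozygote.
ref_call = Call("0/0", 25, 0)
focal = {f"f{i}": Call("0/0", 25, 0) for i in range(13)}
focal["f3"] = Call("0/1", 26, 12)
site = Site(False, [ref_call, ref_call], [Call("0/0", 12, 0)] * 17, focal)
print(candidate_mutations(site))
```

Returning the list of heterozygous individuals rather than a boolean keeps the "at most two focal offspring" case visible to downstream inspection.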

Synthetic Mutations

We estimated the proportion of false negatives (genuine mutations we failed to call) by simulating mutations in the Heliconius data, running our pipeline, and calculating the fraction of simulated mutations called. Synthetic mutations were simulated by modifying the reads overlapping a random site in a focal offspring. We sampled the number of reads to be altered from empirical distributions of numbers of nonreference base calls at heterozygous sites (see below). For each synthetic mutation, we randomly sampled a genomic position b and a focal offspring. We sampled a random integer y from the frequency distribution of nonreference base number for the individual’s read depth (e.g., see fig. 1). We then changed y reference bases to a different randomly selected base by modifying reads overlapping position b in the individual’s BAM file. We generated 1,000 synthetic mutations in the BAM files of focal individuals, extracted all reads from the modified BAM files, and aligned the modified reads to the reference genome by the procedure used for the original data. We then applied the mutation-calling algorithm, exactly as described above, with the exception that filters were not applied to the focal offspring carrying the synthetic mutation. The fraction of callable simulated mutated sites estimates the fraction of callable sites in the genome. Uncallable sites will include, for example, sites of low mapping quality and sites where genotypes are undefined in one or more focal offspring.
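The core sampling step of the simulation can be sketched as below. This is a simplified illustration under stated assumptions: reads at the chosen position are represented as a plain list of bases rather than BAM records, and the function names and the toy frequency table are hypothetical.

```python
import random

def sample_alt_read_count(depth, alt_count_freqs, rng):
    """Draw y from the empirical distribution of nonreference read number at this depth."""
    freqs = alt_count_freqs[depth]          # mapping: alt read count -> frequency
    counts = list(freqs)
    weights = [freqs[c] for c in counts]
    return rng.choices(counts, weights=weights, k=1)[0]

def simulate_synthetic_mutation(bases, ref_base, alt_count_freqs, rng):
    """Change y reference bases at one position to a randomly chosen different base."""
    depth = len(bases)
    y = sample_alt_read_count(depth, alt_count_freqs, rng)
    alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
    ref_positions = [i for i, b in enumerate(bases) if b == ref_base]
    y = min(y, len(ref_positions))          # cannot alter more reads than carry the reference
    mutated = list(bases)
    for i in rng.sample(ref_positions, y):
        mutated[i] = alt_base
    return mutated, alt_base, y

rng = random.Random(1)
toy_freqs = {20: {8: 1, 10: 2, 12: 1}}      # toy table for read depth 20 only
mutated, alt_base, y = simulate_synthetic_mutation(["A"] * 20, "A", toy_freqs, rng)
print(alt_base, y, "".join(mutated))
```

Sampling y from the empirical table, rather than from a binomial with parameter 0.5, lets the synthetic data reproduce any mapping bias against nonreference reads.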

Frequency Distributions of Nonreference Read Number in Heterozygotes

To produce the distributions used to generate synthetic mutations, we identified a set of sites heterozygous for natural polymorphisms, regardless of the genotypes called from the sequencing data, taking advantage of the lack of recombination in Heliconius females. F2 offspring receive whole chromosomes from the F1 mother, so SNPs from the same chromosome have identical segregation patterns in the offspring, and segregation patterns for each of the 21 H. melpomene chromosomes for this cross are known (Heliconius Genome Consortium 2012). We identified SNPs from each chromosome by compiling sites called as heterozygous in the F1 mother, homozygous in the F1 father, and matching one of the chromosome segregation patterns for the bait offspring. Heterozygous focal offspring could then be identified based on segregation pattern, without reference to their sequenced genotype. We used these heterozygous focal offspring to tabulate frequency distributions of numbers of nonreference base calls for read depths 1, … 100 (see fig. 1).
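The tabulation step can be sketched as follows, together with the binomial expectation used for comparison in fig. 1. The input format, a list of (read depth, nonreference read count) pairs at sites known to be heterozygous, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import comb

def tabulate_alt_counts(observations, max_depth=100):
    """Count nonreference read numbers per read depth for depths 1..max_depth."""
    tables = defaultdict(Counter)
    for depth, alt_reads in observations:
        if 1 <= depth <= max_depth:
            tables[depth][alt_reads] += 1
    return tables

def to_frequencies(counter):
    """Normalize a Counter of alt-read counts into relative frequencies."""
    total = sum(counter.values())
    return {alt: n / total for alt, n in sorted(counter.items())}

def binomial_pmf(n, k, p=0.5):
    """Probability of k nonreference reads out of n under an unbiased heterozygote."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Toy observations; the last one exceeds max_depth and is ignored.
observations = [(20, 10), (20, 9), (20, 10), (40, 18), (150, 70)]
tables = tabulate_alt_counts(observations)
freqs_20 = to_frequencies(tables[20])
for alt, freq in freqs_20.items():
    print(alt, round(freq, 3), round(binomial_pmf(20, alt), 3))
```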

Sanger Sequencing

With the exception of four candidates ruled out by inspection (see Results), we checked all candidates by Sanger sequencing. We sequenced the focal individual and one control individual on both strands. If the initial sequencing failed, an alternative primer pair was tried.

Data Accessibility

Whole-genome sequence data for the parents and the 30 offspring from the mapping cross used for this study are available from the European Nucleotide Archive, study accession PRJEB7581.

Supplementary Material

Supplementary figures S1–S4 and tables S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
References

1.  Is the population size of a species relevant to its evolution?

Authors:  J H Gillespie
Journal:  Evolution       Date:  2001-11-11       Impact factor: 3.694

2.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-09-08       Impact factor: 6.937

3.  Whole-genome sequencing in autism identifies hot spots for de novo germline mutation.

Authors:  Jacob J Michaelson; Yujian Shi; Madhusudan Gujral; Hancheng Zheng; Dheeraj Malhotra; Xin Jin; Minghan Jian; Guangming Liu; Douglas Greer; Abhishek Bhandari; Wenting Wu; Roser Corominas; Aine Peoples; Amnon Koren; Athurva Gore; Shuli Kang; Guan Ning Lin; Jasper Estabillo; Therese Gadomski; Balvindar Singh; Kun Zhang; Natacha Akshoomoff; Christina Corsello; Steven McCarroll; Lilia M Iakoucheva; Yingrui Li; Jun Wang; Jonathan Sebat
Journal:  Cell       Date:  2012-12-21       Impact factor: 41.582

4.  Variation in genome-wide mutation rates within and between human families.

Authors:  Donald F Conrad; Jonathan E M Keebler; Mark A DePristo; Sarah J Lindsay; Yujun Zhang; Ferran Casals; Youssef Idaghdour; Chris L Hartl; Carlos Torroja; Kiran V Garimella; Martine Zilversmit; Reed Cartwright; Guy A Rouleau; Mark Daly; Eric A Stone; Matthew E Hurles; Philip Awadalla
Journal:  Nat Genet       Date:  2011-06-12       Impact factor: 38.330

5.  Analysis of genetic inheritance in a family quartet by whole-genome sequencing.

Authors:  Jared C Roach; Gustavo Glusman; Arian F A Smit; Chad D Huff; Robert Hubley; Paul T Shannon; Lee Rowen; Krishna P Pant; Nathan Goodman; Michael Bamshad; Jay Shendure; Radoje Drmanac; Lynn B Jorde; Leroy Hood; David J Galas
Journal:  Science       Date:  2010-03-10       Impact factor: 47.728

6.  Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii.

Authors:  Rob W Ness; Andrew D Morgan; Nick Colegrave; Peter D Keightley
Journal:  Genetics       Date:  2012-10-10       Impact factor: 4.562

7.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

8.  Multilocus species trees show the recent adaptive radiation of the mimetic heliconius butterflies.

Authors:  Krzysztof M Kozak; Niklas Wahlberg; Andrew F E Neild; Kanchon K Dasmahapatra; James Mallet; Chris D Jiggins
Journal:  Syst Biol       Date:  2015-01-28       Impact factor: 15.683

9.  Revisiting an old riddle: what determines genetic diversity levels within species?

Authors:  Ellen M Leffler; Kevin Bullaughey; Daniel R Matute; Wynn K Meyer; Laure Ségurel; Aarti Venkat; Peter Andolfatto; Molly Przeworski
Journal:  PLoS Biol       Date:  2012-09-11       Impact factor: 8.029

10.  Rate of de novo mutations and the importance of father's age to disease risk.

Authors:  Augustine Kong; Michael L Frigge; Gisli Masson; Soren Besenbacher; Patrick Sulem; Gisli Magnusson; Sigurjon A Gudjonsson; Asgeir Sigurdsson; Aslaug Jonasdottir; Adalbjorg Jonasdottir; Wendy S W Wong; Gunnar Sigurdsson; G Bragi Walters; Stacy Steinberg; Hannes Helgason; Gudmar Thorleifsson; Daniel F Gudbjartsson; Agnar Helgason; Olafur Th Magnusson; Unnur Thorsteinsdottir; Kari Stefansson
Journal:  Nature       Date:  2012-08-23       Impact factor: 49.962
