Mutation patterns in the human genome: more variable than expected.

Year: 2009 PMID: 19192948 PMCID: PMC2634789 DOI: 10.1371/journal.pbio.1000028

Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029

The development, survival, and reproduction of an organism depend on the genetic information that is carried in its genome, yet the transmission of genetic information is not perfectly accurate: new mutations occur at each generation. These mutations are the primary cause of the genetic diversity on which natural selection can operate, and hence are the sine qua non of evolution. A better knowledge of mutation processes is crucial for investigating the causes of genetic diseases or cancer and for understanding evolutionary processes. This knowledge is also important for several practical reasons. First, comparative sequence analysis is widely used to find functional elements within genomes. The basic principle of this approach is that functional elements are affected by natural selection, and hence can be recognized because they evolve either slower or faster than expected given the local mutation rate. Hence, to be able to annotate genomic sequences, it is necessary to have a good knowledge of the underlying pattern of mutation. Moreover, this knowledge is also essential for ensuring the accuracy of the methods that analyze sequence divergence to determine the phylogeny of species or the demographic history of populations. Finally, the study of mutational processes also provides valuable information about genome function in processes such as replication, repair, transcription, and recombination.

During the last few years, several important factors affecting mutation rates have been uncovered. However, a paper in this issue of PLoS Biology [1] reveals an unexpected additional layer of complexity in the determinants of mutation rates.

A priori, nucleotide mutation rates are expected to depend upon three factors [2]: (i) the intrinsic stability of nucleotides and their sensitivity to mutagenic agents; (ii) the fidelity of DNA replication; and (iii) the efficiency of the DNA repair machinery. The analysis of variations in mutation rate across genomes can shed light on the relative contribution of these different factors and on the genomic features that affect mutation rates. In mammals, current knowledge of mutation processes derives essentially from the analysis of a limited number of germ-line mutations responsible for human genetic diseases [3] and from phylogenetic studies. This latter approach consists of comparing homologous sequences (between species or within populations) to estimate the number and kinds of changes that occurred since their divergence. At neutral sites, i.e., sites where the impact of natural selection is presumed to be null or very limited (pseudogenes, defective transposable elements, noncoding sequences, synonymous codon positions), the substitution rate is expected to be equal to the mutation rate [4]. This approach suffers from several limitations (see below), but thanks to the accumulation of genome sequences and polymorphism data, it has provided indirect estimates of genome-wide mutation patterns.
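The expectation that the neutral substitution rate equals the mutation rate can be recovered from a standard population-genetic argument. The sketch below assumes a diploid population of constant size; the symbols N (population size), μ (mutation rate per site per generation), and k (substitution rate per site per generation) are introduced here for illustration and do not appear in the text above.

```latex
k \;=\;
\underbrace{2N\mu}_{\text{new mutations per site per generation}}
\times
\underbrace{\frac{1}{2N}}_{\text{fixation probability of a neutral mutation}}
\;=\; \mu
```

Under these assumptions the population size cancels out, which is why substitutions at neutral sites can serve as an indirect readout of mutation, as long as no other process biases fixation.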

Large-Scale Variations in the Rate and Patterns of Neutral Sequence Evolution

Phylogenetic analyses show that in mammals, neutral rates of sequence evolution (measured in number of base changes per site and per year) vary at different scales. First, substitution rates vary between species. Notably, species with short generation time generally evolve faster, presumably because they experience more rounds of germ-cell divisions (and hence more DNA replication errors) during a given unit of time [5]. If most mutations are due to DNA replication errors, then mutation rates are expected to be higher in males than in females, owing to the greater number of cell divisions per generation in the male germ-line. In agreement with that prediction, in apes, substitution rate is two times higher on the Y chromosome than on the X, whereas autosomes show intermediate values (the three classes of chromosomes spend on average—over generations—respectively 100%, 33%, and 50% of their time in the male germ-line) [6,7]. The strength of this male mutation bias in different mammalian species appears to be correlated to the number of male germ-cell divisions [8]. Within autosomes, there are substantial variations in neutral substitution rates at the megabase scale [7,9-11], but it is not clear to what extent these variations reflect mutational processes or are only the consequence of biased gene conversion (i.e., the biased repair of mismatches occurring in heteroduplex DNA during meiotic recombination), a neutral process that affects the probability of fixation of mutations [12]. Patterns of neutral substitution vary also at the gene scale. Notably, mammalian genomes show an excess of A→G transitions over T→C transitions, specifically in transcribed regions [13]. This may be a consequence of transcription-coupled repair [13]. Finally, it is well known that some short sequence motifs (such as minisatellite or microsatellite repeats, typically less than 100 bp long) are prone to DNA replication errors [14].
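The reasoning behind the X, Y, and autosome comparison can be made explicit with a small calculation. The following is a minimal sketch, assuming that substitution rates reflect only the time each chromosome class spends in the male and female germ-lines; the function names and the parameter alpha (the male-to-female mutation rate ratio) are illustrative and not taken from the studies cited above.

```python
# Fraction of time each chromosome class spends in the male germ-line,
# as stated in the text: Y = 100%, X = 33%, autosomes = 50%.
MALE_FRACTION = {"Y": 1.0, "X": 1.0 / 3.0, "autosome": 0.5}


def expected_rate(alpha, chrom, female_rate=1.0):
    """Expected substitution rate of a chromosome class, relative to the
    female germ-line rate, when the male rate is alpha times higher."""
    f = MALE_FRACTION[chrom]
    return f * alpha * female_rate + (1.0 - f) * female_rate


def alpha_from_y_over_x(ratio):
    """Invert Y/X = 3*alpha / (2 + alpha) to recover alpha.

    The ratio must lie strictly between 0 and 3: a Y/X ratio of 3 would
    correspond to an infinitely strong male bias under this model.
    """
    if not 0.0 < ratio < 3.0:
        raise ValueError("Y/X ratio must lie strictly between 0 and 3")
    return 2.0 * ratio / (3.0 - ratio)


# A Y/X ratio of 2, as reported for apes in the text.
alpha = alpha_from_y_over_x(2.0)  # 4.0
rates = {c: expected_rate(alpha, c) for c in MALE_FRACTION}
print(alpha, rates)
```

Under this simple model, a two-fold Y/X ratio corresponds to a four-fold higher mutation rate in males, and the autosomal rate falls between the X and Y values, in line with the intermediate values described above.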

Fine-Scale Variations in Substitution Rates: Neighbor-Dependent Mutational Processes

In mammals, the most dramatic variation in mutation rate is observed at the dinucleotide scale: a cytosine followed by a guanine is about 10 times more mutable than a cytosine in any other dinucleotide context [15,16]. Mutations of cytosines in CG dinucleotides (conventionally noted CpG, with “p” standing for the phosphate between the two bases) are responsible for one third of disease-causing germ-line mutations in humans [17]. CpGs are the target of cytosine methylases in mammals, and their hypermutability is the consequence of the spontaneous deamination of methylated cytosines into thymines [18]. Compared to other substitutions, CpG substitution rates show a weaker male mutation bias, which is consistent with the fact that the majority of mutations at CpG sites are not due to DNA replication errors [7]. The rate of substitution at CpG sites is strongly negatively correlated with the regional GC-content [12,19,20]. The influence of GC-content on CpG substitution appears to be very local (less than 2 kb) [20]. This observation is linked to the fact that cytosine deamination occurs essentially in single-stranded DNA [21]. Hence, the rate of CpG mutation is expected to depend on the rate of DNA melting, which is affected by the local base composition, GC-rich DNA fragments being more stable [21]. Although less dramatic, 2- to 3-fold variations in substitution rates are observed in other dinucleotide contexts [16,22]. These variations are poorly understood, but are probably the consequence of context-dependent DNA replication errors [22]. Finally, substitution rates also vary at the base pair scale: in primates, substitution rates at G:C base pairs (excluding CpG sites) are 25% to 85% higher than at A:T base pairs [11,12], possibly because cytosine is intrinsically more mutable than other bases [23].
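To give a feel for what a 10-fold difference in mutability implies, the sketch below estimates the share of cytosine mutations expected to arise at CpG sites in a given sequence. It is an illustration only: the function name and the toy sequence are hypothetical, only cytosines on the strand supplied are counted, and the 10-fold figure is used as a fixed constant even though the text notes that CpG rates vary with local GC-content.

```python
def cpg_share_of_cytosine_mutations(seq, cpg_fold=10.0):
    """Expected fraction of cytosine mutations that fall at CpG sites,
    assuming a CpG cytosine is cpg_fold times more mutable than any
    other cytosine. Only the given strand is considered."""
    seq = seq.upper()
    # Cytosines immediately followed by a guanine (CpG context).
    n_cpg = sum(
        1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"
    )
    n_other = seq.count("C") - n_cpg
    weighted_total = cpg_fold * n_cpg + n_other
    if weighted_total == 0:
        return 0.0  # no cytosine in the sequence
    return cpg_fold * n_cpg / weighted_total


# Toy sequence with four cytosines, one of them in a CpG context.
toy = "ACGTCCAGCT"
print(cpg_share_of_cytosine_mutations(toy))
```

Even when CpG cytosines are a minority of all cytosines, the weighting lets them account for a large share of the expected mutations, which is the intuition behind their prominence among disease-causing mutations.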

Mutagenic Effects of Heterozygosity?

A recent study suggested that the probability of mutation at a given site might be affected by the presence of polymorphic sites in its vicinity [24]. By comparing pairs of closely related species in different eukaryotic taxa, the authors showed that the occurrence of an insertion or deletion (indel) in a given species is associated with an excess of single-nucleotide changes in the flanking region (less than 150 bp) in the same species [24]. Selection is unlikely to explain this clustering of changes in a given lineage because selection is a priori expected to affect both species equally [24]. The authors proposed that the heterozygosity for an indel might promote mutations in surrounding sequences, possibly because the repair of indel mismatches in heteroduplex DNA during meiotic recombination might be error-prone [24]. Along the same lines, it has been proposed that the repair of hypermutable CpG sites might be the cause of the high substitution rate observed in flanking non-CpG sites [25]. The hypothesis that sequence polymorphism is mutagenic remains to be demonstrated, but if confirmed, it raises the intriguing possibility that the rate of mutation in sexual species might also be affected by population parameters, such as effective population size and migration.

Cryptic Mutational Hotspots

Now, research by Hodgkinson and colleagues published in this issue of PLoS Biology [1] reveals a new level of variation in mutation rate, which is not associated with any obvious sequence feature. The authors investigated the pattern of single nucleotide polymorphisms (SNPs) in human populations at sites that are known to be polymorphic in chimpanzee. If some sites are more prone to mutation than others, then one expects to find an excess of sites that are polymorphic in both species (coincident SNPs). Indeed, the observed number of coincident SNPs is three times higher than the number expected under the null hypothesis that SNPs are randomly distributed in the two genomes. Even after accounting for the hypermutability of CpGs and other neighbor-dependent mutational processes, the authors still find a 1.76-fold excess of coincident SNPs. Interestingly, this excess is essentially due to the same SNP (i.e., the same pair of alleles) being present in both species. Such SNPs could potentially correspond to ancestral polymorphisms, present in the last common ancestor of human and chimpanzee and preserved in both lineages. However, comparison with macaque revealed the same excess of coincident SNPs, whereas very few polymorphisms are expected to be shared between human and macaque given their divergence time. Moreover, the hypothesis of ancestral polymorphism predicts that all categories of SNPs should show the same frequency of coincident SNPs, whereas the authors observed a particularly striking excess specifically for A/T coincident SNPs.

Might the excess of coincident SNPs be the consequence of selection? Positive selection leads to the rapid fixation of advantageous mutations, and hence is not expected to lead to an excess of coincident SNPs. Negative selection reduces polymorphism at functional sites and hence might lead to a clustering of SNPs in nonfunctional regions. However, the data show no tendency for SNPs to cluster (which is not surprising given that 98% of the analyzed data set consists of intergenic or intronic regions, where only a very small fraction of sites are expected to be under selective pressure).

If the excess of coincident SNPs is due neither to selection nor to ancestral polymorphism, then it must reflect an excess of convergent mutations, occurring independently at the same sites in both species. This phenomenon is quantitatively important: indeed, the level of variation in mutation rates that is necessary to account for the observed number of coincident SNPs is similar to, or even higher than, the level of variation that is due to CpG hypermutability [1].

To investigate the sequence features that might be responsible for these mutational hotspots, the authors analyzed the sequence composition in regions flanking coincident SNPs. Although they noticed that the frequency of particular triplet oligonucleotides was significantly different in the 100 bp surrounding coincident SNPs compared to other SNPs, they were unable to identify any clear sequence motif that could explain a substantial fraction of these hotspots [1].
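The logic linking coincident SNPs to variation in mutation rate can be sketched numerically. The code below is not the authors' method: it is a simplified illustration that assumes SNPs arise independently in the two species, that the probability of a SNP at a site is small and proportional to that site's relative mutation rate, and that the same relative rates apply in both species. All function names and numbers are hypothetical.

```python
def expected_coincident(n_sites, n_snps_a, n_snps_b):
    """Expected number of sites polymorphic in both species under the
    null hypothesis that SNPs fall uniformly and independently."""
    return n_snps_a * n_snps_b / n_sites


def excess_from_rate_variation(relative_rates):
    """Fold excess of coincident SNPs over the uniform null when sites
    differ in mutability: E[m^2] / E[m]^2, i.e. 1 + CV^2 of the rates."""
    n = len(relative_rates)
    mean = sum(relative_rates) / n
    mean_sq = sum(r * r for r in relative_rates) / n
    return mean_sq / mean ** 2


# Hypothetical example: 1% of sites are 10 times more mutable than the rest.
rates = [10.0] * 1 + [1.0] * 99
print(expected_coincident(1_000_000, 1_000, 2_000))
print(excess_from_rate_variation(rates))
```

With uniform rates the ratio is exactly 1, so any excess above the null is a measure of how unevenly mutability is spread across sites. A small class of highly mutable sites is enough to produce a noticeable excess in this toy model, even if those sites share no recognizable sequence motif.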

Direct Evaluation of Mutation Patterns in Mammals

The discovery of cryptic mutational hotspots in the human genome [1] illustrates how limited our knowledge of the determinants of mutation rates remains. Thanks to large-scale sequencing projects, genome-wide substitution patterns can be measured in different mammalian taxa [5]. However, there is now clear evidence that in mammals, biased gene conversion has a strong impact on genome-wide substitution patterns [12]. Hence, even at neutral sites, substitution rates do not provide a perfect estimator of mutation rates. As mentioned previously, a precise knowledge of genome-wide mutation patterns is crucial for many issues in genetics and evolutionary biology. Notably, to be able to detect functional elements within genomes, it is essential to tease apart the relative contribution of the three determinants of sequence evolution: mutation, biased gene conversion, and selection. This will require a direct quantification of mutation patterns. Thanks to new sequencing technologies, it is now feasible to measure mutation rates directly by sequencing. This approach was first used in species with relatively small genomes compared to mammals (yeast, drosophila, nematode) [26-28]. Recently, direct whole-genome sequencing was used to identify somatic mutations in a human tumor [29]. Hopefully, it will soon be possible to directly measure germ-line mutation rates in humans by sequencing the genomes of a mother, a father, and their child.
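The principle of the mother, father, and child design can be illustrated with a toy calculation. This is a minimal sketch under strong assumptions: genotypes are error-free, each is given as a pair of alleles keyed by position, and a candidate new mutation is simply an allele in the child that is absent from both parents. The function names and data layout are hypothetical; a real analysis would have to separate true mutations from sequencing and genotyping errors.

```python
def de_novo_candidates(child, mother, father):
    """Return sorted positions where the child carries an allele that
    is found in neither parent. Each argument maps a position to a
    tuple of two alleles; positions missing in a parent are skipped."""
    hits = []
    for pos, child_gt in child.items():
        if pos not in mother or pos not in father:
            continue  # cannot assess sites without both parental genotypes
        parental_alleles = set(mother[pos]) | set(father[pos])
        if any(allele not in parental_alleles for allele in child_gt):
            hits.append(pos)
    return sorted(hits)


def mutation_rate_per_site(n_de_novo, n_callable_sites):
    """Per-site, per-generation rate: the child inherits two haploid
    genomes, so the count is divided by twice the number of sites."""
    return n_de_novo / (2 * n_callable_sites)


# Toy genotypes at three positions.
child = {100: ("A", "A"), 200: ("A", "G"), 300: ("C", "T")}
mother = {100: ("A", "A"), 200: ("A", "A"), 300: ("C", "C")}
father = {100: ("A", "A"), 200: ("A", "A"), 300: ("C", "T")}

candidates = de_novo_candidates(child, mother, father)
print(candidates, mutation_rate_per_site(len(candidates), len(child)))
```

Because the expected number of new mutations per genome is tiny relative to the number of sites sequenced, even a low error rate would swamp the signal, so the filtering step omitted here is the hard part of the approach in practice.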

References (29 in total; 10 listed)

1. Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes.

Authors: Dacheng Tian; Qiang Wang; Pengfei Zhang; Hitoshi Araki; Sihai Yang; Martin Kreitman; Thomas Nagylaki; Richard Hudson; Joy Bergelson; Jian-Qun Chen
Journal: Nature Date: 2008-07-20 Impact factor: 49.962

2. CpG dinucleotides and the mutation rate of non-CpG DNA.

Authors: Jean-Claude Walser; Loïc Ponger; Anthony V Furano
Journal: Genome Res Date: 2008-06-11 Impact factor: 9.043

3. Initial sequence of the chimpanzee genome and comparison with the human genome.

Authors:
Journal: Nature Date: 2005-09-01 Impact factor: 49.962

4. Mammalian male mutation bias: impacts of generation time and regional variation in substitution rates.

Authors: M Paula Goetting-Minesky; Kateryna D Makova
Journal: J Mol Evol Date: 2006-09-04 Impact factor: 2.395

5. A genome-wide view of the spectrum of spontaneous mutations in yeast.

Authors: Michael Lynch; Way Sung; Krystalynne Morris; Nicole Coffey; Christian R Landry; Erik B Dopman; W Joseph Dickinson; Kazufusa Okamoto; Shilpa Kulkarni; Daniel L Hartl; W Kelley Thomas
Journal: Proc Natl Acad Sci U S A Date: 2008-06-26 Impact factor: 11.205

6. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements.

Authors: Sergey I Nikolaev; Juan I Montoya-Burgos; Konstantin Popadin; Leila Parand; Elliott H Margulies; Stylianos E Antonarakis
Journal: Proc Natl Acad Sci U S A Date: 2007-12-11 Impact factor: 11.205

7. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome.

Authors: Timothy J Ley; Elaine R Mardis; Li Ding; Bob Fulton; Michael D McLellan; Ken Chen; David Dooling; Brian H Dunford-Shore; Sean McGrath; Matthew Hickenbotham; Lisa Cook; Rachel Abbott; David E Larson; Dan C Koboldt; Craig Pohl; Scott Smith; Amy Hawkins; Scott Abbott; Devin Locke; Ladeana W Hillier; Tracie Miner; Lucinda Fulton; Vincent Magrini; Todd Wylie; Jarret Glasscock; Joshua Conyers; Nathan Sander; Xiaoqi Shi; John R Osborne; Patrick Minx; David Gordon; Asif Chinwalla; Yu Zhao; Rhonda E Ries; Jacqueline E Payton; Peter Westervelt; Michael H Tomasson; Mark Watson; Jack Baty; Jennifer Ivanovich; Sharon Heath; William D Shannon; Rakesh Nagarajan; Matthew J Walter; Daniel C Link; Timothy A Graubert; John F DiPersio; Richard K Wilson
Journal: Nature Date: 2008-11-06 Impact factor: 49.962

8. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation.

Authors: Navin Elango; Seong-Ho Kim; Eric Vigoda; Soojin V Yi
Journal: PLoS Comput Biol Date: 2008-02-29 Impact factor: 4.475

9. Direct estimation of the mitochondrial DNA mutation rate in Drosophila melanogaster.

Authors: Cathy Haag-Liautard; Nicole Coffey; David Houle; Michael Lynch; Brian Charlesworth; Peter D Keightley
Journal: PLoS Biol Date: 2008-08-19 Impact factor: 8.029

10. The impact of recombination on nucleotide substitutions in the human genome.

Authors: Laurent Duret; Peter F Arndt
Journal: PLoS Genet Date: 2008-05-09 Impact factor: 5.917
