
A brief review of short tandem repeat mutation.

Hao Fan, Jia-You Chu.

Abstract

Short tandem repeats (STRs) are short tandemly repeated DNA sequences that involve a repetitive unit of 1-6 bp. Because of their polymorphisms and high mutation rates, STRs are widely used in biological research. Strand-slippage replication is the predominant mutation mechanism of STRs, and the stepwise mutation model is regarded as the main mutation model. STR mutation rates can be influenced by many factors. Moreover, some trinucleotide repeats are associated with human neurodegenerative diseases. In order to deepen our knowledge of these diseases and broaden STR application, it is essential to understand the STR mutation process in detail. In this review, we focus on the current known information about STR mutation.

Year:  2007        PMID: 17572359      PMCID: PMC5054066          DOI: 10.1016/S1672-0229(07)60009-6

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

Short tandem repeats (STRs), also known as microsatellites or simple sequence repeats, are short tandemly repeated DNA sequences with a repeat unit of 1-6 bp, forming arrays of up to 100 nucleotides (nt). STRs are widely found in prokaryotes and eukaryotes, including humans. They are scattered more or less evenly throughout the human genome and account for about 3% of it. However, their distribution within chromosomes is not quite uniform: they occur less frequently in subtelomeric regions. Most STRs are found in noncoding regions, and only about 8% are located in coding regions. Their density also varies slightly among chromosomes; in humans, chromosome 19 has the highest STR density. On average, one STR occurs per 2,000 bp in the human genome. The most common STRs in humans have A-rich units: A, AC, AAAN, AAN, and AG [5, 6].

An STR locus is named, for example, D3S1266, where D stands for DNA, 3 is the chromosome on which the locus is located, S stands for single-copy sequence, and 1266 is a unique identifier.

STRs can be classified in two ways. According to the length of the major repeat unit, they are classified into mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats. In general, the total number of STRs of each type decreases as the repeat unit becomes longer, although dinucleotide repeats are the most common STRs in the human genome. According to repeat structure, STRs are classified into perfect (simple) repeats, which contain only one repeat unit, and imperfect (compound) repeats, which consist of repeats of different composition.

Since the last decade of the 20th century, scientists have been interested in the direct functions of STRs in their host organisms. Although STRs are widespread, most of them were thought to have no biological function at all and were regarded as “junk DNA”.
However, several hypotheses suggest that STRs actually play important roles in many organisms. In many pathogenic bacteria, some “contingency genes” contain STR sequences. STRs in such genes can cause frameshift mutations that change the expression of certain proteins. These proteins are not necessary for bacterial viability, but they can help the bacteria evade the human immune system. Some STRs may also take part in transcriptional regulation: from yeasts to humans, many proteins involved in transcriptional regulation contain glutamine-rich domains, and their genes contain trinucleotide repeats encoding polyglutamine tracts. Research has also shown that some STRs can regulate transcription of the epidermal growth factor gene, the tyrosine hydroxylase gene, and the PIG3 gene. In addition, some STRs may influence gene expression more broadly; for example, in mammalian genomes, (CA)n and (CT)n repeats near particular genes can affect the expression of these genes [12, 13]. STRs may also affect recombination, the generation of nucleosome positioning signals, and the maintenance of chromatin spatial organization. Although more and more biological functions of STRs are being discovered, most remain unknown. Further study of STR mutation and variability is therefore required to understand their biological functions.
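To make the classification by repeat-unit length concrete, the following Python sketch scans a DNA string for perfect (uninterrupted) repeats with 1-6 bp units and labels each by type. It is a minimal illustration, not a published STR-calling tool: the function names and the `min_copies=3` threshold are assumptions chosen for this example, and imperfect (compound) repeats are not merged.

```python
import re

# Repeat-unit length (bp) -> STR type, following the mono- to hexanucleotide classes.
UNIT_TYPES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a tandem copy of a shorter unit ('ACAC' is not)."""
    n = len(unit)
    for d in range(1, n):
        if n % d == 0 and unit[:d] * (n // d) == unit:
            return False
    return True


def find_perfect_strs(seq: str, min_copies: int = 3):
    """Scan `seq` for perfect repeats with 1-6 bp units.

    Returns (start, unit, copies, str_type) tuples sorted by start position.
    `min_copies` is an illustrative threshold chosen for this sketch only.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, 7):
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if not is_primitive(unit):
                continue  # e.g. an 'AA' run is left to the mononucleotide pass
            copies = len(m.group(0)) // k
            hits.append((m.start(), unit, copies, UNIT_TYPES[k]))
    return sorted(hits)


print(find_perfect_strs("GGCACACACACATT"))  # [(2, 'CA', 5, 'di')]
```

Skipping non-primitive units means that a mononucleotide run, for instance, is not also reported as a dinucleotide or hexanucleotide repeat.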

High Mutation Rates of STRs

Unique DNA sequences in a genome exhibit a very low mutation rate (approximately 10⁻⁹ per nucleotide per generation), whereas mutation rates at STRs are several orders of magnitude higher, ranging from 10⁻⁶ to 10⁻² per locus per generation. STR mutation rates in vivo differ among organisms: for instance, the STR mutation rate is about 10⁻⁵ per cell division in yeast and 10⁻⁵ to 10⁻³ per cell division in humans [17, 18]. Mutation rates also vary greatly among loci. Chakraborty et al. showed that, at nonpathogenic human STR loci, dinucleotide repeats have the highest mutation rate, whereas tetranucleotide repeats mutate about 50% more slowly. By contrast, the mutation rates of disease-associated trinucleotide repeats exceed the normal value four- to sevenfold.

Several approaches have been developed for estimating STR mutation rates, including the familial approach, the biological model approach, the population approach, and the germline cell approach. The familial approach is the most direct: both the mutation rate and the mutation type can be examined directly as STRs are transmitted from parents to offspring. In the biological model approach, an STR is cloned into a vector and propagated in a host, so that spontaneous STR mutation rates and the effects of various factors on STR mutation can be estimated. The population approach can detect the common evolutionary origin of STRs and trace mutation events back over many generations. Finally, the germline cell approach analyzes STR mutation rates directly in germ cells, especially sperm.
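With the familial approach, a per-locus rate is estimated as the number of mutations observed divided by the number of parent-to-offspring allele transmissions scored. The short Python sketch below shows that arithmetic, plus the chance that at least one of several loci mutates in a single transmission, which is why high STR mutation rates matter in practice when many loci are typed at once. The function names and the counts (3 mutations in 1,500 transmissions; a rate of 10⁻³ across 20 loci) are hypothetical, and loci are assumed to mutate independently.

```python
def familial_rate(mutations: int, transmissions: int) -> float:
    """Point estimate of a per-locus mutation rate from parent-offspring data.

    `transmissions` is the number of allele transfers (meioses) scored at the locus.
    """
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    if not 0 <= mutations <= transmissions:
        raise ValueError("mutations must lie between 0 and transmissions")
    return mutations / transmissions


def prob_any_mutation(rate: float, loci: int) -> float:
    """Chance that at least one of `loci` independent loci mutates in one transmission."""
    return 1.0 - (1.0 - rate) ** loci


# Hypothetical counts, for illustration only.
print(familial_rate(mutations=3, transmissions=1500))  # 0.002
print(round(prob_any_mutation(1e-3, 20), 4))           # 0.0198
```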

Mechanisms of STR Mutation

STRs were identified in eukaryotic DNA at the beginning of the 1970s. However, the mechanisms of STR mutation have remained poorly understood for decades. To date, three possible mechanisms have been proposed: (1) unequal crossing over in meiosis; (2) retrotransposition; and (3) strand-slippage replication. Of these, strand-slippage replication is widely regarded as the main mechanism of STR mutation.

Unequal crossing over in meiosis

Unequal crossing over is a well-known mechanism for generating large blocks of satellite DNA. It involves the exchange of repeat units between homologous chromosomes. Because it requires an exchange between chromosomes, it is thought to play only a limited role in STR mutation. Nevertheless, it may be responsible for multistep STR mutations, which are discussed below.

Retrotransposition mechanism

This mechanism proposes that A-rich STRs are generated by 3’ extension of retrotranscripts, similar to the polyadenylation of mRNA. Evidence has shown an association between the most common, A-rich human STRs and transposable elements. However, a high density of transposable elements does not always coincide with a high density of STRs, so further studies are needed to determine whether retrotransposition is really a mechanism of STR mutation.

Strand-slippage replication

This model was first proposed by Kornberg et al. in 1964 and is also called DNA slippage, polymerase slippage, or slipped-strand mispairing. It is currently widely accepted as the main explanation of the STR mutation process. Slippage occurs during DNA replication and results in mispairing, by one or more repeat units, between the nascent and template strands. The unpaired repeat units are forced to “loop out” at the mismatch site. If DNA synthesis continues on this molecule, the repeat number of the STR is altered (Figure 1).
Figure 1. Schematic illustration of strand-slippage replication at an STR.
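The consequence of a single slippage event can be sketched as a toy model in Python: the daughter molecule gains repeat units when the loop forms in the nascent strand and loses them when it forms in the template strand. This is a deliberate simplification for illustration (it tracks only the repeat count and assumes the loop escapes mismatch repair); the function name and parameters are hypothetical.

```python
def replicate_with_slippage(unit: str, copies: int, looped_strand: str,
                            units_looped: int = 1) -> str:
    """Return the daughter STR after one unrepaired slippage event.

    A loop in the nascent strand means extra repeat units are synthesised
    (expansion); a loop in the template strand means the looped-out units
    are not copied (contraction).
    """
    if units_looped < 1:
        raise ValueError("a slippage loop contains at least one repeat unit")
    if looped_strand == "nascent":
        new_copies = copies + units_looped
    elif looped_strand == "template":
        new_copies = copies - units_looped
    else:
        raise ValueError("looped_strand must be 'nascent' or 'template'")
    if new_copies < 1:
        raise ValueError("contraction would remove the whole repeat array")
    return unit * new_copies


longer = replicate_with_slippage("CA", 10, "nascent")       # (CA)11
shorter = replicate_with_slippage("CA", 10, "template", 2)  # (CA)8
print(len(longer) // 2, len(shorter) // 2)                   # 11 8
```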

However, the slippage rate is not the same as the apparent STR mutation rate. In vitro experiments have demonstrated that DNA slippage occurs at very high rates, but in vivo most slippage loops are recognized and eliminated by the mismatch repair system. A functional mismatch repair system has been shown to reduce the STR mutation rate 100- to 1,000-fold. The observed STR mutation rate therefore depends on both the slippage rate and the efficiency with which the repair system corrects the mismatches.

Several factors affect the rate of slippage, of which the repeat unit is the most important. A negative correlation has been suggested between the length of the repeat unit and the slippage rate: Kruglyak et al. showed that slippage was most frequent in dinucleotide STRs and least frequent in tetranucleotide STRs. This is consistent with the observation that the longer the repeat unit, the fewer STRs of that type occur. A plausible explanation is that longer repeat units require the strand to slip further before the bases can pair correctly again, so such STRs arise less often. Besides the repeat unit, other factors such as the number, location, and sequence of repeats are also likely to affect the rate and direction of slippage; in humans, for instance, the slippage rate increases exponentially with repeat number. Models of STR genesis assume that a short “proto-STR” arises first and is subsequently extended by DNA slippage. Once a proto-STR exists, the repeat sequence becomes prone to slippage mutation. The minimum number of repeats needed for further expansion is four to five for a dinucleotide STR and two for a tetranucleotide STR.
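Because the observed rate depends on both slippage and repair, a simple multiplicative model helps relate these figures: the observed rate is the slippage rate times the fraction of loops that escape mismatch repair, so a 100- to 1,000-fold reduction corresponds to the repair system correcting 99% to 99.9% of slippage loops. The Python sketch below assumes that model; the exponential dependence of slippage on repeat number is included only as a hypothetical parameterisation (`base` and `growth` are illustrative, not estimated values).

```python
import math


def observed_rate(slippage_rate: float, repair_efficiency: float) -> float:
    """Mutation rate that remains after mismatch repair.

    repair_efficiency is the fraction of slippage loops that are corrected.
    """
    if not 0.0 <= repair_efficiency < 1.0:
        raise ValueError("repair_efficiency must be in [0, 1)")
    return slippage_rate * (1.0 - repair_efficiency)


def fold_reduction(repair_efficiency: float) -> float:
    """How many times lower the observed rate is than the raw slippage rate."""
    return 1.0 / (1.0 - repair_efficiency)


def slippage_rate(repeats: int, base: float, growth: float) -> float:
    """Hypothetical exponential model: slippage rate = base * exp(growth * repeats)."""
    return base * math.exp(growth * repeats)


# Repairing 99% or 99.9% of loops gives a 100- or 1,000-fold reduction.
for eff in (0.99, 0.999):
    print(eff, round(fold_reduction(eff)))  # 0.99 100, then 0.999 1000
```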

Models of STR Mutation

Infinite alleles model (IAM)

Kimura and Crow proposed this model in 1964, assuming that each new mutation produces a new allele and that all mutations are equally probable. A mutation may therefore change an STR by any number of repeat units, and it always produces an allele state not previously present in the population. However, many studies of STR mutation have shown that this model is incompatible with real STR mutation processes.
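Two features of IAM are easy to state in code: a mutation always yields a novel allele regardless of its parent, and, at mutation-drift equilibrium in a diploid population of effective size N with mutation rate μ, the expected heterozygosity is θ/(1 + θ) with θ = 4Nμ (the result associated with Kimura and Crow's 1964 analysis). The Python below is a minimal sketch; integer allele labels and the function names are illustrative assumptions.

```python
def iam_mutate(alleles_seen: set) -> int:
    """Under IAM every mutation creates an allele never seen before.

    The parent allele is irrelevant, so allele labels carry no size information.
    """
    new_allele = max(alleles_seen, default=0) + 1
    alleles_seen.add(new_allele)
    return new_allele


def iam_expected_heterozygosity(theta: float) -> float:
    """Equilibrium expected heterozygosity under IAM, with theta = 4 * N * mu."""
    return theta / (1.0 + theta)


seen = {1, 2, 3}
print(iam_mutate(seen), iam_mutate(seen))  # 4 5
print(iam_expected_heterozygosity(1.0))    # 0.5
```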

Stepwise mutation model (SMM)

This model was developed by Ohta and Kimura in 1973. It was originally used to describe changes in protein charge inferred from electrophoretic mobility, an application later shown to be inadequate, but it proved suitable for describing STR mutation. SMM rests on the following assumptions: (1) mutations cause small changes in repeat number; (2) increases and decreases in repeat number are equally probable; (3) allele size is unlimited; and (4) the rate and size of mutations are independent of repeat number. SMM agrees with the strand-slippage replication mechanism, currently accepted as the main mechanism of STR mutation. In this model, alleles mutate up or down by one or a small number of repeat units. The model in which each mutation changes exactly one repeat unit is called the strict (single-step) SMM; in general, “SMM” refers to this strict form. A model that also allows changes of more than one unit is called the two-phase mutation model (TPM), also termed the generalized or multistep mutation model (MMM).

Some reports suggested that SMM is consistent with the allele distributions at STR loci. Many other studies, however, demonstrated that the strict SMM may not be sufficient to account for allele frequency distributions at STR loci; they imply that the more complex the repeat structure, the less likely strictly stepwise mutation becomes. TPM was developed by Di Rienzo et al. in 1994 to predict the expected variance in repeat number under different mutational processes and demographic histories. It incorporates the SMM mutational process but allows mutations of larger magnitude to occur. The larger the variance in repeat number, the higher the frequency of multistep mutations it implies. If the distribution of STR alleles fits MMM, such mutations could be caused by unequal crossing over. TPM has been applied to various organisms, including primates. Huang et al. reported that 62.9% of mutations at human dinucleotide repeats are multistep, whereas other researchers reported a much lower average frequency of multistep mutations at human STRs, 23.8% [18, 37].
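The difference between the strict SMM and TPM lies only in the distribution of step sizes, which the Python sketch below makes explicit. The geometric distribution for multistep sizes and the parameter names `p_single` and `g` are assumptions for illustration (a common way to parameterise TPM, not necessarily Di Rienzo et al.'s exact formulation). The last function gives Ohta and Kimura's equilibrium heterozygosity under the strict SMM, 1 − 1/√(1 + 8Nμ); for the same θ = 4Nμ it is lower than the IAM value θ/(1 + θ), because stepwise mutation often recreates allele sizes that already exist.

```python
import math
import random


def smm_step(rng: random.Random) -> int:
    """Strict SMM: gain or lose exactly one repeat unit with equal probability."""
    return rng.choice((-1, 1))


def tpm_step(rng: random.Random, p_single: float, g: float) -> int:
    """TPM: a one-unit step with probability p_single, otherwise a multistep change.

    Multistep sizes are drawn from a geometric distribution on 1, 2, 3, ...
    with parameter g (mean 1/g); this parameterisation is an assumption.
    """
    if not 0.0 <= p_single <= 1.0 or not 0.0 < g <= 1.0:
        raise ValueError("need 0 <= p_single <= 1 and 0 < g <= 1")
    size = 1
    if rng.random() >= p_single:
        while rng.random() >= g:  # each failure adds one more repeat unit
            size += 1
    return size if rng.random() < 0.5 else -size


def smm_expected_heterozygosity(theta: float) -> float:
    """Ohta-Kimura equilibrium heterozygosity under strict SMM, theta = 4 * N * mu."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


rng = random.Random(1)
allele = 20
for _ in range(10):  # ten successive mutations on one lineage
    allele += tpm_step(rng, p_single=0.9, g=0.5)
print(allele)
print(round(smm_expected_heterozygosity(4.0), 3))  # 0.667
```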

Factors Influencing STR Mutation

Repeat number

Repeat number is one of the key factors influencing STR mutation. Studies using different methods, such as the familial approach and the population approach, have strongly suggested that the STR mutation rate increases with repeat number, and a positive association between mutation rate and repeat number has been found in many vertebrate species, including humans. The direction of mutation may also differ between alleles of different sizes at the same locus: expansions occur more frequently in short STRs, whereas contractions predominate in longer ones.

Repeat unit

The mutation rate of dinucleotide repeats is higher than that of trinucleotide repeats [19, 42], in agreement with the slippage studies mentioned above.

Repeat structure

In autosomes, the Y chromosome, and tumor cells, the mutation frequency was found to be appreciably higher in heterozygotes with a large allele span (a large difference in repeat number between the two alleles), indicating that the repeat structure may contribute to the STR mutation process.

Base composition of repeat unit

Repeat units with high AT content mutate faster than those with high GC content, suggesting that template stability could influence the mutation rate; sequences with high GC content may reduce the frequency of strand-slippage events.

Flanking sequence

Glenn et al. observed a significant negative correlation between allelic diversity and the GC content of flanking sequences, but other studies did not find this correlation. Further studies are required to establish the role of flanking-sequence GC content in the STR mutation process.

Recombination

Published results conflict on whether recombination is associated with STR mutation: some studies detected a correlation, while others found none. In humans, genome-wide analyses provided no evidence for a strong positive correlation between recombination rate and STR mutation [17, 46]. STR loci in the non-recombining region of the human Y chromosome show the same mutation rate as autosomal loci, suggesting that recombination is not the predominant mechanism generating STR variability.

Sex

At most loci, mutation rates in germ cells are higher in males than in females. It is widely accepted that sperm undergo more DNA replication cycles than eggs, and the more replication cycles a cell lineage goes through, the higher its mutation frequency.

Age

Men who carry STR mutations are, on average, significantly older than men without such mutations [18, 38]. This is probably because sperm produced at older ages have passed through more mitoses and thus have more opportunities to mutate: a sperm has gone through about 380 mitoses by the time a man is 28 and about 540 by the time he is 35. The STR mutation rate in men therefore depends, to some extent, on age.
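The two figures above imply roughly 160 additional germline divisions over seven years, about 23 per year. If the chance of an STR mutation per division were constant, the expected number of replication-dependent mutations would scale with the number of divisions, so a sperm produced at 35 would carry about 540/380 ≈ 1.42 times the risk of one produced at 28. The Python sketch below encodes that arithmetic; linearity between (and slightly beyond) the two quoted ages and a constant per-division rate are simplifying assumptions, and the function names are hypothetical.

```python
# The two (age, germline mitoses) figures quoted in the text.
REFERENCE_POINTS = ((28, 380), (35, 540))


def mitoses_at_age(age: float) -> float:
    """Linear inter-/extrapolation between the quoted points (assumed linear)."""
    (a1, m1), (a2, m2) = REFERENCE_POINTS
    per_year = (m2 - m1) / (a2 - a1)  # 160 / 7, about 22.9 divisions per year
    return m1 + per_year * (age - a1)


def relative_replication_risk(age: float, reference_age: float = 28) -> float:
    """Expected replication-dependent mutations relative to a reference age,
    assuming a constant mutation probability per mitosis."""
    return mitoses_at_age(age) / mitoses_at_age(reference_age)


print(round(relative_replication_risk(35), 2))  # 1.42
```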

Interruptions in STR

The characteristic mutation of STRs is slippage, which inserts or deletes one or several repeat units. STRs also undergo “nonspecific” mutations, such as transitions/transversions, single-nucleotide insertions/deletions, and other events. These are rare compared with slippage, but they disrupt the STR sequence and change its mutability. In particular, an insertion of a different nucleotide composition (an interruption) can stabilize an STR.
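Since an interruption breaks a repeat array into shorter pure stretches, one simple way to quantify its effect is to measure the longest uninterrupted run rather than the total array length. The Python sketch below does this with a regular expression; the sequences are hypothetical, and treating the longest pure run as the relevant quantity is an illustrative assumption rather than a claim from the text.

```python
import re


def pure_runs(seq: str, unit: str) -> list[int]:
    """Lengths, in repeat units, of each uninterrupted run of `unit` in `seq`."""
    pattern = re.compile("(?:%s)+" % re.escape(unit))
    return [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]


def longest_pure_run(seq: str, unit: str) -> int:
    """Longest uninterrupted run of `unit`, or 0 if the unit never occurs."""
    return max(pure_runs(seq, unit), default=0)


pure = "CAG" * 19
interrupted = "CAG" * 10 + "CAA" + "CAG" * 8  # hypothetical single CAA interruption
print(pure_runs(pure, "CAG"), pure_runs(interrupted, "CAG"))  # [19] [10, 8]
```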

STR Mutation and Diseases

Interest in STR mutation stems from the discovery that some trinucleotide repeats are involved in human neurodegenerative diseases. Trinucleotide repeat diseases include many rare, often dominantly inherited, mainly neurological disorders, such as fragile X syndrome, Huntington’s disease, myotonic dystrophy, and certain types of spinocerebellar ataxia. To date, trinucleotide repeat diseases have been identified only in humans, which has led to the hypothesis that trinucleotide repeats within certain brain-related genes may have contributed to the evolution of brain function. These diseases are characterized by trinucleotide repeats that expand far beyond their “normal” polymorphic ranges. Such repeats usually lie within genes, and most of them encode runs of glutamine residues; others lie outside the coding sequence but close enough to disrupt gene function. In general, disease severity often correlates with the extent of abnormal expansion. For instance, in fragile X syndrome the CGG repeat lies in the 5’ untranslated region of the FMR1 (fragile X mental retardation 1) gene. The number of repeats usually ranges from 6 to 46, with an average of 29. When the repeat number exceeds 52, the STR region becomes unstable during meiosis and can expand rapidly. Asymptomatic carriers have 60 to 200 CGG repeats, whereas patients with obvious symptoms carry more than 230. Understanding the particular mutation process of STRs is therefore important for understanding these human neurodegenerative diseases.
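The thresholds quoted above can be written as a lookup, which also shows that they leave gaps (for example, 47 to 59 and 201 to 230 repeats fall in none of the quoted ranges). The Python sketch below uses only the numbers given in this paragraph; the function names and category labels are hypothetical, and it is not a diagnostic tool.

```python
def fmr1_cgg_category(repeats: int) -> str:
    """Place a CGG repeat count into the ranges quoted in this review."""
    if 6 <= repeats <= 46:
        return "usual range"
    if 60 <= repeats <= 200:
        return "asymptomatic carrier range"
    if repeats > 230:
        return "symptomatic range"
    return "not covered by the quoted ranges"


def meiotically_unstable(repeats: int) -> bool:
    """True when the count exceeds the instability threshold of 52 quoted in the text."""
    return repeats > 52


for n in (29, 50, 120, 300):
    print(n, fmr1_cgg_category(n), meiotically_unstable(n))
```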

Application of STRs in Population Genetics

STRs are abundant, codominant, highly polymorphic, and nearly selectively neutral. In addition, STR loci are short enough to be amplified by polymerase chain reaction and separated in high-resolution media such as polyacrylamide gels. With high-throughput capillary sequencers and mass spectrometry, allele sizing is no longer a bottleneck in STR analysis. STRs are therefore widely used in basic and applied research, for example in constructing genetic maps, gene localization, genetic linkage analysis, individual identification, paternity testing, and disease diagnosis [50, 51].

STR analysis has also been employed in population genetics, although this application requires a more detailed understanding of the STR mutation process. STRs can be used to reconstruct the migration and evolutionary history of a species and to assess biological diversity at various levels of biological organization. One method of absolute genetic dating uses mutation rates as a molecular clock; because STR mutation rates are very high, an STR-based clock is suited to human evolution and is likely to reflect relatively recent divergence. The size difference between two STR alleles may be informative: the larger the difference, the more mutation events have occurred. There is thus a “memory” of past mutation events, because a new mutant allele is related in size to the allele from which it was derived, and the length difference between alleles contains phylogenetic information. However, the prevalence of different mutational events may vary dramatically among groups.

Ignoring the possibility that the same allelic type found in different individuals or populations may have arisen through different evolutionary processes can bias estimates of genetic structure. It is therefore important to understand the STR mutation process in detail before applying STRs in population genetics. Mutation models of STR evolution are needed to estimate phylogenetic relationships, population differentiation, and genetic distances from STR data. Estimators based on IAM include DAS (shared allele distance), DCH (Cavalli-Sforza and Edwards chord distance), and DS (Nei’s standard genetic distance); estimators based on SMM/TPM include (δμ)², DSW (stepwise weighted genetic distance), and RST. Different estimators are effective in different situations: Goldstein et al. concluded that DAS or DS performs better over relatively short periods, whereas SMM-based estimators such as (δμ)² become superior as time increases. In 1995, Goldstein et al. predicted that STR loci would ultimately allow a high-resolution description of human evolutionary history. Many researchers have since studied human evolution and migration using STR loci [57-60]. Mountain et al. developed a new compound polymorphism, the SNPSTR, in which each segment contains one or more single nucleotide polymorphisms (SNPs) and exactly one STR locus, providing insights into population history. At present, STR loci are used to reveal relationships among populations in different regions and the migration routes of ancient peoples.
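Two of the estimators named above, together with the SMM-based dating idea, can be sketched in a few lines of Python. (δμ)² is the squared difference in mean repeat number between two populations; Nei's DS compares allele-frequency identities; and, under the strict SMM, the expected (δμ)² grows approximately as 2μt with divergence time t in generations, which underlies STR dating in Goldstein et al.'s approach. The sketch is single-locus (multi-locus versions average over loci), the allele samples and μ = 10⁻³ are hypothetical, and it ignores complications such as constraints on allele size.

```python
import math
import statistics
from collections import Counter


def delta_mu_squared(alleles_x, alleles_y) -> float:
    """(δμ)² at one locus: squared difference in mean repeat number."""
    return (statistics.fmean(alleles_x) - statistics.fmean(alleles_y)) ** 2


def allele_frequencies(alleles) -> dict:
    """Relative frequency of each allele (repeat number) in a sample."""
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: c / total for allele, c in counts.items()}


def nei_ds(freqs_x: dict, freqs_y: dict) -> float:
    """Nei's standard distance at one locus: -ln(Jxy / sqrt(Jx * Jy))."""
    jx = sum(p * p for p in freqs_x.values())
    jy = sum(q * q for q in freqs_y.values())
    jxy = sum(p * freqs_y.get(a, 0.0) for a, p in freqs_x.items())
    if jxy == 0:
        return math.inf  # no shared alleles
    return -math.log(jxy / math.sqrt(jx * jy))


def smm_divergence_generations(dmu2: float, mu: float) -> float:
    """Under strict SMM, E[(δμ)²] ≈ 2μt, so t ≈ (δμ)² / (2μ) generations."""
    return dmu2 / (2.0 * mu)


pop_x = [10, 10, 12, 12]  # hypothetical repeat numbers
pop_y = [14, 14, 16, 16]
d2 = delta_mu_squared(pop_x, pop_y)  # (11 - 15)^2 = 16.0
print(d2, smm_divergence_generations(d2, mu=1e-3))
```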

Perspective

With the development of third-generation genetic markers, SNPs will replace STRs in some applications, such as genome mapping. However, a comprehensive understanding of STR mutation, together with the high information content of STRs, will extend STR analysis to many more fields of science.
References

1.  The number of alleles that can be maintained in a finite population.

Authors:  M Kimura; J F Crow
Journal:  Genetics       Date:  1964-04       Impact factor: 4.562

2.  Slippage synthesis of simple sequence DNA.

Authors:  C Schlötterer; D Tautz
Journal:  Nucleic Acids Res       Date:  1992-01-25       Impact factor: 16.971

3.  Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups.

Authors:  A Edwards; H A Hammond; L Jin; C T Caskey; R Chakraborty
Journal:  Genomics       Date:  1992-02       Impact factor: 5.736

4.  Microsatellites show mutational bias and heterozygote instability.

Authors:  W Amos; S J Sawcer; R W Feakes; D C Rubinsztein
Journal:  Nat Genet       Date:  1996-08       Impact factor: 38.330

5.  Conservation of glutamine-rich transactivation function between yeast and humans.

Authors:  D Escher; M Bodmer-Glavas; A Barberis; W Schaffner
Journal:  Mol Cell Biol       Date:  2000-04       Impact factor: 4.272

6.  A comprehensive genetic map of the human genome based on 5,264 microsatellites.

Authors:  C Dib; S Fauré; C Fizames; D Samson; N Drouot; A Vignal; P Millasseau; S Marc; J Hazan; E Seboun; M Lathrop; G Gyapay; J Morissette; J Weissenbach
Journal:  Nature       Date:  1996-03-14       Impact factor: 49.962

7.  Microsatellite variation and recombination rate in the human genome.

Authors:  B A Payseur; M W Nachman
Journal:  Genetics       Date:  2000-11       Impact factor: 4.562

8.  Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations.

Authors:  S Kruglyak; R T Durrett; M D Schug; C F Aquadro
Journal:  Proc Natl Acad Sci U S A       Date:  1998-09-01       Impact factor: 11.205

9.  High resolution of human evolutionary trees with polymorphic microsatellites.

Authors:  A M Bowcock; A Ruiz-Linares; J Tomfohrde; E Minch; J R Kidd; L L Cavalli-Sforza
Journal:  Nature       Date:  1994-03-31       Impact factor: 49.962

10.  Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions.

Authors:  Subbaya Subramanian; Rakesh K Mishra; Lalji Singh
Journal:  Genome Biol       Date:  2003-01-23       Impact factor: 13.583
