
Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes.

Andrew T M Bagshaw

Abstract

Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  eQTL; repeat; review; short; tandem; transcription

Year:  2017        PMID: 28957459      PMCID: PMC5622345          DOI: 10.1093/gbe/evx164

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Microsatellites, or short tandem repeats (STRs), also often called short sequence repeats (SSRs), consist of tandem duplications of 1–6 bp motifs. They are highly abundant in the noncoding DNA of all eukaryotic genomes studied, covering 1–3% of the human genome, depending on how they are defined (Lander et al. 2001; Subramanian et al. 2003b; fig. 1). Their repetitive structure allows strand misalignment, which can result in frequent change-of-length mutations, at rates as high as 10⁻⁴–10⁻³ per generation (reviewed in Ellegren 2004). Repeats shorter than a threshold length of around five copies, in the case of dinucleotide motifs, or four where the repeated motif is longer, are less mutable and polymorphic than longer STRs (Ananda et al. 2013), and are traditionally not referred to as microsatellites. However, no such threshold length has been found by comparative studies of mutation (Leclercq et al. 2010), and no firm definition is established. The common length polymorphism of microsatellites has been utilized very widely for many years as a marker of genetic difference in diverse fields including gene mapping, population genetics and forensics (reviewed in Hodel et al. 2016), and is probably the attribute for which they are best known among geneticists.
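Because no firm definition is established, any scan for microsatellites has to commit to explicit thresholds. The following is a minimal sketch, not the procedure of any cited study: it finds perfect tandem repeats of 1–6 bp motifs using the minimum copy numbers quoted in the fig. 1 legend (12, 6 and 4). The names `MIN_COPIES`, `is_primitive` and `find_perfect_strs` are hypothetical, and imperfect or interrupted repeats are deliberately ignored.

```python
import re

# Minimum copy numbers per motif length, as quoted in the fig. 1 legend:
# 12 for mononucleotide runs, 6 for dinucleotides, 4 for 3-6 bp motifs.
MIN_COPIES = {1: 12, 2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_perfect_strs(seq):
    """Return sorted (start, motif, copies) tuples for perfect repeats in seq.

    Matches are greedy and non-overlapping within each motif length;
    this is an illustrative simplification.
    """
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-bp unit followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip 'AA', 'ACAC', etc.
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GGCT" + "AC" * 8 + "TTGC" + "A" * 12 + "CG"
print(find_perfect_strs(example))  # [(4, 'AC', 8), (24, 'A', 12)]
```

Changing the values in `MIN_COPIES` changes how many loci are reported, which is one reason why estimates of genome coverage depend on the definition used.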
Fig. 1. (A) Distribution among human intergenic (IGR), near-genic (within 2 kb of transcription start or end sites), exonic, intronic and untranslated regions of ∼1.4 million microsatellites identified by Willems et al. (2014). Minimum length thresholds were 12 repeat copies for mononucleotide runs, 6 for dinucleotides and 4 for 3–6 bp repeat periods. (B) Distribution of microsatellites (periodicity 2–6 bp) with length variants showing significant effects on transcription (eSTRs) in lymphoblastoid cell lines (Gymrek et al. 2016).

Microsatellites are also well-known for their causative roles in as many as 40 neurological diseases including Huntington’s Disease, Friedreich's Ataxia (FRDA), several of the Spinocerebellar Ataxias (SCA), Fragile X syndrome (FRAXA), and Myotonic Dystrophy types 1 and 2 (DM1 and 2) (reviewed in Pearson et al. 2005; Gatchel and Zoghbi 2005; Groh et al. 2014). In many of these diseases, radical expansions of trinucleotide microsatellites are pathogenic. These mutations begin at threshold levels of around 35–40 repeats and can reach hundreds of copies in affected cells (Pearson et al. 2005). Toxicity often results from hyper-expanded polyglutamine tracts translated from exonic microsatellites, but repeats do not have to encode protein to exert pathogenic effects (Gatchel and Zoghbi 2005). In FRAXA, transcription of the FMR1 gene is silenced in alleles with 200+ copies of its 5′ untranslated region (UTR) CGG repeat due to DNA–RNA hybridization between the repeat in mRNA and the gene itself (Colak et al. 2014). Interestingly, individuals with “preexpansion” 55–200 copy alleles show increased transcription of the gene (Tassone et al. 2007). Reduction of gene expression occurs by a different mechanism in FRDA, in which progression of transcription is inhibited due to a secondary structure formed by the microsatellite, in conjunction with epigenetic modifications (Punga and Buhler 2010; Sakamoto et al. 1999).
Another major pathogenic mechanism in microsatellite disease is disruption of splicing (Groh et al. 2014). Several diseases including DM1 and at least two SCAs involve global splicing misregulation due to sequestration of RNA binding proteins by expanded repeats (Echeverria and Cooper 2012; Galka-Marciniak et al. 2012). These effects have also been seen locally. For example, the toxic truncated N-terminal fragment of mutant HTT protein in Huntington’s Disease is generated by CAG repeat length-dependent missplicing (Sathasivam et al. 2013), and the expanded GAA microsatellite associated with FRDA has been shown to affect the splicing efficiency of its gene in model systems (Baralle et al. 2008; Shishkin et al. 2009).

A substantial body of evidence now indicates that many of the transcriptional and RNA-level effects of disease-causing microsatellites are not unique to disease, but instead represent aberrant manifestations of normal microsatellite function. At present, the best known aspect of this is the potential of microsatellites in upstream promoter regions to modulate gene expression levels (reviewed in Sawaya et al. 2012; Press et al. 2014). Scattered examples have been known for many years, and several are now well-replicated. One of the most studied is an (AC)17–39 repeat in the promoter region of the HO-1 gene, polymorphisms of which are associated with cardiovascular disease, cancer, preeclampsia and Parkinson’s disease, reflecting the antioxidant, anti-inflammatory activities of the HO-1 enzyme (Daenen et al. 2016; Chen et al. 2002; Zhang et al. 2014; Ayuso et al. 2014; Kaartokallio et al. 2014). Others include a (CCTTT)8–17 polymorphism in the promoter of the NOS2 gene, which modifies risk of hypertension and several other conditions including psoriasis (Baloira Villar et al. 2014; Chang et al. 2015; Ryk et al. 2014), and a series of repeats in the AVPR1A gene’s promoter region, which have been associated with social behavior in voles, mice and humans (Donaldson and Young 2013; Hammock and Young 2005; Wang et al. 2016; Walum et al. 2008). One of the most notable examples from a medical standpoint is an A(TA)6–7 TAA polymorphism in the promoter (TATAA box) of the bilirubin UDP-glucuronosyltransferase 1 gene. Individuals with Gilbert’s syndrome are homozygous for the longer allele, which is associated with reduced gene expression (Bosma et al. 1995). It also has major effects on metabolism of the anticancer drug irinotecan (Hoskins et al. 2007). Other well-studied examples of promoter-associated microsatellites are reviewed elsewhere (Sawaya et al. 2012).

While promoter loci have been given the most attention to date, single-gene studies have also identified expression-altering microsatellite variants in introns (Zhang et al. 2009; Zakieh et al. 2013; Li et al. 2013; Agarwal et al. 2000; Gebhardt et al. 1999), and UTRs (Chen et al. 2007; Gau et al. 2011; Nagalingam et al. 2014; Galindo et al. 2011; Balasubramaniam et al. 2013; Kumar and Bhatia 2016). Demonstrated examples of gene expression modulation by microsatellite polymorphism remain isolated at present, but evidence has recently emerged that the phenomenon is widespread in the human genome. Studies of expression quantitative trait loci (eQTL) have shown that a substantial proportion of the heritability of human gene expression levels attributable to common variants in cis is due to STR polymorphism (Gymrek et al. 2016; Quilez et al. 2016). This contribution has likely gone largely unaccounted for in genome-wide association studies (GWAS) because the frequency and diversity of microsatellite polymorphism are much higher than those of single nucleotide polymorphism (SNP) (Willems et al. 2014; Quilez et al. 2016; Gymrek 2017).

Normal microsatellite polymorphism has also been linked to alternative splicing.
In 2003, Hui and colleagues showed that splicing efficiency of transcripts from the eNOS gene in minigene constructs depended on the length and sequence of an intronic (CA)19–38 repeat (Hui et al. 2003). The same authors later reported that CA microsatellites in the introns of several other genes could act as splicing enhancers or suppressors (Hui et al. 2005). Other examples include modification of pathogenic splicing in cystic fibrosis by a (CA)9–13 repeat in the CFTR gene (Cuppens et al. 1998).

Adjustment of transcriptional frequency and mRNA splicing are by no means the only aspects of genomic regulation for which the unique properties of microsatellites have been harnessed. Evidence has indicated roles in modulating mRNA stability (Chen et al. 2007), selection of transcription start and termination sites (Kramer et al. 2013; Tseng et al. 2013), enhancer function (Kumar et al. 2013; Gebhardt et al. 1999; Gymrek et al. 2016), nucleosome positioning and modification (Iyer and Struhl 1995; Liu et al. 2006; Zhao et al. 2015; Gymrek et al. 2016; Quilez et al. 2016), higher order chromatin structure (Pathak et al. 2013; McNeil et al. 2006; Subramanian et al. 2003a), noncoding RNAs (ncRNAs) (Amiteye et al. 2013; Zheng et al. 2010), and meiotic recombination hot spots (Gendrel et al. 2000; Kirkpatrick et al. 1999; Choi et al. 2013).

Also notable is the surprising importance of exonic microsatellites. These are often highly conserved and are more common than expected in view of their potential to disrupt gene function (Schaper et al. 2014; Loire et al. 2013; Gymrek et al. 2017). They mostly consist of trinucleotide repeats, which have functional roles encoding runs of particular amino acids. Variation in these repeats has been associated with diverse phenotypic changes including skeletal morphology in dogs and receptor protein levels in humans (Fondon and Garner 2004; Brockschmidt et al. 2007).
Interestingly, some exonic dinucleotide microsatellites are also maintained, which is mysterious given that their length changes are expected to cause frameshift mutations in downstream coding sequence (Haasl and Payseur 2014). Indeed, it seems reasonable to speculate that this potential for frameshifts may underlie the propensity of human DNA polymerases to avoid causing mutations that remove interruptions to microsatellites, at least in the case of poly-A repeats (Ananda et al. 2014), although regulatory frameshifting has been described (reviewed in Ketteler 2012; Moxon et al. 2006).

As these observations suggest, current understanding of microsatellite biology in general remains limited. However, while the number of microsatellites with demonstrated function remains very small relative to their overall abundance, some functional mechanisms have been described in detail, and several themes are evident (table 1).
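The contrast between exonic trinucleotide and dinucleotide repeats comes down to simple arithmetic on the triplet reading frame. The sketch below is an illustration only; the function name `shifts_reading_frame` is hypothetical. It checks whether gaining or losing a given number of repeat copies changes the coding length by a multiple of 3 bp.

```python
def shifts_reading_frame(motif_length, copy_change):
    """True if gaining (+) or losing (-) copy_change copies of a motif of
    motif_length bp changes coding length by a non-multiple of 3 bp."""
    return (motif_length * copy_change) % 3 != 0


# Trinucleotide repeats never shift the frame, whatever the copy-number change.
print(any(shifts_reading_frame(3, d) for d in range(-10, 11)))  # False

# Dinucleotide repeats shift the frame unless the change is a multiple of 3 copies.
print([d for d in range(1, 7) if not shifts_reading_frame(2, d)])  # [3, 6]
```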
Table 1

Biological Processes Influenced by Microsatellites in Healthy Cells

Process | Gene (Organism) | Repeat Motif | Ref.
Binding of transcription factors to microsatellite DNA | SLC11A1 (human) | GT (imperfect) | Bayele et al. 2007, Taka et al. 2013
 | ECE-1c (human) | CA (imperfect) | Li et al. 2012
 | TH (human) | TACT | Albanese et al. 2001
 | PIG3 (human) | TGYCC | Contente et al. 2002
 | nadA (N. meningitidis) | TAAA | Martin et al. 2005
Spacing between promoter elements | GP91-PHOX (human) | CA | Uhlemann et al. 2004
 | IGF1 (human) | CA | Chen et al. 2016
Long-range interactions | Intergenic (Drosophila & human) | GATA | Kumar et al. 2013
Transcription start site selection | HO-1 (human) | AC | Kramer et al. 2013
 | ECE-1c (human) | CA (imperfect) | Li et al. 2012
Transcription end site selection | ASS1 (human) | GT | Tseng et al. 2013
RNA half-life | FGF9 (human) | TG (imperfect) | Chen et al. 2007
Alternative splicing | APOA2 (human) | GT | Cuppens et al. 1998
 | CFTR (human) | TG | Hefferon et al. 2004
 | eNOS (human) | CA | Hui et al. 2003
 | Various (human) | CA | Hui et al. 2005
Nucleosome packaging | HIS3 (S. cerevisiae) | A | Iyer and Struhl 1995
 | CSF1 (human) | TG | Liu et al. 2001, Liu et al. 2006
 | CYC1 (S. cerevisiae) | CG | Wong et al. 2007
 | Genomic (human) | BAA | Zhao et al. 2015
Histone modification | Genomic (human) | Various | Gymrek et al. 2016
Methylation | Genomic (human & chimpanzee) | CG | Fukuda et al. 2013
 | Genomic (human) | CG | Quilez et al. 2016
Noncoding RNA function | Genomic (Drosophila) | AAGAG | Pathak et al. 2013
 | Genomic (mammals) | GAA | Zheng et al. 2010
Meiotic recombination | ARG4 (S. cerevisiae) | TG | Gendrel et al. 2000
 | HIS4 (S. cerevisiae) | CCGNN | Kirkpatrick et al. 1999
 | Genomic (A. thaliana) | CCT & CCN | Choi et al. 2013, Shilo et al. 2015

Note.—Studies Only Considering Low-Copy STRs Are Not Included


Transcription Factor Binding

Modulation of transcription factor binding by microsatellite length changes may seem the most obvious explanation for STR eQTL, but demonstrated examples of this are quite rare. The GAGA factor, which binds to short GA repeats and modulates chromatin structure, is well-known for its involvement in a significant class of promoters (Adkins et al. 2006; Valipour et al. 2013; Fuda et al. 2015). However, while a small number of long GA microsatellites capable of modulating gene expression in reporter plasmids can be found in promoters (Valipour et al. 2013), functional repeats bound by the GAGA factor are mostly shorter than five copies (Omelina et al. 2011; van Steensel et al. 2003).

Few experimental studies have demonstrated direct transcription factor binding to microsatellites as normally defined. Some of the earliest evidence was reported in 2001, for a (TACT)5–10 repeat in the first intron of the human TH gene (Albanese et al. 2001). This microsatellite binds the transcription factor HBP1 and the zinc finger protein ZNF191, and exerts a copy number-dependent silencing effect on the gene. In contrast, another early study showed a stimulatory effect on transcription by a (TGYCC)10–17 repeat located between the positions +451 and +517 of the gene PIG3 (Contente et al. 2002). This study used a variety of methods to show that binding of the microsatellite by the tumor suppressor protein p53 was necessary and sufficient for transcription, the frequency of which correlated with repeat copy number. Evidence for the evolutionary conservation of this functional mechanism has been seen in Neisseria meningitidis. Phase-variable expression of the nadA virulence gene of this pathogenic bacterium is regulated at the transcriptional level by a (TAAA)4–12 promoter microsatellite, which is bound by the transcription factor IHF in a copy number-dependent manner (Martin et al. 2005).
The link with transcription factor binding is more complex in the case of a (GT)5AC(GT)5AC(GT)9–10 microsatellite in the proximal promoter of the gene SLC11A1 (also known as NRAMP1), where gain or loss of a single AC repeat copy can cause a several-fold change in transcription level in reporter plasmid assays (Searle and Blackwell 1999). The functional role of the microsatellite is partly due to two dinucleotide insertions interrupting the repeated motif, which create binding sites for the hypoxia-inducible protein HIF-1 (Bayele et al. 2007). Changes in repeat copy number are associated with the promoter’s response to binding of the transcription factor ATF-3 at an adjacent site (Taka et al. 2013). As outlined below, a possible mechanism for this is microsatellite-mediated disruption of local nucleosomes (fig. 2). A [CA]6[CpG]14–24[CA]30–50 compound repeat in the ECE-1c gene’s promoter, which is associated with Alzheimer’s disease, also has an unusual relationship with transcription factors. Similarly to the SLC11A1 locus, luciferase assays showed that one particular allele of this microsatellite causes substantially higher levels of ECE-1c expression than any of the other alleles (Li et al. 2012). This effect was linked to binding of the transcription factors SFPQ and PARP-1.
Fig. 2. Several types of microsatellite have been shown to resist packaging into nucleosomes, facilitating the binding of transcription factors (TF) and other proteins to nearby DNA. This sometimes involves induced nonB-DNA structure formation by the microsatellite. TFs also bind directly to some microsatellites. Additionally, STR eQTLs have been associated with epigenetic chemical modifications of nucleosomes, including regulatory methylation and acetylation of histone proteins.

In addition to experimentally verified examples, it has been observed that promoter-associated STR eQTL significantly overlap with known transcription factor binding sites (Quilez et al. 2016). Also consistent with the hypothesis of a common role for transcription factors in mediating microsatellite function are observations of tissue- and cell-type-specific effects of polymorphic microsatellites on target gene expression (Chen et al. 2007; Chiba-Falek and Nussbaum 2001; Albanese et al. 2001; Borrmann et al. 2003). However, it is notable that known examples of transcription factor binding to microsatellites are mostly limited to repeats of longer periodicity and lower uniformity, and because these are less mutable than perfect repeats of short motifs (reviewed in Ellegren 2004), the potential magnitude of their contribution to phenotypic variation is correspondingly lower.

Spacing between Regulatory Elements

The above-mentioned studies show that the direction of correlation between microsatellite length and transcriptional frequency is context-dependent, and maximal activity is sometimes seen for alleles of intermediate length (Morris et al. 2010; Li et al. 2012; Contente et al. 2002). In view of the importance of maintaining particular distances between promoter elements in many contexts (Vardhanabhuti et al. 2007), these observations suggest that the potential of microsatellites to modulate these distances may be underappreciated. At least two studies have found suggestive evidence consistent with the concept. Copy number of a (CA)17–21 microsatellite in the promoter of the IGF1 gene correlates inversely with transcription, but this effect is only seen in the presence of a flanking SNP haplotype (Chen et al. 2016). The haplotype provides a binding site for CCAAT/enhancer-binding-protein δ (C/EBPD), which is essential for the eSTR activity of the microsatellite, and the transcription factor FOXA3 may also be involved (Chen et al. 2016). Other work suggests the possible involvement of DNA looping in at least some distal interactions interposed by microsatellites (fig. 3). The GP91-PHOX gene’s promoter contains a (TA)11–26 repeat, the copy number of which correlates with NADPH-oxidase activity (Uhlemann et al. 2004). The correlation shows regular periodicity, with around five repeat copies between each of three observed maxima. This distance coincides with the approximate length of one helical turn, and similar periodic correlations have been seen at loci where looping is known to occur between two promoter elements either side of a sequence of variable length (Lewis and Adhya 2002; Perez et al. 2000). Also consistent with a function in modulating spacing between functional elements, comparative work has revealed that microsatellites show clear length-specific as well as motif-specific enrichment, with several trends conserved among species (Ramamoorthy et al. 2014).
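The link between a spacing of about five dinucleotide copies and one helical turn can be made explicit with a short calculation. The sketch below assumes an average B-DNA helical repeat of 10.5 bp per turn, a standard textbook value that is not stated in the text above; the names `BP_PER_TURN`, `copies_per_turn` and `helical_phase_degrees` are hypothetical.

```python
BP_PER_TURN = 10.5  # assumed average helical repeat of B-form DNA (not from the text)


def copies_per_turn(motif_length, bp_per_turn=BP_PER_TURN):
    """Number of repeat copies spanning one full helical turn."""
    return bp_per_turn / motif_length


def helical_phase_degrees(motif_length, copies, bp_per_turn=BP_PER_TURN):
    """Rotational offset (0-360 degrees) introduced by a repeat tract,
    relative to a tract spanning a whole number of helical turns."""
    length_bp = motif_length * copies
    return (length_bp % bp_per_turn) / bp_per_turn * 360.0


print(copies_per_turn(2))  # 5.25 dinucleotide copies per turn

# Rotational phase of each allele across the (TA)11-26 range described above.
for n in range(11, 27):
    print(n, round(helical_phase_degrees(2, n), 1))
```

Under this assumption, alleles differing by about five copies place flanking sites on nearly the same face of the helix, which is the geometric intuition behind the observed periodicity.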
Fig. 3. Microsatellite change-of-length mutations may act by altering distances between flanking protein binding sites. Observations of high DNA flexibility at microsatellite sequences, and of periodicity approximating helical turn length in a correlation between microsatellite copy number and transcription, suggest the potential for looping to mediate this mechanism where sufficient distance exists between interacting elements, although closer interactions could occur without looping. Two possible scenarios are binding between an enhancer and a transcription factor flanking a microsatellite (TF; A), and between two TFs (B). Adoption of a nonB-DNA structure such as Z-DNA by the microsatellite could also play a role, because such structures absorb supercoiling energy and should thereby reduce the bending potential of surrounding DNA (B).

Links to Enhancer Function

Despite intronic microsatellites being very common among known STR eQTLs (Gymrek et al. 2016; fig. 1), mechanisms underlying their effects on gene expression (Agarwal et al. 2000; Zakieh et al. 2013) have been studied relatively little in comparison to the promoter-associated loci discussed above. A notable exception is the breast cancer-associated (CA)14–21 repeat in intron 1 of the EGFR gene, which inhibits transcription by as much as 5-fold at higher copy numbers both in vitro and in vivo, although other regulatory mechanisms can suppress the effect (Gebhardt et al. 1999, 2000; Buerger et al. 2004). This microsatellite is located between two enhancer elements, one upstream of the promoter and one downstream in intron 1, the activity of which depends on presence of the upstream element (Maekawa et al. 1989). Analysis of the curvature of the repeat DNA and its flanking sequences, based on trinucleotide bending propensity parameters deduced from DNase I digestion data, suggested that the region was highly bendable, and more so at higher repeat copy numbers (Gabrielian et al. 1996). This led to the proposal that the microsatellite could influence interaction between flanking regulatory elements (Gebhardt et al. 1999). The propensity of some microsatellites to form Z-DNA or other structural variants could contribute to such interactions, since while DNA looping may normally require binding of architectural proteins (reviewed in Olson et al. 2013), nonB-DNA structures are expected to modify the process by relieving the torsional tension of nearby DNA, increasing the energy required for it to bend (Benham et al. 2010; Mogil et al. 2016; fig. 3). Additional evidence consistent with a distal role for some microsatellites in modulating enhancer activity was reported by a study showing that GATA repeats can block interaction between enhancers and promoters in vivo (Kumar et al. 2013).
A more general role in mediating distal interactions was suggested by a study of long-range contacts revealed by 5C experiments, which showed enrichment of low-copy STRs in interacting sequences (Nikumbh and Pfeifer 2017). Low-copy STRs have also been shown to act as functional components of enhancers. A computational analysis in Drosophila identified 2–4 copy repeats of CA, GA, CG, and GATA among the most enriched and discriminative enhancer motifs, and went on to demonstrate that insertion of these elements into nonfunctional sequence could generate enhancer activity (Yanez-Cuna et al. 2014). Supporting a similar role for longer microsatellites, a genome-wide study found enrichment of STR eQTL variants near the enhancer histone mark H3K27ac (Gymrek et al. 2016).

Effects on Alternative Splicing

The best-described examples of functional intronic microsatellites exert their effects at the level of alternative splicing. Perhaps the first reported example of this outside the context of trinucleotide expansion disease was a (GT)16 repeat in the 3′ splice site of the human APOA2 gene’s second intron. This replaces the poly-pyrimidine tract known to be common at 3′ splice sites, and efficient splicing was found to depend on the number of GT repeats present (Shelley and Baralle 1987). Some years later, work on the CFTR gene involved in cystic fibrosis revealed a similar effect for a TG microsatellite located near the exon 9 splice acceptor site. A pathogenic T5 variant at this splice site is associated with exon skipping and disease (Chu et al. 1993), and its effect is modified by the length of an adjacent (TG)9–13 repeat (Cuppens et al. 1998). Several mechanisms have been proposed to explain this (Cuppens et al. 1998; Hefferon et al. 2004; Groman et al. 2004; Zuccato et al. 2004; Buratti et al. 2001). Involvement of the protein TDP-43 was indicated by a study showing that it binds to the microsatellite, and that presence of misspliced mRNA without exon 9 correlates with the expression level of the protein (Buratti et al. 2001). Other evidence has implicated TIA-1 protein in the process (Zuccato et al. 2004). However, based on a study of the effects of replacing the microsatellite with various sequences of similar length, RNA secondary structure may play a role (Hefferon et al. 2004). This study showed greater splicing efficiency in the proximity of sequences with the potential to form RNA hairpins, and two other observations pointed to the importance of this property. First, differences between substituted dinucleotide repeats of various motifs were far greater than differences between (TG)8 and (TG)12, suggesting that the link between microsatellite copy number and splicing efficiency at the locus does not primarily relate to relative positioning of adjacent elements.
Second, a similar copy number-dependent suppressive effect to that of poly-TG was shown for a poly-TA substitute, also arguing against binding of sequence-specific splicing effector proteins as the main functional mechanism. Interestingly, splicing was most efficient for sequences predicted to form low-stability hairpins, suggesting transient structure formation (Hefferon et al. 2004). In this context, it is notable that intronic G-quadruplex structures have also been shown to modulate alternative splicing (Ribeiro et al. 2015; Didiot et al. 2008). RNA structures in general may act by causing substantial changes to the distances between elements of the splicing process, or by impeding the progression of RNA polymerase, changing the time window for splicing regulatory sequences to be recognized (Nieto Moreno et al. 2015).

In contrast, support for protein binding as a primary mediator of microsatellites’ effects on splicing has been seen in the eNOS gene. Copy number of a (CA)19–38 repeat near the 5′ splice site of the gene’s 13th intron correlates with the efficiency with which this intron is excised, and this splicing enhancer activity depends on the RNA-binding protein hnRNP L (Hui et al. 2003). Poly-CA is not structurally equivalent to poly-TG in RNA, and does not form hairpins (Hefferon et al. 2004). Poly-CA microsatellites have also been shown to enhance splicing when inserted at various alternative intronic positions, and generation of cryptic splice sites has been demonstrated in some cases (Hui et al. 2005). Investigating the prevalence of this phenomenon, one study identified several hundred AC microsatellites located close to alternatively spliced exons in the human genome, and performed experimental validation for four of these, demonstrating splice-enhancer effects for two, and suppressive effects by the other two (Hui et al. 2005). Position relative to the splice site was suggested as a potential determinant of positive or negative regulation.

Distinct Functions Observed for UTR Microsatellites

Several UTR microsatellite polymorphisms have been shown to modulate gene expression (Chen et al. 2007; Kumar and Bhatia 2016; Joshi-Saha and Reddy 2015). Like intronic microsatellites, the mechanisms underlying their activity have not been investigated to the same degree as some promoter-associated loci, but some interesting distinct mechanistic details have emerged. Perhaps the most notable example to date is the complex (TG)3TA(TG)13–16TA(TG)3 microsatellite in the 3′ UTR of the FGF9 gene. One study showed that effects of polymorphism in this repeat on transcription depend on its orientation as well as its position, and are cell-type specific (Chen et al. 2007). The authors of this study noted the microsatellite’s capacity to form hairpin structures and tested its effects on mRNA stability, showing half-life differences of >50% between alleles differing by only one repeat copy. A later study showed binding of the same microsatellite in mRNA by the protein FUBP3, which was associated with regulation at the level of translation, though the mechanism for this was not explored (Gau et al. 2011). In contrast, a (TC)8–21 polymorphism in the 5′ UTR of the Tdc gene in Catharanthus roseus was shown to have no effect on translation, but to modulate rate of transcription (Kumar and Bhatia 2016). Measuring transcription rate is an advance on the usually measured parameter, transcript abundance, which does not distinguish between effects on transcription and effects on mRNA half-life. It seems likely that intronic and exonic microsatellites could also affect mRNA half-life, but to date demonstrated examples of this are lacking.

Microsatellites in UTRs can also influence the location of transcription initiation and termination sites. The human HO-1 gene utilizes several alternative transcription start sites (TSS) downstream of the canonical start codon of its first exon, and the relative abundance of these isoforms correlates with the length of a well-studied poly-AC promoter microsatellite (Kramer et al. 2013).
In the ECE-1c gene’s promoter, an alternative TSS has been observed within a [CA]6[CpG]14–24[CA]30–50 compound repeat, and this is mediated by binding of the PARP-1 protein to the microsatellite (Kraus and Lis 2003; Li et al. 2012). Observations that STRs are strikingly more common, and also more conserved, close to TSSs suggest that this may be a common phenomenon (Sawaya et al. 2013). Some evidence also indicates a role for microsatellites in determining the 3′ ends of transcripts. A (GT)14–25 repeat in the 3′ UTR of the human ASS1 gene serves as the poly(A)-downstream GU-rich element to modulate mRNA 3′-end formation, with repeat copy number correlating with the relative abundance of two alternative termination sites (Tseng et al. 2013).

NonB-DNA Structure Formation

In some cases the functional roles of microsatellites have been linked to their potential to adopt nonB-DNA structures. These include several well-described conformations which are energetically less favorable than normal B-form DNA, but inducible by torsional stress (reviewed by Mirkin 2008; fig. 4). The earliest of these to be discovered was Z-DNA. Named for its characteristic zig-zag sugar-phosphate backbone, the left-handed Z-DNA helix is most readily taken up by sequences in which purine and pyrimidine nucleotides alternate, including poly-AC and poly-CG of moderate length (Wang et al. 1979; Wong et al. 2007; Liu et al. 2006). Given that AC is the most commonly repeated motif among mammalian microsatellites, other than mononucleotide arrays (Ellegren 2004), Z-DNA may be the conformational variant most relevant to microsatellite function in mammals. It binds several different proteins (Rich and Zhang 2003; Wang and Vasquez 2007), and can affect gene expression when present in promoter regions (Rothenburg et al. 2001; Wong et al. 2007; Zhang et al. 2006; Liu et al. 2006; Oh et al. 2002).
Fig. 4. Unusual structural conformations adopted by microsatellites. A/T-rich sequences can form SIDD, Z-DNA can be formed by poly-AC and poly-CG repeats, sequences with four closely spaced stretches of multiple guanine residues can fold into G-quadruplexes, and H-DNA can be adopted by poly-purine sequences with mirror symmetry including poly-AG.

Although they also consist of alternating purines and pyrimidines, poly-AT microsatellites have much lower Z-DNA forming potential than poly-AC or poly-CG (Ho et al. 1986). However, their low base-pairing stability facilitates formation of cruciform or stress-induced duplex destabilized DNA (SIDD) structures, depending on conditions (Aranda et al. 1997). While cruciforms may be absent from chromosomal DNA in vivo, SIDD is quite prevalent (Kouzine et al. 2017), and evidence from Saccharomyces cerevisiae suggests that it may function to relieve the positive supercoiling generated ahead of processing RNA polymerase complexes, potentially also helping to terminate transcription (Benham 1996; Zaret and Sherman 1982). Duplex melting is also required for formation of G-quadruplexes and intramolecular triplexes/H-DNA. G-quadruplexes are fold-back structures wherein a guanine-rich strand self-associates into square-planar guanine tetrads held together by Hoogsteen hydrogen bonds (Sen and Gilbert 1988). They can be adopted by microsatellites with four or more guanine runs, such as (GGGGTT)4, (GGGT)4, and (GGA)4 (Palumbo et al. 2008; Sundquist and Klug 1989; Ogloblina et al. 2015). In H-DNA, one strand joins adjacent duplex DNA in a triple helix via Hoogsteen bonding (Kohwi and Kohwi-Shigematsu 1988; Dayn et al. 1992). It is most favorable for poly-purine/poly-pyrimidine sequences with mirror symmetry and can be formed by microsatellites including poly-GA and poly-GAA (Potaman et al. 2004; Lu et al. 2003). H-DNA and G-quadruplexes have been linked to regulation of transcription, and also RNA biology (reviewed in Weldon et al. 2016; Murat and Balasubramanian 2014; Jain et al. 2008). However, neither of these two structures is prominent among known STR eQTLs (Gymrek et al. 2016; Quilez et al. 2016). Notably, the supercoiling energy required by nonB-DNA structures may often be provided in vivo by processing polymerases, and it has been suggested that these structures may function to regulate supercoiling (reviewed in Kouzine and Levens 2007; van Holde and Zlatanova 1994). Consistent with this, a recent investigation using permanganate footprinting revealed that 9% of computationally predicted nonB-DNA structures genome-wide were associated with single-stranded DNA in activated mouse B cells, but only in the presence of transcription (Kouzine et al. 2017). H-DNA, G4, Z-DNA, and SIDD were all evident, at genome frequencies of 15,000–23,000 each. The prevalence of G-quadruplex conformations genome-wide has also been demonstrated recently using specific antibodies (Hansel-Hertsch et al. 2016).
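
The sequence rules described in this section can be made concrete with a short scan. The Python sketch below is illustrative only and is not a method from the studies reviewed: the regular expression for G-quadruplex potential (four runs of at least two guanines separated by loops of 1–7 nt), the function names, and the simple mirror-symmetry test are all assumptions chosen for the example. Sequence composition alone does not show that a structure actually forms in vivo.

```python
import re

# Assumed, permissive consensus: four G-runs (>= 2 G each) with 1-7 nt loops.
# DNA alphabet only; this is a heuristic, not a structure prediction.
G4_PATTERN = re.compile(r"(?:G{2,}[ACGT]{1,7}){3}G{2,}")
PURINES = set("AG")


def has_g4_motif(seq):
    """True if seq contains four closely spaced guanine runs."""
    return G4_PATTERN.search(seq.upper()) is not None


def longest_alternating_run(seq):
    """Length of the longest stretch of strictly alternating purines and
    pyrimidines (the sequence feature that favors Z-DNA).
    Any non-A/G character is treated as a pyrimidine (simplifying assumption)."""
    best = run = 0
    prev = None
    for base in seq.upper():
        is_purine = base in PURINES
        run = run + 1 if (prev is not None and is_purine != prev) else 1
        prev = is_purine
        best = max(best, run)
    return best


def is_mirror_homopurine(seq):
    """True if seq is all-purine or all-pyrimidine and reads the same in both
    directions on one strand (the mirror symmetry that favors H-DNA)."""
    seq = seq.upper()
    one_class = set(seq) <= {"A", "G"} or set(seq) <= {"C", "T"}
    return bool(seq) and one_class and seq == seq[::-1]
```

With these assumed rules, (GGGT)4 and (GGA)4 are flagged as potential G-quadruplex formers, poly-AC alternates along its whole length, poly-A does not alternate at all, and a poly-GAA tract such as GAAGAAGAAG passes the mirror-symmetry test.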

Modulation of Chromatin Structure

The unusual structural properties of microsatellites can have functionally relevant influence on chromatin structure. This has been known for many years in trinucleotide repeat disease (Volle and Delaney 2012; Wang 2007; Evans-Galea et al. 2013), though recently it is less commonly addressed by studies of functional microsatellites, which have often been limited to plasmid-based validation work. Some of the most informative experiments connecting normal microsatellites to chromatin structure were done on poly-A runs more than twenty years ago. These extremely common repeats have the potential to resist nucleosome formation due to structural stiffness (Nelson et al. 1987), but are often excluded from definitions of microsatellites, and ignored by functional studies, despite evidence that they can stimulate transcription when present in yeast promoter regions (Iyer and Struhl 1995; Struhl 1985; Schlapp and Rodel 1990). A study of an A15–17 array in the promoter of the HIS3 gene in S. cerevisiae showed that its effect on transcription was not caused by direct protein binding, but was instead due to binding of the transcription factor Gcn4 at a distance of 10 bp (Iyer and Struhl 1995). Although poly-G has different structural properties (Panyutin et al. 1989), substituting poly-G tracts of similar length produced similar results (Iyer and Struhl 1995). This study revealed that poly-A tracts perturbed chromatin structure over ∼200 bp, making the Gcn4 binding site more accessible. Overall similarity of micrococcal nuclease cleavage patterns in the presence or absence of poly-A suggested that altered nucleosome phasing or nucleosome-free DNA was not involved, and it was proposed that nucleosomes covering poly-A may be destabilized and less effective in competition with transcription factors (Iyer and Struhl 1995). 
However, it is notable that mononucleotide repeats don’t always have the effect of reducing nucleosome stability. A 30 bp poly-A tract in the human HGF promoter region is normally associated with a tightly packaged promoter, inaccessible to DNase I digestion, but truncations of the repeat cause loosening of the chromatin structure, modified protein binding and stimulation of the promoter in breast cancer tissue (Ma et al. 2009). Several studies have shown that packaging into nucleosomes is disfavored for the Z-DNA-forming sequences poly-CG and poly-AC (Garner and Felsenfeld 1987; Wong et al. 2007; Liu et al. 2001). In the case of the poly-AC microsatellite in the promoter of the CSF1 gene, multiple lines of evidence indicate that Z-DNA formation is stimulated by the BRG1 protein and participates in nucleosome disruption, resulting in transcriptional activation (Liu et al. 2006). The authors of this study proposed that the Z-form may function to relieve negative supercoils induced by nucleosome release, and also to resist replacement of the nucleosome, allowing room to assemble transcriptional machinery. Another example of BRG1 operating in conjunction with a functional poly-AC microsatellite to activate transcription occurs in the HO-1 gene’s promoter, where substituting an alternative Z-DNA-forming sequence produces the same results (Jianyong Zhang et al. 2006). Poly-CG, which has higher Z-DNA forming potential than poly-AC (Ho et al. 1986), can also disrupt nucleosomes in promoter regions and cause position-dependent stimulation of transcription (Wong et al. 2007). The effects of poly-CG on nucleosomes can also be influenced by CpG methylation (Davey et al. 2004), and the potential evolutionary importance of mutations in these repeats has been shown by a study associating them with divergence in methylation and gene expression between humans and chimpanzees (Fukuda et al. 2013). 
The link between microsatellites and methylation was further explored by a study of human promoter-associated STRs, which showed that 463 out of 4,849 repeat polymorphisms tested correlated significantly with CpG methylation levels within 1 kb, though only 8% of these showed significant effects in the same direction in two populations (Quilez et al. 2016). Interestingly, this study found that 96% of promoter microsatellites significantly associated with gene expression also influenced local cytosine methylation status. Many of these microsatellites overlapped with DNase I hypersensitive sites, indicating open chromatin. Unsurprisingly, in view of the degree to which they disturb normal B-DNA structure, the conformational variants G-quadruplex, H-DNA, and SIDD are also associated with reduced nucleosome occupancy (Ruan and Wang 2008; Hansel-Hertsch et al. 2016; Kouzine et al. 2017). The significance of these structures to nucleosomes genome-wide was recently demonstrated in activated mouse B cells (Kouzine et al. 2017). This study showed mild to severe nucleosome depletion at sequences shown to form Z-DNA, G4, H-DNA, and SIDD in the presence of transcription. It is notable, however, that potential to form a nonB-DNA structure is not always relevant, even in promoter regions. For example, a poly-AG microsatellite in the Hsp26 gene’s promoter in Drosophila can form H-DNA, but this property cannot substitute for binding of the GAGA transcription factor in creating an open chromatin configuration (Lu et al. 2003). Some evidence suggests that effects of microsatellites on chromatin are commonly mediated by regulatory chemical modifications of histone proteins. A genome-wide study of STR eQTL showed enrichment in peaks of the histone marks H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K36me3, and H3K9ac, which are associated with regulatory and transcribed regions, and depletion near the H3K27me3 mark, which is associated with repression of gene expression (Gymrek et al. 2016). 
This study also showed significant correlations between variation in regulatory chromatin modifications and variation in STR eQTL genotypes. Suggesting one possible mechanism underlying this epigenetic complexity, blockage of DNA replication, which is known in trinucleotide repeat disease, can result in epigenetic disruption (Svikovic and Sale 2016; Khurana and Oberdoerffer 2015; Gadgil et al. 2016). Long CTG microsatellites have been shown to cause replication stalling by folding into hairpin structures (Liu et al. 2013), and microsatellites of moderate length known to form G-quadruplexes or H-DNA, for example TC20 and TTCC9, can block DNA polymerases in vitro (Hile and Eckert 2004). Interestingly, the mononucleotide repeat T11 is also able to stall in vitro DNA polymerization, perhaps due to DNA bending, which is known to occur adjacent to sequences consisting entirely of adenine or thymine bases without any TA dinucleotides (Hile and Eckert 2008; Hud and Plavec 2003). Z-DNA has been shown to inhibit RNA polymerase (Ditlevson et al. 2008). Replication stalling at abundant secondary structures can cause epigenetic instability in vivo in cells lacking certain factors, though the degree of error-proneness inherent in the systems employed to resolve impediments to replication in normal cells is unclear (Guo et al. 2015; Wu and Spies 2016; Sarkies et al. 2010; Schiavone et al. 2016). However, for this and other potential explanations of STR eQTL involving nonB-DNA structures, evidence is currently lacking that minor changes to microsatellite length commonly affect these structures substantially. This poses a problem because nearly all microsatellite mutations involve only one or two repeat copies (Sun et al. 2012). Departure from B-form DNA is not always a necessary component of microsatellites’ effects on chromatin. 
A human genome-wide survey of GAA microsatellites, known for disrupting nucleosomes when present at extreme lengths corresponding to those seen in FRDA (Ruan and Wang 2008), showed that abundant (GAA)6–8 repeats were also associated with substantial nearby nucleosome depletion (Zhao et al. 2015). Milder effects were seen at distances of up to 400 bp from the microsatellites. Intriguingly, poly-A tracts were very frequently found near the 5′ ends of these repeats, where they were associated with further reductions in nucleosome occupancy. Suggesting that the low flexibility of the AA dinucleotide base-step (Fujii et al. 2007) rather than H-DNA formation was responsible for these effects, CAA, TAA, and GAA microsatellites all showed similar patterns of nucleosome depletion. Another study found TGGA repeats to be the most significant feature in random sequence with outstandingly low nucleosome formation in vitro, and showed that this was not explained by G-quadruplex formation (Cao et al. 1998). Evidence also suggests that some microsatellites may have a role in modulating higher order chromatin structure. Repeats of the motif GATA are enriched in the sex chromosomes of some organisms, and their frequency distributions in the human and mouse genomes show a striking peak in frequency at 10–12 copies, a pattern not seen for other tetranucleotide repeats of similar composition (Subramanian et al. 2003a). The distribution of these repeats on sex chromosomes suggests a link to chromatin domain boundaries. They are >10-fold enriched throughout the 10-Mb segment of human Xp22 that escapes inactivation (McNeil et al. 2006), and their flanking sequences contain patterns characteristic of nuclear matrix attachment (Subramanian et al. 2003a). 
Moreover, work in several organisms has shown that they are bound by Bkm-binding protein, which is predominantly expressed in the germ cells of the heterogametic sex, where sex-determining chromosomes are decondensed and transcriptionally active (Singh et al. 1994).

Roles in Regulatory RNA

Given that 50% or more of mammalian DNA is transcribed, and that many noncoding transcripts are functional (reviewed in Mattick and Makunin 2006; Holoch and Moazed 2015), it is unsurprising that microsatellites have acquired functions in ncRNA as well as in mRNA (fig. 5). An example is the RNA component of the nuclear matrix. One study found that 70% of RNA clones isolated from Drosophila nuclear matrix RNA contained AAGAG microsatellites, transcribed from both strands and likely deriving from pericentromeric regions in which AAGAG repeats are predominantly located (Pathak et al. 2013). Knockdown of poly-AAGAG-containing transcripts by RNA interference resulted in late larval/early pupal lethality. Long ncRNAs primarily consisting of GAA repeats have also been shown to associate with the nuclear matrix (Zheng et al. 2010).
Fig. 5. Effects of microsatellites at the level of RNA. Long noncoding RNAs (lncRNAs) predominantly consisting of microsatellites have been observed to function in the nuclear matrix and to aggregate into nuclear foci with indications of functional significance. They have also been shown to associate with DNA microsatellites in UTRs. Microsatellite-dominated microRNAs have been observed, but their function is not yet clear. Intronic microsatellites can modulate splicing efficiency, including exon skipping and splice site selection, and UTR repeats can influence the locations of the sites of transcription initiation and termination. Transcribed microsatellites can also affect mRNA half-life, which may be due to formation of secondary structures such as hairpins.

At present, evidence for other roles in functional ncRNA is sketchy, but some suggestive observations have been made. One study showed that long ncRNAs containing poly-GAA constitute a distinct class of nuclear-retained RNA which forms foci (Zheng et al. 2010). Microsatellite-based RNA foci can be pathogenic in trinucleotide expansion disease (Echeverria and Cooper 2012; Galka-Marciniak et al. 2012), but they may also have regulatory functions. In mouse cell lines GAA-rich lncRNA foci are found in functionally important areas such as the cytokinetic midbody in late-telophase cells, and are redistributed in response to changes in proliferation status. They also associate with genomic GAA repeats, which are enriched near the 5′ and 3′ ends of genes (Zheng et al. 2010). Another species of ncRNA in which microsatellites may exert some effect is microRNA. The ability of these short 20–24 nt RNAs to regulate transcription has been studied extensively in plants (reviewed in Sunkar and Zhu 2007), and in Boechera it has been observed that microsatellites are prominent features of many of them. 
Out of 994 microRNAs identified by one study, 673 (67%) predominantly consisted of 2–7 repeats of the trinucleotide motifs GAA, GCA, GGA, GGU, UGA and their complements, some of which were conserved in Arabidopsis (Amiteye et al. 2013).
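
To make the notion of a microRNA "predominantly consisting of" a repeated motif more concrete, the following Python sketch measures how much of a short RNA is covered by tandem copies of a given motif. It is a hypothetical illustration and not the procedure used by Amiteye et al. (2013): the function names, the example sequence, and the 50% cutoff for "predominantly" are assumptions made for the example.

```python
import re


def repeat_fraction(seq, motif, min_copies=2):
    """Fraction of seq covered by tandem runs of motif (>= min_copies copies)."""
    seq, motif = seq.upper(), motif.upper()
    if not seq:
        return 0.0
    # e.g. motif "GAA", min_copies 2 -> pattern "(?:GAA){2,}"
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_copies))
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


def is_repeat_dominated(seq, motifs, threshold=0.5):
    """True if any motif's tandem runs cover more than threshold of seq.
    The 0.5 default is an arbitrary cutoff assumed for illustration."""
    return any(repeat_fraction(seq, m) > threshold for m in motifs)


# Hypothetical 21-nt RNA: six GAA copies followed by three other bases.
example = "GAAGAAGAAGAAGAAGAAUCG"
fraction = repeat_fraction(example, "GAA")
dominated = is_repeat_dominated(example, ["GAA", "GCA", "GGA", "GGU", "UGA"])
```

Because the pattern is anchored to one reading phase of the motif, a run that begins mid-motif is counted from its first complete copy, so coverage can be slightly underestimated at the ends of a tract.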

Regulation of Meiotic Recombination Hot Spots

In the human and mouse genomes, a proportion of the hot spots in which meiotic recombination events are most frequent are governed by sequence-specific DNA binding proteins such as PRDM9, in concert with epigenetic processes (reviewed in Paigen and Petkov 2010). However, in organisms which lack a functional PRDM9 system, the basis in sequence of hot spot determination is less clear. Hot spots in these species are often found in gene promoters, and in S. cerevisiae it has been demonstrated that they can be affected by the presence of microsatellites including telomeric sequence (G1–3T1) (White et al. 1993), and poly-AC (Gendrel et al. 2000). The latter study showed inhibition of strand exchange and stimulation of double crossover at the microsatellite. Like transcription, recombination hot spots require an opening of the chromatin structure, and a (CCGNN)12–48 repeat, shown to resist nucleosomes in vitro, can also modulate hot spot activity in S. cerevisiae (Kirkpatrick et al. 1999). Hot spots often contain GC-rich and repetitive sequence naturally, an observation which has led to the additional suggestion of replication pausing mediated by epigenetic marks as a determining mechanism, in view of experiments in yeast demonstrating coupling of replication and meiotic double-strand break formation (Borde et al. 2000; Petes 2001; Bagshaw et al. 2008). More recently, microsatellites have been implicated in plant recombination hot spots (reviewed in Choi and Henderson 2015). One study showed that A-rich and (CTT)2–7 repeat sequences are the most common hot spot-associated motifs in Arabidopsis (Choi et al. 2013). A-rich elements are predominantly found just upstream of hot spot promoter TSSs, overlapping with regions of nucleosome depletion, and CTT repeats are located just downstream of these TSSs, coinciding with crossover peaks. Underlying mechanisms were not explored by this study. 
As mentioned above, these two repeat types are often found in close proximity in the human genome, and both have been linked with nucleosome depletion (Iyer and Struhl 1995; Zhao et al. 2015). However, hot spot-associated CTT repeats in Arabidopsis are associated with peaks in H2A.Z nucleosome occupancy, and with the H3K4me3 histone modification, which has been linked to recombination and transcription (Choi et al. 2013; Shilo et al. 2015). Arabidopsis hot spots are also enriched for repeats of the motif CCN. These show similar patterns of distribution to CTT repeats with respect to nucleosomes (Shilo et al. 2015).

Prevalence of Functional Microsatellites

Several additional lines of evidence suggest that functional microsatellites are more prevalent than has traditionally been appreciated. Most prominent is recent work associating STR genotypes with transcript abundance genome-wide. One study in lymphoblastoid cell lines identified 2,060 significant eQTL STRs, contributing 10–15% to the heritability of human gene expression levels attributable to common variants in cis (Gymrek et al. 2016). As this study was limited to linear correlations between repeat copy number and expression level, it presumably underestimated nonlinear effects, which are likely to be common in view of evidence detailed above that microsatellite alleles of intermediate length often show the most positive associations with transcription. Surprisingly, 69% of the eQTL STRs were in introns, and only 17.7% were in upstream promoters, with 20.8% located >5 kb from any known gene (fig. 1). A similar study of 4,849 promoter-associated microsatellites found 183 significantly associated with nearby gene expression, but only 5% of these showed significant effects in the same direction in two populations (Quilez et al. 2016). The motif-group most frequently seen was AC, though the most overrepresented motifs in both studies included A-rich tri- and tetranucleotide repeats such as AAC and AAAC, which are not known to form nonB-DNA structures. Promoter microsatellites may be more influential in yeast, where correlations between interstrain divergence in gene expression levels and microsatellite variation showed significant effects at around 25% of promoters (Vinces et al. 2009). Several other comparative studies have indicated widespread microsatellite function. It has been known for many years that some microsatellites, including noncoding repeats, have remained conserved between species across hundreds of millions of years (FitzSimmons et al. 1995; Zhang et al. 2006), and recent studies taking advantage of the wealth of genome sequences made available because of the advent of next generation sequencing technology have revealed evolutionary conservation at large numbers of loci, although conservation decays exponentially with phylogenetic distance at many (Buschiazzo and Gemmell 2010; Sawaya et al. 2012). Conservation has been found to be highest for microsatellites in UTR and coding regions, and emergence of new microsatellites happens most often in these areas (Sawaya et al. 2012). Some repeat motifs are more conserved than others, notably AC. The most conserved locations tend to be near TSS, even when 5′ UTR loci are not considered (Sawaya et al. 2012). Supporting the functional significance of conserved microsatellites, a recent comparison of primate genomes revealed that genes with orthologous microsatellites in upstream or transcribed regions consistently show elevated interspecies divergence in gene expression levels across various tissue types (Bilgin Sonay, Carvalho, et al. 2015). It is notable in this context that microsatellite function is not necessarily limited to highly conserved loci. Evidence suggesting the potential importance of primate-specific repeats includes exceptional expansion or contraction in the primate lineage of core promoter microsatellites in several genes related to neuronal and craniofacial development (Namdar-Aligoodarzi et al. 2015; Ohadi et al. 2015). The potential significance of variation in microsatellites to brain development is also indicated by their enrichment and conservation in genes connected with neurological and other developmental systems, and by the large number of microsatellite-phenotype associations reported for such genes (Bolton et al. 2013; Fondon et al. 2008; Nithianantharajah and Hannan 2007; Sawaya et al. 2012). Additional support for prevalent microsatellite function has been gathered through interspecies comparisons of their distribution relative to other genomic elements. 
In maize, for example, microsatellite densities were found to be highest in 5′ UTR, followed by 3′ UTR, promoter, intronic, intergenic, and protein coding regions (Qu and Liu 2013). A study of 29 land plant species also found the highest densities in 5′ UTRs, followed by promoters, while in two algal species, densities were highest in introns and coding regions respectively, with intronic microsatellites concentrated near intron–exon boundaries (Zhao et al. 2014).
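
The caveat raised earlier in this section, that a test limited to linear correlations will underestimate effects in which alleles of intermediate length show the strongest association with expression, can be illustrated with synthetic numbers. The Python sketch below uses entirely hypothetical data (repeat copy numbers of 10–20 with expression peaking at 15 copies) and a hand-written Pearson correlation; it is not a reanalysis of any study cited here, and the variable names are assumptions for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: expression is highest at an intermediate allele (15 copies)
# and falls off symmetrically for shorter and longer alleles.
copy_numbers = list(range(10, 21))
expression = [100 - (n - 15) ** 2 for n in copy_numbers]

# A purely linear test sees essentially no association...
r_linear = pearson_r(copy_numbers, expression)

# ...whereas distance from the intermediate allele explains expression fully.
squared_distance = [(n - 15) ** 2 for n in copy_numbers]
r_nonlinear = pearson_r(squared_distance, expression)
```

In this constructed case the linear correlation is zero even though expression is completely determined by repeat length, which is the kind of locus a linear eQTL scan would miss. The example assumes the optimum allele is known in advance; in real data it would have to be estimated, for instance by adding a quadratic term to the model.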

Future Perspectives

Genome-wide studies are likely to continue identifying functional microsatellites. Until recently, significant obstacles to the incorporation of large numbers of repeat loci into GWAS in humans included practical difficulties with large scale genotyping, problems with mapping short sequence reads, and statistical hypothesis testing issues generated by multiple alleles per locus, but these are now being alleviated through theoretical and technological developments, including less expensive, longer read sequencing (Press et al. 2014; Li et al. 2017; Shin et al. 2017; Gymrek 2017). Large numbers of microsatellites have already been directly incorporated into GWAS in Drosophila; for example, a study involving three traits, 2.5 million SNPs and 78,000 microsatellites found that the representation of microsatellites among significantly phenotype-associated loci at the level of P < 10⁻⁶ was 5.6%, even though they only comprised 3% of markers used (Mackay et al. 2012). Relatively cheaper approaches to identifying functional microsatellites include deep sequencing around existing GWAS hits, and investigation of highly conserved loci associated with genes of interest, both of which have found some success (Grunewald et al. 2015; Bagshaw et al. 2017). Somatic mutation is another aspect of microsatellite biology made visible by high throughput sequencing. Questions of growing tractability include the degree to which this causes ageing and age-related disease (Bavarva et al. 2014; Kurz et al. 2015), and the possibility that it functions in normal brain development (Nithianantharajah and Hannan 2007). One particularly interesting hypothesis is that microsatellite variation provides a mechanism of rapid adaptation for individual developing neurons, mirroring its potential role at the level of whole organisms (Nithianantharajah and Hannan 2007). 
Also under investigation are the functional effects of global microsatellite instability associated with some colorectal, gastric and other cancers (Kim and Park 2014; Bilgin Sonay, Koletou, et al. 2015; Hogan et al. 2015). In conjunction with available genome-wide functional data, high throughput technologies will find additional applications in elucidating microsatellites’ functional mechanisms, for example in the identification of molecular networks affected by expression-altering variants. In view of the population specificity of the effects of promoter-associated STR eQTL, exploration of the genetic backgrounds modifying their effects seems particularly relevant (Quilez et al. 2016). Intronic loci are likely to be another immediate focus, given their unexpected enrichment among STR eQTL (Gymrek et al. 2016), and UTR microsatellites are also of increased current interest in view of the above-mentioned study of gene expression divergence between primates, which revealed that 3′ UTR loci have more influence than those in promoter, exonic, or intronic regions (Bilgin Sonay, Carvalho, et al. 2015). As these examples illustrate, it seems likely that microsatellites, once thought of as generally neutral, retain considerable capacity to surprise genomic investigators with their diverse, pervasive functional significance.

Acknowledgments

Funding for this work was provided by the University of Otago, New Zealand. I thank Kateryna Makova and anonymous reviewers for helpful comments on the manuscript.
  203 in total

1.  A functional assay in Escherichia coli to detect non-assisted interaction between galactose repressor dimers.

Authors:  N Perez; M Rehault; M Amouyal
Journal:  Nucleic Acids Res       Date:  2000-09-15       Impact factor: 16.971

2.  The (CCTTT) n pentanucleotide repeat polymorphism in the inducible nitric oxide synthase gene promoter and the risk of psoriasis in Taiwanese.

Authors:  Ya-Ching Chang; Wei-Ming Wu; Yu-Huei Huang; Wen-Hung Chung; Hsin-Yi Tsai; Lung-An Hsu
Journal:  Arch Dermatol Res       Date:  2015-02-08       Impact factor: 3.017

Review 3.  Meiotic recombination hotspots - a comparative view.

Authors:  Kyuha Choi; Ian R Henderson
Journal:  Plant J       Date:  2015-05-20       Impact factor: 6.417

4.  A distinct triplex DNA unwinding activity of ChlR1 helicase.

Authors:  Manhong Guo; Kristian Hundseth; Hao Ding; Venkatasubramanian Vidhyasagar; Akira Inoue; Chi-Hung Nguyen; Rula Zain; Jeremy S Lee; Yuliang Wu
Journal:  J Biol Chem       Date:  2015-01-05       Impact factor: 5.157

5.  Somatic mutation and functional polymorphism of a novel regulatory element in the HGF gene promoter causes its aberrant expression in human breast cancer.

Authors:  Jihong Ma; Marie C DeFrances; Chunbin Zou; Carla Johnson; Robert Ferrell; Reza Zarnegar
Journal:  J Clin Invest       Date:  2009-02-02       Impact factor: 14.808

6.  Sex-specific mediation effect of the right fusiform face area volume on the association between variants in repeat length of AVPR1A RS3 and altruistic behavior in healthy adults.

Authors:  Junping Wang; Wen Qin; Feng Liu; Bing Liu; Yuan Zhou; Tianzi Jiang; Chunshui Yu
Journal:  Hum Brain Mapp       Date:  2016-03-29       Impact factor: 5.038

7.  DNA phasing by TA dinucleotide microsatellite length determines in vitro and in vivo expression of the gp91phox subunit of NADPH oxidase and mediates protection against severe malaria.

Authors:  Anne-Catrin Uhlemann; Nicole A Szlezák; Reinhard Vonthein; Jürgen Tomiuk; Stefanie A Emmer; Bertrand Lell; Peter G Kremsner; Jürgen F J Kun
Journal:  J Infect Dis       Date:  2004-05-25       Impact factor: 5.226

8.  A survey of tandem repeat instabilities and associated gene expression changes in 35 colorectal cancers.

Authors:  Tugce Bilgin Sonay; Malamati Koletou; Andreas Wagner
Journal:  BMC Genomics       Date:  2015-09-16       Impact factor: 3.969

9.  CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis.

Authors:  GiWon Shin; Susan M Grimes; HoJoon Lee; Billy T Lau; Li C Xia; Hanlee P Ji
Journal:  Nat Commun       Date:  2017-02-07       Impact factor: 14.919

10.  STaRRRT: a table of short tandem repeats in regulatory regions of the human genome.

Authors:  Katherine A Bolton; Jason P Ross; Desma M Grice; Nikola A Bowden; Elizabeth G Holliday; Kelly A Avery-Kiejda; Rodney J Scott
Journal:  BMC Genomics       Date:  2013-11-15       Impact factor: 3.969

Journal:  Nat Commun       Date:  2021-06-02       Impact factor: 14.919

