Angelika Merkel, Neil J Gemmell.
Abstract
Microsatellites are currently one of the most commonly used genetic markers. The application of bioinformatic tools has become common practice in the study of these short tandem repeats (STRs). However, in silico studies can suffer from study bias. Using a meta-analysis of microsatellite distribution in yeast, we show that estimates of the number of repeats reported by different studies can differ by several orders of magnitude, even within a single genome. These differences arise because different search paradigms use varying definitions of microsatellites, spanning repeat size, array length and array composition, with minimum array length being the main influencing factor. Structural differences in the implemented search algorithms additionally contribute to variation in the number of repeats detected. We suggest that future studies adopt a consistent approach to STR searches in order to improve the power of intra- and interspecific comparisons.
Keywords: array length; definition; genome; microsatellites; short tandem repeats; study bias
Year: 2008 PMID: 19204802 PMCID: PMC2614199 DOI: 10.4137/ebo.s420
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Studies utilized in the meta-analysis. All studies report comparisons of microsatellite distribution patterns in yeast. The table shows (from left to right) the study, the algorithm or software employed, the type of repeat investigated (perfect or imperfect) and the parameters implemented in the bioinformatic search, such as repeat size (mono- to octanucleotide) and array length (minimum/maximum threshold).
| Study | Algorithm | Type of repeat | Repeat parameters |
|---|---|---|---|
| | PERL script, regular expression | Perfect repeats | All mononucleotides: 1–42 bp |
| | C-script | Perfect repeats | Repeat size: 1, 2, 3, 4, 5, 6, 7, 8 bp |
| | C-script, base-by-base search using adjacent sliding windows for alignments | Imperfect repeats (mismatch every 10th nt) | Repeat size: 1, 2, 3, 4 bp |
| | C-script, motif search for consecutive sequence stretches | Perfect repeats (incl. partial copies) | Repeat size: 1, 2, 3, 4 bp |
| | TRF software ( | Imperfect repeats (match: (+1) mismatch: (−2, −3, 4) indels: (−6, −9, −12)) | Pattern size: 2, 3, 4 bp |
| | PYTHON script | Perfect repeats | Pattern size: 1, 2, 3, 4, 5, 6 bp |
| | C++ script, base-by-base search using adjacent sliding windows for alignment | Perfect repeats | Pattern size: 1, 2, 3, 4, 5, 6 bp |
Personal communication, algorithm is now implemented as MsatFinder software (http://www.bioinf.ceh.ac.uk/msatfinder/).
The URL given for the server was no longer valid at the time of our study; no further information could be found.
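The effect of the minimum array length threshold on the number of loci reported can be illustrated with a minimal sketch of a perfect-repeat search. This is not the code of any of the studies listed; the function names, the toy sequence and the thresholds are assumptions chosen purely for illustration, and the input is assumed to be an uppercase ACGT string.

```python
def _is_redundant(motif):
    """True if the motif is itself made of copies of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_repeats(seq, motif_sizes=(1, 2, 3, 4), min_length=6):
    """Return (start, motif, array_length) for perfect tandem repeats.

    Hypothetical helper: scans base by base, counts only full motif copies,
    and keeps arrays with at least two copies and at least min_length bases.
    """
    hits = []
    for k in motif_sizes:
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if _is_redundant(motif):
                i += 1
                continue
            # Extend the array one full motif copy at a time.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            length = j - i
            if length >= 2 * k and length >= min_length:
                hits.append((i, motif, length))
                i = j  # resume the scan after the reported array
            else:
                i += 1
    return hits


# Toy sequence (an assumption, not yeast data): A x12, AT x4 and CAG x3 arrays.
toy = "GC" + "A" * 12 + "GCG" + "AT" * 4 + "GGT" + "CAG" * 3 + "T"
for threshold in (6, 9, 12, 13):
    loci = find_perfect_repeats(toy, min_length=threshold)
    print(threshold, len(loci))
```

On the same sequence, raising only the minimum array length changes the number of loci reported, which mirrors the kind of discrepancy between studies described in the abstract.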
Figure 1. Microsatellite distribution in S. cerevisiae. The histogram shows the number of repeat loci per size class reported by each study. For details on parameter settings, see Supplementary Table 1. *No data available.
Variation in TRF results* between genome builds
| 1/01/1998 | 1/10/2003 | 30/11/2006 |
|---|---|---|
| 12069303 | 12070521 | 12070899 |
| 406 | 407 | 406 |
*TRF default parameters: 2 7 7 80 10 50 6 (minimum length: 25 nt)
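The stated minimum length can be checked with a short sketch. It assumes that the seven values follow the usual TRF command-line order (Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod); the helper names are hypothetical. Under that reading, a perfect array scores the match weight per base, so the shortest array that can reach the minimum score is Minscore divided by Match.

```python
import math

# Assumed field order, following the TRF command line:
# Match Mismatch Delta PM PI Minscore MaxPeriod
TRF_FIELDS = ("match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod")


def parse_trf_params(text):
    """Map a whitespace-separated TRF parameter string to named integer fields."""
    values = [int(v) for v in text.split()]
    if len(values) != len(TRF_FIELDS):
        raise ValueError("expected %d values, got %d" % (len(TRF_FIELDS), len(values)))
    return dict(zip(TRF_FIELDS, values))


def min_perfect_length(params):
    """Shortest perfect array (in nt) whose alignment score reaches Minscore."""
    return math.ceil(params["minscore"] / params["match"])


params = parse_trf_params("2 7 7 80 10 50 6")
print(min_perfect_length(params))  # 50 / 2 = 25
```

Imperfect arrays need to be longer than this, because each mismatch or indel subtracts from the score.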