Concita Cantarella¹, Nunzio D'Agostino².
Abstract
BACKGROUND: With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy have been published. Few tools have been implemented so far that can genotype microsatellite repeats while managing huge amounts of sequence data and handling the SAM/BAM file format, and most of them were developed for, and tested on, human or model organisms with high-quality reference genomes.
Year: 2015 PMID: 26428628 PMCID: PMC4591729 DOI: 10.1186/s13104-015-1474-4
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Workflow design of the polymorphic SSR retrieval (PSR) tool, which includes two modules. PSR_read_retrieval identifies all the reads that cover the full length of perfect microsatellites; N indicates the number of iterations, which must equal the number of genotypes under investigation. PSR_poly_finder detects length polymorphism in microsatellites.
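The caption does not describe how PSR_poly_finder compares genotypes. As a minimal sketch, assuming each of the N PSR_read_retrieval runs yields, per locus, the set of repeat-unit counts that passed the coverage threshold, a locus can be flagged as polymorphic when those sets differ between genotypes. The names `find_polymorphic` and `Calls` are hypothetical and not part of PSR; the example values are taken from the PSR columns of the transcript validation table in this section.

```python
from typing import Dict, List, Set

# Hypothetical data model: for one genotype, locus ID -> set of repeat-unit
# counts that passed the coverage threshold in that genotype.
Calls = Dict[str, Set[int]]


def find_polymorphic(per_genotype: Dict[str, Calls]) -> List[str]:
    """Return loci whose called allele sets differ between genotypes.

    Only loci called in at least two genotypes are compared, so a locus
    missing from one sample is not reported as polymorphic by itself.
    """
    all_loci: Set[str] = set()
    for calls in per_genotype.values():
        all_loci.update(calls)

    polymorphic: List[str] = []
    for locus in sorted(all_loci):
        # One frozenset per genotype that has a call at this locus.
        allele_sets = [
            frozenset(calls[locus])
            for calls in per_genotype.values()
            if locus in calls
        ]
        # Polymorphic if at least two genotypes disagree on the alleles.
        if len(allele_sets) >= 2 and len(set(allele_sets)) > 1:
            polymorphic.append(locus)
    return polymorphic


calls = {
    "genotype_1": {"TR11073": {6}, "TR11727": {6}, "TR142": {9}},
    "genotype_2": {"TR11073": {5, 6}, "TR11727": {5}, "TR142": {9}},
}
print(find_polymorphic(calls))  # → ['TR11073', 'TR11727']
```

Requiring calls in at least two genotypes is a design choice of this sketch: a locus absent from one sample may simply lack coverage, so it is not reported as polymorphic on that basis alone.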
The average read length obtained from Illumina instruments is typically less than 150 nucleotides.
| Repeat unit (nts) | Max. number of repeat units | SSR length (nts) |
|---|---|---|
| *Read length = 100 nucleotides (nts)* | | |
| 2 | 48 | 96 |
| 3 | 31 | 94 |
| 4 | 23 | 92 |
| 5 | 18 | 90 |
| 6 | 14 | 88 |
A 100-nt read therefore allows detection of dinucleotide repeats up to 96 nucleotides long and of up to 14 repeats of a hexanucleotide motif.
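The values in the table can be reproduced by assuming that the usable SSR span equals the read length minus one repeat-unit-long flank on each side, and that the maximum number of units is that span divided by the unit length, rounded down. This rule is inferred from the numbers, not stated by the authors; under it, the SSR length column gives the usable span rather than units × unit length (31 × 3 = 93, 14 × 6 = 84). `max_detectable` is a hypothetical helper.

```python
from typing import Tuple

# Repeat-unit length (nts) -> (max repeat units, SSR length) as listed above.
TABLE = {2: (48, 96), 3: (31, 94), 4: (23, 92), 5: (18, 90), 6: (14, 88)}


def max_detectable(read_len: int, unit_len: int) -> Tuple[int, int]:
    """Return (max repeat units, usable SSR span in nts) for one read length.

    Assumption inferred from the table: the usable span is the read length
    minus one repeat-unit-long flank on each side of the SSR.
    """
    span = read_len - 2 * unit_len
    return span // unit_len, span


for unit_len, expected in TABLE.items():
    print(unit_len, max_detectable(100, unit_len), expected)
```

Under the same assumption, `max_detectable(150, 2)` gives `(73, 146)` for a 150-nt read.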
Fig. 2 The microsatellite region (36 nucleotides in length) in the reference sequence is shown in bold. PSR selects and counts only reads that align over the entire SSR region plus one or more nucleotides on the left and right sides that differ from the bases of the repeat unit. Reads in grey are discarded because they cover the microsatellite only partially. Reads in red are filtered out because of uncertainty at the sequence borders. For each sample, the program writes to a tab-delimited file the repeat-unit counts whose coverage exceeds the selected threshold (−p).
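The selection rule in Fig. 2 can be sketched on plain read strings. This is a simplified illustration under stated assumptions, not PSR's implementation: PSR works on SAM/BAM alignments, whereas here the reads are assumed to already map to the locus, the longest tandem run of the known motif is taken as the SSR, and "different from bases in the repetitive units" is read as "the flanking base breaks the repeat's periodicity". `count_units`, `call_alleles`, and `min_cov` (standing in for −p) are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def count_units(read: str, motif: str) -> Optional[int]:
    """Count motif copies in the longest tandem run of `motif` in `read`,
    or return None if the read should be discarded (assumed rules)."""
    k = len(motif)
    best_start, best_units = -1, 0
    for i in range(len(read) - k + 1):
        units, j = 0, i
        while read[j:j + k] == motif:
            units += 1
            j += k
        if units > best_units:
            best_start, best_units = i, units
    if best_units == 0:
        return None
    end = best_start + best_units * k  # index of first base after the run
    # Partial coverage: the run touches a read end (like grey reads in Fig. 2).
    if best_start == 0 or end >= len(read):
        return None
    # Ambiguous border: a flanking base continues the period
    # (treated here like the red reads in Fig. 2).
    if read[best_start - 1] == motif[-1] or read[end] == motif[0]:
        return None
    return best_units


def call_alleles(reads: Iterable[str], motif: str, min_cov: int) -> Dict[int, int]:
    """Tally repeat-unit counts over reads; keep counts with coverage > min_cov."""
    tally = Counter(
        units for units in (count_units(r, motif) for r in reads)
        if units is not None
    )
    return {units: n for units, n in sorted(tally.items()) if n > min_cov}


reads = (
    ["CC" + "GAT" * 6 + "CC"] * 3      # full SSR plus flanks: 6 units
    + ["CC" + "GAT" * 5 + "AA"] * 2    # full SSR plus flanks: 5 units
    + ["GAT" * 3 + "CC"]               # run touches read start: discarded
)
print(call_alleles(reads, "GAT", min_cov=1))  # → {5: 2, 6: 3}
```

Keeping only the longest run per read is a simplification; a real tool would also use the alignment coordinates to make sure the run is the SSR at the target locus.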
Evaluation through capillary gel separation of SSR length polymorphism from ten randomly selected transcripts
| Seq ID | SSR | Start | Stop | SSR length (nts) | SSR type | Capillary electrophoresis, genotype 1 | Capillary electrophoresis, genotype 2 | PSR, genotype 1 | PSR, genotype 2 |
|---|---|---|---|---|---|---|---|---|---|
| TR11073 | (GAT)6 | 389 | 406 | 18 | p3 | 222 | 222 + 219 | 6 | 6; 5 |
| TR11727 | (TCA)5 | 1026 | 1040 | 15 | p3 | 182 | 179 | 6 | 5 |
| TR12365 | (GAT)6 | 269 | 286 | 18 | p3 | 243 + 246 | 243 + 246 | 8; 6 | 8; 6 |
| TR19012 | (CAC)7 | 208 | 228 | 21 | p3 | 239 | 233 + 239 | 7 | 7; 5 |
| TR6251 | (CAA)9 | 204 | 230 | 27 | p3 | 294 | 294 + 297 | 9 | 10; 9 |
| TR1824 | (CAT)5 | 90 | 104 | 15 | p3 | 277 | 274 | 6 | 5 |
| TR12469 | (TC)7 | 1300 | 1313 | 14 | p2 | 234 | 232 | 7 | 6 |
| TR7469 | (ATC)7 | 20 | 40 | 21 | p3 | 237 | 237 + 240 | 7 | 7; 8 |
| TR2455 | (TTG)6 | 57 | 74 | 18 | p3 | 239 | 239 + 242 | 6 | 6 |
| TR142 | (TCT)9 | 1874 | 1900 | 27 | p3 | 159 | 156 + 159 | 9 | 9 |
Columns 7 and 8 report the amplicon sizes detected in the two genotypes; columns 9 and 10 list the number of repeat units identified by PSR.
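The columns obey simple arithmetic that can be checked mechanically: SSR length equals motif length times copy number, which also equals Stop − Start + 1 (the rows imply inclusive coordinates), and an amplicon-size difference divided by the motif length gives the implied difference in repeat units. The helper names below are hypothetical, and the size difference is assumed to come only from the SSR.

```python
import re
from typing import Optional, Tuple


def parse_ssr(notation: str) -> Tuple[str, int]:
    """Split notation such as '(GAT)6' into ('GAT', 6)."""
    m = re.fullmatch(r"\(([ACGT]+)\)(\d+)", notation)
    if m is None:
        raise ValueError(f"unrecognised SSR notation: {notation!r}")
    return m.group(1), int(m.group(2))


def coords_consistent(notation: str, start: int, stop: int, length: int) -> bool:
    """Check motif length x copies == length == stop - start + 1."""
    motif, copies = parse_ssr(notation)
    return len(motif) * copies == length == stop - start + 1


def unit_shift(size_a: int, size_b: int, motif: str) -> Optional[int]:
    """Repeat-unit difference implied by two amplicon sizes, or None if the
    size difference is not a whole multiple of the motif length."""
    units, remainder = divmod(abs(size_a - size_b), len(motif))
    return units if remainder == 0 else None


# (SSR, Start, Stop, SSR length) for the ten transcripts in the table.
rows = [
    ("(GAT)6", 389, 406, 18), ("(TCA)5", 1026, 1040, 15),
    ("(GAT)6", 269, 286, 18), ("(CAC)7", 208, 228, 21),
    ("(CAA)9", 204, 230, 27), ("(CAT)5", 90, 104, 15),
    ("(TC)7", 1300, 1313, 14), ("(ATC)7", 20, 40, 21),
    ("(TTG)6", 57, 74, 18), ("(TCT)9", 1874, 1900, 27),
]
print(all(coords_consistent(*row) for row in rows))  # → True
print(unit_shift(222, 219, "GAT"))  # → 1
```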
Evaluation through capillary electrophoresis of eight monomeric SSR loci across nine cpDNA genotypes
| Sample | SSR1 (T)n | SSR2 (T)n | SSR3 (T)n | SSR4 (A)n | SSR5 (T)n | SSR6 (A)n | SSR7 (A)n | SSR8 (A)n |
|---|---|---|---|---|---|---|---|---|
| G1 | | | | | | | | |
| G2 | | 13 | 16 | 12 | | | 13 | |
| G3 | 12 | 13 | 16 | 13 | | | 16 | 15 |
| G4 | | 13 | 16 | | 17 | | 16 | |
| G5 | 12 | 13 | 16 | 13 | 17 | | 16 | 15 |
| G6 | 12 | 13 | 16 | 13 | 17 | | 16 | 15 |
| G7 | 12 | 13 | 16 | 13 | 17 | | 16 | 15 |
| G8 | 12 | 13 | 16 | 13 | 17 | | 16 | 15 |
| G9 | 12 | 13 | 16 | 13 | 17 | | 15 | 15 |
Numbers represent the lengths of the SSR stretches. Cells in italics indicate microsatellites that were also confirmed by Sanger sequencing; cells in bold italics represent SSRs whose lengths differ from those determined by PSR.