Lua Lopez, Rodolfo Barreiro, Markus Fischer, Marcus A Koch.
Abstract
BACKGROUND: Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-growing volume of DNA data generated by high-throughput techniques offers an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as a source of SSRs for plants of economic relevance, but their application to non-model species is still modest.
Year: 2015 PMID: 26463180 PMCID: PMC4603344 DOI: 10.1186/s12864-015-2031-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Flowchart of the bioinformatics analysis used to develop the EST-SSRs. Results of the SSR mining in the control genera Oryza and Arabidopsis are shown in green on the left side of the figure; results of the SSR mining in the IUCN plant genera are shown in blue on the right side (note that Oryza was used as a control genus and was also included in the IUCN analyses). The steps followed in the analysis are highlighted in bold in the center
Number and distribution of the EST-SSRs found for the EST sequences of Oryza and Arabidopsis
| Repeat type | Genomic | | Intron | | UTR | | Exon | | Total | |
|---|---|---|---|---|---|---|---|---|---|---|
| Dinucleotides | 17 | 2 | 26 | 5 | 29 | 16 | 1 | 1 | 73 | 24 |
| Trinucleotides | 18 | 2 | 16 | 2 | 70 | 26 | 142 | 67 | 246 | 97 |
| Tetranucleotides | 3 | 0 | 3 | 0 | 9 | 3 | 2 | 0 | 17 | 3 |
| Pentanucleotides | 4 | 0 | 1 | 0 | 10 | 3 | 0 | 0 | 15 | 3 |
| Hexanucleotides | 6 | 1 | 3 | 0 | 13 | 1 | 24 | 10 | 46 | 12 |
| Total | 48 | 5 | 49 | 7 | 131 | 49 | 169 | 78 | 397 | 139 |
SSR search was only carried out in those EST sequences downloaded from the dbEST database (NCBI) that had a match in their respective reference genomes using BLASTn
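The SSR search itself can be illustrated with a minimal sketch. The study used the QDD software; the regular-expression approach below is a swapped-in simplification that detects only perfect repeats, and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for illustration, not the thresholds used in the paper.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumption, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # The group captures one motif; the backreference demands
        # at least (min_rep - 1) further tandem copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated
            # (e.g. "AGAG" is really "AG"; "AA" is a homopolymer).
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits
```

For example, `find_ssrs("TTT" + "AG" * 11 + "CCC")` reports a single dinucleotide SSR, `(3, "AG", 11)`. A real pipeline would also handle imperfect and compound repeats, which this sketch ignores.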
Fig. 2 Distribution of di- and trinucleotide motifs obtained with the QDD1 software from Oryza and Arabidopsis EST sequences that had a positive BLASTn (NCBI) hit in the Oryza sativa (japonica cultivar-group) and Arabidopsis thaliana reference genomes. Oryza is shown in black and Arabidopsis in grey. The X axis lists the motif types and the Y axis gives the number of SSRs in each class
Number of EST-SSRs found in the IUCN plant genera containing EST sequences in the dbEST
| Taxonomic groups | Ng | Ng SSR | NEST | Dinucleotides | Trinucleotides | Tetranucleotides | Pentanucleotides | Hexanucleotides | Total | Commonest motifs |
|---|---|---|---|---|---|---|---|---|---|---|
| Florideophyceae | 2 | 2 | 16645 | 2 | 10 | 2 | 1 | 10 | 25 | ACG/GGC |
| Charophyceae | 1 | 1 | 88280 | 16 | 77 | 39 | 38 | 30 | 200 | AG/TGA |
| Acrogymnospermae | 18 | 15 | 1191184 | 144 | 145 | 30 | 58 | 193 | 570 | AG/AT/CAG |
| Lycopodiophyta | 3 | 3 | 101292 | 20 | 122 | 15 | 7 | 26 | 190 | AG/CAG/TGA |
| Monilophyta | 5 | 3 | 35665 | 129 | 18 | 3 | 2 | 6 | 158 | AG/TGA |
| Magnoliidae | 5 | 5 | 68569 | 193 | 89 | 11 | 9 | 30 | 332 | AG/AT/CAG |
| Monocotyledoneae | 58 | 37 | 3197142 | 598 | 1395 | 296 | 323 | 496 | 3108 | AG/AT/AAG/CGG |
| Eudicotyledoneae | 165 | 127 | 9742277 | 4160 | 4820 | 760 | 769 | 2010 | 12519 | AG/AT/AAG/TGA |
| Total | 257 | 193 | 14498726 | 5262 | 6676 | 1156 | 1207 | 2801 | 17102 | |
Ng, number of genera; Ng SSR, number of genera with SSRs; NEST, number of EST sequences downloaded
Fig. 3 Distribution of EST-SSRs in 193 plant genera that include species listed as threatened by the IUCN. Bars on the X axis represent each taxonomic group investigated and the whole dataset. The Y axis represents the percentage of EST-SSRs found within each group. Colors in each bar indicate the type of repeat: dinucleotide repeats in light green, trinucleotide repeats in light blue, tetranucleotide repeats in yellow, pentanucleotide repeats in dark green and hexanucleotide repeats in dark blue
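The percentages plotted in Fig. 3 can be recomputed from the per-group counts in the table of EST-SSRs by taxonomic group. The sketch below is only a worked illustration: the counts are transcribed from that table, while the function and variable names are hypothetical.

```python
REPEAT_CLASSES = ("di", "tri", "tetra", "penta", "hexa")

# Counts per taxonomic group, transcribed from the table above:
# (dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, hexanucleotides)
COUNTS = {
    "Florideophyceae": (2, 10, 2, 1, 10),
    "Charophyceae": (16, 77, 39, 38, 30),
    "Acrogymnospermae": (144, 145, 30, 58, 193),
    "Lycopodiophyta": (20, 122, 15, 7, 26),
    "Monilophyta": (129, 18, 3, 2, 6),
    "Magnoliidae": (193, 89, 11, 9, 30),
    "Monocotyledoneae": (598, 1395, 296, 323, 496),
    "Eudicotyledoneae": (4160, 4820, 760, 769, 2010),
}


def to_percentages(row):
    """Convert a tuple of counts into a {repeat_class: percentage} dict."""
    total = sum(row)
    return {name: 100 * n / total for name, n in zip(REPEAT_CLASSES, row)}


def class_percentages(group):
    """Percentage of each repeat class within one taxonomic group."""
    return to_percentages(COUNTS[group])


def whole_dataset_percentages():
    """Percentage of each repeat class over all groups combined."""
    column_sums = tuple(sum(col) for col in zip(*COUNTS.values()))
    return to_percentages(column_sums)
```

With these counts, trinucleotides make up roughly 39 % of the 17102 EST-SSRs in the whole dataset, while dinucleotides dominate in Monilophyta (129 of 158).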
EST-SSRs tested empirically in two Eudicotyledoneae genera, Trifolium and Centaurea
| Locus | GenBank accession No. | Species | Forward primer | Reverse primer | Repeat motif | Expected size (bp) | NA | Size range (bp) |
|---|---|---|---|---|---|---|---|---|
| T6- | gi86106666, gi86105378 | | CAACCAGTGGTGTGAGTAGGAG | ACGTTGGTGGAGAGGTTGAG | (AG)11 | 110–128 | 2 | 114–116 |
| T7- | gi428283538 | | ATCACGCTTCACTCCTCCAC | CAACTCCAAGCTTAAGATCGTGTA | (AG)13 | 110–122 | no PCR product | |
| T1- | gi428292074 | | AGATTCCCACCAATCTCCCT | CAATACGCGGGTCTTGATCT | (AG)11 | 210–228 | -- | 257–261 |
| T2- | gi86106666, gi86105378 | | TTCCGGTTAGGTTAGGGTTT | TTTTCACATCTTCCGAAGCC | (AAT)7 | 110–113 | no PCR product | |
| T3- | gi428285635 | | CACCACATATGCAACCACAA | GTCGACGACGGTTGTTACCT | (AGT)8 | 110–126 | no PCR product | |
| T8- | gi428291122 | | GCAAAACTCAAGAGAACGGC | GGATGTCTTCGGAGGTGAGA | (ACC)7 | 110–122 | no PCR product | |
| T9- | gi428292435 | | ACAACCCATTTGCCTCAAAG | TTTTCACTTCCACCACCTCC | (ACC)7 | 110–133 | 2 | 124–127 |
| T10- | gi86119186 | | TCCACTAGTTCTAGAGCGGC | TCCTGTAAACTGGAGGAGCC | (ACC)9 | 110–153 | no PCR product | |
| T11- | gi86124411 | | TGGCGGTGGTGACTTATACA | TGTTTGGCAGTGGTGATGTT | (AGG)8 | 110–153 | no PCR product | |
| T4- | gi86125686 | | GCTGCCACAGCACTACCAG | AATATTACCGTGAATGAAGCTCAG | (ACC)8 | 110–113 | 1 | 110 |
| T5- | gi86097190 | | TGAGTTCCGAGTTAAGGCTCA | TTCGGTAACTCCGAGGATTG | (ACCT)5 | 210–217 | 2 | 227–230 |
| T12- | gi428282514 | | GATTATTCAACCAAACGCCG | TAGAAAGCCACGCCAAGACT | (AATCC)20 | 290 | no PCR product | |
| C6- | gi124618051 | | TGGGATGCAGTCCAGTCATA | TTGCAACTTGCCTGTACCAC | (AC)11 | 160–162 | 1 | 256 |
| C1- | gi148298213 | | GGGAACCACACCTTTCATCT | GATCTGGCTTGACCCAAGAA | (AC)10 | 90–119 | 2 | 99–101 |
| C7- | gi124669731 | | TCGTTTTCCGATCACAAACTC | CAATTTGGCGACATCTCCTT | (AC)12 | 110–160 | 4 | 114–152 |
| C2- | gi124680442 | | CGCATTATGGAATAAACCCG | GCTTTCGACTTCATAAGCGG | (AAG)7 | 140–152 | 1 | 147 |
| C8- | gi148296795 | | CGATGTATACAGGTGGTGCG | GGAGAAGGGGAGACGTAAGG | (ACC)7 | 110–150 | 2 | 141–144 |
| C9- | gi124675484 | | AACGGTAGGAACCAGCATTG | GATCCTCTGGCAGGGTCATA | (ACC)9 | 260–302 | 4 | 290–299 |
| C10- | gi124661102 | | AGTTGCCAGAAAGGAGCAAG | TCGAGAACAATGGCCTATCC | (AGC)7 | 210–229 | no PCR product | |
| C11- | gi148292432 | | TCCATGGATACAACCACCAA | GCGATATTCGGATGCAAAGT | (AGG)7 | 160–175 | 4 | 160–172 |
| C3- | gi124632630 | | GCCATCCCCTTCTCTACTCC | GTTACAGGTGACGATGGGG | (AGT)7 | 160–181 | no PCR product | |
| C4- | gi124691992 | | CTGCACCTACCCAGAGAAGC | CGGGAGAGGGTAAATTGTGA | (AGGT)5 | 110–115 | 3 | 103–109 |
| C12- | gi124632477 | | ATGCATTGAGAAGGCCAATC | AACTCGCAAGCCTTTTCAAG | (AATCGG)4 | 210–223 | no PCR product | |
| C5 | gi124673348, gi124676118, gi124669484 | | TTAAGCATTCTTCGAGGCGT | TCTATGCCTACGCCGATCTC | (AAGCAG)5 | 110 | no PCR product | |
GenBank accession No., identification number of the EST sequences (more than one ID indicates a consensus sequence); Species, species of the EST sequences; Primer sequences, forward and reverse primers; Repeat motif, type of repeated motif; Expected size, expected size of the PCR product; NA, number of alleles for the examined individuals; Size range, size range of the PCR product (−− indicates a stutter peak)
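As a worked tally of the empirical test, the sketch below counts how many loci yielded a PCR product and how many showed two or more alleles. The allele counts are transcribed from the table above. The dictionary layout and function name are hypothetical, and treating the `T` and `C` locus prefixes as the Trifolium and Centaurea sets is an assumption based on the table caption.

```python
# Number of alleles per locus, transcribed from the table above.
# None      = "no PCR product"
# "stutter" = a product was obtained but only stutter peaks were scored
RESULTS = {
    "T6": 2, "T7": None, "T1": "stutter", "T2": None, "T3": None, "T8": None,
    "T9": 2, "T10": None, "T11": None, "T4": 1, "T5": 2, "T12": None,
    "C6": 1, "C1": 2, "C7": 4, "C2": 1, "C8": 2, "C9": 4,
    "C10": None, "C11": 4, "C3": None, "C4": 3, "C12": None, "C5": None,
}


def summarize(prefix):
    """Return (loci tested, loci amplified, polymorphic loci) for one prefix."""
    loci = {name: n for name, n in RESULTS.items() if name.startswith(prefix)}
    amplified = [name for name, n in loci.items() if n is not None]
    polymorphic = [
        name for name, n in loci.items() if isinstance(n, int) and n >= 2
    ]
    return len(loci), len(amplified), len(polymorphic)
```

Here `summarize("T")` returns `(12, 5, 3)` and `summarize("C")` returns `(12, 8, 6)`, so 13 of the 24 tested loci amplified and 9 of them were polymorphic in the examined individuals.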
Cross-species transferability of EST-SSRs in two plant genera, Trifolium and Centaurea
| Locus | N | Size range (bp) | N | Size range (bp) |
|---|---|---|---|---|
| C6- | 1 | 256 | 1 | 256 |
| C1- | 2 | 99–101 | 2 | 90–101 |
| C7- | 2 | 141–143 | 4 | 114–152 |
| C2- | 1 | 305 | 1 | 305 |
| C8- | 1 | 144 | 1 | 141 |
| C9- | 1 | 290 | 3 | 293–299 |
| C11- | 2 | 160–166 | 3 | 160–169 |
| C4- | 1 | 103 | 2 | 105–109 |
| T6- | 1 | 114 | 1 | 116 |
| T1- | -- | 257–261 | -- | 257–261 |
| T9- | 1 | 124 | 2 | 124–127 |
| T4- | 1 | 110 | 1 | 110 |
| T5- | 2 | 227–230 | 2 | 227–230 |
n, number of individuals tested; N, number of alleles for the examined individuals; Size range, size range of the PCR product (−− indicates a stutter peak)