Richard G J Hodel, Matthew A Gitzendanner, Charlotte C Germain-Aubrey, Xiaoxian Liu, Andrew A Crowl, Miao Sun, Jacob B Landis, M Claudia Segovia-Salcedo, Norman A Douglas, Shichao Chen, Douglas E Soltis, Pamela S Soltis.
Abstract
PREMISE OF THE STUDY: The One Thousand Plant Transcriptomes Project (1KP, 1000+ assembled plant transcriptomes) provides an enormous resource for developing microsatellite loci across the plant tree of life. We developed loci from these transcriptomes and tested their utility. METHODS AND [...]
Keywords: 1KP; microsatellite development; neutral markers; next-generation sequencing (NGS); non-neutral markers; simple sequence repeat (SSR); transcriptomes
Year: 2016 PMID: 27347455 PMCID: PMC4915922 DOI: 10.3732/apps.1600024
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Fig. 1. The workflows for developing SSR loci from 1KP transcriptomes, assessing cross-amplification of SSR loci from 1KP transcriptomes, and comparing transcriptomic and genomic SSR loci. Turquoise boxes indicate software packages or scripts; custom scripts (available at https://github.com/soltislab/transcriptome_microsats) designed by the authors are designated by asterisks.
Average number of SSR loci by repeat type and their location relative to identified open reading frames (ORFs). Note that percentages are based on unrounded values.
| SSR type | Avg. no. SSR loci (% of total) | Avg. no. with no overlap of ORF (% of type) | Avg. no. with any overlap of ORF (% of type) | Avg. no. with substantial (≥15 bp) overlap of ORF (% of type) |
| Compound | 176 (4) | 70 (40) | 106 (60) | 104 (59) |
| Complex compound | 7 (0) | 4 (49) | 4 (51) | 4 (50) |
| Mononucleotide | 1681 (39) | 833 (50) | 848 (50) | 46 (3) |
| Dinucleotide | 947 (22) | 436 (46) | 511 (54) | 99 (10) |
| Trinucleotide | 1427 (33) | 250 (18) | 1176 (82) | 472 (33) |
| Tetranucleotide | 55 (1) | 30 (54) | 25 (46) | 23 (41) |
| Pentanucleotide | 16 (0) | 9 (56) | 7 (44) | 6 (40) |
| Hexanucleotide | 20 (0) | 4 (21) | 16 (79) | 15 (79) |
| Total | 4328 (100) | 1635 (38) | 2693 (62) | 769 (18) |
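The overlap categories in the table above can be illustrated with a small interval calculation. The sketch below is not the authors' script: the function names and the 0-based, half-open coordinate convention are assumptions made for illustration, and only the 15 bp threshold is taken from the table header.

```python
def overlap_bp(ssr_start, ssr_end, orf_start, orf_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(ssr_end, orf_end) - max(ssr_start, orf_start))


def classify_overlap(ssr, orf, substantial_bp=15):
    """Classify one SSR/ORF pair as 'none', 'minor', or 'substantial'.

    ssr and orf are (start, end) tuples on the same transcript.
    The labels are hypothetical; 'minor' means 1 to substantial_bp - 1 shared bases.
    """
    shared = overlap_bp(ssr[0], ssr[1], orf[0], orf[1])
    if shared == 0:
        return "none"
    if shared >= substantial_bp:
        return "substantial"
    return "minor"


# Example: a 30 bp SSR whose last 10 bases fall inside an ORF
print(classify_overlap((100, 130), (120, 400)))
```

Under this labeling, the table's "any overlap" column corresponds to the "minor" and "substantial" classes combined, since a substantial overlap is also an overlap.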
Summary of the electronic cross-amplification among species of Oenothera. The first row is the percentage of all loci that amplify in other samples of the same species. The other rows are the percentages of amplifying loci that are polymorphic between samples. Genetic identity (GI) is based on nucleotide identity at 10 plastid loci.
| | Same species (100% GI) | Very similar (95% to <100% GI) | Similar (90% to <95% GI) | Less similar (<90% GI) |
| % Amplification | 53 | 24 | 20 | 18 |
| % Polymorphic repeat | 27 | 76 | 79 | 79 |
| % Flanking sequence length polymorphic | 26 | 76 | 79 | 79 |
| % Flanking sequence polymorphic | 39 | 94 | 96 | 95 |
Fig. 2. The relationship between genetic identity and the proportion of loci that will successfully amplify, the proportion of flanking regions that are the same, the proportion of flanking regions with the same length, the proportion with the same microsatellite, the proportion with the same PCR sequence, and the proportion of PCR sequences that have the same length. The sequence is defined as the whole sequence between the primers, so the flanking regions are the sequence between the primer and the repeat, on either side of the repeat. The orange line is dashed to enable the reader to distinguish it from the blue line.
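The definitions in the caption above (sequence between the primers, flanking regions on either side of the repeat) can be made concrete with a minimal sketch. It is an illustration only, not the authors' pipeline: the function names, the toy sequences, and the choice to compare flank lengths per flank are all assumptions.

```python
def split_amplicon(seq, repeat_start, repeat_end):
    """Split the sequence between the primers into (left flank, repeat, right flank).

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    return seq[:repeat_start], seq[repeat_start:repeat_end], seq[repeat_end:]


def compare_amplicons(a, b):
    """Compare two (left, repeat, right) tuples on the properties named in the caption."""
    left_a, repeat_a, right_a = a
    left_b, repeat_b, right_b = b
    return {
        "same_flanks": left_a == left_b and right_a == right_b,
        # Assumption: flank lengths are compared separately for each side.
        "same_flank_length": len(left_a) == len(left_b) and len(right_a) == len(right_b),
        "same_repeat": repeat_a == repeat_b,
        "same_sequence": a == b,
        "same_length": sum(map(len, a)) == sum(map(len, b)),
    }


# Toy sequences (invented): an (AG)4 repeat versus an (AG)3 repeat with one flank substitution
reference = split_amplicon("ACGTAGAGAGAGTTCA", 4, 12)
sample = split_amplicon("ACGTAGAGAGTTGA", 4, 10)
result = compare_amplicons(reference, sample)
print(result)
```

In this toy pair the flanks differ by a substitution but keep their lengths, while the repeat and the overall sequence length both differ.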
Comparison of genomic and transcriptomic SSRs developed in Glycine. The number of reads, total number of bases, reads containing an SSR, reads containing a potentially amplifiable locus (PAL), unique PALs, mean number of PALs in a coding region, and percentage of PALs found in a coding region are presented for both data sets. The analyses to determine which loci were in translated regions were run on both forward and reverse loci, and the results were averaged. Thus, some of the counts of motifs in coding regions are not integers.
| Statistic of data set | Genomic | Transcriptomic |
| Reads | 717,309 (raw 454 reads) | 364,755 (Illumina assembly scaffolds) |
| Total bases | 84,930,732 | 133,872,643 |
| Reads/scaffolds with SSR | 7368 | 13,667 |
| Reads/scaffolds with primers (PALs) | 624 | 8456 |
| Total unique PALs | 532 | 7186 |
| Dinucleotides, total | 286 | 2893 |
| Trinucleotides, total | 212 | 4050 |
| Tetranucleotides, total | 24 | 174 |
| Pentanucleotides, total | 9 | 39 |
| Hexanucleotides, total | 1 | 30 |
| Dinucleotides, % of total | 53.8 | 40.3 |
| Trinucleotides, % of total | 39.8 | 56.4 |
| Tetranucleotides, % of total | 4.5 | 2.4 |
| Pentanucleotides, % of total | 1.7 | 0.5 |
| Hexanucleotides, % of total | 0.2 | 0.4 |
| Mean PALs in coding region | 65 | 2552 |
| % of PALs in coding region | 12.2 | 35.5 |
| Dinucleotides, in coding region | 17.5 | 643 |
| Trinucleotides, in coding region | 46 | 1838 |
| Tetranucleotides, in coding region | 1.5 | 48 |
| Pentanucleotides, in coding region | 0 | 7.5 |
| Hexanucleotides, in coding region | 0 | 15 |
| Dinucleotides, % of total in coding region | 26.9 | 25.2 |
| Trinucleotides, % of total in coding region | 70.8 | 72.0 |
| Tetranucleotides, % of total in coding region | 2.3 | 1.9 |
| Pentanucleotides, % of total in coding region | 0.0 | 0.3 |
| Hexanucleotides, % of total in coding region | 0.0 | 0.6 |
| Dinucleotides, % in coding relative to total dinucleotides | 6.1 | 22.2 |
| Trinucleotides, % in coding relative to total trinucleotides | 21.7 | 45.4 |
| Tetranucleotides, % in coding relative to total tetranucleotides | 6.3 | 27.6 |
| Pentanucleotides, % in coding relative to total pentanucleotides | 0.0 | 19.2 |
| Hexanucleotides, % in coding relative to total hexanucleotides | 0.0 | 50.0 |
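The percentage rows in the table above follow from the count rows by division. As a check, the sketch below recomputes them for the genomic column. The counts are copied from the table; the variable names and the half-up rounding helper are assumptions of this example (half-up rounding is used so that 6.25 is reported as 6.3, as in the table).

```python
from decimal import Decimal, ROUND_HALF_UP

# Genomic column: unique PALs per motif length, and mean counts in coding regions
genomic_total = {"di": 286, "tri": 212, "tetra": 24, "penta": 9, "hexa": 1}
genomic_coding = {"di": 17.5, "tri": 46, "tetra": 1.5, "penta": 0, "hexa": 0}


def pct(part, whole):
    """Percentage of whole, rounded half-up to one decimal place."""
    value = Decimal(str(100 * part / whole))
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


unique_pals = sum(genomic_total.values())   # total unique PALs
coding_pals = sum(genomic_coding.values())  # mean PALs in coding regions

# Share of each motif among all unique PALs
share_of_total = {m: pct(n, unique_pals) for m, n in genomic_total.items()}
# Share of each motif among PALs in coding regions
share_of_coding = {m: pct(n, coding_pals) for m, n in genomic_coding.items()}
# Fraction of each motif class that lies in a coding region
coding_within_motif = {m: pct(genomic_coding[m], genomic_total[m]) for m in genomic_total}

print(unique_pals, coding_pals, pct(coding_pals, unique_pals))
print(share_of_total)
print(share_of_coding)
print(coding_within_motif)
```

The same three ratios applied to the transcriptomic column should reproduce its percentage rows, using the unrounded coding-region sum as the denominator.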