Ruizheng Tian, Cunhuan Zhang, Yixiao Huang, Xin Guo, Maohua Chen.
Abstract
Traditional methods for developing polymorphic microsatellite loci without reference sequences are time-consuming and labor-intensive, and simple sequence repeat (SSR) loci developed from expressed sequence tag (EST) databases generally show poor polymorphism. To address these issues, we developed a new software tool (PSSRdt) and established an effective method for directly obtaining polymorphism details of SSR loci by analyzing diverse transcriptome data. The method has three steps: raw data processing, PSSRdt application, and loci extraction and verification. To test its practicality, we applied it to a transcript dataset combining 44 pea aphid transcriptomes and obtained 1940 potential polymorphic SSRs. Fifty-two of these SSR loci were selected to validate their polymorphism by genotyping pea aphid individuals. Over 92% of the tested loci were polymorphic, and 73.1% were highly polymorphic. Our software and method provide an approach to microsatellite development based on RNA-seq data and allow large numbers of polymorphic loci to be mined rapidly for microsatellite research.
Keywords: SSR; method; polymorphic; transcriptomes
Year: 2019 PMID: 31717904 PMCID: PMC6895799 DOI: 10.3390/genes10110917
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Summary of transcriptome assembly and simple sequence repeat (SSR) analysis.
| Accession ID | Total Sequences a | Total Size (bp) | Sequences with SSRs b | Total SSRs | 1 c | 2 | 3 | 4 | 5 | 6 | Submission Institution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRR063706 | 15,847 | 28,115,408 | 7193 | 14,992 | 9979 | 1860 | 3011 | 102 | 31 | 9 | The University of Arizona |
| SRR063707 | 26,673 | 9,125,155 | 3736 | 5370 | 2947 | 1034 | 1351 | 21 | 14 | 3 | The University of Arizona |
| SRR064408 | 8548 | 3,035,495 | 1493 | 2138 | 985 | 508 | 630 | 9 | 4 | 2 | Yale University |
| SRR064409 | 24,222 | 32,743,820 | 7230 | 11,753 | 6877 | 1456 | 3337 | 63 | 13 | 7 | Yale University |
| SRR071347 | 9558 | 3,108,732 | 1196 | 1653 | 663 | 300 | 680 | 6 | 2 | 2 | Baylor College of Medicine |
| SRR073136 | 10,145 | 3,367,825 | 1481 | 2082 | 1122 | 376 | 574 | 7 | 2 | 1 | University of Nebraska-Lincoln |
| SRR073272 | 18,885 | 21,414,328 | 5184 | 8737 | 5423 | 1090 | 2153 | 52 | 13 | 6 | University of Nebraska-Lincoln |
| SRR073274 | 7634 | 2,076,016 | 407 | 514 | 151 | 73 | 283 | 6 | 1 | 0 | University of Nebraska-Lincoln |
| SRR073276 | 37,759 | 10,200,134 | 2122 | 2766 | 1143 | 375 | 1223 | 16 | 5 | 4 | University of Nebraska-Lincoln |
| SRR353539 | 27,898 | 41,929,144 | 10,148 | 22,418 | 13,637 | 2943 | 5646 | 143 | 40 | 9 | University of Nebraska-Lincoln |
| SRR073426 | 46,437 | 12,602,772 | 2662 | 3487 | 1597 | 452 | 1410 | 19 | 5 | 4 | Cornell University |
| SRR073573 | 20,730 | 15,049,227 | 3212 | 4541 | 2745 | 593 | 1171 | 18 | 10 | 4 | National Institute for Basic Biology |
| SRR073574 | 21,646 | 8,890,341 | 2478 | 3269 | 1405 | 447 | 1386 | 22 | 5 | 4 | National Institute for Basic Biology |
| SRR073575 | 20,018 | 21,929,991 | 4421 | 7262 | 3453 | 997 | 2758 | 40 | 9 | 5 | National Institute for Basic Biology |
| SRR073576 | 19,791 | 17,996,584 | 3611 | 5772 | 2199 | 872 | 2646 | 40 | 10 | 5 | National Institute for Basic Biology |
| SRR073588 | 16,683 | 22,250,609 | 6410 | 12,917 | 7607 | 1742 | 3472 | 73 | 20 | 3 | National Institute for Basic Biology |
| SRR074231 | 23,362 | 40,996,212 | 10,219 | 22,420 | 14,763 | 2745 | 4715 | 148 | 41 | 8 | University of Nebraska-Lincoln |
| SRR074233 | 21,338 | 23,526,029 | 7417 | 14,723 | 9237 | 1897 | 3463 | 93 | 28 | 5 | University of Nebraska-Lincoln |
| SRR075802 | 23,763 | 30,081,669 | 8363 | 17,927 | 10,853 | 2391 | 4544 | 108 | 25 | 6 | INRA d |
| SRR075803 | 31,087 | 39,247,189 | 10,472 | 20,681 | 13,620 | 2470 | 4414 | 127 | 43 | 7 | INRA |
| SRR097896 | 32,993 | 39,997,626 | 9993 | 17,830 | 11,399 | 2139 | 4150 | 97 | 36 | 9 | Centro Nacional de Análisis Genómico |
| SRR098330 | 31,108 | 36,288,769 | 9981 | 19,058 | 12,760 | 2254 | 3898 | 104 | 35 | 7 | Centro Nacional de Análisis Genómico |
| SRR1239439 | 32,809 | 37,426,415 | 10,159 | 19,207 | 12,677 | 2272 | 4090 | 119 | 40 | 9 | Gene Expression Omnibus |
| SRR1239440 | 20,373 | 36,640,425 | 7828 | 15,616 | 8987 | 2073 | 4421 | 103 | 24 | 8 | Gene Expression Omnibus |
| SRR1239441 | 16,871 | 11,512,399 | 2089 | 2859 | 1654 | 365 | 820 | 15 | 2 | 3 | Gene Expression Omnibus |
| SRR1239442 | 15,811 | 24,809,531 | 6033 | 12,181 | 7532 | 1470 | 3084 | 74 | 16 | 5 | Gene Expression Omnibus |
| SRR1239443 | 15,716 | 23,525,854 | 6212 | 12,506 | 7943 | 1556 | 2918 | 68 | 18 | 3 | Gene Expression Omnibus |
| SRR1239444 | 12,327 | 17,972,772 | 4418 | 7931 | 5175 | 913 | 1783 | 45 | 11 | 4 | Gene Expression Omnibus |
| SRR1239445 | 13,768 | 22,192,206 | 5276 | 9848 | 6445 | 1139 | 2186 | 63 | 10 | 5 | Gene Expression Omnibus |
| SRR1239446 | 68,321 | 24,860,272 | 7185 | 10,924 | 6648 | 1428 | 2759 | 68 | 15 | 6 | Gene Expression Omnibus |
| SRR1239448 | 67,995 | 25,261,149 | 7454 | 11,401 | 7018 | 1473 | 2821 | 66 | 16 | 7 | Gene Expression Omnibus |
| SRR1239449 | 20,973 | 26,818,239 | 5932 | 9427 | 5914 | 1066 | 2369 | 58 | 13 | 7 | Gene Expression Omnibus |
| SRR1239450 | 63,347 | 17,658,260 | 3957 | 5229 | 2464 | 770 | 1954 | 28 | 9 | 4 | Gene Expression Omnibus |
| SRR1239451 | 32,224 | 8,346,763 | 1515 | 1913 | 786 | 256 | 856 | 10 | 3 | 2 | Gene Expression Omnibus |
| SRR1239452 | 20,730 | 15,049,227 | 3212 | 4541 | 2745 | 593 | 1171 | 18 | 10 | 4 | Gene Expression Omnibus |
| SRR1239453 | 20,809 | 8,469,795 | 2382 | 3158 | 1270 | 441 | 1416 | 23 | 4 | 4 | Gene Expression Omnibus |
| SRR1793299 | 23,844 | 42,402,758 | 10,371 | 23,276 | 14,954 | 2898 | 5216 | 155 | 39 | 14 | Cornell University |
| SRR1793300 | 20,424 | 31,218,422 | 7938 | 14,929 | 9771 | 1845 | 3179 | 100 | 27 | 7 | Cornell University |
| SRR924106 | 30,017 | 36,677,810 | 9772 | 18,593 | 12,071 | 2275 | 4082 | 115 | 44 | 6 | INRA |
| SRR924118 | 25,681 | 34,582,798 | 8421 | 15,343 | 9796 | 1877 | 3552 | 84 | 26 | 8 | INRA |
| SRR924119 | 24,780 | 35,900,468 | 8810 | 16,959 | 10,879 | 2045 | 3888 | 108 | 31 | 8 | INRA |
| SRR924120 | 15,896 | 27,471,266 | 5755 | 10,112 | 6294 | 1202 | 2535 | 63 | 11 | 7 | INRA |
| SRR924121 | 16,002 | 26,097,341 | 6519 | 13,772 | 8406 | 1733 | 3511 | 93 | 23 | 6 | INRA |
| SRR924122 | 14,455 | 20,719,694 | 5758 | 11,260 | 7336 | 1348 | 2497 | 60 | 16 | 3 | INRA |
a The number of transcripts assembled by Trinity; b The number of sequences containing SSRs; c Mononucleotide SSRs; d French National Institute for Agricultural Research.
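As a quick reading aid for the table above, the six motif-length columns (1 to 6) should add up to the Total SSRs column in each row. The sketch below checks that for a handful of rows, reading every count as a plain integer without thousands separators. The names `ROWS` and `mismatched_rows` are illustrative and do not come from the source.

```python
# A few rows copied from the table:
# (accession, total SSRs, counts for motif lengths 1..6).
ROWS = [
    ("SRR063706", 14992, (9979, 1860, 3011, 102, 31, 9)),
    ("SRR064408", 2138, (985, 508, 630, 9, 4, 2)),
    ("SRR353539", 22418, (13637, 2943, 5646, 143, 40, 9)),
    ("SRR1793299", 23276, (14954, 2898, 5216, 155, 39, 14)),
    ("SRR924122", 11260, (7336, 1348, 2497, 60, 16, 3)),
]


def mismatched_rows(rows):
    """Return accessions whose per-motif counts do not sum to the total."""
    return [acc for acc, total, counts in rows if sum(counts) != total]


if __name__ == "__main__":
    bad = mismatched_rows(ROWS)
    print("inconsistent rows:", bad if bad else "none")
```

The same check can be extended to every row when transcribing the table, which is a cheap way to catch digit-grouping mistakes.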
Figure 1. Flow diagram of the method for polymorphic SSR loci mining. Step 1 shows the processing of RNA-seq raw data. The two examples on the left of Step 2 illustrate the two output files generated by PSSRdt: (a) all screened SSR loci and (b) potential polymorphic loci. Step 3 lists the two main checks applied to each potential polymorphic SSR. The flanking sequences in Sample 1 meet the requirements of primer design, while those in Sample 2 do not. The flanking sequences in Sample 4 are consistent with the corresponding flanking sequences in Sample 3, so the two are identified as the same SSR locus, while Sample 5 is not.
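The idea in the flow diagram, that an SSR counts as the same locus across samples when its flanking sequences agree and as potentially polymorphic when its repeat number differs, can be sketched in a few lines. This is an illustration only and not PSSRdt's implementation: the minimum repeat thresholds in `MIN_REPEATS`, the 20 bp `FLANK` length, the exact-match rule for flanks, and the function names are all assumptions, since the source does not state the tool's settings.

```python
import re
from collections import defaultdict

# Assumed minimum repeat counts per motif length (not taken from the source).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
FLANK = 20  # assumed flank length (bp) needed on each side for primer design


def find_ssrs(seq):
    """Return (motif, repeat_count, left_flank, right_flank) for perfect SSRs."""
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT", ...).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            left = seq[max(0, m.start() - FLANK):m.start()]
            right = seq[m.end():m.end() + FLANK]
            # Require full-length flanks so primers could be designed.
            if len(left) < FLANK or len(right) < FLANK:
                continue
            hits.append((motif, len(m.group(0)) // k, left, right))
    return hits


def polymorphic_loci(samples):
    """samples maps a sample name to a list of transcript sequences.

    Loci are keyed by (motif, left flank, right flank); a locus is reported
    when at least two samples carry different repeat counts.
    """
    by_locus = defaultdict(dict)
    for name, seqs in samples.items():
        for seq in seqs:
            for motif, count, left, right in find_ssrs(seq):
                by_locus[(motif, left, right)][name] = count
    return {locus: counts for locus, counts in by_locus.items()
            if len(set(counts.values())) > 1}
```

Exact flank matching is the strictest possible rule. A real pipeline would likely tolerate sequencing errors in the flanks, which this sketch does not attempt.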
Figure 2. The number of SSRs in A. pisum based on motif types. (a) The number of total types of SSR motifs in the transcript dataset. (b) The number of SSR motifs with potential polymorphism analyzed by PSSRdt.
Polymorphism analysis of 52 microsatellite loci in pea aphid individuals.
| Locus | N | NA | FM | PIC |
|---|---|---|---|---|
| 3 | 12 | 5 | 0.4167 | 0.5748 |
| 4 | 18 | 6 | 0.5278 | 0.6194 |
| 5 | 20 | 7 | 0.4250 | 0.7164 |
| 6 | 18 | 6 | 0.3333 | 0.7444 |
| 7 | 14 | 3 | 0.6786 | 0.4090 |
| 8 | 17 | 9 | 0.2059 | 0.8313 |
| 9 | 21 | 6 | 0.4286 | 0.7006 |
| 10 | 12 | 6 | 0.2917 | 0.7517 |
| 13 | 16 | 9 | 0.2813 | 0.8122 |
| 14 | 18 | 8 | 0.4444 | 0.7118 |
| 15 | 16 | 1 | 1.0000 | 0.0000 |
| 16 | 16 | 7 | 0.4375 | 0.7081 |
| 17 | 15 | 3 | 0.8333 | 0.2604 |
| 18 | 21 | 8 | 0.2381 | 0.8207 |
| 19 | 14 | 5 | 0.4643 | 0.6469 |
| 21 | 16 | 2 | 0.6250 | 0.3589 |
| 22 | 11 | 5 | 0.5000 | 0.6257 |
| 23 | 11 | 6 | 0.3182 | 0.7436 |
| 27 | 17 | 4 | 0.6176 | 0.5239 |
| 29 | 13 | 6 | 0.3462 | 0.6874 |
| 31 | 11 | 11 | 0.2273 | 0.8595 |
| 33 | 16 | 2 | 0.9375 | 0.1103 |
| 34 | 12 | 3 | 0.4583 | 0.5697 |
| 35 | 18 | 4 | 0.4722 | 0.5851 |
| 38 | 17 | 4 | 0.6176 | 0.5269 |
| 39 | 17 | 8 | 0.2647 | 0.7888 |
| 40 | 20 | 1 | 1.0000 | 0.0000 |
| 41 | 10 | 7 | 0.3500 | 0.7700 |
| 43 | 12 | 8 | 0.2917 | 0.8013 |
| 46 | 17 | 5 | 0.3235 | 0.7130 |
| 47 | 16 | 1 | 1.0000 | 0.0000 |
| 48 | 13 | 7 | 0.4231 | 0.6867 |
| 49 | 16 | 2 | 0.8750 | 0.1948 |
| 51 | 14 | 5 | 0.3214 | 0.7248 |
| 52 | 20 | 4 | 0.4500 | 0.5249 |
| 53 | 14 | 6 | 0.3929 | 0.7072 |
| 101 | 15 | 3 | 0.7667 | 0.3227 |
| 102 | 15 | 12 | 0.2000 | 0.8685 |
| 108 | 12 | 7 | 0.2917 | 0.7614 |
| 109 | 16 | 3 | 0.8438 | 0.2478 |
| 110 | 15 | 6 | 0.4333 | 0.6675 |
| 112 | 17 | 9 | 0.2353 | 0.8319 |
| 113 | 15 | 3 | 0.8333 | 0.2710 |
| 114 | 15 | 9 | 0.3667 | 0.7762 |
| 116 | 9 | 3 | 0.7222 | 0.3709 |
| 117 | 9 | 7 | 0.2778 | 0.8053 |
| 119 | 12 | 8 | 0.2500 | 0.7957 |
| 121 | 18 | 5 | 0.2778 | 0.7429 |
| 122 | 17 | 2 | 0.8824 | 0.1861 |
| 128 | 11 | 5 | 0.3182 | 0.7319 |
| 131 | 18 | 5 | 0.3056 | 0.7165 |
| 132 | 18 | 1 | 1.0000 | 0.0000 |
| Mean | 15.2115 | 5.3462 | 0.4966 | 0.5751 |
N, Number of aphids successfully genotyped; NA, Number of alleles per locus; FM, Frequency of major allele; PIC, Polymorphism information content.
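For readers who want to relate the PIC column to allele frequencies, the sketch below uses the standard definition of polymorphism information content, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², and then recomputes the summary percentages from the NA and PIC columns of the table. Two points are assumptions rather than statements from the source: that the table's PIC values were computed with this formula (the two-allele rows are consistent with it), and that "highly polymorphic" means PIC > 0.5. The names `pic` and `LOCI` are illustrative.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from a list of allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# (NA, PIC) pairs copied from the table, in row order (52 loci).
LOCI = [
    (5, 0.5748), (6, 0.6194), (7, 0.7164), (6, 0.7444), (3, 0.4090),
    (9, 0.8313), (6, 0.7006), (6, 0.7517), (9, 0.8122), (8, 0.7118),
    (1, 0.0000), (7, 0.7081), (3, 0.2604), (8, 0.8207), (5, 0.6469),
    (2, 0.3589), (5, 0.6257), (6, 0.7436), (4, 0.5239), (6, 0.6874),
    (11, 0.8595), (2, 0.1103), (3, 0.5697), (4, 0.5851), (4, 0.5269),
    (8, 0.7888), (1, 0.0000), (7, 0.7700), (8, 0.8013), (5, 0.7130),
    (1, 0.0000), (7, 0.6867), (2, 0.1948), (5, 0.7248), (4, 0.5249),
    (6, 0.7072), (3, 0.3227), (12, 0.8685), (7, 0.7614), (3, 0.2478),
    (6, 0.6675), (9, 0.8319), (3, 0.2710), (9, 0.7762), (3, 0.3709),
    (7, 0.8053), (8, 0.7957), (5, 0.7429), (2, 0.1861), (5, 0.7319),
    (5, 0.7165), (1, 0.0000),
]

polymorphic = sum(1 for na, _ in LOCI if na > 1)   # more than one allele
highly = sum(1 for _, value in LOCI if value > 0.5)  # assumed threshold

if __name__ == "__main__":
    # Locus 33 has two alleles with major-allele frequency 0.9375.
    print(round(pic([0.9375, 0.0625]), 4))
    print(f"polymorphic: {polymorphic}/{len(LOCI)}")
    print(f"PIC > 0.5:   {highly}/{len(LOCI)}")
```

Under these assumptions, 48 of 52 loci have more than one allele and 38 of 52 exceed PIC 0.5, which lines up with the "over 92%" and "73.1%" figures in the abstract.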
Figure 3. Polymorphism information content (PIC) details of SSR loci tested on A. pisum. The percentages of different PIC values in 9 types of SSR motifs, dinucleotide microsatellites, and trinucleotide microsatellites were visualized by three colors.