Qichao Wu, Fengqi Zang, Xiaoman Xie, Yan Ma, Yongqi Zheng, Dekui Zang.
Abstract
Populus wulianensis is an endangered species endemic to Shandong Province, China. Despite the economic and ornamental value of this species, few genomic and genetic studies have been performed on it. In this study, we analyzed full-length transcriptome sequencing data of P. wulianensis and obtained polymorphic expressed sequence tag (EST)-simple sequence repeat (SSR) markers that can be used for further genetic research. In total, 8.18 Gb of clean data (3,521,665 reads) with an average GC content of 42.12% were obtained. From the 64,737 corrected high-quality isoforms, 42,323 transcript sequences were obtained after redundancy removal with CD-HIT. Of these transcript sequences, 41,876 were annotated successfully. A total of 23,539 potential EST-SSRs were identified from 16,057 sequences. Excluding mononucleotides, the most abundant motifs were trinucleotide SSRs (47.80%), followed by di- (46.80%), tetra- (2.98%), hexa- (1.58%) and pentanucleotide SSRs (0.84%). Among the 100 designed EST-SSRs, 18 were polymorphic with high PIC values (0.721 and 0.683) and could be used for analyses of the genetic diversity and population structure of P. wulianensis. These full-length transcriptome sequencing data will facilitate gene discovery and functional genomics research in P. wulianensis, and the novel EST-SSRs developed in our study will promote molecular-assisted breeding, genetic diversity and conservation biology research in this species.
Year: 2020 PMID: 33004908 PMCID: PMC7530656 DOI: 10.1038/s41598-020-73289-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Full-length transcriptome sequencing data.
| cDNA size (kb) | CD | MCDL | ROI | MRLI | MRQI | NFNR | AFNRL | FLP (%) |
|---|---|---|---|---|---|---|---|---|
| 1–2 | 1,629,787 | 1799 | 128,208 | 1956 | 0.93 | 78,124 | 1509 | 61.21 |
| 2–3 | 1,314,986 | 2364 | 99,284 | 2143 | 0.93 | 57,066 | 2106 | 57.63 |
| 3–6 | 576,892 | 3708 | 61,636 | 3623 | 0.91 | 32,875 | 3488 | 53.55 |
| All | 3,521,665 | 2177 | 289,128 | 2664 | 0.92 | 168,065 | 7103 | 57.46 |
cDNA size: size of the inserted fragment used to build the library; CD: clean data; MCDL: mean length of clean data; ROI: read of insert; MRLI: mean read length of insert; MRQI: mean read quality of insert; NFNR: number of full-length nonchimeric reads; AFNRL: average full-length nonchimeric read length; FLP: full-length percentage.
ICE clustering statistics.
| Size (kb) | NCI | ACIRL | NPHI | PPHI (%) |
|---|---|---|---|---|
| 0–1 | 2176 | 915 | 1839 | 84.51 |
| 1–2 | 44,027 | 1547 | 35,739 | 81.18 |
| 2–3 | 21,372 | 2331 | 15,787 | 73.87 |
| 3–6 | 19,103 | 3629 | 11,354 | 59.44 |
| Above 6 | 326 | 9220 | 18 | 5.52 |
| All | 87,004 | 17,642 | 64,737 | 60.90 |
Size: length range of sequence statistics; NCI: number of consensus isoforms; ACIRL: average consensus isoform length; NPHI: number of polished high-quality isoforms; PPHI: percent of polished high-quality isoforms.
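As a quick check on how the PPHI column relates to the others, the per-size-bin values in the ICE clustering table can be reproduced as NPHI divided by NCI. The sketch below is only an illustration built from the tabulated counts; the names `ice_bins` and `pphi` are hypothetical and not taken from the study.

```python
# Per-size-bin counts copied from the ICE clustering statistics table:
# size range (kb) -> (NCI, NPHI). The "All" row is left out on purpose.
ice_bins = {
    "0-1": (2176, 1839),
    "1-2": (44027, 35739),
    "2-3": (21372, 15787),
    "3-6": (19103, 11354),
    "Above 6": (326, 18),
}

def pphi(nci, nphi):
    """Percent of polished high-quality isoforms among consensus isoforms."""
    return round(100 * nphi / nci, 2)

per_bin = {size: pphi(nci, nphi) for size, (nci, nphi) in ice_bins.items()}
total_nci = sum(nci for nci, _ in ice_bins.values())
total_nphi = sum(nphi for _, nphi in ice_bins.values())

for size, value in per_bin.items():
    print(f"{size}: {value:.2f}%")
print(f"Totals: NCI={total_nci}, NPHI={total_nphi}")
```

The bin totals recovered this way match the NCI and NPHI entries of the "All" row.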
Figure 1. GO annotations of P. wulianensis transcript sequences.
Figure 2. KOG functional classification of P. wulianensis transcript sequences.
Figure 3. KEGG metabolic categories in the P. wulianensis transcriptome.
SSR analysis statistics.
| Searching item | Number | Percentage (%) | SSR density |
|---|---|---|---|
| Total number of sequences examined | 42,311 | – | – |
| Total size of examined sequences (bp) | 89,101,859 | – | – |
| Total number of identified SSRs | 23,539 | – | – |
| Number of SSR-containing sequences | 16,057 | – | – |
| Number of sequences containing more than 1 SSR | 5050 | – | – |
| Number of SSRs present in compound form | 2506 | – | – |
| Mononucleotide | 12,520 | – | – |
| Dinucleotide | 5157 | 46.80 | 44.01 |
| Trinucleotide | 5267 | 47.80 | 47.34 |
| Tetranucleotide | 328 | 2.98 | 2.93 |
| Pentanucleotide | 93 | 0.84 | 0.83 |
| Hexanucleotide | 174 | 1.58 | 1.66 |
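The percentages in this table are computed with mononucleotide repeats excluded from the denominator, which is why they sum to 100% over the di- to hexanucleotide classes only. A minimal sketch of that arithmetic, using the counts tabulated above (the names `ssr_counts` and `percentages_excluding_mono` are illustrative, not from the study):

```python
# SSR counts per motif length, copied from the SSR analysis statistics table.
ssr_counts = {
    "Mononucleotide": 12520,
    "Dinucleotide": 5157,
    "Trinucleotide": 5267,
    "Tetranucleotide": 328,
    "Pentanucleotide": 93,
    "Hexanucleotide": 174,
}

def percentages_excluding_mono(counts):
    """Share of each motif class among all SSRs other than mononucleotides."""
    non_mono = {k: v for k, v in counts.items() if k != "Mononucleotide"}
    total = sum(non_mono.values())
    return {k: round(100 * v / total, 2) for k, v in non_mono.items()}

shares = percentages_excluding_mono(ssr_counts)
for motif_class, share in shares.items():
    print(f"{motif_class}: {share:.2f}%")
```

Summing all six counts gives the 23,539 SSRs reported in total, and the rounded shares agree with the Percentage column.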
Repeat type and proportion of SSRs.
| Searching item | Number of repeat types | Major repeat types (count, % of SSRs excluding mononucleotides) | Percentage of repeat types (%) |
|---|---|---|---|
| Dinucleotide | 4 | AG/CT (3788, 34.38%), AT/AT (774, 7.02%), AC/GT (570, 5.17%), CG/CG (25, 0.23%) | 3.08 |
| Trinucleotide | 10 | AAG/CTT (1138, 10.33%), AGC/CTG (1078, 9.78%), AGG/CCT (735, 6.67%), ACC/GGT (616, 5.59%), AAT/ATT (528, 4.79%) | 7.69 |
| Tetranucleotide | 23 | AAAG/CTTT (72, 0.65%), AGGG/CCCT (66, 0.60%), AAAT/ATTT (58, 0.53%), AAGG/CCTT (25, 0.23%), ACAT/ATGT (16, 0.15%) | 17.69 |
| Pentanucleotide | 19 | AAAAG/CTTTT (22, 0.20%), AGAGG/CCTCT (17, 0.15%), AAGAG/CTCTT (10, 0.09%), AAAAT/ATTTT (7, 0.06%), AGGGG/CCCCT (6, 0.05%) | 14.62 |
| Hexanucleotide | 74 | AACAGC/CTGTTG (12, 0.11%), AAATAC/ATTTGT (8, 0.07%), ACCGCC/CGGTGG (7, 0.06%), AAAAAT/ATTTTT (6, 0.05%), ACCATC/ATGGTG (6, 0.05%) | 56.92 |
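Labels such as AG/CT or AGC/CTG group a motif with its rotations and its reverse complement, since these all describe the same repeat read from a different start position or strand. The sketch below shows one way such a label could be derived; it is an assumption about the grouping convention inferred from the labels in the table, not the tool or code used in the study, and the function names are hypothetical.

```python
# Complement map for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def min_rotation(motif):
    """Lexicographically smallest cyclic rotation of a motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))

def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]

def motif_class(motif):
    """Strand- and rotation-independent label, e.g. 'TC' -> 'AG/CT'."""
    motif = motif.upper()
    forward = min_rotation(motif)
    reverse = min_rotation(reverse_complement(motif))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"

# All dinucleotide motifs that are not a single repeated base.
dinucleotide_classes = {
    motif_class(a + b) for a in "ACGT" for b in "ACGT" if a != b
}
print(sorted(dinucleotide_classes))
```

Under this grouping there are four possible dinucleotide classes and ten possible trinucleotide classes, which is consistent with the "Number of repeat types" reported for those two rows.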
Figure 4. PIC values of 18 polymorphic EST-SSR markers.
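For readers unfamiliar with PIC (polymorphism information content), the sketch below computes it from the allele frequencies at a single marker. It assumes the commonly used Botstein et al. definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the record does not state which formula or software the authors used, and the function name `pic` and the example frequencies are illustrative only.

```python
def pic(allele_freqs):
    """Polymorphism information content for one marker.

    Assumes the Botstein et al. definition:
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    n = len(allele_freqs)
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross_term

# Hypothetical marker with four equally frequent alleles.
print(round(pic([0.25, 0.25, 0.25, 0.25]), 4))
```

Under this definition a marker with a single allele has a PIC of 0, and the value rises as more alleles occur at more even frequencies.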