Yanzhi Feng, Yang Zhao, Jiajia Zhang, Baoping Wang, Chaowei Yang, Haijiang Zhou, Jie Qiao.
Abstract
Paulownia catalpifolia is an important, fast-growing timber species known for its high density, color and texture. However, few transcriptomic and genetic studies have been conducted in P. catalpifolia. In this study, single-molecule real-time sequencing technology was applied to obtain the full-length transcriptome of P. catalpifolia leaves treated with varying degrees of drought stress. The sequencing data were then used to search for microsatellites, or simple sequence repeats (SSRs). A total of 28.83 Gb of data were generated. After removal of redundant reads, 25,969 high-quality (HQ) transcripts with an average length of 1624 bp were obtained, of which 25,602 (98.59%) were annotated using public databases. Among the HQ transcripts, 16,722 intact coding sequences, 149 long non-coding RNAs and 179 alternative splicing events were predicted. A total of 7367 SSR loci were distributed across 6293 HQ transcripts, comprising 763 complex SSRs and 6604 complete SSRs. The SSR appearance frequency was 28.37%, and the average distribution distance was 5.59 kb. Among the 6604 complete SSR loci, repeats of 1-3 nucleotides were dominant, accounting for 97.85% of these loci; mono-, di- and tri-nucleotide repeats made up 44.68%, 33.86% and 19.31%, respectively. We detected 112 repeat motifs. The most common were A/T (42.64%) among mono-nucleotide repeats, AG/CT (12.22%) and GA/TC (9.63%) among di-nucleotide repeats, and GAA/TTC (1.57%) and CCA/TGG (1.54%) among tri-nucleotide repeats. SSR lengths ranged from 10 to 88 bp, and 4997 (75.67%) were ≤ 20 bp. This study provides a novel full-length transcriptome reference for P. catalpifolia and will facilitate the identification of germplasm resources and the breeding of new drought-resistant P. catalpifolia varieties.
Year: 2021 PMID: 33888729 PMCID: PMC8062547 DOI: 10.1038/s41598-021-87538-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
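
The abstract describes searching the HQ transcripts for SSRs but does not name the tool or thresholds. The following is a minimal sketch of perfect-SSR detection with regular expressions, not the authors' pipeline. The minimum repeat counts in `MIN_REPEATS` are an assumption (common MISA-style defaults), and `find_ssrs` and `is_primitive` are hypothetical helper names. Compound SSRs and overlapping hits are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (1-6 nt). These mirror
# common MISA-style defaults; the thresholds actually used are not stated here.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x 6, which is really a mono-nucleotide repeat.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (not real data): A x 12, AG x 7 and GAA x 5 with short spacers.
toy = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TTC" + "GAA" * 5 + "C"
print(find_ssrs(toy))
# [(3, 'A', 12), (18, 'AG', 7), (35, 'GAA', 5)]
```

Start positions are 0-based. Trying each motif length separately and discarding non-primitive motifs keeps a run such as `AAAAAAAAAAAA` from being reported a second time as a di-nucleotide repeat.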
Figure 1. SMRT sequencing of P. catalpifolia leaf transcriptomes. (a) Length distribution of CCSs. (b) Length distribution of FLNCs. (c) Length distribution of high-quality transcripts. The figure was made with Microsoft Office Excel 2013.
Results of the functional annotation of 25,969 HQ transcripts.
| Database | Number of HQ transcripts | Percentage (%) |
|---|---|---|
| Annotated in NR | 25,591 | 98.54 |
| Annotated in GO | 18,501 | 71.24 |
| Annotated in KOG | 12,350 | 47.56 |
| Annotated in Swiss-Prot | 22,606 | 87.05 |
| Annotated in KEGG | 13,829 | 53.25 |
| Unannotated | 367 | 1.41 |
| Total HQ isoforms | 25,969 | 100 |
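
The percentages in the annotation table follow directly from the counts. As a quick consistency check, the sketch below recomputes them from the table values, assuming each percentage is the count divided by the 25,969 HQ transcripts; the variable names are illustrative only.

```python
# Counts copied from the functional annotation table.
total_hq = 25_969
annotated = {
    "NR": 25_591,
    "GO": 18_501,
    "KOG": 12_350,
    "Swiss-Prot": 22_606,
    "KEGG": 13_829,
}
unannotated = 367

# Percentage of HQ transcripts annotated in each database.
percent = {db: round(100 * n / total_hq, 2) for db, n in annotated.items()}
for db, p in percent.items():
    print(f"{db}: {p}")

# Transcripts annotated in at least one database.
annotated_any = total_hq - unannotated
print(annotated_any, round(100 * annotated_any / total_hq, 2))  # 25602 98.59
```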
Figure 2. Homologous species distribution of P. catalpifolia HQ transcripts. The figure was made with Microsoft Office Excel 2013.
Figure 3. Gene ontology (GO), eukaryotic orthologous groups (KOG) and Kyoto encyclopedia of genes and genomes (KEGG) functional classifications of high-quality (HQ) transcripts. (a) GO classification of HQ transcripts. (b) KOG classification of HQ transcripts. (c) KEGG classification of HQ transcripts.
Figure 4. Identification of lncRNAs and protein length distribution in the P. catalpifolia transcriptome. (a) Venn diagram of the number of lncRNAs predicted by CPC2, CPAT, PLEK and CNCI. (b) Length distribution of the proteins translated from the predicted intact CDSs.
Occurrence of microsatellites in the full-length transcriptome of P. catalpifolia.
| Item | Number |
|---|---|
| Total number of HQ transcripts examined | 25,969 |
| Total size of the examined HQ transcripts (bp) | 42,183,906 |
| Total number of HQ transcripts containing SSRs | 6293 |
| Total number of SSRs identified | 7367 |
| Total number of complex SSRs identified | 763 |
| Number of HQ transcripts containing more than one SSR | 747 |
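
The summary statistics quoted in the abstract can be derived from this table. The sketch below assumes that the SSR appearance frequency is the number of SSR loci per 100 HQ transcripts and that the average distribution distance is the total examined length divided by the number of SSR loci. Both definitions are assumptions that happen to reproduce the reported values. Dividing the spacing in bp by 1000 gives about 5.73 kb; the reported 5.59 kb is obtained when dividing by 1024, so the unit convention used is also an assumption.

```python
# Counts copied from the SSR occurrence table.
n_transcripts = 25_969
total_bp = 42_183_906
n_ssr = 7_367
n_complex = 763

n_complete = n_ssr - n_complex            # complete (non-complex) SSRs
ssr_frequency = 100 * n_ssr / n_transcripts
bp_per_ssr = total_bp / n_ssr             # average spacing in bp

print(n_complete)                         # 6604
print(round(ssr_frequency, 2))            # 28.37
print(round(bp_per_ssr))                  # 5726
print(round(bp_per_ssr / 1024, 2))        # 5.59 (assuming 1 kb = 1024 bp)
```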
Figure 5. Types and numbers of complete SSRs in P. catalpifolia. The figure was made with Microsoft Office Excel 2013.
The six types of SSR repeat motifs and their frequency in P. catalpifolia. Columns 5 to > 12 give the repeat number.
| Repeat motif | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | > 12 | Total number | Frequency (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A/T | | | | | | 704 | 577 | 403 | 1132 | 2816 | 42.64 |
| C/G | | | | | | 16 | 26 | 18 | 75 | 135 | 2.04 |
| AC/GT | | 35 | 28 | 32 | 44 | 27 | 13 | 14 | 41 | 234 | 3.54 |
| CA/TG | | 78 | 33 | 33 | 15 | 7 | 16 | 5 | 28 | 215 | 3.26 |
| AG/CT | | 230 | 121 | 93 | 52 | 58 | 46 | 37 | 170 | 807 | 12.22 |
| GA/TC | | 180 | 96 | 66 | 56 | 48 | 38 | 22 | 130 | 636 | 9.63 |
| AT/AT | | 45 | 22 | 19 | 31 | 13 | 14 | 14 | 13 | 171 | 2.59 |
| TA/TA | | 38 | 34 | 29 | 21 | 12 | 10 | 9 | 12 | 165 | 2.50 |
| GC/GC | 2 | 2 | 4 | 0.06 | |||||||
| CG/CG | 4 | 4 | 0.06 | ||||||||
| AAC/GTT | 2 | 3 | 1 | 1 | 7 | 0.11 | |||||
| AAG/CTT | 49 | 11 | 5 | 5 | 1 | 1 | 4 | 1 | 1 | 78 | 1.18 |
| AAT/ATT | 8 | 5 | 5 | 1 | 1 | 1 | 21 | 0.32 | |||
| ACA/TGT | 4 | 1 | 5 | 0.08 | |||||||
| ACC/GGT | 26 | 14 | 5 | 2 | 1 | 48 | 0.73 | ||||
| ACG/CGT | 1 | 2 | 3 | 6 | 0.09 | ||||||
| ACT/AGT | 12 | 4 | 1 | 17 | 0.26 | ||||||
| AGA/TCT | 42 | 14 | 5 | 7 | 5 | 2 | 4 | 1 | 2 | 82 | 1.24 |
| AGC/GCT | 34 | 5 | 5 | 5 | 1 | 50 | 0.76 | ||||
| AGG/CCT | 27 | 8 | 5 | 1 | 41 | 0.62 | |||||
| ATA/TAT | 11 | 2 | 1 | 1 | 1 | 16 | 0.24 | ||||
| ATC/GAT | 26 | 5 | 9 | 1 | 1 | 42 | 0.64 | ||||
| ATG/CAT | 34 | 17 | 1 | 5 | 3 | 60 | 0.91 | ||||
| CAA/TTG | 13 | 2 | 2 | 2 | 1 | 20 | 0.30 | ||||
| CAC/GTG | 30 | 17 | 6 | 5 | 2 | 60 | 0.91 | ||||
| CAG/CTG | 40 | 12 | 3 | 19 | 4 | 2 | 1 | 81 | 1.23 | ||
| CCA/TGG | 72 | 17 | 4 | 5 | 3 | 1 | 102 | 1.54 | |||
| CCG/CGG | 34 | 24 | 11 | 5 | 4 | 78 | 1.18 | ||||
| CGA/TCG | 2 | 6 | 8 | 0.12 | |||||||
| CGC/GCG | 26 | 1 | 2 | 5 | 1 | 35 | 0.53 | ||||
| CTA/TAG | 9 | 1 | 10 | 0.15 | |||||||
| CTC/GAG | 38 | 16 | 4 | 1 | 1 | 60 | 0.91 | ||||
| GAA/TTC | 50 | 25 | 13 | 8 | 2 | 3 | 1 | 1 | 1 | 104 | 1.57 |
| GAC/GTC | 7 | 2 | 1 | 10 | 0.15 | ||||||
| GCA/TGC | 24 | 5 | 6 | 5 | 40 | 0.61 | |||||
| GCC/GGC | 35 | 18 | 4 | 1 | 58 | 0.88 | |||||
| GGA/TCC | 26 | 10 | 6 | 3 | 3 | 48 | 0.73 | ||||
| GTA/TAC | 1 | 1 | 2 | 0.03 | |||||||
| TAA/TTA | 6 | 5 | 1 | 5 | 1 | 18 | 0.27 | ||||
| TCA/TGA | 49 | 6 | 6 | 6 | 1 | 68 | 1.03 | ||||
| ATCA/TGAT | 1 | 1 | 0.02 | ||||||||
| TTTG/CAAA | 3 | 3 | 0.05 | ||||||||
| AAAT/ATTT | 2 | 1 | 3 | 0.05 | |||||||
| GGAA/TTCC | 1 | 1 | 0.02 | ||||||||
| CCCT/AGGG | 1 | 7 | 8 | 0.12 | |||||||
| TTTA/TAAA | 2 | 2 | 0.03 | ||||||||
| TGTA/TACA | 1 | 1 | 0.02 | ||||||||
| TTCT/AGAA | 2 | 2 | 0.03 | ||||||||
| ACAG/CTGT | 2 | 2 | 0.03 | ||||||||
| GAAA/TTTC | 2 | 1 | 3 | 0.05 | |||||||
| TGAA/TTCA | 4 | 4 | 0.06 | ||||||||
| TCTT/AAGA | 1 | 1 | 0.02 | ||||||||
| ATGT/ACAT | 1 | 1 | 0.02 | ||||||||
| CGTG/CACG | 1 | 1 | 0.02 | ||||||||
| GATT/AATC | 4 | 4 | 0.06 | ||||||||
| TCTA/TAGA | 1 | 1 | 0.02 | ||||||||
| CTTT/AAAG | 2 | 1 | 3 | 0.05 | |||||||
| ATAC/GTAT | 1 | 1 | 0.02 | ||||||||
| TTGT/ACAA | 1 | 1 | 0.02 | ||||||||
| GCCC/GGGC | 1 | 1 | 0.02 | ||||||||
| GGAG/CTCC | 2 | 1 | 3 | 0.05 | |||||||
| CAAC/GTTG | 1 | 1 | 0.02 | ||||||||
| AATA/TATT | 1 | 1 | 0.02 | ||||||||
| AAAC/GTTT | 1 | 1 | 0.02 | ||||||||
| CCACC/GGTGG | 9 | 9 | 0.14 | ||||||||
| TGATG/CATCA | 1 | 1 | 0.02 | ||||||||
| TCCTC/GAGGA | 2 | 2 | 4 | 0.06 | |||||||
| CCACA/TGTGG | 1 | 1 | 0.02 | ||||||||
| CTTTT/AAAAG | 1 | 1 | 2 | 0.03 | |||||||
| CACTT/AAGTG | 1 | 1 | 0.02 | ||||||||
| TTCTT/AAGAA | 1 | 1 | 0.02 | ||||||||
| TATTT/AAATA | 1 | 1 | 0.02 | ||||||||
| CACCC/GGGTG | 1 | 1 | 0.02 | ||||||||
| CCCAC/GTGGG | 1 | 1 | 0.02 | ||||||||
| CTCTT/AAGAG | 1 | 1 | 0.02 | ||||||||
| AGCTT/AAGCT | 1 | 1 | 0.02 | ||||||||
| AAAAAG/CTTTTT | 2 | 2 | 0.03 | ||||||||
| AAGAGA/TCTCTT | 8 | 8 | 0.12 | ||||||||
| ACAGGG/CCCTGT | 2 | 2 | 0.03 | ||||||||
| ACTCCG/CGGAGT | 3 | 3 | 0.05 | ||||||||
| AGGAAA/TTTCCT | 1 | 1 | 0.02 | ||||||||
| AGGAGA/TCTCCT | 3 | 3 | 0.05 | ||||||||
| AGGCTC/GAGCCT | 2 | 2 | 0.03 | ||||||||
| ATGGGC/GCCCAT | 1 | 1 | 0.02 | ||||||||
| ATTTTC/GAAAAT | 3 | 3 | 0.05 | ||||||||
| CACCAG/CTGGTG | 2 | 2 | 0.03 | ||||||||
| CACCCC/GGGGTG | 1 | 1 | 0.02 | ||||||||
| CACGCA/TGCGTG | 1 | 1 | 0.02 | ||||||||
| CAGCAA/TTGCTG | 1 | 1 | 0.02 | ||||||||
| CATCTT/AAGATG | 1 | 1 | 0.02 | ||||||||
| CCATCT/AGATGG | 2 | 2 | 0.03 | ||||||||
| CCCACT/AGTGGG | 1 | 1 | 0.02 | ||||||||
| CCCTTT/AAAGGG | 1 | 1 | 0.02 | ||||||||
| CCGCCA/TGGCGG | 2 | 1 | 3 | 0.05 | |||||||
| CCGGGA/TCCCGG | 3 | 3 | 0.05 | ||||||||
| CCTCCC/GGGAGG | 3 | 3 | 0.05 | ||||||||
| CCTCTC/GAGAGG | 1 | 1 | 0.02 | ||||||||
| CCTCTT/AAGAGG | 1 | 1 | 0.02 | ||||||||
| CTCAAC/GTTGAG | 1 | 1 | 0.02 | ||||||||
| CTCCAC/GTGGAG | 1 | 1 | 0.02 | ||||||||
| CTCCAT/ATGGAG | 1 | 1 | 2 | 0.03 | |||||||
| GAACCA/TGGTTC | 2 | 2 | 0.03 | ||||||||
| GAGCCG/CGGCTC | 2 | 2 | 0.03 | ||||||||
| GAGGAT/ATCCTC | 1 | 1 | 0.02 | ||||||||
| GGAATG/CATTCC | 1 | 1 | 0.02 | ||||||||
| GGAGCA/TGCTCC | 1 | 1 | 0.02 | ||||||||
| GGTGGA/TCCACC | 1 | 1 | 0.02 | ||||||||
| TCCGCC/GGCGGA | 1 | 1 | 0.02 | ||||||||
| TCCTTT/AAAGGA | 1 | 1 | 0.02 | ||||||||
| TTTCTT/AAGAAA | 6 | 6 | 0.09 | ||||||||
| TTTTCT/AGAAAA | 1 | 1 | 0.02 | ||||||||
| TTTTGC/GCAAAA | 1 | 1 | 0.02 | ||||||||
| Total number | 834 | 901 | 456 | 378 | 247 | 897 | 757 | 527 | 1607 | 6604 | 100.00 |
| Frequency (%) | 12.63 | 13.64 | 6.90 | 5.72 | 3.74 | 13.58 | 11.46 | 7.98 | 24.33 | 100.00 | |
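
The "Frequency (%)" row can be recomputed from the "Total number" row. The sketch below is a simple check using only the column totals from the table, assuming each frequency is the column total divided by the 6604 complete SSRs; the variable names are illustrative.

```python
# Column totals copied from the "Total number" row (repeat numbers 5 to > 12).
repeat_labels = ["5", "6", "7", "8", "9", "10", "11", "12", "> 12"]
column_totals = [834, 901, 456, 378, 247, 897, 757, 527, 1607]

n_complete = sum(column_totals)
print(n_complete)  # 6604

# Share of complete SSRs at each repeat number, in percent.
frequencies = [round(100 * n / n_complete, 2) for n in column_totals]
for label, freq in zip(repeat_labels, frequencies):
    print(f"{label}: {freq:.2f}")
```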
Figure 6. Types and numbers of mononucleotide and dinucleotide repeat motifs.