Sharmin Hasan, Lichun Huang, Qiaoquan Liu, Virginie Perlo, Angela O'Keeffe, Gabriel Rodrigues Alves Margarido, Agnelo Furtado, Robert J Henry.
Abstract
BACKGROUND: High-throughput next-generation sequencing technologies offer a powerful approach to characterizing plant transcriptomes. Long-read sequencing supports the discovery of novel transcript isoforms by generating full-length sequences, which reveal splice variants that may be important in regulating gene action. To improve the annotation of the rice genome, the diversity of transcripts in the rice transcriptome, including splice variants, was investigated using PacBio long-read sequence data.
Keywords: Alternative splicing isoforms; Full-length transcripts; Iso-sequencing; Novel isoforms; Rice transcriptome; Splicing junctions
Year: 2022 PMID: 35689714 PMCID: PMC9188635 DOI: 10.1186/s12284-022-00577-1
Source DB: PubMed Journal: Rice (N Y) ISSN: 1939-8425 Impact factor: 5.638
PacBio SMRT sequencing data
| Metric | Value |
|---|---|
| Number of polymerase reads | 517,297 |
| Mean length of polymerase reads | 65,287 bp |
| Polymerase reads N50 | 134,505 bp |
| Number of subreads | 16,551,194 |
| Mean length of subreads | 1,997 bp |
| Number of circular consensus sequence (CCS) reads | 415,221 |
| Number of full-length (FL) reads | 357,925 |
| Number of full-length non-chimeric (FLNC) reads | 346,190 |
| Number of unpolished FL reads | 33,658 |
| Length of the longest unpolished FL read | 14,636 bp |
| Number of high quality (HQ) FLNC reads | 33,504 |
| Number of low quality (LQ) FLNC reads | 152 |
| Mean FLNC read length | 2,971.10 bp |
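The read counts in this table can be turned into per-stage retention fractions, which makes the losses between processing steps easier to compare. The following is a minimal sketch, not part of the original record: the dictionary keys, the `STAGES` list, and the `stage_retention` helper are hypothetical names, and treating each stage as derived from the one listed before it is an assumption about the pipeline.

```python
# Read counts copied from the PacBio SMRT sequencing data table.
counts = {
    "polymerase_reads": 517_297,
    "ccs_reads": 415_221,
    "fl_reads": 357_925,
    "flnc_reads": 346_190,
}

# Assumed order of processing stages (the order in which the table lists them).
STAGES = ["polymerase_reads", "ccs_reads", "fl_reads", "flnc_reads"]


def stage_retention(counts, stages=STAGES):
    """Return {(earlier, later): fraction of earlier-stage reads retained}."""
    return {
        (earlier, later): counts[later] / counts[earlier]
        for earlier, later in zip(stages, stages[1:])
    }


if __name__ == "__main__":
    for (earlier, later), fraction in stage_retention(counts).items():
        print(f"{earlier} -> {later}: {fraction:.1%}")
```

Under these assumptions, roughly 86% of CCS reads are classified as full-length, and about 97% of full-length reads are non-chimeric.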
Fig. 1 PacBio transcriptome characterization: a quantification of total genes, comprising annotated and novel genes; b distribution of transcripts across annotated and novel genes. Annotated genes: annotated in the reference genome; Novel genes: not annotated in the reference genome
Fig. 2 Characterization of isoform categories of the PacBio transcriptome: a percentage of the PacBio transcriptome in different isoform categories (FSM full splice match, ISM incomplete splice match, NIC Novel In Catalog, NNC Novel Not in Catalog, Genic Genomic, Antisense, Fusion, and Intergenic); b structure of the different isoform categories; c length of PacBio transcripts by isoform category
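The four splice-based categories named in this caption can be illustrated with a simplified classifier over splice-junction chains. This is a sketch only: the category names match those used in SQANTI-style structural classification, but whether the authors' pipeline applies exactly these rules is an assumption. The function names, the junction representation as (donor, acceptor) coordinate pairs, and the rules for genic, antisense, fusion, and intergenic isoforms (omitted here) are all hypothetical simplifications.

```python
from typing import List, Tuple

# A splice junction as (donor coordinate, acceptor coordinate).
Junction = Tuple[int, int]


def is_contiguous_subchain(query: List[Junction], ref: List[Junction]) -> bool:
    """True if `query` appears as a consecutive run of junctions inside `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(query: List[Junction], references: List[List[Junction]]) -> str:
    """Assign a multi-exon isoform to FSM, ISM, NIC, or NNC (simplified rules)."""
    if not query:
        return "mono-exon"  # no junctions to compare; outside this sketch
    # FSM: the junction chain is identical to a reference transcript.
    if any(query == ref for ref in references):
        return "FSM"
    # ISM: a shorter, consecutive part of a reference junction chain.
    if any(len(query) < len(ref) and is_contiguous_subchain(query, ref)
           for ref in references):
        return "ISM"
    # NIC: every donor and acceptor is annotated, but the combination is new.
    known_donors = {d for ref in references for d, _ in ref}
    known_acceptors = {a for ref in references for _, a in ref}
    if all(d in known_donors and a in known_acceptors for d, a in query):
        return "NIC"
    # NNC: at least one donor or acceptor site is not annotated.
    return "NNC"


if __name__ == "__main__":
    # Hypothetical reference transcript with three introns.
    reference = [[(100, 200), (300, 400), (500, 600)]]
    print(classify([(100, 200), (300, 400), (500, 600)], reference))
    print(classify([(300, 400), (500, 600)], reference))
    print(classify([(100, 400), (500, 600)], reference))
    print(classify([(100, 250), (300, 400), (500, 600)], reference))
```

The coordinates in the usage example are made up for illustration and do not correspond to any rice gene.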
Distribution of isoforms with length (in bp) in 19 starch synthesis related genes
| Starch synthesis related gene | Total number of isoforms | Transcript length (bp) | FSM | ISM | NIC | NNC | Fusion | Genic genomic |
|---|---|---|---|---|---|---|---|---|
| AGPL2 | 19 | 261–2459 | – | 5 | – | 14 | – | – |
| GBSSI | 28 | 86–3615 | 12 | 10 | 1 | 5 | – | – |
| GBSSII | 19 | 2039–2976 | 15 | – | 4 | – | – | – |
| SSI | 9 | 1764–2800 | 8 | 1 | – | – | – | – |
| SSIIa | 9 | 1480–2934 | – | 2 | – | 7 | – | – |
| SSIIb | 5 | 2385–2447 | – | 4 | 1 | – | – | |
| SSIIc | 3 | 2888–3057 | – | – | – | 3 | – | – |
| SSIIIa | 51 | 2175–7791 | 34 | 4 | 3 | 3 | 7 | – |
| SSIIIb | 7 | 1978–5047 | 3 | 1 | 3 | – | – | – |
| SSIVa | 1 | 3643 | 1 | – | – | – | – | – |
| SSIVb | 2 | 3316–4797 | 1 | – | 1 | – | – | – |
| BEI | 45 | 127–5352 | 3 | 30 | 2 | 10 | – | – |
| BEIIa | 5 | 993–1208 | 5 | – | – | – | – | – |
| BEIIb | 28 | 143–5077 | – | 11 | – | 16 | – | 1 |
| ISA1 | 8 | 2068–3010 | 6 | 2 | – | – | – | – |
| ISA2 | 4 | 2619–2888 | 4 | – | – | – | – | – |
| PUL | 15 | 278–4771 | – | 2 | – | 13 | – | – |
| GPT1 | 3 | 1723–1830 | 3 | – | – | – | – | |
| PHOL | 15 | 1203–3439 | 7 | 8 | – | – | – | – |
AGPL2—ADP-glucose pyrophosphorylase large subunit 2, GBSSI—Granule-bound starch synthase I, GBSSII—Granule-bound starch synthase II, SSI—Soluble starch synthase I, SSIIa—Soluble starch synthase IIa, SSIIb—Soluble starch synthase IIb, SSIIc—Soluble starch synthase IIc, SSIIIa—Soluble starch synthase III-2, SSIIIb—Soluble starch synthase III-1, SSIVa—Soluble starch synthase IV-1, SSIVb—Soluble starch synthase IV-2, BEI—Starch branching enzyme I, BEIIa—Starch branching enzyme IIa, BEIIb—Starch branching enzyme IIb, ISA1—Isoamylase 1, ISA2—Isoamylase 2, PUL—Pullulanase, GPT1—Glucose-6-phosphate translocator, PHOL—Starch phosphorylase
FSM full splice match, ISM incomplete splice match, NIC Novel In Catalog, NNC Novel Not in Catalog
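As a quick consistency check on the table above, the per-category counts in each row should add up to that row's total number of isoforms. The sketch below is an illustration rather than part of the original record: rows are entered in table order without gene names, dashes are entered as 0, the two rows that show only five category cells in the source are entered with five values, and the names `ROWS` and `mismatched_rows` are hypothetical.

```python
# (total isoforms, category counts in the order listed) for each of the
# 19 table rows, top to bottom. A dash in the table is entered as 0.
ROWS = [
    (19, [0, 5, 0, 14, 0, 0]),
    (28, [12, 10, 1, 5, 0, 0]),
    (19, [15, 0, 4, 0, 0, 0]),
    (9, [8, 1, 0, 0, 0, 0]),
    (9, [0, 2, 0, 7, 0, 0]),
    (5, [0, 4, 1, 0, 0]),        # only five category cells in the source
    (3, [0, 0, 0, 3, 0, 0]),
    (51, [34, 4, 3, 3, 7, 0]),
    (7, [3, 1, 3, 0, 0, 0]),
    (1, [1, 0, 0, 0, 0, 0]),
    (2, [1, 0, 1, 0, 0, 0]),
    (45, [3, 30, 2, 10, 0, 0]),
    (5, [5, 0, 0, 0, 0, 0]),
    (28, [0, 11, 0, 16, 0, 1]),
    (8, [6, 2, 0, 0, 0, 0]),
    (4, [4, 0, 0, 0, 0, 0]),
    (15, [0, 2, 0, 13, 0, 0]),
    (3, [3, 0, 0, 0, 0]),        # only five category cells in the source
    (15, [7, 8, 0, 0, 0, 0]),
]


def mismatched_rows(rows):
    """Return 1-based indices of rows whose category counts do not sum to the total."""
    return [i for i, (total, cats) in enumerate(rows, start=1) if sum(cats) != total]


if __name__ == "__main__":
    print("rows with mismatched sums:", mismatched_rows(ROWS))
    print("isoforms across all rows:", sum(total for total, _ in ROWS))
```

Because the check only sums the cells in each row, it does not depend on which column the missing cell in the two short rows belongs to.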
Fig. 3 Structure of isoforms for starch synthesis related genes: a variation in the composition of isoforms of 19 starch synthesis related genes; b visualization of the 28 alternative splicing isoforms of the GBSSI (granule-bound starch synthase I) gene, shown as an example. Grey panel: read coverage; Brown panel: splice junctions; Pink panel: alignment track representing the 28 spliced isoforms of the GBSSI gene; I: insertion; Blue arrow: indicates the position of the insertion; Blue panel: reference transcripts of GBSSI
Fig. 4 Distribution of alternative splice junctions across all isoform features
Fig. 5 Top ten GO terms represented by the highest number of transcripts in the cellular component, molecular function, and biological process categories of gene ontology annotation for all aligned transcript isoforms