Sarah Clark1, Feng Yu2, Lianfeng Gu3, Xiang Jia Min1.
Abstract
Tomato (Solanum lycopersicum) is an important vegetable and fruit crop. Its genome has been completely sequenced, and a large number of expressed sequence tags (ESTs) and short reads generated by RNA sequencing (RNA-seq) technologies are also available. Mapping transcripts, including mRNA sequences, ESTs, and RNA-seq reads, to the genome allows the identification of pre-mRNA alternative splicing (AS), a post-transcriptional process that generates two or more RNA isoforms from one pre-mRNA transcript. We comprehensively analyzed the AS landscape in tomato by integrating genome mapping information for all available mRNAs and ESTs with mapping information for RNA-seq reads collected from 27 published projects. A total of 369,911 AS events were identified from 34,419 genomic loci involving 161,913 transcripts. Among the basic AS events, intron retention was the most prevalent type (18.9%), followed by alternative acceptor site (12.9%) and alternative donor site (7.3%), with exon skipping the least frequent type (6.0%). Complex AS types comprising two or more basic events accounted for 54.9% of total AS events. Of 35,768 annotated protein-coding gene models, 23,233 were found to have pre-mRNAs generating AS isoform transcripts. Thus, the estimated AS rate in tomato was 65.0%. The list of identified AS genes with their corresponding transcript isoforms serves as a catalog for further detailed examination of gene functions in tomato biology. The post-transcriptional information is also expected to be useful in improving the predicted gene models in tomato. The sequence and annotation information can be accessed at the plant alternative splicing database (http://proteomics.ysu.edu/altsplice).
Keywords: Solanum lycopersicum; alternative splicing; gene expression; mRNA; plant; tomato; transcriptome
Year: 2019 PMID: 31191588 PMCID: PMC6546887 DOI: 10.3389/fpls.2019.00689
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
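The percentages quoted in the abstract can be reproduced with simple arithmetic. The sketch below is only a consistency check, not the authors' pipeline. It assumes that the "Sample size" values for retained introns, alternative acceptor sites, alternative donor sites, and skipped exons in the length summary table equal the number of events of each basic type; the variable and function names are illustrative.

```python
# Figures quoted in the abstract.
annotated_gene_models = 35_768
gene_models_with_as = 23_233
total_as_events = 369_911

# Assumption: these "Sample size" values from the length summary table
# are the event counts for each basic AS type.
basic_event_counts = {
    "intron retention": 70_035,
    "alternative acceptor site": 47_721,
    "alternative donor site": 26_973,
    "exon skipping": 22_161,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# AS rate: share of annotated gene models with at least one AS isoform.
as_rate = percent(gene_models_with_as, annotated_gene_models)

# Share of each basic event type among all AS events.
basic_shares = {
    name: percent(count, total_as_events)
    for name, count in basic_event_counts.items()
}

# Complex events are whatever remains after the four basic types.
complex_events = total_as_events - sum(basic_event_counts.values())
complex_share = percent(complex_events, total_as_events)

print(as_rate, basic_shares, complex_share)
```

Under that assumption the script yields an AS rate of 65.0%, basic-type shares of 18.9%, 12.9%, 7.3%, and 6.0%, and a complex-event share of 54.9%, which agree with the abstract.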
Basic features of the assembled unique transcripts in tomato plants.
| Feature | Value |
|---|---|
| Total unique transcripts | 533707 |
| Average transcript length (bp) | 1350 |
| Total genomic loci with at least one transcript | 260681 |
| Transcripts matching with gene model cDNAs | 260365 |
| Unique gene model cDNAs matching with transcripts | 34522 |
| Transcripts having a BLASTX match against Swiss-Prot database | 226881 |
| Total predicted ORFs from assembled transcripts | 518307 |
| Average length of predicted ORFs (amino acids) | 215 |
| Predicted full-length ORFs | 182325 |
| Predicted ORFs having a PFAM match | 176234 |
Summary of alternative splicing events in each chromosome of tomato plants.
| Chromosome | AltA | AltA (%) | AltD | AltD (%) | ExonS | ExonS (%) | IntronR | IntronR (%) | Others | Others (%) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chr0 | 188 | 11.8 | 108 | 6.8 | 55 | 3.4 | 451 | 28.3 | 794 | 49.7 | 1596 |
| Chr1 | 6410 | 13.5 | 3592 | 7.6 | 2923 | 6.2 | 9269 | 19.5 | 25276 | 53.2 | 47470 |
| Chr2 | 5396 | 10.1 | 3156 | 5.9 | 2379 | 4.5 | 8324 | 15.6 | 34184 | 64.0 | 53439 |
| Chr3 | 5168 | 13.2 | 2856 | 7.3 | 2274 | 5.8 | 7432 | 19.0 | 21476 | 54.8 | 39206 |
| Chr4 | 3988 | 14.0 | 2228 | 7.8 | 1851 | 6.5 | 5567 | 19.5 | 14926 | 52.3 | 28560 |
| Chr5 | 3026 | 13.4 | 1818 | 8.0 | 1434 | 6.3 | 4488 | 19.8 | 11883 | 52.5 | 22649 |
| Chr6 | 4077 | 14.1 | 2173 | 7.5 | 1758 | 6.1 | 5938 | 20.5 | 15071 | 51.9 | 29017 |
| Chr7 | 3379 | 12.3 | 1855 | 6.7 | 1661 | 6.0 | 5106 | 18.6 | 15506 | 56.4 | 27507 |
| Chr8 | 3355 | 13.3 | 1839 | 7.3 | 1537 | 6.1 | 4979 | 19.8 | 13426 | 53.4 | 25136 |
| Chr9 | 3281 | 12.8 | 1995 | 7.8 | 1600 | 6.3 | 4896 | 19.2 | 13786 | 53.9 | 25558 |
| Chr10 | 2883 | 13.6 | 1572 | 7.4 | 1378 | 6.5 | 4509 | 21.3 | 10811 | 51.1 | 21153 |
| Chr11 | 3254 | 13.0 | 1876 | 7.5 | 1630 | 6.5 | 4383 | 17.5 | 13973 | 55.6 | 25116 |
| Chr12 | 3316 | 14.1 | 1905 | 8.1 | 1681 | 7.2 | 4693 | 20.0 | 11909 | 50.7 | 23504 |
Summary of alternative splicing events identified in EST and mRNA assembly dataset and each RNA-seq dataset in tomato plants.
| Data | AltA | AltD | ExonS | IntronR | Others |
|---|---|---|---|---|---|
| ESTs and mRNAs | 13738 | 7779 | 9355 | 17255 | 32448 |
| RNA-seq projects | | | | | |
| Alkan (2014) | 2664 | 1841 | 1367 | 3966 | 3057 |
| Chen (2013) | 1425 | 1068 | 711 | 2238 | 1528 |
| Cruz-Mendivil (2015) | 122 | 130 | 88 | 329 | 448 |
| Dai (2017) | 2075 | 1440 | 992 | 2574 | 1920 |
| Du (2015) | 2557 | 1873 | 1273 | 3851 | 2878 |
| Ezura (2017) | 254 | 251 | 88 | 3698 | 1658 |
| Gupta (2013) | 2619 | 1909 | 1375 | 4723 | 3437 |
| Higashi (2016) | 1773 | 1256 | 967 | 7274 | 2916 |
| Koenig (2013) | 1791 | 1376 | 1005 | 2774 | 1998 |
| Lopez-Casado (2011) | 565 | 521 | 353 | 995 | 889 |
| Shi (2013) | 1773 | 1431 | 972 | 4806 | 2810 |
| Shukla (2017) | 18581 | 9519 | 7032 | 20625 | 29217 |
| Sun (2015) | 5309 | 3255 | 1876 | 5925 | 4530 |
| Sundaresan (2016) | 1857 | 1286 | 920 | 2461 | 1802 |
| Tan (2015) | 1268 | 954 | 656 | 1863 | 1287 |
| Tang (2013) | 1353 | 1057 | 816 | 1813 | 1401 |
| Wang (2013) | 5324 | 3022 | 1945 | 5945 | 4683 |
| Wang (2016) | 3603 | 2264 | 1857 | 5079 | 4818 |
| Worley (2016) | 1921 | 1308 | 1078 | 3769 | 2402 |
| Xue (2017) | 6455 | 3602 | 3246 | 8886 | 9113 |
| Yang (2015) | 2093 | 1356 | 1257 | 3259 | 2041 |
| Ye (2015) | 1389 | 1127 | 762 | 3678 | 2112 |
| Zhang (2016) | 6109 | 3399 | 2727 | 7825 | 6880 |
| Zhang (2017) | 2493 | 1623 | 1385 | 3121 | 2396 |
| Zheng (2017) | 412 | 376 | 267 | 972 | 683 |
| Zouari (2014) | 2679 | 1820 | 1436 | 3878 | 3091 |
| Zouine (2014) | 2853 | 1935 | 1357 | 3278 | 2453 |
Protein families in gene models and alternatively spliced genes in tomato plants∗.
| Pfam ID | Total | AS genes | % | Pfam abbreviation | Pfam description |
|---|---|---|---|---|---|
| pfam00069 | 631 | 503 | 79.7 | Pkinase | Protein kinase domain |
| pfam07714 | 467 | 388 | 83.1 | Pkinase_Tyr | Protein tyrosine kinase |
| pfam00067 | 337 | 216 | 64.1 | p450 | Cytochrome P450 |
| pfam13041 | 316 | 247 | 78.2 | PPR_2 | PPR repeat family |
| pfam13639 | 223 | 140 | 62.8 | zf-RING_2 | Ring finger domain |
| pfam00931 | 217 | 151 | 69.6 | NB-ARC | NB-ARC domain |
| pfam00076 | 179 | 162 | 90.5 | RRM_1 | RNA recognition motif |
| pfam00249 | 178 | 111 | 62.4 | Myb_DNA-binding | Myb-like DNA-binding domain |
| pfam00201 | 169 | 86 | 50.9 | UDPGT | UDP-glucoronosyl and UDP-glucosyl transferase |
| pfam10536 | 167 | 101 | 60.5 | PMD | Plant mobile domain |
| pfam03171 | 147 | 87 | 59.2 | 2OG-FeII_Oxy | 2OG-Fe(II) oxygenase superfamily |
| pfam00847 | 137 | 62 | 45.3 | AP2 | AP2 domain |
| pfam02519 | 130 | 35 | 26.9 | Auxin_inducible | Auxin responsive protein |
| pfam00141 | 123 | 81 | 65.9 | peroxidase | Peroxidase |
| pfam05699 | 118 | 28 | 23.7 | Dimer_Tnp_hAT | hAT family C-terminal dimerisation |
| pfam00319 | 109 | 34 | 31.2 | SRF-TF | SRF-type transcription factor |
| pfam02458 | 99 | 51 | 51.5 | Transferase | Transferase family |
| pfam14432 | 95 | 68 | 71.6 | DYW_deaminase | DYW family of nucleic acid deaminases |
| pfam00481 | 91 | 79 | 86.8 | PP2C | Protein phosphatase 2C |
| pfam00854 | 86 | 75 | 87.2 | PTR2 | POT family |
| pfam01095 | 85 | 42 | 49.4 | Pectinesterase | Pectinesterase |
| pfam00561 | 84 | 70 | 83.3 | Abhydrolase_1 | alpha/beta hydrolase fold |
| pfam00657 | 83 | 65 | 78.3 | Lipase_GDSL | GDSL-like Lipase/Acylhydrolase |
| pfam00010 | 82 | 70 | 85.4 | HLH | Helix-loop-helix DNA-binding domain |
| pfam03106 | 82 | 54 | 65.9 | WRKY | WRKY DNA -binding domain |
| pfam01554 | 77 | 66 | 85.7 | MatE | MatE |
| pfam13839 | 75 | 63 | 84.0 | PC-Esterase | GDSL/SGNH-like Acyl-Esterase family found |
| pfam00004 | 74 | 64 | 86.5 | AAA | ATPase family associated with various cellular |
| pfam02362 | 74 | 49 | 66.2 | B3 | B3 DNA binding domain |
| pfam00071 | 73 | 58 | 79.5 | Ras | Ras family |
| pfam00083 | 73 | 53 | 72.6 | Sugar_tr | Sugar (and other) transporter |
| pfam05695 | 72 | 13 | 18.1 | DUF825 | Plant protein of unknown function (DUF825) |
| pfam00082 | 71 | 41 | 57.7 | Peptidase_S8 | Subtilase family |
| pfam13499 | 71 | 40 | 56.3 | EF-hand_7 | EF-hand domain pair |
| Others | 17227 | 13078 | 75.9 | | |
FIGURE 1. Gene ontology (GO) classification of tomato genes with pre-mRNAs not undergoing alternative splicing (non-AS genes) and genes with pre-mRNAs undergoing alternative splicing (AS genes). (A) Biological process; (B) Molecular function; (C) Cellular component.
Summary of internal exon length and intron length of all transcripts, and DNA fragment sizes (bp) involved in alternative splicing events in tomato.
| Feature | Sample size | Length range (bp) | Mean (bp) | Standard deviation (bp) |
|---|---|---|---|---|
| Internal exons | 215952 | 1–88397 | 282 | 547 |
| Introns | 282296 | 5–313176 | 1352 | 7609 |
| Retained introns | 70035 | 6–19337 | 366 | 710 |
| Alternative acceptor sites | 47721 | 1–12110 | 145 | 451 |
| Alternative donor sites | 26973 | 1–11238 | 212 | 493 |
| Skipped exons | 22161 | 2–25108 | 214 | 446 |
FIGURE 2. Distribution of internal exon size and intron size in tomato genes. Bins are right-inclusive (e.g., bin 100 comprises sequences of lengths 1–100 bp).
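The right-inclusive binning described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code. The 100 bp bin width is inferred from the caption's example and is an assumption, as are the function names and the sample lengths.

```python
from collections import Counter


def right_inclusive_bin(length_bp, bin_width=100):
    """Return the upper edge of the right-inclusive bin containing length_bp.

    With bin_width=100, lengths 1-100 map to bin 100, 101-200 to bin 200, etc.
    """
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    # Integer ceiling division avoids floating-point edge cases.
    return ((length_bp + bin_width - 1) // bin_width) * bin_width


def size_distribution(lengths, bin_width=100):
    """Count how many sequences fall into each right-inclusive bin."""
    return Counter(right_inclusive_bin(n, bin_width) for n in lengths)


# Hypothetical lengths, chosen only to exercise the bin boundaries.
example_lengths = [1, 100, 101, 282, 366]
print(sorted(size_distribution(example_lengths).items()))
```

For the hypothetical lengths above, 1 and 100 both land in bin 100, while 101 starts bin 200, which is the boundary behavior the caption describes.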
The usage of different splicing sites at both ends of the introns in tomato.
| Types | Total | % | Non-AS gene | % | AS gene | % |
|---|---|---|---|---|---|---|
| 5′-GT..AG-3′ | 255669 | 90.6 | 41053 | 89.0 | 214616 | 90.9 |
| 5′-GC..AG-3′ | 7571 | 2.7 | 1511 | 3.3 | 6060 | 2.6 |
| 5′-GC..AT-3′ | 3060 | 1.1 | 712 | 1.5 | 2348 | 1.0 |
| 5′-AT..AC-3′ | 2751 | 1.0 | 844 | 1.8 | 1907 | 0.8 |
| 5′-CT..AC-3′ | 2067 | 0.7 | 232 | 0.5 | 1835 | 0.8 |
| 5′-GT..AT-3′ | 2036 | 0.7 | 554 | 1.2 | 1482 | 0.6 |
| Others | 9142 | 3.2 | 1208 | 2.6 | 7934 | 3.4 |
| Total | 282296 | | 46114 | | 236182 | |
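A tally like the one in the splice-site table can be produced by reading the first and last two nucleotides of each intron. The sketch below shows one way to do this and is not the authors' implementation. It assumes each intron sequence is given 5′ to 3′ on the transcript strand; the function names and the example sequences are hypothetical.

```python
from collections import Counter

# Donor/acceptor dinucleotide pairs listed individually in the table;
# every other combination is counted as "Others".
LISTED_PAIRS = {
    ("GT", "AG"),
    ("GC", "AG"),
    ("GC", "AT"),
    ("AT", "AC"),
    ("CT", "AC"),
    ("GT", "AT"),
}


def splice_site_type(intron_seq):
    """Classify an intron by its terminal dinucleotides, e.g. 'GT..AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct donor and acceptor sites")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) in LISTED_PAIRS:
        return f"{donor}..{acceptor}"
    return "Others"


def tally_splice_sites(introns):
    """Count introns per splice-site type."""
    return Counter(splice_site_type(seq) for seq in introns)


# Hypothetical intron sequences, for illustration only.
example_introns = ["GTAAGTTTTCAG", "gcaagtttttag", "ATATCCTTTTAC", "GGAAAATTTTCC"]
print(tally_splice_sites(example_introns))
```

Dividing each count by the total number of introns would give percentage columns like those in the table.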
FIGURE 3. Pictograms of nucleotide probabilities at each position of the exon-intron junctions in genes not undergoing alternative splicing (non-AS genes) and genes undergoing alternative splicing (AS genes). The 5′-end intronic nucleotides are at positions 11 to 20 in the left panel pictograms, and the 3′-end intronic nucleotides are at positions 1 to 10 in the right panel pictograms.