Paulo Rapazote-Flores1, Micha Bayer1, Linda Milne1, Claus-Dieter Mayer2, John Fuller3, Wenbin Guo4, Pete E Hedley3, Jenny Morris3, Claire Halpin4, Jason Kam4,5, Sarah M McKim4, Monika Zwirek4,6, M Cristina Casao4, Abdellah Barakate3, Miriam Schreiber3, Gordon Stephen1, Runxuan Zhang1, John W S Brown3,4, Robbie Waugh3,4, Craig G Simpson7.
Abstract
BACKGROUND: The time required to analyse RNA-seq data varies considerably, due to discrete steps for computational assembly, quantification of gene expression and splicing analysis. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but these tools require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.
Keywords: Barley; Differential alternative splicing; Differential gene expression; Reference transcript dataset; Transcriptome
Year: 2019 PMID: 31829136 PMCID: PMC6907147 DOI: 10.1186/s12864-019-6243-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 BaRTv1.0 assembly and validation pipeline. Steps in construction and validation of BaRTv1.0 and programs used in each step (right hand side)
Fig. 2 Benchmarking of 38 different StringTie Morex reference-based assemblies. The four plots show different benchmark tests to assess the parameters used in the StringTie assemblies. The graphs do not start from 0 on the y axis. a Transcript number; b the number of HR RT-PCR products that match transcripts; c correlation of the proportions of transcripts in 86 AS events derived from HR RT-PCR and the RNA-seq data using the different assemblies as reference for transcript quantification by Salmon; and d the number of Haruna nijo fl cDNAs that match RTD transcripts. Each plot point represents the result of a StringTie assembly using different parameters (Additional file 1: Table S2). The broken circled plot point at assembly 4 represents an assembly using STAR defaults (without splice junction filtering) and StringTie defaults. The solid circled plot point at assembly 34 represents the selected optimised StringTie parameters used to produce BaRTv1.0 (see also Materials and Methods; Additional file 2: Figure S3; Additional file 1: Table S2)
Transcriptome dataset comparisons with HR RT-PCR and Haruna nijo fl cDNAs
| Transcriptome Version | BaRTv1.0 | BaRTv1.0-QUASI | HORVU |
|---|---|---|---|
| # HR RT-PCR products | 220 | 220 | 191 |
| Pearson Correlation | 0.793 | 0.828 | 0.769 |
| Spearman Rank Correlation | 0.795 | 0.830 | 0.768 |
| # Complete HN flcDNAs | 17,619 | 17,695 | 17,099 |
| # Genes | 60,444 | 60,444 | 81,683 |
| # Transcripts | 177,240 | 177,240 | 334,126 |
Fig. 3 Correlation of alternative splicing from HR RT-PCR and RNA-seq. Percentage spliced in (PSI) values were calculated from relative fluorescence units from HR RT-PCR and transcript abundances (TPM) from RNA-seq data quantified with Salmon using the (a) BaRTv1.0, (b) HORVU and (c) BaRTv1.0-QUASI transcript datasets as reference. For BaRTv1.0, the 86 primer pairs designed to cv. Morex genes covered 220 AS events (three biological replicates of 5 different barley organs/tissues), giving 1992 data points; for HORVU, 81 primer pairs covered 191 AS events, giving 1642 data points
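
The PSI comparison described in the Fig. 3 caption can be illustrated with a minimal sketch. It assumes that PSI for one isoform is its signal divided by the summed signal of all isoforms amplified by the same primer pair, applied once to relative fluorescence units and once to TPMs. The function names and the example values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def psi(isoform_signal, all_signals):
    """Percentage spliced in: one isoform's share of the total signal
    from all isoforms covered by the same primer pair (assumed definition)."""
    total = sum(all_signals)
    if total == 0:
        return None  # no signal at all, PSI is undefined
    return 100.0 * isoform_signal / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical signals for one primer pair with two products:
# fully spliced (FS) and intron retention (IR).
rfu = {"FS": 800.0, "IR": 200.0}  # HR RT-PCR relative fluorescence units
tpm = {"FS": 30.0, "IR": 10.0}    # Salmon transcript abundances

psi_pcr = psi(rfu["IR"], rfu.values())
psi_rnaseq = psi(tpm["IR"], tpm.values())
print(psi_pcr, psi_rnaseq)
```

Collecting such PSI pairs over all AS events, organs and replicates and passing the two lists to `pearson` would give a coefficient of the kind reported in the comparison table.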
Characteristics of barley genes and transcripts in BaRTv1.0. Percentages given are of total number of genes or transcripts
| Characteristic | Value |
|---|---|
| Number of genes | 60,444 |
| Number of predicted transcripts | 177,240 |
| Single exon genes | 25,719 (43%) |
| Multi exon genes | 34,725 (57%) |
| Single transcript genes | 39,534 (65%) |
| Single exon transcripts | 27,754 (16%) |
| Multi exon transcripts | 149,486 (84%) |
| Number of multi exon genes with alternative transcript variants | 20,910 (60%) |
| Mean number of transcripts per gene | 2.93 |
| Number of distinct exons | 466,247 |
| Mean number of distinct exons per gene | 7.7 |
| Mean transcript locus size (first to last exon) (nt) | 5,633 |
| Mean exon size (nt) | 573 |
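
The derived values in this table can be rechecked from the raw counts. The short sketch below uses only the numbers listed above; the variable names are illustrative. Note that the 60% given for genes with alternative transcript variants appears to be relative to multi exon genes rather than to all genes.

```python
# Counts copied from the BaRTv1.0 characteristics table.
genes = 60_444
transcripts = 177_240
distinct_exons = 466_247
multi_exon_genes = 34_725
genes_with_variants = 20_910  # multi exon genes with alternative transcripts

transcripts_per_gene = round(transcripts / genes, 2)
exons_per_gene = round(distinct_exons / genes, 1)

# The same count expressed against two different denominators.
pct_of_multi_exon_genes = round(100 * genes_with_variants / multi_exon_genes)
pct_of_all_genes = round(100 * genes_with_variants / genes)

print(transcripts_per_gene, exons_per_gene)
print(pct_of_multi_exon_genes, pct_of_all_genes)
```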
Frequencies of different alternative splicing events in BaRTv1.0
| Type of event | # | % |
|---|---|---|
| Alternative 3' | 44,590 | 28.0% |
| Alternative 5' | 37,626 | 23.6% |
| Retained intron | 60,327 | 37.9% |
| Skipped exon | 15,387 | 9.7% |
| Mutually exclusive exons | 1,311 | 0.8% |
| Total | 159,241 | 100.0% |
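
As a consistency check, the percentages in this table can be recomputed from the event counts. The sketch below uses only the counts listed above; the dictionary keys are the event names from the table.

```python
# Event counts copied from the alternative splicing frequency table.
counts = {
    "Alternative 3'": 44_590,
    "Alternative 5'": 37_626,
    "Retained intron": 60_327,
    "Skipped exon": 15_387,
    "Mutually exclusive exons": 1_311,
}

total = sum(counts.values())

# Share of each event type, rounded to one decimal place as in the table.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]:,} ({share}%)")
print(f"Total: {total:,}")
```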
Fig. 4 Differential gene and alternative splicing analysis in five barley organs. a. Numbers of expressed genes, differentially expressed (DE) genes and differential AS (DAS) genes across all 5 barley organs/tissues. b. Number of up- and down-regulated DE genes between pairs of different organs. Dark blue (up-regulated genes); light blue (down-regulated genes). c. Number of DAS genes between pairs of different organs. d. Heatmap and hierarchical clustering of 20,972 DE genes. e. Heatmap and hierarchical clustering of 2768 DTU transcripts. The z-score scale in d and e represents mean-subtracted normalised log-transformed TPMs
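
The z-score scale mentioned in the Fig. 4 caption can be sketched as follows. The caption only states that the values are mean-subtracted, normalised, log-transformed TPMs, so the log base, the pseudocount of 1 and the division by the sample standard deviation are assumptions made here for illustration, and the function name is hypothetical.

```python
from math import log2, sqrt


def row_z_scores(tpms, pseudocount=1.0):
    """Z-scores of log-transformed TPMs for one gene or transcript across samples.

    Assumptions (not stated in the caption): log2 with a pseudocount,
    and scaling by the sample standard deviation.
    """
    logs = [log2(t + pseudocount) for t in tpms]
    mean = sum(logs) / len(logs)
    sd = sqrt(sum((v - mean) ** 2 for v in logs) / (len(logs) - 1))
    if sd == 0:
        return [0.0] * len(logs)  # constant expression, no variation to scale
    return [(v - mean) / sd for v in logs]


# Hypothetical TPMs for one gene in three organs.
print(row_z_scores([0.0, 1.0, 3.0]))
```

Scaling each row in this way lets genes with very different absolute expression share one colour scale in a heatmap.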
Fig. 5 Comparison of alternative splicing in different barley tissues with HR RT-PCR and RNA-seq data. Splicing proportions of four different genes in 5 different barley tissues are presented. a. Hv110; HORVU5Hr1G027080, b. Hv118; HORVU1Hr1G078110, c. Hv173; HORVU7Hr1G062930, d. Hv217; HORVU7Hr1G071060. Schematic transcript/AS models are presented above histograms of PSIs derived from HR RT-PCR (black) and RNA-seq (white) with standard error bars across three biological repeats. White boxes - exons; lines - introns; chevrons - splicing events; grey boxes - regions between alternative splice sites; thick intron line - intron retention
Fig. 6 Alternative splicing in a WW domain containing protein gene (BART1_0-u51812). a. BART1_0-u51812 transcript models represented in the BaRTv1.0 database. b. AS events involving intron 2 validated by HR RT-PCR. c. AS events between exons 6 and 8 validated by HR RT-PCR. Electropherogram output from the ABI3730 shows the HR RT-PCR products (x-axis RT-PCR products (bp); y-axis relative fluorescence units). The products expected from RNA-seq are indicated as FS - Fully spliced, AE - Alternative exon, Alt 5' ss - Alternative 5' splice site, IR - Intron retention and Unspl. - Unspliced. * in b indicates minor alternative transcripts identified in HR RT-PCR and in RNA-seq. + in c indicates an uncharacterised alternative transcript identified in HR RT-PCR