Chun Wu1, Xiaolong Lu1, Shaohua Lu1,2, Hongwei Wang1, Dehua Li1, Jing Zhao1, Jingjie Jin1, Zhenghua Sun1, Qing-Yu He1, Yang Chen1, Gong Zhang1.
Abstract
Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases, and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. Here, we demonstrate a full-length translating mRNA sequencing (RNC-seq) strategy that sequences entire translating mRNAs (ribosome nascent-chain complex-bound mRNAs, RNC-mRNAs) using next-generation sequencing, including short-read and long-read technologies, to construct a protein database containing all translating AS isoforms. Taking advantage of its read length, short-read RNC-seq identified up to 15,289 genes and 15,906 AS isoforms in a single human cell line, far more than Ribo-seq. Single-molecule long-read RNC-seq added 4,429 annotated AS isoforms that were not identified in the short-read datasets and 4,525 novel AS isoforms that were not included in public databases. Using this RNC-seq-guided database, we identified 6,766 annotated protein isoforms and 50 novel protein isoforms in mass spectrometry datasets. These results demonstrate the potential of full-length RNC-seq for investigating the proteome of AS isoforms.
Keywords: alternative splicing; human proteome project; isoform; mass spectrometry; proteome; translatome sequencing
Year: 2022 PMID: 35720116 PMCID: PMC9201276 DOI: 10.3389/fmolb.2022.895746
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. RNC-seq detects splicing isoforms more efficiently than Ribo-seq. (A) Percentage of rRNA reads in RNC-seq and Ribo-seq datasets of MHCC97H and HeLa cells. (B) Percentage of rRNA reads in RNC-seq and Ribo-seq datasets in human, mouse, and rat. The p-values were obtained with the Kolmogorov-Smirnov test. (C) Percentage of uniquely mapped reads (uniquely mapped reads / non-rRNA reads) in RNC-seq and Ribo-seq. (D) Number of identified canonical junctions (left) and novel junctions (right) using the same number of non-rRNA reads (25M). (E) Number of identified genes and isoforms supported by uniquely mapped reads under the same throughput of raw reads. (F) Number of protein isoforms identified using the RNC-seq-guided and Ribo-seq-guided protein databases. (G) Proportion of distinct proteins identified using the RNC-seq-guided database and neXtProt. Ambiguous: proteins that share all their peptides with other proteins and thus cannot be unambiguously identified.
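The two quantities defined in panels (C) and (G) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (a dictionary from protein identifier to its identified peptides), and the label "distinct" for a protein with at least one peptide found in no other protein are assumptions made for illustration only.

```python
from collections import Counter


def unique_mapping_rate(uniquely_mapped: int, non_rrna: int) -> float:
    """Percentage of uniquely mapped reads among non-rRNA reads (panel C)."""
    if non_rrna <= 0:
        raise ValueError("non_rrna must be positive")
    return 100.0 * uniquely_mapped / non_rrna


def classify_proteins(peptides_by_protein: dict) -> dict:
    """Label each identified protein as 'distinct' or 'ambiguous' (panel G).

    Assumption: a protein is 'ambiguous' when every one of its identified
    peptides is also found in at least one other protein; otherwise it is
    'distinct'. Proteins without any identified peptide are skipped.
    """
    # Count in how many proteins each peptide occurs.
    occurrences = Counter(
        peptide
        for peptides in peptides_by_protein.values()
        for peptide in set(peptides)
    )
    labels = {}
    for protein, peptides in peptides_by_protein.items():
        unique_peptides = set(peptides)
        if not unique_peptides:
            continue  # not identified at all
        has_own_peptide = any(occurrences[p] == 1 for p in unique_peptides)
        labels[protein] = "distinct" if has_own_peptide else "ambiguous"
    return labels


# Hypothetical toy input: P2 and P3 share their only peptide with P1.
example = {
    "P1": {"AAK", "CCK"},
    "P2": {"CCK"},
    "P3": {"CCK"},
}
labels = classify_proteins(example)
rate = unique_mapping_rate(20, 25)
```

In this toy input only P1 has a peptide of its own, so P2 and P3 fall into the ambiguous group.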
FIGURE 2. Differences in the isoforms identified by RNC-seq and Ribo-seq. (A) Venn diagram of isoforms identified by RNC-seq and Ribo-seq, each using 8 million uniquely mapped reads. (B) Distribution of identified ncRNAs with and without ORFs. (C) Length distribution of the ncRNAs without a canonical ORF. (D) Types of small ncRNAs identified by Ribo-seq of MHCC97H cells.
FIGURE 3. Single-molecule full-length RNC-seq improved the efficiency of isoform identification. (A) Three types of isoforms identified by Nanopore RNC-seq. “Annotated” means that the isoform is included in the NCBI human RefSeq database. “In short-read set” means that the isoform can be identified by short-read RNC-seq. (B) All isoforms of GAPDH. The marked isoform was identified by full-length RNC-seq but not by short-read RNC-seq. (C) All isoforms of PCBP2. Nanopore full-length RNC-seq identified a novel isoform that was absent from the RefSeq and neXtProt databases. (D) The MS-detected junction-specific peptide (PSSSPVLFAGGQLTK) that supported the novel junction of PCBP2.
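The grouping in panel (A) can be sketched as a simple set-membership test. This is an illustrative assumption rather than the authors' method: the three category names are inferred from the panel definitions, the isoform identifiers are hypothetical, and in practice isoforms would be matched to the annotation by their exon-junction structure rather than by identifier.

```python
def classify_isoform(isoform_id: str, annotated_ids: set, short_read_ids: set) -> str:
    """Assign a long-read isoform to one of three assumed categories."""
    if isoform_id not in annotated_ids:
        return "novel"
    if isoform_id in short_read_ids:
        return "annotated, in short-read set"
    return "annotated, not in short-read set"


# Hypothetical identifiers, for illustration only.
annotated = {"iso_A", "iso_B"}    # isoforms present in the annotation
short_read = {"iso_A"}            # isoforms also found by short-read RNC-seq
long_read = ["iso_A", "iso_B", "iso_C"]

categories = {
    iso: classify_isoform(iso, annotated, short_read) for iso in long_read
}
```

Here `iso_B` stands for an annotated isoform recovered only by long reads, and `iso_C` for an isoform absent from the annotation.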