Anil K Madugundu, Chan Hyun Na, Raja Sekhar Nirujogi, Santosh Renuse, Kwang Pyo Kim, Kathleen H Burns, Christopher Wilks, Ben Langmead, Shannon E Ellis, Leonardo Collado-Torres, Marc K Halushka, Min-Sik Kim, Akhilesh Pandey.
Abstract
Understanding the molecular profile of every human cell type is essential for understanding its role in normal physiology and disease. Technological advancements in DNA sequencing, mass spectrometry, and computational methods allow us to carry out multiomics analyses, although such approaches are not yet routine. Human umbilical vein endothelial cells (HUVECs) are a widely used model system to study pathological and physiological processes associated with the cardiovascular system. In this study, next-generation sequencing and high-resolution mass spectrometry are employed to profile the transcriptome and proteome of primary HUVECs. Analysis of 145 million paired-end reads from next-generation sequencing confirmed expression of 12 186 protein-coding genes (FPKM ≥0.1), identified 439 novel long non-coding RNAs, and revealed 6089 novel isoforms that were not annotated in GENCODE. Proteomics analysis identified 6477 proteins, including confirmation of N-termini for 1091 proteins, isoforms for 149 proteins, and 1034 phosphosites. A database search to specifically identify other post-translational modifications provides evidence for a number of modification sites on 117 proteins, which include ubiquitylation, lysine acetylation, and mono-, di-, and tri-methylation events. Evidence for 11 "missing proteins," which are proteins for which there was insufficient or no protein-level evidence, is provided. Peptides supporting missing proteins and novel events are validated by comparison of MS/MS fragmentation patterns with synthetic peptides. Finally, 245 variant peptides derived from 207 expressed proteins are identified, in addition to alternate translational start sites for seven proteins and evidence for novel proteoforms for five proteins resulting from alternative splicing. Overall, it is believed that the integrated approach employed in this study is widely applicable to study any primary cell type for deeper molecular characterization.
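The abstract reports an expression cutoff of FPKM ≥0.1 but does not say how the values were computed. As a minimal sketch, assuming the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments), the filter could look like the following. The function names, gene identifiers, counts, and lengths are hypothetical and are not taken from the study.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # fragments / (length_bp / 1e3) / (total / 1e6), written as one expression
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def expressed_genes(counts, lengths, total_mapped_fragments, threshold=0.1):
    """Return {gene: FPKM} for genes whose FPKM is at or above the threshold."""
    result = {}
    for gene, n_fragments in counts.items():
        value = fpkm(n_fragments, lengths[gene], total_mapped_fragments)
        if value >= threshold:
            result[gene] = value
    return result


# Hypothetical toy input, for illustration only.
toy_counts = {"GENE_A": 100, "GENE_B": 0}
toy_lengths = {"GENE_A": 2000, "GENE_B": 1000}
toy_expressed = expressed_genes(toy_counts, toy_lengths, 1_000_000)
```

In this toy input, `GENE_B` has no fragments and is dropped by the default threshold of 0.1, which mirrors the cutoff quoted in the abstract.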
Keywords: RNA-seq; allelic expression; coding SNP; mass-spectrometry; proteoform; proteogenomics; splice variants; transcriptome
Year: 2019 PMID: 30983154 PMCID: PMC6812510 DOI: 10.1002/pmic.201800315
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. Schematic of data analysis workflow employed in this study. Top left and right panels describe the analysis steps and tools employed in transcriptomic and proteomic analysis, respectively. Bottom panel describes the integrated analysis of proteomics and proteogenomics carried out in this manuscript.
Figure 2. Summary of results from transcriptomic analysis. A) Distribution of FPKM (log2) over the protein‐coding genes. B) Histogram showing the relative expression of genes across the number of human tissues in GTEx. C) Heatmap showing the relative expression of cell type enriched protein‐coding genes across the human tissues in GTEx. D) Distribution of identified splice junctions. Novel splice junctions were formed by novel splice donor and acceptor sites, whereas partial novel junctions were formed by a novel donor or acceptor site. E) Comparison to coverage of splice junctions in over 70 000 human RNA‐seq accessions collected in recount2.
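The junction categories in panel D can be sketched as a simple classification rule. This is an illustrative sketch only, not the authors' pipeline: the coordinates are hypothetical, and the fourth label (two known sites joined in an unannotated pairing) is an assumption, since the caption does not say how such junctions were counted.

```python
def classify_junction(donor, acceptor, known_donors, known_acceptors, known_junctions):
    """Classify one splice junction against a set of annotated sites."""
    if (donor, acceptor) in known_junctions:
        return "annotated"
    donor_known = donor in known_donors
    acceptor_known = acceptor in known_acceptors
    if not donor_known and not acceptor_known:
        # Both splice sites are absent from the annotation.
        return "novel"
    if donor_known != acceptor_known:
        # Exactly one of the two splice sites is absent from the annotation.
        return "partial novel"
    # Assumption: both sites are annotated but never joined to each other.
    return "unannotated pairing of known sites"


# Hypothetical annotation on a single chromosome (positions are made up).
annotated_junctions = {(100, 200), (300, 400)}
annotated_donors = {d for d, _ in annotated_junctions}
annotated_acceptors = {a for _, a in annotated_junctions}

labels = [
    classify_junction(d, a, annotated_donors, annotated_acceptors, annotated_junctions)
    for d, a in [(100, 200), (150, 250), (100, 250), (100, 400)]
]
```

Keeping donors, acceptors, and junction pairs as separate sets makes each lookup constant time, which matters when junctions number in the hundreds of thousands.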
Figure 3. Integrated analysis of proteomics data. A) Scatter plot showing the comparison of gene expression at mRNA and protein levels. Schematic representation of B) TUBB2A and C) HIST2H3 proteins with identified PTMs. D) Histogram of the number of proteins identified with annotated and alternate signal peptide cleavage sites. Alternate cleavage sites are shown flanking the annotated (0) cleavage site.
Selected cell adhesion molecules identified in HUVECs
| Gene symbol | Protein | PSMs | FPKM |
|---|---|---|---|
| | Intercellular adhesion molecule 2 | 82 | 278.91 |
| | Platelet endothelial cell adhesion molecule | 431 | 244.76 |
| | Endothelial cell selective adhesion molecule | 61 | 131.91 |
| | Junctional adhesion molecule C | 14 | 35.27 |
| | Junctional adhesion molecule A | 54 | 30.70 |
| | Neuronal cell adhesion molecule | 35 | 11.61 |
| | Basal cell adhesion molecule isoform | 13 | 7.65 |
| | Intercellular adhesion molecule 1 | 59 | 5.91 |
| | Carcinoembryonic antigen‐related cell adhesion molecule 19 | – | 3.07 |
| | Cell adhesion molecule 4 | – | 2.28 |
| | E‐selectin | 3 | 1.86 |
| | Intercellular adhesion molecule 3 | – | 1.51 |
| | Carcinoembryonic antigen‐related cell adhesion molecule 21 | – | 1.06 |
| | Cell adhesion molecule‐related/downregulated by oncogenes | – | 0.95 |
| | Cell adhesion molecule 3 | 7 | 0.48 |
| | Carcinoembryonic antigen‐related cell adhesion molecule 1 | – | 0.45 |
| | Epithelial cell adhesion molecule | – | 0.33 |
| | Junctional adhesion molecule B | – | 0.27 |
| | Vascular cell adhesion molecule 1 | – | 0.21 |
| | Cell adhesion molecule 1 | – | 0.18 |
Figure 4. A) A schematic showing the ICAM2 protein with annotated (bottom) and alternate (top) signal peptide cleavage sites. Schematic of proteins with alternate allele(s) identified from cSNP database search at mRNA and protein levels: both wild type and alternate alleles were found for B) ARL2 protein, while only the homozygous alternate allele was found for C) AP3B1 protein.
List of peptides identified upstream of annotated protein N‐termini
| No. | Sequence | Gene symbol | Peptide position (from TIS) | Near cognate TIS position (codon) | Evidence from Ribo‐seq study |
|---|---|---|---|---|---|
| 1 | VGNMSESELGR | | −3 | −7 (CTG) | |
| 2 | FCLDRPLTTDMSR | | −10 | −21 (GUG) | |
| 3 | AAAADGERPGPGPLLVGCGR | | −68 | −73 (GTG) | |
| 4 | EAGAGAEAAAGSARPLGR | | −34 | −43 (CUG) | |
| 5 | EGSEAFAGPLLLPGPGPLMAAIR | | −18 | −23 (GUG) | |
| 6 | AGGAADMTDNIPLQPVR | | −6 | −47 (CUG) | |
| 7 | ALEMENSQLCK | | −3 | −17 (ACG) | |
TIS, translation initiation site.
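A near-cognate start codon is conventionally one that differs from the canonical AUG (ATG in DNA notation) at exactly one nucleotide. The sketch below checks that property for the codons listed in the table, accepting either RNA or DNA spelling. It is an illustration of the definition only, with a hypothetical function name, and does not reproduce how the study located candidate start sites.

```python
CANONICAL_START = "ATG"


def is_near_cognate(codon):
    """True if the codon differs from ATG/AUG at exactly one position."""
    normalized = codon.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(normalized) != 3 or set(normalized) - set("ACGT"):
        raise ValueError(f"not a valid codon: {codon!r}")
    mismatches = sum(a != b for a, b in zip(normalized, CANONICAL_START))
    return mismatches == 1


# Codons as written in the table above.
table_codons = ["CTG", "GUG", "GTG", "CUG", "GUG", "CUG", "ACG"]
table_checks = [is_near_cognate(codon) for codon in table_codons]
```

Normalizing U to T first lets the mixed notation in the table (CTG and CUG, GTG and GUG) be handled by one comparison.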
Figure 5. Schematic representation of alternative transcript expression. A) HNRNPA0 protein identified with an upstream alternate N‐terminus in‐frame with the annotated start site (bent arrow) is shown. B) An alternative splice donor in BIRC6 is supported by RNA‐seq and a novel junctional peptide. The annotated MS/MS spectra supporting these findings are also shown. Known and novel transcript models are shown in brown and black, respectively. The track in red shows the sashimi plot with thick curves connecting the exon–exon boundaries. Amino acids that span the splicing junction are marked in red.