
High-throughput proteomics detection of novel splice isoforms in human platelets.

Karen A Power, James P McRedmond, Andreas de Stefani, William M Gallagher, Peadar O Gaora.

Abstract

Alternative splicing (AS) is an intrinsic regulatory mechanism of all metazoans. Recent findings suggest that 100% of multiexonic human genes give rise to splice isoforms. AS can be tissue-specific, environment-dependent or developmentally regulated. Splice variants have also been implicated in various diseases, including cancer. Detection of these variants will enhance our understanding of the complexity of the human genome and may provide disease-specific and prognostic biomarkers. We adopted a proteomics approach to identify exon skip events, the most common form of AS. We constructed a database containing the peptide sequences derived from all hypothetical exon skip junctions in the human genome. Searching tandem mass spectrometry (MS/MS) data against this database allows exon skip events to be detected directly at the protein level. Here we describe the application of this approach to human platelets, including the mRNA-based verification of novel splice isoforms of ITGA2, NPEPPS and FH. This methodology is applicable to all new or existing MS/MS datasets.

Year: 2009; PMID: 19308253; PMCID: PMC2654914; DOI: 10.1371/journal.pone.0005001

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240


Introduction

Since the publication of the human genome sequence, understanding the functional complexity of the genome has become a primary goal of high-throughput experimental research. By definition, AS contributes to proteomic complexity, and it has also been suggested that AS is a major driver of phenotypic complexity, though this role remains unproven [1]–[3]. By splicing different combinations of exons into different transcripts, AS generates multiple isoforms of a protein, with potentially diverse functions, from a single gene. Not only has AS been invoked as an explanation for our complexity as a species, but splice isoforms have also been associated with the cause and progression of certain diseases. Alternative splicing is associated with a wide variety of conditions, including bipolar disorder, schizophrenia, cancer, diabetes, multiple sclerosis, cystic fibrosis and asthma (for a review see Wang & Cooper [4]). Splice isoforms may be functionally relevant in disease or may act as biomarkers: indicators of normal or altered biological processes, or of pharmacological response to a therapeutic intervention [5]. Biomarkers such as disease-specific AS isoforms can serve as indicators of disease susceptibility as well as diagnostic and prognostic markers. Alternative splicing occurs in many cell types, including platelets: hemostatic, anucleate cells derived from megakaryocytes. Although platelets lack a nucleus, they retain low levels of mRNA that undergo translation. They also have an intact spliceosome, and platelet activation induces splicing of pre-mRNAs including IL-1β [6] and tissue factor (TF) [7]. Platelets are primarily involved in thrombus formation, but their functions also extend to pathophysiological processes such as host defense, regulation of vascular tone, inflammation and tumor growth [8]. Splice isoforms in platelets have been implicated in the variable response to aspirin [9] and proposed as possible antithrombotic drug targets [10].
Blood-based biomarker discovery would provide minimally invasive and sensitive detection of disease-associated molecular changes. Disease biomarkers, serving as specific diagnostic signatures of phenotype, could improve drug discovery and facilitate the development of modern, personalized clinical applications. To date, efforts to detect AS events have relied primarily on sequencing mature mRNA species. The bulk of our knowledge comes from mapping expressed sequence tags (ESTs) to the genome. However, this approach is hindered by limited EST coverage: few ESTs have been sequenced for most genes [11], and the central regions of mRNAs are poorly represented. More recently, exon arrays have been developed to determine genome-wide exon expression levels. This technology detects differences in expression across a gene to infer the presence of alternative splicing events, but it cannot determine unambiguously which combination of exons is present on a single mRNA. The inference of AS is also confounded to some extent by the variable hybridization intensities of neighboring probe sets within a sample and by differential gene expression between samples. Ultra-high-throughput sequencing addresses some of the problems encountered with previous methods of AS detection [12]. This approach can identify many alternative splice variants if sufficient sequence reads are generated [13], [14]. As longer sequence reads become available, it will be possible to resolve more of the transcript structure flanking a given AS event. The capacity to discover AS events at the mRNA level is very powerful, and mRNAseq has provided evidence for AS occurring in 100% of multi-exonic human genes [13]. It remains unclear, however, how many of the splice isoforms identified are stable enough to be translated. Studying the proteome circumvents this issue; for example, a recent study by Tress and coworkers demonstrated the presence of translated AS isoforms in Drosophila melanogaster [15].
The development of new discovery approaches based on protein expression will greatly enhance the existing methodologies. Mass spectrometry (MS) has emerged as a highly effective analytical technique capable of detecting vast numbers of peptides in complex mixtures. Peptide sequences are inferred by matching the spectra generated in an MS experiment to a database of known or, more commonly, theoretically derived spectra. Exon skip splice isoforms are characterized by the peptides spanning the exon-exon junction created by a novel splicing event. To detect these peptides, we generated a database containing the theoretical exon skip junction peptides across a genome. We then used standard MS search tools to identify junction peptides representing exon skip events in MS/MS spectra by comparison with this database (Fig. 1). Here, we show that this approach can detect novel exon skip events in human platelets, and we verify a number of these events at the mRNA level.
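As a rough illustration of matching spectra to theoretically derived spectra, the sketch below computes singly charged b- and y-ion m/z values for a candidate peptide from standard monoisotopic residue masses and scores an observed peak list by how many of those ions it matches. It is a simplified shared-peak count, not the SEQUEST cross-correlation score used in this study; the function names, the 0.5 Da tolerance and the absence of modifications (cysteine is left unmodified) are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
# Assumption: no fixed or variable modifications (cysteine left unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b- and y-ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_score(peptide: str, observed: list[float], tol: float = 0.5) -> int:
    """Count theoretical fragment ions matched by at least one observed peak."""
    b_ions, y_ions = fragment_ions(peptide)
    return sum(
        any(abs(ion - peak) <= tol for peak in observed)
        for ion in b_ions + y_ions
    )


def best_match(observed: list[float], candidates: list[str]) -> str:
    """Return the candidate peptide with the highest shared-peak score."""
    return max(candidates, key=lambda pep: shared_peak_score(pep, observed))


# Toy usage: a "spectrum" made only of the exact fragment ions of the FH
# junction peptide from Table 2 is matched back to that peptide.
b, y = fragment_ions("MPEFSGYVQQVK")
print(best_match(b + y, ["ELIPLMIMKPDEK", "MPEFSGYVQQVK"]))
```

A real search engine also considers precursor mass, multiple charge states and peak intensities; the shared-peak count only conveys the basic idea of comparing observed peaks with predicted ones.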
Figure 1

Workflow for the identification of novel exon skip events.

Rectangles represent programs, rhombi represent program outputs, cylinders represent data sources and circles represent program inputs.

Results

Database design

The strategy we employed to generate the database (which we call SkipE) is outlined in Figure 1. Transcript and exon data were extracted from Ensembl v46 [16] for all 22,680 annotated human protein-coding genes. To create exon skip junctions in silico, a gene containing multiple transcripts was first reduced to a single ‘full length transcript’ (Fig. 2a) as described in Materials and Methods.
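Materials and Methods is not reproduced here, so the following is only a guess at the shape of the reduction step: it assumes the 'full length transcript' is the union of the distinct exons of all of a gene's transcripts, ordered 5′ to 3′ (Fig. 2a, iv). The `GenomicExon` record, its fields and the strand handling are hypothetical, and overlapping exons with different boundaries, which a real pipeline would have to resolve, are simply kept as separate entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicExon:
    exon_id: str  # e.g. an Ensembl exon (ENSE) identifier
    start: int    # genomic start coordinate
    end: int      # genomic end coordinate


def full_length_transcript(transcripts: list[list[GenomicExon]], strand: int = 1) -> list[GenomicExon]:
    """Merge all transcripts of one gene into a single ordered list of distinct exons.

    Frozen dataclasses are hashable, so an exon shared by several transcripts is
    kept once. Exons are sorted by genomic coordinate and reversed for minus-strand
    genes, so the result always runs 5' to 3'.
    """
    distinct = {exon for transcript in transcripts for exon in transcript}
    ordered = sorted(distinct, key=lambda e: (e.start, e.end))
    return ordered if strand == 1 else ordered[::-1]


# Toy gene with three transcripts, loosely mirroring Fig. 2a (coordinates invented).
e1, e2, e3, e4, e5 = (GenomicExon(f"E{i}", 100 * i, 100 * i + 50) for i in range(1, 6))
transcripts = [[e1, e2, e4], [e1, e3, e4], [e2, e3, e5]]
print([e.exon_id for e in full_length_transcript(transcripts)])
```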
Figure 2

Generation and usage of the SkipE database.

(A) Generation of representative transcripts. Each box represents an exon and each line an intron. i), ii) and iii) represent transcripts from a single gene; iv) shows the full length representative transcript used to generate the junction peptides. (B) Structure of a junction peptide. The top two boxes represent the translated sequences of two separate, non-adjacent exons. Tryptic cleavage sites are represented by dashed vertical bars. The C-terminal sequence of the upstream exon, starting after its final tryptic site, is joined to the N-terminal sequence of the downstream exon and extends to the first tryptic cleavage site of the downstream exon.

All non-contiguous junction peptides in a ‘full length transcript’ were generated such that their termini are trypsin cleavage sites (Fig. 2b). A database could be designed for other proteolytic enzymes, but trypsin is by far the most commonly used protease in proteomics experiments. The combinations of exons yielding junction peptides were constrained by exon phase, to keep the sequences in the correct reading frame. An exon's phase is the number of nucleotides (0, 1 or 2) of a codon split across its 5′ boundary that lie in the preceding exon; skipping one or more exons therefore preserves the reading frame only when the upstream exon ends in the same phase in which the downstream exon begins. A previous study by Sorek et al. [17], using coding sequence information from GenBank, showed that the majority of orthologous alternatively spliced exons conserved between human and mouse did not introduce a frame shift. Furthermore, it is likely that many phase-shifting splice events generate transcripts that are degraded via nonsense-mediated decay [18]. To detect only alternative splice events in which the correct reading frame is maintained, we calculated the phase of both exons joined by each alternatively spliced junction and entered into the database only those junctions whose exons have compatible phases.
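The two constraints above, phase compatibility and tryptic termini, can be sketched as follows. The code assumes Ensembl-style phases (start phase = bases of a split codon lying in the preceding exon; end phase = (start phase + coding length) mod 3), treats a skip as in-frame when the upstream exon's end phase equals the downstream exon's start phase, and builds the junction peptide of Fig. 2b by cutting after K or R. It is a hypothetical illustration, not the authors' code: it ignores trypsin's proline rule and concatenates already translated exon sequences, which is exact only at phase-0 boundaries; a split codon at a phase 1 or 2 junction would have to be re-translated from the nucleotide sequence.

```python
from dataclasses import dataclass
from typing import Optional

CLEAVAGE_RESIDUES = "KR"  # trypsin cuts C-terminal to Lys/Arg (proline rule ignored here)


@dataclass(frozen=True)
class PhasedExon:
    exon_id: str
    start_phase: int  # 0, 1 or 2
    end_phase: int    # 0, 1 or 2


def in_frame(up: PhasedExon, down: PhasedExon) -> bool:
    """Joining two exons keeps the reading frame if their boundary phases agree."""
    return up.end_phase == down.start_phase


def skip_pairs(exons: list[PhasedExon]) -> list[tuple[str, str]]:
    """All non-adjacent, phase-compatible exon pairs of an ordered transcript."""
    return [
        (exons[i].exon_id, exons[j].exon_id)
        for i in range(len(exons))
        for j in range(i + 2, len(exons))  # j >= i + 2: at least one exon skipped
        if in_frame(exons[i], exons[j])
    ]


def junction_peptide(up: str, down: str) -> Optional[str]:
    """Tryptic peptide spanning the junction of two translated exons (Fig. 2b).

    Takes the residues of `up` after its last K/R and the residues of `down` up to
    and including its first K/R. Returns None if `up` ends in K/R, because trypsin
    would then cut exactly at the junction and no peptide would span it.
    """
    last_site = max(up.rfind(r) for r in CLEAVAGE_RESIDUES)  # -1 if no K/R at all
    tail = up[last_site + 1:]
    if not tail:
        return None
    sites = [down.find(r) for r in CLEAVAGE_RESIDUES if r in down]
    head = down[: min(sites) + 1] if sites else down
    return tail + head


exons = [PhasedExon("E1", 0, 1), PhasedExon("E2", 1, 0),
         PhasedExon("E3", 0, 1), PhasedExon("E4", 1, 1)]
print(skip_pairs(exons))
print(junction_peptide("AAKGGRLLE", "PQSTKMMR"))
```

Here only E1 to E4 is an in-frame skip, and the toy sequences yield the junction peptide LLEPQSTK.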
Junction peptides occurring in more than one gene were removed to eliminate ambiguity, since the source of such peptides could not be assigned to a particular gene. This procedure yielded 307,030 junction peptides for the human genome. Genome-based strategies such as six-frame translation of the genome produce search spaces too large to be compatible with high-throughput approaches. Genome-based methods that reduce the complexity of the search space provide a powerful means of identifying new protein-coding exons and genes, but they are not appropriate for direct mapping of exon skips, because these junctions are derived from non-contiguous sequences [19]. Subject to the constraints described above, our database provides a search space suited to the high-throughput MS/MS methods in use today. Further details on the composition of the human, mouse and rat databases are provided in Table S1. The SkipE database is in FASTA format and is therefore compatible with any of the major search engines; here we used SEQUEST [20] combined with PeptideProphet and ProteinProphet for statistical validation of identifications [21]. For both tools we chose a probability cutoff of 0.9, a threshold commonly used in MS/MS experiments [22]. We then distinguished novel junction-spanning peptides from previously described ones by comparing their sequences with the Alternative Splice Transcript Database (ASTD) [23]–[25] and the International Protein Index database (IPI) [26] using WU-BLAST (http://blast.wustl.edu). This comparison also filters out junction peptides that are identical to sequences within “canonical” isoforms, whether they occur at exon boundaries or elsewhere.
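The post-processing described in this paragraph can be sketched in a few lines: drop any junction peptide that maps to more than one gene, write the remainder as FASTA for a search engine, and call a peptide novel only if it does not occur verbatim in a set of known protein sequences. The record layout and the `>gene|exon1|exon2` FASTA header are hypothetical, and the exact-substring test is a simplified stand-in for the WU-BLAST comparison against ASTD and IPI, not a reimplementation of it.

```python
from collections import defaultdict

# Hypothetical record layout: (gene symbol, exon ID 1, exon ID 2, junction peptide).
Junction = tuple[str, str, str, str]


def unambiguous(junctions: list[Junction]) -> list[Junction]:
    """Keep only records whose peptide is attributable to a single gene."""
    genes_per_peptide = defaultdict(set)
    for gene, _, _, peptide in junctions:
        genes_per_peptide[peptide].add(gene)
    return [rec for rec in junctions if len(genes_per_peptide[rec[3]]) == 1]


def to_fasta(junctions: list[Junction]) -> str:
    """Render records as FASTA text (header format is an illustrative assumption)."""
    lines = []
    for gene, exon1, exon2, peptide in junctions:
        lines.append(f">{gene}|{exon1}|{exon2}")
        lines.append(peptide)
    return "\n".join(lines) + "\n"


def is_novel(peptide: str, known_proteins: list[str]) -> bool:
    """True if the peptide does not occur verbatim in any known protein sequence."""
    return not any(peptide in protein for protein in known_proteins)


records = [
    ("FH", "ENSE00000961691", "ENSE00001069123", "MPEFSGYVQQVK"),  # from Tables 1 and 2
    ("GENEA", "EX1", "EX2", "SHAREDPEPK"),  # toy peptide shared by two genes
    ("GENEB", "EX3", "EX4", "SHAREDPEPK"),
]
print(to_fasta(unambiguous(records)), end="")
```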

Identification of platelet proteins and AS peptides

Platelet mass spectra were collected and searched against both the IPI and SkipE databases to identify peptides. The numbers of peptides and proteins identified with each database are shown in Table S2. SEQUEST searching against IPI identified 6,292 unique peptides representing 1,122 unique proteins in the samples with a ProteinProphet probability of P>0.9. Because the SkipE database contains peptide rather than protein sequences, ProteinProphet is inappropriate for these hits. Spectra matched to SkipE were therefore validated using a PeptideProphet probability cutoff of 0.9, resulting in 1,297 unique protein identifications. Of these, 359 were represented by more than one occurrence of the peptide in the dataset. The distribution of AS identifications across subcellular compartments closely mirrors that of the IPI data, with the exception of the releasate (Fig. 3a, b), where more skips were found in activated than in resting samples. Although the activation step was very brief, this may indicate a tendency toward diversification of the exported proteome in response to platelet activation. Functionally, this would be advantageous, since these cells must interact with their milieu and with other cell types but cannot mount a transcriptional response to stimuli. All proteins identified in the SkipE and IPI data were mapped to KEGG pathways using Pathway-Express [27] (Tables S3 and S4). In a typical MS/MS data analysis, each protein identification relies on multiple peptide identifications. Because SkipE contains isolated peptide sequences, we focused further experiments on those AS events for which evidence of cognate gene expression was also obtained in the IPI analysis. We therefore constructed a list of 89 genes representing the intersection of the AS and IPI datasets (Table 1).
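A minimal sketch of the filtering and intersection steps, assuming each peptide-spectrum match is available as a hypothetical (gene, peptide, probability) tuple: hits are kept at a probability of at least 0.9, peptides observed more than once are identified, and the genes supported by both searches are intersected. The probabilities would come from PeptideProphet or ProteinProphet rather than being computed here; treating the IPI side at the peptide level and applying the cutoff as >= rather than > are simplifying assumptions.

```python
from collections import Counter

# Hypothetical peptide-spectrum match layout: (gene, peptide, probability).
PSM = tuple[str, str, float]


def confident_hits(psms: list[PSM], cutoff: float = 0.9) -> list[tuple[str, str]]:
    """Keep (gene, peptide) pairs whose probability reaches the cutoff."""
    return [(gene, peptide) for gene, peptide, prob in psms if prob >= cutoff]


def repeated_peptides(hits: list[tuple[str, str]]) -> set[str]:
    """Peptides observed more than once among the confident hits."""
    counts = Counter(peptide for _, peptide in hits)
    return {peptide for peptide, n in counts.items() if n > 1}


def shared_genes(skipe_hits: list[tuple[str, str]], ipi_hits: list[tuple[str, str]]) -> set[str]:
    """Genes supported both by a junction peptide and by a canonical (IPI) identification."""
    return {gene for gene, _ in skipe_hits} & {gene for gene, _ in ipi_hits}


# Toy values only; gene names are taken from Table 1, probabilities are invented.
skipe = confident_hits([("ITGA2", "ELIPLMIMKPDEK", 0.95), ("ITGA2", "ELIPLMIMKPDEK", 0.97),
                        ("FH", "MPEFSGYVQQVK", 0.93), ("GENEX", "AAAK", 0.42)])
ipi = confident_hits([("ITGA2", "PEPTIDEA", 0.99), ("FH", "PEPTIDEB", 0.92),
                      ("ENO1", "PEPTIDEC", 0.95)])
print(sorted(shared_genes(skipe, ipi)))
```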
Figure 3

Characteristics of the exon skip events detected in human platelets.

(A) and (B) show the distribution of SkipE and IPI peptides, respectively, across the different subcellular compartments for both resting and activated platelet samples.

Table 1

Description of the 89 genes identified in both SkipE and IPI.

Gene symbol | Ensembl gene ID | Exon ID 1 | Exon ID 2 | Description
ACLY | ENSG00000131473 | ENSE00000898911 | ENSE00000898879 | ATP-citrate synthase
ACOX1 | ENSG00000161533 | ENSE00001222343 | ENSE00001117984 | Acyl-coenzyme A oxidase 1, peroxisomal
ACTN4 | ENSG00000130402 | ENSE00000895798 | ENSE00000895787 | Alpha-actinin-4
ALOX12 | ENSG00000108839 | ENSE00000905333 | ENSE00000887238 | Arachidonate 12-lipoxygenase, 12S-type
AMPD2 | ENSG00000116337 | ENSE00001153152 | ENSE00000913099 | AMP deaminase 2
AP1B1 | ENSG00000100280 | ENSE00000652055 | ENSE00000652051 | AP-1 complex subunit beta-1
APOB | ENSG00000084674 | ENSE00000932268 | ENSE00000718984 | Apolipoprotein B-100 precursor
APOL1 | ENSG00000100342 | ENSE00000935990 | ENSE00001369317 | Apolipoprotein-L1 precursor
ARHGEF7 | ENSG00000102606 | ENSE00000686804 | ENSE00000686825 | Rho guanine nucleotide exchange factor 7
ARHGEF7 | ENSG00000102606 | ENSE00001236980 | ENSE00000686833 | Rho guanine nucleotide exchange factor 7
ATIC | ENSG00000138363 | ENSE00001363573 | ENSE00001146950 | Bifunctional purine biosynthesis protein PURH
ATP5C1 | ENSG00000165629 | ENSE00001481323 | ENSE00001094820 | ATP synthase gamma chain, mitochondrial precursor
C2 | ENSG00000204364 | ENSE00001467298 | ENSE00001467293 | Complement C2 precursor
C21orf33 | ENSG00000160221 | ENSE00001506662 | ENSE00001506660 | ES1 protein homolog, mitochondrial precursor
C3 | ENSG00000125730 | ENSE00001053527 | ENSE00000858107 | Complement C3 precursor
C3 | ENSG00000125730 | ENSE00001053551 | ENSE00000858104 | Complement C3 precursor
CCT5 | ENSG00000150753 | ENSE00001082664 | ENSE00001082663 | T-complex protein 1 subunit epsilon
CD109 | ENSG00000156535 | ENSE00001144336 | ENSE00001144250 | CD109 antigen precursor
CD109 | ENSG00000156535 | ENSE00001144243 | ENSE00001084417 | CD109 antigen precursor
CLTC | ENSG00000141367 | ENSE00000948100 | ENSE00000948105 | Clathrin heavy chain 1
CLTCL1 | ENSG00000070371 | ENSE00000596272 | ENSE00001343357 | Clathrin heavy chain 2
COL14A1 | ENSG00000187955 | ENSE00001022732 | ENSE00001090753 | Collagen alpha-1
COL14A1 | ENSG00000187955 | ENSE00000702894 | ENSE00001476378 | Collagen alpha-1
COPB1 | ENSG00000129083 | ENSE00000886038 | ENSE00000703797 | Coatomer subunit beta
COPB2 | ENSG00000184432 | ENSE00001322263 | ENSE00001311447 | Coatomer subunit beta
CP | ENSG00000047457 | ENSE00001008190 | ENSE00000779559 | Ceruloplasmin precursor
CSE1L | ENSG00000124207 | ENSE00000845497 | ENSE00000845507 | Exportin-2
CYFIP1 | ENSG00000068793 | ENSE00000883355 | ENSE00000883353 | Cytoplasmic FMR1-interacting protein 1
DCTN1 | ENSG00000204843 | ENSE00001261315 | ENSE00001199793 | Dynactin-1
ENO1 | ENSG00000074800 | ENSE00000739712 | ENSE00000738913 | Alpha-enolase
FAM62A | ENSG00000139641 | ENSE00000939452 | ENSE00000939471 | Protein FAM62A
FH | ENSG00000091483 | ENSE00000961691 | ENSE00001069123 | Fumarate hydratase, mitochondrial precursor
FLII | ENSG00000177731 | ENSE00001289389 | ENSE00001289270 | Protein flightless-1 homolog
FLNA | ENSG00000196924 | ENSE00000678331 | ENSE00000868362 | Filamin-A
GLUD1 | ENSG00000148672 | ENSE00000986500 | ENSE00000986506 | Glutamate dehydrogenase 1, mitochondrial precursor
GPD2 | ENSG00000115159 | ENSE00000924640 | ENSE00001188495 | Glycerol-3-phosphate dehydrogenase, mitochondrial precursor
GUCY1A3 | ENSG00000164116 | ENSE00001231799 | ENSE00001081588 | Guanylate cyclase soluble subunit alpha-3
HD | ENSG00000197386 | ENSE00000854949 | ENSE00000854981 | Huntington disease protein
HD | ENSG00000197386 | ENSE00000854965 | ENSE00001251513 | Huntington disease protein
HD | ENSG00000197386 | ENSE00000854958 | ENSE00000854991 | Huntington disease protein
HD | ENSG00000197386 | ENSE00000854979 | ENSE00000855002 | Huntington disease protein
HERC2 | ENSG00000128731 | ENSE00000672196 | ENSE00001275912 | Probable E3 ubiquitin-protein ligase HERC2
HERC2 | ENSG00000128731 | ENSE00000672179 | ENSE00001275876 | Probable E3 ubiquitin-protein ligase HERC2
HERC2 | ENSG00000128731 | ENSE00000908550 | ENSE00000908562 | Probable E3 ubiquitin-protein ligase HERC2
HK1 | ENSG00000156515 | ENSE00001145338 | ENSE00001276961 | Hexokinase-1
HSD17B4 | ENSG00000133835 | ENSE00001143964 | ENSE00000972282 | Peroxisomal multifunctional enzyme type 2
HSD17B4 | ENSG00000133835 | ENSE00001143927 | ENSE00000972282 | Peroxisomal multifunctional enzyme type 2
HSD17B4 | ENSG00000133835 | ENSE00001169924 | ENSE00001144014 | Peroxisomal multifunctional enzyme type 2
HSD17B4 | ENSG00000133835 | ENSE00001143964 | ENSE00001143927 | Peroxisomal multifunctional enzyme type 2
HYOU1 | ENSG00000149428 | ENSE00001195270 | ENSE00000990519 | 150 kDa oxygen-regulated protein precursor
IQGAP2 | ENSG00000145703 | ENSE00000971759 | ENSE00001030776 | Ras GTPase-activating-like protein IQGAP2
ITGA2 | ENSG00000164171 | ENSE00001082079 | ENSE00001082066 | Integrin alpha-2 precursor
ITGA2 | ENSG00000164171 | ENSE00001082085 | ENSE00001082079 | Integrin alpha-2 precursor
ITGB3 | ENSG00000056345 | ENSE00000947489 | ENSE00000735016 | Integrin beta-3 precursor
ITIH2 | ENSG00000151655 | ENSE00001415117 | ENSE00001395332 | Inter-alpha-trypsin inhibitor heavy chain H2 precursor
ITPR1 | ENSG00000150995 | ENSE00001072653 | ENSE00001122088 | Inositol 1,4,5-trisphosphate receptor type 1
KIF5B | ENSG00000170759 | ENSE00001163763 | ENSE00001163716 | Kinesin heavy chain
KRT16 | ENSG00000186832 | ENSE00001118312 | ENSE00001118295 | Keratin, type I cytoskeletal 16
KTN1 | ENSG00000126777 | ENSE00001292736 | ENSE00000867340 | Kinectin
LCP2 | ENSG00000043462 | ENSE00000769281 | ENSE00000812799 | Lymphocyte cytosolic protein 2
LRRFIP2 | ENSG00000093167 | ENSE00000825531 | ENSE00000760563 | Leucine-rich repeat flightless-interacting protein 2
LTBP1 | ENSG00000049323 | ENSE00000932484 | ENSE00000932488 | Latent-transforming growth factor beta-binding protein, isoform 1L precursor
LTBP1 | ENSG00000049323 | ENSE00000932483 | ENSE00001006678 | Latent-transforming growth factor beta-binding protein, isoform 1L precursor
LTBP1 | ENSG00000049323 | ENSE00000932485 | ENSE00000744639 | Latent-transforming growth factor beta-binding protein, isoform 1L precursor
LTBP1 | ENSG00000049323 | ENSE00000809557 | ENSE00000744639 | Latent-transforming growth factor beta-binding protein, isoform 1L precursor
MACF1 | ENSG00000127603 | ENSE00001041391 | ENSE00001079474 | Microtubule-actin cross-linking factor 1, isoforms 1/2/3/5
MACF1 | ENSG00000127603 | ENSE00001408360 | ENSE00001218066 | Microtubule-actin cross-linking factor 1, isoforms 1/2/3/5
MACF1 | ENSG00000127603 | ENSE00001411283 | ENSE00001218029 | Microtubule-actin cross-linking factor 1, isoforms 1/2/3/5
MACF1 | ENSG00000127603 | ENSE00001411283 | ENSE00001041391 | Microtubule-actin cross-linking factor 1, isoforms 1/2/3/5
MMRN1 | ENSG00000138722 | ENSE00001003940 | ENSE00001003943 | Multimerin-1 precursor
MTCH2 | ENSG00000109919 | ENSE00000714864 | ENSE00001267224 | Mitochondrial carrier homolog 2
MTHFD1 | ENSG00000100714 | ENSE00000658410 | ENSE00000658424 | C-1-tetrahydrofolate synthase, cytoplasmic
MTHFD1 | ENSG00000100714 | ENSE00000658406 | ENSE00000658420 | C-1-tetrahydrofolate synthase, cytoplasmic
MYH4 | ENSG00000141048 | ENSE00000907666 | ENSE00000907657 | Myosin-4
NID2 | ENSG00000087303 | ENSE00000854715 | ENSE00000657316 | Nidogen-2 precursor
NID2 | ENSG00000087303 | ENSE00000657316 | ENSE00000854708 | Nidogen-2 precursor
NPEPPS | ENSG00000141279 | ENSE00001138170 | ENSE00001138132 | Puromycin-sensitive aminopeptidase
NRBP1 | ENSG00000115216 | ENSE00000809167 | ENSE00000733215 | Nuclear receptor-binding protein
OGDH | ENSG00000105953 | ENSE00000681534 | ENSE00000681548 | 2-oxoglutarate dehydrogenase E1 component, mitochondrial precursor
PDIA5 | ENSG00000065485 | ENSE00001149277 | ENSE00001353839 | Protein disulfide-isomerase A5 precursor
PICALM | ENSG00000073921 | ENSE00000742961 | ENSE00001376469 | Phosphatidylinositol-binding clathrin assembly protein
PIP5K2A | ENSG00000150867 | ENSE00000996551 | ENSE00000996552 | Phosphatidylinositol-4-phosphate 5-kinase type-2 alpha
PKHD1L1 | ENSG00000205038 | ENSE00001477427 | ENSE00001477413 | Fibrocystin L
PKHD1L1 | ENSG00000205038 | ENSE00001477417 | ENSE00001477394 | Fibrocystin L
PKHD1L1 | ENSG00000205038 | ENSE00001477471 | ENSE00001477421 | Fibrocystin L
PKHD1L1 | ENSG00000205038 | ENSE00001477455 | ENSE00001477439 | Fibrocystin L
PKHD1L1 | ENSG00000205038 | ENSE00001477437 | ENSE00001477347 | Fibrocystin L
PKHD1L1 | ENSG00000205038 | ENSE00001477474 | ENSE00001477449 | Fibrocystin L
PLEC1 | ENSG00000178209 | ENSE00001244151 | ENSE00001244041 | Plectin-1
PLEC1 | ENSG00000178209 | ENSE00001244070 | ENSE00001295392 | Plectin-1
PLG | ENSG00000122194 | ENSE00000828808 | ENSE00001315450 | Plasminogen precursor
PLXDC2 | ENSG00000120594 | ENSE00001137970 | ENSE00000996527 | Plexin domain-containing protein 2 precursor
PROS1 | ENSG00000184500 | ENSE00001142430 | ENSE00001142413 | Vitamin K-dependent protein S precursor
PSMC6 | ENSG00000100519 | ENSE00000657442 | ENSE00000657448 | 26S protease regulatory subunit S10B
PTPN18 | ENSG00000072135 | ENSE00000436095 | ENSE00000776192 | Tyrosine-protein phosphatase non-receptor type 18
RAB8A | ENSG00000167461 | ENSE00001113277 | ENSE00001277163 | Ras-related protein Rab-8A
RASA3 | ENSG00000185989 | ENSE00001334941 | ENSE00001334928 | Ras GTPase-activating protein 3
RTN2 | ENSG00000125744 | ENSE00000858227 | ENSE00000858223 | Reticulon-2
SNX17 | ENSG00000115234 | ENSE00000734775 | ENSE00000734780 | Sorting nexin-17
SNX17 | ENSG00000115234 | ENSE00000962998 | ENSE00000734785 | Sorting nexin-17
SPTBN1 | ENSG00000115306 | ENSE00001036038 | ENSE00001036017 | Spectrin beta chain, brain 1
SRC | ENSG00000197122 | ENSE00001390472 | ENSE00000661882 | Proto-oncogene tyrosine-protein kinase Src
STOM | ENSG00000148175 | ENSE00000983575 | ENSE00001262522 | Erythrocyte band 7 integral membrane protein
THBS1 | ENSG00000137801 | ENSE00000883758 | ENSE00000883772 | Thrombospondin-1 precursor
TMEM33 | ENSG00000109133 | ENSE00001489658 | ENSE00000712706 | Transmembrane protein 33
TMOD3 | ENSG00000138594 | ENSE00001170748 | ENSE00001102815 | Tropomodulin-3
TPD52L2 | ENSG00000101150 | ENSE00000663594 | ENSE00001391722 | Tumor protein D54
UBASH3B | ENSG00000154127 | ENSE00001014167 | ENSE00001014158 | Suppressor of T-cell receptor signaling 1
UBE1L | ENSG00000182179 | ENSE00001305417 | ENSE00001306981 | Ubiquitin-activating enzyme E1 homolog
UGCGL1 | ENSG00000136731 | ENSE00001148961 | ENSE00001206051 | UDP-glucose:glycoprotein glucosyltransferase 1 precursor
UGP2 | ENSG00000169764 | ENSE00001189522 | ENSE00001165982 | UTP–glucose-1-phosphate uridylyltransferase 2
UNC13D | ENSG00000092929 | ENSE00001227797 | ENSE00001406672 | Unc-13 homolog D
UNC13D | ENSG00000092929 | ENSE00001227615 | ENSE00001430590 | Unc-13 homolog D
USP14 | ENSG00000101557 | ENSE00001208659 | ENSE00001252715 | Ubiquitin carboxyl-terminal hydrolase 14
VPS13A | ENSG00000197969 | ENSE00001024130 | ENSE00000803886 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00001024085 | ENSE00000708339 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00000708190 | ENSE00000708458 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00001171911 | ENSE00000803905 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00001024110 | ENSE00000708298 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00001024141 | ENSE00001024126 | Vacuolar protein sorting-associated protein 13A
VPS13A | ENSG00000197969 | ENSE00001024130 | ENSE00000707929 | Vacuolar protein sorting-associated protein 13A
VPS13C | ENSG00000129003 | ENSE00001124918 | ENSE00000885044 | Vacuolar protein sorting-associated protein 13C
VPS13C | ENSG00000129003 | ENSE00001124912 | ENSE00001364815 | Vacuolar protein sorting-associated protein 13C
VPS13C | ENSG00000129003 | ENSE00000449795 | ENSE00001380396 | Vacuolar protein sorting-associated protein 13C
VPS13C | ENSG00000129003 | ENSE00000885045 | ENSE00001368990 | Vacuolar protein sorting-associated protein 13C
VPS13C | ENSG00000129003 | ENSE00000885061 | ENSE00001484949 | Vacuolar protein sorting-associated protein 13C
VPS13C | ENSG00000129003 | ENSE00000885061 | ENSE00000885051 | Vacuolar protein sorting-associated protein 13C
WAS | ENSG00000015285 | ENSE00000669947 | ENSE00001255082 | Wiskott-Aldrich syndrome protein
WDR44 | ENSG00000131725 | ENSE00000899838 | ENSE00000899846 | WD repeat protein 44

Gene symbols, Ensembl gene and exon identifiers and gene descriptions are listed for all 129 junctions found in the 89 genes. Exon ID 1 and Exon ID 2 identify the two exons joined in the junction peptide identified in SkipE.

Verification of splice variants at mRNA level

We confirmed the presence of three mRNA species encoding previously undescribed exon skip events by RT-PCR and sequencing of the products. We chose three junctions identified in the SkipE data for which evidence of protein expression was also obtained in the IPI search (Fig. 4). The proteins chosen were integrin alpha-2, also known as platelet glycoprotein Ia (ITGA2), fumarate hydratase (FH) and puromycin-sensitive aminopeptidase (NPEPPS). These proteins reside in different cellular compartments and perform different roles in the cell.
Figure 4

Exon skip events verified at mRNA level.

Each numbered box represents an exon and its position in the gene. The skip event is indicated by diagonal lines. The parallelograms enclose the portion of amino acid sequence that is absent from the novel splice isoform. Bold, underlined residues form the junction peptides.

ITGA2 forms part of a platelet collagen receptor involved in the initial adhesion of platelets to extracellular matrix exposed at sites of endothelial injury, such as atherosclerotic lesions [28], [29]. Splice variants may be functionally significant: a platelet-specific splice variant could confer tissue-specific functions, and polymorphic variations in ITGA2 are associated with risk of thrombotic stroke [30]. The junction peptide we identified, formed by splicing exon 26 to exon 29, occurred 3 times in the SkipE data, and 16 peptides from this protein were present in the IPI data. This splicing event deletes 68 amino acids proximal to the single transmembrane domain on the extracellular surface, far from any reported ligand-binding domains. Similar changes in the length of the ‘stalk’ of the platelet adhesion receptor GPIb are reported to affect the ability of platelets to adhere at high flow rates [31]. FH is a Krebs cycle enzyme that is located in the cytosol or can be transported into the mitochondrion, and it has been shown to act as a tumor suppressor [32]. The FH junction under study was formed by splicing exon 2 to exon 6; it was identified 5 times in the SkipE data, and 7 different FH peptides were identified in the IPI data. The final protein selected, NPEPPS, is a puromycin-sensitive aminopeptidase that is common in brain and immune tissues. NPEPPS may play a role in cell development and in proteolysis that regulates the cell cycle [33]. The NPEPPS junction was created by splicing exon 10 to exon 17; it occurred 4 times in the SkipE data, and 4 NPEPPS peptides were identified in the IPI data. The NPEPPS event, which removes 6 exons (exons 11 to 16), was the longest skip we investigated.
Interestingly, skips of up to 96 exons were observed; the distribution of skip lengths shown in Fig. 5 is highly reminiscent of that observed by Sultan et al. in mRNAseq data [14]. Such long skips remain to be verified (perhaps by 2-dimensional gel separation followed by Western blotting and/or MS), because the many other potential AS events in genes exhibiting long-range AS give rise to multiple PCR products (data not shown). Primer pairs specific to the exons involved in each junction generated multiple or ambiguous products, with a predominant band migrating at the “canonical” amplicon length. The AS message is probably present in relatively small amounts and is out-competed by the canonical isoform in PCR. We therefore designed primers spanning the novel junctions and paired each with a compatible reverse primer to obtain a skip-specific PCR primer pair (Table 2).
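Skip length here means the number of exons removed: the downstream exon number minus the upstream exon number minus one. For the three verified events this gives 2 (ITGA2, exon 26 to 29), 3 (FH, exon 2 to 6) and 6 (NPEPPS, exon 10 to 17). The sketch below tallies such lengths into a frequency distribution of the kind plotted in Fig. 5; representing each junction by the exon numbers of its two exons is an assumption about the bookkeeping, not the authors' code.

```python
from collections import Counter


def skip_length(upstream_exon: int, downstream_exon: int) -> int:
    """Number of exons skipped between two exons numbered 5' to 3'."""
    if downstream_exon <= upstream_exon + 1:
        raise ValueError("exons must be non-adjacent and in 5'->3' order")
    return downstream_exon - upstream_exon - 1


def skip_length_distribution(junctions) -> Counter:
    """Map skip length -> number of junctions with that length."""
    return Counter(skip_length(up, down) for up, down in junctions)


verified = {"ITGA2": (26, 29), "FH": (2, 6), "NPEPPS": (10, 17)}
for gene, (up, down) in verified.items():
    print(gene, skip_length(up, down))
print(sorted(skip_length_distribution(verified.values()).items()))
```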
Figure 5

Frequency distribution of exon skip lengths identified in the SkipE database.

The number of occurrences of each skip length identified is shown.

Table 2

Design of exon-junction-spanning PCR primers.

Gene | Junction peptide | Junction primer (residues encoded) | Product length (bp)
NPEPPS | AQELDALDNSHPIEAR | TCCTATTGAAGCTCGAGCTG (PIEAR) | 200
FH | MPEFSGYVQQVK | AACGCATGCCAGAATTTAGTG (MPEFS) | 165
ITGA2 | ELIPLMIMKPDEK | CCAAAGAATTGATTCCCCTGA (ELIPL) | 115

The gene symbol, junction peptide sequence, junction primer and product length are shown. The junction primer column indicates, with a dash, the exon-exon boundary.
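One way to check a junction-spanning primer against its junction peptide is to translate the primer in all three forward frames and find the frame that encodes part of the peptide. Translating the Table 2 primers this way recovers the short peptides PIEAR, MPEFS and ELIPL shown in Table 2. The sketch uses the standard genetic code via a compact lookup string; the helper names are hypothetical, and it checks only the coding frame, not melting temperature, specificity or the position of the exon-exon boundary.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA from the given offset, ignoring a trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def coding_frame(primer: str, residues: str):
    """Return the first forward frame (0-2) whose translation contains `residues`, else None."""
    for frame in range(3):
        if residues in translate(primer, frame):
            return frame
    return None


table2 = {
    "NPEPPS": ("TCCTATTGAAGCTCGAGCTG", "PIEAR"),
    "FH": ("AACGCATGCCAGAATTTAGTG", "MPEFS"),
    "ITGA2": ("CCAAAGAATTGATTCCCCTGA", "ELIPL"),
}
for gene, (primer, residues) in table2.items():
    frame = coding_frame(primer, residues)
    print(gene, frame, translate(primer, frame))
```

This prints frames 1, 2 and 2 with translations PIEARA, RMPEFS and KELIPL, respectively, confirming that each primer is in frame with its junction peptide.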

PCR products of the expected sizes were observed in each case with cDNA derived both from platelets and from their precursors, megakaryocytes. The bands derived from platelet cDNA were excised, and sequencing confirmed that the predicted products had been obtained. Figure 6 shows that the megakaryocyte template produced a greater quantity of amplicon in each case; this probably reflects greater template availability rather than a higher proportion of AS message in these cells.
Figure 6

Agarose gel electrophoresis of PCR products.

Lanes 1–5 show products amplified from megakaryocyte cDNA; samples in lanes 6–10 contained template from platelets. Molecular size markers are in lane 11.

Discussion

Our findings demonstrate that many previously undescribed exon skip events occur in platelets. These events were found in a novel high-throughput fashion, using an approach that is compatible with existing MS/MS software available to the scientific community. Although these events were identified computationally from proteomics data, we selected three of them and verified them at the transcript level by PCR and sequencing. Notably, the overlap between the proteins identified with the AS and IPI databases is relatively low: just 89 genes were represented by peptides in both datasets. In common with many other high-throughput experimental approaches, such as yeast two-hybrid screening and protein interaction mapping [34], [35], MS/MS proteomics experiments suffer from a lack of completeness; that is, coverage of the proteome is neither absolute nor unbiased. High-throughput approaches increase the completeness of proteomics experiments, although approximately 10 repetitions of a multidimensional protein identification technology (MudPIT) experiment are required to reach 95% analytical completeness [36], [37]. The proteins identified in any given experiment are constrained by a number of factors, including expression level and the presence of proteotypic peptides [38]. For splice isoforms, these factors will not necessarily match those of the ‘canonical’ isoforms. Therefore, although in this experiment we used IPI-based detection of protein expression to filter potential targets for verification, not all genes displaying AS will also be detected as canonical isoforms, and vice versa. Although we applied a relatively strict cutoff of 0.9 to the SkipE hits, these hits were validated only by PeptideProphet and not by ProteinProphet, so the SkipE data may contain more false positives than the IPI results.
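The figure of roughly 10 MudPIT replicates for 95% completeness can be put in perspective with a deliberately crude model, which is an assumption here and not the model used in [36], [37]: if every detectable peptide were sampled independently in each run with the same probability p, the fraction seen after n runs would be 1 - (1 - p)^n. Under that assumption, reaching 95% in 10 runs implies p of about 0.26 per run.

```python
import math


def completeness(p: float, runs: int) -> float:
    """Fraction of detectable peptides seen at least once in `runs` independent runs."""
    return 1.0 - (1.0 - p) ** runs


def per_run_probability(target: float, runs: int) -> float:
    """Per-run sampling probability needed to reach `target` completeness in `runs` runs."""
    return 1.0 - (1.0 - target) ** (1.0 / runs)


def runs_needed(p: float, target: float) -> int:
    """Smallest number of runs whose expected completeness reaches `target`."""
    # A tiny epsilon stops floating-point noise from rounding 10.000...01 up to 11.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p) - 1e-9)


p = per_run_probability(0.95, 10)
print(round(p, 3), runs_needed(p, 0.95))
```

Real sampling is far from independent and uniform (abundant proteins are seen in almost every run, rare ones seldom), so this only illustrates why completeness saturates slowly.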
Ultra-high-throughput mRNA-seq verification of large numbers of skip events detected in proteomics data would demonstrate the synergy gained by combining high-throughput techniques, and the resulting datasets would provide mutual cross-validation. The novel splice events detected in this study appear most likely to have been inherited from the precursor megakaryocyte rather than being specific to the platelet. It will be interesting to determine the distribution of these events in a variety of cell types and tissues across different organisms. It will also be of interest to determine whether any of the exon skip events occur specifically in the platelet, since splicing is known to occur in these cells despite the absence of a nucleus. While exon skips are the most common type of AS event described to date [13], [14], [39], [40], several other splicing patterns occur, including alternative 5′ and 3′ splice sites and intron retention. These events require a different approach to detection in proteomics data. Intron-specific peptides could in principle be incorporated into the SkipE database, although this would considerably increase its size; a separate, parallel intron peptide database would be a feasible alternative. Alternative 5′ or 3′ splice sites, on the other hand, are not amenable to detection in this manner and require a different approach.

In conclusion, we have developed a novel database suitable for the detection of alternative splicing in mass spectrometry data and shown that it can detect AS events in a platelet MS/MS dataset. The approach augments current methodologies. Detecting AS directly at the protein level avoids any requirement for amplification steps and shows that the detected events are indeed expressed as protein. Millions of spectra already available in public and private repositories can be reanalyzed using this database.
As label-free quantitation tools are incorporated into proteomics pipelines, the added value will become even greater, because isoforms can then be compared at the expression level within and between samples. Again, this approach is applicable both to the vast repositories of data already gathered and to all new samples. Applying this methodology should rapidly yield new insights into AS across a range of tissue types and biological states. Since AS events have previously been associated with particular diseases, the approach described here should also enable the discovery of disease-specific biomarkers at the splice isoform level. Because the proteome is the molecular level most closely related to the biological phenotype, the potential to discover clinically relevant biomarkers of diagnosis, prognosis or susceptibility is considerable, with implications for both clinical practice and drug development.

Note added in proof

During the review process, Mo et al. [41] described the development of a similar database.

Materials and Methods

Platelet MS/MS data acquisition

Platelets were prepared as previously described by McRedmond et al. [42] and incubated at 37°C with stirring. One sample was activated by the addition of 5 µM thrombin receptor activating peptide (TRAP) for 5 minutes. Resting and activated samples were separated into subcellular compartments using a ProteoExtract subcellular proteome extraction kit (Merck Biosciences, Nottingham, UK). The manufacturer's protocol was modified to ensure separation of platelet pellets from supernatants and to allow the recovery of released platelet proteins. This procedure also yields a ‘nuclear’ fraction, which is artefactual when applied to platelets because they are anucleate. Fractions from resting and TRAP-activated platelets were separated by SDS-PAGE; gel lanes were cut into 32 slices and digested with trypsin. Peptides were separated by single-dimension reverse-phase liquid chromatography and analysed using an LTQ ion trap mass spectrometer (Thermo-Finnigan, San Jose, CA) [43].

Public data repositories

Ensembl version 46 was used to obtain all protein-coding genes and their sequences, along with the associated exon predictions, for the human, mouse and rat genomes. Previously annotated AS events were filtered out of our dataset by comparing sequences with ASTD version 1.1 and IPI version 3.16 using the Washington University Basic Local Alignment Search Tool (WU-BLAST) version 2.0 with the PAM30 substitution matrix.
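Conceptually, this filtering step discards any junction peptide that is already explained by a known isoform sequence. The minimal sketch below substitutes exact substring matching for the WU-BLAST/PAM30 comparison actually used, so it would miss near-identical matches that a BLAST search would catch. The function name, junction identifiers and sequences are hypothetical.

```python
def filter_known(junction_peptides: dict[str, str],
                 known_proteins: list[str]) -> dict[str, str]:
    """Keep only junction peptides that do not occur verbatim in any known protein.

    Exact substring matching is a crude stand-in for a BLAST search:
    it catches identical matches only, not near-identical ones.
    """
    return {
        junction_id: peptide
        for junction_id, peptide in junction_peptides.items()
        if not any(peptide in protein for protein in known_proteins)
    }


# Hypothetical identifiers and sequences, for illustration only.
novel = filter_known(
    {"geneX:e1-e3": "ALEWR", "geneX:e1-e4": "AWR"},
    ["MKALEWR"],  # stands in for known ASTD/IPI isoform sequences
)
print(novel)
```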

Database development

Transcript and exon data for all annotated genes in the human, mouse and rat genomes were extracted from Ensembl v46 via the Ensembl Perl API, and a separate database was generated for each species. Briefly, for each gene a standard “full-length transcript” was generated that contains, at each exon position along the transcript, the longest predicted exon sequence. This procedure yields a single, representative “standard” transcript from which to design junction peptides. Junction peptides are the derived peptide sequences that span exon-exon junctions, running from the most C-terminal protease site in the upstream exon to the most N-terminal protease site in the downstream exon; here, trypsin was used as the protease. Only junctions between non-consecutive exons (that is, exon skip junctions) were included in the database, and the content was further constrained to junctions at which the reading-frame phase was maintained between exons. The FASTA files for all three species are publicly available online at http://bioinformatics.ucd.ie/SkipE.
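The database construction described above can be sketched in Python. This is a minimal illustration under several assumptions, not the authors' Perl/Ensembl pipeline: exons are given as coding-sequence strings in transcript order, with the first exon starting at the start codon; phase is tracked from cumulative CDS length, so a skip preserves phase when the skipped exons total a multiple of three nucleotides; trypsin is modelled as cleaving after K or R unless the next residue is P, with no missed cleavages; and the junction peptide is taken to be the fully cleaved tryptic peptide containing both the last residue encoded (at least partly) by the upstream exon and the first residue encoded by the downstream exon. All function names are hypothetical.

```python
from typing import Iterator

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate a CDS in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def standard_transcript(exon_variants: list[list[str]]) -> list[str]:
    """Keep the longest predicted sequence at each exon position."""
    return [max(variants, key=len) for variants in exon_variants]


def tryptic_bounds(protein: str) -> list[tuple[int, int]]:
    """Half-open (start, end) spans of fully cleaved tryptic peptides.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    spans, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            spans.append((start, i + 1))
            start = i + 1
    if start < len(protein):
        spans.append((start, len(protein)))
    return spans


def skip_junction_peptides(exons: list[str]) -> Iterator[tuple[int, int, str]]:
    """Yield (i, j, peptide) for each phase-preserving skip from exon i to exon j."""
    for i in range(len(exons)):
        for j in range(i + 2, len(exons)):  # non-consecutive exons only
            skipped_nt = sum(len(exon) for exon in exons[i + 1:j])
            if skipped_nt % 3:  # skipping would shift the reading frame
                continue
            upstream = "".join(exons[:i + 1])
            protein = translate(upstream + "".join(exons[j:]))
            junction_nt = len(upstream)         # index of first downstream base
            last_up = (junction_nt - 1) // 3    # last residue with upstream bases
            first_down = junction_nt // 3       # first residue with downstream bases
            if first_down >= len(protein):
                continue  # junction lies beyond the stop codon
            for start, end in tryptic_bounds(protein):
                # Junction peptide: the tryptic peptide containing both residues.
                if start <= last_up and first_down < end:
                    yield i, j, protein[start:end]
                    break


# Toy gene: the second exon position has two predicted variants.
exons = standard_transcript(
    [["ATGAAAGCA"], ["GGATTT", "GGA"], ["CTGGAA"], ["TGGCGTTAA"]]
)
for i, j, peptide in skip_junction_peptides(exons):
    print(f">toy_skip_e{i}_e{j}\n{peptide}")
```

A junction whose boundary coincides with a tryptic cleavage site yields no peptide in this sketch, because no tryptic peptide then contains residues from both exons.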

MS/MS data analysis

All MS/MS data analyses were carried out using the Proline proteomics platform (Biontrack, Dublin; http://www.biontrack.com). Spectra were compared against the databases using SEQUEST [20]. Peptide and protein identifications were validated using the Trans-Proteomic Pipeline tools PeptideProphet and ProteinProphet [21], respectively, and filtered with a probability cut-off of P>0.9.
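The probability filter, together with the gene-level comparison between SkipE and IPI identifications reported in the Discussion, can be sketched as follows. This assumes peptide-spectrum matches have already been exported with their PeptideProphet probabilities and mapped to genes; the record type, field names, gene names and values are hypothetical, and parsing of actual pipeline output is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match after the database search (hypothetical record)."""
    peptide: str
    gene: str           # gene the matched sequence maps to
    probability: float  # PeptideProphet probability


def confident_genes(psms: list[PSM], cutoff: float = 0.9) -> set[str]:
    """Genes supported by at least one PSM with probability strictly above cutoff."""
    return {psm.gene for psm in psms if psm.probability > cutoff}


# Toy data with placeholder gene names; not results from this study.
skipe_hits = [PSM("ALEWR", "GENE_A", 0.97), PSM("AWR", "GENE_B", 0.85)]
ipi_hits = [PSM("MKALEWR", "GENE_A", 0.99), PSM("GFLEK", "GENE_C", 0.95)]

shared = confident_genes(skipe_hits) & confident_genes(ipi_hits)
print(sorted(shared))
```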

RNA isolation

RNA from platelets and from the megakaryocytic cell line Meg-01 was isolated as previously described [42] and reverse-transcribed into cDNA using standard techniques.

Validation

PCR and sequencing were carried out to validate the alternative splicing events. All primer synthesis and sequencing were carried out by MWG Biotech (http://www.eurofinsdna.com/). Primer sequences were as follows: ITGA2, forward CAAAGAATTGATTCCCCTGA and reverse TGCAACCAGAGCTAACAGCA; NPEPPS, forward TCCTATTGAAGCTCGAGCTG and reverse CAGCCCAGTCTCTCCCCTAT; FH, forward AACGCATGCCAGAATTTAGTG and reverse CCACTTTTGCAGCAACCTTT. PCR reactions were made up as follows: 8 µl 5× GoTaq buffer, 1 µl Taq polymerase, 2 µl 4 mM dNTPs (Promega), 22 µl H2O, 2 µl primers and 1.5 µl template. The following PCR conditions were used: initial denaturation for 2 minutes at 94°C, followed by 40 cycles of 30 seconds denaturation at 94°C, 30 seconds annealing (55°C for NPEPPS; 58°C for FH and ITGA2) and 90 seconds extension at 72°C, followed by incubation at 4°C. Products were separated on 2% agarose gels. The positive control was integrin αIIb (ITGA2B), a known abundant platelet glycoprotein. The negative control was a no-template RT reaction.

Supporting Information

Characteristics of the contents and constraints applied to create the species-specific SkipE databases. (DOC, 0.03 MB)

Numbers of platelet peptide and protein identifications in the IPI and SkipE databases. (DOC, 0.03 MB)

KEGG annotations for all 89 genes found to be alternatively spliced and represented in the IPI data. In total, 32 pathways were found. These pathways are sorted by impact factor, a probabilistic term calculated from the number of genes in the input file, the size of the reference chip (U133 Plus 2.0), the number of input genes on a given pathway and the number of pathway genes represented on the reference chip. (DOC, 0.08 MB)

KEGG annotations for all the genes found in IPI. In total, 78 pathways were found. These pathways are sorted by impact factor, a probabilistic term calculated from the number of genes in the input file, the size of the reference chip (U133 Plus 2.0), the number of input genes on a given pathway and the number of pathway genes represented on the reference chip. (DOC, 0.17 MB)
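The primer sequences and reaction recipe given under Validation can be sanity-checked with a few lines of Python. The sketch below applies the Wallace rule, a crude rule-of-thumb melting temperature estimate for short oligonucleotides that ignores salt and primer concentration; it is our illustration, not a calculation reported in the study. By this rule the six primers fall between 56 and 64°C, and the listed reaction components sum to 36.5 µl.

```python
# Primer pairs (forward, reverse) and annealing temperatures from the text.
PRIMERS = {
    "ITGA2": ("CAAAGAATTGATTCCCCTGA", "TGCAACCAGAGCTAACAGCA"),
    "NPEPPS": ("TCCTATTGAAGCTCGAGCTG", "CAGCCCAGTCTCTCCCCTAT"),
    "FH": ("AACGCATGCCAGAATTTAGTG", "CCACTTTTGCAGCAACCTTT"),
}
ANNEALING_C = {"ITGA2": 58, "NPEPPS": 55, "FH": 58}

# Reaction components in microlitres, as listed in the text.
REACTION_UL = {
    "5x GoTaq buffer": 8.0,
    "Taq polymerase": 1.0,
    "4 mM dNTPs": 2.0,
    "H2O": 22.0,
    "primers": 2.0,
    "template": 1.5,
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule of thumb: Tm = 2*(A+T) + 4*(G+C), in degrees C."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for gene, (fwd, rev) in PRIMERS.items():
    print(f"{gene}: anneal {ANNEALING_C[gene]} C; "
          f"fwd Tm~{wallace_tm(fwd)} C (GC {gc_fraction(fwd):.0%}), "
          f"rev Tm~{wallace_tm(rev)} C (GC {gc_fraction(rev):.0%})")

total_ul = sum(REACTION_UL.values())
print(f"Total reaction volume: {total_ul} ul")
```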
References (10 of 43 shown)

Review 1.  Biomarkers and surrogate endpoints: preferred definitions and conceptual framework.

Journal:  Clin Pharmacol Ther       Date:  2001-03       Impact factor: 6.875

2.  A genomic view of alternative splicing.

Authors:  Barmak Modrek; Christopher Lee
Journal:  Nat Genet       Date:  2002-01       Impact factor: 38.330

3.  Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Authors:  Andrew Keller; Alexey I Nesvizhskii; Eugene Kolker; Ruedi Aebersold
Journal:  Anal Chem       Date:  2002-10-15       Impact factor: 6.986

Review 4.  How prevalent is functional alternative splicing in the human genome?

Authors:  Rotem Sorek; Ron Shamir; Gil Ast
Journal:  Trends Genet       Date:  2004-02       Impact factor: 11.639

Review 5.  Alternative splicing: new insights from global analyses.

Authors:  Benjamin J Blencowe
Journal:  Cell       Date:  2006-07-14       Impact factor: 41.582

6.  Computational prediction of proteotypic peptides for quantitative proteomics.

Authors:  Parag Mallick; Markus Schirle; Sharon S Chen; Mark R Flory; Hookeun Lee; Daniel Martin; Jeffrey Ranish; Brian Raught; Robert Schmitt; Thilo Werner; Bernhard Kuster; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2006-12-31       Impact factor: 54.908

7.  Alternative splicing of platelet cyclooxygenase-2 mRNA in patients after coronary artery bypass grafting.

Authors:  Petra Censarek; Gerhard Steger; Carla Paolini; Thomas Hohlfeld; Tilo Grosser; Norbert Zimmermann; Diana Fleckenstein; Karsten Schrör; Artur-Aron Weber
Journal:  Thromb Haemost       Date:  2007-12       Impact factor: 5.249

8.  A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome.

Authors:  Marc Sultan; Marcel H Schulz; Hugues Richard; Alon Magen; Andreas Klingenhoff; Matthias Scherf; Martin Seifert; Tatjana Borodina; Aleksey Soldatov; Dmitri Parkhomchuk; Dominic Schmidt; Sean O'Keeffe; Stefan Haas; Martin Vingron; Hans Lehrach; Marie-Laure Yaspo
Journal:  Science       Date:  2008-07-03       Impact factor: 47.728

9.  Platelet glycoprotein Ibalpha polymorphisms modulate the risk for myocardial infarction.

Authors:  Margareth C Ozelo; Andrea F Origa; Francisco J P Aranha; Antonio P Mansur; Joyce M Annichino-Bizzacchi; Fernando F Costa; Eleanor S Pollak; Valder R Arruda
Journal:  Thromb Haemost       Date:  2004-08       Impact factor: 5.249

10.  Different levels of alternative splicing among eukaryotes.

Authors:  Eddo Kim; Alon Magen; Gil Ast
Journal:  Nucleic Acids Res       Date:  2006-12-07       Impact factor: 16.971
