Literature DB >> 24349131

Next generation sequencing analysis of human platelet PolyA+ mRNAs and rRNA-depleted total RNA.

Antheia Kissopoulou1, Jon Jonasson1, Tomas L Lindahl1, Abdimajid Osman1.   

Abstract

BACKGROUND: Platelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA.
MATERIALS AND METHODS: We used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts.
RESULTS: Consistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq.
CONCLUSION: The present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components compared with other cell and tissue types. The in vivo functional significance of the non-mitochondrial transcripts remains to be shown.

Entities:  

Mesh:

Substances:

Year:  2013        PMID: 24349131      PMCID: PMC3859545          DOI: 10.1371/journal.pone.0081809

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Background

Produced by bone marrow megakaryocytes, platelets are small anucleate elements of the blood that play a pivotal role in hemostasis. They are involved in fibrinolysis and repair of the vessel wall, while circulating in the blood as sentinels of vascular integrity. Platelets lack genomic DNA but retain the ability for protein synthesis from cytoplasmic mRNA [1]. Platelet mRNA was first isolated and converted to a cDNA library more than two decades ago [2]. In recent years, several studies utilizing genome-wide techniques for gene expression profiling, such as microarrays and Serial Analysis of Gene Expression (SAGE) in concert with computer-assisted bioinformatics, have reported that thousands of gene transcripts are present in human platelets [3]–[7]. While microarrays and SAGE have made significant contributions to the characterization of the platelet transcriptome, they also have serious limitations. Hybridization-based approaches rely on probe-target binding of selected sequences and do not detect novel transcripts or unknown genes. In contrast, SAGE uses sequence tags from individual mRNAs and has an advantage over microarrays by detecting unknown genes but does not provide information on splice isoforms and is biased toward short tags, which cannot be uniquely mapped to the human genome [8]. Recently, mass sequencing of transcripts (RNA-Seq) by next generation sequencing (NGS) technologies has emerged as a powerful approach for quantitative transcript discovery [9]–[13]. RNA-Seq has clear advantages over other approaches [14] and shows higher levels of reproducibility for both technical and biological replicates [15]. Two recently published studies used NGS technology to characterize the platelet transcriptome [16]–[17]. One of these used cDNA from poly(dT) isolated mRNA and the other cDNA from ribosomal RNA-depleted total RNA. Both studies used relatively short reads (≤50 base pairs) for alignment to the human genome. In this context, we now report results from both polyA+ mRNA and rRNA-depleted total RNA approaches utilizing 100 bp long sequencing reads for investigating the transcriptional profile of unstimulated human platelets (Fig. 1). We have also for the first time applied a de novo assembly of platelet transcripts to confirm the reference-guided alignments. We believe that our data may provide important clues for understanding the elusive platelet transcriptome and its role in the coagulation system and hemostasis.
Figure 1

Schematic presentation of experimental plan used in this study.

Samples from 4 platelet donors were investigated. One sample (S0) was used for isolation of polyA+ transcripts. The 3 other samples (S1, S2, and S3) were used for analysis of total RNA after depletion of ribosomal RNA (rRNA).

Schematic presentation of experimental plan used in this study.

Samples from 4 platelet donors were investigated. One sample (S0) was used for isolation of polyA+ transcripts. The 3 other samples (S1, S2, and S3) were used for analysis of total RNA after depletion of ribosomal RNA (rRNA).

Results

Mapping of polyA+ mRNA (Sample S0)

We tried three mapping strategies for polyA+ mRNA (Fig. 2).
Figure 2

Mapping strategies and abundance estimates.

i) Alignment of reads (short red lines) to the human reference genome hg19 (thick blue line) using the TopHat program that aligns RNA-Seq reads to the genome while also attending to splice junction reads. Abundance estimates obtained by counting the number of reads that map within the coordinates defining the corresponding gene with RefSeq annotations; ii) Alignment of reads (short red lines) to human reference (RefSeq) mRNA (thick blue line with polyA tail) using the bwa software for abundance estimates; iii) Alignment of reads (short red lines) to a de novo assembled transcript reported by Trinity (thick red line with polyA tail and green SMARTer IIA oligonucleotide as 5′-leader sequence) using Blat for identification and RSEM for abundance estimates.

Mapping strategies and abundance estimates.

i) Alignment of reads (short red lines) to the human reference genome hg19 (thick blue line) using the TopHat program that aligns RNA-Seq reads to the genome while also attending to splice junction reads. Abundance estimates obtained by counting the number of reads that map within the coordinates defining the corresponding gene with RefSeq annotations; ii) Alignment of reads (short red lines) to human reference (RefSeq) mRNA (thick blue line with polyA tail) using the bwa software for abundance estimates; iii) Alignment of reads (short red lines) to a de novo assembled transcript reported by Trinity (thick red line with polyA tail and green SMARTer IIA oligonucleotide as 5′-leader sequence) using Blat for identification and RSEM for abundance estimates. First, the 58,155,680 cleaned sequenced single-end reads with no strand-specificity were mapped to the human reference genome (GRCh37/hg19) using TopHat software (http://tophat.cbcb.umd.edu/) in order to identify exon-exon splice junctions (Fig. 2 i). This resulted in 35,322,009 (60.7%) of uniquely mapped ∼100 bp long single-end reads. The aligned sequencing reads and the Homo_sapiens.GRCh37.71.gtf features were used to estimate the coverage of known genes and transcripts with the aid of bedtools-2.17.0 (http://code.google.com/p/bedtools/). A strong bias towards the 3′-UTR end of transcripts was clearly evident, which can be expected due to the library construction involving oligo-dT primed cDNA in the library preparation procedure (Fig. 3). The uniquely mapped read localizations on the different chromosomes are shown in Table 1. Top 30 loci are shown in Table 2. The HTSeq counts are shown in Table S1 in File S1.
Figure 3

Read start position density on ACTB mRNA.

The horizontal axis shows the distance in nucleotides (bp) from the 5′-end of ACTB mRNA, and the vertical axis shows the natural logarithm of the number of uniquely mapped reads. The fitted red line calculated over the transcript body ignoring both ends corresponds to exponential decay of approximately 50% per 250 bp upstreams fom the polyA-site in the 3′-UTR. Correlation coefficient: 0.93, Slope: 0.0027638, Std error: 0.0002751, t value: -10.05, p-value: 4.70e-08 ***. (Statistics and graph generated by the R-program).

Table 1

Distribution of mapped reads for samples S0, S1, S2 and S3.

Transcript infoNo. Mapped reads per sample
ChrLength (bp)S0S1S2S3
12492506211417879560482860820625836619
2243199373839205482870049420885293788
3198022430376751405330635617705497609
41911542761574321530461653138618582878
5180915260987877247636127510252864982
6171115067510420311654733255483279675
7159138663534153288123029004073260889
8146364022202649178853617180241966160
9141213431182805289800221938034043810
10135534747228845268794625694532952239
11135006516915079225984118907372206632
12133851895396073233795023329032698125
13115169878733311102690310665071135743
141073495401654576691970406100712098431
151025313921039909528043257806955035062
1690354753168779946256807193892257
1781195210440380193447818816032076841
187807724892840118482012639781233671
1959128983257403860224632489900418
2063025520425316132711612952691404632
214812989592487741307599255624149
2251304566148511750870567420726189
X1552705601153342795161567761046929134
Y593735662131151658494950979
MT1656922416906678171690168617198049
Sum:35322009760000007333501188788961
Table 2

TopHat alignment of PolyA + mRNA to genome.

Ensembl id.GeneLocusNRC* Rank
ENSG00000210082MT-RNR2MT:1671–322910000000MT
ENSG00000211459MT-RNR1MT:648–16015000000MT
ENSG00000205542TMSB4XX:12993226–1299534610000001
ENSG00000166710B2M15:45003674–450110758628802
ENSG00000198888MT-ND1MT:3306–4262833782MT
ENSG00000163736PPBP4:74852754–748539145559553
ENSG00000198712MT-CO2MT:7585–8269534277MT
ENSG00000163737PF44:74844540–748487964378424
ENSG00000198763MT-ND2MT:4469–5511407889MT
ENSG00000198886MT-ND4MT:10469–12137355599MT
ENSG00000198899MT-ATP6MT:8365–9990303773MT
ENSG00000198938MT-CO3MT:8365–9990303743MT
ENSG00000198786MT-ND5MT:12336–14148287825MT
ENSG00000198804MT-CO1MT:5903–7445282378MT
ENSG00000198727MT-CYBMT:14746–15887217548MT
ENSG00000187514PTMA2:232571605–2325782512106485
ENSG00000161570CCL517:34195970–342128671852746
ENSG00000228474OST42:27265231–272946411800797
ENSG00000198695MT-ND6MT:14148–14673148341MT
ENSG00000198840MT-ND3MT:10058–10404105928MT
ENSG00000212907MT-ND4LMT:10469–1213798894MT
ENSG00000075624ACTB7:5566781–5603415910798
ENSG00000127920GNG117:93220884–93567791852259
ENSG00000204592HLA-E6:30457244–304619828226310
ENSG00000087086FTL19:49468558–494701358104711
ENSG00000158710TAGLN21:159887897–1598955227761412
ENSG00000120885CLU8:27454434–274725487231013
ENSG00000168497SDPR2:192699027–1930604357186314
ENSG00000150681RGS181:192127586–1921549456522215
ENSG00000163041H3F3A1:226249552–2262597026332616

*NRC =  Normalized Read Counts calculated from transcript length (x) as NRC =  read_count*(1+e-0.0027638x).

Read start position density on ACTB mRNA.

The horizontal axis shows the distance in nucleotides (bp) from the 5′-end of ACTB mRNA, and the vertical axis shows the natural logarithm of the number of uniquely mapped reads. The fitted red line calculated over the transcript body ignoring both ends corresponds to exponential decay of approximately 50% per 250 bp upstreams fom the polyA-site in the 3′-UTR. Correlation coefficient: 0.93, Slope: 0.0027638, Std error: 0.0002751, t value: -10.05, p-value: 4.70e-08 ***. (Statistics and graph generated by the R-program). *NRC =  Normalized Read Counts calculated from transcript length (x) as NRC =  read_count*(1+e-0.0027638x). Second, to check the quality of the TopHat alignments the reads were also mapped against RefSeq mRNAs (Fig. 2 ii) using bwa (http://bio-bwa.sourceforge.net/) and samtools (http://samtools.sourceforge.net/) giving similar results (data not shown). PolyA-sites and the expression level of individual transcripts were visualized by plotting log coverage against the distance from the 5′-end of the RefSeq mRNA sequences (Fig. 4). Additional data is shown in Table S2 in File S1.
Figure 4

Mapping of S0 (poly(dT) selected transcripts) against RefSeq mRNA.

The horizontal axis shows the distance in nucleotides from the 5′-end of the transcript (bin length  = 100 bp), and the vertical logarithmic axis shows the sum of uniquely mapped reads to each position of the bin. The slope of the dotted line corresponds to the exponential decay function derived in Fig. 3. The sudden “drops” correspond to polyA-sites. As seen in the figure NM_002704 (PPBP) has two polyA-sites which correspond to the known polyA-sites at positions 708 and 1307, respectively. The abundance of the longer PPBP transcript appears to be hundred-fold lower than that of the shorter transcript.

Mapping of S0 (poly(dT) selected transcripts) against RefSeq mRNA.

The horizontal axis shows the distance in nucleotides from the 5′-end of the transcript (bin length  = 100 bp), and the vertical logarithmic axis shows the sum of uniquely mapped reads to each position of the bin. The slope of the dotted line corresponds to the exponential decay function derived in Fig. 3. The sudden “drops” correspond to polyA-sites. As seen in the figure NM_002704 (PPBP) has two polyA-sites which correspond to the known polyA-sites at positions 708 and 1307, respectively. The abundance of the longer PPBP transcript appears to be hundred-fold lower than that of the shorter transcript. Finally, a detailed analysis of transcripts and assignment of mRNA isoforms was performed by de novo assembly of transcripts using Trinity RNA-Seq software from (http://trinityrnaseq.sourceforge.net/) followed by quantification of transcripts with RSEM (RNA-Seq by Expectation-Maximization) (Fig. 2 iii). Identification of the de novo assembled transcripts was achieved by Blat and BLAST searches using the UCSC Browser and the NCBI Genome databases, respectively. Table 3a shows the Top 25 out of 9077 reported de novo assembled polyA+ genomic transcripts with identification to locus; excluding the mitochondrial genome for clarity. Full-length transcripts could be identified by the presence of a SMARTer IIA 5′-leader sequence and a 3′-end polyA tail (Fig 5). The magnitude of expression of de novo assembled polyA+ transcripts correlated well (Spearman's rho  = 0.83) with the length normalized coverage by TopHat alignment of polyA+ cDNA reads to the human genome (compare Tables 2 and 3a).
Table 3

de novo assembly of platelet transcripts.

Table 3a. Trinity/RSEMTable 3b. Trinity/RSEM
de novo assembly of PolyA+ mRNA transcripts (MT-RNA excluded)de novo assembly of rRNA-depleted total RNA transcripts (MT-RNA excluded)
RankGeneLengthFPKM NRC* Rank§ GeneLength_meanFPKM _meanFPKM_sd
1TMSB4X673678461399035ncrna7SLRNA349295364120832
2B2M992281108278151TMSB4X64614500515806
3PPBP1789100675244732B2M990640938783
4PF41035130043983443PPBP22962240212048
5OST447019998291396ncrna7SK-RNA33099812495
6CCL57776289148021ncrnaLSU-rRNA103280296510
7FTH196150681449394FTH195580081385
8SERF25986988129001ncrnaRNA45S530765864582
9PTMA103641011257305PF4120163673351
10H3F3B108738491234356SMARCA551059911417
11SH3BGRL37814037954467OST449257651800
12ACTB9743058885568PF4V15303609505
13FTL9143053833619C21orf715273207912
14TAGLN2141418667694110ACTB16802991637
15GNG1187628047364411MYL67002705678
16PTMA32065816201912CCL57772569370
17RGS1842384856127113GNG11100424931019
18C21orf7151813415927614HSMAR117562456344
19SDPR25547745806515RGS1844432432160
20TUBB131096125627016H3F3A5882370736
21MYL669625095337017HIST1H2AC17442280424
22CLU17699394835018MYL12A13162278552
23HLA-E149210254453919EFCAB136082234704
24GPX190713013527420MORC37142225834
25RGS10100111363374321PTMA12322179667

Fragments Per Kilobase of transcript per Million mapped reads.

*NRC =  Normalized Read Count calculated from transcript length (x) as NRC =  read_count*(1+e-0.0027638x).

Ranking of protein coding transcripts only.

Figure 5

Snapshot of UCSC Browser Blat alignment of de novo assembled transcript variant comp1_c0_seq1 mapping to TMSB4X.

The 5′-leader sequence matches the SMARTer IIA oligonucleotide. The Trinity de novo assembled nucleotide sequence is identical to the GRCh37/hg19 reference. Part of the polyA tail is also included. Splice junctions are marked in turquoise.

Snapshot of UCSC Browser Blat alignment of de novo assembled transcript variant comp1_c0_seq1 mapping to TMSB4X.

The 5′-leader sequence matches the SMARTer IIA oligonucleotide. The Trinity de novo assembled nucleotide sequence is identical to the GRCh37/hg19 reference. Part of the polyA tail is also included. Splice junctions are marked in turquoise. Fragments Per Kilobase of transcript per Million mapped reads. *NRC =  Normalized Read Count calculated from transcript length (x) as NRC =  read_count*(1+e-0.0027638x). Ranking of protein coding transcripts only.

Mapping of rRNA-depleted total RNA (Samples S1, S2 and S3)

The three barcoded rRNA-depleted total RNA libraries (S1,S2 and S3) resulted in 153 million pass filter strand-specific read pairs (QC data in Fig. S1 in File S1) which were mapped to the human reference genome (GRCh37/hg19) using TopHat. The uniquely mapped read localizations on the different chromosomes are shown in Table 1. The aligned sequencing reads were assigned to the Homo_sapiens.GRCh37.71.gtf features as described above. Top 30 loci are shown in Table 4. A full table of HTSeq counts is presented in Table S1 in File S1. The biological coefficient of variation as estimated by the edgeR software (http://www.bioconductor.org/) is shown in figure 6. There was a linear dependence between FPKM (Fragments Per Kilobase of transcript per Million mapped reads)-values in samples S1, S2 and S3. Figure 7 shows a pair-wise comparison of S1 (male) and S2 (female) rendering a Pearson's correlation coefficient of 0.99. These results were confirmed by de novo assembly using the Trinity software (Table 3b).
Table 4

TopHat/Cufflinks alignment of rRNA-depleted total RNA to genome (excluding ncrna).

Ensembl id.GeneLocusS1_FPKM S2_FPKM S3_FPKM Rank
ENSG00000205542TMSB4XX:12993226–129953463497328506461201
ENSG00000163736PPBP4:74852754–748539142548923607378322
ENSG00000198804MT-CO1MT:5903–7445235943504527213MT
ENSG00000198888MT-ND1MT:3306–4262170872464018055MT
ENSG00000198938MT-CO3MT:8365–9990164152271515566MT
ENSG00000198840MT-ND3MT:10058–10404152732280514332MT
ENSG00000198886MT-ND4MT:10469–1213714039224679924MT
ENSG00000198899MT-ATP6MT:8365–9990126431560811442MT
ENSG00000198727MT-CYBMT:14746–15887130171564510847MT
ENSG00000166710B2M15:45003674–45011075939411022164843
ENSG00000212907MT-ND4LMT:10469–121379394184698991MT
ENSG00000198786MT-ND5MT:12336–1414810900165188191MT
ENSG00000198712MT-CO2MT:7585–826911460154238156MT
ENSG00000198763MT-ND2MT:4469–551111304165066650MT
ENSG00000228253MT-ATP8MT:8365–999012611107929831MT
ENSG00000163737PF44:74844540–748487965352593399904
ENSG00000228474OST42:27265231–272946417326488262685
ENSG00000180573HIST1H2AC6:26124372–261393445539645836356
ENSG00000150681RGS181:192127586–1921549454310621928417
ENSG00000075624ACTB7:5566781–56034153375254831998
ENSG00000167996FTH111:61717292–617351323044245924139
ENSG00000198695MT-ND6MT:14148–14673274131901528MT
ENSG00000127920GNG117:93220884–9356779123691585285010
ENSG00000168497SDPR2:192699027–19306043518092832204011
ENSG00000154146NRGN11:124609809–12463639228071574215412
ENSG00000101608MYL12A18:3247478–326184827191959175513
ENSG00000180596HIST1H2BC6:26115100–2612415428971822152514
ENSG00000104904OAZ119:2252251–227348720031762195415
ENSG00000163041H3F3A1:226249551–22625970219101233131416
ENSG00000161570CCL517:34195970–3421286714371760123617

Fragments Per Kilobase of transcript per Million mapped reads.

Figure 6

Biological coefficient of variation of samples S1, S2 and S3 as estimated by TopHat/HTSeq/edgeR software.

As expected the more highly expressed genes show much lower dispersion estimates than the mean value. “CPM” represents counts per million.

Figure 7

Plot showing the magnitude of FPKM gene expression in rRNA-depleted total RNA in pair-wise comparisons between sample S1 and sample S2.

Each dot represents a S1/S2 pair for a gene that had detectable expression in both samples. Pearson's correlation coefficient  = 0.99. (TopHat/Cufflinks/Cuffdiff/CummeRbund software).

Biological coefficient of variation of samples S1, S2 and S3 as estimated by TopHat/HTSeq/edgeR software.

As expected the more highly expressed genes show much lower dispersion estimates than the mean value. “CPM” represents counts per million.

Plot showing the magnitude of FPKM gene expression in rRNA-depleted total RNA in pair-wise comparisons between sample S1 and sample S2.

Each dot represents a S1/S2 pair for a gene that had detectable expression in both samples. Pearson's correlation coefficient  = 0.99. (TopHat/Cufflinks/Cuffdiff/CummeRbund software). Fragments Per Kilobase of transcript per Million mapped reads. Further analyses to reveal differential expression (DE) were performed with Cufflinks and the bioinformatic tools HTseq and DESeq from Bioconductor (http://www.bioconductor.org/), which uses the R statistical programming language. Figure 8 shows dispersion and log2 fold change when comparing the two male samples S1 and S3 with the female sample S2 using DESeq. Eighteen differentially expressed genes were identified between the two sexes at 10% false discovery rate (FDR) using DESeq (Fig. 8, red dots). Not all of these genes were located on the Y chromosome (Table 5.).
Figure 8

Graphs showing the dispersion and log2 fold change, respectively, when comparing the two male samples S1 and S3 with the female sample S2 using DESeq.

The “dispersion” on the y-axis in the left-hand plot represents the square of the coefficient of biological variation, and the red “hockey-stick” line is a fitted curve through the estimates of the dispersion value for each gene. In the right-hand plot, the horizontal red line represents equal expression in male and female samples. Red dots represent differentially expressed genes at 10% FDR, and red triangles represent red dots that lie outside the graph (above or below). The identity of the differentially expressed genes and the corrresponding log2 fold changes can be found in Table 5 (columns 2 and 8, respectively).

Table 5

Significantly differentially expressed genes in male and female platelets at 10% FDR as estimated by DESeq.

Ensembl id.genelocusbaseMeanbaseMeanAbaseMeanBFC* log2 FC* pval padj§
ENSG00000183878UTYY:15360259–155925531511.72265.73.50.002−9.37.6E-271.4E-22
ENSG00000198692EIF1AYY:22737611–22755040618.0925.33.50.004−8.04.4E-194.0E-15
ENSG00000210082MT-RNR2MT:1671–3229160159.650966.7378545.57.4272.93.5E-132.2E-09
ENSG00000116117PARD3B2:205410516–206484886843.0153.22222.514.513.91.2E-125.7E-09
ENSG00000154620TMSB4YY:15815447–158179041407.22107.95.60.003−8.51.9E-126.9E-09
ENSG00000196565HBG211:5274420–5667019635.7908.091.00.100−3.34.1E-101.3E-06
ENSG00000067048DDX3YY:15016019–15032390884.41323.85.60.004−7.91.2E-093.2E-06
ENSG00000100362PVALB22:37196728–37215523209.1296.035.30.119−3.17.0E-091.6E-05
ENSG00000113658SMAD55:135468534–1355244351050.2384.32382.06.1982.69.4E-091.9E-05
ENSG00000135426TESPA112:55341802–55378530321.059.3844.614.253.81.7E-083.2E-05
ENSG00000077984CST720:24929866–24940564140.5208.93.50.017−5.92.6E-084.4E-05
ENSG00000118946PCDH1713:58205944–5830344574.20.88220.8251.48.06.0E-079.2E-04
ENSG00000248527MTATP6P11:569076–5697567122.73712.013944.13.7561.91.2E-061.7E-03
ENSG00000012817KDM5DY:21865751–21906825149.8224.40.710.003−8.31.5E-062.0E-03
ENSG00000114374USP9YY:14813160–14972764142.5213.40.710.003−8.22.4E-062.9E-03
ENSG00000185736ADARB210:1228073–177967079.2118.80.000.000-Inf1.53E-051.7E-02
ENSG00000229308AC010084.1Y:3904538–3968361340.5492.3536.70.075−3.751.47E-051.7E-02
ENSG00000240356RPL23AP72:114368079–1143846676531.08750.852091.00.239−2.071.62E-051.7E-02

*Fold change;

P-value;

Adjusted P-value.

Graphs showing the dispersion and log2 fold change, respectively, when comparing the two male samples S1 and S3 with the female sample S2 using DESeq.

The “dispersion” on the y-axis in the left-hand plot represents the square of the coefficient of biological variation, and the red “hockey-stick” line is a fitted curve through the estimates of the dispersion value for each gene. In the right-hand plot, the horizontal red line represents equal expression in male and female samples. Red dots represent differentially expressed genes at 10% FDR, and red triangles represent red dots that lie outside the graph (above or below). The identity of the differentially expressed genes and the corrresponding log2 fold changes can be found in Table 5 (columns 2 and 8, respectively). *Fold change; P-value; Adjusted P-value.

Differential expression at the gene level in polyA + mRNA vs total RNA

Gene expression levels in total RNA samples are conventionally measured as RPKM (Reads Per Kilobase of transcript per Million mapped reads) or FPKM values assuming a rectangular distribution of reads covering the transcript coordinates, i.e. these measures are proportional to the number of reads divided by transcript lengths. The distribution of reads covering the transcript coordinates using oligo(dT) isolated mRNA is very different as it fits an exponential decay function from the 3′-end polyA site towards the 5′-end. (Fig. 3 and Fig. 4). This makes RPKM and FPKM estimates less appropriate for comparison of gene expression levels in polyA + mRNA. Consequently, both transcript lengths and library preparation method ought to be taken into account. Otherwise, false differences will emerge. Adjusted bedcoverage data for the most abundant transcript of each gene is presented in Table S3 in File S1 where columns S0, S1, S2, and S3 represent raw counts and columns S0_adj, and S1_adj to S3_adj represent Normalized Read Counts (NRC) and normalized FPKM figures, respectively (see Table 3a for definition of NRC used in this context). Table 3a demonstrates the fallaciousness of FPKM-values if used on poly(dT) selected transcripts. Figure 9 shows a heatmap of such normalized levels of expression for the 30 most highly expressed genes across the samples from the 4 different patients. Altogether circa 500 differentially expressed genes were identified at 10% FDR comparing mRNA vs. totRNA using DESeq (Fig. 10). A full table of mRNA vs. totRNA comparisons is provided in Table S4 in File S1. As expected, most of this “DE”, which primarily should represent preparation method and mapping artefacts, was observed for non-coding transcripts, which were not present in the polyA+ mRNA preparation, and mitochondrial rRNA transcripts which were more abundant in the polyA+ mRNA sample (Table 6). However, coding transcripts that lack a polyA-tail should also appear as differentially expressed.
Figure 9

Heatmap showing normalized levels of expression for the 30 most highly expressed gene transcripts across mRNA and rRNA-depleted total RNA samples from the 4 different patients.

Nearly all differences of intensity for a given gene are likely to represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. Sample names have a ‘C’ added to indicate that the intensities represent length- and method-adjusted counts (TopHat/bedtools/DESeq and “in-house” software).

Figure 10

Histogram of p-values from the call to negative binomial test with DESeq comparing the length- and method-adjusted counts of polyA + mRNA sample S0 with the rRNA-depleted total RNA samples S1, S2 and S3.

Most of the circa 500 remaining significant differences after length- and method-adjusted normalization presumably represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. However, protein coding transcripts lacking a polyA-tail should also appear as differentially expressed. Note that omission of the length- and method-adjusted normalization yields a couple of thousand “differentially expressed” genes (TopHat/bedtools/DESeq and “in-house” software).

Table 6

Significant DEφ among the most abundant transcripts in polyA+ mRNA versus rRNA-depleted total RNA.

Ensembl id.genelocusbaseMeanbaseMeanAbaseMeanBFC* log2 FC* pval padj§
ENSG00000210082MT-RNR2MT:1671–3229∶116931163668862482794670.004−7.94.0E-232.8E-20
ENSG00000266422RN7SL593P14:50053298–50053594∶1639298908523986InfInf5.7E-093.1E-07
ENSG00000258486RN7SL114:50053297–50053596∶1632905908438746InfInf5.8E-093.1E-07
ENSG00000211459MT-RNR1MT:648–1601∶1619478224607232572980.002−8.753.8E-274.0E-24
ENSG00000265150RN7SL214:50329271–50329567∶−1616592708221236InfInf3.2E-124.1E-10
ENSG00000198888MT-ND1MT:3307–4262∶1169033853361274750750.089−3.56.8E-061.9E-04
ENSG00000163737PF44:74846794–74847841∶−1137529038809895400570.139−2.91.7E-043.3E-03
ENSG00000198763MT-ND2MT:4470–5511∶1126925233756645671140.168−2.65.4E-048.8E-03
ENSG00000198712MT-CO2MT:7586–8269∶1119100830258445793960.191−2.41.3E-031.8E-02
ENSG00000228474OST42:27293340–27294641∶−189687522403524490500.2−2.31.7E-032.3E-02
ENSG00000198899MT-ATP6MT:8527–9207∶177046920466233450840.169−2.65.8E-049.4E-03
ENSG00000198886MT-ND4MT:10760–12137∶176878920516073411830.166−2.65.2E-048.6E-03
ENSG00000198938MT-CO3MT:9207–9990∶172857816536164202320.254−2.06.4E-036.8E-02
ENSG00000187514PTMA2:232571605–232578251∶163489320866091509880.072−3.81.5E-065.0E-05
ENSG00000198786MT-ND5MT:12337–14148∶160568015895482777240.175−2.57.2E-041.1E-02
ENSG00000263900AC006483.17:5567734–5567817∶−15063470675129InfInf5.6E-275.0E-24
ENSG00000210195MT-TTMT:15888–15953∶1466133823745940537.212.91.1E-031.6E-02
ENSG00000210049MT-TFMT:577–647∶14647264264160542014.203.82.6E-056.2E-04
ENSG00000161570CCL517:34198495–34207797∶-139295310500801739110.166−2.65.5E-049.0E-03
ENSG00000198695MT-ND6MT:14149–14673∶−13907658979402217070.247−2.05.5E-036.0E-02
ENSG00000209082MT-TL1MT:3230–3304∶13379497744741924410.248−2.05.9E-036.4E-02
ENSG00000140264SERF215:44069285–44094787∶1252646889675403030.045−4.52.8E-081.3E-06
ENSG00000156265MAP3K7CL21:30449792–30548210∶12403700320494InfInf1.4E-163.8E-14
ENSG00000210196MT-TPMT:15956–16023∶−1238130411123038027.392.99.9E-041.5E-02
ENSG00000087086FTL19:49468558–49470135∶1167208518399501450.097−3.41.7E-054.2E-04
ENSG00000210077MT-TVMT:1602–1670∶1136962329425728070.221−2.23.3E-033.9E-02
ENSG00000142669SH3BGRL31:26605667–26608007∶1136382503776139170.028−5.22.3E-101.9E-08
ENSG00000169756LIMS12:109150857–109303702∶1124932283811719720.254−2.07.1E-037.4E-02
ENSG00000101608MYL12A18:3247479–3256234∶1122263326116193049.665.62.5E-067.9E-05
ENSG00000248527MTATP6P11:569076–569756∶1122224401658290790.072−3.82.0E-066.2E-05

Differential Expression;

*Fold change;

P-value; §Adjusted P-value.

Heatmap showing normalized levels of expression for the 30 most highly expressed gene transcripts across mRNA and rRNA-depleted total RNA samples from the 4 different patients.

Nearly all differences of intensity for a given gene are likely to represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. Sample names have a ‘C’ added to indicate that the intensities represent length- and method-adjusted counts (TopHat/bedtools/DESeq and “in-house” software).

Histogram of p-values from the call to negative binomial test with DESeq comparing the length- and method-adjusted counts of polyA + mRNA sample S0 with the rRNA-depleted total RNA samples S1, S2 and S3.

Most of the circa 500 remaining significant differences after length- and method-adjusted normalization presumably represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. However, protein coding transcripts lacking a polyA-tail should also appear as differentially expressed. Note that omission of the length- and method-adjusted normalization yields a couple of thousand “differentially expressed” genes (TopHat/bedtools/DESeq and “in-house” software). Differential Expression; *Fold change; P-value; §Adjusted P-value.

The platelet transcriptome

The platelet transcriptome data was then compared with RNASeq data from Illumina's Human BodyMap 2.0 project. The Illumina data, generated on HiSeq 2000 instruments, consists of 16 human tissue types, including adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells. The heatmap in figure 11 summarizes expression for this data integrated with the platelet raw data counts obtained with the HTSeq-counts program. The dendrogram at the top clearly shows that the platelet expression profile is unique because the samples S0,S1,S2 and S3 forms a cluster of its own from root level. As expected, the polyA+ mRNA sample S0 profile shows some DE when compared with the rRNA-depleted total RNA samples S1, S2, and S3. Thus, the present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of many mitochondrially encoded genes. Apart from MT-RNR1, MT-RNR2 and MT-TF, mitochondrially encoded gene expression levels were rather similar in totRNA and mRNA preparations (Fig. 12 and Table 7).
Figure 11

The platelet transcriptome data compared with RNASeq data from Illumina's Human BodyMap 2.0 project.

The integrated platelet data from samples S0, S1, S2, and S3 represent counts obtained with TopHat, Ensembl annotations, and the HTSeq-counts program. The Illumina codes are as follows. ERS025098 adipose, ERS025092 adrenal, ERS025085 brain, ERS025088 breast, ERS025089 colon, ERS025082 heart, ERS025081 kidney, ERS025096 liver, ERS025099 lung, ERS025086 lymphnode, ERS025084 mixture, ERS025087 mixture, ERS025093 mixture, ERS025083 ovary, ERS025095 prostate, ERS025097 skeletal_muscle, ERS025094 testes, ERS025090 thyroid, ERS025091 white_blood_cell.

Figure 12

Differential expression of mitochondrial (MT)-genes in total RNA vs mRNA preparations.

The figure shows that apart from MT-RNR1, MT-RNR2 and MT-TF, mitochondrially encoded gene expression levels were rather similar in rRNA-depleted total RNA and polyA + mRNA preparations (TopHat/HTSeq/edgeR software). “FC” denotes fold change whereas “CPM” represents counts per million.

Table 7

Read count table for mitochondrially encoded genes for samples S0, S1, S2 and S3.

Ensembl id.genelocuslengthS0S1S2S3
ENSG00000210049MT-TFMT:577–647∶1714716721944598442484
ENSG00000211459MT-RNR1MT:648–1601∶19544626427987755471951139
ENSG00000210077MT-TVMT:1602–1670∶169363401039968561215
ENSG00000210082MT-RNR2MT:1671–3229∶11559132968852505721137529153605
ENSG00000209082MT-TL1MT:3230–3304∶17586075209171370619721
ENSG00000198888MT-ND1MT:3307–4262∶19561003620480131557853659073
ENSG00000210100MT-TIMT:4263–4331∶1691953821026113197338
ENSG00000210107MT-TQMT:4329–4400∶−17226650315513716727844
ENSG00000210112MT-TMMT:4402–4469∶16853547407805384459771
ENSG00000198763MT-ND2MT:4470–5511∶11042643951625223928642622441
ENSG00000210117MT-TWMT:5512–5579∶16873551999981613780
ENSG00000210127MT-TAMT:5587–5655∶−16968251347858403956
ENSG00000210135MT-TNMT:5657–5729∶−17384571192344494815
ENSG00000210140MT-TCMT:5761–5826∶−16612786774637903592
ENSG00000210144MT-TYMT:5826–5891∶−1668401896748964065
ENSG00000198804MT-CO1MT:5904–7445∶11542414158125794316694531633848
ENSG00000210151MT-TS1MT:7446–7514∶−1693250712569110109894
ENSG00000210154MT-TDMT:7518–7585∶1687787704565044022
ENSG00000198712MT-CO2MT:7586–8269∶1684529645397862485662598293
ENSG00000210156MT-TKMT:8295–8364∶17016629143881323215576
ENSG00000228253MT-ATP8MT:8366–8572∶120761948663537210965706
ENSG00000198899MT-ATP6MT:8527–9207∶1681357851277472294047304636
ENSG00000198938MT-CO3MT:9207–9990∶1784298920387019407105435081
ENSG00000210164MT-TGMT:9991–10058∶1686289112691327511525
ENSG00000198840MT-ND3MT:10059–10404∶13468637892634121847146128
ENSG00000210174MT-TRMT:10405–10469∶1658430113411724919014
ENSG00000212907MT-ND4LMT:10470–10766∶1297111257118428203231207225
ENSG00000198886MT-ND4MT:10760–12137∶11378404373502675664748575461
ENSG00000210176MT-THMT:12138–12206∶169102721495178987056
ENSG00000210184MT-TS2MT:12207–12265∶15986416673771313231
ENSG00000210191MT-TL2MT:12266–12336∶17180138112920912015
ENSG00000198786MT-ND5MT:12337–14148∶11812318124585691707224571473
ENSG00000198695MT-ND6MT:14149–14673∶−152514656516723417537485557
ENSG00000210194MT-TEMT:14674–14742∶−1697417140891508113333
ENSG00000198727MT-CYBMT:14747–15887∶11141220495447899453336452116
ENSG00000210195MT-TTMT:15888–15953∶1669053520844561748793
ENSG00000210196MT-TPMT:15956–16023684530312232171724467
Sum22910855619863582973966919289
Sum without rRNA4987543584928871051486714545

The platelet transcriptome data compared with RNASeq data from Illumina's Human BodyMap 2.0 project.

The integrated platelet data from samples S0, S1, S2, and S3 represent counts obtained with TopHat, Ensembl annotations, and the HTSeq-counts program. The Illumina codes are as follows. ERS025098 adipose, ERS025092 adrenal, ERS025085 brain, ERS025088 breast, ERS025089 colon, ERS025082 heart, ERS025081 kidney, ERS025096 liver, ERS025099 lung, ERS025086 lymphnode, ERS025084 mixture, ERS025087 mixture, ERS025093 mixture, ERS025083 ovary, ERS025095 prostate, ERS025097 skeletal_muscle, ERS025094 testes, ERS025090 thyroid, ERS025091 white_blood_cell.

Differential expression of mitochondrial (MT)-genes in total RNA vs mRNA preparations.

The figure shows that apart from MT-RNR1, MT-RNR2 and MT-TF, mitochondrially encoded gene expression levels were rather similar in rRNA-depleted total RNA and polyA + mRNA preparations (TopHat/HTSeq/edgeR software). “FC” denotes fold change whereas “CPM” represents counts per million. As shown in Figure 11 transcripts from some nuclear genes, particularly TMSB4X, were also more abundant in human platelets as compared to the other cells and tissues. TMSB4X plays a role in regulation of actin polymerization, and is involved in cell proliferation, migration, and differentiation [18]. Furthermore, several genes involved in signal transduction, including chemokines were also abundantly expressed, particularly PPBP. The protein encoded by this gene is a platelet-derived growth factor that belongs to the CXC chemokine family, and is a potent chemoattractant for neutrophils [19]. B2M (beta-2-microglogulin gene) encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells [20]. The PF4 chemokine is released from the alpha granules of activated platelets in the form of a homotetramer, which has high affinity for heparin and is involved in platelet aggregation [21]. ACTB is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins [22]. The full table of platelet RNASeq data integrated in Illumina's Human BodyMap 2.0 project is available in Table S5 in File S1.

Functional classification of platelet coding transcripts

We used the web-based PANTHER software (http://www.pantherdb.org/about.jsp) [23] to classify proteins coded by the top 50 platelet genes using either polyA+ or rRNA-depleted total RNA transcripts mapped against the reference genome. The corresponding genes were grouped into clusters representing gene ontology (GO) categories of molecular functions (Fig. 13). A major finding with this analysis was that the molecular function groups of the top 50 platelet genes for polyA+ enriched RNA (Fig. 13A) correlated remarkably well with those of rRNA-depleted total RNA (Fig. 13B) despite the two distinct approaches and different donors. Among the molecular function GO groups shown in Figure 13, the category binding (GO:0005488) seems to dominate in each top 50 list. As shown in Table 8 most of the genes in this category belong to the protein binding subgroup, a class that is expected to play a prominent role in platelet functions. Another noticeable category is the structural molecule activity group. This category entails structural constituents of the cytoskeleton, and critical functions concerning cell motility and organization.
Figure 13

Classification of the proteins coded by the most abundant (top 50) coding transcripts of human platelets.

Bars represent molecular function categories generated by the PANTHER gene ontology classification web-based tool. A) Sequencing was performed on polyA+ enriched RNA, whereas in B) rRNA-depleted total RNA was analyzed.

Table 8

The function of the proteins coded by top 50 platelet genes, as provided by PANTHER gene ontology classification web-based tool.

Nr.Molecular function category (GO term)Sub category (GO term)Number of genes
1Antioxidant activity (GO:0016209)n.a.*
2Binding (GO:0005488)Calcium ion binding (GO:0005509)2
Nucleic acid binding (GO:0003676)6
Protein binding (GO:0005515)14
3Catalytic activity (GO:0003824)Hydrolase activity (GO:0016787)2
Ligase activity (GO:0016874)2
Oxidoreductase activity (GO:0016491)1
Transferase activity (GO:0016740)1
4Enzyme regulator activity (GO:0030234)Enzyme activator activity (GO:0008047)1
Enzyme inhibitor activity (GO:0004857)2
Kinase regulator activity (GO:0019207)1
Small GTPase regulator activity (GO:0005083)4
5Receptor activity (GO:0004872)n.a.*
6Structural molecule activity (GO:0005198)Structural constituent of cytoskeleton (GO:0005200)9
7Transcription regulator activity (GO:0030528)Transcription cofactor activity (GO:0003712)1
Transcription factor activity (GO:0003700)4
8Transporter activity (GO:0005215)n.a.*

*Not available because of too few genes.

Classification of the proteins coded by the most abundant (top 50) coding transcripts of human platelets.

Bars represent molecular function categories generated by the PANTHER gene ontology classification web-based tool. A) Sequencing was performed on polyA+ enriched RNA, whereas in B) rRNA-depleted total RNA was analyzed. *Not available because of too few genes.

Discussion

In the present study we have compared results of RNA-Seq mapping of polyA+ transcripts in purified blood platelets with those obtained with rRNA-depleted total RNA from healthy blood donors against the set of chromosomes of the Human Feb. 2009 (GRCh37/hg19) assembly (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/). Based on four individuals, the present data show an apparently unique transcriptome profile as compared with other tissues of the Illumina bodymap 2.0 project. In a typical RNA-Seq experiment, reads are sampled from RNA extracts and either mapped back to a reference genome or used for de novo assembly. Alignment and assembly of short or inaccurate reads poses a problem, which we have avoided by using 100 bp high quality Illumina reads. How closely the cDNA sequencing reflects the original RNA population is supposedly mainly determined in the library preparation step. As expected, our mapping of polyA+ reads showed a substantial bias for the 3′-end of gene transcripts due to the selection of mRNA using oligo-dT during the RNA extraction procedure and the following cDNA preparation step [24]. This 3′-UTR bias follows an exponential decay function. After length correction of coverage figures using that function for mRNA and FPKM-values for total RNA, we obtained a reasonably good agreement between quantitative estimates from mapping of polyA+ mRNA and rRNA-depleted total RNA reads to the human genome GRCh37/hg19. It is a notoriously difficult problem to assign reads to a particular isoform if there are many transcript variants with overlaps between them. Very high coverage figures are needed for satisfactory results. This is one of the reasons why RNA-Seq with low coverage has many of the same limitations as other RNA expression analysis pipelines. Obviously, mapping of reads against the human genome and also mapping against the human exome both rely on the accuracy of gene and transcript annotations. In order to fully characterize the platelet transcriptome without reference to previous results, including the possibility to detect and fully characterize novel transcripts, we also performed a de novo assembly of transcripts using Trinity RNA-Seq software (http://trinityrnaseq.sourceforge.net/). This software will extract full-length transcripts for alternatively spliced isoforms based on the generation and analysis of de Bruijn graphs. RSEM software with the bowtie aligner (http://bowtie-bio.sourceforge.net/) was used for mapping the RNA-Seq reads back to the reported transcripts for abundance estimation. Identification of the transcripts was achieved by Blat and BLAST searches using the UCSC Browser and the NCBI Genome databases, respectively. These data fully supported our results obtained by mapping the reads to the human genome and exome, respectively, using gtf.guided assembly. However, even if transcript abundance figures agreed only the most abundant transcripts could be reliably reconstructed by the de novo assembly approach; presumably due to insufficient amounts of reads that were available. When we started this study there was no published RNA-Seq data on platelet gene expression although microarray based as well as SAGE and real-time PCR methods have been used in the past. However, two studies using RNA-Seq by NGS were published during the progress of this study. One of these studies was reported by Rowley et al. who used polyA+ enriched RNA to characterize the transcriptomes of human and mouse platelets [17]. In contrast, Bray et al. utilized rRNA-depleted total RNA and found that their data correlated with those of previously reported microarray transcriptome data at least for the well-expressed mRNAs [16]. Both studies used relatively short sequence reads (≤50 bp for Rowley et al. and ≤40 bp for Bray et al.). The present study employed different strategies for library preparation in addition to the longer (100 bp) read length used for mapping. It is thus expected that there might exist some discrepancies between the current and the previously reported platelet transcript data. A notable difference is the “missed” NGRN transcripts (i.e. an at least tenfold lower amount) in our study when compared to the data of Rowley et al., which possibly could be due to differences of the sample preparation method. However, it should also be kept in mind that when adopting available NGS software for the RNA-seq analyses even small changes in parameter settings can produce a remarkably different result. We used settings of the Bowtie program allowing only 2 mismatches when aligning 100 bp reads to the reference sequence. Context sequencing errors (CSE) that are supposedly specific for the sequencing platform could obviously affect the read counts under such circumstances but a >10-fold reduction seems unlikely because reads from the reverse strand in the mRNA sample S0 should not have been affected to the same extent. One could also speculate whether RNA editing might influence the mapping of our platelet RNA transcripts. Adenosine to inosine (A>I) RNA editing occurs widely across the human transcriptome in certain tissues, especially in the brain [25]. Although there is no data available regarding RNA editing in platelets, we cannot exclude that possibility. However, RNA editing of protein-coding regions appears to be relatively rare events, and may thus have had limited impact on the mapping of cDNA from platelet transcripts. The relative frequency of reads mapping uniquely to genes involved in platelet function and our molecular function protein classification by PANTHER software is consistent with but does not prove the notion that at least some mRNAs in platelets are not merely remnants from the megakaryocytes without function, but rather reflect an important role of mRNA in the physiological function of platelets. In this regard, it is not surprising that many of the proteins coded by top 50 platelet genes represent key platelet functions such as structural constituent of cytoskeleton, protein binding, calcium binding and signal transduction. On the other hand transcripts of some genes encoding prominent platelet receptors were missing or present with few sequence reads, suggesting that no further synthesis of such proteins is needed after platelet formation in the bone marrow. We also searched for transcript signal for tissue factor (TF) since this protein's eventual presence and function in platelets has been debated for years. However, we could not detect any transcripts encoding TF. Interestingly, Schwertz et al. reported that resting platelets contain TF pre-mRNA that, upon activation, is spliced into mature mRNA, indicating that only activated platelets express mature TF mRNA transcripts [26]. Simultaneously, we have confirmed the dominant frequency of mitochondrially expressed genes comprising the platelet mRNA pool. Specifically in our polyA+ mRNA study, 22,416,906 out of 35,322,009 uniquely mapped reads represent MT-transcripts, apparently related to persistent MT-transcription in the absence of nuclear-derived transcription. This is not unexpected as platelets are metabolically adapted to rapidly expend large amounts of energy required for aggregation, granule release, and clot retraction.

Conclusions

This study demonstrates that human platelets carry a unique signature of well-defined and highly abundant coding transcripts that are expressed at similar levels among individuals. However, the in vivo functional significance of nuclearly encoded platelet mRNAs remains to be shown. Future studies need to focus on establishing the biological and biochemical functions of the identified genes in the physiological and pathological regulation of platelets. The desired end point would be to define a platelet mRNA profile that is directly associated with athero-thrombotic disease, which could eventually lead to the identification of novel targets for anti-thrombotic agents.

Methods

Ethical statement

The Regional Ethical Review Board in Linköping (EPN; http://www.epn.se/start/startpage.aspx; Linköping, Sweden) granted an ethical permission for this study (permission number M74-09). Informed written consent was obtained from all participants involved in this study.

Platelet preparation

Non-irradiated apheresis platelets were collected from healthy blood donors. Platelets were collected by COBE Spectra system (Gambro BCT) as previously described [27] and were used on the same collection day. Residual leukocytes were depleted with anti-CD45 conjugated Dynabeads, according to the manufacturer's recommendations (Pan Leukocyte; Invitrogen, Carlsbad, CA). The platelet suspension, with a volume of 30 mL and a platelet count of 1.4×109 cells/mL, was centrifuged at 800 g for 8 min and the supernatant was discarded. The platelet pellet was processed for leukocyte depletion, as recommended by the supplier of the Pan Leukocyte reagent (Invitrogen, Carlsbad, CA). Since leukocytes possess magnitudes of order more mRNA than platelets, the purification of platelets is a pivotal step. The leukocyte removal was performed at room temperature. Approximately 70–75% of the original platelets were recovered after the leukocyte depletion. To investigate the level of leukocyte contamination, we determined the level of CD45 (PTPRC) transcript in multiple platelet preparations (n = 6) by qPCR using TaqMan Gene Expression Assay for this gene according to the recommendations of the supplier (Assay ID: Hs00894713_m1; Applied Bioystems, Carlsbad, CA, USA).

RNA extraction and cDNA synthesis

The different strategies used for RNA-extraction, library preparation, sequencing and mapping are graphically depicted in Figure 1. For isolation of total RNA, we employed the miRVana RNA Extraction Kit as recommended by the supplier (Life Technologies). Isolation of polyA+ mRNA and synthesis of cDNA were performed by the method described by Rox et al. [22], with the exception that we used Smarter PCR cDNA Synthesis kit for the cDNA synthesis (Clontech, Mountain View, CA, USA). Briefly, the leukocyte-depleted platelet suspension was centrifuged at 1000 g for 10 min and the supernatant was discarded. PolyA+ mRNA was isolated from the platelet cell pellet by using Dynabeads Oligo(dT)25 according to the instruction of the manufacturer (Invitrogen, Carlsbad, CA, USA). To synthesize the first-strand cDNA, 3.5 µL of polyA+ mRNA was combined with 1 µL of 3′-Smart CDS primer II A (12 µM). After mixing the tube, the sample was incubated at 72 C for 3 min and then at 42 C for 2 min. This was followed by the addition of a master mix containing 2 µL of 5 First-Strand Buffer, 0.25 µL DTT (100 mM), 1 µL dNTP mix (10 mM), 1 µL SMARTer IIA oligonucleotide (12 µM), 0.25 µL RNase inhibitor, and 1 µL of SMARTScribe Reverse Transcriptase (100 U). The reverse transcription was run by incubating the tube at 42 C for 1 h before the reaction was terminated at 70 C for 10 min. The sample was then diluted with 90 µL of TE-buffer (10 nM Tris, 1 nM EDTA, pH 8.0). To run Long-Distance PCR, 10 µL of the diluted and reverse-transcribed platelet cDNA was mixed with 10 µL 10 Advantage 2 PCR buffer, 2 µL 50 dNTP mix (10 mM), 2 µL 5′PCR primer IIA (12 µM), 2 µL 50 Advantage 2 polymerase mix, and 74 µL of deionized water to a final volume of 100 µL. The sample was then incubated in a thermal cycler running a PCR program containing 95 C for 1 min, and then 20 cycles of 95 C for 15 s, 65 C for 30 s, and 68 C for 3 min. The synthesized platelet cDNA was purified with QIAquick PCR purification kit according to the manufacturer's instructions (Qiagen, Hilden, Germany), and the amount of cDNA was estimated on a Nanodrop spectrophotometer (ND1000; Saveen & Werner, Limhamn, Sweden).

Illumina HiSeq2000 sequencing

The cDNA obtained from the platelet polyA+ mRNA sample was shotgun sequenced (1×100 bp single read module) with the Illumina HiSeq 2000® instrument (Illumina, San Diego, CA, USA) by using a customer sequencing service (Eurofins MWG Operon, Ebersberg, Germany) which also included nebulization and end repair of cDNA, ligation of adaptors, gel purification, PCR amplification and library purification. The number of raw sequencing reads was 65,111,491. Filtering to remove bad quality bases and reads resulted in 58,155,680 reads (89.3%). These sequences were then mapped against the set of chromosomes of the Human Feb. 2009 (GRCh37/hg19) assembly. Initially, the mapping was conducted using the software TopHat 1.2.0 (http://tophat.cbcb.umd.edu). The post-processing of the mapping results was conducted using SamTools 0.1.12 a (http://samtools.sourceforge.net) and custom made Ruby 1.8.7 software. Bowtie http://bowtie-bio.sourceforge.net/ and bwa http://bio-bwa.sourceforge.net/ were used for aligning to de novo assembled transcripts and RefSeq mRNA respectively. RNA samples from three platelet donors were prepared for total RNA sequencing. For these samples, ribosomal RNA was depleted with Ribo-Zero (Epicentre, Madison, WI, USA) and strand specific barcoded RNA-sequencing libraries were prepared using ScriptSeq v2 (Epicentre) according to manufacturers instructions. The barcoded libraries were run on a single lane paired end 100 bp on a HiSeq2000® (Illumina, San Diego, CA, USA), which resulted in 153 million pass filter read pairs. QC data can be found in Figure S1 in File S1. The TopHat2 software was used with the bowtie aligner.

Submission of the sequencing data to public repository

The complete sequencing data is publicly available at The European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the accession numbers E-MTAB-715 (polyA+ transcripts) and E-MTAB-1846 (total RNA transcripts). Both accession numbers are cross-referenced to one another.

Assembly of reads and bioinformatics

de novo assembly of transcripts was performed using the Trinity RNA-Seq software (http://trinityrnaseq.sourceforge.net). Contents: Table S1: HTSeq raw counts per gene in samples S0, S1, S2, and S3. Table S2: Mapping of S0 (poly(dT) selected transcripts) against RefSeq mRNA. Table S3: Length and method (NRC/FPKM) adjusted counts per gene as represented by the most abundant transcript in samples S0, S1, S2, and S3. Table S4: Differential expression of genes in polyA+ mRNA (A) versus rRNA-depleted total RNA (B). Table S5: The S0, S1, S2 and S3 RNA-Seq data integrated with Illumina's Human BodyMap 2.0 project raw data. Figure S1: QC report. (PDF) Click here for additional data file.
  27 in total

1.  Integration of proteomics and genomics in platelets: a profile of platelet proteins and platelet-specific genes.

Authors:  J P McRedmond; S D Park; D F Reilly; J A Coppinger; P B Maguire; D C Shields; D J Fitzgerald
Journal:  Mol Cell Proteomics       Date:  2003-11-25       Impact factor: 5.911

2.  Messenger RNA profiling of human platelets by microarray hybridization.

Authors:  Peter Bugert; Alex Dugrillon; Ayse Günaydin; Hermann Eichler; Harald Klüter
Journal:  Thromb Haemost       Date:  2003-10       Impact factor: 5.249

3.  PANTHER: a library of protein families and subfamilies indexed by function.

Authors:  Paul D Thomas; Michael J Campbell; Anish Kejariwal; Huaiyu Mi; Brian Karlak; Robin Daverman; Karen Diemer; Anushya Muruganujan; Apurva Narechania
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

4.  Analysis of SAGE data in human platelets: features of the transcriptome in an anucleate cell.

Authors:  Marcus Dittrich; Ingvild Birschmann; Julia Pfrang; Sabine Herterich; Albert Smolenski; Ulrich Walter; Thomas Dandekar
Journal:  Thromb Haemost       Date:  2006-04       Impact factor: 5.249

5.  Isolation and characterization of human blood platelet mRNA and construction of a cDNA library in lambda gt11. Confirmation of the platelet derivation by identification of GPIb coding mRNA and cloning of a GPIb coding cDNA insert.

Authors:  A N Wicki; A Walz; S N Gerber-Huber; R H Wenger; R Vornhagen; K J Clemetson
Journal:  Thromb Haemost       Date:  1989-06-30       Impact factor: 5.249

6.  Release of platelet factor 4 in vivo during intravascular coagulation and in thrombotic states.

Authors:  R Farbiszewski; S Niewiarowski; K Worowski; B Lipiński
Journal:  Thromb Diath Haemorrh       Date:  1968-07-31

7.  Gene expression analysis in platelets from a single donor: evaluation of a PCR-based amplification technique.

Authors:  Jutta Maria Rox; Peter Bugert; Jens Müller; Alexander Schorr; Peter Hanfland; Katharina Madlener; Harald Klüter; Bernd Pötzsch
Journal:  Clin Chem       Date:  2004-10-07       Impact factor: 8.327

8.  Biosynthesis of major platelet proteins in human blood platelets.

Authors:  N Kieffer; J Guichard; J P Farcet; W Vainchenker; J Breton-Gorius
Journal:  Eur J Biochem       Date:  1987-04-01

9.  Platelet-derived chemokines CXC chemokine ligand (CXCL)7, connective tissue-activating peptide III, and CXCL4 differentially affect and cross-regulate neutrophil adhesion and transendothelial migration.

Authors:  Birgit I Schenk; Frank Petersen; Hans-Dieter Flad; Ernst Brandt
Journal:  J Immunol       Date:  2002-09-01       Impact factor: 5.422

10.  The complex transcriptional landscape of the anucleate human platelet.

Authors:  Paul F Bray; Steven E McKenzie; Leonard C Edelstein; Srikanth Nagalla; Kathleen Delgrosso; Adam Ertel; Joan Kupper; Yi Jing; Eric Londin; Phillipe Loher; Huang-Wen Chen; Paolo Fortina; Isidore Rigoutsos
Journal:  BMC Genomics       Date:  2013-01-16       Impact factor: 3.969

View more
  30 in total

1.  Comparison of library construction kits for mRNA sequencing in the Illumina platform.

Authors:  Yong-Soo Park; Songmi Kim; Dong-Guk Park; Dong Hee Kim; Kyeong-Wook Yoon; Wonseok Shin; Kyudong Han
Journal:  Genes Genomics       Date:  2019-07-26       Impact factor: 1.839

2.  RNA sequencing and swarm intelligence-enhanced classification algorithm development for blood-based disease diagnostics using spliced blood platelet RNA.

Authors:  Myron G Best; Sjors G J G In 't Veld; Nik Sol; Thomas Wurdinger
Journal:  Nat Protoc       Date:  2019-03-20       Impact factor: 13.491

3.  Slowed decay of mRNAs enhances platelet specific translation.

Authors:  Eric W Mills; Rachel Green; Nicholas T Ingolia
Journal:  Blood       Date:  2017-02-17       Impact factor: 22.113

Review 4.  A tour through the transcriptional landscape of platelets.

Authors:  Sebastian Schubert; Andrew S Weyrich; Jesse W Rowley
Journal:  Blood       Date:  2014-06-05       Impact factor: 22.113

5.  Integrative Multi-omic Analysis of Human Platelet eQTLs Reveals Alternative Start Site in Mitofusin 2.

Authors:  Lukas M Simon; Edward S Chen; Leonard C Edelstein; Xianguo Kong; Seema Bhatlekar; Isidore Rigoutsos; Paul F Bray; Chad A Shaw
Journal:  Am J Hum Genet       Date:  2016-04-28       Impact factor: 11.025

6.  Characterization of the platelet transcriptome by RNA sequencing in patients with acute myocardial infarction.

Authors:  John D Eicher; Yoshiyuki Wakabayashi; Olga Vitseva; Nada Esa; Yanqin Yang; Jun Zhu; Jane E Freedman; David D McManus; Andrew D Johnson
Journal:  Platelets       Date:  2015-09-14       Impact factor: 3.862

7.  Discovery of PROTAC BCL-XL degraders as potent anticancer agents with low on-target platelet toxicity.

Authors:  Xuan Zhang; Dinesh Thummuri; Xingui Liu; Wanyi Hu; Peiyi Zhang; Sajid Khan; Yaxia Yuan; Daohong Zhou; Guangrong Zheng
Journal:  Eur J Med Chem       Date:  2020-02-27       Impact factor: 6.514

8.  Utilizing PROTAC technology to address the on-target platelet toxicity associated with inhibition of BCL-XL.

Authors:  Xuan Zhang; Dinesh Thummuri; Yonghan He; Xingui Liu; Peiyi Zhang; Daohong Zhou; Guangrong Zheng
Journal:  Chem Commun (Camb)       Date:  2019-12-05       Impact factor: 6.222

9.  Recent Selection Changes in Human Genes under Long-Term Balancing Selection.

Authors:  Cesare de Filippo; Felix M Key; Silvia Ghirotto; Andrea Benazzo; Juan R Meneu; Antje Weihmann; Genís Parra; Eric D Green; Aida M Andrés
Journal:  Mol Biol Evol       Date:  2016-02-01       Impact factor: 16.240

10.  Severe obesity and bariatric surgery alter the platelet mRNA profile.

Authors:  Sean P Heffron; Christian Marier; Manish Parikh; Edward A Fisher; Jeffrey S Berger
Journal:  Platelets       Date:  2018-11-02       Impact factor: 3.862

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.