Susanna Zucca, Stella Gagliardi, Cecilia Pandini, Luca Diamanti, Matteo Bordoni, Daisy Sproviero, Maddalena Arigoni, Martina Olivero, Orietta Pansarasa, Mauro Ceroni, Raffaele Calogero, Cristina Cereda.
Abstract
The metabolism of coding and long non-coding RNAs (lncRNAs) is emerging as a crucial factor in Amyotrophic Lateral Sclerosis (ALS) pathogenesis. In this work, we present a dataset obtained via Illumina RNA-seq analysis of Peripheral Blood Mononuclear Cells (PBMCs) from sporadic ALS patients, ALS patients carrying mutations in the FUS, TARDBP, SOD1 and VCP genes, and healthy controls. This dataset allows whole-transcriptome characterization of PBMC content, in terms of both coding and non-coding RNAs, so that the disease state can be compared with healthy controls for sporadic and mutated patients alike. Our dataset is a starting point for the comprehensive analysis of coding RNAs and lncRNAs in a tissue that is easy to collect, handle and store, and that proves to be a suitable model for RNA profiling in ALS.
Year: 2019 PMID: 30720798 PMCID: PMC6362931 DOI: 10.1038/sdata.2019.6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
RNA-seq profiling to evaluate differential gene expression. All samples had the same source (blood sample) and were processed with the following steps: PBMC isolation, RNA extraction and RNA-seq. All subjects included in this study were Italian. All ALS patients had spinal onset, and sample collection was performed after 6 months from disease onset.
| Sample | GEO | Sex | Age (years) | Age of onset (years) | Mutation |
|---|---|---|---|---|---|
| CTRL1 | GSE106443 | M | 48 | na | na |
| CTRL2 | GSE106443 | M | 60 | na | na |
| CTRL3 | GSE106443 | M | 68 | na | na |
| CTRL4 | GSE115259 | M | 38 | na | na |
| CTRL5 | GSE115259 | F | 36 | na | na |
| CTRL6 | GSE115259 | F | 40 | na | na |
| CTRL7 | GSE115259 | F | 40 | na | na |
| ALS-s1 | GSE106443 | M | 66 | 65 | na |
| ALS-s2 | GSE106443 | M | 63 | 61 | na |
| ALS-s3 | GSE106443 | F | 61 | 60 | na |
| ALS-s4 | GSE106443 | F | 70 | 68 | na |
| ALS-s5 | GSE106443 | M | 56 | 55 | na |
| ALS-s6 | GSE106443 | F | 55 | 53 | na |
| ALS-s7 | GSE106443 | F | 56 | 55 | na |
| ALS-s8 | GSE106443 | F | 83 | 73 | na |
| ALS-s9 | GSE106443 | M | 66 | 60 | na |
| ALS-s10 | GSE106443 | F | 58 | 58 | na |
| ALS-s11 | GSE106443 | M | 68 | 65 | na |
| ALS-s12 | GSE115259 | M | 86 | 84 | na |
| ALS-s13 | GSE115259 | F | 68 | 65 | na |
| ALS-s15 | GSE115259 | M | 71 | 70 | na |
| ALS-s16 | GSE115259 | F | 69 | 67 | na |
| FUS-m1 | GSE106443 | M | 49 | 48 | FUS R521C |
| FUS-m2 | GSE106443 | F | 50 | 49 | FUS P160R |
| FUS-m3 | GSE115259 | M | 57 | 56 | FUS G302A |
| SOD1-m1 | GSE106443 | M | 50 | 49 | SOD1 L106F |
| SOD1-m2 | GSE106443 | F | 74 | 73 | SOD1 G147S |
| SOD1-m3 | GSE115259 | F | 58 | 57 | SOD1 G93D |
| TARDBP-m1 | GSE106443 | F | 52 | 50 | TARDBP A382T |
| TARDBP-m2 | GSE106443 | M | 68 | 66 | TARDBP G357D |
| VCP-m1 | GSE115259 | M | 51 | 50 | VCP R191Q |
RNA-seq read statistics. All RNA samples used for RNA-seq analysis had RIN ≥ 8 and a 260/280 ratio > 1.5, as recommended in the manufacturer's protocol.
| Sample | Number of input reads | Average input read length (bp) | Uniquely mapped reads % | Number of reads mapped to multiple loci | % of reads mapped to multiple loci | Seq. batch |
|---|---|---|---|---|---|---|
| CTRL1 | 9.85E+07 | 2 × 75 | 22.68% | 5.55E+07 | 56.38% | 1 |
| CTRL2 | 8.51E+07 | 2 × 75 | 25.65% | 5.49E+07 | 64.53% | 1 |
| CTRL3 | 9.00E+07 | 2 × 75 | 26.99% | 5.53E+07 | 61.47% | 2 |
| CTRL4 | 2.00E+07 | 2 × 75 | 6.50% | 1.23E+07 | 61.53% | 2 |
| CTRL5 | 2.28E+08 | 2 × 75 | 11.02% | 1.71E+08 | 74.93% | 3 |
| CTRL6 | 1.05E+08 | 2 × 150 | 4.17% | 2.88E+07 | 27.54% | 4 |
| CTRL7 | 5.53E+07 | 2 × 150 | 2.96% | 1.77E+07 | 31.95% | 4 |
| ALS-s1 | 8.30E+07 | 2 × 75 | 26.69% | 3.96E+07 | 47.72% | 1 |
| ALS-s2 | 9.25E+07 | 2 × 75 | 27.52% | 5.73E+07 | 61.92% | 1 |
| ALS-s3 | 9.39E+07 | 2 × 75 | 33.88% | 5.23E+07 | 55.73% | 1 |
| ALS-s4 | 9.10E+07 | 2 × 75 | 37.71% | 2.56E+07 | 28.17% | 1 |
| ALS-s5 | 9.18E+07 | 2 × 75 | 31.32% | 5.51E+07 | 60.00% | 2 |
| ALS-s6 | 8.74E+07 | 2 × 75 | 32.43% | 5.29E+07 | 60.55% | 2 |
| ALS-s7 | 1.03E+08 | 2 × 75 | 17.58% | 7.60E+07 | 73.79% | 2 |
| ALS-s8 | 7.73E+07 | 2 × 75 | 28.06% | 4.71E+07 | 60.97% | 2 |
| ALS-s9 | 7.32E+07 | 2 × 75 | 24.14% | 3.69E+07 | 50.35% | 3 |
| ALS-s10 | 9.12E+07 | 2 × 75 | 14.76% | 5.79E+07 | 63.49% | 3 |
| ALS-s11 | 7.83E+07 | 2 × 75 | 34.72% | 4.48E+07 | 57.26% | 3 |
| ALS-s12 | 4.11E+07 | 2 × 150 | 7.12% | 2.12E+07 | 51.64% | 4 |
| ALS-s13 | 7.17E+07 | 2 × 150 | 17.87% | 3.76E+07 | 52.36% | 4 |
| ALS-s15 | 5.36E+07 | 2 × 150 | 10.09% | 3.52E+07 | 65.63% | 4 |
| ALS-s16 | 8.36E+07 | 2 × 150 | 7.68% | 5.53E+07 | 66.15% | 4 |
| FUS-m1 | 9.63E+07 | 2 × 150 | 22.49% | 6.84E+07 | 71.03% | 4 |
| FUS-m2 | 9.82E+07 | 2 × 75 | 13.62% | 7.33E+07 | 74.65% | 2 |
| FUS-m3 | 1.65E+06 | 2 × 150 | 10.43% | 1.28E+07 | 81.24% | 4 |
| SOD1-m1 | 1.14E+08 | 2 × 75 | 28.38% | 6.86E+07 | 60.15% | 3 |
| SOD1-m2 | 8.50E+07 | 2 × 75 | 31.92% | 4.88E+07 | 57.46% | 1 |
| SOD1-m3 | 3.51E+07 | 2 × 75 | 22.36% | 2.33E+07 | 66.42% | 3 |
| TARDBP-m1 | 1.16E+08 | 2 × 75 | 31.21% | 6.21E+07 | 53.46% | 3 |
| TARDBP-m2 | 7.94E+07 | 2 × 75 | 29.40% | 3.50E+07 | 44.14% | 3 |
| VCP-m1 | 2.98E+07 | 2 × 75 | 13.46% | 2.07E+07 | 69.66% | 1 |
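As a quick consistency check on the read statistics, the percentage of multi-mapped reads can be recomputed as the number of reads mapped to multiple loci divided by the number of input reads. The sketch below is illustrative only: the function names, the 0.5-point tolerance and the choice of four rows are assumptions made here, while the values are copied from the table.

```python
# Recompute "% of reads mapped to multiple loci" from the two count columns.
# Rows are copied from the read-statistics table; the function names and the
# tolerance are illustrative assumptions, not part of the original analysis.

ROWS = {
    # sample: (input reads, multi-mapped reads, reported % multi-mapped)
    "CTRL1": (9.85e7, 5.55e7, 56.38),
    "CTRL2": (8.51e7, 5.49e7, 64.53),
    "ALS-s4": (9.10e7, 2.56e7, 28.17),
    "FUS-m3": (1.65e6, 1.28e7, 81.24),
}

TOLERANCE = 0.5  # percentage points; counts are rounded to 3 significant digits


def multi_mapped_pct(input_reads, multi_mapped_reads):
    """Return multi-mapped reads as a percentage of input reads."""
    return 100.0 * multi_mapped_reads / input_reads


def inconsistent_rows(rows, tolerance=TOLERANCE):
    """List samples whose recomputed percentage differs from the reported one."""
    flagged = []
    for sample, (n_input, n_multi, reported) in rows.items():
        if abs(multi_mapped_pct(n_input, n_multi) - reported) > tolerance:
            flagged.append(sample)
    return flagged


if __name__ == "__main__":
    for sample, (n_input, n_multi, reported) in ROWS.items():
        recomputed = multi_mapped_pct(n_input, n_multi)
        print(f"{sample}: recomputed {recomputed:.2f}%, reported {reported:.2f}%")
    print("Rows to re-check:", inconsistent_rows(ROWS))
```

With the values as listed, CTRL1, CTRL2 and ALS-s4 agree with the reported percentages within rounding, while FUS-m3 is flagged because its listed number of input reads (1.65E+06) is smaller than its number of multi-mapped reads (1.28E+07); that row may be worth re-checking against the original publication.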
Figure 1. Quality assessment of FASTQ data.
Quality assessment of raw FASTQ sequence data for paired-end left (sample_name_R1) and right (sample_name_R2) reads. Box and whisker plots show the distribution of per-base quality at each position of the left and right reads for each of the analyzed samples. The mean value is indicated by the blue line, the yellow box represents the interquartile range (25–75%), and the lower and upper whiskers represent the 10% and 90% points, respectively. Plots were generated with the FastQC program (see Code Availability 1).
Figure 2. Experimental overview and evaluation of sample variance.
(a) Flowchart of the RNA-seq workflow and data analysis. (b) Estimate of the dispersion parameter for each gene. (c) Principal component analysis results. (d) Heatmap of the sample-to-sample distances, obtained with the DESeq2 package on regularized-logarithm transformed counts. The color code is reported above the heatmap.
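Panel (d) relies on distances between samples computed on transformed counts. The following minimal sketch illustrates the idea with a plain log2(count + 1) transform and Euclidean distance. This is a simplified stand-in, not DESeq2's regularized-logarithm transformation, and the toy counts, sample labels and function names are invented for illustration.

```python
import math

# Toy counts (one list of per-gene counts per sample); invented for illustration.
COUNTS = {
    "sample_A": [100, 0, 35, 7],
    "sample_B": [110, 1, 30, 9],
    "sample_C": [10, 50, 300, 0],
}


def log_transform(counts):
    """log2(count + 1): a simple variance-dampening transform (not rlog)."""
    return [math.log2(c + 1) for c in counts]


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(count_table):
    """Pairwise sample-to-sample distances on log-transformed counts."""
    transformed = {s: log_transform(c) for s, c in count_table.items()}
    return {
        (s1, s2): euclidean(transformed[s1], transformed[s2])
        for s1 in transformed
        for s2 in transformed
    }


if __name__ == "__main__":
    dist = distance_matrix(COUNTS)
    for (s1, s2), d in sorted(dist.items()):
        print(f"{s1} vs {s2}: {d:.3f}")
```

In this toy input, samples A and B have similar profiles and therefore end up closer to each other than either is to sample C, which is the kind of grouping a sample-to-sample heatmap makes visible.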
Figure 3. Detected transcripts per sample.
Number of detected transcripts per sample, shown separately for coding RNAs (dark grey bars) and lncRNAs (light grey bars). A transcript was considered detected if it was covered by at least 5 reads.
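The detection rule in this caption (a transcript is counted only when covered by at least 5 reads) can be sketched as follows. The input layout, the biotype labels `coding` and `lncRNA`, the transcript identifiers and the read counts are hypothetical; only the threshold of 5 reads comes from the caption.

```python
from collections import Counter

MIN_READS = 5  # threshold stated in the Figure 3 caption

# Hypothetical records for one sample: (transcript id, biotype, read count).
SAMPLE_COUNTS = [
    ("tx_001", "coding", 120),
    ("tx_002", "coding", 4),
    ("tx_003", "lncRNA", 5),
    ("tx_004", "lncRNA", 0),
    ("tx_005", "coding", 17),
]


def detected_per_biotype(records, min_reads=MIN_READS):
    """Count transcripts with at least `min_reads` reads, split by biotype."""
    detected = Counter()
    for _transcript_id, biotype, reads in records:
        if reads >= min_reads:
            detected[biotype] += 1
    return dict(detected)


if __name__ == "__main__":
    print(detected_per_biotype(SAMPLE_COUNTS))
```

Running the same function once per sample would give the two bar heights (coding and lncRNA) plotted for that sample.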