
RNA sequencing data of different grade astrocytoma cell lines.

Juliana Ferreira de Sousa1, Patrick da Silva2, Rodolfo Bortolozo Serafim2,3, Ricardo Perecin Nociti3,4, Cristiano Gallina Moreira2, Wilson Araujo Silva3,5,6, Valeria Valente2,3.   

Abstract

Astrocytomas are the most common and aggressive type of primary brain tumors in adults. The World Health Organization (WHO) classifies them into grades I to IV, based on histopathological features that reflect their malignancy [1]. Tumor progression is accompanied by increased proliferation, genomic instability, infiltration of normal brain tissue and resistance to treatments. The high genomic instability drives tumor cells to enhance key proteins that keep cells from collapsing and favor therapy resistance [2]. To explore genes and pathways associated with tumor progression phenotypes we analyzed gene expression in a panel of non-tumor and glioma cell lines, namely: ACBRI371, non-tumor human astrocytes; HDPC, fibroblasts derived from dental pulp; Res186, Res259, Res286 and UW467, grade I, II and III astrocytoma cell lines derived from pediatric tumors; and T98G, U343MG, U87MG, U138MG and U251MG, all derived from GBM (grade IV). We also profiled gene expression changes caused by exogenously induced replicative stress, performing RNA sequencing with camptothecin (CPT)-treated cells. Here we describe the RNA-sequencing data set acquired, including quality of reads and sequencing consistency, as well as the bioinformatics strategy used to analyze it. We also compared gene expression patterns and pathway enrichment between non-tumor versus lower-grade (LGG), non-tumor versus GBM, LGG versus GBM, and CPT-treated versus non-treated cells. In brief, a total of 6467 genes showed differential expression and 5 pathways were enriched in tumor progression, while 2279 genes and 7 pathways were altered under the replication stress condition. The raw data were deposited in the NCBI BioProject database under the accession number PRJNA631805.
Our dataset is valuable for researchers interested in differential gene expression among different astrocytoma grades and in expression changes caused by replicative stress, facilitating studies that seek novel biomarkers of glioma progression and treatment resistance.
© 2020 Published by Elsevier Inc.

Keywords:  Astrocytoma; Camptothecin (CPT); Gene expression profiling; Glioblastoma; RNAseq; Replicative stress; Tumor progression

Year:  2020        PMID: 33385022      PMCID: PMC7772531          DOI: 10.1016/j.dib.2020.106643

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

These data provide essential information on gene expression profiling of normal astrocytes and different astrocytoma cell lines, and of GBM cells subjected to CPT-induced replicative stress. Scientists who study gene expression in astrocytoma progression and its resistance to genotoxic treatments can use this data set to expand their knowledge and apply new insights to the current clinical management of patients. The dataset generated here can be used to design new experiments and projects that use the analyzed cell lines as a model to improve the understanding of disease progression. This RNA-sequencing dataset can be further explored to identify novel biomarkers for prognosis prediction and/or treatment responsiveness. This dataset can also be interrogated to identify potential new target genes for the development of new drugs and/or therapeutic approaches that sensitize tumor cells to the available treatments.

Data Description

In this report we present the RNA sequencing analysis of different grade astrocytoma cell lines. Astrocytomas are the most common and aggressive type of primary brain tumors in adults. The World Health Organization (WHO) classifies them into grades I to IV, based on histopathological features that reflect their malignancy [1]. Grade I comprises benign, curable tumors that are more frequent in children. Grade II tumors are considered low-grade lesions, with restrained mitotic activity, but showing infiltrative capacity and a tendency to progress to higher grades. Grade III tumors present prominent mitotic activity and nuclear atypia, and are also prone to progress to grade IV. Glioblastoma (GBM), the most aggressive type of astrocytoma, is classified as grade IV and exhibits considerably higher mitotic activity and atypia, as well as angiogenesis and necrosis [1,2]. To explore genes and pathways associated with tumor progression we analyzed gene expression in a panel of non-tumor and glioma cell lines. We used two non-tumor cell cultures (HDPC and ACBRI371) and 9 cell lines representative of tumor progression: two grade I (Res186, Res286), one grade II (Res259) and one grade III (UW467) astrocytoma cell lines, and 5 GBM cell lines (T98G, U343MG, U87MG, U138MG and U251MG). Here we considered cell lines from grades I, II and III as a group representing lower-grade glioma (LGG) and the GBM cell lines as representative of higher-grade glioma (HGG). Additionally, we evaluated the impact of CPT-induced replication stress on the transcriptome of two GBM cell lines, the most resistant (U138MG) and the most sensitive (U251MG) (data not shown), along with non-tumor control cells. Data were collected in two rounds of sequencing (with Genome Analyzer IIx and with NextSeq 500, Illumina Inc.), to increase the total number of reads produced and to complete the sample set to be studied.
We generated from 27.2 to 33.6 million reads (trimmed/aligned) for libraries sequenced in the first run and from 58.2 to 86.1 million reads for libraries sequenced in the second run (Table 1). With this dataset we could measure the expression levels of 44,608 genes among all samples evaluated. To verify the consistency of the generated data, we made scatter plots with the number of reads obtained per gene whose expression was detected in each sequenced sample (Fig. 1). Plots depicted the maximum Pearson correlation coefficient (PCC) when comparing different datasets (labeled A, B, C or D) of the same sample, in both rounds of sequencing (#1 and #2), for all groups of cells analyzed: non-tumor (Fig. 1A), LGG (Fig. 1B), GBM (Fig. 1C) and CPT-treated GBM cells (Fig. 1D). Among non-tumor cells, we observed decreased PCC values when comparing ACBRI371 cells treated or not with CPT (0.87–0.88) (Fig. 1A). HDPC datasets showed a significantly lower correlation with ACBRI371 cells, with PCC varying from 0.45 to 0.51 for the different conditions evaluated (Fig. 1A). In contrast, we detected a high degree of similarity among all LGG cells, which showed PCC values between 0.98 and 0.99 in all comparisons (Fig. 1B). Much larger variation was observed for the GBM cell lines, in which PCC values remained at 0.62 or 0.64 in all comparisons (Fig. 1C). When GBM cells were exposed to CPT, we also observed a reduction in gene expression correlation between treated and non-treated cells, similar to that seen in ACBRI371 cells. However, differences were more pronounced for U138MG (PCC=0.79) than for U251MG (PCC=0.97) cells (Fig. 1D).
Table 1

RNA Sequencing metrics. The total amounts of reads produced for each cell line or condition analyzed were grouped into two datasets (A and B) for the first run (#1) or four datasets (A, B, C and D) for the second run (#2). Counts of raw reads, trimmed reads and aligned reads are shown.

First run #1
Sample | Datasets* | Raw reads | Trimmed reads | Aligned reads | Total aligned reads per condition
ACBRI371.A | A | 18,777,557 | 16,073,974 | 15,938,790 | 32,064,952
ACBRI371.B | B | 18,812,311 | 16,251,936 | 16,126,162 |
HDPC.A | A | 15,943,724 | 13,606,444 | 13,504,133 | 27,185,197
HDPC.B | B | 15,979,931 | 13,774,367 | 13,681,064 |
T98.A | A | 17,728,147 | 15,076,520 | 14,976,750 | 30,171,998
T98.B | B | 17,785,963 | 15,286,117 | 15,195,248 |
U138.A | A | 16,704,458 | 14,166,959 | 14,071,552 | 28,340,144
U138.B | B | 16,744,624 | 14,355,064 | 14,268,592 |
U251.A | A | 18,962,975 | 16,244,799 | 16,142,384 | 32,480,625
U251.B | B | 19,015,489 | 16,431,033 | 16,338,241 |
U343.A | A | 19,758,340 | 16,810,865 | 16,692,122 | 33,601,621
U343.B | B | 19,791,840 | 17,016,541 | 16,909,499 |
U87.A | A | 19,764,536 | 16,898,823 | 16,788,200 | 32,998,714
U87.B | B | 18,881,408 | 16,305,409 | 16,210,514 |

Second run #2

Sample | Datasets* | Raw reads | Trimmed reads | Aligned reads | Total aligned reads per condition
ACBRI371 + cpt18hs.A | A | 21,518,098 | 21,095,090 | 20,862,592 | 83,405,190
ACBRI371 + cpt18hs.B | B | 21,195,306 | 20,755,940 | 20,502,694 |
ACBRI371 + cpt18hs.C | C | 21,913,608 | 21,476,156 | 21,245,612 |
ACBRI371 + cpt18hs.D | D | 21,516,074 | 21,073,316 | 20,794,292 |
R186.A | A | 19,595,942 | 19,245,970 | 18,417,842 | 73,482,118
R186.B | B | 19,230,796 | 18,859,758 | 18,014,710 |
R186.C | C | 19,960,034 | 19,593,568 | 18,755,774 |
R186.D | D | 19,544,228 | 19,171,266 | 18,293,792 |
R259.A | A | 18,566,902 | 18,175,510 | 16,400,986 | 65,617,994
R259.B | B | 18,311,806 | 17,885,020 | 16,128,140 |
R259.C | C | 18,900,300 | 18,492,500 | 16,693,170 |
R259.D | D | 18,628,414 | 18,200,300 | 16,395,698 |
R286.A | A | 19,534,344 | 19,152,002 | 17,501,632 | 69,847,434
R286.B | B | 19,192,134 | 18,795,248 | 17,144,668 |
R286.C | C | 19,876,010 | 19,479,954 | 17,808,962 |
R286.D | D | 19,487,048 | 19,087,884 | 17,392,172 |
U138 + cpt18hs.A | A | 22,201,702 | 21,832,016 | 14,562,900 | 58,244,234
U138 + cpt18hs.B | B | 21,898,446 | 21,490,134 | 14,317,296 |
U138 + cpt18hs.C | C | 22,596,730 | 22,211,560 | 14,821,870 |
U138 + cpt18hs.D | D | 22,234,620 | 21,823,154 | 14,542,168 |
U251 + cpt18hs.A | A | 22,296,224 | 21,818,672 | 21,557,698 | 86,098,692
U251 + cpt18hs.B | B | 21,929,756 | 21,428,122 | 21,135,756 |
U251 + cpt18hs.C | C | 22,702,128 | 22,210,046 | 21,951,788 |
U251 + cpt18hs.D | D | 22,280,758 | 21,776,944 | 21,453,450 |
UW467.A | A | 20,931,108 | 20,520,954 | 18,115,334 | 72,368,544
UW467.B | B | 20,578,786 | 20,146,762 | 17,756,036 |
UW467.C | C | 21,313,478 | 20,886,716 | 18,446,260 |
UW467.D | D | 20,930,310 | 20,494,222 | 18,050,914 |

*Refers to groups of reads obtained from different lanes of sequencing runs.
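
As a quick sanity check on Table 1, the share of trimmed reads that aligned can be computed per dataset. The sketch below uses three rows copied from the table; the function name `alignment_rate` is illustrative and is not part of the original pipeline.

```python
# Rows copied from Table 1: (sample, dataset, trimmed reads, aligned reads).
ROWS = [
    ("ACBRI371", "A", 16_073_974, 15_938_790),
    ("R259", "A", 18_175_510, 16_400_986),
    ("U138 + cpt18hs", "A", 21_832_016, 14_562_900),
]


def alignment_rate(trimmed: int, aligned: int) -> float:
    """Return the percentage of trimmed reads that were aligned."""
    return 100.0 * aligned / trimmed


for sample, dataset, trimmed, aligned in ROWS:
    print(f"{sample}.{dataset}: {alignment_rate(trimmed, aligned):.1f}% aligned")
```

For these three rows the rate falls from roughly 99% (ACBRI371) to roughly 67% (U138 + cpt18hs), which is why the CPT-treated U138 condition has the lowest total of aligned reads in the second run.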

Fig. 1

Scatter plots showing the correlations among datasets of reads for each sample analyzed. Scatter plots were generated with the number of reads per gene for all genes whose expression was identified in each sequenced sample. Plots were clustered into four different groups, according to the cell line origin and/or treatment condition: non-tumor cells, LGG cells, GBM cells and CPT-treated GBM cells. #1 refers to samples sequenced in the Genome Analyzer IIx, while #2 refers to samples sequenced in the NextSeq 500, Illumina Inc. The letters A, B, C and D, accompanying the name of each cell line, refer to reads obtained from different lanes of sequencing for the same sample.

Differential gene expression detected in the comparisons between the datasets representative of astrocytes, LGG and GBM cell lines is illustrated by the volcano plots in Fig. 2. We identified a total of 2877 genes differentially expressed between the groups of cells that characterize the progression from LGG to GBM, of which 1698 were down regulated and 1179 were up regulated (Fig. 2A). We also found 2466 altered genes by comparing ACBRI371 and LGG, of which 1509 were down regulated and 957 up regulated (Fig. 2B), and 1124 differentially regulated genes between ACBRI371 and GBM, of which 250 were up regulated and 874 down regulated (Fig. 2C). Among the cell lines subjected to a replicative stress condition, we identified 790 down regulated and 437 up regulated genes in ACBRI371 (Fig. 2D), 365 down regulated and 494 up regulated genes in U138MG (Fig. 2E) and 21 down regulated and 172 up regulated genes in U251MG (Fig. 2F).
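
The totals quoted in the abstract (6467 genes for tumor progression, 2279 for replicative stress) are the sums of the per-comparison counts given above. A minimal check follows; the dictionary and function names are illustrative, and the 494 U138MG genes are taken as up regulated.

```python
# Differentially expressed gene counts quoted in the text: (down, up).
PROGRESSION = {
    "LGG vs GBM": (1698, 1179),
    "ACBRI371 vs LGG": (1509, 957),
    "ACBRI371 vs GBM": (874, 250),
}
REPLICATIVE_STRESS = {
    "ACBRI371": (790, 437),
    "U138MG": (365, 494),
    "U251MG": (21, 172),
}


def total_genes(groups: dict) -> int:
    """Sum down- and up-regulated counts over all comparisons."""
    return sum(down + up for down, up in groups.values())


print(total_genes(PROGRESSION))         # 6467
print(total_genes(REPLICATIVE_STRESS))  # 2279
```

These are sums over comparisons, so a gene altered in more than one comparison is counted once per comparison.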
According to KEGG analysis, when considering all the altered genes found, we encountered enriched pathways only for: LGG versus GBM, ACBRI371 versus LGG, ACBRI371 versus GBM, and CPT-treated versus non-treated ACBRI371 and U138MG cells (Table 2 and Fig. 3). For comparisons representative of tumor progression, the most prominent pathways were pathways in cancer, PI3K-Akt signaling, neuroactive ligand-receptor interaction, cell adhesion molecules and calcium signaling. For the comparisons reflecting responses to replication stress, the most enriched pathways were PI3K-Akt signaling, axon guidance, neuroactive ligand-receptor interaction, retrograde endocannabinoid signaling, p53 signaling and apoptosis (Fig. 3). The genes uncovered in each of these pathways are shown in Supplementary Table 1.
Fig. 2

Volcano plots displaying the degree of altered gene expression among groups of cell lines and treated versus non-treated cells. Volcano plots were produced using the fold change values and p-values generated through the DESeq2 R package analysis to compare the mRNA expression changes between LGG vs GBM (A), ACBRI371 vs LGG (B), ACBRI371 vs GBM (C); and Control cells vs CPT treated cells for: ACBRI371 (D), U138MG (E) and U251MG (F). Blue dots show genes with significant p-value. Green dots show genes with significant fold change. Red dots represent genes with significance in both p-value and fold change.
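
The colour scheme in the Fig. 2 legend can be expressed as a small classification rule. The sketch below is an illustration only: the function name, the default cutoffs (a 2-fold change and a p-value of 10e-6, read literally from the differential expression criteria in this article) and the "unclassified" label for remaining genes are assumptions, not the authors' plotting code.

```python
import math


def volcano_class(log2_fc: float, p_value: float,
                  fc_cutoff: float = 2.0, p_cutoff: float = 10e-6) -> str:
    """Classify one gene following the colour legend of Fig. 2.

    fc_cutoff is a linear fold change; p_cutoff is an assumed reading of
    the stated p-value cutoff. Both can be changed by the caller.
    """
    big_change = abs(log2_fc) > math.log2(fc_cutoff)
    significant = p_value < p_cutoff
    if big_change and significant:
        return "red"    # significant p-value and fold change
    if significant:
        return "blue"   # significant p-value only
    if big_change:
        return "green"  # significant fold change only
    return "unclassified"


print(volcano_class(3.0, 1e-8))
```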

Table 2

KEGG pathways enrichment analysis. Differentially expressed genes (q-values ≤ 0.0001 and log fold change > 2 or < −2) were submitted to KEGG evaluation. Enriched pathways found in each comparison and details of the analysis results are shown.

Gene Set | Description | Size | Expect | Ratio | p value | FDR | LOGFOLD > 2 | Enrichment %

DOWN LGG - UP GBM
hsa05200 | Pathways in cancer | 526 | 28.24 | 1.8059 | 0.00002066 | 0.0013475 | 51 | 9.69581749
hsa04151 | PI3K-Akt signaling pathway | 354 | 19.006 | 2.1572 | 2.0102E-06 | 0.00021844 | 41 | 11.5819209
hsa05165 | Human papillomavirus infection | 339 | 18.2 | 1.978 | 0.000058394 | 0.0027195 | 36 | 10.61946903
hsa04510 | Focal adhesion | 199 | 10.684 | 2.8079 | 2.13E-07 | 0.000069599 | 30 | 15.07537688
hsa04010 | MAPK signaling pathway | 295 | 15.838 | 1.8942 | 0.00051407 | 0.016759 | 30 | 10.16949153
hsa04514 | Cell adhesion molecules (CAMs) | 144 | 7.7312 | 2.5869 | 0.000078204 | 0.0031868 | 20 | 13.88888889
hsa04512 | ECM-receptor interaction | 82 | 4.4025 | 3.8615 | 1.1754E-06 | 0.00019159 | 17 | 20.73170732
hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 99 | 5.3152 | 3.1984 | 0.000017218 | 0.001347 | 17 | 17.17171717
hsa04668 | TNF signaling pathway | 110 | 5.9057 | 2.7092 | 0.00023505 | 0.0085141 | 16 | 14.54545455
hsa05144 | Malaria | 49 | 2.6307 | 4.1813 | 0.000042617 | 0.0023155 | 11 | 22.44897959

DOWN GBM - UP LGG: No enriched pathway

DOWN ACBRI371 - UP GBM: No enriched pathway

DOWN GBM - UP ACBRI371
hsa04080 | Neuroactive ligand-receptor interaction | 277 | 10.866 | 2.1166 | 0.00051907 | 0.022 | 23 | 8.303249097
hsa04020 | Calcium signaling pathway | 183 | 7.1789 | 2.9253 | 8.7111E-06 | 0.0014199 | 21 | 11.47540984
hsa04723 | Retrograde endocannabinoid signaling | 148 | 5.8059 | 2.7558 | 0.00021105 | 0.01376 | 16 | 10.81081081
hsa04713 | Circadian entrainment | 96 | 3.766 | 3.7175 | 0.000019704 | 0.0021412 | 14 | 14.58333333
hsa05032 | Morphine addiction | 91 | 3.5698 | 3.6416 | 0.000048805 | 0.0039776 | 13 | 14.28571429
hsa04724 | Glutamatergic synapse | 114 | 4.4721 | 2.9069 | 0.00049148 | 0.022 | 13 | 11.40350877
hsa04727 | GABAergic synapse | 88 | 3.4521 | 3.1864 | 0.00060736 | 0.022 | 11 | 12.5
hsa05033 | Nicotine addiction | 40 | 1.5692 | 6.3729 | 2.2051E-06 | 0.00071887 | 10 | 25
hsa05133 | Pertussis | 76 | 2.9814 | 3.3541 | 0.00071129 | 0.023188 | 10 | 13.15789474
hsa05144 | Malaria | 49 | 1.9222 | 4.1619 | 0.00056511 | 0.022 | 8 | 16.32653061

DOWN ACBRI371 - UP LGG: No enriched pathway

DOWN LGG - UP ACBRI371
hsa04080 | Neuroactive ligand-receptor interaction | 277 | 20.88 | 1.8678 | 0.000094512 | 0.0034882 | 39 | 14.07942238
hsa04514 | Cell adhesion molecules (CAMs) | 144 | 10.854 | 3.3166 | 6.42E-11 | 2.09E-08 | 36 | 25
hsa04510 | Focal adhesion | 199 | 15 | 2.0666 | 0.000076063 | 0.0034882 | 31 | 15.57788945
hsa04512 | ECM-receptor interaction | 82 | 6.181 | 3.3975 | 4.26E-07 | 0.000065465 | 21 | 25.6097561
hsa05032 | Morphine addiction | 91 | 6.8594 | 3.0615 | 2.6922E-06 | 0.00021942 | 21 | 23.07692308
hsa04012 | ErbB signaling pathway | 85 | 6.4071 | 2.9654 | 0.000013146 | 0.00085709 | 19 | 22.35294118
hsa05133 | Pertussis | 76 | 5.7287 | 2.9675 | 0.000036973 | 0.0020089 | 17 | 22.36842105
hsa05140 | Leishmaniasis | 74 | 5.578 | 2.8684 | 0.000096299 | 0.0034882 | 16 | 21.62162162
hsa05033 | Nicotine addiction | 40 | 3.0151 | 4.6433 | 6.02E-07 | 0.000065465 | 14 | 35
hsa05150 | Staphylococcus aureus infection | 56 | 4.2212 | 3.0797 | 0.00020337 | 0.0061666 | 13 | 23.21428571

UP ACBRI371 CPT
hsa04115 | p53 signaling pathway | 72 | 10,218 | 88,078 | 6.71E-03 | 0.00021880 | 9 | 12.5
hsa05210 | Apoptosis | 136 | 19,301 | 41,448 | 0.00066233 | 0.042506 | 8 | 5.882352941
hsa05222 | Colorectal cancer | 86 | 12,205 | 57,353 | 0.00020349 | 0.030728 | 7 | 8.139534884
hsa04064 | Small cell lung cancer | 93 | 13,199 | 53,036 | 0.00033082 | 0.030728 | 7 | 7.52688172
hsa04210 | NF-kappa B signaling pathway | 95 | 13,482 | 51,920 | 0.00037703 | 0.030728 | 7 | 7.368421053
hsa03018 | TNF signaling pathway | 110 | 15,611 | 44,840 | 0.00091270 | 0.042506 | 7 | 6.363636364
hsa04668 | RNA degradation | 79 | 11,212 | 53,516 | 0.00085155 | 0.042506 | 6 | 7.594936709

DOWN ACBRI371 CPT
hsa05033 | Neuroactive ligand-receptor interaction | 277 | 11,126 | 26,065 | 0.0000017977 | 0.000058604 | 29 | 10.46931408
hsa04723 | Retrograde endocannabinoid signaling | 148 | 59,446 | 45,420 | 2.30E-07 | 3.75E-05 | 27 | 18.24324324
hsa04724 | Glutamatergic synapse | 114 | 45,789 | 48,046 | 5.75E-06 | 6.24E-04 | 22 | 19.29824561
hsa04727 | Dopaminergic synapse | 131 | 52,617 | 38,010 | 2.20E-03 | 0.000010239 | 20 | 15.26717557
hsa05032 | Cell adhesion molecules (CAMs) | 144 | 57,839 | 34,579 | 0.0000010483 | 0.000042717 | 20 | 13.88888889
hsa04713 | Nicotine addiction | 40 | 16,066 | 11,826 | 1.11E-12 | 3.62E-10 | 19 | 47.5
hsa04728 | GABAergic synapse | 88 | 35,346 | 50,925 | 8.42E-05 | 6.86E-03 | 18 | 20.45454545
hsa04514 | Morphine addiction | 91 | 36,551 | 49,246 | 1.47E-04 | 9.61E-03 | 18 | 19.78021978
hsa05150 | Circadian entrainment | 96 | 38,559 | 46,681 | 3.56E-04 | 0.0000019319 | 18 | 18.75
hsa04080 | Staphylococcus aureus infection | 56 | 22,493 | 5335 | 0.0000016155 | 0.000058518 | 12 | 21.42857143

UP U138 CPT: No enriched pathway

DOWN U138 CPT
hsa04151 | PI3K-Akt signaling pathway | 354 | 73,938 | 24,345 | 0.00039531 | 0.021461 | 18 | 5.084745763
hsa04360 | Axon guidance | 175 | 36,551 | 49,246 | 1.93E-04 | 0.0000062837 | 18 | 10.28571429
hsa04015 | Rap1 signaling pathway | 206 | 43,026 | 37,187 | 0.0000055099 | 0.00089812 | 16 | 7.766990291
hsa04510 | Focal adhesion | 199 | 41,564 | 36,089 | 0.000015772 | 0.0017139 | 15 | 7.537688442
hsa04020 | Calcium signaling pathway | 183 | 38,222 | 36,628 | 0.000025978 | 0.0021172 | 14 | 7.650273224
hsa04540 | Gap junction | 88 | 18,380 | 48,966 | 0.000083868 | 0.0054682 | 9 | 10.22727273
hsa05146 | Amoebiasis | 96 | 20,051 | 39,899 | 0.00084677 | 0.029830 | 8 | 8.333333333
hsa04720 | Long-term potentiation | 67 | 13,994 | 50,022 | 0.00046081 | 0.021461 | 7 | 10.44776119
hsa04971 | Gastric acid secretion | 75 | 15,665 | 44,686 | 0.00091504 | 0.029830 | 7 | 9.333333333
hsa05143 | African trypanosomiasis | 35 | 0.73102 | 68,397 | 0.00072934 | 0.029720 | 5 | 14.28571429

UP U251 CPT: No enriched pathway

DOWN U251 CPT
hsa05330 | Allograft rejection | 38 | 0.020351 | 98,276 | 0.00015027 | 0.020956 | 2 | 5.263157895
hsa05332 | Graft-versus-host disease | 41 | 0.021957 | 91,085 | 0.00017519 | 0.020956 | 2 | 4.87804878
hsa04940 | Type I diabetes mellitus | 43 | 0.023029 | 86,849 | 0.00019285 | 0.020956 | 2 | 4.651162791
hsa05320 | Autoimmune thyroid disease | 53 | 0.028384 | 70,462 | 0.00029377 | 0.023756 | 2 | 3.773584906
hsa05416 | Viral myocarditis | 59 | 0.031597 | 63,297 | 0.00036436 | 0.023756 | 2 | 3.389830508
hsa04612 | Antigen processing and presentation | 77 | 0.041237 | 48.5 | 0.00062109 | 0.033746 | 2 | 2.597402597
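
The Ratio and Enrichment % columns of Table 2 appear to be derived from the Size, Expect and gene-count columns. This relationship is inferred from the tabulated values rather than stated in the source, and the function name below is illustrative. A minimal sketch using the hsa05200 row of the DOWN LGG - UP GBM comparison:

```python
def enrichment_columns(size: int, expect: float, observed: int):
    """Return (ratio, enrichment %) for one pathway row.

    ratio        = observed genes / expected genes
    enrichment % = observed genes / pathway size * 100
    """
    return observed / expect, 100.0 * observed / size


# hsa05200 "Pathways in cancer": size 526, expect 28.24, 51 genes observed.
ratio, percent = enrichment_columns(526, 28.24, 51)
print(f"ratio={ratio:.3f} enrichment={percent:.2f}%")
```

The same identities can be used to cross-check any other row of the table.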
Fig. 3

Enriched KEGG pathways in each collection of genes presenting altered expression in the indicated comparisons. All genes with q-values ≤ 0.0001 (adjusted p-value set to avoid identification of false positive enrichments) and log fold change > 2 or < −2 of each comparison were subjected to pathway analysis by KEGG. Comparisons that revealed enriched pathways are shown. A False Discovery Rate (FDR) ≤ 0.05 was used as the threshold to select significant pathways. Graphs were plotted with GraphPad Prism 4.0 software.


Experimental Design, Materials and Methods

Cell culture and treatment

ACBRI371 is a non-tumor human astrocyte cell line that was kindly donated by Prof. Dr. Elza Tiemi Sakamoto Hojo (São Paulo University, Ribeirão Preto, SP, Brazil). HDPC (Human Dental Pulp Cells) is a primary culture of fibroblasts isolated from the dental pulp of a 5-year-old boy in the laboratory of Dr. C. Costa and Dr. J. Henbling, who kindly provided these cells to our laboratory. HDPC were cultivated in standard conditions, with α-MEM (Minimum Essential Medium Eagle) supplemented with 10% fetal bovine serum, 100 U/mL penicillin and 0.23 mg/mL streptomycin. HDPC was used as an outside cell culture, representative of non-brain expression patterns. The cell lines Res186 (grade I), Res286 (grade I), Res259 (grade II) and UW467 (grade III) were all derived from pediatric tumors, were first established by Dr. Michael Bobola (University of Washington, Seattle, WA) and were kindly donated to our group by Dr. Fausto Rodriguez (Johns Hopkins University, Baltimore, MD). T98G, U343MG, U87MG, U138MG and U251MG are commercially available GBM cell lines and were obtained from the American Type Culture Collection. The non-tumor and GBM cell lines were grown in high-glucose DMEM, while LGG cells were grown in DMEM-F12. All media used were supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. All cellular stocks were kept in liquid nitrogen before being thawed in the appropriate medium. They were all cultured up to a maximum of 75% confluence before and after plating for RNA isolation. To identify differentially expressed genes associated with tumor progression, we simply cultured the panel of cell lines representative of different grade astrocytoma and compared their RNA-sequencing results.
For the replicative stress study, we ran preliminary experiments to choose adequate CPT treatment conditions and the most appropriate cell lines for sequencing, taking care to induce maximum replicative stress while keeping viable proliferating cells, and selecting one highly CPT-resistant and one CPT-sensitive GBM cell line. In summary, we conducted: (1) an MTT dose-response curve to identify the CPT IC50 for each GBM cell line and pinpoint the most resistant and the most sensitive cells, which were U138MG (IC50=3.425 nM) and U251MG (IC50=0.05 nM), respectively; ACBRI371 astrocytes presented an intermediate IC50 (1.041 nM) and were also used as control cells (data not shown). (2) H2AX phosphorylation was also assessed at 9 different time points of CPT treatment at the IC50 for U251MG and U138MG. The peak of H2AX activation was reached around 18 h of treatment, which was then selected as the appropriate time point of analysis (data not shown). Therefore, to evaluate the impact of replicative stress induction on gene expression of GBM cells, we performed RNA-sequencing of U138MG, U251MG and ACBRI371 cell lines treated or not with CPT at the IC50 for 18 h.

RNA isolation and sequencing

Total RNA was isolated with RNeasy Mini Kit (Qiagen) following the manufacturer's instructions. The RNA isolation for each cell line and treatment condition was performed only once (one biological replicate). The concentration and purity of RNA samples were assessed by 260/280 nm absorbance ratios, measured with a NanoDrop spectrophotometer (Thermo Fisher Scientific). The RNA quality was evaluated by electrophoresis in the Bioanalyzer Instrument (Agilent), and samples with an RNA Integrity Number (RIN) ≥ 7 were subsequently used. For the library construction, 300 ng of high-quality RNA and the TruSeq Stranded Total RNA LT Sample Prep Kit (Illumina Inc.) were used. First, the RiboZero technology (Illumina Inc.) was used to remove rRNA while preserving poly(A) and non-poly(A) transcripts, which were then fragmented. RNA fragments sized between 200 and 500 bp were used to generate the sequencing libraries. Clustering was performed in an automated cBot system (Illumina Inc.) and samples were sequenced with TruSeq SBS kit v5, single-read 72 cycles. We produced two datasets, one sequenced in Genome Analyzer IIx (GAIIx, Illumina Inc.) and another sequenced in NextSeq 500 (Illumina Inc.) (Table 1). These two datasets are technical replicates obtained from the same RNA extraction. All reagents were used following the manufacturer's protocols.

Reads mapping and normalization

The two raw datasets obtained were qualitatively analyzed with FastQC [3]. For each RNA sample analyzed, the GAIIx sequencing generated 8 single-read fastq files obtained from 8 lanes, while the NextSeq 500 dataset yielded 8 paired-end fastq files obtained from 4 lanes. Raw reads were quality-filtered with Trimmomatic [4], removing the Illumina adaptor sequences, low-quality bases (retaining bases with a Phred quality score > 20), and reads shorter than 35 bp. Subsequently, read error correction was performed by SGA in the preprocess mode (set to -q 25 -f 20 -m 35), followed by the index and then the correct modes [5]. To evaluate trimming and quality control, the processed reads were inspected again with FastQC. Furthermore, the total human transcriptome coverage was assessed for each single-read fastq file or paired-end duo. To achieve a similar coverage distribution for the two different sequencing runs, the GAIIx reads (8 lanes) were randomly grouped in two distinct replicates (A and B), whereas the NextSeq dataset (4 lanes) was kept as four paired-end fastq duos, each lane representing one replicate (A, B, C and D). This organization was carried forward into the mapping step, in which reads were aligned to the Genome Reference Consortium Human Build 38 (GRCh38) using HISAT2 [6], according to a previously described optimization [7]. Table 1 shows the statistical information for each step.
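
The random grouping of the 8 GAIIx lanes into replicates A and B can be sketched as follows. This is an illustration, not the authors' script: the file names, the fixed seed and the assumption that the two groups are of equal size (4 lanes each) are all hypothetical.

```python
import random


def group_lanes(lanes, n_groups=2, seed=0):
    """Randomly split lane files into n_groups equally sized replicates."""
    if len(lanes) % n_groups:
        raise ValueError("lanes must divide evenly into groups")
    shuffled = list(lanes)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    size = len(shuffled) // n_groups
    labels = "ABCDEFGH"
    return {labels[i]: sorted(shuffled[i * size:(i + 1) * size])
            for i in range(n_groups)}


# Hypothetical file names for the 8 GAIIx lanes of one sample.
lanes = [f"sample_lane{i}.fastq" for i in range(1, 9)]
replicates = group_lanes(lanes)
print(replicates)
```

Fixing the seed keeps the lane-to-replicate assignment stable across reruns, which matters if the grouped files are regenerated later.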

Differential gene expression

To assess differential gene expression, the number of reads for each transcript was calculated by the HTSeq-count algorithm, run with the settings -f bam -r pos -s no -a 10 -t exon -i gene_id -m intersection-nonempty [8]. We conducted a Pearson correlation assessment for all output data using the R function cor. Then, the data were passed to the DESeq2 R package for differential expression analysis [9]. Genes were considered differentially expressed when showing an expression change greater than 2-fold, with a p-value cutoff of 10e−6.
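
The Pearson correlation between the per-gene read counts of two datasets can be sketched in a few lines. The source performed this step in R; the Python function below is an equivalent illustration, and the read counts are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up read counts for five genes in two datasets of one sample.
dataset_a = [120, 0, 35, 980, 14]
dataset_b = [131, 2, 30, 1010, 9]
print(round(pearson(dataset_a, dataset_b), 3))
```

Because raw counts span several orders of magnitude, a few highly expressed genes can dominate the coefficient; whether the counts were transformed before correlation is not stated in the source.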

Pathway enrichment analysis

The genes with q-values ≤ 0.0001 (adjusted p-value set to avoid identification of false positive enrichments) and log fold change > 2 or < −2 for each comparison were subjected to pathway analysis using the KEGG database (www.webgestalt.org). A False Discovery Rate (FDR) ≤ 0.05 was used as a threshold to select significant pathways. Pathway enrichment charts were plotted with GraphPad Prism 4.0 software.
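
The gene selection that precedes the pathway analysis can be sketched as a simple filter. The function name, the record layout and the example genes are hypothetical; only the two thresholds (q-value ≤ 0.0001 and log fold change > 2 or < −2) come from the text.

```python
def select_for_enrichment(genes, q_max=0.0001, min_abs_lfc=2.0):
    """Keep genes with q <= q_max and |log fold change| > min_abs_lfc,
    split into up- and down-regulated lists."""
    up, down = [], []
    for name, lfc, q in genes:
        if q <= q_max and abs(lfc) > min_abs_lfc:
            (up if lfc > 0 else down).append(name)
    return up, down


# Hypothetical records: (gene name, log fold change, q-value).
genes = [
    ("GENE_A", 3.1, 0.00001),
    ("GENE_B", -2.6, 0.00009),
    ("GENE_C", 2.4, 0.01),      # fails the q-value filter
    ("GENE_D", -1.2, 0.00001),  # fails the fold-change filter
]
up, down = select_for_enrichment(genes)
print(up, down)
```

Splitting the result into up and down lists mirrors the way Table 2 reports enrichment separately for each direction of change.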

Ethics Statement

The authors declare that this study did not involve any human or animal subjects.

CRediT Author Statement

Juliana de Sousa: Investigation, Validation, Writing - Original draft preparation, Writing - Reviewing & Editing; Patrick da Silva: Formal analysis, Validation, Data curation, Visualization, Writing - Original draft; Rodolfo Serafim: Formal analysis, Data curation, Visualization; Ricardo Nociti: Formal analysis; Cristiano Moreira: Supervision; Wilson Silva Jr: Supervision, Resources; Valeria Valente: Conceptualization, Resources, Supervision and Data analysis, Project administration, Funding acquisition, Writing - Original draft and Reviewing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships, which have, or could be perceived to have, influenced the work reported in this article.
Subject: Cancer Research
Specific subject area: Transcriptomic changes in different grade astrocytoma cells, comparing gene expression changes that occur in tumor progression or under replicative stress induced by the topoisomerase I inhibitor, Camptothecin (CPT).
Type of data: Transcriptomic data; Figures; Tables
How data were acquired: RNA sequencing: Bioanalyzer Instrument (Agilent), Illumina Sequencers Genome Analyzer IIx and NextSeq 500 (Illumina Inc.). Software: FastQC, Trimmomatic, SGA, HISAT2, SAMtools, HTSeq-count, DESeq2
Data format: Raw: sra file format (repository link below). Analyzed: excel spreadsheet, tif figures
Parameters for data collection: Cells were grown under standard conditions or treated with CPT for 18 h, then RNA isolation and sequencing were performed.
Description of data collection: Total RNA was isolated using RNeasy mini kit (Qiagen), RNA quality was evaluated by Bioanalyzer (Agilent), rRNA was removed from samples and then samples were clustered and sequenced.
Data source location: Institution: Ribeirão Preto Blood Bank. Ribeirão Preto, São Paulo, Brazil. Coordinates: 21°11′18.1″S 47°48′17.3″W (−21.188357, −47.804813)
Data accessibility: Raw data is available at NCBI BioProject repository under the identification number PRJNA631805. Direct URL to data: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA631805

1.  The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary.

Authors:  David N Louis; Arie Perry; Guido Reifenberger; Andreas von Deimling; Dominique Figarella-Branger; Webster K Cavenee; Hiroko Ohgaki; Otmar D Wiestler; Paul Kleihues; David W Ellison
Journal:  Acta Neuropathol       Date:  2016-05-09       Impact factor: 17.088

2.  Efficient de novo assembly of large genomes using compressed data structures.

Authors:  Jared T Simpson; Richard Durbin
Journal:  Genome Res       Date:  2011-12-07       Impact factor: 9.043

3.  Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.

Authors:  Daehwan Kim; Joseph M Paggi; Chanhee Park; Christopher Bennett; Steven L Salzberg
Journal:  Nat Biotechnol       Date:  2019-08-02       Impact factor: 54.908

4.  Simulation-based comprehensive benchmarking of RNA-seq aligners.

Authors:  Giacomo Baruzzo; Katharina E Hayer; Eun Ji Kim; Barbara Di Camillo; Garret A FitzGerald; Gregory R Grant
Journal:  Nat Methods       Date:  2016-12-12       Impact factor: 28.547

5.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors:  Michael I Love; Wolfgang Huber; Simon Anders
Journal:  Genome Biol       Date:  2014       Impact factor: 13.583

6.  HTSeq--a Python framework to work with high-throughput sequencing data.

Authors:  Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal:  Bioinformatics       Date:  2014-09-25       Impact factor: 6.937

7.  DNA repair genes in astrocytoma tumorigenesis, progression and therapy resistance.

Authors:  Juliana Ferreira de Sousa; Rodolfo Bortolozo Serafim; Laura Marise de Freitas; Carla Raquel Fontana; Valeria Valente
Journal:  Genet Mol Biol       Date:  2019-12-13       Impact factor: 1.771

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937
