
Small non-coding RNA transcriptome of the NCI-60 cell line panel.

Erin A Marshall, Adam P Sage, Kevin W Ng, Victor D Martinez, Natalie S Firmino, Kevin L Bennewith, Wan L Lam.

Abstract

Only 3% of the transcribed human genome is translated into protein, and small non-coding RNAs from these untranslated regions have demonstrated critical roles in transcriptional and translational regulation of proteins. Here, we provide a resource that will facilitate cell line selection for gene expression studies involving sncRNAs in cancer research. As the most accessible and tractable models of tumours, cancer cell lines are widely used to study cancer development and progression. The NCI-60 panel of 59 cancer cell lines was curated to provide common models for drug screening in 9 tissue types; however, its prominence has extended to use in gene regulation, xenograft models, and beyond. Here, we present the complete small non-coding RNA (sncRNA) transcriptomes of these 59 cancer cell lines. Additionally, we examine the abundance and unique sequences of annotated microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs), and reveal novel unannotated microRNA sequences.

Year:  2017        PMID: 29064465      PMCID: PMC5654365          DOI: 10.1038/sdata.2017.157

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The NCI-60 Human Tumour Cell Lines Screen is an initiative started by the National Institutes of Health (NIH) in the late 1980s, focusing on the development of 59 human tumour cell lines for use as an in vitro drug screen model[1-3] (Table 1 (available online only)). These cell lines, derived from nine solid and blood malignancies, have shown great utility both for their original purpose of therapeutic screening and in basic cancer research (reviewed by Shoemaker et al.[4]). They have since been extensively characterized for various molecular features, including karyotypic complexity[1], DNA fingerprinting[2], gene expression microarray profiling[5,6], and human leukocyte antigen typing[3]. However, the small non-coding RNA (sncRNA) transcriptomes of the NCI-60 cell lines have yet to be reported at the sequencing level.
Table 1

NCI-60 cell line characteristics.

| Tissue | Cell line | Subtype | Cell type of origin | Doubling time (hrs) | Year of origin | Culture conditions |
|---|---|---|---|---|---|---|
| BREAST | BT-549 | Ductal Carcinoma | Epithelial | 53.9 | 1978 | RPMI-1640+10% FBS |
| | HS-578T | Carcinoma | Epithelial | 53.8 | 1977 | DMEM+10% FBS |
| | MCF-7 | Adenocarcinoma | Epithelial | 25.4 | 1973 | MEM+10% FBS |
| | MDA-MB-231 | Adenocarcinoma | Epithelial | 41.9 | 1974 | L-15 Medium+10% FBS |
| | T-47D | Ductal Carcinoma | Epithelial | 45.5 | 1981 | DMEM+10% FBS |
| CNS | SF-539 | Glioblastoma multiforme | Right Temporal Lobe | 35.4 | 1986 | MEM+10% FBS |
| | SF-295 | Glioblastoma | Left Temporal Lobe | 29.5 | 1986 | RPMI-1640+10% FBS |
| | SF-268 | Highly Anaplastic Astrocytoma | Right Parietal Lobe | 33.1 | 1987 | RPMI-1640+10% FBS |
| | U251 | Glioblastoma | Non-Epithelial (CNS) | 23.8 | 1978 | EMEM+10% FBS |
| | SNB-75 | Glioblastoma | Non-Epithelial (CNS) | 62.8 | 1988 | RPMI-1640+10% FBS |
| | SNB-19 | Astrocytoma | Left parieto-occipital | 34.6 | 1980 | Ham's F10+10% FBS |
| COLON | COLO 205 | Dukes' Type D, colorectal adenocarcinoma | Epithelial | 23.8 | 1975 | RPMI-1640+10% FBS |
| | HCT-15 | Colon adenocarcinoma | Epithelial | 20.6 | 1979 | RPMI-1640+10% FBS |
| | HCC2998 | Colon adenocarcinoma | Epithelial | 31.5 | 1988 | RPMI-1640+10% FBS |
| | HCT-116 | Colon adenocarcinoma | Epithelial | 17.4 | 1981 | McCoy's 5a+10% FBS |
| | HT-29 | Colon adenocarcinoma | Epithelial | 19.5 | 1964 | RPMI-1640+10% FBS |
| | KM12 | Colon adenocarcinoma | Epithelial | 23.7 | 1988 | DMEM+10% FBS |
| | SW-620 | Dukes' Type D, colorectal adenocarcinoma | Epithelial | 20.4 | 1977 | L15+10% FBS |
| LEUKEMIA | CCRF-CEM | Acute lymphoblastic leukemia | T lymphoblast | 26.7 | 1964 | RPMI-1640+10% FBS |
| | HL-60(TB) | Acute promyelocytic leukemia | Promyeloblast | 28.6 | 1977 | IMDM+20% FBS |
| | K-562 | Chronic myelogenous leukemia | Lymphoblast | 19.6 | 1975 | IMDM+20% FBS |
| | MOLT-4 | Acute lymphoblastic leukemia | Lymphoblast | 27.9 | 1980 | RPMI-1640+10% FBS |
| | RPMI 8226 | Plasmacytoma; myeloma | B Lymphocyte | 33.5 | 1966 | RPMI-1640+10% FBS |
| | SR | Large cell immunoblastic lymphoma | Lymphoblast | 28.7 | 1983 | RPMI-1640+10% FBS |
| MELANOMA | MALME-3M | Malignant melanoma | Fibroblast | 46.2 | 1975 | IMDM+20% FBS |
| | LOX-IMVI | Amelanotic melanoma | Skin | 20.5 | 1988 | RPMI-1640+10% FBS |
| | M14 | Malignant melanoma | Skin | 26.3 | 1976 | RPMI-1640+10% FBS |
| | MDA-MB-435 | Previously described as ductal carcinoma | Melanocyte | 25.8 | 1976 | L15+10% FBS |
| | SK-MEL-28 | Cutaneous melanoma | Skin | 35.1 | 1976 | DMEM+10% FBS |
| | SK-MEL-5 | Malignant melanoma | Stellate | 25.2 | 1977 | MEM+10% FBS |
| | SK-MEL-2 | Malignant melanoma | Skin | 45.5 | 1975 | MEM+10% FBS |
| | UACC-257 | Malignant melanoma | Skin | 38.5 | 1991 | RPMI-1640+10% FBS |
| | UACC-62 | Melanotic melanoma | Skin | 31.3 | 1991 | RPMI-1640+10% FBS |
| NSCLC | A549 | Adenocarcinoma | Epithelial | 22.9 | 1972 | RPMI-1640+10% FBS |
| | EKVX | Adenocarcinoma | Epithelial | 43.6 | 1988 | RPMI-1640+10% FBS |
| | HOP 92 | Large cell carcinoma | Epithelial | 79.5 | 1991 | RPMI-1640+10% FBS |
| | HOP 62 | Adenocarcinoma | Epithelial | 39 | 1989 | RPMI-1640+10% FBS |
| | NCI-H23 | Adenocarcinoma | Epithelial | 33.4 | 1980 | RPMI-1640+10% FBS |
| | NCI-H322M | Small cell bronchioloalveolar carcinoma | Epithelial | 35.3 | 1991 | RPMI-1640+10% FBS |
| | NCI-H226 | Squamous cell carcinoma; mesothelioma | Epithelial | 61 | 1980 | RPMI-1640+10% FBS |
| | NCI-H460 | Carcinoma; large cell lung cancer | Epithelial | 17.8 | 1982 | RPMI-1640+10% FBS |
| | NCI-H522 | Adenocarcinoma | Epithelial | 38.2 | 1985 | RPMI-1640+10% FBS |
| OVARIAN | IGR-OV1 | Ovarian endometrioid adenocarcinoma | Endometrioid | 31 | 1985 | RPMI-1640+10% FBS |
| | NCI/ADR-RES | High Grade Ovarian Serous Adenocarcinoma | Epithelial | 34 | 1986 | RPMI-1640+20% FBS |
| | OVCAR-3 | Adenocarcinoma | Epithelial | 34.7 | 1982 | RPMI-1640+10% FBS |
| | OVCAR-8 | High Grade Ovarian Serous Adenocarcinoma | Epithelial | 26.4 | 1984 | RPMI-1640+10% FBS |
| | OVCAR-4 | High Grade Ovarian Serous Adenocarcinoma | Epithelial | 41.4 | 1984 | RPMI-1640+10% FBS |
| | OVCAR-5 | High Grade Ovarian Serous Adenocarcinoma | Epithelial | 48.8 | 1984 | RPMI-1640+10% FBS |
| | SK-OV-3 | Adenocarcinoma | Epithelial | 48.7 | 1973 | RPMI-1640+10% FBS |
| PROSTATE | DU-145 | Prostate carcinoma | Epithelial | 32.3 | 1978 | MEM+10% FBS |
| | PC-3 | Adenocarcinoma | Epithelial | 27.1 | 1980 | F-12K+10% FBS |
| RENAL | A498 | Carcinoma | Epithelial | 66.8 | 1977 | MEM+10% FBS |
| | CAKI-1 | Clear cell renal cell carcinoma | Epithelial | 39 | 1975 | McCoy's 5a+10% FBS |
| | 786-0 | Renal cell adenocarcinoma | Epithelial | 22.4 | 1976 | RPMI-1640+10% FBS |
| | ACHN | Renal cell adenocarcinoma | Epithelial | 27.5 | 1979 | MEM+10% FBS |
| | RXF393 | Renal cell carcinoma | Epithelial | 62.9 | 1991 | RPMI-1640+10% FBS |
| | SN12C | Renal cell carcinoma | Epithelial | 29.5 | 1986 | DMEM+10% FBS |
| | TK-10 | Clear cell renal cell carcinoma | Epithelial | 51.3 | 1987 | RPMI-1640+10% FBS |
| | UO-31 | Renal cell carcinoma | Epithelial | 41.7 | 1991 | DMEM+10% FBS |

The advent of next-generation sequencing has revealed the large proportion of non-coding genes in the human genome, and the relevance of these non-coding species in regulating the expression of both neighbouring and distant protein-coding genes. In the context of cancer, microRNAs (miRNAs) remain the best-studied non-coding RNA species, and have been implicated in all stages of cancer: initiation, progression, and response to therapy (reviewed by Hayes et al.[7]). Recent advances in the bioinformatic tools used for the discovery of small non-coding RNA have considerably expanded the number of known miRNA sequences[8]. Other types of sncRNA, including PIWI-interacting RNAs (piRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs) are emerging topics in cancer biology (reviewed by Ng et al. and Mannoor et al.[9,10]). Beyond their functions in gene regulation, sncRNAs are attractive prognostic biomarkers due to their abundance and stability in various biofluids[11]. We sequenced the sncRNA transcriptomes of the 59 cell lines in the panel (Fig. 1). SncRNA profiles were generated using the OASIS analysis platform v2.0 (ref. 12). For known sncRNA species (miRNAs, piRNAs, snoRNA, snRNA, and rRNA), high quality reads were mapped to the hg38 build of the human genome and quantified based on annotations containing their specific chromosomal locations. Detection of novel miRNAs was performed using well-established prediction algorithms that assess reads for miRNA folding characteristics, among other factors that indicate the probability that the tested sequence belongs to the miRNA family of sncRNAs[13]. In total, the genomic loci of 49,961 sncRNAs were examined. Using a detection threshold of greater than or equal to 5 reads across all tissues, we detected a total of 24,621 unique sncRNAs [Data Citation 1].
Figure 1

Experimental workflow.

Graphical representation of experimental procedure used to extract, process, and analyze RNA from cell lines.

We then examined the genomic distribution of the detected sncRNAs across all tissue types (Table 2, Fig. 2). Notably, sncRNAs are expressed across all chromosomes in every tissue type assessed. SncRNA loci commonly expressed among all tissues may indicate their involvement in conserved biological or cancer-relevant processes, whereas differences in expression may denote tissue specificity.
Table 2

Average number of sncRNA species detected and sequencing coverage per tissue type.

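Column groups: Total through Other report the number of sncRNA; the remaining columns report sequencing details.

| Tissue type | Total | miRNA | novel miRNA | piRNA | snoRNA | snRNA | Other | Average number of reads per sample | Average contig length | Avg. quality | %GC | Average coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All tissue types | 24,794 | 2,509 | 288 | 19,018 | 259 | 1,602 | 412 | 23,457,233 | 22.34 | 33.25 | 46.26% | 28.94 |
| Breast | 11,079 | 1,905 | 183 | 6,909 | 239 | 1,248 | 403 | 25,310,156 | 22.32 | 33.23 | 45.81% | 25.36 |
| CNS | 10,120 | 1,793 | 180 | 6,150 | 236 | 1,175 | 397 | 25,633,608 | 22.35 | 33.34 | 44.57% | 45.08 |
| Colon | 13,985 | 1,977 | 211 | 8,050 | 232 | 1,276 | 392 | 25,850,349 | 22.22 | 33.2 | 46.15% | 22.97 |
| Leukemia | 10,728 | 1,841 | 185 | 5,604 | 228 | 1,112 | 387 | 18,921,562 | 22.41 | 33.25 | 48.77% | 20.23 |
| Melanoma | 9,536 | 1,830 | 179 | 4,694 | 224 | 1,020 | 383 | 16,116,671 | 22.36 | 32.64 | 46.56% | 28.45 |
| NSCLC | 15,707 | 2,051 | 232 | 9,385 | 236 | 1,293 | 398 | 23,407,812 | 22.34 | 33.3 | 46.18% | 22.8 |
| Ovarian | 10,422 | 1,916 | 188 | 6,312 | 238 | 1,167 | 398 | 27,221,870 | 22.4 | 34.12 | 46.56% | 41.49 |
| Prostate | 6,167 | 1,393 | 121 | 2,589 | 216 | 771 | 344 | 35,637,053 | 22.3 | 33.54 | 44.44% | 30.11 |
| Renal | 14,532 | 1,943 | 209 | 8,549 | 234 | 1,288 | 396 | 23,949,373 | 22.3 | 33 | 45.99% | 27.05 |
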
Figure 2

Genome-wide distribution of expressed small non-coding RNA by tissue type.

Genomic position of sncRNAs detected (reads≥5) in each tissue type in reference to the hg38 chromosome build karyotype. From inner-most ring to outer: breast (red), CNS (magenta), colon (purple), leukemia (blue), melanoma (teal), lung (green), ovarian (yellow), prostate (orange), and renal (red).

We also examined the relative frequency of detection for each sncRNA species, both in the entire NCI-60 cell line panel and in lines grouped by organ type (Fig. 3a). Beyond those annotated in miRBase (v.21), novel unannotated miRNAs were determined by integrating secondary structure formation potential with free energy scoring[14]. These novel miRNAs represent an increase of approximately 10% of total miRNAs expressed across all tissue types (Fig. 3b), highlighting the constant expansion of the known non-coding transcriptome as sequencing technologies and bioinformatic tools advance. Consistent with the number of annotated loci in the human genome, piRNAs represent the largest proportion of sncRNA species expressed, followed by miRNA and snRNA (Fig. 3a). Of note, an appreciable number of tissue-specific piRNA sequences across all tissues analyzed increased the relative fraction of piRNAs for all tissues expressed (Fig. 3c,d). Thus, as parts of the small non-coding RNA transcriptome are significantly understudied, we provide this resource to the research community for studying sncRNA-related genetic and epigenetic regulation in cancer using the NCI-60 cell models.
Figure 3

sncRNA distribution by tissue type.

(a) Relative fraction of sncRNA species detected per tissue type. (b) Average fraction of currently annotated (blue) and novel unannotated (red) miRNA per tissue type. (c) Relative fraction of tissue-specific unique sncRNA sequences detected per tissue type. (d) Fraction of tissue-specific unique sncRNA species.
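
The "approximately 10%" figure for novel miRNAs can be checked against the pooled counts in the first data row of Table 2 (2,509 miRNAs and 288 novel miRNAs). The sketch below assumes that the "miRNA" column counts annotated miRNAs only and that this row pools all tissue types; neither is stated explicitly, so both readings of "increase" are computed. The function names are illustrative only.

```python
# Counts copied from the first data row of Table 2.
# Assumption: "miRNA" = annotated miRNAs, "novel miRNA" = unannotated candidates.
ANNOTATED_MIRNA = 2509
NOVEL_MIRNA = 288


def novel_fraction(annotated: int, novel: int) -> float:
    """Share of all detected miRNAs (annotated + novel) that are novel."""
    return novel / (annotated + novel)


def novel_increase(annotated: int, novel: int) -> float:
    """Relative increase in miRNA count over the annotated set alone."""
    return novel / annotated


print(f"{novel_fraction(ANNOTATED_MIRNA, NOVEL_MIRNA):.1%}")  # 10.3%
print(f"{novel_increase(ANNOTATED_MIRNA, NOVEL_MIRNA):.1%}")  # 11.5%
```

Under either reading the result is close to the 10% quoted in the text.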

Methods

Cell line and sequencing information

Cell line doubling times were obtained directly from the National Institutes of Health NCI (https://dtp.cancer.gov/discovery_development/nci-60/cell_list.htm), and year-of-origin information refers to the date of the first publication containing the cell line (Table 1 (available online only)). Cell lines were obtained directly from the National Cancer Institute (NCI) and were thawed and passaged exactly twice before total RNA was manually extracted from all cell lines by a phenol-chloroform protocol using Trizol reagent (Invitrogen, CA, USA). For each sample, 5,000 ng of extracted RNA was used as sequencing input. Sequencing was performed in accordance with The Cancer Genome Atlas miRNA sequencing protocol (described by Chu et al.[15]). Briefly, after ligation to adaptors, 15 cycles of PCR were performed for amplification (98 °C-15 s, 62 °C-30 s and 72 °C-15 s), followed by 5 min at 72 °C. Small RNA exclusion was performed using gel extraction on a 3% MetaPhor Agarose gel (Lonza Inc., Basel, Switzerland), selecting species shorter than 200 nucleotides in order to enrich for targets optimized at 22 nucleotides in length, and the selected fraction was subsequently ethanol-precipitated. Library quality was confirmed by analysis on the Agilent Bioanalyzer DNA1000 chip (Agilent Technologies). Small non-coding RNA sequencing was performed on the Illumina HiSeq2500 platform at the Michael Smith Genome Sciences Centre at the BC Cancer Research Centre, with 8 multiplexed libraries per sequencing lane (Table 3 (available online only), Fig. 1)[15,16]. Data resulting from small non-coding RNA sequencing can be found on the Sequence Read Archive [Data Citation 2].
Table 3

Sequencing quality metrics for sequenced cell lines

| Tissue | External ID | Sample name (.fastq) | Total reads | Total alignments | Aligned | Total unaligned | Unaligned | Total unique | Unique | Total non-unique | Non-unique | Coverage | Avg. coverage depth | Avg. length | Avg. quality | %GC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Breast | BT-549 | MX1381-C9C07ANXX-1-CACTGT | 26,779,757 | 20,087,593 | 74.84% | 6,738,932 | 25.16% | 19,994,122 | 74.66% | 46,703 | 0.17% | 0.62% | 23.36 | 22.33 | 33.02 | 47.70% |
| | HS-578T | MX1383-C9C07ANXX-3-GGGGTT | 24,961,809 | 20,875,168 | 83.42% | 4,137,590 | 16.58% | 20,773,358 | 83.22% | 50,861 | 0.20% | 0.35% | 41.95 | 22.13 | 34.19 | 45.13% |
| | MCF-7 | MX1384-C9C07ANXX-4-CTGGGT | 43,741,142 | 32,170,069 | 73.27% | 11,690,617 | 26.73% | 31,931,263 | 73.00% | 119,262 | 0.27% | 0.81% | 28.46 | 22.27 | 33.55 | 42.37% |
| | MDA-MB-231 | MX1384-C9C07ANXX-4-GCCGGT | 16,798,227 | 11,552,251 | 68.68% | 5,261,551 | 31.32% | 11,521,123 | 68.59% | 15,553 | 0.09% | 0.43% | 19.43 | 22.45 | 33.25 | 46.01% |
| | T-47D | MX1387-C9C07ANXX-7-CTAAGG | 14,269,847 | 9,004,918 | 63.04% | 5,274,372 | 36.96% | 8,986,041 | 62.97% | 9,434 | 0.07% | 0.48% | 13.58 | 22.4 | 32.16 | 47.83% |
| CNS | SF-268 | MX1386-C9C07ANXX-6-TAGTTG | 35,537,775 | 32,273,096 | 90.47% | 3,385,641 | 9.53% | 32,031,463 | 90.13% | 120,671 | 0.34% | 0.28% | 84.29 | 22.35 | 34.68 | 46.13% |
| | SF-295 | MX1386-C9C07ANXX-6-CCGGTG | 26,519,319 | 23,694,543 | 89.10% | 2,890,215 | 10.90% | 23,563,806 | 88.86% | 65,298 | 0.25% | 0.24% | 70.66 | 22.39 | 34.43 | 42.31% |
| | SF-539 | MX1386-C9C07ANXX-6-ATCGTG | 25,500,558 | 23,012,418 | 90.00% | 2,549,670 | 10.00% | 22,889,448 | 89.76% | 61,440 | 0.24% | 0.25% | 65.79 | 22.25 | 34.49 | 46.43% |
| | SNB-19 | MX1387-C9C07ANXX-7-GCGTGG | 32,717,872 | 21,567,743 | 65.75% | 11,204,285 | 34.25% | 21,459,531 | 65.59% | 54,056 | 0.17% | 0.85% | 18.3 | 22.44 | 32.24 | 43.60% |
| | SNB-75 | MX1387-C9C07ANXX-7-CATGGG | 12,680,815 | 7,882,308 | 62.10% | 4,805,816 | 37.90% | 7,867,693 | 62.04% | 7,306 | 0.06% | 0.37% | 15.13 | 22.2 | 32.05 | 45.20% |
| | U251 | MX1387-C9C07ANXX-7-ATTCCG | 20,845,308 | 13,006,365 | 62.30% | 7,858,915 | 37.70% | 12,966,435 | 62.20% | 19,958 | 0.10% | 0.58% | 16.29 | 22.47 | 32.17 | 43.76% |
| Colon | COLO 205 | MX1381-C9C07ANXX-1-TCAAGT | 18,351,752 | 13,907,298 | 75.66% | 4,466,918 | 24.34% | 13,862,401 | 75.54% | 22,433 | 0.12% | 0.44% | 22.77 | 22.23 | 33.19 | 45.58% |
| | HCC2998 | MX1382-C9C07ANXX-2-GTAGCC | 22,418,943 | 16,741,250 | 74.53% | 5,710,380 | 25.47% | 16,675,921 | 74.38% | 32,642 | 0.15% | 0.88% | 13.4 | 21.83 | 32.83 | 47.12% |
| | HCT-116 | MX1382-C9C07ANXX-2-TACAAG | 19,321,624 | 13,822,479 | 71.42% | 5,521,561 | 28.58% | 13,777,662 | 71.31% | 22,401 | 0.12% | 0.54% | 18.43 | 22.23 | 33.06 | 46.02% |
| | HCT-15 | MX1382-C9C07ANXX-2-ATGTTT | 22,372,577 | 15,816,346 | 70.57% | 6,585,278 | 29.43% | 15,758,281 | 70.44% | 29,018 | 0.13% | 0.62% | 18.15 | 22.24 | 32.92 | 46.00% |
| | HT-29 | MX1383-C9C07ANXX-3-CAAGTT | 43,295,013 | 36,704,351 | 84.42% | 6,746,834 | 15.58% | 36,392,459 | 84.06% | 155,720 | 0.36% | 0.64% | 40.79 | 22.23 | 34.19 | 45.58% |
| | KM12 | MX1383-C9C07ANXX-3-GTCCTT | 21,097,776 | 17,022,531 | 80.52% | 4,108,873 | 19.48% | 16,955,321 | 80.37% | 33,582 | 0.16% | 0.41% | 29.68 | 22.35 | 34.03 | 46.33% |
| | SW-620 | MX1387-C9C07ANXX-7-TTGCGG | 34,094,755 | 22,349,352 | 65.38% | 11,803,916 | 34.62% | 22,232,420 | 65.21% | 58,419 | 0.17% | 0.92% | 17.57 | 22.46 | 32.21 | 46.44% |
| Leukemia | CCRF-CEM | MX1381-C9C07ANXX-1-GATCTG | 16,675,728 | 12,545,480 | 75.12% | 4,148,470 | 24.88% | 12,509,054 | 75.01% | 18,204 | 0.11% | 0.40% | 22.55 | 22.48 | 33.4 | 49.05% |
| | HL-60(TB) | MX1382-C9C07ANXX-2-TGCTTT | 14,448,374 | 9,588,877 | 66.29% | 4,870,207 | 33.71% | 9,567,467 | 66.22% | 10,700 | 0.07% | 0.51% | 13.5 | 22.12 | 32.72 | 49.74% |
| | K-562 | MX1383-C9C07ANXX-3-TCGCTT | 35,495,593 | 28,500,403 | 80.03% | 7,089,711 | 19.97% | 28,311,603 | 79.76% | 94,279 | 0.27% | 0.60% | 34.46 | 22.43 | 34.08 | 48.92% |
| | MOLT-4 | MX1384-C9C07ANXX-4-GAGAGT | 17,384,261 | 12,413,016 | 71.30% | 4,989,141 | 28.70% | 12,377,237 | 71.20% | 17,883 | 0.10% | 0.49% | 18.25 | 22.49 | 33.61 | 48.27% |
| | RPMI 8226 | MX1385-C9C07ANXX-5-AGGAAT | 17,460,565 | 14,241,842 | 81.43% | 3,242,340 | 18.57% | 14,194,637 | 81.30% | 23,588 | 0.14% | 0.46% | 22.37 | 22.47 | 34.06 | 48.91% |
| | SR | MX1387-C9C07ANXX-7-CCACTC | 12,064,850 | 6,751,739 | 55.92% | 5,318,416 | 44.08% | 6,741,131 | 55.87% | 5,303 | 0.04% | 0.48% | 10.23 | 22.48 | 31.62 | 47.70% |
| Melanoma | LOX-IMVI | MX1383-C9C07ANXX-3-CCTATT | 31,657,831 | 25,924,121 | 81.64% | 5,812,011 | 18.36% | 25,767,653 | 81.39% | 78,167 | 0.25% | 0.47% | 39.9 | 22.38 | 34.1 | 45.79% |
| | M14 | MX1383-C9C07ANXX-3-GTTTGT | 16,456,810 | 13,115,289 | 79.57% | 3,361,434 | 20.43% | 13,075,479 | 79.45% | 19,897 | 0.12% | 0.32% | 29.6 | 22.45 | 34.08 | 47.29% |
| | MALME-3M | MX1383-C9C07ANXX-3-AGATGT | 97,640 | 42,454 | 43.48% | 55,187 | 56.52% | 42,452 | 43.48% | 1 | 0% | 0.01% | 4.85 | 22.33 | 29.76 | 46.02% |
| | MDA-MB-435 | MX1384-C9C07ANXX-4-TATCGT | 16,462,112 | 11,741,084 | 71.22% | 4,736,983 | 28.78% | 11,709,183 | 71.13% | 15,946 | 0.10% | 0.50% | 16.9 | 22.36 | 33.4 | 47.73% |
| | SK-MEL-2 | MX1386-C9C07ANXX-6-TGAGTG | 30,951,870 | 27,402,647 | 88.25% | 3,636,213 | 11.75% | 27,228,840 | 87.97% | 86,817 | 0.28% | 0.32% | 61.56 | 22.51 | 34.52 | 46.50% |
| | SK-MEL-28 | MX1386-C9C07ANXX-6-CGCCTG | 14,712,878 | 12,743,244 | 86.48% | 1,988,450 | 13.52% | 12,705,638 | 86.36% | 18,790 | 0.13% | 0.22% | 41.57 | 22.26 | 34.24 | 46.57% |
| | SK-MEL-5 | MX1386-C9C07ANXX-6-GCCATG | 12,394,333 | 10,855,860 | 87.48% | 1,552,173 | 12.52% | 10,828,474 | 87.37% | 13,686 | 0.11% | 0.19% | 41.26 | 22.5 | 34.4 | 46.38% |
| | UACC-257 | MX1388-C9C07ANXX-8-AGCTAG | 46,062 | 9,213 | 20.00% | 36,849 | 80.00% | 9,213 | 20.00% | 0 | 0% | 0% | 2.13 | 22.17 | 26.55 | 46.63% |
| | UACC-62 | MX1388-C9C07ANXX-8-GTATAG | 22,270,501 | 15,834,593 | 70.97% | 6,465,114 | 29.03% | 15,776,225 | 70.84% | 29,162 | 0.13% | 0.62% | 18.26 | 22.24 | 32.7 | 46.13% |
| NSCLC | A549 | MX1381-C9C07ANXX-1-GCCTAA | 24,542,337 | 18,533,593 | 75.35% | 6,048,721 | 24.65% | 18,453,696 | 75.19% | 39,920 | 0.16% | 0.53% | 25.18 | 22.39 | 33.18 | 45.04% |
| | EKVX | MX1382-C9C07ANXX-2-AAGCTA | 18,358,000 | 13,056,340 | 71.01% | 5,321,501 | 28.99% | 13,016,669 | 70.90% | 19,830 | 0.11% | 0.46% | 20.08 | 22.13 | 32.99 | 45.71% |
| | HOP 62 | MX1382-C9C07ANXX-2-GCATTT | 22,874,592 | 16,227,434 | 70.81% | 6,678,209 | 29.19% | 16,165,375 | 70.67% | 31,008 | 0.14% | 0.58% | 19.83 | 22.18 | 32.86 | 45.15% |
| | HOP 92 | MX1382-C9C07ANXX-2-CGTACG | 16,671,005 | 11,307,213 | 67.74% | 5,378,661 | 32.26% | 11,277,492 | 67.65% | 14,852 | 0.09% | 0.49% | 16.63 | 22.16 | 32.7 | 45.80% |
| | NCI-H226 | MX1384-C9C07ANXX-4-TCTTCT | 18,745,702 | 12,999,215 | 69.24% | 5,766,399 | 30.76% | 12,959,420 | 69.13% | 19,883 | 0.11% | 0.52% | 17.92 | 22.38 | 33.28 | 47.25% |
| | NCI-H23 | MX1384-C9C07ANXX-4-CTATCT | 19,234,429 | 13,677,426 | 71.00% | 5,578,694 | 29.00% | 13,634,066 | 70.88% | 21,669 | 0.11% | 0.51% | 19.47 | 22.44 | 33.36 | 48.52% |
| | NCI-H322M | MX1384-C9C07ANXX-4-GATGCT | 40,017,514 | 30,184,128 | 75.16% | 9,938,560 | 24.84% | 29,974,031 | 74.90% | 104,923 | 0.26% | 0.95% | 22.98 | 22.47 | 33.58 | 45.76% |
| | NCI-H460 | MX1385-C9C07ANXX-5-AGCGCT | 12,712,247 | 9,871,928 | 77.57% | 2,851,889 | 22.43% | 9,848,794 | 77.47% | 11,564 | 0.09% | 0.32% | 22.39 | 22.38 | 33.72 | 47.50% |
| | NCI-H522 | MX1385-C9C07ANXX-5-CGGCCT | 37,514,479 | 30,982,049 | 82.29% | 6,643,642 | 17.71% | 30,759,900 | 81.99% | 110,937 | 0.30% | 0.55% | 40.76 | 22.53 | 34.02 | 44.85% |
| Ovarian | IGR-OV1 | MX1383-C9C07ANXX-3-AGTCTT | 28,718,340 | 23,886,771 | 82.95% | 4,897,725 | 17.05% | 23,754,573 | 82.72% | 66,042 | 0.23% | 0.42% | 41.25 | 22.41 | 34.16 | 46.10% |
| | NCI/ADR-RES | MX1384-C9C07ANXX-4-ATCAGT | 22,291,994 | 16,348,364 | 73.20% | 5,974,630 | 26.80% | 16,286,408 | 73.06% | 30,956 | 0.14% | 0.54% | 21.99 | 22.37 | 33.49 | 47.02% |
| | OVCAR-3 | MX1385-C9C07ANXX-5-AATTAT | 29,981,152 | 25,036,208 | 83.26% | 5,017,926 | 16.74% | 24,890,371 | 83.02% | 72,855 | 0.24% | 0.46% | 38.94 | 22.41 | 34.16 | 47.68% |
| | OVCAR-4 | MX1385-C9C07ANXX-5-CCGTAT | 28,168,179 | 22,957,608 | 81.29% | 5,271,643 | 18.71% | 22,835,575 | 81.07% | 60,961 | 0.22% | 0.43% | 38.31 | 22.39 | 34.02 | 47.21% |
| | OVCAR-5 | MX1385-C9C07ANXX-5-TAGGAT | 24,989,662 | 20,898,933 | 83.43% | 4,141,256 | 16.57% | 20,797,961 | 83.23% | 50,445 | 0.20% | 0.40% | 37.13 | 22.3 | 34.22 | 45.64% |
| | OVCAR-8 | MX1385-C9C07ANXX-5-ATAGAT | 21,340,411 | 17,484,019 | 81.76% | 3,891,733 | 18.24% | 17,413,389 | 81.60% | 35,289 | 0.17% | 0.35% | 36.62 | 22.44 | 34.22 | 46.25% |
| | SK-OV-3 | MX1386-C9C07ANXX-6-AAAATG | 35,063,352 | 31,836,638 | 90.46% | 3,344,843 | 9.54% | 31,600,666 | 90.12% | 117,843 | 0.34% | 0.30% | 76.21 | 22.49 | 34.54 | 46.04% |
| Prostate | DU-145 | MX1382-C9C07ANXX-2-CTGATC | 42,719,250 | 30,543,575 | 71.24% | 12,284,018 | 28.76% | 30,327,141 | 70.99% | 108,091 | 0.25% | 1.01% | 21.68 | 22.17 | 32.93 | 42.80% |
| | PC-3 | MX1385-C9C07ANXX-5-GCTCAT | 28,554,855 | 23,674,991 | 82.68% | 4,944,659 | 17.32% | 23,545,544 | 82.46% | 64,652 | 0.23% | 0.44% | 38.54 | 22.42 | 34.15 | 46.08% |
| Renal | 786-0 | MX1381-C9C07ANXX-1-CGTGAT | 28,239,458 | 21,417,620 | 75.65% | 6,875,229 | 24.35% | 21,310,920 | 75.47% | 53,309 | 0.19% | 0.53% | 28.79 | 22.32 | 33.21 | 45.18% |
| | A498 | MX1381-C9C07ANXX-1-ACATCG | 23,983,341 | 18,160,244 | 75.56% | 5,861,378 | 24.44% | 18,083,724 | 75.40% | 38,239 | 0.16% | 0.49% | 26.57 | 22.26 | 33.14 | 44.19% |
| | ACHN | MX1381-C9C07ANXX-1-TGGTCA | 26,253,335 | 20,447,816 | 77.70% | 5,854,225 | 22.30% | 20,350,495 | 77.52% | 48,615 | 0.19% | 0.50% | 29.38 | 22.42 | 33.4 | 44.66% |
| | CAKI-1 | MX1381-C9C07ANXX-1-ATTGGC | 20,365,416 | 15,568,695 | 76.31% | 4,825,051 | 23.69% | 15,512,066 | 76.17% | 28,299 | 0.14% | 0.38% | 29.39 | 22.37 | 33.42 | 44.53% |
| | RXF393 | MX1386-C9C07ANXX-6-CTTTTG | 29,863,988 | 26,677,423 | 89.05% | 3,269,280 | 10.95% | 26,512,156 | 88.78% | 82,552 | 0.28% | 0.34% | 57.33 | 22.42 | 34.42 | 48.85% |
| | SN12C | MX1387-C9C07ANXX-7-TGTTGG | 23,377,833 | 15,428,844 | 65.88% | 7,976,649 | 34.12% | 15,373,563 | 65.76% | 27,621 | 0.12% | 0.67% | 16.45 | 22.28 | 32.21 | 45.57% |
| | TK-10 | MX1387-C9C07ANXX-7-TTCTCG | 26,774,556 | 17,534,744 | 65.36% | 9,275,498 | 34.64% | 17,463,407 | 65.22% | 35,651 | 0.13% | 1.16% | 10.66 | 21.92 | 31.69 | 49.55% |
| | UO-31 | MX1388-C9C07ANXX-8-TCTGAG | 12,737,055 | 8,368,476 | 65.64% | 4,376,740 | 34.36% | 8,352,163 | 65.57% | 8,152 | 0.06% | 0.34% | 17.82 | 22.44 | 32.53 | 45.42% |
| MEAN | | | 23,457,233 | 18,073,055 | 74.34% | 5,429,370 | 25.66% | 17,982,756 | 74.18% | 45,107 | 0.16% | 0.49% | 29 | 22.34 | 33.25 | 46.26% |
| Standard Deviation | | | 9,294,225 | 7,957,399 | 11.69% | 2,519,478 | 11.69% | 7,888,112 | 11.63% | 35,871 | 0.08% | 0.22% | 17 | 0.14 | 1.28 | 1.64% |

Pre-processing and small non-coding RNA species detection

Small-RNA sequencing data were analyzed according to published protocols[17]. In order to extract information for the sncRNA species of interest, unaligned reads (in FASTQ format) were trimmed for adaptors (Cutadapt v1.7.1) and by sequencing quality (‘trim bases’ from Partek Flow v6.0.17.0614) to reach a Phred quality score ≥20 (Fig. 4a–d). FASTQ files were then aligned to the human genome (hg38) using the Spliced Transcripts Alignment to a Reference (STAR v2.4.1d) aligner[18]. Quantification algorithms (featureCounts v1.4.6 (ref. 19)) were applied using chromosomal location annotations for known miRNA (miRBase v.21 (ref. 20)), piRNA (piRNAbank v.2 (ref. 21)), snoRNA (Ensembl v.84 (ref. 22)), and snRNA (Ensembl v.84 (ref. 22)) loci[12]. Detection of novel miRNAs was performed using the miRDeep2 algorithm (v2.0.0.5), which considers the relative free energy of miRNAs and their random folding P-values[13]. Chromosomal positions of expressed small RNAs were plotted against an hg38 karyotype obtained from the UCSC Genome Browser (Fig. 2). Following the OASIS sncRNA software (v2.0) recommendation, sncRNA species were considered expressed if their reads, summed across all samples considered, totalled ≥5[12]. Data resulting from species quantification can be found in Data Citation 1.
Figure 4

Sequencing and mapping quality.

(a) Phred quality score per sncRNA base position. (b) Genome-wide read depth (column) and genome coverage (line) per sample. (c) Fraction of sequencing reads per Phred score. (d) Percentage of total reads aligned (unique: green, unaligned: red).
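
The expression criterion described above (reads summed across all samples considered must total at least 5) can be sketched as a simple filter over a count table. This is a minimal illustration, not the OASIS implementation; the species names and counts below are hypothetical.

```python
from typing import Dict, List

MIN_TOTAL_READS = 5  # threshold quoted in the text


def expressed_species(
    counts: Dict[str, List[int]], min_total: int = MIN_TOTAL_READS
) -> Dict[str, List[int]]:
    """Keep species whose reads, summed over all samples, reach min_total."""
    return {
        name: reads for name, reads in counts.items() if sum(reads) >= min_total
    }


# Hypothetical per-sample read counts for three sncRNA species.
toy_counts = {
    "sncRNA_A": [3, 0, 2],   # total 5  -> kept (threshold is inclusive)
    "sncRNA_B": [1, 1, 0],   # total 2  -> dropped
    "sncRNA_C": [0, 0, 40],  # total 40 -> kept, even though one sample drives it
}

print(sorted(expressed_species(toy_counts)))  # ['sncRNA_A', 'sncRNA_C']
```

Because the threshold applies to the summed reads, a species detected in a single sample can still pass, which is worth keeping in mind when interpreting tissue-specific detections.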

Normalization and quantification

Raw reads were scaled/normalized using the reads per kilobase of exon per million mapped reads (RPKM) method[23], and expression correlation matrices were created using Pearson scores, with unsupervised hierarchical clustering performed using one-minus-Pearson correlation scores (Fig. 5). For validation of sncRNA expression, we then correlated miRNA species present in two published microarray cohorts of the NCI-60 cell lines. For the 50 (of the 59) cell lines also present in the Sanger Cell Line Database[24] (http://www.cancerrxgene.org/translation/CellLine), raw reads from each unique sequence were correlated with the expression of that sequence previously detected by microarray, using rank-normalized Spearman's correlations (Table 4 (available online only)); a similar analysis was performed against all cell lines present in the cohort described by Sokilde et al.[5].
Figure 5
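
As a minimal sketch of the normalization and distance measure named above, RPKM divides a feature's read count by its length in kilobases and by the sample's mapped reads in millions, and the clustering distance between two expression profiles is one minus their Pearson correlation. The input values are hypothetical, and the resulting distances would still need to be passed to a hierarchical clustering routine, which is not shown.

```python
import math
from typing import Sequence


def rpkm(count: float, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads for one feature."""
    return count * 1e9 / (feature_length_bp * total_mapped_reads)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def one_minus_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(x, y)


# Hypothetical example: 220 reads on a 22-nt feature, 10 million mapped reads.
print(rpkm(220, 22, 10_000_000))  # 1000.0
```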

Data Records

Raw unaligned sequencing reads (in FASTQ file format) are available through the Sequence Read Archive (Data Citation 2). Raw sequencing file names (in FASTQ format) are listed in Table 3 (available online only). A summary of raw sequencing reads for each detected small RNA species is available through Figshare (Data Citation 1).

Technical Validation

High-throughput sequencing allows for direct in-depth analyses of the human genome, recently revealing a critical role for the expression of the non-coding transcriptome in both genetic and epigenetic regulatory processes.

Sequencing quality control

We examined only high-confidence reads from miRNA sequencing. Samples were sequenced to an average depth of 22.34±0.14 (mean±s.d.; Table 3 (available online only), Fig. 4b). To ensure that only high-quality sequencing reads were called, we filtered detected reads to include only those with Phred scores ≥20. On average, samples had a Phred score of 33.24±1.28 (Table 3 (available online only), Fig. 4c). Additionally, reads for each sample had an average GC content of 46.26±1.6% (Table 3 (available online only)). Unsupervised hierarchical clustering and similarity (one-minus-Spearman correlation) of normalized reads revealed relative similarity of sncRNA expression profiles across all cell lines and tissue types analyzed (Fig. 5).
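
For readers less familiar with Phred scores: a score Q corresponds to a base-call error probability of 10^(-Q/10), so the Q≥20 cut-off means at most a 1% chance of error per base. The sketch below shows that conversion together with a simplified 3'-end quality trim. It assumes Phred+33 encoded FASTQ quality strings and is not the Partek Flow ‘trim bases’ algorithm; the read and quality string are hypothetical.

```python
from typing import List, Tuple


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_low_quality_3prime(
    sequence: str, quality_string: str, min_q: int = 20
) -> Tuple[str, str]:
    """Drop bases from the 3' end until the last base has quality >= min_q."""
    scores = decode_phred33(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical 12-nt read: ten Q40 bases ('I') followed by Q2 ('#') and Q0 ('!').
read, qual = trim_low_quality_3prime("TGAGGTAGTAGG", "IIIIIIIIII#!")
print(read)  # TGAGGTAGTA
```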

miRNA detection validation

In order to validate the detection of the sncRNA species in these cell lines, we correlated the raw reads per miRNA detected with the corresponding miRNA detected by microarray[24,25]. This analysis was performed for the 50 NCI-60 cell lines present in the Sanger Cell Line miRNA Normalized Data from the Broad Institute (http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi; File name: Sanger_miR_data1.pn.cn.matlab2.res). Using Spearman’s rank-order correlation, we analyzed the correlation of this RMA-normalized miRNA expression with reads obtained from sequencing this cell line panel. Expression of miRNAs in all lines analyzed correlated significantly between sequencing and microarray analysis (Table 4 (available online only); P-values <0.0001, mean r=0.67). Similarly, we correlated sequencing-detected miRNA expression against a complete NCI-60 microarray cohort described by Sokilde et al.[5]. In that study, profiling was performed on the LNA-enhanced miRCURY Dx 9.2 microarray platform, and data were log2-normalized after pre-processing (Table 4; P-value range <0.0001–0.0647, mean r=0.28). Microarray data from multiple platforms were compared with the sequencing data presented here in order to de-emphasize platform bias and illustrate the need for comprehensive profiling when considering small RNA expression[26].
Table 4

Spearman correlation of sequenced NCI-60 cell lines to published miRNA microarray expression levels

| Cell line | Broad: r | Broad: 95% confidence interval | Broad: p (two-tailed) | Sokilde et al.: r | Sokilde et al.: 95% confidence interval | Sokilde et al.: p (two-tailed) |
|---|---|---|---|---|---|---|
| 786-0 | 0.7064 | 0.651–0.7543 | <0.0001 | 0.3252 | 0.1976–0.4421 | <0.0001 |
| A498 | n/a | n/a | n/a | 0.2644 | 0.1327–0.3869 | <0.0001 |
| A549 | 0.6809 | 0.6218–0.7324 | <0.0001 | 0.1815 | 0.04619–0.3103 | 0.0071 |
| ACHN | 0.6959 | 0.6389–0.7453 | <0.0001 | 0.27 | 0.1387–0.392 | <0.0001 |
| BT-549 | n/a | n/a | n/a | 0.3202 | 0.1922–0.4376 | <0.0001 |
| CAKI-1 | 0.7004 | 0.6441–0.7492 | <0.0001 | 0.1604 | 0.02453–0.2905 | 0.0175 |
| CCRF-CEM | n/a | n/a | n/a | 0.1251 | -0.01157–0.2571 | 0.0647 |
| COLO-205 | 0.6766 | 0.6169–0.7287 | <0.0001 | 0.2974 | 0.1677–0.4169 | <0.0001 |
| DU-145 | 0.726 | 0.6735–0.7711 | <0.0001 | 0.3293 | 0.2019–0.4457 | <0.0001 |
| EKVX | 0.6248 | 0.558–0.6836 | <0.0001 | 0.2634 | 0.1317–0.386 | <0.0001 |
| HCC2998 | 0.6512 | 0.5879–0.7066 | <0.0001 | 0.3391 | 0.2125–0.4545 | <0.0001 |
| HCT-116 | 0.7019 | 0.6458–0.7505 | <0.0001 | 0.3393 | 0.2127–0.4547 | <0.0001 |
| HCT-15 | 0.7193 | 0.6658–0.7654 | <0.0001 | 0.2704 | 0.1391–0.3924 | <0.0001 |
| HL-60 | 0.6358 | 0.5704–0.6932 | <0.0001 | 0.2533 | 0.1211–0.3767 | 0.0002 |
| HOP-62 | 0.6899 | 0.632–0.7401 | <0.0001 | 0.2566 | 0.1245–0.3798 | 0.0001 |
| HOP-92 | 0.6576 | 0.5951–0.7121 | <0.0001 | 0.2288 | 0.09537–0.3542 | 0.0006 |
| HS-578-T | 0.6283 | 0.5619–0.6867 | <0.0001 | 0.2857 | 0.1554–0.4063 | <0.0001 |
| HT-29 | 0.6956 | 0.6386–0.7451 | <0.0001 | 0.2875 | 0.1572–0.4079 | <0.0001 |
| IGR-OV1 | 0.6986 | 0.642–0.7476 | <0.0001 | 0.3065 | 0.1774–0.4251 | <0.0001 |
| K-562 | n/a | n/a | n/a | 0.2657 | 0.1341–0.3881 | <0.0001 |
| KM12 | n/a | n/a | n/a | 0.3265 | 0.1989–0.4432 | <0.0001 |
| LOX-IMVI | 0.6699 | 0.6091–0.7228 | <0.0001 | 0.2637 | 0.132–0.3862 | <0.0001 |
| M14 | 0.6938 | 0.6365–0.7435 | <0.0001 | 0.2585 | 0.1265–0.3815 | 0.0001 |
| MALME-3M | 0.5848 | 0.513–0.6485 | <0.0001 | 0.3069 | 0.1779–0.4256 | <0.0001 |
| MCF-7 | 0.7235 | 0.6707–0.769 | <0.0001 | 0.3271 | 0.1996–0.4438 | <0.0001 |
| MDA-MB-231 | 0.7125 | 0.658–0.7596 | <0.0001 | 0.1837 | 0.04847–0.3123 | 0.0064 |
| MDA-MB-435 | 0.7081 | 0.653–0.7558 | <0.0001 | 0.2907 | 0.1607–0.4109 | <0.0001 |
| MOLT-4 | 0.7034 | 0.6476–0.7518 | <0.0001 | 0.2809 | 0.1502–0.402 | <0.0001 |
| NCI/ADR-RES | n/a | n/a | n/a | 0.3046 | 0.1754–0.4234 | <0.0001 |
| NCI-H226 | 0.6901 | 0.6323–0.7403 | <0.0001 | 0.3613 | 0.2365–0.4743 | <0.0001 |
| NCI-H23 | 0.631 | 0.565–0.689 | <0.0001 | 0.2649 | 0.1332–0.3873 | <0.0001 |
| NCI-H322M | n/a | n/a | n/a | 0.3624 | 0.2378–0.4754 | <0.0001 |
| NCI-H460 | 0.6395 | 0.5746–0.6964 | <0.0001 | 0.2794 | 0.1486–0.4006 | <0.0001 |
| NCI-H522 | 0.7047 | 0.649–0.7529 | <0.0001 | 0.2856 | 0.1552–0.4062 | <0.0001 |
| OVCAR-3 | 0.6951 | 0.638–0.7446 | <0.0001 | 0.3629 | 0.2383–0.4758 | <0.0001 |
| OVCAR-4 | 0.6731 | 0.6128–0.7256 | <0.0001 | 0.3422 | 0.2158–0.4573 | <0.0001 |
| OVCAR-5 | 0.7101 | 0.6552–0.7575 | <0.0001 | 0.2952 | 0.1655–0.415 | <0.0001 |
| OVCAR-8 | 0.6613 | 0.5994–0.7154 | <0.0001 | 0.3407 | 0.2142–0.456 | <0.0001 |
| PC-3 | 0.6373 | 0.5721–0.6945 | <0.0001 | 0.2893 | 0.1592–0.4096 | <0.0001 |
| RPMI-8226 | 0.6704 | 0.6098–0.7233 | <0.0001 | 0.2641 | 0.1325–0.3866 | <0.0001 |
| RXF393 | 0.5656 | 0.4915–0.6316 | <0.0001 | 0.3085 | 0.1797–0.427 | <0.0001 |
| SF-268 | 0.6788 | 0.6193–0.7305 | <0.0001 | 0.2812 | 0.1505–0.4022 | <0.0001 |
| SF-295 | 0.6831 | 0.6243–0.7343 | <0.0001 | 0.2986 | 0.1691–0.418 | <0.0001 |
| SF-539 | 0.6623 | 0.6005–0.7162 | <0.0001 | 0.3037 | 0.1745–0.4227 | <0.0001 |
| SK-MEL-2 | 0.6294 | 0.5632–0.6876 | <0.0001 | 0.2851 | 0.1546–0.4057 | <0.0001 |
| SK-MEL-28 | 0.6765 | 0.6167–0.7286 | <0.0001 | 0.2867 | 0.1564–0.4072 | <0.0001 |
| SK-MEL-5 | 0.6095 | 0.5408–0.6702 | <0.0001 | 0.3681 | 0.2439–0.4804 | <0.0001 |
| SK-OV-3 | 0.658 | 0.5956–0.7125 | <0.0001 | 0.2613 | 0.1295–0.384 | <0.0001 |
| SN12C | 0.7182 | 0.6645–0.7644 | <0.0001 | 0.321 | 0.193–0.4382 | <0.0001 |
| SNB-19 | 0.6126 | 0.5441–0.6729 | <0.0001 | 0.2332 | 0.09998–0.3583 | 0.0005 |
| SNB-75 | n/a | n/a | n/a | 0.294 | 0.1642–0.4139 | <0.0001 |
| SR | n/a | n/a | n/a | 0.1599 | 0.02397–0.29 | 0.0179 |
| SW-620 | 0.7124 | 0.6578–0.7595 | <0.0001 | 0.3805 | 0.2575–0.4915 | <0.0001 |
| T-47D | 0.6827 | 0.6238–0.7339 | <0.0001 | 0.3928 | 0.2709–0.5023 | <0.0001 |
| TK-10 | 0.7137 | 0.6593–0.7606 | <0.0001 | 0.1981 | 0.06341–0.3258 | 0.0032 |
| U-251 | 0.6242 | 0.5573–0.6831 | <0.0001 | 0.2862 | 0.1558–0.4067 | <0.0001 |
| UACC-257 | 0.4873 | 0.4049–0.5618 | <0.0001 | 0.2332 | 0.09994–0.3583 | 0.0005 |
| UACC-62 | 0.676 | 0.6161–0.7281 | <0.0001 | 0.3074 | 0.1785–0.426 | <0.0001 |
| UO-31 | n/a | n/a | n/a | 0.2975 | 0.1672–0.4175 | <0.0001 |

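
The per-cell-line values in Table 4 are Spearman rank correlations between sequencing reads and microarray signal for the same miRNAs. As a minimal sketch of that statistic (not the authors' analysis code), Spearman's coefficient can be computed as the Pearson correlation of the two rank vectors, with tied values given their average rank. The miRNA read counts and array intensities below are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five miRNAs in one cell line.
sequencing_reads = [1500, 30, 220, 0, 75]
microarray_signal = [11.2, 6.1, 9.8, 5.0, 7.4]
print(round(spearman(sequencing_reads, microarray_signal), 4))  # 1.0
```

Because only ranks enter the calculation, raw read counts and log-scale array intensities can be compared directly without putting them on a common scale first.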

Additional information

How to cite this article: Marshall, E. A. et al. Small non-coding RNA transcriptome of the NCI-60 cell line panel. Sci. Data 4:170157 doi: 10.1038/sdata.2017.157 (2017). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
