Molecular subtyping of leiomyosarcoma with 3' end RNA sequencing.

Xiangqian Guo, Erna Forgó, Matt van de Rijn.

Abstract

Leiomyosarcoma (LMS) is a malignant neoplasm with smooth muscle differentiation. Little is known about its molecular heterogeneity, and no targeted therapy currently exists for LMS. We performed expression profiling on 99 cases of LMS with 3' end RNA sequencing (3SEQ) and demonstrated the existence of 3 molecular subtypes in this cohort. We subsequently showed that these molecular subtypes are reproducible using an independent cohort of 82 LMS cases from TCGA. Two new formalin-fixed, paraffin-embedded (FFPE) tissue-compatible diagnostic immunohistochemical markers were identified for two of the three subtypes: LMOD1 for subtype I LMS and ARL4C for subtype II LMS. Subtype I and subtype II LMS were associated with good and poor prognosis, respectively. Here, we describe the details of LMS diagnosis, RNA isolation, 3SEQ library construction, 3SEQ sequencing data analysis and molecular subtype determination. The 3SEQ data produced in this study were deposited into Gene Expression Omnibus (GEO) under GSE45510.

Keywords:  3’ end RNA sequencing; Leiomyosarcoma; expression profiling; subtypes

Year:  2015        PMID: 26240788      PMCID: PMC4521214          DOI: 10.1016/j.gdata.2015.06.029

Source DB:  PubMed          Journal:  Genom Data        ISSN: 2213-5960


Direct link to deposited data

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE45510.

Experimental design, materials and methods

Experimental design

To explore the molecular subtypes of leiomyosarcoma (LMS), paraffin blocks of 99 LMS cases dating from 1991 to 2012 were collected from nine hospitals (Stanford Hospital, Brigham and Women's Hospital, McKay-Dee Hospital Center, St. Luke's Hospital, Baptist Health Medical Center, Ingalls Hospital, Vancouver General Hospital, Hospital de la Santa Creu i Sant Pau and Alta Bates Summit Medical Center) with IRB approval and a waiver of consent due to the archival nature of the specimens. Total RNA was extracted from these cases and subsequently analyzed by 3′ end RNA sequencing. Consensus Clustering was used to determine the optimal number of subtypes, and Silhouette analysis was then performed to measure the confidence of subtype assignment per case. To test the reproducibility of molecular subtype classification from the 3SEQ dataset, the expression profiles (RNASeq data) of 82 additional LMS cases were downloaded from The Cancer Genome Atlas (TCGA) database and analyzed in the same way as the 3SEQ data. Subclass mapping was then used to find the common subtypes identified in both datasets [1].

Materials

The LMS cases used in this study were formalin-fixed, paraffin-embedded (FFPE) tissues.

3SEQ library construction

After LMS FFPE blocks were obtained, two experienced pathologists (one from Stanford University and one from Brigham and Women's Hospital) assessed and circled the regions composed of LMS tumor cells. Samples with a paucity of material or poor preservation of material were excluded. Multiple 2 mm-diameter cores from the circled areas were re-embedded longitudinally into new paraffin blocks, then sectioned and re-evaluated by H&E staining to ensure the purity of the samples. Only cores with ≥ 90% tumor cells were processed for subsequent RNA extraction. The total RNA of the LMS cases was extracted using the RecoverAll™ Total Nucleic Acid Isolation kit (Ambion, Cat # 1975). The quality of the total RNA was assessed by agarose gel electrophoresis and used to determine how long the total RNA needed to be sheared by heat in first strand buffer (Invitrogen, Cat # 18080-044) for subsequent 3SEQ library construction. The 3SEQ library construction included the following steps: first strand cDNA synthesis with Superscript III Reverse Transcriptase (Invitrogen, Cat # 18080-044), second strand cDNA synthesis with E. coli DNA ligase (Invitrogen, Cat # 18052-019) and E. coli DNA polymerase I (New England Biolabs, Cat # M0209L), followed by the addition of ‘A’ to the 3′ end of double strand DNA fragments with Klenow fragment (3′ to 5′ exo minus, New England Biolabs, Cat # M0212L), ligation of adapters (Illumina, Cat # 1001782, OLIGO MIX) with DNA ligase (New England Biolabs, Cat # M2200L) and PCR amplification using 2 × Phusion PCR master mix (New England Biolabs, Cat # F-531S) [1], [2], [3], [4], [5], [6]. The detailed protocol of 3SEQ library construction is available at http://med.stanford.edu/labs/vanderijn-west/documents/3endRNAseqlibraryconstruction_update_11_4_2014.doc.
The 3SEQ libraries were sent to the Stanford Center for Genomics and Personalized Medicine to be sequenced directionally (36 bp) from the 5′ end of mRNA fragments towards their poly(A) ends using Illumina GA IIx and HiSeq 2000 machines (Illumina, Inc., San Diego, CA, USA). The gene expression profiling data (3SEQ) have been deposited in the Gene Expression Omnibus (GEO) and are publicly accessible through GSE45510.

3SEQ data analysis

Sequence reads (fastq format) that had already passed an initial read-quality filter were filtered again with the FASTX-Toolkit (fastx_artifacts_filter -v -Q 33, http://hannonlab.cshl.edu/fastx_toolkit/index.html) and mapped to the transcriptome (refMrna, downloaded from the UCSC genome browser, http://www.genome.ucsc.edu/) using SOAP2, allowing at most two mismatches [7]. The total number of sequence reads mapped to each gene symbol was determined and used to create the gene-expression profile matrix (22,144 genes). Read counts from each library were normalized to transcripts per million reads (TPM). A custom Perl script was used to run the 3SEQ data processing and is publicly available [3].
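
The normalization step can be illustrated with a minimal Python sketch. It is not the authors' Perl script. It assumes that TPM here means scaling each library's per-gene read counts so that they sum to one million, with no transcript-length correction; the function name, sample names and counts are hypothetical.

```python
def normalize_to_tpm(counts_by_library):
    """Scale each library's per-gene read counts so they sum to one million.

    Assumption: no transcript-length correction is applied.
    """
    normalized = {}
    for library, gene_counts in counts_by_library.items():
        total = sum(gene_counts.values())
        if total == 0:
            raise ValueError(f"library {library!r} has no mapped reads")
        normalized[library] = {
            gene: count * 1_000_000 / total
            for gene, count in gene_counts.items()
        }
    return normalized


# Hypothetical toy counts for two libraries (not taken from GSE45510).
counts = {
    "LMS_001": {"LMOD1": 300, "ARL4C": 100, "ACTB": 600},
    "LMS_002": {"LMOD1": 50, "ARL4C": 450, "ACTB": 1500},
}
tpm = normalize_to_tpm(counts)
```

After this scaling, values are comparable between libraries of different sequencing depth, which is what allows the per-gene standard deviation filter in the subtype analysis to be applied across all cases.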

Subtype determination and validation

To determine the optimal number of molecular subtypes of leiomyosarcoma [1], genes with the most variable expression were selected by requiring a standard deviation greater than 100 across all 99 LMS cases (1300 genes), and the resulting expression matrix was log2-transformed and gene-centered [8]. Consensus clustering (R package ConsensusClusterPlus) [9] was then run for 1000 iterations with the following settings: distance of 1 − Pearson correlation, 80% sample resampling, 80% gene resampling, a maximum evaluated k of 12, and agglomerative hierarchical clustering. Based on this analysis, we chose three as the optimal number of subtypes. Expression profiling data (RNASeq) for the 82 additional LMS cases were downloaded from the TCGA database. For comparison with the 3SEQ data, the TCGA data were normalized to TPM and analyzed with ConsensusClusterPlus in the same way as the 3SEQ data. To measure the reproducibility of the LMS subtypes, the cases from both datasets (3SEQ and TCGA RNASeq) were evaluated using Silhouette analysis [10], where an LMS case was defined as a “core case” if it had a positive Silhouette value. Subclass mapping was performed to determine the common LMS subtypes based on the “core cases” identified in the 3SEQ and TCGA RNASeq datasets. To discover subtype-specific genes, SAMSeq [11] was run on both datasets (3SEQ and TCGA RNASeq), comparing each subtype against all other subtypes at an FDR of 0.05, and the significantly differentially expressed genes were used to identify the diagnostic biomarker for each LMS subtype.
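
The gene filtering, transformation and “core case” steps can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' pipeline, which used R packages. It assumes a sample standard deviation for the filter, a pseudocount of 1 before the log2 transform, and the standard Silhouette definition with a distance of 1 − Pearson correlation; the source does not state these details. All gene names, values and labels are hypothetical.

```python
import math
import statistics


def select_variable_genes(matrix, sd_cutoff=100.0):
    """Keep genes whose standard deviation across cases exceeds sd_cutoff.

    Assumption: sample standard deviation (the source does not specify).
    """
    return {g: v for g, v in matrix.items() if statistics.stdev(v) > sd_cutoff}


def log2_center(matrix, pseudocount=1.0):
    """Log2-transform each gene, then subtract that gene's mean.

    Assumption: a pseudocount of 1 keeps zero TPM values finite.
    """
    centered = {}
    for gene, values in matrix.items():
        logged = [math.log2(v + pseudocount) for v in values]
        mean = statistics.fmean(logged)
        centered[gene] = [x - mean for x in logged]
    return centered


def pearson_distance(x, y):
    """Return 1 - Pearson correlation (undefined for a constant profile)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def silhouette_values(profiles, labels):
    """Per-case Silhouette value: (b - a) / max(a, b)."""
    n = len(profiles)
    scores = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                d = pearson_distance(profiles[i], profiles[j])
                by_cluster.setdefault(labels[j], []).append(d)
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singleton or single-cluster input
            continue
        a = statistics.fmean(own)  # mean distance within the case's own subtype
        b = min(statistics.fmean(d) for d in by_cluster.values())  # nearest other
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical TPM matrix: gene -> values for six cases.
matrix = {
    "GENE_A": [900, 850, 950, 100, 120, 80],
    "GENE_B": [100, 150, 90, 800, 900, 850],
    "GENE_C": [500, 480, 520, 510, 490, 500],  # low variance, filtered out
    "GENE_D": [700, 650, 720, 200, 180, 220],
}
labels = ["I", "I", "I", "II", "II", "II"]  # hypothetical subtype calls

filtered = select_variable_genes(matrix)
transformed = log2_center(filtered)
genes = sorted(transformed)
profiles = [[transformed[g][i] for g in genes] for i in range(len(labels))]
scores = silhouette_values(profiles, labels)
core_cases = [i for i, s in enumerate(scores) if s > 0]
```

In this sketch the subtype labels are supplied directly; in the study they came from consensus clustering, and only cases with a positive Silhouette value were carried into subclass mapping.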

Specifications

Organism/cell line/tissue:  Leiomyosarcoma, FFPE tissues, human
Sex:  Male or female
Sequencer or array type:  3′ end RNA sequencing
Data format:  TPM (transcripts per million reads) normalized matrix
Experimental factors:  Archival FFPE blocks for 99 cases of leiomyosarcoma
Experimental features:  Total RNA isolation, oligo(dT) selection and gene expression profiling of 99 leiomyosarcomas
Consent:  IRB approval and a waiver of consent due to the archival nature of the specimens
Sample source location:  Nine hospitals from United States, Canada and Europe (see Experimental design)

References (10 in total)

1.  Open source clustering software.

Authors:  M J L de Hoon; S Imoto; J Nolan; S Miyano
Journal:  Bioinformatics       Date:  2004-02-10       Impact factor: 6.937

2.  SOAP2: an improved ultrafast tool for short read alignment.

Authors:  Ruiqiang Li; Chang Yu; Yingrui Li; Tak-Wah Lam; Siu-Ming Yiu; Karsten Kristiansen; Jun Wang
Journal:  Bioinformatics       Date:  2009-06-03       Impact factor: 6.937

3.  Clinically Relevant Molecular Subtypes in Leiomyosarcoma.

Authors:  Xiangqian Guo; Vickie Y Jo; Anne M Mills; Shirley X Zhu; Cheng-Han Lee; Inigo Espinosa; Marisa R Nucci; Sushama Varma; Erna Forgó; Trevor Hastie; Sharon Anderson; Kristen Ganjoo; Andrew H Beck; Robert B West; Christopher D Fletcher; Matt van de Rijn
Journal:  Clin Cancer Res       Date:  2015-04-20       Impact factor: 12.531

4.  Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data.

Authors:  Jun Li; Robert Tibshirani
Journal:  Stat Methods Med Res       Date:  2011-11-28       Impact factor: 3.021

5.  14-3-3 fusion oncogenes in high-grade endometrial stromal sarcoma.

Authors:  Cheng-Han Lee; Wen-Bin Ou; Adrian Mariño-Enriquez; Meijun Zhu; Mark Mayeda; Yuexiang Wang; Xiangqian Guo; Alayne L Brunner; Frédéric Amant; Christopher A French; Robert B West; Jessica N McAlpine; C Blake Gilks; Michael B Yaffe; Leah M Prentice; Andrew McPherson; Steven J M Jones; Marco A Marra; Sohrab P Shah; Matt van de Rijn; David G Huntsman; Paola Dal Cin; Maria Debiec-Rychter; Marisa R Nucci; Jonathan A Fletcher
Journal:  Proc Natl Acad Sci U S A       Date:  2012-01-05       Impact factor: 11.205

6.  ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking.

Authors:  Matthew D Wilkerson; D Neil Hayes
Journal:  Bioinformatics       Date:  2010-04-28       Impact factor: 6.937

7.  Cyclin D1 as a diagnostic immunomarker for endometrial stromal sarcoma with YWHAE-FAM22 rearrangement.

Authors:  Cheng-Han Lee; Rola H Ali; Marjan Rouzbahman; Adrian Marino-Enriquez; Meijun Zhu; Xiangqian Guo; Alayne L Brunner; Sarah Chiang; Samuel Leung; Nataliya Nelnyk; David G Huntsman; C Blake Gilks; Torsten O Nielsen; Paola Dal Cin; Matt van de Rijn; Esther Oliva; Jonathan A Fletcher; Marisa R Nucci
Journal:  Am J Surg Pathol       Date:  2012-10       Impact factor: 6.394

8.  Next generation sequencing-based expression profiling identifies signatures from benign stromal proliferations that define stromal components of breast cancer.

Authors:  Xiangqian Guo; Shirley X Zhu; Alayne L Brunner; Matt van de Rijn; Robert B West
Journal:  Breast Cancer Res       Date:  2013-12-17       Impact factor: 6.466

9.  A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions.

Authors:  Alayne L Brunner; Jun Li; Xiangqian Guo; Robert T Sweeney; Sushama Varma; Shirley X Zhu; Rui Li; Robert Tibshirani; Robert B West
Journal:  Genome Biol       Date:  2014-05-23       Impact factor: 13.583

10.  Transcriptional profiling of long non-coding RNAs and novel transcribed regions across a diverse panel of archived human cancers.

Authors:  Alayne L Brunner; Andrew H Beck; Badreddin Edris; Robert T Sweeney; Shirley X Zhu; Rui Li; Kelli Montgomery; Sushama Varma; Thea Gilks; Xiangqian Guo; Joseph W Foley; Daniela M Witten; Craig P Giacomini; Ryan A Flynn; Jonathan R Pollack; Robert Tibshirani; Howard Y Chang; Matt van de Rijn; Robert B West
Journal:  Genome Biol       Date:  2012-08-28       Impact factor: 13.583

