
Classification of Genes Based on Age-Related Differential Expression in Breast Cancer.

Abstract

Transcriptome analysis has been widely used to build biomarker panels for cancer diagnosis. In breast cancer, patient age is known to be associated with clinical features. Because clinical transcriptome data have accumulated substantially, we classified all human genes based on age-specific differential expression between normal and breast cancer cells using public data. We retrieved gene expression levels in breast cancer and matched normal cells from The Cancer Genome Atlas. In the first classification, we divided genes into two classes by paired t test without considering age. In the secondary classification, we divided the genes of each class into eight groups, based on the patterns of the p-values calculated for each of the three age groups we defined. Through this two-step classification, genes were ultimately grouped into 16 classes. By comparing the performance of prediction models built from different combinations of genes, we showed that this classification method could be applied to establish a more accurate prediction model to diagnose breast cancer. We expect that our classification scheme could be used for other types of cancer data.


Keywords: biomarkers; breast cancer; differentially expressed genes; gene classification

Year: 2017 PMID: 29307142 PMCID: PMC5769863 DOI: 10.5808/GI.2017.15.4.156

Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X

Introduction

Breast cancer is known to be one of the leading causes of cancer death among females [1]. Numerous studies on the genomic characterization of breast cancer, particularly the discovery of differentially expressed genes (DEGs), have revealed clinically relevant molecular subtypes [2], which has increased the accuracy of prognosis [3-5] and has resulted in successful targeted therapy [6, 7]. During recent decades, resources based on high-throughput sequencing technologies, such as The Cancer Genome Atlas (TCGA) [8] and the International Cancer Genome Consortium (ICGC) [9], have facilitated more accurate detection of DEGs and cancer driver genes. The identification of DEGs is important because it leads to more accurate subtyping and more precise treatment for various types of cancers. In transcriptome analysis based on microarray [10] or RNA sequencing [11] by next-generation sequencing, DEGs are usually identified by statistical approaches, such as the t test, nonparametric tests, and Bayesian models [12]. Subsequent pathway analysis and functional enrichment tests for DEGs are performed to improve the understanding of molecular mechanisms [13].
In the case of breast cancer, it is well known that molecular subtype and patient age are strongly associated with clinical features, such as survival rate. Fredholm et al. [14] reported that the 5-year survival rate was lowest in the 25–34-year-old age group and decreased with increasing age. Likewise, Gnerlich et al. [15] reported that younger women were more likely to die from breast cancer than older ones, based on the statistics of 243,012 breast cancer patients. Recently, Azim et al. [16] studied genomic aberrations in young and elderly breast cancer patients based on TCGA data. They found that older patients had more somatic mutations and copy number variations (CNVs) and that 11 mutations and two CNVs were independently associated with age at diagnosis.
In this work, we aimed to classify human genes based on age-specific differential expression between normal and breast cancer cells. DEGs were identified based on their p-values for differential gene expression between tumor and matched normal cells. DEGs and non-DEGs were then classified based on age-specific differential expression in the three age groups we defined. To show an application of the classification, we compared the accuracy of prediction models that distinguish normal and tumor cells, constructed by support vector machine (SVM) using various combinations of genes by class. The performance of each SVM was measured by the average area under the receiver operating characteristic curve (AUC) over 1,000 bootstrap iterations.
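The performance measure described above can be sketched in plain Python. This is only an illustration and not the authors' R code: it assumes the classifier has already produced one decision score per sample, it assumes the bootstrap resamples the samples with replacement (the text does not say what was resampled), and the names `auc` and `bootstrap_mean_auc` are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for tumor, 0 for normal. Higher score means more tumor-like.
    Equals the fraction of (tumor, normal) pairs ranked correctly,
    counting ties as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_mean_auc(scores, labels, n_boot=1000, seed=0):
    """Average AUC over n_boot resamples drawn with replacement.

    Assumption: samples are the resampling unit. Resamples that contain
    only one class are skipped because AUC is undefined for them.
    """
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):
            aucs.append(auc([scores[i] for i in sample], ys))
    return sum(aucs) / len(aucs)
```

In practice the scores would come from the decision values of the fitted SVM; any classifier that yields a continuous score can be evaluated the same way.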

Methods

All gene expression values in this work were obtained from TCGA. Eligible patients had complete clinical data and gene expression data for both breast cancer cells and matched normal cells. In total, we retrieved the gene expression values of tumor and matched normal cells from 96 patients. The distribution of their ages is shown in Fig. 1. The gene expression values that we used were generated on the Illumina HiSeq platform (Illumina, Inc., San Diego, CA, USA) and normalized by root square error methods [17]. All subsequent statistical analyses were carried out using R, version 3.2.3. SVM classifiers were constructed using the e1071 package of R, and a simple linear kernel was used. Functional enrichment analysis of gene classes was performed in ToppGene Suite [18].

Fig. 1

Distribution of age of breast cancer patients analyzed in this work.

Young patients were defined as those ≤45 years of age, and elderly patients as those ≥60 years of age (Fig. 1). The remaining patients were defined as “intermediate.” The statistical significance of differential expression between tumor and matched normal cells was determined by paired t test with a p-value threshold of 8.48 × 10⁻⁷. This threshold was set by Bonferroni correction: one test was performed per age group for each of the 19,646 genes, giving 3 × 19,646 = 58,938 tests, and 8.48 × 10⁻⁷ corresponds to a significance level of 0.05 divided by 58,938.
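The age grouping and the threshold arithmetic can be reproduced with a few lines of Python. This is a sketch for illustration only: the function name `age_group` and the constant names are hypothetical, and the family-wise significance level of 0.05 is an assumption inferred from the reported threshold rather than a value stated in the text.

```python
def age_group(age):
    """Assign a patient to one of the three age groups defined in the text."""
    if age <= 45:
        return "young"
    if age >= 60:
        return "elderly"
    return "intermediate"


N_GENES = 19646      # genes tested, from the text
N_AGE_GROUPS = 3     # young, intermediate, elderly
ALPHA = 0.05         # assumed family-wise significance level

# Bonferroni correction: divide alpha by the total number of tests.
N_TESTS = N_GENES * N_AGE_GROUPS
THRESHOLD = ALPHA / N_TESTS

print(N_TESTS)               # 58938
print(f"{THRESHOLD:.3g}")    # 8.48e-07
```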

Results and Discussion

The overall scheme of our classification is depicted in Fig. 2. We first divided genes into two classes (A and B) by paired t test without considering age. A total of 5,962 genes in class A were defined as significant DEGs in breast cancer, and the 13,684 genes in class B were nonsignificant. Researchers looking for biomarkers or driver genes would likely investigate only the genes in class A. We, however, further classified the genes of each class into eight groups, based on the pattern of p-values calculated separately for each age group (secondary classification in Fig. 2). After this second round of classification, the genes were divided into 16 classes (A1–B8) (Supplementary Table 1). The number of genes in each class is shown in Table 1.

Fig. 2

Schematic view of overall procedures of the two-step classification.

Table 1

Definition of classification of genes based on age-specific significance and the corresponding number of genes

Secondary class	Young	Intermediate	Older	Primary class A (no. of genes)	Primary class B (no. of genes)
1	Significant	Significant	Significant	377	0
2	Nonsignificant	Nonsignificant	Nonsignificant	4,495	13,548
3	Nonsignificant	Nonsignificant	Significant	781	69
4	Nonsignificant	Significant	Nonsignificant	56	29
5	Significant	Nonsignificant	Nonsignificant	44	18
6	Significant	Significant	Nonsignificant	28	9
7	Significant	Nonsignificant	Significant	71	8
8	Nonsignificant	Significant	Significant	110	3

Significance was defined based on the p-value of the paired t test.
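The two-step scheme summarized in Table 1 amounts to a lookup on the pattern of significant and nonsignificant calls. The sketch below is illustrative, not the authors' implementation. It assumes the p-values (one over all samples and one per age group) have already been computed, and that the same threshold applies in both steps. Class 7 is taken to be the pattern that is significant in the young and older groups only, since that is the one combination of three calls not covered by the other seven classes. The names `PATTERN_TO_CLASS` and `classify_gene` are hypothetical.

```python
# Secondary class for each (young, intermediate, older) significance pattern.
PATTERN_TO_CLASS = {
    (True, True, True): 1,
    (False, False, False): 2,
    (False, False, True): 3,
    (False, True, False): 4,
    (True, False, False): 5,
    (True, True, False): 6,
    (True, False, True): 7,   # assumed pattern, see the note above
    (False, True, True): 8,
}


def classify_gene(p_all, p_young, p_intermediate, p_older, threshold=8.48e-7):
    """Return a label such as 'A3' or 'B2' for one gene.

    p_all is the paired t test p-value over all samples (primary step);
    the other three are the per-age-group p-values (secondary step).
    """
    primary = "A" if p_all < threshold else "B"
    pattern = (
        p_young < threshold,
        p_intermediate < threshold,
        p_older < threshold,
    )
    return f"{primary}{PATTERN_TO_CLASS[pattern]}"


# Example: nonsignificant overall, significant only in the older group.
print(classify_gene(0.01, 0.2, 0.3, 1e-9))  # B3
```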

No gene was classified as class B1. This is probably because the genes in class B had already been found nonsignificant across all samples in the primary classification step, so none was significant in every age group. The 377 genes in class A1 exhibited differential expression in every age group as well as across all samples. These genes are the most powerful DEGs between normal and breast cancer cells. Genes of class 2, which did not have significantly different expression in any age group, accounted for the majority in both classes A and B. Because classes 1 and 2 show no age-specific significance, we focused on the genes of classes 3–8, which showed age-specific differential expression (Fig. 3).

Fig. 3

Patterns of significance of age-specific differential expression of class A (A) and class B (B).

Functional enrichment analysis was performed for each class, but we could not find any relevant or intriguing biological implications or pathways related to breast cancer. Hence, we provide the results of the analysis as raw data (Supplementary Table 2) rather than attempt an unsupported interpretation. To show an example of how the classification can be applied, we constructed prediction models that aimed to distinguish normal and breast cancer cells, using the expression values of various combinations of genes. We defined two types of combinations for building gene lists. For type I combinations, genes were chosen evenly from each class. By contrast, for type II combinations, genes were chosen randomly without considering gene class. For example, to make a list of three genes from classes 3, 4, and 5, a type I combination consists of one gene from class 3, one from class 4, and one from class 5, whereas a type II combination can be composed of any three genes randomly chosen from the pool of the three classes. We compared the accuracies of the SVMs built from type I and type II combinations for three subsets each of classes A and B (Table 2). The performance of type I was significantly better than that of type II, except in two cases (classes 6–8 in both classes A and B), in which the genes were significantly differentially expressed in two age groups.
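The difference between the two sampling schemes can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the procedure the authors ran in R: the gene identifiers are placeholders, the function names are hypothetical, and type II sampling is assumed to be without replacement.

```python
import random


def type_i_combination(genes_by_class, rng):
    """Pick exactly one gene from each class (balanced selection)."""
    return [rng.choice(genes_by_class[c]) for c in sorted(genes_by_class)]


def type_ii_combination(genes_by_class, rng):
    """Pick the same number of genes at random from the pooled classes,
    ignoring class membership (assumed: sampling without replacement)."""
    pool = [g for c in sorted(genes_by_class) for g in genes_by_class[c]]
    return rng.sample(pool, len(genes_by_class))


# Placeholder gene identifiers, not real class members.
genes_by_class = {
    "A3": ["g3_1", "g3_2", "g3_3"],
    "A4": ["g4_1", "g4_2"],
    "A5": ["g5_1", "g5_2"],
}
rng = random.Random(42)
balanced = type_i_combination(genes_by_class, rng)
unbalanced = type_ii_combination(genes_by_class, rng)
```

Because class 3 is much larger than classes 4 and 5 in Table 1, a type II draw from the pooled classes will often consist mostly of class 3 genes, which is the imbalance that type I selection avoids.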

Table 2

Comparison of performance of SVM models of different combinations of genes to distinguish breast cancer and normal cells

Input genes (N)	Sampling pool	Type I (one gene per class)	Type II (N random genes from the pool)	p-value
3	Classes A3–A5	0.9351	0.9215	4.732e-16
3	Classes A6–A8	0.9660	0.9662	0.8391
6	Classes A3–A8	0.9842	0.9743	2.2e-16
3	Classes B3–B5	0.8701	0.8522	1.441e-15
3	Classes B6–B8	0.9269	0.9392	0.4868
6	Classes B3–B8	0.9386	0.9227	6.984e-09

N input genes were sampled from each sampling pool by adopting two types of combinations (types I and II).

SVM, support vector machine.

These results highlight the value of gene classification based on our method. After the first classification alone, the 13,684 genes in class B would probably be regarded as unable to distinguish normal from breast cancer cells. However, by adopting one more classification step based on age-specific differential expression, we identified 171 age-specific DEGs in classes B3–B8. Although these classes would otherwise be undervalued, we showed that a balanced selection of genes from them could serve as biomarkers that distinguish breast cancer from normal cells. For example, genes that are known to be high-penetrance breast cancer susceptibility genes, such as TP53, STK11, and CDH11, and moderate-penetrance genes, such as RAD50, RAD51C, RAD51D, NBS1, and FANCM, were classified in class B2 [19]. In addition to suggesting that genes in class B may serve as biomarkers, we showed that, even for class A, selecting biomarker genes based on the secondary classification could be useful for building combinations of biomarker genes that predict more accurately by using different classes complementarily.

In summary, we retrieved a gene expression dataset of breast cancer and matched normal cells from TCGA and then classified the genes into 16 classes by two-step classification. This classification could be applied to generate a more accurate prediction model for identifying cancer. Furthermore, we expect that our classification scheme could be used for other types of cancer data.

References (19 in total)

1. Global functional profiling of gene expression.

Authors: Sorin Draghici; Purvesh Khatri; Rui P Martins; G Charles Ostermeier; Stephen A Krawetz
Journal: Genomics Date: 2003-02 Impact factor: 5.736

2. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies.

Authors: Brian D Lehmann; Joshua A Bauer; Xi Chen; Melinda E Sanders; A Bapsi Chakravarthy; Yu Shyr; Jennifer A Pietenpol
Journal: J Clin Invest Date: 2011-07 Impact factor: 14.808

3. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing.

Authors: Ryan Morin; Matthew Bainbridge; Anthony Fejes; Martin Hirst; Martin Krzywinski; Trevor Pugh; Helen McDonald; Richard Varhol; Steven Jones; Marco Marra
Journal: Biotechniques Date: 2008-07 Impact factor: 1.993

4. Exploring the new world of the genome with DNA microarrays.

Authors: P O Brown; D Botstein
Journal: Nat Genet Date: 1999-01 Impact factor: 38.330

5. The triple negative paradox: primary tumor chemosensitivity of breast cancer subtypes.

Authors: Lisa A Carey; E Claire Dees; Lynda Sawyer; Lisa Gatti; Dominic T Moore; Frances Collichio; David W Ollila; Carolyn I Sartor; Mark L Graham; Charles M Perou
Journal: Clin Cancer Res Date: 2007-04-15 Impact factor: 12.531

6. Molecular portraits of human breast tumours.

Authors: C M Perou; T Sørlie; M B Eisen; M van de Rijn; S S Jeffrey; C A Rees; J R Pollack; D T Ross; H Johnsen; L A Akslen; O Fluge; A Pergamenschikov; C Williams; S X Zhu; P E Lønning; A L Børresen-Dale; P O Brown; D Botstein
Journal: Nature Date: 2000-08-17 Impact factor: 49.962

7. Beyond BRCA: new hereditary breast cancer susceptibility genes.

Authors: P Economopoulou; G Dimitriadis; A Psyrri
Journal: Cancer Treat Rev Date: 2014-11-06 Impact factor: 12.111

8. International network of cancer genome projects.

Authors: International Cancer Genome Consortium; Thomas J Hudson; Warwick Anderson; et al.
Journal: Nature Date: 2010-04-15 Impact factor: 49.962

9. Supervised risk predictor of breast cancer based on intrinsic subtypes.

Authors: Joel S Parker; Michael Mullins; Maggie C U Cheang; Samuel Leung; David Voduc; Tammi Vickery; Sherri Davies; Christiane Fauron; Xiaping He; Zhiyuan Hu; John F Quackenbush; Inge J Stijleman; Juan Palazzo; J S Marron; Andrew B Nobel; Elaine Mardis; Torsten O Nielsen; Matthew J Ellis; Charles M Perou; Philip S Bernard
Journal: J Clin Oncol Date: 2009-02-09 Impact factor: 44.544

10. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization.

Authors: Jing Chen; Eric E Bardes; Bruce J Aronow; Anil G Jegga
Journal: Nucleic Acids Res Date: 2009-05-22 Impact factor: 16.971
