
Challenges in projecting clustering results across gene expression-profiling datasets.

Lara Lusa, Lisa M McShane, James F Reid, Loris De Cecco, Federico Ambrogi, Elia Biganzoli, Manuela Gariboldi, Marco A Pierotti.

Abstract

BACKGROUND: Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes.
METHODS: Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor-positive (ER+) samples, to the five subtypes on the basis of microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression across samples is equal to 0) on subtype assignment by the method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method-of-centroids classification of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and in which the proportions of ER+ and ER- samples were systematically varied.
RESULTS: When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered.
CONCLUSIONS: Simple corrections such as mean centering of genes aimed at microarray platform or batch effect correction can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.
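
The interaction between mean centering and class prevalence described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' analysis: it assumes simulated log-expression data with a constant per-gene ER effect, Euclidean distance as the similarity measure, no platform or batch offset between training and test sets, and a balanced training set. All function names and parameter values are hypothetical.

```python
import numpy as np

def mean_center_genes(X):
    """Center each gene (column) so its mean over the samples (rows) is 0."""
    return X - X.mean(axis=0, keepdims=True)

def nearest_centroid(X, centroids):
    """Return, for each sample, the index of the closest centroid (Euclidean)."""
    # distances has shape (n_samples, n_centroids)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def simulate(n_pos, n_neg, baseline, effect, rng):
    """Simulated expression matrix; label 1 = ER+, label 0 = ER- (assumed model)."""
    labels = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]
    signs = np.where(labels == 1, 0.5, -0.5)[:, None]
    noise = rng.normal(size=(labels.size, baseline.size))
    return baseline + signs * effect + noise, labels

def centroids_of(X, labels):
    """Row 0 = ER- centroid, row 1 = ER+ centroid."""
    return np.vstack([X[labels == k].mean(axis=0) for k in (0, 1)])

def accuracy_by_prevalence(prevalences, center, n_test=200, n_genes=50, seed=0):
    """Accuracy of ER assignment for test sets with varying ER+ fractions."""
    rng = np.random.default_rng(seed)
    baseline = rng.normal(size=n_genes)   # gene-specific baseline expression
    effect = np.ones(n_genes)             # hypothetical ER+ vs ER- difference
    X_train, y_train = simulate(100, 100, baseline, effect, rng)
    if center:
        X_train = mean_center_genes(X_train)
    centroids = centroids_of(X_train, y_train)
    results = {}
    for p in prevalences:
        n_pos = int(round(p * n_test))
        X_test, y_test = simulate(n_pos, n_test - n_pos, baseline, effect, rng)
        if center:
            # Centering uses the test set's own gene means, which depend on p.
            X_test = mean_center_genes(X_test)
        predicted = nearest_centroid(X_test, centroids)
        results[p] = float((predicted == y_test).mean())
    return results

prevalences = [0.5, 0.7, 0.9, 1.0]
acc_centered = accuracy_by_prevalence(prevalences, center=True)
acc_raw = accuracy_by_prevalence(prevalences, center=False)
for p in prevalences:
    print(f"ER+ fraction {p:.1f}: centered {acc_centered[p]:.2f}, "
          f"uncentered {acc_raw[p]:.2f}")
```

Under these assumptions, centering a test set with its own gene means shifts every profile by an amount that depends on the ER+ fraction. A test set made up entirely of ER+ samples is centered on the ER+ mean itself, so the ER signal is removed and the assignments become close to arbitrary. The uncentered assignments do not depend on the composition of the test set, but only because no platform or batch offset was simulated here.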


Year:  2007        PMID: 18000217     DOI: 10.1093/jnci/djm216

Source DB:  PubMed          Journal:  J Natl Cancer Inst        ISSN: 0027-8874            Impact factor:   13.506


  41 in total

Review 1.  The molecular pathology of breast cancer progression.

Authors:  Alessandro Bombonati; Dennis C Sgroi
Journal:  J Pathol       Date:  2010-11-16       Impact factor: 7.996

2.  Genomic architecture characterizes tumor progression paths and fate in breast cancer patients.

Authors:  Hege G Russnes; Hans Kristian Moen Vollan; Ole Christian Lingjærde; Alexander Krasnitz; Pär Lundin; Bjørn Naume; Therese Sørlie; Elin Borgen; Inga H Rye; Anita Langerød; Suet-Feung Chin; Andrew E Teschendorff; Philip J Stephens; Susanne Månér; Ellen Schlichting; Lars O Baumbusch; Rolf Kåresen; Michael P Stratton; Michael Wigler; Carlos Caldas; Anders Zetterberg; James Hicks; Anne-Lise Børresen-Dale
Journal:  Sci Transl Med       Date:  2010-06-30       Impact factor: 17.956

3.  Test set bias affects reproducibility of gene signatures.

Authors:  Prasad Patil; Pierre-Olivier Bachant-Winner; Benjamin Haibe-Kains; Jeffrey T Leek
Journal:  Bioinformatics       Date:  2015-03-18       Impact factor: 6.937

Review 4.  Multidimensionality of microarrays: statistical challenges and (im)possible solutions.

Authors:  Stefan Michiels; Andrew Kramar; Serge Koscielny
Journal:  Mol Oncol       Date:  2011-02-03       Impact factor: 6.603

5.  A personalized committee classification approach to improving prediction of breast cancer metastasis.

Authors:  Md Jamiul Jahid; Tim H Huang; Jianhua Ruan
Journal:  Bioinformatics       Date:  2014-03-10       Impact factor: 6.937

Review 6.  How many etiological subtypes of breast cancer: two, three, four, or more?

Authors:  William F Anderson; Philip S Rosenberg; Aleix Prat; Charles M Perou; Mark E Sherman
Journal:  J Natl Cancer Inst       Date:  2014-08-12       Impact factor: 13.506

Review 7.  Multiscale integration of -omic, imaging, and clinical data in biomedical informatics.

Authors:  John H Phan; Chang F Quo; Chihwen Cheng; May Dongmei Wang
Journal:  IEEE Rev Biomed Eng       Date:  2012

8.  Gene expression pathway analysis to predict response to neoadjuvant docetaxel and capecitabine for breast cancer.

Authors:  Larissa A Korde; Lara Lusa; Lisa McShane; Peter F Lebowitz; LuAnne Lukes; Kevin Camphausen; Joel S Parker; Sandra M Swain; Kent Hunter; Jo Anne Zujewski
Journal:  Breast Cancer Res Treat       Date:  2010-02       Impact factor: 4.872

Review 9.  Breast cancer classification and prognostication through diverse systems along with recent emerging findings in this respect; the dawn of new perspectives in the clinical applications.

Authors:  Vida Pourteimoor; Samira Mohammadi-Yeganeh; Mahdi Paryan
Journal:  Tumour Biol       Date:  2016-09-20

Review 10.  Genetic prognostic and predictive markers in colorectal cancer.

Authors:  Axel Walther; Elaine Johnstone; Charles Swanton; Rachel Midgley; Ian Tomlinson; David Kerr
Journal:  Nat Rev Cancer       Date:  2009-06-18       Impact factor: 60.716
