
The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles.

Levi Waldron, Markus Riester, Marcel Ramos, Giovanni Parmigiani, Michael Birrer.

Abstract

Whole-genome analysis of cancer specimens is commonplace, and investigators frequently share or re-use specimens in later studies. Duplicate expression profiles in public databases will impact re-analysis if left undetected, a so-called "doppelgänger" effect. We propose a method that should be routine practice to accurately match duplicate cancer transcriptomes when nucleotide-level sequence data are unavailable, even for samples profiled by different microarray technologies or by both microarray and RNA sequencing. We demonstrate the effectiveness of the method in databases containing dozens of datasets and thousands of ovarian, breast, bladder, and colorectal cancer microarray profiles and of matching microarray and RNA sequencing expression profiles from The Cancer Genome Atlas (TCGA). We identified probable duplicates among more than 50% of studies, originating in different continents, using different technologies, published years apart, and even within the TCGA itself. Finally, we provide the doppelgangR Bioconductor package for screening transcriptome databases for duplicates. Given the potential for unrecognized duplication to falsely inflate prediction accuracy and confidence in differential expression, doppelgänger-checking should be a part of standard procedure for combining multiple genomic datasets.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
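The abstract does not spell out the matching procedure, so the following is only a minimal sketch of one plausible approach, not the doppelgangR implementation: compute the Pearson correlation for every cross-dataset pair of expression profiles, apply the Fisher z-transform, and flag pairs whose correlation is an extreme outlier relative to the other pairs. The function names, the `cutoff` parameter, and the dictionary input format are all hypothetical, and the sketch assumes both datasets have already been restricted to the same genes in the same order.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def flag_candidate_duplicates(set_a, set_b, cutoff=4.0):
    """Flag cross-dataset sample pairs with outlying correlation.

    set_a, set_b: dicts mapping sample ID -> list of expression values
    (hypothetical format; every list covers the same genes in the same order).
    cutoff: assumed threshold, in robust standard deviations.
    Returns a list of (id_a, id_b, r) tuples.
    """
    pairs = []
    for id_a, profile_a in set_a.items():
        for id_b, profile_b in set_b.items():
            r = pearson(profile_a, profile_b)
            # Clip so that the Fisher z-transform stays finite when r is +/-1.
            clipped = max(min(r, 0.999999), -0.999999)
            pairs.append((id_a, id_b, r, math.atanh(clipped)))

    # Robust center and spread: median and scaled median absolute deviation.
    z_values = [p[3] for p in pairs]
    center = median(z_values)
    spread = 1.4826 * median([abs(z - center) for z in z_values])
    if spread == 0:
        return []  # no variation among pairs, so nothing can be called an outlier

    return [(id_a, id_b, r) for id_a, id_b, r, z in pairs
            if (z - center) / spread > cutoff]
```

Calling `flag_candidate_duplicates(set_a, set_b)` returns the sample pairs that merit manual review. In real data, tumors of the same type are correlated with each other and platforms differ systematically, so batch adjustment before comparison and a check of clinical annotation for flagged pairs would likely be needed; neither is shown here.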

Year:  2016        PMID: 27381624      PMCID: PMC5241903          DOI: 10.1093/jnci/djw146

Source DB:  PubMed          Journal:  J Natl Cancer Inst        ISSN: 0027-8874            Impact factor:   13.506


  7 in total

1.  Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma.

Authors:  Gregory M Chen; Lavanya Kannan; Ludwig Geistlinger; Victor Kofia; Zhaleh Safikhani; Deena M A Gendoo; Giovanni Parmigiani; Michael Birrer; Benjamin Haibe-Kains; Levi Waldron
Journal:  Clin Cancer Res       Date:  2018-07-03       Impact factor: 12.531

2.  BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology.

Authors:  Kleanthi Lakiotaki; Nikolaos Vorniotakis; Michail Tsagris; Georgios Georgakopoulos; Ioannis Tsamardinos
Journal:  Database (Oxford)       Date:  2018-01-01       Impact factor: 3.451

3.  Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies.

Authors:  Lilah Toker; Min Feng; Paul Pavlidis
Journal:  F1000Res       Date:  2016-08-30

4.  Curated compendium of human transcriptional biomarker data.

Authors:  Nathan P Golightly; Avery Bell; Anna I Bischoff; Parker D Hollingsworth; Stephen R Piccolo
Journal:  Sci Data       Date:  2018-04-17       Impact factor: 6.444

5.  Doppelgänger spotting in biomedical gene expression data.

Authors:  Li Rong Wang; Xin Yun Choy; Wilson Wen Bin Goh
Journal:  iScience       Date:  2022-07-19

6.  Continuity of transcriptomes among colorectal cancer subtypes based on meta-analysis.

Authors:  Siyuan Ma; Shuji Ogino; Princy Parsana; Reiko Nishihara; Zhirong Qian; Jeanne Shen; Kosuke Mima; Yohei Masugi; Yin Cao; Jonathan A Nowak; Kaori Shima; Yujin Hoshida; Edward L Giovannucci; Manish K Gala; Andrew T Chan; Charles S Fuchs; Giovanni Parmigiani; Curtis Huttenhower; Levi Waldron
Journal:  Genome Biol       Date:  2018-09-25       Impact factor: 13.583

7.  The ability to classify patients based on gene-expression data varies by algorithm and performance metric.

Authors:  Stephen R Piccolo; Avery Mecham; Nathan P Golightly; Jérémie L Johnson; Dustin B Miller
Journal:  PLoS Comput Biol       Date:  2022-03-11       Impact factor: 4.475
