
Highlight report: Erroneous sample annotation in a high fraction of publicly available genome-wide expression datasets.

Marianna Grinberg


Year:  2015        PMID: 26862323      PMCID: PMC4743481          DOI: 10.17179/excli2015-760

Source DB:  PubMed          Journal:  EXCLI J        ISSN: 1611-2156            Impact factor:   4.068





Recently, Lohr et al. published a method that identifies sample annotation errors in gene expression data (Lohr et al., 2015[18]). Surprisingly, 40 % of 45 analyzed publicly available datasets, comprising 4913 patients, were affected by erroneous sample annotation. The authors conclude that sample annotation errors may be a more widespread phenomenon than previously expected (Lohr et al., 2015[18]).
The authors used two strategies to identify sample mix-ups. First, a classifier was established that differentiates between samples from female and male patients. This classifier is based on the X-chromosomal gene XIST and the Y-chromosomal genes RPS4Y1 and DDX3Y (Lohr et al., 2015[18]). In datasets with similar numbers of males and females, approximately half of sample mix-ups will result in sex mislabeling, because a swap between two randomly chosen patients pairs a male with a female about half of the time. A further possible error is sample duplication, where the same sample is analyzed twice and the duplicate is erroneously labeled as another patient (Lohr et al., 2015[18]). To identify such duplications, a correlation-based strategy was used.
A strength of the techniques presented by Lohr et al. is that they include normalization steps, which make it possible to apply the same algorithm to samples from all datasets. The algorithm then differentiates between 'correctly classified' and 'misclassified' samples. Of the 45 analyzed publicly available cohorts, 18 contained at least one misclassification. The authors also show that deleting the erroneous samples can strongly influence the number of statistically significant prognostic genes.
Currently, genome-wide data are frequently used in cancer research (Stock et al., 2015[33]; Sicking et al., 2014[28]; Cadenas et al., 2014[4]; Mattson et al., 2015[21]). Intensively studied fields are breast and ovarian cancer (Siggelkow et al., 2012[29]; Godoy et al., 2014[11]; Stewart et al., 2012[31]; Schmidt et al., 2012[24]).
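
The sex-based check described above can be illustrated with a minimal Python sketch. Only the three marker genes (XIST, RPS4Y1, DDX3Y) come from the text. The decision rule (compare log-scale XIST expression with the mean of the two Y-chromosomal genes), the function names, and the toy values are assumptions made for illustration; the normalization steps of Lohr et al. are not reproduced here.

```python
# Marker genes named in the text; everything else in this sketch is assumed.
FEMALE_MARKER = "XIST"
MALE_MARKERS = ("RPS4Y1", "DDX3Y")


def predict_sex(expr):
    """Predict sex per sample from log-scale expression values.

    expr maps gene symbol -> {sample id -> expression value}.
    Hypothetical rule: a sample is called 'female' if XIST exceeds the
    mean of the Y-chromosomal markers, otherwise 'male'.
    """
    predictions = {}
    for sample in expr[FEMALE_MARKER]:
        male_score = sum(expr[g][sample] for g in MALE_MARKERS) / len(MALE_MARKERS)
        female_score = expr[FEMALE_MARKER][sample]
        predictions[sample] = "female" if female_score > male_score else "male"
    return predictions


def flag_sex_mismatches(expr, annotated_sex):
    """Return the sorted sample ids whose predicted sex contradicts the annotation."""
    predicted = predict_sex(expr)
    return sorted(s for s, sex in annotated_sex.items() if predicted[s] != sex)


# Toy data (invented values): P3 is annotated as male but expresses XIST highly.
toy_expr = {
    "XIST":   {"P1": 10.2, "P2": 3.9,  "P3": 9.8, "P4": 4.1},
    "RPS4Y1": {"P1": 4.5,  "P2": 11.0, "P3": 4.8, "P4": 10.7},
    "DDX3Y":  {"P1": 3.9,  "P2": 9.1,  "P3": 4.2, "P4": 8.8},
}
toy_annotation = {"P1": "female", "P2": "male", "P3": "male", "P4": "male"}

mismatches = flag_sex_mismatches(toy_expr, toy_annotation)
print(mismatches)  # ['P3']
```

A flagged sample is only a candidate for mislabeling; whether the annotation or the measurement is wrong still has to be checked against the original records.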
It can be expected that cohorts containing only samples from either females or males have a lower risk of sex mislabeling. Therefore, it was surprising that an example of mislabeling was also identified among breast cancer patients. For example, the well-known TRANSBIG cohort contains one sample annotated as a female node-negative breast cancer patient that in reality stems from a man (Lohr et al., 2015[18]).
Besides their intensive use in cancer research (Micke et al., 2014[22]; Schmidt et al., 2008[24]; Botling et al., 2013[3]), genome-wide expression data are also frequently used in toxicology (Campos et al., 2014[5]; Stöber et al., 2014[32]; Marchan, 2014[20][19]; Bolt, 2013[1]; Song et al., 2013[30]; Godoy et al., 2013[12]; Bolt et al., 2010[2]). The goal of these studies is to obtain first evidence of the mechanism of action of chemicals (Glahn et al., 2008[10]; Shimada et al., 2010[26]; Dika Nguea et al., 2008[6]; Hendrickx et al., 2014[15]; Shinde et al., 2015[27]; Yao and Costa, 2014[34]; Gagné et al., 2013[9]; Fang et al., 2014[8]; Kim et al., 2012[16]) or to establish classifiers of co-regulated gene clusters (Grinberg et al., 2014[14]; Shao et al., 2014[25]; Krug et al., 2013[17]; Godoy et al., 2015[13]; Rempel et al., 2015[23]; Doktorova et al., 2012[7]). In these datasets, the correlation-based classifier for sample duplication may be helpful.
In conclusion, the easy-to-use classifiers published by Lohr and colleagues (2015[18]) should be routinely included in the analysis of gene array as well as RNA-seq data to reduce the number of erroneous sample annotations.
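
The correlation-based search for duplicated samples can be sketched in a similar way. This is a minimal illustration, not the published algorithm: the use of the Pearson coefficient, the threshold of 0.999, the function names, and the toy profiles are all assumptions. In real data, profiles of different patients are often highly correlated, so a usable threshold would have to be chosen per platform and after appropriate normalization.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def find_duplicate_candidates(profiles, threshold=0.999):
    """Return pairs of sample ids whose profiles correlate at or above the threshold.

    profiles maps sample id -> list of expression values (same gene order).
    The default threshold is an arbitrary, hypothetical choice.
    """
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            hits.append((a, b))
    return hits


# Toy data (invented values): S3 repeats the measurement of S1 under another label.
toy_profiles = {
    "S1": [5.1, 7.3, 2.2, 9.8, 4.4],
    "S2": [6.0, 6.1, 3.5, 8.2, 5.9],
    "S3": [5.1, 7.3, 2.2, 9.8, 4.4],
}

duplicates = find_duplicate_candidates(toy_profiles)
print(duplicates)  # [('S1', 'S3')]
```

Comparing all pairs scales quadratically with the number of samples, which is unproblematic for cohorts of a few hundred patients but may call for a vectorized implementation in larger collections.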
  33 in total

1.  Gene array screening for identification of drugs with low levels of adverse side effects.

Authors:  Hermann M Bolt; Rosemarie Marchan; Jan G Hengstler
Journal:  Arch Toxicol       Date:  2010-04       Impact factor: 5.153

2.  Gene expression profiles in the brain of the neonate mouse perinatally exposed to methylmercury and/or polychlorinated biphenyls.

Authors:  Miyuki Shimada; Satomi Kameo; Norio Sugawara; Kozue Yaginuma-Sakurai; Naoyuki Kurokawa; Satomi Mizukami-Murata; Kunihiko Nakai; Hitoshi Iwahashi; Hiroshi Satoh
Journal:  Arch Toxicol       Date:  2009-12-18       Impact factor: 5.153

3.  Gelsolin Is Associated with Longer Metastasis-free Survival and Reduced Cell Migration in Estrogen Receptor-positive Breast Cancer.

Authors:  Anna-Maria Stock; Franziska Klee; Karolina Edlund; Marianna Grinberg; Seddik Hammad; Rosemarie Marchan; Cristina Cadenas; Bernd Niggemann; Kurt S Zänker; Jörg Rahnenführer; Marcus Schmidt; Jan G Hengstler; Frank Entschladen
Journal:  Anticancer Res       Date:  2015-10       Impact factor: 2.480

4.  Aberrantly activated claudin 6 and 18.2 as potential therapy targets in non-small-cell lung cancer.

Authors:  Patrick Micke; Johanna Sofia Margareta Mattsson; Karolina Edlund; Miriam Lohr; Karin Jirström; Anders Berglund; Johan Botling; Jörg Rahnenfuehrer; Millaray Marincevic; Fredrik Pontén; Simon Ekman; Jan Hengstler; Stefan Wöll; Ugur Sahin; Ozlem Türeci
Journal:  Int J Cancer       Date:  2014-04-08       Impact factor: 7.396

5.  Differential gene expression in human hepatocyte cell lines exposed to the antiretroviral agent zidovudine.

Authors:  Jia-Long Fang; Tao Han; Qiangen Wu; Frederick A Beland; Ching-Wei Chang; Lei Guo; James C Fuscoe
Journal:  Arch Toxicol       Date:  2013-11-30       Impact factor: 5.153

6.  Cadmium, cobalt and lead cause stress response, cell cycle deregulation and increased steroid as well as xenobiotic metabolism in primary normal human bronchial epithelial cells which is coordinated by at least nine transcription factors.

Authors:  Felix Glahn; Wolfgang Schmidt-Heck; Sebastian Zellmer; Reinhard Guthke; Jan Wiese; Klaus Golka; Roland Hergenröder; Gisela H Degen; Thomas Lehmann; Matthias Hermes; Wiebke Schormann; Marc Brulport; Alexander Bauer; Essam Bedawy; Rolf Gebhardt; Jan G Hengstler; Heidi Foth
Journal:  Arch Toxicol       Date:  2008-07-25       Impact factor: 5.153

7.  Human embryonic stem cell-derived test systems for developmental neurotoxicity: a transcriptomics approach.

Authors:  Anne K Krug; Raivo Kolde; John A Gaspar; Eugen Rempel; Nina V Balmer; Kesavan Meganathan; Kinga Vojnits; Mathurin Baquié; Tanja Waldmann; Roberto Ensenat-Waser; Smita Jagtap; Richard M Evans; Stephanie Julien; Hedi Peterson; Dimitra Zagoura; Suzanne Kadereit; Daniel Gerhard; Isaia Sotiriadou; Michael Heke; Karthick Natarajan; Margit Henry; Johannes Winkler; Rosemarie Marchan; Luc Stoppini; Sieto Bosgra; Joost Westerhout; Miriam Verwei; Jaak Vilo; Andreas Kortenkamp; Jürgen Hescheler; Ludwig Hothorn; Susanne Bremer; Christoph van Thriel; Karl-Heinz Krause; Jan G Hengstler; Jörg Rahnenführer; Marcel Leist; Agapios Sachinidis
Journal:  Arch Toxicol       Date:  2012-11-21       Impact factor: 5.153

8.  Gene networks and transcription factor motifs defining the differentiation of stem cells into hepatocyte-like cells.

Authors:  Patricio Godoy; Wolfgang Schmidt-Heck; Karthick Natarajan; Baltasar Lucendo-Villarin; Dagmara Szkolnicka; Annika Asplund; Petter Björquist; Agata Widera; Regina Stöber; Gisela Campos; Seddik Hammad; Agapios Sachinidis; Umesh Chaudhari; Georg Damm; Thomas S Weiss; Andreas Nüssler; Jane Synnergren; Karolina Edlund; Barbara Küppers-Munther; David C Hay; Jan G Hengstler
Journal:  J Hepatol       Date:  2015-05-25       Impact factor: 25.083

9.  Transcriptome based differentiation of harmless, teratogenetic and cytotoxic concentration ranges of valproic acid.

Authors:  Regina Stöber
Journal:  EXCLI J       Date:  2014-12-11       Impact factor: 4.068

10.  Loss of circadian clock gene expression is associated with tumor progression in breast cancer.

Authors:  Cristina Cadenas; Leonie van de Sandt; Karolina Edlund; Miriam Lohr; Birte Hellwig; Rosemarie Marchan; Marcus Schmidt; Jörg Rahnenführer; Henrik Oster; Jan G Hengstler
Journal:  Cell Cycle       Date:  2014       Impact factor: 4.534

