
Identification of sample annotation errors in gene expression datasets.

Miriam Lohr, Birte Hellwig, Karolina Edlund, Johanna S M Mattsson, Johan Botling, Marcus Schmidt, Jan G Hengstler, Patrick Micke, Jörg Rahnenführer.

Abstract

The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has successfully been applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public accessibility of raw data, together with corresponding information on clinicopathological parameters, offers the opportunity to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, results and conclusions obviously depend on the reliability of the available information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-up can be detected by a classifier that differentiates between samples from male and female patients. Correlation analysis identifies multiple measurements of material from the same sample. The analysis of 45 datasets (including 4913 patients) revealed that erroneous sample annotation, affecting 40 % of the analyzed datasets, may be a more widespread phenomenon than previously thought. Removal of erroneously labelled samples may influence the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could be routinely included into the statistical analysis of clinical gene expression data.


Keywords:  Gene expression; Male–female classifier; Microarray; Misannotation; Quality control


Year:  2015        PMID: 26608184      PMCID: PMC4673097          DOI: 10.1007/s00204-015-1632-4

Source DB:  PubMed          Journal:  Arch Toxicol        ISSN: 0340-5761            Impact factor:   5.153


Introduction

The generation of large gene expression datasets presents a logistic challenge that extends from the initial procurement and storage of tissue samples, through laboratory procedures, to bioinformatic data processing and analysis. Although anticipated to be low, little is known about the actual frequency of sample mix-up during this multi-step process. The reasons for sample identity being swapped between individuals are diverse, and these events are difficult to pinpoint retrospectively with absolute certainty. In datasets with roughly balanced frequencies of male and female individuals, it can be assumed that approximately half of the mix-ups will result in sex mislabeling. These cases can be identified by assessment of genes with male- or female-specific expression. Other commonly annotated clinicopathological parameters, such as tumor stage, would also be affected by mislabeling, but the lack of genes that exhibit for instance a reliable stage-specific expression pattern makes the standardized assessment of these parameters unsuitable. Few attempts have been made to systematically identify sample mix-ups in public gene expression datasets. The MixupMapper software (Westra et al. 2011) requires DNA sequence data (SNP) in addition to gene expression data. However, the majority of previous studies are based exclusively on gene expression data. Recent approaches use the expression of the X-chromosomal gene XIST and genes located on the Y chromosome for the discrimination between male and female samples in the analysis of single datasets. However, these methods are not generalizable because of the lack of normalization across datasets (‘t Hoen et al. 2013; Broman et al. 2015). To gain insight into frequencies of sample annotation discrepancies in publicly available gene expression datasets, we established a male–female classifier based on gene expression array data. 
In addition, correlations between expression values for pairs of samples were assessed to identify multiple measurements of tissue from the same individual, as this represents an additional hypothetical source of inconsistencies with regard to sample annotation.
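
The assumption that roughly half of all mix-ups change the sex label can be made explicit with a short calculation. This is only a sketch under a simplifying assumption that is not stated in the text: mix-ups are modelled as swaps between two samples drawn at random from the same cohort. With $n_f$ female and $n_m$ male samples, $n = n_f + n_m$ and $p = n_f / n$:

```latex
P(\text{swap changes the sex label})
  = \frac{2\, n_f\, n_m}{n\,(n-1)}
  \approx 2\,p\,(1-p) \le \tfrac{1}{2},
\qquad
\text{so}\quad
N_{\text{mix-ups}} \approx \frac{N_{\text{sex-discordant}}}{2\,p\,(1-p)} \ge 2\, N_{\text{sex-discordant}} .
```

The bound is reached for balanced cohorts ($p = 1/2$). For unbalanced cohorts a smaller share of swaps is visible to a sex-based check, and in single-sex cohorts ($p = 0$ or $p = 1$) none are.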

Methods

In this investigation, 45 publicly available MIAME-compliant sample collections were included (see Tables 1, 2 for details), all with accessible gene expression array data and available information on male or female sex for each study subject. In total, the studies comprised 4913 patients (3034 females, 1879 males). The data were accessed from the Gene Expression Omnibus (GEO) or directly from the authors’ Web site (Edgar et al. 2002; Shedden et al. 2008; Bild et al. 2006). Only datasets using the Affymetrix GeneChip© HG-U133A or HG-U133 Plus 2.0 arrays were included in this analysis.
Table 1

Overview of analyzed datasets

| Type | Cohorts | Sample size (female/male) |
| --- | --- | --- |
| Non-small cell lung cancer | GSE37745, Shedden, GSE31547, GSE29013, GSE14814, GSE4573, GSE31210, GSE19188, GSE31546, GSE10445 | 1338 (594/744) |
| Colon cancer | GSE33113, GSE12945, GSE31595, GSE4271, GSE14333, GSE17536, GSE17537 | 769 (358/411) |
| Other cancer | GSE5720, GSE4107, GSE42952, GSE34111, GSE31684 | 200 (64/136) |
| Non-cancer | GSE19027, GSE17913, GSE23343, GSE25462, GSE7821, GSE20950, GSE24427 | 408 (219/189) |
| Breast cancer | GSE11121, GSE2034, TRANSBIG (GSE7390/GSE6532), GSE16446, GSE20194, GSE20271, GSE22093, GSE23988 | 1373 (1373/0) |
| Ovarian cancer | Bild, GSE14764, GSE19829, GSE26712 | 426 (426/0) |
| Prostate cancer | GSE17951, GSE25136, GSE3325, GSE8218 | 399 (0/399) |

Tissue collections and gene array datasets analyzed by the male–female classifier, identified by their Gene Expression Omnibus (GEO) Series (GSE) number where available

Table 2

Detailed description of analyzed datasets

| Cohort | # Female | # Male | # Total | Type (disease or subject of study) |
| --- | --- | --- | --- | --- |
| GSE37745 | 89 | 107 | 196 | NSCLC |
| Shedden | 220 | 223 | 443 | NSCLC |
| GSE31547 | 36 | 14 | 50 | NSCLC + controls |
| GSE29013 | 17 | 38 | 55 | NSCLC |
| GSE14814 | 23 | 67 | 90 | NSCLC |
| GSE4573 | 47 | 82 | 129 | NSCLC |
| GSE31210 | 109 | 95 | 204 | NSCLC |
| GSE19188 | 23 | 59 | 82 | NSCLC |
| GSE31546 | 14 | 3 | 17 | NSCLC |
| GSE10445 | 16 | 56 | 72 | NSCLC |
| GSE4107 | 12 | 10 | 22 | Colorectal cancer |
| GSE33113 | 48 | 42 | 90 | Colorectal cancer |
| GSE31595 | 22 | 15 | 37 | Colorectal cancer |
| GSE12945 | 28 | 34 | 62 | Colorectal cancer |
| GSE14333 | 106 | 120 | 226 | Colorectal cancer |
| GSE17536 | 81 | 96 | 177 | Colorectal cancer |
| GSE17537 | 29 | 26 | 55 | Colorectal cancer |
| GSE4271 | 32 | 68 | 100 | Other cancer: glioma |
| GSE31684 | 25 | 68 | 93 | Other cancer: bladder |
| GSE34111 | 6 | 24 | 30 | Other cancer: gastrointestinal |
| GSE5720 | 24 | 30 | 54 | Other cancer: 9 different tissues |
| GSE42952 | 9 | 14 | 23 | Other cancer: pancreatic |
| GSE19027 | 11 | 48 | 59 | Bronchial epithelium of (non-) smokers with and without lung cancer |
| GSE17913 | 38 | 40 | 78 | Smoking |
| GSE23343 | 7 | 10 | 17 | Insulin resistance/type 2 diabetes |
| GSE25462 | 28 | 22 | 50 | Insulin resistance/type 2 diabetes |
| GSE7821 | 28 | 12 | 40 | Healthy twins |
| GSE20950 | 27 | 12 | 39 | Insulin resistance/obesity |
| GSE24427 | 80 | 45 | 125 | Multiple sclerosis |
| GSE11121 | 200 | 0 | 200 | Breast cancer |
| GSE2034 | 286 | 0 | 286 | Breast cancer |
| TRANSBIG (GSE7390/GSE6532) | 280 | 0 | 280 | Breast cancer |
| GSE16446 | 114 | 0 | 114 | Breast cancer; chemo response |
| GSE20194 | 247 | 0 | 247 | Breast cancer; chemo response |
| GSE20271 | 139 | 0 | 139 | Breast cancer; chemo response |
| GSE22093 | 47 | 0 | 47 | Breast cancer; chemo response |
| GSE23988 | 60 | 0 | 60 | Breast cancer; chemo response |
| Bild | 133 | 0 | 133 | Ovarian cancer |
| GSE14764 | 80 | 0 | 80 | Ovarian cancer |
| GSE19829 | 28 | 0 | 28 | Ovarian cancer |
| GSE26712 | 185 | 0 | 185 | Ovarian cancer |
| GSE17951 | 0 | 153 | 153 | Prostate cancer |
| GSE25136 | 0 | 79 | 79 | Prostate cancer |
| GSE3325 | 0 | 19 | 19 | Prostate cancer |
| GSE8218 | 0 | 148 | 148 | Prostate cancer |

Overview of the studied tissue collections and gene array data

To construct the classifier, we proceeded in three steps: (1) selection of probe sets with male- or female-specific expression, (2) dataset normalization to enable analysis of unlabelled cohorts and cohorts comprising only female or only male patients, and (3) combination of evidence from male- and female-specific probe sets into a final classifier that categorizes each sample as “correctly classified,” “misclassified,” or “unconfident.” In each step (1–3) a likelihood-based strategy was applied that ensures robustness against outliers (Algorithms 1–3 in Suppl. material). The initial probe set selection was based on 10 publicly available non-small cell lung cancer (NSCLC) gene expression datasets analyzed on the Affymetrix GeneChip© HG-U133A or HG-U133 Plus 2.0 array (Suppl. material: Algorithm 1). For each sample, sex information and gene expression measurements for 22,277 probe sets were available. Only seven probe sets achieved a median male–female classification accuracy above 75 % and only five above 90 %. The top four probe sets were included in the classifier (Table 3). Two of them map to the XIST gene (221728_x_at and 214218_s_at), located on the X chromosome, and the other two to RPS4Y1 (201909_at) and DDX3Y (205000_at), respectively, both located on the Y chromosome. XIST is expressed from the inactive female X chromosome and silenced in men. This is illustrated in one NSCLC dataset (GSE31210), with high expression of XIST (221728_x_at) observed in all patients labelled as female (Fig. 1), but only in one sample labelled as male. This exception was clearly located in the female XIST expression range, indicating a mislabelled sample. 
RPS4Y1 and DDX3Y showed the opposite behavior, with high expression values observed in male patients. RPS4Y1 encodes a structurally conserved ribosomal protein with putative function during spermatogenesis (Lopes et al. 2010), whereas DDX3Y is primarily expressed in testis and is involved in germ-line translation control (Rauschendorf et al. 2011). Probe sets with low discriminating power were not included in the classifier.
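
The screening of probe sets by median classification accuracy can be illustrated with a small sketch. The paper's actual selection procedure (Algorithm 1 in the supplement) is likelihood-based and is not reproduced here; the code below swaps in a simpler stand-in that scores a probe set by the best accuracy a single cut point can reach in each dataset and then takes the median over datasets. The function names and the input layout are assumptions made for illustration.

```python
from statistics import median

def best_threshold_accuracy(values, is_female):
    """Best accuracy a single cut point can reach for one probe set.

    Both directions are considered ("high = female" and "high = male"),
    and every cut between two distinct neighbouring values is tried.
    """
    pairs = sorted(zip(values, is_female))
    n = len(pairs)
    n_female = sum(1 for _, f in pairs if f)
    best = max(n_female, n - n_female) / n  # trivial rule: one label for all
    females_below = 0
    for i in range(n - 1):
        females_below += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no cut point fits between tied values
        males_below = (i + 1) - females_below
        females_above = n_female - females_below
        # "high = female": males below and females above the cut are correct
        acc = (males_below + females_above) / n
        best = max(best, acc, 1 - acc)  # 1 - acc is the "high = male" rule
    return best

def median_accuracy(datasets):
    """datasets: list of (values, is_female) tuples for a single probe set."""
    return median(best_threshold_accuracy(v, f) for v, f in datasets)
```

A probe set would then be kept if `median_accuracy` exceeds a chosen level, in the spirit of the 75 % and 90 % levels quoted above.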
Table 3

Probe sets included in the male–female classifier

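| Affymetrix ID | Gene | Chromosome | Cut point (99 % quantile) | Evidence (male/female) |
| --- | --- | --- | --- | --- |
| 221728_x_at | XIST | X | >0.389 | Female |
| 214218_s_at | XIST | X | >0.385 | Female |
| 201909_at | RPS4Y1 | Y | >0.431 | Male |
| 205000_at | DDX3Y | Y | >0.276 | Male |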

Probe sets included in the male–female classifier, with corresponding cut points for evidence on whether a sample originates from a male or a female

Fig. 1

Differentiation between male and female samples by XIST expression. Bean plots of the expression values of probe set 221728_x_at (XIST) in the NSCLC cohort GSE31210. A clear separation between low expression values in males (blue) and high expression values in females (red) can be observed. One sample is mislabelled

The expression levels of the four selected probe sets were evaluated in 35 additional datasets, including seven colon cancer, five other cancer, and seven non-cancer datasets containing samples from both male and female subjects, as well as eight breast cancer, four ovarian cancer, and four prostate cancer datasets. A plot of raw expression values for the probe set 201909_at (RPS4Y1) across all datasets showed high male–female classification accuracy per dataset, but large overall expression shifts between datasets (Fig. 2a). After normalizing expression values with a linear transformation to median values of 0 and 1 for the low and high expression groups, respectively (Suppl. material: Algorithm 2), expression levels were reliably comparable across cohorts (Fig. 2b).
Fig. 2

Improvement in comparability of cohorts by normalization. a Raw expression values of female (red) and male (blue) labelled samples for probe set 201909_at (RPS4Y1) across all datasets. b The same cohorts after normalization. Specifically, two outliers in datasets TRANSBIG and GSE22093 indicate two breast cancer patients with high RPS4Y1 expression, a feature clearly inconsistent with female sex
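
The linear transformation used for normalization can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 2: it assumes that a cohort contains both a low and a high expression group and that the two groups can be separated at the largest gap between sorted values. Handling of single-sex cohorts, which the paper's algorithm covers, is left out. The function names are hypothetical.

```python
from statistics import median

def split_low_high(values):
    """Split values into low/high groups at the largest gap (an assumption)."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    i = gaps.index(max(gaps))
    cut = (ordered[i] + ordered[i + 1]) / 2
    low = [v for v in values if v < cut]
    high = [v for v in values if v >= cut]
    return low, high

def normalize(values):
    """Map the low-group median to 0 and the high-group median to 1."""
    low, high = split_low_high(values)
    m_low, m_high = median(low), median(high)
    return [(v - m_low) / (m_high - m_low) for v in values]

# Two hypothetical cohorts measured on different overall scales.
cohort_a = [4.0, 4.2, 4.4, 9.0, 9.2, 9.4]
cohort_b = [2 * v + 3 for v in cohort_a]
norm_a, norm_b = normalize(cohort_a), normalize(cohort_b)
```

Because the transformation is linear in the two group medians, a cohort-wide shift or rescaling of the raw values leaves the normalized values unchanged, which is what makes cut points transferable between cohorts.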

In a final step, the four sex-specific probe sets were combined to categorize each sample as “correctly classified,” “misclassified,” or “unconfident” (Suppl. material: Algorithm 3). First, for each cohort and for each probe set, the expression values were clustered into two groups of low and high values and a normal distribution was fitted to the low expression group, estimating location and scale with robust measures (median and Rousseeuw–Croux estimator Qn (Rousseeuw and Croux 1993)). Next, the expression value of the probe set for each sample was compared to the 99.9 % quantile of the fitted normal distribution. A value above this cut point is inconsistent with the typical range for the low expression group and thus provides strong evidence that the corresponding sample belongs to the high expression group. For each individual sample, a female-evidence score was then defined for each of the two XIST probe sets. As high XIST expression is inconsistent with male sex, the female-evidence score was set to 1 if the corresponding XIST expression value was above the cut point. Analogously, for DDX3Y and RPS4Y1, respectively, a male-evidence score was set to 1 if the expression value of the probe set was above the corresponding cut point. Taking the evidence scores of all four probe sets into account, a sample was classified as male if at least one male-evidence score was 1 and both female-evidence scores were 0. Vice versa, a sample was classified as female if at least one female-evidence score was 1 and both male-evidence scores were 0. 
Finally, the new classifications were compared to the original sex annotations, categorizing each sample as “correctly classified,” “misclassified,” or “unconfident.” Samples with both at least one positive female-evidence score and at least one positive male-evidence score, or with no positive evidence score, were classified as “unconfident.”
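
The decision rule described above can be written down compactly. The following is a sketch rather than the paper's Algorithm 3: it assumes the low expression group has already been identified, uses the Qn scale estimate with its asymptotic consistency constant 2.2219 and without finite-sample correction, and all function names are hypothetical.

```python
from math import comb
from statistics import NormalDist, median

def qn_scale(x):
    """Rousseeuw-Croux Qn: constant times the k-th smallest pairwise distance."""
    h = len(x) // 2 + 1
    k = comb(h, 2)
    dists = sorted(abs(a - b) for i, a in enumerate(x) for b in x[i + 1:])
    return 2.2219 * dists[k - 1]

def cut_point(low_values, quantile=0.999):
    """Upper quantile of a normal fitted robustly to the low expression group."""
    fitted = NormalDist(median(low_values), qn_scale(low_values))
    return fitted.inv_cdf(quantile)

def evidence(value, cut):
    """Evidence score: 1 if the value lies above the cut point, else 0."""
    return int(value > cut)

def classify(female_scores, male_scores):
    """Combine the two XIST scores and the two Y-chromosomal scores."""
    female, male = any(female_scores), any(male_scores)
    if female and not male:
        return "female"
    if male and not female:
        return "male"
    return "unconfident"  # conflicting evidence or no evidence at all

def categorize(predicted, annotated):
    """Compare the expression-based call with the original annotation."""
    if predicted == "unconfident":
        return "unconfident"
    return "correctly classified" if predicted == annotated else "misclassified"
```

For example, a sample annotated as male whose two XIST values exceed their cut points while both Y-chromosomal values stay below theirs is called female and therefore categorized as “misclassified.”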

Results and discussion

The male–female classifier was applied to all 45 cohorts, categorizing 4913 patients (3034 females, 1879 males) (Fig. 3). In total 54 patients (1.1 %) were categorized as “misclassified” and 149 (3.0 %) were labelled “unconfident.” The direction of sex mislabeling was nearly balanced, with 29 female samples mislabeled as male and 25 male samples mislabeled as female. Overall, in 18 of the 45 cohorts (40 %) at least one “misclassified” sample was detected. The proportion of “correctly classified” samples was 100 % in 15 cohorts, below 90 % in five cohorts, and in between for the remaining 25 cohorts. Note that these numbers are probably overoptimistic, as 16 cohorts included in the study consisted of breast, ovarian, or prostate cancer patients, with lower risk of sex mislabeling. Still, one breast cancer patient in the cohort TRANSBIG (comprising node-negative untreated patients of GSE7390 and GSE6532) was classified as male (Fig. 3).
Fig. 3

Application of the male–female classifier to all cohorts, grouped by cancer type. Green “correctly classified,” red “misclassified,” and orange “unconfident” samples

The prevalence of sample identity inconsistencies in public data repositories can be anticipated to be at least twice as high as indicated by the male–female classifier, as mix-ups may also occur between samples from individuals of the same sex. To visualize the expression-based sex assignment per cohort, we plotted mean normalized expression values of the two X-chromosomal probe sets and the two Y-chromosomal probe sets against each other (Fig. 4). For most cohorts, two clearly distinguishable groups representing males and females can be recognized, and category assignment by visual inspection is well in agreement with our likelihood-based classifier.
Fig. 4

Visualization of the male–female classifier with mean expression values of the two probe sets for XIST on the x-axis and of DDX3Y and RPS4Y1 on the y-axis. The points represent individual patients. The point clouds on the left and right are characteristic for males and females, respectively. Colors indicate the classification category of the samples. Green “correctly classified,” red “misclassified,” and orange “unconfident.” a Results for the Uppsala cohort (GSE37745): one female patient is clearly mislabeled as male, and two samples are labeled “unconfident.” b Results for GSE33113 with clear discrimination between males and females and no sex misannotations. c Results for GSE5720 with two misclassified samples and a large number of samples classified as “unconfident.” d Results for a breast cancer dataset (TRANSBIG) with one male patient assigned to the category “misclassified”

A further error that may occur during tissue processing is sample duplication: the same sample is analyzed twice and the duplicate is erroneously labelled with the identification number of another patient. To identify such duplications, a correlation-based analysis strategy was applied. For each cohort, the 1000 probe sets with the highest variance across all samples were selected and Pearson correlation coefficients between all pairs of samples in the cohort were calculated. The correlation values were then sorted, and the largest gap between consecutive ordered values was used to separate duplicated measurements from pairs of measurements from different samples. In 15 of the 45 cohorts at least one duplicate was identified. In total 32 duplicates were detected. Comparing these duplicates with the results from the male–female classifier, nine of the 54 “misclassified” assignments (16.7 %) could be explained by duplicated measurements. The general impact of misannotated samples on gene expression is difficult to assess. 
To illustrate the relevance of misannotations in gene expression studies, we re-analyzed six lung cancer cohorts with available survival times. Prognostic relevance of a gene was determined by fitting a univariate Cox model (Cox 1972) to its expression values. The number of significant genes (p value <0.01; not FDR-adjusted) was first calculated for the original datasets. After removal of all unambiguously misannotated samples from the six datasets, 12–53 % of the previously significant genes were no longer significant. Conversely, the number of genes that became newly significant in the reduced sample sets was in the range of 9–39 % of the original number of significant genes (Table 4).
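
The correlation-based search for duplicated measurements described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the expression matrix is represented as a hypothetical dictionary mapping probe set IDs to lists of values (one per sample), and the split is placed at the largest gap between consecutive sorted correlations.

```python
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def find_duplicates(expr, k=1000):
    """Return (i, j, r) for sample pairs lying above the largest correlation gap.

    expr maps a probe set ID to its expression values across all samples.
    """
    # Keep the k probe sets with the highest variance across samples.
    probes = sorted(expr, key=lambda p: variance(expr[p]), reverse=True)[:k]
    n = len(expr[probes[0]])
    profiles = [[expr[p][s] for p in probes] for s in range(n)]
    # Correlations for all sample pairs, sorted in ascending order.
    pairs = sorted((pearson(profiles[i], profiles[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    gaps = [b[0] - a[0] for a, b in zip(pairs, pairs[1:])]
    split = gaps.index(max(gaps)) + 1
    return [(i, j, r) for r, i, j in pairs[split:]]

# Toy data: sample 3 is a noisy re-measurement of sample 0.
toy = {
    "p1": [1, 7, 3, 1.1], "p2": [5, 2, 8, 5.0], "p3": [2, 9, 1, 2.1],
    "p4": [8, 1, 2, 7.9], "p5": [3, 4, 9, 3.0], "p6": [9, 3, 5, 9.1],
}
duplicates = find_duplicates(toy)
```

Note that a gap rule of this kind always returns at least one pair, so in a cohort without duplicates the reported correlations still need to be inspected before any sample is flagged.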
Table 4

Results of univariate Cox models

| Dataset | No. of patients | No. of misannotations and duplications | No. of significant genes (original scenario) | Percentage of genes no longer significant after removal of the misannotated samples | Percentage of genes newly significant after removal of the misannotated samples |
| --- | --- | --- | --- | --- | --- |
| GSE37745 | 196 | 3 | 450 | 12.22 | 14.00 |
| Shedden | 443 | 14 | 1354 | 15.66 | 8.79 |
| GSE29013 | 55 | 1 | 419 | 15.51 | 14.32 |
| GSE4573 | 129 | 5 | 189 | 26.63 | 38.62 |
| GSE31547 | 50 | 1 | 318 | 50.51 | 23.27 |
| GSE19188 | 82 | 8 | 190 | 53.16 | 34,374 |

Results of univariate Cox models for six NSCLC datasets. Comparison between significant genes (p < 0.01) identified in the original cohort and significant genes identified in the reduced cohort after removal of misannotated and duplicated samples
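
The two percentage columns of Table 4 are both expressed relative to the number of genes that were significant in the original cohort. A minimal sketch of that bookkeeping is shown below; the function name and the gene identifiers are made up for illustration, and the Cox model fitting itself is not shown.

```python
def significance_turnover(original, reduced):
    """Percent of genes lost and gained, relative to the original gene count.

    original: genes significant in the full cohort
    reduced:  genes significant after removing misannotated samples
    """
    original, reduced = set(original), set(reduced)
    n = len(original)
    lost = 100 * len(original - reduced) / n    # no longer significant
    gained = 100 * len(reduced - original) / n  # newly significant
    return lost, gained

# Hypothetical gene lists, for illustration only.
lost, gained = significance_turnover({"A", "B", "C", "D"}, {"A", "B", "E"})
```

In this toy case two of the four original genes drop out (50 %) and one new gene appears (25 % of the original count).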

To elucidate the reason behind the sample mislabeling observed in our own non-small cell lung cancer cohort (GSE37745), one patient annotated as male in the original records and assigned as female by our classifier was re-analyzed. First, new DNA and RNA samples were prepared from the original biobanked tissue specimen. Male sex was then confirmed based on the analysis of STR marker distribution using the AmpFLSTR® Identifiler® PCR Amplification Kit according to the manufacturer’s instructions (Applied Biosystems, Foster City, USA), suggesting that sample mix-up in this case did not occur during sample collection and biobanking procedures. Subsequently, the gene expression array analysis was repeated for the misclassified sample and for five additional control samples from the previously analyzed cohort. The pairwise correlation between the new and old measurements of the misclassified sample was only 0.464, strongly indicating that the two measurements were derived from different individuals. In contrast, a striking correlation of 0.993 was detected between the original measurement of the misclassified sample and a sample from another female patient in the previously analyzed cohort. This high correlation suggests that the mRNA sample from one female patient had erroneously been measured twice in the previous analysis. A second duplicated measurement was detected, with a correlation of 0.990 between the expression values of two patients labelled as male. All correlations of the repeated control samples with the corresponding original measurements were high (correlation coefficients: 0.910–0.987). 
The rapidly increasing number of newly published results of microarray and RNA-seq experiments reveals that genome-wide expression data play an important role in translational research (Petermann et al. 2007; Verhaak et al. 2013). Therefore, quality control for gene expression measurements and clinical information on samples should be performed routinely before analyzing the data. Retrospective identification of misannotated samples is possible by a classifier-based computational strategy together with correlation analysis. In 18 of the 45 cohorts analyzed, at least one “misclassified” sample was detected. The easy-to-use classifier presented here, combined with correlation analysis to detect samples erroneously measured multiple times, helps to identify individual datasets that contain numerous discrepancies. Re-evaluation of gene expression array data demonstrated that sample mislabeling may have a considerable impact on the output of the statistical evaluation and allows inferences on the accuracy of biobanking. In conclusion, methods for identifying sample misannotations should be routinely included in the statistical analysis of clinical gene expression data.
References

1.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

Authors:  Peter A C 't Hoen; Marc R Friedländer; Jonas Almlöf; Michael Sammeth; Irina Pulyakhina; Seyed Yahya Anvar; Jeroen F J Laros; Henk P J Buermans; Olof Karlberg; Mathias Brännvall; Johan T den Dunnen; Gert-Jan B van Ommen; Ivo G Gut; Roderic Guigó; Xavier Estivill; Ann-Christine Syvänen; Emmanouil T Dermitzakis; Tuuli Lappalainen
Journal:  Nat Biotechnol       Date:  2013-09-15       Impact factor: 54.908

3.  MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects.

Authors:  Harm-Jan Westra; Ritsert C Jansen; Rudolf S N Fehrmann; Gerard J te Meerman; David van Heel; Cisca Wijmenga; Lude Franke
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

4.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies.

Authors:  Andrea H Bild; Guang Yao; Jeffrey T Chang; Quanli Wang; Anil Potti; Dawn Chasse; Mary-Beth Joshi; David Harpole; Johnathan M Lancaster; Andrew Berchuck; John A Olson; Jeffrey R Marks; Holly K Dressman; Mike West; Joseph R Nevins
Journal:  Nature       Date:  2005-11-06       Impact factor: 49.962

5.  Prognostically relevant gene signatures of high-grade serous ovarian carcinoma.

Authors:  Roel G W Verhaak; Pablo Tamayo; Ji-Yeon Yang; Diana Hubbard; Hailei Zhang; Chad J Creighton; Sian Fereday; Michael Lawrence; Scott L Carter; Craig H Mermel; Aleksandar D Kostic; Dariush Etemadmoghadam; Gordon Saksena; Kristian Cibulskis; Sekhar Duraisamy; Keren Levanon; Carrie Sougnez; Aviad Tsherniak; Sebastian Gomez; Robert Onofrio; Stacey Gabriel; Lynda Chin; Nianxiang Zhang; Paul T Spellman; Yiqun Zhang; Rehan Akbani; Katherine A Hoadley; Ari Kahn; Martin Köbel; David Huntsman; Robert A Soslow; Anna Defazio; Michael J Birrer; Joe W Gray; John N Weinstein; David D Bowtell; Ronny Drapkin; Jill P Mesirov; Gad Getz; Douglas A Levine; Matthew Meyerson
Journal:  J Clin Invest       Date:  2012-12-21       Impact factor: 14.808

6.  The human RPS4 paralogue on Yq11.223 encodes a structurally conserved ribosomal protein and is preferentially expressed during spermatogenesis.

Authors:  Alexandra M Lopes; Ricardo N Miguel; Carole A Sargent; Peter J Ellis; António Amorim; Nabeel A Affara
Journal:  BMC Mol Biol       Date:  2010-05-07       Impact factor: 2.946

7.  CD200 is induced by ERK and is a potential therapeutic target in melanoma.

Authors:  Kimberly B Petermann; Gabriela I Rozenberg; Daniel Zedek; Pamela Groben; Karen McKinnon; Christin Buehler; William Y Kim; Janiel M Shields; Shannon Penland; James E Bear; Nancy E Thomas; Jonathan S Serody; Norman E Sharpless
Journal:  J Clin Invest       Date:  2007-12       Impact factor: 14.808

8.  Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study.

Authors:  Kerby Shedden; Jeremy M G Taylor; Steven A Enkemann; Ming-Sound Tsao; Timothy J Yeatman; William L Gerald; Steven Eschrich; Igor Jurisica; Thomas J Giordano; David E Misek; Andrew C Chang; Chang Qi Zhu; Daniel Strumpf; Samir Hanash; Frances A Shepherd; Keyue Ding; Lesley Seymour; Katsuhiko Naoki; Nathan Pennell; Barbara Weir; Roel Verhaak; Christine Ladd-Acosta; Todd Golub; Michael Gruidl; Anupama Sharma; Janos Szoke; Maureen Zakowski; Valerie Rusch; Mark Kris; Agnes Viale; Noriko Motoi; William Travis; Barbara Conley; Venkatraman E Seshan; Matthew Meyerson; Rork Kuick; Kevin K Dobbin; Tracy Lively; James W Jacobson; David G Beer
Journal:  Nat Med       Date:  2008-07-20       Impact factor: 53.440

9.  Complex transcriptional control of the AZFa gene DDX3Y in human testis.

Authors:  M-A Rauschendorf; J Zimmer; R Hanstein; C Dickemann; P H Vogt
Journal:  Int J Androl       Date:  2011-02

10.  Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.

Authors:  Karl W Broman; Mark P Keller; Aimee Teo Broman; Christina Kendziorski; Brian S Yandell; Śaunak Sen; Alan D Attie
Journal:  G3 (Bethesda)       Date:  2015-08-19       Impact factor: 3.154
