
Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects.

Robin Gradin¹, Malin Lindstedt², Henrik Johansson¹.

Abstract

Many biological data acquisition platforms suffer from the inadvertent inclusion of biologically irrelevant variance in the analyzed data, collectively termed batch effects. Batch effects can complicate downstream analysis by lowering the power to detect biologically interesting differences and can in certain instances lead to false discoveries. They are especially troublesome in predictive modelling, where samples in training sets and test sets are often completely correlated with batches. In this article, we present BARA, a normalization method for adjusting batch effects in predictive modelling. BARA uses a few reference samples to adjust for batch effects in a compressed data space spanned by the training set. We evaluate BARA using a collection of publicly available datasets and three different prediction models, and compare its performance to existing methods developed for similar purposes. The results show that data normalized with BARA yield high and consistent prediction performance. Further, they suggest that BARA produces reliable performance regardless of the classifier examined. We therefore conclude that BARA has great potential to facilitate the development of predictive assays where test sets and training sets are correlated with batch.


Year: 2019 PMID: 30794641 PMCID: PMC6386283 DOI: 10.1371/journal.pone.0212669

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Data acquisition techniques designed to quantify biological signals from gene or protein expression are often associated with batch effects. Batch effects introduce biologically irrelevant variance into the obtained data, which can lower the power of subsequent analyses or lead to false discoveries [1-4]. This variance may arise from a variety of experimental parameters, including analysis date, sample processing or reagent quality [5]. Further, batch effects are not exclusive to high-throughput acquisition methods but are also observed in data from lower-throughput methods such as qPCR or NanoString nCounter technologies [6, 7]. The high incidence of batch effects across multiple biological platforms is one factor contributing to the relatively small number of diagnostic and prognostic biomarker signatures that have been implemented in clinical settings [6, 8]. Some measures have been shown to reduce the impact of batch effects. One such measure is to design experiments carefully so that the correlation between possible sources of technical variance and known biological factors is minimized. However, this is not possible for all types of experiments. In predictive modelling, for example, the correlation between biological factors and batches cannot be eliminated. This follows from the inherent nature of these experiments, where a fixed training set is often used to infer parameter values that are then used to predict subsequently acquired test sets. As a result, the samples in each test set are completely confounded with their batch, which can result in poor predictive performance on test sets [9]. Another option to reduce the impact of batch effects is to apply analytical methods to already obtained data. Many such methods have been designed, but most require prior knowledge of the biological factors of interest and low confounding between batches and the biological groups [10-12].
Examples of such methods are ComBat and surrogate variable analysis (SVA). ComBat is a supervised batch correction method that requires the sources of batch effects to be known. It is a location and scale method that uses an empirical Bayes approach to moderate the batch effect estimates, which makes it better equipped to handle small datasets [10]. In contrast to ComBat and other supervised batch effect adjustment methods, SVA does not require the sources of batch effects to be known. Instead, the biological factors of interest must be known and specified in the model. The initial step of SVA estimates and removes the variance associated with the known biological information. Latent structures are then identified in the residual matrix; these can either be removed to generate a cleaned dataset or be incorporated in subsequent significance analyses. Identified latent structures can contain information linked to batch effects, but they can also contain other sources of expression heterogeneity, such as biological factors not included in the initial model [11]. Both SVA and ComBat were originally developed for datasets in discovery studies, where the biological factors of interest and possible sources of batch effects are known. Because of this, they and other methods developed for similar purposes are not directly applicable to datasets generated in predictive settings. However, by making certain assumptions, the algorithms can be modified for use in predictive modelling. For ComBat, one must assume that the composition of the test sets is similar to that of the training set. This assumption can be violated when, for example, the size of the test set decreases, as shown in [13]. For SVA, one can assume that latent structures identified in the training set can also be identified in test sets. This assumption was used to develop the frozen SVA (fSVA) algorithm [14].
However, this assumption is not valid if the latent structures associated with batch effects differ between the training set and the test sets, which can lead to poor predictive performance, as shown in [13]. In general, for a normalization method to be applicable to a wide range of prediction problems, it should allow training sets and test sets to be correlated with batch. Further, the training set should not be altered when normalizing different test sets. Finally, it should ideally allow test sets to be acquired without the need to include a large number of reference samples. In this paper, we introduce Batch Adjustment by Reference Alignment (BARA) to adjust for batch effects in predictive modelling. The method has the advantage that only a few reference samples are needed to perform batch corrections. Also, rather than attempting to clean the data by removing batch effects from both the training set and the test sets, BARA aims to transform the test set to make it more similar to the training set. BARA performs the adjustment in a compressed data space spanned by the training set, thereby reducing the number of batch estimates that need to be made. We test BARA on a collection of 25 publicly available datasets and show that it consistently helps the classifiers achieve high prediction performance. We further show that the performance of BARA is better than or comparable to that of existing methods on the examined datasets. By reducing the negative impact of batch effects, BARA can facilitate the development of predictive assays.

Materials and methods

The R software environment was used to perform the analyses presented in this paper [15]. Figures were created with the R package ggplot2 [16]. In addition, the following R packages were used: reshape2, dplyr, stringr, data.table, magrittr, foreach, doParallel, e1071, randomForest, class and bapred [17-27]. The scripts used to generate the results, including the BARA algorithm, are available at: https://github.com/gradinetal2018/BARA.

Cross-study datasets

Twenty-five datasets compiled by Hornung et al. [13] were downloaded from ArrayExpress [28] (Table 1). The gene expression levels in all datasets were quantified with the Affymetrix GeneChip Human Genome U133 Plus 2.0 array. The raw data files (CEL files) were normalized using Single Channel Array Normalization (SCAN) [29]. For each dataset, duplicated samples were removed, and only samples with existing gender annotations were retained, so that all samples were annotated by gender/sex.

Table 1

Datasets used in the cross-study analysis.

Accession Number	Sample Size	Reference
E-GEOD-19722	46	[30]
E-GEOD-28654	112	[31]
E-GEOD-29623	65	[32]
E-GEOD-39084	70	[33]
E-GEOD-45216	31	[34]
E-GEOD-45670	38	[35]
E-GEOD-46474	40	[36]
E-GEOD-48278	57	[37]
E-GEOD-48350	83	[38, 39]
E-GEOD-48780	49	[40]
E-GEOD-49243	73	[41, 42]
E-GEOD-50774	21	[43]
E-GEOD-53224	53	[44]
E-GEOD-53890	41	[45]
E-GEOD-54543	30	[46]
E-GEOD-54837	226	[47]
E-GEOD-58697	124	[48]
E-GEOD-59312	79	[49]
E-GEOD-60028	24	[50]
E-GEOD-61804	325	[51]
E-GEOD-63626	63	[52]
E-GEOD-64415	209	[53]
E-GEOD-64857	81	[54]
E-GEOD-67851	31	[55]
E-GEOD-68720	97	[56]

The table describes each dataset’s accession number and the number of samples extracted from it.

Cross-study prediction evaluation

Cross-study prediction performance was used to evaluate BARA and to compare it to existing normalization methods. The normalization methods included in this analysis were: BARA, ComBat [10], FAbatch [21], fSVA exact [14], mean centering, ratio A, reference centering, reference ratio A and standardization. The reference centering method subtracts the reference samples' mean expression of each gene from all samples in the training set and the test set, respectively. Similarly, the reference ratio A method scales each gene in every sample of the training set and the test set by the mean expression of the respective reference samples. To examine the predictive performance on normalized data, each of the 25 datasets was iteratively used as a temporary training set. First, 3 samples from the same biological group were randomly selected as reference samples in the training set. Next, using all samples in the training set, the 500 most significant differentially expressed genes, comparing males to females, were identified using limma [57, 58]. Because all normalization methods had been adapted for use in predictive modelling, either as implemented in the bapred package [21] or through implementations in R, each method was first applied to the training set. Next, the transformed training set was used to define the prediction models. Three different prediction models were examined: k-nearest neighbors (kNN), random forest, and support vector machines (SVM). The prediction models were tuned using repeated cross-validation on the training set, with 3 repeats and 10 folds. The parameters resulting in the highest mean prediction performance, evaluated using the Matthews correlation coefficient (MCC), were selected to establish the final prediction model.
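The two reference-based baselines described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, each batch is assumed to be a list of samples (each a list of gene values), and reference ratio A is assumed to divide each gene by the reference mean, following the usual ratio-A convention. Each function would be applied separately to the training set and to every test set, each with its own reference samples.

```python
from statistics import fmean


def _reference_means(batch, ref_idx):
    """Per-gene mean expression of the reference samples in one batch."""
    n_genes = len(batch[0])
    return [fmean(batch[i][g] for i in ref_idx) for g in range(n_genes)]


def reference_centering(batch, ref_idx):
    """Subtract the reference samples' per-gene mean from every sample in the batch."""
    ref_means = _reference_means(batch, ref_idx)
    return [[x - m for x, m in zip(sample, ref_means)] for sample in batch]


def reference_ratio_a(batch, ref_idx):
    """Divide every gene by the reference samples' per-gene mean (assumed ratio-A convention)."""
    ref_means = _reference_means(batch, ref_idx)
    return [[x / m for x, m in zip(sample, ref_means)] for sample in batch]


# Toy batch: 3 samples x 2 genes; samples 0 and 1 act as references.
batch = [[1.0, 10.0], [3.0, 14.0], [5.0, 20.0]]
print(reference_centering(batch, [0, 1]))  # [[-1.0, -2.0], [1.0, 2.0], [3.0, 8.0]]
print(reference_ratio_a(batch, [0, 1]))
```

Because each adjustment uses only the reference samples of the batch being normalized, the training set is left unchanged when new test sets arrive.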
To account for variation among the samples acting as reference samples, the test set normalization and prediction procedure was repeated 10 times for each test set, using a different selection of reference samples in each iteration. More specifically, when classifying the samples in each test set, 3 samples from the same group as the reference samples in the training set were randomly selected from the test set. The normalization methods that did not rely on reference samples used all samples in the test set, including the 3 reference samples, while the reference-based normalization methods used only the reference samples to estimate the adjustment. Because information about the group of the reference samples could be considered leaked during the normalization procedure, the reference samples were removed from the test sets before the predictions were made. The final prediction performance for each test set was calculated as the median MCC of the 10 iterations. To obtain an overall prediction performance for each training set, the MCCs of the 24 test sets were averaged. A summarized prediction score for each normalization method and prediction model was calculated as the mean MCC over all training sets.
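The three-level summary above (median over repeated reference draws, mean over test sets, mean over training sets) can be written as a short Python sketch. The nested dictionary layout and the function name are assumptions made for illustration; the toy input uses three draws and two test sets instead of the 10 draws and 24 test sets used in the analysis.

```python
from statistics import mean, median


def summarize_mcc(mcc_runs):
    """Summarize cross-study MCCs.

    mcc_runs[train][test] is the list of MCCs from the repeated
    reference-sample draws for one training/test pair.
    """
    per_training_set = {}
    for train, tests in mcc_runs.items():
        # Median over reference draws per test set, then mean over test sets.
        per_training_set[train] = mean(median(draws) for draws in tests.values())
    # Summarized score: mean over all training sets.
    return per_training_set, mean(per_training_set.values())


# Toy input: 2 training sets, each evaluated on 2 test sets with 3 draws.
runs = {
    "A": {"B": [0.5, 0.7, 0.9], "C": [0.2, 0.4, 0.6]},
    "B": {"A": [1.0, 0.8, 0.6], "C": [0.0, 0.0, 1.0]},
}
per_train, overall = summarize_mcc(runs)
print(per_train, overall)
```

Using the median within a test set limits the influence of an unlucky reference draw, while the means over test and training sets give every dataset equal weight.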

Assessment of BARA’s dependence on the number of reference samples

To assess how the performance of BARA depends on the number of reference samples, the cross-study prediction approach described above was repeated six times, varying the number of reference samples from one to six. The predictive performances were summarized as described above.

The BARA algorithm

The BARA algorithm was created specifically for predictive modelling, where a fixed training set is used to classify test samples that may be affected by batch effects. Singular value decomposition (SVD) of the training set is used to identify the set of directions that captures the largest part of the variance in the data. This step compresses the data into a lower-dimensional space, which reduces the number of batch estimates needed and simultaneously decreases the complexity of the data. The training dataset, X, contains m samples in rows and n variables in columns. Each variable in X is centered by its mean value, and the centered matrix is decomposed using SVD to identify the directions in which the batch adjustment is performed:

$$X - \mathbf{1}_m \bar{x} = U S V^{T},$$

where $\bar{x}$ is a 1×n vector containing the column means of X, $\mathbf{1}_m$ is an m×1 vector of ones, U contains the left singular vectors, S the singular values, and V the right singular vectors. The number of retained dimensions, k, is an adjustable parameter. It can be set to a predetermined value, estimated with, for example, cross-validation, or determined by specifying an acceptable loss of variance in the training set, for example 10%. The centered training data is then multiplied by $V_k$, the first k columns of V, to obtain the transformed training set:

$$X_t = (X - \mathbf{1}_m \bar{x}) V_k.$$

The test dataset, Z, contains p samples in rows and the same n variables. Its variables are first centered by subtracting the mean values of the training data, and the result is projected onto the identified directions:

$$Z_t = (Z - \mathbf{1}_p \bar{x}) V_k.$$

A batch adjustment factor, $a_j$, is estimated for each retained dimension j using the reference samples present in both the training set and the test set. The adjustment factor of dimension j is obtained by comparing the mean value of the reference samples in the transformed test set for dimension j to the mean value of the reference samples in the transformed training set for dimension j:

$$a = \bar{z}^{\mathrm{ref}} - \bar{x}^{\mathrm{ref}},$$

where $\bar{z}^{\mathrm{ref}}$ and $\bar{x}^{\mathrm{ref}}$ are 1×k vectors containing the variable means of the reference samples in the transformed test set and the transformed training set, respectively. The transformed test data is then adjusted by the adjustment factors, and both datasets are reconstructed to the original data space. To restore the original level of expression, the mean values estimated from the training set are finally added to the reconstructed data:

$$X_k = X_t V_k^{T} + \mathbf{1}_m \bar{x}, \qquad Z_k = (Z_t - \mathbf{1}_p a) V_k^{T} + \mathbf{1}_p \bar{x}.$$

Predictions are then performed as usual: the model is built from the compressed training dataset, $X_k$, and predictions are made on the adjusted test set, $Z_k$.
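A minimal NumPy sketch of the BARA steps described above follows. It is an illustration under stated assumptions, not the authors' R code from the repository listed in the Materials and methods: the function names `bara_fit` and `bara_adjust` are hypothetical, k is passed in directly rather than selected, and the adjustment is assumed to be the mean shift of the reference samples (test minus training) subtracted in the compressed space.

```python
import numpy as np


def bara_fit(X, ref_idx, k):
    """Fit the compressed space on the training set X (m samples x n genes)."""
    center = X.mean(axis=0)                    # column means of X
    Xc = X - center                            # center each variable
    # Economy SVD of the centered data: Xc = U S V^T (Vt holds V^T).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                              # n x k: first k right singular vectors
    Xt = Xc @ Vk                               # transformed training set (m x k)
    return {
        "center": center,
        "Vk": Vk,
        "ref_mean": Xt[ref_idx].mean(axis=0),  # training reference means (length k)
        "Xk": Xt @ Vk.T + center,              # reconstructed training set used for modelling
    }


def bara_adjust(fit, Z, ref_idx):
    """Align a test batch Z (p samples x n genes) to the training set via its reference samples."""
    Zt = (Z - fit["center"]) @ fit["Vk"]            # center with training means, then project
    a = Zt[ref_idx].mean(axis=0) - fit["ref_mean"]  # adjustment factor per retained dimension
    return (Zt - a) @ fit["Vk"].T + fit["center"]   # shift, reconstruct, add training means back


# Toy check: the test batch is the training data plus a gene-wise batch shift.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
refs = [0, 1, 2]
fit = bara_fit(X, refs, k=5)
Z = X + rng.normal(size=50)                  # same samples, shifted by a batch effect
print(np.allclose(bara_adjust(fit, Z, refs), fit["Xk"]))  # True
```

In this toy case the batch effect is a constant shift and the same samples serve as references, so the adjusted test set reproduces the compressed training data exactly. In a real run, the prediction model would be trained on `fit["Xk"]` and applied to the output of `bara_adjust`, with the test references being separate physical samples of the same reference group.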

Normalization methods and parameter settings

The following methods were used through their implementations in the R package bapred: ComBat, FAbatch, fSVA exact, mean centering, ratio A and standardization. For FAbatch, the default parameter values were used, i.e. the number of factors to estimate for each batch was left unspecified, the preliminary probabilities were estimated using leave-one-out cross-validation, the maximum number of iterations was 100, and the maximum number of factors was 12. For fSVA exact, the algorithm parameter was set to the exact algorithm instead of the fast one, while the default values were used for the remaining parameters. For the other methods implemented in the bapred package, no additional parameter values could be specified. The two reference-based methods, reference mean centering and reference ratio A, were implemented in R. Reference mean centering subtracted, for each gene in a batch, the mean expression of that batch's reference samples, and reference ratio A scaled each gene's expression in a batch by the mean expression of that batch's reference samples. BARA was run with the loss parameter as the criterion for selecting the number of dimensions to retain, with the loss set to 10%. Thus, at most 10% of the variance in the training data was lost in the normalization.
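Selecting k from an acceptable variance loss can be sketched as follows. The function name is hypothetical; it assumes the singular values are sorted in decreasing order, as returned by SVD, and uses the fact that the variance captured along each direction is proportional to the squared singular value. It returns the smallest k whose discarded variance fraction does not exceed the loss threshold.

```python
def select_k(singular_values, max_loss=0.10):
    """Smallest k whose discarded variance fraction is at most max_loss.

    Assumes singular values sorted in decreasing order, as returned by SVD.
    """
    variances = [s * s for s in singular_values]  # variance along each direction ~ s^2
    total = sum(variances)
    retained = 0.0
    for k, v in enumerate(variances, start=1):
        retained += v
        if 1.0 - retained / total <= max_loss:
            return k
    return len(variances)


print(select_k([3.0, 2.0, 1.0, 0.5]))        # 2 (discards about 8.8% of the variance)
print(select_k([3.0, 2.0, 1.0, 0.5], 0.05))  # 3
```

A stricter loss threshold keeps more directions, so more of the training variance is preserved, but more adjustment factors must then be estimated from the same few reference samples.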

Prediction models

Three types of prediction models were used to assess the performance of the normalization methods in the cross-study analysis: random forest, kNN and SVM with a linear kernel. The prediction models were implemented using the R packages randomForest, class and e1071 [18, 22, 26], and were selected to include both linear and non-linear classifiers. For every training set, the prediction models were tuned to maximize the MCC using repeated cross-validation with 3 repeats and 10 folds. The following parameters and values were tuned for each prediction model:

kNN. Number of nearest neighbors: 1, 2, 3, 4, 5, 6, 7, 8, 9.
Random forest. mtry: 5, 7, 9, 10, 11, 13, 15, 17; ntree: 500, 1000, 1500, 2000, 2500, 3000.
SVM. Cost: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50.
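For reference, the MCC used for tuning and evaluation can be computed from the confusion matrix of a binary classifier, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), as sketched below. The function name and label values are illustrative assumptions; returning 0 when a marginal count is zero is a common convention, not something specified in the text.

```python
from math import sqrt


def mcc(y_true, y_pred, positive):
    """Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: define MCC as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


truth = ["F", "F", "M", "M"]
print(mcc(truth, ["F", "M", "M", "M"], positive="F"))  # about 0.577, i.e. 1 / sqrt(3)
print(mcc(truth, truth, positive="F"))                  # 1.0
```

MCC ranges from −1 to 1, and swapping which class is called positive leaves it unchanged, so the choice of reference sex does not affect the score.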

Results

To assess the performance of BARA and to compare it to existing normalization methods, cross-study predictions were examined. Because the datasets originated from separate studies, sex/gender, which was annotated for every retained sample, was used as the classification label. Fig 1 shows a PCA plot of the 25 datasets, which reveals clear signs of batch effects.

Fig 1

PCA plot of the datasets.

The figure shows the first two principal components after merging the acquired datasets. The samples are colored by the dataset from which they originated.

To assess the different normalization methods, each dataset was iteratively used as a training set to define a prediction model. Batch effects between the training sets and the test sets were adjusted with the examined normalization methods. Normalized test sets were classified with the trained prediction models, and MCCs were calculated to estimate the prediction performance. The resulting MCCs for the normalized and unnormalized data with the 3 prediction models are shown in Figs 2–4, and the mean performance for each normalization method and prediction model is given in Table 2. Figs 2–4 show that data normalized with BARA appear to yield consistent performance regardless of the prediction model examined. Further, the estimated MCCs are high with low variance. In fact, according to the performance scores in Table 2, BARA achieves the highest mean MCC of all examined normalization methods with each of the three prediction models. Lastly, BARA also improves the prediction performance compared to the unnormalized data.

Fig 2

Prediction performance, kNN.

The plot shows the predictive performances for the different methods when normalized data were classified with kNN models. The boxes represent the 25 MCCs obtained in the iterative exercise where each dataset was used as training set to classify the remaining datasets.

Fig 3

Prediction performance, random forest.

The plot shows the predictive performances for the different methods when normalized data were classified with random forest models. The boxes represent the 25 MCCs obtained in the iterative exercise where each dataset was used as training set to classify the remaining datasets.

Fig 4

Prediction performance, SVM.

The plot shows the predictive performances for the different methods when normalized data were classified with SVMs. The boxes represent the 25 MCCs obtained in the iterative exercise where each dataset was used as training set to classify the remaining datasets.

Table 2

Prediction performances.

Normalization Method	kNN	Random Forest	SVM
BARA	0.88 ± 0.20	0.80 ± 0.31	0.78 ± 0.28
ComBat	0.80 ± 0.30	0.60 ± 0.37	0.57 ± 0.36
FAbatch	0.65 ± 0.24	0.55 ± 0.30	0.54 ± 0.27
fSVA Exact	0.82 ± 0.27	0.74 ± 0.35	0.39 ± 0.29
Mean Centered	0.82 ± 0.26	0.63 ± 0.34	0.60 ± 0.31
None	0.81 ± 0.30	0.76 ± 0.34	0.43 ± 0.29
Ratio A	0.04 ± 0.07	0.60 ± 0.39	0.20 ± 0.10
Reference Centered	0.86 ± 0.21	0.70 ± 0.32	0.51 ± 0.31
Reference Ratio A	0.20 ± 0.26	0.71 ± 0.34	0.37 ± 0.16
Standardized	0.60 ± 0.30	0.66 ± 0.36	0.59 ± 0.35

Prediction performances for each normalization and prediction model. Each performance is given as the mean MCC ± the standard deviation.

Because all information that BARA uses to adjust for batch effects is estimated from the reference samples, we examined how the performance of the method changed as the number of reference samples was varied from one to six, repeating the cross-study prediction approach described above. Again, the performances were summarized by calculating the mean MCC for each run. The performance of each run is shown in Fig 5. The plot suggests that the performance of BARA on the examined datasets is robust to the number of reference samples used. However, the run with a single reference sample achieves the lowest mean MCC.

Fig 5

Performance of BARA as the number of reference samples is varied.

The points represent the mean MCCs obtained when each of the 25 datasets was iteratively used as the training set to classify the remaining datasets. The bars represent the standard deviation of the MCCs.


Discussion

Batch effects are a widespread problem that exists in most biological data acquisition platforms and hinders the development and implementation of promising biomarker signatures [1-5]. Batch effects are especially difficult to account for in predictive modelling, where the biological factors in the test set are completely confounded with batch. In this paper, we have introduced BARA, a normalization method for the adjustment of batch effects in predictive modelling, and compared it to existing methods. The aim of normalization methods applied in predictive modelling is to make prediction models and inferred information transferable to test sets affected by batch effects. Common strategies suggest using reference samples to estimate the perturbation induced by batch effects [9]. However, including multiple additional samples in each batch of test samples can be time consuming and add analytical costs. Therefore, normalization methods should ideally require few or no reference samples. BARA requires only a few reference samples: three were used in the comparison described above, and BARA also achieved robust performance when only a single reference sample was used. In general, however, the optimal number of reference samples is likely a trade-off between cost and accuracy. Because the reference samples are used to estimate mean adjustments for every batch in a compressed data space, additional reference samples should provide more precise estimates of the mean values. This was also observed when the number of reference samples used by BARA was varied (Fig 5). Because mean values are estimated, the ideal number of reference samples is also expected to differ between data acquisition techniques and datasets, depending on, for example, the variation between replicates.
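As a back-of-envelope illustration of why more reference samples give more precise mean estimates, assume that the r reference samples in a batch are independent replicates whose projections onto a retained dimension j have standard deviation $\sigma_j$, and let $\bar{z}_j$ denote their mean along that dimension. This is a simplifying assumption, not an analysis performed in this study. The standard error of the reference mean is then

$$\mathrm{SE}\left(\bar{z}_j\right) = \frac{\sigma_j}{\sqrt{r}},$$

which gives $\sigma_j$ for r = 1, approximately $0.577\,\sigma_j$ for r = 3, and approximately $0.408\,\sigma_j$ for r = 6. Doubling the number of reference samples from three to six therefore shrinks the standard error only by a factor of $\sqrt{2} \approx 1.41$, which illustrates the diminishing returns behind the trade-off between cost and accuracy.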
Further, even though random assignment of reference samples is not an ideal selection strategy, the performance of BARA could be considered stable, as indicated by the low variance in the performance metrics (Figs 2–4). Ideally, the reference samples should be a standardized sample, so that the main difference between reference samples in different batches is the batch effect. This could lead to better estimates with fewer reference samples. BARA estimates the batch adjustments in a compressed data space spanned by the training set. The compression is calculated with SVD, so each retained dimension is a linear combination of the original variables along a direction of maximum variance. SVD was chosen because we hypothesized that it would suit many datasets associated with predictive modelling, where biomarker signatures have been identified to optimize prediction performance; this suggests that low-dimensional representations of the data exist that capture large fractions of the important variance. Further, SVD is a well-known operation that offers a convenient way of compressing data, performing batch adjustment and reconstructing the original variables. Because SVD is used in BARA, multiple levels of compression can be selected for a dataset, depending on the number of directions retained. In the cross-study analyses described above, a loss of at most 10% of the variance in the training set was considered acceptable, and the smallest number of dimensions satisfying this condition was selected in the normalization steps. However, other approaches for selecting the number of dimensions to retain in a specific dataset could be pursued. For example, cross-validation performance could be compared as the number of retained dimensions is varied, or an external validation set affected by batch effects could aid the selection by better mimicking an actual prediction scenario.
The ability of BARA to restore predictive performance in datasets suffering from batch effects was assessed by cross-study predictions, and the obtained performance was compared to that obtained using existing normalization methods or no normalization. Because it is difficult to procure public datasets that contain training sets suitable for predictive modelling together with external test sets affected by batch effects, a previously compiled collection of datasets [13] was used, with sex/gender as the classification label. Each of the 25 datasets in the collection was successively designated as the training set. From the training set, the 500 most significant genes comparing the gene expression of females to males were identified, and prediction models were defined using kNN, random forest and SVM. Batch effects between the training sets and the test sets were adjusted with the examined normalization methods, and the samples in the test sets were classified. Figs 2–4 show the predictive performance after applying the examined normalization methods. The figures show that BARA produces high and consistent performance on the examined datasets, independent of the prediction model. The results further suggest that BARA outperforms the other examined normalization methods on these datasets, reaching the highest mean MCC values and consistently showing low variation in performance. The performance is also improved compared to using no normalization, which indicates that the BARA algorithm mitigates some of the negative consequences of batch effects in the studied datasets. It is also worth noting that genes associated with sex/gender are strong predictors, and the performance on the unnormalized data was often higher than that obtained with some of the normalization methods.
In fact, BARA was the only normalization method that improved the mean prediction performance compared to the unnormalized data with all three prediction models.

In conclusion, we have introduced a novel method to adjust for batch effects in predictive modelling and compared it to existing methods. We show that BARA improves the prediction performance on the examined datasets compared to applying no normalization. Further, the BARA-normalized datasets achieved higher or comparable prediction performance compared to datasets normalized with the other examined methods. These results suggest that BARA can be a useful tool for reducing the negative impact of batch effects in predictive modelling.

References

1. Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors: W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal: Biostatistics Date: 2006-04-21 Impact factor: 5.899

2. Molecular profiling of contact dermatitis skin identifies allergen-dependent differences in immune response.

Authors: Nikhil Dhingra; Avner Shemer; Joel Correa da Rosa; Mariya Rozenblit; Judilyn Fuentes-Duculan; Julia K Gittler; Robert Finney; Tali Czarnowicki; Xiuzhong Zheng; Hui Xu; Yeriel D Estrada; Irma Cardinale; Mayte Suárez-Fariñas; James G Krueger; Emma Guttman-Yassky
Journal: J Allergy Clin Immunol Date: 2014-04-25 Impact factor: 10.793

3. NFATc1 as a therapeutic target in FLT3-ITD-positive AML.

Authors: S K Metzelder; C Michel; M von Bonin; M Rehberger; E Hessmann; S Inselmann; M Solovey; Y Wang; K Sohlbach; C Brendel; T Stiewe; J Charles; A Ten Haaf; V Ellenrieder; A Neubauer; S Gattenlöhner; M Bornhäuser; A Burchert
Journal: Leukemia Date: 2015-04-14 Impact factor: 11.528

4. Airway basal cells of healthy smokers express an embryonic stem cell signature relevant to lung cancer.

Authors: Renat Shaykhiev; Rui Wang; Rachel K Zwick; Neil R Hackett; Roland Leung; Malcolm A S Moore; Camelia S Sima; Ion Wa Chao; Robert J Downey; Yael Strulovici-Barel; Jacqueline Salit; Ronald G Crystal
Journal: Stem Cells Date: 2013-09 Impact factor: 6.277

5. SELDI-TOF MS whole serum proteomic profiling with IMAC surface does not reliably detect prostate cancer.

Authors: Dale McLerran; William E Grizzle; Ziding Feng; Ian M Thompson; William L Bigbee; Lisa H Cazares; Daniel W Chan; Jackie Dahlgren; Jose Diaz; Jacob Kagan; Daniel W Lin; Gunjan Malik; Denise Oelschlager; Alan Partin; Timothy W Randolph; Lori Sokoll; Shiv Srivastava; Sudhir Srivastava; Mark Thornquist; Dean Troyer; George L Wright; Zhen Zhang; Liu Zhu; O John Semmes
Journal: Clin Chem Date: 2007-11-16 Impact factor: 8.327

6. Key differences identified between actinic keratosis and cutaneous squamous cell carcinoma by transcriptome profiling.

Authors: S R Lambert; N Mladkova; A Gulati; R Hamoudi; K Purdie; R Cerio; I Leigh; C Proby; C A Harwood
Journal: Br J Cancer Date: 2013-12-12 Impact factor: 7.640

7. Batch effects and the effective design of single-cell gene expression studies.

Authors: Po-Yuan Tung; John D Blischak; Chiaowen Joyce Hsiao; David A Knowles; Jonathan E Burnett; Jonathan K Pritchard; Yoav Gilad
Journal: Sci Rep Date: 2017-01-03 Impact factor: 4.379

8. Capturing heterogeneity in gene expression studies by surrogate variable analysis.

Authors: Jeffrey T Leek; John D Storey
Journal: PLoS Genet Date: 2007-08-01 Impact factor: 5.917

9. A molecular signature for the prediction of recurrence in colorectal cancer.

Authors: Lisha Wang; Xiaohan Shen; Zhimin Wang; Xiuying Xiao; Ping Wei; Qifeng Wang; Fei Ren; Yiqin Wang; Zebing Liu; Weiqi Sheng; Wei Huang; Xiaoyan Zhou; Xiang Du
Journal: Mol Cancer Date: 2015-02-03 Impact factor: 27.401

10. Exome sequencing identifies somatic gain-of-function PPM1D mutations in brainstem gliomas.

Authors: Liwei Zhang; Lee H Chen; Hong Wan; Rui Yang; Zhaohui Wang; Jie Feng; Shaohua Yang; Siân Jones; Sizhen Wang; Weixin Zhou; Huishan Zhu; Patrick J Killela; Junting Zhang; Zhen Wu; Guilin Li; Shuyu Hao; Yu Wang; Joseph B Webb; Henry S Friedman; Allan H Friedman; Roger E McLendon; Yiping He; Zachary J Reitman; Darell D Bigner; Hai Yan
Journal: Nat Genet Date: 2014-06-01 Impact factor: 38.330
