
Rare variants analysis using penalization methods for whole genome sequence data.

Akram Yazdani¹, Azam Yazdani², Eric Boerwinkle^3,4.

Abstract

BACKGROUND: Availability of affordable and accessible whole genome sequencing for biomedical applications poses a number of statistical challenges and opportunities, particularly related to the analysis of rare variants and sparseness of the data. Although efforts have been devoted to addressing these challenges, the performance of statistical methods for rare variants analysis still needs further consideration.
RESULTS: We introduce a new approach that applies restricted principal component analysis with convex penalization and then selects the best predictors of a phenotype by a concave penalized regression model, while estimating the impact of each genomic region on the phenotype. Using simulated data, we show that the proposed method maintains good power for association testing while keeping the false discovery rate low under a variety of genetic architectures. Illustrative data analyses reveal encouraging results for this method in comparison with other commonly applied methods for rare variants analysis.
CONCLUSION: By taking into account linkage disequilibrium and sparseness of the data, the proposed method improves power and controls the false discovery rate compared to other commonly applied methods for rare variant analyses.

Year: 2015 PMID: 26637205 PMCID: PMC4670502 DOI: 10.1186/s12859-015-0825-4

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

Background

Despite success in detecting associations of common variants with complex traits (www.genome.gov/gwastudies/), it has proven difficult to elucidate a comprehensive picture of the genetic architecture of risk factors and disease traits without considering the effects of both rare and common variants via whole exome or genome sequencing. Decreasing costs and increasing quality have made the discovery and genotyping of rare variants, defined here as variants with minor allele frequency (MAF) less than 0.05, more accessible across a large proportion of the genome and in large samples. As a result of the rapid expansion of human populations, very large numbers of rare variants are segregating, and these rare variants are relatively recent in origin [1, 2]. Detecting genotype-phenotype associations and identifying novel loci with rare variant-phenotype associations are challenging because single-variant statistical methods are inappropriate in this context, owing to the very large number of alleles and their low frequency. Furthermore, the absent or minimal effects of the majority of rare variants on a particular phenotype lead to a low signal-to-noise ratio and consequently to underfitting with multiple-variant models. Hence, there is considerable interest in statistical methods that combine information across multiple variants and thus reduce the cost of the large degrees of freedom in multivariate tests or of adjustment for extensive multiple testing [3-9]. However, simply combining information by pooling or collapsing does not take into account the direction of the variants’ effects on a phenotype, and alternative methods have been proposed that address this limitation (see, e.g., [10-17]). Furthermore, inclusion of large numbers of correlated variants may lead to overestimation of the overall effect of a region.
Transitioning from common variant analyses to rare variant analyses creates three challenges related to sparse data [18]. First, within an individual personal genome, the number of sites that differ from the reference genome is small relative to the total number of bases. Second, sequence data, unlike array-based genotype data, contain a large number of rare variants. In fact, about half of the variant alleles in a study sample are seen only one or two times [19]. Third, only a small subset of the variable sites is expected to influence a given trait of interest, and the rest are expected to be neutral.
This study presents a statistical and computational method tailored for sparse data and shows how it can be applied to whole genome sequence data to promote novel gene and rare variant discovery. We introduce a new method called Convex-Concave Rare variant Selection (CCRS), which includes both convex and concave penalization. We leverage the fact that rare variant data have low intrinsic dimensionality and are sparse. Hence, we project the variants into a full rank space with new coordinates, so that the new predictors carry more information than the original variants. We obtain these new coordinates using principal component analysis with a convex penalty that incorporates the sparsity assumption. CCRS improves on the sparse principal component (SPC) based method [20] in the context of rare variant analysis by selecting components according to their degree of association with a complex trait. To this end, we use a concave penalized regression model to select the most promising variants while simultaneously estimating their effects.

Method

The CCRS method is applicable to all variants, but in this presentation we focus on the analysis of rare variants because they pose special opportunities (i.e. large effect sizes) and challenges (i.e. sparsity). Assume we have detected and genotyped m rare variants $X=(x_1,\dots,x_m)$ in a sample of n individuals having a quantitative trait $y=(y_1,\dots,y_n)'$ measured on each individual. In a typical whole exome or genome sequencing scenario, m is several orders of magnitude larger than n. To combat over-determination, the typical analysis considers a subset of the variables at a time, defined by physical proximity (e.g. a window) instead of functional characteristic (e.g. an annotated gene or enhancer element), because the vast majority of the rare variants are in noncoding regions of the genome. Interpretation of the results requires adjusting for multiple comparisons using accepted experiment-wise error or false discovery rate methods. Assume that, for the kth subset of X, denoted by $X_k=(x_1,\dots,x_p)$ where $p<n$,

$$ y = \alpha \mathbf{1}_n + X_k \beta + T \gamma + \varepsilon, \tag{1} $$

where $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)'$ is an error vector with covariance matrix $\Sigma$; $\Sigma$ is an n×n diagonal matrix; $\alpha$ is the overall mean; T is an n×q covariate matrix, which includes non-genetic predictors such as age, sex and race; and $\beta$ and $\gamma$ are the p-vector of genetic effects and the q-vector of non-genetic effects, respectively. Although model (1) does not face the n≪p problem, the data lie in a lower-dimensional subspace due to dependency among rare variants [21, 22] (i.e. linkage disequilibrium, LD), and the coefficients are sparse because a large proportion of variants have small or no effects on the phenotype(s) of interest. Here, we introduce a new approach for rare variants analysis to address these two issues: LD and sparsity.

The CCRS approach

In rare variant analyses, the design matrix is more likely to be singular because of the LD structure in the population [21, 22]. In addition, due to low allele frequencies, there is little information about the association of each individual variant with a phenotype. Hence, applying a penalized regression model might not lead to identifying the true set of variants or genomic regions with nonzero effects. To bypass this difficulty, we project the genotype data into a full rank space in order to reparameterize the regression model. Principal component analysis (PCA) is an appropriate tool for addressing collinearity and utilizes the low rank structure of the covariance matrix. One drawback of PCA is its lack of straightforward interpretability. However, in rare variant analyses each single variant is uninformative and there is a need to aggregate information in a region in order to identify association with the trait of interest. An issue of concern when applying PCA in the context of rare variants is that PCA may lead to new coordinates that include many non-influential variants due to sparseness. Accounting for such sparsity facilitates identification of phenotype-influencing factors in each of the coordinates and also improves interpretability of the result because of the sparse loading matrix [23-25]. To accomplish this, we obtain a full rank approximation to the matrix X as

$$ \hat{X} = U D V' = \sum_{j=1}^{r} d_j u_j v_j', \qquad (U, D, V) = \arg\min_{U,D,V} \| X - U D V' \|_F^2, $$

by imposing constraints on the columns of V and U similar to [20],

$$ \|u_j\|_2^2 \le 1, \qquad \|v_j\|_2^2 \le 1, \tag{2} $$

$$ \|v_j\|_1 \le c, \tag{3} $$

where r is the rank of X, which is smaller than min(n,p); $\|\cdot\|_q$ denotes the $L_q$ norm and $\|\cdot\|_F$ the Frobenius norm; $D=\mathrm{diag}(d_1,\dots,d_r)$ is a diagonal matrix of singular values of the matrix X such that $d_1 \ge d_2 \ge \dots \ge d_r$; and $v_j$ and $u_j$ are the jth columns of V and U, respectively. The $L_1$ norm penalization, which is equivalent to $\sum_i |v_{ij}| \le c$, where $v_{ij}$ is the (i,j)th entry of V, provides sparse principal components, $UD$. This is an optimization problem equivalent to maximizing $u' X v$ with respect to u and v under constraints (2) and (3). This biconvex problem can be readily solved [26]. Therefore, we first fix u and obtain v when c is in the set of feasible solutions, $1 \le c \le \sqrt{p}$. We then obtain the optimal solution for u with v held fixed and, for j>1, subject to $u_j \perp u_1, u_2, \dots, u_{j-1}$. The optimal value of c, which determines the level of sparsity, can be obtained through a cross validation approach [27, 28].
By projecting the data into a lower dimensional space, we reduce the number of predictors in the model to the rank of the design matrix, which increases the degrees of freedom for hypothesis testing and aggregates information into fewer predictor variables, which helps alleviate one aspect of the low allele frequency challenge. These two features improve the power of identifying promising genetic regions influencing a phenotype of interest (see below). In this context, it is not appropriate to select only the first few principal components, as is usual in many applications; rather, we select the PCs based on their degree of association with the phenotype. To simultaneously measure the genotype-phenotype association and carry out variable selection, we consider a linear regression model including a concave penalization, with loss function

$$ L(\theta, \gamma) = \| y - \alpha \mathbf{1}_n - Z \theta - T \gamma \|_2^2 + \nu \sum_{j=1}^{r} |\theta_j|^{\kappa}, $$

where $Z = UD$ is the matrix of computed PCs with corresponding effect sizes $\theta = (\theta_1,\dots,\theta_r)'$, $\kappa \in (0,1)$, and $\nu$ is the regularization parameter. Without loss of generality, hereafter, we assume the overall mean is zero. This model is a form of Bridge regression and naturally yields a sparse estimate of $\theta$, in the sense that some components of $\theta$ may be shrunk exactly to zero [29, 30]. The choice of κ<1 leads to nonconvex minimization problems (see, e.g., [30-32]) and provides a much sparser solution than the well-known penalized regression, lasso, with κ=1 [33].
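The alternating scheme described above (fix u, update v under the L1 constraint, then update u under the orthogonality constraint) can be sketched as follows. This is a minimal illustration in the spirit of the penalized matrix decomposition of [20], not the authors' implementation: the function names, the bisection search for the soft-threshold level, the SVD-based initialization and the stopping rule are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator S(a, delta)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_bounded_unit_vector(a, c, n_bisect=60):
    """Unit-L2 vector proportional to S(a, delta), with delta >= 0 found by
    bisection so that its L1 norm does not exceed c (assumes c >= 1, a != 0)."""
    v = a / np.linalg.norm(a)
    if np.abs(v).sum() <= c:
        return v  # L1 constraint inactive: no thresholding needed
    lo, hi, best = 0.0, float(np.abs(a).max()), None
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        norm = np.linalg.norm(s)
        if norm == 0.0:
            hi = mid  # threshold too large, everything was zeroed
        elif np.abs(s).sum() / norm <= c:
            best, hi = s / norm, mid  # feasible: try a smaller threshold
        else:
            lo = mid
    if best is None:
        # Fallback (e.g. ties among the largest entries): keep one entry.
        best = np.zeros(a.shape[0])
        k = int(np.argmax(np.abs(a)))
        best[k] = np.sign(a[k])
    return best


def sparse_pcs(X, c, r, n_iter=200, tol=1e-10):
    """First r sparse components of X: returns U (n x r), d (r,), V (p x r)."""
    n, p = X.shape
    U, V, d = np.zeros((n, r)), np.zeros((p, r)), np.zeros(r)
    V0 = np.linalg.svd(X, full_matrices=False)[2]  # ordinary PCA loadings as start values
    for j in range(r):
        v = V0[j]
        for _ in range(n_iter):
            u = X @ v
            u -= U[:, :j] @ (U[:, :j].T @ u)  # keep u_j orthogonal to u_1..u_{j-1}
            u /= np.linalg.norm(u)
            v_new = l1_bounded_unit_vector(X.T @ u, c)
            done = np.linalg.norm(v_new - v) < tol
            v = v_new
            if done:
                break
        u = X @ v
        u -= U[:, :j] @ (U[:, :j].T @ u)
        d[j] = np.linalg.norm(u)
        U[:, j], V[:, j] = u / d[j], v
    return U, d, V
```

With `U, d, V = sparse_pcs(X, c, r)`, the matrix of sparse principal components used as regression predictors would be `Z = U * d`. A smaller `c` gives sparser loadings; in practice `c` would be tuned by cross validation, as described in the text.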

Result

A simulation study

To evaluate the performance of the CCRS method, we randomly selected 1000 regions from a real whole genome sequence data set available from the phs000668 study in dbGaP (http://www.ncbi.nlm.nih.gov). Each region includes 50 variants (50,000 rare variants in total) sequenced in 1456 individuals. In our experience, 50 variants are appropriate to capture the LD structure. As an example, Fig. 1 shows this LD structure for two regions of the genome.

Fig. 1

LD among variants with MAF <.05 in two different regions of the genome

We considered six different phenotypic effect scenarios (Table 1). We first randomly split the set of regions into two subsets to be influential regions and noninfluential regions. We then randomly selected 10 % of variants in each influential region to be causal variants with effect size +1 for Model-1 and Model-3 and with effect size ±1 for Model-2 and Model-4. In Model-5 and Model-6, the number of causal variants in a region is increased to 20 % of the total variants with different effect sizes randomly selected from U(0.5,1) and {U(−1,−0.5),U(0.5,1)}, respectively, where U denotes uniform distribution. Hence, we considered models with the same and also different effect directions.

Table 1

Six genotype effect scenarios considered in simulation studies

Model-1:	10 % of variants in influential regions are causal with effect size +1, while each one is correlated with some neutral variants in their region.
Model-2:	10 % of variants in influential regions are causal with effect size ±1, while each one is correlated with some neutral variants in their region.
Model-3:	10 % of variants in influential regions are causal with effect size +1, while they are uncorrelated with other variants in their region.
Model-4:	10 % of variants in influential regions are causal with effect size ±1, while they are uncorrelated with other variants in their region.
Model-5:	20 % of variants in influential regions are causal with effect size selected from U(0.5,1), while 10 % are correlated and 10 % are uncorrelated with other causal and neutral variants in their region.
Model-6:	20 % of variants in influential regions are causal with effect size selected from U(−1,−0.5) and U(0.5,1) while 10 % are correlated and 10 % are uncorrelated with other causal and neutral variants in their region.
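The effect-size scenarios in Table 1 can be sketched as below. This is only an illustration under stated assumptions: causal variants are drawn uniformly at random, so the LD-based choice of correlated versus uncorrelated causal variants is not reproduced; negative and positive signs are taken as equally likely; and the Gaussian noise with `noise_sd=1.0` is a placeholder, since the residual distribution is not given in the text. The function names are hypothetical.

```python
import numpy as np


def draw_effects(p, causal_fraction, uniform_size, mixed_sign, rng):
    """Effect vector for one influential region with p variants.

    causal_fraction: 0.10 (Model-1 to Model-4) or 0.20 (Model-5, Model-6)
    uniform_size:    False -> |effect| = 1, True -> |effect| ~ U(0.5, 1)
    mixed_sign:      False -> all positive, True -> random sign (assumed 50/50)
    """
    beta = np.zeros(p)
    n_causal = int(round(causal_fraction * p))
    causal = rng.choice(p, size=n_causal, replace=False)
    size = rng.uniform(0.5, 1.0, n_causal) if uniform_size else np.ones(n_causal)
    sign = rng.choice([-1.0, 1.0], size=n_causal) if mixed_sign else np.ones(n_causal)
    beta[causal] = sign * size
    return beta


def simulate_phenotype(G, beta, rng, noise_sd=1.0):
    """Quantitative trait y = G beta + noise (noise model is an assumption)."""
    return G @ beta + rng.normal(0.0, noise_sd, size=G.shape[0])
```

For a 50-variant region, `draw_effects(50, 0.10, False, True, rng)` would give a Model-2 or Model-4 style effect vector with five causal variants of size ±1.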

To obtain a better understanding of the effect of LD on the result of the analysis, we selected variants based on their correlations. In Model-1 and Model-2, the causal variants are correlated with some neutral variants in their regions, but in Model-3 and Model-4 they are uncorrelated. For Model-5 and Model-6, both correlated and uncorrelated variants are selected (10 % of each). In rare variants analysis, we are interested in identifying regions with significant effects on the phenotype, corresponding to the following set of hypotheses for each region:

$$ H_0: \beta = 0 \quad \text{versus} \quad H_1: \beta \neq 0. $$

To test these hypotheses, we calculated the likelihood ratio of the selected model based on CCRS to the null model, which does not include genotype variants. We evaluated the performance of the CCRS method compared to four other commonly applied methods: Collapsing [8], denoted here as Col, CAST [3], SKAT-O [17] and sparse principal regression (SPC) [20]. The collapsing method generates a binary variable for each region to represent whether the minor allele is observed. It then tests the association between the trait level and the new binary variable through a regression model. The CAST method sums over all variants in the region and leads to the single predictor $\sum_{j=1}^{p} x_j$. SKAT-O is a score based test,

$$ Q = (y - \hat{\mu})' K (y - \hat{\mu}), $$

when $\beta_j$ in (1) follows an arbitrary distribution with mean 0 and variance τ and pairwise correlation ρ between different $\beta_j$s. Here, $\hat{\mu}$ is the predicted mean of y under $H_0$; $K = X R_\rho X'$ is an n×n kernel matrix, with X the n×p genotype matrix of the region; $R_\rho = (1-\rho) I + \rho \mathbf{1}\mathbf{1}'$ is a p×p compound symmetric matrix, where I is the p×p identity matrix and $\mathbf{1} = (1,\dots,1)'$. To examine the impact of significance level on the false and true discovery rates, we considered both α=0.01 and 0.05 and calculated the false discovery rate (FDR) and true positive discovery rate (TPR), defined as

$$ \mathrm{FDR} = \frac{F}{R}, \qquad \mathrm{TPR} = \frac{T}{M}, $$

where F is the number of false positives; T is the number of true discoveries; R is the total number of significant regions; and M is the total number of regions.
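The two burden-style predictors described above, the binary collapsing variable (Col) and the CAST sum, can be written in a few lines. This is a sketch assuming genotypes are coded as minor-allele counts (0, 1 or 2) with one row per individual; the function names are hypothetical.

```python
def collapse_indicator(genotypes):
    """Col: 1 if the individual carries at least one minor allele in the region, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def cast_sum(genotypes):
    """CAST-style predictor: total minor-allele count over all variants in the region."""
    return [sum(row) for row in genotypes]


# Three individuals, four rare variants in one region (toy data).
region = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 2, 0, 0],
]
col = collapse_indicator(region)   # [1, 0, 1]
cast = cast_sum(region)            # [1, 0, 3]
```

Either vector would then be used as the single genetic predictor in a regression of the trait. Because both ignore the sign of individual effects, opposite effect directions within a region can cancel, which is the limitation noted in the Background.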
To select the best model based on the CCRS method, we set ν = 0.01 and κ = 0.5 after calculating the BIC of the model for different values of ν in {0.001, 0.005, 0.01, 0.02, 0.05}. Here, the BIC of the model is defined as

$$ \mathrm{BIC}(\nu) = \log\!\left( \frac{\| y - Z \hat{\theta}(\nu) \|_2^2}{n} \right) + d(\nu)\, \frac{\log n}{n}, $$

where d(ν) is the number of effective parameters, Z is the matrix of principal components, and $\hat{\theta}(\nu)$, the vector of estimated component effects, minimizes the penalized loss function (2.4) for a given ν [34]. A larger penalty parameter ν might be applied for problems with a larger number of variants in each region. The results of the simulation study for Model-1 and Model-2 are shown in Fig. 2. The Col method does not have sufficient power to detect the associated regions. The CAST method shows better performance at the level α=0.05 when the directions of effects are the same. At the level α=0.05, the CCRS method shows better performance than SKAT-O when the directions of effects are different. At the level α=0.01, CCRS and SKAT-O show nearly the same performance. It is clear from the figure that the CCRS method improves performance over the SPC method.
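The tuning step described above, fitting the concave penalized model over a grid of ν with κ = 0.5 and keeping the fit with the smallest BIC, can be sketched as follows. Several parts are assumptions made for illustration: the solver is a generic iteratively reweighted ridge (local quadratic approximation) scheme rather than the authors' algorithm, covariates are omitted, coefficients below `eps` in absolute value are set to zero, d(ν) is taken as the number of nonzero coefficients, and the BIC form log(RSS/n) + d log(n)/n is one common choice.

```python
import numpy as np


def bridge_regression(Z, y, nu, kappa=0.5, n_iter=200, eps=1e-6):
    """Approximately minimize ||y - Z theta||^2 + nu * sum_j |theta_j|^kappa."""
    theta = np.linalg.lstsq(Z, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        active = np.abs(theta) > eps
        theta[~active] = 0.0  # coefficients that reached zero stay out of the model
        if not active.any():
            break
        Za = Z[:, active]
        # Quadratic majorizer of |t|^kappa at the current value t0:
        # (kappa / 2) * |t0|^(kappa - 2) * t^2 + const
        w = 0.5 * kappa * np.abs(theta[active]) ** (kappa - 2.0)
        new = np.linalg.solve(Za.T @ Za + nu * np.diag(w), Za.T @ y)
        converged = np.max(np.abs(new - theta[active])) < 1e-10
        theta[active] = new
        if converged:
            break
    theta[np.abs(theta) <= eps] = 0.0
    return theta


def bic(Z, y, theta):
    """BIC with d = number of nonzero coefficients (assumed form)."""
    n = len(y)
    rss = float(np.sum((y - Z @ theta) ** 2))
    d = int(np.count_nonzero(theta))
    return np.log(rss / n) + d * np.log(n) / n


def select_nu_by_bic(Z, y, grid, kappa=0.5):
    """Return the (nu, theta) pair with the smallest BIC over the grid."""
    best = None
    for nu in grid:
        theta = bridge_regression(Z, y, nu, kappa)
        score = bic(Z, y, theta)
        if best is None or score < best[0]:
            best = (score, nu, theta)
    return best[1], best[2]
```

A call such as `select_nu_by_bic(Z, y, [0.001, 0.005, 0.01, 0.02, 0.05])` mirrors the grid quoted in the text. Note that the numerical scale of ν depends on how the loss is normalized, so the values are not directly comparable across implementations.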

Fig. 2

FDR vs. TPR for Model-1 (left panel) and Model-2 (right panel) at α=0.01 (∙) and α=0.05 (■), comparing the performance of CCRS with the commonly applied methods SPC, Col, CAST and SKAT-O

Figure 3 shows the result of the simulation analysis for Model-3 and Model-4. The CAST method for Model-3 and the Col method for both Model-3 and Model-4 show poor performance. In both models, CCRS shows noticeably better performance at both α levels. The influential regions in Model-1 through Model-4 have the same effect sizes on the phenotype. Hence, comparing Figs. 2 and 3 provides insight into the influence of LD between causal variants and neutral variants on the power and accuracy of selection. The FDR of the Col and CAST methods shows the largest differences between these two figures. The FDR of CCRS is robust to the correlation among causal and neutral variants in comparison to the other methods.

Fig. 3

FDR vs. TPR for Model-3 (left panel) and Model-4 (right panel) at α=0.01 (∙) and α=0.05 (■), comparing the performance of CCRS with the commonly applied methods SPC, Col, CAST and SKAT-O

Figure 4 shows the result of the analysis of Model-5 and Model-6, which include both correlated and uncorrelated effective variants. In the left panel (Model-5), SKAT-O shows a smaller FDR and a slightly smaller TPR than CCRS at the level α=0.05, although at the level α=0.01 CCRS shows better performance in terms of both FDR and TPR. When the directions of effects are different (Model-6, right panel), CCRS outperforms the other methods.

Fig. 4

FDR vs. TPR for Model-5 (left panel) and Model-6 (right panel) at α=0.01 (∙) and α=0.05 (■), comparing the performance of CCRS with the commonly applied methods SPC, Col, CAST and SKAT-O

The result of this simulation study shows that CCRS performs better and is more robust than the other methods under a variety of genetic architectures, and its advantage is much more prominent when the causal variants are not correlated with neutral variants in the region. Neglecting the presence of LD leads to overestimation of the overall effect of the regions. Although this overestimation might increase the power of detecting a region with some small effects that are correlated with some neutral variants, it increases the risk of missing more promising regions in the multiple hypothesis testing procedure.

Real data analysis

We analyzed sequencing data from the Atherosclerosis Risk in Communities (ARIC) study [35]. The data are described more fully in [19]. Briefly, 496 African-American individuals were whole genome sequenced at an average depth of 6.3-fold using an Illumina HiSeq 2000 and, after alignment, approximately 31 million high quality variants were called using SNPTools. We present here the result of an association analysis of rare and low frequency variants (MAF≤0.05) with log transformed Apolipoprotein A1 (ApoA1) levels. ApoA1 is a component of high density lipoprotein (HDL), which is associated with reduced risk of coronary heart disease [36, 37]. The protein promotes efflux of lipids, including cholesterol, from tissues to the liver for excretion [38]. The genotype data include 949,986 rare variants that are mostly in noncoding regions of the genome [19]. Therefore, we used a sliding window approach to define physical proximity (window). There are approximately 38 thousand consecutive windows, each including 50 rare variants and stepping 25 variants to the next window. Therefore, by design, the windows overlap and the results of consecutive windows are not independent. To detect associated regions potentially influencing plasma ApoA1 levels, we used the SPC, SKAT, SKAT-O, CAST and Col methods in addition to the CCRS method introduced here. To define the threshold for statistical significance taking into account multiple hypothesis testing, we ran 100,000 permutation tests over 100 windows. Based on the resulting threshold, $10^{-6}$, we detected one significantly associated region by the CCRS method. Figure 5 shows the p-values of 80 windows around this region. All of the approaches except the CAST and Collapsing tests have a peak in this region. The figure shows that CCRS maintains power for detecting a phenotype-influencing region while keeping the significance of the null or neutral regions low. This is an important property of CCRS that controls the false discovery rate.
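The window layout described above (50 variants per window, stepping 25 variants) can be checked with a short sketch. It assumes, for illustration only, that variants are indexed consecutively across the genome without regard to chromosome boundaries and that an incomplete final window is dropped; the function name is hypothetical.

```python
def sliding_windows(n_variants, size=50, step=25):
    """Half-open index ranges [start, start + size) of full windows."""
    return [(start, start + size)
            for start in range(0, n_variants - size + 1, step)]


windows = sliding_windows(949_986)
n_windows = len(windows)           # 37998, i.e. roughly 38 thousand
first, second = windows[0], windows[1]
shared = first[1] - second[0]      # 25 variants shared by neighbouring windows
```

Under these assumptions the 949,986 rare variants yield 37,998 windows, which agrees with the "approximately 38 thousand" quoted in the text, and each window shares half of its variants with the next one, which is why results of consecutive windows are not independent.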

Fig. 5

p-values of 80 windows around the selected window. The red points represent the p-values calculated by CCRS for the windows, and each curve is a smooth curve over the p-values calculated by one approach; yellow: CAST, red: CCRS, green: SKAT, blue: SKAT-O, black: Col, orange: SPC

The region contains the gene FAM78B, which is expressed at high levels in myocytes, fibroblasts and endothelial cells. Little is known about the function of FAM78B. However, within the promoter for FAM78B, three binding sites for the transcription factor PPARG and two binding sites for the transcription factor HNF1A have been identified (http://www.sabiosciences.com and [39]). Pi et al. [40] have shown a significant effect of PPARG on HDL and on ApoA1, the major protein component of HDL [41]. PPARs are also expressed in the cardiovascular system, for example in endothelial cells, vascular smooth muscle cells and monocytes/macrophages (see, e.g., [42]).

Discussion

We have introduced a new approach, CCRS, for the analysis of whole genome sequence data in order to identify regions of the genome (e.g. genes or other functional motifs) influencing a phenotype of interest. CCRS improves the power of identifying a set of variants associated with a phenotype by taking into account the sparseness and LD structure in the data. CCRS applies a concave penalized regression method after projecting the sequence variants, via sparse principal component analysis, into a full rank space that is more informative. By applying sparse PCA, CCRS aims to enhance the information in the predictors rather than to reduce dimension, as in the typical application of sparse PCA, which might increase the risk of missing important variants in rare variants analysis. Although the first step of the analysis (sparse PCA) is unsupervised, it does not increase the FDR in the second step of the analysis. Although the CCRS method can be applied to both common and rare variants, the focus of this analysis was on rare variants because of the role of these variants in phenotype variation. The CCRS method can also be easily extended to logistic regression and applied to case-control studies. Here, however, we investigated the performance of CCRS for quantitative traits, because the overwhelming majority of the literature focuses on case/control studies and there is a pressing need to develop methods for quantitative traits. Using simulated data, we show that the FDR of the CCRS method is smaller than that of other commonly applied genomic region-based tests, while it has higher power of identification in most situations. Furthermore, the FDR of CCRS is smaller and more robust to the LD structure in the region in comparison to the other methods. Because statistical tests for rare variants are typically region-based, there is a risk of overestimating the overall effect of regions by neglecting the LD between causal and neutral variants in the region.
Consequently, the risk of missing promising regions might increase through multiple hypothesis testing. Penalized regression and other shrinkage methods that have been introduced for sparse data applications can correctly select nonzero coefficients under specific conditions [43, 44]. Applying these approaches to large-scale genome sequence applications that include variants correlated through LD might not lead to the true set of variants with nonzero coefficients. Addressing this challenge is difficult in rare variant analyses because each individual variant by itself carries little information. To resolve this problem, CCRS reparameterizes the model via PCA restricted with L1 norm constraints to provide a full rank design matrix. Imposing L1 penalization in PCA generates a sparse loading matrix that renders the analysis interpretable. The CCRS method efficiently incorporates information from low frequency variants by generating new predictors that are much more informative. CCRS uses a concave penalized regression model to select the PCs most strongly associated with the phenotype of interest and, simultaneously, to estimate their effect sizes. The zero effect sizes can be uniquely identified due to the use of a full rank approximation of the design matrix. The advantage of the concave penalty term is that the rate of shrinkage gets smaller as the effect size increases. In other words, CCRS not only has the property of parsimony, it also avoids over-shrinking large effect sizes. Thus, CCRS maintains power for detecting phenotype-influencing regions while keeping the significance of the neutral regions low. As an example of a real data application, we used the CCRS method and genome sequence data to analyze plasma ApoA1 levels, and one region met the experiment-wise criterion for statistical significance.
The region contains the gene FAM78B, which is expressed at high levels in myocytes, fibroblasts and endothelial cells (http://www.proteinatlas.org/ENSG00000188859-FAM78B/tissue). In a real application, annotation of the non-coding regions should be integrated into the analysis, and replication in an independent sample would be the next step before considering the finding a novel discovery.
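The statement above that the rate of shrinkage of the concave penalty gets smaller as the effect size increases can be made explicit with a short worked derivation. The notation is an assumption for illustration: $\theta_j$ denotes the effect of the jth principal component, $\nu > 0$ the regularization parameter and $\kappa \in (0,1)$ the exponent of the penalty.

```latex
\[
  p_\nu(\theta_j) = \nu\,|\theta_j|^{\kappa},
  \qquad
  \frac{\partial p_\nu}{\partial |\theta_j|} = \nu\,\kappa\,|\theta_j|^{\kappa-1},
  \qquad \theta_j \neq 0 .
\]
% Since \kappa - 1 < 0, the marginal penalty decreases as |\theta_j| grows
% and diverges as |\theta_j| \to 0, which drives small effects to exactly zero.
\[
  \kappa = 0.5:\quad
  \nu\kappa\,|\theta_j|^{\kappa-1}\Big|_{|\theta_j| = 0.01} = 5\,\nu ,
  \qquad
  \nu\kappa\,|\theta_j|^{\kappa-1}\Big|_{|\theta_j| = 1} = 0.5\,\nu ,
\]
\[
  \text{lasso } (\kappa = 1):\quad
  \frac{\partial p_\nu}{\partial |\theta_j|} = \nu
  \quad \text{for all } \theta_j \neq 0 .
\]
```

With κ = 0.5, an effect of size 0.01 is therefore penalized at the margin ten times more heavily than an effect of size 1, whereas the lasso applies the same marginal penalty to both.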

Conclusions

Large-scale whole genome sequencing and high-powered computing are becoming more readily available and affordable. There is an emerging shift from sequencing and computing technologies toward study design, data processing algorithms, and statistical and informatics methods for extracting usable information from the very large amount of genome sequence data that are imminent. The CCRS method presented here for the first time is a practical, powerful and efficient method for taking into account the nature of whole genome sequence variation to identify regions of the genome influencing common complex risk factor phenotypes and diseases.

29 in total

1. Clan genomics and the complex architecture of human disease.

Authors: James R Lupski; John W Belmont; Eric Boerwinkle; Richard A Gibbs
Journal: Cell Date: 2011-09-30 Impact factor: 41.582

2. Pooled association tests for rare variants in exon-resequencing studies.

Authors: Alkes L Price; Gregory V Kryukov; Paul I W de Bakker; Shaun M Purcell; Jeff Staples; Lee-Jen Wei; Shamil R Sunyaev
Journal: Am J Hum Genet Date: 2010-05-13 Impact factor: 11.025

3. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes.

Authors: Matthew Zawistowski; Shyam Gopalakrishnan; Jun Ding; Yun Li; Sara Grimm; Sebastian Zöllner
Journal: Am J Hum Genet Date: 2010-11-12 Impact factor: 11.025

4. Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression.

Authors: Fang Han; Wei Pan
Journal: Genet Epidemiol Date: 2010-11 Impact factor: 2.135

5. Evolution and functional impact of rare coding variation from deep sequencing of human exomes.

Authors: Jacob A Tennessen; Abigail W Bigham; Timothy D O'Connor; Wenqing Fu; Eimear E Kenny; Simon Gravel; Sean McGee; Ron Do; Xiaoming Liu; Goo Jun; Hyun Min Kang; Daniel Jordan; Suzanne M Leal; Stacey Gabriel; Mark J Rieder; Goncalo Abecasis; David Altshuler; Deborah A Nickerson; Eric Boerwinkle; Shamil Sunyaev; Carlos D Bustamante; Michael J Bamshad; Joshua M Akey
Journal: Science Date: 2012-05-17 Impact factor: 47.728

6. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST).

Authors: Stephan Morgenthaler; William G Thilly
Journal: Mutat Res Date: 2006-11-13 Impact factor: 2.433

7. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators.

Authors:
Journal: Am J Epidemiol Date: 1989-04 Impact factor: 4.897

8. A new testing strategy to identify rare variants with either risk or protective effect on disease.

Authors: Iuliana Ionita-Laza; Joseph D Buxbaum; Nan M Laird; Christoph Lange
Journal: PLoS Genet Date: 2011-02-03 Impact factor: 5.917

9. A groupwise association test for rare mutations using a weighted sum statistic.

Authors: Bo Eskerod Madsen; Sharon R Browning
Journal: PLoS Genet Date: 2009-02-13 Impact factor: 5.917

10. ENCODE whole-genome data in the UCSC Genome Browser.

Authors: Kate R Rosenbloom; Timothy R Dreszer; Michael Pheasant; Galt P Barber; Laurence R Meyer; Andy Pohl; Brian J Raney; Ting Wang; Angie S Hinrichs; Ann S Zweig; Pauline A Fujita; Katrina Learned; Brooke Rhead; Kayla E Smith; Robert M Kuhn; Donna Karolchik; David Haussler; W James Kent
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971
