
A novel approach to identify candidate prognostic factors for hepatitis C treatment response integrating clinical and viral genetic data.

Alicia Amadoz¹, Fernando González-Candelas².

Abstract

The combined therapy of pegylated interferon (IFN) plus ribavirin (RBV) has long been the standard treatment for patients infected with hepatitis C virus (HCV). In the case of genotype 1, only 38%–48% of patients have a positive response to the combined treatment. In previous studies, viral genetic information has been occasionally included as a predictor. Here, we consider viral genetic variation in addition to 11 clinical and 19 viral population and evolutionary parameters to identify candidate baseline prognostic factors that could be involved in the treatment outcome. We obtained potential prognostic models for HCV subtypes 1a and 1b in combination as well as separately. We also found that viral genetic information is relevant for the combined treatment assessment of patients, as the potential prognostic model of joint subtypes includes 9 viral-related variables out of 11. Our proposed methodology fully characterizes viral genetic information and finds a combination of positions that modulate inter-patient variability.


Keywords: data integration; evolutionary genetics; genetic variability; hepatitis C virus; prognostic model; treatment response

Year: 2015 PMID: 25780333 PMCID: PMC4344356 DOI: 10.4137/EBO.S20853

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

Hepatitis C virus (HCV) causes a disease that affects the human liver in more than 185 million people worldwide, 170 million of whom are estimated to be chronic patients with increasing risk of developing cirrhosis and cancer of the liver.1 The World Health Organization (WHO) estimates that between 3 and 4 million people are infected each year and about 70% of them will develop chronic hepatitis.2 HCV belongs to the genus Hepacivirus from the Flaviviridae family and includes seven genotypes and more than 60 identified subtypes.3 The distribution of HCV genotypes varies geographically,4 with genotype 1a (the prototype genotype) being common in the United States and Western Europe and genotype 1b being widely distributed worldwide. Although several strategies against HCV are being developed,5 the combined therapy of pegylated interferon (IFN) plus ribavirin (RBV) has long been the standard treatment for patients infected with HCV. IFN treatment is effective in 39% of patients,6 but when combined with RBV, it is effective in more than 60% of patients.7 However, there are differences in the response to treatment among the viral genotypes. In the case of genotype 1, about 48% of patients have a positive response to the combined treatment,8 but in the case of genotypes 2 and 3, it is about 80%.7 Moreover, the cost of HCV treatment is high,9 it has numerous side effects, and it might not be appropriate for some patients.10 The selective pressure of drugs, the high replication rate of HCV, and its low replication fidelity are the main viral causes of treatment resistance. 
It is estimated that, on average, one nucleotide change is produced per replication cycle.11 The identification of specific mutations and genetic patterns responsible for clinical phenotypes would improve diagnosis and treatment of patients.12 Some studies have revealed that HCV’s genetic variability contributes to its escape from the patient’s immune response.13,14 HCV variability is not distributed evenly along its genome, and its effect on treatment differs among genome regions.15 It has been established that the greater the immune pressure in a region, the higher the genetic variability,16 and, therefore, most studies of the treatment–variability relationship have focused on these genome regions. On the other hand, it has been suggested that it is the overall genome variability that influences treatment response.17

Treatment response

Different HCV treatment response types have been established depending on the number of weeks until HCV-RNA levels in serum or plasma are not detectable.7 A rapid virological response (RVR) appears at treatment week 4, an early virological response (EVR) appears at treatment week 12, an end-of-treatment response (ETR) appears at the end of 24 or 48 weeks of treatment, and a sustained virological response (SVR) appears at 24 weeks after cessation of treatment. It has been observed that the latter type of response depends mainly on the viral genotype.18 In the case of HCV genotype 1, patients treated for 48 weeks have an SVR rate of 38%–48%.8 On the other hand, a patient is considered a non-responder if HCV-RNA clearing from serum fails after 24 weeks of therapy, or a relapser when HCV-RNA reappears in serum after therapy is discontinued. Personalized therapy of HCV infection is a common practice due to the diversity of disease progression.19 Identifying, before treatment starts, which patients will or will not respond would increase therapeutic efficacy and reduce personal suffering. Viral, environmental, treatment, and host factors play important roles in the outcome of HCV infection and treatment response.20–22 Several studies on treatment–outcome prediction have taken into account factors measured before treatment.23–27 In general, these studies take into account variables that describe the patient from clinical (alanine transaminase levels, viral load in serum, kidney biopsy, etc) and demographic (age, sex, habits, etc) points of view and also include some viral variables such as genotype, substitutions in the core and interferon sensitivity determining region (ISDR), and some variability parameters of the E1E2 region. It has been shown that the factors most consistently associated with treatment outcome are the viral genotype and viral load.28

Phylogenetic predictors

A methodology to detect candidate genetic polymorphisms influencing clinical outcome from pathogen genomes29 uses well-supported clades in a phylogeny as statistical predictors. Differences between clades were not well defined at the viral subtype level in our dataset; therefore, the statistical power of the previous methodology was diluted. Here, we propose an alternative methodology that overcomes the lack of statistical support at the phylogenetic subtype level but considers major determinants of genetic variation in the viral genome to help in the prediction of treatment response.

Materials and Methods

Clinical and epidemiological data were retrieved together with viral sequences from the local epiPATH bioinformatics platform.30

Patients

Serum samples were obtained from 49 patients infected with HCV genotype 1a (17 patients, of whom 11 had a positive response and 6 had a negative response) and genotype 1b (32 patients, of whom 12 had a positive response and 20 had a negative response).31,32 In summary, our sample included 23 patients with a positive response and 26 patients with a negative response. All patients provided written consent to be included in the study, which was approved by the corresponding ethics committees of the institutions involved (Hospital General de Valencia, Hospital Clínico Universitario de Valencia, and Hospital General de Alicante). The research was conducted in accordance with the principles of the Declaration of Helsinki. Treatment response assessment was done by the institutions involved with the following criteria: Positive response: absence of HCV-RNA in serum or a >2 log viral load decline relative to the basal viral load at week 12. Negative response: no positive response, or presence of HCV-RNA in serum after week 12. The following demographic, clinical, and treatment variables were included in this study (Table 1): age, sex, Knodell index,33 the ratio between glutamic-oxaloacetic transaminase and glutamic-pyruvate transaminase (GOT/GPT) serum levels, alanine transaminase (ALT) serum levels, treatment duration, completed treatment, number of treatment, IFN dose, and RBV dose.

Table 1

Demographic, clinical, and treatment factors.

VARIABLE	VALUES	PATIENTS (N = 49)
Outcome	Positive (%)	23 (46.94%)
	Negative (%)	26 (53.06%)
Age	Years	43.61 ± 12.122 (23;73)
Sex	Male (%)	34 (69.4%)
	Female (%)	15 (30.6%)
Knodell index		8 ± 3.403 (1;17)
GOT/GPT		0.601 ± 0.323 (0.3;2.3)
ALT		122.67 ± 74.463 (24;361)
Treatment duration	months	11.27 ± 1.987 (6;12)
Completed treatment	Yes (%)	43 (87.8%)
	No (%)	6 (12.2%)
Number of treatment	1 (%)	28 (57.1%)
	2 (%)	20 (40.8%)
	3 (%)	1 (2%)
IFN dose	3 mU/3tpw (%)	40 (81.6%)
	5 mU/3tpw (%)	3 (6.1%)
	90 g/day (%)	1 (2%)
	100 g/day (%)	3 (6.1%)
	120 g/day (%)	2 (4.1%)
RBV dose	mg/day	1040.82 ± 122.336 (800;1200)

Notes: Age, Knodell index, GOT/GPT, and ALT were measured at baseline. Data are shown as mean ± standard deviation unless stated otherwise.
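The week-12 response criteria stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by the participating hospitals: the function name, the argument names, and in particular the `detection_limit` value (which stands in for "absence of HCV-RNA in serum") are hypothetical and do not come from the study.

```python
import math


def classify_week12_response(baseline_load, week12_load, detection_limit=50.0):
    """Classify a patient as 'positive' or 'negative' at treatment week 12.

    Positive: HCV-RNA not detectable, or a >2 log10 decline in viral load
    relative to baseline. The detection limit is an assumed placeholder;
    viral loads must be in the same units and baseline_load must be > 0.
    """
    if baseline_load <= 0:
        raise ValueError("baseline_load must be positive")
    if week12_load < detection_limit:
        # Treated as "absence of HCV-RNA in serum" (assumed threshold).
        return "positive"
    log_decline = math.log10(baseline_load) - math.log10(week12_load)
    return "positive" if log_decline > 2 else "negative"
```

For example, a drop from 1,000,000 to 5,000 units is a decline of about 2.3 log and would count as positive, whereas a drop to 50,000 units (about 1.3 log) would count as negative.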

HCV sequences

The host’s immune pressure affects HCV variants, but not equally across all viral genome regions.32 Two of the most studied regions in relation to genetic variability and treatment response are the E1E2 and NS5A viral regions. Independently of the genotype, viral factors included in our study were calculated using a high number of partial sequences from both regions described elsewhere34 to capture the amount of variability of the viral quasispecies. Specifically, we used 100 sequences per patient of 472 nucleotides from the E1E2 region (nucleotides 1310–1781 of HCV genome reference sequence with accession number D50481), which includes three hypervariable regions (HVR-1, HVR-2, and HVR-3) but does not include the E2-PePHD region. Regarding the NS5A region, we used between 25 and 96 sequences per patient of 743 nucleotides (nucleotides 6742–7484 in the HCV genome reference sequence), which includes the ISDR, protein kinase R binding domain (PKR-BD), and variable region 3 (V3) regions that have been related to the combined treatment outcome.35 Therefore, viral genome variability from about 7,500 sequences was summarized in the viral factors. RNA extraction, reverse transcription, amplification, cloning, and sequencing are explained in detail elsewhere.36 Briefly, after viral RNA extraction (High Pure Viral RNA Kit; Roche, Mannheim, Germany), reverse transcription reactions were performed with random hexadeoxynucleotides in order to prevent any bias during reactions due to nonspecific oligonucleotides. Amplified DNA products for each region were purified with High Pure PCR product Purification Kit (Roche) and directly cloned into EcoRV-digested pBluescript II SK (+) phagemid (Stratagene, Heidelberg, Germany). Plasmid DNA was purified with High Pure Plasmid Isolation Kit (Roche). Cloned products for E1E2 region or NS5A region were sequenced using vector-based primers KS and SK (Stratagene). 
The following intra-patient viral variables per region were included in this study (Table 2): genotype, number of non-synonymous substitutions per non-synonymous site (dN), number of synonymous substitutions per synonymous site (dS), dN/dS ratio, total number of mutations (η), number of segregating sites (S), GC content (GC), haplotype diversity that accounts for the allele combinations of a genetic region (H), H2, H3, nucleotide diversity (π), π2, π3, mean number of nucleotide differences (K), number of sites under positive selection (pts), and the most relevant parameters of neutrality tests: Tajima’s D, Fu and Li’s D*, Fu and Li’s F*, and Fu’s Fs. Genotype information was retrieved from the local epiPATH bioinformatics platform; polymorphism parameters were calculated with DnaSP,37 and sites under positive selection were identified with CODEML.38

Table 2

Viral variables.

GENOTYPE	SUBTYPES	PATIENTS (N = 49)
1	1a (%)	17 (34.7%)
	1b (%)	32 (65.3%)
VARIABLE	E1E2	NS5A
dN	0.017 ± 0.013 (0;0.044)	0.005 ± 0.005 (0;0.022)
dS	0.041 ± 0.034 (0;0.122)	0.0379 ± 0.032 (0;0.128)
dN/dS	0.671 ± 1.238 (0;8.277)	0.147 ± 0.103 (0;0.479)
η	77.224 ± 43.394 (2;153)	77.449 ± 53.659 (2;177)
S	70.939 ± 38.698 (2;134)	74.224 ± 50.569 (2;163)
GC	0.594 ± 0.011 (0.569;0.621)	0.603 ± 0.008 (0.588;0.621)
H	0.863 ± 0.23 (0.039;0.999)	0.833 ± 0.285 (0.066;1)
H²	0.797 ± 0.285 (0.002;0.998)	0.774 ± 0.332 (0.004;1)
H³	0.75 ± 0.315 (0;0.997)	0.736 ± 0.351 (3e⁻⁴;1)
π	0.022 ± 0.016 (0;0.058)	0.013 ± 0.011 (9e⁻⁵;0.04)
π²	0.001 ± 0.001 (0;0.003)	2.93e⁻⁴ ± 4.02e⁻⁴ (8.1e⁻⁹;0.002)
π³	3.01e⁻⁵ ± 4.45e⁻⁵ (0;2e⁻⁴)	7.95e⁻⁶ ± 1.51e⁻⁵ (0;7.33e⁻⁵)
k	10.458 ± 7.74 (0.039;27.33)	9.782 ± 8.183 (0.07;31.099)
D	−0.993 ± 0.985 (−2.58;0.89)	−1.418 ± 0.678 (−2.43;0.07)
D*	−1.908 ± 1.627 (−5.45;1.88)	−2.567 ± 1.363 (−5.24;0.27)
F*	−1.847 ± 1.507 (−5.21;1.76)	−2.546 ± 1.218 (−4.92;−0.17)
Fs	−46.341 ± 32.161 (−124.662;1.256)	−26.647 ± 19.965 (−85.549;1.662)
pts	3.388 ± 3.523 (0;13)	0.878 ± 1.409 (0;5)

Note: Data are shown as mean ± standard deviation (range) unless stated otherwise.
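To make some of the intra-patient variables concrete, the following Python sketch computes the number of segregating sites (S), the mean number of pairwise nucleotide differences (K), the nucleotide diversity (π), and the haplotype diversity (H) from a set of aligned sequences, using the standard textbook definitions. It is not the DnaSP implementation used in the study: the function name is hypothetical, and gaps or ambiguous bases are treated as ordinary characters here, which DnaSP may handle differently.

```python
from collections import Counter
from itertools import combinations


def diversity_summary(seqs):
    """Return S, K, pi and H for a list of equal-length aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: alignment columns showing more than one character state.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)

    # K: mean number of differences over all pairs of sequences.
    pair_diffs = [sum(a != b for a, b in zip(s, t))
                  for s, t in combinations(seqs, 2)]
    k = sum(pair_diffs) / len(pair_diffs)

    # pi: K expressed per site.
    pi = k / length

    # H: haplotype diversity, n/(n-1) * (1 - sum of squared haplotype freqs).
    freqs = [count / n for count in Counter(seqs).values()]
    hap_div = n / (n - 1) * (1 - sum(f * f for f in freqs))

    return {"S": seg_sites, "K": k, "pi": pi, "H": hap_div}
```

Under these definitions, the squared and cubed terms listed in Table 2 (H², H³, π², π³) would simply be powers of the returned values.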

Sequence similarity

In addition to population and evolutionary viral parameters, we obtained a sequence similarity measure at the molecular level. We used a consensus viral protein sequence per region and patient (72 patients, including 49 whose clinical data were available) obtained before treatment. Then, we aligned the consensus sequences and filtered the variable positions per genotype and viral region. We applied a multiple correspondence analysis (MCA) per dataset with SPSS 13.0 statistical software39 considering each amino acid position as a variable and its amino acid type as its value in terms of 20 characters (one per amino acid). Therefore, patients were grouped using the functional-related polymorphisms detected in the viral sequence. This analysis allowed us to reduce the viral sequence information into fewer variables while preserving its patient variability. We selected seven dimensions for E1E2 and 12 dimensions for NS5A following Cattell’s criteria.40 Dimensions were included as viral variables in the treatment–response modeling methodology.
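The data preparation that precedes an MCA of this kind can be sketched as follows: keep only the variable alignment positions and encode each patient's consensus sequence as an indicator (one-hot) matrix with one column per observed position and amino acid pair. This is a minimal illustration, not the SPSS procedure used in the study; the function names are hypothetical, and the correspondence analysis itself (the decomposition of this matrix into dimensions) is not shown.

```python
def variable_positions(consensus_seqs):
    """Indices of alignment columns where patients differ."""
    return [i for i, col in enumerate(zip(*consensus_seqs))
            if len(set(col)) > 1]


def indicator_matrix(consensus_seqs):
    """One-hot encode the variable positions of aligned consensus sequences.

    Returns (columns, rows): columns is a list of (position, amino_acid)
    pairs, and rows holds one 0/1 vector per patient.
    """
    positions = variable_positions(consensus_seqs)
    columns = [(i, aa)
               for i in positions
               for aa in sorted({s[i] for s in consensus_seqs})]
    rows = [[1 if s[i] == aa else 0 for i, aa in columns]
            for s in consensus_seqs]
    return columns, rows
```

Each row sums to the number of variable positions, because every patient carries exactly one amino acid at each position.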

Statistical methodology

The treatment response was modeled using a logistic regression with 49 patients and 66 variables, including viral, demographic, and clinical data. We applied two methodologies in obtaining the regression model: (a) using variable subgroups, and (b) using all variables together. In an epidemiological context, logistic regression coefficients are interpreted as the logarithm of the odds ratio (OR), ie, the effect of a one-unit change in the corresponding variable on the log odds of a positive treatment response. OR is an association measure between the treatment outcome and the variables included in the regression model. In the subgroup-based method (a), we created four groups of variables depending on which environment they were related to: patients, E1E2 region, NS5A region, and MCA dimensions. In this way, a subgroup-based model is balanced among different treatment response factors. It was obtained following a generalized linear model (GLM) approach with a stepwise selection process in R using a logit transformation. First, we applied the following methodology to each subgroup of variables. We generated a minimum model without variables and a maximum model with all variables. Then, variables were added from the minimum model to the maximum model and evaluated with their Chi-squared value. We filtered variables with a Chi-squared significance level value ≤0.05 and generated a model 1. Next, a backward–forward stepwise selection method was applied to model 1, and Akaike information criterion (AIC)41 was used to evaluate models. Finally, subgroup-based models were joined in a new model, and interactions between variables were studied with the previous method. This methodology was applied to genotypes 1a and 1b jointly (1a+1b) and separately. 
Once the subgroup-based models were obtained, we used them to predict the treatment outcome of three datasets: (1) the 49 patients used to obtain the models, (2) 8 new patients with complete clinical information, and (3) 10 new patients with some missing clinical factors, which were estimated following four methods: the expectation-maximization algorithm (EM) implemented both in R and SPSS, mean substitution, and SPSS regression estimation. Predictions were made in R using the corresponding subgroup-based model, and prediction results were compared with the observed treatment outcome. In the all-variables method (b), we applied the LASSO methodology42 implemented in R, also to genotypes 1a and 1b jointly and separately. The LASSO method selects variables by penalizing regression coefficients so that, if the coefficients are not greater than a given threshold, the corresponding variables are not included in the model. This method is also used to study how subgroup variables are affecting each other. We used the minimum estimated lambda value and its standard deviation as thresholds.
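The thresholding behavior described for the LASSO can be illustrated with the soft-thresholding operator, which is the basic coordinate update in many LASSO solvers when predictors are standardized. The sketch below is only an illustration of that idea, not the R implementation used in the study; the function names and the example values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def selected_variables(unpenalized, lam):
    """Names of variables whose shrunken coefficient is non-zero.

    `unpenalized` maps a variable name to its coefficient before
    penalization (illustrative values, not estimates from the study).
    """
    return [name for name, z in unpenalized.items()
            if soft_threshold(z, lam) != 0.0]
```

A larger penalty removes more variables: with the hypothetical coefficients `{"a": 0.5, "b": -0.1, "c": 0.25}`, a threshold of 0.2 keeps `a` and `c`, while a threshold of 0.3 keeps only `a`.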

Results

Subgroup-based models

The best model obtained for HCV subtypes 1a and 1b combined (AIC = 24) included the following parameters: treatment duration and ALT levels from the patients’ variables subgroup; some parameters related to the NS5A region subgroup, including H and the number of sites under positive selection; some parameters related to the E1E2 region subgroup, including dS, H3, and π2; some MCA dimensions related to both viral regions: the 11th dimension of NS5A region and the 7th dimension of E1E2 region; and interactions between NS5A 11th dimension and E1E2 7th dimension, and between E1E2 dS and E1E2 H3 (Table 3).

Table 3

Subgroup-based model of 1a+1b HCV genotypes.

PARAMETER	COEFFICIENT	STD. ERROR	X-STANDARDIZED COEFFICIENT	ODDS RATIO
(Intercept)	−3.19e³	3.66e⁵
Treatment duration	5.89e²	6.14e⁴	1,170.26	5.7e²⁵⁵
ALT	3.526	3.88e²	262.56	3.4e¹
NS5A 11th	−1.62e²	1.75e⁴	−2,310.48	5.9e⁻⁷¹
E1E2 7th	2.68e¹	4.68e³	382.37	4.2e¹¹
E1E2 dS	−1.43e⁵	1.46e⁷	−2,039,800.00	0
E1E2 H³	−9.2e²	1.09e⁵	−11,065.69	0
E1E2 π²	1.94e⁵	2.33e⁷	2,770,577.00	inf
NS5A H	−3.01e³	3.08e⁵	−35,762.37	0
NS5A pts	8.57e¹	9.83e³	120.69	1.6e³⁷
NS5A 11th:E1E2 7th	3.14e²	3.29e⁴		1.4e¹³⁶
E1E2 dS:E1E2 H³	1.32e⁵	1.35e⁷		inf

Note: AIC = 24.
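The relation between the coefficient and odds ratio columns of Table 3, and the standard AIC formula, can be checked with a short Python sketch. The function names are hypothetical, and this is not the R code used in the study. The odds ratio is the exponential of the coefficient, which in double-precision arithmetic underflows to 0 for very negative coefficients and overflows for very large ones; that is one plausible reason for the 0 and inf entries in the table.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit change: exp(coefficient).

    Returns math.inf when the result exceeds the floating-point range.
    """
    try:
        return math.exp(coefficient)
    except OverflowError:
        return math.inf


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood
```

For instance, `odds_ratio(3.526)` is about 34, in agreement with the ALT row. Under the AIC formula, a model with 12 estimated coefficients (the intercept plus the 11 terms of Table 3) and a log-likelihood of essentially zero gives `aic(0.0, 12) == 24`.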

With respect to the best model obtained for subtype 1a separately (AIC = 8), the following parameters were included: H2 and GC content related to the E1E2 region subgroup, and S related to the NS5A region subgroup (Supplementary Table 1). On the other hand, the best model obtained for subtype 1b separately (AIC = 14) included the following parameters: treatment duration included in the patients’ variables subgroup; some parameters related to the NS5A region subgroup including dS, η, K, and dN; and the 11th dimension of NS5A region included in the MCA dimensions subgroup (Supplementary Table 2). The goodness of fit of the best models was quite good despite the low sample size (Table 4). All predictions were correct using patients included in the development of prediction models (training dataset), and about 66%–70% of predictions were correct using a new dataset of patients: 6 new patients for the best model of subtype 1a, 9 new patients for the best model of subtype 1b, and 10 new patients for the joint subgroup-based model. The selection of new patients for each validation dataset was performed taking into account the parameters included in each model, in such a way that not all the patients from separate subtypes had information about parameters required in the combined subtypes model. Therefore, the number of new patients used in the validation dataset of the combined subtypes model was lower than that of the separate subtypes models.

Table 4

Goodness of fit from response predictions.

GENOTYPE	PATIENTS INCLUDED	PATIENTS NOT INCLUDED
1a	17/17 (100%)	4/6 (66.66%)
1b	32/32 (100%)	6/9 (66.66%)
1a+1b	49/49 (100%)	7/10 (70%)

Notes: Included: patients used to obtain models; not included: patients not used to obtain models. Results shown as correct/total.
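The correct/total figures in Table 4 come from comparing predicted and observed outcomes patient by patient. A minimal Python sketch of that comparison is shown below; the function name and the example outcome vectors are hypothetical and are not patient data from the study.

```python
def prediction_accuracy(predicted, observed):
    """Return (correct, total, percent correct) for paired outcome lists."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and paired")
    correct = sum(p == o for p, o in zip(predicted, observed))
    total = len(observed)
    return correct, total, 100.0 * correct / total
```

With six patients and four matching outcomes, the function returns 4 correct out of 6, ie, about 66.7%, which is the same arithmetic as the 4/6 entry for subtype 1a.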

After obtaining the dimensional variables in the final models, we tried to find positions that modulate inter-patient variability. We found that, in the NS5A region, the 11th dimension retained more than 95% of the variability in 10/69 polymorphisms of subtype 1a and in 100/127 of those in subtype 1b. With regard to the E1E2 region, the seventh dimension retained more than 95% of the total variability in 27/65 polymorphisms of subtype 1a and 44/93 of 1b polymorphisms. A summary of the positions that individually contribute more than 3% can be found in Tables 5 and 6. We did not find a clear relationship between the type of amino acid in each position and treatment response (results not shown). Nevertheless, when comparing the positions with the highest individual contribution to global variability with the substitution patterns reported by Enomoto et al.43,44, we found that IFN-sensitive amino acid substitutions in codons 386 and 388 of the 1b E1E2 region had a greater variability contribution than IFN-resistant ones (Table 7).

Table 5

Contribution of amino acid positions to the total variability of NS5A region.

SUBTYPE	SUBREGION	POSITION	CONTRIBUTION
1a		2143	0.169
		2198	0.169
		2203	0.084
	ISDR/PKR-BD	2221	0.042
		2304	0.042
		2306	0.084
		2350	0.038
	V3	2368	0.032
	V3	2376	0.071
		2381	0.169
		2382	0.084
1b		2146	0.08
		2169	0.032
	ISDR/PKR-BD	2229	0.031
	ISDR/PKR-BD	2233	0.032
	PKR-BD	2264	0.079
	PKR-BD	2265	0.049
		2353	0.08
	V3	2376	0.043

Table 6

Contribution of amino acid positions to the total variability of E1E2 region.

SUBTYPE	SUBREGION	POSITION	CONTRIBUTION	POSITIVE SELECTION
1a	E1	357	0.031
	E2-HVR1	384	0.030	True
	E2-HVR1	393	0.070
	E2-HVR1	394	0.048	True
	E2-HVR1	397	0.030	True
	E2-HVR1	399	0.098	True
	E2-HVR1	401	0.048	True
	E2-HVR1	406	0.064
	E2-HVR1	407	0.070	True
	E2-HVR1	410	0.033	True
	E2	418	0.030
	E2	419	0.033
	E2-HVR3	436	0.030
	E2-HVR3	440	0.058
	E2-HVR3	444	0.037	True
	E2	452	0.041
1b	E2-HVR1	386	0.058	True
	E2-HVR1	388	0.054	True
	E2-HVR1	392	0.038	True
	E2-HVR1	393	0.034
	E2-HVR1	397	0.076	True
	E2-HVR1	407	0.031	True
	E2-HVR2	478	0.062	True
	E2-HVR2	480	0.042	True

Notes: Selected positions that contribute individually with a >3% to the total genetic variation in the MCA analysis. Position is given as the corresponding amino acid position in the HCV reference sequence (D50481). Positive selection indicates whether the position has a significant positive selection or not.45

Table 7

Common substitution patterns with Enomoto et al.43,44

REGION	SUBREGION	POSITION	TOTAL	AMINO ACID	INDIVIDUAL
E1E2	E2-HVR1	386	0.058	T^sen	0.029
				D	0.024
				G	0.002
				N	1.55e⁻⁰³
				S	0.76e⁻⁰³
				E	0.55e⁻⁰³
				R^sen	8.02e⁻⁰⁵
				Q	4.01e⁻⁰⁵
				H^res*	1.13e⁻⁰⁶
				Y^res	–
	E2-HVR1	388	0.054	H	0.024
				R	0.024
				Y	0.004
				T^sen*	0.60e⁻⁰³
				N	0.52e⁻⁰³
				V	4.77e⁻⁰⁵
				ores	–
NS5A		2169	0.032	S	0.011
				A^sen/res*	0.005
				E	0.004
				H	0.82e⁻⁰³
				T^sen/res	–

Notes: Codon positions that individually contribute more than 3% to the total variability of the seventh dimension in the MCA analysis. Substitutions refer to HCV 1b subtype. Selected positions were found in common with the substitution patterns reported by Enomoto et al.43,44 Position is given as the corresponding amino acid position in the HCV reference sequence (D50481). Total column indicates the total contribution of the position and Individual column indicates the individual contribution of the specific amino acid. sen indicates IFN-sensitive, res indicates IFN-resistant, and * indicates HCV-J as reported in Enomoto et al.43,44

All-variables models

The main disadvantage of the subgroup-based approximation is that the influence between the parameters of different subgroups is not evaluated. Therefore, we applied a secondary methodology to check the influence between all variables included in this study. We used minimum λ (min) and its standard deviation (1 se) as threshold coefficients, the former being the most conservative approach (Fig. 1). The prediction model obtained for HCV subtypes 1a and 1b combined with GLMNET methodology includes parameters from different subgroups defined for the subgroup-based methodology (Table 8). Therefore, it can be concluded that there is no relevant influence among subgroups of variables.

Figure 1

Parameter estimates based on the LASSO method for the HCV subtypes 1a+1b combined.

Table 8

LASSO-based model of 1a+1b HCV genotypes.

1SE		MIN
PARAMETER	COEFFICIENT	PARAMETER	COEFFICIENT
(Intercept)	0.893	(Intercept)	0.715
NS5A H	−0.057	E1E2 Tajima’s D	0.001
NS5A H²	−0.003	NS5A S	−0.003
NS5A H³	−0.0004	NS5A H	−0.059
NS5A 11th	−0.010	NS5A H²	−0.004
Treatment duration	0.029	NS5A H³	−0.0004
IFN dose	−0.111	NS5A 11th	−0.019
Treatment number	0.412	Treatment duration	0.053
		IFN dose	−0.250
		Treatment number	0.872

Discussion

The aim of this study was to identify candidate baseline prognostic factors that could be involved in the response of patients infected with HCV subtypes 1a and 1b to combined treatment with IFN and RBV. Although HCV drug therapies have experienced a recent change with the availability of new antiviral drugs,5 the time required to design and conduct treatment–response studies led us to outline a retrospective study with data previously generated in our group.31,32 We used treatment and patient variables along with more viral variables than similar previous studies.23–27 A new viral factor included was a multidimensional measure of sequence similarity that accounts for inter-patient viral variability. The hypothesis on which this study was designed is that the integration of treatment response with viral sequence data would provide new insights into the interaction between different viral genomic regions and the treatment outcome, which eventually would improve our understanding of the role of viral evolution in patient therapy. In this sense, our proposed methodology could be applied in future studies that include the recently developed drugs. Different models were obtained with two distinct methodologies (Table 3, Supplementary Tables 1 and 2 with subgroup-based methodology; and Table 8 with LASSO-based methodology). The main difference between the two methods is that the LASSO method studies how subgroup variables affect each other. Results from the LASSO-based method show that the variables included belong to different groups defined in the subgroup-based methodology. Therefore, it can be concluded that there is no relevant influence between subgroups of variables. The subgroup-based method weights all subgroups equally and evaluates parameters thoroughly. The subgroup-based model for both viral genotypes is balanced regarding variable subgroups, because it includes variables from the different subgroups defined for this methodology (Table 3). 
It includes treatment duration and ALT levels as patient variables. The current recommended duration of treatment for genotype 1-infected patients is 48 weeks,7 although an extension to 72 weeks in patients without RVR has been proposed46 and a reduction to 24 weeks in those patients with RVR.47 Before treatment, ALT levels could represent the patient’s immune system activity, as they include the elimination of infected cells by natural killer (NK) cells and cytotoxic T lymphocytes (CTL).48,49 The fact that ALT levels appear in the model could indicate that the higher the immune activity, the more effective the treatment. In this respect, ALT levels have also been obtained in other models.24,27 Nevertheless, it has been shown that the SVR rate in patients with normal levels of ALT is equivalent to patients with higher ALT levels50 and there are other factors that could influence before-treatment ALT levels such as an imbalance of fatty acids and carbohydrates metabolism,51 alcohol abuse,52 and other drugs.53 Sequence similarity measures at the molecular level are obtained in our results. NS5A 11th and E1E2 7th dimensions account for specific amino acids in certain sequence positions. There has been extensive discussion about the relationship between treatment response and mutations in both regions. 
There is a correlation between treatment response and NS5A substitutions in Japanese patient cohorts,54,55 but it has not been reported in European56,57 or American patients.58 However, some European studies have found a correlation between the ISDR sequence of genotype 1b and treatment response.59,60 After a long controversy, a correlation between NS5A’s ISDR region and treatment response has been reported in three different meta-analyses61–63 and in more recent studies.64,65 Moreover, substitutions in other NS5A regions seem to be related to treatment response,35,65,66 and recently an association between an SVR and combined mutations in NS5A and core regions has been shown.67 With regard to the genetic variation in the E1E2 region, some studies have found no relationship between the E2-PePHD region and treatment response because it is a highly conserved region and its variability levels do not differ between responder and non-responder patients,35,68 although some studies have found this relationship.69,70 Despite the fact that the E2-PePHD region was not included in our study, we found that some originally reported IFN-sensitive codon substitutions43 had a greater contribution to the total variability than IFN-resistant ones at specific E2-HVR1 positions (Table 7). An interesting aspect of our results is that, in addition to including sequence similarity variables in the final 1a+1b HCV genotype subgroup-based model, we also obtained the interaction of both NS5A 11th and E1E2 7th dimensions. This result could indicate that a joint substitution profile of both regions might increase the chances of responding to treatment. 
In this respect, a correlation between the number of mutations in the E2 and ISDR regions has been found70; and another study obtained significant results for the correlation between nucleotide diversity of both E1E2 and NS5A regions.34 A possible explanation of these observations could be that variants in both viral regions interact in an epistatic way due to the functional and/or structural relationship between them, as has been previously suggested for the E2, NS2, and NS5A regions.71 Some variables related to the viral response to selection were included in the joint subgroup-based model for HCV subtypes 1a+1b. This is the case for the rate of synonymous substitutions (dS) in the E1E2 region and the number of positions under positive selection in the NS5A region. The possibility of positive selection affecting viral escape from the patient’s immune system has been previously studied45,72,73 but without any evidence of a relationship with treatment response. The HCV E1E2 region includes some hypervariable regions that tend to accumulate amino acid changes during viral infection, and it is known that the main evolutionary force that affects this region is purifying selection due to the functional restrictions.74 In this sense, E1E2 dS obtained in our results would indicate the presence of purifying selection because the higher the dS, the lower the ω (the dN/dS ratio), causing a decrease in the probability of a treatment response. Regarding the NS5A region, a high number of positions under positive selection could cause a wrong PKR inhibition and reduce viral replication, which would lead to a higher probability of treatment response. However, conflicting results on the action of positive selection in the NS5A region of SVR patients have been reported.66,72 Intra-patient viral variability factors that are included in the final 1a+1b HCV genotypes subgroup-based model are E1E2 H3, E1E2 π2, and NS5A H. 
Viral genetic diversity before treatment is higher in nonresponsive patients than in responsive patients,17,34 and a higher haplotype diversity could promote the appearance of treatment-resistant variants that allow the escape from the immune system, as seen previously.71 On the other hand, there seems to exist a diversity of nucleotide positions among patients that would hinder the virus from entering the cell, new viral particle assembly, antibody neutralization, and/or their interaction with PKR, despite the fact that a high variability would decrease the patient’s immune response.75 Recently, it has been shown that a higher NS5A variability is correlated with a positive response to treatment.76 Though genetic diversity is not usually considered a good SVR predictor,77 a possible explanation of its inclusion in our model is that we have transformed these variables and detected the amount of information related to the treatment outcome. Another interesting aspect of our results is that they include the interaction between E1E2 dS and E1E2 H3, which could indicate that a certain amount of purifying selection together with the presence of some E1E2 haplotypes would increase the probability of response to treatment. Models obtained for 1a and 1b genotypes separately are completely different (Supplementary Tables 1 and 2). Variables included in genotype 1a results are related to viral polymorphisms, but variables obtained for genotype 1b are related to treatment, viral polymorphisms, and sequence similarity measures. The latter variable appears also in the model for both viral genotypes but not in the 1a model. In this sense, a significant correlation between ISDR mutations and treatment response has been observed in 1b but not in 1a patients.60 It is not always easy to provide a biological explanation when obtaining models under statistical criteria. 
We have provided a biological interpretation for every variable included in the final subgroup-based model for HCV subtypes 1a+1b; however, all variables should be considered together when interpreting our results. The methodologies used in this study include variables according to their statistical significance, and because of the high standard errors in the subgroup-based methodology, more tests would be necessary before applying these results to personalized therapy. Moreover, the coefficients and OR values shown are the final result of the complete statistical methodology applied and should not be interpreted as measures of the individual significance of each variable. In this respect, we have used statistical methods to identify relevant candidate prognostic factors for HCV treatment response, not to quantify their individual effects on treatment response.
Nevertheless, our results indicate that viral genetic information is essential for the IFN–RBV combined treatment assessment of patients. In this sense, identifying a profile of combined mutations along viral genome regions that modulate treatment outcome could help treatment management, reducing costs and side effects. In general, our methodology can be applied to identify different joint-substitution patterns that could arise when comparing the new HCV therapies currently being developed (eg, across different patients, drug combinations, or time points of the treatment) in order to assess the best therapeutic approach for each case. Moreover, it could also be applied to the study of other viruses as well as co-infections.
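The caution above about high standard errors can be made concrete with a small sketch. Assuming the coefficients come from a logistic regression (an assumption here, suggested by the reported OR values), the odds ratio is the exponential of the coefficient, and a Wald 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The function name and the numbers below are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(coef, se, z=1.96):
    """Return (OR, lower, upper) for a logistic-regression coefficient.

    Uses a Wald interval on the log-odds scale, then exponentiates.
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Same hypothetical coefficient with a small and a large standard error.
for se in (0.3, 1.5):
    or_, low, high = odds_ratio_ci(1.2, se)
    print(f"SE={se}: OR={or_:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With the larger standard error, the interval spans 1, so the same point estimate no longer supports a claim about the direction of the effect. This is one way to see why individual OR values from a model with high standard errors should not be read in isolation.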

Conclusions

Population and evolutionary parameters quantify genetic sequences in terms of variability and its functional effect. In a scenario of high replication rate and low replication fidelity, as in RNA viruses, the action of selective pressures could modulate the treatment response. Therefore, studying viral populations from an evolutionary perspective has a direct application to therapy improvement. As far as we know, population and evolutionary parameters together with the complete sequence variability have not been used before to study HCV treatment response. In our study, we found that these kinds of parameters are relevant prognostic factors, as they were included in the best prognostic models obtained from different datasets. The best prognostic model for the joint 1a and 1b subtypes includes 9 out of 11 variables related to population and evolutionary parameters. We have discussed the interpretation of each of them separately, and we have also given some insights into the applicability of our proposed new measurement to related studies.
The integration of clinical and viral genetic data is an important issue for the evaluation of different factors related to HCV treatment response. In this study, a new viral factor that accounts for inter-patient viral variability was suggested. One advantage of the sequence similarity measure is that it allowed us to reduce the number of parameters for regression analyses. Moreover, it preserved patient variability in terms of complete viral sequences and could be integrated with clinical data. These characteristics make our multidimensional measure of sequence similarity useful for identifying joint-substitution profiles that might modulate the chances of responding to treatment.
Our proposed new methodology could be applied in related studies to identify viral positions involved in treatment response and also to compare new therapies at different time points in order to study the evolution of viral joint-mutation profiles with regard to treatment outcome.

Supplementary Table 1. Subgroup-based model of 1a HCV genotype.
Supplementary Table 2. Subgroup-based model of 1b HCV genotype.

71 in total

1. Viral and metabolic factors influencing alanine aminotransferase activity in patients with chronic hepatitis C.

Authors: Daniele Prati; Mitchell L Shiffman; Moisés Diago; Edward Gane; K Rajender Reddy; Paul Pockros; Patrizia Farci; Christopher B O'Brien; Pilar Lardelli; Steven Blotner; Stefan Zeuzem
Journal: J Hepatol Date: 2006-01-25 Impact factor: 25.083

Review 2. Viral factors influencing the response to the combination therapy of peginterferon plus ribavirin in chronic hepatitis C.

Authors: Shinya Maekawa; Nobuyuki Enomoto
Journal: J Gastroenterol Date: 2009 Impact factor: 7.527

3. Predictive graphical model, network-based medical tool for the prognosis of chronic hepatitis C patients treated with peg-interferon plus ribavirin.

Authors: M Trapero-Marugan; M Marin; J P Pivel; J M Del Rio; O Nunez; G Clemente; J P Gisbert; R Moreno-Otero
Journal: Aliment Pharmacol Ther Date: 2008-06-12 Impact factor: 8.171

4. Peginterferon-alpha2a and ribavirin combination therapy in chronic hepatitis C: a randomized study of treatment duration and ribavirin dose.

Authors: Stephanos J Hadziyannis; Hoel Sette; Timothy R Morgan; Vijayan Balan; Moises Diago; Patrick Marcellin; Giuliano Ramadori; Henry Bodenheimer; David Bernstein; Mario Rizzetto; Stefan Zeuzem; Paul J Pockros; Amy Lin; Andrew M Ackrill
Journal: Ann Intern Med Date: 2004-03-02 Impact factor: 25.391

5. Mutations in the nonstructural protein 5A gene and response to interferon in patients with chronic hepatitis C virus 1b infection.

Authors: N Enomoto; I Sakuma; Y Asahina; M Kurosaki; T Murakami; C Yamamoto; Y Ogura; N Izumi; F Marumo; C Sato
Journal: N Engl J Med Date: 1996-01-11 Impact factor: 91.245

6. Drug-induced liver injury.

Authors: Neil Kaplowitz
Journal: Clin Infect Dis Date: 2004-03-01 Impact factor: 9.079

7. Relation of pretreatment sequence diversity in NS5A region of HCV genotype 1 with immune response between pegylated-INF/ribavirin therapy outcomes.

Authors: A T L de Queiróz; V Maracaja-Coutinho; A C G Jardim; P Rahal; I M V G de Carvalho-Mello; S R Matioli
Journal: J Viral Hepat Date: 2011-02 Impact factor: 3.728

8. Why is the interferon sensitivity-determining region (ISDR) system useful in Japan?

Authors: I Nakano; Y Fukuda; Y Katano; S Nakano; T Kumada; T Hayakawa
Journal: J Hepatol Date: 1999-06 Impact factor: 25.083

9. Mutations in the NS5A region do not predict interferon-responsiveness in American patients infected with genotype 1b hepatitis C virus.

Authors: R T Chung; A Monto; J L Dienstag; L M Kaplan
Journal: J Med Virol Date: 1999-08 Impact factor: 2.327

10. epiPATH: an information system for the storage and management of molecular epidemiology data from infectious pathogens.

Authors: Alicia Amadoz; Fernando González-Candelas
Journal: BMC Infect Dis Date: 2007-04-20 Impact factor: 3.090