Robert A Power1, Siva Davaniah1, Anne Derache1,2, Eduan Wilkinson1, Frank Tanser1, Ravindra K Gupta3, Deenan Pillay1,3, Tulio de Oliveira1.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have considerably advanced our understanding of human traits and diseases. With the increasing availability of whole genome sequences (WGS) for pathogens, it is important to establish whether GWAS of viral genomes could reveal important biological insights. Here we perform the first proof-of-concept viral GWAS examining drug resistance (DR), a phenotype with well-understood genetics.
Year: 2016 PMID: 27677172 PMCID: PMC5038937 DOI: 10.1371/journal.pone.0163746
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Number of WGS treated with each drug, and correlations between drugs within samples.
| Drug | Treated | Untreated | Correlation with Zidovudine | Correlation with Stavudine | Correlation with Tenofovir | Correlation with Efavirenz | Correlation with Nevirapine | Correlation with Lopinavir |
|---|---|---|---|---|---|---|---|---|
| Zidovudine | 32 | 311 | 1 | - | - | - | - | - |
| Stavudine | 291 | 52 | -0.058 | 1 | - | - | - | - |
| Tenofovir | 101 | 242 | -0.117 | -0.507 | 1 | - | - | - |
| Efavirenz | 259 | 84 | 0.011 | -0.023 | 0.128 | 1 | - | - |
| Nevirapine | 113 | 230 | -0.017 | 0.127 | -0.057 | -0.623 | 1 | - |
| Lopinavir | 26 | 317 | 0.213 | -0.033 | 0.053 | -0.151 | -0.115 | 1 |
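The table does not state how the between-drug correlations were computed. A minimal sketch, assuming they are Pearson correlations between binary treated/untreated indicators per sample (equivalent to the phi coefficient), is given below. The function name and the toy exposure vectors are hypothetical and are not data from the study.

```python
import math


def phi_coefficient(x, y):
    """Pearson correlation between two equal-length sequences of 0/1 values."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # one indicator is constant
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical exposure indicators for eight samples (1 = treated).
stavudine = [1, 1, 1, 0, 0, 1, 1, 0]
tenofovir = [0, 0, 0, 1, 1, 0, 1, 1]
print(round(phi_coefficient(stavudine, tenofovir), 3))
```

A strongly negative value, as reported for stavudine and tenofovir, indicates that samples exposed to one drug were rarely exposed to the other.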
Results for genome-wide significant SNPs and their corresponding amino acid positions.
| Drug | SNP | Missingness | A1 | Ref. | Gene | Amino acid position | Ref. Amino Acid | A1 Amino Acid | Known | OR | SE | Unadjusted p-value | Permutation adjusted p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nevirapine | 3078G | 14% | G | A | RT | 181 | Y | C | Yes | 5.20 | 0.26 | 4.77E-10 | 1.00E-07 |
| Stavudine | 2739A | 14% | A | G | RT | 68 | S | N | Cis | 0.08 | 0.54 | 5.38E-06 | 0.0081 |
| Tenofovir | 1063A | 18% | G | A | MA (p17) | 91 | R | G | No | 1.79 | 0.14 | 2.42E-05 | 0.016 |
| Tenofovir | 2730G | 13% | G | A | RT | 65 | K | R | Yes | 6.44 | 0.24 | 1.67E-14 | 1.00E-07 |
| Tenofovir | 2738G | 14% | G | A | RT | 68 | S | G | Cis | 2.89 | 0.24 | 1.45E-05 | 0.0088 |
| Tenofovir | 2852A | 14% | A | G | RT | 106 | V | M | Conv. | 1.72 | 0.14 | 6.19E-05 | 0.047 |
| Tenofovir | 2880T | 13% | T | A | RT | 115 | Y | F | Conv. | 5.77 | 0.41 | 1.80E-05 | 0.011 |
| Zidovudine | 2745G | 16% | G | A | RT | 70 | K | R | Yes | 3.11 | 0.22 | 2.94E-07 | 0.0006 |
Note that the effect of SNP 2739A is protective against stavudine resistance (i.e. odds ratio [OR] < 1), but the association is actually with tenofovir, which has a negatively correlated prescription regimen. Ref. = reference; BP = base position; A1 = effect allele; Cis = proximal to known DR variant; Conv. = convergent, i.e. known DR variant for another drug; OR = odds ratio; SE = standard error.
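To help read the OR and SE columns, the sketch below converts an odds ratio and its standard error into a Wald z-score, a two-sided p-value and a 95% confidence interval. It assumes that the SE is reported on the log-odds scale, which is the usual convention for logistic regression output but is not stated in the table. The function name is hypothetical, and because the tabulated values are rounded, the result will only approximate the reported p-values.

```python
import math


def wald_summary(odds_ratio, se_log_or, z_crit=1.96):
    """Return (z, two-sided p, CI lower, CI upper) for an odds ratio.

    Assumes se_log_or is the standard error of log(odds_ratio).
    """
    log_or = math.log(odds_ratio)
    z = log_or / se_log_or
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)
    return z, p, lower, upper


# Nevirapine, SNP 3078G (RT Y181C): OR = 5.20, SE = 0.26 from the table.
z, p, lower, upper = wald_summary(5.20, 0.26)
print(f"z = {z:.2f}, p = {p:.2e}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For a protective allele such as 2739A (OR = 0.08), the same calculation gives a negative z-score and a confidence interval lying entirely below 1.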
Fig 1. Analysis pipeline for HIV whole genome sequence (WGS) genome-wide association study (GWAS) compared to a human study using a SNP chip.
Step 1) Diploidy is defined for both human and pathogen, reflecting ‘real’ heterozygosity in humans and heterozygosity arising from within-host viral diversity in the pathogen. 2) While missingness and Hardy-Weinberg Equilibrium are used to assess genotyping quality in human GWAS, in viral GWAS we used depth of sequencing to assess variant calls. As such, higher calling confidence is associated with higher missingness in viral SNPs, while the reverse is true in humans. In both settings, low minor allele frequency (MAF) is used to remove variants that have low power to detect effects and may reflect errors. 3&4) Correction for ancestry and relatedness is key to human GWAS; however, because of both more homogeneous sampling and the difficulty of applying conventional human-data corrections to viral data, this was done as a sensitivity test in a smaller sample for the top SNPs in the HIV GWAS.
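The depth-based quality control in step 2 can be illustrated with a small sketch. It treats a call as missing when read depth at that site falls below a threshold, then computes missingness and MAF for the SNP. This is a simplification under stated assumptions: calls are haploid consensus alleles rather than the diploid coding of step 1, and the function name, the thresholds (`min_depth`, `min_maf`) and the toy data are all hypothetical, not the values used in the study.

```python
from collections import Counter


def qc_snp(calls, depths, min_depth=100, min_maf=0.05):
    """Apply a depth filter to one SNP, then report missingness and MAF.

    calls  -- one allele (e.g. "A") per sample
    depths -- read depth at this site per sample
    """
    if len(calls) != len(depths) or not calls:
        raise ValueError("calls and depths must be non-empty and equal length")
    # Calls below the depth threshold are treated as missing.
    kept = [allele for allele, depth in zip(calls, depths) if depth >= min_depth]
    missingness = 1 - len(kept) / len(calls)
    if not kept:
        return {"missingness": 1.0, "maf": 0.0, "passes": False}
    # MAF here is the frequency of the second most common allele.
    counts = sorted(Counter(kept).values(), reverse=True)
    maf = counts[1] / len(kept) if len(counts) > 1 else 0.0
    return {"missingness": missingness, "maf": maf, "passes": maf >= min_maf}


# Hypothetical calls and depths for ten samples at one site.
calls = ["A", "A", "G", "G", "A", "A", "A", "A", "G", "A"]
depths = [500, 20, 300, 800, 150, 90, 400, 1000, 250, 600]
print(qc_snp(calls, depths, min_depth=100))
print(qc_snp(calls, depths, min_depth=300))
```

Raising `min_depth` increases confidence in the retained calls but also increases missingness, which is the trade-off the legend describes for viral SNPs.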
Fig 2. Manhattan plot comparing HIV sequences that were exposed to tenofovir to those that were not.
The reference line at p = 7E-5 marks permutation-adjusted genome-wide significance. Dashed grey lines at genomic locations mark the borders of genes (black dashed lines mark Gag, Pol and Env). Each marker is a SNP, weighted by its -log(p-value) to highlight the most significant SNPs.
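The legend does not describe how the permutation-adjusted threshold was derived. One common approach, sketched here as an assumption rather than as the study's procedure, is to shuffle the phenotype labels many times, record the smallest p-value across all SNPs in each permutation, and take a low quantile of those minima as the genome-wide threshold. The function names, the 2x2 chi-square test and the parameters (`n_perm`, `alpha`, `seed`) are hypothetical choices for illustration.

```python
import math
import random


def chi2_p(genotypes, phenotype):
    """P-value of a 1-df chi-square test for a binary allele vs binary phenotype."""
    a = b = c = d = 0
    for g, y in zip(genotypes, phenotype):
        if g and y:
            a += 1
        elif g and not y:
            b += 1
        elif not g and y:
            c += 1
        else:
            d += 1
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a constant SNP or phenotype carries no signal
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def permutation_threshold(snps, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Alpha-quantile of the minimum p-value over label permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        min_ps.append(min(chi2_p(snp, shuffled) for snp in snps))
    min_ps.sort()
    return min_ps[max(0, int(alpha * n_perm) - 1)]


# Hypothetical data: 50 random SNPs and a random phenotype in 60 samples.
rng = random.Random(1)
snps = [[rng.randint(0, 1) for _ in range(60)] for _ in range(50)]
phenotype = [rng.randint(0, 1) for _ in range(60)]
print(permutation_threshold(snps, phenotype))
```

Because the SNPs are permuted together, this kind of threshold accounts for correlation between variants, which a Bonferroni correction would ignore.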
Fig 3. Plot of standardised values for the ancestry-informative principal components 1 & 2 (red) and latitude & longitude (gold) for HIV sequences, with values for each sequence linked by a line.
No correlation between geographic position and genetic position was observed.