Olivier Naret, Nimisha Chaturvedi, Istvan Bartha, Christian Hammer, Jacques Fellay.
Abstract
Studies of host genetic determinants of pathogen sequence variations can identify sites of genomic conflicts, by highlighting variants that are implicated in immune response on the host side and adaptive escape on the pathogen side. However, systematic genetic differences in host and pathogen populations can lead to inflated type I (false positive) and type II (false negative) error rates in genome-wide association analyses. Here, we demonstrate through a simulation that correcting for both host and pathogen stratification reduces spurious signals and increases power to detect real associations in a variety of tested scenarios. We confirm the validity of the simulations by showing comparable results in an analysis of paired human and HIV genomes.
Keywords: escape variants; genome-wide association study; host-pathogen genomics; population stratification; simulation study
Year: 2018 PMID: 30105048 PMCID: PMC6078058 DOI: 10.3389/fgene.2018.00266
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
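The false-positive mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes two host subpopulations with known labels, binary host and pathogen variants, and illustrative frequencies (0.1 and 0.5) that are not taken from the parameter tables. All function names are hypothetical. A pooled chi-square test stands in for an uncorrected analysis, and a Cochran-Mantel-Haenszel test stratified on the subpopulation label stands in for a stratification-corrected one; it covers only the type I error side, not the power gain.

```python
import math
import random


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def simulate_unlinked_pairs(n_per_pop, host_freqs, pathogen_freqs, seed):
    """Draw (subpopulation, host variant, pathogen variant) triples.

    Within each subpopulation the two variants are drawn independently,
    so there is no true host-pathogen association (assumption of this sketch).
    """
    rng = random.Random(seed)
    samples = []
    for pop, (hf, pf) in enumerate(zip(host_freqs, pathogen_freqs)):
        for _ in range(n_per_pop):
            host = int(rng.random() < hf)
            pathogen = int(rng.random() < pf)
            samples.append((pop, host, pathogen))
    return samples


def table_2x2(samples):
    """Counts: a = both present, b = host only, c = pathogen only, d = neither."""
    a = b = c = d = 0
    for _, host, pathogen in samples:
        if host and pathogen:
            a += 1
        elif host:
            b += 1
        elif pathogen:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pooled_chi2(samples):
    """Pearson chi-square on the pooled 2x2 table, ignoring subpopulations."""
    a, b, c, d = table_2x2(samples)
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cmh_chi2(samples):
    """Cochran-Mantel-Haenszel statistic (no continuity correction),
    stratified on the subpopulation label."""
    deviation = 0.0
    variance = 0.0
    for pop in sorted({s[0] for s in samples}):
        a, b, c, d = table_2x2([s for s in samples if s[0] == pop])
        n = a + b + c + d
        deviation += a - (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return deviation ** 2 / variance


# Both variants are rare in subpopulation 0 and common in subpopulation 1.
samples = simulate_unlinked_pairs(2000, (0.1, 0.5), (0.1, 0.5), seed=1)
print("pooled p-value:    ", chi2_1df_pvalue(pooled_chi2(samples)))
print("stratified p-value:", chi2_1df_pvalue(cmh_chi2(samples)))
```

Because both variants track the subpopulation, the pooled test reports a strong association even though none was simulated, while the stratified test evaluates the association only within each subpopulation. In real data the subpopulation labels are not known, which is why analyses of this kind typically use inferred covariates such as principal components instead.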
Host SNPs parameters.
| 40,000 | Absent | Non Stratified | Non Stratified |
| 10,000 | Absent | 0.2 | Non Stratified |
| 100 | Absent | 0.2 | 0.005 |
| 100 | Present (Table | Non Stratified | Non Stratified |
| 100 | Present (Table | 0.2 | 0.016 |
Stratification defined by f.
Pathogen variants parameters.
| 100 | Absent | 0.2 | Non Stratified |
| 100 | Absent | 0.2 | 0.005 |
| 100 | Present | Non Stratified | Non Stratified |
| 100 | Present | 0.2 | 0.01 |
Stratification defined by f.
Figure 1. False positive signal. (A) Simulated host and pathogen variant frequency distribution for case 1. (B) P-value boxplot for case 1. (C) Simulated host and pathogen variant frequency distribution for case 2. (D) P-value boxplot for case 2.
Figure 2. Power gain. (A) Simulated data structure for stratified host and pathogen data with true associations. (B) P-value boxplot for stratified host and pathogen data with true associations.
Figure 3. Population structures in HIV data. (A) Principal component plot for host data (first and second axes). (B) Phylogenetic principal component plot for the HIV data (first and second axes). (C) Pearson correlation between the first five host principal components and the first three HIV phylogenetic principal components.
Figure 4. Allelic distribution of host SNP rs4913471 and HIV amino acid variant at position 67 in the protease region. (A) Genotypes of rs4913471 plotted on the first two host principal components. (B) Presence or absence of the amino acid variant at position 67 in the protease region plotted on the first two host principal components.
HIV G2G associations.
| HIV gene | Amino acid position | Host SNP | P-value |
|---|---|---|---|
| PR | 35(E) | rs2596477 (6:31327723) | 8.503e-28 |
| | 35(D) | rs17199328 (6:31322395) | 4.083e-25 |
| | 12(T) | rs75344417 (6:31429439) | 1.559e-15 |
| | 12(A) | rs116855165 (6:31059097) | 1.266e-11 |
| | 37(N) | rs2596477 (6:31327723) | 3.370e-14 |
| | 93(I) | rs9378249 (6:31327701) | 1.953e-13 |
| | 93(L) | rs9378249 (6:31327701) | 1.953e-13 |
| | 67(Y) | rs9391775 (6:31427948) | 3.394e-12 |
| RT | 135(I) | rs2844527 (6:31367636) | 3.444e-35 |
| | 135(T) | rs79556279 (6:31329846) | 1.114e-29 |
| | 165(T) | rs2442724 (6:31319907) | 1.557e-13 |
| | 165(I) | rs92647589 (6:31248434) | 1.919e-12 |
| | 123(E) | rs114773933 (6:31148349) | 1.917e-12 |
| | 138(A) | rs114073761 (6:31336749) | 1.929e-11 |
Strongest associations between amino acid variants and SNPs. The first three columns of the table give HIV genes, amino acid positions and the host SNP ids.