Xing Hua, Lei Song, Guoqin Yu, Emily Vogtmann, James J Goedert, Christian C Abnet, Maria Teresa Landi, Jianxin Shi.
Abstract
The microbiome is the collection of all microbial genes and can be investigated by sequencing highly variable regions of 16S ribosomal RNA (rRNA) genes. Evidence suggests that environmental factors and host genetics may interact to impact human microbiome composition. Identifying host genetic variants associated with human microbiome composition not only provides clues for characterizing microbiome variation but also helps to elucidate biological mechanisms of genetic associations, prioritize genetic variants, and improve genetic risk prediction. Since a microbiota functions as a community, it is best characterized by β diversity; that is, a pairwise distance matrix. We develop a statistical framework and a computationally efficient software package, microbiomeGWAS, for identifying host genetic variants associated with microbiome β diversity, with or without interaction with an environmental factor. We show that the score statistics have positive skewness and kurtosis due to the dependent nature of the pairwise data, which makes p-value approximations based on asymptotic distributions unacceptably liberal. By correcting for skewness and kurtosis, we develop accurate p-value approximations, whose accuracy was verified by extensive simulations. We exemplify our methods by analyzing a set of 147 genotyped subjects with 16S rRNA microbiome profiles from non-malignant lung tissues. Correcting for skewness and kurtosis eliminated the dramatic deviation in the quantile-quantile plots. We provided preliminary evidence that six established lung cancer risk SNPs were collectively associated with microbiome composition for both unweighted (p = 0.0032) and weighted (p = 0.011) UniFrac distance matrices. In summary, our methods will facilitate analyzing large-scale genome-wide association studies of the human microbiome.
Keywords: gene–environment interaction; genome-wide association study; host genetics; microbiome; skewness and kurtosis; tail probabilities
Year: 2022 PMID: 35886007 PMCID: PMC9317577 DOI: 10.3390/genes13071224
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Microbiome distances are positively correlated with genetic distances at an associated SNP.
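The idea in Figure 1 can be illustrated with a minimal sketch. It is not the microbiomeGWAS implementation: it assumes the genetic distance between two subjects is the absolute difference in minor-allele counts, uses a plain sum of products over subject pairs as the statistic, and calibrates it by permuting genotypes rather than by the analytic approximation described in the abstract. The function names `pairwise_score` and `permutation_z` are hypothetical.

```python
import numpy as np

def pairwise_score(D, g):
    """Sum of D_ij * |g_i - g_j| over subject pairs i < j.

    D: symmetric microbiome distance matrix (n x n).
    g: genotypes coded as minor-allele counts 0/1/2 (length n).
    The |g_i - g_j| genetic distance is an assumption of this sketch.
    """
    D = np.asarray(D, dtype=float)
    g = np.asarray(g, dtype=float)
    G = np.abs(g[:, None] - g[None, :])   # pairwise genetic distances
    iu = np.triu_indices(len(g), k=1)     # indices of pairs with i < j
    return float(np.sum(D[iu] * G[iu]))

def permutation_z(D, g, n_perm=2000, seed=0):
    """Standardize the score against a genotype-permutation null."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    observed = pairwise_score(D, g)
    null = np.array([pairwise_score(D, rng.permutation(g))
                     for _ in range(n_perm)])
    z = (observed - null.mean()) / null.std(ddof=1)
    # One-sided empirical p-value: large scores mean that genetically
    # distant pairs also tend to be distant in microbiome composition.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(z), float(p)
```

Permutation is only practical for a handful of SNPs; at genome-wide significance thresholds it would need far more permutations than is feasible, which is the motivation for analytic tail approximations.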
Figure 2. Definition of the joint test. Details are in Appendix C.
Figure 3. Correcting tail probabilities for skewness and kurtosis. (A) The standard normal distribution and an approximately normal distribution with positive skewness. Skewness has a large impact on the tail probability at large thresholds. (B) Numerical evaluation of the tail probability approximation. We used the unweighted UniFrac distance matrix of 500 samples from the American Gut Project (AGP). For each value of b (>0), we calculated p-values based on the asymptotic approximation, skewness correction, both skewness and kurtosis correction, and 10⁸ simulations. (C) Skewness depends on the minor allele frequency (MAF) of SNPs and the sample size of the study, calculated based on the weighted UniFrac distance matrix in the AGP data. (D) Kurtosis depends on the MAF of SNPs and the sample size, calculated based on the weighted UniFrac distance matrix in the AGP data.
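To show why positive skewness makes the normal tail probability too small, the sketch below uses the classical Edgeworth expansion. This is a textbook stand-in and is not claimed to be the correction derived in the paper. The function names are hypothetical, and `skew` and `excess_kurt` are assumed to be the skewness and excess kurtosis of a statistic already standardized to mean 0 and variance 1.

```python
import math

def normal_tail(b):
    """P(Z > b) for a standard normal Z."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))

def edgeworth_tail(b, skew, excess_kurt):
    """Edgeworth-corrected approximation to P(Z > b).

    Adds skewness and excess-kurtosis terms to the normal tail.
    The expansion can be inaccurate far in the tail, so treat the
    output as illustrative only.
    """
    phi = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)
    he2 = b**2 - 1.0                        # Hermite polynomial He_2
    he3 = b**3 - 3.0 * b                    # He_3
    he5 = b**5 - 10.0 * b**3 + 15.0 * b     # He_5
    correction = phi * (skew / 6.0 * he2
                        + excess_kurt / 24.0 * he3
                        + skew**2 / 72.0 * he5)
    # Clamp to [0, 1] because the expansion is not a true distribution.
    return min(1.0, max(0.0, normal_tail(b) + correction))
```

With zero skewness and zero excess kurtosis the function reduces to the normal tail; with positive skewness and a large b the corrected tail is larger, which is the direction of the inflation reported for the asymptotic approximation.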
Type-I error rates estimated based on 10⁸ simulations. Minor allele frequency = 20%. Simulations were based on the weighted UniFrac distance matrix of the gut microbiome data from the American Gut Project. Reported values are type-I error inflation factors. A value greater than 1 indicates an inflated type-I error.
| Method | N | α = 10⁻³ | α = 10⁻⁵ | α = 10⁻⁷ | α = 10⁻³ | α = 10⁻⁵ | α = 10⁻⁷ | α = 10⁻³ | α = 10⁻⁵ | α = 10⁻⁷ |
|---|---|---|---|---|---|---|---|---|---|---|
| Asymptotic approximation | 100 | 5.5 | 51.6 | 610.0 | 4.7 | 36.1 | 342.8 | 7.3 | 80.9 | 1148.0 |
| | 200 | 3.7 | 23.0 | 187.3 | 3.1 | 15.8 | 105.5 | 4.6 | 33.0 | 316.7 |
| | 500 | 2.4 | 9.4 | 45.2 | 2.1 | 6.7 | 25.5 | 2.8 | 11.9 | 64.1 |
| | 1000 | 2.0 | 5.7 | 21.3 | 1.8 | 4.4 | 14.0 | 2.2 | 6.9 | 28.5 |
| Adjusted for skewness and kurtosis | 100 | 1.0 | 1.2 | 0.7 | 1.0 | 1.1 | 0.6 | 1.0 | 1.5 | 2.0 |
| | 200 | 1.0 | 1.1 | 1.0 | 1.0 | 1.1 | 0.7 | 0.9 | 1.3 | 1.8 |
| | 500 | 1.0 | 1.1 | 1.3 | 1.0 | 1.0 | 0.9 | 0.9 | 1.0 | 1.7 |
| | 1000 | 1.0 | 1.0 | 1.2 | 1.0 | 1.0 | 0.8 | 0.9 | 1.0 | 1.1 |
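The inflation factor in this table can be read as the empirical type-I error rate divided by the nominal level α. A minimal sketch of that arithmetic, assuming a list of p-values simulated under the null hypothesis (the function name `inflation_factor` is hypothetical):

```python
def inflation_factor(p_values, alpha):
    """Empirical rejection rate under the null divided by the nominal alpha.

    A value near 1 means the p-values are well calibrated at this alpha;
    a value above 1 means the test rejects too often.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one simulated p-value")
    rejections = sum(1 for p in p_values if p <= alpha)
    return (rejections / len(p_values)) / alpha
```

For example, an inflation factor of 5.5 at α = 10⁻³ corresponds to an empirical rejection rate of about 5.5 × 10⁻³. Estimating the factor at α = 10⁻⁷ requires a very large number of simulated p-values, which is consistent with the 10⁸ simulations used here.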
Figure 4. Computation time for a microbiome GWAS with 500,000 SNPs. “Main”: computation time for testing the main effect only. “All”: computation time for testing the main effect, the interaction, and the joint null hypothesis.
Figure 5. Results of analyzing the microbiome GWAS data of 147 adjacent normal lung tissues in the EAGLE study. (A) Skewness and kurtosis for the main effect test using the unweighted and the weighted UniFrac distance matrices. (B) Quantile–quantile (QQ) plot for association p-values using the unweighted UniFrac distance matrix. “Adjusted”: p-values were corrected for skewness and kurtosis. “Unadjusted”: p-values were approximated based on the asymptotic distribution. (C) Quantile–quantile (QQ) plot for association p-values using the weighted UniFrac distance matrix. (D) Manhattan plots based on the unweighted or the weighted UniFrac distance matrices. (E) Box plots for the top nine loci in the microbiome GWAS analysis. Subject pairs are classified into three groups according to the genetic distance at the SNP. The y-coordinate is the microbiome distance.
Association p-values between lung cancer risk SNPs and microbiome composition in the EAGLE data.
| SNP | Chr | Annotated Genes | Unweighted UniFrac | Weighted UniFrac |
|---|---|---|---|---|
| rs2036534 | 15q25.1 | | 0.425 | 0.167 |
| rs1051730 | 15q25.1 | | 0.020 | 0.401 |
| rs2736100 | 5p15.33 | | 0.089 | 0.267 |
| rs401681 | 5p15.33 | | 0.056 | 0.005 |
| rs6489769 | 12p13.3 | | 0.197 | 0.329 |
| rs1333040 | 9p21.3 | | 0.249 | 0.224 |
| Overall test | | | 0.0032 | 0.011 |