Harish Dharuri, Peter Henneman, Ayse Demirkan, Jan Bert van Klinken, Dennis Owen Mook-Kanamori, Rui Wang-Sattler, Christian Gieger, Jerzy Adamski, Kristina Hettne, Marco Roos, Karsten Suhre, Cornelia M Van Duijn, Ko Willems van Dijk, Peter A C 't Hoen.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets.
Year: 2013 PMID: 24320595 PMCID: PMC3879060 DOI: 10.1186/1471-2164-14-865
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The database interrogation schemes. The two interrogation schemes, the pathway scheme (A) and the reaction scheme (B), are shown. Blue indicates the intermediate steps that filter certain pathways/compounds out of the two schemes to avoid non-specific connections.
Figure 2. Strategy to find biologically relevant SNP-metabolite pairs in published GWAS datasets. Background knowledge pertaining to a metabolite is collected from the pathway databases KEGG and BioCyc in an automated fashion to generate a gene/SNP set relevant to the synthesis and degradation of the metabolite.
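The strategy in Figure 2 can be illustrated with a minimal sketch. It is not the authors' workflow: the two dictionaries are hypothetical in-memory stand-ins for the KEGG/BioCyc lookups and for a gene-to-SNP mapping, and the gene names, rs identifiers, and p-values are invented placeholders. The sketch only shows the filtering idea, which is to restrict a GWAS result list to SNPs in genes linked to the metabolite.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a pathway-database lookup (metabolite -> genes).
METABOLITE_TO_GENES: Dict[str, Set[str]] = {
    "glycine": {"GENE_A", "GENE_B"},
    "serine": {"GENE_B", "GENE_C"},
}

# Hypothetical stand-in for a gene -> SNP mapping.
GENE_TO_SNPS: Dict[str, Set[str]] = {
    "GENE_A": {"rs0001", "rs0002"},
    "GENE_B": {"rs0003"},
    "GENE_C": {"rs0004"},
}


def snp_set_for(metabolite: str) -> Set[str]:
    """Collect every SNP that maps to a gene linked to the metabolite."""
    snps: Set[str] = set()
    for gene in METABOLITE_TO_GENES.get(metabolite, set()):
        snps |= GENE_TO_SNPS.get(gene, set())
    return snps


def filter_gwas(metabolite: str, gwas_rows: List[dict]) -> List[dict]:
    """Keep only GWAS rows for this metabolite whose SNP is in its SNP set."""
    relevant = snp_set_for(metabolite)
    return [
        row for row in gwas_rows
        if row["metabolite"] == metabolite and row["snp"] in relevant
    ]


# Placeholder association results (not real data).
gwas = [
    {"metabolite": "glycine", "snp": "rs0001", "p": 1e-6},
    {"metabolite": "glycine", "snp": "rs0004", "p": 1e-9},
    {"metabolite": "serine", "snp": "rs0004", "p": 1e-5},
]
glycine_hits = filter_gwas("glycine", gwas)
```

Here `rs0004` is dropped for glycine despite its small p-value because it is not in a gene linked to glycine. That is the trade-off of a knowledge-based filter: fewer tests, at the cost of missing associations outside the known pathways.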
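Gene and SNP sets generated by the database:interrogation schemes for each of the metabolites
| Metabolite | BioCyc pathway | BioCyc reaction | KEGG pathway | KEGG reaction | Union of gene sets (1) | SNP set size (2) | Number of tests (3) |
|---|---|---|---|---|---|---|---|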
| Arginine | 20 | 104 | 57 | 179 | 257 | 10788 | 10788 |
| Glutamine | 51 | 132 | 100 | 282 | 388 | 15591 | 15591 |
| Glycine | 90 | 192 | 173 | 432 | 523 | 20767 | 20767 |
| Histidine | 8 | 9 | 45 | 155 | 181 | 7126 | 7126 |
| Leucine | 8 | 0 | 44 | 83 | 117 | 5037 | 5037 |
| Methionine | 27 | 104 | 35 | 243 | 284 | 11532 | 11532 |
| Ornithine | 16 | 150 | 103 | 159 | 247 | 10089 | 10089 |
| Phenylalanine | 6 | 113 | 25 | 163 | 196 | 8419 | 8419 |
| Proline | 10 | 12 | 57 | 83 | 119 | 5075 | 5075 |
| Serine | 37 | 135 | 152 | 219 | 360 | 14996 | 14996 |
| Threonine | 1 | 11 | 39 | 49 | 75 | 2633 | 2633 |
| Tryptophan | 15 | 19 | 78 | 221 | 261 | 10419 | 10419 |
| Tyrosine | 14 | 106 | 61 | 158 | 219 | 9365 | 9365 |
| Valine | 15 | 93 | 80 | 137 | 211 | 9365 | 9365 |
| Carnitine | 32 | 206 | 81 | 94 | 263 | 11239 | 460799 |
| Phosphatidylcholine | 188 | 361 | 312 | 343 | 640 | 31676 | 2914192 |
| Sphingomyelin | 160 | 331 | 189 | 241 | 460 | 21290 | 319350 |
| Sum | 698 | 2078 | 1631 | 3241 | 4801 | 205407 | 3835543 |
| Unique set | 399 | 806 | 703 | 768 | 1246 | 55952 | 55952 |
The number of genes for each metabolite and the corresponding database:interrogation scheme is shown. (1) The size of the union of the gene sets obtained from all four database:interrogation schemes. (2) The size of the corresponding SNP set. (3) The number of tests equals the size of the SNP set for the amino acids, whereas for aggregated entities such as the lipids and carnitine the SNP set size is multiplied by the number of compounds present in that class.
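The test-count rule in the footnote can be checked with a short sketch. The SNP set sizes and test counts come from the table; the per-class compound counts are not stated in this text, so the code derives them by dividing tests by SNPs. The Bonferroni threshold is an assumption added for illustration, since the text mentions a multiple testing penalty without naming the correction used.

```python
# (SNP set size, number of tests) copied from the table above.
ROWS = {
    "Arginine": (10788, 10788),
    "Carnitine": (11239, 460799),
    "Phosphatidylcholine": (31676, 2914192),
    "Sphingomyelin": (21290, 319350),
}

TOTAL_TESTS = 3835543  # "Sum" row, number-of-tests column


def implied_compounds(snp_set_size: int, n_tests: int) -> int:
    """Number of compounds implied by n_tests = snp_set_size * n_compounds."""
    if n_tests % snp_set_size != 0:
        raise ValueError("test count is not a multiple of the SNP set size")
    return n_tests // snp_set_size


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction (assumed)."""
    return alpha / n_tests


compounds = {name: implied_compounds(*sizes) for name, sizes in ROWS.items()}
threshold = bonferroni_threshold(TOTAL_TESTS)
```

The division is exact for all three aggregated classes, which is consistent with the footnote's rule; a single amino acid yields a factor of 1.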
Figure 3. Gene set overlap for the KEGG and BioCyc databases. The Venn diagram depicts the overlap between the non-redundant gene sets for the KEGG and BioCyc metabolic pathway databases. These genes correspond to the combined set from the pathway and reaction interrogation schemes. The total number of unique genes that our method yields is 1246.
Performance of the database:interrogation schemes in GWAS dataset analysis
| Database:interrogation scheme | Gene set size (1) | Sensitivity (3) |
|---|---|---|
| BioCyc pathway | 399 | 0.53 |
| BioCyc reaction | 806 | 0.47 |
| KEGG pathway | 703 | 0.67 |
| KEGG reaction | 768 | 0.53 |
| Pooled set | 1246 | 0.67 |
Snapshot of the matches between our method and the association data from the Illig et al. 2010 study for each database:interrogation scheme. (1) The unique set of genes generated for all the metabolites for the given database:interrogation scheme. (2) The top hits in the Illig et al. publication that were present in the gene set for the given database:interrogation scheme. (3) Sensitivity is a measure of the actual positives captured by our method and equals the ratio of the number of top hits identified by the method to the total number of top hits in the Illig et al. publication, which is 15.
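The sensitivity definition in the footnote is a simple ratio, sketched below. The total of 15 top hits and the sensitivities are taken from the table and its footnote; the per-scheme hit counts are not given in this text, so the code back-calculates them from the rounded sensitivities. Treat those counts as implied values rather than reported ones.

```python
TOTAL_TOP_HITS = 15  # total top hits in Illig et al., per the footnote

# Sensitivities as reported in the table above.
REPORTED_SENSITIVITY = {
    "BioCyc pathway": 0.53,
    "BioCyc reaction": 0.47,
    "KEGG pathway": 0.67,
    "KEGG reaction": 0.53,
    "Pooled set": 0.67,
}


def sensitivity(hits_found: int, total_hits: int = TOTAL_TOP_HITS) -> float:
    """Fraction of known top hits that fall inside the candidate gene set."""
    return hits_found / total_hits


def implied_hits(reported: float, total_hits: int = TOTAL_TOP_HITS) -> int:
    """Back-calculate the hit count from a sensitivity rounded to 2 decimals."""
    hits = round(reported * total_hits)
    # Guard: the back-calculated count must reproduce the reported value.
    if round(sensitivity(hits, total_hits), 2) != reported:
        raise ValueError("no integer hit count matches the reported sensitivity")
    return hits


hits_by_scheme = {name: implied_hits(s) for name, s in REPORTED_SENSITIVITY.items()}
```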
Replication of candidate genes in the Demirkan et al. dataset
| Metabolite | SNP | P-value, Illig et al. (1) | SNP | P-value, Demirkan et al. (2) | Combined P-value (3) |
|---|---|---|---|---|---|
| PC ae C40:6 | rs11786743 | 4.03E-05 | rs913819 | 6.73E-04 | 2.15E-07 |
| PC ae C40:6 | rs2839631 | 5.67E-06 | rs378376 | 5.17E-04 | 2.90E-08 |
| PC ae C38:2 | rs10485168 | 2.42E-04 | rs9359765 | 4.61E-04 | 7.54E-07 |
| PC ae C34:3 | rs2246253 | 1.25E-04 | rs2419603 | 1.76E-04 | 1.56E-07 |
| PC aa C34:4 | rs2862999 | 2.66E-05 | rs11037685 | 6.13E-04 | 1.35E-07 |
| PC ae C40:6 | rs9465673 | 1.11E-04 | rs694094 | 4.47E-04 | 3.53E-07 |
| PC aa C38:0 | rs3770536 | 5.55E-04 | rs3770562 | 9.43E-05 | 3.79E-07 |
| PC aa C30:0 | rs6056188 | 9.55E-06 | rs17363114 | 1.96E-03 | 2.06E-07 |
| PC aa C32:0 | rs7252966 | 1.69E-05 | rs7254215 | 2.09E-03 | 3.57E-07 |
Top hits from the meta-analysis of candidate genes identified in the Illig et al. study and replicated in the Demirkan et al. dataset. (1), (2), (3) P-values of association of the SNP with the trait in the Illig et al. dataset, in the Demirkan et al. dataset, and in the combined analysis, respectively. indicates genes for which further evidence was found.
Figure 4. Role of ALDH1L1 in the cytosolic one-carbon pool metabolism. A simplified schematic of the one-carbon pool metabolism in the cytosol is depicted. ALDH1L1: Aldehyde Dehydrogenase 1 Family, Member L1; THF: tetrahydrofolate; SHMT: Serine hydroxymethyltransferase.