Whole genome sequence analysis of serum amino acid levels.

Bing Yu¹, Paul S de Vries¹, Ginger A Metcalf², Zhe Wang¹, Elena V Feofanova¹, Xiaoming Liu¹, Donna Marie Muzny², Lynne E Wagenknecht³, Richard A Gibbs², Alanna C Morrison¹, Eric Boerwinkle⁴,⁵.

Abstract

BACKGROUND: Blood levels of amino acids are important biomarkers of disease and are influenced by synthesis, protein degradation, and gene-environment interactions. Whole genome sequence analysis of amino acid levels may establish a paradigm for analyzing quantitative risk factors.
RESULTS: In a discovery cohort of 1872 African Americans and a replication cohort of 1552 European Americans we sequenced exons and whole genomes and measured serum levels of 70 amino acids. Rare and low-frequency variants (minor allele frequency ≤5%) were analyzed by three types of aggregating motifs defined by gene exons, regulatory regions, or genome-wide sliding windows. Common variants (minor allele frequency >5%) were analyzed individually. Over all four analysis strategies, 14 gene-amino acid associations were identified and replicated. The 14 loci accounted for an average of 1.8% of the variance in amino acid levels, which ranged from 0.4 to 9.7%. Among the identified locus-amino acid pairs, four are novel and six have been reported to underlie known Mendelian conditions. These results suggest that there may be substantial genetic effects on amino acid levels in the general population that may underlie inborn errors of metabolism. We also identify a predicted promoter variant in AGA (the gene that encodes aspartylglucosaminidase) that is significantly associated with asparagine levels, with an effect that is independent of any observed coding variants.
CONCLUSIONS: These data provide insights into genetic influences on circulating amino acid levels by integrating -omic technologies in a multi-ethnic population. The results also help establish a paradigm for whole genome sequence analysis of quantitative traits.

Keywords: Amino acids; Metabolomics; Multi-ethnic; Rare variants; Whole genome sequence

Substances:
Amino Acids
Biomarkers

Year: 2016 PMID: 27884205 PMCID: PMC5123402 DOI: 10.1186/s13059-016-1106-x

Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583

Background

Conventional wisdom holds that common complex diseases are polygenic and rare Mendelian diseases are monogenic. Indeed the biology of human health and disease is complex and there is a continuum of genetic architectures. For example, ever since the seminal work of Goldstein and Brown with familial hypercholesterolemia [1], it is appreciated that a subset of individuals in the far tails of the phenotype distribution (e.g., LDL-cholesterol) may have a Mendelian form of a condition while others may have a polygenic predisposition. To gain a complete understanding of the genetic architecture of health and disease will require: 1) realization of the continuum of Mendelian and polygenic conditions; 2) consideration of the whole genome; and 3) multi-omic approaches that allow measurements of intermediate phenotypes closer to gene action and that bridge genome variation with inter-individual differences in disease risk. Circulating blood levels of amino acids and whole genome sequence data combined with state-of-the-art annotation and analysis tools can help establish a paradigm for defining the genetic architecture of quantitative phenotypes. Rare recessive mutations in genes that lead to deficiencies or excess of specific amino acids are the root cause of a number of inborn errors of metabolism [2]. Inter-individual differences in several amino acids are risk factors for common disease (e.g., branched-chain and aromatic amino acids for diabetes) [3]. Amino acids are important components of protein metabolism and cell signaling. They reflect a variety of cellular and physiologic processes and may, therefore, mirror gene–environment interactions. Genome-wide association studies (GWAS) have identified common variants associated with multiple amino acid levels [4-6]. Low-frequency variants that modulate amino acid levels independent of known GWAS loci have also been reported using exome arrays and a targeted analytical approach for exome sequence data [7, 8]. 
To date, no study has assessed the impact of rare and low-frequency variations captured by systematic and comprehensive sequencing of the protein-encoding exons and whole genomes on amino acid levels in a multi-ethnic population. We used exon and whole genome sequencing in a sample of 3424 European and African Americans to investigate the genetic determinants of 70 blood amino acid levels. Significant effects discovered in African Americans (AA) were replicated in an independent set of European Americans (EA). This study demonstrates the utility of combining multi-omic data and the importance of intermediate phenotypes close to gene action for identifying regions of the genome influencing biologically and clinically relevant traits.

Results

Baseline characteristics

We sequenced exons and whole genomes and measured serum levels of 70 amino acids in 1872 AA for the discovery stage and 1552 EA for the replication stage among participants in the Atherosclerosis Risk in Communities (ARIC) study. Baseline characteristics of both the discovery and replication samples are shown in Additional file 1: Table S1. The mean age of the AA and EA participants was 52.7 and 54.7 years, respectively, and 65.2 and 54.9% of the samples were female. Prevalent diabetes was diagnosed in 16 and 8% of the AA and EA subjects, respectively, and 52 and 31%, respectively, had prevalent hypertension. In the AA samples, a total of 330,490 single nucleotide variants (SNVs) in the exons were captured by exome sequencing and 52,094,875 in the whole genomes; 94.8% of the SNVs were rare or low-frequency (minor allele frequency (MAF) ≤5%) in the exons and this number was 82.9% in the whole genomes. The proportion of variants within frequency bins characterized as rare (0% < MAF < 1%), low-frequency (1% ≤ MAF ≤ 5%), and common (MAF > 5%) is shown in Additional file 2: Figure S1. We used four approaches to examine the association of amino acid levels with genetic variants across the genome: 1) a gene exon approach; 2) an annotated regulatory motif approach; 3) a genome-wide sliding window approach; and 4) a single variant approach. The single variant approach analyzes the variants individually and the other three approaches collapse rare and low-frequency variants into a burden test because insufficient information is available for any one rare variant within a fixed sample size. The gene exon approach leverages the strength of the exome sequence data and the regulatory motif and sliding window approaches highlight the utility of whole genome sequence data. Each of these approaches is separately addressed in the following paragraphs. 
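The frequency bins and the choice between aggregate and single variant tests described above can be sketched in a few lines of Python. This is only an illustration of the stated cut-offs; the function names are hypothetical and are not taken from the study's analysis pipeline.

```python
def classify_maf(maf: float) -> str:
    """Assign a variant to the frequency bins given in the text.

    rare: 0% < MAF < 1%; low-frequency: 1% <= MAF <= 5%; common: MAF > 5%.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("MAF must lie in (0, 0.5]")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


def analysis_route(maf: float) -> str:
    """Common variants are tested individually; the rest are aggregated."""
    if classify_maf(maf) == "common":
        return "single variant test"
    return "aggregate (burden) test"


if __name__ == "__main__":
    for maf in (0.002, 0.03, 0.37):
        print(maf, classify_maf(maf), analysis_route(maf))
```

Variants at exactly 1% and 5% fall in the low-frequency bin, which matches the inequalities quoted in the text.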
Overall, a total of 14 genetic loci–amino acid paired associations exceeded our a priori defined threshold for statistical significance in the discovery analysis in AA samples and were replicated in the EA samples. Within the 14 pairs, six loci–amino acid relationships were detected by more than one analytical approach (Fig. 1). Ten out of 14 pairs have been reported by previous GWAS, and the other four pairs are novel. A comparison between the 14 pairs and previous GWAS findings is provided in Additional file 1: Table S2.

Fig. 1

Identified significant genetic associations with serum amino acid levels. Gene names with a single line underneath indicate the association was reported in previous studies for European ancestry; gene names with double lines underneath indicate the association was reported in previous studies for both African and European ancestry. Gene names shown in the single variant test were assigned according to the leading common variant annotations

Gene exon approach

For the gene exon approach, we restricted our analysis to predicted functional variants with MAF ≤5%. A total of 15,589 genes with cumulative minor allele counts (cMAC) ≥7 were analyzed. We identified and replicated seven gene–amino acid pairs (HAO2–alpha-hydroxyisovalerate, AGA–asparagine, DMGDH–dimethylglycine, CCBL1–indolelactate, ACY1–N-acetylalanine and ACY1–N-acetylthreonine, PRODH–proline) with a discovery p value (P_discovery) < 4.6 × 10−8 and a replication p value (P_replication) < 0.003 (Table 1). There were 12 to 30 rare and low-frequency variants involved within each of the identified genes. Detailed results for each rare and low-frequency variant involved in these genes are provided in Additional file 1: Table S3. A full list of identified gene–amino acid pairs regardless of successful replication is provided in Additional file 1: Table S4. Annotated functional variants in the six genes of the seven gene–amino acid pairs accounted for 0.6–3.6% of the variance in the amino acid levels, with the average being 1.6%. The six genes all encode enzymes, four of which directly catalyze reactions involving the identified amino acids as substrates or end products. The relationships between AGA and asparagine (P_discovery = 1.1 × 10−10, P_replication = 2.7 × 10−5), DMGDH and dimethylglycine (P_discovery = 3.2 × 10−31, P_replication = 8.1 × 10−12), ACY1 and N-acetylalanine and N-acetylthreonine (P_discovery = 4.1 × 10−41 and 1.1 × 10−10, P_replication = 3.9 × 10−15 and 4.7 × 10−5), and PRODH and proline (P_discovery = 1.4 × 10−29, P_replication = 1.5 × 10−11) are consistent with known autosomal recessive metabolic disorders. The gene exon results for the meta-analysis of the discovery and replication samples with p < 4.0 × 10−6 are provided in Additional file 1: Table S5.
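To make the aggregation step concrete, the following is a minimal sketch of a T5-style burden score under simplifying assumptions: genotypes are coded as minor-allele counts (0, 1, or 2), all variants passed in are already annotated as predicted functional, and no covariates are adjusted for. The function names `t5_burden_scores` and `burden_beta` are hypothetical, and the study's actual models are described in its supplemental methods rather than reproduced here.

```python
from typing import List, Optional, Sequence


def t5_burden_scores(
    genotypes: Sequence[Sequence[int]],
    maf_threshold: float = 0.05,
    min_cmac: int = 7,
) -> Optional[List[int]]:
    """Collapse variants with MAF <= 5% into one burden score per person.

    genotypes[i][j] is the minor-allele count of person i at variant j.
    Returns None when the cumulative minor allele count (cMAC) is below
    min_cmac, mirroring the cMAC >= 7 filter mentioned in the text.
    """
    n_people = len(genotypes)
    n_variants = len(genotypes[0])
    kept = []
    for j in range(n_variants):
        mac = sum(row[j] for row in genotypes)
        maf = mac / (2 * n_people)
        if 0 < maf <= maf_threshold:
            kept.append(j)
    burden = [sum(row[j] for j in kept) for row in genotypes]
    if sum(burden) < min_cmac:
        return None
    return burden


def burden_beta(burden: Sequence[float], phenotype: Sequence[float]) -> float:
    """Least-squares slope of the phenotype on the burden score."""
    n = len(burden)
    mean_x = sum(burden) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, phenotype))
    return sxy / sxx
```

A real analysis would also report a p value and include covariates; the sketch only shows how rare and low-frequency alleles are pooled before a single regression is fitted.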

Table 1

Gene exon-based results demonstrating a significant association among both discovery (p < 4.6 × 10−8) and replication (p < 0.003) stages for the T5 burden test

Metabolite	Gene	Discovery (AA)				Replication (EA)
Metabolite	Gene	P	Beta	cMAC	VarExp	P	Beta	cMAC	VarExp
Dimethylglycine	DMGDH	3.2 × 10⁻³¹	0.64	96	3.6%	8.1 × 10⁻¹²	0.39	73	1.7%
N-acetylthreonine	ACY1	1.1 × 10⁻¹⁰	0.12	239	0.6%	4.7 × 10⁻⁵	0.26	24	0.4%
N-acetylalanine	ACY1	4.1 × 10⁻⁴¹	0.16	239	1.5%	3.9 × 10⁻¹⁵	0.25	24	0.6%
Asparagine	AGA	1.1 × 10⁻¹⁰	0.34	157	1.4%	2.7 × 10⁻⁵	0.38	58	0.9%
Indolelactate	CCBL1	2.7 × 10⁻²¹	0.39	87	1.6%	1.1 × 10⁻⁷	0.26	33	0.5%
Alpha-hydroxyisovalerate	HAO2	1.6 × 10⁻⁸	0.64	21	0.8%	8.2 × 10⁻⁶	0.41	18	0.5%
Proline	PRODH	1.4 × 10⁻²⁹	0.14	324	1.4%	1.5 × 10⁻¹¹	0.09	295	0.7%

cMAC cumulative minor allele count, VarExp variance explained by the loci

Regulatory motif approach

Defining regulatory motifs away from protein-encoding genes is a major activity of modern genome sciences. Projects such as ENCODE [9] and GTEx [10] are defining noncoding regions of the genome that have important biologic function, including regulation of gene expression. We analyzed a total of 21,040 annotated regulatory motifs with cMAC ≥7 across the genome, and statistical significance was defined as P < 3.4 × 10−8. Although two regulatory motifs exceeded our a priori significance threshold for discovery in the AA samples, they did not replicate in the EA samples (Additional file 1: Table S6). To help up-weight predicted functional variants, the regulatory motif analysis was repeated and weighted by the combined annotation dependent depletion (CADD) scores [11], but the results did not change substantially from those of the unweighted analyses (Additional file 2: Figure S2). The regulatory motif results for the meta-analysis of the discovery and replication samples with p <4.0 × 10−6 are provided in Additional file 1: Table S7.
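The significance thresholds quoted for the gene exon and regulatory motif analyses appear consistent with a Bonferroni correction of α = 0.05 over the number of units tested multiplied by the 70 amino acids. That derivation is an inference from the reported numbers, not something stated in this section, so the sketch below should be read as a plausibility check only.

```python
def bonferroni_threshold(n_units: int, n_traits: int = 70, alpha: float = 0.05) -> float:
    """Per-test significance threshold for n_units x n_traits tests.

    Assumption: one test per (unit, amino acid) pair and a family-wise
    alpha of 0.05; neither is confirmed by the text of this section.
    """
    return alpha / (n_units * n_traits)


if __name__ == "__main__":
    # Counts taken from the text: 15,589 genes and 21,040 regulatory motifs.
    for label, count in (("genes", 15_589), ("regulatory motifs", 21_040)):
        print(f"{label}: {bonferroni_threshold(count):.1e}")
```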

Sliding window approach

We next applied a sliding window approach to analyze rare and low-frequency variation (MAF ≤5%) aggregated by 4-kb windows with a 2-kb skip length using burden tests to scan the entire genome. A total of 1,337,499 windows (668,748 non-overlapping windows) with cMAC ≥7 were analyzed. We identified and replicated two genomic regions influencing two amino acid levels (P_discovery < 1.1 × 10−9 and P_replication < 0.01; Table 2). One is a 130-kb region at 2p13.2, where two windows in the region were associated with N-acetyl-1-methylhistidine levels (lowest window P_discovery = 1.6 × 10−15, P_replication = 3.9 × 10−4). ALMS1 and NAT8, two neighboring genes residing in this 130-kb region, have been previously reported to be related to N-acetyl amino acid levels [4, 6]. The other region is located at 6q23.2, where a single window 46 kb downstream of VNN1 was associated with acisoga. Detailed results for each rare and low-frequency variant involved in the identified windows are provided in Additional file 1: Table S3. A full list of identified significant sliding window–amino acid pairs regardless of successful replication is provided in Additional file 1: Table S8. The sliding window results for the meta-analysis of the discovery and replication samples with p < 4.0 × 10−6 are provided in Additional file 1: Table S9.
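The window layout described above (4-kb windows advanced in 2-kb steps) can be illustrated with a short generator. Coordinates are treated as inclusive, which matches the 4000-bp spans listed in Table 2; the starting offset of the first window on each chromosome is an assumption, and `sliding_windows` is a hypothetical helper rather than the study's code.

```python
from typing import Iterator, Tuple


def sliding_windows(
    start: int, end: int, size: int = 4000, skip: int = 2000
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) windows of `size` bp every `skip` bp.

    Only complete windows that fit inside [start, end] are produced.
    """
    pos = start
    while pos + size - 1 <= end:
        yield (pos, pos + size - 1)
        pos += skip


if __name__ == "__main__":
    windows = list(sliding_windows(73_744_005, 73_752_004))
    print(windows)        # overlapping windows, each sharing 2 kb with the next
    print(windows[::2])   # every other window gives a non-overlapping tiling
```

Because consecutive windows overlap by half, taking every second window yields the non-overlapping set, which is why the number of non-overlapping windows in the text is about half the total.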

Table 2

Sliding windows demonstrating a significant association among both discovery (p < 1.1 × 10−9) and replication (p < 0.01) stages for the T5 burden test

Metabolite	Discovery (AA)					Replication (EA)
Metabolite	Window (gene)	P	Beta	cMAC	VarExp	Window (gene)	P	Beta	cMAC	VarExp
N-acetyl-1-methylhistidine	Chr2: 73744005–73748004 (NAT8)	1.6 × 10⁻¹⁵	0.12	933	1.0%	Chr2: 73744005–73748004 (NAT8)	0.0004	0.17	156	0.5%
N-acetyl-1-methylhistidine	Chr2: 73614005–73618004 (NAT8)	6.2 × 10⁻¹¹	−0.11	728	0.7%	Chr2: 73614005–73618004 (NAT8)	0.005	−0.07	336	0.2%
Acisoga	Chr6: 132952009–132956008 (VNN1)	9.4 × 10⁻¹⁰	0.06	1504	0.4%	Chr6: 132952009–132956008 (VNN1)	0.009	−0.04	764	0.1%

cMAC cumulative minor allele count, VarExp variance explained by the loci

Single variant approach

In addition to rare and low-frequency variants, we conducted a survey of the genome investigating common SNVs with MAF >5%. Eleven single variant–amino acid associations reached the significance threshold at both the discovery and replication stages (P_discovery < 7.1 × 10−10 and P_replication < 0.003; Table 3). These 11 common variants accounted for 0.7–9.7% of the variance in amino acid levels, with an average of 2.3%. The 11 SNVs all resided in protein-encoding gene regions, six of which encode enzymes that catalyze the reaction of the corresponding metabolite as a substrate or product. Among the significant findings, two gene–amino acid associations are novel (3-methoxytyrosine and DDC, and acisoga and VNN1) and there are two loci, DDC and CPS1, in which mutations are known to cause autosomal recessive metabolic disorders. A full list of identified significant single variant–amino acid pairs regardless of successful replication is provided in Additional file 1: Table S10. The single variant results for the meta-analysis of the discovery and replication samples with p < 5.0 × 10−8 are provided in Additional file 1: Table S11.

Table 3

Single variant results demonstrating a significant association among both discovery (p < 7.1 × 10−10) and replication (p < 0.003) stages

Metabolite	Variant information					Discovery (AA)				Replication (EA)
Metabolite	Gene	SNP	Function	Chr:position	REF/ALT	MAF	Beta	P	Var Exp	MAF	Beta	P	Var Exp
Glycine	CPS1	rs1047891	Missense	2:211540507	C/A	0.37	0.09	4.5 × 10⁻¹⁹	1.3%	0.31	0.16	4.9 × 10⁻⁴⁵	3.6%
Dimethylglycine	DMGDH	rs933683	Intronic	5:78324003	G/T	0.44	−0.15	2.3 × 10⁻¹⁴	1.9%	0.29	−0.09	9.5 × 10⁻⁶	0.7%
Asparagine	AGA	rs11131799	Intronic	4:178363378	G/A	0.49	−0.14	2.4 × 10⁻¹⁰	2.5%	0.36	−0.26	3.9 × 10⁻²³	4.5%
N-acetyl-1-methylhistidine	NAT8	rs13538	Missense	2:73868328	A/G	0.48	0.34	3.3 × 10⁻⁷⁵	9.7%	0.23	0.51	3.4 × 10⁻⁸⁵	14.2%
Glutarylcarnitine	SYCE2	rs8012	Missense	19:13010520	A/G	0.19	−0.12	9.5 × 10⁻¹⁴	1.2%	0.46	−0.11	2.5 × 10⁻¹⁷	1.5%
N-acetyl phenylalanine	ALMS1P	rs13431529	Intronic	2:73876041	G/C	0.49	0.09	4.3 × 10⁻¹⁰	1.0%	0.23	0.06	1.2 × 10⁻⁵	0.4%
3-Methoxytyrosine	DDC	rs11575302	Silent	7:50607694	G/A	0.15	0.15	2.5 × 10⁻¹⁷	1.5%	0.02	0.19	1.4 × 10⁻⁷	0.5%
Indolepropionate	ACSM5	rs8044331	Intronic	16:20450302	T/C	0.42	−0.17	5.3 × 10⁻¹⁰	1.8%	0.22	−0.11	0.001	0.5%
Alpha-hydroxyisovalerate	HAO2	rs17023507	UTR5	1:119923247	C/T	0.10	−0.25	1.6 × 10⁻¹³	1.9%	0.002	−0.64	0.001	0.4%
Proline	PRODH	rs1814288	Intronic	22:18923383	C/T	0.30	−0.06	7.8 × 10⁻¹²	0.7%	0.21	−0.03	0.003	0.1%
Acisoga	VNN1	rs2272996	Missense	6:133015271	T/C	0.19	0.18	8.1 × 10⁻¹⁶	2.0%	0.27	0.26	4.8 × 10⁻³⁴	5.1%

REF/ALT reference allele and alternative allele, MAF minor allele frequency, VarExp variance explained by the loci

Conditional analyses

Across all analytic approaches, six of the region–amino acid associations have been reported in previous GWAS: AGA–asparagine, DMGDH–dimethylglycine, HAO2–alpha-hydroxyisovalerate, PRODH–proline, CCBL1–indolelactate, and two sliding windows close to NAT8 with N-acetyl-1-methylhistidine. We performed conditional analyses in order to examine whether sequencing data were able to identify independent region-based effects at loci highlighted by previous GWAS. Results of the region-based conditional analyses are shown in Table 4. Low-frequency variants in AGA, DMGDH, HAO2, PRODH, and CCBL1 were associated with amino acid levels independent of the known GWAS lead variants. The association of low-frequency variants in the two sliding windows near NAT8, however, was strongly attenuated after adjusting for rs13538, the lead variant identified by previous GWAS. Among these six associations, we examined whether any GWAS findings can be explained by rare and low-frequency variants. In one case, rs248386, the significance of the lead variant identified by previous GWAS of dimethylglycine levels was largely diminished after conditioning on the burden of rare and low-frequency variants in DMGDH (Additional file 1: Table S12). We next performed conditional analyses to determine whether the lead single common variants for nine locus–amino acid associations were independent from the lead variants identified by GWAS. In three of these cases (rs13538–NAT8, rs1047891–CPS1, and rs8012–SYCE2), we identified the same lead variant as previous GWAS. The remaining lead variants we discovered in AA samples (rs11131799–AGA, rs933683–DMGDH, rs1814288–PRODH, rs13431529–ALMS1P, rs8044331–ACSM5, and rs17023507–HAO2) were generally independent of those identified by previous GWAS (Additional file 1: Table S13).
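The logic of a conditional analysis can be sketched as a regression of the amino acid level on the burden score with the GWAS lead variant included as a covariate. The example below is a simplified illustration on toy data, assuming additive genotype coding and no other covariates; it estimates only effect sizes, not p values, and the function names are hypothetical.

```python
from typing import List, Sequence


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def _residuals(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of y after regressing on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = _slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def marginal_beta(y: Sequence[float], burden: Sequence[float]) -> float:
    """Unadjusted effect of the burden score."""
    return _slope(burden, y)


def conditional_beta(
    y: Sequence[float], burden: Sequence[float], lead_snv: Sequence[float]
) -> float:
    """Effect of the burden score adjusted for the lead SNV genotype.

    Uses the Frisch-Waugh result: regress the residualized phenotype on the
    residualized burden, which equals the burden coefficient in the joint
    model y ~ 1 + lead_snv + burden.
    """
    ry = _residuals(y, lead_snv)
    rb = _residuals(burden, lead_snv)
    return sum(a * b for a, b in zip(rb, ry)) / sum(a * a for a in rb)


if __name__ == "__main__":
    snv = [0, 0, 1, 1, 2, 2, 0, 1]
    burden = [0, 1, 0, 1, 1, 1, 0, 0]
    tagged = [0.5 * s for s in snv]  # signal carried entirely by the lead SNV
    print(marginal_beta(tagged, burden), conditional_beta(tagged, burden, snv))
```

In the toy data the burden score looks associated on its own but its adjusted effect vanishes, which is the pattern reported for the NAT8 windows; an effect that persists after adjustment corresponds to the independent signals reported for the five genes.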

Table 4

Conditional analysis of selected regions adjusting for the lead common variant identified by previous genome-wide association studies

Metabolite	Region	Type	GWAS Lead SNV	Discovery (AA)		Replication (EA)
Metabolite	Region	Type	GWAS Lead SNV	P_unadjusted	P_adjusted	P_unadjusted	P_adjusted
Indolelactate*	CCBL1	Gene	rs15676	1.3 × 10⁻²⁰	1.1 × 10⁻²⁰	2.1 × 10⁻⁶	4.1 × 10⁻⁶
N-acetyl-1-methylhistidine	Chr2: 73744005–73748004 (NAT8)	Window	rs13538	1.6 × 10⁻¹⁵	0.005	4.0 × 10⁻⁴	0.2
N-acetyl-1-methylhistidine	Chr2: 73614005–73618004 (NAT8)	Window	rs13538	6.2 × 10⁻¹¹	0.9	0.005	0.8
Asparagine*	AGA	Gene	rs4690522	6.8 × 10⁻¹⁰	9.1 × 10⁻¹⁰	1.5 × 10⁻⁵	6.0 × 10⁻⁸
Dimethylglycine*	DMGDH	Gene	rs248386	1.1 × 10⁻²⁶	4.3 × 10⁻²⁷	4.4 × 10⁻¹¹	4.5 × 10⁻¹⁰
Alpha-hydroxyisovalerate*	HAO2	Gene	rs12141041	1.5 × 10⁻⁵	3.0 × 10⁻⁵	9.3 × 10⁻⁵	2.0 × 10⁻⁴
Proline*	PRODH	Gene	rs2540641	1.4 × 10⁻²⁶	1.7 × 10⁻²⁶	1.3 × 10⁻¹²	1.2 × 10⁻¹³

*Unadjusted results may differ from main analysis because only individuals with both exome sequencing and whole genome sequencing were included in the conditional analysis. SNV single nucleotide variant

Discussion

We identified and replicated 14 associations between genetic loci and serum amino acid levels, all in or neighboring genes encoding enzymes. Four of the associated gene–amino acid pairs were novel (DDC–3-methoxytyrosine, VNN1–acisoga, ACY1–N-acetylalanine, and ACY1–N-acetylthreonine). Six of the loci–amino acid associations were identified by more than one analytical approach. In most cases, rare and low-frequency variants in the regions identified in this study were associated with amino acids independent of common variants previously identified by GWAS. Six of the gene–amino acid pairs identified here are known to underlie Mendelian disorders. Notably, among the four analytical approaches proposed in this study, the analysis focusing on regulatory motifs was the only setting in which there were no significant and replicated amino acid associations. Amino acids are the building blocks of proteins. Humans can synthesize 11 of the 20 standard amino acids and the remaining nine essential amino acids must be obtained from dietary sources. The genetic loci identified in this study are all associated with non-essential amino acids or amino acid derivatives, although previous GWAS have reported multiple common variants that are associated with levels of nine essential amino acids [6, 12–14]. Given the nature of amino acid biosynthesis and the properties of the enzyme-encoding genes, it is of note that six of the identified enzymes directly catalyze reactions involving the amino acid as a substrate or end product. Understanding the genetic bases of inherited metabolic disease has been a focus of human genetics for a long time.
In this study, we identified six genes (DMGDH, AGA, ACY1, PRODH, DDC, CPS1) that have been previously implicated in recessive metabolic disorders, four of which show direct relationships to the amino acids identified here: mutations in AGA are known to cause aspartylglucosaminuria (MIM 208400); mutations in DMGDH cause dimethylglycine dehydrogenase deficiency (MIM 605850); mutations in ACY1 cause aminoacylase-1 deficiency (MIM 609924); and mutations in PRODH are known to cause hyperprolinemia type I (MIM 239500). Although the other two loci did not directly affect the identified amino acid levels, there is evidence suggesting that the two genes play a role in their regulation. DDC participates in tyrosine metabolism (DBGET: R02080) and mutations in it are known to cause aromatic L-amino acid decarboxylase deficiency (AADC; MIM 608643). The identified amino acid 3-methoxytyrosine is one of the main biochemical markers of AADC [15]. CPS1 (carbamoyl phosphate synthetase I) encodes an ammonia ligase (DBGET: R00149) and deficiency of the CPS1 protein (MIM 608307) leads to hyperammonemia. Glycine is a precursor of ammonia (DBGET: R01221) and, as such, accumulates in the liver and kidneys under the condition of excess ammonia [16]. DMGDH–dimethylglycine, AGA–asparagine, PRODH–proline, and CPS1–glycine associations were reported by several previous studies (Additional file 1: Table S2), while the ACY1–N-acetylthreonine/N-acetylalanine and DDC–3-methoxytyrosine associations are novel. Our findings support that genetic variation impacts inter-individual differences in amino acid levels in the general population in addition to causing recessive inborn errors of metabolism. The data reported here provide new insight into the genes influencing blood amino acid levels. For example, CCBL1, which encodes kynurenine aminotransferase 1, was associated with three lactate derivatives, including indolelactate, phenyllactate (PLA), and 3-(4-hydroxyphenyl)lactate.
Kynurenine aminotransferase 1 is known to be involved in tryptophan metabolism (DBGET: T01001, hsa00380), where it converts kynurenine, an intermediate of the tryptophan degradation pathway, into kynurenic acid [17], a neurotoxic compound associated with schizophrenia [18]. One of the three amino acids, indolelactate, is also part of tryptophan metabolism (DBGET: hsa00380). A common variant in CCBL1 has been reported to be related to indolelactate in populations of European ancestry [13], and we observed that rare and low-frequency variants in CCBL1 were associated with indolelactate in both AA and EA samples independent of the reported common variant. Because of the neurotoxic effect of kynurenic acid, inhibition of the kynurenine pathway is a therapeutic strategy for neurodegenerative disease [19, 20]. Current available drugs are indoleamine-pyrrole 2,3-dioxygenase (IDO) inhibitors, which inhibit the conversion of tryptophan to kynurenine. We identified rare and low-frequency variants in IDO1, encoding IDO, associated with low levels of kynurenine, suggesting that participants carrying functional mutations in IDO1 may show neuroprotection. Phenylalanine, tyrosine, and tryptophan have common steps in their biosynthesis pathway (DBGET:map00400). Interestingly, besides tryptophan metabolism, the other two identified lactate derivatives, PLA and 3-(4-hydroxyphenyl)lactate, are involved in phenylalanine and tyrosine metabolism. Both PLA and 3-(4-hydroxyphenyl)lactate are elevated in phenylketonuria and hyperphenylalaninemia [21], which if untreated may result in mental impairment and other neurologic disorders (MIM 261600 and 261640). Our results indicate that rare and low-frequency variants in CCBL1 are associated with increased levels for all three lactate derivatives. Future studies are warranted to dissect the mechanism of the observed associations and the possibility of CCBL1 as a novel drug target for neurologic disorders. 
The results reported here generate new hypotheses that future studies can investigate. One example is the association between a common missense variant in VNN1 and acisoga. Acisoga is a newly described amino acid involved in polyamine metabolism. Although polyamines are ubiquitous small molecules, acisoga is the only polyamine measured in our metabolomics panel. VNN1 encodes vanin 1, which shares extensive sequence similarity with biotinidase. The function of VNN1 is not well studied; however, it possesses pantetheinase activity, which may play a role in oxidative-stress response [22]. There is convincing evidence that altered polyamine metabolism is involved in many diseases, and drugs altering polyamine levels therefore may have a variety of important disease targets [23]. The results presented here provide preliminary directions for further research on polyamine metabolism and the VNN1 gene. The analysis strategy and results presented here establish a paradigm for whole genome sequence analysis of quantitative risk factor phenotypes. There is compelling evidence based on GWAS that common variants confer relatively small increments in risk and explain only a small proportion of the heritability [24]. Assessment of rare and low-frequency variants, specifically non-coding rare and low-frequency variants, in relation to human health is largely incomplete. Whole genome sequencing data offer an opportunity to characterize rare and low-frequency variations and variations outside of the usual protein-encoding regions. The UK10K and GoT2D projects [25, 26] have demonstrated success identifying novel findings utilizing whole genome sequencing, but this success has been limited compared to GWAS, in part due to the limited statistical power. Compared to studies of complex diseases, the study of quantitative phenotypes, such as amino acid levels, which are proximal to gene function, can substantially increase statistical power.
Our study successfully identified and replicated four novel findings, demonstrating the feasibility of analyzing whole genome sequences in the context of intermediate quantitative phenotypes to promote novel biologically relevant findings. Although the majority of the findings in our study reside in coding regions, we were able to identify non-coding loci that contribute to amino acid levels. For example, a common intronic variant, rs11131799, was shown to be associated with asparagine levels, independent of coding variants in AGA (AGA, P = 1.1 × 10−10, P = 2.4 × 10−9). Conditioning on AGA coding variants did not markedly alter the non-coding locus association. AGA encodes the enzyme aspartylglucosaminidase, which breaks down glycoproteins by hydrolyzing N-acetylglucosamine–asparagine linkages, thereby releasing asparagine. Rs11131799, annotated as a predicted promoter variant, is highly associated with AGA expression levels (http://genenetwork.nl/biosqtlbrowser/). Some of the variants involved in the 4-kb window are annotated as predicted deleterious by CADD [11] and FATHMM-MKL [27]. A previous study identified an association between asparagine and the ASPG locus, encoding asparaginase [13], which catalyzes the hydrolysis of asparagine to aspartic acid. Interestingly, our lead variant for the AGA–asparagine association (rs11131799) occurred in both AA and EA participants, while the previously reported lead variant (rs4690522) was only observed in EA participants. The two variants were in strong linkage disequilibrium in EA participants, but not in linkage disequilibrium in AA participants, suggesting that rs4690522 may have simply been a proxy for rs11131799 in previous studies. The data reported here suggest that blood asparagine levels may be influenced not only by the coding regions but also by some regulatory elements. Further annotation information is warranted to dissect the two non-coding regions in relation to asparagine levels. 
Among the four analytical approaches proposed in this study, the analysis of regulatory motifs was the only approach that did not yield novel findings. If we consider effect sizes seen in the other analysis approaches, these results reemphasize that improvements in annotation, particularly non-coding regulatory elements, are necessary. It is likely that the high density of non-functional variants in the hypothesized regulatory motifs overwhelms the sparser functional variants included in a burden test. Alternatively, single rare and low-frequency variants with large effects may be scarce in annotated regulatory elements of the human genome. Strengths of this study include the use of direct sequencing, as opposed to genotyping and imputation. By using sequencing data, we were able to interrogate low-frequency, rare, and private variants that are not covered by genotyping and imputation. Even for variants accessible by both approaches, sequencing avoids the measurement error generated by imputation, which can be large for rare variants. The advantages of sequencing are particularly important for fine-mapping, since differences in imputation quality among variants can obstruct the search for the most likely causal variant. An additional strength of this study is the joint calling of variants in a larger pooled sample of studies conducted in the same laboratory, including ARIC. By increasing the sample size during the calling of variants, the ability to correctly call rare variants is enhanced [28]. The discovery sample for this study was AA, a population with a high level of genetic diversity, to promote novel findings. Also, AA are relatively under-represented in large-scale genomics research. To our knowledge, there is no AA sample for which both whole genome sequencing and multi-amino acid measurements are available to perform replication. Therefore, EA were used as the replication sample. Our focus here is the similar associations detected in both AA and EA. 
For the associations that were not replicated in EA, population-specific genetic variation and effects are possible reasons in addition to the original observation being a type I error. The variants included in aggregate tests differed between our discovery (AA) and replication (EA) samples due to ancestry-specific variants as well as allele frequency differences among shared variants. The variance explained by a genetic locus provides an estimate of the proportion of phenotypic variation that is attributable to inter-individual differences in DNA sequence. In this study, the variance in amino acid levels explained by individual loci ranged from 0.4 to 9.7% among AA. Our previous GWAS reported that 5 to 20% of the variance in the levels of five amino acids was explained [6], and the ranges reported among Caucasians vary, such as 1–10% [29] or 1–25% [13]. To our knowledge, there is no trans-ethnic genetic association study of amino acid levels. Nevertheless, our exploratory trans-ethnic meta-analysis provided insights for future studies. Further investigation is warranted to evaluate these and additional findings in multiple ethnic groups.

Conclusions

By integrating -omic technologies into deeply phenotyped populations, we show that sequence variants affect the levels of multiple human amino acids in two ethnic groups. These data and results identify new avenues of gene function, novel molecular mechanisms, and potential diagnostic targets for multiple diseases.

Methods

Study population and metabolome measurements

The Atherosclerosis Risk in Communities (ARIC) study is a prospective epidemiological study designed to investigate the etiology and predictors of cardiovascular disease. It enrolled 15,792 individuals aged 45–64 years from four US communities (Forsyth County, NC; Jackson, MS; suburbs of Minneapolis, MN; and Washington County, MD) in 1987–89 (baseline) and followed them for four completed visits in 1990–92, 1993–95, 1996–98, and 2011–13. A detailed description of the ARIC study design and methods is published elsewhere [30]. Amino acid levels were measured in fasting serum samples collected at the baseline examination in 1987–89 from selected ARIC AA and EA participants. A total of 89 amino acids were detected and semi-quantified by Metabolon Inc. (Durham, USA) using an untargeted, gas chromatography–mass spectrometry and liquid chromatography–mass spectrometry (GC-MS and LC-MS)-based metabolomic quantification protocol (Additional file 2: Supplemental methods) [31, 32]. Amino acids were excluded if: 1) more than 25% of the samples had values below the detection limit; or 2) the Pearson correlation coefficients between 2010 and 2014 measurements were <0.3 (Additional file 2: Supplemental methods). After this assessment, 70 amino acids were included in the present study.
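The two exclusion criteria above can be sketched as a small filter. This is an illustration only, not the study's pipeline: the function names, the use of `None` to mark a value below the detection limit, and the layout of the paired 2010 and 2014 measurements are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def keep_amino_acid(levels, paired_2010, paired_2014,
                    max_below_frac=0.25, min_corr=0.3):
    """Return True if an amino acid passes both criteria.

    levels: one value per sample, None when below the detection limit
            (hypothetical encoding).
    paired_2010, paired_2014: repeated measurements on the same samples
            (hypothetical layout).
    """
    below = sum(1 for v in levels if v is None)
    if below / len(levels) > max_below_frac:      # criterion 1
        return False
    return pearson(paired_2010, paired_2014) >= min_corr  # criterion 2
```

An amino acid with exactly 25% of samples below the detection limit is kept here, because the text excludes only those with more than 25%.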

Exome sequencing

DNA isolated from AA and EA participants for exome sequencing was further processed using the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) VCRome 2.1 reagent (42 Mb, NimbleGen) [33], and all samples were paired-end sequenced using Illumina GAII or HiSeq instruments. Details about sequencing, variant calling, and variant quality control are provided in Additional file 2: Supplemental methods. Variants were annotated using ANNOVAR [34] and dbNSFP v2.0 [35] according to the reference genome GRCh37 and National Center for Biotechnology Information RefSeq.

Whole genome sequencing

Whole genome sequencing data for AA and EA were generated at BCM-HGSC using Nano or PCR-free DNA libraries and the HiSeq 2000 instrument (Illumina, Inc., San Diego, CA, USA). Methods for the whole genome sequencing of the ARIC study samples have been described elsewhere [36]. Briefly, individuals were sequenced at sevenfold average depth on Illumina HiSeq instruments and variant calling was completed using goSNAP (https://sourceforge.net/p/gosnap/git/ci/master/tree/). Details about sequencing, variant calling, and variant quality control are provided in Additional file 2: Supplemental methods. Whole genome sequencing variants were annotated across regions and functional domains using the Whole Genome Sequencing Annotation (WGSA) pipeline [37]. The 3′ and 5′ UTRs of a gene were determined using ANNOVAR [34] annotations based on the RefSeq gene model [38]. The promoter of a gene was defined based on the overlap between the permissive set of CAGE peaks reported by the FANTOM5 project [39] and the 5-kb upstream region determined by the ANNOVAR annotation based on the RefSeq gene model. The enhancers and the target genes of the enhancers were defined based on the permissive set of enhancers and enhancer–promoter pairs reported by the FANTOM5 project. In the case of an undesignated enhancer–gene pair, we assigned an enhancer to the nearest gene.
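The promoter and enhancer definitions above come down to two interval operations: intersecting CAGE peaks with an upstream region, and picking the nearest gene for an unpaired enhancer. The sketch below illustrates that logic under simplifying assumptions. It does not use ANNOVAR or FANTOM5 files. The half-open coordinates, the upstream region built from a single transcription start site, and the use of the enhancer midpoint for distance are all hypothetical choices.

```python
def upstream_region(tss, strand, length=5000):
    """Half-open interval [start, end) covering `length` bp upstream of a TSS."""
    if strand == "+":
        return (max(0, tss - length), tss)
    return (tss + 1, tss + 1 + length)

def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def promoter_peaks(tss, strand, cage_peaks):
    """CAGE peaks that overlap the 5-kb upstream region of the gene."""
    region = upstream_region(tss, strand)
    return [peak for peak in cage_peaks if overlaps(peak, region)]

def nearest_gene(enhancer, gene_tss):
    """Assign an unpaired enhancer to the gene whose TSS is closest to its midpoint.

    gene_tss: dict mapping gene name to TSS position on the same chromosome.
    """
    midpoint = (enhancer[0] + enhancer[1]) // 2
    return min(gene_tss, key=lambda gene: abs(gene_tss[gene] - midpoint))
```

For example, `promoter_peaks(10000, "+", [(4000, 4900), (4950, 5050), (9990, 10010)])` keeps the last two peaks, since only they intersect the interval from 5000 to 10000.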

Statistical analyses

For each amino acid, data points lying outside the 1st–99th percentile range were winsorized, separately for each measurement. Levels below the detectable limit of the assay were imputed with the lowest detected value for that amino acid in all samples. Amino acid levels were then natural log-transformed prior to the analyses. Because our primary focus was on rare and low-frequency variants, we aggregated rare and low-frequency variants (MAF ≤5%) in groups based on gene exons, regulatory motifs, or sliding windows. Gene-based aggregation tests are designed for rare and low-frequency coding variants. The analytical unit is an annotated gene. All annotated coding variants within the gene, such as splicing, stop-gain, stop-loss, and nonsynonymous variants and indels, were aggregated for the analysis. The regulatory motifs included annotated enhancers, the 3′ and 5′ UTRs, and the promoter of a gene. The sliding window approach is designed to aggregate rare and low-frequency variants according to their physical position regardless of annotated function. Based on our previous experience [36], sliding windows were defined as 4 kb in length and began at position 0 bp for each chromosome, with a skip length of 2 kb. Within each annotated unit, a burden test (T5) [40] was used, adjusting for age, sex, and the first three principal components (PCs). We further adjusted for estimated glomerular filtration rate (eGFR) [41], an indicator of kidney function, since multiple amino acid levels were associated with eGFR [42]. The T5 burden test collapses variants with MAF ≤5% into a single genetic score to evaluate the joint effects of rare and low-frequency alleles. We also conducted single variant analysis for all individual variants with MAF >5% using an additive genetic model with the same adjustments.
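The preprocessing, windowing, and collapsing steps described above can be illustrated with a short sketch. It is not the seqMeta implementation used in the study. The nearest-rank percentile rule, the `None` marker for values below the detection limit, the choice to impute with the raw (pre-winsorization) minimum, and the unweighted sum of minor-allele counts are assumptions made for illustration.

```python
import math

def preprocess(levels):
    """Winsorize at the 1st/99th percentiles, impute, and natural-log transform.

    levels: positive floats, with None for values below the detection limit.
    """
    detected = sorted(v for v in levels if v is not None)
    n = len(detected)
    lo = detected[(n + 99) // 100 - 1]        # nearest-rank 1st percentile
    hi = detected[(99 * n + 99) // 100 - 1]   # nearest-rank 99th percentile
    floor = detected[0]                       # lowest detected value
    out = []
    for v in levels:
        if v is None:
            out.append(math.log(floor))       # imputed value
        else:
            out.append(math.log(min(max(v, lo), hi)))
    return out

def window_starts(position, length=4000, step=2000):
    """Start coordinates of every window [start, start + length) containing position."""
    last = (position // step) * step
    first = max(0, last - length + step)
    return list(range(first, last + 1, step))

def t5_burden_scores(genotypes, mafs, maf_max=0.05):
    """Collapse variants with MAF <= maf_max into one score per individual.

    genotypes: one list per variant, holding minor-allele counts (0, 1, or 2)
               for each individual.
    """
    kept = [g for g, maf in zip(genotypes, mafs) if maf <= maf_max]
    n_individuals = len(genotypes[0])
    return [sum(g[i] for g in kept) for i in range(n_individuals)]
```

With 4-kb windows and a 2-kb skip, a variant at position 5000 falls in the two windows starting at 2000 and 4000. The resulting burden score would then be tested against the log-transformed amino acid level with age, sex, PCs, and eGFR as covariates.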
For each approach, the variance explained (VarExp) was calculated using the effect allele frequency (p) and beta (β) from the analyses and the variance of the quantitative trait (σ²) using the formula VarExp = (β²/σ²) × 2 × p × (1 − p) [43]. In addition, we applied the CADD scores [11] as variant weights to the regulatory motifs. The weights were defined as the difference between raw CADD scores and the minimum CADD score scaled by the range of the raw CADD scores and were introduced into the T5 burden test using its quartic form. The analytical models were the same as described above. All analyses were carried out using the R seqMeta package [44]. The significance threshold for the gene-based analysis is defined as P < 4.6 × 10⁻⁸ for the discovery stage adjusting for 15,589 genes and 70 amino acids and P < 0.003 for the replication stage adjusting for 15 significant gene–amino acid pairs identified in the discovery stage. The significance threshold for the regulatory motifs analysis is defined as P < 3.4 × 10⁻⁸ for the discovery stage adjusting for 21,040 genes and 70 amino acids. The significance threshold for the sliding window approach is defined as P < 1.1 × 10⁻⁹ for the discovery stage adjusting for 668,748 non-overlapping windows and 70 amino acids and P < 0.01 for the replication stage adjusting for five significant window–amino acid pairs identified in the discovery stage. The significance threshold for the single variant analysis is defined as P < 7.1 × 10⁻¹⁰ for the discovery stage adjusting for one million independent common variants [45] and 70 amino acids and P < 0.003 for the replication stage adjusting for 16 significant single variant–amino acid pairs identified in the discovery stage. We consider an association novel if it has not been reported in previous GWAS or candidate gene studies. We also performed trans-ethnic meta-analysis among the discovery and replication samples to provide additional insight into the genetic loci discovery.
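The variance-explained formula and the multiple-testing thresholds in this paragraph are simple arithmetic, sketched below. The text does not state the family-wise error rate. A value of 0.05 is assumed here because it reproduces the reported thresholds. The function names are illustrative.

```python
def variance_explained(beta, p, trait_variance):
    """VarExp = (beta^2 / sigma^2) * 2 * p * (1 - p)."""
    return beta ** 2 / trait_variance * 2 * p * (1 - p)

def bonferroni_threshold(*test_counts, alpha=0.05):
    """Bonferroni threshold: alpha divided by the product of the test counts.

    alpha=0.05 is an assumption, not a value stated in the text.
    """
    n_tests = 1
    for count in test_counts:
        n_tests *= count
    return alpha / n_tests

# Discovery-stage thresholds from the counts given in the text.
gene_based = bonferroni_threshold(15_589, 70)         # about 4.6e-8
regulatory = bonferroni_threshold(21_040, 70)         # about 3.4e-8
sliding_window = bonferroni_threshold(668_748, 70)    # about 1.1e-9
single_variant = bonferroni_threshold(1_000_000, 70)  # about 7.1e-10
```

As a worked example, an allele with frequency 0.1 and an effect of 0.5 on a trait with unit variance gives 0.25 × 2 × 0.1 × 0.9 = 0.045, that is, 4.5% of the variance.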
Regions associated with amino acid levels using the gene-based or sliding window approaches that have already been identified by previous GWAS were selected for inclusion in the conditional analyses. We reexamined each of the selected associations, additionally adjusting the region-based association for the lead common variant identified by the GWAS, and vice versa. To adjust the GWAS variants for the identified regions, we computed the T5 burden and used it as a covariate. We also performed a conditional analysis for our single variant findings when these overlapped with regions identified by GWAS, adjusting our lead single variant for the lead variant identified by GWAS and vice versa.
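The conditional analyses described above amount to refitting an association with one extra covariate. The toy sketch below shows the idea with ordinary least squares. It is not the seqMeta procedure, it omits the age, sex, PC, and eGFR covariates, and the data are invented so that the trait depends only on the GWAS lead variant.

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Invented example: the trait depends only on the GWAS lead variant,
# and the burden score is correlated with that variant.
lead_snp = [0, 1, 2, 0, 1, 2, 0, 1]
burden = [0, 1, 1, 0, 0, 1, 0, 1]
trait = [1 + 2 * g for g in lead_snp]

marginal = ols([burden], trait)               # burden effect without adjustment
conditional = ols([burden, lead_snp], trait)  # burden effect adjusted for lead SNP
```

In this constructed example the marginal burden effect is 2.5, while the burden effect conditional on the lead variant is essentially zero. That is the pattern expected when a region-based signal is driven by a known common variant. The reverse adjustment, which conditions the lead variant on the burden score, uses the same call with the roles swapped.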

44 in total

Review 1. Renal metabolism of amino acids: its role in interorgan amino acid exchange.

Authors: Marcel C G van de Poll; Peter B Soeters; Nicolaas E P Deutz; Kenneth C H Fearon; Cornelis H C Dejong
Journal: Am J Clin Nutr Date: 2004-02 Impact factor: 7.045

2. Rapid evaluation of phenotypes, SNPs and results through the dbGaP CHARGE Summary Results site.

Authors: Stephen S Rich; Zeng Y Wang; Anne Sturcke; Lora Ziyabari; Mike Feolo; Christopher J O'Donnell; Ken Rice; Joshua C Bis; Bruce M Psaty
Journal: Nat Genet Date: 2016-06-28 Impact factor: 38.330

Review 3. Mammalian polyamine metabolism and function.

Authors: Anthony E Pegg
Journal: IUBMB Life Date: 2009-09 Impact factor: 3.885

Review 4. Kynurenine pathway inhibition as a therapeutic strategy for neuroprotection.

Authors: Trevor W Stone; Caroline M Forrest; L Gail Darlington
Journal: FEBS J Date: 2012-03-27 Impact factor: 5.542

5. Kynurenine pathway metabolites in humans: disease and healthy States.

Authors: Yiquan Chen; Gilles J Guillemin
Journal: Int J Tryptophan Res Date: 2009-01-08

6. Association of Rare Loss-Of-Function Alleles in HAL, Serum Histidine: Levels and Incident Coronary Heart Disease.

Authors: Bing Yu; Alexander H Li; Donna Muzny; Narayanan Veeraraghavan; Paul S de Vries; Joshua C Bis; Solomon K Musani; Danny Alexander; Alanna C Morrison; Oscar H Franco; André Uitterlinden; Albert Hofman; Abbas Dehghan; James G Wilson; Bruce M Psaty; Richard Gibbs; Peng Wei; Eric Boerwinkle
Journal: Circ Cardiovasc Genet Date: 2015-01-08

7. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations.

Authors: Xiaoming Liu; Xueqiu Jian; Eric Boerwinkle
Journal: Hum Mutat Date: 2013-07-10 Impact factor: 4.878

8. A promoter-level mammalian expression atlas.

Authors: Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

9. Genome-wide meta-analysis of homocysteine and methionine metabolism identifies five one carbon metabolism loci and a novel association of ALDH1L1 with ischemic stroke.

Authors: Stephen R Williams; Qiong Yang; Fang Chen; Xuan Liu; Keith L Keene; Paul Jacques; Wei-Min Chen; Galit Weinstein; Fang-Chi Hsu; Alexa Beiser; Liewei Wang; Ebony Bookman; Kimberly F Doheny; Philip A Wolf; Michelle Zilka; Jacob Selhub; Sarah Nelson; Stephanie M Gogarten; Bradford B Worrall; Sudha Seshadri; Michèle M Sale
Journal: PLoS Genet Date: 2014-03-20 Impact factor: 5.917

10. The UK10K project identifies rare variants in health and disease.

Authors: Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal: Nature Date: 2015-09-14 Impact factor: 49.962
