George Nicholson, Mattias Rantalainen, Anthony D Maher, Jia V Li, Daniel Malmodin, Kourosh R Ahmadi, Johan H Faber, Ingileif B Hallgrímsdóttir, Amy Barrett, Henrik Toft, Maria Krestyaninova, Juris Viksna, Sudeshna Guha Neogi, Marc-Emmanuel Dumas, Ugis Sarkans, Bernard W Silverman, Peter Donnelly, Jeremy K Nicholson, Maxine Allen, Krina T Zondervan, John C Lindon, Tim D Spector, Mark I McCarthy, Elaine Holmes, Dorrit Baunsgaard, Chris C Holmes.
Abstract
¹H Nuclear Magnetic Resonance spectroscopy (¹H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired ¹H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in ¹H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect ¹H NMR-based biomarkers quantifying predisposition to disease.
Year: 2011 PMID: 21878913 PMCID: PMC3202796 DOI: 10.1038/msb.2011.57
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Percentage decomposition of biological population variation—summary of results
| Component | Plasma standard 1D (87 peaks) | Plasma spin-echo (87 peaks) | Plasma diffusion-edited (24 peaks) | Plasma all (198 peaks) | Urine standard 1D (328 peaks) |
|---|---|---|---|---|---|
| (A) Familiality | 38ᵃ (28–48)ᵇ | 43 (33–56) | 49 (45–56) | 42 (32–52) | 30 (17–39) |
| (B) Individual environment | 17 (9–22) | 20 (10–26) | 22 (14–25) | 19 (10–25) | 18 (9–25) |
| (C) Individual visit | 35 (24–47) | 27 (14–39) | 20 (12–28) | 30 (17–39) | 45 (34–55) |
| (D) Common visit | 10 (4–15) | 10 (4–13) | 9 (5–13) | 10 (4–14) | 8 (4–10) |
| (A+B) Stable total | 55 (42–69) | 63 (54–73) | 71 (63–79) | 60 (51–72) | 47 (35–60) |
| (C+D) Unstable total | 45 (31–58) | 37 (27–46) | 29 (21–37) | 40 (28–49) | 53 (40–65) |

ᵃ Mean of estimates, across peaks.
ᵇ Interquartile range of estimates, across peaks.
Figure 1. Decomposition of biological variance for each annotated metabolite. The plot displays estimates (and measures of precision) for the proportion of biological variance explained by each of four components (familial, individual environmental, individual visit, and common visit). The central tick within each box marks the posterior mean, the box extends to the posterior quartiles, and the whiskers extend to the 2.5 and 97.5 posterior percentiles. Metabolites are ordered by estimated familiality.
Figure 2. Sample size calculations for ¹H NMR-based MWASs. A hypothetical study was designed to detect an association between a metabolic phenotype, x, and a disease phenotype, y, when the biological processes linking x and y explain a proportion p of population variation in x, and a proportion q of population variation in y (so the correlation between x and y is ρ = √(pq)). Calculations were based on the study attaining 80% power to reject H₀: ρ = 0 at a 10⁻⁴ level of significance (a Bonferroni-corrected significance level of 0.05, assuming that 500 metabolite peaks were tested for disease association). (A) Sample size as a function of p and q. Darker grey represents a larger required sample size (the colour scale is indicated by labelled contour lines on the plot). (B) Sample sizes for the discovery of ¹H NMR-based urine biomarkers. Bottom panel (annotated ‘'): Probability distributions on the magnitude of the underlying correlation (on logarithmic scale) between urinary metabolite concentration, x, and the disease phenotype, y. The probability distribution on p (not shown) was constructed using (for upper bounds) the current paper's estimates of the stable proportion of variation for peaks in the urine data (details are in Results). The proportion of disease risk explained, q, was fixed at four different values (annotated on plot). Main panel (annotated with four different values for q): Relationship between the underlying (x, y) correlation and the sample size required for effect detection (both on logarithmic scale). Left panel (annotated 'Sample Size for 80% Power'): Probability distribution on sample size (on logarithmic scale) required for effect detection, mapped from the correlation distributions in the bottom panel.
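To make the scale of these sample sizes concrete, the following is a minimal sketch of a power calculation of this kind. It is not the paper's own method: it assumes the correlation implied by p and q is ρ = √(pq), a two-sided test of H₀: ρ = 0, and the standard Fisher z-transform approximation. The function name `required_sample_size` and its defaults are hypothetical, chosen to mirror the 10⁻⁴ significance level and 80% power stated in the caption.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def required_sample_size(p: float, q: float,
                         alpha: float = 1e-4, power: float = 0.80) -> int:
    """Approximate n needed to reject H0: rho = 0 (two-sided test).

    Assumptions (not taken from the paper): rho = sqrt(p * q), and the
    Fisher z-transform of the sample correlation is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    if not (0.0 < p * q < 1.0):
        raise ValueError("p * q must lie strictly between 0 and 1")
    rho = sqrt(p * q)                              # implied (x, y) correlation
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(((z_alpha + z_power) / atanh(rho)) ** 2 + 3)


if __name__ == "__main__":
    # Illustrative inputs only: p set to the mean stable proportion for urine
    # (47%), q set to hypothetical proportions of disease risk explained.
    for q in (0.01, 0.02, 0.05):
        print(q, required_sample_size(p=0.47, q=q))
```

Under these assumptions, a smaller q (or a smaller stable proportion p) lowers the implied correlation and raises the required sample size roughly in proportion to 1/(pq).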
Variance parameters—textual description and mathematical notation
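| Parameter | Notation |
|---|---|
| Familial variance | $\sigma_{F}^{2}$ |
| Individual-environment variance | $\sigma_{IE}^{2}$ |
| Common-visit variance | $\sigma_{CV}^{2}$ |
| Individual-visit variance | $\sigma_{IV}^{2}$ |
| Non-biological (residual) variance | $\sigma_{\varepsilon}^{2}$ |
| Total phenotypic variance | $\sigma_{P}^{2} = \sigma_{B}^{2} + \sigma_{\varepsilon}^{2}$ |
| Total biological variance | $\sigma_{B}^{2} = \sigma_{F}^{2} + \sigma_{IE}^{2} + \sigma_{CV}^{2} + \sigma_{IV}^{2}$ |
| Non-biological proportion of total phenotypic variance | $\sigma_{\varepsilon}^{2} / \sigma_{P}^{2}$ |
| Familiality (familial proportion of biological variance) | $\sigma_{F}^{2} / \sigma_{B}^{2}$ |
| Stable proportion of biological variance | $(\sigma_{F}^{2} + \sigma_{IE}^{2}) / \sigma_{B}^{2}$ |
| Unstable proportion of biological variance | $(\sigma_{CV}^{2} + \sigma_{IV}^{2}) / \sigma_{B}^{2}$ |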
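The derived proportions described in this table can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the class `VarianceComponents`, the function `summarise`, and the numeric inputs are all hypothetical, and the definitions follow the textual descriptions (stable = familial + individual environment; unstable = common visit + individual visit; biological = phenotypic minus residual).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for one metabolite peak (hypothetical container)."""
    familial: float
    individual_environment: float
    common_visit: float
    individual_visit: float
    residual: float  # non-biological variance

    @property
    def biological(self) -> float:
        # Total biological variance: the four biological components summed.
        return (self.familial + self.individual_environment
                + self.common_visit + self.individual_visit)

    @property
    def phenotypic(self) -> float:
        # Total phenotypic variance: biological plus residual.
        return self.biological + self.residual


def summarise(vc: VarianceComponents) -> dict[str, float]:
    """Return the derived proportions named in the table."""
    b = vc.biological
    return {
        "non_biological": vc.residual / vc.phenotypic,
        "familiality": vc.familial / b,
        "stable": (vc.familial + vc.individual_environment) / b,
        "unstable": (vc.common_visit + vc.individual_visit) / b,
    }


if __name__ == "__main__":
    # Made-up inputs for illustration; they are not estimates from the study.
    example = VarianceComponents(familial=0.42, individual_environment=0.18,
                                 common_visit=0.10, individual_visit=0.30,
                                 residual=0.25)
    for name, value in summarise(example).items():
        print(f"{name}: {value:.2f}")
```

By construction the stable and unstable proportions sum to one, which matches the (A+B) and (C+D) rows of the summary table adding to 100%.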