Nikolina Pleić1, Mirjana Babić Leko1, Ivana Gunjača1, Thibaud Boutin2, Vesela Torlak3, Antonela Matana1, Ante Punda3, Ozren Polašek4, Caroline Hayward2, Tatijana Zemunik1.
Abstract
Thyroglobulin (Tg) is an iodoglycoprotein produced by thyroid follicular cells that acts as an essential substrate for thyroid hormone synthesis. To date, only one genome-wide association study (GWAS) of plasma Tg levels has been performed, by our research group. Utilizing recent advancements in computation and modeling, we apply a Bayesian approach to the probabilistic inference of the genetic architecture of Tg. We fitted a Bayesian sparse linear mixed model (BSLMM) and a frequentist linear mixed model (LMM) of 7,289,083 variants in 1096 healthy European-ancestry participants of the Croatian Biobank. Meta-analysis with two independent cohorts (total n = 2109) identified 83 genome-wide significant single nucleotide polymorphisms (SNPs) within the ST6GAL1 gene (p < 5 × 10⁻⁸). BSLMM revealed additional association signals on chromosomes 1, 8, 10, and 14. For ST6GAL1 and the newly uncovered genes, we provide physiological and pathophysiological explanations of how their expression could be associated with variations in plasma Tg levels. We found that the SNP-heritability of Tg is 17% and that 52% of this variation is due to a small set of 16 variants that have a major effect on Tg levels. Our results suggest that the genetic architecture of plasma Tg is not polygenic, but influenced by a few genes with major effects.
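The two percentages in the abstract can be combined into shares of total Tg variance. The following is a minimal sketch of that arithmetic, assuming that "52% of this variation" refers to 52% of the SNP-heritability; the variable names are illustrative and do not come from the study.

```python
# Values quoted in the abstract; variable names are illustrative assumptions.
snp_heritability = 0.17    # share of Tg variance explained by all SNPs
major_effect_share = 0.52  # share of that genetic variance attributed to 16 variants

# Share of total Tg variance attributable to the major-effect variants.
major_effect_variance = snp_heritability * major_effect_share

# Share of total Tg variance left to the remaining small-effect SNPs.
remaining_genetic_variance = snp_heritability - major_effect_variance

print(f"major-effect variants: {major_effect_variance:.1%} of Tg variance")
print(f"remaining SNP effects: {remaining_genetic_variance:.1%} of Tg variance")
```

Under this reading, the 16 major-effect variants would account for roughly 9% of the total variance in plasma Tg, with a similar share spread across the remaining SNPs.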
Keywords: BSLMM; LMM; ST6GAL1; genome-wide association study; thyroglobulin; thyroid
Year: 2022 PMID: 35216288 PMCID: PMC8876738 DOI: 10.3390/ijms23042173
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
SNPs passing the genome-wide significance threshold (p < 5 × 10⁻⁸) in the single-SNP LMM analysis and their corresponding PIPs from the multi-SNP BSLMM analysis of cohorts Korcula2 and Korcula3.
| SNP | Chr | Position | Gene | Ref. Allele | Effect Allele | EAF | Single-SNP LMM Analysis in Cohorts Korcula2 and Korcula3 | Multi-SNP BSLMM Analysis in Cohorts Korcula2 and Korcula3 |
|---|---|---|---|---|---|---|---|---|
| rs10937280 | 3 | 186738033 | | G | A | 0.35 | −0.31 ( | −0.29 (0.21) |
| rs5001409 | 3 | 186735690 | | A | C | 0.35 | −0.31 ( | −0.295 (0.07) |
| rs9863411 | 3 | 186737820 | | C | T | 0.35 | −0.31 ( | −0.283 (0.2) |
| rs7634389 | 3 | 186738421 | | T | C | 0.35 | −0.31 ( | −0.292 (0.08) |
| rs967367 | 3 | 186734466 | | G | A | 0.35 | −0.31 ( | −0.29 (0.12) |
| rs3821819 | 3 | 186732725 | | G | A | 0.35 | −0.31 ( | −0.292 (0.06) |
| rs4686838 | 3 | 186743053 | | A | G | 0.45 | −0.3 ( | −0.27 (0.08) |
| rs10212190 | 3 | 186731157 | | A | T | 0.34 | −0.29 ( | −0.28 (0.003) |
| rs4012172 | 3 | 186741511 | | C | T | 0.36 | −0.29 ( | −0.27 (0.0003) |
| rs3872724 | 3 | 186741221 | | C | T | 0.37 | −0.28 ( | −0.27 (0.001) |
| rs3872723 | 3 | 186741131 | | C | T | 0.36 | −0.28 ( | 0 (0) |
| rs28674898 | 3 | 186744563 | | G | A | 0.39 | 0.28 ( | −0.28 (0.003) |
| rs4686844 | 3 | 186765135 | | G | A | 0.56 | −0.25 ( | −0.15 (0.0007) |
| rs78946539 | 1 | 13921500 | | A | G | 0.04 | −0.63 ( | −0.51 (0.03) |
| rs143154928 | 1 | 13921447 | | G | A | 0.04 | −0.63 ( | −0.5 (0.03) |
| rs12566684 | 1 | 13922117 | | A | G | 0.04 | −0.64 ( | −0.5 (0.02) |
| rs257104 | 3 | 186775807 | | G | A | 0.4 | 0.24 ( | 0.17 (0.002) |
Statistical analyses were performed with GEMMA LMM and BSLMM. p-values < 5 × 10⁻⁸ are genome-wide significant. SNPs are sorted by ascending LMM p-value. BSLMM, Bayesian sparse linear mixed model; Chr, chromosome; EAF, effect allele frequency; LMM, linear mixed model; PIP, posterior inclusion probability; SNP, single nucleotide polymorphism.
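One way to read the PIP column is to sum it per locus: by linearity of expectation, the sum of posterior inclusion probabilities over a set of SNPs is the posterior expected number of those SNPs carrying a sparse (major) effect. The sketch below does this for the PIPs listed in the table above. It is an illustration and not part of the authors' analysis; it covers only the listed SNPs, not every variant in each region, and the function name is hypothetical.

```python
from collections import defaultdict

# (SNP, chromosome, PIP) copied from the table above.
table_pips = [
    ("rs10937280", 3, 0.21), ("rs5001409", 3, 0.07), ("rs9863411", 3, 0.2),
    ("rs7634389", 3, 0.08), ("rs967367", 3, 0.12), ("rs3821819", 3, 0.06),
    ("rs4686838", 3, 0.08), ("rs10212190", 3, 0.003), ("rs4012172", 3, 0.0003),
    ("rs3872724", 3, 0.001), ("rs3872723", 3, 0.0), ("rs28674898", 3, 0.003),
    ("rs4686844", 3, 0.0007), ("rs78946539", 1, 0.03), ("rs143154928", 1, 0.03),
    ("rs12566684", 1, 0.02), ("rs257104", 3, 0.002),
]


def summed_pip_per_chromosome(rows):
    """Sum PIPs per chromosome (hypothetical helper).

    The sum is the posterior expected number of the listed SNPs
    that are included in the sparse component of the model.
    """
    totals = defaultdict(float)
    for _snp, chrom, pip in rows:
        totals[chrom] += pip
    return dict(totals)


totals = summed_pip_per_chromosome(table_pips)
for chrom, total in sorted(totals.items()):
    print(f"chr{chrom}: summed PIP = {total:.2f}")
```

Individual PIPs on chromosome 3 are modest because the model spreads inclusion probability across SNPs with near-identical effect estimates and allele frequencies, so the per-locus sum is often more informative than any single SNP's PIP.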
Figure 1. Manhattan plots of single-SNP and multi-SNP association mapping in cohorts Korcula2 and Korcula3. (A) Manhattan plot of single-SNP LMM analysis. The x axis represents the chromosomal position of SNPs and the y axis represents their −log₁₀(p-values) obtained by the LMM analysis. Each dot on the Manhattan plot signifies an SNP. Because the strongest associations have the smallest p-values (e.g., 10⁻¹²), their negative logarithms will be the greatest (e.g., 12). The red horizontal line indicates the genome-wide significance threshold (5 × 10⁻⁸), while the blue horizontal line indicates the suggestive threshold of significance. (B) Manhattan plot of multi-SNP BSLMM analysis. The x axis represents the chromosomal position of SNPs, and the y axis represents their posterior inclusion probabilities (PIPs) obtained by the BSLMM analysis.
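The y-axis transform in panel (A) is the negative base-10 logarithm of the p-value. Below is a minimal sketch of that transform and of the genome-wide significance check, using the 5 × 10⁻⁸ level stated in the abstract. The function names and the example p-values are hypothetical and are not results from the study.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # genome-wide significance level stated in the abstract


def neg_log10(p_value):
    """Map a p-value to its Manhattan-plot height, -log10(p)."""
    return -math.log10(p_value)


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True when the p-value is strictly below the genome-wide threshold."""
    return p_value < threshold


# Hypothetical p-values chosen only to illustrate the transform.
for p in (1e-12, 5e-8, 1e-3):
    height = neg_log10(p)
    print(f"p = {p:g}: -log10(p) = {height:.2f}, "
          f"significant = {is_genome_wide_significant(p)}")
```

On this scale the genome-wide threshold sits at a height of about 7.3, so any dot above that line corresponds to p < 5 × 10⁻⁸.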
Figure 2. Manhattan plot and quantile–quantile (Q-Q) plot of the meta-analysis results for thyroglobulin (Tg) levels. (A) Manhattan plot of single nucleotide polymorphisms (SNPs) for Tg levels. The x axis represents the chromosomal position of SNPs and the y axis represents their −log₁₀(p-values) obtained by combined analysis. Each dot on the Manhattan plot signifies an SNP. The red horizontal line indicates the genome-wide significance threshold (p = 5 × 10⁻⁸), while the blue horizontal line indicates the suggestive threshold of significance. (B) The Q-Q plot shows a strong deviation from the null distribution; the red line indicates the distribution of p-values expected under the null hypothesis of no true association.
Figure 3. Regional association plot of the ST6GAL1 region. The most significant SNP (rs5001409) is shown in purple. The colors of the circles denote their correlations (LD r²) with the top SNP: the lead SNP is in purple, SNPs in high LD with it are in red, and orange, green, light blue, and dark blue mark progressively lower LD. The figure was generated using the LocusZoom tool [20].
Figure 4. Colocalization analysis of thyroglobulin GWAS signals with eQTL signals of the ST6GAL1 gene in thyroid tissue. Filled circles represent thyroglobulin GWAS −log₁₀(p-values) (left y axis). The rs5001409 SNP was defined as the lead SNP and is presented in purple. LD is color-coded as in LocusZoom and was computed from the European 1000 Genomes subset (phase 1, version 3) [21] in reference to the lead SNP. The gray line represents the eQTL signals and traces the lowest p-value (right y axis, showing −log₁₀(p-values)). Gene track information is from GENCODE v19 (hg19 coordinates). The figure was generated using the LocusFocus tool [22].
Characteristics of the study population.
| Cohort | Split | Korcula 1 | Korcula 2 | Korcula 3 |
|---|---|---|---|---|
| n | 605 | 489 | 593 | 505 |
| Women | 321 (53%) | 297 (61%) | 328 (55.3%) | 294 (58.2%) |
| Age | 51 (39, 61) | 56 (46, 67) | 54 (40, 65) | 54 (39, 65) |
| Tg | 9.20 (4.80, 14.50) | 10.20 (6.40, 15.70) | 10.1 (5.6, 16.4) | 10.6 (7.5, 16.1) |
Values in the table represent median (interquartile range) or n (%). n, number of participants; Tg, thyroglobulin.