Carlo Sidore1,2,3, Fabio Busonero1,2,4, Andrea Maschio1,2,4, Eleonora Porcu1,2,3, Silvia Naitza1, Magdalena Zoledziewska1, Antonella Mulas1,3, Giorgio Pistis1,2,3, Maristella Steri1, Fabrice Danjou1, Alan Kwong2, Vicente Diego Ortega Del Vecchyo5, Charleston W K Chiang6, Jennifer Bragg-Gresham2, Maristella Pitzalis1, Ramaiah Nagaraja7, Brendan Tarrier4, Christine Brennan4, Sergio Uzzau8, Christian Fuchsberger2, Rossano Atzeni9, Frederic Reinier9, Riccardo Berutti3,9, Jie Huang10, Nicholas J Timpson11, Daniela Toniolo12, Paolo Gasparini13,14, Giovanni Malerba15, George Dedoussis16, Eleftheria Zeggini10, Nicole Soranzo10,17, Chris Jones9, Robert Lyons4, Andrea Angius1,9, Hyun M Kang2, John Novembre18, Serena Sanna1, David Schlessinger7, Francesco Cucca1,3, Gonçalo R Abecasis2.
Abstract
We report ∼17.6 million genetic variants from whole-genome sequencing of 2,120 Sardinians; 22% are absent from previous sequencing-based compilations and are enriched for predicted functional consequences. Furthermore, ∼76,000 variants common in our sample (frequency >5%) are rare elsewhere (<0.5% in the 1000 Genomes Project). We assessed the impact of these variants on circulating lipid levels and five inflammatory biomarkers. We observe 14 signals, including 2 major new loci, for lipid levels and 19 signals, including 2 new loci, for inflammatory markers. The new associations would have been missed in analyses based on 1000 Genomes Project data, underlining the advantages of large-scale sequencing in this founder population.
Year: 2015 PMID: 26366554 PMCID: PMC4627508 DOI: 10.1038/ng.3368
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Summary of Discovered Variants
The table provides an overview of the sequencing data, including summary statistics on the data generated, a breakdown of all discovered variants by frequency and biological function, and their novelty rate relative to public databases. Finally, we show the distribution of variants discovered per sequenced individual.
| Data Generation | | | | | | | |
|---|---|---|---|---|---|---|---|
| Total Mapped Bases | 22,684 Gb | | | | | | |
| Average Depth | 4.16× | | | | | | |
| Coding Variation | | | | | | | |
| | Genome | Regulatory | Silent | Splice | Essential Splice | Missense | Nonsense |
| No. of Variants | 17.6M | 1,596,737 | 63,062 | 21,097 | 2,504 | 84,312 | 2,013 |
| Novelty rate vs dbSNP 135 | 31.6% | 31.7% | 24.0% | 31.8% | 36.2% | 34.8% | 48.7% |
| Novelty rate vs dbSNP 142 | 21.7% | 21.6% | 15.2% | 19.1% | 26.5% | 22.6% | 34.8% |
| Novelty rate vs dbSNP 142 and Exome Aggregation | 21.6% | 21.5% | 7.0% | 14.2% | 21.8% | 11.8% | 21.6% |
| | | | | | | | |
| Common (MAF > 5%) | 31.8% | 31.2% | 29.1% | 28.7% | 26.8% | 20.7% | 14.5% |
| Low Frequency (MAF 0.5-5%) | 19.8% | 21.2% | 21.5% | 20.7% | 20.1% | 19.8% | 15.8% |
| Rare (MAF < 0.5%) | 47.7% | 47.5% | 49.4% | 50.6% | 53.2% | 59.5% | 69.7% |
| Singletons | 9.0% | 8.8% | 9.2% | 9.6% | 9.8% | 12.3% | 17.9% |
| | | | | | | | |
| 5th Percentile | 3,332,299 | 293,928 | 10,619 | 3,331 | 361 | 10,738 | 158 |
| Average | 3,359,655 | 293,928 | 10,778 | 3,396 | 380 | 10,920 | 172 |
| 95th Percentile | 3,383,736 | 298,766 | 10,934 | 3,465 | 400 | 11,100 | 186 |
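The frequency classes used in the table (common, MAF > 5%; low frequency, MAF 0.5-5%; rare, MAF < 0.5%) and the abstract's criterion for variants that are common in Sardinia but rare elsewhere can be written as a short sketch. The function names and the treatment of values falling exactly on a class boundary are illustrative assumptions, not part of the study's pipeline.

```python
def maf(allele_freq: float) -> float:
    """Minor allele frequency, given the frequency of either allele."""
    return min(allele_freq, 1.0 - allele_freq)


def frequency_class(allele_freq: float) -> str:
    """Assign the table's frequency class to a variant.

    Assumption: boundary values (exactly 5% or 0.5%) are placed in the
    low-frequency class; the source does not say how ties are handled.
    """
    m = maf(allele_freq)
    if m > 0.05:
        return "common"
    if m >= 0.005:
        return "low frequency"
    return "rare"


def is_common_here_rare_elsewhere(sardinia_freq: float, reference_freq: float) -> bool:
    """Abstract's criterion: frequency > 5% in the Sardinian sample and
    < 0.5% in the reference panel (1000 Genomes Project in the paper)."""
    return maf(sardinia_freq) > 0.05 and maf(reference_freq) < 0.005


if __name__ == "__main__":
    for f in (0.18, 0.038, 0.001):
        print(f, frequency_class(f))
    print(is_common_here_rare_elsewhere(0.10, 0.001))
```

Working with the minor allele frequency rather than the raw allele frequency keeps the classification symmetric, so a variant whose effect allele has frequency 0.987 is treated the same as one with frequency 0.013.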
Figure 1. Geographical differentiation based on common and rare sites
The figure shows allele sharing between the Sardinian and the 1000 Genomes European populations. In panel a), differentiation is represented for three different frequency intervals over the geographic map of Europe. The thickness and the color of the lines connecting the dots are proportional to the allele sharing statistic, as indicated in the color map. In panel b), we instead represent the relationship between the frequency (X axis) and the sharing ratio (Y axis) for different 1000 Genomes Project populations (continuous lines). Results are plotted separately for the Lanusei valley sample (left panel) and the case-control samples (right panel). The dotted lines are used as a comparison to show the sharing ratio between the TSI and other 1000 Genomes Project populations.
Figure 2. Length of shared haplotypes surrounding f2 variants within Sardinians and populations in 1000 Genomes
Length of shared haplotypes surrounding f2 variants shared between one of our sequenced individuals and one of 100 randomly selected individuals sampled from our study or from a particular 1000 Genomes Project population. Panel a) shows the length of these shared haplotypes, in kilobases, in comparisons between Sardinia and several 1000 Genomes Project populations. Panel b) shows the number of f2 haplotypes in each comparison. Panel c) shows the number of f2 haplotypes in comparisons within Sardinia (note the wider Y-axis range).
Summary of Lipid Association Results
The table lists association signals that reach p < 5×10^−8 for lipid levels in our study. At each novel locus, we indicated the genes likely to be modulated by the lead SNP, the location of the lead variant (human genome build GRCh37), the variant identifier (rs#), the nearest gene, the effect and other alleles, the frequency of the effect allele, the effect size in standard deviation units and its standard error, the pvalue, the proportion of variance explained by the allele (R2%), the imputation accuracy (RSQR), the functional consequence of the variant and the r2 with hits previously identified in ([12]). When reporting a second signal within a locus, we first controlled for association with the local peak variant, as indicated by an asterisk (*, **) in the corresponding rows. Novel signals are shown in bold.
| Candidate Gene | Chr:position | rs name | Effect Allele / Other | Freq | Effect (StdErr) | pvalue | R2(%) | RSQR | Variant Consequence | r2 with previous hit |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | |
| | 1:55505647 | rs11591147 | T/G | 0.038 | −0.406 (0.053) | 1.73 × 10^−14 | 1.0 | Genotyped | Missense, R46L | Same SNP |
| | 1:109821307 | rs583104 | G/T | 0.180 | 0.156 (0.027) | 1.87 × 10^−08 | 0.5 | Genotyped | Downstream | 0.821 |
| | | | | | | | | | | - |
| | 19:19456917 | rs58489806 | T/C | 0.074 | −0.232 (0.042) | 2.58 × 10^−08 | 0.5 | Genotyped | Intronic | 0.858 |
| | 19:45412079 | rs7412 | T/C | 0.036 | −0.645 (0.053) | 2.47 × 10^−33 | 2.4 | Genotyped | Missense, R176C | Same SNP |
| | 19:45411941 | rs429358 | C/T | 0.074 | 0.264 (0.039) | 1.21 × 10^−11 | 0.8 | 0.999 | Missense, C130R | Same SNP |
| | | | | | | | | | | |
| | 1:55505647 | rs11591147 | T/G | 0.038 | −0.390 (0.053) | 1.69 × 10^−13 | 1.0 | Genotyped | Missense, R46L | Same SNP |
| | 4:41980435 | - | G/A | 0.013 | −0.520 (0.091) | 6.94 × 10^−9 | 0.6 | 0.91 | Intergenic | - |
| | | | | | | | | | | - |
| | 19:19456917 | rs58489806 | T/C | 0.074 | −0.260 (0.041) | 2.15 × 10^−10 | 0.7 | Genotyped | Intronic | 0.858 |
| | 19:45412079 | rs7412 | T/C | 0.036 | −0.544 (0.053) | 2.06 × 10^−24 | 1.7 | Genotyped | Missense, R176C | Same SNP |
| | 19:45411941 | rs429358 | C/T | 0.074 | −0.210 (0.038) | 2.18 × 10^−08 | 0.5 | 0.999 | Missense, C130R | Same SNP |
| | | | | | | | | | | |
| | 8:19815256 | rs286 | T/A | 0.125 | 0.257 (0.046) | 2.70 × 10^−08 | 1.2 | Genotyped | Intronic | 0.315 |
| | 15:58687603 | rs174418 | T/C | 0.467 | 0.136 (0.021) | 7.96 × 10^−11 | 0.7 | 0.999 | Intergenic | 0.485 |
| | 16:56989590 | rs247616 | T/C | 0.268 | 0.190 (0.023) | 2.37 × 10^−16 | 1.1 | Genotyped | Intergenic | 0.994 |
| | 18:3412386 | rs8092903 | T/C | 0.026 | −0.448 (0.082) | 4.49 × 10^−08 | 0.8 | 0.954 | Intronic | - |
| | | | | | | | | | | |
| | 8:19845376 | rs7841189 | T/C | 0.209 | −0.160 (0.026) | 8.36 × 10^−10 | 0.6 | Genotyped | Intergenic | Same SNP |
| | | - | | | | | | Genotyped | | - |
| | 11:116664040 | rs10750097 | G/A | 0.172 | 0.160 (0.027) | 4.64 × 10^−09 | 0.6 | Genotyped | Upstream | Same SNP |
| | 19:19456917 | rs58489806 | T/C | 0.074 | −0.260 (0.039) | 2.14 × 10^−11 | 0.8 | Genotyped | Intronic | 0.858 |
Association parameters reported for this marker refer to a model that includes rs7412 as additional covariate
Association parameters reported for this marker refer to a model that includes 11:116661101 as additional covariate
Results refer to the sex specific analyses. See Supplementary Table 7 for more details.
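As a rough consistency check on the columns of this table, the sketch below relates the reported effect size and standard error to a p-value, and the effect size and allele frequency to the variance explained. It assumes a two-sided Wald test under a normal approximation and an additive variant in Hardy-Weinberg equilibrium acting on a trait scaled to unit variance. These are textbook approximations, not the authors' association model, so the numbers will not reproduce the published columns exactly. The function names are hypothetical.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # significance threshold quoted in the table legend


def wald_p_value(effect: float, std_err: float) -> float:
    """Two-sided p-value for z = effect / std_err under a standard normal.

    For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    """
    z = abs(effect) / std_err
    return math.erfc(z / math.sqrt(2.0))


def variance_explained(freq: float, effect: float) -> float:
    """Approximate fraction of trait variance explained by one variant.

    Assumes an additive effect (in standard deviation units) and
    Hardy-Weinberg genotype frequencies: 2 * f * (1 - f) * beta^2.
    """
    return 2.0 * freq * (1.0 - freq) * effect ** 2


if __name__ == "__main__":
    # First row of the table: rs11591147, Freq 0.038, Effect -0.406 (StdErr 0.053)
    p = wald_p_value(-0.406, 0.053)
    r2 = variance_explained(0.038, -0.406)
    print(f"p = {p:.2e}; below threshold: {p < GENOME_WIDE_THRESHOLD}")
    print(f"approximate variance explained = {100 * r2:.1f}%")
```

For this row the approximations land close to, but not exactly on, the tabulated pvalue and R2(%), which is expected given that the published values come from the study's own model.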
Figure 3. Regional association plots for novel lipids loci
Regional association plots at the HBB locus for LDL-c, and at APOA5 for triglycerides, for imputation performed using the Sardinian (panels a and c) and 1000 Genomes (panels b and d) reference panels, respectively. At each locus, we plotted the association strength (Y axis shows the −log10 pvalue) versus the genomic positions (on the hg19/GRCh37 genomic build) around the most significant SNP, which is indicated with a purple dot. Other SNPs in the region are color-coded to reflect their LD with the top SNP, as in the inset (taken from pairwise r2 values calculated on Sardinian and 1000 Genomes haplotypes for left and right panels, respectively). Symbols reflect genomic functional annotation, as indicated in the inner box of panel a. Genes and the position of exons, as well as the direction of transcription, are noted in the lower boxes. This plot was drawn using the standalone version of the LocusZoom package ([50]).
Summary of Inflammatory Marker Association Results
The table shows the association results that reach p < 5×10^−8 for ADPN, hsCRP, ESR, MCP-1 and IL-6. At each locus, we indicated the genes likely to be modulated by the lead SNP. For each lead SNP, we also showed the rs ID when available, the effect allele and its frequency, the regression coefficients, the proportion of variance explained by the allele (R2%), the imputation accuracy (RSQR) for those that were imputed, the biological type of the corresponding nucleotide change, and the r2 with the hits previously reported in ([13]). Novel signals are shown in bold; independent signals are shown in italics.
| Candidate Gene | Chr:position | rs name | Effect Allele / Other | Freq | Effect (StdErr) | pvalue | R2(%) | RSQR | Variant Consequence | r2 with previous hit |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | |
| | 3:186559460 | rs17300539 | A/G | 0.156 | 0.247 (0.025) | 1.35×10^−22 | 1.6 | Genotyped | Intergenic | -- |
| | 13:108884835 | N/A | A/G | 0.001 | −1.519 (0.275) | 3.35×10^−08 | 0.5 | 0.921 | 3′UTR | -- |
| | | | | | | | | | | |
| | 1:159684665 | rs3091244 | A/G | 0.428 | 0.207 (0.019) | 5.28×10^−27 | 2.0 | Genotyped | Intergenic | 0.249 |
| | | | | | | | | | | -- |
| | | | | | | | 0.6 | | | |
| | | | | | | | | | | -- |
| | 19:45411941 | rs429358 | C/T | 0.073 | −0.237 (0.036) | 3.78×10^−11 | 0.7 | 1 | Missense, C130R | 0.565 |
| | | | | | | | | | | |
| | 1:207684359 | rs11117956 | T/G | 0.400 | −0.153 (0.018) | 9.43×10^−18 | 1.2 | Genotyped | Intronic | 0.989 |
| | 11:5248004 | rs11549407 | A/G | 0.048 | −0.437 (0.042) | 1.02×10^−25 | 1.8 | 0.918 | Stop gained, Q40X | 0.330 |
| | | | | | | | | | | -- |
| | | | | | | | | | | |
| | 1:159175354 | rs12075 | G/A | 0.446 | −0.405 (0.019) | 1.08×10^−96 | 7.2 | Genotyped | Missense, G44D | Same SNP |
| | | | | | | | | Genotyped | | -- |
| | 3:46383906 | rs113403743 | T/G | 0.099 | 0.273 (0.034) | 1.47×10^−15 | 1.1 | 0.997 | Intergenic | 0.988 |
| | | | | | | | | Genotyped | | -- |
| | 16:49072490 | rs76135610 | T/C | 0.005 | 0.969 (0.172) | 1.76×10^−08 | 0.9 | 0.915 | Intergenic | -- |
| | | | | | | | | | | |
| | 1:154428283 | rs12133641 | G/A | 0.255 | 0.118 (0.020) | 6.87×10^−09 | 0.6 | 1 | Intronic | 0.998 |
| | 9:136142355 | rs643434 | A/G | 0.263 | −0.221 (0.020) | 5.80×10^−27 | 2.0 | Genotyped | Intronic | 0.980 |
Notes:
Results refer to the conditional analyses after conditioning on rs183233091
Results refer to the conditional analyses after conditioning on rs11117956
Results refer to the conditional analyses after conditioning on rs12075
Results refer to the conditional analyses after conditioning on rs12075 and rs2852718
Results refer to the conditional analyses after conditioning on rs113403743
Results refer to the female-specific analysis (see Supplementary Table 8 for more details); these genes do not fulfil our specific criteria for being candidates, but they are the nearest to lead SNP in the region (N4BP1, 428.3 Kb; CBLN1, 239.3 Kb)
Figure 4. Regional association plot at chromosome 12 for hsCRP and ESR
Regional association plots at the chromosome 12 locus for hsCRP and for ESR, using the Sardinian (panels a and c) and 1000 Genomes (panels b and d) reference panels for imputation, respectively. For the plot style, see Figure 3 legend.
Rare variant tests
The table shows results for the rare variant association tests at genes passing the significance threshold for at least one of the two statistical tests (CMC and VT). Of note, no significant results were observed for LDL-c, hsCRP and IL-6. For each gene, we indicated the genomic location assessed for analyses (in hg19 genomic build), the number of available SNPs considered, the number of SNPs passing the test-specific criteria for inclusion, and the number and the fraction of individuals carrying a rare allele. For the CMC test, the effect size and its standard error, along with the pvalue and the phenotypic variance explained, are reported. For the VT test, the impact of rare variants on the phenotype (+ increase, − decrease), the pvalue and the phenotypic variance explained are reported. We also reported the pvalue observed after adjusting for the lead variant at the same or the nearby gene. Specifically, STAB1 was adjusted for rs7639267; CCR2 was adjusted for rs113403743 and rs200491743; IFI16 was adjusted for rs12075, rs2852718 and rs34599082; HBB and OR52H1 were adjusted for rs76728603; and PTPRH was adjusted for the best lead in the region (rs7253814). Genes that remain significant after adjustment are marked in bold.
| Gene | Chr:Start-end | #SNPs | #Pass | Burden Count | Fraction with rare | CMC Effect (StdErr) | CMC pvalue | CMC R2 | CMC Adjusted pvalue | VT Direction | VT pvalue | VT R2 | VT Adjusted pvalue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | |
| | 3:52535766-52558237 | 25 | 23 | 752 | 0.12886 | 0.245 (0.039) | 4.71×10^−10 | 0.007 | | + | 1.00×10^−07 | 0.007 | |
| | | | | | | | | | | | | | |
| | 3:46399158-46401290 | 4 | 3 | 105 | 0.01797 | 0.541 (0.104) | 1.84×10^−07 | 0.005 | 0.7092 | + | 1.00×10^−06 | 0.005 | 0.92 |
| | 1:158979950-159024668 | 10 | 8 | 567 | 0.09702 | 0.218 (0.046) | 2.50×10^−06 | 0.004 | 0.1564 | + | 1.40×10^−05 | 0.003788 | 0.115 |
| | | | | | | | | | | | | | |
| | 11:5247914-5248004 | 2 | 2 | 613 | 0.10318 | −0.345 (0.039) | 9.77×10^−19 | 0.013 | 0.015 | − | 1.00×10^−07 | 0.013 | 0.025 |
| | 11:5565906-5566751 | 5 | 3 | 529 | 0.08904 | −0.205 (0.042) | 1.23×10^−06 | 0.004 | 0.345 | − | 3.40×10^−06 | 0.004 | 0.69 |
| | 19:55693244-55716713 | 22 | 15 | 1152 | 0.19391 | −0.146 (0.029) | 8.31×10^−07 | 0.004 | | − | 1.18×10^−05 | 0.0041 | 1.90×10^−05 |
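The "Burden Count", "Fraction with rare" and effect columns can be illustrated with a simplified collapsing analysis, the idea that burden tests such as CMC build on. Each individual is reduced to a carrier indicator for any rare allele in the gene, and the phenotype is regressed on that indicator. This is a minimal sketch under stated assumptions, not the CMC or VT implementation used in the study. It ignores relatedness and covariates, the MAF cutoff is a hypothetical parameter, and all names and the toy data are illustrative.

```python
import math


def collapse_carriers(genotypes, mafs, maf_cutoff=0.005):
    """Return 1 for individuals carrying at least one rare allele, else 0.

    genotypes: one list per individual with allele counts (0, 1 or 2) per variant.
    mafs: minor allele frequency of each variant, in the same order.
    maf_cutoff: hypothetical inclusion threshold (an assumption, not from the paper).
    """
    rare = [j for j, m in enumerate(mafs) if m < maf_cutoff]
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]


def burden_regression(carrier, phenotype):
    """Ordinary least squares of phenotype on the carrier indicator.

    Returns (effect, standard error). Assumes unrelated individuals.
    """
    n = len(carrier)
    mean_x = sum(carrier) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in carrier)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carrier, phenotype))
    beta = sxy / sxx
    residuals = [(y - mean_y) - beta * (x - mean_x) for x, y in zip(carrier, phenotype)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return beta, math.sqrt(sigma2 / sxx)


if __name__ == "__main__":
    # Toy data: 6 individuals, 3 variants (the second variant is common and is skipped).
    genotypes = [[0, 1, 0], [1, 0, 0], [0, 2, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
    mafs = [0.002, 0.2, 0.004]
    phenotype = [0.0, 1.0, 1.2, -0.1, 0.1, 0.8]

    carrier = collapse_carriers(genotypes, mafs)
    burden_count = sum(carrier)
    fraction_with_rare = burden_count / len(carrier)
    effect, std_err = burden_regression(carrier, phenotype)
    print(burden_count, fraction_with_rare, effect, std_err)
```

With a binary carrier indicator, the regression slope is simply the difference in mean phenotype between carriers and non-carriers, which is why collapsing can gain power when several rare alleles in a gene act in the same direction.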