
Genome-wide association analyses based on whole-genome sequencing in Sardinia provide insights into regulation of hemoglobin levels.

Fabrice Danjou¹, Magdalena Zoledziewska¹, Carlo Sidore^1,2,3, Maristella Steri¹, Fabio Busonero^1,2,4, Andrea Maschio^1,2,4, Antonella Mulas^1,3, Lucia Perseu¹, Susanna Barella⁵, Eleonora Porcu^1,2,3, Giorgio Pistis^1,2,3, Maristella Pitzalis¹, Mauro Pala¹, Stephan Menzel⁶, Sarah Metrustry⁷, Timothy D Spector⁷, Lidia Leoni⁸, Andrea Angius^1,8, Manuela Uda¹, Paolo Moi^5,9, Swee Lay Thein^6,10, Renzo Galanello^5,9, Gonçalo R Abecasis², David Schlessinger¹¹, Serena Sanna¹, Francesco Cucca^1,3.

Abstract

We report genome-wide association study results for the levels of A1, A2 and fetal hemoglobins, analyzed for the first time concurrently. Integrating high-density array genotyping and whole-genome sequencing in a large general population cohort from Sardinia, we detected 23 associations at 10 loci. Five signals are due to variants at previously undetected loci: MPHOSPH9, PLTP-PCIF1, ZFPM1 (FOG1), NFIX and CCND3. Among the signals at known loci, ten are new lead variants and four are new independent signals. Half of all variants also showed pleiotropic associations with different hemoglobins, which further corroborated some of the detected associations and identified features of coordinated hemoglobin species production.

Year: 2015 PMID: 26366553 PMCID: PMC4627580 DOI: 10.1038/ng.3307

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

INTRODUCTION

The provision of oxygen to tissues depends on hemoglobin, requiring the coordinated expression of several globin chains that form functional tetramers. An index of the importance of hemoglobin function is the evolutionary duplication and divergence of regulation of globin gene copies to adapt to stages of development and buffer the effects of mutational loss. In particular, at birth, a switch occurs from fetal hemoglobin (HbF) toward hemoglobin A2 (HbA2) and hemoglobin A1 (HbA1), so that during adult life the hemoglobin forms comprise ~1 % HbF, ~3 % HbA2 and ~96 % HbA1. The different hemoglobins all contain α-globin chains, encoded by two eponymous genes on chromosome 16. Those aggregate with non-α-globin chains encoded, respectively, by the γ (for HbF), δ (for HbA2) and β-globin (for HbA1) genes in the “β-globin gene cluster” on chromosome 11 (Figure 1). The molecular switch between fetal and adult hemoglobin occurs via the binding of transcription factors to regulatory DNA sequences controlling the expression of globin genes. In particular, the various genes in the β-globin cluster are sequentially activated during ontogeny, so that time-specific expression patterns follow their genomic order[1].

Figure 1

Association at the globin clusters

Schematic representation of association results in the genomic context of the β-globin (panel a) and α-globin (panel b) gene clusters. For each hemoglobin, the associated markers are positioned with + or – corresponding to an increase or decrease in the corresponding trait by the effect allele (as in Table 1). The symbol is larger if the marker is associated at the genome-wide level and smaller if it results from the analysis of pleiotropic effects. The β039 mutation and –α 3.7 type I deletion as well as relevant genes and the locus control region hypersensitivity sites (HS) are indicated. The bottom of each panel shows the linkage disequilibrium (r²) profile for the region in Sardinia, with colors ranging from high (red), to intermediate (green), and low (blue).

Inherited disorders of hemoglobin, such as β-thalassemia caused by mutations at the hemoglobin β (HBB) locus, represent the most common monogenic disorders worldwide[2]. Prevalence is highest in areas where malaria was or remains endemic[3]. The severity of inherited hemoglobin disorders is also variable, from severe life-long transfusion-dependent anemia to mild anemia that does not require transfusion, depending on the molecular defect and genotype status as well as ameliorating variants in modifier genes. Therefore, studying the genetic regulation of hemoglobin levels might reveal new factors and mechanisms to optimize strategies for the therapy of the disorders. The large heritable contribution to phenotypic variance of HbA2 and HbF in the general population (0.728 and 0.633 respectively; see Online Methods and previous report[4]) indicates that genetic analyses could lead to new insights. In genome-wide association studies (GWAS), two genomic regions, the β-globin gene cluster locus and the HBS1L-MYB locus, have been associated at a genome-wide significant level with variations in the amount of HbA2[5], and only those loci and BCL11A have been associated with HbF levels[6,7]. Variants at all four loci are powerful modifiers of the severity of β-thalassemia and sickle-cell disease[7-10]. Notably, none of the variants associated with HbA2 or HbF have been found associated with total hemoglobin, even in the largest meta-analysis of over 135,000 individuals[11]. This indicates that in analyses of total hemoglobin levels, association signals for subtypes are diluted and possibly obscured by opposite directions of effects. Currently, most of the HbF and HbA2 heritability also remains to be explained, and HbA1 variation has never been specifically assessed by GWAS at all. 
A promising source to extend analyses is the founder Sardinian population, in which previous associations have been detected in a large cohort through the analysis of genotyping arrays bearing common/ubiquitous variants[7]. Here, we extend these analyses to rarer and Sardinian-specific variants inferred from whole-genome population sequencing in the same cohort (see Supplementary Note and Supplementary Figure 1). Furthermore, analyzing variants modulating HbA1, HbA2 and HbF levels concurrently in a single cohort provides a route to assess associations that overlap for different hemoglobin forms without the need to account for differences in study size, ethnic background or measurements.

RESULTS

To test for genetic associations with the levels of HbA1, HbA2 and HbF, we interrogated ~10.9 million single nucleotide polymorphisms (SNPs), genotyped or imputed in 6,602 general population volunteers of the SardiNIA longitudinal study[4] (see Online Methods and Supplementary Table 1). Initial analyses showed a predominant role for the HBB:c.118C>T stop-codon mutation (Q40X, better known as the β039 mutation), a variant common in Sardinia (rs11549407, allele frequency 4.8 %). It results in complete absence of β-globin chain synthesis (β0) and consequent β-thalassemia in homozygous individuals, and in a decrease of HbA1 and increase of HbA2 and HbF in heterozygous individuals (with p-values < 1.0×10⁻²⁰⁰). Because its effect has been established previously[7,12], we considered this mutation and other rarer β0-thalassemia mutations known in Sardinia as covariates (see Online Methods and Supplementary Table 2). The assessed individuals in the cohort include 664 healthy heterozygous carriers but no β0-thalassemia patients. The genome-wide scan revealed 23 unique variants at 10 loci at the classical 5×10⁻⁸ threshold. Of note, 21 are significant even considering a more stringent threshold of p = 1.4×10⁻⁸, calculated based on an empirical estimate of the number of independent tests in the Sardinian genome (see companion paper[13]). Five variants are at previously undetected loci, 4 are new independent signals at known loci, and 10 refine previously described associations to new lead polymorphisms that may have functional effects (Table 1). Six, 14 and 8 independent genome-wide significant signals were seen for HbA1, HbA2 and HbF respectively (Supplementary Figure 2). Hence, some of the associated variants significantly affected more than one hemoglobin, resulting in 28 variant-trait associations (see Table 1, Figure 2 and Supplementary Table 3).
Variants resulting from imputation and not supported by linked genotyped markers were experimentally validated (Supplementary Table 4).
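
The relationship between a significance threshold and the number of independent tests it implies can be illustrated with a short calculation. The sketch below assumes a Bonferroni-style correction at a family-wise error rate of 0.05; that rate and the function names are illustrative assumptions rather than details given in the text (the empirical estimate itself is described in the companion paper[13]). The two borderline p-values are those listed in Table 1 for the MPHOSPH9 and NFIX signals.

```python
# Illustrative only: assumes threshold = alpha / n_tests with alpha = 0.05.
ALPHA = 0.05  # assumed family-wise error rate (not stated in the text)

CLASSICAL = 5e-8    # classical genome-wide threshold
EMPIRICAL = 1.4e-8  # stricter threshold quoted for the Sardinian genome


def implied_tests(threshold, alpha=ALPHA):
    """Number of independent tests implied by a Bonferroni threshold."""
    return alpha / threshold


def passes(p_value, threshold):
    """True if the p-value is at or below the threshold."""
    return p_value <= threshold


# Lead p-values from Table 1 for the two signals that reach the
# classical threshold but not the stricter empirical one.
borderline = {
    "chr12:123681790 (MPHOSPH9)": 1.68e-8,
    "rs183437571 (NFIX)": 1.61e-8,
}

if __name__ == "__main__":
    print(f"classical threshold implies ~{implied_tests(CLASSICAL):,.0f} tests")
    print(f"empirical threshold implies ~{implied_tests(EMPIRICAL):,.0f} tests")
    for name, p in borderline.items():
        print(name, passes(p, CLASSICAL), passes(p, EMPIRICAL))
```

Under these assumptions the stricter threshold corresponds to roughly 3.6 times as many independent tests as the classical one, which is consistent with a scan that includes many rarer, population-specific variants.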

Table 1

Most significant independent association results from single variant tests for hemoglobin A1, A2 and fetal

The table shows the most significant association results (all results are corrected for β0 mutations observed in the HBB gene, and results on the α-globin gene cluster are adjusted for the −α 3.7 deletion type I, see Online Methods). Novel signals are shown in bold while variants refining previously reported signals are in italic. At each locus, we indicate the chromosome and genomic position (hg19 build), the rs ID when available, the effect allele tested for association (EA) and the other allele at the SNP (OA), the imputation accuracy (RSQR), the SNP effect allele frequency (EAF) and the regression coefficients. We then indicate whether the SNP is also linked to the other hemoglobin forms (p < 0.01), and the direction of effect of the effect allele (+ for increasing the levels of Hb, - for decreasing). The candidate genes likely to be modulated by the lead SNP are also reported along with their inclusion criteria, as described in Online Methods (p = position, c = coding, e = eQTL, o = OMIM, b = biological). “α-globin gene cluster” refers to the NPRL3, HBZ, HBQ1, HBA1, HBA2 and HBM genes, and “β-globin gene cluster” to the HBB, HBD, HBBP1, HBG1, HBG2 and HBE1 genes. Association coefficients for males and females are reported in Supplementary Table 11.

									Shared effects
Traits (units) and loci #	Candidate genes	chr:position	rsID from dbsnp142	Alleles (EA/OA)	RSQR	EAF	Effect (StdErr)	p-value	HbA1	HbA2	HbF
HbA1 (g/dl)
locus1 1	α-globin gene cluster(p,o,b); MPG(p)	16:149539 1,4	rs570013781	A/G	0.98	0.136	−0.1995 (0.023)	5.86×10⁻¹⁸	−	−	−
locus1 1	α-globin gene cluster (p,o,b); AXIN1(p)	16:391593 1,3,5 (cond.)	−	T/C	0.94	0.012	−0.4028 (0.058)	3.28×10⁻¹²	−	−

locus2	FAM3A(p); G6PD(p,c,o,b); IKBKG(p)	X:153762634 4	rs5030868	A/G	Genotyped	0.085	−0.1256 (0.019)	2.78×10⁻¹¹	−

locus3 2	MPHOSPH9(p)	12:123681790 2	−	A/C	0.96	0.010	−0.3606 (0.064)	1.68×10⁻⁰⁸	−	−

HbA2

locus1 4 (%)	β-globin gene cluster(p,o,b); HBD(c)	11:5255582 4	rs35152987	A/C	Genotyped	0.004	−2.182 (0.109)	4.35×10⁻⁸⁶		−
	β-globin gene cluster (p,o,b); HBD(c)	11:5251849 4 (cond.)	rs7944544	T/G	0.98	0.005	−1.26 (0.097)	3.90×10⁻³⁸		−	+
	β-globin gene cluster (p,o,b); HBB(c); HBG1/HBG2(e); OR51V1(p)	11:5231565 4 (cond.)	rs12793110	T/C	1.00	0.181	−0.2408 (0.019)	5.75×10⁻³⁶		−	−
	β-globin gene cluster (p,o,b); OR51V1(p)	11:5242698 4 (cond.)	rs11036338	C/G	0.99	0.381	0.1282 (0.017)	2.03×10⁻¹⁴		+
	β-globin gene cluster (p,o,b); HBG1/HBG2(e)	11:5250168 4 (cond.)	rs7936823	G/A	0.96	0.466	0.1117 (0.015)	5.00×10⁻¹³	+	+	+

locus2 1,3,5 (g/dl)	α-globin gene cluster (p,o,b); HBM (c); LUC7L(p)	16:216593 1,3	rs141494605	C/T	0.97	0.149	−0.3080 (0.025)	3.94×10⁻³⁵	−	−	−
	α-globin gene cluster (p,o,b); AXIN1(p)	16:391593 1,3,5 (cond.)	−	T/C	0.94	0.012	−0.5112 (0.063)	6.48×10⁻¹⁶	−	−
	α-globin gene cluster (p,o,b); ARHGDIG(p); AXIN1(p); ITFG3(p); PDIA2(p); RGS11(p)	16:342218 1,3,5 (cond.)	rs148706947	T/C	0.93	0.021	0.2892 (0.051)	1.04×10⁻⁰⁸		+

locus3 2 (%)	CCND3(p,b)	6:41952511 2	rs113267280	G/T	0.99	0.101	0.2923 (0.026)	1.11×10⁻²⁹		+	+

locus4 (%)	MYB(b)	6:135418916	rs7776054	G/A	Genotyped	0.210	0.1762 (0.020)	3.71×10⁻¹⁹		+	+

locus5 2 (%)	CTSA(p); PCIF1(p,c); PLTP(p,e); MMP9(e); TNNC2(e)	20:44547672 2	rs59329875	C/T	1.00	0.134	−0.1399 (0.024)	3.64×10⁻⁰⁹		−

locus6 2 (%)	FOG1(p,b,c); C16orf85(p)	16:88601281 2	rs141006889	G/A	Genotyped	0.007	−0.5074 (0.087)	5.33×10⁻⁰⁹		−

HbF (g/dl)
locus1	BCL11A(p,o,b)	2:60720951	rs4671393	A/G	1.00	0.136	0.578 (0.023)	2.60×10⁻¹³⁰			+
locus1	BCL11A(p,o,b)	2:60710571 4 (cond.)	rs13019832	A/G	1.00	0.484	−0.2024 (0.017)	9.12×10⁻³³			−

locus2	MYB(b)	6:135419018	rs9399137	C/T	Genotyped	0.205	0.4202 (0.020)	1.09×10⁻⁹³		+	+
locus2	HBS1L(p,c,e); ALDH8A1(e)	6:135356216 3 (cond.)	rs11754265	C/G	1.00	0.367	−0.1421 (0.021)	5.04×10⁻¹²			−

locus3 4	β-globin gene cluster (p,o,b); HBG1/HBG2(e)	11:5290370 4	rs67385638	G/C	1.00	0.236	0.2038 (0.019)	1.09×10⁻²⁵			+
locus3 4	β-globin gene cluster (p,o,b); HBG1/HBG2(e)	11:5277236 4 (cond.)	rs2855122	C/T	1.00	0.395	−0.1458 (0.022)	2.57×10⁻¹¹	+	+	−

locus4 2,5	NFIX(p)	19:13121899 2,5	rs183437571	T/C	0.97	0.010	0.4607 (0.081)	1.61×10⁻⁰⁸			+

1 = association results locally corrected for the −α 3.7 deletion type I (NG_000006.1:g.34164_37967del3804) (see Supplementary Note).

2 = first time associated with the trait and in a novel locus.

3 = first time associated with the trait in a previously reported locus.

4 = signal refining a previously reported signal.

5 = result not found using the 1000 Genomes reference panel.

cond. = obtained by conditional analysis on variants reported on the upper rows for the considered locus.

Figure 2

Diagram of genome-wide associated loci

Representation of genome-wide significant findings on hemoglobin levels in relation to their contribution to the phenotypic variation (variance explained, panel a) or to their individual impact (effect size, panel b). At each step, the length of the black bar represents the magnitude of variance explained (panel a) or effect size (panel b) for each trait, locus, gene and variant. The bars are connected by colored bands to their sub-components (loci for each trait, genes for each locus, variants for each gene). Three colors (yellow, green and blue) represent the 3 hemoglobin forms (HbA1, HbA2 and HbF respectively), and for loci or genes affecting more than one hemoglobin: gray combines HbA1 and HbA2, cyan combines HbA2 and HbF, and light gray represents effects common to all 3 hemoglobin forms. Each panel is drawn to show loci in order of their importance, i.e. from the largest to smallest amount of explained phenotypic variance (panel a) or effect size (panel b). The variance explained by each locus was calculated fitting a regression model including all variants at that locus, while the effect size for a locus is the sum of effect sizes of all variants in that locus (Supplementary Table 3 reports effect sizes for such joint models). For variants associated with more than one trait the maximum value is used. Markers are reported as chromosome:position when an rs ID was not available; and when an intergenic region is involved instead of a single gene, we show nearby genes within brackets.
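
As a rough complement to the regression-based estimates described in the caption, the variance explained by a single biallelic variant is often approximated as 2 × EAF × (1 − EAF) × β². The sketch below applies that textbook approximation. It is not the procedure used for Figure 2, and it assumes an additive model, Hardy-Weinberg proportions and an effect expressed in standard-deviation units of the trait, none of which is confirmed here for the Table 1 coefficients. The function name is hypothetical.

```python
def variance_explained(eaf, beta):
    """Approximate fraction of trait variance explained by one variant.

    Assumes an additive model, Hardy-Weinberg proportions, and that
    `beta` is expressed in standard-deviation units of the trait.
    """
    if not 0.0 <= eaf <= 1.0:
        raise ValueError("eaf must lie between 0 and 1")
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


if __name__ == "__main__":
    # EAF and effect for rs4671393 (BCL11A, HbF) as listed in Table 1;
    # treating the effect as SD units is an assumption of this sketch.
    print(round(variance_explained(0.136, 0.578), 4))
```

Because the approximation treats each variant separately, summing it over variants in linkage disequilibrium would overstate a locus-level contribution; a joint regression model, as described in the caption, avoids that double counting.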

Novel associations at new loci

Novel associations were detected for all 3 hemoglobin forms. For HbA1, we observed a signal led by chr12:123681790 (in an intron of MPHOSPH9), encompassing several SNPs in complete linkage disequilibrium (LD) in a region encoding several genes (see Supplementary Figure 3). Which gene is truly associated, and how it affects hemoglobin production, remain unclear, although among the top associated SNPs, a variant in an intron of ARL6IP4 (chr12:123465483) falls in a highly conserved region rich in putative transcription factor binding sites and has the highest score for in silico prediction of deleterious impact on function (CADD score)[14] as detailed in Supplementary Table 2. Although this association falls just short of the more stringent empirical significance threshold, it is further strengthened by independent association with another hemoglobin form (HbA2, p = 5.9×10⁻⁵), as detailed in Table 1. For HbA2, we identified 3 novel signals. One, rs141006889, is a missense variant located in ZFPM1, a gene also known as FOG1 that encodes a cofactor of the hematopoietic transcription factors GATA1 and GATA2[15] (Supplementary Figure 4). The complexes formed by FOG1 and GATA proteins are essential for normal erythroid differentiation[15], as demonstrated by pathogenetic mutations that abrogate the FOG-GATA interaction to cause familial dyserythropoietic anemia and thrombocytopenia[16]. Another signal is defined by a pair of statistically indistinguishable variants, rs113267280 and rs112233623 (p-values: 1.11×10⁻²⁹ and 1.29×10⁻²⁹), located in the CCND3 gene, whose product, cyclin D3, is thought to be critical for erythropoiesis[17]. Knockdown of cyclin D3 correlates with reduction in the number of cell divisions during terminal erythropoiesis, thereby producing fewer and larger red blood cells[18].
These variants are also in partial LD with rs9349205 (r² = 0.40), a SNP previously associated with mean red blood cell volume and number (see Supplementary Table 6), which falls 160 bp away from rs112233623 in the same erythroid-specific enhancer functionally associated with CCND3[18-20]. The latter is also the associated variant with the highest CADD score (see Supplementary Table 5). An additional variant related to HbA2, rs59329875, was observed for the first time in this study. It is situated between PLTP, which has been associated with several plasma lipoprotein and triglyceride levels[21-24], and PCIF1, which is thought to negatively regulate gene expression by RNA polymerase II[25]. As for HbF, we identified one new variant associated with its level: rs183437571, located on chromosome 19 in an intron of NFIX, which encodes a CCAAT-binding transcription factor. This variant falls just short of the empirical significance threshold of p = 1.4×10⁻⁸ but is supported by considerable biological evidence implicating the gene and the surrounding region in hemoglobin regulation. Specifically, rs183437571 falls in a CpG region that is differentially methylated in fetal and adult red blood cell progenitors[26]. In mice, Nfix was recently identified as one of the regulatory factors with relatively restricted expression in hematopoietic stem cells[27] and required for the survival of hematopoietic stem and progenitor cells during stress hematopoiesis[28]. Intriguingly, NFIX is situated in a region of ~300 Kb that encompasses a number of genes involved in erythropoiesis (DNASE2 and KLF1)[29-33] or otherwise associated with red blood cell traits, including mean corpuscular hemoglobin (SYCE2, FARSA and CALR)[11] (Supplementary Figure 5 and Supplementary Table 6). KLF1 is a particularly interesting candidate gene[33,34], but mutations observed in previous studies[35] were not found and the gene itself is situated in an LD block distinct from our association signal.
However, long distance regulatory interactions remain a possibility. Of the 5 novel signals, the discoveries of chr12:123681790 for HbA1, rs141006889 for HbA2, and rs183437571 for HbF were strongly influenced by the assessment of variants from Sardinian whole-genome sequencing. Specifically, chr12:123681790 was missing in 1000 Genomes phase III[36], and using this public reference panel the signal was misplaced to another variant ~1 Mb away; rs141006889 was included in the design of one genotyping array (ExomeChip) after it was identified through our sequencing effort, but is currently not detected in sequenced 1000 Genomes samples; and rs183437571 was poorly imputed with 1000 Genomes phase III, with a resulting signal that was not genome-wide significant (see Table 1 and Supplementary Table 7). Overall, the variance explained by markers associated at the genome-wide level (Table 1) accounts for a fraction of the estimated genetic component of each trait (from 46 % for HbA1 to 68 % for HbA2, see Online Methods), supporting inheritance models that include small effect size and/or rare variants. For instance, 21 additional genes with suggestive significance signals (p < 1×10⁻⁴, minor allele frequency [MAF] > 0.5 %) were related to genome-wide significant loci listed here, either in the scientific literature (PubMed before 2006) or by expression levels (Human Expression Atlas[37]) or Gene Ontology[38] categories, using GRAIL software[39] (see Supplementary Note and Supplementary Table 8). Four of the suggestive signals most strongly linked to genome-wide association findings were located in NFE2, which encodes Erythroid Nuclear Factor 2[40]; ADGB, which encodes a recently discovered globin of unknown physiological function[41]; and SPTB and ANK1, both of which encode proteins affecting the stability of erythrocyte membranes[42].
To test for replication of the associations at new loci detected in Sardinia, we used the largest independent sample reported to date, which measured HbA2 and HbF as well as F-cells (see Online Methods) in 4,131 individuals from the TwinsUK cohort enrolled from the United Kingdom (UK) general population[43]. For two loci, both associated with HbA2, we successfully replicated the association seen in Sardinia. In particular, we observed a p-value of 6.98×10⁻⁶ for rs59329875 in the PLTP-PCIF1 intergenic region (MAF of 0.18) and a p-value of 1.73×10⁻⁴ for rs113267280 in CCND3 (MAF of 0.01). The rarity of other variants precluded replication. The MPHOSPH9 and FOG1 variants associated with HbA1 and HbA2, respectively, are missing in publicly available imputation panels (as detailed above), and rs183437571 in NFIX associated with HbF was imputed as monomorphic in the TwinsUK cohort (see Table 2 and Online Methods).

Table 2

Replication of novel loci

The table describes association in the TwinsUK cohort (N = 4,131 individuals). For each SNP, we indicated the associated hemoglobin tested, the number of samples analysed, the imputation accuracy according to the IMPUTE-INFO metric, the effect allele tested for association (EA) and the other allele at the SNP (OA), the SNP effect allele frequency (EAF) and the regression coefficients. The last column explains the reason for the SNPs not being tested.

Traits (units) and loci # from Table 1	SNP	Candidate genes	INFO score	Alleles (EA/OA)	EAF	Effect (StdErr)	p-value	Notes
HbA1 (g/dl)
locus3	chr12:123681790	MPHOSPH9	-	-	-	-	-	Not imputable because absent in 1000 Genomes; at the moment, Sardinian specific.

HbA2 (%)
locus3	rs113267280	CCND3	0.843	G/T	0.011	0.442 (0.118)	1.73×10⁻⁰⁴
locus5	rs59329875	PLTP-PCIF1	0.994	C/T	0.185	0.132 (0.029)	6.98×10⁻⁰⁶
locus6	rs141006889	FOG1	-	-	-	-	-	Not imputable because absent in 1000 Genomes; detected in the NHLBI GO Exome Sequencing Project (ESP).

HbF (%)
locus4	rs183437571	NFIX	0.294	T/C	0.000	-	-	Imputed as monomorphic in TwinsUK cohort.
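
The p-values in Table 2 can be related to the reported effects and standard errors with a simple Wald test. The sketch below assumes a two-sided test on a normally distributed statistic z = effect / StdErr; the actual TwinsUK analysis may have used a different model, and the tabulated values are rounded, so small discrepancies from the reported p-values are expected. Function names are illustrative.

```python
import math


def wald_z(effect, std_err):
    """Wald statistic: estimated effect divided by its standard error."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    return effect / std_err


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Effect and standard error as listed in Table 2.
    rows = {
        "rs113267280 (CCND3)": (0.442, 0.118),
        "rs59329875 (PLTP-PCIF1)": (0.132, 0.029),
    }
    for name, (effect, se) in rows.items():
        z = wald_z(effect, se)
        print(f"{name}: z = {z:.2f}, p = {two_sided_p(z):.2e}")
```

Both recomputed p-values land in the same order of magnitude as those reported in the table, which is the level of agreement one would expect from rounded inputs and a normal approximation.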

Fine mapping at known loci

The integration of whole-genome sequence variants in the scan was also instrumental to refine signals at previously known loci, either identifying a better lead variant or indicating novel independent signals. Specifically, as detailed below, we refined the association within the α and β-globin gene clusters with all 3 hemoglobins; the association of the HBS1L-MYB intergenic region with HbA2 and HbF; and the association of the BCL11A gene with HbF. Associations within the β-globin gene cluster were intricate. As reported above, the strongest modifier in this region is the HBB β039 variant, acting on all 3 hemoglobin types (see Figure 1, Online Methods and Supplementary Table 2). Multiple additional independent signals were observed in conditional analyses for HbA2 and HbF, but they were distinct for each hemoglobin type, highlighting different regulatory patterns within the β-globin gene cluster. Specifically, for HbA2, we confirmed 2 known independent associations at missense mutations in the HBD gene (rs35152987 and rs35406175, the latter perfectly tagged by our lead signal, see Supplementary Table 2). In addition, we identified 3 novel independent signals (rs12793110, rs11036338 and rs7936823) within a block of LD around the HBB gene, confirming a controlling role of this region in HbA2 production[5] (see Figure 1 and Supplementary Figure 4). For HbF levels, 2 new independent signals were detected in a separate LD-block of the β-globin gene cluster (see Figure 1 and Supplementary Figure 5). The first, situated in an intron of the HBE1 gene (rs67385638), remained associated even when taking into account 43 other variants in the β-globin gene cluster associated with hemoglobin variation (see Supplementary Note). 
The second was located in a cyclic AMP response element upstream from HBG2 (rs2855122) already implicated in drug-mediated HbF induction by butyrate[44] : different features of this marker make it a strong candidate for fetal to adult hemoglobin switching modulation (see Supplementary Note). At the α-globin gene cluster, 2 variants were associated with HbA1 and 3 with HbA2, of which one affected both traits (Table 1 and Figure 1). All results at this locus were corrected for any effect of the most frequent α-globin gene deletion present in Sardinia (NG_000006.1:g.34164_37967del3804, known as –α 3.7 deletion type I), directly genotyped in a subset of the volunteers and imputed for the rest of the cohort (see Online Methods). This deletion was associated at the genome-wide level with both HbA1 and HbA2 and only nominally with HbF (see Table 1 and Supplementary Table 2). The most strongly associated signals (rs570013781 and rs141494605) were situated within the NPRL3 and HBM genes, affecting HbA1 and HbA2 respectively. NPRL3 contains several hypersensitive sites involved in the regulation of α-globin gene. HBM encodes a globin member of the avian α-D family[45] and its expression is highly regulated in human erythroid cells, although the protein has not been detected in human erythroid tissues. These observations suggest a possible regulatory function for which high-level protein expression is not required[45]. An independent variant associated with HbA1 and HbA2 (chr16:391593) was observed within the AXIN1 gene, in which a further independent SNP (rs148706947) was found associated with HbA2 alone (Supplementary Figure 3 and Supplementary Figure 4). We also examined variants in the HBS1L-MYB intergenic region known to be associated with HbF and HbA2 levels[5]. We confirmed the role of the known variant (rs66650371, a TAC deletion) on the expression of both forms of hemoglobin[46,47] (see Supplementary Note). 
A further novel independent signal for HbF was found at rs11754265 in an intron of HBS1L, which has been shown to be a much stronger eQTL than rs66650371 for HBS1L and the neighboring ALDH8A1 in monocytes[48]. In line with previous studies[6-8,49,50] the second intron of BCL11A gave multiple signals associated with HbF levels. They are explicable by the joint action of variants in each of two independent groups of statistically indistinguishable SNPs: one group formed by rs4671393, rs766432 and rs1427407, with p-values between 2.6×10⁻¹³⁰ and 5.6×10⁻¹²⁹, and the other by rs13019832 and rs7606173, with p-values of 6.1×10⁻³³ and 9.1×10⁻³³ in our cohort. The most likely causal candidate in the first group is rs1427407, a variant already associated with HbF in other population cohorts and functionally associated with BCL11A regulation[51]. In the second group we can instead point to rs13019832, which shows the highest functional CADD score (Supplementary Table 5). This variant has also been correlated, in adipose tissue, with the methylation of a CpG site (cg23678058) in a region that is functionally associated with BCL11A expression[52] and shows evidence of an effect on GATA-1 binding in peripheral blood-derived erythroblasts[53,54].

Pleiotropic effects

Among our 23 lead variants, 6 were associated (at least with p<0.01) with a second hemoglobin type, and another 6 were associated with all 3 (including β039 and –α 3.7 deletion type I) (Figure 1 and Table 1). Overall, all but 3 pleiotropic variants modulate different hemoglobins in the same manner, i.e., with the same allele increasing the levels of all associated hemoglobins. The 3 exceptions include the β039 variant, which decreases HbA1 while increasing HbA2 and HbF, and 2 SNPs mapping in the β-globin gene cluster, both affecting HbA2 and HbF but in opposite directions (Figure 1 and Table 1). In addition, many of the additional suggestive signals are associated with more than one hemoglobin type, increasing the likelihood that they are true signals (see Online Methods). In fact, 14 of these variants – all sharing effects on HbA1 and HbA2, but none with HbF – showed between-trait combined p-values that were genome-wide significant (Supplementary Table 9) and hint at additional pathways of potential interest in hemoglobin dynamics. In general, the extended number of genetic variants showing joint association with HbA1 and HbA2 rather than HbF is consistent with high correlations of levels of adult hemoglobins HbA1 and HbA2 but only partial correlations of these hemoglobin forms with levels of HbF (see Online Methods). Given the central role of hemoglobin in providing oxygen to the body tissues and the substantial fraction of total body cells accounted for by circulating red cells, factors impacting hemoglobin production and red cell count unsurprisingly have pleiotropic effects on other non-hematological traits. This is exemplified by the strong impact of the major β039 mutation on cholesterol and LDL-cholesterol (see companion paper[13]). Here we extended the analysis for this mutation to 69 non-hematological quantitative traits selected from among those assessed in the SardiNIA cohort[4] (see Supplementary Note). 
We found the variant also significantly associated with increased total white blood cell counts (p = 3×10⁻⁷), with the major contribution coming from neutrophil counts (p = 1×10⁻⁶), and with platelet counts (p = 9×10⁻⁵) (see Supplementary Table 10).

DISCUSSION

We provide evidence for 23 associated variants at 10 loci influencing the levels of one or more of the 3 hemoglobin species measurable in post-natal life. Our results are based on a cohort from the Sardinian founder population that is much larger than previously described GWAS for HbF and HbA2 and interrogates a high resolution genetic map, based on population sequencing that expands the assessed spectrum of allelic variants 10-fold compared to previous studies. The finding that 2 of the 5 newly reported loci were not detectable without using the SardiNIA reference panel, and the others were misplaced (Table 1 and Supplementary Table 7), further highlights how large-scale sequencing efforts in this founder population can reveal functionally relevant variants that may be very rare and hence missed in other populations. For the same reasons, however, replication of results for such variants or translation of findings directly to other populations is difficult. For example, the other currently reported sample of comparable size, from the United Kingdom, could provide replication only for the two variants present there. Similar limitations will likely be found in other GWAS designed to detect effects of rare and founder variants. However, additional corroboration of our findings for such variants comes from their independent associations with other hemoglobin species and hematological traits in Sardinians, and also from the biological function of the genes involved. For instance, variant chr12:123681790 within MPHOSPH9, associated with HbA1, also shows suggestive evidence of association with HbA2. The variant in FOG1, very rare in Europeans (MAF 0.4 %), is a missense variant in a gene implicated in erythropoiesis; and the variant in NFIX, absent in other European populations, falls within a cluster of genes involved in erythropoiesis and in a CpG region differentially methylated in fetal and adult red blood cell progenitors[26].
By carrying out GWAS for HbA1, HbA2 and HbF assessed for the first time in the same individuals, we see a wide range of pleiotropic effects of variants across the 3 hemoglobin types (Table 1). Strikingly, HbA2 harbors more than half of the loci discovered here (see Figure 2), with many pleiotropic effects on HbA1 and some on HbF. Thus, although it has a minor role in the transport of oxygen to tissues[55], variations in HbA2 participate in pathways that regulate the levels of the other hemoglobins active in postnatal life. The direction of pleiotropic effects among the different hemoglobin types provides some additional clues to mechanism. Within the α-globin gene cluster, in agreement with the presence of α-globin chains in HbA1, HbA2 and HbF, all variants affecting more than one hemoglobin showed the same direction of effect for all. The regulation of globin chains from the β-globin gene cluster, however, is more complicated. It involves variants with the same direction of effect for all hemoglobins (rs7936823) and other variants most likely involved in switching mechanisms that affect fetal and adult hemoglobins in opposite directions (rs2855122). Still other variants change the kinetics of competition among non-α globin chains; for example, the β039 mutation decreases β-globin levels and thereby increases the availability of α-globin chains to combine with δ and γ-globins, leading to higher levels of HbA2 and HbF. Variants influencing only 2 forms of hemoglobin acted mainly in the same direction and never jointly affected HbA1 and HbF. As for variants shared only between HbA2 and HbF, they can be attributed to specific cis-regulatory mechanisms in the β-globin gene cluster (rs12793110 and rs7944544) or to loci with a role in erythroid differentiation (CCND3 and MYB). 
By contrast, variants shared between HbA2 and HbA1 were either trans-acting (in MPHOSPH9) or localized in the α-globin gene cluster but with effect sizes probably too small to impact HbF production. Consistent with the latter possibility, the –α 3.7 deletion type I, which has strong genome-wide significant effects on HbA1 and HbA2, had much smaller, only suggestive, effects on HbF (see Supplementary Table 2). Our analyses also detected broader pleiotropic impacts, most strikingly for the β⁰39 variant. In addition to effects on LDL-c described in the companion paper[13], we report for the first time that β⁰39 is also significantly associated with increased total counts of white blood cells (and some subsets) as well as platelet counts. This suggests that in heterozygous carriers this variant drives a broader increase in bone marrow-derived blood cells. Speculatively, some of these increases, such as augmented leukocyte and neutrophil counts, may have provided protection against pathogens other than malaria, thus increasing selection for the balanced polymorphism. The detected variants provide candidate modifiers influencing the clinical status of patients with monogenic hemoglobin disorders. For example, we carried out a preliminary analysis of a small sample of 306 β-thalassemia patients homozygous for the β⁰39 stop codon mutation but showing marked heterogeneity in disease presentation and course. In addition to those described previously[7-10], some variants detected in this study showed possible effects as modifiers of disease severity (see Supplementary Note). However, the potential of these variants to help predict disease severity remains tentative without studies of larger sample sets. Nevertheless, the variants already add to the candidate targets for therapeutic intervention in the widely prevalent inherited β-thalassemia and other hemoglobinopathies[2].

ONLINE METHODS

Sample description

The population studied here includes 6,921 individuals, representing > 60 % of the adult population of 4 villages in the Lanusei Valley in Sardinia, Italy. They are part of the SardiNIA project, a longitudinal study including genetic and phenotypic data of 1,257 multigenerational families with more than 37,000 relative pairs. Details of phenotype assessments for these samples have been published previously[4]. All participants gave informed consent to study protocols, which were approved by the institutional review board of the University of Cagliari, the National Institute on Aging, and the University of Michigan. For whole-genome sequencing, we selected 1,122 individuals from the SardiNIA study and 998 individuals enrolled in case–control studies of multiple sclerosis and type 1 diabetes in Sardinia. Genomes were sequenced to an average coverage of 4.16-fold. Details on sequencing protocol, data processing and variant calling can be found elsewhere[56] and in the companion paper[13]. The 2,120 sequenced samples consist of 695 complete and incomplete trios; to avoid over-representation of rare haplotypes during the imputation process, we considered only the parents of each trio – totaling 1,488 samples – to build our reference panel[56] (see companion paper[13] for details). Part of the sequencing data used in this study are available through dbGaP, under “SardiNIA Medical Sequencing Discovery Project”, Study Accession: phs000313.v3.p2.

Genotyping and Imputation

The 4 micro-arrays used for genotyping the entire SardiNIA cohort were the Illumina® Infinium HumanExome BeadChip, ImmunoChip, Cardio-MetaboChip and HumanOmniExpress BeadChip. Genotyping was carried out according to manufacturer protocols at the SardiNIA Project Laboratory (Lanusei, Italy), at the Technological Center - Porto Conte Ricerche (Alghero, Italy) and at the National Institute on Aging Intramural Research Program Laboratory of Genetics (Baltimore, MD). Genotypes were called using GenomeStudio (version 1.9.4) and refined using Zcall (version 3)[57]. We applied standard per-sample quality control filters to remove samples with low call rates or for which reported relationships and/or gender disagreed with genetic data. Details on quality controls were described elsewhere[56]. Altogether, 890,542 autosomal markers and 16,325 X-linked markers were genotyped across SardiNIA study samples. We selected for phasing and imputation only the 6,602 samples for which all 4 arrays were successfully genotyped. Genotypes were phased using MACH software[58], using 30 iterations of the haplotyping Markov chain and 400 states per iteration. We performed imputation using Minimac software[59] and a reference panel including haplotypes of 1,488 Sardinian whole-genomes[56] (see companion paper[13]). Variants were discarded if the estimated imputation quality (RSQR) was ≤ 0.3 for an estimated MAF ≥ 1 %, or < 0.8 for an estimated MAF between 0.5 % and 1 %; variants with MAF < 0.5 % were kept only if genotyped. RSQR thresholds for rare and low-frequency variants were more stringent than those proposed for other traits[56] as they led to better genomic control parameters (1.001, 0.993 and 0.985 for HbA1, A2 and fetal, respectively). We also performed imputation using the 1000 Genomes Project Phase III (version 5)[60] haplotype set, and used the same thresholds to discard variants. Genomic control parameters for 1000 Genomes imputation were 1.050, 0.997 and 0.984 for HbA1, A2 and fetal, respectively.
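The frequency-dependent filter described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the pipeline actually used: the function name and arguments are hypothetical, MAF and RSQR are assumed to be expressed as fractions, and the treatment of a MAF of exactly 0.5 % (placed here in the middle bin) is an assumption, since the text does not specify it.

```python
def keep_variant(maf, rsqr, genotyped):
    """Return True if a variant passes the imputation-quality filter.

    maf       -- estimated minor allele frequency as a fraction (0.01 == 1 %)
    rsqr      -- estimated imputation quality (RSQR)
    genotyped -- True if the variant was directly genotyped
    """
    if maf >= 0.01:
        # Common variants: discarded when RSQR <= 0.3
        return rsqr > 0.3
    if maf >= 0.005:
        # Low-frequency variants (0.5 % to 1 %): discarded when RSQR < 0.8
        # Assumption: a MAF of exactly 0.5 % falls in this bin.
        return rsqr >= 0.8
    # Rare variants (MAF < 0.5 %): kept only if directly genotyped
    return genotyped


if __name__ == "__main__":
    print(keep_variant(0.05, 0.45, False))   # common, moderate RSQR
    print(keep_variant(0.007, 0.45, False))  # low frequency, same RSQR
    print(keep_variant(0.001, 0.99, False))  # rare and imputed only
```

The same RSQR value can therefore pass for a common variant and fail for a low-frequency one, which is the intended effect of the stricter threshold.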

Association analysis

We performed association analyses of all 3 hemoglobins in grams per deciliter (g/dl) as well as percentage (%) for HbA2 and HbF. HbA2 (%) and HbF (%) were directly measured from high-performance liquid chromatography, while HbA1 (g/dl), HbA2 (g/dl) and HbF (g/dl) were derived from total hemoglobin measured by Coulter counter. As expected, measurements in % and g/dl were highly correlated for HbF (Spearman’s Rho = 0.99) and for HbA2 (Rho = 0.85). HbA1 (%) was not considered for genetic association because it was too highly correlated with both HbA2 (%) and HbF (%) as a consequence of their derivation formula (Rho = −0.803 and −0.757, respectively, p < 1×10⁻²⁰). Considering only non-carriers of β⁰ mutations, HbA1 (g/dl) was highly correlated with HbA2 (g/dl) (Rho = 0.662, p < 1×10⁻²⁰) and poorly with HbF (g/dl) (Rho = −0.055, p = 3.44×10⁻⁵). Likewise, HbA2 and HbF were weakly positively correlated as percentage measures (Rho = 0.108, p = 4.08×10⁻¹⁶) and even less as g/dl (Rho = 0.066, p = 5.81×10⁻⁵), consistent with previous findings[5]. Measurements were available for a subset of 6,305 individuals; descriptive statistics are reported in Supplementary Table 1. Association results were considered genome-wide significant when the p-value was less than 5×10⁻⁸; however, we also noted in the text variants that would not meet the threshold of 1.4×10⁻⁸ that we introduce for sequencing-based GWAS carried out in Sardinians for variants with MAF > 0.5 % (see companion paper[13]). Before association analyses, traits were normalized using inverse normal transformation; for HbF we also removed outliers with values above 5 %. Analyses were adjusted for age, age², and gender as well as for the presence of at least one of the 3 β⁰ mutations (β⁰39 (rs11549407), HBB:c.20delA (rs63749819) and HBB:c.315+1G>A (rs33945777)), all directly genotyped or sequenced (see Characterization of β⁰ mutations paragraph).
Regression coefficients for β⁰39 – the most common β⁰ mutation in Sardinia, with 10.3 % carriers – are reported in Supplementary Table 2. Association was performed using the q.emmax test in EPACTS[61], which implements a linear mixed model procedure to correct for cryptic relatedness and population stratification by incorporating a genomic-based kinship matrix. Associations reported in the table refer to the best p-value obtained with either percentage or original units for HbA2 and HbF. Notably, HbF signals always resulted in lower p-values considering g/dl, whereas for HbA2 analysis, this was only the case for rs141494605. All loci passed the genome-wide significance threshold of p < 5×10⁻⁸ for both % and g/dl except for rs59329875, which was genome-wide significant only for the HbA2 measure reported in Table 1. To identify independent signals we performed regional conditional analysis, using a forward selection procedure that adds, at each step, the most associated variant as a covariate in the model. In this sequential analysis, we tested only SNPs lying in a region of 2 Mb centered on the lead variant. The same genome-wide significance threshold used for primary signals was also considered for independent signals. For loci where different independent signals were found, we also report model parameters of jointly associated variants in Supplementary Table 3. Finally, the lead variants and their surrogates (r² > 0.90) were annotated using Combined Annotation Dependent Depletion (CADD) score[14] and reported in Supplementary Table 5.
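As an illustration of the trait normalization step, a rank-based inverse normal transformation can be sketched as follows. The text does not state which rank offset was used, so the `(rank - 0.5) / n` convention below is an assumption, as are the function name and the averaging of tied ranks; only the Python standard library is used.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to normal quantiles based on their ranks.

    Assumptions (not taken from the paper): ties receive their average
    rank, and ranks are converted to probabilities as (rank - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    # Hypothetical HbA2 (%) values, for illustration only
    print(inverse_normal_transform([2.6, 3.1, 2.9, 2.9, 4.8]))
```

The transformed values depend only on the ordering of the measurements, which limits the influence of extreme observations on the association statistics.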

Heritability and variance explained

We estimated heritability for the 3 hemoglobins using Merlin-regress[62] on the same sample used for the GWAS. Estimates for normalized levels of hemoglobins were respectively 0.520 for HbA1 (g/dl), 0.728 for HbA2 (%) (0.700 for g/dl) and 0.633 for HbF (%) (0.624 for g/dl). We then calculated for each hemoglobin form the proportion of phenotypic variance explained by the associated lead variants. We measured that as the difference in adjusted R² between the full and the basic model, where the basic model includes only phenotypic covariates (age, age² and gender) and the full model also includes all the independent SNPs associated with the specific trait. Adjusted R² values were calculated using a linear mixed model procedure from the lmekin() function in the “Kinship” R package[63]. Estimates were 0.240 for HbA1 (g/dl), 0.492 for HbA2 (%) and 0.383 for HbF (%).
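The variance-explained calculation reduces to a difference of two adjusted R² values. The sketch below shows that arithmetic using the textbook adjusted R² for an ordinary regression, which is a simplifying assumption: the study obtained its values from a linear mixed model (lmekin), and the function names and the numbers in the usage example are hypothetical.

```python
def adjusted_r2(rss, tss, n_samples, n_predictors):
    """Adjusted R-squared from residual and total sums of squares."""
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def variance_explained(rss_basic, rss_full, tss, n_samples, p_basic, p_full):
    """Difference in adjusted R-squared between the full and basic models.

    The basic model holds only the phenotypic covariates; the full model
    adds the independent associated SNPs.
    """
    full = adjusted_r2(rss_full, tss, n_samples, p_full)
    basic = adjusted_r2(rss_basic, tss, n_samples, p_basic)
    return full - basic


if __name__ == "__main__":
    # Hypothetical sums of squares: 3 covariates versus 3 covariates + 2 SNPs
    print(variance_explained(rss_basic=90.0, rss_full=50.0, tss=100.0,
                             n_samples=101, p_basic=3, p_full=5))
```

Using the adjusted rather than the raw R² penalizes the full model for its extra SNP terms, so the difference is not inflated simply by adding predictors.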

Characterization of β⁰ mutations

For the present study we designed a Taqman custom assay for the HBB:c.118C>T nonsense mutation (rs11549407, also known as β⁰39), and genotyped 6,602 samples. Comparison of Taqman genotypes and imputation results (rs11549407, RSQR = 0.92) produced an overall concordance of 98.8 %. In addition, we used Sanger sequencing on all samples discordant between red blood cell index-based diagnosis (using MCV, MCH, HbF % and HbA2 %) and Taqman genotypes, to detect any β-globin mutations other than β⁰39, thus identifying 3 carriers of the HBB:c.20delA (rs63749819) mutation and one carrier of the HBB:c.315+1G>A (rs33945777) mutation.

Characterization of the deletion at the α-globin gene cluster

In Sardinia 3 variants are known to be mainly responsible for α-thalassemia: SNPs rs111033603 and rs41474145, and the deletion NG_000006.1:g.34164_37967del3804; the latter, known as the –α 3.7 deletion type I, is by far the most common[64]. We did not observe the rarer rs111033603 or rs41474145 in our sequencing effort. To establish genotypes at the deletion site in the full cohort, we used an inference strategy combined with experimental data. Specifically, we first characterized the structural variant by PCR in 260 unrelated sequenced individuals randomly selected in the SardiNIA cohort. We calculated the relative coverage of the deleted region in the whole-genome sequenced samples by considering the ratio of read count in the potentially deleted region (223,450 to 226,953 bp – excluding 150 bp boundaries) with read count in the nearby region not subject to deletion (227,254 to 230,757 bp). We then identified coverage ratio thresholds that best predicted PCR genotypes at the deletion and used these thresholds to infer genotypes for the 2,120 sequenced individuals. We then inserted genotypes in the Sardinian reference panel and imputed the deletion on the total SardiNIA cohort. To assess accuracy of imputation we considered the best guess genotypes and searched for Mendelian errors in families. The observed rate was 0.58 % over 1,193 parent-offspring pairs, consistent with high imputation precision. Association results reported in the manuscript at this locus are corrected for the inferred –α 3.7 deletion type I dosages.
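The coverage-based inference step can be illustrated with a short sketch. The two windows quoted above span the same length (3,503 bp each), so a plain ratio of read counts is used. The threshold values in the usage example are purely illustrative: in the study they were calibrated against PCR genotypes and are not reported in this section. The function names are hypothetical.

```python
def coverage_ratio(reads_in_deleted_region, reads_in_control_region):
    """Ratio of read counts in the candidate deletion versus the control window."""
    if reads_in_control_region <= 0:
        raise ValueError("control region has no reads; ratio is undefined")
    return reads_in_deleted_region / reads_in_control_region


def infer_deleted_copies(ratio, hom_max, het_max):
    """Infer the number of deleted copies (0, 1 or 2) from a coverage ratio.

    hom_max and het_max are calibration thresholds with hom_max < het_max;
    their values must be chosen against experimentally typed samples.
    """
    if ratio <= hom_max:
        return 2  # little or no coverage: deletion on both chromosomes
    if ratio <= het_max:
        return 1  # roughly half coverage: heterozygous deletion
    return 0      # coverage similar to the control window: no deletion


if __name__ == "__main__":
    # Illustrative read counts and thresholds only (not from the study)
    ratio = coverage_ratio(48, 101)
    print(ratio, infer_deleted_copies(ratio, hom_max=0.25, het_max=0.75))
```

At low sequencing depth the ratio is noisy, which is one reason to calibrate the thresholds on PCR-typed individuals and to check the resulting genotypes for Mendelian errors, as described above.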

Variants validation

We validated all variants that showed genome-wide significant p-values in the primary or conditional analysis and that were neither directly genotyped nor tagged by a directly genotyped surrogate (r² > 0.90). We did not validate variant rs13019832 at BCL11A for HbF, which was highly linked with findings of previous reports (rs7606173)[49,51]. Validation was performed using Sanger sequencing or Taqman, depending on variant frequency, for 5 variants. We selected for each variant all individuals carrying the minor allele (heterozygous and homozygous) plus a random subset of subjects homozygous for the other allele (in all, 3,084 subjects were genotyped), except for rs141494605 and chr16:391593, for which we specifically selected samples with poorer imputation dosages (borderline RSQR). In addition, for rs17525396, we used independent genotypes available for a subset of the cohort[65], derived from Affymetrix 6.0 (see Supplementary Table 4).

Replication of variant effects

Replication was performed in the TwinsUK cohort[43]. Genotyping was performed using a combination of Illumina arrays (HumanHap300, HumanHap610Q, 1M-Duo and 1.2MDuo 1M), and imputation was performed using the IMPUTE software package (v2) and 1000 Genomes haplotypes (Phase I integrated variant set release, 16 June 2014)[36,66]. Details on quality controls are provided in the Supplementary Note. HbA2 levels and HbF percentage were obtained by HPLC, and F-cells were enumerated after intracellular HbF staining and subsequent flow cytometry[67]. Measurements were available in 4,131 samples. Association analyses were performed with the merlinoffline package in Merlin, to account for relatedness[62]. To be consistent with analyses performed in the SardiNIA study, age, age squared and gender were used as covariates and the traits were transformed using quantile normalization.

Selection of candidate genes

At each locus, we defined a list of genes to be considered as plausible candidates if they satisfied one of the following: 1) genes within ±25 kb of the lead SNP, indicated (p) in Table 1; 2) genes with exonic variants (frame-shift, stop-codon, non-synonymous and synonymous) along with splice-site and 5′/3′ UTR variants in LD (r² ≥ 0.8) with the lead SNP (c); 3) genes whose expression was modulated by the SNP itself or by an eQTL in LD (r² ≥ 0.8) with the top SNP (e); 4) genes with clear biological function connected to the traits (b); or 5) genes harboring variants responsible for Mendelian diseases, as reported in OMIM (o). Candidate genes from eQTL data were searched using an automated pipeline querying 16 eQTL public repositories[48,68-82], including the Pritchard eQTL browser; only top SNP eQTLs or any SNP with FDR < 0.05 were considered.
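The positional criterion (1) amounts to an interval-overlap test. A minimal sketch follows, assuming that all genes are on the same chromosome as the lead SNP and that a gene qualifies when any part of it overlaps the window; the text does not say whether overlap or full containment was required. Gene names and coordinates are hypothetical.

```python
def genes_near_lead_snp(lead_position, genes, window_bp=25_000):
    """Return names of genes overlapping lead_position +/- window_bp.

    genes is a list of (name, start, end) tuples on the lead SNP's chromosome.
    """
    window_start = lead_position - window_bp
    window_end = lead_position + window_bp
    return [
        name
        for name, start, end in genes
        # Two intervals overlap when each starts before the other ends
        if start <= window_end and end >= window_start
    ]


if __name__ == "__main__":
    # Hypothetical gene models, for illustration only
    example_genes = [
        ("GENE_A", 100_000, 120_000),
        ("GENE_B", 160_000, 170_000),
        ("GENE_C", 10_000, 70_000),
    ]
    print(genes_near_lead_snp(130_000, example_genes))
```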

Pleiotropy and gene connections analysis

To characterize genome-wide significant results and to identify suggestively significant ones, we searched for effects shared between the different hemoglobin forms as well as evidence of connections between them. Specifically, for markers associated at genome-wide level with one trait, we reported the effect direction for all traits with p < 0.01 (see Table 1). To identify candidates with suggestive p-values between 1.00×10⁻⁴ and 5.00×10⁻⁸, we selected among these: (i) markers with MAF > 0.5 % and showing 2-trait combined p-values < 5×10⁻⁸, where p-values were combined using inverse variance weighted meta-analysis, as implemented in Metal software[83]; and (ii) markers falling in or near genes that demonstrated evidence of connections with genome-wide significant loci, either in Pubmed (using the 2006 data set to avoid confounding by subsequent GWAS discoveries), or in Human Expression Atlas[37] and Gene Ontology[38] databases using GRAIL[39] and considering genes reported with multiple hypothesis corrected p-values < 0.05. Using these criteria, we identified 21 further genes with biological connections to genome-wide significant loci, reported in Supplementary Table 8, and 14 variants with combined p-values between 2.08×10⁻⁸ and 1.18×10⁻¹¹, reported in Supplementary Table 9.
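For reference, the inverse variance weighted combination named above can be written out directly. This is a generic fixed-effect sketch, not a reproduction of the Metal run: it assumes the per-trait effect estimates are on a comparable scale and are treated as independent, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse variance weighted meta-analysis.

    Returns the combined effect, its standard error and a two-sided
    p-value from the normal distribution.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


if __name__ == "__main__":
    # Hypothetical per-trait estimates for one marker
    print(inverse_variance_meta([0.10, 0.12], [0.020, 0.025]))
```

Each estimate is weighted by the reciprocal of its variance, so the more precisely estimated trait effect contributes more to the combined statistic.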

81 in total

1. Androglobin: a chimeric globin in metazoans that is preferentially expressed in Mammalian testes.

Authors: David Hoogewijs; Bettina Ebner; Francesca Germani; Federico G Hoffmann; Andrej Fabrizius; Luc Moens; Thorsten Burmester; Sylvia Dewilde; Jay F Storz; Serge N Vinogradov; Thomas Hankeln
Journal: Mol Biol Evol Date: 2011-11-24 Impact factor: 16.240

Review 2. Computational tools for discovery and interpretation of expression quantitative trait loci.

Authors: Fred A Wright; Andrey A Shabalin; Ivan Rusyn
Journal: Pharmacogenomics Date: 2012-02 Impact factor: 2.533

Review 3. The multifunctional role of EKLF/KLF1 during erythropoiesis.

Authors: Miroslawa Siatecka; James J Bieker
Journal: Blood Date: 2011-05-25 Impact factor: 22.113

4. Genetics and beyond--the transcriptome of human monocytes and disease susceptibility.

Authors: Tanja Zeller; Philipp Wild; Silke Szymczak; Maxime Rotival; Arne Schillert; Raphaele Castagne; Seraya Maouche; Marine Germain; Karl Lackner; Heidi Rossmann; Medea Eleftheriadis; Christoph R Sinning; Renate B Schnabel; Edith Lubos; Detlev Mennerich; Werner Rust; Claire Perret; Carole Proust; Viviane Nicaud; Joseph Loscalzo; Norbert Hübner; David Tregouet; Thomas Münzel; Andreas Ziegler; Laurence Tiret; Stefan Blankenberg; François Cambien
Journal: PLoS One Date: 2010-05-18 Impact factor: 3.240

5. Mouse development and cell proliferation in the absence of D-cyclins.

Authors: Katarzyna Kozar; Maria A Ciemerych; Vivienne I Rebel; Hirokazu Shigematsu; Agnieszka Zagozdzon; Ewa Sicinska; Yan Geng; Qunyan Yu; Shoumo Bhattacharya; Roderick T Bronson; Koichi Akashi; Piotr Sicinski
Journal: Cell Date: 2004-08-20 Impact factor: 41.582

Review 6. Update on fetal hemoglobin gene regulation in hemoglobinopathies.

Authors: Daniel E Bauer; Stuart H Orkin
Journal: Curr Opin Pediatr Date: 2011-02 Impact factor: 2.856

7. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers.

Authors: Ralph Stadhouders; Suleyman Aktuna; Supat Thongjuea; Ali Aghajanirefah; Farzin Pourfarzad; Wilfred van Ijcken; Boris Lenhard; Helen Rooks; Steve Best; Stephan Menzel; Frank Grosveld; Swee Lay Thein; Eric Soler
Journal: J Clin Invest Date: 2014-03-10 Impact factor: 14.808

8. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals.

Authors: Alexis Battle; Sara Mostafavi; Xiaowei Zhu; James B Potash; Myrna M Weissman; Courtney McCormick; Christian D Haudenschild; Kenneth B Beckman; Jianxin Shi; Rui Mei; Alexander E Urban; Stephen B Montgomery; Douglas F Levinson; Daphne Koller
Journal: Genome Res Date: 2013-10-03 Impact factor: 9.043

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. The UCSC Genome Browser database: 2014 update.

Authors: Donna Karolchik; Galt P Barber; Jonathan Casper; Hiram Clawson; Melissa S Cline; Mark Diekhans; Timothy R Dreszer; Pauline A Fujita; Luvina Guruvadoo; Maximilian Haeussler; Rachel A Harte; Steve Heitner; Angie S Hinrichs; Katrina Learned; Brian T Lee; Chin H Li; Brian J Raney; Brooke Rhead; Kate R Rosenbloom; Cricket A Sloan; Matthew L Speir; Ann S Zweig; David Haussler; Robert M Kuhn; W James Kent
Journal: Nucleic Acids Res Date: 2013-11-21 Impact factor: 16.971
