A proximity-based method to identify genomic regions correlated with a continuously varying environmental variable.

Cornelia Di Gaetano, Giuseppe Matullo, Alberto Piazza, Moreno Ursino, Mauro Gasparini.

Abstract

Knowledge of markers in the human genome which show spatial patterns and display extreme correlation with different environmental determinants plays an important role in understanding the factors which affect the biological evolution of our species. We used the genotype data of more than half a million single nucleotide polymorphisms (SNPs) from the Human Genome Diversity Panel (HGDP-CEPH) data set and we calculated Spearman's correlation between absolute latitude and one of the two allele frequencies of each SNP. We selected SNPs with a correlation coefficient within the upper 1% tail of the distribution. We then used a criterion of proximity between significant variants to focus on DNA regions showing a continuous signal over a portion of the genome. Based on external information and genome annotations, we demonstrated that most regions with the strongest signals also have biological relevance. We believe this proximity requirement adds an edge to our novel method compared to the existing literature, highlighting several genes (for example DTNB, DOT1L, TPCN2, RELN, MSRA, NRG3) related to body size or shape, human height, hair color, and schizophrenia. Our approach can be applied generally to any measure of association between polymorphic frequencies and continuously varying environmental variables.

Keywords: adaptations; latitude; outlier approach; point processes; spatial patterns

Year: 2013 PMID: 23423242 PMCID: PMC3565544 DOI: 10.4137/EBO.S10211

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

From an evolutionary point of view, human biological variation can result from natural selection, genetic drift and demographic processes. In human population genetics, several ways have been found to highlight genes that may be subject to selective pressures, and in recent years whole genome scanning techniques have made it possible to find signatures of selection.1–4 The Human Genome Diversity Project (HGDP-CEPH) database5 has been repeatedly investigated in order to identify markers in the human genome which show geographical patterns and to explain how different selective forces can shape human genetic variations across continents. One strategy for the detection of spatial selection signatures is the outlier approach.2,6,7 Using genome-wide data sets genotyped in different human populations, genetic variables—such as single nucleotide polymorphisms (SNPs)—that exhibit extreme correlations with latitude or with other environmental determinants are identified as candidate targets for selective pressure. By “extreme correlation” we mean that the value of a certain statistic, measuring the strength of the relationship between allele frequencies and latitude or other environmental variables, falls in the tails of the distribution of the same statistic over the whole genome. Many choices are possible for the relevant statistic, ranging from a simple (either Pearson or Spearman) correlation coefficient between the latitude and the frequencies of either one or two alleles of a SNP to a Bayes factor comparing two models that do and do not, respectively, take into account the effect of a dichotomous environmental variable on the distribution of a genetic variant. From a technical point of view, the outlier approach is just a reformulation of the concept of statistical significance, ie, variation with respect to a reference distribution. The outlier approach has been used to study sodium homeostasis balance as an example of adaptation. 
In hot and dry climates, genes influencing salt and water retention are favored by selection, which explains the large inter-ethnic differences in the prevalence of salt-sensitive hypertension.8,9 Other important research has assessed the correlation between four variables that summarize climate and the frequencies of 873 tag SNPs in 82 genes related to energy metabolic pathways.6 The outlier approach has also been used to demonstrate that allele frequencies of a subset of genes coding for blood group antigens vary with levels of pathogen richness, supporting the idea that these loci affect susceptibility to infectious diseases.10 This finding, which is compatible with previous evidence on the correlation between HLA class I diversity and pathogen richness,11 is important for stressing the role of diseases and pathogens, such as viruses, protozoa and fungi, in shaping human variation.12 Finally, a very comprehensive article on the HGDP-CEPH database (enriched with the HapMap and other human population databases) has recently been published, in which the outlier approach is used to highlight polymorphisms and pathways correlated with ecoregion membership and diet.13

Our idea is to reinforce the outlier approach by considering a criterion of proximity between significant variants. In the search for targets of selective pressure, we believe it is important to focus on those DNA regions which repeatedly contain values which are labeled as significant by the outlier approach. In other words, we look for evidence of a continuous signal over a portion of the genome, which can strengthen the significance of a cluster of markers labeled as significant by the outlier approach alone, and we built statistical tools for this purpose.
In this paper we therefore adopt a search-and-confirm approach which integrates the outlier approach by identifying regions of the genome where not just one, but a significant number of SNPs are located in the tails of the distribution of the relevant statistic, when compared to the number of SNPs originally genotyped in the same region. This is done in the following three steps, which are further illustrated in the complete workflow process diagram in Figure 1:

Figure 1

Graphical workflow process for the study.

1. The outlier method: We identify 1% significant SNPs as having an absolute value of the Spearman correlation coefficient with latitude above its 99th percentile.

2. The proximity-based algorithm: Using the methods described in detail in the Materials and Methods section, we select candidate regions in the genome which exhibit the strongest signals, ie, the regions where the significant SNPs identified above are present at a significantly higher rate when compared to the number of originally genotyped SNPs.

3. Biological relevance: We investigate the biological relevance of the strongest signals by comparing our data with results from Genome Wide Association studies (GWAs),14 by studying the canonical pathway processes through gene-annotation enrichment analysis15 and by comparing our analysis with previously published genomic scans for selective sweep.3,16

Materials and Methods

We describe here our methods with reference to the three-step process described in the Introduction.

Step 1: Our data and the outlier method

We used a data set of 660,832 SNPs genotyped in 51 human populations distributed worldwide from the HGDP-CEPH panel.1 As underlined by a previous article,17 within the HGDP-CEPH panel there are some closely-related individuals; in order to overcome this possible source of bias we excluded one member of each relative pair and we used 938 HGDP-CEPH individuals. Information about sample sizes and latitudes of the populations can be found on the CEPH homepage http://www.cephb.fr/en/hgdp/table.php.5 Only the 22 autosomes are included in our analysis; we also removed SNPs with more than 10% of missing genotypes and those that failed the Hardy-Weinberg equilibrium test in at least one population. After filtering, 545,209 SNPs remained. Statistical analysis was performed using R.18 We calculated Spearman’s correlation (the correlation coefficient between the ranks of two variables) between absolute latitude and the frequency of one of the two alleles of each SNP and, using the outlier approach, we identified those SNPs which have an absolute Spearman’s correlation coefficient falling in the upper 1% tail of the distribution (Fig. 2).

Figure 2

(Panel A) Histogram of the values of Spearman’s correlation coefficient over all the SNPs and theoretical approximate density of the Spearman’s correlation coefficient under the hypothesis of population null correlation. (Panel B) Histogram of the absolute values of Spearman’s correlation coefficient over all the SNPs. Using the outlier approach, we identify significant SNPs in the 1% upper tail of this distribution.
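As an illustration of Step 1, the following is a minimal sketch in Python (the original analysis was carried out in R, and this is not the authors' script). It assumes allele frequencies are stored per SNP as one value per population; the function names, the data layout and the handling of SNPs with constant frequency are assumptions made for this example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a SNP with constant frequency carries no signal
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation between the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def outlier_snps(abs_latitude, allele_freqs, tail=0.01):
    """Select the SNPs whose |rho| lies in the upper `tail` of the distribution.

    abs_latitude: absolute latitude of each population.
    allele_freqs: dict mapping a SNP id to its allele frequencies,
                  one value per population, in the same order.
    """
    rho = {snp: spearman(abs_latitude, f) for snp, f in allele_freqs.items()}
    n_keep = max(1, round(tail * len(rho)))
    ranked = sorted(rho, key=lambda s: abs(rho[s]), reverse=True)
    return rho, set(ranked[:n_keep])


# Toy usage with made-up numbers (8 populations, 3 SNPs).
latitude = [5.0, 12.0, 23.5, 35.0, 41.0, 52.0, 60.0, 64.0]
freqs = {
    "snp_cline": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
    "snp_flat": [0.50] * 8,
    "snp_noise": [0.40, 0.10, 0.60, 0.20, 0.70, 0.30, 0.50, 0.35],
}
rho, selected = outlier_snps(latitude, freqs, tail=0.34)
```

Working with the absolute value of the coefficient means that the choice of which of the two alleles is counted does not affect the selection, since switching alleles only changes the sign of the correlation.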

Step 2: The proximity-based algorithm

For each chromosome, we now have two sequences of serial positions: one for all genotyped SNPs and one for the significant SNPs, the latter being a subset of the former. Each chromosome is indexed by the sequence of base pairs: as an approximation, we can view a chromosome as a linear segment and the position of a SNP as a point on that segment. Based on the two sequences of points, we can define two cumulative counts depending on a generic point l, known in statistics as counting processes:

S(l) = number of genotyped SNPs located at positions ≤ l,
S.01(l) = number of significant SNPs located at positions ≤ l,

with l varying from 1 (the first bp in the chromosome) to the position of the last bp of the chromosome. As an example, the two counting processes are plotted for chromosome 1 in Figure 3. Cumulative counts are a convenient way to compare the incidences of the different kinds of SNPs over different genomic regions (a simple dot plot would not do, due to the sheer number of SNPs involved). If, over a certain segment of the chromosome, there is a greater-than-usual incidence of significant SNPs, then the relative increment of S.01(l) over that segment will be greater than the relative increment of S(l) over the same segment. In other words, the graph of the S.01(l) counting process will be steeper than that of S(l), up to a proportionality factor. Our proposal is to identify those genome regions which exhibit extreme concentrations of outlying SNPs.

Figure 3

Counting process representation of the location of the candidate regions of chromosome 1.

Notes: The thicker step function represents cumulative counts of all originally genotyped SNPS and refers to the main ordinate scale, on the left. The thinner step function represents cumulative counts of significant SNPs and refers to the ordinate scale on the right. Sixteen regions identified by our method are shown as small vertical segments on the abscissa axis. The zooming box on the upper left part of the graph shows two of them (gray bands) located around position 202500 kb, as guided by the arrows.
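The two counting processes can be sketched in a few lines of Python. This is only an illustration: it assumes that a cumulative count at l includes the SNPs located at positions up to and including l, and the positions below are made up.

```python
from bisect import bisect_right


def counting_process(sorted_positions, l):
    """Cumulative count: number of SNPs located at a position <= l."""
    return bisect_right(sorted_positions, l)


def relative_increment(sorted_positions, start, end):
    """Share of all SNPs in the list that fall in the segment (start, end]."""
    increment = (counting_process(sorted_positions, end)
                 - counting_process(sorted_positions, start))
    return increment / len(sorted_positions)


# Hypothetical positions (in bp) on one chromosome.
all_positions = [100, 250, 300, 420, 500, 610, 700, 880, 900, 990]
sig_positions = [300, 420, 500]  # significant SNPs, a subset of the above

# Over the segment (250, 500] the significant SNPs accumulate faster
# than the genotyped SNPs, relative to their respective totals.
rel_all = relative_increment(all_positions, 250, 500)
rel_sig = relative_increment(sig_positions, 250, 500)
```

Dividing each increment by the total number of SNPs in its own list plays the role of the proportionality factor that puts the two step functions on a common scale, as the two ordinate axes do in Figure 3.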

We could formalize this search as a change-point problem for counting processes: in certain intervals to be estimated, the intensity of the S(l) point process (a function modelling the instantaneous rate of incidence of the process) would be higher than in other regions. Due to the size of the problem and to the approximate nature of our search-and-confirm approach, we prefer a simpler proximity-based algorithm, as follows.

For each pair of significant SNPs located at points l1 and l2, with l1 < l2 on the chromosome, we define

U(l1, l2) = [S.01(l2) − S.01(l1)] / [S(l2) − S(l1)],

ie, the observed incidence rate of significant SNPs per original SNP. This statistic over the sliding window (l1, l2) plays a central role in our proximity-based algorithm. As a technical note, it would probably be a good idea to penalize large windows, for example by dividing the U(l1, l2) statistic above by a penalty term (l2 − l1)^g with g equal to some number between 0 and 1. The final results would not change much (results not shown) and it would be difficult to commit to a specific g; therefore we decided to use the U(l1, l2) statistic without a penalty term.

For each chromosome and for each significant SNP in position l1, we compute U(l1, l2) for each of the other significant SNPs in position l2 within a distance of 1000 kb from the original one. This is done to reduce the problem to a manageable size, under the assumption that relevant proximities are smaller than 1000 kb. We built the new reference distribution of all U(l1, l2) values over all chromosomes, excluding from the analysis all U(l1, l2) values relative to intervals (l1, l2) which included fewer SNPs than a threshold s, which has been chosen to be equal to 3 in this work. This is done to avoid very high automatic values of U(l1, l2) when two significant SNPs happen to be adjacent. We selected the first 1000 SNPs contained in regions corresponding to the highest U(l1, l2) values. A fixed number, rather than a fixed tail area, was chosen to facilitate the discussion of the robustness of our method to varying parameters (see end of section Results).
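The window search of Step 2 can be sketched as follows for a single chromosome. This is a hedged illustration in Python, not the R scripts of Additional file 4. It takes U(l1, l2) to be the ratio between the increments of the two counting processes over the window, applies the threshold s to the number of genotyped SNPs in the window, and collects significant SNPs from the best-scoring windows; all three points, like the function and parameter names, are assumptions of this sketch.

```python
from bisect import bisect_left, bisect_right


def window_scores(all_pos, sig_pos, max_dist=1_000_000, min_snps=3):
    """Score every window (l1, l2) whose endpoints are significant SNPs.

    all_pos, sig_pos: sorted base-pair positions on one chromosome,
    with sig_pos a subset of all_pos. Returns a list of (U, l1, l2).
    """
    scores = []
    for j1, l1 in enumerate(sig_pos):
        s_l1 = bisect_right(all_pos, l1)               # S(l1)
        for j2 in range(j1 + 1, len(sig_pos)):
            l2 = sig_pos[j2]
            if l2 - l1 > max_dist:
                break                                  # later SNPs are farther still
            n_all = bisect_right(all_pos, l2) - s_l1   # S(l2) - S(l1)
            n_sig = j2 - j1                            # S.01(l2) - S.01(l1)
            if n_all < min_snps:
                continue                               # window too small to be scored
            scores.append((n_sig / n_all, l1, l2))
    return scores


def top_snps(all_pos, sig_pos, n_top=1000, **kwargs):
    """Collect significant SNPs from windows in decreasing order of U."""
    ranked = sorted(window_scores(all_pos, sig_pos, **kwargs), reverse=True)
    selected, seen = [], set()
    for _, l1, l2 in ranked:
        lo, hi = bisect_left(sig_pos, l1), bisect_right(sig_pos, l2)
        for pos in sig_pos[lo:hi]:
            if pos not in seen:
                seen.add(pos)
                selected.append(pos)
                if len(selected) == n_top:
                    return selected
    return selected


# Toy chromosome: 40 genotyped SNPs, one every 1000 bp (made-up data).
all_pos = list(range(1000, 41000, 1000))
sig_pos = [5000, 6000, 7000, 8000, 30000]
scores = window_scores(all_pos, sig_pos, max_dist=10_000, min_snps=3)
best = top_snps(all_pos, sig_pos, n_top=10, max_dist=10_000, min_snps=3)
```

In the toy data the four clustered significant SNPs form one window with U = 1, while the isolated SNP at 30000 bp lies beyond the maximum distance from any other significant SNP and is never selected. In a genome-wide run the scores of all chromosomes would be pooled before ranking.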

Step 3: Biological relevance of the strongest signals

To accomplish step 3 as outlined in the Introduction, we proceeded to the biological cross-validation of our findings, which so far had been based mainly on statistical grounds. We focused on the genes tagged by the SNPs we found, since our goal was to detect continuous signals coming from proximal groups of SNPs belonging to the same gene. To link our findings to the results of genome wide data, we first compared our gene list with the June 2012 update of the Catalog of Published GWAs.14 Next, we scanned our gene list using a bioinformatic enrichment tool named Genecodis 2.0 (ref. 15) to obtain a summary of the most enriched biological processes or pathways. Finally, we compared our analysis with previously published genomic scans for selective sweep in order to find possible overlaps in signals.

Results

We calculated Spearman’s correlation between absolute latitude and the frequency of one of the two alleles of each SNP in the HGDP-CEPH panel and, following the outlier approach, we identified those SNPs which have an absolute Spearman’s correlation coefficient falling in the upper 1% tail of the distribution. The histogram of the Spearman’s correlations ρ is plotted in Figure 2A. Its approximate null distribution for 51 pairs of numbers, which is a normal distribution with mean 0 and variance 1/50 by a well-known result,19 has been overlaid on the same graph (Fig. 2A). The discrepancy between the two distributions is due to SNPs which are correlated with latitude for reasons other than chance alone, for example due to environmental selection factors. Following the outlier approach, the upper 1% of the distribution of the absolute value of ρ, corresponding to |ρ| > 0.606, is identified in the histogram of the absolute value of ρ (Fig. 2B). It corresponds to 5452 outlying SNPs in the tails of the ρ distribution.

The candidate regions and the annotations emerging from the application of Step 2 described in the Introduction are contained in Additional file 1 in the online supporting information. As an example, candidate regions which were identified in chromosome 1 are shown in Figure 3. The 1000 top SNPs emerging from the proximity-based algorithm comprised 467 intergenic and 533 genic SNPs, the latter falling in 146 genes. We found 23 coding non synonymous (NS) changes and 6 coding synonymous changes; 372 SNPs were intronic and 107 were in the mRNA 3′UTR.

Finally, we gathered the biological knowledge of the strongest signals by comparing them to the Catalog of Published Genome-Wide Association Studies updated to June 2012. The genes which appear in this Catalog and additionally appear in candidate regions according to our proximity-based algorithm are shown in Additional file 2. A short list of the most interesting signals is shown in Table 1. Several genes shown in that table are associated with metabolism-related phenotypes, such as celiac disease (IL21, interleukin 21, gene ID 59067)20 and adiposity (MSRA, gene ID 4482); others carry variants associated with hair color in Europeans, such as TPCN2 (two pore segment channel 2, gene ID 219931),21 and several are associated with schizophrenia. At the same time, we compared our gene list with genes reported in OMIM. Several of our genes which show a correlation with latitude are also implicated in specific traits. For example, DOT1L (DOT1-like, histone H3 methyltransferase, S. cerevisiae; gene ID 84444) is associated with height,22 and DTNB (dystrobrevin, beta; gene ID 1838) affects adult human height.23 A complete table of the genes also reported in the OMIM Disease database is in Additional file 3.
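To make the comparison between the null distribution and the observed threshold concrete, the arithmetic can be written out as follows. It uses only the figures quoted above (51 populations, variance 1/50, threshold 0.606) and the standard normal quantile z_{0.995} ≈ 2.576; the normal approximation itself is the one invoked in the text.

```latex
% Approximate null distribution of Spearman's rho for n = 51 populations
\rho \;\overset{\text{approx.}}{\sim}\; N\!\left(0,\ \frac{1}{n-1}\right),
\qquad n = 51
\;\Longrightarrow\;
\operatorname{sd}(\rho) = \frac{1}{\sqrt{50}} \approx 0.141 .

% Two-sided 1% threshold under the null versus the observed 99th percentile
|\rho|_{\text{null},\,1\%} = \frac{z_{0.995}}{\sqrt{50}}
\approx \frac{2.576}{7.071} \approx 0.364,
\qquad
\frac{0.606}{\operatorname{sd}(\rho)} = 0.606 \times \sqrt{50} \approx 4.29 .
```

Under the null hypothesis alone, the 1% most extreme values would therefore start at about |ρ| = 0.364, whereas the observed 99th percentile lies roughly 4.3 null standard deviations from zero, a value with a two-sided null tail probability of the order of 10^-5. This is the discrepancy between the two curves in Figure 2A.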

Table 1

List of several genes reported in previously published GWAs and showing continuous correlation signals with our proximity based method.

Reported gene(s)	Trait	Region	NCBI ID	Gene description	Reference
C9orf3	Erectile dysfunction and prostate cancer treatment	9q22.32	84909	Chromosome 9 open reading frame 3	41
ABL1	Response to amphetamine	9q34.12	25	v-abl Abelson murine leukemia viral oncogene homolog 1	42
DTNB	Adult human height	2p23.3	1838	Dystrobrevin, beta	23
DTNB	Coronary heart disease	2p23.3	1838	Dystrobrevin, beta	43
TPCN2	Hair pigmentation in Europeans	11q13.3	219931	Two pore segment channel 2	21
DOT1L	Associated with height	19p13.3	84444	DOT1-like, histone H3 methyltransferase (S. cerevisiae)	22,35
RELN	Susceptibility and clinical phenotype in multiple sclerosis	7q22.1	5649	Reelin	44
RELN	Increases the risk of schizophrenia only in women	7q22.1	5649	Reelin	34
IL21	Celiac disease	4q27	59067	Interleukin 21	38,45
DOCK2	Protein quantitative trait loci	5q35.1	1794	Dedicator of cytokinesis 2	46
FRMD4B	Celiac disease	3p14.1	23150	FERM domain containing 4B	38
MAGI2	Hippocampal atrophy	7q21.11	9863	Membrane associated guanylate kinase, WW and PDZ domain containing 2	47
NCALD	Cognitive performance	8q22.3	83988	Neurocalcin delta	48
NRG3	Response to iloperidone treatment (QT prolongation)	10q23.1	10718	Neuregulin 3	49
RUNX3	Celiac disease	1p36.11	864	Runt-related transcription factor 3	38
SDK1	Quantitative traits	7p22.2	221935	Sidekick homolog 1 (chicken)	50
MSRA	Adiposity	8p23.1	4482	Methionine sulfoxide reductase A	27
MSRA	Hypertension	8p23.1	4482	Methionine sulfoxide reductase A	28
MSRA	Schizophrenia	8p23.1	4482	Methionine sulfoxide reductase A	51
MSRA	Bipolar disorder and schizophrenia	8p23.1	4482	Methionine sulfoxide reductase A	26

We analyzed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using as reference set all genes in the Entrez Gene database and, as a statistical test, the hypergeometric test with a Benjamini-Hochberg correction for multiple testing at a significance level of 0.05. Several KEGG pathways reached significance. The first was the extracellular matrix (ECM) receptor interaction pathway (KEGG number: hsa04512), for the following genes: RELN (reelin, gene ID 5649); ITGB6 (integrin beta 6, gene ID 3694); COL6A3 (collagen, type VI, alpha 3, gene ID 1293). This pathway reaches a raw P-value of the hypergeometric test equal to 0.0011 and a P-value adjusted for multiplicity of around 0.01.

In order to look for overlaps with scans of the human genome for signals of positive natural selection, we compared our results with SNPs with a significant composite of multiple signals (CMS) score, but only one intersection was found between the two gene lists, concerning rs2256670 and rs2711853, both on RELN (reelin, gene ID 5649).16

A variety of choices were made in the actual implementation of the proximity-based algorithm described in Step 2 in the previous section. The two most important parameters set to reasonable values are (a) the maximum distance over which we search, which is set to 1000 kb in Step 2, and (b) the minimum number of consecutive SNPs required, which is set to 3 in Step 2. In order to study the robustness of our method with respect to different values of these parameters, we varied the maximum distance and noticed (not shown) that the results were unchanged for distances down to 100 kb. The algorithm is instead sensitive to the minimum number of consecutive SNPs required: if we increase it from 3 to 5, for example (it would not make sense to consider a minimum much higher than 5), different SNPs and regions turn out to be significant, as shown in Table 2. For example, the proportion of selected SNPs shared when applying a minimum of 5 and when applying a minimum of 3 is 64%. We therefore examined the effect of varying this threshold. The changes are not dramatic (Table 2), but some interesting genes, such as AGT, ADCY9 and WWOX, would emerge from the analysis with a threshold equal to 5.
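The enrichment test used for the KEGG pathways above can be illustrated with a short Python sketch of a one-sided hypergeometric test followed by a Benjamini-Hochberg adjustment. It does not reproduce the Genecodis computation: the pathway size, the reference-set size and the other raw P-values in the usage lines are hypothetical numbers chosen for the example, and only the counts of 3 pathway genes among 146 listed genes come from the text.

```python
from math import comb


def hypergeom_pvalue(k, n_list, n_pathway, n_reference):
    """P(X >= k) for the number X of pathway genes in a list of n_list genes
    drawn without replacement from n_reference genes, n_pathway of which
    belong to the pathway."""
    upper = min(n_list, n_pathway)
    total = comb(n_reference, n_list)
    favourable = sum(
        comb(n_pathway, i) * comb(n_reference - n_pathway, n_list - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical usage: 3 of 146 listed genes in a pathway of 85 genes,
# with a reference set of 20000 genes (85 and 20000 are made up).
p_raw = hypergeom_pvalue(3, 146, 85, 20000)
# Adjust together with made-up raw P-values for other pathways.
p_adj = benjamini_hochberg([p_raw, 0.02, 0.2, 0.5])
```

A pathway is then declared significant when its adjusted P-value is below the chosen level, 0.05 in the analysis above.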

Table 2

Percentage of common SNPs when varying the minimum number of consecutive SNPs required.

% concordance	3 SNPs	4 SNPs	5 SNPs
3 SNPs	100.00%	74.70%	64.00%
4 SNPs		100.00%	80.80%
5 SNPs			100.00%
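The concordance figures in Table 2 can be read as the share of SNPs selected under one threshold that are also selected under another. A minimal Python sketch of that computation follows. It assumes that the percentage is taken relative to the fixed number of selected SNPs (1000 for every threshold), which makes the measure symmetric; the SNP identifiers are made up.

```python
def concordance(selected_a, selected_b):
    """Percentage of the SNPs in selected_a that also appear in selected_b."""
    a, b = set(selected_a), set(selected_b)
    return 100.0 * len(a & b) / len(a)


# Hypothetical selections of 1000 SNPs each, sharing 640 SNPs.
top_s3 = [f"snp{i}" for i in range(0, 1000)]
top_s5 = [f"snp{i}" for i in range(360, 1360)]
shared = concordance(top_s3, top_s5)
```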

Discussion

In this paper we examined the HGDP-CEPH data again by integrating the outlier approach with a novel proximity-based algorithm. Only latitude was used for ecological conditions, rather than using a multiplicity of variables as in Hancock et al,6 for example. We made this choice for the sake of simplicity, since latitude is correlated with different variables like short wave radiation flux, mean winter and summer temperatures, rainfall and pathogen richness. It should therefore provide a good proxy for the selective pressures that shaped variation in our genome. Even though we use a simple correlation measure such as Spearman’s ρ with latitude only, we emphasize that the resulting signal should be a continuous and persistent proportion of background information, represented by all originally genotyped SNPs. We believe this proximity requirement adds an edge to our novel method when compared to existing literature. Our approach is applicable to any measure of association between polymorphic frequencies and environmental variables. It could be applied, for example, to complex statistics such as the minimum rank statistic, based on Bayes Factors and on rank transformations, of Hancock et al.13 With our method we identified different genes, some of them already reported in the literature, dealing with different traits or diseases. GWAs include the scanning of all or most of the genes of different individuals aimed at finding susceptibility loci for traits or diseases. GWAs, so far, have allowed the identification of more than 7688 associated SNPs in humans. We compared our list of genes with GWAs results. Some interesting signals can be pointed out, for instance the correlation between skin pigmentation and latitude. 
It is well known that two coding variants in TPCN2 are associated with hair color in Europeans.21 At the same time MSRA (methionine sulfoxide reductase A gene) is related to melanin formation in the hair follicle melanocyte.24 Remarkably, the MSRA gene is also related to schizophrenia25,26 as well as to adiposity27 and hypertension.28 Several other genes in our list (see Additional file 1) can be associated with vitamin D related genes, known to show a latitude driven cline.7 An example is SMARCA2 (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2), described as a component of a human multiprotein complex that interacts directly with the vitamin D receptor. Schizophrenia genes are correlated with latitude and in our list several schizophrenia genes appear, like GRID1,29,30 MAGI2,31 NRG3,32 NRXN3,33 RARB and RELN.34 Region CYP19A1 in our list is known from GWAs to exhibit association with adult height,35,36 whose distribution is related to latitude. Two more genes in our list, DOT1-like, histone H3 methyltransferase (S. cerevisiae)22,35 and dystrobrevin, beta23 are reported in OMIM to be related to height. Several other genes are related to Celiac Disease (CD), which strongly correlates with latitude. Infectious agents are implicated in the pathogenesis of many autoimmune diseases like CD. This observation may imply that there is a relationship between one or more infectious agents, latitude related environmental exposure to gluten and other genetic susceptibility loci, and the development of this disease. 
For a complete review see Plot and Amital, 2009.37 The RUNX3 gene and IL21, in our list, are implicated in CD.38 In the same paper, another gene appearing in our Table 1, FRMD4B (previously known as GRSP1), is also associated with CD.38 The RUNX3 gene is also required for CD8 T cell development during thymopoiesis.39 One of the most interesting genes highlighted by our work is ANK2 (ankyrin 2, neuronal), which is implicated in cardiac arrhythmias due to abnormal variations in QT interval.40 Finally, the enrichment of genes in the KEGG pathway called extracellular matrix (ECM) receptor interaction (KEGG number: hsa04512) is noteworthy because these molecules are exploited by a number of pathogenic micro-organisms as receptors for cell entry. This can be interpreted as a signal of the different pressures exerted by pathogens on living cells in different environments.

Conclusions

Our study complements the growing body of knowledge surrounding scans for natural selection in humans using a method that uses the proximity criterion in addition to the outlier approach. Our findings support the hypothesis that latitudinal genetic diversity gradients are present in humans and reflect genetic adaptations to different environmental pressures that have shaped the human genome.

Additional file 1: SNPs and regions from the proximity-based algorithm. Additional file 1 in the online supporting information contains all regions selected by the proximity-based method, duly annotated.

Additional file 2: The complete list of genes reported in previously published GWAs and showing continuous correlation signals with our proximity based method.

Additional file 3: The complete list of genes reported in OMIM and showing continuous correlation signals with our proximity based method.

Additional file 4: R scripts. Additional file 4 in the online supporting information contains R scripts to perform the necessary calculations.

47 in total

1. Many sequence variants affecting diversity of adult human height.

Authors: Daniel F Gudbjartsson; G Bragi Walters; Gudmar Thorleifsson; Hreinn Stefansson; Bjarni V Halldorsson; Pasha Zusmanovich; Patrick Sulem; Steinunn Thorlacius; Arnaldur Gylfason; Stacy Steinberg; Anna Helgadottir; Andres Ingason; Valgerdur Steinthorsdottir; Elinborg J Olafsdottir; Gudridur H Olafsdottir; Thorvaldur Jonsson; Knut Borch-Johnsen; Torben Hansen; Gitte Andersen; Torben Jorgensen; Oluf Pedersen; Katja K Aben; J Alfred Witjes; Dorine W Swinkels; Martin den Heijer; Barbara Franke; Andre L M Verbeek; Diane M Becker; Lisa R Yanek; Lewis C Becker; Laufey Tryggvadottir; Thorunn Rafnar; Jeffrey Gulcher; Lambertus A Kiemeney; Augustine Kong; Unnur Thorsteinsdottir; Kari Stefansson
Journal: Nat Genet Date: 2008-04-06 Impact factor: 38.330

2. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives.

Authors: Noah A Rosenberg
Journal: Ann Hum Genet Date: 2006-11 Impact factor: 1.670

3. Worldwide human relationships inferred from genome-wide patterns of variation.

Authors: Jun Z Li; Devin M Absher; Hua Tang; Audrey M Southwick; Amanda M Casto; Sohini Ramachandran; Howard M Cann; Gregory S Barsh; Marcus Feldman; Luigi L Cavalli-Sforza; Richard M Myers
Journal: Science Date: 2008-02-22 Impact factor: 47.728

4. Newly identified genetic risk variants for celiac disease related to the immune response.

Authors: Karen A Hunt; Alexandra Zhernakova; Graham Turner; Graham A R Heap; Lude Franke; Marcel Bruinenberg; Jihane Romanos; Lotte C Dinesen; Anthony W Ryan; Davinder Panesar; Rhian Gwilliam; Fumihiko Takeuchi; William M McLaren; Geoffrey K T Holmes; Peter D Howdle; Julian R F Walters; David S Sanders; Raymond J Playford; Gosia Trynka; Chris J J Mulder; M Luisa Mearin; Wieke H M Verbeek; Valerie Trimble; Fiona M Stevens; Colm O'Morain; Nicholas P Kennedy; Dermot Kelleher; Daniel J Pennington; David P Strachan; Wendy L McArdle; Charles A Mein; Martin C Wapenaar; Panos Deloukas; Ralph McGinnis; Ross McManus; Cisca Wijmenga; David A van Heel
Journal: Nat Genet Date: 2008-03-02 Impact factor: 38.330

5. Identification of ten loci associated with height highlights new biological pathways in human growth.

Authors: Guillaume Lettre; Anne U Jackson; Christian Gieger; Fredrick R Schumacher; Sonja I Berndt; Serena Sanna; Susana Eyheramendy; Benjamin F Voight; Johannah L Butler; Candace Guiducci; Thomas Illig; Rachel Hackett; Iris M Heid; Kevin B Jacobs; Valeriya Lyssenko; Manuela Uda; Michael Boehnke; Stephen J Chanock; Leif C Groop; Frank B Hu; Bo Isomaa; Peter Kraft; Leena Peltonen; Veikko Salomaa; David Schlessinger; David J Hunter; Richard B Hayes; Gonçalo R Abecasis; H-Erich Wichmann; Karen L Mohlke; Joel N Hirschhorn
Journal: Nat Genet Date: 2008-04-06 Impact factor: 38.330

6. GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists.

Authors: Pedro Carmona-Saez; Monica Chagoyen; Francisco Tirado; Jose M Carazo; Alberto Pascual-Montano
Journal: Genome Biol Date: 2007 Impact factor: 13.583

7. Adaptations to climate in candidate genes for common metabolic disorders.

Authors: Angela M Hancock; David B Witonsky; Adam S Gordon; Gidon Eshel; Jonathan K Pritchard; Graham Coop; Anna Di Rienzo
Journal: PLoS Genet Date: 2008-02 Impact factor: 5.917

8. Framingham Heart Study 100K Project: genome-wide associations for blood pressure and arterial stiffness.

Authors: Daniel Levy; Martin G Larson; Emelia J Benjamin; Christopher Newton-Cheh; Thomas J Wang; Shih-Jen Hwang; Ramachandran S Vasan; Gary F Mitchell
Journal: BMC Med Genet Date: 2007-09-19 Impact factor: 2.103

9. Genome-wide association identifies a common variant in the reelin gene that increases the risk of schizophrenia only in women.

Authors: Sagiv Shifman; Martina Johannesson; Michal Bronstein; Sam X Chen; David A Collier; Nicholas J Craddock; Kenneth S Kendler; Tao Li; Michael O'Donovan; F Anthony O'Neill; Michael J Owen; Dermot Walsh; Daniel R Weinberger; Cuie Sun; Jonathan Flint; Ariel Darvasi
Journal: PLoS Genet Date: 2008-02 Impact factor: 5.917

10. A genome-wide association study identifies protein quantitative trait loci (pQTLs).

Authors: David Melzer; John R B Perry; Dena Hernandez; Anna-Maria Corsi; Kara Stevens; Ian Rafferty; Fulvio Lauretani; Anna Murray; J Raphael Gibbs; Giuseppe Paolisso; Sajjad Rafiq; Javier Simon-Sanchez; Hana Lango; Sonja Scholz; Michael N Weedon; Sampath Arepalli; Neil Rice; Nicole Washecka; Alison Hurst; Angela Britton; William Henley; Joyce van de Leemput; Rongling Li; Anne B Newman; Greg Tranah; Tamara Harris; Vijay Panicker; Colin Dayan; Amanda Bennett; Mark I McCarthy; Aimo Ruokonen; Marjo-Riitta Jarvelin; Jack Guralnik; Stefania Bandinelli; Timothy M Frayling; Andrew Singleton; Luigi Ferrucci
Journal: PLoS Genet Date: 2008-05-09 Impact factor: 5.917
