SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations.

Abstract

Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortium studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translation. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. To enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure through which novel methods can be distributed and applied to the analysis of sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project and show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code, and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.

Keywords: genome annotation; R; information retrieval; next-generation sequencing; statistical genetics

Year: 2015 PMID: 26394715 PMCID: PMC4794281 DOI: 10.1002/gepi.21918

Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135

Introduction

Next‐generation sequencing has made it possible to assess the impact of the full spectrum of functional variants on complex human diseases. Many large‐scale genetic studies are being performed to gain insights from genotype‐phenotype association data, to learn the underlying biology, and to enhance genomics‐guided clinical translation. One area of particular interest in statistical genetics is the development of novel methods to enhance the analysis and interpretation of summary association statistics. It is now standard practice for consortium studies to release summary association statistics publicly, and these datasets have become an invaluable resource. Numerous methods have been developed to leverage this information to conduct fine mapping [Kichaev et al., 2014], perform gene‐level association tests [Lee et al., 2013; Liu et al., 2014], and infer causal relationships between biomarkers and diseases [Burgess et al., 2014; Do et al., 2013]. There is an ever‐increasing need for new methods and tools that use these datasets more effectively. The statistical package R is a popular platform for data analysis and methodology development. To facilitate the development of new methods in R for the functional interpretation of genotype‐phenotype associations, we developed the R package SEQMINER with a variety of useful features. First, SEQMINER supports variant annotation and data integration for whole‐genome datasets of summary association statistics. Second, SEQMINER allows efficient random access and queries for sequence datasets, files of summary association statistics, and files of correlation coefficients between summary association statistics; retrieved information is automatically parsed and made ready for downstream analysis. Finally, SEQMINER is self‐contained and optimized for statistical genetics analyses of summary association statistics.
Many commonly used statistical genetics tasks can be performed in a minimum of steps, which considerably expedites analyses and method development. We evaluated the performance of SEQMINER using datasets from the 1000 Genomes Project [Consortium et al., 2012]. We show that SEQMINER is highly efficient for annotating and querying summary association statistics. It can be a valuable tool for the analysis and interpretation of sequence‐based association studies.

Methods

Method Overview

Major uses of summary association statistics include (1) fine‐mapping GWAS loci and identifying causal variants; (2) performing gene‐level (or pathway‐level) association tests; and (3) performing Mendelian randomization to study the causal impact of risk factors on diseases. These applications require knowledge of the functional annotations of DNA sequence variants and of the linkage disequilibrium between sequence variants, and they often require joint analyses of summary association statistics across multiple studies and multiple traits. Given the ever‐increasing scale of sequence datasets and the complexity of bioinformatics databases, R packages are needed that can integrate these datasets and retrieve the information of interest effectively and efficiently. SEQMINER implements features to annotate sequence variants and summary association statistics; to efficiently retrieve summary association statistics and their variance‐covariance information for genetic regions of interest; and to collate summary association statistics and their covariance information from multiple studies. These functionalities are described in detail below.
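Gene‐level tests illustrate why variance‐covariance information has to be retrieved alongside the single‐variant statistics. The Python sketch below is not SEQMINER code; it assumes that the score statistics U for the variants in a gene and their covariance matrix V are already in hand, and it forms a commonly used score‐based burden statistic T = (wᵀU)² / (wᵀVw), which is chi‐square with one degree of freedom under the null. The function name burden_test and the default of equal weights are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def burden_test(u: Sequence[float],
                cov: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Hypothetical helper: burden test from single-variant scores U and
    their covariance matrix V, T = (w'U)^2 / (w'Vw) ~ chi-square(1) under H0."""
    m = len(u)
    w = list(weights) if weights is not None else [1.0] * m  # equal weights by default
    numerator = sum(wj * uj for wj, uj in zip(w, u))
    # Off-diagonal covariances (linkage disequilibrium between variants) enter here.
    variance = sum(w[i] * cov[i][j] * w[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("burden score has non-positive variance")
    stat = numerator ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z^2 > T) = erfc(sqrt(T / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    scores = [2.0, 1.0]
    print(burden_test(scores, [[2.0, 0.5], [0.5, 1.0]])[0])  # 2.25
    print(burden_test(scores, [[2.0, 0.0], [0.0, 1.0]])[0])  # 3.0 if LD is ignored
```

In this toy example, dropping the covariance between the two variants changes the statistic from 2.25 to 3.0, which is why the covariance files must be queried together with the score files.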

Annotate Sequence Datasets and Summary Association Statistics

Annotation information is necessary for the analysis and interpretation of sequence‐based associations. For example, annotation is needed to define the analysis unit in gene‐level association tests (e.g., the nonsynonymous variants within a gene). Information from bioinformatics databases, such as functional prediction scores, has been shown to help prioritize causal variants and improve power to detect genotype‐phenotype associations [Price et al., 2010]. SEQMINER implements comprehensive features for annotating sequence variants and summary association statistics. The package is designed to take generic tab‐delimited files as input, which spares users the ad hoc effort of preparing intermediate input files. Supported file formats include sequence variant genotypes in VCF/BCF format [Danecek et al., 2011] and files of summary association statistics in METAL [Willer et al., 2010] or RAREMETAL format [Feng et al., 2014; Liu et al., 2014]. To integrate information, SEQMINER preprocesses the input dataset and augments it with annotation information. The integrated dataset is stored in the same format as the input dataset, so it remains compatible with external tools and can be indexed by tabix [Li, 2011]; subsequent queries can be performed directly on it. Two types of variant annotation are supported: gene‐based and region‐based. In gene‐based annotation, sequence variants are annotated by the amino acid changes they induce, and a variety of gene/transcript definitions are supported, including UCSC KnownGenes, RefSeq, and GENCODE. In region‐based annotation, genomic regions of interest (e.g., transcription factor binding sites or known GWAS signals) are listed by chromosomal position in BED files, and sequence variants are annotated according to whether they overlap these regions.
Using region‐based annotation, SEQMINER can integrate numerous bioinformatics databases, e.g., PolyPhen2, SIFT, and GERP scores. Finally, SEQMINER can efficiently generate integrated datasets that combine sequence variants with annotation information. This feature is critical for the many statistical genetics analyses in which annotation information must be reused across analyses.
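Region‐based annotation comes down to an overlap test between variant positions and BED intervals. The Python sketch below illustrates the idea under stated assumptions. The helper names parse_bed and annotate_rows are hypothetical, and the variant file is assumed to carry chromosome and position in its first two tab‐delimited columns. The overlapping region names are appended as one extra column, so the output keeps the input's tab‐delimited layout. This is not SEQMINER's implementation; its linear scan over regions favors readability over speed. The sketch also makes explicit the coordinate conversion that is easy to get wrong: BED intervals are 0‐based and half‐open, while VCF positions are 1‐based.

```python
from typing import Dict, List, Tuple

# Per-chromosome list of (start, end, name), 1-based and closed on both ends.
Regions = Dict[str, List[Tuple[int, int, str]]]


def parse_bed(lines: List[str]) -> Regions:
    """Hypothetical helper: parse BED lines (chrom, start, end[, name])."""
    regions: Regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start0, end0 = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start0}-{end0}"
        # BED is 0-based, half-open [start0, end0); VCF POS is 1-based, so the
        # same bases are covered by the closed range [start0 + 1, end0].
        regions.setdefault(chrom, []).append((start0 + 1, end0, name))
    return regions


def annotate_rows(rows: List[str], regions: Regions,
                  chrom_col: int = 0, pos_col: int = 1) -> List[str]:
    """Hypothetical helper: append a column naming overlapping regions ('.' if none)."""
    annotated = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        pos = int(fields[pos_col])
        # Linear scan for clarity; an indexed structure avoids this O(N) cost.
        hits = [name for start, end, name in regions.get(fields[chrom_col], [])
                if start <= pos <= end]
        fields.append(",".join(hits) if hits else ".")
        annotated.append("\t".join(fields))
    return annotated


if __name__ == "__main__":
    bed = ["1\t99\t200\ttfbs1", "1\t149\t300\tgwas_hit"]
    rows = ["1\t100\tA\tG\t0.01", "1\t250\tC\tT\t0.50", "2\t100\tG\tA\t0.30"]
    for line in annotate_rows(rows, parse_bed(bed)):
        print(line)
```

With these toy inputs, the variant at position 100 is tagged tfbs1, the one at 250 is tagged gwas_hit, and the chromosome 2 variant gets ".". A BED start of 99 therefore covers position 100 but not 99.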

Efficient Retrieval of Summary Association Statistics and Sequence Variants

SEQMINER allows efficient queries of tabix‐indexed sequence datasets (either preprocessed or generic). Built‐in functions implement a variety of frequently used queries, including extracting summary association statistics by genomic position, gene name, or annotation type. For example, a single command extracts genotype (GT) information for synonymous variants in a given gene, together with the allele frequency (AF), allele count (AC), and positions (CHROM, POS) of the variants:

    readVCFToListByGene(fileName, geneFile, geneName = "CFH", annoType = "Synonymous", vcfColumn = c("CHROM", "POS"), vcfInfo = c("AF", "AC"), vcfIndv = c("GT"))

In this function, vcfColumn specifies the descriptive VCF columns to extract, such as CHROM, POS, and ID. The option vcfInfo selects keys from the INFO column of the VCF file; these may include (but are not limited to) allele frequencies (AF) and annotation information (ANNO). Finally, vcfIndv specifies the per‐sample fields defined in the FORMAT column, e.g., genotypes (GT), genotype likelihoods (GL), and allelic depth (AD). Retrieving summary association statistics and covariance information between variants from files in RAREMETAL format likewise requires only one command:

    rvmeta.readDataByRange(scoreTestFiles, covFiles, tabixRanges)

Extracted information is automatically parsed and stored in standard R objects (e.g., list, matrix) for downstream statistical analysis. Because the results live in the R programming environment, queries can be refined flexibly.
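The vcfColumn, vcfInfo, and vcfIndv arguments correspond to the three kinds of fields in a VCF data line: fixed columns, INFO keys, and per‐sample FORMAT keys. The Python sketch below decomposes a single line along those lines. It is not SEQMINER's C++ parser. The function name extract_fields is hypothetical, gene lookup and tabix random access are omitted, and values are left as strings rather than converted to numeric types.

```python
from typing import Dict, List, Optional

# Column order fixed by the VCF specification; sample columns start at index 9.
VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def extract_fields(line: str, vcf_column: List[str], vcf_info: List[str],
                   vcf_indv: List[str]) -> Dict[str, object]:
    """Hypothetical helper: pull selected fixed columns, INFO keys, and
    per-sample FORMAT keys from one VCF data line (values stay strings)."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, object] = {}

    # Descriptive columns, e.g. CHROM, POS, ID.
    for col in vcf_column:
        record[col] = fields[VCF_FIXED.index(col)]

    # INFO holds ';'-separated KEY=VALUE entries; flags such as DB have no value.
    info: Dict[str, str] = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    for key in vcf_info:
        record[key] = info.get(key)

    # FORMAT lists ':'-separated keys; every sample column follows that order.
    fmt_keys = fields[8].split(":") if len(fields) > 8 else []
    for key in vcf_indv:
        idx = fmt_keys.index(key) if key in fmt_keys else None
        values: List[Optional[str]] = []
        for sample in fields[9:]:
            parts = sample.split(":")
            # Trailing FORMAT fields may be dropped in a sample column.
            values.append(parts[idx] if idx is not None and idx < len(parts) else None)
        record[key] = values
    return record


if __name__ == "__main__":
    line = "1\t1000\trs1\tA\tG\t50\tPASS\tAF=0.1;AC=2;DB\tGT:GL\t0|1:-0.1,-1,-5\t0|0:-0.01,-2,-8"
    print(extract_fields(line, ["CHROM", "POS"], ["AF", "AC"], ["GT"]))
```

For the sample line, this returns CHROM "1", POS "1000", AF "0.1", AC "2", and GT ["0|1", "0|0"], one genotype per sample column.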

Collate Summary Association Statistics from Multiple Studies

There is considerable interest in the field in interrogating genetic variants with pleiotropic effects [Giambartolomei et al., 2014; Hu et al., 2013; Lee et al., 2013; Tang and Lin, 2013, 2014], examining whether genetic effects vary across cohorts or ethnic groups [Wen and Stephens, 2014], performing meta‐analyses that combine results from multiple studies [Liu et al., 2014], and conducting Mendelian randomization analyses that jointly analyze genetic associations with risk factors and disease outcomes [Do et al., 2013; Voight et al., 2012]. All of these research questions require joint analysis of multiple sets of summary association statistics and their covariance information. Studies often do not share the same set of genotyped variants; this is particularly true of sequencing studies, in which different variant sites are called in each study. Randomly accessing a large number of files of summary association statistics and covariance matrices, efficiently retrieving the information specific to a genetic region of interest, and collating variant sites between studies is a nontrivial task that can require a great deal of ad hoc scripting. To address this need, SEQMINER is designed to read and process multiple files of summary association statistics. Loaded data are automatically parsed, stored in standard R objects, and made ready for downstream analyses. This functionality has been used extensively to implement methods for meta‐analyses of gene‐level association tests. Since its release, SEQMINER has been used in several large‐scale meta‐analyses of complex traits, including lipid levels, anthropometric traits, and smoking and alcohol addiction.
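To show what collating variant sites between studies involves, the Python sketch below merges per‐study single‐variant statistics keyed by chromosome and position. It makes several assumptions. Each study is assumed to supply a score statistic U and its variance V per variant. These are combined by summing U and V over the studies that report the site, in the spirit of score‐based meta‐analysis [Liu et al., 2014]. A site absent from a study is treated as contributing nothing, which is reasonable for monomorphic sites but should be checked for sites a study did not assay. The data layout, the allele‐flip rule, and the function name collate are illustrative. Covariance matrices between variants are left out, although they would be aligned the same way.

```python
import math
from typing import Dict, List, Tuple

Site = Tuple[str, int]                              # (chromosome, position)
Stat = Tuple[str, str, float, float]                # (ref, alt, score U, variance V)
Merged = Tuple[str, str, float, float, int, float]  # (ref, alt, U, V, n_studies, z)


def collate(studies: List[Dict[Site, Stat]]) -> Dict[Site, Merged]:
    """Hypothetical helper: union sites across studies and sum aligned U and V."""
    acc: Dict[Site, list] = {}
    for study in studies:
        for site, (ref, alt, u, v) in study.items():
            # The first study reporting a site fixes its allele orientation.
            entry = acc.setdefault(site, [ref, alt, 0.0, 0.0, 0])
            if (ref, alt) == (entry[0], entry[1]):
                sign = 1.0
            elif (ref, alt) == (entry[1], entry[0]):
                sign = -1.0  # coded allele swapped: U flips sign, V does not
            else:
                raise ValueError(f"incompatible alleles at {site}")
            entry[2] += sign * u
            entry[3] += v
            entry[4] += 1
    # Sites missing from a study simply receive no contribution from it.
    return {
        site: (ref, alt, u, v, n, u / math.sqrt(v) if v > 0 else math.nan)
        for site, (ref, alt, u, v, n) in acc.items()
    }


if __name__ == "__main__":
    s1 = {("1", 100): ("A", "G", 2.0, 4.0), ("1", 200): ("C", "T", 2.0, 1.0)}
    s2 = {("1", 100): ("G", "A", -1.0, 5.0), ("1", 300): ("G", "C", 3.0, 9.0)}
    print(collate([s1, s2])[("1", 100)])  # ('A', 'G', 3.0, 9.0, 2, 1.0)
```

In the toy input, the second study codes position 100 with its alleles swapped, so its score is negated before summing. Positions 200 and 300 each come from a single study.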

Algorithmic Optimization

We implemented a series of algorithmic optimizations to improve the performance of SEQMINER. First, SEQMINER supports reading and writing compressed, tabix‐indexed files directly. To support efficient random retrieval of information from large data files, we incorporated and extended the tabix library in SEQMINER. Tabix indexes blocks of data files compressed in bgzip (BGZF) format. Using its binning index and linear index, the tabix library quickly locates the compressed blocks on disk that overlap the query interval, which allows sequence information to be retrieved with O(log(N)) time complexity. The original tabix library returns all retrieved information as strings, which cannot be analyzed directly in R. In SEQMINER, we extended tabix to randomly access files in METAL/RAREMETAL format. Retrieved information is automatically parsed, converted to the appropriate data types (e.g., strings or floating‐point numbers), and made available for analysis as standard R objects such as lists or data frames. Second, SEQMINER implements novel data structures for storing bioinformatics databases to speed up annotation and queries. To support efficient region‐based variant annotation, we used a segment tree data structure, which enables queries in O(log(N)) time and construction in O(N log(N)) time and space, where N is the number of regions. Specifically, we constructed a red‐black tree and stored every region (start position, end position, and region name) in a tree leaf. Regions are ordered internally by start position and then by end position. Because the tree is balanced, querying a genomic variant requires only O(log(N)) comparisons. This data structure performs well in practice and enables online annotation with range‐based databases such as transcription factor binding sites.
Finally, SEQMINER provides a user‐friendly R interface that is easy for new users to learn, while delegating all computationally intensive and I/O‐intensive queries to C++. The algorithms described above are implemented in C++ and are compiled with optimization during package installation, combining the flexibility of R with the performance of C++.
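SEQMINER's own region store is described as a balanced (red‐black) tree of regions ordered by start and then end position. The Python sketch below reproduces the same idea with a swapped‐in layout. The regions are sorted by (start, end) in an array, and an array‐backed segment tree stores, at each node, the largest end coordinate beneath it. A binary search discards every region that starts after the query position. Subtrees whose regions all end before that position are pruned. The query cost is roughly O((k + 1) log N) for k hits, and the structure uses O(N) space. The class name RegionIndex and the closed‐interval convention are assumptions for illustration, not SEQMINER's implementation.

```python
from bisect import bisect_right
from typing import List, Tuple

Region = Tuple[int, int, str]  # (start, end, name), closed interval [start, end]


class RegionIndex:
    """Hypothetical sketch: regions sorted by (start, end) plus an array-backed
    segment tree in which each node stores the largest end among its leaves."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = sorted(regions, key=lambda r: (r[0], r[1]))
        self.starts = [r[0] for r in self.regions]
        size = 1
        while size < len(self.regions):
            size *= 2
        self.size = size
        # Padding leaves hold -1 so they never satisfy end >= pos.
        self.max_end = [-1] * (2 * size)
        for i, (_, end, _) in enumerate(self.regions):
            self.max_end[size + i] = end
        for node in range(size - 1, 0, -1):
            self.max_end[node] = max(self.max_end[2 * node], self.max_end[2 * node + 1])

    def query(self, pos: int) -> List[str]:
        """Names of regions with start <= pos <= end, in sorted order."""
        # Binary search: only regions at indices < limit start at or before pos.
        limit = bisect_right(self.starts, pos)
        hits: List[str] = []
        stack = [(1, 0, self.size)]  # (node, lo, hi) covers indices [lo, hi)
        while stack:
            node, lo, hi = stack.pop()
            # Prune subtrees beyond the prefix or whose regions all end before pos.
            if lo >= limit or self.max_end[node] < pos:
                continue
            if hi - lo == 1:
                hits.append(self.regions[lo][2])
                continue
            mid = (lo + hi) // 2
            stack.append((2 * node + 1, mid, hi))  # right child, visited second
            stack.append((2 * node, lo, mid))      # left child, visited first
        return hits


if __name__ == "__main__":
    idx = RegionIndex([(100, 200, "tfbs1"), (150, 300, "tfbs2"), (400, 500, "gwas1")])
    print(idx.query(160))  # ['tfbs1', 'tfbs2']
    print(idx.query(350))  # []
```

Each variant position is then annotated with one query call. Building the index is a sort plus a linear pass, so a database of regions can be loaded once and reused across many annotation runs.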

Results

Performance for Annotating Summary Association Statistics

We evaluated the performance of SEQMINER for annotating and querying summary association statistics using datasets from the 1000 Genomes Project phase 1. The call set consisted of 1,092 individuals genotyped at ∼39 million variants. We simulated phenotypes under the null hypothesis of no genotype‐phenotype association. Summary association statistics were generated using RVTESTS [https://github.com/zhanxw/rvtests], whose output is automatically bgzip‐compressed and tabix‐indexed. After compression, the file of single‐variant association statistics is 6.2 GB, and the file of the covariance matrix between single‐variant score statistics is 266 GB. To evaluate annotation performance, we annotated the whole dataset as a plain‐text file using the function seqminer::annotatePlain. Annotating the whole‐genome dataset took ∼2 CPU hours and ∼63 MB of RAM, showing that the software scales to very large datasets.

Performance of Retrieving Summary Association Statistics

SEQMINER supports random access to, and retrieval of, summary association statistics stored in tab‐delimited files organized by chromosomal position; in particular, it supports both METAL and RAREMETAL formats. First, we used SEQMINER to query the tabix‐indexed files of summary association statistics described in the previous section. For the file containing single‐variant association statistics for 39 million variants, SEQMINER took 0.3 sec to retrieve summary association statistics using the function seqminer::rvmeta.readDataByRange. Not surprisingly, reading the entire file into R (using either data.table::fread or read.table) and performing an in‐memory query took considerably longer and required a much larger memory footprint (Table 1). Although SEQMINER's main advantage lies in random access, which usually makes it unnecessary to read an entire file into memory, we also benchmarked SEQMINER's speed at reading the entire file. Reading the entire file into memory in R with SEQMINER's tabix.read.table function took 7.5 min, ∼4% faster than R's read.table command.
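The sub‐second query times above rest on the tabix index. To show how a region query avoids scanning a multi‐gigabyte file, the Python sketch below transcribes the reg2bin and reg2bins routines from the SAM/BAM format specification, whose hierarchical binning scheme the tabix (.tbi) index shares. It only computes bin numbers for 0‐based, half‐open intervals. Reading the index, the 16 kb linear index that tabix uses to skip early blocks, and BGZF virtual offsets are all omitted, and none of this is SEQMINER's own code.

```python
from typing import List

# (first bin id at the level, bit shift); bins at a level are 2**shift bases wide,
# from 2**26 (~64 Mbp) down to 2**14 (16 kbp). Bin 0 spans the whole sequence.
LEVELS = [(1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)]


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based half-open interval [beg, end)."""
    end -= 1
    for offset, shift in reversed(LEVELS):  # try the finest (16 kbp) level first
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # crosses a 64 Mbp boundary: falls back to the root bin


def reg2bins(beg: int, end: int) -> List[int]:
    """All bins whose records might overlap the query interval [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in LEVELS:  # coarse to fine
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


if __name__ == "__main__":
    print(reg2bins(1_000_000, 1_010_000))  # [0, 1, 9, 73, 592, 4742]
    print(reg2bin(999_000, 1_001_000))     # 592
```

In this example, a 10 kb query consults six candidate bins, one per level. A record that straddles a 16 kb boundary is filed in a coarser bin that the query still checks, so the reader only decompresses the chunks listed under those bins.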

Table 1

Comparison of querying files of summary association statistics

Function	Task	Time	Memory
seqminer::rvmeta.readDataByRange	Retrieve summary association statistics from 100 randomly chosen regions	0.32 sec	550 KB
seqminer::rvmeta.readDataByRange	Retrieve summary association statistics from 100 randomly chosen regions. Also retrieve covariance matrix between these summary association statistics	1.12 sec	1.3 MB
data.table::fread	Read entire file of summary association statistics into memory	7.8 min	103 MB
data.table::fread	Read entire file of summary association statistics and their covariance matrix into memory	34.6 hr	263 GB

We compared the performance of SEQMINER for querying files of summary association statistics and files of correlation coefficients between summary association statistics.

Second, the package is also very efficient at retrieving correlation information between pairs of sequence variants. Retrieving this information for 100 randomly selected genes required only 1.13 sec, whereas reading all the covariance files into memory with data.table::fread and parsing them took about 35 hr, an impractical amount of time that reflects the size of the files (266 GB).

Performance for Annotating and Querying VCF Files

As a companion feature, SEQMINER also supports annotating and querying VCF files of sequence variant genotype calls. To our knowledge, one other R package, VariantAnnotation, supports annotating and retrieving sequence variants from VCF/BCF files. VariantAnnotation relies on a variety of Bioconductor packages to query and annotate VCF files; however, it was not designed to annotate or query summary association statistics. SEQMINER performs competitively on the features the two packages share. We compared the efficiency of VariantAnnotation and SEQMINER for annotating large‐scale datasets and extracting genetic regions of interest (Tables 2 and 3). First, using 1,092 samples from the 1000 Genomes Project, we compared time and memory efficiency for annotating all variants on a chromosome. In all scenarios examined, SEQMINER was ∼14–17× faster and required >100‐fold less memory than VariantAnnotation (Table 2). We then benchmarked the extraction of nonsynonymous variants from 100 randomly selected genes. VariantAnnotation used 23.0 min and 1,095 MB of memory, while SEQMINER used 1.3 min and 37 MB. In all scenarios considered, SEQMINER was more efficient in both time and memory.

Table 2

Comparison of time and memory complexity for annotating sequence variants

Tool	Chunk size	Time (seconds)	Memory (kilobytes)
SEQMINER	Entire chromosome	8,371	63,072
VariantAnnotation	5,000	144,403	8,364,784
VariantAnnotation	10,000	125,748	16,236,324
VariantAnnotation	20,000	116,078	30,078,896

We benchmarked the performance of SEQMINER and VariantAnnotation for annotating sequence variants on chromosome 1 from the 1000 Genomes Project phase 1 datasets. SEQMINER annotation used the function annotateVCF. Because VariantAnnotation could not process chromosome 1 in one batch owing to memory constraints, we divided the chromosome 1 dataset into chunks and annotated each chunk separately. Memory is peak memory usage; time is the cumulative time for annotating the entire chromosome.

Table 3

Comparison of time and memory complexity for querying selected genes/ranges

Tool	Task	Time (seconds)	Memory (kilobytes)
SEQMINER	Extract 100 randomly selected ranges	76	37,948
VariantAnnotation	Extract 100 randomly selected ranges	1,313	1,122,204
SEQMINER	Extract 100 randomly selected genes	462	59,736
VariantAnnotation	Extract 100 randomly selected genes	1,718	1,461,404

We compared the performance of SEQMINER and VariantAnnotation in extracting nonsynonymous variants from 100 randomly selected genes or ranges, using whole‐genome datasets from the 1000 Genomes Project phase 1. To extract randomly selected genes, we used the readVCFToListByGene function in SEQMINER. For VariantAnnotation, we first determined the genomic range of each gene and extracted the variants within these ranges; we then predicted the function of the retrieved variants and selected the nonsynonymous subset.

Conclusions

In summary, we implemented an efficient software package, SEQMINER, to facilitate the analysis and interpretation of genotype‐phenotype association summary statistics. We showed that the tools scale well to large datasets with millions of variants. SEQMINER provides a useful platform on which new methods can be developed and distributed. Since its release, the package has contributed to several large‐scale meta‐analyses of complex traits, including lipid levels, height, and body mass index, as well as smoking and alcohol addiction. We envision that it will continue to be valuable for interpreting genotype‐phenotype association results in the sequencing era.

References (17 in total; the first 10 are listed)

1. Pooled association tests for rare variants in exon-resequencing studies.

Authors: Alkes L Price; Gregory V Kryukov; Paul I W de Bakker; Shaun M Purcell; Jeff Staples; Lee-Jen Wei; Shamil R Sunyaev
Journal: Am J Hum Genet Date: 2010-05-13

2. General framework for meta-analysis of rare variants in sequencing association studies.

Authors: Seunggeun Lee; Tanya M Teslovich; Michael Boehnke; Xihong Lin
Journal: Am J Hum Genet Date: 2013-06-13

3. METAL: fast and efficient meta-analysis of genomewide association scans.

Authors: Cristen J Willer; Yun Li; Gonçalo R Abecasis
Journal: Bioinformatics Date: 2010-07-08

4. The variant call format and VCFtools.

Authors: Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal: Bioinformatics Date: 2011-06-07

5. RAREMETAL: fast and powerful meta-analysis for rare variants.

Authors: Shuang Feng; Dajiang Liu; Xiaowei Zhan; Mary Kate Wing; Gonçalo R Abecasis
Journal: Bioinformatics Date: 2014-06-03

6. Integrating functional data to prioritize causal variants in statistical fine-mapping studies.

Authors: Gleb Kichaev; Wen-Yun Yang; Sara Lindstrom; Farhad Hormozdiari; Eleazar Eskin; Alkes L Price; Peter Kraft; Bogdan Pasaniuc
Journal: PLoS Genet Date: 2014-10-30

7. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

Authors: Yi-Juan Hu; Sonja I Berndt; Stefan Gustafsson; Andrea Ganna; Joel Hirschhorn; Kari E North; Erik Ingelsson; Dan-Yu Lin
Journal: Am J Hum Genet Date: 2013-07-25

8. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01

9. Common variants associated with plasma triglycerides and risk for coronary artery disease.

Authors: Ron Do; Cristen J Willer; Ellen M Schmidt; et al.
Journal: Nat Genet Date: 2013-10-06

10. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

Authors: Claudia Giambartolomei; Damjan Vukcevic; Eric E Schadt; Lude Franke; Aroon D Hingorani; Chris Wallace; Vincent Plagnol
Journal: PLoS Genet Date: 2014-05-15

19 in total

1. The Tumor Suppressor ARID1A Controls Global Transcription via Pausing of RNA Polymerase II.

Authors: Marco Trizzino; Elisa Barbieri; Ana Petracovici; Shuai Wu; Sarah A Welsh; Tori A Owens; Silvia Licciulli; Rugang Zhang; Alessandro Gardini
Journal: Cell Rep Date: 2018-06-26 Impact factor: 9.423

2. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data.

Authors: Xiaowei Zhan; Youna Hu; Bingshan Li; Goncalo R Abecasis; Dajiang J Liu
Journal: Bioinformatics Date: 2016-02-15 Impact factor: 6.937

3. Exome-wide association study of plasma lipids in >300,000 individuals.

Authors: Dajiang J Liu; Gina M Peloso; Haojie Yu; Adam S Butterworth; Xiao Wang; Anubha Mahajan; Danish Saleheen; Connor Emdin; Dewan Alam; Alexessander Couto Alves; Philippe Amouyel; Emanuele Di Angelantonio; Dominique Arveiler; Themistocles L Assimes; Paul L Auer; Usman Baber; Christie M Ballantyne; Lia E Bang; Marianne Benn; Joshua C Bis; Michael Boehnke; Eric Boerwinkle; Jette Bork-Jensen; Erwin P Bottinger; Ivan Brandslund; Morris Brown; Fabio Busonero; Mark J Caulfield; John C Chambers; Daniel I Chasman; Y Eugene Chen; Yii-Der Ida Chen; Rajiv Chowdhury; Cramer Christensen; Audrey Y Chu; John M Connell; Francesco Cucca; L Adrienne Cupples; Scott M Damrauer; Gail Davies; Ian J Deary; George Dedoussis; Joshua C Denny; Anna Dominiczak; Marie-Pierre Dubé; Tapani Ebeling; Gudny Eiriksdottir; Tõnu Esko; Aliki-Eleni Farmaki; Mary F Feitosa; Marco Ferrario; Jean Ferrieres; Ian Ford; Myriam Fornage; Paul W Franks; Timothy M Frayling; Ruth Frikke-Schmidt; Lars G Fritsche; Philippe Frossard; Valentin Fuster; Santhi K Ganesh; Wei Gao; Melissa E Garcia; Christian Gieger; Franco Giulianini; Mark O Goodarzi; Harald Grallert; Niels Grarup; Leif Groop; Megan L Grove; Vilmundur Gudnason; Torben Hansen; Tamara B Harris; Caroline Hayward; Joel N Hirschhorn; Oddgeir L Holmen; Jennifer Huffman; Yong Huo; Kristian Hveem; Sehrish Jabeen; Anne U Jackson; Johanna Jakobsdottir; Marjo-Riitta Jarvelin; Gorm B Jensen; Marit E Jørgensen; J Wouter Jukema; Johanne M Justesen; Pia R Kamstrup; Stavroula Kanoni; Fredrik Karpe; Frank Kee; Amit V Khera; Derek Klarin; Heikki A Koistinen; Jaspal S Kooner; Charles Kooperberg; Kari Kuulasmaa; Johanna Kuusisto; Markku Laakso; Timo Lakka; Claudia Langenberg; Anne Langsted; Lenore J Launer; Torsten Lauritzen; David C M Liewald; Li An Lin; Allan Linneberg; Ruth J F Loos; Yingchang Lu; Xiangfeng Lu; Reedik Mägi; Anders Malarstig; Ani Manichaikul; Alisa K Manning; Pekka Mäntyselkä; Eirini Marouli; Nicholas G D Masca; Andrea Maschio; James B Meigs; Olle Melander; Andres Metspalu; Andrew P Morris; Alanna C Morrison; Antonella Mulas; Martina Müller-Nurasyid; Patricia B Munroe; Matt J Neville; Jonas B Nielsen; Sune F Nielsen; Børge G Nordestgaard; Jose M Ordovas; Roxana Mehran; Christoper J O'Donnell; Marju Orho-Melander; Cliona M Molony; Pieter Muntendam; Sandosh Padmanabhan; Colin N A Palmer; Dorota Pasko; Aniruddh P Patel; Oluf Pedersen; Markus Perola; Annette Peters; Charlotta Pisinger; Giorgio Pistis; Ozren Polasek; Neil Poulter; Bruce M Psaty; Daniel J Rader; Asif Rasheed; Rainer Rauramaa; Dermot F Reilly; Alex P Reiner; Frida Renström; Stephen S Rich; Paul M Ridker; John D Rioux; Neil R Robertson; Dan M Roden; Jerome I Rotter; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Serena Sanna; Naveed Sattar; Ellen M Schmidt; Robert A Scott; Peter Sever; Raquel S Sevilla; Christian M Shaffer; Xueling Sim; Suthesh Sivapalaratnam; Kerrin S Small; Albert V Smith; Blair H Smith; Sangeetha Somayajula; Lorraine Southam; Timothy D Spector; Elizabeth K Speliotes; John M Starr; Kathleen E Stirrups; Nathan Stitziel; Konstantin Strauch; Heather M Stringham; Praveen Surendran; Hayato Tada; Alan R Tall; Hua Tang; Jean-Claude Tardif; Kent D Taylor; Stella Trompet; Philip S Tsao; Jaakko Tuomilehto; Anne Tybjaerg-Hansen; Natalie R van Zuydam; Anette Varbo; Tibor V Varga; Jarmo Virtamo; Melanie Waldenberger; Nan Wang; Nick J Wareham; Helen R Warren; Peter E Weeke; Joshua Weinstock; Jennifer Wessel; James G Wilson; Peter W F Wilson; Ming Xu; Hanieh Yaghootkar; Robin Young; Eleftheria Zeggini; He Zhang; Neil S Zheng; Weihua Zhang; Yan Zhang; Wei Zhou; Yanhua Zhou; Magdalena Zoledziewska; Joanna M M Howson; John Danesh; Mark I McCarthy; Chad A Cowan; Goncalo Abecasis; Panos Deloukas; Kiran Musunuru; Cristen J Willer; Sekar Kathiresan
Journal: Nat Genet Date: 2017-10-30 Impact factor: 38.330

4. Hidden genomic MHC disparity between HLA-matched sibling pairs in hematopoietic stem cell transplantation.

Authors: Satu Koskela; Jarmo Ritari; Kati Hyvärinen; Tony Kwan; Riitta Niittyvuopio; Maija Itälä-Remes; Tomi Pastinen; Jukka Partanen
Journal: Sci Rep Date: 2018-03-29 Impact factor: 4.379

5. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association.

Authors: Kelsey E Grinde; Jaron Arbet; Alden Green; Michael O'Connell; Alessandra Valcarcel; Jason Westra; Nathan Tintle
Journal: Front Genet Date: 2017-09-14 Impact factor: 4.599

Review 6. Making Sense of the Epigenome Using Data Integration Approaches.

Authors: Emma Cazaly; Joseph Saad; Wenyu Wang; Caroline Heckman; Miina Ollikainen; Jing Tang
Journal: Front Pharmacol Date: 2019-02-19 Impact factor: 5.810

7. Exome Chip Meta-analysis Fine Maps Causal Variants and Elucidates the Genetic Architecture of Rare Coding Variants in Smoking and Alcohol Use.

Authors: David M Brazel; Yu Jiang; Jordan M Hughey; Valérie Turcot; Xiaowei Zhan; Jian Gong; Chiara Batini; J Dylan Weissenkampen; MengZhen Liu; Daniel R Barnes; Sarah Bertelsen; Yi-Ling Chou; A Mesut Erzurumluoglu; Jessica D Faul; Jeff Haessler; Anke R Hammerschlag; Chris Hsu; Manav Kapoor; Dongbing Lai; Nhung Le; Christiaan A de Leeuw; Anu Loukola; Massimo Mangino; Carl A Melbourne; Giorgio Pistis; Beenish Qaiser; Rebecca Rohde; Yaming Shao; Heather Stringham; Leah Wetherill; Wei Zhao; Arpana Agrawal; Laura Bierut; Chu Chen; Charles B Eaton; Alison Goate; Christopher Haiman; Andrew Heath; William G Iacono; Nicholas G Martin; Tinca J Polderman; Alex Reiner; John Rice; David Schlessinger; H Steven Scholte; Jennifer A Smith; Jean-Claude Tardif; Hilary A Tindle; Andries R van der Leij; Michael Boehnke; Jenny Chang-Claude; Francesco Cucca; Sean P David; Tatiana Foroud; Joanna M M Howson; Sharon L R Kardia; Charles Kooperberg; Markku Laakso; Guillaume Lettre; Pamela Madden; Matt McGue; Kari North; Danielle Posthuma; Timothy Spector; Daniel Stram; Martin D Tobin; David R Weir; Jaakko Kaprio; Gonçalo R Abecasis; Dajiang J Liu; Scott Vrieze
Journal: Biol Psychiatry Date: 2018-12-06 Impact factor: 13.382

8. An adaptive test for meta-analysis of rare variant association studies.

Authors: Tianzhong Yang; Junghi Kim; Chong Wu; Yiding Ma; Peng Wei; Wei Pan
Journal: Genet Epidemiol Date: 2019-12-12 Impact factor: 2.135

9. The Mega2R package: R tools for accessing and processing genetic data in common formats.

Authors: Robert V Baron; Justin R Stickel; Daniel E Weeks
Journal: F1000Res Date: 2018-08-29

10. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use.

Authors: Mengzhen Liu; Yu Jiang; Robbee Wedow; Yue Li; David M Brazel; Fang Chen; Gargi Datta; Jose Davila-Velderrain; Daniel McGuire; Chao Tian; Xiaowei Zhan; Hélène Choquet; Anna R Docherty; Jessica D Faul; Johanna R Foerster; Lars G Fritsche; Maiken Elvestad Gabrielsen; Scott D Gordon; Jeffrey Haessler; Jouke-Jan Hottenga; Hongyan Huang; Seon-Kyeong Jang; Philip R Jansen; Yueh Ling; Reedik Mägi; Nana Matoba; George McMahon; Antonella Mulas; Valeria Orrù; Teemu Palviainen; Anita Pandit; Gunnar W Reginsson; Anne Heidi Skogholt; Jennifer A Smith; Amy E Taylor; Constance Turman; Gonneke Willemsen; Hannah Young; Kendra A Young; Gregory J M Zajac; Wei Zhao; Wei Zhou; Gyda Bjornsdottir; Jason D Boardman; Michael Boehnke; Dorret I Boomsma; Chu Chen; Francesco Cucca; Gareth E Davies; Charles B Eaton; Marissa A Ehringer; Tõnu Esko; Edoardo Fiorillo; Nathan A Gillespie; Daniel F Gudbjartsson; Toomas Haller; Kathleen Mullan Harris; Andrew C Heath; John K Hewitt; Ian B Hickie; John E Hokanson; Christian J Hopfer; David J Hunter; William G Iacono; Eric O Johnson; Yoichiro Kamatani; Sharon L R Kardia; Matthew C Keller; Manolis Kellis; Charles Kooperberg; Peter Kraft; Kenneth S Krauter; Markku Laakso; Penelope A Lind; Anu Loukola; Sharon M Lutz; Pamela A F Madden; Nicholas G Martin; Matt McGue; Matthew B McQueen; Sarah E Medland; Andres Metspalu; Karen L Mohlke; Jonas B Nielsen; Yukinori Okada; Ulrike Peters; Tinca J C Polderman; Danielle Posthuma; Alexander P Reiner; John P Rice; Eric Rimm; Richard J Rose; Valgerdur Runarsdottir; Michael C Stallings; Alena Stančáková; Hreinn Stefansson; Khanh K Thai; Hilary A Tindle; Thorarinn Tyrfingsson; Tamara L Wall; David R Weir; Constance Weisner; John B Whitfield; Bendik Slagsvold Winsvold; Jie Yin; Luisa Zuccolo; Laura J Bierut; Kristian Hveem; James J Lee; Marcus R Munafò; Nancy L Saccone; Cristen J Willer; Marilyn C Cornelis; Sean P David; David A Hinds; Eric Jorgenson; Jaakko Kaprio; Jerry A Stitzel; Kari Stefansson; Thorgeir E Thorgeirsson; Gonçalo Abecasis; Dajiang J Liu; Scott Vrieze
Journal: Nat Genet Date: 2019-01-14 Impact factor: 38.330