
Interpreting noncoding genetic variation in complex traits and human disease.

Abstract

Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. This picture has changed with advances in the systematic annotation of functional noncoding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs and molecular quantitative trait loci all provide complementary information about the function of noncoding sequences. These functional maps can help with prioritizing variants on risk haplotypes, filtering mutations encountered in the clinic and performing systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable data-set integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis and treatment.

Substances:
RNA, Untranslated

Year: 2012 PMID: 23138309 PMCID: PMC3703467 DOI: 10.1038/nbt.2422

Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908

Understanding the genetic basis of disease can revolutionize medicine by elucidating relevant biochemical pathways for drug targets and by enabling personalized risk assessments[1,2]. As technologies have evolved over the past century, geneticists are no longer limited to studying Mendelian disorders and can tackle complex phenotypes. The associations discovered as a result have broadened from individual variants primarily in coding regions to much richer disease architectures, including non-coding variants, wider allelic spectra, numerous loci, and weak effect sizes (Table 1). In the last few years, a new wave of technological advances has intensified the shift towards tackling more complex genetic architectures and uncovering the molecular mechanisms underlying them.

Table 1

The diversity of genetic architectures underlying human phenotypes.

Architecture	Notes	Role of computational and regulatory genomics
Classic monogenic traits	The earliest human genes characterized were those leading to inborn errors in metabolism, which were shown by Garrod in the early 1900s to follow Mendelian inheritance[140,141]. The modern study of human disease genes began with the cloning of loci responsible for high-penetrance monogenic disorders with Mendelian inheritance patterns, such as phenylketonuria and cystic fibrosis[140,142,143], that were most amenable to classical mapping approaches. Variants associated with monogenic traits were also the first to be identified through positional cloning in the 1980s, a classic success being the CFTR mutations responsible for most cases of cystic fibrosis[3,142,143].	As the underlying mutations tend to alter protein structure, the computational challenge in predicting their effect lies in molecular modeling and structural studies.
Monogenic traits with multiple disease alleles	Even monogenic diseases differ greatly in the extent to which a single risk allele predominates among affected individuals (allelic heterogeneity). On one end of the spectrum, the F508del allele of CFTR is found in about 70% of patients with cystic fibrosis[144], even though thousands of alleles are known. In contrast, phenylketonuria is extremely heterogeneous, with different PAH alleles predominating among affected individuals in different populations[145]. A majority of mutations in this class are missense or nonsense coding mutations[3].	As noted above, for protein-coding mutations, the relevant problem is predicting the biochemical effect of the amino acid substitution. In cases of allelic heterogeneity, the observed substitutions may be too numerous to characterize experimentally, necessitating computational models (Fig. 3c).
Multiple loci with independent contributions (“oligogenetic”)	Many variants increase or decrease the risk of a disease, with the final phenotype relying on the genotype at many loci (locus heterogeneity). One example well-studied through linkage analysis is Hirschsprung disease, a complex disorder with low sex-dependent penetrance for which at least ten genes are involved, including the tyrosine kinase receptor RET and the gene GDNF which encodes its ligand[146]. Interestingly, the most common variant in the main susceptibility gene RET is non-coding, a single-nucleotide polymorphism (SNP) in an enhancer. Both coding and non-coding variants are typically involved in one or a small number of well-defined pathways.	Oligogenetic traits, in which a handful of well-characterized loci contribute to the phenotype, may present the best opportunity to observe and quantify epistatic interactions. In cases where non-coding regions are implicated, these haplotypes can be functionally mapped to isolate the most likely causal variants (Fig. 2).
Large numbers of variants jointly contributing weakly to a complex trait	GWAS on complex traits are also discovering many weakly-contributing loci. For example, a recent meta-analysis of several height studies found 180 loci reaching genome-wide significance[15,103,139], enriched near genes already known to underlie skeletal growth defects. In the height study and in a study of psychiatric disorders, it has been shown that polygenic association extends to thousands of common variants, far beyond genome-wide significant loci[135,139].	In contrast to the variants underlying monogenic traits, the variants involved in complex traits are overwhelmingly not associated with missense or nonsense coding mutations, suggesting that their mechanisms are primarily regulatory[11]. Large sets of regulatory variants can be combined with reference annotations to elucidate relevant pathways and tissues (Fig. 3b, Table 5).
Variants regulating a “molecular trait” with unknown effect on organismal phenotype or fitness	Variants are rapidly being discovered that directly affect molecular quantitative traits, such as gene expression or chromatin state, many of which may have no effect on organismal phenotype or fitness[38].	QTL and allele-specific analyses are needed to characterize these variants (Fig. 1b,c). As the studies performed to date sample only a small fraction of the cell types in which a variant may have an effect, and variant-expression associations are highly tissue-specific[147], it is possible that many such regulatory variants remain to be discovered.
Variants causing no known molecular phenotype and no effect on organismal phenotype or fitness	The idea that the majority of mutations are neutral from an adaptive perspective was controversial when first proposed, and now is widely accepted[148–150].	Although it is straightforward to calculate from the genetic code what fraction of protein-coding mutations will cause an amino acid change, an analogous estimate for other molecular phenotypes is far more challenging and requires comprehensive regulatory models at the nucleotide level.
Private and somatic variants	Somatic mutations within an organism are frequent driver mutations selected in cancer formation[151].	The interpretation of private and somatic variations (Fig 3d) will also benefit tremendously from a systematic regulatory annotation, as they likely exploit existing regulatory pathways, even though they are subject to cellular, rather than organismal selective pressures.
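
The calculation mentioned in Table 1, the fraction of protein-coding mutations that change an amino acid, can be sketched directly from the standard genetic code. The snippet below is an illustration only: it assumes every single-nucleotide substitution in every sense codon is equally likely, ignoring codon usage and mutation-rate biases, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_classes():
    """Classify every single-nucleotide substitution in the 61 sense codons."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only mutations that start from a sense codon are counted
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


counts = substitution_classes()
total = sum(counts.values())
changing = counts["missense"] + counts["nonsense"]
print(counts, f"{changing / total:.3f}")
```

Under this uniform assumption, 415 of the 549 possible substitutions (about 76%) alter the encoded product. No comparably simple lookup exists for regulatory sequence, which is the contrast the table draws.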

In the early twentieth century, several metabolic disorders were shown to be genetic and Mendelian, and later positional cloning allowed the identification of many such loci, such as those curated by the Online Mendelian Inheritance in Man database (OMIM)[3,4]. Starting in the 1980s, linkage analysis was used to correlate the inheritance of traits in families with the inheritance of mapped polymorphic markers which could be assayed through restriction fragment length polymorphism (RFLP) analysis[5,6]. However, the regions mapped by linkage analysis were necessarily large, and cloning candidate genes for follow-up association studies, resequencing, and functional assays required the application of painstaking molecular techniques before the completion of the Human Genome Project[7]. In addition, complex phenotypes were not amenable to linkage because of the large sample sizes needed to detect loci with modest effects above the genomic background[8]. The long haplotype structure of the human genome, and its systematic mapping by the HapMap Project[9], has allowed single nucleotide polymorphisms (SNPs) to be used as markers for common haplotypes, which could be genotyped using chip technology. The stage was set for a flood of unbiased, genome-wide association studies (GWAS) to search across unrelated individuals[10] for common variants associated with complex disease and diverse molecular phenotypes (Fig. 1, Table 2).

Figure 1

Four types of next-generation association tests

(a) Genetic association with organismal traits is performed in genome-wide association studies (GWAS); at the locus shown, the G allele is associated with disease. The effect of GWAS-discovered variants is mediated through many layers of molecular processes, some of which can also be interrogated at a genome-wide scale. (b) Rather than organismal traits, molecular traits can be used, leading to the discovery of local regulatory variants such as expression quantitative trait loci (eQTLs). In this example a local molecular signal, such as a region of open chromatin, varies across the individuals, and is shown to co-vary with presence of the T allele; this allele may influence a cis-regulatory motif of chromatin. (c) Heterozygous sites in individual cells can be used to interrogate allele-specific effects; unlike molecular QTLs discovered across individuals, these studies control for variation in trans genetic background. In this example, the G allele is not only associated with the presence of a TF binding peak at that locus, but in heterozygous individuals is over-represented in ChIP-seq reads originating from that locus, suggesting that the TF binds specifically to the G allele. (d) Functional genomics data can be directly compared between cases and controls to discover biomarkers for disease, without necessarily attributing genetic causes to these molecular changes. Indeed, these biomarkers may be caused by trans genetic factors, environmental factors, or by the disease itself.
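
The molecular QTL test in panel (b) can be illustrated with a minimal sketch: regress a molecular trait on the number of copies of one allele carried by each individual. This is a simplification under stated assumptions (an additive model with no covariates, no correction for population structure, and no multiple-testing control); the function name and the data are hypothetical.

```python
import math


def eqtl_regression(dosages, trait):
    """Least-squares fit of a molecular trait on allele dosage (0, 1 or 2).

    Returns the slope, Pearson correlation and t statistic (n - 2 degrees
    of freedom) for the null hypothesis that the slope is zero.
    Assumes the fit is not perfect (|r| < 1).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r, t_stat


# Hypothetical data: copies of the T allele and an open-chromatin signal
# measured in six individuals.
t_allele_copies = [0, 0, 1, 1, 2, 2]
chromatin_signal = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, r, t_stat = eqtl_regression(t_allele_copies, chromatin_signal)
print(f"slope={slope:.2f} r={r:.3f} t={t_stat:.1f}")
```

A genome-wide scan repeats such a test for many variant-trait pairs, which is why dedicated tools such as those in Table 2 focus on speed and on priors such as distance to the transcription start site.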

Table 2

Computational tools for association analyses.

Class of analysis	Tool	Notes
Genome-wide association between genotype and phenotype (GWAS)	SNPTEST[155]	Incorporates imputation
	Bim-Bam[156]	Bayesian regression approach combining imputation and association probabilities
	EIGENSTRAT[157]	Models ancestry differences between cases and controls using principal components analysis
	PLINK[158]	Large package including tools to impute, control for population stratification, and hybrid methods such as family-based association and population-based linkage
Local association between genotype and molecular trait (e.g., eQTL)	eQTNMiner[159]	Tests a Bayesian hierarchical model incorporating priors based on TSS distance
	Matrix eQTL[160]	Fast association testing of continuous or categorical genotype values with expression
Allele-specific expression and binding	ChIP-SNP[82]	For ChIP-chip data
	AlleleSeq[161]	For ChIP-seq and RNA-seq data
Genome-wide association between molecular trait and phenotype (e.g., differential expression, EWAS)	limma[162]	For expression microarray data
	edgeR[163]	For RNA-seq data

Note: analyses using genotype information require tools to call variants, such as BirdSeed[152] on array data or GATK[153] on sequencing data, and tools to impute genotypes, such as MaCH[154].
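
In their simplest form, the allele-specific analyses listed in Table 2 (and shown in Fig. 1c) ask whether reads covering a heterozygous site depart from a 50:50 allelic ratio. The following is a minimal sketch of that idea using an exact binomial test; it is not the method of any tool in the table, the function name is hypothetical, and real pipelines must also deal with issues such as reference-mapping bias and overdispersion, which are ignored here.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial test against an expected 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    k = max(ref_reads, alt_reads)
    # With p = 0.5 the binomial distribution is symmetric, so the two-sided
    # p-value is twice the upper tail, capped at 1.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Hypothetical ChIP-seq read counts at one heterozygous site.
print(allelic_imbalance_pvalue(5, 5))   # balanced alleles
print(allelic_imbalance_pvalue(10, 0))  # all reads carry one allele
```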

Relative to linkage analysis and sequencing, GWAS have less power in cases where different rare mutations act in different families or individuals at the same locus (allelic heterogeneity). However, they are far more sensitive than family studies to complex polygenic associations where a phenotype is associated with the joint effect of many weakly-contributing variants across different loci (locus heterogeneity). In this sense GWAS have been a resounding success, identifying thousands of disease-associated loci for further study[11] and revealing previously-unknown mechanisms for diseases such as Crohn’s disease, macular degeneration, and type 2 diabetes[2]. However, the pursuit of GWAS has also received criticism (Box 1) because of the structure of the knowledge it has been producing relative to the determinism of highly-penetrant Mendelian genetic discoveries[2,12,13]. The current tension mirrors the intellectual rift in the early 1900s between Mendelians, who modeled inheritance of discrete traits as being carried by single genes, and the biometrician adherents of Galton, who studied the inheritance of continuous traits; the fields were reconciled by R.A. Fisher, who proposed that the heritability of quantitative traits was owed to the contribution of many genes with small effect[14,15].

Box 1

Although several predominant criticisms of GWAS have been voiced, responses to each can guide future studies.

Cumulative predictive power. Generally, the discovered loci reaching genome-wide significance have weak additive predictive power for specific phenotypes, which limits their clinical relevance for some traits at present[130-132]. However, risk prediction using the loci discovered for complex disease using GWAS often performs similarly to using classical clinical tests, and has unique properties, such as stability over the lifespan[133]. Predictors that jointly use hundreds or thousands of weakly-contributing loci have also been shown to explain a larger proportion of variance than was initially appreciated[134,135]. Integrating these discoveries into clinical protocols is in its infancy, and should be expected to mature.

Non-coding variants with unknown effect. Most of the loci are non-coding and many are far from known genes, and, because of linkage disequilibrium (LD), encompass many variants; therefore, they are not immediately informative or biochemically tractable for experimental work. Assigning a prior probability to the deleteriousness of a non-coding mutation is challenging[136]. To address this challenge, non-coding sequence is being annotated at a rapid pace through systematic efforts such as the ENCODE Project[21] and the Roadmap Epigenomics Mapping Consortium[22], and through studies of the impact of common variants on genome-wide molecular phenotypes, discussed below.

Detection of rare variants. Significant loci tend to additively explain only a small proportion of the narrow-sense heritability of phenotypes[12], suggesting that rare rather than common variants may underlie their genetics, which will only be discovered through whole-exome and whole-genome sequencing or family-based studies[13]. Many explanations for “hidden heritability” among the discovered common-variant associations have been proposed[12]. The relative importance of rare and common variants is a topic of intense debate[137], with arguments ranging from the claim that associations with common variants are in fact driven by synthetic associations with large-effect rare variants in long-range LD[138], to the claim that common associations of weak effect contribute to heritability well beyond the threshold of statistical significance[139], to the claim that narrow-sense heritability may be overestimated in many twin studies due to epistasis disguised as additivity[98].

Reproducibility. GWAS sometimes do not replicate across studies or populations[140], leading to the report of false positives and suspicion of the validity of novel associations, especially when they are non-coding. This could be partly due both to the difficulties in imputing genotypes, which will benefit from an increased understanding of common human variation, and to the poor definition of organismal phenotypes[140], which can benefit from molecular disease biomarkers discussed below. Moreover, while the specific loci involved may differ across populations, they may reflect the same underlying molecular pathways, and thus regulatory annotations may be more reproducible across populations. Focusing on molecular phenotypes may improve reproducibility by isolating potential socio-economic or other environmental factors that occur downstream of molecular phenotypes and can strongly affect organismal phenotypes.

In this review, we discuss both the computational challenges and the opportunities presented by the large number of non-coding disease-associated variants being discovered through GWAS and medical resequencing. We first survey the types of regulatory annotations available, including those from functional and comparative genomics as well as quantitative trait loci (QTLs) and allele-specific events, and the ways in which these can be used to dissect disease-associated haplotypes to identify the most promising causal variants at a locus. We then discuss the utility of these regulatory annotations to perform systems-level analysis of GWAS and allelic spectra, revealing relevant cell types and regulatory mechanisms. Finally, we present a variety of bioinformatics hurdles and computational challenges that lie ahead for the field, such as discovering epistatic interactions, connections between molecular and organismal phenotype, and patterns that must be mined from potentially sensitive medical data.

Systematic annotation of the non-coding genome

Interpretation of the molecular mechanisms of disease-associated loci can be a great challenge. Even though protein biochemistry has been used to characterize missense and nonsense coding mutations that most often underlie monogenic traits, the frequency with which loss-of-function mutations and rare coding variants are being discovered in healthy individuals[16,17] suggests our understanding is far from complete. The challenge of interpretation is even greater for non-coding variants, given the diversity of non-coding functions, the incomplete annotation of regulatory elements, and potentially still unknown mechanisms of regulatory control. Several pioneering studies have provided a model for the types of systematic regulatory annotations needed, by revealing the diverse mechanisms of action underlying human disease, including at the transcriptional, splicing, and translational level (Table 3).

Table 3

Mechanisms through which non-coding variants influence human disease.

Non-coding element disrupted	Molecular function and effect of mutations	Disease association
Splice-junction and splicing-enhancer	Splicing is constitutive for some transcripts and highly tissue-specific for others, relying on both canonical sequences at the exon-intron junction as well as weakly-specified sequence motifs distributed throughout the transcript. Mutations affecting constitutive splice sites can have an effect similar to nonsense or missense mutations, resulting in aberrantly included introns or skipped exons, sometimes resulting in nonsense-mediated decay (NMD).	Splicing regulatory variants are implicated in several diseases[164,165].
		A recent analysis suggests that the majority of disease-causing point mutations in OMIM may exert their effects through splicing[166].
		Alternative splice site variants in the WT1 gene are involved in Frasier Syndrome (FS)[167]
		Skipping of exon 7 of the SMN gene is involved in spinal muscular atrophy (SMA)[168]
Sequences regulating translation, stability, and localization	Sequences in the 5′-untranslated regions (UTRs) of mRNAs can influence translation regulation, such as upstream ORFs, premature AUG or AUC codons, and palindromic sequences that form inhibitory stem loops[169]. Sequence motifs in the 3′-UTR are recognized by microRNAs and RNA-binding proteins (RBPs).	Loss-of-function mutations in the 5′-UTR of CDKN2A predispose individuals to melanoma[170].
		A rare mutation that creates a binding site for the miRNA hsa-miR-189 in the transcript of the gene SLITRK1 is associated with Tourette’s syndrome[171].
Genes encoding trans-regulatory RNA	Non-coding RNAs participate in a panoply of regulatory functions, ranging from the well-understood transfer and ribosomal RNA to the recently-discovered long non-coding RNAs[172,173].	Both rare and common mutations in the gene RMRP encoding an RNA component of the mitochondrial RNA processing ribonuclease have been associated with cartilage-hair hypoplasia[174]
		Non-coding RNA mutations can cause many other diseases[175].
Promoter	Promoter regions are an essential component of transcription initiation and the assembly of RNA polymerase and associated regulators. Mutations can affect binding of activators or repressors, chromatin state, nucleosome positioning, and also looping contacts of promoters with distal regulatory elements. Genes with coding disease mutations can also harbor independently-associated regulatory variants that correlate with expression, are bound by proteins in an allele-specific manner, and disrupt or create regulatory motifs[176].	Mutations in the promoter of the HIV-1 progression-associated gene CCR5 are correlated with expression of the receptor it encodes and bind differentially to at least three transcription factors[177,178]
		APOE promoter mutations are associated with Alzheimer’s disease[179,180]
		Heme oxygenase-1 (HO-1) promoter mutations lead to expression changes and are associated with many diseases[181]
Enhancer	Enhancers are distal regulatory elements that often lie 10,000 to 100,000 nucleotides from the start of their target gene. Mutations within them can disrupt sequence motifs for sequence-specific transcription factors, chromatin regulators, and nucleosome positioning signals. Structural variants including inversions and translocations can disrupt their regulatory activity by moving them away from their targets, disrupting local chromatin conformation, or creating interactions with insulators or repressors that can hinder their action. While it is thought that looping interactions with promoter regions play a role, the rules of enhancer-gene targeting are still poorly understood.	The role of distal enhancers in disease was suggested even before GWAS by many Mendelian disorders for which some patients had translocations or other structural variants far from the promoter[182–184].
		In one early study, point mutations were mapped in an unlinked locus in the intron of a neighboring gene, a million nucleotides away from the developmental gene Shh [185]; this distal locus acted as an enhancer of Shh and recapitulated the polydactyly phenotype in mouse.
		A number of GWAS hits have been validated as functional enhancers[186]; for example, common variants associated with cancer susceptibility map to a gene desert on chromosome 8, with one SNP demonstrated to disrupt a TCF7L2 binding site and to inhibit long-range activation of the oncogene MYC[187–189].
Synonymous mutations within protein-coding sequences	All of the aforementioned regulatory elements can also be encoded within the protein-coding exons themselves. Thus, synonymous mutations within protein-coding regions may be associated with non-coding functions, acting pre-transcriptionally at the DNA level, or post-transcriptionally at the RNA level.	A synonymous variant in the dopamine receptor gene DRD2 associated with schizophrenia and alcoholism has been shown to modulate receptor production through differences in mRNA folding and stability[190].

In each of these cases, extensive experimental follow-up was needed to uncover the molecular mechanisms responsible for the disease association signal, and many more disease-associated variants remain uncharacterized, emphasizing the need for systematic methods for annotating regulatory regions, their functional nucleotides, and their interconnections. Recognizing the need for systematic interpretation of non-coding disease-associated variants, several large-scale projects are currently underway to enhance the annotation of the non-coding genome (Fig. 2). These rely on reference annotation maps using both functional genomics and comparative genomics, and can dramatically increase the annotation of regulatory elements, which can have a strong impact for interpreting both existing GWAS and individual personal genomes.

Figure 2

Dissecting haplotypes discovered through association tests

These three examples are ways to annotate loci containing several linked SNPs (in this case, three) to discover those most likely to be causal. (a) Functional genomics techniques are being developed to discover putative regulatory elements and link these elements to their target genes. Here, the middle SNP lies in an enhancer in Tissue 1 and Tissue 3, and regulates a gene to its left. (b) Regulatory genomics information leads to prediction of sequence motifs active in classes of enhancers, and this can be combined with the motif creation/disruption caused by variants. In this case, the middle SNP deletes a match to motif B, which is predicted to be active in enhancers found in both Tissue 1 and Tissue 3. (c) Comparative genomics identifies regions of evolutionary constraint in non-coding sequence. Here, sequence surrounding only the middle SNP is constrained across mammals.

Reference functional genomics and chromatin state maps

Massively parallel short-read sequencing technologies have obviated the need for the extremely expensive tiling microarrays previously used to map biochemically active regions of the human genome. This has enabled chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) applied to map transcription factor binding, chromatin regulators, or histone modification marks[18], mapping of DNA methylation using bisulfite sequencing (BS-Seq)[19] and mapping of accessible chromatin regions by DNase hypersensitivity analysis (DNase-Seq)[20]. Computational integration of these datasets through supervised or unsupervised machine learning enables mapping of functional non-coding elements such as distal enhancers, transcription factor binding sites, and regulatory RNA genes on a genome-wide scale. For example, the Encyclopedia of DNA Elements (ENCODE) project is releasing comprehensive maps of chromatin states, TF binding, and transcription for a selection of cell lines and DNase maps for many primary cells[21], and the NIH Epigenomics Roadmap Project[22] and BluePrint project[23] both aim to construct reference epigenome maps of hundreds of primary cells and cultured cells. Regulatory maps can then guide the way towards the most likely causal regulators on a haplotype (Fig. 2a).
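
The use of regulatory maps to narrow down candidates on a haplotype (Fig. 2a) amounts, at its simplest, to intersecting variant positions with annotated intervals per tissue. The sketch below illustrates this under stated assumptions: the coordinates, SNP identifiers and tissue names are invented for illustration, intervals are treated as half-open, and no real annotation file format is parsed.

```python
def annotate_snps(snps, enhancers):
    """Map each SNP to the tissues whose enhancer intervals contain it.

    snps: dict of SNP id -> (chromosome, position)
    enhancers: dict of tissue -> list of (chromosome, start, end), half-open
    """
    hits = {}
    for snp_id, (chrom, pos) in snps.items():
        hits[snp_id] = sorted(
            tissue
            for tissue, intervals in enhancers.items()
            if any(c == chrom and start <= pos < end for c, start, end in intervals)
        )
    return hits


# Hypothetical haplotype with three linked SNPs, mirroring Fig. 2a.
SNPS = {"snp_left": ("chr1", 1000), "snp_middle": ("chr1", 2500), "snp_right": ("chr1", 4000)}
ENHANCERS = {
    "Tissue 1": [("chr1", 2400, 2700)],
    "Tissue 2": [("chr1", 9000, 9500)],
    "Tissue 3": [("chr1", 2450, 2600)],
}
print(annotate_snps(SNPS, ENHANCERS))
```

In this toy example only the middle SNP overlaps an enhancer, in Tissue 1 and Tissue 3, which would make it the first candidate for follow-up. For genome-scale annotation sets, an interval tree or sorted-interval sweep would replace the linear scan.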

Nucleotide-resolution regulatory annotations

While maps of regulatory regions can be highly informative, increasing their resolution from hundreds of nucleotides to single nucleotides requires additional computational or experimental developments. This can leverage systematic efforts that seek to elucidate the binding specificities of transcription factors[24,25] and splicing regulators[26,27], and to discover regulatory motifs genome-wide based on their enrichment and conservation properties[28,29]. Similarly, new technologies have been applied to enhance existing techniques, such as digital genomic footprinting using DNase-seq[30], dynamic application of micrococcal nuclease (MNase)[31], or the use of lambda exonuclease (ChIP-exo)[32], dramatically increasing the mapping resolution of regulatory elements even without knowledge of the specific motifs involved.

Predictive models of variant effects

Even when the functional elements and motifs are known, we need models to distinguish how mutations in different positions of a regulatory motif or element will affect its function. These models can be used to distinguish silent from deleterious mutations, as is possible within protein-coding regions. This requires integrative models of sequence motifs, chromatin state, and expression patterns[24,33-36], which can be trained on experimentally tractable tissues or through in vitro experiments and applied to predict the effect of newly-observed rare and private mutations. The massive scale of regulatory predictions, encompassing hundreds of regulators and millions of regulatory motif instances, demands correspondingly massively parallel methods to validate them. Such methods, which exploit emerging large-scale synthesis and sequencing technologies, are being developed both in model organisms and in cultured human cells[37-39], and enable testing of mechanistic hypotheses about causal variants at unprecedented scales (Fig. 2b).
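
One of the simplest position-aware models of this kind scores a sequence against a position weight matrix and compares the reference and alternate alleles, as in the motif disruption of Fig. 2b. The sketch below is illustrative only: the 4-bp motif is hypothetical, a uniform background of 0.25 per base is assumed, and the integrative models cited above combine far more information than a single motif score.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif with consensus ACGT.
MOTIF = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(seq, motif=MOTIF):
    """Sum of per-position log2 odds of the motif versus the background."""
    if len(seq) != len(motif):
        raise ValueError("sequence and motif must have the same length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(motif, seq))


def motif_delta(ref_seq, alt_seq, motif=MOTIF):
    """Change in motif score caused by a variant (alternate minus reference)."""
    return log_odds(alt_seq, motif) - log_odds(ref_seq, motif)


# A G-to-T change at the third position weakens the match to the motif.
print(round(motif_delta("ACGT", "ACTT"), 2))
```

A strongly negative delta flags a variant that weakens a predicted binding site and a positive delta one that creates or strengthens a site. Either prediction is a hypothesis to be tested, for instance with the reporter assays mentioned above.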

Comparative genomics between related species

Even when a regulatory element is rarely used and its activity unobserved in the cell types and tissues sampled, its effect on fitness can still be recognized based on its preferential conservation across multiple related species. Genome-wide comparative analysis of many mammals has revealed a high-resolution map of constrained elements spanning 4.5% of the human genome[40,41], revealing millions of likely new elements, including individual transcription factor binding sites, whose nucleotides have been preserved across evolutionary time. Beyond the overall level of evolutionary constraint, the specific evolutionary signatures encoded in the patterns of substitutions, insertions and deletions across related species can provide information for the type of molecular function likely encoded by the constrained elements[41-44]. Together, constraint and evolutionary signatures can pinpoint functional transcription factor binding motifs and individual binding sites (Fig. 2c), non-coding RNA genes and structures, microRNAs and their targets, and yet uncharacterized sequence elements that confer a selective advantage.
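
As a rough illustration of what "constraint" means at the nucleotide level, the sketch below computes, for each alignment column, the fraction of other species that share the reference base. This is a deliberately crude proxy: the constraint maps cited above rely on phylogenetic models of substitution rates rather than raw identity, and the aligned sequences and function name here are invented.

```python
def column_identity(alignment):
    """Per-column fraction of non-reference sequences matching the reference.

    alignment: list of equal-length aligned sequences; the first is the reference.
    """
    ref, others = alignment[0], alignment[1:]
    scores = []
    for i, ref_base in enumerate(ref):
        matches = sum(1 for seq in others if seq[i] == ref_base)
        scores.append(matches / len(others))
    return scores


# Hypothetical three-species alignment; the first row is the human sequence.
ALIGNMENT = ["ACGT", "ACGA", "ACTT"]
print(column_identity(ALIGNMENT))
```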

Evolutionarily conserved biochemical activity

Even in the absence of conserved sequence, the conservation of biochemical activity can indicate conserved functional elements whose sequence features are not detectable by traditional alignment and constraint measures because of turnover[45,46]. Because some fraction of protein binding and RNA transcription may be nonfunctional “noise,” cross-species analysis of transcription factor binding[47] or gene expression[48] can help reveal the subset of elements that are most likely to be functional. However, lineage-specific elements may nevertheless be important and are not captured by this method.

Interpreting variants using functional genomic annotations

For protein-coding mutations, knowledge of protein structure and function, and the unambiguous nature of the genetic code, has allowed the development of a class of predictive algorithms that can score the severity of missense and nonsense variants[49-52]. Reference annotations are needed to bring functional datasets to bear on understanding the molecular roles of disease-associated common variants in individual regions, especially for non-coding variants (Fig. 2). In addition, new methods are needed to define the relationship between global genetic architectures and genome-wide functional landscapes.

Tools for prioritizing variants

An immediate concern for practitioners of GWAS is the interpretation and prioritization of non-coding variants[53]. A number of resources, including HaploReg[54] (L.D.W. and M.K.), RegulomeDB[55], and ENSEMBL’s Variant Effect Predictor[56] aim to annotate non-coding common variants from association studies using conservation, functional genomics, and regulatory motif data. Software packages such as ANNOVAR[57] and VAAST[58] are specialized for annotating whole-genome/exome sequencing data, and leverage population-level negative selection to identify extremely rare coding alleles that are most likely to be functional. None of these tools presently brings together all of the available annotation resources listed in the previous section, however, and they will need to be continuously updated to reflect the exponential growth of regulatory knowledge (Table 4).

Table 4

Comparison of recent tools to systematically annotate variants

Many such tools have been released as databases or software in the past decade; listed below is a sampling of the most recent.

Tool	Type	Input method	Protein annotation	Regulatory annotation	Other
SeattleSeq[191]	server	variants	deleteriousness scores	conservation scores	dbSNP clinical association data
ANNOVAR[57]	software	variants, regions	User-defined: user downloads desired variation, conservation, coding and non-coding functional annotations
ENSEMBL VEP[56]	server	variants, regions	deleteriousness scores	regulatory motif alteration scores	OMIM, GWAS data
VAAST[58]	software	variants	deleteriousness scores	conservation scores	Aggregation to discover rare variants in case-control
HaploReg[54]	server	variants, studies	dbSNP consequence data	chromatin state, protein binding, DNase, conservation, regulatory motif alteration scores	GWAS data, eQTL, LD calculation, enrichment analysis per study
RegulomeDB[55]	server	variants, regions		Histone modification, protein binding, DNase, conservation, regulatory motif alteration scores	eQTL, reporter assays, combined score analysis per variant

Gene set enrichment analysis

Prior knowledge of gene interrelationships has been leveraged in studies of gene expression to discover differentially-regulated pathways even where single genes in those pathways change expression too little to rise to statistical significance[59]. These methods for gene set enrichment analysis (GSEA) are being applied to GWAS, where similarly, genetic risk is expected to be concentrated along biological pathways and multiple testing diminishes the statistical significance of associations considered individually. Dozens of methods have been developed to use prior knowledge from gene functional annotation databases to perform pathway analysis on GWAS[60,61] (Fig. 3a).
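As a minimal sketch of the simplest form of pathway analysis on GWAS results, the example below tests whether genes near associated loci are over-represented in a pathway using a hypergeometric tail probability. This is an over-representation test, not the rank-based GSEA procedure of ref. 59, and the function name and all gene counts are hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """Return P(X >= n_overlap) when n_hits genes are drawn without
    replacement from n_universe genes, n_pathway of which are in the pathway."""
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20,000 annotated genes, a 100-gene pathway,
# 50 genes near GWAS loci, 5 of which belong to the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
print(f"enrichment p-value: {p_value:.3g}")
```

Because many pathways are usually tested at once, a p-value of this kind would still need correction for multiple testing before being interpreted.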

Figure 3

Systems-level analyses beyond isolated common haplotypes. (a) Gene-based enrichment analysis of genetic architecture.

A typical analysis of GWAS results will compare the set of genes near associated loci with prior knowledge about those genes, leading to hypotheses about the pathways involved (in this example, process A but not process B). (b) Non-coding enrichment analysis of genetic architecture using regulatory annotations. High-resolution maps of diverse regulatory annotations can also be intersected with GWAS results. Examples are shown where tissue-associated enhancers, eQTLs, DNase peaks, or allele-specific polymerase binding are enriched among the results of a GWAS. In addition, regulatory annotations can be combined with gene-based annotations and linking information, in this case discovering an enrichment for enhancers linked to the genes involved in process A. (c) Interpreting linked loci exhibiting high allelic heterogeneity. In some cases only rare mutations at a locus contribute to its genetic mechanism, and these regions will only be discovered through classical linkage analysis. These regions can now be interrogated through WES/WGS, and an imbalanced burden of putatively deleterious alleles can be observed in cases (as in the left example). With regulatory annotations, these burden tests can now be extended to non-coding regions (as in the right example). (d) Interpreting causal variants in whole genomes. Personal genomes pose the challenge of exposing potentially causal variants that were too rare or low-penetrance to have been associated with a phenotype through association or linkage studies. For coding alleles, prior knowledge is currently used in several ways when analyzing personal genomes: knowledge of the genetic code (to filter on nonsynonymous variants), inference of negative selection from population panels (to filter out common variants), and models developed from biophysical principles (to focus on those amino acid substitutions most likely to alter protein structure and function). Similar pipelines will need to be developed for regulatory regions. We propose using both population-level and cross-species signals of selection (to filter out not only common variants, but also those that are not constrained across mammals), and all of the regulatory models previously mentioned (predicted regulatory elements and the motifs active within them, molecular trait associations such as eQTLs, etc.). Such a pipeline will be crucial to interpreting the flood of sequencing data that will be collected in both clinical and research settings.
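The filtering pipeline proposed for regulatory variants can be sketched as a sequence of simple filters followed by a ranking step. This is only an illustration under stated assumptions: the `Variant` fields, the 1% allele-frequency cutoff, and the ranking rule are hypothetical and do not correspond to any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    vid: str                      # hypothetical identifier
    allele_freq: float            # population allele frequency
    constrained: bool             # under cross-species constraint
    in_regulatory_element: bool   # overlaps a predicted regulatory element
    disrupts_motif: bool          # predicted to alter a regulatory motif


def prioritize(variants, max_af=0.01):
    """Keep rare, constrained variants in regulatory elements, then rank
    motif-disrupting variants first and rarer variants before commoner ones."""
    kept = [
        v for v in variants
        if v.allele_freq <= max_af and v.constrained and v.in_regulatory_element
    ]
    return sorted(kept, key=lambda v: (not v.disrupts_motif, v.allele_freq))


# Hypothetical personal-genome variants.
candidates = [
    Variant("var1", 0.200, True, True, True),     # too common
    Variant("var2", 0.001, True, True, False),
    Variant("var3", 0.005, True, True, True),
    Variant("var4", 0.0005, False, True, True),   # not constrained
]
ranked_ids = [v.vid for v in prioritize(candidates)]
print(ranked_ids)
```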

Regulatory element enrichment analysis

A recent study used chromatin state maps to discover an enrichment of cell type-specific enhancers among the top associations in several GWAS[62] (L.D.W., M.K., and colleagues), demonstrating the utility of high-resolution functional genomics maps to serve as a type of pathway annotation. Similar results have been seen using DNase hypersensitivity maps across a large number of cell types[63], and by examining concordance between expression quantitative trait loci (eQTLs) and GWAS[64,65]. These approaches have demonstrated the power of reference epigenomes to identify relevant tissues for further study (Fig. 3b). Another way to use prior knowledge about variant function is to incorporate the information into the association study itself through Bayesian methods[61,66-69] or using boosting to prioritize disease networks[70]. However, it is difficult to evaluate the utility of these weighting schemes, which essentially discard loci about which there is the least functional data.
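To make the idea of regulatory element enrichment concrete, the sketch below computes the fold enrichment of GWAS variants in an annotation relative to a background variant set. It assumes a single chromosome, sorted non-overlapping half-open intervals, and hypothetical coordinates; it is not the procedure used in refs. 62 or 63, which also account for LD and matched backgrounds.

```python
from bisect import bisect_right


def in_intervals(pos, starts, ends):
    """True if pos falls in any half-open interval [start, end)."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def fold_enrichment(gwas_pos, background_pos, intervals):
    """Ratio of the overlap fraction of GWAS variants to that of background
    variants. Assumes at least one background variant overlaps."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]

    def overlap_fraction(positions):
        return sum(in_intervals(p, starts, ends) for p in positions) / len(positions)

    return overlap_fraction(gwas_pos) / overlap_fraction(background_pos)


# Hypothetical enhancer intervals and variant positions on one chromosome.
enhancers = [(100, 200), (500, 600)]
gwas_variants = [150, 550, 700, 120]
background_variants = [50, 150, 250, 350, 450, 550, 650, 750]
fold = fold_enrichment(gwas_variants, background_variants, enhancers)
print(f"fold enrichment: {fold:.2f}")
```

Repeating the calculation with annotations from different cell types is one way to ask which tissue shows the strongest enrichment.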

Burden tests; dealing with heterogeneity

For potentially causal rare variants discovered through whole-genome sequencing, a class of techniques has been developed that deal successfully with allelic heterogeneity and low allele frequencies by pooling mutations across individuals by genes, pathways, or other functional annotations and filters[71]; the additional use of functional genomic maps has recently been proposed[72]. Improved annotation of non-coding regions will obviously empower this type of analysis (Fig. 3c). Table 5 lists examples of new insights from computational methods integrating regulatory elements with GWAS.
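A minimal sketch of a collapsing burden test follows: each individual is scored as a carrier if they hold at least one qualifying rare variant in the region, and carrier status is compared between cases and controls with a one-sided Fisher's exact test. The genotype encoding, variant identifiers, and the set of qualifying variants are hypothetical; published methods[71] use more elaborate weighting and pooling schemes.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) for the 2x2 table
    [[case carriers, case non-carriers], [control carriers, control non-carriers]]."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_carriers)


def burden_test(genotypes, is_case, qualifying):
    """genotypes: one dict per individual mapping variant id -> alt allele count."""
    carrier = [any(g.get(v, 0) > 0 for v in qualifying) for g in genotypes]
    a = sum(1 for c, case in zip(carrier, is_case) if c and case)
    b = sum(1 for c, case in zip(carrier, is_case) if not c and case)
    c_ctrl = sum(1 for c, case in zip(carrier, is_case) if c and not case)
    d = sum(1 for c, case in zip(carrier, is_case) if not c and not case)
    return fisher_one_sided(a, b, c_ctrl, d)


# Hypothetical data: 6 cases then 6 controls; "v9" is not a qualifying variant.
genotypes = [{"v1": 1}, {"v2": 1}, {"v3": 1}, {"v1": 1, "v9": 1}, {}, {},
             {}, {}, {}, {"v9": 2}, {}, {}]
is_case = [True] * 6 + [False] * 6
burden_p = burden_test(genotypes, is_case, qualifying={"v1", "v2", "v3"})
print(f"burden p-value: {burden_p:.4f}")
```

Extending the test to non-coding regions amounts to changing which variants count as qualifying, for example those that fall in annotated regulatory elements.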

Table 5

Examples of regulatory enrichment analyses of genetic associations.

Class of test	Finding	Computational tools used
Gene set enrichment near associated loci	Regulatory network of five proteins implicated in Kawasaki disease[192]	Ingenuity Pathway Analysis (closed-source)
	Genes differentially expressed in adipose overlap with genetic associations with obesity[193]	Microarray analysis of differential expression
	TGF-β pathway, Hedgehog signaling pathway are enriched among height GWAS loci[103]	GSEA using MAGENTA[194], network from text-mining using GRAIL[195], known disease genes from OMIM[4], eQTL enrichment
Concordance with eQTL results	eQTL prioritization during replication facilitated validation of two Crohn’s disease susceptibility loci[196]	eQTL enrichment
Concordance with eQTL results	GWAS involving immune system show enrichment for lymphoblastoid eQTL[64]	eQTL enrichment (RTC[64])
Chromatin state enrichment	Many GWAS show enrichment for enhancers in biologically-relevant cell types[62]	ChromHMM to define discrete chromatin states[197] (M.K. and colleagues); enrichment analysis
TF binding site and DNase hypersensitivity enrichment	Many GWAS show enrichment for ENCODE-annotated DNase and ChIP sites[198]	Enrichment analysis
	Many GWAS show enrichment for DNase in biologically-relevant cell types[63]	Hotspot algorithm to define discrete hypersensitive sites[199]; enrichment analysis
	FOXA1 and estrogen receptor binding sites are enriched among breast cancer GWAS loci[200]	Variant Set Enrichment (VSE[200])

Interpreting variants using population variation in molecular phenotypes

While until this point we have discussed regulatory annotations from reference cell lines, biochemical activity is itself genotype-dependent, and thus a single reference annotation fails to capture the complexity of the regulatory genome. Moreover, we treated LD as a property of the human genome, while it is in fact population specific, and patterns of LD and selection have varied across both geography and time. This increased complexity can in fact be leveraged to gain additional insights into genome regulation, and provide additional power for the aforementioned analyses.

Genotype-associated molecular activity

Two powerful tools have emerged to identify non-coding loci that affect molecular phenotypes: association studies and allele-specificity studies. Association studies (Fig. 1b) have been used to discover non-coding cis regulators of methylation (meQTLs)[73], DNase I sensitivity (dsQTLs)[74], transcription factor binding[75], gene expression (eQTLs)[76], and alternative splicing[77]. In the same manner as GWAS on organism-level quantitative traits, these studies consider a phenotype associated with a particular genomic locus (such as steady-state mRNA level corresponding to a gene) in the same cell type isolated across unrelated individuals, and search for genetic regulators of those molecular processes. A recent related study used eQTL data to reveal selective signatures of epistasis between deleterious coding variants and the regulatory variants that modulate their penetrance[78], a method which should be broadly applicable to testing hypotheses about cis regulatory interactions from genomics models.
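The core computation of a cis-eQTL association test can be sketched as a simple linear regression of expression on genotype dosage (0, 1, or 2 copies of the alternate allele). The function name and data below are hypothetical, and covariates, normalization, and multiple-testing correction, which real analyses require, are omitted.

```python
from math import sqrt


def eqtl_slope_and_t(dosage, expression):
    """Least-squares slope of expression on dosage and its t-statistic."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    beta = sxy / sxx
    resid_ss = sum(
        (y - mean_y - beta * (x - mean_x)) ** 2 for x, y in zip(dosage, expression)
    )
    se = sqrt(resid_ss / (n - 2) / sxx)
    return beta, beta / se


# Hypothetical data: ten unrelated individuals, one SNP, one gene.
dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
expression = [5.1, 4.9, 6.0, 6.2, 7.1, 6.9, 5.0, 5.9, 7.0, 6.1]
beta, t_stat = eqtl_slope_and_t(dosage, expression)
print(f"effect per allele: {beta:.2f}, t = {t_stat:.1f}")
```

The same regression applies to other molecular phenotypes, such as methylation or DNase I sensitivity, by substituting the measured trait for expression.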

Allele-specific activity

In contrast, allele-specificity tests examine heterozygous sites in individuals and look for a skew in the molecular signal towards one of the alleles (Fig. 1c). Allele-specific methylation[79], histone modification[80], DNase I sensitivity[81], protein binding[82], and expression[83] have been surveyed genome-wide. While association studies have the advantage of identifying regulatory variants that may be acting at some genetic distance from the regulated locus, and can include homozygous individuals in the sample, allele-specific studies can be performed on single individuals, and inherently control for possible trans-regulatory differences caused by individuals’ genetic background.
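A minimal sketch of an allele-specificity test at one heterozygous site is an exact two-sided binomial test of the read counts against the null expectation of equal signal from both alleles. The function name and read counts are hypothetical, and the sketch assumes reads are independent and unbiased between alleles.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value against a 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    k = max(ref_reads, alt_reads)
    # Under the symmetric null, the two-sided p-value is twice the upper tail.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Hypothetical read counts at two heterozygous sites in one individual.
p_skewed = allelic_imbalance_p(18, 2)
p_balanced = allelic_imbalance_p(10, 10)
print(f"skewed site: {p_skewed:.2e}, balanced site: {p_balanced:.2f}")
```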

Importance of population-specific effects

Causal variants within associated haplotypes should be identified not only for further research, but also for genetic counseling; because of variations in LD patterns, a SNP that marks a risk haplotype efficiently in one population may not in another[84]. Computational methods that explicitly model ethnic background in admixed populations can increase their power by exploiting their shared ancestry[85].
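The population dependence of tagging can be illustrated by computing the LD statistic r² between a marker SNP and a causal SNP from phased haplotypes in two populations. The haplotype counts below are hypothetical; each haplotype is a pair of 0/1 alleles (marker, causal).

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites from phased (allele_a, allele_b) pairs."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical population 1: marker and causal alleles always co-occur.
pop1 = [(1, 1)] * 5 + [(0, 0)] * 5
# Hypothetical population 2: recombination has largely decoupled the two sites.
pop2 = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 3

r2_pop1 = ld_r2(pop1)
r2_pop2 = ld_r2(pop2)
print(f"r^2 in population 1: {r2_pop1:.2f}, in population 2: {r2_pop2:.2f}")
```

In this toy case the marker is a perfect proxy in the first population and nearly uninformative in the second, even though allele frequencies at both sites are identical.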

Population differentiation and positive selection

Haplotype structure and allele frequencies from the HapMap project[9] and 1000 Genomes project[86] provide evidence of both positive and negative selection currently acting on the human lineage. Although the relative importance of population structure and selective sweeps in recent human history is debated[87-89], many non-coding loci show multiple lines of evidence for local adaptation[90].

Utilizing population structure and relatedness

Ultimately, linkage analysis and GWAS are sensitive to complementary genetic architectures, but a wide spectrum of diseases likely exhibit both locus and allele heterogeneity. Because the genomically-distributed signals of association with complex disease are weak, the potential confounding effects of population stratification and cryptic relatedness become especially important to control. Family-based methods such as linkage analysis and the transmission disequilibrium test (TDT) are free of these complications, and have been combined with association tests in a new class of methods[91]. In addition, new methods in phylogenomics and ancestral recombination graph reconstruction provide an opportunity to enhance association studies by explicitly taking population structure and region-specific relatedness into account[92,93].

Aggregate measures of purifying selection

Modeling of allele frequency data[94,95] and sequence divergence data[46] suggests that a large amount of negative selection is occurring outside of mammalian conserved elements, evidence for widespread non-coding function. These same forces can maintain disease-associated alleles at lower frequency in the population dependent on their penetrance and expressivity.

Identifying higher-order relationships between variants

Even when considering genome-wide enrichments of functional annotations in disease-associated regions, the aforementioned methods have so far considered each locus as acting independently and considered their effects as additive. Functional genomics should enable us to consider higher-order interactions between these individual loci, by leveraging functional and variation information to build interaction and regulatory networks. These networks can then guide the search for epistatic effects.

Detecting epistasis de novo

Substantial disagreement exists over the relative importance of epistasis in the genetic basis of complex disease[96-98]. While genetic interactions have been systematically mapped in yeast[99] and cases have been identified in human[66], testing for all possible interactions remains impossible; understandably, detecting epistasis in association studies is an area of intense theoretical interest[66,100,101]. One method[102] successfully discovered epistasis between two taste receptor genes affecting nicotine dependence by using a multifactor dimensionality reduction (MDR) method integrated with linkage information from a pedigree disequilibrium test, similar to the hybrid linkage-association studies described previously[91].

Guiding search for epistasis

Some methods propose to limit the search space for interactions by only searching among the most significant independently-associated loci; this method failed to discover any interactions among the 180 loci reported to be associated with height[103]. Another proposed limit on the search space is with prior knowledge from gene annotations and protein-protein interactions[104-106]. Again, epigenomic maps and improved regulatory annotation hold promise for zeroing in on relevant combinations of SNPs that might be expected to interact.
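The benefit of restricting the search space can be seen from the number of pairwise tests involved. The short calculation below compares all pairs among the 180 height loci with all pairs among a hypothetical panel of one million SNPs, along with the Bonferroni-corrected significance thresholds at a nominal level of 0.05; the panel size and nominal level are assumptions for illustration.

```python
from math import comb

ALPHA = 0.05  # assumed nominal significance level

pairs_top_loci = comb(180, 2)            # pairs among the 180 height loci
pairs_genome_wide = comb(1_000_000, 2)   # pairs among a hypothetical 10^6-SNP panel

threshold_top = ALPHA / pairs_top_loci
threshold_genome_wide = ALPHA / pairs_genome_wide

print(f"{pairs_top_loci:,} pairs -> threshold {threshold_top:.2e}")
print(f"{pairs_genome_wide:,} pairs -> threshold {threshold_genome_wide:.2e}")
```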

Linking enhancers to their target genes using physical interaction data

Unlike promoters, enhancers pose the dual challenge of both pinpointing their location in vast nonfunctional sequences, and linking them to their target genes. These distal regulatory elements often interact physically with promoters, and technologies to detect these interactions, such as chromatin conformation capture (3C, Hi-C)[107,108] and chromatin interaction paired-end tagging (ChIA-PET)[109], are advancing rapidly.

Linking enhancers to their target genes using cell-to-cell variability

Another way of detecting enhancer-gene relationships is to measure the correlation of these elements’ activity with expression across multiple cell types and conditions. This technique is being used to infer gene regulatory networks in human[35] and model organisms[99,110]. While protein-protein interaction and metabolic networks are the most common types of prior knowledge integrated into existing algorithms, these regulatory networks may provide a more useful starting point in the search for epistasis.
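As a rough sketch of activity-based linking, the example below correlates an enhancer's activity across cell types with the expression of each nearby candidate gene and picks the best-correlated gene. The gene names, activity values, and the use of a plain Pearson correlation are hypothetical simplifications, not the method of ref. 35.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical enhancer activity signal in five cell types.
enhancer_activity = [0.1, 0.9, 0.8, 0.2, 0.7]
# Hypothetical expression of two nearby candidate target genes.
gene_expression = {
    "GENE_A": [1.0, 9.2, 8.1, 2.2, 7.0],
    "GENE_B": [5.0, 5.1, 4.9, 5.2, 5.0],
}

correlations = {g: pearson(enhancer_activity, e) for g, e in gene_expression.items()}
best_gene = max(correlations, key=correlations.get)
print(best_gene, round(correlations[best_gene], 3))
```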

Inferring networks from individual-to-individual variability

Molecular QTL data discovered from inter-individual variation can also be used to help infer regulatory networks[111], which, unlike evidence learned solely from expression patterns, provide unambiguous directionality for causality.

Inferring networks from systematic perturbations

Chemical perturbations of cultured cells have been used for network inference. These experiments are useful not only for their relevance to understanding pharmacological mechanisms, but also for revealing the difference in network topology between normal and cancerous cells[112], including gene-gene and gene-drug interactions relevant to interpreting genetic architecture of cancer.

Artificial selection and drug response experiments in model organisms

While human genetic history and selective pressures are closely intertwined, model organisms offer an opportunity to measure the global effects of selection and the resulting genetic interactions in a controlled setting[113,114]. Model organisms have also proven useful for testing gene-gene[99] and gene-drug[115] interactions on a scale that is impossible in humans.

Functional genomics in a medical setting

While genotyping and sequencing are already becoming commonplace for discovery of disease loci and increasingly for diagnostics in a clinical setting, in the future the democratization of genome-wide molecular profiling technologies will further enable cohort-level molecular association studies and personal functional genomics in a medical setting. These can complement existing genetic and chemical biomarkers with molecular-level diagnostics of disease state.

Functional genomics of disease cohorts

One of the major clinical applications of DNA microarrays was to identify disease-involved genes and to classify disease subtypes by genome-wide expression signatures[116], and disease-associated gene sets from microarrays and now RNA-seq can be used to define biological pathways, such as those in the Molecular Signatures Database (MSigDB)[117]. Similarly, chromatin maps can be compared across lineages or between disease and normal tissue to define sets of regulating loci (Fig. 1d). These sets can be used for enrichment and pathway analysis of GWAS, as described previously.

Epigenome-phenotype association

Microarray-based assays for methylation are now allowing for the first time “epigenome-wide association studies” (EWAS)[118], which identify differentially-methylated sites associated with disease without taking into account genotype (Fig. 1d). Such studies may bypass some of the environmental variability that lowers the penetrance of genetic factors[119]. Integrating family members into EWAS may be especially useful in order to test for imprinting and other parent-of-origin effects.

Genetic association with molecular phenotypes for determining causality

One important future use of molecular QTLs may be to empower Mendelian randomization studies[120,121]. Molecular traits (expression, epigenetic state, or biomarkers) can be important stepping stones between genetic variation and complex phenotypes, but the direction of causality between the molecular trait and the organismal trait can be unclear. A recent study used this method to challenge the idea that raising HDL cholesterol levels reduces risk of myocardial infarction, showing that alleles for higher HDL did not confer the protection from heart disease that would be expected if HDL cholesterol were causal[122].
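The logic of Mendelian randomization with a single genetic instrument can be sketched with the Wald ratio: the variant's effect on the outcome divided by its effect on the molecular trait estimates the causal effect of the trait on the outcome. The effect sizes below are hypothetical, the standard error is a first-order approximation that ignores uncertainty in the variant-trait effect, and this is not a reconstruction of the analysis in ref. 122.

```python
def wald_ratio(beta_variant_trait, beta_variant_outcome, se_variant_outcome):
    """Causal effect estimate of trait on outcome and its approximate
    standard error, using one variant as the instrument."""
    estimate = beta_variant_outcome / beta_variant_trait
    std_err = se_variant_outcome / abs(beta_variant_trait)
    return estimate, std_err


# Hypothetical per-allele effects: the variant raises the biomarker by 0.5 units
# but changes the outcome (log odds of disease) by only 0.01 +/- 0.02.
estimate, std_err = wald_ratio(0.5, 0.01, 0.02)
z_score = estimate / std_err
print(f"causal estimate: {estimate:.3f} +/- {std_err:.3f} (z = {z_score:.2f})")
```

An estimate indistinguishable from zero, despite a clear effect of the variant on the biomarker, is the pattern that argues against the biomarker being causal for the outcome.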

Predicting molecular consequences of rare and private mutations

Once these regulatory mechanisms are predicted from functional genomics and molecular variation, the next challenge is applying this knowledge to rare variants discovered by whole-genome sequencing (Fig. 3d). A goal for regulatory genomics should be to develop models that predict the effect of novel regulatory variants with the same accuracy as existing methods for novel protein-coding variants.

Functional genomics of individuals

Some expression signatures of disease subtypes or progression are already being used clinically, and their use promises to grow. However, analogous to the problem of rare variants discovered through sequencing, clinical functional genomics samples will also exhibit patterns too rare in the population to have been correlated with disease. As a recent pilot study on an individual demonstrates[123], there are both great power and many challenges associated with interpreting such personal -omics profiling, and new computational models are needed that can generalize from the effects of common genetic and functional variation to personal genetics and functional genomics.

Hurdles in biomedical informatics and interoperability

In addition to these conceptual challenges of statistical and computational integration of disparate datasets, each of these topics has relied on extensive data sharing between genomics and medical genetics researchers. However, sharing is still limited due to privacy concerns and informatics challenges of database interoperability. These challenges are even greater for non-genomic datasets such as medical records and drug response, resulting in treasure troves of information remaining unused. To complete the integration of genomics into the drug discovery and target validation pipelines, several additional hurdles need to be overcome:

GWAS P-value sharing

In order to facilitate integrative analysis, GWAS investigators should report the association of all variants, not just those that are most significant. The editorial board of Nature Genetics recently articulated a policy to this effect[124], but concerns remain about sufficiently de-identifying association results in order to protect subject privacy[125]. Procedures in place at central archives such as the NCBI’s database of Genotypes and Phenotypes (dbGaP) and the European Genome-Phenome Archive (EGA) are crucial to balancing the rights of human subjects with the principles of scientific openness.

Database integration

The interoperability of databases remains paramount to integrative analysis. Continuing efforts by the UCSC Genome Browser and the ENSEMBL Genome Browser have facilitated integration of epigenomic and variation data, but better connections to domain-specific knowledge bases such as the GTEx eQTL Browser, dbGaP analyses, and the NHGRI GWAS Catalog[11] would broaden the scope of connections available to geneticists.

Medical record standardization

Medical records have been successfully mined to discover epidemiological patterns[126], adverse drug reactions[127], and disease risk factors and heterogeneity[128]. As electronic medical records become populated with genetic data, cooperation with clinicians will be needed in order to mine patient data for genetic associations with biomarkers and disease, and discover novel patterns of disease heterogeneity[129].

Integration of medical and pharmacogenomics datasets

Ultimately, informatics challenges will need to be resolved in order to connect the resulting molecular predictions to patient records, environmental variables, and drug screening and response databases, so that genomics can become commonplace in clinical practice.

CONCLUSIONS

Data from GWAS and whole-genome sequencing continue to expand the catalog of non-coding variants implicated in human disease, and data from epigenome mapping consortia complemented with regulatory modeling are needed to prioritize candidate causal variants and candidate affected tissues. Thoughtful integration of systematic and manual annotations of gene sets along with higher-resolution functional maps may hold the key to implicating pathways and cell types, both through joint consideration of the many weak additive associations discovered in GWAS as well as in the search for epistatic interactions between variants. Clinically relevant regulatory interactions may then be tested experimentally in the tissues or in vitro experimental conditions that are predicted to recapitulate the phenotype. In addition, an explosion of functional genomics data has been facilitated by high-throughput sequencing technology, allowing “intermediate” molecular phenotypes to be correlated with both organismal phenotype and with genotype. This new type of data can be combined with genetic associations to decipher the mechanisms underlying complex disease.

197 in total

1. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.

Authors: Alexander Stark; Michael F Lin; Pouya Kheradpour; Jakob S Pedersen; Leopold Parts; Joseph W Carlson; Madeline A Crosby; Matthew D Rasmussen; Sushmita Roy; Ameya N Deoras; J Graham Ruby; Julius Brennecke; Emily Hodges; Angie S Hinrichs; Anat Caspi; Benedict Paten; Seung-Won Park; Mira V Han; Morgan L Maeder; Benjamin J Polansky; Bryanne E Robson; Stein Aerts; Jacques van Helden; Bassem Hassan; Donald G Gilbert; Deborah A Eastman; Michael Rice; Michael Weir; Matthew W Hahn; Yongkyu Park; Colin N Dewey; Lior Pachter; W James Kent; David Haussler; Eric C Lai; David P Bartel; Gregory J Hannon; Thomas C Kaufman; Michael B Eisen; Andrew G Clark; Douglas Smith; Susan E Celniker; William M Gelbart; Manolis Kellis
Journal: Nature Date: 2007-11-08 Impact factor: 49.962

2. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors: Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal: Proc Natl Acad Sci U S A Date: 2009-05-27 Impact factor: 11.205

3. A combinatorial approach to detecting gene-gene and gene-environment interactions in family studies.

Authors: Xiang-Yang Lou; Guo-Bo Chen; Lei Yan; Jennie Z Ma; Jamie E Mangold; Jun Zhu; Robert C Elston; Ming D Li
Journal: Am J Hum Genet Date: 2008-10-02 Impact factor: 11.025

4. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling.

Authors: Sari Tuupanen; Mikko Turunen; Rainer Lehtonen; Outi Hallikas; Sakari Vanharanta; Teemu Kivioja; Mikael Björklund; Gonghong Wei; Jian Yan; Iina Niittymäki; Jukka-Pekka Mecklin; Heikki Järvinen; Ari Ristimäki; Mariachiara Di-Bernardo; Phil East; Luis Carvajal-Carmona; Richard S Houlston; Ian Tomlinson; Kimmo Palin; Esko Ukkonen; Auli Karhu; Jussi Taipale; Lauri A Aaltonen
Journal: Nat Genet Date: 2009-06-28 Impact factor: 38.330

5. So much "junk" DNA in our genome.

Authors: S Ohno
Journal: Brookhaven Symp Biol Date: 1972

Review 6. Positive natural selection in the human lineage.

Authors: P C Sabeti; S F Schaffner; B Fry; J Lohmueller; P Varilly; O Shamovsky; A Palma; T S Mikkelsen; D Altshuler; E S Lander
Journal: Science Date: 2006-06-16 Impact factor: 47.728

7. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution.

Authors: Ho Sung Rhee; B Franklin Pugh
Journal: Cell Date: 2011-12-09 Impact factor: 41.582

8. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

Review 9. Construction of a genetic linkage map in man using restriction fragment length polymorphisms.

Authors: D Botstein; R L White; M Skolnick; R W Davis
Journal: Am J Hum Genet Date: 1980-05 Impact factor: 11.025

10. Architecture of the human regulatory network derived from ENCODE data.

Authors: Mark B Gerstein; Anshul Kundaje; Manoj Hariharan; Stephen G Landt; Koon-Kiu Yan; Chao Cheng; Xinmeng Jasmine Mu; Ekta Khurana; Joel Rozowsky; Roger Alexander; Renqiang Min; Pedro Alves; Alexej Abyzov; Nick Addleman; Nitin Bhardwaj; Alan P Boyle; Philip Cayting; Alexandra Charos; David Z Chen; Yong Cheng; Declan Clarke; Catharine Eastman; Ghia Euskirchen; Seth Frietze; Yao Fu; Jason Gertz; Fabian Grubert; Arif Harmanci; Preti Jain; Maya Kasowski; Phil Lacroute; Jing Jane Leng; Jin Lian; Hannah Monahan; Henriette O'Geen; Zhengqing Ouyang; E Christopher Partridge; Dorrelyn Patacsil; Florencia Pauli; Debasish Raha; Lucia Ramirez; Timothy E Reddy; Brian Reed; Minyi Shi; Teri Slifer; Jing Wang; Linfeng Wu; Xinqiong Yang; Kevin Y Yip; Gili Zilberman-Schapira; Serafim Batzoglou; Arend Sidow; Peggy J Farnham; Richard M Myers; Sherman M Weissman; Michael Snyder
Journal: Nature Date: 2012-09-06 Impact factor: 49.962
