VannoPortal: multiscale functional annotation of human genetic variants for interrogating molecular mechanism of traits and diseases.

Dandan Huang^1,2, Yao Zhou^1,3, Xianfu Yi⁴, Xutong Fan^1,3, Jianhua Wang^1,3, Hongcheng Yao⁵, Pak Chung Sham⁵, Jihui Hao⁶, Kexin Chen⁷, Mulin Jun Li^1,3,7.

Abstract

Interpreting the molecular mechanisms of genomic variants and their causal relationships with diseases/traits is an important and challenging problem in human genetics. To provide comprehensive and context-specific variant annotations for biologists and clinicians, we systematically integrated over 4 TB of genomic/epigenomic profiles and frequently used annotation databases from various biological domains to develop a variant annotation database called VannoPortal. The database has the following major features: (i) it systematically integrates 40 genome-wide variant annotations and prediction scores covering allele frequency, linkage disequilibrium, evolutionary signature, disease/trait association, tissue/cell type-specific epigenome, base-wise functional prediction, allelic imbalance and pathogenicity; (ii) it is equipped with our recent index system and parallel random-sweep searching algorithms for efficient management of backend databases and information extraction; (iii) it greatly expands context-dependent variant annotation to incorporate large-scale epigenomic maps and regulatory profiles (such as EpiMap) across over 33 tissue/cell types; (iv) it compiles many genome-scale base-wise prediction scores for regulatory/pathogenic variant classification beyond protein-coding regions; (v) it enables fast retrieval and direct comparison of functional evidence among linked variants using a highly interactive web panel in addition to a plain table; (vi) it introduces many visualization functions for more efficient identification and interpretation of functional variants in a single web page. VannoPortal is freely available at http://mulinlab.org/vportal.

Year: 2022 PMID: 34570217 PMCID: PMC8728305 DOI: 10.1093/nar/gkab853

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Genome-wide association studies (GWASs) and large-scale genome sequencing studies have uncovered many genetic variants and somatic mutations associated with different human diseases/traits, yet interpreting the molecular mechanisms of these genomic variations and their causal relationships with disease/trait development remains challenging (1,2). With the growing volume of functional genomic/epigenomic profiling across a large number of human tissue/cell types, from efforts such as the Encyclopedia of DNA Elements (ENCODE) Project (3), the Roadmap Epigenomics Project (4) and the International Human Epigenome Consortium (IHEC) Project (5), context-dependent fine-mapping of causal variants and identification of fine-grained molecular phenotypes that mediate the effect of an investigated variant on a particular disease/trait have become practical. In addition, a number of computational tools have been developed to predict the regulatory potential or pathogenicity of variants genome-wide (6,7), such as the pioneering algorithm GWAVA (8) and the disease-specific model DIVAN (9), which significantly facilitate the characterization of genomic variants at single-base level for interpreting disease/trait development. Despite the great effort of international projects in generating, processing and distributing large amounts of genome/epigenome sequencing data and functional annotations, biologists and clinicians still face great difficulty in collecting, curating and comparing variant information from different resources, and sometimes even need to download huge annotation files or calculate prediction scores manually.
Several variant annotation databases, such as UCSC Variant Annotation Integrator (10), Ensembl Variation Database (11), VarSome (12) and Bystro (13), provide convenient avenues to inspect the genomic and phenotypic features of given variants, but they rarely report the genomic effects of variants in linkage disequilibrium (LD) with the queried variant and offer limited annotations for non-coding variants. Meanwhile, the rapid growth of tissue/cell type-specific and disease/trait-specific variant annotations enables evidence-driven prioritization of candidate causal/pathogenic variants in particular conditions. Unfortunately, existing databases such as RegulomeDB (14) and HaploReg (15) often fail to incorporate the latest context-dependent annotations and genome-scale functional predictions, which are crucial for drawing biologically meaningful conclusions about investigated variants. In this work, by systematically integrating large-scale tissue/cell type-specific genomic/epigenomic profiles, base-wise functional prediction scores and frequently used annotation databases from various biological domains, we developed a novel variant annotation database, VannoPortal, for biologists and clinicians to efficiently retrieve comprehensive and context-specific features, including variant basic information, evolutionary annotation, disease/trait association, variant regulatory potential and variant pathogenicity. VannoPortal leverages multiscale orthogonal evidence to support the functionality or pathogenicity of queried variants. It significantly enlarges the annotation scope to almost all possible substitutions of small variants in the human reference genome, and improves the interpretability of variant annotations through many intuitive visualizations and interactive web components. VannoPortal is free and open to all users, without login or registration, at http://mulinlab.org/vportal.

MATERIALS AND METHODS

Variant basic information and allele frequency

Allele information for known single nucleotide variations (SNVs) and insertions/deletions (indels) was collected from gnomAD r2.0.2 (16), 1000 Genomes Project phase 3 (17) and dbSNP b151 (18). For SNVs, when only a genomic coordinate is provided, the alternative alleles are enumerated from the base in the human reference genome. For customized alleles that conflict with the human reference genome or are absent from known variant databases, only region-level annotation is supported. We applied the Java library Jannovar v0.30 (19) to annotate gene and transcript information. Commonly used allele frequency information for worldwide populations was downloaded from 1000 Genomes Project phase 3 and gnomAD r2.0.2. We also incorporated allele frequencies from other genome sequencing or genotyping projects, including GenomeAsia (20), jMorp (21), ABraOM (22), the UK10K project (23) and UK Biobank (24). CrossMap (25) was used to convert genome coordinates between GRCh37 and GRCh38 when an annotation was not provided for a given genome assembly version.
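The allele enumeration for coordinate-only SNV queries can be illustrated with a minimal sketch. The function name `enumerate_snv_alleles` is hypothetical and is not part of VannoPortal; the sketch only assumes that the reference base at the queried position has already been looked up.

```python
from typing import List

DNA_BASES = "ACGT"


def enumerate_snv_alleles(ref_base: str) -> List[str]:
    """Return the three possible alternative alleles for a reference base.

    Hypothetical helper: given only the reference base at a position,
    every other nucleotide is a candidate substitution.
    """
    ref = ref_base.upper()
    if len(ref) != 1 or ref not in DNA_BASES:
        raise ValueError(f"expected a single base from {DNA_BASES!r}, got {ref_base!r}")
    # Keep alphabetical order so the output is deterministic.
    return [base for base in DNA_BASES if base != ref]


if __name__ == "__main__":
    print(enumerate_snv_alleles("G"))
```

Ambiguous reference bases such as N are rejected here; how the portal itself treats them is not stated in the text.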

Evolutionary information

Most base-wise conservation scores, including PhyloP (27), phastCons (28), GERP (29), fitCons (30) and bStatistic (31), were extracted from CADD v1.4 (26); SiPhy (32) is the exception. As with the CADD score, we calculated a ‘PHRED-scaled’ score for each of these scores from its rank in order of magnitude, which makes the scores comparable to each other. For each score, a likely conserved signal was defined when the ‘PHRED-scaled’ score was >10. Based on genotypes from 1000 Genomes Project phase 3, variant-level positive selection scores were calculated and classified as described for our dbPSHP (33) and the 1000 Genomes Selection Browser (34).
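The rank-based ‘PHRED-scaled’ transformation can be sketched as follows, assuming the CADD-style convention PHRED = -10 · log10(rank / N), where rank 1 is the largest raw score among N scored positions. Under that convention a scaled score of 10 corresponds to the top 10% and 20 to the top 1%. The function name `phred_scale` and the tie handling are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def phred_scale(raw_scores: Sequence[float]) -> List[float]:
    """Convert raw scores to rank-based PHRED-like values.

    Assumption: higher raw score means more conserved, and rank 1 is the
    highest score. Ties are broken by input order (stable sort), which is
    a simplification.
    """
    n = len(raw_scores)
    if n == 0:
        return []
    # Indices ordered from the highest raw score to the lowest.
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / n)
    return scaled


if __name__ == "__main__":
    toy = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8, 1.0]
    for raw, phred in zip(toy, phred_scale(toy)):
        print(f"raw={raw:.1f}  phred={phred:.2f}")
```

Because the transformation depends only on ranks, scores with very different native ranges end up on one shared scale.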

Disease/trait association

LD information for five super populations (AFR, AMR, EAS, EUR, SAS) was calculated using genotypes from 1000 Genomes Project phase 3. Disease/trait-associated variants were collected from the NHGRI-EBI GWAS Catalog v1.0.2 (35). Likely disease/trait-causal variants were downloaded from our CAUSALdb v1.1 (36). Expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) for 54 human tissue/cell types were downloaded from GTEx v8 (37), and information for other types of molecular trait quantitative trait loci (xQTLs) was collected from our QTLbase v1.2 (38). The VarNote random-sweep searching algorithm (39) was used to extract annotations and filter linked variants in LD.
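For readers unfamiliar with the LD measure used for filtering linked variants, the standard definition of r² between two biallelic sites can be sketched from phased haplotypes. The text does not say which software computed LD for VannoPortal, so this is only the textbook formula, with the hypothetical function name `ld_r2` and alleles coded as 0 (reference) and 1 (alternative).

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Compute r^2 between two biallelic sites from phased haplotypes.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    Returns NaN when either site is monomorphic in the sample.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equally long")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the alternative allele at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return float("nan")
    return d * d / denom


if __name__ == "__main__":
    site1 = [0, 1, 0, 1, 1, 0, 0, 1]
    site2 = [0, 1, 0, 1, 1, 0, 1, 1]
    print(f"r2 = {ld_r2(site1, site2):.4f}")
```

Computing this per super population requires only restricting the haplotype vectors to samples from that population.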

Regulatory potential

Context-dependent regulatory variant prediction scores for 127 tissue/cell types were integrated from cepip (40), GenoSkyline-Plus (41), FUN-LDA (42), GenoNet (43) and FitCons2 (44). The combined score of tissue/cell type-specific regulatory potential was calculated by rank product. Based on consolidated and imputed epigenomes of 127 human tissue/cell types from Roadmap Epigenomics (4) and 869 samples from EpiMap (45), we intersected each query variant with narrow peaks of histone marks, transcription factor (TF) binding (measured by chromatin immunoprecipitation sequencing (ChIP-seq)) and open chromatin (measured by DNase I hypersensitive sites sequencing (DNase-seq) or assay for transposase-accessible chromatin with sequencing (ATAC-seq)) using the VarNote random-access function. Significant Hi-C interactions at 5 kb resolution for 60 tissue/cell types were taken from our GWAS4D (46), and a virtual 4C diagram anchored at the query variant locus was plotted using CHiCP (47). Motif information for 136 TFs was collected from CIS-BP (48), JASPAR (49) and ENCODE-motifs (50). Changes in binding affinity between the alleles of a query variant were estimated according to our previous method (51). Significant TF-binding ChIP-seq peaks in different tissue/cell types were systematically integrated from CistromeDB (52), DeepBlueR (53), GTRD (54) and EpiMap (45). We also incorporated allelic imbalance evidence of chromatin accessibility and TF binding from multiple studies (55,56).
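The rank product used to combine tool-specific scores can be sketched as the geometric mean of each variant's rank across tools. The text does not specify how VannoPortal normalises or orients the final combined score, so the details here are assumptions: rank 1 is the strongest prediction within a tool, a smaller rank product means stronger combined support, and the tool names are used only as dictionary keys for toy data.

```python
import math
from typing import Dict, List, Sequence


def rank_descending(values: Sequence[float]) -> List[int]:
    """Rank values so that the largest value gets rank 1 (ties by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(scores_by_tool: Dict[str, Sequence[float]]) -> List[float]:
    """Geometric mean of per-tool ranks for each variant."""
    if not scores_by_tool:
        raise ValueError("at least one tool is required")
    per_tool_ranks = [rank_descending(s) for s in scores_by_tool.values()]
    n_variants = len(per_tool_ranks[0])
    if any(len(r) != n_variants for r in per_tool_ranks):
        raise ValueError("every tool must score the same set of variants")
    k = len(per_tool_ranks)
    return [
        math.prod(ranks[i] for ranks in per_tool_ranks) ** (1.0 / k)
        for i in range(n_variants)
    ]


if __name__ == "__main__":
    # Toy scores for three variants in LD; the values are invented.
    toy = {"cepip": [0.9, 0.1, 0.5], "GenoNet": [0.8, 0.2, 0.3]}
    print(rank_product(toy))
```

Working on ranks rather than raw scores avoids having to calibrate tools whose outputs live on different scales.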

Pathogenicity

Genome-scale base-wise prediction scores for pathogenic and cancer driver regulatory variants were downloaded from RegBase-PAT and RegBase-CAN (7). Based on the Youden's J statistic derived from the trained model of each tool, query variants are classified as likely pathogenic or neutral. Nonsynonymous SNV pathogenicity scores were downloaded from dbNSFP v4.1a (57). Prediction scores for splicing-altering potential were retrieved from dbscSNV (58), S-CAP (59) and SpliceAI (60). ClinVar (61) was used to annotate genomic variation and its relationship to human health. Aggregated mutation datasets from COSMIC (62) and ICGC (63) were used to annotate somatic recurrence in cancers. Finally, CIViC (64) was used to annotate mutation-dependent effects on cancer drug treatment.
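Choosing a classification cutoff with Youden's J statistic can be sketched as below. J is sensitivity + specificity - 1, and the cutoff that maximises J on labelled data is taken as the boundary between likely pathogenic and neutral. The function name `youden_cutoff` and the toy data are illustrative assumptions; the actual cutoffs in VannoPortal come from each tool's trained model.

```python
from typing import List, Sequence, Tuple


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A variant is called positive when score >= cutoff. Labels are 1 for
    pathogenic and 0 for neutral; both classes must be present.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_cutoff, best_j = float("nan"), float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    toy_scores: List[float] = [0.1, 0.2, 0.8, 0.9]
    toy_labels: List[int] = [0, 0, 1, 1]
    print(youden_cutoff(toy_scores, toy_labels))
```

The exhaustive scan over observed scores is quadratic and is meant for clarity; a sorted single pass would be preferable on large benchmark sets.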

Database design and annotation retrieval strategy

VannoPortal is built on a Java-based web framework. Several interactive web pages are implemented with D3.js, jQuery and related JavaScript modules. To ensure fast retrieval of relevant information from huge annotation databases, each annotation file was converted to a BED, VCF or 1-based tabular text file, then compressed and indexed by VarNote. The parallel random-sweep searching or independent random-access strategies of VarNote were used to ensure highly efficient queries.
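To illustrate why an index allows a variant to be intersected with region-level annotations without scanning whole files, here is a small in-memory sketch of a fixed-size binned index. It is not VarNote's index format or its random-sweep algorithm, which are described in reference (39); the class name `BinnedIndex`, the bin size and the record layout are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

BIN_SIZE = 16_384  # hypothetical bin width in base pairs

Record = Tuple[int, int, str]  # (start, end, label), 1-based inclusive


class BinnedIndex:
    """Toy interval index: each interval is registered in every bin it spans."""

    def __init__(self) -> None:
        self._bins: DefaultDict[Tuple[str, int], List[Record]] = defaultdict(list)

    def add(self, chrom: str, start: int, end: int, label: str) -> None:
        if start > end:
            raise ValueError("start must not exceed end")
        for bin_id in range(start // BIN_SIZE, end // BIN_SIZE + 1):
            self._bins[(chrom, bin_id)].append((start, end, label))

    def query(self, chrom: str, pos: int) -> List[str]:
        """Return labels of all intervals that contain the position."""
        # Only one bin is inspected, so lookup cost does not grow with file size.
        candidates = self._bins.get((chrom, pos // BIN_SIZE), [])
        return [label for start, end, label in candidates if start <= pos <= end]


if __name__ == "__main__":
    index = BinnedIndex()
    index.add("chr1", 100, 200, "H3K27ac_peak")
    index.add("chr1", 16_000, 17_000, "DNase_peak")
    print(index.query("chr1", 150))
    print(index.query("chr1", 16_500))
```

On-disk schemes follow the same idea but map bins to file offsets in a compressed file rather than to Python lists.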

RESULTS

Data summary of VannoPortal

By systematically integrating genomic/epigenomic profiles and variant annotations from various biological domains, the initial version of VannoPortal contains 40 independent variant-level and region-level annotations archived in over 4 TB of indexed annotation files (Supplementary Table S1). To simplify biological interpretation, VannoPortal classifies these annotations into five major categories: variant basic information, evolutionary annotation, disease/trait association, variant regulatory potential and variant pathogenicity. Specifically, (i) in the ‘variant basic information’ annotation, VannoPortal reports the genomic attributes, affected genes and transcripts, and worldwide allele frequencies. In addition to the 1000 Genomes Project (17) and gnomAD (16), VannoPortal also incorporates allele frequency information from other genome sequencing projects. (ii) In ‘evolutionary annotation’, VannoPortal provides a comprehensive aggregation of 11 base-wise conservation scores and 13 variant-level scores regarding positive/negative selection in recent human evolution, which could benefit the identification of functional variants from an evolutionary perspective. (iii) In ‘disease/trait association’, VannoPortal collects disease/trait-associated signals and credible variants identified by GWAS, and molecular trait QTLs across most human tissue/cell types. By leveraging population-specific LD information, this disease/trait association evidence can be easily compared among correlated variants in VannoPortal. (iv) Since interpreting non-coding regulatory variants is challenging, VannoPortal comprehensively integrates large-scale tissue/cell type-specific epigenomes and functional predictions in the ‘regulatory potential’ section. For example, context-dependent prioritization of regulatory potential among high-LD variants enables the identification of potentially causal regulatory variants in phenotypically relevant tissue/cell types; mapping a variant locus to critical histone marks, chromatin states and TF binding sites across hundreds of tissue/cell type-specific samples from the Roadmap Epigenomics (4) or EpiMap (45) projects greatly facilitates understanding of the regulatory code underlying the investigated variant; and linking a variant to its target genes or affected regulators can further pinpoint the molecular mechanism and direct functional follow-up. (v) Finally, in the ‘variant pathogenicity’ annotation, VannoPortal not only includes deleteriousness scores for missense and splicing-altering variants, but also summarizes multiple genome-scale predictions and evidence to interpret pathogenic variants for disease progression and targeted therapy (Figure 1).

Figure 1.

Database architecture, function structure and representative features.

Advanced features of VannoPortal over existing databases

The key design principle of VannoPortal is to avoid simple aggregation of existing annotation databases, and to advocate evidence-driven interpretation and prioritization. To this end, VannoPortal has the following distinctive features and improvements compared with existing databases. First, VannoPortal is equipped with our recent index system and parallel random-sweep searching algorithms for efficient management of backend databases and information extraction (39). It takes only seconds to randomly access or screen terabyte-level annotation datasets for each independent query. In particular, VannoPortal allows fast retrieval and direct comparison of functional annotations among variants in LD by providing several interactive panels, while existing databases, such as Ensembl Variation Database (11) and VarSome (12), annotate only a single variant with suboptimal efficiency. Second, VannoPortal incorporates many base-wise and genome-scale features to annotate SNVs and indels, which enlarges the annotation scope to almost all possible substitutions of small variants in the human reference genome, whereas most existing databases provide limited information for variants outside protein-coding regions. Third, VannoPortal provides genome-wide, multiscale and orthogonal evidence regarding whether a variant is functional. For example, to evaluate whether a given variant has regulatory, pathogenic or cancer-driver potential, multiple prediction scores and phenotypic evidence are reported. Fourth, compared with the commonly used HaploReg (15) and RegulomeDB (14) for regulatory variant annotation, VannoPortal greatly expands context-dependent variant annotation to incorporate large-scale epigenomic maps and regulatory profiles across over 33 tissue/cell types and thousands of biosamples. Finally, VannoPortal focuses more on the interpretability of variant annotations than on information enumeration. For instance, all genome-scale prediction scores were transformed to comparable values and then classified into meaningful variant consequences.

Database usage

VannoPortal accepts many query formats, including dbSNP ID, VCF-like, HGVS and even genomic coordinates alone. Both the GRCh37/hg19 and GRCh38/hg38 human genome assemblies are supported. For known SNVs and indels, VannoPortal automatically extracts all allele information from the backend database and provides allele-specific annotation switching if multiple alternative alleles are reported. For rare, somatic or unobserved SNVs and indels, VannoPortal allows customized alleles in several region-level annotation sections. The query result page of VannoPortal incorporates five major annotation sections (variant basic information, evolution, phenotype, regulatory potential and pathogenicity) as well as several sub-categories in each section. The navigation bar displays the annotation hit status of a query variant for each sub-category. When the name of a sub-category is clicked, the page scrolls to the detailed panel of the corresponding item (Figure 2).
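Distinguishing the query formats listed above can be sketched with a few regular expressions. The exact syntax accepted by the VannoPortal search box is not specified in the text, so the patterns, the separator choices and the function name `classify_query` are assumptions made for illustration only.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Chromosome names 1-22, X, Y, M or MT, with an optional "chr" prefix.
_CHROM = r"(?:chr)?(?P<chrom>1\d|2[0-2]|[1-9]|X|Y|MT?)"

# Order matters: more specific patterns are tried first.
QUERY_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("dbsnp", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("hgvs", re.compile(
        rf"^{_CHROM}:g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")),
    ("vcf_like", re.compile(
        rf"^{_CHROM}[:\-\s](?P<pos>\d+)[:\-\s](?P<ref>[ACGT]+)[:\-\s](?P<alt>[ACGT]+)$")),
    ("coordinate", re.compile(rf"^{_CHROM}:(?P<pos>\d+)$")),
]


def classify_query(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (format name, parsed fields); ("unknown", {}) if nothing matches."""
    query = text.strip()
    for name, pattern in QUERY_PATTERNS:
        match = pattern.match(query)
        if match:
            return name, match.groupdict()
    return "unknown", {}


if __name__ == "__main__":
    for example in ["rs12740374", "chr5:g.1295228G>A", "5-1295228-G-A", "chr5:1295228"]:
        print(example, "->", classify_query(example))
```

A coordinate-only match carries no alleles, which is the case where alternative alleles would have to be enumerated from the reference base.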

Figure 2.

Result page and distinctive web components of VannoPortal. (A) Conservation scores and positive selection scores in the ‘Evolution’ panel. (B) A composite viewer showing LD structure, disease/trait association tracks and evidence table in the ‘Phenotype’ panel. (C) Tissue/cell type-specific regulatory variant prioritization function in the ‘Regulatory potential’ panel. (D) Two rich tables displaying critical histone marks, chromatin states, and TF binding sites across hundreds of tissue/cell type-specific samples at variant locus, from Roadmap Epigenomics or EpiMap projects. (E) A circular plot showing significant 5 kb Hi-C chromatin interactions between variant locus and its target region. (F) Real-time motif scanning table for the predicted allele-specific effect of TF binding. (G) Genome-scale pathogenic scores in the ‘Pathogenicity’ panel.

In the left panel of the result page: (i) ‘Variant basic information’ panel shows genomic position, allele information, dbSNP ID, transcript annotation and allele frequencies from different populations. The page can be redirected to the original database page for details once clicking on different arrowhead links (Figure 2). (ii) ‘Evolution’ panel reports base-wise conservation scores and variant-level scores regarding positive/negative selection in recent human evolution. Note that the scores beyond empirical cutoffs were labeled as ‘likely conserved’ or ‘likely influenced by selection or population history’ or other noteworthy signatures (Figure 2A). (iii) ‘Phenotype’ panel incorporates an interactive LD viewer along with some disease/trait association tracks, including disease/trait-associated evidence and eQTL/sQTL/xQTL hits. Users can click each variant in the plot or vertical bar in the evidence tracks to check the summary information of supporting evidence. By selecting the dropdown list or dragging the slider bar, users can adjust the population, LD r² cutoff and LD window size to filter out variants.
As the LD threshold changes, the bottom table lists the LD information and the amount of supporting evidence for all correlated variants (Figure 2B). More detailed information on disease/trait associations is displayed in separate table viewers. (iv) The ‘Regulatory potential’ panel presents tissue/cell type-specific functional predictions, epigenomic signals and TF binding evidence from different aspects. By assigning a desired tissue/cell type and adjusting LD parameters, the query variant can be prioritized together with all linked variants, and a combined ranking score based on the five state-of-the-art prediction scores is calculated for each variant within the LD region (Figure 2C). Importantly, in two rich table viewers, users can comprehensively inspect the chromatin states and epigenomic features at the variant locus across 127 Roadmap Epigenomics tissue/cell types and 869 EpiMap samples. Clicking on a tissue name unfolds the view to cell type level in the EpiMap viewers (Figure 2D). In addition, according to the user-selected tissue/cell type, an interactive circular plot displays the most significant 5 kb chromatin interactions anchored at the variant-containing locus (Figure 2E). When users click on an interaction arc, chromatin marks within the interacting 5 kb bins are displayed. Last, users can easily check the predicted changes in TF binding affinity through real-time motif scanning, TF binding evidence from public ChIP-seq peaks, and allele-specific footprint events in several rich table viewers (Figure 2F). (v) The ‘Pathogenicity’ panel enumerates many genome-scale pathogenicity prediction scores, deleteriousness scores for missense and splicing-altering variants, as well as cancer driver prediction scores for somatic mutations (Figure 2G). According to the classification of each prediction score, users can easily determine whether the query variant is likely pathogenic in a certain context. 
Known health-associated evidence and therapeutic implications are also listed in separate tables. Finally, users can download all of the functional predictions and annotation information for each query variant by simply clicking the download button at the top right of the result page or by RESTful API.
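The allele-specific motif scanning shown in the ‘Regulatory potential’ panel can be illustrated with a generic position weight matrix (PWM) comparison. This is not the method of reference (51), which is not described in the text; it is a common simplified approach in which each allele's sequence is scanned with a log-odds PWM and the best window scores are compared. The function names, the forward-strand-only scan and the toy matrix are assumptions.

```python
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one {base: log-odds} dictionary per motif position


def window_score(window: str, pwm: Pwm) -> float:
    """Sum the log-odds contributions of each base in a motif-length window."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def best_score_at(seq: str, variant_idx: int, pwm: Pwm) -> float:
    """Best PWM score over all forward-strand windows covering the variant."""
    width = len(pwm)
    first = max(0, variant_idx - width + 1)
    last = min(variant_idx, len(seq) - width)
    if last < first:
        raise ValueError("sequence is too short for this motif")
    return max(window_score(seq[s:s + width], pwm) for s in range(first, last + 1))


def allele_effect(left: str, ref: str, alt: str, right: str, pwm: Pwm) -> float:
    """Score difference (alt minus ref); positive values suggest a gained site."""
    idx = len(left)
    ref_best = best_score_at(left + ref + right, idx, pwm)
    alt_best = best_score_at(left + alt + right, idx, pwm)
    return alt_best - ref_best


if __name__ == "__main__":
    # Toy 3-bp motif favouring "TGA"; the weights are invented for illustration.
    toy_pwm: Pwm = [
        {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
        {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
        {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    ]
    print(allele_effect("CC", "C", "G", "AC", toy_pwm))
```

A production scan would also consider the reverse complement strand and convert scores to P-values against a background model before reporting a gain or loss of binding.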

Case studies

To investigate the reliability and practicality of VannoPortal for identifying potentially causal variants in different genetic study scenarios, we examined several classical or novel loci from published results. (i) For common regulatory variants revealed by GWAS, we used an experimentally validated causal variant, rs12740374, which alters plasma low-density lipoprotein cholesterol (LDL-C) by modulating hepatic very low-density lipoprotein secretion (65), to test whether VannoPortal could precisely annotate the variant effect. Consistent with the reported findings, VannoPortal reveals many lines of evidence for causality in cholesterol traits and molecular trait QTLs (Supplementary Figure S1A). In the context of the cholesterol trait-relevant cell type HepG2, VannoPortal successfully prioritizes rs12740374 as a top regulatory variant with the highest combined score among LD variants (Supplementary Figure S1B). Epigenomic annotations also show that rs12740374 is located in active chromatin and harbors EP300 and cohesin binding signals across many tissue/cell types (Supplementary Figure S1C). Notably, in agreement with published results (65), the VannoPortal motif scanning result shows that rs12740374 may create a CEBPA transcription factor binding site (Supplementary Figure S1D). (ii) We also examined a low-frequency variant, rs74956615, associated with coronavirus disease 2019 (COVID-19) (66–68). This variant, located near the gene that encodes tyrosine kinase 2 (TYK2), has been documented to confer risk for critical illness in COVID-19. Based on LD in the EUR population, VannoPortal links this variant to a TYK2 missense variant, rs34536443 (r² = 0.8332), which is significantly associated with susceptibility to many autoimmune diseases (Supplementary Figure S2A). Searching for rs34536443 reveals that it can affect different isoforms of TYK2, and that its minor C allele is completely absent in the East Asian population (Supplementary Figure S2B). 
Both conservation scores and pathogenicity scores from VannoPortal also support the likely damaging role of this variant (Supplementary Figure S2C–E). (iii) For rare pathogenic variants, we queried rs12565, which was previously found to cause cardiovascular diseases by altering the recruitment of REST to the target gene NPPA (69). Interestingly, this non-coding variant exhibits very high conservation scores (Supplementary Figure S3A) and shows active chromatin states only in heart tissues, including open chromatin marked by a DNase-seq peak and the histone modifications H3K4me3, H3K4me1 and H3K27ac (Supplementary Figure S3B). Both public TF ChIP-seq data and motif scanning results indicate that rs12565 may modulate the binding affinity of REST (Supplementary Figure S3C, D). In addition, genome-scale pathogenicity scores from VannoPortal consistently show that this non-coding variant is likely pathogenic (Supplementary Figure S3E). (iv) For somatic cancer-driver mutations, we inspected a well-known pan-cancer mutation, chr5:g.1295228:G > A (GRCh37, rs1242535815), located –124 bp upstream in the TERT promoter, which reactivates TERT expression by recruiting the TF GABP (70). The oncogenicity and regulatory mechanism underlying this mutation are well supported by VannoPortal, including overactive chromatin states in cancers (Figure 3A), increased HDAC1 and GABPA binding (Figure 3B), as well as many lines of cancer-driver evidence and therapeutic implications (Figure 3C–E).

Figure 3.

Supporting evidence from VannoPortal for the regulatory potential and cancer-driven roles of chr5:g.1295228:G > A (GRCh37, rs1242535815). (A) rs1242535815 overlaps active chromatin states (e.g. DNase-seq and ATAC-seq), histone marks (e.g. H3K27ac, H3K4me2, H3K4me3 and H3K9ac) and TF binding sites (e.g. POLR2A, RAD21 and SMC3) across many human tissues, particularly in cancers. (B) rs1242535815 A allele potentially increases the binding affinity of HDAC1 and GABPA. (C) rs1242535815 is a likely pathogenic mutation supported by many genome-scale base-wise pathogenicity prediction methods. (D) rs1242535815 is a highly recurrent mutation in cancer patients. (E) rs1242535815 is a likely cancer driver mutation supported by regBase-CAN and other tools, and it could be a prognostic marker in cancer therapy.

CONCLUSIONS

VannoPortal systematically incorporates many new genome-scale and context-dependent variant annotation resources from various biological domains, particularly for variants outside protein-coding regions. Using many intuitive visualizations and interactive web components, it focuses on the interpretability of variant annotations rather than simple aggregation of known information, and enables direct comparison of functional evidence (e.g. disease/trait association, tissue/cell type-specific regulatory potential) between a query variant and its linked variants without multi-round queries. Along with the rapid evolution of advanced biotechnologies and new genetic findings (71,72), VannoPortal will continue to update the existing annotation databases and introduce more advanced features, such as prioritization of target genes for non-coding regulatory variants, integration of more prediction scores for variants affecting post-transcriptional and translational processes, support for large variant annotation, and incorporation of genetics-based translational medicine data (73,74). Because the assumption of independence between base positions in a sequence motif is suboptimal, we will combine large-scale tissue/cell type-specific open chromatin profiles (e.g. DNase-seq and ATAC-seq) with powerful statistical methods (e.g. gkm-SVM (75) and KMAC (76)) to annotate the most plausible TFs associated with regulatory variants. We believe that this novel platform will help researchers interrogate the biological functions of genomic variations and will have a significant impact in the era of human genetics and genomics.

76 in total

1. The support of human genetic evidence for approved drug indications.

Authors: Matthew R Nelson; Hannah Tipney; Jeffery L Painter; Judong Shen; Paola Nicoletti; Yufeng Shen; Aris Floratos; Pak Chung Sham; Mulin Jun Li; Junwen Wang; Lon R Cardon; John C Whittaker; Philippe Sanseau
Journal: Nat Genet Date: 2015-06-29 Impact factor: 38.330

2. Exomic variants of an elderly cohort of Brazilians in the ABraOM database.

Authors: Michel Satya Naslavsky; Guilherme Lopes Yamamoto; Tatiana Ferreira de Almeida; Suzana A M Ezquina; Daniele Yumi Sunaga; Nam Pho; Daniel Bozoklian; Tatiana Orli Milkewitz Sandberg; Luciano Abreu Brito; Monize Lazar; Danilo Vicensotto Bernardo; Edson Amaro; Yeda A O Duarte; Maria Lúcia Lebrão; Maria Rita Passos-Bueno; Mayana Zatz
Journal: Hum Mutat Date: 2017-05-03 Impact factor: 4.878

3. ClinVar: improvements to accessing data.

Authors: Melissa J Landrum; Shanmuga Chitipiralla; Garth R Brown; Chao Chen; Baoshan Gu; Jennifer Hart; Douglas Hoffman; Wonhee Jang; Kuljeet Kaur; Chunlei Liu; Vitaly Lyoshin; Zenith Maddipatla; Rama Maiti; Joseph Mitchell; Nuala O'Leary; George R Riley; Wenyao Shi; George Zhou; Valerie Schneider; Donna Maglott; J Bradley Holmes; Brandi L Kattman
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

4. The support of genetic evidence for cardiovascular risk induced by antineoplastic drugs.

Authors: Hui Cui; Shengkai Zuo; Zipeng Liu; Huanhuan Liu; Jianhua Wang; Tianyi You; Zhanye Zheng; Yao Zhou; Xinyi Qian; Hongcheng Yao; Lu Xie; Tong Liu; Pak Chung Sham; Ying Yu; Mulin Jun Li
Journal: Sci Adv Date: 2020-10-14 Impact factor: 14.136

5. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer.

Authors: Malachi Griffith; Nicholas C Spies; Kilannin Krysiak; Joshua F McMichael; Adam C Coffman; Arpad M Danos; Benjamin J Ainscough; Cody A Ramirez; Damian T Rieke; Lynzey Kujan; Erica K Barnell; Alex H Wagner; Zachary L Skidmore; Amber Wollam; Connor J Liu; Martin R Jones; Rachel L Bilski; Robert Lesurf; Yan-Yang Feng; Nakul M Shah; Melika Bonakdar; Lee Trani; Matthew Matlock; Avinash Ramu; Katie M Campbell; Gregory C Spies; Aaron P Graubert; Karthik Gangavarapu; James M Eldred; David E Larson; Jason R Walker; Benjamin M Good; Chunlei Wu; Andrew I Su; Rodrigo Dienstmann; Adam A Margolin; David Tamborero; Nuria Lopez-Bigas; Steven J M Jones; Ron Bose; David H Spencer; Lukas D Wartman; Richard K Wilson; Elaine R Mardis; Obi L Griffith
Journal: Nat Genet Date: 2017-01-31 Impact factor: 38.330

6. Regulatory genomic circuitry of human disease loci by integrative epigenomics.

Authors: Carles A Boix; Benjamin T James; Yongjin P Park; Wouter Meuleman; Manolis Kellis
Journal: Nature Date: 2021-02-03 Impact factor: 49.962

7. Identifying novel constrained elements by exploiting biased substitution patterns.

Authors: Manuel Garber; Mitchell Guttman; Michele Clamp; Michael C Zody; Nir Friedman; Xiaohui Xie
Journal: Bioinformatics Date: 2009-06-15 Impact factor: 6.937

8. Determination and inference of eukaryotic transcription factor sequence specificity.

Authors: Matthew T Weirauch; Ally Yang; Mihai Albu; Atina G Cote; Alejandro Montenegro-Montero; Philipp Drewe; Hamed S Najafabadi; Samuel A Lambert; Ishminder Mann; Kate Cook; Hong Zheng; Alejandra Goity; Harm van Bakel; Jean-Claude Lozano; Mary Galli; Mathew G Lewsey; Eryong Huang; Tuhin Mukherjee; Xiaoting Chen; John S Reece-Hoyes; Sridhar Govindarajan; Gad Shaulsky; Albertha J M Walhout; François-Yves Bouget; Gunnar Ratsch; Luis F Larrondo; Joseph R Ecker; Timothy R Hughes
Journal: Cell Date: 2014-09-11 Impact factor: 41.582

9. Enhanced regulatory sequence prediction using gapped k-mer features.

Authors: Mahmoud Ghandi; Dongwon Lee; Morteza Mohammad-Noori; Michael A Beer
Journal: PLoS Comput Biol Date: 2014-07-17 Impact factor: 4.475

10. 15 years of genome-wide association studies and no signs of slowing down.

Authors: Ruth J F Loos
Journal: Nat Commun Date: 2020-11-19 Impact factor: 14.919
