
Accurately annotate compound effects of genetic variants using a context-sensitive framework.

Si-Jin Cheng¹, Fang-Yuan Shi¹, Huan Liu¹, Yang Ding¹, Shuai Jiang¹, Nan Liang¹, Ge Gao¹.

Abstract

In genomics, effectively identifying the biological effects of genetic variants is crucial. Current methods handle each variant independently, assuming that each variant acts in a context-free manner. However, variants within the same gene may interfere with each other, producing combinational (compound) rather than individual effects. In this work, we introduce COPE, a gene-centric variant annotation tool that integrates the entire sequential context in evaluating the functional effects of intra-genic variants. Applying COPE to the 1000 Genomes dataset, we identified numerous cases of multiple-variant compound effects that frequently led to false-positive and false-negative loss-of-function calls by conventional variant-centric tools. Specifically, 64 disease-causing mutations were identified to be rescued in a specific genomic context, thus potentially contributing to the buffering effects for highly penetrant deleterious mutations. COPE is freely available for academic use at http://cope.cbi.pku.edu.cn.

Year: 2017 PMID: 28158838 PMCID: PMC5449550 DOI: 10.1093/nar/gkx041

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Tremendous advances in high-throughput sequencing technologies have enabled several large-scale human genome sequencing projects, such as the 1000 Genomes Project (1) and the Personal Genome Project (2), to identify millions of genetic variants in thousands of individual genomes. Consequently, there is a great demand for effectively interpreting these variants (3–5). The majority of current variant annotation tools adopt a variant-centric approach that assesses the functional consequence of each variant independently by assuming that each variant acts in a context-free manner (6–8). Several reports have shown that such an assumption results in both false-positive and false-negative calls (9–11) when multiple variants affect the same gene. A few recently released tools have applied additional filters to identify (and try to correct) the common false-positive calls caused by multiple Single Nucleotide Variants (SNVs) in the same codon (12), multiple indels in the same gene (11) and in-frame alternative acceptor sites (13). However, none of these tools integrate the entire sequential context in addition to their own particular configurations. For an accurate annotation algorithm, the entire sequential context within a gene should be considered together, because multiple variants in the same gene may interfere with each other, thus producing combinational rather than individual effects (e.g., the complementary rescue effect (9)). Here, we present a fully gene-centric variant annotation tool, COPE (Context-Oriented Predictor for variant Effect), for evaluating the effects of variants in a context-sensitive approach. Using each transcript as the basic annotation unit, COPE infers the ‘mutant peptide’ from the entire variant set input and reports the final amino acid alteration through comparison against the reference sequence. 
Incorporating the whole sequence context enables COPE to accurately annotate complex compound effects of multiple genetic variants, such as alternative isoforms caused by gain/loss of splicing sites, which cannot be handled by previous tools (11–13) (Figure 1B, also see Supplementary Tables S1 and S5). The web server and source code of COPE are freely available for academic use at http://cope.cbi.pku.edu.cn/.

Figure 1.

(A) Overview of COPE. COPE uses each transcript as a basic annotation unit. The variant mapping step identifies variants within transcripts. The coding region inference step removes introns from each transcript; all possible splicing patterns are taken into consideration for splice-altering transcripts (in this case, the red dot indicates a splice acceptor site SNP, and intron retention and exon skipping are considered). The sequence comparison step compares a ‘mutant peptide’ against a reference protein sequence to obtain the final amino acid alteration. (B) Schematic diagram of typical types of annotation corrections implemented in COPE. A rescued stop-gained SNV indicates that another SNV (‘A’ to ‘C’) in the same codon rescues a variant-centric stop-gained SNV (‘A’ to ‘T’). Stop-gained MNV indicates that two or more SNVs result in a stop codon (‘A’ to ‘T’ and ‘C’ to ‘G’). A rescued frameshift indel indicates that another indel in the same haplotype recovers the original open reading frame. A splicing-rescued stop-gained/frameshift variant indicates that a stop-gained or frameshift variant is rescued by a novel splicing isoform. A rescued splice-disrupting variant indicates that a splice-disrupting variant is rescued by a nearby cryptic site (as shown in the figure) or a novel splice site. The asterisk in the figure indicates a stop codon.

MATERIALS AND METHODS

Overview of COPE

COPE is a framework for predicting the effects of variants through a context-sensitive, gene-centric approach (Figure 1A). First, genetic variants are mapped to protein-coding genes derived from a user-supplied reference gene model such as RefSeq (14). Using the phase information, COPE handles the two haplotypes separately. Then, COPE tries to reconstruct the ‘mutant peptide’ from the entire input variant set. Briefly, COPE attempts to identify splicing-changing variants (i.e. variants that disrupt existing splice sites or create novel splice sites), and, if a splicing-changing variant is found, new isoforms are inferred accordingly (Supplementary Figure S1). Finally, COPE translates all coding sequences into amino acid sequences and compares them against the reference sequence to obtain the final amino acid alterations.
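The difference between variant-centric and gene-centric annotation can be illustrated at the level of a single codon. The following is a minimal sketch, not COPE's implementation: the function names, the reference codon and the two SNVs are hypothetical, and the codon table covers only the codons needed for this illustration.

```python
# Minimal codon table: only the codons used in this illustration.
CODON_TABLE = {"GAA": "Glu", "TAA": "Stop", "GTA": "Val", "TTA": "Leu"}


def apply_snvs(codon, snvs):
    """Return the codon after applying SNVs given as (offset, alt_base) pairs."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def variant_centric(codon, snvs):
    """Annotate each SNV as if it were the only change in the codon."""
    return [CODON_TABLE[apply_snvs(codon, [snv])] for snv in snvs]


def gene_centric(codon, snvs):
    """Annotate all SNVs carried by the same haplotype together."""
    return CODON_TABLE[apply_snvs(codon, snvs)]


reference = "GAA"            # hypothetical Glu codon
snvs = [(0, "T"), (1, "T")]  # two hypothetical SNVs on one haplotype
print(variant_centric(reference, snvs))  # ['Stop', 'Val']
print(gene_centric(reference, snvs))     # Leu
```

Evaluated one at a time, the first SNV is called stop-gained; evaluated jointly, the two SNVs yield a single amino acid substitution, which corresponds to the rescued stop-gained case in Figure 1B.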

Transcript inference

The accurate inference of a ‘mutant’ transcript is the most important step in the COPE pipeline. We used MaxEntScan (15), a commonly used splice sequence scoring tool, to identify splice site gain/loss events. Inspired by the results of Jian et al. (16), we used relative score variation to measure the scale of change caused by the variant and adopted the cut-off recommended in their paper (16). We evaluated the performance of isoform inference in COPE by following the protocol from Jian et al. (16). Briefly, for splice site gain events, the positive set was derived from a publication by Stein et al. (17); the negative dataset was constructed from the 1000 Genomes Project Phase 3 genotype data through the following steps (16): (i) we kept only intronic variants within protein-coding genes of the Ensembl gene model that could be downloaded from ftp://ftp.ensembl.org/pub/release-83/gtf/homo_sapiens/Homo_sapiens.GRCh38.83.gtf.gz; (ii) we discarded variants within multi-transcript genes; (iii) we discarded variants within single-exon genes and (iv) we discarded variants with an allele frequency less than or equal to 0.1. For splice loss events, the positive set was derived from the publication by Jung et al. (18), and the negative dataset was derived from Jian et al. (16). The results showed that COPE is accurate for isoform inference, with a high sensitivity (85.4% for splice gain events and 94.7% for splice loss events) and a high specificity (89.8% for splice gain events and 92.9% for splice loss events) (Supplementary Figure S8).
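As a rough illustration of this evaluation, the sketch below classifies splice loss events from a relative score change and then computes sensitivity and specificity. It rests on several assumptions that are not taken from the paper: the exact definition of the relative change, the cut-off value and the toy scores are all hypothetical, and the scores are not produced by calling MaxEntScan.

```python
def relative_score_change(ref_score, alt_score):
    """Score change relative to the reference score (assumed definition)."""
    if ref_score == 0:
        raise ValueError("reference score must be non-zero")
    return (alt_score - ref_score) / abs(ref_score)


def sensitivity_specificity(labels, predictions):
    """Compute (sensitivity, specificity) from boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical (reference score, variant score, is a true splice loss) triples.
toy_sites = [(8.0, 2.0, True), (6.0, 5.0, False), (7.0, 4.9, True), (5.0, 1.0, False)]
CUTOFF = -0.5  # hypothetical cut-off, not the value recommended by Jian et al.

labels = [is_loss for _, _, is_loss in toy_sites]
predictions = [relative_score_change(ref, alt) <= CUTOFF for ref, alt, _ in toy_sites]
print(sensitivity_specificity(labels, predictions))  # (0.5, 0.5)
```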

Application of COPE

To provide a proof of concept, we applied COPE to the genotype data of a male Caucasian (NA12144), downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. We further applied COPE to a list of 1147 curated high-confidence LoF variants reported by MacArthur et al. (9). Because there was no phase information in the dataset reported in that paper, we used the phased dataset downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. Ninety-three curated LoF variants no longer existed in the current phased dataset (released on 2 May 2013) and were excluded from our analysis. Additionally, to exclude the effects resulting from dataset updates, we further assessed the context of 53 rescued loss-of-function variant candidates (i.e. candidates whose putative damaging effect is neutralized by the specific genomic context) in the original MacArthur et al. dataset (released in July 2010, downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/release/2010_07/) and discarded three variants with different sequential contexts (Supplementary Figure S11). We then extended the analysis to the full 1000 Genomes Project Phase 3 SNV and indel variant set (downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). Inconsistent variant calls (SNVs/indels overlapping with indels) were removed. Then, the SNVs and indels were annotated with VEP by using the RefSeq gene model. Following previous studies (9,13), we removed putative LoF variants that result in less than 10% protein sequence alteration. Finally, 5559 splice-disrupting, 2092 frameshift and 9728 stop-gained variants were fed into COPE for reanalysis.

Validation of the rescued false-positive splice-disrupting variants

To validate the results on rescued false-positive splice-disrupting variants, we used RNA-Seq data to confirm the novel splice junctions predicted by COPE. RNA-Seq data on 445 individuals in the 1000 Genomes Project were obtained from the Geuvadis RNA Sequencing Project (19) (downloaded from EBI ArrayExpress, accession E-GEUV-1). For each rescued false-positive splice-disrupting variant, we extracted the inferred genomic sequence (with intron retained) of each transcript from each individual and then mapped the RNA-Seq reads to the inferred genomic sequence with HISAT2 (20). We used junction reads to validate the rescued false-positive splice-disrupting variants. Briefly, a given junction is called ‘expressed’ if and only if at least one RNA-Seq read spans the junction. A transcript is considered ‘expressed’ in a particular sample when all its reference junctions (i.e. junctions annotated in the reference gene model) are expressed in that sample. A putative novel junction identified by COPE in an expressed transcript is classified as ‘True Positive’ if and only if the junction itself is also called ‘expressed’ (Supplementary Figure S9).
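The three rules above can be written down as a small decision procedure. This is a sketch under stated assumptions rather than the authors' script: junctions are represented here as hypothetical (donor, acceptor) coordinate pairs, read counts are assumed to have been extracted from the alignments beforehand, and the labels other than ‘true positive’ are our own.

```python
def junction_expressed(read_counts, junction, min_reads=1):
    """A junction is expressed if at least `min_reads` reads span it."""
    return read_counts.get(junction, 0) >= min_reads


def transcript_expressed(read_counts, reference_junctions):
    """A transcript is expressed if all of its reference junctions are expressed."""
    return all(junction_expressed(read_counts, j) for j in reference_junctions)


def classify_novel_junction(read_counts, reference_junctions, novel_junction):
    """Classify a predicted novel junction within one sample."""
    if not transcript_expressed(read_counts, reference_junctions):
        return "not evaluated"  # transcript not expressed in this sample
    if junction_expressed(read_counts, novel_junction):
        return "true positive"
    return "not supported"


# Hypothetical junction-spanning read counts for one sample.
counts = {(100, 200): 12, (300, 400): 3, (300, 403): 5}
reference = [(100, 200), (300, 400)]
print(classify_novel_junction(counts, reference, (300, 403)))  # true positive
print(classify_novel_junction(counts, reference, (300, 410)))  # not supported
```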

Search for rescued pathogenic LoF variants in healthy individuals

We compiled a list of pathogenic LoF variants by merging 60,556 LoF variants tagged as ‘DM’ from the HGMD (21) database and 11,777 annotated with the label ‘(likely) pathogenic mutations’ from the ClinVar (22) database. In addition, we downloaded a total of 9,362 disease-associated genes from DisGeNET (23). Then, we searched for pathogenic variants that were rescued in at least one healthy individual from the 1000 Genomes Project and identified 64 pathogenic LoF variants, including 21 from the HGMD database and 43 from DisGeNET (Supplementary Table S4). The variant rs549508773 is a stop-gained SNV within the gene CHD7, a driver gene of CHARGE syndrome. To demonstrate the disease-causing effect of SNP rs549508773, 175 disease-causing (CHARGE syndrome) stop-gained SNVs were collected from the HGMD database. We also collected CADD scores (24) of 43 pathogenic missense variants, 156 pathogenic stop-gained variants and 26 benign missense variants from the CHD7 database (25) to demonstrate the benign effect of the single amino acid substitution resulting from SNP rs549508773 together with SNP rs567756521.

RESULTS

COPE handles complex compound effects of multiple variants correctly

As a gene-centric annotation tool, COPE is able to handle complex compound effects of multiple variants correctly (Figure 1B, also see Supplementary Table S1). For proof-of-concept analysis, we applied COPE to the male Caucasian sample NA12144 from the 1000 Genomes Project. We compared the COPE results with the official variant annotation generated by Variant Effect Predictor (VEP) (6), a commonly used variant-centric annotation tool. COPE corrected two false-positive stop-gained calls, five false-positive frameshift calls, eight false-positive splice-disrupting calls and one false-negative stop-gained call (Supplementary Table S2). For example, VEP called two indels (rs67712719 and rs67322929) in ZFPM1 ‘frameshift variants’ and suggested that both of them lead to a loss-of-function (LoF) event. In contrast, COPE correctly identified the combinational effect of these two variants as one amino acid deletion (Supplementary Figure S2). Similarly, COPE also accurately identified a cryptic splicing site 3 bp downstream of the VEP-reported splice acceptor variant rs1152522 within the C14orf105 gene, which can rescue the splicing at the cost of a single amino acid (glutamine) deletion; this finding was validated by both corresponding RNA-Seq data (Supplementary Figure S3) and an independent report (26). To further assess the performance of COPE, we reanalyzed the manually curated high-confidence LoF variants listed in 1000 Genomes (9). After exclusion of 93 variants that are absent from the current phased release, COPE identified 4.74% of the curated LoF variants (consisting of one stop-gained and 49 splice-disrupting variants) as potential false-positive calls in at least one sample from 1000 Genomes (Supplementary Figure S11 and Table S3).
Further inspection showed that the stop-gained variant was rescued by another SNV in the same codon (Supplementary Figure S4) that had previously been handled incorrectly, and all 49 splice-disrupting variants could be correctly annotated only when the entire sequential context was considered, thus demonstrating the necessity of COPE even after manual variant reannotation.
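The rescued frameshift case, as in the ZFPM1 example above, can be sketched by comparing per-indel and per-haplotype frame changes. The code below is a simplified illustration with hypothetical alleles, not the actual rs67712719 and rs67322929 alleles, and it is not COPE's algorithm: COPE reconstructs and compares the full mutant peptide, whereas this sketch only checks whether the net length change restores the reading frame.

```python
def length_change(ref_allele, alt_allele):
    """Net number of bases inserted (positive) or deleted (negative)."""
    return len(alt_allele) - len(ref_allele)


def annotate_individually(indels):
    """Variant-centric view: each indel is judged on its own."""
    return ["frameshift" if length_change(ref, alt) % 3 else "in-frame"
            for ref, alt in indels]


def annotate_jointly(indels):
    """Gene-centric view: indels on the same haplotype are summed first."""
    net = sum(length_change(ref, alt) for ref, alt in indels)
    return "frameshift" if net % 3 else "in-frame"


# Two hypothetical deletions on the same haplotype of one transcript:
# a 1-bp deletion followed by a 2-bp deletion (net change of -3 bases).
indels = [("CA", "C"), ("GTC", "G")]
print(annotate_individually(indels))  # ['frameshift', 'frameshift']
print(annotate_jointly(indels))       # in-frame
```

Note that a restored reading frame does not by itself guarantee a benign outcome: the residues between the two indels are still translated out of frame and may contain a premature stop codon, which is why comparing the reconstructed peptide against the reference is the more reliable check.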

Application to genotype data from the 1000 Genomes Project

We then extended the analysis to the full 1000 Genomes Project Phase 3 SNV and indel set. All LoF variants reported by VEP, including splice-disrupting variants (in either donor or acceptor site), frameshift indels, and stop-gained SNVs, were extracted and reanalyzed by COPE. Unexpectedly, we found that a total of 1290 (23.21%) reported splice-disrupting variants were rescued in their particular sequential context, of which 1251 (97.0%) were rescued by in-frame cryptic splice sites within 100 bp. An average of 39.6% of the VEP-annotated splice-disrupting variants in each individual were rescued (Figure 2A). On the basis of the Geuvadis RNA-Seq data (19), we validated 78 (79.6%) out of 98 rescued false-positive splice-disrupting variants supported by 129 expressed transcripts. Additionally, COPE also identified 6.45% (135 out of 2092, Figure 2A) of the reported frameshift indels and 2.10% (204 out of 9728, Figure 2A) of the reported stop-gained SNVs as false-positive calls. Statistical analysis showed that the 1398 genes containing these false-positive LoF calls were likely to be involved in several particular biological processes, including adhesion and signal transduction, thus suggesting a systematic bias in the variant-centric function calling tool (Figure 2B), although this bias did not significantly change the global functional spectrum of LoF genes (Supplementary Figure S12). Notably, by incorporating the entire sequential context, COPE was also able to identify false-negative LoF variants, such as stop-gained MNVs (i.e. multiple co-occurring SNVs in the same codon that jointly introduce a new STOP codon), which are usually neglected by variant-centric tools. We identified 38 stop-gained MNVs in 38 genes, including TNFRSF10D, encoding a member of the TNF-receptor superfamily, and NEUROG3, encoding a transcription factor involved in neurogenesis (Figure 2C).
Unexpectedly, we found that 78% (1960 of 2504) of the individuals sequenced by the 1000 Genomes Project harbored a stop-gained MNV in at least one of the 38 identified genes (Supplementary Figure S5). In particular, the stop-gained MNV in the zinc finger protein gene ZNF705E (Supplementary Figure S6) was found in 64.5% (1616) of individuals.

Figure 2.

LoF variants in the 1000 Genomes Project rescued in a specific genomic context. (A) The number of rescued LoF variants. The pie charts show the proportion of rescued LoF variants, and the histograms show the proportion of rescued LoF variants in each individual. The ‘mean’ labels in the histograms indicate the average number. (B) Enrichment analysis of the rescued LoF transcripts. The numbers represent the corrected P-values. (C) The 38 genes affected by stop-gained MNVs. The red bar represents the number of transcripts affected by each stop-gained MNV, and the gray bar represents the total number of transcripts of the gene.

Pathogenic variants rescued by specific genomic context

We also found 64 rescued disease-causing mutations in the list (Supplementary Table S4). One highly intriguing case was the rescued stop-gained SNV rs549508773 within CHD7, a confirmed disease-causing gene for CHARGE syndrome (OMIM 214800), a severe childhood autosomal dominant disease with a recognizable appearance of birth defects (27–31). The variant rs549508773 itself results in a nonsense mutation and, along with 32 downstream HGMD-reported stop-gained SNVs (Figure 3A), was identified as deleterious by variant-centric tools. However, in the individual HG02861 harboring this SNV, a co-occurring SNV rs567756521 was found in the same codon (Figure 3B). Together with rs549508773, it leads to a mild single amino acid substitution that was predicted to be neutral by both PolyPhen-2 (32) and CADD (24) (Figure 3C), suggesting a plausible mechanism for the buffering effects of highly penetrant, deleterious mutations (31) (also see another case in Supplementary Figure S7).

Figure 3.

A pathogenic SNV is rescued in its specific genomic context. (A) Relative to SNV rs549508773 (red), 32 disease-causing (DM) stop-gained SNVs (black) recorded in the HGMD database are located downstream within the gene CHD7. (B) The figure shows an IGV image of the whole exome sequencing data of HG02861, a healthy participant in the 1000 Genomes Project. As shown in the figure, SNV rs567756521 rescues the stop-gained mutation, and the combined effect is a single amino acid substitution (Glu>Leu). (C) PolyPhen-2 predicted the single amino acid substitution (Glu>Leu) to be benign (Left). Boxplot of CADD scores for three different kinds of mutations (benign missense, pathogenic missense and pathogenic nonsense) collected from the CHD7 database (Right). The score for rs549508773 (G>T) is located within the range of pathogenic nonsense (red dotted line). The score for the MNV (GA>TT) is located within the range of benign missense (black dotted line).

Online web server and standalone package

A web server is available at http://cope.cbi.pku.edu.cn/whole_PCG_Analysis.html for users to try COPE online (Figure 4). The input is a space-delimited file with five columns per line: chromosome, position, reference allele, alternative allele and haplotype information. The output on the website includes seven columns: transcript, symbol, splicing code, protein length, amino acid (including all amino acid alterations), and variant (including all variants in the transcript).
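For users preparing input programmatically, the five-column layout described above could be read as in the following sketch. The record type, the function name and the example line are hypothetical; in particular, the encoding of the haplotype column is not specified here, so it is kept as an opaque string (the `0|1` value below is only an assumed example, and the COPE manual should be consulted for the actual format).

```python
from typing import NamedTuple


class VariantRecord(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str
    haplotype: str  # kept as-is; encoding is an assumption in this sketch


def parse_variant_line(line):
    """Parse one whitespace-delimited, five-column input line."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, found {len(fields)}: {line!r}")
    chromosome, position, ref, alt, haplotype = fields
    return VariantRecord(chromosome, int(position), ref, alt, haplotype)


# Hypothetical example line, not taken from the COPE documentation.
record = parse_variant_line("1 1000 A T 0|1")
print(record.position, record.ref, record.alt)  # 1000 A T
```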

Figure 4.

Screenshot of the COPE web server. (A) An example of input. (B) Annotation by COPE.

For large-scale analysis, we also provide a standalone package, which can be downloaded freely for academic use. A detailed guideline for installation and setup is also available at http://cope.cbi.pku.edu.cn/PCG_manual.html.

DISCUSSION

In recent years, annotating each variant independently has been taken for granted, and variant-centric annotation algorithms have been widely used in the downstream analysis of genome sequencing. The challenges of a variant-centric algorithm have been discussed previously (9). COPE aims to avoid the annotation errors caused by variant-centric methods by considering the genomic sequential context. COPE was designed as an isoform-oriented annotator, and all isoforms of a gene are analyzed simultaneously. By analyzing the genotype data from the 1000 Genomes Project, we demonstrated that COPE is able to correct numerous annotation errors, including both false-positive and false-negative LoF calls. To the best of our knowledge, COPE is the first fully gene-centric tool for annotating the effects of variants in a context-sensitive approach. Detailed comparison based on both a typical sample and the whole 1000 Genomes dataset shows that COPE's gene-centric strategy significantly improves the accuracy of variant annotation (Supplementary Tables S6 and S7). Phase information is important for accurate annotation. Several algorithms have been proposed for inferring the haplotype from un-phased sequencing data (33,34). COPE makes full use of phasing information for accurately annotating the variant effect. We have made a script available for inferring the phase directly from short-read sequencing data (http://cope.cbi.pku.edu.cn/phase.html) when such information is not available. Our evaluation suggested that our pipeline achieves a rather high (>90%) haplotype recovery rate (i.e., the proportion of completely phased transcripts over transcripts with multiple variants) for un-phased data with reasonable coverage (>30x), and the rate kept increasing with higher coverage (Supplementary Figure S10). We also note that the rapid development of sequencing technology is effectively enabling haplotypes to be resolved experimentally (35–37).
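The haplotype recovery rate defined above is a simple ratio, sketched below under our own assumptions: each transcript is represented by a list of per-variant flags stating whether the variant was phased, and a transcript counts as completely phased only when every flag is true. The transcript identifiers and flags are hypothetical.

```python
def haplotype_recovery_rate(phased_flags_by_transcript):
    """Completely phased transcripts divided by transcripts with multiple variants."""
    multi_variant = [flags for flags in phased_flags_by_transcript.values()
                     if len(flags) > 1]
    if not multi_variant:
        return None  # the rate is undefined without multi-variant transcripts
    completely_phased = sum(1 for flags in multi_variant if all(flags))
    return completely_phased / len(multi_variant)


# Hypothetical transcripts: tx3 has a single variant and is therefore ignored.
toy = {
    "tx1": [True, True],
    "tx2": [True, False],
    "tx3": [True],
    "tx4": [True, True, True],
}
rate = haplotype_recovery_rate(toy)
print(round(rate, 3))  # 0.667
```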
COPE assesses the functional effects of genetic variants by comparing the inferred ‘mutant’ transcript with the ‘wild-type’ one based on user-specified reference gene models. The quality and completeness of the reference gene models are critical to the accuracy of COPE annotation. Thus, while COPE is designed to be species-neutral, its performance may suffer when it is applied to less-annotated genomes (e.g. genomes of non-model organisms). Epistasis is a phenomenon in which the functional influence of a variant at a genetic locus is affected by another variant at another locus (38). It leverages the SNP–SNP interaction between different genes to explain the lack of heritability of numerous types of complex human disease (39,40). In contrast to epistasis, COPE is a variant effect annotator that considers the compound effects of multiple variants within the same gene to annotate their effects accurately. Protein-coding genes are not the only players in the complex biological network; the current framework could also be readily extended to and adapted for functional molecules other than protein-coding genes, such as miRNAs and long noncoding RNAs, when more functional and mechanistic information becomes available.

40 in total

1. A linear complexity phasing method for thousands of genomes.

Authors: Olivier Delaneau; Jonathan Marchini; Jean-François Zagury
Journal: Nat Methods Date: 2011-12-04 Impact factor: 28.547

Review 2. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment.

Authors: Khader Shameer; Lokesh P Tripathi; Krishna R Kalari; Joel T Dudley; Ramanathan Sowdhamini
Journal: Brief Bioinform Date: 2015-10-22 Impact factor: 11.622

3. Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases.

Authors: Rong Chen; Lisong Shi; Jörg Hakenberg; Brian Naughton; Pamela Sklar; Jianguo Zhang; Hanlin Zhou; Lifeng Tian; Om Prakash; Mathieu Lemire; Patrick Sleiman; Wei-Yi Cheng; Wanting Chen; Hardik Shah; Yulan Shen; Menachem Fromer; Larsson Omberg; Matthew A Deardorff; Elaine Zackai; Jason R Bobe; Elissa Levin; Thomas J Hudson; Leif Groop; Jun Wang; Hakon Hakonarson; Anne Wojcicki; George A Diaz; Lisa Edelmann; Eric E Schadt; Stephen H Friend
Journal: Nat Biotechnol Date: 2016-04-11 Impact factor: 54.908

4. The Human Gene Mutation Database (HGMD) and its exploitation in the study of mutational mechanisms.

Authors: David N Cooper; Peter D Stenson; Nadia A Chuzhanova
Journal: Curr Protoc Bioinformatics Date: 2006-01

5. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals.

Authors: Gene Yeo; Christopher B Burge
Journal: J Comput Biol Date: 2004 Impact factor: 1.479

6. A systematic survey of loss-of-function variants in human protein-coding genes.

Authors: Daniel G MacArthur; Suganthi Balasubramanian; Adam Frankish; Ni Huang; James Morris; Klaudia Walter; Luke Jostins; Lukas Habegger; Joseph K Pickrell; Stephen B Montgomery; Cornelis A Albers; Zhengdong D Zhang; Donald F Conrad; Gerton Lunter; Hancheng Zheng; Qasim Ayub; Mark A DePristo; Eric Banks; Min Hu; Robert E Handsaker; Jeffrey A Rosenfeld; Menachem Fromer; Mike Jin; Xinmeng Jasmine Mu; Ekta Khurana; Kai Ye; Mike Kay; Gary Ian Saunders; Marie-Marthe Suner; Toby Hunt; If H A Barnes; Clara Amid; Denise R Carvalho-Silva; Alexandra H Bignell; Catherine Snow; Bryndis Yngvadottir; Suzannah Bumpstead; David N Cooper; Yali Xue; Irene Gallego Romero; Jun Wang; Yingrui Li; Richard A Gibbs; Steven A McCarroll; Emmanouil T Dermitzakis; Jonathan K Pritchard; Jeffrey C Barrett; Jennifer Harrow; Matthew E Hurles; Mark B Gerstein; Chris Tyler-Smith
Journal: Science Date: 2012-02-17 Impact factor: 47.728

7. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.

Authors: Janet Piñero; Núria Queralt-Rosinach; Àlex Bravo; Jordi Deu-Pons; Anna Bauer-Mehren; Martin Baron; Ferran Sanz; Laura I Furlong
Journal: Database (Oxford) Date: 2015-04-15 Impact factor: 3.451

8. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

9. MAC: identifying and correcting annotation for multi-nucleotide variations.

Authors: Lei Wei; Lu T Liu; Jacob R Conroy; Qiang Hu; Jeffrey M Conroy; Carl D Morrison; Candace S Johnson; Jianmin Wang; Song Liu
Journal: BMC Genomics Date: 2015-08-01 Impact factor: 3.969

10. Transcriptome and genome sequencing uncovers functional variation in humans.

Authors: Tuuli Lappalainen; Michael Sammeth; Marc R Friedländer; Peter A C 't Hoen; Jean Monlong; Manuel A Rivas; Mar Gonzàlez-Porta; Natalja Kurbatova; Thasso Griebel; Pedro G Ferreira; Matthias Barann; Thomas Wieland; Liliana Greger; Maarten van Iterson; Jonas Almlöf; Paolo Ribeca; Irina Pulyakhina; Daniela Esser; Thomas Giger; Andrew Tikhonov; Marc Sultan; Gabrielle Bertier; Daniel G MacArthur; Monkol Lek; Esther Lizano; Henk P J Buermans; Ismael Padioleau; Thomas Schwarzmayr; Olof Karlberg; Halit Ongen; Helena Kilpinen; Sergi Beltran; Marta Gut; Katja Kahlem; Vyacheslav Amstislavskiy; Oliver Stegle; Matti Pirinen; Stephen B Montgomery; Peter Donnelly; Mark I McCarthy; Paul Flicek; Tim M Strom; Hans Lehrach; Stefan Schreiber; Ralf Sudbrak; Angel Carracedo; Stylianos E Antonarakis; Robert Häsler; Ann-Christine Syvänen; Gert-Jan van Ommen; Alvis Brazma; Thomas Meitinger; Philip Rosenstiel; Roderic Guigó; Ivo G Gut; Xavier Estivill; Emmanouil T Dermitzakis
Journal: Nature Date: 2013-09-15 Impact factor: 49.962
