Si-Jin Cheng, Fang-Yuan Shi, Huan Liu, Yang Ding, Shuai Jiang, Nan Liang, Ge Gao.
Abstract
In genomics, effectively identifying the biological effects of genetic variants is crucial. Current methods handle each variant independently, assuming that each variant acts in a context-free manner. However, variants within the same gene may interfere with each other, producing combinational (compound) rather than individual effects. In this work, we introduce COPE, a gene-centric variant annotation tool that integrates the entire sequential context in evaluating the functional effects of intra-genic variants. Applying COPE to the 1000 Genomes dataset, we identified numerous cases of multiple-variant compound effects that frequently led to false-positive and false-negative loss-of-function calls by conventional variant-centric tools. Specifically, 64 disease-causing mutations were identified to be rescued in a specific genomic context, thus potentially contributing to the buffering effects for highly penetrant deleterious mutations. COPE is freely available for academic use at http://cope.cbi.pku.edu.cn.
Year: 2017 PMID: 28158838 PMCID: PMC5449550 DOI: 10.1093/nar/gkx041
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Overview of COPE. COPE uses each transcript as a basic annotation unit. The variant mapping step identifies variants within transcripts. The coding region inference step removes introns from each transcript; all possible splicing patterns are taken into consideration for splice-altering transcripts (in this case, the red dot indicates a splice acceptor site SNP, and intron retention and exon skipping are considered). The sequence comparison step compares a ‘mutant peptide’ against a reference protein sequence to obtain the final amino acid alteration. (B) Schematic diagram of typical types of annotation corrections implemented in COPE. A rescued stop-gained SNV indicates that another SNV (‘A’ to ‘C’) in the same codon rescues a variant-centric stop-gained SNV (‘A’ to ‘T’). A stop-gained MNV indicates that two or more SNVs result in a stop codon (‘A’ to ‘T’ and ‘C’ to ‘G’). A rescued frameshift indel indicates that another indel in the same haplotype recovers the original open reading frame. A splicing-rescued stop-gained/frameshift variant indicates that a stop-gained or frameshift variant is rescued by a novel splicing isoform. A rescued splice-disrupting variant indicates that a splice-disrupting variant is rescued by a nearby cryptic site (as shown in the figure) or a novel splice site. The asterisk in the figure indicates a stop codon.
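The difference between variant-centric and gene-centric annotation described in this caption can be illustrated with a minimal sketch. This is not COPE's implementation: the toy coding sequence `REF_CDS`, the 0-based `(position, ref, alt)` variant tuples and the helper names are all hypothetical, and splicing is ignored entirely. The sketch only shows the sequence comparison idea, in which variants on one haplotype are applied together and the resulting 'mutant peptide' is compared with the reference peptide.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical toy coding sequence: ATG AAA ACA GGT TAA -> M K T G *
REF_CDS = "ATGAAAACAGGTTAA"


def translate(cds):
    """Translate complete codons up to and including the first stop."""
    peptide = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        peptide.append(aa)
        if aa == "*":
            break
    return "".join(peptide)


def apply_variants(cds, variants):
    """Apply (pos, ref, alt) variants of one haplotype to the sequence.

    Positions are 0-based on the coding sequence. Variants are applied
    from right to left so that indels do not shift earlier positions.
    """
    seq = cds
    for pos, ref, alt in sorted(variants, reverse=True):
        assert seq[pos:pos + len(ref)] == ref, "reference allele mismatch"
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


def mutant_peptide(cds, variants):
    """Peptide produced when all given variants act together."""
    return translate(apply_variants(cds, variants))


if __name__ == "__main__":
    cases = {
        "stop-gained SNV alone": [(3, "A", "T")],
        "rescued stop-gained SNV": [(3, "A", "T"), (4, "A", "C")],
        "first SNV of MNV alone": [(6, "A", "T")],
        "second SNV of MNV alone": [(7, "C", "G")],
        "stop-gained MNV": [(6, "A", "T"), (7, "C", "G")],
        "frameshift deletion alone": [(4, "A", "")],
        "rescued frameshift indel": [(4, "A", ""), (9, "", "C")],
    }
    print("reference:", translate(REF_CDS))
    for label, variants in cases.items():
        print(f"{label}: {mutant_peptide(REF_CDS, variants)}")
```

Annotating each variant alone reports a premature stop or a frameshift in the first and sixth cases, whereas applying the variants jointly yields a full-length peptide with a single substitution. Conversely, the two SNVs of the toy MNV are each missense on their own but create a stop codon together.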
Figure 2. LoF variants in the 1000 Genomes Project rescued in a specific genomic context. (A) The number of rescued LoF variants. The pie charts show the proportion of rescued LoF variants, and the histograms show the proportion of rescued LoF variants in each individual. The ‘mean’ labels in the histograms indicate the average number. (B) Enrichment analysis of the rescued LoF transcripts. The numbers represent the corrected P-values. (C) The 38 genes affected by stop-gained MNVs. The red bar represents the number of transcripts affected by each stop-gained MNV, and the gray bar represents the total number of transcripts of the gene.
Figure 3. A pathogenic SNV is rescued in its specific genomic context. (A) In the gene CHD7, 32 disease-causing (DM) stop-gained SNVs (black) recorded in the HGMD database are located downstream of SNV rs549508773 (red). (B) An IGV image of the whole exome sequencing data of HG02861, a healthy participant in the 1000 Genomes Project. As shown in the figure, SNV rs567756521 rescues the stop-gained mutation, and the combined effect is a single amino acid substitution (Glu>Leu). (C) PolyPhen-2 predicted the single amino acid substitution (Glu>Leu) to be benign (left). Boxplot of CADD scores for three different kinds of mutations (benign missense, pathogenic missense and pathogenic nonsense) collected from the CHD7 database (right). The score for rs549508773 (G>T) is located within the range of pathogenic nonsense (red dotted line). The score for the MNV (GA>TT) is located within the range of benign missense (black dotted line).
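The codon arithmetic behind this caption can be checked with a short sketch. It rests on assumptions not stated in the caption: that the two substituted bases occupy the first and second positions of a Glu codon on the coding strand, and that the rescuing SNV is the A>T half of the GA>TT MNV. The caption does not say which of the two Glu codons is involved, so both are enumerated. The function name `codon_effects` is hypothetical.

```python
# Only the standard-genetic-code codons needed for this illustration.
AMINO_ACID = {
    "GAA": "Glu", "GAG": "Glu",
    "TAA": "Stop", "TAG": "Stop",
    "GTA": "Val", "GTG": "Val",
    "TTA": "Leu", "TTG": "Leu",
}


def codon_effects(ref_codon):
    """Translate a Glu codon after G>T alone, A>T alone, and both together.

    Assumes the G is at codon position 1 and the A at codon position 2.
    """
    first_only = "T" + ref_codon[1:]
    second_only = ref_codon[0] + "T" + ref_codon[2]
    both = "TT" + ref_codon[2]
    return {
        "G>T alone": AMINO_ACID[first_only],
        "A>T alone": AMINO_ACID[second_only],
        "GA>TT": AMINO_ACID[both],
    }


if __name__ == "__main__":
    for codon in ("GAA", "GAG"):
        print(codon, AMINO_ACID[codon], codon_effects(codon))
```

Under these assumptions, either Glu codon gives the same pattern: G>T alone produces a stop codon, while the combined GA>TT change produces Leu, matching the Glu>Leu substitution described in the caption.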
Figure 4. Screenshot of the COPE web server. (A) An example of input. (B) Annotation by COPE.