
PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants.

Abstract

One of the major challenges in human genetics is to identify the functional effects of coding and non-coding single nucleotide variants (SNVs). In the past, several methods have been developed to identify disease-related single amino acid changes, but only a few tools are able to score the impact of non-coding variants. Among the most popular algorithms, CADD and FATHMM predict the effect of SNVs in non-coding regions by combining sequence conservation with several functional features derived from the ENCODE project data. Thus, running CADD or FATHMM locally requires downloading a large set of pre-calculated information. To facilitate the process of variant annotation, we developed PhD-SNPg, a new easy-to-install and lightweight machine learning method that depends only on sequence-based features. Despite this, PhD-SNPg performs similarly to or better than more complex methods. This makes PhD-SNPg ideal for quick SNV interpretation and as a benchmark for tool development. AVAILABILITY: PhD-SNPg is accessible at http://snps.biofold.org/phd-snpg.

Year: 2017 PMID： 28482034 PMCID： PMC5570245 DOI： 10.1093/nar/gkx369

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Recent advances in sequencing technology have led to an exponential growth in the number of observed human genetic variants (1), whose effects are poorly understood. Most of the available data were generated by international consortia, which aim to characterize the pattern of genetic variation across individuals (2,3) and to identify mutations associated with human diseases (4,5). Thus, predicting the functional effect of genetic variants is a key challenge for the interpretation of the human genome and, in turn, for the implementation of more accurate diagnostic and treatment strategies (6,7). In the last few years, several methods have been developed for predicting the impact of single nucleotide variants (SNVs) on human health; nevertheless, only a few of them are capable of assessing the effect of SNVs in non-coding regions (8). In this paper, we present PhD-SNPg, an extension of the PhD-SNP algorithm (9) for predicting the impact of human SNVs in both coding and non-coding regions. PhD-SNPg is available both as a web server and as standalone software for processing large datasets of variants locally. PhD-SNPg, which is designed to be simple and lightweight, consists of a machine-learning core trained only on comparative information in the form of conservation scores calculated from multiple sequence alignments. This information is extracted from the UCSC (University of California, Santa Cruz) repository (https://genome.ucsc.edu/). Compared with state-of-the-art methods such as CADD (10), FATHMM-MKL (11) and GWAVA (12), our tool requires a relatively small amount of input resources, which makes PhD-SNPg easier to install and run on new sets of variants, even on laptop computers. For example, running the full version of PhD-SNPg requires <30 Gb of data from UCSC, in contrast with the 400 Gb (or more) required by FATHMM-MKL and CADD.
In addition, the lightest version of PhD-SNPg (∼100 Mb) can run in a ‘web mode’ by retrieving the UCSC data directly from their URLs, without downloading the whole genome files. Given its simple input (only nucleotide sequence and conservation are required), PhD-SNPg can also be regarded as a baseline tool for benchmarking algorithms based on more complex input features. In particular, PhD-SNPg can be used to estimate the performance improvement obtained by adding new input features (such as open chromatin, histone modifications, transcription factor binding sites, etc.). For this reason, all the training and testing datasets created for implementing PhD-SNPg are available online. The availability of benchmark datasets is particularly critical for evaluating the discriminative power of new methods with different input features while avoiding an overestimation of their performance (13).

METHOD OUTLINE

Technically, PhD-SNPg is a binary classifier based on a Gradient Boosting algorithm, as implemented in the scikit-learn package (14). PhD-SNPg was trained and tested using a set of ∼36,000 Pathogenic and Benign SNVs extracted from the Clinvar dataset (15) (Supplementary Table S1). In Figure 1A, the location and type of each mutation are depicted on the corresponding human chromosome cartoon (Pathogenic in red and Benign in blue).
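
As a rough illustration of the kind of classifier described above, the sketch below trains scikit-learn's GradientBoostingClassifier on randomly generated 35-value vectors. The synthetic data, the hyperparameters and all variable names are assumptions made for illustration only; they are not the PhD-SNPg training set or its optimized settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: 200 variants x 35 features.
# Toy labels (assumption): 1 = Pathogenic, 0 = Benign.
X = rng.normal(size=(200, 35))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Placeholder hyperparameters, not the values used by the authors.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X[:150], y[:150])

# Probability of the positive (Pathogenic) class for the held-out variants.
scores = clf.predict_proba(X[150:])[:, 1]
labels = np.where(scores > 0.5, "Pathogenic", "Benign")
```

The second column of predict_proba plays the role of the probabilistic score, which is thresholded at 0.5 to obtain the binary call.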

Figure 1.

(A) Distribution of Pathogenic (red) and Benign (blue) single nucleotide variants (SNVs) along the chromosomes. *The size of the mitochondrial chromosome (M) in panel A is increased 2,500 times. (B) Schematic view of the PhD-SNPg algorithm and its input features. (C) Distribution of PhyloP100 scores in the loci where Pathogenic (red) and Benign (blue) SNVs are detected. (D) Performance of PhD-SNPg (red), CADD (black) and FATHMM-MKL (blue) on the testing set (NewClinvar032016).

Dataset selection

The dataset of SNVs used for training and testing PhD-SNPg was extracted from Clinvar (15) (http://www.ncbi.nlm.nih.gov/clinvar/). The Clinvar dataset (version January 2016) was filtered by selecting the SNVs with either a Pathogenic or a Benign annotation. After this filtering, we obtained a dataset (Clinvar012016) that consists of 24,267 Pathogenic and 11,535 Benign SNVs. In the Clinvar012016 dataset, 2,720 (11%) of the Pathogenic and 3,942 (34%) of the Benign SNVs are in non-coding regions. To evaluate the method on new incoming data, we derived a second test set from a more recent version of Clinvar (March 2016) by selecting annotated SNVs not present in the training set (Clinvar012016). The new dataset, referred to as NewClinvar032016, is composed of 1,408 SNVs, 808 of which are annotated as Pathogenic and 600 as Benign. In the NewClinvar032016 dataset, 283 (35%) of the Pathogenic and 336 (56%) of the Benign SNVs are in non-coding regions. The files containing the Clinvar012016 and NewClinvar032016 datasets with the PhD-SNPg predictions are included as Supplementary Files. The genomic locations in those files are based on the hg38 human genome assembly. To further evaluate the performance of PhD-SNPg, we collected a dataset (AllToolScores) composed only of nonsynonymous SNVs (nsSNVs). This dataset was obtained by merging the five datasets by Grimm and co-workers from the VariBench website (16) and removing the nsSNVs occurring in genomic locations included in the PhD-SNPg training set. The AllToolScores set consists of 69,529 nsSNVs, ∼41% of which have been annotated as Pathogenic. A final test for scoring PhD-SNPg was performed on a set of 30 non-coding SNVs (LiverVariants) whose change in transcriptional activity was experimentally determined (17). A summary of the composition of all datasets is reported in Supplementary Table S1.

Feature evaluation

The PhD-SNPg input consists of 35 values: 25 encoding the sequence and mutation and 10 encoding the PhyloP conservation scores (18), as pre-computed at the UCSC repository (Figure 1B). In detail, they are: (i) 25 values representing the five-nucleotide window sequence centered on the mutated position (five positions times five possible nucleotides: A, C, G, T, N); (ii) 10 values mapping the conservation scores of the seven-species (PhyloP7) and 100-species (PhyloP100) alignments to the five window positions. Among the different input features, PhyloP100 shows the highest discriminative power (see Supplementary Table S2), as confirmed by plotting its distribution for Pathogenic and Benign SNVs (Figure 1C and Supplementary Figure S1). More details about the input features and the optimization procedure of PhD-SNPg are reported in Supplementary Materials (see also Supplementary Tables S2–S5).
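
The layout of the 35-value input can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the ordering of the features and the plain one-hot encoding are hypothetical, and the way the alternative allele enters the 25 sequence values is not specified in this text, so only the reference window is encoded here.

```python
NUCLEOTIDES = "ACGTN"

def encode_window(window, phylop7, phylop100):
    """Return a 35-value vector for a five-nucleotide window.

    window    -- 5-character string centered on the variant position
    phylop7   -- 5 conservation scores (seven-species alignment)
    phylop100 -- 5 conservation scores (100-species alignment)
    """
    if not (len(window) == len(phylop7) == len(phylop100) == 5):
        raise ValueError("expected five positions per input")
    features = []
    for base in window.upper():
        # One-hot encode each position over A, C, G, T, N (5 x 5 = 25 values);
        # any unexpected symbol is treated as N.
        onehot = [0.0] * len(NUCLEOTIDES)
        onehot[NUCLEOTIDES.index(base if base in NUCLEOTIDES else "N")] = 1.0
        features.extend(onehot)
    # Append the 10 conservation values (5 PhyloP7 + 5 PhyloP100).
    features.extend(float(v) for v in phylop7)
    features.extend(float(v) for v in phylop100)
    return features

# Made-up window and scores, for illustration only.
vector = encode_window("ACTGA", [0.1, 0.3, 1.2, 0.4, -0.2], [1.5, 2.0, 7.9, 3.1, 0.6])
```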

Testing prediction performance

First, the performance of PhD-SNPg was assessed with a 10-fold cross-validation test on ∼36,000 SNVs. On this set, PhD-SNPg achieves an Area Under the Receiver Operating Characteristic (19) Curve (AUC) of 0.93 (1 and 0.5 are the scores of a perfect and a random predictor, respectively). This result is shown in Table 1 and Supplementary Figure S2. In this first test, PhD-SNPg performs comparably to state-of-the-art methods (CADD and FATHMM-MKL), even though their scores were not calculated in cross-validation. Furthermore, to evaluate the generalization capability of our predictor and to better compare PhD-SNPg with the state-of-the-art methods, we extracted a set of ∼1,400 newly annotated SNVs from a more recent version of Clinvar (March 2016). On the NewClinvar032016 testing set, the AUC of PhD-SNPg is 0.92, which is still high and comparable with that obtained in the cross-validation test. It is worth noting that the PhD-SNPg score compares well with those obtained on the same set by the state-of-the-art methods CADD and FATHMM-MKL (Table 2 and Figure 1D). The same trend is observed on the subsets of mutations located in coding and non-coding regions. These results are surprising, considering the limited information employed by PhD-SNPg in comparison with the other approaches.
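
For readers who want to reproduce this kind of evaluation, the measures reported in the tables can be computed from confusion-matrix counts as sketched below. The formulas are the standard textbook definitions with Pathogenic taken as the positive class; the paper defers its own definitions to the Supplementary Materials, so treating them as identical is an assumption. The function names and the counts in the usage line are invented for illustration.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard two-class measures (assumes no count combination is empty)."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn)   # true positive rate
    tnr = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Q2": (tp + tn) / total,
        "TNR": tnr, "NPV": npv, "TPR": tpr, "PPV": ppv,
        "MCC": (tp * tn - fp * fn) / denom,
        "F1": 2 * ppv * tpr / (ppv + tpr),
    }

def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Invented counts, not taken from the paper.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

The pairwise AUC shown here is quadratic in the number of variants; it is meant to make the definition explicit rather than to be efficient.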

Table 1.

Performance of PhD-SNPg, FATHMM-MKL and CADD on the Clinvar012016 dataset

Method	Dataset	Q₂	TNR	NPV	TPR	PPV	MCC	F1	AUC
PhD-SNP^g	All	0.88	0.81	0.82	0.91	0.91	0.72	0.91	0.93
	Coding	0.88	0.74	0.77	0.92	0.91	0.67	0.92	0.92
	Non-coding	0.90	0.92	0.91	0.86	0.88	0.78	0.87	0.94
FATHMM-MKL^a	All	0.84	0.67	0.79	0.91	0.85	0.61	0.88	0.88
	Coding	0.83	0.58	0.70	0.91	0.86	0.53	0.89	0.86
	Non-coding	0.88	0.84	0.95	0.92	0.79	0.75	0.85	0.95
CADD^a	All	0.90	0.90	0.81	0.90	0.95	0.78	0.93	0.95
	Coding	0.91	0.85	0.80	0.93	0.95	0.77	0.94	0.94
	Non-coding	0.88	0.99	0.83	0.71	0.99	0.76	0.82	0.94

^a FATHMM-MKL and CADD returned predictions on 99.3% and 99.9% of the total dataset, respectively.

Table 2.

Performances of PhD-SNPg, FATHMM-MKL and CADD on the NewClinvar032016 dataset.

Method	Dataset	Q₂	TNR	NPV	TPR	PPV	MCC	F1	AUC
PhD-SNP^g	All	0.86	0.77	0.88	0.93	0.85	0.72	0.88	0.92
	Coding	0.85	0.67	0.85	0.94	0.85	0.65	0.89	0.91
	Non-coding	0.88	0.86	0.91	0.90	0.84	0.75	0.87	0.93
FATHMM-MKL^a	All	0.78	0.58	0.85	0.93	0.75	0.55	0.83	0.85
	Coding	0.81	0.58	0.82	0.94	0.81	0.57	0.87	0.86
	Non-coding	0.73	0.57	0.89	0.91	0.64	0.51	0.75	0.86
CADD^a	All	0.86	0.82	0.85	0.89	0.87	0.72	0.88	0.92
	Coding	0.86	0.70	0.85	0.94	0.86	0.68	0.90	0.91
	Non-coding	0.87	0.92	0.85	0.81	0.90	0.74	0.85	0.92

^a FATHMM-MKL and CADD returned predictions on 99.6% and 99.8% of the total dataset, respectively.

Q2: overall accuracy, TNR: true negative rate, NPV: negative predictive value, TPR: true positive rate, PPV: positive predictive value, MCC: Matthews correlation coefficient, AUC: area under the receiver operating characteristic curve.

Table 1: the PhD-SNPg performance evaluation measures (defined in Supplementary Materials) are averaged over five cross-validation tests (10-fold). The standard errors for all the performance measures are reported in Supplementary Table S6.

Table 2: the PhD-SNPg performance evaluation measures (defined in Supplementary Materials) are averaged over five tests with previous Clinvar012016 models. The standard error for all the performance measures for PhD-SNPg is below 1%.

It has been pointed out that the evaluation of prediction tools can be hindered by two types of bias (13): the same variants (type-1 circularity), or different variants from the same protein (type-2 circularity), occurring in both the training and validation sets. To exclude these sources of bias, we split our training and testing sets so that variants in the same chromosome (and therefore in the same gene) are kept in the same subset. To prevent variants belonging to the same gene from being assigned to different subsets, all the SNVs in the sex chromosomes (X and Y) were kept together. Nonetheless, we further checked for hidden type-2 circularity by calculating the performance of our method on subsets of variants in genes with different ratios of pathogenic to benign SNVs. This test was recently introduced for checking the presence of type-2 circularity bias (13).
In our analysis, we divided Clinvar012016 into subsets of variants from ‘mixed’ genes, which have both pathogenic and benign SNVs in different proportions, and ‘pure’ genes, which have only one class of variants (either pathogenic or benign). The result shows that PhD-SNPg is not affected by type-2 circularity bias, because on average it achieves a similar AUC or a better MCC (Matthews correlation coefficient) on the subsets of variants from the ‘mixed’ genes than on the ‘pure’ subset (Supplementary Table S7 and Supplementary Figure S3). To provide a further comparison of the performance of PhD-SNPg, CADD and FATHMM-MKL in predicting the impact of coding variants, we scored the three algorithms on a dataset of nonsynonymous SNVs (AllToolScores) from VariBench (16). This test confirmed that PhD-SNPg performs similarly to CADD and better than FATHMM-MKL (Supplementary Table S8). Finally, we also evaluated the ability of PhD-SNPg to predict the effect of non-coding variants on transcriptional activity. We estimated the correlation coefficient (R2) between the output of PhD-SNPg (probability of pathogenicity) and the logarithm of the ratio between the transcriptional activities in the mutated versus the wild-type mouse liver cells. This test, based on the correlation coefficients for the whole set of 30 SNVs and its subsets (17), shows that PhD-SNPg achieved a better R2 than CADD and FATHMM-MKL (Supplementary Table S9). More information about the procedure for comparing PhD-SNPg with the state-of-the-art methods, as well as the definitions of the performance evaluation measures, is provided in Supplementary Materials.
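
The chromosome-level splitting described above can be sketched as follows. The greedy balancing of fold sizes and all names are assumptions made for illustration; the text states only that variants on the same chromosome stay in the same subset and that X and Y are kept together, not how chromosomes were assigned to subsets.

```python
from collections import defaultdict

def chromosome_folds(variants, n_folds=10):
    """Assign variants to folds so that each chromosome stays in one fold.

    variants -- list of (chrom, pos, ref, alt) tuples
    Returns a list with one fold index per variant.
    """
    counts = defaultdict(int)
    for chrom, *_ in variants:
        # Treat X and Y as a single group so they always share a fold.
        group = "XY" if chrom in ("X", "Y") else chrom
        counts[group] += 1

    # Greedy balancing: place the largest groups first, each in the emptiest fold.
    fold_sizes = [0] * n_folds
    fold_of_group = {}
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of_group[group] = target
        fold_sizes[target] += size

    return [fold_of_group["XY" if c in ("X", "Y") else c] for c, *_ in variants]

# Invented toy variants, for illustration only.
toy = [("1", 100, "A", "G"), ("1", 200, "C", "T"),
       ("X", 5, "G", "A"), ("Y", 9, "T", "C"), ("2", 7, "A", "C")]
folds = chromosome_folds(toy, n_folds=3)
```

Because a gene lies on a single chromosome, grouping by chromosome also keeps all variants of a gene in the same fold, which is the property needed to avoid type-2 circularity.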

Method usage

PhD-SNPg can predict the effect of single and multiple SNVs from an input file. A Variant Call Format (VCF) file is also accepted as input. Our scripts accept genomic coordinates from both the hg19 and hg38 assemblies of the human genome. The application of our method is limited by the availability of the conservation score: PhD-SNPg predictions can be performed only on genomic regions for which the PhyloP100 score is available.

Prediction output

The main PhD-SNPg output is a probabilistic score between 0 and 1. When the score is >0.5 the SNV is predicted as Pathogenic, otherwise as Benign. PhD-SNPg also returns three values that provide additional information in support of the prediction: the false discovery rate (FDR), the PhyloP100 score at the mutated position and the average PhyloP100 score calculated over the five-nucleotide input window. The false discovery rate, defined in Supplementary Materials, can be used to filter out less reliable predictions. The empirical function for the calculation of the FDR is plotted in Supplementary Figure S4.
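
As a small illustration of how these output fields relate to one another, the sketch below derives the binary call and the two PhyloP100 summaries from a score and a five-value window. The function and field names are hypothetical, and the FDR is left out because its empirical function is given only in the Supplementary Materials.

```python
def summarize_prediction(score, phylop100_window, threshold=0.5):
    """Turn a probabilistic score and five PhyloP100 values into output fields."""
    if len(phylop100_window) != 5:
        raise ValueError("expected five PhyloP100 values")
    return {
        # Scores strictly above the threshold are called Pathogenic.
        "prediction": "Pathogenic" if score > threshold else "Benign",
        "score": score,
        # The variant sits at the center of the five-nucleotide window.
        "phylop100": phylop100_window[2],
        "avg_phylop100": sum(phylop100_window) / 5,
    }

# Made-up score and conservation values.
result = summarize_prediction(0.83, [1.5, 2.0, 7.9, 3.1, 0.6])
```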

SERVER DETAILS

Predicting the impact of single nucleotide variants

The PhD-SNPg server predicts the impact of a single nucleotide variant provided as comma-separated value (CSV) text or in Variant Call Format (VCF). For each SNV, the CSV input is composed of four elements, which indicate the chromosome, the position, the reference allele and the alternative allele. For example, the variation of a Thymine to a Cytosine in chromosome 17, position 41 251 803 is represented by 17,41251803,T,C. Multiple SNVs can be provided by pasting a list of variants, one per row, into the input box. For formatting reasons, input in VCF format should be provided by uploading a file that contains a header line starting with a hash sign (#) followed by the identifiers of at least five columns (CHROM, POS, ID, REF, ALT) separated by tab characters. After the header line, each SNV is given in a separate row. If the variant's ID in the third column is missing or not available, a dot (.) must be used. When the list of SNVs is provided, in either CSV or VCF format, the server analyzes each variant and checks whether the reference allele corresponds to the allele reported in the selected version of the human genome (hg19 or hg38). This task is performed using the twoBitToFa program (20), which quickly extracts a portion of the human genome from a sequence file in binary format. A window of five nucleotides centered on the mutated position is used to generate the 25-element vector encoding the sequence information. If the input nucleotide matches the reference allele, the server extracts the corresponding conservation indexes (PhyloP7 and PhyloP100) for the positions around the mutation site. The pre-calculated conservation indexes, which are available in the UCSC repository, are collected using the bigWigToBedGraph program (20). The PhyloP7 and PhyloP100 scores are used to generate a 10-element vector, which represents the conservation features.
After this step, the 35-element vector encoding the sequence and conservation features is given as input to the Gradient Boosting algorithm, which returns the prediction output described above. In the final step of the prediction task, the PhD-SNPg server annotates the input variants using the TransVar tool (21). TransVar finds the possible effect on the amino acid sequence of the longest matching transcript corresponding to the mutated region.
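
The first steps of this pipeline can be sketched as below: parsing one CSV variant, computing the five-nucleotide window, and building command lines for the two UCSC utilities. This is an illustrative sketch, not the server's code. The function names, the file names (hg38.2bit, hg38.phyloP100way.bw, window.fa, window.bedGraph) and the 'chr' prefix are assumptions; the commands are only constructed, not executed, and the option names should be checked against the usage messages of the installed utilities.

```python
def parse_csv_variant(line):
    """Parse 'chrom,pos,ref,alt' into a tuple; pos is a 1-based coordinate."""
    chrom, pos, ref, alt = [field.strip() for field in line.split(",")]
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError(f"not a single nucleotide variant: {line!r}")
    return chrom, int(pos), ref, alt

def window_bounds(pos, flank=2):
    """0-based, end-exclusive interval covering pos-flank .. pos+flank."""
    return pos - 1 - flank, pos + flank

def ucsc_commands(chrom, pos, two_bit="hg38.2bit", bigwig="hg38.phyloP100way.bw"):
    """Build (but do not run) command lines for twoBitToFa and bigWigToBedGraph."""
    start, end = window_bounds(pos)
    name = chrom if chrom.startswith("chr") else "chr" + chrom
    fetch_seq = ["twoBitToFa", f"-seq={name}", f"-start={start}", f"-end={end}",
                 two_bit, "window.fa"]
    fetch_cons = ["bigWigToBedGraph", f"-chrom={name}", f"-start={start}",
                  f"-end={end}", bigwig, "window.bedGraph"]
    return fetch_seq, fetch_cons

variant = parse_csv_variant("17,41251803,T,C")
seq_cmd, cons_cmd = ucsc_commands(variant[0], variant[1])
```

The extracted sequence would then be compared with the reference allele at the window center before the conservation scores are fetched, as described in the text.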

Alternative input format for single amino acid variants

To facilitate the task of predicting the impact of single amino acid variants (SAVs), the PhD-SNPg server also takes a list of SAVs as input. Each SAV is represented by the approved HGNC (HUGO Gene Nomenclature Committee) gene symbol (22) and the amino acid change, separated by a comma. The amino acid change is written by joining the one-letter symbol of the wild-type residue, the position along the protein sequence and the one-letter symbol of the mutant residue. For example, the change of the Methionine (M) in position 237 to Isoleucine (I) in TP53 is represented by the tuple TP53,M237I. When the PhD-SNPg input is provided in this format (MUT), the server internally maps the protein change back to the variant at the genomic level using the TransVar algorithm. After this step, the impact of the SNVs is predicted using the procedure described above.
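
A MUT entry of this form can be validated with a short parser such as the one below. It is a sketch under the assumption that only the 20 standard one-letter amino acid codes are accepted; the function name is hypothetical, and the mapping back to genomic coordinates (performed by TransVar on the server) is not reproduced.

```python
import re

# Wild-type residue, 1-based protein position, mutant residue (one-letter codes).
_SAV_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")

def parse_mut_entry(entry):
    """Parse 'GENE,M237I' into (gene, wild_type, position, mutant)."""
    gene, change = [part.strip() for part in entry.split(",")]
    match = _SAV_PATTERN.match(change.upper())
    if match is None:
        raise ValueError(f"unrecognized amino acid change: {change!r}")
    wild_type, position, mutant = match.groups()
    return gene, wild_type, int(position), mutant

sav = parse_mut_entry("TP53,M237I")
```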

Input interface

The web interface of PhD-SNPg consists of a textarea box where the SNVs are provided in CSV or MUT format. Below it, a ‘Browse’ button allows the user to upload CSV and VCF files in either plain text or gzipped format. When the list of SNVs is provided, three ‘Input Type’ buttons (CSV, VCF and MUT) are used to select the appropriate input format. A second group of buttons (Assembly) is used to indicate the human reference genome (hg19 or hg38) to which the SNVs refer. Example inputs in CSV and MUT format can be loaded using the ‘chr,pos,ref,alt’ and ‘gene,mut’ links, respectively, at the top of the web interface. Although an example of VCF-like input is linked in the ‘Help’ web page, using the textarea box for the VCF input format is discouraged. At the bottom of the PhD-SNPg web page, an optional e-mail box is available for receiving the PhD-SNPg output by e-mail.

Server output

The PhD-SNPg output is an interactive web page where the prediction output is reported in tabular form. At the top of the page, the JobID of the prediction process and the link to the output in text format (output.txt) are provided. In the JavaScript d3 (https://d3js.org/) table, the predictions associated with each SNV are reported in rows composed of nine columns. The first four columns define the SNV and the remaining five provide information about the prediction. From left to right, the five prediction columns are: the result of the binary classifier (prediction), the probabilistic output (score) defined above, the associated false discovery rate (fdr), the value of the PhyloP100 score at the mutated site (phylop100) and the average value of the PhyloP100 scores for the five positions centered on the mutated site (avg-phylop100). A plus sign (+) at the beginning of each row expands the results of the annotation performed by the TransVar algorithm. When an SNV maps to a coding region, four rows report the following information: i) the RefSeq (23) code of the longest transcript (Transcript), ii) the HGNC gene symbol and the associated UniProt (24) identifiers (gene), iii) the sense of the translated strand (strand) and iv) information about the nucleotide change at the DNA, RNA and protein levels (region). When available, links to the RefSeq and UniProt databases are provided. The output file summarizes the prediction and annotation information in a VCF-like format. The bottom part of the same file reports the errors and warnings that occurred during the prediction process. At the top of the page, a second web interface (http://snps.biofold.org/phd-snpg/find-job.html), accessible through the Job link, allows the user to retrieve the output, which is stored on the PhD-SNPg server for about one day. The prediction output is accessible using the JobID provided at the beginning of the output page.

CONCLUSIONS

The PhD-SNPg web server is a user-friendly interface for predicting the impact of SNVs in coding and non-coding regions. The standalone version of PhD-SNPg can be easily installed and executed on standard laptop machines. On an Intel Xeon 2.40 GHz machine with 8 GB of RAM, it predicts the effect of 1,000 SNVs in <2 min. This time increases, depending on the network speed, when the program runs in the web mode. Despite its simple input features, PhD-SNPg performs similarly to the state-of-the-art methods that require more information and resources. This makes PhD-SNPg a reliable and lightweight tool for evaluating the impact of new variants, as well as a baseline benchmark tool for comparing predictors based on more complex input features.

AVAILABILITY AND REQUIREMENTS

The PhD-SNPg server is freely available on the Internet at http://snps.biofold.org/phd-snpg. The web interface and the PhD-SNPg scripts are written in Python. The PhD-SNPg standalone tool is hosted on GitHub (https://github.com/biofold/PhD-SNPg) and can be installed by running a Python script that automatically downloads the programs and data from the UCSC repository; it has few library dependencies.

23 in total

Review 1. Assessing the accuracy of prediction algorithms for classification: an overview.

Authors: P Baldi; S Brunak; Y Chauvin; C A Andersen; H Nielsen
Journal: Bioinformatics Date: 2000-05 Impact factor: 6.937

2. VariBench: a benchmark database for variations.

Authors: Preethy Sasidharan Nair; Mauno Vihinen
Journal: Hum Mutat Date: 2012-10-11 Impact factor: 4.878

3. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information.

Authors: E Capriotti; R Calabrese; R Casadio
Journal: Bioinformatics Date: 2006-08-07 Impact factor: 6.937

Review 4. Variation Interpretation Predictors: Principles, Types, Performance, and Choice.

Authors: Abhishek Niroula; Mauno Vihinen
Journal: Hum Mutat Date: 2016-04-15 Impact factor: 4.878

Review 5. Mining the Unknown: Assigning Function to Noncoding Single Nucleotide Polymorphisms.

Authors: Sierra S Nishizaki; Alan P Boyle
Journal: Trends Genet Date: 2016-12-06 Impact factor: 11.639

6. International network of cancer genome projects.

Authors: Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; Alan Guttmacher; Mark Guyer; Fiona M Hemsley; Jennifer L Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusada; David P Lane; Frank Laplace; Lu Youyong; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T S Rao; Jacques Remacle; Alan J Schafer; Tatsuhiro Shibata; Michael R Stratton; Joseph G Vockley; Koichi Watanabe; Huanming Yang; Matthew M F Yuen; Bartha M Knoppers; Martin Bobrow; Anne Cambon-Thomsen; Lynn G Dressler; Stephanie O M Dyke; Yann Joly; Kazuto Kato; Karen L Kennedy; Pilar Nicolás; Michael J Parker; Emmanuelle Rial-Sebbag; Carlos M Romeo-Casabona; Kenna M Shaw; Susan Wallace; Georgia L Wiesner; Nikolajs Zeps; Peter Lichter; Andrew V Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L Ferguson; Peter Geary; D Neil Hayes; Thomas J Hudson; Amber L Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A Piris; Rajiv Sarin; Aldo Scarpa; Tatsuhiro Shibata; Marc van de Vijver; P Andrew Futreal; Hiroyuki Aburatani; Mónica Bayés; David D L Botwell; Peter J Campbell; Xavier Estivill; Daniela S Gerhard; Sean M Grimmond; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D McPherson; Hidewaki Nakagawa; Zemin Ning; Xose S Puente; Yijun Ruan; Tatsuhiro Shibata; Michael R Stratton; Hendrik G Stunnenberg; Harold Swerdlow; Victor E Velculescu; Richard K Wilson; Hong H Xue; Liu Yang; Paul T Spellman; Gary D Bader; Paul C Boutros; Peter J Campbell; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J Hubbard; Tao Jiang; Steven M Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B F Francis Ouellette; John V Pearson; Xose S Puente; Victor Quesada; Benjamin J Raphael; Chris Sander; Tatsuhiro Shibata; Terence P Speed; Lincoln D Stein; Joshua M Stuart; Jon W Teague; Yasushi Totoki; 
Tatsuhiko Tsunoda; Alfonso Valencia; David A Wheeler; Honglong Wu; Shancen Zhao; Guangyu Zhou; Lincoln D Stein; Roderic Guigó; Tim J Hubbard; Yann Joly; Steven M Jones; Arek Kasprzyk; Mark Lathrop; Nuria López-Bigas; B F Francis Ouellette; Paul T Spellman; Jon W Teague; Gilles Thomas; Alfonso Valencia; Teruhiko Yoshida; Karen L Kennedy; Myles Axton; Stephanie O M Dyke; P Andrew Futreal; Daniela S Gerhard; Chris Gunter; Mark Guyer; Thomas J Hudson; John D McPherson; Linda J Miller; Brad Ozenberger; Kenna M Shaw; Arek Kasprzyk; Lincoln D Stein; Junjun Zhang; Syed A Haider; Jianxin Wang; Christina K Yung; Anthony Cros; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Jack Hsu; Martin Bobrow; Don R C Chalmers; Karl W Hasel; Yann Joly; Terry S H Kaan; Karen L Kennedy; Bartha M Knoppers; William W Lowrance; Tohru Masui; Pilar Nicolás; Emmanuelle Rial-Sebbag; Laura Lyman Rodriguez; Catherine Vergely; Teruhiko Yoshida; Sean M Grimmond; Andrew V Biankin; David D L Bowtell; Nicole Cloonan; Anna deFazio; James R Eshleman; Dariush Etemadmoghadam; Brooke B Gardiner; Brooke A Gardiner; James G Kench; Aldo Scarpa; Robert L Sutherland; Margaret A Tempero; Nicola J Waddell; Peter J Wilson; John D McPherson; Steve Gallinger; Ming-Sound Tsao; Patricia A Shaw; Gloria M Petersen; Debabrata Mukhopadhyay; Lynda Chin; Ronald A DePinho; Sarah Thayer; Lakshmi Muthuswamy; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Vanessa Ballin; Youyong Lu; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Huanming Yang; Mark Lathrop; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevard; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Anne Cambon-Thomsen; Juris Viksna; Fredrik Ponten; Konstantin Skryabin; Michael R Stratton; P Andrew Futreal; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; 
Sancha Martin; Jorge S Reis-Filho; Andrea L Richardson; Christos Sotiriou; Hendrik G Stunnenberg; Giles Thoms; Marc van de Vijver; Laura van't Veer; Fabien Calvo; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Christian Chabannon; Ivo Gut; Jocelyne D Masson-Jacquemier; Mark Lathrop; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Gilles Thomas; Jörg Tost; Isabelle Treilleux; Fabien Calvo; Paulette Bioulac-Sage; Bruno Clément; Thomas Decaens; Françoise Degos; Dominique Franco; Ivo Gut; Marta Gut; Simon Heath; Mark Lathrop; Didier Samuel; Gilles Thomas; Jessica Zucman-Rossi; Peter Lichter; Roland Eils; Benedikt Brors; Jan O Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D Taylor; Christof von Kalle; Partha P Majumder; Rajiv Sarin; T S Rao; M K Bhan; Aldo Scarpa; Paolo Pederzoli; Rita A Lawlor; Massimo Delledonne; Alberto Bardelli; Andrew V Biankin; Sean M Grimmond; Thomas Gress; David Klimstra; Giuseppe Zamboni; Tatsuhiro Shibata; Yusuke Nakamura; Hidewaki Nakagawa; Jun Kusada; Tatsuhiko Tsunoda; Satoru Miyano; Hiroyuki Aburatani; Kazuto Kato; Akihiro Fujimoto; Teruhiko Yoshida; Elias Campo; Carlos López-Otín; Xavier Estivill; Roderic Guigó; Silvia de Sanjosé; Miguel A Piris; Emili Montserrat; Marcos González-Díaz; Xose S Puente; Pedro Jares; Alfonso Valencia; Heinz Himmelbauer; Heinz Himmelbaue; Victor Quesada; Silvia Bea; Michael R Stratton; P Andrew Futreal; Peter J Campbell; Anne Vincent-Salomon; Andrea L Richardson; Jorge S Reis-Filho; Marc van de Vijver; Gilles Thomas; Jocelyne D Masson-Jacquemier; Samuel Aparicio; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Hendrik G Stunnenberg; Laura van't Veer; Douglas F Easton; Paul T Spellman; Sancha Martin; Anna D Barker; Lynda Chin; Francis S Collins; Carolyn C Compton; Martin L Ferguson; Daniela S Gerhard; Gad Getz; Chris Gunter; Alan Guttmacher; Mark Guyer; D Neil Hayes; 
Eric S Lander; Brad Ozenberger; Robert Penny; Jane Peterson; Chris Sander; Kenna M Shaw; Terence P Speed; Paul T Spellman; Joseph G Vockley; David A Wheeler; Richard K Wilson; Thomas J Hudson; Lynda Chin; Bartha M Knoppers; Eric S Lander; Peter Lichter; Lincoln D Stein; Michael R Stratton; Warwick Anderson; Anna D Barker; Cindy Bell; Martin Bobrow; Wylie Burke; Francis S Collins; Carolyn C Compton; Ronald A DePinho; Douglas F Easton; P Andrew Futreal; Daniela S Gerhard; Anthony R Green; Mark Guyer; Stanley R Hamilton; Tim J Hubbard; Olli P Kallioniemi; Karen L Kennedy; Timothy J Ley; Edison T Liu; Youyong Lu; Partha Majumder; Marco Marra; Brad Ozenberger; Jane Peterson; Alan J Schafer; Paul T Spellman; Hendrik G Stunnenberg; Brandon J Wainwright; Richard K Wilson; Huanming Yang
Journal: Nature Date: 2010-04-15 Impact factor: 49.962

7. Bioinformatics challenges for personalized medicine.

Authors: Guy Haskin Fernald; Emidio Capriotti; Roxana Daneshjou; Konrad J Karczewski; Russ B Altman
Journal: Bioinformatics Date: 2011-05-19 Impact factor: 6.937

8. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity.

Authors: Dominik G Grimm; Chloé-Agathe Azencott; Fabian Aicheler; Udo Gieraths; Daniel G MacArthur; Kaitlin E Samocha; David N Cooper; Peter D Stenson; Mark J Daly; Jordan W Smoller; Laramie E Duncan; Karsten M Borgwardt
Journal: Hum Mutat Date: 2015-03-26 Impact factor: 4.878

9. Guidelines for investigating causality of sequence variants in human disease.

Authors: D G MacArthur; T A Manolio; D P Dimmock; H L Rehm; J Shendure; G R Abecasis; D R Adams; R B Altman; S E Antonarakis; E A Ashley; J C Barrett; L G Biesecker; D F Conrad; G M Cooper; N J Cox; M J Daly; M B Gerstein; D B Goldstein; J N Hirschhorn; S M Leal; L A Pennacchio; J A Stamatoyannopoulos; S R Sunyaev; D Valle; B F Voight; W Winckler; C Gunter
Journal: Nature Date: 2014-04-24 Impact factor: 49.962

10. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962
