Likun Wang, Cong Zhang, Johnathan Watkins, Yan Jin, Michael McNutt, Yuxin Yin.
Abstract
BACKGROUND: Targeted next-generation sequencing is playing an increasingly important role in biological research and clinical diagnosis because it allows researchers to sequence high-priority genes at much greater depth and at a fraction of the cost of whole-genome or whole-exome sequencing. However, when designing the panel of genes to be sequenced, investigators must weigh the better sensitivity of a broad panel against the higher specificity of a potentially more relevant panel. Although tools to prioritize candidate disease genes have been developed, the great majority require prior knowledge and a set of seed genes as input, which is possible only for diseases with a known genetic etiology.
Keywords: Gene prioritization; Human disorder phenotype similarity; Targeted panel; Web service
Year: 2016 PMID: 27044653 PMCID: PMC4820874 DOI: 10.1186/s12859-016-0998-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flow chart of the website. Users first define a group of disorders using phenotype keyword searches, disorder phenotype similarity scoring, ICD-10 index specification, or manual editing. Once the disorder group is defined, known disease-associated genes are extracted automatically from the OMIM database. Users can then perform gene set enrichment analysis (GSEA) on the extracted list of disease-associated genes. Finally, an online support vector machine (SVM) is applied to refine the identification of genes that potentially underlie the disorder of interest.
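The first two steps of the flow chart (grouping disorders by phenotype similarity, then pulling their known genes) can be sketched as below. This is a minimal illustration under stated assumptions, not the website's code: the disorder IDs, gene names, similarity scores, the 0.5 threshold, and the helper names `define_group` and `build_panel` are all hypothetical, and the OMIM extraction is replaced by an in-memory dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: symmetric pairwise phenotype similarity scores
# (each pair stored once) and disorder-to-gene associations that stand in
# for genes extracted from OMIM.
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("D1", "D2"): 0.82,
    ("D1", "D3"): 0.15,
    ("D1", "D4"): 0.61,
    ("D2", "D3"): 0.20,
    ("D2", "D4"): 0.55,
    ("D3", "D4"): 0.05,
}

DISORDER_GENES: Dict[str, Set[str]] = {
    "D1": {"GENE_A", "GENE_B"},
    "D2": {"GENE_B", "GENE_C"},
    "D3": {"GENE_D"},
    "D4": {"GENE_E"},
}


def similarity(a: str, b: str) -> float:
    """Look up a symmetric score; a disorder is treated as fully similar to itself."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def define_group(query: str, threshold: float) -> List[str]:
    """Group the query with every disorder whose similarity to it meets the threshold."""
    return sorted(d for d in DISORDER_GENES if similarity(query, d) >= threshold)


def build_panel(group: List[str]) -> List[str]:
    """Return the union of genes associated with any disorder in the group."""
    genes: Set[str] = set()
    for disorder in group:
        genes |= DISORDER_GENES.get(disorder, set())
    return sorted(genes)


if __name__ == "__main__":
    group = define_group("D1", threshold=0.5)
    print(group)               # ['D1', 'D2', 'D4']
    print(build_panel(group))  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_E']
```

Raising the threshold narrows the group and therefore the candidate panel, which is one concrete form of the sensitivity/specificity tradeoff described in the abstract.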
Fig. 2 ROC curves of phenotype similarity matrices with different weighting methods. ROC analysis with the two benchmark datasets (a: Phenotypic Series, b: Linked OMIM Record Pairs) showed that global weighting was superior to the other forms of weighting. The range of false positive rates was restricted to (0, 0.1) to highlight the differences between the curves more clearly.
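The ROC evaluation can be sketched as follows, assuming each benchmark is a list of disorder pairs labeled positive (for example, both disorders in the same Phenotypic Series) or negative, with each pair scored by the similarity matrix. The code sweeps a threshold over the distinct scores to build the curve, then integrates it with the trapezoidal rule, either over the full range or only up to a maximum false positive rate such as the 0.1 used in the figure. The function names and toy scores are hypothetical; the source does not state exactly how its AUCs were computed.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every pair tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]], max_fpr: float = 1.0) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR where this segment crosses max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical similarity scores for six disorder pairs and their labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [True, True, False, True, False, False]
    pts = roc_points(scores, labels)
    print(round(auc(pts), 4))               # 0.8889
    print(round(auc(pts, max_fpr=0.1), 4))  # 0.0667
```

The partial area is not rescaled here; dividing it by `max_fpr` is one simple way to put it on a 0 to 1 scale if that is preferred.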
Table 1 AUCs using matrices with different weightings and two benchmark datasets
| Benchmark dataset | Unweighted | Global | Local | Global–local |
|---|---|---|---|---|
| Phenotypic Series | 0.983 | 0.996 | 0.976 | 0.995 |
| Linked OMIM Record Pairs | 0.945 | 0.985 | 0.923 | 0.982 |
Four similarity score matrices with different weightings (unweighted, global, local, and global–local) are compared. SoftPanel uses the globally weighted matrix.
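The four schemes in Table 1 resemble classic term weighting from text mining. The sketch below assumes, purely for illustration, that each OMIM record is a vector of phenotype-term counts, that "local" weighting means log(1 + tf) within a record, that "global" weighting means an IDF factor log(N / df) across all records, and that similarity is the cosine of the weighted vectors. The exact formulas used in SoftPanel may differ, and the records, terms, and counts below are hypothetical.

```python
import math
from typing import Dict

SCHEMES = ("unweighted", "global", "local", "global-local")

# Hypothetical toy corpus: phenotype-term counts for three disorder records.
RECORDS: Dict[str, Dict[str, int]] = {
    "D1": {"seizure": 3, "ataxia": 1, "fever": 1},
    "D2": {"seizure": 2, "ataxia": 2, "fever": 1},
    "D3": {"rash": 2, "fever": 1},
}


def global_weights(records: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Assumed global weight per term: IDF, log(N / df), where df is the
    number of records that contain the term."""
    df: Dict[str, int] = {}
    for terms in records.values():
        for term in terms:
            df[term] = df.get(term, 0) + 1
    n = len(records)
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_vector(terms: Dict[str, int], scheme: str,
                    gw: Dict[str, float]) -> Dict[str, float]:
    """Weight one record's raw term counts under the named scheme."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    use_local = scheme in ("local", "global-local")
    use_global = scheme in ("global", "global-local")
    vec: Dict[str, float] = {}
    for term, tf in terms.items():
        local = math.log1p(tf) if use_local else float(tf)  # assumed local weight
        glob = gw[term] if use_global else 1.0               # assumed global weight
        vec[term] = local * glob
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def phenotype_similarity(d1: str, d2: str, scheme: str = "global") -> float:
    """Similarity of two toy records after weighting both with the same scheme."""
    gw = global_weights(RECORDS)
    return cosine(weighted_vector(RECORDS[d1], scheme, gw),
                  weighted_vector(RECORDS[d2], scheme, gw))


if __name__ == "__main__":
    for scheme in SCHEMES:
        print(scheme, round(phenotype_similarity("D1", "D3", scheme), 3))
```

In this toy corpus "fever" occurs in every record, so its assumed global weight is zero: D1 and D3, which share only that term, have a small positive unweighted similarity but a global similarity of 0. Down-weighting uninformative shared terms in this way is one plausible reason a global weighting could separate related and unrelated disorders better.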
Fig. 3 ROC curves of SoftPanel and MimMiner phenotype similarity matrices. ROC analysis with the two benchmark datasets (a: Phenotypic Series, b: Linked OMIM Record Pairs) demonstrated that our similarity matrix (SoftPanel) outperformed both an existing method (MimMiner) and a similarity matrix constructed in the same way as MimMiner from the latest OMIM database (MeSHTree). The range of false positive rates was restricted to (0, 0.1) to highlight the differences between the curves more clearly.
Table 2 AUCs using different similarity matrices and two benchmark datasets
| Benchmark dataset | SoftPanel | MeSHTree | MimMiner |
|---|---|---|---|
| Phenotypic Series | 0.995 | 0.988 | 0.972 |
| Linked OMIM Record Pairs | 0.981 | 0.961 | 0.944 |
SoftPanel: our similarity matrix. MeSHTree: a similarity matrix constructed in the same way as MimMiner from the latest OMIM database. MimMiner: data were downloaded from the MimMiner website. Prior to calibration, the matrices were restricted to identical dimensions to enable a consistent comparison; hence, the SoftPanel results in Tables 1 and 2 differ.
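Restricting the matrices to identical dimensions can be read as keeping only the disorders that every matrix covers, so that each method is scored on the same benchmark pairs. A minimal sketch, assuming each matrix is stored as a nested dictionary keyed by disorder ID; the IDs, values, and the helper names `restrict` and `align` are hypothetical.

```python
from typing import Dict, Set, Tuple

Matrix = Dict[str, Dict[str, float]]

# Hypothetical toy matrices that cover overlapping but different disorders.
MATRIX_A: Matrix = {
    "D1": {"D1": 1.0, "D2": 0.7, "D3": 0.1},
    "D2": {"D1": 0.7, "D2": 1.0, "D3": 0.4},
    "D3": {"D1": 0.1, "D2": 0.4, "D3": 1.0},
}
MATRIX_B: Matrix = {
    "D2": {"D2": 1.0, "D3": 0.3, "D4": 0.2},
    "D3": {"D2": 0.3, "D3": 1.0, "D4": 0.6},
    "D4": {"D2": 0.2, "D3": 0.6, "D4": 1.0},
}


def restrict(matrix: Matrix, keep: Set[str]) -> Matrix:
    """Keep only the rows and columns whose disorder IDs are in `keep`."""
    return {row_id: {col_id: v for col_id, v in row.items() if col_id in keep}
            for row_id, row in matrix.items() if row_id in keep}


def align(a: Matrix, b: Matrix) -> Tuple[Matrix, Matrix]:
    """Restrict two similarity matrices to the disorders that both cover."""
    shared = set(a) & set(b)
    return restrict(a, shared), restrict(b, shared)


if __name__ == "__main__":
    a2, b2 = align(MATRIX_A, MATRIX_B)
    print(sorted(a2), sorted(b2))  # ['D2', 'D3'] ['D2', 'D3']
```

Because pairs involving disorders outside the shared set are dropped, a matrix evaluated after alignment is scored on fewer pairs than when evaluated alone, which is consistent with SoftPanel's AUCs differing between the two tables.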
Fig. 4 ROC curves for validation in a case study using epilepsy as the disorder of interest. Two ROC curves were drawn from the predicted gene rankings in the unfiltered and filtered validation datasets.
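A ROC curve over a gene ranking can be summarized by its AUC, which equals the probability that a randomly chosen validation gene is ranked above a randomly chosen non-validation candidate. A minimal sketch, assuming the predicted ranking is a best-first list of gene symbols without ties; the gene names and the function `ranking_auc` are hypothetical and not SoftPanel's code.

```python
from typing import List, Set


def ranking_auc(ranked: List[str], positives: Set[str]) -> float:
    """AUC of a best-first ranking: the fraction of (positive, negative) gene
    pairs in which the positive gene is ranked above the negative one.
    Validation genes absent from the ranking are ignored."""
    n_pos = sum(1 for gene in ranked if gene in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ranking must contain both positive and negative genes")
    negatives_below = n_neg  # negatives not yet passed while walking down the list
    correct = 0
    for gene in ranked:
        if gene in positives:
            correct += negatives_below  # this gene outranks every remaining negative
        else:
            negatives_below -= 1
    return correct / (n_pos * n_neg)


if __name__ == "__main__":
    ranked = ["G1", "G2", "G3", "G4", "G5"]  # hypothetical SVM-ranked candidates
    print(ranking_auc(ranked, {"G1", "G3"}))  # 0.8333333333333334
```

The same function can be applied unchanged to each validation dataset, so the two curves in the figure differ only in which genes count as positives.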