Renaud Vanhoutreve, Arnaud Kress, Baptiste Legrand, Hélène Gass, Olivier Poch, Julie D Thompson.
Abstract
BACKGROUND: A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences.
Keywords: Bayesian statistics; Homology-based methods; Multiple sequence alignment; Sequence homology
Year: 2016 PMID: 27387560 PMCID: PMC4936259 DOI: 10.1186/s12859-016-1146-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic diagram of the steps involved in the LEON-BIS method
Accuracy of three methods for the detection of related and unrelated sequences
| | LEON related | LEON non-related | OD-seq related | OD-seq non-related | LEON-BIS related | LEON-BIS non-related |
|---|---|---|---|---|---|---|
| Related sequences | 15,227 | 2304 | 17,298 | 513 | 15,552 | 1999 |
| Non-related sequences | 221 | 930 | 540 | 331 | 187 | 944 |
| Total | 15,448 | 3234 | 17,838 | 844 | 15,739 | 2943 |
| Sensitivity | 0.87 | | 0.97 | | 0.89 | |
| Specificity | 0.81 | | 0.38 | | 0.83 | |
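The sensitivity and specificity rows can be recomputed from the counts in the table. The sketch below assumes the standard definitions (sensitivity = related sequences predicted as related / all related sequences; specificity = non-related sequences predicted as non-related / all non-related sequences) and treats the column labels as predictions; the source does not state its formulas, and the names `counts`, `sensitivity`, and `specificity` are illustrative only.

```python
# Counts copied from the table above, per method:
# (related -> related, related -> non-related,
#  non-related -> related, non-related -> non-related)
counts = {
    "LEON": (15227, 2304, 221, 930),
    "OD-seq": (17298, 513, 540, 331),
    "LEON-BIS": (15552, 1999, 187, 944),
}


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly related sequences that were predicted as related."""
    return tp / (tp + fn)


def specificity(fp: int, tn: int) -> float:
    """Fraction of truly non-related sequences predicted as non-related."""
    return tn / (fp + tn)


# Assumed standard definitions; rounded to two decimals as in the table.
results = {
    method: (round(sensitivity(tp, fn), 2), round(specificity(fp, tn), 2))
    for method, (tp, fn, fp, tn) in counts.items()
}

for method, (sens, spec) in results.items():
    print(f"{method}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under these assumptions the rounded values match the reported rows, which suggests that OD-seq's high sensitivity comes at the cost of labelling most non-related sequences as related.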
Fig. 2 Part of an example alignment from the BAliBASE benchmark suite, aligned using MAFFT. The query sequence is from the bacterium Aquifex aeolicus (UniProt:O67561) and the alignment includes both related and unrelated sequences. Conserved regions detected by LEON-BIS are outlined in red
Fig. 3 a) Number of known domains from the PFAM protein family database successfully retrieved by the different methods tested. b) Number of regions predicted by the different methods that overlap with known PFAM domains
Precision and recall statistics for the identification of known PFAM domains by LEON and LEON-BIS
| | LEON | LEON-BIS |
|---|---|---|
| Recall | 0.91 | 0.92 |
| Precision | 0.93 | 0.92 |
| F-measure | 0.92 | 0.92 |
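The F-measure row is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below assumes that definition (the source does not spell it out) and uses the rounded precision and recall values from the table; `f_measure` and `scores` are hypothetical names.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
scores = {
    "LEON": (0.93, 0.91),
    "LEON-BIS": (0.92, 0.92),
}

f_scores = {m: round(f_measure(p, r), 2) for m, (p, r) in scores.items()}

for method, f in f_scores.items():
    print(f"{method}: F-measure={f:.2f}")
```

Because the inputs are already rounded to two decimals, this only confirms the reported F-measures to the same precision.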
Fig. 4 Retrieval of PFAM domains depending on a) domain length and b) percent sequence identity
Fig. 5 Part of the alignment constructed by MAFFT of Cdk-activating kinase assembly factor MAT1/Tfb3 sequences, showing the conserved C-terminal region