Geetha Govindan, Achuthsankar S Nair.
Abstract
Protein trafficking, or protein sorting, in eukaryotes is a complicated process that is carried out based on information contained in the protein itself. Many methods have been reported for predicting the subcellular location of proteins from sequence information. However, most of these methods use a flat structure or parallel architecture to perform prediction. In this work, we introduce ensemble classifiers with features extracted directly from full-length protein sequences to predict locations in the protein-sorting pathway hierarchically. Sequence driven features, sequence mapped features and sequence autocorrelation features were tested with ensemble learners, and their performances were compared. When evaluated by independent data testing, ensemble-based bagging algorithms with the sequence feature composition, transition and distribution (CTD) successfully classified two datasets with accuracies greater than 90%. We compared our results with similar published methods, and our method performed on par with the others at two levels in the secretory pathway. This study shows that the CTD feature extracted from protein sequences is effective in capturing biological features among compartments in secretory pathways.
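To make the CTD feature concrete, the following is a minimal sketch of how composition, transition and distribution descriptors can be computed from a sequence. It is not the authors' implementation: the three-class hydrophobicity grouping, the function name `ctd_features`, the feature key names and the rounding rule used for the distribution percentiles are all assumptions chosen for illustration.

```python
# Assumed three-class hydrophobicity grouping (illustrative only; the
# abstract does not state which residue properties were used).
GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}
RESIDUE_TO_GROUP = {aa: g for g, aas in GROUPS.items() for aa in aas}


def ctd_features(sequence):
    """Return a dict of 21 CTD values (3 C + 3 T + 15 D) for one sequence."""
    # Encode each residue by its group label; unknown symbols are skipped.
    encoded = [RESIDUE_TO_GROUP[aa] for aa in sequence if aa in RESIDUE_TO_GROUP]
    n = len(encoded)
    if n == 0:
        raise ValueError("sequence contains no recognised residues")

    features = {}

    # Composition: fraction of residues that fall in each group.
    for g in GROUPS:
        features[f"C{g}"] = encoded.count(g) / n

    # Transition: fraction of adjacent pairs that switch between two groups,
    # counted in either direction.
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        changes = sum(
            1 for x, y in zip(encoded, encoded[1:]) if {x, y} == {a, b}
        )
        features[f"T{a}{b}"] = changes / (n - 1) if n > 1 else 0.0

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # residue of each group (assumed rule: ceiling, with a minimum of one).
    for g in GROUPS:
        positions = [i + 1 for i, x in enumerate(encoded) if x == g]
        for pct in (0, 25, 50, 75, 100):
            if positions:
                k = max(1, (len(positions) * pct + 99) // 100)
                features[f"D{g}_{pct}"] = positions[k - 1] / n
            else:
                features[f"D{g}_{pct}"] = 0.0

    return features
```

A vector built this way has a fixed length regardless of sequence length, which is what lets full-length proteins be passed directly to a bagging ensemble or any other standard classifier.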
Keywords: Autocorrelation; Ensemble classifier; Pathways; Protein sorting; Sequence driven features; Sequence mapped features
Year: 2013 PMID: 24316328 PMCID: PMC4357838 DOI: 10.1016/j.gpb.2013.07.005
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Hierarchical structures of compartments in protein trafficking. Adapted from [15–19]. Level 0, root of hierarchy; Level 1, first division; Level 2, second division.
Figure 2. Amino acid dipeptide (GC, TG, AT) counts with skips in a sample sequence. c0 indicates the count of dipeptides with zero skips, c1 the count with one skip, and c2 the count with two skips.
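The skip-based counting described in the Figure 2 caption can be sketched as follows. This is an illustrative reading of the caption rather than the authors' code: the function name `skip_dipeptide_counts`, the `max_skip` parameter and the sample sequence `GCTGAT` are hypothetical.

```python
from collections import Counter


def skip_dipeptide_counts(sequence, max_skip=2):
    """Count residue pairs separated by 0..max_skip intervening residues.

    Returns a dict with keys "c0", "c1", ..., each mapping to a Counter
    of two-letter pairs (first residue + second residue).
    """
    counts = {}
    for skip in range(max_skip + 1):
        gap = skip + 1  # index distance between the two paired residues
        counts[f"c{skip}"] = Counter(
            sequence[i] + sequence[i + gap]
            for i in range(len(sequence) - gap)
        )
    return counts


# Hypothetical sample sequence, not the one shown in the figure.
result = skip_dipeptide_counts("GCTGAT")
print(result["c0"]["GC"], result["c0"]["TG"], result["c0"]["AT"])  # 1 1 1
print(result["c1"]["GT"])  # 2
```

Because each `Counter` returns zero for pairs that never occur, the counts for a fixed list of dipeptides can be read out directly to form a feature vector.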