Xiaomeng Li, Jia Zeng, Hong Yan.
Abstract
We describe a promoter recognition method named PCA-HPR that locates eukaryotic promoter regions and predicts transcription start sites (TSSs). We compute codon (3-mer) and pentamer (5-mer) frequencies and build codon and pentamer frequency feature matrices to extract informative and discriminative features for effective classification. Principal component analysis (PCA) is applied to the feature matrices, and a subset of principal components (PCs) is selected for classification. Our system uses three neural network classifiers to distinguish promoters from exons, promoters from introns, and promoters from 3' untranslated regions (3'UTRs). We compared PCA-HPR with three well-known promoter prediction systems: DragonGSF, Eponine, and FirstEF. Validation on three test sets shows that PCA-HPR achieves the best performance among the four predictive systems.
Keywords: CpG islands; principal component analysis; promoter recognition; sequence feature; transcription start sites
Year: 2008 PMID: 18795109 PMCID: PMC2533055 DOI: 10.6026/97320630002373
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Codon/pentamer percentage in the top 100 discriminative features. Statistics are based on three datasets: Promoter versus Exon, Promoter versus Intron, and Promoter versus 3′UTR.
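The codon and pentamer frequency features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_frequencies` and `feature_vector` are hypothetical, and the sketch assumes overlapping sliding windows, the alphabet A/C/G/T, and that windows containing ambiguous bases are skipped, none of which the abstract specifies. The PCA and neural network stages are not shown.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA k-mers in seq.

    Assumption (not stated in the abstract): windows overlap and
    any window containing a non-ACGT character is skipped.
    """
    seq = seq.upper()
    # Fixed lexicographic ordering so every sequence maps to the same columns.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq):
    """64 codon (3-mer) frequencies followed by 1024 pentamer (5-mer) frequencies."""
    return kmer_frequencies(seq, 3) + kmer_frequencies(seq, 5)


# Made-up sequence, for illustration only.
example = "ATGCGCGCTATAAAGGCGCGCCATG"
vec = feature_vector(example)
print(len(vec))  # 64 + 1024 = 1088
```

Stacking one such vector per sequence gives a feature matrix of the kind to which PCA could then be applied, with the selected principal components serving as classifier inputs.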