Bo Jin, Xinghua Lu. Department of Biochemistry and Molecular Biology, Medical University of South Carolina, 174 Ashley Ave, Charleston, SC 29425, USA.
Abstract
MOTIVATION: The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates methods for identifying informative subsets of GO terms automatically and objectively. The task requires addressing three issues: how to represent the semantic context of GO terms, which metrics are suitable for measuring the semantic differences between terms, and how to identify an informative subset that retains as much of the original semantic information of GO as possible.

RESULTS: We represented the semantic context of a GO term by the word-usage profile associated with the term, which makes it possible to measure the semantic difference between two terms as the difference between their semantic contexts. We further employed information bottleneck methods to automatically identify subsets of GO terms that retain as much as possible of the semantic information in an annotation database. The automatically retrieved informative subsets align well with an expert-picked GO slim subset, cover important concepts and proteins, and enhance literature-based GO annotation.

AVAILABILITY: http://carcweb.musc.edu/TextminingProjects/.
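The idea of comparing GO terms through their word-usage profiles can be illustrated with a minimal sketch. The abstract does not name the divergence measure or the bottleneck variant that was used, so the code below makes two assumptions: the Jensen-Shannon divergence serves as the semantic difference, and the cost of merging two terms is the prior-weighted form used in agglomerative information bottleneck clustering. The function names and the toy texts are hypothetical and are not taken from the authors' software.

```python
import math
from collections import Counter


def word_usage_profile(texts):
    """Normalized word-frequency distribution over the texts annotated with one term."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence (in bits) between two word distributions.

    Assumption: the abstract does not say which metric the authors chose.
    """
    vocab = set(p) | set(q)
    mix = {w: w_p * p.get(w, 0.0) + w_q * q.get(w, 0.0) for w in vocab}

    def kl_to_mix(a):
        # Kullback-Leibler divergence from distribution a to the mixture.
        return sum(a[w] * math.log2(a[w] / mix[w]) for w in a if a[w] > 0)

    return w_p * kl_to_mix(p) + w_q * kl_to_mix(q)


def merge_cost(prior_i, profile_i, prior_j, profile_j):
    """Information lost by merging two terms (agglomerative bottleneck form, assumed)."""
    total = prior_i + prior_j
    return total * js_divergence(
        profile_i, profile_j, prior_i / total, prior_j / total
    )


# Hypothetical toy annotations: term label -> texts associated with it.
profiles = {
    "kinase activity": word_usage_profile(["kinase phosphorylates substrate"]),
    "protein kinase activity": word_usage_profile(["kinase phosphorylates protein"]),
    "DNA binding": word_usage_profile(["factor binds promoter"]),
}

close = js_divergence(profiles["kinase activity"], profiles["protein kinase activity"])
far = js_divergence(profiles["kinase activity"], profiles["DNA binding"])
```

In this toy setting `close` is smaller than `far`, so a bottleneck-style procedure would prefer to merge the two kinase terms first, because doing so discards less information about word usage.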