Inbal Yomtovian, Nuttinee Teerakulkittipong, Byungkook Lee, John Moult, Ron Unger.
Abstract
MOTIVATION: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called 'composition bias', based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome.
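As an illustration of the 'composition bias' idea, the following minimal Python sketch computes a per-protein deviation from a genome-wide reference composition. The abstract does not give the exact formula, so two choices here are assumptions rather than the authors' method: the reference is taken as the mean of the per-protein frequency vectors, and the deviation is measured as a Euclidean distance. The function names `composition`, `average_composition` and `composition_bias` are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-element amino acid frequency vector of one sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def average_composition(seqs):
    """Reference vector: mean of the per-protein frequency vectors.

    Assumption: the source only says 'average composition of all proteins';
    pooling all residues before normalizing would be another valid reading.
    """
    vectors = [composition(s) for s in seqs]
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(AMINO_ACIDS))]


def composition_bias(seq, reference):
    """Deviation of one sequence from the reference composition.

    Assumption: Euclidean distance; the paper's actual metric may differ.
    """
    return math.dist(composition(seq), reference)


# Toy usage with made-up sequences (not real proteome data).
proteome = ["MKTAYIAKQR", "MSLLTEVETY", "MADEEKLPPG"]
reference = average_composition(proteome)
biases = [composition_bias(p, reference) for p in proteome]
```

Under these assumptions, a sequence whose composition matches the reference gets a bias of zero, and a histogram of `biases` over a set of sequences corresponds in spirit to the distributions described in the figure captions.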
Year: 2010 PMID: 20231229 PMCID: PMC2853687 DOI: 10.1093/bioinformatics/btq093
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Histograms showing the composition bias of several sets of proteins for six organisms. All histograms were computed using the average composition vector of the real proteins as the reference, and the composition bias of each protein was calculated relative to that reference. As expected, the real proteins have the smallest bias. Surprisingly, the composition bias of intergenic ‘proteins’ is significantly larger than that of random or antisense proteins. For the random genes, very similar results were obtained when using either the genome's coding or non-coding frequencies.
Fig. 2. Histograms of the composition bias of the set of ORFan proteins, compared with the composition bias of all proteins and of random proteins, for six organisms. Since there are fewer ORFan proteins, their histograms were scaled up accordingly (the results were validated to ensure that they are not due to sampling effects). In the two examples in the top panel (a), the ORFan proteins behave like random proteins; in the two examples in the bottom panel (c), the ORFans behave like the real proteins; and the behavior of the examples in the middle panel (b) is intermediate.