Abstract
In this work we propose the hypothesis that replacing the current Latin-letter representation of amino acids with one of several possible alternative symbolic representations will bring significant benefits to the human construction, modification, and analysis of multiple protein sequence alignments. We propose ways in which this might be done without prescribing the choice of actual scripts used. Specifically, we propose and explore three ways to encode amino acid texts using novel symbolic alphabets free from precedents. Primary orthographic encoding is the direct substitution of a new alphabet for the standard, Latin-based amino acid code. Secondary encoding imposes static residue groupings onto the orthography of the alphabet by manipulating the shape and/or orientation of amino acid symbols. Tertiary encoding renders each residue as a composite symbol, so that each symbol represents several alternative amino acid groupings simultaneously. We also propose that a new group-focussed alphabet will free residue colouring, which is often used as an aid to representing or constructing multiple alignments, for other purposes, possibly to indicate dynamic properties of an alignment such as position-wise residue conservation.
Keywords: Atom pair; CDK-2; Molecular similarity; Similarity searching
Year: 2012 PMID: 22829727 PMCID: PMC3398780 DOI: 10.6026/97320630008539
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. The schematic representation in this figure attempts to capture the essence of the proposed alternative scripts. Primary orthographic encoding is a literal alternative: a straight substitution of one alphabet for the standard, Latin-based amino acid code. As examples of what might be possible, we show here non-alphanumeric characters and letters culled from an aesthetically pleasing but little-used language. Secondary encoding attempts to impose static amino acid groupings onto the orthography of the alphabet, either by using letter rotation or by encoding similarity in residue physical properties as similarity of shape between letters. This captures either the explicit categorisation of amino acids into defined groups or the implicit groupings more usually represented using a principal component plot. Tertiary encoding renders each amino acid as a composite symbol, so that each symbol represents several groupings simultaneously. As indicated, orthographically encoded groupings can be effectively augmented by colouring each element differently.
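The difference between primary and secondary encoding can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the symbol choices, the six residue groups (one common textbook partition), and the names `PRIMARY`, `GROUPS`, `GROUP_SHAPES`, `SECONDARY`, and `encode`. In the secondary table, members of a group share a base shape and differ only by a trailing index, a crude stand-in for the shape and orientation cues described above.

```python
# Illustrative sketch only: symbols and groupings are assumptions,
# not the scripts proposed in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Primary encoding: a straight one-to-one substitution of arbitrary
# non-Latin symbols for the standard one-letter codes.
PRIMARY_SYMBOLS = "◆◇○●□■△▲▽▼☆★♠♣♥♦♪♫☼☾"
PRIMARY = dict(zip(AMINO_ACIDS, PRIMARY_SYMBOLS))

# Secondary encoding: a static grouping is built into the symbols.
# This six-way partition is one common textbook choice (assumption).
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
GROUP_SHAPES = {
    "aliphatic": "■",
    "aromatic": "●",
    "polar": "▲",
    "positive": "◆",
    "negative": "◇",
    "special": "★",
}
# Each residue becomes its group's shape plus an index within the group.
SECONDARY = {
    residue: f"{GROUP_SHAPES[name]}{index}"
    for name, members in GROUPS.items()
    for index, residue in enumerate(members, start=1)
}


def encode(sequence: str, table: dict) -> str:
    """Re-encode a one-letter amino acid sequence using the given table."""
    return "".join(table[residue] for residue in sequence.upper())


if __name__ == "__main__":
    for seq in ("AVLDE", "KRHFW"):
        print(seq, encode(seq, PRIMARY), encode(seq, SECONDARY))
```

With the secondary table, residues from the same group are visibly related in an alignment column even without colour, which is the property the caption attributes to secondary encoding. A tertiary scheme could be sketched in the same way by concatenating marks from several such groupings into one composite symbol.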