Florian Kandlinger (1, 2), Maximilian G Plach (1), Rainer Merkl (3). Affiliations: 1. Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany. 2. Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany. 3. Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany. Rainer.Merkl@ur.de.
Abstract
BACKGROUND: Large enzyme families may contain functionally diverse members that give rise to clusters in a sequence similarity network (SSN). In prokaryotes, the genome neighborhood of a gene product is indicative of its function; thus, a genome neighborhood network (GNN) deduced for an SSN provides strong clues to the specific function of the enzymes constituting the different clusters. The Enzyme Function Initiative (http://enzymefunction.org/) offers services that compute SSNs and GNNs.
RESULTS: We have implemented AGeNNT, which uses these services, albeit with datasets purged of unspecific protein functions and overrepresented species. AGeNNT generates refined GNNs (rGNNs) that consist of cluster-nodes representing the sequences under study and Pfam-nodes representing enzyme functions encoded in the respective neighborhoods. For cluster-nodes, AGeNNT summarizes the phylogenetic relationships of the contributing species, and a statistic indicates how unique nodes and genome neighborhoods (GNs) are within the rGNN. Pfam-nodes are annotated with additional features such as GO terms describing protein function. For each edge, the coverage is given, which is the fraction of neighborhoods containing the considered enzyme function (Pfam-node). AGeNNT is available at https://github.com/kandlinf/agennt.
CONCLUSIONS: An rGNN is easier to interpret than a conventional GNN, which commonly contains proteins without enzymatic function and overly specific neighborhoods due to phylogenetic bias. The implemented filter routines and the statistic allow the user to identify those neighborhoods that are most indicative of a specific metabolic capacity. Thus, AGeNNT makes it easier to distinguish and annotate functionally different members of enzyme families.
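The edge coverage mentioned in the RESULTS can be illustrated with a minimal sketch. It assumes that each sequence in a cluster contributes one genome neighborhood, represented here as a set of Pfam identifiers, and that coverage is the fraction of those neighborhoods containing a given Pfam family. The function name `edge_coverage` and the identifiers `PF_A`, `PF_B`, `PF_C` are hypothetical placeholders and are not taken from the AGeNNT code base, whose actual implementation may differ.

```python
from collections import Counter


def edge_coverage(neighborhoods):
    """Return, for each Pfam id, the fraction of neighborhoods containing it.

    `neighborhoods` is a list of sets of Pfam ids, one set per sequence
    of a single cluster-node (an assumed, simplified data layout).
    """
    total = len(neighborhoods)
    if total == 0:
        return {}
    # Count each Pfam id at most once per neighborhood.
    counts = Counter(pfam for gn in neighborhoods for pfam in set(gn))
    return {pfam: n / total for pfam, n in counts.items()}


# Hypothetical cluster with four sequences and their neighborhoods.
cluster_neighborhoods = [
    {"PF_A", "PF_B"},
    {"PF_A"},
    {"PF_A", "PF_C"},
    {"PF_B"},
]

coverage = edge_coverage(cluster_neighborhoods)
for pfam, value in sorted(coverage.items()):
    print(pfam, value)
```

In this toy cluster, `PF_A` occurs in three of four neighborhoods, so the edge between the cluster-node and that Pfam-node would carry a higher coverage than the edges to `PF_B` or `PF_C`.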