Johannes Goll, Robert Montgomery, Lauren M Brinkac, Seth Schobel, Derek M Harkins, Yinong Sebastian, Susmita Shrivastava, Scott Durkin, Granger Sutton.
Abstract
Generation of syntactically correct and unambiguous names for proteins is a challenging, yet vital task for functional annotation processes. Proteins are often named based on homology to known proteins, many of which have problematic names. To address the need to generate high-quality protein names, and capture our significant experience correcting protein names manually, we have developed the Protein Naming Utility (PNU, http://www.jcvi.org/pn-utility). The PNU is a web-based database for storing and applying naming rules to identify and correct syntactically incorrect protein names, or to replace synonyms with their preferred name. The PNU allows users to generate and manage collections of naming rules, optionally building upon the growing body of rules generated at the J. Craig Venter Institute (JCVI). Since communities often enforce disparate conventions for naming proteins, the PNU supports grouping rules into user-managed collections. Users can check their protein names against a selected PNU rule collection, generating both statistics and corrected names. The PNU can also be used to correct GenBank table files prior to submission to GenBank. Currently, the database features 3080 manual rules that have been entered by JCVI Bioinformatics Analysts as well as 7458 automatically imported names.
Year: 2009 PMID: 20007151 PMCID: PMC2808875 DOI: 10.1093/nar/gkp958
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Screenshots of the user interface. (A) Full match entry: this ‘full match’ entry links four nonpreferred names to one preferred name, here ‘enoyl-[acyl-carrier-protein] reductase (NADH)’. The preferred name may be linked to an external reference, here EC 1.3.1.9 of the IUBMB (9). (B) PNU report: the report provides basic statistics in the heading. The table contains five columns: the number of entries for the respective input name, the input name, the PNU naming suggestion, a user confirmation check box and a link to further details. The bottom row in the figure represents a warning. If the user chooses to change the name associated with the warning, they can input the new name in the blank field under ‘Enter Suggested Name’. Checked and entered names will then be used to correct and update the imported file.
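The full match lookup and the first three report columns described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PNU implementation: the function names `build_report` and `apply_confirmed` are hypothetical, and the two nonpreferred names in `FULL_MATCH` are invented placeholders, since the caption does not list the four names actually stored in the entry.

```python
from collections import Counter

PREFERRED = "enoyl-[acyl-carrier-protein] reductase (NADH)"

# Hypothetical full match entry: the nonpreferred names below are
# placeholders for illustration, not names taken from the PNU database.
FULL_MATCH = {
    "enoyl-ACP reductase": PREFERRED,
    "NADH-dependent enoyl-ACP reductase": PREFERRED,
}


def build_report(input_names, full_match):
    """Return one (count, input name, suggestion) row per distinct input name.

    The suggestion is None when no full match rule covers the whole name.
    """
    counts = Counter(input_names)
    return [(n, name, full_match.get(name)) for name, n in counts.items()]


def apply_confirmed(input_names, confirmed):
    """Rewrite the input list using only the changes the user confirmed.

    `confirmed` maps an input name to the name the user checked or typed in.
    """
    return [confirmed.get(name, name) for name in input_names]


names = ["enoyl-ACP reductase", "enoyl-ACP reductase", "DNA ligase"]
report = build_report(names, FULL_MATCH)

# Simulate the user ticking the check box for every row that has a suggestion.
confirmed = {name: suggestion for _, name, suggestion in report if suggestion}
corrected = apply_confirmed(names, confirmed)
```

Keeping the confirmation step separate from the lookup mirrors the workflow in the caption, where suggestions are only applied to the imported file after the user has checked or overridden them.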
List of partial match actions
| Action | Match Value | Replace Value | Example Input | Example Output |
|---|---|---|---|---|
| full replace | DUF | conserved hypothetical protein | hypothetical protein (DUF 1092) | conserved hypothetical protein |
| partial replace | 7-DHC | 7-dehydrocholesterol | 7-DHC reductase | 7-dehydrocholesterol reductase |
| remove | homolog | N/A | putative repressor homolog | putative repressor |
| merge duplicates | putative | N/A | putative kinase putative | putative kinase |
| move to beginning | putative | N/A | acyltransferase, putative | putative acyltransferase |
| move to end | putative | N/A | putative calicivirin | calicivirin, putative |
| regular expr. warning | /Salmonella/i | N/A | Salmonella invasin chaperone | WARNING |
| regular expr. local | /acyl-[cC]o[aA]/acyl-CoA/ | N/A | acyl-coa dehydrogenase | acyl-CoA dehydrogenase |
| regular expr. global | /[Gg]nat family/GNAT family/g | N/A | acetyltransferase, Gnat family | acetyltransferase, GNAT family |
Full and partial replace actions require two input fields (a match value and a replace value); all other actions require only one input field.
The three regular expression actions accept Perl-style regular expressions. The example input and output columns illustrate each action. Any single rule may match multiple names.
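To make the semantics of the partial match actions concrete, the following Python sketch reproduces the example rows of the table. It is an approximation under assumptions rather than the PNU's own code: all function names are hypothetical, Python's `re` module stands in for Perl-style expressions (so the match and replacement parts of a regular expression rule are passed as separate arguments), and the handling of commas and whitespace around a moved or removed term is a guess based on the example outputs.

```python
import re


def _has_term(name, term):
    """True when `term` occurs in `name` as a whole word."""
    return re.search(r"\b" + re.escape(term) + r"\b", name) is not None


def _strip_term(name, term):
    """Remove whole-word occurrences of `term` and any adjoining comma."""
    pattern = r",?\s*\b" + re.escape(term) + r"\b,?\s*"
    return re.sub(pattern, " ", name).strip(" ,")


def full_replace(name, match, replacement):
    # Replace the entire name when it contains the match value.
    return replacement if match in name else name


def partial_replace(name, match, replacement):
    # Replace only the matching substring.
    return name.replace(match, replacement)


def remove(name, term):
    return _strip_term(name, term) if _has_term(name, term) else name


def merge_duplicates(name, term):
    # Keep the first occurrence of the term and drop later repeats.
    kept, seen = [], False
    for word in name.split():
        if word == term and seen:
            continue
        seen = seen or word == term
        kept.append(word)
    return " ".join(kept)


def move_to_beginning(name, term):
    if not _has_term(name, term):
        return name
    return f"{term} {_strip_term(name, term)}"


def move_to_end(name, term):
    if not _has_term(name, term):
        return name
    return f"{_strip_term(name, term)}, {term}"


def regex_warning(name, pattern, flags=0):
    # Flag the name for manual review instead of rewriting it.
    return "WARNING" if re.search(pattern, name, flags) else name


def regex_local(name, pattern, replacement):
    # Substitute the first match only.
    return re.sub(pattern, replacement, name, count=1)


def regex_global(name, pattern, replacement):
    # Substitute every match (the /g modifier in Perl).
    return re.sub(pattern, replacement, name)
```

For example, `move_to_end("putative calicivirin", "putative")` yields `calicivirin, putative`, and `regex_local("acyl-coa dehydrogenase", r"acyl-[cC]o[aA]", "acyl-CoA")` yields `acyl-CoA dehydrogenase`, matching the corresponding rows of the table.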
Figure 2. Overview of PNU use cases and project customization. Rules are entered via the web interface (either one by one or in a batch) and may be organized into groups, procedures and projects. Projects specify the set of rules that are used to correct names in input files. The PNU report allows users to verify name changes before correcting names (Figure 1B).