
Cyclophilin nomenclature problems, or, 'a visit from the sequence police'.

Daniel W Nebert, Nickolas A Sophos, Vasilis Vasiliou, David R Nelson.

Abstract

Why is agreement on one particular name for each gene important? As one genome after another becomes sequenced, it is imperative to consider the complexity of genes, genetic architecture, gene expression, gene-gene and gene-product interactions and evolutionary relatedness across species. To agree on a particular gene name not only makes one's own research easier, it aids automated text-mining algorithms and search engines, which are increasingly employed to find relationships in the millions of abstracts in the medical research literature and sequence databases. A common nomenclature system will also be helpful to the present generation, as well as future generations, of graduate students and postdoctoral fellows who are about to enter genomics research. In this paper, the authors present some problems that arose when two separate research communities decided to choose the same root, CYP, for naming their gene families. They then offer a logical solution, by renaming the cyclophilin genes with a common root, such as cyn- in Caenorhabditis and CYN- in mammals (Cyn in mouse), and using evolutionary divergence to cluster genes of the highest level of relatedness.

Year:  2004        PMID: 15588499      PMCID: PMC3525097          DOI: 10.1186/1479-7364-1-5-381

Source DB:  PubMed          Journal:  Hum Genomics        ISSN: 1473-9542            Impact factor:   4.639


Introduction

A previous paper in this series [1] summarised the steps that one is strongly encouraged to follow in order to ensure proper nomenclature of any gene. Three examples were given to illustrate how and why one should strive for a standardised gene nomenclature system. In these examples, the focus of the paper was on using the gene names as search terms, rather than comparing a DNA or protein sequence that has just been determined by searching via BLAST [2]. The three examples included: PTGS1 and PTGS2 as the correct gene names for prostaglandin G/H synthase-1 and -2, also known as cyclooxygenase-1 and -2 and commonly erroneously nicknamed 'COX-1' and 'COX-2' in many journals; the short- and long-chain fatty acid synthase gene families, for which there is currently no official agreed-upon nomenclature (although FASN on human chromosome 17q25 is the official symbol for the fatty acid synthase gene); and POR as the correct name for the NADPH-P450 oxidoreductase gene [1]. Before deciding upon a new gene symbol, the reader is encouraged to visit the website describing this topic [3]. This theme is extended in the current paper, which shows how two completely separate research communities adopted the same gene root name, while not realising that the other group had done the same thing.

Cyclophilins as 'cyp-' in a Caenorhabditis elegans database

As Head of the Cytochrome P450 (CYP) Superfamily Gene Nomenclature Committee, David Nelson maintains a website dedicated to cytochrome P450 gene nomenclature [4]. The C. elegans genome has 76 full-length P450 genes and nine pseudogenes, which have been assembled by Nelson during the past several years and were nearly completed after the genome sequence had been published. Recently, Dan Lawson of WormPep [5] asked Nelson to review assemblies of these genes, following the recent revision of the worm's genome. While carrying out this request, Nelson discovered that Dan Lawson had referred to several P450 genes as 'ccp-xx'. Nelson then explored WormPep further and confirmed that ccp was being used as the root for cytochrome P450 genes. Although the usual root for P450 gene names is CYP, this term (cyp) was being used in the C. elegans protein database for the cyclophilins (Table 1). Can this be a problem -- i.e. the same gene root being used for different gene families by colleagues in two separate, very distant research fields?
Table 1

List of cyclophilin and P450 genes in C. elegans.

Gene        Sequence name(s)      Alternative name  Status        WormPep accession #
Cyclophilin genes
cyp-1       Y49A3A.5                                CGC approved  CE22213
cyp-2       ZK520.5                                 CGC approved  CE16730
cyp-3       Y75B12B.5                               CGC approved  CE20374
cyp-4       F59E10.2              mog-6             CGC approved  CE01596
cyp-5       F31C3.1                                 CGC approved  CE17730
cyp-6       F42G9.2                                 CGC approved  CE01301
cyp-7       Y75B12B.2                               CGC approved  CE20371
cyp-8       D1009.2a, D1009.2b                      CGC approved  CE04286
cyp-9       T27D1.1                                 CGC approved  CE03745
cyp-10      B0252.4a, B0252.4b                      CGC approved  CE02420
cyp-11      T01B7.4                                 CGC approved  CE03588
cyp-12      C34D4.12                                CGC approved  CE17506
cyp-13      Y116A8C.34                              CGC approved  CE24152
cyp-14      F39H2.2a, F39H2.2b                      CGC approved  CE32410
cyp-15      Y87G2A.6                                CGC approved  CE24686
cyp-16      Y17G7B.9                                CGC approved  CE19042
cyp-17      ZC250.1                                 CGC approved  CE28157
P450 genes
ccp-13A7    T10B9.10                                CGC approved  CE01655
ccp-14A5    F08F3.7                                 CGC approved  CE09262
ccp-31A1    C01F6.3                                 CGC approved  unavailable
ccp-44      ZK177.5               cyp-44            CGC approved  CE25682

Data taken from Jonathan Hodgkin, CGC Genetic Map and Nomenclature Curator (Caenorhabditis Genetics Center), Genetics Unit, Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.


CYP for cytochrome P450 genes in all species

The mammalian cytochrome P450 (CYP) superfamily encodes enzymes involved in: the metabolism of pharmaceuticals, foreign chemicals and pollutants; arachidonic acid metabolism and eicosanoid biosynthesis; cholesterol, sterol and bile acid biosynthesis; steroid synthesis and catabolism; vitamin D3 synthesis and catabolism; retinoic acid hydroxylation; biogenic amine and neuroamine metabolism; and several orphan CYP genes still of unknown function [6]. There are 102 and 57 putatively functional CYP genes in the mouse and human, respectively [7]. To date, more than 3,400 P450 sequences have been named with the three-letter root of CYP. This nomenclature has been in place [8,9] since 1987, and is growing every day [4]. The official root names for mouse and human P450s are Cyp and CYP, respectively. The Drosophila nomenclature [10] also uses Cyp. There are now 727 genes in rice and Arabidopsis that have been named CYP [4]. It is anticipated that the number of named P450 genes will exceed 4,000 by the end of 2004. Whereas continuing to use the CYP root for cyclophilin genes will be a nightmare for cyclophilin researchers, P450 researchers might find this an annoyance but not really much of a problem. To prevent conflicts over nomenclature, it becomes increasingly urgent to rename the cyclophilin genes. What is the best root name for these genes?

Finding the best root for the cyclophilin genes

The three families of immunophilins, known as peptidylprolyl cis-trans isomerases (PPIases), include the cyclophilins, the FK506-binding proteins (FKBPs) and parvulin [11-13]. All three gene families are found in animals, plants and eubacteria. While two cyclophilins and two types of FKBPs exist in archaebacteria, no parvulin homologue has been found. Parvulin is unique among the immunophilins. A search of the LocusLink [14], HUGO Gene Nomenclature Committee [15], and the National Center for Biotechnology Information (NCBI) UniGene [16] websites using 'parvulin', shows a single gene; Pin4 and PIN4 are the approved mouse and human gene names, respectively. 'PIN' is an abbreviation for peptidylprolyl cis-trans isomerase NIMA-interacting-4. 'NIMA' stands for 'never-in-mitosis-gene-a', which was first isolated as a series of conditional cell cycle mutants that failed to enter mitosis in Aspergillus nidulans [17,18]. There are 11 genes (NEK1, NEK2, ... NEK11) in the human genome that encode NIMA-related mitotic kinases and are involved in DNA replication and genotoxic stress responses [19,20]. Although parvulin has peptidylprolyl cis-trans isomerase activity, it shares no evolutionary homology with the FKBPs or cyclophilins. Immunophilins are defined as receptors for immuno-suppressive drugs including cyclosporin-A, FK506 and rapamycin. FK506 is also called tacrolimus, a macrolide of fungal origin (produced by Streptomyces tsukubaensis) and having strong immunosuppressive actions. FK506- and rapamycin-binding proteins are abbreviated as FKBPs and share no evolutionary homology with the cyclophilins or parvulin. A search of the LocusLink, HUGO Gene Nomenclature Committee and the NCBI UniGene websites using 'fkbp', shows more than 80 FKBP genes in the human and mouse (FKBP1, FKBP2, ... FKBP82). These gene products have many unique features, such as targeting BCL2 to the mitochondria and inhibiting apoptosis [21]. 
Cyclophilins, the third and last class of the PPIases, comprise cyclosporin-A-binding proteins [22] ranging in size from 17 kDa to 324 kDa [12]. This class of immunophilins carries out a wide range of functions -- including acting as a chaperone to facilitate the nuclear transport of the somatolactogenic hormones [23], facilitating the calcium-regulated mitochondrial permeability transition pore which precedes apoptosis [24] and participating in the pre-mRNA splicing machinery [25]. Cyclophilin-binding drugs are emerging as potential leads to novel targets for interference with interleukin-12 production [26] and, therefore, to the possibility of treating conditions such as multiple sclerosis and rheumatoid arthritis. Cyclosporin-A also has activity against helminth and protozoan parasites [27]. A search of the LocusLink, HUGO Gene Nomenclature Committee and the NCBI UniGene websites using 'cyclophilin', shows 15 putatively functional genes and 22 pseudogenes. The 15 putatively functional gene names (Table 2) include PPIA through to PPIH (for peptidylprolyl isomerase-A, -B, ... -H; cyclophilin A-, B-, ... H-related), one PPIA-like (PPIAL3), and six cyclophilin-like (PPIL1, PPIL2, ... PPIL6).
Table 2

List of putatively functional human cyclophilin genes.

Approved gene symbol  Approved gene name                                  Chromosomal location
PPIA                  Peptidylprolyl isomerase A (cyclophilin A)          7p13-p11.2
PPIAL3                Peptidylprolyl isomerase A (cyclophilin A)-like-3   21
PPIB                  Peptidylprolyl isomerase B (cyclophilin B)          15
PPIC                  Peptidylprolyl isomerase C (cyclophilin C)          [reserved]
PPID                  Peptidylprolyl isomerase D (cyclophilin D)          4
PPIE                  Peptidylprolyl isomerase E (cyclophilin E)          1p32
PPIF                  Peptidylprolyl isomerase F (cyclophilin F)          10q22-q23
PPIG                  Peptidylprolyl isomerase G (cyclophilin G)          2q31.1
PPIH                  Peptidylprolyl isomerase H (cyclophilin H)          1p34.1
PPIL1                 Peptidylprolyl isomerase (cyclophilin)-like 1       6p21.1
PPIL2                 Peptidylprolyl isomerase (cyclophilin)-like 2       22
PPIL3                 Peptidylprolyl isomerase (cyclophilin)-like 3       2
PPIL4                 Peptidylprolyl isomerase (cyclophilin)-like 4       6q24-25
PPIL5                 Peptidylprolyl isomerase (cyclophilin)-like 5       14q21.3
PPIL6                 Peptidylprolyl isomerase (cyclophilin)-like 6       6q21

Not included here are PPIAL, PPIAL2, PPIAP, PPIAP2, PPIAP3, PPIAP4, PPIAP5, PPIAP6, PPIHP1, PPIHP2, PPIL1P1, PPIP1, PPIP2, PPIP3, PPIP4, PPIP5, PPIP6, PPIP7, PPIP8, PPIP9, PPIP10 and PPIP11, which represent the 22 cyclophilin pseudogenes in the Human Genome Project (HGP) database.

PPID has the synonym 'CYP-40', but this is no longer the official name. Unfortunately, the mouse RIKEN full-length cDNAs that match this sequence are being called CYP40, not PPID, so the name is propagating itself in the literature and into the databases in an uncontrollable way. The cloning and naming of 11 cyclophilin genes from C. elegans (Cyp-1 to Cyp-11) [28] was reported in 1996. A search of GenBank for CYP20 finds AY568517, an Arabidopsis thylakoid lumen cyclophilin [29], named CYP20-2. (CYP20A1 is a chordate cytochrome P450 of unknown function, possibly involved in development [4].) The date on this Arabidopsis CYP20 GenBank entry is 15th April, 2004, showing that the problem is not going away. In fact, the PubMed link from the GenBank entry leads to a publication [30] in which a nomenclature system for the 29 cyclophilin genes in the Arabidopsis thaliana genome is presented using CYP as the root.

What is the solution?

Solutions -- like politics -- are local. We have contacted the C. elegans community and alerted them to this nomenclature conflict. They are responding and will select a new root for cyclophilins and change their P450 gene names to cyp-, from the current ccp- root. This will go into the official WormPep and WormBase nomenclature and will eventually prevent use of the cyp- root in C. elegans (and, hopefully, C. briggsae) for cyclophilins. Additional effort will be needed for the Arabidopsis community, as well as for the human and mouse gene databases. What might be the best root for the cyclophilin gene family? Cyn has been used for cyclone, a mouse gene in LocusLink; CPN1 and CPN2 are being used for carboxypeptidase N-1 and -2; Cph was considered, but CPH1 has been used to refer to a cryptochrome or phytochrome (light-sensing protein) [31]. Because of the sharing of this paper (while still being written) with Lois Maltais of Mouse Genome Informatics (MGI), she consulted with the authors of the mouse cyclone gene paper and they have now agreed to use Cycn, in order to free up Cyn and CYN for the mouse and human cyclophilin, respectively. After searching databases and search engines for conflicts, the present authors suggest that Cyn- might be the most suitable root for C. elegans cyclophilins, but this needs to be decided among members of the worm community. It is unfortunate that some databases (e.g. worm, yeast and bacteria) are mandating that gene names be limited to three letters. The authors suspect that three-letter root names for the ~19,000 C. elegans genes may not be enough. For example, 10,000 families will require the same number of roots. 26 cubed is only 17,576; this will require the use of odd letter combinations that have no symbolic meaning, such as xyz1, cxq, rzx, etc. Also, the nature of language is to use some letters more often than others, which will put great pressure on naming the genes that begin with the most often-used letters.
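
The arithmetic behind the three-letter concern can be checked with a short sketch. The Python snippet below is illustrative only: the function names root_capacity and fraction_needed are hypothetical, and the figure of 10,000 families is the example used in the paragraph above. It counts the roots available at a fixed length and the share of them that a given number of families would consume.

```python
import string

# The 26 letters available for a gene root.
ALPHABET_SIZE = len(string.ascii_lowercase)


def root_capacity(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct roots made of exactly `length` letters."""
    return alphabet_size ** length


def fraction_needed(families: int, length: int) -> float:
    """Share of all possible roots of this length used by `families` families."""
    return families / root_capacity(length)


print(root_capacity(3))                      # 17576 three-letter roots
print(round(fraction_needed(10_000, 3), 3))  # 0.569 of them would be needed
print(root_capacity(4))                      # 456976 four-letter roots
```

Under these assumptions, 10,000 families would use more than half of all possible three-letter combinations, most of which carry no mnemonic value, whereas a fourth letter raises the ceiling 26-fold.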
CYN has now been officially approved as the root to unify all mammalian cyclophilins. Using evolutionary trees to assign names to genes in the P450 superfamily [4,8,9], in the authors' experiences, has been very positive. This can work in general for any other homologous group of genes and, in fact, has been used for at least 124 families and/or superfamilies to date [32]. To illustrate this point, a simple sequence alignment (Figure 1) and tree (Figure 2) are presented for the C. elegans cyclophilins.
Figure 1

Sequence alignment of the conserved regions of 17 C. elegans cyclophilins.

Figure 2

UPGMA tree and possible family and subfamily divisions of the C. elegans cyclophilins. The root cyn- is acceptable because it has not yet been used by any other gene database, except 'cyclone' in mouse. Families are designated by Arabic numerals and represent amino acid identity of 40 per cent or greater. Subfamilies are designated by letters and represent amino acid identity of 54 per cent or greater. Individual genes within subfamilies are then given Arabic numbers. Identifiers are the WormPep accession numbers.
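
The family and subfamily rule in this caption can be sketched in code. The Python example below is a minimal illustration, not the authors' procedure: it uses a simple average-linkage clustering (in the spirit of UPGMA) stopped at the 40 and 54 per cent identity cut-offs, the helper names cut_clusters and assign_names are hypothetical, the sequence labels w, x, y and z and their identities are invented, and the numbering within each level simply follows alphabetical order rather than position on a tree.

```python
from itertools import combinations
from string import ascii_lowercase


def cut_clusters(ids, identity, threshold):
    """Average-linkage clustering, stopped once no pair of clusters
    has a mean pairwise identity of at least `threshold` per cent."""
    clusters = [[i] for i in ids]

    def mean_identity(a, b):
        total = sum(identity[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the two clusters that are most similar on average.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_identity(clusters[p[0]], clusters[p[1]]),
        )
        if mean_identity(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Sort for a deterministic (but arbitrary) numbering.
    return sorted(sorted(c) for c in clusters)


def assign_names(ids, identity, root="cyn", family_cut=40, subfamily_cut=54):
    """Name each sequence as root + family number + subfamily letter + gene number."""
    names = {}
    families = cut_clusters(ids, identity, family_cut)
    for family_no, family in enumerate(families, start=1):
        subfamilies = cut_clusters(family, identity, subfamily_cut)
        # zip() caps the sketch at 26 subfamilies per family.
        for letter, subfamily in zip(ascii_lowercase, subfamilies):
            for gene_no, gene in enumerate(subfamily, start=1):
                names[gene] = f"{root}{family_no}{letter}{gene_no}"
    return names


# Hypothetical sequences and per cent identities, for illustration only.
toy_identity = {
    frozenset(("w", "x")): 70,
    frozenset(("w", "y")): 45,
    frozenset(("x", "y")): 47,
    frozenset(("w", "z")): 20,
    frozenset(("x", "z")): 22,
    frozenset(("y", "z")): 25,
}
names = assign_names(["w", "x", "y", "z"], toy_identity)
print(names)
# {'w': 'cyn1a1', 'x': 'cyn1a2', 'y': 'cyn1b1', 'z': 'cyn2a1'}
```

Raising or lowering either cut-off in this sketch corresponds to moving the vertical break-point lines on the tree, and changes the number of families or subfamilies accordingly.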

The vertical lines in Figure 2 are suggested break-points for family and subfamily designations. Branches on the tree intersected by the lines would define family and subfamily clusters. The lines could be moved to modify the number of families and subfamilies. As drawn, there are six subfamilies in family 1, and one each in families 2 and 3. Moving the subfamily line to the left could reduce the number of subfamilies in family 1 from six to three. If cyn were used, CE28157 (at the top of Figure 2) would be named cyn3a1 and CE20374 (at the bottom of Figure 2) would be named cyn1a1, and so on.

A method for creating a network of 'gene co-occurrences' from the literature, and partitioning it into communities of related genes, has recently been presented [33]. In that paper, a program is described (but not named) which searches all Medline titles and abstracts and OMIM entries for occurrences and co-occurrences of gene symbols, gene names and diseases; the databases contain more than 12 million abstracts. Relationships are identified by automated bioinformatics methods between genes, and between genes and diseases, that might not be detected by less computationally intense methods. Such methods must rely on consistent names, or they have to deal with a list of synonyms.
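
The dependence of such text mining on consistent names can be illustrated with a toy example. The Python sketch below is not the program described in [33]; it is a minimal co-occurrence counter written under stated assumptions. The three abstracts are invented, the function names genes_in and cooccurrences are hypothetical, symbols are compared case-insensitively for simplicity, and the synonym table contains only the aliases mentioned in this paper (COX-1, COX-2, CYP-40 and CYP40).

```python
import re
from collections import Counter
from itertools import combinations

# Aliases mentioned in this paper, mapped to the approved symbols.
SYNONYMS = {"COX-1": "PTGS1", "COX-2": "PTGS2", "CYP-40": "PPID", "CYP40": "PPID"}
APPROVED = {"PTGS1", "PTGS2", "PPID", "POR"}

# A token is a run of letters and digits, optionally joined by hyphens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def genes_in(text, synonyms):
    """Approved gene symbols found in one abstract, after alias mapping."""
    found = set()
    for token in TOKEN.findall(text):
        symbol = synonyms.get(token.upper(), token.upper())
        if symbol in APPROVED:
            found.add(symbol)
    return found


def cooccurrences(abstracts, synonyms=SYNONYMS):
    """Count how often each pair of genes appears in the same abstract."""
    counts = Counter()
    for text in abstracts:
        for pair in combinations(sorted(genes_in(text, synonyms)), 2):
            counts[pair] += 1
    return counts


# Invented abstracts, for illustration only.
abstracts = [
    "COX-2 expression was compared with POR activity.",
    "PTGS2 and POR were measured in liver.",
    "CYP-40 binds the chaperone; PTGS2 was unaffected.",
]

print(cooccurrences(abstracts))
# Counter({('POR', 'PTGS2'): 2, ('PPID', 'PTGS2'): 1})
print(cooccurrences(abstracts, synonyms={}))
# Counter({('POR', 'PTGS2'): 1})
```

Without the synonym table, two of the three relationships in this toy corpus go undetected, which is the practical cost of inconsistent naming that the authors describe.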

Conclusions

The cyclophilin gene nomenclature has several problems. First, many in the cyclophilin field continue to use CYP, which has been the gene root for the large cytochrome P450 gene superfamily since 1987. Secondly, the gene root chosen by the HUGO Human/Mouse Gene Nomenclature Committees had been PPI for peptidylprolyl cis-trans isomerase, although -- as detailed above -- the cyclophilins represent just one of three classes of the PPIases that are perhaps functionally related but evolutionarily unrelated. Thirdly, the authors suggest the root Cyn for the C. elegans cyclophilin genes. Fourthly, eight of the 15 putatively functional human cyclophilin genes end in the letters 'A' through to 'H', while the others end in two groups of numbers (one PPIA-like and six PPI-like). It is strongly recommended that these genes be named by families and subfamilies, according to evolutionary divergence, as shown in Figure 3. Because of discussions related to the writing of this paper, the CYN root has now been officially approved for mammals. It would be desirable to incorporate as many species as possible into the naming scheme. One additional source of nomenclature friction is the strict use of three letter roots for gene names in C. elegans, yeast and bacteria; this automatically creates conflicts when human and mouse root names can be much longer than three letters, as in PPIAL3 or NIPSNAP1; however, that is a battle for another day.
Figure 3

Rectangular cladogram of the 15 human cyclophilin proteins aligned. If one were to drop vertical lines, as in Figure 2, one might name: PPIL1 through to PPIL6 as CYN1A1 through to CYN1A4, and CYN1B1 and CYN1B2, respectively; PPIB and PPIC as CYN2A1 and CYN2A2, respectively; PPIE, PPIF, PPIA and PPIAL3 as CYN2C1, CYN2C2, CYN2C3 and CYN2C4, respectively; PPIG as CYN3; and so on.
  24 in total

1.  A method for finding communities of related genes.

Authors:  Dennis M Wilkinson; Bernardo A Huberman
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-02       Impact factor: 11.205

Review 2.  Peptidylprolyl cis/trans isomerases (immunophilins): biological diversity--targets--functions.

Authors:  Andrzej Galat
Journal:  Curr Top Med Chem       Date:  2003       Impact factor: 3.295

3.  Nek11, a new member of the NIMA family of kinases, involved in DNA replication and genotoxic stress responses.

Authors:  Kohji Noguchi; Hidesuke Fukazawa; Yuko Murakami; Yoshimasa Uehara
Journal:  J Biol Chem       Date:  2002-08-01       Impact factor: 5.157

Review 4.  [Immunophilin FKBP38, an inherent inhibitor of calcineurin, targets Bcl-2 to mitochondria and inhibits apoptosis].

Authors:  Michiko Shirane; Keiichi I Nakayama
Journal:  Nihon Rinsho       Date:  2004-02

Review 5.  Clinical importance of the cytochromes P450.

Authors:  Daniel W Nebert; David W Russell
Journal:  Lancet       Date:  2002-10-12       Impact factor: 79.321

Review 6.  Role of cyclophilins in somatolactogenic action.

Authors:  M A Rycyzyn; C V Clevenger
Journal:  Ann N Y Acad Sci       Date:  2000       Impact factor: 5.691

Review 7.  Archaeal peptidyl prolyl cis-trans isomerases (PPIases).

Authors:  T Maruyama; M Furutani
Journal:  Front Biosci       Date:  2000-09-01

Review 8.  The permeability transition pore complex: another view.

Authors:  Andrew P Halestrap; Gavin P McStay; Samantha J Clarke
Journal:  Biochimie       Date:  2002 Feb-Mar       Impact factor: 4.079

Review 9.  Inhibiting cytokines of the interleukin-12 family: recent advances and novel challenges.

Authors:  Koen Vandenbroeck; Iraide Alloza; Massimo Gadina; Patrick Matthys
Journal:  J Pharm Pharmacol       Date:  2004-02       Impact factor: 3.765

10.  Identification and characterisation of Schizosaccharomyces pombe cyclophilin 3, a cyclosporin A insensitive orthologue of human USA-CyP.

Authors:  Trevor J Pemberton; Stuart L Rulten; John E Kay
Journal:  J Chromatogr B Analyt Technol Biomed Life Sci       Date:  2003-03-25       Impact factor: 3.205

