Shu-Ting Pan, Danfeng Xue, Zhi-Ling Li, Zhi-Wei Zhou, Zhi-Xu He, Yinxue Yang, Tianxin Yang, Jia-Xuan Qiu, Shu-Feng Zhou.
Abstract
The human cytochrome P450 (CYP) superfamily, which comprises 57 functional genes, is the most important group of Phase I drug-metabolizing enzymes. These enzymes oxidize a large number of xenobiotics and endogenous compounds, including therapeutic drugs and environmental toxicants. The CYP superfamily has expanded through gene duplication, and some of the duplicated genes have become pseudogenes through mutation. Orthologs and paralogs are homologous genes that arise from speciation and duplication, respectively. To explore the evolutionary and functional relationships of human CYPs, we conducted this bioinformatic study to identify their paralogs, homologs, and orthologs, and we discuss the implications for function, drug discovery, and evolutionary biology. GeneCards and Ensembl were used to identify the paralogs of human CYPs. A panel of online databases was used to identify the orthologs of human CYP genes: NCBI, Ensembl Compara, GeneCards, OMA ("Orthologous MAtrix") Browser, PANTHER, TreeFam, EggNOG, and Roundup. According to GeneCards and Ensembl, the number of paralogs and orthologs varies from one human CYP to another. For example, the paralogs of CYP2A6 include CYP2A7, 2A13, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 2F1, 2J2, 2R1, 2S1, 2U1, and 2W1; CYP11A1 has six paralogs: CYP11B1, 11B2, 24A1, 27A1, 27B1, and 27C1; CYP51A1 has only three paralogs: CYP26A1, 26B1, and 26C1; and CYP20A1 has no paralog. The majority of human CYPs are well conserved from plants, amphibians, fishes, or mammals to humans because of their important functions in physiology and xenobiotic disposition. The data from the different approaches were cross-validated against one another and, where experimental data were available, validated against those data. These findings facilitate our understanding of the evolutionary relationships and functional implications of the human CYP superfamily in drug discovery.
Keywords: bioinformatics; comparative genomics; drug metabolism; homolog; human CYP; ortholog; paralog
Year: 2016 PMID: 27367670 PMCID: PMC4964396 DOI: 10.3390/ijms17071020
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. (A) Alignment of the 57 human CYP proteins retrieved from Swiss-Prot. The multiple sequence alignment was carried out using Clustal W v2.0; (B) The phylogenetic tree of human CYPs, from which the evolutionary relationships among human CYPs can be inferred; (C) MEME (Multiple EM for Motif Elicitation) version 4.10.1 was employed to identify important conserved motifs present in human CYP proteins.
Figure 2. Gene tree for human CYP1A1, 1A2, 1B1, 17A1, and 21A2 built using Ensembl 84. These five genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 537 genes from various species. The total number of speciation nodes is 370, and the number of duplication nodes is 143. The number of ambiguous nodes is 21, and the number of gene split events is 2.
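The distinction between speciation and duplication nodes in such a gene tree is what separates orthologs from paralogs. The following is a minimal sketch of that rule, assuming a toy tree whose internal nodes are already labelled with their event type. The `Node` class, the gene names, and the tree itself are hypothetical illustrations and are not taken from Ensembl.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    event: Optional[str] = None   # "speciation" or "duplication" on internal nodes
    children: tuple = ()          # child nodes (empty for leaves)
    gene: Optional[str] = None    # set only on leaves (hypothetical gene labels)


def leaves(node):
    """Return the gene labels of all leaves below a node."""
    if node.gene is not None:
        return [node.gene]
    return [g for child in node.children for g in leaves(child)]


def classify_pairs(node, out=None):
    """Label every leaf pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if node.gene is not None:
        return out
    label = {"speciation": "ortholog", "duplication": "paralog"}.get(
        node.event, "ambiguous"
    )
    # Genes in different child subtrees have this node as last common ancestor.
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, out)
    return out


# Hypothetical tree: one duplication followed by a human/mouse speciation
# in each of the two resulting copies (A and B).
tree = Node("duplication", (
    Node("speciation", (Node(gene="human_A"), Node(gene="mouse_A"))),
    Node("speciation", (Node(gene="human_B"), Node(gene="mouse_B"))),
))

pairs = classify_pairs(tree)
for pair, label in sorted((tuple(sorted(p)), lab) for p, lab in pairs.items()):
    print(pair, label)
```

In this toy tree, `human_A` and `mouse_A` are separated by a speciation node and are labelled orthologs, whereas `human_A` and `human_B` trace back to the duplication node and are labelled paralogs.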
A list of 57 human functional CYP genes and their corresponding paralogs based on Ensembl 84 and GeneCards 4.1.1.
| Gene | Chromosomal Location | Substrates/Function | Number of Amino Acids | Paralogs by Ensembl 84 | Paralogs by GeneCards 4.1.1 |
|---|---|---|---|---|---|
| | 15q24.1 | Drugs, procarcinogens, steroids, and fatty acids | 512 | | |
| | 15q24.1 | Drugs, fatty acids, and steroids | 516 | | |
| | 2p22.2 | Drugs, procarcinogens, steroids, and fatty acids | 543 | | |
| | 19q13.2 | Drugs and steroids | 494 | | |
| | 19q13.2 | Unknown (orphan) | 494 | | |
| | 19q13.2 | Drugs and other xenobiotics | 494 | | |
| | 19q13.2 | Drugs, steroids and fatty acids | 491 | | |
| | 10q23.33 | Drugs, steroids and fatty acids | 490 | | |
| | 10q24 | Drugs, steroids and fatty acids | 490 | | |
| | 10q24 | Drugs, steroids and fatty acids | 490 | | |
| | 10q24.1-q24.3 | Drugs | 490 | | |
| | 22q13.1 | Drugs | 497 | | |
| | 10q26.3 | Drugs, ethanol, and procarcinogens | 493 | | |
| | 19q13.2 | Drugs and coumarins | 491 | | |
| | 1p31.3-p31.2 | Fatty acid (e.g., AA) | 502 | | |
| | 11p15.2 | Vitamin D | 501 | | |
| | 19q13.1 | Xenobiotics | 504 | | |
| | 4q25 | AA, DHEA, and long chain fatty acids | 544 | | |
| | 7p22.3 | Unknown | 490 | | |
| | 7q21.1 | Drugs, steroids and fatty acids | 503 | | |
| | 7q21.1 | Drugs, steroids and fatty acids | 502 | | |
| | 7q21-q22.1 | Drugs, steroids and fatty acids | 503 | | |
| | 7q21.1 | Low level of testosterone 6β-hydroxylase activity | 503 | | |
| | 1p33 | Medium-chain fatty acids such as laurate and myristate | 519 | | |
| | 1p33 | Unknown (orphan) | 519 | | |
| | 1p33 | Xenobiotics, steroids and fatty acids | 511 | | |
| | p13.12 | Eicosanoids | 520 | | |
| | 19p13.2 | Eicosanoids (e.g., LTB4) | 520 | | |
| | 19p13.1 | Eicosanoids | 520 | | |
| | 19p13.1 | Unknown (orphan) | 524 | | |
| | 19p13.1 | Fatty acids | 524 | | |
| | 19p13.12 | Unknown (orphan) | 531 | | |
| | 4q35.2 | Unknown (orphan) | 525 | | |
| | 1p33 | Unknown (orphan) | 509 | | |
| | 1p33 | Flavoprotein hydroxylation | 505 | | |
| | 7q34-q35 | Thromboxane synthesis | 534 | | |
| | 8q11-q12 | Cholesterol | 504 | | |
| | 8q21.3 | Cholesterol | 506 | | |
| | 20q13.13 | Isomerisation of PGH2 to prostacyclin | 500 | | |
| | 3p22-p21.3 | Steroids | 501 | | |
| | 15q23-q24 | Side-chain cleavage of cholesterol pregnenolone | 521 | | |
| | 8q21 | Steroids | 503 | | |
| | 8q21-q22 | Steroids, especially production of aldosterone | 503 | | |
| | 10q24.3 | Steroid metabolism, especially the conversion of pregnenolone and progesterone | 508 | | |
| | 15q21.1 | Steroid metabolism, formation of aromatic C18 estrogens and C19 androgens | 503 | No paralog | |
| | 2q33.2 | Unknown (orphan) | 462 | No paralog | |
| | 6p21.3 | 21-hydroxylation of steroids; required for adrenal synthesis of mineralocorticoids and glucocorticoids | 495 | | |
| | 20q13 | Vitamin D hydroxylation | 514 | | |
| | 10q23-q24 | Retinoic acid metabolism | 497 | | |
| | 2p13.2 | Retinoic acid metabolism | 512 | | |
| | 10q23.33 | Retinoic acid metabolism | 522 | | |
| | 2q35 | Steroid metabolism, catalyzing first step in oxidation of side-chain of sterol intermediates | 531 | | |
| | 12q14.1 | Vitamin D metabolism | 508 | | |
| | 2q14.3 | Unknown (orphan) | 372 | | |
| | 6p21.1-p11.2 | Cholesterol | 469 | | |
| | 14q32.1 | Cholesterol | 500 | | |
| | 7q21.2- | Sterols | 509 | | |
Figure 3. Gene tree for human CYP2A6, 2A7, 2A13, 2B6, 2C8, 2C9, 2C19, 2D6, 2D7, 2E1, 2F1, 2J2, 2R1, 2S1, 2U1, and 2W1 built using Ensembl 84. These CYP2 family genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 1254 genes from various species. The total number of speciation nodes is 741, and the number of duplication nodes is 483. The number of ambiguous nodes is 29, and there is no gene split event.
Figure 4. Gene tree for human CYP3A4, 3A5, 3A7, 3A43, 4A11, 4A22, 4B1, 4F2, 4F3, 4F8, 4F11, 4F12, 4F22, 4V2, 4X1, 4Z1, 5A1/TBXAS1, and 46A1 built using Ensembl 84. These CYP3, 4, 5, and 46 family genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 1008 genes from various species. The total number of speciation nodes is 558, and the number of duplication nodes is 384. The number of ambiguous nodes is 31, and there are 4 gene split events.
Figure 5. Gene tree for human CYP7A1, 7B1, 8A1/PTGIS, 8B1, and 39A1 built using Ensembl 84. These CYP7, 8, and 39 family genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 340 genes from various species. The total number of speciation nodes is 287, and the number of duplication nodes is 41. The number of ambiguous nodes is 10, and there is only 1 gene split event.
Figure 6. Gene tree for human CYP11A1, 11B1, 11B2, 24A1, 27A1, 27B1, and 27C1 built using Ensembl 84. These CYP11, 24, and 27 family genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 410 genes from various species. The total number of speciation nodes is 344, and the number of duplication nodes is 52. The number of ambiguous nodes is 13, and there is no gene split event.
Figure 7. Gene tree for human CYP26A1, 26B1, 26C1, and 51A1 built using Ensembl 84. These CYP26 and CYP51 family genes are paralogs of one another, derived from the same ancestral gene via duplication events. The gene tree includes a total of 260 genes from various species. The total number of speciation nodes is 232, and the number of duplication nodes is 12. The number of ambiguous nodes is 15, and there is no gene split event.
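The node counts reported in the gene tree captions can be tabulated and sanity-checked. The sketch below uses only the numbers stated in the captions of Figures 2 to 7. It rests on two assumptions that are not stated in the source: that each tree is rooted and fully bifurcating, so that a tree with n genes has n − 1 internal nodes, and that every internal node is counted in exactly one of the four categories (speciation, duplication, ambiguous, gene split). The function and field names are illustrative.

```python
# Counts transcribed from the captions of Figures 2-7 (Ensembl 84 gene trees).
trees = {
    "Figure 2": dict(genes=537, speciation=370, duplication=143, ambiguous=21, split=2),
    "Figure 3": dict(genes=1254, speciation=741, duplication=483, ambiguous=29, split=0),
    "Figure 4": dict(genes=1008, speciation=558, duplication=384, ambiguous=31, split=4),
    "Figure 5": dict(genes=340, speciation=287, duplication=41, ambiguous=10, split=1),
    "Figure 6": dict(genes=410, speciation=344, duplication=52, ambiguous=13, split=0),
    "Figure 7": dict(genes=260, speciation=232, duplication=12, ambiguous=15, split=0),
}


def summarize(t):
    """Compare the reported internal nodes with n - 1 and compute the duplication share."""
    internal = t["speciation"] + t["duplication"] + t["ambiguous"] + t["split"]
    expected = t["genes"] - 1  # assumption: rooted, fully bifurcating tree
    dup_fraction = t["duplication"] / (t["speciation"] + t["duplication"])
    return {
        "internal": internal,
        "expected": expected,
        "consistent": internal == expected,
        "dup_fraction": dup_fraction,
    }


summary = {name: summarize(t) for name, t in trees.items()}
for name, s in summary.items():
    print(
        f"{name}: internal={s['internal']} expected={s['expected']} "
        f"consistent={s['consistent']} duplication share={s['dup_fraction']:.2f}"
    )
```

Under these assumptions, the counts for Figures 2, 3, 5, 6, and 7 add up to n − 1, while the Figure 4 counts sum to 977 against an expected 1007. That gap may reflect a tree that is not fully bifurcating, a different counting convention, or a transcription error in one of the caption's numbers; the captions alone do not say which.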
Databases and programs used to predict the orthologs of human cytochrome P450 (CYP) genes.
| Database | URL | Current Release | Primary Application | Number of Organisms | Taxonomic Range | Update Frequency |
|---|---|---|---|---|---|---|
| NCBI | | RefSeq release 75 (released on 14 March 2016) | NCBI’s genome annotation pipeline | 862 mammalian vertebrates and 3145 other vertebrates (RefSeq covers 58,776 organisms) | Vertebrates | Daily |
| Ensembl Compara | | Release 84 (released in March 2016) | To build phylogenetic trees across the whole set of protein-coding genes with one pipeline | 68 chordates plus 240 others | All domains of life | Quarterly |
| GeneCards | | Version 4.1.1 (released on 13 March 2016) | To provide comprehensive, user-friendly information on all annotated and predicted human genes (39,629 HGNC approved, 21,976 protein-coding genes, 104,578 RNA genes, and 16,329 pseudogenes) | 58 | All domains of life | Yearly |
| OMA | | Release 17 (released in September 2014) | To identify orthologous genes via massive cross-comparison of complete genomes | 1706 (226 Eukaryota, 1353 Bacteria, and 127 Archaea) | All domains of life | Twice per year |
| PANTHER | | Version 10.0 (released on 25 April 2015) | Inference of gene function using GO terms and evolutionary relationships of genes among organisms (11,928 protein families, divided into 83,190 functionally distinct protein subfamilies) | 104 | All domains of life | Yearly |
| TreeFam | | Release 9 (released on 3 May 2013) | Phylogenetic tree construction and orthology/paralogy predictions, as well as the evolutionary history of genes | 109 (vs. 79 in TreeFam 8) | Metazoans + model eukaryotes | Once every 2 years |
| EggNOG | | 4.1 (released on 5 May 2015) | A database of orthologous groups and functional annotation | 2031 | All domains of life | Once every 3–4 years |
| RoundUp | | 2.0 (released in January 2012) | An online database of gene orthologs for over 1800 genomes | 1807 | All domains of life | 2–4 times per year |
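Because several databases are queried for the same human gene, their ortholog calls can be compared by counting how many sources support each call. The following is a minimal sketch of such a comparison, not the authors' procedure. The ortholog identifiers (`mouse:GeneX` and so on) and the per-database call sets are hypothetical placeholders, not real database output, and the `min_support` threshold is an arbitrary illustrative choice.

```python
from collections import Counter

# Hypothetical ortholog calls for a single human gene from four of the
# databases listed above. These sets are placeholders, NOT real results.
predictions = {
    "Ensembl Compara": {"mouse:GeneX", "rat:GeneX", "zebrafish:GeneX"},
    "OMA": {"mouse:GeneX", "rat:GeneX"},
    "PANTHER": {"mouse:GeneX", "zebrafish:GeneX", "frog:GeneX"},
    "TreeFam": {"mouse:GeneX"},
}


def consensus(calls_by_db, min_support=2):
    """Keep ortholog calls reported by at least `min_support` databases."""
    votes = Counter(call for calls in calls_by_db.values() for call in calls)
    return {call: n for call, n in votes.items() if n >= min_support}


supported = consensus(predictions, min_support=2)
for call, n in sorted(supported.items(), key=lambda item: (-item[1], item[0])):
    print(f"{call}: supported by {n} of {len(predictions)} databases")
```

Raising `min_support` trades sensitivity for agreement: a call kept at a high threshold is backed by more independent pipelines, while a low threshold retains calls that only one or two databases report.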