Roger S Holmes, Kimberly D Spradling-Reeves, Laura A Cox.
Abstract
Glutamyl aminopeptidase (ENPEP) is a member of the M1 family of endopeptidases, which are mammalian type II integral membrane zinc-containing endopeptidases. ENPEP is involved in the catabolic pathway of the renin-angiotensin system, forming angiotensin III, which participates in blood pressure regulation and blood vessel formation. Comparative ENPEP amino acid sequences and structures and ENPEP gene locations were examined using data from several mammalian genome projects. Mammalian ENPEP sequences shared 71-98% identities. Five N-glycosylation sites were conserved for all mammalian ENPEP proteins examined, although 9-18 sites were observed in each case. Sequence alignments, key amino acid residues and predicted secondary and tertiary structures were also studied, including transmembrane and cytoplasmic sequences and active site residues. The highest levels of human ENPEP expression were observed in the terminal ileum of the small intestine and in the kidney cortex. Mammalian ENPEP genes contained 20 coding exons. The human ENPEP gene promoter and first coding exon contained a CpG island (CpG27) and at least 6 transcription factor binding sites, whereas the 3'-UTR region contained 7 miRNA target sites, which may contribute to the regulation of ENPEP gene expression in tissues of the body. Phylogenetic analyses examined the relationships of mammalian ENPEP genes and proteins, including primate, other eutherian, marsupial and monotreme sources, using chicken ENPEP as a primordial sequence for comparative purposes.
Keywords: Amino acid sequence; Aminopeptidase A; Arterial hypertension; ENPEP; Evolution; Glutamyl aminopeptidase; Mammals; Peptidase M1 family; Zinc metallopeptidase
Abbreviations: ENPEP: Glutamyl Aminopeptidase; BLAST: Basic Local Alignment Search Tool; BLAT: Blast-Like Alignment Tool; CpG island: Multiple C (cytosine)-G (guanine) Dinucleotide Region; NCBI: National Center for Biotechnology Information; QTL: Quantitative Trait Locus; RAS: Renin-Angiotensin System; SWISS-MODEL: Automated Protein Structure Homology-modeling Server; kbps: Kilobase Pairs; miRNA: microRNA Binding Region
Year: 2017 PMID: 29900035 PMCID: PMC5995572 DOI: 10.4172/2153-0602.1000211
Source DB: PubMed Journal: J Data Mining Genomics Proteomics
Table 1. Mammalian and chicken ENPEP genes and proteins.
| ENPEP Gene | Species | Chromosome location | Exons (strand) | Gene size (bps) | GenBank ID | UNIPROT ID | Amino acids | Subunit MW (pI) |
|---|---|---|---|---|---|---|---|---|
| Human | Homo sapiens | 4:110,476,415-110,561,555 | 20 (+ve) | 85141 | NM_0019977 | Q07075 | 957 | 109,244 (5.3) |
| Chimpanzee | Pan troglodytes | 4:113,095,101-113,180,147 | 20 (+ve) | 85047 | | H2QQ15 | 957 | 109,115 (5.3) |
| Gorilla | Gorilla gorilla | 4:121,992,414-122,077,571 | 20 (+ve) | 85158 | | G3SK36 | 957 | 109,262 (5.3) |
| Orang-utan | Pongo abelii | 4:115,175,027-115,261,670 | 20 (+ve) | 86644 | NM_001132893 | H2PE46 | 957 | 109,098 (5.2) |
| Rhesus | Macaca mulatta | 5:109,436,911-109,519,173 | 20 (+ve) | 82263 | NM_001266656 | F7GTW9 | 957 | 109,188 (5.2) |
| Baboon | Papio anubis | 5:101,625,577-101,709,020 | 20 (+ve) | 83444 | | A0A096MTU4 | 957 | 109,192 (5.3) |
| Squirrel monkey | Saimiri boliviensis | | 20 (−ve) | 88777 | | na | 957 | 109,059 (5.2) |
| Marmoset | Callithrix jacchus | 3:83,132,279-83,220,293 | 20 (−ve) | 88015 | | na | 957 | 109,299 (5.4) |
| Mouse lemur | Microcebus murinus | | 20 (−ve) | 84595 | | na | 962 | 109,104 (5.6) |
| Mouse | Mus musculus | 3:129,270,282-129,332,481 | 20 (−ve) | 62200 | NM_007934 | P16406 | 945 | 107,956 (5.3) |
| Rat | Rattus norvegicus | 2:252,992,139-253,065,721 | 20 (−ve) | 73583 | | P50123 | 945 | 107,995 (5.2) |
| Cow | Bos taurus | 6:16,067,640-16,146,013 | 20 (−ve) | 78374 | NM_001038027 | F1MEM5 | 956 | 109,801 (5.1) |
| Horse | Equus caballus | 2:115,349,261-115,422,839 | 20 (−ve) | 73579 | | F6XRR6 | 948 | 108,220 (4.8) |
| Pig | Sus scrofa | 8:119,969,527-120,060,884 | 20 (−ve) | 91358 | NM_214017 | Q95334 | 942 | 108,284 (5.1) |
| Rabbit | Oryctolagus cuniculus | 15:38,927,056-39,017,176 | 20 (−ve) | 90121 | | G1TBB2 | 956 | 109,013 (5.0) |
| Dog | Canis familiaris | 32:30,553,200-30,638,483 | 20 (+ve) | 85284 | | F6XRM5 | 954 | 109,202 (5.4) |
| Cat | Felis catus | B1:113,256,430-113,341,776 | 20 (−ve) | 85347 | | M3VU18 | 952 | 109,480 (5.7) |
| Opossum | Monodelphis domestica | 5:63,362,365-63,488,028 | 20 (+ve) | 125664 | | F6TL25 | 957 | 110,151 (5.4) |
| Platypus | Ornithorhynchus anatinus | | 20 (+ve) | 76898 | | F7E6Z3 | 938 | 107,447 (5.6) |
| Chicken | Gallus gallus | 4:57,435,632-57,469,043 | 20 (−ve) | 33412 | | A0A1D5PAZ7 | 943 | 107,918 (5.0) |
RefSeq: the reference amino acid sequence; predicted: NCBI-derived amino acid sequence; na: not available; GenBank IDs are derived from NCBI (http://www.ncbi.nlm.nih.gov/genbank/); UNIPROT refers to UniProtKB/Swiss-Prot IDs for individual ENPEP proteins (http://kr.expasy.org); *JH and *KQ refer to a scaffold; bps refers to base pairs of nucleotide sequences; pI refers to the theoretical isoelectric point.
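The gene sizes listed in the table are consistent with an inclusive span of the chromosome coordinates (end minus start plus one). The following is a minimal sketch of that arithmetic, assuming the `chromosome:start-end` format used in the table; the function names `parse_location` and `gene_size_bp` are illustrative and not taken from the source.

```python
from typing import Tuple


def parse_location(location: str) -> Tuple[str, int, int]:
    """Split 'chrom:start-end' (with thousands separators) into its parts."""
    chrom, span = location.split(":", 1)
    start_text, end_text = span.split("-", 1)
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return chrom, start, end


def gene_size_bp(location: str) -> int:
    """Inclusive length of the interval in base pairs."""
    _, start, end = parse_location(location)
    return end - start + 1


# Coordinates and reported sizes copied from the table above.
ROWS = {
    "Human": ("4:110,476,415-110,561,555", 85141),
    "Mouse": ("3:129,270,282-129,332,481", 62200),
    "Cat": ("B1:113,256,430-113,341,776", 85347),
    "Chicken": ("4:57,435,632-57,469,043", 33412),
}

for species, (location, reported) in ROWS.items():
    print(species, gene_size_bp(location), reported)
```

For each of these rows the computed value equals the size reported in the table, which suggests the coordinates are 1-based and inclusive.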
Predicted locations of N-glycosylation sites for mammalian ENPEP proteins. The predicted N-glycosylation sites were numbered following alignments using Clustal Omega [29] from the N-terminal end; conserved N-glycosylation sites for all mammalian ENPEP sequences examined are highlighted in yellow; individual amino acid residues were identified using standard single letter nomenclature: N-asparagine; S-serine; T-threonine etc.
| Site | Human | Chimp | Gorilla | Orangutan | Rhesus | Baboon | Squirrel | Marmoset | Mouse lemur | Mouse | Rat | Cow | Horse | Pig | Rabbit | Cat | Dog | Opossum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 43NHS | 44NTS | ||||||||||||||||
| 2 | 110NIS | |||||||||||||||||
| 3 | 124NLS | 124NLS | 124NLS | 124NLS | 124NLS | 124NLS | 124NLS | 124NLS | 122NLS | 116NLS | 116NLS | 126NVS | 115NVS | 114NVT | 123NVS | 119NVS | 121NVS | 123NVT |
| 4 | 197NGS | 197NGS | 197NGS | 197NGS | 197NGS | 197NGS | 197NGS | 197NGS | 202NGS | 189NGS | 189NGS | 199NGS | 188NGS | 187NGS | 196NGS | 192HGS | 194NGS | 197NGS |
| 5 | 236NIS | 241NIS | ||||||||||||||||
| 6 | 272NRT | 272NRT | 272NRT | 267NRT | 269NRT | 272NRT | ||||||||||||
| 7 | 324NIT | 324NIT | 324NIT | 324NIT | 324NIT | 324NIT | 324NIT | 324NIT | 329NIT | 316NIT | 316NIT | 326NIT | 315NIT | 314NIT | 323NIT | 321NIT | 324NIT | |
| 8 | 340NYS | 340NYS | 340NYS | 340NYS | 340NYS | 340NYS | 340NYS | 340NYS | 345NYS | 331NYS | ||||||||
| 9 | 383NES | 367NES | 367NES | |||||||||||||||
| 10 | 545NLS | |||||||||||||||||
| 11 | 554NIT | 554NIT | 554NIT | 554NIT | 554NIT | 554NIT | 554NIT | 554NIT | 546NIT | 547NVT | 556NIT | 549NIT | 551NIT | |||||
| 12 | 558NSS | 557NSS | 562NSS | 564NLS | ||||||||||||||
| 13 | 567NPS | 567NPS | 567NPS | 567NPS | 567NPS | 567NPS | 567NPS | 567NPS | 567NPS | |||||||||
| 14 | 589NIT | 589NIT | 589NIT | 589NIT | 589NIT | 589NIT | 589NIT | 589NIT | 589NIT | 584NIT | 580NVS | 579NES | 588NES | 584NVS | 586NVS | 592NIT | ||
| 15 | 597NRS | 597NRS | 597NRS | 597NRS | 597NRS | 597NRS | 597NRS | 597NRS | 602NRS | 599NRS | 588NRS | 587NRS | 596NRS | 592NRS | 594NRS | 597NRT | ||
| 16 | 607NSS | 607NSS | 607NSS | 607NSS | 607NSS | 607NSS | 607NSS | 612NSS | 601NLS | 601NLS | 601NPS | 597NSS | 606NPS | 602NSS | 604NSS | 607NST | ||
| 17 | 610NPS | 610NPS | 610NPS | 610NPS | 610NPS | 610NPS | 610NPS | 610NPS | 615NPS | |||||||||
| 18 | 643NLS | 633NLS | 632NLS | 641NLS | 637NLS | 646NFS | ||||||||||||
| 19 | 637NHT | 647NHT | 646NHT | |||||||||||||||
| 20 | 649NFS | 649NFS | 647NFS | 640NFS | ||||||||||||||
| 21 | 678NLT | 678NLT | 678NLT | 678NLT | 678NLT | 678NLT | 678NLT | 678NLT | 683NLT | 669NLT | 669NLT | 679NLT | 669NLT | 668NLT | 677NLT | 672NLT | 675NLT | 678NLT |
| 22 | 734NDT | 734NDT | ||||||||||||||||
| 23 | 763NAS | 763NAS | 763NAS | 763NAS | 763NAS | 763NAS | 763NAS | 763NAS | 768NAS | 754NAS | 754NAS | 764NAS | 754NAS | 753NAS | 762NAS | 758NAT | 760NAT | 763NAS |
| 24 | 766NES | |||||||||||||||||
| 25 | 773NGT | 773NGT | 773NGT | 773NGT | 773NGT | |||||||||||||
| 26 | 796NET | 796NET | 801NET | 797NET | 787NET | 786NET | 795NET | 791NET | 793NET | |||||||||
| 27 | 801NYT | 801NYT | 801NYT | 801NYT | 801NYT | 801NYT | 801NYT | 801NYT | 806NYT | 792NYT | 792NYT | 802NYT | 792NYT | 791NYT | 800NYT | 796NYT | 798NYT | 800NYT |
| 28 | 828NVT | 828NVT | 828NVT | 828NVT | 828NVT | 828NVT | 828NVT | 828NVT | 829NVT | 819NVT | 827NVT | 823NVT | 825NVT | 827NVT | ||||
| Total | 15 | 15 | 14 | 14 | 16 | 15 | 18 | 18 | 16 | 9 | 12 | 12 | 15 | 14 | 12 | 15 | 16 | 13 |
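The entries in the table above are labeled by the position of the asparagine followed by the tripeptide (for example 124NLS), which matches the N-X-S/T sequon that N-glycosylation predictors look for. The following is a minimal sketch of such a scan, not the prediction method used by the authors, which is not stated here. The sequence `toy` is a made-up peptide rather than an ENPEP sequence, and `find_sequons` is a hypothetical helper. Sequons with proline at X are usually excluded, but the table lists some N-P-S entries, so the sketch exposes this as an option.

```python
import re
from typing import List

# Lookahead patterns so that overlapping sequons are all reported.
SEQUON_NO_PRO = re.compile(r"(?=(N[^P][ST]))")  # N-X-S/T with X != P
SEQUON_ANY_X = re.compile(r"(?=(N.[ST]))")      # N-X-S/T with any X


def find_sequons(protein: str, allow_proline: bool = False) -> List[str]:
    """Return labels such as '124NLS': 1-based Asn position plus tripeptide."""
    pattern = SEQUON_ANY_X if allow_proline else SEQUON_NO_PRO
    return [
        f"{match.start() + 1}{match.group(1)}"
        for match in pattern.finditer(protein.upper())
    ]


# Hypothetical toy peptide (assumption, not from the source).
toy = "MNLSAANPSGGNGTK"
print(find_sequons(toy))                      # ['2NLS', '12NGT']
print(find_sequons(toy, allow_proline=True))  # ['2NLS', '7NPS', '12NGT']
```

Positions reported this way refer to the unaligned sequence of each species, which is why the same aligned site carries different residue numbers across columns of the table.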
Figure 1. Amino acid sequence alignments for vertebrate ENPEP sequences. See Table 1 for sources of ENPEP sequences; * shows identical residues for ENPEP subunits; : shows similar alternate residues; . shows dissimilar alternate residues; N-glycosylated and potential N-glycosylated Asn sites are in red and numbered according to; human ENPEP active site residues are shown: zinc binding sites, 393His, 397His, 416Glu; proton acceptor, 394Glu; and transition state stabilizer, 497Tyr; other active site residues are shown as ^; α-helices for vertebrate ENPEP [11] are shaded yellow and numbered in sequence from the N-terminus; predicted β-sheets are in grey and similarly numbered in sequence from the N-terminus; turns in the 3D structure are shown; bold underlined font shows residues corresponding to known or predicted exon start sites; exon numbers refer to human ENPEP gene exons; four major domains were identified: cytoplasmic (N-terminal tail) (1-19); signal membrane anchor transmembrane (for linking ENPEP to the plasma membrane) (20-39); N-terminal domain (M1 aminopeptidase N) (100-545); and C-terminal domain (ERAP1-like domain) (617-931).
Figure 2. N-terminal amino acid sequence alignments (A) and 5′-nucleotide gene sequence alignments (B) for mammalian ENPEP proteins and genes. A: N-terminal mammalian ENPEP amino acid sequence alignments; * shows identical residues for ENPEP subunits; : shows similar alternate residues; . shows dissimilar alternate residues; predicted cytosolic and transmembrane helical residues are shown; see Table 1 for details of mammalian ENPEP proteins and genes; other mammalian ENPEP sequences were derived from NCBI as described in Methods; sn monkey: short nosed monkey; sq monkey: squirrel monkey; cap monkey: capuchin monkey. B: N-terminal mammalian ENPEP amino acid sequence alignments and 5′ mammalian ENPEP nucleotide sequence alignments; predicted cytosolic and transmembrane helical residues are shown; * shows identical residues for ENPEP subunits and nucleotide residues; : shows similar alternate residues; . shows dissimilar alternate residues; ENPEP gene regions showing areas of deletions are shown.
Figure 3. Tertiary structure for human ENPEP. The structure for human ENPEP is based on the reported structure [11] and was obtained using the SWISS-MODEL web site based on PDB 4KX7A (http://swissmodel.expasy.org/workspace/). The rainbow color code describes the 3-D structure from the N-terminus (blue) to the C-terminus (red); α-helices and β-sheets are shown; note the separation of 2 major domains: the N-terminal M1 aminopeptidase N domain (in blue, with predominantly β-sheets) and the C-terminal ERAP1-like domain (multicolored, with predominantly α-helical structures).
Figure 4. Tissue expression for human ENPEP. RNA-seq gene expression profiles across 53 selected tissues (or tissue segments) were examined from the public database for human ENPEP, based on expression levels for 175 individuals (data source: GTEx Analysis Release V6p, dbGaP Accession phs000424.v6.p1; http://www.gtex.org). Tissues: 1. Adipose-Subcutaneous; 2. Adipose-Visceral (Omentum); 3. Adrenal gland; 4. Artery-Aorta; 5. Artery-Coronary; 6. Artery-Tibial; 7. Bladder; 8. Brain-Amygdala; 9. Brain-Anterior cingulate Cortex (BA24); 10. Brain-Caudate (basal ganglia); 11. Brain-Cerebellar Hemisphere; 12. Brain-Cerebellum; 13. Brain-Cortex; 14. Brain-Frontal Cortex; 15. Brain-Hippocampus; 16. Brain-Hypothalamus; 17. Brain-Nucleus accumbens (basal ganglia); 18. Brain-Putamen (basal ganglia); 19. Brain-Spinal Cord (cervical c-1); 20. Brain-Substantia nigra; 21. Breast-Mammary Tissue; 22. Cells-EBV-transformed lymphocytes; 23. Cells-Transformed fibroblasts; 24. Cervix-Ectocervix; 25. Cervix-Endocervix; 26. Colon-Sigmoid; 27. Colon-Transverse; 28. Esophagus-Gastroesophageal Junction; 29. Esophagus-Mucosa; 30. Esophagus-Muscularis; 31. Fallopian Tube; 32. Heart-Atrial Appendage; 33. Heart-Left Ventricle; 34. Kidney-Cortex; 35. Liver; 36. Lung; 37. Minor Salivary Gland; 38. Muscle-Skeletal; 39. Nerve-Tibial; 40. Ovary; 41. Pancreas; 42. Pituitary; 43. Prostate; 44. Skin-Not Sun Exposed (Suprapubic); 45. Skin-Sun Exposed (Lower leg); 46. Small Intestine-Terminal Ileum; 47. Spleen; 48. Stomach; 49. Testis; 50. Thyroid; 51. Uterus; 52. Vagina; 53. Whole Blood.
Figure 5. Gene structure and major gene transcript for the human ENPEP gene. Derived from AceView (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/) [24]; shown with capped 5′- and 3′-ends for the predicted mRNA sequences; NM refers to the NCBI reference sequence; coding exons are in pink; the direction of transcription is shown as 5′ → 3′; a large CpG27 island is located at the gene promoter and the first exon; predicted transcription factor binding sites (TFBS) for human ENPEP are shown; 7 predicted miRNA target sites were identified within the extended 3′-UTR region of human ENPEP.
Figure 6. Phylogenetic tree of mammalian ENPEP amino acid sequences with the chicken ENPEP amino acid sequence. The tree is labeled with the ENPEP name and the name of the animal and is ‘rooted’ with the chicken (Gallus gallus) ENPEP sequence (Table 1). Note the single cluster corresponding to the ENPEP gene family. A genetic distance scale is shown. The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. Replicate values of 0.9 or more, which are highly significant, are shown, with 100 bootstrap replicates performed in each case. A proposed sequence of gene evolution events is shown arising from an ancestral bird ENPEP gene.