Misty M Attwood, Helgi B Schiöth.
Abstract
Transmembrane proteins are involved in many essential cell processes such as signal transduction, transport, and protein trafficking, and hence many are implicated in different disease pathways. Further, as the structure and function of proteins are correlated, investigating a group of proteins with the same tertiary structure, i.e., the same number of transmembrane regions, may shed light on their functional roles and potential as therapeutic targets. This analysis investigates the previously unstudied group of proteins with five transmembrane-spanning regions (5TM). More than half of the 58 proteins identified with the 5TM architecture belong to 12 families with two or more members. Interestingly, more than half the proteins in the dataset function in localization activities through movement or tethering of cell components, and more than one-third are involved in transport activities, particularly in the mitochondria. Surprisingly, no receptor activity was identified within this dataset, in stark contrast to other TM groups. The three major 5TM families, which comprise nearly 30% of the dataset, are the tweety family, the sideroflexin family, and the Yip1 domain (YIPF) family. We also analyzed the evolutionary origin of these three families. The YIPF family appears to be the most ancient, with presence in bacteria and archaea, while the tweety and sideroflexin families are first found in eukaryotes. We found no evidence of common descent for these three families. About 30% of the 5TM proteins have prominent expression in the brain, liver, or testis. Importantly, 60% of these proteins are identified as cancer prognostic markers, where they are associated with clinical outcomes of various tumor types. Nearly 10% of the 5TMs are still not fully characterized, and further investigation of their functional activities and expression is warranted.
This study provides the first comprehensive analysis of proteins with the 5TM architecture and details their unique characteristics.
Keywords: YIPF family; cancer prognostic marker; protein trafficking; sideroflexin family; transmembrane protein; tweety family
Year: 2021 PMID: 34350187 PMCID: PMC8327215 DOI: 10.3389/fcell.2021.708754
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
Proteins involved in enzymatic activities.
| Protein family (gene name) | EC | Functional activity | Localization |
|  | 2.3.1.- | Ceramide synthesis, possibly involved in lipid trafficking, metabolism, or sensing | ER, Nucleus |
| AB hydrolase superfamily, Lipase family (DAGLA, DAGLB) | 3.1.1.- | Hydrolase activity | PM |
| Dual specificity phosphatase catalytic domain (TPTE, TPTE2) | 3.1.3.- | Phosphatase activity | Golgi apparatus, ER |
|  | 3.1.-.- | Hydrolase activity | Nucleoplasm, Mitochondria |
|  | 5.3.3.- | Cholesterol biosynthesis; lipoprotein internalization. N.B.: EBPL function is undetermined, but it is not involved in cholesterol biosynthesis | ER, PM, Vesicle, Vacuole |
| CH25H | 1.14.-.- | Catalyzes the formation of 25-hydroxycholesterol from cholesterol | ER, Vacuole |
| ZDHHC4 | 2.3.1.- | Protein-cysteine S-palmitoyltransferase activity, protein targeting to membrane | Golgi apparatus, ER, Vacuole |
| AGPAT4 | 2.3.1.- | Transferase activity, transferring acyl groups | Golgi apparatus, ER, Nucleoli, Vesicles |
| RNFT1 | 2.3.2.- | E3 ubiquitin-protein ligase | ER, Nucleoli |
| SYVN1 | 2.3.2.- | E3 ubiquitin-protein ligase | ER, PM, Nucleoplasm, Vacuole |
| CDIPT | 2.7.8.- | CDP-diacylglycerol metabolic process | PM, Nuclear membrane |
| TAOK2 | 2.7.11.- | Serine/threonine-protein kinase | CM, Nucleoli, Nucleoplasm |
| AIG1 | 3.1.-.- | Hydrolase activity: Long-chain fatty acid catabolic process | Golgi apparatus, ER |
| DOLPP1 | 3.6.1.- | Hydrolyzes dolichyl pyrophosphate and monophosphate | ER, Vesicle, Vacuole |
| HACD2 | 4.2.1.- | Catalyzes reaction in long-chain fatty acids elongation cycle | ER |
Proteins involved in transport activities.
| Protein family (gene name) | TCDB | Functional activity | Localization |
| Tweety family (TTYH1, TTYH2, TTYH3) | 1.A.48.-.- | Swelling-dependent volume-regulated anion channel in astrocytes | PM |
| OXA1/ALB3/YidC family (OXA1L, COX18) | 2.A.9.-.- | Insertases: translocation of COX2 and integral membrane proteins | Mitochondria IM |
| Sideroflexin family* (SFXN1, SFXN2, SFXN3, SFXN4, SFXN5) | 2.A.54.-.- | Amino acid transport | Mitochondria IM |
| TspO/BZRP family* (TSPO, TSPO2) | 9.A.24.-.- | Transmembrane signaling; mitochondrial respiration; cholesterol transport | ER, Vesicle, Vacuole, Mitochondria OM |
| YIF1/YIP1 family* (YIF1A, YIF1B, YIPF1-7) | 9.B.135.-.- | COPII-coated ER to Golgi transport vesicle; vesicle-mediated transport | Golgi apparatus, ER |
| CD47 | 1.N.1.-.- | Cell adhesion, membrane transport | PM, Vesicle |
| STIMATE | 8.A.65.-.- | Calcium channel regulator activity | ER, Vacuole |
| ARV1 | 9.A.19.-.- | Cholesterol transport | ER, Vesicle |
| TMEM41A | 9.B.27.-.- | Putative transport protein; metastasis via modulation of E-cadherin | ER, Golgi apparatus |
Proteins with the 5TM architecture that perform varied functional activities.
| Protein family (gene names) | Functional activity | Localization |
| Dual oxidase maturation factor family (DUOXA1, DUOXA2) | Transport of DUOX1/2 from ER to PM | ER, PM |
| Prominin family (PROM1, PROM2) | Cholesterol binding | PM, Vesicle, Nucleoplasm |
| ATP6V0B | Proton-conducting pore forming subunit of vacuolar ATPase | Vacuole |
| BFAR | Apoptosis regulator | ER, PM |
| CLPTM1 | May play a role in T-cell development | ER, PM |
| CHRFAM7A | Transmembrane signaling receptor activity, regulation of membrane potential | PM |
| DNAH3 | Cilium-dependent cell motility | Plastid |
| SLC66A3 | Possible transport of amino acids across the lysosomal membrane | ER |
| TEX261 | COPII-coated ER to Golgi transport vesicle | Vesicle, Vacuole, Nucleoplasm |
| TMEM79 | Regulated exocytosis, cornification | Golgi apparatus |
| UNC50 | Protein transport | Vacuole, Golgi apparatus |
FIGURE 1. The five-transmembrane architecture. (A) The basic topology of the 5TM dataset. More than half the proteins in the dataset have the amino (N)-terminal region in the cytoplasmic environment and the carboxyl (C)-terminal region in the lumen. Many of the proteins are expected to contain targeting signals embedded in the first transmembrane region, possibly along with amino acid residues in the N-terminus. (B) The domain structures and important residue modifications affecting localization of the three major 5TM families. The description of the tweety family includes estimates of four possible glycosylation sites in purple; the important pore-forming amino acid (R165) in TTYH1, indicated in yellow (Han et al., 2019); and the Pfam tweety domain (PF04906) in light orange. The sideroflexin family is annotated with a possible acetylation site at residue one or two, colored orange; the conserved HPDT residues, marked by the red symbol; and the sideroflexin Pfam domain (PF03820) in light orange. Many of the YIPF proteins have an acetylation site at residue one or two, colored orange; three conserved motifs are indicated in red; and the YIPF Pfam domains (PF03878 and PF04893) are shown in light orange.
FIGURE 2. The major cellular localizations of the 5TM proteins. The number of proteins identified for each locale is given in parentheses, and compartments that are overrepresented in comparison to the human transmembrane proteome are indicated in italics. Proteins that localize to the nuclear outer membrane-endoplasmic reticulum network, the inner and outer membranes of the mitochondria, the Golgi trans cisterna, vacuoles, the plasma membrane, and COPII-coated ER to Golgi vesicles are overrepresented. Data for this figure are solely from the PANTHER Classification System, and the overrepresentation analysis is from the PANTHER Overrepresentation Test (v14.1) (Mi et al., 2019) with the Gene Ontology (GO) Annotation database released on 2019-07-03. Fisher's exact test was performed and the False Discovery Rate was calculated with p < 0.05. The human transmembrane protein identities are from Attwood et al. (2017). Of the 5,779 reference proteins, 5,725 were successfully mapped using GO annotation, as were 55 of the 58 proteins from the 5TM dataset.
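The overrepresentation test named in the Figure 2 caption can be illustrated with a minimal sketch, assuming a one-sided Fisher's exact test (computed as a hypergeometric tail) followed by Benjamini-Hochberg adjustment. This is not PANTHER's implementation. The function names are hypothetical, and the per-compartment counts in the usage lines are invented placeholders; only the totals 55 and 5,725 come from the caption.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value, P(X >= k).

    N: mapped reference proteins; K: reference proteins annotated to a compartment;
    n: mapped study proteins; k: study proteins annotated to that compartment.
    Assumes the study set is drawn from the reference set.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per compartment: (k in 5TM set, K in reference set).
compartments = {"compartment_A": (12, 300), "compartment_B": (3, 400)}
raw = [fisher_enrichment_p(k, 55, K, 5725) for k, K in compartments.values()]
for name, q in zip(compartments, benjamini_hochberg(raw)):
    print(name, "overrepresented" if q < 0.05 else "not significant")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for a reference set of several thousand proteins; a statistics library would be the usual choice in practice.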
FIGURE 3. Phylogenetic analysis of the sideroflexin family. The phylogenetic reconstruction combines Bayesian inference posterior probabilities with bootstrap analysis of the best-scoring maximum likelihood tree, obtained using RAxML (v8.2.10) (Minjarez et al., 2016) on 30 taxa with 67 sequences. Support values are given in percent at the nodes of the major clades differentiating the sideroflexin gene families. The protein sequences were aligned using MAFFT (v6) (Fang et al., 2018) with the E-INS-i algorithm and the JTT substitution model. MrBayes was run with the mixed amino acid model for 1,000,000 generations. The PROTGAMMAAUTO model in RAxML was used with 500 bootstrap replicates.
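The settings listed in the Figure 3 caption could map onto tool invocations roughly as sketched below. The authors' exact command lines are not given in the source, so everything here is an assumption: the file names, run name, random seed, and the `raxmlHPC` executable name are hypothetical, and the MAFFT scoring-matrix option is not set. The sketch only builds the argument lists and a MrBayes block; it does not run any program.

```python
def mafft_einsi_command(fasta_in):
    """Argument list for MAFFT's E-INS-i strategy (--genafpair with iterative refinement).

    MAFFT writes the alignment to standard output, so the caller redirects it to a file.
    """
    return ["mafft", "--genafpair", "--maxiterate", "1000", fasta_in]

def raxml_command(alignment, run_name, seed=12345):
    """Argument list for a RAxML v8 rapid bootstrap plus best-scoring ML tree search.

    The seed is an arbitrary placeholder; the executable name varies by build.
    """
    return [
        "raxmlHPC",
        "-f", "a",                 # rapid bootstrap and ML search in one run
        "-m", "PROTGAMMAAUTO",     # model named in the caption
        "-p", str(seed),           # parsimony random seed
        "-x", str(seed),           # rapid bootstrap random seed
        "-N", "500",               # bootstrap replicates named in the caption
        "-s", alignment,
        "-n", run_name,
    ]

# MrBayes block matching the caption: mixed amino acid model, 1,000,000 generations.
MRBAYES_BLOCK = """begin mrbayes;
  prset aamodelpr=mixed;
  mcmc ngen=1000000;
end;
"""

print(" ".join(mafft_einsi_command("sfxn.fasta")))
print(" ".join(raxml_command("sfxn.phy", "sfxn_ml")))
```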
FIGURE 4. (A) Enhanced tissue expression of the 5TM dataset. Enhanced or enriched expression of proteins in the 5TM dataset is shown, with the tissue types on the bottom part of the figure and the associated proteins on the top part. Data are from The Tissue Atlas (Amorim et al., 2017). More than 35% of the proteins have enhanced or enriched expression in the cerebral cortex, liver, testis, and blood tissues. The category Varied tissues includes intestine, breast, thyroid, parathyroid, gall bladder, prostate, and pancreas tissues. The category All Other 5TM includes thirty proteins in the dataset that have low tissue specificity. (B) 5TM proteins as prognostic markers for cancer. The nine tumor types are on the bottom part of the figure, and the 35 prognostic proteins associated with them are on the top part. Approximately 60% of the genes in the dataset are identified in the Pathology Atlas as candidate prognostic genes that are associated with the clinical outcome of different tumor types. The genes are identified from correlation analyses of gene expression and clinical outcome, where Kaplan-Meier plots with high significance (p < 0.001) were considered prognostic (Pujar et al., 2018). Of the 35 proteins identified, 21 are associated with several different types of cancers. Gynecologic cancer includes cervical, endometrial, ovarian, and urothelial cancers. Proteins not identified as prognostic are not included in the figure.
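The prognostic criterion in panel (B) can be illustrated with a minimal sketch, assuming that the significance attached to a Kaplan-Meier comparison is a two-group log-rank test between high- and low-expression patients. The caption does not name the test, so this is an assumption, and the function name and survival data below are hypothetical placeholders rather than Pathology Atlas data.

```python
import math

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p-value (chi-square with 1 degree of freedom).

    times_*: follow-up times (lists); events_*: True if death observed, False if censored.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical survival times (months): high-expression group dies early.
high = list(range(1, 11))
low = list(range(11, 21))
p = logrank_p(high, [True] * 10, low, [True] * 10)
print("prognostic" if p < 0.001 else "not prognostic")
```

Under this sketch, a gene would count as prognostic for a tumor type only when the p-value falls below the 0.001 threshold stated in the caption.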