MIPModDB: a central resource for the superfamily of major intrinsic proteins.

Anjali Bansal Gupta, Ravi Kumar Verma, Vatsal Agarwal, Manu Vajpai, Vivek Bansal, Ramasubbu Sankararamakrishnan.

Abstract

The channel proteins belonging to the major intrinsic proteins (MIP) superfamily are diverse and are found in all forms of life. Water-transporting aquaporin and glycerol-specific aquaglyceroporin are the prototype members of the MIP superfamily. MIPs have also been shown to transport other neutral molecules and gases across the membrane. They have internal homology and possess conserved sequence motifs. By analyzing a large number of publicly available genome sequences, we have identified more than 1000 MIPs from diverse organisms. We have developed a database, MIPModDB, which will be a unified resource for all MIPs. For each MIP entry, this database contains information about the source, gene structure, sequence features, substitutions in the conserved NPA motifs, structural model, the residues forming the selectivity filter and the channel radius profile. For a selected set of MIPs, it is possible to derive a structure-based sequence alignment and the evolutionary relationship. Sequences and structures of selected MIPs can be downloaded from the MIPModDB database, which is freely available at http://bioinfo.iitk.ac.in/MIPModDB.

Year: 2011 PMID: 22080560 PMCID: PMC3245135 DOI: 10.1093/nar/gkr914

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Major intrinsic proteins (MIPs) form one of the largest superfamilies of channel proteins (1,2). They transport water, neutral solutes such as glycerol and urea, metalloids including antimonite and arsenite and gases like CO2, nitric oxide and ammonia across the membranes (3–5). Water-transporting aquaporin and glycerol-specific aquaglyceroporin are the prominent members of this family. Members of the MIP superfamily are involved in vital physiological processes such as skin moisture, gastrointestinal fluid transport, fat metabolism, epidermal proliferation, maintaining corneal and lens transparency in eyes and water homeostasis in kidney and central nervous system (6–11). MIPs are found from bacteria to humans and a large number of diverse MIP members have been identified, especially in plants (12–17). The abundant MIPs identified in higher plants can be classified into at least five major subfamilies (15). In humans, they are implicated in several diseases such as nephrogenic diabetes insipidus, acute and chronic renal failure, brain edema, cataract and arsenic toxicity (18–24). Sequence analysis of MIP members clearly revealed the presence of the highly conserved Asn-Pro-Ala (NPA) signature sequence motif (25,26). MIP sequences also possess internal homology in which the N- and C-terminal halves have significant sequence similarity (25). At the structural level, the N-terminal half is related to the C-terminal half by a pseudo-2-fold symmetry (27,28). Three-dimensional structures of more than 10 MIPs from different sources such as mammals (29–31), plants (32), Escherichia coli (33), yeast (34) and archaea (35) have been determined. They all adopt a unique hourglass fold even when the sequence identity among them is very low. Two regions of constriction have been identified within the channel. The two NPA motifs form the central constriction, and the outer constriction, located toward the extracellular side, is the aromatic/arginine (ar/R) selectivity filter formed by four residues. Both regions are known to play an important role in solute transport and selectivity (36–39). With their involvement in human physiology and pathophysiology and with the available structural knowledge, members of the MIP family are being considered as attractive drug targets (40,41). For example, the aquaglyceroporin of the intracellular parasite Plasmodium falciparum and knowledge of its structure (42) have opened new options for novel malaria therapies (43). However, for the majority of MIPs from a large number of organisms, the tissue localization, the functional properties and the biological significance are either not known or not clearly understood. With the advent of a new generation of sequencing technologies (44,45), genome sequences of a large number of organisms are available. Using conserved sequence motifs and the internal sequence similarity as constraints, we have previously searched genome sequences of plants to identify and characterize the MIP proteins (15,16). We have extended this approach to genomes of other organism groups and identified more than 1000 MIP sequences in diverse organisms. The wealth of information on MIP sequences is now stored in a database called MIPModDB, and various details about the substitutions in the conserved NPA motif, gene structure and the features obtained from structural models are available for each MIP sequence. With diverse MIP sequences, one can also obtain a structure-based sequence alignment and the evolutionary relationship for a given set of MIP sequences.

DATABASE CONSTRUCTION

MIP genes were identified using BLAST (46) from completed and partial genomes available in the NCBI database, following the protocol described in our previous studies (15,16). Thirteen human aquaporins and five plant MIPs belonging to the five major plant subfamilies were used as query sequences. Short sequences and sequences with missing transmembrane segments and important loop regions were discarded. We also found that some MIPs have been wrongly annotated. For example, an MIP from Salinispora tropica is annotated as ‘low molecular weight phosphotyrosine protein phosphatase’ (NCBI accession: YP_001157491). MIPs from plants identified in previous work (13–16,47) were also included. We also searched the motif-oriented database MIPDB (48) and included those MIP sequences not identified in the above BLAST search. The final data set contained 1008 MIP sequences from 341 different organisms. For each MIP sequence, the presence of the conserved NPA sequence motif and internal sequence similarity were examined, and the sequences were also confirmed using pattern databases such as Pfam (49) and PROSITE (50). The three-dimensional structure of each MIP sequence was modeled by a homology modeling procedure using the same protocol that was applied earlier for plant MIPs (15,16). Structures of bovine AQP1, E. coli GlpF and archaeal AQPM were used as template structures; their PDB (51) IDs are 1J4N, 1FX8 and 2F2B, respectively. The channel radius profile was calculated using the HOLE program (52) as described previously (16). Thus the contents of this database can be broadly categorized into sequence and structural data, which are explained in more detail in the following sections. The important statistics related to the sequence and structure data of MIPModDB are given in Table 1. Both the sequence and structure data for a representative MIP sequence are shown in Figure 1.

Table 1.

Important statistics of MIPModDB

Number of MIP sequences	1008
Number of organisms	341
Substitutions in the NPA motifs	219
Only in the first NPA motif (loop B)	74
Only in the second NPA motif (loop E)	82
Both NPA motifs	63
MIPs with selectivity filter similar to water-channels^a (FHTR + FHAR + FHCR + FHSR)	349
MIPs with selectivity filter similar to glycerol channels^a (WGYR + WGFR + WGWR)	170
Experimentally determined MIP structures	38

^a The selectivity filter is formed by four residues and the corresponding amino acids are given in one letter codes. The first and second residues come from the second and the fifth transmembrane segments, respectively. The other two residues are contributed by the loop E. See text for details.

Figure 1.

Screenshot of a representative MIP protein, human aquaporin 1. Information about gene structure, substitutions in NPA motif, residues forming the ar/R selectivity filter, sequence similarity with the templates, RMSD calculated for the modeled structure with the three template structures are some of the features reported for a given MIP in the protein page.

MIP SEQUENCE DATA

In general, only a single MIP has been identified in most of the microbial genomes. Plants have a large number of MIPs in comparison to animals. For each MIP, its source and the NCBI accession ID are given. Each MIP sequence is also given a unique identifier derived from the scientific name of its source organism. The first two characters are taken from the genus and the next four characters from the species name, followed by a four-digit unique number. Wherever it is available, the UniProt (53) accession ID is also provided. Apart from the primary structure information, sequence data include the exon–intron organization of the gene, substitutions (if any) in the conserved NPA motifs and the percentage sequence similarity with the template sequences that are used in the homology modeling procedure to build three-dimensional models. Sequence similarity between a given MIP sequence and the template sequences was calculated using the program NEEDLE as available in the EMBOSS suite of programs (54). Only the modeled part of the target MIP sequence was considered for this purpose.
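The identifier scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name make_mip_id, the upper-casing of the prefix and the handling of short names are assumptions, since the text specifies only the "2 + 4 letters + four-digit number" layout.

```python
import re


def make_mip_id(genus: str, species: str, serial: int) -> str:
    """Build an ID from 2 genus letters, 4 species letters and a 4-digit number."""
    # Keep letters only. Assumption: names shorter than the required
    # length are used as they are, without padding.
    genus_part = re.sub(r"[^A-Za-z]", "", genus)[:2]
    species_part = re.sub(r"[^A-Za-z]", "", species)[:4]
    if not 0 <= serial <= 9999:
        raise ValueError("serial must fit in four digits")
    # Assumption: upper-case prefix; the database may use another casing.
    return (genus_part + species_part).upper() + f"{serial:04d}"


print(make_mip_id("Homo", "sapiens", 1))  # HOSAPI0001
```

The serial number here is arbitrary; how the database assigns the four-digit number is not stated in the text.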

Gene structure

For each MIP, gene structure is represented in the form of a graphical diagram. It gives the length and the positions of the introns with respect to the secondary structures of the corresponding MIP. The red and blue vertical bars indicate the starting positions of helices and loops B/E, respectively. The information regarding the position of each individual intron was extracted from the NCBI database annotations. Transmembrane segments are marked based on the modeled structures (see below). Knowledge of intron–exon organization helps to understand the evolution of MIPs across different organisms and of MIP subfamilies within the same species. For example, in plants it has been shown that the number and positions of introns are conserved within a given MIP subfamily (14–16).

Substitutions in NPA motif

In addition to their role in solute transport and selectivity (36,55–57), the highly conserved NPA motifs, and substitutions in them, seem to be important in other functional roles such as protein targeting (58) and full expression of the protein (59). In our data set, substitutions in only the first NPA motif occur in 74 examples. In 82 cases, substitutions are found only in the second NPA motif. Both NPA motifs are substituted in 63 MIPs (Table 1). In total, substitutions in at least one of the NPA motifs are found in 219 MIPs, about 22% of the total MIPs in our data set. The majority of the substitutions involve mutation of either Pro or Ala of the NPA motif. Only a handful of examples (fewer than 16) are found in which Asn is mutated, indicating its important role both in structure, as a helix capping residue, and in function, as a residue responsible for cation exclusion, as demonstrated in recent studies (60).
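The three substitution categories and the overall percentage can be reproduced with a short Python sketch. The function names and the example motifs passed to them are illustrative assumptions; only the counts 74, 82, 63 and 1008 come from Table 1.

```python
CANONICAL = "NPA"


def npa_category(loop_b: str, loop_e: str) -> str:
    """Classify an MIP by which of its two NPA motifs is substituted."""
    first_ok = loop_b.upper() == CANONICAL
    second_ok = loop_e.upper() == CANONICAL
    if first_ok and second_ok:
        return "conserved"
    if not first_ok and not second_ok:
        return "both"
    return "second_only" if first_ok else "first_only"


def substituted_positions(motif: str) -> list:
    """Return 0-based positions where a 3-residue motif differs from NPA."""
    return [i for i, (a, b) in enumerate(zip(motif.upper(), CANONICAL)) if a != b]


# Counts reported in Table 1.
counts = {"first_only": 74, "second_only": 82, "both": 63}
total_substituted = sum(counts.values())
fraction = total_substituted / 1008

print(total_substituted)           # 219
print(round(100 * fraction, 1))    # 21.7
print(npa_category("NPS", "NPA"))  # first_only (illustrative motif)
```

The value 21.7% is what the text rounds to "about 22%".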

DATA FROM MIP STRUCTURAL MODELS

Structure-based data include the atomic model obtained using the homology modeling procedure, residues that form the ar/R selectivity filter, structure-based sequence alignment, conservation of residues at the helix–helix interface and the HOLE radius profile. The structure-based details also include the root mean square deviation (RMSD) calculated between the modeled MIP and each of the template structures using the program DALI (61). The superposed figures are available in two different orientations (Figure 2A). Many MIPs have long N- and C-terminal extensions and hence the start and end positions of the polypeptide segment used in the homology modeling method are given.

Figure 2.

(A) Superposition of the modeled MIP structure with 1J4N, the structure of bovine aquaporin, shown in two different orientations, namely, parallel (left) and perpendicular (right) to the channel axis. (B) The four residues of the ar/R selectivity filter superposed on those of the 1J4N structure. (C) Comparison of HOLE radius profiles plotted for the water channel (green), glycerol facilitator (blue) and the modeled MIP structure (red). The ar/R selectivity filter is approximately located at −10 Å. (D) Phylogenetic tree calculated for all MIPs of a representative organism, Phytophthora infestans, using the parsimony method.

Aromatic/arginine selectivity filter

Four residues form the outer constriction nearly 8 Å from the conserved NPA motif toward the extracellular side. These residues are contributed by the second and fifth TM segments and loop E. This aromatic/arginine (ar/R) selectivity filter has been implicated in obstructing proton conduction (62) and in efficient solute transport (37,63,64). The four ar/R selectivity filter residues, represented by their one letter codes, are given for each MIP. For example, ‘FIIR’ indicates that the first and second residues, Phe and Ile, are from TM2 and TM5, respectively, and the last two are the loop E residues. Analysis of selectivity filter residues indicates that 349 MIPs have a selectivity filter (FHTR, FHAR, FHCR or FHSR) similar to that found in water-selective aquaporin channels (FHTR or FHCR). The number of MIPs having a selectivity filter (WGYR, WGFR or WGWR) similar to that of the glycerol-specific aquaglyceroporin (WGFR) is 170. Thus, about 50% of the total MIPs in the database have a selectivity filter typical of aquaporin or aquaglyceroporin. The remaining half have substitutions that can alter the size and chemical nature of the outer constriction. This will have a major influence on the nature of the solute that is transported by the channel. The channel diameters of the water channel and the aquaglyceroporin at the ar/R constriction are 2.0 and 3.5 Å, respectively. The selectivity filter residues of the predicted MIP model superposed on those of the pure water channel from bovine aquaporin and of the aquaglyceroporin from E. coli are available (Figure 2B). The HOLE radius profiles of all three MIPs, namely the water channel, the glycerol channel and the modeled MIP channel, can be compared (Figure 2C). This gives an idea of the size of the region around the ar/R selectivity filter with respect to both the water channel and the aquaglyceroporin, and it will help the user to predict the possible size of a solute that can pass through this constriction.
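The grouping of ar/R filters into water-like and glycerol-like classes can be sketched as a simple lookup. The function and label names below are assumptions made for illustration; the four-letter codes and the counts 349, 170 and 1008 are those given in the text and in Table 1.

```python
# Four-letter ar/R codes listed in Table 1.
WATER_LIKE = {"FHTR", "FHAR", "FHCR", "FHSR"}
GLYCEROL_LIKE = {"WGYR", "WGFR", "WGWR"}


def classify_filter(ar_r: str) -> str:
    """Label a four-residue ar/R filter code (TM2, TM5, loop E, loop E)."""
    code = ar_r.strip().upper()
    if len(code) != 4:
        raise ValueError("an ar/R filter code has exactly four residues")
    if code in WATER_LIKE:
        return "water-like"
    if code in GLYCEROL_LIKE:
        return "glycerol-like"
    return "other"


# Share of MIPs with a typical filter, from the counts in Table 1.
typical = 349 + 170
share = typical / 1008

print(classify_filter("FIIR"))  # other
print(typical)                  # 519
print(round(100 * share, 1))    # 51.5
```

A label of "other" says nothing about the transported solute; it only marks filters outside the two listed groups.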

Structure-based sequence alignment

As mentioned earlier, MIP sequences are diverse. For example, the sequence identities between some of the plant MIP subfamilies are as low as 20% (13,15). In such cases, programs such as ClustalW (65), which generate a multiple sequence alignment from sequence information alone, are unlikely to produce a meaningful alignment for a diverse set of MIP sequences. Instead, if the structurally equivalent positions belonging to the TM helical segments are aligned, they are likely to show high conservation and indicate the importance of residues at certain positions. A structure-based sequence alignment of all the TM segments and the functionally important loops B and E is provided for all the MIPs present in the same organism along with the template sequences. We have also previously identified 17 positions that occur in the helix–helix interface. They are occupied by small and weakly polar residues and have been shown to be highly conserved if the amino acids Ala, Thr, Ser, Cys and Gly are considered as a group (15,16). These positions, along with the residues that form the ar/R selectivity filter, are highlighted in the structure-based sequence alignment. An example of a structure-based sequence alignment as obtained from the MIPModDB database is shown in Figure 3.
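The idea of group conservation can be made concrete with a small Python sketch that compares it with plain identity conservation for one alignment column. The function names and the example column are made up for illustration; only the residue group (Ala, Thr, Ser, Cys, Gly) comes from the text.

```python
from collections import Counter

# Small and weakly polar residues treated as one group in the text.
SMALL_WEAKLY_POLAR = set("ATSCG")


def group_conservation(column: str) -> float:
    """Fraction of non-gap residues in a column that belong to the group."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    return sum(r in SMALL_WEAKLY_POLAR for r in residues) / len(residues)


def identity_conservation(column: str) -> float:
    """Fraction of non-gap residues equal to the most common residue."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)


column = "AGSTAC"  # illustrative column, one residue per aligned sequence
print(group_conservation(column))               # 1.0
print(round(identity_conservation(column), 2))  # 0.33
```

In this toy column no single residue dominates, yet every residue belongs to the group, which is the situation the 17 interface positions are described as showing.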

Figure 3.

Structure-based sequence alignment for a selected set of MIPs. This alignment is always produced with the six high-resolution MIP structures and their PDB IDs are also shown. This alignment is produced for the six transmembrane segments and the two functionally important loops B and E. The residues forming the ar/R selectivity filter are shown in the dark brown background. Seventeen positions previously identified to occur in the helix–helix interface (16) are highly group conserved when small and weakly polar residues are considered together as a group. They are displayed in cyan background.

Construction of phylogenetic tree for selected MIPs

The database provides an interface through which the user can analyze the evolutionary relationship among a selected group of MIPs. To construct a phylogenetic tree, the user needs to select at least three sequences. The user can choose one of three methods, namely, the neighbor-joining, maximum likelihood and maximum parsimony methods, to construct the tree. In addition to the difficulty of generating a multiple sequence alignment of diverse MIP sequences, many MIPs have long N- and C-termini as well as long loops connecting the TM segments. Hence, to avoid errors while constructing the evolutionary tree, the input used for this purpose is the structure-based sequence alignment of the TM helical regions and the loops B and E. The program PHYLIP, which is part of the EMBOSS suite of programs (54), is used by the server to create the phylogenetic trees (Figure 2D).
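The input preparation described above can be sketched as follows: per-segment aligned blocks are joined into one row per sequence, after checking the minimum number of sequences and the chosen method. The function name, the method labels and the toy blocks are assumptions; the sketch does not call PHYLIP and is not the server's actual code.

```python
from typing import Dict, List

# Labels for the three tree-building options named in the text (assumed spelling).
METHODS = {"neighbor-joining", "maximum likelihood", "maximum parsimony"}


def build_tree_input(segments: Dict[str, List[str]], method: str) -> Dict[str, str]:
    """Join aligned TM/loop blocks into one aligned row per sequence."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if len(segments) < 3:
        raise ValueError("at least three sequences are required")
    joined = {name: "".join(blocks) for name, blocks in segments.items()}
    # All rows of an alignment must have the same length.
    if len({len(row) for row in joined.values()}) != 1:
        raise ValueError("aligned rows must have equal length")
    return joined


# Toy blocks standing in for aligned TM segments and loops B/E.
toy = {
    "seqA": ["AAG", "NPA"],
    "seqB": ["AAG", "NPA"],
    "seqC": ["ASG", "NPS"],
}
rows = build_tree_input(toy, "neighbor-joining")
print(rows["seqC"])  # ASGNPS
```

The resulting rows would then be written in whatever alignment format the chosen tree-building program expects.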

IMPLEMENTATION

The information content on MIPs is maintained as a relational database using MySQL (http://www.mysql.com). This allows easy access and storage. The database is hosted on a web server running Apache (http://www.apache.org/) on the Fedora Core Linux platform and can be queried through the web interface, which is implemented in the PHP (v5.2) scripting language (http://www.php.net/).

ACCESS TO MIP DATA

MIPModDB allows users to browse, retrieve and query the database. The Statistics page lists all the MIPs in three different categories: (i) based on the NPA sequence motif, (ii) selectivity filter residues and (iii) organism-wise grouping. Users can browse MIPs according to the conservation or substitutions that are found in the NPA boxes. One can also look for MIPs with particular residues that form the selectivity filter. MIPs are also organized by the organism in which they occur. Organisms are arranged in descending order of the number of MIPs, with the model tree Populus trichocarpa at the top with the maximum number of 55 MIPs. The database interface allows the user to retrieve and identify MIPs using various features of MIPs as a query. It can be searched by unique accession and by complete or partial amino acid sequence. The search facility also allows the user to select MIP(s) with particular residues in the selectivity filter. Alternatively, MIPs with specific substitutions in the NPA motif can also be queried. A query based on an organism name will retrieve all MIP sequences from a particular species. More than one MIP feature can also be used to narrow down the search. More detailed information can be retrieved by following the associated links. In addition to searching for specific MIPs, the database enables users to download, in FASTA format, all the MIP sequences from a given organism or all the sequences that have the same ar/R selectivity filter residues. MIP sequences with specific substitutions in the NPA motif can also be downloaded. Sequence alignments used to generate the three-dimensional models can be downloaded in PIR or PAP format. For each MIP, the coordinates of the model are available in the PDB format. Similarly, the phylogenetic tree for a selected set of MIPs can be downloaded.
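Combining several features to narrow a search and exporting the hits in FASTA format can be illustrated with in-memory records. Everything in this sketch is hypothetical: the record fields, the identifiers, the organism names and the sequences are placeholders, and the real database is queried through its web interface rather than through code like this.

```python
from typing import Dict, Iterable, List, Optional


def select_mips(records: Iterable[Dict[str, str]],
                organism: Optional[str] = None,
                ar_r: Optional[str] = None) -> List[Dict[str, str]]:
    """Keep records matching every feature that was supplied."""
    chosen = []
    for rec in records:
        if organism is not None and rec["organism"].lower() != organism.lower():
            continue
        if ar_r is not None and rec["ar_r"].upper() != ar_r.upper():
            continue
        chosen.append(rec)
    return chosen


def to_fasta(records: Iterable[Dict[str, str]], width: int = 60) -> str:
    """Format records as FASTA text with wrapped sequence lines."""
    lines = []
    for rec in records:
        lines.append(f">{rec['id']} {rec['organism']} ar/R={rec['ar_r']}")
        seq = rec["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Placeholder records; IDs and sequences are invented for the example.
toy = [
    {"id": "EXAMPL0001", "organism": "Example organism", "ar_r": "FHTR",
     "sequence": "MASEFKKKLF"},
    {"id": "EXAMPL0002", "organism": "Example organism", "ar_r": "WGFR",
     "sequence": "MSQTSTLKGQ"},
    {"id": "OTHERS0001", "organism": "Other organism", "ar_r": "FHTR",
     "sequence": "MGRKLLAEFL"},
]

hits = select_mips(toy, organism="example organism", ar_r="fhtr")
print(to_fasta(hits, width=5), end="")
```

Passing only one of the two keyword arguments gives the single-feature queries described in the text, and passing both narrows the result.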

COMPARISON WITH OTHER TRANSPORTER DATABASES

There are databases developed specifically for membrane proteins that are involved in transporting solutes across membranes. The database TransportDB (66) provides information about the complete list of transporters for a given organism. For example, in the case of humans, the types of transporters listed include those that are ATP-dependent, ion channels, secondary transporters and unclassified. It also lists all outer membrane porins and channels from different organisms. The other database, TCDB (Transporter Classification Database) (67), is a classification system for membrane transporters. The superfamilies of transporters in this database include channel-forming toxins and peptides, transporters, symporters, antiporters, porins, carriers and ion channels. Neither of these databases includes all the members of the MIP superfamily, which are known to predominantly transport neutral solutes. For example, while TransportDB lists human MIPs, MIP members from other organisms are not found. TCDB does not seem to include the MIP superfamily as a whole, although some MIP sequences are found. Moreover, while the above two databases largely contain sequence information and known PDB structures, MIPModDB provides structural models and associated information for more than 1000 MIP sequences.

FUTURE DIRECTIONS

Few MIPs have been functionally well characterized, and experimental studies are being carried out to determine the solute transport properties of many more MIPs. In the next version of the MIPModDB database, whenever they are available, functional properties of MIPs, post-translational modifications and cellular localization will be annotated and the related literature will be linked through PubMed. As new MIPs are being recognized, users will have the option to upload sequences in the future version. Ultimately, our software will follow a pipeline procedure that will take an MIP sequence through a series of steps, which will include extracting the sequence features, building a homology model, identifying the selectivity filter residues, generating the HOLE radius profile and possibly predicting the solutes that are likely to be transported.

FUNDING

The Department of Biotechnology, Government of India (BT/HRD/34/17/2008, partial); CSIR (senior research fellowship to A.B.G. and R.K.V.). Funding for open access charge: Indian Institute of Technology Kanpur. Conflict of interest statement. None declared.

65 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Structural context shapes the aquaporin selectivity filter.

Authors: David F Savage; Joseph D O'Connell; Larry J W Miercke; Janet Finer-Moore; Robert M Stroud
Journal: Proc Natl Acad Sci U S A Date: 2010-09-20 Impact factor: 11.205

3. MIPs and their role in the exchange of metalloids. Preface.

Authors: Thomas P Jahn; Gerd P Bienert
Journal: Adv Exp Med Biol Date: 2010 Impact factor: 2.622

Review 4. The evolutionary aspects of aquaporin family.

Authors: Kenichi Ishibashi; Shintaro Kondo; Shigeki Hara; Yoshiyuki Morishita
Journal: Am J Physiol Regul Integr Comp Physiol Date: 2010-12-09 Impact factor: 3.619

Review 5. Aquaporins in kidney pathophysiology.

Authors: Yumi Noda; Eisei Sohara; Eriko Ohta; Sei Sasaki
Journal: Nat Rev Nephrol Date: 2010-01-26 Impact factor: 28.314

6. HOLE: a program for the analysis of the pore dimensions of ion channel structural models.

Authors: O S Smart; J G Neduvelil; X Wang; B A Wallace; M S Sansom
Journal: J Mol Graph Date: 1996-12

Review 7. Aquaporin water channels in gastrointestinal physiology.

Authors: T Ma; A S Verkman
Journal: J Physiol Date: 1999-06-01 Impact factor: 5.182

8. Solanaceae XIPs are plasma membrane aquaporins that facilitate the transport of many uncharged substrates.

Authors: Gerd Patrick Bienert; Manuela Désirée Bienert; Thomas Paul Jahn; Marc Boutry; François Chaumont
Journal: Plant J Date: 2011-03-01 Impact factor: 6.417

Review 9. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond.

Authors: Ryan Lister; Brian D Gregory; Joseph R Ecker
Journal: Curr Opin Plant Biol Date: 2009-01-20 Impact factor: 7.834

10. Ongoing and future developments at the Universal Protein Resource.

Authors:
Journal: Nucleic Acids Res Date: 2010-11-04 Impact factor: 16.971
