
LGICdb: a manually curated sequence database after the genomes.

Marco Donizelli, Marie-Ange Djite, Nicolas Le Novère.

Abstract

Ligand-gated ion channels form transmembrane ionic pores controlled by the binding of chemicals. The LGICdb aims to be a non-redundant, manually curated resource offering access to the large number of subunits composing extracellularly activated ligand-gated ion channels, such as nicotinic, ATP, GABA and glutamate ionotropic receptors. Composed of more than 500 human-curated entries, the native XML database was relocated to the EBI in 2004. Its facilities have been enhanced with a new search system, customized multiple sequence alignments and manipulation of protein structures (http://www.ebi.ac.uk/compneur-srv/LGICdb/). Despite the vast improvement of general sequence resources, the LGICdb still provides sequences unavailable elsewhere.

Year:  2006        PMID: 16381861      PMCID: PMC1347466          DOI: 10.1093/nar/gkj104

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Ligand-gated ion channels are transmembrane proteins that can exist in different conformations, at least one of which forms a pore through the membrane connecting the two neighbouring compartments. The equilibrium between the various conformations is affected by the binding of ligands to the channels. Phenomenologically, the ligands ‘open’ or ‘close’ the channel (1,2). There are three different superfamilies of extracellularly activated ligand-gated ion channel subunits. The receptors of the ‘cys-loop’ superfamily (named after a conserved 13 residue loop closed by a disulfide bridge) are made up of five homologous subunits (3). Each subunit contains an extracellular N-terminal domain, followed by four transmembrane segments. The loop located between TM3 and TM4 forms the intracellular domain, which is of variable length. The subunits of the cys-loop superfamily fall into two clear monophyletic groups, one containing the subunits forming anionic channels (GABAA and GABAC, glycine, GLUCl, histamine and 5HTmod1 receptors) and one containing the subunits forming cationic channels (5-HT3 and nicotinic receptors) (4,5). Although site-directed mutagenesis in the channel region has succeeded in inverting selectivity (6,7), few examples crossing this boundary have been discovered so far in nature (some subunits from Lymnaea stagnalis may be exceptions). The ATP-gated channels (ATP2x receptors) are made of three homologous subunits (8,9). Each subunit displays two transmembrane segments separated by an extracellular domain. Finally, the glutamate-activated cationic channels are made of four homologous subunits (10,11). Each subunit contains an extracellular N-terminal domain similar to the bacterial leucine, isoleucine, valine binding protein (LIVBP), followed by half of the agonist-binding core, two transmembrane domains separated by a ‘P-loop’, the second half of the agonist-binding core and a third transmembrane segment. The agonist-binding core is similar to the bacterial lysine, arginine, ornithine binding protein (LAOBP). The cytoplasmic tail has a variable length. Many of the subunits from the three superfamilies possess multiple isoforms, generated by alternative splicing or editing.

The Ligand-Gated Ion Channel database (LGICdb) was created in the mid-1990s as a repository offering a unique entry per gene (12). All the data are manually curated, in order to reduce redundancy and to correct errors arising from sequencing or introduced by automated methods of data mining (such as gene prediction).

CONSTRUCTION AND CONTENT

The LGICdb has evolved significantly since the last report in the literature. The resource, created at the Pasteur Institute in Paris, is now hosted by the European Bioinformatics Institute. A key step was the conversion of the native format from a dedicated markup to XML. This move permitted better syntax checking, the design of a validating editor, easier generation of export formats and further processing of the data by a native XML database engine. The number of entries has increased substantially, thanks to the numerous genome projects. The systematic use of Ensembl (13) and UniProt (14) to retrieve possible entries permitted a more comprehensive population of the database. The detailed procedure we used to build LGICdb entries is described elsewhere (15). Briefly, for each gene, the various transcripts and proteins are identified based on the published experimental sequences, but also on the predicted gene structure. Predicted isoforms are taken into account only if they are backed by experimental reports. If several variants exist for the ‘same’ sequence, all the possibilities are considered and a decision is made based on the frequency with which each variant is described, comparisons with closely related orthologous sequences, etc. The resulting sequence is sometimes a chimaera built from several primary data sources. In the infrequent case where no consensus can be reached, the variants are all presented with adequate annotation. When a predicted gene structure is incomplete, a tentative reconstruction is proposed, based on the genomic sequence and homologous subunits.

Release 57 (September 9, 2005) of the LGICdb contained 516 entries, totalling 7 million nucleotides, 400 000 residues and 30 3D structures. Each entry of the database consists of one XML file. A Perl script using BioPerl (16), XML::Simple and XML::Writer generates one HTML page per entry, and files in the FASTA, GenBank and EMBL formats for all sequences (Figure 1). In addition, browsing lists of entries, by entry accession and by organism, are also generated.
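The export step described above (one XML file per entry, converted to flat-file formats) can be illustrated with a minimal sketch. It is written in Python rather than the Perl/BioPerl pipeline mentioned in the text, and the element names (`entry`, `organism`, `gene`, `sequence`), the accession and the placeholder sequence are all assumptions made for illustration; the actual LGICdb schema is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout: the real LGICdb XML schema is not described in
# the text, so the tags, the accession and the sequence are placeholders.
EXAMPLE_ENTRY = """\
<entry accession="EXAMPLE0001">
  <organism>Homo sapiens</organism>
  <gene>EXAMPLE1</gene>
  <sequence type="protein">
    ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY
    ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY
  </sequence>
</entry>
"""


def entry_to_fasta(xml_text: str, width: int = 60) -> str:
    """Convert one (hypothetical) XML entry into a FASTA record."""
    root = ET.fromstring(xml_text)
    accession = root.get("accession", "unknown")
    organism = root.findtext("organism", default="unknown")
    gene = root.findtext("gene", default="unknown")
    # Drop all whitespace so that line breaks inside the XML do not matter.
    sequence = "".join(root.findtext("sequence", default="").split())
    header = f">{accession} {gene} [{organism}]"
    # Wrap the sequence at a fixed width, as is conventional for FASTA files.
    body = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f"{header}\n{body}\n"


if __name__ == "__main__":
    print(entry_to_fasta(EXAMPLE_ENTRY), end="")
```

Keeping one XML file per entry means each export format only needs its own small conversion function of this kind, which is presumably part of what made the move to XML attractive.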
Figure 1

Schema describing the relationships between the various components of the LGICdb. Square boxes represent servers. http: Apache Hypertext Transfer Protocol server; jsp: Jakarta Tomcat Java Server Page. Gears represent external applications.

Users can use the central FASTA (17) search of the EBI to retrieve entries based on sequence similarity. String searching of the whole LGICdb content is implemented using the API provided by the Apache Xindice native XML database. This solution has proved adequate in terms of speed for the amount of data currently in the LGICdb. Custom multiple sequence alignments can now be generated from the results of a string search, using ClustalW (18). Other multiple sequence alignment methods should be implemented in the future. The atomic coordinates can be manipulated with the Jmol applet. Users can currently download the whole database (∼13 MB). Selective download following string and sequence similarity searches is under development. While the core of the database is available through an Apache HTTP server, with HTML pages generated with PHP, the string search and the multiple alignments are provided by an Apache Tomcat server.
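As a rough illustration of what a string search over a collection of XML entries involves, the sketch below scans every text node of each entry for a case-insensitive substring. It does not use the Xindice API; it is a plain in-memory scan with the Python standard library, and the entries, tags and accessions are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-collection: accessions, tags and descriptions are
# placeholders, not real LGICdb content.
ENTRIES = {
    "EXAMPLE0001": (
        "<entry><organism>Homo sapiens</organism>"
        "<description>nicotinic receptor subunit</description></entry>"
    ),
    "EXAMPLE0002": (
        "<entry><organism>Homo sapiens</organism>"
        "<description>glycine receptor subunit</description></entry>"
    ),
    "EXAMPLE0003": (
        "<entry><organism>Mus musculus</organism>"
        "<description>glutamate receptor subunit</description></entry>"
    ),
}


def search_entries(entries: dict[str, str], query: str) -> list[str]:
    """Return the sorted accessions whose text content contains `query`."""
    needle = query.lower()
    hits = []
    for accession, xml_text in entries.items():
        root = ET.fromstring(xml_text)
        # itertext() yields the text of every element in document order.
        text = " ".join(chunk.strip() for chunk in root.itertext())
        if needle in text.lower():
            hits.append(accession)
    return sorted(hits)


if __name__ == "__main__":
    print(search_entries(ENTRIES, "homo sapiens"))
```

The list of accessions returned by such a search is the natural input for a follow-up step, for example collecting the matching sequences and passing them to an alignment program.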

DISCUSSION

One could question the utility of manually curated databases with a thematic focus, now that the community can benefit from large efforts such as Ensembl and UniProt. However, several problems are directly triggered by the large-scale nature of those efforts. The first issue, which triggered the creation of the LGICdb in the first place, is redundancy. Although efforts have been undertaken to reduce this redundancy to a minimum, and to gather overlapping information together, there are still four entries for the human gene CHRNA7 in UniProt. The situation is worse in non-curated resources. For instance, there are 10 proteins corresponding to CHRNA7 in GenBank. While this is an unavoidable problem for a general resource, it can be easily solved when the resource is of limited focus. There is only one entry in the LGICdb for the human nicotinic subunit alpha7. In addition, while the LGICdb reports the various alternative splicing and editing events, sequencing errors are corrected based on diverse criteria, such as comparison with orthologous sequences. Another problem is generated by automatic annotation, such as the recognition of genes. For instance, the human GABA receptor rho3 subunit is split into two parts in Ensembl. The C-terminal exon is reported in UniProt, and has therefore been annotated as ‘known transcript’ in Ensembl, while part of the N-terminal portion has been predicted by Ensembl and annotated as ‘novel transcript’. As a consequence, the longest sequence for human rho3 is currently the one in the LGICdb, built by fusion.

The LGICdb actually belongs to a constellation of expert-maintained topical data resources. While their size is limited (by comparison with general purpose public resources), they serve data of high accuracy. In the field of transmembrane proteins, one could cite the GPCRDB on G-protein coupled receptors (19), the VKCDB on voltage-gated ion channels (20), the TCDB on transporters and the protein kinase resource (21).
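The idea of collapsing redundant records into a single entry per gene can be sketched as follows. This is only a toy frequency vote over invented records: the LGICdb curation is manual and relies on further criteria (such as comparison with orthologous sequences) that are not modelled here, and the gene labels, sequences and tie-breaking rule are assumptions.

```python
from collections import Counter, defaultdict


def one_entry_per_gene(records: list[tuple[str, str]]) -> dict[str, str]:
    """Keep, for each gene, the sequence variant reported most often.

    `records` is a list of (gene, sequence) pairs, possibly redundant.
    Ties are broken by preferring the longer sequence, then alphabetical
    order; this tie-breaking rule is an assumption of the sketch.
    """
    variants: dict[str, Counter] = defaultdict(Counter)
    for gene, sequence in records:
        variants[gene][sequence] += 1
    return {
        gene: max(counts, key=lambda seq: (counts[seq], len(seq), seq))
        for gene, counts in variants.items()
    }


if __name__ == "__main__":
    # Placeholder records: three reports for GENE_A (one deviating),
    # one report for GENE_B.
    redundant = [
        ("GENE_A", "MKTLLV"),
        ("GENE_A", "MKTLLV"),
        ("GENE_A", "MKSLLV"),
        ("GENE_B", "MAAAGR"),
    ]
    print(one_entry_per_gene(redundant))
```

A purely automatic vote of this kind would not catch an error that has been propagated through several submissions, which is one reason a thematic resource can afford, and benefits from, manual review.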

PERSPECTIVES

While a long-recognized resource in the field of neurotransmitter receptor research, the LGICdb is progressively attracting the attention of a wider audience, as witnessed by the Science NetWatch of August 26, 2005. It becomes all the more important to complete the database in order to improve its comprehensiveness. However, although the availability of complete genomes might have suggested that the work could be completed, the results of the FANTOM consortium (22), describing an unforeseen number of transcripts, perhaps make such a prospect unrealistic for a project without dedicated resources. All data contained in the LGICdb may be copied and redistributed freely, without any restriction. If some of these data are used in a scientific publication, the authors would welcome a citation of the resource in the list of references.
  20 in total

1.  M2 pore mutations convert the glycine receptor channel from being anion- to cation-selective.

Authors:  A Keramidas; A J Moorhouse; C R French; P R Schofield; P H Barry
Journal:  Biophys J       Date:  2000-07       Impact factor: 4.033

Review 2.  The Ligand Gated Ion Channel database: an example of a sequence database in neuroscience.

Authors:  N Le Novère; J P Changeux
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2001-08-29       Impact factor: 6.237

3.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

4.  Mutations in the channel domain of a neuronal nicotinic receptor convert ion selectivity from cationic to anionic.

Authors:  J L Galzi; A Devillers-Thiéry; N Hussy; S Bertrand; J P Changeux; D Bertrand
Journal:  Nature       Date:  1992-10-08       Impact factor: 49.962

Review 5.  Function and structure in glycine receptors and some of their relatives.

Authors:  David Colquhoun; Lucia G Sivilotti
Journal:  Trends Neurosci       Date:  2004-06       Impact factor: 13.837

Review 6.  Structure and function of glutamate receptor ion channels.

Authors:  Mark L Mayer; Neali Armstrong
Journal:  Annu Rev Physiol       Date:  2004       Impact factor: 19.318

7.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

Review 8.  Molecular physiology of P2X receptors.

Authors:  R Alan North
Journal:  Physiol Rev       Date:  2002-10       Impact factor: 37.312

9.  The transcriptional landscape of the mammalian genome.

Authors:  P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal:  Science       Date:  2005-09-02       Impact factor: 47.728

Review 10.  VKCDB: voltage-gated potassium channel database.

Authors:  Bin Li; Warren J Gallin
Journal:  BMC Bioinformatics       Date:  2004-01-09       Impact factor: 3.169
