
LCR-eXXXplorer: a web platform to search, visualize and share data for low complexity regions in protein sequences.

Ioannis Kirmitzoglou, Vasilis J Promponas

Abstract

MOTIVATION: Local compositionally biased and low complexity regions (LCRs) in amino acid sequences initially attracted the interest of researchers because of their implication in generating artifacts in sequence database searches. There is accumulating evidence of the biological significance of LCRs in both physiological and pathological situations. Nonetheless, LCR-related algorithms and tools have not gained wide appreciation across the research community, partly because only a handful of user-friendly software tools are currently freely available.
RESULTS: We developed LCR-eXXXplorer, an extensible online platform attempting to fill this gap. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Moreover, users may perform powerful queries against a custom designed sequence/LCR-centric database. We anticipate that LCR-eXXXplorer will be a useful starting point in research efforts for the elucidation of the structure, function and evolution of proteins with LCRs.
AVAILABILITY AND IMPLEMENTATION: LCR-eXXXplorer is freely available at the URL http://repeat.biol.ucy.ac.cy/lcr-exxxplorer.
CONTACT: vprobon@ucy.ac.cy
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 25712690      PMCID: PMC4481844          DOI: 10.1093/bioinformatics/btv115

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

During the past 30 years, the main focus of research related to regions of local compositional extremes (low complexity regions; LCRs) has been their identification for the purpose of sequence masking (Altschul ; Wootton and Federhen, 1993; Ye ), in order to eliminate spurious hits in database searches (Promponas ; Tsoka ). Several studies have showcased the abundance and importance of such regions at the molecular/structural (e.g. Radivojac ; Tamana ), functional (e.g. Andrade ; Haerty and Golding, 2010), organismic (e.g. Miskinyte ; Pizzi and Frontali, 2001) and habitat level (e.g. Nandi ). Despite the apparent biological importance of LCRs, there is a distinct lack of tools or services capable of helping biologists to study them in depth. Most of the methods capable of detecting LCRs were developed for the sole purpose of masking them and are meant to be used from the command line as part of a sequence analysis or search pipeline. While some tools, such as SEG (Wootton and Federhen, 1993), CAST (Promponas ) or BIAS (Kuznetsov and Hwang, 2006), do offer more advanced reports as an option, their results are mostly meant to be parsed by software rather than read by a biologist. In this work, we present LCR-eXXXplorer, an online service to search, visualize and share LCRs in protein sequences. We highlight its unique features that may facilitate research efforts towards understanding the biological roles of proteins with LCRs.

2 Functionality

2.1 General description

LCR-eXXXplorer is built upon a customized instance of GBrowse (Stein ) modified to work properly with protein sequences. It currently contains 545 000 sequences (retrieved from UniProt/SwissProt) carrying over 16 million LCR-related annotations. Along with information about sequence complexity, LCR-eXXXplorer displays external annotations from UniProt, as well as disordered and binding regions predicted with IUPRED (Dosztányi ) and ANCHOR (Dosztányi ; Mészáros ), respectively. Data are stored in a MySQL database, using a database schema based on the SeqFeature schema internally used by GBrowse (see Supplementary Methods and Supplementary Fig. S1).

2.2 Key functionality

A basic keyword-based search (allowing wildcards) retrieves protein sequences with matching UniProtKB Accession(s)/Entry Name(s) or gene name(s), returning up to 500 entries per query (e.g. for a single UniProt identifier or accession number). The ‘Advanced Search’ option, implemented specifically for this purpose as a custom-made GBrowse plug-in, supports more fine-tuned queries. An ‘Advanced Search’ may be initiated by querying a suitable combination of UniProt fields (e.g. gene or protein name, source organism) or LCR properties (e.g. type of LCR, percent of masked residues), although only the AND Boolean operator is currently supported for combining search criteria. Under this mode, batch search is also available using a list of UniProt accession numbers: this feature enables users to take advantage of the powerful UniProt search engine to compile a list of entries that satisfy complex search criteria.

Results can be displayed in the browser (with a limit of 15 000 entries) or downloaded as a plain-text, tab-delimited file providing statistics on the LCR content for further processing (with a limit of 50 000 entries). Different options for masking protein sequences are provided for each individual sequence from the graphical GBrowse ‘protein details’ view, and sequences are available in FASTA format. The Downloads section offers LCR-eXXXplorer users the option of downloading the complete set of sequences in FASTA-formatted files masked for LCRs, the complete set of annotations in GFF3 format or a CSV-formatted table with LCR statistics for each sequence in the database.

Users may also search for data in LCR-eXXXplorer using BLASTP (Ye ) powered by the user-friendly SequenceServer (Priyam et al., manuscript in preparation).
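As a rough illustration of what masking a sequence for LCRs amounts to, the following Python sketch replaces the residues inside given regions with a mask character and formats the result as FASTA. The function names (`mask_regions`, `to_fasta`), the 1-based inclusive coordinates, the default `X` mask character and the toy sequence are assumptions made for this example. It is not the code used by LCR-eXXXplorer, and detecting the regions themselves (e.g. with SEG or CAST) is outside its scope.

```python
def mask_regions(sequence, regions, mask_char="X", residues=None):
    """Mask residues that fall inside the given regions.

    regions  -- iterable of (start, end) pairs, 1-based and inclusive
                (an assumed convention for this sketch)
    residues -- optional set of residue letters; when given, only these
                residue types are masked inside each region
    """
    chars = list(sequence)
    for start, end in regions:
        if not (1 <= start <= end <= len(chars)):
            raise ValueError(f"region {start}-{end} is outside the sequence")
        for i in range(start - 1, end):
            if residues is None or chars[i] in residues:
                chars[i] = mask_char
    return "".join(chars)


def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy = "MKQQQQQQLV"                      # hypothetical toy sequence
    masked = mask_regions(toy, [(3, 8)])   # mask the poly-Q stretch
    print(to_fasta("toy_protein masked", masked), end="")
```

Passing a `residues` set restricts masking to selected amino acid types within each region, which loosely mirrors the idea of masking only certain residue types; the exact options offered by the service may differ.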
Three underlying databases (unmasked, SEG or CAST masking with default parameters) are provided, with the masked databases being a unique feature of this service; this configuration has been shown to improve database search results (Kirmitzoglou, 2014; Kirmitzoglou et al., in preparation). Furthermore, users may initiate BLASTP searches against the sequence databases hosted at the NCBI web servers (http://www.ncbi.nlm.nih.gov/) using the currently displayed sequence as the input query; several masking options are available, using any combination of amino acid residue types and detection algorithm.

The main strength of LCR-eXXXplorer, which sets it apart from similar services, is its visualization capabilities. Displaying LCRs in a protein sequence is more informative when information regarding other functional or structural features is also shown (Supplementary Fig. S2). By taking advantage of the underlying GBrowse capability to display features stored on a remote web-accessible server, LCR-eXXXplorer incorporates selected annotations from UniProt into the main browser interface. UniProt annotations displayed in LCR-eXXXplorer are of two major types: (i) general annotations associated with the protein sequence (e.g. protein name, gene ontology terms, PDB accession IDs) and (ii) position-specific annotations, which may include domains, sites, secondary structure etc. These annotations are fetched from UniProt/SwissProt on-the-fly for the protein sequence of interest. This is facilitated by a custom-designed cgi-bin script, and the retrieved features are post-processed into a format suitable for LCR-eXXXplorer. Using the same underlying mechanism, LCR-eXXXplorer can display tracks generated by another instance of GBrowse, a Distributed Annotation System (DAS) server or valid GFF3 files generated by the user. The only requirement is that the remote tracks must use the same coordinate system, which in the case of LCR-eXXXplorer is the protein sequence itself. Thus, users may in practice display results from any LCR-detection tool (or any other protein sequence analysis tool) alongside the data provided by LCR-eXXXplorer.
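To make the coordinate requirement concrete, a user-generated track could be produced along the following lines. This is a minimal sketch under stated assumptions: the function name `gff3_lines`, the `user_tool` source label, the generic `region` feature type and the accession `P12345` are placeholders chosen for illustration, and the assumption that the sequence identifier in column 1 should be the UniProt accession is ours; the exact identifiers and feature types that LCR-eXXXplorer expects are not specified here. What the sketch does show is the nine-column GFF3 layout with positions counted along the protein sequence itself.

```python
from urllib.parse import quote


def gff3_lines(accession, length, features, source="user_tool"):
    """Build a GFF3 document for one protein.

    accession -- sequence identifier used in column 1 (assumed here to be
                 a UniProt accession)
    length    -- protein length in residues
    features  -- iterable of (type, start, end, score) tuples; positions
                 are 1-based, inclusive residue positions; score may be None
    """
    lines = ["##gff-version 3", f"##sequence-region {accession} 1 {length}"]
    for n, (ftype, start, end, score) in enumerate(features, 1):
        if not (1 <= start <= end <= length):
            raise ValueError(f"feature {start}-{end} is outside 1-{length}")
        score_txt = "." if score is None else f"{score:g}"
        feature_id = f"{accession}_{ftype}_{n}"
        attributes = "ID=" + quote(feature_id, safe="")  # escape reserved characters
        # Strand and phase carry no meaning for protein features, hence "."
        lines.append("\t".join([accession, source, ftype, str(start),
                                str(end), score_txt, ".", ".", attributes]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical regions reported by some external LCR-detection tool
    hits = [("region", 10, 25, 0.5), ("region", 40, 60, None)]
    print(gff3_lines("P12345", 100, hits), end="")
```

Because every position is expressed relative to the protein sequence, output from different tools can be converted to this form independently and displayed side by side.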

2.3 Comparison to similar services

Two services providing access to protein sequence LCR-related data are currently available online. The one most closely related to LCR-eXXXplorer is LPS-annotate (Harbi ), which identifies LCRs based on the LPS algorithm (Harrison and Gerstein, 2003) and compares them to SEG. These LCR annotations are accompanied by disordered region predictions from DISOPRED (Buchan ). Even though LPS-annotate is an invaluable resource for researchers interested in compositionally biased proteins, its main drawback is the lack of any effective visualization options. Moreover, the underlying database (according to data available at the LPS-annotate website) has not been updated since 2009. Recently, the HRaP server (Lobanov ) was developed; it specializes in the study of homopolymeric repeats, which are a highly specialized case of LCRs, and is therefore not discussed further here. A detailed overview of web-based services providing information related to LCRs is given in Kirmitzoglou (2014).

3 Future Developments

The current version of the LCR-eXXXplorer web server offers several tools for facilitating research on proteins with LCRs, including BLAST search and interactive visualization by exploiting inherent GBrowse features. Given the genuine interest of our research group in LCR-containing proteins, we plan to expand this service in the near future. More specifically, we are in the process of automating the LCR-eXXXplorer update procedure to regularly synchronize with UniProt updates. Moreover, the customizations performed on different GBrowse modules require some additional work (and appropriate documentation) for enabling full programmatic access to our service through the REST interface already available for GBrowse. An important improvement destined for the next version of LCR-eXXXplorer is enabling full support of Boolean queries against fields in the underlying database. The modular (both in terms of data and software) architecture of LCR-eXXXplorer enables easy incorporation of novel datasets (e.g. complete genome sequences) and LCR detection tools in future versions.
References (19 in total)

1.  Reproducibility in genome sequence annotation: the Plasmodium falciparum chromosome 2 case.

Authors:  S Tsoka; V Promponas; C A Ouzounis
Journal:  FEBS Lett       Date:  1999-05-28       Impact factor: 4.124

2.  Protein repeats: structures, functions, and evolution. (Review)

Authors:  M A Andrade; C Perez-Iratxeta; C P Ponting
Journal:  J Struct Biol       Date:  2001 May-Jun       Impact factor: 2.867

3.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

4.  IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.

Authors:  Zsuzsanna Dosztányi; Veronika Csizmok; Peter Tompa; István Simon
Journal:  Bioinformatics       Date:  2005-06-14       Impact factor: 6.937

5.  Low-complexity regions in Plasmodium falciparum proteins.

Authors:  E Pizzi; C Frontali
Journal:  Genome Res       Date:  2001-02       Impact factor: 9.043

6.  Issues in searching molecular sequence databases. (Review)

Authors:  S F Altschul; M S Boguski; W Gish; J C Wootton
Journal:  Nat Genet       Date:  1994-02       Impact factor: 38.330

7.  The low complexity proteins from enteric pathogenic bacteria: taxonomic parallels embedded in diversity.

Authors:  Tannistha Nandi; Krishnamoorthy Kannan; Srinivasan Ramachandran
Journal:  In Silico Biol       Date:  2003

8.  HRaP: database of occurrence of HomoRepeats and patterns in proteomes.

Authors:  Mikhail Yu Lobanov; Igor V Sokolovskiy; Oxana V Galzitskaya
Journal:  Nucleic Acids Res       Date:  2013-10-22       Impact factor: 16.971

9.  A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes.

Authors:  Paul M Harrison; Mark Gerstein
Journal:  Genome Biol       Date:  2003-05-30       Impact factor: 13.583

10.  The genetic basis of Escherichia coli pathoadaptation to macrophages.

Authors:  Migla Miskinyte; Ana Sousa; Ricardo S Ramiro; Jorge A Moura de Sousa; Jerzy Kotlinowski; Iris Caramalho; Sara Magalhães; Miguel P Soares; Isabel Gordo
Journal:  PLoS Pathog       Date:  2013-12-12       Impact factor: 6.823
