
APPRIS WebServer and WebServices.

Jose Manuel Rodriguez, Angel Carro, Alfonso Valencia, Michael L Tress.

Abstract

This paper introduces the APPRIS WebServer (http://appris.bioinfo.cnio.es) and WebServices (http://apprisws.bioinfo.cnio.es). Both the web server and the web services are built around the APPRIS Database, a database that presently houses annotations of splice isoforms for five different vertebrate genomes. The APPRIS WebServer and WebServices provide access to the computational methods implemented in the APPRIS Database, while the APPRIS WebServices also allow retrieval of the annotations. The APPRIS WebServer and WebServices annotate splice isoforms with protein structural and functional features, and with data from cross-species alignments. In addition, they can use the annotations of structure, function and conservation to select a single reference isoform for each protein-coding gene (the principal protein isoform). APPRIS principal isoforms have been shown to agree overwhelmingly with the main protein isoform detected in proteomics experiments. The APPRIS WebServer allows for the annotation of splice isoforms for individual genes, and provides a range of visual representations and tools to allow researchers to identify the likely effect of splicing events. The APPRIS WebServices permit users to generate annotations automatically in high-throughput mode and to interrogate the annotations in the APPRIS Database. The APPRIS WebServices have been implemented using REST architecture to be flexible, modular and automatic.

Year: 2015 PMID: 25990727 PMCID: PMC4489225 DOI: 10.1093/nar/gkv512

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The alternative splicing of messenger RNA generates a range of mature RNA transcripts, which, if translated into stable proteins, would greatly enrich the repertoire of cellular functions (1,2). The functional annotation of these alternative isoforms presents a serious challenge, not least because of the sheer quantity of genomic data that is being generated by genome annotation projects (3,4). The current human GENCODE 21 gene set (equivalent to Ensembl 77 (5,6)) houses 20 317 protein-coding genes and 93 107 coding transcripts, and the number of annotated transcripts is growing rapidly. Alternative splicing of pre-messenger RNA has been estimated to occur in 95% of multi-exon human genes (7,8). APPRIS (9) was developed within the GENCODE consortium (5) to address the challenge of annotating alternative protein isoforms with functional information, a topic of growing interest both in normal cells and in disease states (10–12). The APPRIS modules annotate splice isoforms with protein 3D structure information, functionally important residues, Pfam (13) domains, signal peptides and trans-membrane helices, and a score for the cross-species conservation of each transcript model. What differentiates APPRIS from other methods for annotating splice isoforms is that it uses only the most reliable annotations for protein structure, function and cross-species conservation, and that it uses these annotations to select a single reference CDS as the ‘principal’ isoform. This principal isoform is the one with the most conserved protein features and most evidence of cross-species conservation, while those isoforms with unusual, missing or non-conserved protein features are flagged as alternative (9).
Recent results from our group and others (14,15) suggest that many genes have a single clearly definable dominant protein isoform and that the alternative isoforms are either expressed less frequently, in limited tissues or in unique developmental stages, or have a much shorter half-life. The standard strategy for determining the dominant protein isoform of a coding gene (16) is to select the longest isoform. For instance, databases often choose the longest isoform because it is easier to annotate features to this isoform. However, we have shown that while this method is often correct, it does not have any biological meaning. Between 20 and 25% of the reference isoforms selected by this strategy are likely not to be the main protein product for the gene (9,17), and this number will grow with the expansion of the annotation databases. In contrast, APPRIS principal isoforms coincide with the main isoform detected in proteomics experiments for almost 98% of comparable genes, showing that APPRIS principal isoforms are a clear improvement on choosing the longest isoform as the main protein isoform in the cell (17). The reason that APPRIS is so effective is that most alternative isoforms have either lost regions of conserved structure or function, or have non-conserved exons that are inserted into regions that code for conserved structure or function. The APPRIS Database identifies a principal isoform for 73.33% of human genes annotated in the GENCODE 21 gene set. The main goal of developing the APPRIS WebServer and WebServices is to allow users to annotate splice isoforms and select a principal isoform for vertebrate genome species beyond those that are annotated in the APPRIS Database, to annotate genes and variants that are missing from the APPRIS Database, and to annotate their experimental results with existing annotations.
The APPRIS WebServer has been designed to be used for the comparison of splice isoform annotations for individual genes, while the APPRIS WebServices have been created to allow access to the APPRIS Database and to run an automatic version of the APPRIS server, using REST architecture to be portable, modular and flexible in the automation of programmatic scripts.

METHODS, WEB SERVER AND WEB SERVICES

The APPRIS WebServer and APPRIS WebServices provide annotations for alternative splice isoforms and identify principal isoforms for individual genes. These annotations are based on the modules in the APPRIS Database (9). The WebServer and WebServices are based on six of the modules of the APPRIS Database (see Supplementary Figure S1 and Supplementary ‘APPRIS Modules and Their Scores’). CORSAIR uses BLAST (18) to map (correctly and without gaps) orthologous isoforms in related vertebrate species; CRASH makes conservative predictions of signal peptides using the SignalP and TargetP programs (19); firestar (20) makes reliable predictions of functionally important residues; MATADOR3D checks for the presence of structural homology to 3D structures in the PDB (21); SPADE uses Pfamscan (13) to count conserved and compromised Pfam functional domains; THUMP generates conservative predictions of trans-membrane helices from three separate trans-membrane predictors (22–24). The principal splice isoform for each gene is selected based on the conservation of protein features, including protein structural and functional data and information from cross-species conservation.

APPRIS WebServer

The APPRIS WebServer can process two types of query: either a gene name (or Ensembl gene identifier) from a specific assembly version, or a set of alternative protein sequences for that gene. In both cases, the species is also required. When the user provides a gene name, the gene is linked to a specific assembly version of Ensembl. At the moment all species except human have a single assembly version. The APPRIS Database currently houses annotations for five Ensembl species (human, mouse, rat, pig and zebrafish), and the APPRIS WebServer additionally allows users to check Ensembl annotations for six other species: dog, cat, cow, opossum, chicken and fugu. If the query gene falls outside these 11 species, the user is required to supply a set of alternative protein sequences as input. The report view of the APPRIS WebServer displays four sections. The first section shows a panel with information about the query gene. When the query input is protein sequences, the panel contains the identifier of the job and the species name. When the query is a gene name (or Ensembl identifier), the panel contains the name and the identifier of the gene, the species name, the genome location of the gene, and the Ensembl classification (biotype) of the gene. The second section (‘Principal Isoforms’) shows all the variants and highlights the principal isoforms. The isoforms are tagged with the flags PRINCIPAL and ALTERNATIVE based on the range of protein features. The third section (‘APPRIS annotations’) shows the scores of the APPRIS modules, such as the number of functional residues, the number of whole functional domains, or the vertebrate species conservation score. The APPRIS modules are described in the Supplementary ‘APPRIS Modules and Their Scores’ section. The last section shows three tabbed browser panels that allow different views of the annotations. The first browser panel (‘Annotation Browser’) displays the annotations in detail for each isoform.
These detailed annotations include information such as the best PDB template, the best Pfam domain, or the nearest homologue species. The detailed annotations are described in the Supplementary ‘APPRIS Modules and Their Scores’ section. The second browser panel (‘Sequence Browser’) displays the detailed annotations mapped onto the amino acid sequences. The third browser panel (‘Genome Browser’) maps the annotations onto the genomic regions provided by the UCSC Genome Browser (25). The annotations that appear in these browser panels can be filtered by variant, and the amino acid sequences of the isoforms can be aligned in the sequence browser panel in ClustalW (26) format. In addition, the APPRIS WebServer supports downloading and displaying data through the website. It should be noted that the UCSC genome browser panel will only be shown for those species where Ensembl and UCSC are using the same build (otherwise the coordinates will be out of phase).
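The input rule described above (gene queries for the 11 listed species, protein sequences for everything else) can be summarised in a few lines of Python. This is only an illustrative sketch: the species names come from the text, while the function name and its return values are assumptions and not part of the APPRIS code base.

```python
from typing import List

# Species listed in the text; lower-case common names are an assumption.
DATABASE_SPECIES = {"human", "mouse", "rat", "pig", "zebrafish"}
ENSEMBL_ONLY_SPECIES = {"dog", "cat", "cow", "opossum", "chicken", "fugu"}


def allowed_query_types(species: str) -> List[str]:
    """Return the query types that would be accepted for a species (hypothetical helper)."""
    name = species.strip().lower()
    if name in DATABASE_SPECIES or name in ENSEMBL_ONLY_SPECIES:
        # Gene name or Ensembl identifier queries are possible for these 11 species.
        return ["gene", "sequences"]
    # Any other species has to be submitted as a set of protein sequences.
    return ["sequences"]
```

For example, `allowed_query_types("chicken")` returns both query types, whereas a species outside the two sets returns only `["sequences"]`.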

APPRIS WebServices

The APPRIS Database annotations of protein features for human, mouse, rat, pig and zebrafish splice isoforms are available via web services, and the APPRIS WebServices also provide automatic access to the APPRIS server. The APPRIS WebServices make use of standard HTTP method calls (often termed RESTful services), so the HTTP request methods GET and POST can be used to send and receive queries and data. The APPRIS WebServices, like the APPRIS WebServer, allow the analysis of a specific gene or of the sequences of alternative isoforms. These RESTful web services form the ‘runner’ group of services and have been developed as asynchronous services. While retrieval services (or some types of analysis) can return a result almost immediately and are suitable for synchronous requests, the request processing of most analyses may be delayed. By calling a web service asynchronously, the client can continue its work without interruption, and will be notified when the asynchronous response is returned. To address these issues we have provided a mechanism for making asynchronous requests: (i) submit a job and get a job identifier (the ‘run’ service), (ii) get the status of a job, an indication of whether the job is pending, running, finished or gave an error (the ‘status’ service), (iii) receive the results of a finished job (the ‘result’ service). In addition, there are services that retrieve information from specific job analyses and that provide access to the integrated APPRIS Database for the available species. These retrieval services (see Supplementary ‘APPRIS WebServices’ for further details) can be invoked by job identifier, by means of a gene name/identifier, or by means of a genome position.
These web services are classified into three broad categories: ‘seeker’, which retrieves information for the available genes or finished jobs; ‘sequencer’, which retrieves the protein features mapped onto the amino acid sequences; and ‘exporter’, which retrieves information on genes in the APPRIS Database. Any language capable of making standard HTTP requests can be used; RESTful calls can also be made through Uniform Resource Locators (URLs) from a simple web browser query or from the command line (using cURL). Client scripts written in Perl have been provided to allow the execution of APPRIS analyses (‘runner’ RESTful services) and the retrieval of the stored annotations (‘exporter’ RESTful services). The responses to the requests are reported in JSON (by default), GTF, BED or TSV (tabular) format.
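The three-step asynchronous pattern (run, status, result) can be sketched as a small polling client. The code below is a minimal illustration rather than the provided Perl client: the URL paths, the payload keys and the JSON field names (`jobid`, `status`) are assumptions made for the example, and the HTTP layer is passed in as a function so that the logic can be exercised without contacting the server.

```python
import time
from typing import Callable, Dict

# Host taken from the text; the paths below are hypothetical.
BASE_URL = "http://apprisws.bioinfo.cnio.es"

# A transport takes (method, url, payload) and returns the parsed JSON response.
Transport = Callable[[str, str, Dict], Dict]


def run_job(send: Transport, species: str, sequences: str,
            poll_seconds: float = 5.0, max_polls: int = 120) -> Dict:
    """Submit a job, poll its status, and return the result once it has finished."""
    # (i) submit the job and keep the job identifier
    job = send("POST", f"{BASE_URL}/rest/runner/run",
               {"species": species, "sequences": sequences})
    job_id = job["jobid"]  # assumed field name

    for _ in range(max_polls):
        # (ii) ask whether the job is pending, running, finished or in error
        reply = send("GET", f"{BASE_URL}/rest/runner/status/{job_id}", {})
        status = reply["status"]  # assumed field name
        if status == "finished":
            # (iii) fetch the results of the finished job
            return send("GET", f"{BASE_URL}/rest/runner/result/{job_id}", {})
        if status == "error":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In practice the `send` argument would wrap an HTTP library of the user's choice; the real endpoint names and parameters should be taken from the Supplementary ‘APPRIS WebServices’ documentation.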

System architecture and supported platforms

The APPRIS WebServer (see Figure 1) is built with the open-source web application framework AngularJS, with back-end servers in Node.js and Express.js. The interface to the RESTful API is created using Swagger, which allows interaction with the APPRIS WebServices. The REST services themselves have been developed in the Perl programming language. The modules involved in the APPRIS analysis are implemented in Perl, together with the required packages and external programs, and their information is stored in an optimized MySQL relational database.

Figure 1.

Workflow diagram of APPRIS WebServer and WebServices. The schema represents the organization of the APPRIS WebServer and APPRIS WebServices. The figure also shows the flow of data (inputs/outputs) through the RESTful web services, which connect to the web server and to scripts that are capable of making standard HTTP requests. The icons display the tools, frameworks and programming languages used.

The APPRIS WebServer has been tested on Mac OS X, Linux and Windows with the browsers Firefox 35.0.1, Google Chrome 40.0.2214.111, and Safari 7.1.3. At this point, it does not support Internet Explorer. Additional support for alternative browsers is in progress.

PRACTICAL CASE

Here, we show one practical example to illustrate the utility of the APPRIS WebServer in the selection of principal isoforms (Figure 2). For this example, we use isoforms from the gene ZNF721 (ENSG00000182903), to which we have added a new splice isoform (ZNF721-NEW) alongside the annotated Ensembl isoforms by combining the ZNF721-009 (ENST00000511833) and ZNF721-002 (ENST00000505900) variants. This new protein sequence was created by adding the translated residues from the first two exons of ENST00000505900 to the translated residues from the third exon of ENST00000511833 (Figure 2A). The APPRIS WebServer is run by submitting the Homo sapiens species name, the set of alternative protein sequences, and the selected methods that will be applied (Figure 2B). A status log panel appears after a job is submitted, indicating whether the job is pending, running, finished or has given an error.
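The construction of the ZNF721-NEW sequence amounts to concatenating two translated fragments and adding the result to the sequence set. The sketch below illustrates this in Python with placeholder peptides, not the real ZNF721 residues; the use of FASTA as the submission format and the helper name `to_fasta` are assumptions made for the example.

```python
def to_fasta(records):
    """Format (identifier, sequence) pairs as FASTA text, 60 residues per line."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        lines.extend(sequence[i:i + 60] for i in range(0, len(sequence), 60))
    return "\n".join(lines) + "\n"


# Placeholder peptide fragments, NOT the real ZNF721 sequences.
exons_1_2_of_505900 = "MAAAAAAAAA"  # stands in for translated exons 1-2 of ENST00000505900
exon_3_of_511833 = "CCCCCCCCCC"     # stands in for translated exon 3 of ENST00000511833

# The new isoform is the simple concatenation of the two translated fragments.
new_isoform = exons_1_2_of_505900 + exon_3_of_511833

query = to_fasta([("ZNF721-NEW", new_isoform)])
```

A complete query for this example would also contain the protein sequences of the two annotated transcripts, passed as further `(identifier, sequence)` pairs.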

Figure 2.

Tutorial Example for APPRIS WebServer (ZNF721). (A) Gene model for ZNF721 showing two Ensembl annotated transcripts, ZNF721-002 (ENST00000505900) and ZNF721-009 (ENST00000511833), and a mock-up of a third transcript, ZNF721-NEW (in orange). The exons from ENST00000505900 and ENST00000511833 that have been used to build the new transcript are labeled. (B) APPRIS WebServer input form showing a query composed of three sequences. Two of them are the protein sequences of ENST00000505900 and ENST00000511833, and the third is the new isoform (ZNF721-NEW) created by joining the first two exons of the ENST00000505900 transcript to the third exon of ENST00000511833. (C) Sections of the ‘Principal Isoform’ and ‘APPRIS annotation’ report view. The ZNF721-NEW isoform is selected as the principal isoform, based on the number of Pfam domains. ZNF721-NEW has 10 whole conserved PfamA domains compared to the nine domains in ENST00000511833 and the single domain in ENST00000505900. (D) Snapshot of the ‘Sequence Browser’ panel that shows the annotations mapped onto the alignments of protein sequences. The detailed annotations appear in pop-up windows. The new isoform (ZNF721-NEW) brings together the KRAB domain from ENST00000505900 and the nine C2H2 zinc finger domains from ENST00000511833 (just one highlighted).

The report view of APPRIS annotations (Figure 2C) shows the selection of the new isoform (ZNF721-NEW) as the principal isoform because it has ten whole conserved PfamA domains compared to the nine domains from ENST00000511833 and the single domain in ENST00000505900. The ‘Sequence Browser’ panel (Figure 2D) shows the annotations mapped onto the alignments of sequences. The new isoform (ZNF721-NEW) brings together the Krueppel-associated box (KRAB) domain from ENST00000505900 and the nine C2H2 zinc finger domains from ENST00000511833.
KRAB domains are transcription repression modules and are common in C2H2 zinc finger proteins; indeed over 400 human C2H2 zinc finger proteins contain a KRAB domain (27).
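The decision in this example rests on a single feature, the number of whole conserved PfamA domains. The sketch below reproduces only that comparison in Python, using the counts reported for the three isoforms. It is a deliberate simplification, since APPRIS combines the scores of several modules; the function name and the tie handling are assumptions.

```python
def pick_principal(domain_counts):
    """Return the isoform with the most whole conserved Pfam domains.

    Returns None for an empty input or a tie at the top, since this single
    feature cannot decide in those cases.
    """
    ranked = sorted(domain_counts.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Whole conserved PfamA domain counts reported for the ZNF721 example.
znf721 = {"ZNF721-NEW": 10, "ENST00000511833": 9, "ENST00000505900": 1}

principal = pick_principal(znf721)
labels = {name: ("PRINCIPAL" if name == principal else "ALTERNATIVE")
          for name in znf721}
```

Here `labels` marks ZNF721-NEW as PRINCIPAL and the two annotated transcripts as ALTERNATIVE, matching the report view described above.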

DISCUSSION

The APPRIS WebServer and WebServices are tools for the annotation of alternative splice isoforms. While the WebServer can be used to annotate individual genes and isoforms with protein structural and functional information and an indication of the cross-species conservation, the WebServices provide access to the existing annotations in the APPRIS Database and allow the automatic use of the annotation modules via the server. APPRIS selects a principal isoform for each protein-coding gene, and the annotations make it possible to predict how alternative splicing events will affect splice isoforms. We have shown that the principal isoforms selected by APPRIS almost always correspond with the most highly expressed protein isoform, as determined from large-scale proteomics experiments (17). The APPRIS WebServer and WebServices have a wide range of uses, from the determination of principal and alternative isoforms for genes in individual research projects, to the determination of principal and alternative exons for use in genome-wide analysis of variants. The APPRIS Database (9) currently houses splice isoform annotations and principal isoforms for five vertebrate species (human, mouse, rat, pig and zebrafish), and an annotation for Drosophila is close to completion. All these annotations are available through the APPRIS WebServices. APPRIS principal isoforms have been incorporated into the Ensembl annotations (6). The APPRIS annotations, the WebServer and the WebServices are free and accessible to all, and there is no login requirement.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

27 in total

Review 1. Alternative pre-mRNA splicing: the logic of combinatorial control.

Authors: C W Smith; J Valcárcel
Journal: Trends Biochem Sci Date: 2000-08 Impact factor: 13.807

2. Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information.

Authors: Håkan Viklund; Arne Elofsson
Journal: Protein Sci Date: 2004-07 Impact factor: 6.725

3. A combined transmembrane topology and signal peptide prediction method.

Authors: Lukas Käll; Anders Krogh; Erik L L Sonnhammer
Journal: J Mol Biol Date: 2004-05-14 Impact factor: 5.469

4. Improving the accuracy of transmembrane protein topology prediction using evolutionary information.

Authors: David T Jones
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

5. The implications of alternative splicing in the ENCODE protein complement.

Authors: Michael L Tress; Pier Luigi Martelli; Adam Frankish; Gabrielle A Reeves; Jan Jaap Wesselink; Corin Yeats; Páll Isólfur Olason; Mario Albrecht; Hedi Hegyi; Alejandro Giorgetti; Domenico Raimondo; Julien Lagarde; Roman A Laskowski; Gonzalo López; Michael I Sadowski; James D Watson; Piero Fariselli; Ivan Rossi; Alinda Nagy; Wang Kai; Zenia Størling; Massimiliano Orsini; Yassen Assenov; Hagen Blankenburg; Carola Huthmacher; Fidel Ramírez; Andreas Schlicker; France Denoeud; Phil Jones; Samuel Kerrien; Sandra Orchard; Stylianos E Antonarakis; Alexandre Reymond; Ewan Birney; Søren Brunak; Rita Casadio; Roderic Guigo; Jennifer Harrow; Henning Hermjakob; David T Jones; Thomas Lengauer; Christine A Orengo; László Patthy; Janet M Thornton; Anna Tramontano; Alfonso Valencia
Journal: Proc Natl Acad Sci U S A Date: 2007-03-19 Impact factor: 11.205

6. Locating proteins in the cell using TargetP, SignalP and related tools.

Authors: Olof Emanuelsson; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

7. UniProtKB/Swiss-Prot.

Authors: Emmanuel Boutet; Damien Lieberherr; Michael Tognolli; Michel Schneider; Amos Bairoch
Journal: Methods Mol Biol Date: 2007

Review 8. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

9. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors: J D Thompson; D G Higgins; T J Gibson
Journal: Nucleic Acids Res Date: 1994-11-11 Impact factor: 16.971

10. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors.

Authors: Stuart Huntley; Daniel M Baggott; Aaron T Hamilton; Mary Tran-Gyamfi; Shan Yang; Joomyeong Kim; Laurie Gordon; Elbert Branscomb; Lisa Stubbs
Journal: Genome Res Date: 2006-04-10 Impact factor: 9.043
