
ProtVista: visualization of protein sequence annotations.

Xavier Watkins¹,², Leyla J Garcia¹, Sangya Pundir¹, Maria J Martin¹.

Abstract

SUMMARY: ProtVista is a comprehensive visualization tool for the graphical representation of protein sequence features in the UniProt Knowledgebase, experimental proteomics and variation public datasets. The complexity and relationships in this wealth of data pose a challenge in interpretation. Integrative visualization approaches such as provided by ProtVista are thus essential for researchers to understand the data and, for instance, discover patterns affecting function and disease associations.
AVAILABILITY AND IMPLEMENTATION: ProtVista is a JavaScript component released as an open source project under the Apache 2 License. Documentation and source code are available at http://ebi-uniprot.github.io/ProtVista/.
CONTACT: martin@ebi.ac.uk.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2017 PMID: 28334231 PMCID: PMC5963392 DOI: 10.1093/bioinformatics/btx120

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

With the continuous growth in biological data, integration and visualization are increasingly important to aid interpretation. UniProt (The UniProt Consortium, 2015) provides various protein sequence annotations, or ‘features’, such as domains, sites, post-translational modifications and variants from multiple sources. Visualizing these features together enables the identification of patterns that might affect protein function; for instance, deleterious variants co-localized with a known important protein motif or structural feature. While browsers and tools exist for genomic sequences and for specific types of protein features (e.g. PDBe for secondary structure and InterPro for domains), there is currently no highly interactive tool that allows a wide range of protein sequence features to be visualized together in the same space.

ProtVista is implemented in JavaScript and makes extensive use of D3 (https://d3js.org/), a library for producing dynamic, interactive data visualizations in web browsers. The viewer is implemented as a BioJS component (Corpas et al., 2014) to ensure interoperability with other visualization tools, and its source code is publicly available on GitHub. It makes use of the Protein API (http://www.ebi.ac.uk/proteins/api/doc/swagger/), which provides the required data through different endpoints (features, variants and peptides) as XML and JSON. Each endpoint is called asynchronously, which reduces the waiting time for end users. This design makes it easy to add more data endpoints from UniProt or other external data sources as they become available.
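The asynchronous loading of the three endpoints can be illustrated with a minimal Python sketch. It is not ProtVista's own code, which is JavaScript. The base URL and the endpoint paths (`features`, `variation`, `proteomics`) are assumptions to be checked against the API documentation, and `load_tracks`, `demo_fetch` and the `Fetcher` type are hypothetical names introduced here. The fetch function is passed in as a parameter so that the sketch runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumed base URL and endpoint paths; verify them against the API
# documentation before relying on them.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"
ENDPOINTS = {
    "features": "features",
    "variants": "variation",
    "peptides": "proteomics",
}

# Any coroutine that takes a URL and returns parsed JSON.
Fetcher = Callable[[str], Awaitable[dict]]


async def load_tracks(accession: str, fetch_json: Fetcher) -> Dict[str, dict]:
    """Request every endpoint concurrently and return the results by name.

    A failing endpoint yields an empty dict, so the remaining tracks can
    still be drawn.
    """

    async def load_one(name: str, path: str):
        url = f"{BASE_URL}/{path}/{accession}"
        try:
            return name, await fetch_json(url)
        except Exception:
            return name, {}

    # gather() starts all requests at once instead of one after another.
    pairs = await asyncio.gather(
        *(load_one(name, path) for name, path in ENDPOINTS.items())
    )
    return dict(pairs)


async def demo_fetch(url: str) -> dict:
    """Hypothetical stand-in for an HTTP client: echoes the URL it was given
    and simulates a failure of the variation endpoint."""
    await asyncio.sleep(0.01)
    if "/variation/" in url:
        raise RuntimeError("simulated endpoint failure")
    return {"url": url}
```

Calling `asyncio.run(load_tracks("P06858", demo_fetch))` returns one entry per endpoint. Because the requests run concurrently, the total waiting time is close to that of the slowest endpoint rather than the sum of all three.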

2 Visualization

ProtVista’s display consists of three sections (see Fig. 1). The first, located at the top, is used for navigation and zooming. It represents the full length of the protein sequence. Elements on both sides of the component can be dragged to set the zoom level and to navigate along the sequence. The second section, on the left, lists the categories of feature types. The third and main section of the visualization is the area where the features are displayed in tracks.

Each category track is collapsed by default, providing a feature-dependent overview of all the features in the category. If multiple features are present at the same position, a bumping algorithm reduces their height and arranges them vertically so that they do not overlap. When a category is expanded, this overview disappears and each feature-type track displays its own features in the same way as the overview does. In this area, the mouse scroll wheel and gestures can be used to change the zoom level and navigate along the protein sequence. Clicking on a feature highlights the area covered by the feature, allowing for quick discovery of overlapping features, and also highlights the corresponding amino acid sequence if the zoom level allows it to be displayed. A tooltip also appears and displays more information about the feature: exact position, description, scientific evidence and data source, with a cross-reference if available.

The viewer was iteratively improved through a User Centered Design process (see Supplementary Information), including feedback from the scientific community from the early stages of development. A dictionary of shapes and colors to represent protein features was defined with input from various expert resources (InterPro, Pfam and IntAct) to ensure consistency.

The variation track displays amino acid changes differently, using a matrix-based approach to map each change to its sequence position. The y-axis of the matrix represents all amino acids, grouped by chemical properties, as well as deletions and stop-gained mutations, while the x-axis is the position in the protein sequence. Compared to other paradigms such as sequence logos (Schneider and Stephens, 1990), this approach gives equal importance to each variant at a given position. A set of filters allows users to select variants by their consequence (disease association, predicted deleteriousness) or data source (large-scale studies or UniProtKB). Color categories allow users to quickly see the severity of a variant: burgundy (disease) and light green (benign) for UniProtKB reviewed variants, and a luminance scale based on the product of SIFT (Kumar et al., 2009) and PolyPhen (Adzhubei et al., 2010) predictions for variants from large-scale studies. Tooltips contain more information on the disease association and provenance of the variant data.
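The matrix layout and color rules of the variation track can be made concrete with a short Python sketch. Several details are assumptions made for illustration rather than ProtVista's actual choices: the grouping and order of residues in `ROW_GROUPS`, the way the two scores are combined as `(1 - SIFT) * PolyPhen` (low SIFT scores and high PolyPhen scores both indicate a damaging change), and the mapping of that product to a lightness percentage. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative row order: residues grouped by broad chemical property,
# followed by rows for deletions ("del") and stop-gained changes ("*").
# This grouping is an assumption, not ProtVista's actual ordering.
ROW_GROUPS: List[Tuple[str, str]] = [
    ("hydrophobic", "AVLIMFWP"),
    ("polar", "STNQCYG"),
    ("acidic", "DE"),
    ("basic", "KRH"),
]
ROWS: List[str] = [aa for _, group in ROW_GROUPS for aa in group] + ["del", "*"]
ROW_INDEX: Dict[str, int] = {label: row for row, label in enumerate(ROWS)}


def matrix_cell(position: int, alternative: str) -> Tuple[int, int]:
    """Map a variant to (x, y): x is the 1-based sequence position and
    y is the row of the residue (or 'del' / '*') it changes to."""
    return position, ROW_INDEX[alternative]


def predicted_lightness(sift: float, polyphen: float) -> int:
    """Turn two prediction scores in [0, 1] into a lightness percentage.

    Assumed scale: 90% (light) for benign, 30% (dark) for damaging.
    """
    damage = (1.0 - sift) * polyphen
    return round(90 - 60 * damage)


def variant_color(source: str, disease: bool = False,
                  sift: Optional[float] = None,
                  polyphen: Optional[float] = None) -> str:
    """Pick a color category for one variant."""
    if source == "UniProtKB":
        # Reviewed variants use two fixed colors.
        return "burgundy" if disease else "lightgreen"
    if sift is None or polyphen is None:
        return "unscored"
    # Large-scale study variants use the luminance scale.
    return f"lightness {predicted_lightness(sift, polyphen)}%"
```

Because every variant occupies its own cell, two different changes at the same position appear side by side in the same column instead of being merged, which is the sense in which each variant receives equal importance.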

Fig. 1. Simplified view of ProtVista for human lipoprotein lipase (P06858). To investigate the involvement of the human lipoprotein lipase enzyme in lipoprotein lipase deficiency (LPL deficiency), we can look at the potential effect of variants on active sites. Clicking on the active site at position 183 highlights three disease variants at the same position. Clicking on these variants brings up a popup window containing more information about the variants, showing a direct relation with the disease.
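The bumping step that keeps features at the same position from overlapping within a track can be sketched as a greedy interval layout in Python. This is one common way to implement such a step and is not taken from the ProtVista source. The `Feature` type and the `bump` and `row_height` functions are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    begin: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def bump(features: List[Feature]) -> List[Tuple[Feature, int]]:
    """Assign each feature to the first row in which it does not overlap
    a feature already placed there."""
    row_ends: List[int] = []  # last occupied position in each row
    placed: List[Tuple[Feature, int]] = []
    for feature in sorted(features, key=lambda f: (f.begin, f.end)):
        for row, last_end in enumerate(row_ends):
            if feature.begin > last_end:
                row_ends[row] = feature.end
                break
        else:
            # No existing row has room, so open a new one.
            row = len(row_ends)
            row_ends.append(feature.end)
        placed.append((feature, row))
    return placed


def row_height(track_height: float, placed: List[Tuple[Feature, int]]) -> float:
    """Shrink features so that all rows fit in a track of fixed height."""
    rows = 1 + max((row for _, row in placed), default=0)
    return track_height / rows
```

For four made-up features spanning 1-10, 5-15, 11-20 and 16-30, the layout needs two rows, so each feature is drawn at half the track height.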

3 Summary

ProtVista provides the scientific community with a visualization tool integrating information available about UniProtKB proteins from curated, automatic and imported sources. It also allows users to add custom data services using the features JSON format. ProtVista offers an intuitive and compact representation of protein features, making it easier to highlight different data relationships that might otherwise be unclear or difficult to grasp. It has been developed as a JavaScript component for easy integration within any website and is already used by the UniProt website (www.uniprot.org), the Open Targets platform (www.targetvalidation.org) and the EMBL-EBI Enzyme Portal (www.ebi.ac.uk/enzymeportal/; Alcántara et al., 2013). As part of our future plans, we are exploring interactive integration with public web-based genomic visualization tools and ways to upload user data directly to the visualization tool.

Funding

This work has been supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822; the European Molecular Biology Laboratory core funds; and Open Targets.

Conflict of Interest: none declared.

References

1. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm.

Authors: Prateek Kumar; Steven Henikoff; Pauline C Ng
Journal: Nat Protoc Date: 2009-06-25 Impact factor: 13.491

2. Sequence logos: a new way to display consensus sequences.

Authors: T D Schneider; R M Stephens
Journal: Nucleic Acids Res Date: 1990-10-25 Impact factor: 16.971

3. A method and server for predicting damaging missense mutations.

Authors: Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal: Nat Methods Date: 2010-04 Impact factor: 28.547

4. UniProt: a hub for protein information.

Authors: The UniProt Consortium
Journal: Nucleic Acids Res Date: 2014-10-27 Impact factor: 16.971

5. The EBI enzyme portal.

Authors: Rafael Alcántara; Joseph Onwubiko; Hong Cao; Paula de Matos; Jennifer A Cham; Jules Jacobsen; Gemma L Holliday; Julia D Fischer; Syed Asad Rahman; Bijay Jassal; Mikael Goujon; Francis Rowland; Sameer Velankar; Rodrigo López; John P Overington; Gerard J Kleywegt; Henning Hermjakob; Claire O'Donovan; María Jesús Martín; Janet M Thornton; Christoph Steinbeck
Journal: Nucleic Acids Res Date: 2012-11-21 Impact factor: 16.971

6. BioJS: an open source standard for biological visualisation - its status in 2014.

Authors: Manuel Corpas; Rafael Jimenez; Seth J Carbon; Alex García; Leyla Garcia; Tatyana Goldberg; John Gomez; Alexis Kalderimis; Suzanna E Lewis; Ian Mulvany; Aleksandra Pawlik; Francis Rowland; Gustavo Salazar; Fabian Schreiber; Ian Sillitoe; William H Spooner; Anil S Thanki; José M Villaveces; Guy Yachdav; Henning Hermjakob
Journal: F1000Res Date: 2014-02-13

