Sequence, a BioJS component for visualising sequences.

Abstract

SUMMARY: Sequences are probably the most common piece of information on sites providing biological data resources, particularly those related to genes and proteins. Multiple visual representations of the same sequence can be found across those sites. This inconsistency can compromise both user experience and usability when working with graphical representations of a sequence. Furthermore, the code of the visualisation module is commonly embedded in and merged with the rest of the application, making it difficult to reuse in other applications. In this paper, we present a BioJS component for visualising sequences with a set of options supporting flexible configuration of the visual representation, such as formats, colours, annotations, and columns. This component aims to facilitate a common representation across different sites, making it easier for end users to move from one site to another. AVAILABILITY: http://www.ebi.ac.uk/Tools/biojs; http://dx.doi.org/10.5281/zenodo.8299.

Year: 2014 PMID: 25075289 PMCID: PMC4103491 DOI: 10.12688/f1000research.3-52.v1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

Visualising biological data on the web is common practice on sites providing bio-oriented services and resources. A wide variety of JavaScript libraries are used to build software capable of representing bio-entities such as DNA sequences [1], protein sequences (http://www.uniprot.org), protein structures (http://www.wwpdb.org), ontology trees [2], protein-protein interactions (http://www.ebi.ac.uk/intact/) [3], and others. As a result of these multiple implementations, a variety of visual representations can be found for the same bio-entity. In many cases, such implementations are difficult to maintain, test, and reuse because they are developed with only one use case in mind. Furthermore, user experience (UX) and usability across different sites may be compromised.

One type of data commonly affected by multiple representations is the sequence, either DNA or protein. A sequence is a common bio-entity present on most sites offering biological data resources. Figure 1 shows different visual representations of a protein sequence as found in UniProt (http://www.uniprot.org), Dasty [4] (http://www.ebi.ac.uk/dasty) and Ensembl (http://www.ensembl.org), among others [5, 6]. Several features recur across these representations. Features such as formatting, indexing numbers, annotations, marks, colouring tags, and even user interaction are not integrated in one reusable piece of code. Instead, multiple representations prevail. Furthermore, web developers often make their own isolated efforts to reproduce those views for their sites and, in most cases, the result is not identical to the original, is undocumented, and is not portable to other sites.

Figure 1.

Multiple representations compiled as one flexible BioJS component.

In this paper, a reusable component to visualise sequences is presented under the BioJS set of minimum standards for visualisation of biological components. BioJS is a community-driven standard to develop visualisation functionality [7]. The library is developed using well-established methodologies and object-oriented design with inheritance that facilitates rapid development, reuse, extension, integration and deployment of web applications.

The Sequence component

Exploring sequence visualisation across different sites reveals a set of features that should be supported by a single, reusable, and well-documented piece of code capable of painting sequences on the web in a consistent manner. To this end, BioJS provides a baseline for JavaScript coding and development to create pieces of reusable code, called components. Creating a new Sequence component consists of extending a core BioJS class and defining three core concepts: options, methods and events. Options are the data required by the component for initialisation, while methods and events are actions supported at run time. Methods are invoked externally, while events are triggered within the component and exposed to external listeners. Methods and events allow the component to communicate with other components as well as with web applications.

Figure 2 shows a working example implemented within the Biotea project [8]. This example shows communication between two component instances, the Sequence component and the Protein3D component. When a region (highlighted in yellow) is selected on the sequence, a selection action is automatically fired in Protein3D. Additionally, Sequence supports a set of options to change the visual representation of the sequence by using different formats, colours, indexing numbers, annotations and more. This eases deployment because the component can be readily adapted to a particular need. Figure 3 shows an example of the Sequence component displaying the protein P918283 in CODATA format.
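The relationship between options, methods and events described above can be sketched in plain JavaScript. This is only an illustrative sketch and not the BioJS implementation: the class names SequenceSketch and StructureSketch, the setSelection and highlightRegion methods, and the "selectionChanged" event name are all assumptions made for this example, and the demo sequence is arbitrary. The component registry remains the authoritative source for the real options, methods and events.

```javascript
// Minimal sketch of the options / methods / events pattern.
// All names below are hypothetical and are not taken from the BioJS registry.
class SequenceSketch {
  constructor(options) {
    // Options: data required by the component at initialisation time.
    this.options = Object.assign({ id: "", sequence: "", columns: 10 }, options);
    this.listeners = {};   // event name -> array of callbacks
    this.selection = null; // current { start, end }, 1-based and inclusive
  }

  // Register an external listener for an event raised by the component.
  on(eventName, callback) {
    (this.listeners[eventName] = this.listeners[eventName] || []).push(callback);
  }

  // Internal helper: notify every listener registered for an event.
  raise(eventName, payload) {
    (this.listeners[eventName] || []).forEach((cb) => cb(payload));
  }

  // Method: invoked from outside to change the component state.
  setSelection(start, end) {
    const length = this.options.sequence.length;
    if (start < 1 || end > length || start > end) {
      throw new RangeError("selection out of bounds");
    }
    this.selection = { start, end };
    this.raise("selectionChanged", { start, end });
  }
}

// Stand-in for a second component that reacts to the first one's events,
// loosely mirroring the Sequence / Protein3D pairing in Figure 2.
class StructureSketch {
  constructor() {
    this.highlighted = null;
  }
  highlightRegion(start, end) {
    this.highlighted = { start, end };
  }
}

const seq = new SequenceSketch({ id: "demo", sequence: "MKTAYIAKQRQISFVKSHFSRQ" });
const structure = new StructureSketch();

// Wire the two components together: an event from one calls a method on the other.
seq.on("selectionChanged", (e) => structure.highlightRegion(e.start, e.end));
seq.setSelection(3, 8);
console.log(structure.highlighted); // { start: 3, end: 8 }
```

The design point is that neither component holds a reference to the other; the page that hosts them does the wiring, which is what would allow a component to be reused on a different site.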

Figure 2.

Example of communication between Sequence and Protein3D components.

Figure 3.

Example displaying the sequence corresponding to the UniProt accession P918283.

The part highlighted in yellow denotes the current selection, and the black pop-up box shows the current interval as the pointer moves. The green highlight denotes an annotation on that interval. Multiple annotations are supported.

Like any other BioJS component, the Sequence component is well documented and has been tested during development, not only for functionality but also for usability. BioJS makes it easier to document the code by adding annotations that are later exposed as a web page. Thus, human-friendly documentation is generated without additional effort. BioJS web pages for components are compiled in a registry that acts as a showcase of working examples extracted from the component annotations. The registry makes it easier for both developers and end users to understand components and their functionality. Once a component has met the BioJS guidelines, it becomes a candidate to be submitted and publicly shared in the common repository of components, the EBI BioJS registry (http://www.ebi.ac.uk/Tools/biojs/registry/). There, it is possible to find more information about options, installation, methods, and events (http://www.ebi.ac.uk/Tools/biojs/registry/Biojs.Sequence.html).
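As a rough illustration of the display features discussed for Figure 3 (columns, indexing numbers, and annotations attached to intervals), the following sketch lays a sequence out in numbered rows and looks up which annotations cover a given position. The helper names layoutSequence and annotationsAt, the annotation object shape, and the demo data are assumptions made for this example; they do not reproduce the component's actual code or any particular format such as CODATA.

```javascript
// Hypothetical helper: split a sequence into rows of fixed-width blocks,
// each row carrying the 1-based index of its first residue.
function layoutSequence(sequence, blockSize = 10, blocksPerRow = 3) {
  const rowLength = blockSize * blocksPerRow;
  const rows = [];
  for (let offset = 0; offset < sequence.length; offset += rowLength) {
    const rowText = sequence.slice(offset, offset + rowLength);
    const blocks = [];
    for (let i = 0; i < rowText.length; i += blockSize) {
      blocks.push(rowText.slice(i, i + blockSize));
    }
    rows.push({ index: offset + 1, blocks });
  }
  return rows;
}

// Hypothetical helper: return every annotation whose interval
// (1-based, inclusive) covers the given position.
function annotationsAt(position, annotations) {
  return annotations.filter((a) => a.start <= position && position <= a.end);
}

// Arbitrary demo data, not a real database entry.
const demoSequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ";
const demoAnnotations = [
  { name: "region A", start: 5, end: 12 },
  { name: "region B", start: 10, end: 20 },
];

const rows = layoutSequence(demoSequence, 10, 2);
for (const row of rows) {
  // Right-align the index number, then print the blocks separated by spaces.
  console.log(String(row.index).padStart(4), row.blocks.join(" "));
}

// Overlapping intervals yield more than one annotation for a position.
const covering = annotationsAt(11, demoAnnotations).map((a) => a.name);
console.log(covering);
```

Keeping the layout as data (rows and blocks) rather than as markup is one way to let the same sequence be painted in different formats from a single source.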

Future work

Currently, the Sequence component supports the visualisation of a single strand. However, in some cases it would be more useful to display similarities between two or more sequences. Another possible extension is to use this component as a base for visualising multiple aligned sequences. Alignment algorithms [9] could be run on the server side or consumed from a web service [10], while the component would be in charge of painting the similarities, taking advantage of already developed features such as colouring, highlighting, and tagging. Collaborative work and social networking are nowadays mechanisms for knowledge construction. Such features could be integrated into the Sequence component so that end users can submit sequences and annotations to public sequence databases such as UniProt. Comments and references could also be added, providing valuable information for researchers during their investigations.
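To make the proposed division of labour concrete, the sketch below assumes that an alignment has already been computed elsewhere (for instance by a server-side aligner) and shows only the client-side step of deciding which columns to mark as similar. The function compareAligned, the use of "-" as the gap character, and the demo sequences are assumptions for illustration; nothing here is part of the existing Sequence component.

```javascript
// Hypothetical helper: compare two already-aligned sequences of equal length.
// "-" marks a gap. Returns a match line ("|" for identical residues, " "
// otherwise) and the fraction of identical residues among ungapped columns.
function compareAligned(a, b) {
  if (a.length !== b.length) {
    throw new Error("aligned sequences must have equal length");
  }
  let matchLine = "";
  let identical = 0;
  let compared = 0;
  for (let i = 0; i < a.length; i++) {
    const gap = a[i] === "-" || b[i] === "-";
    if (!gap) compared++;
    if (!gap && a[i] === b[i]) {
      identical++;
      matchLine += "|";
    } else {
      matchLine += " ";
    }
  }
  return { matchLine, identity: compared === 0 ? 0 : identical / compared };
}

// Arbitrary demo alignment with one gap and one mismatch.
const alignment = compareAligned("MKT-AYIAK", "MKTQAYLAK");
console.log("MKT-AYIAK");
console.log(alignment.matchLine);
console.log("MKTQAYLAK");
console.log(alignment.identity); // 0.875
```

In a component, the positions marked "|" could be handed to the existing colouring or highlighting features rather than printed as text.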

Software availability

Zenodo: Sequence BioJS component for visualising sequences, doi: 10.5281/zenodo.8299 [11]. GitHub: BioJS, http://www.ebi.ac.uk/Tools/biojs.

Referee reports

Here, the authors present Sequence, a web-based visualization component for biological sequence data implemented in JavaScript. Investigators can use Sequence to visualize both DNA and protein sequences, either as a standalone visualization or together with other visualizations. Strengths of Sequence include (a) the ability to customize the sequence display using options and (b) integration with other components via events. These features ensure that Sequence can be used in a wide variety of applications. What is missing from this manuscript is a description of how well Sequence scales to large sequences and whether a Sequence visualization can be updated dynamically in response to events from other components. Overall, Sequence is a solid contribution to web-based visualization that is useful as it is and forms the foundation for more complex web-based sequence visualization in the future. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors present the first re-usable JavaScript-based sequence component. It can be used in web applications dealing with bio-polymers such as proteins and nucleotide sequences and can also interact with other parts of the website via events. Previously, Java applets have been used for interactive web content. However, Java constitutes an additional layer of software and thereby carries its own set of technical problems and risks. For this reason, JavaScript is increasingly used on the client side. In this respect, the development of BioJS components follows a general trend. The BioJS registry is the first and only framework plus standard for interactive web components, and Sequence will be one of the most important components following the BioJS specification. Therefore, I expect that the Sequence component will be widely used in bioinformatics web services. Even if current features might not satisfy all needs, the BioJS format allows for extensions and the incorporation of new features, and the source code is clear and well documented, allowing developers to adapt it to their requirements.

I would suggest replacing the word "compiled" with another word (in the Figure 1 legend and in the third paragraph of the Sequence component section), as it might be mistaken for source code being compiled on a server, such as a Debian Linux server. The manuscript does not answer some important questions, and it would be good if these points could be clarified. Is the length of the sequence limited? Is the sequence immutable, or could it change, as with alternative splicing? Can parts of the sequence be hidden, for example to cut off a signal peptide? Regarding "indexing numbers", does the numbering support PDB insertion codes?

For demonstration, the authors have coupled the sequence view with a BioJS 3D component. With the newest Java, the JMol applet fails to start with the message: "Your security system has blocked an untrusted ...". I expect that adding the line Permissions: sandbox to the jar-file manifest and signing the jar file will fix the problem. The authors should also consider using a JavaScript-based 3D visualization. On events like 'Annotation Clicked', there is no parameter indicating whether the context pop-up trigger (right click, long touch) is active or which modifier keys, such as Shift and Ctrl, are pressed; this should be made clearer for ease of use. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References (9 in total)

1. Artemis: sequence visualization and annotation.

Authors: K Rutherford; J Parkhill; J Crook; T Horsnell; P Rice; M A Rajandream; B Barrell
Journal: Bioinformatics Date: 2000-10 Impact factor: 6.937

2. ModView, visualization of multiple protein sequences and structures.

Authors: Valentin A Ilyin; Ursula Pieper; Ashley C Stuart; Marc A Marti-Renom; Linda McMahan; Andrej Sali
Journal: Bioinformatics Date: 2003-01 Impact factor: 6.937

3. A survey of sequence alignment algorithms for next-generation sequencing (review).

Authors: Heng Li; Nils Homer
Journal: Brief Bioinform Date: 2010-05-11 Impact factor: 11.622

4. pLogo: a probabilistic approach to visualizing sequence motifs.

Authors: Joseph P O'Shea; Michael F Chou; Saad A Quader; James K Ryan; George M Church; Daniel Schwartz
Journal: Nat Methods Date: 2013-10-06 Impact factor: 28.547

5. Clustal W and Clustal X version 2.0.

Authors: M A Larkin; G Blackshields; N P Brown; R Chenna; P A McGettigan; H McWilliam; F Valentin; I M Wallace; A Wilm; R Lopez; J D Thompson; T J Gibson; D G Higgins
Journal: Bioinformatics Date: 2007-09-10 Impact factor: 6.937

6. BioJS: an open source JavaScript framework for biological data visualization.

Authors: John Gómez; Leyla J García; Gustavo A Salazar; Jose Villaveces; Swanand Gore; Alexander García; Maria J Martín; Guillaume Launay; Rafael Alcántara; Noemi Del-Toro; Marine Dumousseau; Sandra Orchard; Sameer Velankar; Henning Hermjakob; Chenggong Zong; Peipei Ping; Manuel Corpas; Rafael C Jiménez
Journal: Bioinformatics Date: 2013-02-23 Impact factor: 6.937

7. Dasty3, a WEB framework for DAS.

Authors: Jose M Villaveces; Rafael C Jimenez; Leyla J Garcia; Gustavo A Salazar; Bernat Gel; Nicola Mulder; Maria Martin; Alexander Garcia; Henning Hermjakob
Journal: Bioinformatics Date: 2011-07-28 Impact factor: 6.937

8. The IntAct molecular interaction database in 2012.

Authors: Samuel Kerrien; Bruno Aranda; Lionel Breuza; Alan Bridge; Fiona Broackes-Carter; Carol Chen; Margaret Duesbury; Marine Dumousseau; Marc Feuermann; Ursula Hinz; Christine Jandrasits; Rafael C Jimenez; Jyoti Khadake; Usha Mahadevan; Patrick Masson; Ivo Pedruzzi; Eric Pfeiffenberger; Pablo Porras; Arathi Raghunath; Bernd Roechert; Sandra Orchard; Henning Hermjakob
Journal: Nucleic Acids Res Date: 2011-11-24 Impact factor: 16.971

9. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries.

Authors: Richard G Côté; Philip Jones; Rolf Apweiler; Henning Hermjakob
Journal: BMC Bioinformatics Date: 2006-02-28 Impact factor: 3.169