
AliView: a fast and lightweight alignment viewer and editor for large datasets.

Anders Larsson

Abstract

SUMMARY: AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets.
AVAILABILITY AND IMPLEMENTATION: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview
CONTACT: anders.larsson@ebc.uu.se
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25095880      PMCID: PMC4221126          DOI: 10.1093/bioinformatics/btu531

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

As DNA and protein datasets are getting larger, the demand for a refined and fast alignment editor increases. The need for an improved alignment editor and viewer, therefore, emerged in the 1000 plants project (1KP, www.onekp.com) while designing degenerate primers for a diverse set of ferns from transcriptome data (Rothfels et al., 2013). What was lacking in the previously available programs was the combination of abilities to (i) get an overview of large nucleotide alignments, (ii) visually highlight various conserved regions, (iii) have a simple and intuitive way to align, rearrange, delete and merge sequences and (iv) find degenerate primers in selected semiconserved regions. Although some of these features are individually present in current alignment editors, the combination is not. In addition to the core functionality meeting these specific needs, AliView (Fig. 1) is designed with a complete set of intuitive general functions meeting the most common demands for preparing a multiple sequence alignment.
Fig. 1. Alignment zoomed out to give a complete overview of the regions

Here, AliView is introduced as an alignment viewer and editor with a unique combination of features that allows the user to work with large datasets. The intuitive user interface provides easy visual overview and navigation and works with unlimited size alignments.

2 IMPLEMENTATION

AliView is cross-platform, built in Java and thoroughly tested on Linux, Mac OS X and Windows operating systems. It uses the Java Evolutionary Biology Library v2.0 (available at http://code.google.com/p/jebl2/) for parsing files in Nexus format.

3 FEATURES

3.1 Large alignments, speed and more

The key features of AliView include the ability to swiftly handle large alignments with low memory impact (see Table 1 for comparison with other popular free cross-platform alignment viewers). AliView loads large alignment files 2–14 times faster and requires less than half the memory of comparable alignment editors (Table 1; Supplementary Table S1A–C). AliView reads alignment files of unlimited size in FASTA, Phylip, Nexus, Clustal and MSF format (Table 2). This works through an indexing process in which the sequences in the file are initially indexed and only cached in memory when viewed. Aside from the built-in indexing of large files, the program also reads and saves FASTA index files (.fai) as implemented by Samtools (Li et al., 2009). Depending on the memory available on the specific computer, the program either reads the whole alignment into memory or leaves parts of it on file. This way any alignment file can be opened regardless of the memory resources of the computer.
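The indexing idea described above can be illustrated with a minimal Python sketch. It is not AliView's Java implementation; it only assumes the five-column layout used by Samtools .fai files (name, sequence length, byte offset of the first residue, residues per line, bytes per line) and a FASTA file whose sequence lines all have the same width except the last one of each record. The function names `build_fasta_index`, `format_fai` and `fetch` are hypothetical.

```python
def build_fasta_index(path):
    """Scan a FASTA file once; return {name: (length, offset, linebases, linewidth)}."""
    index = {}
    name = None
    with open(path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The record name is the first word of the header line.
                name = line[1:].split()[0].decode("ascii")
                # Sequence data starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None and line.strip():
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # The first sequence line defines the line geometry.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def format_fai(index):
    """Render the index as tab-separated text, one record per line."""
    return "".join(
        "\t".join([name] + [str(field) for field in fields]) + "\n"
        for name, fields in index.items()
    )


def fetch(path, index, name, start, end):
    """Return residues [start, end) of one record, reading only the bytes needed."""
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    # Translate residue coordinates into byte positions, skipping line breaks.
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + ((end - 1) // linebases) * linewidth + (end - 1) % linebases
    with open(path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

With such an index, a viewer only needs to read the residues that are currently on screen, so memory use is driven by the visible region rather than by the size of the file.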
Table 1. Time to open alignment file and memory usage. Comparison of AliView with popular free and cross-platform alignment editors

| Size | Format | Dimension (sequence × character) | AliView | JalView 2 | SeaView | ClustalX | Mesquite |
|---|---|---|---|---|---|---|---|
| 22.4 GB | FASTA | 479 726 × 46 512 | 5–110 s (88 MB) [a,b] | Not supported | Not supported | Not supported | Not supported |
| 22.4 GB | FASTA | 479 726 × 46 512 | 0.6 s (88 MB) [a,c] | Not supported | Not supported | Not supported | Not supported |
| 2.1 GB | FASTA | 39 407 × 54 103 | 17 s (2.2 GB) | 73 s (4.7 GB) | 51 s (5.7 GB) | Memory error | >10 min |
| 1.3 GB | FASTA | 11 792 × 107 401 | 5.6 s (1.2 GB) | 33 s (3.3 GB) | 23 s (3.6 GB) | Memory error | >5 min |
| 1.3 GB | PHYLIP | 11 799 × 107 401 | 5.9 s (1.2 GB) | Not supported | 17 s (2.7 GB) | Memory error | >5 min |
| 1.3 GB | NEXUS | 11 792 × 107 401 | 5.7 s (1.2 GB) | Not supported | 18 s (3.5 GB) | Not supported | >5 min |
| 317 MB | FASTA | 361 874 × 4958 | 2.1 s (608 MB) | 31 s (3.1 GB) | 9.5 s (3.8 GB) | Memory error | >5 min |
| 42.2 MB | FASTA | 5441 × 7682 | 0.6 s (53 MB) | 2.8 s (160 MB) | 1.2 s (145 MB) | 20 s (1 GB) | >5 min |

Note: Test results shown were performed on Linux Ubuntu 12.04, Intel Core i7 2700K 3.5 GHz, 16 GB internal memory and Intel 520 SSD. Similar results were obtained on Mac OS X and Windows systems. For a more extensive comparison including the test methodology, see Supplementary Table S1A–C.

[a] The 22.4 GB FASTA file was not read completely into memory but instead accessed as an indexed file. In all other tests the files were read into memory.

[b] Times depend on how many sequences are indexed at once.

[c] With alignment file already indexed.

Table 2. Comparison of AliView features with popular free and cross-platform alignment editors

Feature / Program: AliView; JalView 2; SeaView; ClustalX; Mesquite
Open alignments of unlimited size (read from disk): Yes
Maximum number of sequences visible at once [a]: Unlimited; 495 or overview window; 106; 120; 68
Maximum sequence length visible at once [a]: Unlimited; 1830 or overview window; 305; 345; 1650
Merge sequences: Yes
Find degenerate primers in selected areas: Yes
Define exon boundaries and codon positions for translating nucleotides: Yes; Yes
Highlight difference from consensus or ‘trace sequence’: Yes; Yes
Highlight consensus residues: Yes; Yes; Only protein; Only protein; Yes

Note: A more thorough comparison is included as Supplementary Table S2.

[a] Maximum number of sequences and maximum sequence length visible were tested at 1920 × 1200 screen resolution.

Another important feature of AliView is the speed in rendering large alignments. The speed, together with the mouse wheel zoom feature, makes it possible to get a quick overview and easily navigate in large alignments.

AliView can merge overlapping sequences into a consensus sequence. This feature is useful when working with multiple-read NGS-generated sequences. Sometimes the overlap of different sequences or contigs falls outside of the tolerance of assembly programs, and a manually merged sequence is needed.

AliView has unique functionality aimed at supporting the design of universal degenerate primers. It is possible to select an alignment region and have AliView calculate all possible primers (Kämpke et al., 2001). To make it easy to select which primer to use, they are presented as an ordered list sorted by the number of degenerate positions, self-binding values and melting temperature.
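As a rough illustration of how candidate primers in a selected region could be ranked by degeneracy, the Python sketch below collapses each alignment column into an IUPAC code and slides a fixed-length window over the region. It is a simplification under stated assumptions, not the algorithm used in AliView or described by Kämpke et al.: it ranks only by the number of degenerate positions and the number of concrete oligos a primer expands to, and it leaves out self-binding and melting temperature entirely. The names `consensus_code` and `candidate_primers` are hypothetical.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Number of concrete bases each code stands for.
DEGENERACY = {code: len(bases) for bases, code in IUPAC.items()}


def consensus_code(column):
    """Collapse one alignment column into an IUPAC code; gaps are ignored."""
    bases = frozenset(base for base in column.upper() if base in "ACGT")
    # Assumption of this sketch: a column without any A/C/G/T is reported as N.
    return IUPAC.get(bases, "N")


def candidate_primers(alignment, start, end, length):
    """List all windows of `length` columns in [start, end), least degenerate first.

    Each result is (degenerate_positions, variants, column, primer).
    """
    codes = [
        consensus_code("".join(seq[i] for seq in alignment))
        for i in range(start, end)
    ]
    candidates = []
    for offset in range(len(codes) - length + 1):
        primer = "".join(codes[offset:offset + length])
        degenerate = sum(1 for code in primer if code not in "ACGT")
        variants = 1
        for code in primer:
            variants *= DEGENERACY[code]
        candidates.append((degenerate, variants, start + offset, primer))
    return sorted(candidates)
```

For example, `candidate_primers(["ACGTAC", "ACGTGC", "ACCTAC"], 0, 6, 3)` returns the window starting at column 0 (`ACS`, one degenerate position) first and the window starting at column 2 (`STR`, two degenerate positions) last.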

3.2 Other features

Apart from the key features, AliView also has several other alignment program functions. Alignment can be done by calling any external alignment program. AliView ships with MUSCLE integrated as the default alignment program (Edgar, 2004), but the user can incorporate other programs if desired.

Other features include, for example, manual editing capabilities to insert, delete, change, move or rename sequences in an alignment; undo/redo functionality; several visual cues to highlight consensus characters or characters deviating from the consensus; ClustalX conserved region color scheme (Larkin et al., 2007); search functionality that finds patterns across gaps and follows IUPAC codes; implementation of the Nexus specification of Codonpos, Charset and Excludes.

AliView is intended to be a simple easy-to-use alignment editor, and not a complete program for phylogenetic analyses. Instead, the ‘external interface’ function is aimed to ease the use of AliView as one program in a chain of software, making it possible to call other programs from within AliView with the current alignment or selected sequences as arguments. As a proof of concept, AliView comes with a preset code that adds a button for directing the alignment to FastTree (Price et al., 2010) that calculates a phylogenetic tree that is then automatically opened in FigTree (Rambaut, 2012).

For comparison of the key features of AliView with other free cross-platform editors such as Jalview 2 (Waterhouse et al., 2009), SeaView (Gouy et al., 2010), ClustalX (Larkin et al., 2007) and Mesquite (Maddison and Maddison, 2011) see Table 2. For a more comprehensive comparison of features see Supplementary Table S2.
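A search that "finds patterns across gaps and follows IUPAC codes" can be approximated with a regular expression. The Python sketch below is one possible reading of that behaviour, not AliView's actual implementation: each query character is expanded to the set of bases its IUPAC code stands for, and any number of gap characters (`-`) is allowed between consecutive query positions. The names `compile_query` and `find_all` are hypothetical, and only DNA codes are handled.

```python
import re

# Bases represented by each IUPAC nucleotide code (DNA only in this sketch).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compile_query(query):
    """Turn an IUPAC query into a regex that tolerates gaps between residues."""
    classes = []
    for code in query.upper():
        if code not in IUPAC_BASES:
            raise ValueError("not an IUPAC nucleotide code: " + code)
        classes.append("[" + IUPAC_BASES[code] + "]")
    # "-*" lets a match continue across alignment gaps.
    return re.compile("-*".join(classes), re.IGNORECASE)


def find_all(query, sequence):
    """Return (start, end) column spans of non-overlapping matches in an aligned sequence."""
    pattern = compile_query(query)
    return [(match.start(), match.end()) for match in pattern.finditer(sequence)]
```

The returned spans are in alignment columns, gaps included, so `find_all("ACGT", "AC--GTTT")` reports the span `(0, 6)` even though the query itself has only four characters.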

3.3 User interface and usability

Because an alignment editor is an everyday tool for many researchers, AliView was designed with extensive focus on usability and intuitive handling, implemented by following the logical standards of commonly used software such as text-editors, word processors, browsers and, of course, other alignment viewers.
REFERENCES

1.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

2.  SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building.

Authors:  Manolo Gouy; Stéphane Guindon; Olivier Gascuel
Journal:  Mol Biol Evol       Date:  2009-10-23       Impact factor: 16.240

3.  Clustal W and Clustal X version 2.0.

Authors:  M A Larkin; G Blackshields; N P Brown; R Chenna; P A McGettigan; H McWilliam; F Valentin; I M Wallace; A Wilm; R Lopez; J D Thompson; T J Gibson; D G Higgins
Journal:  Bioinformatics       Date:  2007-09-10       Impact factor: 6.937

4.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

5.  Efficient primer design algorithms.

Authors:  T Kämpke; M Kieninger; M Mecklenburg
Journal:  Bioinformatics       Date:  2001-03       Impact factor: 6.937

6.  Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors:  Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal:  Bioinformatics       Date:  2009-01-16       Impact factor: 6.937

7.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

8.  Transcriptome-mining for single-copy nuclear markers in ferns.

Authors:  Carl J Rothfels; Anders Larsson; Fay-Wei Li; Erin M Sigel; Layne Huiet; Dylan O Burge; Markus Ruhsam; Sean W Graham; Dennis W Stevenson; Gane Ka-Shu Wong; Petra Korall; Kathleen M Pryer
Journal:  PLoS One       Date:  2013-10-08       Impact factor: 3.240
