CView: A network based tool for enhanced alignment visualization.

Raquel Linheiro^1, Stephen Sabatino^1,2, Diana Lobo^1,2,3, John Archer^1,2.

Abstract

To date, basic visualization of sequence alignments has largely focused on displaying per-site columns of nucleotide, or amino acid, residues along with associated frequency summarizations. The persistence of this tendency in the recent tools designed for viewing mapped read data indicates that such a perspective not only provides a reliable visualization of per-site alterations, but also offers implicit reassurance to the end-user in relation to data accessibility. However, the initial insight gained is limited, something that is especially true when viewing alignments consisting of many sequences representing differing factors such as location, date and subtype. A basic alignment viewer has the potential to increase initial insight through visual enhancement, whilst not delving into the realms of complex sequence analysis. We present CView, a visualizer that expands on the per-site representation of residues through the incorporation of a dynamic network that is based on the summarization of diversity present across different regions of the alignment. Within the network, nodes are based on the clustering of sequence fragments that span windows placed consecutively along the alignment. Edges are placed between nodes of neighbouring windows where they share sequence identification(s), i.e. different regions of the same sequence(s). Thus, if a node is selected on the network, then the relationship that sequences passing through that node have to other regions of diversity within the alignment can be observed through path tracing. In addition to augmenting visual insight, CView provides export features including variant summarization, per-site residue and kmer frequencies, consensus sequence, alignment dissection as well as clustering; each useful across a range of research areas. The software has been designed to be user friendly, intuitive and interactive.
It is open source and an executable jar, source code, quick start, usage tutorial and test data are available (under the GNU General Public License) from https://sourceforge.net/projects/cview/.

Year: 2022 PMID: 35696379 PMCID: PMC9191720 DOI: 10.1371/journal.pone.0259726

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Tools developed to visualize local sets of aligned sequences, such as those produced by multiple sequence aligners including MUSCLE [1] and Clustal W [2], have focused largely on displaying columns of nucleotide or amino acid characters, and highlighting the differences between such characters primarily through the use of colour [3-7]. More advanced sequence management and analysis packages, such as Geneious [8] and Mega [9], as well as the more recent tools designed for basic visualization of mapped read data, including IGV [10], GenomeView [11] and Tablet [12], incorporate a wide array of analysis, summarization and annotation options, but in terms of basic visualization they follow a similar approach. In light of the emergence of the vast quantities of sequence data generated during the last decades [13], alignment-free approaches to sequence summarization have also proliferated [14, 15]. However, the per-site information associated with multiple sequence alignments remains integral to many topics of bioinformatics research, ranging from phylogenetics [16-18], co-evolution [19], recombination detection [20] and protein-protein interactions [21] to inter-species evolutionary dynamics, such as those involving pathogens [22-25] as well as predator-prey [26] relationships, to mention just a few. The direct observation of aligned residues, as well as the general per-site based summarization, not only provides an accessible view of per-site alterations between sequences within the alignment but also implicitly gives the end-user a level of reassurance in relation to the data. However, initial insight gained about the overall alignment is limited, especially when viewing alignments consisting of many sequences representing varying factors of interest such as geographical location, subtype, treatment strategy, compartment and date (or time-point).
Basic alignment visualization should have the potential to increase the level of initial insight within sequence datasets whilst not delving into the realms of more complex sequence analysis. Here we present CView, a simple multiple sequence alignment visualizer that incorporates a dynamic network that is based on a summarization of the diversity across different regions of the alignment. The immediate coupling of aligned sequences to such a network provides a way of visually relating the diversity observed within characters that are currently onscreen to that of the surrounding regions of the alignment not currently in view. This provides the user with a more intuitive, visual summarization of the context of this diversity. CView provides a range of export features that can be applied to the entire alignment, to a specified region of the alignment, or to a specified region in conjunction with a specified subset of sequences. Such export features include variant summarization, per-site residue and kmer frequency matrices, clustering, pairwise-distance matrices as well as consensus sequence generation. For example, when the “Variant Frequencies” option of the “Title Search” menu is selected, a list of variants spanning the user-specified region of the alignment, from sequences containing a user-specified search criterion within the sequence title (such as a specific year), is created. This is done by identifying all unique residue permutations between the specified co-ordinates from the subset of sequences matching the search criterion and associating each with its frequency of occurrence. The titles of the sequences associated with each variant are then output in conjunction with a summarization, relative to the most frequent variant, of the subsection of the alignment used.
On the other hand, if the “Variant Frequency” option of the “All Sequences” menu is selected, variants will be identified from all sequences within the alignment instead of from a subset matching a user search criterion. Such a feature has use in the tracking of viral populations, for example in searching for the presence of genotypic alterations such as those associated with immune escape [27], drug resistance [28], or co-receptor usage [29, 30]. Additionally, this feature has use in both clinical [31] and environmental [32-34] metagenomics, where the summarization of populations of microbes is of interest. Other output features within CView function in a similar manner to that of variant summarization and an overview of each is provided within the user manual, available on the CView SourceForge wiki page, as well as within the in-software “Help” menu. A usage demonstration video [35], as well as a brief tutorial on identifying variants [36], have been provided on the zenodo repository, and are also accessible through the CView SourceForge project wiki page. Aside from features related to the extraction of secondary information from the alignment, CView provides the ability to dissect the alignment into subsets of sequences and regions; a task that is often laborious in the absence of a background in script development. For example, a user can export a specified region of sequences associated with a specific time-point, geographical location or body compartment, as long as the sequence titles have been labelled with such information. Such labelling is often a standard output feature of sequence repositories; for example, in the Los Alamos HIV sequence database the user can select options such as subtype, patient code, country and year to be included within the title of each sequence [37]. Such information may also be part of experimental design where information on compartment [38] or time-point [39] may be available.
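The variant summarization described above can be sketched in a few lines of Python. This is only an illustration of the idea (unique residue strings within a region, counted over an optional title-matched subset), not CView's Java implementation; the function names, the 1-based inclusive co-ordinates and the toy titles are assumptions. The "|" masking follows the text-based formatting that the variant output is described as using elsewhere in the paper.

```python
def variant_frequencies(alignment, start, end, title_query=None):
    """Count unique residue strings spanning columns start..end.

    alignment: dict mapping sequence title -> aligned sequence.
    start, end: 1-based inclusive co-ordinates (an assumption of this sketch).
    title_query: optional substring; only titles containing it are used.
    Returns (variant, count, titles) tuples, most frequent first.
    """
    titles_by_variant = {}
    for title, seq in alignment.items():
        if title_query is not None and title_query not in title:
            continue
        variant = seq[start - 1:end]
        titles_by_variant.setdefault(variant, []).append(title)
    rows = [(v, len(t), t) for v, t in titles_by_variant.items()]
    # Sort by descending count, then alphabetically so ties are deterministic.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def mask_against(reference, variant):
    """Show '|' where a variant matches the reference, else the variant residue."""
    return "".join("|" if a == b else b for a, b in zip(reference, variant))


# Toy alignment with hypothetical titles that carry a year.
toy = {
    "B.US.2001.a": "ACGTAC",
    "B.US.2001.b": "ACGTAC",
    "B.US.2003.c": "ACCTAC",
    "B.FR.2001.d": "ACGAAC",
}
for variant, count, titles in variant_frequencies(toy, 2, 5, title_query="2001"):
    print(variant, count, titles)
```

Passing `title_query=None` corresponds to summarizing all sequences rather than a title-matched subset.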
Within CView the network, displayed directly below the alignment, is based on the clustering of sequences within windows placed consecutively across the alignment, where each cluster becomes a node. Within any given window, if multiple nodes exist, each represents a different portion of the diversity present at that window's location. Edges are placed between nodes of neighbouring windows where they share differing regions of the same sequence(s). Thus, if a single node within a region of the alignment displaying multiple nodes is selected, then the relationship that all sequences passing through that node have to nodes within other regions of the alignment can be instantly observed by highlighting the paths through the rest of the network that the sequences within the selected node take. Here we describe how these networks are constructed and displayed. The clustering threshold used during network construction, as well as the number and width of windows, are specified on the user-interface through a series of user-friendly slider bars. Alterations are updated in real time, which allows the user to visually explore the variation present across the alignment. A parameter reset button allows the user to instantly reset all network parameter slider bars to default values. The software has been designed to be user friendly, intuitive and interactive and it, along with source code, a quick start guide and test data, is available (under the GNU General Public License) through the SourceForge project page https://sourceforge.net/projects/cview/.

Methods

Implementation

The interface has been designed for simplicity and clarity. It consists of four general areas of user-interaction (Fig 1), which are: (1) sequence view, (2) network view, (3) navigation and control, and (4) menu driven outputs. Within the software a brief graphical overview of these areas is available through the “Help” -> “Interface” menu option, as well as through a help button within the “navigation and control” area. Hint text is also displayed within the control panel area once various components of the interface are clicked on. A video tutorial located on the zenodo repository provides a general overview of the interface [35].

Fig 1

CView interface.

The four main areas of the CView interface are depicted. These are sequence view, network view, control panel and the top menu. The yellow numbers on the top indicate the sites of the alignment that are currently in view. These correspond to the yellow bar on the top of the location indicator. The orange numbers along the bottom indicate the locations of the windows that nodes within the network are dependent on. These window locations correspond to the area covered by the orange bar located under the location indicator. Grey dots indicate selectable nodes within windows. The squares along the location indicator can also be selected in order to jump directly to the indicated co-ordinates. The red text around the outside of the interface summarizes the main features.

(1) Sequence view

Sequences are displayed above the alignment location indicator. The dynamic yellow bar associated with the latter represents the region of the alignment that is currently visible. The green dot on the right hand side indicates what proportion of the sequences are currently visible. The consensus sequence of the alignment is displayed along the top of the sequence area, and directly under this a “+” indicates columns where all characters agree with the consensus character. Sequences and their titles are clickable and when an individual sequence is selected it will be traced through the corresponding network as a yellow line. If a user clicks on a node within the network area, all sequences that pass through that node will have a red dot placed next to their titles. These red dots correspond to the red paths that will become visible on the network, which indicate the other nodes through which the sequences pass. Sequences possessing a red dot next to their titles in this manner will be placed at the top of the sequence display list, i.e. the list is sorted with these sequences first. If a different node is subsequently selected, the red title dots are reallocated to sequences passing through the newly selected node, the paths are redrawn and the sequence list is resorted. Basic parameters associated with the sequence display include the masking of characters that are the same as consensus (thus making it easier to identify non-consensus residues by eye), font size, the space allocated to display titles and scroll speed; these are altered through the “Navigation and Control” panel that is located on the bottom right of the display. A button within the control panel area can be used to reset all display parameters to default values. Site locations are highlighted in yellow along the top of the interface.
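As a rough illustration of the consensus row, the “+” agreement row and consensus masking described above, the following Python sketch may help. It is not taken from the CView source: the tie-breaking rule for the consensus (first residue encountered wins) and the "." mask character are assumptions made for this example.

```python
from collections import Counter


def consensus(seqs):
    """Most frequent character per column.

    Ties go to the character seen first in the column; how CView
    resolves ties is not stated, so this is an assumption.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def agreement_row(seqs, cons):
    """'+' under columns where every sequence matches the consensus."""
    return "".join(
        "+" if all(c == k for c in col) else " "
        for col, k in zip(zip(*seqs), cons)
    )


def mask_identical(seq, cons, mask_char="."):
    """Hide residues equal to the consensus (mask_char is an assumed symbol)."""
    return "".join(mask_char if c == k else c for c, k in zip(seq, cons))


seqs = ["ACGT", "ACGA", "ATGT"]
cons = consensus(seqs)
print(cons)
print(agreement_row(seqs, cons))
for s in seqs:
    print(mask_identical(s, cons))
```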

(2) Network view

The network depicting sequence diversity within the alignment is displayed directly below the alignment location indicator. The associated orange bar of the latter represents the region of the alignment that is currently represented by the network. The region begins from the current sequence view starting site and extends to the right-hand side in a manner that is dependent on the number of consecutive windows placed along the alignment, as well as their width (Fig 2, step i); windows being regions from which nodes reflecting diversity are created. Both these parameters, as well as that of the clustering threshold that is applied within windows, can be altered using sliders located within the control area, thus allowing the user to visually explore the variation present. Once again there is a reset button to reset all network parameters to default values. Window locations are highlighted in orange along the bottom of the interface.
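The relationship between the starting site, the number of windows and their width can be made concrete with a short Python sketch. The defaults of 10 windows of width 50 are those stated later in the paper; the 1-based inclusive co-ordinates and the truncation of windows at the end of the alignment are assumptions of this example, since the paper does not describe how CView treats windows that run past the final site.

```python
def window_bounds(start_site, n_windows=10, width=50, alignment_length=None):
    """Return 1-based inclusive (start, end) pairs for consecutive windows.

    Windows are non-overlapping and begin at start_site. If alignment_length
    is given, windows are clipped to it and windows starting beyond it are
    dropped (an assumption, not documented CView behaviour).
    """
    bounds = []
    for i in range(n_windows):
        lo = start_site + i * width
        hi = lo + width - 1
        if alignment_length is not None:
            if lo > alignment_length:
                break
            hi = min(hi, alignment_length)
        bounds.append((lo, hi))
    return bounds


print(window_bounds(1, n_windows=3, width=50))
print(window_bounds(101, n_windows=3, width=50, alignment_length=180))
```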

Fig 2

Network construction.

Coloured bars indicate unique sequence ids relative to the corresponding sequences (dotted lines). Within each window the identifier representing each full sequence is associated with the individual sequence fragment spanning that window (i), and fragments within windows are clustered (ii). Edges are placed between neighbouring clusters where they share one or more sequence identifiers, i.e. differing regions of the same sequence (iii). Clusters are represented visually on the network by grey dots. If a single cluster is selected, the paths of all sequences passing through it in relation to all other clusters (red lines) can be traced (iv).

(2.1) Nodes

For a given window, clustering the fragments of sequences that span that window creates nodes (Fig 2, step ii). Each cluster is created using an iterative approach. Initially a fragment is randomly selected to be a seed for a newly created empty cluster. All fragments related to that seed are then added to the cluster and become seeds for the next iteration. The metric used to define relatedness is Hamming distance, i.e. the number of differing characters between two aligned sequence fragments is counted. The default threshold value is a lenient 0.3, indicating that fragments that have less than 30% divergence from a seed are included within the cluster. More advanced measures of genetic distance exist that account for proposed models of sequence evolution at both nucleotide and amino acid levels [40, 41], but for the rapid clustering across windows placed along an alignment for the purpose of visualization Hamming distance works well [42]. Iterations continue until no more next-round seeds can be identified. If unclustered fragments within the window still exist, a new cluster is initiated by selecting another random seed from the remaining fragments and the process is repeated. For windows where six or fewer clusters are created, all clusters are displayed as grey circles, or nodes, on the network.
For windows with more than six clusters, the largest six are displayed as grey circles, whilst sequences from any remaining smaller clusters are placed into a holding structure that is used for visualization purposes and that is displayed as a black circle. This holding structure is treated in the same way as any other node when it comes to tracking sequences that are within it, i.e. if it is clicked then the paths of all sequences passing through it will be highlighted across the rest of the network. Here six was chosen to be the upper display limit so that following edge placement (next section), and associated edge crossover minimization, the maximum number of nodes that need order re-arrangement within any one window is seven, including the holding node (if present). This is because for a set containing n items there are n factorial different order permutations [43], and during edge crossover minimization the number of edge crossovers produced by each permutation, relative to nodes within a neighbouring window, must be counted. For a given window, if the maximum of seven nodes is present, 7! permutations (5040) must be evaluated during crossover minimization and this can be done in a reasonable time (< 1 second on an average laptop). If, on the other hand, fifteen nodes were allowed within a window, there would be 15! permutations (1307674368000), requiring many days. We felt that six clusters, plus the additional holding node, was a sufficient number to graphically indicate the main variant groups present whilst minimizing edge crossover calculation time; the latter allows parameter updates to be maintained in real time for the user as they scroll through the alignment. However, this limitation is for interface visualization purposes only and a higher resolution of all clusters, defined by up to a 1% divergence threshold, is possible using the “All Sequences” -> “Cluster Sequences” option.
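The node construction described in this section can be approximated by the Python sketch below. It is a minimal reading of the prose (random seed, iterative expansion by proportional Hamming distance below the threshold, largest six clusters shown, the rest pooled into a holding node) and not the CView source code; the function names, the dictionary input and the seeded random generator are assumptions.

```python
import random


def hamming_fraction(a, b):
    """Proportion of differing characters between two equal-length fragments."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_fragments(fragments, threshold=0.3, rng=None):
    """Cluster window fragments by iterative seed expansion.

    fragments: dict mapping sequence id -> fragment spanning one window.
    Returns a list of sets of sequence ids.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    remaining = set(fragments)
    clusters = []
    while remaining:
        seed = rng.choice(sorted(remaining))
        remaining.discard(seed)
        cluster, seeds = {seed}, [seed]
        while seeds:  # stop when no next-round seeds are found
            next_seeds = []
            for s in seeds:
                related = [r for r in remaining
                           if hamming_fraction(fragments[s], fragments[r]) < threshold]
                for r in related:
                    remaining.discard(r)
                    cluster.add(r)
                    next_seeds.append(r)
            seeds = next_seeds
        clusters.append(cluster)
    return clusters


def display_nodes(clusters, max_nodes=6):
    """Split clusters into the largest max_nodes and a pooled holding node."""
    ordered = sorted(clusters, key=len, reverse=True)
    shown, rest = ordered[:max_nodes], ordered[max_nodes:]
    holding = set().union(*rest) if rest else set()
    return shown, holding


toy = {"s1": "AAAAAAAAAA", "s2": "AAAAAAAATT", "s3": "AAAAAATTTT", "s4": "TTTTTTTTTT"}
print(cluster_fragments(toy))
```

In the toy data, s1 and s3 differ at 40% of sites yet end up in the same cluster because both are within 30% of s2, which illustrates that seeds recruited in one round can pull in fragments that the original seed would not.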
The default number (10) and width (50 bp) of windows, as well as the pairwise distance threshold, can be altered using the slider bars within the “Navigation and Control” area. The number of sequences passing through each node is indicated in green on the left and right hand sides of the network, as well as within the hint box of the control panel area if a particular node is clicked on.

(2.2) Edges

Edges are placed between nodes of neighbouring windows where they possess fragments that are derived from the same underlying sequence(s) (Fig 2, iii). Consequently, individual sequences can be traced through nodes across different windows. As indicated in the previous section, edge crossover between nodes within adjacent windows is minimized. Starting at the second window from the right-hand side, this is done by calculating all possible node order permutations, following which, for each permutation, the number of edge crossovers to nodes within the adjacent right-hand window is counted, the node layout order in the latter being kept constant (Fig 3). The permutations that produce the minimum number of crossovers are selected (Fig 3, red numbers), and from these a random one is used. The process is then repeated one window to the left, until the first window of the alignment is reached. Crossover minimization, whilst resulting in a visually more pleasing network, has no effect on the sequence information or underlying node connections. Following the connection of edges it is possible to click on nodes within the graph and track the sequences that pass through them (Fig 2, iv). On the interface, and as described in detail within the “Sequence View” section, such sequence paths are displayed in red, and a red dot is placed next to the titles of the sequences within the clicked on node.
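Edge placement and brute-force crossover minimization between one pair of adjacent windows can be sketched as follows in Python. This is an illustrative reading of the procedure, with hypothetical function names and with nodes represented as sets of sequence ids; it is not the CView implementation. The factorial counts quoted in the previous section (7! = 5040 and 15! = 1307674368000) are exactly the number of orderings this exhaustive loop would visit.

```python
from itertools import permutations
from math import factorial


def edges_between(left_nodes, right_nodes):
    """Index pairs (i, j) where left node i and right node j share a sequence id."""
    return [(i, j)
            for i, left in enumerate(left_nodes)
            for j, right in enumerate(right_nodes)
            if left & right]


def count_crossings(order, edges):
    """Count crossing edge pairs when left nodes are drawn in the given order.

    order[k] is the index of the left node drawn at position k; the right
    window keeps its existing order.
    """
    pos = {node: k for k, node in enumerate(order)}
    placed = [(pos[i], j) for i, j in edges]
    return sum(
        1
        for a in range(len(placed))
        for b in range(a + 1, len(placed))
        if (placed[a][0] - placed[b][0]) * (placed[a][1] - placed[b][1]) < 0
    )


def best_orders(n_left, edges):
    """Return (minimum crossings, all left orderings that achieve it)."""
    best, best_count = [], None
    for order in permutations(range(n_left)):
        c = count_crossings(order, edges)
        if best_count is None or c < best_count:
            best, best_count = [order], c
        elif c == best_count:
            best.append(order)
    return best_count, best


left = [{"a"}, {"b"}]
right = [{"b"}, {"a"}]
print(best_orders(len(left), edges_between(left, right)))
print(factorial(7), factorial(15))
```

A random member of the returned list would then be chosen as the layout, and the same step repeated for the next window to the left with the newly fixed order playing the role of the right-hand window.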

Fig 3

Minimization of edge crossovers between nodes of the two right most windows of the alignment.

This process is repeated until the left most window (anchored on site 1) is reached. Clusters within the two windows are labelled with integers and required edges, based on the sequence ids (coloured bars), are listed (i). All order permutations of the current left window are identified and for each permutation the required edges are placed relative to the constant cluster order of the right window (ii). Crossovers are then counted (red numbers). Of the permutations that produce the minimum number of crossovers a random one is selected for graphical node layout order.

(3) Navigation and control

Access to all the network and display parameters is provided through the slider bars associated with the navigation and control panel (Fig 1), and in each case reset-to-default buttons have been provided. The buttons labelled with the red directional arrows are used to scroll through the alignment. These were implemented to remove the need for flat scroll bars, as future developments will be aimed more at tablets and mobile devices. The red dot, at the centre of the four scroll arrows, immediately jumps to a viewpoint at the centre of the alignment. In addition to the directional arrows, the user can click directly on the grey squares along the alignment location indicator bar to immediately move to a particular location. Within this control area there are also buttons for printing the network to a .png formatted file and for clearing highlighted paths, as well as a hint text section. The latter displays hint text from other areas of the display that the user clicks on, such as sequences, nodes, location bars and general areas.

(4) Menu driven output

Output options are accessed through the top-level menu bar and can be applied to (i) all sequences within the alignment, (ii) a subset of sequences whose titles match a user search criterion, (iii) a subset of sequences that pass through a selected node and (iv) a subset of sequences defined by the user based on a supplied file of titles. Interacting with sequences through these predefined options increases output data robustness, as opposed to allowing the ad hoc inclusion of sequences, through mouse selection, into the subsets that the various sub-menu options are applied to. In addition to exporting subsets of sequences and/or specified regions of the alignment, CView can generate summary statistics such as frequencies of residues and kmers, as well as tertiary information such as pairwise distance matrices, variant count information and clustered sets of sequences. A description of sub-menu options is available on the wiki associated with the SourceForge project page, within the software itself under the menu option “Help” -> “Menu” (also accessible via the help button within the “navigation and control” area) and is outlined within the demonstration video [35].
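Two of the summary statistics mentioned above, per-site residue frequencies and kmer frequencies, are simple enough to sketch in Python. The code below only illustrates what such outputs contain; the function names are hypothetical, and whether CView strips gap characters before counting kmers is not stated in the paper, so the `skip_gaps` switch is an assumption.

```python
from collections import Counter


def residue_frequencies(seqs, start, end):
    """Per-site residue counts for columns start..end (1-based, inclusive)."""
    return {site: Counter(s[site - 1] for s in seqs)
            for site in range(start, end + 1)}


def kmer_frequencies(seqs, k, skip_gaps=True):
    """Count all overlapping kmers across the sequences.

    skip_gaps removes '-' before counting (an assumption of this sketch).
    """
    counts = Counter()
    for s in seqs:
        if skip_gaps:
            s = s.replace("-", "")
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts


seqs = ["AC-GT", "ACGGT"]
print(residue_frequencies(seqs, 3, 4))
print(kmer_frequencies(seqs, 2))
```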

Results

(1) The software

CView has been implemented in Java and runs on operating systems with an installed Java Runtime Environment 8.0 or higher. It has been developed using an object-orientated approach for ease of plug-in development; plug-ins related to alignment visualization will be based on user feedback and placed under the “Plug-ins” menu. To obtain an executable jar file, download the cview.zip file from the SourceForge project page. Following the extraction of the CView.jar file from the zip file, CView is executed by double clicking on the jar file. This will launch the interface through which alignments can be loaded. Alignments must be in fasta format, where a ‘-’ character usually represents a gap (occasionally a ‘.’ can be used), whilst an ‘N’ character represents an ambiguously sequenced residue. Gap characters are inserted by the alignment software, e.g. MUSCLE [1] and Clustal W [2], whilst ambiguous characters are added during the sequencing process, usually in accordance with the IUPAC codes [44, 45]. Importantly, following the multiple alignment process (not performed by CView), sequences are all of the same length, and this is a requirement of the input of CView. Aligned fasta formatted sequences are loaded using the “Load Fasta (Alignment)” option of the “All Sequences” menu. Future developments will aim to include other more data rich alignment formats, such as the Stockholm format developed by the Pfam [46] and Rfam [47] consortiums, but fasta was selected for the initial release version as it is widely adopted and allowed the development to focus on the display of aligned residues, the associated network and associated export features, rather than on the interactions involving metadata such as secondary structure, surface accessibility, intron information and at times phylogenetic relationships.
We view the latter as being more suitable for future plug-in development designed to target specific data analysis pipelines that may be of less widespread interest. Once a fasta-formatted alignment has been loaded, the workflow is driven by how the user interacts with the interface and the various output options. A test dataset, in the form of an alignment consisting of 636 sequences representing the gp120 region of the HIV-1 genome, is included with the cview.zip download file. This data was obtained from the Los Alamos HIV sequence database [37] and is intended for initial testing of the software; it is not the more complex alignment that was prepared during the test case example that follows.
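The input requirement stated above (aligned fasta, all sequences of equal length) can be checked before loading with a small Python sketch such as the one below. It is a generic fasta reader written for illustration, not the parser used inside CView; the function name and the decision to reject duplicate titles are assumptions.

```python
def read_aligned_fasta(lines):
    """Parse fasta lines into {title: sequence} and require equal lengths.

    lines: any iterable of strings, e.g. an open file handle.
    Raises ValueError for sequence data before the first title, duplicate
    titles (an assumption of this sketch) or sequences of unequal length.
    """
    records, title = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:]
            if title in records:
                raise ValueError("duplicate title: " + title)
            records[title] = ""
        elif title is None:
            raise ValueError("sequence data found before the first title")
        else:
            records[title] += line  # sequences may wrap over several lines
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("sequences are not all the same length; align them first")
    return records


example = [">seq1", "AC-", "GT", ">seq2", "ACGGT"]
print(read_aligned_fasta(example))
```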

(2) Test case example: Exploring variation associated with co-receptor usage

(2.1) Background

HIV-1 viruses can be characterized into two phenotypes that are dependent on cellular tropism and that are a result of differences in co-receptor usage [48]. The macrophage tropic phenotype, often referred to as R5, requires the CCR5 co-receptor, whilst the T-cell tropic phenotype (X4) uses the CXCR4 co-receptor, the latter often emerging later on during infection [49]. Co-receptor usage can be detected by computational analysis based on specific genetic alterations within the V3 loop of the gp120 gene [29, 30]. Genetic variation within this region, of approximately 105 nt in length, leads to structural shifts that result in optimized binding to one co-receptor or the other [50]. To demonstrate the applicability of CView in aiding the characterization of sequence diversity, we have prepared a number of easy-to-follow steps involving the preparation and alignment of HIV-1 subtype B gp120 sequences representing each of the phenotypes described. Towards the end of these steps the V3 region of the alignment is extracted and per-site sequence variation between the two phenotypes is summarized. In this test case scenario the exact co-ordinates of the V3 loop are known prior to analysis, as are the phenotypes of each of the sequences present. However, this will often not be the case and a more exploratory approach could be adopted, as indicated following the steps below.

All North American subtype B gp120 sequences, verified to be CCR5-using sequences (n = 636), were downloaded from the Los Alamos HIV sequence database in aligned fasta format [37]. These were loaded into CView, using the “All Sequences” -> “Load Fasta (Alignment)” menu option, following which they were saved in unaligned format using the “All Sequences” -> “Save Unaligned” option. Additionally, the titles of these sequences were saved to a separate file using “All Sequences” -> “Save Titles”.

Step 1 was repeated for CXCR4-using sequences (n = 76).
In order to make sites directly comparable between the two sets of unaligned sequences, they were combined into a single file (by simply copying and pasting using a text editor) and aligned using MUSCLE [1]. Note: steps 1 to 3 were performed in order to ensure the quality of the final alignment. An alignment containing both CCR5- and CXCR4-using sequences could have been downloaded directly from the database, but this would have been extracted from a larger alignment containing many more sequences and so may not have been optimized.

The alignment created by MUSCLE, available at [51], was loaded into CView and the consensus sequence of the region spanning the V3 loop was saved using the “All Sequences” -> “Consensus Sequence” option. Within this alignment the co-ordinates of the region spanning the V3 loop were from 1436 to 1568. Although the exact location of the V3 loop within the gp120 region is known relative to the HXB2-LAI-IIIB-BRU reference strain (accession: K03455), the co-ordinates will vary depending on the alignment due to the placement of gap characters (‘-’) introduced during the alignment process. The exact co-ordinates of the V3 region within our alignment were identified by eye using the V3 sequence of the HIV-1 reference strain, where the start residues of the loop are TGTACAAGACCC and the end residues are CAAGCACATTGT [37].

The portion of the alignment corresponding to R5 sequences was saved to a separate file in aligned format, and labelled accordingly with R5 in the title. This was done using the “User Group” -> “Save Region” menu option, where the titles used to define the R5 group of sequences to be saved were those obtained during step 1. This option was used because, in the case of this data, the sequence titles were not labelled with any motif/tag that could have been used to identify them using the “Search Titles” top-level menu option.
Step 5 was repeated for the titles corresponding to the X4 sequences (obtained during step 2) and the output file was labelled accordingly. Steps 5 and 6 resulted in two sub-alignments, contained within separate files, where, although they represented different phenotypes (R5 and X4), the per-site co-ordinates were compatible as they represented sequences from a common underlying alignment.

Each of these two files was loaded into CView, and nucleotide frequencies spanning the V3 loop were obtained using the “All Sequences” -> “Residue Frequencies” menu option. The co-ordinates entered for the V3 region within the subsequent popup box were those used in step 3. Once again each output file was labelled accordingly; for example, the frequencies obtained from the R5 alignment were labelled R5_frequencies.txt.

For each of the extracted R5 and X4 alignments obtained in steps 5 and 6, CView was also used to output a list of variants spanning the V3 loop along with their frequency of occurrence. This was done by loading each sub-alignment into CView and applying the “Variant Frequencies” option of the “All Sequences” menu, using the co-ordinates specified above. Each of the two output files containing variants was then translated using the EMBOSS Transeq tool [52].
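In the steps above the V3 co-ordinates were found by eye. Because gap characters shift positions, the same mapping from an ungapped motif to alignment columns could be scripted; the Python sketch below shows one way, under the assumption that the motif occurs exactly (no mismatches) in the chosen sequence. The function name and the toy sequence are hypothetical, while the two motifs are the start and end residues quoted in the text.

```python
def alignment_columns(gapped_seq, motif, gap_chars="-."):
    """Return 1-based (first, last) alignment columns covered by motif.

    The motif is searched for in the sequence with gaps removed, and the hit
    is mapped back to alignment columns. Returns None if there is no exact match.
    """
    columns = [i + 1 for i, c in enumerate(gapped_seq) if c not in gap_chars]
    ungapped = "".join(c for c in gapped_seq if c not in gap_chars)
    hit = ungapped.find(motif)
    if hit == -1:
        return None
    return columns[hit], columns[hit + len(motif) - 1]


# Toy gapped sequence (not real data) containing both V3 boundary motifs.
toy = "GG" + "TGTACA-AGACCC" + "AAA" + "CAAGCA--CATTGT" + "CC"
start = alignment_columns(toy, "TGTACAAGACCC")
end = alignment_columns(toy, "CAAGCACATTGT")
print("region:", start[0], "to", end[1])
```

The first column of the start motif and the last column of the end motif together give the region that would be entered into the co-ordinate popup boxes.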

(2.2) Method

The underlying alignment described for this use-case scenario, consisting of all the HIV-1 subtype B sequences spanning the gp120 region of the genome that have been verified as being either CCR5-using (n = 636) or CXCR4-using (n = 76), is available from the zenodo repository [51]. Within this alignment the exact co-ordinates of the region of the gene that harboured the diversity associated with either the X4 or the R5 phenotypes were known prior to the analysis. However, for many alignments a more exploratory approach may be preferable. One way that this can be achieved is by toggling the number of windows, their size and the clustering threshold used in order to obtain a more global view of the diversity present, and by subsequently using the “Node Path” top-level menu option to extract portions of sequences, and/or information from sequences such as kmer frequencies, that pass through nodes on the network that are of visual interest.

(2.3) Result and discussion

Fig 4A displays the consensus sequence of the V3 region from the MUSCLE generated alignment prior to being divided by phenotype. The top ten most frequent variants from each of the two phenotypes are also displayed. The seqPublish tool [37], located at https://www.hiv.lanl.gov/content/sequence/SeqPublish/seqpublish.html, was used to format these alignments from the CView output such that characters identical to those of the consensus sequence were hidden. A similar text-based formatting is available at the bottom of the output file that is generated by the “Variant Frequency” option of CView, where residues that are identical to those present in the most frequent variant are represented by a “|” character. Here we used the seqPublish tool as variants from both phenotypes were being compared to the combined consensus and not to the most frequent variant from a single alignment representing a single phenotype. The translation of each of these variants is presented within Fig 4B. Fig 4C shows a summary of the translations, obtained using sequence logos [53], and it can be observed that at site 11 the positively charged amino acid residues R (arginine) and H (histidine) are present within the sequences that were known to be CXCR4-using, whilst they are absent within the sequences obtained from the CCR5-using strains. At site 26 a similar observation is made in relation to positively charged residues, this time including a K (lysine) residue, although there is a minority K also present at site 26 within the CCR5-using variants. This is a known observation, where the presence of positively charged amino acids at sites 11 and 26 results in a structural alteration that optimizes CXCR4 co-receptor binding [29, 30]. The defined steps outlined in the use-case scenario presented here, which led to the summarization of the variation present within the V3 loop, partially demonstrate the utility of CView when exploring aligned sequence data.
The correctness of the summarization is confirmed by the presence of the known variation seen at sites 11 and 26. Complete per-site nucleotide frequencies for both R5 and X4 sequences spanning the V3 region are presented in supplementary S1 Table.
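The “|”-based formatting described above can be illustrated with a short sketch. It is a hedged reconstruction of the idea only: the function name, the return structure and the handling of ties are assumptions, and the actual layout of the CView “Variant Frequency” output file may differ.

```python
from collections import Counter


def variant_table(fragments, top=10):
    """Rank variants by frequency and mask residues shared with the top variant.

    `fragments` is a list of equal-length strings (the same alignment region
    taken from every sequence). The most frequent variant is shown in full;
    in every other variant, residues identical to it are replaced by '|'.
    """
    ranked = Counter(fragments).most_common(top)
    reference = ranked[0][0]
    rows = []
    for variant, count in ranked:
        if variant == reference:
            shown = variant
        else:
            shown = "".join(
                "|" if a == b else a for a, b in zip(variant, reference)
            )
        rows.append((shown, count))
    return rows


if __name__ == "__main__":
    # Toy fragments, not sequences from the V3 alignment.
    toy = ["ACGT"] * 3 + ["ACTT"] * 2 + ["GCGT"]
    for shown, count in variant_table(toy):
        print(shown, count)
```

Masking against the most frequent variant, rather than against a consensus, matches the distinction drawn in the text between the CView output and the seqPublish formatting.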

Fig 4

Summarization of variation present within the V3 loop.

(A) Green residues represent non-consensus residues from the ten most frequent variants associated with the CCR5-using phenotype. Brown represents those of the CXCR4-using phenotype. The consensus sequence (black) is shown. (B) Translations of the ten most frequent variants from each phenotype. (C) Sequence logos summarizing these translations. The top logo represents the CCR5-using sequences whilst the bottom represents the CXCR4-using ones.

In the absence of pre-defined phenotypes, site co-ordinates and subtype, a more exploratory approach could be preferred. Fig 5 displays the network diversity observed across the entire gp120 alignment, where the number and size of the windows have been maximized (to 50 and 75 respectively) and the clustering threshold set to 0.20. The user can click on individual nodes within the network to identify the subsets of sequences passing through them, and can then export, for those subsets, the range of information types implemented within CView for subsequent analysis.
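To make the roles of the window number, window size and clustering threshold concrete, the following is a minimal sketch of a window-based network. It assumes a simple greedy clustering on the proportion of differing sites, which is a stand-in chosen for illustration and not necessarily the clustering method used by CView; all function names are hypothetical.

```python
def hamming_fraction(a, b):
    """Proportion of positions at which two equal-length fragments differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_window(fragments, threshold):
    """Greedy clustering (an assumed stand-in): a fragment joins the first
    cluster whose representative is within `threshold`, else starts a new one."""
    clusters = []  # list of (representative fragment, set of sequence ids)
    for seq_id, frag in fragments.items():
        for rep, members in clusters:
            if hamming_fraction(frag, rep) <= threshold:
                members.add(seq_id)
                break
        else:
            clusters.append((frag, {seq_id}))
    return [members for _, members in clusters]


def build_network(alignment, window_size, n_windows, threshold):
    """Nodes are clusters within consecutive non-overlapping windows; edges join
    nodes of neighbouring windows that share at least one sequence id."""
    length = len(next(iter(alignment.values())))
    n_windows = min(n_windows, length // window_size)  # keep windows in range
    nodes = {}
    for w in range(n_windows):
        start = w * window_size
        fragments = {
            sid: seq[start:start + window_size] for sid, seq in alignment.items()
        }
        for c, members in enumerate(cluster_window(fragments, threshold)):
            nodes[(w, c)] = members
    edges = {}
    for (w, c), members in nodes.items():
        for (w2, c2), other in nodes.items():
            shared = members & other
            if w2 == w + 1 and shared:
                edges[((w, c), (w2, c2))] = len(shared)  # weight: shared ids
    return nodes, edges


if __name__ == "__main__":
    # Toy alignment, not the gp120 case-study data.
    toy = {"s1": "AAAACCCC", "s2": "AAAAGGGG", "s3": "TTTTCCCC"}
    nodes, edges = build_network(toy, window_size=4, n_windows=2, threshold=0.2)
    print(nodes)
    print(edges)
```

In this sketch a lower threshold splits each window into more, tighter clusters, while a higher threshold merges them, which is the trade-off a user explores when toggling the network parameters.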

Fig 5

Network based diversity across the full gp120 case-study alignment: Thirty non-overlapping windows of length 75 nt fitted along the length of the alignment.

The clustering threshold within each was 0.2. The orange numbers along the top indicate window locations, whilst the green numbers on each side indicate the number of sequences observed within nodes (clusters) at the ends of the network. Grey dots are clickable nodes that, when clicked, highlight the paths (in red) of the sequences passing through them across the remainder of the network. Here the highlighted paths are the result of clicking the bottom-leftmost node (containing 33 sequences). Information from sequences highlighted in this manner can be extracted using the sub-options of the “Node Path” top-level menu within the software, as indicated within the demonstration video [35] and under the software help menus. This network was printed using the “Print PNG” button located within the control panel of the software.
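The path highlighting described in this legend amounts to finding every node that shares sequences with the clicked node. A minimal sketch of that idea follows; the node representation (a mapping from a hypothetical `(window, cluster)` key to a set of sequence identifiers) and the function name are assumptions for illustration, not CView's internal data structures.

```python
def trace_paths(nodes, selected):
    """Return the nodes visited by the sequences that pass through `selected`.

    `nodes` maps a (window, cluster) key to the set of sequence ids in that
    cluster. The result maps each visited node to the ids it shares with the
    selected node, i.e. what would be highlighted on the network.
    """
    members = nodes[selected]
    highlighted = {}
    for node, seq_ids in nodes.items():
        shared = seq_ids & members
        if shared:
            highlighted[node] = shared
    return highlighted


if __name__ == "__main__":
    # Toy network with two windows and two clusters per window.
    toy_nodes = {
        (0, 0): {"s1", "s2"},
        (0, 1): {"s3"},
        (1, 0): {"s1", "s3"},
        (1, 1): {"s2"},
    }
    print(trace_paths(toy_nodes, (0, 0)))
```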

Conclusion

CView is a tool that allows the user to interactively explore sequence alignments with the aid of a dynamic network that summarizes the diversity present within regions of the alignment not currently in-view. Here we have described how CView was designed and implemented as well as summarized the various output features available to the user. As a use-case example, we have characterized the variation within sequences involved in HIV-1 co-receptor usage, but we have also hinted at how CView can be used in a more exploratory manner to explore aligned sequence data. The exact usage scenario in which CView can be applied is dependent on the requirements, insight and background of the individual user.

Nucleotide frequencies from the V3 loop.

Sites covering codons 11 and 26 are highlighted in red. (DOCX)

21 Mar 2022

PONE-D-21-33624

CView: A network based tool for enhanced alignment visualization

PLOS ONE

Dear Dr. Archer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. In particular, please take into account the comments related to FASTA pre-processing, visualisation tips and configuration parameters.

Please submit your revised manuscript by May 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Eduardo Andrés-León Academic Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. 2. 
We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: N/A Reviewer #2: N/A ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript describes CView, a new software for visualising multiple sequence alignments using a graph-based approach. The text is easy to follow and I was able to install and run the software. However, I am not sure I fully understand how to use CView to its full potential, and although I agree with the authors that a new software such as CView could be useful to understand complex alignments, without extensive documentation and tutorials the users are unlikely to adopt the tool for their work. Major comments I struggled with the interface and found that the software requires additional context-specific documentation in the user interface, such as tooltips or popups. For example, the manuscript explains that a black circle is a holding structure that is used for visualisation purposes only when there are more than six clusters in the window. However, the user interface does not seem to explain why some nodes look like black circles and what is the meaning behind it. There are other examples along the same lines, for example, the red circles or the significance of colours in the network chart. These should be better explained in the program itself so that the users do not have to constantly refer to the paper to use CView. Have the authors considered providing a video walkthrough or a tutorial for CView? 
As this is a new concept, it would be very useful to see an expert use the software, as I am not sure I am able to take full advantage of it having read the paper and the help section of the website. I may have missed it, but as far as I can tell the use case does not take advantage of the network view, which I found surprising given that the graph aspect is the key strength of the method. Minor comments It would be useful to support importing alignments in Stockholm format to enable direct import from major alignment databases like Pfam and Rfam. Would it be possible to add an option to restore defaults for all parameters? After tinkering with the sliders I quickly lost track of what the recommended values were. Have the authors considered performing user testing to see how novice users or those experienced with other software (for example Jalview) interact with CView? Reviewer #2: This article provides CView, a software for visualization of multiple sequence alignments (DNA/RNA and amino-acids). This software application favors small sequences, such as viral genomes or proteomes. The novelty is centered on the network analysis and additional components that interactively allow an easier data summarization. The tool provides variant summarization, per-site character and kmer frequency matrixes, clustered sequences, pairwise-distance matrixes, and consensus sequence generation. The visualization is divided into four panels, the sequence view, network view, navigation and control, and menu-driven options. Generally, the article is well written. Regarding the software, I have successfully "installed" and tested it on a Linux machine using the HIV viral sequences example. Then, I went to the NCBI and downloaded the B19V sequences in FASTA format. When I loaded these sequences into the CView, the following error emerged: "Exception occurred in newAlignment()". 
Then, I looked into the sequences and saw that the example had "-" instead of "N" symbols (which exist in some FASTA files but do not in the majority of the FASTA files). I changed the "N" to "-" using a Linux tr command and got the same error. Finally, I've tried it with other sequences and got the same error. Therefore, further than this action, I've not been able to test the software. This error is an obvious problem that must be fixed before any deeper review. Minor subjects: Once the bugs are fixed, please consider adding this software to bioconda and biootools; Please, state the license type in the manuscript; It would be nice to select some of the sequences by "mouse selection" to perform further actions; Please, consider stating the existence of alignment-free methodologies (for example, to visual applications with larger viral genomes, such as Herpesvirus) and Galaxy; Please, state clearly in the main manuscript (perhaps in the abstract) if this tool is open-source or not. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. 
PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

5 May 2022

The text here is the same as that within the uploaded response_to_reviewers.docx file. We have included the text here as this box was compulsory on the submission form.

Reviewer #1

Summary

"The manuscript describes CView, a new software for visualising multiple sequence alignments using a graph-based approach. The text is easy to follow and I was able to install and run the software. However, I am not sure I fully understand how to use CView to its full potential, and although I agree with the authors that a new software such as CView could be useful to understand complex alignments, without extensive documentation and tutorials the users are unlikely to adopt the tool for their work."

We thank this reviewer for downloading, running and reviewing our software and manuscript. Additionally, we are pleased that the reviewer sees that new alignment visualization tools could be of value in understanding sequence data. We agree that further documentation is required for users to widely adopt our tool and as such we have worked extensively on this and the other recommendations as outlined below.

Major comments

"I struggled with the interface and found that the software requires additional context-specific documentation in the user interface, such as tooltips or popups. For example, the manuscript explains that a black circle is a holding structure that is used for visualisation purposes only when there are more than six clusters in the window. 
However, the user interface does not seem to explain why some nodes look like black circles and what is the meaning behind it. There are other examples along the same lines, for example, the red circles or the significance of colours in the network chart. These should be better explained in the program itself so that the users do not have to constantly refer to the paper to use CView."

We have added three in-software documentation sections under the “Help” top-level menu that can be accessed at any time during usage of the tool. These are: (1) “Help” -> “Interface”, which contains brief graphical overviews of the software interface, the sequence area and the network area; (2) “Help” -> “Menu”, which contains textual descriptions of the methods that can be accessed via the other top-level menu options; and (3) “Help” -> “Plug-ins”, which will contain information on the plug-ins that will be developed; so far there is one example titled “Variant Scan”. These in-software help menu options have been mentioned within the manuscript at lines 164, 208 and 391, as well as within the legend of figure 5 at line 597. Access to the first two help popup windows can also be achieved through the two new buttons within the navigation region of the control area (bottom left of display), as has been mentioned on lines 209 and 392. Under the image associated with the network area (“Help” -> “Interface”), black circles are labelled as containing sequences that are not within one of the largest six clusters (if more than six exist). The reason for limiting the display to the six largest clusters within any given window is to reduce the time required to minimize edge crossovers during network visualization; this is described in detail within the manuscript (around line 299). We have also included a new figure 5 (line 597), describing a global view of the alignment used for the case study scenario, where these black circles are also mentioned. 
Within the in-software interface help image, under “Help” -> “Interface”, it has been made clear that the red circles next to sequence titles belong to sequences that pass through a user-clicked node, whose paths are also shown in red on the network. This is discussed in detail within the manuscript at line 238. The previous grey circles that were next to the sequence titles, if the red ones were not present, have been removed as they were likely to be adding confusion to the interface, and were not adding real information value. They were initially used as placeholders indicating that the individual sequence was not passing through a user-clicked node. The interface has been modified so that when an area is clicked on, a hint is displayed within a newly added hint box at the top of the control area of the display. For example, if the user clicks on the ‘+’ symbols under the consensus sequence then these are explained as “'+' indicates residue conserved across all sequences”, whilst if a node is clicked on, the location of the node and the number of sequences passing through it are given. If a black placeholder node (black circle) is present and clicked on, the hint text provided is “sequences not in top six clusters” and the total number of sequences within the holder is given. This hint text addition is mentioned at lines 210 and 327 and is also visible within figure 1 and under the “Help” -> “Interface” in-software help image. Note: with the exception of standard sliders and buttons, we chose to display hint text once an item is “clicked on”, rather than “hovered on”. This is because this feature was implemented in a bespoke manner for our general display area and hover text would mean that the software must constantly monitor the cursor location when it is being moved. With a click, this constant background monitoring, which may at times affect the speed of updating the display, is avoided. 
"Have the authors considered providing a video walkthrough or a tutorial for CView? As this is a new concept, it would be very useful to see an expert use the software, as I am not sure I am able to take full advantage of it having read the paper and the help section of the website." We have created a walk through video of the basic usage of CView. This can be accessed through the citable zenodo repository link (https://zenodo.org/record/6514787), along with a text based outline of the script used within the video. A reference to this video tutorial has been provided at lines 164, 393 and 599, and is also hyperlinked from the CView SourceForge project page wiki (https://sourceforge.net/p/cview/wiki/Home/). Also included at line 165 as a separate zenodo reference is a small PDF based tutorial in relation to identifying variants using CView. Additional tutorials will follow as features are added to the software and as user scenarios are identified. "I may have missed it, but as far as I can tell the use case does not take advantage of the network view, which I found surprising given that the graph aspect is the key strength of the method." Choosing a test case scenario was difficult, as our aim was to develop a “basic”, or “lite”, alignment viewer that adds insight into the diversity observed within residues on-screen by looking at the context of regions off-screen. Additionally, we wanted to include an array of straightforward output features that other tools sometimes ignore; such as that of kmer information, variant counts and hamming distances. Our goal was not to come up with complex scenarios where CView excels over other viewers, but to attempt to enhance the basic experience of viewing and exploring sequence data. 
As it happens, if a user is looking at an alignment where at some location there are two distinct clusters of diversity, then being able to click on one of these and immediately identify and extract sequences, and information within, “could” be of interest. But we did not start off with this being a main goal of functionality. Our main goal was to keep the interface as simple as possible whilst augmenting intuitive insight and output, as implied throughout the introduction of the manuscript, e.g. line 128. Our current test case scenario, using HIV-1’s gp120 gene, comes from the angle where it is known that within the gene there is a small 105 nt region where residue alterations involving a few sites define which host co-receptor the virus can use. We felt that being able to explicitly indicate the co-ordinates of this region from a set of longer aligned gp120 sequences, and to extract information from sequences representing virus known to use either one co-receptor or the other, provided us with a way of demonstrating how the information that CView outputs, in this case variant frequencies and nucleotide frequencies, can be utilized to characterize the diversity present when following a number of pre-defined steps. The other scenario, hinted at by the reviewer, is where the user may not know much detail about the alignment, but by “playing with” the network parameters identifies clusters of similarity and then exports information from sequences passing through these for further analysis. This is of interest and is indeed one way that CView can be used to explore alignments, hence why the “Node Path” top-level menu was added, but it requires a more subjective toggling of network parameters, and the steps are harder to define for a step-by-step demonstration of the simple and intuitive viewing and export features. We would like the reviewer to consider this and understand our choice in maintaining our HIV-1 gp120 example. 
That being said, we have specifically emphasised at the start (line 454) and end of the case study (line 529) that the user could adopt a more exploratory approach. We have elaborated on this slightly within the paragraph starting at line 578 where a new figure 5 (line 588) is used to highlight the discussion. In relation to the HIV-1 case study, we have clarified some of the steps outlined (from line 458) and placed the alignment generated, along with the sequence titles, within the zenodo repository where it is associated with a citable DOI number (lines 477 and 522). Previously this data was available from the SourceForge project wiki, but we feel that it is more suited to a permanent citable repository and so it has been moved from the software download page. Test data is still available within the cview.zip download file, but it has been made clear that this is not the case-study data (line 432).

Minor comments

"It would be useful to support importing alignments in Stockholm format to enable direct import from major alignment databases like Pfam and Rfam."

We agree that increasing the number of input data formats is important, but initially chose fasta format as it is widely used when dealing with aligned sequences, and no additional visualization features were required as there are no specific metadata fields (outside of what information is contained within the title lines). The ability to display metadata fields that can be associated with more advanced sequence input formats has not been incorporated and such information would immediately move away from the level of visualization simplicity we currently aimed at. We definitely agree that implementing more advanced formats, such as the Stockholm format, would achieve a wider usage of our tool, but we see this as advanced development post release since the current interface would need to incorporate the ability to display such additional metadata appropriately. 
For example, in the Stockholm format, some of the extra data allowed for RNA sequences are fields describing Secondary Structure, Surface Accessibility, TransMembrane, Posterior probability, LIgand binding, Intron information as well as possible trees. Including such information within groups of aligned sequences whilst creating a network representing the alignment is a significant added complication to the interface, and we feel that this should be achieved through a specifically designed plug-in to accommodate users interested in such fields. A half-way point could be to simply ignore the extra metadata, effectively only using the titles and sequence residue information, but this would be misleading in terms of the initial functionality of our software, as it would not really be utilizing the richness of the Stockholm format, whilst at the same time portraying that it can load such data. Given that: (i) this is classified as a minor comment by the reviewer, (ii) our tool has been initially designed for the simple intuitive viewing of alignments and (iii) we do have a plug-in menu where we intend to develop more user-specific requirements, we would request that the reviewer overlook this limitation and see the current input option as a base for further, more advanced developments incorporating metadata-specific fields. In relation to the reviewer's comment we have extensively discussed the Stockholm format within the manuscript as well as the reason behind the current input limitation (starting at line 417).

"Would it be possible to add an option to restore defaults for all parameters? After tinkering with the sliders I quickly lost track of what the recommended values were."

Yes, we have added two buttons to the control area. One of these resets the network parameters to their defaults, whilst the other resets the general display parameters to their defaults. These buttons have been mentioned within the main manuscript at lines 195, 250, 265 and 363. 
These reset buttons are also visible in figure 1 and under the in-software interface help figure located under the “Help” -> “Interface” menu option. "Have the authors considered performing user testing to see how novice users or those experienced with other software (for example Jalview) interact with CView?" At the moment we have not, although we see this as a valuable approach. Currently during development we have utilized our personal backgrounds in sequence viewing to estimate what users could be interested in. Our main concern was to increase the intuition that can be gained when viewing an alignment. Once we have established our tool and its usage within a wider set of users, we would be pleased to perform user-base comparisons on how users interact with differing tools and include components of such interactions into our future developments of the interface. This would include, for example, users interested in specific areas of research such as metagenomics, viral genomics, phylogeny, recombination detection analysis and sequence classification as well as increase the output connectivity to external software. For example, networks could also be exported in a format that can be loaded into cytoscape, a tool that can be used to enhance the visualization of networks, add additional metadata to individual nodes and elaborate on the number of layout options. We see this as future tuning and user guided developments of our basic release interface once the initial concept has been established. Reviewer #2 Summary "This article provides CView, a software for visualization of multiple sequence alignments (DNA/RNA and amino-acids). This software application favors small sequences, such as viral genomes or proteomes. The novelty is centered on the network analysis and additional components that interactively allow an easier data summarization. 
The tool provides variant summarization, per-site character and kmer frequency matrixes, clustered sequences, pairwise-distance matrixes, and consensus sequence generation. The visualization is divided into four panels, the sequence view, network view, navigation and control, and menu-driven options." We thank this reviewer for taking the time to review our software and we have responded to the comments below. The comments provided helped us greatly to clarify our manuscript. Major comments "Generally, the article is well written. Regarding the software, I have successfully "installed" and tested it on a Linux machine using the HIV viral sequences example. Then, I went to the NCBI and downloaded the B19V sequences in FASTA format. When I loaded these sequences into the CView, the following error emerged: "Exception occurred in newAlignment()". Then, I looked into the sequences and saw that the example had "-" instead of "N" symbols (which exist in some FASTA files but do not in the majority of the FASTA files). I changed the "N" to "-" using a Linux tr command and got the same error. Finally, I've tried it with other sequences and got the same error. Therefore, further than this action, I've not been able to test the software. This error is an obvious problem that must be fixed before any deeper review." Usually within aligned fasta formatted sequences a ‘-’ character represents a gap (occasionally a ‘.’ can be used), whilst an ‘N’ character represents an ambiguously sequenced residue. Gap characters are inserted by the alignment software (such as MUSCLE), whilst ambiguous characters are added during the sequencing process, and usually in accordance to the IUPAC codes. Importantly, sequences aligned by a software, such as MUSCLE, are all of the same length and this is a requirement of the input of CView. 
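The input requirement stated above (pre-aligned fasta in which every sequence has the same length) can be checked before loading. The following is a minimal sketch of such a check; the function names and error messages are hypothetical and do not reproduce CView's own parsing or warning text.

```python
def read_aligned_fasta(text):
    """Parse fasta text into {title: sequence}, joining wrapped sequence lines.

    Duplicate titles overwrite earlier records in this simple sketch.
    """
    records = {}
    title = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:]
            records[title] = []
        elif title is None:
            raise ValueError("sequence data found before the first '>' title line")
        else:
            records[title].append(line)
    return {t: "".join(parts) for t, parts in records.items()}


def alignment_length(records):
    """Return the shared sequence length, or raise if the input is not aligned."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input must be pre-aligned")
    return lengths.pop()


if __name__ == "__main__":
    # Toy input: '-' is a gap added by the aligner, 'N' an ambiguous residue.
    toy = ">seq1\nAC-T\nGN\n>seq2\nACGT\nGA\n"
    print(alignment_length(read_aligned_fasta(toy)))
```

Note that equal length is a necessary condition only: unaligned sequences that happen to share a length would pass this check, which is the caveat the authors raise about the network and output options.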
We have extensively clarified this definition of what we mean by aligned fasta format (for input) on line 406 to 415 and supplied a reference referring to the IUPAC codes near this clarification. Note: we specifically did not refer to our tool as a sequence viewer, or editor, as it is only for viewing pre-aligned sets of sequences that adhere to the input restrictions mentioned above. Viewing unaligned sequences is not possible, unless by chance they are all of the same length, but even here the network and output options will not function as intended. Additionally, within the software itself we have added a warning message that indicates that “Sequences must be aligned and in fasta format. Please see the 'Help' -> 'Menu' option above.” rather than output the exception that the reviewer previously observed. Within the “Help” -> “Menu” option, we have also described this under the title “(i) Load Fasta” and recommended both Muscle and Clustal as example tools for creating an alignment; although many others exist. This is also mentioned within the demonstration video, and corresponding PDF file, available from the zenodo repository (https://zenodo.org/record/6514787) as referenced within the manuscript on lines 164, 393 and 599 (and is hyperlinked from the CView SourceForge project page wiki). Minor comments "Once the bugs are fixed, please consider adding this software to bioconda and biootools; Please, state the license type in the manuscript; It would be nice to select some of the sequences by "mouse selection" to perform further actions;" The license agreement (GNU General Public License) has now been stated clearly in the manuscript at line 77 of the abstract and at line 199 of the introduction. At line 75 of the abstract it is now clear that the software is open source. 
We have provided the “Title Search” top-level menu option, where the user can interact with groups of sequences through user-specified search criteria, such as a year or subtype, that will be searched for within the sequence titles. Additionally, we have provided the “User Group” top-level menu option, where the user can interact with groups of sequences defined by a user-specified set of titles, supplied as a text list. An example of supplying such a set of titles is given within the pdf script that accompanies the demonstration video (cited on lines 164, 393 and 599). We have clarified this functionality further within the user documentation as well as within the manuscript (line 382).

Selecting multiple sequences using the mouse has not been implemented, as we want to encourage the user to explore the alignment through interaction with the associated network, through defined groups such as those specified above or through network nodes. For example, if the user clicks on a node on the network, then within the sequence view area all sequences passing through that node are highlighted with a red dot and are arranged at the top of the sequence list (discussed at line 238). Methods can then be applied to these through the “Node Path” top-level menu option. If the user can drag the mouse and select random sequences at will, then such sequences will likely pass through multiple nodes on the network and no visual intuition will be achieved. We feel that interacting with sequences through the “Title Search”, “User Group” and “Node Path” menu options increases the definition of what information is contained within the various output options and improves output data robustness. This has been discussed within the manuscript starting at line 378.

In terms of adding CView to bioconda and biootools, initially we were not sure how an alignment viewer would fit such platforms, and this is why we created it as an open source SourceForge project.
However, these platforms offer an interesting opportunity for us to increase our user-base, and this is something we would like to look into further as the user scope of our tool becomes clearer.

"Please, consider stating the existence of alignment-free methodologies (for example, to visual applications with larger viral genomes, such as Herpesvirus) and Galaxy; Please, state clearly in the main manuscript (perhaps in the abstract) if this tool is open-source or not."

The existence of alignment-free methods has now been explicitly mentioned within the introduction, as has the reasoning behind our choice to create a tool based on aligned sequences. This inclusion is at line 112 and we have added some references in relation to it.

31 May 2022

CView: A network based tool for enhanced alignment visualization

PONE-D-21-33624R1

Dear Dr. Archer,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Eduardo Andrés-León
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I would like to thank the Authors for addressing my comments and including a video tutorial which I found very helpful. I was able to run the updated version of the software without any problems and I don't have any further comments. I look forward to the future versions of the software that would support the Stockholm format.

Reviewer #2: The authors have addressed my concerns. I'm happy to see that this software is open-source, a significant advantage over other software. Also, the video significantly improved the easiness of learning about this tool. Below are some points that could improve the manuscript and/or software. Another advantage of using alignment methods is the enhanced local resolution. In my view, alignment-free and alignment methodologies are complementary. Convincing a researcher to learn and use software intensely depends on multiple characteristics, including flexibility, simplicity, novelty, and communication.
The video has dramatically improved the communication (Please, consider using Github for the next time - it offers much more features than Sourceforge; for example, the inclusion of a video online on the provided website, improved visualization of the repository, and custom characteristics). The simplicity and novelty are achieved with the good visualization usability and features it provides (great networking synchronized with the view of the alignment). The flexibility is perhaps the only feature that could contain an enhanced feature. Allowing CView to perform alignments would enable it to avoid multiple tools and provide a comprehensive offer. Please, consider in the future to add this option. It would be great to have the option to change the default colors (perhaps in the main menu tab). This feature would enable a customized version of the software (also for the generation of the image prints) and please different individuals.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Anton Petrov
Reviewer #2: No

2 Jun 2022

PONE-D-21-33624R1

CView: A network based tool for enhanced alignment visualization

Dear Dr. Archer:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Eduardo Andrés-León
Academic Editor
PLOS ONE
