Laurynas Kalesinskas, Evan Cudone, Yuriy Fofanov, Catherine Putonti.
Abstract
With the daily release of data from whole genome sequencing projects, tools to facilitate comparative studies are hard-pressed to keep pace. Graphical software solutions can readily recognize synteny by measuring similarities between sequences. Nevertheless, regions of dissimilarity can prove to be equally informative; these regions may harbor genes acquired via lateral gene transfer (LGT), signify gene loss or gain, or include coding regions under strong selection. Previously, we developed the software S-plot. This tool employed an alignment-free approach for comparing bacterial genomes and generated a heatmap representing the genomes' similarities and dissimilarities in nucleotide usage. In prior studies, this tool proved valuable in identifying genome rearrangements as well as exogenous sequences acquired via LGT in several bacterial species. Herein, we present the next generation of this tool, S-plot2. Like its predecessor, S-plot2 creates an interactive, 2-dimensional heatmap capturing the similarities and dissimilarities in nucleotide usage between genomic sequences (partial or complete). This new version, however, includes additional metrics for analysis, new reporting options, and integrated BLAST query functionality that lets the user interrogate regions of interest. Furthermore, S-plot2 can evaluate larger sequences, including whole eukaryotic chromosomes. To illustrate some of the applications of the tool, 2 case studies are presented. The first examines strain-specific variation across the Pseudomonas aeruginosa genome and strain-specific LGT events. In the second case study, corresponding human, chimpanzee, and rhesus macaque autosomes were compared and lineage-specific contributions to divergence were estimated. S-plot2 provides a means to both visually and quantitatively compare nucleotide sequences, from microbial genomes to eukaryotic chromosomes.
The case studies presented illustrate just 2 potential applications of the tool, highlighting its capability to identify and investigate the variation in molecular divergence rates across sequences. S-plot2 is freely available through https://bitbucket.org/lkalesinskas/splot and is supported on the Linux and MS Windows operating systems.
Keywords: alignment-free; comparative genomics; gene loss; gene transfer
Year: 2018; PMID: 30245567; PMCID: PMC6144591; DOI: 10.1177/1176934318797354
Source DB: PubMed; Journal: Evol Bioinform Online; ISSN: 1176-9343; Impact factor: 1.625
Figure 1. Comparison of Pseudomonas aeruginosa PAO1 (x-axis) and PA7 (y-axis) genomes. (A) “Genome approach” comparison with a window and offset of 5000 bp. (B) Genomic island present in the PA7 strain. (C) “Gene-by-gene approach” comparison of protein-coding gene sequences annotated for the 2 genomes in panel A (*.faa files). Here, the window size is equivalent to a single coding region and k = 3 is evaluated (the same color bar as shown in panel A). The comparisons for both approaches were conducted using the Pearson correlation coefficient. Sequence similarity is measured by the frequency of shared k-mers, with green signifying low similarity and red signifying high similarity.
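The windowed comparison described in the caption can be sketched as follows. This is a minimal illustration, not the S-plot2 implementation: the function names (`kmer_counts`, `windows`, `pearson`, `similarity_matrix`) are hypothetical, the handling of ambiguous bases and of zero-variance windows is an assumption, and only the Pearson correlation coefficient mentioned in the caption is shown.

```python
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    """Count each of the 4**k k-mers over A/C/G/T in seq.

    k-mers containing any other symbol (e.g. N) are skipped (assumption).
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


def windows(seq, window, offset):
    """Yield full-length windows of size `window`, stepping by `offset`."""
    for start in range(0, len(seq) - window + 1, offset):
        yield seq[start:start + window]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined; treated as 0 here (assumption)
    return cov / sqrt(vx * vy)


def similarity_matrix(seq_x, seq_y, k=3, window=5000, offset=5000):
    """Rows follow the y-axis sequence, columns the x-axis sequence."""
    profiles_x = [kmer_counts(w, k) for w in windows(seq_x, window, offset)]
    profiles_y = [kmer_counts(w, k) for w in windows(seq_y, window, offset)]
    return [[pearson(px, py) for px in profiles_x] for py in profiles_y]
```

Each cell of the returned matrix corresponds to one heatmap tile: a value near 1 indicates similar k-mer usage in the two windows, and lower values indicate dissimilar usage. Trailing bases that do not fill a complete window are dropped in this sketch.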
Seven Pseudomonas aeruginosa genomes examined.
| Strain | Genome size, Mbp | No. of scaffolds | No. of coding regions | Assembly |
|---|---|---|---|---|
| PAO1 | 6.26 | 1 | 5572 | GCA_000006765 |
| LESB58 | 6.60 | 1 | 6041 | GCA_000026645 |
| C3719 | 6.22 | 1 | 5648 | GCA_000152525 |
| PACS2 | 6.49 | 1 | 5913 | GCA_000168335 |
| JD316 | 6.19 | 1882 | 6590 | GCA_000506125 |
| JD317 | 6.49 | 2043 | 6979 | GCA_000506145 |
| JD320 | 6.41 | 2038 | 6876 | GCA_000506165 |
Sequences were retrieved for genomes (*_genomic.fna.gz) and coding sequences (*_cds_from_genomic.fna.gz).[42]
Figure 2. Evolution of the Pseudomonas aeruginosa chromosome. (A) Comparison of cluster topologies derived from sequence similarity in 6-mer usage, with window size = offset size = 5000 bp, over 0.2 Mbp regions of the PAO1 genome. Heatmaps for PAO1 vs (B) C3719, (C) LESB58, and (D) PACS2. The same color scale as Figure 1 is used here: sequence similarity is measured by the frequency of shared k-mers, with green signifying low similarity and red signifying high similarity.
Figure 3. Comparison of human (Homo sapiens) chromosome 17 (H17), chimpanzee (Pan troglodytes) chromosome 17 (C17), and rhesus (Macaca mulatta) chromosome 16 (R16). Sequence similarity is measured by the frequency of shared k-mers, with green signifying low similarity and red signifying high similarity. The inset shows the divergence between H17 and C17 (red), C17 and R16 (yellow), and H17 and R16 (blue), relative to the window’s GC content. The x-axis shows the divergence calculated for each window relative to its GC content.
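The per-window GC content that the caption relates divergence to can be computed as in the sketch below. The divergence measure itself is not defined in this record, so it is not reproduced here; the function names `gc_content` and `gc_by_window` are hypothetical, and excluding ambiguous bases from the denominator is an assumption.

```python
def gc_content(window_seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Returns None when the window has no unambiguous bases (assumption).
    """
    s = window_seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else None


def gc_by_window(seq, window, offset):
    """GC content of each full-length window, stepping by `offset`."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, offset)
    ]
```

Pairing each window's GC value with that window's divergence score would give the points plotted in an inset of this kind.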