plotsr: visualizing structural similarities and rearrangements between multiple genomes.

Manish Goel, Korbinian Schneeberger

Abstract

SUMMARY: Third-generation genome sequencing technologies have led to a sharp increase in the number of high-quality genome assemblies. This allows the comparison of multiple assembled genomes of individual species and demands new tools for visualizing their structural properties. Here, we present plotsr, an efficient tool to visualize structural similarities and rearrangements between genomes. It can be used to compare genomes on chromosome level or to zoom in on any selected region. In addition, plotsr can augment the visualization with regional identifiers (e.g. genes or genomic markers) or histogram tracks for continuous features (e.g. GC content or polymorphism density).
AVAILABILITY AND IMPLEMENTATION: plotsr is implemented as a Python package and uses the standard matplotlib library for plotting. It is freely available under the MIT license at GitHub (https://github.com/schneebergerlab/plotsr) and bioconda (https://anaconda.org/bioconda/plotsr).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.

Year:  2022        PMID: 35561173      PMCID: PMC9113368          DOI: 10.1093/bioinformatics/btac196

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

Third-generation sequencing technologies, together with efficient phasing and scaffolding methods such as Hi-C, trio-binning or gamete-binning, have led to a sharp increase in the number of available haplotype-resolved, chromosome-level assemblies (Campoy et al., 2020; Koren et al., 2018; Zhang et al., 2019). Chromosome-level assemblies allow the identification of small genomic differences (such as SNPs and indels) as well as large structural rearrangements (SRs) such as inversions and translocations. They are therefore considered the gold standard for identifying genomic differences (Simpson and Pop, 2015). Typically, the overall structure of individual genomes of a specific species is highly conserved because the genomes recombine (exchange chromosome arms) during sexual reproduction and thereby keep the same karyotype across different haplotypes. This introduces large syntenic regions between the genomes where recombination can occur without affecting the overall structure of the genomes. During genome comparisons, these syntenic regions (the ‘syntenic backbone’) can be identified. All remaining regions in the genomes are structural rearrangements by definition and can then be classified into inversions, duplications or translocations (here, we consider both intra- and inter-chromosomal relocations as translocations) based on their orientation and location in the genomes. Recently, we used these principles to develop SyRI, a tool to identify genomic differences in whole-genome assemblies of the same species (Goel et al., 2019). For better analysis of these genomic differences between multiple genomes, there is a need for accurate and intuitive visualization tools. Currently available tools can visualize large structural rearrangements between a pair of genomes (e.g. MUMmer, Ribbon, rearrvisr) or are designed for visualizing local variations such as SNPs and indels in pan-genomes (e.g. tubemaps, ODGI) (Beyer et al.; Guarracino et al., 2022; Kurtz et al., 2004; Lindtke and Yeaman, 2020; Nattestad et al.).
For the visualization of large structural rearrangements in multiple genomes, we have developed plotsr (plot structural rearrangements). plotsr uses the synteny between genomes to identify homologous chromosomes as well as to match the orthologous regions between the genomes allowing for efficient zooming in on specific regions. It is a simple-to-use yet flexible and powerful visualization tool. It can be used to compare multiple haploid genomes as well as different haplotypes of individual polyploid genomes. In addition, plotsr can mark specific loci as well as plot histogram tracks to show distributions of genomic features along the chromosomes.

2 Implementation

plotsr is a Python-based command-line tool. As input, it requires the chromosome sizes (either through a fasta file or as a table) and the synteny and SR information between the assemblies in a pairwise manner. For example, to visualize genomes A, B and C in this order, plotsr requires the comparison of A versus B and B versus C. These can be generated using genomic difference identification methods like SyRI, MUM&Co or assemblytics (Goel et al., 2019; Nattestad and Schatz, 2016; O’Donnell and Fischer, 2020). The output of SyRI is accepted directly, while output from other methods can be provided in BEDPE format. First, plotsr validates that the assemblies and structural information are consistent. Then, by using the pairwise synteny between genomes, it groups homologous chromosomes across the genomes and plots the syntenic regions as well as SRs between them. plotsr can generate plots in two modes: (i) stacked mode, for better visualization of synteny and intra-chromosomal rearrangements (Fig. 1); and (ii) itx mode (similar to plots generated by JCVI), for better visualization of inter-chromosomal rearrangements (Fig. 2) (Tang et al.). The output can be generated in pdf, png or svg format. In addition, plotsr can show markers at predefined loci (e.g. genes, TEs or genomic markers) using BED files. plotsr can also plot the distribution of genomic features along the chromosomes (e.g. distribution of genes, SNPs, sequencing reads, etc.). This provides a visual comparison between sequence features and structural properties of the chromosomes. To adjust the plots, plotsr includes multiple parameters to control the visual properties (colour, size, spacing, etc.) of genomes, markers and tracks.
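The grouping of homologous chromosomes described above can be pictured as a connected-components problem. The following Python sketch is not taken from the plotsr source code; it assumes a hypothetical input of (genome, chromosome) pairs, one pair per syntenic block between adjacent genomes, and uses a small union-find structure to collect chromosomes that are linked by synteny. The function name `group_homologous` and the chromosome identifiers are illustrative only.

```python
from collections import defaultdict


def group_homologous(synteny_pairs):
    """Group (genome, chromosome) nodes that are linked by syntenic blocks.

    synteny_pairs: iterable of ((genome_a, chrom_a), (genome_b, chrom_b)),
    one entry per syntenic block (hypothetical input format).
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in synteny_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Made-up example: chromosome names differ between assemblies A, B and C.
blocks = [
    (("A", "Chr1"), ("B", "chr1")),
    (("B", "chr1"), ("C", "1")),
    (("A", "Chr2"), ("B", "chr2")),
    (("B", "chr2"), ("C", "2")),
]
groups = group_homologous(blocks)
for group in groups:
    print(group)
```

Because only adjacent genomes are compared, chromosomes of A and C end up in the same group through their shared partner in B, without any direct A versus C comparison.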
Fig. 1. Visualizing structural rearrangements using plotsr. We used plotsr to visualize syntenic regions and structural rearrangements between 10 chromosomes from 6 human genomes. The visualization was created using plotsr without further modifications. Tracks for three genomic features (genes, number of SNPs and centromeric regions) were included using optional parameters. In the genes track, smaller lines correspond to transcribed regions and longer lines represent coding sequences (CDS).

Fig. 2. Customizing visualization using plotsr. The individual panels were created using plotsr without any further modifications. (a) Zooming in on a specific location allows for resolved visualization of the local genomic differences. Here, we visualized Chr8:1–13 000 000. Using plotsr, we have labelled a large inversion and a not-aligned region that became visible in the zoomed-in view. (b) Inter-chromosomal rearrangements among the 10 chromosomes.

plotsr can also be used to zoom in on specific regions in any of the input genomes. For this, plotsr identifies the corresponding orthologous regions in all other genomes. This is a non-trivial task as some regions might include multiple rearrangements that obfuscate the syntenic regions in the other genomes. Identifying all syntenic regions directly would require whole-genome alignments of all genomes against the genome of interest, implying the need for an all-versus-all genome alignment as input, the size of which grows quadratically with the number of genomes. plotsr avoids this by using the syntenic backbone between the genomes to zoom in on any given region. For this, plotsr iteratively selects the regions syntenic to the selected region using pairwise genome comparisons until all genomes are covered. It then filters the structural information to plot only the information overlapping these homologous regions, resulting in a zoomed-in view of the genomes. Markers and feature tracks are also filtered automatically so that only those overlapping the homologous regions are plotted.
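The iterative selection of syntenic regions can be sketched as follows. This is a simplified illustration under stated assumptions, not the plotsr implementation: syntenic blocks between adjacent genomes are given as hypothetical `(a_start, a_end, b_start, b_end)` tuples on a single chromosome, and a region is projected onto the neighbouring genome by taking the hull of all blocks that overlap it (whole blocks are used, without clipping). The names `project` and `zoom_regions` are illustrative.

```python
def project(region, blocks):
    """Project (start, end) onto the neighbouring genome via syntenic blocks.

    blocks: list of (a_start, a_end, b_start, b_end) tuples.
    Returns the hull of the b-side coordinates of all overlapping blocks,
    or None if no block overlaps the region.
    """
    start, end = region
    hits = [(bs, be) for (as_, ae, bs, be) in blocks if as_ <= end and ae >= start]
    if not hits:
        return None
    return (min(h[0] for h in hits), max(h[1] for h in hits))


def zoom_regions(region, genome_index, pairwise_blocks):
    """Propagate a region from one genome to all others in the stack.

    pairwise_blocks[i] holds the syntenic blocks between genome i and i + 1.
    """
    n_genomes = len(pairwise_blocks) + 1
    regions = [None] * n_genomes
    regions[genome_index] = region
    # Walk towards the last genome using the comparisons as given.
    for i in range(genome_index, n_genomes - 1):
        if regions[i] is None:
            break
        regions[i + 1] = project(regions[i], pairwise_blocks[i])
    # Walk towards the first genome with the two sides of each block swapped.
    for i in range(genome_index, 0, -1):
        if regions[i] is None:
            break
        flipped = [(bs, be, as_, ae) for (as_, ae, bs, be) in pairwise_blocks[i - 1]]
        regions[i - 1] = project(regions[i], flipped)
    return regions


# Made-up syntenic blocks for three genomes A, B and C.
pairwise = [
    [(1, 100, 11, 110), (101, 200, 111, 210)],   # A versus B
    [(11, 110, 21, 120), (111, 210, 121, 220)],  # B versus C
]
from_first = zoom_regions((50, 80), 0, pairwise)
from_middle = zoom_regions((120, 150), 1, pairwise)
print(from_first)
print(from_middle)
```

The resulting per-genome intervals could then be used to keep only those annotations, markers and track values that overlap them.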

3 Results

We visualized structural rearrangements between the human reference sequence (GRCh38), the human telomere-to-telomere assembly (t2t), two assemblies from the Human Pangenome Reference Consortium (panpat and panmat) and two assemblies from the Vertebrate Genomes Project (vgppat and vgpmat) using plotsr (Abdellah et al.; Jarvis et al.; Nurk et al., 2022; Rhie et al., 2021). Figure 1 shows the structural rearrangements in the first ten chromosomes, whereas Supplementary Figure S1 shows structural rearrangements in all autosomal chromosomes. For this, pairwise whole-genome alignments were performed using minimap2, followed by synteny and structural rearrangement identification using SyRI (Goel et al., 2019; Li, 2018). We also plotted gene annotation, distribution of common SNPs and centromere coordinates. Figure 1 shows that the genomes are predominantly syntenic (grey alignments). The vgppat and vgpmat assemblies have smaller pericentromeric regions (highly rearranged regions near the centromere) in chromosomes 1 and 9. Consequently, these chromosomes are smaller in the vgppat and vgpmat genomes than in the other genomes. We also observed the depletion of genes in the centromeric regions. Using plotsr, we could also zoom in to highlight the genomic differences at Chr8:1–13 000 000 (reference genome coordinates) (Fig. 2a). The region was provided as a command-line parameter to plotsr, which then automatically filtered and plotted the syntenic regions and rearrangements in all of the other genomes. In this region, we observed large inversions between the assemblies (labelled as ‘Inversion’ using plotsr), suggesting the presence of broadly two haplotypes (Logsdon et al., 2021). We also observed that a large region without any alignment between the t2t and the panpat genomes (labelled as ‘Not aligned’) became visible within the zoom-in visualization. In Figure 2b, we additionally show the inter-chromosomal translocations and duplications between the assemblies using the ‘itx mode’ visualization of plotsr.
We benchmarked plotsr by visualizing differences in six human (haploid genome size: ∼3 Gbp), eight Arabidopsis thaliana (haploid genome size: ∼120 Mbp, Supplementary Fig. S2) and four potato (haploid genome size: ∼800 Mbp, Supplementary Fig. S3) genomes. plotsr finished within 1 min and used less than 0.5 GB of RAM for all tests (Supplementary Fig. S4). Runtime and memory both scaled linearly with the number of samples. They were independent of the size of the genome and instead correlated with the number of structural rearrangements present in the genomes. Filtering out small variants (SNPs and indels) from input files further improved the runtime in all tests. We also demonstrated the usability of plotsr with different methods for identifying structural differences by visualizing genomic differences between A. thaliana accessions identified by MUM&Co and assemblytics (Supplementary Figs S5 and S6, Supplementary Note S1) (Nattestad and Schatz, 2016; O’Donnell and Fischer, 2020).
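Filtering small variants before plotting could look like the following sketch. It rests on two assumptions that should be checked against the actual input files: that the annotation type is stored in the 11th tab-separated column of a SyRI-style table, and that small variants carry the labels `SNP`, `INS` and `DEL`. The example rows are made up for illustration, and the function name `drop_small_variants` is hypothetical.

```python
SMALL_VARIANT_TYPES = {"SNP", "INS", "DEL"}  # assumed labels for small variants
TYPE_COLUMN = 10  # assumed zero-based index of the annotation-type column


def drop_small_variants(lines, type_column=TYPE_COLUMN, small=SMALL_VARIANT_TYPES):
    """Yield only the lines whose annotation type is not a small variant."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > type_column and fields[type_column] in small:
            continue  # skip SNPs and indels
        yield line


# Made-up rows in the assumed 12-column layout.
rows = [
    "Chr1\t1\t5000\t-\t-\tChr1\t1\t5010\tSYN1\t-\tSYN\t-\n",
    "Chr1\t120\t120\tA\tT\tChr1\t121\t121\tSNP1\tSYN1\tSNP\t-\n",
    "Chr1\t6000\t9000\t-\t-\tChr1\t9100\t6100\tINV1\t-\tINV\t-\n",
]
kept = list(drop_small_variants(rows))
kept_types = [row.split("\t")[TYPE_COLUMN] for row in kept]
print(kept_types)
```

Since the function works on any iterable of lines, it can be applied to an open file handle and the result written to a new file that is then passed on for plotting.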

4 Discussion and conclusion

The advent of long-read sequencing technologies has simplified the generation of high-quality genome assemblies. To support the visual analysis of such assemblies, we presented plotsr, a Python-based command-line tool for visualizing structural similarities and rearrangements between genomes. In addition, plotsr allows visualization of genomic features as well as zoom-in views on specific regions. plotsr is highly efficient as it only requires pairwise comparisons in the order in which the genomes are compared. In turn, this limits the visualization flexibility because different orders of the genomes would require additional comparisons (which of course could be generated). However, the genome order is often predetermined (e.g. based on phylogeny), and in such cases, pairwise comparisons are computationally more efficient than comparing all genomes against each other. plotsr generates publication-quality visualizations that have already been used by several research groups (Li et al.; van Rengs et al.; Zamyatin et al.; Zhang et al.). We believe that plotsr visualizations will help in getting a better understanding of the genome divergence of a species. Given the great importance of genomic analysis in many research fields, we are continuously developing plotsr to add more useful parameters allowing for more control and customization.
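The trade-off between chained pairwise comparisons and all-versus-all comparisons can be made concrete with a short count. The sketch below is illustrative only; the helper names `adjacent_pairs` and `all_pairs` are hypothetical and do not belong to plotsr.

```python
from itertools import combinations


def adjacent_pairs(genomes):
    """Comparisons needed when genomes are plotted in a fixed order."""
    return list(zip(genomes, genomes[1:]))


def all_pairs(genomes):
    """Comparisons needed for an all-versus-all analysis."""
    return list(combinations(genomes, 2))


order = ["A", "B", "C", "D"]
chain = adjacent_pairs(order)
full = all_pairs(order)
print(chain)                  # comparisons between neighbours in the plot
print(len(chain), len(full))  # n - 1 versus n * (n - 1) / 2
```

For n genomes the chain needs n - 1 comparisons, while the all-versus-all design needs n(n - 1)/2. Reordering the genomes changes which neighbouring pairs are required, which is the loss of flexibility noted above.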
References

Review 1.  The Theory and Practice of Genome Sequence Assembly.

Authors:  Jared T Simpson; Mihai Pop
Journal:  Annu Rev Genomics Hum Genet       Date:  2015-04-22       Impact factor: 8.929

2.  Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data.

Authors:  Xingtan Zhang; Shengcheng Zhang; Qian Zhao; Ray Ming; Haibao Tang
Journal:  Nat Plants       Date:  2019-08-05       Impact factor: 15.793

3.  Versatile and open software for comparing large genomes.

Authors:  Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal:  Genome Biol       Date:  2004-01-30       Impact factor: 13.583

4.  The complete sequence of a human genome.

Authors:  Sergey Nurk; Sergey Koren; Arang Rhie; Mikko Rautiainen; Andrey V Bzikadze; Alla Mikheenko; Mitchell R Vollger; Nicolas Altemose; Lev Uralsky; Ariel Gershman; Sergey Aganezov; Savannah J Hoyt; Mark Diekhans; Glennis A Logsdon; Michael Alonge; Stylianos E Antonarakis; Matthew Borchers; Gerard G Bouffard; Shelise Y Brooks; Gina V Caldas; Nae-Chyun Chen; Haoyu Cheng; Chen-Shan Chin; William Chow; Leonardo G de Lima; Philip C Dishuck; Richard Durbin; Tatiana Dvorkina; Ian T Fiddes; Giulio Formenti; Robert S Fulton; Arkarachai Fungtammasan; Erik Garrison; Patrick G S Grady; Tina A Graves-Lindsay; Ira M Hall; Nancy F Hansen; Gabrielle A Hartley; Marina Haukness; Kerstin Howe; Michael W Hunkapiller; Chirag Jain; Miten Jain; Erich D Jarvis; Peter Kerpedjiev; Melanie Kirsche; Mikhail Kolmogorov; Jonas Korlach; Milinn Kremitzki; Heng Li; Valerie V Maduro; Tobias Marschall; Ann M McCartney; Jennifer McDaniel; Danny E Miller; James C Mullikin; Eugene W Myers; Nathan D Olson; Benedict Paten; Paul Peluso; Pavel A Pevzner; David Porubsky; Tamara Potapova; Evgeny I Rogaev; Jeffrey A Rosenfeld; Steven L Salzberg; Valerie A Schneider; Fritz J Sedlazeck; Kishwar Shafin; Colin J Shew; Alaina Shumate; Ying Sims; Arian F A Smit; Daniela C Soto; Ivan Sović; Jessica M Storer; Aaron Streets; Beth A Sullivan; Françoise Thibaud-Nissen; James Torrance; Justin Wagner; Brian P Walenz; Aaron Wenger; Jonathan M D Wood; Chunlin Xiao; Stephanie M Yan; Alice C Young; Samantha Zarate; Urvashi Surti; Rajiv C McCoy; Megan Y Dennis; Ivan A Alexandrov; Jennifer L Gerton; Rachel J O'Neill; Winston Timp; Justin M Zook; Michael C Schatz; Evan E Eichler; Karen H Miga; Adam M Phillippy
Journal:  Science       Date:  2022-03-31       Impact factor: 63.714

5.  ODGI: understanding pangenome graphs.

Authors:  Andrea Guarracino; Simon Heumos; Sven Nahnsen; Pjotr Prins; Erik Garrison
Journal:  Bioinformatics       Date:  2022-05-13       Impact factor: 6.931

6.  SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies.

Authors:  Manish Goel; Hequan Sun; Wen-Biao Jiao; Korbinian Schneeberger
Journal:  Genome Biol       Date:  2019-12-16       Impact factor: 13.583

7.  Gamete binning: chromosome-level and haplotype-resolved genome assembly enabled by high-throughput single-cell sequencing of gamete genomes.

Authors:  José A Campoy; Hequan Sun; Manish Goel; Wen-Biao Jiao; Kat Folz-Donahue; Nan Wang; Manuel Rubio; Chang Liu; Christian Kukat; David Ruiz; Bruno Huettel; Korbinian Schneeberger
Journal:  Genome Biol       Date:  2020-12-29       Impact factor: 13.583

8.  The structure, function and evolution of a complete human chromosome 8.

Authors:  Glennis A Logsdon; Mitchell R Vollger; PingHsun Hsieh; Yafei Mao; Mikhail A Liskovykh; Sergey Koren; Sergey Nurk; Ludovica Mercuri; Philip C Dishuck; Arang Rhie; Leonardo G de Lima; Tatiana Dvorkina; David Porubsky; William T Harvey; Alla Mikheenko; Andrey V Bzikadze; Milinn Kremitzki; Tina A Graves-Lindsay; Chirag Jain; Kendra Hoekzema; Shwetha C Murali; Katherine M Munson; Carl Baker; Melanie Sorensen; Alexandra M Lewis; Urvashi Surti; Jennifer L Gerton; Vladimir Larionov; Mario Ventura; Karen H Miga; Adam M Phillippy; Evan E Eichler
Journal:  Nature       Date:  2021-04-07       Impact factor: 69.504

9.  Towards complete and error-free genome assemblies of all vertebrate species.

Authors:  Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis
Journal:  Nature       Date:  2021-04-28       Impact factor: 49.962

10.  De novo assembly of haplotype-resolved genomes with trio binning.

Authors:  Sergey Koren; Arang Rhie; Brian P Walenz; Alexander T Dilthey; Derek M Bickhart; Sarah B Kingan; Stefan Hiendleder; John L Williams; Timothy P L Smith; Adam M Phillippy
Journal:  Nat Biotechnol       Date:  2018-10-22       Impact factor: 54.908

