Floréal Cabanettes, Christophe Klopp.
Abstract
Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size they can handle. D-GENIES is a standalone and web application that performs large genome alignments using the minimap2 software package and generates interactive dot plots. It enables users to sort query sequences along the reference, zoom in on the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
Keywords: Dot plot; Genome assessment; Interactive user interface; Large genomes
Year: 2018 PMID: 29888139 PMCID: PMC5991294 DOI: 10.7717/peerj.4958
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
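As an illustration of how an alignment becomes a dot plot, the sketch below converts minimap2 PAF records into line segments, with the reference on the x axis and the query on the y axis. It is a minimal sketch and not the D-GENIES implementation: the names `Segment` and `paf_to_segments`, the `min_len` and `min_identity` parameters, and the sample records are hypothetical. Identity is approximated as the PAF "matching bases / alignment block length" ratio.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """One dot plot line: reference on x, query on y (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int
    identity: float


def paf_to_segments(paf_lines: Iterable[str],
                    min_len: int = 0,
                    min_identity: float = 0.0) -> List[Segment]:
    """Turn PAF records into dot plot segments, dropping short or weak matches."""
    segments = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        q_start, q_end = int(fields[2]), int(fields[3])
        strand = fields[4]
        t_start, t_end = int(fields[7]), int(fields[8])
        matches, block_len = int(fields[9]), int(fields[10])
        if block_len == 0:
            continue
        identity = matches / block_len  # approximate identity
        if block_len < min_len or identity < min_identity:
            continue
        if strand == "+":
            # Forward match: query grows with the reference (rising line).
            segments.append(Segment(t_start, q_start, t_end, q_end, identity))
        else:
            # Reverse match: query decreases along the reference, which is
            # how an inversion shows up as a falling line.
            segments.append(Segment(t_start, q_end, t_end, q_start, identity))
    return segments


# Made-up records for illustration only.
example = [
    "q1\t1000\t0\t500\t+\tr1\t2000\t100\t600\t450\t500\t60",
    "q1\t1000\t500\t900\t-\tr1\t2000\t1200\t1600\t300\t400\t60",
]
segments = paf_to_segments(example)
```

Each segment can then be drawn with any plotting library and colored by its `identity` value; raising `min_len` or `min_identity` mimics the kind of match size and identity filtering an interactive viewer might offer.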
Figure 1. Results page view.
(A) Main menu to navigate D-GENIES pages. (B) Reference and query sequence drop-down selection boxes and button to zoom in on the alignment. (C) Export menu to download image files (PNG and SVG), alignment, ordered query and unaligned query or reference FASTA files. (D) Identity color panel. (E) Match size filtering slider. (F) Identity filtering entry and check boxes. (G) Strong precision check box. (H) Line width slider. (I) Slider for the horizontal and vertical border lines between reference and query sequences. (J) Query sort and unsort button. (K) Noise filtering button. (L) Similarity summary button. (M) Delete job button.
Figure 2. Example of identity summary.
Processing times for Gepard, r2cat and D-GENIES.
| Reference genome | Query sequence | Gepard | r2cat | D-GENIES |
|---|---|---|---|---|
| E. coli K12 | E. coli O157:H7 | 18 s | 5 s | 0.7 s |
| | | 3 min 25 s | 45 s | 5 s |
| | | 31 min 29 s | 13 min | 4 min |
| Drosophila | Drosophila | 53 min 21 s | 32 min | 3 min |
| | | 1 h 23 min | 55 min | 1 min |
Notes:
Ensembl 38 datasets.
Ensembl 38 datasets, Chromosome 1.
Ensembl 91 datasets.
Ensembl 38 datasets, str. Sakai.
Time without display generation: display time is high and the interface becomes buggy (memory limit is reached).
D-GENIES processing time and memory consumption for Ensembl datasets.
| Reference genome | Query genome | Elapsed time | Maximum RAM usage |
|---|---|---|---|
| Human | Chimpanzee | 67 min 14 s | 36 GB |
| Mouse | Rat | 39 min 54 s | 24 GB |
| Cow | Sheep | 43 min 3 s | 27 GB |
| | | 1 min 4 s | 2 GB |
| Poplar | Vine | 3 min 21 s | 8 GB |
| | | 2 min 52 s | 8.3 GB |
Notes:
Ensembl 91 datasets.
Ensembl 38 datasets.