Martin D Muggli, Simon J Puglisi, Christina Boucher.
Abstract
BACKGROUND: Genome-wide optical maps are ordered, high-resolution restriction maps that give the positions of the restriction cut sites for one or more restriction enzymes. They are assembled with an overlap-layout-consensus approach from raw optical map data, which are referred to as Rmaps. Because Rmap data have a high error rate, finding the overlaps between Rmaps remains challenging.
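As an illustration of what a restriction map records, the following minimal sketch turns a DNA string into cut-site positions and the ordered fragment sizes between them. It is an idealized, error-free toy, not the optical mapping process or any tool's implementation: the function names are hypothetical, the recognition sequence `GAATTC` (EcoRI) is only an example, and a site is recorded at the start of the recognition sequence rather than at the enzyme's exact cleavage offset.

```python
def cut_site_positions(sequence, site):
    """Return the 0-based start index of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return positions


def fragment_sizes(positions, length):
    """Convert sorted cut-site positions into ordered fragment sizes."""
    boundaries = [0] + list(positions) + [length]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    toy_genome = "AAGAATTCCCCGAATTCTT"  # made-up sequence for illustration
    sites = cut_site_positions(toy_genome, "GAATTC")
    print(sites)
    print(fragment_sizes(sites, len(toy_genome)))
```

Real Rmaps differ from this idealized output: they contain missing and spurious cut sites and fragment sizing errors, which is why finding overlaps between them is hard.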
Keywords: FM-index; Graph algorithms; Index based data structures; Optical mapping
Year: 2019 PMID: 31867049 PMCID: PMC6907254 DOI: 10.1186/s13015-019-0160-9
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Fig. 1 Example automata and corresponding memory representation
Performance on simulated E. coli dataset
| Method | Time (s) | Memory (MB) | Alignments | Recall | Precision |
|---|---|---|---|---|---|
| Kohdista | 20 | 19.0 | 907 | 702/4305 (16%) | 702/771 (91%) |
| Kohdista (lax) | 373 | 18.3 | 8545 | 3925/4305 (91%) | 3925/8545 (46%) |
| Valouev et al. | 148 | 4.0 | 742 | 699/4305 (16%) | 699/742 (94%) |
| MalignerDP | 47 | 6.0 | 1959 | 1296/4305 (30%) | 1296/1959 (66%) |
| OMBlast | 116 | 2078 | 1008 | 806/4305 (19%) | 806/1008 (80%) |
| RefAligner | 31 | 81.2 | 992 | 958/4305 (22%) | 958/992 (97%) |
| MalignerIX | 4 | 6.0 | 0 | 0/4305 (0%) | 0/0 (N/A) |
| OPTIMA | 455 | 10756.5 | 0 | 0/4305 (0%) | 0/0 (N/A) |
Kohdista (lax) demonstrates that our indexing and search method can find the majority of ground-truth alignments when the search is pruned with the more relaxed thresholds chi-squared-cdf-thresh = 0.02 and binom-cdf-thresh = 0.5.
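The Recall and Precision columns of the simulated E. coli table can be recomputed from the raw counts. The sketch below assumes that recall is the number of correct alignments divided by the 4305 ground-truth alignments, and that precision is the number of correct alignments divided by the number of alignments the method reported, which is what the fractions in most rows suggest. The function name is hypothetical.

```python
def recall_precision(true_positives, ground_truth, reported):
    """Return (recall, precision); precision is None when nothing was reported."""
    recall = true_positives / ground_truth
    # A method that reports no alignments has undefined precision (N/A).
    precision = true_positives / reported if reported else None
    return recall, precision


if __name__ == "__main__":
    # Counts taken from the table rows for Valouev et al. and MalignerIX.
    for name, tp, reported in [("Valouev et al.", 699, 742), ("MalignerIX", 0, 0)]:
        recall, precision = recall_precision(tp, 4305, reported)
        print(name, recall, precision)
```

Returning `None` rather than zero for the undefined case mirrors the N/A entries for MalignerIX and OPTIMA, which reported no alignments.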
Fig. 2 Precision-recall plot of successful methods on simulated E. coli
Fig. 3 ROC plot of successful methods on simulated E. coli
Performance on plum
| Method | Time (h) | Memory | Alignments |
|---|---|---|---|
| Kohdista | 31 | 7.4 GB | 16,109,151 |
| Valouev et al. | 678 | 60 MB | 6387 |
| MalignerDP | 214 | 784 MB | 1,258,328 |
| OMBlast | 151 | 12.3 GB | 424,730 |
| RefAligner | 90 | 374 MB | 10,039 |
Fig. 4 A comparison of the scores of the alignments found by the various methods on the plum data. All alignments were realigned using the dynamic programming method of Valouev et al. [12] in order to acquire a score for each alignment. This method finds the optimal alignment under a scoring function, known as an S-score, that balances size agreement and cut site agreement. The following alignments were considered: (a) those obtained from aligning random pairs of Rmaps; (b) those obtained from the method of Valouev et al. [12]; (c) those obtained from Kohdista; (d) those obtained from MalignerDP; (e) those obtained from OMBlast; and (f) those obtained from BioNano’s commercial RefAligner.