OMTools: a software package for visualizing and processing optical mapping data.

Alden King-Yung Leung^1,2, Nana Jin^1,2, Kevin Y Yip^3, Ting-Fung Chan^1,2.

Abstract

SUMMARY: Optical mapping is a molecular technique that captures specific patterns of fluorescent labels along DNA molecules. It has been widely applied to assisted scaffolding of sequence assemblies, microbial strain typing and detection of structural variations. Various computational methods have been developed to analyze optical mapping data. However, existing tools for processing and visualizing optical mapping data still have many shortcomings. Here, we present OMTools, an efficient and intuitive data processing and visualization suite for handling and exploring large-scale optical mapping profiles. OMTools includes modules for visualization (OMView), data processing and simulation. Together, these modules form an accessible and convenient pipeline for optical mapping analyses.
AVAILABILITY AND IMPLEMENTATION: OMTools is implemented in Java 1.8 and released under a GPL license. OMTools can be downloaded from https://github.com/aldenleung/OMTools and run on any standard desktop computer equipped with a Java virtual machine.
CONTACT: kevinyip@cse.cuhk.edu.hk or tf.chan@cuhk.edu.hk.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2017 PMID: 28505226 PMCID: PMC5870549 DOI: 10.1093/bioinformatics/btx317

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Optical mapping is a technique for imaging DNA molecules along which specific labels are captured. These labels form distinct patterns along DNA molecules based on their nucleotide sequences. Compared with the short reads of next-generation sequencing (∼150 bp), DNA molecules in optical mapping data are several orders of magnitude longer, at ∼100–1000 kb (Lam et al.). Because of this much greater length, optical mapping has been used for assisted scaffolding (Dong et al.), structural variation detection (Cao et al.; Mak et al.) and microbial strain typing (Schwan et al.). Two instrument platforms are currently available for generating optical mapping data, from OpGen Inc. and BioNano Genomics Inc. Computational methods for analyzing these data continue to emerge. Gaining knowledge and insights from optical mapping data still requires human analysis and interpretation, which usually involves data visualization. At present, three visualization tools for optical mapping are available: BioNumerics v7 (Applied Maths NV), IrysView (Shelton et al.) and JBrowse (Skinner et al.). However, none of them provides multiple visualization styles suited to different types of application. Here, we present the software package OMTools, which comprises modules for visualization (OMView), for processing optical mapping data and alignment results, and for simulation. The package offers fast loading, easy installation and support for multiple data formats (Supplementary Tables S1 and S2).

2 OMView

OMView is a multipurpose visualization module for optical mapping data analysis. Five types of visualizations (Supplementary Table S3) are implemented in OMView for different objectives: regional view, anchor view, alignment view, multiple alignments view and molecule view (Fig. 1A–E). In all types of visualizations, a rectangular block represents the DNA backbone, while vertical bars on the blocks represent signals on the DNA molecule. Below, we describe the purpose of each type of visualization and give a related example.

Fig. 1

Major visualization modes in OMView. Five types of views are available to visualize optical mapping data for different purposes: (A) Regional view, (B) Anchor view, (C) Alignment view, (D) Multiple alignments view and (E) Molecule view.

The regional view displays all alignments in a selected region as an overview. In each panel, the reference DNA is represented by a thick red line, while aligned and unaligned portions of a molecule are represented by yellow and green lines, respectively. Molecule signals matching the reference signals are in pink, while the unaligned ones are in black. Contig alignments along the reference are displayed in a similar manner. Each aligned molecule is stretched so that its first and last matching signals are located at the horizontal positions of the corresponding reference signals. For a molecule whose individual portions align separately to different reference regions, OMView depicts the relationships between consecutive aligned portions (insertion, deletion, inversion and translocation). Multiple panels can be created to display alignment results from different datasets for comparison at the same region, as exemplified in Figure 1A with a case of copy number variations on chromosome 18 in the 1000 Genomes trio dataset NA12878, NA12891 and NA12892 (Mak et al.). Additional panels can be loaded to visualize annotations on the reference, such as gene annotations or gaps depicted as black rectangles above the alignments.

The anchor view is mainly designed for validating structural variations. It displays alignments whose molecule signals match two selected signals on the reference. In this view, molecules are shown at their original measured lengths so that structural variations can be easily seen. After a sub-module in OMView automatically sorts the alignments by the distance between the two signals on each molecule, the presence and zygosity of insertions/deletions can be easily determined. The copy number variation example described above is shown in anchor view in Figure 1B.

The alignment view illustrates the alignment of a single molecule against the reference in more detail. This is especially important for visualizing partial alignments with complicated relationships. The alignment view employs the aforementioned coloring scheme, with an extended blue block added to represent the unaligned portion of the reference. Multiple panels of alignments are displayed if individual portions are aligned separately to different reference regions. Detailed information, including aligned signals, alignment score and CIGAR (Concise Idiosyncratic Gapped Alignment Report) string, is shown below each alignment. Selected alignments from the copy number variation example above and an alignment demonstrating an inversion are shown in Figure 1C (Mak et al.).

The multiple alignments view, designed for whole-genome comparisons, depicts a multiple optical map alignment of a dataset. Here, matching patterns are represented by a list of collinear-aligned-blocks. Each row of rectangular blocks represents one optical map genome, while each collinear-aligned-block contains a column of rectangular blocks of the same color that represents a matching pattern across different genomes. Within the same column position, a solid line represents a gap and a differently colored block represents a different matching pattern. Users can therefore easily visualize the structural differences within variable regions across multiple samples. Figure 1D offers an example of multiple alignments. The user interface also allows users to manually modify the multiple alignment results and produce statistics of collinear blocks.

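The stretching used in the regional view amounts to a linear mapping from molecule coordinates to reference coordinates. The following is a minimal sketch of that idea, not the OMView implementation: the class name `RegionalStretchSketch`, the method `stretch` and all parameter names are assumptions made for illustration, and the positions in `main` are made-up values.

```java
import java.util.Arrays;

/** Illustrative sketch of linear stretching; names are hypothetical, not OMTools API. */
final class RegionalStretchSketch {

    /**
     * Maps signal positions of a molecule onto reference coordinates so that the
     * first and last matched molecule signals (molFirst, molLast) land exactly on
     * their matched reference signals (refFirst, refLast). All values are in bp.
     */
    static double[] stretch(double[] molSignals, double molFirst, double molLast,
                            double refFirst, double refLast) {
        if (molLast == molFirst) {
            throw new IllegalArgumentException("two distinct matched signals are required");
        }
        // One scale factor for the whole molecule: reference span over molecule span.
        double scale = (refLast - refFirst) / (molLast - molFirst);
        double[] drawn = new double[molSignals.length];
        for (int i = 0; i < molSignals.length; i++) {
            drawn[i] = refFirst + (molSignals[i] - molFirst) * scale;
        }
        return drawn;
    }

    public static void main(String[] args) {
        // Made-up molecule with three signals; the first and last are the matched ones.
        double[] mol = {1000, 6000, 11000};
        double[] drawn = stretch(mol, 1000, 11000, 500000, 520000);
        System.out.println(Arrays.toString(drawn));
    }
}
```

Signals between the two matched ends keep their relative spacing, so a molecule carrying an insertion or deletion appears compressed or expanded relative to the reference. This is presumably why the anchor view shows molecules at their measured lengths instead.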
Finally, the molecule view is a module for inspecting molecules to gain a general impression of the dataset, as shown in Figure 1E. Molecules can be sorted by name, size or number of signals, and a fixed number of molecules is shown per page.
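The sort-then-page behavior of the molecule view can be sketched as follows. This is a hedged illustration only: `MoleculePagingSketch`, its nested `Molecule` class, the three comparators and the `page` method are hypothetical names, not classes from the OMTools library, and the code is kept Java 8 compatible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sort-and-page sketch; all names are hypothetical, not OMTools API. */
final class MoleculePagingSketch {

    /** Minimal stand-in for a molecule record. */
    static final class Molecule {
        final String name;
        final long size;      // molecule length in bp
        final int signals;    // number of label signals

        Molecule(String name, long size, int signals) {
            this.name = name;
            this.size = size;
            this.signals = signals;
        }
    }

    // The three sort orders mentioned in the text.
    static final Comparator<Molecule> BY_NAME = Comparator.comparing(m -> m.name);
    static final Comparator<Molecule> BY_SIZE = Comparator.comparingLong(m -> m.size);
    static final Comparator<Molecule> BY_SIGNALS = Comparator.comparingInt(m -> m.signals);

    /** Returns page pageIndex (0-based) of the molecules sorted by the given order. */
    static List<Molecule> page(List<Molecule> molecules, Comparator<Molecule> order,
                               int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        List<Molecule> sorted = new ArrayList<>(molecules); // leave the input untouched
        sorted.sort(order);
        int from = (int) Math.min((long) pageIndex * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<Molecule> data = Arrays.asList(
                new Molecule("m3", 250000, 31),
                new Molecule("m1", 480000, 12),
                new Molecule("m2", 120000, 20));
        for (Molecule m : page(data, BY_SIZE, 0, 2)) {
            System.out.println(m.name + "\t" + m.size + "\t" + m.signals);
        }
    }
}
```

A page past the end of the data comes back empty rather than throwing, which keeps a "next page" control simple to implement.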

3 Additional features

OMTools contains several modules for processing optical mapping data that can be executed within the same software framework (Supplementary Table S4). Data processing is important for any downstream analysis. OMTools provides tools for filtering optical mapping data, for example by size or number of signals. Since molecules with high signal density or low complexity impede alignment and assembly, OMTools also offers a signal density and complexity filter. A separate module can be applied to detect data duplication errors. Similarly, OMTools provides filtering tools for alignment results. A partial alignment joining module, separated from OMBlast (Leung et al.), can be used to treat alignments from other alignment methods as partial alignments and connect them into final alignments. OMTools can also merge results from various alignment methods and generate statistics for them. A set of simulation modules enables data simulation with a variety of modeling parameters. Several categories of errors are modeled, including missing and extra signals and scaling, measurement and resolution errors, with optional structural variations added to the reference or the data for testing structural variation detection software.
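The size, signal-count and density filters described above can be combined in a single pass. The sketch below shows one way to do this under stated assumptions. The class `MoleculeFilterSketch`, its methods and the threshold values are hypothetical and are not taken from OMTools. Density is assumed here to mean signals per 100 kb, and the complexity filter is omitted because the text does not define it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative molecule filter; names and thresholds are hypothetical, not OMTools API. */
final class MoleculeFilterSketch {

    /** Minimal stand-in for a molecule with its signal positions. */
    static final class Molecule {
        final String name;
        final long size;        // molecule length in bp
        final long[] signals;   // signal positions in bp from the molecule start

        Molecule(String name, long size, long[] signals) {
            this.name = name;
            this.size = size;
            this.signals = signals;
        }

        /** Signal density, assumed here to be signals per 100 kb of molecule length. */
        double signalsPer100kb() {
            return signals.length * 100_000.0 / size;
        }
    }

    /** Keeps molecules that are long enough, have enough signals and are not too dense. */
    static List<Molecule> filter(List<Molecule> input, long minSize, int minSignals,
                                 double maxSignalsPer100kb) {
        List<Molecule> kept = new ArrayList<>();
        for (Molecule m : input) {
            boolean longEnough = m.size >= minSize;
            boolean enoughSignals = m.signals.length >= minSignals;
            boolean notTooDense = m.signalsPer100kb() <= maxSignalsPer100kb;
            if (longEnough && enoughSignals && notTooDense) {
                kept.add(m);
            }
        }
        return kept;
    }

    /** Helper for the demo: n evenly spaced signal positions on a molecule of the given size. */
    static long[] evenSignals(long size, int n) {
        long[] positions = new long[n];
        for (int i = 0; i < n; i++) {
            positions[i] = (i + 1) * size / (n + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Molecule> data = new ArrayList<>();
        data.add(new Molecule("short", 50_000, evenSignals(50_000, 6)));
        data.add(new Molecule("sparse", 200_000, evenSignals(200_000, 3)));
        data.add(new Molecule("dense", 200_000, evenSignals(200_000, 80)));
        data.add(new Molecule("good", 200_000, evenSignals(200_000, 20)));
        // Illustrative thresholds only: at least 100 kb, at least 5 signals, at most 25 signals per 100 kb.
        for (Molecule m : filter(data, 100_000, 5, 25.0)) {
            System.out.println(m.name);
        }
    }
}
```

Applying all three criteria in one loop means each molecule is read once, which matters for large-scale datasets. Separate filter modules, as in the description above, would simply run such predicates one at a time.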

4 Conclusions

OMTools offers a fundamental toolbox for optical mapping data processing. OMView serves as a powerful visualization tool for data analysis and illustration. On top of the existing modules and methods included in the OMTools Java library, researchers could build additional analysis modules with minimal effort.

References

1. JBrowse: a next-generation genome browser.

Authors: Mitchell E Skinner; Andrew V Uzilov; Lincoln D Stein; Christopher J Mungall; Ian H Holmes
Journal: Genome Res Date: 2009-07-01 Impact factor: 9.043

2. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus).

Authors: Yang Dong; Min Xie; Yu Jiang; Nianqing Xiao; Xiaoyong Du; Wenguang Zhang; Gwenola Tosser-Klopp; Jinhuan Wang; Shuang Yang; Jie Liang; Wenbin Chen; Jing Chen; Peng Zeng; Yong Hou; Chao Bian; Shengkai Pan; Yuxiang Li; Xin Liu; Wenliang Wang; Bertrand Servin; Brian Sayre; Bin Zhu; Deacon Sweeney; Rich Moore; Wenhui Nie; Yongyi Shen; Ruoping Zhao; Guojie Zhang; Jinquan Li; Thomas Faraut; James Womack; Yaping Zhang; James Kijas; Noelle Cockett; Xun Xu; Shuhong Zhao; Jun Wang; Wen Wang
Journal: Nat Biotechnol Date: 2012-12-23 Impact factor: 54.908

3. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly.

Authors: Ernest T Lam; Alex Hastie; Chin Lin; Dean Ehrlich; Somes K Das; Michael D Austin; Paru Deshpande; Han Cao; Niranjan Nagarajan; Ming Xiao; Pui-Yan Kwok
Journal: Nat Biotechnol Date: 2012-08 Impact factor: 54.908

4. Use of optical mapping to sort uropathogenic Escherichia coli strains into distinct subgroups.

Authors: William R Schwan; Adam Briska; Buffy Stahl; Trevor K Wagner; Emily Zentz; John Henkhaus; Steven D Lovrich; William A Agger; Steven M Callister; Brian DuChateau; Colin W Dykes
Journal: Microbiology (Reading) Date: 2010-04-08 Impact factor: 2.777

5. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology.

Authors: Hongzhi Cao; Alex R Hastie; Dandan Cao; Ernest T Lam; Yuhui Sun; Haodong Huang; Xiao Liu; Liya Lin; Warren Andrews; Saki Chan; Shujia Huang; Xin Tong; Michael Requa; Thomas Anantharaman; Anders Krogh; Huanming Yang; Han Cao; Xun Xu
Journal: Gigascience Date: 2014-12-30 Impact factor: 6.524

6. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

Authors: Angel C Y Mak; Yvonne Y Y Lai; Ernest T Lam; Tsz-Piu Kwok; Alden K Y Leung; Annie Poon; Yulia Mostovoy; Alex R Hastie; William Stedman; Thomas Anantharaman; Warren Andrews; Xiang Zhou; Andy W C Pang; Heng Dai; Catherine Chu; Chin Lin; Jacob J K Wu; Catherine M L Li; Jing-Woei Li; Aldrin K Y Yim; Saki Chan; Justin Sibert; Željko Džakula; Han Cao; Siu-Ming Yiu; Ting-Fung Chan; Kevin Y Yip; Ming Xiao; Pui-Yan Kwok
Journal: Genetics Date: 2015-10-28 Impact factor: 4.562

7. OMBlast: alignment tool for optical mapping using a seed-and-extend approach.

Authors: Alden King-Yung Leung; Tsz-Piu Kwok; Raymond Wan; Ming Xiao; Pui-Yan Kwok; Kevin Y Yip; Ting-Fung Chan
Journal: Bioinformatics Date: 2017-02-01 Impact factor: 6.937

8. Tools and pipelines for BioNano data: molecule assembly pipeline and FASTA super scaffolding tool.

Authors: Jennifer M Shelton; Michelle C Coleman; Nic Herndon; Nanyan Lu; Ernest T Lam; Thomas Anantharaman; Palak Sheth; Susan J Brown
Journal: BMC Genomics Date: 2015-09-29 Impact factor: 3.969
