Hasan H Otu¹, Khalid Sayood. ¹University of Nebraska-Lincoln, Department of Electrical Engineering, 209N WSEC, 68503, USA. otu@eecomm.unl.edu
Abstract
MOTIVATION: One of the major problems in DNA sequencing is assembling the fragments obtained by shotgun sequencing. Most existing fragment assembly techniques follow the overlap-layout-consensus approach. This framework requires extensive computation in each phase and becomes inefficient as the number of fragments increases. RESULTS: We propose a new algorithm that solves the overlap, layout, and consensus phases simultaneously. The fragments are clustered by their Average Mutual Information (AMI) profiles using the k-means algorithm. This removes the unnecessary burden of considering the collection of fragments as a whole; instead, orientation and overlap detection are solved efficiently within the clusters. The algorithm has successfully reconstructed both artificial and real data. AVAILABILITY: Available on request from the authors.
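The clustering step can be illustrated with a minimal sketch. The abstract does not define the AMI profile, so the code below assumes a common formulation: for each lag k, the mutual information (in bits) between the bases at positions i and i+k, collected for k = 1..max_lag. The function names `ami_profile` and `kmeans`, the `max_lag` parameter, and the naive centroid initialization are all hypothetical choices made for illustration; this is not the authors' implementation, and it does not cover orientation or overlap detection.

```python
import math
from collections import Counter


def ami_profile(seq, max_lag=4):
    """Return [I(1), ..., I(max_lag)], where I(k) is the mutual information
    (bits) between bases k positions apart. Assumed definition, see lead-in."""
    seq = seq.upper()
    profile = []
    for k in range(1, max_lag + 1):
        total = len(seq) - k
        if total <= 0:
            profile.append(0.0)
            continue
        left, right = seq[:total], seq[k:]
        pairs = Counter(zip(left, right))      # joint counts at lag k
        left_counts = Counter(left)            # marginal of the first base
        right_counts = Counter(right)          # marginal of the second base
        mi = 0.0
        for (x, y), count in pairs.items():
            p_xy = count / total
            p_x = left_counts[x] / total
            p_y = right_counts[y] / total
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
        profile.append(mi)
    return profile


def kmeans(points, k, max_iters=50):
    """Plain Lloyd's k-means. Centroids start at the first k points,
    a naive deterministic choice made only to keep the sketch short."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:               # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                        # keep old centroid if cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy fragments: two with period-4 structure, two with period-2 structure.
fragments = ["ACGT" * 10, "AC" * 20, "CGTA" * 10, "CA" * 20]
profiles = [ami_profile(f) for f in fragments]
labels = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy input the period-4 fragments have profiles near 2 bits at every lag and the period-2 fragments near 1 bit, so fragments with similar statistical structure land in the same cluster. Real shotgun reads would give far less separated profiles, and the number of clusters and lag range would need to be chosen for the data.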