
Convex hulls in hamming space enable efficient search for similarity and clustering of genomic sequences.

David S Campo, Yury Khudyakov.

Abstract

BACKGROUND: In molecular epidemiology, comparison of intra-host viral variants among infected persons is frequently used to trace transmissions in human populations and to detect viral infection outbreaks. Ultra-Deep Sequencing (UDS) greatly increases the sensitivity of transmission detection but brings considerable computational challenges when all pairs of sequences must be compared. We developed a new population comparison method based on convex hulls in Hamming space. We applied this method to a large set of UDS samples obtained from unrelated cases infected with hepatitis C virus (HCV) and compared its performance with that of three previously published methods.
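Comparing two intra-host populations sequence by sequence costs |A|·|B| Hamming computations for every pair of samples. The sketch below illustrates one population-level quantity, the average Hamming distance between two sets, computed from per-column character counts instead of from explicit sequence pairs. It assumes equal-length aligned sequences and a per-column count representation of each population; the abstract does not specify how the authors store the hull, and the names `build_profile`, `mean_between` and `mean_between_bruteforce` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import List

# Hypothetical representation (our assumption): one Counter per alignment
# column, mapping each observed character to how many sequences carry it.
Profile = List[Counter]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def build_profile(seqs: List[str]) -> Profile:
    """Per-column character counts for a set of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def mean_between(p: Profile, n_p: int, q: Profile, n_q: int) -> Fraction:
    """Average Hamming distance over all n_p * n_q cross-set pairs.

    In each column, sum_c count_p[c] * count_q[c] cross pairs match;
    every remaining pair mismatches and adds 1 to its distance.
    """
    pairs = n_p * n_q
    total = Fraction(0)
    for col_p, col_q in zip(p, q):
        matches = sum(cnt * col_q[c] for c, cnt in col_p.items())
        total += Fraction(pairs - matches, pairs)
    return total


def mean_between_bruteforce(a: List[str], b: List[str]) -> Fraction:
    """Reference answer: enumerate every cross-set pair explicitly."""
    return Fraction(sum(hamming(x, y) for x, y in product(a, b)), len(a) * len(b))


A = ["ACGT", "ACGA", "TCGT"]
B = ["ACGT", "GCGT"]
print(mean_between(build_profile(A), len(A), build_profile(B), len(B)))  # 1
print(mean_between_bruteforce(A, B))  # 1
```

Each profile is built once per sample, so comparing two samples this way costs time proportional to alignment length times alphabet size rather than |A|·|B|·length. Exact `Fraction` arithmetic is used only so the two results can be compared without rounding noise.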
RESULTS: The convex hull in Hamming space is a data structure that provides information on: (1) the average Hamming distance within a set; (2) the average Hamming distance between two sets; (3) the closeness centrality of each sequence; and (4) the lower and upper bounds of all pairwise distances among the members of two sets. Used as a filtering strategy, it rapidly and correctly removes 96.2% of all pairwise HCV sample comparisons, outperforming all previous methods. The convex hull distance (CHD) algorithm showed variable performance depending on the sequence heterogeneity of the studied populations in real and simulated datasets, suggesting that clustering methods could improve its performance. To address this issue, we developed a new clustering algorithm, k-hulls, that reduces the heterogeneity of the convex hull. This efficient algorithm is an extension of the k-means algorithm and can be used with any type of categorical data. It is 6.8 times more accurate than k-modes, a previously developed clustering algorithm for categorical data.
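Summaries (1), (3) and (4) above can be illustrated with a small sketch. It rests on our own assumptions, not on the authors' code: sequences are aligned to equal length, each set is kept as per-column character counts, closeness centrality is the reciprocal of a sequence's mean distance to the other members (the standard definition), and the bounds are the simplest ones derivable from the characters observed in each column. A column adds to the lower bound when the two sets share no character there, and to the upper bound unless both sets carry one and the same character. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Tuple

Profile = List[Counter]  # assumed: one character Counter per alignment column


def build_profile(seqs: List[str]) -> Profile:
    """Per-column character counts for a set of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def mean_within(profile: Profile, n: int) -> Fraction:
    """(1) Average Hamming distance over all pairs inside one set (n >= 2)."""
    ordered_pairs = n * (n - 1)
    total = Fraction(0)
    for col in profile:
        same = sum(c * (c - 1) for c in col.values())  # ordered matching pairs
        total += Fraction(ordered_pairs - same, ordered_pairs)
    return total


def mean_distance_to_others(seq: str, profile: Profile, n: int) -> Fraction:
    """(3) Mean distance from member `seq` to the other n - 1 members (n >= 2).

    In each column, the n - count[ch] members with another character mismatch.
    Closeness centrality is the reciprocal of this value.
    """
    return sum((Fraction(n - col[ch], n - 1) for ch, col in zip(seq, profile)), Fraction(0))


def distance_bounds(p: Profile, q: Profile) -> Tuple[int, int]:
    """(4) Bounds that hold for every pair (a in A, b in B)."""
    lower = upper = 0
    for col_p, col_q in zip(p, q):
        chars_p, chars_q = set(col_p), set(col_q)
        if chars_p.isdisjoint(chars_q):
            lower += 1  # no shared character: every cross pair mismatches here
        if not (len(chars_p) == 1 and chars_p == chars_q):
            upper += 1  # at least one cross pair may mismatch here
    return lower, upper


A = ["ACGT", "ACGA", "TCGT"]
B = ["ACGT", "GCGT"]
pa = build_profile(A)
print(mean_within(pa, len(A)))                      # 4/3
print(mean_distance_to_others("ACGA", pa, len(A)))  # 3/2
print(distance_bounds(pa, build_profile(B)))        # (0, 2)
```

For this toy pair the bounds happen to be tight: the actual cross-set distances range from 0 to 2. On heterogeneous populations the per-column character sets grow, the lower bound shrinks and the upper bound grows, which is consistent with the abstract's observation that CHD performance depends on sequence heterogeneity.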
CONCLUSIONS: CHD is a fast and efficient filtering strategy that massively reduces the computational burden of pairwise comparison among large samples of sequences, thereby aiding the calculation of transmission links among infected individuals with threshold-based methods. In addition, the convex hull efficiently provides important summary metrics for intra-host viral populations.
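How distance bounds can cut down threshold-based transmission detection is sketched below. This is our illustration, not the published CHD procedure. It assumes that two samples count as linked when their closest pair of sequences lies within a hypothetical `threshold`, that all samples are aligned to the same length, and that per-column sets of observed characters stand in for the hull. Only sample pairs the bounds cannot decide are sent to an exact all-pairs comparison. All names are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Set, Tuple


def hull(seqs: List[str]) -> List[Set[str]]:
    """Per-column set of observed characters (assumed hull representation)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [set(col) for col in zip(*seqs)]


def bounds(h1: List[Set[str]], h2: List[Set[str]]) -> Tuple[int, int]:
    """Lower/upper bounds on the Hamming distance of any cross-sample pair."""
    lower = sum(1 for a, b in zip(h1, h2) if a.isdisjoint(b))
    upper = sum(1 for a, b in zip(h1, h2) if not (len(a) == 1 and a == b))
    return lower, upper


def min_distance(s1: List[str], s2: List[str]) -> int:
    """Exact minimum cross-sample Hamming distance (the expensive step)."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in product(s1, s2))


def find_links(samples: Dict[str, List[str]], threshold: int):
    """Return linked sample pairs and how many needed an exact comparison."""
    hulls = {name: hull(seqs) for name, seqs in samples.items()}
    links: List[Tuple[str, str]] = []
    exact_checks = 0
    for x, y in combinations(sorted(samples), 2):
        lower, upper = bounds(hulls[x], hulls[y])
        if lower > threshold:
            continue  # even the closest pair exceeds the threshold: filtered out
        if upper <= threshold:
            links.append((x, y))  # even the farthest pair is within the threshold
            continue
        exact_checks += 1
        if min_distance(samples[x], samples[y]) <= threshold:
            links.append((x, y))
    return links, exact_checks


samples = {
    "P1": ["AAAAAAAA", "AAAAAAAC"],
    "P2": ["AAAAAAAT", "AAAAAACA"],
    "P3": ["CCCCCCCC", "CCCCCCCA"],
    "P4": ["AAAAAAAA"],
}
print(find_links(samples, threshold=1))
# ([('P1', 'P2'), ('P1', 'P4'), ('P2', 'P4')], 2)
```

Of the six sample pairs in this toy input, four are decided by the bounds alone and only two require the exact all-pairs comparison; the result is the same as checking every pair exactly.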


Keywords:  Centrality; Clustering; Hamming; Population distance

Year:  2020        PMID: 33375937      PMCID: PMC7772912          DOI: 10.1186/s12859-020-03811-z

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169

