
Improve homology search sensitivity of PacBio data by correcting frameshifts.

Nan Du, Yanni Sun.

Abstract

MOTIVATION: Single-molecule real-time (SMRT) sequencing, developed by Pacific Biosciences, produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts, which may yield only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random ones, and this ambiguity introduces errors into structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups use SMRT sequencing, there is an urgent need for dedicated homology search tools for PacBio data.
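
A minimal illustration (not from the paper) of why a single indel is so damaging: translating a short coding sequence before and after deleting one base shows that every residue downstream of the deletion changes. The gene sequence and the deletion position are made up for illustration; only the Python standard library and the standard genetic code (NCBI translation table 1) are used.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 0, dropping any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


gene = "ATGGCTGAAAAACGTTGGCTG"  # hypothetical 7-codon gene
read = gene[:6] + gene[7:]      # simulated single-base deletion, as in a PacBio read

print(translate(gene))  # MAEKRWL
print(translate(read))  # MAKNVG: every residue after the deletion changes
```

Only the two residues upstream of the deletion survive, so an alignment of the translated read against the true protein is short and scores poorly, which is the ambiguity described above.
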
RESULTS: In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results show that our method enables more sensitive homology search, especially for PacBio data sets with low sequencing coverage. In addition, our tool corrects more errors than a popular error correction tool that does not rely on hybrid sequencing.
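
To make frameshift-tolerant homology search concrete, here is a hypothetical sketch. It is not Frame-Pro's algorithm: Frame-Pro aligns reads against profiles of protein families, whereas this toy performs a local alignment of a DNA read against a single protein with identity scoring, and the penalties (`GAP`, `FRAMESHIFT`) and the function names are assumptions. The dynamic program consumes three nucleotides per residue in frame, and optionally allows a one-base insertion (skip a base) or a one-base deletion (restore the missing base in whichever way best matches the residue) at a fixed penalty.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Toy parameters (hypothetical, not taken from Frame-Pro).
MATCH, MISMATCH = 2, -1  # identity scoring instead of a substitution matrix
GAP = 3                  # whole-codon / whole-residue gap
FRAMESHIFT = 4           # 1-nt indel that breaks the reading frame


def aa_score(codon: str, aa: str) -> int:
    """Score one codon against one residue; unknown codons count as mismatches."""
    return MATCH if CODON_TABLE.get(codon) == aa else MISMATCH


def local_score(read: str, protein: str, allow_frameshift: bool = True) -> int:
    """Best local score of a DNA read against a protein, optionally across frames."""
    n, m = len(read), len(protein)
    # dp[i][j]: best score of an alignment ending at read[:i] and protein[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            cands = [0]  # local alignment may start anywhere
            if i >= 3 and j >= 1:  # in-frame codon vs residue
                cands.append(dp[i - 3][j - 1] + aa_score(read[i - 3:i], protein[j - 1]))
            if i >= 3:  # extra codon in the read
                cands.append(dp[i - 3][j] - GAP)
            if j >= 1:  # residue missing from the read
                cands.append(dp[i][j - 1] - GAP)
            if allow_frameshift:
                # One inserted base: skip it and stay on the same residue.
                cands.append(dp[i - 1][j] - FRAMESHIFT)
                if i >= 2 and j >= 1:
                    # One deleted base: try every way to restore a full codon.
                    pair = read[i - 2:i]
                    restored = max(
                        aa_score(pair[:p] + b + pair[p:], protein[j - 1])
                        for p in range(3)
                        for b in "ACGT"
                    )
                    cands.append(dp[i - 2][j - 1] + restored - FRAMESHIFT)
            dp[i][j] = max(cands)
            best = max(best, dp[i][j])
    return best


if __name__ == "__main__":
    read = "ATGGCTAAAAACGTTGGCTG"  # hypothetical read: GAA codon lost one base
    protein = "MAEKRWL"
    print(local_score(read, protein, allow_frameshift=False))  # 8
    print(local_score(read, protein))                          # 10
```

With frame-locked alignment only the four residues KRWL align (score 8); allowing a single frameshift recovers the full-length alignment (score 10), and the best restored codon at the deletion (GAA for E) is exactly the kind of correction a frameshift-aware tool can report. A real tool would use a substitution matrix or profile scores and calibrated penalties instead of these toy values.
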
AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://sourceforge.net/projects/frame-pro/
CONTACT: yannisun@msu.edu.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2016        PMID: 27587671     DOI: 10.1093/bioinformatics/btw458

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  RIFRAF: a frame-resolving consensus algorithm.

Authors:  Kemal Eren; Ben Murrell
Journal:  Bioinformatics       Date:  2018-11-15       Impact factor: 6.937

2.  Genome-wide analysis of WOX genes in upland cotton and their expression pattern under different stresses.

Authors:  Zhaoen Yang; Qian Gong; Wenqiang Qin; Zuoren Yang; Yuan Cheng; Lili Lu; Xiaoyang Ge; Chaojun Zhang; Zhixia Wu; Fuguang Li
Journal:  BMC Plant Biol       Date:  2017-07-06       Impact factor: 4.215

3.  Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi.

Authors:  G Margos; S Hepner; C Mang; D Marosevic; S E Reynolds; S Krebs; A Sing; M Derdakova; M A Reiter; V Fingerle
Journal:  BMC Genomics       Date:  2017-05-30       Impact factor: 3.969

4.  Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes.

Authors:  Y M Suvorova; M A Korotkova; K G Skryabin; E V Korotkov
Journal:  DNA Res       Date:  2019-04-01       Impact factor: 4.458

5.  Improving protein domain classification for third-generation sequencing reads using deep learning.

Authors:  Nan Du; Jiayu Shang; Yanni Sun
Journal:  BMC Genomics       Date:  2021-04-09       Impact factor: 3.969

6.  Transcriptomics and Metabolomics Reveal Purine and Phenylpropanoid Metabolism Response to Drought Stress in Dendrobium sinense, an Endemic Orchid Species in Hainan Island.

Authors:  Cuili Zhang; Jinhui Chen; Weixia Huang; Xiqiang Song; Jun Niu
Journal:  Front Genet       Date:  2021-07-02       Impact factor: 4.599

7.  Targeted Long-Read Sequencing of a Locus Under Long-Term Balancing Selection in Capsella.

Authors:  Jörg A Bachmann; Andrew Tedder; Benjamin Laenen; Kim A Steige; Tanja Slotte
Journal:  G3 (Bethesda)       Date:  2018-03-28       Impact factor: 3.154

8.  A new full-length circular DNA sequencing method for viral-sized genomes reveals that RNAi transgenic plants provoke a shift in geminivirus populations in the field.

Authors:  Devang Mehta; Matthias Hirsch-Hoffmann; Mariam Were; Andrea Patrignani; Syed Shan-E-Ali Zaidi; Hassan Were; Wilhelm Gruissem; Hervé Vanderschuren
Journal:  Nucleic Acids Res       Date:  2019-01-25       Impact factor: 16.971
