A sensitive repeat identification framework based on short and long reads.

Xingyu Liao, Min Li, Kang Hu, Fang-Xiang Wu, Xin Gao, Jianxin Wang.

Abstract

Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, since NGS reads are too short to identify long repeats, whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer based multiple sequence alignment, for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves the efficiency of data processing; and (v) by applying corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves more satisfactory results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
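
For readers unfamiliar with k-mer based repeat detection, the following is a minimal sketch of the high-frequency k-mer idea that the abstract contrasts LongRepMarker against. It is not LongRepMarker's multi-alignment unique k-mer method, whose details the abstract does not give. The function names `kmer_counts` and `repeat_intervals` and the `min_count` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (length-k substring) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_intervals(seq, k, min_count=2):
    """Return merged half-open [start, end) intervals covered by k-mers
    that occur at least min_count times in seq.

    Illustrative baseline only (assumption): a k-mer is treated as
    'repetitive' purely by its frequency within a single sequence.
    """
    counts = kmer_counts(seq, k)
    intervals = []
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            start, end = i, i + k
            # Merge with the previous interval when they overlap or touch.
            if intervals and start <= intervals[-1][1]:
                intervals[-1][1] = max(intervals[-1][1], end)
            else:
                intervals.append([start, end])
    return [tuple(iv) for iv in intervals]


if __name__ == "__main__":
    # Toy sequence: two copies of GATTACA separated by a short spacer.
    print(repeat_intervals("GATTACATTTGATTACA", k=4))
```

On real data, sequencing errors and low-copy repeats make a plain frequency threshold unreliable, which is the kind of limitation that motivates alignment-based alternatives such as the one the abstract describes.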

Year:  2021        PMID: 34214175      PMCID: PMC8464074          DOI: 10.1093/nar/gkab563

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971
