A sensitive repeat identification framework based on short and long reads.

Xingyu Liao¹,², Min Li¹, Kang Hu¹, Fang-Xiang Wu³, Xin Gao², Jianxin Wang¹.

Abstract

Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, since NGS (next-generation sequencing) reads are too short to identify long repeats, whereas SMS (single-molecule sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer based multiple sequence alignment, for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves data-processing efficiency; and (v) by applying corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves more satisfactory results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
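
The idea behind point (iii), marking repeats from distinct k-mers that match more than one location, can be illustrated with a toy sketch. This is not LongRepMarker's implementation: the function names `kmer_positions` and `mark_repeats` are hypothetical, and exact string matching within a single sequence stands in, as a simplifying assumption, for the tool's alignment of unique k-mers against overlap sequences.

```python
from __future__ import annotations

from collections import defaultdict


def kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map each distinct k-mer to every start position where it occurs."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def mark_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Return merged half-open intervals covered by k-mers found at 2+ positions.

    Toy stand-in (assumption): exact matching replaces real read alignment.
    """
    # Collect the span of every occurrence of a k-mer that occurs more than once.
    covered = sorted(
        (start, start + k)
        for starts in kmer_positions(seq, k).values()
        if len(starts) > 1
        for start in starts
    )
    # Merge overlapping or adjacent spans into maximal repeat intervals.
    merged: list[tuple[int, int]] = []
    for start, end in covered:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # "ACGTAC" appears at both ends of this made-up sequence.
    print(mark_repeats("ACGTACGGATTCACGTAC", 4))  # [(0, 6), (12, 18)]
```

Each distinct k-mer is considered once regardless of how often it occurs, which is one plausible reading of why unique k-mers would give more stable results than a frequency threshold; the abstract itself does not spell out the mechanism.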

Year: 2021 PMID: 34214175 PMCID: PMC8464074 DOI: 10.1093/nar/gkab563

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
