WindowMasker: window-based masker for sequenced genomes.

Aleksandr Morgulis, E Michael Gertz, Alejandro A Schäffer, Richa Agarwala.

Abstract

MOTIVATION: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query if they can be masked in the database. RepeatMasker/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes.
RESULTS: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome using only the sequence of the genome itself. WM is orders of magnitude faster than RM because it uses a few linear-time scans of the genome sequence rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results also hold for transcribed regions. WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis.

AVAILABILITY: WM is included in the NCBI C++ toolkit. The source code for the entire toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. Once the toolkit source is unpacked, instructions for building the WindowMasker application in a UNIX environment can be found in the file src/app/winmasker/README.build.

SUPPLEMENTARY INFORMATION: Supplementary data are available at ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf
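The abstract states only that WM masks repeats "using a few linear-time scans of the genome sequence" instead of library-based alignment. The following is a minimal, hypothetical sketch of that general idea under stated assumptions, not WindowMasker's actual scoring or threshold selection. The first scan counts every k-mer ("unit"), and the second slides a window of consecutive units and soft-masks windows whose mean unit count is high. The function names, the parameters `k`, `window` and `threshold`, the strand-merging `canonical` step, and the lowercase soft-masking convention are all illustrative assumptions.

```python
from collections import Counter

# Complement table for A/C/G/T; N maps to itself so ambiguous bases survive.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(unit: str) -> str:
    """Return the lexicographically smaller of a unit and its reverse complement,
    so both strands of a repeat are counted together (an assumption of this sketch)."""
    rc = unit.translate(_COMPLEMENT)[::-1]
    return min(unit, rc)


def count_units(seq: str, k: int) -> Counter:
    """Scan 1: count every k-mer ('unit') in a single left-to-right pass.
    Linear in the sequence length for a fixed k."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        unit = seq[i:i + k]
        if "N" not in unit:  # skip units that overlap ambiguous bases
            counts[canonical(unit)] += 1
    return counts


def mask_repeats(seq: str, k: int = 4, window: int = 3, threshold: float = 2.0) -> str:
    """Scan 2: slide a window of `window` consecutive units along the sequence and
    soft-mask (lowercase) every base covered by a window whose mean unit count is
    at least `threshold`. The default k, window and threshold are hypothetical."""
    seq = seq.upper()
    counts = count_units(seq, k)
    n_units = len(seq) - k + 1
    masked = [False] * len(seq)
    for start in range(n_units - window + 1):
        units = (canonical(seq[start + j:start + j + k]) for j in range(window))
        score = sum(counts.get(u, 0) for u in units) / window
        if score >= threshold:
            # The window's units cover bases start .. start + window + k - 2 inclusive.
            for pos in range(start, start + window + k - 1):
                masked[pos] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))
```

For example, `mask_repeats("A" * 20)` returns a run of 20 lowercase `a` characters, because the single unit `AAAA` occurs 17 times and every window exceeds the threshold. No base-by-base alignment against a repeat library is needed. A realistic masker would choose its thresholds from the genome's unit-count statistics rather than use a fixed constant.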

Substances: DNA

Year: 2005
PMID: 16287941
DOI: 10.1093/bioinformatics/bti774

Source DB: PubMed
Journal: Bioinformatics
ISSN: 1367-4803
Impact factor: 6.937
