Realistic artificial DNA sequences as negative controls for computational genomics.

Juan Caballero, Arian F A Smit, Leroy Hood, Gustavo Glusman.

Abstract

A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
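
The contrast the abstract draws between 'shuffling' real sequences and generating artificial sequences from a model of real ones can be illustrated with a minimal sketch. This is not the authors' method: their model also accounts for complexity and interspersed repeat content, which the sketch ignores. The function names (`shuffle_sequence`, `train_markov`, `generate`), the toy input and the choice of a fixed-order Markov chain are all assumptions made here for illustration only.

```python
import random
from collections import Counter, defaultdict


def shuffle_sequence(seq, rng):
    """Permute the bases of seq; only single-base composition is preserved."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def train_markov(seq, k):
    """Count which base follows each k-mer in seq (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def generate(counts, k, length, rng):
    """Emit an artificial sequence from the k-th order Markov counts."""
    if not counts:
        raise ValueError("training sequence too short for this k")
    contexts = sorted(counts)
    out = list(rng.choice(contexts))
    while len(out) < length:
        successors = counts.get("".join(out[-k:]))
        if not successors:
            # Context never seen with a following base: restart from a random k-mer.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*sorted(successors.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


if __name__ == "__main__":
    rng = random.Random(0)
    real = "ACGTTGCAACGGTTAACCGGATATCGCG" * 5  # toy stand-in for intergenic DNA
    print(shuffle_sequence(real, rng)[:40])
    print(generate(train_markov(real, 2), 2, 40, rng))
```

A shuffled sequence keeps the base counts of its source but destroys all local context, whereas the Markov-generated sequence retains short-range (k+1)-mer statistics. Neither reproduces repeat content, which is one of the properties the abstract says its artificial sequences are modeled on.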

Year: 2014; PMID: 24803667; PMCID: PMC4081056; DOI: 10.1093/nar/gku356

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


