Annotating genes and genomes with DNA sequences extracted from biomedical articles.

Maximilian Haeussler, Martin Gerner, Casey M Bergman.

Abstract

MOTIVATION: Increasing rates of publication and DNA sequencing make finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. Such names and identifiers are often not unique and do not identify the exact genomic location of a study.
RESULTS: Here, we report the results of text2genome, a novel text-mining approach that extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. We find that ∼20% of open-access articles in PubMed Central (PMC) contain extractable DNA sequences that can be accurately mapped to the correct gene (91%) and genome (96%). We illustrate the utility of data extracted by text2genome from more than 150 000 PMC articles for the interpretation of ChIP-seq data and the design of quantitative reverse transcriptase (RT)-PCR experiments.
CONCLUSION: Our approach links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, thereby allowing researchers to use the power of modern genome browsers to access and analyze publications in the context of genomic data.
AVAILABILITY AND IMPLEMENTATION: Source code is available under a BSD license from http://sourceforge.net/projects/text2genome/ and results can be browsed and downloaded at http://text2genome.org.
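The extract-then-map idea described in the RESULTS and CONCLUSION paragraphs can be illustrated with a minimal sketch. It is not text2genome's actual pipeline, which maps sequences against full genome databases. The sketch assumes a simple regular-expression heuristic (runs of at least 15 uppercase A/C/G/T/U letters, optionally split by single spaces or hyphens), exact string matching on both strands of a tiny in-memory toy genome, and output as six-column BED lines. The function names `extract_sequences`, `map_sequence` and `to_bed`, the 15-nt threshold, the chromosome name `chrToy` and the track label `PMC_example` are all hypothetical.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical heuristic (not text2genome's rule): a run of at least 15
# uppercase nucleotide letters, optionally split by single spaces or hyphens,
# since primers are often printed like "5'-ATG CGT ACG-3'".
DNA_RUN = re.compile(r"[ACGTU](?:[ -]?[ACGTU]){14,}")

# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Hit(NamedTuple):
    chrom: str
    start: int   # 0-based start, as in BED
    end: int     # exclusive end
    strand: str  # "+" if matched as printed, "-" if its reverse complement matched
    seq: str     # the sequence as extracted from the article


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_sequences(text: str) -> list[str]:
    """Return candidate DNA sequences, with separators removed and U read as T."""
    return [re.sub(r"[ -]", "", m.group()).replace("U", "T")
            for m in DNA_RUN.finditer(text)]


def find_all(haystack: str, needle: str) -> Iterator[int]:
    """Yield every (possibly overlapping) start offset of needle in haystack."""
    i = haystack.find(needle)
    while i != -1:
        yield i
        i = haystack.find(needle, i + 1)


def map_sequence(seq: str, genome: dict[str, str]) -> list[Hit]:
    """Exact-match seq against both strands of every chromosome in genome."""
    rc = revcomp(seq)
    hits = []
    for chrom, chrom_seq in genome.items():
        for pos in find_all(chrom_seq, seq):
            hits.append(Hit(chrom, pos, pos + len(seq), "+", seq))
        if rc != seq:  # a reverse-palindromic sequence would otherwise be reported twice
            for pos in find_all(chrom_seq, rc):
                hits.append(Hit(chrom, pos, pos + len(seq), "-", seq))
    return hits


def to_bed(hits: list[Hit], name: str) -> list[str]:
    """Format hits as six-column BED lines (chrom, start, end, name, score, strand)."""
    return [f"{h.chrom}\t{h.start}\t{h.end}\t{name}\t0\t{h.strand}" for h in hits]


if __name__ == "__main__":
    text = "Primers 5'-ATGCGTACGTTAGCCA-3' and 5'-AATTCCGATCGATCGG-3' were used."
    # Toy "genome": the first primer lies on the + strand, the second on the - strand.
    genome = {"chrToy": "GGGG" + "ATGCGTACGTTAGCCA" + "TTTT"
              + revcomp("AATTCCGATCGATCGG") + "CCCC"}
    for seq in extract_sequences(text):
        for line in to_bed(map_sequence(seq, genome), "PMC_example"):
            print(line)
```

Running the script prints one BED line per hit: the first primer at chrToy 4-20 on the + strand and the second at chrToy 24-40 on the - strand. BED text of this kind can be loaded into a genome browser such as the UCSC Genome Browser as a custom track, which is roughly the form of literature annotation track the CONCLUSION describes. A real system would also need to tolerate mismatches, handle sequences split across lines or tables, and search full genomes efficiently, none of which this sketch attempts.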

Year:  2011        PMID: 21325301      PMCID: PMC3065681          DOI: 10.1093/bioinformatics/btr043

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937