
Web scraping technologies in an API world.

Daniel Glez-Peña, Anália Lourenço, Hugo López-Fernández, Miguel Reboiro-Jato, Florentino Fdez-Riverola.   

Abstract

Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. As an example, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not yet offer programmatic interfaces. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to cope with gene set enrichment analysis.
© The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

Keywords:  Web scraping; data integration; database interfaces; interoperability


Year:  2013        PMID: 23632294     DOI: 10.1093/bib/bbt026

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  9 in total

1.  Publication outcome of research presented at the European Congress of Endocrinology: a web scraping-based analysis and critical appraisal.

Authors:  Emre Sedar Saygili; Bulent Okan Yildiz
Journal:  Endocrine       Date:  2021-01-05       Impact factor: 3.633

2.  Technical Job Recommendation System Using APIs and Web Crawling.

Authors:  Naresh Kumar; Manish Gupta; Deepak Sharma; Isaac Ofori
Journal:  Comput Intell Neurosci       Date:  2022-06-21

3.  Vigi4Med Scraper: A Framework for Web Forum Structured Data Extraction and Semantic Representation.

Authors:  Bissan Audeh; Michel Beigbeder; Antoine Zimmermann; Philippe Jaillon; Cédric Bousquet
Journal:  PLoS One       Date:  2017-01-25       Impact factor: 3.240

4.  Digital Trespass: Ethical and Terms-of-Use Violations by Researchers Accessing Data From an Online Patient Community.

Authors:  Paul Wicks; Emil Chiauzzi
Journal:  J Med Internet Res       Date:  2019-02-21       Impact factor: 5.428

5.  What Affects an Orthopaedic Surgeon's Online Rating? A Large-Scale, Retrospective Analysis.

Authors:  Mital D Patel; Marshall D Williams; Merritt J Thompson; Parth N Desai
Journal:  J Am Acad Orthop Surg Glob Res Rev       Date:  2022-03-15

6.  Multimedia Knowledge Translation Tools for Parents About Childhood Heart Failure: Environmental Scan.

Authors:  Chentel Cunningham; Hyelin Sung; James Benoit; Jennifer Conway; Shannon D Scott
Journal:  JMIR Pediatr Parent       Date:  2022-03-21

7.  The Use of the Target Trial Approach in Perinatal Pharmacoepidemiology: A Scoping Review Protocol.

Authors:  Lisiane Freitas Leal; Sonia Marzia Grandi; Daniel Marques Mota; Paulo José Gonçalves Ferreira; Genevieve Gore; Robert William Platt
Journal:  Front Pharmacol       Date:  2022-07-22       Impact factor: 5.988

8.  Decision Support System Applications for Scheduling in Professional Team Sport. The Team's Perspective.

Authors:  Xavier Schelling; Jose Fernández; Patrick Ward; Javier Fernández; Sam Robertson
Journal:  Front Sports Act Living       Date:  2021-06-04

9.  Webcrawling and machine learning as a new approach for the spatial distribution of atmospheric emissions.

Authors:  Susana Lopez-Aparicio; Henrik Grythe; Matthias Vogt; Matthew Pierce; Islen Vallejo
Journal:  PLoS One       Date:  2018-07-16       Impact factor: 3.240

