
RegaDB: community-driven data management and analysis for infectious diseases.

Pieter Libin¹, Gertjan Beheydt, Koen Deforche, Stijn Imbrechts, Fossie Ferreira, Kristel Van Laethem, Kristof Theys, Ana Patricia Carvalho, Joana Cavaco-Silva, Giuseppe Lapadula, Carlo Torti, Matthias Assel, Stefan Wesner, Joke Snoeck, Jean Ruelle, Annelies De Bel, Patrick Lacor, Paul De Munter, Eric Van Wijngaerden, Maurizio Zazzi, Rolf Kaiser, Ahidjo Ayouba, Martine Peeters, Tulio de Oliveira, Luiz C J Alcantara, Zehava Grossman, Peter Sloot, Dan Otelea, Simona Paraschiv, Charles Boucher, Ricardo J Camacho, Anne-Mieke Vandamme.

Abstract

SUMMARY: RegaDB is a free and open source data management and analysis environment for infectious diseases. RegaDB allows clinicians to store, manage and analyse patient data, including viral genetic sequences. Moreover, RegaDB provides researchers with a mechanism to collect data in a uniform format and offers them a canvas to make newly developed bioinformatics tools available to clinicians and virologists through a user friendly interface.
AVAILABILITY AND IMPLEMENTATION: Source code, binaries and documentation are available on http://rega.kuleuven.be/cev/regadb. RegaDB is written in the Java programming language, using a web-service-oriented architecture.


Year: 2013 PMID: 23645815 PMCID: PMC3661054 DOI: 10.1093/bioinformatics/btt162

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Advances in infectious diseases research require efficient collaboration and exchange of clinical and virological data. Researchers need access to large amounts of data to test hypotheses or extract valuable information through data mining (Sloot et al., 2009). For this purpose, RegaDB was developed as a free and open source data management and analysis environment for infectious diseases (Libin et al.). RegaDB runs on Windows, Linux or Mac OS X. The system can be installed within a hospital or institute so that the data stays within the clinical environment. RegaDB follows the idea of an integrated environment for bioinformatics analysis, such as the Genetic Data Environment (de Oliveira et al.), ViroLab (Assel et al.) and Geneious (Drummond et al.). The difference is that RegaDB uses a relational database and can be accessed locally or remotely. This allows RegaDB to be used for clinical management and/or research in one locality or for long-term data-sharing collaborations between different institutes.

2 DATABASE STRUCTURE AND TOOLS

RegaDB’s database enforces the data abstraction paradigm (Fig. 1). This approach ensures flexibility, as the database can in most cases be extended as needed without upgrading its schema (Imbrechts et al.). All abstract data entities are connected to a central patient entity, including attributes, tests, events, therapies and viral isolates. Attributes annotate a patient with information, which is typically of a clinical or epidemiological nature, e.g. the gender or transmission risk group.

RegaDB implements tests as values that are obtained at a given moment in time, i.e. there is only one date associated with each test. The results can be in vivo or in vitro measurements, appointments, as well as computational results obtained from a web service. General tests are used to store data extracted from patient samples, e.g. cell counts and viral loads. Tests can also be linked to viral isolates, e.g. typing and subtyping results, to drugs, e.g. therapeutic drug monitoring, or to a combination of an isolate and a drug, e.g. phenotypic and genotypic resistance interpretations.

Events cover a specific time interval in the patient’s history, i.e. they have a start and end date, e.g. AIDS-defining illnesses or pregnancy. The default list of attributes, tests and events available in the system can be extended via the user interface. In this way, RegaDB can be tailored to the user’s needs or research interests. Attributes, tests and events are annotated with a data type (numbers, strings, nominal values, etc.), which allows the user interface and data access layer to maintain data integrity.

The therapy entity allows users to store the medication history of a patient. A single therapy consists of a start date, a stop date and a combination of drugs, i.e. a regimen, which the users can select from a list of both generic and commercial drug names. When the therapy has a stop date, the clinician can indicate a reason for ending or switching the treatment, e.g. resistance, side effects or adherence issues.
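The entity model described above can be illustrated with a small sketch. This is not RegaDB's actual schema or Java code: the class names, field names and the `validate` helper are hypothetical, chosen only to show a central patient entity with typed attributes, single-date tests, interval events and therapies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AttributeType:
    """Hypothetical attribute definition carrying a declared data type."""
    name: str
    value_type: str              # "number", "string" or "nominal"
    nominal_values: tuple = ()   # allowed values when value_type == "nominal"

    def validate(self, value):
        # Reject values that do not match the declared data type.
        if self.value_type == "number":
            return float(value)
        if self.value_type == "nominal":
            if value not in self.nominal_values:
                raise ValueError(f"{value!r} is not a valid {self.name}")
            return value
        return str(value)


@dataclass
class TestResult:
    test_name: str
    value: float
    test_date: date              # a test has exactly one date


@dataclass
class Event:
    name: str
    start: date                  # an event covers an interval
    end: date


@dataclass
class Therapy:
    start: date
    drugs: list                  # the regimen
    stop: Optional[date] = None
    stop_reason: Optional[str] = None


@dataclass
class Patient:
    """Central entity; every other entity hangs off a patient."""
    patient_id: str
    attributes: dict = field(default_factory=dict)
    tests: list = field(default_factory=list)
    events: list = field(default_factory=list)
    therapies: list = field(default_factory=list)

    def set_attribute(self, attr_type, value):
        self.attributes[attr_type.name] = attr_type.validate(value)
```

In this sketch, adding a new attribute means creating another `AttributeType` instance rather than changing any class, which mirrors the idea of extending the system without a schema upgrade.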

Fig. 1. An overview of RegaDB’s database entities and functionalities

A viral isolate contains one or more nucleotide sequences, allowing multiple sequences extracted from one viral genome to be grouped together. Once an isolate is added to RegaDB, the corresponding pathogen is determined by invoking a web service that implements a BLAST search procedure (Altschul et al.). If RegaDB supports the pathogen, the appropriate reference sequence is loaded and used to perform a codon-correct alignment with frame-shift detection and correction. The alignment procedure finds the protein reading frames encoded by the sequences that make up the isolate. This information, together with all detected point mutations, insertions and deletions, is stored in the database. The alignment web service implements the Needleman–Wunsch algorithm in C++ (Needleman and Wunsch, 1970) to analyse large sequences efficiently. Depending on the pathogen determination returned by the web service, the viral isolate is directed to a typing web service (Alcantara et al.; de Oliveira et al.) and/or resistance interpretation web service (Liu and Shafer, 2006). Table 1 shows detailed information on reference sequences and bioinformatics tools available for the supported pathogens. RegaDB supports the use of bioinformatics tools published on the web as web services.
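To make the alignment step concrete, the following is a minimal sketch of plain Needleman–Wunsch global alignment at the nucleotide level, followed by a helper that lists the differences against the reference. It is not RegaDB's C++ service: it is not codon-aware, performs no frame-shift correction, and the scoring values (`match`, `mismatch`, `gap`) and function names are assumptions made for illustration.

```python
def needleman_wunsch(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align query against ref; returns (aligned_ref, aligned_query, score)."""
    n, m = len(ref), len(query)
    # score[i][j] = best score aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_ref.append(ref[i - 1])
            aligned_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_query.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_query)), score[n][m]


def list_differences(aligned_ref, aligned_query):
    """List substitutions, insertions and deletions with 1-based reference positions."""
    diffs = []
    ref_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            diffs.append(("insertion", ref_pos, q))
        elif q == "-":
            diffs.append(("deletion", ref_pos, r))
        else:
            diffs.append(("substitution", ref_pos, r + ">" + q))
    return diffs
```

For example, aligning the query `ACT` against the reference `ACGT` with the default scores yields the pair `ACGT` / `AC-T`, and `list_differences` reports a single deletion at reference position 3. A codon-correct aligner would additionally constrain gaps so that reading frames are preserved.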

Table 1.

Pathogen	Reference sequence (Genbank accession)	Genotyping	ASI resistance interpretation
HIV-1	HXB2 (K03455)	Rega HIV Subtyping Tool	REGA, HIVDB, ANRS
HIV-2a	ROD (M15390)	Rega HIV Subtyping Tool	REGA, ANRS
HIV-2b	EHO (U27200)	Rega HIV Subtyping Tool	–
HCV	H77 (AF009606)	Oxford HCV Subtyping Tool	–
HTLV	HTLV-1 (J02029)	LASP HTLV-1 Subtyping Tool	–

Pathogens currently supported by RegaDB, annotated with the reference sequence used for alignments and with the subtyping and resistance interpretation bioinformatics tools applied to new isolates of the respective pathogen.

All data can be viewed and edited through a web-based interface. Key parameters of a patient’s clinical history are visualized in a patient chart as a time-line annotated with viral loads, CD4 counts, regimens and viral isolate time points. RegaDB can export patient details into a report document by replacing variables in a user-designed RTF template.

Several tools are already available or are being developed, some of them by users. Drug resistance interpretation can be performed according to several algorithms. For HIV, various versions of the Stanford algorithms (HIVdb, Liu and Shafer, 2006), the Rega algorithms (Van Laethem et al.) and the ANRS algorithms (Meynard et al.) are implemented. For each algorithm, a cumulative overview is available, whereby resistance detected in a patient is taken forward to the last sample. Evolution of a virus isolate is tabulated as amino acid changes compared with the previous isolate from the same patient. Another tool allows plotting a phylogenetic tree constructed from a set of sequences with a pre-defined similarity to a query sequence. To ensure the quality of the sequence database, a tool was developed that can be used to flag potential contaminations, errors in sampling or data entry, superinfection or transmission chains, by detecting unusual intra- or inter-patient evolutionary distances.

Attributes are synchronized with a central repository to ensure compatibility between different RegaDB instances. The central repository contains a collection of standardized data fields and corresponding values such as demographic information (country of origin, transmission risk group, etc.), test results (viral load, cell count, etc.) and drug names (both generic and commercial). In addition, this repository also provides access to the latest versions of drug resistance algorithms. Compatibility functionalities allow the system to be updated, with minimal effort, as new content becomes available.
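The cumulative resistance overview mentioned above can be sketched as a running maximum over a patient's samples in date order. The three-level scale in `LEVELS`, the function name and the input layout are assumptions for illustration only; the actual interpretation algorithms define their own scales and rules.

```python
from datetime import date

# Assumed ordering from least to most resistant (illustrative only).
LEVELS = ["susceptible", "intermediate", "resistant"]


def cumulative_resistance(samples):
    """Carry the highest resistance level seen so far forward to each later sample.

    samples: list of (sample_date, {drug: level}) tuples, in any order.
    Returns a list of (sample_date, {drug: cumulative_level}) in date order.
    """
    worst = {}
    overview = []
    for sample_date, calls in sorted(samples, key=lambda s: s[0]):
        for drug, level in calls.items():
            if drug not in worst or LEVELS.index(level) > LEVELS.index(worst[drug]):
                worst[drug] = level
        overview.append((sample_date, dict(worst)))
    return overview
```

With this approach, a drug scored as resistant in an early sample stays resistant in the overview for the last sample, even if the latest genotype on its own would be scored as susceptible.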

3 OPPORTUNITIES FOR RESEARCHERS

When the development of RegaDB started, several custom-made databases were available that allowed users to enter ambiguous representations of data, for example, different representations for the same medical compound. However, to facilitate efficient data exchange and to make the execution of aggregate queries possible, it is important that data are available in a structured format. By providing support for explicit data types and enforcing these data types through the user interface, RegaDB circumvents many difficulties that might complicate the exchange of data.

RegaDB allows data to be exported in XML format from local data sources (hospitals, institutes), and these exports can be combined in a research database. Data from other databases can be imported via a generic import tool. RegaDB also provides a programming interface, which can be used to develop custom import programs to support more complicated data sources. A procedure to import data encoded in the HICDEP (hicdep.org) format directly into RegaDB is currently under development.

A research database will generally be accessed via the Internet; therefore, authentication is an important security aspect. RegaDB supports password-based authentication by default. The authentication module abstraction allows for a straightforward implementation of alternative authentication back-ends (OpenID, Kerberos, etc.), which makes it possible for RegaDB to connect to existing user management systems. The application only allows registered users to access the system. Once granted access, a user can only access patient information that belongs to a dataset connected to the user’s profile. The owner of the dataset can configure the access of users to this dataset, and revoke the access after a certain analysis or assignment is finished.

Researchers can query RegaDB using the visual query tool, which allows users to define complex queries guided by a user interface. Query definitions can be saved and re-run every time an update of the data becomes available. Work is in progress to support the use of predefined SQL-based queries via the user interface. Query results can be exported to a CSV and/or FASTA file. It is possible to set up an analysis workflow by configuring a query to execute a Python post-processing script. If the script generates statistical data in a graphical format, this is visualized in the query user interface after the query has been executed. When researchers make their tools available as web services, they can be easily integrated in RegaDB, lowering the threshold for clinicians and virologists to use such tools.

RegaDB has been used in several collaborations including the ViroLab EC project (virolab.org). Data from several European hospitals were stored in one RegaDB instance, resulting in a combined dataset of >8000 sequences. During the last phase of the project, we were able to combine our efforts with another EC project, EUResist (euresist.org), resulting in a combined database of >55 000 sequences.

Another example of the utility of RegaDB is the collaborative database used within the Southern African Treatment and Resistance Network (SATuRN). This network has 24 member institutions working in Southern Africa, the region at the epicentre of the HIV epidemic. Currently there are >10 institutions using the SATuRN RegaDB for patient data management, data curation and research. Under SATuRN, >7000 genotypes with treatment and monitoring data have been collected. Using the built-in customized report and query functionality, data of specific attributes are selected, analysed and used to answer specific clinical and research questions (de Oliveira et al.; Manasa et al.). In addition, members of the SATuRN project recently published a book (Rossouw et al.) containing a series of case studies used for training. More than 1450 physicians and nurses have been trained through conferences, workshops and online web-tutorials.
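As an illustration of the CSV and FASTA exports described in this section, the sketch below serializes a list of query result rows into both formats. The row layout (`patient_id`, `isolate_id`, `sequence`), the FASTA header convention and the function names are hypothetical; they do not reflect RegaDB's actual export code or file layout.

```python
import csv
import io


def to_fasta(records, width=60):
    """Render records as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for rec in records:
        # Assumed header convention: isolate id, then patient id.
        lines.append(f">{rec['isolate_id']}|{rec['patient_id']}")
        seq = rec["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


def to_csv(records, columns):
    """Render the selected columns of each record as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

A post-processing script in the sense described above could consume either output, for example reading the CSV to compute summary statistics or passing the FASTA text to a phylogenetics tool.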

4 AVAILABILITY AND USAGE

RegaDB is a software application that can be downloaded from the Internet and installed in a health care or research institute. Documentation, source files and binaries are available on http://rega.kuleuven.be/cev/regadb. Because of its modular and flexible design, RegaDB can be used in many different contexts and settings, from managing patient data in a clinical environment to setting up large-scale research collaborations. Currently, all RegaDB instances are private instances that can only be accessed by a restricted user base. Some of these instances are accessible on the Internet; others are only accessible from within the institute’s intranet. The current version of the software is already used for storing genetic data of HIV-1, HIV-2, HTLV (Araujo et al.) and HCV isolates and related patient and clinical information.

13 in total

1. An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences.

Authors: T De Oliveira; R Miller; M Tarin; S Cassol
Journal: Bioinformatics Date: 2003-01 Impact factor: 6.937

2. A genotypic drug resistance interpretation algorithm that significantly predicts therapy response in HIV-1-infected patients.

Authors: Kristel Van Laethem; Andrea De Luca; Andrea Antinori; Antonella Cingolani; Carlo Federico Perna; Anne-Mieke Vandamme
Journal: Antivir Ther Date: 2002-06

3. Primary drug resistance in South Africa: data from 10 years of surveys.

Authors: Justen Manasa; David Katzenstein; Sharon Cassol; Marie-Louise Newell; Tulio de Oliveira
Journal: AIDS Res Hum Retroviruses Date: 2012-03-12 Impact factor: 2.205

4. Public database for HIV drug resistance in southern Africa.

Authors: Tulio de Oliveira; Robert W Shafer; Christopher Seebregts
Journal: Nature Date: 2010-04-01 Impact factor: 49.962

5. A collaborative environment allowing clinical investigations on integrated biomedical databases.

Authors: Matthias Assel; David van de Vijver; Pieter Libin; Kristof Theys; Daniel Harezlak; Breanndán O Nualláin; Piotr Nowakowski; Marian Bubak; Anne-Mieke Vandamme; Stijn Imbrechts; Raphael Sangeda; Tao Jiang; Dineke Frentz; Peter Sloot
Journal: Stud Health Technol Inform Date: 2009

6. A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors: S B Needleman; C D Wunsch
Journal: J Mol Biol Date: 1970-03 Impact factor: 5.469

7. Web resources for HIV type 1 genotypic-resistance test interpretation.

Authors: Tommy F Liu; Robert W Shafer
Journal: Clin Infect Dis Date: 2006-04-28 Impact factor: 9.079

8. HIV decision support: from molecule to man.

Authors: P M A Sloot; Peter V Coveney; G Ertaylan; V Müller; C A Boucher; M Bubak
Journal: Philos Trans A Math Phys Eng Sci Date: 2009-07-13 Impact factor: 4.226

9. A public HTLV-1 molecular epidemiology database for sequence management and data mining.

Authors: Thessika Hialla Almeida Araujo; Leandro Inacio Souza-Brito; Pieter Libin; Koen Deforche; Dustin Edwards; Antonio Eduardo de Albuquerque-Junior; Anne-Mieke Vandamme; Bernardo Galvao-Castro; Luiz Carlos Junior Alcantara
Journal: PLoS One Date: 2012-09-10 Impact factor: 3.240

10. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

Authors: Luiz Carlos Junior Alcantara; Sharon Cassol; Pieter Libin; Koen Deforche; Oliver G Pybus; Marc Van Ranst; Bernardo Galvão-Castro; Anne-Mieke Vandamme; Tulio de Oliveira
Journal: Nucleic Acids Res Date: 2009-05-29 Impact factor: 16.971
