Clara Amid, Nima Pakseresht, Nicole Silvester, Suran Jayathilaka, Ole Lund, Lukasz D Dynovski, Bálint Á Pataki, Dávid Visontai, Basil Britto Xavier, Blaise T F Alako, Ariane Belka, Jose L B Cisneros, Matthew Cotten, George B Haringhuizen, Peter W Harrison, Dirk Höper, Sam Holt, Camilla Hundahl, Abdulrahman Hussein, Rolf S Kaas, Xin Liu, Rasko Leinonen, Surbhi Malhotra-Kumar, David F Nieuwenhuijse, Nadim Rahman, Carolina Dos S Ribeiro, Jeffrey E Skiby, Dennis Schmitz, József Stéger, János M Szalai-Gindl, Martin C F Thomsen, Simone M Cacciò, István Csabai, Annelies Kroneman, Marion Koopmans, Frank Aarestrup, Guy Cochrane.
Abstract
Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Public health, animal health and food safety would all benefit from rapid data sharing during emergencies. However, ethical, regulatory and institutional challenges, together with a lack of suitable platforms that provide an infrastructure for sharing data in structured formats, often lead to data not being shared at all, or at most being shared as supplementary material in journal publications. Here, we describe an informatics platform that includes workflows for structured storage, management and pre-publication sharing of pathogen sequencing data and its analysis interpretations with relevant stakeholders.
Keywords: FAIR principles; data hubs; data sharing platform; pathogen portal; pathogen sequencing data
Year: 2019 PMID: 31868882 PMCID: PMC6927095 DOI: 10.1093/database/baz136
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Example CDH user workflow.
Figure 2. System architecture.
CDHs
| Name | Topic | Description |
|---|---|---|
| dcc_sibelius | Pilot: Influenza H5N8 | This data hub contains RNA sequence data and metadata of three highly pathogenic avian influenza H5N8 viruses used in the H5N8 pilot project. The aim of this project is to determine the similarity and reproducibility of viral genome consensus determination and minority variant detections between three different but commonly used workflows. |
| dcc_berlioz | Pilot: Ebola | This data hub contains Ebola sequence data generated as part of a relief effort in West Africa. Data public at: |
| dcc_liszt | Global Sewage | This data hub contains the DNA sequences, metagenomics data from the Global Sewage Surveillance project. The data represent all urban sewage samples collected prior to treatment plant inlet from major cities around the world. The study aims to establish a global surveillance of infectious disease agents and AMR. |
| dcc_strauss | Kibera Sewage | This data hub contains the DNA sequences, metagenomics data from the informal settlement of Kibera, Nairobi, Kenya. The data represent sewage samples collected over 3 months from 2 of the 10 population clusters under US CDC surveillance. The study aims to establish a hot spot disease surveillance. |
| dcc_vivaldi | Foodborne pathogen surveillance and epidemiological analysis | This data hub contains the DNA sequences and metagenomics data from the COMPARE Work package 4/7 workgroup of food-borne pathogens. The majority of the data consist of the bacterial pathogens |
| dcc_schubert | AMR working group | This data hub contains DNA sequences of bacterial isolates’ genomes coupled with phenotypic AMR data. The phenotypic data are presented in standardized antibiograms, describing for each isolate the antibiotics tested, the levels of resistance observed, the antimicrobial susceptibility testing method employed, etc. The aim is to provide a platform for exchange of genomic and phenotypic information regarding AMR, thus encouraging surveillance and the development of innovative projects for the prediction of AMR from sequence data. |
| dcc_brahms | Diagnostic metagenomics on clinical samples | Diagnostic metagenomics on clinical samples: Prediction of antibiotic resistance genes and pathogen discovery in shot-gun metagenomic data from swine faeces. |
| dcc_handel | Virus metagenomics | This data hub is used to share Fastq files of NGS experiments, mostly with a metagenomics approach, on clinical samples of patients with hepatitis A and norovirus gastroenteritis. The human reads in the files have been removed before being uploaded. |
| dcc_puccini | Parasites (Comparative Genomics of Intestinal Protozoa) | This data hub contains the raw DNA sequences from isolates of the protozoan Cryptosporidium. The data were generated in collaboration with the UK Cryptosporidium Reference Unit and represent both sporadic and outbreak cases. The study aims to understand the major factors that structure parasites’ genomes by using a comparative genomics approach. Data public at: |
| dcc_beard | Global Sewage snapshot Virome sequencing part | This data hub contains the raw read data of the virus-specific part of the Global Sewage Surveillance project. The data aim to capture the complete DNA and RNA virome of the sampled locations. In addition, the analysis results of the SELECTA-SLIM pipeline can be found in this data hub. |
| dcc_cole | Metagenomics ring trial | This data hub contains the DNA sequences, metagenomics data from the Food Metagenomics ring trial 2018. The data represent DNA- and RNA-derived metagenomics data sets processed from a piece of smoked salmon spiked with a complex mock community consisting of viruses, bacteria, fungi and a parasite. The study aims to compare wet lab protocols using the same starting material. |
| dcc_bromhead | CoVetLab (Colistin resistant Enterobacteriaceae project) | This data hub contains the DNA sequences of single bacterial isolates, primarily antimicrobial-resistant Enterobacteriaceae, from European National Reference Laboratories. The data represent, among others, isolates collected for the EU antimicrobial resistance monitoring in zoonotic and indicator bacteria from humans, animals and food, as well as for the CoVetLab colistin resistance project. The intention is that all data related to the EURL-AR will be submitted to the data hub. |
| dcc_schumann | Bioaccumulation experiment | This data hub contains DNA sequences and metadata to analyse bioaccumulated oysters. In addition, the analysis results of the SELECTA-SLIM pipeline can be found in this data hub. |
Updates will be made available through https://www.ebi.ac.uk/ena/pathogens/datahubs.
List of the bacterial lineages for which the CGE analysis pipeline has databases
| Database | Included lineages/plasmids |
|---|---|
| PlasmidFinder | |
| VirulenceFinder | |
| SalmonellaTypeFinder | |
| cgMLST | |
| MLST | A database for each scheme in the PubMLST database |
| ResFinder | All databases in the ResFinder database |
| pMLST | IncF, IncHI1, IncHI2, IncI1, IncN, IncA/C |
Figure 3. PP interactive interface. 1: select a domain (e.g. read_run); 2: narrow down the search (where desired) by specifying taxonomy and sample collection details (e.g. collection date and country); 3: specify (where desired) which fields should be returned in the result report (e.g. centre name, study accession, sample accession, etc.); 4: result page with download options in TSV and JSON (JavaScript Object Notation).
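The four steps in the Figure 3 caption map naturally onto a parameterised search request. The following is a minimal sketch of that mapping, not the platform's documented client: the endpoint URL, the parameter names (`result`, `query`, `fields`, `format`), the filter syntax and the field names are all assumptions made for illustration, and the filter values are hypothetical. The code only composes a URL and performs no network request.

```python
from urllib.parse import urlencode

# Assumption: base URL of a search endpoint; it is not given in the caption.
SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(result, filters, fields, fmt="tsv"):
    """Compose a search URL following the four steps in the caption.

    result  -- step 1: the domain to search (e.g. "read_run")
    filters -- step 2: optional list of filter expressions (assumed syntax)
    fields  -- step 3: names of the fields to return (assumed names)
    fmt     -- step 4: download format, "tsv" or "json"
    """
    if fmt not in ("tsv", "json"):
        raise ValueError("format must be 'tsv' or 'json'")
    params = {"result": result, "fields": ",".join(fields), "format": fmt}
    if filters:
        # Filters are combined so that every condition must hold.
        params["query"] = " AND ".join(filters)
    return SEARCH_URL + "?" + urlencode(params)


# Hypothetical example: runs restricted by country and collection date.
url = build_search_url(
    result="read_run",
    filters=['country="Kenya"', "collection_date>=2018-01-01"],
    fields=["center_name", "study_accession", "sample_accession"],
    fmt="json",
)
print(url)
```

Keeping the filter list optional mirrors the "where desired" wording of steps 2 and 3: a request with only a domain and a format is still well formed.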