Unpublished genomic data–how to share?

Abstract

The field of genomics is often cited as the branch of biology that has led the way in data sharing. In most cases, sequencing data are made publicly available immediately after generation and often before the data generators have completed their analyses. Although the pros of such openness cannot be denied, problems can arise when unpublished genomic data are shared. In this editorial we touch on these issues and discuss the roles and responsibilities of the data generators, data users and journal editors.

Year: 2014 PMID: 24422916 PMCID: PMC3897942 DOI: 10.1186/1471-2164-15-5

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

The past decade has seen big changes in the field of genomics, not only in terms of advances in technology, but also with regard to views on sharing the data generated [1,2]. Open data has become the buzzword of this age. No one can deny that this openness and willingness to share genomic data, both published and (perhaps more importantly) unpublished, has resulted in remarkable progress. However, when it comes to unpublished genomic data, this openness can also leave the data generators vulnerable. The community needs to balance the benefits of data sharing against the interests of the data owners, and usually the process works well. The genomics community has measures in place to protect the data owners: data are often released under embargoes (of varying lengths, but usually not longer than 24 months), and data owners can also publish a ‘statement of intent’, i.e. an outline of the specific analyses they plan to undertake, when they release the data. There are also community norms–specifically the Bermuda rules [3], and the Fort Lauderdale [4] and Toronto [5] agreements–to help researchers navigate this rather sensitive issue. However, embargoes are not indefinite, nor does it seem fair to prohibit specific analyses indefinitely. It is also worth clarifying that the agreements mentioned above are so-called gentlemen’s agreements; they are not law, and their utility depends on goodwill and communication within the community, not unlike attribution and the way scientists use citation to give credit. The key words, as we see it, are community and communication. The researchers in the field are essentially in the same boat–they could be the data generators in one case and data users in another. Without communication the boat is likely to capsize.
The data generators need to be clear about their intentions and to specify any conditions under which the data are released, and the data users need to inform the data generators and, where appropriate, seek permission to use the data. Perhaps there is also a need for enforceable guidelines, rather than reliance on gentlemen’s agreements? The US National Institutes of Health (NIH) have already taken a step in this direction and have recently released a draft policy on the sharing of genomic data [6], which, if approved, will apply to all researchers who receive NIH funding. The guidelines cover, amongst other topics, the issue of when to release data; for raw sequence data from non-human organisms, the specified deadline is within 6 months of submission to an approved data repository.
A question that follows is: whose responsibility is it to ensure that appropriate permission has been obtained to include the analysis of unpublished genomic data in a manuscript? Does the responsibility lie with the authors, the reviewers or the journal editors? In our experience, such issues have usually been brought to light during the review process, but given the extensive amounts of data being generated, neither reviewers nor editors can be expected to be aware of the requirements for the use of each and every genome sequence. BMC Genomics recently published a study by Zhao et al. [7] that included an analysis of 103 fungal genomes. After publication it became apparent that some of these genomes were unpublished, and that the authors had not informed the data owners of their intention to publish an analysis of these genomes. Given this situation, we and the authors, in consultation with the data owners, agreed that a correction [8], whereby the authors would remove specific genomes from the analysis, was the appropriate way to proceed.
In fact, only two of the disputed genomes were specifically under embargo, but after discussion with the data generators the authors agreed to remove from the analysis not only the embargoed genomes, but also a further seven as yet unpublished genomes.
Data generators, data users and journal editors all have a role to play in ensuring that the interests of all involved parties are protected, and as we have mentioned, the key to this is communication. We feel the ultimate responsibility should lie with the data users; it is up to them to ensure that they are aware of (and adhere to) any conditions set by the data generators. The data generators could also make this easier for the data users by ensuring that the necessary information is readily available. This is not to say, however, that a journal has no responsibility; a journal can increase awareness of the requirements in a field by incorporating guidance into its policies or instructions for authors. BioMed Central’s editorial policies [9] now include a section on the use of unpublished genomic data: “Authors using unpublished genomic data are expected to abide by the guidelines of the Fort Lauderdale and Toronto agreements. Based on broadly accepted scientific community standards, the key requirement for the third parties using genomic data is to contact the owners of unpublished data (i.e., the principal investigator and sequencing center) prior to undertaking their research, to advise them about their planned analyses.” A journal is also, of course, responsible for taking the appropriate action when problems such as those exemplified by this case arise. Additionally, journal editors can facilitate communication between the concerned parties and help them arrive at a mutually satisfactory solution. Finally, a journal can instigate discussion on a topic or issue by bringing it to light–as we are doing by publishing this editorial.

Competing interests

The authors are employees of BioMed Central.

Authors’ contributions

Both authors contributed to this editorial. Both authors read and approved the final text.

6 in total

1. Bermuda rules: community spirit, with teeth.

Authors: E Marshall
Journal: Science Date: 2001-02-16 Impact factor: 47.728

2. Prepublication data sharing.

Authors: Ewan Birney; Thomas J Hudson; Eric D Green; Chris Gunter; Sean Eddy; Jane Rogers; Jennifer R Harris; S Dusko Ehrlich; Rolf Apweiler; Christopher P Austin; Lisa Berglund; Martin Bobrow; Chas Bountra; Anthony J Brookes; Anne Cambon-Thomsen; Nigel P Carter; Rex L Chisholm; Jorge L Contreras; Robert M Cooke; William L Crosby; Ken Dewar; Richard Durbin; Stephanie O M Dyke; Joseph R Ecker; Khaled El Emam; Lars Feuk; Stacey B Gabriel; John Gallacher; William M Gelbart; Antoni Granell; Francisco Guarner; Tim Hubbard; Scott A Jackson; Jennifer L Jennings; Yann Joly; Steven M Jones; Jane Kaye; Karen L Kennedy; Bartha Maria Knoppers; Nikos C Kyrpides; William W Lowrance; Jingchu Luo; John J MacKay; Luis Martín-Rivera; W Richard McCombie; John D McPherson; Linda Miller; Webb Miller; Don Moerman; Vincent Mooser; Cynthia C Morton; James M Ostell; B F Francis Ouellette; Julian Parkhill; Parminder S Raina; Christopher Rawlings; Steven E Scherer; Stephen W Scherer; Paul N Schofield; Christoph W Sensen; Victoria C Stodden; Michael R Sussman; Toshihiro Tanaka; Janet Thornton; Tatsuhiko Tsunoda; David Valle; Eero I Vuorio; Neil M Walker; Susan Wallace; George Weinstock; William B Whitman; Kim C Worley; Cathy Wu; Jiayan Wu; Jun Yu
Journal: Nature Date: 2009-09-10 Impact factor: 49.962

3. Towards a data sharing Code of Conduct for international genomic research.

Authors: Bartha Maria Knoppers; Jennifer R Harris; Anne Marie Tassé; Isabelle Budin-Ljøsne; Jane Kaye; Mylène Deschênes; Ma'n H Zawati
Journal: Genome Med Date: 2011-07-14 Impact factor: 11.117

4. The Open Knowledge Foundation: open data means better science.

Authors: Jennifer C Molloy
Journal: PLoS Biol Date: 2011-12-06 Impact factor: 8.029

5. Comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi.

Authors: Zhongtao Zhao; Huiquan Liu; Chenfang Wang; Jin-Rong Xu
Journal: BMC Genomics Date: 2013-04-23 Impact factor: 3.969

6. Correction: Comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi.

Authors: Zhongtao Zhao; Huiquan Liu; Chenfang Wang; Jin-Rong Xu
Journal: BMC Genomics Date: 2014-01-03 Impact factor: 3.969
