Alexander Senf (1,2), Robert Davies (3), Frédéric Haziza (4), John Marshall (5), Juan Troncoso-Pastoriza (6), Oliver Hofmann (7), Thomas M Keane (1,8).
1. European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK.
2. Enthought, Inc., 200 W Cesar Chavez, Suite 202, Austin, TX 78701, USA.
3. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
4. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain.
5. Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow G61 1QH, UK.
6. Laboratory for Data Security, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
7. University of Melbourne, Centre for Cancer Research, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, VIC, 3000, Australia.
8. School of Life Sciences, University of Nottingham, Nottingham, UK.
Abstract
MOTIVATION: The majority of genome analysis tools and pipelines require data to be decrypted for access. This potentially leaves sensitive genetic data exposed, either because the unencrypted data is not removed after analysis, or because the data leaves traces on the permanent storage medium. RESULTS: We defined a file container specification enabling direct byte-level compatible random access to encrypted genetic data stored in community standards such as SAM/BAM/CRAM/VCF/BCF. By standardizing this format, we show how it can be added as a native file format to genomic libraries, enabling direct analysis of encrypted data without the need to create a decrypted copy. AVAILABILITY: The Crypt4GH specification can be found at: http://samtools.github.io/hts-specs/crypt4gh.pdf.
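The random access described under RESULTS can be illustrated with a minimal sketch, assuming the container stores its payload as fixed-size, independently encrypted segments that follow a header. The segment sizes (64 KiB of plaintext, a 12-byte nonce, a 16-byte authentication tag) and the names `locate` and `segments_for_range` are illustrative assumptions, not details taken from the abstract; the specification linked under AVAILABILITY is the authority on the actual layout. Under these assumptions, a plaintext byte offset maps to an encrypted segment by plain arithmetic, so a reader only needs to fetch and decrypt the segments that cover the requested range.

```python
from typing import List, Tuple

# Assumed layout constants (hypothetical, for illustration only).
PLAIN_SEGMENT = 65536   # plaintext bytes per segment
NONCE_LEN = 12          # per-segment nonce stored before the ciphertext
TAG_LEN = 16            # per-segment authentication tag stored after it
CIPHER_SEGMENT = NONCE_LEN + PLAIN_SEGMENT + TAG_LEN


def locate(plain_offset: int, header_len: int) -> Tuple[int, int, int]:
    """Map a plaintext offset to (segment index, file offset of that
    encrypted segment, offset inside the decrypted segment)."""
    if plain_offset < 0 or header_len < 0:
        raise ValueError("offsets must be non-negative")
    index, within = divmod(plain_offset, PLAIN_SEGMENT)
    return index, header_len + index * CIPHER_SEGMENT, within


def segments_for_range(start: int, length: int, header_len: int) -> List[int]:
    """Return the file offsets of every encrypted segment that must be
    read to serve `length` plaintext bytes beginning at `start`."""
    if start < 0 or header_len < 0:
        raise ValueError("offsets must be non-negative")
    if length <= 0:
        return []
    first = start // PLAIN_SEGMENT
    last = (start + length - 1) // PLAIN_SEGMENT
    return [header_len + i * CIPHER_SEGMENT for i in range(first, last + 1)]
```

For example, with a hypothetical 100-byte header, `locate(70000, 100)` points at the second segment, so an index-driven query into a BAM or CRAM file would touch only that segment rather than the whole file.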
Authors: Heidi L Rehm; Angela J H Page; Lindsay Smith; Jeremy B Adams; Gil Alterovitz; Lawrence J Babb; Maxmillian P Barkley; Michael Baudis; Michael J S Beauvais; Tim Beck; Jacques S Beckmann; Sergi Beltran; David Bernick; Alexander Bernier; James K Bonfield; Tiffany F Boughtwood; Guillaume Bourque; Sarion R Bowers; Anthony J Brookes; Michael Brudno; Matthew H Brush; David Bujold; Tony Burdett; Orion J Buske; Moran N Cabili; Daniel L Cameron; Robert J Carroll; Esmeralda Casas-Silva; Debyani Chakravarty; Bimal P Chaudhari; Shu Hui Chen; J Michael Cherry; Justina Chung; Melissa Cline; Hayley L Clissold; Robert M Cook-Deegan; Mélanie Courtot; Fiona Cunningham; Miro Cupak; Robert M Davies; Danielle Denisko; Megan J Doerr; Lena I Dolman; Edward S Dove; L Jonathan Dursi; Stephanie O M Dyke; James A Eddy; Karen Eilbeck; Kyle P Ellrott; Susan Fairley; Khalid A Fakhro; Helen V Firth; Michael S Fitzsimons; Marc Fiume; Paul Flicek; Ian M Fore; Mallory A Freeberg; Robert R Freimuth; Lauren A Fromont; Jonathan Fuerth; Clara L Gaff; Weiniu Gan; Elena M Ghanaim; David Glazer; Robert C Green; Malachi Griffith; Obi L Griffith; Robert L Grossman; Tudor Groza; Jaime M Guidry Auvil; Roderic Guigó; Dipayan Gupta; Melissa A Haendel; Ada Hamosh; David P Hansen; Reece K Hart; Dean Mitchell Hartley; David Haussler; Rachele M Hendricks-Sturrup; Calvin W L Ho; Ashley E Hobb; Michael M Hoffman; Oliver M Hofmann; Petr Holub; Jacob Shujui Hsu; Jean-Pierre Hubaux; Sarah E Hunt; Ammar Husami; Julius O Jacobsen; Saumya S Jamuar; Elizabeth L Janes; Francis Jeanson; Aina Jené; Amber L Johns; Yann Joly; Steven J M Jones; Alexander Kanitz; Kazuto Kato; Thomas M Keane; Kristina Kekesi-Lafrance; Jerome Kelleher; Giselle Kerry; Seik-Soon Khor; Bartha M Knoppers; Melissa A Konopko; Kenjiro Kosaki; Martin Kuba; Jonathan Lawson; Rasko Leinonen; Stephanie Li; Michael F Lin; Mikael Linden; Xianglin Liu; Isuru Udara Liyanage; Javier Lopez; Anneke M Lucassen; Michael Lukowski; Alice L Mann; John Marshall; 
Michele Mattioni; Alejandro Metke-Jimenez; Anna Middleton; Richard J Milne; Fruzsina Molnár-Gábor; Nicola Mulder; Monica C Munoz-Torres; Rishi Nag; Hidewaki Nakagawa; Jamal Nasir; Arcadi Navarro; Tristan H Nelson; Ania Niewielska; Amy Nisselle; Jeffrey Niu; Tommi H Nyrönen; Brian D O'Connor; Sabine Oesterle; Soichi Ogishima; Vivian Ota Wang; Laura A D Paglione; Emilio Palumbo; Helen E Parkinson; Anthony A Philippakis; Angel D Pizarro; Andreas Prlic; Jordi Rambla; Augusto Rendon; Renee A Rider; Peter N Robinson; Kurt W Rodarmer; Laura Lyman Rodriguez; Alan F Rubin; Manuel Rueda; Gregory A Rushton; Rosalyn S Ryan; Gary I Saunders; Helen Schuilenburg; Torsten Schwede; Serena Scollen; Alexander Senf; Nathan C Sheffield; Neerjah Skantharajah; Albert V Smith; Heidi J Sofia; Dylan Spalding; Amanda B Spurdle; Zornitza Stark; Lincoln D Stein; Makoto Suematsu; Patrick Tan; Jonathan A Tedds; Alastair A Thomson; Adrian Thorogood; Timothy L Tickle; Katsushi Tokunaga; Juha Törnroos; David Torrents; Sean Upchurch; Alfonso Valencia; Roman Valls Guimera; Jessica Vamathevan; Susheel Varma; Danya F Vears; Coby Viner; Craig Voisin; Alex H Wagner; Susan E Wallace; Brian P Walsh; Marc S Williams; Eva C Winkler; Barbara J Wold; Grant M Wood; J Patrick Woolley; Chisato Yamasaki; Andrew D Yates; Christina K Yung; Lyndon J Zass; Ksenia Zaytseva; Junjun Zhang; Peter Goodhand; Kathryn North; Ewan Birney
Journal: Cell Genom
Date: 2021-11-10