Sajid Mughal (1), Ismail Moghul (2), Jing Yu (3), Tristan Clark (4), David S Gregory (4), Nikolas Pontikos (5,6,7)
1. Globe View, London, EC4V 3PP, UK.
2. UCL Cancer Institute, University College London, London WC1E 6DD, UK.
3. Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK.
4. Computer Science Department.
5. UCL Genetics Institute, University College London, London WC1E 6BT, UK.
6. Institute of Ophthalmology, University College London, London, EC1V 9EL, UK.
7. Moorfields Eye Hospital, London EC1V 2PD, UK.
Abstract
SUMMARY: Efficient storage and querying of large amounts of genetic and phenotypic data are crucial to contemporary clinical genetic research. Such data pose computational challenges for classical relational databases because of their sparsity and sheer volume. Our Java-based solution loads annotated genetic variants and well-phenotyped patients into a graph database, allowing fast, efficient storage and querying of large volumes of structured genetic and phenotypic data. This abstracts away the technical problems and lets researchers focus on the science rather than the implementation. We have also developed an accompanying web server with endpoints to facilitate querying of the database. AVAILABILITY AND IMPLEMENTATION: The Java and Python code is available at https://github.com/phenopolis/pheno4j. CONTACT: n.pontikos@ucl.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
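To illustrate why a graph model suits sparse patient, variant and phenotype data, the following is a minimal in-memory sketch in Python. It is not the project's code or schema: the `TinyGraph` class, the relationship names `HAS_PHENOTYPE` and `CARRIES`, the function `variants_shared_by_phenotype` and all identifiers in the toy data are assumptions made for illustration only. Only relationships that exist are stored, so no space is spent on the many patient and variant pairs that have no link.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory stand-in for a graph database (illustration only)."""

    def __init__(self):
        # (source node, relationship type) -> set of target nodes
        self._out = defaultdict(set)
        # (target node, relationship type) -> set of source nodes
        self._in = defaultdict(set)

    def add_edge(self, source, rel, target):
        """Store one directed, typed relationship between two nodes."""
        self._out[(source, rel)].add(target)
        self._in[(target, rel)].add(source)

    def targets(self, source, rel):
        """Nodes reached from `source` by following `rel` forwards."""
        return set(self._out.get((source, rel), set()))

    def sources(self, target, rel):
        """Nodes that point at `target` through `rel`."""
        return set(self._in.get((target, rel), set()))


def variants_shared_by_phenotype(graph, term):
    """Return the variants carried by every patient annotated with `term`."""
    patients = graph.sources(term, "HAS_PHENOTYPE")
    if not patients:
        return set()
    carried = [graph.targets(patient, "CARRIES") for patient in patients]
    return set.intersection(*carried)


# Toy data: all identifiers below are made up for this example.
graph = TinyGraph()
graph.add_edge("patient:1", "HAS_PHENOTYPE", "term:A")
graph.add_edge("patient:2", "HAS_PHENOTYPE", "term:A")
graph.add_edge("patient:3", "HAS_PHENOTYPE", "term:B")
graph.add_edge("patient:1", "CARRIES", "variant:1")
graph.add_edge("patient:1", "CARRIES", "variant:2")
graph.add_edge("patient:2", "CARRIES", "variant:2")
graph.add_edge("patient:2", "CARRIES", "variant:3")
graph.add_edge("patient:3", "CARRIES", "variant:1")

shared = variants_shared_by_phenotype(graph, "term:A")
print(sorted(shared))
```

The query walks from a phenotype term to its patients and then to their variants, which is the kind of traversal a graph database answers directly instead of through joins over wide, mostly empty tables.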