Nab Raj Roshyara (1), Markus Scholz (1)
(1) Medical Department, Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany; Medical Department, LIFE Research Center (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes and Environment), University of Leipzig, Leipzig, Germany.
Abstract
BACKGROUND: Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods such as pre-processing, analysis of population structure, association analysis and genotype imputation. Software used for these purposes often relies on specific and mutually incompatible input and output data formats. Extensive data management, including multiple format conversions, is therefore necessary during analyses.
METHODS: To support fast and efficient management and bio-statistical quality control of high-dimensional SNP data, we developed the publicly available software fcGENE in the object-oriented programming language C++. The software simplifies and automates the use of different existing analysis packages, especially during the workflow of genotype imputation and the corresponding analyses.
RESULTS: fcGENE transforms SNP data and imputation results into the formats required by a large variety of analysis packages such as PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT and GenABEL, and by tools used for genotype imputation such as MaCH, IMPUTE, BEAGLE and others. Data management tasks such as merging, splitting and extracting SNP and pedigree information can be performed. fcGENE also supports a number of bio-statistical quality control procedures and quality-based filtering at the SNP and sample level. In addition, the tool generates templates of the commands required to run specific software packages, especially those required for genotype imputation. We demonstrate the functionality of fcGENE with example workflows of SNP data analyses and provide a comprehensive manual of commands, options and applications.
CONCLUSIONS: We have developed fcGENE, a user-friendly open-source software that comprehensively supports SNP data management, quality control and analysis workflows. Download statistics and corresponding user feedback indicate that the software is widely recognised and extensively applied by the scientific community.
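To make concrete the kind of format conversion the abstract refers to, the following is a minimal Python sketch that turns one line of a PLINK-style PED file (six leading columns, then two allele fields per SNP, with "0" marking a missing allele) into additive allele counts, a coding that many association tools expect. It is not fcGENE's implementation and does not use fcGENE's command-line interface; the function names and the example line are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]


def parse_ped_line(line: str) -> Tuple[List[str], List[Genotype]]:
    """Split a PED line into its six metadata fields and its allele pairs."""
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(meta) < 6 or len(alleles) % 2 != 0:
        raise ValueError("malformed PED line")
    # Consecutive allele fields belong to the same SNP.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return meta, genotypes


def count_allele(genotype: Genotype, counted: str) -> Optional[int]:
    """Return 0, 1 or 2 copies of `counted`, or None if the genotype is missing."""
    a1, a2 = genotype
    if a1 == "0" or a2 == "0":
        return None
    return int(a1 == counted) + int(a2 == counted)


def ped_to_dosage_row(line: str, counted_alleles: List[str]) -> Tuple[str, List[Optional[int]]]:
    """Convert one PED line to (individual ID, per-SNP allele counts).

    `counted_alleles` is a hypothetical input: one allele per SNP, in the
    same order as the SNPs appear in the accompanying MAP file.
    """
    meta, genotypes = parse_ped_line(line)
    if len(genotypes) != len(counted_alleles):
        raise ValueError("number of SNPs does not match number of counted alleles")
    counts = [count_allele(g, c) for g, c in zip(genotypes, counted_alleles)]
    return meta[1], counts  # meta[1] is the within-family individual ID


# Hypothetical example: one individual, three SNPs, the last one missing.
example = "FAM1 IND1 0 0 1 2 A G G G 0 0"
row = ped_to_dosage_row(example, ["G", "G", "T"])
```

A dedicated converter additionally has to handle marker maps, strand and allele coding conventions, and imputed genotype probabilities, which this sketch deliberately leaves out.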