Louis-Philippe Lemieux Perreault, Sylvie Provost, Marc-André Legault, Amina Barhdadi, Marie-Pierre Dubé.
Abstract
Genetic association studies making use of high-throughput genotyping arrays need to process large amounts of data, on the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically a thorough data clean up and quality control to remove poor quality genotypes and generate metrics to inform and select individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool to facilitate and standardize the genetic data clean up pipeline with genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates the completion of the data clean up process and provides informative plots and metrics to guide decision making for statistical analysis.
Year: 2013 PMID: 23652425 PMCID: PMC3694635 DOI: 10.1093/bioinformatics/btt261
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Proposed data clean up pipeline. Each box represents a customizable stand-alone script with a quick description of its function. Optional manual checks for go/no-go decisions are indicated. Numbers represent the ordering of the cyclic part of the pipeline.
Fig. 2. Z0 in function of showing sample relatedness