Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies.

Moara Machado, Wagner C S Magalhães, Allan Sene, Bruno Araújo, Alessandra C Faria-Campos, Stephen J Chanock, Leandro Scott, Guilherme Oliveira, Eduardo Tarazona-Santos, Maira R Rodrigues.

Abstract

BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows unbiased screening for variation and is suitable for a wide variety of organisms. Studies that require re-sequencing data include evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because automatic sequencers based on capillary electrophoresis are widely available and because it is less prone to sequencing errors, which is critical in such studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between them lies a set of basic but error-prone tasks that must be performed on the re-sequencing data.
RESULTS: To assist with these intermediate tasks, we developed a pipeline that facilitates the data handling typical of re-sequencing studies. Our pipeline: (1) consolidates the outputs produced by distinct Phred-Phrap-Consed contigs that share a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inference using the popular software PHASE; and (5) takes PHASE output files, which contain only polymorphic sites, and reconstructs the inferred haplotypes with both polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp.
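Steps (2), (3) and (5) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not parse real Polyphred or PHASE files; the input shapes (tuples of individual, position and genotype; a reference string with 1-based site positions) and the function names `genotype_matrix` and `rebuild_haplotype` are assumptions made for illustration only.

```python
from collections import defaultdict

MISSING = "??"  # hypothetical placeholder for an uncalled genotype


def genotype_matrix(calls):
    """Build an individuals-by-segregating-sites genotype matrix.

    calls: iterable of (individual, position, genotype) tuples,
           e.g. ("ind1", 120, "AG"). This shape is an assumption,
           not the Polyphred output format.
    Returns (positions, matrix), where matrix maps each individual
    to a list of genotypes ordered by position.
    """
    by_ind = defaultdict(dict)
    for ind, pos, gt in calls:
        gt = "".join(sorted(gt))  # treat "GA" and "AG" as the same call
        previous = by_ind[ind].get(pos)
        if previous is not None and previous != gt:
            # Step (2): the same individual and site called two ways.
            raise ValueError(f"inconsistent calls for {ind} at {pos}: "
                             f"{previous} vs {gt}")
        by_ind[ind][pos] = gt

    # Step (3): keep only sites where more than one allele is observed.
    alleles_at = defaultdict(set)
    for sites in by_ind.values():
        for pos, gt in sites.items():
            alleles_at[pos].update(gt)
    positions = sorted(p for p, alleles in alleles_at.items()
                       if len(alleles) > 1)

    matrix = {ind: [sites.get(pos, MISSING) for pos in positions]
              for ind, sites in sorted(by_ind.items())}
    return positions, matrix


def rebuild_haplotype(reference, positions, alleles):
    """Step (5): place inferred alleles back into the full sequence.

    reference: reference sequence covering all sites
    positions: 1-based positions of the polymorphic sites
    alleles:   one inferred allele per position, in the same order
    """
    if len(positions) != len(alleles):
        raise ValueError("one allele is required per polymorphic site")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        seq[pos - 1] = allele  # monomorphic sites keep the reference base
    return "".join(seq)
```

With a matrix in this form, writing a PHASE input file or a full-length alignment for downstream population genetics software becomes a formatting step over `positions` and `matrix`.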
CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms. It substantially decreased the time required for sequencing analyses and made the process more controlled, eliminating several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.

Year: 2011; PMID: 21284835; PMCID: PMC3041995; DOI: 10.1186/2041-2223-2-3

Source DB: PubMed; Journal: Investig Genet; ISSN: 2041-2223


