
A New Pipeline for Removing Paralogs in Target Enrichment Data.

Wenbin Zhou, John Soghigian, Qiu-Yun Jenny Xiang.

Abstract

Target enrichment (such as Hyb-Seq) is a well-established high-throughput sequencing method that is increasingly used in phylogenomic studies. However, the pipelines now widely used to analyze target enrichment data lack a rigorous procedure for removing paralogs. In this study, we developed a pipeline, Putative Paralogs Detection (PPD), to better address putative paralogs in enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the combined pipeline identifies paralogs using two criteria at each locus: sequence similarity and heterozygous sites. Users may adjust the thresholds for sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence in their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly, as well as gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperms353 kit for two woody genera, Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of data sets showed that PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone). When PPD is used in conjunction with HybPiper, paralogous genes identified by both pipelines can be removed, resulting in more robust orthologous gene data sets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperms353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps, such as the PPD pipeline described in this study, to filter paralogous genes and poorly aligned regions (e.g., those arising from assembly errors). [Angiosperms353; Castanea; divergence time; Hamamelis; Hyb-Seq; paralogs; phylogenomics.]
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
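
The abstract describes flagging a locus as a putative paralog from two adjustable thresholds, one on sequence identity and one on heterozygous sites. The following is a minimal illustrative sketch of that idea only, not the PPD implementation. It assumes that heterozygous sites appear as two-base IUPAC ambiguity codes in an aligned consensus sequence and that identity is measured against a single aligned reference. The function names and the default thresholds (`max_het`, `min_identity`) are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: NOT the PPD code. Names and thresholds are hypothetical.

# Two-base IUPAC ambiguity codes, used here as a proxy for heterozygous sites.
HET_CODES = set("RYSWKM")


def het_fraction(seq: str) -> float:
    """Fraction of called positions (not gap, N, or ?) that carry a two-base IUPAC code."""
    called = [base for base in seq.upper() if base not in "-N?"]
    if not called:
        return 0.0
    return sum(base in HET_CODES for base in called) / len(called)


def identity(seq: str, ref: str) -> float:
    """Proportion of identical bases over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq.upper(), ref.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def is_putative_paralog(seq: str, ref: str,
                        max_het: float = 0.05,
                        min_identity: float = 0.80) -> bool:
    """Flag a locus if it has too many heterozygous sites or too little identity to the reference."""
    return het_fraction(seq) > max_het or identity(seq, ref) < min_identity
```

For example, `is_putative_paralog("ACGTRCGTAC", "ACGTACGTAC")` flags the locus because one of its ten called sites is ambiguous, which exceeds the assumed 5% limit. Making both thresholds parameters mirrors the abstract's point that users tune them to the level of divergence in their group of interest.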

Year:  2022        PMID: 34146111      PMCID: PMC8974407          DOI: 10.1093/sysbio/syab044

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


