Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model.

Jun Huang, Jeremy Bennett, Tomáš Flouri, Adam D Leaché, Ziheng Yang.

Abstract

Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes. These are effectively chimeric sequences in which the phase at heterozygous sites is resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct computer simulations to evaluate how four phase-resolution strategies affect the estimation of the species tree and evolutionary parameters from multilocus genomic data under the MSC model. The four strategies are: the true phase resolution; the diploid analytical integration algorithm, which averages over all phase resolutions; computational phase resolution using the program PHASE; and random resolution. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times, but it may be affected by phasing errors when the species tree is shallow. Phasing errors affect parameter estimation under the MSC model both with and without introgression. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probabilities. In general, the impact of phasing errors is greater when the mutation rate is higher, when the data include more samples per species, and when the species tree is shallower with recent divergences. Using phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountain chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate. They should also avoid haploid consensus sequences, which have heterozygous sites phased at random. If the analytical integration algorithm is computationally infeasible, computational phasing prior to population genomic analyses is an acceptable alternative.
Keywords: BPP; introgression; multispecies coalescent; phase; species tree.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
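The sketch below illustrates the difference between the phase-resolution strategies named in the abstract. It is not the BPP or PHASE implementation, and all function names are hypothetical. The sketch assumes that an unphased diploid genotype sequence codes each heterozygous site with a two-base IUPAC ambiguity code, for example R = A/G and Y = C/T.

With h heterozygous sites, there are 2^(h-1) distinct unordered haplotype pairs. The analytical integration approach averages over all of them. The sketch only enumerates these pairs and does not compute any likelihood. Random resolution, by contrast, picks one base per site independently. It can therefore splice the two parental haplotypes into a chimera.

```python
import itertools
import random

# Two-base IUPAC ambiguity codes for heterozygous sites (assumed input encoding).
IUPAC = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}

def het_sites(genotype):
    """Return the positions of heterozygous (ambiguity-coded) sites."""
    return [i for i, b in enumerate(genotype) if b in IUPAC]

def resolve(genotype, choice):
    """Build the haplotype pair implied by one 0/1 phase choice per heterozygous site."""
    h1, h2 = list(genotype), list(genotype)
    for pos, c in zip(het_sites(genotype), choice):
        a, b = IUPAC[genotype[pos]]
        h1[pos], h2[pos] = (a, b) if c == 0 else (b, a)
    return "".join(h1), "".join(h2)

def all_phase_resolutions(genotype):
    """Enumerate distinct unordered haplotype pairs for one diploid genotype.

    Fixing the phase of the first heterozygous site removes the label swap
    between the two haplotypes, leaving 2**(h-1) resolutions when h >= 1.
    """
    h = len(het_sites(genotype))
    if h == 0:
        return [(genotype, genotype)]
    return [resolve(genotype, (0,) + rest)
            for rest in itertools.product((0, 1), repeat=h - 1)]

def random_phase(genotype, rng=random):
    """Pick a base at each heterozygous site independently at random and
    return a single haploid sequence, mimicking a haploid consensus."""
    choice = [rng.randint(0, 1) for _ in het_sites(genotype)]
    return resolve(genotype, choice)[0]

# Usage: heterozygous at positions 1 (A/G) and 4 (C/T).
genotype = "ARCTYGA"
pairs = all_phase_resolutions(genotype)
print(len(pairs))  # 2
print(pairs)       # [('AACTCGA', 'AGCTTGA'), ('AACTTGA', 'AGCTCGA')]
print(random_phase(genotype, random.Random(1)))
```

Suppose the true haplotypes are AACTCGA and AGCTTGA. In that case, a random resolution such as AACTTGA matches neither parental haplotype. This is the chimeric consensus the abstract describes.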

Year: 2022; PMID: 34143216; PMCID: PMC8977997; DOI: 10.1093/sysbio/syab047

Source DB: PubMed; Journal: Syst Biol; ISSN: 1063-5157; Impact factor: 15.683

