A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering.

Matthew G Johnson, Lisa Pokorny, Steven Dodsworth, Laura R Botigué, Robyn S Cowan, Alison Devault, Wolf L Eiserhardt, Niroshini Epitawalage, Félix Forest, Jan T Kim, James H Leebens-Mack, Ilia J Leitch, Olivier Maurin, Douglas E Soltis, Pamela S Soltis, Gane Ka-Shu Wong, William J Baker, Norman J Wickett.

Abstract

Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
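
The abstract describes choosing the minimum number of representative sequences per locus with k-medoids clustering. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a simple alternating k-medoids on pairwise p-distances between aligned sequences, and it increases k until every sequence lies within a divergence threshold of its nearest medoid. The function names, the toy alignment, and the 0.3 threshold are illustrative assumptions that do not come from the abstract.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns the sorted indices of the k medoids. This is a simple heuristic
    and may stop at a local optimum on harder data.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(n), k))
    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster,
        # every other point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def minimal_representatives(seqs, threshold, seed=0):
    """Smallest medoid set such that every sequence is within `threshold`
    (p-distance) of at least one medoid. `threshold` is a hypothetical knob."""
    dist = [[p_distance(a, b) for b in seqs] for a in seqs]
    n = len(seqs)
    for k in range(1, n + 1):
        medoids = k_medoids(dist, k, seed=seed)
        worst = max(min(dist[i][m] for m in medoids) for i in range(n))
        if worst <= threshold:
            return medoids
    return list(range(n))


# Toy alignment with two divergent groups (made-up data for illustration).
toy = [
    "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
    "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGG",
]
representatives = minimal_representatives(toy, threshold=0.3)
print([toy[i] for i in representatives])
```

Because a medoid is always one of the input sequences, each representative is a real sequence that probes could be designed from, which is one reason to prefer medoids over the averaged centroids of k-means for this kind of task.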

Keywords:  Angiosperms; Hyb-Seq; k-means clustering; k-medoids clustering; machine learning; nuclear genes; phylogenomics; sequence capture; target enrichment

Year: 2019; PMID: 30535394; PMCID: PMC6568016; DOI: 10.1093/sysbio/syy086

Source DB: PubMed; Journal: Syst Biol; ISSN: 1063-5157; Impact factor: 15.683

