
An empirical pipeline for choosing the optimal clustering threshold in RADseq studies.

Evan McCartney-Melstad, Müge Gidiş, H Bradley Shaffer.

Abstract

Genomic data are increasingly used for high resolution population genetic studies including those at the forefront of biological conservation. A key methodological challenge is determining sequence similarity clustering thresholds for RADseq data when no reference genome is available. These thresholds define the maximum permitted divergence among allelic variants and the minimum divergence among putative paralogues and are central to downstream population genomic analyses. Here we develop a novel set of metrics to determine sequence similarity thresholds that maximize the correct separation of paralogous regions and minimize oversplitting naturally occurring allelic variation within loci. These metrics empirically identify the threshold value at which true alleles at opposite ends of several major axes of genetic variation begin to incorrectly separate into distinct clusters, allowing researchers to choose thresholds just below this value. We test our approach on a recently published data set for the protected foothill yellow-legged frog (Rana boylii). The metrics recover a consistent pattern of roughly 96% similarity as a threshold above which genetic divergence and data missingness become increasingly correlated. We provide scripts for assessing different clustering thresholds and discuss how this approach can be applied across a wide range of empirical data sets.
© 2019 John Wiley & Sons Ltd.
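One of the patterns the abstract describes, genetic divergence and data missingness becoming increasingly correlated above some similarity threshold, can be illustrated with a minimal sketch. This is not the authors' published scripts. The function names, the `tolerance` parameter, and the input layout (per-threshold lists of pairwise divergence and pairwise missingness values between samples) are all assumptions made for illustration, and the numbers in the usage example are invented toy values, not results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_by_threshold(pairwise):
    """Map each clustering threshold to the divergence/missingness correlation.

    `pairwise` is a hypothetical structure:
    {threshold: (pairwise_divergences, pairwise_missingness)}.
    """
    return {t: pearson(div, miss) for t, (div, miss) in sorted(pairwise.items())}


def last_threshold_before_rise(corr_by_threshold, tolerance=0.1):
    """Return the highest threshold whose correlation has not yet risen more than
    `tolerance` above the value at the lowest threshold (an assumed heuristic)."""
    thresholds = sorted(corr_by_threshold)
    baseline = corr_by_threshold[thresholds[0]]
    chosen = thresholds[0]
    for t in thresholds:
        if corr_by_threshold[t] - baseline > tolerance:
            break
        chosen = t
    return chosen


if __name__ == "__main__":
    # Invented toy correlations, for illustration only.
    toy = {0.90: 0.05, 0.93: 0.06, 0.96: 0.08, 0.98: 0.40}
    print(last_threshold_before_rise(toy))
```

In practice the pairwise values would come from assemblies of the same reads run at each candidate threshold; the cutoff rule here is only one plausible way to formalise "just below the value where the correlation starts to climb".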

Keywords:  Rana boylii; RADseq; parameter choice; population genomics

Year:  2019        PMID: 31058458     DOI: 10.1111/1755-0998.13029

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090


  4 in total

1.  Population Genomics Analysis with RAD, Reprised: Stacks 2.

Authors:  Angel G Rivera-Colón; Julian Catchen
Journal:  Methods Mol Biol       Date:  2022

2.  Long-read genotyping with SLANG (Simple Long-read loci Assembly of Nanopore data for Genotyping).

Authors:  Marco Dorfner; Tankred Ott; Philipp Ott; Christoph Oberprieler
Journal:  Appl Plant Sci       Date:  2022-06-14       Impact factor: 2.511

3.  Phylogenomics and species delimitation of the economically important Black Basses (Micropterus).

Authors:  Daemin Kim; Andrew T Taylor; Thomas J Near
Journal:  Sci Rep       Date:  2022-06-06       Impact factor: 4.996

4.  Drivers of phenotypic divergence in a Mesoamerican highland bird.

Authors:  Sahid M Robles-Bello; Melisa Vázquez-López; Sandra M Ramírez-Barrera; Alondra K Terrones-Ramírez; Blanca E Hernández-Baños
Journal:  PeerJ       Date:  2022-02-18       Impact factor: 2.984

