Balaji Chattopadhyay, Kritika M Garg, Uma Ramakrishnan.
Abstract
Reduced representation libraries are being used as a preferred source of markers to address population genetic questions. However, libraries of RAD-Seq variants often suffer from a significant percentage of missing data. In addition, algorithms used to mine SNPs from the raw data may underestimate biological variation. We investigate the effect of the biological diversity parameters on mining SNPs with the program STACKS, and the effect of missing data on individual assignment as implemented in STRUCTURE. We observed that changing the diversity parameters in STACKS significantly alters the number of SNPs discovered, and that allowing a higher percentage of missing data retrieves more loci and possibly provides more power for individual assignment.
Year: 2014 PMID: 25424532 PMCID: PMC4256836 DOI: 10.1186/1756-0500-7-841
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Ancestry coefficients for individuals (obtained from STRUCTURE) with A) an increasing number of mismatches allowed when generating loci in STACKS and B) an increasing proportion of missing data. Parameter sets in A: default (two mismatches between reads within a locus); M2n2 (two mismatches between reads within a locus and two mismatches between loci when comparing across individuals); M3n5 (three mismatches between reads within a locus and five mismatches between loci when comparing across individuals); M3n7 (three mismatches between reads within a locus and seven mismatches between loci when comparing across individuals); M3n5N7 (three mismatches between reads within a locus, five mismatches between loci when comparing across individuals, and additionally seven mismatches allowed to align secondary reads to generate a locus).
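The trade-off behind panel B, where tolerating more missing data keeps more loci, can be illustrated with a minimal sketch. It is not the filtering performed by STACKS or STRUCTURE. The genotype matrix `toy`, the helper names `missing_fraction` and `filter_loci`, and the thresholds are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# A genotype call is an integer code; None marks a missing call.
Genotype = Optional[int]


def missing_fraction(calls: List[Genotype]) -> float:
    """Fraction of individuals with no genotype call at one locus."""
    return sum(c is None for c in calls) / len(calls)


def filter_loci(matrix: List[List[Genotype]], max_missing: float) -> List[int]:
    """Return indices of loci whose missing fraction is <= max_missing.

    matrix[i][j] is the genotype of individual j at locus i.
    """
    return [
        i for i, calls in enumerate(matrix)
        if missing_fraction(calls) <= max_missing
    ]


# Hypothetical toy data: 4 loci (rows) x 4 individuals (columns).
toy = [
    [0, 1, 2, 0],            # no missing calls
    [0, None, 1, 1],         # 25% missing
    [None, None, 1, 0],      # 50% missing
    [None, None, None, 2],   # 75% missing
]

# Raising the allowed proportion of missing data retains more loci.
for threshold in (0.0, 0.25, 0.5, 0.75):
    kept = filter_loci(toy, threshold)
    print(f"max_missing={threshold}: {len(kept)} loci kept")
```

A stricter threshold yields a more complete but smaller matrix, while a looser one keeps more loci at the cost of more gaps per locus, which is the balance the abstract refers to.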