Meiling Zou1,2, Sirong Jiang2, Fang Wang3, Long Zhao2,3, Chenji Zhang2, Yuting Bao2, Yonghao Chen2, Zhiqiang Xia1,2.
Abstract
With the rapid development of molecular breeding technology and the release of many new varieties, a method for identifying varieties accurately and quickly is urgently needed. Such a method would make routine cultivation and breeding more convenient and efficient for farmers and would protect the interests of breeders, producers and users. In this study, single nucleotide polymorphism (SNP) data of 533 Oryza sativa, 284 Solanum tuberosum, 247 Sus scrofa and 544 Manihot esculenta Crantz were used. The original SNPs were filtered to remove those with a missing-genotype rate above 1% or with fewer than 2 occurrences of the homozygous genotypes 0/0 and 1/1. The correlation between SNPs was calculated, and the two adjacent SNPs with correlation R² > 0.95 were retained. A genetic algorithm program was developed on the Matlab platform to convert the genotype format and randomly combine SNPs, yielding a small set of SNPs that could distinguish all varieties within each species and serve as fingerprint data. The successful construction of three sets of fingerprints showed that the method developed in this study was effective in both animals and plants. Population structure analysis showed that the genetic algorithm could effectively obtain the core SNPs for constructing fingerprints and that the fingerprints were practical and effective. At present, the two-dimensional code of the Manihot esculenta Crantz fingerprint obtained by this method has been applied to field planting. This study provides a novel approach to variety identification in Oryza sativa, Solanum tuberosum, Sus scrofa and Manihot esculenta Crantz, lays a foundation for the cultivation and identification of new varieties, and offers a theoretical basis for constructing fingerprints in many other species.
Keywords: DNA molecular markers; SNP; feature compression; fingerprint; genetic algorithm
Year: 2022 PMID: 35350241 PMCID: PMC8957834 DOI: 10.3389/fgene.2022.757524
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Calculation principle of genetic algorithm.
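The search described in the abstract (randomly combining SNPs until a small set distinguishes every variety) might be sketched roughly as follows. This is an illustrative Python sketch, not the authors' Matlab program: the function names, the 0/1/2 genotype coding, the fitness measure (number of distinct genotype profiles), and the selection, crossover and mutation rules are all assumptions chosen for brevity.

```python
import random

def distinct_profiles(genotypes, snp_idx):
    """Count distinct genotype profiles when only the SNPs in snp_idx are used."""
    return len({tuple(row[j] for j in snp_idx) for row in genotypes})

def ga_select_snps(genotypes, k, pop_size=30, generations=200,
                   mutation_rate=0.2, seed=0):
    """Hypothetical GA: search for k SNP columns giving each sample a unique profile.

    genotypes: list of rows (one per sample), each a list of codes such as
    0 (0/0), 1 (0/1), 2 (1/1). This coding is an assumption of the sketch.
    """
    rng = random.Random(seed)
    n_samples, n_snps = len(genotypes), len(genotypes[0])

    def fitness(individual):
        return distinct_profiles(genotypes, individual)

    # Each individual is a sorted tuple of k distinct SNP column indices.
    population = [tuple(sorted(rng.sample(range(n_snps), k)))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == n_samples:  # every sample already distinguishable
            break
        # Truncation selection: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))
            # Mutation: drop one chosen SNP and draw a replacement at random.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n_snps) if j not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
        best = max(population + [best], key=fitness)
    return best, fitness(best)

# Toy usage with made-up data: 6 samples x 20 SNPs, looking for 4 SNPs.
toy_rng = random.Random(1)
toy_genotypes = [[toy_rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(6)]
chosen, n_distinct = ga_select_snps(toy_genotypes, k=4)
print(chosen, n_distinct, "of", len(toy_genotypes), "samples distinguished")
```

A fixed panel size `k` keeps the sketch short; in practice one would presumably rerun the search for several panel sizes and compare the resulting fingerprints, as Figure 2 does for 80, 100, 200 and 300 SNPs.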
SNP number after filtration.
| Species | | | | |
|---|---|---|---|---|
| Raw data | 7546976 | 4786686 | 3667783 | 1641026 |
| After filtration | 174819 | 1127 | 6702 | 6886 |
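The filtering thresholds stated in the abstract could be applied along the following lines. This is a minimal Python sketch under stated assumptions: genotypes are VCF-style strings (`0/0`, `0/1`, `1/1`, `./.` for missing), and the homozygous criterion is read as requiring each of 0/0 and 1/1 to occur in at least 2 samples, which is one possible interpretation of the original wording. The function names are hypothetical, and the correlation (R²) step is not covered.

```python
def keep_snp(calls, max_missing=0.01, min_hom=2):
    """Return True if a SNP passes both filters (assumed interpretation).

    calls: list of genotype strings for one SNP, one entry per sample.
    """
    n = len(calls)
    missing = sum(c == "./." for c in calls)
    hom_ref = sum(c == "0/0" for c in calls)
    hom_alt = sum(c == "1/1" for c in calls)
    # Drop SNPs with more than 1% missing calls, or with fewer than
    # min_hom samples in either homozygous class.
    return (missing / n <= max_missing
            and hom_ref >= min_hom
            and hom_alt >= min_hom)

def filter_snps(snp_calls):
    """Keep only the SNPs that pass keep_snp; snp_calls maps SNP id -> calls."""
    return {snp_id: calls for snp_id, calls in snp_calls.items() if keep_snp(calls)}

# Toy usage with made-up data for 200 samples.
example = {
    "snp_a": ["0/0"] * 100 + ["1/1"] * 98 + ["./."] * 2,  # 1% missing, both classes common
    "snp_b": ["0/0"] * 199 + ["1/1"],                      # only one 1/1 sample
    "snp_c": ["0/0"] * 100 + ["1/1"] * 97 + ["./."] * 3,  # 1.5% missing
}
print(sorted(filter_snps(example)))
```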
FIGURE 2 Calculation of the optimal number of SNPs in fingerprint. (A) The fitting diagram comparing fingerprints composed of different numbers of SNPs with the original SNP data. Dark blue represents the curve of the original SNP data, orange the fingerprint curve of 80 SNPs, grey the fingerprint curve of 100 SNPs, light blue the fingerprint curve of 200 SNPs and yellow the fingerprint curve of 300 SNPs. (B) The correlation value between each fingerprint and the original SNP data.
FIGURE 3 Variation statistics of SNPs in Oryza sativa before and after genetic algorithm calculation. (A) The density distribution of original SNPs of Oryza sativa. (B) The PIC value statistics before and after genetic algorithm calculation. (C) The density distribution of 100 SNPs of Oryza sativa fingerprint.
FIGURE 4 Variation statistics of SNPs in Solanum tuberosum before and after genetic algorithm calculation. (A) The density distribution of original SNPs of Solanum tuberosum. (B) The PIC value statistics before and after genetic algorithm calculation. (C) The density distribution of 100 SNPs of Solanum tuberosum fingerprint.
FIGURE 5 Variation statistics of SNPs in Sus scrofa before and after genetic algorithm calculation. (A) The density distribution of original SNPs of Sus scrofa. (B) The PIC value statistics before and after genetic algorithm calculation. (C) The density distribution of 100 SNPs of Sus scrofa fingerprint.
FIGURE 6 Cluster comparison of each species before and after genetic algorithm calculation. (A) The CV error value statistics and the NJ cluster tree of 533 Oryza sativa populations. The left one shows the clustering of original data, and the right one shows the clustering of fingerprint data. Different colours represent different subgroups. (B) The CV error value statistics and the NJ cluster tree of 284 Solanum tuberosum populations. (C) The CV error value statistics and the NJ cluster tree of 247 Sus scrofa populations.