Yang Yang, Hongli Tian, Rui Wang, Lu Wang, Hongmei Yi, Yawei Liu, Liwen Xu, Yaming Fan, Jiuran Zhao, Fengge Wang.
Abstract
Molecular marker technology is widely used in plant variety discrimination, molecular breeding, and other fields. Molecular marker screening is important for lowering the cost of testing and improving the efficiency of data analysis. Screening usually involves two phases: the first controls loci quality and the second reduces loci quantity. Reducing loci quantity requires an appraisal index that is highly sensitive to the specific scenario, so that loci combinations can be selected. In this study, we focused on loci combination screening for plant variety discrimination. We propose a loci combination appraisal index, variety discrimination power (VDP), and describe and compare three statistical methods for it: probability-based VDP (P-VDP), comparison-based VDP (C-VDP), and ratio-based VDP (R-VDP). Results on simulated data showed that VDP was sensitive to statistical populations with convergence toward the same variety, whereas the total probability of discrimination power (TDP) method was effective only for partial populations. R-VDP was more sensitive to statistical populations with convergence toward various varieties than P-VDP and C-VDP, which had the same sensitivity as each other; TDP was not sensitive at all. With real data, R-VDP values for the sorghum, wheat, maize, and rice data began to show a downward tendency when the number of loci was 20, 7, 100, and 100, respectively; for P-VDP and C-VDP (which gave the same results) the numbers were 6, 4, 9, and 19, respectively, and for TDP they were 6, 4, 4, and 11, respectively. For the variety threshold setting, R-VDP values of loci combinations with different numbers of loci responded evenly to different thresholds. C-VDP values responded unevenly to different thresholds, and the extent of the response increased as the number of loci decreased. All the methods gave underestimates when data were missing, with systematic errors increasing in the order TDP, C-VDP, R-VDP.
We concluded that VDP was a better loci combination appraisal index than TDP for plant variety discrimination and that the three VDP methods have different applications. We developed software called VDPtools, which can calculate the values of TDP, P-VDP, C-VDP, and R-VDP. VDPtools is publicly available at https://github.com/caurwx1/VDPtools.git.
Keywords: loci combination screening; molecular markers; plant variety discrimination; simple sequence repeats; single nucleotide polymorphism; variety discrimination power
Year: 2021 PMID: 33815430 PMCID: PMC8014032 DOI: 10.3389/fpls.2021.566796
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Variable settings of SNP simulated data of gradient variety difference degree.
| No. | Data array 1 | | | Data array 2 | | | Data array 3 | | | Data array 4 | | |
| 1 | 204 | 2 | 1 | 204 | 2 | 1 | 201 | 5 | 1 | 206 | 10 | 0 |
| 2 | 196 | 10 | 1 | 196 | 2 | 5 | 196 | 5 | 2 | 196 | 10 | 1 |
| 3 | 186 | 20 | 1 | 186 | 2 | 10 | 186 | 5 | 4 | 186 | 10 | 2 |
| 4 | 176 | 30 | 1 | 176 | 2 | 15 | 176 | 5 | 6 | 176 | 10 | 3 |
| 5 | 166 | 40 | 1 | 166 | 2 | 20 | 166 | 5 | 8 | 166 | 10 | 4 |
| 6 | 156 | 50 | 1 | 156 | 2 | 25 | 156 | 5 | 10 | 156 | 10 | 5 |
| 7 | 146 | 60 | 1 | 146 | 2 | 30 | 146 | 5 | 12 | 146 | 10 | 6 |
| 8 | 136 | 70 | 1 | 136 | 2 | 35 | 136 | 5 | 14 | 136 | 10 | 7 |
| 9 | 126 | 80 | 1 | 126 | 2 | 40 | 126 | 5 | 16 | 126 | 10 | 8 |
| 10 | 116 | 90 | 1 | 116 | 2 | 45 | 116 | 5 | 18 | 116 | 10 | 9 |
| 11 | 106 | 100 | 1 | 106 | 2 | 50 | 106 | 5 | 20 | 106 | 10 | 10 |
| 12 | 96 | 110 | 1 | 96 | 2 | 55 | 96 | 5 | 22 | 96 | 10 | 11 |
| 13 | 86 | 120 | 1 | 86 | 2 | 60 | 86 | 5 | 24 | 86 | 10 | 12 |
| 14 | 76 | 130 | 1 | 76 | 2 | 65 | 76 | 5 | 26 | 76 | 10 | 13 |
| 15 | 66 | 140 | 1 | 66 | 2 | 70 | 66 | 5 | 28 | 66 | 10 | 14 |
| 16 | 56 | 150 | 1 | 56 | 2 | 75 | 56 | 5 | 30 | 56 | 10 | 15 |
| 17 | 46 | 160 | 1 | 46 | 2 | 80 | 46 | 5 | 32 | 46 | 10 | 16 |
| 18 | 36 | 170 | 1 | 36 | 2 | 85 | 36 | 5 | 34 | 36 | 10 | 17 |
| 19 | 26 | 180 | 1 | 26 | 2 | 90 | 26 | 5 | 36 | 26 | 10 | 18 |
| 20 | 16 | 190 | 1 | 16 | 2 | 95 | 16 | 5 | 38 | 16 | 10 | 19 |
| 21 | 1 | 205 | 1 | 6 | 2 | 100 | 6 | 5 | 40 | 6 | 10 | 20 |
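A quick arithmetic check helps when reading the table: in every row and every data array, the first value plus the product of the second and third values equals 206. One reading that fits this, and is only an assumption because the sub-column headers are not given here, is that the three values are the number of unrepeated samples, the size of each repeated group, and the number of repeated groups. The Python sketch below verifies the identity on a few rows transcribed from the table; the names `ROWS`, `total_samples`, and `check_rows` are illustrative.

```python
# Rows transcribed from the table: row number -> one (a, b, c) triple
# per data array (Data array 1 to Data array 4).
ROWS = {
    1: [(204, 2, 1), (204, 2, 1), (201, 5, 1), (206, 10, 0)],
    2: [(196, 10, 1), (196, 2, 5), (196, 5, 2), (196, 10, 1)],
    11: [(106, 100, 1), (106, 2, 50), (106, 5, 20), (106, 10, 10)],
    21: [(1, 205, 1), (6, 2, 100), (6, 5, 40), (6, 10, 20)],
}


def total_samples(a, b, c):
    """Assumed reading: a unrepeated samples plus c groups of b samples."""
    return a + b * c


def check_rows(rows, expected=206):
    """Return True if every triple in every row adds up to `expected`."""
    return all(
        total_samples(*triple) == expected
        for arrays in rows.values()
        for triple in arrays
    )


if __name__ == "__main__":
    print(check_rows(ROWS))
```

Under this reading, Data array 1 grows a single repeated group, while Data arrays 2 to 4 add more groups of fixed size 2, 5, and 10.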
FIGURE 1. Sensitivity of four loci combination appraisal methods on simulated data of gradient variety difference degree. (A) In Data Array 1, repeated samples were those that converged toward the same variety. (B–D) In Data Arrays 2, 3, and 4, repeated samples converged toward various varieties but with decreasing numbers of repeated groups.
FIGURE 2. Sensitivity of four loci combination appraisal methods on real data of gradient loci numbers. Real data contained genotype data of (A) sorghum samples based on SSR loci, (B) wheat samples based on SSR loci, (C) maize samples based on SNP loci, and (D) rice samples based on SNP loci.
FIGURE 3. Influence of different variety thresholds on estimated C-VDP and R-VDP values. Genotype data of sorghum samples based on SSR loci were used to evaluate the influence of different variety thresholds on (A) C-VDP and (C) R-VDP. Genotype data of maize samples based on SNP loci were used to evaluate the influence of different variety thresholds on (B) C-VDP and (D) R-VDP. The suffix number after C-VDP or R-VDP indicates the threshold setting. For example, C-VDP:1 means the variety threshold of C-VDP was set as “different variety when number of different loci of samples was ≥1.”
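The threshold rule quoted in the caption ("different variety when number of different loci of samples was ≥1") can be illustrated with a pairwise comparison. The Python sketch below counts differing loci between two samples and reports the share of sample pairs called different at a given threshold. That share-of-pairs summary is an illustrative assumption, not the published C-VDP or R-VDP formula, and the sample names and genotypes are made up for the example.

```python
from itertools import combinations


def n_different_loci(g1, g2):
    """Count loci at which two equal-length genotype lists differ."""
    return sum(1 for a, b in zip(g1, g2) if a != b)


def fraction_pairs_discriminated(genotypes, threshold=1):
    """Share of sample pairs with at least `threshold` differing loci.

    Illustrative summary only; not the paper's definition of any VDP method.
    """
    pairs = list(combinations(genotypes.values(), 2))
    if not pairs:
        return 0.0
    distinct = sum(
        1 for g1, g2 in pairs if n_different_loci(g1, g2) >= threshold
    )
    return distinct / len(pairs)


# Hypothetical genotypes at four loci.
samples = {
    "s1": ["AA", "CC", "GG", "TT"],
    "s2": ["AA", "CC", "GG", "TT"],  # identical to s1
    "s3": ["AA", "CC", "GG", "CT"],  # differs from s1 at one locus
    "s4": ["AG", "CT", "GG", "CC"],  # differs from s1 at three loci
}

if __name__ == "__main__":
    for t in (1, 2, 4):
        print(t, fraction_pairs_discriminated(samples, threshold=t))
```

Raising the threshold from 1 to 2 in this toy data reclassifies the pairs that differ at only one locus as the same variety, which is the kind of threshold response the figure examines.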
FIGURE 4. Influence of different missing data rates on the stability of three loci combination appraisal methods. Genotype data of sorghum samples based on SSR loci were used to evaluate the influence of different missing data rates on the stability of (A) TDP, (C) C-VDP, and (E) R-VDP. Genotype data of maize samples based on SNP loci were used to evaluate the influence of different missing data rates on the stability of (B) TDP, (D) C-VDP, and (F) R-VDP. The suffix number after TDP, C-VDP, or R-VDP indicates the missing rate of a dataset. For example, TDP:0.5 means a dataset with a missing data rate of 50% was used to estimate the stability of TDP.
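A dataset with a given missing rate can be simulated by masking genotype calls at random. The Python sketch below does this and counts differing loci while skipping any locus that is missing in either sample. Both the masking scheme and the skip-if-missing rule are assumptions made for illustration; how VDPtools treats missing data is not stated here. Under these assumptions the observed number of differing loci can only stay the same or fall, which is consistent with the underestimation reported in the abstract.

```python
import random

MISSING = None  # hypothetical marker for a missing genotype call


def mask_genotypes(genotypes, rate, seed=0):
    """Return a copy with a fraction `rate` of all calls set to MISSING."""
    rng = random.Random(seed)
    cells = [(s, i) for s, g in genotypes.items() for i in range(len(g))]
    n_missing = round(rate * len(cells))
    masked = {s: list(g) for s, g in genotypes.items()}
    for s, i in rng.sample(cells, n_missing):
        masked[s][i] = MISSING
    return masked


def observed_differences(g1, g2):
    """Count differing loci, skipping loci missing in either sample."""
    return sum(
        1
        for a, b in zip(g1, g2)
        if a is not MISSING and b is not MISSING and a != b
    )


# Hypothetical genotypes at four loci.
samples = {
    "s1": ["AA", "CC", "GG", "TT"],
    "s2": ["AG", "CT", "GG", "CC"],
}

if __name__ == "__main__":
    masked = mask_genotypes(samples, rate=0.5)
    print(observed_differences(samples["s1"], samples["s2"]))
    print(observed_differences(masked["s1"], masked["s2"]))
```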