Magdalena Góralska1, Jan Bińkowski1, Natalia Lenarczyk1, Anna Bienias1, Agnieszka Grądzielewska2, Ilona Czyczyło-Mysza3, Kamila Kapłoniak3, Stefan Stojałowski1, Beata Myśków1.
Abstract
The standard approach to genetic mapping was supplemented by machine learning (ML) to establish the location of the rye gene associated with epicuticular wax formation (glaucous phenotype). Over 180 plants of the biparental F2 population were genotyped with DArTseq (sequencing-based diversity array technology). A maximum likelihood (MLH) algorithm (JoinMap 5.0) and three ML algorithms, logistic regression (LR), random forest and extreme gradient boosted trees (XGBoost), were used to select markers closely linked to the gene controlling the wax layer. The allele conditioning the nonglaucous appearance of plants, derived from the cultivar Karlikovaja Zelenostebelnaja, was mapped to chromosome 2R, which is the first report of this localization. The DNA sequence of DArT-Silico 3585843, closely linked to wax segregation and detected using ML methods, was indicated as one of the candidates controlling the studied trait. The putative gene encodes the ABCG11 transporter.
Keywords: ATP-binding cassette (ABC) transporters; Secale cereale L.; fatty acid desaturase (FAD); genetic map; glaucousness; large-scale sequence-based markers
Year: 2020 PMID: 33053706 PMCID: PMC7593958 DOI: 10.3390/ijms21207501
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Stems and leaves of the parental components of rye mapping population BK2: glaucous inbred line AK1 (A) and nonglaucous AKZ (B).
The significance of differences between the group of waxy (1) and waxless (2) plants of mapping population BK2, demonstrated using the t-test.
| Trait | Generation | Mean (1) | Mean (2) | Standard Deviation (1) | Standard Deviation (2) | Sample Number (1) | Sample Number (2) | t Value | Degrees of Freedom (df) | Probability Level (p) |
|---|---|---|---|---|---|---|---|---|---|---|
| PH * | F2 | 116.10 | 127.07 | 29.10 | 26.34 | 200 | 68 | −2.75 | 266 | 0.006 |
| | F3 | 96.40 | 102.25 | 26.19 | 28.44 | 248 | 128 | −1.99 | 374 | 0.047 |
| TN | F2 | 3.83 | 3.38 | 2.11 | 1.60 | 202 | 69 | 1.62 | 269 | 0.106 |
| | F3 | 3.53 | 3.36 | 1.88 | 1.65 | 248 | 128 | 0.88 | 374 | 0.380 |
| SL | F2 | 10.45 | 10.21 | 1.74 | 1.72 | 196 | 67 | 0.97 | 261 | 0.333 |
| | F3 | 9.27 | 9.23 | 1.40 | 1.29 | 245 | 126 | 0.30 | 369 | 0.768 |
| SNPS * | F2 | 32.84 | 34.27 | 4.84 | 4.97 | 196 | 67 | −2.07 | 261 | 0.040 |
| | F3 | 30.75 | 32.27 | 4.32 | 3.79 | 245 | 126 | −3.34 | 369 | 0.001 |
| CT * | F2 | 31.84 | 33.93 | 4.54 | 4.46 | 196 | 67 | −3.27 | 261 | 0.001 |
| | F3 | 33.41 | 35.34 | 3.73 | 4.38 | 245 | 126 | −4.43 | 369 | 0.000 |
| GNPS | F2 | 39.14 | 39.90 | 25.68 | 27.20 | 196 | 67 | −0.20 | 261 | 0.839 |
| | F3 | 21.21 | 21.68 | 17.60 | 17.32 | 245 | 126 | −0.25 | 369 | 0.807 |
| GWPS | F2 | 1.36 | 1.36 | 0.60 | 0.69 | 159 | 55 | −0.04 | 212 | 0.965 |
| | F3 | 0.64 | 0.57 | 0.39 | 0.41 | 182 | 106 | 1.41 | 286 | 0.160 |
| TGW | F2 | 28.16 | 27.41 | 6.43 | 7.07 | 159 | 55 | 0.74 | 212 | 0.463 |
| | F3 | 22.21 | 21.80 | 6.72 | 6.21 | 182 | 106 | 0.51 | 286 | 0.611 |
| FD * | F3 | 28.61 | 29.01 | 0.87 | 0.89 | 133 | 50 | −2.73 | 181 | 0.007 |
PH: plant height (cm); TN: tiller number; SL: spike length; SNPS: spikelet number per spike; CT: spike compactness (spikelet number per 10 cm); GNPS: grain number per spike; GWPS: grain weight per spike (g); TGW: thousand grain weight (g); FD: flowering date (days from May 1st); * statistically significant differences.
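The test statistic and degrees of freedom in the table above can be recomputed from the summary statistics alone. The following is a minimal sketch, assuming a pooled-variance (Student) two-sample t-test, which is consistent with the reported df = n1 + n2 − 2; the source does not state which variant was used, and the function name `pooled_t` is illustrative only.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics.

    Assumption: equal variances in both groups (Student's t-test).
    Returns (t, df).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Plant height (PH), F2 generation: waxy (1) vs. waxless (2) plants
t, df = pooled_t(116.10, 29.10, 200, 127.07, 26.34, 68)
print(round(t, 2), df)  # -2.75 266
```

The probability level additionally requires the cumulative distribution function of the t distribution, which is not in the Python standard library; a statistics package would be needed for that step.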
DArTseq statistics for rye mapping population BK2-F2.
| | Silico | SNP | Total |
|---|---|---|---|
| incomplete data | 447 | 2914 | 3361 |
| monomorphic | 2630 | 1267 | 3897 |
| polymorphic | 25812 | 6545 | 32357 |
| including | | | |
| 1R * | 917 | 191 | 1108 |
| 2R * | 927 | 208 | 1135 |
| 3R * | 977 | 152 | 1129 |
| 4R * | 873 | 188 | 1061 |
| 5R * | 1086 | 169 | 1255 |
| 6R * | 1165 | 231 | 1396 |
| 7R * | 789 | 183 | 972 |
| mean per chromosome | 962 | 189 | 1151 |
| unassigned ** | 19152 | 5070 | 24222 |
*/** DArTseq assigned/unassigned to chromosomes based on literature data [56,57].
Figure 2. Position of the wax locus on rye chromosome 2R of the BK2 mapping population with reference to the RIL-S map [56]. DArTseqs indicated in the ML analysis are blue.
Figure 3. Projection of two classes of plants (waxy and waxless) in a two-dimensional space characterized on the basis of DArTseqs assigned to chromosome 2R (A) and not assigned (B). The figures on the right show the ordering result using markers selected based on ML algorithms.
DArTseqs indicated by three machine learning (ML) algorithms as important for distinguishing between waxy and waxless plants. Coefficient values were taken directly from the models (for logistic regression (LR), absolute values of the coefficients were used). The impact value is the sum of the three coefficients.
| Marker | Assignment to 2R | LR Coefficient | Random Forest Coefficient | XGBoost Coefficient | Impact | Distance from Wax Locus (cM) | Annotation |
|---|---|---|---|---|---|---|---|
| 3591025 | A | 0.636 | 0.017 | 0.825 | 1.478 | 1.691 | - |
| 3593882 | B | 0.553 | 0.022 | 0.024 | 0.599 | unmapped | + |
| 3578307_27:A>G | B | 0.413 | 0.013 | 0.013 | 0.439 | 1.944 | - |
| 3889647 | A | 0.285 | 0.025 | 0.035 | 0.345 | 0.088 | + |
| 3908692_28:C>T | B | 0.315 | 0.014 | 0.008 | 0.337 | 1.097 | - |
| 3362575_18:C>T | B | 0.192 | 0.014 | 0.086 | 0.293 | 0.81 | + |
| 4485942_42:T>G | B | 0.192 | 0.002 | 0.069 | 0.263 | 0.86 | - |
| 3597393_10:T>G | B | 0.062 | 0.005 | 0.162 | 0.228 | unmapped | - |
| 3358122 | A | 0.162 | 0.014 | 0.019 | 0.195 | 7.162 | - |
| 3585843 | A | 0.169 | 0.015 | 0.01 | 0.193 | 2.056 | + |
| 3341848 | A | 0.064 | 0.002 | 0.002 | 0.067 | 6.917 | + |
| 4092788_55:G>A | B | 0.044 | 0.004 | 0.009 | 0.058 | 22.593 | + |
| 3750485 | B | 0.022 | 0.01 | 0.002 | 0.034 | unmapped | - |
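The Impact column can be recomputed from the three per-model values. The following is a minimal sketch using the first four rows of the table above; the `impact` function and the `scores` dictionary are illustrative names, not part of the original analysis, and how the coefficients were extracted from the fitted models is not shown here.

```python
# Per-marker values copied from the table (first four rows):
# (LR coefficient, random forest coefficient, XGBoost coefficient)
scores = {
    "3591025": (0.636, 0.017, 0.825),
    "3593882": (0.553, 0.022, 0.024),
    "3578307_27:A>G": (0.413, 0.013, 0.013),
    "3889647": (0.285, 0.025, 0.035),
}

def impact(lr_coef, rf_coef, xgb_coef):
    """Sum of the three model scores; the LR coefficient enters as an absolute value."""
    return abs(lr_coef) + rf_coef + xgb_coef

# Rank markers from highest to lowest impact
ranked = sorted(
    ((marker, round(impact(*values), 3)) for marker, values in scores.items()),
    key=lambda item: item[1],
    reverse=True,
)
for marker, value in ranked:
    print(marker, value)
```

Summing raw scores from different models weights each model by the scale of its own coefficients, which is why the LR column dominates most rows in the table.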
Annotations of markers most strongly linked to wax locus (<3 cM) found in the NCBI database.
| Marker | Description | Total Score | E Value | Identity | Accession |
|---|---|---|---|---|---|
| 3889647 1 | Predicted: Glycine soja long-chain-fatty-acid--AMP ligase FadD28-like (LOC114378589), mRNA | 71 | 1E–09 | 90% | XM_028337226.1 |
| 3362575_18:C>T 1,2 | Predicted: | 112 | 4E–22 | 96% | XM_020315644.1 |
| 3585843 1 | Predicted: | 128 | 2E–26 | 100% | XM_006652486.2 |
1 marker identified using ML; 2 marker identified using MLH (JoinMap 5.0).
Figure 4. Stability assessment of three reference genes across all rye samples using two different algorithms: NormFinder (SD) and geNorm (M-Value) provided in GeneEx 7.0 software (bioMCC, Freising-Weihenstephan, Germany).
Figure 5. Relative expression (fold change in relation to ACT and GAPDH) of putative ScABCG11 established by qPCR (quantitative PCR) for the glaucous (AK1) and nonglaucous (AKZ) rye inbred lines. Vertical bars indicate +/− standard error. Differences are statistically significant according to the Kruskal-Wallis test.
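The caption does not state how the fold change was calculated. One common way to express a target gene relative to two reference genes is the 2^(−ΔΔCt) method, sketched below as an assumption and not as the procedure used in the study. The function name `fold_change` and all Ct values are made up for illustration, and the formula assumes roughly 100% amplification efficiency for every primer pair.

```python
from statistics import mean

def fold_change(ct_target_sample, ct_refs_sample, ct_target_control, ct_refs_control):
    """2^(-ddCt) fold change of a target gene, sample relative to control.

    Assumption: ~100% amplification efficiency for all primer pairs.
    Averaging reference Ct values corresponds to the geometric mean
    of the reference gene quantities.
    """
    d_ct_sample = ct_target_sample - mean(ct_refs_sample)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    return 2 ** -(d_ct_sample - d_ct_control)

# Hypothetical Ct values (NOT data from the study);
# reference lists stand for ACT and GAPDH.
fc = fold_change(
    ct_target_sample=24.0, ct_refs_sample=[18.0, 20.0],
    ct_target_control=26.0, ct_refs_control=[18.0, 20.0],
)
print(fc)  # 4.0
```

With these invented numbers the target amplifies two cycles earlier in the sample than in the control at equal reference levels, which corresponds to a fourfold higher relative expression.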
Primers used in qPCR for amplification of the reference genes and the studied gene ScABCG11.
| Gene | Primer Pair | Sequence 5′–3′ |
|---|---|---|
| Actin (ACT) | ACT Fw | AAGATGGGACGTCTTGATGG |
| | ACT Rev | GGATCTTCATCGGCATCACT |
| Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) | GAPDH Fw | AGATGCCCCTATGTTTGTGG |
| | GAPDH Rev | GTGGTGCAGCTAGCATTTGA |
| RNase L inhibitor (RLI) | RLI Fw | TTGAGCAACTCATGGACCAG |
| | RLI Rev | TGCTTTCCAAGGCACAAACAT |
| ATP binding cassette transporter, subfamily G (ScABCG11) | ABCG_F_1297 | GGTGATGGATTCAAGGGGCA |
| | ABCG_R_1382 | CGCGCGACATGTTGATGAAT |