Kuldeep Kumar, Priyanka Anjoy, Sarika Sahu, Kumar Durgesh, Antara Das, Kishor U Tribhuvan, Amitha Mithra Sevanthi, Rekha Joshi, Pradeep Kumar Jain, Nagendra Kumar Singh, Atmakuri Ramakrishna Rao, Kishor Gaikwad.
Abstract
Pigeonpea, a tropical photosensitive crop, harbors significant diversity for days to flowering, but little is known about the genes that govern these differences. Our goal in the current study was to use a genome-wide association strategy to discover the loci that regulate days to flowering in pigeonpea. Single-trait and principal component (PC)-based association studies were conducted on a diverse collection of 142 pigeonpea lines for days to first flowering and days to fifty percent flowering over 3 years, as well as for plant height and number of seeds per pod. The analysis used seven association mapping models (GLM, MLM, MLMM, CMLM, ECMLM, FarmCPU and SUPER), and a comparison among them revealed that FarmCPU is more robust than the others in controlling both false positives and false negatives, as it incorporates multiple markers as covariates to eliminate confounding between the testing marker and kinship. Cumulatively, 22 SNPs were found to be associated with days to first flowering (DOF), days to fifty percent flowering (DFF) or both; of these, 15 were unique to trait-based GWAS, 4 were unique to PC-based GWAS, and 3 were shared by both. Because PC1 represents DOF, DFF and plant height (PH), the four SNPs found to be associated with PC1 can be inferred to be pleiotropic. A window of ±2 kb around each associated SNP was aligned with available transcriptome data generated for the transition from the vegetative to the reproductive phase in pigeonpea. Annotation of these regions revealed genes that might be involved in floral induction, such as a cytochrome P450-like TATA box binding protein, auxin response factors, PIN-like genes, an F-box protein, a U-box domain protein, a chromatin remodelling complex protein and an RNA methyltransferase. In summary, auxin-responsive genes appear to be involved in regulating DOF and DFF, as the majority of the associated loci have genes that are components of auxin signaling pathways in their vicinity.
Overall, our findings indicate that the use of principal component analysis in GWAS is statistically more robust for identifying genes, and that FarmCPU is a better choice than the other aforementioned models for dealing with both false-positive and false-negative associations; it can thus be used for traits with complex inheritance.
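The PC-based GWAS uses principal component scores of the phenotypes as composite traits, and PC1 is reported to capture DOF, DFF and PH together. A minimal sketch of deriving such scores, assuming the traits are standardized before PCA and that the PCA is computed by singular value decomposition (the exact settings are not given in this excerpt). The function name `trait_pcs` and the trait values are hypothetical.

```python
import numpy as np


def trait_pcs(traits, n_components=1):
    """Return PC scores, loadings and explained-variance shares.

    traits: (lines, traits) matrix. Each trait is standardized to a z-score
    first so that traits on different scales (days vs. cm) weigh equally.
    """
    X = np.asarray(traits, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix: rows of vt are PC loadings,
    # and u * s gives the PC scores of each line.
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]


# Hypothetical line means: DOF (days), DFF (days), PH (cm).
traits = np.array([
    [90, 95, 150],
    [100, 107, 160],
    [110, 116, 175],
    [120, 128, 180],
    [130, 135, 195],
    [140, 147, 210],
])
scores, loadings, explained = trait_pcs(traits)
```

When the traits are positively correlated, the PC1 loadings share one sign, so the PC1 score behaves like a joint index of lateness and plant height that can be supplied as the phenotype to any of the association models.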
Year: 2022 PMID: 35729192 PMCID: PMC9211048 DOI: 10.1038/s41598-022-14568-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Description of the association mapping models.
| Model | Description |
|---|---|
| General linear model (GLM) | GLM is the simplest structure for single-locus analysis, with population structure (Q) as a fixed effect and no random-effect component; principal components are used as covariates in this model to reduce false positives |
| Mixed linear model (MLM) | MLM includes the kinship matrix (K) as an additional random effect component; hence it is also called the Q + K model |
| Multiple loci MLM (MLMM) | MLMM is designed for multiple-locus analysis and improves on MLM by incorporating multiple markers simultaneously as covariates to partially remove the confounding between testing markers and kinship. GAPIT uses forward and backward stepwise linear mixed-model regression to include the markers as covariates |
| Compressed MLM (CMLM) | In CMLM, similar individuals are assigned to groups through cluster analysis, and the groups are then used as elements of a reduced kinship matrix for the random-effect structure. This grouping gives the model improved statistical power compared with regular MLM methods |
| Enriched CMLM (ECMLM) | ECMLM calculates kinship using different algorithms and then chooses the best combination of kinship and grouping algorithms |
| Fixed and random model circulating probability unification (FarmCPU) | This iterative approach alternately fits a fixed-effect model and a random-effect model to overcome the model overfitting problem of the stepwise regression used in MLMM. To control false positives, kinship is derived from the associated markers |
| Settlement of MLM under progressively exclusive relationship (SUPER) | The SUPER model derives the kinship matrix from the associated genetic markers (pseudo quantitative trait nucleotides, pseudo-QTNs) instead of from all markers. Whenever a pseudo-QTN is correlated with the testing marker, it is excluded from the markers used to derive kinship. The method has higher statistical power than regular MLM |
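The GLM row describes the simplest of these models: an ordinary least-squares fit for each marker, with population-structure covariates as fixed effects and no kinship term. A minimal sketch under stated assumptions: genotypes are coded additively as 0/1/2, PCs are supplied as covariates, and two-sided p-values come from a normal approximation to the t distribution, which is adequate only for moderately large samples. This is not GAPIT's implementation, and `glm_scan` is a hypothetical name.

```python
from math import erfc, sqrt

import numpy as np


def glm_scan(y, genotypes, covariates):
    """Single-marker GLM: y = b0 + C @ b + beta_j * g_j + e, one marker at a time.

    y: (n,) phenotypes; genotypes: (n, m) coded 0/1/2;
    covariates: (n, k) fixed effects such as population-structure PCs.
    Returns marker effects, t statistics and normal-approximation p-values.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(genotypes, dtype=float)
    # Intercept column plus the structure covariates (no random effect).
    C = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    betas, tstats, pvals = [], [], []
    for j in range(G.shape[1]):
        X = np.column_stack([C, G[:, j]])
        coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (len(y) - rank)    # residual variance
        cov = sigma2 * np.linalg.pinv(X.T @ X)      # covariance of estimates
        t = coef[-1] / sqrt(cov[-1, -1])            # test of the marker term
        betas.append(coef[-1])
        tstats.append(t)
        pvals.append(erfc(abs(t) / sqrt(2)))        # 2 * (1 - Phi(|t|))
    return np.array(betas), np.array(tstats), np.array(pvals)
```

Adding a kinship random effect to this per-marker model turns it into the MLM (Q + K) row; the remaining models differ mainly in how that kinship term and any covariate markers are chosen.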
Figure 1. Scatter diagrams showing collinearity among the selected phenotypic traits for different years. An upward linear pattern indicates a stronger positive correlation. Days to first flowering (DOF), days to fifty percent flowering (DFF), plant height (PH) and average number of seeds per pod (SPP).
Pooled analysis of variance (pooled ANOVA) for four traits evaluated in different environments. d.f., degrees of freedom; MSS, mean sum of squares.
| Source | PH d.f. | PH MSS | SPP d.f. | SPP MSS | DOF d.f. | DOF MSS | DFF d.f. | DFF MSS |
|---|---|---|---|---|---|---|---|---|
| Year | 1 | 25,764.88*** | 1 | 0.07 | 2 | 12,345.08*** | 2 | 21,545.25*** |
| Genotype | 141 | 1157.32*** | 141 | 0.96*** | 141 | 907.85*** | 141 | 895.31*** |
| Genotype × Year | 282 | 359.84 | 282 | 0.29 | 282 | 91.81 | 282 | 97.91 |
| Broad-sense heritability | | 0.5449 | | 0.5897 | | 0.6593 | | 0.7094 |
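Broad-sense heritability expresses genotypic variance as a share of phenotypic variance. The excerpt does not state the estimator behind the last row, so the sketch below assumes the simplest layout consistent with the table: one observation per genotype per year, with the Genotype × Year mean square serving as the residual. `broad_sense_h2` is a hypothetical name.

```python
def broad_sense_h2(ms_genotype, ms_gxy, n_years):
    """Plot-basis broad-sense heritability from a genotype x year ANOVA.

    Assumes one observation per genotype per year, so that
    E[MS_G] = sigma2_e + n_years * sigma2_g and E[MS_GxY] = sigma2_e:
        sigma2_g = (MS_G - MS_GxY) / n_years
        H2 = sigma2_g / (sigma2_g + MS_GxY)
    A negative variance estimate is truncated to zero.
    """
    sigma2_g = max((ms_genotype - ms_gxy) / n_years, 0.0)
    return sigma2_g / (sigma2_g + ms_gxy)


# DOF column of the table: three years (Year d.f. = 2).
h2_dof = broad_sense_h2(907.85, 91.81, 3)
```

With the DOF mean squares this simplified estimator gives about 0.75 rather than the reported 0.6593, so the paper evidently used a different estimator or experimental layout than the one assumed here.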
Figure 2. Population structure analysis revealed three major clusters in the pigeonpea mini core collection.
Figure 3. Manhattan plots for DOF (left) and DFF (right) for 2017–18. From top to bottom: GLM, MLM, MLMM, CMLM, ECMLM, FarmCPU and SUPER.
List of selected SNPs further used for annotation analysis.
| S. no. | SNP id | Chromosome/scaffold | Physical position (bp) | Year (trait) |
|---|---|---|---|---|
| 1 | 812678863:261: + | NW_017988637.1 | 1117 | 2017–18 (DOF) |
| 2 | 392468479:318: + | NW_017984071.1 | 155,917 | 2017–18 (DOF) |
| 3 | 725832748:272: + | NW_017985276.1 | 22,384 | 2017–18 (DOF) |
| 4 | 791831919:74: + | NW_017986933.1 | 11,488 | 2017–18 (DOF) and 2019–20 (DOF) |
| 5 | 142343707:25: + | NC_033807.1 | 9,366,686 | 2017–18 (DOF) |
| 6 | 760222832:55: + | NW_017985856.1 | 27,685 | 2018–19 (DOF), 2018–19 (DFF) and 2018–19 (PC1) |
| 7 | 812678807:41: + | NW_017988637.1 | 869 | 2018–19 (DOF) and 2019–20 (DOF), 2018–19 (DFF) and 2019–20 (DFF) |
| 8 | 376936577:87: − | NW_017984062.1 | 168,305 | 2018–19 (DOF) |
| 9 | 652249420:11: + | NW_017984675.1 | 23,436 | 2018–19 (DOF) |
| 10 | 709017214:7: − | NW_017985090.1 | 533 | 2018–19 (DOF) |
| 11 | 633271872:58: + | NW_017984581.1 | 74,012 | 2018–19 (DOF) |
| 12 | 392479221:11: + | NW_017984071.1 | 161,167 | 2019–20 (DOF), 2019–20 (DFF), and 2018–19 (PC1) |
| 13 | 164755426:8: + | NC_033809.1 | 6,932,346 | 2019–20 (DOF) and 2019–20 (DFF) |
| 14 | 781124881:96: − | NW_017986454.1 | 6679 | 2019–20 (DOF) |
| 15 | 812679326:250: − | NW_017988637.1 | 863 | 2017–18 (DFF), 2017–18 (PC1) |
| 16 | 35373484:284: + | NC_033805.1 | 6,362,335 | 2017–18 (DFF) |
| 17 | 785047004:88: + | NW_017986607.1 | 3977 | 2018–19 (DFF) |
| 18 | 330539130:289: + | NC_033814.1 | 21,328,862 | 2019–20 (DFF) |
| 19 | 21256769:324: + | NC_033804.1 | 14,401,967 | 2017–18 (PC1) |
| 20 | 740074801:308: − | NW_017985477.1 | 249 | 2017–18 (PC1) |
| 21 | 324910270:94: − | NC_033814.1 | 17,612,083 | 2018–19 (PC1) |
| 22 | 593701379:271: + | NW_017984430.1 | 87,462 | 2018–19 (PC1) |
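The annotation step uses a ±2 kb window around each associated SNP. Three SNPs in the table above lie on scaffold NW_017988637.1 at 863, 869 and 1117 bp, within 254 bp of one another, so their windows overlap and collapse into one region. A minimal sketch of that merging, assuming 1-based positions and clipping windows at position 1. Only a subset of the table's SNPs is included, and `merge_windows` is a hypothetical name.

```python
from collections import defaultdict


def merge_windows(snps, flank=2000):
    """Group SNPs by scaffold and merge overlapping +/- flank windows.

    snps: iterable of (snp_id, scaffold, position), 1-based positions.
    Returns {scaffold: [(start, end, [snp_ids]), ...]} sorted by start.
    """
    by_scaffold = defaultdict(list)
    for snp_id, scaffold, pos in snps:
        by_scaffold[scaffold].append((max(1, pos - flank), pos + flank, snp_id))
    merged = {}
    for scaffold, windows in by_scaffold.items():
        windows.sort()
        out = []
        for start, end, snp_id in windows:
            if out and start <= out[-1][1]:
                # Overlaps the previous region: extend it and record the SNP.
                prev_start, prev_end, ids = out[-1]
                out[-1] = (prev_start, max(prev_end, end), ids + [snp_id])
            else:
                out.append((start, end, [snp_id]))
        merged[scaffold] = out
    return merged


# SNP ids, scaffolds and positions copied from the table above.
SNPS = [
    ("812678863:261: +", "NW_017988637.1", 1117),
    ("812678807:41: +", "NW_017988637.1", 869),
    ("812679326:250: −", "NW_017988637.1", 863),
    ("392468479:318: +", "NW_017984071.1", 155917),
    ("392479221:11: +", "NW_017984071.1", 161167),
]
regions = merge_windows(SNPS)
```

The two SNPs on NW_017984071.1 are 5250 bp apart, so their 4 kb windows stay separate, while the three on NW_017988637.1 merge into a single region spanning 1 to 3117 bp.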
Annotation of SNPs showing marker–trait associations reveals a role of auxin pathway genes in floral induction.
| S. no. | SNP id | Putative candidate regulators within ±2 kb of associated SNPs |
|---|---|---|
| 1. | 812678863:261: + | Transcript (TRINITY_DN34349_c0_g1_i9) annotated as cytochrome P450-like TATA box binding protein (cytochrome P450-like TBP) |
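| 2. | 812679326:250: + | Same 2 kb window as S. no. 1 (TRINITY_DN34349_c0_g1_i9) |
| 3. | 812678807:41: + | Same 2 kb window as S. no. 1 (TRINITY_DN34349_c0_g1_i9) |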
| 4. | 760222832:55: + | TRINITY_DN35027_c3_g2_i12 was annotated as putative rRNA methyltransferase |
| 5. | 785047004:88: + | TRINITY_DN34404_c4_g1_i14 annotated as an auxin response factor |
| 6. | 633271872:58: + | Genic SNP: i |
| 7. | 593701379:271: + | TRINITY_DN32710_c2_g1_i2 annotated as F-box protein SKIP23 |
| 8. | 376936577:87: − | TRINITY_DN34296_c0_g1_i10 annotated as a serine/threonine protein phosphatase 2A |
| 9. | 834373094:36: − | Genic SNP: ribosomal protein S2 |
| 10. | 834384838:29: − | Genic SNP: cytochrome P450 b559 alpha subunit |
| 11. | 164755426:80: − | TRINITY_DN34186_c2_g3_i4 annotated as cytochrome P450 89A2 |
| 12. | 35373484:284: + | TRINITY_DN33874_c0_g1_i3 annotated as U-box domain-containing protein and TRINITY_DN34453_c0_g3_i10 annotated as chromatin structure remodelling complex protein BSH |
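To find putative regulators, each SNP window is compared with transcripts aligned to the same scaffold. A minimal sketch of that overlap test using closed intervals; the transcript ids and alignment coordinates below are hypothetical placeholders, not values from the study, and `transcripts_near` is a hypothetical name. Only the scaffold names and SNP positions come from the SNP table.

```python
def transcripts_near(scaffold, position, alignments, flank=2000):
    """Return ids of transcripts whose alignment overlaps position +/- flank.

    alignments: iterable of (transcript_id, scaffold, start, end) as 1-based
    closed intervals. Closed intervals [a, b] and [c, d] overlap when
    a <= d and c <= b.
    """
    lo, hi = max(1, position - flank), position + flank
    return [tid for tid, scf, start, end in alignments
            if scf == scaffold and start <= hi and lo <= end]


# Hypothetical alignments, for illustration only.
ALIGNMENTS = [
    ("tx_A", "NW_017988637.1", 2500, 4200),
    ("tx_B", "NW_017986607.1", 1200, 2600),
    ("tx_C", "NW_017986607.1", 9000, 9500),
]
hits = transcripts_near("NW_017986607.1", 3977, ALIGNMENTS)
```

Here the window around the SNP at 3977 bp spans 1977 to 5977 bp, so it overlaps `tx_B` but not `tx_C`. For genome-wide use, an interval tree or a sorted sweep would avoid scanning every alignment per SNP.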
Figure 4. Quantile–quantile (Q–Q) plots of GWAS results from different association models for DOF in 2017–18: GLM (a), MLM (b), MLMM (c), CMLM (d), ECMLM (e), FarmCPU (f) and SUPER (g). The x axis shows expected −log10(p) values and the y axis shows observed −log10(p) values.
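The Q–Q panels compare observed −log10(p) values with those expected if no marker were associated. A minimal sketch of both coordinates, assuming the common (i − 0.5)/n plotting positions for uniform order statistics (i/(n + 1) is another convention). It also computes the genomic inflation factor λ, a standard one-number summary of how far the bulk of the curve departs from the diagonal; λ is not reported in this excerpt, and both function names are hypothetical.

```python
from math import log10
from statistics import NormalDist, median


def qq_points(pvalues):
    """Expected and observed -log10(p), both sorted in descending order.

    Under the null, the i-th smallest of n p-values is expected
    near (i - 0.5) / n.
    """
    n = len(pvalues)
    observed = sorted((-log10(p) for p in pvalues), reverse=True)
    expected = [-log10((i - 0.5) / n) for i in range(1, n + 1)]
    return expected, observed


def genomic_inflation(pvalues):
    """lambda_GC: median 1-d.f. chi-square statistic over its null median.

    Each two-sided p-value maps to chi2 = z**2 with z = Phi^-1(p / 2);
    using p / 2 (not 1 - p / 2) avoids rounding to 1.0 for tiny p.
    The null median of a 1-d.f. chi-square is Phi^-1(0.75)**2 (~0.455).
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / nd.inv_cdf(0.75) ** 2
```

A λ well above 1 usually signals inflation from uncorrected structure or relatedness, whereas a λ near 1 means the observed curve tracks the diagonal except in its tail.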
Figure 5. Box plots of observed versus predicted days to flowering from genomic prediction with the RRBLUP method, using each year's data. The middle line in each box is the median. Model accuracy (MA) is reported for an 80:20 split into training and testing sets.
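Genomic prediction with RRBLUP shrinks all marker effects toward zero by the same amount and predicts lines from their genotypes. A minimal sketch of the 80:20 workflow as ridge regression, under two labeled assumptions: the shrinkage ratio `lam` (σ²e/σ²β) is fixed by the user instead of being estimated by REML as the rrBLUP R package does, and accuracy is measured as the Pearson correlation between observed and predicted values in the test set, a common definition that is not confirmed for MA here. `rrblup_accuracy` is a hypothetical name.

```python
import numpy as np


def rrblup_accuracy(G, y, lam=1.0, train_frac=0.8, seed=0):
    """Ridge-regression BLUP on a random train/test split.

    G: (n, m) marker matrix coded 0/1/2; y: (n,) phenotypes.
    Returns Pearson r between observed and predicted test phenotypes.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    # Center markers and phenotypes with training-set means only (no leakage).
    g_mean = G[tr].mean(axis=0)
    Ztr, Zte = G[tr] - g_mean, G[te] - g_mean
    y_mean = y[tr].mean()
    # Ridge solution in dual form, beta = Z'(ZZ' + lam*I)^-1 (y - mean),
    # which needs only an n_train x n_train solve (cheap when markers >> lines).
    K = Ztr @ Ztr.T
    alpha = np.linalg.solve(K + lam * np.eye(n_train), y[tr] - y_mean)
    beta = Ztr.T @ alpha                     # estimated marker effects
    y_hat = y_mean + Zte @ beta              # predictions for held-out lines
    return float(np.corrcoef(y[te], y_hat)[0, 1])
```

The dual form gives the same marker effects as solving (Z'Z + λI)β = Z'(y − ȳ) directly, but it scales with the number of lines rather than the number of SNPs.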
Figure 6. Expression patterns of genes in the vicinity of associated SNPs that might play an important role in flowering, in vegetative leaves (VL), reproductive leaves (RL), shoot apical meristem (SAM) and reproductive buds (Bud).