Literature DB >> 35899191

Genome-wide Association Study for Yield and Yield-Related Traits in Diverse Blackgram Panel (Vigna mungo L. Hepper) Reveals Novel Putative Alleles for Future Breeding Programs.

Lovejit Singh1, Guriqbal Singh Dhillon2, Sarabjit Kaur2, Sandeep Kaur Dhaliwal1, Amandeep Kaur2, Palvi Malik2, Ashok Kumar3, Ranjit Kaur Gill1, Satinder Kaur2.   

Abstract

Blackgram (Vigna mungo L. Hepper) is an important tropical and sub-tropical short-duration legume that is rich in dietary protein and micronutrients. Producing high-yielding blackgram varieties is hampered by insufficient genetic variability, absence of suitable ideotypes, low harvest index and susceptibility to biotic-abiotic stresses. Seed yield, a complex trait resulting from the expression and interaction of multiple genes, necessitates the evaluation of diverse germplasm for the identification of novel yield contributing traits. Henceforth, a panel of 100 blackgram genotypes was evaluated at two locations (Ludhiana and Gurdaspur) across two seasons (Spring 2019 and Spring 2020) for 14 different yield related traits. A wide range of variability, high broad-sense heritability and a high correlation of grain yield were observed for 12 out of 14 traits studied among all environments. Investigation of population structure in the panel using a set of 4,623 filtered SNPs led to identification of four sub-populations based on ad-hoc delta K and Cross entropy value. Using Farm CPU model and Mixed Linear Model algorithms, a total of 49 significant SNP associations representing 42 QTLs were identified. Allelic effects were found to be statistically significant at 37 out of 42 QTLs and 50 known candidate genes were identified in 24 of QTLs.
Copyright © 2022 Singh, Dhillon, Kaur, Dhaliwal, Kaur, Malik, Kumar, Gill and Kaur.

Entities:  

Keywords:  GWAS; MTA; QTL; Vigna mungo; allelic effects; heritability; yield

Year:  2022        PMID: 35899191      PMCID: PMC9310006          DOI: 10.3389/fgene.2022.849016

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.772


Introduction

Blackgram (Vigna mungo L. Hepper), a diploid (2n = 2X = 22), short duration legume crop of family Leguminaceae, was domesticated in Northern South Asia from progenitor Vigna mungo var. silvestris (Lukoki et al., 1980). It is cultivated throughout Southeast Asia because of its multiple benefits to soil and human health. It is nutritionally important crop with about 25% protein—nearly three times that of cereals, 60% carbohydrates, 1.3% fat (Das et al., 2021) as well as important vitamins and minerals (Ghafoor et al., 2001), making it a balanced vegan diet when supplemented with cereals. The ability of its roots to fix atmospheric nitrogen (42 kg/ha/year) (Dey et al., 2020) contribute towards soil health while deep-roots prevents soil erosion by binding soil particles. Short duration of blackgram makes it suitable for intercropping with corn or millet or rotation with cereals like rice or wheat (Muthusamy and Pandiyan, 2018), adding another benefit for farmer. India is the largest producer and consumer of blackgram as it is colossally grown in almost every agro-climatic zone (Raizada and Souframanien, 2021). However, the crop accounts for only 13% of the total area (56.02 lakh hectares) and 10% of total pulses production (30.60 lakh tons) in India (Muthusamy and Pandiyan, 2018) with productivity of 5.46 quintals per hectare (Singh et al., 2020). Moreover, around 2-3 million tons of pulses are imported annually to fulfill the domestic consumption requirement. Despite the economic and nutritional value of black Gram, the sluggish growth in production is due to lack of commercialized market setup, multiple biotic stresses (mosaic, seedling blight, leaf blight, leaf crinkle virus, leaf folder, Bihar hairy caterpillar, whitefly) and abiotic stresses (drought, salinity, waterlogging). Photo- and thermo-sensitivity of crop with indeterminate habit of flowering and fruiting leads to competition of assimilates between vegetative and reproductive sinks throughout the growth period causing low harvest index and poor grain yield (Somta et al., 2019; Sahu et al., 2020). The expansion of the crop is constrained by lack of genetic and genomic resources along with limited diversity (Chaitieng et al., 2006; Gupta et al., 2008; Somta et al., 2019 A systematic program of identification, genetic and genomic characterization and utilization of diverse germplasm is required for successfully decoding the genetic architecture of agronomically important traits for blackgram improvement. Genome and transcriptome sequencing (Pootakham et al., 2021; Raizada and Souframanien, 2021), developing dense molecular linkage maps, and using high-throughput genotyping techniques can widen the horizons improvement of this crop. Genotyping–by- sequencing (GBS) is one of the cost-efficient genomic techniques which includes multiplex sequencing of a subset of the genome and generates numerous SNP markers for linkage mapping (Varshney et al., 2009; Elshire et al., 2011; Noble et al., 2018). Genome wide association studies (GWAS) coupled with GBS have been promising tool for estimating the genetic diversity in different crops of soybean (Hwang et al., 2014), chickpea (Plekhanova et al., 2017), cowpea (Xu et al., 2017), pigeonpea (Varshney et al., 2009), and mungbean (Sokolkova et al., 2020) providing insights to underlying genetic architecture of complex traits (Lorenz et al., 2010; Scherer and Christensen, 2016). In the present study we performed the GWAS on diverse blackgram germplasm panel to assess its genetic diversity and population structure, and to identify MTA (Marker trait associations) involved in yield and yield related traits using GBS. This study provides a unique genomic resource for the genetic dissection of important traits aimed at blackgram improvement.

Materials and Methods

Plant Material and Field Trials

A panel consisting of 100 blackgram (V. mungo) genotypes was used for the present study. These included 54 genotypes procured from National Bureau of Plant Genetic Resources (NBPGR), New Delhi, India, while, the remaining genotypes were from germplasm collection of Punjab Agricultural University (PAU), Ludhiana, India (Supplementary Table S1). The genotypes were evaluated using a simple lattice design (10 × 10), in two replications at two locations (Ludhiana and Gurdaspur) across two seasons (Spring 2019 and Spring 2020). The seeds were sown in a 2 m long row with 10 cm plant to plant spacing and 30 cm row to row spacing. Ludhiana (30.9°N, 75.85°E) lies in a sub-tropical zone characterized by relatively high temperatures and low precipitation while Gurdaspur region (32.02°N, 75.24°E) is characterized by lower temperature and high humidity coupled with abundant rainfall. The weekly mean temperature, relative humidity and rainfall for Ludhiana and Gurdaspur have been given in Supplementary Figure S1.

Phenotypic Evaluation and Statistical Analysis

Data was collected in three replicates from five randomly selected plants of each genotype in each replicate for plant height at 90% pod maturity (PHM), branches per plant (BpP), nodes per plant (NpP), internodal length (IL = PHM/NpP), clusters per plant (CpP), pods per plant (PpP), pod length (PL), seeds per pod (SpP), biological yield per plant (BYpP), yield per plant (YpP), harvest index (HI) and hundred seed weight (HSW). Days to 50% flowering (DtF) and days to 90% pod maturity (DtM) were recorded based on the entire plot. For phenotypic analysis, environments were designated as E1 (Ludhiana, year 2019), E2 (Ludhiana, year 2020), E3 (Ludhiana combined for years 2019 & 2020), E4 (Gurdaspur, year 2019), E5 (Gurdaspur, year 2020) and E6 (Gurdaspur combined for years 2019 & 2020). Due to the significant differences between two selected locations, combined analysis over two selected locations has not been done. Descriptive statistical analysis across all the environments was done using Meta-R v6.0 software (Alvarado et al., 2020). Statistical analysis for individual and multi-environment was performed using “lme4” (Bates et al., 2015) and “Residual Maximum Likelihood (REML)” (Laird and Ware, 1982) methods. The linear model for analyzing individual environments for simple lattice design was done using the formula: where Yijk and Yijkl represent the trait of interest, μ is the overall mean effect, Repi is the effect of ith replicate, Blockj (Repi) is the effect of jth incomplete block within the ith replicate, Genk is the effect of the kth genotype and εijk is the error effect associated with the ith replication, jth incomplete block and kth genotype, assumed to be normally distributed with zero mean and variance σ2ε (Alvarado et al., 2020). Yearl and Genl x Yeari are the effects of the lth year and Genotype x Year (G x Y) interactions represented by effect on the ith genotype in the lth year in the linear model for integrated analysis for multi-environment (across the years). The resulting analysis produced the adjusted trait phenotypic values as BLUPs (Best linear unbiased predictions) within and across environments. The genotypes are considered random effects in the BLUPs model, minimizing/eliminating the effect of the environment from phenotypic effects. The broad-sense heritability of trait at individual environment and across environments was calculated as Where and are the genotype, and error variance components, respectively, is genotype by environment interaction variance, n env is the number of environments, and n reps is the number of replicates. The estimated broad-sense heritability provides valuable insight into the breeding program’s quality, with all effects considered as random effects (Alvarado et al., 2020). The LSD at 5% level of significance was calculated as where t is the cumulative Student’s t-distribution, 0.05 is the selected α level (5%), dfErr is the degrees of freedom for error in the linear mixed model, and ASED is the average standard error of the differences of the means. The coefficient of variation (%) was calculated as: where MSE is the mean squared error, and Grand mean is the mean of the trait. BLUPs for recorded traits in all environments were plotted using the ggplot2 v3.3.2 package (Wickham, 2016) in R v4.0.3 (Core R Team 2019).

DNA Isolation and Genotyping

Total genomic DNA was isolated from fresh leaves of single plant of each genotype using modified cetyl trimethyl ammonium bromide (CTAB) method (Saghai-Maroof et al., 1984). Genotyping-by-sequencing (GBS) of DNA samples was outsourced to Novogene Co. Ltd., China. The GBS library was prepared using double digest restriction enzyme DNA (ddRAD) and sequencing was done with Illumina HiSeq 2000. The raw FASTQ files (obtained from Illumina pipeline CASAVA v1.8.2) were processed for quality control and filtered using Trimmomaticv0.39 software (Bolger et al., 2014) Reads with no matching barcode or cut sites overhangs, having more than 10% unidentified bases (N), with adapter dimers, with lower quality reads, and with Qphred score less than 34, were excluded from further analysis. High-quality paired end reads were aligned using Burrows-Wheeler Aligner (BWA) (Li et al., 2009) to Vigna radiata reference genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/741/045/GCF_000741045.1_Vradiata ver6). A quality threshold score of 10 was applied to validate SNP loci (Wu et al., 2020). Sorted binary alignment map (BAM) files were converted to variant caller format (vcf) files using mpileup function of bcftools v1.10.2 in samtools v1.10 software package (Li et al., 2009) with minimum read depth ≥4. Haplotype map (hapmap) format files were generated from vcf files using Tassel v5.0 software (Bradbury et al., 2007). After SNP calling, raw hapmap file was filtered by removing indels, minor allele frequency (maf) > 0.05, genotype missing data less than 10% and heterozygosity less than 30% (Torkamaneh et al., 2020). The generated raw reads were submitted to the NCBI sequence read archives (SRA) with accession number PRJNA802066.

Population Structure and Phylogeny Analysis

Bayesian-based approach in STRUCTURE v2.3.4 software (Pritchard et al., 2000) using a burn-in period of 10,000 and Markov chain Monte Carlo iterations of 100,000 for k ranging from 1 to 8 was done to investigate the population structure of germplasm. Evanno’s method (Evanno et al., 2005) and cross-entropy method (Chan and Kroese, 2012) were used to obtain an optimum number of sub-populations. Filtered SNPs were used to calculate genetic distance among genotypes and the phylogenetic tree was constructed using neighbour-joining tree in TASSEL v5.0 (Bradbury et al., 2007) and visualized in iTOL v5 (Letunic and Bork, 2016).

Genome-Wide Association Analysis

Filtered SNPs and BLUPs were used to perform association analysis using the Mixed Linear Model (MLM) (Zhang et al., 2010) and FarmCPU algorithms (Liu X. et al., 2016) with GAPIT v3 (Lipka et al., 2012) in R v4.0.3 (Core R Team 2019). The FarmCPU method was used to control false positives and false negatives by iteratively using a fixed-effect model (FEM) and random effect model (REM), testing marker associations from FEM as covariates in REM. p-value of 0.001 or -log10 p-value of 3.00 was used as a threshold to determine the significance of association (Ikram et al., 2020). The marker-trait associations (MTAs) identified for the same trait within a region of 100bp was considered as part of one QTL. The phenotypic variation explained (PVE) by each significant SNP was calculated as the squared correlation between the phenotype and genotype of the associated SNP (Bhandari et al., 2020). MTAs were considered stable QTLs if they were identified across all the environments of the respective locations with -log10 p-value ≥ 3 and PVEW ≥10%. t-test based determination of significance based on phenotypic data in two allelic groups was estimated at p-value ≤ 0.05 (Xiong et al., 2019).

Postulation of Candidate Genes and KEGG Pathway Analysis

Candidate genes were postulated using the functional gene annotations of Vigna radiata reference genome (www.ncbi.nlm.nih.gov/assembly/GCF_000741045.1). A window of 200-kb region, upstream and downstream of the associated SNPs was searched to identify already reported candidate genes related to different traits studied (Park et al., 2019). Selected candidate genes were subjected to Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis using Omics box 2.0.36 combined pathway analysis plugin.

Results

Phenotypic Evaluation

All the recorded traits showed high variability across the six given environments (Figure 1). All the traits followed normal distribution if considered in environments of each location however if compared across the two locations, the distribution of traits showed skewness. All the genotypes performed better in Gurdaspur than in Ludhiana. The negative skewness of trait data for the Gurdaspur location indicated an overall better performance of genotypes there, as compared to Ludhiana (Figure 1; Table 1). Since BLUPs accounted for the variation across the years for individual locations, the phenotypic evaluation is explained only for E3 and E6 to explain the overall variation at a particular location, for ease of understanding the variations (Table 1; Supplementary Table S2). DtM, and PL at Gurdaspur, and HI at Ludhiana were negatively skewed while all the other traits were either positively skewed or normally distributed (Figure 1). Genotypes sown in Gurdaspur showed delayed flowering (E6: 40.65–49.27 days) as compared to Ludhiana (E3: 39.42–47.08 days). The variation in DtM at Gurdaspur (E6: 68.88–90.73 days) were higher than at Ludhiana (E3:63.62–73.33 ± 2.3 days) with delayed maturation at Gurdaspur. The observed range of CpP and PpP was 5.59–17.08, and 12.07–40.43, respectively, at Ludhiana (E3) whereas 7.62–23.35, and 17–50.28, respectively, at Gurdaspur (E6), indicating better plant phenology at Gurdaspur. The range of PL was slightly more in Gurdaspur (E6: 03.85–05.16 cm) than Ludhiana (E3: 03.66–04.46 cm), eventually leading to higher SpP in Gurdaspur (E6: 05.63–07.68) than Ludhiana (E3: 05.45–07.08). Larger variability along with higher values were observed for BYpP at Gurdaspur (E6: 14.94–45.85 g) relative to Ludhiana (E3: 13.57–30.83 g). At Ludhiana (E3), YpP ranged from 2.45 to 8.44 g whereas it varied from 2.83 to 9.83 g at Gurdaspur (E6). HSW ranged from 3.71 to 5.26 g at Ludhiana (E3) and 4.12–5.33 g at Gurdaspur (E6). The five best performing genotypes representing best of all the traits at Ludhiana and Gurdaspur are presented in Supplementary Table S3.
FIGURE 1

Distribution of 14 characters measured for 100 blackgram germplasm lines across all environments- Ludhiana 2019 (E1); Ludhiana 2020 (E2); Ludhiana combined across years (E3); Gurdaspur 2019 (E4); Gurdaspur 2020 (E5) and Gurdaspur combined across years (E6).

TABLE 1

Phenotypic evaluation of 100 V. mungo genotypes for 14 traits recorded at two locations of Ludhiana and Gurdaspur as BLUPs of 2 years.

TraitEnvMean ± SDRangeGeno.SignLSDCVH2
DtFE341.81 ± 1.339.42–47.082E-2801.9703.610.85
E644.57 ± 1.740.65–49.271E-2601.8102.640.91
DtME368.22 ± 2.363.62–73.332E-3902.5602.780.92
E682.29 ± 4.768.88–90.731E-3503.7802.710.94
PHME320.93 ± 5.713.70–59.373E-8202.8509.770.99
E629.90 ± 10.317.07–94.341E-7305.6612.600.98
BpPE302.98 ± 0.901.33–05.052E-5405.7607.280.95
E603.49 ± 0.701.82–05.862E-6304.4404.450.97
NpPE309.84 ± 1.807.03–16.413E-4101.5912.000.92
E612.74 ± 2.307.98–20.738E-4302.1411.570.93
ILE302.18 ± 0.701.14–06.698E-7100.3712.190.98
E602.35 ± 0.501.21–04.784E-4700.4513.370.94
CpPE309.19 ± 1.705.59–17.082E-4701.5312.280.93
E613.23 ± 3.207.62–23.356E-4602.4813.850.93
PpPE323.89 ± 5.412.07–40.437E-4604.8014.450.93
E627.43 ± 6.017.00–50.287E-4604.9313.320.92
PLE304.19 ± 0.103.66–04.469E-1600.4805.020.81
E604.53 ± 0.303.85–05.164E-5500.3204.940.95
SpPE306.30 ± 0.405.45–07.083E-1600.6806.650.82
E606.68 ± 0.505.63–07.686E-4200.5706.220.92
BYpPE321.08 ± 4.013.57–30.834E-5203.5411.480.96
E625.82 ± 5.314.94–45.852E-6403.6009.800.97
YpPE304.45 ± 1.302.45–08.449E-6200.7211.450.96
E605.28 ± 1.402.83–09.835E-5700.8211.350.96
HSWE304.53 ± 0.303.71–05.265E-4800.2904.580.94
E604.56 ± 0.304.12–05.334E-3800.3004.610.92
HIE321.53 ± 4.609.49–30.552E-5202.9909.820.95
E621.08 ± 4.311.92–29.624E-4803.3211.440.94

SD -standard deviation, LSD- least significant difference, CV- coefficient of variation, H2—Broad sense heritability, Geno. Sign.-Genotype Significance.

Ludhiana BLUPs, 2 years (E3); and Gurdaspur BLUPs, 2 years (E6).

Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Branches per plant (BpP), Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Pod length (PL), Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI).

Distribution of 14 characters measured for 100 blackgram germplasm lines across all environments- Ludhiana 2019 (E1); Ludhiana 2020 (E2); Ludhiana combined across years (E3); Gurdaspur 2019 (E4); Gurdaspur 2020 (E5) and Gurdaspur combined across years (E6). Phenotypic evaluation of 100 V. mungo genotypes for 14 traits recorded at two locations of Ludhiana and Gurdaspur as BLUPs of 2 years. SD -standard deviation, LSD- least significant difference, CV- coefficient of variation, H2—Broad sense heritability, Geno. Sign.-Genotype Significance. Ludhiana BLUPs, 2 years (E3); and Gurdaspur BLUPs, 2 years (E6). Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Branches per plant (BpP), Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Pod length (PL), Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI). Higher broad sense heritability (H2) for all traits under all environments suggested strong genetic control (Table 1). Highest H2 was observed for PHM (0.99 and 0.98), whereas lowest for DtF (0.61 and 0.91) both at Ludhiana and Gurdaspur. CpP (E3—0.93; E6—0.93), PpP (E3—0.93; E6—0.92), PL (E3—0.81; E6—0.95), SpP (E3—0.82; E6—0.92), BYpP (E3—0.96; E6—0.97), YpP (E3—0.96; E6—0.96), HSW (E3—0.94; E6—0.92) and HI (E3—0.95; E4 - 0.94) also recorded high H2.

SNP Calling

A total of 35, 49,948 raw physically mapped SNPs were obtained through GBS using the genome sequence of Vigna radiata as reference (Kang et al., 2014). Of these SNPs, 26, 39,464 were mapped onto 11 chromosomes while 9, 10,484 were mapped to non - chromosomal contigs. After filtering only 6,967 SNPs were retained of which 2,344 SNPs mapped to non-chromosomal contigs were removed and 4,623 on chromosomal regions used for further analysis (Supplementary Table S4, Figure 2). The highest density of SNPs was observed in Chr 4 (18.60 markers per Mb), whereas the lowest density was observed in Chr 3 (4.86) with an average density of 15.23 markers per Mb.
FIGURE 2

Physical map of 4,623 SNPs identified by GBS of 100 blackgram germplasm lines showing all the 11 chromosomes. Physical position is also shown in million of base pairs (Mb) based. SNP density is also provided in colours Dark Green (1) to Red (127) to reveal the distribution among chromosomes.

Physical map of 4,623 SNPs identified by GBS of 100 blackgram germplasm lines showing all the 11 chromosomes. Physical position is also shown in million of base pairs (Mb) based. SNP density is also provided in colours Dark Green (1) to Red (127) to reveal the distribution among chromosomes.

Population Structure and Phylogenetic Analysis

Four sub-populations (K = 4) were depicted by both the methods of second-order rate of change of the likelihood (Figure 3A) as well as cross-entropy value (Figure 3B). The graphical representation of four sub-populations against the admixture coefficient showed that more than 50% of the genotypes contributed to one sub population (Figure 3C). The allele frequency divergence was highest between sub-population two and four and the lowest between 3 and 4 (Supplementary Table S5). Average distances or expected heterozygosity of individuals within the same cluster were 0.3780 (sub-population 1), 0.2719 (sub-population 2), 0.0551 (sub-population 3) and 0.0250 (sub-population 4) (Supplementary Table S5). The mean Fst value of sub-populations 1, 2, 3 and four were 0.0203, 0.4537, 0.8040 and 0.9098, respectively (Supplementary Table S5). Multivariate analysis also classified the germplasm into four clusters affirming the results of structure analysis. The germplasm’s phylogenetic structure using an unweighted neighbour-joining tree showed the distribution of the different genotypes among the sub-populations (Figure 3D).
FIGURE 3

Population structure of 100 blackgram germplasm lines using 4623 SNP markers (A) determination of number of sub-populations by DeltaK method by Evanno et al., 2005 (B) determination of number of sub-populations using cross entropy value method by Chan and Kroese, 2012 (C) population structure at k = 4 (D) phylogenetic analysis of 100 blackgram germplasm lines depicting four sub-populations.

Population structure of 100 blackgram germplasm lines using 4623 SNP markers (A) determination of number of sub-populations by DeltaK method by Evanno et al., 2005 (B) determination of number of sub-populations using cross entropy value method by Chan and Kroese, 2012 (C) population structure at k = 4 (D) phylogenetic analysis of 100 blackgram germplasm lines depicting four sub-populations.

Genome Wide Association Study

A total of 49 stable MTAs contributing to 42 QTLs were found to be significantly associated (-log10 p-value ≥ 3, PVE >10%) with 12 of the 14 traits studied, across the three environments of the each locations (Figure 4; Table 2; Supplementary Table S6). GWAS conducted using FarmCPU and MLM algorithms identified 31 and 27 QTLs, respectively of which 16 QTLs were common between two methods. However, only two MTAs Q. PHM.4 & Q. PHM.8 were significantly associated with PHM across both the locations. The negative log10 p-value of QTLs ranged from 3.009 to 5.112, whereas the PVE ranged from 10.06–24.26%. Among 42 QTLs identified in the study, four QTLs were associated with YpP, eight with PHM, one with BpP, two with NpP, four with IL, five with CpP, one with PL, four with SpP two with BYpP, seven with HI and three with HSW. However, no significant associations were obtained for DtM and PpP. SNPs S8.1.13991269 and S8.1.19533014, on chromosome 8, were found to be associated with two traits each i.e., PHM, IL (Q. PHM.8 and Q. IL.8) and YpP, HI (Q.YpP.8 and Q. HI.8.2), respectively (Figure 5).
FIGURE 4

Marker trait associations for different traits detected by genome wide association study across two different locations Ludhiana (A,B) and Gurdaspur (C,D) as manhattan plots and QQ-plots using 4623 SNP markers. #Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI).

TABLE 2

MTAs identified through Genome wide association study for 100 Vigna mungo germplasm lines in all the enviornments of two locations Ludhiana (L) or Gurdaspur (G) using FarmCPU and MLM algorithms.

MTATraitLocMethodSNPChrPosMb-log10PMAFEffectPVE
Q.DtF.10 DtFLFarmCPUS10.1.9527186109.52724.92080.071.6817.947
DtFLMLMS10.1.9527186109.52724.14570.071.548116.803
Q.PHM.3.1 PHMGFarmCPUS3.1.799314737.99313.65660.115-6.332116.506
Q.PHM.3.2 PHMLFarmCPUS3.1.821959438.21963.23370.15-3.066719.919
Q.PHM.4 PHMLFarmCPUS4.1.125980041.25983.89120.0657.00917.084
PHMGFarmCPUS4.1.125980041.25983.07610.06510.829916.46
PHMGMLMS4.1.125980041.25983.12260.06511.649610.076
Q.PHM.6.1 PHMGFarmCPUS6.1.23358875623.35893.47450.1058.557510.469
Q.PHM.6.2 PHMLFarmCPUS6.1.25029431625.02943.46640.205-3.338214.698
Q.PHM.8 PHMLFarmCPUS8.1.13991269813.99134.20620.0557.040217.697
PHMGFarmCPUS8.1.13991269813.99134.15990.05512.265211.234
PHMLMLMS8.1.13991269813.99133.39450.0556.21610.728
PHMGMLMS8.1.13991269813.99133.83560.05511.930513.02
Q.PHM.11.1 PHMGFarmCPUS11.1.163137481116.31373.13530.195.113815.263
Q.PHM.11.2 PHMLFarmCPUS11.1.168981331116.89813.0550.174.475415.99
PHMLFarmCPUS11.1.168981691116.89823.02040.1654.433516.003
PHMLFarmCPUS11.1.168981701116.89823.02040.1654.433516.003
PHMLFarmCPUS11.1.168982251116.89823.44430.1554.672915.565
Q.BpP.6 BpPGMLMS6.1.22219189622.21923.76150.19-0.600314.341
Q.NpP.4 NpPGFarmCPUS4.1.134301941.3433.63030.0652.270515.003
NpPGMLMS4.1.134301941.3433.48220.0652.285110.85
Q.NpP.6 NpPLFarmCPUS6.1.32970231632.97023.58320.0951.755918.857
NpPLFarmCPUS6.1.32970252632.97033.82880.0552.26719.048
NpPLFarmCPUS6.1.32970258632.97034.42950.0852.039622.292
NpPLMLMS6.1.32970258632.97033.43660.0851.855512.377
Q.IL.1.1 ILGFarmCPUS1.1.263631312.63634.40780.1150.487114.843
ILGMLMS1.1.263631312.63633.79440.1150.487114.693
Q.IL.1.2 ILLFarmCPUS1.1.35117546135.11753.3560.060.703514.089
Q.IL.5 ILLMLMS5.1.15033908515.03393.47910.080.814613.589
Q.IL.8 ILLFarmCPUS8.1.13991269813.99134.36150.0550.837214.825
ILLMLMS8.1.13991269813.99133.66020.0550.825414.476
Q.CpP.1 CpPGFarmCPUS1.1.25545554125.54564.17850.1-2.150724.26
Q.CpP.4 CpPGMLMS4.1.77487240.77493.57680.07-3.113110.448
Q.CpP.7 CpPLFarmCPUS7.1.36781633736.78164.45720.061.996215.922
CpPLMLMS7.1.36781633736.78163.85410.061.996214.712
Q.CpP.9 CpPLFarmCPUS9.1.982325099.82333.27170.0651.657114.218
Q.CpP.10 CpPLFarmCPUS10.1.1346198101.34623.57920.1850.747712.487
CpPLMLMS10.1.1346198101.34623.19820.1850.747711.66
CpPLFarmCPUS10.1.1346205101.34624.23810.061.931416.817
CpPLMLMS10.1.1346205101.34623.69480.061.931413.959
CpPLFarmCPUS10.1.1346249101.34623.30040.0951.405512.019
Q.PL.1 PLLMLMS1.1.30381382130.38143.25220.075-0.145811.747
Q.SpP.3 SpPLFarmCPUS3.1.876793138.76794.12960.090.439715.931
SpPLMLMS3.1.876793138.76793.76250.090.458915.654
Q.SpP.6 SpPGMLMS6.1.29235889629.23593.5010.13-0.423414.263
Q.SpP.8.1 SpPLMLMS8.1.13990816813.99083.08080.165-0.341312.201
Q.SpP.8.2 SpPGMLMS8.1.16767849816.76783.12830.1450.378212.395
Q.BYpP.8 BYpPGFarmCPUS8.1.33339056833.33913.48480.153.794513.775
Q.BYpP.9 BYpPLFarmCPUS9.1.533525195.33533.31480.175-2.079513.007
Q.YpP.4.1 YpPLMLMS4.1.961570549.61573.23640.071.393812.669
Q.YpP.4.2 YpPGFarmCPUS4.1.11912711411.91273.60250.0951.194512.541
YpPGMLMS4.1.11912711411.91273.45160.0951.21411.667
Q.YpP.5 YpPGFarmCPUS5.1.35989113535.98913.70020.08-1.260511.018
YpPGMLMS5.1.35989113535.98913.30010.08-1.236511.032
Q.YpP.8 YpPLFarmCPUS8.1.19533014819.5333.75960.17-1.029810.751
YpPLMLMS8.1.19533014819.5333.34170.17-1.025713.184
Q.HI.1 HILFarmCPUS1.1.854978818.54984.23880.095-4.783811.425
HILMLMS1.1.854978818.54983.80810.095-4.807915.933
Q.HI.5 HIGFarmCPUS5.1.29623709529.62373.50940.145-3.466611.234
Q.HI.6 HIGMLMS6.1.16902154616.90223.56940.084.565614.564
Q.HI.7 HIGFarmCPUS7.1.49416375749.41643.62770.16-3.440118.045
Q.HI.8.1 HIGMLMS8.1.418121584.18123.37580.0954.519113.586
Q.HI.8.2 HILFarmCPUS8.1.19533014819.5333.70940.17-3.588812.072
HILMLMS8.1.19533014819.5333.26180.17-3.545513.138
Q.HI.10 HIGMLMS10.1.2365261102.36533.14610.085-4.260212.443
Q.HSW.6 HSWLFarmCPUS6.1.870798568.7083.43740.065-0.347210.822
Q.HSW.7 HSWGFarmCPUS7.1.20599131720.59914.37880.080.265412.201
HSWGMLMS7.1.20599131720.59913.58990.080.245512.339
Q.HSW.10 HSWLFarmCPUS10.1.209525501020.95263.12350.1350.253611.731

Marker trait associations (MTAs, Environment (Env), Chromosome (Chr), Position in million basepairs (PosMb), minor allele frequency (MAF), Phenotypic variation explained in percentage (PVE).

Days to 50% flowering (DtF); Plant height at 90% pod maturity (PHM); Branches per plant (BpP), Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pod length (PL), Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); Harvest index (HI) and Hundred seed weight (HSW).

FIGURE 5

Physical map of QTLs of different traits detected by GWAS in the present study. Circles represent detection of across different environments at Ludhiana and diamonds represent detection of across different environments at Gurdaspur ##Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI).

Marker trait associations for different traits detected by genome wide association study across two different locations Ludhiana (A,B) and Gurdaspur (C,D) as manhattan plots and QQ-plots using 4623 SNP markers. #Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI). MTAs identified through Genome wide association study for 100 Vigna mungo germplasm lines in all the enviornments of two locations Ludhiana (L) or Gurdaspur (G) using FarmCPU and MLM algorithms. Marker trait associations (MTAs, Environment (Env), Chromosome (Chr), Position in million basepairs (PosMb), minor allele frequency (MAF), Phenotypic variation explained in percentage (PVE). Days to 50% flowering (DtF); Plant height at 90% pod maturity (PHM); Branches per plant (BpP), Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pod length (PL), Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); Harvest index (HI) and Hundred seed weight (HSW). Physical map of QTLs of different traits detected by GWAS in the present study. Circles represent detection of across different environments at Ludhiana and diamonds represent detection of across different environments at Gurdaspur ##Days to 50% flowering (DtF); Days to 90% pod maturity (DtM); Plant height at 90% pod maturity (PHM); Nodes per plant (NpP); Internodal length (IL), Clusters per plant (CpP); Pods per plant (PpP); Seeds per pod (SpP); Biological yield per plant (BYpP); Yield per plant (YpP); hundred seed weight (HSW) and Harvest index (HI). A few genomic regions harbored multiple significantly associated SNPs, as Q. PHM.11.2 had four SNPs (-log10 p-value 3.0114–3.5826 and PVE 15.193–16.439%), Q. NpP.6 (-log10 p-value 3.15–4.77, PVE 15.69–23.89) and Q. CpP.10 both had three SNPs (-log10 p-value 3.11–4.66 and PVE 11.29–18.07%), whereas other QTLs were defined by a single SNP. Two QTLs for plant height at 90% pod maturity Q. PHM.4 and Q. PHM.8 showed consistent effect across all environments of both the locations. Among four QTLs identified for YpP, two (QTLs Q. YpP.4.1 and Q. YpP.8) were detected at Ludhiana, whereas Q. YpP.4.2 and Q. YpP.5 were detected at Gurdaspur. Three QTLs for HSW were detected, two for Ludhiana and one for Gurdaspur (Q.HSW.7) which was detected by both the FarmCPU (-log10 p-value 4.38 and PVE 12.2%) and MLM (-log10 p-value 3.59 and PVE 12.34%) algorithms. Out of seven QTLs for HI, two were detected for Ludhiana (Q. HI.1 and Q. HI.8.2) and five were detected for Gurdaspur (Q. HI.5, Q. HI.6, Q. HI.7, Q. HI.8.1 and Q. HI.10). The phenotypic variation explained (PVE) by significant SNPs under FarmCPU method ranged from 10.751% (Q.YpP.8) to 22.292% (Q.NpP.6) at Ludhiana and 10.47% (Q. PHM.6.1) to 24.26% (Q. CpP.1) at Gurdaspur, whereas under MLM method, it ranged from 10.73% (Q.SpP.8.1) to 16.80% (Q.PHM.8) at Ludhiana and 10.08% (Q.PHM.4) to 14.69% (Q.CpP.4) at Gurdaspur.

Allelic Effects

Out of 49 SNPs/MTAs, 44 SNPs representing 37 QTLs showed significant difference, whereas five SNPs representing five QTLs were found to be non-significant (using the t-test statistic) for the respective traits (Supplementary Table S7, Supplementary Figures S2A–I. A total of 32, 32, 32, 28, 28 and 28 SNPs were statistically and significantly different for respective traits under E1, E2, E3, E4, E5 and E6, respectively. A total of 15 SNPs associated with 12 QTLs namely, Q. IL.1.1, Q. IL.8, Q. NpP.6, Q. PHM.11.1, Q. PHM.11.2, Q. PHM.3.1, Q. PHM.3.2, Q. PHM.4, Q. PHM.6.1, Q. PHM.6.2, Q. PHM.8, and Q. SpP.8.2 were found to be significantly different in all of the six environments studied. The 28 SNP associations were significant in three of the six environments. The superior and inferior alleles along with allele count (%) and significantly different mean values observed at Ludhiana (E3) and Gurdaspur (E6) and are presented in Supplementary Table S8. The allele with SNP GG associated with Q. SpP.8.2 was found to be superior at Ludhiana, whereas alternative allele was found to be superior at Gurdaspur. Allelic effects of YpP, HI and HSW are presented in Figure 6. The significant phenotypic differences produced by superior and inferior alleles for YpP were 1.3 g at Ludhiana (E3) by Q. YpP.4.1; 1.38 g at Gurdaspur (E6) by Q. YpP.4.2; 1.26 g at Gurdaspur (E6) by Q. YpP.5 and 0.94 g at Ludhiana (E3) by Q. YpP.8; whereas significant phenotypic differences produced by superior and inferior alleles for HI were 3.91% at Ludhiana (E3) by Q. HI.1; 3.19% at Gurdaspur (E6) by Q. HI.5; 3.89% at Gurdaspur (E6) by Q. HI.6; 4.38% at Gurdaspur (E6) by Q. HI.7.1; 3.26% at Gurdaspur (E6) by Q. HI.8.1; 3.43% at Ludhiana (E3) by Q. HI.8.2 and 3.66% at Gurdaspur (E6) by Q. HI.10. Likewise, significant phenotypic differences produced by superior and inferior alleles for HSW were 0.32 g at Ludhiana (E3) by Q. HSW.6, 0.25 g at Gurdaspur (E6) by Q. HSW.7 and 0.24 g at Ludhiana (E3) by Q. HSW.10. A total of six genotypes (MASH218, IC274597, IC370938, IC557431, KUG673 and IC328783) were selected carrying superior alleles for all the traits under study and respective QTLs presented in Supplementary Table S9.
FIGURE 6

Allelic effect of QTLs associated with (A) YpP (Yield per Plant); (B) HI (Harvest Index) and (C) HSW (Hundred Seed Weight) at Ludhiana and Gurdaspur. p-value is provided from t-test for the respective environment.

Allelic effect of QTLs associated with (A) YpP (Yield per Plant); (B) HI (Harvest Index) and (C) HSW (Hundred Seed Weight) at Ludhiana and Gurdaspur. p-value is provided from t-test for the respective environment.

Postulation of Candidate Genes

A total of 50 genes for 24 QTLs were identified with different functions for different traits, whereas no known gene was found for the remaining 18 QTLs (Table 3). Maximum genes with known function linked to the trait were identified for PHM 15) followed by CpP (9), YpP (7), SpP (5), IL (5), HI (5), HSW (3), and DtF (1), whereas no genes with known function for BpP, NpP, PL and BYpP were identified.
TABLE 3

List of candidate genes with their previously reported biological pathway function obtained in the putative QTL regions.

QTLTraitGene IDDist (Kb)Function
Q.DtF.10 DtF LOC106774489 73.878PHD finger-like domain-containing protein 5B
Q.PHM.3.1 PHM LOC106757287 -100.237E3 ubiquitin-protein ligase MARCH1
LOC106757069 -65.973bZIP transcription factor 53
LOC106757136 -47.55protein trichome birefringence-like 6
LOC106756978 -43.026histone-lysine N-methyltransferase EZ2-like
LOC111241394 -33.113DELLA protein RGL1-like
LOC106757804 -2.328DEAD-box ATP-dependent RNA helicase 24
LOC106757666 50.731probable WRKY transcription factor 23
LOC106756983 98.775gibberellin 2-beta-dioxygenase 2
LOC106756984 118.512transcription factor JAMYB-like
Q.PHM.4 PHM LOC106759452 -178.201tropinone reductase homolog
LOC106758588 -69.349tropinone reductase homolog At5g06060
Q.PHM.6.2 PHM LOC106764341 -82.244steroid 5-alpha-reductase DET2
Q.PHM.11.1 PHM LOC106777611 -171.802squamosa promoter-binding-like protein 14
LOC106777237 -112.535cytochrome P450 71D11
LOC106777539 -82.108pentatricopeptide repeat-containing protein At3g48810
Q.IL.1.1 IL LOC106766854 -33.322pectate lyase-like
LOC106765724 -33.083pectate lyase
Q.IL.1.2 IL LOC106762425 -99.425cytokinin hydroxylase
Q.IL.5 IL LOC106760883 -189.1purine permease 1
LOC106762422 16.957ethylene-responsive transcription factor RAP2-4
Q.CpP.1 CpP LOC106768944 -108.532SKP1-interacting partner 15
LOC106760064 -32.774receptor-like protein 12
LOC106760083 68.491polygalacturonase-like
Q.CpP.7 CpP LOC106765735 -115.443protein POLLEN DEFECTIVE IN GUIDANCE 1
LOC106766388 42.507LRR receptor-like serine/threonine-protein kinase RPK2
Q.CpP.9 CpP LOC111242573 -79.009eukaryotic translation initiation factor 3 subunit H-like
LOC106773784 166.402MLO-like protein 1
Q.CpP.10 CpP LOC106775061 131.587DDB1- and CUL4-associated factor 13
Q.SpP.3 SpP LOC106757271 71.381galactinol synthase 2
LOC106757661 95.044Golgi apparatus membrane protein-like protein ECHIDNA
LOC106756994 100.043alkaline/neutral invertase A, mitochondrial
Q.SpP.6 SpP LOC106765120 64.962dihydrofolate synthetase
Q.SpP.8.2 SpP LOC106770299 147.684ethylene-responsive transcription factor 1B-like
Q.YpP.4.1 YpP LOC106759105 7.311myb-related protein 305-like
Q.YpP.5 YpP LOC106761836 -177.436CLAVATA3/ESR (CLE)-related protein 5-like
LOC106760678 -107.67transcription factor PCF5
LOC106762074 -50.026sodium/calcium exchanger NCL
LOC106759995 141.103basic leucine zipper 34 isoform X1
Q.YpP.8 YpP LOC111242272 116.438alpha-mannosidase-like
LOC106771274 129.392putative 12-oxophytodienoate reductase 11
Q.HI.1 HI LOC106758323 -174.544UV-B-induced protein, chloroplastic isoform X1
Q.HI.5 HI LOC106760579 -189.197cytochrome P450 CYP72A219-like
Q.HI.7 HI LOC106769438 105.344protein root UVB sensitive 6
Q.HI.8.2 HI LOC111242272 116.438alpha-mannosidase-like
LOC106771274 129.392putative 12-oxophytodienoate reductase 11
Q.HSW.6 HSW LOC106764301 -15.52putative pentatricopeptide repeat-containing protein At1g12700, mitochondrial isoform X1
LOC106765194 12.233peroxidase 4
Q.HSW.10 HSW LOC106776199 -175.528bromodomain-containing protein 4B

Distance of 5′ position of the gene from SNP, identified associated with the QTL, where–sign shows that the gene was located upstream of the SNP, and +sign shows that the gene was located downstream of the SNP.

Days to 50% flowering (DtF); Plant height at 90% pod maturity (PHM); internodal length (IL), clusters per plant (CpP); seeds per pod (SpP); yield per plant (YpP); harvest index (HI) and hundred seed weight (HSW).

List of candidate genes with their previously reported biological pathway function obtained in the putative QTL regions. Distance of 5′ position of the gene from SNP, identified associated with the QTL, where–sign shows that the gene was located upstream of the SNP, and +sign shows that the gene was located downstream of the SNP. Days to 50% flowering (DtF); Plant height at 90% pod maturity (PHM); internodal length (IL), clusters per plant (CpP); seeds per pod (SpP); yield per plant (YpP); harvest index (HI) and hundred seed weight (HSW). The gene LOC106774489 was found 73 kb upstream of the Q. DtF.10 coding PHD finger-like domain-containing enzyme 5B. Nine genes LOC106757287 (-100.237 kb), LOC106757069 (-65.973 kb), LOC106757136 (-47.55 kb), LOC106756978 (-43.026 kb), LOC111241394 (-33.113 kb), LOC106757804 (-2.328 kb), LOC106757666 (50.731 kb), LOC106756983 (98.775 kb), LOC106756984 (118.512 kb) coding for already known enzyme E3 ubiquitin-enzyme ligase MARCH1, bZIP transcription factor 53, enzyme trichome birefringence-like 6, histone-lysine N-methyltransferase EZ2-like, DELLA enzyme RGL1-like, DEAD-box ATP-dependent RNA helicase 24, probable WRKY transcription factor 23, gibberellin 2-beta-dioxygenase two and transcription factor JAMYB-like for plant height were located near Q. PHM.3.1. For internodal length, two genes LOC106766854 (-33.322 kb) and LOC106765724 (-33.083 kb), were found close to the Q. IL.1.1 coding for pectate lyase-like and pectate lyase enzymes, respectively. Another gene LOC106762425 (-99.425 kb-), with enzyme cytokinin hydroxylase, was found in proximity with Q. IL.1.2. The QTL Q. IL.5 was near the genes LOC106760883 (-189.1 kb) and LOC106762422 (16.957 kb) coding for purine permease 1 and ethylene-responsive transcription factor RAP2-4 enzymes. For clusters per plant, three genes LOC106768944, LOC106760064 and LOC106760083 were in proximity of Q. CpP.1 (-108.532kb, -32.774 and 68.491 kb) coding SKP1-interacting partner 15, receptor-like protein 12 and polygalacturonase-like enzyme, respectively. For seeds per pod, three genes LOC106757271 (71.381 kb), LOC106757661 (95.044 kb) and LOC106756994 (100.043 kb) were found close to the Q. SpP.3 with enzyme galactinol synthase 2, Golgi apparatus membrane enzyme-like enzyme ECHIDNA and alkaline/neutral invertase A respectively. A gene LOC106765120 (64.962 kb) with enzyme dihydrofolate synthetase, was found in the vicinity of Q. SpP.6. Another ethylene-responsive transcription factor 1B-like enzyme coding gene LOC106770299 (147.684 kb) was found close to the Q. SpP.8.2. For yield per plant, a gene LOC106759105 (7.311kb) with MYB-related protein 305-like enzyme was close to the QTL: Q. YpP.4.1. Four genes LOC106761836 (-177.436 kb), LOC106760678 (-107.67 kb), LOC106762074 (-50.026 kb) and LOC106759995 (141.103 kb) coding CLAVATA3/ESR (CLE)-related protein 5-like, transcription factor PCF5, sodium/calcium exchanger NCL and basic leucine zipper 34 isoform X1 enzymes were in vicinity of Q. YpP.5. Two genes LOC111242272 (116.438 kb) and LOC106771274 (129.392 kb) with alpha-mannosidase-like and putative 12-oxophytodienoate reductase 11 enzymes was close to Q. YpP.8. For harvest index, gene LOC106758323 (-174.544 kb) coding for UV-B-induced protein At3g17800, chloroplastic isoform X1 was related to Q. HI.1. The gene LOC106760579 (-189.197 kb) was found in the proximity of Q. HI.5 coding cytochrome P450 CYP72A219-like enzyme. The gene LOC106769438 with protein root UVB sensitive six enzyme function was lying 105.344 kb of Q. HI.7. Two genes LOC111242272 (116.438 kb) and LOC106771274 (129.392 kb), with enzymes alpha-mannosidase-like and putative 12-oxophytodienoate reductase 11 respectively, were close to the Q. HI.8.2. Two genes LOC106764301 (-15.52 kb) and LOC106765194 (12.233 kb) with enzymes putative pentatricopeptide repeat-containing protein At1g12700 and peroxidase four, respectively, were close to the Q. HSW.6. The gene LOC106776199 (-175.528 kb) encoding bromodomain-containing protein 4B enzyme was related to Q. HSW.10. The nodes per plant (NpP) had two QTLs, but no genes already known for nodes per plant around the vicinity of 200 kb of significant QTLs were found.

KEGG Pathway Analysis

Some of the genomic regions significantly associated with the trait loci (Supplementary Table S10) such as Gibberellin 2-beta-dioxygenase (LOC106756983), cytochrome P450 CYP72A219 (LOC106760579) are involved in Diterpenoid biosynthesis (ko00904) pathway (Supplementary Figure S3). PHD finger-like domain-containing protein 5B (LOC106774489), and DEAD-box ATP-dependent RNA helicase 24 (LOC106757804) play role in Spliceosome (ko03040) process. Moreover, histone-lysine N-methyltransferase EZ2 found to be involved in Lysine degradation (Amino acid metabolism) ko0310. Besides that, pectate lyase-like (LOC106766854), pectate lyase (LOC106765724), and polygalacturonase-like (LOC106760083) have been found to play important role in Pentose and glucuronate interconversions (ko00040). LRR receptor-like serine/threonine-protein kinase RPK2 (LOC106766388), alkaline/neutral invertase A (LOC106756994) participates in Starch and sucrose metabolism (k000500). Galactinol synthase 2 (LOC106757271), alkaline/neutral invertase A (LOC106756994) have been shown to play role in Galactose metabolism (ko00052). Peroxidase 4 (LOC106765194) was found to play role in Phenylpropanoid biosynthesis (ko00940) (Supplementary Figure S4).

Discussion

Blackgram is one of the most popular pulses in Southeast Asia, with India contributing 54% of world production (Singh et al., 2016). Inspite of having high nutritional value, short duration, and photo insensitivity the crop has not been exploited to its full potential. Multiple biotic and abiotic stresses and narrow genetic base of this crop is major hindrance in its expansion. Thus, a systematic approach is required to exploit untapped genetic diversity present within the country so that the germplasm could be exploited for improving breeding potential. . Present study is an effort to exploit the collection of diverse blackgram genotypes for important yield related component traits. Environment factors highly influence the complex traits; therefore, the material was replicated both in time and space. Previous studies have also reported a wide range of phenotypic variability for traits such as PHM, BpP, IL, CpP, PpP and HI (Panigrahi et al., 2014; Panda et al., 2017; Senthamizhselvi et al., 2019). High heritability was observed for all the traits indicating high genetic control and an effective phenotypic selection for these traits. Different studies have reported high broad-sense heritability for traits, i.e., DtF, DtM, PHM, CpP, PL and YpP (Panigrahi et al., 2014; Kumar et al., 2015); BYpP, HI and HSW (Rolaniya al., 2017; Kuralarasan et al., 2018). High heritability with high selection intensity helps breeders to shorten the breeding cycles of the program leading to faster and higher genetic gains. Many genotypes harbored combination of superior alleles for different yield related traits. For instance, the genotype IC274597 for CpP, PpP, SpP and BYpP; IC370938 for PpP, YpP, HSW and HI; KUG673 for DtF, DtM and HI; IC328783 for PHM, NpP and BYpP; These genotypes can be used in further breeding programs to improve desirable characters.

Population Structure

Ad-hoc delta K and Cross entropy values suggested presence of four sub-populations in the blackgram germplasm which was further supported by phylogenetic analysis which in turn indicated significant diversity in the panel. High Fst values in 2, 3 and 4 sub-populations indicated them to be highly differentiated. A diverse germplasm panel can be a good source for a wide range of traits for breeding and research purpose (Govindaraj et al., 2015). Modelling of genetic structures as covariates helps in controlling the false positives while conducting GWAS. This is the first report on population structure analysis in blackgram; however, four sub-populations in mungbean germplasm have been reported earlier (Breria et al., 2020).

Genome Wide Association Studies (GWAS)

GWAS for 14 yield associated traits identified 49 SNPs contributing to 42 QTLs to be strongly associated with 12 traits except DtM and PpP. Among 42 significant genomic regions identified in the study, the number of QTLs for each trait were as YpP (4), DtF (1), PHM (8), BpP (1), NpP (2), IL (4), CpP (5), PL (1), SpP (4), BYpP (2), HI (7) and HSW (3). High -log10 p-value and PVE suggested presence of major QTLs i.e. Q. DtF.10, Q. PHM.3.2, Q. NpP.6, Q. IL.8, Q. CpP.10, Q. SpP.3, Q. BYpP.9, Q. YpP.8, and Q. HI.8.2 at Ludhiana, whereas Q. PHM.3.1, Q. PHM.6.1, Q. NpP.4, Q. IL.1.1, Q. CpP.4, Q. SpP.6, Q. BYpP.8, Q. YpP.4.2, and Q. HI.6 at Gurdaspur were found significant for trait regulation. QTLs Q. PHM.8, and Q. IL.8 were found to be located at same position on eighth chromosome controlling PHM and IL, respectively. The pleiotropy of increased PHM with increased IL has been earlier reported in faba bean (Hughes et al., 2020). The significant allelic effects of MTAs suggested selection of superior alleles could substantially lead to significant improvement of crop. At Ludhiana, QTL Q. DtF.10 with superior allele TT resulted in earlier flowering with a mean of 41.42 days as compared to 43.05 with alternative allele. Q. PHM.3.1 with superior allele AA recorded higher mean plant height of 28.54 cm, and 43.23 cm as compared to 19.21 cm, and 27.21 cm with alternative allele GG at Ludhiana, and Gurdaspur respectively. Superior allele TT of QTL Q. IL.8 decreased internode length with mean value of 2.01 cm, whereas alternate allele observed higher internodal length with mean value of 2.83 cm. For clusters per plant, Q. CpP.1 with superior allele AA exhibited mean of 18.84 clusters, whereas alternate allele TT exhibited mean of 12.73 clusters with The SNP S6.1.29235837 of Q. SpP.6 having allele AA produced more average seeds per pod of 6.71 in contrast to 6.45 by alternate allele. Higher yield per plant with average of 5.47 g was observed by presence of AA allele of QTL Q. YpP.5 while 4.21 g yield was observed with alternate allele. Q. HI.1 with allele GG exhibited an increased mean value of harvest index (22.53%) as compared to a lower harvest index due to alternate allele (18.62%). Q. HI.8.2 with allele TT showed a higher harvest index (22.87%) as compared to alternate allele (19.44%). Allelic effects with superior alleles and alternative alleles have been reported for significantly associated markers for root, nutrient uptake and yield related traits in rice (Subedi et al., 2019), for spike ethylene and spike dry weight in wheat (Valluru et al., 2017), and for yield related and heat tolernce traits in wheat (Dhillon et., 2021).

Postulation of Genes

Blackgram is a highly self-pollinated crop and with the given narrow genetic base, is expected to have large LD blocks and large chunks of haplotypes being transferred over the generations without recombination. Keeping this in view, the SNPs identified for traits can be searched upstream and downstream for candidate genes governing those traits (Dhillon et al., 2020). A total of 50 genes for 24 QTLs were identified with different functions with respect to different traits. The QTL Q. DtF.10 identified for days to flowering was present in the vicinity of the gene coding plant homeodomain finger-like domain-containing enzyme 5B can be a putative gene as PHD finger-like genes are reported to delay flowering Arabidopsis (Greb et al., 2007). The enzymes encoded by genes found for QTLs of PHM were involved in controlling the plant height as supported by earlier studies of enzyme E3 ubiquitin-enzyme ligase MARCH1 in rice (Hu et al., 2013), bZIP transcription factor 53 in soybean (Ali et al., 2016), enzyme trichome birefringence-like six in Rice (Gao et al., 2017), histone-lysine N-methyltransferase EZ2-like (https://www.uniprot.org/uniprot/O23372), DELLA enzyme RGL1-like in Arabidopsis (Serrano-Mislata et al., 2017), DEAD-box ATP-dependent RNA helicase 24 in rice (Wang et al., 2016), probable WRKY transcription factor 23 in rice (Cai et al., 2014), gibberellin 2-beta-dioxygenase two in Arabidopsis (Rosin et al., 2003), transcription factor JAMYB-like in rice (Zhang et al., 2017), acyl transferase four in rice (Basu et al., 2017), tropinone reductase homolog (Stead, 1989), in grapevine (Guillaumie et al., 2020), steroid 5-alpha-reductase DET2 in soybean (Huo et al., 2018), squamosa promoter-binding-like enzyme 14 in rice (Lu et al., 2013), cytochrome P450 71D11 in cucumber (Wang et al., 2017), pentatricopeptide repeat-containing enzyme At3g48810 in Arabidopsis (Lee et al., 2019), and pentatricopeptide repeat-containing enzyme At1g31430 in Arabidopsis (Lee et al., 2019). For internodal length, genes coding for pectate lyase-like (PLL), and pectate lyase (PL) in rice (Leng et al., 2017), cytokinin hydroxylase in Arabidopsis (Kiba et al., 2013), purine permease 1 in cotton (Wang et al., 2020), ethylene-responsive transcription factor RAP2-4 enzymes Arabidopsis (Hinz et al., 2010) have been earlier reported. For clusters per plant, genes SKP1-interacting partner 15 (Lu et al., 2016), receptor-like protein 12 (Wang et al., 2008), polygalacturonase-like enzyme (Xiao et al., 2014) in Arabidopsis, probable trehalose-phosphate phosphatase C isoform X2 in Nelumbo nucifera (Jin et al., 2016), lysine-specific demethylase JMJ25 isoform X1 in Arabidopsis (Jiang et al., 2007), pollen defective in guidance 1 (Li et al., 2011), LRR receptor-like serine/threonine-protein kinase RPK2 enzymes in Arabidopsis (Mizuno et al., 2007), eukaryotic translation initiation factor 3 subunit H-like in Arabidopsis (Zhou et al., 2014), MLO-like protein 1 reported in peach (Ruperti et al., 2002), and DDB1- and CUL4-associated factor 13 enzyme (Bjerkan and Grini, 2013) has been earlier reported to be involved in controlling cell elongation and flower development. For seeds per pod, galactinol synthase two in Chickpea (Salvi et al., 2016), Golgi apparatus membrane enzyme-like enzyme ECHIDNA in Arabidopsis (Jia et al., 2018), 60S ribosomal protein L28-2 in soybean (Jones et al., 2020), dihydrofolate synthetase in Arabidopsis (Corral et al., 2018), and ethylene-responsive transcription factor 1B-like enzyme in brassica (Kaur et al., 2020) were known to have a role in increasing number of seeds per pod/siliqua. For yield per plant, myb-related protein 305-like in Arabidopsis (Ambawat et al., 2013), CLAVATA3/ESR (CLE)-related protein 5-like in Arabidopsis (Fletcher, 2018), transcription factor PCF5 in wheat (Zhao et al., 2018), sodium/calcium exchanger NCL in soybean (Liu Y. et al., 2016), basic leucine zipper 34 isoform X1 enzymes in wheat (Sornaraj et al., 2016), alpha-mannosidase-like in wheat (Dal Cortivo et al., 2020), and putative 12-oxophytodienoate reductase 11 in wheat (Pigolev et al., 2018) were previously known for yield improvement. For harvest index, UV-B-induced protein At3g17800 in basil (Dou et al., 2019), cytochrome P450 CYP72A219-like enzyme in Arabidopsis (Bak et al., 2011), root UV-B sensitive in wheat (Agrawal et al., 2004), alpha-mannosidase-like in wheat (Dal Cortivo et al., 2020), and putative 12-oxophytodienoate reductase 11 in wheat (Pigolev et al., 2018) are known for triggering reductions in biomass, yield and harvest index. The genes detected for HSW, pentatricopeptide repeat-containing protein At1g80550, have been earlier reported for kernel development in maize (Dai et al., 2018; Ren et al., 2019); peroxidase four in soybean (Zhang et al., 2015). The clusters per plant had several QTLs, but no known genes were identified in 200 kb genomic region. KEGG analysis revealed role of candidate genes in biological pathways related to respective traits. Gibberellins (GAs) are endogenous phytohormones that are involved in the regulation of the life cycle of plants. It has been identified that the locus encoding gibberellin 2-beta-dioxygenase/GA 2-oxidase present in vicinity of SNP S3.1.7993147 on chromosome 3 significantly associated with plant height at 90% pod maturity participates in Diterpenoid biosynthesis (ko00904) pathway. The role of this locus in the regulation of plant growth in rice has been demonstrated by Sakamoto et al., 2001. KEGG pathway established role of another gene coding for steroid 5-alpha-reductase DET2 with QTL associated with PHM on chromosome six to participate in Brassinosteroid biosynthesis. Ortholog of this gene in cotton (Gossypium hirsutum L.), GhDET2 along with BRs are known to play a crucial role in the initiation and elongation of cotton fiber cells (Luo et al., 2007). Similarly, DET2 steroid 5d-reductase in Arabidopsis catalyzes a major rate-limiting in brassinosteroid (BR) biosynthesis (Nakaya et al., 2002). Additionally, Cytochrome P450 CYP72A219-like has also been reported to participate in Diterpenoid biosynthesis pathway (Bathe and Tissier, 2019). Using KEGG pathway analysis, function of only two QTLs could be established with diterpenoid pathway and brassinosteroid biosynthesis pathway. However, the function of remaining significant genomic regions could not be established using this analysis.

Conclusion

GWAS analysis led to identification of novel MTA and few putative candidate genes. Though candidate genes need to be examined further and detailed investigations would validate their roles in governing agronomically important traits but the MTA will really help in selecting the genotypes with positive associations. The information derived from this study can be used in the generation of SNP based molecular markers to select traits of interest and accelerate blackgram breeding programme with a better ideotype. Since the blackgram is neglected crop in term of number of genetic and genomic resources, the current study, first of its kind in this crop will definitely open new avenues for broadening its base.
  81 in total

1.  Inference of population structure using multilocus genotype data.

Authors:  J K Pritchard; M Stephens; P Donnelly
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

2.  A Rice PECTATE LYASE-LIKE Gene Is Required for Plant Growth and Leaf Senescence.

Authors:  Yujia Leng; Yaolong Yang; Deyong Ren; Lichao Huang; Liping Dai; Yuqiong Wang; Long Chen; Zhengjun Tu; Yihong Gao; Xueyong Li; Li Zhu; Jiang Hu; Guangheng Zhang; Zhenyu Gao; Longbiao Guo; Zhaosheng Kong; Yongjun Lin; Qian Qian; Dali Zeng
Journal:  Plant Physiol       Date:  2017-04-28       Impact factor: 8.340

3.  Development of a black gram [Vigna mungo (L.) Hepper] linkage map and its comparison with an azuki bean [Vigna angularis (Willd.) Ohwi and Ohashi] linkage map.

Authors:  B Chaitieng; A Kaga; N Tomooka; T Isemura; Y Kuroda; D A Vaughan
Journal:  Theor Appl Genet       Date:  2006-08-24       Impact factor: 5.699

4.  Ribosomal DNA spacer-length polymorphisms in barley: mendelian inheritance, chromosomal location, and population dynamics.

Authors:  M A Saghai-Maroof; K M Soliman; R A Jorgensen; R W Allard
Journal:  Proc Natl Acad Sci U S A       Date:  1984-12       Impact factor: 11.205

5.  Two Trichome Birefringence-Like Proteins Mediate Xylan Acetylation, Which Is Essential for Leaf Blight Resistance in Rice.

Authors:  Yaping Gao; Congwu He; Dongmei Zhang; Xiangling Liu; Zuopeng Xu; Yanbao Tian; Xue-Hui Liu; Shanshan Zang; Markus Pauly; Yihua Zhou; Baocai Zhang
Journal:  Plant Physiol       Date:  2016-11-18       Impact factor: 8.340

6.  A genome-wide functional investigation into the roles of receptor-like proteins in Arabidopsis.

Authors:  Guodong Wang; Ursula Ellendorff; Ben Kemp; John W Mansfield; Alec Forsyth; Kathy Mitchell; Kubilay Bastas; Chun-Ming Liu; Alison Woods-Tör; Cyril Zipfel; Pierre J G M de Wit; Jonathan D G Jones; Mahmut Tör; Bart P H J Thomma
Journal:  Plant Physiol       Date:  2008-04-23       Impact factor: 8.340

Review 7.  Importance of genetic diversity assessment in crop plants and its recent advances: an overview of its analytical perspectives.

Authors:  M Govindaraj; M Vetriventhan; M Srinivasan
Journal:  Genet Res Int       Date:  2015-03-19

8.  A genome-wide association study of seed protein and oil content in soybean.

Authors:  Eun-Young Hwang; Qijian Song; Gaofeng Jia; James E Specht; David L Hyten; Jose Costa; Perry B Cregan
Journal:  BMC Genomics       Date:  2014-01-02       Impact factor: 3.969

9.  Genome-wide association study reveals significant genomic regions for improving yield, adaptability of rice under dry direct seeded cultivation condition.

Authors:  Sushil Raj Subedi; Nitika Sandhu; Vikas Kumar Singh; Pallavi Sinha; Santosh Kumar; S P Singh; Surya Kant Ghimire; Madhav Pandey; Ram Baran Yadaw; Rajeev K Varshney; Arvind Kumar
Journal:  BMC Genomics       Date:  2019-06-10       Impact factor: 3.969

10.  Effects of Seed-Applied Biofertilizers on Rhizosphere Biodiversity and Growth of Common Wheat (Triticum aestivum L.) in the Field.

Authors:  Cristian Dal Cortivo; Manuel Ferrari; Giovanna Visioli; Marta Lauro; Flavio Fornasier; Giuseppe Barion; Anna Panozzo; Teofilo Vamerali
Journal:  Front Plant Sci       Date:  2020-02-26       Impact factor: 5.753

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.