
Legume genomics and transcriptomics: From classic breeding to modern technologies.

Muhammad Afzal¹, Salem S Alghamdi¹, Hussein H Migdadi¹, Muhammad Altaf Khan¹, Shaher Bano Mirza²,³, Ehab El-Harty¹.

Abstract

Legumes are essential crops that play a significant role in maintaining food standards and improving the physicochemical properties of soil through biological nitrogen fixation. Biotic and abiotic factors are the main constraints on legume production. Classical breeding methodologies have been explored extensively to address low yields in legumes but have not succeeded at the desired rate. Conventional breeding has improved legume genotypes, but at a high cost in resources and time. Recently, next-generation sequencing (NGS) and high-throughput genotyping methods have opened new avenues for research and development in legume studies. During the last decade, the genomes of many legume crops have been sequenced. Sequencing and re-sequencing of important legume species have made studies of structural variation and functional genomics feasible. NGS and other molecular techniques, such as marker development; genotyping; high-density genetic linkage maps; and the identification of quantitative trait loci (QTLs), expressed sequence tags (ESTs), single nucleotide polymorphisms (SNPs), and transcription factors, have been incorporated into existing breeding technologies and have enabled the accurate and accelerated delivery of information to researchers. Genome sequencing, RNA sequencing (transcriptome sequencing), and DNA re-sequencing provide considerable insights for legume development and improvement programs. Moreover, RNA-Seq helps to characterize genes, including differentially expressed genes, and can be applied to functional genomics studies, especially when limited information is available for the studied genomes. Genome-based crop development studies, the availability of genomic data, and decision-making tools appear to be critical for breeding programs.
This review mainly presents an overview of the path from classical breeding to emerging genomics tools, which will trigger and accelerate genomics-assisted breeding for the identification of novel genes for yield and quality traits and for sustainable legume crop production.


Keywords: Classical breeding; Improvement; Legumes; RNA sequencing; Transcriptome

Year: 2019 PMID: 31889880 PMCID: PMC6933173 DOI: 10.1016/j.sjbs.2019.11.018

Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 2213-7106 Impact factor: 4.219

Introduction

Food shortages are expected in coming years, and the balance between food supply and demand is problematic because the world population is projected to increase from 7.2 billion to 9.6 billion by 2050 (Gerland et al., 2014). Population growth has increased the demand for food production, raising questions about the available resources (Fedoroff, 2015). Natural resource depletion and climate change have hampered continuous efforts to attain the desired level of production. Food security, with the aim of high nutrition and yield, is the main challenge for research scientists and the farming community by 2050. Grain legumes contain high amounts of protein, vitamins, minerals (iron, calcium, zinc, and magnesium), and fatty acids, including omega-3 fatty acids. The value of grain legumes is even higher in Asia, where a large part of the population is vegetarian. Groundnut and soybean are important oilseed legumes that are widely used in cooking and in the preparation of sweetmeats. One of the most important characteristics of legumes is biological nitrogen fixation; in this context, legumes are considered important crops in sustainable agriculture because they make residual nitrogen available to non-legume crops (Pandey et al., 2016). The highest protein contents are found in soybean (33–45%), common bean (21–39%), winged bean (30–37%), cowpea (21–35%), groundnut (24–34%), pea (21–33%), moth bean and urd bean (21–31%), lentil (20–31%), grass pea (23–30%), chickpea (15–30%), horse gram (19–29%), and rice bean (18–27%) (Pandey et al., 2016). Legume production cannot be increased to the desired level because of biotic and abiotic limitations. Moreover, the decrease in available land and water resources due to climate change will exacerbate current adverse conditions in coming years, and as a result, protein crops such as legumes may be at risk.
To address this, conventional breeding programs have focused on the low productivity of legumes over the last two decades, but their goals have not been fully achieved. Quality-related characteristics (e.g., protein content) (Gnanasambandam et al., 2012) and the absence of anti-nutritional factors (e.g., tannins; Mikic et al., 2009) are important targets for legume crop improvement programs. Gene regulation in both the legume plant and the bacteria occurs at all stages of nodule development. Numerous studies have focused on the genetic control of bacterial infection and the initial processes involved in nodulation. Some early nodulin genes (e.g., ENOD11 and RIP1) and an essential Nod factor (NF), whose signaling is triggered by bacterial infection, have been identified (Yang et al., 1993). Plant breeding success has mostly depended on exploiting genetic diversity resources, including those generated by mutation breeding, to identify selected genotypes for breeding purposes. It also depends on the identification of novel genes of interest (GOIs) and on breeding methodologies based on phenotype data (Perez et al., 2012). Genomic breeding values can now be developed using high-throughput DNA sequencing, known as next-generation sequencing (NGS). NGS and other new technologies enable plant breeders to develop markers for diversity analyses, genotyping, high-density genetic maps, and population genetics studies, and these can be merged into available breeding technologies to achieve the desired goals (Lorenz et al., 2011). Moreover, genomic approaches are essential for dealing with complex multigenic traits that are strongly influenced by the environment. Genomic tools are also essential for mapping quantitative trait loci (QTLs) of rare alleles that often remain undetected in the gene pool (Vaughan et al., 2007).
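To make the notion of a genomic breeding value concrete, the following is a minimal sketch, not the method of any study cited here. It assumes additive marker effects estimated by ridge regression (the RR-BLUP formulation), genotypes coded as 0, 1, or 2 copies of one allele, simulated data, and a hypothetical shrinkage parameter `lam`. A line's genomic estimated breeding value (GEBV) is then the population mean plus the sum of its centred marker dosages weighted by the estimated effects.

```python
import numpy as np


def fit_marker_effects(X, y, lam=1.0):
    """Estimate additive marker effects by ridge regression (RR-BLUP style).

    X: (n_individuals, n_markers) genotypes coded 0/1/2.
    y: phenotypes of the training population.
    lam: hypothetical shrinkage parameter; in RR-BLUP it equals
         residual variance / marker-effect variance.
    """
    mu = y.mean()
    centers = X.mean(axis=0)          # centre each marker on its mean dosage
    Z = X - centers
    lhs = Z.T @ Z + lam * np.eye(Z.shape[1])
    beta = np.linalg.solve(lhs, Z.T @ (y - mu))
    return mu, centers, beta


def predict_gebv(X_new, mu, centers, beta):
    """GEBV = mean + sum over markers of (centred dosage * marker effect)."""
    return mu + (X_new - centers) @ beta


# Simulated example: 200 phenotyped training lines, 50 markers.
rng = np.random.default_rng(1)
true_beta = rng.normal(0.0, 1.0, 50)
X_train = rng.integers(0, 3, size=(200, 50))
y_train = X_train @ true_beta + rng.normal(0.0, 1.0, 200)

mu, centers, beta = fit_marker_effects(X_train, y_train, lam=1.0)

# Selection candidates that are genotyped but not phenotyped.
X_cand = rng.integers(0, 3, size=(30, 50))
gebv = predict_gebv(X_cand, mu, centers, beta)
accuracy = np.corrcoef(gebv, X_cand @ true_beta)[0, 1]
print(f"correlation of GEBV with simulated true genetic value: {accuracy:.3f}")
```

In practice the shrinkage parameter is estimated from variance components or by cross-validation rather than fixed at 1.0, and accuracy can only be checked against simulated or later-observed values, as done here.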
More advanced techniques, such as cDNA sequencing [i.e., expressed sequence tags (ESTs)], provide a better understanding of genes expressed in leaves or roots, at a specific stage, or under a particular environmental condition. Although the cDNA approach has some limitations (e.g., it provides no information on non-coding regions), the identification and collection of ESTs are useful for researchers. Draft genome sequencing allows crops to be improved on the basis of genomic gains and the selection of genes that confer favorable traits, such as increased yield and nutrient quality. It also provides the information necessary to understand genome organization, uncovers pathways linked to stress responses, and reveals mutational changes resulting from insertions and deletions in the genome. The availability of genetic resources worldwide gives plant researchers the chance to detect unique alleles or gene combinations that are important for crop development (McCouch et al., 2013). This manuscript reviews classical breeding methods and the most advanced technologies used in the development of legume crops. We briefly discuss transcriptome technologies and computational biology approaches that facilitate the selection of GOIs in legume improvement programs. We also discuss genomic tools that can be used to study complex traits, such as nitrogen fixation, quality, and yield. Genomics-based techniques improve breeding programs; in particular, the dissemination of genomic data and the provision of decision-support tools appear to be critical for selection. This review also offers an overview of emerging genomic tools, such as genomics-assisted breeding, that may accelerate the production of sustainable legume crops.

Marker-assisted breeding (MAB)

DNA markers are used in different ways for breeding purposes. Molecular breeding for cultivar development has become popular because it is precise, less time-consuming, and provides maximum benefits. Moreover, marker use must take into account whether a trait is simple or associated with complex or multiple genes. Molecular markers are crucial for the successful selection of cultivars that are resistant to parasitic weeds (e.g., broomrape) and diseases (e.g., Ascochyta blight, rust, and chocolate spot) (Rietzschel and De Buyzere, 2012). These markers can also be used to find linked genes responsible for the plant's growth habit and quality (e.g., vicine and tannin content) (Torres et al., 2010). MAB can be used together with conventional field breeding to accelerate the selection of desirable traits (Collard and Mackill, 2007) and increase selection efficacy (Ragot et al., 2007). Because it relies on the genotype at specific marker loci, MAB can improve the efficiency and effectiveness of selection relative to classical breeding (Collard and Mackill, 2008). Breeders can use these genes or alleles as diagnostic tools to identify desired genes or GOIs. Progress in linkage mapping for faba bean has been slow compared with other legume crops. Early low-density linkage maps were developed using isozyme, restriction fragment length polymorphism (RFLP), and random amplified polymorphic DNA (RAPD) markers (Avila et al., 2003). Later, sequence-based markers, such as intron-targeted amplified polymorphisms (ITAPs), simple sequence repeats (SSRs), and sequence-characterized amplified regions (SCARs), enabled the construction of more informative maps used for comparative genomic studies of legumes (Zeid et al., 2009).
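Linkage maps such as those described above translate the recombination fraction r observed between two markers into a map distance in centimorgans (cM). The sketch below shows the two classic map functions: Haldane's, which assumes no crossover interference, and Kosambi's, which allows for some interference. Which function was used for a particular faba bean map is not stated here, so both are given for illustration.

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# For tightly linked markers both functions give roughly 100 * r cM;
# they diverge as r approaches 0.5 (unlinked markers).
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.1f} cM, Kosambi {kosambi_cm(r):.1f} cM")
```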
A set of functional SSR markers has been developed from a genomic library enriched for GA/CT motifs for use in legume breeding programs (Verma et al., 2014). The development of ESTs (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c) or genome survey sequences (GSS) can extend marker development, especially of SSRs and SNPs, to improve map resolution (Webb et al., 2015). Later studies focused on genetic maps for specific populations and traits, and on the discovery of QTLs for morphological characters and for resistance to abiotic stresses and diseases (Atienza et al., 2016). Cross-species molecular markers developed from Medicago simplify the comparison of gene order and genetic maps at the micro- and macro-levels among legumes. The success of MAB depends mainly on several factors, such as the genetics of the trait, the distance between the flanking markers and the gene of interest, knowledge of the genetic background of the gene of interest, and the available methodology (Francia et al., 2005). Primer selection also depends on specific genes that have minor effects but are relatively easy to map and phenotype. Such markers may not be polymorphic, in which case different markers are required for genetic diversity studies. SSRs, which show high levels of polymorphism and occur in gene-rich regions, can be used effectively to build maps and to identify desired alleles and QTLs in faba bean (Song et al., 2004). SSR markers have been used to construct maps for agronomic traits and to identify QTLs related to yield and resistance to pests and diseases in soybean (Glycine max L.) (Du et al., 2009). Many molecular markers have been used to map QTLs in food legumes. Some important molecular markers linked to agronomic traits and to disease resistance genes and QTLs are presented in Table 1. In the last decade, efforts have been made to map the soybean (Glycine max (L.) Merr.) and pea (Pisum sativum L.) genomes.
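Mining SSRs from ESTs or genome survey sequences starts by scanning sequences for perfect tandem repeats such as the GA/CT motifs mentioned above. The function below is a simplified, hypothetical SSR finder in the spirit of dedicated tools such as MISA; the minimum repeat numbers in `MIN_REPEATS` are illustrative assumptions, and a real pipeline would also design flanking primers and test the loci for polymorphism.

```python
import re

# Minimum number of tandem repeat units per motif length
# (illustrative thresholds, not taken from the cited studies).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k})\1{m,} matches a k-mer followed by at least m copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "GAGA");
            # the shorter motif length already reports them.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


example = "ATCG" + "GA" * 8 + "TTCCA" + "CAG" * 6 + "TTGC"  # synthetic sequence
print(find_ssrs(example))  # [(4, 'GA', 8), (25, 'CAG', 6)]
```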
Over 300 QTLs associated with various traits have been identified using MAB (Gupta et al., 2010). Sequencing of wild and cultivated soybean genomes revealed 205,614 SNPs in the wild accessions relative to the cultivated lines, which could be useful for QTL analysis (Lam et al., 2010). Some QTLs related to disease resistance in soybean are reported in Table 2.
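The simplest way to detect a QTL with markers like those in Tables 1 and 2 is single-marker analysis: regress the phenotype on each marker's genotype and express the evidence for association as a LOD score, LOD = (n/2) log10(RSS_null / RSS_marker). The sketch below uses a simulated F₂-like population with hypothetical markers M1 to M3, of which only M2 affects the trait; the interval mapping and mixed-model methods used in real studies are more elaborate.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """LOD score for a linear regression of phenotype on marker dosage (0/1/2).

    The null model fits only the mean; the marker model adds an additive effect.
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:  # a monomorphic marker carries no information
        return 0.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    rss_marker = rss_null - sxy ** 2 / sxx
    return (n / 2) * math.log10(rss_null / rss_marker)


random.seed(42)
n = 150
# F2-like 1:2:1 segregation of dosages 0, 1, 2 at three unlinked markers.
markers = {name: [random.choice((0, 1, 1, 2)) for _ in range(n)]
           for name in ("M1", "M2", "M3")}
# Hypothetical trait: M2 adds 1.5 units per allele, plus environmental noise.
trait = [10 + 1.5 * g + random.gauss(0, 1) for g in markers["M2"]]

lods = {name: single_marker_lod(g, trait) for name, g in markers.items()}
for name, lod in sorted(lods.items()):
    print(f"{name}: LOD = {lod:.1f}")
```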

Table 1

Summary of the linked molecular markers for agronomic and disease resistance gene(s)/quantitative trait loci.

Traits	Mapping population	Linked markers	References
Determinate growth habit		CAPS-TFL1 (HindIII)ᵃ, Ti-dCAPSᵃ	Avila et al. (2007)
Zero tannins	Vf6 × zt-1 (F₂)	SCC5₅₅₁/SCG11₁₁₇₁	Gutierrez et al. (2007)
Low vicine–convicine	Vf6 × vc- (F₂)	SCH01₆₂₀/SCAB12₈₅₀	Gutierrez et al. (2006)
Frost tolerance	Frost tolerance	U09₁₄₉₉/B20₈₀₃	Arbaoui and Link (2008)
Ascochyta blight resistance	Vf6 × Vf136 (F₂)	OPAC06₁₀₂₃	Diaz et al. (2005)
Broomrape resistance	Vf6 × Vf136 (F₂); Vf6 × Vf136 (RILs)	OPAI13₁₀₁₈/OPAC06₃₉₆	Diaz et al. (2005)
Rust resistance	2N52 × Vf176	OPI20₉₀₀/OPL18₁₀₃₂	Avila et al. (2003)

Table 2

Quantitative trait loci (QTLs) related to resistance to disease in soybean crops.

Traits	Gene	Linked markers	References
Resistance to leaf rust		Rpp5	Arahana et al. (2001)
White mold		Satt009	Arahana et al. (2001)
Resistance to cyst nematode			Concibido et al. (1997)
SCN	Rhg4, rhg1, and Rfs	Satt009	Meksem et al. (2000)
Resistance to Fusarium solani		Satt080	Njiti et al. (2002)
Black pod-of-staff	Rhg1	Satt038, Satt130, Sat-168, Satt309	Bachman et al. (2001)
Resistance to stain-frogeye	RCSPeking	Satt244	Yang et al. (2001)
Resistance to root-knot nematode		Satt114	Gavioli (2011)

Some legume genome sequencing results, the discovery of genes, and ESTs generated. As of January 2014. According to the Dana Farber Gene Indices (compbio.dfci.harvard.edu/tgi/plant.html).

Proteomics

The advent of recent 'omics' techniques has provided a new view of crop development, including the discovery of novel genes and intricate pathway mechanisms for agronomic traits, integrated through genomic and functional approaches based on molecular and morphological data (Langridge and Fleury, 2011). Proteomics studies the complement of proteins in a cell, tissue, or organism at a specific stage or under a specific condition (Jorrin-Novo et al., 2019). Proteomic studies also involve quantitative measurements, such as of protein composition, genetic alterations, and modulation at specific stages or under stress conditions in crop plants. Protein data combined with genetic determinants may lead to the identification of GOIs or desired traits, and these data can then be incorporated into molecular breeding programs (Vanderschuren et al., 2013). Proteomics comprises mapping, protein profiling, post-translational modifications (PTMs), and protein-protein interaction networks (Katam et al., 2015). An important potential area of crop research is translational proteomics, which can be applied to agricultural production (Jorrin-Novo et al., 2019). Ortho-proteomics and comparative analysis can improve breeding programs by standardizing and maintaining high-quality protein data for specific crops (Vanderschuren et al., 2013). Recent developments in proteomics, such as the application of mass spectrometry (MS), have increased precision and understanding, and various software tools are used for advanced proteomic quantification. Proteins can be quantified precisely and accurately using gel-based, label-based, or label-free methods (Hu et al., 2015). Two-dimensional gel electrophoresis combined with MS analysis now makes it possible to understand the mechanisms involved under different stress conditions at different growth stages.
Moreover, two-dimensional gel systems provide high resolving power when samples are labeled with fluorescent dyes, which allows several samples to be separated on the same gel (Vadivel and Kumaran, 2015). Another method is shotgun proteomics, a peptide-centric strategy that searches protein databases with data from liquid chromatography-tandem MS (LC-MS/MS). It provides efficient, high-throughput analysis of the major proteins of a cell or organelle proteome (Jorrin-Novo et al., 2019). Biomarker validation of a specific protein can be performed by selected reaction monitoring (SRM), which can reveal the molecular mechanism underlying specific crop traits (Jacoby et al., 2013). SRM is a more sensitive method for quantifying specific proteins in crop plants. Protein-protein interaction (PPI) methods are important for elucidating molecular mechanisms, i.e., protein complexes, signal transduction, and stress signaling (Westermarck et al., 2013). More advanced approaches for legume improvement, and their application in different legume species, have also been addressed. Important methods identify proteins by comparing MS/MS spectra against reference databases (Romero-Rodríguez et al., 2014); these include the Tandem, Sequest, and Mascot search engines (Senkler and Braun, 2012). Proteomics also plays a significant role in studying differences at the cellular, subcellular, and whole-plant levels under abiotic stress conditions (Hossain and Komatsu, 2014). Alghamdi et al. (2018) applied solubility-based protein analysis to seven cowpea genotypes; their results revealed that water-soluble proteins were dominant in cowpea seeds when compared to total proteins. Moreover, gel patterns revealed molecular diversity, the total protein complement, and the presence of different protein fractions.
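Database search engines such as Mascot and Sequest compare observed MS/MS data with peptides predicted from a protein database. The first steps, in-silico tryptic digestion (cleavage after K or R unless the next residue is P) and matching an observed precursor mass within a ppm tolerance, can be sketched as follows. This is a simplified illustration, not how any particular engine is implemented: it ignores missed cleavages, modifications, and the fragment-ion scoring that real engines perform, and the protein sequence and tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide for the terminal H and OH
PROTON = 1.007276


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_precursor(observed_mz, charge, peptides, tol_ppm=10.0):
    """Peptides whose mass is within tol_ppm of the observed precursor."""
    neutral = (observed_mz - PROTON) * charge
    hits = []
    for pep in peptides:
        theoretical = peptide_mass(pep)
        if abs(neutral - theoretical) / theoretical * 1e6 <= tol_ppm:
            hits.append(pep)
    return hits


print(tryptic_digest("MKPEPTIDERAGKWLR"))  # ['MKPEPTIDER', 'AGK', 'WLR']
print(match_precursor(800.3672, 1, ["PEPTIDE", "PEPTIDES"]))  # ['PEPTIDE']
```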
Gel-based and nano liquid chromatography (LC)-MS/MS techniques were used in soybean to study osmotic stress responses in the plasma membrane (Nouri and Komatsu, 2010). The gel approach identified four upregulated and eight downregulated proteins, while the LC approach identified 11 upregulated and 75 downregulated proteins (Ahsan et al., 2010). Further studies examined abiotic stress responses using protein profiling. Abiotic stresses are a severe threat to chickpea productivity; therefore, efforts have been made to determine the genetic basis of stress tolerance in chickpea. Heidarvand and Maali-Amiri (2013) identified a unique dehydration signaling component, designated CaSUN1 (Cicer arietinum Sad1/UNC-84), which localizes to the endoplasmic reticulum and nuclear membrane as well as to small vacuolar vesicles. Another study determined protein changes in chickpea facing cold stress at early growth stages (Jaiswal et al., 2014). Protein profiling in response to water stress has also been conducted in soybean (Seminario et al., 2017). Irar et al. (2014) reported a signaling pathway that restricts nitrogen fixation under drought conditions and identified 18 nodule-regulated proteins from the pea and rhizobium genomes. Moreover, two-dimensional gel electrophoresis and MALDI-TOF-MS were used to examine differences in protein accumulation in seeds at the germination stage under osmotic stress (Brosowska-Arendt et al., 2014).
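Counts of upregulated and downregulated proteins, such as those reported above for soybean, rest on comparing protein abundances between treatments. A minimal sketch of that comparison computes the log2 fold change of mean intensities and applies a two-fold cutoff; the protein identifiers, intensities, and cutoff are hypothetical, and real studies also apply statistical tests with multiple-testing correction.

```python
import math
from statistics import mean


def classify_proteins(control, treated, log2_cutoff=1.0):
    """Label each protein as up, down, or unchanged from replicate intensities.

    control, treated: dicts mapping protein ID -> list of intensities.
    log2_cutoff: |log2 fold change| threshold (1.0 means two-fold; an assumption).
    Returns a dict of protein ID -> (call, log2 fold change).
    """
    calls = {}
    for protein, ctrl_values in control.items():
        lfc = math.log2(mean(treated[protein]) / mean(ctrl_values))
        if lfc >= log2_cutoff:
            calls[protein] = ("up", lfc)
        elif lfc <= -log2_cutoff:
            calls[protein] = ("down", lfc)
        else:
            calls[protein] = ("unchanged", lfc)
    return calls


# Hypothetical intensities for three proteins, three replicates each.
control = {"P1": [100, 110, 90], "P2": [200, 210, 190], "P3": [100, 100, 100]}
stress = {"P1": [400, 420, 380], "P2": [50, 55, 45], "P3": [120, 120, 120]}

calls = classify_proteins(control, stress)
for protein, (call, lfc) in calls.items():
    print(f"{protein}: {call} (log2 FC = {lfc:+.2f})")
```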

Metabolites in legumes

Plant metabolites are the end products of gene expression and serve as biochemical markers of phenotypic changes in a cell or tissue (Hong et al., 2016). Detailed quantitative and qualitative metabolite data also provide insight into gene function (Wink, 2013). Stress-related components, such as metabolic changes, stress acclimation processes, and signal transduction molecules, can be identified by metabolomic studies (Larrainzar et al., 2009). The identified metabolites can be analyzed further, either directly or by relating them to differences in protein and transcriptome data, for example through mutant analysis. Various techniques are currently used for metabolomic profiling, including metabolite identification, separation-based analytical techniques, and detection methods (Doerfler et al., 2014). Separation approaches include gas chromatography (GC), which is used to determine primary metabolites (e.g., sugars and amino acids) (Weckwerth, 2011); ultra-performance liquid chromatography; capillary electrophoresis (CE), which is used to determine ionic metabolites (Doerfler et al., 2014); and liquid chromatography, which is used to determine secondary metabolites (Weckwerth, 2011). Gas chromatography-mass spectrometry (GC-MS) has been used to determine plant metabolites; electron impact (EI) ionization provides a robust interface between GC and MS and produces highly reproducible fragmentation patterns. Some important methods and applications are described below. Metabolomic profiling has seen limited application in legumes, but Ramalingam et al. (2015) used a quantitative MS method to determine metabolomic diversity among symbionts in Medicago. Another study of salinity and drought stress in Medicago analyzed metabolites in relation to priming, symbiotic interaction (nodulation), and nitrogen fertilization (Staudinger et al., 2012).
Sanchez et al. (2011) used GC with electron impact ionization to profile the soluble metabolites of Lotus species that can grow in highly saline coastal areas and compared them with cultivated glycophytic forage species. The comparative analysis revealed conserved stress-responsive metabolites in Lotus, and the extremophile Lotus species accumulated higher salt levels, with a differential reorganization of shoot nutrients, when exposed to salinity. Abiotic stress affects plants in many different ways, and each plant responds differently and produces different metabolites under stress. A metabolomic analysis of flooding stress in soybean roots and hypocotyls using capillary electrophoresis-mass spectrometry (CE-MS) recorded about 81 mitochondria-related metabolites; tricarboxylic acid (TCA) cycle metabolites, amino acids, NADH, and pyruvate increased, while ATP decreased (Komatsu et al., 2011). However, further identification and detailed analysis of these metabolites are required to understand the response under stress conditions. Metabolite profiling was also used to determine the response of common bean roots and nodules to phosphorus-deficient soil (Hernández et al., 2009). Levels of amino acids and several sugars increased in the roots under phosphorus deficiency. In another study, organic acid content decreased in the root zone under phosphorus deficiency, reflecting exudation of these compounds from the root into the rhizosphere (Hernández et al., 2007). Dias et al. (2015) reported 49 primary metabolites involved in the salt stress responses of different chickpea genotypes. Integrated approaches that combine metabolomic and transcriptomic techniques are valuable for revealing the metabolic reprogramming of root border cells in specific pathways.
Such integrated approaches can identify phytohormone levels as well as auxin- and lipoxygenase-related transcripts in root border cells and root tips. Similarly, an integrated metabolomic and proteomic approach was used to elucidate the expression and regulation of metabolites under drought conditions in soybean (Komatsu et al., 2011). The metabolite analysis revealed extensive metabolic reprogramming under drought stress, as well as the capacity of nodules to recover after re-watering.
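The reproducibility of EI fragmentation patterns noted above is what makes GC-MS spectral library matching possible. The sketch below ranks library spectra by cosine (dot-product) similarity to a query spectrum; the library entries are invented placeholders rather than real compound spectra, and production search tools commonly add m/z and intensity weighting and retention-index filters.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two centroided spectra given as {nominal m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library):
    """Return (name, score) of the library spectrum most similar to the query."""
    return max(((name, cosine_similarity(query, spec)) for name, spec in library.items()),
               key=lambda item: item[1])


# Hypothetical unit-resolution EI spectra (placeholders, not real compounds).
library = {
    "compound_A": {73: 100, 147: 40, 217: 25},
    "compound_B": {57: 100, 71: 60, 85: 30},
}
query = {73: 95, 147: 45, 217: 20, 75: 5}

name, score = best_library_match(query, library)
print(f"best match: {name} (cosine = {score:.3f})")
```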

Mutation breeding in legumes

Induced mutagenesis has been used to develop varieties in legume crops, as indicated in Table 4, which shows data on released varieties from the Mutant Variety Database (MVD) of the IAEA (IAEA, 2018). A mutant variety may originate from a direct, indirect, or spontaneous mutation. Direct mutations can be generated using physical or chemical mutagens, or a combination of mutagens. A spontaneous mutation occurs naturally in the field. The first mutant variety was released in pea in 1954, and releases have continued to the present. However, in some crops, such as pea, pigeon pea, and faba bean, the development of mutant varieties through induced mutagenesis appears to have stagnated. Soybean has the highest number of released mutants, followed by groundnut and common bean, with 174, 76, and 57 mutant varieties, respectively. However, the number of released mutant varieties in legume crops is lower than that in cereal crops, such as wheat (282 mutant varieties), barley (312), and rice (828) (IAEA, 2018). Table 4 also shows that physical mutagens contributed to most of the mutant varieties compared with chemical mutagens. Among the physical mutagens, gamma radiation is the most widely used in mutation breeding (Kodyma et al., 2011). Gamma radiation has been reported as an effective mutagen for inducing diversity in several legume crops, such as pigeon pea (Desai and Rao, 2014), cowpea (Girija et al., 2013), chickpea (Wani and Anis, 2008), and mung bean (Sangsiri et al., 2005). Induced mutagenesis has been used not only to develop varieties but also to study genes related to particular traits through forward or reverse genetics. Table 5 shows several applications of induced mutagenesis for isolating genes and identifying gene function in legume crops. The mutant characteristics studied range from morphological differences to differences in seed composition and secondary metabolites.
The model legume Medicago truncatula has been the most widely used legume for studying gene function because of its small genome (~450 Mb), rapid life cycle, and abundant collections of mutants and ecotypes (Tang et al., 2014).

Table 4

Released mutants by induced mutagenesis in major legume crops (IAEA, 2018).

No	Legume name	Scientific name	No. of released mutants	Year of release (oldest)	Year of release (newest)	Mutagenic agent: P	C	NN	Co	S
1	Soybean	Glycine max	174	1962	2017	122	13	36	1	2
2	Groundnut	Arachis hypogaea	76	1971	2014	64	9	3	0	0
3	Common bean	Phaseolus vulgaris	57	1962	2007	26	22	9	0	0
4	Pea	Pisum sativum	34	1954	1995	20	10	3	0	1
5	Chickpea	Cicer arietinum	27	1981	2016	23	1	2	1	0
6	Faba bean	Vicia faba	20	1983	2008	12	6	2	0	0
7	Lentil	Lens culinaris	18	1981	2017	14	3	1	0	0
8	Cowpea	Vigna unguiculata	13	1981	2017	7	4	2	0	0
9	Pigeonpea	Cajanus cajan	7	1977	2009	4	1	2	0	0

P: physical, C: chemical, NN: data not provided and/or indirect mutation, Co: combined treatment, S: spontaneous mutation.
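The statement that physical mutagens account for most released legume mutants can be checked directly against Table 4. The short script below transcribes the table's counts, verifies that each row's mutagen categories add up to its total, and computes the share of physical mutagens; it is only arithmetic on the published figures.

```python
# Rows of Table 4: (crop, total released, P, C, NN, Co, S)
TABLE_4 = [
    ("Soybean", 174, 122, 13, 36, 1, 2),
    ("Groundnut", 76, 64, 9, 3, 0, 0),
    ("Common bean", 57, 26, 22, 9, 0, 0),
    ("Pea", 34, 20, 10, 3, 0, 1),
    ("Chickpea", 27, 23, 1, 2, 1, 0),
    ("Faba bean", 20, 12, 6, 2, 0, 0),
    ("Lentil", 18, 14, 3, 1, 0, 0),
    ("Cowpea", 13, 7, 4, 2, 0, 0),
    ("Pigeonpea", 7, 4, 1, 2, 0, 0),
]
CATEGORIES = ("P", "C", "NN", "Co", "S")

# Each row's category counts should add up to its total.
for crop, total, *counts in TABLE_4:
    assert sum(counts) == total, f"{crop}: categories do not sum to total"

totals = {cat: sum(row[2 + i] for row in TABLE_4) for i, cat in enumerate(CATEGORIES)}
grand_total = sum(row[1] for row in TABLE_4)

print(grand_total, totals)  # 426 {'P': 292, 'C': 69, 'NN': 60, 'Co': 2, 'S': 3}
print(f"physical mutagens: {totals['P'] / grand_total:.1%} of released mutants")  # 68.5%
```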

Table 5

Gene isolation and identification of several legume crops through forward and reverse genetics using induced mutant plants.

Name of crop	Scientific name	Mutant phenotype	Mutagenic agent	Name of gene	Gene function	Reference
Soybean	Glycine max L.	Golden yellow trifoliate leaves, pods, and cotyledons as well as reduced plant height	EMS	Golden Yellow Leaves (GLY)	Chlorophyll biosynthesis	Li et al. (2017)
		Reduced plant height and shortened internodes	EMS	Glycine max Dwarf (GmDW1)	Gibberellin (GA) biosynthesis-deficient	Li et al. (2017)
		High seed oleic acid content	EMS	Fatty Acid Desaturase 2 (FAD2)	Conversion of oleic acid to linoleic acid	Lakhssassi et al. (2017)
		Early flowering	Gamma rays	Glyma10g36600 (GI), Glyma02g33040 (AGL18), Glyma17g11040 (TOC1), and Glyma14g10530 (ELF3)	Affecting the expression of flowering promoter Glycine max Flowering Locus T 2a (GmFT2a)	Lee et al. (2016)
Pea	Pisum sativum	Novel leaf morphology (stipule size)	Fast neutron	Stipule reduce (St)	Regulating cell division and cell expansion in the stipule	Moreau et al. (2018)
		Non-flowering phenotype	X-rays	VEGETATIVE1 (VEG1)	Regulating secondary inflorescence	Berbel et al. (2012)
		Determinate growth habit	Gamma rays	Pisum sativum Terminal Flower 1a (PsTFL1a) or DETERMINATE (DET)	Maintain the indeterminacy of the apical meristem during flowering	Foucher et al. (2003)
		Early flowering	Gamma rays	Pisum sativum Terminal Flower 1c (PsTFL1c) or LATE FLOWERING (LF)	Repressor of flowering	Foucher et al. (2003)
Cowpea	Vigna unguiculata L.	Determinate growth habit	Gamma rays	Vigna unguiculata Terminal Flower 1 (VuTFL1)	Plant determinacy	Dhanasekar and Reddy (2015)
Barrel medic	Medicago truncatula	Pentafoliate leaf morphology, development of rachis structure, and alteration of petiole and rachis length	Fast neutron	Palmate-like pentafoliata (PALM1)	Encoding Cys(2)His(2) zinc finger transcription factor and maintaining trifoliate leaves morphology	Chen et al. (2010)
		Mutants unable to establish a symbiotic association with endomycorrhizal fungi	EMS and gamma rays	Doesn't Make Infections 1, 2, 3 (DMI1, DMI2, DMI3) and Nodulation Signaling Pathway (NSP)	Regulating rhizobium nodulation (Nod) factor transduction pathway	Catoira et al. (2000)
		Leaflets unable to fold in the dark	Fast neutron	Elongated Petiolule1 (ELP1)	Encoding a putative plant-specific LATERAL ORGAN BOUNDARIES (LOB) domain transcription factor and development of the motor organ	Chen et al. (2012)
		Different composition of hemolytic saponins	EMS	Cytochrome 72A67 (CYP72A67)	Catalyzes hydroxylation (oxidation) at the C-2 position downstream of oleanolic acid in the hemolytic sapogenin pathway	Biazzi et al. (2015)


Next-generation genotyping and sequencing technology

NGS and next-generation genotyping are powerful and cost-effective techniques that facilitate the identification of important gene structures. Next-generation genotyping also provides insight into the underlying mechanisms of gene expression and metabolism. It also facilitates the characterization of genomic resources and studies of development, evolution, and MAB, even in the absence of sequence data. NGS also accelerates transcriptome sequencing, de novo assembly, and the development of sequence-based marker resources for crop improvement, which are less costly than phenotyping (Unamba et al., 2015). Genome sequences provide information on the genes that control growth, development, and adaptation to the environment. Genome sequencing has been accomplished for five legume crops: chickpea (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c), pigeon pea (Varshney et al., 2012), Lotus japonicus (trefoil) (Sato et al., 2008), soybean (Schmutz et al., 2010), and Medicago (Young et al., 2011). Large-scale draft genome sequencing of cultivars with unique, important traits has made it possible to identify structural diversity. Genome-based sequencing is currently popular because it allows detailed analysis of genetic traits in plants and other organisms. Genotyping by sequencing (GBS) has been used for mapping, purity testing, marker-trait association, marker-assisted selection (MAS), and genomic selection (Varshney et al., 2014). Genotyping approaches include targeted amplicon sequencing, hybridization, and reduced-representation sequencing. The choice of genotyping technology depends on many factors, such as the nature of the project, genome size, and funding. For MAB, low-cost competitive allele-specific PCR (KASP) markers are used for diversity analysis and SNP identification in larger populations (Thompson, 2014).
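In GBS data, each SNP in each line is observed at a variable and often low read depth, so genotypes must be called from allele read counts. The function below is a deliberately simple, threshold-based caller for a biallelic SNP in a diploid; the depth and allele-fraction thresholds are assumptions, and production GBS pipelines generally use likelihood-based models instead.

```python
def call_genotype(ref_reads, alt_reads, min_depth=6, het_low=0.2, het_high=0.8):
    """Threshold-based genotype call at one biallelic SNP from read counts.

    Returns "AA" (reference homozygote), "AB" (heterozygote),
    "BB" (alternative homozygote), or None when depth is too low to call.
    All thresholds are illustrative assumptions.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "AA"
    if alt_fraction > het_high:
        return "BB"
    return "AB"


# Hypothetical (reference, alternative) read counts for four lines at one SNP.
for line, (ref, alt) in {"L1": (10, 0), "L2": (5, 6), "L3": (0, 12), "L4": (2, 1)}.items():
    print(line, call_genotype(ref, alt))
```

Low-depth sites are left as missing (None) rather than forced into a call; downstream tools typically impute such gaps.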
Targeted sequencing based on amplicon technology has addressed many questions about diversity and specific gene functions in population genetics (Naj et al., 2011). In addition, cheap and fast PCR-based library construction is useful for identifying genes, developing target amplicons, and inferring phylogenetic relationships (Bybee et al., 2011). However, this technique is restricted to a small number of loci and is not suitable for mapping complex traits. Moreover, genome-wide SNP discovery based on whole-genome sequencing has lagged behind in many genomes, such as groundnut and soybean. An approach called complexity reduction of polymorphic sequences (CRoPS), based on DNA restriction digestion (methylation), reduces the genomic complexity of two or more genetically diverse samples prepared by amplified fragment length polymorphism (AFLP) (Van Orsouw et al., 2007). The high conversion rates of such genotyping assays make them attractive for medium- to large-scale genotyping, especially for detecting single-copy genomic regions or small numbers of SNPs. Peterson et al. (2012) developed double-digest restriction-site associated DNA sequencing (ddRAD). Its distinguishing feature is that it digests DNA with two restriction enzymes and uses size selection to recover a reproducible subset of regions distributed randomly across the genome, which makes it possible to multiplex hundreds of samples. Genotyping by sequencing has an advantage over RAD sequencing because it performs SNP discovery and genotyping simultaneously (Poland et al., 2012). Exome sequencing is also becoming important because it enables the discovery of low-frequency and rare coding variants, which can be analysed for association with complex traits relevant to crop improvement (Kiezun et al., 2012). It can also accommodate 100,000 more markers than the Illumina system (Unterseer et al., 2014).
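To make the double-digest idea concrete, the following Python sketch performs an in-silico ddRAD digest of a single sequence: it finds the cut sites of two enzymes, keeps only fragments with one end cut by each enzyme, and applies a size-selection window. The enzyme pair (EcoRI, G^AATTC, and MspI, C^CGG) and the window are illustrative assumptions, not the settings used by Peterson et al. (2012); the sketch also ignores sticky-end overhangs and uses only top-strand coordinates (both example sites are palindromic).

```python
import re

# Illustrative enzyme table: recognition site and cut offset within the site.
# EcoRI cuts G^AATTC and MspI cuts C^CGG on the top strand; both are palindromic.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "MspI": ("CCGG", 1),
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates for every (possibly overlapping) site occurrence."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ddrad_fragments(seq: str, enz_a: str, enz_b: str,
                    min_len: int, max_len: int) -> list[tuple[int, int, int, str]]:
    """In-silico double digest followed by size selection.

    Returns (start, end, length, ends) for fragments that have one end cut by
    each enzyme and whose length lies within [min_len, max_len].
    """
    seq = seq.upper()
    cuts = sorted(
        [(p, enz_a) for p in cut_positions(seq, *ENZYMES[enz_a])]
        + [(p, enz_b) for p in cut_positions(seq, *ENZYMES[enz_b])]
    )
    fragments = []
    # Adjacent cut sites delimit the fragments produced by a complete digest.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        length = end - start
        if left != right and min_len <= length <= max_len:
            fragments.append((start, end, length, f"{left}-{right}"))
    return fragments


if __name__ == "__main__":
    demo = "AAGAATTC" + "A" * 50 + "CCGG" + "T" * 10 + "GAATTC" + "TT"
    print(ddrad_fragments(demo, "EcoRI", "MspI", min_len=20, max_len=100))
```

On the toy sequence this prints `[(3, 59, 56, 'EcoRI-MspI')]`: the 14 bp MspI-EcoRI fragment is discarded by the size window. Discarding fragments with the same enzyme at both ends mirrors the usual ddRAD adapter design, in which only mixed-end fragments are expected to carry both adapters.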
Because of the higher efficiency of exome array technology, it is a suitable approach for genomics-based breeding. The generated data are used to determine genomic estimated breeding values and for training populations. SNP arrays have been used successfully in rice (Kumar et al., 2015) and maize (Unterseer et al., 2014). Affymetrix arrays with 60 K SNPs were first used in three legumes, i.e., groundnut, chickpea, and pigeon pea, for trait mapping and molecular breeding (Varshney, 2016). Sequencing and re-sequencing of plant genomes have enabled researchers to dissect and understand genetic processes and classification. Plant genomes have been sequenced according to research priorities (Michael and Jackson, 2013). The first plant genome was sequenced in Arabidopsis (The Arabidopsis Genome Initiative, 2000), followed by the indica (Yu et al., 2002) and japonica (International, 2005) rice genomes, both sequenced with the Sanger method. Later, with advances in sequencing, NGS-based whole-genome sequencing approaches became essential for reducing cost and time (Schwarze et al., 2018). NGS made it easier for researchers to decode complex genome sequences, and more than 100 plant genomes have been sequenced (Varshney et al., 2009). However, chickpea, groundnut, and pigeon pea were considered orphan crops because of their limited genetic resources (Varshney et al., 2012), and it was the advent of NGS that made sequencing these crops possible (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c). Illumina technology, together with Sanger-based BAC-end sequencing, was used to assemble the genome of pigeon pea (ICPL 87119), generating about 237.2 Gb of paired-end reads. The assembly of Illumina and Sanger data, approximately 606 Mb, covers 72% of the pigeon pea genome.
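The pigeon pea figures above can be cross-checked with simple arithmetic. If a 606 Mb assembly represents 72% of the genome, the implied genome size is 606 / 0.72, or roughly 842 Mb, and 237.2 Gb of reads spread over that genome corresponds to an average raw depth of roughly 280X. The Python sketch below encodes these two relations; it uses only the rounded values quoted in the text, treats every sequenced base as usable, and is not the calculation reported by Varshney et al. (2012).

```python
def implied_genome_size_mb(assembly_mb: float, fraction_of_genome: float) -> float:
    """Genome size implied by an assembly that covers a given fraction of it."""
    return assembly_mb / fraction_of_genome


def mean_depth(raw_gb: float, genome_mb: float) -> float:
    """Average raw sequencing depth (X) = total sequenced bases / genome size."""
    return raw_gb * 1000 / genome_mb


if __name__ == "__main__":
    # Rounded pigeon pea (ICPL 87119) figures quoted in the text.
    genome_mb = implied_genome_size_mb(606, 0.72)
    depth = mean_depth(237.2, genome_mb)
    print(f"implied genome size: {genome_mb:.1f} Mb")  # 841.7 Mb
    print(f"mean raw depth: {depth:.1f}X")             # 281.8X
```

Read filtering, adapter removal, and duplicate removal all reduce the effective depth, so a value computed this way is best read as an upper bound.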
De novo gene prediction and annotation identified about 48 K genes in the de novo assembled pigeon pea genome (Varshney et al., 2012). Another pigeon pea sequencing effort, based on long reads from GS-FLX pyrosequencing, assembled 548 Mb and identified functionally annotated genes (Singh et al., 2016). There are two types of chickpea, Kabuli and Desi, which can be differentiated by seed color, size, surface morphology, and flower color. The two types are geographically distinct and differ in nutrition, adaptation, and tolerance to biotic and abiotic stresses (Purushothaman et al., 2014). The NGS approach has also been applied to a Kabuli chickpea variety, for which approximately 153 Gb of sequence data were generated by Illumina sequencing. These data contain 87.56 Gb of sequence information, representing 74% of the chickpea genome (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c). A combination of different approaches functionally annotated over 90% of the chickpea genome. RAD and whole-genome re-sequencing approaches were used to collect data from 90 chickpea accessions (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c). For the Desi chickpea accession ICC 4958, an updated version of the chickpea genome with a 2.7-fold increase in pseudomolecules was produced (Parween et al., 2015). It contains more than 30 K protein-coding genes, in comparison with the previous chickpea genome. A chromosome sequencing approach was later used to detect misassembled regions at the chromosome level, and these data helped validate the Desi and Kabuli draft genome assemblies at the chromosomal level (Ruperao et al., 2014). Bertioli et al. (2015) initiated the sequencing of the groundnut progenitors. These data included the A genome (V14167) and the B genome (K30076), which together constitute the tetraploid genome of cultivated groundnut.
Data were collected from plant tissues at different growth stages; about 155.5 Mb (V14167) was assembled at about 154.4X genome coverage, and 175.6 Mb (K30076) at about 120.6X coverage. The available data provide access to 97% of groundnut genes and should advance understanding of the stress resilience and adaptability of groundnut cultivars. A parallel study documented the genetic resources and allelic diversity for agronomic traits available in gene banks. Based on the available information, the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) initiated the re-sequencing of legume germplasm to identify unique combinations of genes. In the re-sequencing analysis, approximately 292 pigeon pea accessions were used to identify 13.8 million Kb of diverse sequences, with an average of 41.1/Kb (Varshney et al., 2017). A detailed re-sequencing analysis revealed genomic regions and genetic diversity across the pigeon pea genome that are useful for studying selection and domestication. Most importantly, genomic information combined with morphological data can be used to identify markers and important QTLs linked with traits. A recent study re-sequenced 429 chickpea accessions collected from 45 countries and identified about 122 candidate regions and 204 genes under selection. It also suggested that the primary center was the eastern Mediterranean, from which chickpea moved through the Mediterranean/Fertile Crescent to Central Asia, then from Central Asia to East Asia, and finally to India. About 262 markers and many candidate genes for thirteen traits were identified through genome-wide association (GWA) analysis (Varshney et al., 2019).

Transcriptomics, gene discovery, and marker development

Expanding the available genetic diversity is important for molecular breeding and crop improvement programs. Because genetic resources are limited, there is a pressing need to discover more transcripts and genes and to develop markers for specific traits to improve legume breeding programs. Besides supporting molecular marker development, EST generation provides a quick and easy platform for discovering new genes, which is important for functional genomics and for understanding the molecular mechanisms underlying sustainable agricultural production. NGS provides deep insight through transcriptome sequencing quickly and economically (Morozova et al., 2009). Genome sequencing results, gene discovery, and EST generation for legume genomes are presented in Table 3. In a study to identify drought-responsive genes and develop gene-based molecular markers in chickpea, approximately 435,018 reads and 21,491 ESTs were produced. Moreover, comparison of the chickpea sequences with the Medicago genome assembly predicted 42,141 aligned tentative unique sequences (TUSs) with putative gene structures. The TUSs were also used to develop different markers, including SSRs (728), SNPs (495), COS (387), and ISR (2088) markers (Hiremath et al., 2011). Similarly, in another study, 2 million sequence reads with an average length of 372 bp were generated in chickpea using pyrosequencing technology. The de novo assembly showed that a hybrid assembly of long and short reads gave good results. Approximately 34,760 transcripts were generated, with an average length of 1020 bp, representing about 4.8% of the chickpea genome (Garg et al., 2011). Varshney et al. (2009) reported 20,162 ESTs and 48,796 bacterial artificial chromosomes (BACs). Nayak et al. (2010) reported approximately 2000 SSR markers.
Moreover, 80,238 sequence tags were generated for chickpea from whole-genome sequence profiling (Molina et al., 2008).
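SSR (microsatellite) markers like those mined from the chickpea TUSs are found by scanning sequences for short motifs repeated in tandem. The Python sketch below finds perfect SSRs with a regular expression and expresses SSR density as one SSR per N kb. The per-motif minimum copy numbers are assumptions of the kind often used by SSR-mining tools such as MISA, not the settings of the cited studies. As a consistency check, applying the density function to the Pongamia figures discussed in this section (53,586 unigenes, mean length 787 bp, 5710 EST-SSRs) gives about 7.39 kb per SSR, matching the reported density.

```python
import re

# Minimum number of tandem copies required per motif length (1-6 bp).
# Illustrative thresholds only; real SSR-mining tools make these configurable.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_unit(motif: str) -> bool:
    """True if motif is itself a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_copies: dict[int, int] = MIN_COPIES) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect SSRs, sorted by start position."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_copies.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_unit(motif):
                continue  # a shorter unit (e.g. 'AT' for 'ATAT') describes this repeat
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def kb_per_ssr(total_bp: int, n_ssrs: int) -> float:
    """SSR density expressed as one SSR per N kb of sequence."""
    return total_bp / 1000 / n_ssrs


if __name__ == "__main__":
    demo = "GG" + "AT" * 8 + "CCC" + "CAG" * 6 + "TT"
    print(find_ssrs(demo))  # [(2, 'AT', 8), (21, 'CAG', 6)]
    print(round(kb_per_ssr(53_586 * 787, 5710), 2))  # 7.39
```

Only perfect repeats are reported; compound and imperfect SSRs, which marker pipelines often also record, would need additional logic.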

Table 3

Genome sequencing results for selected legumes: number of genes discovered and ESTs generated.

Legume	No. of genes*	No. of ESTs**	Reference
Cajanus cajan (pigeon pea)	48,680	25,640	Varshney et al. (2012)
Cicer arietinum (chickpea)	28,269	46,064	Varshney et al. (2013a, 2013b, 2013c)
Lotus japonicus (trefoil)	46,430	1,530,030	Schmutz et al. (2010)
Medicago truncatula (barrel medic)	47,845	286,175	Young et al. (2011)

*As of January 2014.

**According to the Dana Farber Gene Indices (compbio.dfci.harvard.edu/tgi/plant.html).

Similarly, Vatanparast et al. (2016) analyzed transcriptomes of Sri Lankan winged bean and reported that de novo assembly of approximately 804,757 reads produced 16,115 contigs covering more than 90% of the significant sequence data. In addition, 97,241 singleton transcripts were obtained from the data set, and 12,956 SSRs were identified, including 2594 repeats suitable for primer design, along with 5190 SNPs among winged bean genotypes. A study on Pongamia (Millettia pinnata), an important medicinal legume species with industrial applications, characterized the seed transcriptome using Illumina sequencing technology. Approximately 83 billion reads yielded 53,586 assembled unigenes with an average length of 787 bp. Of these unigenes, 73.90% and 44.93% showed similarity to proteins in the NCBI and Swiss-Prot databases, respectively. Furthermore, 364 unigenes were implicated in oil biosynthesis and accumulation and are potential candidates for future functional genomics studies. About 5710 EST-SSRs, at a density of one per 7.39 kb of sequence, were identified (Huang et al., 2016). Libault and Stacey (2010) reported a PLAC8 transcript that is associated with symbiotic N2 fixation and highly expressed in nodule tissues; it contains a cysteine-rich domain involved in regulating cell number. A highly expressed cytochrome P450 transcript was also observed, and its annotation suggested involvement in biosynthetic reactions that generate many biomolecules. In L. japonicus, the remorin 1 gene is present in infected nodule cells, and its overexpression increases nodule number (Tooth et al., 2012).

Re-sequencing of two soybean species (Glycine max and Glycine soja) identified 425 genes that were absent in G. soja, along with 12 genes involved in seed development and two genes involved in lipid metabolism (Joshi et al., 2013). Lam et al. (2010) later confirmed by sequencing and phylogenetic analysis that the two share a common ancestor. A re-sequencing analysis of 90 chickpea genotypes identified 122 genes that appear to be important and have been used in modern breeding efforts. Six genomic regions ranging from 50 to 200 kb were also found. Many of these genes are linked with disease resistance and are important for selection (Varshney et al., 2013a, Varshney et al., 2013b, Varshney et al., 2013c). Re-sequencing of stable mutant genotypes helps map induced mutations so that a phenotype can be linked with its causal mutation, providing a basis for forward genetics as a genomic tool for determining gene function. Belfield et al. (2012) did similar work in Arabidopsis, identifying fast-neutron-induced deletions and single base pair substitutions that had been difficult to detect with older chip-based methods. Applying this technology, Leshchiner et al. (2012) identified approximately 2700 indels in coding regions that could affect mutant phenotypes, in addition to 17,000 SNPs. Moreover, 0.1 kb deletions were also identified in the genomic regions of three mutant lines. The effectiveness of the technology has advanced various ongoing projects and helped account for deletions in mutant phenotypes.
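The background-subtraction step at the heart of mutant re-sequencing can be sketched as follows: variants that the mutant shares with its non-mutagenized parent are discarded, and the remaining candidates are classified as SNPs or indels. The `Variant` record and the exact-match comparison below are simplifying assumptions made for illustration; real pipelines start from aligned reads and variant calls and also filter by call quality, zygosity, and the mutation spectrum expected from the mutagen.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A minimal VCF-like variant record (hypothetical; no file parsing here)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_type(v: Variant) -> str:
    """Classify a variant by comparing reference and alternate allele lengths."""
    if len(v.ref) == len(v.alt) == 1:
        return "SNP"
    if len(v.ref) > len(v.alt):
        return "deletion"
    if len(v.ref) < len(v.alt):
        return "insertion"
    return "multi-base substitution"


def mutant_specific(mutant: list[Variant], background: list[Variant]) -> list[tuple[Variant, str]]:
    """Keep variants seen in the mutant but not in the parental background."""
    seen = set(background)
    return [(v, variant_type(v)) for v in mutant if v not in seen]


if __name__ == "__main__":
    parent = [Variant("chr1", 100, "A", "G")]
    mutant = parent + [Variant("chr1", 250, "C", "T"), Variant("chr2", 40, "ATTG", "A")]
    for variant, kind in mutant_specific(mutant, parent):
        print(kind, variant)
```

Keeping the comparison set-based makes the subtraction linear in the number of calls, which matters when the background contains millions of shared variants.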

RNA sequencing in legumes

RNA sequencing is an advanced method for profiling the transcriptome. RNA-Seq provides information for gene characterization, functional genomics studies, and gene expression analysis, especially when little information is available for the studied genome. NGS applications, i.e., genome sequencing, RNA sequencing (transcriptome sequencing), and DNA sequencing (re-sequencing), provide deep insight for legume development and improvement programs through studies of genome evolution, architecture, and domestication (O’Rourke et al., 2014). RNA-Seq is a cost-effective quantification method with high reproducibility, high accuracy, and a wide dynamic range. Modern plant omics techniques such as NGS have replaced chip-based approaches for genomic studies but place an extra burden on bioinformatics, data analysis, computing speed, and storage. Moreover, comprehensive transcript profiling of seeds and contrasting phenotypes can reveal genes of interest and functional groups that control important agronomic traits, while also providing a useful platform from which a gene map can be assembled for the species. RNA-Seq has been used for genome-wide transcriptome profiling across different crops, such as rice (Davidson et al., 2012), chickpea (Mashaki et al., 2018), maize (Kakumanu et al., 2012), and Raphanus sativus (Arun-Chinnappa and McCurdy, 2015). RNA-Seq (quantification) technology is used to determine gene expression profiles under specific conditions (Kaur et al., 2014). RNA-Seq can also be applied to drug response studies, biomarker detection, basic medical research, gene expression analysis, differential gene expression analysis, and gene ontology classification. Differential genome expansions within the genus Vicia are attributable to the amplification of large retroelement sequences (Neumann et al., 2006).
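Quantifying expression from RNA-Seq usually involves normalizing read counts for transcript length and sequencing depth before genes or samples are compared. As one illustration, the Python sketch below computes transcripts per million (TPM). The studies cited here may have used other measures, such as FPKM or model-based normalization, and the gene names, counts, and lengths in the example are hypothetical.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million: length-normalize counts, then scale to sum to 1e6."""
    # Reads per kilobase of transcript (corrects for transcript length).
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    # Scale so that the values within one sample sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 200, "geneC": 0}   # hypothetical read counts
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
    print(tpm(counts, lengths))
```

Here geneB has twice the reads of geneA but is twice as long, so both receive 500,000 TPM. Because TPM values sum to the same total in every sample, they describe relative transcript proportions within a sample.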
Soybean genome annotation predicted 46,430 high-confidence genes and 19,780 lower-confidence genes, and 90.4% of the gene models were transcriptionally active (Schmutz et al., 2010). Hierarchical clustering of transcripts across tissues and developmental stages separated three groups: aerial tissues, underground tissues, and seed tissues (Severin et al., 2010a, Severin et al., 2010b). High-throughput next-generation RNA-Seq offers considerable advantages for transcriptomics studies (Garg and Jain, 2013), aided by the availability of software for assembling the sequence data (O’Rourke et al., 2013, O’Rourke et al., 2013b). RNA-Seq data have been collected for most legume crops, but mainly for marker development in Medicago sativa (Han et al., 2011), chickpea (Garg et al., 2011), peanut (Zhang et al., 2012a, Zhang et al., 2012b), lentil (Kaur et al., 2011), and common bean (Kalavacharia et al., 2011). RNA-Seq studies of development, stress response, composition, and nitrogen fixation have been limited to soybean (Atwood, 2014), lupin (O’Rourke et al., 2013, O’Rourke et al., 2013b), alfalfa (Yang et al., 2011), and M. truncatula (Boscari et al., 2013). RNA-Seq has also been used to study microRNAs in legumes, including common bean (Perez et al., 2012), soybean (Turner et al., 2012), and M. truncatula (Simon et al., 2009). One study from our lab used RNA-Seq to examine root transcriptomes of the faba bean genotype Hassawi 2 and identify differentially expressed genes (DEGs) under drought stress, with samples collected at two stages (vegetative and flowering).
Data analysis yielded about 624.8 M high-quality reads and 198,155 unigenes with a mean length of 738 bp. More DEGs were downregulated than upregulated at both the vegetative and flowering (reproductive) stages: 15,366 genes were upregulated and 20,144 downregulated at the flowering stage, while 14,097 genes were upregulated and 22,737 downregulated at the vegetative stage. Drought-responsive genes encoding regulatory proteins (kinases, phosphatases), functional proteins including enzymes for osmoprotectant synthesis, detoxification, and transport, transcription factors (TFs), and plant hormone-related proteins were expressed and upregulated. The identified DEGs were novel and showed substantial changes in expression under drought stress. qRT-PCR results were consistent with the sequencing data, which could help elucidate functional genomics and tolerance mechanisms in crop plants (Alghamdi et al., 2018).
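Counts of up- and downregulated genes such as those above come from applying significance and fold-change cut-offs to the output of a differential expression test. The thresholds used by Alghamdi et al. (2018) are not stated here, so the Python sketch below assumes common defaults: a Benjamini-Hochberg adjusted p-value below 0.05 and an absolute log2 fold change of at least 1. It performs only this final filtering step; the fold changes and raw p-values would come from a dedicated tool, and the example values are hypothetical.

```python
def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def split_degs(genes: list[str], log2fc: list[float], pvalues: list[float],
               lfc_cut: float = 1.0, fdr_cut: float = 0.05) -> tuple[list[str], list[str]]:
    """Return (upregulated, downregulated) genes under the given cut-offs."""
    padj = bh_adjust(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q >= fdr_cut:
            continue  # not significant after multiple-testing correction
        if lfc >= lfc_cut:
            up.append(gene)
        elif lfc <= -lfc_cut:
            down.append(gene)
    return up, down


if __name__ == "__main__":
    up, down = split_degs(["a", "b", "c", "d"],
                          [2.0, -1.5, 3.0, -2.0],
                          [0.001, 0.002, 0.8, 0.003])
    print(len(up), "up;", len(down), "down")
```

Gene "c" has the largest fold change but is not counted, because its adjusted p-value is far above the cut-off; both criteria must hold for a gene to be called up- or downregulated.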

Computational resources for RNA-Seq transcriptome analysis

Various studies have produced transcriptome data for a few legumes, such as Medicago, soybean, chickpea, and Lotus (Garg and Jain, 2013). Because of the importance of transcriptome analyses, research groups have curated these data and made them available to help researchers explore and evaluate the transcriptional activity of different genes at specific developmental stages (Table 6). These resources also enable researchers to characterize and annotate genes and define their functions in legumes. NGS has accelerated the characterization and quantification of transcriptomes, which has in turn driven the development of advanced bioinformatics software. Transcriptome analysis is now regarded as a crucial element of research in any organism (Yang and Kim, 2015). In this review, we introduce the software routinely used in an RNA-Seq pipeline, from experimental design through quality control of sequence reads, alignment, annotation, and quantitative expression analysis (Table 7). Table 7 describes the stage at which each package is used and links to its online source.
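Several of the quality-control tools listed in Table 7, such as cutadapt and FASTX-Toolkit, trim low-quality bases from read ends. As a sketch of what such trimming does, the Python function below implements the running-sum 3' trimming algorithm that cutadapt's documentation describes for its quality-cutoff option (an approach it attributes to BWA). It assumes Phred+33 encoded quality strings and is a simplified illustration, not cutadapt's own code.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq: str, qual: str, cutoff: int = 20, offset: int = 33) -> tuple[str, str]:
    """Trim the 3' end of a read with a BWA-style running sum.

    Walking from the 3' end, accumulate (score - cutoff); stop as soon as the
    sum becomes positive and cut at the position where the sum was lowest.
    """
    scores = phred_scores(qual, offset)
    running, lowest, cut = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += scores[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest, cut = running, i
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```

Unlike trimming at the first base below the cutoff, the running sum tolerates an isolated low-quality base inside an otherwise good stretch, so reads are not shortened unnecessarily.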

Table 6

Legume transcriptome data repository; Web resources.

Web resource	URL	Legume	Type of information
LegumeIP	http://plantgrn.noble.org/LegumeIP/	Medicago, Lotus, Soybean	RNA-sequence data, Microarray data
MtGEA	http://mtgea.noble.org/v3/	Medicago	Microarray data
LjGEA	http://ljgea.noble.org/v2/	Lotus	Microarray data
CTDB	http://www.nipgr.res.in/ctdb.html	Chickpea	RNA-sequence data
SoyPLEX	http://www.plexdb.org/plex.php?database=Soybean	Soybean	Microarray data
SoySeq	http://www.soybase.org/soyseq/	Soybean	RNA-sequence data

Table 7

Bioinformatics software packages and workbenches available for transcriptome data analysis.

Process	Package	Description	URL
Design of RNA-Seq experiment	Scotty	Experimental design for measuring differential gene expression	http://scotty.genetics.utah.edu/
	ssizeRNA	Sample size calculation	https://cran.r-project.org/web/packages/ssizeRNA/index.html
	RNAtor	Calculate optimal parameters for popular tools	https://rdrr.io/bioc/PROPER/
Quality control	fastqp	Quality assessment	https://github.com/mdshw5/fastqp
	FastQC	Quality control	http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
	AfterQC	Automatic read trimming and error removal	https://github.com/OpenGene/AfterQC
	QC-Chain	Quality control
	FASTX-Toolkit	Read trimming	http://hannonlab.cshl.edu/fastx_toolkit/
	NGS QC Toolkit	Quality control	http://www.nipgr.res.in/ngsqctoolkit.html
	PRINSEQ	Generate summary statistics of sequence	http://prinseq.sourceforge.net/manual.html
	SolexaQA	Calculates sequence quality statistics and creates visual representation	http://solexaqa.sourceforge.net/
	mRIN	Calculating mRNA integrity	http://zhanglab.c2b2.columbia.edu/index.php/MRIN
	clean_reads	Cleans NGS reads	http://bioinf.comav.upv.es/clean_reads/
	cutadapt	Removes adapter sequences	https://cutadapt.readthedocs.org/en/stable/
	Deconseq	Remove contamination from sequence data	http://deconseq.sourceforge.net/
	htSeqTools	Quality control, remove over amplification artifacts	http://www.bioconductor.org/packages/2.9/bioc/html/htSeqTools.html
	SEECER	Corrects sequencing errors	http://sb.cs.cmu.edu/
	UCHIME	Detect chimeric sequences	http://drive5.com/uchime
Alignment tools	Bowtie	Short reads aligner	http://bowtie-bio.sourceforge.net/index.shtml
	BWA	Maps low-divergence sequences against a large reference genome	http://bio-bwa.sourceforge.net/
	Mosaik	Aligner for reads containing short gaps	http://bioinformatics.bc.edu/marthlab/Mosaik
	GNUMAP	Needleman-Wunsch algorithm-based aligner	http://dna.cs.byu.edu/gnumap/
	WHAM	HTS alignment	http://research.cs.wisc.edu/wham/
	BBMap	Align reads directly to transcriptome	http://sourceforge.net/projects/bbmap/
	STAR	Detects splice junctions	https://github.com/alexdobin/STAR
Quantitative analysis/ transcriptome reconstruction	Cufflinks	Measure global de novo transcript isoform expression	http://cufflinks.cbcb.umd.edu/
	Scripture	Transcriptome reconstruction	http://www.broadinstitute.org/software/scripture/
	RNAeXpress	Extract and annotate transcripts	http://www.rnaexpress.org/
	SLIDE	Isoform Discovery	https://sites.google.com/site/jingyijli/
	StringTie	Assembles RNA-Seq alignments into potential transcripts	http://www.ccb.jhu.edu/software/stringtie/
	Alexa-Seq	Gene expression analysis	http://www.alexaplatform.org/alexa_seq/downloads.htm
	ERANGE	Normalize and quantify genes	http://woldlab.caltech.edu/rnaseq
	GFOLD	Ranking differentially expressed genes	http://compbio.tongji.edu.cn/~fengjx/GFOLD/gfold.html
	NEUMA	RNA abundance estimator based on mRNA isoforms	http://neuma.kobic.re.kr/
	SpliceSEQ	Investigates alternative mRNA splicing patterns	http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview
miRNA prediction and analysis	miRDeep-P	miRNA analysis	http://faculty.virginia.edu/lilab/miRDP/
	miRPlant	miRNA analysis	https://sourceforge.net/projects/mirplant/
Freely available workbenches	BioJupies		http://biojupies.cloud/
	Galaxy	Analysis pipeline	http://galaxyproject.org/
	RobiNA	Analysis pipeline	http://mapman.gabipd.org/web/guest/robin
	NGSUtils	Analysis pipeline	http://ngsutils.org/
	MeV	Analysis pipeline


Conclusion

As the human population grows, the risk of food shortages also increases. It is necessary to intensify efforts to develop high-yielding lines and cultivars that tolerate various abiotic stresses. Classical breeding techniques can improve legume crop traits in terms of quality, nutrition, and production, but not at the required rate. Cost-effective, improved molecular breeding techniques help scientists better characterize and improve legume crop genomes. To speed up this process, genomic techniques such as NGS and RNA sequencing (transcriptomics) have been developed for gene identification, mapping, and construction of gene maps, which may help improve traits in many crop plants. In this review, we have focused on the classical breeding methods used for trait introgression and on newer, improved techniques, such as proteomics, metabolomics, RNA sequencing, and re-sequencing, for trait improvement, molecular marker development, gene discovery to understand metabolic pathways, and identification of genes of interest (GOIs) in crop-specific genomes to achieve higher gains in a short time. These technologies will contribute to the development and improvement of legume crops and breeding programs, with the advantages of being cost-effective, environmentally friendly, and, most importantly, less time-consuming.

References (116 in total)

1. Peter Langridge; Delphine Fleury. Making the most of 'omics' for crop breeding. Trends Biotechnol, 2010-10-26.

2. Ernst Rietzschel; Marc De Buyzere. High-sensitive C-reactive protein: universal prognostic and causative biomarker in heart disease? Biomark Med, 2012-02.

3. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 2000-12-14.

4. M Cristina Romero-Rodríguez; Jesús Pascual; Luis Valledor; Jesús Jorrín-Novo. Improving the quality of protein identification in non-model species. Characterization of Quercus ilex seed and Pinus radiata needle proteomes by using SEQUEST and custom databases. J Proteomics, 2014-02-04.

5. Adam C Naj; Gyungah Jun; Gary W Beecham; et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease. Nat Genet, 2011-04-03.

6. Nevin D Young; Frédéric Debellé; Giles E D Oldroyd; et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature, 2011-11-16.

7. Arun K Anguraj Vadivel. Gel-based proteomics in plants: time to move on from the tradition. Front Plant Sci, 2015-05-27.

8. A M Pérez-de-Castro; S Vilanova; J Cañizares; L Pascual; J M Blanca; M J Díez; J Prohens; B Picó. Application of genomic tools in plant breeding. Curr Genomics, 2012-05.

9. Manish K Pandey; Manish Roorkiwal; Vikas K Singh; Abirami Ramalingam; Himabindu Kudapa; Mahendar Thudi; Anu Chitikineni; Abhishek Rathore; Rajeev K Varshney. Emerging Genomic Tools for Legume Breeding: Current Status and Future Prospects. Front Plant Sci, 2016-05-02.

10. Jun Hong; Litao Yang; Dabing Zhang; Jianxin Shi. Plant Metabolomics: An Indispensable System Biology Tool for Plant Science. Int J Mol Sci, 2016-06-01.

11 in total

Review 1. Mechanism and application of Sesbania root-nodulating bacteria: an alternative for chemical fertilizers and sustainable development.

Authors: Kuldeep Singh; Rajesh Gera; Ruchi Sharma; Damini Maithani; Dinesh Chandra; Mohammad Amin Bhat; Rishendra Kumar; Pankaj Bhatt
Journal: Arch Microbiol Date: 2021-01-03 Impact factor: 2.552

Review 2. Prospects of next generation sequencing in lentil breeding.

Authors: Jitendra Kumar; Debjyoti Sen Gupta
Journal: Mol Biol Rep Date: 2020-10-10 Impact factor: 2.316

Review 3. Metabolome Profiling: A Breeding Prediction Tool for Legume Performance under Biotic Stress Conditions.

Authors: Penny Makhumbila; Molemi Rauwane; Hangwani Muedi; Sandiswa Figlan
Journal: Plants (Basel) Date: 2022-07-01

4. Exploring biomarkers and transcriptional factors in type 2 diabetes by comprehensive bioinformatics analysis on RNA-Seq and scRNA-Seq data.

Authors: Yalan Huang; Linkun Cai; Xiu Liu; Yongjun Wu; Qin Xiang; Rong Yu
Journal: Ann Transl Med Date: 2022-09

5. Transcriptome profiling of cashew apples (Anacardium occidentale) genotypes reveals specific genes linked to firmness and color during pseudofruit development.

Authors: Thais Andrade Germano; Matheus Finger Ramos de Oliveira; Shahid Aziz; Antonio Edson Rocha Oliveira; Kátia Daniella da Cruz Saraiva; Clesivan Pereira Dos Santos; Carlos Farley Herbster Moura; José Hélio Costa
Journal: Plant Mol Biol Date: 2022-03-25 Impact factor: 4.076

6. Integrated analysis of transcriptomic and metabolomic profiling reveal the p53 associated pathways underlying the response to ionizing radiation in HBE cells.

Authors: Ruixue Huang; Xiaodan Liu; He Li; Yao Zhou; Ping-Kun Zhou
Journal: Cell Biosci Date: 2020-04-15 Impact factor: 7.133

7. The soybean (Glycine max L.) cytokinin oxidase/dehydrogenase multigene family; Identification of natural variations for altered cytokinin content and seed yield.

Authors: Hai Ngoc Nguyen; Shrikaar Kambhampati; Anna Kisiala; Mark Seegobin; Robert Joseph Neil Emery
Journal: Plant Direct Date: 2021-02-16

Review 8. A review of biotechnological approaches towards crop improvement in African yam bean (Sphenostylis stenocarpa Hochst. Ex A. Rich.).

Authors: Olubusayo O Oluwole; Oluwadurotimi S Aworunse; Ademola I Aina; Olusola L Oyesola; Jacob O Popoola; Olaniyi A Oyatomi; Michael T Abberton; Olawole O Obembe
Journal: Heliyon Date: 2021-11-25

Review 9. Legume Seed Protein Digestibility as Influenced by Traditional and Emerging Physical Processing Technologies.

Authors: Ikenna C Ohanenye; Flora-Glad C Ekezie; Roghayeh A Sarteshnizi; Ruth T Boachie; Chijioke U Emenike; Xiaohong Sun; Ifeanyi D Nwachukwu; Chibuike C Udenigwe
Journal: Foods Date: 2022-08-02

10. Genome-wide identification of AP2/EREBP in Fragaria vesca and expression pattern analysis of the FvDREB subfamily under drought stress.

Authors: Chao Dong; Yue Xi; Xinlu Chen; Zong-Ming Cheng
Journal: BMC Plant Biol Date: 2021-06-26 Impact factor: 4.215