
An overall evaluation of the Resistance (R) and Pathogenesis-Related (PR) superfamilies in soybean, as compared with Medicago and Arabidopsis.

Ana C Wanderley-Nogueira, Luis C Belarmino, Nina da M Soares-Cavalcanti, João P Bezerra-Neto, Ederson A Kido, Valesca Pandolfi, Ricardo V Abdelnoor, Eliseu Binneck, Marcelo F Carazzole, Ana M Benko-Iseppon.

Abstract

Plants have the ability to recognize and respond to a multitude of pathogens, resulting in a massive reprogramming of the plant to activate defense responses including Resistance (R) and Pathogenesis-Related (PR) genes. Abiotic stresses can also activate PR genes and enhance pathogen resistance, representing valuable genes for breeding purposes. The present work offers an overview of soybean R and PR genes present in the GENOSOJA (Brazilian Soybean Genome Consortium) platform, regarding their structure, abundance, evolution and role in the plant-pathogen metabolic pathway, as compared with Medicago and Arabidopsis. Searches revealed 3,065 R candidates (756 in Soybean, 1,142 in Medicago and 1,167 in Arabidopsis), and PR candidates matching to 1,261 sequences (310, 585 and 366 for the three species, respectively). The identified transcripts were also evaluated regarding their expression pattern in 65 libraries, showing prevalence in seeds and developing tissues. Upon consulting the SuperSAGE libraries, 1,072 R and 481 PR tags were identified in association with the different libraries. Multiple alignments were generated for Xa21 and PR-2 genes, allowing inferences about their evolution. The results revealed interesting insights regarding the variability and complexity of defense genes in soybean, as compared with Medicago and Arabidopsis.


Keywords: Glycine max; Medicago truncatula; bioinformatics; biotic stress; pathogen response

Year: 2012 PMID: 22802711 PMCID: PMC3392878 DOI: 10.1590/S1415-47572012000200007

Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771

Introduction

To counter pathogen attack, plants evolved the ability to recognize the threat, resist the invader and trigger an effective response (Bolton, 2009). One of the most important steps of this complex response is the detection of invading pathogens by the plant, a step in which R (Resistance) genes play a crucial role. This sensing involves the recognition of a pathogen gene product called avirulence (avr) factor by a corresponding R gene. When the avr and R genes are compatible, the plant is resistant and pathogen growth and establishment are impaired, leading to the so-called Hypersensitive Response (HR), which triggers diverse responses, including local cell death that limits the spread of the pathogen (Bonas and Anckerveken, 1999). Besides this local reaction, the HR activates a signal cascade, including hormones and PR (Pathogenesis-Related) genes, among others, that is able to establish resistance against a spectrum of different pathogen classes. This corroborates observations made at the beginning of the last century that plants, like animals (Benko-Iseppon ), may be immunized against the attack of a given pathogen after infection by another pathogen (Chester, 1933). Besides the local reaction, plants may also display Systemic Acquired Resistance (SAR). The SAR pathway is also common in many non-compatible plant-pathogen interactions (Nurnberg and Brunner, 2002). As soon as the pathogenic agent is detected, the plant induces a complex set of signal molecules able to activate defense proteins that may have a direct antimicrobial effect, as in the case of Pathogenesis-Related (PR) genes (Durrant and Dong, 2004). Alternatively, they may induce the production of secondary metabolites that impair pathogen movement or growth within the plant tissues (Sparla ). 
Resistance genes are generally classified into five different groups or classes, defined according to their conserved domains (CD) (Bent, 1996; Hammond-Kosack and Jones, 1997; Ellis and Jones, 2000). The first class is represented by the HM1 gene of maize that encodes a reductase able to inactivate toxins produced by the fungus Helminthosporium carbonum (Joahal and Briggs, 1992). It is the only R gene class where conserved domains are absent. A second class is represented by the Pto gene from tomato that confers resistance against the bacterium Pseudomonas syringae pv. tomato. It is characterized by a serine/threonine-kinase (ser/thre-kinase) domain, able to interact with the avrPto gene (Tang ). This gene was also identified in other plants, such as Arabidopsis thaliana, Phaseolus vulgaris (Melotto ), eucalyptus (Barbosa-da-Silva ) and sugarcane (Wanderley-Nogueira ). The third class is represented by genes bearing two domains, viz. LRR (Leucine Rich Repeats) and NBS (Nucleotide Binding Site) (Liu ). This is the case of the Rpm1 and Rps2 genes from A. thaliana, the N gene from tobacco, L6 from flax, Prf from tomato and Rpg1 from soybean also found in common bean and faba bean (Mindrinos ; Lawrence ; Salmeron ; Ashfield ). The fourth R gene class encodes a membrane-anchored protein composed of an extracellular LRR domain, a transmembrane region and a short intra-cellular tail in the C terminal. The Cf gene from tomato is an example of this class, conferring resistance against Cladosporium fulvum (Dixon ). The Xa21 gene from rice confers resistance to the bacteria Xanthomonas oryzae pv. oryzae and is a representative of the fifth class (Song ; Wang ). This gene encodes an extracellular LRR domain (similar to the Cf gene), as well as a ser/thre-kinase domain (similar to the Pto gene), suggesting an evolutionary connection among different classes in the genesis of plant R genes (Song ). 
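As an illustration only, the five R gene classes described above can be expressed as a lookup from domain composition to class. The following Python sketch is not part of the original analysis; the domain labels ("KINASE", "LRR", "NBS", "TM") and the function name classify_r_gene are hypothetical and chosen merely to mirror the text.

```python
from typing import Iterable

# Domain signatures of the five R gene classes as described in the text.
# The labels are illustrative (hypothetical) names, not database identifiers.
R_CLASS_SIGNATURES = {
    frozenset(): "class 1 (no conserved domain, e.g. HM1)",
    frozenset({"KINASE"}): "class 2 (ser/thr kinase, e.g. Pto)",
    frozenset({"NBS", "LRR"}): "class 3 (NBS-LRR, e.g. Rpm1, Rps2, N)",
    frozenset({"LRR", "TM"}): "class 4 (extracellular LRR with membrane anchor, e.g. Cf)",
    frozenset({"LRR", "KINASE"}): "class 5 (LRR-kinase, e.g. Xa21)",
}


def classify_r_gene(domains: Iterable[str]) -> str:
    """Return the R gene class whose domain signature matches exactly."""
    key = frozenset(d.upper() for d in domains)
    return R_CLASS_SIGNATURES.get(key, "unclassified")


if __name__ == "__main__":
    print(classify_r_gene(["LRR", "kinase"]))
    print(classify_r_gene(["NBS", "LRR"]))
```

An exact-match lookup is used so that a domain combination not covered by the five classes is reported as unclassified rather than forced into the nearest class.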
PR proteins comprise pathogen-induced proteins that are routinely classified into 17 families, from PR-1 to PR-17, based on their biochemical and molecular biological properties (van-Loon ). Similarities among sequences and serological or immunological properties form the basis of their classification (van-Loon ). Although most PR proteins are known to have antifungal activities, their molecular mechanisms of action are not well understood, except for PR-2 (β-glucanases) and PR-3 (chitinases) (Kitajima and Sato, 1999). PR-1 is the most abundantly accumulated protein after pathogen infection and its genes have been cloned in many plants, such as tobacco (Gaffney ), A. thaliana (Metzler ), tomato (Tornero ) and apple. Although its phytochemical functions are unknown in all these species, this gene class is nonetheless considered to be a typical SAR marker (Bonasera ). PR-5 is a thaumatin-like protein with high antifungal activity; it is also expressed under cold stress in overwintering monocots, where it exhibits antifreeze activity (Hon ; Atici and Nalbantoğlu, 2003; Griffith and Yaish, 2004). Other families, such as PR-8 (glycosyl hydrolase), PR-9 (secretory peroxidase), PR-14 (lipid transfer proteins), PR-15 (oxalate oxidase) and PR-17 (basic secretory proteins) (Nanda ), have been well studied and are believed to be involved in plant defense responses, although their molecular mechanisms have yet to be determined (Bolton, 2009). Most PR genes are expressed at a basal level under normal growth conditions, but are rapidly induced after pathogen infection. It is noteworthy that several PR genes are also regulated during development, leaf senescence and pollen maturation, as well as by environmental factors, such as osmotic, cold and light stress (Zeier ). 
Soybean (Glycine max) is a globally important crop, providing oil and at least twice as much protein per acre as any other major grain (Libault ). Economically, it is the most valuable source of protein and edible oil in the world and serves as a model for seed and other developmental processes (Cannon ). The present evaluation offers an overview of the main available sequences of the R and PR classes involved in plant-pathogen interaction in the soybean transcriptome, compared here with data available from Arabidopsis and Medicago. It provides insights into the expression of such sequences in different tissues and into how these genes may have behaved over the course of evolution.

Material and Methods

Search and screening for R and PR genes in soybean, Medicago and Arabidopsis databases

For this purpose, 59 proteins that play important roles in the plant defense response were selected as seed sequences. The selected protein sequences were related to the 42 R and 17 PR gene classes described above. The R genes were previously compiled by Barbosa-da-Silva and Wanderley-Nogueira , and PR seed sequences are available in Table S1 (Supplementary Material). All 59 seed sequences were full-length cDNAs obtained from the NCBI database and conceptually translated to improve search strategies. To identify analogs of these genes in the soybean, Medicago and Arabidopsis transcriptomes, tBLASTn alignments were carried out against three platforms: GENOSOJA (The Brazilian Soybean Genome Consortium), TIGR (The Institute for Genomic Research) and TAIR (The Arabidopsis Information Resource), using 1e−05 as the cut-off e-value. The clusters obtained were annotated and analyzed for score, e-value, sequence size and presence of conserved domains, as shown in Table 1. To this end, all clusters were translated using the TRANSLATE tool of ExPASy and screened for conserved motifs with the aid of the rps-BLAST CD-search tool (Altschul ). The best match for each gene in each species studied was submitted to a BLASTx alignment against NCBI GenBank in an effort to confirm its putative function.
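The e-value cut-off described above can be sketched as a small filter over BLAST results. The example below assumes the hits were exported in the standard 12-column BLAST tabular layout, which the text does not state; the sequence identifiers and hit values in EXAMPLE are hypothetical and serve only to exercise the filter.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue=1e-5):
    """Return hits with e-value <= max_evalue as a list of dicts."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Hypothetical example rows (not taken from the study).
EXAMPLE = (
    "seed_A\tcluster_1\t62.1\t165\t60\t2\t1\t165\t1\t495\t7e-47\t181\n"
    "seed_A\tcluster_2\t30.0\t50\t35\t0\t1\t50\t10\t160\t0.002\t35.1\n"
)

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"], hit["bitscore"])
```

Only the first example row passes the 1e−05 threshold; the second, with an e-value of 0.002, is discarded.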

Table 1

Matching results of soybean clusters for each R and PR gene searched, showing, for the best match of each seed sequence, the e-value, score, size in nucleotides (nt) and amino acids (aa) and presence of conserved domains, together with the number of matches in soybean (S), Medicago (M) and Arabidopsis (A). Abbreviations: (c) = complete; (i) = incomplete.

Gene class	Best match	e-value	Score	Size (nt)	Size (aa)	Conserved domain (c/i)	# Matches (S)	# Matches (M)	# Matches (A)
PR1	Contig 5043	7e-47	181	498	165	SCP (c)	8	19	22
PR2	Contig 9520	1e-102	369	1047	348	Glyco-Hydro (c)	86	214	95
PR3	Contig 5557	4e-48	187	957	318	Chitinase (c)	7	21	15
PR4	Contig 10145	2e-67	250	636	211	Chitin binding/Barwin (c)	2	14	2
PR5	Contig 29866	5e-60	226	1041	345	Thaumatin (c)	21	36	29
PR6	Contig 5043	1e-46	181	495	164	SCP (c)	11	17	23
PR7	Contig 66	5e-141	481	2283	760	Peptidase/Subtilisin (c)	82	97	50
PR8	Contig 14006	4e-89	232	894	297	Hevamine (c)	11	22	1
PR9	Contig 1796	1e-120	428	978	325	Secretory peroxidase (c)	31	46	66
PR10	Contig 4865	6e-26	112	410	160	Bet v 1 (c)	18	18	34
PR11	Contig 5806	9e-79	289	1098	365	Plant chitinase class V (c)	1	11	9
PR12	Contig 13869	1e-09	58	291	96	Gamma-thionin (i)	1	15	8
PR13	No match	-	-	-	-	-	-	-	4
PR14	Contig 13114	6e-18	86	357	118	Lipid-transfer protein (c)	18	36	16
PR15	SJ01-E1-UK1-089-G01-UC.F	1e-48	188	660	219	Cupin2 (c)	27	47	37
PR16	Contig 13716	2e-59	223	666	221	Cupin2 (c)	27	51	37
PR17	Contig 25189	2e-73	271	678	225	Basic secretory proteins (c)	2	1	5
Pto	Contig 5707	2e-143	505	2502	833	Ser-Thre Kinase (i)	238	239	248
Prf	Contig 5666	4e-34	142	2736	920	P-loop NTPase domain (c)	5	25	49
Pti4	SJ05-E1-S06-021-E06-UC.F	6e-33	136	825	274	DNA-binding domain (c)	89	90	119
Pti5	Contig 25338	6e-45	176	645	214	DNA-binding domain (c)	70	70	89
Pti1	SJ05-E1-UK1-024-H07-UC.F	2e-33	138	759	252	DNA-binding domain (c)	104	112	138
Pti6	Contig 10050	2e-146	514	1086	361	Tyr Kinase (i)	248	249	249
RAR1	Contig 27196	1e-76	281	672	223	CHORD superfamily (c)	1	2	1
RIN4	Contig 20845	7e-25	109	741	246	AvrRpt-cleavage (c)	2	8	1
RPM1	Contig 25089	5e-29	125	2781	926	P-loop NTPase-LRR (c)	14	73	90
RPS2	SJ01-E1-L06-046-G05-UC.F	7e-10	62	2538	845	P-loop NTPase-LRR (c)	4	36	90
PBS1	Contig 26006	3e-132	467	1152	383	Protein Kinase (c)	239	247	251
RPS5	Contig 10273	1e-17	87	1941	646	P-loop NTPase-LRR (c)	5	36	65
MLA10	SJ18-P1-S12-046-B20-UC.F	4e-07	51	913	305	P-loop NTPase-LRR (c)	0	21	30
L6	Contig 16939	5e-55	210	3198	1065	TIR-P-loop-LRR (c)	24	123	171
RRS1	Contig 14438	1e-30	107	2211	736	P-loop NTPase-LRR (c)	102	142	239
RPS4	Contig 16939	1e-35	148	3198	1065	TIR-P-loop-LRR (c)	50	198	226
Xa1	Contig 5507	5e-63	238	3609	1202	P-loop NTPase-LRR (c)	17	108	91
Hrt	Contig 16939	3e-54	207	3198	1065	TIR-P-loop-LRR (c)	61	208	181
Mi1	Contig 12827	2e-08	58.2	2733	910	TIR-P-loop-LRR (c)	1	29	50
BS2	Contig 10273	3e-14	76	1941	646	P-loop NTPase-LRR (c)	9	68	135
GPA2	SJ14-E1-S07-021-C03-UC.F	1e-22	104	2733	910	P-loop NTPase-LRR (c)	10	50	123
RX1	Contig 5666	4e-39	159	2736	920	P-loop NTPase-LRR (c)	14	61	112
Pi-ta	SJ14-E1-S07-021-C03-UC.F	1e-23	107	2733	910	P-loop NTPase-LRR (c)	2	17	62
I2	Contig 5507	8e-64	241	3609	1202	P-loop NTPase-LRR (c)	22	109	108
RPP8	SJ14-E1-S07-021-C03-UC.F	3e-19	94	2733	910	P-loop NTPase-LRR (c)	11	71	129
HERO	SJ14-E1-S07-021-C03-UC.F	1e-08	58	2733	910	P-loop NTPase-LRR (c)	5	39	78
L6	no match	-	-	-	-	-	-	-	-
RPP13	SJ14-E1-S07-021-C03-UC.F	1e-23	107	2733	910	P-loop NTPase-LRR (c)	2	51	77
RP1	Contig 10273	2e-26	86	1941	646	P-loop NTP-ase (c)	14	71	69
N	Contig 16939	2e-51	198	3198	1065	TIR-P-loop-LRR (c)	64	196	171
P	Contig 20164	3e-11	64	585	194	Dirigent super family (c)	17	37	18
M	no match	-	-	-	-	-	-	-	-
WRKY25	Contig 3637	4e-65	244	1761	586	WRKY superfamily 2 (c)	68	52	77
WRKY33	Contig 3637	7e-78	287	1761	586	WRKY superfamily 1 (c)	71	58	85
WRKY29	SJ01-E1-L08-116-F02-UC.F	5e-21	97.4	768	255	WRKY superfamily 2 (c)	28	22	37
Cf2	Contig 17295	1e-71	267	3132	1043	Multiple LRR (c)	249	250	266
Cf4	Contig 14446	4e-40	162	2256	751	Multiple LRR (c)	116	208	249
Cf5	Contig 6299	1e-39	160	2955	984	Multiple LRR (c)	123	207	219
Cf9	Contig 14446	5e-53	204	2256	751	Multiple LRR (c)	107	188	267
Xa21	Contig 439	3e-69	259	2913	970	LRR-Kinase (c)	251	249	247
FLS2	Contig 6299	6e-66	233	2955	984	LRR-Kinase (c/i)	174	251	249
EFR	Contig 439	2e-59	227	2913	970	LRR-Kinase (c)	250	239	253

In a second, manual analysis, redundancies (i.e., clusters that matched more than one gene due to common domains) were eliminated. For this purpose, clusters matching each query sequence were annotated in a local database (called ‘non-redundant’). The third step of the analysis compared the number of R and PR candidate sequences obtained from the tBLASTn searches against the soybean, Arabidopsis and Medicago databases by directly counting the non-redundant clusters for each of the 59 genes studied.
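The redundancy removal was performed manually, and the criterion used to assign a shared cluster to a single gene is not stated. Purely as a sketch, the step could be automated by keeping each cluster once, under the seed sequence that gave its lowest e-value; this rule, the function names and the cluster identifiers below are assumptions made for illustration.

```python
from collections import Counter


def assign_non_redundant(hits):
    """Keep each cluster once, under the seed with its lowest e-value.

    hits: iterable of (seed, cluster, evalue) tuples.
    Returns a dict mapping cluster -> seed.
    """
    best = {}
    for seed, cluster, evalue in hits:
        if cluster not in best or evalue < best[cluster][1]:
            best[cluster] = (seed, evalue)
    return {cluster: seed for cluster, (seed, _) in best.items()}


def count_per_seed(assignment):
    """Count non-redundant clusters for each seed sequence."""
    return Counter(assignment.values())


# Hypothetical hits: cluster_1 matches two LRR-kinase seeds.
HITS = [
    ("Xa21", "cluster_1", 3e-69),
    ("EFR", "cluster_1", 2e-59),
    ("EFR", "cluster_2", 1e-20),
    ("Cf2", "cluster_3", 1e-71),
]

if __name__ == "__main__":
    assignment = assign_non_redundant(HITS)
    print(assignment)
    print(count_per_seed(assignment))
```

Counting the values of the resulting mapping corresponds to the direct counting of non-redundant clusters per gene.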

Phylogenetic analysis

To analyze the relationships among these genes, some R and PR gene candidates were selected from all three studied species for an evolutionary analysis using the maximum parsimony method and bootstrap function with 5,000 replicates. For this purpose, CLUSTALx alignments were submitted to the program MEGA (Molecular Evolutionary Genetics Analysis), Version 4 for Windows (Tamura ).
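MEGA performs the bootstrap internally. To make the idea concrete, the sketch below shows only the resampling step, in which alignment columns are drawn with replacement to build each replicate; it does not perform the parsimony tree search. The function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def replicates(alignment, n=5000, seed=0):
    """Yield n bootstrap replicates (5,000 were used in the text)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield bootstrap_alignment(alignment, rng)


# Toy alignment with made-up sequences.
TOY = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "AGGTAC"}

if __name__ == "__main__":
    print(next(replicates(TOY, n=1)))
```

Each replicate would then be passed to a tree-building method, and the bootstrap value of a branch is the fraction of replicate trees that contain it.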

Studying syntenic regions among the soybean and Medicago genomes

Best matches for all selected soybean genes were aligned against the M. truncatula pseudogenome in order to anchor the 59 soybean sequences on virtual chromosomes through the CVit-BLAST procedure implemented on the Medicago sequencing resource website. BLAST parameters (score, e-value and percentage of identity) were adjusted to infer the position of the soybean sequences along the Medicago virtual chromosomes.

In silico expression assay based on GENOSOJA EST sequences

A preliminary analysis of the prevalence of the 59 genes in the soybean libraries was carried out by direct correlation of the read frequencies of each cluster in the various GENOSOJA cDNA libraries. Information regarding the 65 libraries that constitute the GENOSOJA database is available on The Soybean Genome Project Website. For practical purposes, we combined some libraries that comprised different stages of the same tissue/organ (for example, B01 and B02 are here referred to as “B”), resulting in a total of 16 libraries (B: vegetative buds of field-grown plants; C: cotyledons; EN: endosperm; EP: epicotyls; F: flowers; H: hypocotyls; LV: leaves; R: roots; SH: germination shoots; ST: stems; SO: somatic embryos; SC: soybean submitted to drought; LI: leaves infected with Asian rust; MJ: soybean exposed to Meloidogyne javanica; SD: seeds; and UK: unknown). To generate an overall picture of the expression patterns of selected R and PR genes in soybean, a hierarchical clustering approach (Eisen ) was applied to normalized data and a graphic representation was constructed with the aid of the CLUSTER program. Dendrograms including both axes (using the weighted pair-group for each cluster and library) were generated with the aid of the TreeView program (Page, 1996). In these graphics, light yellow means no expression and red indicates all degrees of expression.
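The pooling of sub-libraries and the normalization of read counts can be sketched as follows. The text does not specify the normalization formula, so expressing counts as reads per 10,000 library reads is an assumption, as are the function names and the example counts.

```python
import re
from collections import defaultdict


def merge_libraries(read_counts):
    """Pool sub-libraries such as 'B01' and 'B02' into one group 'B'
    by stripping the trailing digits of the library code."""
    merged = defaultdict(int)
    for library, reads in read_counts.items():
        merged[re.sub(r"\d+$", "", library)] += reads
    return dict(merged)


def normalize(cluster_reads, library_totals, scale=10_000):
    """Reads of one cluster per `scale` total reads of each library
    (assumed normalization; the original formula is not given)."""
    return {lib: scale * cluster_reads.get(lib, 0) / total
            for lib, total in library_totals.items() if total > 0}


# Hypothetical counts for one cluster and hypothetical library sizes.
CLUSTER_READS = {"B01": 2, "B02": 3, "SD01": 10}
LIBRARY_TOTALS = {"B01": 1000, "B02": 1500, "SD01": 5000}

if __name__ == "__main__":
    pooled_reads = merge_libraries(CLUSTER_READS)
    pooled_totals = merge_libraries(LIBRARY_TOTALS)
    print(normalize(pooled_reads, pooled_totals))
```

The normalized values per cluster and library form the matrix that a hierarchical clustering program would take as input.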

In silico expression assay based on the GENOSOJA SuperSAGE libraries

R and PR candidates were also used to screen the six SuperSAGE libraries generated by the GENOSOJA consortium. For the drought experiment, four libraries were generated using roots of two contrasting soybean genotypes, viz. Embrapa-48 (tolerant) and BR-16 cultivar (susceptible), both submitted to dehydration in the dark for 25 up to 150 min (all times bulked together), as compared with non-stressed controls. The other stressed library was generated using leaves of the resistant accession PI561356 inoculated with rust fungus and collected 12, 24 and 48 h post inoculation. For the composition of the pathogen-stressed library, equimolar amounts of the three inoculation times were used, as compared with the negative, non-inoculated control of the same genotype. The libraries were constructed at GenXPro GmbH (Frankfurt, Germany), essentially as described by Matsumura , and were subsequently sequenced via a SOLEXA platform. Aiming to perform an overview of the GENOSOJA SuperSAGE data associated with R and PR genes, SuperSAGE tags were submitted to a BLASTn (maximum e-value 1e−05) against the database generated from three comparisons of the six available libraries (1-Embrapa-48, drought tolerant stressed vs. negative control; 2- BR-16, drought susceptible stressed vs. negative control; 3-PI561356 fungus resistant stressed vs. negative control). Each SuperSAGE tag was annotated considering the respective library comparison and also the respective aligned ESTs.

Results

Description and distribution of R and PR genes in soybean, Medicago and Arabidopsis

The tBLASTn alignment against the soybean transcriptome using the 59 known R and PR gene probes returned 1,066 non-redundant sequences from the contigs and singlets deposited in the GENOSOJA database. Among them, 700 represented contigs and 366 singlets, which together encompassed 26,653 reads. Regarding the tBLASTn searches in the Medicago transcriptome, a total of 1,727 sequences were positive matches. In Arabidopsis, 1,533 sequences returned matches after the same procedure. A screening of R and PR genes in these three species resulted in the identification of 4,326 candidates, of which 3,065 were R and 1,261 PR gene candidates. A graphical representation regarding the prevalence of these sequences and how they are distributed among the soybean, Medicago and Arabidopsis transcriptomes is shown in Figure 1.
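The totals reported here can be cross-checked against the per-species counts given in the Abstract. The short script below only restates those published numbers and verifies that they add up; the variable names are arbitrary.

```python
# Candidate counts per species as reported in the Abstract.
COUNTS = {
    "soybean": {"R": 756, "PR": 310},
    "Medicago": {"R": 1142, "PR": 585},
    "Arabidopsis": {"R": 1167, "PR": 366},
}

# Sum R and PR candidates within each species and across species.
PER_SPECIES = {sp: c["R"] + c["PR"] for sp, c in COUNTS.items()}
TOTAL_R = sum(c["R"] for c in COUNTS.values())
TOTAL_PR = sum(c["PR"] for c in COUNTS.values())

if __name__ == "__main__":
    print(PER_SPECIES)
    print(TOTAL_R, TOTAL_PR, TOTAL_R + TOTAL_PR)
    print(700 + 366)  # soybean contigs plus singlets
```

The per-species sums reproduce the 1,066, 1,727 and 1,533 sequences reported for soybean, Medicago and Arabidopsis, and the overall total of 4,326 candidates.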

Figure 1

R and PR genes encountered in soybean, Arabidopsis and Medicago transcriptomes. R genes are represented in the outer circle and PR genes in the inner circle for each species.

Analysis of all results showed that only one PR gene (PR-13) and two R genes (L6 and M) were absent from the soybean transcriptome, while the other 56 genes gave positive results in the tBLASTn searches. The same three genes were also absent from the Medicago tBLASTn results. In Arabidopsis, no matches were found for the two R genes L6 and M, but four candidate sequences were identified for the PR-13 class, as shown in Table 1. A comparison of the distribution of non-redundant sequences in the three species revealed that the NBS-LRR family was the most frequent one in all cases, while the LRR-kinase class was the least represented in all studied organisms (Figure 2). Moreover, while Arabidopsis presented the highest number of R gene candidates, Medicago presented the highest number of PR gene candidates. In both cases, soybean presented the lowest number of matches (Figure 3A).

Figure 2

Distribution of R gene families in soybean, Arabidopsis and Medicago in the four main R gene categories, considering their conserved domains. Column numbers correspond to the amount of non-redundant sequences for each class.

Figure 3

Distribution of R and PR genes in soybean, Medicago and Arabidopsis (A). Distribution of Xa21 and PR-2 in soybean, Medicago and Arabidopsis (B). Numbers of matches for each gene category are shown inside the columns.

The three most represented R and PR genes in all species were the same, with Xa21, EFR and Pti6 representing R genes and PR-2, PR-7 and PR-9 representing PR genes. Due to this abundance, both Xa21 and the PR-2 genes were selected for the construction of a dendrogram and expression analysis. Matching of Xa21 and PR-2 candidates in soybean, Medicago and Arabidopsis did not follow a regular distribution pattern, since soybean presented fewer matches for both genes, and most of the Xa21 candidate sequences were found in Medicago, whereas most PR-2 candidates were found in Arabidopsis (Figure 3B). Among the 310 PR genes of soybean only 40 matched with more than one seed sequence, all the others being exclusive to a given PR gene family. On the other hand, almost all R genes matched sequences that aligned with more than one probe, requiring manual sorting. Exceptions occurred only with respect to RAR, RIN, P, WRKY29, and Xa21, which aligned in most cases with exclusive sequences.

Phylogenetic analysis of Xa21 and PR-2 genes

Dendrograms generated for the Xa21 and PR-2 genes using the soybean sequences and their orthologs clearly divided dicots and monocots into distinct clades (Figure 4). In the Xa21 analysis, the lycophyte Selaginella moellendorffii was placed in a basal position, from which the two branches representing monocots and dicots emerged (Figure 4A). The monocot group included members of the Poaceae family in one branch, with a bootstrap value of 95%, associated in the same branch with the palm Elaeis guineensis. In the dicot group, both Fabaceae members (G. max and M. truncatula) were positioned together, while the other branch included members of the suborder Eurosidae I (Vitis vinifera and Ricinus communis), together with A. thaliana, a member of the Eurosidae II suborder.

Figure 4

Dendrograms generated after maximum parsimony analysis showing the relationships among selected plant species considering sequences of (A) Xa21 and (B) PR-2. Keys in (1) represent monocots and in (2) dicots. Xa21: the circle on the root of A shows the divergence point between monocots and dicots. PR-2: the circle on the root of B shows an ancestor with a symplesiomorphic character. Numbers at the base of the branches denote bootstrap values and the bar represents the evolutionary scale.

Considering the PR-2 dendrogram (Figure 4B), the grasses (Poaceae represented by rice and maize) occupied a basal position, from which a clade containing two monocots, ginger (Zingiber officinale) and banana (Musa paradisiaca), emerged. Moreover, a large clade containing all dicots was split into two subclades that behaved as merophyletic groups. For example, tobacco (Nicotiana tabacum) and coffee (Coffea arabica), members of the Asterid order, remained together, but potato (Solanum tuberosum) of the same order was positioned on another branch. Soybean and Medicago were also positioned in separate subclades.

Expression pattern of R (Xa21) and PR (PR-2) genes in the soybean transcriptome

From the 26,653 reads identified, an in silico expression assay was carried out considering transcripts of both Xa21 (2,980 reads) and PR-2 (1,099 reads). This made it possible to identify their prevalence and to normalize their distribution among the tissues and conditions represented in the 65 libraries. Graphic illustrations of these comparisons are available as Figures S1 and S2 (Supplementary Material). The analysis of their expression pattern in soybean, obtained from normalized data, revealed that all libraries presented almost the same number of reads. The most representative library was that of seed tissues (SD), with 10% of the identified reads. Leaves (LV), roots (R) and flowers (F) showed similar expression, each representing 9% of all reads. The remaining tissues also presented significant expression (ranging from 5% to 8%), except for the libraries made from tissues exposed to the nematode Meloidogyne javanica (MJ), in which no reads were identified.

Expression considering the SuperSAGE libraries

BLASTn results revealed that 944 soybean EST candidates aligned with 1,553 SuperSAGE tags when considering a cut-off e-value of ≤ 1e−05. Among all tags, 1,072 aligned with R gene candidates from different classes, mainly the NBS-LRR class. Additionally, 481 tags aligned with PR gene candidates, most of them from the PR-9 secretory peroxidase family (Figure 5). Data concerning sequence-tag association are available as supplementary material (Tables S2, S3 and S4). The highest number of matches was obtained for comparison 2 (BR-16, drought susceptible stressed vs. negative control), with 613 non-redundant tags, while 465 were found for comparison 1 (Embrapa-48, drought tolerant stressed vs. negative control) and 475 for comparison 3 (PI561356 fungus resistant stressed vs. negative control) (Figure 5). It is noteworthy that many tags matched in more than one comparison.

Figure 5

Number of SuperSAGE tags matching soybean R and PR gene candidates from three different comparisons among the six libraries: 1-Embrapa-48, drought tolerant stressed vs. negative control; 2- BR-16, drought susceptible stressed vs. negative control; 3- PI561356 fungus resistant stressed with Phakopsora pachyrhizi vs. negative control.

Anchoring soybean R and PR genes in Medicago virtual chromosomes

The alignment of the 59 soybean genes against the Medicago virtual chromosomes revealed 1,253 sites in all nine chromosomes, also including sub-telomeric regions (Figure 6). Fifty-eight genes presented similarities with distinct segments in the same chromosome or appeared twice in distinct chromosomes. Only the PR-1 sequence anchored to a single chromosome (chromosome 2).

Figure 6

Graphic representation of soybean R and PR sequences positioned on Medicago truncatula chromosomes (MtChr) with the aid of the CVit-BLAST resource available at the website http://www.medicago.org/. Arrows indicate genes that appear in tandem repetitions.

The highest number of anchored genes was found in chromosome 8, which matched 32 of the 59 genes at 85 sites. Chromosome 6, on the other hand, presented the lowest number of anchored genes (12). Nonetheless, this chromosome presented the highest number of duplications, matching 228 sites, most of them in tandem positions. Such tandem repetitions could also be observed at three sites of chromosome 3. The lowest gene density was observed in the long arm of chromosome 3. Syntenic regions were evident in chromosomes 2 and 4 (Figure 6). Several sequences clustered along the genome, and some chromosomes were rich in resistance genes, especially chromosomes 2, 7, 8 and 9, with at least four distinct genes in very close positions. These gene blocks always matched R genes, while PR genes generally appeared at distinct sites in the same chromosomes.
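The notion of "at least four distinct genes in very close positions" can be made explicit with a simple window-based grouping of anchored hits. The sketch below is not the procedure used with CVit-BLAST; the 100 kb window, the function name and the example coordinates are assumptions chosen for illustration.

```python
from collections import defaultdict


def find_clusters(hits, window=100_000, min_genes=4):
    """Group anchored hits into clusters of neighboring sites.

    hits: iterable of (gene, chromosome, position_bp).
    Consecutive hits on one chromosome are chained while they lie within
    `window` bp of the previous hit; chains holding at least `min_genes`
    distinct genes are returned as (chromosome, [(position, gene), ...]).
    """
    by_chrom = defaultdict(list)
    for gene, chrom, pos in hits:
        by_chrom[chrom].append((pos, gene))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()
        group = [items[0]]
        for item in items[1:]:
            if item[0] - group[-1][0] <= window:
                group.append(item)
            else:
                if len({g for _, g in group}) >= min_genes:
                    clusters.append((chrom, group))
                group = [item]
        # Check the last open chain on this chromosome.
        if len({g for _, g in group}) >= min_genes:
            clusters.append((chrom, group))
    return clusters


# Hypothetical anchoring sites (gene labels and coordinates are made up).
SITES = [
    ("A", "2", 1_000), ("B", "2", 50_000), ("C", "2", 120_000),
    ("D", "2", 200_000), ("A", "6", 10), ("A", "6", 5_000_000),
]

if __name__ == "__main__":
    for chrom, group in find_clusters(SITES):
        print(chrom, [gene for _, gene in group])
```

Counting distinct genes rather than sites keeps tandem repeats of a single gene, such as those noted for chromosome 6, from being reported as multi-gene blocks.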

Discussion

The 1,066 soybean sequences resulting from the tBLASTn alignments confirmed the excellent coverage of the existing GENOSOJA databank, which includes the most important representatives from different gene families. Legumes are known to withstand many kinds of stress, including rapid climate changes, drought, exposure to diseases and pests, waterlogging and flooding (Cannon ), which could explain the higher number of PR genes encountered in Medicago in comparison with Arabidopsis, since these gene families can be activated by different kinds of biotic or abiotic stress (Glombitza ). The low number of R and PR gene candidates found in soybean, when compared with Arabidopsis and Medicago, is curious, since the latter have smaller genomes (157 Mb and 583 Mb, respectively) than G. max (1,115 Mb). This may be due to the analyzed sample, which was restricted to expressed sequence tags, whereas the databases of both Arabidopsis and Medicago are larger. Previous studies on legumes showed that, despite the relatively large difference in genome size between soybean and Medicago, gene densities are similar, indicating that a given Medicago region is likely to correspond well with two soybean regions (Mudge ). This leads us to believe that additional expression assays in soybean may reveal important genes that are expressed under very specific conditions. The number of soybean clusters that aligned with more than one R gene seed sequence is not surprising. Similar results were observed in previous studies regarding R genes of eucalyptus (Barbosa-da-Silva ) and sugarcane (Wanderley-Nogueira ). This is due to the domains shared among R genes, for example the LRR domain, which is present in the LRR, NBS-LRR and LRR-kinase gene families and facilitates alignments with more than one gene. This is rarer among PR gene categories, which are more distinct in structure and function (Kitajima and Sato, 1999), as also observed herein. 
A higher number of sequences matching NBS-LRR families, when compared with other classes, was also reported by Barbosa-da-Silva and Wanderley-Nogueira , confirming the general observation that most R genes are members of this class. Dendrograms generated from these data revealed a similar picture for both selected genes (Xa21 and PR-2). In the case of Xa21, the positioning of Selaginella moellendorffii as an outgroup was expected, since this species is a member of an ancient vascular plant lineage that first appeared 400 million years ago, and thus represents a basal node on the plant evolutionary tree (Weng ). The analysis of the Xa21 orthologs from different species reflected their relationships according to classic taxonomy. The class Liliopsida (monocots) appeared as a monophyletic group, uniting on the same branch Oryza sativa, Zea mays and Sorghum bicolor, all annual cereals of the Poaceae family, while the palm Elaeis guineensis (Arecaceae) was positioned on another branch. The same occurred in the Magnoliopsida (dicots), since Medicago and soybean, both legumes and members of Fabaceae, appeared in a subclade separated from the remaining species. R genes are considered fast evolving, due to their co-evolution with specific pathogens (Michelmore and Meyers, 1998). In the case of Xa21, the most polymorphic region is its extracellular LRR domain, which is responsible for pathogen specificity (Ellis ) and defines the relationships in the dendrogram presented here. The PR-2 dendrogram topology showed two main clades, as expected: monocots and dicots. The grouping of monocots followed the taxonomic relationship, segregating Musa and Zingiber (Zingiberales) from Oryza and Zea (Poaceae). It was possible to identify that a symplesiomorphic character united all dicots, reflecting their common origin. 
Moreover, within the Magnoliopsida group, the evolutionary model of the PR-2 class seemed to follow a synapomorphic pattern, leading to its diversification into different groups comprising families and orders, which probably reflects divergent processes affecting this PR gene. The studied organisms have different centers of origin, habitats and life cycles, as well as different levels of tolerance, resistance and sensitivity to diverse kinds of biotic and abiotic stresses. Nonetheless, from an overall perspective and considering the position of the different species in the dendrograms, it is evident that genes of both the Xa21 and PR-2 pathways were present in a common ancestor of the angiosperms, since they appear relatively conserved in different plant groups. Many PR genes are constitutively expressed in given plant tissues (Velazhahan and Muthukrishnan, 2003; Liu ), suggesting a link between biotic and abiotic stresses and indicating that at least some PR proteins play important roles in plant development, besides their role in defense responses. This may explain why expression of the PR-2 gene can be observed at a basal level in almost all tissues, as seen from its frequencies in the soybean libraries. Studies carried out by Li and Libault revealed consistent differences in gene expression patterns among diverse tissues, especially between roots and aerial tissues, but also revealed similarities between expression levels in tissues such as flowers and leaves, corroborating our results. The most represented library was that of seeds, including different developmental stages, which is not surprising, since previous evaluations also revealed that the soybean grain contains the vast majority of expressed genes and regulatory sequences in the plant (Cannon ). In the case of the PR-2 protein, it is interesting to note that previous evaluations carried out in tobacco by Leubner-Metzger (2005) suggest that this gene could play a role in seed germination. 
Furthermore, the expression of both genes was also increased in leaves, roots and flowers, confirming their prevalence in developing tissues. As mentioned above, abiotic stress is able to trigger diverse plant responses. After an initial massive distribution of energy triggered by stress, a wide array of defense mechanisms is activated by R genes, inducing a signal cascade and increased PR gene transcription (Vergne ). This may explain the considerable number of soybean SuperSAGE tags related to these genes among the three comparisons considered, with considerable representation in both biotic and abiotic (drought) conditions, as well as in the negative controls, and with many tags represented in more than one treatment. The high number of tags that matched the BR-16 drought-susceptible library vs. control comparison could be explained by the ability of the plant to continue expressing genes related to systemic acquired resistance as a consequence of contact with any kind of previous stress, a crosstalk previously reported for other plants (Durrant and Dong, 2004; Kido ). Comparing the distribution between R and PR genes, both were well represented, with 1,072 tags matching R genes and 481 tags matching PR candidates, indicating that additional analytical efforts regarding the SuperSAGE candidates will reveal not only associations with specific situations, but also allelic differences important in the definition of biotic and abiotic stress responses.
Flowering plants originated approximately 200 million years ago (Wilkstrom ) and subsequently diverged into several lineages. Legumes are an old family believed to have originated approximately 54 Mya (Lavin ). Soybean and other papilionoid legumes show evidence of an older shared duplication, and soybean probably underwent polyploidy 13 Mya (Shoemaker ). These duplications are widely evident, both in the number of similar duplicated genes and in large areas of synteny between chromosomal regions.
Previous evidence indicates extensive similarities in gene densities and distribution between soybean and Medicago, implying that a given Medicago region is likely to correspond well with two soybean regions (Mudge ). This evidence suggests that Medicago could represent “a simplified draft” of the soybean gene distribution, making an evaluation of the distribution of soybean R and PR orthologs in this crop most desirable. Hence, it is not surprising that all identified soybean R and PR transcripts appeared anchored in 1,253 sites in all segments of Medicago virtual chromosomes. The R gene-rich regions found in chromosomes 2, 7, 8 and 9 confirm previous observations that most resistance genes reside in clusters (Kanazin ), as reported in maize (Dinesh-Kumar ), lettuce (Maisonneuve ), oat (Rayapati ) and flax (Ellis ). The formation of gene clusters is in general associated with a common ancestor, and the diversification of these genes is the result of duplication processes followed by divergence due to pathogen or environmental pressure. Clustering of R genes corroborates the existing theory that a common genetic mechanism involving duplication has been responsible for the evolution and diversification of this gene superfamily (Hulbert ). The four clusters presented similarities with distinct segments in the same chromosome, probably reflecting tandem gene duplication mechanisms. Such duplicated copies tend to diverge by acquiring additional mutations and may specialize or optimize to play slightly different roles (Alberts ). Regarding the duplicated segments considering the entire genome, 58 genes could be identified in at least two distinct chromosomes. Unlike tandem duplications, repetitions in distinct chromosomes result from duplication events followed by translocations and sequence divergence, also allowing functional diversification (Wendell, 2000; Thiel ).
There is also evidence that transposition outbreaks could be activated by severe environmental biotic or abiotic stress. Still regarding the analysis of duplication events, a large tandem repetition was evident in both chromosomes 3 and 6, represented by the genes Xa1/I2 and RRS1, respectively. Previous reports suggested that once duplicated, genes in tandem repetitions may expand rapidly through events of unequal crossing over, since the character could confer an advantage to the organism (Alberts ), in this case a higher diversity of genes associated with resistance and stress response. This evidence supports the assumption that future efforts to increase pathogen resistance may rely on biotechnological approaches that consider whole gene clusters naturally associated in neighboring positions, rather than isolated genes (Dafny-Yelin and Tzfira, 2007), as has been traditionally done.
In conclusion, the sequences identified here represent valuable resources for the soybean breeding program, allowing their use in biotechnological approaches, with emphasis on transgenes. They are also valuable for mapping purposes, considering the putative distribution uncovered here on the basis of the available distribution of genes known from the Medicago genome. Considering the gene diversity revealed especially by the SuperSAGE approach, their association with specific responses to biotic or abiotic stress conditions may reveal important gene variants for germplasm screening in the search for new accessions useful for breeding purposes, especially in association with marker assisted selection (MAS), saving decades of laborious research.

61 in total

Review 1. Overview on plant antimicrobial peptides.

Authors: Ana Maria Benko-Iseppon; Suely Lins Galdino; Tercilio Calsa; Ederson Akio Kido; Alessandro Tossi; Luis Carlos Belarmino; Sergio Crovella
Journal: Curr Protein Pept Sci Date: 2010-05 Impact factor: 3.272

2. A linkage map of diploid Avena based on RFLP loci and a locus conferring resistance to nine isolates of Puccinia coronata var. 'avenae'.

Authors: P J Rayapati; J W Gregory; M Lee; R P Wise
Journal: Theor Appl Genet Date: 1994-12 Impact factor: 5.699

Review 3. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process.

Authors: R W Michelmore; B C Meyers
Journal: Genome Res Date: 1998-11 Impact factor: 9.043

4. Overexpression of Pto activates defense responses and confers broad resistance.

Authors: X Tang; M Xie; Y J Kim; J Zhou; D F Klessig; G B Martin
Journal: Plant Cell Date: 1999-01 Impact factor: 11.277

5. The A. thaliana disease resistance gene RPS2 encodes a protein containing a nucleotide-binding site and leucine-rich repeats.

Authors: M Mindrinos; F Katagiri; G L Yu; F M Ausubel
Journal: Cell Date: 1994-09-23 Impact factor: 41.582

Review 6. Ethylene as a modulator of disease resistance in plants.

Authors: Leendert C van Loon; Bart P J Geraats; Huub J M Linthorst
Journal: Trends Plant Sci Date: 2006-03-10 Impact factor: 18.313

7. Reductase activity encoded by the HM1 disease resistance gene in maize.

Authors: G S Johal; S P Briggs
Journal: Science Date: 1992-11-06 Impact factor: 47.728

8. Cluster analysis and display of genome-wide expression patterns.

Authors: M B Eisen; P T Spellman; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A Date: 1998-12-08 Impact factor: 11.205

Review 9. Systemic acquired resistance.

Authors: W E Durrant; X Dong
Journal: Annu Rev Phytopathol Date: 2004 Impact factor: 13.078

10. Transcriptional analysis of highly syntenic regions between Medicago truncatula and Glycine max using tiling microarrays.

Authors: Lei Li; Hang He; Juan Zhang; Xiangfeng Wang; Sulan Bai; Viktor Stolc; Waraporn Tongprasit; Nevin D Young; Oliver Yu; Xing-Wang Deng
Journal: Genome Biol Date: 2008-03-19 Impact factor: 13.583
