Genomic Organization and Comparative Phylogenic Analysis of NBS-LRR Resistance Gene Family in Solanum pimpinellifolium and Arabidopsis thaliana.

Huawei Wei1,2, Jia Liu3, Qinwei Guo4, Luzhao Pan1, Songlin Chai1, Yuan Cheng1, Meiying Ruan1, Qingjing Ye1, Rongqing Wang1, Zhuping Yao1, Guozhi Zhou1, Hongjian Wan1,5.   

Abstract

NBS-LRR (nucleotide-binding site and leucine-rich repeat) is one of the largest resistance gene families in plants. The completion of the genome sequencing of wild tomato Solanum pimpinellifolium provided an opportunity to conduct a comprehensive analysis of the NBS-LRR gene superfamily at the genome-wide level. In this study, gene identification, chromosome mapping, and phylogenetic analysis of the NBS-LRR gene family were performed using bioinformatics methods. The results revealed 245 NBS-LRRs in total, similar to the number in cultivated tomato. These genes are unevenly distributed on 12 chromosomes, and ~59.6% of them form gene clusters, most of which are tandem duplications. Phylogenetic analysis divided the NBS-LRRs into 2 subfamilies (CNL, coiled-coil NBS-LRR, and TNL, TIR NBS-LRR), and the expansion of the CNL subfamily was more extensive than that of the TNL subfamily. Novel conserved structures were identified through conserved motif analysis of the CNL and TNL subfamilies. Comparison with the NBS-LRR sequences from the model plant Arabidopsis thaliana showed that wide genetic variation occurred after the divergence of S. pimpinellifolium and A. thaliana. Species-specific expansion was also found in the CNL subfamily in S. pimpinellifolium. The results of this study provide a basis for deeper analysis of NBS-LRR resistance genes and contribute to the mapping and isolation of candidate resistance genes in S. pimpinellifolium.
© The Author(s) 2020.

Keywords:  NBS-LRR; Solanum pimpinellifolium; comparative analysis; duplicated genes

Year:  2020        PMID: 32214791      PMCID: PMC7065440          DOI: 10.1177/1176934320911055

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Plants are surrounded by a wide variety of pathogens such as viruses, bacteria, fungi, nematodes, and aphids during their growth and development.[1] Some pathogens have successfully invaded crops and caused severe damage to agricultural production and crop quality. To cope with disease attacks, plants have evolved a series of sophisticated defense mechanisms against various pathogens. Previous studies have shown that plant disease resistance (R) proteins play an essential role in the direct or indirect recognition of the corresponding pathogens.[2] The plant-pathogen interaction model is known as the “gene-for-gene” hypothesis.[3] In this hypothesis, an incompatible interaction between host R proteins and pathogen Avr proteins produces a defense response termed the hypersensitive response, which impedes pathogen progression via a variety of mechanisms, including localized programmed cell death and correlated immune responses.[4] Numerous plant R genes have now been cloned; they not only confer resistance to a wide range of pathogens but also play a vital role in resistance to abiotic stress.[1,5,6] Researchers have divided R genes into at least 5 classes, including NBS-LRR (Nucleotide Binding Site and Leucine-Rich Repeat domains), LRR-TM (Leucine-Rich Repeat plus Transmembrane Receptor), STK (Serine-Threonine Kinase), RLK (Receptor-Like Kinase), and SA-CC (Signal Anchor plus Coiled-Coil).[7] Among them, the NBS-LRR family is the largest class of known R proteins in the plant kingdom,[1,7] and its encoded proteins are an important part of the plant defense system. In Arabidopsis, the NBS and LRR domains have been reported to play different roles in the plant-microbe interaction. The former can bind and hydrolyze ATP or GTP, while the latter is involved in protein-protein interactions.[8,9]

It is well known that NBS-LRR proteins contain 3 main domains: the N-terminal, NBS, and LRR domains.[5,9,10] The N-terminal domain takes 1 of 2 forms: the TIR (Toll/interleukin-1 receptor) structure or a non-TIR structure, usually a CC (coiled-coil). In Arabidopsis, the N-terminal domain has been identified as similar to residues that can enhance gene expression and protein stability.[5] The TIR domain of the TNL proteins contains ~175 amino acid residues, and some conserved motifs (TIR-1, TIR-2, TIR-3, and TIR-4) have been reported in this domain of plant NBS-LRRs.[9,11] The N-terminal CC domain, a characteristic motif in the N-terminus of the CNL R proteins, plays an important role in protein-protein interactions.[12,13] Although the specific mechanism by which the TIR and CC domains interact with pathogens remains unclear, it has been reported that they can stimulate EDS1 (enhanced disease susceptibility 1) and NDR1 (nonrace-specific disease resistance 1), respectively, in the downstream signaling system when the R gene recognizes the pathogen.[14,15] The NBS domain, composed of ~300 amino acids, is the main structural domain of the NBS-LRR R genes. Eight distinct conserved motifs (P-loop, RNBS-A, Kinase2, RNBS-B, RNBS-C, GLPL, RNBS-D, and MHDV) have been confirmed in this domain.[5,16] However, these 8 motifs are not completely conserved in each subfamily.
It has been shown that the P-loop, Kinase2, and GLPL motifs are highly similar between the TNL and CNL subfamilies, while the RNBS-A, RNBS-D, and RNBS-C motifs are less similar between the TNLs and CNLs of Arabidopsis.[9] Because it is more conserved than the other 2 domains, the NBS domain is usually used for designing primers to amplify resistance genes.[17] The LRR region, located C-terminal to the NBS domain in many R genes, is characterized by leucine-rich repeats of ~24 amino acids. The LRR domain also shows higher variation than neighboring regions and is assumed to play a key role in the resistance function.[18]

The NBS-LRR superfamily, one of the largest gene families in plant genomes, has become the core of the resistance research field. NBS-LRR genes have been studied in many monocotyledonous and dicotyledonous plants, including Arabidopsis, chickpea, potato, rice, cassava, maize, Brassica napus, poplar, Medicago, sorghum, papaya, grape, and lotus.[5,18-29] These studies show that the NBS-LRR gene family varies among plants in gene number, structural features, and evolutionary pattern.[30] For example, the Arabidopsis genome has approximately 165 NBS-LRRs (112 TNLs and 53 CNLs). A total of 333 NBS-encoding genes have been identified in the Medicago genomic sequence and 92 NBS-LRR genes in B. napus.[23,25]

Tomato (Solanum lycopersicum L.) is one of the most important vegetable crops, whose production, productivity, and quality are adversely affected by abiotic and biotic stresses.[31] It is well known that abiotic stresses such as drought, extreme temperature, and high salinity affect almost every stage of a plant’s life cycle.[32-40] Depending on the plant stage and the duration of the stress, abiotic stress causes serious yield loss.[41-46] Furthermore, compared with abiotic stress-related genes, which are widely induced by exogenous hormones and abiotic stress factors,[47-60] plant NBS-LRR resistance genes are rarely induced. Although several wild tomato species carry stress tolerance genes, it is usually very difficult to transfer them into cultivars because of high genetic distance and crossing barriers. By contrast, biotic stress causes significant yield reductions that can be addressed using R genes from wild plant species. A large number of R genes have been found in a wild relative, Solanum pimpinellifolium, which is a critical source of R genes in tomato breeding. The completion of plant genome sequencing provides a rare opportunity to study members of gene families at the whole-genome level.[61-64]

In this study, we identified NBS-LRR-encoding genes in the whole S. pimpinellifolium genome. As in the NBS-LRR superfamily of other plants, these genes could be divided into 2 subfamilies, TNL and CNL. The NBS-LRR genes were unevenly distributed on the chromosomes, and many were located in clusters. We also sought to provide further insights into the diversity or similarity of NBS-LRRs between S. pimpinellifolium and both cultivated tomato and Arabidopsis via a genome-wide comparative analysis.

Materials and Methods

Identification of NBS-LRRs in S. pimpinellifolium

The NBS-LRR genes of S. pimpinellifolium were obtained, for the first time, in 3 steps. In the first step, protein sequences of the wild tomato S. pimpinellifolium were downloaded from the Sol Genomics Network (SGN, http://solgenomics.net/) database. In the second step, a local database was created from the downloaded S. pimpinellifolium protein datasets using BioEdit software. In the third step, the protein sequences of the NBS domain of the NBS-encoding sequences from Arabidopsis were used as query sequences to search for NBS-LRR candidate genes in the S. pimpinellifolium genome. Subsequently, these candidates were submitted to the online software PFAM (http://pfam.sanger.ac.uk/) to determine whether TIR, NBS, and LRR domains were present. Because the CC domain may contain smaller individual motifs or be too divergent among proteins, it was further identified using the COILS server (http://www.ch.embnet.org/software/COILS_form.html). Finally, all NBS-LRR genes were classified according to the presence or absence of TIR, CC, NBS, or LRR domains.
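The final classification step can be illustrated with a minimal Python sketch. It assumes that the domain hits for each protein have already been collected (for example, from the PFAM and COILS results) into a list of domain names. The function name `classify_nbs_protein` and the rule that TIR takes precedence when both TIR and CC are reported are illustrative assumptions, not part of the published pipeline.

```python
from typing import Iterable


def classify_nbs_protein(domains: Iterable[str]) -> str:
    """Map a set of detected domains to a subfamily letter code (e.g. CNL)."""
    found = {d.upper() for d in domains}
    if "NBS" not in found:
        raise ValueError("an NBS domain is required for classification")
    # Assumption: if both TIR and CC were reported, TIR takes precedence.
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    # The trailing "L" is added only when an LRR domain was detected.
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"


if __name__ == "__main__":
    for hits in (["CC", "NBS", "LRR"], ["TIR", "NBS"], ["NBS"]):
        print(hits, "->", classify_nbs_protein(hits))
```

The six possible return values (CNL, TNL, CN, TN, NL, and N) correspond to the letter codes used in Table 1.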

Chromosome mapping of the NBS-LRR genes in S. pimpinellifolium

The physical position of each NBS-LRR gene on the S. pimpinellifolium chromosomes was retrieved from the tomato genome database (http://mips.helmholtz-muenchen.de/plant/tomato/searchjsp/index.jsp). The software MapDraw V2.1 was used to draw chromosomal maps of NBS-LRRs.[65] Gene clusters and tandem duplications were then analyzed based on previously established criteria. A gene cluster was defined as a region in which neighboring homologous genes were less than 200 kb apart and separated by fewer than 8 non-NBS-encoding genes.[9,26] Tandemly duplicated genes were determined based on Huang’s definition, in which a tandem-duplicated candidate is a region including 2 or more adjacent homologous genes within 100 kb that share more than 70% similarity.[66]
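The two criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Gene` record, the use of start-to-start distance, and the `order` field (rank of the gene among all annotated genes on its chromosome, used to count intervening non-NBS genes) are all hypothetical, and every input gene is assumed to be an NBS-encoding homolog.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # start coordinate in bp
    order: int   # rank among all annotated genes on the chromosome


def find_clusters(nbs_genes: List[Gene],
                  max_gap_bp: int = 200_000,
                  max_intervening: int = 8) -> List[List[Gene]]:
    """Group neighboring NBS genes that are < max_gap_bp apart and have
    fewer than max_intervening non-NBS genes between them."""
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in sorted(nbs_genes, key=lambda g: (g.chrom, g.start)):
        if current:
            prev = current[-1]
            intervening = gene.order - prev.order - 1
            if (gene.chrom == prev.chrom
                    and gene.start - prev.start < max_gap_bp
                    and intervening < max_intervening):
                current.append(gene)
                continue
            if len(current) >= 2:      # singletons are not clusters
                clusters.append(current)
        current = [gene]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def is_tandem_pair(a: Gene, b: Gene, similarity: float) -> bool:
    """Tandem criterion: same chromosome, within 100 kb, similarity > 70%.
    Adjacency of the two genes is assumed to be checked by the caller."""
    return (a.chrom == b.chrom
            and abs(a.start - b.start) <= 100_000
            and similarity > 0.70)
```

Usage: pass all NBS-encoding genes to `find_clusters` and then test adjacent pairs inside each cluster with `is_tandem_pair`, supplying a pairwise similarity computed separately (for example, from an alignment).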

Phylogenetic analysis of the NBS-LRR genes

All NBS-LRR genes with CNL and TNL domains were selected for phylogenetic analysis. The P-loop to GLPL region of each selected NBS-LRR member was extracted, and multiple sequence alignment was performed using ClustalX 1.83. Phylogenetic analysis of the NBS-LRR gene family was then performed using the neighbor-joining (NJ) and maximum likelihood (ML) methods. For NJ analysis, MEGA X software was used[67] with the following parameters: 1000 bootstrap replications, the p-distance model, and pairwise deletion of gaps. For ML analysis, ProtTest (version 2.4) software was used for model selection[68] and PhyML (version 3.0) software was used to construct ML trees with the Whelan and Goldman amino acid substitution model, γ-distribution, and 100 nonparametric bootstrap replicates.[69] Homologous comparison and phylogenetic analysis were also carried out between the amino acid sequences of S. pimpinellifolium and Arabidopsis NBS-LRR genes. First, NBS-LRR protein sequences of the Arabidopsis genome were obtained from Phytozome v12 (https://phytozome.jgi.doe.gov/pz/portal.html). The amino acid sequences of the P-loop to GLPL region from the CNLs and TNLs of S. pimpinellifolium and Arabidopsis were then used to construct a separate phylogenetic tree for each subfamily. The method and parameter settings for building the phylogenetic trees were the same as described above.
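To make the NJ distance setting concrete, the following Python sketch computes the p-distance between aligned sequences with pairwise deletion of gaps: for each pair, only alignment columns where neither sequence has a gap are compared, and the distance is the proportion of compared sites that differ. The function names and the toy sequences are illustrative assumptions; the study itself used MEGA X for this calculation.

```python
from typing import Dict, List


def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?") -> float:
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of pairwise p-distances for a name -> sequence mapping."""
    names: List[str] = list(alignment)
    return {x: {y: p_distance(alignment[x], alignment[y]) for y in names}
            for x in names}


if __name__ == "__main__":
    toy = {"geneA": "GMGGVGKTT", "geneB": "GMGG-GKTS", "geneC": "GMGGIGKTT"}
    for name, row in distance_matrix(toy).items():
        print(name, row)
```

A matrix of this kind is the input to the neighbor-joining step; bootstrap support is obtained by resampling alignment columns and repeating the calculation.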

Detection of the conserved motif

The NBS-encoding genes could be divided into CNL and TNL subfamilies based on phylogenetic analysis. To investigate the structural features of these genes, the sequences and distribution of the conserved motifs were analyzed individually using Multiple Expectation Maximization for Motif Elicitation (MEME) (http://meme-suite.org/tools/meme), and the parameters were set as follows: the maximum and minimum lengths of the conserved motif were 50 and 6, respectively, and the largest number of conserved motifs was 20; other parameters used the default settings.[70] Conservation or variation of each motif among NBS-LRR members was presented.

Results

Identification and classification of the NBS-LRR gene family in S. pimpinellifolium

Previous studies showed that the NBS-LRR gene family in plants contained different conserved domains, and the NBS-LRR gene family was further divided into 6 different subfamilies according to the domains.[5] In this study, we identified a total of 245 NBS-LRR resistance genes in S. pimpinellifolium, and they were classified into 6 NBS-LRR subfamilies: CC-NBS-LRR (CNL), TIR-NBS-LRR (TNL), CC-NBS (CN), TIR-NBS (TN), NBS-LRR (NL), and NBS (N), respectively. The number of genes corresponding to each subfamily is shown in Table 1.
Table 1.

Number and classifications of NBS-LRR genes.

Predicted protein   Letter code   S. pimpinellifolium   S. lycopersicum[71]   Potato[19]   Arabidopsis[5]   Maize[22]   Rice[20]
CC-NBS-LRR          CNL           78                    93                    65           51               58          402
TIR-NBS-LRR         TNL           15                    18                    37           92               0           0
CC-NBS              CN            54                    18                    24           6                11          53
TIR-NBS             TN            7                     5                     12           21               0           0
NBS-LRR             NL            29                    66                    177          6                31          74
NBS                 N             62                    52                    104          1                7           16
Total                             245                   252                   419          177              107         545

Abbreviations: CC, coiled-coil; LRR, leucine-rich repeat; NBS, Nucleotide-binding site; TIR, toll/interleukin-1 receptor.
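As a consistency check on Table 1, the short Python sketch below sums the subfamily counts for each species and compares them with the reported totals. The dictionary simply transcribes the counts as read from the table; the variable and function names are illustrative.

```python
# Subfamily counts transcribed from Table 1, in the order CNL, TNL, CN, TN, NL, N.
SUBFAMILIES = ("CNL", "TNL", "CN", "TN", "NL", "N")
TABLE1 = {
    "S. pimpinellifolium": (78, 15, 54, 7, 29, 62),
    "S. lycopersicum": (93, 18, 18, 5, 66, 52),
    "Potato": (65, 37, 24, 12, 177, 104),
    "Arabidopsis": (51, 92, 6, 21, 6, 1),
    "Maize": (58, 0, 11, 0, 31, 7),
    "Rice": (402, 0, 53, 0, 74, 16),
}
REPORTED_TOTALS = {
    "S. pimpinellifolium": 245,
    "S. lycopersicum": 252,
    "Potato": 419,
    "Arabidopsis": 177,
    "Maize": 107,
    "Rice": 545,
}


def column_totals(table):
    """Sum the six subfamily counts for each species."""
    return {species: sum(counts) for species, counts in table.items()}


def count(species, subfamily):
    """Look up one cell of the table by species and letter code."""
    return TABLE1[species][SUBFAMILIES.index(subfamily)]


if __name__ == "__main__":
    for species, total in column_totals(TABLE1).items():
        status = "ok" if total == REPORTED_TOTALS[species] else "MISMATCH"
        print(f"{species}: {total} ({status})")
```

The same lookup reproduces the wild versus cultivated tomato differences discussed in the text, for example `count("S. pimpinellifolium", "CN") - count("S. lycopersicum", "CN")` for the CN subfamily.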

Among them, CNL and TNL were the most typical subfamilies, with 78 and 15 NBS-LRR members, respectively. The N and CN subfamilies contained 62 and 54 genes, respectively. In addition, 29 genes were assigned to the NL subfamily, and only 7 genes were predicted to encode a TIR domain without an LRR domain and were therefore assigned to the TN subfamily, the smallest of the 6 subfamilies. These 245 NBS-LRR genes were thus unevenly distributed among the 6 subfamilies. Together, high genetic variation was observed in S. pimpinellifolium.

Comparative analysis of the NBS-LRR genes between S. pimpinellifolium and S. lycopersicum

In addition to comparing the number of genes among the subfamilies in S. pimpinellifolium, we also examined the number of genes in these subfamilies in other species (Table 1). The total number of NBS-LRR genes and the number of genes in each subfamily differed markedly among plants. First, S. lycopersicum and S. pimpinellifolium have a similar total number of NBS-LRR genes (252[71] and 245, respectively), but there are substantial differences in the size of some subfamilies. For example, wild tomato has 36 more CN-type genes than cultivated tomato but 37 fewer NL-type genes. Second, 419 NBS-LRR genes were found in potato, also of the genus Solanum, among which 65 belonged to the CNL subfamily and 37 to the TNL subfamily. In this context, we further compared the number and chromosomal distribution of NBS-LRR genes between S. pimpinellifolium and S. lycopersicum. The distribution of genes on the chromosomes of S. pimpinellifolium was mapped (Figure 1): 243 of the 245 NBS-LRR genes could be located on the 12 chromosomes (Figure 1 and Table 2), whereas the remaining 2 (Sopim00g102400.0.1 and Sopim00g294230.0.1) could not be located on any chromosome and were assigned to the fictitious chromosome 0. A pairwise comparison of NBS-LRR genes on the corresponding chromosomes of wild and cultivated tomato is presented in Supplemental Table S1. The NBS-LRRs of these 2 tomato species are clearly unevenly distributed across the chromosomes, in accordance with previous reports in Arabidopsis, Medicago truncatula, Populus trichocarpa, and Brassica rapa.[5,23-25] Among the 12 chromosomes, chromosome 4 carried the largest number of NBS-LRRs, with 52 members. By contrast, chromosome 3 carried the smallest number of NBS-LRRs, with only 7 genes (Table 2 and Supplemental Table S1).
Figure 1.

Chromosomal locations and duplications of the paralogous NBS-LRRs on S. pimpinellifolium chromosomes. Chromosome numbers are shown at the top of each bar. Predicted tandem-duplicated genes are indicated by thick blue lines. Gene clusters are marked by black braces. CC indicates coiled-coil; LRR, leucine-rich repeat; NBS, nucleotide-binding site; TIR, toll/interleukin-1 receptor.

Table 2.

Distribution of NBS-LRR genes on different chromosomes in S. lycopersicum and S. pimpinellifolium.

Chromosome     Gene number
               S. pimpinellifolium   S. lycopersicum
Chr 00         2                     2
Chr 01         13                    16
Chr 02         13                    14
Chr 03         7                     7
Chr 04         52                    53
Chr 05         31                    34
Chr 06         16                    16
Chr 07         19                    17
Chr 08         11                    10
Chr 09         17                    17
Chr 10         19                    21
Chr 11         28                    27
Chr 12         17                    18
Total number   245                   252

Gene cluster and tandem duplication of the NBS-LRR genes in S. pimpinellifolium

The distribution of NBS-LRR genes across the chromosomes was used to further analyze the evolutionary patterns of gene expansion (Figure 1). In Figure 1, 6 different colors represent the genes from the 6 NBS-LRR subfamilies. We found that CNL-type NBS-LRRs were spread across all chromosomes, while TNL genes were selectively distributed on chromosomes 3, 6, 8, 10, and 12. It has previously been shown that most NBS-encoding genes are arranged in clusters on chromosomes.[5,25] A cluster of NBS-LRR genes has been defined as a region in which neighboring homologous genes are less than 200 kb apart, with fewer than 8 non-NBS-encoding genes between TNLs and CNLs.[5,19,28] Based on these criteria, NBS-LRR gene clusters were identified (Figure 1). A total of 49 gene clusters, including 146 NBS-LRR genes, were found in wild tomato S. pimpinellifolium. In other words, only ~40% of the genes did not reside in clusters. Chromosome 4 had the most gene clusters (11), which comprised 44 genes, or ~84.6% (44/52) of the NBS-LRR genes on this chromosome, and included members of 3 different subfamilies, while chromosomes 1, 2, and 3 each possessed only 1 gene cluster (Figure 1). All of the gene clusters consisted of 2 to 8 genes. Clusters containing genes from different subfamilies were common in S. pimpinellifolium: 27 of the 49 gene clusters were made up of genes from different subfamilies, which are a vital source of gene variation.[72] Tandem duplication of NBS-LRRs was also analyzed in S. pimpinellifolium (Figure 1). In total, 25 tandem duplication events involving 80 NBS-LRR members were detected. Remarkably, all of the tandem-duplicated NBS-LRRs were anchored in gene clusters. The biggest gene cluster on chromosome 4 was also the tandem duplication with the largest number of genes. Hence, we speculated that tandem duplication was an important factor in the evolution of NBS-LRR gene clusters. Of the 25 tandem duplication events, 14 were made up of NBS-LRR members from a single subfamily (Figure 1).
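The proportions quoted in this section follow directly from the reported counts. The short Python sketch below recomputes them; the helper name `percent` and the rounding to one decimal place are illustrative choices.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the text for S. pimpinellifolium.
TOTAL_GENES = 245
CLUSTERED_GENES = 146
CHR4_GENES = 52
CHR4_CLUSTERED = 44
TOTAL_CLUSTERS = 49
MIXED_SUBFAMILY_CLUSTERS = 27
TANDEM_EVENTS = 25
SINGLE_SUBFAMILY_TANDEM_EVENTS = 14

if __name__ == "__main__":
    print("genes in clusters (%):", percent(CLUSTERED_GENES, TOTAL_GENES))
    print("chr 4 genes in clusters (%):", percent(CHR4_CLUSTERED, CHR4_GENES))
    print("mixed-subfamily clusters (%):",
          percent(MIXED_SUBFAMILY_CLUSTERS, TOTAL_CLUSTERS))
    print("single-subfamily tandem events (%):",
          percent(SINGLE_SUBFAMILY_TANDEM_EVENTS, TANDEM_EVENTS))
```

The first two values correspond to the ~59.6% of genes in clusters given in the Abstract and the ~84.6% given for chromosome 4.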

Phylogenetic analysis of the NBS-LRR genes in S. pimpinellifolium

To explore the evolutionary relationships of NBS-LRR genes, we conducted phylogenetic analyses with an alignment of the NBS domain from these 2 species using the neighbor-joining (NJ) and maximum likelihood (ML) methods. ML analysis showed that proteins from different species cluster together in clades with high support values (not shown), and NJ analysis supported most of these results. Therefore, in the present study, the NJ tree of the NBS-LRR genes from the TNL and CNL subfamilies was selected for analysis. Four members (Sopim04g008170.0.1, Sopim11g069660.0.1, Sopim11g043070.0.1, and Sopim10g055080.0.1) were excluded because their NBS domains were incomplete. The CNL and TNL subfamilies were separated from each other in the phylogenetic tree, and CNL was further divided into 8 small branches, CNL1 to CNL8 (Figure 2). By contrast, the TNL subfamily remained as 1 branch owing to its smaller number of genes. Moreover, the CNL1 and CNL7 branches each contained 16 NBS-LRRs from 7 different chromosomes, and these 2 branches had the largest gene numbers in the phylogenetic tree. The CNL5 branch had the fewest genes, containing only 2 genes from chromosomes 8 and 11 (Figures 1 and 2).
Figure 2.

Phylogenetic relationship of NBS-LRRs from the CNL and TNL subfamilies in S. pimpinellifolium. All of these proteins were grouped into 9 clades (CNL1 to CNL8 and TNL). Numbers above branches represent the support values.

To date, several NBS-LRR-encoding genes have been cloned, such as Nrc1, I2, Bs4, Hero, and Rpi-blb1.[73-76] Sequence alignment showed that members of the NBS-LRR gene family from S. pimpinellifolium have high sequence similarity with these cloned R genes in Solanaceae (Supplemental Table S2). For example, NBS-LRR genes in the CNL7 branch had high homology with the late blight resistance proteins Rpi-bt1, Rpi-blb1/RB, R3a, and RGA2, as well as with I2, the R protein against Fusarium oxysporum. In addition, each of 4 branches (CNL2, CNL3, CNL4, and CNL6) had 1 gene homologous to a known R protein (Supplemental Table S2). Both Sopim06g008440.0.1 and Sopim06g008450.0.1 were mapped in a gene cluster on chromosome 6 (Figure 1), and they had high homology with adjacent members of the cluster (Supplemental Table S2). The same findings were also uncovered in CNL4. However, no gene clusters were identified in CNL3 and CNL6 (Figure 1). Genes homologous to the R proteins against leaf curl virus and tobacco mosaic virus were also found in the TNL branch, but they exist as single genes. This may explain why the TNL gene family in S. pimpinellifolium had a lower degree of expansion in the process of evolution. Apart from the members of the CNL and TNL subfamilies, NBS-LRR genes from other subfamilies also showed high homology with known functional genes. As shown in Supplemental Table S2, all subfamilies except the TN subfamily had genes homologous to known resistance genes. Twenty-one known resistance genes, involving multiple resistances to various pathogens, had as high as 80% similarity with the NBS-LRR genes in S. pimpinellifolium. It is worth mentioning that the similarity of 2 pairs of analogous genes, Sopim01g090430.0.1 and Nrc1, and Sopim05g007850.0.1 and Bs4, was as high as 100%. The S. pimpinellifolium NBS-LRR genes with homology to known resistance genes were mapped on 10 chromosomes of the wild tomato, that is, on all chromosomes except chromosomes 3 and 10. Only 6 of these genes existed alone, and all of the others were located in gene clusters.

Conserved motif analysis of the NBS-LRR genes

To uncover the structural characteristics of the NBS-LRR gene family, MEME was applied to analyze the structure and distribution of the conserved motifs among the TNL and CNL subfamilies. Twenty distinct motifs were determined in each subfamily (Tables 3 and 4). All of the conserved motifs displayed a diversity distribution in their respective subfamilies (Supplemental Figures S1-TNL and S2-CNL1-4, respectively).
Table 3.

Conserved motifs of the TNL subfamily in S pimpinellifolium.

Domain   Motif number   Motif             Width   E-value     Motif sequence
TIR      Motif01        T-1               40      1.20E-236   xkYDVFLSFRGEDTRxtFtxHLYxaLxnrGIxTFxDdkrL
TIR      Motif03        T-2               29      2.00E-197   AIeeSxxaxvIFSkNYAxSrWCLxELVkI
TIR      Motif02        T-3               29      7.70E-198   qxViPvFYdVDPShVRxQxesfxeaFxkH
TIR      Motif08        T-4               21      9.30E-104   VxrWRxALxxAAdlxGxDxxn
NBS      Motif04        P-loop            30      2.40E-192   dVRixGIwGxGGIGKTTiAkAxFdxlxxxF
NBS      Motif05        Kinase-2/RNBS-B   40      1.80E-246   kKVLiVLDDvDhxdqLdyLagxxxWFGxGSRIIxTTRdKH
NBS      Motif09        RNBS-C            26      1.10E-96    AxxLFnxhAFkxxxPxxxFxxlsxeV
NBS      Motif12        GLPL              15      7.90E-80    VxhAxGLPLALKVlG
NBS      Motif20        RNBS-D            22      1.50E-39    sxLhkrxxxxWrxtvxxlKxxp
NBS      Motif13        TNBS-1            15      5.80E-68    dqxiFLDIACFfrGk
NBS      Motif18        TNBS-2            29      2.90E-65    VxqILesCdFgAexGlxVLIdkSLVfISx
LRR      Motif07        L-1               21      8.00E-120   xnxixMHdLIqeMGxxiVrxe
LRR      Motif10        L-2               29      6.90E-92    gkxSRlWxxeDxxxVlxxntgTxavEgIw
LRR      Motif17        L-3               21      3.80E-58    LPENWYVsDNFLGFAVCYSGn
LRR      Motif14        L-4               21      1.00E-84    yLPnxLRWlxWxxyPlxSlPx
LRR      Motif06        L-5               29      5.40E-153   TPDFsgmPnLExLxLxxCxnLxEVHxSlG
LRR      Motif15        L-6               29      4.00E-97    PxSIcxLkxLxxLxxsxCxkLexlPexiG
LRR      Motif16        L-7               29      6.60E-90    dxxxPxDigxLSsLxxLxLxgNNFxxLPx
LRR      Motif19        L-8               40      5.30E-52    TqLPEFPxQLDTIxADWSNDxICNSLFQNISsfQHDISAS
LRR      Motif11        L-9               29      6.70E-83    aIHFFLVPLAGLWdTSkANGkTPNDYglI

Abbreviations: LRR, leucine-rich repeat; NBS, nucleotide-binding site; TIR, toll/interleukin-1 receptor.

Table 4.

Conserved motifs of the CNL subfamily in S pimpinellifolium.

Domain   Motif number   Motif      Width   E-value     Motif sequence
CC       Motif19        C-1        40      4.40E-177   QYElLQNVcGNlRDFHgLIVNGCikhEtvEnVLPlFQLMA
CC       Motif20        C-2        40      1.30E-193   VMHICyTNLKASTSaEVGrFIKkLLETSPDILREYlIhLQ
CC       Motif16        C-3        40      2.40E-223   kLxxxLxxxqxfLxDAExKQxxdxxvxxWlxelxxxaxxA
CC       Motif14        C-4        29      6.40E-267   xxxxxxxvGxxxexxxixxxLxxxxxxxx
NBS      Motif01        P-loop     26      3.3e-969    vixIxGMgGxGKTTLAxkxyxxxxxx
NBS      Motif07        RNBS-A     29      1.5e-509    FxxxaWxxVSqxxxxxxllxxixxxxxxx
NBS      Motif04        Kinase-2   15      3.7e-433    ryLiVlDDVWxxxxx
NBS      Motif06        RNBS-B     15      4.4e-405    xGsRIIxTTRxxxVa
NBS      Motif08        RNBS-C     21      8.9e-414    xxxlxxLxxeeSWxLfxxkxF
NBS      Motif02        GLPL       29      1.1e-929    xxxeLxxxgkxIaxkCxGLPLaixxxaGx
NBS      Motif15        CNBS-1     15      4.70E-254   xxxxxlxLSYxxLpx
NBS      Motif03        CNBS-2     15      5.7e-477    xLKxCFLYxxxfPeD
NBS      Motif10        RNBS-D     21      1.7e-375    xxixxxxLixLWiAEGfvxxx
NBS      Motif11        CNBS-3     21      9.20E-297   xxExvaexylxdLixRsLvxx
NBS      Motif09        MHDV       15      1.8e-404    cxxHDlxxdxxxxxa
LRR      Motif13        L-1        21      5.90E-274   xxxlpxxixxLxhLRyLxxxx
LRR      Motif12        L-2        15      1.60E-291   lPxsxxxLxnLqtLx
LRR      Motif17        L-3        15      1.50E-197   xxxlxxLpxLexLxl
LRR      Motif18        L-4        29      1.10E-212   xxexxFxxLKxLxlxxxxLxxWeaxxxxF
LRR      Motif05        L-5        31      4.7e-444    xLxxLxlxxCxxLxxiPxxxxxxxxLxxxxx

Abbreviations: CC, coiled-coil; LRR, leucine-rich repeat; NBS, nucleotide-binding site.

In the TNL subfamily, 4, 7, and 9 motifs were identified in the TIR, NBS, and LRR domains, respectively (Table 3 and Supplemental Table S3). The motifs in the TIR domain were named T-1 to T-4, and the motifs in the LRR domain were named L-1 to L-9. The motifs of the NBS domain were named following previous studies.[5] All of the 14 TNL members contained the 4 motifs of the TIR domain, except that Sopim09g092410.0.1 lacked motif T-1 (Supplemental Figure S1 and Supplemental Table S3). The motif RNBS-A was not found in the NBS domain. Two novel motifs (TNBS-1 and TNBS-2) were identified in most members of the TNL subfamily. Kinase-2 and RNBS-B were both contained in Motif05, as these 2 motifs lie so close to each other. The motifs of the NBS domain were highly conserved among the 14 TNL genes, except for TNBS-2, which was missing from some of the genes. The motif compositions of the NBS-LRR genes also provided further support for the grouping of phylogenetic branches. For example, 3 NBS-LRR genes (Sopim09g092410.0.1, Sopim04g056570.0.1, and Sopim07g052770.0.1) were located in adjacent branches of the phylogenetic tree, and none of them contained the motif TNBS-2.

In the CNL subfamily, there were 4, 11, and 5 motifs in the CC, NBS, and LRR domains, respectively (Table 4 and Supplemental Table S4). A low degree of motif conservation was observed from the CC to the LRR domain. Three of the 4 conserved motifs in the CC domain (C-1, C-2, and C-3) were less conserved than the motifs of the NBS and LRR domains (Supplemental Figure S2). In other words, most of the genes in the CNL subfamily lacked motifs C-1, C-2, and C-3. In addition, most of the NBS-LRRs from the CNL4 to CNL8 branches lacked motif L-4 (Supplemental Table S4). In the NBS domain, most of the genes from CNL6 lacked the RNBS-D and CNBS-3 motifs. The remaining conserved motifs were detected in most of the NBS-LRR genes. Overall, the motifs of the NBS domain were relatively conserved compared with those of the N-terminal domain (Supplemental Figure S2).

When the TNL and CNL subfamilies were compared, some differences in motif composition were observed. For example, the MHDV motif was unique to the CNL family (Table 4). In addition, 2 conserved motifs (TNBS-1 and TNBS-2) were identified as novel members in the TNL subfamily, while 3 unique motifs (CNBS-1, CNBS-2, and CNBS-3) were found only in the CNL subfamily, with more diversity (Supplemental Table S3 and Table 4). In both the TNL and CNL subfamilies, the conserved motifs of the LRR domain had relatively high diversity.

Evolutionary comparison of the NBS-LRR genes between S. pimpinellifolium and Arabidopsis

Previous findings have revealed that the NBS-LRR genes of Arabidopsis could be definitively divided into 2 subfamilies (TNL and non-TNL), and the CNL subfamily was the main member of the non-TNL group.[9] In this current study, to investigate gene distribution and expansion of the NBS-LRR gene family between S. pimpinellifolium and Arabidopsis, genes from the CNL and TNL subfamilies were selected for phylogenetic analysis. The NBS-LRRs from S. pimpinellifolium and Arabidopsis fell into 2 distinct branches (TNL and CNL) (Figures 3 and 4). Then, the TNL and CNL subfamilies were separately analyzed, and the results are as follows.
Figure 3.

Phylogenetic relationship of the TNL subfamily between Arabidopsis and S. pimpinellifolium. NBS-LRRs are classified into 8 distinct branches (T1 to T8), which are shown in different colors in the phylogenetic tree. Numbers above branches represent the support values.

Figure 4.

Phylogenetic relationship of the CNL subfamily between Arabidopsis and S. pimpinellifolium. NBS-LRRs are classified into 8 distinct branches (C1 to C8), which are shown in different colors in the phylogenetic tree. Numbers above branches represent the support values.

Figure 3. Phylogenetic relationship of the TNL subfamily between Arabidopsis and S. pimpinellifolium. Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes are classified into 8 distinct branches (T1 to T8), each shown in a different color in the phylogenetic tree. Numbers above branches represent the support values.

The phylogenetic tree of the TNL genes from the Arabidopsis and S. pimpinellifolium genomes is shown in Figure 3. The TNL genes were classified into 8 clades, namely T1 to T8. Among them, T1 to T6 contained only NBS-LRR genes from Arabidopsis, and the presence of multiple NBS-LRR gene copies in Arabidopsis suggests that this expansion of the NBS-LRR gene family occurred after the divergence of Arabidopsis and S. pimpinellifolium. One gene of S. pimpinellifolium was placed in T7, while one gene of Arabidopsis was placed in T8. Accordingly, 2 pairs of orthologous genes (Sopim05g006620.0.1 and At1g27170/At1g27180, and Sopim01g113620.0.1/Sopim01g102880.0.1 and At5g36930) were detected between the S. pimpinellifolium and Arabidopsis genomes. The phylogeny showed that multiple rounds of duplication events occurred in Arabidopsis.

In the CNL subfamily, 53 NBS-LRRs from Arabidopsis and 73 NBS-LRRs from S. pimpinellifolium were classified into 8 branches in the phylogenetic tree, namely C1 to C8 (Figure 4). Genes from the same species tended to cluster together, and some of them formed an entire branch on their own. For example, the C1 and C5 branches consisted entirely of NBS-LRR proteins from S. pimpinellifolium, while the members of C6 and C8 all belonged to Arabidopsis. C1 was composed of 37 members, and C5 contained only 3 members. The C6 and C8 branches had 7 and 3 NBS-LRR genes, respectively. The remaining 4 branches (C2, C3, C4, and C7) harbored genes from both species, but within them genes of each species still tended to cluster with one another. C4 contained a large number of S. pimpinellifolium NBS-LRRs (16) and only 3 members from Arabidopsis. C2 and C7 showed similar patterns, with genes from both species: C2 included 20 genes, 14 from Arabidopsis and 6 from S. pimpinellifolium; C7 included 33 genes, 25 from Arabidopsis and 8 from S. pimpinellifolium. C1 was the largest of the 8 branches, while C3, C5, and C8 were the smallest. Based on the phylogeny, a large number of paralogous genes were observed in both species, and there were 3 pairs of orthologous genes (At3g07040 and Sopim08g05440.0.1, At3g50950 and Sopim02g084890.0.1, and At4g27220/At4g27190 and Sopim06g048910.0.1).
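
The branch sizes reported above for the CNL tree can be cross-checked against the two species totals (53 and 73). The following is a minimal Python sketch of that bookkeeping, not part of the original analysis. The variable names are illustrative, and the C3 figure it prints is only an arithmetic implication under the assumption that every gene in the totals belongs to exactly one of the branches C1 to C8. It is not a count stated in the text and should be checked against Figure 4.

```python
# Branch composition of the CNL tree as stated in the text:
# (Arabidopsis genes, S. pimpinellifolium genes) per branch.
# C3 is omitted because the text gives no counts for it.
stated = {
    "C1": (0, 37),
    "C2": (14, 6),
    "C4": (3, 16),
    "C5": (0, 3),
    "C6": (7, 0),
    "C7": (25, 8),
    "C8": (3, 0),
}

# Species totals reported for the CNL tree.
TOTAL_ARABIDOPSIS = 53
TOTAL_S_PIMPINELLIFOLIUM = 73

at_assigned = sum(at for at, _ in stated.values())
sp_assigned = sum(sp for _, sp in stated.values())

# Assumption: every gene in the totals sits in exactly one of C1 to C8,
# so whatever is left unassigned must belong to C3.
implied_c3 = (
    TOTAL_ARABIDOPSIS - at_assigned,
    TOTAL_S_PIMPINELLIFOLIUM - sp_assigned,
)

for branch, (at, sp) in sorted(stated.items()):
    print(f"{branch}: {at + sp:2d} genes ({at} Arabidopsis, {sp} S. pimpinellifolium)")
print(f"C3 (implied, not stated in the text): {implied_c3}")
```

The per-branch totals printed for C2 and C7 match the 20 and 33 genes given in the text, which is a quick consistency check on the species split.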

Discussion

Wild species are an important component of germplasm resources: they carry genes for resistance to disease and abiotic stress and play a key role in the genetic improvement of cultivated species. Many of the resistance genes in disease-resistant tomato cultivars were derived from wild species.[77] Among these, the wild tomato S. pimpinellifolium is one of the important wild types, and several of its NBS-LRR disease resistance proteins have been cloned.[78] An in-depth analysis of the NBS-LRR genes in S. pimpinellifolium will therefore help advance disease resistance breeding in tomato.

Previous studies have shown that the size of the NBS-LRR gene family and of each subfamily differs among plants. In this study, a total of 245 NBS-LRR genes were identified, of which 78 belonged to the CNL subfamily and 15 to the TNL subfamily (a CNL/TNL ratio of about 5:1). A similar pattern has been found in other plants. For example, the CNL subfamily in potato has 4.7 times as many genes as the TNL subfamily,[19] the ratio closest to that of S. pimpinellifolium, and the ratios in grape and poplar are 3.8 and 2.0, respectively.[28] Research has shown that TNL subfamily genes are dominant in Arabidopsis and Brassica crops compared with other higher plants, which has been attributed to the long-term evolution of plant resistance against their main pathogens.[79] Likewise, the large number of CNL genes in S. pimpinellifolium is likely related to the development of resistance to the major diseases of Solanaceae. NBS-LRR disease resistance genes also show distinctive characteristics in monocotyledons and dicotyledons.
Rice, a typical monocotyledon, contains 402 CNL subfamily genes but no TNL-type genes.[20] Similarly, no TNL-type genes were detected in the monocotyledon maize.[22]

There was no significant difference in the number of NBS-LRR disease resistance genes between S. pimpinellifolium and S. lycopersicum (Table 1). However, the membership of each subfamily varied, which implies that the subfamilies followed different patterns of expansion during the evolution from the wild species to the cultivated species. Interestingly, there was also no significant difference in the number of NBS-LRRs on each chromosome between the 2 species, indicating that they are highly homologous. In addition, some NBS-LRR disease resistance genes may have come from other wild relatives of tomato rather than from S. pimpinellifolium.

The 245 NBS-LRR genes identified in S. pimpinellifolium were unevenly distributed across 12 chromosomes, and most of them tended to form gene clusters of different sizes (Figure 1). These results are consistent with the distribution of NBS-LRR disease resistance genes in other plants.[5,23,28] The expansion of the NBS-LRR gene family on plant chromosomes has been reported to occur via gene recombination, transformation, duplication, and selection.[72,79,80] Tandemly duplicated genes with high similarity share common ancestors and are usually located in a homologous cluster. In S. pimpinellifolium, a total of 24 tandem duplication events, involving 80 NBS-LRR genes, were found in gene clusters. In addition, there were many singleton genes in these 2 plant species. Approximately 42.9% (12/28) of the singleton NBS-LRR genes were homologous to cloned resistance genes (Supplemental Table S2). Several singletons, such as Sopim08g074250.0.1, Sopim03g005660.0.1, and Sopim04g056570.0.1, were homologous to genes on other chromosomes, and some singletons seem to have evolved independently.
Two subfamilies, CNL and TNL, were identified through phylogenetic tree construction. Conserved motifs were used to distinguish the protein sequences of the N-terminal and NBS domains of these 2 subfamilies (Tables 3 and 4). Most of the conserved motifs were selectively distributed within a subfamily in the phylogenetic tree, implying that NBS-LRRs within the same clade share structural and functional similarities. In the TNL subfamily, all motifs were detected in all of the analyzed genes, except that the newly identified TNBS-2 motif of the NBS domain was missing from 3 genes (Sopim09g092410.0.1, Sopim07g052770.0.1, and Sopim04g056570.0.1) (Supplemental Table S3 and Supplemental Figure S1). In contrast to the TNL subfamily, the motifs of the CNL subfamily were much more diverse (Supplemental Table S4 and Supplemental Figure S2). Some motifs were specific to particular branches within the CNL subfamily; for example, Motifs 03, 10, and 11 were found in CNL4 but were not observed in CNL6. Whether this reflects a more ancient origin of the CNL subfamily during plant evolution is unclear. Some conserved motifs existed only in a single clade; for example, Motif 19 and Motif 20 existed in the CNL2 and CNL3 branches, respectively. The motif analysis of the NBS-LRR genes in S. pimpinellifolium provided evidence of a complex evolutionary relationship among the members of this family, which was also supported by the results of the phylogenetic analysis.

Although the functions of most of these motifs have not been identified, it is plausible that some of them play a crucial role. For instance, previous reports demonstrated that 3 domains (LRR, TIR, and CC) regulate downstream signaling events through intramolecular interactions.[81-83] Proteins homologous to plant NBS-LRR proteins play a role in mammalian defense responses.
In these mammalian proteins, the N-terminal domain engages downstream signaling partners through protein-protein interactions, the NBS hydrolyzes ATP and functions as a regulatory domain, and the LRR binds to upstream regulatory factors.[84,85]

When the TNL and CNL subfamilies were compared between S. pimpinellifolium and Arabidopsis, a large number of species-specific expansions of the TNL subfamily were detected in the Arabidopsis lineage after the divergence of these 2 species (Figure 3). By contrast, the number of genes in the CNL subfamily was relatively reduced in Arabidopsis. A species-specific expansion of NBS-LRR genes was also observed in Cucurbitaceae.[86] These findings suggest that these genes might have originated from NBS-encoding genes or resistance gene homologs. Comparative analysis of the CNL subfamily revealed that only 3 CNL genes in the C4 branch were from Arabidopsis, while all other members of this branch belonged to S. pimpinellifolium, which was abundant there (Figure 4), suggesting that these 3 Arabidopsis genes are orthologous to genes from S. pimpinellifolium. The T8 branch is a clade of the same kind (Figure 3).
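
The subfamily ratio and the singleton share quoted earlier in this Discussion are simple arithmetic on the reported counts. The sketch below, written in Python purely for illustration, recomputes them from the numbers given in the text. The variable names are hypothetical, and nothing here reproduces the authors' actual pipeline.

```python
# Counts and ratios quoted in the Discussion.
cnl_genes = 78
tnl_genes = 15
reported_ratios = {"potato": 4.7, "grape": 3.8, "poplar": 2.0}

# CNL/TNL ratio in S. pimpinellifolium ("about 5:1" in the text).
ratio = cnl_genes / tnl_genes

# Species whose reported CNL/TNL ratio is numerically closest.
closest = min(reported_ratios, key=lambda sp: abs(reported_ratios[sp] - ratio))

# Share of singleton genes homologous to cloned resistance genes (12/28).
singletons_total = 28
singletons_with_cloned_homolog = 12
singleton_share = 100 * singletons_with_cloned_homolog / singletons_total

print(f"CNL/TNL ratio: {ratio:.1f}")
print(f"Closest reported ratio: {closest}")
print(f"Singletons homologous to cloned resistance genes: {singleton_share:.1f}%")
```

Run as written, the ratio rounds to 5.2, which is consistent with the "about 5:1" figure and with potato (4.7) being the closest of the three species listed.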

Conclusions

In this study, a comprehensive and systematic analysis of the NBS-LRR genes of S. pimpinellifolium was conducted by integrating chromosome distribution, phylogenetic relationships, and conserved motifs. The results revealed that a large number of NBS-LRR genes were present in gene clusters, and tandem duplication events were observed in these clusters. Phylogenetic analysis divided the NBS-LRR genes into 2 distinct subfamilies, a division further supported by the conservation or variation in motif composition and the functional divergence among clades. Moreover, a comparative analysis of S. pimpinellifolium and Arabidopsis was performed to reveal the evolutionary relationships and functional characteristics of the NBS-LRR gene family. These results provide a basis for functional research on NBS-LRR genes in S. pimpinellifolium, and possibly in other plant species.

Supplemental material for this article: Figure S1, Figure S2, Table S1, Table S2, Table S3, and Table S4.