Jiorgos Kourelis1, Toshiyuki Sakai1, Hiroaki Adachi1, Sophien Kamoun1.
Abstract
Reference datasets are critical in computational biology. They help define canonical biological features and are essential for benchmarking studies. Here, we describe a comprehensive reference dataset of experimentally validated plant nucleotide-binding leucine-rich repeat (NLR) immune receptors. RefPlantNLR consists of 481 NLRs from 31 genera belonging to 11 orders of flowering plants. This reference dataset has several applications. We used RefPlantNLR to determine the canonical features of functionally validated plant NLRs and to benchmark 5 NLR annotation tools. This revealed that although NLR annotation tools tend to retrieve the majority of NLRs, they frequently produce domain architectures that are inconsistent with the RefPlantNLR annotation. Guided by this analysis, we developed a new pipeline, NLRtracker, which extracts and annotates NLRs from protein or transcript files based on the core features found in the RefPlantNLR dataset. The RefPlantNLR dataset should also prove useful for guiding comparative analyses of NLRs across the wide spectrum of plant diversity and identifying understudied taxa. We hope that the RefPlantNLR resource will contribute to moving the field beyond a uniform view of NLR structure and function.
Year: 2021 PMID: 34669691 PMCID: PMC8559963 DOI: 10.1371/journal.pbio.3001124
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 3. Domain architecture of the RefPlantNLRs.
Bar charts of the domain architecture of (A) all RefPlantNLRs (N = 481) and (B) the reduced-redundancy RefPlantNLR set, reduced per genus at an overall 90% amino acid similarity (N = 303). (C) Schematic representation of domain architecture. The InterPro signatures used for each of the domains are highlighted in the Materials and methods. There is currently no InterProScan signature or motif for the CCG10 N-terminal domain. Underlying data and R code to reproduce the figures in . CC, coiled-coil; LRR, leucine-rich repeat; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
Fig 4. Phylogenetic diversity of RefPlantNLR sequences.
The tree, based on the NB-ARC domain, was inferred using the Maximum Likelihood method based on the JTT model [44]. The tree with the highest log likelihood is shown. NLRs with identical NB-ARC domains are collapsed, while for those with multiple NB-ARC domains, the NB-ARC domains are numbered according to their order in the protein. The tree was rooted on the non-plant NLR outgroup. The TIR-NLR, CC-NLR, CCR-NLR, and CCG10-NLR subclades are indicated. Domain architecture is shown as in Fig 3. CC, coiled-coil; C-JID, C-terminal jelly roll/Ig-like domain; JTT, Jones–Taylor–Thornton; LRR, leucine-rich repeat; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor.
NLR annotation tools.
| Output | RefPlantNLR ( | | | | |
|---|---|---|---|---|---|
| AA/transcripts | Coils, custom HMM models, TMHMM | No | 100%/ | 45.2% | |
| Transcripts/Genomic | Coils, InterProScan | Yes | 98.0% | 31.5% | |
| Transcripts/Genomic | NLR motif MEME | Yes | 98.0% | 88.2% | |
| AA | Coils, InterProScan, Pfam, Phobius | No | 96.9% | 61.1% | |
| AA/transcripts | Coils, InterProScan | No | 95.4% | 61.9% | |
| AA/transcripts | InterProScan, NLR motif MEME | Yes | 100% | 100% | |
*AA/CDS input.
**CDS/Genomic input. Gene models were available for 407 NLRs.
CDS, coding sequence; HMM, Hidden Markov model; NB-ARC, nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4; NLR, nucleotide-binding leucine-rich repeat.
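The abstract describes two quantities for each annotation tool: how many reference NLRs it retrieves, and how often its domain architecture agrees with the RefPlantNLR annotation. The following is a minimal sketch of how such a comparison could be computed. It is not the authors' benchmarking code. The function name `benchmark`, the dictionary layout, the architecture strings, and the choice of the full reference set as the denominator for both percentages are all assumptions made for illustration.

```python
from typing import Dict


def benchmark(reference: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Compare a tool's output against a reference annotation.

    Both arguments map a sequence ID to a domain architecture string
    (hypothetical format, e.g. "CC-NBARC-LRR").
    """
    if not reference:
        raise ValueError("reference set is empty")

    # Reference NLRs that the tool reported at all.
    retrieved = [seq_id for seq_id in reference if seq_id in predicted]

    # Retrieved NLRs whose architecture matches the reference exactly.
    consistent = [s for s in retrieved if predicted[s] == reference[s]]

    # Assumption: both percentages use the full reference set as denominator.
    total = len(reference)
    return {
        "retrieved_pct": 100.0 * len(retrieved) / total,
        "consistent_pct": 100.0 * len(consistent) / total,
    }


# Toy data, invented for illustration only.
reference = {
    "nlr1": "CC-NBARC-LRR",
    "nlr2": "TIR-NBARC-LRR",
    "nlr3": "CC-NBARC-LRR",
    "nlr4": "NBARC-LRR",
}
predicted = {
    "nlr1": "CC-NBARC-LRR",
    "nlr2": "NBARC-LRR",  # retrieved, but the TIR domain was missed
    "nlr3": "CC-NBARC-LRR",
}

result = benchmark(reference, predicted)
print(result)  # {'retrieved_pct': 75.0, 'consistent_pct': 50.0}
```

Exact string matching is the strictest possible notion of agreement. A real comparison would likely need to normalise domain names across tools before comparing them.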
Extraction of NLRs from the Arabidopsis, tomato, and rice RefSeq proteomes.
| | | |
|---|---|---|
| AA/transcripts | 94.5% | 94.4% |
| Transcripts/Genomic | 76.3% | 100% |
| Transcripts/Genomic | 88.4% | 100% |
| AA | 92.6% | 99.1% |
| AA/transcripts | 91.1% | 99.5% |
| AA/transcripts | 99.8% | 100% |
*Percentage of retrieved sequences being genuine NLRs.
NLR, nucleotide-binding leucine-rich repeat.
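The footnote defines one of the reported percentages as the share of retrieved sequences that are genuine NLRs, which corresponds to precision. A minimal sketch of that calculation, alongside the complementary share of genuine NLRs that were retrieved, is shown below. The function name `extraction_metrics` and the example identifiers are hypothetical, and pairing precision with sensitivity in this way is an assumption about how the two table columns relate, not something the record states.

```python
from typing import Set, Tuple


def extraction_metrics(retrieved: Set[str], genuine: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity_pct, precision_pct) for a proteome-wide extraction.

    retrieved: IDs the tool reported as NLRs.
    genuine:   IDs considered true NLRs in the proteome.
    """
    if not retrieved or not genuine:
        raise ValueError("both sets must be non-empty")

    true_positives = retrieved & genuine

    # Share of genuine NLRs that the tool found.
    sensitivity = 100.0 * len(true_positives) / len(genuine)

    # Share of retrieved sequences that are genuine NLRs (the footnoted quantity).
    precision = 100.0 * len(true_positives) / len(retrieved)
    return sensitivity, precision


# Toy identifiers, invented for illustration only.
genuine = {"p1", "p2", "p3", "p4", "p5"}
retrieved = {"p1", "p2", "p3", "x9"}  # "x9" is a false positive

sensitivity, precision = extraction_metrics(retrieved, genuine)
print(sensitivity, precision)  # 60.0 75.0
```

Reporting both values together matters because a tool can reach 100% precision while still missing a large share of the NLRs in a proteome.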