NLR-parser: rapid annotation of plant NLR complements.

Burkhard Steuernagel¹, Florian Jupe¹, Kamil Witek¹, Jonathan D G Jones¹, Brande B H Wulff¹.

Abstract

MOTIVATION: The repetitive nature of plant disease resistance genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins hampers their prediction with standard gene annotation software. The motif alignment and search tool (MAST) has previously been reported as a tool to support annotation of NLR-encoding genes. However, the decision whether a motif combination represents an NLR protein was entirely manual.
RESULTS: The NLR-parser pipeline is designed to use the MAST output from six-frame translated amino acid sequences and filters for predefined biologically curated motif compositions. Input reads can be derived from, for example, raw long-read sequencing data or contigs and scaffolds coming from plant genome projects. The output is a tab-separated file with information on the start and frame of the first NLR-specific motif, whether the identified sequence is a TNL or CNL, and whether it is potentially complete or fragmented. In addition, the output of the NB-ARC domain sequence can directly be used for phylogenetic analyses. In comparison to other prediction software, the highly complex NB-ARC domain is described in detail using several individual motifs.

Year: 2015 PMID: 25586514 PMCID: PMC4426836 DOI: 10.1093/bioinformatics/btv005

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Plants have evolved a multi-layered innate immune system to protect themselves against pests and pathogens (Jones and Dangl, 2006). Breeding efforts towards disease resistance in crops rely on the introgression of quantitative trait loci or major dominant disease resistance (R) genes from wild relatives (reviewed in Dangl et al., 2013). The largest class of R genes encodes nucleotide-binding domain leucine-rich repeat proteins (NLRs or NB-LRRs). These are key receptors that recognize secreted pathogen effector molecules or their effect in the plant. On recognition, these proteins commonly trigger a hypersensitive response in the form of local cell death, which prevents the further spread of pathogens that rely on living tissue (Jones and Dangl, 2006). In dicotyledonous plants, NLR proteins come in two flavours that are determined by an N-terminal extension and internal amino acid motif composition. CNL proteins possess in most cases a coiled-coil domain followed by the highly conserved p-loop and RNBS-A motif (Meyers et al., 2003). TNL proteins possess a Toll-interleukin receptor-like (TIR) domain followed by the p-loop but lack the RNBS-A motif. The TNL class is absent from monocotyledonous plants, like wheat and barley. A set of 20 NLR descriptive motifs has previously been identified using MEME (Bailey et al., 2009), and these were used in motif alignment and search tool (MAST) searches against predicted potato proteins (Jupe et al.). Although originally designed to discover NLR sequences from members of the plant family Solanaceae, this set also contains two Triticeae-specific motifs. The identification and annotation of the very large NLR gene family, with for example over 750 members in potato, is currently very laborious and time-consuming, as most automated gene callers fail to capture the full complement.
Several studies have shown that these automated annotations miss up to 50% of the total NLR gene complement, or that full sequences are split into small fragments and then annotated as ‘partial’ (Meyers et al., 2003; Jupe et al.; Andolfo et al., 2014). There is, therefore, a clear need for an automated NLR annotation tool. Here, we present the NLR-MAST-parser, a Java application for the identification of NLR-like sequences that uses the highly specific amino acid motif composition found in plant NLR gene products and parses this information into an easy-to-use tabular file. The impact of this tool comes from its high accuracy, the reduction in hands-on time of NLR annotation projects and its independence from gene prediction software. We further provide evidence that it is functional in monocotyledonous and dicotyledonous plant species.

2 Methods

2.1 Motif composition discriminates NLRs

The amino acid motif composition of NLR gene products is highly conserved amongst all plant species. It is sufficient to separate NLRs from other protein sequences and to separate the two main types of NLRs (TNL and CNL) from each other. We use 20 previously biologically characterized motifs (Jupe et al.) in the MAST tool to identify potential NLR-encoding sequences. The NLR parser uses a variety of biologically defined input motif compositions to search the MAST xml-format output and reports on confirmed NLRs only. These motif compositions can be found in the online manual.
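The idea of filtering on motif composition can be illustrated with a toy sketch. It is not the NLR-parser implementation: the curated compositions are only given in the online manual, so the descriptive labels (`TIR`, `CC`, `P-loop`, `RNBS-A`, `LRR`), the function names and the rules below are assumptions derived solely from the TNL/CNL description in the Introduction.

```python
def contains_in_order(hits, pattern):
    """Return True if the labels in `pattern` occur in `hits` in the same
    order (other motifs may lie in between)."""
    remaining = iter(hits)
    # `label in remaining` consumes the iterator up to the first match,
    # so each later label must be found after the previous one.
    return all(label in remaining for label in pattern)


def classify(hits):
    """Classify an ordered list of motif labels (hypothetical rules).

    Returns "TNL", "CNL", "unclassified", or None if no p-loop was found.
    """
    if "P-loop" not in hits:
        return None  # no NB-ARC evidence at all
    # TNL: TIR domain followed by the p-loop, RNBS-A motif absent.
    if contains_in_order(hits, ["TIR", "P-loop"]) and "RNBS-A" not in hits:
        return "TNL"
    # CNL: p-loop followed by RNBS-A (a coiled-coil is common, not required).
    if contains_in_order(hits, ["P-loop", "RNBS-A"]):
        return "CNL"
    return "unclassified"


tnl_like = classify(["TIR", "P-loop", "LRR"])
cnl_like = classify(["CC", "P-loop", "RNBS-A", "LRR"])
not_nlr = classify(["LRR"])
```

Matching ordered subsequences rather than exact strings tolerates additional motifs between the diagnostic ones, which seems a natural fit for sequences described by up to 20 motifs.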

2.2 Mast parser features

The annotation of NLR genes is a manual process that is simplified by several output features of this NLR parser. MAST takes protein sequences as input, and these are usually not available for, for example, de novo assembled genomes or NLR-enriched sequence data. The best procedure to identify NLRs in a set of nucleotide sequences is therefore to translate them into all six reading frames. The MAST Parser accepts, as an input argument, a pattern that splits a common prefix from frame-specific suffixes. That way, every nucleotide sequence can be annotated, regardless of the actual reading frame or even a shift of the frame. It has been shown that NLR genes are often under selection (Michelmore and Meyers, 1998), resulting in a large number of pseudogenes. We therefore defined sets of motifs that indicate the completeness of an NLR gene. The output of the MAST Parser includes this annotation as a column. Finally, we add the class of each NLR, i.e. CNL or TNL, to the output.
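A minimal sketch of six-frame translation with frame-specific sequence names is given below. It uses the standard genetic code. The `_frame` separator, the `+1` to `-3` suffixes and the function names are assumptions chosen for illustration, not the naming scheme expected by the NLR parser.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq_id, seq, separator="_frame"):
    """Return {name: protein} for the three forward and three reverse frames.

    Names share the prefix `seq_id` and differ only in the suffix after
    `separator` (hypothetical convention).
    """
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            name = f"{seq_id}{separator}{strand}{offset + 1}"
            frames[name] = translate(template[offset:])
    return frames


def split_frame_id(name, separator="_frame"):
    """Split a frame-specific name back into (common prefix, frame suffix)."""
    prefix, _, suffix = name.rpartition(separator)
    return prefix, suffix


frames = six_frame("contig1", "ATGGCCTAA")
```

Because all six translations of one nucleotide sequence share a prefix, motif hits from different frames can be grouped back onto the same sequence, which is what makes annotation across a frame shift possible.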

2.3 TAIR validation

In a proof-of-concept study, we screened the available set of Arabidopsis thaliana TAIR proteins (TAIR10_pep_20101214) for NLR gene products using the MAST pipeline presented here. In total, we identified 266 of the 35 386 Arabidopsis proteins as partial or complete NLRs. The original TAIR protein annotation provides 219 sequences with one of the following annotation terms: ‘Toll-Interleukin-Resistance (TIR) domain’, ‘NB-ARC’ or ‘NBS-LRR’, and 212 of these were also identified with our MAST pipeline. Blastp analyses of the seven remaining proteins identified two false negatives with NB-ARC and LRR domains, and five proteins that had neither an NB-ARC nor an LRR domain and can thus be excluded. Detailed analysis shows that the two false negatives correspond to the ancient and small group of NLRs with similarity to ADR1 (Chini and Loake, 2005). Here, the discriminatory Motif 8 had a P-value of 8e−5 and was, therefore, discarded. With 212 of the 214 remaining annotated NLRs detected, we observe a sensitivity of more than 99%. We found five complete NLRs with the NLR-Parser that were not annotated accordingly in TAIR. We validated the structure of those proteins by scanning for TIR, NB-ARC and LRR-related PFAM domains using HMMER (Eddy, 2011) and consistently found an NB-ARC domain and LRRs in each of the protein sequences (Supplementary Table 1). Therefore, we report a 100% specificity for the NLR-Parser.
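The sensitivity figure follows directly from the counts reported above. The short calculation below only restates that arithmetic; the variable and function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true NLRs that were detected."""
    return true_positives / (true_positives + false_negatives)


annotated = 219   # TAIR sequences carrying a TIR, NB-ARC or NBS-LRR term
excluded = 5      # neither NB-ARC nor LRR domain found by Blastp
detected = 212    # annotated sequences also reported by the pipeline

true_nlrs = annotated - excluded   # 214
missed = true_nlrs - detected      # 2 (the ADR1-like proteins)
value = sensitivity(detected, missed)

print(f"{true_nlrs} NLRs, {missed} missed, sensitivity = {value:.1%}")
# prints "214 NLRs, 2 missed, sensitivity = 99.1%"
```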

2.4 Monocot validation

We further tested the MEME motifs in our NLR-parser for their functionality in monocotyledonous plant genomes and screened the publicly available set of annotated genes from Brachypodium distachyon. The NLR-parser pipeline identified 586 partial or complete NLRs. All 190 proteins that the NLR-parser annotated as complete NLRs have previously been annotated as resistance genes (http://phytozome.jgi.doe.gov/). The general quality of the Brachypodium annotation, which relies on similarity to Arabidopsis and rice, does not allow a precise estimation of sensitivity and selectivity. However, there is good consistency between the annotation, the PFAM domains found and the NLR-Parser. Eight genes with an NB-ARC domain and LRRs were not found by the NLR-Parser, including an ADR1-like gene. Conversely, the NLR-Parser annotated 47 proteins as complete NLRs for which HMMER detected only the NB-ARC domain and no LRR (Supplementary Table 2).

3 Discussion

Due to their biological importance and relevance for breeding, the identification and annotation of NLR-type disease resistance genes have high priority in all plant genome sequencing projects. These annotations, however, rely heavily on gene-prediction software. In the past, we were able to show that up to 50% of the total NLR complement was either wrongly predicted or completely missing. Our MAST Parser tool provides high-precision identification of NLR gene sequences from every input format that is available from genome sequencing projects, including contigs, scaffolds, pseudomolecules or chromosomes. In two experiments with the model plants A. thaliana and B. distachyon, we were able to show the functionality of the 20 well-characterized MEME motifs in monocotyledonous and dicotyledonous plants. The output of this tool is directly usable for downstream applications, including phylogenetic analyses or visualization on the corresponding reference sequence. The tab-delimited output format is publishable as a Supplementary Table.

4 Conclusion

The MAST Parser pipeline that we present here will streamline NLR identification efforts within genome sequencing projects in monocotyledonous and dicotyledonous plants.

References

Review 1. The plant immune system.

Authors: Jonathan D G Jones; Jeffery L Dangl
Journal: Nature Date: 2006-11-16 Impact factor: 49.962

Review 2. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process.

Authors: R W Michelmore; B C Meyers
Journal: Genome Res Date: 1998-11 Impact factor: 9.043

3. Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis.

Authors: Blake C Meyers; Alexander Kozik; Alyssa Griego; Hanhui Kuang; Richard W Michelmore
Journal: Plant Cell Date: 2003-04 Impact factor: 11.277

4. Motifs specific for the ADR1 NBS-LRR protein family in Arabidopsis are conserved among NBS-LRR sequences from both dicotyledonous and monocotyledonous plants.

Authors: Andrea Chini; Gary J Loake
Journal: Planta Date: 2005-05-12 Impact factor: 4.116

Review 5. Pivoting the plant immune system from dissection to deployment.

Authors: Jeffery L Dangl; Diana M Horvath; Brian J Staskawicz
Journal: Science Date: 2013-08-16 Impact factor: 47.728

6. Accelerated Profile HMM Searches.

Authors: Sean R Eddy
Journal: PLoS Comput Biol Date: 2011-10-20 Impact factor: 4.475

7. MEME SUITE: tools for motif discovery and searching.

Authors: Timothy L Bailey; Mikael Boden; Fabian A Buske; Martin Frith; Charles E Grant; Luca Clementi; Jingyuan Ren; Wilfred W Li; William S Noble
Journal: Nucleic Acids Res Date: 2009-05-20 Impact factor: 16.971

8. Identification and localisation of the NB-LRR gene family within the potato genome.

Authors: Florian Jupe; Leighton Pritchard; Graham J Etherington; Katrin Mackenzie; Peter J A Cock; Frank Wright; Sanjeev Kumar Sharma; Dan Bolser; Glenn J Bryan; Jonathan D G Jones; Ingo Hein
Journal: BMC Genomics Date: 2012-02-15 Impact factor: 3.969

9. Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations.

Authors: Florian Jupe; Kamil Witek; Walter Verweij; Jadwiga Sliwka; Leighton Pritchard; Graham J Etherington; Dan Maclean; Peter J Cock; Richard M Leggett; Glenn J Bryan; Linda Cardle; Ingo Hein; Jonathan D G Jones
Journal: Plant J Date: 2013-10-08 Impact factor: 6.417

10. Defining the full tomato NB-LRR resistance gene repertoire using genomic and cDNA RenSeq.

Authors: Giuseppe Andolfo; Florian Jupe; Kamil Witek; Graham J Etherington; Maria R Ercolano; Jonathan D G Jones
Journal: BMC Plant Biol Date: 2014-05-05 Impact factor: 4.215
