Jiorgos Kourelis, Farnusch Kaschani, Friederike M Grosse-Holz, Felix Homma, Markus Kaiser, Renier A L van der Hoorn.
Abstract
BACKGROUND: Nicotiana benthamiana is an important model organism of the Solanaceae (Nightshade) family. Several draft assemblies of the N. benthamiana genome have been generated, but many of the gene-models in these draft assemblies appear incorrect.
Keywords: Genome annotation; Nicotiana benthamiana; Proteomics; Solanaceae; Subtilases
Year: 2019 PMID: 31585525 PMCID: PMC6778390 DOI: 10.1186/s12864-019-6058-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Bioinformatics pipeline for improved Nicotiana benthamiana proteome annotation. The predicted proteins of Nicotiana species generated by the NCBI Eukaryotic Genome Annotation Pipeline were retrieved from GenBank, clustered at a 95% identity threshold to reduce redundancy (Step 1), and used to annotate the Niben1.0.1 genome assembly (Step 2). Only those proteins with an alignment coverage ≥60% to the Nicotiana predicted proteins, as determined by BLASTP, were retained (Step 3) to produce the NbD core dataset. The other draft genome assemblies were annotated in the same way (Step 4), and only those proteins with an alignment coverage ≥90% to the Nicotiana predicted proteins, as determined by BLASTP, were retained (Step 5). CD-HIT-2D was used at 100% sequence identity to retain proteins missing from the NbD dataset (Step 6), resulting in the supplemental dataset NbE. NbD and NbE can be combined (NbDE) to maximise the spectra annotation for proteomics experiments.
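The coverage filters in Steps 3 and 5 can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Hit` record, the function names, and the example values are hypothetical, and it assumes that coverage means the fraction of a protein's own length spanned by its best single BLASTP alignment. The caption does not state which sequence serves as the denominator.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit record: (protein_id, protein_length, aligned_start, aligned_end),
# with 1-based inclusive coordinates, as reported in BLAST tabular output.
Hit = Tuple[str, int, int, int]


def best_coverage(hits: Iterable[Hit]) -> Dict[str, float]:
    """Return, per protein, the largest fraction of its length covered by one hit."""
    best: Dict[str, float] = {}
    for protein_id, protein_len, start, end in hits:
        coverage = (end - start + 1) / protein_len
        best[protein_id] = max(best.get(protein_id, 0.0), coverage)
    return best


def retain(hits: Iterable[Hit], min_coverage: float) -> Set[str]:
    """Keep proteins whose best single-hit coverage meets the threshold."""
    return {pid for pid, cov in best_coverage(hits).items() if cov >= min_coverage}


# Made-up example hits, for illustration only.
hits = [
    ("protA", 500, 1, 480),   # 480/500 = 96% covered
    ("protB", 400, 51, 300),  # 250/400 = 62.5% covered
    ("protC", 300, 1, 120),   # 120/300 = 40% covered
]
core = retain(hits, 0.60)          # Step 3-style cutoff (>= 60%)
supplemental = retain(hits, 0.90)  # Step 5-style cutoff (>= 90%)
```

With these made-up hits, the 60% cutoff keeps `protA` and `protB`, while the stricter 90% cutoff keeps only `protA`.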
Fig. 2 Increased lengths, coverage and annotation of N. benthamiana proteins. a NbD/NbDE datasets have relatively few entries when compared to preceding datasets. b NbD/NbDE datasets contain nearly all benchmark genes as full-length genes, according to Benchmarking Universal Single-Copy Orthologs (BUSCO) of embryophyta. c The NbD/NbDE datasets have a higher number of annotated PFAM domains. d NbD/NbDE datasets have relatively longer protein lengths. Violin and boxplot graph of the log10 protein length distribution of each dataset. Jittered dots show the raw underlying data. e NbD/NbDE annotated proteins have a higher percentage coverage of the tomato proteins, as determined by BLASTP.
Fig. 3 NbD/NbDE datasets outperform preceding datasets in the annotation of spectra in proteomics. a Percentage of annotated MS/MS spectra in total leaf extract samples. b Average number of unique peptides assigned per protein in the different datasets. a and b Means and standard error of the mean are shown for four biological replicates of total leaf extracts. c Mis-annotations of papain-like Cys proteases (PLCPs) detected by activity-based protein profiling [54]. Leaf extracts were labelled with activity-based probes for PLCPs, and labelled proteins were purified and analysed by MS. Shown are the protein annotations found in the NbDE (top), Niben1.0.1 (middle) and curated datasets (bottom), highlighting mis-annotations (red) caused by partial transcripts, mis-annotation of exon-intron boundaries, and mis-assemblies.
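The summary statistic in panels a and b, the mean with its standard error over four biological replicates, can be sketched as follows. The replicate values are made up for illustration, and the function name is hypothetical. The sketch assumes the usual definition of the standard error: the sample standard deviation divided by the square root of the number of replicates.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up percentages of annotated spectra in four replicates.
replicates = [40.0, 42.0, 44.0, 46.0]
mean, sem = mean_and_sem(replicates)
print(f"mean = {mean:.2f}, SEM = {sem:.2f}")
```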
Fig. 4 Examples of subtilase mis-annotations in the different genome assemblies. a Gene-models corresponding to subtilase NbE05066806 and the corresponding annotations in the various datasets. This subtilase gene is fragmented in Niben1.0.1; truncated in Nbv0.5; and carries two SNPs and an extra sequence in Niben0.4.4. b Gene-models corresponding to subtilase NbE03059263 and the corresponding annotations in the various datasets. This subtilase has an inactivated homeolog (dark grey) that was not retained in the NbDE dataset as it encodes a protein with < 60% coverage because it contains premature stop codons. The truncated proteins caused mis-assembly in the Niben1.0.1 dataset, resulting in a hybrid sequence. Mis-annotated exon-intron boundaries also affected gene models in Niben1.0.1, Niben0.4.4 and Nbv5.1. Peptides matched to the different gene models are indicated below the gene models.
Subtilase gene number according to familyᵃ
| Family | | | |
|---|---|---|---|
| SBT1 | 9 | 53 | 37 (24) |
| SBT2 | 6 | 4 | 9 |
| SBT3 | 17 (1) | 1 | 0 |
| SBT4ᵇ | 16 (1) | 3 | 8 (8) |
| SBT5 | 6 | 8 (8) | |
| SBT6 | 1 | fragmented | 1 |
| SBT7 | 1 | 1 | 2 (3) |
| Total | 54 (2) | 68 | 65 (43) |
ᵃ truncated/inactive proteins in brackets; ᵇ including SBT5.2
Fig. 5 Birth and death of subtilase paralogs in N. benthamiana. The evolutionary history of the subtilase gene family was inferred by using the Maximum Likelihood method based on the Whelan and Goldman model. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analysed. Non-functional subtilases are indicated in grey. Subtilases identified in apoplastic fluid (AF) and/or total extract (TE) are indicated with yellow and green dots, respectively. Naming of subtilase clades is according to [51]. Additional file 1: Figure S2 includes the individual names.
Fig. 6 Annotation of the N. benthamiana apoplastic proteome. a Correlation matrix heat map of the log2-transformed LFQ intensity of protein groups in the four biological replicates of apoplastic fluid (AF) and total extract (TE) samples. Biological replicates are clustered by similarity. b Volcano plot of the log2 fold difference (log2FC) of AF/TE against the −log10 of the BH-adjusted moderated p-values. Proteins with log2FC ≥ 1.5 and p ≤ 0.01 were considered apoplastic, as well as those found only in AF. Conversely, proteins with log2FC ≤ −1.5 and p ≤ 0.01 were considered intracellular, as well as those found only in TE. c Percentage of proteins in each fraction annotated with biological process-associated GO-SLIM terms. d Percentage of proteins in each fraction annotated with molecular function-associated GO-SLIM terms. c and d GO-SLIM annotations are shown when significantly enriched or depleted (BH-adjusted hypergeometric test, p < 0.05) in at least one of the fractions (AF, TE, or both). Each bubble indicates the percentage of genes containing that specific GO-SLIM annotation in that compartment. Colours indicate whether the GO-SLIM annotations are enriched or depleted in that fraction (p < 0.05; n.s., non-significant).
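The classification rule described for panel b can be written as a short sketch. It is an illustration, not the authors' analysis code: the function and constant names are hypothetical, the p-values are assumed to be BH-adjusted already, and the intracellular cutoff is assumed to mirror the apoplastic one (log2FC ≤ −1.5). Proteins detected in both fractions that pass neither cutoff are labelled "both", matching the three fractions named for panels c and d.

```python
from typing import Optional

LOG2FC_CUTOFF = 1.5  # absolute log2 fold difference of AF over TE
P_CUTOFF = 0.01      # applied to BH-adjusted p-values


def classify(in_af: bool, in_te: bool,
             log2fc: Optional[float] = None,
             adj_p: Optional[float] = None) -> str:
    """Assign a protein to 'apoplastic', 'intracellular', or 'both'."""
    if not (in_af or in_te):
        raise ValueError("protein was not detected in either fraction")
    # Proteins found in only one fraction are assigned to it directly.
    if in_af and not in_te:
        return "apoplastic"
    if in_te and not in_af:
        return "intracellular"
    if log2fc is None or adj_p is None:
        raise ValueError("log2fc and adj_p are required for shared proteins")
    if adj_p <= P_CUTOFF and log2fc >= LOG2FC_CUTOFF:
        return "apoplastic"
    if adj_p <= P_CUTOFF and log2fc <= -LOG2FC_CUTOFF:
        return "intracellular"
    return "both"


# Made-up values, for illustration only.
print(classify(True, True, log2fc=2.0, adj_p=0.001))
print(classify(True, True, log2fc=-2.0, adj_p=0.001))
print(classify(True, True, log2fc=2.0, adj_p=0.05))
```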
Genomes and gene annotations usedᵃ
| Species | Genome build | Annotation | Reference |
|---|---|---|---|
| | GCF_000001735.4 | RefSeq | Arabidopsis Genome Initiative, 2000 |
| | GCF_000511025.2 | NCBI | Dohm et al., 2014 |
| | GCA_000512255.2 | Pepper.v.1.55.proteins.annotated | [ |
| | GCF_000710875.1 | NCBI | [ |
| | GCA_000950795.1 | CaChiltepin.pep | [ |
| | GCF_001625215.1 | NCBI | [ |
| | GCF_001879085.1 | NCBI | [ |
| | Niben1.0.1 | Niben101_annotation.proteins.wdesc | [ |
| | Niben0.4.4 | Niben.genome.v0.4.4.proteins.wdesc | [ |
| | Nbv0.3 | – | [ |
| | Nbv0.5 | – | [ |
| | Nbv5.1 transcriptome | Nbv5.1_transcriptome_primary_alternate_correct | [ |
| | GCA_002018475.1 | NIOBT_r1.0 | [ |
| | GCF_000393655.1 | NCBI | [ |
| | GCF_000715135.1 | NCBI | [ |
| | GCF_000390325.1 | NCBI | [ |
| | Petunia_axillaris_v1.6.2 | | [ |
| | Petunia_inflata_v1.0.1 | | [ |
| | GCF_000188115.4 | NCBI | [ |
| | GCA_000787875.1 | SME_r2.5.1_pep | [ |
| | GCF_001406875.1 | NCBI | [ |
| | GCF_000226075.1 | NCBI | [ |
ᵃ Where available, the NCBI assembly accession and annotation were taken