A consensus on the Aquaporin Gene Family in the Allotetraploid Plant, Nicotiana tabacum.

Michael Groszmann1, Annamaria De Rosa1, Jahed Ahmed2, François Chaumont2, John R Evans1.   

Abstract

Aquaporins (AQPs) are membrane-spanning channel proteins with exciting applications in plant engineering and industry. Translational outcomes will be improved by better understanding the extensive diversity of plant AQPs. However, AQP gene families are complex, making exhaustive identification difficult, especially in polyploid species. The allotetraploid species Nicotiana tabacum (Nt; tobacco) plays a significant role in modern biological research and is closely related to several crops of economic interest, making it a valuable platform for AQP research. Recently, De Rosa et al. (2020) and Ahmed et al. (2020) concurrently reported on the AQP gene family in tobacco, establishing family sizes of 76 and 88 members, respectively. The discrepancy highlights the difficulties of characterizing large complex gene families. Here, we identify and resolve the differences between the two studies, clarify gene models, and derive a consolidated collection of 84 members that more accurately represents the complete NtAQP family. Importantly, this consensus NtAQP collection will reduce the confusion and ambiguity that would inevitably arise from having two different descriptive studies and sets of NtAQP gene names. This report also serves as a case study, highlighting and discussing the variables to be considered and the refinements required to ensure comprehensive gene family characterizations, which become valuable resources for examining the evolution and biological functions of genes.
© 2021 The Authors. Plant Direct published by American Society of Plant Biologists and the Society for Experimental Biology and John Wiley & Sons Ltd.

Keywords:  Aquaporins; gene family characterization; gene structure and evolution; major intrinsic proteins; orthologs; phylogenetics

Year:  2021        PMID: 33977216      PMCID: PMC8104905          DOI: 10.1002/pld3.321

Source DB:  PubMed          Journal:  Plant Direct        ISSN: 2475-4455


INTRODUCTION

AQUAPORINS – channel proteins with complex gene families and versatile applications

Aquaporins (AQPs) constitute a major family of membrane‐spanning channel proteins that selectively facilitate the passive bidirectional movement of substrates across biological membranes. AQPs are found across all phylogenetic kingdoms but are by far the most diversified in plants, transporting a range of solutes essential for numerous plant processes including water relations, growth, stress responses, nutrient uptake, and photosynthesis (Chaumont & Tyerman, 2017). Plant AQPs are a complex gene family comprising several subfamilies, some of which exist only in older plant lineages (e.g., mosses, green algae, and diatoms) while others emerged in higher order plants (Groszmann et al., 2017; Laloux et al., 2018). Among angiosperms, the AQP family typically comprises PIPs, TIPs, NIPs, SIPs, and XIPs, with each subfamily characteristically localizing to different subcellular membranes and transporting somewhat unique sets of substrates. A series of subtypes and a high multiplicity of isoforms have evolved in plants, particularly in the PIPs, TIPs, and NIPs, with AQP families in angiosperms ranging from 30 to 120 members (Groszmann et al., 2017; Laloux et al., 2018). This rich diversification, the assorted complement of transported substrates, and an increasing understanding of the many developmental and stress‐responsive processes that AQPs are involved in have resulted in a keen interest in targeting AQPs for engineering more resilient and productive plants, as well as for use in biomimetic membranes for industrial filtration applications (Hélix‐Nielsen, 2018; Tang et al., 2015). Expanding insight into the diversity and native biological roles of plant AQPs will improve their successful manipulation for translational outcomes.

Genomic sequencing and gene family characterizations

From their inception, whole‐genome datasets have been used to identify and examine gene families. The rapidly growing collection of fully sequenced plant genomes, including many crop species, represents an incredible opportunity for comparative analysis. The provenance, expansion, and evolutionary history of gene families across species can inform us about biological functions. Numerous studies characterizing AQP gene families in plants have been reported. These cover a wide range of species and are contributing valuable insight into the evolution and diversity of plant AQPs. Coupling these with transcriptomic and functional studies is helping to reveal the many native roles of plant AQPs as well as identifying possible targets that could be manipulated for improving crop productivity. Understanding AQP biology and harnessing the potential of their diversity for engineering efforts relies foremost on the identification of complete gene families, with accurate gene and protein models that allow for the correct assignment of homologs and/or orthologs.

Accurate curation of large gene families is time‐consuming, laborious, and requires an understanding of the strengths and limitations of the datasets and methods being used. Analysis often commences with an iterative process of manual and computational analysis, often combining numerous BLAST searches to identify members of a given family. This is followed by the manual curation of gene models that are typically validated against RNA‐seq data. The encoded proteins can also be validated for the presence of conserved signature motifs, secondary/tertiary conformation and, if available, against curated orthologs from closely related species. Characterizing AQP gene families can be quite complex, given the occurrence of several subfamilies that are further divided into subgroups, each with numerous isoforms (Groszmann et al., 2017; Laloux et al., 2018). This is made additionally difficult when examining polyploid species and their multiplicity in genomic content.

Recently, two concurrently published studies, De Rosa et al. (2020) and Ahmed et al. (2020), characterized the AQP gene family in the allotetraploid species Nicotiana tabacum. Tobacco is a popular model species capable of scaling from the laboratory to the field. It belongs to the Solanaceae family of angiosperms and is therefore closely related to crops of economic interest such as tomatoes, potatoes, eggplants, and peppers. Tobacco itself has renewed commercial applications in the biofuel and plant‐based pharmaceutical sectors.

The Nicotiana genus, to which tobacco belongs, has interesting attributes for studying gene family evolution. Its taxonomy is well characterized, with approximately 75 species organized into several sections, several key members of which have whole‐genome sequencing data available (Bombarely et al., 2012; Schiavinato et al., 2019, 2020; Sierro et al., 2014; Xu et al., 2017). Polyploids formed by interspecific hybridization reside within five of the sections (Clarkson et al., 2004; Leitch et al., 2008). The inception of these polyploids differs in both age and relatedness of the contributing parental species, providing opportunities to examine gene family changes at different stages of polyploid evolution (Leitch et al., 2008). For example, the allotetraploid Nicotiana benthamiana (section Suaveolentes), commonly used in labs, emerged ~10 MYA and is in an advanced stage of long‐term genome diploidization with extensive rearrangements of its subgenomes (Schiavinato et al., 2020). In contrast, the allotetraploid N. tabacum, which arose from a hybridization event between N. sylvestris and N. tomentosiformis, emerged only ~0.2 MYA and is still in a highly “duplicated” state (Edwards et al., 2017; Sierro et al., 2014).

Accordingly, both De Rosa et al. (2020) and Ahmed et al. (2020) identified N. tabacum as having a large AQP gene family, currently second only to the allopolyploid Brassica napus (canola) (Sonah et al., 2017; Yuan et al., 2017). De Rosa et al. established 76 members of the NtAQP family, while Ahmed et al. reported 88 full‐length AQP genes. The discrepancy highlights the difficulties of characterizing large complex gene families. Here, we compare the two sets of NtAQP genes from De Rosa et al. and Ahmed et al., clarify the discrepancies, and arrive at a consensus collection that more accurately represents a complete NtAQP gene family to better guide evolutionary and functional studies.

RESULTS AND DISCUSSION

Determining an assured consensus NtAQP gene family

Each study applied slightly different approaches toward identifying a mostly similar collection of AQP genes. Briefly, De Rosa et al. relied on genomic sequence and computed annotations generated via the Sol Genomics Network (SGN) and performed local BLASTp and BLASTn searches using published potato and tomato AQP gene/protein sequences as queries. Importantly, De Rosa et al. focused on a single cultivar (TN90) for primary discovery and used the K326 cultivar for secondary confirmation. Gene models were manually refined and validated by comparisons against orthologs and against distribution patterns of mRNA‐seq reads mapped to the tobacco TN90 genomic sequence. Ahmed et al. used potato and tomato AQP protein sequences as queries in BLASTp searches through the NCBI portal, which draws from the NCBI Reference Sequence (RefSeq) collection of gene/protein annotations (O'Leary et al., 2016). Ahmed et al. relied on RefSeq annotations, which for plant genes are largely predicted by automated computational analysis and, where available, incorporate RNAseq or EST data for validation (O'Leary et al., 2016). Sequences of all gene/protein models reported in the two studies were imported into Geneious (ver. 11.1.5) and compared using MUSCLE sequence alignments and Neighbor‐Joining phylogenetic trees to identify outliers and discrepancies. Flagged annotations were then manually investigated to determine their points of difference. This included scrutinizing the origins/sources of the genetic data, inspecting splice variants using RNAseq distributions in the RefSeq browser, and comparing models against those from the parental, tomato, and potato genomes. From this analysis, we were able to clarify all discrepancies between the two studies and refine the curation of the NtAQP gene family (Table 1; Figure 1).
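The screening step described above (aligning models from both studies and flagging outliers for manual inspection) can be illustrated with a minimal Python sketch. It is not the Geneious/MUSCLE workflow used in the study: it assumes the sequences are already aligned to equal length, the function names `percent_identity` and `flag_discrepancies` are hypothetical, and the toy sequences are made up rather than real NtAQP models.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def flag_discrepancies(study_a: dict, study_b: dict, threshold: float = 100.0) -> dict:
    """For each model in study_a, find its best match in study_b.

    Returns {name_a: (best_name_b, identity, flagged)}, where flagged is True
    when the best identity is below the threshold and the pair therefore
    deserves manual inspection.
    """
    report = {}
    for name_a, seq_a in study_a.items():
        best_name, best_score = None, -1.0
        for name_b, seq_b in study_b.items():
            score = percent_identity(seq_a, seq_b)
            if score > best_score:
                best_name, best_score = name_b, score
        report[name_a] = (best_name, best_score, best_score < threshold)
    return report


# Made-up aligned fragments for illustration only (not real NtAQP sequences).
study_a = {"geneA1": "MAKDVE-GGF", "geneA2": "MSKEVSAGGF"}
study_b = {"geneB1": "MAKDVE-GGF", "geneB2": "MSKEVSAGAF"}

report = flag_discrepancies(study_a, study_b)
for name, (match, identity, flagged) in report.items():
    status = "inspect manually" if flagged else "consistent"
    print(f"{name} -> {match}: {identity:.1f}% ({status})")
```

A flag here only marks a pair for closer inspection; deciding whether the difference reflects a cultivar polymorphism, a splice variant, or an annotation error still requires the manual checks described in the text.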
TABLE 1

Comparing NtAQP gene annotations from De Rosa et al. (2020) and Ahmed et al. (2020) and deriving a consensus list of 84 members that more accurately represents the complete NtAQP family

| Consensus NtAQP family (this study) | De Rosa et al. (2020) Gene ID (a) | NCBI Accession (1) | NCBI Accession (2) | Ahmed et al. (2020) Gene ID (b) | NCBI Accession (3) | Description of discrepancy |
| NtPIP1;1s | NtPIP1;1s | BK011392 | | NtPIP1;7 | NP_001312824.1 | |
| NtPIP1;1t | NtPIP1;1t | BK011393 | | NtPIP1;8 | NP_001312222.1 | |
| | | | | NtPIP1;3 | AAB04757.1 | Duplicated annotation. Additional copy of NtPIP1;1t from Wisconsin 38 cultivar. Polymorphisms exist compared to cv. TN90. |
| NtPIP1;2s | NtPIP1;2s | BK011394 | | NtPIP1;12 | NP_001312721.1 | |
| NtPIP1;2t | NtPIP1;2t | BK011395 | | NtPIP1;13 | XP_016510215.1 | |
| NtPIP1;3s | NtPIP1;3s | BK011396 | | NtPIP1;1 | NP_001313131.1 | NP_001313131.1 sequence is from cv. Petit Havana SR1; polymorphisms exist with cv. TN90. |
| NtPIP1;3t | NtPIP1;3t | BK011397 | | NtPIP1;6 | XP_016476491.1 | |
| NtPIP1;5s | NtPIP1;5s | BK011398 | | NtPIP1;4 | NP_001312189.1 | |
| | | | | NtPIP1;5 | CAA04750.1 | Duplicated annotation. Identical to NtPIP1;5s (De Rosa, 2020) and NtPIP1;4 (Ahmed, 2020); a lab‐based submission with a likely non‐synonymous sequencing error of a strongly conserved functional residue in the widely studied NtAQP1; detailed in De Rosa et al. (2020). |
| NtPIP1;5t | NtPIP1;5t | BK011399 | | NtPIP1;2 | XP_016508253.1 | |
| NtPIP1;7t | NtPIP1;7t | BK011401 | | NtPIP1;9 | NP_001312921.1 | |
| NtPIP1;8s | NtPIP1;8s | BK011400 | | NtPIP1;10 | XP_016458231.1 | |
| NtPIP2;1s | NtPIP2;1s | BK011402 | | NtPIP2;15 | NP_001312333.1 | |
| NtPIP2;1x | NtPIP2;1x | BK011403 | | NtPIP2;13 | NP_001312334.1 | Unlikely extended N‐terminus. Not supported by orthologs in other Solanaceae species and upstream AUG is in an unfavorable Kozak context. |
| NtPIP2;2t | NtPIP2;2t | BK011404 | | NtPIP2;16 | XP_016513533.1 | |
| NtPIP2;3t | NtPIP2;3t | BK011405 | | NtPIP2;14 | XP_016486700.1 | Unlikely extended N‐terminus. Not supported by orthologs in other Solanaceae species and upstream AUG is in an unfavorable Kozak context. |
| NtPIP2;4s | NtPIP2;4s | BK011406 | | NtPIP2;20 | NP_001311719.1 | |
| NtPIP2;4t | NtPIP2;4t | BK011407 | | NtPIP2;21 | NP_001311765.1 | |
| NtPIP2;5s | NtPIP2;5s | BK011408 | | NtPIP2;12 | NP_001312276.1 | |
| NtPIP2;5t | NtPIP2;5t | BK011409 | | NtPIP2;11 | NP_001311701.1 | |
| NtPIP2;6s | NtPIP2;6s | BK011410 | | NtPIP2;18 | NP_001313066.1 | |
| NtPIP2;6t | NtPIP2;6t | BK011411 | | NtPIP2;17 | NP_001312464.1 | |
| NtPIP2;7t | NtPIP2;7t | BK011412 | | NtPIP2;19 | NP_001313208.1 | |
| NtPIP2;8s | NtPIP2;8s | BK011413 | | NtPIP2;3 | NP_001312414.1 | |
| NtPIP2;8t | NtPIP2;8t | BK011414 | | NtPIP2;2 | NP_001313091.1 | |
| | | | | NtPIP2;1 | AAL33586.1 | Duplicated annotation. Additional copy of NtPIP2;8t from Petit Havana SR1 cultivar. |
| NtPIP2;9s | NtPIP2;9s | BK011415 | | NtPIP2;5 | NP_001312874.1 | |
| NtPIP2;9t | NtPIP2;9t | BK011416 | | NtPIP2;4 | NP_001312350.1 | |
| NtPIP2;11s | NtPIP2;11s | BK011417 | | NtPIP2;6 | XP_016477641.1 | |
| NtPIP2;11bs | NtPIP2;11spseudo | | | NtPIP2;7 | NP_001313061.1 | Full length gene not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtPIP2;11t | NtPIP2;11t | BK011418 | | NtPIP2;8 | XP_016476355.1 | |
| NtPIP2;13s | NtPIP2;13s | BK011419 | | NtPIP2;10 | XP_016494749.1 | |
| NtPIP2;13t | NtPIP2;13t | BK011420 | | NtPIP2;9 | NP_001312511.1 | |
| | | | | NtPIP1;11 | XP_016515710.1 | Identified as a pseudogene (NtPIP1;2bspseudo) with C‐terminal truncations in De Rosa et al. (2020). |
| NtTIP1;1s | NtTIP1;1s | BK011426 | | NtTIP1;3 | NP_001312871.1 | |
| NtTIP1;1t | NtTIP1;1t | BK011427 | | NtTIP1;2 | NP_001312131.1 | NP_001312131.1 sequence is from cv. Bright Yellow 2; polymorphisms exist with cv. TN90. |
| | | | | NtTIP1;1 | BAF95576.1 | Duplicated annotation. Additional copy of NtTIP1;1t from Bright Yellow 2 cultivar. |
| NtTIP1;2s | NtTIP1;2s | BK011428 | | NtTIP1;6 | XP_016487055.1 | |
| NtTIP1;2t | NtTIP1;2t | BK011429 | | NtTIP1;5 | XP_016501711.1 | |
| NtTIP1;3s | NtTIP1;3s | BK011431 | | NtTIP1;9 | XP_016450483.1 | |
| NtTIP1;3t | NtTIP1;3t | BK011430 | | NtTIP1;8 | XP_016495978.1 | |
| NtTIP1;4t | NtTIP1;4t | BK011432 | | NtTIP1;4 | XP_016513281.1 | |
| NtTIP1;5t | | | | NtTIP1;7 | XP_016471957.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtTIP2;1s | NtTIP2;1s | BK011433 | XP_016462920.1 | | | Not identified in Ahmed et al. (2020). |
| NtTIP2;1t | NtTIP2;1t | BK011434 | | NtTIP2;3 | XP_016503582.1 | |
| NtTIP2;2s | NtTIP2;2s | BK011435 | | NtTIP2;7 | XP_016481958.1 | |
| NtTIP2;2t | | | | NtTIP2;6 | XP_016445220.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtTIP2;3s | NtTIP2;3s | BK011436 | | NtTIP2;9 | XP_016481922.1 | |
| NtTIP2;3t | NtTIP2;3t | BK011437 | | NtTIP2;8 | NP_001312940.1 | |
| | | | | NtTIP2;10 | P24422.2 | Duplicated annotation. Additional copy of NtTIP2;3s from Wisconsin 38 cultivar. |
| NtTIP2;4s | NtTIP2;4s | BK011438 | | NtTIP2;5 | XP_016515893.1 | |
| NtTIP2;4t | | | | NtTIP2;4 | XP_016480756.1 | Not identified in De Rosa et al. (2020); not present in the TN90 cultivar genomic sequence. Identified in Ahmed et al. (2020) using MSK326 cultivar. |
| NtTIP2;5s | NtTIP2;5s | BK011439 | | NtTIP2;1 | NP_001312646.1 | |
| NtTIP2;5t | NtTIP2;5t | BK011440 | | NtTIP2;2 | XP_016495734.1 | |
| NtTIP3;1s | NtTIP3;1s | BK011441 | | NtTIP3;3 | XP_016436583.1 | |
| NtTIP3;1t | NtTIP3;1t | BK011442 | | NtTIP3;4 | XP_016500896.1 | |
| NtTIP3;2t | NtTIP3;2t | BK011443 | | NtTIP3;2 | XP_016491898.1 | |
| | | | | NtTIP3;1 | XP_016491554.1 | Duplicated annotation. Additional copy of NtTIP3;2t from MSK326 cultivar. |
| NtTIP4;1s | NtTIP4;1s | BK011444 | | NtTIP4;2 | XP_016441470.1 | |
| NtTIP4;1t | NtTIP4;1t | BK011445 | | NtTIP4;1 | NP_001311953.1 | |
| NtTIP5;1s | NtTIP5;1s | BK011446 | | NtTIP5;2 | XP_016485861.1 | |
| NtTIP5;1t | NtTIP5;1t | BK011447 | | NtTIP5;1 | XP_016462485.1 | |
| NtNIP1;1s | NtNIP1;1s | BK011376 | | | | Not identified in Ahmed et al. (2020). |
| NtNIP1;2s | NtNIP1;2s | BK011377 | | NtNIP1;1 | XP_016487110.1 | |
| NtNIP1;2t | NtNIP1;2t | BK011378 | XP_016445610.1 | NtNIP1;2 | XP_016445609.1 | Extended N‐terminal splice variant of NtNIP1;2t; not supported by RNA‐seq and models of parental and Solanaceae orthologs. Browser link: https://go.usa.gov/x7cQk |
| NtNIP2;1s | NtNIP2;1s | BK011379 | | NtNIP2;1 | XP_016451246.1 | |
| NtNIP3;1s | NtNIP3;1s | BK011380 | | NtNIP3;2 | XP_016515586.1 | |
| NtNIP3;1t | | | | NtNIP3;1 | XP_016460638.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtNIP4;1s | NtNIP4;1s | BK011381 | | NtNIP4;3 | XP_016491262.1 | |
| NtNIP4;1t | NtNIP4;1t | BK011382 | | NtNIP4;4 | XP_016453373.1 | |
| NtNIP4;2s | NtNIP4;2s | BK011383 | | NtNIP4;6 | XP_016500017.1 | |
| NtNIP4;2t | NtNIP4;2t | BK011384 | | NtNIP4;5 | XP_016456203.1 | |
| NtNIP4;3s | NtNIP4;3s | BK011385 | | NtNIP4;1 | XP_016486634.1 | |
| NtNIP4;3t | | | | NtNIP4;2 | XP_016455585.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtNIP5;1s | NtNIP5;1s | BK011386 | | NtNIP5;2 | NP_001312819.1 | |
| NtNIP5;1t | NtNIP5;1t | BK011387 | | NtNIP5;1 | XP_016470302.1 | |
| | | | | NtNIP5;3 | XP_016493176.1 | Duplicated annotation. Misidentified gene copy of an unlikely splice variant of NtNIP5;1s not supported by RNA‐seq distribution patterns. Browser link: https://go.usa.gov/x7cnz |
| NtNIP6;1s | NtNIP6;1s | BK011388 | XP_016435921.1 | NtNIP6;1 | XP_016435920.1 | Extended N‐terminal splice variant of NtNIP6;1s; not supported by RNA‐seq and models of parental and Solanaceae orthologs. Browser link: https://go.usa.gov/x7cn5 |
| NtNIP6;1t | NtNIP6;1t | BK011389 | XP_016438238.1 | NtNIP6;2 | XP_016438237.1 | Extended N‐terminal splice variant of NtNIP6;1t; not supported by RNA‐seq and models of parental and Solanaceae orthologs. Browser link: https://go.usa.gov/x7cnR |
| NtNIP7;1s | NtNIP7;1s | BK011390 | | NtNIP7;2 | XP_016496646.1 | |
| NtNIP7;1t | NtNIP7;1t | BK011391 | | NtNIP7;1 | XP_016509644.1 | |
| NtNIP8;1s | | | | NtNIP8;1 | XP_016468207.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset. |
| NtNIP8;1t | | | see Additional File 1 | NtNIP8;2 | XP_016451938.1 | Not identified in De Rosa et al. (2020); no predicted gene model in Sol Genome dataset; see Additional File 1 for sequence of revised model. |
| NtSIP1;1t | NtSIP1;1t | BK011421 | XP_016439605.1 | NtSIP1;1 | XP_016439604.1 | C‐terminal splice variant of NtSIP1;1t; not supported by RNA‐seq and models of parental and Solanaceae orthologs. Browser link: https://go.usa.gov/x7r4m |
| NtSIP1;2s | NtSIP1;2s | BK011422 | XP_016461483.1 | | | Not identified in Ahmed et al. (2020). |
| NtSIP1;2t | NtSIP1;2t | BK011423 | | NtSIP1;2 | XP_016492107.1 | |
| NtSIP2;1s | NtSIP2;1s | BK011424 | XP_016510678.1 | | | Not identified in Ahmed et al. (2020). |
| NtSIP2;1t | NtSIP2;1t | BK011425 | | NtSIP2;1 | XP_016496337.1 | |
| NtXIP1;6s | NtXIP1;6s | BK011448 | | NtXIP2;1 | XP_016489264.1 | |
| NtXIP1;6t | NtXIP1;6t | BK011449 | | NtXIP2;2 | XP_016488683 | |
| NtXIP1;7s | NtXIP1;7s | BK011450 | | NtXIP1;2 | XP_016446694 | β splice variant version: NCBI accession XP_016446695.1. |
| NtXIP1;7t | NtXIP1;7t | BK011451 | | NtXIP1;1 | NP_001312796 | β splice variant version: NCBI accession HM475295.1. |

In the original table, a status symbol marks each annotation as accurate, as having a minor inconsistency, or as having a significant inconsistency, and the consensus gene names and their correct corresponding NCBI accession identification codes are set in bold; neither the symbols nor the bold formatting are reproduced in this text.

NCBI Accession (1): Third‐party annotation (TPA) submissions to NCBI representing curated gene/protein models in De Rosa et al.; (2): NCBI RefSeq records supporting the De Rosa et al. TPA submissions in instances of incorrect/unlikely models proposed in Ahmed et al. (2020); (3): NCBI RefSeq records reported in Ahmed et al. (2020). Protein and coding sequences for all 84 AQP members are in Data File S1.

Abbreviation: cv, cultivar.

(a) NtAQP gene identifiers as reported in De Rosa et al. (2020).

(b) NtAQP gene identifiers as reported in Ahmed et al. (2020).

FIGURE 1

Phylogeny of the 86 protein products produced by the consensus 84 genes of the tobacco aquaporin family. Branches are color coded in reference to the five sub‐families of PIPs (blue), XIPs (purple), TIPs (green), NIPs (red), and SIPs (orange). The phylogenetic tree was generated using the neighbor‐joining method with pair‐wise deletions (via MEGA10) from MUSCLE‐aligned protein sequences. Confidence levels (%) of branch points were generated through bootstrapping analysis (n = 1,000). Suffix identifiers “s” or “t” denote the sub‐genome origins of the sister genes (i.e., “s” = N. sylvestris and “t” = N. tomentosiformis) that reside as discrete pairings across the phylogeny. The six pseudogenes identified in De Rosa et al. (2020) have not been included in this phylogeny. Protein and coding sequences for all NtAQP members are in Data File S1.
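The naming scheme used for the consensus genes (species prefix, subfamily, group and member numbers, and a sub‐genome suffix) lends itself to simple parsing. The following Python sketch is illustrative only: the `AqpName` type and `parse_name` function are hypothetical, the treatment of an extra copy letter (as in NtPIP2;11bs) is an assumption inferred from the names in Table 1, and the meaning of the “x” suffix (as in NtPIP2;1x) is not stated in this text, so it is labeled as such.

```python
import re
from typing import NamedTuple


class AqpName(NamedTuple):
    subfamily: str   # PIP, TIP, NIP, SIP, or XIP
    group: int       # number before the semicolon
    member: int      # number after the semicolon
    copy: str        # extra copy letter such as "b", or "" if absent (assumed meaning)
    subgenome: str   # parental sub-genome inferred from the suffix


# Assumed pattern, based only on names that appear in Table 1.
_NAME = re.compile(r"^Nt(PIP|TIP|NIP|SIP|XIP)(\d+);(\d+)([a-z]?)([stx])$")

_ORIGIN = {
    "s": "N. sylvestris",
    "t": "N. tomentosiformis",
    "x": "not stated in this text",  # placeholder, not a definition from the source
}


def parse_name(name: str) -> AqpName:
    """Split a consensus NtAQP gene name into its components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized NtAQP name: {name!r}")
    subfamily, group, member, copy, suffix = match.groups()
    return AqpName(subfamily, int(group), int(member), copy, _ORIGIN[suffix])


for example in ("NtPIP1;1s", "NtTIP1;5t", "NtPIP2;11bs"):
    print(example, "->", parse_name(example))
```

Names from the Ahmed et al. (2020) scheme, which carry no suffix (e.g., NtPIP1;7), are deliberately rejected by this pattern.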

In De Rosa et al., we identified eight significant inconsistencies, all of which were related to unidentified genes (Table 1). 
Five of these were due to misses by the genome‐wide computational predictions present in the Sol Genomics Network (SGN) database, but could be subsequently identified in this study by searching the genomic scaffold sequences directly. Such misses are consistent with the generally lower accuracy of predicted NtAQP gene models found with the SGN database (De Rosa et al., 2020). A second copy of NtPIP2;11 inherited from the N. sylvestris parent (NtPIP2;11bs), originally thought to be a truncated pseudogene, was found to contain a full‐length sequence after further scrutiny of the TN90 genome assembly. NtTIP2;4t was not identified because its sequence is not present within the TN90 sequenced genome; it was identified in Ahmed et al. only because of their non‐selective use of cultivars (see below). NtTIP1;5t was not identified because it was missed by the SGN computational predictions and is also absent in both parental genomes. Consistent with its reporting in Ahmed et al., we managed to identify NtTIP1;5t by BLASTn query of the TN90 scaffold sequences directly. However, further surveys of the parental genomes did not uncover a copy of NtTIP1;5t. We did identify a highly homologous copy of NtTIP1;5t in the sequenced genome of Nicotiana otophora (SGN: scaffold Noto_AWOL01S0468679.1), which belongs to the Tomentosae section of the Nicotiana genus and is thus a close relative of N. tomentosiformis (Leitch et al., 2008). Interestingly, early studies into the tobacco lineage implicated N. otophora as a part paternal donor to the tobacco genome through an introgressed hybrid with N. tomentosiformis (Kenton et al., 1993; Riechers & Timko, 1999). Whole‐genome sequencing has since revealed N. tomentosiformis as the paternal donor, which largely but not completely dismisses these early studies (Edwards et al., 2017; Sierro et al., 2014). A small fraction of the sequenced tobacco genome is still more similar to N. otophora than to N. tomentosiformis, meaning that the presence of NtTIP1;5t in tobacco could be from a genomic introgression of N. otophora at some point in the evolution of tobacco. This would be consistent with the phylogenetic isolation of NtTIP1;5t, which is the only NtAQP with no parental copy and no sister gene (Figure 1) (De Rosa et al., 2020). Although its origins are ambiguous, the NtTIP1;5t isoform was assigned the suffix “t” to denote that, at the very least, its lineage is from the N. tomentosiformis‐containing Tomentosae section of the Nicotiana genus as opposed to coming from the N. sylvestris parent.

The NtAQP gene list in Ahmed et al. contained 12 significant inconsistencies (Table 1). Four of these were missed genes not present in the NCBI RefSeq database. One (Ahmed ID: NtPIP1;11) is a pseudogene identified in De Rosa et al. The remaining seven are duplicated annotations that inflate gene numbers; they are either gene copies from another cultivar, an unsupported splice variant of an already identified gene (Ahmed ID: NtNIP5;3), or a duplicated gene containing a likely sequencing mutation (Ahmed ID: NtPIP1;5). Duplicate gene copies can be returned by NCBI BLASTp searches, even when using the RefSeq non‐redundant database, because multiple copies of the same gene will be found if polymorphisms are present; the most frequent instance is inter‐cultivar polymorphisms (e.g., Ahmed ID: NtPIP1;8 TN90 cultivar = NtPIP1;3 Wisconsin 38 cultivar). We identified these cultivar discrepancies by scrutinizing cultivar information embedded within the RefSeq sequence records or by searching the linked publication. We also identified 10 minor inconsistencies in the Ahmed et al. NtAQP gene list (Table 1). Most of these were variant protein products encoded by unlikely splice variants that were inconsistent with RNA‐seq mapping distributions in tobacco, with gene models from the N. sylvestris and N. tomentosiformis parental genomes, and with tomato and potato orthologs; web links to browser views graphically representing this information are embedded in the description column of Table 1.

From this comparative analysis, we derived a consensus set of 84 full‐length AQP genes (plus the six pseudogenes reported in De Rosa et al.), encoding 86 protein products (two confirmed splice variants each for NtXIP1;7s and NtXIP1;7t), that represents a more accurate and complete NtAQP gene family (Table 1, Figure 1, Data File S1). The NtAQP proteins segregate into the five distinct subfamilies common to higher plants, namely the NIPs (20 isoforms), SIPs (5), PIPs (30), TIPs (25), and XIPs (6). The majority of the NtAQPs are present as phylogenetic pairs, which are the sister genes inherited from the parental N. sylvestris and N. tomentosiformis genomes (Figure 1). These are still largely retained because of the limited genome downsizing that has occurred in tobacco, given the short evolutionary time frame since its inception (0.2 million years) (detailed in De Rosa et al., 2020).

Nomenclature was allocated as set out in De Rosa et al., which adheres to the convention that gene names should reflect orthology across species (e.g., NtPIP1;1 is the ortholog of tomato SlPIP1;1). Unfortunately, the haphazard naming of genes across AQP gene family studies is prevalent and causes confusion through missed or incorrectly assigned orthology. Our nomenclature aligns the NtAQP naming convention with that of tomato and additionally incorporates suffix identifiers “t” or “s” to denote the parental sub‐genome origins of the sister genes in tobacco (Figure 1), a convention customary when naming genes in polyploid species, as, for example, in canola and wheat (Adamski et al., 2018; Østergaard & King, 2008).
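The tallies reported in this section can be cross‐checked with simple arithmetic. The short Python sketch below uses only numbers quoted in the text; the variable names are illustrative, and the reading that the 86 protein products exceed the 84 genes by one additional splice variant for each of NtXIP1;7s and NtXIP1;7t is taken from the statement above.

```python
# Counts quoted in the text.
de_rosa_reported = 76
de_rosa_unidentified = 8    # significant inconsistencies, all unidentified genes

ahmed_reported = 88
ahmed_missed = 4            # genes absent from the Ahmed et al. list
ahmed_pseudogene = 1        # Ahmed ID NtPIP1;11
ahmed_duplicates = 7        # duplicated annotations

# Each study's list, once corrected, should arrive at the same consensus size.
consensus_from_de_rosa = de_rosa_reported + de_rosa_unidentified
consensus_from_ahmed = (
    ahmed_reported + ahmed_missed - ahmed_pseudogene - ahmed_duplicates
)

# Isoform (protein product) counts per subfamily, as listed in the text.
isoforms = {"NIP": 20, "SIP": 5, "PIP": 30, "TIP": 25, "XIP": 6}
protein_products = sum(isoforms.values())

# One additional splice variant each for NtXIP1;7s and NtXIP1;7t.
extra_splice_variants = 2
genes_from_isoforms = protein_products - extra_splice_variants

print(consensus_from_de_rosa, consensus_from_ahmed)  # 84 84
print(protein_products, genes_from_isoforms)         # 86 84
```

Both routes converge on 84 genes, and the 12 significant inconsistencies in the Ahmed et al. list decompose as 4 + 1 + 7.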

CONCLUSION

Here, we performed a comparative analysis of two independently collated NtAQP gene sets to produce an updated consensus tobacco AQP family. Each study used practices common to many gene family characterizations and, as would be expected, yielded a mostly similar collection of NtAQP genes. However, given the slightly different datasets and approaches used, incorrect identifications and missing family members needed to be resolved. The discrepancies reflect the difficulty of characterizing complex gene families, such as the AQPs, especially in polyploid species. Interrogation and consolidation of the two gene sets yielded a more accurate and complete consensus NtAQP family. The comparison also serves to highlight the variables to be considered and the refinements required to ensure comprehensive gene family characterizations. These include the use of multiple databases of predicted annotations (if available) in initial BLAST searches and an understanding of the possible outputs; screening of genomic sequences for unannotated genes; careful manual curation of gene and protein models using RNAseq data and established structures of closely related orthologs for validation; and finally, naming genes to reflect their orthology. Importantly, having this consensus NtAQP collection will reduce the confusion and ambiguity that would inevitably arise from having two different descriptive studies and sets of NtAQP gene names. Our refined update to the NtAQP gene family can be confidently used for further assessing the evolution of AQPs and, together with the original studies (Ahmed et al., 2020; De Rosa et al., 2020), represents an excellent resource to guide further functional analysis and help transfer basic research to applied outcomes.

METHODS

Sequences of all gene/protein models reported in the two studies were retrieved from NCBI using the NCBI Batch Entrez portal. Sequences were then imported into Geneious (ver. 11.1.5) and compared using MUSCLE sequence alignments and Neighbor‐Joining phylogenetic trees to identify outliers and discrepancies between the two studies. Flagged annotations were then manually investigated to determine points of difference. This included scrutinizing (a) sequence alignments; (b) origins/sources of the genetic data (e.g., cultivar, predicted gene model, secondary confirmation data); (c) splice variants and intron/exon gene models using RNAseq distributions in the RefSeq browser (links to browser views can be found in Table 1); and (d) comparisons of NtAQP models against models from the parental, tomato, and potato genomes previously identified/collated in De Rosa et al. (2020). Additional BLAST searches, where required, were run against N. tabacum (TN90—TC586PI 543792; K326TC 319—PI 552505), N. tomentosiformis (Goodsp—TW142—PI 555572), and N. sylvestris (Speg—TW136—PI 555569); https://www.ars‐grin.gov/npgs/. The final phylogeny for Figure 1 was produced in MEGA 10 (Kumar et al., 2018).
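Step (b) above, checking the origin of each record, can be partly triaged from the accession string itself. The Python sketch below is an illustration under stated assumptions rather than the procedure used in the study: the category labels reflect commonly documented accession prefix conventions (NP_ and XP_ for RefSeq proteins, BK for GenBank third‐party annotations), the patterns are simplified, and the `classify_accession` helper is hypothetical. The example accessions are taken from Table 1.

```python
import re

# Simplified, assumed patterns; an optional ".version" suffix is allowed.
_RULES = [
    ("RefSeq protein (NP_)", re.compile(r"^NP_\d+(\.\d+)?$")),
    ("RefSeq predicted model protein (XP_)", re.compile(r"^XP_\d+(\.\d+)?$")),
    ("Third-party annotation (BK)", re.compile(r"^BK\d{6}(\.\d+)?$")),
    ("INSDC protein ID", re.compile(r"^[A-Z]{3}\d{5}(\.\d+)?$")),
    (
        "UniProtKB accession",
        re.compile(
            r"^([OPQ]\d[A-Z0-9]{3}\d|[A-NR-Z]\d([A-Z][A-Z0-9]{2}\d){1,2})(\.\d+)?$"
        ),
    ),
]


def classify_accession(accession: str) -> str:
    """Return a coarse source category for an accession string."""
    for label, pattern in _RULES:
        if pattern.match(accession):
            return label
    return "unrecognized"


# Accessions as they appear in Table 1.
examples = [
    "NP_001312824.1",
    "XP_016510215.1",
    "BK011392",
    "AAB04757.1",
    "CAA04750.1",
    "P24422.2",
]
for acc in examples:
    print(f"{acc}: {classify_accession(acc)}")
```

A category is only a prompt for where to look: the cultivar, submitter, and evidence for a model still have to be read from the sequence record or its linked publication, as described in the text.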

CONFLICTS OF INTEREST

The authors declare no conflicts of interest.

AUTHOR CONTRIBUTIONS

M.G. and A.D. performed the analysis and prepared published data. M.G. wrote the manuscript. J.R.E. and F.C. critically reviewed the manuscript. J.R.E., A.D., F.C., and J.A. contributed to discussions on the analysis.

REFERENCES

1.  Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions.

Authors:  James J Clarkson; Sandra Knapp; Vicente F Garcia; Richard G Olmstead; Andrew R Leitch; Mark W Chase
Journal:  Mol Phylogenet Evol       Date:  2004-10       Impact factor: 4.286

2.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

3.  A reference genome for Nicotiana tabacum enables map-based cloning of homeologous loci implicated in nitrogen utilization efficiency.

Authors:  K D Edwards; N Fernandez-Pozo; K Drake-Stowe; M Humphry; A D Evans; A Bombarely; F Allen; R Hurst; B White; S P Kernodle; J R Bromley; J P Sanchez-Tamburrino; R S Lewis; L A Mueller
Journal:  BMC Genomics       Date:  2017-06-19       Impact factor: 3.969

4.  Evolutionary and Predictive Functional Insights into the Aquaporin Gene Family in the Allotetraploid Plant Nicotiana tabacum.

Authors:  Jahed Ahmed; Sébastien Mercx; Marc Boutry; François Chaumont
Journal:  Int J Mol Sci       Date:  2020-07-03       Impact factor: 5.923

5.  Standardized gene nomenclature for the Brassica genus.

Authors:  Lars Ostergaard; Graham J King
Journal:  Plant Methods       Date:  2008-05-20       Impact factor: 4.993

6.  The tobacco genome sequence and its comparison with those of tomato and potato.

Authors:  Nicolas Sierro; James N D Battey; Sonia Ouadi; Nicolas Bakaher; Lucien Bovet; Adrian Willig; Simon Goepfert; Manuel C Peitsch; Nikolai V Ivanov
Journal:  Nat Commun       Date:  2014-05-08       Impact factor: 14.919

7.  Plant and Mammal Aquaporins: Same but Different.

Authors:  Timothée Laloux; Bruna Junqueira; Laurie C Maistriaux; Jahed Ahmed; Agnieszka Jurkiewicz; François Chaumont
Journal:  Int J Mol Sci       Date:  2018-02-08       Impact factor: 5.923

8.  Parental origin of the allotetraploid tobacco Nicotiana benthamiana.

Authors:  Matteo Schiavinato; Marina Marcet-Houben; Juliane C Dohm; Toni Gabaldón; Heinz Himmelbauer
Journal:  Plant J       Date:  2020-01-13       Impact factor: 7.091

9.  Genome-wide identification and characterisation of Aquaporins in Nicotiana tabacum and their relationships with other Solanaceae species.

Authors:  Annamaria De Rosa; Alexander Watson-Lazowski; John R Evans; Michael Groszmann
Journal:  BMC Plant Biol       Date:  2020-06-09       Impact factor: 4.215

10.  A roadmap for gene functional characterisation in crops with large genomes: Lessons from polyploid wheat.

Authors:  Nikolai M Adamski; Philippa Borrill; Jemima Brinton; Sophie A Harrington; Clémence Marchal; Alison R Bentley; William D Bovill; Luigi Cattivelli; James Cockram; Bruno Contreras-Moreira; Brett Ford; Sreya Ghosh; Wendy Harwood; Keywan Hassani-Pak; Sadiye Hayta; Lee T Hickey; Kostya Kanyuka; Julie King; Marco Maccaferrri; Guy Naamati; Curtis J Pozniak; Ricardo H Ramirez-Gonzalez; Carolina Sansaloni; Ben Trevaskis; Luzie U Wingen; Brande Bh Wulff; Cristobal Uauy
Journal:  Elife       Date:  2020-03-24       Impact factor: 8.140
