
Discovery of novel plastid phenylalanine (trnF) pseudogenes defines a distinctive clade in Solanaceae.

Péter Poczai, Jaakko Hyvönen.

Abstract

BACKGROUND: The plastome of embryophytes is known for its high degree of conservation in size, structure, gene content and linear order of genes. The duplication of entire tRNA genes, or their arrangement in a tandem array composed of multiple pseudogene copies, is extremely rare in the plastome. Pseudogene repeats of the trnF gene have rarely been described from the chloroplast genome of angiosperms.
FINDINGS: We report the discovery of duplicated copies of the original phenylalanine (trnF(GAA)) gene in Solanaceae that are specific to a larger clade within the Solanoideae subfamily. The pseudogene copies are composed of several highly structured motifs that are partial residues or entire parts of the anticodon, T- and D-domains of the original trnF gene.
CONCLUSIONS: The Pseudosolanoid clade consists of 29 genera and includes many economically important plants such as potato, tomato, eggplant and pepper.

Keywords:  Chloroplast DNA (cpDNA); Gene duplications; Phylogeny; Plastome evolution; Solanaceae; Tandem repeats; trnL-trnF

Year:  2013        PMID: 24083106      PMCID: PMC3786074          DOI: 10.1186/2193-1801-2-459

Source DB:  PubMed          Journal:  Springerplus        ISSN: 2193-1801


Findings

The plastid trnT-trnF region has been widely applied to resolve the phylogeny of embryophytes (Quandt and Stech 2004; Zhao et al. 2011) and, since the development of universal primers by Taberlet et al. (1991), to address various questions of population genetics. This marker is located in the large single copy region of the chloroplast genome and contains a co-transcribed region consisting of three highly conserved exons that code the transfer RNA (tRNA) genes for threonine (UGU), leucine (UAA) and phenylalanine (GAA). The region is interspersed by two intergenic spacers and by a group I intron intercalated between the first and second exon of the trnL(UAA) gene. Phylogenetic results obtained with the trnT-trnF region (or part of it) should be treated with caution, because several studies (e.g. Koch et al. 2005; Pirie et al. 2007; Schmickl et al. 2009; Vijverberg and Bachmann 1999) have shown that certain parts of this region occur in several copies. If this is ignored, the basic requirement of homology among the characters used for phylogenetic analysis is easily compromised. This may lead to false hypotheses of phylogeny, especially when they are based on this region alone. Larger structural changes (>50 bp) rarely occur in the plastome. However, duplications of the rpl2 or rpl23 genes (Bowman et al. 1988) or even of tRNA genes (as pseudogenes) are occasionally reported. The latter are extremely rare in angiosperms, and so far they have only been described from Asteraceae (Vijverberg and Bachmann 1999; Wittzell 1999), Annonaceae (Pirie et al. 2007), Brassicaceae (Ansell et al. 2007; Koch et al. 2007; Tedder et al. 2010) and Juncaceae (Drábkova et al. 2004). In our recent study we reported a tandem repeat comprising two to four pseudogene copies upstream of the original trnF gene in four Solanum (Solanaceae) species (Poczai and Hyvönen 2011a).
We have characterized these structural duplications and shown that they consist of several highly structured motifs, which are partial residues or entire parts of the anticodon, T- and D-domains of the original gene, but all lack the acceptor stems at the 5′ or 3′ end. We were further interested in evaluating the possible occurrence of complete or partial trnF pseudogenes in Solanaceae. This family contains many economically important plant species, e.g., potato (Solanum tuberosum L.), tomato (Solanum lycopersicum L.) and paprika (Capsicum annuum L.). It is under intensive phylogenetic investigation, and the trnT-F plastid marker is commonly used in these studies. These sequences, together with the results of molecular breeding programs, provide a large amount of data that is available in GenBank. During data mining we concentrated on a structured dataset generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008) that contained 195 taxa and 390 sequences. This dataset provided the basis for the latest robust phylogenetic hypothesis of the Solanaceae, which includes 89 of the 98 recognized genera (Olmstead and Bohs 2007). A manual search using the anticodon domain of the original trnF gene and automated tRNA recognition by CENSOR (Kohany et al. 2006) indicated the presence of pseudogene repeats in numerous genera of Solanaceae. We used the core trnL-F dataset to map the occurrence of pseudogenic repeats on the phylogenetic tree of Solanaceae. As presented in Figure 1, the distribution of pseudogenic duplications is congruent with the previously published phylogeny of the Solanaceae (Olmstead et al. 2008), and the pattern indicates that the first pseudogenic copy evolved only once, at the base of a highly supported clade within the subfamily Solanoideae.
Among the members of this lineage, referred to here as the Pseudosolanoid clade, the anticodon domain of the trnF gene exhibits extensive duplication, with one to seven tandemly repeated copies in close 5′-proximity to the original functional gene (Table 1). The size of each pseudogenic copy ranged between 32 and 73 bp, and the anticodon domain was identified as the most conserved element. A common ATT(G)n motif is of particular interest: it and its modifications were found to border the 5′ end of the duplicated regions in the same way as in Brassicaceae (Ansell et al. 2007; Koch et al. 2005 and 2007; Schmickl et al. 2009; Tedder et al. 2010). Other motifs were partial residues or entire parts of the T- and D-domains. Residues of the 3′ and 5′ acceptor stems were rarely found among the copies (see Table 1). The D-domain was more conserved than the T-domain among the copies, and other internal repeats (AT, AAT, ATT, AATCC) were intercalated within this region, for example in the genus Lycianthes (Dunal.) Hassl. In addition to these newly discovered pseudogenes, we were also able to characterize putative promoter motifs showing high similarity to a sigma70-type bacterial promoter. These two elements (−35 TTGACA/−10 GAGGAT) are consistently found in the trnL-F spacer region of embryophytes, and they are believed to represent the ancient and original trnF gene promoter (Quandt et al. 2004). Interestingly, pseudogenic repeats were found to be inserted exclusively after these motifs in Solanaceae, in contrast to Brassicaceae, where similar pseudogenic repeats were found only between the promoter motifs in the trnL-F intergenic spacer region (Koch et al. 2005). The latter finding led Koch et al. (2005) to support the conclusion by Kanno and Hirai (1993) that these elements should be non-functional due to the intercalated position of pseudogenes between promoters.
However, this may be challenged by the position of Solanaceae pseudogenes following the −10 and −35 promoters, which are also variable in number and composition.
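The positional argument above (repeats after the promoter motifs in Solanaceae, between them in Brassicaceae) can be illustrated with a small motif scan. The following is only a minimal sketch: the motif strings TTGACA, GAGGAT and ATT(G)n are taken from the text, while the two sequences, the function names and the simple "all borders after all promoters" rule are hypothetical and chosen for illustration. It is not the procedure used in the study.

```python
import re

# Motif strings quoted in the text.
MINUS_35 = "TTGACA"
MINUS_10 = "GAGGAT"
# ATT(G)n border motif, read here as ATT followed by one or more G (an assumption).
BORDER = re.compile(r"ATTG+")


def motif_positions(seq, motif):
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def border_positions(seq):
    """Return 0-based start positions of every ATT(G)n border motif."""
    return [m.start() for m in BORDER.finditer(seq)]


def repeats_follow_promoters(seq):
    """True if every border motif starts after the last promoter element."""
    promoters = motif_positions(seq, MINUS_35) + motif_positions(seq, MINUS_10)
    borders = border_positions(seq)
    if not promoters or not borders:
        return False
    return min(borders) > max(promoters)


# Hypothetical toy sequences, NOT real trnL-F spacer data.
after_promoters = "CCTTGACATCTCCGAGGATCCCAATTGGACCTAAATTGCCTAACC"
between_promoters = "CCTTGACATCATTGGACCGAGGATCC"

print(motif_positions(after_promoters, MINUS_35))   # start of the -35 element
print(motif_positions(after_promoters, MINUS_10))   # start of the -10 element
print(border_positions(after_promoters))            # starts of ATT(G)n borders
print(repeats_follow_promoters(after_promoters))
print(repeats_follow_promoters(between_promoters))
```

Exact string matching is a simplification: the text notes that both the border motif and the promoter elements occur in modified forms, so a real scan would need to tolerate mismatches.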
Figure 1

Phylogeny of Solanaceae and the distribution and schematic structure of trnF pseudogene copies. a) Suprageneric groups recognized are indicated to the right of the tree, while major clades are collapsed at the base node and their names follow Olmstead et al. (2008). The new Pseudosolanoid clade, united by the presence of the pseudogenic trnF gene duplication, is marked with ‘ψ’ in the Solanoideae subfamily. b) Schematic representation of the plastid trnL-F spacer region in Solanaceae and the intercalated pseudogene copies (PSC) in the intergenic spacer region close to the 5′ end of the trnF gene. Pseudogene repeats are variable in number and structure and are found after the putative promoter motifs, which are also variable among species. The spacer region between the first PSC and the promoter motifs consists of intergenic repeats of variable length. Each PSC is separated by a common bordering motif (ATTG) at the 5′ end.

Table 1

Distribution of trnF pseudogenes among Solanaceae and number of multiplicated trnF anticodon domains

| Taxon | GenBank | Tribe | Copy number | Notes |
|---|---|---|---|---|
| Acnistus arborescens | EU580954 | Physaleae | 2 | a, b |
| Aureliana fasciculata | EU580961 | Physaleae | 2 | |
| Brachistus stramonifolius | EU580963 | Physaleae | 3 | |
| Brugmansia aurea | EU580965 | Datureae | 1 | c |
| Brugmansia sanguinea | EU580966 | Datureae | 1 | c |
| Capsicum baccatum | EU580969 | Capsiceae | 4 | a, b |
| Capsicum chinense | EU603443 | Capsiceae | 4 | d |
| Capsicum minutiflorum | EU580970 | Capsiceae | 6 | d |
| Capsicum pubescens | AY348982 | Capsiceae | 6 | b, d |
| Capsicum rhomboideum | EU580971 | Capsiceae | 1 | e |
| Chamaesaracha coronopus | EU580978 | Physaleae | 4 | |
| Chamaesaracha sordida | EU580979 | Physaleae | 4 | |
| Cuatresia exiguiflora | EU580981 | Physaleae | 2 | |
| Cuatresia riparia | EU580982 | Physaleae | 2 | |
| Datura leichhardtii | EU580983 | Datureae | 1 | f |
| Datura stramonium | EU580984 | Datureae | 1 | f |
| Deprea sylvarum | EU580985 | Physaleae | 3 | |
| Discopodium penninervum | EU580986 | Physaleae | 4 | |
| Dunalia solanacea | EU580988 | Physaleae | 4 | |
| Eriolarynx lorenzii | EU580990 | Physaleae | 4 | |
| Iochroma australe | EU580999 | Physaleae | 4 | |
| Iochroma cardenasianum | EU581000 | Datureae | 1 | f |
| Iochroma fuchsioides | EU581001 | Physaleae | 2 | |
| Iochroma umbellatum | EU581002 | Physaleae | 2 | |
| Jaltomata auriculata | EU581006 | Solaneae | 2 | f |
| Jaltomata grandiflora | EU581007 | Solaneae | 2 | f |
| Jaltomata procumbens | AY098695 | Solaneae | 1 | a, b, f |
| Jaltomata sinuosa | DQ180418 | Solaneae | 2 | f |
| Larnax subtriflora | EU581009 | Physaleae | 3 | |
| Leucophysalis grandiflora | EU581013 | Physaleae | 2 | |
| Leucophysalis nana | EU581014 | Physaleae | 2 | |
| Lycianthes biflora | EU581015 | Capsiceae | 2 | g |
| Lycianthes ciliolata | EU581016 | Capsiceae | 4 | h, i |
| Lycianthes glandulosa | EU581017 | Capsiceae | 3 | g, i |
| Lycianthes heteroclita | DQ180414 | Capsiceae | 2 | g, i |
| Lycianthes inaequilatera | EU581018 | Capsiceae | 6 | h, i |
| Lycianthes multiflora | EU581019 | Capsiceae | 3 | g, i |
| Lycianthes peduncularis | EU581020 | Capsiceae | 4 | h, i |
| Lycianthes shanesii | EU581021 | Capsiceae | 1 | g, h, i |
| Margaranthus solanaceus | EU581025 | Physaleae | 5 | i |
| Nectouxia formosa | EU581031 | Salpichroina* | 1 | a, b |
| Nothocestrum latifolium | EU581037 | Physaleae | 2 | |
| Nothocestrum longifolium | EU581038 | Physaleae | 3 | |
| Oryctes nevadensis | EU581039 | Physaleae | 3 | |
| Physalis alkekengi | DQ180420 | Physaleae | 2 | |
| Physalis carpenteri | EU581042 | Physaleae | 2 | |
| Physalis heterophylla | EU581043 | Physaleae | 2 | |
| Physalis peruviana | EU581044 | Physaleae | 4 | a, b |
| Physalis philadelphica | EU581045 | Physaleae | 5 | a, b |
| Quincula lobata | EU581051 | Physaleae | 1 | a, b |
| Salpichroa origanifolia | EU581052 | Salpichroina* | 2 | a, b |
| Saracha punctata | EU581053 | Physaleae | 4 | |
| Solanum abutiloides | AY266236 | Solaneae | 1 | f |
| Solanum aviculare | HM006836 | Solaneae | 2 | a, b, f |
| Solanum betaceum | DQ180426 | Solaneae | 1 | f |
| Solanum dulcamara | HM006840 | Solaneae | 1 | f |
| Solanum herculeum | DQ180466 | Solaneae | 2 | f |
| Solanum lycopersicum | NC007898 | Solaneae | 1 | f |
| Solanum melongena | EU176149 | Solaneae | 2 | h, i |
| Solanum pseudocapsicum | DQ180436 | Solaneae | 1 | |
| Solanum torvum | AY266246 | Solaneae | 4 | i |
| Solanum trisectum | JN130370 | Solaneae | 2 | |
| Solanum wendlandii | DQ180440 | Solaneae | 1 | |
| Tubocapsicum anomalum | EU581066 | Physaleae | 7 | |
| Vassobia dichotoma | EU581067 | Physaleae | 4 | |
| Witheringia cuneata | EU581070 | Physaleae | 2 | |
| Witheringia macrantha | EU581071 | Physaleae | 5 | |
| Witheringia meiantha | EU581072 | Physaleae | 4 | |
| Witheringia mexicana | EU581073 | Physaleae | 5 | |
| Witheringia solanacea | EU581074 | Physaleae | 3 | |

Taxonomic classification and a GenBank accession number are provided for each species. *Unranked informal clade name.

a: Partial pseudogenic copy at the 3′ end.
b: Missing original trnF gene.
c: Intact 5′ acceptor stem present.
d: Three copies of −35 promoter (TTGACA) motif.
e: Four copies of −35 promoter (TTGACA) motif.
f: One copy of −10 promoter (GAGGAT) motif present.
g: Pseudogene repeats are separated by long internal repeats after the promoter motifs.
h: Only one copy of −35 promoter (TTGACA) motif.
i: 3′ acceptor stem present.
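As an aid to reading Table 1, the copy numbers can be summarized per tribe. The sketch below is illustrative only: it uses a hand-picked subset of twelve rows copied from Table 1 (taxon, tribe, copy number), and the function name and data layout are hypothetical. Summaries over this subset are not the full-table summaries.

```python
from collections import defaultdict

# Subset of Table 1 rows: (taxon, tribe, copy number). Values copied from the table.
ROWS = [
    ("Acnistus arborescens", "Physaleae", 2),
    ("Physalis philadelphica", "Physaleae", 5),
    ("Tubocapsicum anomalum", "Physaleae", 7),
    ("Brugmansia aurea", "Datureae", 1),
    ("Datura stramonium", "Datureae", 1),
    ("Capsicum baccatum", "Capsiceae", 4),
    ("Capsicum minutiflorum", "Capsiceae", 6),
    ("Capsicum rhomboideum", "Capsiceae", 1),
    ("Solanum lycopersicum", "Solaneae", 1),
    ("Solanum torvum", "Solaneae", 4),
    ("Nectouxia formosa", "Salpichroina", 1),
    ("Salpichroa origanifolia", "Salpichroina", 2),
]


def copy_range_by_tribe(rows):
    """Return {tribe: (min copy number, max copy number)} for the given rows."""
    grouped = defaultdict(list)
    for _taxon, tribe, copies in rows:
        grouped[tribe].append(copies)
    return {tribe: (min(vals), max(vals)) for tribe, vals in grouped.items()}


ranges = copy_range_by_tribe(ROWS)
for tribe, (low, high) in sorted(ranges.items()):
    print(f"{tribe}: {low} to {high} copies")

# Overall range across the subset; the text reports one to seven copies.
all_copies = [copies for _, _, copies in ROWS]
print(min(all_copies), max(all_copies))
```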

The occurrence of pseudogenes provides strong evidence of relationships among some groups that had low support values in previous analyses (e.g. Olmstead et al. 2008). This event robustly separates the (1) Atropina (Hyoscyameae, Lycieae, Jaborosa, Latua, Nolana and Sclerophylax) and (2) Juanulloeae clades from the Pseudosolanoid clade, which is composed of (3) Solaneae, Capsiceae, Physaleae and Datureae and (4) Salpichroina (Salpichroa Miers and Nectouxia Kunth).
In clades (1) and (2) pseudogenes are absent, while they appear at the basal node of clades (3) and (4). The lineage in which pseudogene copies have been found includes 29 genera, among them Solanum L. and Capsicum L. with many economically important plant species. However, sequence information was lacking for the genera Mellissia Hook. f. and Athenaea Adans., so the presence of trnF pseudogenes could not be confirmed for them. This is not surprising, as available plant material of these taxa is very restricted. For example, Mellissia is a genus with a single species, Mellissia begoniifolia (Roxb.) Hook. f., which is critically endangered and endemic to the island of Saint Helena. The larger clade of Solanoideae also includes several branches with low support values composed of small genera (Exodeconus Raf., Mandragora L., Nicandra (L.) Gaertn., Schultesianthus Hunz., Solandra Sw.) in the phylogeny proposed by Olmstead et al. (2008). These lineages are from the early diversification of the Solanoideae with no close relatives, and all lack pseudogene repeats that could be informative to trace their ancestry. The latest large scale phylogenetic analysis of the Solanaceae (Olmstead et al. 2008) established the major clades of the family, but sampling in some of the lineages can still be improved. Goldberg et al. (2010) analyzed a larger data set, but they focused on the evolution of self-compatibility rather than on taxonomic relationships. Some studies have attempted to calibrate a molecular clock for various groups within Solanaceae, but all of these used either the same fossil records (Paape et al. 2008; Poczai and Hyvönen 2011b) or only a few (Dillon et al. 2009; Tu et al. 2010). The fossil record of the Solanaceae has not been reviewed recently. A re-assessment of the specimens is needed and could potentially provide more robust calibration points for the family (Särkinen, personal communication).
Current estimates place the age of the Pseudosolanoids at approximately 20 My (Särkinen, personal communication), so the pseudogene duplications of Solanaceae would have originated at approximately the same Miocene age as those in Brassicaceae (16–21 My; Koch et al. 2005).

Conclusions

Despite extensive studies based on sequence level characters, the taxonomy of the Solanaceae is not yet completely understood. However, multiple groups are working at different levels to resolve phylogenetic relationships (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008). A number of questions regarding the discovery of trnF pseudogenes remain to be answered, for example: How did the duplications originate? Are the pseudogene copy numbers a useful character for phylogenetic inference? To what extent does the number of pseudogene copies vary within a single species? The evolution and structure of the pseudogenic copies should be compared with those reported from other plant families, especially Brassicaceae. The potential of trnF pseudogenes as phylogenetic markers needs to be investigated further for a better understanding of the evolution of Solanaceae. Such investigations could also clarify the wider implications of the pseudogene repeats for Solanaceae studies that utilize the trnL-F spacer region.

Methods

Solanaceae sequence dataset

For the Solanaceae and several outgroups we used the trnL-F spacer data assembled by Olmstead et al. (2008). This dataset contained 195 taxa and 390 sequences generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008), and it was used to align and mask pseudogenic copies. The goal was to map the taxonomic distribution of pseudogenes at the family level, sampling as many genera as possible. This dataset and the representative trees used in our study were previously deposited in TreeBASE (ID S2191). The alignment was also used to demonstrate the copy number distribution corresponding to the published phylogenetic hypothesis, which was based not only on the trnL-F spacer information but also on sequence data from the ndhF region.

Recognition and copy number assessment of the trnF(GAA) pseudogenes

The complete chloroplast genome of Solanum bulbocastanum Dunal (DQ347958) was used to select the corresponding loci of the trnL-trnF spacer region (bp positions 48,854 to 49,382), to annotate ambiguous sequence regions, and to ensure that our interpretations are based on homologous positions. Putative pseudogene repeats were identified by screening against Repbase (Jurka 2000) with the “mask pseudogenes” and “report simple repeats” options of the online tool CENSOR (Kohany et al. 2006). This was done to identify repetitive elements by comparing our sequences to known eukaryotic repeats and prototypic sequences stored in Repbase, utilizing WU-BLAST. A second search was conducted with FastPCR (Kalendar et al. 2009) using the repeat search option of the program. Under “type of repeats” we checked for simple, direct, inverted, direct antisense, and direct reverse repeats, respectively. Default values were used under a kMers repeat screening. After each search, repetitive motifs and sequences were recorded and compared with the results obtained from the Repbase search. After repeats were identified in the trnL-F IGS sequences, further structural trnF(GAA) gene elements or residues were annotated manually using the anticodon domain as reference. The annotated sequence alignment is shown in Additional file 1.
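To make the idea of a k-mer based screen for direct and tandem repeats concrete, here is a minimal sketch. It is not the algorithm implemented in FastPCR or CENSOR; the function names, the repeat unit and the toy sequence are hypothetical, and only exact matches are detected, whereas the pseudogene copies described here differ from one another in length and composition.

```python
from collections import defaultdict


def direct_repeats(seq, k):
    """Map each k-mer that occurs at least twice to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def tandem_copies(seq, unit):
    """Return the longest run of back-to-back exact copies of `unit` in `seq`."""
    best = 0
    start = seq.find(unit)
    while start != -1:
        n = 0
        # Count consecutive copies beginning at this occurrence.
        while seq.startswith(unit, start + n * len(unit)):
            n += 1
        best = max(best, n)
        start = seq.find(unit, start + 1)
    return best


# Hypothetical toy sequence: three exact tandem copies of a made-up 6-bp unit.
toy = "GGC" + "ATTGCA" * 3 + "TTC"

repeats = direct_repeats(toy, 6)
print(repeats["ATTGCA"])             # start positions of the repeated unit
print(tandem_copies(toy, "ATTGCA"))  # number of consecutive copies
```

An exact k-mer screen like this only proposes candidate repeat units. Deciding where a degenerate pseudogene copy begins and ends still requires the kind of manual annotation against the anticodon domain described above.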

Sequence annotation and alignment

Masked pseudogenic copies were further edited using Geneious v.4.8.5 (Biomatters Ltd.). We used the Nicotiana tabacum L. complete chloroplast genome (NC001879; bp positions 49,840 to 50,318) for comparisons and to determine the subunits of pseudogenic repeats, as this species lacks these gene duplications. Sequence break points were examined manually to determine the cut-off points of pseudogenic copies and to identify bordering motifs. Identified copies were aligned with MUSCLE (Edgar 2004) as implemented in Geneious v.4.8.5 using default settings. The sequence alignment in FASTA format is available as Additional file 2.

Additional file 1: Annotated sequence alignment of pseudogene repeats found in Solanaceae. Major parts of the trnF gene are marked as D- and T-domains and anticodon in the middle together with bordering 5′ and 3′ acceptor stems. The trnF gene of Nicotiana tabacum is used as a reference sequence to align different pseudogenes. (PDF 4 MB)

Additional file 2: Sequence alignment of pseudogene copies. (FASTA 41 KB)
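The reference intervals quoted in the Methods (48,854 to 49,382 in Solanum bulbocastanum and 49,840 to 50,318 in Nicotiana tabacum) can be turned into sequence slices as sketched below. This assumes the positions are 1-based and inclusive, as is usual for GenBank coordinates; the text does not state this explicitly. The function names are hypothetical, and the short string used for the slicing demonstration is a stand-in, not genome data.

```python
def region_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def extract_region(genome, start, end):
    """Return the 1-based, inclusive interval [start, end] of `genome`."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("interval lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return genome[start - 1:end]


# Interval lengths implied by the coordinates given in the Methods
# (under the 1-based, inclusive assumption).
print(region_length(48854, 49382))  # S. bulbocastanum trnL-trnF spacer interval
print(region_length(49840, 50318))  # N. tabacum comparison interval

# Slicing demonstration on a stand-in string.
print(extract_region("ACGTACGT", 3, 5))
```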
  23 in total

1.  Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions.

Authors:  H Wittzell
Journal:  Mol Ecol       Date:  1999-12       Impact factor: 6.185

2.  Repbase update: a database and an electronic journal of repetitive elements.

Authors:  J Jurka
Journal:  Trends Genet       Date:  2000-09       Impact factor: 11.639

3.  Species selection maintains self-incompatibility.

Authors:  Emma E Goldberg; Joshua R Kohn; Russell Lande; Kelly A Robertson; Stephen A Smith; Boris Igić
Journal:  Science       Date:  2010-10-22       Impact factor: 47.728

4.  Identification and characterization of plastid trnF(GAA) pseudogenes in four species of Solanum (Solanaceae).

Authors:  Péter Poczai; Jaakko Hyvönen
Journal:  Biotechnol Lett       Date:  2011-07-16       Impact factor: 2.461

5.  Supernetwork identifies multiple events of plastid trnF(GAA) pseudogene evolution in the Brassicaceae.

Authors:  Marcus A Koch; Christoph Dobes; Christiane Kiefer; Roswitha Schmickl; Leos Klimes; Martin A Lysak
Journal:  Mol Biol Evol       Date:  2006-09-20       Impact factor: 16.240

6.  Evolution of the trnF(GAA) gene in Arabidopsis relatives and the brassicaceae family: monophyletic origin and subsequent diversification of a plastidic pseudogene.

Authors:  Marcus A Koch; Christoph Dobes; Michaela Matschinger; Walter Bleeker; Johannes Vogel; Markus Kiefer; Thomas Mitchell-Olds
Journal:  Mol Biol Evol       Date:  2005-02-02       Impact factor: 16.240

7.  Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genomes of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analyses.

Authors:  K Vijverberg; K Bachmann
Journal:  Mol Biol Evol       Date:  1999-10       Impact factor: 16.240

8.  Phylogeny of kangaroo apples (Solanum subg. Archaesolanum, Solanaceae).

Authors:  Péter Poczai; Jaakko Hyvönen; David E Symon
Journal:  Mol Biol Rep       Date:  2011-01-22       Impact factor: 2.316

9.  Molecular evolution of the trnTUGU-trnFGAA region in Bryophytes.

Authors:  D Quandt; M Stech
Journal:  Plant Biol (Stuttg)       Date:  2004-09       Impact factor: 3.081

10.  A transcription map of the chloroplast genome from rice (Oryza sativa).

Authors:  A Kanno; A Hirai
Journal:  Curr Genet       Date:  1993-02       Impact factor: 3.886

