ATP-binding cassette (ABC) proteins are the largest membrane transporter family in plants. In addition to transporting organic substances, these proteins function as ion channels and molecular switches. The development of multiple genes encoding ABC proteins has been associated with their various biological roles. Plants utilize many secondary metabolites to adapt to environmental stresses and to communicate with other organisms, with many ABC proteins thought to be involved in metabolite transport. Lithospermum erythrorhizon is regarded as a model plant for studying secondary metabolism, as cells in culture yielded high concentrations of meroterpenes and phenylpropanoids. Analysis of the genome and transcriptomes of L. erythrorhizon showed expression of genes encoding 118 ABC proteins, similar to other plant species. The number of expressed proteins in the half-size ABCA and full-size ABCB subfamilies was ca. 50% lower in L. erythrorhizon than in Arabidopsis, whereas there was no significant difference in the numbers of other expressed ABC proteins. Because many ABCG proteins are involved in the export of organic substances, members of this subfamily may play important roles in the transport of secondary metabolites that are secreted into apoplasts.
ATP-binding cassette (ABC) proteins are the largest membrane transporter family in plants. ABC proteins possess a highly conserved nucleotide binding domain (NBD) containing an ABC signature sequence. Most ABC proteins have a transmembrane domain (TMD), consisting generally of six alpha-helices adjacent to an NBD, a structural unit called a half-size ABC protein. Many ABC proteins possess a tandem repeat structure of this unit, with these proteins called full-size ABC proteins. In addition, some ABC subfamilies consist of soluble proteins containing only NBDs. Following the discovery of the first plant ABC protein in 1992, several genome projects were performed using model plants, such as Arabidopsis, rice, and legumes, with the genome of each plant generally containing more than 120 ABC protein-encoding genes. Functional analyses have shown that many plant ABC proteins are involved in the transport of plant hormones, such as auxins and abscisic acid; wax components; sugar phosphates; and secondary metabolites, including alkaloids, phenolic glucosides, and carotenoid compounds. Other plant ABC proteins are involved in heavy metal transport. To date, however, even in Arabidopsis only about 20% of ABC proteins have been functionally characterized.
Despite indications that ABC proteins are involved in the transport of secondary metabolites, few studies to date have identified specific ABC proteins responsible for the transport of secondary metabolites. This may be due to limitations in the identification of accumulation sites of species-specific secondary metabolites in tissues, cell layers, and organelles, and/or the low production of such endogenous metabolites in model plants like Arabidopsis.
Lithospermum erythrorhizon (Boraginaceae) is a perennial herbal plant that has been used as a crude drug and natural dye in Asian countries.
This plant is widely known in plant biotechnology, because its cell suspension cultures were first utilized in the 1980s for the industrial production of the secondary metabolite shikonin. This success led to the use of plant cell and tissue cultures to produce secondary metabolites. The quantities of shikonin produced by these cells are as high as 10% of cell dry weight, or 10 times higher than the amount that accumulates in intact roots (Fig. 1). Shikonin is a meroterpene biosynthesized via both the shikimate and mevalonate pathways. In addition to shikonin, L. erythrorhizon cell cultures can produce other secondary metabolites. Similar to other meroterpenes, high levels of benzoquinone derivatives, such as dihydroechinofuran and shikonofuran, are also produced, with some of these compounds secreted into the culture medium. L. erythrorhizon cells can also produce levels of caffeic acid oligomers, such as rosmarinic acid and lithospermic acid B, similar to those of shikonin. The production of these metabolites can be up- or down-regulated by altering the culture medium and by light irradiation, which induces the accumulation of an important intermediate in vacuoles. The high production of these secondary metabolites and knowledge of their subcellular localization can enable studies of their transport, making L. erythrorhizon an appropriate plant in which to study proteins involved in the transport of secondary metabolites. The biochemical and molecular biological characteristics of this plant have been studied in depth in recent years. The present study was performed to identify all ABC proteins expressed in L. erythrorhizon, thereby enabling further research on the involvement of these proteins in the transport of secondary metabolites, including the distribution of biosynthetic intermediates.
Figure 1
Biosynthesis and transport of secondary metabolites in L. erythrorhizon. Major biosynthetic pathways of shikonin and lithospermic acid B are illustrated. The subcellular localization of metabolites, both end products and intermediates, is also indicated. End products, such as shikonin derivatives, dihydroechinofuran, lithospermic acid B, and p-O-glucosylbenzoic acid, showed considerable accumulation. Transport events are shown as single-headed block arrows, subfamilies of ABC proteins are shown as letters A–D and G, and ATP hydrolysis is also shown.
2. Materials and methods
2.1. Genome and transcriptome assembly
The L. erythrorhizon draft genome was assembled using published Illumina paired-end reads sequenced at Nanjing University (PRJNA386534) and Oxford Nanopore Technologies long reads sequenced at Purdue University (PRJNA596998). Assembled genomes were evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO), which estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Five assemblers were tested: ABySS, DBG2OLC, miniasm, Wengan, and HASLR, with Wengan showing the highest BUSCO completeness (90.9%; Supplementary Table S1). RNA-seq reads from different Boraginaceae species were assembled using Trinity (Supplementary Table S2). Transcripts were quantified using Salmon and scripts provided by Trinity. The detailed datasets and commands used for genome assembly are described in Supplementary File S1. The parameters used in Trimmomatic were as follows: LEADING and TRAILING were both set at 30, SLIDINGWINDOW was set at 4:15, and MINLEN was set at 60. In Wengan, the predicted genome size was set at 400 Mb, and five rounds of polishing were performed, by which point the improvement had reached a plateau. For the other tools, default parameters were used, as shown in Supplementary File S1. Gene and isoform expression was calculated using Salmon.
2.2. Identification of ABC proteins in L. erythrorhizon
Genes encoding ABC proteins were identified in three L. erythrorhizon transcriptomes (LE_AM, LE_NJ_1, LE_PU_h). Open reading frames for polypeptides longer than 100 amino acids were extracted from the transcriptomes using TransDecoder and used as the database for BLASTp. Proteins of the ABC subfamilies in Arabidopsis and functionally analysed ABC proteins in angiosperms were used as queries for BLAST searches. Queries lacking TMD and NBD domains and those <200 amino acids in length were manually excluded. BLASTp hits with query coverage <80% were also excluded.
LeABC protein genes were counted by locating the contigs in the L. erythrorhizon genome assemblies (Wengan and NextDenovo) using BLASTn to double-check each locus in the genome. LeABC proteins were named according to Verrier et al. Contigs showing lower sequence similarity to plant ABC proteins were subjected to manual BLAST searches, which confirmed that they were bacterial or fungal ABC homologues; these contigs, which were obtained from intact roots growing in soil, were excluded from the phylogenetic trees (Figs 2–6). The filtered BLASTp hits, along with the query proteins, were aligned using MAFFT. Approximate maximum-likelihood phylogenetic trees were inferred from the alignments using FastTree, and the trees were visualized using iTOL.
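The query-length and query-coverage filters described in this section can be sketched in Python. This is a minimal illustration rather than the authors' pipeline (the actual commands are in Supplementary File S1). It assumes BLASTp results in a tab-separated layout of `qseqid sseqid pident length qstart qend qlen`, and it takes query coverage as the aligned span on the query divided by the query length; the column layout, function names, and example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_QUERY_LEN = 200         # queries shorter than this (aa) are excluded
MIN_QUERY_COVERAGE = 0.80   # hits covering less of the query are excluded


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    qlen: int


def parse_hit(line: str) -> Hit:
    """Parse one line of the assumed layout:
    qseqid sseqid pident length qstart qend qlen (tab-separated)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], int(f[4]), int(f[5]), int(f[6]))


def query_coverage(hit: Hit) -> float:
    """Fraction of the query sequence spanned by the alignment."""
    return (hit.qend - hit.qstart + 1) / hit.qlen


def keep(hit: Hit) -> bool:
    """Apply both thresholds from the text."""
    return hit.qlen >= MIN_QUERY_LEN and query_coverage(hit) >= MIN_QUERY_COVERAGE


def filter_hits(lines: Iterable[str]) -> List[Hit]:
    return [h for h in map(parse_hit, lines) if keep(h)]


# Hypothetical records: one good hit, one low-coverage hit, one short query.
example_lines = [
    "query_1\tcontig_1\t75.0\t1241\t10\t1250\t1286",
    "query_2\tcontig_2\t60.0\t300\t1\t300\t703",
    "query_3\tcontig_3\t90.0\t150\t1\t150\t150",
]
kept = filter_hits(example_lines)
print([h.qseqid for h in kept])
```

Only the first record passes both thresholds; the second fails on coverage and the third on query length.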
2.3. Expression of L. erythrorhizon ABC transporter isoforms
RNA-seq reads of three different L. erythrorhizon lines (LE_AM, LE_NJ, LE_PU_h) were assembled separately using Trinity (v2.12.0). Expression of isoforms, in transcripts per million, was calculated using Salmon (v1.5.0). The results are summarized in Supplementary Table S2, and the detailed datasets and commands are described in Supplementary File S1.
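For readers unfamiliar with the unit, the transcripts per million (TPM) normalization can be written out explicitly. The sketch below is not Salmon's implementation, since Salmon estimates read assignments and effective lengths with its own model and reports TPM directly. It shows only the final normalization step, assuming per-transcript read counts and effective lengths are already available; the function name and the numbers are illustrative.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], effective_lengths: Sequence[float]) -> List[float]:
    """Convert estimated read counts to transcripts per million.

    Each count is first divided by the transcript's effective length,
    and the resulting rates are rescaled so that they sum to 1e6.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


# Illustrative values: three transcripts with different lengths.
values = tpm([100, 300, 600], [1000.0, 1500.0, 3000.0])
print(values)
```

Because the second and third transcripts have the same count-to-length ratio, they receive the same TPM even though their raw counts differ twofold, which is the reason length-normalized units are used when comparing isoforms.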
3. Results
3.1. Genome-wide identification of ABC proteins in L. erythrorhizon
Large-scale transcriptome data have been obtained from suspension cell cultures, hairy root cultures, and intact roots of L. erythrorhizon. However, some genes encoding ABC proteins may not be expressed in these samples, and some transcripts may have derived from soil-borne bacteria and other contaminants in intact roots. To evaluate the ABC proteins in L. erythrorhizon more fully, its draft genome was assembled from publicly available Illumina paired-end reads deposited by Nanjing University and Oxford Nanopore Technologies long-read data deposited by Purdue University. To evaluate the completeness of the assembled genome, five assemblers were compared: ABySS, DBG2OLC, miniasm, Wengan, and HASLR. Wengan showed the highest rate of completeness (90.9%) and the lowest rate of missing sequences (7.0%) (Supplementary Table S1). This draft genome was subsequently compared with two previously reported draft genome sequences of L. erythrorhizon (Supplementary Table S3). One draft genome, first described in 2020, was later reported to be of incorrect size, with a relatively low rate of genome completeness (79.3% complete BUSCOs). The second draft genome, assembled using NextDenovo in 2021, showed high genome completeness (88.7% complete BUSCOs relative to embryophyta_odb9 and 93.0% complete BUSCOs relative to eudicots_odb10) and high redundancy (14.1% duplicate BUSCOs relative to embryophyta_odb9 and 24.2% duplicate BUSCOs relative to eudicots_odb10), with the latter thought to be due to heterozygosity. The genomes assembled by NextDenovo and by Wengan in the present study were regarded as sufficiently complete and were used to identify ABC transporters (Supplementary Tables S3 and S4).
Examination of the L. erythrorhizon genome sequence coupled with transcriptome data showed 118 genes putatively encoding ABC proteins, a number similar to that in Arabidopsis (Table 1).
ABC proteins are present in all organisms and are classified into nine groups, A to I, based on structural features and sequence similarities, although plants do not possess group H ABC proteins. Counting the members of the eight other groups showed that 7, 21, 14, 3, 3, 6, 43, and 21 proteins belonging to groups A, B, C, D, E, F, G, and I, respectively, were present in L. erythrorhizon (Supplementary Table S4). In comparison, 12, 29, 15, 2, 3, 5, 43, and 21 proteins belonging to groups A, B, C, D, E, F, G, and I, respectively, have been shown to be present in Arabidopsis. LeABC genes were numbered in order from the top of the draft phylogenetic tree, following the rules of Verrier et al. Only two ABC genes of L. erythrorhizon, LeMDR and LeMRP, had been reported previously; these were named LeABCB1 and LeABCC1, respectively.
Table 1
Numbers of ABC proteins (LeABCs) in Lithospermum erythrorhizon
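| Subfamily | Type | Lithospermum erythrorhizon | Arabidopsis thaliana | Change relative to Arabidopsis |
| --- | --- | --- | --- | --- |
| A | Full | 1 | 1 | uc |
| A | Half | 6 | 11 | ↓ |
| B | Full | 13 | 22 | ↓ |
| B | Half | 8 | 7 | ↑ |
| C | Full | 14 | 15 | uc |
| D | Full | 2 | 1 | ↑ |
| D | Half | 1 | 1 | uc |
| E | Soluble | 3 | 3 | uc |
| F | Soluble | 6 | 5 | ↑ |
| G | Full | 12 | 15 | uc |
| G | Half | 31 | 28 | ↑ |
| I | Soluble | 21 | 21 | |
| Total | | 118 | 130 | |
uc: almost unchanged; ↑: increased; ↓: decreased.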
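The counts in Table 1 can be cross-checked against the totals and per-subfamily numbers quoted in the text. The short Python sketch below transcribes the table and recomputes the sums; the data structure and function names are illustrative and are not part of the original analysis.

```python
# Counts transcribed from Table 1:
# (subfamily, type) -> (L. erythrorhizon, A. thaliana)
TABLE1 = {
    ("A", "full"): (1, 1),
    ("A", "half"): (6, 11),
    ("B", "full"): (13, 22),
    ("B", "half"): (8, 7),
    ("C", "full"): (14, 15),
    ("D", "full"): (2, 1),
    ("D", "half"): (1, 1),
    ("E", "soluble"): (3, 3),
    ("F", "soluble"): (6, 5),
    ("G", "full"): (12, 15),
    ("G", "half"): (31, 28),
    ("I", "soluble"): (21, 21),
}


def totals(table):
    """Sum all columns for each species."""
    le = sum(v[0] for v in table.values())
    at = sum(v[1] for v in table.values())
    return le, at


def subfamily_totals(table):
    """Merge full-size, half-size, and soluble counts per subfamily."""
    out = {}
    for (subfamily, _kind), (le, at) in table.items():
        prev_le, prev_at = out.get(subfamily, (0, 0))
        out[subfamily] = (prev_le + le, prev_at + at)
    return out


def relative_change(le, at):
    """Fractional difference of L. erythrorhizon relative to Arabidopsis."""
    return (le - at) / at


print(totals(TABLE1))
print(subfamily_totals(TABLE1))
print(round(relative_change(*TABLE1[("A", "half")]), 3))
print(round(relative_change(*TABLE1[("B", "full")]), 3))
```

The two relative changes printed at the end correspond to the half-size ABCA and full-size ABCB columns, the two categories in which L. erythrorhizon has markedly fewer members than Arabidopsis.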
Although the numbers of half-size ABCA proteins and full-size ABCB proteins were apparently lower in L. erythrorhizon than in Arabidopsis, the numbers in the other groups did not differ markedly. Further analysis based on sequence identity showed that the homologues of many genes were specifically amplified or reduced in L. erythrorhizon compared with Arabidopsis.
The ABC protein sequences were aligned with MAFFT, and their phylogenetic relationships were determined using FastTree (Fig. 2). In the phylogenetic tree, nodes with bootstrap values <90% were collapsed to avoid confusion among subfamilies. Detailed subfamily-dependent comparisons identified characteristics specific to LeABC proteins. It is also important to determine which of these genes are actually expressed in this plant species. The transcriptomes assembled in the present study showed a higher degree of BUSCO completeness than the assembled genomes (Supplementary Table S5), indicating that a measurable number of L. erythrorhizon genes were not present in those draft genome sequences. Because additional sequencing data and better methods of assembly are needed to build a seamless reference genome for L. erythrorhizon, the sequence information from transcriptome data was used for further detailed characterization, although the assembled genome sequence was used as much as possible as a reference.
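Collapsing poorly supported nodes, as done for Fig. 2, turns each internal node whose support falls below the threshold into a polytomy by attaching its children directly to its parent. The sketch below shows the idea on a minimal in-memory tree. It is not how iTOL implements the operation, and the `Node` class, the function names, and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None          # None for leaves and the root
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_low_support(node: Node, threshold: float = 90.0) -> Node:
    """Return a copy of the tree in which internal nodes with support
    below the threshold are removed and their children promoted."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        low = (
            not collapsed.is_leaf()
            and collapsed.support is not None
            and collapsed.support < threshold
        )
        if low:
            new_children.extend(collapsed.children)   # promote grandchildren
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def leaf_names(node: Node) -> List[str]:
    if node.is_leaf():
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


# Toy tree: clade X (support 95) is kept, clade Y (support 60) is collapsed,
# and its well-supported subclade Z (support 99) survives.
toy = Node("root", None, [
    Node("X", 95.0, [Node("a"), Node("b")]),
    Node("Y", 60.0, [Node("c"), Node("Z", 99.0, [Node("d"), Node("e")])]),
])
collapsed = collapse_low_support(toy)
print([child.name for child in collapsed.children])
print(leaf_names(collapsed))
```

The set and order of leaves is unchanged by the operation; only the unsupported grouping is removed, which is why collapsing is a conservative way to present subfamily boundaries.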
Figure 2
Phylogenetic tree of plant ABC proteins. The phylogenetic tree was constructed from the 118 L. erythrorhizon ABC proteins (LeABCs) and other plant ABC proteins listed in Supplementary Table S4. All ABC proteins are classified into A-type to G-type; I-type proteins are not included because their sequences are too divergent. To build the tree, full-length amino acid sequences were aligned with MAFFT v7.487 using the FFT-NS-2 strategy. Phylogenies were estimated with FastTree 2.1.10 using the BLOSUM45 model. Branch lengths are ignored. Clades with bootstrap values below 90% were collapsed using iTOL. Bootstrap values are indicated for each subfamily.
3.2. Detailed analysis of each ABC protein group
Members of the ABC protein groups B, C, and G are often involved in transport of various organic substances, including secondary metabolites. The present study therefore focused primarily on these three ABC protein groups in L. erythrorhizon (LeABCs), including their possible involvement in the transport of secondary metabolites.
3.3. ABCB subfamily
The ABCB subfamily includes both half- and full-size ABC proteins. Twenty-one LeABCB proteins were detected in L. erythrorhizon, including 8 half-size and 13 full-size proteins, with most of them being homologues of Arabidopsis proteins (AtABCBs) (Fig. 3 and Table 1). Half-size ABCB members have been regarded as homologues of the transporter associated with antigen processing and the heavy metal tolerance factor, whereas full-size ABCB proteins have been regarded as homologues of multidrug resistance protein (MDR) and P-glycoprotein (PGP). To date, 24 ABCB proteins have been characterized in various plant species, with most being full-size ABCB proteins. The most intensively studied members are Arabidopsis AtABCB1, 4, 19, and 21, which are involved in auxin transport. One LeABCB homologue of AtABCB1 and two of AtABCB19 were detected, whereas a single protein, LeABCB1, was found to be the homologue of six AtABCB proteins, AtABCB3, 4, 5, 11, 12, and 21 (Fig. 3 and Supplementary Table S4), suggesting that genes of this clade were amplified and duplicated in Arabidopsis but not in L. erythrorhizon. Because ABC proteins of other plant species in the LeABCB1 clade, such as CjMDR1, transport alkaloids, ABC proteins in this clade may show relaxed substrate specificity. The L. erythrorhizon-specific clade consisting of LeABCB12, 13, and 14 seems to have been amplified in L. erythrorhizon, whereas LeABCB15 has five homologues (AtABCB15–18 and 22) in Arabidopsis.
Figure 3
Phylogenetic relationship of members of the ABCB subfamily. B-type ABC proteins of L. erythrorhizon (LeABCBs) are shown in dark-colored letters, and other reported ABCB family members are indicated in light-colored letters. Proteins were aligned using the MEGA7 neighbour-joining method with 1,000 bootstrap replicates. Clades with bootstrap values below 90% are collapsed.
In silico expression of all LeABC genes was assessed using the data obtained by three different research groups, at Nanjing University, Purdue University, and Kyoto University (Supplementary Table S2). Among B subfamily members, LeABCB9 was found to be expressed in the Kyoto University transcriptome (LE_AM) and showed some positive correlation with meroterpene production. Data from Purdue University showed that LeABCB1 was highly expressed but did not show a strong correlation with meroterpene production (Supplementary Table S2).
3.4. ABCC subfamily
The ABCC subfamily in L. erythrorhizon included 14 genes (Fig. 4). All C-type ABC proteins, which were previously called multidrug resistance-associated proteins (MRPs), are full-size (Table 1), with some localized to tonoplasts and others to the plasma membrane. Structurally, some members of this subfamily have a characteristic N-terminal extension of the transmembrane domain called TMD-0. Members of this subfamily have a preference for organic anions and transport their conjugates as substrates. The transport ability of this subfamily has been characterized in detail. Substrates are exported from the cytosol to the apoplast or the vacuolar lumen, depending on the membrane localization of the transporter. Transport substrates recognized by members of this subfamily include glutathione and its conjugates, phytate, phytochelatin complexes, and folate. In maize, phenolic glucosides such as anthocyanin are also transport substrates of C-type ABC transporters.
Figure 4
Phylogenetic relationship of members of the ABCC subfamily. C-type ABC proteins of L. erythrorhizon (LeABCCs) are shown in dark-colored letters, and other reported ABCC family members are indicated in light-colored letters. Proteins were aligned using the MEGA7 neighbour-joining method with 1,000 bootstrap replicates. Clades with bootstrap values below 90% are collapsed.
The total number of proteins in the LeABCC subfamily is similar to that in Arabidopsis, although some of their characteristics differ (Fig. 4 and Table 1). LeABCC2 is a homologue of AtABCC1, forming a group with AtABCC2, 11, and 12, although Lithospermum was also found to contain a specific gene, LeABCC1, which is similar to LeABCC2. LeABCC9 also showed characteristic gene duplication, with this group encoding the proteins LeABCC9, C10, and C11. In contrast, its counterpart in Arabidopsis consists of a single gene, AtABCC10. LeABCC5 also showed a similar Lithospermum-specific gene amplification, giving rise to the LeABCC6 and C7 genes. LeABCC5 has two counterparts in Arabidopsis, AtABCC4 and C14, whereas LeABCC6 and C7 showed divergence from these Arabidopsis homologues. In contrast, the only counterpart of AtABCC9 in L. erythrorhizon is LeABCC13, whereas Arabidopsis has two closely homologous genes, AtABCC9 and AtABCC15.
Transcriptome data from the three research groups showed that LeABCC2 was ubiquitously expressed, with LeABCC7 showing a lower level of expression (Supplementary Table S2). However, the in silico expression data showed no clear relationship with the production of secondary metabolites in L. erythrorhizon.
3.5. ABCG subfamily
The ABCG proteins constitute the largest subfamily among plant ABC proteins. Structurally, members of this subfamily show a reverse orientation of TMD and NBD compared with the ABCB and ABCC subfamilies, i.e. the NBD is positioned on the N-terminal side of the TMD. Members of this subfamily include both half- and full-size ABC proteins, which were formerly designated white-brown complex (WBC) and pleiotropic drug resistance (PDR) proteins, respectively. All ABCG proteins characterized to date are localized at the plasma membrane, except for AtABCG28, which is localized to secretory vesicles at the growing pollen tube tip. Thirty-one half-size ABCG proteins were found in L. erythrorhizon, slightly more than the 28 in Arabidopsis (Fig. 5 and Table 1). The AtABCG1 and AtABCG12 groups in Arabidopsis consist of eight and three members, respectively, indicating Arabidopsis-specific gene amplification, whereas the LeABCG1 and LeABCG24 groups in L. erythrorhizon consist of two members and one member, respectively. Conversely, AtABCG21 and G24 do not have closely similar paralogues, whereas L. erythrorhizon has two copies of each; similarly, AtABCG28 corresponds to three LeABCG copies (G4–G6), suggesting their functional importance in this plant species. Half-size ABCGs have been reported to be involved in suberization in rice roots and in the transport of plant hormones like abscisic acid. The involvement of half-size ABCGs in various other biological functions has been studied mostly by molecular genetics, e.g. cuticle formation, suberin formation, pollen physiology, stigma exsertion, and cytokinin translocation, although relatively little is known about the biochemical transport function of half-size ABCGs, especially when compared with the number of genes.
Figure 5
Phylogenetic relationship of members of the ABCG subfamily. G-type ABC proteins of L. erythrorhizon (LeABCGs) are shown in dark-colored letters, and other reported ABCG family members are indicated in light-colored letters. Proteins were aligned using the MEGA7 neighbour-joining method with 1,000 bootstrap replicates. Clades with bootstrap values below 90% are collapsed.
In contrast to half-size ABCGs, the number of full-size ABCG members in L. erythrorhizon is 12, fewer than in Arabidopsis. Some of these genes are highly duplicated, whereas others are less diverse in L. erythrorhizon. For example, only one gene, LeABCG34, corresponds to AtABCG29, 35, and 36, and only two genes, LeABCG32 and G33, correspond to the six members of the AtABCG30 group (G30, 33, 37, 41, 42, and 43). In contrast, four genes, LeABCG40–43, correspond to the single Arabidopsis gene AtABCG40. This G40 clade is noteworthy because more than 15 homologues have been reported in many plant species, including non-model plants, such as petunia, potato, tobacco, soybean, cucumber, Artemisia, and Medicago. Homologues of this gene undergo strong expressional regulation as part of stress responses, with knockout of this gene altering the phenotype in those species. These findings also suggest that each plant species has employed this particular gene for the functional evolution necessary for divergent transport events.
High expression of members of the LeABCG40–43 group was observed in cell cultures and hairy roots of L. erythrorhizon, with one member, G41, being dominantly expressed in the transcriptomes of all three research groups (Supplementary Table S2). The expression level appeared to show a positive correlation with meroterpene production. In addition, expression of LeABCG1 and of LeABCG32 and G33 showed a weak correlation with meroterpene production by this plant.
3.6. Other ABC subfamilies
The other ABC protein subfamilies of L. erythrorhizon have not been shown to be associated with the transport of secondary metabolites. Known general functions of each subfamily and the outstanding features of these subfamilies in L. erythrorhizon are summarized below (Fig. 6 and Table 1).
Figure 6
Phylogenetic relationship of members of other ABC subfamilies. ABC proteins of L. erythrorhizon (LeABCs) are shown in black letters, whereas reported ABC proteins are shown in lighter-colored letters. Proteins were aligned using the MEGA7 neighbour-joining method with 1,000 bootstrap replicates. Clades with bootstrap values below 90% are collapsed.
The ABCA subfamily has been actively studied in mammalian systems. Many A-type ABC proteins are involved in the transport of lipids, such as cholesterol and phospholipids, and have therefore been identified in patients with genetic diseases characterized by lipid abnormalities. Little is known about the localization of A subfamily members in plant species, although, in mammalian systems, many of these proteins localize to plasma membranes. Although cultured L. erythrorhizon cells secrete large amounts of lipophilic shikonin derivatives, the expression of LeABCA proteins is not strongly induced upon induction of shikonin production in M9 medium in the dark. Interestingly, the seven LeABCA genes, encoding one full-size and six half-size proteins, do not seem to be close homologues of AtABCA members. Instead, Arabidopsis ABCA proteins group by themselves, whereas LeABCA genes appear to have evolved specifically in this plant species (Fig. 6 and Table 1).
The ABCE subfamily is a group of soluble proteins that were designated RNase L inhibitors. The proteins of this subfamily consist of two NBDs. The Arabidopsis genome contains three genes, AtABCE1-3, whereas the three proteins in L. erythrorhizon (LeABCE1-3) were all similar to AtABCE2 (Fig. 6 and Table 1). The biological function of this subfamily is still largely unknown.
Members of the ABCD subfamily were originally called peroxisomal membrane proteins. Arabidopsis contains one full-size (AtABCD1) and one half-size (AtABCD2) D-type ABC protein. AtABCD1 has been reported to localize to the peroxisome and to be responsible for fatty acid transport from the cytosol to the peroxisome, leading to energy generation via β-oxidation of fatty acids. L. erythrorhizon contains two AtABCD1 homologues, LeABCD2 and D3, and one half-size AtABCD2 homologue, LeABCD1 (Fig. 6 and Table 1). Although LeABCD2 and D3 showed gene duplication, the D-type ABC proteins in plants have a common function.
The similarity between the shikonin and ubiquinone biosynthetic pathways suggests that peroxisomes may be involved in the synthesis of p-hydroxybenzoic acid (PHB) from p-coumaric acid, a pathway in which LeABCDs may play an important role (Fig. 1).
The ABCF subfamily is another group of soluble ABC proteins composed of two NBDs. This subfamily was formerly called general control non-repressible. It consists of five members, AtABCF1 to F5, in Arabidopsis, whereas L. erythrorhizon has six members, LeABCF1 to F6 (Fig. 6 and Table 1). LeABCF1 is a homologue of both AtABCF2 and F5, whereas both LeABCF5 and F6 are homologues of AtABCF4. The functions of this protein subfamily have been reported in divergent biological events, such as protein degradation leading to stomatal aperture regulation, translational regulation and stress adaptation, DNA damage repair, response to fungal attack, and self-incompatibility.
Members of the ABCI subfamily, which consist of either NBDs or TMDs, are often called bacterial-type ABC proteins. These proteins are thought to associate with transmembrane proteins to form ABC transporter complexes and thereby become biochemically functional. Because the amino acid sequences of these proteins differ markedly and because comparison of the numbers of homologues in Arabidopsis and Lithospermum did not yield a clear interpretation, these proteins were not analysed further (Fig. 6 and Table 1).
4. Discussion
The ABC protein family is conserved in all organisms, including humans, other mammals, yeast, and bacteria. Moreover, this family is the largest family of membrane transporters in plants. The intensive gene duplication suggests that these proteins play many important roles in plant life, emphasizing the need to characterize each protein and its function. To date, however, few ABC proteins have been characterized by large-scale phenotype screening of T-DNA tagging lines due to their redundancy. Because the numbers of specialized metabolites are much larger in plants than in mammals and yeast, some ABC proteins are likely involved in the transport of secondary metabolites.
Because L. erythrorhizon has high secondary metabolic activity, a complete inventory of ABC proteins was compiled for this species. Axenic cultured cells of L. erythrorhizon are capable of producing large amounts of meroterpene compounds, as well as large amounts of the caffeic acid tetramer and dimer, lithospermic acid B and rosmarinic acid, respectively, indicating that the phenylpropanoid pathway is also very active in this plant. Moreover, the production of these compounds is highly regulated by growth media and culture conditions, such as illumination. For example, light inhibition of shikonin production resulted in the accumulation of the intermediate p-hydroxybenzoic acid as its glucoside, but did not affect lithospermic acid B production. It is highly important to identify the subcellular sites for the biosynthesis and accumulation of these compounds in cell culture systems.
The draft genome of L. erythrorhizon has been reported to date by two research groups, at Nanjing University and Purdue University, resulting in 93% and 79% coverage of BUSCO core genes, respectively. The present study achieved a draft genome of 91% completeness using the Wengan assembler, and transcriptomes with higher completeness (95–96%). These findings indicate that the genomic sequence of L. erythrorhizon is not fully complete and remains insufficient for building a complete list of genes encoding ABC proteins in this species. Thus, ABC genes were surveyed initially in the genome assembly, with supplementation by transcriptome data. This process led to the identification of several ABC protein-encoding genes in the transcriptome that were not identified in the genome sequence.
Members of the B, C, and G subfamilies of ABC proteins are the main candidates for transporters of organic substances, including secondary metabolites. ABCB subfamily members, especially full-size B-type ABC proteins (MDR/PGP), localize to the plasma membrane, where they recognize various secondary metabolites, such as sesquiterpenes and alkaloids, as well as plant hormones such as auxins. B-type ABC proteins tend to transport compounds of relatively low hydrophilicity. L. erythrorhizon produces benzoquinone compounds, such as dihydroechinofuran and echinofuran B (collectively called echinofurans), which are biosynthesized from geranyl diphosphate and p-hydroxybenzoic acid. These precursors are synthesized through the mevalonate and shikimate pathways, respectively, indicating that these meroterpenes share the same biosynthetic pathway as shikonin. The echinofurans are actually branched products from the common biosynthetic intermediate, geranylhydroquinone, with the benzoquinone compounds arising through formation of furan/dihydrofuran rings rather than naphthalene rings. Inhibition of naphthalene ring formation, as in Linsmaier–Skoog medium, results in the secretion into the medium of large amounts of dihydroechinofuran in a partially water-soluble form. Isolation and structural characterization of a series of these echinofuran derivatives showed that one of the most oxidized products is echinofuran B, an orange oily substance.
These benzoquinone derivatives are moderately hydrophobic, are soluble in medium, and do not accumulate in oleophilic granular structures as shikonin derivatives do (Fig. 1). Because these benzoquinone compounds can be recovered only from the medium, membrane transporters are likely responsible for their secretion, with LeABCBs being good candidates.

Shikonin derivatives, however, have been shown to accumulate in apoplastic spaces as red granules. Electron microscopy showed that these granules are surrounded by membrane structures, making it unlikely that such large particles are transported by membrane transporter proteins. An ABCB-type transporter has been reported to mediate shikonin excretion, but this may occur only if shikonin derivatives are not tightly compartmentalized in lipid droplets. In contrast, most shikonin is likely secreted via membrane dynamics. Moreover, shikonin derivatives have been found to suppress the efflux activity of ABCB in mammalian systems.

All members of the ABCC (MRP) subfamily are full-size proteins, with some localized to the plasma membrane and others to tonoplasts. The biochemical functions of these proteins include the transport of organic substances, the transport of heavy metal ions, the formation of ion channels, and molecular switching. In plants, these proteins have been reported to transport metabolites such as anthocyanin, phytate, glutathione conjugates, and phytochelatin. Compared with ABCB proteins, ABCC proteins prefer hydrophilic substrates and anionic compounds. An ABCC protein in L. erythrorhizon has been grouped in a clade of vacuolar-type ABCC proteins. Because most shikonin can be recovered from the cell surface and medium, it is unlikely that vacuolar ABCC proteins are directly involved in shikonin secretion into the medium.
Rather, an ABCC protein may bind to the glucosylated organic anion p-hydroxybenzoic acid, a key precursor of shikonin and echinofuran derivatives, which accumulates in vacuoles upon temporary cessation of shikonin production. Lithospermic acid B, which is strongly induced in shikonin production medium (M9), and rosmarinic acid have been shown to accumulate in vacuoles of Coleus cells. These caffeic acid oligomers appear to be favourable transport substrates of ABCC proteins, as they are weakly acidic compounds similar to p-O-glucosylbenzoic acid.

ABCG (PDR/WBC) is the most divergent plant ABC subfamily. Transport substrates include terpenes such as sclareol and capsidiol, flavonoids such as liquiritigenin, plant hormones such as abscisic acid, and heavy metals. The ABCG proteins are the most highly expressed ABC proteins in L. erythrorhizon, especially in root tissues, in which shikonin derivatives specifically accumulate. Genes of this clade are strong candidates for further detailed analysis as transporters of secondary metabolites. In plants, most half-size G-type ABC proteins are involved in the loading of extracellular lipid polymers, such as cutin, wax, suberin, and sporopollenin. Because these end products are solid polymers, these ABCG proteins are thought to function biochemically in the transport of their monomers.

Many studies on ABCG transporters in mammals have shown that members of this subfamily are involved in the transport of lipids, such as cholesterol and phospholipids. Because meroterpene compounds are typical lipophilic metabolites, ABCG proteins may be involved in their intracellular transport and/or excretion. Analysis of the genomic LeABCG genes showed that the regions encoding the NBD of several ABCG members contain five introns, making genome-based analysis difficult. These genes were highly expressed in L. erythrorhizon, in particular in M9 medium in the dark.
These proteins are likely associated with meroterpene production by this plant species.

Identification of molecules translocated by ABC proteins is important but difficult. Because plants have many paralogues of ABC proteins, knock-out of a single protein often does not result in a different phenotype or metabolite profile, probably due to the functional redundancy of these proteins. Although heterologous expression systems may reveal the substrates transported by each protein, some of these transporter genes may not yield sufficient protein for analysis. For example, plant ABCG proteins are not always highly expressed in a heterologous host such as budding yeast, which is usually the first choice of host organism for expressing a membrane protein. Although these proteins may be adequately expressed in plant hosts, the presence of endogenous transporters in the host organism may interfere with data interpretation. Plant ABC proteins can also be expressed in insect cell systems, although these foreign membrane proteins often remain in endomembranes and do not localize to plasma membranes, thus hampering transport assays. Development of a general and reliable heterologous expression system will contribute to this field of research.

The number of studies of plant ABC proteins, especially in non-model plants, has increased in recent years. In addition, many recent studies have attempted to characterize L. erythrorhizon at the molecular level. The systematic inventory of ABC proteins in L. erythrorhizon will provide timely and valuable information for transport studies on plant secondary metabolites, with this species serving as a model experimental system.