
Phylogenomic study of lipid genes involved in microalgal biofuel production-candidate gene mining and metabolic pathway analyses.

Namrata Misra, Prasanna Kumar Panda, Bikram Kumar Parida, Barada Kanta Mishra.

Abstract

Optimizing microalgal biofuel production using metabolic engineering tools requires an in-depth understanding of the structure-function relationship of genes involved in the lipid biosynthetic pathway. In the present study, genome-wide identification and characterization of 398 putative genes involved in lipid biosynthesis in Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae was undertaken on the basis of their conserved motif/domain organization and phylogenetic profile. The results indicated that the core lipid metabolic pathways in all the species are carried out by a comparable number of orthologous proteins. Although the fundamental gene organization was observed to be invariantly conserved between the microalgal and Arabidopsis genomes, increasing genome complexity appears to be associated with a larger number of genes involved in triacylglycerol (TAG) biosynthesis and catabolism. Further, phylogenomic analysis of the genes provided insights into the molecular evolution of the lipid biosynthetic pathway in microalgae and confirmed the close evolutionary proximity between the Streptophyte and Chlorophyte lineages. Together, these findings will improve our understanding of the global lipid metabolic pathway and contribute to the engineering of regulatory networks of algal strains for higher accumulation of oil.

Keywords:  biofuel; bioinformatics; lipid biosynthetic genes; microalgae; phylogenomics

Year:  2012        PMID: 23032611      PMCID: PMC3460774          DOI: 10.4137/EBO.S10159

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Growing levels of atmospheric pollution, mounting energy demand, and the incessant rise in crude oil prices are some of the issues that have in recent times driven global efforts in biofuel research. Currently, commercial-scale biofuels are sourced primarily from a variety of bioenergy crops that include sugarcane (Saccharum officinarum), sugar beet (Beta vulgaris), switchgrass (Panicum virgatum), soybean (Glycine max), canola (Brassica napus) and sunflower (Helianthus annuus).1 Although the environmental benefits of biofuels as compared to fossil fuels are well established, concerns are being raised about their long-term sustainability, especially against the backdrop of diversion of arable land for biofuel-based cropping systems and the corresponding adverse impact on the global food supply chain.2 In consequence, algae-based biofuels are increasingly gaining the attention of researchers due to their rapid growth rate coupled with high carbon dioxide uptake, high lipid content and comparatively low usage of marginal land.3 Notwithstanding the many advantages of biofuels and their technical feasibility, the commercial viability of the algal biofuel process is still an area of concern requiring better strain development and improved post-harvest process engineering.4 The major challenge is to achieve accumulation of improved lipid profiles with concomitant reduction in energy inputs in order to minimize the cost of production.2 The enhancement of lipid production in microalgal cells under controlled stress conditions and the engineering of metabolic pathways are promising strategies to obtain large amounts of standard biofuel for industry.
Despite positive experimental reports on enhanced microalgal lipid accumulation under physiological or nutritional stress regimes, many contrasting studies have indicated a concomitant reduction in overall biomass yield under such conditions.5 In this context, harnessing the potential of genome-scale metabolic engineering has been suggested as a promising area of research to boost oil production in microalgal strains, including modification of algal lipid profile for improved biofuel properties.6,7 Over the past few years various studies have been carried out concerning alteration of fatty acid composition in plants through genetic engineering approaches, along with the development and deployment of a number of plant lipid-related genomics databases.8–11 Comparative genomics analyses using bioinformatics tools have also been performed recently to identify genes involved in lipid biosynthesis in various oleaginous plants. For example, a total of 1003 maize lipid-related genes were cloned and annotated by Lin et al,12 while Sharma and Chauhan13 identified a total of 261 lipid genes from the genome of Arabidopsis, Brassica, soybean and castor. Complete or near complete genome sequences have been reported for several algae.6 Yet, lack of adequate knowledge regarding the structure-function of lipid biogenesis genes in an evolutionary context is a major impediment in engineering metabolic pathways of algae for over-production of fuel precursors.14 Various experimental techniques like insertional mutagenesis and targeted gene disruption have been employed to analyze gene function in a few algae. 
However, many of these approaches are tedious, time-consuming, fiscally prohibitive and limited by a number of biological constraints.15 As an alternative, phylogenomics is now increasingly used to gain insights into metabolic pathways at the molecular level by comparative genomics and co-evolutionary analyses of related genes.16 Therefore, the present work was designed to identify the genes involved in the lipid metabolic pathway from the genomes of microalgae (including Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae) using sequence similarity searches with Arabidopsis thaliana homologs. In addition, phylogenomics protocols have been employed to study the structure-function relationship of the encoded proteins and to gain much needed insights into their phylogenetic evolution. We hope that the present study contributes to the biochemical and molecular information needed for augmentation of lipid synthesis in microalgae.

Materials and Methods

Gene retrieval and annotation

An initial set of lipid genes was obtained from the Arabidopsis thaliana lipid gene database (http://www.plantbiology.msu.edu/lipids/genesurvey/index.html) to construct a query protein set. The Arabidopsis lipid gene database is a convenient and reliable source of genes covering all the major biochemical events responsible for biosynthesis and catabolism of plant lipids.17 Subsequently, each protein in the query dataset was used to identify homologs in microalgae by subjecting it to a BLASTp18 search, with the e-value inclusion threshold set to 0.001, against microalgal genome databases provided by the Joint Genome Institute. These include Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/), Chlamydomonas reinhardtii (http://genome.jgi-psf.org/chlamy/chlamy.info.html), Volvox carteri (http://www.phytozome.net/volvox.php), Ostreococcus lucimarinus (http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html) and Ostreococcus tauri (http://genome.jgi-psf.org/Ostta4/Ostta4.home.html). Based on multiple alignments and/or the presence of conserved motif patterns, some initial sequence “hits” were then discarded. Functional description of genes or gene products was performed by annotation of Clusters of Orthologous Groups (COGs) using the KOGnitor program,19 a widely used tool in the field of computational genomics for detecting candidate sets of orthologs in prokaryotes and eukaryotes.19 In addition, Gene Ontology (GO) terms describing biological processes and molecular function were assigned using the GO browser and annotation tool AmiGO.20 The Gene Ontology is currently the pre-eminent approach for functional annotation of homologous genes and protein sequences in multiple organisms.20
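The e-value filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLASTp results were saved in the standard 12-column tabular format (where the e-value is the 11th column), and the helper `make_row` and the identifiers `query_1`, `subject_A` and `subject_B` are hypothetical placeholders with made-up e-values.

```python
from typing import Iterable, List, Tuple

EVALUE_CUTOFF = 1e-3  # inclusion threshold stated in the text


def filter_hits(lines: Iterable[str],
                cutoff: float = EVALUE_CUTOFF) -> List[Tuple[str, str, float]]:
    """Return (query, subject, e-value) for hits at or below the cutoff."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # 11th column of BLAST tabular output
        if evalue <= cutoff:
            kept.append((query, subject, evalue))
    return kept


def make_row(query: str, subject: str, evalue: float) -> str:
    """Build a placeholder 12-column row; only columns 1, 2 and 11 matter."""
    fields = [query, subject] + ["0"] * 8 + [str(evalue), "0"]
    return "\t".join(fields)


if __name__ == "__main__":
    rows = [
        make_row("query_1", "subject_A", 2e-50),  # illustrative strong hit
        make_row("query_1", "subject_B", 0.5),    # illustrative weak hit
    ]
    for hit in filter_hits(rows):
        print(hit)
```

Hits that pass this filter would still need the manual alignment and motif checks described in the text before being accepted as homologs.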

Metabolic pathway study

Metabolic pathways were subsequently analyzed using the KEGG pathway database,21 an extensively employed biochemical pathway database to analyze lipid pathways in diverse organisms.22 To enrich the pathway annotation, sequences were submitted to the KEGG Automatic Annotation Server (KAAS) to identify the orthologous gene groups.23 KAAS annotates every submitted sequence with a KEGG ortholog (KO) identifier that allows identification of orthologous and paralogous relationships between the genes of interest. Further, a set of six reference pathway maps, namely fatty acid biosynthesis, fatty acid metabolism, fatty acid elongation, glycerolipid metabolism, glycerophospholipid metabolism and pathway map for biosynthesis of unsaturated fatty acids, were downloaded from the KEGG database. This dataset contains a complete biochemical description of the pathways related to the lipid metabolism observed in different organisms. They were used as templates for comprehensive examination of the lipid biosynthetic genomic repertoire of microalgae by correlating genes in the genome with gene products (enzymes), in accordance with their respective Enzyme Commission (EC) number.
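As a rough illustration of correlating genes with enzymes by EC number, the sketch below groups a handful of enzyme entries by their EC number. The symbol, KO and EC values are copied from Table 1; the function name `group_by_ec` and the choice of a simple dictionary are assumptions made for this example and do not reproduce the KAAS workflow itself.

```python
from collections import defaultdict

# (enzyme symbol, KEGG ortholog ID, EC number) for a few entries of Table 1
ANNOTATIONS = [
    ("ACCCT alpha", "K01962", "6.4.1.2"),
    ("ACCCT beta", "K01963", "6.4.1.2"),
    ("ACC-BCCP", "K02160", "6.4.1.2"),
    ("KAS I", "K00647", "2.3.1.41"),
    ("KAS II", "K09458", "2.3.1.179"),
    ("KAR", "K00059", "1.1.1.100"),
]


def group_by_ec(annotations):
    """Map each EC number to the (symbol, KO) pairs annotated with it."""
    groups = defaultdict(list)
    for symbol, ko, ec in annotations:
        groups[ec].append((symbol, ko))
    return dict(groups)


if __name__ == "__main__":
    for ec, members in sorted(group_by_ec(ANNOTATIONS).items()):
        print(ec, members)
```

Several KO identifiers can share one EC number, as the three acetyl-CoA carboxylase subunits do here, which is why the number of unique EC numbers is smaller than the number of gene families.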

Prediction of subcellular localization

Three different protein targeting prediction programs were used to determine the putative subcellular localization of the candidate proteins: TargetP,24 ChloroP25 and WolfPsort.26 Each program uses its own terminology and prediction approach. The location assignment of TargetP is based on the presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP). The ChloroP server predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites. WolfPsort is an extension of the PSORT II program for protein subcellular localization prediction. It classifies proteins into more than 10 location sites, including dual localization such as proteins that shuttle between the cytosol and nucleus. The sensitivity and specificity of this program have been experimentally verified to be 70%.
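One simple way to combine the calls of several predictors is a majority vote, sketched below. The text does not state how the three outputs were reconciled, so the voting rule, the function name `consensus_location` and the location labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_location(calls: Dict[str, Optional[str]]) -> str:
    """Return the location supported by at least two predictors.

    `calls` maps a predictor name to its predicted location, or None when
    the predictor makes no call (e.g. no transit peptide detected).
    """
    votes = Counter(loc for loc in calls.values() if loc is not None)
    if not votes:
        return "unknown"
    (top, n), *rest = votes.most_common()
    # Require a strict majority of at least two agreeing predictors.
    if n >= 2 and (not rest or rest[0][1] < n):
        return top
    return "ambiguous"


if __name__ == "__main__":
    example = {
        "TargetP": "chloroplast",
        "ChloroP": "chloroplast",
        "WolfPsort": "cytosol",
    }
    print(consensus_location(example))
```

Proteins flagged as "ambiguous" under such a rule would be natural candidates for the dual-targeting cases discussed later in the paper.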

Physico-chemical characterization and secondary structure prediction

Physico-chemical properties such as length, molecular weight, isoelectric point (pI), total number of positive and negative residues, Instability Index,27 Aliphatic Index28 and Grand Average of Hydropathy (GRAVY)29 were computed using Expasy’s ProtParam server.30 The GOR IV server31 was employed to predict secondary structural features, namely the percentage of alpha helices, extended strands and random coils in the protein sequences.
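Two of the indices listed above have simple closed forms, and the sketch below shows how they can be computed directly from a sequence. It is an independent illustration rather than the ProtParam code: GRAVY is taken as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent. The function names are assumptions, and only the 20 standard residues are handled.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean hydropathy value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "MKVLAAGIVDE"  # arbitrary illustrative sequence
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

On the Kyte-Doolittle scale, positive values mark hydrophobic residues and negative values hydrophilic ones, so the sign of GRAVY summarizes the overall character of the protein.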

Calculation of the GC content

The GC content of the predicted genes was determined using Genscan web server.32
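GC content itself is a simple ratio, and the sketch below shows the calculation for a single nucleotide sequence. It is an illustration only and does not reproduce Genscan; the function name `gc_content` and the decision to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    print(gc_content("ATGGCCGCTAAA"))  # arbitrary illustrative sequence
```

Averaging this value over all predicted genes of a species gives a per-species figure that can be compared with the genome-wide GC content.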

Motif identification

Protein sequence motifs for each gene family were identified using the MEME program.33 The analysis parameters were set as follows: number of repetitions, zero or one per sequence; maximum number of motifs, 1; minimum and maximum motif width, 6 and 50, respectively. The motif profile for each gene family is presented schematically. Domain arrangements along sequences were predicted using InterProscan34 to determine protein homolog relationships among species.

Exon-intron structure and phylogenetic analyses

The exon-intron structural patterns of the lipid biosynthetic genes were analyzed using the gene prediction algorithm of Genscan.32 To construct the phylogenetic trees, amino acid sequences were aligned using the ClustalX program implemented in BioEdit35 (v 7.1.3) with default settings and then manually refined by trimming poorly conserved N and C termini. ClustalX36 has been demonstrated to be a user-friendly tool for providing good, biologically accurate alignments within a reasonable time limit. Many options are provided, such as the realignment of selected sequences or blocks of conserved residues and the possibility of building up difficult alignments, making ClustalX an ideal tool for working interactively on alignments.36 Subsequently, the sequence alignments of genes predicted to belong to the same family were used as input files for the MEGA 4 software.37 Phylogenetic trees were built using the neighbor-joining (NJ) method and evaluated with 1000 bootstrap replicates, followed by identification of sub-trees.
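The bootstrap test mentioned above rests on resampling alignment columns with replacement; each resampled alignment is then used to rebuild a tree. The sketch below shows only that resampling step, under the assumption that the alignment is held as a dictionary of equal-length strings. It does not build NJ trees and is not the MEGA implementation; the function name `bootstrap_alignment` and the toy alignment are illustrative.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"seq_a": "MKV-LA", "seq_b": "MKI-LA", "seq_c": "MRVQLG"}
    rng = random.Random(42)  # fixed seed so replicates are reproducible
    replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
    print(len(replicates), replicates[0])
```

The bootstrap support of a clade is then the fraction of replicate trees in which that clade reappears.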

Results and Discussion

Comparative genomic analyses of lipid genes in microalgal species

Interest in microalgae as a potential feedstock for biofuel production and other valuable biomaterials is rooted in the ability of microalgae to rapidly accumulate significant amounts of neutral lipids.38 Under optimal conditions, microalgae synthesize fatty acids used primarily for esterification into polar glycerol-based membrane lipids such as glycosylglycerides and phosphoglycerides, whereas under stress conditions, many microalgae tend to accumulate storage lipids called triacylglycerols (TAGs).16 Although global fatty acid biosynthetic mechanisms are known in higher plants,39 the pathways responsible for lipid accumulation in microalgae are not well studied. Hence, in order to bridge the existing knowledge gap regarding algal lipid metabolism, comparative metabolic pathway analyses were performed across five microalgal genomes, using homologous plant genes as reference, with the objective of functionally characterizing the predicted genes. EC numbers, Clusters of Orthologous Groups (COGs), protein domain families and GO terms were determined for the respective candidate genes. This in silico approach has recently been reviewed as reliable enough for accurate function prediction of uncharacterized proteins encoded by genes in a genome.40 In the present study, using the Arabidopsis annotation data as the BLAST input query set, a total of 398 orthologous genes present in the A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae genomes were identified. This approach to identifying candidate genes involved in biosynthesis and accumulation of storage oil has been successfully demonstrated in plants by Sharma and Chauhan.13 These 398 genes clustered into 40 gene families and include 142, 56, 59, 47, 41 and 53 genes from the A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae genomes, respectively (Table 1). The identified genes are involved in the synthesis of phospholipids, glycerolipids and storage lipids such as TAG.
We further divided the predicted genes into categories, namely genes coding for enzymes involved in the biosynthesis and catabolism of fatty acids, TAG and membrane lipids. The comprehensive list of candidate genes, along with experimental evidence of the respective enzyme action influencing lipid accumulation, is presented in Table 1.41–74 Approximately 47% of the predicted gene products found in the present study were previously annotated as ‘predicted’, ‘probable’, ‘putative uncharacterized’, ‘similar’ or ‘hypothetical’ proteins (Table 1). The annotation of these sequences has been improved, and a role in the lipid biosynthetic process was assigned to each of them by similarity search with homologous plant genes, annotation of Gene Ontology, and identification of conserved domains or motifs. Furthermore, in comparison with the previous report on lipid gene identification in the C. merolae genome by Sato and Moriyama,75 the present study has identified 20 additional genes involved in lipid biosynthesis.
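The per-species gene counts quoted above can be checked against the stated total with a few lines of arithmetic. The counts are copied from the text; the variable names and the derived percentages are only an illustration.

```python
# Gene counts per genome as reported in the text
GENES_PER_SPECIES = {
    "A. thaliana": 142,
    "C. reinhardtii": 56,
    "V. carteri": 59,
    "O. lucimarinus": 47,
    "O. tauri": 41,
    "C. merolae": 53,
}

total = sum(GENES_PER_SPECIES.values())
algal_total = total - GENES_PER_SPECIES["A. thaliana"]

# Share of the total contributed by each genome, in percent
shares = {species: 100.0 * n / total
          for species, n in GENES_PER_SPECIES.items()}

if __name__ == "__main__":
    print("total genes:", total)            # 398, matching the text
    print("microalgal genes:", algal_total)
    for species, share in shares.items():
        print(f"{species}: {share:.1f}%")
```

The six counts sum to the 398 genes reported, with the five microalgal genomes together contributing 256 of them.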
Table 1

Candidate genes involved in the lipid biosynthetic pathway of the Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae genomes.

Gene/symbol | EC no. | KOG no. | KEGG ID | Gene ontology | A. thaliana (SwissProt accession ID) | C. reinhardtii (SwissProt accession ID) | V. carteri (SwissProt accession ID) | O. lucimarinus (SwissProt accession ID) | O. tauri (SwissProt accession ID) | C. merolae (JGI protein ID) | Ref**

Fatty acid biosynthesis
Homomeric acetyl-CoA carboxylase (ACC) | 6.4.1.2 | KOG0368 | K11262 | GO:0004075 | Q9C8G0, Q38970 |  | D8UA31* | A4RRC3, A4S479 | Q01GA9, Q00ZG8 | CMM188C | 41–46
Heteromeric ACC biotin carboxylase subunit (BCC) | 6.4.1.2/6.3.4.14 | KOG0238 | K01961 | GO:0004075, GO:0003989 | O04983, F4JYE1, F4JYE0 | A8JGF4, A8JEW0 | D8UF54 | A4S140 | Q013U7 | CMS299C |
ACC carboxyl-transferase α-subunit (ACCCT α) | 6.4.1.2 | KOG0238 | K01962 | GO:0003989 | Q9LD43 | A8J646 | D8TNY0 |  |  | CMV056C |
ACC CT β subunit (ACCCT β) | 6.4.1.2 | KOG0540 | K01963 | GO:0003989 | P56765 | A8JHU1 | D8U455* |  |  | CMV207C |
ACC biotin carboxyl carrier protein (ACC-BCCP) | 6.4.1.2 | KOG0540 | K02160 | GO:0003989 | Q42533, F4KE21, Q9LLC1 | A8JDA7 | D8U256* |  |  | CMV134C |
Malonyl-CoA-ACP transacylase (MCT) | 2.3.1.39 | KOG2926 | K00645 | GO:0004314 | Q8RU07, Q8L5U2*, F4IMR0 | A8HP61 | D8TTQ7 | A4S2U9, A4SAC5 | Q011G6, Q00S12 | CMT420C |
β-ketoacyl-ACP synthase I (KAS I) | 2.3.1.41 | KOG1394 | K00647 | GO:0004315 | P52410, F4KHF4 | A8JEF7 | D8UDW0, D8TXC7 | A4RSM2, A4S713 | Q01EI4 | CMM286C | 47,48
β-ketoacyl-ACP synthase II (KAS II) | 2.3.1.179 | KOG1394 | K09458 | GO:0033817 | Q9C9P4, Q8L3X9 | A8JCK1, A8IG50 | D8TJC9 | A4S7B9, A4RTJ7 | Q00V56, Q01DP0* | CML329C |
β-ketoacyl-ACP synthase III (KAS III) | 2.3.1.180 | KOG1394 | K00648 | GO:0033818 | P49243, B9DHF9 | A8JHL7 | D8TXF1* | A4S7P4 | Q00V15 | CMD118C |
β-ketoacyl-ACP reductase (KAR) | 1.1.1.100 | KOG1200 | K00059 | GO:0004316 | P33207, Q9SQR4, Q9SQR2 | A8JBX4, Q84X75 | D8TK78*, D8TV99* | A4RQY6 | Q01GL3 | CMS393C |
3-hydroxyacyl-ACP dehydrase (HAD) | 4.2.1.- |  | K02372 | GO:0008659 | Q9LX13, Q9SIE3, Q8LBU6* | A8IX17 | D8TV61* | A4RUS8 |  | CMI240C |
Enoyl-ACP reductase (EAR) | 1.3.1.9 | KOG0725 | K00208 | GO:0004318 | Q9SLA8, Q9M672, O04942, Q9FEF2 | A8JFI7 | D8UC03* | A4S0L7 | Q014N2 | CMT381C |
Acyl-ACP thioesterase/Fatty acid thioesterase (FAT) | 3.1.2.14 |  | K10782 | GO:0000036 | Q42561, Q9SV64, Q9SJE2, Q42562, Q42558, Q41917 | A8HY17* | D8TJT0* | A4RS92 | Q01FC4 |  | 49–53

Fatty acid elongation
3-hydroxyacyl-CoA dehydrogenase (CHAD) | 1.1.1.35 | KOG2304 | K00074 | GO:0008691, GO:000385 | Q9LDF5 | A8IVP3 | D8UMK6* | A4RUY4 | Q01C53* | CMC137C |
Enoyl-CoA hydratase (ECH) | 4.2.1.17 | KOG1680, KOG1679 | K01692 | GO:0004300 | Q6NL24, O23468, Q0WRQ2, Q9T0K7 | A8I9B0 | D8TRG5* | A4SBD9 | Q010Z7 | CMK139C, CMT074C |
Enoyl-CoA reductase (TER) | 1.3.1.38 | KOG1639 | K10258 | GO:0019166 | Q8LCU7, F4J6R6*, Q9M2U2 | A8HM32, A8JAQ9 | D8THB1*, D8U5N0* | A4RUU7, A4RU17 | Q01D21 | CMD146CΔ |

Fatty acid catabolism
Long chain acyl-CoA synthase (LACS) | 6.2.1.3 | KOG1256 | K01897 | GO:0004467 | Q9T0A0, Q9T009, Q8LPS1, Q8LKS5, Q9SJD4, Q9CAP8, Q9C7W4, Q9XIA9, O22898 | A8JH58, A8HRV2 | D8TMY5*, D8TKU*, D8TP15*, D8TNJ2*, D8TS64* | A4RWX1, A4S5G5 | Q00Y52*, Q00UP7 | CML197C, CME186C | 54
Acyl-CoA oxidase (AOX) | 1.3.3.6 | KOG0135 | K00232 | GO:0003997 | O65201, F4KG18, O65202, F4JMK8, Q96329, Q9ZQP2, Q9LMI7, P0CZ23 | A8ISE5, A8JGC8, A8JB97 | D8U3F9*, D8TVM2*, D8U064*, D8U3J5* | A4RR33 | Q01GH2 | CMK115C | 55
Acyl-CoA dehydrogenase (ACADM) | 1.3.99.3 | KOG0139 | K00249 | GO:0003995 | Q8RWZ3, Q0WM98, Q67ZU5, Q9M7Y7 | A8J3M3 | D8U2A4* | A4RQF1 | Q01H50* | CML080C |
Enoyl-CoA hydratase (ECH) | 4.2.1.17 | KOG1680, KOG1679 | K01692 | GO:0004300 | Q6NL24, O23468, Q0WRQ2, Q9T0K7 | A8I9B0 | D8TRG5* | A4SBD9, A4S307 | Q010Z7 | CMK139C, CMT074C |
3-hydroxyacyl-CoA dehydrogenase (CHAD) | 1.1.1.35 | KOG2304 | K00074 | GO:0008691, GO:0003857 | Q9LDF5, Q9ZPI5, Q9ZPI6 | A8IVP3 | D8UMK6* | A4RUY4 | Q01C53* | CMC137C |
Acetyl-CoA acetyl-transferase (THIL) | 2.3.1.9 | KOG1390 | K00626 | GO:0003985 | Q8S4Y1, Q9FIK7, F4JYM8*, B9DGQ1, Q3E8F0 | A8J0X4 | D8UKX0*, D8TZN7* |  |  | CMA042C, CME087C |

Fatty acid desaturation
Δ9 acyl-ACP desaturase (Δ9D) | 1.14.19.1 | KOG1600 | K00507 | GO:0004768 | Q9SID2, O65797, Q9FPD5, Q9LM13, Q9LM14, Q9LND8, Q9LND9, Q949X0, Q9LVZ3 | A8J015, A8JEN2, C6ZE81, A8IQB8 | D8U961, D8TRE9* | A4S9D8 | Q00T63 | CMJ201C, CMM045C | 56–59
Δ12 acyl-ACP desaturase (Δ12D) | 1.14.19.- | KOG:TWOG0155 | K10256, K10255 | GO:0045485 | P46313, P46312, Q8LFZ8, Q19MZ0 | A8IR24, O48663 | D8UB74, D8TTW0 | A4RWB5 | Q01DF5 | CMK291C |

Triacylglycerol (TAG) biosynthesis and catabolism
Glycerol kinase (GK) | 2.7.1.30 | KOG2517 | K00864 | GO:0004370 | F4HS76, Q9M8L4, A0JPS9, C0Z2P8 | A8IT31 | D8TXT9* | A4RTW5 | Q01D72* | CMJ173C |
Glycerol-3-phosphate dehydrogenase (G3PDH) | 1.1.5.3 | KOG0042 | K00111 | GO:0004368 | Q9SS48 | A8HTE5 | D8TSE3* | A4RU40 | Q01CZ8 | CML209C | 60
Glycerol-3-phosphate acyltransferase (GPAT) | 2.3.1.15 | KOG2898 | K00631, K00630 | GO:0004366 | Q43307, Q9LHS7, Q8GWG0, Q9SYJ2, Q9LMM0, Q9FZ22, Q9SHJ5, Q0WPD4, O80437, Q9CAY3 | A8J0R2, A8HVM5 | D8TVT7*, D8TIB3* | A4RT23, A4S945 | Q01F77 | CMK217CΔ, CMJ027C, CMA017C | 61,62
1-acylglycerol-3-phosphate acyltransferase/Lysophosphatidic acid acyl-transferase (AGPAT/LPAT) | 2.3.1.51 | KOG1505 | K00655, K13519 | GO:0003841 | Q8GXU8, Q8LG50, Q9SYC8, Q8L4Y2, Q9LHN4 | A8J0J0 | D8U1V6*, D8TWQ3* | A4S0H0 | Q014T8*, Q00SS2 | CME109C, CMF185C, CMJ021C | 63,64
Phosphatidate phosphatase (PP) | 3.1.3.4 | KOG3030 | K01080 | GO:0008195 | Q9ZU49, Q3EC91, Q8LFD1, A8MR10*, F4IX65, Q9XI60, Q9LJQ8 | A8JGB5 | D8U3B0* | A4RU93 | Q01CT9* | CMR054CΔ, CMR488CΔ |
Diacylglycerol acyltransferase (DGAT) | 2.3.1.20 | KOG0831, KOG0380 | K00635, K11155 | GO:0004144 | Q9SLD2, Q9ASU1, Q93ZR6 | A8IXB2 | D8UGA9*, D8UHL* | A4S872 | Q00UG1* | CMJ162CΔ, CMQ199C, CME100C | 65–74
Triacylglycerol lipase (TAGL) | 3.1.1.3 | KOG4569 | K01046 | GO:0004806 | Q9LZA6, Q9M1I6, F4JY30 | D5LAZ6, D5LAW3, A8HYG2 | D8TT81*, D8U4S5* | A4RQN3, A4S9E4, A4RZ46 | Q00T58, Q016Q6 | CMS254C, CMT151CΔ |

Membrane lipid biosynthesis
Ethanolamine phospho-transferase (EPT1) | 2.7.8.1 | KOG2877 | K00993 | GO:0004307 | O82567, F4HQU9 | Q6U9W9 | D8TWP7* | A4S097 | Q01BV3 | CMF133C |
CDP-Diacylglycerol synthase (CDS1) | 2.7.7.41 | KOG1440 | K00981 | GO:0004605 | Q1PE48, F4JL60, O49639, F4JL62, O04928 | A8ILG5, A8IRM0, A8IRL9 | D8TPH2, D8TK01 | A4RZR8, A4RWB0 | Q01AN2, Q015S5 | CMM311C, CMN215C, CMS056C |
Phosphatidyl glycerol phosphate synthase (PGP3) | 2.7.8.5 | KOG1617 | K00995 | GO:0008444 | O80952, Q67ZP8*, Q9M2W3 | A8JEJ8 | D8U650*, D8UDS7* | A4S5X3 | Q00W48 | CMN196C, CMJ134C |
Ethanolamine kinase (EKT1) | 2.7.1.82 | KOG4720 | K00894 | GO:0004305 | O81024*, Q8LAQ2* | A8J2J5 | D8TJH5 | A4S0V5 | Q014D1* | CMR011C |
CTP: phospho-ethanolamine cytidyl transferase (ECT) | 2.7.7.14 | KOG2803 | K00967 | GO:0004306 | Q9ZVI9 | Q84JV7 | D8TWX6* | A4S2P2 | Q011M7 | CMS052C |
UDPsulfoquinovose synthase (SQD) | 3.13.1.1, 2.4.1.- | KOG1371, KOG1111 | K06118, K06119 | GO:0046507, GO:0046510 | O48917, Q8S4F6 | Q763T6, A8JB95, A8HMC2 | D8U760*, D8U5J8* | A4S476, A4S792 | Q00ZH1, Q00V96 | CMR012C, CMR015C |
Monogalactosyl diacylglycerol synthase (MGDGS) | 2.4.1.46 |  | K03715 | GO:0046509 | Q9SI93, O81770 | A8HUF1 | D8TQW6* | A4RT08 |  | CMI271C |
Digalactosyl diacylglycerol synthase (DGDGS) | 2.4.1.241 |  | K09480 | GO:0035250 | Q9S7D1, Q8W1S1 | A8HU66 | D8TQZ2* | A4S4N5, A4S0F1 | Q00Z06, Q014V9 |  |
Inositol phospho-transferase (PIS) | 2.7.8.11 | KOG3240 | K00999 | GO:0003881 | Q8LBA6, Q8GUK6, F4JTR2* | A8ICX2 | D8TPK4 | A4SAF2 | Q00RY0 | CMM125C |

Notes: Putative uncharacterized proteins; predicted proteins; probable proteins; similar protein; absent in KEGG pathway database; relevant references on experimental evidence of the respective enzyme action influencing lipid accumulation.

To investigate metabolic processes responsible for the synthesis of microalgal biofuel precursors, KO identifiers were assigned to the predicted 398 genes representing 36 unique EC numbers, which were subsequently used to study the metabolic pathway maps available in the KEGG pathway database. KEGG is considered one of the most important bioinformatics resources for understanding the higher-order functions and utilities of an organism from its genome information. It hosts information on the majority of well-known metabolic pathways, including lipid pathways for several organisms such as higher plants, bacteria and algae. Recently, it was used successfully by Rismani-Yazdi et al14 to identify pathways and the underlying genes responsible for production of biofuel precursors in Dunaliella tertiolecta, a potential microalgal biofuel feedstock. Using the above approach, a total of 79 lipid genes, including 22 from A. thaliana, 21 from C. merolae, 10 from C. reinhardtii, 10 from O. lucimarinus and 8 each from V. carteri and O. tauri, were recognized that had not previously been indexed in the KEGG metabolic pathway database (Table 1). The global synthesis pathway of TAG begins with the basic fatty acid precursor, acetyl-CoA, and continues through fatty acid biosynthesis, complex lipid assembly and saturated fatty acid modification until TAG bodies are finally formed.76 A simplified overview of the TAG biosynthetic pathway in microalgae is shown in Figure 1. Comparative analyses of the genomes of C. reinhardtii, V. carteri, O. lucimarinus, O. tauri, C. merolae and A. thaliana indicate that the majority of genes involved in lipid production are orthologous among these species. Additionally, the extensive amino acid sequence conservation (more than 60% pair-wise sequence identity) among the genes involved in lipid biosynthesis suggests functional equivalence between Arabidopsis and microalgal genes.
Thus, the present results indicate that the underlying fatty acid and TAG biosynthesis processes are directly analogous to those reported in higher plants.16 It may further be noted that although algae predominantly share similar lipid biosynthetic pathways with higher plants, the present in silico analyses revealed that the gene families responsible for lipid biosynthesis are smaller in microalgae than in Arabidopsis. Certain specific pathways were also observed to be absent in microalgae, including the fatty acid biosynthesis termination mechanism by FAT homologs in C. merolae. The above computational analyses are supported by previous experimental reports on algal lipid metabolism.75
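The pair-wise sequence identity criterion mentioned above can be illustrated with a short calculation over two aligned sequences. The text does not specify how identity was computed, so this sketch makes explicit assumptions: the inputs are already aligned and of equal length, columns where both sequences have a gap are ignored, and the denominator is the number of remaining columns. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1  # x == y here implies neither residue is a gap
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Arbitrary illustrative alignment rows
    print(percent_identity("MKVLA-GIV", "MKILAAGLV"))
```

Other conventions, for example dividing by the length of the shorter sequence, give somewhat different values, so a threshold such as 60% should always be read together with the definition used.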
Figure 1

Schematic overview of Triacylglyceride (TAG) biosynthetic pathway in microalgae.

Notes: Free fatty acids and TAG are synthesised in the chloroplast and endoplasmic reticulum respectively. The vital enzymes reported by various experimental studies to be involved in accelerated lipid accumulation are marked with an asterisk.

Abbreviations: ACC, Acetyl-CoA carboxylase; MAT, Malonyl-CoA-ACP transacylase; KAS, 3-ketoacyl-ACP synthase; KAR, 3-ketoacyl-ACP reductase; HAD, 3-hydroxyacyl-ACP dehydratases; EAR, Enoyl-ACP reductase; FAT, Fatty acid thioesterase; G3PDH, Glycerol-3-phosphate dehydrogenase; GPAT, Glycerol-3-phosphate acyltransferase; AGPAT, 1-acylglycerol-3-phosphate acyltransferase also known as LPAT, lysophosphatidic acid acyl transferase; PP, Phosphatidate phosphatase; DGAT, Diacylglycerol acyltransferase.

Furthermore, our results indicate that enzymes responsible for higher lipid accumulation in plants and other eukaryotes, either through over-expression or gene knockout strategies, are present not only in oleaginous algal species (C. reinhardtii) but also in other algal species, notably O. tauri and C. merolae (Fig. 2). Comparison of the number of genes at each step of the lipid metabolic pathway suggests that the green algae C. reinhardtii and V. carteri have an expanded array of genes involved in TAG biosynthesis and catabolism, including fatty acid thioesterase, long chain acyl-CoA synthase, acyl-CoA oxidase, desaturase, glycerol-3-phosphate acyltransferase, and diacylglycerol acyltransferase. Additionally, these gene copy numbers appear to be correlated with the genome complexity of the organisms under study (Fig. 2).
Figure 2

Number of gene homologues in the TAG biosynthetic pathway in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.

Notes: For each reaction, coloured squares denote the number of homologous genes in A. thaliana (blue), C. reinhardtii (yellow), V. carteri (pink), O. lucimarinus (green), O. tauri (purple) and C. merolae (light blue).

Prediction of subcellular location

The prediction of subcellular localization of proteins is essential to elucidate the spatial organization of proteins according to their function and to refine our knowledge of cellular metabolism.77 Thus, prediction of subcellular location provides valuable information about the function of proteins as well as the interconnectivity of biological processes.78 In the present study, the subcellular locations of lipid biosynthetic proteins were predicted with TargetP, ChloroP and WolfPsort, which rely on different algorithms and in some cases assigned different locations. The objective of using more than one analytical tool was to improve the specificity of the prediction, as various studies have shown that combining results from several prediction programs helps to rule out false positives and false negatives.78 The available localization prediction tools show different strengths and no tool is clearly and globally optimal.77 Moreover, it is known that some localizations are poorly predicted by all the algorithms, especially in the case of proteins exhibiting dual targeting to plastids and mitochondria, which could be a phenomenon more common than previously thought.79 This analysis showed that the majority of the predicted proteins are located in four compartments: plastids (31%), mitochondria (26%), cytoplasm (28%) and nucleus (6%) (Fig. S1 and Table S1). These results are consistent with the experimental observations that de novo synthesis of fatty acids occurs primarily in the plastid and/or mitochondria.5 About 19% of the proteins contained both a mitochondrial targeting peptide and a chloroplast transit peptide in their sequences.
Recent reports have shown an unexpectedly high frequency of dual targeting of proteins to both the mitochondria and chloroplast, making it difficult to predict the correct location of these proteins within a cell.80,81 Furthermore, approximately 3% of the predicted proteins were located in more than one compartment, i.e., nucleus and cytoplasm, which were the same highly paired compartments as identified in the Arabidopsis82 and sugarcane83 proteomes, suggesting that there is a significant amount of interaction between these two compartments. Hyunjong et al84 have reported that targeting a particular enzyme to several compartments simultaneously in the same plant augments its production compared with targeting to individual compartments. Hence the predicted localization information could aid in targeting the lipid biosynthetic enzymes to enhance oil accumulation in microalgae.

Various physico-chemical parameters were computed using Expasy’s ProtParam tool (Fig. 3 and Table S2). Molecular weight ranged from 1116.818 to 299171.0 Da for the lipid biosynthetic proteins in microalgae. The majority of the predicted proteins were found to have a pI greater than 7, indicating that proteins involved in lipid biosynthesis are generally basic in nature. However, the deduced sequences for genes such as acetyl-CoA carboxylase, acetyl-CoA acetyltransferase, glycerol kinase, ethanolamine kinase and phosphoethanolamine cytidyl transferase were determined to be acidic. These isoelectric point values will be useful for developing a buffer system for purification of the enzymes by isoelectric focusing. The instability index is based on the observation that certain dipeptides occur at significantly different frequencies in stable and unstable proteins. Proteins with an instability index less than 40 are predicted to be stable, while those with a value greater than 40 are predicted to be unstable.
In the present study, the high frequency of unstable proteins may be explained in the context of the recent work of Cao,85 who observed such a phenomenon in many plants and microorganisms, possibly due to an inherent feedback mechanism that regulates the optimal level of accumulation of cellular metabolites. The aliphatic index refers to the relative volume of a protein that is occupied by aliphatic side chains (eg, alanine, isoleucine, leucine and valine) and contributes to the increased thermal stability observed for globular proteins. The aliphatic index for the screened proteins ranged from 70.24 to 119.16. The high aliphatic index of these sequences indicates that their structures are likely to be stable over a wide range of temperatures. The GRAVY index reflects the overall hydropathy, and hence the likely solubility, of a protein. Lipid biosynthetic proteins with large negative GRAVY values are relatively more hydrophilic than proteins with less negative values.
Figure 3

Distribution of various physico-chemical characteristics of putative proteins encoded by lipid genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.

Note: The individual physico-chemical values for each protein, as calculated by the ProtParam server, are provided in Supplementary Table 2.

The secondary structures of the microalgal proteins involved in lipid metabolism were analyzed by submitting the amino acid sequences to the GOR IV program, which has been cross-validated to have a mean accuracy of 64.4% for three-state prediction.32 The secondary structure indicates whether a given amino acid lies in a helix, strand or coil. Secondary structure features of the proteins are presented in Table S3. The results revealed random coil to be predominant, followed by alpha helices and extended strands, in the majority of sequences.

GC-content analyses

The variation in guanine (G) and cytosine (C) content observed between species is one of the central issues in evolutionary bioinformatics. The average GC-content of the lipid biosynthetic genes, as calculated by the Genscan server, was 39.89%, 63.35%, 56.92%, 59.88%, 59.04% and 55.57% for A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae, respectively. These values lie close to the calculated GC-content of the whole genome of the respective organisms under study,86–89 although the gene sequences showed a slightly higher GC-content than the background GC-content of the entire genome in all the studied species. Among the microalgae, the highest GC-content was observed in C. reinhardtii. The GC-content of C. reinhardtii is also experimentally reported to be higher than that of multicellular organisms.90 Comparative analyses of the GC-content of the individual genes revealed minor variations among the microalgal genomes (Fig. 4 and Table S4). This finding is consistent with the earlier report that eukaryotic genomes vary less in their GC-content.91 Furthermore, the genes with high GC-content were also more often identified as stable by the ProtParam server than genes with low GC-content. This may be due to the fact that a GC pair is held by 3 hydrogen bonds (H-bonds), compared with 2 H-bonds in an AT pair, thus contributing to the greater stability of the gene products. In addition, analyses of individual predicted genes in O. lucimarinus and O. tauri revealed broadly similar GC-content in the two Ostreococcus species.
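The GC-content figures above are percentages of G and C bases in each gene sequence. The sketch below shows the calculation in Python. It is an illustration, not the Genscan procedure. Ignoring ambiguous bases such as N, and taking the per-organism average as the unweighted mean of per-gene values, are both assumptions, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a nucleotide sequence."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


def mean_gc_content(sequences) -> float:
    """Unweighted mean of per-gene GC percentages (an assumed averaging scheme)."""
    values = [gc_content(seq) for seq in sequences]
    if not values:
        raise ValueError("no sequences supplied")
    return sum(values) / len(values)
```

An unweighted mean gives short and long genes equal influence. Pooling all bases before computing the percentage would weight genes by length and can give a different value.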
Figure 4

Comparison of the GC-content of lipid biosynthetic genes among five unicellular algae and the vascular plant, A. thaliana.

Notes: Columns represent the average GC-content of the genes (in percentage) of each organism, from bottom to top: A. thaliana (blue), C. reinhardtii (red), V. carteri (green), O. lucimarinus (purple), O. tauri (blue) and C. merolae (orange). The individual GC-content values of each gene as calculated by the Genscan web server are given in Supplementary Table 4.

Motif and domain architecture

A motif is a sequence pattern found conserved in a group of related protein or gene sequences.34 An exhaustive search for protein motifs using the MEME program identified 36 core conserved sequences in the lipid biosynthetic genes of microalgae predicted in the present study (Fig. 5). The overall height of each stack indicates the sequence conservation at that position, whereas the height of the symbols within each stack reflects the relative frequency of the corresponding amino acid. The sequence logos showed that the majority of the predicted motifs are composed mainly of hydrophobic and polar uncharged residues. It is likely that these conserved residues are critical for the catalytic activity of the enzymes and may be involved in substrate binding, direct catalysis and maintenance of the protein structure. In addition to the motif analyses, a detailed comparison of the domain architectures of the gene products at the whole-genome level is given in Figure 5. The results indicate that the majority of domains observed in genes involved in lipid biosynthesis are present in all microalgal species under study. The critical amino acid residues present in the conserved motifs and domains of the lipid genes therefore provide a framework for better understanding their structure-function relationship.
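The relationship between stack height and symbol height in a sequence logo can be made concrete with a short calculation. The sketch below assumes the standard information-content definition for a 20-letter amino acid alphabet, in which the stack height at a position is log2(20) minus the Shannon entropy of the column, and each symbol's height is its frequency times the stack height. It omits the small-sample correction that logo programs may apply, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content in bits of one alignment column."""
    if not column:
        raise ValueError("empty column")
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned_sites):
    """Per-position symbol heights (bits) for equal-length motif occurrences."""
    if not aligned_sites:
        raise ValueError("no sequences supplied")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all motif occurrences must have the same length")
    heights = []
    for position in range(length):
        column = "".join(site[position] for site in aligned_sites)
        info = column_information(column)
        # Each residue's height is its relative frequency times the stack height.
        heights.append({residue: info * n / len(column)
                        for residue, n in Counter(column).items()})
    return heights
```

A fully conserved position reaches the maximum of log2(20), about 4.32 bits, while a position split evenly between two residues loses exactly one bit.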
Figure 5

Conserved domain architectures and sequence logo plots of lipid biosynthetic genes using InterProscan and MEME programs, respectively.

Notes: The overall height of each stack indicates the sequence conservation at that position, whereas the height of the symbols within each stack reflects the relative frequency of the corresponding amino acid. The amino acids are colour coded as follows: A, C, F, I, L, V, W and M (blue; most hydrophobic); N, Q, S and T (green; polar, non-charged and non-aliphatic residues); D and E (magenta; acidic); K and R (red; positively charged).

In order to gain insights into the evolution of the lipid biosynthetic genes, we analyzed the exon-intron structure patterns of the predicted gene homologs (Table S5). The results revealed that the exon-intron split patterns of C. reinhardtii and V. carteri genes were homologous to those of Arabidopsis, although insertions, deletions and intron-size variations were common. Likewise, conservation of exon-intron number and size was observed between O. lucimarinus and O. tauri. The C. merolae genome is remarkable for its paucity of introns,88 and in our study we likewise could not detect introns in any of the predicted C. merolae genes. O. lucimarinus and O. tauri genes contained fewer introns than those of C. reinhardtii, V. carteri and A. thaliana, and our results confirm the previous report that C. reinhardtii lipid biosynthetic genes contain a higher number of introns.92

A phylogenetic tree was constructed to evaluate the evolutionary relationship among the predicted genes (Fig. 6). The tree showed that the majority of predicted genes with similar functions, similar intron-exon structures and conserved motif patterns clustered together, consistent with their common ancestry and in accordance with our expectations. In most of the gene families, the protein sequences of the two Ostreococcus species, O. lucimarinus and O. tauri (Prasinophytes), were present as sister clades and fell within the green algal cluster comprising C. reinhardtii, V. carteri (Chlorophytes) and A. thaliana (Streptophytes). The Chlorophyte and Streptophyte lineages are part of the green plant lineage (Viridiplantae).93 Further, the phylogenetic analyses suggest that the protein homologs of C. merolae (Rhodophytes) diverge from the root of the green lineage. Overall, we found that the components of the lipid biosynthetic pathway are remarkably well conserved, particularly within the Viridiplantae lineage.
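The exon-intron comparison described above rests on counting introns and measuring their sizes from exon coordinates. The sketch below shows one way to derive both from a list of exon positions. The input layout, 1-based inclusive (start, end) pairs on a single strand, is an assumption and does not describe the format of Table S5, and the function name is hypothetical.

```python
def intron_sizes(exons):
    """Lengths of the introns lying between consecutive exons.

    exons: iterable of (start, end) pairs, 1-based and inclusive.
    Returns an empty list for an intronless (single-exon) gene.
    """
    ordered = sorted(exons)
    for start, end in ordered:
        if start > end:
            raise ValueError(f"exon start {start} is after its end {end}")
    sizes = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap < 1:
            raise ValueError("exons overlap or abut; no intron between them")
        sizes.append(gap)
    return sizes
```

For a gene with exons at 1-100, 201-300 and 351-400, the function reports two introns of 100 and 50 bases. A single-exon gene yields an empty list, which corresponds to the intronless genes reported for C. merolae.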
Figure 6

(A) Phylogenetic tree inferred from the amino acid sequences of lipid genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae. Proteins with identical functional characterization are represented by diamonds of the same colour. Protein accession numbers are shown; the organisms to which the proteins belong are given in Table 1. Some homologous proteins were omitted to increase clarity of the remaining groups. The tree indicates that proteins with similar functions clustered together. Further, in most of the gene families, for instance the desaturases (B), the protein sequences of the two Ostreococcus species, O. lucimarinus and O. tauri, were present as sister clades and fell within the green algal cluster comprising C. reinhardtii, V. carteri and A. thaliana, while the protein homologs of C. merolae seem to diverge from the root of the green lineage.

Conclusion

Identification of genes responsible for oil accumulation is a prerequisite to targeting microalgae for enhanced yields of biofuel precursors using metabolic engineering. A comprehensive computational analysis of the predicted genes of microalgae against Arabidopsis was performed through gene annotation, subcellular localization, physico-chemical characterization, exon-intron pattern, motif/domain organization and phylogenomic studies. The results revealed that although each of the algal species maintains the basic genomic repertoire required for lipid biosynthesis, they possess additional lineage-specific gene groups. Additionally, the extensive sequence and structure conservation of the putative genes indicates functional equivalence between microalgae and Arabidopsis. Phylogenetic analyses demonstrated that genes of the lipid biosynthetic pathway from Prasinophytes, Chlorophytes, Streptophytes and Rhodophytes clustered according to their conserved motif pattern, exon-intron structure and functional equivalence. This in-depth investigation of each individual gene and its encoded products across the microalgal genomes should facilitate metabolic engineering of microalgae for biofuel production.

Classification of microalgal lipid biosynthetic proteins on the basis of subcellular localization using TargetP, ChloroP and WolfPsort prediction tools.

Subcellular localisation prediction of proteins encoded by lipid biosynthetic genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae, using TargetP, ChloroP and WolfPsort programs.

The calculated secondary structures of the proteins encoded by lipid biosynthetic genes, using GOR IV program.

GC-content values of lipid biosynthetic genes as calculated by Genscan web server.

Exon-intron coordinates of lipid biosynthetic genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.