
Phylogenomic study of lipid genes involved in microalgal biofuel production-candidate gene mining and metabolic pathway analyses.

Namrata Misra¹, Prasanna Kumar Panda, Bikram Kumar Parida, Barada Kanta Mishra.

Abstract

Optimizing microalgal biofuel production using metabolic engineering tools requires an in-depth understanding of the structure-function relationships of genes involved in the lipid biosynthetic pathway. In the present study, genome-wide identification and characterization of 398 putative genes involved in lipid biosynthesis in Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae was undertaken on the basis of their conserved motif/domain organization and phylogenetic profiles. The results indicated that the core lipid metabolic pathways in all the species are carried out by a comparable number of orthologous proteins. Although the fundamental gene organization was invariantly conserved between the microalgal and Arabidopsis genomes, increasing genome complexity appeared to be associated with a larger number of genes involved in triacylglycerol (TAG) biosynthesis and catabolism. Further, phylogenomic analysis of the genes provided insights into the molecular evolution of the lipid biosynthetic pathway in microalgae and confirmed the close evolutionary proximity of the Streptophyte and Chlorophyte lineages. Together, these studies will improve our understanding of the global lipid metabolic pathway and contribute to the engineering of regulatory networks of algal strains for higher accumulation of oil.


Keywords: biofuel; bioinformatics; lipid biosynthetic genes; microalgae; phylogenomics

Year: 2012 PMID: 23032611 PMCID: PMC3460774 DOI: 10.4137/EBO.S10159

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

Growing levels of atmospheric pollution, mounting energy demand and the incessant rise in crude oil prices are among the issues that have recently driven global efforts in biofuel research. Currently, commercial-scale biofuels are sourced primarily from a variety of bioenergy crops that include sugarcane (Saccharum officinarum), sugar beet (Beta vulgaris), switchgrass (Panicum virgatum), soybean (Glycine max), canola (Brassica napus) and sunflower (Helianthus annuus).1 Although the environmental benefits of biofuels compared with fossil fuels are well established, concerns have been raised about their long-term sustainability, especially the diversion of arable land to biofuel cropping systems and the corresponding adverse impact on the global food supply chain.2 Consequently, algae-based biofuels are increasingly attracting research attention owing to the rapid growth rate of algae, their high carbon dioxide uptake and high lipid content, and their comparatively low requirement for land, which can be marginal land.3 Notwithstanding these advantages and the technical feasibility of algal biofuels, the commercial viability of the process remains a concern, requiring better strain development and improved post-harvest process engineering.4 The major challenge is to improve lipid accumulation and lipid profiles while reducing energy inputs, so as to minimize production costs.2 Enhancing lipid production in microalgal cells under controlled stress conditions and engineering their metabolic pathways are promising strategies for obtaining industrial quantities of biofuel.
Despite positive experimental reports on enhanced microalgal lipid accumulation under physiological or nutritional stress regimes, many studies have reported a concomitant reduction in overall biomass yield under such conditions.5 In this context, genome-scale metabolic engineering has been suggested as a promising route to boost oil production in microalgal strains, including modification of the algal lipid profile for improved biofuel properties.6,7 Over the past few years, various studies have altered the fatty acid composition of plants through genetic engineering, and a number of plant lipid-related genomics databases have been developed and deployed.8–11 Comparative genomic analyses using bioinformatics tools have also recently been performed to identify genes involved in lipid biosynthesis in various oleaginous plants. For example, Lin et al12 cloned and annotated a total of 1003 maize lipid-related genes, while Sharma and Chauhan13 identified 261 lipid genes from the genomes of Arabidopsis, Brassica, soybean and castor. Complete or near-complete genome sequences have been reported for several algae.6 Yet inadequate knowledge of the structure-function relationships of lipid biogenesis genes in an evolutionary context is a major impediment to engineering algal metabolic pathways for over-production of fuel precursors.14 Experimental techniques such as insertional mutagenesis and targeted gene disruption have been employed to analyze gene function in a few algae.
However, many of these approaches are tedious, time-consuming, costly and limited by a number of biological constraints.15 As an alternative, phylogenomics is increasingly used to gain insights into metabolic pathways at the molecular level through comparative genomics and co-evolutionary analyses of related genes.16 The present work was therefore designed to identify the genes involved in the lipid metabolic pathway in the genomes of five microalgae (Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae) using sequence similarity searches with Arabidopsis thaliana homologs. In addition, phylogenomic protocols were employed to study the structure-function relationships of the encoded proteins and to gain much-needed insights into their evolution. We hope that the present study contributes to the biochemical and molecular information needed for augmenting lipid synthesis in microalgae.

Materials and Methods

Gene retrieval and annotation

An initial set of lipid genes was obtained from the Arabidopsis thaliana lipid gene database (http://www.plantbiology.msu.edu/lipids/genesurvey/index.html) to construct a query protein set. The Arabidopsis lipid gene database is a convenient and reliable source of genes covering all the major biochemical events responsible for the biosynthesis and catabolism of plant lipids.17 Each protein in the query set was then used to identify homologs in microalgae by BLASTp18 searches, with the e-value inclusion threshold set to 0.001, against microalgal genome databases, mostly those provided by the Joint Genome Institute: Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/), Chlamydomonas reinhardtii (http://genome.jgi-psf.org/chlamy/chlamy.info.html), Volvox carteri (http://www.phytozome.net/volvox.php), Ostreococcus lucimarinus (http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html) and Ostreococcus tauri (http://genome.jgi-psf.org/Ostta4/Ostta4.home.html). Some initial sequence hits were then discarded on the basis of multiple alignments and/or conserved motif patterns. Genes and gene products were functionally described by assigning Clusters of Orthologous Groups (COGs) with the KOGnitor program,19 a widely used computational genomics tool for detecting candidate sets of orthologs in prokaryotes and eukaryotes.19 In addition, Gene Ontology (GO) terms describing biological processes and molecular functions were assigned with the GO browser and annotation tool AmiGO.20 The Gene Ontology is currently the pre-eminent approach for functional annotation of homologous genes and protein sequences across multiple organisms.20
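The homolog search described above reduces to keeping every subject sequence that a query hits at an e-value of 0.001 or lower. The following is a minimal Python sketch of that filtering step, assuming the hits were exported in the standard 12-column BLAST+ tabular format (`-outfmt 6`); the query and subject IDs in the example are hypothetical, and the later screening by alignments and motifs is not modelled.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6)
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, evalue_cutoff=1e-3):
    """Return {query: [subject, ...]} for hits at or below the e-value cutoff.

    Subjects keep their file order and are deduplicated, because one subject
    can appear on several lines (one per HSP).
    """
    hits = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        if float(rec["evalue"]) <= evalue_cutoff:
            subjects = hits.setdefault(rec["qseqid"], [])
            if rec["sseqid"] not in subjects:
                subjects.append(rec["sseqid"])
    return hits


# Hypothetical hits of one Arabidopsis query against an algal proteome
sample = (
    "AtKAS1\tCrHit1\t55.0\t300\t120\t4\t1\t300\t5\t304\t1e-40\t420\n"
    "AtKAS1\tCrHit2\t24.0\t90\t60\t3\t20\t110\t15\t100\t0.4\t30.1\n"
)
print(filter_hits(sample))  # {'AtKAS1': ['CrHit1']}
```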

Metabolic pathway study

Metabolic pathways were subsequently analyzed using the KEGG pathway database,21 a biochemical pathway resource extensively employed to analyze lipid pathways in diverse organisms.22 To enrich the pathway annotation, sequences were submitted to the KEGG Automatic Annotation Server (KAAS) to identify orthologous gene groups.23 KAAS annotates each submitted sequence with a KEGG Orthology (KO) identifier, which allows orthologous and paralogous relationships among the genes of interest to be identified. Further, six reference pathway maps (fatty acid biosynthesis, fatty acid metabolism, fatty acid elongation, glycerolipid metabolism, glycerophospholipid metabolism and biosynthesis of unsaturated fatty acids) were downloaded from the KEGG database. Together, these maps give a biochemical description of the lipid metabolic pathways observed in different organisms. They were used as templates for a comprehensive examination of the microalgal lipid biosynthetic gene repertoire, by correlating the genes in each genome with their gene products (enzymes) according to their Enzyme Commission (EC) numbers.
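Correlating genes with pathway templates by EC number can be sketched as a simple set-matching step. The Python below is a minimal illustration under stated assumptions: the two templates are hypothetical, partial lists built from the EC numbers grouped in Table 1 (not the complete KEGG reference maps), and a `-` in a template EC number is treated as a wildcard at that level. Steps with no matching gene are reported as candidate pathway gaps.

```python
# Hypothetical, partial pathway templates: EC numbers taken from the section
# groupings of Table 1, not the complete KEGG reference maps.
PATHWAY_TEMPLATES = {
    "fatty acid biosynthesis": [
        "6.4.1.2", "2.3.1.39", "2.3.1.41", "2.3.1.179",
        "2.3.1.180", "1.1.1.100", "4.2.1.-", "1.3.1.9", "3.1.2.14",
    ],
    "TAG biosynthesis and catabolism": [
        "2.7.1.30", "1.1.5.3", "2.3.1.15", "2.3.1.51",
        "3.1.3.4", "2.3.1.20", "3.1.1.3",
    ],
}


def ec_matches(ec, template_ec):
    """True if a four-level EC number matches a template; '-' in the template matches anything."""
    got, want = ec.split("."), template_ec.split(".")
    return len(got) == len(want) == 4 and all(w in ("-", g) for g, w in zip(got, want))


def map_to_pathways(gene_ecs, templates=PATHWAY_TEMPLATES):
    """Group gene IDs by template pathway; one gene may fall into several pathways."""
    result = {name: [] for name in templates}
    for gene, ec in gene_ecs.items():
        for name, steps in templates.items():
            if any(ec_matches(ec, step) for step in steps):
                result[name].append(gene)
    return result


def missing_steps(gene_ecs, steps):
    """Template EC steps with no assigned gene, i.e. candidate pathway gaps."""
    return [s for s in steps if not any(ec_matches(ec, s) for ec in gene_ecs.values())]


# A subset of the C. merolae fatty acid biosynthesis genes in Table 1, with their EC numbers
cmerolae = {
    "CMM188C": "6.4.1.2", "CMT420C": "2.3.1.39", "CMM286C": "2.3.1.41",
    "CML329C": "2.3.1.179", "CMD118C": "2.3.1.180", "CMS393C": "1.1.1.100",
    "CMI240C": "4.2.1.-", "CMT381C": "1.3.1.9",
}
print(missing_steps(cmerolae, PATHWAY_TEMPLATES["fatty acid biosynthesis"]))  # ['3.1.2.14']
```

With this subset, the only unmatched step is the acyl-ACP thioesterase (FAT, EC 3.1.2.14), which is consistent with the absence of a C. merolae FAT homolog in Table 1.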

Prediction of subcellular localization

Three protein targeting prediction programs were used to determine the putative subcellular localization of the candidate proteins: TargetP,24 ChloroP25 and WolfPsort.26 Each program relies on a different algorithm and reports its predictions in its own terms. TargetP assigns a location based on the presence of any of three N-terminal presequences: a chloroplast transit peptide (cTP), a mitochondrial targeting peptide (mTP) or a secretory pathway signal peptide (SP). The ChloroP server predicts the presence of cTPs in protein sequences and the location of potential cTP cleavage sites. WolfPsort, an extension of the PSORT II program for protein subcellular localization prediction, classifies proteins into more than 10 localization sites, including dual localizations such as proteins that shuttle between the cytosol and nucleus. The sensitivity and specificity of this program have been reported to be about 70%.

Physico-chemical characterization and secondary structure prediction

Physico-chemical properties, namely length, molecular weight, isoelectric point (pI), total numbers of positively and negatively charged residues, instability index,27 aliphatic index28 and grand average of hydropathy (GRAVY),29 were computed using the ExPASy ProtParam server.30 The GOR IV server31 was employed to predict the percentages of secondary structural features (alpha helices, extended strands and random coils) in the protein sequences.

Calculation of the GC content

The GC content of the predicted genes was determined using the Genscan web server.32
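GC content is the percentage of guanine and cytosine among the bases of a sequence. A minimal Python sketch follows, assuming that ambiguous symbols such as N are excluded from the denominator (Genscan's own handling of ambiguous bases may differ slightly). The text does not state whether the per-species averages were computed per gene or on pooled sequence, so the unweighted per-gene mean here is only one option.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def mean_gc(sequences):
    """Unweighted mean of per-gene GC percentages."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values)


print(round(gc_content("ATGCGCNN"), 2))  # 66.67
```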

Motif identification

Protein sequence motifs for each gene family were identified using the MEME program.33 The analysis parameters were set as follows: site distribution, zero or one occurrence per sequence; maximum number of motifs, 1; minimum and maximum motif width, 6 and 50 residues, respectively. The motif profile for each gene family is presented schematically. Domain arrangements along the sequences were predicted using InterProScan34 to determine homology relationships among proteins from different species.
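For reproducibility, these MEME settings can be expressed as a command line. The Python sketch below builds one, assuming a local MEME Suite installation and its standard options (`-protein`, `-mod zoops`, `-nmotifs`, `-minw`, `-maxw`, `-oc`). The FASTA file and output directory names are hypothetical, and the study may have used the MEME web server instead.

```python
import shlex


def meme_command(fasta_path, out_dir):
    """Build a MEME argument list that mirrors the parameters stated above."""
    return [
        "meme", fasta_path,
        "-protein",          # protein alphabet
        "-mod", "zoops",     # zero or one motif occurrence per sequence
        "-nmotifs", "1",     # report at most one motif
        "-minw", "6",        # minimum motif width
        "-maxw", "50",       # maximum motif width
        "-oc", out_dir,      # output directory (overwritten if it exists)
    ]


cmd = meme_command("kas_family.fasta", "meme_kas")  # hypothetical file names
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would run it when MEME is installed
```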

Exon-intron structure and phylogenetic analyses

The exon-intron structural patterns of the lipid biosynthetic genes were analyzed using the Genscan gene prediction algorithm.32 To construct the phylogenetic trees, amino acid sequences were aligned using the ClustalX program implemented in BioEdit35 (v 7.1.3) with default settings, and the alignments were then manually refined by trimming poorly conserved N and C termini. ClustalX36 has been shown to be a user-friendly tool that produces good, biologically accurate alignments within a reasonable time. It provides many options, such as realignment of selected sequences or blocks of conserved residues and the ability to build up difficult alignments, which make it well suited to interactive work on alignments.36 The alignments of genes predicted to belong to the same family were then used as input files for the MEGA 4 software.37 Phylogenetic trees were built by the neighbor-joining (NJ) method with 1000 bootstrap replicates, followed by identification of subtrees.
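In the study, the trees were computed with MEGA 4. To show what the neighbor-joining step does, here is a pure-Python sketch of the Saitou-Nei algorithm together with one bootstrap resampling of alignment columns. The distance model (uncorrected p-distance, with gapped columns skipped pairwise) and the toy alignment are assumptions for illustration, not the settings reported for MEGA. Counting clade frequencies across 1000 replicates, which gives the bootstrap support values, is not shown.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing aligned sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string.

    names: taxon labels; dist: dict mapping frozenset({x, y}) to a distance.
    """
    d = dict(dist)
    nodes = list(names)

    def D(x, y):
        return d[frozenset((x, y))]

    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing the Q-criterion (n - 2) * d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = D(i, j) - li                                     # branch length to j
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{D(a, b):.3f});"


def nj_from_alignment(alignment):
    """Build an NJ tree from {name: aligned sequence} using p-distances."""
    names = list(alignment)
    dist = {frozenset((x, y)): p_distance(alignment[x], alignment[y])
            for x, y in combinations(names, 2)}
    return neighbor_joining(names, dist)


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


aln = {"At": "MKVLAT-G", "Cr": "MKVLSTQG", "Vc": "MKILSTQG", "Cm": "MRILAEQG"}  # toy data
print(nj_from_alignment(aln))
print(nj_from_alignment(bootstrap_alignment(aln, random.Random(0))))
```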

Results and Discussion

Comparative genomic analyses of lipid genes in microalgal species

Interest in microalgae as a potential feedstock for biofuels and other valuable biomaterials is rooted in their ability to rapidly accumulate significant amounts of neutral lipids.38 Under optimal conditions, microalgae synthesize fatty acids primarily for esterification into glycerol-based polar membrane lipids such as glycosylglycerides and phosphoglycerides, whereas under stress conditions many microalgae accumulate storage lipids called triacylglycerols (TAGs).16 Although the global fatty acid biosynthetic mechanisms are known in higher plants,39 the pathways responsible for lipid accumulation in microalgae are not well studied. To bridge this knowledge gap in algal lipid metabolism, comparative metabolic pathway analyses were performed across five microalgal genomes, using homologous plant genes as a reference, with the objective of functionally characterizing the predicted genes. EC numbers, Clusters of Orthologous Groups (COGs), protein domain families and GO terms were determined for each candidate gene. A recent review considered this in silico approach reliable enough for accurate function prediction of uncharacterized proteins encoded in a genome.40 In the present study, using the Arabidopsis annotation data as the BLAST query set, a total of 398 orthologous genes were identified in the A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae genomes. Sharma and Chauhan13 have used the same approach successfully in plants to identify candidate genes involved in the biosynthesis and accumulation of storage oil. These 398 genes cluster into 40 gene families and include 142, 56, 59, 47, 41 and 53 genes from the A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae genomes, respectively (Table 1). The identified genes are involved in the synthesis of phospholipids, glycerolipids and storage lipids such as TAG.
We further divided the predicted genes into categories, such as genes encoding enzymes involved in the biosynthesis and catabolism of fatty acids, TAG and membrane lipids. The comprehensive list of candidate genes, along with experimental evidence that the respective enzyme activities influence lipid accumulation, is presented in Table 1.41–74 Approximately 47% of the predicted gene products found in the present study were previously annotated as ‘predicted’, ‘probable’, ‘putative uncharacterized’, ‘similar’ or ‘hypothetical’ proteins (Table 1). The annotation of these sequences was improved, and a role in lipid biosynthesis was assigned to each of them, through similarity searches with homologous plant genes, Gene Ontology annotation and identification of conserved domains or motifs. Furthermore, compared with the previous report on lipid gene identification in the C. merolae genome by Sato and Moriyama,75 the present study identified 20 additional genes involved in lipid biosynthesis.

Table I

Candidate genes involved in lipid biosynthetic pathway of Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae genome.

Gene/symbol	EC no.	KOG no.	KEGG ID	Gene ontology	Corresponding homologous enzymes in algal species (SwissProt accession ID)					JGI protein ID	Ref**

					A. thaliana	C. reinhardtii	V. carteri	O. lucimarinus	O. tauri	C. merolae
Fatty acid biosynthesis
Homomeric acetyl-CoA carboxylase (ACC)	6.4.1.2	KOG0368	K11262	GO:0004075	Q9C8G0, Q38970		D8UA31*	A4RRC3, A4S479¶	Q01GA9, Q00ZG8¶	CMM188C	41–46
Heteromeric ACC biotin carboxylase subunit (BCC)	6.4.1.2/6.3.4.14	KOG0238	K01961	GO:0004075, GO:0003989	O04983, F4JYE1, F4JYE0	A8JGF4, A8JEW0	D8UF54	A4S140¶,†	Q013U7¶	CMS299C
ACC carboxyl-transferase α-subunit (ACCCT α)	6.4.1.2	KOG0238	K01962	GO:0003989	Q9LD43	A8J646	D8TNY0			CMV056C¶
ACC CT β subunit (ACCCT β)	6.4.1.2	KOG0540	K01963	GO:0003989	P56765	A8JHU1	D8U455¶,*			CMV207C¶
ACC biotin carboxyl carrier protein (ACC-BCCP)	6.4.1.2	KOG0540	K02160	GO:0003989	Q42533, F4KE21, Q9LLC1	A8JDA7	D8U256*			CMV134C¶
Malonyl-CoA-ACP transacylase (MCT)	2.3.1.39	KOG2926	K00645	GO:0004314	Q8RU07, Q8L5U2*,¶, F4IMR0	A8HP61	D8TTQ7¶	A4S2U9†, A4SAC5†	Q011G6•, Q00S12	CMT420C
β-ketoacyl-ACP synthase I (KAS I)	2.3.1.41	KOG1394	K00647	GO:0004315	P52410, F4KHF4	A8JEF7	D8UDW0, D8TXC7	A4RSM2, A4S713†	Q01EI4	CMM286C	47,48
β-ketoacyl-ACP synthase II (KAS II)	2.3.1.179	KOG1394	K09458	GO:0033817	Q9C9P4, Q8L3X9	A8JCK1, A8IG50	D8TJC9	A4S7B9, A4RTJ7†	Q00V56, Q01DP0*	CML329C
β-ketoacyl-ACP synthase III (KAS III)	2.3.1.180	KOG1394	K00648	GO:0033818	P49243, B9DHF9¶	A8JHL7†	D8TXF1*	A4S7P4†	Q00V15	CMD118C
β-ketoacyl-ACP reductase (KAR)	1.1.1.100	KOG1200	K00059	GO:0004316	P33207, Q9SQR4, Q9SQR2	A8JBX4, Q84X75	D8TK78, D8TV99	A4RQY6†	Q01GL3	CMS393C¶
3-hydroxyacyl-ACP dehydrase (HAD)	4.2.1.-	–	K02372	GO:0008659	Q9LX13, Q9SIE3, Q8LBU6*,¶	A8IX17†	D8TV61*	A4RUS8†		CMI240C¶
Enoyl-ACP reductase (EAR)	1.3.1.9	KOG0725	K00208	GO:0004318	Q9SLA8, Q9M672, O04942, Q9FEF2	A8JFI7†	D8UC03*	A4S0L7†	Q014N2	CMT381C
Acyl-ACP thioesterase/Fatty acid thioesterase (FAT)	3.1.2.14	–	K10782	GO:0000036	Q42561, Q9SV64, Q9SJE2, Q42562, Q42558, Q41917	A8HY17*	D8TJT0*,¶	A4RS92†	Q01FC4•		49–53
Fatty acid elongation
3-hydroxyacyl-CoA dehydrogenase (CHAD)	1.1.1.35	KOG2304	K00074	GO:0008691, GO:0003857	Q9LDF5¶	A8IVP3†,¶	D8UMK6*,¶	A4RUY4†	Q01C53*,¶	CMC137C¶
Enoyl-CoA hydratase (ECH)	4.2.1.17	KOG1680, KOG1679	K01692	GO:0004300	Q6NL24, O23468¶, Q0WRQ2¶, Q9T0K7	A8I9B0†,¶	D8TRG5*	A4SBD9†	Q010Z7	CMK139C, CMT074C¶
Enoyl-CoA reductase (TER)	1.3.1.38	KOG1639	K10258	GO:0019166	Q8LCU7•, F4J6R6*, Q9M2U2	A8HM32†, A8JAQ9†	D8THB1, D8U5N0,¶	A4RUU7†, A4RU17†	Q01D21	CMD146CΔ
Fatty acid catabolism
Long chain acyl-CoA synthase (LACS)	6.2.1.3	KOG1256	K01897	GO:0004467	Q9T0A0, Q9T009, Q8LPS1, Q8LKS5, Q9SJD4, Q9CAP8, Q9C7W4, Q9XIA9, O22898	A8JH58†, A8HRV2†,¶	D8TMY5, D8TKU, D8TP15, D8TNJ2, D8TS64*	A4RWX1†, A4S5G5†	Q00Y52*, Q00UP7	CML197C•, CME186C	54
Acyl-CoA oxidase (AOX)	1.3.3.6	KOG0135	K00232	GO:0003997	O65201, F4KG18, O65202, F4JMK8, Q96329, Q9ZQP2, Q9LMI7, P0CZ23	A8ISE5†, A8JGC8†, A8JB97†	D8U3F9, D8TVM2, D8U064, D8U3J5	A4RR33†	Q01GH2	CMK115C	55
Acyl-CoA dehydrogenase (ACADM)	1.3.99.3	KOG0139	K00249	GO:0003995	Q8RWZ3, Q0WM98¶, Q67ZU5¶, Q9M7Y7¶	A8J3M3¶	D8U2A4*	A4RQF1†	Q01H50*,¶	CML080C
Enoyl-CoA hydratase (ECH)	4.2.1.17	KOG1680, KOG1679	K01692	GO:0004300	Q6NL24, O23468¶, Q0WRQ2¶, Q9T0K7	A8I9B0†,¶	D8TRG5*	A4SBD9†, A4S307†	Q010Z7	CMK139C, CMT074C¶
3-hydroxyacyl-CoA dehydrogenase (CHAD)	1.1.1.35	KOG2304	K00074	GO:0008691, GO:0003857	Q9LDF5¶, Q9ZPI5, Q9ZPI6	A8IVP3†,¶	D8UMK6*,¶	A4RUY4†	Q01C53*,¶	CMC137C¶
Acetyl-CoA acetyl-transferase (THIL)	2.3.1.9	KOG1390	K00626	GO:0003985	Q8S4Y1, Q9FIK7, F4JYM8*, B9DGQ1, Q3E8F0	A8J0X4†	D8UKX0, D8TZN7			CMA042C, CME087C•
Fatty acid desaturation
Δ⁹ acyl-ACP desaturase (Δ⁹D)	1.14.19.1	KOG1600	K00507	GO:0004768	Q9SID2, O65797, Q9FPD5, Q9LM13, Q9LM14, Q9LND8, Q9LND9, Q949X0¶, Q9LVZ3	A8J015, A8JEN2, C6ZE81¶, A8IQB8•	D8U961, D8TRE9*,¶	A4S9D8†	Q00T63	CMJ201C, CMM045C	56–59
Δ¹² acyl-ACP desaturase (Δ¹²D)	1.14.19.-	KOG:TWOG0155	K10256, K10255	GO:0045485	P46313, P46312, Q8LFZ8¶, Q19MZ0	A8IR24, O48663	D8UB74, D8TTW0	A4RWB5†	Q01DF5	CMK291C¶
Triacylglycerol (TAG) biosynthesis and catabolism
Glycerol kinase (GK)	2.7.1.30	KOG2517	K00864	GO:0004370	F4HS76, Q9M8L4, A0JPS9¶, C0Z2P8	A8IT31†	D8TXT9*	A4RTW5†	Q01D72*,¶	CMJ173C
Glycerol-3-phosphate dehydrogenase (G3PDH)	1.1.5.3	KOG0042	K00111	GO:0004368	Q9SS48	A8HTE5†	D8TSE3*	A4RU40	Q01CZ8	CML209C	60
Glycerol-3-phosphate acyltransferase (GPAT)	2.3.1.15	KOG2898	K00631, K00630	GO:0004366	Q43307, Q9LHS7, Q8GWG0, Q9SYJ2, Q9LMM0, Q9FZ22, Q9SHJ5, Q0WPD4, O80437, Q9CAY3	A8J0R2, A8HVM5†	D8TVT7, D8TIB3	A4RT23•, A4S945†	Q01F77	CMK217CΔ,¶, CMJ027C, CMA017C•	61,62
1-acylglycerol-3-phosphate acyltransferase/Lysophosphatidic acid acyltransferase (AGPAT/LPAT)	2.3.1.51	KOG1505	K00655, K13519	GO:0003841	Q8GXU8, Q8LG50, Q9SYC8, Q8L4Y2•, Q9LHN4•	A8J0J0	D8U1V6, D8TWQ3	A4S0H0†	Q014T8*, Q00SS2†	CME109C•,¶, CMF185C, CMJ021C•,¶	63,64
Phosphatidate phosphatase (PP)	3.1.3.4	KOG3030	K01080	GO:0008195	Q9ZU49¶, Q3EC91¶, Q8LFD1, A8MR10*, F4IX65¶, Q9XI60¶, Q9LJQ8¶	A8JGB5†,¶	D8U3B0*	A4RU93†,¶	Q01CT9*,¶	CMR054CΔ,¶, CMR488CΔ,¶
Diacylglycerol acyltransferase (DGAT)	2.3.1.20	KOG0831, KOG0380	K00635, K11155	GO:0004144	Q9SLD2, Q9ASU1¶, Q93ZR6¶	A8IXB2¶	D8UGA9, D8UHL	A4S872†,¶	Q00UG1*,¶	CMJ162CΔ,¶, CMQ199C¶, CME100C	65–74
Triacylglycerol lipase (TAGL)	3.1.1.3	KOG4569	K01046	GO:0004806	Q9LZA6, Q9M1I6, F4JY30¶	D5LAZ6¶, D5LAW3¶, A8HYG2¶,†	D8TT81, D8U4S5,¶	A4RQN3†,¶, A4S9E4†, A4RZ46†	Q00T58†, Q016Q6†	CMS254C¶, CMT151CΔ,¶
Membrane lipid biosynthesis
Ethanolamine phospho-transferase (EPT1)	2.7.8.1	KOG2877	K00993	GO:0004307	O82567, F4HQU9	Q6U9W9	D8TWP7*	A4S097†,¶	Q01BV3¶	CMF133C¶
CDP-diacylglycerol synthase (CDS1)	2.7.7.41	KOG1440	K00981	GO:0004605	Q1PE48, F4JL60, O49639, F4JL62, O04928	A8ILG5, A8IRM0, A8IRL9	D8TPH2, D8TK01	A4RZR8, A4RWB0	Q01AN2, Q015S5	CMM311C¶, CMN215C, CMS056C
Phosphatidylglycerolphosphate synthase (PGP3)	2.7.8.5	KOG1617	K00995	GO:0008444	O80952, Q67ZP8*,¶, Q9M2W3	A8JEJ8	D8U650, D8UDS7,¶	A4S5X3†	Q00W48	CMN196C, CMJ134C¶
Ethanolamine kinase (EKT1)	2.7.1.82	KOG4720	K00894	GO:0004305	O81024, Q8LAQ2	A8J2J5	D8TJH5	A4S0V5†	Q014D1*,¶	CMR011C•,¶
CTP: phospho-ethanolamine cytidyl transferase (ECT)	2.7.7.14	KOG2803	K00967	GO:0004306	Q9ZVI9	Q84JV7	D8TWX6*	A4S2P2†	Q011M7	CMS052C
UDP-sulfoquinovose synthase (SQD)	3.13.1.1, 2.4.1.-	KOG1371, KOG1111	K06118, K06119	GO:0046507, GO:0046510	O48917, Q8S4F6	Q763T6, A8JB95, A8HMC2	D8U760, D8U5J8	A4S476, A4S792†	Q00ZH1, Q00V96	CMR012C, CMR015C
Monogalactosyl diacylglycerol synthase (MGDGS)	2.4.1.46		K03715	GO:0046509	Q9SI93, O81770	A8HUF1†	D8TQW6*	A4RT08		CMI271C
Digalactosyl diacylglycerol synthase (DGDGS)	2.4.1.241		K09480	GO:0035250	Q9S7D1, Q8W1S1	A8HU66	D8TQZ2*,¶	A4S4N5†,¶, A4S0F1¶,†	Q00Z06, Q014V9¶,†
Inositol phospho-transferase (PIS)	2.7.8.11	KOG3240	K00999	GO:0003881	Q8LBA6, Q8GUK6, F4JTR2*	A8ICX2	D8TPK4	A4SAF2†	Q00RY0	CMM125C

Notes:

Putative uncharacterized proteins;

predicted proteins;

probable proteins;

similar protein;

absent in KEGG pathway database;

relevant references on experimental evidences of the respective enzyme action influencing lipid accumulation.

To investigate the metabolic processes responsible for the synthesis of microalgal biofuel precursors, KO identifiers were assigned to the 398 predicted genes, representing 36 unique EC numbers, and were subsequently used to study the metabolic pathway maps available in the KEGG pathway database. KEGG is considered one of the most important bioinformatics resources for understanding the higher-order functions and utilities of an organism from its genome information. It hosts information on the majority of well-known metabolic pathways, including lipid pathways, for several organisms such as higher plants, bacteria and algae. Recently, Rismani-Yazdi et al14 used it successfully to identify the pathways and underlying genes responsible for the production of biofuel precursors in Dunaliella tertiolecta, a potential microalgal biofuel feedstock. Using this approach, a total of 79 lipid genes not previously indexed in the KEGG metabolic pathway database were recognized, including 22 from A. thaliana, 21 from C. merolae, 10 from C. reinhardtii, 10 from O. lucimarinus and 8 each from V. carteri and O. tauri (Table 1). The global TAG synthesis pathway begins with the basic fatty acid precursor, acetyl-CoA, and continues through fatty acid biosynthesis, complex lipid assembly and modification of saturated fatty acids until TAG is finally formed.76 A simplified overview of the TAG biosynthetic pathway in microalgae is shown in Figure 1. Comparative analyses of the C. reinhardtii, V. carteri, O. lucimarinus, O. tauri, C. merolae and A. thaliana genomes indicate that the majority of genes involved in lipid production are orthologous among these species. Additionally, the extensive amino acid sequence conservation (more than 60% pairwise sequence identity) among the genes involved in lipid biosynthesis suggests functional equivalence between the Arabidopsis and microalgal genes.
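Pairwise identity figures such as the 60% threshold above depend on how identity is defined. The Python sketch below uses one common definition, assumed here rather than taken from the text: identical residues divided by the aligned columns in which neither sequence has a gap. It then lists every pair in an alignment that exceeds a threshold. The sequence names and residues are toy data.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pairs_above(alignment, threshold=60.0):
    """(name1, name2, identity) for every pair whose identity exceeds the threshold."""
    out = []
    for p, q in combinations(sorted(alignment), 2):
        pid = percent_identity(alignment[p], alignment[q])
        if pid > threshold:
            out.append((p, q, round(pid, 1)))
    return out


toy = {"At": "MKV-LA", "Cr": "MKVQLA", "Ot": "MRIQIS"}  # hypothetical aligned fragments
print(pairs_above(toy))  # [('At', 'Cr', 100.0)]
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give lower values for gapped pairs, so the denominator should be reported with any identity cut-off.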
Thus, the present results indicate that the underlying fatty acid and TAG biosynthesis processes are directly analogous to those reported in higher plants.16 Notably, although algae predominantly share similar lipid biosynthetic pathways with higher plants, the present in silico analyses revealed that the gene families responsible for lipid biosynthesis are smaller in microalgae than in Arabidopsis. Certain pathway steps were also absent in some microalgae, including the termination of fatty acid biosynthesis by FAT homologs in C. merolae. These computational analyses are supported by previous experimental reports on algal lipid metabolism.75

Figure 1

Schematic overview of Triacylglyceride (TAG) biosynthetic pathway in microalgae.

Notes: Free fatty acids and TAG are synthesised in the chloroplast and endoplasmic reticulum, respectively. Key enzymes reported by various experimental studies to be involved in accelerated lipid accumulation are marked with an asterisk.

Abbreviations: ACC, Acetyl-CoA carboxylase; MAT, Malonyl-CoA-ACP transacylase; KAS, 3-ketoacyl-ACP synthase; KAR, 3-ketoacyl-ACP reductase; HAD, 3-hydroxyacyl-ACP dehydratases; EAR, Enoyl-ACP reductase; FAT, Fatty acid thioesterase; G3PDH, Glycerol-3-phosphate dehydrogenase; GPAT, Glycerol-3-phosphate acyltransferase; AGPAT, 1-acylglycerol-3-phosphate acyltransferase also known as LPAT, lysophosphatidic acid acyl transferase; PP, Phosphatidate phosphatase; DGAT, Diacylglycerol acyltransferase.

Furthermore, our results indicate that enzymes responsible for higher lipid accumulation in plants and other eukaryotes, through either over-expression or gene knockout strategies, are present not only in an oleaginous algal species (C. reinhardtii) but also in other algal species, notably O. tauri and C. merolae (Fig. 2). Comparison of the number of genes at each step of the lipid metabolic pathway suggests that the green algae C. reinhardtii and V. carteri have an expanded array of genes involved in TAG biosynthesis and catabolism, including fatty acid thioesterase, long chain acyl-CoA synthase, acyl-CoA oxidase, desaturase, glycerol-3-phosphate acyltransferase and diacylglycerol acyltransferase. Additionally, these gene copy numbers appear to be correlated with the genome complexity of the organisms under study (Fig. 2).

Figure 2

Number of gene homologues in the TAG biosynthetic pathway in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.

Notes: For each reaction, coloured squares denote the number of homologous genes in A. thaliana (blue), C. reinhardtii (yellow), V. carteri (pink), O. lucimarinus (green), O. tauri (purple) and C. merolae (light blue).

Prediction of subcellular location

The prediction of subcellular localization is essential for elucidating the spatial organization of proteins according to their function and for refining our knowledge of cellular metabolism.77 Subcellular location therefore provides valuable information about protein function as well as the interconnectivity of biological processes.78 In the present study, TargetP, ChloroP and WolfPsort, which rely on different algorithms, predicted different locations for the lipid biosynthetic proteins. More than one tool was used to improve the specificity of the prediction, as several studies have shown that combining the results of several prediction programs helps to rule out false positives and false negatives.78 The available localization prediction tools have different strengths, and no single tool is clearly optimal overall.77 Moreover, some localizations are known to be poorly predicted by all the algorithms, especially for proteins dual-targeted to plastids and mitochondria, a phenomenon that may be more common than previously thought.79 The analysis showed that the majority of the predicted proteins are located in four compartments: plastids (31%), mitochondria (26%), cytoplasm (28%) and nucleus (6%) (Fig. S1 and Table S1). These results are consistent with experimental observations that de novo synthesis of fatty acids occurs primarily in the plastids and/or mitochondria.5 About 19% of the proteins contained both a mitochondrial targeting peptide and a chloroplast transit peptide.
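One simple way to combine several predictors is majority voting after mapping each tool's labels onto a shared vocabulary. The Python sketch below takes that approach. It is an assumption-laden illustration, not the combination rule used in the study. The label mapping (TargetP `C`/`M`/`S`, ChloroP `Y`, a few WoLF PSORT classes) is assumed, unmapped labels count as no vote, and a protein is flagged as possibly dual-targeted when one tool votes plastid and another votes mitochondrion.

```python
from collections import Counter

# Assumed mapping of each tool's labels onto a shared vocabulary. Only a few
# WoLF PSORT classes are listed; unmapped labels become None (no vote).
NORMALIZE = {
    "TargetP": {"C": "plastid", "M": "mitochondrion", "S": "secretory"},
    "ChloroP": {"Y": "plastid"},
    "WoLF PSORT": {"chlo": "plastid", "mito": "mitochondrion",
                   "cyto": "cytoplasm", "nucl": "nucleus"},
}


def normalized_votes(calls):
    """Translate {tool: raw_label} into shared location names, dropping non-votes."""
    votes = {tool: NORMALIZE.get(tool, {}).get(label) for tool, label in calls.items()}
    return {tool: loc for tool, loc in votes.items() if loc is not None}


def consensus_location(calls, min_votes=2):
    """Location supported by at least `min_votes` tools, or None if there is no agreement."""
    counts = Counter(normalized_votes(calls).values())
    if not counts:
        return None
    location, n = counts.most_common(1)[0]
    return location if n >= min_votes else None


def plastid_and_mito(calls):
    """Flag proteins for which one tool says plastid and another says mitochondrion."""
    return {"plastid", "mitochondrion"} <= set(normalized_votes(calls).values())


example = {"TargetP": "M", "ChloroP": "Y", "WoLF PSORT": "chlo"}  # hypothetical calls
print(consensus_location(example), plastid_and_mito(example))  # plastid True
```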
Recent reports have shown an unexpectedly high frequency of dual targeting of proteins to both mitochondria and chloroplasts, which makes it difficult to predict the correct location of these proteins within a cell.80,81 Furthermore, approximately 3% of the predicted proteins were located in more than one compartment (ie, nucleus and cytoplasm), the same pair of compartments most frequently shared in the Arabidopsis82 and sugarcane83 proteomes, suggesting substantial interaction between these two compartments. Hyunjong et al84 reported that targeting an enzyme to several compartments simultaneously in the same plant increased its production compared with targeting it to any single compartment. The predicted localization information could therefore aid in targeting lipid biosynthetic enzymes to enhance oil accumulation in microalgae.
Various physico-chemical parameters were computed using the ExPASy ProtParam tool (Fig. 3 and Table S2). The molecular weights of the lipid biosynthetic proteins in microalgae ranged from 1116.818 to 299171.0 Da. The majority of the predicted proteins have a pI greater than 7, indicating that proteins involved in lipid biosynthesis are generally basic. However, the deduced sequences of genes such as acetyl-CoA carboxylase, acetyl-CoA acetyltransferase, glycerol kinase, ethanolamine kinase and phosphoethanolamine cytidylyltransferase were predicted to be acidic. These pI values will be useful for developing buffer systems for purifying the enzymes by isoelectric focusing. The instability index is based on the observation that certain dipeptides occur at significantly different frequencies in stable and unstable proteins. Proteins with an instability index below 40 are predicted to be stable, while those with a value above 40 are predicted to be unstable.
In the present study, the high frequency of unstable proteins may be explained by the recent work of Cao,85 who observed the same phenomenon in many plants and microorganisms and attributed it to a possible inherent feedback mechanism that regulates the optimal level of accumulation of cellular metabolites. The aliphatic index is the relative volume of a protein occupied by aliphatic side chains (alanine, valine, isoleucine and leucine) and is regarded as a positive factor for the increased thermal stability of globular proteins. The aliphatic index of the screened proteins ranged from 70.24 to 119.16. These high aliphatic indices suggest that the proteins are stable over a wide temperature range. The GRAVY index reflects the hydropathy, and hence the likely solubility, of a protein: lipid biosynthetic proteins with large negative GRAVY values are relatively more hydrophilic than those with less negative values.
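Both indices discussed above have simple closed forms. GRAVY is the mean Kyte-Doolittle hydropathy of the residues, and Ikai's aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The following is a minimal Python sketch of both. The peptide is hypothetical, and only the 20 standard residues are handled.

```python
# Kyte-Doolittle hydropathy values, the scale behind the ProtParam GRAVY score
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean Kyte-Doolittle value; negative means hydrophilic.

    Non-standard residues raise KeyError.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


peptide = "MKVLAAIEDG"  # hypothetical sequence
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 1))
```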

Figure 3

Distribution of various physico-chemical characteristics of putative proteins encoded by lipid genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.

Note: The individual physico-chemical values for each protein, as calculated by the ProtParam server, are provided in Supplementary Table 2.

The secondary structure of the microalgal proteins involved in lipid metabolism was analyzed by submitting their amino acid sequences to the GOR IV program, which has been cross-validated to have a mean accuracy of 64.4% for three-state prediction.32 The secondary structure indicates whether a given amino acid lies in a helix, a strand or a coil. Secondary structure features of the proteins are presented in Table S3. In the majority of sequences, random coil was predominant, followed by alpha helices and extended strands.
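As a small illustration of how per-protein summaries like those in Table S3 and a three-state accuracy can be computed, the sketch below assumes a prediction written as one character per residue (H for helix, E for extended strand, C for coil). It does not assume the exact output format of any particular GOR IV server, and the helper names are hypothetical. `composition` ranks the three states by percentage, and `q3` gives the percentage of residues whose predicted state matches a reference assignment, the kind of measure behind accuracy figures such as 64.4%.

```python
from collections import Counter

# One-letter states assumed for a three-state prediction string.
STATE_NAMES = {"H": "alpha helix", "E": "extended strand", "C": "random coil"}


def composition(prediction: str) -> list[tuple[str, float]]:
    """Percentage of residues in each state, ranked from most to least frequent."""
    counts = Counter(prediction.upper())
    total = sum(counts[state] for state in STATE_NAMES)  # ignore any other characters
    shares = [(name, 100.0 * counts[state] / total) for state, name in STATE_NAMES.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: percentage of residues whose predicted state matches."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    matches = sum(p == o for p, o in zip(predicted.upper(), observed.upper()))
    return 100.0 * matches / len(observed)
```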

GC-content analyses

Variation in guanine (G) and cytosine (C) content between species is one of the central issues in evolutionary bioinformatics. The average GC-content of the lipid biosynthetic genes, as calculated by the Genscan server, was 39.89%, 63.35%, 56.92%, 59.88%, 59.04% and 55.57% for A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae, respectively. These values lie close to the whole-genome GC-content of the respective organisms,86–89 although the gene sequences of all the studied species showed a slightly higher GC-content than the genomic background. Among the microalgae, C. reinhardtii had the highest GC-content, consistent with experimental reports that the GC-content of C. reinhardtii is higher than that of multicellular organisms.90 Comparison of the GC-content of individual genes revealed only minor variation among the microalgal genomes (Fig. 4 and Table S4), in agreement with an earlier report that eukaryotic genomes vary relatively little in GC content.91 Furthermore, genes with high GC-content were more often predicted by ProtParam to encode stable proteins than genes with low GC-content. This may be because a G-C pair is held by three hydrogen bonds (H-bonds) compared with two in an A-T pair, which could contribute to the greater stability of the gene products. In addition, analyses of individual predicted genes in O. lucimarinus and O. tauri revealed broadly similar GC-content in both subspecies.
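The GC-content figures above reduce to simple base counting. The following is a minimal sketch, assuming plain nucleotide strings. Ambiguous bases such as N are excluded from the denominator here, which may not match how Genscan handles them, and averaging per-gene values (rather than pooling all bases) is an assumption about how the organism averages were formed. The hydrogen-bond helper simply counts three bonds per G-C pair and two per A-T pair of the double-stranded gene. All function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


def mean_gc_content(genes: list[str]) -> float:
    """Average of per-gene GC percentages (one possible way to form an organism mean)."""
    return sum(gc_content(g) for g in genes) / len(genes)


def duplex_hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds in the double-stranded gene: 3 per G-C pair, 2 per A-T pair."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at
```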

Figure 4

Comparison of the GC-content of lipid biosynthetic genes among five unicellular algae and the vascular plant, A. thaliana.

Notes: Columns represent the average GC content of the genes (in percentage) of each organism, from bottom to top: A. thaliana (blue), C. reinhardtii (red), V. carteri (green), O. lucimarinus (purple), O. tauri (blue) and C. merolae (orange). The individual GC-content values of each gene, as calculated by the Genscan web server, are given in Supplementary Table 4.

Motif and domain architecture

A motif is a sequence pattern that is conserved across a group of related protein or gene sequences.34 An exhaustive search using the MEME program identified 36 core conserved motifs in the lipid biosynthetic genes of microalgae predicted in the present study (Fig. 5). In the sequence logos, the overall height of each stack indicates the sequence conservation at that position, whereas the height of each symbol within a stack reflects the relative frequency of the corresponding amino acid. The logos showed that the majority of the predicted motifs are composed mainly of hydrophobic and polar uncharged residues. These conserved residues are likely to be critical for catalytic activity and may be involved in substrate binding, direct catalysis and maintenance of protein structure. In addition to the motif analyses, Figure 5 compares the domain architectures of the gene products at the whole-genome level. The majority of domains observed in genes involved in lipid biosynthesis are present in all the microalgal species under study. The critical amino acid residues in the conserved motifs and domains of the lipid genes should therefore provide a framework for better understanding their structure-function relationships.
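The stack and letter heights described above can be computed from aligned motif sites. The sketch below assumes ungapped sites (as MEME motifs are) and the classic Schneider-Stephens formulation with a uniform background over 20 amino acids and no small-sample correction: the stack height is log2(20) minus the Shannon entropy of the column, and each letter's height is its frequency multiplied by that stack height. MEME's own logos may be computed differently (for example, against non-uniform background frequencies), so absolute heights will not necessarily match Figure 5; the function names are illustrative.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # amino-acid alphabet; a uniform background is assumed


def logo_column(column: str) -> tuple[float, dict[str, float]]:
    """Stack height (bits) and per-letter heights for one motif position."""
    n = len(column)
    freqs = {aa: count / n for aa, count in Counter(column).items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    stack_height = math.log2(ALPHABET_SIZE) - entropy  # conservation at this position
    letter_heights = {aa: f * stack_height for aa, f in freqs.items()}
    return stack_height, letter_heights


def logo_columns(sites: list[str]) -> list[tuple[float, dict[str, float]]]:
    """Apply logo_column to every position of equal-length, ungapped motif sites."""
    return [logo_column("".join(site[i] for site in sites)) for i in range(len(sites[0]))]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column using all 20 residues equally has zero height.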

Figure 5

Conserved domain architectures and sequence logo plots of lipid biosynthetic genes using the InterProScan and MEME programs, respectively.

Notes: The overall height of each stack indicates the sequence conservation at that position, whereas the height of symbols within each stack reflects the relative frequency of the corresponding amino acid. The amino acids are colour coded as follows: A, C, F, I, L, V, W and M (blue, most hydrophobic); N, Q, S and T (green, polar, non-charged and non-aliphatic residues); D and E (magenta, acidic); K and R (red, positively charged).

To gain insight into the evolution of the lipid biosynthetic genes, we analyzed the exon-intron structures of the predicted gene homologs (Table S5). The exon-intron split patterns of C. reinhardtii and V. carteri genes were similar to those of Arabidopsis, although insertions, deletions and intron-size variations were common. Likewise, exon-intron number and size were conserved between O. lucimarinus and O. tauri. The C. merolae genome is remarkable for its paucity of introns,88 and we detected no introns in any of its predicted genes. O. lucimarinus and O. tauri genes contained fewer introns than those of C. reinhardtii, V. carteri and A. thaliana, and our results confirm the previous report that C. reinhardtii lipid biosynthetic genes contain a higher number of introns.92 A phylogenetic tree was constructed to evaluate the evolutionary relationships among the predicted genes (Fig. 6). As expected from their common ancestry, most genes with similar functions, similar exon-intron structures and conserved motif patterns clustered together in the tree. In most gene families, the protein sequences of the two sub-species O. lucimarinus and O. tauri (Prasinophytes) formed sister clades that fell within the green algal cluster comprising C. reinhardtii, V. carteri (Chlorophytes) and A. thaliana (Streptophytes). The Chlorophyte and Streptophyte lineages are both part of the green plant lineage (Viridiplantae).93 Further, the phylogenetic analyses suggest that the protein homologs of C. merolae (Rhodophytes) diverge from the root of the green lineage. Overall, the components of the lipid biosynthetic pathway are remarkably well conserved, particularly within the Viridiplantae lineage.
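Intron positions and sizes follow directly from exon coordinates such as those listed in Table S5. The following is a minimal sketch, assuming 1-based, inclusive, non-overlapping exon coordinates on a single strand; the function names are hypothetical. An intronless gene, like those predicted for C. merolae, simply yields an empty list.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Intron intervals lying between consecutive exons (1-based, inclusive)."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def intron_lengths(exons: list[tuple[int, int]]) -> list[int]:
    """Length of each intron, for comparing intron-size variation between homologs."""
    return [end - start + 1 for start, end in introns_from_exons(exons)]
```

For example, exons at (1, 100), (201, 300) and (351, 400) give two introns of 100 and 50 bases.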

Figure 6

(A) Phylogenetic tree inferred from the amino acid sequences of lipid genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae. Proteins with identical functional characterization are represented by diamond shapes of the same colour. Protein accession numbers are shown; the organisms to which the proteins belong are listed in Table 1. Some homologous proteins were omitted to improve the clarity of the remaining groups. Proteins with similar functions cluster together, and in most gene families, for instance the desaturases (B), the protein sequences of the two sub-species O. lucimarinus and O. tauri form sister clades that fall within the green algal cluster comprising C. reinhardtii, V. carteri and A. thaliana, while the protein homologs of C. merolae seem to diverge from the root of the green lineage.
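This section does not state which tree-building method was used, so the following is only a compact illustration of the general idea, not the authors' pipeline. It computes pairwise p-distances from pre-aligned protein sequences (skipping gap columns pairwise) and clusters them with UPGMA into a Newick topology without branch lengths. UPGMA assumes a constant rate of evolution, which is why published trees usually rely on neighbor-joining or likelihood methods instead. The function names are hypothetical, and every pair of sequences is assumed to share at least one ungapped column.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned: dict[str, str]) -> str:
    """Cluster pre-aligned sequences with UPGMA and return a Newick topology."""
    # Each cluster label maps to (Newick subtree, number of sequences it holds).
    clusters = {name: (name, 1) for name in aligned}
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"{a}+{b}"
        # Size-weighted average distance from the new cluster to every remaining cluster.
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        del dist[pair]
        clusters[merged] = (f"({tree_a},{tree_b})", size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTTAAAAA"}
    print(upgma(toy))  # prints ((A,B),C);
```

In the toy input, A and B differ at one site in ten, so they are joined first and C attaches afterwards, mirroring how closely related homologs form sister clades in Figure 6.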

Conclusion

Identification of the genes responsible for oil accumulation is a prerequisite for using metabolic engineering to obtain enhanced yields of biofuel precursors from microalgae. A comprehensive computational analysis of the predicted microalgal genes against Arabidopsis was performed, covering gene annotation, subcellular localization, physico-chemical characterization, exon-intron patterns, motif/domain organization and phylogenomics. The results revealed that although each algal species maintains the basic genomic repertoire required for lipid biosynthesis, each also possesses additional lineage-specific gene groups. Additionally, the extensive sequence and structural conservation of the putative genes indicates functional equivalence between microalgae and Arabidopsis. Phylogenetic analyses showed that genes of the lipid biosynthetic pathway from Prasinophytes, Chlorophytes, Streptophytes and Rhodophytes clustered according to their conserved motif patterns, exon-intron structures and functional equivalence. This broad, in-depth investigation of each gene and its encoded product across the microalgal genomes should facilitate metabolic engineering of microalgae for biofuel production.

Classification of microalgal lipid biosynthetic proteins on the basis of subcellular localization using the TargetP, ChloroP and WoLF PSORT prediction tools.

Subcellular localisation prediction of proteins encoded by lipid biosynthetic genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae, using the TargetP, ChloroP and WoLF PSORT programs.

The calculated secondary structures of the proteins encoded by lipid biosynthetic genes, using the GOR IV program.

GC-content values of lipid biosynthetic genes as calculated by the Genscan web server.

Exon-intron coordinates of lipid biosynthetic genes in A. thaliana, C. reinhardtii, V. carteri, O. lucimarinus, O. tauri and C. merolae.

References (83 in total; 10 shown)

1. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites.

Authors: O Emanuelsson; H Nielsen; G von Heijne
Journal: Protein Sci Date: 1999-05 Impact factor: 6.725

2. Dual targeting of xylanase to chloroplasts and peroxisomes as a means to increase protein accumulation in plant cells.

Authors: Bae Hyunjong; Dae-Seok Lee; Inhwan Hwang
Journal: J Exp Bot Date: 2005-11-29 Impact factor: 6.992

3. Plants to power: bioenergy to fuel the future. [Review]

Authors: Joshua S Yuan; Kelly H Tiller; Hani Al-Ahmad; Nathan R Stewart; C Neal Stewart
Journal: Trends Plant Sci Date: 2008-07-16 Impact factor: 18.313

4. Metabolic engineering of fatty acid biosynthesis in plants. [Review]

Authors: Jay J Thelen; John B Ohlrogge
Journal: Metab Eng Date: 2002-01 Impact factor: 9.783

5. Lipid biosynthesis. [Review]

Authors: J Ohlrogge; J Browse
Journal: Plant Cell Date: 1995-07 Impact factor: 11.277

6. The TAG1 locus of Arabidopsis encodes for a diacylglycerol acyltransferase.

Authors:
Journal: Plant Physiol Biochem Date: 1999-11 Impact factor: 4.270

7. Thermostability and aliphatic index of globular proteins.

Authors: A Ikai
Journal: J Biochem Date: 1980-12 Impact factor: 3.387

8. Biodiesel from microalgae. [Review]

Authors: Yusuf Chisti
Journal: Biotechnol Adv Date: 2007-02-13 Impact factor: 14.227

9. Genome analysis and its significance in four unicellular algae, Cyanidioschyzon [corrected] merolae, Ostreococcus tauri, Chlamydomonas reinhardtii, and Thalassiosira pseudonana. [Review]

Authors: Osami Misumi; Yamato Yoshida; Keiji Nishida; Takayuki Fujiwara; Takayuki Sakajiri; Syunsuke Hirooka; Yoshiki Nishimura; Tsuneyoshi Kuroiwa
Journal: J Plant Res Date: 2007-12-12 Impact factor: 2.629

10. KAS IV: a 3-ketoacyl-ACP synthase from Cuphea sp. is a medium chain specific condensing enzyme.

Authors: K Dehesh; P Edwards; J Fillatti; M Slabaugh; J Byrne
Journal: Plant J Date: 1998-08 Impact factor: 6.417
