Literature DB >> 26417197

Expression Divergence of Duplicate Genes in the Protein Kinase Superfamily in Pacific Oyster.

Dahai Gao1, Dennis C Ko2, Xinmin Tian3, Guang Yang4, Liuyang Wang2.   

Abstract

Gene duplication has been proposed to serve as the engine of evolutionary innovation. It is well recognized that eukaryotic genomes contain a large number of duplicated genes that evolve new functions or expression patterns. However, in mollusks, the evolutionary mechanisms underlying the divergence and the functional maintenance of duplicate genes remain little understood. In the present study, we performed a comprehensive analysis of duplicate genes in the protein kinase superfamily using whole genome and transcriptome data for the Pacific oyster. A total of 64 duplicated gene pairs were identified based on a phylogenetic approach and the reciprocal best BLAST method. By analyzing gene expression from RNA-seq data from 69 different developmental and stimuli-induced conditions (nine tissues, 38 developmental stages, eight dry treatments, seven heat treatments, and seven salty treatments), we found that expression patterns were significantly correlated for a number of duplicate gene pairs, suggesting the conservation of regulatory mechanisms following divergence. Our analysis also identified a subset of duplicate gene pairs with very high expression divergence, indicating that these gene pairs may have been subjected to transcriptional subfunctionalization or neofunctionalization after the initial duplication events. Further analysis revealed a significant correlation between expression and sequence divergence (as revealed by synonymous or nonsynonymous substitution rates) under certain conditions. Taken together, these results provide evidence for duplicate gene sequence and expression divergence in the Pacific oyster, accompanying its adaptation to harsh environments. Our results provide new insights into the evolution of duplicate genes and their expression levels in the Pacific oyster.

Entities:  

Keywords:  Pacific oyster; RNA-seq; duplicate genes; protein kinase superfamily

Year:  2015        PMID: 26417197      PMCID: PMC4573066          DOI: 10.4137/EBO.S30230

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Gene duplication plays key roles in organismal evolution.1,2 Duplicate genes initially have identical sequences but diverge in regulatory and coding regions during subsequent evolution. Divergence in regulatory regions could result in changes in expression levels, whereas changes in coding regions may lead to the acquisition of new functions.3–5 The rapid development of next-generation sequencing technology in the past decade provides unique opportunities to study the general pattern from the whole genome and expression level. Indeed, several studies have attempted to characterize the correlation between expression patterns and genomic divergence for duplicate genes in human being, yeast, Arabidopsis, and cow.5–9 Therefore, a general picture of the patterns of expression divergence in the evolution of duplicated genes is emerging. For instance, a positive correlation between synonymous sequence divergence and expression divergence was reported for human and yeast duplicate genes.10 Likewise, the analysis of bovine duplicate genes also revealed that expression changes were correlated with sequence divergence.11 On the other hand, unambiguous evidence of weak correlation between synonymous sequence divergence and expression divergence has been found in a case study of Arabidopsis duplicate genes.12 Owing to these inconsistent findings within this limited number of species, there is still a need to characterize the relationship between expression and sequence divergence, especially in the nonmodel organism. The Pacific oyster Crassostrea gigas is a representative species of phylum Mollusca, belonging to a large taxonomic group of protostomes and the group of marine animals with the largest number of identified species. Despite the species richness of this phylum, the genomes of Mollusca have only recently been examined. The whole-genome sequence and various developmental and stress-response RNA-seq transcriptomes for Pacific oyster were released recently,13,14 rendering this species more suitable for the study of the evolution of gene duplication. Thus, several studies have discussed the structural and expression divergence of some rapidly expanding immune gene families in this species.15,16 However, the patterns of divergence of duplicate genes with roles outside of immune function have been largely ignored. In the present study, we selected a set of genes with broad importance in cell signaling, the protein kinase superfamily, to analyze the evolutionary pattern between sequence and gene expression for duplicate genes. The protein kinase superfamily is one of the largest gene families in eukaryotes, comprising 2%–4% of all genes in human being and in several model species.17 The protein kinases are well known for regulating the majority of cellular pathways, especially those involved in signal transduction.18,19 Thus, studying the evolutionary history of protein kinases provides a window to the evolution of many organism’s signaling pathways. Therefore, we undertook the present study to identify duplicated protein kinase genes in oyster and to characterize the pattern of divergence between sequence and expression. A total of 64 putative duplicate gene pairs were identified from 320 protein kinase family members. We first show that these duplicate genes have experienced stronger selective constraints. We then find unequal distributions of correlation coefficients between duplicate genes for expression patterns under each of the five different developmental and stress-induced conditions. Finally, we investigated the relationship between sequence divergence and expression divergence and found that positive correlation exists between sequence divergence and expression divergence in each of the four conditions, suggesting that sequence divergence may generally explain the expression divergence under those conditions.

Results and Discussion

Identification of duplicate genes from the protein kinase family

Protein kinases represent one of the largest gene families in eukaryotes, and an enormous number of members have been reported in the model species. For example, there are 516, 238, and 425 kinase genes in human being, fruit fly, and nematode, respectively.19 In the present study, a total of 320 protein kinase family members are identified from the genome of Pacific oyster, which account for about 1.1% of all predicted genes. Based on rigorous phylogenetic and reciprocal BLAST analyses, a total of 64 pairs are identified as putative duplicated gene pairs (Fig. 1 and Table 1). Those 128 genes are located on 111 scaffolds, indicating the scattered and wide distribution on the whole genome. Nine of the pairs are present on the same scaffold instead of being present on two different scaffolds, indicating a pattern of tandem duplication. However, we note that the frequency of tandem duplication in protein kinase family may be underestimated because of incomplete genome assembly and annotations.
Figure 1

Phylogenetic relationship of protein kinases from Pacific oyster. NJ topology was represented and bootstrap values were shown for the clades with more than 50% support. The scale bar indicates the number of amino acid substitutions per site. The genes with red circles represent the identified duplicate paralogs.

Table 1

Identified duplicate protein kinase gene pairs and related information.

PAIR NAMEGENE NAMEAMINO ACID LENGTH (aa)SCAFFOLDSTRANDEXON NUMBER
pair_01CGI_10022762291scaffold4438
CGI_10009747319scaffold3227
pair_02CGI_10005548317scaffold2686
CGI_100101911058scaffold93013
pair_03CGI_10028660360scaffold150+1
CGI_10028659325scaffold1504
pair_04CGI_10006983563scaffold401+10
CGI_10016867989scaffold1579+8
pair_05CGI_10009421672scaffold11625
CGI_10009798439scaffold156013
pair_06CGI_100119171281scaffold187439
CGI_100146441414scaffold4396421
pair_07CGI_10012098327scaffold1195+10
CGI_10004264409scaffold40612+2
pair_08CGI_10009696701scaffold37216
CGI_10028458792scaffold1027
pair_09CGI_10021945442scaffold108611
CGI_10026068488scaffold1174+18
pair_10CGI_10026336486scaffold678+12
CGI_10003308360scaffold39368+10
pair_11CGI_100168031961scaffold55625
CGI_10010738350scaffold954+9
pair_12CGI_10015287936scaffold4400826
CGI_10009716504scaffold1028+13
pair_13CGI_10022933545scaffold95011
CGI_10013006595scaffold116413
pair_14CGI_100044181209scaffold201+32
CGI_100215681755scaffold23741
pair_15CGI_10007711482scaffold42776+14
CGI_100175211106scaffold120+6
pair_16CGI_10010914794scaffold1288+21
CGI_10001604528C3577614
pair_17CGI_10009216338scaffold16889
CGI_10003535354scaffold39740+10
pair_18CGI_10018112760scaffold396+16
CGI_10021856689scaffold164+16
pair_19CGI_10022111491scaffold10916
CGI_10024838468scaffold49213
pair_20CGI_10016201392scaffold324+9
CGI_10001632444scaffold277+9
pair_21CGI_10028745368scaffold1009+8
CGI_10026280361scaffold183611
pair_22CGI_10014307499scaffold737+2
CGI_10014308784scaffold737+2
pair_23CGI_10011211324scaffold1157+1
CGI_10002545387scaffold17958
pair_24CGI_100047681373scaffold1107+22
CGI_10001297894C3444412
pair_25CGI_10001599861scaffold1453+18
CGI_10008024952scaffold1277+21
pair_26CGI_10009988499scaffold4336613
CGI_10013117796scaffold1252+19
pair_27CGI_10024885832scaffold146+14
CGI_10019262615scaffold506+10
pair_28CGI_10019292484scaffold363+17
CGI_10003652993scaffold10888
pair_29CGI_100120761087scaffold14928
CGI_100120771082scaffold14928
pair_30CGI_100104041166scaffold4344616
CGI_10000974543scaffold14966
pair_31CGI_10010300567scaffold434262
CGI_10010302977scaffold434262
pair_32CGI_10001931910scaffold36398+17
CGI_10018647845scaffold509+18
pair_33CGI_10000466269C28760+5
CGI_100284391336scaffold10221
pair_34CGI_10014121370scaffold439328
CGI_100141261593scaffold43932+19
pair_35CGI_10011806539scaffold43696+11
CGI_10026689677scaffold5310
pair_36CGI_100208385054scaffold124472
CGI_100066991033scaffold42366+25
pair_37CGI_10018029387scaffold12+8
CGI_10006070862scaffold1840+17
pair_38CGI_10024845327scaffold492+6
CGI_10008929441scaffold635+9
pair_39CGI_10007062831scaffold175812
CGI_10007061994scaffold1758+19
pair_40CGI_10027170770scaffold159915
CGI_100281781359scaffold8625
pair_41CGI_100124292389scaffold498+19
CGI_10008499325scaffold43036+6
pair_42CGI_10011096516scaffold34014
CGI_10026549479scaffold145+6
pair_43CGI_10004703474scaffold1231+8
CGI_10003879403scaffold40120+8
pair_44CGI_10006185585scaffold152616
CGI_10006186730scaffold1526+19
pair_45CGI_10016954667scaffold117+15
CGI_10016955621scaffold117+15
pair_46CGI_10012313720scaffold47718
CGI_10012310720scaffold47718
pair_47CGI_10016396252scaffold594+3
CGI_10016395466scaffold594+3
pair_48CGI_10010613594scaffold43500+10
CGI_10027407432scaffold117912
pair_49CGI_10022789283scaffold443+7
CGI_10019511347scaffold376+7
pair_50CGI_100210301493scaffold67236
CGI_10021977774scaffold110819
pair_51CGI_10010190562scaffold93010
CGI_10005580517scaffold1708+10
pair_52CGI_10007244271scaffold493+9
CGI_10004779599scaffold106715
pair_53CGI_10018169273scaffold459+7
CGI_10002412290scaffold857+6
pair_54CGI_10001400485scaffold3499411
CGI_10013050331scaffold438367
pair_55CGI_10025852784scaffold1583+16
CGI_10010926384scaffold1288+14
pair_56CGI_10020312344scaffold522+9
CGI_10023891368scaffold4811
pair_57CGI_10019854401scaffold1512+4
CGI_100093551247scaffold43208+10
pair_58CGI_10012125434scaffold1890+13
CGI_10027818482scaffold198+13
pair_59CGI_10017610475scaffold1670+13
CGI_10021263584scaffold157+13
pair_60CGI_10025910645scaffold33411
CGI_10019663357scaffold1715+9
pair_61CGI_10020062555scaffold258+1
CGI_10027350709scaffold117914
pair_62CGI_10013344530scaffold189411
CGI_10023484516scaffold125813
pair_63CGI_100079231373scaffold42850+17
CGI_100286891383scaffold150+24
pair_64CGI_10015137208scaffold1671+1
CGI_10025899600scaffold733+2
Comparisons of duplicate gene pairs showed that 13 pairs (20.3%) have equal exon numbers, and 14 pairs (21.9%) have almost identical exon numbers (their exon number difference ≤2) (Table 1). In the remaining 37 pairs (57.8%), the exon–intron structures are divergent between duplicate paralogs, indicating that exon–intron structural divergence is a common occurrence in oyster protein kinase genes. By analyzing 612 pairs of sibling paralogs from Arabidopsis and rice, Xu et al.7 demonstrated that exon–intron structural variation is prevalent in three gene families and the proposed three mechanisms, including exon/intron gain/loss, exonization/pseudoexonization, and insertion/deletion, might contribute to the formation of structural changes. Their findings suggest that such structural divergences have played a vital role during the evolution of duplicate genes, and our findings are consistent with these studies.

Sequence divergence between duplicate gene pairs

To estimate the sequence divergence between duplicate gene paralogs, we calculate the synonymous (Ks) and nonsynonymous (Ka) substitution rates of coding sequences for each gene pair. The synonymous substitution rate Ks can be recognized as a proxy of divergence time between duplicated genes. The distribution of Ks has two major peaks around 0.2 and 4.1 in the density plot (Fig. 2A), indicating that those gene pairs originated at two major different stages and differed by evolutionary time. In addition, more than a half (61%) of duplicate pairs have Ks larger than 4, indicating highly diverged sequences and relatively long evolutionary time. In contrast, 25% of duplicate pairs have Ks less than 1, representing recently duplicated genes and relatively little sequence divergence.
Figure 2

The sequence divergence between duplicate pairs. (A) The density distribution of synonymous rate (Ks) for all duplicate gene pairs. (B) The comparisons of Ka/Ks and Ks values, where Ks is a proxy of divergence time between duplicated genes.

The ω (Ka/Ks) values reflect selection pressure during evolution. For all studied pairs, the ω values were lower than 1, suggesting that those pairs were all evolving under purifying selection with putative functional constraints (Fig. 2B). Moreover, in recent duplicated pairs, there are some gene pairs with ω values higher than 0.4, indicating that the evolutionary constraint might be relaxed in some degree. Those genes subjected to relaxed purifying selection may tend to accumulate more mutations, altering gene structure and expression. Intriguingly, in our recent study of the oyster TNF superfamily, we also found that recently originated duplicate genes were under purifying selection.15

Expression patterns of duplicate gene pairs

In order to characterize the expression divergence for all gene pairs, the RNA-seq data collected from 69 developmental and stress-induced RNA-seq datasets have been analyzed (expression values and detailed information are given in Supplementary File 3). The Pearson’s correlation coefficient r was calculated to quantify correlation between duplicate genes at the level of expression. We also identified two main clades based on expression data from the RNA-seq data, which displayed opposite patterns. Clade 1 (upper in Fig. 3A), which include over two-thirds of the pairs, exhibited large positive r values under the majority of expression conditions, suggesting consistent expression patterns and similar functionality within each pair after their duplication. In contrast, Clade 2 (lower in Fig. 3A) showed a majority of negative r values, indicating that the paralogs in each pairs had divergent expression under most conditions.
Figure 3

Expression patterns of duplicate gene pairs. (A) The heatmap was performed using Pearson’s correlation coefficient of gene expression under five conditions, and red to blue blocks indicate high-to-low correlation levels. (B) The boxplot represents the distribution of correlation coefficient values for each expression condition.

For each condition of developmental and stress-induced transcriptomes, the median value of Pearson’s r was positive. This indicated that most pairs have correlated expression patterns (Fig. 3B), suggesting that the genes in these pairs evolved under some functional constraint. Nonetheless, there is still a proportion of gene pairs also exhibiting negative r values, suggesting expression divergence within those gene pairs (Fig. 3B). Compared with the other four conditions of transcriptomes, the distribution of Pearson’s r value under the heat condition was slightly lower, with a considerable proportion of negative values. These results suggest that those duplicated genes may have gained novel functions via subfunctionalization and/or neofunctionalization after their duplication. We hypothesize that these protein kinase genes have evolved to adapt to various stress environments or specialized developmental roles via expression divergence in oyster.

Positively correlated sequence divergence and expression divergence

The relationship between sequence divergence and expression divergence was investigated in all the five conditions of developmental and transcriptional transcriptomes (Fig. 4). We found significant negative correlation between transformed r′ and Ka (or Ks) in four of the five conditions (P < 0.05, Fig. 4B–E). Interestingly, there is a significant positive correlation between expression divergence and sequence divergence among duplicate pairs under those conditions. However, for the set of transcriptomes comparing relative tissue abundance, the correlation was not statistically significant (P = 0.201 for Ka and P = 0.436 for Ks, Fig. 4A), suggesting less correlation between expression divergence and genome divergence. This pattern is mostly consistent with previous studies in yeast, human being, and cow.8,9,11
Figure 4

The relationship between the correlation coefficient (R) of gene expression and Ka (or Ks) in duplicate genes. (A) No correlation between and Ka (or Ks) for tissue expression transcriptomes. (B–E) Negative correlations between and Ka (or Ks) under developmental stages, dry treatments, salt treatments, and heat treatments, respectively. These imply positive correlation between sequence divergence and expression divergence because 1−r can be regarded as expression divergence. Each point represents one gene pair.

Gene duplications are widely present in eukaryotic genomes, providing increased opportunities for nonreciprocal recombination and allowing redundant genes to evolve new functions. However, the fate of duplicate genes is a widely discussed topic of genome evolution. Recently, the subfunctionalization and neofunctionalization models have been invoked to explain the retention of duplicate genes.2,20 In this study, we found that most gene pairs exhibited consistent expression patterns and underwent purifying selection. Sequence and expression divergence were positively correlated under four conditions, consistent with the hypothesis of sequence divergence driving expression divergence. For the transcriptomes of tissue expression levels, we observed non-significant correlation between expression divergence and genome divergence, possibly because of genome divergence in noncoding regions not being reflective of the pattern seen in coding regions (as exemplified by Ka and Ks), a complicated expression divergence pattern, or the limited sample size used in our analysis. Therefore, we hypothesize that the functional constraints of protein kinase genes may contribute to the evolution of the duplicate paralogs in oyster. To sum up, our results provide insights into duplicate gene sequence and expression divergence in the Pacific oyster and may help to elucidate its adaptation to different environments and development processes. Our results may also help to understand the mechanisms for the retention of duplicate genes in other gene families in Pacific oyster.

Methods

Identification of duplicate gene pairs

The Pacific oyster genome sequences were downloaded from OysterDB (http://oysterdb.cn/home.html). The hidden Markov model (HMM) method was carried out to retrieve sequences containing a protein kinase domain (PF00069). The presence of a protein kinase domain (PF00229) was validated by SMART (http://smart.embl-heidelberg.de/) and Pfam (http://pfam.sanger.ac.uk/), and a total of 320 protein kinase sequences were identified from the Pacific oyster genome (see Supplementary File 1 for details). Phylogenetic analyses and reciprocal BLAST were applied to identify the duplicate gene pairs and relationships. First, phylogenetic analysis was carried out to identify the sequence pairs with close evolutionary relationships. We used the HMMalign program21 to generate sequence alignments for protein kinase domain regions (provided as Supplementary File 2). The phylogenetic tree was reconstructed using the neighbor-joining (NJ) method from the MEGA 5.01 program22 and a total of 75 paralog pairs (Fig. 1) were identified. Second, the sibling paralog relationship for each pair was further confirmed by reciprocal best BLAST hits analysis. The BLASTP program was used to compare each protein in identified pairs against all other proteins with the E-value cutoff of 1e-5. As a result, a total of 64 duplicate gene pairs were identified (Table 1), and their genome location and annotation were parsed from GFF files by a custom Python script.

Sequence alignment and divergence analysis

Protein sequences for each duplicated gene pair were first aligned by ClustalW.23 Then, the PAL2NAL program was used to generate the codon alignment.24 The synonymous (Ks) and nonsynonymous (Ka) substitution rates of coding sequences for duplicate pairs were calculated by the KaKs_Calculator25 using the modified Yang–Nielsen algorithm (MYN).26

Expression profile analysis

The available expression values for C. gigas protein kinase genes under five varied condition and 69 RNA-seq transcriptome datasets (nine tissues, 38 developmental stages, eight dry treatments, seven heat treatments, and seven salty treatments) were obtained from the transcriptome dataset of oyster genome project (http://oysterdb.cn/home.html, Zhang et al.13). The Reads per Kilobase of exon Model per million mapped reads (RPKM) values were calculated to indicate the expression levels of each gene. The pairwise expression patterns for each gene pair under different conditions were visualized through heat-map in R version 2.13. Pearson’s correlation coefficient r of expression level was calculated to measure the correlation among the duplicate gene pairs at the level of expression. Following previous studies, the Pearson’s correlation coefficient r was further transformed to r′ using equation and the rescaled r′ is more appropriate for linear regression analysis.8,11 With the transformation, the negative regression coefficient between r′ and Ks (or Ka) represents a positive relationship between expression level and Ks (or Ka). Supplementary File 1. Full protein coding sequences of identified Pacific oyster protein kinase proteins genes (FASTA format). Supplementary File 2. Amino acid alignment of identified Pacific oyster protein kinase proteins sequences (FASTA format). Supplementary File 3. RPKM values for identified Pacific oyster protein kinase genes from five different developmental and stimuli-induced datasets (xlsx format).
  23 in total

1.  Genomics. Gene duplication and evolution.

Authors:  Michael Lynch
Journal:  Science       Date:  2002-08-09       Impact factor: 47.728

2.  Divergence of duplicate genes in exon-intron structure.

Authors:  Guixia Xu; Chunce Guo; Hongyan Shan; Hongzhi Kong
Journal:  Proc Natl Acad Sci U S A       Date:  2012-01-09       Impact factor: 11.205

Review 3.  The evolution of gene duplications: classifying and distinguishing between models.

Authors:  Hideki Innan; Fyodor Kondrashov
Journal:  Nat Rev Genet       Date:  2010-01-06       Impact factor: 53.242

4.  Multiple sequence alignment using ClustalW and ClustalX.

Authors:  Julie D Thompson; Toby J Gibson; Des G Higgins
Journal:  Curr Protoc Bioinformatics       Date:  2002-08

Review 5.  Profile hidden Markov models.

Authors:  S R Eddy
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

6.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Authors:  Koichiro Tamura; Daniel Peterson; Nicholas Peterson; Glen Stecher; Masatoshi Nei; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2011-05-04       Impact factor: 16.240

Review 7.  Evolution of protein kinase signaling from yeast to man.

Authors:  Gerard Manning; Gregory D Plowman; Tony Hunter; Sucha Sudarsanam
Journal:  Trends Biochem Sci       Date:  2002-10       Impact factor: 13.807

8.  KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.

Authors:  Dapeng Wang; Yubin Zhang; Zhang Zhang; Jiang Zhu; Jun Yu
Journal:  Genomics Proteomics Bioinformatics       Date:  2010-03       Impact factor: 7.691

9.  Expression pattern divergence of duplicated genes in rice.

Authors:  Zhe Li; He Zhang; Song Ge; Xiaocheng Gu; Ge Gao; Jingchu Luo
Journal:  BMC Bioinformatics       Date:  2009-06-16       Impact factor: 3.169

10.  Sequence, structural and expression divergence of duplicate genes in the bovine genome.

Authors:  Xiaoping Liao; Hua Bao; Yan Meng; Graham Plastow; Stephen Moore; Paul Stothard
Journal:  PLoS One       Date:  2014-07-23       Impact factor: 3.240

View more
  3 in total

Review 1.  The Role of Sequence Duplication in Transcriptional Regulation and Genome Evolution.

Authors:  Luis M Vaschetto; Natalia Ortiz
Journal:  Curr Genomics       Date:  2019-09       Impact factor: 2.236

2.  RNA: An Expanding View of Function and Evolution.

Authors:  Xinwei Han; Yuan Chen; Liuyang Wang; Wenwen Fang; Ning Zhang; Qiyun Zhu
Journal:  Evol Bioinform Online       Date:  2016-01-14       Impact factor: 1.625

3.  The differential expression patterns of paralogs in response to stresses indicate expression and sequence divergences.

Authors:  Shuaibin Lian; Yongjie Zhou; Zixiao Liu; Andong Gong; Lin Cheng
Journal:  BMC Plant Biol       Date:  2020-06-16       Impact factor: 4.215

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.