Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood-gas transport traits, which reflects the myriad solutions that have evolved to maintain tissue O2 delivery in the face of changing metabolic demands and environmental O2 availability during different ontogenetic stages. During the course of development, regulatory changes in blood-O2 transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular O2-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and in free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results were then integrated with available experimental data on functional properties and developmental patterns of stage-specific gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order Lepisosteiformes) and teleosts.
The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and deletion events. Phylogenetic reconstructions revealed that paralogous genes convergently evolved similar functional properties in different teleost lineages. Consistent with other recent studies of globin gene family evolution in vertebrates, our results revealed evidence for repeated evolutionary transitions in the developmental regulation of Hb synthesis.
Evidence suggests that two successive rounds of whole-genome duplication that occurred
early in vertebrate evolution may have played an important role in the evolution of
vertebrate-specific innovations (Holland et al.
1994; Meyer 1998; Meyer and Schartl 1999; Shimeld and Holland 2000; Wada 2001; Hoegg and Meyer 2005;
Wada and Makabe 2006; Zhang and Cohn 2008; Van
de Peer et al. 2009; Hoffmann, Opazo, and
Storz 2012). Roughly 320–400 Ma, a third genome duplication occurred in the
stem lineage of teleost fish (infraclass Teleostei) following divergence from nonteleost
ray-finned fish (Amores et al. 1998, 2011; Postlethwait et al. 2000; Taylor et al.
2001, 2003; Van de Peer et al. 2003; Hoegg et al. 2004; Jaillon et al.
2004; Meyer and Van de Peer 2005;
Kasahara et al. 2007; Sato and Nishida 2010). The teleost-specific genome duplication
(TGD) is thought to have provided raw materials for the physiological, morphological, and
behavioral diversification of teleost fish, perhaps facilitating the radiation of this
speciose group into diverse marine and freshwater environments across the planet. Evidence
in support of a causal connection between the TGD and phenotypic innovation is provided by
studies of TGD-derived gene duplicates that evolved distinct physiological or developmental
functions in various teleost lineages (Meyer and
Málaga-Trillo 1999; Lister et al.
2001; Mulley et al. 2006; Braasch et al. 2006, 2007; Hashiguchi and
Nishida 2007; Hoegg and Meyer 2007;
Sato and Nishida 2007; Siegel et al. 2007; Yu et
al. 2007; Douard et al. 2008; Braasch, Brunet, et al. 2009; Braasch, Volff, et al. 2009; Sato et al. 2009a, 2009b; Arnegard et al. 2010).

The extreme physiological versatility of teleost fishes is manifest in their diversity of
blood–gas transport traits (Wells
2009). This diversity reflects the myriad solutions that have evolved to maintain
tissue O2 delivery in the face of changing metabolic demands and environmental
O2 availability during different ontogenetic stages. Relative to air-breathing
vertebrates, fish generally contend with far greater vicissitudes of environmental
O2 availability, which is largely because O2 solubility (and hence,
the availability of dissolved O2 for respiration) varies as a function of water
temperature. During ontogeny, regulatory changes in blood-O2 transport are
mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that
are adapted to the particular O2-transport challenges encountered by the
developing embryo or fetus (in viviparous or oviparous species) and in free-swimming larvae
and adults (reviewed by Ingermann 1997; Jensen et al. 1998). As in other vertebrates, the
developmental regulation of Hb synthesis in fish involves differential expression of
duplicated genes that encode the α- and β-chain subunits of distinct tetrameric
α2β2 Hb isoforms (Chan et al. 1997; Brownlie et al.
2003; Maruyama, Yasumasu, and Iuchi
2004; Maruyama, Yasumasu, Naruse, et al.
2004; Tiedke et al. 2011).

Most teleost fish also coexpress functionally distinct Hb isoforms during posthatching
life, and these isoforms can be broadly classified (based on electrophoretic mobility at pH
> 8.0) as “anodic” or “cathodic.” The anodic Hbs have relatively
low O2 affinities and a pronounced Bohr effect (decreased Hb-O2
affinity at low pH), whereas the cathodic Hbs have relatively high O2 affinities,
an enhanced responsiveness to allosteric regulation by organic phosphates, and a reversed
Bohr effect (increased Hb-O2 affinity at low pH) in the absence of organic
phosphates (Weber and Jensen 1988; Weber 1990, 2000; Jensen et al.
1998; Weber et al. 2000; Wells 2009). Experimental evidence for some
species suggests that regulatory changes in intraerythrocytic Hb isoform composition may
play a role in the acclimatization response to environmental hypoxia (e.g., Rutjes et al. 2007), but it has not been possible
to formulate any broadly consistent empirical generalizations (Weber and Jensen 1988; Weber 1990, 2000; Ingermann 1997; Wells 2009). A remarkable feature of nearly all anodic Hb
isoforms of teleost fish is the Root effect, an extreme reduction in Hb-O2
binding capacity at low pH, even when blood O2 tension remains high. The Root
effect is considered a key evolutionary innovation in teleost fish, as it plays a critical
role in secreting O2 into the swim bladder for buoyancy control and in supplying
O2 to the avascular retina (Pelster and
Weber 1991; Berenbrink et al. 2005;
Berenbrink 2007; Wells 2009).

The proto α- and β-globin genes of jawed
vertebrates (Gnathostomata) represent the product of an ancient gene duplication event that
occurred roughly 450–500 Ma in the Ordovician, before the divergence between
cartilaginous fish (Chondrichthyes) and the common ancestor of ray-finned fish
(Actinopterygii) and lobe-finned fish + tetrapods (Sarcopterygii; Goodman et al. 1987; Storz
et al. 2011, 2012; Hoffmann, Opazo, and Storz 2012). Subsequent
rounds of duplication and divergence gave rise to diverse repertoires of
α- and β-like globin genes that are
developmentally regulated in different ways in different vertebrate lineages (Hardison 2001; Hoffmann, Storz, et al. 2010). The ancestral linkage arrangement
of the α- and β-globin genes is still retained
in at least some cartilaginous fish (Marino et al.
2007), teleosts (Chan et al. 1997;
Miyata and Aoki 1997; Gillemans et al. 2003; Pisano et al. 2003), and amphibians (Hentschel et al. 1979; Jeffreys et al.
1980; Kay et al. 1980; Hosbach et al. 1983; Fuchs et al. 2006). In amniote vertebrates, by contrast, the
α- and β-globin gene clusters are located on
different chromosomes due to transposition of the proto β-globin gene
to a new genomic location sometime after the stem lineage of amniotes split from the line
leading to amphibians (Hardison 2008; Patel et al. 2008, 2010; Hoffmann, Opazo, and
Storz 2012).

The main objective of the present study was to assess the relative contributions of
whole-genome duplication, large-scale segmental duplication, and small-scale gene
duplication in producing the extraordinary functional diversity of teleost Hbs. To
accomplish this, we integrated phylogenetic reconstructions with analyses of conserved
synteny to characterize the genomic organization and evolutionary history of the globin gene
clusters of teleost fish. These results were then integrated with available experimental
data on functional properties and developmental patterns of stage-specific gene expression.
Results of the phylogenetic and comparative genomic analyses revealed repeated evolutionary
transitions in stage-specific expression in different teleost lineages. Our analyses also
revealed that functionally distinct anodic and cathodic adult Hbs evolved independently in
different teleost lineages, providing evidence for convergence in the physiological division
of labor between coexpressed Hb isoforms.
Materials and Methods
Data Collection
We used bioinformatic techniques to manually annotate the full complement of globin genes
in the genomes of six teleost fish available in release 67 of the Ensembl database (fugu,
Takifugu rubripes; medaka, Oryzias latipes; green-spotted
puffer, Tetraodon nigroviridis; tilapia, Oreochromis
niloticus; three-spined stickleback, Gasterosteus aculeatus;
and zebrafish, Danio rerio). We also annotated the globin genes from a
live-bearing teleost (platyfish, Xiphophorus maculatus) and a nonteleost
ray-finned fish (spotted gar, Lepisosteus oculatus), both available from
the Pre!Ensembl database. We compared the Ensembl data with previous reports on the
genomic organization of the globin gene clusters in fugu (Flint et al. 2001), medaka (Maruyama, Yasumasu, and Iuchi 2004;
Maruyama, Yasumasu, Naruse, et al. 2004), and zebrafish (Brownlie et al. 2003). We also included coding sequences from
the full complement of globin genes from Atlantic cod (Gadus morhua;
Borza et al. 2009, Wetten et al. 2010) and Atlantic salmon (Salmo
salar; Quinn et al. 2010).
However, the fragmentary state of the cod and salmon genome assemblies precluded a
detailed comparative analysis of the globin gene clusters in these two species. Finally,
we included additional records from tetrapod vertebrates and cartilaginous fish as
outgroup sequences for phylogenetic analyses, and we included genomic contigs from
representative tetrapods for the purpose of making synteny comparisons. When possible, the
annotated genomic sequences were validated by comparison with the relevant expressed
sequence tag (EST) databases.
Assessments of Conserved Synteny
To examine patterns of conserved synteny, we annotated the genes found upstream and
downstream of the globin gene clusters of seven teleost species (fugu, medaka, platyfish,
green-spotted puffer, stickleback, tilapia, and zebrafish) and one nonteleost ray-finned
fish (spotted gar). Initial ortholog predictions were derived from the EnsemblCompara
database (Vilella et al. 2009) and were
visualized using the program Genomicus (Muffato et
al. 2010). In addition, we also used the program Genscan (Burge and Karlin 1997) to identify additional unannotated genes
lying upstream and downstream of the annotated globin genes. The unannotated genes were
compared with the nonredundant protein database using Basic Local Alignment Search Tool
(BLAST) (Altschul et al. 1990). Partial
sequences for genes of interest (representing pseudogenes or artifacts related to
incomplete sequence coverage) were identified and annotated with BLAST. To examine
large-scale patterns of sequence conservation, we conducted pairwise comparisons of
sequence similarity between globin gene clusters using the Pipmaker and Multipipmaker
programs (Schwartz et al. 2000, 2003). To facilitate comparisons, genes have
been labeled following the Zebrafish Model Organism Database nomenclature guidelines.
Finally, we conducted an analysis of conserved synteny between the globin gene clusters of
medaka and the reconstructed protokaryotype of the pre-TGD teleost common ancestor
provided by Kasahara et al. (2007) and Nakatani et al. (2007).
Sequence Alignment
Separate alignments of the α- and β-globin
coding sequences were based on conceptual translations of nucleotide sequences. Alignments
were performed using MUSCLE v3.8 (Edgar
2004) and the E-INS-i, G-INS-i, and L-INS-i strategies from MAFFT v6.8 (Katoh et al. 2009). We employed MUMSA (Lassmann and Sonnhammer 2005, 2006) to select the best-scoring multiple
alignment, and we then used the selected alignment to estimate phylogenetic relationships.
These sequence manipulations were carried out in the Mobyle platform server (Néron et al. 2009) hosted by the Institut
Pasteur (http://mobyle.pasteur.fr, last accessed
September 2012). All sequence alignments are provided in supplementary
data file S1, Supplementary
Material online.
Phylogenetic Analyses
We reconstructed separate phylogenies for the α- and
β-globin genes using Bayesian and maximum likelihood approaches.
We performed maximum likelihood analyses in Treefinder, version March 2011 (Jobb et al. 2004), and we evaluated support for
the nodes with 1,000 bootstrap pseudoreplicates. We used the “propose model”
tool of Treefinder to select the best-fitting models of amino acid and nucleotide
substitution, with an independent model for each codon position in analyses based on
nucleotide sequences. Model selection was based on the Akaike information criterion with
correction for small sample size. We estimated Bayesian phylogenies in MrBayes v.3.1.2
(Ronquist and Huelsenbeck 2003), running
six simultaneous chains for 2 × 10^7 generations, sampling every 2.5
× 10^3 generations, and using default priors. A given run was considered
to have reached convergence once the likelihood scores reached an asymptotic value and the
average standard deviation of split frequencies remained <0.01. We discarded all trees
that were sampled before convergence, and we evaluated support for the nodes and parameter
estimates from a majority rule consensus of the last 2,500 trees.
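The sampling scheme described above implies a specific number of retained and discarded trees. The following Python sketch is only an illustration of that bookkeeping, not part of the original analysis. It reads the run length as 2 × 10^7 generations and the sampling interval as 2.5 × 10^3 generations, and it assumes that the starting state (generation 0) is not counted as a sample and that the figures apply to a single run. The function name and dictionary keys are hypothetical.

```python
def mcmc_sample_summary(n_generations, sample_every, n_retained):
    """Summarize how many sampled trees are kept and discarded.

    Assumption: samples are taken at generations sample_every,
    2 * sample_every, ..., n_generations (generation 0 is not counted).
    """
    n_samples = n_generations // sample_every
    if n_retained > n_samples:
        raise ValueError("cannot retain more trees than were sampled")
    n_discarded = n_samples - n_retained
    return {
        "sampled": n_samples,
        "discarded": n_discarded,
        "burnin_fraction": n_discarded / n_samples,
        # Generation at which the first retained tree was sampled.
        "first_retained_generation": (n_discarded + 1) * sample_every,
    }


summary = mcmc_sample_summary(
    n_generations=20_000_000,  # 2 x 10^7 generations
    sample_every=2_500,        # 2.5 x 10^3 generations
    n_retained=2_500,          # consensus built from the last 2,500 trees
)
print(summary)
```

Under these assumptions, 8,000 trees are sampled per run and the consensus uses the final 2,500, which corresponds to discarding roughly the first 69% of samples.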
Results and Discussion
The comparative genomic analysis revealed that the Hb genes of teleost
fish are located in two separate chromosomal regions that are clearly delineated by distinct
sets of flanking loci (fig. 1). In contrast, the
Hb genes of the nonteleost gar are located in a single chromosomal
region. Following Hardison (2008), the teleost globin gene cluster flanked by the mpg and nprl3 genes was
labeled the “MN” cluster, and the teleost globin cluster flanked by the
lcmt1 and aqp8 genes was labeled the “LA”
cluster. In the platyfish assembly, we identified two separate scaffolds containing the MN
and LA clusters (fig. 1) and a third scaffold
(JH559524) that contained a single, putatively functional β-globin
gene. We excluded this latter scaffold from all subsequent analyses because it likely
represents an assembly artifact. The MN and LA clusters correspond to the medaka E1 and A1
clusters, respectively, that were described by Maruyama et al. (Maruyama, Yasumasu, and Iuchi 2004; Maruyama, Yasumasu, Naruse, et al. 2004). To facilitate
comparison, we report the order of genes in the same orientation as they appear in the
zebrafish genome assembly, regardless of how they are found in the Ensembl database. Since
the MN and LA clusters of most teleosts harbor globin genes in both forward and reverse
orientations, we use the terms left and right to describe linear gene order. The individual
α- and β-globin genes in the MN cluster were
numbered from left to right, such that the functional globin gene in the leftmost position
of the MN cluster of zebrafish is labeled MN Hbb1, the next gene to the
right is MN Hba1, and so forth, whereas the genes in the LA cluster were
numbered from right to left, starting with the gene closest to aqp8 (fig. 2). In the case of cod and salmon globin genes,
we retained the labels from the original studies (Borza et al. 2009; Quinn et al.
2010). Sequence sources for the globin gene clusters used in this study are provided
in table 1, and the annotations for each
cluster are provided in supplementary table S1, Supplementary Material online. To facilitate comparisons with previous
studies, we compiled a list of previously used names for each of the annotated globin genes
(supplementary table
S2, Supplementary
Material online).
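The numbering convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the annotations. It assumes that α- and β-globin genes are numbered with separate counters (inferred from the labels MN Hbb1 followed by MN Hba1), that only functional genes are passed in, and that genes are listed in left-to-right order with aqp8 lying to the right of the LA cluster. The function name and the example gene orders are hypothetical.

```python
def label_cluster(cluster, genes_left_to_right):
    """Assign labels such as 'MN Hbb1' to genes listed from left to right.

    cluster: "MN" (numbered left to right) or
             "LA" (numbered right to left, starting next to aqp8).
    genes_left_to_right: gene types, each "Hba" or "Hbb".
    """
    if cluster not in ("MN", "LA"):
        raise ValueError("cluster must be 'MN' or 'LA'")
    n = len(genes_left_to_right)
    # Visit positions in the direction in which the cluster is numbered.
    order = range(n) if cluster == "MN" else range(n - 1, -1, -1)
    counters = {}
    labels = [None] * n
    for i in order:
        kind = genes_left_to_right[i]
        counters[kind] = counters.get(kind, 0) + 1
        labels[i] = f"{cluster} {kind}{counters[kind]}"
    return labels


# Hypothetical gene orders, used only to show the two numbering directions.
mn_labels = label_cluster("MN", ["Hbb", "Hba", "Hba"])
la_labels = label_cluster("LA", ["Hba", "Hbb", "Hba"])
print(mn_labels)  # ['MN Hbb1', 'MN Hba1', 'MN Hba2']
print(la_labels)  # ['LA Hba2', 'LA Hbb1', 'LA Hba1']
```

The returned list keeps the left-to-right display order, so the same function can serve both clusters while the numbering direction differs.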
Fig. 1. Unscaled depiction of the genomic organization of the MN and LA
globin gene clusters from representative teleost fishes, with the human
α-globin gene cluster provided as reference. To facilitate
comparisons, all clusters are presented in the same orientation as the zebrafish.
Genes in the forward orientation are shown on top of the chromosome, whereas genes in
the reverse orientation are shown below.
Fig. 2. Genomic structure of the MN
and LA globin gene clusters of teleost fish. To facilitate comparisons, all clusters
are presented in the same orientation as the zebrafish. Genes in the forward
orientation are shown on top of the chromosome, whereas genes in the reverse
orientation are shown below. The green-spotted puffer globin genes are assumed to have
the same stage-specific expression profiles as their orthologous counterparts in fugu.
The Hbb pseudogene in the zebrafish MN cluster is not drawn to scale. Gene labels are
color coded based on the timing of their expression. Genes marked with an asterisk
were not included in the phylogenetic analyses.
Table 1. Data Sources, Genomic Coordinates and Orientations of the Globin Gene Clusters in Fugu, Green-Spotted
Puffer, Gar, Medaka, Platyfish, Tilapia, Stickleback, and Zebrafish.

Species                                   Release      Cluster     Location  Orientation      Start       End
Fugu (T. rubripes)                        Fugu 4.0     LA          Sc_3      lcmt1 → rhbdf1b  2,511,982   2,517,737
                                                       MN          Sc_15     kank → nprl3     417,195     420,598
Green-spotted puffer (Tet. nigroviridis)  46           LA          Chr 2     rhbdf1b → aqp8   5,887,638   5,893,221
                                                       MN          Chr 3     kank → nprl3     12,162,093  12,165,924
Gar (Lepisosteus oculatus)                LepOcu1      Hb          LG13      nprl3 → luc7l    2,809       54,885
Medaka (O. latipes)                       Medaka 1.0   LA          Chr 19    → aqp8           1,478,030   1,487,664
                                                       MN          Chr 8     nprl3 → kank     8,378,078   8,412,019
Platyfish (X. maculatus)                  Xipmac4.4.2  LA          JH557783  rhbdf1b → aqp8   22,438      37,618
                                                       MN          JH556906  nprl3 → kank     106,543     141,798
                                                       Unassigned  JH559524                   5,235       8,503
Tilapia (Ore. niloticus)                  Orenil1.0    LA          GL831136  rhbdf1b → aqp8   111,303     122,995
                                                       MN          GL831149  nprl3 → kank     110,462     169,554
Stickleback (G. aculeatus)                BROADS1      LA          Sc 112    c17orf28 → aqp8  339,530     343,463
                                                       MN          Gr XI     kank → nprl3     13,640,461  13,663,356
Zebrafish (D. rerio)                      Zv9          LA          Chr 12    rhbdf1b → aqp8   21,688,806  21,705,956
                                                       MN          Chr 3     nprl3 → kank     55,938,147  55,999,373
Note.—In all cases, data were obtained from
Ensembl. The start and end points correspond to the most distant edges from the two
genes on either end of the cluster.
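The Start and End coordinates in Table 1 can be converted into cluster spans as a quick consistency check. The Python sketch below is illustrative only and was not part of the original analysis. It copies three rows from Table 1 and assumes that the coordinates are 1-based and inclusive, so that the span is End − Start + 1. The names CLUSTERS, span_bp, and span_kb are hypothetical.

```python
# (species, cluster) -> (start, end), copied from three rows of Table 1.
CLUSTERS = {
    ("Fugu", "MN"): (417_195, 420_598),
    ("Zebrafish", "LA"): (21_688_806, 21_705_956),
    ("Gar", "Hb"): (2_809, 54_885),
}


def span_bp(start, end):
    """Span in base pairs, assuming 1-based inclusive coordinates."""
    return end - start + 1


def span_kb(start, end):
    """Span in kilobases, rounded to one decimal place."""
    return round(span_bp(start, end) / 1000, 1)


for (species, cluster), (start, end) in CLUSTERS.items():
    print(f"{species} {cluster}: {span_bp(start, end):,} bp "
          f"({span_kb(start, end)} kb)")
```

With these three rows, the sketch gives about 3.4 kb for the fugu MN cluster, 17.2 kb for the zebrafish LA cluster, and 52.1 kb for the gar cluster.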
Genomic Structure of the MN and LA Globin Gene Clusters in Teleosts
Patterns of Conserved Synteny
The genomic context of the teleost globin gene clusters is relatively well conserved,
especially in the case of the MN cluster. In all teleost species analyzed, there is
perfect conservation for the five genes to the left of the MN cluster:
aanat, mgrn1, rhbdf1a,
mpg, and nprl3 (fig. 1). The two genes to the right, kank2 and
dock6, are also conserved in all species. The
genomic organization of the LA globin gene cluster is not as strongly conserved. Four of
the seven teleost species possess a single copy of rhbdf1b to the left
of the LA cluster, which is paralogous to the rhbdf1a gene found
adjacent to the MN cluster. On the right side of the LA cluster, all teleost species
possess copies of lcmt1 and arhgap17. Each of the
teleost species possesses one or two copies of aqp8, with the exception
of the two tetraodontid species (fugu and green-spotted puffer), which have secondarily
lost this gene (fig. 1).
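A check for "perfect conservation" of flanking genes amounts to intersecting the flanking gene lists of all species. The Python sketch below illustrates this idea under stated assumptions and is not the EnsemblCompara or Genomicus workflow used in the study. The species names and the extra gene in the toy input are hypothetical placeholders; only the five MN-flanking gene names are taken from the text.

```python
def conserved_genes(flanks_by_species):
    """Return genes found in every species' flank, in the first species' order."""
    flanks = list(flanks_by_species.values())
    if not flanks:
        return []
    shared = set(flanks[0]).intersection(*flanks[1:])
    return [gene for gene in flanks[0] if gene in shared]


# Toy input: placeholder species, not real annotation data.
toy_left_flanks = {
    "species_A": ["aanat", "mgrn1", "rhbdf1a", "mpg", "nprl3"],
    "species_B": ["hypothetical_gene_x", "aanat", "mgrn1", "rhbdf1a", "mpg", "nprl3"],
    "species_C": ["aanat", "mgrn1", "rhbdf1a", "mpg", "nprl3"],
}

shared_left = conserved_genes(toy_left_flanks)
print(shared_left)  # ['aanat', 'mgrn1', 'rhbdf1a', 'mpg', 'nprl3']
```

This comparison considers gene presence only. Checking conserved gene order or orientation would require an additional step.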
Hb Gene Repertoires of Teleost Fish
There were several cases where our manual annotations of the globin gene clusters
differed from annotations provided in the most recent releases of the various teleost
genome assemblies. For example, no MN-linked globin genes were annotated in the most
recent release of the fugu genome in the Ensembl database. However, BLAST comparisons
with an independent record of the fugu MN globin cluster (AY016024) revealed the
presence of two unannotated α-globin genes between
nprl3 and kank2, as reported by Flint et al. (2001). In addition, the only
annotated β-globin gene in the green-spotted puffer genome
(green-spotted puffer LA Hbb1) contained a 4 bp insertion in the second
exon that would render it nonfunctional. Comparisons with cDNA-derived sequence
databases revealed several putatively functional transcripts that lacked the
inactivating 4 bp insertion but were otherwise identical in sequence. We assumed that
the insertion was either a sequencing or assembly artifact, and we therefore used the
cDNA-derived sequence for all further analyses.

The MN and LA clusters of the different species exhibited substantial variation in both
physical extent and gene content (fig. 2).
From the start codon of the first globin gene to the stop codon of the last globin gene,
the MN cluster ranged from 3.4 kb in fugu to 68.5 kb in zebrafish, and the LA cluster
ranged from 3.4 kb in stickleback to 17.2 kb in zebrafish. With respect to gene content,
the number of globin genes in these clusters ranged from 2 in the MN clusters of fugu
and green-spotted puffer and the LA cluster of stickleback, to 13 in the MN clusters of
tilapia and zebrafish (not including two genes with partial sequence coverage in the
tilapia assembly; fig. 2). Interspecific
comparisons revealed a higher rate of globin gene turnover in the MN cluster than in the
LA cluster. The MN clusters of fugu and green-spotted puffer possess only two
α-globin genes in the reverse orientation, whereas the MN
clusters of all other teleosts contain interspersed α- and
β-globin genes in both head-to-head and head-to-tail
orientations (fig. 2). In the case of
stickleback, all of the α-globin genes are found in the reverse
orientation, and all of the β-globin genes are in the forward
orientation (fig. 2). In all other teleosts,
in contrast, multiple α- and β-globin genes
are found in both forward and reverse orientations. In all species examined, the LA
cluster harbors two tandemly duplicated α-globin genes, and when
present, the β-globin genes are sandwiched in between the
α-globin genes but in the opposite orientation. The comparative
genomic analysis revealed that the β-globin genes of stickleback
are only present in the MN cluster, whereas the single β-globin
genes of fugu and green-spotted puffer are only present in the LA cluster. Thus, the
β-globin genes of stickleback and the two tetraodontid species
are not 1:1 orthologs. Furthermore, the set of three globin genes in the LA clusters of
fugu and green-spotted puffer appears to have been inverted relative to those of medaka,
stickleback, and zebrafish. This inversion hypothesis predicts that the LA
Hba1 genes from fugu and green-spotted puffer should be most closely
related to the LA Hba2 genes of medaka, platyfish, stickleback,
tilapia, and zebrafish.
The Globin Gene Cluster in Gar and the Origin of the MN and LA Clusters of
Teleosts
In most cases, orthologs of genes flanking the MN and LA globin clusters in teleost
fish are located in the vicinity of the α-globin gene cluster in
human and chicken, which appears to represent the ancestral location of the
proto-Hb gene in jawed vertebrates (Hoffmann, Opazo, and Storz 2012). The 2:1 pattern of conserved
synteny between teleost fish and tetrapods suggests that the MN and LA globin clusters
of teleost fish derive from the TGD, as suggested by Quinn et al. (2010). This inference is also supported by the
presence of duplicate copies of rhbdf1 in teleosts, which are
co-orthologs of the single-copy rhbdf1 in tetrapods. Additional
bioinformatic searches in the vicinity of the globin gene clusters revealed that most
teleosts also possess duplicate copies of shisa9 and
mlk2, one on the LA cluster and one on the MN cluster, that are
co-orthologous to single-copy genes on the same chromosome as the
α-globin gene cluster in human and chicken.

Two additional lines of evidence support the hypothesis that the LA and MN clusters
represent paralogous products of the TGD. First, we tested the prediction that the
spotted gar (a nonteleost ray-finned fish) would possess a single globin gene cluster,
since the gar and teleost lineages diverged before TGD. Consistent with this prediction,
our comparative genomic analysis revealed that the spotted gar does indeed possess a
single globin gene cluster, ∼52 kb in length, that contains 5
α- and 5 β-globin genes in both forward
and reverse orientations (fig. 2). The
cluster is flanked by copies of c16orf33, polr3k,
mgrn1, foxj1, aanat,
rhbdf1, mpg, and nprl3 on the left,
and by copies of luc7l and itfg3 on the right (fig. 1). Second, we tested the prediction that
the LA and MN gene clusters of teleosts descend from the same linkage group in the
reconstructed protokaryotype of the pre-TGD teleost ancestor. Consistent with this
prediction, an analysis of conserved synteny revealed that the MN and LA clusters of
medaka are embedded in paralogous chromosomal segments that trace their duplicative
origin to chromosome “e” in the pre-TGD teleost protokaryotype inferred by
Kasahara et al. (2007) and Nakatani et al. (2007).
Phylogenetic Relationships among Teleost α- and
β-Globin Genes
After characterizing the genomic organization of the globin gene clusters in spotted gar
and the seven teleost fish, we performed phylogenetic analyses to reconstruct the
duplicative history of the α- and β-globin
genes. For this analysis, we added the globin gene repertoires of cod and salmon to those
of fugu, green-spotted puffer, medaka, platyfish, stickleback, tilapia, and zebrafish, and
we also included sequences from representative tetrapods and cartilaginous fish for
comparative purposes. All of the different alignment strategies produced very similar
results for the α- and β-globin data sets,
and in both cases, we selected the L-INS-i alignment for use in the phylogenetic
reconstructions because it had the highest MUMSA score. Before estimating phylogenies, we
selected the best-fitting models of amino acid and nucleotide substitution based on the
Akaike information criterion with correction for small sample size. In analyses based on
nucleotide sequences, we selected an independent model for each codon position. Results of
the model estimation procedure can be found in supplementary
table S3, Supplementary
Material online.

The estimated phylogenies of vertebrate globin sequences suggested that neither the
α- nor the β-globin genes of ray-finned fish are
monophyletic relative to their tetrapod counterparts (fig. 3A and B). In the case of
α-globin genes, a clade of fish sequences that included a subset
of genes derived from the teleost LA cluster (LA Hba clade 1 + gar
Hba3) was placed sister to the chicken
α-globin gene, whereas all other fish
α-globins were placed in a second monophyletic group (fig. 3A). In the case of the
β-like globin genes, a clade of two gar sequences, including gar
Hbb4 and Hbb5, was placed sister to the chicken
β-globins. These arrangements suggest that multiple
α- and β-globin genes were present in the
common ancestor of Actinopterygii + Sarcopterygii.
Fig. 3. Maximum likelihood phylogram depicting
relationships among the globin sequences of seven representative teleost fishes.
Phylogenetic reconstructions were based on the coding sequences of
α- and β-globin genes (panels
A and B, respectively). Cartilaginous fish
globins were used as outgroup sequences, and tetrapod sequences were included for
comparative purposes. Values on the nodes denote bootstrap support values (above)
and Bayesian posterior probabilities (below). Branches are color coded according to
the location of the genes: MN-linked genes are shown in blue, LA-linked genes are
shown in orange, and the gar genes are in green. Labels are color coded based on the
timing of their expression. The substitution models selected are listed in supplementary table S3, Supplementary Material online.

The phylogeny shown in figure
3A revealed that fish α-globins can be
arranged into two distinct clades, defined by the presence of gar Hba2
and gar Hba3, respectively. In turn, teleost
α-globins were arranged into five clades that (with the exception
of the cod LA Hba2 sequence) reflect their cluster of origin. The discordant position of
the cod LA Hba2 sequence probably represents an assembly artifact. Aside
from this cod sequence, all α-globin genes derived from the LA
cluster were grouped into two strongly supported clades: LA Hba clade 1
is sister to gar Hba3, and LA Hba clade 2 is embedded in
a strongly supported clade that includes all MN α-globin sequences
in addition to LA Hba2 from cod and Hba2,
Hba4, and Hba5 from spotted gar (fig. 3A). Genealogical relationships within
these two clades of LA α-globins are largely congruent with the
known organismal relationships, and in both cases, the deepest split separated the
zebrafish genes from those of the remaining euteleost taxa. As expected under the
cluster-inversion hypothesis, the leftmost α-globin genes in the LA
cluster of fugu and green-spotted puffer are most closely related to the rightmost
α-globin genes of medaka, platyfish, stickleback, tilapia, and
zebrafish, and vice versa (fig.
3A). Relationships among the α-globin
sequences in the MN cluster are more complex and are not easily reconciled with the
organismal phylogeny. The MN-linked genes are organized into three weakly supported clades
(fig. 3A). MN
Hba clade 1 contains salmon and platyfish sequences in addition to two
gar sequences, whereas MN Hba clade 2 contains zebrafish and salmon
sequences. MN Hba clade 3 was placed sister to LA Hba
clade 2 and includes sequences from all teleosts in addition to cod LA
Hba2. All species examined possess an α-globin
gene repertoire that includes representatives of at least three of the five clades, and
zebrafish possesses α-globin genes that are represented in four of
the five clades.

In contrast to the α-globin genes, all teleost
β-globin genes were placed in a moderately well-supported clade,
which was placed sister to a clade of two gar Hbb sequences
(Hbb1 and Hbb2). The other two gar
Hbb sequences were placed sister to chicken Hbbs
(fig. 3B). The
β-globin genes could be arranged into four separate clades, three
of which were strongly supported, with sequences from the MN cluster forming a
paraphyletic group relative to those from the LA cluster. The
β-globins from the LA cluster were placed in a monophyletic group,
while those from the MN cluster can be grouped into three separate clades, with the
exception of cod MN Hbb1, which is distantly related to the rest. MN
Hbb clade 1 contains sequences from medaka, salmon, tilapia, and
zebrafish; MN Hbb clade 2 contains sequences from platyfish, salmon,
tilapia, and zebrafish; and MN Hbb clade 3 contains sequences from cod,
medaka, platyfish, salmon, stickleback, tilapia, and zebrafish (fig. 3B). Within each of these clades, paralogs
from the same species almost invariably formed monophyletic groups, which likely reflects
a history of lineage-specific duplication, as with the Hba genes of the MN cluster. This
is particularly clear in the case of MN Hbb clade 3, where relationships
among the different paralogs are congruent with the known organismal phylogeny after
accounting for lineage-specific duplications.

With the exception of β-globin genes from the LA cluster, globin
genes of the same subunit type from the MN or LA clusters did not form monophyletic
groups. Taken together, the analyses of conserved synteny (figs. 1 and 2) and
the phylogenetic reconstructions (fig. 3)
indicate that the pre-TGD globin gene cluster of teleost fish contained at least two
α-globin genes and two β-globin genes.
Further, the positions of the gar sequences in the phylogenies of
α- and β-like globins indicate that multiple
globins of each subunit type were present in the common ancestor of gar and teleosts. If
further analyses confirm the paraphyly of ray-finned fish α- and
β-globins relative to their tetrapod homologs, it would indicate
that multiple α- and β-globin genes were
present in the common ancestor of Actinopterygii and Sarcopterygii. As for teleosts, after
the TGD but before divergence between zebrafish and the remaining euteleost species, one
of the two ancestral β-globin paralogs in the LA cluster was
secondarily lost such that the post-TGD globin repertoire was reduced from 8 to 7 genes
(fig. 4). Similar lineage-specific patterns
of gene turnover have been documented in the α- and
β-globin gene clusters of mammals and other vertebrates (Hoffmann et al. 2008a, 2008b; Opazo et al.
2008a, 2008b, 2009; Hoffmann, Storz, et
al. 2010). On a deeper evolutionary timescale, lineage-specific duplications and
deletions have produced extensive variation in the size and membership composition of the
globin gene superfamily among different vertebrate classes and among different
deuterostome phyla and subphyla (Ebner et al.
2010; Hoffmann, Storz, et al. 2010;
Hoffmann et al. 2011; Storz et al. 2011; Hoffmann, Opazo, Hoogewijs, et al. 2012; Hoogewijs et al. 2012).
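The monophyly statements made in this section can be checked mechanically on a rooted tree: a set of sequences is monophyletic if some clade contains exactly those sequences and no others. The sketch below uses a toy β-globin topology that loosely follows the description above (teleost genes sister to gar Hbb1 and Hbb2, and two other gar genes sister to chicken). The nested-tuple tree encoding, the generic teleost and outgroup labels, and the placement of the two groups relative to each other are assumptions made for illustration, not the published tree.

```python
def leaf_sets(tree):
    """Return (leaves of this subtree, leaf sets of every clade inside it)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def is_monophyletic(tree, taxa):
    """True if some clade of the rooted tree contains exactly the given taxa."""
    _, clades = leaf_sets(tree)
    return frozenset(taxa) in clades


# Toy rooted tree written as nested tuples (hypothetical labels).
toy_tree = (
    (
        (("teleost_Hbb_a", "teleost_Hbb_b"), ("gar_Hbb1", "gar_Hbb2")),
        (("gar_Hbb4", "gar_Hbb5"), "chicken_Hbb"),
    ),
    "shark_Hbb",
)

teleost = {"teleost_Hbb_a", "teleost_Hbb_b"}
ray_finned = teleost | {"gar_Hbb1", "gar_Hbb2", "gar_Hbb4", "gar_Hbb5"}

print(is_monophyletic(toy_tree, teleost))
print(is_monophyletic(toy_tree, ray_finned))
```

In this toy tree the teleost genes form a clade, whereas the ray-finned fish genes do not, because two gar genes group with the chicken sequence. That is the pattern that would imply multiple β-globin genes in the common ancestor of Actinopterygii and Sarcopterygii.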
Fig. 4. Evolutionary model describing the duplicative origins of the LA
and MN globin gene clusters of teleost fish and the inferred globin gene repertoire
in the common ancestor of teleosts and gar, a nonteleost ray-finned fish. All
clusters depicted are hypothetical with the exception of the gar cluster. The order
of the α- and β-globin genes on the
hypothetical clusters is arbitrary.

The phylogenies in figure 3 indicate that all
salmon α- and β-globin genes are exclusively
found in association with MN-linked globin genes from other species. This reflects the
fact that salmonid fish have experienced an additional lineage-specific genome duplication
and that all globin genes were deleted from the duplicated LA clusters and were retained
exclusively in the duplicated MN clusters (Quinn et
al. 2010). With the exception of fugu and green-spotted puffer, which possess
identical globin gene repertoires, all species in our study have expanded their repertoire
of α- and β-globin genes via lineage-specific
duplications, which are much more frequent in the MN cluster. The most striking contrast is between the MN
α-globins from platyfish, tilapia, and zebrafish, and the MN
β-globins from stickleback. The stickleback
β-globins in the MN cluster derive from a recent set of
duplications, whereas the α-globins from the MN clusters of
platyfish, tilapia, and zebrafish derive from recent, lineage-specific
duplications of genes that themselves trace back to more ancient duplications that likely occurred before
the TGD.

In addition to the differences in timing, these lineage-specific duplications also appear
to involve different mechanisms. In many instances, the expansions derive from single gene
duplications, such as the one giving rise to the duplicate Hbb paralogs in the zebrafish
LA cluster. On the other hand, the structure of the MN clusters of medaka, stickleback,
and zebrafish suggests that en bloc duplications are partly responsible for their
lineage-specific expansions in gene family size. In the case of the stickleback, the
presence of extensive internal colinearity within the MN cluster suggests that it expanded
by en bloc duplications involving either the
Hbb–Hba pair or an
Hbb–Hba–Hbb–Hba
four-gene set (fig. 5). The same can be said
for the zebrafish MN Hba2–Hbb2 and MN Hba3–Hbb3
gene pairs (supplementary
fig. S1, Supplementary
Material online). However, comparisons of zebrafish MN
Hba4–6 and MN
Hbb4–6 gene pairs revealed low
levels of sequence similarity in flanking regions (supplementary
fig. S1, Supplementary
Material online).
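The reasoning behind the dot-plot comparison can be sketched in a few lines: an en bloc duplication leaves long tracts of matching words off the self-identity diagonal when a sequence is compared with itself. The example below is a minimal word-match implementation and not the tool used in the study. The toy sequences, the word size, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def self_dot_plot(seq, word=8):
    """Return sorted (i, j) pairs with i < j where the same word starts at i and j."""
    positions = defaultdict(list)
    for i in range(len(seq) - word + 1):
        positions[seq[i:i + word]].append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def longest_off_diagonal_run(hits):
    """Longest tract of consecutive word matches along one off-diagonal."""
    hit_set = set(hits)
    best = 0
    for i, j in hits:
        if (i - 1, j - 1) in hit_set:
            continue  # this hit continues a run that started earlier
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        best = max(best, length)
    return best


# Toy sequences: one block followed by a spacer, with and without a second copy.
unit = "ACGTAGCTAGGATCCATGCAATCG"
spacer = "TTGACCGTAAGC"
single = unit + spacer
duplicated = unit + spacer + unit

for name, seq in (("single", single), ("duplicated", duplicated)):
    hits = self_dot_plot(seq, word=6)
    print(name, len(hits), longest_off_diagonal_run(hits))
```

The duplicated toy sequence yields a long off-diagonal tract at an offset equal to the distance between the two copies, which is the kind of internal colinearity described for the stickleback MN cluster. Real comparisons would also need to mask low-complexity sequence, which this sketch does not attempt.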
Fig. 5. Dot plots of intrachromosomal sequence similarity in the MN
globin gene clusters of medaka and stickleback. The fragment includes all genes in
the clusters in addition to 5 kb of flanking sequence. The diagonal self-identity
plot is shown in gray, as are the low-complexity areas in the medaka cluster. Note
that the intragenomic dot plot for the stickleback gene cluster shows longer tracts
of internal similarity off the self-identity diagonal relative to that for the
medaka gene cluster, shown in black.
Repeated Evolutionary Transitions in Functional Properties and Stage-Specific Expression
In light of evidence that the developmental regulation of Hb synthesis has evolved
independently in multiple tetrapod lineages (Hoffmann, Storz, et al. 2010; Storz et
al. 2011, 2012), we tested for
evidence of a similar phenomenon in teleosts by reconstructing phylogenetic relationships
among α- and β-like globin genes that are
differentially expressed during development. For the purposes of this analysis, globin
genes were classified as “early-expressed” if they are preferentially
expressed during embryonic or larval developmental stages, whereas genes were classified
as “late-expressed” if they are preferentially expressed in juveniles or
adults (supplementary
table S4, Supplementary
Material online, fig. 2). Since
fugu and green-spotted puffer possess a single β-globin gene, we
assumed that this gene is expressed during all ontogenetic stages. For comparative
purposes, we included additional teleost globins that are known to be preferentially
expressed during embryogenesis in channel catfish (Ictalurus punctatus;
Chen et al. 2010), rainbow trout
(Oncorhynchus mykiss; Maruyama
et al. 1999), and salmon (Leong et al.
2010). We also analyzed late-expressed α- and
β-globin genes whose products are incorporated into tetrameric Hbs
with highly distinct functional properties, such as the
well-characterized anodic and cathodic Hbs of the European eel (Anguilla
anguilla; Fago et al. 1995, 1997) and dusky notothen (Trematomus
newnesi; Mazzarella et al. 1999).
Expression data for cod, medaka, and zebrafish were obtained from the literature. The cod
sequences were classified following Wetten et al.
(2010), the medaka sequences were classified following Maruyama, Yasumasu and Iuchi (2004), and the zebrafish sequences
were classified following Tiedke et al.
(2011). For globin genes in fugu, gar, green-spotted puffer, platyfish, salmon,
and tilapia, we inferred the timing of expression by identifying matches with sequences in
EST databases (supplementary
table S4, Supplementary
Material online). Sequences with no EST matches from the same
species as the query, or whose EST matches lacked developmental information,
were left unclassified.

Intriguingly, results of our analyses revealed repeated evolutionary transitions in
stage-specific expression during development. In some cases, paralogous genes in different
species evolved convergent expression patterns, and in other cases, orthologous genes
evolved divergent expression patterns. In the case of the α-like globin genes, LA
Hba clades 1 and 2 provide clear examples of probable 1:1 orthologs
that evolved differences in stage-specific expression (e.g., the early-expressed zebrafish
LA Hba1 and the late-expressed medaka LA Hba1; fig. 3A). In the case of
β-like globin genes, LA Hbb clade 1 illustrates a
similar pattern of replicated expression divergence (e.g., the early-expressed zebrafish
LA Hbb1 and Hbb2 genes are clearly co-orthologous to
the adult-expressed medaka LA Hbb1; fig. 3B). These results demonstrate that the developmental
timing of globin gene expression is evolutionarily labile.

In the α- and β-globin gene clusters of most amniotes, the
linear order of the genes reflects their temporal order of expression during development,
with early-expressed genes at the 5′ end of the cluster and late-expressed genes at
the 3′ end of the cluster (Hardison
2001). In the globin gene clusters of teleosts, in contrast, linear gene order is
not as strong a predictor of stage-specific expression. In the case of the zebrafish MN
cluster, all late-expressed genes are on the left and all early-expressed genes are on the
right, whereas in medaka, all genes on the left side are early-expressed and the genes on
the right are variable with respect to the developmental timing of expression, and in
tilapia, the early- and late-expressed genes are interspersed. Our results indicate that
the genes in the LA cluster provide the clearest evidence of lineage-specific changes in
gene expression.

Since embryonic/fetal Hbs and adult-expressed Hbs exhibit consistent differences in
O(2) affinity and sensitivity to allosteric regulators
(Ingermann 1997), convergence in
stage-specific expression also likely entailed convergence in functional properties.
Similarly, adult α- and β-globin genes that
encode the subunits of cathodic Hbs of European eel and dusky notothen are clearly not 1:1
orthologs (fig. 6), indicating that
specialized Hbs with similar functional properties evolved independently in different
teleost lineages. In fact, the dusky notothen cathodic Hba is closely related to sequences
in the LA cluster, whereas the eel cathodic Hba is closely related to sequences in the MN
cluster, suggesting that their duplicative origins trace back at least to the TGD. Consistent
with other studies of vertebrate globins (Berenbrink
et al. 2005; Hoffmann et al. 2010),
these results demonstrate that similar expression patterns and functional properties in
the Hbs of distinct lineages may sometimes represent products of convergent evolution.
Although tandemly duplicated globin genes often evolve in concert due to interparalog gene
conversion (Hoffmann et al. 2008a, 2008b; Opazo et al. 2009; Runck et al.
2009; Storz et al. 2011),
paralogous genes that are products of genome duplications (also known as
“ohnologs”) can escape the homogenizing effects of gene conversion because
they are located on different chromosomes. This is one possible reason why paralogous gene
copies derived from genome duplications may be more likely to diverge in function than
tandem gene duplicates.
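One way to make the notion of repeated transitions in stage-specific expression concrete is to count the minimum number of changes in expression state that a gene tree requires. The sketch below uses Fitch parsimony on a binary tree for this purpose. This is a technique substituted here for illustration, not the inference procedure of the study, and the gene labels, tree shapes, and expression states are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical gene tree: two paralog clades (A and B), each sampled in two species.
gene_tree = (("sp1_geneA", "sp2_geneA"), ("sp1_geneB", "sp2_geneB"))

# Scenario 1: expression timing is conserved within each paralog clade.
conserved = {"sp1_geneA": "early", "sp2_geneA": "early",
             "sp1_geneB": "late", "sp2_geneB": "late"}

# Scenario 2: orthologs differ in timing within both clades.
labile = {"sp1_geneA": "early", "sp2_geneA": "late",
          "sp1_geneB": "late", "sp2_geneB": "early"}

print(fitch(gene_tree, conserved)[1])
print(fitch(gene_tree, labile)[1])
```

Under the conserved scenario a single change suffices, whereas the labile scenario requires at least two independent changes. The second pattern corresponds to orthologs that differ in expression timing and paralogs in different species that share it.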
Fig. 6. Maximum likelihood phylogram depicting relationships among the
globin genes of the seven fish species for which full genome sequence data were
available, plus sequences of functionally annotated globins from other teleost
species. The phylogenetic reconstructions were based on the amino acid sequences of
α- and β-globins (panels
A and B, respectively). Cartilaginous fish
globins were used as outgroup sequences, and tetrapod sequences were included for
comparative purposes. Values on the nodes denote bootstrap support values (above)
and Bayesian posterior probabilities (below). Genes are color coded according to
their time of expression. Genes expressed at the embryonic/larval stages are shown
in magenta, genes expressed at the juvenile and/or adult stages are shown in light
blue, and genes expressed across all ontogenetic stages are shown in dark blue.
Genes with no record of expression and genes from nonactinopterygian vertebrates are
shown in gray. The substitution models selected are listed in supplementary table S3, Supplementary Material online.
Conclusion
Results of our combined phylogenetic and comparative genomic analyses indicate that some of
the teleost α- and β-like globins are
representatives of ancient gene lineages, with duplicative origins that trace back at least
to the common ancestor of gar and teleost fish, and potentially back to the common ancestor
of Actinopterygii and Sarcopterygii (superclass Osteichthyes). Such a scenario is consistent
with the fact that Hb multiplicity has also been documented in cartilaginous fish (Fyhn and Sullivan 1975; Mumm et al. 1978; Weber et
al. 1983; Galderisi et al. 1996;
Dafre and Reischl 1997). Our results
indicate that the common ancestor of ray-finned fish possessed a fairly diverse globin gene
repertoire, and in teleosts, this inherited repertoire was further augmented by the TGD,
which produced dual sets of α- and β-like
globin genes on two paralogous chromosomes. These TGD-derived gene clusters underwent
lineage-specific changes in size and membership composition, and the MN gene cluster
underwent an especially high rate of gene turnover. The phylogenetic analyses of teleost
globins revealed repeated transitions in stage-specific expression patterns, demonstrating a
surprising fluidity in the genetic regulatory control of Hb synthesis during
development.
Supplementary Material
Supplementary
tables S1–S4, figure S1, and data file S1 are available at Molecular Biology and Evolution
online (http://www.mbe.oxfordjournals.org/).