
Whole Genome Sequencing of the Asian Arowana (Scleropages formosus) Provides Insights into the Evolution of Ray-Finned Fishes.

Christopher M Austin, Mun Hua Tan, Larry J Croft, Michael P Hammer, Han Ming Gan.

Abstract

The Asian arowana (Scleropages formosus) is of commercial importance and conservation concern, and is a representative of one of the oldest lineages of ray-finned fish, the Osteoglossomorpha. To add to genomic knowledge of this species and the evolution of teleosts, the genome of a Malaysian specimen of arowana was sequenced. A draft genome is presented consisting of 42,110 scaffolds with a total size of 708 Mb (2.85% gaps) representing 93.95% of core eukaryotic genes. Using a k-mer-based method, a genome size of 900 Mb was also estimated. We present an update on the phylogenomics of fishes based on a total of 27 species (23 fish species and 4 tetrapods) using 177 orthologous proteins (71,360 amino acid sites), which supports established relationships except that the arowana, rather than the eels (Elopomorpha), is placed as the sister lineage to all other teleost clades that evolved after the teleost genome duplication event (Bayesian posterior probability 1.00, bootstrap support 93%). Evolutionary rates are highly heterogeneous across the tree, with fishes represented by both slowly and rapidly evolving lineages. A total of 94 putative pigment genes were identified, providing the impetus for development of molecular markers associated with the spectacular colored phenotypes found within this species.
© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  evolutionary rate; fish; genome; phylogenomics; pigmentation genes

Year:  2015        PMID: 26446539      PMCID: PMC4684697          DOI: 10.1093/gbe/evv186

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

More than half of all vertebrate species are fishes, with the Class Osteichthyes (bony fish) being the most diverse class within the Subphylum Vertebrata (Santini et al. 2009; Near et al. 2012; Betancur-R et al. 2013). Fish have a long evolutionary history extending over 500 Myr into the Cambrian, with the evolution of the jawless fishes, which are currently represented by the lampreys (Agnatha). Jawed fishes (Gnathostomata) evolved some 450 Ma and are divided among three lineages: the cartilaginous fishes (Chondrichthyes), the bony fishes (Osteichthyes), and the lobe-finned fishes (Sarcopterygii). With the availability of more molecular genetic and genomic data, there has been increasing interest in understanding the diversification of the major fish groups and the molecular evolutionary dynamics of fish lineages, their timing, and evolution of specific genes (Inoue et al. 2003; Takezaki et al. 2004; Shan and Gras 2011; Near et al. 2012; Zou et al. 2012; Amemiya et al. 2013; Betancur-R et al. 2013; Broughton et al. 2013; Opazo et al. 2013; Dornburg et al. 2014; Venkatesh et al. 2014). Of the three lineages in which fish are found, the bony fishes are by far the most diverse, with nearly 30,000 recognized species, and there has been much interest in understanding the drivers of their evolutionary success. Significant attention has been given to the impact of what is generally known as the fish- or teleost-specific genome duplication event (TGD) (Robinson-Rechavi et al. 2001; Hoegg et al. 2004; Hurley et al. 2005). Chromosomal duplications may provide opportunities for evolutionary experimentation, as paralogous genes are exapted to new functions, thereby facilitating rapid morphological, physiological, and behavioral diversification (Taylor et al. 2001; Hoegg et al. 2004; Meyer and Van de Peer 2005; Santini et al. 2009; Opazo et al. 2013).
The Asian arowana (Scleropages formosus: Osteoglossidae) is of fundamental interest to fish phylogenetics as it belongs to one of the oldest teleost groups, the Osteoglossomorpha. This lineage comprises the mooneyes, knifefish, elephantfish, freshwater butterflyfish, and bonytongues, and is one of the three ancient extant lineages that diverged immediately after the TGD. The other two are the Elopomorpha, comprising eels, tarpons, and bonefish, and the Clupeocephala, which embraces the majority of teleost diversity including the species-rich Ostariophysi (e.g., catfish, carps and minnows, tetras) and Percomorphaceae (e.g., wrasse, cichlids, gobies, flatfish) (Betancur-R et al. 2013; Broughton et al. 2013; Betancur-R, Naylor, et al. 2014; Betancur-R, Wiley, et al. 2014). There has been ongoing disagreement on which one is the sister group to all other teleosts (Patterson and Rosen 1977; Nelson 1994; Arratia 1997; Patterson 1998; Zou et al. 2012). Historically, the Osteoglossomorpha were considered to have diverged first (Patterson and Rosen 1977; Lauder and Liem 1983; Nelson 1994; Inoue et al. 2003; Brinkmann et al. 2004); however, comprehensive morphological studies, including both fossil and extant teleosts, and recent molecular-based studies supported the Elopomorpha as the sister lineage to all other bony fishes (Arratia 1997, 1999, 2000; Li and Wilson 1999; Diogo 2007; Santini et al. 2009; Near et al. 2012; Betancur-R et al. 2013; Broughton et al. 2013). The arowana, sometimes also referred to as dragon fish, is also noteworthy as it is one of the most expensive fish in the world due to the occurrence of several bright color morphs that make it highly sought after as an ornamental species (Dawes et al. 1999; Yue et al. 2006). Potentially relevant in this context is that teleost fishes are thought to have a greater range of pigment synthesis genes and pathways than any other vertebrate group (Braasch et al. 2009).
However, the basis of color variation has seen little research in arowana, with the exception of studies by Mohd-Shamsudin et al. (2011) and Mu et al. (2012), who found no consistent patterns of divergence between color variants and mitochondrial markers. Scleropages formosus is also of significant conservation concern in the wild. The species is listed by the International Union for Conservation of Nature (IUCN) as endangered (Kottelat 2013) and by the Convention on International Trade in Endangered Species of Wild Fauna and Flora as “highly endangered” (Yue et al. 2006). In this study, we present the whole-genome sequence for S. formosus obtained from a captive Malaysian specimen, as a representative of the local wild form. We then place this species within a phylogenetic framework including sequences from all available fish with sequenced genomes, making this the most complete phylogenomic analysis of fish so far conducted. We also analyze the rate of molecular evolution within and between fish lineages and identify a range of genes associated with pigmentation.

Genome Sequencing, Assembly, and Annotation

A total of 297,227,578 paired-end and 290,438,918 mate-pair reads (2 × 100 bp) were generated. Preprocessing resulted in 291,628,300 paired-end and 288,008,898 mate-pair reads, and these were subsequently assembled to generate a draft genome that consists of 42,110 scaffolds with a total size of 708 Mb and 2.85% gaps. The longest scaffold is 616,488 bp long and the N50 scaffold length is 58,849 bp. We also carried out a k-mer-based approach using read data and estimated the arowana genome size at approximately 900 Mb, a number in accord with the size of 1.05 Gb reported by Shen et al. (2014), estimated through flow cytometric comparative fluorescence with chicken cells. Based on these estimates, sequencing depths ranging from 57× to 66× coverage were inferred. Features predicted from the assembly include 24,274 protein-coding genes, 609 transfer RNAs (tRNAs), and 29 ribosomal RNAs (100% 5S rRNA). Based on sequence similarity (e-value threshold of 1 × 10^-10, hit coverage cut-off of 70%), 71% of the predicted genes shared sequence similarity to another protein in the nonredundant (NR) database on National Center for Biotechnology Information (NCBI). For protein-coding genes, 95.8% have Annotation Edit Distance (Eilbeck et al. 2009) scores of less than 0.5 and 85.5% contain at least one Pfam domain, an indication of a well-annotated genome (Campbell et al. 2014). The gene space in this assembly appears fairly complete, with 93.95% of core eukaryotic genes represented. This is further supported by the mapping of 78.92% of transcriptomic reads sequenced from a different arowana sample from Shen et al. (2014) to our assembled genome, with 64.32% of unmapped reads belonging to 18S and 28S ribosomal genes and 7.60% to mitochondrial genes. These genes are usually present in high copy numbers and may not have been assembled in our de novo assembly due to exceedingly high read coverage and short read lengths (Nagarajan and Pop 2013).
This finding is also consistent with the lack of specific rRNAs (18S, 28S) predicted from the assembly.
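The summary statistics quoted above (total size, longest scaffold, N50, and gap percentage) can be recomputed from any set of scaffold sequences. The following is a minimal sketch, not the authors' script; it assumes that gaps are represented as runs of the character N, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def gap_fraction(sequences):
    """Fraction of assembled bases that are gap characters.
    Assumption: gaps are written as 'N' (or 'n') in the scaffolds."""
    total = sum(len(s) for s in sequences)
    gaps = sum(s.upper().count("N") for s in sequences)
    return gaps / total


if __name__ == "__main__":
    # Toy scaffolds, for illustration only.
    scaffolds = ["ACGTNNACGT", "ACGTAC", "ACGN"]
    print(n50([len(s) for s in scaffolds]), gap_fraction(scaffolds))
```

Applied to the scaffold lengths of a real assembly, the same two functions would give values comparable to the N50 and gap percentage reported here.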

Phylogenomics and Evolutionary Rates

Our sample of arowana shows 100% identity to the most common mitochondrial cytochrome c oxidase subunit 1 (COI) haplotype (accession number: HM156394) found among Malaysian specimens by Mohd-Shamsudin et al. (2011) and is 99.87% similar to the complete COI gene (accession number: DQ023143) from a fish obtained from a commercial farm in Singapore (Yue et al. 2006). Tree-based ortholog inference resulted in a set of orthologous proteins belonging to 177 gene families (supplementary material S1, Supplementary Material online) shared across all 23 fishes and 4 tetrapod species (table 1). Concatenation of the aligned orthologs generated a final supermatrix comprising a total of 71,360 amino acid sites per species with only 7.07% gaps. The aligned supermatrix and the best-fit partitioning scheme generated by PartitionFinder can be found in supplementary materials S2 and S3, Supplementary Material online. Rooted with the Chondrichthyes, both Bayesian (BI) and maximum-likelihood (ML) inferred phylogenomic trees display a topology largely consistent with recent studies with either more limited taxon sampling (Zou et al. 2012; Amemiya et al. 2013) or smaller gene sampling (Broughton et al. 2013; Glasauer and Neuhauss 2014; Braasch et al. 2015) with respect to evolutionary relationships and taxonomic classification (fig. 1).
Table 1

List of Species Included in the Phylogenetic Analyses

| Organism | Source | Scientific Name | Class | Order | Reference |
|---|---|---|---|---|---|
| Ray-finned fish | | | | | |
| Asian arowana | * | Scleropages formosus | Actinopterygii | Osteoglossiformes | This study |
| European eel | Z | Anguilla anguilla | Actinopterygii | Anguilliformes | Henkel et al. (2012) |
| Medaka | E | Oryzias latipes | Actinopterygii | Beloniformes | Kasahara et al. (2007) |
| Blind cave fish | E | Astyanax mexicanus | Actinopterygii | Characiformes | McGaugh et al. (2014) |
| Common carp | C | Cyprinus carpio | Actinopterygii | Cypriniformes | Xu et al. (2014) |
| Zebrafish | E | Danio rerio | Actinopterygii | Cypriniformes | Howe et al. (2013) |
| Amazon molly | E | Poecilia formosa | Actinopterygii | Cyprinodontiformes | Unpublished |
| Southern platyfish | E | Xiphophorus maculatus | Actinopterygii | Cyprinodontiformes | Schartl et al. (2013) |
| Northern pike | V | Esox lucius | Actinopterygii | Esociformes | Rondeau et al. (2014) |
| Atlantic cod | E | Gadus morhua | Actinopterygii | Gadiformes | Star et al. (2011) |
| Three-spined stickleback | E | Gasterosteus aculeatus | Actinopterygii | Gasterosteiformes | Jones et al. (2012) |
| Electric eel | F | Electrophorus electricus | Actinopterygii | Gymnotiformes | Gallant et al. (2014) |
| Spotted gar | E | Lepisosteus oculatus | Actinopterygii | Lepisosteiformes | Unpublished |
| Nile tilapia | E | Oreochromis niloticus | Actinopterygii | Perciformes | Brawand et al. (2014) |
| Atlantic salmon | SA | Salmo salar | Actinopterygii | Salmoniformes | Davidson et al. (2010) |
| Rainbow trout | G | Oncorhynchus mykiss | Actinopterygii | Salmoniformes | Berthelot et al. (2014) |
| Japanese puffer | E | Takifugu rubripes | Actinopterygii | Tetraodontiformes | Aparicio et al. (2002) |
| Green spotted puffer | E | Tetraodon nigroviridis | Actinopterygii | Tetraodontiformes | Jaillon et al. (2004) |
| Lobe-finned fish | | | | | |
| African coelacanth | E | Latimeria chalumnae | Sarcopterygii | Coelacanthiformes | Amemiya et al. (2013) |
| Lungfish^a | SR | Protopterus annectens | Sarcopterygii | Lepidosireniformes | Amemiya et al. (2013) |
| Cartilaginous fish | | | | | |
| Elephant shark | A | Callorhinchus milii | Chondrichthyes | Chimaeriformes | Venkatesh et al. (2014) |
| Small-spotted catshark^b | SK | Scyliorhinus canicula | Chondrichthyes | Carcharhiniformes | Wyffels et al. (2014) |
| Little skate^b | SK | Leucoraja erinacea | Chondrichthyes | Rajiformes | Wang et al. (2012) |
| Tetrapods | | | | | |
| Western clawed frog | E | Xenopus tropicalis | Amphibia | Anura | Fuchs et al. (2006) |
| Chicken | E | Gallus gallus | Aves | Galliformes | Hillier et al. (2004) |
| Human | E | Homo sapiens | Mammalia | Primates | Venter et al. (2001) |
| Lizard | E | Anolis carolinensis | Reptilia | Squamata | Alföldi et al. (2011) |

Note.—Codes for source: A*STAR (A), CarpBase (C), Ensembl (E), efish genomics (F), Genoscope (G), SalmonDB (SA), SkateBase (SK), SRA (SR), UVic (V), ZF Genomics (Z), this study (*).

^a Raw transcriptome reads were used.

^b Assembled transcripts were used.

Fig. 1.—Phylogenetic relationships among fish species. The phylogenetic tree was inferred from a supermatrix containing the alignment of sequences from 27 species (177 orthologous proteins, 71,360 aligned amino acid positions, 7.07% gaps) and was rooted with the Chondrichthyes. Black circles indicate maximum nodal support with bootstrap values of 100% and Bayesian posterior probabilities of 1.00. The yellow and green circles represent 93% and 98% bootstrap support values, respectively, both with maximal Bayesian posterior probability values of 1.00. Branch length information is included and the rate of molecular evolution (number of amino acid substitutions per site) for each fish lineage is placed beside each taxon label. These values were calculated from the split of all ray-finned fish from lobe-finned fish and tetrapod lineages (node indicated with the orange star). A (T) is placed next to the species for which transcriptome data were utilized.

The rapid and divergent evolution of certain ray-finned fish groups is apparent in the tree from the relatively long branch lengths. Substantial evolutionary rate heterogeneity is observed within and among fish lineages by the comparison of amino acid substitutions per site calculated from branch lengths (fig. 1). Furthermore, based on Tajima’s relative rate test (supplementary material S4, Supplementary Material online), the Asian arowana was reported to have a significantly different evolutionary rate in comparison with other ray-finned fish lineages with P values ranging from 0 to 0.00048 (European eel).
Using a Bonferroni-corrected critical P value of 0.00098 (equivalent to α = 0.05 for a single test) results in the rejection of the null hypothesis of equal rates of evolution between the arowana lineage and all other fish species. A major difference between our estimated phylogenetic relationships and those of other recent studies is the placement of the arowana as the sister lineage to all other teleost lineages, which conflicts with morphology-based studies and more recent molecular perspectives that posit the Elopomorpha as the sister group to all other teleost lineages (Arratia 1997, 1999; Li and Wilson 1999; Diogo 2007; Broughton et al. 2013; Glasauer and Neuhauss 2014). However, our result is consistent with other studies that have the Osteoglossomorpha as the sister lineage to all other teleosts (Patterson and Rosen 1977; Lauder and Liem 1983; Nelson 1994; Inoue et al. 2003; Brinkmann et al. 2004). We look forward to more comprehensive genomic resources becoming available with greater taxon sampling for teleost fishes to allow more rigorous testing of these alternative hypotheses. Our results support the findings of Amemiya et al. (2013), who found the lungfish, and not the coelacanth, to be the closest relative of the tetrapods, a question that has also been the subject of much dispute (Brinkmann et al. 2004; Takezaki et al. 2004; Shan and Gras 2011). However, although we also found that the coelacanth proteins evolve at a slower rate relative to those of the tetrapods, from figure 1 it can be seen that the substitution rate in the coelacanth lineage is more than half of that for the tetrapod lineage, which is substantially faster than that observed by Amemiya et al. (2013). This discrepancy is most likely a result of the use of different protein data sets, taxon sampling, and outgroups in the two studies and provides a caveat for generalizing results from a single study even when utilizing information from a large number of genes.
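For readers unfamiliar with the relative rate comparison reported above, the sketch below illustrates the usual form of Tajima's test on three aligned sequences, together with a Bonferroni threshold. It is an illustration under stated assumptions rather than the analysis actually run for supplementary material S4: the function names are hypothetical, gapped columns are simply skipped, and the number of comparisons (51) is not given in the text but is chosen here only because 0.05/51 is approximately 0.00098.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    m1 counts sites where only seq_a differs (seq_b matches the outgroup);
    m2 counts sites where only seq_b differs (seq_a matches the outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) follows chi-square with 1 df.
    """
    m1 = m2 = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if "-" in (a, b, o):          # assumption: skip columns with gaps
            continue
        if a != b:
            if b == o:
                m1 += 1
            elif a == o:
                m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test critical P value that keeps the family-wise error at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    m1, m2, chi2, p = tajima_relative_rate("CCCCCCCCAA", "AAAAAAAAAA", "AAAAAAAAAA")
    print(m1, m2, chi2, p, p < bonferroni_threshold(0.05, 51))
```

A comparison is declared significant only when its P value falls below the corrected threshold, which is why even the largest reported value (0.00048, European eel) still leads to rejection of equal rates.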

Putative Pigmentation Genes

A total of 94 different pigmentation genes were identified from our genome sequences (table 2). Only the best hit for each pigmentation gene was retained in the table and these are grouped into various functional categories related to melanophore development, components of melanosomes, melanosome construction, melanosome transport, regulation of melanogenesis, systemic effects, xanthophore development, pteridine synthesis, iridophore development, and other functions as shown by Braasch et al. (2009). This result indicates that a wide range of pigmentation genes have been retained across the teleosts and will provide a valuable resource for the study of the genetic and developmental basis for the spectacular color phenotypes of the Asian arowana.
Table 2

Putative Arowana Pigmentation Genes

| Gene | Accession (Homo sapiens) | Locus ID (arowana) | PID | e-value | Accession (annotation) | Species |
|---|---|---|---|---|---|---|
| Melanophore development | | | | | | |
| adam17 | NP_003174.3 | Z043_115716 | 68.98 | 0.00 | XP_010733184.1 | Larimichthys crocea |
| adamts20 | NP_079279.3 | Z043_106475 | 71.36 | 0.00 | XP_008274326.1 | Stegastes partitus |
| creb1 | NP_004370.1 | Z043_122987 | 95.37 | 0.00 | XP_005167757.1 | Danio rerio |
| ece1 | NP_001106819.1 | Z043_112628 | 80.03 | 0.00 | CDQ77702.1 | Oncorhynchus mykiss |
| Ednrb | NP_001116131.1 | Z043_105076 | 81.50 | 0.00 | XP_007254865.1 | Astyanax mexicanus |
| Egfr | NP_958439.1 | Z043_114891 | | | | |
| fgfr2 | NP_000132.3 | Z043_104866 | 84.50 | 0.00 | KKF10433.1 | La. crocea |
| frem2 | NP_997244.4 | Z043_101382 | 70.22 | 0.00 | XP_012683949.1 | Clupea harengus |
| fzd4 | NP_036325.2 | Z043_108755 | 89.76 | 0.00 | XP_012693402.1 | Cl. harengus |
| gna11 | NP_002058.2 | Z043_106310 | 96.02 | 0.00 | XP_010750457.1 | La. crocea |
| gnaq | NP_002063.2 | Z043_114081 | 86.57 | 0.00 | XP_010735114.1 | La. crocea |
| gpc3 | NP_001158091.1 | Z043_101235 | 52.03 | 3 × 10^-175 | XP_006639062.1 | Lepisosteus oculatus |
| gpr161 | NP_722561.1 | Z043_116750 | 73.06 | 0.00 | XP_007227875.1 | As. mexicanus |
| hdac1 | NP_004955.2 | Z043_108210 | 96.71 | 0.00 | XP_006631299.1 | Le. oculatus |
| ikbkg | NP_003630.1 | Z043_105761 | 64.16 | 2 × 10^-170 | XP_010903123.1 | Esox lucius |
| itgb1 | NP_596867.1 | Z043_116749 | 71.96 | 0.00 | NP_001030143.1 | D. rerio |
| Kit | NP_001087241.1 | Z043_118854 | 71.89 | 0.00 | XP_008297546.1 | St. partitus |
| lef1 | NP_057353.1 | Z043_100731 | | | | |
| lmx1a | NP_001167540.1 | Z043_108871 | 91.03 | 9 × 10^-180 | XP_008417499.1 | Poecilia reticulata |
| mbtps1 | NP_003782.1 | Z043_104391 | 86.31 | 0.00 | XP_009291810.1 | D. rerio |
| mcoln3 | NP_060768.8 | Z043_110213 | 69.96 | 0.00 | XP_006634884.1 | Le. oculatus |
| mitf | NP_937801.1 | Z043_105357 | 83.91 | 0.00 | XP_006630679.1 | Le. oculatus |
| pax3 | NP_039230.1 | Z043_107599 | | | | |
| rab32 | NP_006825.1 | Z043_104281 | 78.47 | 6 × 10^-118 | XP_012671987.1 | Cl. harengus |
| scarb2 | NP_005497.1 | Z043_105397 | 78.22 | 0.00 | NP_001117983.1 | O. mykiss |
| sfxn1 | NP_073591.2 | Z043_121119 | 89.10 | 0.00 | XP_010895582.1 | E. lucius |
| snai2 | NP_003059.1 | Z043_117231 | 85.88 | 5 × 10^-164 | XP_003759837.1 | Sarcophilus harrisii |
| sox10 | NP_008872.1 | Z043_106242 | 77.78 | 0.00 | XP_008294581.1 | St. partitus |
| sox18 | NP_060889.1 | Z043_107469 | 61.33 | 3 × 10^-161 | XP_001337702.1 | D. rerio |
| sox9 | NP_000337.1 | Z043_118917 | 79.08 | 0.00 | XP_006635207.1 | Le. oculatus |
| tfap2a | NP_001027451.1 | Z043_119933 | 86.12 | 0.00 | XP_006634534.1 | Le. oculatus |
| trpm1 | NP_001238949.1 | Z043_111666 | 71.06 | 0.00 | XP_006629107.1 | Le. oculatus |
| trpm7 | NP_060142.3 | Z043_100441 | 82.16 | 0.00 | XP_006628750.1 | Le. oculatus |
| wnt1 | NP_005421.1 | Z043_120129 | 93.51 | 0.00 | XP_010873444.1 | E. lucius |
| wnt3a | NP_149122.1 | Z043_118184 | 96.12 | 0.00 | XP_008312650.1 | Cynoglossus semilaevis |
| zic2 | NP_009060.2 | Z043_101779 | 88.54 | 0.00 | XP_006638968.1 | Le. oculatus |
| Components of melanosomes | | | | | | |
| dct | NP_001913.2 | Z043_108526 | 73.9 | 0.00 | XP_008326759.1 | Cy. semilaevis |
| rab32 | NP_006825.1 | Z043_116536 | 67.76 | 1 × 10^-88 | XP_003224067.2 | Anolis carolinensis |
| rab38 | NP_071732.1 | Z043_122112 | 90.05 | 1 × 10^-126 | AAI50366.1 | D. rerio |
| slc24a4 | NP_705934.1 | Z043_114251 | 81.84 | 0.00 | XP_005803162.1 | Xiphophorus maculatus |
| slc24a5 | NP_995322.1 | Z043_103396 | 82.06 | 0.00 | XP_005814818.1 | X. maculatus |
| tyrp1 | NP_000541.1 | Z043_107956 | 74.52 | 0.00 | XP_005743086.1 | Pundamilia nyererei |
| Melanosome construction | | | | | | |
| ap3d1 | NP_003929.4 | Z043_120762 | 73.21 | 0.00 | XP_011472829.1 | Oryzias latipes |
| fig4 | NP_055660.1 | Z043_103115 | 86.55 | 0.00 | XP_006626354.1 | Le. oculatus |
| gpr143 | NP_000264.2 | Z043_102175 | 78.42 | 0.00 | XP_012680526.1 | Cl. harengus |
| hps3 | NP_115759.2 | Z043_100370 | 70.79 | 0.00 | XP_012680760.1 | Cl. harengus |
| lyst | NP_001288294.1 | Z043_100757 | 69.99 | 0.00 | XP_008300589.1 | St. partitus |
| nsf | NP_006169.2 | Z043_108447 | 93.61 | 0.00 | XP_005164054.1 | D. rerio |
| pldn | NP_036520.1 | Z043_109414 | 78.42 | 4 × 10^-73 | XP_008274283.1 | St. partitus |
| rabggta | NP_004572.3 | Z043_121567 | | | | |
| txndc5 | NP_110437.2 | Z043_116626 | 77.02 | 0.00 | CDQ77189.1 | O. mykiss |
| vps11 | NP_068375.3 | Z043_121081 | 90.41 | 0.00 | XP_010863485.1 | E. lucius |
| vps18 | NP_065908.1 | Z043_111267 | 85.09 | 0.00 | XP_010892538.1 | E. lucius |
| vps33a | NP_075067.2 | Z043_116542 | 94.66 | 0.00 | CDQ76904.1 | O. mykiss |
| vps39 | NP_056104.2 | Z043_117047 | 89.05 | 0.00 | XP_010749485.1 | La. crocea |
| Melanosome transport | | | | | | |
| mlph | NP_077006.1 | Z043_101687 | 62.90 | 0.00 | XP_005168768.1 | D. rerio |
| myo5a | NP_000250.3 | Z043_102448 | 86.24 | 0.00 | XP_006628770.1 | Le. oculatus |
| myo7a | NP_001120652.1 | Z043_100931 | 78.91 | 0.00 | AAI63570.1 | D. rerio |
| rab27a | NP_899059.1 | Z043_111973 | 87.89 | 2 × 10^-148 | XP_006628775.1 | Le. oculatus |
| Regulation of melanogenesis | | | | | | |
| creb1 | NP_004370.1 | Z043_122987 | 95.37 | 0.00 | XP_005167757.1 | D. rerio |
| drd2 | NP_000786.1 | Z043_112980 | 83.67 | 0.00 | XP_006642348.1 | Le. oculatus |
| mc1r | NP_002377.4 | Z043_121636 | 76.15 | 4 × 10^-167 | AGC50885.1 | Cyprinus carpio |
| mgrn1 | NP_001135763.2 | Z043_111249 | 85.27 | 0.00 | XP_006637253.1 | Le. oculatus |
| pomc | NP_001030333.1 | Z043_103340 | 51.72 | 7 × 10^-66 | AAO17793.1 | Anguilla japonica |
| Systemic effects | | | | | | |
| atp6ap1 | NP_001174.2 | Z043_108102 | 66.24 | 0.00 | XP_012682891.1 | Cl. harengus |
| atp6ap2 | NP_005756.2 | Z043_100882 | 75.14 | 0.00 | XP_012675204.1 | Cl. harengus |
| atp6v0c | NP_001185498.1 | Z043_125122 | 95.36 | 3 × 10^-90 | XP_008434615.1 | P. reticulata |
| atp6v0d1 | NP_004682.2 | Z043_121933 | 94.48 | 0.00 | NP_955914.1 | D. rerio |
| atp6v1e1 | NP_001687.1 | Z043_104549 | 92.09 | 2 × 10^-143 | XP_007579195.1 | Poecilia formosa |
| atp6v1f | NP_004222.2 | Z043_100808 | 100.00 | 4 × 10^-81 | XP_006633325.1 | Le. oculatus |
| atp6v1h | NP_998784.1 | Z043_113483 | 90.61 | 0.00 | XP_007260238.1 | As. mexicanus |
| atp7b | NP_000044.2 | Z043_122088 | 54.41 | 0.00 | XP_010017200.1 | Nestor notabilis |
| rps19 | NP_001013.1 | Z043_118939 | 91.67 | 7 × 10^-95 | XP_008329573.1 | Cy. semilaevis |
| rps20 | NP_001014.1 | Z043_107890 | 100.00 | 4 × 10^-80 | NP_001117836.1 | O. mykiss |
| Xanthophore development | | | | | | |
| atp6v1e1 | NP_001687.1 | Z043_104549 | 92.09 | 2 × 10^-143 | XP_007579195.1 | P. formosa |
| atp6v1h | NP_998784.1 | Z043_113483 | 90.61 | 0.00 | XP_007260238.1 | As. mexicanus |
| csf1r | NP_001275634.1 | Z043_118854 | 71.89 | 0.00 | XP_008297546.1 | St. partitus |
| ednrb | NP_001116131.1 | Z043_105076 | 81.50 | 0.00 | XP_007254865.1 | As. mexicanus |
| ghr | NP_001229389.1 | Z043_101160 | 57.24 | 0.00 | BAD20706.1 | An. japonica |
| pax3 | NP_039230.1 | Z043_107599 | | | | |
| sox10 | NP_008872.1 | Z043_106242 | 77.78 | 0.00 | XP_008294581.1 | St. partitus |
| Pteridine synthesis | | | | | | |
| gchi | NP_001019195.1 | Z043_110449 | 81.94 | 1 × 10^-125 | XP_007231033.1 | As. mexicanus |
| mycbp2 | NP_055872.4 | Z043_104473 | 91.14 | 0.00 | XP_007251746.1 | As. mexicanus |
| paics | NP_001072992.1 | Z043_121868 | 87.94 | 0.00 | XP_010870568.1 | E. lucius |
| pcbd1 | NP_000272.1 | Z043_105842 | 95.05 | 1 × 10^-66 | XP_012672435.1 | Cl. harengus |
| Pts | NP_000308.1 | Z043_103015 | 81.21 | 2 × 10^-84 | XP_012670027.1 | Cl. harengus |
| qdpr | NP_000311.2 | Z043_109962 | 86.83 | 5 × 10^-129 | XP_006137052.1 | Pelodiscus sinensis |
| Spr | NP_003115.1 | Z043_114288 | 63.64 | 6 × 10^-126 | NP_001133746.1 | Salmo salar |
| xdh | NP_000370.2 | Z043_115384 | 69.12 | 0.00 | XP_006636840.1 | Le. oculatus |
| Iridophore development | | | | | | |
| atp6v1h | NP_998784.1 | Z043_113483 | 90.61 | 0.00 | XP_007260238.1 | As. mexicanus |
| dac | NP_001077.2 | Z043_123292 | 73.28 | 0.00 | ACN11084.1 | Sa. salar |
| ednrb | NP_001116131.1 | Z043_105076 | 81.50 | 0.00 | XP_007254865.1 | As. mexicanus |
| Ltk | NP_002335.2 | Z043_118424 | 68.81 | 0.00 | XP_010877407.1 | E. lucius |
| sox10 | NP_008872.1 | Z043_106242 | 77.78 | 0.00 | XP_008294581.1 | St. partitus |
| sox9 | NP_000337.1 | Z043_118917 | 79.08 | 0.00 | XP_006635207.1 | Le. oculatus |
| trim33 | NP_056990.3 | Z043_115609 | 66.93 | 0.00 | NP_001002871.2 | D. rerio |
| vps18 | NP_065908.1 | Z043_111267 | 85.09 | 0.00 | XP_010892538.1 | E. lucius |
| vps39 | NP_056104.2 | Z043_117047 | 89.05 | 0.00 | XP_010749485.1 | La. crocea |
| Uncategorized function | | | | | | |
| abhd11 | NP_683711.1 | Z043_117262 | 79.64 | 9 × 10^-155 | XP_010893523.1 | E. lucius |
| ebna1bp2 | NP_006815.2 | Z043_123300 | 77.78 | 7 × 10^-146 | XP_006634973.1 | Le. oculatus |
| gfpt1 | NP_002047.2 | Z043_101574 | 95.16 | 0.00 | XP_006625541.1 | Le. oculatus |
| gja5 | NP_859054.1 | Z043_107343 | 71.02 | 0.00 | XP_008273833.1 | St. partitus |
| irf4 | NP_002451.2 | Z043_102759 | 75.71 | 0.00 | XP_006634623.1 | Le. oculatus |
| kcnj13 | NP_002233.2 | Z043_119194 | 71.76 | 7 × 10^-173 | XP_010768290.1 | Notothenia coriiceps |
| pabpc1 | NP_002559.2 | Z043_109572 | 96.20 | 0.00 | XP_007230879.1 | As. mexicanus |
| skiv2l2 | NP_056175.3 | Z043_112154 | 91.68 | 0.00 | XP_006627067.1 | Le. oculatus |
| tpcn2 | NP_620714.2 | Z043_115041 | 62.50 | 0.00 | CDQ78014.1 | O. mykiss |

Materials and Methods

Sample Collection and DNA Extraction

A tail fin sample from a S. formosus specimen was donated by the Malaysian Freshwater Fisheries Research Centre (FRI Glami Lemi). DNA was extracted using the Qiagen Blood and Tissue DNA extraction kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Then, 1 µg of the purified DNA was sheared (500 bp setting) using a Covaris S220 (Covaris, Woburn, MA) and prepped with the Illumina TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA) according to the manufacturer’s instructions. Additionally, a 3-kb insert mate-pair library was generated using the Illumina Mate Pair Library Prep Kit. Both libraries were quantified using the KAPA library quantification kit (KAPA Biosystems, Cape Town, South Africa) and sequenced on the Illumina HiSeq 2000 (Illumina) located at the Malaysian Genomics Resource Centre, using the 2 × 101 bp paired-end read setting.

Genome Size Estimation based on k-mer Frequency in Sequence Reads

Genome size of S. formosus was approximated from k-mer frequency distributions in raw genomic reads as was done by Li et al. (2010). Frequencies of distinct 15-, 17-, 19-, and 21-mers occurring in genomic reads from the paired-end library were counted using JELLYFISH (Marçais and Kingsford 2011). The real sequencing depth (N) was estimated from the peak of each frequency distribution (M), read length (L), and k-mer length (K) correlated according to the following formula: M = N × (L − K + 1)/L. Genome size was then approximated from the division of total genomic bases by the real sequencing depth.
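The calculation described above amounts to solving M = N × (L − K + 1)/L for N and dividing the total number of sequenced bases by N. A minimal sketch follows; the function names and the input numbers are hypothetical and serve only to show the arithmetic, not to reproduce the values used in this study.

```python
def real_sequencing_depth(peak_kmer_depth, read_length, k):
    """Solve M = N * (L - K + 1) / L for N, the real sequencing depth.

    peak_kmer_depth: M, the peak of the k-mer frequency distribution
    read_length:     L, the read length in bases
    k:               K, the k-mer length
    """
    return peak_kmer_depth * read_length / (read_length - k + 1)


def estimate_genome_size(total_bases, peak_kmer_depth, read_length, k):
    """Genome size = total sequenced bases / real sequencing depth."""
    return total_bases / real_sequencing_depth(peak_kmer_depth, read_length, k)


if __name__ == "__main__":
    # Hypothetical inputs: 30 Gb of reads, 100-bp reads, 17-mers peaking at depth 25.
    size = estimate_genome_size(30e9, 25, 100, 17)
    print(f"estimated genome size: {size / 1e6:.0f} Mb")
```

Repeating the estimate for each k-mer length (15, 17, 19, and 21) gives a simple check on how sensitive the result is to the choice of K.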

Assembly and Annotation of the Scleropages formosus Genome

Raw reads were error corrected and preprocessed by removing low-quality reads (average Phred quality ≤20) and reads containing more than 10% ambiguous nucleotides. The resulting reads longer than 30 bp were assembled and scaffolded using the MSR-CA genome assembler (now renamed MaSuRCA, with default settings) (Zimin et al. 2013). Further scaffolding was carried out with reads from the mate-pair library using Scaffolder (Barton MD and Barton HA 2012). The final draft assembly consists of scaffolds longer than 200 bp. Finally, the CEGMA program (Parra et al. 2007) was used to assess the completeness of the assembly by detecting the presence of 248 highly conserved proteins within the draft genome. To compare our draft assembly with other arowana resources, transcriptomic reads generated using 454 pyrosequencing from the Asian arowana transcriptome (Shen et al. 2014) were aligned to the draft genome using GMAP (Wu and Watanabe 2005). Unmapped transcriptomic reads were further characterized by a BLASTN (Altschul et al. 1990) search against the NT database on NCBI. Arowana transcriptome reads were downloaded (SRA: SRR941557, SRR941783, SRR941785), preprocessed with QTrim (default settings) (Shrestha et al. 2014), and assembled de novo using IDBA-tran (--max_isoforms 10 --maxk 80) (Peng et al. 2013). To predict protein-coding genes, MAKER (Cantarel et al. 2008) was run on the arowana genome using the assembled arowana transcriptome and Ensembl proteins from zebrafish (Danio rerio), Nile tilapia (Oreochromis niloticus), medaka (Oryzias latipes), and Japanese puffer (Takifugu rubripes) as evidence. Repetitive regions were masked with all organisms in RepBase. MAKER was run iteratively to train the SNAP (Korf 2004) gene predictor in a bootstrap fashion to improve the predictor’s performance, and final MAKER predictions were made using the trained SNAP as well as Augustus trained with the zebrafish species model.
Functional annotation of the predicted sequences was performed with a BLASTP (Altschul et al. 1990) search (e-value threshold of 1 × 10^-10) against vertebrate proteins in NCBI’s NR database. A 70% blast hit coverage cut-off (based on subject length) was also applied to obtain confident annotations. Unannotated protein sequences were then searched against all sequences in NCBI’s NR database with the same e-value and hit coverage cut-offs. Gene ontologies, protein domains, and families were identified with InterProScan (Jones et al. 2014). tRNA genes in the assembly were detected by MAKER using tRNAscan (Lowe and Eddy 1997), while RNAmmer (Lagesen et al. 2007) was used to predict rRNA sequences.
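The read preprocessing criteria given at the start of this section (average Phred quality ≤20 removed, more than 10% ambiguous nucleotides removed, only reads longer than 30 bp retained) can be expressed as a single filter. The sketch below is illustrative and is not the preprocessing tool used here; it assumes Phred+33 quality encoding and treats N as the only ambiguous nucleotide, and the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a FASTQ quality string.
    Assumption: Phred+33 (Sanger/Illumina 1.8+) encoding."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def keep_read(sequence, quality_string,
              min_mean_quality=20, max_ambiguous_fraction=0.10, min_length=30):
    """Return True if a read passes all three preprocessing criteria."""
    if len(sequence) <= min_length:                       # keep reads longer than 30 bp
        return False
    if mean_phred(quality_string) <= min_mean_quality:    # drop average quality <= 20
        return False
    ambiguous = sequence.upper().count("N") / len(sequence)
    return ambiguous <= max_ambiguous_fraction            # drop > 10% ambiguous bases


if __name__ == "__main__":
    # Toy reads: 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
    print(keep_read("A" * 40, "I" * 40))
    print(keep_read("A" * 40, "5" * 40))
```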

Orthology Inference

Data selection for phylogenomic analyses is controversial and centers on issues of data quality and quantity and on benefits of taxon sampling versus high data coverage that minimizes alignment gaps (Laurin-Lemay et al. 2012; Amemiya et al. 2013; Betancur-R et al. 2013; Misof et al. 2013; Salichos and Rokas 2013). We take a conservative approach that minimizes gaps in the supermatrix and use several ways to carefully distinguish orthologs from paralogs to assemble a high-quality phylogenomic data set, with the aim of estimating a robust and accurate tree, including the placement of the deeper lineages in the tree. First, because conserved genes make for the best phylogenomic markers (Betancur-R et al. 2013), Hidden Markov Model (HMM) profiles from the TreeFam database (Schreiber et al. 2014) of gene families conserved across 104 other animal species were used to identify these conserved protein sequences in the arowana genome. For all species, protein sequences longer than 100 amino acids were scanned for sequence homology to gene families in the TreeFam database (version 9) (Schreiber et al. 2014) using hmmsearch (Eddy 2011) (e-value threshold of 1 × 10^-10), and gene families having sequence homology to at least one protein in all 27 species were retained for subsequent orthology inference. Orthology inference from these protein clusters was conducted with scripts from the pipeline recently described by Yang and Smith (2014), which employs a tree-based approach to first identify paralogs, prune spurious branches, and finally identify orthologs. Briefly, protein sequences in each gene family were aligned and trimmed with the fasta_to_tree.py script. In addition, clusters containing paralogs were limited during orthology inference by implementing a tree-based approach on individual sequence clusters, along with additional pruning steps, to separate paralogs and orthologs (Yang and Smith 2014).
Due to computational limitations, we modified the pipeline to use IQ-TREE (Nguyen et al. 2015) to build smaller gene trees (fewer than 1,000 sequences) and FastTreeMP (Price et al. 2010) for larger gene trees. For each tree, tips longer than 0.5 (=absolute tip cut-off) or longer than 0.2 and more than ten times longer than their nearby tips (=relative tip cut-off) were trimmed with trim_tips.py. Monophyletic tips belonging to the same taxon were masked with mask_tips_by_taxonID_genomes.py. Internal branches longer than 0.3, which may be separating orthologous groups, were cut with cut_long_internal_branches.py, and only trees containing sequences from all 27 species were retained, thus reducing the amount of missing data and lowering the potential for nonphylogenetic signals (Borowiec et al. 2015). Protein sequence alignment, alignment trimming, and gene tree building were repeated for the remaining sequences in each tree. Orthology inference was then carried out on the newly inferred trees with paralogy pruning by maximum inclusion using the prune_paralogy_MI.py script (relative tip cut-off 0.2, absolute tip cut-off 0.5, minimum taxa 27), which iteratively extracts the subtree containing the most taxa without taxon duplication. Protein sequences in each cluster were aligned with mafft_wrapper.py, each alignment was trimmed with pep_gblocks_wrapper.py, and all alignments were finally concatenated into a supermatrix. Orthology calls in teleosts, and specifically in osteoglossomorphs and elopomorphs, are not straightforward: they are complicated by the divergent evolution of genes that followed multiple rounds of genome duplication prior to teleost diversification (Braasch et al. 2015). Although we have taken several strict measures to identify orthologs and exclude paralogs, it is important to note that it is extremely challenging to ensure that all identified protein sequences in each cluster are truly orthologous.
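
The tip-trimming rule can be written as a small decision function. This is a simplified sketch, not the logic of trim_tips.py itself: it assumes that "nearby tips" can be summarised by the mean branch length of the sister tips, and the function and parameter names are hypothetical.

```python
ABSOLUTE_CUTOFF = 0.5   # tips longer than this are always trimmed
RELATIVE_CUTOFF = 0.2   # tips longer than this are trimmed if they are outliers
RATIO = 10.0            # "ten times longer than its nearby tips"


def should_trim(tip_length, sister_lengths,
                absolute_cutoff=ABSOLUTE_CUTOFF,
                relative_cutoff=RELATIVE_CUTOFF,
                ratio=RATIO):
    """Decide whether a terminal branch should be removed from a gene tree.

    tip_length: branch length of the tip under consideration
    sister_lengths: branch lengths of neighbouring tips (assumed: sister tips)
    """
    if tip_length > absolute_cutoff:
        return True
    if tip_length > relative_cutoff and sister_lengths:
        mean_sister = sum(sister_lengths) / len(sister_lengths)
        return tip_length > ratio * mean_sister
    return False


print(should_trim(0.60, [0.50]))        # exceeds the absolute cut-off
print(should_trim(0.30, [0.01, 0.02]))  # above 0.2 and over ten times its neighbours
print(should_trim(0.30, [0.10]))        # above 0.2 but comparable to its neighbour
```

The relative rule is what catches moderately long tips that stand out in an otherwise short-branched clade, which the absolute rule alone would miss.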

Phylogenetic Analysis

Phylogenetic analysis was based on amino acid alignments for a total of 27 species (table 1). For organisms lacking available proteome data sets, namely the lungfish, little skate, and small-spotted catshark, protein sequences were obtained from their respective transcriptomes. For the lungfish specifically, raw Illumina RNA-seq reads (SRA: SRR505721–SRR505726) were assembled with the Trinity assembler (Grabherr et al. 2011). All transcriptomes were translated with Transdecoder (http://transdecoder.sourceforge.net/, last accessed April 14, 2015). Each ortholog was treated as a separate data block and used as input to PartitionFinder (branchlengths = linked, model_selection = AICc, search = rcluster) (Lanfear et al. 2014) to estimate the best-fit partitioning schemes and models of protein evolution. Based on these results, ML analysis was conducted with RAxML (Stamatakis 2014) under the recommended partitions and substitution models. A total of 100 trees were generated using distinct random seeds, and the tree with the best likelihood value was chosen as the final tree topology. Nodal support was assessed by bootstrap replicates with the autoMRE convergence criterion (Pattengale et al. 2009). A Bayesian inference using the same supermatrix, partitioned by ortholog, was also carried out using ExaBayes (Aberer et al. 2014). Four independent chains were run for 2 million generations and sampled every 500 generations. With 25% of initial samples discarded as burn-in, runs were considered to have converged when the average standard deviation of split frequencies was less than 1%. Both ML and BI phylogenetic trees were rooted using the Chondrichthyes as the outgroup and visualized with MEGA6 (Tamura et al. 2013).
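
The sampling arithmetic and the convergence criterion can be made concrete with a short sketch. The sample counts follow directly from the numbers in the text (ignoring any sample taken at generation 0). The function below is an illustrative reimplementation of the average standard deviation of split frequencies (ASDSF), not the ExaBayes code: the use of the population standard deviation, the optional minimum-frequency filter, and the string encoding of splits are all assumptions.

```python
from statistics import pstdev

GENERATIONS = 2_000_000
SAMPLE_EVERY = 500
BURN_IN_FRACTION = 0.25

samples_per_run = GENERATIONS // SAMPLE_EVERY        # 4000 sampled trees
burn_in = int(samples_per_run * BURN_IN_FRACTION)    # 1000 discarded
retained = samples_per_run - burn_in                 # 3000 kept


def asdsf(split_freqs_per_run, min_freq=0.0):
    """Average, over splits, of the standard deviation of their frequencies.

    split_freqs_per_run: one dict per independent run, mapping a split
    (any hashable label) to its frequency among the post-burn-in samples.
    Splits absent from a run are counted with frequency 0.
    """
    splits = set().union(*(run.keys() for run in split_freqs_per_run))
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in split_freqs_per_run]
        if max(freqs) >= min_freq:
            deviations.append(pstdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


def has_converged(split_freqs_per_run, threshold=0.01):
    """Apply the 1% criterion described in the text."""
    return asdsf(split_freqs_per_run) < threshold


# Hypothetical split frequencies from two runs.
run1 = {"AB|CD": 1.00, "AC|BD": 0.10}
run2 = {"AB|CD": 1.00, "AC|BD": 0.12}
print(samples_per_run, burn_in, retained)
print(has_converged([run1, run2]))
```

In the toy example the two runs agree on the first split and differ by 0.02 on the second, giving an ASDSF of about 0.005, which is below the 1% threshold.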

Rate of Molecular Evolution

To compare evolutionary rates of the Asian arowana with those of other ray-finned fish lineages, the rate of molecular evolution for each fish lineage was calculated by adding branch lengths from the end of each terminal branch to the node where the split between ray-finned fish and lobe-finned fish (and tetrapods) occurred (fig. 1, orange star). In addition, Tajima’s relative rate test (Tajima 1993) was implemented, as done by Amemiya et al. (2013), to test for equal rates between lineages. Using MEGA6 (Tamura et al. 2013), Tajima’s relative rate tests (with missing positions and gaps eliminated) were conducted for comparisons between the Asian arowana and other ray-finned fishes, with a member of the Chondrichthyes set as outgroup.
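
Tajima's relative rate test compares the number of sites at which each of two ingroup sequences alone differs from the other two sequences. Under equal rates these counts, m1 and m2, are expected to be equal, and (m1 − m2)² / (m1 + m2) approximately follows a chi-square distribution with one degree of freedom. The sketch below is an independent illustration of that statistic with complete deletion of gapped sites; it is not the MEGA6 implementation, and the set of characters treated as missing is an assumption.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup, missing="-?"):
    """Return (m1, m2, chi2, p) for three aligned sequences.

    m1: sites where seq1 differs while seq2 and the outgroup agree
    m2: sites where seq2 differs while seq1 and the outgroup agree
    Sites with a gap or missing character in any sequence are skipped.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a in missing or b in missing or o in missing:
            continue  # complete deletion of gapped or missing positions
        if a != b and b == o:
            m1 += 1   # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1   # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Hypothetical toy alignment (real tests use thousands of sites).
m1, m2, chi2, p = tajima_relative_rate("MKTAYIAKQR", "MKSAYLAKQR", "MKSAYLAKQR")
print(m1, m2, chi2, round(p, 4))
```

With so few sites the toy example cannot reject equal rates; it only shows how the counts and the statistic are obtained.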

Identification of Putative Pigmentation Genes

Predicted protein sequences for arowana were screened for putative pigmentation genes using a list curated by Braasch et al. (2009). Arowana proteins were searched against the human homologs of these pigment genes (table 2) using BLASTP (Altschul et al. 1990) with an e-value threshold of 1 × 10⁻⁴⁰, and the hits were subsequently filtered with a hit coverage cut-off of 70%. The best hit for each pigment gene was chosen as a candidate and tested for the presence of conserved domains by using the Batch CD-Search tool (Marchler-Bauer and Bryant 2004) to search against the Conserved Domain Database (Marchler-Bauer et al. 2014).
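
Selecting one candidate per pigment gene can be sketched as follows. The text does not define "best hit", so this example assumes it means the highest bit score among hits that pass both thresholds; the record fields, gene labels, and protein IDs are hypothetical.

```python
EVALUE_MAX = 1e-40     # e-value threshold stated in the text
MIN_COVERAGE = 0.70    # hit coverage cut-off stated in the text


def best_hit_per_gene(hits, evalue_max=EVALUE_MAX, min_coverage=MIN_COVERAGE):
    """Map each pigment gene to its best qualifying arowana protein.

    hits: iterable of dicts with keys
          'gene', 'protein', 'evalue', 'bitscore', 'coverage'
    "Best" is assumed here to mean the highest bit score.
    """
    best = {}
    for hit in hits:
        if hit["evalue"] > evalue_max or hit["coverage"] < min_coverage:
            continue  # fails one of the two filters
        current = best.get(hit["gene"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["gene"]] = hit
    return best


# Hypothetical hits for two placeholder genes.
toy_hits = [
    {"gene": "geneA", "protein": "aro_0101", "evalue": 1e-80, "bitscore": 310.0, "coverage": 0.95},
    {"gene": "geneA", "protein": "aro_0102", "evalue": 1e-60, "bitscore": 240.0, "coverage": 0.88},
    {"gene": "geneB", "protein": "aro_0200", "evalue": 1e-90, "bitscore": 400.0, "coverage": 0.50},
    {"gene": "geneB", "protein": "aro_0201", "evalue": 1e-20, "bitscore": 95.0, "coverage": 0.90},
]
candidates = {gene: hit["protein"] for gene, hit in best_hit_per_gene(toy_hits).items()}
print(candidates)
```

In the toy data geneB ends up without a candidate, because one hit fails the coverage filter and the other fails the e-value filter.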

Supplementary Material

Supplementary materials S1–S4 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
  81 in total

1.  Origin of land plants revisited in the light of sequence contamination and missing data.

Authors:  Simon Laurin-Lemay; Henner Brinkmann; Hervé Philippe
Journal:  Curr Biol       Date:  2012-08-07       Impact factor: 10.834

2.  The first transcriptome and genetic linkage map for Asian arowana.

Authors:  X Y Shen; H Y Kwan; N M Thevasagayam; S R S Prakki; I S Kuznetsova; S Y Ngoh; Z Lim; F Feng; A Chang; L Orbán
Journal:  Mol Ecol Resour       Date:  2014-01-03       Impact factor: 7.090

3.  Genome Annotation and Curation Using MAKER and MAKER-P.

Authors:  Michael S Campbell; Carson Holt; Barry Moore; Mark Yandell
Journal:  Curr Protoc Bioinformatics       Date:  2014-12-12

4.  Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish.

Authors:  Simone Hoegg; Henner Brinkmann; John S Taylor; Axel Meyer
Journal:  J Mol Evol       Date:  2004-08       Impact factor: 2.395

5.  From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). (Review)

Authors:  Axel Meyer; Yves Van de Peer
Journal:  Bioessays       Date:  2005-09       Impact factor: 4.345

6.  The sequence and de novo assembly of the giant panda genome.

Authors:  Ruiqiang Li; Wei Fan; Geng Tian; Hongmei Zhu; Lin He; Jing Cai; Quanfei Huang; Qingle Cai; Bo Li; Yinqi Bai; Zhihe Zhang; Yaping Zhang; Wen Wang; Jun Li; Fuwen Wei; Heng Li; Min Jian; Jianwen Li; Zhaolei Zhang; Rasmus Nielsen; Dawei Li; Wanjun Gu; Zhentao Yang; Zhaoling Xuan; Oliver A Ryder; Frederick Chi-Ching Leung; Yan Zhou; Jianjun Cao; Xiao Sun; Yonggui Fu; Xiaodong Fang; Xiaosen Guo; Bo Wang; Rong Hou; Fujun Shen; Bo Mu; Peixiang Ni; Runmao Lin; Wubin Qian; Guodong Wang; Chang Yu; Wenhui Nie; Jinhuan Wang; Zhigang Wu; Huiqing Liang; Jiumeng Min; Qi Wu; Shifeng Cheng; Jue Ruan; Mingwei Wang; Zhongbin Shi; Ming Wen; Binghang Liu; Xiaoli Ren; Huisong Zheng; Dong Dong; Kathleen Cook; Gao Shan; Hao Zhang; Carolin Kosiol; Xueying Xie; Zuhong Lu; Hancheng Zheng; Yingrui Li; Cynthia C Steiner; Tommy Tsan-Yuk Lam; Siyuan Lin; Qinghui Zhang; Guoqing Li; Jing Tian; Timing Gong; Hongde Liu; Dejin Zhang; Lin Fang; Chen Ye; Juanbin Zhang; Wenbo Hu; Anlong Xu; Yuanyuan Ren; Guojie Zhang; Michael W Bruford; Qibin Li; Lijia Ma; Yiran Guo; Na An; Yujie Hu; Yang Zheng; Yongyong Shi; Zhiqiang Li; Qing Liu; Yanling Chen; Jing Zhao; Ning Qu; Shancen Zhao; Feng Tian; Xiaoling Wang; Haiyin Wang; Lizhi Xu; Xiao Liu; Tomas Vinar; Yajun Wang; Tak-Wah Lam; Siu-Ming Yiu; Shiping Liu; Hemin Zhang; Desheng Li; Yan Huang; Xia Wang; Guohua Yang; Zhi Jiang; Junyi Wang; Nan Qin; Li Li; Jingxiang Li; Lars Bolund; Karsten Kristiansen; Gane Ka-Shu Wong; Maynard Olson; Xiuqing Zhang; Songgang Li; Huanming Yang; Jian Wang; Jun Wang
Journal:  Nature       Date:  2009-12-13       Impact factor: 49.962

7.  Phylogenetic informativeness reconciles ray-finned fish molecular divergence times.

Authors:  Alex Dornburg; Jeffrey P Townsend; Matt Friedman; Thomas J Near
Journal:  BMC Evol Biol       Date:  2014-08-08       Impact factor: 3.260

8.  The tree of life and a new classification of bony fishes.

Authors:  Ricardo Betancur-R; Richard E Broughton; Edward O Wiley; Kent Carpenter; J Andrés López; Chenhong Li; Nancy I Holcroft; Dahiana Arcila; Millicent Sanciangco; James C Cureton Ii; Feifei Zhang; Thaddaeus Buser; Matthew A Campbell; Jesus A Ballesteros; Adela Roa-Varon; Stuart Willis; W Calvin Borden; Thaine Rowley; Paulette C Reneau; Daniel J Hough; Guoqing Lu; Terry Grande; Gloria Arratia; Guillermo Ortí
Journal:  PLoS Curr       Date:  2013-04-18

9.  Whole-genome duplication and the functional diversification of teleost fish hemoglobins.

Authors:  Juan C Opazo; G Tyler Butts; Mariana F Nery; Jay F Storz; Federico G Hoffmann
Journal:  Mol Biol Evol       Date:  2012-09-04       Impact factor: 16.240

10.  Extracting phylogenetic signal and accounting for bias in whole-genome data sets supports the Ctenophora as sister to remaining Metazoa.

Authors:  Marek L Borowiec; Ernest K Lee; Joanna C Chiu; David C Plachetzki
Journal:  BMC Genomics       Date:  2015-11-23       Impact factor: 3.969

