Ricardo Assunção Vialle, Jorge Estefano Santana de Souza, Katia de Paiva Lopes, Diego Gomes Teixeira, Pitágoras de Azevedo Alves Sobrinho, André M Ribeiro-Dos-Santos, Carolina Furtado, Tetsu Sakamoto, Fábio Augusto Oliveira Silva, Edivaldo Herculano Corrêa de Oliveira, Igor Guerreiro Hamoy, Paulo Pimentel Assumpção, Ândrea Ribeiro-Dos-Santos, João Paulo Matos Santos Lima, Héctor N Seuánez, Sandro José de Souza, Sidney Santos.
Abstract
The Pirarucu (Arapaima gigas) is one of the world's largest freshwater fishes and a member of the superorder Osteoglossomorpha (bonytongues), one of the oldest lineages of ray-finned fishes. This species is an obligate air-breather found in the Amazon River basin, with attractive potential for aquaculture. Its phylogenetic position among bony fishes makes the Pirarucu a relevant subject for evolutionary studies of early teleost diversification. Here, we present, for the first time, a draft genome of A. gigas, providing useful information for further functional and evolutionary studies. The A. gigas genome was assembled from 103 Gb of raw reads sequenced on an Illumina platform. The final draft genome assembly was ∼661 Mb, with a contig N50 of 51.23 kb and a scaffold N50 of 668 kb. Repeat sequences accounted for 21.69% of the whole genome, and a total of 24,655 protein-coding genes were predicted from the genome assembly, with an average of nine exons per gene. Phylogenomic analysis based on 24 fish species supported the hypothesis that Osteoglossomorpha and Elopomorpha (eels, tarpons, and bonefishes) are sister groups, together forming a sister lineage to Clupeocephala (the remaining teleosts). Divergence time estimates suggested that the Osteoglossomorpha and Elopomorpha lineages emerged independently within a period of ∼30 Myr in the Jurassic. The draft genome of A. gigas provides a valuable genetic resource for further evolutionary studies and may also offer valuable data for economic applications.
Year: 2018 PMID: 29982381 PMCID: PMC6143160 DOI: 10.1093/gbe/evy130
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Phylogenomics inference. Phylogenomic tree inferred by maximum likelihood (ML) based on a supermatrix of 278 orthologous loci (188,505 amino acid sites) from 24 species, using the elephant shark as outgroup. Dark gray circles indicate nodes that coincide with Bayesian inference (BI) and have maximum support values in both approaches (bootstrap = 100% and Bayesian posterior probability = 1). Branch lengths represent the number of substitutions per site. Rates of molecular evolution (i.e., number of amino acid substitutions per site) estimated from the teleost split (red star) to the tips of the topology are indicated in red font next to the name of each taxon.
Fig. 2.—Divergence time estimation between species. Numbers at nodes represent divergence time estimates in millions of years ago (Ma). Red squares indicate nodes calibrated by fossil records.
Summary Statistics of the Pirarucu Genome
| Sequencing Information | |
|---|---|
| Library insert size (bp) | 400–500 |
| Read length (bp) | 2×250 |
| Total raw bases sequenced (Gb) | 103.01 |
| Total filtered bases sequenced (Gb) | 76.91 |
| Genome Features | |
| Assembled genome size (Mb) | 661.28 |
| # scaffolds | 5,301 |
| Scaffold N50 (kb) | 668 |
| Contig N50 (kb) | 51.23 |
| Largest scaffold (bp) | 5,332,704 |
| GC (%) | 43.18 |
| Repeat content (% of genome) | 21.69 |
| Genome Annotation | |
| Protein-coding gene number | 24,655 |
| % of genome covered by genes | 33.9 |
| Mean transcript length (bp) | 9,150 |
| Mean exons per gene | 9 |
| Mean CDS length (bp) | 1,603 |
| Mean exon length (bp) | 174 |
| Mean intron length (bp) | 920 |
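The contig and scaffold N50 values in the table follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The following is a minimal Python sketch of that calculation. The function name `n50` and the toy lengths are illustrative assumptions, not part of the authors' pipeline or the Pirarucu data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (arbitrary units), not real assembly data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Applying the same function to contig lengths instead of scaffold lengths gives the contig N50.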
List of Species Included in Phylogenomic Analysis
| Organism | Source | Scientific Name | Order | Reference |
|---|---|---|---|---|
| Pirarucu | * | Arapaima gigas | Osteoglossiformes | This study |
| Asian arowana | U | | Osteoglossiformes | |
| | | | Osteoglossiformes | |
| | | | Osteoglossiformes | |
| European eel | Z | | Anguilliformes | |
| | | | Anguilliformes | |
| | | | Elopiformes | |
| Medaka | QFO | | Beloniformes | |
| Blind cave fish | U | | Characiformes | |
| Nile tilapia | U | | Cichliformes | |
| Common carp | R | | Cypriniformes | |
| Zebrafish | QFO | | Cypriniformes | |
| Amazon molly | U | | Cyprinodontiformes | Unpublished |
| Southern platyfish | U | | Cyprinodontiformes | |
| Atlantic cod | E | | Gadiformes | |
| Electric eel | F | | Gymnotiformes | |
| Three-spined stickleback | U | | Perciformes | |
| Spotted green pufferfish | U | | Tetraodontiformes | |
| Fugu | U | | Tetraodontiformes | |
| Spotted gar | QFO | | Semionotiformes | Unpublished |
| | | | Acipenseriformes | |
| | | | Polypteriformes | |
| Coelacanth | U | | Coelacanthiformes | Unpublished |
| Elephant shark | R | | Chimaeriformes | |
Note.—Codes for source: Ensembl (E), efish genomics (F), Quest of Orthologs (QFO), RefSeq (R), EBI ENA (ENA), UniProt (U), ZF Genomics (Z), and this study (*).
Raw transcriptomics reads.
Fig. 3.—Empirical age distributions. Age distributions based on the number of synonymous substitutions per synonymous site (Ks) estimated for paralogous gene families of each species. Distributions were modelled using a four-component Gaussian mixture model (GMM). Solid black lines show mixture distributions, and dashed lines represent individual components. Vertical dashed lines correspond to the geometric mean of each component. Ks estimates (X axis) can be interpreted as the age of divergence between paralogous genes of a given species. The initial peak represents newly duplicated genes (usually derived from small-scale duplication events). Over time, duplicates are gradually lost, so a decreasing slope follows the initial peak, reflecting the steady decline in retained duplicates. Whole-genome duplication (WGD) events create distinct peaks in the distribution and can usually be observed as separate components in a mixture distribution.
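To illustrate how a Gaussian mixture separates duplication peaks in a Ks distribution, the sketch below fits a mixture by expectation-maximisation in plain Python. It rests on several assumptions that are not stated in the source: the mixture is fitted to log-transformed Ks values (suggested, but not confirmed, by the use of geometric means), the function `fit_gmm_1d` is a hypothetical helper and not the authors' software, and the data are synthetic. The toy run uses two components for brevity, whereas the figure uses four.

```python
import math
import random


def fit_gmm_1d(values, k, iterations=200):
    """Fit a k-component 1D Gaussian mixture by expectation-maximisation."""
    data = sorted(values)
    n = len(data)
    # Start the means at evenly spaced quantiles and share the overall variance.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    overall = max(sum((x - centre) ** 2 for x in data) / n, 1e-6)
    variances = [overall] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


# Synthetic Ks values: a "recent" peak near 0.1 and an "old" peak near 2.0.
rng = random.Random(0)
toy_ks = ([rng.lognormvariate(math.log(0.1), 0.3) for _ in range(300)]
          + [rng.lognormvariate(math.log(2.0), 0.3) for _ in range(300)])

# Fit on log(Ks); exp(mean) of each component is then its geometric mean.
weights, means, variances = fit_gmm_1d([math.log(ks) for ks in toy_ks], k=2)
geo_means = sorted(math.exp(m) for m in means)
print([round(g, 2) for g in geo_means])
```

On real data, the number of components and the starting values would need more care than this quantile-based initialisation provides.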
Fig. 4.—Reconstruction of gene family evolution. Events of gene family gains, losses, expansions, and contractions were inferred with Wagner parsimony. The number of families is indicated in black font near each node. Gains (green numbers) indicate the number of families acquired along the lineage leading to the respective most recent common ancestor (MRCA) node. Losses (red numbers) indicate families lost along the lineage leading to the respective MRCA node. Expansions are indicated by the number (in blue font) of expanded families (from size 1) and contractions by the number (in yellow font) of contracted families (to size 1) at the respective MRCA node. Gene Ontology (GO) terms associated with changes observed at key points of the phylogeny are shown near each node. GO enrichment was estimated with Fisher's exact test (FDR < 0.05), using the families present in each respective MRCA node as the background population. Arrows indicate terms associated with gains and/or expansions (upward) and losses and/or contractions (downward).
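The GO enrichment step can be sketched as a one-sided Fisher's exact test followed by a false discovery rate correction. The Python sketch below makes two assumptions that the caption does not confirm: the test is one-sided (over-representation), and the FDR is controlled with the Benjamini-Hochberg procedure. Function names and the example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test, P(X >= k) for a hypergeometric X.

    k: changed families annotated with the GO term
    n: all changed families (e.g., gained at a node)
    K: background families annotated with the GO term
    N: all background families at the node
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three GO terms at one node: (k, n, K, N).
toy_terms = [(12, 40, 60, 5000), (2, 40, 300, 5000), (5, 40, 100, 5000)]
raw = [fisher_enrichment_p(*t) for t in toy_terms]
for term, q in zip(toy_terms, benjamini_hochberg(raw)):
    print(term, "enriched" if q < 0.05 else "not enriched")
```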
Fig. 5.—Comparison of sex-specific sequences and assemblies. Samples from each sex were compared against the main genome assembly, which contains data from both sexes. (A) Depth of coverage (i.e., average number of reads mapped to a specific region) of female (red) and male (blue) reads mapped to the main assembly. Coverage was estimated in 50-kb windows using Mosdepth (Pedersen and Quinlan 2018), and regions were ordered by female coverage (ascending, left to right). Scaffolds shorter than 50 kb were not included in the plot. The Y axis was restricted to 200 for better visualization. (B) Number of complete and partial genes identified in each sex-specific assembly.
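The windowed coverage comparison in panel A can be sketched in a few lines of Python. This is a pure-Python stand-in for illustration only: the authors used Mosdepth, and the function names, the per-base depth lists, and the window of 4 positions (standing in for 50 kb) are all assumptions made for this toy example.

```python
def window_means(depths, window):
    """Mean depth of each consecutive window along one scaffold."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def ordered_sex_coverage(female, male, window):
    """Pair female and male window means, sorted by female coverage.

    female, male: dicts mapping scaffold name to per-base depth lists.
    Scaffolds shorter than one window are skipped, as in the figure.
    """
    rows = []
    for name, depths in female.items():
        if len(depths) < window:
            continue
        rows.extend(zip(window_means(depths, window),
                        window_means(male[name], window)))
    return sorted(rows, key=lambda row: row[0])


# Hypothetical per-base depths; "s2" is shorter than the window and is dropped.
toy_female = {"s1": [30, 30, 30, 30, 10, 10, 10, 10], "s2": [5, 5]}
toy_male = {"s1": [12, 12, 12, 12, 20, 20, 20, 20], "s2": [5, 5]}
coverage_rows = ordered_sex_coverage(toy_female, toy_male, window=4)
print(coverage_rows)
```

Windows where one sex has markedly lower coverage than the other would be candidates for sex-linked regions, which is the kind of contrast the plot is designed to expose.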
Comparison of Sex-Specific Genome Assemblies
| Genome Features | | |
|---|---|---|
| Assembled genome size (Mb) | 660.71 | 660.43 |
| Genome fraction (%) | 99.74 | 99.73 |
| # scaffolds | 8,324 | 6,058 |
| Scaffold N50 (kb) | 295 | 471 |
| Contig N50 (kb) | 40.75 | 35.19 |
| Largest scaffold (bp) | 2,179,931 | 2,199,363 |
| Largest alignment to the reference (bp) | 2,178,748 | 1,935,564 |
| GC (%) | 43.18 | 43.18 |
| # misassemblies | 2,041 | 2,342 |
| # complete genes | 19,737 | 19,540 |
| # partial genes | 5,076 | 5,274 |
| Number of sex-specific bases in assembly (bp) | 103,749 (0.0157%) | 64,323 (0.0097%) |
| Longest sex-specific sequence length (bp) | 2,881 | 1,658 |