Timothy D Read, Robert A Petit, Sandeep J Joseph, Md Tauqeer Alam, M Ryan Weil, Maida Ahmad, Ravila Bhimani, Jocelyn S Vuong, Chad P Haase, D Harry Webb, Milton Tan, Alistair D M Dove.
Abstract
BACKGROUND: The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species. It is therefore also the largest extant species of the paraphyletic assemblage commonly referred to as fishes. Because the whale shark is both a phenotypic extreme and a member of the group Chondrichthyes (the sister group to the remaining gnathostomes, which include all tetrapods and therefore also humans), its genome is of substantial comparative interest. Whale sharks are also listed as endangered on the International Union for Conservation of Nature's Red List of Threatened Species, and they are increasingly popular both as a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and in understanding global population structure.
Keywords: Elasmobranch; Fish; Gnathostomata; Rhincodon typus; Vertebrate; Whale shark; Whole genome shotgun
Year: 2017 PMID: 28709399 PMCID: PMC5513125 DOI: 10.1186/s12864-017-3926-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Whale shark (Rhincodon typus) from St. Helena (Photo credit: Alistair D.M. Dove. Rights free use permitted)
Project information
| Property | Term |
|---|---|
| Finishing quality | High quality draft |
| Libraries used | Illumina: paired end library; 454: single end library |
| Sequencing platforms | Illumina HiSeq 2000/454 GS FLX Titanium |
| Fold coverage | 30× |
| Assemblers | SOAPdenovo (v. 2.04) |
| Gene calling method | AUGUSTUS; proteins matched against the NCBI nr database using BLASTP and against the InterPro profile database using InterProScan |
| Genbank ID | LVEK00000000 |
| GenBank Date of Release | 5.11.2016 |
| GOLD ID | Gp0102394 |
| BIOPROJECT | PRJNA255419 |
Sequencing runs and libraries generated. *Types are SE – single end, PE – paired end, and MP – mate pair
| SRA ID | Tissue | Library ID | Technology | Type* | Mean insert size, bp (SD) | Sequence length (bp) | Number of reads | Total bp |
|---|---|---|---|---|---|---|---|---|
| SRR1521182 | Spleen | 1 | LS454 | SE | na | 401,304 | 1,268,373 | 728,329,555 |
| SRR1521184 | Spleen | 1 | LS454 | SE | na | 401,328 | 1,279,760 | 680,625,037 |
| SRR1521191 | Spleen | 2 | Illumina | PE | 293(101) | 100 | 210,821,824 | 21,082,182,400 |
| SRR1521192 | Spleen | 2 | Illumina | PE | 300(91) | 100 | 585,821,484 | 58,582,148,400 |
| SRR1521195 | Spleen | 2 | Illumina | PE | 328(90) | 100 | 585,054,464 | 58,505,446,400 |
| SRR1521197 | Spleen | 2 | Illumina | PE | 286(100) | 100 | 224,670,734 | 22,467,073,400 |
| SRR1521198 | Spleen | 3 | Illumina | MP | 7161(755) | 100 | 571,738,680 | 57,173,868,000 |
| SRR1521199 | Spleen | 2 | Illumina | PE | 290(100) | 100 | 300,519,032 | 30,051,903,200 |
| SRR1521200 | Spleen | 4 | Illumina | SE | na | 51 | 108,403,623 | 5,420,181,150 |
| SRR1521201 | Spleen | 5 | Illumina | PE | 274(54) | 100 | 34,239,020 | 3,423,902,000 |
| SRR1521204 | Spleen | 5 | Illumina | PE | 236(46) | 100 | 90,708,094 | 9,070,809,400 |
| SRR1521190 | Liver | 6 | Illumina | PE | 215(43) | 100 | 99,078,844 | 9,907,874,400 |
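The yields in the run table follow from number of reads × read length, and dividing the summed bases by genome size gives a raw sequencing depth. The minimal Python sketch below does both for the Illumina rows as transcribed above; the row list, the function names, and the use of the 3.44 Gbp figure as the divisor are illustrative assumptions. The 454 rows are left out because their "Sequence length" entries do not read as single per-read lengths. Because it counts every listed base before any filtering, the resulting depth is not directly comparable to the 30× fold coverage in the project table, whose basis the table does not state.

```python
"""Sketch: per-run yield check and raw depth for the Illumina runs listed above.

Assumption: rows are copied by hand from the table; all names are hypothetical.
"""
from typing import List, Tuple

# (SRA ID, number of reads, read length in bp, reported total bp)
Run = Tuple[str, int, int, int]

ILLUMINA_RUNS: List[Run] = [
    ("SRR1521191", 210_821_824, 100, 21_082_182_400),
    ("SRR1521192", 585_821_484, 100, 58_582_148_400),
    ("SRR1521195", 585_054_464, 100, 58_505_446_400),
    ("SRR1521197", 224_670_734, 100, 22_467_073_400),
    ("SRR1521198", 571_738_680, 100, 57_173_868_000),
    ("SRR1521199", 300_519_032, 100, 30_051_903_200),
    ("SRR1521200", 108_403_623, 51, 5_420_181_150),
    ("SRR1521201", 34_239_020, 100, 3_423_902_000),
    ("SRR1521204", 90_708_094, 100, 9_070_809_400),
    ("SRR1521190", 99_078_844, 100, 9_907_874_400),
]

GENOME_SIZE_BP = 3.44e9  # "Genome size (Gbp)" value from the statistics table


def raw_depth(total_bp: int, genome_size_bp: float) -> float:
    """Bases sequenced divided by genome size (no read filtering applied)."""
    return total_bp / genome_size_bp


def yield_mismatches(runs: List[Run]) -> List[str]:
    """IDs of runs whose reported total differs from reads x read length."""
    return [run_id for run_id, reads, length, total in runs if reads * length != total]


if __name__ == "__main__":
    total = sum(run[3] for run in ILLUMINA_RUNS)
    print(f"Illumina bases listed: {total:,}")
    print(f"Raw depth: {raw_depth(total, GENOME_SIZE_BP):.1f}x")
    print("Rows where reads x length != total:", yield_mismatches(ILLUMINA_RUNS))
```

The check uses exact integer equality, so it reports only genuine arithmetic disagreements between columns, not rounding.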
Genome and predicted protein statistics. Percentages of total genome size are calculated as a proportion of assembly size rather than of estimated genome size.
| Attribute | Value | % of Total |
|---|---|---|
| Genome size (Gbp) | 3.44 | |
| DNA coding (bp) | 10,400,226 | 0.41% |
| DNA G + C (bp) | 1,059,229,091 | 41.3% |
| Number of scaffolds | 997,976 | |
| Scaffold N50 (bp) | 5425 | |
| Number of contigs | 1,213,000 | |
| Contig N50 (bp) | 5304 | |
| Protein coding genes | 19,384 | |
| Genes with function prediction | 5380 | 27.8% |
| Genes assigned to KOGs | 7038 | 36.3% |
| Genes with Pfam domains | 6612 | 34.1% |
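Scaffold and contig N50 are length-weighted medians: sort the sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch of that definition follows, together with a G + C percentage computed the way the caption above describes, as a proportion of total assembly length (so N gaps count in the denominator). The function names are hypothetical, and reading sequences from a real assembly file is left out.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def gc_percent_of_assembly(sequences: Iterable[str]) -> float:
    """G + C bases as a percentage of total assembly length (N gaps included)."""
    gc = total = 0
    for seq in sequences:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        total += len(upper)
    return 100 * gc / total if total else 0.0


if __name__ == "__main__":
    print(n50([100, 80, 60, 40, 20]))  # 80, since 100 + 80 = 180 >= 150
    print(gc_percent_of_assembly(["ACGTNN", "ggcc"]))  # 60.0
```

Dividing by all assembled bases rather than by A/C/G/T only matches the table's stated convention; excluding gaps would give a somewhat higher G + C figure for a gappy assembly.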
Fig. 2 Histogram of predicted protein sizes
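A histogram like Fig. 2 can be built by binning the length of each predicted protein. The sketch below assumes the AUGUSTUS proteins have been exported to a protein FASTA file; the parser, the default bin width of 100 residues, and the function names are illustrative choices, not the authors' settings.

```python
import io
from collections import Counter
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(handle: TextIO, bin_width: int = 100) -> Dict[int, int]:
    """Map each bin's lower bound (in residues) to the number of proteins in it."""
    counts: Counter = Counter()
    for _, seq in read_fasta(handle):
        length = len(seq.rstrip("*"))  # ignore a trailing stop symbol if present
        counts[(length // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    demo = io.StringIO(">p1\nMKT\nLLV*\n>p2\n" + "A" * 150 + "\n")
    print(length_histogram(demo))  # {0: 1, 100: 1}
```

With a real file, an `open(...)` handle can replace the `StringIO` demo, and the returned dictionary can be handed to any plotting library.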
Fig. 3 Overview of best matches to the protein database that map to the Chordata taxonomic group
Fig. 4 Phylogeny based on alignment of conserved single-copy proteins. Silhouettes are not to scale. Accessions: Petromyzon: GCA_000148955.1, Callorhinchus: GCA_000165045.2, Latimeria: GCA_000225785.1, Danio: GCA_000002035.3, Gadus: GCA_000231765.1, Gasterosteus: GCA_000180675.1, Oryzias: version MEDAKA1 (Ensembl), Oreochromis: GCA_000188235.1, Takifugu: GCA_000180615.2, Tetraodon: GCA_000180735.1. Silhouette credits: Petromyzon by Gareth Monger, CC-BY; Callorhinchus by Tony Ayling, CC-BY-SA; Rhincodon by Scarlet23, vectorized by T. Michael Keesey, CC-BY-SA; Latimeria by Maija Karala, CC-BY-NC-SA; Gadus, Oreochromis, Tetraodon, Gasterosteus by Milton Tan; Danio, Oryzias, Takifugu, no copyright
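The caption does not spell out the pipeline behind Fig. 4, but a phylogeny from conserved single-copy proteins usually starts by keeping only gene families in which every taxon has exactly one member. The Python sketch below shows that filter, assuming orthogroups have already been inferred by some clustering tool, plus a concatenation step that joins per-gene alignments into one row per species (a supermatrix approach, assumed here rather than taken from the paper). All names are hypothetical; alignment and tree inference are out of scope.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: orthogroup ID -> list of (species, protein ID) members.
Orthogroups = Dict[str, List[Tuple[str, str]]]


def single_copy_groups(groups: Orthogroups, species: Sequence[str]) -> Dict[str, Dict[str, str]]:
    """Keep orthogroups where each listed species contributes exactly one protein."""
    kept: Dict[str, Dict[str, str]] = {}
    for group_id, members in groups.items():
        per_species: Dict[str, List[str]] = {}
        for sp, protein in members:
            per_species.setdefault(sp, []).append(protein)
        if all(len(per_species.get(sp, [])) == 1 for sp in species):
            kept[group_id] = {sp: per_species[sp][0] for sp in species}
    return kept


def concatenate(alignments: List[Dict[str, str]], species: Sequence[str]) -> Dict[str, str]:
    """Join aligned single-copy genes end to end, one supermatrix row per species."""
    for aln in alignments:
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("rows within each alignment must have equal length")
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


if __name__ == "__main__":
    groups = {
        "OG1": [("Rhincodon", "r1"), ("Danio", "d1")],
        "OG2": [("Rhincodon", "r2"), ("Rhincodon", "r3"), ("Danio", "d2")],
        "OG3": [("Danio", "d3")],
    }
    print(single_copy_groups(groups, ["Rhincodon", "Danio"]))
    print(concatenate([{"Rhincodon": "MK-", "Danio": "MKL"},
                       {"Rhincodon": "QQ", "Danio": "Q-"}], ["Rhincodon", "Danio"]))
```

Here OG2 is dropped because Rhincodon has two copies, and OG3 because Rhincodon is absent; only OG1 survives as single-copy in both species.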
Number of genes associated with general KOG functional categories. Percentages are based on the total number of predicted proteins.
| Code | Value | % | Description |
|---|---|---|---|
| J | 161 | 0.83 | Translation, ribosomal structure and biogenesis |
| A | 226 | 1.17 | RNA processing and modification |
| K | 458 | 2.36 | Transcription |
| L | 128 | 0.66 | Replication, recombination and repair |
| B | 154 | 0.79 | Chromatin structure and dynamics |
| D | 143 | 0.74 | Cell cycle control, Cell division, chromosome partitioning |
| V | 100 | 0.52 | Defense mechanisms |
| T | 1280 | 6.60 | Signal transduction mechanisms |
| M | 52 | 0.27 | Cell wall/membrane biogenesis |
| N | 22 | 0.11 | Cell motility |
| U | 307 | 1.58 | Intracellular trafficking and secretion |
| O | 532 | 2.74 | Posttranslational modification, protein turnover, chaperones |
| C | 105 | 0.54 | Energy production and conversion |
| G | 165 | 0.85 | Carbohydrate transport and metabolism |
| E | 140 | 0.72 | Amino acid transport and metabolism |
| F | 65 | 0.34 | Nucleotide transport and metabolism |
| H | 19 | 0.10 | Coenzyme transport and metabolism |
| I | 164 | 0.85 | Lipid transport and metabolism |
| P | 345 | 1.78 | Inorganic ion transport and metabolism |
| Q | 66 | 0.34 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 2142 | 11.06 | General function prediction only |
| S | 393 | 2.03 | Function unknown |
| - | - | - | Not in KOGs |
The total is based on the total number of protein-coding genes in the genome.
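Each percentage in the KOG table is the category count divided by the 19,384 protein-coding genes, times 100. The short Python sketch below recomputes them from counts transcribed from the table (the dictionary and function names are illustrative). It also sums the category counts: the total, 7,167, exceeds the 7,038 genes assigned to KOGs, which indicates that some genes are counted in more than one category.

```python
TOTAL_GENES = 19_384  # protein-coding genes, from the genome statistics table

# Gene counts per KOG functional category, copied from the table above.
KOG_COUNTS = {
    "J": 161, "A": 226, "K": 458, "L": 128, "B": 154, "D": 143, "V": 100,
    "T": 1280, "M": 52, "N": 22, "U": 307, "O": 532, "C": 105, "G": 165,
    "E": 140, "F": 65, "H": 19, "I": 164, "P": 345, "Q": 66, "R": 2142, "S": 393,
}


def percent_of_genes(count: int, total: int = TOTAL_GENES) -> float:
    """Category size as a percentage of all protein-coding genes."""
    return 100 * count / total


if __name__ == "__main__":
    for code, count in KOG_COUNTS.items():
        print(f"{code}\t{count}\t{percent_of_genes(count):.2f}")
    print("Sum over categories:", sum(KOG_COUNTS.values()))
```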
Fig. 5 Overview of the taxonomy of whale shark protein best matches to the nr database. The figure was constructed from best BLAST matches to the nr database using the Krona tool [31].
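Fig. 5 is built from each protein's best BLAST match to nr. A minimal Python sketch of the best-hit step is below, assuming BLASTP results were written in BLAST+ tabular format (`-outfmt 6`, the standard 12 columns) and treating the highest bit score as "best". Mapping subject accessions to taxa is not shown; `taxon_of` is a hypothetical lookup the caller would build separately, for example from NCBI taxonomy files. Krona itself is not reimplemented here; the per-taxon counts are simply the kind of summary it visualizes.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Keep the highest-bit-score hit per query; on ties, the first row seen wins."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blanks and -outfmt 7 comments
            continue
        hit = dict(zip(FIELDS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


def count_by_taxon(best: Dict[str, Dict[str, str]], taxon_of: Dict[str, str]) -> Counter:
    """Tally best hits per taxon; subjects missing from taxon_of count as 'unknown'."""
    return Counter(taxon_of.get(hit["sseqid"], "unknown") for hit in best.values())


if __name__ == "__main__":
    demo = io.StringIO(
        "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "q1\ts2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150\n"
        "q2\ts3\t70.0\t50\t15\t0\t1\t50\t1\t50\t1e-10\t80\n"
    )
    hits = best_hits(demo)
    print({q: h["sseqid"] for q, h in hits.items()})
    print(count_by_taxon(hits, {"s1": "Chordata"}))
```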