Jennifer Wyffels, Benjamin L King, James Vincent, Chuming Chen, Cathy H Wu, Shawn W Polson.
Abstract
Chondrichthyan fishes are a diverse class of gnathostomes that provide a valuable perspective on fundamental characteristics shared by all jawed and limbed vertebrates. Studies of phylogeny, species diversity, population structure, conservation, and physiology are accelerated by genomic, transcriptomic and protein sequence data. These data are widely available for many Sarcopterygii (coelacanth, lungfish and tetrapods) and Actinopterygii (ray-finned fish including teleosts) taxa, but limited for chondrichthyan fishes. In this study, we summarize available data for Chondrichthyes and describe resources for one of the largest projects to characterize one of these fish, Leucoraja erinacea, the little skate. SkateBase (http://skatebase.org) serves as the skate genome project portal linking data, research tools, and teaching resources.
Year: 2014 PMID: 25309735 PMCID: PMC4184313 DOI: 10.12688/f1000research.4996.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Species distribution within chondrichthyan orders.
There is a single order of holocephalans, Chimaeriformes, and 13 orders of elasmobranchs. The distribution of chondrichthyan species in each of the 14 orders is shown relative to the total number of species, genera and families for the clade. The batoids comprise 4 orders, Rajiformes, Myliobatiformes, Torpediformes, and Rhinopristiformes, and contain 54% of extant chondrichthyan species. Sharks are broadly divided into two superorders, Galeomorphii and Squalomorphii, that together include the remaining 9 orders and 43% of extant chondrichthyan species.
Figure 2. Representation of SkateBase data within the National Center for Biotechnology Information (NCBI) databases.
A. The little skate genome project is represented as a BioProject entry that connects all samples and data thematically. A BioSample record describes the DNA sample used for genome sequencing, which was generated from a single stage 32 skate embryo. The SRA catalogs the unassembled Illumina genome sequence data. The Whole Genome Shotgun (WGS) database contains the contiguous sequences from shotgun sequencing projects. The assembled and annotated mitochondrial genome was deposited in GenBank and subsequently included in the NCBI Reference Sequence Database (RefSeq). B. The project to characterize the embryonic transcriptomes of L. erinacea, C. milii and S. canicula is represented in a BioProject entry. Three BioSample entries, one for each species, lead to three SRA datasets. The transcriptome data are also represented in the Gene Expression Omnibus (GEO), a database of high-throughput functional genomic data derived from microarrays and next-generation sequencing technologies.
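The record hierarchy described in this caption can be pictured as a small data model. The sketch below is only an illustration of how a BioProject groups BioSamples and their SRA datasets; the class names, field names, and dataset labels are hypothetical placeholders and are not NCBI accessions or an NCBI API.

```python
from dataclasses import dataclass, field


@dataclass
class BioSample:
    # Free-text description of the biological material.
    description: str
    # Labels of raw read sets derived from this sample (placeholders, not accessions).
    sra_datasets: list[str] = field(default_factory=list)


@dataclass
class BioProject:
    title: str
    samples: list[BioSample] = field(default_factory=list)

    def all_sra(self) -> list[str]:
        """Collect every SRA dataset reachable from this project."""
        return [ds for sample in self.samples for ds in sample.sra_datasets]


# Panel A: one project, one DNA sample, one set of unassembled Illumina reads.
genome = BioProject(
    title="Little skate genome project",
    samples=[BioSample("DNA from a single stage 32 skate embryo", ["genome-reads"])],
)

# Panel B: one project, three BioSamples (one per species), three SRA datasets.
transcriptome = BioProject(
    title="Embryonic transcriptomes",
    samples=[
        BioSample(f"{species} embryos", [f"{species} reads"])
        for species in ("L. erinacea", "C. milii", "S. canicula")
    ],
)

print(len(transcriptome.all_sra()))
```

The point of the model is that the BioProject is the only object a reader needs to start from: samples and read sets are reached by following links downward.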
Chondrichthyan molecular sequence data in public databases.
| Taxon | Taxonomy | BioProject | BioSample | Gene | GenBank | WMG | EST | EST lib | GSS | GSS lib | WGS | GEO | SRA | Swiss-Prot | TrEMBL | PDB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chondrichthyes | 7777 | 16 | 75 | 21069 | 55810 | 72 | 192948 | 33 | 28497 | 5 | 2492.3 | 3 | 22 | 276 | 26485* | 178 |
| Holocephali | 7863 | 3 | 21 | 20201 | 39512 | 8 | 109965 | 6 | 27944 | 1 | 936.9 | 1 | 13 | 12 | 20170 | 0 |
| | 7868 | 3 | 21 | 20110 | 39232 | 1 | 109965 | 6 | 27944 | 1 | 936.9 | 1 | 13 | 3 | 19989 | 0 |
| Elasmobranchii | 7778 | 13 | 54 | 868 | 16273 | 64 | 82983 | 27 | 553 | 4 | 1555.4 | 2 | 9 | 264 | 6299 | 178 |
| | 7782 | 3 | 7 | 13 | 284 | 1 | 31167 | 5 | 0 | 0 | 1555.4 | 1 | 2 | 6 | 123 | 0 |
| | 7830 | 2 | 8 | 13 | 645 | 1 | 1600 | 7 | 0 | 0 | 0 | 1 | 1 | 38 | 283 | 1 |

(WMG) whole mitochondrial genomes, (EST) Expressed Sequence Tags, (lib) libraries, (GSS) Genome Survey Sequences, (GEO) Gene Expression Omnibus, (WGS) Whole Genome Shotgun, (SRA) Sequence Read Archive, (PDB) Protein Data Bank; * includes 16 unidentified fin entries
NCBI databases (Taxonomy through SRA) accessed July 25, 2014; UniProtKB (Swiss-Prot and TrEMBL) Release 2014_07 of 09-Jul-2014; GEO counts are sample accessions
Figure 3. Holocephalan and elasmobranch resources in public nucleotide and protein databases.
The distribution of data for the Holocephali (chimaeras) and Elasmobranchii (sharks and rays) subclasses of chondrichthyan fishes does not always reflect their species distribution. The number of species represented in GenBank is representative of the actual species distribution, but the amount of data in GenBank is not. Holocephalan data form the majority of the NCBI Gene, GenBank, Genome Survey Sequence (GSS) and UniProt TrEMBL databases. The number of Sequence Read Archive (SRA) experiments and EST sequences is nearly equal for each subclass, and the remaining databases are primarily populated by elasmobranch data.
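The proportions described in this caption can be recomputed from the Holocephali and Elasmobranchii rows of the table of chondrichthyan data in public databases. The sketch below is one way to do that, assuming the share is taken over the sum of the two subclass rows (so the 16 unidentified TrEMBL entries counted only at the class level are excluded); the variable and function names are illustrative.

```python
# Counts copied from the Holocephali and Elasmobranchii rows of the table.
holocephali = {
    "Gene": 20201, "GenBank": 39512, "EST": 109965,
    "GSS": 27944, "SRA": 13, "TrEMBL": 20170, "PDB": 0,
}
elasmobranchii = {
    "Gene": 868, "GenBank": 16273, "EST": 82983,
    "GSS": 553, "SRA": 9, "TrEMBL": 6299, "PDB": 178,
}


def holocephalan_share(db: str) -> float:
    """Fraction of the two-subclass total contributed by Holocephali."""
    total = holocephali[db] + elasmobranchii[db]
    return holocephali[db] / total if total else 0.0


shares = {db: holocephalan_share(db) for db in holocephali}
for db, share in shares.items():
    print(f"{db:8s} {share:6.1%}")
```

Shares well above one half (Gene, GenBank, GSS, TrEMBL) correspond to the databases the caption describes as dominated by holocephalan data, while PDB is populated only by elasmobranch entries.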
Chondrichthyan genome sequencing projects.
| Organism | Website | Genome | Coverage | Contigs | N50 | Platform | Facility | GenBank | Data | BioProject | BioSample | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Holocephali | | | | | | | | | | | | |
| | | 0.910 | 19.25x | 21,203 | 1466 | Sanger | IMCB | | 244 M | | | 20-Dec-13* |
| Elasmobranchii | | | | | | | | | | | | |
| | | 3.42 | 26x | 2,62,365 | 665 | Illumina | NECC | | 105 G | | | 22-Dec-11 |
| | - | 3.5 | 32x | 3,449,662 | 1,292 | Illumina | Genoscope-CEA | - | - | - | - | - |
| | - | 3.44 (est.) | 35x | | | Illumina | Emory University | - | - | | | 16-Jul-14 |

Data: (M) Mega or (G) Giga base pairs; (PE) paired end; (est) estimated; (IMCB) Institute of Molecular and Cell Biology, A*STAR; (NECC) North East Cyberinfrastructure Consortium
* replaced original sequence data GenBank AAVX00000000.1 (1.4x coverage) released 20-DEC-2006
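The N50 column summarizes assembly contiguity. The table does not state how the values were calculated or in which units they are reported, so the sketch below simply shows the conventional definition: the contig length at which the running total of contigs, sorted longest first, reaches half of the assembled bases. The function name and the toy contig lengths are illustrative and are not taken from any of the listed assemblies.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths: total 300, so half is 150.
toy_assembly = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_assembly))  # 80 + 70 reaches 150, so this prints 70
```

Because N50 depends on both contig count and length distribution, two assemblies of similar genome size can report very different values, which is why the table lists it alongside coverage and contig number.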
National Center for Biotechnology Information (NCBI) Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS) databases (release 130101): Chondrichthyan sequence data.
| BioSample | BioSample Description | Library ID | Organism | Sample | Sample type | ESTs | Facility | Date |
|---|---|---|---|---|---|---|---|---|
| Chimaeriformes | | | | | | | | |
| | Whole-genome shotgun library of the elephant shark (aka | GSS: LIBGSS_009694 | | - | testis | 27944 | IMCB | 2004 |
| | Elephant shark full-length cDNA library from testis | EST: LIBEST_027873 | | - | testis | 29234 | IMCB | 2012 |
| | Elephant shark full-length cDNA library from spleen | EST: LIBEST_027872 | | - | spleen | 16664 | IMCB | 2012 |
| | Elephant shark full-length cDNA library from liver | EST: LIBEST_027871 | | - | liver | 16573 | IMCB | 2012 |
| | Elephant shark full-length cDNA library from kidney | EST: LIBEST_027870 | | - | kidney | 19246 | IMCB | 2012 |
| | Elephant shark full-length cDNA library from intestine | EST: LIBEST_027869 | | - | intestine | 12146 | IMCB | 2012 |
| | Elephant shark full-length cDNA library from gills | EST: LIBEST_027868 | | - | gills | 16012 | IMCB | 2012 |
| Torpediformes | | | | | | | | |
| | Torpedo marmorata electric organ | EST: LIBEST_003755 | | - | electric organ | 8 | CNRS | 2000 |
| | Torpedo marmorata electric lobe | EST: LIBEST_003754 | | - | electric lobe | 26 | CNRS | 2000 |
| | pFL61-TEL | EST: LIBEST_002905 | | - | electric lobe | 2 | CNRS | 2000 |
| | pFL61-EL | EST: LIBEST_002849 | | - | electric lobe | 5 | CNRS | 2000 |
| | Torpedo californica electric organ | EST: LIBEST_020696 | | - | electric organ | 10185 | Children’s National | 2006 |
| Rajiformes | | | | | | | | |
| | Little Skate Multiple Tissues, Normalized | EST: LIBEST_015890 | | adult | mixed | 5698 | MDIBL | 2004 |
| | Little Skate Liver, Normalized | EST: LIBEST_017626 | | adult | liver | 6016 | MDIBL | 2005 |
| | Little Skate embryo cell line 1 (LEE-1): 5' sequences | EST: LIBEST_022984 | | embryonic | stage 28 | 4825 | MDIBL | 2006 |
| | Little skate embryo tissues; 5' sequences | EST: LIBEST_020422 | | embryo | stage 19, 20, 25 | 5600 | MDIBL | 2006 |
| | Skate Multiple Tissues, Normalized | EST: LIBEST_023576 | | adult | mixed | 9028 | MDIBL | 2008 |
| Carcharhiniformes | | | | | | | | |
| | Dogfish testis - round spermatids zone (SSH) | EST: LIBEST_025578 | | adult | testis | 20 | Caen University | 2009 |
| | Dogfish testis - spermatogonia zone (SSH) | EST: LIBEST_025577 | | adult | testis | 12 | Caen University | 2010 |
| | Scyliorhinus canicula juvenile library | EST: LIBEST_026904 | | juvenile | 5 days post-hatch | 56 | Genoscope-CEA | 2011 |
| | Scyliorhinus canicula embryonic, stages 9–15 library | EST: LIBEST_026903 | | embryo | stages 9–15 | 628 | Genoscope-CEA | 2011 |
| | Scyliorhinus canicula embryonic, stages 19–25 library | EST: LIBEST_026902 | | embryo | stages 19–25 | 772 | Genoscope-CEA | 2011 |
| | Scyliorhinus canicula embryonic, stages 19–24 library | EST: LIBEST_026901 | | embryo | stages 19–24 | 33 | Genoscope-CEA | 2011 |
| | Scyliorhinus canicula adult brain library | EST: LIBEST_026900 | | adult | brain | 79 | Genoscope-CEA | 2011 |
| | cloudy catshark embryo cDNA library | EST: LIBEST_027410 | | embryo | stage 31 | 2942 | RIKEN | 2011 |
| Orectolobiformes | | | | | | | | |
| | GC__Ba | GSS: LIBGSS_009945 | | adult | red blood cells | 178 | University of | 2005 |
| | shark whole genome shotgun library 2 | GSS: LIBGSS_011249 | | female | ventral fin | 177 | Tgen | 2008 |
| | shark whole genome shotgun library 1 | GSS: LIBGSS_011248 | | female | ventral fin | 194 | Tgen | 2008 |
| | Shark liver regeneration | EST: LIBEST_023789 | | adult | liver | 2103 | BGI | 2008 |
| | cDNA library of Shark hepatic regeneration tissues | EST: LIBEST_017019 | | none | Hour 24 after 2/3 | 17 | CPU | 2005 |
| | Toll like receptor ligand induced Spleen | EST: LIBEST_027180 | | male | spleen | 1051 | MVC | 2011 |
| | Spleen of Chiloscyllium griseum | EST: LIBEST_027179 | | male | spleen | 1000 | MVC | 2011 |
| | Suppressive subtractive hybridization library from | EST: LIBEST_028031 | | male | spleen | 315 | MVC | 2012 |
| Squaliformes | | | | | | | | |
| | Dogfish Shark Multiple Tissues, Normalized | EST: LIBEST_016552 | | adult | mixed | 15078 | MDIBL | 2004 |
| | Dogfish Shark Embryo-derived Cell Line SAE, Normalized | EST: LIBEST_018195 | | embryonic | embryo with | 5824 | MDIBL | 2005 |
| | Spiny dogfish shark rectal gland EST library | EST: LIBEST_020417 | | - | rectal gland | 5085 | MDIBL | 2006 |
| | Dogfish Shark Rectal Gland, Normalized | EST: LIBEST_020023 | | adult | rectal gland | 6575 | MDIBL | 2006 |
| Hexanchiformes | | | | | | | | |
| | Hexanchus griseus DNA (Hunter C) | GSS: LIBGSS_003277 | | - | - | 4 | HGMP-RC | 2001 |

Facility: (IMCB) Institute of Molecular and Cell Biology, A*STAR; (HGMP-RC) Human Genome Mapping Project Resource Centre, Hinxton; (Tgen) Translational Genomics Research Institute AZ, USA; (CNRS) National Center for Scientific Research, France; (MDIBL) Mount Desert Island Biological Laboratory; (CPU) China Pharmaceutical University; (MVC) Madras Veterinary College, TANUVAS; (BGI) Beijing Genomics Institute
(SSH) Suppressive subtractive hybridization; (mixed a) liver, kidney, brain, testis, ovary, gill, heart, spleen, rectal gland; (mixed b) rectal gland, kidney, brain, testis, ovary, gill, intestine, heart, spleen
National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database: Chondrichthyan sequence data.
| BioProject | BioSample | SRA description | SRA | Organism | Age | Sample | Platform | Data | Facility | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| Holocephali | | | | | | | | | | |
| | | 454 sequencing of Callorhinchus milii | | | adult | testis | LS454 | 244.9 M | IMCB | 2008 |
| | | GSM643959: Callorhinchus milii pooled | | | embryos | stage 32 | Illumina SE | 3.3 G | MDIBL | 2011 |
| | | Illumina sequencing of elephant shark | | | - | thymus | Illumina PE | 9.7 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | testis | Illumina PE | 7.3 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | spleen | Illumina PE | 6.3 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | ovary | Illumina PE | 7.9 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | liver | Illumina PE | 16.7 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | muscle | Illumina PE | 11.1 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | kidney | Illumina PE | 9 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | intestine | Illumina PE | 11.2 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | heart | Illumina PE | 6.9 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | gills | Illumina PE | 5.4 G | IMCB | 2013 |
| | | Illumina sequencing of elephant shark | | | - | brain | Illumina PE | 10.5 G | IMCB | 2013 |
| Elasmobranchii | | | | | | | | | | |
| | | Initial Characterization of Leucoraja | | | embryo | stage 32 | Illumina PE | 105 G | NECC | 2011 |
| | | GSM643957: Leucoraja erinacea pooled | | | embryos | stage | Illumina SE | 3.8 G | MDIBL | 2011 |
| | | GSM643958: Scyliorhinus canicula | | | embryos | stage | Illumina SE | 3.9 G | MDIBL | 2011 |
| | | Torazame EST | | | embryos | stage | LS454 | 43.6 M | RIKEN | 2011 |
| | | Carcharodon carcharias cDNA Illumina | | | juvenile | heart | Illumina SE | 7.9 G | Cornell | 2013 |
| | | Carcharodon carcharias heart | | | juvenile | heart | LS454 | 408.4 M | Cornell | 2013 |
| | | Illumina sequencing of Nurse Shark | | | - | thymus | Illumina PE | 12 G | IMCB | 2013 |
| | | Illumina sequencing of Nurse Shark | | | - | spleen | Illumina PE | 11.2 G | IMCB | 2013 |
| | | Neotrygon kuhlii barb venom gland | | | - | barb | Illumina PE | 84.3 M | LSTM | 2014 |

* genomic data; Platform: (SE) single end or (PE) paired end; Data: (M) Mega or (G) Giga base pairs
Facility: (MDIBL) Mount Desert Island Biological Laboratory, (IMCB) Institute of Molecular and Cell Biology, A*STAR, (LSTM) Liverpool School of Tropical Medicine, (NECC) North East Cyberinfrastructure Consortium
Whole mitochondrial sequences for chondrichthyan fishes.
| BioProject accession | NCBI Ref_seq accession | GenBank accession | Organism | bp | * | Date |
|---|---|---|---|---|---|---|
| Chimaeriformes | | | | | | |
| | | | | 16758 | 34 | 21-Oct-10 |
| | | | | 16760 | 34.1 | 21-Oct-10 |
| | | | | 16769 | 33.7 | 21-Oct-10 |
| | | | | 18580 | 38.6 | 14-Nov-06 |
| | | | | 21336 | 38.2 | 19-Oct-10 |
| | | | | 18024 | 42.5 | 19-Oct-10 |
| | | | | 21233 | 39.4 | 19-Oct-10 |
| | | | | 24889 | 41.6 | 19-Oct-10 |
| Myliobatiformes | | | | | | |
| | | | | 17874 | 45.1 | 7-May-14 |
| | | | | 17657 | 39.1 | 25-Feb-14 |
| | | | | 20201 | 40.9 | 3-Nov-13 |
| | | | | 17658 | 40.4 | 10-Mar-14 |
| | | | | 17668 | 40.2 | 22-Jul-13 |
| | | | | 18264 | 36.6 | 24-May-13 |
| | | | | 17514 | 41.9 | 20-Mar-07 |
| | | | | 17448 | 43.3 | 14-Jan-14 |
| | | | | 18880 | 37.4 | 18-Jan-13 |
| | | | | 18039 | 39.5 | 17-Jul-13 |
| | | | | 17638 | 41.6 | 8-Nov-13 |
| Rajiformes | | | | | | |
| | | | | 16912 | 41.6 | 13-Mar-14 |
| | | | | 16724 | 40.3 | 28-Nov-11 |
| | | | | 16972 | 42.4 | 15-Jun-05 |
| | | | | 16783 | 40.3 | 22-Apr-09 |
| | | | | 16910 | 41.4 | 11-Sep-13 |
| | | | | 16905 | 42.2 | 11-Sep-13 |
| | | | | 16909 | 41.1 | 1-May-14 |
| Rhinopristiformes | | | | | | |
| | | | | 16804 | 39.8 | 13-Nov-13 |
| | | | | 16776 | 40.3 | 13-Nov-13 |
| | | | | 16780 | 39.6 | 6-Apr-14 |
| Carcharhiniformes | | | | | | |
| | | | | 16719 | 38.4 | 29-Apr-14 |
| | | | | 16705 | 38.2 | 6-Apr-14 |
| | | | | 16704 | 37.4 | 25-Feb-14 |
| | | | | 16706 | 38.6 | 7-Jun-14 |
| | | | | 16706 | 38.6 | 8-Nov-13 |
| | | | | 16707 | 38.9 | 25-Feb-14 |
| | | | | 16703 | 36.9 | 31-Oct-13 |
| | | | | 16702 | 39.2 | 13-Jan-14 |
| | | | | 16701 | 39 | 25-Jul-14 |
| | | | | 16754 | 39 | 25-Feb-14 |
| | | | | 16707 | 38.3 | 8-Apr-00 |
| | | | | 16705 | 37.5 | 13-Nov-13 |
| | | | | 16700 | 36.4 | 29-Oct-13 |
| | | | | 16693 | 37 | 31-Mar-14 |
| | | | | 16697 | 38 | 18-Apr-05 |
| | | | | 16726 | 39.5 | 8-Nov-13 |
| Orectolobiformes | | | | | | |
| | | | | 16755 | 36.1 | 6-Mar-12 |
| | | | | 16725 | 37.4 | 25-Jul-12 |
| | | | | 16703 | 36.8 | 31-Mar-14 |
| | | | | 16706 | 37.3 | 19-Sep-13 |
| | | | | 16875 | 37.1 | 19-Mar-14 |
| Lamniformes | | | | | | |
| | | | | 16773 | 39.5 | 5-Feb-14 |
| | | | | 16744 | 40.8 | 31-Oct-13 |
| | | | | 16670 | 40.6 | 14-Jan-14 |
| | | | | 16701 | 43.2 | 28-Sep-13 |
| | | | | 16704 | 43.8 | 7-May-14 |
| | | | | 16699 | 41.8 | 30-May-14 |
| | | | | 16694 | 36.7 | 13-May-13 |
| | | | | 17743 | 38.8 | 29-Dec-08 |
| | | | | 16692 | 38.6 | 18-Dec-13 |
| | | | | 16719 | 39.3 | 26-Jun-13 |
| Heterodontiformes | | | | | | |
| | | | | 16708 | 39.9 | 14-Nov-06 |
| | | | | 16720 | 40 | 18-Jun-13 |
| Squaliformes | | | | | | |
| | | | | 16543 | 38.8 | 29-Apr-14 |
| | | | | 16730 | 39.3 | 29-Oct-13 |
| | | | | 16738 | 38.8 | 18-Apr-05 |
| Squatiniformes | | | | | | |
| | | | | 16689 | 37.9 | 4-Jun-14 |
| Pristiophoriformes | | | | | | |
| | | | | 18430 | 44.5 | 10-May-14 |
| Hexanchiformes | | | | | | |
| | | | | 17223 | 36.3 | 29-Oct-13 |
| | | | | 18605 | 36.3 | 29-Oct-13 |
| | | | | 18909 | 35.9 | 29-Oct-13 |
| | | | | 17314 | 35 | 29-Oct-13 |
| | | | | 16990 | 38.2 | 29-Oct-13 |

* Metazoan Mitochondrial Genomes Accessible dataset Metamiga (http://amiga.cbmeg.unicamp.br/)
Figure 4. A survey of public data and phylogeny for chondrichthyan orders.
A. The relative distribution of the 14 orders of chondrichthyan fish in public nucleotide and protein databases is shown separately for Chondrichthyes and Elasmobranchii. The distribution of species among orders is similar to their distribution in GenBank, indicating that sequence data have been collected for a broad range of chondrichthyans. For Chondrichthyes, the elephant shark genome project contributes the majority of the data in the NCBI Gene, GenBank, Genome Survey Sequence (GSS), and Sequence Read Archive (SRA) databases. The NCBI GSS, GSS libraries, and Protein Data Bank (PDB) are the least diverse, representing only 1–6 of the 14 orders. The color of each order in the bar chart is given in the cladogram key, with left to right in the bar chart corresponding to top to bottom in the cladogram. B. A cladogram of Chondrichthyes illustrates the phylogenetic relationships among the 14 orders. The color codes appear in the same consecutive order in the bar chart.
Figure 5. Example of using SkateBase and NCBI resources to find transcriptome data for SOCS6.
A. SkateBLAST query form showing the four steps to align the UniProt sequence for human SOCS6 (O14544) against the skate embryonic transcriptome using tblastn. Step 1 is to enter the sequence in FASTA format. Step 2 is to choose the tblastn program, which aligns the query protein sequence against the database sequences translated in all six possible reading frames. Step 3 is to select the embryonic transcriptome as the sequence database to search. Step 4 is to launch the search. B. The complete BLAST output can be accessed by clicking the “Inspect BLAST output” link at the top of the summary report page. This is necessary to examine the sequence alignments. C. Four important fields in the output should be examined carefully to interpret the alignments and determine which returned alignment best represents the skate ortholog of SOCS6. The alignment score, E-value, alignment length and percent identity can be used to interpret the overall alignment significance. Alignment coverage with respect to the query protein sequence and the subject transcriptome sequence can be assessed by comparing the alignment coordinates to the length of the query protein sequence and the length of the transcriptome sequence. In this example, the entire query protein sequence is covered by this transcriptome sequence. D. The SkateBase Contig Lookup tool can be used to retrieve, in FASTA format, the transcriptome sequence found in the SOCS6 tblastn search. Sequences from the skate genome assembly or the skate, S. canicula or C. milii transcriptome assemblies can be retrieved using this tool. E. Output from the NCBI ORF Finder tool showing a 536 aa ORF in the skate transcriptome contig that best represents SOCS6 (left). Alignment from a blastx search of the skate transcriptome sequence (contig 15542) against human UniProt using NCBI BLAST to validate that the contig aligned best to human SOCS6 rather than another human gene.
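The coverage comparison described in panel C amounts to a small calculation on the alignment coordinates. The sketch below assumes 1-based inclusive coordinates, with tblastn query positions counted in amino acids and subject positions in nucleotides, and with subject start greater than end for hits on the reverse strand. The function name and all numbers are hypothetical and are not taken from the SOCS6 search.

```python
def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by an alignment, given
    1-based inclusive start and end coordinates."""
    lo, hi = sorted((start, end))  # reverse-strand hits list start > end
    if lo < 1 or hi > length:
        raise ValueError("coordinates fall outside the sequence")
    return (hi - lo + 1) / length


# Hypothetical values for illustration only.
query_len = 500      # protein query length, amino acids
contig_len = 3000    # transcriptome contig length, nucleotides

query_cov = coverage(1, 500, query_len)         # whole query aligned
subject_cov = coverage(2000, 501, contig_len)   # reverse-strand hit, 1500 nt

print(f"query coverage:   {query_cov:.0%}")
print(f"subject coverage: {subject_cov:.0%}")
```

A query coverage near 100% with a lower subject coverage is the expected pattern when a contig contains a complete coding sequence plus untranslated regions; low query coverage would instead suggest a partial transcript or a hit to a shared domain only.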
Figure 6. GenBank and WGS data trends for Chondrichthyes and all taxa.
GenBank is the National Institutes of Health (NIH) genetic sequence database; together with the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL), it makes up the International Nucleotide Sequence Database Collaboration (INSDC). Cumulative base pair totals for all taxa and for chondrichthyan-only data are plotted against time for GenBank and Whole Genome Shotgun (WGS) data. The Elephant Shark Genome Project is responsible for the spike in chondrichthyan GenBank data in 2011. The little skate and elephant shark genome projects are currently the only two chondrichthyan WGS datasets (yellow line).