Leiming You (1,2), Jiaqi Chi (3), Shengfeng Huang (2), Ting Yu (2), Guangrui Huang (1), Yuchao Feng (2), Xiaopu Sang (1), Xinhui Gao (1), Ting'an Li (1), Zirui Yue (2), Aijie Liu (1), Shangwu Chen (2), Anlong Xu (1,2).
Abstract
Lancelet (amphioxus) belongs to the most basally divergent extant chordate lineage, the cephalochordates, which diverged from the other two chordate lineages (urochordates and vertebrates) more than half a billion years ago. Because of this key evolutionary position, it is considered one of the best proxies for understanding the ancestral chordate state. A database that integrates multiple lancelet genomes and gene annotation data, including protein domains, is therefore urgently needed to investigate the loss and gain of domains in orthologues across species, especially ancient domain types (non-vertebrate-specific domains) and novel domain combinations; such analyses can provide new insight into the ancestral chordate state and vertebrate evolution. Here, we present LanceletDB, an integrated genome database for lancelet that provides a reference haploid genome sequence and annotation data for Branchiostoma belcheri, including gene models and annotations, protein domain types, gene expression patterns during embryogenesis, several expressed sequence tag (EST) sets and alternative polyadenylation (APA) sites profiled by the sequencing APA sites (SAPAS) method. In particular, LanceletDB allows domain types and domain combinations to be compared among orthologues from selected species, in order to identify ancient domain types and novel domain combinations that arose during evolution. We also integrated the publicly released annotation data for the diploid genome of a second lancelet, Branchiostoma floridae, to expand LanceletDB and extend its usefulness. These data are accessible through the search and analysis page, the Basic Local Alignment Search Tool (BLAST) page and the genome browser, which together provide an integrated display.
Year: 2019 PMID: 31106360 PMCID: PMC6526094 DOI: 10.1093/database/baz056
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Datasets listed by species on the LanceletDB website
| Dataset | Entries | Description |
|---|---|---|
| B.belcheri_HapV2(v7h2)_genome | 5679 | Reference haploid genome assembly (v7h2) for B. belcheri |
| B.belcheri_HapV2(v7h2)_cds | 35 293 | Non-redundant transcript set for B. belcheri |
| B.belcheri_HapV2(v7h2)_proteins | 35 293 | Non-redundant protein set for B. belcheri |
| B.belcheri_v18h27.r3_ref_genome | 2307 | Reference haploid genome assembly (v18h27) for B. belcheri |
| B.belcheri_v18h27.r3_ref_cds | 37 646 | Non-redundant transcript set for B. belcheri |
| B.belcheri_v18h27.r3_ref_protein | 37 646 | Non-redundant protein set for B. belcheri |
| B.belcheri_v7h2_polyA_ | 51 931 | APA sites, 3'-UTRs and heterogeneous cleavage sites in the intestine of lancelet infected by … |
| B.belcheri_454EST_ | 223 103 | ESTs from intestine of B. belcheri |
| B.belcheri_454EST_intestine | 170 667 | ESTs from intestine of B. belcheri |
| B.belcheri_454EST_embryo-mix | 1 097 418 | ESTs in … |
| B.belcheri_454EST_gill | 467 739 | ESTs from gill of B. belcheri |
| B.belcheri_454EST_liver | 451 959 | ESTs from liver of B. belcheri |
| B.belcheri_454EST_xiamen-beihai-merged_adult | 98 118 | ESTs from … |
| B.belcheri_sangerEST_xiamen_adult | 4074 | ESTs from … |
| B.tsing_sangerEST_qingdao_adult | 24 412 | ESTs from … |
| B.floridae_ESTs_embryogenesis-gastrula | 262 037 | ESTs from gastrula in B. floridae |
| B.floridae_v1.allmasked | 3032 | Reference diploid assembly for B. floridae |
| B.floridae_v1_anno.transcripts | 50 815 | Transcript models for B. floridae |
| B.floridae_v1_anno.proteins^a | 50 815 | Protein models for B. floridae |
^a Generated APA dataset, integrated into our APASdb (http://genome.bucm.edu.cn/utr).
^b Datasets released on the JGI site.
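The APA dataset describes cleavage sites and 3'-UTRs, and the genome browser displays poly(A) signals next to APA sites (Figure 3E). As a minimal illustration of what such a signal annotation involves, the sketch below scans the region just upstream of a known cleavage site for the canonical AATAAA hexamer and its common ATTAAA variant, which usually lie roughly 10 to 30 nt upstream of cleavage. This is not the SAPAS method or the LanceletDB pipeline; the function name `find_pas`, the two-hexamer list and the 40-nt window are assumptions chosen for illustration.

```python
import re

# Assumed hexamer list: canonical (AATAAA) and major variant (ATTAAA) poly(A)
# signals in the DNA alphabet. Weaker variants are deliberately omitted.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(sequence: str, cleavage_site: int, window: int = 40) -> list[tuple[int, str]]:
    """Return sorted (position, hexamer) pairs for candidate poly(A) signals.

    `cleavage_site` is a 0-based index into `sequence`; only the `window`
    nucleotides immediately upstream of it are scanned (assumed default: 40).
    """
    start = max(0, cleavage_site - window)
    upstream = sequence[start:cleavage_site].upper()
    hits = []
    for hexamer in PAS_HEXAMERS:
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={hexamer})", upstream):
            hits.append((start + match.start(), hexamer))
    return sorted(hits)


# Hypothetical toy sequence: signal at index 4, cleavage at index 27.
seq = "GCGC" + "AATAAA" + "T" * 15 + "CA" + "A" * 10
print(find_pas(seq, 27))
```

Positions are reported in the coordinates of the full input sequence, so hits can be mapped back onto a gene model directly.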
Figure 1. Overview of the LanceletDB website. (A) Outline of the LanceletDB construction pipeline; arrows indicate data flow. (B) Architecture of the LanceletDB website. Arrows denote the direction of information flow, and several output pages are shown, including the popular genome browser (Gbrowse) and the custom display pages ‘Transcript Detail’, ‘Expression Pattern’ and ‘Orthologues among Species’.
Figure 2. Screenshots of the search page and the results page for a fuzzy query with the keyword ‘NLRP’. (A) User retrieval interface for querying datasets. (B) Descriptive list of datasets in the retrieval interface; the list summarizes the released datasets and guides user queries. The ‘view’ button gives quick access to an example query of a dataset, and the ‘chr’ button opens the dataset in the genome browser (Gbrowse). (C) List of gene models matching the fuzzy keyword ‘NLRP’. Entries in the ‘locus’ column link to the corresponding gene models in the genome browser. This example is available at http://genome.bucm.edu.cn/lancelet/search.php?seqkeywords=NLRP&db=Transcripts/B.belcheri_HapV2(v7h2)_cds.
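The example URLs in the figure captions share one pattern: a `search.php` endpoint with a `seqkeywords` parameter (the query keyword) and a `db` parameter (a category path such as `Transcripts/` followed by a dataset name from the dataset table). The sketch below builds such URLs programmatically. The parameter names are read off the example URLs; the helper name `build_search_url`, its `category` argument, and the assumption that other category/dataset combinations are accepted are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URLs in the figure captions.
SEARCH_URL = "http://genome.bucm.edu.cn/lancelet/search.php"


def build_search_url(keyword: str, dataset: str, category: str = "Transcripts") -> str:
    """Build a search URL for `keyword` within the dataset `category/dataset`.

    `category` is a hypothetical argument; "Transcripts" is the only value
    seen in the examples. safe="/()" leaves the slash and parentheses in the
    dataset path unescaped, matching the form of the example URLs.
    """
    query = urlencode({"seqkeywords": keyword, "db": f"{category}/{dataset}"}, safe="/()")
    return f"{SEARCH_URL}?{query}"


print(build_search_url("NLRP", "B.belcheri_HapV2(v7h2)_cds"))
```

Because `urlencode` uses `quote_plus` by default, a keyword containing a space would be sent with `+`; whether the server treats that as intended is untested here.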
Figure 3. Exon structure of the ssr4 gene model and its expression pattern during lancelet embryonic development. (A) Track showing exon mapping. (B) Track showing RNA-seq read coverage. Reads were generated from samples collected during lancelet embryogenesis: oosperm (0 hpf), 4–8 cells (0.5 hpf), blastula (4 hpf), cap gastrula (5 hpf), cup gastrula (6 hpf), late neurula (20 hpf), 1-gill slit (30 hpf) and larva (6 dpf). (C) Track showing ESTs that support the gene model. (D) Track showing BLAST alignments of Florida lancelet (B. floridae) protein models to the B. belcheri genome. (E) Track showing APA sites and poly(A) signals mapped to the searched gene model; APA sites were identified by the SAPAS method. (F) Bar chart of ssr4 expression during lancelet embryogenesis from 0 hpf to 6 dpf. The time points approximately correspond to major developmental stages: oosperm, 4–8 cells, blastula, cap gastrula, cup gastrula, late neurula, 1-gill slit and larva. h/dpf, hours/days post fertilization; FPKM, fragments per kilobase of transcript per million mapped reads. This example can be browsed directly at http://genome.bucm.edu.cn/lancelet/search.php?seqkeywords=ssr4&db=Transcripts/B.belcheri_HapV2(v7h2)_cds.
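Panel F reports expression in FPKM. Under the standard definition, FPKM = C × 10^9 / (L × N), where C is the number of fragments mapped to the transcript, L is the transcript length in bases and N is the total number of mapped fragments in the library. The sketch below implements that textbook formula; the counts in the example are hypothetical, and the quantification tool actually used for LanceletDB (which may, for instance, use an effective rather than raw transcript length) is not stated here.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    # Normalize for transcript length (per kilobase) ...
    per_kilobase = fragments / (transcript_length_bp / 1_000)
    # ... then for sequencing depth (per million mapped fragments).
    return per_kilobase / (total_mapped_fragments / 1_000_000)


# Hypothetical: 100 fragments on a 2 kb transcript in a 10-million-fragment library.
print(fpkm(100, 2_000, 10_000_000))  # 5.0
```

Normalizing for both length and depth is what makes bars from different developmental stages in a chart like panel F comparable.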
Figure 4. Screenshot of the detail page with the ‘Orthologues among Species’ tab expanded, comparing the domain types and combinations of myd88 orthologues between lancelet and other species. The pictures on the left depict the domain types and combinations in the identified myd88 orthologues of each species. The right side lists, by species, the myd88 orthologue IDs (as available in other resources) and lengths, together with the matched InterPro domains (IDs, locations and descriptions). IPR000488 denotes the death domain; IPR000157 denotes the Toll/interleukin-1 receptor homology (TIR) domain. This example can be browsed directly at http://genome.bucm.edu.cn/lancelet/search.php?seqkeywords=Bb_172050R&db=Transcripts/B.belcheri_HapV2(v7h2)_cds.
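The ‘Orthologues among Species’ comparison lines up the InterPro domains of orthologues from several species. The sketch below shows one simple way such a comparison could be computed, assuming each orthologue is reduced to an ordered tuple of InterPro IDs: domains are classed as shared with, gained relative to, or lost relative to a lancelet reference, and adjacent domain pairs absent from the reference are flagged as candidate novel combinations. Following the abstract's definition of ancient domain types as non-vertebrate-specific, domains shared with the lancelet orthologue would be candidates for ancient types. This is not LanceletDB's own algorithm; the species keys, the placeholder `DOMAIN_X`, and the use of adjacent pairs as a proxy for "domain combination" are all hypothetical.

```python
# Hypothetical ordered domain architectures (N- to C-terminus) per species.
# IPR000488 (death domain) and IPR000157 (TIR domain) are the InterPro IDs
# named in Figure 4; "DOMAIN_X" and the species keys are made-up placeholders.
ARCHITECTURES: dict[str, tuple[str, ...]] = {
    "lancelet": ("IPR000488", "IPR000157"),
    "species_A": ("IPR000488", "IPR000157"),
    "species_B": ("IPR000157",),
    "species_C": ("DOMAIN_X", "IPR000488", "IPR000157"),
}


def domain_pairs(architecture: tuple[str, ...]) -> set[tuple[str, str]]:
    """Adjacent domain pairs, used here as a simple proxy for a domain combination."""
    return set(zip(architecture, architecture[1:]))


def compare_to_reference(
    architectures: dict[str, tuple[str, ...]], reference: str
) -> dict[str, dict[str, set]]:
    """Compare each species' orthologue architecture with the reference species."""
    ref = architectures[reference]
    ref_domains, ref_pairs = set(ref), domain_pairs(ref)
    report = {}
    for species, arch in architectures.items():
        if species == reference:
            continue
        domains = set(arch)
        report[species] = {
            "shared": domains & ref_domains,  # also present in the reference
            "gained": domains - ref_domains,  # absent from the reference
            "lost": ref_domains - domains,  # reference domains missing here
            "new_pairs": domain_pairs(arch) - ref_pairs,  # adjacencies not in the reference
        }
    return report


for species, result in compare_to_reference(ARCHITECTURES, "lancelet").items():
    print(species, result)
```

With the hypothetical data above, species_B is reported as having lost the death domain, and species_C as having gained DOMAIN_X together with the new adjacency (DOMAIN_X, IPR000488).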