Shuai Zhan, Steven M Reppert.
Abstract
The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly's migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of the genome assembly (v3), upon which all data integration is based. The integrated data include systematically annotated genes, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase offers a variety of retrieval methods for accessing the data conveniently and for integrating biological interpretations.
Year: 2012 PMID: 23143105 PMCID: PMC3531138 DOI: 10.1093/nar/gks1057
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data content in current version of MonarchBase
| Genome reference | |
| Assembly (v3) | 5397 scaffolds spanning 248.6 Mb genome with 6.7 Mb as gaps |
| Repeat | 121 269 repetitive elements spanning 25.3 Mb genome |
| Gene repertoire | |
| Official geneset (OGS2.0) | 15 130 |
| GLEAN consensus set | 16 216 |
| Maker consensus set | 13 744 |
| AUGUSTUS | 14 550 |
| GeneMark | 27 256 |
| Genscan | 12 921 |
| Glimmer | 23 898 |
| SNAP | 25 758 |
| RNAseq assembly | 18 563 genes with 23 543 alternative transcripts |
| Annotation for OGS2.0 | |
| Public databases | 12 943 |
| Lepidoptera genesets | 13 572 |
| GO term | 8120 genes assigned with 1539 GO terms |
| InterPro domain | 10 034 genes assigned with 5069 domains |
| KO | 8157 genes assigned with 3856 KO terms; 4708 genes assigned into 264 pathways |
| Ortholog group | 198 021 proteins from 15 species assigned into 34 392 ortholog groups |
| Non-coding RNAs | |
| MicroRNA | 116 |
| Transfer RNA | 379 |
| Ribosome RNA | 127 |
| Other resources | |
| Brain ESTs | 9484 |
| ESTs with microarray data | 9417 |
(a) Public databases used for annotating monarch genes include RefSeq (5), UniRef50 (6) and the non-redundant database of NCBI.
(b) Lepidopteran genesets include the Bombyx geneset (7) and the Heliconius geneset (8).
(c) tRNAs were predicted by tRNAscan-SE (9).
(d) rRNAs were predicted by RNAmmer (10) or the Rfam scan pipeline (11) following the default settings.
Quality control of the latest monarch assembly v3 compared with v1 and the other lepidopterans
| | Monarch v3 | Monarch v1 | Heliconius | Bombyx |
| Assembly statistics | | | | |
| L50 (bp) | 715 606 | 53 032 | 194 302 | 3 998 728 |
| N50 | 101 | 1138 | 345 | 38 |
| L90 (bp) | 160 499 | 6262 | 38 051 | 60 675 |
| N90 | 366 | 7140 | 1634 | 260 |
| CEGMA analysis for 248 ultra-conserved CEGs present in genome | | | | |
| # Complete | 230 | 229 | 214 | 195 |
| # Partial | 243 | 241 | 237 | 241 |
| Homologs in Drosophila | | | | |
| # Recovered | 9655 | 9653 | 9539 | 9524 |
| Average coverage | 55.2% | 54.5% | 53.0% | 52.8% |
| Homologs in Tribolium | | | | |
| # Recovered | 11 015 | 11 017 | 10 915 | 10 983 |
| Average coverage | 63.8% | 63.0% | 61.9% | 61.3% |
| Homologs in Bombyx | | | | |
| # Recovered | 13 010 | 12 996 | 12 820 | — |
| Average coverage | 84.3% | 83.1% | 82.4% | — |
| Homologs in Heliconius | | | | |
| # Recovered | 12 860 | 12 840 | — | — |
| Average coverage | 86.5% | 84.9% | — | — |
(a) The Heliconius assembly used here is the latest version available for download from http://butterflygenome.org/, dated June 1, 2012; a better N50 value (277 kb) was reported for a linkage-based improved version (8), which was not available to us.
(b) The Bombyx assembly (7) was downloaded from SilkDB 2.0 (18).
(c) For the quantitative assembly statistics, N50 is the number of scaffolds of length greater than or equal to the L50 size that together represent half of the total sequence in the assembly; N90 and L90 are defined analogously for 90% of the sequence.
(d) Statistics of the complete and partial presence of 248 ultra-conserved CEGs were calculated with the CEGMA pipeline v2.4 following the default settings (19).
(e) The Drosophila geneset r5.36 is from FlyBase (20), and only the longest protein per gene was used for analysis. Recovered queries were automatically calculated by GenBlastA (21) as follows: genblasta_v1.0.4_linux_x86_64 -P blast -pg tblastn -p T -e 1e-5 -g T -f F -a 0.5 -r 1 -c 0.5; the output was then processed by a custom Perl script to sort out coverage on a single scaffold.
(f) The Tribolium geneset 3.0 is from BeetleBase (22) and was analyzed as described earlier.
(g) The Bombyx geneset is from SilkDB 2.0 (18) and was analyzed as described earlier.
(h) The Heliconius geneset 1.1 (8) is from http://butterflygenome.org/ and was analyzed as described earlier.
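The N50/L50 convention used in the table footnotes (N50 and N90 as scaffold counts, L50 and L90 as scaffold lengths) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `scaffold_stats` and the toy scaffold lengths are assumptions made for illustration only. Note that many assembly tools use the opposite naming, with N50 as the length and L50 as the count.

```python
def scaffold_stats(lengths, fraction=0.5):
    """Return (count, length) for a list of scaffold lengths.

    Following the convention in the table footnotes:
      count  -> "N50" (or "N90"): the number of longest scaffolds needed
                to reach `fraction` of the total assembled sequence
      length -> "L50" (or "L90"): the length of the shortest scaffold
                in that set
    The function name and signature are hypothetical.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    target = fraction * sum(ordered)          # sequence that must be covered
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length


# Toy example (made-up lengths, total = 300 bp)
toy = [100, 80, 60, 40, 20]
n50, l50 = scaffold_stats(toy)                # 100 + 80 = 180 >= 150
n90, l90 = scaffold_stats(toy, fraction=0.9)  # 100 + 80 + 60 + 40 = 280 >= 270
print(n50, l50, n90, l90)                     # prints: 2 80 4 40
```

Under this convention, a more contiguous assembly shows a smaller scaffold count and a larger scaffold length at the same fraction, which is how the assembly statistics in the table are meant to be read.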
Figure 1. Schematic view of the components of MonarchBase and their connections. The green arrows represent the clickable connections between the components. Thin arrows represent the major entrances of MonarchBase accepting users’ input to retrieve data: black arrows indicate the sequence inputs; blue arrows indicate ID inputs; red arrows indicate keyword inputs; and purple arrows indicate browsing menus.