Qinghua Wang, Cecilia N Arighi, Benjamin L King, Shawn W Polson, James Vincent, Chuming Chen, Hongzhan Huang, Brewster F Kingham, Shallee T Page, Marc Farnum Rendino, William Kelley Thomas, Daniel W Udwary, Cathy H Wu.
Abstract
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org), has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
Year: 2012 PMID: 22434832 PMCID: PMC3308154 DOI: 10.1093/database/bar064
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Little Skate Genome Project overview, illustrating the North East Cyberinfrastructure Consortium's distributed and collaborative resources.
SkateBase components
| Component | Description | Public access |
|---|---|---|
| Informational | Basic information about the project goals and current status | Y |
| Training | Dissemination of tutorials, educational materials, annotation guidelines and SOPs | N |
| Download | Repository for project sequence and annotation data | Y |
| Tools | | |
| Genome browsers | Analysis of genomic context | Y |
| SkateBLAST | Searching and download of genomic contigs and features | Y |
| SkateBase community | Connectivity and coordination of community annotation activities | N |
| File exchange | Sharing of raw and analyzed high-throughput sequence data | N |
| RACE-P | Community annotation of proteins | Y |
a Feature under development for future public release.
Figure 2. Little Skate Genome Project timeline, indicating the simultaneous annotation training and genome development. Sequencing Data Sets I: seven lanes of paired-end reads; II: four lanes of paired-end reads; III: two lanes of mate-pair reads; IV: five lanes of paired-end reads; V: three lanes of mate-pair reads. There are a total of 2 931 925 134 reads.
Figure 3. Mitochondrial genome annotation jamboree workflow. Curators from each state worked independently for ∼2 weeks before submitting results to project leaders for review.
Figure 4. Leucoraja erinacea mitochondrial genome. (A) Leucoraja erinacea mitochondrial genome with the consensus annotation for genes and other sequence features, generated using CGView (29). The orientation of genes is shown with arrowheads. The tRNA genes are shown in pink, rRNA genes in purple and protein-coding genes in grey. The first inner circle shows, in black, the GC content above and below the average GC content for the mitochondrion. Positive GC skew is shown in green and negative in magenta. (B) The mitochondrial genomes of L. erinacea, A. radiata and O. kenojei are displayed using Mauve (16, 17), with rRNA features in red, tRNA features in green, protein-coding regions in white, and miscellaneous features in blue. The pink profiles indicate the sequence identity levels among the three genomes.
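The GC content and GC skew tracks described for panel A can be approximated with a sliding-window calculation. The sketch below is illustrative only and is not the CGView implementation: it assumes the commonly used definition of GC skew, (G - C) / (G + C), treats the genome as circular, and uses hypothetical function names and arbitrary default window and step sizes that are not taken from the source.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    counts = Counter(seq.upper())
    total = counts["A"] + counts["C"] + counts["G"] + counts["T"]
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); returns 0.0 when seq has no G or C."""
    counts = Counter(seq.upper())
    g, c = counts["G"], counts["C"]
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, subsequence) pairs, wrapping around a circular genome.

    Assumes window <= len(seq).
    """
    n = len(seq)
    for start in range(0, n, step):
        end = start + window
        if end <= n:
            yield start, seq[start:end]
        else:
            # The window runs past the end, so continue from the start.
            yield start, seq[start:] + seq[: end - n]


def gc_profile(seq: str, window: int = 500, step: int = 100):
    """Return (start, GC deviation from genome mean, GC skew) per window.

    The window and step defaults are arbitrary illustrative values.
    """
    mean_gc = gc_content(seq)
    return [
        (start, gc_content(sub) - mean_gc, gc_skew(sub))
        for start, sub in sliding_windows(seq, window, step)
    ]
```

Windows with a positive second value lie above the genome-wide average GC content, and the sign of the third value corresponds to the positive or negative skew colouring mentioned in the caption.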