Marek S Skrzypek, Jonathan Binkley, Gail Binkley, Stuart R Miyasato, Matt Simison, Gavin Sherlock.
Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology, by curating the scientific literature in real time, and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high-throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that were used in the construction of Assembly 22.
Year: 2016 PMID: 27738138 PMCID: PMC5210628 DOI: 10.1093/nar/gkw924
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of Assembly 22 features corrected, by error type (in either or both haplotypes)
| Error Type | Features Corrected |
|---|---|
| Boundary/annotation | 455 |
| Ambiguous sequence | 268 |
| Nonsense codons | 48 |
| Missing stop codons | 46 |
| Misc. coding sequence | 8 |
| Misc. non-coding sequence | 5 |
| Missing start codons | 2 |
Figure 1. JBrowse visualization of RNA-Seq data at CGD. JBrowse display of the region around the C. albicans serum-inducible gene HWP1, showing aligned RNA-Seq reads from serum-treated cells (37). The red and blue bars in the top track of the main display window show genes annotated at CGD: red for genes encoded on the ‘W’ strand (+), blue for genes on the ‘C’ strand (–). HWP1 is the second gene from the left. Clicking on a bar brings up an information window for that gene, which includes a link to its CGD Locus Summary Page. The green bar graph below the gene track shows the density of aligned RNA-Seq reads along the chromosome, plotted on a log scale. The bottom track shows all the aligned RNA-Seq reads along the chromosome: each short bar in the bottom track represents a unique read. In this example the sequence reads are strand-specific: pink bars indicate reads transcribed from the ‘W’ strand, and light-blue bars indicate reads transcribed from the ‘C’ strand. Clicking on a bar brings up information about the read, including the sequence and quality score for each base. Menus and controls at the top of the browser provide navigation, zoom and search functionalities, and allow users to load their own data.
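The read-density track described in the caption can be illustrated with a minimal sketch. This is not JBrowse's or CGD's implementation: the `Read` record, the coordinate convention (0-based start, exclusive end), the toy reads, and the `log10(depth + 1)` transform are all assumptions made for illustration. Only the strand-to-color mapping is taken from the caption.

```python
import math
from collections import namedtuple

# Hypothetical read record (an assumption, not a CGD or JBrowse type):
# 0-based start, exclusive end, strand 'W' (+) or 'C' (-).
Read = namedtuple("Read", ["start", "end", "strand"])


def coverage(reads, region_start, region_end):
    """Count aligned reads overlapping each position in [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for read in reads:
        # Clip the read to the displayed region before counting.
        lo = max(read.start, region_start)
        hi = min(read.end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def log_scale(depth):
    """Apply log10(depth + 1); the +1 is an assumed pseudo-count so zero depth maps to 0."""
    return [math.log10(d + 1) for d in depth]


def strand_color(read):
    """Read colors as stated in the caption: pink for 'W', light-blue for 'C'."""
    return "pink" if read.strand == "W" else "light-blue"


# Toy reads, invented purely for illustration.
reads = [Read(2, 6, "W"), Read(4, 9, "W"), Read(5, 8, "C")]
depth = coverage(reads, 0, 10)
scaled = log_scale(depth)
colors = [strand_color(r) for r in reads]
```

Plotting `scaled` as a bar graph gives a density track of the kind shown in green in the figure, and `colors` mirrors the per-read coloring of the bottom track.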