Weiping Zhang¹, Yudong Li¹,², Yiwang Chen², Sha Xu¹, Guocheng Du¹, Huidong Shi³, Jingwen Zhou¹, Jian Chen¹.
Abstract
Chinese rice wine is a popular traditional alcoholic beverage in China, but its brewing process has rarely been studied. We report the first gapless, near-finished genome sequence of Saccharomyces cerevisiae N85, a yeast strain used for Chinese rice wine production. Several assembly methods were used to integrate Pacific Biosciences (PacBio) and Illumina sequencing data into a high-quality genome assembly of the strain. The genome encodes more than 6,000 predicted proteins and 238 long non-coding RNAs, which were validated with RNA-sequencing data. Our annotation also predicts 171 novel genes that are absent from the reference S288c genome. We identified 65,902 single nucleotide polymorphisms and small indels, many of which lie within genic regions. Dozens of larger copy-number variations and translocations were detected, mainly enriched in the subtelomeres, suggesting that these regions may be involved in genome evolution. This study will serve as a milestone for research on Chinese rice wine and related beverages, both in China and in other countries. It will help to develop more scientific and modern fermentation processes for Chinese rice wine and to explore the metabolic pathways of desirable and harmful components in the wine, with the aim of improving its taste and nutritional value.
Keywords: annotation; genome sequence; rice wine yeast; transcriptomics
Year: 2018 PMID: 29415277 PMCID: PMC6014378 DOI: 10.1093/dnares/dsy002
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Summary of the de novo hybrid assembly results
| Library type | No. of contigsᵃ | Maximum contig size (bp) | N50 contig (bp) | Total length (bp) | Software |
|---|---|---|---|---|---|
| HiSeq | 1,379 | 200,122 | 50,511 | 12,737,887 | CLC Genomics Workbench |
| MiSeq | 2,382 | 39,836 | 7,243 | 11,888,873 | A5-miseq pipeline |
| PacBio | 324 | 242,389 | 75,828 | 11,883,288 | HGAP |
| Hybridᵇ | 601 | 27,619 | 6,928 | 3,893,255 | PBcR |
| Hybridᵇ | 284 | 862, 32 | 201,497 | 11,857,571 | SPAdes |
| Close Gap | 204 | 1,107,090 | 477,098 | 11,917,338 | SSPACE-LongRead |
ᵃ For each assembly, only contigs longer than 500 bp were considered.
ᵇ Hybrid denotes the combination of the HiSeq, MiSeq, and PacBio datasets.
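The N50 column compares assemblies by contiguity rather than total length. Under the usual definition, N50 is the length L such that contigs of length L or longer together cover at least half of the assembly. The Python sketch below assumes that definition and applies the >500 bp cutoff from footnote a. The paper does not say how each assembler computes N50, and `n50` is a hypothetical helper, not a function of any of the listed tools.

```python
def n50(contig_lengths, min_length=500):
    """Return the N50 of an assembly under the standard definition (assumed).

    Contigs not longer than `min_length` bp are discarded first,
    mirroring the table footnote ("only contigs >500 bp ... were considered").
    """
    # Keep contigs above the cutoff and order them from longest to shortest.
    kept = sorted((n for n in contig_lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs longer than the length cutoff")
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        # Integer comparison: stop once half of the assembly length is covered.
        if 2 * running >= total:
            return length
    return kept[-1]  # not reached: the full sum always satisfies the test


print(n50([4000, 3000, 2000, 1000, 400]))
```

In this toy call, the 400 bp contig is dropped. The remaining 10,000 bp reach their halfway point (5,000 bp) after the 4,000 bp and 3,000 bp contigs, so the N50 is 3,000. A higher N50 with fewer contigs, as in the gap-closed row, indicates a more contiguous assembly.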
Figure 1. Dot plot of sequence similarity between the assembly scaffolds of the N85 and S288c strains. Most N85 assembly sequences are collinear with the chromosomes of the reference strain S288c.
Figure 2. Annotation of the S. cerevisiae N85 genome. Analysis of S288c-homologous and non-S288c genes in S. cerevisiae N85 through different approaches. (A) Number of S288c-homologous genes identified using exonerate (yellow), AUGUSTUS (red), AUGUSTUS-Tophat (blue), and TRINITY (green). (B) Number of non-S288c genes identified using AUGUSTUS (red), AUGUSTUS-Tophat (blue), and TRINITY (green). (C) Genomic architecture of non-S288c genes in N85. Gene locations are shown below each gene box. Color figures available in online version.
Figure 3. General characteristics of coding and non-coding transcripts. (A) Number of different transcript variants in S. cerevisiae N85. (B) Box plots of transcript expression levels in log2(FPKM) units. FPKM, fragments per kilobase of exon per million reads mapped.
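FPKM normalizes a transcript's fragment count by both its exonic length and the sequencing depth, so expression can be compared across transcripts. The standard formula is FPKM = fragments × 10⁹ / (exon length in bp × total mapped fragments). The Python sketch below implements that formula. The function names are hypothetical. The pseudocount of 1 added before taking log2 is also an assumption, because the caption does not state how zero-expression transcripts were handled.

```python
import math


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions


def log2_fpkm(value, pseudocount=1.0):
    # Assumed pseudocount so that unexpressed transcripts (FPKM = 0) map to 0
    # instead of log2(0), which is undefined.
    return math.log2(value + pseudocount)


# 500 fragments on a 2,000 bp transcript in a library of 10 million fragments.
value = fpkm(500, 2_000, 10_000_000)
print(value, log2_fpkm(value))
```

Here 500 fragments / 2 kb / 10 million fragments gives 25 FPKM. A box plot like panel B would show the log2-transformed values.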
Genetic variations identified in the S. cerevisiae strain N85
| Category | N85 total | N85 Homᵃ | N85 Hetᵃ | YHJ7-homologuesᵃ total | YHJ7-homologues Hom | YHJ7-homologues Het |
|---|---|---|---|---|---|---|
| Exonic | 38,478 | 37,564 | 914 | 214 | 73 | 141 |
| Synonymous | 23,717 | 23,350 | 367 | 75 | 21 | 54 |
| Nonsynonymous | 12,380 | 12,121 | 259 | 72 | 26 | 46 |
| Frameshift | 274 | 243 | 31 | 6 | 2 | 4 |
| Nonframeshift | 1,996 | 1,748 | 248 | 55 | 20 | 35 |
| Stop gain or loss | 89 | 82 | 7 | 6 | 4 | 2 |
| Intronic | 428 | 424 | 4 | 5 | 3 | 2 |
| Intergenic | 26,996 | 25,783 | 1,213 | 1,175 | 852 | 323 |
ᵃ Hom: homozygous; Het: heterozygous. YHJ7-homologues: the N85 variants (SNPs and small indels) that remain after those shared with strain YHJ7 were filtered out of the total N85 set.
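The counts in the variation table can be cross-checked with simple arithmetic. Within each row, Hom + Het should equal the total. If exonic, intronic and intergenic are mutually exclusive top-level categories (an assumption; the table does not say so), their N85 totals should add up to the 65,902 SNPs and small indels reported in the abstract. The Python sketch below copies the numbers verbatim from the table. The helper names are hypothetical.

```python
# (total, homozygous, heterozygous) per category, copied from the table.
N85 = {
    "Exonic": (38_478, 37_564, 914),
    "Synonymous": (23_717, 23_350, 367),
    "Nonsynonymous": (12_380, 12_121, 259),
    "Frameshift": (274, 243, 31),
    "Nonframeshift": (1_996, 1_748, 248),
    "Stop gain or loss": (89, 82, 7),
    "Intronic": (428, 424, 4),
    "Intergenic": (26_996, 25_783, 1_213),
}
YHJ7_FILTERED = {
    "Exonic": (214, 73, 141),
    "Synonymous": (75, 21, 54),
    "Nonsynonymous": (72, 26, 46),
    "Frameshift": (6, 2, 4),
    "Nonframeshift": (55, 20, 35),
    "Stop gain or loss": (6, 4, 2),
    "Intronic": (5, 3, 2),
    "Intergenic": (1_175, 852, 323),
}
EXONIC_SUBCATEGORIES = ("Synonymous", "Nonsynonymous", "Frameshift",
                        "Nonframeshift", "Stop gain or loss")


def zygosity_mismatches(table):
    """Categories whose Hom + Het count differs from the reported total."""
    return [name for name, (total, hom, het) in table.items() if hom + het != total]


def genome_wide_total(table):
    # Assumes exonic, intronic and intergenic do not overlap.
    return sum(table[name][0] for name in ("Exonic", "Intronic", "Intergenic"))


def exonic_subtotal(table):
    """Sum of the exonic subcategories listed in the table."""
    return sum(table[name][0] for name in EXONIC_SUBCATEGORIES)


print(zygosity_mismatches(N85), genome_wide_total(N85), exonic_subtotal(N85))
```

Every row passes the Hom + Het check. The three location categories sum to 65,902 for N85, consistent with the abstract, and to 1,394 for the YHJ7-homologues column. The listed exonic subcategories sum to 38,456 for N85, 22 fewer than the exonic total, but to exactly 214 for the YHJ7-homologues column. The N85 breakdown shown therefore does not account for every exonic variant.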
Figure 4. Genetic variation in the S. cerevisiae N85 genome. The first and second circles represent SNPs and indels relative to the S. cerevisiae S288c reference genome; variants specific to N85 relative to YHJ7 are highlighted in red. The third and fourth circles represent larger duplication/deletion and translocation events relative to YHJ7. Most of the structural rearrangements are located in subtelomeric regions. Color figures available in online version.
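Both the red highlights in Figure 4 and the YHJ7-homologues column of the variation table depend on removing the variants that N85 shares with YHJ7. The paper does not describe its matching rule. The Python sketch below assumes that two calls are the same variant when chromosome, position, reference allele and alternate allele all match exactly, and reduces the filtering to a set difference. `Variant` and `strain_specific` are hypothetical names, and the variants in the example are toy data, not calls from N85 or YHJ7.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # position on the shared S288c reference
    ref: str
    alt: str


def strain_specific(query, other):
    """Variants called in `query` but absent from `other`.

    Both call sets must be relative to the same reference. Exact matching
    on (chrom, pos, ref, alt) is an assumption of this sketch.
    """
    return sorted(set(query) - set(other))


# Toy data for illustration only.
n85 = [Variant("chrI", 1200, "A", "G"),
       Variant("chrI", 5400, "C", "T"),
       Variant("chrII", 300, "G", "GA")]
yhj7 = [Variant("chrI", 1200, "A", "G"),
        Variant("chrII", 300, "G", "GA")]
print(strain_specific(n85, yhj7))
```

Only the chrI:5400 C>T call survives, because the other two are shared. With real call sets, an indel can be written at different positions, so indels generally need to be normalized (left-aligned) before an exact comparison like this is meaningful.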
Figure 5. Screenshot of the homepage of the Huangjiu Yeast Genome Database and of the genome browser displaying chromosome 1 of S. cerevisiae N85. Gene regions are represented as horizontal boxes and genetic variants as blue dots. Color figures available in online version.