Alex N Salazar1,2, Arthur R Gorter de Vries3, Marcel van den Broek3, Melanie Wijsman3, Pilar de la Torre Cortés3, Anja Brickwedde3, Nick Brouwers3, Jean-Marc G Daran3, Thomas Abeel1,2.
Abstract
The haploid Saccharomyces cerevisiae strain CEN.PK113-7D is a popular model system for metabolic engineering and systems biology research. Current genome assemblies are based on short-read sequencing data scaffolded by homology to strain S288C. However, these assemblies contain large sequence gaps, particularly in subtelomeric regions, and the assumption of perfect homology to S288C for scaffolding introduces bias. In this study, we obtained a near-complete genome assembly of CEN.PK113-7D using only Oxford Nanopore Technologies' MinION sequencing platform. Fifteen of the 16 chromosomes, the mitochondrial genome and the 2-μm plasmid are assembled in single contigs and all but one chromosome starts or ends in a telomere repeat. This improved genome assembly contains 770 Kbp of added sequence containing 248 gene annotations in comparison to the previous assembly of CEN.PK113-7D. Many of these genes encode functions determining fitness in specific growth conditions and are therefore highly relevant for various industrial applications. Furthermore, we discovered a translocation between chromosomes III and VIII that caused misidentification of a MAL locus in the previous CEN.PK113-7D assembly. This study demonstrates the power of long-read sequencing by providing a high-quality reference assembly and annotation of CEN.PK113-7D and places a caveat on assumed genome stability of microorganisms. © FEMS 2017.
Keywords: Saccharomyces cerevisiae; genome assembly; long-read sequencing; nanopore sequencing; yeast
Year: 2017 PMID: 28961779 PMCID: PMC5812507 DOI: 10.1093/femsyr/fox074
Source DB: PubMed Journal: FEMS Yeast Res ISSN: 1567-1356 Impact factor: 2.796
Comparison of 454/Illumina and nanopore de novo assemblies of CEN.PK113–7D.
| Metric | CEN.PK113–7D Delft, short-read | CEN.PK113–7D Delft, nanopore | CEN.PK113–7D Frankfurt, nanopore |
|---|---|---|---|
| Contigs | 414 | 24 | 20 |
| Largest contig | 0.210 Mbp | 1.08 Mbp | 1.50 Mbp |
| Smallest contig | 0.001 Mbp | 0.013 Mbp | 0.085 Mbp |
| N50 | 0.048 Mbp | 0.736 Mbp | 0.912 Mbp |
| Total size | 11.4 Mbp | 11.9 Mbp | 12.1 Mbp |
Summary of de novo assembly metrics of CEN.PK113–7D Delft and CEN.PK113–7D Frankfurt. For the short-read assembly, only contigs of at least 1 Kbp are shown (Nijkamp et al. 2012). The nanopore assembly of CEN.PK113–7D Delft was not corrected for misassemblies, whereas the nanopore assembly of CEN.PK113–7D Frankfurt was.
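Summary metrics of this kind (contig count, largest and smallest contig, N50, total size) can be computed directly from a list of contig lengths. The following is a minimal sketch using the standard definition of N50; the function name `assembly_metrics` and the toy lengths are illustrative assumptions and are not taken from either assembly or from the authors' pipeline.

```python
def assembly_metrics(contig_lengths):
    """Return basic summary metrics for a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: walking from the longest contig downwards, the length of the
    # contig at which the running sum first reaches half the total size.
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "largest": lengths[0],
        "smallest": lengths[-1],
        "n50": n50,
        "total": total,
    }


# Toy contig lengths (hypothetical, for illustration only).
toy = assembly_metrics([400, 300, 200, 60, 40])
print(toy)
```

For the toy input the total is 1000 bp, and the running sum reaches half of it at the second contig, so the N50 is 300 bp.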
Figure 1. Overview of gained and lost sequence and genes in the CEN.PK113–7D Frankfurt nanopore assembly relative to the short-read CEN.PK113–7D assembly and to the genome of S288C. The two unplaced subtelomeric contigs and the mitochondrial DNA were not included in this figure. (A) Chromosomal location of sequence assembled in the nanopore assembly which was not assembled using short-read data. The 16 chromosome contigs of the nanopore assembly are shown. Chromosome XII has a gap at the RDN1 locus, a region estimated to contain more than 1 Mbp worth of repetitive sequence (Venema and Tollervey 1999). Centromeres are indicated by black ovals, gained sequence relative to the short-read assembly is indicated by black marks and 46 identified retrotransposon Ty-elements are indicated by blue marks. The size of all chromosomes and marks is proportional to their corresponding sequence size. In total, 611 Kbp of sequence was added within the chromosomal contigs. (B) Relative chromosome position of sequences and genes assembled on chromosome contigs of the nanopore assembly which were not assembled using short-read data. The positions of added sequence and genes were normalised to the total chromosome size. The number of genes (red) and the amount of sequence (cyan) over all chromosomes are shown per 10th of the relative chromosome size. (C) Relative chromosome position of gene presence differences between S288C and CEN.PK113–7D. The positions of the 45 genes identified as unique to CEN.PK113–7D and of the 44 genes identified as unique to S288C were normalised to the total chromosome size. The numbers of genes unique to CEN.PK113–7D (red) and to S288C (purple) are shown per 10th of the relative chromosome position.
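The normalisation described for panels (B) and (C) can be sketched as follows: each position is divided by the size of its chromosome, and the resulting relative position is assigned to one of ten equal bins pooled over all chromosomes. This is only an illustration under assumptions; the function name `decile_counts`, the chromosome names and the coordinates are hypothetical, and the sketch counts features (such as genes) rather than summing sequence length.

```python
def decile_counts(features, chromosome_sizes):
    """Count features per 10th of relative chromosome position.

    features: iterable of (chromosome, position_bp) pairs.
    chromosome_sizes: dict mapping chromosome name to its size in bp.
    """
    counts = [0] * 10
    for chromosome, position in features:
        size = chromosome_sizes[chromosome]
        relative = position / size  # 0.0 at the start, 1.0 at the end
        # min() keeps a feature at the very last base in the final bin.
        counts[min(int(relative * 10), 9)] += 1
    return counts


# Hypothetical chromosomes and feature positions, for illustration only.
sizes = {"A": 1000, "B": 200}
genes = [("A", 50), ("A", 990), ("B", 10), ("B", 199), ("A", 500)]
bins = decile_counts(genes, sizes)
print(bins)
```

Because positions are relative, features near the ends of short and long chromosomes fall into the same outer bins, which is what makes a subtelomeric enrichment visible when all chromosomes are pooled.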
Presence in the nanopore assembly of genes identified as absent in CEN.PK113–7D in previous research.
| Study | Not analysed | Absent in assembly | Present in assembly |
|---|---|---|---|
| Daran-Lapujade et al. (2003) | | | |
| Nijkamp et al. (2012) | | | |
For genes identified as absent in CEN.PK113–7D in two previous studies, the absence or presence in the nanopore assembly of CEN.PK113–7D is shown. A total of 25 genes were identified previously by aCGH (Daran-Lapujade et al. 2003) and 21 genes were identified by short-read genome assembly (Nijkamp et al. 2012). Genes that were not annotated by MAKER2 in S288C could not be analysed. Genes with an alignment to genes identified as missing in the nanopore assembly of at least 50% of the query length and 95% sequence identity were confirmed as being absent, while those without such an alignment were identified as present. The presence of these genes was verified manually, which revealed the misannotation of YPL277C as YOR389W.
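The decision rule in this caption can be written as a small threshold check. The sketch below assumes that each alignment of a query gene against the set of genes identified as missing is summarised by an aligned length and a fractional identity; how those values are produced (alignment tool, settings) is not stated here, and the names `classify_gene`, `MIN_QUERY_COVERAGE` and `MIN_IDENTITY` are illustrative.

```python
MIN_QUERY_COVERAGE = 0.50  # at least 50% of the query length
MIN_IDENTITY = 0.95        # at least 95% sequence identity


def classify_gene(query_length, alignments):
    """Classify a previously reported gene as 'absent' or 'present'.

    alignments: list of (aligned_length_bp, identity_fraction) tuples for
    alignments of the query against genes identified as missing in the
    nanopore assembly (assumed input format).
    """
    for aligned_length, identity in alignments:
        coverage = aligned_length / query_length
        if coverage >= MIN_QUERY_COVERAGE and identity >= MIN_IDENTITY:
            return "absent"   # matches a gene in the missing set
    return "present"          # no qualifying alignment to the missing set


# Hypothetical alignments, for illustration only.
print(classify_gene(1000, [(600, 0.97)]))  # both thresholds met
print(classify_gene(1000, [(400, 0.99)]))  # coverage too low
print(classify_gene(1000, []))             # no alignment at all
```

Both thresholds must hold for the same alignment; a long low-identity hit together with a short high-identity hit does not qualify.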
Figure 2. LEU2 and NFS1 duplication in chromosome VII of CEN.PK113–7D. The nanopore assembly contains a duplication of LEU2 and part of NFS1 in CEN.PK113–7D. In S288C, the two genes are located in chromosome III next to a Ty-element. In CEN.PK113–7D, the two genes are present in chromosome III and in chromosome VII. The duplication appears to be mediated by Ty-elements. Note that the additional copy in chromosome VII lies between two Ty-elements and contains only the first ∼500 bp of NFS1. The duplication is supported by long reads that span LEU2, NFS1, the two Ty-elements and the neighbouring flanking genes (not shown).
Figure 3. Overview of chromosome structure heterogeneity in CEN.PK113–7D Delft for CHRIII and CHRVIII that led to the misidentification of a fourth MAL locus in a previous short-read assembly study of the genome of CEN.PK113–7D. Nanopore reads support the presence of two chromosome architectures: the normal chromosomes III and VIII (left panel) and translocated chromosomes III–VIII and VIII–III (right panel). The translocation occurred in Ty-elements, large repetitive sequences known to mediate chromosomal translocations in Saccharomyces species (Fischer et al. 2000). Long reads are required to diagnose the chromosome architecture via sequencing: the repetitive region between KCC4 and NFS1 in chromosome III exceeds 15 Kbp, while the region between SPO13 and MIP6 in chromosome VIII is only 1.4 Kbp long. For the translocated architecture, the region from NFS1 to MIP6 in chromosome III–VIII exceeds 16 Kbp and the distance from SPO13 to KCC4 in chromosome VIII–III is nearly 10 Kbp.
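The requirement for long reads comes down to a spanning test: a read supports a given architecture only if its alignment covers the whole repetitive region plus some unique sequence on both sides. The sketch below illustrates that test under assumptions; the function names, the 500 bp flank and all coordinates are hypothetical and do not correspond to actual positions on chromosomes III or VIII.

```python
def spans(read_start, read_end, region_start, region_end, flank=0):
    """True if the read alignment covers the region plus `flank` bp per side."""
    return (read_start <= region_start - flank
            and read_end >= region_end + flank)


def count_spanning_reads(reads, region_start, region_end, flank=500):
    """Count (start, end) read alignments that span the region with flanks."""
    return sum(
        spans(start, end, region_start, region_end, flank)
        for start, end in reads
    )


# Hypothetical 15 Kbp repetitive region and read alignments (bp).
region = (10_000, 25_000)
reads = [
    (9_000, 26_000),   # spans the region with both flanks
    (9_800, 26_000),   # left flank too short
    (12_000, 30_000),  # starts inside the repeat
    (0, 40_000),       # spans the region with both flanks
]
n_spanning = count_spanning_reads(reads, *region)
print(n_spanning)
```

A read shorter than the region can never pass this test, which is why a repeat of more than 15 Kbp cannot be resolved by short reads, whereas a 1.4 Kbp region is within reach of much shorter reads.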