Chromosome assembled and annotated genome sequence of Aspergillus flavus NRRL 3357.

Jeffrey M Skerker^1, Kaila M Pianalto^1, Stephen J Mondo^2,3, Kunlong Yang^4, Adam P Arkin^1,5, Nancy P Keller^4, Igor V Grigoriev^2,6, N Louise Glass^1,6.

Abstract

Aspergillus flavus is an opportunistic pathogen of crops, including peanuts and maize, and is the second leading cause of aspergillosis in immunocompromised patients. A. flavus is also a major producer of the mycotoxin aflatoxin, a potent carcinogen, which results in significant crop losses annually. The A. flavus isolate NRRL 3357 was originally isolated from peanut and has been used as a model organism for understanding the regulation and production of secondary metabolites, such as aflatoxin. A draft genome of NRRL 3357 was previously constructed, enabling the development of molecular tools and the study of the population biology of this species. Here, we describe an updated, near-complete, telomere-to-telomere assembly and re-annotation of the eight chromosomes of the A. flavus NRRL 3357 genome, accomplished via long-read PacBio and Oxford Nanopore technologies combined with Illumina short-read sequencing. A total of 13,715 protein-coding genes were predicted. Using RNA-seq data, a significant improvement was achieved in predicted 5' and 3' untranslated regions, which were incorporated into the new gene models.

Keywords: Aspergillus flavus; NRRL 3357; Nanopore; PacBio; genome sequence

Substances:
Aflatoxins

Year: 2021 PMID: 34849826 PMCID: PMC8496237 DOI: 10.1093/g3journal/jkab213

Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154

Introduction

Aspergillus flavus is an opportunistic plant pathogen, human pathogen, and saprophyte. Agriculturally, A. flavus colonizes crops such as maize, peanuts, and cotton, both pre- and post-harvest (Klich 2007). While colonizing these crops, A. flavus produces aflatoxin, which is both toxic to mammals and a potent carcinogen (Williams et al.; Liu and Wu 2010; Wild and Gong 2010), chronically impacting an estimated 4.5 billion people (CDC 2016). In the United States, mycotoxin production causes estimated yearly agricultural losses in corn ranging from tens of millions to over $1 billion (Mitchell et al.). In addition to mycotoxin production, A. flavus is also a leading cause of invasive aspergillosis in humans, as well as the leading cause of fungal sinusitis and keratitis in tropical climates (Krishnan et al. 2009; Pasqualotto 2009; Rudramurthy et al.). The A. flavus isolate NRRL 3357 produces high levels of aflatoxin and has been used as a model for developing molecular tools and for dissecting the regulation and production of aflatoxin (Georgianna and Payne 2009; Amaike and Keller 2011). Previously, the genome of A. flavus NRRL 3357 was sequenced using the whole-genome shotgun method (Nierman et al. 2015) and assembled into 958 contigs comprising 331 scaffolds. To enhance genome-wide functional and population genomics studies in A. flavus, we sought to generate a more complete genome assembly and annotation. Here, we report an updated, near-complete, telomere-to-telomere assembly of the A. flavus strain NRRL 3357 genome, with 8 scaffolds corresponding to the 8 chromosomes of this species. Genome annotation, assisted by publicly available RNA-seq data, yielded greater resolution of 5’ and 3’ UTRs in this organism, with nearly half of genes containing annotated UTRs. We manually curated over 200 previously published genes, verifying that new predicted gene models corresponded with prior gene models and RNA-seq datasets.

Materials and methods

Fungal strain culture and DNA extraction

A. flavus NRRL 3357 was originally isolated from peanut, with our sample originating in the Keller lab strain collection (Hesseltine et al.). Conidia from a week-old culture were grown in glucose minimal medium + 0.5% yeast extract at 30°C overnight. Genomic DNA isolation for genome sequencing of A. flavus was previously reported (Drott et al.). Briefly, powdered lyophilized mycelia were resuspended in LETS buffer (20 mM EDTA pH 8.0, 0.5% SDS, 10 mM Tris-HCl pH 8.0, and 0.1 M LiCl). Genomic DNA (gDNA) was extracted using phenol:chloroform:isoamyl alcohol (25:24:1), followed by ethanol precipitation. gDNA was collected either by spooling or centrifugation, then washed with 70% ethanol and allowed to air dry. gDNA was resuspended in 10 mM Tris-HCl (pH 8.0) + 3.33 μg/mL RNase A and heated at 65°C for 30 minutes.

Sequencing

DNA quality control, library preparation for PacBio and Illumina, and sequencing were performed at the Vincent Coates Genomics Sequencing Laboratory at the University of California, Berkeley. For the PacBio sequencing, DNA quantification and quality control were performed using a Femto Pulse System (Agilent Technologies), and high molecular weight DNA (∼130 kbp average size) was used to construct BluePippin (Sage Science) size-selected (>30 kbp) SMRTbell libraries (PacBio). PacBio libraries were sequenced on the Sequel Platform using the S/P2-C2 polymerase and version 5.0 chemistry on four Sequel SMRT cells, generating a total of ∼1.8 M reads with an average read length of 13,122 bp. For Illumina sequencing, small (∼540 bp) insert libraries were generated using the KAPA DNA HyperPrep kit with a PCR-free protocol (Roche), generating ∼319 M PE150 reads. Oxford Nanopore libraries were generated using ∼15 μg of high molecular weight DNA and the SQK-LSK308 kit. Oxford Nanopore sequencing was performed in-house using the MinION platform and a combination of live base calling using the MinKNOW App (v1.11.5) and offline base calling using the Albacore App (v2.3.3) (Oxford Nanopore). Three FLO-MIN107 flow cells (version 9.5.1 pore chemistry) were used to generate ∼1.4 M reads with an average length of 3,050 bp. The assembled genome sequence has been used to assess population genomics of A. flavus (Drott et al. 2020, 2021).
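The read totals above allow a rough cross-check of long-read depth. The following is a minimal sketch, not the authors' calculation: it computes nominal depth as total sequenced bases divided by genome size, using the approximate read counts and mean lengths reported here. The function names are hypothetical, and the estimate is raw (no read filtering is applied).

```python
def nominal_depth(n_reads: float, mean_read_len_bp: float, genome_size_bp: float) -> float:
    """Nominal depth = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


# Approximate totals taken from the text: (number of reads, mean read length in bp).
PACBIO = (1.8e6, 13_122)
NANOPORE = (1.4e6, 3_050)


def long_read_depth(genome_size_bp: float) -> float:
    """Combined PacBio + Nanopore nominal depth for a given genome size."""
    return sum(
        nominal_depth(n_reads, mean_len, genome_size_bp)
        for n_reads, mean_len in (PACBIO, NANOPORE)
    )


if __name__ == "__main__":
    # Two genome sizes mentioned in the text: the 40 Mbp assembler setting
    # and the 37.75 Mbp final assembly.
    for label, size_bp in (("40 Mbp", 40e6), ("37.75 Mbp", 37.75e6)):
        print(f"genome size {label}: ~{long_read_depth(size_bp):.0f}X")
```

Because the inputs are rounded figures, the result should be read as an order-of-magnitude check rather than an exact coverage value.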

Genome assembly and annotation

The combined PacBio and Oxford Nanopore long-read datasets were used to generate a hybrid de novo assembly using the CANU assembler (v.1.7.1) (Koren et al.) with default settings, except that the genome size was set to 40 Mbp and stopOnReadQuality was set to “false.” The final read depth, after filtering out reads <1 kbp, was ∼700X. The CANU scaffolds were polished using the PacBio Sequel data and a combination of pbalign (v.0.3.1), blasr (v.5.3), and arrow (v.2.2.2) from the SMRT Link package (v.5.1.0.26412, PacBio). A final error correction step was performed using the Illumina data (∼650X coverage) with a combination of bwa (v.0.7.17) (Li 2013), samtools (v.1.9) (Li et al.), and pilon (v.1.22) (Walker et al.). For the most accurate final assembly of the A. flavus genome, Illumina data and at least one round of pilon correction were needed. De novo repeats were discovered using RepeatScout v1.0.5 (Price et al.), and repeats were masked using RepBase (Bao et al. 2015) and RepeatMasker (Smit et al.). Annotation of the genome was performed using the Joint Genome Institute (JGI) Genome Annotation pipeline (Grigoriev et al.) using publicly available RNA-seq data (SRA datasets: SRR2632952, SRR2632961, SRR2632962, SRR2632963, SRR2632966, SRR2633059, SRR2633060, SRR2633061, SRR2633139, SRR5061895, SRR5061899, SRR5061903, SRR5061905, SRR5061908, SRR5061909, SRR544871, SRR544872, SRR544873, SRR8115610, SRR8115611, SRR8115612, SRR8115613, SRR8115614, and SRR8115615). Previously produced gene models available from FungiDB (Stajich et al.) were mapped forward to the new assembly. The final annotation included 13,715 gene models, of which 43.29% represent previously produced models mapped forward from FungiDB (fungidb.org/), while the remaining gene models were updated or improved based on transcriptomics data and the improved assembly.
For predicted short genes (i.e., <200 AA), we evaluated predicted annotations including signal peptides, transmembrane domains, InterPro domain annotations, or support through self-clustering. If any of these types of support were detected, short gene models were retained in our final gene set. In addition, although alternative splice forms were not included in our final gene catalog, RNA-seq based models from tools like COMBEST (Zhou et al.) are available as tracks on the Aspfl2_3 genome browser (https://mycocosm.jgi.doe.gov/Aspfl2_3) and enable reconstruction of alternative splice forms.
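The retention rule for short gene models can be expressed compactly. The sketch below is illustrative only and is not the JGI pipeline code: the `GeneModel` class, its field names, and the example identifiers are hypothetical, and only the short-gene rule described above is modeled (other filters, such as removal of transposable elements, are not).

```python
from dataclasses import dataclass

SHORT_GENE_MAX_AA = 200  # threshold stated in the text (<200 AA)


@dataclass
class GeneModel:
    """Hypothetical container for the evidence types named in the text."""
    gene_id: str
    protein_length_aa: int
    has_signal_peptide: bool = False
    has_transmembrane_domain: bool = False
    has_interpro_domain: bool = False
    has_cluster_support: bool = False  # support through self-clustering


def has_support(model: GeneModel) -> bool:
    """True if any one of the four evidence types is present."""
    return any((
        model.has_signal_peptide,
        model.has_transmembrane_domain,
        model.has_interpro_domain,
        model.has_cluster_support,
    ))


def keep_gene_model(model: GeneModel) -> bool:
    """Apply the short-gene rule; longer models pass this particular check."""
    if model.protein_length_aa >= SHORT_GENE_MAX_AA:
        return True
    return has_support(model)


if __name__ == "__main__":
    models = [
        GeneModel("example_1", 150, has_interpro_domain=True),
        GeneModel("example_2", 120),
        GeneModel("example_3", 450),
    ]
    print([m.gene_id for m in models if keep_gene_model(m)])
```

A single positive evidence flag is sufficient for retention, matching the "any of these types of support" wording.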

Data availability

The whole-genome assembly and annotation and the A. flavus mitochondrial DNA sequence are available from the JGI MycoCosm portal (Grigoriev ) at https://mycocosm.jgi.doe.gov/Aspfl2_3 and have been deposited at GenBank under accession numbers CP044616-CP044623, Bioproject accession number PRJNA575750. Raw sequencing reads have been deposited under SRA project accession number PRJNA637788. Manually curated genes and references are included in Supplementary Table S1. Supplementary material is available at figshare: https://doi.org/10.25387/g3.14738154.

Results and discussion

The final genome assembly of NRRL 3357, generated by combining long-read (PacBio SMRT and Oxford Nanopore) and short-read (Illumina) sequencing, was 37.75 Mbp in 8 contigs, a significant improvement over the previous assembly, which contained 331 scaffolds and had a genome size slightly less than 37 Mbp (Table 1) (Nierman et al. 2015). The industrially relevant species Aspergillus oryzae is closely related to A. flavus, and A. flavus chromosome names are based on the A. oryzae genome (Machida et al. 2005; Figure 1). The final sequencing read depth was 650X, and the average GC content across the genome was 47.34%. Seven of eight chromosomes are represented by complete telomere-to-telomere assemblies. It was not possible to complete the assembly of the right end of chromosome 7 due to the presence of a large rDNA repeat. The new assembly increased in size relative to the original assembly (Nierman et al. 2015) primarily due to the improved assembly of repetitive regions other than the rDNA region, which raised the repeat content from 1.1% to 3.47% of the genome (Table 1), including a significant increase in Mariner Tc1 repeats [43% in the previous assembly (Nierman et al. 2015) compared to 83.18% in the current assembly]. We also identified 15 out of 16 telomeres as well as predicted centromeric regions (Figure 1). Each chromosome was flanked by 10–13 copies of the telomeric repeat “TTAGGGTCAACA”, identical to those identified in A. oryzae (Kusumoto et al.). In filamentous fungi, centromeric regions have high AT content, are ∼100 kbp, and typically lack coding regions (Smith et al.). In the assembled A. flavus genome, predicted centromeric regions were identified using these criteria and were ∼100 kbp, but some chromosomes had additional AT-rich regions surrounding predicted centromeric regions (Figure 1). Verification of functional centromeric regions of the A. flavus genome would require further experimentation, for example, the identification of the specialized histone H3 variant (called CenH3 in Neurospora crassa; Smith et al.), a so-called universal “centromere identifier.” In our new assembly, the mitochondrial DNA was identical to that previously published (Joardar et al. 2012).
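The two sequence features discussed above, terminal telomeric repeats and AT-rich candidate centromeric regions, can be located with simple string scans. The following is a minimal sketch under stated assumptions, not the method used in this work: it counts only exact, in-phase copies of the repeat unit starting at the very first or last base, it accepts either strand orientation at each end, and the window size and AT threshold are hypothetical illustration values.

```python
TELOMERE_REPEAT = "TTAGGGTCAACA"  # repeat unit quoted in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_terminal_repeats(seq: str, unit: str, from_end: bool = False) -> int:
    """Count consecutive exact copies of `unit` at the start (or end) of `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    if from_end:
        # Reversing both strings turns a suffix scan into a prefix scan.
        seq, unit = seq[::-1], unit[::-1]
    copies = 0
    while seq.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


def telomere_copies(seq: str, unit: str = TELOMERE_REPEAT) -> tuple:
    """Return (left, right) copy counts, trying the unit on both strands."""
    candidates = (unit, reverse_complement(unit))
    left = max(count_terminal_repeats(seq, u) for u in candidates)
    right = max(count_terminal_repeats(seq, u, from_end=True) for u in candidates)
    return left, right


def at_rich_windows(seq: str, window: int = 10_000, min_at: float = 0.7):
    """Yield (start, end, AT fraction) for non-overlapping windows at or above min_at."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            yield start, start + window, at_fraction
```

Applied to each chromosome sequence, `telomere_copies` would report how many repeat copies sit at each end, and runs of adjacent windows from `at_rich_windows` that also lack gene models would be candidates for closer inspection. Real contigs may begin mid-repeat, so a production scan would need to tolerate phase offsets and mismatches.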

Table 1

Summary of assembly and annotation statistics of the A. flavus NRRL 3357 genome

Assembly statistics	Illumina assembly (Nierman et al. 2015)^a	Illumina assembly, reannotated (Hatmaker et al. 2020)^b	Mixed sequencing assembly (this work)
Genome size (Mbp)	36.89	36.89	37.75
Coverage	5X	5X	650X
Number of scaffolds/contigs	958/331	958/331	8
L50	6	6	4
N50, Mbp (Scaffold)	2.39	2.39	4.81
Complete chromosomes	—	—	8
% genes by BUSCO assessment^c: Fungi set
Single-copy	92.70	96.30	98.70
Duplicated	0.30	1.70	0.50
Fragmented	3.00	0.90	0.00
Annotation statistics
Number of predicted protein coding genes	13,485	14,313	13,715
Percent gene models with annotated UTRs	9.25%	—	44.43%
Predicted secondary metabolite clusters	56		83^d
Predicted CAZymes	627		644
Total repeat length (bp)	404,315 (1.1%)		1,311,342 (3.47%)

^a Nierman et al. (2015).

^b Hatmaker et al. (2020).

^c Seppey et al. (2019).

^d Drott et al. (2021).
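For readers unfamiliar with the contiguity metrics in Table 1, N50 is the length of the sequence at which the cumulative length of sequences, sorted from longest to shortest, first reaches half of the total assembly size, and L50 is the number of sequences needed to reach that point. A minimal sketch follows; the function name and the toy lengths in the usage example are hypothetical and are not the actual chromosome lengths.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `rank` is the L50.
            return length, rank


if __name__ == "__main__":
    toy_lengths = [8, 6, 5, 4, 3, 2, 1, 1]  # hypothetical values, arbitrary units
    print(n50_l50(toy_lengths))
```

A larger N50 together with a smaller L50, as in the rightmost column of Table 1, indicates that fewer and longer sequences account for half of the assembly.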

Figure 1

Graphic representation of the 8 assembled chromosomes of Aspergillus flavus NRRL 3357. Green traces represent AT content and dark blue traces represent GC content. Teal-colored arrows indicate predicted coding regions^1.

^1 Figure generated using Geneious Prime v.2021.0.3; www.geneiousprime.com.

Our final annotation contained 13,715 predicted protein-coding genes, of which 43.29% represent previously produced models mapped forward from FungiDB (https://fungidb.org/). The remaining gene models were updated and improved based on the improved assembly and on publicly available transcriptomics data (see Materials and methods). This gene model count was slightly lower than that of a recently updated, transcriptome-based annotation of the 331 scaffolds of the A. flavus 2015 assembly (Nierman et al. 2015), which relied on RNA isolated from A. flavus in six conditions and used a combination of different ab initio gene predictors (Hatmaker et al. 2020). A BUSCO v5.0 (Seppey et al. 2019) analysis, calculated using the Fungi dataset, revealed that 98.7% of BUSCO genes were captured in single copy in our new assembly and annotation, vs 96.3% in the Hatmaker et al. and 92.7% in the Nierman et al. versions (Table 1). In addition, fragmented and duplicated genes were less frequent in our version relative to other annotations. The total numbers of gene models in our version (13,715 genes) and in the Nierman et al. annotation (13,485 genes) were slightly lower than the number predicted by Hatmaker et al. (14,313 genes; Table 1). This discrepancy may partially result from our filtering methods, which aimed to remove transposable elements and short, unsupported gene models from the final gene catalog.
We also used publicly available transcriptomic data to improve the annotation of 5’ and 3’ UTRs on our filtered gene models, which were absent or rarely predicted in other models. We identified UTRs for 44.43% of our models (6,093 genes), a significant increase from the 9.25% of models with UTRs in previous studies.

To validate updated annotation methods using the mapped-forward gene models, we chose a set of 172 genes that had been described in the literature to manually curate and to compare our new JGI models with FungiDB gene models (Supplementary Table S1). Within this set, we used publicly available RNA-seq data to provide 5’ and 3’ untranslated region (UTR) information and homology modeling to confirm gene models. In 68/172 genes, the models did not change, while for an additional 47 genes, the gene predictions were 100% identical, but UTRs were added (Supplementary Table S1). In 57 cases, the existing FungiDB gene model was corrected using newly available RNA-seq and homology evidence. For a few gene models, it was difficult to define the correct model, primarily due to the absence of RNA-seq data or differences in predicted gene models when orthologs in other species were evaluated. In the MycoCosm portal (https://mycocosm.jgi.doe.gov/Aspfl2_3/Aspfl2_3.home.html), we provide a link to the closest FungiDB AFLA model for users to compare the new models with the previous annotation. Overall, our mapped-forward gene models, using the complete genome assembly and RNA-seq data, provided more complete gene models for A. flavus, with significantly more 5’ and 3’ UTR information compared to previous versions.

As A. flavus is a plant pathogen and saprophyte, carbohydrate active enzymes (CAZymes) are important for this fungus’s ability to grow on plant material. With our updated annotation, we observed a slight increase in the number of predicted CAZymes, from 627 to 644, as predicted by the CAZy Database (Lombard et al.).

A. flavus also produces a wealth of bioactive secondary metabolites from biosynthetic gene clusters (BGCs); the function and structure of these secondary metabolites is an active area of research with this organism (Greco et al. 2019; Keller 2019). Through prior studies, 56 secondary metabolite clusters have been identified in the A. flavus genome (Georgianna et al. 2010; Marui et al.; Amare and Keller 2014; Nierman et al. 2015). Using this current A. flavus genome assembly for NRRL 3357, coupled with assessing BGC diversity in A. flavus populations, 83 BGCs were identified in the A. flavus NRRL 3357 genome (Drott et al. 2021). The genomic positions for these 83 BGCs in our assembled A. flavus NRRL 3357 genome are available as supplemental data from Drott et al. (2021).

In summary, we provide an improved, near-complete, telomere-to-telomere genome assembly for A. flavus NRRL 3357, resulting from combined Illumina, Oxford Nanopore, and PacBio SMRT sequencing data. This updated assembly has been useful for assessing population genomics of 94 isolates of A. flavus isolated from different geographic locations in the United States (Drott et al. 2020) and BGC diversity (Drott et al. 2021). The genome assembly consists of 8 contigs representing 8 chromosomes, with 15 of 16 telomeric repeats and all centromeric sequences identified and assembled. The total genome assembly size is 37.75 Mbp, and the updated annotation, supported by RNA-seq and homology data, yielded 13,715 predicted protein-coding gene models.
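Several of the counts and percentages reported in this section can be cross-checked with simple arithmetic. The sketch below is a reader-side consistency check rather than part of the analysis: the variable and function names are hypothetical, the numbers are copied from the text and Table 1, and the "missing" BUSCO fraction is derived on the assumption that the single-copy, duplicated, fragmented, and missing categories sum to 100%.

```python
TOTAL_GENES = 13_715
GENES_WITH_UTRS = 6_093

# Manual curation outcomes for the literature-described gene set.
CURATION = {"unchanged": 68, "utrs_added_only": 47, "model_corrected": 57}

# BUSCO percentages from Table 1: (single-copy, duplicated, fragmented).
BUSCO = {
    "Nierman et al. 2015": (92.7, 0.3, 3.0),
    "Hatmaker et al. 2020": (96.3, 1.7, 0.9),
    "this work": (98.7, 0.5, 0.0),
}


def percent(part: float, whole: float) -> float:
    return round(100 * part / whole, 2)


def busco_missing(single: float, duplicated: float, fragmented: float) -> float:
    """Derived value: whatever is not single-copy, duplicated, or fragmented."""
    return round(100 - single - duplicated - fragmented, 2)


if __name__ == "__main__":
    print("curated genes:", sum(CURATION.values()))
    print("models with UTRs (%):", percent(GENES_WITH_UTRS, TOTAL_GENES))
    for name, values in BUSCO.items():
        print(f"{name}: derived missing BUSCOs (%) = {busco_missing(*values)}")
```

The three curation categories sum to the 172 curated genes, and 6,093 of 13,715 models corresponds to the stated 44.43%.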

Funding

Funding for this project was provided by a grant to N.L.G., J.M.S., and A.P.A. through the Innovative Genomics Institute at University of California, Berkeley. Genome annotation performed by the Joint Genome Institute, a Department of Energy (DOE) Office of Science User Facility, was supported by the Office of Science of the US DOE under Contract no. DE-AC02-05CH11231.

Conflicts of interest

None declared.

References (35 in total)

1. Repbase Update, a database of repetitive elements in eukaryotic genomes.

Authors: Weidong Bao; Kenji K Kojima; Oleksiy Kohany
Journal: Mob DNA Date: 2015-06-02

2. BUSCO: Assessing Genome Assembly and Annotation Completeness.

Authors: Mathieu Seppey; Mosè Manni; Evgeny M Zdobnov
Journal: Methods Mol Biol Date: 2019

3. Beyond aflatoxin: four distinct expression patterns and functional roles associated with Aspergillus flavus secondary metabolism gene clusters.

Authors: D Ryan Georgianna; Natalie D Fedorova; James L Burroughs; Andrea L Dolezal; Jin Woo Bok; Sigal Horowitz-Brown; Charles P Woloshuk; Jiujiang Yu; Nancy P Keller; Gary A Payne
Journal: Mol Plant Pathol Date: 2010-03 Impact factor: 5.663

4. Unearthing fungal chemodiversity and prospects for drug discovery. (Review)

Authors: Claudio Greco; Nancy P Keller; Antonis Rokas
Journal: Curr Opin Microbiol Date: 2019-05-06 Impact factor: 7.934

5. Aspergillus flavus: an emerging non-fumigatus Aspergillus species of significance. (Review)

Authors: Suganthini Krishnan; Elias K Manavathu; Pranatharthi H Chandrasekar
Journal: Mycoses Date: 2009-01-14 Impact factor: 4.377

6. Genetic regulation of aflatoxin biosynthesis: from gene to genome. (Review)

Authors: D Ryan Georgianna; Gary A Payne
Journal: Fungal Genet Biol Date: 2008-11-05 Impact factor: 3.495

7. Genome sequencing and analysis of Aspergillus oryzae.

Authors: Masayuki Machida; Kiyoshi Asai; Motoaki Sano; Toshihiro Tanaka; Toshitaka Kumagai; Goro Terai; Ken-Ichi Kusumoto; Toshihide Arima; Osamu Akita; Yutaka Kashiwagi; Keietsu Abe; Katsuya Gomi; Hiroyuki Horiuchi; Katsuhiko Kitamoto; Tetsuo Kobayashi; Michio Takeuchi; David W Denning; James E Galagan; William C Nierman; Jiujiang Yu; David B Archer; Joan W Bennett; Deepak Bhatnagar; Thomas E Cleveland; Natalie D Fedorova; Osamu Gotoh; Hiroshi Horikawa; Akira Hosoyama; Masayuki Ichinomiya; Rie Igarashi; Kazuhiro Iwashita; Praveen Rao Juvvadi; Masashi Kato; Yumiko Kato; Taishin Kin; Akira Kokubun; Hiroshi Maeda; Noriko Maeyama; Jun-ichi Maruyama; Hideki Nagasaki; Tasuku Nakajima; Ken Oda; Kinya Okada; Ian Paulsen; Kazutoshi Sakamoto; Toshihiko Sawano; Mikio Takahashi; Kumiko Takase; Yasunobu Terabayashi; Jennifer R Wortman; Osamu Yamada; Youhei Yamagata; Hideharu Anazawa; Yoji Hata; Yoshinao Koide; Takashi Komori; Yasuji Koyama; Toshitaka Minetoki; Sivasundaram Suharnan; Akimitsu Tanaka; Katsumi Isono; Satoru Kuhara; Naotake Ogasawara; Hisashi Kikuchi
Journal: Nature Date: 2005-12-22 Impact factor: 49.962

8. Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

Authors: Vinita Joardar; Natalie F Abrams; Jessica Hostetler; Paul J Paukstelis; Suchitra Pakala; Suman B Pakala; Nikhat Zafar; Olukemi O Abolude; Gary Payne; Alex Andrianopoulos; David W Denning; William C Nierman
Journal: BMC Genomics Date: 2012-12-12 Impact factor: 3.969

9. Genome Sequence of Aspergillus flavus NRRL 3357, a Strain That Causes Aflatoxin Contamination of Food and Feed.

Authors: William C Nierman; Jiujiang Yu; Natalie D Fedorova-Abrams; Liliana Losada; Thomas E Cleveland; Deepak Bhatnagar; Joan W Bennett; Ralph Dean; Gary A Payne
Journal: Genome Announc Date: 2015-04-16

10. The Frequency of Sex: Population Genomics Reveals Differences in Recombination and Population Structure of the Aflatoxin-Producing Fungus Aspergillus flavus.

Authors: Milton T Drott; Tatum R Satterlee; Jeffrey M Skerker; Brandon T Pfannenstiel; N Louise Glass; Nancy P Keller; Michael G Milgroom
Journal: mBio Date: 2020-07-14 Impact factor: 7.867