
Draft genome sequence data of Cercospora kikuchii, a causal agent of Cercospora leaf blight and purple seed stain of soybeans.

Francisco J Sautua¹, Sergio A Gonzalez², Vinson P Doyle³, Marcelo F Berretta⁴,⁵, Manuela Gordó⁶, Mercedes M Scandiani⁷, Maximo L Rivarola²,⁴, Paula Fernandez²,⁴,⁸, Marcelo A Carmona¹.

Abstract

Cercospora kikuchii (Tak. Matsumoto & Tomoy.) M.W. Gardner 1927 is an ascomycete fungal pathogen that causes Cercospora leaf blight and purple seed stain of soybean. Here, we report the first draft genome sequence and assembly of this pathogen. C. kikuchii strain ARG_18_001 was isolated from soybean seed showing purple seed stain, collected in San Pedro, Buenos Aires, Argentina, during the 2018 harvest. The genome was sequenced with 2 × 150 bp paired-end reads on an Illumina NovaSeq 6000. Protein-coding genes were predicted using FunGAP (Fungal Genome Annotation Pipeline). The draft genome assembly was 33.1 Mb in size, with a GC content of 53%. Gene prediction yielded 14,856 gene models, of which 14,721 are protein-coding genes. The genomic data of C. kikuchii presented here will be a useful resource for future studies of this pathosystem. The data can be accessed at GenBank under accession number VTAY00000000 (https://www.ncbi.nlm.nih.gov/nuccore/VTAY00000000).
© 2019 The Authors.


Keywords:  Agriculture; Bioinformatics; Cercospora kikuchii; Cercospora leaf blight (CLB); Draft genome; Fungal pathogens; Next generation sequencing (NGS); Purple seed stain (PSS)

Year:  2019        PMID: 31720340      PMCID: PMC6838444          DOI: 10.1016/j.dib.2019.104693

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409



Data

We present the draft genome assembly and gene prediction of the fungus C. kikuchii, a causal agent of Cercospora leaf blight (CLB) and purple seed stain (PSS) of soybean. Recent multi-locus phylogenetic studies confirmed that CLB and PSS constitute a disease complex caused by several Cercospora species. Phylogenetic analyses of cercosporoid fungi isolated from infected soybean in Argentina, Brazil and the USA showed that C. kikuchii, C. cf. flagellaris and C. cf. sigesbeckiae are causal agents of these diseases [1,2]. More recently, C. cf. nicotianae, isolated from soybean leaves in Bolivia, was identified as a species associated with CLB [3]. A maximum-likelihood phylogenetic tree of Cercospora species was inferred in RAxML from seven nuclear loci, with the sequences for isolate ARG_18_001 extracted from the genome assembly. Strain ARG_18_001 nested within the clade containing other C. kikuchii isolates, including the ex-type, with 97% bootstrap support (Fig. 1).
Fig. 1

Subtree from a maximum-likelihood phylogenetic analysis of Cercospora species. The complete phylogeny was inferred in RAxML assuming the GTRGAMMA model by integrating data sliced from the genome of ARG_18_001 with the following seven loci from 379 other isolates in Groenewald et al. (2013) and Bakhshi et al. (2018): actA, cmdA, nrITS, gapdh, histone 3, tef1-alpha, and tub2. The subtree that includes ARG_18_001 was pruned from the rest of the tree for ease of reference. Branches are labeled with bootstrap support values ≥ 70%. Bold font indicates the placement of the ex-type of C. kikuchii. Arrow indicates the placement of the isolate for which the genome was sequenced. The scale bar indicates the estimated number of substitutions per site.

A total of 33,107,531 reads were assembled de novo, resulting in 136 scaffolds of at least 500 bp, with the largest scaffold at 3,211,885 bp and an N50 of 898,622 bp. The mean coverage of the total assembly was 196.72-fold, and the G + C content was 53.04%. Gene prediction, which included the mitochondrial genome, yielded 14,856 gene models: 14,721 protein-coding genes and 135 non-coding RNAs (Table 1). The distribution of protein annotations is summarized in Table 2, and Table 3 provides summary statistics for the identified repetitive elements. The distribution of gene ontology (GO) terms among the annotated C. kikuchii ARG_18_001 genes is shown in Fig. 2, and the species distribution of the top BLAST hits of the predicted protein-coding genes is shown in Fig. 3.
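The N50 and GC values quoted above follow standard definitions: N50 is the length L such that scaffolds of length L or longer together cover at least half of the assembly. A minimal Python sketch of both statistics is shown below, assuming scaffold lengths and sequences are already available as plain lists (parsing the assembly FASTA is not shown). It is not the authors' pipeline; in particular, the article does not say whether ambiguous bases (N) were excluded when computing the reported 53.04%, and excluding them here is an assumption.

```python
def n50(lengths: list[int]) -> int:
    """N50: the length L such that scaffolds of length >= L cover at least
    half of the total assembled length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest scaffolds first
        running += length
        if 2 * running >= total:  # cumulative length has reached 50%
            return length
    raise ValueError("unreachable for non-empty input")


def gc_percent(sequences: list[str]) -> float:
    """GC content (%) over unambiguous A/C/G/T bases.
    Assumption: N and other IUPAC ambiguity codes are left out of the denominator."""
    gc = acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / acgt


# Toy inputs (hypothetical, not the ARG_18_001 scaffolds)
print(n50([2, 3, 4, 5, 6]))                 # 5
print(gc_percent(["GGCCAATT", "NNNN"]))     # 50.0
```

Sorting once and walking the cumulative sum keeps the N50 calculation O(n log n), which is negligible for an assembly of 136 scaffolds.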
Table 1

Genome features of C. kikuchii strain ARG_18_001.

Feature | C. kikuchii ARG_18_001
Assembled length (bp) | 33,197,932
Combined length of scaffolds ≥ 50,000 bp (bp) | 32,541,287
Number of scaffolds (> 500 bp) | 136
Number of scaffolds (> 1 kb) | 107
Number of scaffolds (> 50 kb) | 71
Sequencing read coverage depth (fold) | 196.72
GC content (%) | 53.04
No. of predicted protein-coding genes | 14,721
Gene density (genes/Mb) | 447.5
Average transcript length (bp) | 1468.7
Average CDS length (bp) | 1354.2
Average protein length (aa) | 451.4
Average exon length (bp) | 568.6
Average intron length (bp) | 82.9
Spliced genes | 9702 (66.0%)
Total number of introns | 20,309
Median number of introns per gene | 2.0
Total number of exons | 35,010
Median number of exons per gene | 2.0
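Two of the derived rows in Table 1 can be reproduced from other reported values, which is a quick consistency check when reusing the table. The Python sketch below (function names are illustrative) divides gene counts by the assembled length in megabases and converts the average CDS length to codons. Only the inputs come from this article; the arithmetic shows that the reported density of 447.5 genes/Mb corresponds to the 14,856 gene models (protein-coding genes plus non-coding RNAs) rather than to the 14,721 protein-coding genes.

```python
# Inputs copied from Table 1 and the text (C. kikuchii ARG_18_001).
ASSEMBLED_BP = 33_197_932
GENE_MODELS = 14_856       # 14,721 protein-coding genes + 135 non-coding RNAs
PROTEIN_CODING = 14_721
AVG_CDS_BP = 1354.2


def genes_per_mb(n_genes: int, assembled_bp: int) -> float:
    """Gene density: genes per megabase of assembled sequence."""
    return n_genes / (assembled_bp / 1_000_000)


def cds_bp_to_aa(cds_bp: float) -> float:
    """Protein length implied by a CDS length (3 nt per codon)."""
    return cds_bp / 3


print(round(genes_per_mb(GENE_MODELS, ASSEMBLED_BP), 1))     # 447.5 (as reported)
print(round(genes_per_mb(PROTEIN_CODING, ASSEMBLED_BP), 1))  # 443.4
print(round(cds_bp_to_aa(AVG_CDS_BP), 1))                    # 451.4 (as reported)
```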
Table 2

Genome annotation summary of C. kikuchii strain ARG_18_001.

Summary | Number
Number of protein-coding gene models | 14,721
Number of models with BLAST hit | 13,015 (88.4%)
Blast2GO annotation | 6296 (42.8%)
Pfam annotation | 5684 (38.6%)
Table 3

Summary of repetitive elements in the assembled genome of C. kikuchii strain ARG_18_001.

Summary | Number
Total bases masked | 178,815 (0.54%)
Number of simple repeats | 3131
Number of low-complexity repeats | 358
Number of DNA transposons | 68
Number of LTR elements | 2
Number of LINEs | 254
Number of SINEs | 21
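The percentages in Tables 2 and 3 can be recomputed in the same way. Assuming that the Table 2 percentages are relative to the 14,721 protein-coding gene models and that the masked fraction in Table 3 is relative to the 33,197,932 bp assembly (the article does not state the denominators, but these choices reproduce the reported values), a short Python check is:

```python
PROTEIN_CODING_MODELS = 14_721   # assumed denominator for Table 2
ASSEMBLED_BP = 33_197_932        # assumed denominator for Table 3


def percent(part: int, whole: int, ndigits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded for display."""
    return round(100 * part / whole, ndigits)


annotated = {"BLAST hit": 13_015, "Blast2GO": 6_296, "Pfam": 5_684}
for source, n in annotated.items():
    print(f"{source}: {n:,} ({percent(n, PROTEIN_CODING_MODELS)}%)")
print(f"Masked: {percent(178_815, ASSEMBLED_BP, 2)}%")
```

Run as written, this prints 88.4%, 42.8% and 38.6% for the three annotation sources and 0.54% for the masked bases, matching the tables.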
Fig. 2

Histogram representing the gene ontology distribution of the annotated Cercospora kikuchii ARG_18_001 genes. The functionally annotated genes were assigned to three main GO categories: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC).
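The Methods state that GO terms were assigned with Blast2GO and with the pfam2go mapping table via the ragp R package. To illustrate only the Pfam-to-GO step, the Python sketch below parses lines in the pfam2go layout (`Pfam:<accession> <name> > GO:<term name> ; GO:<id>`), maps each gene's Pfam hits to GO IDs, and counts how many genes receive at least one term in each category. The gene-to-Pfam dictionary, the GO-ID-to-category lookup (which in practice would come from the GO ontology file), the example lines and all function names are hypothetical; this is not the ragp implementation, and counting genes per category is only one possible way to summarize the data behind a histogram like Fig. 2.

```python
from collections import Counter, defaultdict


def parse_pfam2go(lines):
    """Return {Pfam accession: set of GO IDs} from pfam2go-formatted lines."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks header/comment lines
            continue
        left, _, go_part = line.partition(" > ")
        pfam_acc = left.split()[0].split(":", 1)[1]   # "Pfam:PF00001" -> "PF00001"
        go_id = go_part.rsplit(";", 1)[-1].strip()    # "... ; GO:0004930" -> "GO:0004930"
        mapping[pfam_acc].add(go_id)
    return dict(mapping)


def genes_per_go_category(gene_pfams, pfam2go, go_category):
    """Count genes with at least one GO term in each category (e.g. BP, MF, CC)."""
    counts = Counter()
    for pfams in gene_pfams.values():
        categories = {
            go_category[go_id]
            for pfam in pfams
            # hmmsearch reports versioned accessions (e.g. PF00001.23); pfam2go does not
            for go_id in pfam2go.get(pfam.split(".")[0], ())
            if go_id in go_category
        }
        counts.update(categories)
    return counts


# Illustrative inputs (hypothetical genes; lines written in the pfam2go layout)
example_lines = [
    "!illustrative header",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186",
]
pfam2go = parse_pfam2go(example_lines)
gene_pfams = {"gene_1": {"PF00001.23"}, "gene_2": set()}
go_category = {"GO:0004930": "MF", "GO:0007186": "BP"}
print(genes_per_go_category(gene_pfams, pfam2go, go_category))
```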

Fig. 3

Pie chart denoting the species distribution based on the top BLAST hit of the Cercospora kikuchii ARG_18_001 genes queried against the nr database with an E-value cut-off of 1E-10. The category “Others” includes species with less than 1% representation.
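The species distribution in Fig. 3 depends on how the top hit is chosen for each gene. The minimal Python sketch below assumes that BLASTP hits have already been parsed into (query, subject species, E-value, bit score) tuples and that the top hit is the one with the highest bit score among hits passing the 1E-10 cut-off, with ties broken by the lower E-value; the article does not state its tie-breaking rule, so that part is an assumption. Species accounting for less than 1% of the queries with a retained hit are pooled into "Others", as in the figure. Function names and the toy data are hypothetical.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-10  # cut-off stated in the Fig. 3 caption


def top_hit_species(hits, cutoff=E_VALUE_CUTOFF):
    """hits: iterable of (query_id, species, evalue, bitscore) tuples.
    Returns {query_id: species of its best hit}; queries without a hit
    passing the cut-off are omitted.
    Assumption: best = highest bit score, ties broken by lower E-value."""
    best = {}
    for query, species, evalue, bitscore in hits:
        if evalue > cutoff:
            continue
        key = (bitscore, -evalue)  # larger tuple = better hit
        if query not in best or key > best[query][0]:
            best[query] = (key, species)
    return {query: species for query, (_, species) in best.items()}


def species_distribution(top_hits, min_fraction=0.01):
    """Count queries per top-hit species, pooling species below `min_fraction` into 'Others'."""
    counts = Counter(top_hits.values())
    total = sum(counts.values())
    pooled = Counter()
    for species, n in counts.items():
        pooled[species if n / total >= min_fraction else "Others"] += n
    return pooled


# Toy usage (hypothetical hits)
hits = [
    ("g1", "Species A", 1e-50, 300.0),
    ("g1", "Species B", 1e-40, 250.0),
    ("g2", "Species B", 1e-12, 80.0),
    ("g3", "Species C", 1e-5, 60.0),   # fails the 1E-10 cut-off
]
print(species_distribution(top_hit_species(hits)))
```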


Experimental design, materials, and methods

Genomic DNA extraction and sequencing

Cercospora kikuchii strain ARG_18_001 was isolated from a single conidium obtained from seeds of soybean variety DM62R63 that showed purple seed stain symptoms; the seeds were sampled during the 2018 harvest in San Pedro, Buenos Aires, Argentina. The isolation technique is described in [4]. The strain was deposited in the fungal culture collection of the Department of Plant Pathology, School of Agriculture, University of Buenos Aires (FAUBA, Argentina). Genomic DNA was isolated from hyphal tissue grown in potato dextrose broth for four days in darkness under constant agitation. DNA extraction was carried out at the Institute of Microbiology and Agricultural Zoology (IMYZA-INTA) using a modified version of the cetyltrimethylammonium bromide (CTAB) extraction protocol developed in [5]. Total DNA was quantified by fluorometry using a Quant-iT PicoGreen dsDNA kit (Invitrogen, Life Technologies, CA, USA) on a Victor 3 plate reader. Paired-end whole-genome shotgun libraries (350 bp insert size) were constructed with the TruSeq Nano DNA library preparation kit following Illumina (San Diego, CA) protocols. Sequencing was performed on a NovaSeq 6000 system (Illumina) and yielded 65,202,278 reads.

Phylogenetic species identification

Isolate ARG_18_001 was identified by aligning seven nuclear loci extracted from its genome assembly (actin (actA), calmodulin (cmdA), the nuclear ribosomal internal transcribed spacer region (nrITS), glyceraldehyde-3-phosphate dehydrogenase (gapdh), histone H3 (his3), translation elongation factor 1-alpha (tef1-alpha) and beta-tubulin (tub2)) with reference data from [6,7]. A maximum-likelihood phylogeny was then inferred in RAxML (Randomized Axelerated Maximum Likelihood) [8] under the GTRGAMMA model, with Septoria provencialis CPC_12226 as the outgroup.
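Multi-locus trees such as the one in Fig. 1 are commonly built from a concatenated alignment (supermatrix) with one partition per locus. The article does not say whether the seven loci were partitioned, so the Python sketch below is one plausible way to prepare such input, not the authors' procedure. It assumes each locus has already been aligned, pads loci missing for a taxon with gaps, and produces a relaxed-PHYLIP alignment plus partition lines in the `DNA, name = start-end` format that RAxML 8 reads through its `-q` option. Locus names are kept free of spaces (e.g. `his3`) to be safe as partition names; all function names are hypothetical.

```python
def concatenate_loci(loci):
    """loci: {locus_name: {taxon: aligned_sequence}} (one alignment per locus).
    Returns (supermatrix {taxon: sequence}, RAxML-style partition lines)."""
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive
    for locus, aln in loci.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{locus}: sequences differ in length (not aligned?)")
        width = widths.pop()
        for taxon in taxa:
            # Taxa lacking this locus are padded with gaps (missing data).
            pieces[taxon].append(aln.get(taxon, "-" * width))
        end = start + width - 1
        partitions.append(f"DNA, {locus} = {start}-{end}")
        start = end + 1
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


def to_relaxed_phylip(matrix):
    """Relaxed PHYLIP: header 'ntaxa nchar', then 'name sequence' per line."""
    nchar = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {nchar}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"


# Toy usage with two short, hypothetical loci
matrix, partitions = concatenate_loci({
    "actA": {"taxon_A": "ACGT", "taxon_B": "ACGA"},
    "tub2": {"taxon_A": "GG-T", "taxon_C": "GGAT"},
})
print(to_relaxed_phylip(matrix))
print("\n".join(partitions))
```

The two outputs could then be written to files and passed to RAxML 8, for example with `-s` (alignment), `-q` (partition file), `-m GTRGAMMA` and `-o` (outgroup taxon name).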

Genome assembly and annotation

Reads were trimmed and filtered using Trimmomatic [9], and overlapping paired-end reads from short fragments were merged using FLASH [10]. De novo assembly was carried out with the Celera Assembler [11] and then completed with SPAdes [12] using odd k-mer sizes from 21 to 111 in steps of 2. The genome was annotated using FunGAP [13], tRNAscan-SE [14], RNAmmer [15] and MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl) [16]. For gene prediction with FunGAP, the C. kikuchii ARG_18_001 genome assembly and RNA-seq reads from C. beticola 10.73.4 (BioProject PRJNA294383) were used as inputs. For functional annotation, we used hmmsearch [17] against the Pfam database (v32.0) (E-value cut-off ≤ 1e-5) and BLASTP [18] against the NCBI nr database (E-value cut-off ≤ 1e-10). Gene Ontology [19] terms were assigned using Blast2GO [20] and the pfam2go mapping table (http://www.geneontology.org/external2go/pfam2go) with the ragp R package (https://rdrr.io/github/missuse/ragp/). Repetitive regions, including tandem repeats and transposable elements, were identified with RepeatMasker [21].
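SPAdes takes its k-mer sizes through the `-k` option as a comma-separated list of odd values below 128. The range given above (21 to 111 in steps of 2) expands to 46 values; the Python helper below builds that argument string and rejects sizes SPAdes would not accept. The helper name is hypothetical, and the rest of the authors' SPAdes command line is not given in the article.

```python
def spades_kmer_arg(start: int = 21, stop: int = 111, step: int = 2) -> str:
    """Comma-separated k-mer list for SPAdes' -k option (inclusive of `stop`)."""
    kmers = list(range(start, stop + 1, step))  # ascending for a positive step
    if not kmers:
        raise ValueError("empty k-mer range")
    # SPAdes requires odd k-mer sizes below 128.
    if any(k % 2 == 0 or k >= 128 for k in kmers):
        raise ValueError("k-mer sizes must be odd and < 128")
    return ",".join(str(k) for k in kmers)


kmer_arg = spades_kmer_arg()
print(len(kmer_arg.split(",")))             # 46
print(kmer_arg[:14], "...", kmer_arg[-7:])  # 21,23,25,27,29 ... 109,111
```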

Specifications table

Subject | Biology
Specific subject area | Bioinformatics (Genomics)
Type of data | Raw sequencing reads, draft genome assembly, gene prediction and phylogenetic position of C. kikuchii strain ARG_18_001
How data were acquired | Whole-genome sequencing on an Illumina NovaSeq 6000 sequencing system
Data format | Raw sequencing reads, draft genome assembly and gene prediction
Parameters for data collection | Reads were trimmed and filtered with Trimmomatic (v0.39) and merged with FLASH (v1.2.11). The genome was assembled with Celera Assembler (v8.3) and SPAdes (v3.11.1). Gene prediction was performed with FunGAP (v1.0.1), tRNAscan-SE (v2.0.3), RNAmmer (v1.2) and MFannot (v1.35). Protein-coding genes were functionally annotated with hmmsearch (v3.1b2), NCBI BLAST+ (v2.2.25+) and Blast2GO (v2.5), together with the ragp R package (v0.3.0.0001). RepeatMasker (v4.0.9) was used to identify and mask repetitive regions.
Description of data collection | Strain ARG_18_001 was isolated from seeds of soybean variety DM62R63 that showed purple seed stain symptoms, sampled during the 2018 harvest.
Data source location | Samples were originally collected in Gobernador Castro, San Pedro, Buenos Aires, Argentina (33°39′26.37″S, 59°49′36.00″W)
Data accessibility | This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under accession VTAY00000000 (https://www.ncbi.nlm.nih.gov/nuccore/VTAY00000000). The version described in this paper is VTAY00000000.1 (https://www.ncbi.nlm.nih.gov/nuccore/VTAY00000000).
Value of the Data

This is the first draft genome of Cercospora kikuchii (strain ARG_18_001).

C. kikuchii is an important pathogen of soybean, but the biology of this fungus is poorly understood.

The genomic data presented here will be a useful resource for studying this pathosystem.

This draft genome will help in the search for genetic resistance to this pathogen in soybean lines.

References (10 of 18 listed)

M Ashburner, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, K Dolinski, S S Dwight, J T Eppig, M A Harris, D P Hill, L Issel-Tarver, A Kasarskis, S Lewis, J C Matese, J E Richardson, M Ringwald, G M Rubin, G Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000-05.

Tanja Magoč, Steven L Salzberg. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011-09-07.

Patricia P Chan, Todd M Lowe. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods Mol Biol. 2019.

Byoungnam Min, Igor V Grigoriev, In-Geol Choi. FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation. Bioinformatics. 2017-09-15.

Sean R Eddy. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011-10-20.

Alexandros Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014-01-21.

Mounes Bakhshi, Mahdi Arzanlou, Asadollah Babai-Ahari, Johannes Z Groenewald, Pedro W Crous. Novel primers improve species delimitation in Cercospora. IMA Fungus. 2018-09-26.

J Z Groenewald, C Nakashima, J Nishikawa, H-D Shin, J-H Park, A N Jama, M Groenewald, U Braun, P W Crous. Species concepts in Cercospora: spotting the weeds among the roses. Stud Mycol. 2013-06-30.

Ana Paula Gomes Soares, Eduardo A Guillin, Leandro Luiz Borges, Amanda C T da Silva, Álvaro M R de Almeida, Pablo E Grijalba, Alexandra M Gottlieb, Burton H Bluhm, Luiz Orlando de Oliveira. More Cercospora Species Infect Soybeans across the Americas than Meets the Eye. PLoS One. 2015-08-07.

Anthony M Bolger, Marc Lohse, Bjoern Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014-04-01.