Francisco J Sautua, Sergio A Gonzalez, Vinson P Doyle, Marcelo F Berretta, Manuela Gordó, Mercedes M Scandiani, Maximo L Rivarola, Paula Fernandez, Marcelo A Carmona.
Abstract
Cercospora kikuchii (Tak. Matsumoto & Tomoy.) M.W. Gardner 1927 is an ascomycete fungal pathogen that causes Cercospora leaf blight and purple seed stain on soybean. Here, we report the first draft genome sequence and assembly of this pathogen. The C. kikuchii strain ARG_18_001 was isolated from purple-stained soybean seed collected in San Pedro, Buenos Aires, Argentina, during the 2018 harvest. The genome was sequenced with 2 × 150 bp paired-end reads on an Illumina NovaSeq 6000. Protein-coding genes were predicted using FunGAP (Fungal Genome Annotation Pipeline). The draft genome assembly is 33.1 Mb in size with a GC content of 53%. Gene prediction yielded 14,856 gene models and 14,721 protein-coding genes. The genomic data presented here will be a useful resource for future studies of this pathosystem. The data can be accessed at GenBank under accession number VTAY00000000 (https://www.ncbi.nlm.nih.gov/nuccore/VTAY00000000).
Keywords: Agriculture; Bioinformatics; Cercospora kikuchii; Cercospora leaf blight (CLB); Draft genome; Fungal pathogens; Next generation sequencing (NGS); Purple seed stain (PSS)
Year: 2019 PMID: 31720340 PMCID: PMC6838444 DOI: 10.1016/j.dib.2019.104693
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Subtree from a maximum-likelihood phylogenetic analysis of Cercospora species. The complete phylogeny was inferred in RAxML assuming the GTRGAMMA model by integrating data sliced from the genome of ARG_18_001 with the following seven loci from 379 other isolates in Groenewald et al. (2013) and Bakhshi et al. (2018): actA, cmdA, nrITS, gapdh, histone 3, tef1-alpha, and tub2. The subtree that includes ARG_18_001 was pruned from the rest of the tree for ease of reference. Branches are labeled with bootstrap support values ≥ 70%. Bold font indicates the placement of the ex-type of C. kikuchii. The arrow indicates the placement of the isolate for which the genome was sequenced. The scale bar indicates the estimated number of substitutions per site.
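As a rough illustration of the multi-locus approach described in the caption, the sketch below joins per-locus alignments into one concatenated matrix of the kind a maximum-likelihood program takes as input. The source names the seven loci and the RAxML GTRGAMMA model but does not describe how the alignment was built, so the function name `concatenate_loci`, the taxon label `isolate_A`, and the toy sequences are all hypothetical.

```python
from typing import Dict, List

# Locus order taken from the figure caption.
LOCI: List[str] = ["actA", "cmdA", "nrITS", "gapdh", "histone 3", "tef1-alpha", "tub2"]


def concatenate_loci(alignments: Dict[str, Dict[str, str]], loci: List[str]) -> Dict[str, str]:
    """Join per-locus alignments end to end, padding missing taxa with gaps.

    `alignments` maps locus name -> {taxon: aligned sequence}. All sequences
    within one locus must already be aligned to the same length.
    """
    taxa = sorted({taxon for locus in loci for taxon in alignments.get(locus, {})})
    supermatrix = {taxon: "" for taxon in taxa}
    for locus in loci:
        seqs = alignments.get(locus, {})
        if not seqs:
            continue  # locus absent from this (toy) data set
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus!r} is not aligned to a single length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon without data for this locus gets a run of gap characters.
            supermatrix[taxon] += seqs.get(taxon, "-" * width)
    return supermatrix


# Toy data: hypothetical sequences for two of the seven loci.
toy = {
    "actA": {"ARG_18_001": "ACGT", "isolate_A": "ACGA"},
    "tub2": {"ARG_18_001": "GGC"},
}
matrix = concatenate_loci(toy, LOCI)
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Padding with gaps keeps every row the same length, which is what lets isolates with incomplete locus sampling stay in a single analysis.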
Genome features of C. kikuchii strain ARG_18_001.
| Feature | Value |
|---|---|
| Assembled length (bp) | 33,197,932 |
| Scaffold length (≥ 50,000 bp) | 32,541,287 |
| Number of scaffolds (>500 bp) | 136 |
| Number of scaffolds (>1 kb) | 107 |
| Number of scaffolds (>50 kb) | 71 |
| Sequencing read coverage depth (fold) | 196.72 |
| GC content (%) | 53.04 |
| No. of predicted protein-coding genes | 14,721 |
| Gene density (genes/Mb) | 447.5 |
| Average transcript length (bp) | 1468.7 |
| Average CDS length (bp) | 1354.2 |
| Average protein length (aa) | 451.4 |
| Average exon length (bp) | 568.6 |
| Average intron length (bp) | 82.9 |
| Spliced genes | 9702 (66.0%) |
| Number of total introns | 20,309 |
| Median number of introns per gene | 2.0 |
| Number of total exons | 35,010 |
| Median number of exons per gene | 2.0 |
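Two of the derived values in the table can be checked against the other reported figures. The sketch below assumes gene density is the gene count divided by the assembled length in megabases, and that average protein length is the average CDS length divided by three; neither definition is stated in the source. Under those assumptions, the reported 447.5 genes/Mb appears to correspond to the 14,856 gene models given in the abstract rather than to the 14,721 protein-coding genes, which is an inference from the arithmetic and not a statement by the authors.

```python
# Values copied from the abstract and the genome features table.
ASSEMBLED_LENGTH_BP = 33_197_932
GENE_MODELS = 14_856            # from the abstract
PROTEIN_CODING_GENES = 14_721   # from the table
AVERAGE_CDS_LENGTH_BP = 1354.2


def genes_per_mb(gene_count: int, length_bp: int) -> float:
    """Gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    return gene_count / (length_bp / 1_000_000)


density_all_models = round(genes_per_mb(GENE_MODELS, ASSEMBLED_LENGTH_BP), 1)
density_protein_coding = round(genes_per_mb(PROTEIN_CODING_GENES, ASSEMBLED_LENGTH_BP), 1)

# Three nucleotides encode one amino acid.
average_protein_length_aa = round(AVERAGE_CDS_LENGTH_BP / 3, 1)

print(density_all_models)         # 447.5, matches the table
print(density_protein_coding)     # 443.4
print(average_protein_length_aa)  # 451.4, matches the table
```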
Genome annotation summary of C. kikuchii strain ARG_18_001.
| Summary | Number |
|---|---|
| Number of protein-coding gene models | 14,721 |
| Number of models with BLAST hit | 13,015 (88.4%) |
| Blast2GO annotation | 6296 (42.8%) |
| PFAM annotation | 5684 (38.6%) |
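The percentages in the annotation summary follow from dividing each count by the 14,721 protein-coding gene models. A minimal sketch of that calculation, using only the figures in the table (the helper name `percent_of` is an illustrative choice):

```python
TOTAL_MODELS = 14_721  # protein-coding gene models, from the table

# Counts copied from the annotation summary table.
annotated = {
    "BLAST hit": 13_015,
    "Blast2GO annotation": 6_296,
    "PFAM annotation": 5_684,
}


def percent_of(count: int, total: int) -> float:
    """Share of `total` represented by `count`, as a percentage to one decimal."""
    return round(100 * count / total, 1)


percentages = {label: percent_of(count, TOTAL_MODELS) for label, count in annotated.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The computed values (88.4, 42.8, and 38.6) agree with the percentages reported in the table.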
Summary of repetitive elements in the assembled genome of C. kikuchii strain ARG_18_001.
| Summary | Number |
|---|---|
| Total of bases masked | 178,815 (0.54%) |
| Number of simple repeats | 3131 |
| Number of low complexity repeats | 358 |
| Number of DNA transposons | 68 |
| Number of LTRs | 2 |
| Number of LINEs | 254 |
| Number of SINEs | 21 |
Fig. 2. Histogram representing the gene ontology distribution of the annotated Cercospora kikuchii ARG_18_001 genes. The functionally annotated genes were assigned to three main GO categories: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC).
Fig. 3. Pie chart denoting the species distribution based on the top BLAST hit of the Cercospora kikuchii ARG_18_001 genes queried against the nr database with an E-value cut-off of 1E-10. The category “Others” includes species with less than 1% representation.
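The tally behind a chart like Fig. 3 can be sketched as follows: keep each gene's top hit if it passes the E-value cut-off, count hits per species, and pool species below 1% into "Others". This is an illustration under assumptions, not the authors' script. The function name `species_distribution`, the inclusive treatment of the cut-off (`<=`), and the placeholder species labels and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

EVALUE_CUTOFF = 1e-10  # cut-off stated in the caption
MIN_SHARE = 0.01       # species below 1% are pooled as "Others"


def species_distribution(top_hits: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Count top-hit species, pooling rare species into "Others".

    `top_hits` yields one (species, e_value) pair per query gene, i.e. the
    best hit only. Hits with an E-value above the cut-off are ignored.
    """
    counts = Counter(species for species, e_value in top_hits if e_value <= EVALUE_CUTOFF)
    total = sum(counts.values())
    result: Dict[str, int] = {}
    others = 0
    for species, n in counts.most_common():
        if n / total < MIN_SHARE:
            others += n
        else:
            result[species] = n
    if others:
        result["Others"] = others
    return result


# Toy input with placeholder species labels (not data from the study).
toy_hits = (
    [("species A", 1e-50)] * 150
    + [("species B", 1e-20)] * 49
    + [("species C", 1e-12)]   # 1 of 200 hits = 0.5%, pooled into "Others"
    + [("species D", 1e-5)]    # fails the E-value cut-off, dropped
)
distribution = species_distribution(toy_hits)
print(distribution)
```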
Specifications table
| Field | Description |
|---|---|
| Subject | Biology |
| Specific subject area | Bioinformatics (Genomics) |
| Type of data | Raw sequencing reads, draft genome assembly, gene prediction, and phylogenetic position of C. kikuchii |
| How data were acquired | Whole genome sequencing was performed using an Illumina NovaSeq 6000 sequencing system |
| Data format | Raw sequencing reads, draft genome assembly and gene prediction |
| Parameters for data collection | Reads were filtered and merged with Trimmomatic (v 0.39) and FLASH (v 1.2.11). The genome was assembled with Celera Assembler (v 8.3) and SPAdes (v 3.11.1). Gene prediction was performed with FunGAP (v 1.0.1), tRNAscan-SE (v 2.0.3), RNAmmer (v 1.2) and MFannot (v 1.35). Protein-coding gene annotation was performed with hmmsearch (v 3.1b2), ncbi-blast (v 2.2.25+) and Blast2GO (v 2.5) using the ragp R package (v 0.3.0.0001). RepeatMasker (v 4.0.9) was used to identify and filter repetitive regions. |
| Description of data collection | Strain ARG_18_001 was isolated from soybean seeds of variety DM62R63 sampled during the 2018 harvest that exhibited symptoms of purple seed stain. |
| Data source location | Samples were originally collected from Gobernador Castro, San Pedro, Buenos Aires, Argentina (33°39′26.37″S, 59°49′36.00″W) |
| Data accessibility | This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession VTAY00000000 |
This is the first draft genome of C. kikuchii. The genomic data presented here will be a useful resource for the study of this pathosystem. This draft genome will help in the search for genetic resistance in soybean lines.