Wladimir Mardones, Alex Di Genova, María Paz Cortés, Dante Travisany, Alejandro Maass, Jaime Eyzaguirre
Abstract
The high lignocellulolytic activity displayed by the soft-rot fungus Penicillium purpurogenum has made it a target for the study of novel lignocellulolytic enzymes. We have obtained a reference genome of 36.2 Mb of non-redundant sequence (11,057 protein-coding genes). The 49 largest scaffolds cover 90% of the assembly, and Core Eukaryotic Genes Mapping Approach (CEGMA) analysis reveals that our assembly captures almost all protein-coding genes. RNA-seq was performed and 93.1% of the reads aligned to the assembled genome. These data, plus the independent sequencing of a set of genes of lignocellulose-degrading enzymes, validate the quality of the genome sequence. P. purpurogenum shows a higher number of proteins with CAZy motifs, transcription factors and transporters as compared to other sequenced Penicillia. These results demonstrate the great potential for lignocellulolytic activity of this fungus and the possible use of its enzymes in related industrial applications.
Keywords: CAZymes; Illumina; Penicillium purpurogenum; RNA-seq; genome sequencing; lignocellulose biodegradation
Year: 2018 PMID: 30123662 PMCID: PMC6059080 DOI: 10.1080/21501203.2017.1419995
Source DB: PubMed Journal: Mycology ISSN: 2150-1203
Assembly summary.
| Metric | Illumina assembly | PacBio + Illumina |
|---|---|---|
| Number of contigs | 836 | 687 |
| Total contigs size (bp) | 35,743,205 | 35,951,513 |
| Min contig length (bp) | 1000 | 947 |
| Max contig length (bp) | 462,380 | 665,523 |
| N50 contigs (kb) | 131 | 230 |
| Number of scaffolds | 582 | 158 |
| Total scaffolds size (bp) | 35,858,128 | 36,207,399 |
| Min scaffolds length (bp) | 1000 | 977 |
| Max scaffolds length (bp) | 1,235,734 | 2,800,880 |
| N50 scaffolds (kb) | 213.4 | 838.183 |
| N90 scaffolds (kb) | 12.2 (249) | 188.1 (49) |
N50 and N90 are the lengths L such that contigs or scaffolds of length ≥ L together cover 50% and 90% of the genome, respectively. Both were calculated using a genome size of 39 Mb. Numbers in parentheses give how many contigs/scaffolds are required to cover 90% of the genome length.
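The N50/N90 convention described in the table note can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `nx` is hypothetical, and because the target is a fraction of a fixed reference size (39 Mb here) rather than of the assembly total, it corresponds to what some tools call the "NGx" variant. It returns both the length and the count, matching the "length (count)" layout of the N90 row.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], x: float, genome_size: int) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed) for fraction x.

    Hypothetical helper: sequences are sorted from longest to shortest and
    their lengths summed until the running total reaches x * genome_size.
    The length of the sequence that crosses the threshold is the Nx value;
    its 1-based rank is the count reported in parentheses.
    """
    target = x * genome_size
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    # The assembly never reaches the requested fraction of the reference size.
    raise ValueError("total length is below x * genome_size")


if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]  # toy lengths, not data from the paper
    print(nx(toy, 0.5, 300))  # (80, 2)
    print(nx(toy, 0.9, 300))  # (40, 4)
```

Applied to real scaffold lengths with `genome_size=39_000_000`, the first value would correspond to the reported N90 length and the second to the number in parentheses, assuming the same definition was used.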
Gene prediction statistics.
| Attribute | Stats |
|---|---|
| Number of genes | 11,057 |
| Number of mRNAs | 11,555 |
| Number of exons | 37,145 |
| Number of coding DNA sequences | 35,980 |
| Number of introns | 24,653 |
| Exons per gene | 3.3 |
| Coding DNA sequences per gene | 3.2 |
| Introns per gene | 2.2 |
| Alternatively spliced genes | 454 |
| Alternatively spliced genes (%) | 4.1 |
| Average gene length (bp) | 2105 |
| Average exon length (bp) | 612 |
| Average intron length (bp) | 92 |
| Average distance between genes (bp) | 1349 |
| Genome coding (%) | 59 |
| Number of tRNAs | 170 |
A consensus gene set was built with EVidenceModeler (EVM) and PASA, combining ab initio gene predictions with protein and transcript alignments as evidence.
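As a consistency check, the per-gene ratios in the gene prediction table can be recomputed from its raw counts. This is a minimal sketch; the helper names are hypothetical. Truncating to one decimal place is an assumption, not a method stated in the source, but it reproduces all four tabulated ratios, whereas ordinary rounding would give 3.4 exons and 3.3 CDS per gene.

```python
import math

# Raw counts copied from the gene prediction table.
COUNTS = {
    "genes": 11_057,
    "exons": 37_145,
    "cds": 35_980,
    "introns": 24_653,
    "alt_spliced_genes": 454,
}


def truncate(value: float, decimals: int = 1) -> float:
    """Truncate (not round) a non-negative value to the given decimals."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


def per_gene_stats(counts: dict) -> dict:
    """Derive per-gene ratios and the alternative-splicing percentage."""
    genes = counts["genes"]
    return {
        "exons_per_gene": truncate(counts["exons"] / genes),
        "cds_per_gene": truncate(counts["cds"] / genes),
        "introns_per_gene": truncate(counts["introns"] / genes),
        "pct_alt_spliced": truncate(100 * counts["alt_spliced_genes"] / genes),
    }


if __name__ == "__main__":
    print(per_gene_stats(COUNTS))
    # {'exons_per_gene': 3.3, 'cds_per_gene': 3.2, 'introns_per_gene': 2.2, 'pct_alt_spliced': 4.1}
```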
Characterised lignocellulose-degrading enzymes from P. purpurogenum whose gene sequences were determined independently.
| Enzyme | CAZy family | Gene ID | GenBank | Reference |
|---|---|---|---|---|
| Endoxylanase A | GH 10 | PPSCF00028.25 | AAF71268 | Chávez et al. ( |
| Acetyl xylan esterase II | CE 5 | PPSCF00020.380 | AAC39371 | Gutiérrez et al. ( |
| Arabinofuranosidase 1 | GH 54 | PPSCF00061.89 | AAK51551 | Carvallo et al. ( |
| Arabinofuranosidase 2 | GH 51 | PPSCF00010.19 | EF490448 | Fritz et al. ( |
| Arabinofuranosidase 3 | GH 43 | PPSCF0002.743 | FJ906695 | Ravanal et al. ( |
| Arabinofuranosidase 4ᵃ | GH 54 | PPSCF00024.186 | AGR66205 | Ravanal and Eyzaguirre ( |
| Pectin lyase | PL 1 | PPSCF00015.779 | KC751539 | Pérez-Fuentes et al. ( |
| Exo-arabinanaseᵃ | GH 93 | PPSCF00035.45 | KP313779 | Mardones et al. ( |
ᵃ The gene was mined from the genome and later sequenced.
Figure 1. Phylogenetic tree of the phylum Ascomycota inferred with MrBayes and FastTree from the deduced amino acid sequences of 147 conserved genes. Trichoderma reesei was used as the outgroup.