
The genomic study of an environmental isolate of Scedosporium apiospermum shows its metabolic potential to degrade hydrocarbons.

Laura T Morales¹, Laura N González-García², María C Orozco¹, Silvia Restrepo², Martha J Vives¹,³.

Abstract

Crude oil contamination of soils and waters is a worldwide problem that has been actively addressed in recent years. Sequencing the genomes of microorganisms involved in hydrocarbon degradation has allowed the identification of several promoters, genes, and degradation pathways for these contaminants. This knowledge allows a better understanding of the functional dynamics of microbial degradation. Here, we report a first draft of the 44.2 Mbp genome assembly of an environmental strain of the fungus Scedosporium apiospermum. The assembly consisted of 178 high-quality DNA scaffolds, with 1.93% of the sequence identified as repeats. A total of 11,195 protein-coding genes were predicted, including a diverse group of gene families involved in hydrocarbon degradation pathways, such as dioxygenases and cytochrome P450. The metabolic pathways identified in the genome can potentially degrade hydrocarbons such as chloroalkane/alkene, chlorocyclohexane, chlorobenzene, benzoate, aminobenzoate, fluorobenzoate, toluene, caprolactam, geraniol, naphthalene, styrene, atrazine, dioxin, xylene, ethylbenzene, and polycyclic aromatic hydrocarbons. Comparison of this strain with the previously sequenced clinical strain showed important differences in the annotated genes involved in hydrocarbon degradation.


Keywords: Genome; Hydrocarbon degradation; Scedosporium apiospermum

Year: 2017 PMID: 29225727 PMCID: PMC5716253 DOI: 10.1186/s40793-017-0287-6

Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277

Introduction

Accidental oil spills have become an important global problem because of the serious environmental damage caused by soil and water contamination [1]. Because oil is a complex mixture of aromatic and aliphatic hydrocarbons of different molecular weights, its removal from the environment is difficult and its persistence is prolonged [2]. These compounds have gained considerable attention because of harmful features such as resistance to degradation, bioaccumulation, and carcinogenic activity. Their persistence in the environment increases with molecular weight, and there is a need to develop technologies or processes able to degrade these compounds or transform them into less toxic molecules [3]. The ability of several organisms, primarily microorganisms (bacteria, fungi, and microalgae), to degrade these toxic substances has been extensively studied in recent decades [1, 4–9]. The main goal is to improve the decontamination of the environment via bioremediation, which encompasses technologies that transform compounds into less harmful or harmless forms with less use of chemicals, energy, and time [10, 11]. Microbial bioremediation is very effective because of the catabolic activity of microorganisms; many species of bacteria, fungi, and microalgae have demonstrated the ability to degrade hydrocarbons. This process involves the breakdown of organic molecules through biotransformation into less complex metabolites, or mineralization to water, carbon dioxide, or methane [3]. Several strategies have been employed to study these microorganisms and to understand the processes they carry out. Among these, genomics has allowed the recognition of promoters, genes, and degradation pathways that inform the construction of more efficient degradative strains relevant to bioremediation processes [12, 13]. 
Genome sequencing of hydrocarbon-degrading organisms has allowed the identification of several genes involved in metabolism and catabolism of aliphatic, aromatic alcohols, and other similar compounds, as well as some metal resistance genes [14]. However, fewer genomes have been sequenced from fungal species than from bacteria: as of July 2017, the GenBank database held 103,076 sequenced prokaryotic genomes but only 4503 eukaryotic genomes. Scedosporium apiospermum (teleomorph: Pseudallescheria apiosperma [15]) is a fungus belonging to the phylum Ascomycota that has been isolated from various environments, usually those influenced by human activity [16]. This fungus has been reported as a hydrocarbon-degrading microorganism since 1998 because of its ability to degrade polluting compounds such as phenol and p-cresol [17]. One year later, its ability to degrade phenylbenzoate and its derivatives was elucidated [18]. In recent years, studies have appeared on its degradation of complex compounds such as toluene [19], polycyclic aromatic hydrocarbons (PAHs) [20], long-chain aliphatic hydrocarbons, and mixtures of these contaminants (unpublished results from our group) [21]. Additionally, the ability of the fungus to regenerate granular activated carbon once it has been saturated with phenol was shown in our laboratory (unpublished results). Therefore, S. apiospermum presents a wide range of opportunities in bioremediation, and sequencing its genome can allow the identification of promoters, genes, and degradation pathways of hydrocarbons. Indeed, the genomic analysis of this fungus can improve the understanding of the functional dynamics of microbial degradation of contaminants and improve conditions for effective decontamination processes in different environments [2]. On the other hand, this fungus has been recognized as a potent etiologic agent of severe infections in immunocompromised and, occasionally, immunocompetent patients [22]. 
For this reason, in 2014, the genome of an isolate from a cystic fibrosis patient (clinical strain) was sequenced with the aim of gaining knowledge of its pathogenic mechanisms [23]. Thus, our objective was the complete characterization of the genome of the S. apiospermum environmental strain HDO1. In order to analyze the genes and pathways involved in the degradation process and to assess the unique components of its genome compared to the clinical strain and other sister species, we sequenced, assembled, annotated, and fully characterized the environmental strain’s genome.

Organism information

Classification and features

S. apiospermum environmental strain HDO1 was isolated as a contaminant in assays of bacterial strains able to grow on crude oil (API gravity 33) as the sole carbon source. It was selected for sequencing because of its ability to grow in laboratory cultures containing aliphatic hydrocarbons of crude oil, naphthalene, phenanthrene, phenol, and mixtures of these compounds. The fungal isolate was grown on potato dextrose agar (PDA; OXOID LTD, Hampshire, UK) plates for 7 days at 30 °C. The optimal growth temperature was 30–37 °C. Identification was based on the following morphological characteristics: obverse and reverse colony color (according to the color chart of Küppers, H. [24]), colony texture, size, presence of diffusible pigments, hyphal characteristics, and conidia arrangement. The morphological characteristics were: colonies with a diameter of 7 cm on PDA at 25 °C after 7 days, cottony textured, greyish-white (N00, C00-A00) with yellowish-white reverse. No diffusible pigment was observed. The mycelium was hyaline, septate, and thin. Unbranched conidiophores with long neck-bottle shaped phialides were observed. Conidia were hyaline, approximately 5 μm in diameter, occurring in basipetal chains leaving long hyaline annellides (Fig. 1). For the molecular characterization, the fungus was grown in Sabouraud broth at 25 °C and 150 rpm for 7 days, and the biomass obtained was lyophilized for at least 12 h. Fungal genomic DNA was extracted from 100 mg of lyophilized and pulverized mycelia using the CTAB and phenol/chloroform/isoamyl alcohol method [25]. The universal primers used for amplification of the ITS region were ITS4 (5′-TCCTCCGCTTATTGATATGC-3′) and ITS5 (5′-GGAAGTAAAAGTCGTAACAAGG-3′) [26]. Sanger sequencing was performed by Macrogen (South Korea). 
The nucleotide sequences obtained were compared with the non-redundant database of the National Center for Biotechnology Information (NCBI) using the tBlastx program (default parameters), and the ITS region sequences were assigned to the fungus with an E-value of 0.0, 100% query coverage, and 100% identity. The sequence obtained is deposited in the NCBI GenBank nr database under accession number JQ003882.1.

Fig. 1

Micrograph of Scedosporium apiospermum. a Optical microscopy of hyphae and conidia from a PDA culture, at 100× total magnification. Lactophenol cotton blue wet mount preparation. b Scanning electron microscopy of hyphae and conidia from a liquid culture grown in minimal salt medium plus crude oil as the sole carbon and energy source

A phylogenetic analysis was performed using the large subunit rRNA gene (LSU), the internal transcribed spacer (ITS), and the elongation factor 1-α (TEF) sequences obtained from GenBank. Species from the Microascaceae family were included [27, 28] and are described in Additional file 1: Table S3. Individual gene regions (LSU, ITS, and TEF) were aligned using MAFFT v. 7.187 [29]. Maximum Likelihood analyses were performed using RAxML v.7.6.3 [30] as implemented on the CIPRES portal [31]. The sequence alignment was partitioned into three subsets, each under a specified model of nucleotide substitution chosen with PartitionFinder [32]. Different shape parameters, GTR rates, and base frequencies were estimated for each partition. The majority rule criterion implemented in RAxML [33] (−autoMRE) was used to assess clade support by bootstrap. The resulting trees were plotted using FigTree v. 1.4.2 [34]. Outgroup taxa were included in the analysis. Environmental strain HDO1 clustered with the clinical strain IHEM 14462 with good support, and together they form the sister group of strain CBS 635.78 (Fig. 2). The whole group is contained within the wardamycopsis lineage described by Sandoval-Denis, M. et al. [28]. A summary of the classification and general features of S. apiospermum strain HDO1 is given in Table 1.

Fig. 2

Table 1

Classification and general features of Scedosporium apiospermum strain HDO1

MIGS ID	Property	Term	Evidence code^a
	Classification	Domain Fungi	TAS [71]
		Phylum Ascomycota	TAS [72]
		Class Sordariomycetes	TAS [72]
		Order Microascales	TAS [72]
		Family Microascaceae	TAS [72]
		Genus Scedosporium	TAS [73]
		Species Scedosporium apiospermum	TAS [73]
		strain: HDO1
	Gram stain	n/a
	Cell shape	Mycelium with septae	IDA
	Motility	non-motile	IDA
	Sporulation	Conidia	TAS [72]
	Temperature range	25–42 °C	TAS [17]
	Optimum temperature	30–37 °C	TAS [17]
	pH range; Optimum	5.5–8.5: 7.3	TAS [17]
	Carbon source	Glucose, sucrose, urea, ethanol, ribitol, xylitol, L-arabinitol, phenol, p-cresol, hydroquinone, 1,2,4-benzenetriol, catechol, 4-hydroxybenzylalcohol, 4-hydroxybenzaldehyde, 4-hydroxybenzoate, protocatechuate, 3-oxoadipate, phenylbenzoate, naphthalene, pyrene, phenanthrene, crude oil	TAS [17–21]
MIGS-6	Habitat	Soil and polluted water	TAS [74, 75]
MIGS-6.3	Salinity	1.7–2.8/Up to 5% in vitro	TAS [76]
MIGS-22	Oxygen requirement	Aerobic/Tolerate low pressure of O₂	TAS [76, 77]
MIGS-15	Biotic relationship	free-living	IDA
MIGS-14	Pathogenicity	Pathogenic	TAS [22]
MIGS-4	Geographic location	Bogotá, Colombia	IDA
MIGS-5	Sample collection	20 May 2008	IDA
MIGS-4.1	Latitude	4.600659	IDA
MIGS-4.2	Longitude	−74.065592	IDA
MIGS-4.4	Altitude	2658 m	IDA

^a Evidence codes: IDA, inferred from direct assay (first time in publication); TAS, traceable author statement (i.e., a direct report exists in the literature); NAS, non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These codes are from the Gene Ontology project [56]

Fig. 2. Phylogenetic analysis of S. apiospermum HDO1. Estimated relationships of S. apiospermum HDO1 with S. apiospermum IHEM 14462 and other species from the Microascaceae family. The tree shows the concatenated analysis of the internal transcribed spacer, the large subunit, and the elongation factor gene regions. Sequences from reference strains were used (Additional file 1: Table S3). Support values represent bootstrap support values (Maximum Likelihood)

Genome sequencing information

Genome project history

The genome of the isolate HDO1 was sequenced by NovoGene Technology Bioinformatics Co., Ltd. (Hong Kong). The whole genome shotgun project of S. apiospermum HDO1 has been deposited in the NCBI database under accession number MVOQ00000000, belonging to BioProject PRJNA357602. A summary of the project and information about the genome sequence are shown in Table 2.

Table 2

Project information

MIGS ID	Property	Term
MIGS-31	Finishing quality	Draft
MIGS-28	Libraries used	One 250 bp paired-end library, one 5 kb mate-pair library
MIGS-29	Sequencing platforms	Illumina HiSeq2500
MIGS-31.2	Fold coverage	540×
MIGS-30	Assemblers	Abyss 1.9.0.20
MIGS-32	Gene calling method	Augustus 3.0.3
	Locus Tag	BTW05
	GenBank ID	MVOQ00000000
	GenBank Date of Release	May 26, 2017
	GOLD ID	NA
	BIOPROJECT	PRJNA357602
MIGS-13	Source Material Identifier	Strain HDO1 Museo de Historia Natural Andes
	Project relevance	Biotechnological


Growth conditions and genomic DNA preparation

The fungus was grown in liquid culture (YPG: 1% yeast extract, 1% peptone, and 2% glucose) at 30 °C for 7 days, followed by vacuum filtration, lyophilization, and maceration to obtain a homogeneous sample. DNA was extracted by the CTAB and phenol/chloroform/isoamyl alcohol method [25]. DNA quality was analyzed by Nanodrop2000 (Thermo Fisher Scientific, MA, USA) and agarose gel electrophoresis (0.8%). DNA quantity was determined by Qubit2.0 (Invitrogen, CA, USA).

Genome sequencing and assembly

Genome sequencing of the strain was performed using high-throughput Illumina technology on a HiSeq2500 with two libraries: a 250 bp paired-end library and a 5 kb mate-pair library. Quality trimming of reads was performed using Trimmomatic 0.23 [35] and quality control was performed using FastQC 0.11.2 [36]. Coverage and depth of sequencing were analyzed by mapping the reads using Bowtie2 2.2.4 [37]; the SAM files were converted to BAM files using SAMtools 1.1 [38], and visualization was done using Tablet 1.15.09.1 [39]. The genome was de novo assembled using Abyss 1.0.9.20 [40] with a k-mer size of 64, scaffolds were generated with SSPACE BASIC 2.0 [41], and gaps were reduced using GapFiller 1.1 [42]. Assembly statistics were obtained using Quast 2.3 (Additional file 1: Table S1) [43]. Repetitive elements were identified with RepeatMasker 4.0.5 [44].

The draft genome of S. apiospermum strain HDO1 was assembled from a total of 97,208,043 reads using the Abyss assembler [40]. The assembly yielded 178 scaffolds (larger than 500 bp) with a genome size of 44.2 Mbp, a G + C content of 49.91%, and a mean depth of 541X. The genome assembly statistics are shown in Table 3. Non-coding repeats identified with RepeatMasker [44] accounted for 1.93% of the genome. Most of them were simple repeats (0.89%) and low complexity regions (0.25%). The complete report of the annotation results for the non-coding repeat sequences can be found in Additional file 1: Table S2. The assembly features obtained for the draft sequence were similar to those of other fungal genome sequencing projects [23, 45, 46].
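
The summary figures reported above (genome size, scaffold count, G + C content) are simple functions of the scaffold sequences. The following is a minimal sketch of how such values could be derived, assuming the scaffolds are already loaded as plain strings. The function name `assembly_stats` and the toy input are hypothetical; this is not the authors' Quast-based pipeline, and the N50 value is included only as a commonly reported extra.

```python
def assembly_stats(scaffolds, min_len=500):
    """Summarize an assembly given as a list of sequence strings.

    Scaffolds of min_len bp or shorter are ignored, mirroring the
    "larger than 500 bp" cutoff mentioned in the text.
    Illustrative sketch only; G+C is computed over all bases, including Ns.
    """
    kept = [s.upper() for s in scaffolds if len(s) > min_len]
    lengths = sorted((len(s) for s in kept), reverse=True)
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in kept)

    # N50: length of the scaffold at which the running sum of lengths
    # (largest first) reaches at least half of the total assembly size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "size_bp": total,
        "scaffolds": len(kept),
        "gc_percent": round(100 * gc / total, 2) if total else 0.0,
        "n50": n50,
    }


# Toy usage with a hypothetical three-scaffold assembly and no length cutoff.
toy_stats = assembly_stats(["GGCC", "ATAT", "GC"], min_len=0)
```

On a real assembly the input list would come from a FASTA parser, and the result can be compared against the values in Table 3.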

Table 3

Genomic statistics

Attribute	Value	% of total
Genome size (bp)	44,188,879	100
DNA coding (bp)	18,219,288	41.2
DNA G + C (bp)	22,057,939	49.91
DNA scaffolds	178	100
Total genes	11,278	100
Protein coding genes	11,184	99.16
RNA genes	92	0.81
Pseudogenes	2	0.02
Genes in internal clusters	ND	ND
Genes with function prediction	8575	76.0
Genes assigned to COGs	4789	42.4
Genes with Pfam domains	7978	70.8
Genes with signal peptides	1333	11.8
Genes with transmembrane helices	2293	20.3


Genome annotation

Gene prediction and structural annotation were conducted using Augustus 3.0.3 [47]. Functional annotation was performed using Blast2GO 3.1 [48]. Briefly, a BLASTx search against the National Center for Biotechnology Information “nr” database [49] was conducted, and the results were classified into Gene Ontology categories [50]. Protein classification was done using the COG [51], KOG (Eukaryotic Orthologous Groups) [52], and EggNOG [53] databases on the Blast2GO v4.0 platform [48]. Annotated genes were mapped against the Kyoto Encyclopedia of Genes and Genomes (KEGG) [54] for functional analysis and assigned Enzyme Codes. A total of 11,195 protein-encoding genes were predicted using Augustus [47]. Functional annotation using Blastx [49] yielded a total of 8595 sequences (76.0% of predicted genes) with a predicted function. InterProScan [55] and Gene Ontology [56] then permitted the annotation of 7934 (70.3%) sequences with GO terms, whilst the remaining genes were annotated as hypothetical proteins (17.1%) and proteins of unknown function (5.0%). A total of 7978 (70.8%) genes contained Pfam [57] domains and 1333 had signal peptide domains. The transmembrane helices in the proteins were predicted with the TMHMM server v.2.0 through its online portal [58]. The ribosomal RNA genes were predicted with the RNAmmer 1.2 Server [59] and by alignment against predicted genes from the FungiDB database [60]; the same database was used for pseudogene prediction. The statistics of the genome annotation are shown in Table 3. A total of 4789 (42.5%) genes were assigned to KOG [61] categories; most of them (60%) were assigned to one or more functional groups and the rest were assigned to the function unknown group (Table 4). KEGG pathway analysis assigned an enzyme code to 2645 (23.5%) genes and revealed specific genes involved in hydrocarbon degradation pathways. 
These hydrocarbons are chloroalkane/alkene, chlorocyclohexane and chlorobenzene, benzoate, aminobenzoate, fluorobenzoate, toluene, caprolactam, geraniol, naphthalene, styrene, atrazine, dioxin, xylene, ethylbenzene, and polycyclic aromatic hydrocarbons. Also, the analysis revealed the presence of genes involved in metabolism of xenobiotics by cytochrome P450 and in synthesis and degradation of ketone bodies. These results are shown in Fig. 3.
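
The per-pathway gene counts summarized in Fig. 3 amount to tallying, for each KEGG pathway, how many annotated genes map to it. The snippet below is a minimal sketch of that tally, assuming the annotation has already been reduced to a mapping from gene identifier to pathway names. The function name, gene identifiers, and toy mapping are hypothetical and do not reproduce the authors' Blast2GO workflow or their data.

```python
from collections import Counter


def genes_per_pathway(gene_to_pathways):
    """Count how many genes map to each pathway.

    gene_to_pathways: dict mapping a gene ID to an iterable of pathway names.
    A gene is counted once per pathway even if the pathway is listed twice.
    Returns (pathway, count) pairs sorted from most to least populated.
    """
    counts = Counter()
    for pathways in gene_to_pathways.values():
        counts.update(set(pathways))
    return counts.most_common()


# Hypothetical toy annotation; a single gene may map to several pathways.
toy_annotation = {
    "gene_0001": ["Benzoate degradation", "Toluene degradation"],
    "gene_0002": ["Benzoate degradation"],
    "gene_0003": ["Benzoate degradation", "Benzoate degradation"],
}
toy_counts = genes_per_pathway(toy_annotation)
```

Because one gene can take part in several pathways, the per-pathway counts produced this way can sum to more than the number of annotated genes.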

Table 4

Number of genes associated with general COG functional categories

Code	Value	%	Description
J	234	2.09	Translation, ribosomal structure and biogenesis
A	42	0.37	RNA processing and modification
K	163	1.46	Transcription
L	126	1.12	Replication, recombination and repair
B	19	0.17	Chromatin structure and dynamics
D	32	0.29	Cell cycle control, Cell division, chromosome partitioning
V	32	0.29	Defense mechanisms
T	153	1.37	Signal transduction mechanisms
M	126	0.44	Cell wall/membrane biogenesis
N	2	0.02	Cell motility
U	241	2.15	Intracellular trafficking and secretion
O	294	2.63	Posttranslational modification, protein turnover, chaperones
C	216	1.93	Energy production and conversion
G	399	3.56	Carbohydrate transport and metabolism
E	257	2.30	Amino acid transport and metabolism
F	65	0.58	Nucleotide transport and metabolism
H	63	0.56	Coenzyme transport and metabolism
I	140	1.25	Lipid transport and metabolism
P	131	1.17	Inorganic ion transport and metabolism
Q	186	1.66	Secondary metabolites biosynthesis, transport and catabolism
R	0	0	General function prediction only
S	1918	17.14	Function unknown
–	6402	57.2	Not in KOGs

Fig. 3

Distribution of the hydrocarbon degradation genes in KEGG pathways. The bars represent the number of genes mapped to KEGG pathways related to hydrocarbon degradation. Most of the genes were mapped to the degradation of benzoate and its derivative compounds, such as aminobenzoate and fluorobenzoate


Genome properties

The assembled genome of strain HDO1 has a size of 44,188,879 bp (distributed in 178 scaffolds) with a G + C content of 49.91%; the genome size and the G + C content were similar to those of the draft genome reported for strain IHEM 14462 [22] (Table 5). A total of 11,278 genes were predicted; among these, 11,184 were identified as protein-coding genes (99.16% of the total genes), 92 as RNA genes (0.81%), and 2 as pseudogenes (0.02%) (Table 3). Some other features of the predicted genes are shown in Table 4. The number of chromosomes could not be elucidated.
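
The percentages quoted for each gene class are shares of the 11,278 predicted genes. As a small worked check, assuming nothing beyond the counts given in the text (the helper name `share_of_total` is hypothetical), they can be recomputed as follows; the last digit may differ from the published values depending on whether figures were rounded or truncated.

```python
def share_of_total(counts, total):
    """Return each count as a percentage of total, rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Gene counts as reported in the text and in Table 3.
gene_counts = {"protein_coding": 11184, "rna": 92, "pseudogenes": 2}
total_genes = 11278

shares = share_of_total(gene_counts, total_genes)

# The three classes account for every predicted gene.
classes_cover_total = sum(gene_counts.values()) == total_genes
```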

Table 5

Genomic features comparison between HDO1 strain and IHEM 14462 strain [22]

Parameter	IHEM 14462	HDO1
Size (Mb)	43.44	44.19
Content G-C (%)	50.4	49.91
Predicted genes	10,919	11,278
Predicted proteins	8,375	11,184


Insights from the genomic sequence

Comparative genomics

Reads were mapped against the clinical strain IHEM 14462 using Bowtie2 2.2.4 [37]. The SAM files were converted to BAM files using SAMtools 1.1 [38] and visualized using Tablet 1.15.09.1 [39], resulting in an overall alignment rate of 92.75%. Genome comparison between the environmental strain HDO1 and the clinical strain IHEM 14462 was performed using MAUVE 20150226 [62]. The genome sequence of strain HDO1 aligned with the sequence of strain IHEM 14462 over 88.1% of its length. The MAUVE [62] alignment showed a high level of similarity between the clinical and the environmental strains (Fig. 4). A total of 508 locally collinear blocks (LCBs), which correspond to homologous regions shared by the two sequences, were found, and a few of them were in reverse orientation after eight reordering cycles. From the ordered output FASTA file obtained with MAUVE [62], a new alignment was made with Nucmer at the nucleotide level (maximum gap between two adjacent matches in a cluster of 90 bp and a minimum length of a maximal exact match of 20 bp) and with Promer at the amino acid level (maximum gap between two adjacent matches in a cluster of 30 amino acids and a minimum length of a maximal exact match of 6 amino acids). Nucmer and Promer alignments were plotted using Mummerplot; all three tools belong to the MUMmer 3.0 suite [63] (Fig. 5). This analysis revealed that forward matches are concentrated in the largest scaffolds of the HDO1 genome sequence, whereas reverse matches are more common in the smallest scaffolds. The differences and similarities seen at the nucleotide level showed the same trends when the sequences were translated to amino acids. These analyses and their corresponding plots also made it possible to identify rearrangements, insertions, and deletions between the two genomes.
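
The distinction between forward and reverse matches can be illustrated with a short sketch. It assumes each match is available as a tuple of reference and query coordinates, and it assumes the convention that a reverse-complement match is reported with a query start greater than its query end. The tuple layout, the function name, and the toy coordinates are hypothetical and are not taken from the authors' Nucmer or Promer output.

```python
def count_orientations(matches):
    """Count forward and reverse matches.

    matches: iterable of (ref_start, ref_end, query_start, query_end).
    Assumption: a match is 'reverse' when query_start > query_end.
    Returns a (forward, reverse) tuple.
    """
    matches = list(matches)
    forward = sum(1 for (_, _, q_start, q_end) in matches if q_start <= q_end)
    reverse = len(matches) - forward
    return forward, reverse


# Hypothetical toy matches: two forward, one reverse.
toy_matches = [
    (1, 100, 1, 100),
    (200, 300, 500, 400),
    (10, 50, 20, 60),
]
toy_orientation = count_orientations(toy_matches)
```

Grouping such counts by scaffold length would give the kind of comparison described above, in which forward and reverse matches are distributed differently between large and small scaffolds.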

Fig. 4

Fig. 5

Dot plot analysis comparing the genomes of the HDO1 and IHEM 14462 strains. a Comparison at the nucleotide level. b Comparison at the protein level. The plots show the alignment of the genome sequence of strain IHEM 14462 (y axis) against the HDO1 genome sequence (x axis). Red lines and dots represent forward matches between the two genome sequences, while blue ones represent reverse matches

Fig. 4. MAUVE [62] alignment of the draft genome sequence of strain HDO1 and the draft genome sequence of strain IHEM 14462. The figure represents the locally collinear blocks (LCBs) that both sequences share, connected by lines to show their positions in the genomes. The sequence of strain HDO1 is shown at the top and the re-ordered sequence of strain IHEM 14462 at the bottom [23]. Blocks shown below indicate regions in reverse orientation relative to the HDO1 sequence

A thorough comparative analysis showed some important differences between the draft genome sequences of the clinical strain and the environmental strain sequenced here. These differences were evident in the genome size of the assemblies and the number of predicted genes (Table 5). Indeed, our assembly had 783,135 bp (1.77% of genome size) and 276 coding sequences more than the clinical strain. The remarkable difference in the number of annotated genes involved in hydrocarbon degradation pathways could be attributed to the pipeline followed to annotate genes. For the clinical strain, the CDSs found were annotated against the TrEMBL database [64], which only comprises UniProtKB/Swiss-Prot, while in this study we used the nr (non-redundant protein sequences) database of NCBI, which has wider coverage because it comprises sequences obtained from other databases such as GenPept, TPA, PIR, PRF, PDB, NCBI RefSeq, and UniProtKB/Swiss-Prot [65]. 
Since the repetitive elements of the genome were estimated as only 1.93%, it is highly probable that the difference in size can be attributed to some of the elements involved in functional categories.
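
The size and gene-count differences quoted above can be checked with simple arithmetic. The sketch below assumes that the 276 additional coding sequences were obtained by comparing the 11,195 protein-coding genes predicted for HDO1 with the 10,919 predicted genes listed for IHEM 14462 in Table 5; that pairing is an inference from the numbers in the text, not something the authors state.

```python
# Values taken from the text and tables; variable names are illustrative.
hdo1_genome_bp = 44_188_879
extra_bp_in_hdo1 = 783_135

# Extra sequence expressed as a percentage of the HDO1 genome size.
extra_percent = round(100 * extra_bp_in_hdo1 / hdo1_genome_bp, 2)

# Assumed pairing of gene counts (see the note above).
hdo1_predicted_cds = 11_195
ihem_predicted_genes = 10_919
extra_cds = hdo1_predicted_cds - ihem_predicted_genes
```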

Genes involved in hydrocarbon biodegradation pathways

Several genes involved in hydrocarbon biodegradation pathways were annotated in the genome of the environmental strain. Table 6 shows the genes previously reported in the clinical strain [23]. The results revealed that some genes are involved in several degradation pathways, principally those corresponding to aromatic hydrocarbon metabolism (polycyclic aromatic hydrocarbons and phenolic compounds) and the cytochrome P450 system. The table also gives the number of these genes annotated in each strain; overall, more of them were annotated in the environmental strain HDO1. The genes found only in the draft sequence of strain HDO1 are reported in Table 7. They include genes from the aromatic hydrocarbon degradation pathways that complete pathways in which genes found in both strains are also involved. Genes involved in the degradation of other organic compounds such as toluene, lignin, and xylenol were also found (Tables 6 and 7).

Table 6

Annotated genes involved in hydrocarbons degradation pathways

Gene	Pathway	# of genes in HDO1	# of genes in IHEM 14462 [23]
Cytochrome P450 monooxygenase (EC:1.14.13.12)	PAHs degradation, alkane biodegradation [78, 79]	79	44
Phenol hydroxylase	Phenol degradation [17]	4	4
Epoxide hydrolase (EC:3.3.2.9)	PAHs degradation [80]	3	2
Oxidoreductase	Organic compound metabolism [81]	13	8
Salicylate hydroxylase (EC:1.14.13.24)	Naphthalene degradation [82]	13	4
Laccase (EC:1.10.3.2)	PAHs degradation [83]	2	2
Catechol 1,2-dioxygenase (EC:1.13.11.1)	Phenol degradation [17]
2,4-dichlorophenol 6-monooxygenase (EC:1.14.13.7; EC:1.14.13.20)	Chlorinated phenols degradation [84]	5	5
2,3-dihydroxybenzoate decarboxylase (EC:4.1.1.46)	2,3-dihydroxybenzoate degradation [85]	1	1
Carboxy-cis,cis-muconate cyclase	Phenol degradation [17]	4	2
Phenylacetate 2-hydroxylase	Homogentisate degradation [86]	1	2
2-nitropropane dioxygenase (EC:1.13.12.16)	Nitroalkene oxidation [87]	4	4
Biphenyl-2,3-diol 1,2-dioxygenase	Biphenyl degradation [88]	1	2
Dienelactone hydrolase	Chloroaromatic degradation [89]	2	7
Vanillyl-alcohol oxidase (EC:1.1.3.38)	Aromatic degradation [90]	4	6
Cyclopentanone 1,2-monooxygenase (EC:1.14.13.8; EC:1.14.13.16)	Cyclopentanol degradation [91]	2	2
Tyrosinase	Phenolic compounds degradation [92]	1	3
Lignostilbene dioxygenase (EC:1.13.11.43)	Lignin degradation [93]	2	1
Total number of genes		145	103

Table 7

Annotated genes found only in the HDO1 strain

Genes in HDO1	Pathway	# of genes
3-oxoadipate enol-lactonase	Phenol degradation [94]	5
5-carboxymethyl-2-hydroxymuconate isomerase	Homoprotocatechuate degradation pathway [95]	1
Trihydroxytoluene oxygenase	2,4-dinitrotoluene degradation [96]	1
Benzoate 4-monooxygenase (EC:3.6.1.3)	Benzoate degradation [18]	5
Diphenol oxidase	Phenolic compounds degradation [97]	1
Cyclohexanone monooxygenase (EC:1.14.13.8)	Cyclohexane degradation [98]	4
Gentisate 1,2-dioxygenase	PAHs degradation [99]	2
2-keto-4-pentenoate hydratase (EC:3.7.1.5)	Benzoate degradation [100]	1
carboxymuconolactone decarboxylase	Protocatechuate degradation [101]	1
3-(3-hydroxy-phenyl)propionate 3-hydroxycinnamic acid hydroxylase	Phenyl propionate degradation [102]	2
3-hydroxybenzoate 6-hydroxylase	Xylenol [103] and 3-hydroxybenzoate degradation [104]	1
3-hydroxyisobutyrate dehydrogenase (EC:1.1.1.44; EC:2.1.1.43)	Aromatic compounds metabolism [105]	5
Total number of genes		30

The genome was fully annotated, with particular attention to genes belonging to major classes of protein families involved in the fungal catabolism of organic pollutants. We identified genes coding for proteins able to oxidize aromatic compounds, such as dioxygenases and monooxygenases. Among these, we predicted dioxygenases such as 2-nitropropane dioxygenase, extracellular dioxygenase (EC:1.13.11), gentisate 1,2-dioxygenase, intradiol ring-cleavage dioxygenase (EC:1.13.11), lignostilbene dioxygenase (EC:1.13.11.43), catechol 1,2-dioxygenase (EC:1.13.11.1), biphenyl-2,3-diol 1,2-dioxygenase, aromatic ring-opening dioxygenase, and 4-hydroxyphenylpyruvate dioxygenase (EC:1.13.11.27). These enzymes are of great importance because, along with NADH-dependent flavin reductase and [2Fe-2S] redox centers, they catalyze the transformation of several aromatic compounds to dihydrodiols [66], allowing the complete mineralization of these compounds to CO₂ and H₂O (with the participation of other specific enzymes). Another enzyme family identified among the annotated genes was cytochrome P450. These enzymes have an interesting catabolic potential because they lack substrate specificity and can catalyze the epoxidation and hydroxylation of several organic pollutants such as dioxins, nonylphenol, and PAHs [67]. Genes coding for extracellular proteins such as laccases and tyrosinase (known as phenoloxidase enzymes), which can degrade several groups of organic compounds because of their non-specific action, were also annotated in the genome. These enzymes produce organic radicals by one-electron abstraction; those free radicals can then be transformed by several reactions, including ether cleavage in dioxins and quinone formation from PAHs and chlorophenols [68]. 
These extracellular enzymes are extremely important because of their potential in biotechnological applications [69, 70]. Moreover, several oxidoreductases, hydrolases, dehydroxylases, isomerases, and transferases were also predicted in the studied strain. However, extracellular enzymes such as lignin and manganese peroxidases have not yet been identified. Catabolic proteins of S. apiospermum involved in the phenol, p-cresol, and phenylbenzoate degradation pathways previously reported by Claußen and Schmidt [17, 18], such as phenol 2-monooxygenase and catechol 1,2-dioxygenase, were identified. However, hydroquinone hydroxylase, 4-hydroxybenzoate 3-hydroxylase, hydroxyquinol 1,2-dioxygenase, protocatechuate 3,4-dioxygenase, and maleylacetate reductase could not be found, suggesting that these proteins are among those annotated as hypothetical or of unknown function, or that their genes lie in gap regions of the genome assembly.

Conclusions

The draft genome sequence of the environmental strain HDO1, isolated from bacterial bioremediation assays in crude oil, is described here. The structural and functional information in the genome sequence of S. apiospermum advances the understanding of the ability of this fungus to degrade several kinds of xenobiotic compounds, mainly several hydrocarbon families, and offers an opportunity to propose the use of the fungus or its enzymes in controlled bioremediation or bioaugmentation processes.

Genome assembly statistics reported by Quast [44]. Table S2. Non-coding repeats sequences summary. Table S3. Species and genes (accession numbers) used in the phylogenetic analysis [29]. (DOCX 37 kb)
