Celine Petersen1, Trine Sørensen1, Klaus R Westphal1, Lavinia I Fechete1, Teis E Sondergaard1, Jens L Sørensen2, Kåre L Nielsen1.
Abstract
During the last two decades, whole-genome sequencing has revolutionized genetic research in all kingdoms, including fungi. More than 1000 fungal genomes have been submitted to sequence databases, mostly obtained through second generation short-read DNA sequencing. As a result, highly fragmented genome drafts have typically been obtained. However, with the emergence of third generation long-read DNA sequencing, the assembly challenge can be overcome and highly contiguous assemblies obtained. Such attractive results, however, depend heavily on the ability to extract highly purified high molecular weight (HMW) DNA. Extraction of such DNA is currently a significant challenge for all species with cell walls, not least fungi. In this study, four isolates of filamentous ascomycetes (Apiospora pterospermum, Aspergillus sp. (subgen. Cremei), Aspergillus westerdijkiae, and Penicillium aurantiogriseum) were used to develop extraction and purification methods that result in HMW DNA suitable for third generation sequencing. We have tested and propose two straightforward extraction methods, based on either a commercial kit or traditional phenol-chloroform extraction, both in combination with a single commercial purification method, that result in high quality HMW DNA from filamentous ascomycetes. Our results demonstrate that these DNA extraction methods, combined with sequencing coverage above 75 x, result in complete and contiguous assemblies of our haploid filamentous ascomycete genomes.
Keywords: DNA extraction; Long-read sequencing; MinION; ascomycete; filamentous fungi; genome assembly; high molecular weight DNA
Year: 2022 PMID: 35438621 PMCID: PMC9453082 DOI: 10.1099/mgen.0.000816
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. The effect of six different combinations of DNA extraction and purification methods on four fungi (Apiospora pterospermum, Aspergillus westerdijkiae, Penicillium aurantiogriseum, and Aspergillus sp. (subgen. Cremei)). PS-SC: QIAGEN DNeasy PowerSoil Kit with bead beating for the indicated time periods, followed by purification using spin columns from the QIAGEN DNeasy PowerSoil Kit. PS-PCP: QIAGEN DNeasy PowerSoil Kit with bead beating for 1 min, followed by phenol-chloroform purification. PCE-PCP: phenol-chloroform extraction and purification only. PCE-AMP: phenol-chloroform extraction followed by AMPure XP beads purification. PCE-GT: phenol-chloroform extraction and QIAGEN Genomic-Tips 20/G purification. GB-GT: QIAGEN Genomic Buffer Set and QIAGEN Genomic-Tips 20/G purification. (a) DNA yield (ng DNA/mg dry cell mass). (b) A260/230 and (c) A260/280 ratio; all samples with a concentration below 20 ng µl⁻¹ were excluded from the figure. (d) Ratio of concentration determinations (UV-abs/Fluorescence); only samples with a concentration above 20 ng µl⁻¹ were measured. (e) Mean fragment size (kbp) of the extracted DNA; only samples that complied with the listed criteria were measured. Hatched grey lines in b–d indicate recommended intervals.
Summary of sequencing statistics. Filtering was conducted by Filtlong version 0.2 using a minimum read length of 10 kb and minimum quality of Q7
| | | | | |
|---|---|---|---|---|
| Before filtering | | | | |
| No. of bases (Gb) | 5.57 | 5.31 | 5.06 | 6.89 |
| No. of reads | 397 011 | 304 966 | 1 051 356 | 497 426 |
| N50 (kb) | 27.12 | 29.92 | 13.10 | 21.05 |
| Mean read quality | Q12.0 | Q12.0 | Q11.8 | Q11.9 |
| After filtering | | | | |
| No. of bases (Gb) | 4.57 | 4.67 | 2.94 | 5.60 |
| No. of reads | 169 574 | 168 878 | 134 614 | 249 012 |
| N50 (kb) | 32.61 | 32.87 | 24.85 | 24.70 |
| Mean read quality | Q12.1 | Q12.2 | Q12.1 | Q12.0 |
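The N50 values in the table can be illustrated with a minimal sketch. It assumes the common definition of N50 (the read length L such that reads of length L or longer contain at least half of all sequenced bases) and applies only the 10 kb length cut-off, not the quality filter. The function names and the toy read lengths are hypothetical and are not data from the study.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no reads supplied


def filter_by_length(lengths: Iterable[int], min_length: int = 10_000) -> List[int]:
    """Keep only reads of at least min_length bases (10 kb in the table caption)."""
    return [length for length in lengths if length >= min_length]


# Toy read lengths in bases (hypothetical): many short reads and a few long ones.
reads = [5_000] * 10 + [12_000, 20_000, 30_000]
kept = filter_by_length(reads)

n50_before = n50(reads)
n50_after = n50(kept)
print(n50_before, n50_after)
```

Removing short reads lowers the number of reads and bases but raises the N50, which is the same direction of change seen between the two halves of the table.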
Fig. 2. Overview of the bioinformatics workflow. Programs used in the different steps are denoted with italics. Nucleotide mismatches are marked as red squares. Statistics for the four fungi analysed in this study can be found in Table 1. Guppy is used for basecalling and demultiplexing the raw reads. Reads shorter than 10 kb or below the quality threshold of 80 are removed using Filtlong. After filtering, Minimap2 is used to compute the read-to-read overlaps that the assembler Miniasm requires to create an assembly. The consensus is then polished using Racon and Medaka, since these have been shown to increase completeness significantly. A second round of Medaka can be used to polish the assembly further.
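The workflow in the caption can be sketched as a list of command lines. This is a rough outline under stated assumptions, not the authors' exact invocation: the file names, the thread count, and the helper `build_pipeline` are hypothetical, the quality threshold of 80 is assumed to correspond to Filtlong's `--min_mean_q 80`, Medaka is left on its default model, and Guppy basecalling is omitted because its configuration depends on the flow cell and kit. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List, Optional, Tuple

# Each step is (argv, file that should receive stdout, or None if the tool writes its own output).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(reads: str, prefix: str = "asm", threads: int = 8) -> List[Step]:
    """Build the filter, overlap, assemble, and polish commands (all file names are assumptions)."""
    t = str(threads)
    filtered = f"{prefix}.filtered.fastq"
    overlaps = f"{prefix}.overlaps.paf"
    gfa = f"{prefix}.gfa"
    draft = f"{prefix}.draft.fasta"
    mapped = f"{prefix}.mapped.paf"
    racon_out = f"{prefix}.racon.fasta"
    medaka1 = f"{prefix}.medaka1"
    medaka2 = f"{prefix}.medaka2"
    return [
        # Keep reads >= 10 kb with mean quality score >= 80 (assumed mapping of the caption's threshold).
        (["filtlong", "--min_length", "10000", "--min_mean_q", "80", reads], filtered),
        # All-vs-all read overlaps required by Miniasm.
        (["minimap2", "-x", "ava-ont", "-t", t, filtered, filtered], overlaps),
        (["miniasm", "-f", filtered, overlaps], gfa),
        # Miniasm writes GFA; extract the segment sequences as FASTA.
        (["awk", '/^S/{print ">"$2"\\n"$3}', gfa], draft),
        # Map reads back to the draft, then polish with Racon.
        (["minimap2", "-x", "map-ont", "-t", t, draft, filtered], mapped),
        (["racon", "-t", t, filtered, mapped, draft], racon_out),
        # Two rounds of Medaka; each round writes consensus.fasta into its output directory.
        (["medaka_consensus", "-i", filtered, "-d", racon_out, "-o", medaka1, "-t", t], None),
        (["medaka_consensus", "-i", filtered, "-d", f"{medaka1}/consensus.fasta",
          "-o", medaka2, "-t", t], None),
    ]


steps = build_pipeline("reads.fastq")
for argv, stdout_path in steps:
    line = shlex.join(argv)
    print(f"{line} > {stdout_path}" if stdout_path else line)
```

Keeping the steps as data makes it easy to inspect or log the commands before handing them to a job scheduler or to `subprocess.run`.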
Overview of statistics of polishing steps. See Methods for details. C (%) and F (%) denote BUSCO completeness in percent and fragmented BUSCOs in percent, respectively. A total of 1315 genes were considered in the BUSCO analyses.
| | Coverage | Polishing steps | No. of contigs | N99 (b) | Genome size (Mb) | C (%) | F (%) |
|---|---|---|---|---|---|---|---|
| | 125 x | None | 18 | 394 867 | 44.2 | 2.2 | 7.2 |
| | 125 x | Racon | 18 | 398 398 | 44.7 | 90 | 5.5 |
| | 125 x | Racon+Medaka | 17 | 398 846 | 44.7 | 98 | 0.8 |
| | 125 x | Racon+MedakaX2 | 17 | 398 820 | 44.7 | 98 | 0.8 |
| | 130 x | None | 11 | 2 708 532 | 35.7 | 1.6 | 8.3 |
| | 130 x | Racon | 10 | 2 734 490 | 36 | 88.3 | 5.8 |
| | 130 x | Racon+Medaka | 10 | 2 736 658 | 36 | 98 | 0.7 |
| | 130 x | Racon+MedakaX2 | 10 | 2 736 562 | 36 | 97.9 | 0.7 |
| | 139 x | None | 8 | 4 594 295 | 32.4 | 1.3 | 9.4 |
| | 139 x | Racon | 8 | 4 638 995 | 32.8 | 90.8 | 4.9 |
| | 139 x | Racon+Medaka | 8 | 4 642 496 | 32.8 | 97.5 | 0.8 |
| | 139 x | Racon+MedakaX2 | 8 | 4 642 389 | 32.8 | 97.5 | 1.1 |
| | 91 x | None | 11 | 2 636 682 | 31.9 | 1.1 | 7.6 |
| | 91 x | Racon | 11 | 2 655 487 | 32.2 | 87.1 | 6.9 |
| | 91 x | Racon+Medaka | 11 | 2 657 265 | 32.2 | 97.6 | 0.5 |
| | 91 x | Racon+MedakaX2 | 11 | 2 657 265 | 32.2 | 97.8 | 0.5 |
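The coverage levels in the table can be related to the sequencing output with a short worked calculation, assuming coverage is estimated as total filtered bases divided by assembly size. The pairing of filtered base counts with genome sizes below is an assumption inferred from the arithmetic, not something the record states, and the function names are hypothetical.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Estimated fold coverage: sequenced bases divided by genome size (both in bases)."""
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: float) -> float:
    """Bases required to reach a target fold coverage for a genome of the given size."""
    return target_coverage * genome_size


# (filtered bases, polished genome size); the pairing is an assumption, see the text above.
pairs = [(5.60e9, 44.7e6), (4.67e9, 36.0e6), (4.57e9, 32.8e6), (2.94e9, 32.2e6)]
estimates = [round(coverage(bases, size)) for bases, size in pairs]
print(estimates)

# Output needed to reach the 75 x threshold mentioned in the abstract for a 36 Mb genome.
gb_for_75x = bases_needed(75, 36.0e6) / 1e9
print(gb_for_75x)
```

Under this pairing the estimates come out at 125, 130, 139, and 91, which matches the four coverage levels listed, and roughly 2.7 Gb of filtered reads would be needed to reach 75 x on a 36 Mb genome.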
Fig. 3. Influence of coverage on the assemblies. (a) Number of contigs as a function of coverage. (b) N99 as a function of coverage. (c) Completeness as a function of coverage. Completeness was assessed from BUSCO analysis. (d) Number of fragmented BUSCO genes as a function of coverage. (e) Average mapping completeness (%): the fraction of each read that maps to the assembly, averaged across all reads.
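The mapping completeness in panel (e) can be sketched as follows, under one possible reading of the definition: for each read, the aligned portion (the union of its aligned query intervals) is divided by the read length, and the fractions are averaged, with unmapped reads counted as zero. The interval values would typically come from the query start and end columns of a PAF file. The function names and toy values are hypothetical, and the authors' exact calculation may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Alignment = Tuple[str, int, int]  # (read name, query start, query end), 0-based half-open


def covered_length(intervals: List[Tuple[int, int]]) -> int:
    """Total length covered by a set of possibly overlapping intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def mean_mapping_completeness(read_lengths: Dict[str, int],
                              alignments: Iterable[Alignment]) -> float:
    """Average percentage of each read that is aligned; unmapped reads count as 0."""
    per_read = defaultdict(list)
    for name, start, end in alignments:
        per_read[name].append((start, end))
    fractions = [
        covered_length(per_read.get(name, [])) / length
        for name, length in read_lengths.items()
    ]
    return 100 * sum(fractions) / len(fractions)


# Toy data (hypothetical): r1 fully mapped, r2 mapped over 1500 of 2000 bases, r3 unmapped.
read_lengths = {"r1": 1000, "r2": 2000, "r3": 500}
alignments = [("r1", 0, 1000), ("r2", 0, 800), ("r2", 600, 1500)]
completeness = mean_mapping_completeness(read_lengths, alignments)
print(round(completeness, 2))
```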
Fig. 4. Counts of random indel errors and homopolymeric indel errors in assemblies polished with Racon and two rounds of Medaka at different coverage levels for the three BUSCO genes EOG092D0072, EOG92D005G, and EOG092D00. The sum of indel counts for all three genes is shown. The data points for coverages from 10 x to 75 x are based on all four assemblies, whereas the data point for 100 x is based on only three assemblies, since Aspergillus sp. (subgen. Cremei) was sequenced to a maximum coverage of 91 x.
Overview of the telomeric regions in the different assemblies made for the four fungi all polished with Racon and two rounds of Medaka
| | Coverage | Total no. of telomeric regions | No. of contigs with telomeric region at both ends | No. of contigs with telomeric region at one end | No. of contigs with no telomeric region* |
|---|---|---|---|---|---|
| | 125 x | 8 | 2 | 4 | 8 |
| | 100 x | 10 | 1 | 8 | 6 |
| | 75 x | 8 | 2 | 4 | 16 |
| | 50 x | 12 | 1 | 10 | 8 |
| | 130 x | 7 | 1 | 5 | 2 |
| | 100 x | 6 | 2 | 2 | 4 |
| | 75 x | 7 | 2 | 3 | 3 |
| | 50 x | 6 | 1 | 4 | 4 |
| | 139 x | 6 | 2 | 2 | 0 |
| | 100 x | 5 | 1 | 3 | 0 |
| | 75 x | 6 | 2 | 2 | 0 |
| | 50 x | 6 | 2 | 2 | 0 |
| | 91 x | 9 | 3 | 3 | 2 |
| | 75 x | 6 | 2 | 2 | 4 |
| | 50 x | 6 | 1 | 4 | 2 |
*Contigs comprising mtDNA or exclusively rRNA genes are not included in the analysis.
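The classification of contigs by telomeric ends can be illustrated with a minimal sketch. It assumes the TTAGGG repeat unit that is common in filamentous fungi, at least three tandem copies within the first or last 200 bases of a contig, and strand-aware matching (CCCTAA at the start, TTAGGG at the end). The motif, window, repeat threshold, and function names are all assumptions; the record does not state how the authors identified telomeric regions.

```python
import re
from collections import Counter
from typing import Dict

# Assumed telomere repeat and minimum copy number (not taken from the study).
END_MOTIF = re.compile(r"(?:TTAGGG){3,}")    # expected at the 3' end of a contig
START_MOTIF = re.compile(r"(?:CCCTAA){3,}")  # reverse complement, expected at the 5' end


def telomeric_ends(seq: str, window: int = 200) -> int:
    """Return how many contig ends (0, 1, or 2) carry a telomeric repeat within the window."""
    seq = seq.upper()
    start_hit = START_MOTIF.search(seq[:window]) is not None
    end_hit = END_MOTIF.search(seq[-window:]) is not None
    return int(start_hit) + int(end_hit)


def summarize(contigs: Dict[str, str]) -> Dict[str, int]:
    """Tally contigs by number of telomeric ends, mirroring the columns of the table."""
    counts = Counter(telomeric_ends(seq) for seq in contigs.values())
    return {
        "total_telomeric_regions": 2 * counts[2] + counts[1],
        "both_ends": counts[2],
        "one_end": counts[1],
        "no_end": counts[0],
    }


# Toy contigs (hypothetical sequences, far shorter than real contigs).
body = "ACGT" * 100
contigs = {
    "c1": "CCCTAA" * 4 + body + "TTAGGG" * 4,  # telomere at both ends
    "c2": body + "TTAGGG" * 5,                 # telomere at one end
    "c3": body,                                # no telomere
}
summary = summarize(contigs)
print(summary)
```

A contig with repeats at both ends is a candidate for a complete chromosome, which is why this tally complements the contig counts and N99 values as a measure of contiguity.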