Pepijn W Kooij (affiliations 1, 2), Jaume Pellicer (affiliations 1, 3).
Abstract
Each day, as the amount of genomic data and bioinformatics resources grows, researchers are increasingly challenged with selecting the most appropriate approach to analyze their data. In addition, the opportunity to undertake comparative genomic analyses is growing rapidly. This is especially true for fungi because of their small genome sizes (mean 1C = 44.2 Mb). Given these opportunities, and aiming to gain novel insights into the evolution of mutualisms, we compare the quality of whole-genome assemblies for the cultivars of fungus-growing ants (Hymenoptera: Formicidae: Attini) and a free-living relative. Our analyses reveal that currently available methodologies and pipelines for analyzing whole-genome sequence data need refining. By using different genome assemblers, we show that genome assembly size depends on the software used. This, in turn, affects gene number predictions, with the number of predicted genes correlating positively with genome assembly size. Furthermore, most fungal genome size data currently available are estimates derived from whole-genome assemblies generated from short-read data, rather than from the more accurate technique of flow cytometry. Here, we estimated the haploid genome sizes of three ant fungal symbionts by flow cytometry, using the fungus Pleurotus ostreatus (Jacq.) P. Kumm. (1871) as a calibration standard. We found that published genome sizes based on genome assemblies are 2.5- to 3-fold larger than our flow cytometry estimates. We therefore recommend that flow cytometry be used to precalibrate genome assembly pipelines, to avoid incorrect genome size estimates and to ensure robust assemblies.
Keywords: evolution; fungi; fungus-growing ants; genome assembly; genome size; mutualism
Year: 2020 PMID: 33283863 PMCID: PMC7719231 DOI: 10.1093/gbe/evaa217
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
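The fold comparison in the abstract can be illustrated with the one isolate ID that appears in both tables below, MS140512-07 (flow cytometry 1C = 47.17 Mb). This is a minimal sketch, not the authors' procedure: it assumes 1 Mb = 1,000,000 bp so the 1C-value can be compared directly with assembly lengths, and the "closest assembly" rule (smallest absolute difference in bp) is a hypothetical way to use a flow cytometry estimate as a benchmark. Note that the abstract's 2.5- to 3-fold figure refers to previously published assemblies, not to these eight.

```python
"""Compare assembly lengths with a flow-cytometry 1C-value (illustrative sketch)."""
from typing import Dict

# 1C-value of MS140512-07 from the flow cytometry table: 47.17 Mb.
# Assumption: 1 Mb = 1,000,000 bp, so it can be compared with assembly lengths in bp.
FLOW_1C_BP = 47_170_000

# Total assembly lengths (bp) for MS140512-07, copied from the assembly statistics table.
# "-" = assembled without Redundans, "+" = with Redundans.
ASSEMBLY_LENGTHS: Dict[str, int] = {
    "ABySS -": 37_642_602,
    "ABySS +": 37_057_723,
    "SGA -": 150_310_303,
    "SGA +": 58_439_858,
    "SOAPdenovo -": 80_942_739,
    "SOAPdenovo +": 63_729_975,
    "SPAdes -": 124_500_266,
    "SPAdes +": 101_793_287,
}


def fold_difference(assembly_bp: int, reference_bp: int) -> float:
    """Assembly length divided by the flow-cytometry genome size."""
    return assembly_bp / reference_bp


def closest_assembly(lengths: Dict[str, int], reference_bp: int) -> str:
    """Hypothetical selection rule: smallest absolute bp difference from the reference."""
    return min(lengths, key=lambda name: abs(lengths[name] - reference_bp))


if __name__ == "__main__":
    for name, length in ASSEMBLY_LENGTHS.items():
        print(f"{name:<14} {fold_difference(length, FLOW_1C_BP):.2f}x")
    print("closest to flow cytometry:", closest_assembly(ASSEMBLY_LENGTHS, FLOW_1C_BP))
```

Run as a script, it prints ratios from about 0.79 (ABySS +) to about 3.19 (SGA –), and the hypothetical rule selects the ABySS assembly without Redundans.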
Genome Statistics for the Different Assemblies
| Species/ID | Assembler ± Redundans | Total Length (bp) | No. of Contigs | N50 (bp) | Longest Contig (bp) | (header missing) | No. of Predicted Genes | Total BUSCO Genes (%) | BUSCO Genes Duplicated (%) |
|---|---|---|---|---|---|---|---|---|---|
| 100610-02 | ABySS – | 37,868,966 | 7,777 | 51,836 | 742,525 | 1,695,483 | 6,730 | 94.9 | 0.6 |
| 100610-02 | ABySS + | 41,010,461 | 2,091 | 111,840 | 1,288,898 | 96,942 | 7,054 | 97.5 | 0.7 |
| 100610-02 | SGA – | 94,552,020 | 48,075 | 3,297 | 92,288 | 0 | 14,239 | 83.6 | 0.4 |
| 100610-02 | SGA + | 64,486,262 | 5,151 | 31,855 | 524,303 | 95,670 | 9,385 | 96.5 | 1.5 |
| 100610-02 | SOAPdenovo – | 171,922,479 | 20,433 | 19,233 | 200,771 | 24,532,942 | 23,180 | 88.8 | 19.8 |
| 100610-02 | SOAPdenovo + | 190,471,773 | 10,447 | 30,961 | 285,616 | 2,602,102 | 26,171 | 96.0 | 26.1 |
| 100610-02 | SPAdes – | 101,100,038 | 27,307 | 9,377 | 213,758 | 533,869 | 14,882 | 91.4 | 0.4 |
| 100610-02 | SPAdes + | 79,165,892 | 4,326 | 75,775 | 496,909 | 104,891 | 11,069 | 97.5 | 1.0 |
| MS140512-07 | ABySS – | 37,642,602 | 9,676 | 16,119 | 234,386 | 223,574 | 6,794 | 94.4 | 0.5 |
| MS140512-07 | ABySS + | 37,057,723 | 4,778 | 24,356 | 234,870 | 4,484 | 6,715 | 95.7 | 0.5 |
| MS140512-07 | SGA – | 150,310,303 | 151,153 | 1,024 | 54,556 | 0 | 22,568 | 69.3 | 5.2 |
| MS140512-07 | SGA + | 58,439,858 | 32,056 | 2,491 | 54,556 | 3,648 | 9,703 | 70.5 | 0.3 |
| MS140512-07 | SOAPdenovo – | 80,942,739 | 30,870 | 4,169 | 42,629 | 450,122 | 11,837 | 67.3 | 0.3 |
| MS140512-07 | SOAPdenovo + | 63,729,975 | 12,609 | 8,493 | 130,081 | 116,914 | 9,226 | 82.3 | 0.5 |
| MS140512-07 | SPAdes – | 124,500,266 | 31,228 | 14,053 | 190,955 | 438,978 | 19,134 | 65.3 | 6.8 |
| MS140512-07 | SPAdes + | 101,793,287 | 12,552 | 21,931 | 190,955 | 7,959 | 15,194 | 74.2 | 2.6 |
| KM164561 | ABySS – | 32,671,039 | 4,005 | 28,088 | 430,464 | 255,022 | 6,552 | 94.5 | 0.9 |
| KM164561 | ABySS + | 33,544,974 | 2,248 | 42,155 | 558,264 | 4,497 | 6,641 | 95.7 | 0.5 |
| KM164561 | SGA – | 78,616,683 | 74,089 | 1,143 | 54,770 | 0 | 15,165 | 66.3 | 4.9 |
| KM164561 | SGA + | 43,854,981 | 25,027 | 2,392 | 54,770 | 3,356 | 9,083 | 67.8 | 0.5 |
| KM164561 | SOAPdenovo – | 40,835,165 | 11,074 | 7,913 | 103,485 | 124,322 | 7,550 | 79.0 | 0.4 |
| KM164561 | SOAPdenovo + | 39,685,961 | 4,810 | 19,634 | 245,171 | 40,103 | 7,252 | 89.6 | 0.5 |
| KM164561 | SPAdes – | 58,776,147 | 24,243 | 4,952 | 120,587 | 398,844 | 11,519 | 64.7 | 3.8 |
| KM164561 | SPAdes + | 47,589,435 | 8,392 | 10,680 | 120,587 | 11,950 | 9,023 | 81.1 | 1.4 |
Note. Genome statistics were extracted using the ContigStats.pl script and the BUSCO pipeline. Assemblies with and without Redundans optimization are marked with + and –, respectively. Full BUSCO results are presented in supplementary table S1, Supplementary Material online.
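The length-based columns in the table above (total length, number of contigs, N50, longest contig) follow standard definitions. The sketch below is an independent implementation of those definitions, not a port of ContigStats.pl, whose exact behavior we do not reproduce; the FASTA parser and all function names are hypothetical. N50 is taken as the length of the contig at which the running total of contig lengths, sorted longest first, first reaches half of the assembly's total length.

```python
"""Basic assembly statistics from contig lengths (illustrative sketch)."""
from typing import Iterable, List, NamedTuple


class AssemblyStats(NamedTuple):
    total_length: int
    n_contigs: int
    n50: int
    longest: int


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA lines; multi-line records are concatenated."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which cumulative length reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at exactly 50%
            return length
    raise ValueError("assembly has no contigs")


def assembly_stats(lengths: Iterable[int]) -> AssemblyStats:
    values = list(lengths)
    if not values:
        raise ValueError("assembly has no contigs")
    return AssemblyStats(sum(values), len(values), n50(values), max(values))


if __name__ == "__main__":
    # Toy contig lengths; total 290 bp, so N50 is the contig reaching 145 bp cumulatively.
    print(assembly_stats([100, 80, 50, 30, 20, 10]))
```

For the toy input this returns a total length of 290, 6 contigs, an N50 of 80 and a longest contig of 100. Because N50 rewards long contigs regardless of correctness, it is read alongside BUSCO completeness and duplication in the table rather than on its own.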
Fig. 1. Correlation between genome assembly size and number of predicted genes. Based on the data from the eight genome assemblies for each of the three samples, the number of predicted genes was positively correlated with total assembly length (Spearman’s rank ρ = 0.9765, P < 0.0001). In most cases, optimizing the assembly with Redundans reduced the total assembly size, presumably by removing heterozygous regions. The most pronounced exception was the previously published sequence data set from a Cyphomyrmex costatus fungal symbiont, Leucocoprinus sp., assembled with SOAPdenovo (Nygaard et al. 2016, in red). As indicated in the key, colors correspond to the three samples and shapes to the four assemblers used; filled shapes represent assemblies that also used Redundans, and open shapes those that did not. The flow cytometry genome size estimate for the fungal symbiont Leucocoprinus sp. isolated in this work is marked by a blue line, showing that the best assemblies for this species were obtained using ABySS.
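The correlation in Fig. 1 can be recomputed from the assembly table. A minimal sketch, assuming the 24 (total length, predicted genes) pairs in the table are the points plotted: Spearman's ρ is computed here as the Pearson correlation of the ranks, with tied values given their average rank. Function names are illustrative; the P-value is not computed (a library routine such as `scipy.stats.spearmanr` reports one).

```python
"""Spearman's rank correlation between assembly length and predicted gene number."""
from typing import List, Sequence

# Table order: 100610-02, MS140512-07, KM164561; within each, ABySS -/+, SGA -/+,
# SOAPdenovo -/+, SPAdes -/+.
LENGTHS = [37_868_966, 41_010_461, 94_552_020, 64_486_262,
           171_922_479, 190_471_773, 101_100_038, 79_165_892,
           37_642_602, 37_057_723, 150_310_303, 58_439_858,
           80_942_739, 63_729_975, 124_500_266, 101_793_287,
           32_671_039, 33_544_974, 78_616_683, 43_854_981,
           40_835_165, 39_685_961, 58_776_147, 47_589_435]
GENES = [6_730, 7_054, 14_239, 9_385, 23_180, 26_171, 14_882, 11_069,
         6_794, 6_715, 22_568, 9_703, 11_837, 9_226, 19_134, 15_194,
         6_552, 6_641, 15_165, 9_083, 7_550, 7_252, 11_519, 9_023]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(f"rho = {spearman_rho(LENGTHS, GENES):.4f}")
```

With these 24 pairs (no ties) the sum of squared rank differences is 54, giving ρ = 1 − 6·54/(24·(24² − 1)) ≈ 0.9765, in line with the value reported in the caption.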
Genome Size Estimation Using Flow Cytometry
| Species | ID | 1C-value (Mb) | Standard Deviation (Mb) | CV% (Standard) | CV% (Target) |
|---|---|---|---|---|---|
|  | KM237125 | 24.17 | 0.39 | 4.95 | 6.75 |
|  | Ac-2009-42 | 39.86 | 0.43 | 4.00 | 5.29 |
|  | MS140512-07 | 47.17 | 0.10 | 3.94 | 4.45 |
|  | MS140507-01 | 49.10 | 0.79 | 4.24 | 5.23 |
Note. The name of the ant species is given in parentheses below the fungal species name. The 1C-value represents the DNA content of the unreplicated haploid chromosome complement (i.e., the holoploid genome size sensu Greilhuber et al. 2005). CV% is the fluorescence peak width expressed as a coefficient of variation.
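With an internal standard, a 1C-value is usually obtained by scaling the standard's known 1C-value by the ratio of the sample and standard peak positions (mean fluorescence), and CV% is 100 × peak SD / peak mean. The source names Pleurotus ostreatus as the standard but gives neither its 1C-value nor any peak positions, so every number below is a hypothetical placeholder, not data from the study; the function names are likewise illustrative.

```python
"""Genome size from flow cytometry peak ratios (illustrative sketch)."""
import statistics
from typing import List, Sequence, Tuple


def one_c_value(sample_peak_mean: float, standard_peak_mean: float,
                standard_1c_mb: float) -> float:
    """Sample 1C (Mb) = standard 1C (Mb) x sample peak mean / standard peak mean."""
    if sample_peak_mean <= 0 or standard_peak_mean <= 0:
        raise ValueError("peak means must be positive")
    return standard_1c_mb * sample_peak_mean / standard_peak_mean


def cv_percent(peak_sd: float, peak_mean: float) -> float:
    """Peak width as a coefficient of variation, in percent."""
    return 100.0 * peak_sd / peak_mean


def summarise(estimates: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of replicate 1C estimates."""
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    STANDARD_1C_MB = 35.0  # HYPOTHETICAL placeholder, not the value used in the study
    # HYPOTHETICAL (sample peak mean, standard peak mean) pairs from three runs
    runs: List[Tuple[float, float]] = [(118.0, 100.0), (121.0, 101.0), (119.5, 99.5)]
    estimates = [one_c_value(s, st, STANDARD_1C_MB) for s, st in runs]
    mean_mb, sd_mb = summarise(estimates)
    print(f"1C = {mean_mb:.2f} +/- {sd_mb:.2f} Mb")
    print(f"CV% of a peak with SD 4.5 at mean 100: {cv_percent(4.5, 100.0):.2f}")
```

Reporting CV% for both the standard and the target peak, as the table does, makes it possible to judge whether broad or noisy peaks could bias the ratio.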
Fig. 2. Examples of flow cytometry histograms used to estimate genome size, showing results for Leucoagaricus gongylophorus (A and C) and Leucocoprinus sp. (B and D), run either without (A and B) or with (C and D) the standard Pleurotus ostreatus.