Pierre Nouhaud, Jack Beresford, Jonna Kulmuni.
Abstract
Formica red wood ants are a keystone species of boreal forest ecosystems and an emerging model system in the study of speciation and hybridization. Here, we performed a standard DNA extraction from a single, field-collected Formica aquilonia × Formica polyctena haploid male and assembled its genome using ~60× of PacBio long reads. After polishing and contaminant removal, the final assembly was 272 Mb (4687 contigs, N50 = 1.16 Mb). Our reference genome contains 98.5% of the core Hymenopteran BUSCOs and was pseudo-scaffolded using the assembly of a related species, F. selysi (28 scaffolds, N50 = 8.49 Mb). Around one-third of the genome consists of repeats, and 17 426 gene models were annotated using both protein and RNAseq data (97.4% BUSCO completeness). This resource is of comparable quality to the few other single individual insect genomes assembled to date and paves the way to genomic studies of admixture in natural populations and comparative genomic approaches in Formica wood ants. © The American Genetic Association. 2022.
Keywords: Hymenoptera; PacBio sequencing; genome annotation; genome assembly; haplodiploidy; wood ant
Year: 2022 PMID: 35394540 PMCID: PMC9270870 DOI: 10.1093/jhered/esac019
Source DB: PubMed Journal: J Hered ISSN: 0022-1503 Impact factor: 2.679
Software used for data analysis
| Software | Version | Reference | Custom parameters (if any) |
|---|---|---|---|
| Canu | 1.8 | | correctedErrorRate=0.085 |
| wtdbg2 | 2.5 | | -p 0 -k 15 -AS 2 -s 0.05 -L 10000 |
| Busco | 4.0.5 | | Hymenoptera ODB gene set v10 |
| Purge Haplotigs | 1.1.1 | | — |
| Racon | 1.4.10 | | -u |
| Trimmomatic | 0.38 | | LEADING:20 TRAILING:20 MINLEN:50 |
| minimap2 | 2.17 | | -x map-pb (PacBio) / -ax sr (Illumina) |
| Blobtools | 1.1.1 | | — |
| RaGOO | 1.1 | | — |
| MUMmer | 4.0.0beta2 | | — |
| Merqury | 1.3 | | — |
| Repeatmodeler2 | 2.0.1 | | -LTRStruct; via TETools container 1.1 |
| Repeatmasker | 4.1.0 | | via TETools container 1.1 |
| Braker2 | 2.1.5 | Bruna et al. (2020) | — |
| Star | 2.7.2 | | — |
| SAMtools | 1.10 | | — |
| EnTAP | 0.10.3 | | — |
Assembly and annotation metrics
| Metric | Value |
|---|---|
| **Genome assembly** | |
| BUSCO v4.0.5 genome score | C: 98.5% [S: 97.9%, D: 0.6%], F: 0.4%, M: 1.1%, n: 5991 |
| Number of contigs | 4687 |
| Contig N50 (bp) | 1 163 114 |
| Shortest contig (bp) | 117 |
| Longest contig (bp) | 4 650 116 |
| Average contig length (bp) | 58 036 |
| Total contig length (bp) | 272 015 305 |
| Number of pseudo-scaffolds | 28 |
| Pseudo-scaffold N50ᵃ (bp) | 8 490 488 |
| Shortest pseudo-scaffold (bp) | 3 646 393 |
| Longest pseudo-scaffoldᵃ (bp) | 14 915 360 |
| Average pseudo-scaffold lengthᵃ (bp) | 7 887 222 |
| Total pseudo-scaffold length (bp) | 272 497 664 |
| Total unanchored length (bp, fraction) | 59 526 201 (21.8%) |
| GC content | 36.3% |
| N fraction | 0.17% |
| **Genome annotation** | |
| BUSCO v4.0.5 protein score | C: 97.4% [S: 96.8%, D: 0.6%], F: 1.4%, M: 1.2%, n: 5991 |
| Total number of gene models | 17 426 |
| Mean gene length (bp) | 5524 |
| Average number of exons per gene | 5.80 |
| Number of models with RNAseq support (fraction) | 11 956 (68.6%) |
| Number of isoforms | 19 226 |
| Average number of isoforms per gene | 1.10 |
| Cumulative gene length (bp, fraction) | 78 835 002 (29.0%) |
| Cumulative exon length (bp, fraction) | 27 442 032 (10.1%) |
| **Repeat annotation** | |
| Fraction of genome masked | 32.01% |
| Interspersed repeats, total fraction | 28.44% |
| Retroelements (class I) | 6.39% |
| LINEs | 1.47% |
| Gypsy/DIRS1 | 2.72% |
| DNA transposons (class II) | 3.56% |
| Unclassified | 18.50% |
| Simple repeats | 2.59% |

ᵃ Scaffold statistics computed after excluding both the mitochondrial genome and Scaffold 0, which contains all unanchored contigs (59 Mb, “total unanchored length”).
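The N50 values reported above follow the usual definition: sort the sequences from longest to shortest and take the length of the sequence at which the running total first reaches half of the total assembly length. The record does not say which tool produced these figures, so the following is only a minimal Python sketch of the definition; the function name `n50` and the toy lengths are illustrative assumptions, not data from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum
        # to at least half of the total assembly length.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, total = 30 bp).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

Applied to the real contig lengths (for example, read from a FASTA index), the same function would be expected to reproduce the contig N50 in the table, provided the same set of sequences is used.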
Figure 1. Comparison of mapping rates for F. aquilonia and F. polyctena individuals (n = 10 per species, data from Portinha et al. 2021) against our hybrid assembly (x-axis) and the F. selysi assembly (y-axis, Brelsford et al. 2020). The dashed line gives y = x.
Figure 2. Total number of gene models as a function of BUSCO genome completeness metrics in ant genomes for which annotations are available on NCBI (n = 24, light gray) and the assembly of this study (black). Detailed statistics are shown in Supplementary Table 3 online.
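BUSCO completeness strings such as those in the metrics table have a fixed shape: complete (C) is the sum of single-copy (S) and duplicated (D), and complete, fragmented (F), and missing (M) together account for all n searched orthologs. As a hedged illustration of how such strings can be read and sanity-checked, here is a small Python sketch; the helper name `parse_busco` is an assumption made for this example and is not part of BUSCO itself.

```python
import re


def parse_busco(summary):
    """Parse a BUSCO one-line summary into percentages plus the ortholog count n."""
    # Capture each category letter and its percentage, e.g. "C: 98.5%".
    pairs = re.findall(r"([CSDFM]):\s*([\d.]+)%", summary)
    scores = {key: float(value) for key, value in pairs}
    # The total number of searched orthologs is given as "n: 5991".
    scores["n"] = int(re.search(r"n:\s*(\d+)", summary).group(1))
    return scores


# Strings copied from the metrics table.
genome = parse_busco("C: 98.5% [S: 97.9%, D: 0.6%], F: 0.4%, M: 1.1%, n: 5991")
protein = parse_busco("C: 97.4% [S: 96.8%, D: 0.6%], F: 1.4%, M: 1.2%, n: 5991")

for scores in (genome, protein):
    # Allow a small tolerance because values are rounded to one decimal place.
    assert abs(scores["S"] + scores["D"] - scores["C"]) < 0.15
    assert abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.15
```

Both the genome and the protein scores from the table pass these two consistency checks.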