Upendra R Bhattarai1, Mandira Katuwal1, Robert Poulin2, Neil J Gemmell1, Eddy Dowle1.
Abstract
The European earwig Forficula auricularia is an important model for studies of maternal care, sexual selection, sociality, and host-parasite interactions. However, detailed genetic investigations of this species are hindered by a lack of genomic resources. Here, we present a high-quality hybrid genome assembly for Forficula auricularia using Nanopore long-reads and 10× linked-reads. The final assembly is 1.06 Gb in length with 31.03% GC content. It consists of 919 scaffolds with an N50 of 12.55 Mb. Half of the genome is present in only 20 scaffolds. Benchmarking Universal Single-Copy Orthologs (BUSCO) scores are ∼90% for 3 sets of single-copy orthologs (eukaryotic, insect, and arthropod). Repeat elements make up 64.62% of the genome. The MAKER2 pipeline annotated 12,876 protein-coding genes and 21,031 mRNAs. Phylogenetic analysis revealed the assembled genome as that of species B, one of the 2 known genetic subspecies of Forficula auricularia. The genome assembly, annotation, and associated resources will be of high value to a large and diverse group of researchers working on dermapterans.
Keywords: Forficula auricularia; genome annotation; hybrid genome assembly; repeatome
Year: 2022 PMID: 35972389 PMCID: PMC9526046 DOI: 10.1093/g3journal/jkac199
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. Schematic representation of the assembly pipeline for the F. auricularia genome. The solid black arrow represents the workflow and the red dotted lines represent the additional input data in the pipeline (created with BioRender.com).
Fig. 2. The phylogenetic relationships of F. auricularia obtained from different geographic regions inferred from COI and COII using a Neighbour-Joining method and Maximum Composite Likelihood approach in MEGA11. All ambiguous positions were removed for each nucleotide sequence pair (pairwise deletion). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. Species labeled with the colored squares are subspecies B. The red square (Dunedin NZ) is the one for which the genome is reported in this article. Green squares are the species categorized as subspecies B by Wirth and the purple squares are others for which the nucleotide sequences were downloaded from NCBI. Species labeled with colored circles belong to subspecies A. Green circles represent subspecies A inferred by Wirth and blue are other species for which nucleotide sequences were downloaded from NCBI. E. arcanum is the outgroup labelled with a black triangle.
Assembly statistics at different stages of assembly for the genome of the European earwig F. auricularia.
| Assembly | Assembly length (bp) | No. scaffolds | N50 (bp) | L50 | Ns per 100 kbp | BUSCO complete % (Quast) | BUSCO partial % (Quast) |
|---|---|---|---|---|---|---|---|
| Supernova assembly | 1,145,470,221 | 145,055 | 30,358 | 7,500 | 3,677.89 | 64.69 | 9.24 |
| Flye assembly | 1,118,374,848 | 18,766 | 180,737 | 1,832 | 0.35 | 82.18 | 9.24 |
| Final hybrid assembly | 1,062,210,345 | 919 | 12,548,649 | 20 | 846.85 | 87.13 | 2.97 |
The Supernova and Flye statistics describe each assembler's raw output with no further processing, whereas the final hybrid assembly statistics describe the assembly after all of the processing steps described in this article. BUSCO scores are from Quast, run against its default Eukaryota database.
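The N50 and L50 columns above follow the usual definitions: sort scaffolds from longest to shortest, accumulate lengths until half of the assembly is covered, then report the length of the last scaffold added (N50) and how many scaffolds were needed (L50). The following is a minimal sketch of that calculation, not the code used by Quast; the function name `n50_l50` and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running total to at least
        # half of the assembly defines N50; its rank is L50.
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not taken from the earwig assembly).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50_l50(toy_lengths)
```

Read this way, the final hybrid assembly's L50 of 20 is the same statement as the abstract's "half of the genome is present in only 20 scaffolds".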
Repeat content analysis in the European earwig Forficula auricularia genome.
| Genome summary | Value |
|---|---|
| No. sequences | 919 |
| Total length (bp) | 1,062,210,345 |
| GC level | 31.03% |
| Bases masked | 722,769,501 bp (68.04%) |

| Repeat class | Numbers | Length (bp) | Percentage |
|---|---|---|---|
| Retroelements | 1,385,007 | 248,236,495 | 23.37 |
| SINEs | 41,157 | 5,138,497 | 0.48 |
| Penelope | 50,409 | 10,372,837 | 0.98 |
| LINEs | 660,178 | 124,985,146 | 11.77 |
| CRE/SLACS | 0 | 0 | 0.00 |
| L2/CR1/Rex | 112,418 | 20,654,321 | 1.94 |
| R1/LOA/Jockey | 167,317 | 22,277,052 | 2.10 |
| R2/R4/NeSL | 23,348 | 4,271,189 | 0.40 |
| RTE/Bov-B | 136,406 | 28,799,096 | 2.71 |
| L1/CIN4 | 10,079 | 1,892,539 | 0.18 |
| LTR elements | 683,672 | 118,112,852 | 11.12 |
| BEL/Pao | 60,561 | 12,114,300 | 1.14 |
| Ty1/Copia | 97,132 | 14,352,992 | 1.35 |
| Gypsy/DIRS1 | 521,467 | 91,083,363 | 8.57 |
| Retroviral | 3,701 | 443,583 | 0.04 |
| DNA transposons | 1,040,870 | 178,326,460 | 16.79 |
| hobo-Activator | 362,395 | 59,188,939 | 5.57 |
| Tc1-IS630-Pogo | 355,781 | 66,331,225 | 6.24 |
| En-Spm | 0 | 0 | 0.00 |
| MuDR-IS905 | 0 | 0 | 0.00 |
| PiggyBac | 21,153 | 2,726,812 | 0.26 |
| Tourist/Harbinger | 5,541 | 1,187,174 | 0.11 |
| Other (Mirage, P-element, Transib) | 10,240 | 1,580,945 | 0.15 |
| Rolling circles | 174,964 | 34,830,487 | 3.28 |
| Unclassified | 1,563,937 | 259,874,747 | 24.47 |
| Total interspersed repeats | | 686,437,702 | 64.62 |
| Small RNA | 9,913 | 1,406,877 | 0.13 |
| Satellites | 1,110 | 495,561 | 0.05 |
| Simple repeats | 0 | 0 | 0.00 |
| Low complexity | 0 | 0 | 0.00 |
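The percentages in the repeat table are each class's length divided by the total assembly length, and the "Total interspersed repeats" row equals the sum of the three top-level classes (Retroelements, DNA transposons, Unclassified). A small arithmetic check, using only figures from the table; the helper name `percent_of_genome` is hypothetical.

```python
GENOME_BP = 1_062_210_345  # total assembly length from the table

# Top-level interspersed repeat classes and their lengths in bp.
interspersed_bp = {
    "Retroelements": 248_236_495,
    "DNA transposons": 178_326_460,
    "Unclassified": 259_874_747,
}


def percent_of_genome(bp, genome_bp=GENOME_BP):
    """Express a length in bp as a percentage of the assembly, 2 decimals."""
    return round(100 * bp / genome_bp, 2)


total_interspersed = sum(interspersed_bp.values())
class_percent = {name: percent_of_genome(bp) for name, bp in interspersed_bp.items()}
total_percent = percent_of_genome(total_interspersed)
```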
Genome annotation summary for the European earwig Forficula auricularia.
| Feature | Value |
|---|---|
| Total sequence length | 1,062,210,345 |
| Number of genes | 12,876 |
| Number of mRNAs | 21,031 |
| Number of exons | 145,003 |
| Number of introns | 123,973 |
| Number of CDS | 21,030 |
| Total gene length | 155,753,058 |
| Total mRNA length | 271,884,000 |
| Total exon length | 32,584,454 |
| Total intron length | 239,538,939 |
| Total CDS length | 23,936,568 |
| Longest gene | 412,198 |
| Longest mRNA | 412,198 |
| Longest exon | 10,240 |
| Longest intron | 319,382 |
| Longest CDS | 19,035 |
| Mean gene length | 12,096 |
| Mean mRNA length | 12,928 |
| Mean exon length | 225 |
| Mean intron length | 1,932 |
| Mean CDS length | 1,138 |
| % of genome covered by genes | 14.7 |
| % of genome covered by CDS | 2.3 |
| Mean mRNAs per gene | 2 |
| Mean exons per mRNA | 7 |
| Mean introns per mRNA | 6 |
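The mean and coverage rows in the annotation summary can be recomputed from the count and total-length rows. The sketch below does that arithmetic with values copied from the table; the variable names are hypothetical, and rounding to whole numbers (or one decimal for the coverage percentages) is an assumption about how the table was prepared.

```python
GENOME_BP = 1_062_210_345

counts = {"gene": 12_876, "mRNA": 21_031, "exon": 145_003,
          "intron": 123_973, "CDS": 21_030}
total_bp = {"gene": 155_753_058, "mRNA": 271_884_000, "exon": 32_584_454,
            "intron": 239_538_939, "CDS": 23_936_568}

# Mean feature length = total length / number of features.
mean_bp = {name: round(total_bp[name] / counts[name]) for name in counts}

# Genome coverage = total feature length / assembly length.
gene_coverage = round(100 * total_bp["gene"] / GENOME_BP, 1)
cds_coverage = round(100 * total_bp["CDS"] / GENOME_BP, 1)

# Per-feature ratios, rounded to whole numbers as in the table.
mrnas_per_gene = round(counts["mRNA"] / counts["gene"])
exons_per_mrna = round(counts["exon"] / counts["mRNA"])
introns_per_mrna = round(counts["intron"] / counts["mRNA"])
```

Under these assumptions the recomputed values agree with the table, which suggests the reported means are plain totals divided by counts.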
Fig. 3. GC percentage in different genomic features of the F. auricularia genome. GC content for 10-kb windows was generated without regard to any genomic features. Whiskers extend to 25th and 75th percentiles. GC content in exons is higher and in introns is lower compared to the genome average.
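Per-window GC content of the kind summarized in Fig. 3 can be computed by slicing a sequence into fixed-size, non-overlapping windows and counting G and C bases in each. This is a minimal sketch, not the authors' code: the function names are hypothetical, and excluding non-ACGT characters (such as N) from the denominator and dropping a trailing partial window are both assumptions.

```python
def gc_percent(seq):
    """GC percentage of a sequence, ignoring non-ACGT characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    # Return None when the window has no called bases (e.g. all Ns).
    return 100 * gc / called if called else None


def windowed_gc(seq, window=10_000):
    """GC percentage for consecutive non-overlapping full-length windows."""
    return [gc_percent(seq[start:start + window])
            for start in range(0, len(seq) - window + 1, window)]


# Toy sequence with a 4-bp window (hypothetical; the figure uses 10-kb windows).
toy_windows = windowed_gc("GGCCATATGCAT", window=4)
```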