Jacqueline Heckenhauer, Paul B Frandsen, John S Sproul, Zheng Li, Juraj Paule, Amanda M Larracuente, Peter J Maughan, Michael S Barker, Julio V Schneider, Russell J Stewart, Steffen U Pauls.
Abstract
BACKGROUND: Genome size is implicated in the form, function, and ecological success of a species. Two principally different mechanisms are proposed as major drivers of eukaryotic genome evolution and diversity: polyploidy (i.e., whole-genome duplication) or smaller duplication events and bursts in the activity of repetitive elements. Here, we generated de novo genome assemblies of 17 caddisflies covering all major lineages of Trichoptera. Using these and previously sequenced genomes, we use caddisflies as a model for understanding genome size evolution in diverse insect lineages.
Keywords: Trichoptera; biodiversity; de novo genome assembly; genome duplication; genome size evolution; genomic diversity; genomics; insects; repetitive elements; transposable elements
Year: 2022 PMID: 35217860 PMCID: PMC8881205 DOI: 10.1093/gigascience/giac011
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Table 1: Comparison of assembly and annotation statistics of all available Trichoptera genomes
| Species | Abbreviation | Accession No. | Length (bp) | N50 (kb) | No. of contigs/scaffolds | BUSCOs |
|---|---|---|---|---|---|---|
| | AF | JAGTXP000000000 | 552,637,417 | 2.8 | 296,752/291,536 | C: 43.8% [S: 43.0%, D: 0.8%], F: 35.2%, M: 21.0% |
| | AS | JAGTTH000000000 | 196,044,125 | 86 | 7,077/7,050 | C: 94.2% [S: 88.6%, D: 5.6%], F: 1.9%, M: 3.9% |
| | AV | GCA_016648135.1 | 1,352,945,503 | 111.8 | 25,541/25,153 | C: 87.5% [S: 77.1%, D: 10.4%], F: 6.1%, M: 6.4% |
| | DA | JAGWCC000000000 | 727,941,535 | 1,043.7 | 2,401 | C: 90.3% [S: 89.6%, D: 0.7%], F: 6.5%, M: 3.2% |
| | GC1 | JAGTXR000000000 | 568,249,599 | 2,212.1 | 653 | C: 90.1% [S: 89.1%, D: 1.0%], F: 2.7%, M: 7.2% |
| | GC2 | GCA_003347265.1 | 604,293,666 | 17.1 | 132,934/119,821 | C: 78.4% [S: 77.3%, D: 1.1%], F: 15.0%, M: 6.6% |
| | GP | Glyphotaelius_pellucidus_k51_scaffolds | 623,431,006 | 1.6 | 461,749 | C: 20.3% [S: 19.7%, D: 0.6%], F: 39.9%, M: 39.8% |
| | HR | JAHDVE000000000 | 973,356,502 | 125.2 | 12,636/12,484 | C: 85.0% [S: 82.6%, D: 2.4%], F: 8.5%, M: 6.5% |
| | HP | JAGVSL000000000 | 633,785,554 | 4,634 | 710 | C: 96.0% [S: 95.3%, D: 0.7%], F: 2.5%, M: 1.5% |
| | HM | GCA_016648045.1 | 1,275,967,528 | 768.2 | 6,877 | C: 88.4% [S: 80.7%, D: 7.7%], F: 6.4%, M: 5.2% |
| | HT | GCA_009617725.1 | 229,663,394 | 2,190.1 | 403 | C: 95.7% [S: 94.9%, D: 0.8%], F: 2.4%, M: 1.9% |
| | LB | JAGTTH000000000 | 769,208,668 | 1,052 | 1,712/1,621 | C: 94.4% [S: 93.1%, D: 1.3%], F: 3.5%, M: 2.1% |
| | LL | GCA_000648945.2 | 1,369,180,260 | 69.1 | 69,049/58,718 | C: 70.4% [S: 66.2%, D: 4.2%], F: 20.8%, M: 8.8% |
| | ML2 | JAGXCS000000000 | 668,600,304 | 2.5 | 374,883/368,330 | C: 45.0% [S: 44.1%, D: 0.9%], F: 34.5%, M: 20.5% |
| | ML1 | JAGVSM000000000 | 585,245,295 | 170.5 | 5,470/5,451 | C: 78.6% [S: 77.6%, D: 1.0%], F: 5.4%, M: 16.0% |
| | MM | JAGVSQ000000000 | 329,257,313 | 69.5 | 7,561 | C: 59.1% [S: 58.6%, D: 0.5%], F: 10.4%, M: 30.5% |
| | MS | JAGUCF000000000 | 778,692,278 | 7.9 | 144,300/144,286 | C: 44.4% [S: 41.7%, D: 2.7%], F: 29.6%, M: 26.0% |
| | OA | JAGTXQ000000000 | 1,305,984,461 | 266.4 | 9,583/9,303 | C: 92.2% [S: 90.6%, D: 1.6%], F: 5.4%, M: 2.4% |
| | PE | JAGVSN000000000 | 282,185,525 | 5,591.7 | 159 | C: 95.1% [S: 94.3%, D: 0.8%], F: 1.8%, M: 3.1% |
| | PL | JAGXCT000000000 | 360,300,449 | 67.5 | 44,049/37,274 | C: 92.0% [S: 90.0%, D: 2.0%], F: 4.7%, M: 3.3% |
| | PC | GCA_009617715.1 | 396,695,105 | 869 | 1,614 | C: 94.7% [S: 93.8%, D: 0.9%], F: 2.5%, M: 2.8% |
| | RB | JAGYXB000000000 | 1,086,872,538 | 1,030.6 | 2,227/2,125 | C: 95.4% [S: 92.0%, D: 3.4%], F: 2.5%, M: 2.1% |
| | RE2 | JAGVSQ000000000 | 565,830,460 | 9.9 | 118,140/114,057 | C: 75.1% [S: 74.3%, D: 0.8%], F: 17.9%, M: 7.0% |
| | RE1 | JAGVSO000000000 | 562,550,625 | 9.7 | 115,243/111,706 | C: 74.1% [S: 73.4%, D: 0.7%], F: 18.7%, M: 7.2% |
| | SS | GCA_003003475.1 | 1,015,727,762 | 3.2 | 561,698 | C: 30.9% [S: 30.3%, D: 0.6%], F: 40.2%, M: 28.9% |
| | ST | GCA_008973525.1 | 451,494,475 | 1,296.7 | 552 | C: 95.3% [S: 92.4%, D: 2.9%], F: 2.3%, M: 2.4% |
Assemblies produced in this study.
N_Arthropoda = 2,124.
C: complete; S: single; D: duplicated; F: fragmented; M: missing.
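The BUSCO column packs five percentages into one string, and two relationships follow from the notation above: complete is the sum of single and duplicated (C = S + D), and complete, fragmented, and missing together account for all searched genes (C + F + M = 100%). The following is a minimal Python sketch for reading such a string and checking both relationships. The function names, the regular expression, and the rounding tolerance are illustrative assumptions, not part of the study's pipeline.

```python
import re

# Pattern for summaries written as in the table, e.g.
# "C: 96.0% [S: 95.3%, D: 0.7%], F: 2.5%, M: 1.5%"
BUSCO_PATTERN = re.compile(
    r"C:\s*(?P<C>[\d.]+)%\s*\[S:\s*(?P<S>[\d.]+)%,\s*D:\s*(?P<D>[\d.]+)%\],\s*"
    r"F:\s*(?P<F>[\d.]+)%,\s*M:\s*(?P<M>[\d.]+)%"
)


def parse_busco(summary):
    """Return the five BUSCO percentages as a dict keyed by C, S, D, F, M."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"unrecognized BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores, tol=0.15):
    """Check C = S + D and C + F + M = 100 within a rounding tolerance.

    The tolerance is an assumed allowance for values rounded to one decimal.
    """
    complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


# Values copied from the HP row of the table.
hp = parse_busco("C: 96.0% [S: 95.3%, D: 0.7%], F: 2.5%, M: 1.5%")
print(hp["C"], is_consistent(hp))
```

A check of this kind is a quick way to catch transcription errors in a row before comparing assemblies by completeness.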
Figure 1: Ecological diversity (right) and genome size (left) in caddisflies. Phylogenetic relationships derived from ASTRAL-III analyses using single BUSCO genes. Goeridae, which was not included in the BUSCO gene set, was placed according to Thomas et al. [64]. ASTRAL support values (local posterior probabilities) >0.9 are given for each node. The placement of Hydroptilidae (clade B1) was ambiguous. Because its placement was poorly supported in our analyses, we placed it according to Thomas et al. [64]. Taxa were collapsed to family level. Trichoptera are divided into two suborders: Annulipalpia (“fixed retreat– and net-spinners,” clade A: blue) and Integripalpia (clade B: green), which includes basal Integripalpia (“cocoon-builders,” clades B1–B3, dark green) and Phryganides or “tube case–builders” (clade B4: light green). “Cocoon-builders” are divided into “purse case-building” (clade B1), “tortoise case-building” (clade B2), and “free-living” (clade B3) families. Genome size estimates based on different methods (Genomescope2: orange, Backmap.pl: black, flow cytometry [FCM]: brown) are given for various caddisfly families. Each dot corresponds to a mean estimate of a species. For detailed information on the species and number of individuals used in each method see Supplementary Data File S1.7. Colors and clade numbers in the phylogenetic tree refer to colored boxes with illustrations. The following species are illustrated by Ralph Holzenthal: a: Hydropsyche sp. (Hydropsychidae); b: Chimarra sp. (Philopotamidae); c: Stenopsyche sp. (Stenopsychidae); d: Polycentropus sp. (Polycentropodidae); e: Agraylea sp. (Hydroptilidae); f: Glossosoma sp. (Glossosomatidae); g: Rhyacophila sp. (Rhyacophilidae); h: Fabria inornata (Phryganeidae); i: Micrasema sp. (Brachycentridae); j: Goera fuscula (Goeridae); k: Sphagnophylax meiops (Limnephilidae); l: Psilotreta sp. (Odontoceridae); m: Grumicha grumicha (Sericostomatidae).
Figure 2: Repeat abundance and classification in 26 caddisfly genomes. The number of bp for each repeat type is given for each caddisfly genome. A: Repeat abundance and classification. The phylogenetic tree was reconstructed with ASTRAL-III using single BUSCO genes from the genome assemblies. The placement of Hydroptilidae (clade B1) was ambiguous. Because its placement was poorly supported in our analyses, we placed the single hydroptilid taxon (Agraylea sexmaculata) according to Thomas et al. [64]. Species names corresponding to the abbreviations in the tree can be found in Table 1. Trichoptera are divided into two suborders: Annulipalpia (“fixed retreat– and net-spinners,” clade A: blue) and Integripalpia (clade B: green), which includes basal Integripalpia (“cocoon-builders,” clades B1–B3, dark green) and Phryganides or “tube case–builders” (clade B4: light green). “Cocoon-builders” are divided into “purse case-building” (clade B1), “tortoise case-building” (clade B2), and “free-living” (clade B3) families. An illustration of a representative of each clade is given. The “other repeats” category includes rolling circles, Penelope, low-complexity, simple repeats, and small RNAs. B: Box plots summarizing shifts in the genomic proportion of RE categories in major Trichoptera lineages. Colored rectangles in the box plots show the first and third quartiles plotted around the median genomic proportion, with outlier values shown as black dots.
Figure 3: Transposable element age distribution landscapes. Representative examples are chosen from major Trichoptera lineages. The y-axis shows TE abundance as a proportion of the genome (e.g., 1.0 = 1% of the genome). The x-axis shows sequence divergence relative to TE consensus sequences for major TE classes. TE classes with abundance skewed toward the left (i.e., low sequence divergence) are inferred to have a recent history of diversification relative to TE classes with right-skewed abundance. Plots were generated in dnaPipeTE. Plots for all species are shown in Supplementary Fig. S148. For tip labels of the phylogenetic tree see Fig. 2.
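To make the axes of such a landscape concrete, the sketch below bins TE copies by their divergence from the consensus and sums each bin as a percentage of the genome, so that a value of 1.0 corresponds to 1% of the genome as in the caption. It is a minimal Python illustration only: the input format, the function name `te_landscape`, and the toy numbers are assumptions, and it does not reproduce how dnaPipeTE computes its plots.

```python
from collections import defaultdict


def te_landscape(hits, genome_size_bp, bin_width=1.0):
    """Bin TE copies by divergence and report percent of genome per bin.

    hits: iterable of (te_class, divergence_percent, length_bp) tuples
          (a hypothetical input format chosen for this illustration).
    Returns {te_class: {bin_start: percent_of_genome}}.
    """
    landscape = defaultdict(lambda: defaultdict(float))
    for te_class, divergence, length_bp in hits:
        # Left edge of the divergence bin this copy falls into.
        bin_start = int(divergence // bin_width) * bin_width
        landscape[te_class][bin_start] += 100.0 * length_bp / genome_size_bp
    return {cls: dict(sorted(bins.items())) for cls, bins in landscape.items()}


# Toy data, not taken from the study: a 1 Mb genome with four TE copies.
toy_hits = [
    ("LINE", 0.5, 10_000),   # young copy, close to the consensus
    ("LINE", 0.9, 5_000),
    ("LINE", 12.3, 2_000),   # older, more diverged copy
    ("DNA", 20.0, 10_000),
]
toy_landscape = te_landscape(toy_hits, genome_size_bp=1_000_000)
print(toy_landscape)
```

In this toy case most LINE sequence sits in the lowest divergence bin, which is the left-skewed pattern the caption associates with recent activity, while the DNA transposon copy sits far to the right.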
Figure 4: TE-BUSCO-gene associations in Trichoptera species. (A) Raw abundance of TE-associated BUSCO sequences present in the assembly of 2,442 BUSCOs in the OrthoDB 9 Endopterygota dataset. (B) Top: An example of a coverage depth profile of a TE-associated BUSCO gene (BUSCO EOG090R02Q9 from ML1 [“inflated species”]) that shows unexpectedly high coverage in the second exon, putatively due to the presence of an RE-derived sequence fragment. Bottom: A typifying alignment between a TE-associated BUSCO and its orthologous BUSCO from a closely related species (“reference species”) that lacks TE-association. The non-TE-associated orthologous BUSCO shows non-contiguous alignment in regions of inflated coverage in the TE-associated BUSCO, consistent with the presence of an RE-derived sequence fragment in the TE-associated BUSCO that is absent in the reference species. (C) Summary of total bases annotated as REs obtained from each of the two BLAST searches. First, when we used BLAST to compare any TE-associated BUSCOs against an assembly for the same species, BLAST hits included megabases of annotated repeats (dark bars). Second, when non-TE-associated orthologs of the TE-associated BUSCOs in the first search are taken from a close relative and compared against the inflated species using BLAST, there is a dramatic decrease in BLAST hits annotated as REs. Note the log scale on the y-axis. (D) Summary of annotations for BLAST hits for classified REs when TE-associated BUSCOs are compared against an assembly of the same species using BLAST.