Kakeru Yokoi, Kiyoshi Kimura, Hidemasa Bono.
Abstract
Transposable elements (TEs) are grouped into several families with diverse sequences. Owing to this diversity, detecting, classifying, and annotating TEs is difficult. Moreover, simple comparisons of TEs among different species analyzed with different methods can lead to misinterpretations. Genome data for several honey bee (Apis) species are available in public databases. We therefore conducted a meta-analysis of TEs, using 11 sets of genome data for Apis species, in order to establish a "landscape of TEs". Consensus TE sequences were constructed and their distributions in the Apis genomes were determined. Our results showed that Apis TEs mainly belonged to four to seven TE families among the 13 and 15 families detected in classes I and II, respectively, and that DNA/TcMar-Mariner consensus sequences and copies were comparatively abundant in all Apis genomes tested. In addition, more consensus sequences and copies of DNA/TcMar-Mariner were detected in Apis mellifera than in other Apis species. These results suggest that TcMar-Mariner might exert A. mellifera-specific effects on its host. In conclusion, our unified approach enabled comparison of Apis genome sequences to determine the TE landscape, which provides novel evolutionary insights into Apis species.
Keywords: Apis cerana; Apis dorsata; Apis florea; Apis laboriosa; Apis mellifera; Mariner-like-element; RepeatMasker; RepeatModeler2; meta-analysis; transposable element
Year: 2022 PMID: 36005323 PMCID: PMC9408917 DOI: 10.3390/insects13080698
Source DB: PubMed Journal: Insects ISSN: 2075-4450 Impact factor: 3.139
Apis genome assemblies used in this study.
| Organism Name [Reference] | GenBank Assembly Accession ID | Genome Size (bp) | Contig N50 | Abbreviation in This Study |
|---|---|---|---|---|
| | GCA_003254395.2 | 225,250,884 | 5,382,476 | Am |
| | GCA_002217905.1 | 211,200,590 | 179,487 | Acj |
| | GCA_001442555.1 | 228,331,812 | 43,751 | Ack |
| | GCA_011100585.1 | 215,670,033 | 3,898,192 | Acc |
| | GCA_009792835.1 | 223,527,749 | 30,868 | Ad |
| | GCA_000184785.2 | 229,015,090 | 24,915 | Af |
| | GCF_014066325.1 | 226,078,798 | 303,790 | Al |
| | GCA_000819425.1 | 243,566,977 | 504 | Ami |
| | GCA_003314205.1 | 227,036,473 | 5,131,172 | Amm |
| | GCA_013841245.1 | 226,044,179 | 2,692,667 | Amcar |
| | GCA_013841205.1 | 224,766,697 | 3,303,520 | Amcau |
Asterisks indicate chromosome-level genome assembly data according to NCBI genome assembly statistics in the NCBI dataset database (URL: https://www.ncbi.nlm.nih.gov/datasets/, accessed on 2 August 2022). See discussion section.
Figure 1. Workflow of the data analyses performed in this study. De novo TE detection was performed on 11 Apis genome sequences (Table 1) from the NCBI genome database (URL: https://www.ncbi.nlm.nih.gov/genome/, accessed on 1 June 2022) using RepeatModeler2 [24]. Phylogenetic analysis was used to reveal relationships among Mariner-like elements (MLEs), the TE family with the most abundant consensus sequences in Apis species. The distributions of repetitive elements, including the TEs detected by RepeatModeler2, were investigated using RepeatMasker. The landscapes of TEs in Apis species were obtained using both sets of results, which led to new insights into TEs in Apis species. The images in Figure 1 were obtained from TogoTV (© 2016 DBCLS TogoTV).
Total numbers of consensus sequences in the TE families of all Apis species, based on de novo TE detection with RepeatModeler2 [24].
| Family Name | Acc | Acj | Ack | Ad | Af | Al | Am | Ami | Amm | Amcar | Amcau |
|---|---|---|---|---|---|---|---|---|---|---|---|
|
| 2 | 3 | 4 | 1 | 2 | 1 | 7 | 1 | 6 | 2 | 2 |
|
| 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 0 |
|
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
|
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
|
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
|
| 1 | 0 | 3 | 4 | 3 | 5 | 2 | 2 | 3 | 2 | 2 |
|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
|
| 11 | 5 | 6 | 4 | 6 | 11 | 11 | 13 | 14 | 11 | 11 |
|
| 2 | 1 | 1 | 0 | 5 | 1 | 7 | 11 | 13 | 8 | 7 |
|
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
|
| 5 | 3 | 4 | 4 | 4 | 2 | 7 | 2 | 4 | 2 | 5 |
|
| 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 1 |
|
| 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
|
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
|
| 1 | 0 | 0 | 1 | 1 | 0 | 3 | 0 | 1 | 0 | 2 |
|
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
|
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
|
| 3 | 1 | 1 | 3 | 2 | 2 | 1 | 3 | 1 | 1 | 1 |
|
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
|
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 |
|
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
|
| 2 | 1 | 1 | 0 | 1 | 1 | 2 | 2 | 1 | 1 | 1 |
|
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 2 | 0 | 0 |
|
| 1 | 0 | 1 | 1 | 7 | 2 | 1 | 1 | 0 | 3 | 2 |
|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Total (per species) | 34 | 17 | 24 | 21 | 35 | 33 | 48 | 39 | 51 | 34 | 38 |
Consensus sequences not clearly annotated to a family (i.e., “unknown” sequences) are excluded (all-inclusive count data are available in Supplemental Data S2). The nomenclature of the TE families was defined previously [2]. Family names belonging to class II TEs and class I TEs are represented with red and blue text, respectively. The degree of red shading indicates the number of consensus sequences found, where darker shading indicates higher numbers.
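Per-family counts like those in the table above could in principle be tallied from a RepeatModeler2 consensus library with a short script. The following is a minimal sketch, assuming the library is a FASTA file whose headers follow the `>name#Class/Family` convention; the function name `count_consensus_families` and the example headers are hypothetical and are not taken from the study.

```python
from collections import Counter

def count_consensus_families(fasta_text, skip_unknown=True):
    """Count consensus sequences per TE family from FASTA header lines.

    Assumes headers of the form '>name#Class/Family optional description'.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines carry no classification
        tokens = line[1:].split()
        if not tokens or "#" not in tokens[0]:
            continue  # header without a classification tag
        family = tokens[0].split("#", 1)[1]
        if skip_unknown and family.lower() == "unknown":
            continue  # mirror the exclusion of "unknown" sequences
        counts[family] += 1
    return counts

# Hypothetical library content, for illustration only.
example = """>rnd-1_family-1#DNA/TcMar-Mariner ( hypothetical description )
ACGT
>rnd-1_family-2#DNA/TcMar-Mariner
ACGT
>rnd-1_family-3#LINE/R2
ACGT
>rnd-1_family-4#Unknown
ACGT
"""
counts = count_consensus_families(example)
```

Running the same function on the library built for each genome would give one column of the table per genome; passing `skip_unknown=False` corresponds to an all-inclusive count.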
Figure 2. Phylogenetic tree of Apis TcMar-Mariner consensus sequences identified in this study. The MLE sequences of other species and A. mellifera were annotated with Mariner subfamilies in previous reports [8,20]. Blue, orange, green, purple, and yellow circles located at the end of each node (MLE sequences from the previous reports) indicate the MLE subfamilies. The red circles indicate consensus sequences detected with more than 200 copies. The green semicircular shading encompasses a clade including many sequences with over 200 copies. The numbers at the branches indicate bootstrap values. High-resolution phylogenetic tree data are available in Supplemental Data S5.
Percentages of repetitive elements present in each Apis genome.
| Acc | Acj | Ack | Ad | Af | Al | Am | Ami | Amm | Amcar | Amcau |
|---|---|---|---|---|---|---|---|---|---|---|
| 9.97% | 7.87% | 6.83% | 10.09% | 8.20% | 10.26% | 11.02% | 8.01% | 12.09% | 11.61% | 11.41% |
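To relate these percentages to absolute sequence lengths, each value can be multiplied by the corresponding genome size from Table 1. The sketch below does this for three of the genomes. The resulting base-pair figures are approximations derived here from the rounded percentages, not values reported by the study, and the function name `approx_repeat_bp` is hypothetical.

```python
# Genome sizes (bp) copied from Table 1 and repeat percentages from the table above.
GENOME_SIZE_BP = {"Am": 225_250_884, "Acc": 215_670_033, "Ack": 228_331_812}
REPEAT_PERCENT = {"Am": 11.02, "Acc": 9.97, "Ack": 6.83}

def approx_repeat_bp(abbrev):
    """Approximate length (bp) of repetitive sequence in one genome."""
    return round(GENOME_SIZE_BP[abbrev] * REPEAT_PERCENT[abbrev] / 100)

for abbrev in GENOME_SIZE_BP:
    # Report in megabases to avoid implying more precision than the inputs have.
    print(abbrev, f"{approx_repeat_bp(abbrev) / 1e6:.1f} Mb")
```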
Total copy numbers of class II TE families in the Apis genomes listed in Table 1.
| Family Name | Acc | Acj | Ack | Ad | Af | Al | Am | Ami | Amm | Amcar | Amcau |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1387 | 1684 | 1797 | 880 | 1060 | 692 | 2761 | 538 | 2200 | 1305 | 1518 |
| | 0 | 0 | 0 | 0 | 0 | 107 | 0 | 0 | 169 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 477 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0 | 0 | 0 | 165 | 0 | 0 | 0 | 0 | 59 | 0 | 193 |
| | 0 | 0 | 107 | 0 | 0 | 0 | 0 | 0 | 335 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 0 | 0 | 406 | 59 | 698 | 0 | 0 |
| | 138 | 0 | 316 | 845 | 474 | 826 | 456 | 318 | 848 | 797 | 678 |
| | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 364 |
| | 1254 | 630 | 798 | 631 | 903 | 1343 | 1892 | 1495 | 2641 | 2478 | 3475 |
| | 618 | 159 | 110 | 0 | 313 | 608 | 1010 | 1507 | 1656 | 1461 | 2300 |
| | 230 | 0 | 0 | 0 | 118 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 98 | 201 | 0 | 0 | 0 | 0 | 0 |
| | 657 | 510 | 233 | 821 | 673 | 409 | 1702 | 404 | 974 | 351 | 1736 |
| | 0 | 0 | 188 | 0 | 0 | 642 | 0 | 0 | 466 | 0 | 447 |
| | 0 | 0 | 38 | 0 | 0 | 0 | 2852 | 0 | 0 | 0 | 0 |
| Total (per species) | 4284 | 2983 | 3587 | 3342 | 4116 | 4828 | 11,079 | 4321 | 10,046 | 6392 | 10,711 |
The total copy numbers of the TE families (detected using RepeatModeler2) were calculated using output files from RepeatMasker. The degree of red shading reflects the copy numbers found, where darker shading indicates higher copy numbers. Family names belonging to class II TEs are represented with red text.
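Copy numbers of this kind can be tallied from the `.out` annotation table that RepeatMasker writes. The following is a minimal sketch, assuming the standard whitespace-delimited layout in which the eleventh column holds the repeat class/family. It counts every annotated fragment as one copy, which may differ from how the study defined a copy, and it selects class II families with the simple prefix `DNA`, which is also an assumption. The function name and the sample rows are hypothetical.

```python
from collections import Counter

def count_copies_per_family(out_text, prefix="DNA"):
    """Count RepeatMasker annotations per class/family with a given prefix."""
    counts = Counter()
    for line in out_text.splitlines():
        fields = line.split()
        # Data rows start with an integer alignment score; header and
        # blank lines do not, so they are skipped here.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        family = fields[10]  # eleventh column: repeat class/family
        if family.startswith(prefix):
            counts[family] += 1
    return counts

# Hypothetical rows in the .out layout, for illustration only.
sample = """   SW   perc perc perc  query     position in query    matching  repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin end (left) ID

 1200   5.1  0.0  0.0   chr1      100   1380  (5000)  +  rnd-1_family-1  DNA/TcMar-Mariner  1     1281  (0)    1
  800   9.3  1.2  0.5   chr1      2000  2900  (3480)  C  rnd-1_family-1  DNA/TcMar-Mariner  (10)  1271  370    2
  450  12.0  0.0  2.1   chr2      50    400   (9000)  +  rnd-1_family-7  LINE/R2            1     350   (900)  3
"""
class2_counts = count_copies_per_family(sample)
```

Applying the function to the `.out` file of each genome would yield one column per genome; calling it with another prefix such as `"LINE"` restricts the tally to a different group of families.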
Total copy numbers of class I TE families in the Apis genomes listed in Table 1.
| Family Name | Acc | Acj | Ack | Ad | Af | Al | Am | Ami | Amm | Amcar | Amcau |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 0 | 24 | 0 | 0 | 0 | 0 | 0 |
| | 26 | 0 | 0 | 121 | 341 | 0 | 654 | 0 | 480 | 0 | 261 |
| | 74 | 57 | 0 | 81 | 0 | 161 | 0 | 0 | 0 | 75 | 0 |
| | 332 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 249 |
| | 829 | 82 | 101 | 749 | 354 | 466 | 257 | 321 | 318 | 109 | 268 |
| | 0 | 0 | 0 | 0 | 0 | 217 | 419 | 0 | 0 | 350 | 75 |
| | 0 | 356 | 0 | 0 | 0 | 0 | 326 | 0 | 0 | 1316 | 0 |
| | 0 | 0 | 0 | 0 | 0 | 0 | 52 | 0 | 0 | 0 | 0 |
| | 574 | 46 | 44 | 0 | 147 | 233 | 1000 | 426 | 417 | 203 | 499 |
| | 153 | 0 | 0 | 91 | 0 | 0 | 0 | 48 | 483 | 0 | 0 |
| | 44 | 0 | 228 | 416 | 730 | 213 | 677 | 57 | 0 | 300 | 1237 |
| | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 0 | 0 | 0 |
| Total (per species) | 2032 | 592 | 373 | 1458 | 1672 | 1314 | 3385 | 876 | 1698 | 2353 | 2589 |
The total copy numbers of the TE families (detected using RepeatModeler2) were calculated using output files from RepeatMasker. The degree of red shading indicates the copy numbers found, where darker shading indicates higher numbers. Family names belonging to class I TEs are represented with blue text.
Figure 3. The total numbers of DNA/TcMar-Mariner (A) and DNA/CMC-EnSpm (B) TEs in each Apis genome listed in Table 1. Both TE families were detected using RepeatModeler2, and the total numbers of TEs were calculated using .out files from RepeatMasker. Abbreviations of the names of Apis species in the figure are shown in Table 1.