Angelika Voronova¹, Martha Rendón-Anaya², Pär Ingvarsson², Ruslan Kalendar³, Dainis Ruņģis¹.
Abstract
Sequencing the giga-genomes of several pine species has enabled comparative genomic analyses of these outcrossing tree species. Previous studies have revealed the wide distribution and extraordinary diversity of transposable elements (TEs) that occupy the large intergenic spaces in conifer genomes. In this study, we analyzed the distribution of TEs in gene regions of the assembled genomes of Pinus taeda and Pinus lambertiana using high-performance computing resources. The quality of the draft genomes and of their annotation has significant consequences for the investigation of TEs, and these aspects are discussed. Several TE families that are frequently inserted into genes or their flanks were identified in both species' genomes. Potentially important sequence motifs that could bind additional regulatory factors were identified in TEs; such motifs could promote gene network formation with faster or enhanced transcription initiation. Node genes that contain many TEs were observed in multiple potential TE-associated networks. This study demonstrates the increased accumulation of TEs in the introns of stress-responsive genes of pines and suggests that these genes could be rewired into responsive networks and sub-networks interconnected through node genes containing multiple TEs. Many such regulatory influences could lead to the adaptive environmental response clines that are characteristic of naturally distributed pine populations.
Keywords: MITE; Pinus lambertiana; Pinus taeda; gene networks; gene regulation; introns; node gene; pine reference genome; retrotransposons; transposable elements
Year: 2020 PMID: 33081418 PMCID: PMC7602945 DOI: 10.3390/genes11101216
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Comparison of extracted gene-flanking regions in genome data sets. Columns give the 5′ or 3′ flanking region, measured from the gene start/end coordinates.
| Genome/ | 5′ 0–1 kb | 3′ 0–1 kb | 5′ 1–2 kb | 3′ 1–2 kb | 5′ 2–3 kb | 3′ 2–3 kb | 5′ 3–4 kb | 3′ 3–4 kb | 5′ 4–5 kb | 3′ 4–5 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| Nb of extr.seq. | 36,726 | 36,728 | 34,711 | 34,063 | 33,184 | 32,310 | 31,767 | 30,838 | 30,349 | 29,479 |
| Nb of hqh to TE-dr | 5851 | 6450 | 4362 | 3901 | 3750 | 3628 | 3310 | 3069 | 3202 | 2924 |
| >50 | 17 | 22 | 10 | 10 | 4 | 2 | 1 | 0 | 0 | 0 |
| >100 | 8 | 9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nb of extr.seq. | 15,084 | 15,057 | 14,114 | 13,793 | 13,371 | 12,912 | 12,713 | 12,192 | 11,985 | 11,569 |
| Nb of hqh to TE-dr | 816 | 773 | 800 | 732 | 875 | 968 | 1161 | 991 | 901 | 1000 |
| >50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| >100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nb of extr.seq. | 4298 | 4239 | 4177 | 4128 | 4130 | 4091 | 4081 | 4028 | 4023 | 3967 |
| Nb of hqh to TE-dr | 784 | 779 | 2258 | 1890 | 3151 | 2693 | 3593 | 3222 | 3816 | 3539 |
| >50 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| >100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nb of extr.seq. | 75,425 | 75,459 | 72,840 | 72,797 | 71,554 | 71,470 | 70,002 | 69,836 | 68,237 | 68,017 |
| Nb of hqh to TE-dr | 2317 | 2540 | 4188 | 4243 | 4979 | 5070 | 5256 | 5387 | 5645 | 5382 |
| >50 | 2 | 2 | 5 | 5 | 6 | 5 | 4 | 7 | 7 | 6 |
| >100 | 1 | 1 | 3 | 4 | 1 | 1 | 0 | 1 | 0 | 0 |
| Nb of extr.seq. | 8779 | 8778 | 8746 | 8742 | 8719 | 8708 | 8692 | 8673 | 8660 | 8640 |
| Nb of hqh to TE-dr | 71 | 55 | 163 | 187 | 278 | 277 | 315 | 296 | 355 | 357 |
| >50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| >100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nb of extr.seq. | 71,162 | 71,157 | 70,386 | 70,475 | 69,773 | 69,909 | 69,217 | 69,344 | 68,660 | 68,836 |
| Nb of hqh to TE-dr | 470 | 466 | 1063 | 1011 | 1556 | 1508 | 1789 | 1368 | 2038 | 1999 |
| >50 | 0 | 0 | 1 | 0 | 4 | 3 | 6 | 1 | 7 | 7 |
| >100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
>50, >100: number of TE families with hits to more than 50 or more than 100 gene-flanking regions; hqh: high-quality hits; TE-dr: TE-derived repeats; HQ: high-quality gene set; LQ: low-quality gene set; extr.seq.: extracted gene-flanking sequences.
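The table bins sequence around each gene into 1-kb windows measured from the gene start/end coordinates and then, per TE family, counts how many flanking regions receive high-quality hits. A minimal Python sketch of both steps follows. It assumes 0-based half-open coordinates, strand-aware assignment of 5′ and 3′ flanks, and that windows running past a scaffold end are skipped; the function names and the `(region_id, te_family)` hit format are hypothetical and do not describe the authors' actual pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

WINDOW = 1_000  # bin width in bp: 0-1 kb, 1-2 kb, ... as in the table columns
N_BINS = 5      # five bins per side, reaching 5 kb from the gene


def flanking_windows(start: int, end: int, strand: str, scaffold_len: int,
                     window: int = WINDOW, n_bins: int = N_BINS
                     ) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (side, bin_index, win_start, win_end) around a gene on [start, end).

    Coordinates are 0-based and half-open. Bins that would run past either end
    of the scaffold are skipped rather than truncated (an assumption).
    """
    for i in range(n_bins):
        upstream = (start - (i + 1) * window, start - i * window)
        downstream = (end + i * window, end + (i + 1) * window)
        # On the minus strand the 5' flank lies downstream in genome coordinates.
        five, three = (upstream, downstream) if strand == "+" else (downstream, upstream)
        for side, (ws, we) in (("5'", five), ("3'", three)):
            if ws >= 0 and we <= scaffold_len:
                yield side, i, ws, we


def families_over_threshold(hits: Iterable[Tuple[str, str]], threshold: int) -> int:
    """Count TE families whose high-quality hits fall in more than `threshold`
    distinct flanking regions; hits are (region_id, te_family) pairs."""
    regions_per_family: Dict[str, Set[str]] = defaultdict(set)
    for region_id, family in hits:
        regions_per_family[family].add(region_id)  # repeated hits to one region count once
    return sum(1 for regions in regions_per_family.values() if len(regions) > threshold)


if __name__ == "__main__":
    # Hypothetical gene on [3000, 5000) of an 8-kb scaffold: only 3 bins fit per side.
    for win in flanking_windows(3000, 5000, "+", 8000):
        print(win)
    toy_hits = [("r1", "famA"), ("r1", "famA"), ("r2", "famA"), ("r3", "famA"), ("r1", "famB")]
    print(families_over_threshold(toy_hits, 2))  # famA hits 3 regions -> prints 1
```

Skipping rather than truncating edge bins is one way the number of extracted sequences could shrink with distance from the gene, as it does along each row of the table; the source does not state how scaffold ends were handled.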
Figure 1. Comparison of TE distribution in the non-coding regions of high-quality genes of the P. lambertiana genome v.1.01 (PILA) and the filtered annotated gene set of P. taeda v.2.0 (PITA).
Figure 2. Distribution of >1 kb hits to TEs in the introns of filtered P. taeda (PITA) genes and high-quality P. lambertiana (PILA) genes.
Figure 3. (A) Distribution of Plater MITE insertions across P. taeda (PITA) and P. lambertiana (PILA) gene-flanking regions. (B) Distribution of Plater MITE insertions across P. taeda (PITA) and P. lambertiana (PILA) gene introns. Word cloud generated from biological process GO terms of P. taeda genes involved in the networks using the online tool https://wordart.com/. (C) Alignment of P. taeda (PITA) and P. lambertiana (PILA) consensus sequences with predicted plant cis-acting regulatory elements.
P. taeda v.2.0 and P. lambertiana v.1.01 genes carrying several Plater MITE insertions.
| Gene ID (PITA: P. taeda; PILA: P. lambertiana) | Nb of Plater insertions | Description | qc, % | E-value | ID, % | Accession |
|---|---|---|---|---|---|---|
| PITA_12742 | 7 | uncharacterized protein with domain of phosphoglucosamine mutase family protein | 88 | 0.0 | 65 | PLN02371 |
| PITA_21987 | 4 | subtilisin-like protease SBT5.3 | 96 | 0.0 | 47 | XP_012083905.1 |
| PITA_00114 | 3 | metal tolerance protein 11 | 99 | 0.0 | 72 | XP_006857671.1 |
| PITA_24114 | 2 | probable xyloglucan endotransglucosylase/hydrolase protein B | 93 | 3.00 × 10⁻¹⁵³ | 72 | XP_030961064.1 |
| PITA_21327 | 2 | 60S ribosomal protein L8-1-like | 95 | 2.00 × 10⁻¹⁶⁹ | 90 | XP_022936671.1 |
| PITA_17959 | 2 | TMV resistance protein N-like | 93 | 8.00 × 10⁻¹⁶⁵ | 31 | XP_023886681.1 |
| PITA_34859 | 2 | 3-oxoacyl-[acyl-carrier-protein] synthase I, chloroplastic-like isoform X1 | 95 | 0.0 | 74 | XP_028101593.1 |
| PITA_28894 | 2 | L-gulonolactone oxidase 2 isoform X2 | 95 | 0.0 | 54 | XP_011621860.1 |
| PITA_00539 | 2 | probable potassium transporter 11 | 99 | 0.0 | 67 | XP_006830082.1 |
| PITA_33316 | 2 | plasma membrane intrinsic protein 2;8 | 95 | 5.00 × 10⁻¹⁴¹ | 74 | NP_179277.1 |
| PITA_09881 | 2 | cytokinin hydroxylase | 93 | 0.0 | 52 | XP_011099558.1 |
| S/hiseq/c38458_g1_i1\|m.23006 | 2 | bifunctional phosphatase IMPL2, chloroplastic | 75 | 7.00 × 10⁻¹⁴⁷ | 73 | XP_011088446.1 |
| PILAhq_048992 | 2 | putative clathrin assembly protein At4g40080 | 80 | 2.00 × 10⁻⁴⁰ | 36 | XP_027337607.1 |
| PILAhm_002002 | 2 | histone deacetylase 15 isoform X3 | 69 | 4.00 × 10⁻¹⁷⁹ | 63.89 | XP_010265267.1 |
HQ—high quality genes; qc—query coverage; ID—identity; Nb—number.
Figure 4. Gene network formed by Irbe DNA TE insertions in the introns of P. taeda v.2.0 genes.
Node genes that contain several TE insertions and are homologous, or carry identical domains, in P. taeda (PITA) and P. lambertiana (PILA).
| TE-dr Nb. | TE-dr Nb. | Description | Accession | Conserved Domain Name | Accession | GO Terms |
|---|---|---|---|---|---|---|
| 24 | 19 | plastidial pyruvate kinase 2 | XP_006843356.1 h | PLN02623 | PLN02623 | reproduction; ATP generation from ADP; seed maturation; |
| 23 | 26 | DEAD-box ATP-dependent RNA helicase 20 isoform X2/helicase 58, chloroplastic isoform X3 | XP_025888827.1 | SrmB | COG0513 | RNA secondary structure unwinding |
| 21 | 21 | phospholipid:diacylglycerol acyltransferase 1 | XP_006849611.1 h | PLN02517 | PLN02517 | acylglycerol biosynthetic process |
| 18 | 20 | nuclear pore complex protein NUP62-like/GPCR-type G protein 1 isoform X2 | XP_024396806.1 | SMC_prok_B super family | cl37069 | RNA export from nucleus; protein import/export into/from nucleus; nucleocytoplasmic transport, localization |
| 13 | 23 | WD repeat-containing protein WRAP73 | XP_008798782.1 | WD40 super family | COG2319 | - |
| - | 19 | protein RAE1 | XP_028076289.1 | | cl29593 | |
| - | 24 | actin-related protein 2/3 complex subunit 1A | XP_011627051.1 | | cl29593 | |
| 12 | 19 | uncharacterized protein LOC109715170/probable E3 ubiquitin-protein ligase HERC4 isoform X1 | XP_020095639.1 | ATS1 super family | cl34932 | - |
| 11 | 31 | peroxisomal adenine nucleotide carrier 1/mitochondrial substrate carrier family protein C-like | XP_006841423.1 | Mito_carr | pfam00153 | Establishment of localization; transmembrane transport; amide biosynthetic process; translation; nitrogen compound metabolic process. |
TE-dr Nb—Number of unique TE-derived repeats; h—genes with similar transcripts (>90% nucleotide identity).
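The node-gene table and Figure 4 rest on the idea that genes sharing a TE-derived repeat in their introns form a candidate network, and that genes belonging to several such networks act as nodes. The Python sketch below illustrates that bookkeeping under stated assumptions: each repeat defines one network, a repeat found in only one gene is dropped, and a node gene is any gene shared by at least `min_networks` networks. The function names, the `(gene_id, repeat_id)` input format and the threshold are hypothetical; the source does not specify how the networks were constructed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def repeat_networks(intron_hits: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group genes into one candidate network per TE-derived repeat.

    `intron_hits` holds (gene_id, repeat_id) pairs for hits inside introns.
    A repeat found in a single gene links nothing, so it is dropped.
    """
    genes_by_repeat: Dict[str, Set[str]] = defaultdict(set)
    for gene, repeat in intron_hits:
        genes_by_repeat[repeat].add(gene)  # duplicate hits collapse into the set
    return {rep: genes for rep, genes in genes_by_repeat.items() if len(genes) >= 2}


def node_genes(networks: Dict[str, Set[str]], min_networks: int = 2) -> List[Tuple[str, int]]:
    """Return (gene, n_networks) for genes shared by at least `min_networks`
    networks, most connected first (ties broken by gene ID)."""
    membership: Dict[str, int] = defaultdict(int)
    for genes in networks.values():
        for gene in genes:
            membership[gene] += 1
    ranked = [(gene, n) for gene, n in membership.items() if n >= min_networks]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical intron hits: g1 carries three repeats, two of them shared.
    hits = [("g1", "repA"), ("g2", "repA"), ("g1", "repB"), ("g3", "repB"),
            ("g1", "repC"), ("g4", "repD"), ("g1", "repA")]
    nets = repeat_networks(hits)
    print(sorted(nets))      # ['repA', 'repB']
    print(node_genes(nets))  # [('g1', 2)]
```

Counting distinct repeats per gene before filtering would give a rough analogue of the TE-dr Nb column; restricting to shared repeats, as here, keeps only those TEs that actually connect genes.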