Yuki Yoshida1, Nurislam Shaikhutdinov2, Olga Kozlova2, Masayoshi Itoh3, Michihira Tagami4, Mitsuyoshi Murata4, Hiromi Nishiyori-Sueki4, Miki Kojima-Ishiyama4, Shohei Noma4, Alexander Cherkasov2, Guzel Gazizova2, Aigul Nasibullina2, Ruslan Deviatiiarov2, Elena Shagimardanova2, Alina Ryabova2, Katsushi Yamaguchi5, Takahiro Bino5, Shuji Shigenobu5, Shoko Tokumoto6, Yugo Miyata7, Richard Cornette7, Takahiro G Yamada8, Akira Funahashi8, Masaru Tomita1, Oleg Gusev2, Takahiro Kikawada6.
Abstract
Non-biting midges (Chironomidae) are known to inhabit a wide range of environments, and certain species can tolerate extreme conditions, where the rest of insects cannot survive. In particular, the sleeping chironomid Polypedilum vanderplanki is known for the remarkable ability of its larvae to withstand almost complete desiccation by entering a state called anhydrobiosis. Chromosome numbers in chironomids are higher than in other dipterans and this extra genomic resource might facilitate rapid adaptation to novel environments. We used improved sequencing strategies to assemble a chromosome-level genome sequence for P. vanderplanki for deep comparative analysis of genomic location of genes associated with desiccation tolerance. Using whole genome-based cross-species and intra-species analysis, we provide evidence for the unique functional specialization of Chromosome 4 through extensive acquisition of novel genes. In contrast to other insect genomes, in the sleeping chironomid a uniquely high degree of subfunctionalization in paralogous anhydrobiosis genes occurs in this chromosome, as well as pseudogenization in a highly duplicated gene family. Our findings suggest that the Chromosome 4 in Polypedilum is a site of high genetic turnover, allowing it to act as a 'sandbox' for evolutionary experiments, thus facilitating the rapid adaptation of midges to harsh environments.
Year: 2022 PMID: 35387384 PMCID: PMC8982440 DOI: 10.1093/nargab/lqac029
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Statistics of P. vanderplanki genome
| | Pv0.9 | Pv5.2 |
|---|---|---|
| Main genome scaffold total | 9104 | 388 |
| Main genome contig total | 25 913 | 2067 |
| Main genome scaffold sequence total (Mb) | 116.771 | 118.969 |
| Main genome contig sequence total (Mb) | 111.894 | 118.352 |
| % Gap | 4.177 | 0.518 |
| Main genome scaffold N/L50 | 101/264.32 kb | 2/35.209 Mb |
| Main genome contig N/L50 | 2191/12.463 kb | 161/219.078 kb |
| Main genome scaffold N/L90 | 1986/4.421 kb | 4/14.02 Mb |
| Main genome contig N/L90 | 10 658/2.139 kb | 596/49.999 kb |
| Max scaffold length | 2.184 Mb | 36.877 Mb |
| Max contig length | 358 kb | 1.494 Mb |
| Number of scaffolds >50 kb | 358 | 7 |
| % Main genome in scaffolds >50 kb | 77.60 | 98.98 |
| % GC | 28.3% | 28.1% |
| % N | 4.2% | 0.5% |
| BUSCO4 completeness (Diptera) | 92.9% [91.9%,1.0%], 2.7%, 4.4% | 95.7% [94.7%,1.0%], 0.6%, 3.7% |
| BUSCO4 completeness (Insecta) | 96.4% [95.4%,1.0%], 1.5%, 2.1% | 98.3% [90.9%,7.4%], 0.5%, 1.2% |
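The "N/L50" and "N/L90" rows above appear to list a sequence count first and a length second. A minimal Python sketch of that statistic follows, assuming this count-then-length reading; the function name `n_l_stat` is hypothetical and not taken from the paper. As input it uses only the four chromosome lengths from the "Length (bp)" row of the chromosome table in this record, so the small unplaced scaffolds are left out.

```python
def n_l_stat(lengths, fraction=0.5):
    """Return (count, length) for a set of sequence lengths.

    count  : smallest number of longest sequences whose combined length
             reaches `fraction` of the total length
    length : length of the shortest sequence in that set
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    # Only reachable through floating-point edge cases; fall back to all sequences.
    return len(ordered), ordered[-1]


# Chromosome lengths (bp) as listed in the chromosome table of this record.
chromosomes = [36_877_143, 35_209_052, 31_432_203, 14_019_908]

print(n_l_stat(chromosomes, 0.5))  # count and length at 50% of the total
print(n_l_stat(chromosomes, 0.9))  # count and length at 90% of the total
```

With these four lengths the sketch gives 2 sequences at 35 209 052 bp for the 50% threshold and 4 sequences at 14 019 908 bp for the 90% threshold, in line with the Pv5.2 scaffold values in the table.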
Figure 1. Chromosome-scale genome assembly of Polypedilum vanderplanki. (A) Circos plot of the four chromosomes. From the outer rim: ARId regions, AT% (purple), gene count for each 50 kb window (red), non-coding gene count for each 50 kb window (blue), genome coverage of Illumina DNA-Seq (SRR12736661, SRR12736660, SRR12736662, SRR12736659, yellow), RNA-Seq coverage (DRR024752, DRR024753, DRR024754, DRR024755, DRR024756, green), and collinear blocks calculated with MCScanX (black: inter-chromosomal, red: intra-chromosomal). The colors used for each chromosome are reused in subsequent figures. Individual figures are shown in Additional Figure S1. (B) Contact map of Hi-C reads. Hi-C reads were mapped to the genome assembly, and KR-transformed contact frequencies at 250 kb resolution were visualized as a contact map with Juicer. Gray arrowheads indicate possible telomeric regions. (C) GC ratio of 100 kb windows. (D) GC ratio of the coding sequence from the longest isoform of each gene. The black lines indicate the chromosome averages (chr_1: 34.20%, chr_2: 34.60%, chr_3: 34.17%, chr_4: 31.50%). (E) Pairwise nucleotide diversity (π) of 50 kb windows. An increase in π can be observed in the latter half of Chromosome 3 and across most of Chromosome 4.
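Panel C reports GC ratios in 100 kb windows. The record does not say how the windows were computed, so the following Python sketch shows just one plausible way to do it. It assumes non-overlapping windows, ignores bases other than A, C, G and T (such as N), and keeps a shorter final window. The function name `gc_windows` and the toy sequence are illustrative only.

```python
def gc_windows(sequence, window=100_000):
    """GC ratio for consecutive non-overlapping windows of `sequence`.

    Bases other than A/C/G/T are excluded from the denominator; a window
    containing no called bases yields None.
    """
    seq = sequence.upper()
    ratios = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            ratios.append(None)
            continue
        gc = chunk.count("G") + chunk.count("C")
        ratios.append(gc / called)
    return ratios


# Toy input with a 6 bp window so the behaviour is easy to follow.
toy = "ATGCGC" + "ATATAT" + "NNNNNN" + "GG"
print(gc_windows(toy, window=6))
```

In a real analysis the sequence would come from the assembly FASTA, and the per-chromosome averages would be taken over the returned windows.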
Statistics of gene structure obtained by TALON
| Data preparation | | | | | | |
|---|---|---|---|---|---|---|
| Category | Genome | | Braker | | Final | |
| Version | Pv_5.0 | | Pv_5.0.1 | | Pv_5.2.4 | |
| | | | | | | |
| Total genes | - | - | 17 852 | 100.00% | 18 990 | 100.00% |
| Total transcripts | - | - | 19 117 | 107.09% | 67 079 | 353.23% |
| | | | | | | |
| Complete | 98.3 | 95.7 | 98.3 | 95.9 | 98.6 | 96.2 |
| Single | 97.1 | 94.7 | 90.9 | 55.3 | 50.3 | 57.0 |
| Duplicated | 1.2 | 1.0 | 7.4 | 40.6 | 48.3 | 39.2 |
| Fragmented | 0.2 | 0.6 | 0.5 | 0.9 | 0.4 | 1.0 |
| Missing | 1.5 | 3.7 | 1.2 | 3.2 | 1.0 | 2.8 |
| | | | | | | |
| Wet-1 | - | | 95 103 034 | 84.24% | 104 346 021 | 92.19% |
| Wet-2 | - | | 102 208 115 | 84.32% | 111 736 595 | 91.96% |
| D24-1 | - | | 106 902 973 | 83.57% | 119 406 151 | 93.12% |
| D24-2 | - | | 101 771 203 | 82.68% | 114 237 766 | 92.55% |
| D48 | - | | 113 995 064 | 83.84% | 127 677 786 | 93.61% |
| Pv11-T0 | - | | 98 544 418 | 80.09% | 111 471 188 | 90.23% |
| Pv11-T48 | - | | 91 836 757 | 79.69% | 103 011 915 | 89.12% |
Figure 2. Chromosome 4 lacks synteny blocks with other Diptera. (A) GC ratios and gene conservation ratios between dipteran species. Conservation ratio and GC ratio (genome and single-copy genes) were plotted against a phylogenetic tree built from 1,014 single-copy orthologs obtained by OrthoFinder clustering of longest isoforms. Homologs were determined by reciprocal DIAMOND blastp searches, and conservation ratios were counted for each chromosome of P. vanderplanki, A. aegypti, A. gambiae and D. melanogaster. A red asterisk indicates the genome-level GC ratio. Only autosomes and sex chromosomes are visualized (unplaced scaffolds are skipped). (B, C) Detection of collinear blocks between (B) the two Polypedilum species (P. vanderplanki and P. pembai) and (C) members of the family Chironomidae (B. antarctica, C. marinus and P. vanderplanki). Amino acid sequences of longest isoforms were submitted to DIAMOND blastp searches, and collinear blocks were detected and visualized with MCScanX and SynVisio. Links are colored according to the chromosome colors used in Figure 1A.
Figure 3. Functional and non-functional multi-copy ortholog groups. (A) The cumulative number of genes specific to P. vanderplanki, specific to the genus Polypedilum, and conserved within Diptera (Conserved), along the genome. Clade specificity was determined by gene counts of OrthoFinder ortholog groups. (B) Gene ontology enrichment analysis of Polypedilum-specific genes on Chromosome 4. Only terms in the Biological Process category are shown. (C) dN/dS values between the three P. pembai orthologs (g14092.t1, g2359.t1, g4021.t1) and P. vanderplanki orthologs. Coding sequences were aligned with MAFFT, and dN/dS values were calculated with codeml. (D) Differential expression information of LEA protein gene orthologs in ARId1. Conditions identified as differentially expressed are indicated in colors (up-regulated: red, down-regulated: blue). From the top row: (1) Heat 42°C T1; (2) Heat 42°C T24; (3) Paraquat T1; (4) Paraquat T24; (5) Mannitol T3; (6) Mannitol T24; (7) NaCl T1; (8) NaCl T3; (9) NaCl T24; (10) Trehalose T1; (11) Trehalose T3; (12) PreCondTre T0 vs T12; (13) PreCondTre T12 vs T24; (14) PreCondTre T24 vs T36; (15) PreCondTre T36 vs T48; (16) PreCondTre T48 vs Rehydration T0; (17) Rehydration T0 vs T3; (18) Rehydration T3 vs T12; (19) Rehydration T12 vs T24; (20) Rehydration T24 vs T72. (E) The pI values of proteins deduced from Lea orthologs in each Block. *** p-value < 0.001.
High proportion of non-expressed genes on Chromosome 4
| | Chromosome 1 | | Chromosome 2 | | Chromosome 3 | | Chromosome 4 | |
|---|---|---|---|---|---|---|---|---|
| Length (bp) | 36 877 143 | 31% | 35 209 052 | 30% | 31 432 203 | 27% | 14 019 908 | 12% |
| Genes | | | | | | | | |
| Total | 5275 | 30% | 4911 | 28% | 4197 | 24% | 3241 | 18% |
| No. gene/Mb | 143.0 | | 139.5 | | 133.5 | | 231.2 | |
| Protein coding | 4908 | 93% | 4588 | 93% | 3918 | 93% | 3136 | 97% |
| Non-coding | 367 | 7% | 323 | 7% | 279 | 7% | 105 | 3% |
| Gene TPM > 1 | 2987 | 57% | 2738 | 56% | 2215 | 53% | 1156 | 36% |
| Gene TPM ≤ 1 | 2288 | 43% | 2173 | 44% | 1982 | 47% | 2085 | 64% |
| TPM > 1/Mb | 81.0 | | 77.8 | | 70.5 | | 82.5 | |
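The per-megabase rows in this table can be recomputed from the length and gene-count rows, which makes the contrast on Chromosome 4 easier to see. The short Python sketch below does this using only values copied from the table; the dictionary layout and the helper name `per_mb` are illustrative choices.

```python
# Values copied from the table: chromosome length, total genes, genes with TPM > 1.
chromosomes = {
    "Chromosome 1": {"length_bp": 36_877_143, "genes": 5275, "expressed": 2987},
    "Chromosome 2": {"length_bp": 35_209_052, "genes": 4911, "expressed": 2738},
    "Chromosome 3": {"length_bp": 31_432_203, "genes": 4197, "expressed": 2215},
    "Chromosome 4": {"length_bp": 14_019_908, "genes": 3241, "expressed": 1156},
}


def per_mb(count, length_bp):
    """Convert a raw count into a density per megabase (1 Mb = 1,000,000 bp)."""
    return count / (length_bp / 1_000_000)


for name, c in chromosomes.items():
    gene_density = per_mb(c["genes"], c["length_bp"])
    expressed_density = per_mb(c["expressed"], c["length_bp"])
    expressed_share = c["expressed"] / c["genes"]
    print(f"{name}: {gene_density:.1f} genes/Mb, "
          f"{expressed_density:.1f} genes with TPM > 1 per Mb, "
          f"{expressed_share:.0%} of genes with TPM > 1")
```

Chromosome 4 carries about 231 genes per Mb, against 134 to 143 on the other chromosomes, yet its density of genes with TPM > 1 (about 82 per Mb) is close to theirs. The excess genes on Chromosome 4 are therefore largely the ones at or below the TPM threshold.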