Zhongjun Gong, Tong Li, Jin Miao, Yun Duan, Yueli Jiang, Huiling Li, Pei Guo, Xueqin Wang, Jing Zhang, Yuqing Wu.
Abstract
The orange wheat blossom midge Sitodiplosis mosellana Géhin (Diptera: Cecidomyiidae), an economically important pest, has caused serious yield losses in most wheat-growing areas worldwide in the past half-century. A high-quality chromosome-level genome for S. mosellana was assembled using PacBio long-read, Illumina short-read, and Hi-C sequencing technologies. The final genome assembly was 180.69 Mb, with contig and scaffold N50 sizes of 998.71 kb and 44.56 Mb, respectively. Hi-C scaffolding reliably anchored 4 pseudochromosomes, accounting for 99.67% of the assembled genome. In total, 12,269 protein-coding genes were predicted, of which 91% were functionally annotated. Phylogenetic analysis indicated that S. mosellana and its close relative, the swede midge Contarinia nasturtii, diverged about 32.7 MYA. The S. mosellana genome showed high chromosomal synteny with the genomes of Drosophila melanogaster and Anopheles gambiae. The key gene families involved in the detoxification of plant secondary metabolites were analyzed. The high-quality S. mosellana genome data will provide an invaluable resource for research in a broad range of areas, including the biology, ecology, genetics, and evolution of midges, as well as insect-plant interactions and coevolution.
Keywords: Sitodiplosis mosellana; Hi-C; chromosome-level genome; comparative genomics; detoxification
Year: 2022 PMID: 35751604 PMCID: PMC9339269 DOI: 10.1093/g3journal/jkac161
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. Genome characteristics of the orange wheat blossom midge (OWBM), S. mosellana. Circos plot showing the genomic features. Units on the circumference are megabase values of pseudomolecules. From outermost to innermost circles: Track a: the 4 chromosomes of the genome; Track b: gene distribution on each chromosome; Track c: GC content distribution on each chromosome; Track d: LTR distribution on each chromosome; Track e: LINE distribution on each chromosome; Track f: DNA transposon distribution on each chromosome; Track g: SINE distribution on each chromosome; Track h: tRNA located on chromosomes; Track i: miRNA located on chromosomes; Track j: snRNA located on chromosomes; Track k: rRNA located on chromosomes.
Comparison of S. mosellana genome assemblies from this and a previous study.
| Assembly | ASM2101890v1 (this study) | AAFC_SMos_1.0 (from Agriculture and Agri-Food Canada) |
|---|---|---|
| Bioproject | PRJNA720212 | PRJNA563698 |
| DNA resource | Third-instar larvae | Single pupa |
| Assembly approach | Falcon | Supernova |
| Sequencing platform | NovaSeq/PacBio | Illumina HiSeq |
| Assembly level | Chromosomes | Scaffolds |
| Number of contigs | 381 | 11,287 |
| Contig N50 (bp) | 988,708 | 62,752 |
| Number of scaffolds | 25 | 7,269 |
| Scaffold N50 (bp) | 44,562,869 | 5,125,045 |
| Total gap length (bp) | 35,600 | 13,573,270 |
| Total sequence length (bp) | 180,693,642 | 208,800,104 |
| Ungapped bases (bp) | 180,658,042 | 195,226,834 |
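The N50 values in the table follow the usual definition: the length L such that sequences of length L or longer cover at least half of the assembly. A minimal Python sketch of that calculation is given here for orientation; the function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or data. The last lines check that the ungapped base count reported for ASM2101890v1 equals the total sequence length minus the total gap length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, NOT taken from the paper).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)

# Values copied from the ASM2101890v1 column of the table above.
total_length_bp = 180_693_642
gap_length_bp = 35_600
ungapped_bp = total_length_bp - gap_length_bp

print(toy_n50, ungapped_bp)
```

Running the same function over the real contig or scaffold lengths of an assembly (for example, read from a FASTA index) would be the way to reproduce the tabulated N50 values.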
Statistics of the completeness of the assembled S. mosellana genome by BUSCO.
| Type | BUSCO groups | Percentage (%) |
|---|---|---|
| Complete BUSCOs | 907 | 92.7 |
| Complete and single-copy BUSCOs | 887 | 90.7 |
| Complete duplicated BUSCOs | 20 | 2.0 |
| Fragmented BUSCOs | 4 | 0.4 |
| Missing BUSCOs | 67 | 6.9 |
| Total BUSCO groups searched | 978 | 100 |
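The BUSCO percentages are simple ratios of each category to the total number of groups searched. The short Python sketch below recomputes them from the counts in the table as a consistency check; the dictionary keys and the helper `pct` are names chosen for illustration and are not part of BUSCO's output format.

```python
# Counts copied from the BUSCO table above.
busco_counts = {
    "complete_single_copy": 887,
    "complete_duplicated": 20,
    "fragmented": 4,
    "missing": 67,
}

# The four categories partition the searched groups.
total_groups = sum(busco_counts.values())

# "Complete" is the sum of single-copy and duplicated groups.
complete = (
    busco_counts["complete_single_copy"] + busco_counts["complete_duplicated"]
)


def pct(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


percentages = {name: pct(n, total_groups) for name, n in busco_counts.items()}
percentages["complete"] = pct(complete, total_groups)

print(total_groups, percentages)
```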
Statistics of gene function annotation of S. mosellana.
| Database | Number | Percentage (%) |
|---|---|---|
| Total | 12,269 | – |
| Swiss-Prot | 9,292 | 75.70 |
| Nr | 10,869 | 88.60 |
| KEGG | 9,245 | 75.40 |
| InterPro | 10,235 | 83.40 |
| GO | 7,201 | 58.70 |
| Pfam | 9,121 | 74.30 |
| Annotated | 11,169 | 91.00 |
| Unannotated | 1,100 | 9.00 |
Fig. 2. Phylogenetic tree, gene orthology, and synteny blocks. a) The phylogenetic tree was constructed from 1,024 single-copy gene families across 14 insects and 1 noninsect species, using the RAxML maximum-likelihood method. Bootstrap values are 100 at all nodes based on 100 replicates. The numbers near each node correspond to the estimated divergence times of these species. The colored bars to the right are subdivided to represent different types of orthology. “Single-copy genes” indicates single-copy orthologous genes in common gene families; “Multiple-copy genes” indicates multiple-copy orthologous genes in common gene families; “Unique genes” indicates genes from gene families unique to each species; “Other genes” indicates genes that do not belong to any of the above-mentioned ortholog categories; “Uncluster” indicates genes that do not cluster into any family. b) Venn diagram of the orthologous gene families from 3 gall midges: S. mosellana, C. nasturtii, and M. destructor. c) Synteny blocks between S. mosellana, D. melanogaster, and A. gambiae.
Fig. 3. Gene family evolution between the genomes of S. mosellana and 14 other arthropod species. The left number indicates gene family expansions and the right number indicates gene family contractions. Branch lengths indicate divergence times. MRCA: most recent common ancestor.