Zi-Long Wang, Yong-Qiang Zhu, Qing Yan, Wei-Yu Yan, Hua-Jun Zheng, Zhi-Jiang Zeng.
Abstract
Apis cerana, one of the main farmed honeybee species, is widely distributed across Asian countries. The genome of A. cerana has been sequenced by several research groups using second-generation sequencing technologies, but a more complete and accurate genome sequence is still needed. Here we present a chromosome-scale assembly of the A. cerana genome produced with single-molecule real-time (SMRT) Pacific Biosciences sequencing and high-throughput chromatin conformation capture (Hi-C) scaffolding. The updated assembly is 215.67 Mb in size with a contig N50 of 4.49 Mb, a 212-fold improvement over the previous Illumina-based version. Hi-C scaffolding yielded 16 pseudochromosomes that account for 97.85% of the assembled genome sequence. A total of 10,741 protein-coding genes were predicted, of which 9,627 were annotated. In addition, 314 new genes were identified relative to the previous version. This improved, high-quality A. cerana reference genome will provide precise sequence information for biological research on A. cerana.
Keywords: Apis cerana; Hi-C; SMRT; chromosome-scale; genome assembly
Year: 2020 PMID: 32292419 PMCID: PMC7119468 DOI: 10.3389/fgene.2020.00279
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Summary of sequencing data for the new assembly of A. cerana genome.
| Statistic | PacBio | Hi-C |
|---|---|---|
| Number of bases | 28,919,033,693 | 42,964,780,942 |
| Number of reads | 1,379,634 | 143,436,579 |
| N50 read length (bp) | 38,250 | 150 |
| Mean read length (bp) | 20,961 | 150 |
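The totals in this table allow a quick arithmetic check of the mean read length and a rough estimate of sequencing depth. The following is a minimal Python sketch, not part of the original record. It assumes that the first data column is the PacBio long-read data (suggested by its 38,250 bp N50 read length), that the second is the Hi-C short-read data, and that the 215,670,033 bp Hi-C assembly length reported for this assembly can stand in for the genome size. All variable names are illustrative.

```python
# Totals copied from the sequencing summary table.
PACBIO_BASES = 28_919_033_693
PACBIO_READS = 1_379_634
HIC_BASES = 42_964_780_942
HIC_READS = 143_436_579

# Assumption: the final Hi-C assembly length is used as the genome size.
ASSEMBLY_LENGTH = 215_670_033

# Mean read length is total bases divided by the number of reads.
pacbio_mean_read_length = PACBIO_BASES / PACBIO_READS

# Approximate depth is total bases divided by genome size.
pacbio_depth = PACBIO_BASES / ASSEMBLY_LENGTH
hic_depth = HIC_BASES / ASSEMBLY_LENGTH

# Bases per entry in the Hi-C "Number of reads" row.
hic_bases_per_read_entry = HIC_BASES / HIC_READS

print(f"PacBio mean read length: {pacbio_mean_read_length:,.0f} bp")
print(f"PacBio depth: {pacbio_depth:.1f}x")
print(f"Hi-C depth: {hic_depth:.1f}x")
print(f"Hi-C bases per read entry: {hic_bases_per_read_entry:.1f}")
```

The PacBio ratio reproduces the tabulated mean read length of 20,961 bp. The Hi-C ratio comes out close to 300, which would be consistent with the read count referring to pairs of 150 bp reads, although the table itself does not say so.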
Statistics of the PacBio and Hi-C assembly.
| Statistic | PacBio assembly | Hi-C assembly |
|---|---|---|
| Assembly length (bp) | 215,661,233 | 215,670,033 |
| Number | 200 | 126 |
| N50 (bp) | 4,485,954 | 13,422,783 |
| Largest (bp) | 11,106,448 | 26,804,424 |
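For readers unfamiliar with the N50 statistic reported above: it is the sequence length at which sequences of that length or longer together cover at least half of the total assembly. The sketch below is an illustrative Python implementation of that definition, not the authors' tooling. The function name `n50` and the toy lengths are hypothetical, since the record gives only summary values and not the individual contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum
        # to at least half of the total assembly length.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection of positive values")


# Hypothetical toy assembly, not data from the paper.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

In the toy example the total is 30, and the two longest sequences (10 and 8) already sum to 18, so the N50 is 8. Applied to the real contig or scaffold lengths, the same calculation would yield the values in the N50 row.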
FIGURE 1. Heat map of Hi-C contact information of the 16 chromosomes. Pixel colors represent different normalized counts of Hi-C links between 50-kb non-overlapping windows for all 16 chromosomes (chr) on a logarithmic scale.
Transposable elements and repeat sequence statistics.
| Type | Number of elements | Length (bp) | Percentage of genome (%) |
|---|---|---|---|
| LINEs | 786 | 84,104 | 0.04 |
| LTR elements | 983 | 441,743 | 0.20 |
| DNA elements | 5,809 | 1,393,124 | 0.65 |
| Unclassified | 24,299 | 6,823,093 | 3.16 |
| Small RNA | 45 | 31,168 | 0.01 |
| Simple repeats | 191,534 | 8,693,708 | 4.03 |
| Low complexity | 43,237 | 2,285,374 | 1.06 |
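The last column of this table appears to be each repeat class's share of the assembly. The following Python sketch checks that reading by dividing each length by the 215,670,033 bp Hi-C assembly length. Treating that length as the denominator is an assumption, and the dictionary and variable names are illustrative.

```python
# Assumption: percentages are relative to the Hi-C assembly length.
ASSEMBLY_LENGTH = 215_670_033

# Repeat class -> (length in bp, percentage as listed in the table).
repeats = {
    "LINEs": (84_104, 0.04),
    "LTR elements": (441_743, 0.20),
    "DNA elements": (1_393_124, 0.65),
    "Unclassified": (6_823_093, 3.16),
    "Small RNA": (31_168, 0.01),
    "Simple repeats": (8_693_708, 4.03),
    "Low complexity": (2_285_374, 1.06),
}

# Recompute each percentage and round to two decimals, as in the table.
recomputed = {
    name: round(100 * length / ASSEMBLY_LENGTH, 2)
    for name, (length, _listed) in repeats.items()
}

for name, (_length, listed) in repeats.items():
    print(f"{name}: listed {listed:.2f}%, recomputed {recomputed[name]:.2f}%")
```

Every recomputed value matches the listed percentage at two decimal places, which supports reading the column as a percentage of the assembled genome.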
General statistics of the functional annotation.
| Category | Number of genes | Percentage (%) |
|---|---|---|
| Total | 10,741 | 100 |
| KEGG | 9,303 | 86.61 |
| KOG | 7,629 | 71.03 |
| Pfam | 7,776 | 72.40 |
| Uniprot | 4,884 | 45.47 |
| Unannotated | 1,114 | 10.37 |
Statistics of the BUSCO assessment.
| Category | Number | Percentage (%) |
|---|---|---|
| Complete BUSCOs | 3,801 | 86.09 |
| Complete and single-copy BUSCOs | 3,792 | 85.89 |
| Complete and duplicated BUSCOs | 9 | 0.20 |
| Fragmented BUSCOs | 315 | 7.13 |
| Missing BUSCOs | 299 | 6.77 |
| Total BUSCO groups searched | 4,415 | 100 |
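The BUSCO counts are internally consistent, which can be verified with a few lines of arithmetic. The Python sketch below checks that the categories add up and then formats the counts in the compact one-line notation commonly used for BUSCO summaries. It is an illustration built only from the numbers in the table, and the variable names are hypothetical.

```python
# Counts copied from the BUSCO assessment table.
single_copy = 3_792
duplicated = 9
fragmented = 315
missing = 299
total = 4_415

# Complete BUSCOs are the single-copy and duplicated ones together.
complete = single_copy + duplicated


def pct(count):
    """Percentage of all BUSCO groups searched."""
    return 100 * count / total


# Compact one-line summary with one decimal place per category.
summary = (
    f"C:{pct(complete):.1f}%[S:{pct(single_copy):.1f}%,"
    f"D:{pct(duplicated):.1f}%],F:{pct(fragmented):.1f}%,"
    f"M:{pct(missing):.1f}%,n:{total}"
)
print(summary)
```

The complete, fragmented, and missing counts sum to the 4,415 groups searched, and the complete count matches the 3,801 listed in the table.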
FIGURE 2. The atlas of the A. cerana chromosomes.