Kui Wang, Pengpeng Li, Yongyang Gao, Chunqin Liu, Qinglei Wang, Jiao Yin, Jie Zhang, Lili Geng, Changlong Shu.
Abstract
BACKGROUND: Protaetia brevitarsis, commonly known as the white-spotted flower chafer, is an important scarab beetle (Scarabaeidae) distributed in most Asian countries. Recent research on its damage to crops, its use in agricultural waste utilization, its edibility and medicinal value, and its usefulness in insect immunology has underscored the need for a detailed study of its biology. Here, we sequenced the whole genome of this species to support further study of P. brevitarsis.
Keywords: Protaetia brevitarsis; assembly; genome; white-spotted flower chafer
Year: 2019 PMID: 30949689 PMCID: PMC6449472 DOI: 10.1093/gigascience/giz019
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Image of an adult white-spotted flower chafer, P. brevitarsis.
Summary statistics of generated sequence data
| Library name | Experiment title | Sequencing instrument | Total bases (bp) | Accession No. |
|---|---|---|---|---|
| Raw_200_DNA_Hiseq | DNA paired-end (PE) library | Illumina HiSeq 2500 | 48,637,157,380 | - |
| Raw_420_DNA_Hiseq | DNA PE library | Illumina HiSeq 2500 | 59,133,181,272 | - |
| Filtered_200_DNA_Hiseq | DNA PE library | Illumina HiSeq 2500 | 46,322,512,285 | SRR7421508 |
| Filtered_420_DNA_Hiseq | DNA PE library | Illumina HiSeq 2500 | 40,349,624,172 | SRR7421507 |
| DNA_PacBio1 | DNA PacBio library | PacBio RS II | 1,248,598,019 | SRR7429397 |
| DNA_PacBio2 | DNA PacBio library | PacBio RS II | 1,742,919,487 | SRR7429396 |
| DNA_PacBio3 | DNA PacBio library | PacBio RS II | 1,471,376,296 | SRR7429395 |
| DNA_PacBio4 | DNA PacBio library | PacBio RS II | 1,446,032,590 | SRR7429394 |
| DNA_PacBio5 | DNA PacBio library | PacBio RS II | 1,410,533,432 | SRR7429401 |
| DNA_PacBio6 | DNA PacBio library | PacBio RS II | 1,303,543,797 | SRR7429400 |
| DNA_PacBio7 | DNA PacBio library | PacBio RS II | 1,185,731,970 | SRR7429399 |
| DNA_PacBio8 | DNA PacBio library | PacBio RS II | 1,360,241,545 | SRR7429398 |
| DNA_PacBio9 | DNA PacBio library | PacBio RS II | 1,033,036,210 | SRR7429403 |
| DNA_PacBio10 | DNA PacBio library | PacBio RS II | 981,818,132 | SRR7429402 |
| DNA_PacBio11 | DNA PacBio library | PacBio RS II | 1,192,589,806 | SRR7429389 |
| DNA_PacBio12 | DNA PacBio library | PacBio RS II | 707,437,407 | SRR7429388 |
| DNA_PacBio13 | DNA PacBio library | PacBio RS II | 659,418,664 | SRR7429391 |
| DNA_PacBio14 | DNA PacBio library | PacBio RS II | 618,638,129 | SRR7429390 |
| DNA_PacBio15 | DNA PacBio library | PacBio RS II | 630,384,409 | SRR7429393 |
| DNA_PacBio16 | DNA PacBio library | PacBio RS II | 761,167,622 | SRR7429392 |
| DNA_PacBio17 | DNA PacBio library | PacBio RS II | 2,180,394,708 | SRR7470031 |
| DNA_PacBio18 | DNA PacBio library | PacBio RS II | 2,035,388,872 | SRR7470028 |
| DNA_PacBio19 | DNA PacBio library | PacBio RS II | 1,796,143,706 | SRR7470027 |
| DNA_PacBio20 | DNA PacBio library | PacBio RS II | 1,980,034,243 | SRR7470030 |
| DNA_PacBio21 | DNA PacBio library | PacBio RS II | 2,229,575,050 | SRR7470029 |
| Egg | RNA-Seq library | Illumina HiSeq 2500 | 6,049,557,600 | SRR7418793 |
| Larva | RNA-Seq library | Illumina HiSeq 2500 | 6,112,599,900 | SRR7418797 |
| Prepupal | RNA-Seq library | Illumina HiSeq 2500 | 6,168,021,600 | SRR7418791 |
| Middle pupal | RNA-Seq library | Illumina HiSeq 2500 | 6,015,743,700 | SRR7418789 |
| Late pupal | RNA-Seq library | Illumina HiSeq 2500 | 6,260,516,400 | SRR7418796 |
| Male adult | RNA-Seq library | Illumina HiSeq 2500 | 6,054,195,300 | SRR7418798 |
| Female adult | RNA-Seq library | Illumina HiSeq 2500 | 6,188,099,400 | SRR7418790 |
| Forewing (D1) | RNA-Seq library | Illumina HiSeq 2500 | 6,234,580,800 | SRR7585362 |
| Forewing (D3) | RNA-Seq library | Illumina HiSeq 2500 | 6,208,411,800 | SRR7418792 |
| Underwing (D1) | RNA-Seq library | Illumina HiSeq 2500 | 6,154,223,400 | SRR7418801 |
| Underwing (D3) | RNA-Seq library | Illumina HiSeq 2500 | 6,172,792,500 | SRR7418794 |
| Head (D1) | RNA-Seq library | Illumina HiSeq 2500 | 6,090,345,900 | SRR7418799 |
| Head (D3) | RNA-Seq library | Illumina HiSeq 2500 | 6,247,745,100 | SRR7418800 |
Note: D1 and D3 denote tissues from newly emerged (1-day) and 3-day-old adults, respectively.
Summary statistics of RNA-Seq reads mapped onto the assemblies
| Sample | No. of reads | Reads mapped to scaffolds (No. [%]) | Reads mapped to ASs (No. [%]) |
|---|---|---|---|
| Egg | 40,330,384 | 35,659,462 (88.42) | 14,961,772 (37.10) |
| Larva | 40,750,666 | 33,467,876 (82.13) | 15,266,678 (37.46) |
| Prepupal stage | 41,120,144 | 34,780,542 (84.58) | 15,172,926 (36.90) |
| Middle pupal stage | 40,104,958 | 36,742,877 (91.62) | 15,307,418 (38.17) |
| Late pupal stage | 41,736,776 | 37,468,206 (89.77) | 17,178,294 (41.16) |
| Male adult | 40,361,302 | 36,198,806 (89.69) | 14,518,842 (35.97) |
| Female adult | 41,253,996 | 32,620,778 (79.07) | 16,135,954 (39.11) |
| Forewing (D1) | 41,563,872 | 36,449,354 (87.69) | 9,233,440 (22.22) |
| Forewing (D3) | 41,389,412 | 35,727,909 (86.32) | 13,516,032 (32.66) |
| Underwing (D1) | 41,028,156 | 36,669,771 (89.38) | 14,943,970 (36.42) |
| Underwing (D3) | 41,151,950 | 37,484,048 (91.09) | 16,851,062 (40.95) |
| Head (D1) | 40,602,306 | 32,278,844 (79.50) | 11,935,160 (29.40) |
| Head (D3) | 41,651,634 | 35,214,660 (84.55) | 11,779,198 (28.28) |
Note: D1 and D3 denote tissues from newly emerged (1-day) and 3-day-old adults, respectively.
Figure 2. The 17-mer distribution of the P. brevitarsis genome, computed with the Jellyfish [13] program from the 420-bp paired-end whole-genome sequencing data.
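A k-mer distribution like the one in Figure 2 is a histogram of how many distinct k-mers occur at each multiplicity in the reads. The following minimal Python sketch shows the idea on in-memory reads. It is not the Jellyfish implementation, and the function names (`canonical`, `kmer_histogram`, `estimate_genome_size`) and the `min_depth` cutoff are assumptions made for illustration. The size estimate uses the common rule of thumb "total k-mers divided by peak depth", which is not necessarily the calculation the authors used and which oversimplifies a heterozygous genome with more than one peak.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=17):
    """Map multiplicity -> number of distinct canonical k-mers with that multiplicity.

    k-mers containing characters other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Rule-of-thumb estimate: total k-mers / peak depth.

    Multiplicities below min_depth (an assumed cutoff) are treated as
    sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

For real data, the histogram would come from a dedicated k-mer counter rather than from a Python loop; the sketch only makes the quantities behind the plot explicit.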
Summary statistics of data during the assembly process
| | No. | Total bases (bp) | N50 (bp) | Mean length (bp) |
|---|---|---|---|---|
| Filtered PacBio reads | 1,353,926 | 14,251,368,546 | 16,059 | 10,525 |
| Elementary contigs | 8,760 | 1,127,134,570 | 190,967 | 128,668 |
| HGCs | 3,816 | 738,878,186 | 347,620 | 193,626 |
| ASs | 4,939 | 391,445,919 | 91,687 | 79,256 |
| Scaffolds | 313 | 751,076,257 | 2,939,522 | 2,399,604 |
| Corrected HGCs | 3,821 | 739,117,100 | 327,214 | 193,435 |
| Corrected ASs | 4,939 | 393,190,609 | 92,105 | 79,609 |
| Corrected scaffolds | 313 | 751,076,257 | 2,939,522 | 2,399,604 |
| Final scaffolds | 327 | 750,736,501 | 2,939,521 | 2,295,830 |
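The N50 and mean-length columns above can be reproduced from a list of sequence lengths. The sketch below is a minimal illustration in Python; the function name `n50` is an assumption, and it uses the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# The mean lengths in the table are total bases divided by sequence count,
# truncated to whole base pairs (values taken from the table rows).
scaffold_mean = 751_076_257 // 313        # "Scaffolds" row
final_scaffold_mean = 750_736_501 // 327  # "Final scaffolds" row
```

As a small usage example, `n50([2, 3, 4, 5, 6])` returns 5: the two longest sequences (6 + 5 = 11) already cover half of the 20 total bases.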
Figure 3. Schematic illustration of the method used to detect aligned sequences (ASs) in the assembly. MUMmer was used to self-align the elementary contigs, and each contig pair was assigned to 1 of 4 cases. In Case I, the contig aligns to itself, and the alignment is ignored. In Case II, the contig (Contig 2) has no obvious alignment with any other contig and is defined as a haploid genome contig (HGC). In Cases III and IV, the contig under analysis aligns with another contig; the longer contig of each pair (Contigs 4 and 6 in the figure) is defined as an HGC because B is longer than A. In Case III, the shorter contig (Contig 3) is defined as an AS because the aligned sequence (a+b+c) accounts for >85% of its total non-repeat sequence length (A). In Case IV, the shorter contig (Contig 5) is considered a sequence duplication because the aligned sequence (a+b+c) accounts for <85% of its total non-repeat sequence length (A); Contig 5 is therefore defined as an HGC.
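The decision rule in the Figure 3 caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the `Alignment` record, the `classify_contigs` function, and the pre-summarized input (one record per contig pair, with the aligned non-repeat bases a+b+c already summed for the shorter contig) are hypothetical, and no MUMmer output is parsed. The caption does not say how a fraction of exactly 85% is handled; here it is treated as an HGC.

```python
from dataclasses import dataclass

AS_THRESHOLD = 0.85  # fraction of non-repeat length, from the Figure 3 caption


@dataclass(frozen=True)
class Alignment:
    contig_a: str
    contig_b: str
    aligned_bp: int  # summed aligned non-repeat bases (a+b+c) on the shorter contig


def classify_contigs(nonrepeat_len, alignments, threshold=AS_THRESHOLD):
    """Label each contig as 'HGC' (haploid genome contig) or 'AS' (aligned sequence).

    nonrepeat_len maps contig name -> non-repeat sequence length in bp.
    """
    # Case II: contigs with no alignment to another contig stay HGC.
    labels = {name: "HGC" for name in nonrepeat_len}
    for aln in alignments:
        if aln.contig_a == aln.contig_b:
            continue  # Case I: self-alignment is ignored
        # Cases III and IV: only the shorter contig of the pair can become an AS.
        shorter = min((aln.contig_a, aln.contig_b), key=lambda n: nonrepeat_len[n])
        fraction = aln.aligned_bp / nonrepeat_len[shorter]
        if fraction > threshold:
            labels[shorter] = "AS"  # Case III
        # Case IV: fraction at or below the threshold, the contig remains an HGC
    return labels


# Toy example mirroring the six contigs in the figure (lengths are made up).
lengths = {"c1": 1000, "c2": 500, "c3": 400, "c4": 900, "c5": 300, "c6": 800}
pairs = [
    Alignment("c1", "c1", 1000),  # Case I
    Alignment("c3", "c4", 360),   # Case III: 360 / 400 = 0.90
    Alignment("c5", "c6", 150),   # Case IV: 150 / 300 = 0.50
]
labels = classify_contigs(lengths, pairs)
```

In this toy input only `c3` is labeled `AS`. The sketch does not resolve chains in which a contig is the shorter partner in one pair and the longer partner in another.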
Summary statistics of Illumina genome-sequencing reads mapped onto the assemblies
| | Mean depth | Lowest depth | Highest depth |
|---|---|---|---|
| Corrected HGCs | 121.9 | 73.24 | 167.07 |
| Corrected ASs | 85.63 | 13.08 | 1,221.48 |
| Corrected scaffolds | 122.2 | 73.24 | 167.07 |
BUSCOs found in coleopteran genomes
| Species | Complete (%) | Fragment (%) | Missing (%) | Duplication (%) |
|---|---|---|---|---|
| | 80.45 | 10.00 | 9.55 | 8.80 |
| | 81.47 | 8.53 | 10.00 | 10.87 |
| | 82.31 | 8.70 | 8.99 | 9.12 |
| | 91.90 | 2.70 | 5.40 | 4.10 |
| | 93.00 | 1.90 | 5.10 | 7.20 |
| | 96.59 | 2.90 | 0.51 | 9.40 |
| | 98.80 | 0.60 | 0.60 | 7.20 |
Summary of identified repeat elements in the P. brevitarsis genome
| Repeat element | Haploid genome: length (bp) | Haploid genome: percentage (%) | ASs: length (bp) | ASs: percentage (%) |
|---|---|---|---|---|
| Long terminal repeat | 109,722,085 | 14.35 | 60,133,491 | 15.29 |
| Long interspersed nuclear element | 101,529,627 | 13.28 | 52,758,849 | 13.42 |
| Short interspersed nuclear element | 259,936 | 0.03 | 50,366 | 0.01 |
| DNA element | 166,788,392 | 21.81 | 92,972,801 | 23.65 |
| Simple repeat | 4,749,908 | 0.62 | 2,485,661 | 0.63 |
| Low complexity | 1,132,919 | 0.15 | 656,626 | 0.17 |
| Rolling circle | 7,162,276 | 0.94 | 5,220,692 | 1.33 |
| Satellite | 304,734 | 0.04 | 221,437 | 0.06 |
| Other | 131,605 | 0.02 | 99,113 | 0.03 |
| Unclassified | 4,451,277 | 0.58 | 5,618,712 | 1.43 |
| Total | 396,232,759 | 51.82 | 220,217,748 | 56.02 |
Summary of annotated genes in the P. brevitarsis genome
| Database | Annotated genes from haploid genome (No. [%]) | Annotated genes from ASs (No. [%]) |
|---|---|---|
| KEGG | 15,828 (71.16) | 7,980 (67.17) |
| Swiss-Prot | 10,509 (47.25) | 5,179 (43.59) |
| Nr | 17,487 (78.62) | 8,757 (73.71) |
| Nt | 3,688 (16.58) | 1,855 (15.61) |
| TrEMBL_eggNOG | 15,986 (71.87) | 8,029 (67.58) |
| No. of total annotated genes | 17,625 (79.24) | 8,887 (74.80) |
Figure 4. Phylogenetic relationship of P. brevitarsis and 6 coleopteran insects based on 2,354 orthologous genes. Divergence times, estimated using the D. ponderosae-T. castaneum split (180 Mya) as the calibration point, are shown [55].