Xinhua Fu, Jingjing Li, Yu Tian, Weipeng Quan, Shu Zhang, Qian Liu, Fan Liang, Xinlei Zhu, Liangsheng Zhang, Depeng Wang, Jiang Hu.
Abstract
Background: Fireflies are a family of insects within the order Coleoptera (beetles), and they are among the best-known and best-loved insects because of their bioluminescence. However, fireflies are in danger of extinction because of the massive destruction of their habitat. To improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis.
Findings: We developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single-molecule real-time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and covers 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs (BUSCOs). The k-mer analysis showed that this genome is highly heterozygous. Nevertheless, the long-read assembly is highly contiguous, with a contig N50 of 3.04 Mb and a longest contig of 13.69 Mb. Furthermore, 135 589 simple sequence repeats (SSRs) and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted; 88.44% of them were annotated with one or more related functions.
Conclusions: We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies but also provide a wealth of information for studying the mechanisms of their sexual communication, bioluminescence, and evolution.
Keywords: Pyrocoelia pectoralis; assembly; firefly; genome; long reads
Year: 2017 PMID: 29186486 PMCID: PMC5751067 DOI: 10.1093/gigascience/gix112
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Example of P. pectoralis (image from Xinhua Fu).
Comparison of genome features between P. pectoralis and D. melanogaster
| Type | P. pectoralis original assembly | P. pectoralis filtered assembly | D. melanogaster |
|---|---|---|---|
| Total number | 3517 | 474 | 2442 |
| Total length, bp | 1 119 821 639 | 760 416 098 | 142 573 024 |
| Average length, bp | 318 403 | 1 604 253 | 58 384 |
| N50 length, bp/number | 2 316 748/136 | 3 035 809/79 | 21 485 538/3 |
| N90 length, bp/number | 161 781/689 | 813 338/261 | 666 663/17 |
| Longest, bp | 13 688 299 | 13 688 299 | 27 905 053 |
| GC content, % | 34.69 | 34.79 | 42.01 |
| BUSCO (n = 1658) | C: 98.8%, F: 0.6% | C: 98.7%, F: 0.7% | C: 99.7%, F: 0.2% |
C: complete BUSCOs; F: fragmented BUSCOs.
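The N50 and N90 rows in the table above pair a length with a contig count. As a minimal sketch of the usual definition (sort contigs from longest to shortest and accumulate until the chosen fraction of the total length is reached), the following may help; the function name `nx_length` and the toy lengths are illustrative assumptions, not taken from the paper.

```python
def nx_length(lengths, fraction=0.5):
    """Return (Nx length, number of contigs needed) for a list of contig lengths.

    fraction=0.5 gives N50, fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = sum(lengths) * fraction
    running = 0
    # Walk contigs from longest to shortest until the threshold is reached.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy contig lengths (hypothetical, not from the assembly).
toy = [100, 80, 60, 40, 20]
n50 = nx_length(toy, 0.5)
n90 = nx_length(toy, 0.9)
```

Read this way, an entry such as 3 035 809/79 would mean that the 79 longest contigs together reach half of the assembly length and the shortest of them is 3 035 809 bp long.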
Figure 2: A demonstration of filtering heterozygous contigs. Alternative heterozygous regions shared by contig X000148F (x-axis) and contig X000170F (y-axis) are shown as red lines. Breaks in the main red line are caused by highly heterozygous loci. In total, 83.49% of the short contig X000170F (865 792 bp) was covered by the long contig X000148F (2 140 267 bp) with an identity of 0.94, so the short contig was removed and the long contig was kept in the final assembly.
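The decision described in the Figure 2 caption can be sketched as a two-part check: how much of the short contig is covered by alignments to the long contig, and how similar the aligned sequence is. The caption does not state the cutoffs that were used, so the thresholds below (`min_fraction=0.8`, `min_identity=0.9`), the function names, and the toy coordinates are all hypothetical; this is an illustration of the idea, not the authors' pipeline.

```python
def covered_fraction(intervals, contig_length):
    """Fraction of a contig covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / contig_length


def is_redundant(fraction, identity, min_fraction=0.8, min_identity=0.9):
    # Both thresholds are illustrative assumptions, not values from the paper.
    return fraction >= min_fraction and identity >= min_identity


# Toy alignment blocks on a 100 bp contig (hypothetical coordinates).
toy_fraction = covered_fraction([(0, 50), (40, 70), (80, 100)], 100)

# Values reported in the Figure 2 caption for contig X000170F.
figure_case = is_redundant(0.8349, 0.94)
```

With these assumed cutoffs, the reported case (83.49% covered, identity 0.94) is flagged as redundant, which matches the outcome described in the caption.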
Figure 3: The quality of genome assembly of 137 insects. The completeness of genome assemblies (y-axis) was assessed using 1658 Insecta BUSCOs. The x-axis is the contig N50 (bp) of different insect genomes, log-transformed to reduce the range. The red triangle and green square represent the D. melanogaster genome and P. pectoralis genome, respectively. The blue points represent 135 other insect genomes.
The coverage of unigenes from P. pectoralis
| Data set | Unigenes | Number | Total length, bp | Sequence covered by assembly, % | Coverage rate >90% in 1 contig, number | Coverage rate >90% in 1 contig, % | Coverage rate >50% in 1 contig, number | Coverage rate >50% in 1 contig, % |
|---|---|---|---|---|---|---|---|---|
| Original assembly | All | 37 552 | 30 971 346 | 98.28 | 34 963 | 93.10 | 36 636 | 97.56 |
| Original assembly | >500 bp | 15 237 | 24 436 334 | 99.35 | 14 521 | 95.30 | 15 050 | 98.77 |
| Original assembly | >1000 bp | 9041 | 20 067 802 | 99.77 | 8730 | 96.56 | 8980 | 99.32 |
| Filtered assembly | All | 37 552 | 30 971 346 | 97.88 | 34 472 | 91.79 | 36 389 | 96.90 |
| Filtered assembly | >500 bp | 15 237 | 24 436 334 | 99.11 | 14 387 | 94.42 | 14 979 | 98.30 |
| Filtered assembly | >1000 bp | 9041 | 20 067 802 | 99.60 | 8668 | 95.87 | 8950 | 98.99 |
Summary statistics of annotated repeats
| Type | Number of elements | Length occupied, bp | Percentage of sequence |
|---|---|---|---|
| DNA | 292 513 | 115 966 469 | 15.25 |
| LINE | 156 922 | 63 646 057 | 8.37 |
| SINE | 4935 | 634 774 | 0.08 |
| LTR | 35 391 | 26 864 897 | 3.53 |
| Other | 96 807 | 39 411 289 | 5.18 |
| Unknown | 384 377 | 99 828 399 | 13.13 |
| Total | 970 945 | 341 311 350 | 44.88 |
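The percentages in the repeat table appear to be expressed relative to the filtered assembly length (760 416 098 bp, from the genome features table). The short check below recomputes them under that assumption; the variable names are illustrative. Note that the per-class lengths sum to slightly more than the reported total, which would be consistent with some overlap between annotated classes, although the source does not say so explicitly.

```python
# Filtered assembly length in bp, taken from the genome features table.
ASSEMBLY_LENGTH = 760_416_098

# Length occupied by each repeat class in bp, copied from the repeat table.
repeat_lengths = {
    "DNA": 115_966_469,
    "LINE": 63_646_057,
    "SINE": 634_774,
    "LTR": 26_864_897,
    "Other": 39_411_289,
    "Unknown": 99_828_399,
    "Total": 341_311_350,
}


def percent_of_assembly(length_bp, assembly_bp=ASSEMBLY_LENGTH):
    """Percentage of the assembly occupied by a feature, rounded to 2 decimals."""
    return round(100 * length_bp / assembly_bp, 2)


percentages = {name: percent_of_assembly(bp) for name, bp in repeat_lengths.items()}
```

Under this assumption the recomputed values agree with the tabulated ones, for example 15.25 for DNA elements and 44.88 for the total.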
Summary statistics of genes and function annotation
| Type | Number of genes | Percentage of genes |
|---|---|---|
| InterProScan | 18 318 | 79.33 |
| GO | 12 648 | 54.77 |
| KEGG | 7930 | 34.34 |
| Swiss-Prot | 15 813 | 68.48 |
| TrEMBL | 20 061 | 86.87 |
| Annotated | 20 423 | 88.44 |
| Total | 23 092 | 100.00 |