Junjie Zhu, Feng Jiang, Xianhui Wang, Pengcheng Yang, Yanyuan Bao, Wan Zhao, Wei Wang, Hong Lu, Qianshuo Wang, Na Cui, Jing Li, Xiaofang Chen, Lan Luo, Jinting Yu, Le Kang, Feng Cui.
Abstract
Background: Laodelphax striatellus Fallén (Hemiptera: Delphacidae) is one of the most destructive rice pests. It differs from the 2 other rice planthoppers with published genome sequences, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and the transmission of plant viruses. Deciphering the genome of L. striatellus will further our understanding of the genetic basis of the biological differences among the 3 rice planthoppers.

Findings: A total of 190 Gb of Illumina data and 32.4 Gb of PacBio data were generated and used to assemble a high-quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 Kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. A total of 17 736 protein-coding genes were annotated, capturing 97.6% and 98% of the genes in the BUSCO eukaryota and arthropoda sets, respectively. Of the 3 planthoppers, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided clues to the genomic basis of differences in important traits, such as host range, migratory habit, and plant virus transmission, between L. striatellus and the other 2 planthoppers.

Conclusions: We report a high-quality genome assembly of L. striatellus, which is an important genomic resource not only for studying the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for comparisons with other planthoppers.
Keywords: annotation; comparative genomics; genome sequencing; insects; virus transmission
Year: 2017 PMID: 29136191 PMCID: PMC5740986 DOI: 10.1093/gigascience/gix109
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Photograph of Laodelphax striatellus on a rice plant leaf. Scale bar, 1 mm.
Sequencing data used for genome assembly and annotation
| Category | Accession | Life stage | Sample type | Insert size, bp | Read length, bp | No. of reads |
|---|---|---|---|---|---|---|
| Survey | SRR5816389 | Adult | DNA | 230 | 2 × 125 | 127 772 669 |
| Assembly | SRR5830088 | Adult | DNA | 180 | 2 × 100 | 123 459 791 |
|  | SRR5816388 | Adult | DNA | 250 | 2 × 125 | 137 013 558 |
|  | SRR5816387 | Adult | DNA | 500 | 2 × 100 | 141 587 274 |
|  | SRR5816386 | Adult | DNA | 500 | 2 × 125 | 30 520 480 |
|  | SRR5816393 | Adult | DNA | 800 | 2 × 100 | 153 498 320 |
|  | SRR5816392 | Adult | DNA | 1.4–1.6 K | 2 × 125 | 40 251 413 |
|  | SRR5816391 | Adult | DNA | 2.6–2.8 K | 2 × 125 | 36 559 438 |
|  | SRR5816390 | Adult | DNA | 5–5.6 K | 2 × 125 | 26 684 783 |
|  | SRR5816385 | Adult | DNA | 5.6–6.5 K | 2 × 125 | 23 069 935 |
|  | SRR5816384 | Adult | DNA | 9–11 K | 2 × 125 | 24 285 333 |
|  | SRR5816377 | Adult | DNA | 11–13 K | 2 × 125 | 23 396 366 |
|  | SRR5816376 | Adult | DNA | 13–15 K | 2 × 125 | 30 547 732 |
|  | SRR5816379 | Adult | DNA | 15–18 K | 2 × 125 | 25 926 919 |
|  | SRR5816378 | Adult | DNA | 18–24 K | 2 × 125 | 26 325 395 |
|  | SRR5817574 | Adult | DNA | - | 8559 | 99 701 |
|  | SRR5817559 | Adult | DNA | - | 8947 | 77 038 |
|  | SRR5817582 | Adult | DNA | - | 8474 | 104 288 |
|  | SRR5817569 | Adult | DNA | - | 8518 | 114 320 |
|  | SRR5817560 | Adult | DNA | - | 9202 | 80 599 |
|  | SRR5817562 | Adult | DNA | - | 9211 | 100 089 |
|  | SRR5817573 | Adult | DNA | - | 8610 | 102 997 |
|  | SRR5817558 | Adult | DNA | - | 9007 | 86 083 |
|  | SRR5817581 | Adult | DNA | - | 8452 | 89 374 |
|  | SRR5817570 | Adult | DNA | - | 8419 | 101 715 |
|  | SRR5817550 | Adult | DNA | - | 9192 | 82 657 |
|  | SRR5817576 | Adult | DNA | - | 8597 | 105 080 |
|  | SRR5817553 | Adult | DNA | - | 8586 | 77 467 |
|  | SRR5817557 | Adult | DNA | - | 8821 | 75 712 |
|  | SRR5817567 | Adult | DNA | - | 8363 | 106 634 |
|  | SRR5817575 | Adult | DNA | - | 8620 | 105 795 |
|  | SRR5817552 | Adult | DNA | - | 8985 | 66 096 |
|  | SRR5817556 | Adult | DNA | - | 8573 | 83 500 |
|  | SRR5817568 | Adult | DNA | - | 8357 | 104 295 |
|  | SRR5817578 | Adult | DNA | - | 8528 | 108 299 |
|  | SRR5817565 | Adult | DNA | - | 8728 | 69 694 |
|  | SRR5817555 | Adult | DNA | - | 8480 | 86 385 |
|  | SRR5817571 | Adult | DNA | - | 8437 | 106 314 |
|  | SRR5817577 | Adult | DNA | - | 8686 | 106 337 |
|  | SRR5817566 | Adult | DNA | - | 8890 | 52 889 |
|  | SRR5817554 | Adult | DNA | - | 8648 | 85 970 |
|  | SRR5817572 | Adult | DNA | - | 8437 | 101 258 |
|  | SRR5817580 | Adult | DNA | - | 8490 | 104 459 |
|  | SRR5817563 | Adult | DNA | - | 8954 | 91 218 |
|  | SRR5817561 | Adult | DNA | - | 8724 | 84 033 |
|  | SRR5817579 | Adult | DNA | - | 8776 | 107 138 |
|  | SRR5817564 | Adult | DNA | - | 9054 | 68 294 |
|  | SRR5817551 | Adult | DNA | - | 8508 | 88 776 |
| Annotation | SRR5816381 | Larva | RNA | 250–300 | 2 × 150 | 23 733 333 |
|  | SRR5816380 | Adult | RNA | 250–300 | 2 × 150 | 24 933 333 |
|  | SRR5816383 | Egg | RNA | 250–300 | 2 × 150 | 24 633 333 |
|  | SRR5816382 | Fat body | RNA | 250–300 | 2 × 150 | 31 300 000 |
|  | SRR5816375 | Brain | RNA | 250–300 | 2 × 150 | 40 333 333 |
|  | SRR5816374 | Gonad | RNA | 250–300 | 2 × 150 | 33 300 000 |
|  | SRR5816394 | Tentacle | RNA | 250–300 | 2 × 150 | 24 966 666 |
The library labeled Survey in the Category column was used to estimate the genome size of Laodelphax striatellus. Libraries with insert sizes >1 Kb were mate-pair libraries. For gene annotation, data from 2 previously sequenced tissues were also used: SRR1619428 (salivary gland) and SRR1617617 (alimentary canal).
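As a rough consistency check, the Illumina yield quoted in the abstract (190 Gb) can be recomputed from the assembly libraries in the table. This sketch rests on two assumptions that the table does not state: that "No. of reads" counts read pairs (so each pair contributes 2 × read length bases), and that the survey library is not part of the 190 Gb. The PacBio runs are left out because their read-length column is not clearly defined.

```python
# Illumina libraries listed under "Assembly" in the table above:
# (SRA accession, read length of each mate in bp, value in the "No. of reads" column).
# Assumption: "No. of reads" counts read pairs, so each entry contributes
# 2 * read_length * pairs bases. The survey library (SRR5816389) is excluded.
ASSEMBLY_LIBRARIES = [
    ("SRR5830088", 100, 123_459_791),
    ("SRR5816388", 125, 137_013_558),
    ("SRR5816387", 100, 141_587_274),
    ("SRR5816386", 125, 30_520_480),
    ("SRR5816393", 100, 153_498_320),
    ("SRR5816392", 125, 40_251_413),
    ("SRR5816391", 125, 36_559_438),
    ("SRR5816390", 125, 26_684_783),
    ("SRR5816385", 125, 23_069_935),
    ("SRR5816384", 125, 24_285_333),
    ("SRR5816377", 125, 23_396_366),
    ("SRR5816376", 125, 30_547_732),
    ("SRR5816379", 125, 25_926_919),
    ("SRR5816378", 125, 26_325_395),
]

ASSEMBLY_SIZE_BP = 541_000_000  # scaffold assembly size (541.0 Mb)


def total_bases(libraries):
    """Total sequenced bases, treating every entry as paired-end."""
    return sum(2 * read_len * pairs for _, read_len, pairs in libraries)


if __name__ == "__main__":
    bases = total_bases(ASSEMBLY_LIBRARIES)
    print(f"Illumina assembly data: {bases / 1e9:.1f} Gb")  # about 189.9 Gb
    print(f"Depth over the assembly: {bases / ASSEMBLY_SIZE_BP:.0f}x")
```

Under these assumptions the total is about 189.9 Gb, in line with the 190 Gb in the abstract. That corresponds to roughly 351× depth over the 541.0 Mb assembly.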
Statistics comparison of genome assembly and annotation among 3 planthoppers
| Category | L. striatellus contig | L. striatellus scaffold | N. lugens [8] contig | N. lugens [8] scaffold | S. furcifera [9] contig | S. furcifera [9] scaffold |
|---|---|---|---|---|---|---|
| Total size, Mb | 530.2 | 541.0 | 993.8 | 1140.8 | 673.9 | 720.7 |
| Total number | 48 574 | 38 193 | 80 046 | 46 558 | 50 020 | 20 450 |
| Maximum length, Kb | 1990 | 10 350 | 230 | 2254 | 800 | 12 789 |
| N50 length, Kb | 118 | 1085 | 24 | 357 | 71 | 1185 |

| Category | L. striatellus | N. lugens [8] | S. furcifera [9] |
|---|---|---|---|
| GC content, % | 34.5 | 34.6 | 31.6 |
| TE proportion, % | 23.0 | 38.9 | 39.7 |
| BUSCO evaluation, % | 92 | 81 | 92 |
| Gene number | 17 736 | 27 571 | 21 254 |
| Average gene length, bp | 14 342 | 11 216 | 12 597 |
| Average CDS length, bp | 1289 | 1135 | 1526 |
| Average exons per gene | 6 | 4 | 6 |
| Average exon length, bp | 213 | 264 | 240 |
| Average intron length, bp | 2587 | 3062 | 2064 |
Gene number refers to the number of protein-coding genes.
BUSCO: benchmarking universal single copy ortholog; CDS: coding sequence; TE: transposable element.
From the published Nilaparvata lugens genome [8].
From the published Sogatella furcifera genome [9].
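The contig and scaffold N50 values above use the standard definition: sort the sequence lengths in descending order and report the length at which the running total first reaches half of the total assembly size. A minimal sketch follows; the lengths in the example are made up for illustration and are not taken from any of the 3 assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


if __name__ == "__main__":
    toy_scaffolds_kb = [1200, 900, 500, 300, 100]  # hypothetical lengths, Kb
    print(n50(toy_scaffolds_kb))  # 900: 1200 + 900 = 2100 >= 3000 / 2
```

Because N50 depends on the length distribution rather than on total size, a smaller assembly (L. striatellus, 541.0 Mb) can still have a scaffold N50 comparable to that of a larger one (S. furcifera, 720.7 Mb).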
Comparison of transposable element contents of the 3 planthoppers
| Class | L. striatellus De novo + Repbase, bp | % of genome | L. striatellus TE proteins, bp | % of genome | L. striatellus combined TEs, bp | % of genome | N. lugens combined TEs, bp | % of genome | S. furcifera combined TEs, bp | % of genome |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 24 818 676 | 4.59 | 2 550 902 | 0.47 | 26 592 872 | 4.92 | 162 024 958 | 14.20 | 126 002 323 | 17.33 |
| LINE | 24 160 245 | 4.47 | 4 889 094 | 0.90 | 27 124 925 | 5.01 | 182 652 892 | 16.00 | 69 257 982 | 9.52 |
| LTR | 7 122 249 | 1.32 | 0 | 0.00 | 7 122 249 | 1.32 | 168 492 299 | 14.80 | 31 286 552 | 4.30 |
| SINE | 22 739 683 | 4.20 | 743 909 | 0.14 | 23 044 510 | 4.26 | 8 272 412 | 0.70 | 10 730 722 | 1.48 |
| Other | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 41 262 | 0.00 | 23 167 338 | 3.18 |
| Unknown | 27 609 625 | 5.10 | 0 | 0.00 | 27 609 625 | 5.10 | 21 890 733 | 1.90 | 28 395 639 | 3.90 |
| Total | 119 645 576 | 22.12 | 8 177 428 | 1.51 | 124 360 921 | 22.99 | 443 765 874 | 38.90 | 288 840 556 | 39.73 |
De novo + Repbase refers to TEs identified by integrating de novo and Repbase-based predictions. TE proteins refers to TEs identified with RepeatProteinMask. Combined TEs refers to the non-redundant combination of the 2 sets above. Other refers to TEs that can be classified but do not belong to any of the given classes. Unknown refers to TEs that cannot be classified.
DNA: DNA transposon; LINE: long interspersed nuclear element; LTR: long terminal repeat; SINE: short interspersed nuclear element.
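In the table, each Combined TEs value is smaller than the sum of the corresponding De novo + Repbase and TE proteins values (for DNA transposons, 24 818 676 + 2 550 902 bp exceeds 26 592 872 bp). That is the expected result if regions detected by both methods are counted only once. The sketch below illustrates such non-redundant merging on toy intervals and then converts a length into "% of genome" using the 541.0 Mb assembly size. It assumes half-open [start, end) coordinates on a single sequence and is only an illustration, not the pipeline used for the table.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_bp(intervals):
    """Number of bases covered at least once."""
    return sum(end - start for start, end in merge_intervals(intervals))


def percent_of_genome(length_bp, genome_bp=541_000_000):
    """Express a length as a percentage of the assembly size (default 541.0 Mb)."""
    return 100 * length_bp / genome_bp


if __name__ == "__main__":
    de_novo = [(0, 100), (200, 300)]      # hypothetical TE hits, 200 bp
    proteins = [(50, 150), (400, 450)]    # hypothetical TE hits, 150 bp
    print(covered_bp(de_novo + proteins))  # 300, not 350: overlap counted once
    print(round(percent_of_genome(24_818_676), 2))  # 4.59, as in the DNA row
```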
Figure 2: Gene cluster analysis among 22 arthropod species. 1:1:1 and N:N:N represent universal orthologs with single-copy and multiple-copy numbers, respectively. Insect, Diptera, Hemiptera, Hymenoptera, Lepidoptera, and Coleoptera stand for taxon-specific orthologs. Other indicates orthologs that do not belong to any of the above ortholog categories. SD indicates species-specifically duplicated genes. ND indicates genes that cannot be classified into any other category. The location of Laodelphax striatellus is indicated by an arrow.
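To make the 1:1:1 and N:N:N categories concrete, the sketch below classifies a single orthogroup from its per-species copy numbers. It is a simplified illustration under stated assumptions: an orthogroup is "universal" only if every species has at least 1 copy, and it is 1:1:1 if every species has exactly 1 copy. The taxon-specific, SD, and ND categories in the figure need extra information (taxonomy and unclustered genes) and are lumped here into a hypothetical "not universal" label.

```python
from typing import Dict, Set


def classify_orthogroup(copies: Dict[str, int], species: Set[str]) -> str:
    """Classify one orthogroup by gene copy number per species.

    copies maps a species name to its number of genes in the orthogroup;
    species missing from the dict are treated as having 0 copies.
    """
    present = {name for name, n in copies.items() if n > 0}
    if present != set(species):
        # Simplification: everything non-universal is lumped together.
        return "not universal"
    if all(copies[name] == 1 for name in species):
        return "1:1:1"  # single-copy in every species
    return "N:N:N"      # present everywhere, multi-copy in at least one species


if __name__ == "__main__":
    planthoppers = {"L. striatellus", "N. lugens", "S. furcifera"}
    group = {"L. striatellus": 2, "N. lugens": 1, "S. furcifera": 1}
    print(classify_orthogroup(group, planthoppers))  # N:N:N
```

Single-copy orthologs of the 1:1:1 kind are the usual input for species-tree inference, such as the 277 single-copy orthologs behind the tree in Figure 4.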
Figure 3: Venn diagram of functional annotation by 4 databases. NR: nonredundant protein database.
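The regions of a Venn diagram like Figure 3 are counts of genes grouped by the exact set of databases that annotate them. A minimal sketch follows. Only NR is named in the caption, so the other database names and all gene IDs below are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set


def venn_regions(annotations: Dict[str, Set[str]]) -> Counter:
    """Count genes in each exclusive Venn region.

    annotations maps a database name to the set of gene IDs it annotates.
    The result maps a frozenset of database names to the number of genes
    annotated by exactly those databases.
    """
    membership: Dict[str, Set[str]] = {}
    for db, genes in annotations.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(db)
    return Counter(frozenset(dbs) for dbs in membership.values())


if __name__ == "__main__":
    toy = {  # hypothetical annotations
        "NR": {"g1", "g2", "g3"},
        "DB2": {"g2", "g3"},
        "DB3": {"g3", "g4"},
        "DB4": {"g3"},
    }
    regions = venn_regions(toy)
    print(sum(regions.values()))  # 4 genes annotated by at least 1 database
```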
Figure 4: Phylogenetic analysis of 22 arthropod species. The phylogenetic tree was constructed based on amino acid sequences of 277 single-copy orthologs among 22 arthropod species (Anopheles gambiae, Anoplophora glabripennis, Apis mellifera, Acyrthosiphon pisum, Bombyx mori, Bemisia tabaci, Cimex lectularius, Diaphorina citri, Drosophila melanogaster, Diuraphis noxia, Danaus plexippus, Daphnia pulex, Locusta migratoria, Laodelphax striatellus, Nilaparvata lugens, Nasonia vitripennis, Oncopeltus fasciatus, Pediculus humanus, Rhodnius prolixus, Sogatella furcifera, Tribolium castaneum, Zootermopsis nevadensis) using the maximum likelihood algorithm. The tree was rooted with D. pulex.