Lin Wang1, Nan Tang1, Xinlei Gao1, Zhaoxia Chang1, Liqin Zhang1, Guohui Zhou2, Dongyang Guo1, Zhen Zeng1, Wenjie Li1, Ibukun A Akinyemi1, Huanming Yang3, Qingfa Wu1,4,5.
Abstract
Background: Sogatella furcifera is an important phloem sap-sucking, plant virus-transmitting migratory insect pest of rice. Because of its high reproductive potential, dispersal capability and transmission of plant viral diseases, S. furcifera causes considerable damage to rice grain production and has a great economic and agricultural impact. Comprehensive studies of the ecology and virus-host interactions of S. furcifera have been limited by the lack of a well-assembled genome sequence.

Findings: A total of 241.3 Gb of raw reads from the whole genome of S. furcifera were generated by Illumina sequencing of 17 paired-end and mate-pair libraries with insert sizes ranging from 180 bp to 40 kb. The final genome assembly (0.72 Gb), with a contig N50 of 70.7 kb and a scaffold N50 of 1.18 Mb, covers 98.6% of the estimated genome size of S. furcifera. Genome annotation, assisted by transcriptome data from eight developmental stages (embryos, 1st-5th instar nymphs, 5-day-old adults and 10-day-old adults), generated 21 254 protein-coding genes, which captured 99.59% (247/248) of core CEGMA genes and 91.7% (2453/2675) of BUSCO genes.

Conclusions: We report the first assembled and annotated whole genome sequence and transcriptome of S. furcifera. The assembled draft genome of S. furcifera will be a valuable resource for ecological and virus-host interaction studies of this pest.
Keywords: Annotation; Assembly; Genomics; Sogatella furcifera genome
Year: 2017 PMID: 28369349 PMCID: PMC5437944 DOI: 10.1093/gigascience/giw004
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Photograph of Sogatella furcifera (white-backed planthopper) on a rice plant leaf. Scale bar: 1 mm.
Whole genome shotgun (WGS) reads used in sequencing of the Sogatella furcifera genome
| Libraries | Read length (bp) | Insert size (bp) | Total data (Gb) | Coverage (×)* |
|---|---|---|---|---|
| MiSeq | 2 × 300 | 470 | 10.5 | 14.38 |
| | 2 × 300 | | | |
| Paired-end | 2 × 125 | 180 | 15.0 | 20.54 |
| | 2 × 125 | 320 | 13.0 | 17.80 |
| | 2 × 125 | 420 | 11.0 | 15.06 |
| | 2 × 100 | 350 | 2.4 | 3.28 |
| | 2 × 90 | 500 | 11.6 | 15.89 |
| | 2 × 125 | 600 | 8.9 | 12.19 |
| | 2 × 125 | 680 | 12.8 | 17.53 |
| Mate-pair | 2 × 125 | 2000 | 12.6 | 17.26 |
| | 2 × 90 | 3200 | 3.4 | 4.65 |
| | 2 × 125 | 5000 | 10.0 | 13.69 |
| | 2 × 125 | 8000 | 7.8 | 10.68 |
| | 2 × 125 | 10 000 | 15.9 | 21.78 |
| | 2 × 125 | 15 000 | 48.8 | 66.84 |
| | 2 × 125 | 20 000 | 24.2 | 33.15 |
| | 2 × 125 | 40 000 | 18.6 | 25.47 |
| Total | | | 241.3 | 330.54 |
*The estimated genome size was 0.73 Gb
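
The coverage figures in the table above are consistent with dividing each library's sequencing data by the estimated genome size of 0.73 Gb. The following is a minimal sketch of that arithmetic, not the authors' own script; the function name `sequencing_depth` and the library labels are hypothetical. Results agree with the table to within 0.01, since the last digit depends on how values are rounded.

```python
def sequencing_depth(data_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return data_gb / genome_size_gb


# Estimated genome size taken from the table footnote.
GENOME_SIZE_GB = 0.73

# A few rows from the table (label -> sequencing data in Gb).
libraries = {
    "MiSeq, 470 bp insert": 10.5,
    "Mate-pair, 15 000 bp insert": 48.8,
    "Total": 241.3,
}

for label, data_gb in libraries.items():
    depth = sequencing_depth(data_gb, GENOME_SIZE_GB)
    print(f"{label}: {depth:.2f}x")
```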
Sogatella furcifera genome assembly statistical analysis
| | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number |
|---|---|---|---|---|
| N90 | 9232 | 12 253 | 85 450 | 890 |
| N80 | 21 047 | 7536 | 319 035 | 489 |
| N70 | 35 405 | 5076 | 529 262 | 317 |
| N60 | 51 883 | 3500 | 845 521 | 207 |
| N50 | 70 730 | 2390 | 1 185 287 | 133 |
| Longest | 799 912 | – | 12 788 806 | – |
| Total size | 673 904 942 | – | 720 705 630 | – |
| Total number (>10 000 bp) | 602 082 273 | 11 792 | 697 471 028 | 2567 |
| Total number (>100 000 bp) | 258 085 068 | 1448 | 649 732 076 | 840 |
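
For readers unfamiliar with the Nx rows above, the sketch below shows the conventional definition: sort sequences from longest to shortest and report the length (and count) at which the running total first reaches x% of the assembly. It assumes this standard definition, since the record does not state how the values were computed; `nx_statistic` is a hypothetical helper and the lengths are toy values.

```python
def nx_statistic(lengths, percent):
    """Return (Nx length, number of sequences needed) for a list of lengths.

    percent is an integer such as 50 for N50 or 90 for N90.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point edge cases.
        if running * 100 >= percent * total:
            return length, count
    return ordered[-1], len(ordered)


# Toy contig lengths (bp), not taken from the assembly.
toy_contigs = [10, 8, 6, 4, 2]
print(nx_statistic(toy_contigs, 50))  # (8, 2)
print(nx_statistic(toy_contigs, 90))  # (4, 4)
```

Read this way, the N50 row says that the 2390 longest contigs (or the 133 longest scaffolds) together account for at least half of the assembled bases.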
Transposable element (TE) content of the Sogatella furcifera genome, derived from RepeatMasker analysis
| | RepBase TEs | | TE proteins | | | | Combined TEs | |
|---|---|---|---|---|---|---|---|---|
| | Length (bp) | % of genome | Length (bp) | % of genome | Length (bp) | % of genome | Length (bp) | % of genome |
| DNA | 3 946 730 | 0.54 | 4 606 659 | 0.63 | 120 249 936 | 16.54 | 126 002 323 | 17.33 |
| LINE | 5 042 806 | 0.69 | 28 043 919 | 3.85 | 44 814 393 | 6.16 | 69 257 982 | 9.52 |
| SINE | 810 448 | 0.11 | 0 | 0.00 | 10 821 265 | 1.48 | 10 730 722 | 1.48 |
| LTR | 3 346 275 | 0.46 | 7 721 608 | 1.06 | 28 298 660 | 3.89 | 31 286 552 | 4.30 |
| Other | 975 677 | 0.13 | 317 139 | 0.04 | 23 884 560 | 3.28 | 23 167 338 | 3.18 |
| Unknown | 0 | 0.00 | 0 | 0.00 | 28 395 639 | 3.90 | 28 395 639 | 3.90 |
| Total | 14 121 936 | 1.94 | 40 689 325 | 5.59 | 256 464 453 | 35.27 | 288 840 556 | 39.73 |
Note: LINE: long interspersed nuclear element; LTR: long terminal repeat; SINE: short interspersed nuclear element.
Characteristics of the predicted protein-coding genes in the Sogatella furcifera assembly
| Gene set | Number | Gene length (bp) | Coding DNA sequence length (bp) | Exons per gene | Exon length (bp) | Intron length (bp) |
|---|---|---|---|---|---|---|
| AUGUSTUS | 44 600 | 10 406.51 | 1507 | 6.04 | 249 | 1659.97 |
| GENSCAN | 44 160 | 9280.96 | 1106 | 4.25 | 260 | 1505.23 |
| GeneWise (homolog): | | | | | | |
| | 11 687 | 4890.78 | 771 | 2.87 | 268 | 2197.30 |
| | 19 671 | 6574.52 | 884 | 3.44 | 257 | 2328.74 |
| | 18 160 | 8437.51 | 1000 | 4.08 | 244 | 2411.35 |
| | 55 250 | 1414.84 | 865 | 1.27 | 681 | 2032.34 |
| | 29 842 | 4534.72 | 775 | 2.82 | 274 | 2054.77 |
| | 33 096 | 4704.18 | 925 | 2.63 | 351 | 2315.42 |
| | 17 785 | 7868.22 | 986 | 4.01 | 245 | 2282.51 |
| RNA-Seq | 28 183 | 4049.83 | 1800 | 3.33 | 540 | 2504.86 |
| EVidenceModeler | 21 254 | 12 584.24 | 1577 | 6.47 | 243 | 2011.27 |
Summary of functional annotation
| | Database | Gene number | Percent of total genes (%) |
|---|---|---|---|
| Total | | 21 254 | – |
| Annotated | InterPro | 12 699 | 59.74 |
| | GO* | 8633 | 40.61 |
| | KEGG | 6646 | 31.26 |
| | Swiss-Prot | 11 102 | 52.23 |
| | TrEMBL | 14 553 | 68.47 |
| Annotated (total) | | 14 990 | 70.52 |
| Unannotated | | 6264 | 29.47 |
Note: Five protein databases were used to assist the functional prediction of genes: InterPro, GO, KEGG, Swiss-Prot, and TrEMBL. The table shows the number of genes matched in each database. *GO assignments were based on InterPro. KEGG: Kyoto Encyclopedia of Genes and Genomes; GO: Gene Ontology.
Figure 2. Gene ontology (GO) enrichment analysis for differentially expressed genes in eight different developmental stages of Sogatella furcifera. All differentially expressed genes were subjected to GO analysis; the top 20 enriched terms are shown here.
Figure 3. k-means clustering of differentially expressed genes and their expression patterns. (A) Eight expression patterns are shown on the left. The heat map shows the relative expression level of each transcript (rows) in each sample (columns). Normalized fragments per kilobase of exon per million fragments mapped (FPKM) values calculated by Cuffdiff2 were log2-transformed and then median-centered by transcript. The heat map was drawn based on the clustering results. Red represents higher expression; green represents lower expression. Note: red asterisks (*) on the left side of the figure indicate that expression at the corresponding stage is higher than the average expression level. Abbreviations: emb: embryo; 1in: 1st instar nymph; 2in: 2nd instar nymph; 3in: 3rd instar nymph; 4in: 4th instar nymph; 5in: 5th instar nymph; 5d: 5-day-old adult; 10d: 10-day-old adult. (B) The average log2-transformed FPKM of the genes in each pattern.
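
The normalization described in the Figure 3 caption (log2 transformation followed by median-centering by transcript) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `log2_median_center` is hypothetical, and the pseudocount of 1 added before taking the logarithm is an assumption, because the caption does not say how zero FPKM values were handled.

```python
import math
from statistics import median


def log2_median_center(fpkm_rows, pseudocount=1.0):
    """Log2-transform FPKM values and median-center each transcript (row).

    fpkm_rows: list of rows, one per transcript, one value per sample.
    pseudocount: assumed offset so that zero FPKM values can be logged.
    """
    centered_rows = []
    for row in fpkm_rows:
        logged = [math.log2(value + pseudocount) for value in row]
        row_median = median(logged)
        # After centering, 0 means "at this transcript's median level".
        centered_rows.append([value - row_median for value in logged])
    return centered_rows


# Toy matrix: two transcripts across three samples (not real data).
toy_fpkm = [
    [1.0, 3.0, 7.0],
    [0.0, 0.0, 3.0],
]
for row in log2_median_center(toy_fpkm):
    print(row)
```

Centering each row on its own median puts transcripts with very different absolute abundance on a common scale, which is why a heat map of this kind shows relative rather than absolute expression.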