Zsolt Boldogkői, Norbert Moldován, Attila Szűcs, Dóra Tombácz.
Abstract
Autographa californica multiple nucleopolyhedrovirus (AcMNPV) is a prototypic baculovirus infecting specific insects. AcMNPV contains a large double-stranded DNA genome encoding a complex transcriptome. This virus is widely used as a vector for the expression of heterologous proteins. Here, we present a dataset derived from the Oxford Nanopore Technologies (ONT) long-read sequencing platform. We used both cDNA and direct RNA sequencing techniques. The dataset contains 520,310 AcMNPV and 1,309,481 host cell reads obtained with the regular ONT cDNA-sequencing method, whereas direct RNA sequencing produced 6,456 reads in total. We also used a Cap-selection protocol for certain ONT samples and obtained 2,568,669 reads with this method. The raw reads were aligned to the AcMNPV reference genome (KM667940.1). Here, we openly release the 'static' and the dynamic transcript catalogue of AcMNPV. This dataset can be used for deep analyses of the transcriptomic and epitranscriptomic patterns of AcMNPV and the host cell. The data can also be useful for the validation of different bioinformatics software packages and analysis tools.
Year: 2018 PMID: 30512018 PMCID: PMC6278695 DOI: 10.1038/sdata.2018.276
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of the cDNA-, dRNA- and Cap-sequencing reads from the mixed samples mapped to the AcMNPV genome.
| Sample | Number of aligned reads | Median of read lengths | Average of read lengths±SE | Average of aligned read lengths±SE | Average Insertion frequency (%)±SE (%) | Average Deletion frequency (%)±SE (%) | Average Mismatch frequency (%)±SE (%) | Coverage | N50 |
|---|---|---|---|---|---|---|---|---|---|
| cDNA-Seq | 95953 | 840 | 1060.76±2.39 | 702.89±1.56 | 2.83±0.01 | 5.92±0.01 | 7.31±0.01 | 503.45 | 816 |
| dRNA-Seq | 2425 | 504 | 612.53±8.04 | 595.96±8.35 | 2.02±0.04 | 8.62±0.05 | 6.10±0.04 | 10.79 | 678 |
| Cap-Seq | 488847 | 627 | 723.10±0.53 | 577.88±0.45 | 2.75±0.00 | 3.70±0.00 | 4.81±0.00 | 2108.69 | 529 |

Read N50 is defined as the length N for which 50% of all bases in the reads are in a sequence of length L < N.
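The N50 column can be reproduced from a list of read lengths. The sketch below uses the common convention (the largest length N such that reads of length N or longer contain at least half of all bases); whether the tabulated values were computed with exactly this convention is an assumption, and the example lengths are made up for illustration.

```python
def read_n50(lengths):
    """Return the read N50 of a collection of read lengths.

    Convention assumed here: the largest N such that reads of
    length >= N together hold at least 50% of all sequenced bases.
    """
    if not lengths:
        raise ValueError("at least one read length is required")
    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length


# Hypothetical read lengths, not taken from the dataset.
example_lengths = [200, 300, 500, 800, 1200]
print(read_n50(example_lengths))  # prints 800
```

With real data, the lengths would come from the FASTQ or BAM records of each sample rather than from a hand-written list.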
Summary statistics of the cDNA sequencing reads, derived from the different time points, aligned to the AcMNPV genome.
| Sample | Number of aligned reads | Median of read lengths | Average of read lengths±SE | Average of aligned read lengths±SE | Average Insertion frequency (%)±SE (%) | Average Deletion frequency (%)±SE (%) | Average Mismatch frequency (%)±SE (%) | Coverage | N50 |
|---|---|---|---|---|---|---|---|---|---|
| | 135 | 1010 | 1139.68±53.61 | 638.67±40.74 | 4.05±0.21 | 3.51±0.17 | 5.84±0.25 | 0.64 | 661 |
| | 90 | 902 | 1037.20±70.43 | 624.51±51.52 | 4.08±0.29 | 3.52±0.22 | 5.72±0.26 | 0.42 | 660 |
| | 870 | 577 | 753.84±15.58 | 541.90±13.17 | 3.78±0.07 | 3.29±0.07 | 5.08±0.11 | 3.52 | 596 |
| | 21557 | 592 | 789.73±3.07 | 601.71±2.99 | 3.66±0.01 | 3.51±0.01 | 5.14±0.02 | 96.82 | 881 |
| | 19989 | 730 | 908.66±3.72 | 714.37±3.61 | 3.66±0.01 | 3.41±0.01 | 5.11±0.02 | 106.59 | 1025 |
| | 84201 | 582 | 689.09±1.27 | 492.74±1.23 | 3.63±0.01 | 3.17±0.01 | 5.14±0.01 | 309.69 | 509 |
| | 145127 | 593 | 738.07±1.04 | 543.22±1.01 | 3.66±0.01 | 3.14±0.00 | 5.11±0.01 | 588.47 | 598 |
| | 92564 | 617 | 737.39±1.26 | 535.93±1.14 | 3.68±0.01 | 3.16±0.01 | 5.15±0.01 | 370.30 | 517 |
| | 59824 | 633 | 867.47±2.41 | 650.31±2.16 | 3.56±0.01 | 2.99±0.01 | 4.94±0.01 | 290.40 | 894 |

Read N50 is defined as the length N for which 50% of all bases in the reads are in a sequence of length L < N.
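The Coverage column appears to be consistent with a simple estimate: number of aligned reads times the average aligned read length, divided by the genome length. The sketch below illustrates that arithmetic. The genome length of 133,966 bp for KM667940.1 is an assumption that is not stated in this record, and the formula itself is inferred from the tabulated numbers rather than quoted from the authors.

```python
# Assumption: length of the AcMNPV reference (KM667940.1) in base pairs.
# This value is not given in the record above.
ASSUMED_GENOME_LENGTH_BP = 133_966


def mean_coverage(n_aligned_reads, mean_aligned_length, genome_length):
    """Estimate mean depth as total aligned bases over genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_aligned_reads * mean_aligned_length / genome_length


# (aligned reads, average aligned read length, tabulated coverage)
rows = [
    (21557, 601.71, 96.82),
    (145127, 543.22, 588.47),
    (59824, 650.31, 290.40),
]

for n_reads, mean_len, tabulated in rows:
    estimate = mean_coverage(n_reads, mean_len, ASSUMED_GENOME_LENGTH_BP)
    print(f"estimated {estimate:.2f} vs tabulated {tabulated:.2f}")
```

Small differences from the table are expected because the average aligned read lengths are themselves rounded to two decimals.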
Figure 1. Bar plot of the read lengths of the static dataset.
(a) The figure illustrates the average read lengths of the cDNA-Seq, dRNA-Seq and Cap-Seq samples, as well as the weighted arithmetic mean values from the individual time points. (b) This plot shows the average mapped read lengths of the different samples. AVG: Weighted arithmetic mean.
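A weighted arithmetic mean over the time-point samples can be sketched as follows. Using the number of aligned reads as the weight is an assumption (the caption does not say which weights were used), and the input values are the AcMNPV time-point rows of the cDNA summary statistics.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Average read lengths and aligned read counts per time point (AcMNPV).
avg_read_lengths = [1139.68, 1037.20, 753.84, 789.73, 908.66,
                    689.09, 738.07, 737.39, 867.47]
aligned_reads = [135, 90, 870, 21557, 19989,
                 84201, 145127, 92564, 59824]

print(round(weighted_mean(avg_read_lengths, aligned_reads), 2))
```

Weighting by read count keeps the sparsely sequenced early time points (135 and 90 reads) from dominating the combined average.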
Figure 2. Bar chart of the read lengths from cDNA sequencing of samples from various time points.
(a) AcMNPV (b) Sf9.
Summary of the cDNA-, dRNA- and Cap-sequencing reads from the mixed samples mapped to the host genome.
| Sample | Number of aligned reads | Median of read lengths | Average of read lengths±SE | Average of aligned read lengths±SE | Average Insertion frequency (%)±SE (%) | Average Deletion frequency (%)±SE (%) | Average Mismatch frequency (%)±SE (%) | Coverage | N50 |
|---|---|---|---|---|---|---|---|---|---|
| cDNA-Seq | 210987 | 942 | 1236.91±2.07 | 879.06±1.71 | 3.49±0.01 | 5.75±0.00 | 6.66±0.01 | 0.361 | 1073 |
| dRNA-Seq | 4031 | 284 | 499.18±16.11 | 274.47±6.03 | 3.21±0.11 | 5.03±0.09 | 3.72±0.06 | 0.002 | 466 |
| Cap-Seq | 2079822 | 672 | 726.81±0.27 | 591.93±0.24 | 2.89±0.00 | 4.27±0.00 | 5.41±0.00 | 2.394 | 617 |

Read N50 is defined as the length N for which 50% of all bases in the reads are in a sequence of length L < N.
Summary statistics of the cDNA sequencing reads, derived from the different time points, aligned to the Sf9 genome.
| Sample | Number of aligned reads | Median of read lengths | Average of read lengths±SE | Average of aligned read lengths±SE | Average Insertion frequency (%)±SE (%) | Average Deletion frequency (%)±SE (%) | Average Mismatch frequency (%)±SE (%) | Coverage | N50 |
|---|---|---|---|---|---|---|---|---|---|
| | 169794 | 680 | 773.75±0.99 | 586.81±0.94 | 3.69±0.01 | 3.77±0.01 | 4.94±0.01 | 0.19 | 649 |
| | 20346 | 642 | 961.12±6.74 | 719.88±6.50 | 3.54±0.02 | 3.24±0.02 | 4.54±0.02 | 0.03 | 796 |
| | 27209 | 588 | 701.67±2.77 | 466.31±2.64 | 3.41±0.02 | 3.53±0.02 | 4.62±0.02 | 0.02 | 765 |
| | 46519 | 611 | 730.90±1.90 | 475.63±1.87 | 3.36±0.02 | 3.35±0.01 | 4.44±0.01 | 0.04 | 611 |
| | 60076 | 627 | 738.99±1.68 | 517.17±1.59 | 3.52±0.01 | 3.58±0.01 | 4.72±0.01 | 0.06 | 438 |
| | 275282 | 683 | 841.67±1.11 | 656.40±1.08 | 3.85±0.00 | 3.54±0.00 | 4.92±0.00 | 0.35 | 617 |
| | 299940 | 641 | 733.99±0.71 | 550.25±0.69 | 3.79±0.00 | 3.61±0.00 | 4.96±0.00 | 0.32 | 621 |
| | 80781 | 687 | 826.03±1.86 | 632.02±1.77 | 3.83±0.01 | 3.58±0.01 | 5.00±0.01 | 0.10 | 610 |
| | 118547 | 750 | 883.51±1.32 | 683.21±1.28 | 3.97±0.01 | 3.79±0.01 | 5.19±0.01 | 0.16 | 1629 |

Read N50 is defined as the length N for which 50% of all bases in the reads are in a sequence of length L < N.
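The read totals quoted in the abstract can be cross-checked against the "Number of aligned reads" columns of the four summary tables. The sketch below assumes that the three rows of each mixed-sample table are listed in the order cDNA, dRNA, Cap, as in the table captions. The fact that the sums match the abstract supports that reading.

```python
# Aligned read counts copied from the summary tables.
# Assumption: mixed-sample rows are ordered cDNA, dRNA, Cap.
virus_mixed = {"cDNA": 95953, "dRNA": 2425, "Cap": 488847}
host_mixed = {"cDNA": 210987, "dRNA": 4031, "Cap": 2079822}

virus_time_points = [135, 90, 870, 21557, 19989,
                     84201, 145127, 92564, 59824]
host_time_points = [169794, 20346, 27209, 46519, 60076,
                    275282, 299940, 80781, 118547]

# cDNA totals combine the mixed sample with all time-point samples.
virus_cdna_total = virus_mixed["cDNA"] + sum(virus_time_points)
host_cdna_total = host_mixed["cDNA"] + sum(host_time_points)

# dRNA and Cap totals combine virus-mapped and host-mapped reads.
drna_total = virus_mixed["dRNA"] + host_mixed["dRNA"]
cap_total = virus_mixed["Cap"] + host_mixed["Cap"]

print(virus_cdna_total)  # 520310, matches 520,310 AcMNPV cDNA reads
print(host_cdna_total)   # 1309481, matches 1,309,481 host cDNA reads
print(drna_total)        # 6456, matches 6,456 direct RNA reads
print(cap_total)         # 2568669, matches 2,568,669 Cap-selected reads
```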
Figure 3. Data flow diagram giving a detailed overview of the study design.
Summary table of the reagents and chemistries used for the sequencing.
| Total RNA isolation | PolyA selection | Ribodepletion | Reverse transcription & dscDNA production | cDNA synthesis by PCR | Library preparation kit |
|---|---|---|---|---|---|
| Macherey-Nagel RNA | Qiagen Oligotex mRNA mini Kit | - | SuperScript III | - | Direct RNA Sequencing Kit |
| | | | SuperScript IV | KAPA HiFi PCR Kit | Ligation Sequencing Kit 1D |
| | | | | | PCR Barcoding Expansion 1-96 + Ligation Sequencing Kit 1D |
| | - | Epicentre Ribo-Zero™ Magnetic Kit H/M/R | Lexogen Teloprime Kit enzymes & reagents | Lexogen Teloprime PCR mix | Ligation Sequencing Kit 1D |
Overview table of the amount of utilized nucleic acids for cDNA, dRNA and Cap-seq from mixed samples.
| Sample | Starting material (ng) | Amount of the library after PCR (ng) | Amount of the loaded library onto the flow cell (ng) |
|---|---|---|---|
| cDNA-Seq | 75 | 475 | 170 |
| dRNA-Seq | 100 | no PCR | 46 |
| Cap-Seq | 216 | 120 | 90 |
The list of the different oligo(dT)-containing primers used in this study for the reverse transcription reactions.
| Sequencing method | Name, availability | Catalog # | Sequence (5′ -> 3′) |
|---|---|---|---|
| cDNA-seq | Poly(T)-containing anchored primer [(VN)T20] - ONT recommended, custom made (Bio Basic) | - | 5phos/ ACTTGCCTGTCGCTCTATCTTC(T)20VN |
| dRNA-seq | RT adapter - Direct RNA Sequencing Kit (Oxford Nanopore Technologies) | SQK-RNA001 | GAGGCGAGCGGTCAATTTTCCTAAGAGCAAGAAGAAGCCTTTTTTTTTT |
| Cap-seq | TeloPrime Full-Length cDNA Amplification Kit (Lexogen) | 013.08 & 013.24 | TCTCAGGCGTTTTTTTTTTTTTTTTTT |
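The anchored cDNA-seq primer ends in the degenerate bases `VN`, so it is really a pool of sequences. The sketch below expands it using the standard IUPAC codes (V = A, C or G; N = any base). The function name and its parameters are illustrative, and the 5′ phosphate modification is not represented in the strings.

```python
from itertools import product

# Standard IUPAC nucleotide codes needed for the anchor.
IUPAC = {"V": "ACG", "N": "ACGT"}


def expand_anchored_primer(adapter, t_count=20, anchor="VN"):
    """List every concrete sequence of adapter + (T)n + degenerate anchor."""
    choices = [IUPAC[code] for code in anchor]
    return [adapter + "T" * t_count + "".join(combo)
            for combo in product(*choices)]


# Adapter part of the cDNA-seq primer from the table above.
variants = expand_anchored_primer("ACTTGCCTGTCGCTCTATCTTC")
print(len(variants))   # 12 concrete sequences (3 x 4)
print(variants[0])
```

The `V` position excludes T, which is what anchors the primer at the junction between the transcript body and the poly(A) tail instead of letting it bind anywhere within the tail.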
Overview table of the amount of utilized nucleic acids for cDNA-seq for dynamic transcriptome analysis.
| Sample | Starting material (ng) | Amount of the library after PCR (ng) | Amount of the library loaded onto the flow cell (ng) | Barcode # |
|---|---|---|---|---|
| | 58 | 1224 | 440 | C2 |
| | 59 | 684 | | C3 |
| | 54 | 612 | | C4 |
| | 53 | 744 | | C5 |
| | 54.5 | 738 | 410 | C6 |
| | 60 | 570 | | C7 |
| | 52.5 | 351 | | C8 |
| | 56 | 360 | | C9 |
| | 50 | 600 | | C1 |
The sequences of the gene-specific primers used for the PCR amplification of the 104.1 gene of AcMNPV.
| Primer | 5′ -> 3′ |
|---|---|
| fw | AACGTGCTGTTGAATTATGTGG |
| rev | AAACTGTTATCAATTAGTTTCGTTT |
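Basic properties of the two gene-specific primers can be checked with a few lines of code. The sketch below computes length, GC content and a rough melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C). That rule is a coarse approximation best suited to short oligonucleotides, and its use here is an illustration rather than the authors' design method.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = {
    "fw": "AACGTGCTGTTGAATTATGTGG",
    "rev": "AAACTGTTATCAATTAGTTTCGTTT",
}

for name, seq in primers.items():
    print(name, len(seq), f"{gc_content(seq):.1%}", wallace_tm(seq))
```

Under this approximation both primers come out at 62 °C, even though the reverse primer is longer and more AT-rich. Nearest-neighbor models would give more reliable estimates for primers of this length.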