Zsolt Balázs, Dóra Tombácz, Attila Szűcs, Michael Snyder, Zsolt Boldogkői.
Abstract
Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files; combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.
Year: 2017 PMID: 29257134 PMCID: PMC5735922 DOI: 10.1038/sdata.2017.194
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of the raw sequencing datasets.
Run accessions are ENA Run accession IDs. All sequencing runs belong to the same sample: ERS1870077. Sample complex means the solution prepared from the library by the addition of the DNA/Polymerase binding solution. Three sample complexes were prepared from the poly(A)-selected library (polyA1–3), and one for the random library. Runs marked with an asterisk (*) contain samples from a separate experiment as well (see the Methods section for details).
| | | | | | |
|---|---|---|---|---|---|
| Run1 | ERR2106421 | oligo(dT) | 05-11-2016 | PolyA1 | 4,357 |
| Run2 | ERR2106422 | oligo(dT) | 20-11-2016 | PolyA1 | 3,911 |
| Run3 | ERR2106423 | oligo(dT) | 20-11-2016 | PolyA1 | 4,878 |
| Run4 | ERR2106424 | Ribodepletion | 20-11-2016 | Random | 6,344 |
| Run5 | ERR2106425 | oligo(dT) | 25-11-2016 | PolyA2 | 6,637 |
| Run6 | ERR2106426 | oligo(dT) | 25-11-2016 | PolyA2 | 7,423 |
| Run7 | ERR2106427 | oligo(dT) | 04-12-2016 | PolyA3 | 23,199 |
| Run8 | ERR2106428 | oligo(dT) | 04-12-2016 | PolyA3 | 21,274 |
Summary statistics of the ROIs.
Quality values have been determined using the RS_Isoseq protocol. In the case of Run 4, the random library, full-length non-chimeric reads were called without requiring the presence of poly(A) tails.
| | | | | | |
|---|---|---|---|---|---|
| Run1 | 6,898 | 6,277 | 1,346 | 25 | 98.99% |
| Run2 | 6,349 | 5,790 | 1,377 | 20 | 98.97% |
| Run3 | 9,706 | 8,800 | 1,347 | 21 | 99.00% |
| Run4 | 16,346 | 14,241 | 901 | 32 | 99.24% |
| Run5 | 13,099 | 11,647 | 1,504 | 27 | 99.09% |
| Run6 | 16,995 | 15,015 | 1,461 | 24 | 99.15% |
| Run7 | 43,682 | 30,058 | 1,213 | 18 | 98.20% |
| Run8 | 43,315 | 28,686 | 1,179 | 18 | 98.30% |
Summary statistics of the sequencing reads which aligned to the human genome (hg19) from each SMRT cell.
Average values are given together with s.e. values.
| | | | | | |
|---|---|---|---|---|---|
| Run1 | 5,670 | 0.002 | 1171.04±9.63 | 0.742%±0.06% | 0.661%±0.035% |
| Run2 | 5,204 | 0.002 | 1201.064±10.349 | 0.785%±0.073% | 0.761%±0.039% |
| Run3 | 8,072 | 0.003 | 1176.594±8.123 | 0.744%±0.043% | 0.782%±0.034% |
| Run4 | 9,900 | 0.003 | 845.664±5.383 | 2.486%±0.059% | 1.24%±0.035% |
| Run5 | 10,852 | 0.005 | 1329.305±7.035 | 0.697%±0.044% | 0.862%±0.027% |
| Run6 | 14,240 | 0.006 | 1270.59±6.149 | 0.652%±0.02% | 0.815%±0.026% |
| Run7 | 34,868 | 0.011 | 968.521±3.475 | 0.85%±0.027% | 1.055%±0.018% |
| Run8 | 33,830 | 0.010 | 929.996±3.417 | 0.976%±0.033% | 1.119%±0.02% |
| Total | 122,636 | 0.040 | 1047.903±1.956 | 0.965%±0.015% | 0.994%±0.009% |
Summary statistics of the sequencing reads which aligned to the HCMV genome (FJ616285.1) from each SMRT cell.
Average values are given together with s.e. values.
| | | | | | |
|---|---|---|---|---|---|
| Run1 | 1,668 | 9.174 | 1293.294±7.539 | 0.07%±0.004% | 0.395%±0.018% |
| Run2 | 1,531 | 8.680 | 1333.135±7.865 | 0.0785%±0.012% | 0.425%±0.023% |
| Run3 | 2,274 | 12.673 | 1310.457±6.366 | 0.08%±0.007% | 0.463%±0.018% |
| Run4 | 2,307 | 9.154 | 933.072±5.788 | 0.099%±0.014% | 0.426%±0.019% |
| Run5 | 3,098 | 18.395 | 1396.243±5.137 | 0.083%±0.003% | 0.618%±0.019% |
| Run6 | 3,814 | 22.439 | 1383.451±4.55 | 0.079%±0.004% | 0.547%±0.018% |
| Run7 | 9,358 | 43.429 | 1091.287±3.022 | 0.14%±0.008% | 0.734%±0.014% |
| Run8 | 9,036 | 40.450 | 1052.634±2.975 | 0.191%±0.013% | 0.684%±0.014% |
| Total | 33,086 | 164.394 | 1168.371±1.509 | 0.128%±0.004% | 0.616%±0.007% |
Figure 1. The distribution of read lengths in the poly(A)-selected samples.
The average distribution of the lengths of reads aligning to the human genome (hg19) is shown in a (n=7), and of those aligning to the HCMV genome (FJ616285.1) in b (n=7). The same distributions are broken down by the three sample complexes in c and d (for the hg19 and the FJ616285.1 genomes, respectively). The sample complex PolyA1 was used for three SMRT cells; PolyA2 and PolyA3 were used for two SMRT cells each. Error bars represent s.e.
BLAST results confirm the strain of the virus.
All sequencing reads were aligned against all the complete HCMV genomes in the NCBI database. The ten results with the fewest mismatches are shown.
| | | | |
|---|---|---|---|
| FJ616285 | Human herpesvirus 5 strain Towne | 43,285 | 4.38E-05 |
| GQ121041 | Human herpesvirus 5 transgenic strain Towne | 44,041 | 4.34E-05 |
| KX544836 | Human herpesvirus 5 isolate VR5201 | 86,370 | 4.38E-05 |
| KF493877 | Human herpesvirus 5 transgenic isolate Towne-BAC-der | 88,787 | 6.68E-05 |
| AC146851 | Human Herpesvirus 5 Towne-BAC isolate | 88,805 | 6.68E-05 |
| KF493876 | Human herpesvirus 5 transgenic isolate Towne-BAC_UL96_Mutant | 89,227 | 6.68E-05 |
| AY315197 | Human herpesvirus 5 strain Towne | 89,875 | 5.68E-05 |
| KX101023 | Human herpesvirus 5 strain Toledo/Towne Chimera 3 | 193,241 | 4.61E-05 |
| KX101022 | Human herpesvirus 5 strain Toledo/Towne Chimera 2 | 203,141 | 4.50E-05 |
| AH013698 | Human herpesvirus 5 strain Toledo | 217,386 | 0.000171 |
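The Total rows of the two alignment summaries (reads aligned to FJ616285.1 and to hg19) can be cross-checked against their per-run rows. The sketch below is a minimal consistency check, not part of the authors' pipeline: it assumes the first numeric column is the number of aligned reads and the fourth column is the mean read length, and it pools the per-run means weighted by read count. The names `hcmv_runs`, `human_runs` and `pooled` are illustrative only.

```python
# Per-run (aligned reads, mean read length) pairs, Run1..Run8,
# copied from the HCMV (FJ616285.1) and human (hg19) summary tables.
# Assumption: column meanings are inferred from the values, not stated in the source.
hcmv_runs = [
    (1668, 1293.294), (1531, 1333.135), (2274, 1310.457), (2307, 933.072),
    (3098, 1396.243), (3814, 1383.451), (9358, 1091.287), (9036, 1052.634),
]
human_runs = [
    (5670, 1171.04), (5204, 1201.064), (8072, 1176.594), (9900, 845.664),
    (10852, 1329.305), (14240, 1270.59), (34868, 968.521), (33830, 929.996),
]


def pooled(runs):
    """Return (total reads, read-count-weighted mean read length)."""
    total = sum(n for n, _ in runs)
    mean = sum(n * m for n, m in runs) / total
    return total, mean


hcmv_total, hcmv_mean = pooled(hcmv_runs)
human_total, human_mean = pooled(human_runs)

# Share of all aligned reads that map to the viral genome.
viral_fraction = hcmv_total / (hcmv_total + human_total)

print(f"HCMV:  {hcmv_total} reads, pooled mean length {hcmv_mean:.3f}")
print(f"Human: {human_total} reads, pooled mean length {human_mean:.3f}")
print(f"Viral fraction of aligned reads: {viral_fraction:.1%}")
```

Under these assumptions the pooled read counts reproduce the 33,086 viral and 122,636 human reads quoted in the abstract, and the weighted means agree with the Total rows to within rounding of the per-run values. The s.e. values cannot be pooled this way without the per-read lengths, so they are left out.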