Kiril M Dimitrov1, Poonam Sharma1, Jeremy D Volkening2, Iryna V Goraichuk1,3, Abdul Wajid4,5, Shafqat Fatima Rehmani4, Asma Basharat4, Ismaila Shittu6, Tony M Joannis6, Patti J Miller1, Claudio L Afonso7.
Abstract
BACKGROUND: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected variants and co-infecting agents. However, NGS is not widely used for small RNA viruses because of incorrectly perceived cost estimates and inefficient utilization of freely available bioinformatics tools.
Keywords: Avian paramyxovirus; Complete genomes; De novo assembly; Galaxy; Mixed infection; Multiplexing; Newcastle disease virus; Next-generation sequencing
Year: 2017 PMID: 28388925 PMCID: PMC5384157 DOI: 10.1186/s12985-017-0741-5
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Fig. 1. Customized Galaxy workflow used in the current study. Double arrows indicate steps where the read pairs were processed in parallel. Blue shading indicates pre-processing steps; green shading indicates assembly/post-processing steps; output is shaded purple. “In” indicates input filetypes; “out” indicates output filetypes
Statistics of next-generation sequencing of 30 avian paramyxovirus isolates in a single run
| Data | Results |
|---|---|
| Cluster density (K/mm2)a | 917 +/- 19 |
| Clusters passing filterb | 92.34% |
| Total number of reads | 17762176 |
| Pass-filter readsc | 16403251 |
| Percentage of reads passing filter | 96.31% |
| ≥ Q30d | 77.9% |
| Lowest representation for any indexe | 0.0007% |
| Highest representation for any indexe | 7.16% |
a shows number of clusters per square millimeter (optimal cluster density is 1000–1200, can vary with chemistry)
b indicates the purity of the signals detected from the clusters (i.e. signals passing the chastity filter, where chastity is the ratio of the brightest base intensity to the sum of the brightest and second-brightest base intensities; the filtration process removes the least reliable clusters from the image analysis results)
c reads passing filter (about 15 million reads are expected from an optimally clustered flow cell)
d percentage of bases with a Phred quality score equal to or greater than 30
e percentage of pass-filter reads assigned to any index
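The chastity ratio described in footnote b can be illustrated with a minimal Python sketch. The function names are hypothetical, and the pass criterion used here (threshold of 0.6, at most one failing call within the first 25 cycles) is an assumption based on commonly described instrument defaults; it is not stated in this study.

```python
def chastity(intensities):
    """Chastity of one base call: brightest / (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """Return True if a cluster passes the chastity filter.

    `cycles` is a sequence of per-cycle intensity tuples (one value per base
    channel). The threshold, window and max_failures defaults are assumed
    values, not parameters reported in the source.
    """
    early = cycles[:window]
    failures = sum(1 for call in early if chastity(call) < threshold)
    return failures <= max_failures


# A clean cluster: one channel clearly dominates in every cycle.
clean = [(900.0, 50.0, 30.0, 20.0)] * 25
# A mixed cluster: two channels are nearly equal in every cycle.
mixed = [(500.0, 450.0, 30.0, 20.0)] * 25

print(passes_filter(clean), passes_filter(mixed))
```

A cluster whose two strongest channels are nearly equal yields a chastity close to 0.5 and is removed, which is how mixed or overlapping clusters are excluded from the pass-filter read count.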
Fig. 2. Analysis of Newcastle disease virus genome assembly at various read depths. Shown is the longest contig produced at each read depth as a fraction of the full genome length. Subsamples up to 200x were generated using digital normalization. Above 200x, additional reads were added using random subsampling (due to issues with high median cutoffs in the khmer package). At each subsampling depth, the final velvetg assembly was optimized for maximum contig length based on the “cov_cutoff” parameter
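The random-subsampling step mentioned in the caption can be sketched as follows. This is only an illustration of plain random subsampling of read pairs to a nominal depth; it is not digital normalization and does not reproduce the khmer or velvetg steps. The helper names and the 250-nt read length are assumptions, while the 15192-nt genome length is taken from the complete NDV consensus lengths reported in this study.

```python
import random


def pairs_for_depth(target_depth, genome_length, read_length):
    """Number of read pairs needed for a nominal mean depth (ceiling)."""
    bases_needed = target_depth * genome_length
    return -(-bases_needed // (2 * read_length))  # ceiling division


def subsample_pairs(pairs, n, seed=0):
    """Randomly pick n read pairs, keeping forward and reverse mates together."""
    if n >= len(pairs):
        return list(pairs)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(pairs, n)


# Hypothetical read pairs represented by their identifiers only.
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(10000)]
n = pairs_for_depth(target_depth=200, genome_length=15192, read_length=250)
subset = subsample_pairs(pairs, n)
print(n, len(subset))
```

Sampling whole pairs rather than individual reads keeps mates synchronized, which paired-end assemblers generally require.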
Summary of sequencing and assembly data of 25 avian paramyxovirus isolates
| Isolate number | % PF readsa | Number of raw read pairs | Number of filtered read pairsb | Forward read qualityc | Reverse read qualityc | Identified virus | Final coverage depthc | Number of reads used for consensusd | Consensus nucleotide length | Missing positions at 5' ende | Length of internal gaps | Missing positions at 3' ende | Percent coveragee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1002 | 2.49 | 409193 | 405137 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|3680|6088|7868|18004 | 390740 | 15124 | 68 | 99.55 | ||
| 1004 | 2.67 | 437755 | 432361 | 2|37|38|38|38 | 2|34|37|38|38 | NDV | 0|4185|6151|7909|14329 | 422150 | 15125 | 67 | 99.56 | ||
| 1007 | 4.20 | 688524 | 681691 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|817|3648|7368|19348 | 665220 | 15125 | 67 | 99.56 | ||
| 994 | 1.39 | 227500 | 226196 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|1758|2756|4186|14276 | 219609 | 15121 | 71 | 99.53 | ||
| 995 | 1.38 | 226050 | 224416 | 2|36|37|38|38 | 2|34|36|37|38 | NDV | 0|2162|2995|4197|9101 | 216240 | 15110 | 82 | 99.46 | ||
| 996 | 1.53 | 251238 | 250338 | 2|37|38|38|38 | 2|34|37|37|38 | NDV | 0|2383|3175|4411|9167 | 242158 | 15104 | 20 | 68 | 99.42 | |
| 1001 | 3.79 | 621655 | 618002 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|4653|6784|9510|21623 | 594361 | 15122 | 70 | 99.54 | ||
| 997 | 1.72 | 281376 | 266251 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 0|1937|3026|4030|9645 | 233775 | 15104 | 22 | 66 | 99.42 | |
| 999 | 1.70 | 279207 | 272105 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|1728|2583|4496|11915 | 241105 | 15108 | 84 | 99.45 | ||
| 1000 | 1.74 | 285079 | 280596 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 0|2338|3107|4441|8796 | 241724 | 15117 | 75 | 99.51 | ||
| 959 | 3.60 | 590494 | 588212 | 2|37|38|38|38 | 2|37|38|38|38 | APMV-13 | 1|2250|3484|5801|23998 | 531520 | 16126 | 20 | 99.88 | ||
| 960 | 4.34 | 711909 | 709377 | 2|37|37|38|38 | 2|37|37|38|38 | NDV | 0|3738|5631|7347|19491 | 628837 | 15135 | 25 | 32 | 99.62 | |
| 961 | 4.61 | 756197 | 753487 | 2|36|37|38|38 | 2|35|37|38|38 | NDV | 0|4585|6509|8206|14939 | 674667 | 15167 | 25 | 99.84 | ||
| 962 | 3.35 | 548833 | 547622 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 2|4617|6340|8772|14719 | 529553 | 15192 | 100.00 | |||
| 967 | 3.71 | 607876 | 597902 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 2|2673|3928|6881|26072 | 562657 | 15192 | 100.00 | |||
| 968 | 4.46 | 731692 | 727838 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 0|2507|4136|7393|22877 | 636218 | 15167 | 19 | 99.87 | ||
| 695 | 7.16 | 1156516 | 1129415 | 2|36|37|38|38 | 2|35|37|38|38 | NDV | 2|1678|3827|6301|32862 | 955829 | 15192 | 100.00 | |||
| 714 | 3.98 | 653603 | 643246 | 2|37|37|38|38 | 2|36|37|38|38 | NDV | 1|2082|3513|6724|35052 | 570825 | 15192 | 100.00 | |||
| 715 | 3.63 | 594802 | 580223 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 10|2608|5633|8881|28809 | 526885 | 15192 | 100.00 | |||
| 720 | 4.05 | 663757 | 657982 | 2|37|37|38|38 | 2|35|37|38|38 | NDV | 13|3821|6267|9252|27439 | 525313 | 15192 | 100.00 | |||
| 861 | 3.51 | 576077 | 574864 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 6|3930|6419|8881|27153 | 559394 | 15192 | 100.00 | |||
| 867 | 4.11 | 673510 | 668781 | 2|37|38|38|38 | 2|36|37|38|38 | NDV | 0|4284|6079|9028|23441 | 647586 | 15176 | 16 | 99.89 | ||
| 892 | 3.92 | 642753 | 642250 | 2|37|38|38|38 | 2|35|37|38|38 | NDV | 1|5147|7049|9922|18221 | 618591 | 15180 | 12 | 99.92 | ||
| 913 | 4.06 | 665640 | 661350 | 2|36|37|38|38 | 2|35|37|38|38 | NDV | 0|2278|3784|7011|31842 | 566280 | 15192 | 100.00 | |||
| 688 | 0.01 | no data | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
NA not applicable
a the fraction of reads assigned to each sample out of the total number of reads that passed filter (i.e. pass-filter reads)
b the number of paired reads remaining after host and internal control filtering
c numbers represent distribution (minimum | lower quartile | median | upper quartile | maximum)
d numbers of paired reads used to re-call the final consensus for each sequence
e the missing nucleotides at the ends and the fraction of the expected full genome length covered by the consensus scaffold (i.e. not containing unknown nucleotides)
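The distribution cells (footnote c) and the percent coverage column (footnote e) can be read with a short Python sketch. The function names are hypothetical, and the expected genome length is assumed to equal the consensus length plus all missing positions, which is consistent with the rows of the table above.

```python
def parse_distribution(cell):
    """Split a 'min|q1|median|q3|max' cell into a labelled five-number summary."""
    keys = ("min", "q1", "median", "q3", "max")
    values = [int(v) for v in cell.split("|")]
    if len(values) != len(keys):
        raise ValueError(f"expected 5 values, got {len(values)}: {cell!r}")
    return dict(zip(keys, values))


def percent_coverage(consensus_length, expected_length):
    """Fraction of the expected genome covered by the consensus, in percent."""
    return round(100.0 * consensus_length / expected_length, 2)


# Isolate 1002: coverage depth distribution and consensus length from the table.
depth = parse_distribution("0|3680|6088|7868|18004")
print(depth["median"])                # 6088
print(percent_coverage(15124, 15192))  # 99.55
```

The same calculation applied to isolate 959 (APMV-13), with 16126 consensus nucleotides and 20 missing positions, gives 99.88, matching the table.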
Summary of sequencing and assembly data of five samples that were identified to have mixed populations of Newcastle disease virus (NDV) and other avian viruses
| Isolate number | % PF readsa | Number of raw read pairs | Number of filtered read pairsb | Forward read qualityc | Reverse read qualityc | Identified virus | Final coverage depthc | Number of reads used for consensusd | Consensus nucleotide length | Missing positions at 5' ende | Length of internal gaps | Missing positions at 3' ende | Percent coveragef |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a the fraction of reads assigned to each sample out of the total number of reads that passed filter (i.e. pass-filter reads)
b the number of paired reads remaining after host and internal control filtering
c numbers represent distribution (minimum | lower quartile | median | upper quartile | maximum)
d numbers of paired reads used to re-call the final consensus for each sequence
e for avian influenza viruses, the missing nucleotides refer to the beginning and the end of the coding sequences of the genes
f the fraction of the expected full genome length covered by the consensus scaffold (i.e. not containing unknown nucleotides), for avian influenza genes, the coverage represents comparison to the coding sequences of the genes only
g Infectious bronchitis virus
h coverage depth and number of reads used to re-call the final consensus for this NDV isolate were impacted by the presence of influenza virus A in the sample (influenza reads were estimated to be approximately 98% of all reads, data not shown)
i Avian influenza virus; PB2 = segment 1 polymerase PB2; PB1 = segment 2 polymerase PB1; PA = segment 3 polymerase PA; HA = segment 4 hemagglutinin; NP = segment 5 nucleocapsid protein; NA = segment 6 neuraminidase; M1, M2 = segment 7 matrix protein 1 and matrix protein 2; NEP = segment 8 nuclear export protein and nonstructural protein 1
Comparison of differences in number of reads and genome coverage of three samples prepared with and without capture of NDV RNA
| Virus designation | Number of reads (with capture) | Number of reads (without capture) | % fewer reads without capture | Identity of consensus sequences | Missing nt with capture: 5′ | Missing nt with capture: internal gaps | Missing nt with capture: 3′ | Missing nt without capture: 5′ | Missing nt without capture: internal gaps | Missing nt without capture: 3′ |
|---|---|---|---|---|---|---|---|---|---|---|
| 691 | 403515 | 283501 | 29.7 | 100% | 20 | 0 | 0 | 26 | 0 | 0 |
| 698 | 363962 | 262452 | 27.9 | 100% | 0 | 0 | 0 | 25 | 0 | 0 |
| 901 | 415661 | 285405 | 31.3 | 100% | 0 | 94 | 0 | 22 | 84 | 0 |
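The "% fewer reads without capture" column appears to be the relative drop in read count with respect to the capture-prepared library. A minimal Python check under that assumption (the function name is hypothetical) reproduces the tabulated values:

```python
def percent_fewer(with_capture, without_capture):
    """Relative reduction in reads without capture, as a percentage of reads with capture."""
    return round(100.0 * (with_capture - without_capture) / with_capture, 1)


# Read counts taken from the table above: (with capture, without capture).
samples = {
    "691": (403515, 283501),
    "698": (363962, 262452),
    "901": (415661, 285405),
}

for name, (with_cap, without_cap) in samples.items():
    print(name, percent_fewer(with_cap, without_cap))
```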