Eric T Dawson, Sarah Wagner, David Roberson, Meredith Yeager, Joseph Boland, Erik Garrison, Stephen Chanock, Mark Schiffman, Tina Raine-Bennett, Thomas Lorey, Phillip E Castle, Lisa Mirabello, Richard Durbin.
Abstract
BACKGROUND: Human papillomavirus (HPV) is a common sexually transmitted infection that is associated with cervical cancer and frequently occurs as a coinfection of multiple types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk cannot be distinguished in coinfections with current typing methods.
Keywords: Bioinformatics; Coinfection; HPV; Human papillomavirus; Kmers; MinHash
Year: 2019 PMID: 31299914 PMCID: PMC6626348 DOI: 10.1186/s12859-019-2918-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Sensitivity of rkmh with respect to sketch size (a) and kmer size (b). (a) There are diminishing returns to increasing sketch size above roughly 4000, regardless of read length. (b) Kmers are not sufficiently unique to classify reads with k ≤ 10. Above k = 18, sensitivity begins to drop, likely because kmers increasingly incorporate sequencing errors. This is especially noticeable for ONT MinION reads, which have a much higher error rate (above 12% per base for the R7.4 pore) than Ion Torrent and Illumina reads (<0.1% per base).
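The two parameters varied in Fig. 1 are the kmer size k and the sketch size s. A minimal sketch of a bottom-s MinHash sketch, in the style popularised by Mash, assuming that a read sketch is the s smallest distinct kmer hashes; blake2b is a stand-in hash function and all function names are hypothetical, not rkmh's API.

```python
import hashlib


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (no reverse-complement handling)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash; blake2b is a stand-in, not necessarily rkmh's hash function."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_s_sketch(seq: str, k: int = 16, s: int = 1000) -> list:
    """Bottom-s MinHash sketch: the s smallest distinct kmer hashes, in ascending order."""
    distinct_hashes = {kmer_hash(km) for km in kmers(seq, k)}
    return sorted(distinct_hashes)[:s]


read = "ACGTACGTAC"
print(len(bottom_s_sketch(read, k=4, s=100)))  # 4 distinct 4-mers: ACGT, CGTA, GTAC, TACG
print(len(bottom_s_sketch(read, k=4, s=2)))    # capped at the sketch size: 2
```

Larger k makes each kmer more specific but also more likely to contain a sequencing error; larger s keeps more hashes per read at the cost of memory and comparison time.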
Fig. 2 Precision/recall plots for type classification of 70,000 Ion Torrent reads from an HPV16 amplicon sequencing reaction (a) and 3660 ONT MinION reads derived from two HPV16 isolates (b, c) at various read sketch pruning levels M, indicated by the label attached to each point. Read sketch pruning removes rare kmers, which may arise from random sequencing errors, from the read sketch. Panels (a) and (b) were classified using a kmer size of 16 and (c) using a kmer size of 10. Ion Torrent reads have low substitution error rates, so pruning removes few kmers and the precision boost is small (<0.001%) (a). ONT MinION reads have a much higher error rate, approaching 10% per base. For MinION reads, pruning improves precision to roughly 99.8% when using a kmer size of 16 (b). A smaller kmer size of 10 combined with high levels of pruning leads to an increase in both precision and recall, from slightly more than 97.0% to over 99% (c).
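The caption describes pruning as removing rare kmers that may be sequencing errors, but does not spell out how the pruning level M is defined. The sketch below assumes one plausible reading: a kmer survives only if it occurs in at least `min_count` reads of the sample. `min_count` and the function names are hypothetical and should not be read as rkmh's M parameter or its implementation.

```python
import hashlib
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit stand-in hash (an assumption, not rkmh's hash function)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def kmer_list(read: str, k: int) -> list:
    """All overlapping kmers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def pruned_sketches(reads: list, k: int, s: int, min_count: int) -> list:
    """Sketch every read after dropping kmers found in fewer than min_count reads.

    A random sequencing error tends to create kmers seen in only one or a few reads,
    while kmers from the true template recur across many reads of the same sample.
    """
    # Count, for each kmer, how many reads contain it at least once.
    reads_containing = Counter(km for read in reads for km in set(kmer_list(read, k)))
    sketches = []
    for read in reads:
        kept = {kmer_hash(km) for km in kmer_list(read, k) if reads_containing[km] >= min_count}
        sketches.append(sorted(kept)[:s])  # bottom-s sketch of the surviving kmers
    return sketches


reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]  # third read carries a T->A error
print([len(sk) for sk in pruned_sketches(reads, k=4, s=100, min_count=2)])  # [4, 4, 1]
```

Only the error-free kmer ACGT survives in the third read, which mirrors why pruning helps most for error-prone MinION reads and barely changes low-error Ion Torrent reads.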
Fig. 3 (a) The performance of rkmh on a simulated HPV type coinfection. Summing the rows of this matrix gives percent prevalence estimates for each type (b).
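Classifying a read against a set of reference types can be pictured as picking the reference whose sketch shares the most hashes with the read's sketch. A minimal sketch, assuming a simple maximum-shared-hashes rule with ties and zero matches left unclassified; rkmh's actual scoring and tie-breaking may differ, and "typeA"/"typeB" are placeholder reference names.

```python
import hashlib
from typing import Dict, Optional, Set


def sketch(seq: str, k: int, s: int) -> Set[int]:
    """Bottom-s MinHash sketch as a set (blake2b is a stand-in hash)."""
    hashes = {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }
    return set(sorted(hashes)[:s])


def classify(read_sketch: Set[int], references: Dict[str, Set[int]]) -> Optional[str]:
    """Return the reference sharing the most hashes with the read, or None if there is no unique best hit."""
    scores = {name: len(read_sketch & ref) for name, ref in references.items()}
    best = max(scores.values(), default=0)
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        return None  # unclassified: nothing shared, or a tie between references
    return winners[0]


refs = {
    "typeA": sketch("ACGTACGTTTGCA", k=4, s=100),
    "typeB": sketch("GGGCCCAAATTT", k=4, s=100),
}
print(classify(sketch("ACGTACGT", k=4, s=100), refs))  # typeA
print(classify(sketch("TTTTTTTT", k=4, s=100), refs))  # None
```

Tallying these per-read calls against the known source of each simulated read yields a matrix like the one in panel (a).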
Fig. 4 Percent similarity between HPV sublineages. Numbers above the diagonal are nucleotide similarity; numbers below the diagonal are similarity estimates based on the number of shared hashes from rkmh.
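One standard way to turn shared hashes into a similarity, used by Mash, is the bottom-s Jaccard estimate J, optionally converted into an approximate nucleotide identity through the Mash distance D = -(1/k) ln(2J / (1 + J)). Whether rkmh uses exactly this estimator is an assumption here; the sketch only illustrates how shared-hash counts can be compared with nucleotide similarity as in Fig. 4.

```python
import hashlib
import math


def sketch(seq: str, k: int, s: int) -> list:
    """Bottom-s MinHash sketch (blake2b stand-in hash), ascending."""
    hashes = {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a: list, b: list, s: int) -> float:
    """Fraction of the union's bottom-s hashes that appear in both sketches (Mash-style)."""
    union_bottom = sorted(set(a) | set(b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_identity(j: float, k: int) -> float:
    """Approximate nucleotide identity as 1 - Mash distance; no shared hashes is treated as identity 0."""
    if j == 0:
        return 0.0
    distance = -math.log(2 * j / (1 + j)) / k
    return 1.0 - distance


a = sketch("ACGTACGT", k=4, s=100)       # 4 distinct 4-mers
b = sketch("ACGTACGTTTGCA", k=4, s=100)  # 9 distinct 4-mers, 4 of them shared with a
print(jaccard_estimate(a, b, s=100))     # 4 shared out of 9 in the union
```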
Fig. 5 (a) The percentage of reads from a simulated coinfection classified by rkmh to each of the HPV16 sublineages at default settings (k = 16, s = 1000, no pruning, no difference filter). Summing each row of (a), excluding reads that could not be classified, gives the percent prevalence estimate of each sublineage (b). (c) The percentage of reads classified to each sublineage by rkmh at pruning level M = 100 and I = 1. These settings significantly improve the prevalence estimates (d).
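The prevalence estimates in panels (b) and (d) leave out reads that could not be classified before normalising. A minimal sketch of that bookkeeping, assuming each read already carries a single sublineage call, or None when unclassified; the function name and the example calls are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional


def prevalence(assignments: List[Optional[str]]) -> Dict[str, float]:
    """Percent of classified reads assigned to each label; unclassified reads (None) are excluded."""
    counts = Counter(label for label in assignments if label is not None)
    classified = sum(counts.values())
    if classified == 0:
        return {}
    return {label: 100.0 * n / classified for label, n in counts.items()}


# Hypothetical per-read calls for a simulated two-sublineage coinfection.
calls = ["A1", "A1", "A1", "D2", None]
print(prevalence(calls))  # {'A1': 75.0, 'D2': 25.0}
```

Excluding unclassified reads keeps the estimates summing to 100%, at the cost of hiding how many reads were discarded.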
Performance of rkmh on samples from [10] which were manually reviewed for their infecting sublineages and coinfection status
| N = 34 manually annotated samples | Agrees with annotation | Disagrees with annotation | Concordance |
|---|---|---|---|
| Primary Lineage | 32 | 2 | 95% |
| Primary Sublineage | 31 | 3 | 91% |
| Secondary Lineage | 24 | 10 | 71% |
| Secondary Sublineage | 12 | 22 | 35% |
| Coinfection status, lineage | 27 | 7 | 79% |
| Coinfection status, sublineage | 24 | 10 | 70% |
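Concordance in the table is the share of the 34 annotated samples for which rkmh's call agrees with the manual annotation. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
def concordance(agree: int, disagree: int) -> float:
    """Percent of annotated samples whose rkmh call agrees with the manual annotation."""
    total = agree + disagree
    if total == 0:
        raise ValueError("no annotated samples")
    return 100.0 * agree / total


rows = {"Primary Sublineage": (31, 3), "Secondary Sublineage": (12, 22)}
for name, (agree, disagree) in rows.items():
    print(f"{name}: {concordance(agree, disagree):.0f}%")
# Primary Sublineage: 91%
# Secondary Sublineage: 35%
```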