Shea N Gardner, Crystal J Jaing, Maher M Elsheikh, José Peña, David A Hysom, Monica K Borucki.
Abstract
Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low-concentration variants, even when the strain present is unknown.

Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse, with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory-cultured strain that had undergone extensive passage in vitro and in vivo.

Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing, to obtain better coverage of low-titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.
Year: 2014 PMID: 25157264 PMCID: PMC4137498 DOI: 10.1155/2014/101894
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Table 1: Summary of the number of sequences, average sequence length, and percentage of conserved bases in a multiple sequence alignment (built with MUSCLE [5]), and the number of tiled primers required for the short (~3,000 bp) and long (~10,000 bp) amplicon settings.
| Organism | Number of sequences | Avg. Length | Consensus (%) | Number of primers for ~3,000 bp amplicons | Number of primers for ~10,000 bp amplicons |
|---|---|---|---|---|---|
| CCHF_S | 56 | 1668 | 39 | 6 | 6 |
| CCHF_M | 49 | 5314 | 24 | 46 | 16 |
| CCHF_L | 31 | 12113 | 46 | 69 | 27 |
| RVF_S | 89 | 1684 | 53 | 2 | 2 |
| RVF_M | 69 | 3885 | 78 | 4 | 6 |
| RVF_L | 62 | 6404 | 83 | 6 | 4 |
| Ebola | 22 | 18659 | 5 | 116 | 35 |
| Marburg | 31 | 19115 | 70 | 34 | 8 |
| Hendra | 10 | 18234 | 97 | 12 | 4 |
| Nipah | 9 | 18247 | 91 | 18 | 6 |
| Junin_L | 12 | 7114 | 96 | 6 | 2 |
| Machupo_L | 5 | 7141 | 88 | 10 | 2 |
| Junin_S | 26 | 3410 | 80 | 4 | 4 |
| Machupo_S | 13 | 3432 | 76 | 4 | 4 |
| JEV | 144 | 10968 | 56 | 26 | 6 |
| NW_Arena_S | 100 | 3396 | 18 | 64 | 42 |
| NW_Arena_L | 42 | 7107 | 18 | 83 | 19 |
| OW_Arena_S | 54 | 3547 | 8 | 116 | 32 |
| OW_Arena_L | 45 | 7199 | 21 | 110 | 35 |
| TBEV | 67 | 10840 | 36 | 56 | 10 |
Abbreviations: CCHF = Crimean-Congo hemorrhagic fever, RVF = Rift Valley fever, JEV = Japanese encephalitis virus, NW_Arena = New World Arenavirus, OW_Arena = Old World Arenavirus, TBEV = tick-borne encephalitis virus, _L = L segment, _S = S segment.
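The consensus percentage in the table can be illustrated with a small sketch. It assumes that "conserved" means an alignment column in which every sequence carries the same unambiguous base, with gaps and `N` counting as non-conserved; the source does not spell out the exact rule, so the function name `consensus_percent` and this definition are illustrative assumptions rather than the authors' method.

```python
def consensus_percent(aligned_seqs):
    """Percentage of alignment columns where all sequences share one base.

    Assumption (not from the source): columns containing a gap ('-') or 'N'
    are treated as non-conserved.
    """
    if not aligned_seqs or not aligned_seqs[0]:
        raise ValueError("need at least one non-empty aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    conserved = 0
    for column in zip(*aligned_seqs):
        bases = {b.upper() for b in column}
        if len(bases) == 1 and bases.isdisjoint({"-", "N"}):
            conserved += 1
    return 100.0 * conserved / length


# Toy alignment: 6 of the 8 columns are identical across all three rows.
toy_alignment = [
    "ATGCGT-A",
    "ATGCGTTA",
    "ATGAGTTA",
]
print(consensus_percent(toy_alignment))
```

Under this definition, a low value such as the 5% reported for Ebola means that very few columns are identical across all aligned genomes, which is why many more degenerate primers are needed to tile that group.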
Figure 1: Diagram showing how the multiple sequence alignment is split into overlapping sections. Conserved, degenerate sets of primers are designed near the ends of the overlapping sections so that the resulting overlapping amplicons tile across the viral genome. FP = forward primer; RP = reverse primer.
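The splitting step described in the figure can be sketched as follows. This is a minimal illustration, not the run_tiled_primers implementation: the function names, the fixed overlap, and the primer search window size are all hypothetical parameters chosen for the example.

```python
def tile_sections(alignment_length, section_length, overlap):
    """Split [0, alignment_length) into overlapping (start, end) sections.

    Hypothetical helper: each section is at most `section_length` columns
    long and shares `overlap` columns with its neighbor.
    """
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if not 0 <= overlap < section_length:
        raise ValueError("overlap must be >= 0 and smaller than section_length")

    step = section_length - overlap
    sections = []
    start = 0
    while True:
        end = min(start + section_length, alignment_length)
        sections.append((start, end))
        if end == alignment_length:
            break
        start += step
    return sections


def primer_windows(section, window):
    """Return the forward and reverse primer search windows for a section.

    Forward primers are sought near the section start and reverse primers
    near the section end, so neighboring amplicons overlap.
    """
    start, end = section
    forward = (start, min(start + window, end))
    reverse = (max(end - window, start), end)
    return forward, reverse


# Example: a 10,000-column alignment, ~3,000-column sections, 500-column overlap.
for sec in tile_sections(10_000, 3_000, 500):
    print(sec, primer_windows(sec, 200))
```

Placing the primer windows at the section ends is what makes the amplicons tile: each reverse window falls inside the region covered by the next section's amplicon.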
Table 2: Parameters used for primer design in the in silico examples and in the MHV example presented here.
| Parameter | In silico examples | MHV primer settings |
|---|---|---|
| Primer length range | 18–25 nt | 18–27 nt |
| $T_m$ range¹ | 60–65°C | 58–65°C |
| Number of degenerate bases allowed per primer | 5 | 3 |
| Minimum distance of degenerate base to 3′ end of primer | 3 nt | 3 nt |
| Minimum trimer entropy allowed (to avoid repetitive sequence)² | 3.5 | 3.3 |
| Maximum length of homopolymer allowed | 4 nt | 5 nt |
| GC% range allowed | 20–80 | 20–80 |
| Minimum primer dimer ΔG | −6 kcal/mol | −15 kcal/mol |
| Minimum hairpin ΔG | −5 kcal/mol | −12 kcal/mol |
| Primer selection iterations | 1 | 3 |
¹ $T_m$ is calculated using Unafold [6].
² Low-complexity regions (repetitive sequence) are excluded from consideration as primers by setting a minimum entropy threshold for a primer candidate. The entropy $S$ of a sequence was computed by counting the numbers of occurrences $n_{AAA}, n_{AAC}, \ldots, n_{TTT}$ of the 64 possible trimers in the probe sequence and dividing by the total number of trimers, yielding the corresponding frequencies $f_{AAA}, \ldots, f_{TTT}$. The entropy is then given by $S = -\sum_{t:\, f_t \neq 0} f_t \log_2 f_t$, where the sum is over the trimers $t$ with $f_t \neq 0$.
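The trimer entropy defined in the footnote can be computed directly from overlapping 3-mers. The sketch below follows that definition; the function name `trimer_entropy` is an illustrative choice, and treating degenerate (IUPAC) characters as ordinary letters is an assumption, since the source does not say how they are handled.

```python
from collections import Counter
from math import log2


def trimer_entropy(seq):
    """Shannon entropy (base 2) of the overlapping trimer frequencies in seq.

    Assumption: any character, including degenerate IUPAC codes, is counted
    as a literal symbol.
    """
    seq = seq.upper()
    trimers = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not trimers:
        raise ValueError("sequence must be at least 3 bases long")

    counts = Counter(trimers)
    total = len(trimers)
    # Only observed trimers appear in `counts`, so every frequency is nonzero.
    return -sum((n / total) * log2(n / total) for n in counts.values())


repetitive = "AAAAAAAAAAAAAAAAAA"   # a single distinct trimer
diverse = "ACGTTGCAAGCTCATGGA"      # 16 trimers, all distinct

for candidate in (repetitive, diverse):
    s = trimer_entropy(candidate)
    print(candidate, round(s, 2), "pass" if s >= 3.5 else "reject")
```

A primer of length $L$ contains $L - 2$ overlapping trimers, so for $L - 2 \le 64$ its entropy cannot exceed $\log_2(L - 2)$. For an 18-mer that ceiling is $\log_2 16 = 4$, which puts the thresholds of 3.5 and 3.3 close to the maximum for the shortest allowed primers.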
Table 3: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons. In a multiplex of the 3 kb-amplicon tiled primers for a given organism, only a small number of the primer combinations predicted to yield products are predicted to amplify regions in nontarget organisms. Counts give the number of unique primer combinations in a multiplex that yield products for any sequence in the NCBI nt nucleotide database. The numerator counts combinations that amplify any nontarget organism in nt, and the denominator counts combinations that amplify any target or nontarget organism in nt (that is, nonspecific/total). Far more amplicons are produced from target organisms, indicating that any contaminating nontarget species should be a small minority of the amplified product.
| Organism | Nontarget amplicons/total amplicons | Nontarget amplicon source organism |
|---|---|---|
| CCHF_S | 0/160 | — |
| CCHF_M | 0/1934 | — |
| CCHF_L | 0/3753 | — |
| RVF_S | 0/137 | — |
| RVF_M | 0/356 | — |
| RVF_L | 0/753 | — |
| Ebola | 1/2657 | Zea mays clone BAC ZMMBBb0342E21 |
| Marburg | 0/1511 | — |
| Hendra | 0/206 | — |
| Nipah | 0/286 | — |
| Junin_L | 0/69 | — |
| Machupo_L | 0/153 | — |
| Junin_S | 0/84 | — |
| Machupo_S | 0/32 | — |
| JEV | 7/9515 | Rocio |
| NW_Arena_S | 56/1543 | Ippy |
| NW_Arena_L | 0/819 | — |
| OW_Arena_S | 73/2509 | Allpahuayo |
| OW_Arena_L | 1/1826 | Dandenong |
| TBEV | 0/4925 | — |
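To make the nonspecific/total ratios easier to compare, the nonzero entries of the table above can be converted to percentages. This is plain arithmetic on the reported counts; the helper name `nontarget_percent` is illustrative.

```python
def nontarget_percent(cell):
    """Convert a 'nontarget/total' table cell into a percentage."""
    nontarget, total = (int(part) for part in cell.split("/"))
    return 100.0 * nontarget / total


# Nonzero rows copied from the 3 kb-amplicon table.
nonzero_rows = {
    "Ebola": "1/2657",
    "JEV": "7/9515",
    "NW_Arena_S": "56/1543",
    "OW_Arena_S": "73/2509",
    "OW_Arena_L": "1/1826",
}

for organism, cell in nonzero_rows.items():
    print(f"{organism}: {nontarget_percent(cell):.2f}% nontarget")
```

Even the two highest ratios, for the arenavirus S segments, stay below 4% of the predicted primer combinations.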
Table 4: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons. As in Table 3, but for the multiplexes of the 10 kb-amplicon tiled primers.
| Organism | Nontarget amplicons/total amplicons | Nontarget amplicon source organism |
|---|---|---|
| CCHF_S | 0/160 | — |
| CCHF_M | 0/261 | — |
| CCHF_L | 0/253 | — |
| RVF_S | 0/137 | — |
| RVF_M | 0/487 | — |
| RVF_L | 0/195 | — |
| Ebola | 0/534 | — |
| Marburg | 0/123 | — |
| Hendra | 0/50 | — |
| Nipah | 0/74 | — |
| Junin_L | 0/12 | — |
| Machupo_L | 0/7 | — |
| Junin_S | 0/95 | — |
| Machupo_S | 0/32 | — |
| JEV | 0/1554 | — |
| NW_Arena_S | 1/337 | Human chromosome 14 BAC C-2555K7 of library CalTech-D |
| NW_Arena_L | 0/86 | — |
| OW_Arena_S | 0/316 | — |
| OW_Arena_L | 0/131 | — |
| TBEV | 0/189 | — |
Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by the primer sets is shown (the MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets whose amplified regions do not overlap. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region was amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as template. For example, to ensure that region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with the AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A–I) and primer direction (F = forward, R = reverse).