Jean Peccoud, Sébastian Lequime, Isabelle Moltini-Conclois, Isabelle Giraud, Louis Lambrechts, Clément Gilbert.
Abstract
Chimeric reads can be generated by in vitro recombination during the preparation of high-throughput sequencing libraries. Our attempt to detect biological recombination between the genomes of dengue virus (DENV; +ssRNA genome) and its mosquito host using the Illumina Nextera sequencing library preparation kit revealed that most, if not all, detected host-virus chimeras were artificial. Indeed, these chimeras were not more frequent than with control RNA from another species (a pillbug), which was never in contact with DENV RNA prior to the library preparation. The proportion of chimera types merely reflected those of the three species among sequencing reads. Chimeras were frequently characterized by the presence of 1-20 bp microhomology between recombining fragments. Within-species chimeras mostly involved fragments in opposite orientations and located less than 100 bp from each other in the parental genome. We found similar features in published datasets using two other viruses: Ebola virus (EBOV; -ssRNA genome) and a herpesvirus (dsDNA genome), both produced with the Illumina Nextera protocol. These canonical features suggest that artificial chimeras are generated by intra-molecular template switching of the DNA polymerase during the PCR step of the Nextera protocol. Finally, a published Illumina dataset using the Flock House virus (FHV; +ssRNA genome) generated with a protocol preventing artificial recombination revealed the presence of 1-10 bp microhomology motifs in FHV-FHV chimeras, but very few recombining fragments were in opposite orientations. Our analysis uncovered sequence features characterizing recombination breakpoints in short-read sequencing datasets, which can be helpful to evaluate the presence and extent of artificial recombination.
Keywords: Illumina; artificial chimeras; high-throughput sequencing; recombination; virus
Year: 2018 PMID: 29434031 PMCID: PMC5873904 DOI: 10.1534/g3.117.300468
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Number of chimeric reads detected in all datasets analyzed in this study
| Type of chimeras | Total number of chimeras | Proportion of chimeras | Number of unique chimeras | Total replicates 1 + 2 |
|---|---|---|---|---|
| DENV 1–Mosquito 1 | 350 | 0.013–0.13% | 343 | 555 |
| DENV 2–Mosquito 2 | 216 | 0.007–0.098% | 212 | |
| DENV 1–Pillbug 1 | 2353 | 0.09–0.22% | 2304 | 3639 |
| DENV 2–Pillbug 2 | 1364 | 0.05–0.16% | 1335 | |
| DENV 1–DENV 1 | 212644 | 8% | 44805 | 81516 |
| DENV 2–DENV 2 | 189633 | 6.70% | 36711 | |
| EBOV–EBOV | 144556 | 7.21% | 16504 | — |
| Pillbug18S–Pillbug18S 1 | 20324 | 4.90% | 4662 | 817 |
| Pillbug18S–Pillbug18S 2 | 14799 | 4.56% | 3508 | |
| MaHV-1–MaHV-1 | 2507 | 0.12% | 1139 | — |
| FHV-RNA1–FHV-RNA1 | 744201 | 2.90% | 3675 | — |
| FHV-RNA2–FHV-RNA2 | 45155 | 0.61% | 1313 | — |
Proportion of chimeras: the percentage of total reads made up by chimeras. For inter-genome chimeras, the proportions are given relative to the total number of reads mapping to each genome.
Figure 1. Distribution of lengths of microhomology motifs found in mosquito–DENV (A) and pillbug–DENV (B) breakpoints. The black/gray and red histograms show the observed and expected distributions, respectively. Negative lengths (gray bars) represent insertion of non-templated nucleotides at junction points. Frequencies of observed microhomologies were rescaled so that the heights of black bars sum to 1. This was needed for comparison to the expected distribution, which does not account for negative microhomology.
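The rescaling described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes that each breakpoint has already been assigned a signed microhomology length (negative for non-templated insertions); the function name `rescale_microhomology` and the example values are hypothetical and are not taken from the study.

```python
from collections import Counter


def rescale_microhomology(lengths):
    """Return {length: frequency}, scaled so that the frequencies of
    non-negative lengths (the black bars) sum to 1.

    Negative lengths (non-templated insertions, the gray bars) are kept,
    but they are expressed on the same scale rather than included in the sum.
    """
    counts = Counter(lengths)
    non_negative_total = sum(n for length, n in counts.items() if length >= 0)
    if non_negative_total == 0:
        raise ValueError("no non-negative microhomology lengths to scale by")
    return {length: n / non_negative_total for length, n in sorted(counts.items())}


# Hypothetical signed microhomology lengths, one per breakpoint.
example_lengths = [0, 0, 1, 2, 2, 2, 3, -1, -2]
freqs = rescale_microhomology(example_lengths)
```

With this scaling, the observed frequencies for lengths of 0 bp and above can be compared directly with an expected distribution that is only defined for non-negative lengths.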
Figure 2. Characteristics of DENV–DENV breakpoints. (A) Distribution of lengths of microhomology motifs, as in Figure 1. Negative lengths (gray bars) represent insertion of non-templated nucleotides at junction points. Frequencies of observed microhomologies were rescaled so that the heights of black bars sum to 1. (B) Density of observed (black/gray) and expected (red) distances separating recombining fragments found in DENV–DENV chimeras. (C) Number of DENV–DENV breakpoints per 100-bp, non-overlapping windows along the DENV genome.
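Counting breakpoints in 100-bp, non-overlapping windows, as in panel (C), amounts to simple binning. The sketch below assumes 1-based breakpoint coordinates on a linear genome; the function name `breakpoints_per_window` and the toy coordinates are hypothetical.

```python
def breakpoints_per_window(positions, genome_length, window=100):
    """Count breakpoints in consecutive, non-overlapping windows.

    positions: 1-based breakpoint coordinates on the genome (assumption).
    Returns a list whose i-th element is the count for the window covering
    coordinates i*window + 1 through (i + 1)*window; the last window may be
    shorter than the others.
    """
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"breakpoint {pos} lies outside the genome")
        counts[(pos - 1) // window] += 1
    return counts


# Toy example: a 250-bp "genome" with four breakpoints.
window_counts = breakpoints_per_window([1, 100, 101, 250], genome_length=250)
```

In this toy example, coordinates 1 and 100 fall in the first window, 101 in the second, and 250 in the third.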
Figure 3. Characteristics of FHV–FHV breakpoints. (A) Distribution of lengths of microhomology motifs as in Figure 1. Negative lengths (gray bars) represent insertion of non-templated nucleotides at junction points. Frequencies of observed microhomologies were rescaled so that the heights of black bars sum to 1. (B) Density of observed (black/gray) and expected (red) distances separating recombining fragments found in FHV–FHV chimeras. The number of FHV–FHV breakpoints per 100-bp, non-overlapping windows along the FHV genome is shown in (C) for the FHV RNA1 and (D) for the FHV RNA2.
Figure 4. Model for the formation of artificial chimeras during the PCR step of the Illumina Nextera library preparation kit. Artificial chimeras most likely arise by intra-molecular template switching (TS) during extension of a primer along a double-stranded (or partially double-stranded) template. The dissociated extending strand reanneals to the displaced strand at a position upstream with respect to the direction of extension. Such reannealing is facilitated by base pairing over 1-20 bp (here, 3 bp), i.e., the microhomology motifs we detect in our analyses of recombination breakpoints. Recombining sequences are preferentially located at short distances from each other (<100 bp). The bottom schematic shows how a chimeric read sequenced from a recombining fragment (top) would align on the source genome sequence (bottom). Dashed lines connect start and end alignment coordinates on the read to their counterparts on the genome sequence.
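The three breakpoint features discussed here (microhomology length, relative orientation, and distance between recombining fragments) can be derived from the alignment coordinates of a chimeric read. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: each read is assumed to be split into two local alignments with 1-based inclusive coordinates, and the `Segment` class, `describe_junction` function, and example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One local alignment of part of a read (1-based, inclusive coordinates)."""
    read_start: int
    read_end: int
    ref_start: int   # leftmost genome coordinate of the alignment
    ref_end: int     # rightmost genome coordinate of the alignment
    strand: str      # "+" or "-"


def describe_junction(first, second):
    """Summarize the junction between two alignments of the same read.

    `first` must precede `second` along the read. A positive microhomology
    is the number of read bases covered by both alignments; a negative value
    is the number of unaligned (possibly non-templated) bases between them.
    """
    microhomology = first.read_end - second.read_start + 1
    # Genome coordinate matching the end of `first` and the start of `second`
    # on the read; which alignment edge that is depends on the strand.
    bp_first = first.ref_end if first.strand == "+" else first.ref_start
    bp_second = second.ref_start if second.strand == "+" else second.ref_end
    return {
        "microhomology": microhomology,
        "opposite_orientation": first.strand != second.strand,
        "distance": abs(bp_first - bp_second),
    }


# Hypothetical 120-bp read whose two parts align to opposite strands.
first = Segment(read_start=1, read_end=60, ref_start=1000, ref_end=1059, strand="+")
second = Segment(read_start=58, read_end=120, ref_start=1020, ref_end=1082, strand="-")
junction = describe_junction(first, second)
```

In this example the two alignments share 3 read bases, lie on opposite strands, and have breakpoints 23 bp apart on the genome, which matches the kind of junction the model in Figure 4 predicts for template switching.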