Joshua Chang Mell, Jae Yun Lee, Marlo Firme, Sunita Sinha, Rosemary J Redfield.
Abstract
Naturally competent bacterial species actively take up environmental DNA and can incorporate it into their chromosomes by homologous recombination. This can bring genetic variation from environmental DNA to recipient chromosomes, often in multiple long "donor" segments. Here, we report the results of genome sequencing 96 colonies of a laboratory Haemophilus influenzae strain, which had been experimentally transformed by DNA from a diverged clinical isolate. Donor segments averaged 6.9 kb (spanning several genes) and were clustered into recombination tracts of ~19.5 kb. Individual colonies had replaced from 0.1 to 3.2% of their chromosomes, and ~1/3 of all donor-specific single-nucleotide variants were present in at least one recombinant. We found that nucleotide divergence did not obviously limit the locations of recombination tracts, although there were small but significant reductions in divergence at recombination breakpoints. Although indels occasionally transformed as parts of longer recombination tracts, they were common at breakpoints, suggesting that indels typically block progression of strand exchange. Some colonies had recombination tracts in which variant positions contained mixtures of both donor and recipient alleles. These tracts were clustered around the origin of replication and were interpreted as the result of heteroduplex segregation in the original transformed cell. Finally, a pilot experiment demonstrated the utility of natural transformation for genetically dissecting natural phenotypic variation. We discuss our results in the context of the potential to merge experimental and population genetic approaches, giving a more holistic understanding of bacterial gene transfer.
Keywords: bacteria; heteroduplex segregation; horizontal gene transfer; nearly isogenic lines; recombination
Year: 2014 PMID: 24569039 PMCID: PMC4059242 DOI: 10.1534/g3.113.009597
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Model of transformation and possible outcomes of heteroduplex segregation for two independent donor DNAs (blue) in the absence of heteroduplex correction. One fragment carries a selectable antibiotic resistance allele (R); the other is unselected (U). Strands that go on to produce a resistant colony are indicated with an asterisk (*). (A) The U fragment replaces resident sequences on the same strand as the R fragment, leading to a resistant colony homogeneous for the R and U recombination tracts. (B) The U fragment replaces the strand opposite the R fragment, leading to a resistant colony with only the R tract. (C) and (D) are as in (A), except that the two donor molecules transform asynchronously with respect to segregation. This asynchrony could be either temporal (one fragment transformed before replication and the other after) or spatial (the replication fork had passed through one locus but not the other when the fragments transformed). (C) The R fragment transforms first, leading to a colony either heterogeneous for the U fragment (shown by the blue circle) or missing the U fragment, depending on the strands transformed. (D) The U fragment transforms first, leading to a resistant colony homogeneous for either both fragments or only the R fragment.
Transformation experiments and clone isolation
| | | Cell Preparation | | | |
|---|---|---|---|---|---|
| Donor DNA | Value | MIV-1 | MIV-2 | MIV-3 | Late-log |
| RR666 donor DNA | NovR/CFU | 6.40 × 10⁻⁴ | 9.20 × 10⁻⁴ | 2.40 × 10⁻³ | 1.33 × 10⁻⁵ |
| | NalR/CFU | 9.50 × 10⁻⁴ | 4.00 × 10⁻³ | 2.50 × 10⁻³ | ND |
| | Congression | 14.5 | 8.7 | 7.0 | ND |
| RR3131 donor DNA | NovR/CFU | 2.70 × 10⁻⁴ | 6.10 × 10⁻⁴ | 7.30 × 10⁻⁴ | 1.15 × 10⁻⁵ |
| | NalR/CFU | 8.50 × 10⁻⁴ | 1.10 × 10⁻³ | 8.80 × 10⁻⁴ | ND |
| | Congression | 9.6 | 8.2 | 6.2 | ND |
| Strains collected and sequenced | NalR (36) | RR4001-12 | RR4033-44 | RR4065-76 | ND |
| | NovR (44) | RR4013-24 | RR4045-56 | RR4077-88 | RR4117-24 |
| | None (16) | RR4025-26 | ND | RR4101-10,13-16 | ND |
CFU, colony-forming unit; ND, no data.
Three independent MIV cultures of the lab strain Rd (also known as KW20, strain RR722) were prepared and transformed with genomic DNA from either MAP7 (RR666) or a NovR NalR derivative of 86-028NP (RR3131) as donor (DNA at ~1 genome/cell). A fourth, late-log culture (OD600 = 1.2) was also transformed. The transformation frequencies of the donor NovR and NalR alleles were measured by standard plating assay.
Congression was measured as the observed:expected ratio of NovRNalR recombinants, or (NovRNalR/CFU) / (NovR/CFU * NalR/CFU).
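The congression ratio defined above can be sketched in a few lines of Python. The function name is illustrative, and the double-recombinant frequency used in the example is not reported in the table: it is back-calculated here from the MIV-1 values for RR666 donor DNA, purely to show the arithmetic.

```python
def congression(double_freq, nov_freq, nal_freq):
    """Observed:expected ratio of NovR NalR double recombinants.

    The expected frequency assumes the two markers transform independently,
    so it is the product of the single-marker frequencies.
    """
    expected = nov_freq * nal_freq
    return double_freq / expected


# MIV-1, RR666 donor DNA (single-marker values from the table).
nov = 6.40e-4   # NovR/CFU
nal = 9.50e-4   # NalR/CFU

# ASSUMPTION: the double-recombinant frequency is not in the table; this is
# the value implied by the reported congression of 14.5.
implied_double = 14.5 * nov * nal

ratio = congression(implied_double, nov, nal)
print(f"expected double frequency: {nov * nal:.3g}")
print(f"congression ratio: {ratio:.1f}")
```

A ratio well above 1 means that double recombinants are more frequent than independent transformation of the two markers would predict.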
Isolated colonies were propagated from each transformation with RR3131 donor DNA for genome sequencing. The antibiotic used for selection is indicated in column 2, with the total number of isolated clones indicated in parentheses. Strain numbers are indicated for each selected type (Table S1). In addition to the strains listed here, the donor and recipient genomes were sequenced in parallel as controls (RR722, RR666, and RR3131).
For viable CFUs isolated from MIV cultures (none), DNAs were pooled into pairs prior to library construction to reduce sequencing costs.
Sequencing statistics
| | | Control Reads | | | Experimental Reads (Remaining 88 Samples) | | | |
|---|---|---|---|---|---|---|---|---|
| Genome | Statistic | RR722 | RR666 | RR3131 | Mean | SD | Min | Max |
| None | QC-passed | 2,133,684 | 2,331,004 | 2,012,650 | 1,859,324 | 705,545 | 795,476 | 3,667,922 |
| | % QC-failed | 45.68 | 47.81 | 44.01 | 48.62 | 13.96 | 24.1 | 72.48 |
| Rd reference | % Aligned | 99.95 | 99.94 | 87.66 | 99.74 | 0.24 | 98.31 | 99.95 |
| | Median depth | 55 ± 25 | 61 ± 27 | 49 ± 25 | 49 ± 23 | 18 ± 12 | 21 ± 9 | 94 ± 56 |
| | Unmapped | 674 | 1,200 | 112,011 | 1,790 | 1,172 | 375 | 5,888 |
| | % Coverage | 99.81 | 99.79 | 91.35 | 99.68 | 0.15 | 99.19 | 99.89 |
| 86-028NP reference | % Aligned | 92.16 | 91.76 | 99.89 | 91.76 | 0.22 | 90.37 | 92.06 |
| | Median depth | 56 ± 27 | 61 ± 27 | 50 ± 24 | 49 ± 23 | 18 ± 12 | 21 ± 9 | 94 ± 56 |
| | Unmapped | 241,928 | 242,245 | 1,821 | 243,056 | 1,625 | 236,352 | 246,081 |
| | % Coverage | 87.01 | 87.00 | 99.65 | 86.69 | 0.14 | 86.44 | 87.21 |
QC, quality control; MAD, median absolute deviation.
The sequence reference used for short-read alignment.
%QC-failed reads accounts for both those that had failed the Illumina chastity filter and those that were removed by the utility sortPairedReads, which identifies and culls read pairs containing the sequencing adaptors. The vast majority of these were adaptor dimers.
%QC-passed reads mapped indicates how many reads were aligned to the reference genome indicated.
Median ± MAD read depth across all reference positions supported by at least one read.
Number of reference positions with no supporting aligned read (read depth = 0).
Fraction of reference genome positions covered by at least three reads.
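The three depth statistics defined in the notes above (median ± MAD over covered positions, count of zero-depth positions, and coverage at three or more reads) can be illustrated with a small Python sketch. The function name and the toy depth vector are hypothetical; this is not the pipeline used in the study, and taking all reference positions as the coverage denominator is an assumption.

```python
from statistics import median


def depth_summary(depths, min_cov=3):
    """Summarize a per-position read-depth vector.

    depths  : list of read depths, one per reference position
    min_cov : minimum depth for a position to count as covered
    """
    # Median and MAD use only positions supported by at least one read.
    supported = [d for d in depths if d >= 1]
    med = median(supported)
    mad = median([abs(d - med) for d in supported])

    # "Unmapped" positions have no aligned read at all.
    unmapped = sum(1 for d in depths if d == 0)

    # ASSUMPTION: coverage is expressed over all reference positions.
    pct_coverage = 100.0 * sum(1 for d in depths if d >= min_cov) / len(depths)

    return {"median": med, "mad": mad,
            "unmapped": unmapped, "pct_coverage": pct_coverage}


# Toy depth vector (hypothetical values, not data from the paper).
toy_depths = [0, 2, 5, 5, 6, 9, 0, 3, 4, 50]
summary = depth_summary(toy_depths)
print(summary)
```

The MAD is used instead of the standard deviation because a few very deep positions (such as the 50 in the toy vector) would otherwise dominate the spread.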
SNVs distinguishing donor from recipient genomes—Filtering and cross-validation
| | Rd Reference | 86-028NP Reference |
|---|---|---|
| Total initial variants detected in 91 short-read datasets | 40,398 | 44,591 |
| Short indels | 818 | 1,228 |
| Ambiguous lift-over position in reciprocal reference | 1,712 | 5,495 |
| Invariant/ambiguous control genotype | 2,251 | 1,074 |
| High-frequency mixed genotype | 16 | 162 |
| Invariant/ambiguous genotype at lifted-over position | 226 | 1,257 |
| Conflict between genotypes in reciprocal alignments | 312 | |
| Final set of “gold-standard” filtered SNVs | 35,063 | |
| Transforming SNVs (≥1) | 10,449 | |
SNV, single-nucleotide variation.
Because the sequence reads were aligned to both the donor and recipient genome references, variant calls and filtering were initially carried out independently on the two sets of alignments, one for each reference. This allowed for cross-validation of variant calls and elimination of alignment artifacts.
Total positions with ≥1 alternate allele out of 91 samples aligned to each parental reference (Rd or 86-028NP). Due to selectable markers introduced in the donor strain, genotypes were manually corrected for MAP7-specific variation prior to counting.
The number of short indels in ≥1 clones (predominantly simple sequence repeat variants). These were excluded from further analysis as SNVs.
Progressive filters against error-prone and ambiguous calls; report filtered positions that passed the previous filter.
Positions where whole-genome alignment gave two different lift-overs (conversions between recipient and donor coordinates), depending on which reference was used as the query during Mauve alignment.
Positions where parental base calls were invariant or ambiguous (excluding). The expected pattern is that donor reads would have the reference base against 86-028NP and an alternate base against Rd, while the recipient reads would have the reciprocal.
Positions where >5% of samples gave an ambiguous/mixed call (excluding), removing most error-prone positions but not mixed genotypes arising from transformation.
Positions at which the lift-over position (the coordinate in the reciprocal alignment) was invariant or ambiguous (excluding), reconciling the variant positions between the two alignments.
Positions passing the above filters, but with ≥1 conflict in the base call made depending on the reference used.
Final set of filtered SNVs. All positions have a valid lift-over to the reciprocal reference, a low frequency of ambiguous/mixed genotypes, and consistent genotypes between both parental control reads and reciprocal alignments of the parental references.
Count of SNVs for which ≥1 of the 72 selected recombinants had a donor allele.
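The filtering table can be checked with simple arithmetic. The sketch below reads each filter row as the number of positions removed at that step, which is an interpretation rather than something the legend states outright; under that reading, both references converge on the same count before the final conflict filter, and the reported 35,063 gold-standard SNVs follow.

```python
# Counts copied from the table; the data structure is illustrative.
initial = {"Rd": 40398, "86-028NP": 44591}

# ASSUMPTION: each row gives the number of positions removed at that step.
removed = [
    ("short indels",                          {"Rd": 818,  "86-028NP": 1228}),
    ("ambiguous lift-over position",          {"Rd": 1712, "86-028NP": 5495}),
    ("invariant/ambiguous control genotype",  {"Rd": 2251, "86-028NP": 1074}),
    ("high-frequency mixed genotype",         {"Rd": 16,   "86-028NP": 162}),
    ("invariant/ambiguous at lifted-over",    {"Rd": 226,  "86-028NP": 1257}),
]
reciprocal_conflicts = 312  # single value reported for both alignments


def remaining_after_filters(reference):
    """Apply the per-reference filters in order and return what is left."""
    count = initial[reference]
    for _name, per_reference in removed:
        count -= per_reference[reference]
    return count


remaining = {ref: remaining_after_filters(ref) for ref in initial}
final_snvs = remaining["Rd"] - reciprocal_conflicts

print(remaining)    # both references should agree at this point
print(final_snvs)   # compare with the "gold-standard" row
```

That the two independently filtered sets agree before the last step is consistent with the cross-validation described in the legend.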
SVs distinguishing donor from recipient—Filtering and cross-validation
| | All SVs | | | Transforming SVs (≥1) | | |
|---|---|---|---|---|---|---|
| Class | Deletes | Inserts | Total | Deletes | Inserts | Total |
| 1 bp | 277 / 166 | 318 / 203 | 595 / 369 | 43 | 46 | 89 |
| 2–10 bp | 192 / 154 | 180 / 148 | 369 / 300 | 33 | 29 | 61 |
| 11–100 bp | 53 / 34 | 63 / 51 | 95 / 68 | 5 | 12 | 14 |
| 101–1000 bp | 39 / 33 | 34 / 30 | 90 / 73 | 4 | 5 | 12 |
| >1000 bp | 23 / 21 | 30 / 18 | 62 / 48 | 5 | 4 | 10 |
| Complex | | | 10 | | | 0 |
| Total | 584 / 408 | 625 / 450 | 1221 / 868 | 90 | 96 | 186 |
SVs, structural variants; IGV, Integrative Genomics Viewer.
Total SVs from a Mauve alignment. Indel directionality is relative to transformation, such that insertions are donor-specific and deletions are recipient-specific. Reporting indels is complicated by “insertional deletions,” where donor sequences would replace recipient sequences (17% of filtered SVs), a pattern more common for large SVs (44% affecting >100 bp), so net change is reported.
Subset of indels for which reads distinguished donor from recipient and <5% of genotypes were ambiguous.
Subset of SVs with the donor allele present in ≥1 of the 72 selected clones.
SVs for which the indicated number of bp would be deleted by transformation, counted as the net deletion (i.e., the donor allele was shorter than the recipient allele).
SVs for which the indicated number of bp would be inserted by transformation, counted as the net insertion (i.e., the donor allele was longer than the recipient allele).
SVs affecting the indicated number of bps; sum of donor and recipient allele lengths.
Complex SVs (inversions, relocations, etc.) missed by the genotyping method but manually inspected using IGV.
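The net-change bookkeeping described in these notes can be sketched as follows. The function names are hypothetical, and the two example SVs use the approximate allele lengths quoted in the Figure 5 caption (a 2.7-kb donor-specific transposon, and 2 kb of recipient sequence opposite 300 bp of donor sequence), so the outputs are illustrative only.

```python
def size_class(n_bp):
    """Bin a positive length (bp) into the size classes used in the table."""
    if n_bp <= 0:
        raise ValueError("length must be positive")
    if n_bp == 1:
        return "1 bp"
    if n_bp <= 10:
        return "2-10 bp"
    if n_bp <= 100:
        return "11-100 bp"
    if n_bp <= 1000:
        return "101-1000 bp"
    return ">1000 bp"


def classify_sv(recipient_len, donor_len):
    """Classify an SV from the unaligned lengths of its two alleles.

    Direction is relative to transformation: a longer donor allele is a
    net insertion, a shorter donor allele is a net deletion.
    """
    net = donor_len - recipient_len
    if net > 0:
        direction = "insert"
    elif net < 0:
        direction = "delete"
    else:
        direction = "balanced"  # equal lengths: no net change
    return {
        "direction": direction,
        "net_bp": abs(net),
        # "Affected" length is the sum of both allele lengths.
        "affected_bp": donor_len + recipient_len,
        # Both alleles carry unaligned sequence: an "insertional deletion".
        "insertional_deletion": donor_len > 0 and recipient_len > 0,
        "net_class": size_class(abs(net)) if net else None,
    }


# Approximate lengths taken from the Figure 5 caption.
transposon = classify_sv(recipient_len=0, donor_len=2700)
replacement = classify_sv(recipient_len=2000, donor_len=300)
print(transposon)
print(replacement)
```

Reporting the net change keeps insertional deletions in a single row of the table, at the cost of hiding how much sequence is actually exchanged, which is why the affected length is tracked separately.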
Figure 2. Extent of recombination in 96 transformed colonies. Each row indicates a genome, with coordinates according to the recipient Rd genome sequence. Blue hatches indicate donor-specific SNVs. Turquoise hatches indicate a mixture of donor- and recipient-specific SNVs. (A) 36 NovR-selected recombinants from MIV transformations. (B) 36 NalR-selected recombinants from MIV transformations. (C) Eight pools of two unselected clones from MIV transformations (such that turquoise hatches indicate donor variation in one of the two pooled clones). The asterisk (*) indicates the recombination tract in RR4108 that confers altered transformability (see text). (D) Eight NovR-selected recombinants from a late-log transformation. SNV, single-nucleotide variant.
Figure 3. Summary histograms of transformation in the 72 selected colonies. (A) Total donor SNVs; (B) combined length of donor segments in each clone (measured from outermost donor-specific SNVs aligned to the recipient genome); (C) length of individual donor segments; (D) same as in (C), but with segment length on a log scale and the 11 donor segments containing only one SNV indicated with an arrow. Red lines in (A–C) show the distributions that best fit the data, as determined using the R package fitdistrplus [log-normal for (A–B) and exponential for (C)]. “Mixed” segments were included in these measurements. SNV, single-nucleotide variant.
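For readers unfamiliar with how such fits are obtained, the sketch below computes closed-form maximum-likelihood estimates for the two distribution families named in the caption. It is a stand-in written in Python, not the fitdistrplus call used for the figure, and the segment lengths are made-up values for illustration.

```python
import math


def fit_exponential(values):
    """Maximum-likelihood rate of an exponential distribution (1 / mean)."""
    return len(values) / sum(values)


def fit_lognormal(values):
    """Maximum-likelihood (meanlog, sdlog) of a log-normal distribution.

    The ML estimate of sdlog divides by n, not n - 1.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    meanlog = sum(logs) / n
    sdlog = math.sqrt(sum((x - meanlog) ** 2 for x in logs) / n)
    return meanlog, sdlog


# HYPOTHETICAL donor-segment lengths in kb (not data from the paper).
segment_kb = [0.4, 1.2, 2.5, 3.1, 5.0, 6.8, 9.9, 14.2, 21.0]

rate = fit_exponential(segment_kb)
meanlog, sdlog = fit_lognormal(segment_kb)
print(f"exponential rate: {rate:.3f} per kb (mean {1 / rate:.2f} kb)")
print(f"log-normal meanlog: {meanlog:.3f}, sdlog: {sdlog:.3f}")
```

Choosing between candidate families would additionally require comparing their likelihoods or information criteria, which this sketch leaves out.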
Figure 4. Cumulative SNV transformation. Count of donor-specific (dark blue) and mixed (turquoise) SNVs across the 72 selected transformants illustrated in Figure 2, A and B (y-axis), plotted against the recipient genome coordinate (x-axis). Bottom pink track shows all 35,063 SNV markers used to distinguish donor from recipient. Regions with low marker density correspond to recipient-specific insertions, e.g., a recipient-specific prophage insertion at ~1600 kb. SNV, single-nucleotide variant.
Figure 5. Cotransformation around selected sites. (A) and (B) show zooms around the NovR and NalR selected sites for NovR and NalR clones, respectively, whereas (C) and (D) draw the genotype for each specific clone. Donor SNVs are shown in light blue, whereas donor SVs are shown in dark blue (following the width of the recipient allele). The bottoms of the green trapezoids along the x-axis show the width of the SVs according to the donor genome. For example, the SV at ~584 kb is a 2.7-kb transposon specific to the donor strain, and the SV at ~1347 kb has 2 kb in the recipient that do not align with 300 bp in the donor. SNV, single-nucleotide variant; SV, structural variant.
Figure 6. Estimates of maximal donor segment clustering. Histograms are shown of (A) donor segments detected per clone and (B) minimum “recombination tracts” per clone, defined as sets of adjacent segments falling within 100-kb windows. The black line estimates the number of independent recombination events in the original competent cells, assuming heteroduplex segregation and a maximum of four events.
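One way to read the tract definition in this caption is as a greedy grouping of sorted segments, sketched below. The exact windowing rule is an assumption (a tract is anchored at its first segment's start and absorbs later segments that end within 100 kb of that start), chromosome circularity is ignored, and the coordinates are hypothetical.

```python
def cluster_into_tracts(segments, window=100_000):
    """Greedily group donor segments into minimum recombination tracts.

    segments : list of (start, end) coordinates in bp on the recipient genome
    window   : maximum span of one tract, in bp

    ASSUMPTION: a segment joins the current tract if its end lies within
    `window` bp of the start of that tract's first segment.
    """
    tracts = []
    for start, end in sorted(segments):
        if tracts and end - tracts[-1][0][0] <= window:
            tracts[-1].append((start, end))
        else:
            tracts.append([(start, end)])
    return tracts


# HYPOTHETICAL segments for a single clone (bp coordinates).
clone_segments = [
    (30_000, 42_000),
    (10_000, 16_000),
    (95_000, 99_000),
    (500_000, 507_000),
    (590_000, 612_000),
]

tracts = cluster_into_tracts(clone_segments)
for tract in tracts:
    print(len(tract), "segment(s):", tract)
```

Because the grouping always extends a tract as far as the window allows, the resulting count is a lower bound on the number of tracts, matching the "minimum" in the caption.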
Figure 7. Effects of sequence divergence. (A) Nucleotide divergence has no observable effect. Plot shows the count of donor SNVs in each segment as a function of segment size. Colored symbols indicate whether the segment spanned either the NovR or NalR selected site (red triangles), was in the same tract as a selected segment (green triangles), or was considered independent of the selected segment (blue circles). Black line indicates genome-wide average. Solid and dotted lines indicate the linear fit and 95% prediction intervals for all segments (dark gray) and only segments more than 100 kb from the selected site (light gray). (B) Indels/SVs changed the length of the recombined segments. The x-axis shows the length of recipient sequences replaced by each donor segment, while the y-axis shows the change in length between the recipient and donor segments. Color-coding as in (A), with blue lines to highlight unselected segments. The recurrent 2.4-kb insertion and 1.7-kb deletion correspond to the SVs adjacent to the NovR and NalR sites, as described in the example in Figure 5.
Figure 8. Breakpoint intervals are longer than the genome average. (A) Beanplot of sizes of breakpoint intervals (distance between outermost donor-specific SNVs and innermost recipient-specific SNVs): “All” includes all breakpoint intervals; “Clean” includes only unique breakpoints of unselected donor segments unaffected by SVs; and “Genome” includes all intervariant spacings (35,931 total). Black lines show the medians; width of bean is proportional to the density of values with that spacing. (B) Plot of the same data, where the x-axis is the cumulative proportion of the data with interval size less than the value indicated on the y-axis.
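The breakpoint-interval definition in this caption can be made concrete with a short sketch: given the sorted marker positions and the first and last donor-allele markers of a segment, the two flanking intervals are the distances to the nearest markers that kept the recipient allele. The function name and coordinates are hypothetical, and the sketch treats the chromosome as linear.

```python
from bisect import bisect_left, bisect_right


def breakpoint_intervals(markers, seg_first, seg_last):
    """Return (left, right) breakpoint interval lengths for one donor segment.

    markers   : sorted positions of all variant markers
    seg_first : position of the outermost donor-specific marker on the left
    seg_last  : position of the outermost donor-specific marker on the right

    An interval is None when the segment reaches the end of the marker list
    (ASSUMPTION: no wrap-around on the circular chromosome).
    """
    i = bisect_left(markers, seg_first)    # index of the first donor marker
    j = bisect_right(markers, seg_last)    # index just past the last donor marker
    left = seg_first - markers[i - 1] if i > 0 else None
    right = markers[j] - seg_last if j < len(markers) else None
    return left, right


# HYPOTHETICAL marker positions (bp); donor alleles span 400 through 900.
marker_positions = [100, 250, 400, 900, 1500, 1600]
intervals = breakpoint_intervals(marker_positions, 400, 900)
print(intervals)
```

Each recombination breakpoint can only be placed somewhere inside its interval, so long intervals correspond to regions of locally low divergence.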
Classification of recombination breakpoint intervals
| | No Flanking SV | Flanking SV | % SV |
|---|---|---|---|
| Genome-wide intervals | 35,063 | 868 | 2.4 |
| Breakpoint locations | 346 | 27 | 7.2 |
| Total breakpoints | 385 | 47 | 10.9 |
| Unselected locations | 154 | 15 | 9.7 |
| Unselected breakpoints | 157 | 15 | 9.6 |
SV, structural variant.
Figure 9. Transformants with evidence of clustered heteroduplex donor segments. Top panels show illustrations of each recombinant. Middle panels show a zoom on the clustered “mixed” donor segments, with the y-axis showing the donor-specific allele frequency. Bottom panels illustrate the inferred recombination intermediate. (A) Heteroduplex segregation of two molecules that transformed the same recipient strand. Clustering suggests a single long strand was translocated into the cytoplasm, but endonuclease activities disrupted the ssDNA before or during recombination. (B) Heteroduplex segregation of two molecules that transformed complementary recipient strands. If derived from a single molecule, each end of the periplasmic dsDNA would have been independently translocated. Since translocation occurs by transport of ssDNA by its 3′-end, translocation from both ends through two different pores would yield two cytoplasmic ssDNAs adjacent to each other, but complementary to opposite strands. Figure S4 shows a clone without mixed tracts that would also have required double-translocation across an inversion, if the clustered segments were derived from the same molecule. ssDNA, single-stranded DNA.
MAP7 SNVs
| Position | Resist | Gene | Mutation | Codon | Site | Substitution |
|---|---|---|---|---|---|---|
| 537,227 | RifR | HI0515 ( | C→T | 619 | 1 | Ala→Thr |
| 537,378 | “ | “ | G→T | 568 | 3 | Asn→Lys |
| 587,579 | NovR | HI0567 ( | G→T | 140 | 1 | Gln→Lys |
| 587,846 | “ | “ | C→A | 51 | 1 | Ala→Ser |
| 587,969 | “ | “ | C→A | 10 | 1 | Gly→Cys |
| 597,885 | KanR | HI0579 ( | G→C | 655 | 2 | Pro→Arg |
| 600,806 | StrR | HI0581 ( | T→C | 43 | 2 | Lys→Arg |
| 839,913 | SpcR | HI0778 ( | G→C | 65 | 2 | Gly→Ala |
| 935,046 | | HI0883 (unk) | A→C | 356 | 3 | |
| 1,344,100 | NalR | HI1264 ( | C→A | 88 | 1 | Asp→Tyr |
SNVs detected in MAP7 reads against the Rd reference that were also SNVs in the recipient reads, as well as positions with ambiguous/mixed genotype calls, are not reported. SNV, single-nucleotide variation.
These variants are also present in the donor RR3131 (86-028NP NovR NalR)
Figure 10. Transformation mapping of a transformability factor. (A) For the primary screen, “transformation-during-growth” assays measure transformation frequencies (KanR/CFU) of the set of 96 recombinants. Data are normalized to the frequencies observed for RR722 recipient controls in each experiment to account for experimental variation over the 11 independent assays performed. Thick lines indicate median values and thin lines indicate the 5% and 95% quantiles. The RR3131 donor strain was included as a control. (B) Map of the donor segment interval in RR4108, shown in blue replacing the syntenic genes in the recipient chromosome (pink). Black lines show donor segment endpoints. The recipient-specific xylose ABC transporter (shown with the gray triangle) was deleted in the recombinant. The comM gene is highlighted with a black box. (C) Transformation frequencies (KanR/CFU) using the standard MIV method for the recipient and recombinant RR4108 strains with and without the comM gene and complementation by plasmids carrying either the donor or recipient alleles of the comM gene. The donor RR3131 (NP) strain is shown as a control. The comM deletion in RR3131 had a transformation frequency below the limit of detection. Bars show mean transformation frequencies and error bars show standard deviations from triplicate experiments. Letters above the bars group strains with no significant difference by paired t-test. Note that the difference between donor and recipient in the standard MIV assay is larger than in the primary screen.