Abstract
We review currently available technologies for deconvoluting metagenomic data into individual genomes that represent populations, strains, or genotypes present in the community. An evaluation of chromosome conformation capture (3C) and related techniques in the context of metagenomics is presented, using mock microbial communities as a reference. We provide the first independent reproduction of the metagenomic 3C technique described last year, propose some simple improvements to that protocol, and compare the quality of the data with that provided by the more complex Hi-C protocol.
Keywords: 3C; Hi-C; metagenomics
Year: 2015 PMID: 27006753 PMCID: PMC4798154 DOI: 10.12688/f1000research.7281.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Text box 1. 3C and proximity ligation methods.
Chromosome conformation capture (3C) was first developed as a means to determine the average three-dimensional chromosome structure in a population of cells of a single species (Dekker). This general approach was later coupled with high-throughput DNA sequencing (Lieberman-Aiden), providing a means to generate detailed 3D structure models of chromosomes. Many extensions of the 3C technique have been developed (Dekker).
The basic 3C protocol begins with reversible crosslinking, typically with formaldehyde at 1–3%. This step crosslinks proteins to each other and to DNA. The formaldehyde is then quenched and the cells are lysed either enzymatically or by physical disruption. Next, a restriction digestion is carried out using a 4- or 6-cutter that leaves a single-stranded overhang. The sample is then placed in a large-volume DNA ligase reaction, yielding conditions that strongly favor the ligation of free ends that are co-bound in a protein complex. This step is referred to as proximity ligation. After proximity ligation, the crosslinks are reversed by heat incubation and the DNA is purified by proteinase K and RNase digestion followed by EtOH precipitation. Finally, the purified DNA is ready for standard high-throughput sequencing library preparation, for example via adapter ligation and enrichment PCR.
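The effect of choosing a 4- or 6-cutter in the digestion step can be previewed with an in silico digest. The sketch below is illustrative only: the function name `fragment_lengths`, the example recognition site, and the treatment of the sequence as linear (bacterial chromosomes are usually circular) are assumptions, not part of the protocol described here.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Return the fragment lengths produced by cutting a LINEAR sequence
    at every occurrence of `site`.

    cut_offset is the position within the site where the top strand is cut
    (0 for a site cut immediately before its first base, 1 for a cut after
    its first base, and so on). Both arguments are illustrative assumptions.
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect every cut position, allowing overlapping site occurrences.
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Fragment boundaries are the sequence ends plus all cut positions.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


# Toy example with a 4-base site cut before its first base.
toy = "AAGATCTTTTGATCAA"
print(fragment_lengths(toy, "GATC", 0))
```

Under the simplifying assumption of uniform, random base composition, a k-base site is expected roughly once every 4^k bases (256 bp for a 4-cutter, 4096 bp for a 6-cutter); real genomes deviate from this depending on GC content.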
Hi-C extends the protocol described above by incorporating steps that enrich the final sequencing library for proximity ligation events. In Hi-C, the single-stranded overhangs left after the restriction digest are filled in with biotinylated nucleotides. The proximity ligation that follows is thus a blunt-end ligation, and the junctions contain biotinylated nucleotides. Biotinylated nucleotides must be removed from any remaining unligated free ends. In the final steps of sequencing library preparation, fragments containing the biotinylated ligation junctions can be captured on streptavidin-coated magnetic beads, yielding a library substantially enriched for proximity ligations (Lieberman-Aiden).
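Because the overhangs are filled in before blunt-end ligation, each Hi-C junction carries a predictable sequence signature that can be searched for in reads. The following is a minimal sketch, assuming a palindromic recognition site that leaves a 5' overhang; the function names and the two example enzymes (HindIII, A^AGCTT, and a ^GATC 4-cutter) are chosen for illustration and are not necessarily the enzymes used in the work described here.

```python
def filled_junction(site, cut_offset):
    """Sequence expected at a Hi-C junction after fill-in and blunt ligation.

    Assumes a palindromic site with a 5' overhang, cut `cut_offset` bases
    into the site on the top strand. After fill-in, the upstream end keeps
    the site minus its last `cut_offset` bases and the downstream end keeps
    the site minus its first `cut_offset` bases; ligating them duplicates
    the overhang in tandem.
    """
    site = site.upper()
    left_end = site[:len(site) - cut_offset]
    right_end = site[cut_offset:]
    return left_end + right_end


def count_junction_reads(reads, junction):
    """Count reads that contain the junction sequence.

    For a palindromic site the junction is its own reverse complement,
    so a single-strand search is sufficient.
    """
    return sum(1 for read in reads if junction in read.upper())


print(filled_junction("AAGCTT", 1))  # HindIII-style site
print(filled_junction("GATC", 0))    # GATC 4-cutter

reads = ["ccAAGCTAGCTTgg", "ttAAGCTTaa"]
print(count_junction_reads(reads, filled_junction("AAGCTT", 1)))
```

In plain 3C no fill-in occurs, so a ligation between two cut ends simply regenerates the original site and cannot be distinguished from an uncut site by sequence alone.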
Comparison of 3C and Hi-C for metagenomics.
| Feature | 3C | Hi-C |
|---|---|---|
| Proximity ligation read rate | Up to 6.5% | 4% |
| Resolution limit | 1–2 kbp | 1–2 kbp (4-cutter) or 15–30 kbp |
| Marked ligation junctions | No | Yes |
| Difficulty of library prep | Hard | Very hard |
| Erroneous association rate | <1% | <1% |
| Requires separate metagenomic library | No | Yes |
Table 1. Differences in the features of metagenomic 3C and Hi-C. The proximity ligation read rate is the fraction of all reads that contain proximity ligation events. For Hi-C the rate varies widely in published data. The resolution limit is dictated by the density of restriction cut sites in the chromosome, which is typically higher with a 4-cutter (3C or Hi-C) than with a 6-cutter (Hi-C only). Marked ligation junctions are created as a by-product of the end-filling in Hi-C and can be identified as a tandem duplication of the overhang sequence in the data. The erroneous association rate is defined as the fraction of read pairs found to associate two different species or strains in mock community experiments.
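The two rates in the table can be estimated from read pairs mapped to the reference genomes of a mock community. The sketch below shows one possible way to do so, not the procedure used to produce the table. Several choices are assumptions: the input tuple layout, the `species_of` lookup, the `min_separation` threshold used to call a same-replicon pair a proximity ligation, the use of proximity-ligation pairs as the denominator of the erroneous association rate, and the use of pairs rather than individual reads.

```python
def summarize_pairs(pairs, species_of, min_separation=1000):
    """Estimate proximity ligation and erroneous association rates.

    pairs: iterable of (ref1, pos1, ref2, pos2), one tuple per mapped pair.
    species_of: dict mapping each reference name to its species label.
    min_separation: hypothetical cutoff (bp) above which mates on the same
        reference are treated as a proximity ligation rather than an
        ordinary short-insert pair. Circular wrap-around is ignored.
    """
    total = proximity = cross_species = 0
    for ref1, pos1, ref2, pos2 in pairs:
        total += 1
        if ref1 != ref2:
            # Mates on different replicons can only arise from ligation.
            proximity += 1
            if species_of[ref1] != species_of[ref2]:
                cross_species += 1
        elif abs(pos1 - pos2) >= min_separation:
            proximity += 1
    return {
        "proximity_ligation_read_rate": proximity / total if total else 0.0,
        "erroneous_association_rate":
            cross_species / proximity if proximity else 0.0,
    }


species_of = {"chrA": "A", "plasmidA": "A", "chrB": "B"}
pairs = [
    ("chrA", 100, "chrA", 300),      # ordinary short-insert pair
    ("chrA", 100, "chrA", 50000),    # long-range, same chromosome
    ("chrA", 100, "plasmidA", 10),   # same species, different replicon
    ("chrA", 100, "chrB", 10),       # cross-species (erroneous)
]
print(summarize_pairs(pairs, species_of))
```

The threshold matters: setting `min_separation` close to the library insert size inflates the apparent proximity ligation rate, so it should be chosen well above the expected fragment length.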
Figure 1. 3C/Hi-C heatmap.
Contact map of chromatin interactions identified by metagenomic 3C. A synthetic community of four bacterial isolates was subjected to metagenomic 3C and the resulting read data mapped back to reference chromosome assemblies. Heat intensity is proportional to the number of read pairs associating the two chromosome regions. In P. aeruginosa and B. subtilis, the two arms of the circular chromosome are colocalized, as reflected in the column of intense heat emanating from the middle of their chromosomes. Erroneous cross-species associations are seen to be rare (deep blue field).
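A contact map of this kind is, in essence, a binned count matrix of read pairs. The following minimal sketch shows the idea for a single coordinate axis; the function name `contact_map`, the bin size, and the assumption that all references have been concatenated into one coordinate system are illustrative choices, not details taken from the figure.

```python
def contact_map(pairs, genome_length, bin_size):
    """Build a symmetric matrix counting read pairs between genome bins.

    pairs: iterable of (pos1, pos2) mapping positions on a single
        (possibly concatenated) coordinate axis.
    genome_length: total length of that axis in bp.
    bin_size: width of each bin in bp (hypothetical parameter).
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            # Mirror off-diagonal contacts so the map is symmetric.
            matrix[j][i] += 1
    return matrix


toy_pairs = [(0, 950), (100, 120), (900, 50)]
for row in contact_map(toy_pairs, genome_length=1000, bin_size=500):
    print(row)
```

Plotting the matrix with a color scale gives the heat intensity described in the caption; cross-species blocks of the matrix would then be expected to stay near zero.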