Xiaolin Wei1,2,3, Zhichao Xu1,2, Guixing Wang4, Jilun Hou4, Xiaopeng Ma1,2, Haijin Liu4, Jiadong Liu5, Bo Chen5, Meizhong Luo5, Bingyan Xie6, Ruiqiang Li7, Jue Ruan8, Xiao Liu1.
Abstract
Applications that use Bacterial Artificial Chromosome (BAC) libraries often require paired-end sequences and knowledge of the physical location of each clone in plates. To obtain this information at high throughput, we generated pBACode vectors: a pool of BAC cloning vectors, each with a pair of random barcodes flanking its cloning site. In a pBACode BAC library, the BAC ends and their linked barcodes can be sequenced in bulk. Barcode pairs are determined by sequencing the empty pBACode vectors, which allows BAC ends to be paired according to their barcodes. For physical clone mapping, the barcodes are used as unique markers for their linked genomic sequence. After multi-dimensional pooling of BAC clones, the barcodes are sequenced and deconvoluted to locate each clone. We generated a pBACode library of 94 464 clones for the flounder Paralichthys olivaceus and obtained paired-end sequences from 95.4% of the clones. Incorporating BAC paired-ends into the genome pre-assembly improved its continuity by over 10-fold. Furthermore, we were able to use the barcodes to map the physical location of each clone in just 50 pools, with up to 11 808 clones per pool. Our physical clone mapping located 90.2% of BAC clones, enabling targeted characterization of chromosomal rearrangements.
Year: 2017 PMID: 27980066 PMCID: PMC5397170 DOI: 10.1093/nar/gkw1261
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Construction of pBACode cloning vectors. (A) A pBACode-1 pool is generated by PCR amplification using pcc2FOS as a template. The 3′ ends of the primers are complementary to the vector sequences flanking the cloning sites so that the whole pcc2FOS vector except for its cloning sites is amplified. The 5′ ends of the primers carry the cloning sites followed by 20-bp random sequences. (B) A pBACode-2 pool is generated by a combination of PCR amplification and subcloning. First, an intermediate vector is constructed based on pcc2FOS such that its lacZ sequence is replaced by an I-SceI site. The second intermediate vector is constructed by inserting the kan selection gene marker into the cloning site of pcc2FOS, and then the lacZ-kan segment is amplified using primers containing random barcodes. The resulting PCR product is subcloned into the first intermediate vector. Kanamycin selection is used to eliminate no-insert vectors.
Figure 2. Barcoding enables high-throughput profiling of a BAC library. (A) The method to generate long and accurate BAC-end sequences is illustrated using multiple copies of one side of a single BAC plasmid. Thick black lines represent the pBACode backbone, while purple lines indicate the genomic insert. Deeper purple fragments are closer to the end of the genomic insert. Different colored blocks represent different barcodes. (B) Barcode pair sequencing. Genomic inserts are removed and the empty vectors are self-circularized to co-ligate barcode pairs. This brings the barcodes close enough together that an Illumina read can cover both barcodes. After the barcode pairs are sequenced, the ends of the genomic inserts in each BAC clone are paired according to their barcoded ends. (C) Physical clone mapping. BAC clones in 96-well plates are pooled in multiple dimensions (by row, by column, and by plate, or by various combinations of plates). After BAC DNA is extracted from each pool, the barcodes are PCR amplified using primers with indexed Illumina adapters and sequenced. Each BAC clone is assigned to a specific location in the BAC library after deconvolution of its barcodes from the pooled sequences. These procedures are applicable to both pBACode-1- and pBACode-2-based genomic libraries.
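The pairing step in panel (B) amounts to a join on barcodes. The following is a minimal sketch of that idea, not the authors' pipeline; the function name `pair_bac_ends` and the dictionary layout are assumptions made for illustration.

```python
def pair_bac_ends(barcode_pairs, left_ends, right_ends):
    """Join left and right BAC-end sequences through their linked barcodes.

    barcode_pairs: iterable of (left_barcode, right_barcode) tuples, as would
        be obtained by sequencing the self-circularized empty vectors.
    left_ends / right_ends: dict mapping a barcode to the BAC-end sequence
        that was sequenced together with it (hypothetical data layout).
    Returns a list of (left_end, right_end) tuples for clones where both
    ends were recovered.
    """
    paired = []
    for left_bc, right_bc in barcode_pairs:
        # A clone can only be paired if both of its barcoded ends were seen.
        if left_bc in left_ends and right_bc in right_ends:
            paired.append((left_ends[left_bc], right_ends[right_bc]))
    return paired


# Toy data: two vectors, but the right end of the second was never sequenced.
pairs = [("AAAC", "GGGT"), ("CCCA", "TTTG")]
left = {"AAAC": "left_end_1", "CCCA": "left_end_2"}
right = {"GGGT": "right_end_1"}
result = pair_bac_ends(pairs, left, right)
```

Clones with a missing barcode on either side simply drop out of the result, which corresponds to the clones that cannot be paired.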
Summary of barcode pairs and barcode-based physical clone mapping
| | | Yeast | Flounder |
|---|---|---|---|
| BAC library size | | 1536 | 94 464 |
| Barcode pair sequencing | Clean readsᵃ (million) | 1.5 | 30 |
| | Raw barcode pairs | 52 243 | 4 315 007 |
| | Consolidated barcode pairs | 1381 (89.9%) | 90 156 (95.4%) |
| Physical clone mapping | Pooling dimension | 3 | 5 |
| | Pools | 36 | 50 |
| | Clones per pool | 96–128 | 7872–11 808 |
| | Clean readsᵃ (million) | 0.78 | 55 |
| | Located BAC clonesᵇ | 1324 (86.2%) | 85 173 (90.2%) |
| | Tested clones | 12 | 42 |
| | Validated clones | 12 | 42 |

ᵃ Reads with QC > 20 were selected and their vector parts were trimmed.
ᵇ Clones whose barcodes mapped to a unique location in the BAC library by deconvolution.
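The "located" criterion in the table (a barcode mapping to a unique location by deconvolution) can be illustrated with a simple row, column and plate design. This is a hedged sketch under that assumption; it does not reproduce the actual pooling scheme used for either library, and the function name `deconvolute` and the pool keys are hypothetical.

```python
from collections import defaultdict


def deconvolute(pool_barcodes):
    """Locate barcodes from multi-dimensional pools.

    pool_barcodes: dict mapping (dimension, index) to the set of barcodes
        observed in that pool, e.g. ("row", "A") -> {"AAAC", ...}.
    Returns a dict mapping each barcode seen in exactly one pool of every
    dimension to its coordinates {dimension: index}. Barcodes that are
    missing from a dimension, or seen in several pools of one dimension,
    are left unlocated.
    """
    seen = defaultdict(lambda: defaultdict(set))
    dimensions = set()
    for (dim, idx), barcodes in pool_barcodes.items():
        dimensions.add(dim)
        for bc in barcodes:
            seen[bc][dim].add(idx)

    located = {}
    for bc, by_dim in seen.items():
        if all(len(by_dim.get(dim, ())) == 1 for dim in dimensions):
            located[bc] = {dim: next(iter(by_dim[dim])) for dim in dimensions}
    return located


# Toy pools: "TTTG" appears only in a column pool, "CCCA" in two row pools.
pools = {
    ("plate", 1): {"AAAC", "CCCA"},
    ("plate", 2): {"GGGT"},
    ("row", "A"): {"AAAC", "GGGT", "CCCA"},
    ("row", "B"): {"CCCA"},
    ("col", 1): {"AAAC"},
    ("col", 2): {"CCCA", "GGGT", "TTTG"},
}
located = deconvolute(pools)
```

In this toy run only the barcodes with one hit per dimension receive coordinates; the other two stay unlocated, analogous to the clones not counted in the "Located BAC clones" row.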
Figure 3. Characterization of BAC ends from the yeast BAC library. (A) The distribution of sequencing errors in filtered raw reads and in BAC-end contigs. BAC-end contigs were derived from local assembly using 0.45 M reads encompassing 1373 barcodes for the left ends, and 0.45 M reads encompassing 1387 barcodes for the right ends. (B) The length distribution of the BAC-end contigs. (C) Barcode detection coverage. Left and right barcodes were sequenced with their linked BAC ends using our BAC-end sequencing protocol (Figure 2A). Barcode pairs were detected using our barcode pair sequencing protocol (Figure 2B). (D) The length distribution of genomic fragments spanned by unambiguous and correct paired BAC-end sequences. (E) Categorization of yeast BAC paired-ends. A BAC clone was considered undetected if either barcode was missing after BAC-end sequencing. A BAC clone was considered unambiguous and correct if its ends aligned to two unique genomic loci <300 kb apart in convergent orientation. An incorrect BAC clone is one whose paired ends aligned to unique genomic loci but on different chromosomes, or not in convergent orientation on the same chromosome, or more than 300 kb apart on the same chromosome.
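The categories in panel (E) can be written as a small decision function. The sketch below is only an illustration of the stated rules: the `Alignment` tuple, the function name `classify_bac` and the strand-based test for convergent orientation are assumptions, and the caption does not say how a span of exactly 300 kb is treated.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """A unique alignment of one BAC end (hypothetical representation)."""
    chrom: str
    pos: int      # leftmost aligned coordinate on the chromosome
    strand: str   # "+" or "-"


def classify_bac(left: Optional[Alignment],
                 right: Optional[Alignment],
                 max_span: int = 300_000) -> str:
    """Classify a BAC clone from its two uniquely aligned ends.

    None stands for an end whose barcode was missing after BAC-end
    sequencing.
    """
    if left is None or right is None:
        return "undetected"
    if left.chrom != right.chrom:
        return "incorrect"
    first, second = sorted((left, right), key=lambda a: a.pos)
    # Assumed definition of convergent: the upstream end reads forward and
    # the downstream end reads back towards it.
    if not (first.strand == "+" and second.strand == "-"):
        return "incorrect"
    # Spans below max_span are accepted; a span of exactly max_span is
    # treated as incorrect here, which the caption leaves unspecified.
    if second.pos - first.pos >= max_span:
        return "incorrect"
    return "unambiguous and correct"


good = classify_bac(Alignment("chrIV", 10_000, "+"), Alignment("chrIV", 130_000, "-"))
far = classify_bac(Alignment("chrIV", 10_000, "+"), Alignment("chrIV", 400_000, "-"))
```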
Figure 4. Improvement of genome assembly using BAC paired-ends (BAC-PEs). Only BAC-PEs that aligned to unique loci in the pre-assembly are used. (A) BAC-PE-based mis-assembly correction. An example of a mis-assembly, defined by discordant BAC ends and tiling paths (see Methods), in a flounder scaffold from the pre-assembly. We deleted the mis-assembly site to split the scaffold. BAC names represent their locations in plates, as revealed by physical clone mapping. MTP: minimal tiling path. Red arcs represent BAC clones whose paired ends are discordant. (B) The computational pipeline. First, the pre-assembly scaffolds were assembled using all unique BAC-PEs by SSPACE. From the unused BAC-PEs, we identified high-fidelity BAC-PEs when multiple BAC-PEs supported a scaffold pair and there were at least two BACs with ends more than a certain distance (30 kb for the flounder assembly) apart in both scaffolds. After assembly using the high-fidelity BAC-PEs, low-fidelity BAC-PEs and related genome information were used for reference-guided merging to generate the final assembly.
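One possible reading of the high-fidelity criterion in panel (B) is sketched below: a scaffold pair qualifies when at least two of its supporting BACs have ends separated by more than the threshold on both scaffolds. This interpretation, the function name `high_fidelity_links` and the input tuple layout are assumptions, not the published implementation.

```python
from collections import defaultdict
from itertools import combinations


def high_fidelity_links(bac_pes, min_sep=30_000):
    """Select scaffold pairs supported by well-separated BAC paired-ends.

    bac_pes: iterable of (bac_id, scaffold_a, pos_a, scaffold_b, pos_b),
        one tuple per BAC whose two ends align to unique loci.
    Returns a dict mapping (scaffold_x, scaffold_y) to the list of
    supporting BAC ids, for pairs where some two BACs have ends more than
    min_sep apart on both scaffolds.
    """
    by_pair = defaultdict(list)
    for bac_id, sa, pa, sb, pb in bac_pes:
        if sa == sb:
            continue  # ends on the same scaffold do not link two scaffolds
        if sa > sb:
            # Normalise so (s1, s2) and (s2, s1) fall in the same group.
            sa, pa, sb, pb = sb, pb, sa, pa
        by_pair[(sa, sb)].append((bac_id, pa, pb))

    supported = {}
    for pair, links in by_pair.items():
        if any(abs(p1 - q1) > min_sep and abs(p2 - q2) > min_sep
               for (_, p1, p2), (_, q1, q2) in combinations(links, 2)):
            supported[pair] = [bac_id for bac_id, _, _ in links]
    return supported


# Toy data: s1/s2 has two well-separated BACs; s3/s4 has two clustered ones.
links = [
    ("b1", "s1", 10_000, "s2", 5_000),
    ("b2", "s2", 45_000, "s1", 50_000),
    ("b3", "s3", 1_000, "s4", 2_000),
    ("b4", "s3", 5_000, "s4", 4_000),
]
supported = high_fidelity_links(links)
```

Requiring separation on both scaffolds guards against several BACs that all derive from the same short repeat or chimeric region supporting a false join.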
Figure 5. BAC-assisted comparative analysis. (A) myod2 upstream regions in the three fish species. The region of tandem repeats in the flounder genome was not fully assembled. Instead, its size and arrangement are based on restriction mapping. (B) Chromosomal regions around dmrt1 in the three species. The exact site of the insertion upstream of the sole dmrt1 is unknown. Genome fragments and scaffolds are drawn to scale unless specified. Genes are drawn without introns and the tapered sides represent their 3′ ends. Dashed lines link orthologous genes.