Stefano Lonardi, Denisa Duma, Matthew Alpert, Francesca Cordero, Marco Beccuti, Prasanna R Bhat, Yonghui Wu, Gianfranco Ciardo, Burair Alsaihati, Yaqin Ma, Steve Wanamaker, Josh Resnik, Serdar Bozdag, Ming-Cheng Luo, Timothy J Close.
Abstract
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and that the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
Year: 2013 PMID: 23592960 PMCID: PMC3617026 DOI: 10.1371/journal.pcbi.1003010
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Proposed sequencing protocol.
(A) obtain a BAC library for the target organism; (B) select gene-enriched BACs from the library (optional); (C) fingerprint BACs and build a physical map; (D) select a minimum tiling path (MTP) from the physical map; (E) pool the MTP BACs according to the shifted transversal design; (F) sequence the DNA in each pool, trim/clean sequenced reads; (G) assign reads to BACs (deconvolution); (H) assemble reads BAC-by-BAC using a short-read assembler.
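Step (E) can be illustrated with a minimal sketch of a shifted transversal design. The parameters are assumptions inferred from the numbers reported on this page (2,197 = 13^3 BACs and 91 = 7 × 13 pools); the function names are hypothetical, and the authors' actual pooling layout may differ in detail. Each BAC index is written in base 13, its digits are treated as polynomial coefficients, and the polynomial is evaluated at each layer number to pick one pool per layer.

```python
Q = 13        # prime base; assumption inferred from 2,197 = 13**3 BACs
LAYERS = 7    # assumption inferred from 91 pools = 7 layers x 13 pools
DIGITS = 3    # base-Q digits needed to index 13**3 BACs


def pools_for_bac(bac_index, q=Q, layers=LAYERS, digits=DIGITS):
    """Return the global pool ids (one per layer) that contain this BAC."""
    # Base-q digits of the BAC index, least significant first.
    coeffs = [(bac_index // q**c) % q for c in range(digits)]
    pools = []
    for layer in range(layers):
        # Evaluate the polynomial whose coefficients are the digits at x = layer.
        pool_in_layer = sum(a * layer**c for c, a in enumerate(coeffs)) % q
        pools.append(layer * q + pool_in_layer)
    return pools


def build_design(n_bacs=Q**DIGITS):
    """Map each pool id to the list of BAC indices it contains."""
    design = {}
    for bac in range(n_bacs):
        for pool in pools_for_bac(bac):
            design.setdefault(pool, []).append(bac)
    return design
```

Under these assumed parameters every BAC lands in 7 pools, every pool holds 169 BACs, and any two distinct BACs share at most two pools, because two distinct polynomials of degree at most 2 over a prime field agree on at most two points. That limited overlap is what makes the pool membership of a read informative about its BAC of origin.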
Figure 2. An illustration of the three cases we are dealing with during the deconvolution process (clones belong to an MTP).
Figure 3. Count distribution for the signatures of all distinct 26-mers [(a) rice synthetic data, (c) barley HV5] and all the reads [(b) rice synthetic data, (d) barley HV5] in the 91 pools of sequencing data.
The x-axis represents the size of the signature and the y-axis is the absolute count.
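To make the notion of a signature concrete, here is a minimal sketch that assumes a k-mer's signature is the set of pools in which that k-mer occurs, and that a BAC is a candidate source when all of its pools appear in the signature. Both assumptions, and every function name below, are illustrative; this is not the authors' deconvolution algorithm, and it ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict


def kmers(read, k):
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_signatures(pooled_reads, k=26):
    """pooled_reads: dict pool_id -> iterable of read strings.

    Returns a dict mapping each distinct k-mer to the set of pools containing it.
    """
    signatures = defaultdict(set)
    for pool_id, reads in pooled_reads.items():
        for read in reads:
            for kmer in kmers(read, k):
                signatures[kmer].add(pool_id)
    return signatures


def signature_size_histogram(signatures):
    """Count how many distinct k-mers have a signature of each size."""
    return Counter(len(pools) for pools in signatures.values())


def candidate_bacs(signature, bac_to_pools):
    """Return BACs whose whole pool set is contained in the signature."""
    return sorted(b for b, pools in bac_to_pools.items() if pools <= signature)


# Toy data with k=4 so the example stays readable.
toy = {0: ["ACGTA"], 1: ["ACGTT"], 2: ["CGTA"]}
toy_signatures = kmer_signatures(toy, k=4)
toy_histogram = signature_size_histogram(toy_signatures)
```

In the toy data, `ACGT` occurs in pools 0 and 1, `CGTA` in pools 0 and 2, and `CGTT` only in pool 1, so the histogram has two k-mers with signature size 2 and one with size 1. A histogram of this kind, computed over real pools, is what the panels of Figure 3 plot.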
Summary of the statistics of the various assemblies obtained using Velvet (rows 1–3, 5–9) and SOAPdenovo (rows 4, 10, 11).
| Dataset | Target size (Mb) | Coverage | % reads used | N50 (bp) | % Sum |
|---|---|---|---|---|---|
| Rice – 1 BAC (perfect deconvolution) | 0.151 | 56x | 82.7% | 132,865 | 98.7% |
| Rice – 1 BAC (H | 0.151 | 87x | 82.3% | 47,551 | 90.7% |
| Rice – 169 BACs (no deconvolution) | 26 | 56x | 83.2% | 4,236 | 73.1% |
| Rice – 2,197 BACs ( | 332 | 56x | 5.9% | 1,148 | 30.6% |
| Barley HV3 – 1 BAC (H | 0.116 | 431x | 83.6% | 8,190 | 89.8% |
| Barley HV4 – 1 BAC (H | 0.125 | 134x | 86.0% | 5,883 | 81.5% |
| Barley HV5 – 1 BAC (H | 0.129 | 137x | 87.6% | 7,210 | 87.8% |
| Barley HV6 – 1 BAC (H | 0.129 | 72x | 83.2% | 6,032 | 75.9% |
| Barley HV5 – 169 BACs (no deconvolution) | 22 | 26x | 67.1% | 4,270 | 69.5% |
| Barley HV5 – 2,197 BACs ( | 286 | 180x | 25.3% | 3,845 | 56.6% |
| Barley – whole genome ( | 5,300 | 31x | 13.3% | 2,857 | 30.5% |
“% Sum” is the sum of all contig sizes as a percentage of the target size;
average over 2,197 assemblies;
average over 91 assemblies;
Velvet reports the number of reads used in the assembly but SOAPdenovo does not: for these assemblies we used Bowtie (allowing one mismatch) to align reads to the contigs.
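The “% Sum” definition in the notes above amounts to a one-line calculation. The sketch below uses a hypothetical helper name and made-up contig lengths purely for illustration, and assumes the target size is expressed in megabases.

```python
def percent_sum(contig_lengths_bp, target_size_mb):
    """Sum of all contig sizes as a percentage of the target size."""
    target_bp = target_size_mb * 1_000_000  # assumption: target size given in Mb
    return 100.0 * sum(contig_lengths_bp) / target_bp


# Illustrative values only: three contigs against a 0.151 Mb target.
example = percent_sum([50_000, 40_000, 30_000], 0.151)
```

Here the three contigs total 120,000 bp against a 151,000 bp target, so `example` is roughly 79.5.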