Thomas Wicker, Edith Schlagenhauf, Andreas Graner, Timothy J Close, Beat Keller, Nils Stein.
Abstract
BACKGROUND: During the past decade, Sanger sequencing has been used to completely sequence hundreds of microbial and a few higher eukaryote genomes. In recent years, a number of alternative technologies have become available, among them adaptations of the pyrosequencing procedure (i.e. "454 sequencing"), which promise an approximately 100-fold increase in throughput over Sanger technology. Such an advancement is needed to make large and complex genomes more amenable to full genome sequencing at affordable costs. Although several studies have demonstrated its potential usefulness for sequencing small and compact microbial genomes, it was unclear how the new technology would perform in large and highly repetitive genomes such as those of wheat or barley.
Year: 2006 PMID: 17067373 PMCID: PMC1633745 DOI: 10.1186/1471-2164-7-275
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Comparison of ABI-Sanger and 454 sequencing procedures
| Step | ABI-Sanger | 454 | Time required (a) |
| Isolation of BAC DNA | x | x | 1 day |
| Mechanical shearing | x | x | 2 h |
| Cloning | x | | 4 h |
| Clone picking | x | | 2 h |
| Plasmid DNA extraction | x | | 20 h |
| Reactions on thermocycler | x | | 36 h |
| Clean up reaction products | x | | 2 h |
| ABI 3730xl sequencer run | x | | 24 h |
| 454 sequencing library | | x | 4 h (b) |
| Amplification in PCR microreactors | | x | 6 h (b) |
| GS 20 sequencing run | | x | 4 h (b, c) |
| Assembly of raw sequences | x | x | days to weeks |
(a) The procedures and estimated time requirements describe the process for sequencing a BAC clone with a size of 100 kb. For ABI-Sanger, numbers are calculated to reach an approximately 10-fold coverage.
(b) According to [5].
(c) One 454 GS20 run produces ~20 Mb, approximately 10 times more than required for sufficient coverage (20 ×) of a 100 kb BAC clone.
Sequence coverage of four BAC clones from two independent sequencing experiments using 454 sequencing technology
| BAC | size (bp) | total reads (a) | avg. size (bp) (b) | contigs | contig size (bp) | coverage |
| 773K14 | 113,510 (c) | 59,126 (e) | 101 | 65 | 109,444 | 52.7 × |
| | | 6,108 (f) | 104 | 210 | 106,476 | 5.6 × |
| 519J4 | 102,554 (c) | 32,564 (e) | 102 | 94 | 75,526 | 32.7 × |
| | | 4,917 (f) | 102 | 209 | 72,634 | 4.9 × |
| 604D5 | 110,000 (d) | 70,510 (e) | 103 | 97 | 101,195 | 66 × |
| | | 9,683 (f) | 103 | 137 | 102,468 | 9.1 × |
| 509D2 | 120,000 (d) | 19,208 (e) | 105 | 80 | 100,062 | 16.8 × |
| | | 3,801 (f) | 104 | 302 | 66,647 | 3.3 × |
(a) Sequences containing BAC vector or E. coli were removed.
(b) Average read length of 454 sequences.
(c) Previously published, exact size is known.
(d) Estimated by gel electrophoresis.
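The coverage column can be approximately reproduced from the other columns. The sketch below assumes the usual definition of fold coverage (number of reads times mean read length, divided by clone size); the source does not state how its values were computed, and the function name `fold_coverage` is purely illustrative. Small deviations from the reported values are expected because the mean read lengths in the table are rounded.

```python
# Values copied from the table above:
# (BAC, clone size in bp, reads after filtering, mean read length in bp, reported coverage)
TABLE_ROWS = [
    ("773K14", 113_510, 59_126, 101, 52.7),
    ("773K14", 113_510, 6_108, 104, 5.6),
    ("519J4", 102_554, 32_564, 102, 32.7),
    ("519J4", 102_554, 4_917, 102, 4.9),
    ("604D5", 110_000, 70_510, 103, 66.0),
    ("604D5", 110_000, 9_683, 103, 9.1),
    ("509D2", 120_000, 19_208, 105, 16.8),
    ("509D2", 120_000, 3_801, 104, 3.3),
]


def fold_coverage(n_reads, mean_read_len, clone_size):
    """Assumed definition: total sequenced bases divided by clone size."""
    return n_reads * mean_read_len / clone_size


for bac, size, reads, read_len, reported in TABLE_ROWS:
    estimate = fold_coverage(reads, read_len, size)
    print(f"{bac}: estimated {estimate:.1f}x, reported {reported}x")
```

For the two BACs whose sizes were estimated by gel electrophoresis (604D5 and 509D2), the result is only as accurate as that size estimate.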
Figure 1. Coverage of four BAC clones with sequence contigs assembled from sequence reads produced by 454 sequencing technology. a. Relationship between coverage and number of sequence contigs from two independent sequencing experiments 1 (blue) and 2 (red) for all four BACs. Because the BACs have different sizes, the number of contigs is normalised. b. Numbers of sequence contigs in different size ranges from experiment 1. For all four BAC clones, assembly of 454 sequences resulted in a few large and many small sequence contigs. c. Percentage of the total size of the BACs covered by sequence contigs of different size ranges from experiment 1. The cumulative size of all contigs was in all four cases smaller than the actual size of the BAC clone (percentage in parentheses underneath the BAC name). This is due to pooling of repetitive sequences into consensus contigs. For BAC 604D5 and 509D2, the percentage was calculated based on size estimates from agarose gels.
Figure 2. Comparison of results from 454 sequencing with ABI-Sanger sequencing. a. Map of the previously published BAC 519J4. Genes are depicted by grey boxes with transcriptional orientations indicated by arrows. Transposable elements are depicted as coloured boxes with LTRs indicated as shaded areas. Nested transposable elements are raised above the ones into which they have inserted. Regions covered by 454 sequence contigs are depicted as blue and purple bars underneath the map. Note that single copy sequences are covered well whereas multicopy sequences such as transposons or tandem repeats contain a large number of gaps. Sequence contigs used for comparison of ABI-Sanger and 454 sequencing results are depicted in purple. b. Detailed map of the region of Gap1. Three tandem repeats were pooled into the consensus contig c68. c. Multiple sequence alignment of the three repeat units shown in (b) and the resulting consensus contig. Differences between repeat units are highlighted. d. Sequence coverage provided by 454 sequencing (blue) and ABI-Sanger sequencing (black). Red lines indicate simulated coverages with the same number of sequences assuming a purely random distribution. Red arrows indicate gaps in the ABI-Sanger coverage. Grey lines indicate coverage with 454 sequences from an independent sequencing experiment with fewer reads. The region of clearly higher coverage with 454 sequences suggests the presence of a duplicated sequence that could not be resolved with ABI-Sanger sequencing. e. Map of BAC 773K14 with aligned 454 sequence contigs and coverage with individual 454 sequences (colours as in d).
Differences between ABI-Sanger sequences and sequence contigs assembled from 454 sequences in poly A or T homopolymers in 83,299 bp of compared sequence
| Motif (a) | total occurrences (b) | differences | error rate |
| A5 | 242 | 8 | 3.3% |
| A6 | 90 | 10 | 11% |
| A7 | 33 | 11 | 33% |
| A8 | 16 | 8 | 50% |
| A9 | 7 | 3 | 43% |
| A10 | 2 | 0 | 0 |
| A11 | 1 | 0 | 0 |
| A13 | 1 | 0 | 0 |
(a) Also includes the complementary poly-T motifs.
(b) Number of motifs in the compared 89 kb region.
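The error-rate column is the number of differences divided by the number of occurrences of each motif. A minimal sketch that recomputes it from the counts in the table is given below; the names `HOMOPOLYMER_COUNTS` and `error_rate_percent` are illustrative and do not come from the source.

```python
# (homopolymer length, total occurrences, differences), copied from the table above
HOMOPOLYMER_COUNTS = [
    (5, 242, 8),
    (6, 90, 10),
    (7, 33, 11),
    (8, 16, 8),
    (9, 7, 3),
    (10, 2, 0),
    (11, 1, 0),
    (13, 1, 0),
]


def error_rate_percent(differences, occurrences):
    """Share of motif occurrences that differ between the two methods, in percent."""
    return 100.0 * differences / occurrences


for length, occurrences, differences in HOMOPOLYMER_COUNTS:
    rate = error_rate_percent(differences, occurrences)
    print(f"A{length}: {differences}/{occurrences} = {rate:.1f}%")
```

Motifs of ten or more bases occur only once or twice in the compared region, so their rates of 0 rest on very few observations.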
Results from hybrid assemblies of BAC 519J4.
| ABI-Sanger reads | total contigs (a) |
| 50 (b) | 47, 48, 44, 47, 46 |
| 100 (b) | 43, 40, 33, 35, 40 |
| 100 (c) | 59, 47, 48, 52, 55 |
| 200 (b) | 40, 31, 35, 36, 40 |
The 94 sequence contigs provided by 454 Life Sciences Corp. (454 sequence contigs) were combined with different numbers of ABI-Sanger reads randomly selected from a set of 1,035 reads. Each assembly was repeated 5 times with different randomly selected sets. Note that the assignment of Phred scale quality values of 40 to the bases in the 454 sequence contigs decreased the number of false collapses considerably while only slightly increasing the overall number of contigs.
(a) Number of contigs resulting from 5 repetitions of the assembly with 5 different randomly selected sets of ABI-Sanger sequences.
(b) Bases in 454 sequence contigs were artificially assigned Phred scale quality values of 20.
(c) Bases in 454 sequence contigs were artificially assigned Phred scale quality values of 40.
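Two ingredients of this experiment can be sketched in a few lines: what the assigned Phred values mean, and how repeated random subsets of reads might be drawn. The Phred relation P = 10^(-Q/10) is the standard definition. Everything else is an assumption made for illustration: the source does not describe how the random selection was implemented, the integer read identifiers are hypothetical stand-ins for the 1,035 ABI-Sanger reads, and no assembler is invoked.

```python
import random


def phred_to_error_probability(q):
    """Standard Phred definition: probability that a base call is wrong."""
    return 10 ** (-q / 10)


def draw_read_subsets(n_reads_total, subset_size, repetitions, seed=None):
    """Draw `repetitions` independent random subsets of read indices.

    Reads are represented by hypothetical integer indices 0..n_reads_total-1;
    each subset is sampled without replacement.
    """
    rng = random.Random(seed)
    read_ids = range(n_reads_total)
    return [sorted(rng.sample(read_ids, subset_size)) for _ in range(repetitions)]


for q in (20, 40):
    print(f"Q{q}: error probability {phred_to_error_probability(q):g}")

# Five subsets of 100 reads out of 1,035, as in the 100-read rows of the table.
subsets = draw_read_subsets(n_reads_total=1035, subset_size=100, repetitions=5, seed=1)
print([len(s) for s in subsets])
```

Under this definition, a value of 40 declares the 454 consensus bases 100 times less likely to be wrong than a value of 20. That is consistent with the reported effect that the assembler collapses fewer non-identical sequences at the higher setting.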
Genes identified on the two newly sequenced BAC clones 604D5 and 509D2.
| Index (a) | BAC | Rice homolog (b) | Description |
| 1 | 509D2 | Os01g70940 | Potassium uptake protein |
| 2 | 509D2 | Os01g70950 | Hypothetical protein |
| 3 | 509D2 | Os08g07830 | Hypothetical protein |
| 4 | 509D2 | Os05g08460 | Putative F-box domain |
| 5 | 509D2 | Os05g01370 | Polygalacturonase-inhibiting protein |
| 1 | 604D5 | Os05g41170 | SET domain protein 105 |
| 2 | 604D5 | Os05g41180 | Proteasome subunit alpha |
| 3 | 604D5 | Os05g41190 | Expressed protein |
| 4 | 604D5 | Os05g41200 | Calmodulin |
| 5 | 604D5 | Os05g41210 | Calmodulin |
| 6 | 604D5 | Os05g41220 | Similar to GAL83 protein |
(a) Numbers correspond to gene numbers in Figure 3.
(b) Identified by BLASTN.
Figure 3. Production of working drafts of BAC sequences from assemblies of 454 sequences. The relative order of sequence contigs can be inferred through (a) identification of target site duplications (TSD) of transposable element sequences located at the edges of contigs or (b) sequence alignment with a known reference transposable element. The latter only works reliably for elements that occur only once on the BAC analysed. c. For BAC 604D5, information from the order of genes in the orthologous region of the rice genome was used as well as the structure and organisation of transposable elements. d. Five contigs from BAC 509D2 could be arranged in two supercontigs whose linear orientation to each other is unknown. Regions covered by 454 sequencing contigs are indicated as grey bars underneath the maps in c. and d. Genes are depicted as black and transposable elements as white boxes. Transcriptional orientations of genes are indicated by arrows. TSD used to infer contig order are indicated. Gaps that were closed through alignment to reference transposon sequences are indicated by a curly bracket. Gaps that could be closed with low-quality 454 sequences are indicated by upward arrows. Question marks indicate a gap of unknown size between contigs. Numbers above genes correspond to gene descriptions in Table 5.