Sisi Zhou, Yonggui Fu, Jie Li, Lingyu He, Xingsheng Cai, Qingyu Yan, Xingqiang Rao, Shengfeng Huang, Guang Li, Yiquan Wang, Anlong Xu.
Abstract
Second-generation sequencing has been widely used to sequence whole genomes. Although various paired-end sequencing methods have been developed to construct long scaffolds from contigs derived from shotgun sequencing, classical paired-end sequencing of Bacterial Artificial Chromosome (BAC) or fosmid libraries by the Sanger method still plays an important role in genome assembly. However, sequencing libraries with the Sanger method is expensive and time-consuming. Here we report a new strategy to sequence the paired ends of genomic libraries with parallel pyrosequencing, using a Chinese amphioxus (Branchiostoma belcheri) BAC library as an example. In total, approximately 12,670 non-redundant paired-end sequences were generated. By mapping them to the primary scaffolds of Chinese amphioxus, we obtained 413 ultra-scaffolds from 1,182 primary scaffolds, and the N50 scaffold length increased by approximately 55 kb, an improvement of about 10%. We provide a universal and cost-effective method for sequencing the ultra-long paired ends of genomic libraries. This method can be easily implemented on other second-generation sequencing platforms.
Year: 2012 PMID: 23284958 PMCID: PMC3527410 DOI: 10.1371/journal.pone.0052257
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flow chart illustrating HTS-PEG.
The plasmids of the genomic library were sheared to yield fragments 100–1500 bp larger than the vector (blue). Then the EcoR I sites were methylated, and hairpin adaptors (red), which contain non-methylated EcoR I sites, were ligated to the fragment ends. After EcoR I digestion and circularization, the paired ends can be amplified with primers complementary to the ends of the vector. PCR products of the desired size can then be sequenced using a high-throughput sequencing method.
Figure 2. The workflow of data processing.
The raw data were first filtered for the hairpin adaptor; reads without hairpin adaptor sequences were discarded. The vector sequences of the remaining reads were then trimmed. The reads were divided into left and right ends, and reads with either end shorter than 40 nt were discarded. The filtered paired-end reads were clustered and mapped to the genome.
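The filtering steps in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the hairpin adaptor sits between the two insert ends in each read and that vector flanks can be removed by exact matching, whereas real pyrosequencing data would need error-tolerant matching. The function names and the `adaptor`, `vec_left` and `vec_right` parameters are hypothetical; only the 40 nt threshold comes from the caption.

```python
from typing import Optional, Tuple

MIN_END_LEN = 40  # minimum end length (nt), taken from the caption


def trim_vector(read: str, vec_left: str, vec_right: str) -> str:
    """Strip exact-match vector flanks from the start and end of a read."""
    if vec_left and read.startswith(vec_left):
        read = read[len(vec_left):]
    if vec_right and read.endswith(vec_right):
        read = read[:-len(vec_right)]
    return read


def split_paired_ends(
    read: str,
    adaptor: str,
    vec_left: str = "",
    vec_right: str = "",
    min_len: int = MIN_END_LEN,
) -> Optional[Tuple[str, str]]:
    """Return (left_end, right_end), or None if the read is discarded."""
    # Step 1: discard reads that lack the hairpin adaptor.
    if adaptor not in read:
        return None
    # Step 2: trim vector sequence (assumed to flank the read).
    insert = trim_vector(read, vec_left, vec_right)
    # Step 3: split into left and right ends at the first adaptor match.
    left, sep, right = insert.partition(adaptor)
    if not sep:
        return None
    # Step 4: discard the pair if either end is shorter than min_len.
    if len(left) < min_len or len(right) < min_len:
        return None
    return left, right
```

Pairs that survive this filter would then go on to the clustering and mapping steps, which are not sketched here.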
Summary of the HTS-PEG data.
| | Number |
| --- | --- |
| Total raw data reads | 1,234,870 |
| Reads with hairpin adaptor sequences | 634,086 |
| Read pairs with both ends larger than 40 nt | 443,353 |
| Full length read pairs with both ends larger than 40 nt | 404,912 |
| Non-redundant read pair clusters | 12,670 |
| Non-redundant full length read pair clusters | 9,409 |
| Read pairs mapped to multiple genomic locations | 111,570 |
| Read pairs mapped to a unique genomic location | 178,399 |
| Read pairs mapped to the same scaffold | 101,914 |
| Read pairs mapped to different scaffolds | 188,055 |
| Non-redundant read pairs mapped to a unique genomic location | 4,569 |
| Non-redundant read pairs mapped to the same scaffold | 2,496 |
| Non-redundant read pairs mapped to different scaffolds | 4,979 |
| Chimeric read pairs with wrong orientation | 6,986 |
Figure 3. Span distribution of the non-redundant read pairs mapped to the Chinese amphioxus genome.
Figure 4. Increasing tendency of non-redundant read pairs.
Subsets of 50,000, 100,000, 150,000, 200,000, 250,000, 300,000 and 350,000 read pairs were randomly selected and clustered, with three replicates at each sample size. The blue line shows the observed number of clusters, the red line shows its trend, and the green line shows the increase in the number of clusters.
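The subsampling experiment behind this figure can be approximated with a short Python sketch. It assumes each read pair already carries a cluster label, which is a simplification: the caption says the sampled pairs were clustered, but does not give the clustering method. The function names, the `seed` parameter and the labelled input are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def rarefaction(
    cluster_ids: Sequence[str],
    sizes: Sequence[int],
    replicates: int = 3,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Count distinct clusters in random subsamples of each size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(cluster_ids)
    curve: Dict[int, List[int]] = {}
    for n in sizes:
        if n > len(pool):
            raise ValueError(f"sample size {n} exceeds {len(pool)} read pairs")
        # Sample without replacement, then count the clusters observed.
        curve[n] = [len(set(rng.sample(pool, n))) for _ in range(replicates)]
    return curve


def mean_increments(curve: Dict[int, List[int]]) -> List[float]:
    """Change in mean cluster count between consecutive sample sizes."""
    means = [mean(curve[n]) for n in sorted(curve)]
    return [later - earlier for earlier, later in zip(means, means[1:])]
```

Calling `rarefaction(labels, range(50_000, 350_001, 50_000))` would reproduce the seven sample sizes in the caption. Increments that shrink towards zero would suggest that further sequencing yields few new clusters.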
Improved scaffold length of the Chinese amphioxus genome using the paired BAC ends.
| | Raw assembly of Chinese amphioxus genome | Using paired BAC ends |
| --- | --- | --- |
| Number of scaffolds | 11,522 | 10,753 |
| Span (bp) | 504,005,601 | 504,082,501 |
| N50 scaffold (bp) | 541,615 | 596,166 |
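The figures in this table can be checked against the abstract with a little arithmetic: the N50 rises by 54,551 bp (about 10.1%), and the scaffold count drops by 769, which matches 1,182 primary scaffolds collapsing into 413 ultra-scaffolds. The Python sketch below shows that arithmetic alongside one common definition of N50 (the length of the scaffold at which the cumulative length, taken from longest to shortest, first reaches half the assembly span). The `n50` helper is illustrative and is not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover half the total span."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


# Values copied from the table above.
RAW_N50 = 541_615
IMPROVED_N50 = 596_166
RAW_SCAFFOLDS = 11_522
IMPROVED_SCAFFOLDS = 10_753

gain_bp = IMPROVED_N50 - RAW_N50                      # N50 gain in bp
gain_pct = 100 * gain_bp / RAW_N50                    # N50 gain in percent
scaffolds_merged = RAW_SCAFFOLDS - IMPROVED_SCAFFOLDS  # net scaffolds removed
```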