Patrick Ng, Jack J S Tan, Hong Sain Ooi, Yen Ling Lee, Kuo Ping Chiu, Melissa J Fullwood, Kandhadayar G Srinivasan, Clotilde Perbost, Lei Du, Wing-Kin Sung, Chia-Lin Wei, Yijun Ruan.
Abstract
The paired-end ditagging (PET) technique has been shown to be efficient and accurate for large-scale transcriptome and genome analysis. However, as with other DNA tag-based sequencing strategies, it is constrained by the efficiency of current Sanger technology. A recently developed multiplex sequencing method (454 sequencing) that uses picolitre-scale reactions has achieved a remarkable advance in efficiency, but it suffers from short read lengths and a lack of paired-end information. To further enhance the efficiency of PET analysis while overcoming these drawbacks of the new sequencing method, we coupled multiplex sequencing with paired-end ditagging (MS-PET), using modified PET procedures to simultaneously sequence 200,000 to 300,000 dimerized PET (diPET) templates, with an output of nearly half a million PET sequences in a single 4 h machine run. We demonstrate the utility and robustness of MS-PET by analyzing the transcriptome of human breast carcinoma cells and by mapping p53 binding sites in the genome of human colorectal carcinoma cells. This combined sequencing strategy achieved an approximately 100-fold increase in efficiency over the current standard for PET analysis, and it also enables the short-read-length multiplex sequencing procedure to acquire paired-end information from large DNA fragments.
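The abstract's key point is that the two signatures of one PET, once mapped, bracket the original DNA fragment even though each read is short. The Python sketch below shows one way such a concordance check could work. It is not the authors' mapping procedure; the `Hit` record, the `concordant_pairs` function and the `max_span` cutoff are hypothetical choices made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One genomic alignment of a signature tag (hypothetical representation)."""
    chrom: str
    pos: int      # 0-based leftmost coordinate of the tag alignment
    strand: str   # "+" or "-"


def concordant_pairs(five_hits, three_hits, max_span=1_000_000):
    """Return (5' hit, 3' hit, span) triples consistent with one fragment.

    A pair is kept when both tags lie on the same chromosome and strand, the
    3' tag lies downstream of the 5' tag in fragment orientation, and the
    distance between the two tag coordinates does not exceed max_span
    (an assumed cutoff, not a value from the paper).
    """
    pairs = []
    for f in five_hits:
        for t in three_hits:
            if f.chrom != t.chrom or f.strand != t.strand:
                continue
            # On the minus strand, "downstream" means lower coordinates.
            span = t.pos - f.pos if f.strand == "+" else f.pos - t.pos
            if 0 <= span <= max_span:
                pairs.append((f, t, span))
    return pairs
```

Restricting hits to concordant pairs is what lets a PET resolve a tag that maps to several loci on its own: only the combination that satisfies the chromosome, strand, order and span constraints survives.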
Year: 2006 PMID: 16840528 PMCID: PMC1524903 DOI: 10.1093/nar/gkl444
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic overview of the MS-PET sequencing strategy. (A) Outline of the procedure for constructing diPETs, which were then subjected to multiplex sequencing. (B) Structural details of a diPET. The numbers 5 and 3 represent bases within the 5′ and 3′ signatures, respectively, of each PET component. The orientation of each cDNA is indicated by the ‘AA’ remaining after poly(A) tail removal.
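Panel (B) describes each diPET as two PETs, each carrying a 5′ and a 3′ signature. A minimal parsing sketch follows. It assumes a known spacer sequence between the two PETs, fixed signature lengths (the 18-base defaults are placeholders, not values taken from this record), and both PETs read in the same orientation. The function name `split_dipet` and all of its parameters are hypothetical.

```python
def split_dipet(read, spacer, tag5_len=18, tag3_len=18):
    """Split a diPET read into two PETs, each as (5' signature, 3' signature).

    Assumed layout (hypothetical): PET1 + spacer + PET2, where each PET is a
    5' signature of tag5_len bases followed by a 3' signature of tag3_len
    bases. Returns None if the read does not fit this layout.
    """
    idx = read.find(spacer)
    if idx < 0:
        return None  # no spacer found: cannot locate the PET boundary
    pets = [read[:idx], read[idx + len(spacer):]]
    expected = tag5_len + tag3_len
    if any(len(p) != expected for p in pets):
        return None  # one PET is too short or too long for the assumed layout
    return [(p[:tag5_len], p[tag5_len:]) for p in pets]
```

Because a short spacer can also occur by chance inside a signature, a real pipeline would need a stricter layout check than a single `str.find`, and it would have to handle a second PET read in the opposite orientation.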
Mapping of PETs to the human genome
| Category | PETs | Percentage |
|---|---|---|
| Total PETs in GIS-PET library analyzed by MS-PET | 313 983 | 100.00% |
| Initial mapped PETs | 157 697 | 50.22% |
| PETs mapped to a single locus | 136 612 | 86.63% (of mapped PETs) |
| PETs mapped to multiple loci | 21 085 | 13.37% (of mapped PETs) |
| Initial unmapped PETs | 156 286 | 49.78% |
| Single-locus PETs recovered after homopolymer error analysis | 56 194<sup>a</sup> | 17.89% |
| Homopolymer over-call errors (+1 base) | 35 523 | 11.31% |
| Homopolymer under-call errors (−1 base) | 27 047 | 8.61% |
| Final mapped PETs | 213 891 | 68.12% |
| PETs mapped to a single locus | 192 806 | 90.14% (of mapped PETs) |
| PETs mapped to multiple loci | 21 085 | 9.86% (of mapped PETs) |
| Final unmapped PETs | 100 092 | 31.88% |
<sup>a</sup>Because the same PET sequence can contain both over-call (+1) and under-call (−1) errors, the two categories of recovered PETs are not mutually exclusive, so the total number of recovered PETs is not the simple sum of the two rows. See Supplementary Data for details of the error-distribution analysis.
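The homopolymer rows reflect a known weakness of pyrosequencing: the length of a run of identical bases can be miscalled by one. One way to attempt recovery, sketched below under the assumption that each unmapped read carries at most one such error, is to generate every sequence that differs from the read by one base in a single run and then re-map those candidates. This is an illustrative enumeration, not the procedure described in the Supplementary Data; `homopolymer_corrections` is a hypothetical name.

```python
from itertools import groupby


def homopolymer_corrections(read):
    """Candidate true sequences for a read with one homopolymer miscall.

    Returns two sets:
      fix_over  - one run shortened by one base (undoes a +1 over-call)
      fix_under - one run lengthened by one base (undoes a -1 under-call)
    Runs that are absent from the read entirely are not considered.
    """
    # Run-length encode the read, e.g. "AAC" -> [("A", 2), ("C", 1)].
    runs = [(base, len(list(group))) for base, group in groupby(read)]
    fix_over, fix_under = set(), set()
    for i, (base, n) in enumerate(runs):
        prefix = "".join(b * k for b, k in runs[:i])
        suffix = "".join(b * k for b, k in runs[i + 1:])
        fix_over.add(prefix + base * (n - 1) + suffix)
        fix_under.add(prefix + base * (n + 1) + suffix)
    return fix_over, fix_under
```

The footnote's caveat can also be checked against the table's own numbers. The two error rows sum to 35 523 + 27 047 = 62 570, which exceeds the 56 194 recovered PETs by 6 376; assuming every recovered PET has at least one of the two error types, inclusion-exclusion implies that 6 376 PETs carried both. The totals are otherwise consistent: 157 697 initially mapped PETs plus 56 194 recovered PETs gives the 213 891 final mapped PETs.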
Mapping of PETs to known gene transcripts
| Category | Top 20 PET clusters | Percentage (%) | Known single-locus PETs | Percentage (%) |
|---|---|---|---|---|
| Total PET sequences | 10 387 | 100.00 | 125 986 | 100.00 |
| Matched to known transcripts | 10 083 | 97.07 | 93 325 | 74.08 |
| Novel extended 5′ termini | 64 | 0.62 | 4742 | 3.76 |
| Novel extended 3′ termini | 24 | 0.23 | 2956 | 2.35 |
| Novel truncated 5′ termini | 36 | 0.35 | 3528 | 2.80 |
| Novel truncated 3′ termini | 169 | 1.63 | 5543 | 4.40 |
| Unclassified | 11 | 0.11 | 15 892 | 12.61 |
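The categories in this table compare where a PET's 5′ and 3′ signatures map with the annotated ends of a known transcript. The sketch below encodes one plausible rule: both ends within a tolerance count as a match, a single discordant end is labelled as an extension or truncation of that end, and anything else is unclassified. The function `classify_pet`, the tolerance `tol` and the single-label rule are assumptions for illustration; the criteria used in the study may differ.

```python
def classify_pet(pet_start, pet_end, tx_start, tx_end, strand, tol=0):
    """Label a PET span relative to a known transcript span.

    Coordinates are genomic (start <= end). On the "-" strand the 5' end is
    the larger coordinate. tol is an assumed tolerance in bases.
    """
    # Express both spans as (5' end, 3' end) in transcript orientation.
    if strand == "+":
        pet5, pet3, tx5, tx3, sign = pet_start, pet_end, tx_start, tx_end, 1
    else:
        pet5, pet3, tx5, tx3, sign = pet_end, pet_start, tx_end, tx_start, -1
    # Positive offset: the PET end lies beyond the annotated end (extension).
    off5 = sign * (tx5 - pet5)
    off3 = sign * (pet3 - tx3)
    five_ok, three_ok = abs(off5) <= tol, abs(off3) <= tol
    if five_ok and three_ok:
        return "matched"
    if three_ok:
        return "extended 5'" if off5 > 0 else "truncated 5'"
    if five_ok:
        return "extended 3'" if off3 > 0 else "truncated 3'"
    return "unclassified"
```

Folding the strand into a sign keeps the extension/truncation logic identical on both strands, so only the coordinate swap depends on orientation.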
Figure 2. Validation of MS-PET-identified transcripts by quantitative real-time RT–PCR. Columns represent mean Ct values (normalized against that of Actin) ± SD for each of 11 amplicons (n = 3). PET counts for each candidate transcript are shown in italics above each column. IFGBP (interferon-gamma binding protein); SFXN4 (sideroflexin 4); TRUB2 (TruB pseudouridine synthase homolog 2); AARS (alanyl-tRNA synthetase); BRCA1 (breast cancer 1, early onset); PP5 (protein phosphatase 5, catalytic subunit); CBX3 (chromobox protein homolog 3); SSR2 (signal sequence receptor); SET (SET translocation); CTSD (cathepsin D, lysosomal aspartyl peptidase); TFF1 (trefoil factor 1); Actin (reference; PET count = 33).
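The caption reports Ct values normalized against Actin. A common normalization, assumed here because the caption does not spell it out, is to subtract the Actin Ct from the target Ct within each replicate and summarize the differences as mean ± sample SD; a lower value means fewer cycles to reach threshold and therefore more template. The function name `delta_ct` and the example values below are hypothetical.

```python
from statistics import mean, stdev


def delta_ct(target_cts, actin_cts):
    """Mean and sample SD of Ct(target) - Ct(actin) over paired replicates."""
    if len(target_cts) != len(actin_cts):
        raise ValueError("replicate lists must be the same length")
    # Normalize each replicate against its own Actin reading.
    deltas = [t - a for t, a in zip(target_cts, actin_cts)]
    return mean(deltas), stdev(deltas)  # stdev needs at least two replicates
```

With made-up triplicates, `delta_ct([25.0, 25.5, 26.0], [18.0, 18.0, 18.0])` gives a mean of 7.5 and an SD of 0.5, the kind of per-amplicon summary a column in the figure would show.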
Categorization of transcripts identified by MS-PET sequencing analysis
| Category | PET clusters | PETs | PET counts |
|---|---|---|---|
| Known genes | 15 163 | 125 986 | 213 026 |
| ESTs | 3405 | 5504 | 7020 |
| Gene prediction | 2202 | 3168 | 4093 |
| Novel genes | 1476 | 1954 | 2240 |
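The table distinguishes PET clusters from individual PETs. A common way to form clusters is to merge PETs whose mapped spans overlap on the same chromosome and strand; the sketch below does this with a sort-and-sweep over half-open intervals. The overlap criterion and the `cluster_pets` function are assumptions for illustration, not the clustering rule reported for this study.

```python
from collections import defaultdict


def cluster_pets(pets):
    """Group PETs whose mapped spans overlap on the same chromosome and strand.

    pets: iterable of (chrom, strand, start, end) tuples, end exclusive.
    Returns a list of clusters, each a list of the input tuples.
    """
    by_key = defaultdict(list)
    for pet in pets:
        by_key[(pet[0], pet[1])].append(pet)
    clusters = []
    for key in sorted(by_key):
        current, current_end = [], None
        # Sweep left to right; a PET joins the open cluster if it starts
        # before the cluster's furthest end coordinate so far.
        for pet in sorted(by_key[key], key=lambda p: (p[2], p[3])):
            if current and pet[2] < current_end:
                current.append(pet)
                current_end = max(current_end, pet[3])
            else:
                if current:
                    clusters.append(current)
                current, current_end = [pet], pet[3]
        if current:
            clusters.append(current)
    return clusters
```

Sorting each chromosome-strand group once keeps the sweep at O(n log n), and tracking the running maximum end lets a long PET bridge shorter ones that do not overlap each other directly.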
Figure 3. Examples of genes identified by MS-PET analysis. (A) Novel gene discovery. PET_ID# 48 955.1 (green arrowed line) identifies a novel gene transcript on chromosome 4, which was verified by PCR [inset; F, flanking (primary); N, nested (secondary)] and by DNA sequencing of the amplicon (black arrowed lines A07FF and A07FR). (B) Validation of a predicted gene. PET_ID# 282 423.1 (green arrowed line) identifies a predicted gene on chromosome 4, which was verified by PCR [inset; F, flanking (primary); N, nested (secondary)] and by DNA sequencing of the amplicon (black arrowed blocks G06FF and G06FR).