| Literature DB >> 17932070 |
Poornima Parameswaran, Roxana Jalili, Li Tao, Shadi Shokralla, Baback Gharizadeh, Mostafa Ronaghi, Andrew Z Fire.
Abstract
Multiplexed high-throughput pyrosequencing is currently limited in complexity (the number of samples sequenced in parallel) and in capacity (the number of sequences obtained per sample). Physical-space segregation of the sequencing platform into a fixed number of channels allows limited multiplexing, but obscures part of the available sequencing space. To overcome these limitations, we have devised a novel barcoding approach that allows DNA from independent samples to be pooled and sequenced together, and that facilitates subsequent segregation of the sequencing capacity. Forty-eight forward-reverse barcode pairs are described, with each forward and each reverse barcode unique with respect to at least 4 nt positions. As pyrosequencer read lengths improve, combinations of forward and reverse barcodes may be used to sequence as many as n² independent libraries for each set of n forward and n reverse barcodes, for each defined set of cloning linkers. In two pilot series of barcoded sequencing using the GS20 Sequencer (454/Roche), we found that over 99.8% of the sequences obtained could be assigned to 25 independent, uniquely barcoded libraries based on the presence of either a perfect forward or a perfect reverse barcode. The false-discovery rate, measured as the percentage of sequences with unexpected perfect pairings of unmatched forward and reverse barcodes, was estimated to be <0.005%.
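The n² bound counts ordered (forward, reverse) combinations: once reads are long enough to show the barcodes at both ends of the insert, a library can be identified by its pair of barcodes rather than by a barcode of its own. A minimal Python sketch of that bookkeeping, assuming libraries are simply numbered in the order the pairs are enumerated (`combinatorial_libraries` is a hypothetical helper, not part of any published tool):

```python
from __future__ import annotations

from itertools import product


def combinatorial_libraries(forward: list[str], reverse: list[str]) -> dict[tuple[str, str], int]:
    """Assign a library index to every ordered (forward, reverse) barcode pair.

    Hypothetical helper: n forward and n reverse barcodes give n * n distinct
    pairs, i.e. up to n**2 libraries for one set of cloning linkers.
    """
    return {pair: index for index, pair in enumerate(product(forward, reverse))}


if __name__ == "__main__":
    forward = ["AGCCTAAGCT", "AGTTCAAGTC", "ACTTGAACTG"]
    reverse = ["AGCTTAGGCT", "AGTCCAGGTC", "ACTGGACCTG"]
    libraries = combinatorial_libraries(forward, reverse)
    print(len(libraries))  # 3 forward x 3 reverse -> 9
```

For 48 forward and 48 reverse barcodes the same enumeration gives 48² = 2304 pairings, an upper bound on the number of libraries for one set of cloning linkers.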
Year: 2007 PMID: 17932070 PMCID: PMC2095802 DOI: 10.1093/nar/gkm760
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Design of forward and reverse primers. The synthesized primers are 45–46 nt long. (A and B) The template for the primers is 454-Adapter :: Barcode :: Linker-Primer. Individual specifications for the forward and reverse barcodes are also indicated. (C) Diagrammatic representation of the GS20 forward and reverse sequencing reads. ‘RC’ stands for reverse complement. Teeth denote base pairing.
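Panel C implies that a read primed from the forward side begins with the forward barcode and ends with the reverse complement of the reverse barcode, while a read primed from the reverse side shows the opposite arrangement. The Python sketch below assigns reads on that basis. It assumes hypothetical reads already trimmed of key and adapter sequence (so the barcodes sit exactly at the read ends), exact matching only, and a unique forward/reverse barcode pair per library. The category labels are borrowed from the sequence-distribution summary table, but "imperfect" here simply means "no exact match"; `revcomp` and `classify_read` are illustrative names.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_read(read: str, libraries: dict[str, tuple[str, str]]) -> tuple[str, str | None]:
    """Classify a trimmed read by exact barcode matches at its 5' and 3' ends.

    `libraries` maps a library name to its (forward, reverse) barcode pair.
    Returns (category, library); library is None when it cannot be assigned.
    """
    owner: dict[str, str] = {}    # barcode -> library name
    partner: dict[str, str] = {}  # barcode -> barcode expected at the opposite end
    for name, (fwd, rev) in libraries.items():
        owner[fwd] = owner[rev] = name
        partner[fwd], partner[rev] = rev, fwd

    # 5' motif: read starts with a barcode. 3' motif: read ends with a barcode's RC.
    # Assumes key/adapter sequence has already been trimmed from both ends.
    five = next((bc for bc in owner if read.startswith(bc)), None)
    three = next((bc for bc in owner if read.endswith(revcomp(bc))), None)

    if five and three:
        if partner[five] == three:
            return "perfect 5' and 3'", owner[five]
        return "mismatched 5' and 3'", None  # unexpected pairing: not assigned
    if five:
        return "perfect 5', imperfect 3'", owner[five]
    if three:
        return "perfect 3', imperfect 5'", owner[three]
    return "no perfect motif", None


if __name__ == "__main__":
    libs = {"A": ("AGTTCAAGTC", "AGTCCAGGTC"), "B": ("ACTTGAACTG", "ACTGGACCTG")}
    read = "AGTTCAAGTC" + "GGGGGGGGGG" + revcomp("AGTCCAGGTC")
    print(classify_read(read, libs))  # ("perfect 5' and 3'", 'A')
```

A read with only one perfect end is still assigned to a library, mirroring the assignment rule in the abstract, whereas a read whose two perfect barcodes belong to different pairs is left unassigned.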
Guidelines for barcodes
| Restrictions | Cloning protocol | Guidelines for restrictions |
|---|---|---|
| H ≠ (I, J, K); I ≠ (J, K); J ≠ K; K ≠ L; L ≠ (M, N, O); M ≠ (N, O); N ≠ O | 5-P and DL | Adjoining nucleotides must be different (except at dinucleotide positions). For each barcode, (H, I, J, K) and (L, M, N, O) each map one-to-one onto (A, C, G, T). Each barcode has two distinct dinucleotide pairs, at the positions given in the forward and reverse barcode formats. |
| H ≠ G | 5-P and DL | The terminal nucleotide of the F-adapter and R-adapter (Figure 1) is a guanosine in both the 5′-phosphate-dependent and 5′-phosphate-independent protocols; ‘H’ cannot be part of a dinucleotide, and hence cannot be a guanosine. |
| O ≠ A | 5-P | The first nucleotide of the F-cloning-linker and R-cloning-linker is an adenosine (in the 5′-phosphate-dependent protocol). ‘O’ cannot be part of a dinucleotide, and hence cannot be an adenosine. |
| O ≠ T | DL | The first nucleotide of the F-cloning-linker and R-cloning-linker is a thymidine (in the 5′-phosphate-independent protocol). ‘O’ cannot be part of a dinucleotide, and hence cannot be a thymidine. |
Barcodes used with forward and reverse primers have different designs, with specific restrictions imposed at every nucleotide position. These restrictions are listed here and discussed further in the text. Numbers indicate the positions of nucleotides in the barcode. 5-P: 5′-phosphate-dependent cloning; DL: 5′-phosphate-independent cloning.
FORWARD barcode format: H1I2J3J4K5 L6L7M8N9O10.
REVERSE barcode format: H1I2J3K4K5 L6M7M8N9O10.
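Once each position is labelled with its letter from the formats above, the shared restrictions reduce to a few set checks; the adjacency rule then follows automatically, because every adjacent pair of different letters in either format is covered by one of the inequalities. A minimal Python sketch, assuming 10-nt barcodes over A/C/G/T and covering only the rules common to both protocols (the protocol-specific terminal-base rules are left out). `FORWARD_FORMAT`, `REVERSE_FORMAT`, `letter_map` and `follows_rules` are hypothetical names:

```python
from __future__ import annotations

# Letter occupying each of the 10 positions, taken from the barcode formats above.
FORWARD_FORMAT = "HIJJKLLMNO"  # H1 I2 J3 J4 K5 L6 L7 M8 N9 O10
REVERSE_FORMAT = "HIJKKLMMNO"  # H1 I2 J3 K4 K5 L6 M7 M8 N9 O10


def letter_map(barcode: str, fmt: str) -> dict[str, str] | None:
    """Map each format letter to its base; None if a dinucleotide letter has two different bases."""
    bases: dict[str, str] = {}
    for letter, base in zip(fmt, barcode):
        if bases.setdefault(letter, base) != base:
            return None
    return bases


def follows_rules(barcode: str, fmt: str) -> bool:
    """Check a barcode against the restrictions shared by the 5-P and DL protocols."""
    if len(barcode) != len(fmt) or not set(barcode) <= set("ACGT"):
        return False
    v = letter_map(barcode, fmt)
    if v is None:
        return False
    # H, I, J, K are four different bases: H≠(I,J,K), I≠(J,K), J≠K.
    if len({v[x] for x in "HIJK"}) != 4:
        return False
    # L, M, N, O are four different bases: L≠(M,N,O), M≠(N,O), N≠O.
    if len({v[x] for x in "LMNO"}) != 4:
        return False
    # K≠L, and H≠G (the adapter ends in G, so H=G would create a dinucleotide).
    return v["K"] != v["L"] and v["H"] != "G"


if __name__ == "__main__":
    print(follows_rules("AGCCTAAGCT", FORWARD_FORMAT))  # True
    print(follows_rules("AGCTTAGGCT", REVERSE_FORMAT))  # True
    print(follows_rules("AGCCTAAGCT", REVERSE_FORMAT))  # False: C at 4, T at 5 breaks the K4K5 pair
```

Because the rules are stated in terms of letters rather than fixed positions, one function serves both barcode types; only the position template differs.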
List of possible barcodes
| Forward barcodes | Reverse barcodes |
|---|---|
| AGCCTAAGCT | AGCTTAGGCT |
| AGTTCAAGTC | AGTCCAGGTC |
| ACTTGAACTG | ACTGGACCTG |
| ACGGTAACGT | ACGTTACCGT |
| ATCCGAATCG | ATCGGATTCG |
| ATGGCAATGC | ATGCCATTGC |
| CAGGTCCAGT | CAGTTCAAGT |
| CATTGCCATG | CATGGCAATG |
| CTAAGCCTAG | CTAGGCTTAG |
| CGAATCCGAT | CGATTCGGAT |
| TCAAGTTAGC | TAGCCTAAGC |
| TACCGTTACG | TACGGTAACG |
| TGAACTTGAC | TGACCTGGAC |
| TAGGCTTCAG | TCAGGTCCAG |
| AGCCTCCAGT | AGCTTCTTAG |
| AGCCTGGCAT | AGCTTGCCAT |
| AGTTCGGACT | AGTCCGAACT |
| AGTTCTTGAC | AGTCCTCCAG |
| ACGGTCCATG | ACGTTCAATG |
| ACTTGTTCAG | ACTGGTGGAC |
| ACTTGCCGAT | ACTGGCGGAT |
| ACGGTGGATC | ACGTTGAATC |
| ATCCGCCTAG | ATCGGCAAGT |
| ATCCGTTAGC | ATCGGTAAGC |
| ATGGCGGTAC | ATGCCGTTAC |
| ATGGCTTACG | ATGCCTAACG |
| CAGGTAAGTC | CAGTTAGGTC |
| CAGGTGGCAT | CAGTTGCCAT |
| CATTGAAGCT | CATGGAGGCT |
| CTAAGTTCAG | CTAGGTCCAG |
| CTAAGAACGT | CTAGGACCGT |
| CTGGATTGAC | CTGAATGGAC |
| CATTGTTAGC | CATGGTAAGC |
| CTGGAGGACT | CTGAAGAACT |
| CGAATAACTG | CGATTACCTG |
| CGAATGGATC | CGATTGAATC |
| CGTTAGGTAC | CGTAAGTTAC |
| CGTTATTACG | CGTAATAACG |
| TAGGCAAGCT | TAGCCAGGCT |
| TACCGCCATG | TACGGCAATG |
| TACCGAAGTC | TACGGAGGTC |
| TGAACGGCAT | TGACCGCCAT |
| TGAACAATCG | TGACCATTCG |
| TGCCACCGAT | TGCAACGGAT |
| TGCCAGGACT | TGCAAGAACT |
| TCGGACCTAG | TCGAACTTAG |
| TCAAGCCAGT | TCAGGCAAGT |
| TCAAGAATGC | TCAGGATTGC |
| CTGGACCTGA | CTGAACTTGA |
| CGTTACCGTA | CGTAACGGTA |
| TGCCATTGCA | TGCAATGGCA |
| TCGGATTCGA | TCGAATCCGA |
| AGCCTGGCTA | AGCTTGCCTA |
| AGCCTCCTGA | AGCTTCTTGA |
| ACTTGTTCGA | ACTGGTCCGA |
| ATCCGCCGTA | ATCGGCGGTA |
| ATCCGTTCGA | ATCGGTCCGA |
| ATGGCGGTCA | ATGCCGTTCA |
| CAGGTGGCTA | CAGTTGCCTA |
| CTAAGTTGCA | CTAGGTGGCA |
| CGTTAGGTCA | CGTAAGTTCA |
| TCGGACCGTA | TCGAACGGTA |
| TCAAGCCTGA | TCAGGCTTGA |
| TGCCAGGTCA | TGCAAGTTCA |
The barcodes in this compilation satisfy the restrictions on the first nine bases only. Restrictions on the terminal base (position 10) are applied after selection and adjustment of the 5′–3′ cloning-linker pairs.
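The terminal-base rules from the guidelines table can then be applied per cloning protocol: position 10 (O) must not repeat the first base of the cloning linker, which is A in the 5-P protocol and T in the DL protocol. A minimal Python sketch, assuming that protocol-to-base mapping is exactly as stated in the table; `LINKER_FIRST_BASE`, `compatible_with_protocol` and `filter_for_protocol` are hypothetical names:

```python
from __future__ import annotations

# First base of the F- and R-cloning-linkers in each protocol, per the guidelines
# table: 5-P linkers start with A, DL linkers start with T.
LINKER_FIRST_BASE = {"5-P": "A", "DL": "T"}


def compatible_with_protocol(barcode: str, protocol: str) -> bool:
    """True if the terminal base (O, position 10) differs from the linker's first base,
    so that O cannot form a dinucleotide across the barcode-linker junction."""
    return barcode[-1] != LINKER_FIRST_BASE[protocol]


def filter_for_protocol(barcodes: list[str], protocol: str) -> list[str]:
    """Keep only barcodes whose terminal base suits the chosen cloning protocol."""
    return [bc for bc in barcodes if compatible_with_protocol(bc, protocol)]


if __name__ == "__main__":
    candidates = ["AGCCTAAGCT", "CTGGACCTGA", "AGTTCAAGTC"]
    print(filter_for_protocol(candidates, "5-P"))  # ['AGCCTAAGCT', 'AGTTCAAGTC']
    print(filter_for_protocol(candidates, "DL"))   # ['CTGGACCTGA', 'AGTTCAAGTC']
```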
Figure 2. A diagrammatic representation of heteroduplexes that may form in an amplified pool. (A) Heteroduplexes formed between molecules from the same sample that carry the same barcode but different RNA inserts (X and Y). (B) Heteroduplexes formed between molecules with RNA inserts and molecules with no RNA insert (or with linker fragments as inserts). Formation of these unusual duplexes may be facilitated by the 45–46 nt complementarity at either end of the insert. Thus, three types of molecules may be present during later stages of PCR: single-stranded molecules, perfectly double-stranded molecules (Figure 1C) and heteroduplexes (shown here). The ratio of the three species is determined by the number of PCR cycles. As in Figure 1, ‘RC’ stands for reverse complement, and teeth denote base pairing.
Figure 3. Visualizing the nature of amplified products from various cycles of PCR. As the cycle number increases (twenty cycles of an initial round of PCR followed by six, eight or ten cycles of a second round), PCR products that contain a small RNA insert show an evident shift in mobility. The effect of PCR cycle number is shown for three different samples, underscoring the importance of titrating the total number of DNA amplification cycles to avoid saturating the PCR amplification. Red arrows mark the two sizes of the insert-containing PCR products. PCR products without small RNA inserts migrate as faint bands between 75 and 100 bp (black arrow).
Summary of sequence distribution for both pyrosequencing runs as a function of barcode motif length
| Run number | Search motif | Perfect 5′ and 3′ motifs | Mismatched 5′ and 3′ motifs | Perfect 3′ and imperfect 5′ motifs | Perfect 5′ and imperfect 3′ motifs | Duplicate perfect 5′ motifs | Duplicate perfect 3′ motifs | Total number of sequences |
|---|---|---|---|---|---|---|---|---|
| I | Barcode (10 nt) | 229 277 | 2349 | 2323 | 2923 | 1 | 0 | 236 873 |
| I | Barcode (10 nt) + Cloning-Linker (1 nt) | 226 785 | 295 | 4339 | 5211 | 2 | 233 | 236 865 |
| I | Barcode (10 nt) + Cloning-Linker (2 nt) | 226 108 | 4 | 5368 | 5858 | 0 | 0 | 237 338 |
| I | Barcode (10 nt) + Cloning-Linker (3 nt) | 224 645 | 3 | 5800 | 6892 | 0 | 0 | 237 340 |
| I | Barcode (10 nt) + Cloning-Linker (4 nt) | 223 744 | 3 | 6258 | 7335 | 0 | 0 | 237 340 |
| I | Barcode (10 nt) + Cloning-Linker (complete) | 205 020 | 1 | 12 095 | 20 224 | 0 | 0 | 237 340 |
| II | Barcode (10 nt) | 258 929 | 5061 | 3138 | 3736 | 2 | 179 | 271 045 |
| II | Barcode (10 nt) + Cloning-Linker (1 nt) | 257 998 | 17 | 6128 | 7160 | 3 | 1 | 271 307 |
| II | Barcode (10 nt) + Cloning-Linker (2 nt) | 257 150 | 12 | 6587 | 7557 | 2 | 1 | 271 309 |
| II | Barcode (10 nt) + Cloning-Linker (3 nt) | 255 695 | 11 | 7175 | 8427 | 0 | 1 | 271 309 |
| II | Barcode (10 nt) + Cloning-Linker (4 nt) | 254 491 | 11 | 7818 | 8988 | 0 | 1 | 271 309 |
| II | Barcode (10 nt) + Cloning-Linker (complete) | 226 041 | 11 | 14 662 | 30 595 | 0 | 0 | 271 309 |
This table summarizes the data in Supplementary Tables 4 and 5. Increasing the length of the search motif used as the unique identifier for each sequence reduced the number of sequences with mismatched 5′ and 3′ motifs (i.e. sequences that do not have corresponding 5′ and 3′ motifs). Correspondingly, the number of sequences with imperfect motifs rose as the search motif lengthened, reflecting the high error rate of pyrosequencing. There were also rare instances of sequences with two different 3′ motifs or two different 5′ motifs; these may be PCR artifacts. Terminology: 5′ barcodes may be of the forward or reverse category (depending on the sequencing primer used) and lie in the 5′ flank of the read. 3′ barcodes are reverse complements of forward or reverse barcodes (depending on the sequencing primer used) and lie in the 3′ flank of the read.
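Each row of the table is internally consistent: the six category counts add up exactly to the total. The Python sketch below re-checks this and recomputes, per row, the share of reads with perfect but non-corresponding 5′ and 3′ motifs, which is the kind of unexpected pairing the abstract uses as its false-discovery measure. The counts are transcribed from the table; the key layout and the name `mismatch_percentages` are illustrative, and this is not necessarily how the reported estimate was derived.

```python
# Counts transcribed from the summary table.
# Key: (run, cloning-linker bases added to the 10-nt barcode; "complete" = full linker).
# Value: (perfect 5' and 3', mismatched, perfect 3'/imperfect 5',
#         perfect 5'/imperfect 3', duplicate 5', duplicate 3', total).
TABLE = {
    ("I", 0): (229_277, 2_349, 2_323, 2_923, 1, 0, 236_873),
    ("I", 1): (226_785, 295, 4_339, 5_211, 2, 233, 236_865),
    ("I", 2): (226_108, 4, 5_368, 5_858, 0, 0, 237_338),
    ("I", 3): (224_645, 3, 5_800, 6_892, 0, 0, 237_340),
    ("I", 4): (223_744, 3, 6_258, 7_335, 0, 0, 237_340),
    ("I", "complete"): (205_020, 1, 12_095, 20_224, 0, 0, 237_340),
    ("II", 0): (258_929, 5_061, 3_138, 3_736, 2, 179, 271_045),
    ("II", 1): (257_998, 17, 6_128, 7_160, 3, 1, 271_307),
    ("II", 2): (257_150, 12, 6_587, 7_557, 2, 1, 271_309),
    ("II", 3): (255_695, 11, 7_175, 8_427, 0, 1, 271_309),
    ("II", 4): (254_491, 11, 7_818, 8_988, 0, 1, 271_309),
    ("II", "complete"): (226_041, 11, 14_662, 30_595, 0, 0, 271_309),
}


def mismatch_percentages(table=TABLE):
    """Percent of reads in the 'mismatched 5' and 3'' column, per row of the table."""
    result = {}
    for key, (*counts, total) in table.items():
        # The six categories should add up to the row total.
        if sum(counts) != total:
            raise ValueError(f"row {key} does not sum to its total")
        result[key] = 100 * counts[1] / total
    return result


if __name__ == "__main__":
    for key, pct in mismatch_percentages().items():
        print(key, f"{pct:.4f}%")
```

Computed this way, the mismatched share is about 1% (Run I) and 1.9% (Run II) with the 10-nt barcode alone, and falls below 0.005% in both runs once two or more cloning-linker bases are added to the search motif.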