David J Witherspoon, Jinchuan Xing, Yuhua Zhang, W Scott Watkins, Mark A Batzer, Lynn B Jorde.
Abstract
BACKGROUND: Mobile elements (MEs) are diverse, common and dynamic inhabitants of nearly all genomes. ME transposition generates a steady stream of polymorphic genetic markers, deleterious and adaptive mutations, and substrates for further genomic rearrangements. Research on the impacts, population dynamics, and evolution of MEs is constrained by the difficulty of ascertaining rare polymorphic ME insertions that occur against a large background of pre-existing fixed elements and then genotyping them in many individuals.
Year: 2010 PMID: 20591181 PMCID: PMC2996938 DOI: 10.1186/1471-2164-11-410
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Mobile Element Scanning (ME-Scan) Library Preparation and Sequencing Protocol. (A) Double-stranded genomic DNA is extracted and then fragmented by sonication. An AluYb8/9 element is depicted (black rectangle: Alu element; gray box: Poly-A tail of the Alu; TSD: target site duplication). Some fragments (darker) will contain most or all of the element along with some upstream genomic sequence. (B) Fragment ends are repaired, 3' A overhangs are added, and oligonucleotide adapters (pink) carrying sample-specific indexes (blue) are ligated onto the ends. (C) Multiple indexed libraries are pooled for subsequent processing. (D) A limited number of PCR cycles are performed using a biotinylated AluYb8/9-specific PCR primer (ALUBP2) and a primer (PEP2) that anneals to the adapters. PCR products in the 650-700 bp size range are selected using gel electrophoresis. (E) The biotinylated strands are purified away from other products using streptavidin-coated paramagnetic beads. (F) The biotinylated strands are amplified by PCR with primers matching the adapter sequences. The resulting product is checked using an Agilent Bioanalyzer DNA 1000 assay (electropherogram and gel-like image shown). (G) Paired-end, 2x36-bp sequencing is carried out on the AluYb8/9-specific pooled fragment library using a custom Alu-specific primer (ALUSPv2) for the first (Alu junction) read and the standard adapter-specific primer (PESP2) for the second (genomic flank) read. The junction read begins inside the Alu element, yielding 16 bp of Alu sequence followed by 20 bp of genomic flank sequence. The flank read contains the 5-bp index and the 'T' added during sample preparation, followed by 30 bp of genomic sequence. Multiple read pairs are depicted, corresponding to different fragments carrying the same AluYb8/9 insertion (generic fragment diagrammed at bottom).
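The read layout described in panel (G) can be made concrete with a small sketch. The lengths (36-bp reads, 16 bp of Alu sequence, 5-bp index, one added 'T') come from the caption; the function names and the strict length check are illustrative assumptions, not part of the published pipeline.

```python
READ_LEN = 36    # each read of the 2x36-bp pair
ALU_LEN = 16     # Alu-derived bases at the start of the junction read
INDEX_LEN = 5    # sample index at the start of the flank read


def split_junction_read(read):
    """Split a junction read into (Alu portion, genomic flank portion)."""
    if len(read) != READ_LEN:
        raise ValueError(f"expected a {READ_LEN}-bp read, got {len(read)} bp")
    return read[:ALU_LEN], read[ALU_LEN:]


def split_flank_read(read):
    """Split a flank read into (index, added base, genomic portion)."""
    if len(read) != READ_LEN:
        raise ValueError(f"expected a {READ_LEN}-bp read, got {len(read)} bp")
    return read[:INDEX_LEN], read[INDEX_LEN], read[INDEX_LEN + 1:]


# Synthetic reads, used only to show the coordinates.
alu_part, junction_flank = split_junction_read("A" * 16 + "C" * 20)
index, added_base, genomic = split_flank_read("ACCAT" + "T" + "G" * 30)
print(len(alu_part), len(junction_flank), index, added_base, len(genomic))
```

With these coordinates the junction read contributes 20 bp of genomic sequence and the flank read contributes 30 bp, matching the caption.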
Figure 2. Sequence analysis pipeline. Paired-end 2x36-bp sequence reads generated on an Illumina GAII are received as fastq-formatted text files. The index of each read pair is identified and trimmed from the genomic flank reads. Read pairs that could not be assigned a valid index are filtered out. Read quality filtering then removes read pairs in which either read is composed mainly of a single nucleotide, contains too many 'N' base calls, or contains sequences derived from the adapter oligonucleotides. The remaining read pairs are then mapped to the human reference genome, in two ways: first with the expected 16 bp of Alu sequence in the junction read, to identify Alu insertions that are present in the reference; and then with those 16 bp trimmed off of the junction read, to enable identification of new Alu insertions. Read pairs that do not map to a unique location with the proper orientation and the expected distance between them are then filtered out. For each read pair, the position (in the reference genome) of the final nucleotide of the Alu junction read is computed for use as the unique identifier of the corresponding insertion. Read pairs are then grouped according to those positional identifiers. Loci that lack Alu sequence in the first 16 bp of the junction read are annotated as such and rejected as unreliable. The final data set consists of a list of insertion loci observed in at least one sample and the number of read pairs supporting the presence of each insertion in each indexed sample.
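The final grouping step of the pipeline, in which read pairs are collected by positional identifier and counted per indexed sample, might be sketched as follows. The input format (tuples of index, chromosome, and position of the final nucleotide of the junction read) is an assumption made for illustration; the caption does not specify how the mapped read pairs are represented.

```python
from collections import defaultdict


def count_support(mapped_pairs):
    """Count supporting read pairs per insertion locus and per sample index.

    mapped_pairs: iterable of (index, chrom, junction_end_pos) tuples, where
    junction_end_pos is the reference position of the final nucleotide of
    the junction read (hypothetical input format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for index, chrom, junction_end_pos in mapped_pairs:
        counts[(chrom, junction_end_pos)][index] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {locus: dict(per_index) for locus, per_index in counts.items()}


# Synthetic example: three pairs support one locus, one pair supports another.
pairs = [
    ("ACCAT", "chr1", 1000),
    ("ACCAT", "chr1", 1000),
    ("TATTC", "chr1", 1000),
    ("TATTC", "chr2", 50),
]
support = count_support(pairs)
print(support)
```

The result has the shape of the final data set described in the caption: one entry per insertion locus, with the number of supporting read pairs in each indexed sample.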
Quantity and quality of paired-end sequencing reads
| Quality classification | Replication experiment: # read pairs | Replication experiment: % of total | Pooling experiment: # read pairs | Pooling experiment: % of total |
|---|---|---|---|---|
| Total read pairs | 3,047,279 | 100 | 2,389,900 | 100 |
| Both reads of high quality (not a, b or c, below) | 2,998,412 | 98.4 | 2,315,412 | 96.9 |
| (a) Either read is > 85% any one base | 1,370 | 0.0450 | 1,628 | 0.0681 |
| (b) Either read has > 2 'N' base calls | 4,345 | 0.143 | 3,073 | 0.13 |
| (c) Adapter sequence detected in either read | 43,192 | 1.42 | 69,827 | 2.9 |
| Both reads high quality, index valid | 2,963,735 | 97.3 | 2,287,571 | 95.7 |
| Junction read has Alu sequence | 2,874,329 | 94.3 | 2,201,101 | 92.1 |
| Supports an Alu insertion locus* | 2,458,549 | 80.7 | 1,753,750 | 73.4 |
* Supports one of the 5,053 insertion loci reported in Additional File 1.
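The three quality checks (a), (b) and (c) in this table can be sketched as a per-read filter. The thresholds (more than 85% of one base, more than 2 'N' calls) are taken from the table; the exact-substring adapter test and the `adapter_seqs` parameter are simplifying assumptions, since the adapter sequences and the matching method are not given here.

```python
from collections import Counter

MAX_SINGLE_BASE_FRACTION = 0.85  # check (a): read is > 85% any one base
MAX_N_CALLS = 2                  # check (b): read has > 2 'N' base calls


def failed_checks(read, adapter_seqs=()):
    """Return the set of checks ('a', 'b', 'c') that a single read fails."""
    if not read:
        raise ValueError("empty read")
    failed = set()
    most_common_count = Counter(read).most_common(1)[0][1]
    if most_common_count / len(read) > MAX_SINGLE_BASE_FRACTION:
        failed.add("a")
    if read.count("N") > MAX_N_CALLS:
        failed.add("b")
    # Assumption: adapter contamination is detected by exact substring match.
    if any(adapter in read for adapter in adapter_seqs):
        failed.add("c")
    return failed


def pair_is_high_quality(junction_read, flank_read, adapter_seqs=()):
    """A pair passes only if neither read fails any check."""
    return not (failed_checks(junction_read, adapter_seqs)
                or failed_checks(flank_read, adapter_seqs))
```

Returning a set rather than the first failure reflects the table itself: the counts for (a), (b) and (c) sum to slightly more than the number of read pairs removed, so a pair can evidently fail more than one check.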
Figure 3. Observed vs. Expected frequencies of indexes in pooled samples. Vertical bars represent the numbers of high-quality read pairs that were observed (black), compared with the number expected (white) for each index in two experiments. Expected numbers were calculated from the total numbers of high-quality read pairs for each experiment and the intended pooling proportions (from left to right: 50%, 50% for the Replication experiment; and 4%, 4%, 4%, 16%, 72% for the Pooling experiment). The individuals sampled (A through D) and the index sequences used are shown above the corresponding bars.
Observed vs. Expected frequencies of indexes in pooled samples
| Individual | Index | Read Pairs* | Expected % | Observed % |
|---|---|---|---|---|
| Replication | ||||
| A¹ | ACCAT | 1,485,980 | 50 | 50.1 |
| A¹ | TATTC | 1,477,755 | 50 | 49.9 |
| Pooling | ||||
| A | ACCAT | 88,418 | 4 | 3.86 |
| B | TATTC | 82,049 | 4 | 3.59 |
| C | GGTTA | 88,895 | 4 | 3.89 |
| D² | CGCTA | 355,469 | 16 | 15.5 |
| D² | TTGAT | 1,672,740 | 72 | 73.1 |
* Only read pairs passing all QC filters are counted.
¹ Two aliquots of a DNA sample from this individual were processed in parallel and pooled for sequencing.
² A DNA sample from this individual was sonicated, then divided into two aliquots for subsequent library construction steps.
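The expected and observed values for the Pooling experiment can be recomputed from the read-pair counts and the intended pooling proportions. This is a sketch of the arithmetic only; the function and variable names are illustrative.

```python
def index_summary(read_pairs, intended_fraction):
    """Expected read-pair count and observed percentage for each index."""
    total = sum(read_pairs.values())
    summary = {}
    for index, observed in read_pairs.items():
        summary[index] = {
            "expected_pairs": total * intended_fraction[index],
            "observed_pct": 100.0 * observed / total,
        }
    return summary


# Read-pair counts and intended proportions from the Pooling rows of the table.
pooling_pairs = {
    "ACCAT": 88_418,
    "TATTC": 82_049,
    "GGTTA": 88_895,
    "CGCTA": 355_469,
    "TTGAT": 1_672_740,
}
pooling_fraction = {
    "ACCAT": 0.04,
    "TATTC": 0.04,
    "GGTTA": 0.04,
    "CGCTA": 0.16,
    "TTGAT": 0.72,
}

for index, row in index_summary(pooling_pairs, pooling_fraction).items():
    print(index, round(row["expected_pairs"]), f"{row['observed_pct']:.3g}%")
```

The five counts sum to 2,287,571, which is the number of high-quality, validly indexed read pairs reported for the Pooling experiment.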
Comparison of ME-Scan results to 1,708 presumably fixed AluYb8/9 insertions in the human reference genome
| Individual | Index | Number of ME-Scan negative loci | False Negative rate (%) | Genotypes checked by PCR¹ | Checked and Present by PCR² |
|---|---|---|---|---|---|
| Replication | |||||
| A | ACCAT | 147 | 8.61 | 18 | 4 |
| A | TATTC | 145 | 8.49 | 18 | 4 |
| Pooling | |||||
| A | ACCAT | 400 | 23.4 | 18 | 4 |
| B | TATTC | 420 | 24.6 | 18 | 2 |
| C | GGTTA | 400 | 23.4 | 17 | 5 |
| D | CGCTA | 164 | 9.60 | 17 | 2 |
| D | TTGAT | 134 | 7.85 | 17 | 2 |
| Combined | All | 100³ | 5.85 | 123 | 23 |
¹ The number of genotypes compared differs from sample to sample due to occasional uncertain genotype calls (missing data) in the PCR and gel genotyping assays.
² These genotypes represent actual false negative ME-Scan results (absent according to ME-Scan, present by PCR).
³ For 100 of the 1,708 loci, ME-Scan did not observe the insertion-present allele in any of the samples tested ("Combined").
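The false negative rates in this table are consistent with dividing the number of ME-Scan negative loci by the 1,708 presumably fixed loci. The sketch below reproduces that arithmetic and also reports the share of PCR-checked negative genotypes that were confirmed present; the helper name `pct` is illustrative.

```python
N_FIXED_LOCI = 1708  # presumably fixed AluYb8/9 insertions in the reference


def pct(numerator, denominator):
    """Percentage helper."""
    return 100.0 * numerator / denominator


# ME-Scan negative loci per (experiment, index), from the table.
negative_loci = {
    ("Replication", "ACCAT"): 147,
    ("Replication", "TATTC"): 145,
    ("Pooling", "ACCAT"): 400,
    ("Pooling", "TATTC"): 420,
    ("Pooling", "GGTTA"): 400,
    ("Pooling", "CGCTA"): 164,
    ("Pooling", "TTGAT"): 134,
    ("Combined", "All"): 100,  # loci not observed in any sample
}

for sample, n_negative in negative_loci.items():
    print(sample, f"{pct(n_negative, N_FIXED_LOCI):.3g}%")

# Of the 123 negative genotypes checked by PCR, 23 were present by PCR.
print(f"confirmed present: {pct(23, 123):.3g}%")
```

Because the loci are only presumed fixed, the table's rate treats every ME-Scan negative as a miss; the PCR columns indicate how many of the checked negatives were in fact present.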
New variable AluYb8/9 loci identified by ME-Scan
| Individual | Index | New* | Positive genotypes checked by PCR | False positives | Negative genotypes checked by PCR | False negatives |
|---|---|---|---|---|---|---|
| Replication | ||||||
| A | ACCAT | 259 | 35 | 1 | 9 | 0 |
| A | TATTC | 273 | 35 | 1 | 9 | 0 |
| Pooling | ||||||
| A | ACCAT | 163 | 32 | 3 | 12 | 5 |
| B | TATTC | 153 | 20 | 2 | 22 | 3 |
| C | GGTTA | 168 | 19 | 1 | 23 | 5 |
| D | CGCTA | 242 | 27 | 1 | 16 | 0 |
| D | TTGAT | 290 | 27 | 1 | 16 | 0 |
| Combined | All | 487 | 195 | 10 | 107 | 13 |
* Not observed in the human reference genome (hg19/GRCh37).
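The PCR validation columns of this table can be summarized as per-genotype error rates. The sketch below sums the per-sample rows, checks them against the Combined row, and computes the two rates; the choice of denominators (checked positives for false positives, checked negatives for false negatives) is an assumption about how such rates would be defined, not a figure quoted from the table.

```python
# (positives checked, false positives, negatives checked, false negatives)
# per indexed sample, in table order.
validation_rows = [
    (35, 1, 9, 0),
    (35, 1, 9, 0),
    (32, 3, 12, 5),
    (20, 2, 22, 3),
    (19, 1, 23, 5),
    (27, 1, 16, 0),
    (27, 1, 16, 0),
]


def error_rates(pos_checked, false_pos, neg_checked, false_neg):
    """Return (false positive %, false negative %) among checked genotypes."""
    return 100.0 * false_pos / pos_checked, 100.0 * false_neg / neg_checked


# Column-wise totals; these should match the Combined row (195, 10, 107, 13).
totals = tuple(sum(column) for column in zip(*validation_rows))
fp_rate, fn_rate = error_rates(*totals)
print(totals, f"FP {fp_rate:.3g}%", f"FN {fn_rate:.3g}%")
```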
Reproducibility
| | Positive loci in either sample* | Absent from ACCAT | Absent from TATTC | Replication failure rate, average %† |
|---|---|---|---|---|
| Known | 2,174 | 20 | 15 | 0.805 |
| New variable | 289 | 30 | 16 | 7.96 |
| Non-specific | 1,390 | 434 | 410 | 30.4 |
* Total positive loci are those observed in either of the two samples of the 'replication' experiment, indexed ACCAT and TATTC.
† Number of insertion loci absent from a sample, divided by the total number of positive loci observed in either sample, averaged over the two samples.
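The replication failure rate defined in the second footnote can be checked against the table with a few lines of arithmetic. The function name is illustrative; the inputs are the table's own counts.

```python
def replication_failure_rate(total_positive, absent_first, absent_second):
    """Average, over the two samples, of the percentage of loci absent."""
    rate_first = absent_first / total_positive
    rate_second = absent_second / total_positive
    return 100.0 * (rate_first + rate_second) / 2


# (positive in either sample, absent from ACCAT, absent from TATTC)
reproducibility = {
    "Known": (2174, 20, 15),
    "New variable": (289, 30, 16),
    "Non-specific": (1390, 434, 410),
}

for locus_class, counts in reproducibility.items():
    print(locus_class, f"{replication_failure_rate(*counts):.3g}%")
```

For the known loci this gives (20/2,174 + 15/2,174)/2, or about 0.805%, in agreement with the table.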
Comparison of ME-Scan results with previously genotyped AluYb8/9 insertion loci
| Individual | Index | ME-Scan Positives | ME-Scan Negatives | False Positives* | False Negatives |
|---|---|---|---|---|---|
| Replication | |||||
| A | ACCAT | 20 | 11 | 1 | 0 |
| A | TATTC | 20 | 11 | 1 | 0 |
| Pooling | |||||
| A | ACCAT | 18 | 13 | 1 | 2 |
| B | TATTC | 23 | 10 | 0 | 4 |
| C | GGTTA | 23 | 10 | 1 | 5 |
| D | CGCTA | 26 | 7 | 0 | 0 |
| D | TTGAT | 26 | 7 | 0 | 0 |
| Overall | All | 156 | 69 | 4 | 11 |
* All four of the apparent ME-Scan false positives are due to errors (false negatives) in the previous PCR- and gel-based genotyping.