Bradley J Till, Troy Zerr, Elisabeth Bowers, Elizabeth A Greene, Luca Comai, Steven Henikoff.
Abstract
Human individuals differ from one another at only approximately 0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease.
Year: 2006 PMID: 16893952 PMCID: PMC1540726 DOI: 10.1093/nar/gkl479
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. SNP discovery in individual human DNA samples by Ecotilling. LI-COR gel analyzer images from the (A) IRDye 700 channel and (B) IRDye 800 channel are shown for a 1489 bp region of the DCLRE1A gene. Each lane contains a sample from a unique individual. Rare heterozygous polymorphisms are boxed, and a common SNP is marked by an arrow. Cleavage of polymorphisms with crude celery extract produces two fragments, one fluorescing in the IRDye 700 channel and its complement in the IRDye 800 channel. Complementary fragments are marked in each channel image. Rare SNPs on this gel are found in only 1/90 individuals. Diamonds indicate a 200-bp marker loaded in every eighth lane beginning with lane 4. The trapezoid marks a band from mispriming. There are several such bands on this gel image, and none are scored as true polymorphisms because they lack a complementary fragment of the appropriate molecular weight in the other IRDye channel image. The band marked with a circle was scored as a low quality putative polymorphism. No appropriately sized fragment is found in this lane in (B), and thus the band represents a false positive error that could have been avoided.
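The scoring rule described in this caption (a band counts only if a complementary fragment of the appropriate size appears in the other channel) can be sketched as a simple size check: for a cleaved heteroduplex, the two labeled fragments are expected to add up to roughly the full amplicon length. The following is a minimal Python sketch under that assumption. The function name, the tolerance, and the example band sizes are hypothetical and are not taken from the paper.

```python
def has_complement(band_700_bp, bands_800_bp, amplicon_bp, tol_bp=15):
    """Return True if some IRDye 800 band complements the IRDye 700 band.

    Assumption: cleavage at a mismatch yields two labeled fragments whose
    sizes add up to about the full amplicon length. tol_bp is a
    hypothetical allowance for sizing error on the gel.
    """
    expected = amplicon_bp - band_700_bp
    return any(abs(b - expected) <= tol_bp for b in bands_800_bp)


# Hypothetical lanes for a 1489 bp amplicon with a 400 bp band at 700 nm.
print(has_complement(400, [1085, 200], 1489))  # a band near 1089 bp exists
print(has_complement(400, [600, 200], 1489))   # no complement, e.g. mispriming
```

A band that fails this check would be treated like the mispriming bands and the circled false positive described above.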
Table 1. Comparison of alleles identified by Ecotilling with NIEHS SNPs
| GenBank ID | Target name | Start position | Window size (bp) | Alleles: Ecotilling/NIEHS | Alleles: new by Ecotilling | Alleles: total by Ecotilling |
|---|---|---|---|---|---|---|
| AY607842 | DCLRE1A | 005669 | 1289 | 3/3 | 2 | 5 |
| AY337516 | GAD1 | 043917 | 1299 | 6/6 | 0 | 6 |
| AY632118 | HK2 | 040539 | 1297 [a] | 3/4 | 4 | 7 |
| AY800271 | NAT1 | 054558 | 1027 | 6/6 | 0 | 6 |
| AY504960 | TNFRSF5 | 003810 | 1298 | 6/6 [b] | 1 | 8 |
| Total | | | | 24/25 | 7 | 32 |
[a] Data are not reported between positions 41 090 and 41 160 of HK2 because of difficulties in sequencing and Ecotilling caused by low nucleotide complexity and the presence of heterozygous indels.
[b] One allele reported by NIEHS was not validated by resequencing the corresponding sample screened by our lab.
Table 2. Comparison of polymorphisms identified by sequencing, Ecotilling with individual samples and Ecotilling with pooled samples
| Target name | Allele position | Base change | No. of SNPs by sequencing (NIEHS [a] or new [b]) | No. of SNPs by Ecotilling | No. of SNPs by pooled Ecotilling [c] | Effect [d] | SIFT [e] score | PARSESNP [f] score |
|---|---|---|---|---|---|---|---|---|
| DCLRE1A | 005855 | C→T | 1 | 1 | 1 | P287L | 1.00 | |
| | 005944 | C→G | 38 | 38 | — | H317D | 1.00 | −0.1 |
| | 006371 | C→T | 1 | 1 | 0 | P459L | | |
| | 006419 | G→A | 1 | 1 | 1 | G475E | 0.18 | 3.6 |
| | 006939 | G→A | 1 | 1 | 1 | A648= | | |
| GAD1 | 043953 | C→G | 28 | 26 [g] | — | Intron | | |
| | 044207 | G→A | 6 | 5 [h] | ( | R532Q | 0.37 | 8.6 |
| | 044526 | A→G | 2 | 2 | 2 | Intron | | |
| | 044582 | T→C | 1 | 1 | 1 | Intron | | |
| | 044940 | A→G | 7 | 6 [h] | ( | Intron | | |
| | 044971 | G→A | 1 | 1 | 0 | Intron | | |
| HK2 | 040543 | C→T | 1 | 1 | 0 | Intron | | |
| | 040750 | G→A | 1 | 1 | 1 | Intron | | |
| | 040966 | T→C | 1 | 1 | 1 | Intron | | |
| | 041056 | A→C | 1 | 1 | 1 | Intron | | |
| | 041233 | G→C | 1 | 1 | 1 | V204= | | |
| | 041606 | G→A | 10 | 9 [h] | — | Intron | | |
| | 041696 | T→C | 31 | 30 | — | D251= | | |
| | 041763 | C→T | 1 | 0 | 0 [j] | R274C | | |
| NAT1 | 054792 | A→T | 3 | 3 | 3 | utr | | |
| | 054796 | A→T | 6 | 5 [h] | 4 | utr | | |
| | 055194 | C→T | 1 | 1 | 1 | V121= | | |
| | 055276 | G→A | 3 | 3 | 3 | V149I | 1.00 | −3.8 |
| | 055290 | G→A | 3 | 3 | 0 | T153= | | |
| | 055471 | T→G | 3 | 3 | 3 | S214A | 0.40 | 1.6 |
| TNFRSF5 | 004013 | C→T | 1 | 1 | 0 | Intron | | |
| | 004356 | T→C | 1 | 1 | 1 | Intron | | |
| | 004439 | C→T | 9 | 9 | ( | Intron | | |
| | 004641 | A→G | 1 | 1 | 1 | Intron | | |
| | 004694 | C→T | 1 | 1 | 1 | Intron | | |
| | 004695 | G→A | 3 | 3 | 2 | Intron | | |
| | 004764 | A→C | 1 [k] | 0 | 0 | | | |
| | 004952 | C→T | 2 | 2 | 2 | S124L | 0.32 | 0.0 |
| Total | | | 171 | 163 | 170 | | | |
[a] Alleles sequenced by the NIEHS SNPs program.
[b] Alleles sequenced by STP to confirm TILLING results.
[c] In some cases, the frequency of the polymorphism is sufficiently high that genotypes cannot be assigned to individuals in 8× pools (indicated by —). To calculate the total number of SNPs detected in these pools, we used the number of SNPs detected by NIEHS.
[d] Synonymous (=) and non-synonymous changes are shown, where the amino acid residue number is based on the exon–intron model for the TILLed fragment. Utr = 5′ or 3′ untranslated.
[e] A non-synonymous SNP is predicted to be damaging to the encoded protein if the SIFT score is <0.05 (in boldface). Low-confidence predictions are indicated (+).
[f] A non-synonymous SNP is predicted to be damaging to the encoded protein if the PARSESNP score is >10 (in boldface).
[g] No data collected in two individuals.
[h] No data collected in one individual.
[i] Homozygous SNPs are discovered in pools. The SNP frequency is too high to assign genotypes in 8× pools. The number in parentheses indicates the number of individuals with the SNP determined by sequencing. We used this number of SNPs to calculate the total number of SNPs detected in pools.
[j] This polymorphism was overlooked when screening blind. Upon comparison with the known sequence, it was determined that the allele was clear on the gel and overlooked because of human error.
[k] Resequencing of the individual identified by NIEHS showed that this SNP is not present in the corresponding sample screened by our group. This SNP is not counted when calculating false negative errors.
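As a worked example of how an error rate can be read off the totals row: if a false negative is taken to be a sequencing-confirmed SNP that Ecotilling of individual samples missed, the totals give (171 − 163)/171. The sketch below uses only those two totals. The definition of the rate and the function name are assumptions made for illustration, and the figures reported in the abstract may have been derived differently.

```python
def false_negative_rate(n_reference, n_detected):
    """Fraction of reference (sequencing-confirmed) SNPs that were missed."""
    return (n_reference - n_detected) / n_reference


seq_total = 171         # total SNPs by sequencing (excludes the unconfirmed SNP)
ecotilling_total = 163  # total SNPs by Ecotilling of individual samples

rate = false_negative_rate(seq_total, ecotilling_total)
print(f"{rate:.1%}")  # 8 missed out of 171
```

The eight missed calls correspond to the rows where the Ecotilling count is lower than the sequencing count, several of which are attributed in the footnotes to missing data rather than missed bands.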
Figure 2. Ecotilling of pooled samples to discover rare nucleotide changes. (A) Schematic diagram of sample pooling and arraying. A 2D arraying strategy is used whereby 64 unique samples are first arranged in an 8 × 8 grid (upper panel), pooled by row, and deposited into a 96-well screening plate (vertical striped wells, lower panel). Samples are then pooled by column and deposited in the adjacent column of the 96-well plate (horizontal striped wells). Each well in the 96-well plate contains eight pooled samples. Per set of 64 samples, an individual sample is present only once in a row pool and only once in a column pool. Samples are robotically loaded onto gels with sample A1 in lane 1, B1 in lane 2, A2 in lane 9 and so on. A true nucleotide change present in one of the first eight lanes must be present again in one of lanes 9 through 16. The exact lane numbers provide the coordinates to determine the individual harboring the nucleotide change. A total of 384 unique samples can be assayed per gel run. (B) Example of a pooled Ecotilling image (IRDye 700 shown). The first 48 of 96 lanes are shown from this run screening for polymorphisms in the DCLRE1A gene. Individuals screened in Figure 1 lanes 1–64 are rescreened in pooled lanes 1–16. Lanes to the left of the striped bars are row pools, and to the right are the corresponding column pools from a set of 64 samples. Solid black lines separate sets of 64 unique samples. Rare polymorphisms are boxed. The arrow indicates a rare nucleotide change that was not found in the first 96 individuals screened (first two sets of lanes and Figure 1).
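The lane-coordinate logic in panel (A) can be sketched in a few lines of Python. This sketch assumes 1-based lane numbers, that each 16-lane set holds eight row pools followed by eight column pools (consistent with A1 in lane 1 and A2 in lane 9), and that samples are numbered row by row within each 8 × 8 grid. The function name and the sample numbering are hypothetical, since the caption does not state how lanes map to sample identifiers.

```python
def decode_hit(row_lane, col_lane, lanes_per_set=16, grid=8):
    """Map a (row-pool lane, column-pool lane) pair to a sample number.

    Assumptions (hypothetical numbering): lanes are 1-based; within each
    16-lane set the first 8 lanes are row pools and the next 8 are column
    pools; samples are numbered row by row within the 8 x 8 grid.
    """
    set_r, off_r = divmod(row_lane - 1, lanes_per_set)
    set_c, off_c = divmod(col_lane - 1, lanes_per_set)
    if set_r != set_c:
        raise ValueError("lanes belong to different 64-sample sets")
    if not (off_r < grid <= off_c):
        raise ValueError("expected one row-pool lane and one column-pool lane")
    row, col = off_r, off_c - grid
    return set_r * grid * grid + row * grid + col + 1  # 1-based sample number


# A band seen in lane 3 (row pool) and lane 12 (column pool) of the first set
# points to the sample in grid row 3, column 4.
print(decode_hit(3, 12))
```

With six 16-lane sets on a 96-lane gel, the same mapping covers 6 × 64 = 384 samples, matching the per-gel capacity stated in the caption.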
Table 3. Additional rare alleles discovered by 2D Ecotilling [a]
| Target name | Allele position | Base change | Discovered in | Effect [b] | SIFT [c] score | PARSESNP [d] score |
|---|---|---|---|---|---|---|
| DCLRE1A | 005797 | G→A | P0253 | D268N | 9.8 | |
| | 006494 | C→A | P0263 | T500N | — | |
| | 006497 | A→G | P0165 | N501S | 0.43 | — |
| GAD1 | 044479 | A→G | P0324 | Intron | | |
| | 044724 | C→G | P0357 | Intron | | |
| HK2 | 040806 | G→C | P0117 | Intron | | |
| | 041438 | C→A | P0263 | Intron | | |
| | 041527 | A→G | P0168 | Intron | | |
| NAT1 | 054682 | A→T | P0193 | Non-coding | | |
| | 054852 | T→G | P0111, P0334 | L7= | | |
| | 055021 | C→T | P0108 | R64W | | |
| | 055291 | G→T | P0213 | E154* | | |
| | 055484 | C→A | P0249 | T218N | | |
| | 055608 | T→C | P0111, P0334 | S259= | | |
| | 055608 | T→G | P0097 | S259R | 0.31 | 6.9 |
| TNFRSF5 | 004313 | G→C | P0288 | Intron | | |
| | 004362 | T→C | P0335 | Intron | | |
| | 004501 | G→A | P0363 | T57= | | |
| | 004629 | A→G | P0092 | Intron | | |
| | 004850 | G→A | P0141 | R90Q | 0.48 | 4.2 |
| | 005004 | A→G | P0218, P0332 | Intron | | |
[a] Not found in individuals P0001 to P0090, which were scrutinized by both NIEHS SNPs and Ecotilling (Table 2).
[b] Synonymous (=), non-synonymous and stop codon (*) changes are indicated, where the amino acid residue number is based on the exon–intron model for the TILLed fragment.
[c] A non-synonymous SNP is predicted to be damaging to the encoded protein if the SIFT score is <0.05 (in boldface). Low-confidence predictions are indicated as (+). SIFT analysis with default settings used the full-protein sequence as query of SWISS-PROT 48.7 + TREMBL 31.7.
[d] A non-synonymous SNP is predicted to be damaging to the encoded protein if the PARSESNP score is >10 (in boldface). PARSESNP used default alignments.
Figure 3. Comparison of Ecotilling results using gene-specific 5′-end-labeled primers (A) and universal 5′-end-labeled primers (B). Unadjusted IRDye 700 images are displayed. Results from 34 individual samples are shown. Polymorphisms identified in a 1227 bp region of the NAT1 gene are boxed.
Figure 4. Automated band recognition using the GelBrain feature of GelBuddy. There are four main features of automated band detection. Lanes are automatically defined (A), a normalized electropherogram is constructed from image data (B), a decorrelation algorithm then detects bands that deviate from background signal (C) and bands are automatically detected and marked on the image (D). Bands are boxed in white. Common bands of the same molecular weight are linked by a horizontal connector. Discovery of both a rare SNP (upper box) and common SNPs (lower linked boxes) is shown. Data shown were extracted from the DCLRE1A IRDye 800 image (Figure 1B). When complete, a user can make manual modifications to the automatically marked-up gel.
Table 4. Comparison of GelBrain detection in Ecotilling images to sequence detection
| Target | Allele | Heterozygotes [a]: no. of SNPs by sequencing | Heterozygotes [a]: no. of SNPs by GelBrain | GelBrain false positives |
|---|---|---|---|---|
| DCLRE1A | 005855 | 1 | 1 | |
| | 005944 | 38 | 38 | |
| | 006371 | 1 | 1 | |
| | 006419 | 1 | 1 | |
| | 006939 | 1 | 1 | |
| Total | | 42 | 42 | 0 |
| GAD1 | 043953 | 28 | 26 | |
| | 044207 | 6 | 0 | |
| | 044526 | 2 | 2 | |
| | 044582 | 1 | 1 | |
| | 044940 | 7 | 6 | |
| | 044971 | 1 | 0 | |
| Total | | 45 | 35 | 6 |
| HK2 | 040543 | 1 | 0 | |
| | 040750 | 1 | 0 | |
| | 041056 | 1 | 0 | |
| | 041233 | 1 | 1 | |
| | 041606 | 10 | 0 | |
| | 041696 | 31 | 0 | |
| | 041763 | 1 | 0 | |
| Total | | 46 | 1 | 5 |
| NAT1 | 054792, 054796 [b] | 9 | 8 | |
| | 055194 | 1 | 1 | |
| | 055276 | 3 | 3 | |
| | 055290 | 3 | 2 | |
| | 055471 | 3 | 0 | |
| Total | | 19 | 14 | 0 |
| TNFRSF5 | 004013 | 1 | 1 | |
| | 004356 | 1 | 1 | |
| | 004439 | 9 | 9 | |
| | 004641 | 1 | 0 | |
| | 004694 | 1 | 0 | |
| | 004695 | 3 | 1 | |
| | 004764 | 1 | 0 | |
| | 004952 | 2 | 2 | |
| Total | | 19 | 14 | 6 |
[a] SNP heterozygosity was detected by automatic analysis of images generated by Ecotilling of unpooled DNA samples. Heterozygosity was determined by sequencing using NIEHS SNPs and/or Seattle Tilling Project sequence data.
[b] Automated analysis did not distinguish between individuals heterozygous for SNPs 054792 and 054796 (spacing 4 bp).
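To make the per-target comparison easier to read, the totals rows can be turned into detection fractions. The sketch below is illustrative only: it copies the per-target totals from the table, and the names `totals` and `detection_fraction` are hypothetical. Treating the fraction of sequencing-confirmed heterozygotes found by GelBrain as a sensitivity measure is an assumption of this example, not a metric defined in the source.

```python
# Per-target totals copied from the table:
# (SNPs by sequencing, SNPs by GelBrain, GelBrain false positives)
totals = {
    "DCLRE1A": (42, 42, 0),
    "GAD1": (45, 35, 6),
    "HK2": (46, 1, 5),
    "NAT1": (19, 14, 0),
    "TNFRSF5": (19, 14, 6),
}


def detection_fraction(by_sequencing, by_gelbrain):
    """Fraction of sequencing-confirmed heterozygotes found automatically."""
    return by_gelbrain / by_sequencing


for target, (seq, found, false_pos) in totals.items():
    frac = detection_fraction(seq, found)
    print(f"{target}: {frac:.0%} detected, {false_pos} false positives")

overall = detection_fraction(
    sum(t[0] for t in totals.values()),
    sum(t[1] for t in totals.values()),
)
print(f"overall: {overall:.0%}")
```

The HK2 fraction stands out as the lowest, which matches the interfering band described for that target in Figure 5.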
Figure 5. Partial image of Ecotilling data for target HK2 (IRDye 700 shown). The arrow marks a strong band present in all lanes. Coincident with this band is a 49 bp region of low nucleotide complexity containing only guanine and adenine residues. The band interferes with signal detection in this region of the gel and with sequencing.