Martin Krzywinski, Ian Bosdet, Carrie Mathewson, Natasja Wye, Jay Brebner, Readman Chiu, Richard Corbett, Matthew Field, Darlene Lee, Trevor Pugh, Stas Volik, Asim Siddiqui, Steven Jones, Jacquie Schein, Collin Collins, Marco Marra.
Abstract
We present a method, called fingerprint profiling (FPP), that uses restriction digest fingerprints of bacterial artificial chromosome clones to detect and classify rearrangements in the human genome. The approach uses alignment of experimental fingerprint patterns to in silico digests of the sequence assembly and is capable of detecting micro-deletions (1-5 kb) and balanced rearrangements. Our method has compelling potential for use as a whole-genome method for the identification and characterization of human genome rearrangements.
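The in silico digest step mentioned above can be illustrated with a minimal sketch. It assumes a linear sequence, HindIII as the enzyme (recognition site AAGCTT, cut after the first A), and the 0.5-30 kb sizing window quoted in the Figure 3 caption. The function names `digest` and `sizeable` are hypothetical and are not taken from the authors' software.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Return fragment sizes (bp) from cutting a linear sequence at each site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for HindIII, which cuts A^AGCTT).
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    # Sequence ends act as fragment boundaries for a linear molecule.
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def sizeable(fragments, lo=500, hi=30_000):
    """Keep only fragments inside the assumed experimental sizing window."""
    return sorted(f for f in fragments if lo <= f <= hi)
```

An experimental fingerprint would then be compared against the sorted list returned by `sizeable(digest(region))` for candidate regions of the assembly; the comparison itself is not shown here.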
Year: 2007 PMID: 17953769 PMCID: PMC2246298 DOI: 10.1186/gb-2007-8-10-r224
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Desirability ranking of 4,060 five-enzyme combinations. We determined desirability of enzyme combinations based on S(n), defined as the fraction of chromosome 7 that is represented by restriction fragments in the range 1-20 kb (a subset of our sizing range within which sizing accuracy is increased) for ≥n enzymes. Enzyme combinations with high values of S(n) are desirable because a large fraction of fragments in their fingerprint patterns can be accurately sized and because the number of large fragment covers found in regions represented exclusively by large fragments in all digests is minimized. Points represented by hollow glyphs correspond to enzyme combinations that ranked in the top 10% for each of S(n), n = 1..5.
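The definition of S(n) can be made concrete with a short sketch. It assumes that each enzyme is described by a list of cut positions on a chromosome of known length, that the chromosome ends act as fragment boundaries, and that a base counts as represented by an enzyme when the fragment containing it is 1-20 kb long. The names `in_range_intervals` and `s_of_n` are hypothetical.

```python
def in_range_intervals(cuts, length, lo=1_000, hi=20_000):
    """Fragments (start, end) of one digest whose size lies in [lo, hi] bp."""
    bounds = [0] + sorted(cuts) + [length]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if lo <= b - a <= hi]


def s_of_n(cuts_by_enzyme, length, n, lo=1_000, hi=20_000):
    """Fraction of the chromosome covered by in-range fragments of >= n enzymes."""
    events = []
    for cuts in cuts_by_enzyme.values():
        for a, b in in_range_intervals(cuts, length, lo, hi):
            events.append((a, 1))    # an in-range fragment starts
            events.append((b, -1))   # an in-range fragment ends
    events.sort()
    covered, depth, prev = 0, 0, 0
    for pos, delta in events:
        if depth >= n:               # depth holds for the stretch prev..pos
            covered += pos - prev
        depth += delta
        prev = pos
    return covered / length
```

Ranking enzyme combinations would then amount to evaluating `s_of_n` for n = 1..5 on each combination's cut positions and sorting by the resulting values.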
Figure 2. Specificity of individual restriction fragments and patterns based on exact and experimental sizing tolerance. (a) HindIII restriction fragment specificity for the human genome for fragments within the experimental size range of 500 bp to 30 kb. For a given fragment size, the vertical scale represents the fraction of fragments in the genome that are indistinguishable by size in the case of either exact sizing (fragments in common between two fingerprints must be of identical size) or within experimental tolerance (fragments in common between two fingerprints must be within experimental sizing error; Figure 3) on a fingerprinting gel. When sizing is exact, fragment specificity follows approximately the exponential distribution of fragment sizes and spans a range of 3.5 orders of magnitude. When experimental tolerance is included, the number of distinguishable fragment size bins is reduced and the range of fragment specificity drops to two orders of magnitude. (b) The specificity of a fingerprint pattern of a given size in the human genome. Fingerprint pattern size is measured in terms of number of fragments. Regions with identical patterns are those in which there is a 1:1 mapping within tolerance between all sizeable fragments. The specificity of experimental fingerprint patterns is cumulatively affected by the specificity of individual fragments. Because experimental precision is high, the fraction of indistinguishable fragments is low enough that 96.5% of the genome is uniquely represented by fragment patterns of 8 fragments or more.
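The per-fragment quantity plotted in panel (a) can be sketched as follows. The example assumes a sorted list of all genomic fragment sizes and a single relative tolerance `rel_tol`; the real experimental tolerance varies with fragment size (Figure 3), so the constant tolerance and the function name `specificity` are simplifying assumptions.

```python
from bisect import bisect_left, bisect_right


def specificity(sorted_sizes, size, rel_tol=0.0):
    """Fraction of fragments indistinguishable from `size`.

    rel_tol = 0 corresponds to exact sizing; rel_tol > 0 treats any
    fragment within size * (1 +/- rel_tol) as indistinguishable.
    """
    lo = size * (1 - rel_tol)
    hi = size * (1 + rel_tol)
    matches = bisect_right(sorted_sizes, hi) - bisect_left(sorted_sizes, lo)
    return matches / len(sorted_sizes)
```

A lower value means the fragment is more informative, because fewer fragments elsewhere in the genome could be confused with it.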
Figure 3. Experimental error of fragment sizing within the 0.5-30 kb sizing range of our single digest protocol. The error is expressed in relative size (left axis) and standard mobility (right axis). Standard mobility is a distance unit that takes into account inter-gel variation and is approximately linear with the distance traveled by the fragment on the gel.
Figure 4. Simulation results of sensitivity and spatial error of rearrangement detection by FPP using experimental sizing tolerance. (a) Sensitivity is measured as the fraction of clone regions of a given size with successful FPP alignments and is plotted for five digests (labeled 1-5). (b) Spatial error is measured by the median distance between FPP and theoretical alignment positions. The largest improvement in both sensitivity and spatial error is realized by moving from one digest to two. With two fingerprint patterns used to align the clone, 50% of >25 kb clone regions are aligned (90% of >45 kb regions) with a spatial error of 1.7 kb.
Table 1. Comparison of the number of rearrangements detected by ESP and FPP in 487 MCF7 BACs
| FPP | ESP = N: No. of clones | ESP = N: No. agree | ESP = N: No. disagree | ESP = Y: No. of clones | ESP = Y: No. agree | ESP = Y: No. disagree |
| N | 250 | 243 | 2(b)/5(c) | 72 | 3 | 63(d)/6(e) |
| Y(a) | 11 | 8 | 2(f)/1(g) | 154 | 126 | 26(h)/2(i) |
The clones are partitioned based on whether a rearrangement was detected by ESP and/or FPP. For each combination of detection (for example, FPP = Y, ESP = N, where Y/N indicates the presence/absence of a rearrangement, respectively, as measured by the corresponding method), the table shows the number of clones in this category, which is further broken down into the number of clones for which ESP and FPP mappings agreed and the number for which they did not agree (for example, both can show no rearrangement but disagree about clone position). Clones in the 'Agree' column have an FPP alignment within 50 kb of both end sequence alignments. Clones in the 'Disagree' column are reported as two groups: clones with an FPP alignment agreeing with one end sequence alignment and clones for which no agreement with either end sequence alignment was detected. Both groups within the 'Disagree' category are annotated with a reason for the disagreement. (a) Clones in this row are further classified based on the number of FPP alignments in Table 2. (b) del (2); (c) mispick (5); (d) bne (33), hr (14), lowcomplex (1), nip (10), rep (5); (e) lowcomplex (1), mispick (3), rep (2); (f) rep (2); (g) mispick (1); (h) bne (14), hr (8), nip (3), rep (1); (i) bne (1), mispick (1). Abbreviations: bne, breakpoint near end of clone; del, clone appears deleted; hr, highly rearranged; lowcomplex, fingerprint has very few fragments; mispick, FPP/ESP data mismatch; nip, FPP alignment detected but not added to partition; rep, alignments in repeat regions.
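The agreement rule described in this legend can be sketched as a small classifier. The sketch assumes that each end sequence alignment is a (chromosome, position) pair, that each FPP alignment is a (chromosome, start, end) triple, that distance is measured to the nearest edge of an FPP alignment, and that the two ends of a clone may be matched by different FPP alignments. The strict `<` comparison, the category labels and the function names are illustrative choices, not taken from the paper.

```python
CUTOFF_BP = 50_000  # "within 50 kb" from the table legend


def end_distance(bes, fpp_alignments):
    """Distance (bp) from one end sequence to the nearest FPP alignment edge.

    Returns None when no FPP alignment lies on the same chromosome.
    """
    chrom, pos = bes
    dists = [min(abs(pos - start), abs(pos - end))
             for c, start, end in fpp_alignments if c == chrom]
    return min(dists) if dists else None


def classify(bes_pair, fpp_alignments, cutoff=CUTOFF_BP):
    """Label a clone by how many of its two ends are matched by FPP."""
    matched = 0
    for bes in bes_pair:
        d = end_distance(bes, fpp_alignments)
        if d is not None and d < cutoff:
            matched += 1
    return {2: "agree", 1: "disagree_one_end", 0: "disagree_no_end"}[matched]
```

The two 'Disagree' groups in the table correspond to the `disagree_one_end` and `disagree_no_end` labels; assigning a reason such as bne or rep would require additional information that is not modeled here.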
Table 2. Profile of candidate rearrangements detected by FPP
| No. of FPP alignments | ESP = N: No. of clones | ESP = N: No. agree | ESP = N: No. disagree | ESP = Y: No. of clones | ESP = Y: No. agree | ESP = Y: No. disagree |
| 2 | 11 | 8 | 1/2 | 123 | 101 | 22/0 |
| 3 | 0 | - | - | 29 | 22 | 5/2 |
| 4 | 0 | - | - | 2 | 2 | 0/0 |
Clones are grouped in rows by the number of distinct FPP alignments. For each group, the clones are partitioned based on whether ESP detected a rearrangement. Clones in the 'Agree' column have an FPP alignment within 50 kb of both end sequence alignments. Clones in the 'Disagree' column are partitioned in the same manner as in Table 1.
Table 3. Positional accuracy of FPP alignments
| |FPP-BES| | Clone ends* | Clones† |
| <1 kb | 50% | 28% |
| <2 kb | 70% | 50% |
| <5 kb | 88% | 79% |
| <10 kb | 96% | 93% |
| <25 kb | 99% | 98% |
| <50 kb | 100% | 100% |
Accuracy was measured by comparing the distance between the positions of end sequence alignments and the nearest edge of an FPP alignment. For this comparison the subset of clones for which ESP and FPP agreed in both rearrangement detection and mapping position (243 + 126 = 369 clones; Table 1) was used. *Cumulative distribution of nearest distances between FPP and individual BES alignments, $\min_i |\mathrm{FPP}_i - \mathrm{BES}|$. †Cumulative distribution of $\max_j \left( \min_i |\mathrm{FPP}_i - \mathrm{BES}_j| \right)$, the larger of the two distances between a clone's FPP and BES alignments.
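The two distance measures in this legend can be sketched in a few lines. The example assumes that all positions lie on one chromosome and are given in bp, that `fpp_edges` lists the edge positions of a clone's FPP alignments, and that the thresholds are strict (`<`), matching the `<1 kb` style of the table. Function names are hypothetical.

```python
def nearest_edge_distance(bes_pos, fpp_edges):
    """min over i of |FPP_i - BES| for one end sequence alignment."""
    return min(abs(edge - bes_pos) for edge in fpp_edges)


def clone_distances(bes_positions, fpp_edges):
    """Per-end distances and the per-clone value max_j(min_i |FPP_i - BES_j|)."""
    per_end = [nearest_edge_distance(b, fpp_edges) for b in bes_positions]
    return per_end, max(per_end)


def cumulative_fraction(distances, thresholds):
    """Fraction of distances strictly below each threshold (bp)."""
    return {t: sum(d < t for d in distances) / len(distances)
            for t in thresholds}
```

Pooling the per-end values over all clones gives the 'Clone ends' column, and pooling the per-clone maxima gives the 'Clones' column; the second is never larger than the first at a given threshold, which is consistent with the table.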
Table 4. Location of breakpoints in the MCF7 genome in regions sampled by clones on chromosomes 1, 3, 17 and 20
| ID | Chromosome | Position (bp) | Uncertainty (bp) | Clones |
| 1L | 1 | 106446622 | | M0035E03 |
| 2L | 1 | 107325668 | 0 | M0090F09 M0095D18 |
| 3R | 1 | 107642673 | 1,640 | M0012O05 M0064A13 M0089C03 M0090K07 M0126M04 M0152M23 |
| 4L | 1 | 112083301 | 957 | M0035A16 M0039B19 M0041G20 M0043K05 M0062P11 M0078P07 |
| 5R | 1 | 112119925 | 0 | M0090F09 M0095D18 |
| 6R | 3 | 62612471 | 856 | M0012A19 M0041A24 |
| 7L | 3 | 63679826 | 757 | M0005P04 M0007J14 M0030P20 M0043O24 M0093C20 M0134N23 |
| 8R | 3 | 63716623 | 1,755 | M0005P04 M0007J14 M0030P20 M0043O24 M0093C20 M0107G11 |
| 9R | 3 | 63908884 | | M0035E03 |
| 10L | 3 | 63954937 | 8,740 | M0007J14 M0030P20 M0037J18 M0043O24 M0066M03 M0067H12 |
| 11R | 3 | 63995878 | 0 | M0066M03 M0067H12 M0124I19 M0137G17 |
| 12L | 3 | 63997257 | 1,178 | M0003F05 M0031O08 M0039A05 M0088O13 M0145B06 |
| 13R | 3 | 64074753 | 3,228 | M0014E11 M0031O08 M0088O13 M0144L06 M0145B06 |
| 14L | 3 | 64660949 | 0 | M0012A19 M0041A24 |
| 15R | 3 | 64927120 | 304 | M0006B19 M0014P03 |
| 16L | 17 | 54050256 | 11,312 | M0037J18 M0066C22 |
| 17R | 17 | 54158022 | 0 | M0037J18 M0073I23 |
| 18L | 17 | 54397666 | 9,801 | M0035A16 M0039B19 M0041G20 M0043K05 M0062P11 M0078P07 |
| 19R | 17 | 54549098 | 6,065 | M0009I10 M0013G05 M0105A20 M0107H09 |
| 20L | 17 | 55260098 | 5,548 | M0001M18 M0009I10 M0013G05 M0107H09 |
| 21R | 17 | 55468383 | 15,761 | M0001M18 M0090P15 M0092G06 |
| 22L | 17 | 56176919 | 163 | M0089C03 M0090K07 M0126M04 M0152M23 |
| 23R | 17 | 56206584 | 1,204 | M0064A13 M0089C03 M0090K07 M0126M04 M0152M23 |
| 24R | 17 | 56233933 | 3,684 | M0005P04 M0007J14 M0030P20 M0043O24 M0093C20 M0134N23 |
| 25L | 17 | 56644007 | 1,148 | M0005I19 M0045E13 M0054A01 M0054C03 M0058D14 M0058K11 |
| 26L | 17 | 56961440 | | M0021C24 |
| 27R | 17 | 57339860 | 1,364 | M0024G06 M0123G10 M0155O05 M0156I16 |
| 28L | 17 | 59745950 | 6,571 | M0006B19 M0014P03 |
| 29R | 17 | 59781552 | 688 | M0006B19 M0014P03 |
| 30L | 20 | 38948829 | | M0011K13 |
| 31L | 20 | 40249289 | 2,622 | M0003F05 M0031O08 M0039A05 M0043G01 M0145B06 |
| 32R | 20 | 40271873 | 1,207 | M0003F05 M0031O08 M0039A05 M0043G01 M0088O13 M0145B06 |
| 33R | 20 | 40664609 | | M0011K13 |
| 34L | 20 | 45230184 | 278 | M0001A11 M0010D13 M0026L11 M0028H13 M0031E14 M0038G05 |
| 35L | 20 | 45736731 | | M0021C24 |
| 36L | 20 | 45847023 | 1,846 | M0014E11 M0088O13 M0144L06 |
| 37L | 20 | 46174956 | | M0159C23 |
| 38L | 20 | 48694494 | 933 | M0001A11 M0055I11 M0151F12 |
| 39L | 20 | 48729868 | 6,077 | M0010D13 M0026L11 M0028H13 M0031E14 M0038G05 M0038P15 |
| 40R | 20 | 48863824 | 720 | M0001A11 M0005I19 M0045E13 M0054A01 M0054C03 M0058D14 |
| 41L | 20 | 51618225 | 4,895 | M0003F05 M0005H09 M0008J22 M0029C09 M0031O08 M0036L24 |
| 42R | 20 | 52046458 | 2,367 | M0066M03 M0067H12 M0124I19 M0137G17 |
| 43R | 20 | 52066649 | 126 | M0012O05 M0089C03 M0152M23 |
| 44R | 20 | 52248474 | | M0066C22 |
| 45R | 20 | 52985221 | | M0014P03 |
| 46R | 20 | 53545530 | 0 | M0036B13 M0141F19 |
| 47L | 20 | 55122587 | 853 | M0024G06 M0123G10 M0155O05 M0156I16 |
| 48L | 20 | 55254895 | 3,310 | M0003F05 M0031O08 M0036L24 M0039A05 M0043G01 M0071O17 |
| 49R | 20 | 55287488 | 1,269 | M0003F05 M0005H09 M0008J22 M0029C09 M0031O08 M0036L24 |
| 50L | 20 | 59150999 | 936 | M0036B13 M0141F19 |
| 51R | 20 | 59176749 | 0 | M0036B13 M0141F19 |
Breakpoint position is the average position of blunt alignment ends with the standard deviation of these quantities taken as the uncertainty. Breakpoint ID is composed of a unique numerical index and L/R suffix that indicates which edge of the FPP alignment (left/right) is considered to be the breakpoint.
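The breakpoint estimate described here reduces to a mean and a standard deviation over the blunt alignment edges contributed by the supporting clones. The sketch below assumes the population standard deviation (`statistics.pstdev`); the legend does not say whether the population or sample form was used. It also reports no uncertainty for a single clone, which matches the observation that every row with a blank uncertainty lists exactly one clone. The function name is hypothetical.

```python
from statistics import mean, pstdev


def breakpoint_estimate(edge_positions):
    """Return (position, uncertainty) in bp from blunt alignment edges.

    Uncertainty is None when only one clone supports the breakpoint.
    """
    position = round(mean(edge_positions))
    if len(edge_positions) > 1:
        uncertainty = round(pstdev(edge_positions))
    else:
        uncertainty = None
    return position, uncertainty
```

An uncertainty of 0 then means that all supporting clones place the edge at the same coordinate, as in the two-clone rows with uncertainty 0.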
Table 5. PCR primers used to validate the presence of breakpoints detected by fingerprints
| Primer transform | Left primer sequence | Left chr | Left start (bp) | Left end (bp) | Right primer sequence | Right chr | Right start (bp) | Right end (bp) |
| ar+ br+ | TGCTAAATTTCCCAAGTGCC | 20 | 45,794,352 | 45,794,371 | CCGTCCTCTTAGCGAACTTG | 20 | 46,968,304 | 46,968,323 |
| ar+ br- | TGCTAAATTTCCCAAGTGCC | 20 | 45,794,352 | 45,794,371 | AATTTCAAAATGCGTCTGGG | 20 | 46,968,631 | 46,968,650 |
| ar+ bl+ | TGCTAAATTTCCCAAGTGCC | 20 | 45,794,352 | 45,794,371 | TGACACGCAGGGTAGATCAG | 20 | 46,923,060 | 46,923,079 |
| ar+ bl- | TGCTAAATTTCCCAAGTGCC | 20 | 45,794,352 | 45,794,371 | TCCAACAGGAAGGAGTACCG | 20 | 46,922,743 | 46,922,762 |
| al+ br+ | CTCTCTTTTGTGGGACGAGC | 20 | 45,718,752 | 45,718,771 | CCGTCCTCTTAGCGAACTTG | 20 | 46,968,304 | 46,968,323 |
| al+ br- | CTCTCTTTTGTGGGACGAGC | 20 | 45,718,752 | 45,718,771 | AATTTCAAAATGCGTCTGGG | 20 | 46,968,631 | 46,968,650 |
| al+ bl+ | CTCTCTTTTGTGGGACGAGC | 20 | 45,718,752 | 45,718,771 | TGACACGCAGGGTAGATCAG | 20 | 46,923,060 | 46,923,079 |
| | CTCTCTTTTGTGGGACGAGC | 20 | 45,718,752 | 45,718,771 | TCCAACAGGAAGGAGTACCG | 20 | 46,922,743 | 46,922,762 |
| | AATAGAAGCCAGGCATGGTG | 20 | 48,861,156 | 48,861,175 | GTTAGGAGGAGGGTGGAACC | 17 | 56,663,181 | 56,663,200 |
| br+ ar- | AATAGAAGCCAGGCATGGTG | 20 | 48,861,156 | 48,861,175 | TAGCCGTTCTGACTGGTGTG | 17 | 56,663,261 | 56,663,280 |
| br+ al+ | AATAGAAGCCAGGCATGGTG | 20 | 48,861,156 | 48,861,175 | TAGCTGGGATTACAGGTGCC | 17 | 56,646,379 | 56,646,398 |
| br+ al- | AATAGAAGCCAGGCATGGTG | 20 | 48,861,156 | 48,861,175 | ACAACCTGTCCGACCAGAAC | 17 | 56,646,305 | 56,646,324 |
| ar+ cr+ | GGACAGAGGCTTTTGTAGCG | 17 | 56,687,628 | 56,687,647 | ACCACGTAGACAAAGACGGG | 20 | 59,173,964 | 59,173,983 |
| ar+ cr- | GGACAGAGGCTTTTGTAGCG | 17 | 56,687,628 | 56,687,647 | TTCTGGATTCTCCTTGGTGC | 20 | 59,173,950 | 59,173,969 |
| | GGACAGAGGCTTTTGTAGCG | 17 | 56,687,628 | 56,687,647 | ATTTGGTTCCTGGTGAGTGC | 20 | 59,153,746 | 59,153,765 |
| ar+ cl- | GGACAGAGGCTTTTGTAGCG | 17 | 56,687,628 | 56,687,647 | AGAAGAACCCGACGACATTG | 20 | 59,153,849 | 59,153,868 |
| br+ cr+ | TATCCTTCAGGAATCGCCAC | 20 | 53,542,992 | 53,543,011 | ACCACGTAGACAAAGACGGG | 20 | 59,173,964 | 59,173,983 |
| | TATCCTTCAGGAATCGCCAC | 20 | 53,542,992 | 53,543,011 | TTCTGGATTCTCCTTGGTGC | 20 | 59,173,950 | 59,173,969 |
| br+ cl+ | TATCCTTCAGGAATCGCCAC | 20 | 53,542,992 | 53,543,011 | ATTTGGTTCCTGGTGAGTGC | 20 | 59,153,746 | 59,153,765 |
| br+ cl- | TATCCTTCAGGAATCGCCAC | 20 | 53,542,992 | 53,543,011 | AGAAGAACCCGACGACATTG | 20 | 59,153,849 | 59,153,868 |
Primer sequence is the appropriately transformed (reversed, complemented, reverse-complemented) primer sequence to test a specific order/orientation of clone regions within the insert. Products were detected for reactions where the primer transform field is in bold. Primer combinations (e.g. ar+ br+) correspond to order and orientation of putative rearrangement and are described in detail in Additional data file 1.
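The three sequence transforms named in this legend (reversed, complemented, reverse-complemented) are standard string operations and can be sketched as follows. How each transform maps onto a primer combination label such as ar+ br- is defined in Additional data file 1 and is not reproduced here; the function names are illustrative.

```python
# Translation table for complementing DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Read the sequence backwards without changing the bases."""
    return seq[::-1]


def complement(seq):
    """Replace each base with its Watson-Crick partner, keeping the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]
```

For example, applying `reverse_complement` to the left primer TGCTAAATTTCCCAAGTGCC from the first row gives GGCACTTGGGAAATTTAGCA.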
Figure 5. PCR reactions validating the presence of breakpoints in clones listed in Table 5. Each reaction is labeled by the primer combination (e.g. AR+ CL+) used to test order and orientation of the clone's fused regions (Materials and methods; primer combination nomenclature is described in detail in Additional data file 1). The presence of a product demonstrates the adjacency of the regions within the clone's insert.
Figure 6. Detailed reconciliation of sequence and fingerprint alignments for clone 3F05, which contains at least four internal breakpoints. FPP is capable of dissecting complex rearrangements in a clone, as illustrated in this figure showing the internal structure of M0003F05. This BAC was sequenced [26] and found to be composed of content from at least five distinct regions (A-E). FPP detected 4/5 of these regions. BLAT (grey rectangles with alignment orientation arrows) and FPP (thin black lines) alignments of M0003F05 are shown; values underneath coordinate pairs are differences in edge positions between BLAT and FPP alignments.
Table 6. Location of 17 putative small-scale aberrations identified in MCF7 clones
| Chr. | Start (bp) | End (bp) | Size (bp) | Affected/all clones | Sampled clone* | PCR reaction† | PCR primers | PCR products (bp) |
| 1 | 54,737,944 | 54,742,444 | 4,500 | 1/1 | M0025G14 | | | |
| 2 | 15,468,892 | 15,471,992 | 3,100 | 1/1 | M0015O22 | D | GGGGCCCTTTAGTGCCTTAG | |
| 2 | 110,086,572 | 110,101,972 | 15,400 | 1/1 | M0006P20 | | | |
| 3 | 63,591,911 | 63,594,011 | 2,100 | 2/5 | M0118E13 | | | |
| 3 | 159,597,920 | 159,602,020 | 4,100 | 1/1 | M0012G17 | B | TACTTACGGCAGAGGTTGGG | |
| 4 | 13,455,944 | 13,464,544 | 8,600 | 1/1 | M0004J18 | | | |
| 5 | 177,652,902 | 177,661,002 | 8,100 | 1/1 | M0019C11 | | | |
| 10 | 45,658,295 | 45,662,695 | 4,400 | 1/1 | M0021J21 | | | |
| 18 | 13,660,940 | 13,670,040 | 9,100 | 1/1 | M0040N18 | | | |
| 19 | 46,075,421 | 46,081,021 | 5,600 | 1/1 | M0005H04 | | | |
| 20 | 8,877,965 | 8,903,965 | 26,000 | 1/1 | M0013M22 | A | CTTGGGTTGGGAACTGAAAG | |
| 20 | 39,042,929 | 39,047,029 | 4,100 | 1/1 | M0011K13 | | | |
| 20 | 48,823,455 | 48,827,555 | 4,100 | 3/21 | M0107O02 | | | |
| 20 | 51,886,035 | 51,891,935 | 5,900 | 1/1 | M0089C13 | | | |
| 20 | 52,157,503 | 52,161,003 | 3,500 | 1/1 | M0004L22 | | | |
| 20 | 59,158,037 | 59,163,037 | 5,000 | 1/2 | M0141F19 | | | |
| X | 97,281,472 | 97,287,472 | 6,000 | 1/1 | M0018J12 | C | CCCACCAATGGATTACAACC | |
Four aberrations were tested with PCR using the primers shown here. The expected primer products based on inter-primer distance on the reference genome are shown in bold, with the observed product sizes shown below. *See Additional data file 2. †See Figure 7.
Figure 7. PCR reactions validating small-scale aberrations listed in Table 6. Reactions are labeled A-D, corresponding to the aberrations with the same label in Table 6. In each case the observed product sizes, shown here, are different from the expected sizes based on the inter-primer distance on the reference sequence.
Figure 8. Expected fraction of breakpoints, given five-fold redundant clone coverage, captured by ≥N clones with the distance between breakpoint and clone terminus larger than the detection cutoff. The plot shows detection profiles for 150 kb and 220 kb clones. It illustrates the benefit of redundant coverage and of using clones with larger inserts: for a given detection cutoff, a breakpoint is captured by significantly more clones on average. The detection sensitivity (Figure 4) needs to be applied to the fraction of breakpoints on this plot. For example, 80% of breakpoints are found in ≥2 clones within 50 kb of the ends of the clone; assuming 2 digests, 95% of 50 kb regions can be aligned (Figure 4); therefore, 80% × 0.95 = 76% of breakpoints are expected to be detected under these conditions.
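The combination of capture fraction and alignment sensitivity in this legend can be sketched numerically. The capture model below is an assumption, not the authors' calculation: clones of fixed length are placed uniformly at random, so the number of clones that contain a breakpoint at least `cutoff_kb` from both termini is taken to be Poisson with mean coverage × (L − 2·cutoff)/L. It is not claimed to reproduce the curves in the figure. Only the final multiplication follows the legend directly, and the function names are hypothetical.

```python
from math import exp, factorial


def capture_fraction(clone_kb, cutoff_kb, coverage=5.0, min_clones=1):
    """P(breakpoint lies > cutoff from both termini in >= min_clones clones).

    Assumes a Poisson number of usable clones (random clone placement).
    """
    usable = max(clone_kb - 2 * cutoff_kb, 0.0)
    lam = coverage * usable / clone_kb
    p_fewer = sum(exp(-lam) * lam ** k / factorial(k)
                  for k in range(min_clones))
    return 1.0 - p_fewer


def expected_detection(captured_fraction, sensitivity):
    """Apply alignment sensitivity (Figure 4) to the captured fraction."""
    return captured_fraction * sensitivity
```

With the values quoted in the legend, `expected_detection(0.80, 0.95)` evaluates to 0.76. Under the assumed model, larger inserts leave a larger usable window for the same cutoff, which is the qualitative effect the figure describes.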