Qiang Gong, Yong Tao, Jian-Rong Yang, Jun Cai, Yunfei Yuan, Jue Ruan, Jin Yang, Hailiang Liu, Wanghua Li, Xuemei Lu, Shi-Mei Zhuang, San Ming Wang, Chung-I Wu.
Abstract
BACKGROUND: Genomic deletions are known to be widespread in many species. Various sequencing-based approaches for identifying deletions have been developed, but their power to detect deletions that affect medium-sized regions is limited when the sequencing coverage is low.
Year: 2013 PMID: 23347462 PMCID: PMC3608957 DOI: 10.1186/1471-2164-14-51
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schema of ditag library construction and data analysis. A) The blue spots represent restriction sites/ends, and the blue arrows represent the SOLiD mate-paired reads, which can be translated into experimental ditags. B) Ditags were identified and sorted as reference-type or variant-type based on the alignments. Then, an in silico PCR program was used to check the alignment’s uniqueness and the accuracy of the ditags for inferring possible deletions.
Figure 2. Analysis of detected deletions by restriction ditags. A) Evaluation of the detection resolution of deleted restriction sites with a wide range of cutting frequencies. The bar plot represents the fraction of detectable deletions associated with restriction sites in the YH genome (blue) and the DGV database (red). The dark line represents the estimated sequence input for various restriction sites, assuming that the restriction ends are covered by an average of 12× paired 50-bp reads. B) The fraction of deletions that are targeted by TaqI ditags (cutting at T^CGA) across various size ranges in the YH genome.
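The idea of restriction sites defining fragment ends can be illustrated with a minimal in silico digest. This is only a sketch, not the authors' pipeline: the function names `taqi_cut_positions` and `digest` are hypothetical, the input is a toy sequence, and only the top strand is considered. The one fact taken as given is that TaqI recognizes TCGA and cuts after the T.

```python
import re

TAQI_SITE = "TCGA"  # TaqI recognition sequence
CUT_OFFSET = 1      # the cut falls after the T (T^CGA)


def taqi_cut_positions(seq):
    """Return 0-based top-strand cut positions for TaqI in seq (hypothetical helper)."""
    seq = seq.upper()
    return [m.start() + CUT_OFFSET for m in re.finditer(TAQI_SITE, seq)]


def digest(seq):
    """Split seq into the fragments a complete TaqI digest would produce."""
    seq = seq.upper()
    bounds = [0] + taqi_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AATCGAGGTCGATT"  # toy sequence, not real genomic data
    print(taqi_cut_positions(toy))
    print(digest(toy))
```

The two ends of each internal fragment are what a reference-type ditag would pair; a deletion that removes a site merges neighbouring fragments, so the observed pair of ends no longer matches adjacent sites in the reference.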
Statistics on read coverage and the identified deletions
| Raw reads | 45.9 M¹ / 1.51 Gb² | 148 M / 7.11 Gb | 194 M / 8.63 Gb |
| Mapped reads | 24.3 M / 0.80 Gb | 68.4 M / 3.28 Gb | 92.7 M / 4.08 Gb |
| Read pairs | 9.41 M / 0.62 Gb | 24.7 M / 2.37 Gb | 34.1 M / 2.99 Gb |
| Ditags | 7.66 M / 0.51 Gb | 21.3 M / 2.05 Gb | 29.0 M / 2.55 Gb |
| Ditags mapped to Ref-Ditags | 6.73 M / 0.44 Gb | 11.4 M / 1.09 Gb | 18.1 M / 1.53 Gb |
| Ref-Ditags identified (1,509,487 in all) | 794,515 (53%) | 981,480 (65%) | 1,024,072 (68%) |
| Average ditag depth on each Ref-Ditag | 8.47 | 11.57 | 17.66 |
| Genomic regions covered (Mb)³ | 834 (28%) | 1,274 (42%) | 1,336 (45%) |
| Median fragment length (bp) | 880 | 1,056 | 1,056 |
| Deletions identified | 51 (29%) | 150 (86%) | 175 |
| Average ditag depth on each deletion | 5.37 | 4.53 | 5.70 |
¹ M: counts of the reads, pairs or ditags in millions.
² Gb: gigabases.
³ Genomic regions identified in the insert regions of the experimental ditags that correspond to the reference genome.
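The percentages in the table can be re-derived from the counts it reports. The short check below is a sketch under two assumptions: the Ref-Ditag percentages are taken against the stated total of 1,509,487, and the deletion percentages are taken against the 175 deletions in the last column. The column labels are placeholders, since the source table carries no column headers.

```python
TOTAL_REF_DITAGS = 1_509_487  # "1,509,487 in all" from the table
TOTAL_DELETIONS = 175         # last column of "Deletions identified"

# Placeholder labels: the source table has no column headers.
ref_ditags_identified = {"col1": 794_515, "col2": 981_480, "col3": 1_024_072}
deletions_identified = {"col1": 51, "col2": 150}


def percent(part, whole):
    """Percentage of whole represented by part, rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    for label, count in ref_ditags_identified.items():
        print(label, percent(count, TOTAL_REF_DITAGS))
    for label, count in deletions_identified.items():
        print(label, percent(count, TOTAL_DELETIONS))
```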
Figure 3. Ditag insert size analysis. A) The size distribution of restriction fragments was inferred from the ditags based on the reference sequence. The asterisks indicate the significance (p<0.001) according to a t-test. B) Distribution of ditag coverage as a function of restriction fragment size. The size distribution of restriction fragments that were generated from the hg18 reference sequence is indicated with a green line, with the vertical dashed lines indicating the average and median sizes. The observed ditag frequencies as a function of fragment size are indicated by the blue and cyan lines (one for each library). The grey line shows the average ditag coverage distribution over various fragment sizes. The horizontal dashed lines show the average and median frequency values.
Figure 4. A 3290-bp deletion that skips three consecutive restriction sites on chromosome 17. Ditags were used to design a pair of primers to amplify the breakpoint-containing sequences. The results showed bands of different sizes for the control and the test DNA. The breakpoint sequence was identified by direct Sanger sequencing with a micro-insertion observed at the break sites.
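The number of skipped restriction sites can be read off a variant-type ditag ID, under an assumption about the ID format. The sketch below assumes that IDs such as `17_141_2/17_145_1` in the validation table encode chromosome, restriction-site index, and fragment end for each half of the ditag. The source does not spell this format out, so the parsing is an inference and the function names are hypothetical.

```python
def parse_end(tag):
    """Split an assumed 'chrom_site_end' tag into (chrom, site index, end)."""
    chrom, site, end = tag.split("_")
    return chrom, int(site), int(end)


def skipped_sites(ditag_id):
    """Count restriction sites lying strictly between the two ends of a ditag."""
    left, right = ditag_id.split("/")
    chrom_l, site_l, _ = parse_end(left)
    chrom_r, site_r, _ = parse_end(right)
    if chrom_l != chrom_r:
        raise ValueError("ditag ends lie on different chromosomes")
    return max(abs(site_r - site_l) - 1, 0)


if __name__ == "__main__":
    # ID of the chromosome 17 deletion listed in the validation table.
    print(skipped_sites("17_141_2/17_145_1"))
```

Under this reading, the chromosome 17 ID yields three skipped sites, which agrees with the Figure 4 caption; a ditag joining adjacent sites would yield zero.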
Validation of the candidate deletions
| 1_67822_2/1_67825_1 | chr1:147489387-147489688 | 302 | Heterozygote | Intergenic | Novel |
| 1_110491_2/1_110494_1 | chr1:232385269-232386371 | 1103 | Homozygote | | Variation_109897 |
| 1_110644_2/1_110647_1 | chr1:232653827-232654136 | 310 | Heterozygote | | Novel |
| 5_79470_2/5_79473_1 | chr5:172632846-172633171 | 326 | Homozygote | Intergenic | Variation_46815 |
| 7_50712_2/7_50719_1 | chr7:96313845-96319938 | 6094 | Homozygote | Intergenic | Variation_23855 |
| 7_83664_2/7_83673_1 | chr7:158193509-158197522 | 4014 | Heterozygote | Intergenic | Variation_43560 |
| 8_33206_2/8_33210_1 | chr8:64316994-64318575 | 1582 | Homozygote | Intergenic | Novel |
| 9_54312_2/9_54319_1 | chr9:129221441-129225898 | 4458 | Homozygote | | Variation_106098 |
| 11_33186_2/11_33189_1 | chr11:67939676-67939985 | 310 | Homozygote | | Novel |
| 12_1219_2/12_1222_1 | chr12:1734274-1734588 | 315 | Heterozygote | | Variation_11592 |
| 12_63591_2/12_63594_1 | chr12:123274954-123275072 | 118 | Heterozygote | Intergenic | Variation_11646 |
| 14_29251_2/14_29254_1 | Not Validated | 0 | DPV³ | - | - |
| 17_141_2/17_145_1 | chr17:193768-197057 | 3290 | Homozygote | | Variation_25792 |
| 17_6977_2/17_6981_1 | chr17:8187338-8188694 | 1357 | Homozygote | | Variation_43957 |
| 17_47495_2/17_47501_1 | chr17:71873831-71876905 | 3075 | Heterozygote | Intergenic | Variation_77728 |
| 19_22244_2/19_22248_1 | chr19:34642031-34648013 | 5983 | Homozygote | Intergenic | Variation_43984 |
| 22_10635_2/22_10639_1 | chr22:30106417-30110625 | 4209 | Homozygote | Intergenic | Variation_43568 |
| X_7293_2/X_7300_1 | chrX:11635278-11641324 | 6047 | Homozygote | Intergenic | Variation_22612 |
| X_21618_2/X_21622_1 | chrX:39977151-39979777 | 2627 | Homozygote | Intergenic | Novel |
¹ An ID of a ditag from which the local structural information is inferred (see Methods).
² Deletions that overlap with structural changes in the Database of Genomic Variants [5].
³ Double point variants inactivating both restriction sites.
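The size column can be related to the coordinate column with a small helper. This sketch assumes the coordinates are 1-based and inclusive, so the size is end minus start plus one. That convention is inferred from the table, not stated in the source: it reproduces the listed size for nearly every validated row, but the chr12:123274954-123275072 row lists 118 where this convention gives 119. The function name `region_length` is hypothetical.

```python
def region_length(region):
    """Inclusive length of a 'chrN:start-end' region (assumed 1-based, inclusive)."""
    _chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


if __name__ == "__main__":
    # Coordinates copied from the validation table.
    for region in ("chr17:193768-197057", "chr1:147489387-147489688"):
        print(region, region_length(region))
```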