Juzoh Umemori, Akihiro Mori, Kenji Ichiyanagi, Takeaki Uno, Tsuyoshi Koide.
Abstract
BACKGROUND: Copy number variation (CNV), an important source of diversity in genomic structure, is frequently found in clusters called CNV regions (CNVRs). CNVRs are strongly associated with segmental duplications (SDs), but the composition of these complex repetitive structures remains unclear.
Year: 2013 PMID: 23834397 PMCID: PMC3722088 DOI: 10.1186/1471-2164-14-455
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Tartan-checked structure of SDs visualized using SHEAP. Diagonal lines aligned in the same column or row represent repetitive sequences. The lower left triangle of each panel shows a self-plot of the sequence after known repeat sequences have been masked using RepeatMasker. Each of the upper right triangles shows a self-plot of the intact sequence. (A) All of the SDs detected by SHEAP for all chromosomes except Chr Y. Rough estimates of the genomic positions of segmental duplications were: Chr. X-1, 33,454–43,954 (10,500); Chr. 12, 25,264–32,863 (7,599); Chr. 14–1, 8,647–14,424 (5,776); Chr. 7–1, 10,860–14,710 (3,850); Chr. 7–4, 24,250–28,000 (3,750); Chr. 14–2, 45,553–49,106 (3,553); Chr. 2 T, 177,898–181,042 (3,144); Chr. 7–5, 34,938–37,919 (2,981); Chr. X-3, 122,800–125,582 (2,782); Chr. 5–2, 96,512–99,314 (2,802); Chr. X-4, 146,542–148,959 (2,417); Chr. X-2, 53,082–54,812 (1,730); Chr. 4–2, 61,950–64,050 (2,100); Chr. 13–2, 67,076–68,893 (1,817); Chr. 4–1, 42,923–44,207 (1,284); Chr. 7–2, 11,339–12,249 (910); Chr. 13–1, 62,919–64,168 (1,249); Chr. 5–1, 11,973–13,221 (1,249); Chr. 7–3, 12,327–13,309 (982); Chr. 13–3, 68,702–69,433 (730). Numbers in parentheses indicate the sizes of SDs (kbp). (B) Higher magnification for the self-plot of SD13M, located from 65,370,000 to 67,000,000 on Chr 13. The diagonal lines from top left to bottom right, which indicate a complete match between SD13M sequences, were eliminated by the algorithm. To remove redundant and overlapping sequences from 2,638 repetitive sequences, the sequences represented by diagonal lines that were located in the same column or in the same row (enclosed by two lines in the horizontal and vertical directions, respectively) were eliminated.
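To illustrate what a self-plot shows, the following is a minimal sketch of a k-mer self dot plot. It is not SHEAP, whose algorithm is not described here; the function names, the toy sequence, and the choice of exact k-mer matching are all assumptions made for illustration. As in panel (B), the trivial main diagonal (a sequence matching itself) is left out, and repeated segments appear as runs of hits on a shared off-diagonal.

```python
from collections import defaultdict


def self_dot_plot(seq, k=5):
    """Return the set of (i, j) with i < j where the k-mers at i and j are identical.

    Hypothetical helper: exact k-mer matching is an assumption, not SHEAP's method.
    Requiring i < j drops the main diagonal and the mirrored lower triangle.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    hits = set()
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.add((starts[a], starts[b]))
    return hits


def diagonal_runs(hits):
    """Group hits by diagonal offset (j - i); each offset maps to sorted row positions."""
    runs = defaultdict(list)
    for i, j in hits:
        runs[j - i].append(i)
    return {offset: sorted(rows) for offset, rows in runs.items()}


# Toy sequence (hypothetical): one 7-bp unit repeated after a 3-bp spacer.
unit = "ACGGTCA"
toy = unit + "TTT" + unit
hits = self_dot_plot(toy, k=5)
runs = diagonal_runs(hits)
```

In this toy case all hits fall on a single diagonal whose offset equals the distance between the two copies. In a region with many copies of many units, the same grouping would produce the grid of short diagonals that the caption calls a tartan-checked structure.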
Comparison of the proportion of known repeats in SD13M with the values for the entire genome and average values for SDs
| Repeat class | | | |
|---|---|---|---|
| DNA | 0.18 | 0.36 | 0.86 |
| LINE | 23.9 | 34.4 | 20.3 |
| Low_complexity | 0.82 | 0.58 | 0.79 |
| LTR | 18.3 | 19.5 | 10.2 |
| Satellite (MMSAT4) | 1.92 | 0.32 | 0.05 |
| Simple_repeat | 1.94 | 1.66 | 2.47 |
| SINE | 5.20 | 3.80 | 7.42 |
| snRNA | 0.00 | 0.01 | 0.01 |
| tRNA | 0.01 | 0.01 | 0.01 |
| Unknown (MurSatRep) | 5.97 | 0.54 | 0.05 |
| Total | 58.2 | 61.2 | 42.2 |
Figure 2. Flowchart of core element identification for analysis of CNVs. The figure provides an overview of the steps used to identify core elements.
Figure 3. Determination of the sizes of fundamental repetitive sequences for the identification of core elements. (A) The distribution of 16,872 repetitive sequences. Four small peaks at 3, 3.6, 4.1, and 6–8.1 kbp are indicated by arrows. (B) The minimal lengths of the extracted repetitive sequences are plotted against the coverage ratio as a red line. The maximum lengths of the repetitive sequences are plotted against the coverage ratio as a blue line. The red line shows that SD13M is covered almost completely (>96%) by the extracted repetitive sequences, and the broken red line shows that 94% of SD13M is covered by repetitive sequences larger than 3 kbp. The broken blue line shows that most of SD13M (>94%) is covered by repetitive sequences smaller than 4.5 kbp. (C) The minimum lengths of the regions shared when two repetitive sequences were aligned are plotted against the number of sequences in which such regions are shared. The broken red line shows that 92% of pairs share 2.7 kbp of consensus sequences, whereas the broken blue line shows that 14% of pairs share 4.5 kbp of consensus sequences.
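The coverage ratio in panel (B) can be sketched as the fraction of a region covered by repetitive sequences that pass a length filter. The function name, the half-open `(start, end)` interval convention, and the toy coordinates below are assumptions for illustration; the source does not state how overlapping sequences were handled, so this sketch simply merges them before summing.

```python
def covered_fraction(intervals, region_length, min_len=0, max_len=None):
    """Fraction of [0, region_length) covered by intervals within the length bounds.

    intervals: iterable of half-open (start, end) pairs (assumed convention).
    min_len / max_len: keep only intervals whose length lies in [min_len, max_len].
    Overlapping intervals are merged so that shared bases are counted once.
    """
    kept = sorted(
        (start, end)
        for start, end in intervals
        if end - start >= min_len and (max_len is None or end - start <= max_len)
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in kept:
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / region_length


# Toy data (hypothetical coordinates in bp on a 10,000-bp region).
toy_repeats = [(0, 3000), (2000, 6000), (7000, 7500)]
all_cov = covered_fraction(toy_repeats, 10000)
long_cov = covered_fraction(toy_repeats, 10000, min_len=3000)
short_cov = covered_fraction(toy_repeats, 10000, max_len=3000)
```

Sweeping `min_len` upward would trace a curve analogous to the red line, and sweeping `max_len` downward one analogous to the blue line.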
Figure 4. Identification and characterization of core elements for SD13M. (A) MUSCLE analysis of identified core elements. The radial pattern with branches of similar length indicates that the levels of difference between the 60 types of core elements are similar. (B) Map of sequences homologous to each core element on SD13M. The locations and directions of homologous sequences for each core element are mapped on the horizontal line. Red and blue diamonds indicate positive and negative orientations, respectively.
Figure 5. Copy number analyses of core elements. (A) Results of aCGH for the entire SD13M region. Mapped aCGH log2 values in SD13M that compare MSM (musculus subspecies group) or BLG2 (musculus) with B6 (domesticus) are shown by green and magenta diamonds, respectively. Relative copy numbers estimated by qPCR with unique probes are shown by X in the same colors. Average aCGH values for MSM compared with B6, and for BLG2 compared with B6 in the relevant regions are shown by red and green lines, respectively. (B) Boxplots of aCGH log2 values mapped on each core element. Boxplots of aCGH values for BLG2 compared with B6, and for MSM compared with B6, are shown in magenta and green, respectively. Dots indicate outliers. The boxplots are ordered by the average aCGH values from the comparison of BLG2 and B6.
Figure 6. Copy number analyses in representative core elements. Mapped aCGH log2 values in the core elements that compare MSM (musculus subspecies group) or BLG2 (musculus) with B6 (domesticus) are shown by green and magenta diamonds, respectively. Relative copy numbers estimated by qPCR with unique probes using genomic DNA from BLG2, MSM, and B6 are shown by X in the same colors. Average aCGH values for MSM compared with B6, and for BLG2 compared with B6 in the relevant regions are shown by red and green lines, respectively. (A) Mapping of aCGH log2 values to a representative CNV-type core element (core element 541). The three relative values obtained by qPCR were plotted around 1.0 (indicated by X), showing that the copy number of core element 541 is higher in BLG2 and MSM than in B6. Primers for qPCR were designed for three regions of core element 541 (320–456, 1599–1693, and 2050–2162) (Additional file 5: Table S5). (B) Mapped aCGH values on a representative constant-type core element (core element 454) were distributed around zero. Furthermore, all the qPCR values were plotted around zero (indicated by X). These results showed no CNV between B6 and MSM within core element 454.
Annotation of known repeats in core elements with large CNV and without CNV
| Core element | Type | Length (bp) | Known repeats | Proportion of known repeats | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Core element 042 | Constant | 3000 | LINE, | 0.610 | 9 | 10 | 9 | 252 | 27000 |
| Core element 108 | Constant | 3000 | | 0.250 | 3 | 3 | 3 | 154 | 9000 |
| Core element 127 | Constant | 2700 | | 0.158 | 12 | 13 | 13 | 196 | 32400 |
| Core element 244 | Constant | 3247 | LINE, | 0.563 | 8 | 8 | 8 | 68 | 25976 |
| Core element 352 | Constant | 3900 | LINE,LTR, Simple, SINE | 0.911 | 8 | 8 | 8 | 264 | 31200 |
| Core element 454 | Constant | 3372 | | 0.123 | 11 | 10 | 11 | 348 | 37092 |
| Core element 462 | Constant | 2700 | LINE, SINE, | 0.910 | 6 | 7 | 6 | 40 | 16200 |
| Core element 484 | Constant | 3000 | LINE, Simple, SINE | 0.839 | 4 | 4 | 4 | 54 | 12000 |
| AVERAGE (Constant) | | 3115 | | 0.545 | 8 | 8 | 8 | 172 | 23859 |
| Core element 103.1 | CNV | 1802 | | 0.959 | 14 | 20 | 19 | 127 | 25228 |
| Core element 103.2 | CNV | 1686 | | 0.225 | 8 | 15 | 12 | 129 | 13488 |
| Core element 146 | CNV | 2877 | LINE, SINE | 0.979 | 19 | 33 | 26 | 103 | 54663 |
| Core element 154 | CNV | 2691 | | 0.749 | 24 | 30 | 28 | 236 | 64584 |
| Core element 177.1 | CNV | 3900 | Simple, | 0.110 | 23 | 28 | 27 | 112 | 89700 |
| Core element 182 | CNV | 2629 | LINE, | 0.622 | 20 | 27 | 25 | 305 | 52580 |
| Core element 364 | CNV | 1468 | SINE, | 0.343 | 13 | 16 | 18 | 190 | 19084 |
| Core element 447 | CNV | 2248 | | 0.768 | 24 | 35 | 30 | 108 | 53952 |
| Core element 510 | CNV | 2735 | LINE, Simple/Sat | 0.094 | 13 | 19 | 16 | 348 | 35555 |
| Core element 541 | CNV | 3195 | Simple/Sat, | 0.076 | 14 | 21 | 20 | 413 | 44730 |
| AVERAGE (CNV) | | 2523 | | 0.492 | 17 | 24 | 22 | 207 | 45356 |
a Detailed information on the known repeats and RefSeq genes that were detected in each core element is provided in Additional file 5. LTR retrotransposons are underlined. Core elements with large CNV were selected by two criteria: 1) a P value lower than the significance value; 2) the 10 core elements that showed the greatest variation among strains.
b Proportion of each core element that comprised the known repeat. The proportion was calculated by dividing the total length of the known repeat in the core element by the total length of the core element.
c Expected copy numbers in B6 were calculated from the number of homologous sequences in each of the 60 groups (Additional file 5). Expected copy numbers in BLG2 and MSM were calculated from the average of the aCGH values and the copy number in B6 (Additional file 7).
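The two calculations in footnotes b and c can be sketched as follows. The repeat proportion follows the footnote directly. The copy-number function is an assumption: the footnote says only that the BLG2 and MSM values were derived from the average aCGH value and the B6 copy number, so the sketch uses the conventional reading of a log2 ratio (copies in strain = copies in B6 × 2^ratio). Function names and input values are hypothetical and are not taken from the table.

```python
def repeat_proportion(repeat_lengths_bp, element_length_bp):
    """Total length of known repeats divided by core element length (footnote b)."""
    return sum(repeat_lengths_bp) / element_length_bp


def expected_copy_number(b6_copies, mean_log2_ratio):
    """Scale the B6 copy number by a mean aCGH log2 ratio.

    Assumption: the aCGH value is log2(strain / B6), so the expected copy
    number is b6_copies * 2 ** mean_log2_ratio. The source does not give
    the exact formula used.
    """
    return b6_copies * 2 ** mean_log2_ratio


# Illustrative inputs only (hypothetical, not values from the table).
proportion = repeat_proportion([1000, 830], 3000)
unchanged = expected_copy_number(8, 0.0)
doubled = expected_copy_number(10, 1.0)
```

Under this reading, a mean log2 ratio near zero leaves the copy number equal to that of B6, as expected for a constant-type element, while a positive ratio scales it up.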
Figure 7. Characteristics of core elements. (A) Comparison of known repeats in CNV-type and constant-type core elements. Average proportions of the known repeats, classified into six categories (SINE, LINE, LTR, DNA transposon (DNA), simple/satellite repeats (Simple/Sat), and unknown repeats) in each core element were compared between the CNV and constant types of core element. *, P < 0.05; **, P < 0.001. Average proportions were calculated from the proportion of each type of repeat in each type of core element. (B) Comparison of divergence between CNV-type and constant-type core elements. Divergence is represented by percentage values for the number of mutations and insertions or deletions that were detected after pairwise comparison of sequences. (C) Correlation between the divergence of the sequences and the number of duplications of core elements. Core elements 177 and 254 were excluded from these analyses because their sequences are contained within core elements 541 and 244, respectively.
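The divergence measure in panel (B) can be sketched as a percentage of differing columns in a pairwise alignment. The caption does not state the denominator or how multi-base indels were counted, so the choices below (each gap column counts as one difference, and the denominator is the number of alignment columns) are assumptions, as are the function name and the toy sequences.

```python
def percent_divergence(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns that differ (mismatch or indel).

    Inputs are two already-aligned sequences of equal length, with `gap`
    marking insertions or deletions. Columns where both sequences have a
    gap are ignored. Counting per column is an assumed convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    differing = sum(1 for a, b in columns if a != b)
    return 100.0 * differing / len(columns)


# Toy alignment (hypothetical): one substitution and one single-base deletion.
divergence = percent_divergence("ACGTACGTAC", "ACGAAC-TAC")
```

Averaging this value over all pairs of copies of a core element would give a per-element divergence that could be compared between CNV-type and constant-type elements.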