M Elizabeth O Locke, Maja Milojevic, Susan T Eitutis, Nisha Patel, Andrea E Wishart, Mark Daley, Kathleen A Hill.
Abstract
BACKGROUND: Copy number variation is an important dimension of genetic diversity and has implications in development and disease. As an important model organism, the mouse is a prime candidate for copy number variant (CNV) characterization, but this has yet to be completed for a large sample size. Here we report CNV analysis of publicly available, high-density microarray data files for 351 mouse tail samples, including 290 mice that had not been characterized for CNVs previously.
Year: 2015 PMID: 26141061 PMCID: PMC4490682 DOI: 10.1186/s12864-015-1713-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Number of CNV calls on the autosomes by mouse classification and copy number state
| Mouse classification | Number of samples | CNV calls | (per sample) | Copy number state 0 (a) | (per sample) | Copy number state 1 (a) | (per sample) | Copy number state 3+ (a) | (per sample) | Del/amp (b) |
|---|---|---|---|---|---|---|---|---|---|---|
| All | 334 | 9634 | (28.84) | 1995 | (5.97) | 3661 | (10.96) | 3978 | (11.91) | 1.42 |
| Classical | 114 | 2824 | (24.77) | 424 | (3.72) | 867 | (7.61) | 1533 | (13.45) | 0.84 |
| Wild Derived | 52 | 2611 | (50.21) | 1214 | (23.35) | 594 | (11.42) | 803 | (15.44) | 2.25 |
| Wild Caught | 19 | 969 | (51.0) | 231 | (12.15) | 491 | (25.84) | 247 | (13.0) | 2.92 |
| C57BL/6J | 8 | 90 | (11.52) | 0 | (0.0) | 38 | (4.75) | 52 | (6.5) | 0.73 |
| C57BL/6NJ | 6 | 46 | (7.67) | 5 | (0.83) | 23 | (3.83) | 18 | (3.0) | 1.56 |
Values in parentheses are normalized by sample count
a Copy number states, from left to right: 0 (full deletion, no copies), 1 (one copy) and 3+ (three or more copies)
b Deletion/Amplification is the total number of deletions (0 and 1 copy-state call counts) divided by the number of amplifications (3+ copy-state call counts)
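The per-sample values and the Del/amp ratio can be re-derived from the raw call counts using the definitions in the footnotes above. The following is a minimal Python sketch, not the authors' code: the names `TABLE` and `summarize` are illustrative, and the counts are copied from four rows of the table.

```python
# Raw counts copied from the table above:
# (number of samples, copy-state 0 calls, copy-state 1 calls, copy-state 3+ calls)
TABLE = {
    "All": (334, 1995, 3661, 3978),
    "Classical": (114, 424, 867, 1533),
    "Wild Derived": (52, 1214, 594, 803),
    "Wild Caught": (19, 231, 491, 247),
}


def summarize(samples, cn0, cn1, cn3plus):
    """Re-derive total calls, calls per sample and the Del/amp ratio."""
    total = cn0 + cn1 + cn3plus
    return {
        "calls": total,
        # Values in parentheses in the table: count divided by sample count.
        "calls_per_sample": round(total / samples, 2),
        # Footnote b: deletions (states 0 and 1) over amplifications (state 3+).
        "del_amp": round((cn0 + cn1) / cn3plus, 2),
    }


if __name__ == "__main__":
    for name, counts in TABLE.items():
        print(name, summarize(*counts))
```

A ratio above 1 means deletions outnumber amplifications in that group; a ratio below 1 means the reverse.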
Fig. 1 CNV call summary. Sankey diagram depicting CNV calls on the autosomes classified into unique categories stacked vertically for length, type, mouse strain type, uniqueness and gene content from left to right. Flows between vertical categories (in grey) are proportional to the number of calls sharing both horizontally neighboring classifications. For example, almost half of the “100+ kb” classified CNV calls are also “Amplifications”
Number of unique or recurrent CNV calls on the autosomes by mouse classification
| Mouse classification | Unique CNV calls (a) | (per sample) | Recurrent CNV calls (a) | (per sample) | Unique/recurrent |
|---|---|---|---|---|---|
| All | 2418 | (7.24) | 7216 | (21.60) | 0.34 |
| Classical | 576 | (5.05) | 2248 | (19.72) | 0.26 |
| Wild derived | 870 | (16.73) | 1741 | (33.48) | 0.46 |
| Wild caught | 453 | (23.84) | 516 | (27.16) | 0.88 |
| C57BL/6J | 12 | (1.5) | 78 | (9.75) | 0.15 |
| C57BL/6NJ | 0 | (0) | 46 | (7.6) | 0.00 |
Uniqueness and recurrence (found in two or more mice) are both assessed across the entire analysis and are not re-evaluated within each mouse classification (classical, wild derived, etc.). For example, a call listed as unique in the wild-caught group did not overlap any other call in the entire analysis; it is not merely unique among the wild-caught samples. Values in parentheses are call counts normalized by sample size
a A call is considered recurrent if it has at least 40 % reciprocal overlap with any other call
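The recurrence criterion can be illustrated with a short Python sketch. It assumes the common definition of reciprocal overlap, in which the shared span must cover at least the threshold fraction of both calls; the source does not spell out its exact formula or coordinate convention, so `reciprocal_overlap` and `is_recurrent` are hypothetical helpers rather than the authors' implementation. Calls are given as `(start, end)` pairs on the same chromosome.

```python
def reciprocal_overlap(a, b):
    """Fraction of the shorter-covered call shared by two same-chromosome calls.

    Assumption: length is end - start, and the reciprocal overlap is the
    smaller of (shared / length of a) and (shared / length of b).
    """
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0  # the calls do not overlap at all
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def is_recurrent(call, others, threshold=0.40):
    """True if `call` reaches the threshold with at least one other call."""
    return any(reciprocal_overlap(call, other) >= threshold for other in others)


if __name__ == "__main__":
    print(reciprocal_overlap((0, 100), (50, 150)))
    print(is_recurrent((0, 100), [(90, 1000)]))
```

Taking the minimum of the two fractions is what makes the test reciprocal: a short call nested inside a much longer one does not count, because the shared span covers only a small part of the longer call.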
Fig. 2 Number and recurrence of CNV calls. Each CNV call is represented as a single dot within larger circular clusters, with each cluster representing the autosomes 1–5, then 6–10 and so on. Calls with at least 40 % reciprocal overlap are joined by a line and considered recurrent. Each dot is then coloured on a heat map scale, based on how many overlaps that call has with other calls on the same chromosome. The heat map colours range from 0 overlaps (dark blue) to 175 overlaps (red, chosen as it is half the number of total samples present). The total size of each chromosomal cluster is proportionate to the number of events found on that chromosome. Larger collections of connected dots represent CNV calls that are found in many samples, while unconnected dots represent unique events not shared among any samples. Labels A through D indicate complex clusters
Fig. 3 Copy number variants identified. For each chromosome, both unique calls and recurrent regions are plotted. The unique calls are plotted for each chromosome as follows (listed from top to bottom): copy number amplification calls for three or more copies are plotted in dark blue above the region of the chromosome where they are found, then the chromosome line in black, followed by one-copy deletions in light red and full deletions in dark red below the chromosome line. The regions of recurrent CNV calls are plotted directly on the black chromosome line. Here, if the overlapping calls were all deletions, they are plotted in red; if they were all amplifications, they are plotted in blue; if they are a mix of amplifications and deletions, they are plotted in green
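The colouring rule for recurrent regions in the legend above can be written as a small function. This is an illustrative Python sketch only; the name `region_colour` and the use of integer copy-number states (0, 1, or 3 and above) as input are assumptions, not part of the original analysis.

```python
def region_colour(copy_states):
    """Colour of a recurrent region given the copy-number states of its calls.

    Assumption: each call is summarized by an integer state, where 0 and 1
    are deletions and 3 or more is an amplification.
    """
    if not copy_states:
        raise ValueError("a recurrent region needs at least one call")
    kinds = {"amp" if state >= 3 else "del" for state in copy_states}
    if kinds == {"del"}:
        return "red"    # all overlapping calls are deletions
    if kinds == {"amp"}:
        return "blue"   # all overlapping calls are amplifications
    return "green"      # mix of deletions and amplifications


if __name__ == "__main__":
    print(region_colour([0, 1, 1]))
    print(region_colour([3, 4]))
    print(region_colour([1, 3]))
```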
Most common CNVs and complex (a) CNV regions
| Genomic location | Region type | Number of mice affected (b) | Gene symbol (gene type) (c) |
|---|---|---|---|
| 17:6635443-6646618 | CNV | 66 | |
| 14:44540155-44579921 | CNV | 43 | |
| 17:35383895-35392718 | CNV | 41 | |
| 11:116603748-116629092 | CNV | 40 | |
| 4:122366514-122382286 | CNV | 38 | |
| 7:111681502-111683670 | CNV | 35 | |
| 4:111790559-111972640 | CNV | 35 | |
| 17:30593663-31058945 (d) | CNV | 34 | |
| 5:114856193-114895051 | CNV | 34 | |
| 14:20443929-20587951 | CNV | 34 | |
| 17:30508869-31058945 (d) | Complex | 216 | |
| 1:90097201-90210184 | Complex | 130 | |
| 11:70955732-71137889 | Complex | 152 | |
a A complex region is defined as having boundary concordance below 0.75 as described in Cahan [7]. CNV events have exact boundary concordance
b For CNV events, each mouse with the CNV is counted. In complex regions, a mouse is counted once if it has any CNV in the region, even if it has more than one CNV there
c Gene names are as in Mouse Genome Informatics Symbol. Gene types are one of: Protein coding (pc), RNA type as listed, or pseudogene (ps)
d This CNV event is contained within this complex region
CNVs only in either wild-caught M. m. domesticus or M. m. musculus subspecies
| Genomic location | Type | Number of samples | Gene overlap |
|---|---|---|---|
| 1:28084515-28126393 | Del | 5 | - |
| 2:71652530-71687549 | Amp | 5 | |
| 4:121036609-121090109 | Del | 5 | |
| 14:10031933-10032515 | Del | 5 | - |
| 18:7348782-7356220 | Amp | 8 | |
| 19:25900801-25901740 | Del | 5 | - |
| 1:90108589-90166813 | Amp | 7 | |
| 4:122366514-122382286 | Del | 7 | |
| 4:137702213-137772702 | Amp | 5 | |
| 6:142975631-143048578 | Amp | 5 | |
| 7:18883984-18892209 | Del | 5 | |
| 7:92886425-92976094 | Del | 7 | |
| 8:82175129-82201642 | Amp | 9 | |
| 17:31316283-31478341 | Amp | 5 | |
Fig. 4 SNP and CNV distance for MGI priority and 17 genomes project strains. a. Neighbour-joining tree constructed using SNP distance. b. Multidimensional scaling (MDS) for SNP distance matrix, showing first two principal coordinates. c. Neighbour-joining tree constructed using CNV distance. Trees are not proportional to each other. The dashed line for MA/MyJ indicates a manually shortened branch. d. MDS for CNV distance matrix. All diagrams are coloured based on similarity in SNP distance (panel a)