Shuo Li, Xialiang Dou, Ruiqi Gao, Xinzhou Ge, Minping Qian, Lin Wan.
Abstract
Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though both projects required cross-platform experimental support (e.g. PCR, microarray and high-throughput sequencing) for these reports and employed state-of-the-art calling methods. On the other hand, of the copy number losses found directly from HapMap microarray data by an accurate algorithm, CNVhac, almost all have lower read mapping depth in the NGS data; furthermore, 88% are supported by sequences containing breakpoints in the NGS data. Our results suggest that microarrays are capable of calling CNVs and that the unessential requirement of additional cross-platform support may introduce false negatives. The inconsistency between the CNV reports of the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, the inconsistent detection criteria, or the filtering effect of cross-platform support. The statistical tests on CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and that the majority of CNV candidates can be confirmed by raw sequences.
Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied when needed rather than as a requirement.
Year: 2018 PMID: 29702671 PMCID: PMC5922522 DOI: 10.1371/journal.pone.0196226
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Plots of copy number loss candidates and their adjacency.
In each plot, the x-axis shows loci on chromosome 11 and the y-axis shows the read depth; each purple dot is the read depth at the corresponding locus; each orange vertical line is a reported copy number loss boundary; each pink dashed line marks the locus of a probe on the SNP microarray.
Comparison of copy number losses from the HapMap Project and the 1000 Genomes Project.
| Chrom | Hapmap | Hap. over. | % | Seq | Seq. over. | % |
|---|---|---|---|---|---|---|
| 11 | 998 | 89 | 8.92% | 756 | 192 | 25.39% |
| whole genome | 16493 | 2846 | 17.26% | 9305 | 2517 | 27.05% |
We denote the number of copy number losses reported by the 1000 Genomes Project as “Seq”, the number of those overlapping the HapMap report as “Seq. over.”, and the overlapping proportion as “%”. Likewise, we denote the number of copy number losses from the HapMap Project as “Hapmap”, the number of those overlapping the 1000 Genomes report as “Hap. over.”, and the overlapping proportion as “%”.
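As a quick check of how the “%” columns are derived, the following minimal sketch recomputes the whole-genome percentages from the counts in the table above. The function name `overlap_percent` is hypothetical and introduced only for illustration; it is not part of any tool used in the study.

```python
def overlap_percent(overlapping: int, total: int) -> float:
    """Return overlapping/total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * overlapping / total, 2)


# Counts taken from the whole-genome row of the table above.
hapmap_pct = overlap_percent(2846, 16493)  # "Hap. over." / "Hapmap"
seq_pct = overlap_percent(2517, 9305)      # "Seq. over." / "Seq"
print(hapmap_pct, seq_pct)
```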
Comparison of copy number losses from the 1000 Genomes Project and CNVhac calling.
| Chrom. | CNVh. | CNVh. over. | % | Seq | Seq. over. | % |
|---|---|---|---|---|---|---|
| 11 | 281 | 46 | 16.37% | 756 | 70 | 9.26% |
| whole genome | 5185 | 750 | 14.46% | 9305 | 955 | 9.19% |
We compare them at two scales: chromosome 11 (see S2 File) and the whole genome. We denote the copy number loss candidates reported by CNVhac from SNP microarray data as “CNVh.CNV”, and those called from sequencing as “Seq.CNV”. At each scale, the “CNVh.” column is the total number of “CNVh.CNV”, and “CNVh. over.” is the number of “CNVh.CNV” that share at least 50% of base pairs with some “Seq.CNV”; “Seq” is the total number of “Seq.CNV”, and “Seq. over.” is the number of “Seq.CNV” that share at least 50% of base pairs with some “CNVh.CNV”. The “%” columns give the ratio of overlapping cases to the totals in “CNVh.” and “Seq”, respectively. Note that the difference between “Seq. over.” and “CNVh. over.” is caused by multiple copy number loss candidates from one method matching a single candidate from the other method.
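The 50% overlap criterion and the resulting asymmetry between the two “over.” columns can be illustrated with a small sketch. It assumes that the 50% is measured against the length of the candidate being counted and that intervals are half-open `[start, end)` base-pair coordinates; the source does not state either convention, and the names `shared_bp` and `count_overlapping` are hypothetical.

```python
from typing import List, Tuple

# Assumed convention: half-open [start, end) coordinates in base pairs.
Interval = Tuple[int, int]


def shared_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs common to intervals a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def count_overlapping(calls: List[Interval], others: List[Interval],
                      min_fraction: float = 0.5) -> int:
    """Count calls sharing at least min_fraction of their own length
    with some single interval in others (assumed reading of the criterion)."""
    count = 0
    for call in calls:
        length = call[1] - call[0]
        if length <= 0:
            continue  # skip empty or malformed intervals
        if any(shared_bp(call, o) >= min_fraction * length for o in others):
            count += 1
    return count


# Toy data: two short calls from one method fall inside one long call from the other.
cnvhac_calls = [(100, 200), (220, 300)]
seq_calls = [(90, 310)]

cnvh_over = count_overlapping(cnvhac_calls, seq_calls)  # both short calls qualify
seq_over = count_overlapping(seq_calls, cnvhac_calls)   # the long call does not
print(cnvh_over, seq_over)
```

In this toy case each short call is fully covered by the long one, while neither short call covers half of the long call, so the two counts differ even though the same regions are involved.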
Comparison of copy number losses from the HapMap Project and CNVhac calling.
| Chrom. | Hapmap. | Hap. over. | % | CNVh. | CNVh. over. | % |
|---|---|---|---|---|---|---|
| 11 | 998 | 113 | 11.32% | 281 | 145 | 51.60% |
| whole genome | 16493 | 2726 | 16.53% | 5185 | 2758 | 53.19% |
We also compare them at two scales: chromosome 11 (see S2 File) and the whole genome. We denote the copy number loss candidates reported by CNVhac from SNP microarray data as “CNVh.CNV”, and CNVs from the HapMap Project as “Hapmap.CNV”. At each scale, the “CNVh.” column is the total number of “CNVh.CNV”, and “CNVh. over.” is the number of “CNVh.CNV” that share at least 50% of base pairs with some “Hapmap.CNV”; “Hapmap.” is the total number of “Hapmap.CNV”, and “Hap. over.” is the number of “Hapmap.CNV” that share at least 50% of base pairs with some “CNVh.CNV”. The “%” columns give the ratio of overlapping cases to the totals in “CNVh.” and “Hapmap.”, respectively. Note that the difference between “Hap. over.” and “CNVh. over.” is caused by multiple copy number loss candidates from one method matching a single candidate from the other method.
Statistical tests of the reported region against the whole chromosome; only two samples accept the null hypothesis.
| test | | |
|---|---|---|
| | 100% | 100% |
| permutation test | 99.10% | 99.10% |
Statistical tests of the reported region against its adjacent area.
| test | | | NA |
|---|---|---|---|
| | 99.10% | 99.10% | 0% |
| | 93.60% | 93.60% | 0% |
| permutation test for upstream | 96.60% | 96.60% | 0.90% |
| permutation test for downstream | 89.80% | 89.80% | 6.40% |
The first and third rows are tests for the upstream area; the others are tests for the downstream area. Here, NA is caused by zero read depth across the whole reported region.
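To make the idea of a permutation test on read depth concrete, here is a minimal sketch that compares the depth inside a reported region with the depth in an adjacent area. The test statistic (difference in mean depth), the one-sided direction, the number of permutations, and the function name `permutation_pvalue` are all assumptions for illustration; this record does not specify how the authors' test was constructed.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_pvalue(region: Sequence[float], flank: Sequence[float],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided permutation p-value for the null hypothesis that read depth
    in the region is not lower than in the flank (assumed statistic:
    mean(flank) - mean(region))."""
    if not region or not flank:
        raise ValueError("both depth vectors must be non-empty")
    rng = random.Random(seed)
    observed = mean(flank) - mean(region)
    pooled = list(region) + list(flank)
    k = len(region)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign depths to region/flank
        if mean(pooled[k:]) - mean(pooled[:k]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy depths: a candidate loss with clearly lower depth than its upstream flank.
loss_depth = [2, 3, 1, 2, 3, 2, 1, 2]
upstream_depth = [10, 12, 11, 9, 10, 13, 11, 10]
p_loss = permutation_pvalue(loss_depth, upstream_depth)

# Toy depths with no difference between region and flank.
p_same = permutation_pvalue([5, 6, 5, 6], [5, 6, 5, 6])
print(p_loss, p_same)
```

A small p-value for a sample rejects the null hypothesis and is consistent with a genuine loss; the percentages in the table would then correspond to the share of samples for which such a test rejects or cannot be computed.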
Fig 2. Pie plot of the results of seeking breakpoints in deletion candidates and their adjacent areas using the breakpoint-seeking method.