Yu Wang, Wei Li, Yingying Xia, Chongzhi Wang, Y Tom Tang, Wenying Guo, Jinliang Li, Xia Zhao, Yepeng Sun, Juan Hu, Hefu Zhen, Xiandong Zhang, Chao Chen, Yujian Shi, Lin Li, Hongzhi Cao, Hongli Du, Jian Li.
Abstract
Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations that underlie many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential for understanding disease and for designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage across the whole genome and are therefore cost-inefficient. Another approach, whole-exome genome sequencing (WEGS), can only discover variations within exons. Thus, efficient methods for detecting genomic aberrations on the whole-genome scale with next-generation sequencing technology are still lacking. Here we present a method to identify genome-wide CNV, LOH, and UPD in the human genome by selectively sequencing a small portion of the genome, termed Selected Target Regions (SeTRs). In our experiments, the coverage of the SeTRs ranges from 99.73% to 99.95%, with sufficient depth. Our bioinformatics pipeline calls genome-wide CNVs with high confidence and reveals 8 credible LOH events and 3 UPD events larger than 5 Mb in 15 individual samples. We demonstrate that genome-wide CNV, LOH, and UPD can be detected with a cost-effective SeTR sequencing approach, and that LOH and UPD can be identified with a sample grouping technique alone, without a matched sample or familial information.
Year: 2015 PMID: 25919136 PMCID: PMC4412667 DOI: 10.1371/journal.pone.0123081
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the ICLU pipeline.
The pipeline takes raw FASTQ files or aligned BAM files as input and outputs genome-wide CNV, LOH, and UPD results together with visualizations.
Data production and mapping results for the three samples used.
| Sample | YH | HG00537 | GM50275 |
|---|---|---|---|
| SeTR size (bp) | 41,795,106 | 41,795,106 | 41,795,106 |
|  | 78,544,670 | 77,826,604 | 80,181,866 |
|  | 7,067.69 | 7,003.09 | 7,211.26 |
|  | 66,288,119 | 63,136,026 | 62,010,469 |
|  | 5,965.93 | 5,682.24 | 5,580.94 |
|  | 99.29 | 98.13 | 98.14 |
|  | 97.09 | 95.96 | 95.16 |
|  | 67.43 | 66.93 | 67.87 |
|  | 70.89 | 68.15 | 67.3 |
| SeTR coverage (%) | 99.94 | 99.73 | 99.95 |
|  | 99.9 | 99.52 | 99.89 |
|  | 99.48 | 99.19 | 99.44 |
|  | 96.65 | 96.57 | 96.34 |
|  | 90.02 | 89.79 | 88.99 |
|  | 80.03 | 79.01 | 77.81 |
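The lower rows of the table appear to report the share of SeTR bases reached at progressively higher read depths, but their depth thresholds were lost in extraction. As a minimal sketch of how such coverage metrics are commonly computed, the function below assumes a hypothetical input of one depth value per SeTR base and uses illustrative thresholds, not the ones in the paper.

```python
from typing import Dict, Sequence


def coverage_at_thresholds(depths: Sequence[int], thresholds: Sequence[int]) -> Dict[int, float]:
    """Return the percentage of target bases whose read depth is >= each threshold.

    `depths` is assumed to hold one depth value per SeTR base (hypothetical input
    format); the thresholds are illustrative, not those used by ICLU.
    """
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    # For each threshold, count bases at or above it and express as a percentage.
    return {t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds}


# Toy example: ten target bases with hypothetical depths.
toy_depths = [0, 3, 5, 12, 25, 31, 40, 55, 80, 120]
print(coverage_at_thresholds(toy_depths, [1, 10, 30]))
```

Because each threshold is evaluated independently, the percentages can only stay equal or fall as the threshold rises, which matches the descending pattern of the rows above.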
Fig 2. Characteristics of SeTRs in three real samples.
(A) Distribution of coverage depth across SeTRs; (B) distribution of reads supporting the non-reference and reference alleles at SNP sites.
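Panel B plots reference versus non-reference read counts at SNP sites. A common way to turn such counts into genotype calls is to threshold the non-reference allele fraction. The sketch below is hypothetical: the minimum depth and the 0.2/0.8 heterozygosity bounds are illustrative assumptions, not ICLU's parameters.

```python
from typing import Optional


def classify_snp(ref_reads: int, alt_reads: int, min_depth: int = 10,
                 het_low: float = 0.2, het_high: float = 0.8) -> Optional[str]:
    """Classify a SNP site from reference and non-reference read counts.

    All thresholds are illustrative assumptions. Returns None when the site
    is too shallow to call.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # not enough reads to make a call
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "hom_ref"
    if alt_fraction > het_high:
        return "hom_alt"
    return "het"


print(classify_snp(48, 52), classify_snp(97, 3), classify_snp(1, 60))
```

Heterozygous sites cluster near an allele fraction of 0.5, which is why a run of sites with no heterozygous calls is the signal later used for LOH.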
Fig 3. Characteristics of the three ratios in the YH sample.
(A, B) Distribution of the three ratios across chromosome 5. The dashed line at ratio = 0.5 corresponds to CN = 1, and the dashed line at ratio = 1.5 corresponds to CN = 3. After the smoothing and renormalization steps, the fluctuation of the ratios decreases progressively from preRi (gray circles) to Ri and then to Rmi (black line). (C) Distribution of Ri across the whole genome; (D) distribution of Rmi across the whole genome.
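The caption describes raw per-window ratios (preRi) that become less noisy after renormalization and smoothing (Ri, then Rmi). The exact ICLU formulas are not given here, so the sketch below makes two loud assumptions: the raw ratio is sample depth divided by a baseline depth per window, renormalized so the genome-wide median is 1 (ratio 1 then corresponds to CN = 2, consistent with the 0.5 and 1.5 reference lines), and smoothing is a moving median. Both function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def depth_ratios(sample_depth: Sequence[float], baseline_depth: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for preRi: per-window sample/baseline depth,
    renormalized so that the genome-wide median ratio equals 1.
    Assumes every baseline depth is positive."""
    raw = [s / b for s, b in zip(sample_depth, baseline_depth)]
    med = statistics.median(raw)
    if med == 0:
        raise ValueError("median ratio is zero; cannot renormalize")
    return [r / med for r in raw]


def moving_median(values: Sequence[float], half_window: int = 2) -> List[float]:
    """Hypothetical stand-in for Rmi: median of each window and its neighbours."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(statistics.median(values[lo:hi]))
    return out


# Toy data: eight windows, the middle three carrying a one-copy loss.
pre_r = depth_ratios([100, 98, 102, 50, 52, 49, 101, 99], [100] * 8)
print([round(r, 2) for r in moving_median(pre_r, half_window=1)])
```

A median is used here rather than a mean because a single noisy window cannot drag it, which mirrors the reduced fluctuation from preRi to Rmi shown in the figure.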
Fig 4. Characteristics of preRi and Rmi on chromosome 5 of individual GM50275.
Genome-wide CNV detection results for the 15 confirmed samples.
| No. | Confirmed CNV | ICLU (~42 Mb SeTRs) CNV | ICLU (~42 Mb SeTRs) CN | ICLU (~5 Mb SeTRs) CNV | ICLU (~5 Mb SeTRs) CN |
|---|---|---|---|---|---|
| 1 | 46,XY | 46,XY | 2 | 46,XY | 2 |
| 2 | 46,XX | 46,XX | 2 | 46,XX | 2 |
| 3 | 46,XX,del(5)(p15.3) | 46,XX,del(5)(p15.3) | 1 | 46,XX,del(5)(p15.3) | 1 |
| 4 | 46,XY,del(5)(p14) | 46,XY,del(5)(p14) | 1 | 46,XY,del(5)(p14) | 1 |
| 5 | 46,XY,del(1)(q43) | 46,XY,del(1)(q43q44) | 1 | 46,XY,del(1)(q43q44) | 1 |
| 6 | 49,XYYYY | 49,XYYYY | 4 | 49,XYYYY | 4 |
| 7 | 46,XY,dup(15)(q11q12) | 46,XY,dup(15)(q11q12q13.1) | 3 | 46,XY,dup(15)(q11q12q13.1) | 3 |
| 8 | 46,XY,dup(10)(q11.2q23.2) | 46,XY,dup(10)(q11.2q23.2) | 3 | 46,XY,dup(10)(q11.2q23.2) | 3 |
| 9 | 46,XY,dup(18)(q21.2q22) | 46,XY,dup(18)(q21.2q22) | 3 | 46,XY,dup(18)(q21.2q22) | 3 |
| 10 | 46,XY,del(16)(q22q23) | 46,XY,del(16)(q22q23) | 1 | NA | NA |
| 11 | 46,XY,del(3)(p25) | 46,XY,del(3)(p25p26) | 1 | NA | NA |
| 12 | 46,XY,del(8)(p23) | 46,XY,del(8)(p23) | 1 | NA | NA |
| 13 | 48,XXX,+18 | 48,XXX,+18 | 3,3 | 48,XXX,+18 | 3,3 |
| 14 | 46,XX,del(16)(p12p11.2) | 46,XX,del(16)(p12p11.2) | 1 | 46,XX,del(16)(p12p11.2) | 1 |
| 15 | 46,XY,dup(18)(q21.3q12.1) | 46,XY,dup(18)(q21.3q12) | 3 | 46,XY,dup(18)(q21.3q12) | 3 |
Note: “NA” indicates that no result is available because NGS library construction failed.
Fig 5. Distribution of RHet across chromosome 5 in YH, in multiple samples, and in GM50275.
(A) RHets for the normal sample, YH; (B) median RHets for multiple samples; (C) RHets for the positive sample, GM50275.
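Panel B suggests that a group median of RHet serves as the reference against which a single sample is judged, in line with the abstract's claim that LOH can be found by sample grouping without a matched sample. The sketch below assumes, without confirmation from the paper, that RHet is the fraction of called SNP sites that are heterozygous in a window, and that a window is flagged when the sample falls below a hypothetical fraction (`min_ratio`) of the group median.

```python
import statistics
from typing import List, Sequence


def rhet(het_counts: Sequence[int], called_counts: Sequence[int]) -> List[float]:
    """Assumed definition: per-window fraction of called SNP sites that are heterozygous."""
    return [h / c if c else 0.0 for h, c in zip(het_counts, called_counts)]


def flag_loh_windows(sample_rhet: Sequence[float],
                     group_rhets: Sequence[Sequence[float]],
                     min_ratio: float = 0.2) -> List[bool]:
    """Flag windows whose RHet is far below the median RHet of a sample group.

    `min_ratio` is a hypothetical cutoff, not ICLU's criterion.
    """
    flags = []
    for i, value in enumerate(sample_rhet):
        baseline = statistics.median([g[i] for g in group_rhets])
        flags.append(baseline > 0 and value < min_ratio * baseline)
    return flags


group = [[0.30, 0.30, 0.30], [0.32, 0.28, 0.30], [0.29, 0.31, 0.30]]
print(flag_loh_windows([0.31, 0.01, 0.0], group))
```

Using the group median rather than any single reference sample keeps one unusual group member from distorting the baseline.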
Genome-wide LOH and UPD detection results for the 15 test samples.
| No. | Chromosome | Start | End | Size (Mb, >5 Mb) | LOH type | CN |
|---|---|---|---|---|---|---|
| 1 | - | - | - | - | - | - |
| 2 | - | - | - | - | - | - |
| 3 | chrX | 103489643 | 108870605 | 5.38 | UPD | 2 |
| 3 | chr5 | 38139 | 5893356 | 5.86 | LOH_nonUPD | 1 |
| 4 | chr5 | 18601469 | 28281734 | 9.68 | LOH_nonUPD | 1 |
| 5 | chr1 | 242808483 | 248553940 | 5.75 | LOH_nonUPD | 1 |
| 5 | chr10 | 38160098 | 43475568 | 5.32 | UPD | 2 |
| 6 | chr3 | 46077525 | 51871405 | 5.79 | UPD | 2 |
| 7 | - | - | - | - | - | - |
| 8 | - | - | - | - | - | - |
| 9 | - | - | - | - | - | - |
| 10 | chr16 | 67747306 | 75697469 | 7.95 | LOH_nonUPD | 1 |
| 11 | chr3 | 75084 | 11736290 | 11.66 | LOH_nonUPD | 1 |
| 12 | - | - | - | - | - | - |
| 13 | - | - | - | - | - | - |
| 14 | - | - | - | - | - | - |
| 15 | - | - | - | - | - | - |
Note: “LOH_nonUPD” indicates LOH that is not UPD; “-” indicates that no LOH event was detected in the sample.
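Every row of the LOH table is consistent with two simple rules: the size column equals (End − Start) / 10^6 rounded to two decimals, and LOH segments with CN = 2 (copy-neutral) are reported as UPD while those with CN = 1 are reported as LOH_nonUPD. The sketch below encodes that reading of the table; it is not the authors' code, and the `other` label for any remaining copy number is a hypothetical addition.

```python
from typing import List, NamedTuple, Tuple


class LohSegment(NamedTuple):
    chrom: str
    start: int
    end: int
    copy_number: int


def classify_segments(segments: List[LohSegment],
                      min_size_mb: float = 5.0) -> List[Tuple[str, float, str]]:
    """Keep LOH segments larger than min_size_mb and label them by copy number.

    Rule inferred from the table: CN 2 -> UPD, CN 1 -> LOH_nonUPD.
    """
    results = []
    for seg in segments:
        size_mb = (seg.end - seg.start) / 1e6
        if size_mb <= min_size_mb:
            continue  # below the 5 Mb reporting cutoff
        if seg.copy_number == 2:
            label = "UPD"          # copy-neutral LOH
        elif seg.copy_number == 1:
            label = "LOH_nonUPD"   # LOH caused by a one-copy deletion
        else:
            label = "other"        # hypothetical catch-all
        results.append((seg.chrom, round(size_mb, 2), label))
    return results


rows = [LohSegment("chrX", 103489643, 108870605, 2),
        LohSegment("chr5", 38139, 5893356, 1)]
print(classify_segments(rows))
```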
Fig 6. Circos plot of GM50275.
In part II, CN can be estimated by dividing Rmi by 0.5 (that is, CN ≈ 2 × Rmi); red lines indicate loss events and green lines indicate gain events.
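The caption's rule CN = Rmi / 0.5 can be applied window by window to produce integer copy-number calls and the loss/gain colouring used in the plot. A minimal sketch, assuming a diploid baseline (`ploidy = 2`); sex chromosomes in male samples would need a different baseline, which this hypothetical helper does not handle.

```python
from typing import List, Tuple


def call_copy_number(rmi: List[float], ploidy: int = 2) -> List[Tuple[int, str]]:
    """Estimate integer CN from Rmi via CN = Rmi / 0.5 and label loss or gain.

    Colours follow the Fig 6 caption: red = loss, green = gain.
    """
    calls = []
    for r in rmi:
        cn = round(r / 0.5)  # nearest integer copy number
        if cn < ploidy:
            calls.append((cn, "loss (red)"))
        elif cn > ploidy:
            calls.append((cn, "gain (green)"))
        else:
            calls.append((cn, "normal"))
    return calls


print(call_copy_number([0.98, 0.52, 1.47]))
```

Rounding to the nearest integer tolerates residual noise in Rmi; note that Python's `round` resolves exact .5 ties to the even integer.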