Junhua Zhou1, Minqiong Zhao1, Zefang Sun1, Feilong Wu1, Yucong Liu1, Xianghua Liu2, Zuping He1, Quanze He3, Quanyuan He4.
Abstract
BACKGROUND: Whole genome bisulfite sequencing (WGBS), also known as BS-seq, has been widely used to measure methylation across the whole genome at single-base resolution. One of the key steps in the assay is converting unmethylated cytosines into uracils, which are subsequently read as thymines (BS conversion). Incomplete conversion of unmethylated cytosines can introduce false positive methylation calls. A quick method to evaluate the bisulfite conversion ratio (BCR) would benefit both quality control and data analysis of WGBS.
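To illustrate why incomplete conversion produces false positive calls, here is a minimal sketch of a simple model, not the authors' method. It assumes that methylated cytosines are never converted and that each unmethylated cytosine is converted independently with probability equal to the BCR. The function name and the example values are hypothetical.

```python
def apparent_methylation(true_level: float, bcr: float) -> float:
    """Fraction of reads showing C at a site under a simple conversion model.

    Assumptions (illustrative only): methylated cytosines always read as C,
    and unmethylated cytosines stay unconverted with probability 1 - bcr.
    """
    if not (0.0 <= true_level <= 1.0 and 0.0 <= bcr <= 1.0):
        raise ValueError("true_level and bcr must lie in [0, 1]")
    return true_level + (1.0 - true_level) * (1.0 - bcr)


# A completely unmethylated site still appears 1% methylated at BCR = 0.99.
false_positive_rate = apparent_methylation(true_level=0.0, bcr=0.99)

# A half-methylated site is overestimated slightly under the same BCR.
inflated_level = apparent_methylation(true_level=0.5, bcr=0.99)
```

Under this model the bias is largest at sites with low true methylation, which is where an accurate BCR estimate matters most.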
Keywords: Bisulfite conversion ratio (BCR); DNA methylation; Telomere; Whole genome bisulfite sequencing (WGBS)
Year: 2020 PMID: 32005131 PMCID: PMC6995172 DOI: 10.1186/s12859-019-3334-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Characteristics of bisulfite-converted telomeric WGBS data. a The distribution of reads with distinct numbers of telomeric blocks in two samples (ENCLB443JJF and ENCLB890RFU). Reads originating from the G-strand and the C-strand are colored blue and orange, respectively. FASTQ1 and FASTQ2 indicate the files containing forward and reverse reads in paired-end NGS sequencing; b A scatter plot showing the numbers of telomeric reads and total reads of 12 WGBS experiments (red and orange dots represent tissue samples); c UCRs calculated using forward (+) and reverse (-) reads at three cytosine sites in two samples (ENCLB098BGY, ENCLB167QQW); d The UCRs of technical replicates of two tissues (skeletal muscle myoblast and mouse liver) and two cell lines (HepG2 and H1-hESC); e Box plot showing the distribution of UCRs at three cytosine sites across eleven samples.
Fig. 2 The distribution of telomeric repeat blocks. a Bar chart showing the distribution of telomeric repeat blocks (N0 to N3 blocks) in eleven samples. Note that the y axis is log10 transformed; b The raw data of the figure 4A, in which counts of N3 blocks above zero are highlighted in red; c Dot plot of the calculated ratio of N2 blocks against the observed ratio. Each dot represents a sample, and the green trend line was calculated using only the blue dots.
The performance comparison of BCREval and Bismark
| ENCODE ID | File Size | Reads Number | Processing Time (Bismark) | Processing Time (BCREval) | Memory Usage (Bismark) | Memory Usage (BCREval) | CHH methylation ratio % (Bismark) | CHH methylation ratio % (BCREval) |
|---|---|---|---|---|---|---|---|---|
| ENCFF055UXZ | 1.1G | 12 M | 4 h 49 m | 10 m | 10G | 44 M | 0.7 | 0.56 |
| ENCFF336KJH | 687 M | 12 M | 4 h 8 m | 9 m | 10G | 44 M | 0.5 | 0.54 |
| ENCFF677BSB | 926 M | 12 M | 5 h 28 m | 9 m | 10G | 44 M | 1.1 | 0.42 |
| ENCFF781BRM | 833 M | 12 M | 5 h 5 m | 9 m | 10G | 44 M | 0.5 | 0.26 |
| ENCFF710XQC | 1011 M | 12 M | 5 h 5 m | 10 m | 10G | 44 M | 0.8 | 0.45 |
| ENCFF211RZY | 1.1G | 12 M | 4 h 39 m | 10 m | 10G | 44 M | 0.5 | 0.17 |
| ENCFF563QAT | 821 M | 12 M | 4 h 31 m | 8 m | 10G | 44 M | 0.5 | 0.18 |
| ENCFF311PSV | 686 M | 12 M | 4 h 36 m | 10 m | 10G | 44 M | 1.1 | 0.3 |
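The processing times in the table can be turned into speed-up factors with a short calculation. The sketch below only re-uses the times listed above; the helper name `to_minutes` and the dictionary layout are illustrative choices, not part of either tool.

```python
import re


def to_minutes(text: str) -> int:
    """Parse a duration such as '4 h 49 m' or '10 m' into whole minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    return (int(hours.group(1)) * 60 if hours else 0) + (
        int(minutes.group(1)) if minutes else 0
    )


# (Bismark time, BCREval time) per ENCODE file, copied from the table.
timings = {
    "ENCFF055UXZ": ("4 h 49 m", "10 m"),
    "ENCFF336KJH": ("4 h 8 m", "9 m"),
    "ENCFF677BSB": ("5 h 28 m", "9 m"),
    "ENCFF781BRM": ("5 h 5 m", "9 m"),
    "ENCFF710XQC": ("5 h 5 m", "10 m"),
    "ENCFF211RZY": ("4 h 39 m", "10 m"),
    "ENCFF563QAT": ("4 h 31 m", "8 m"),
    "ENCFF311PSV": ("4 h 36 m", "10 m"),
}

# Ratio of Bismark to BCREval processing time for each file.
speedups = {
    file_id: to_minutes(bismark) / to_minutes(bcreval)
    for file_id, (bismark, bcreval) in timings.items()
}

for file_id, factor in speedups.items():
    print(f"{file_id}: about {factor:.1f}x faster")
```

For the listed files the ratio falls roughly between 27 and 37.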
The data used in the manuscript
| Biosample Type | Library_ID | ENCODE_ID (FASTQ) | Strand | Biosample summary |
|---|---|---|---|---|
| Primary cell | ENCLB587BLQ | ENCFF055UXZ | | |
| Primary cell | ENCLB587BLQ | ENCFF764NTF | | |
| Primary cell | ENCLB988SSO | ENCFF710XQC | | |
| Primary cell | ENCLB988SSO | ENCFF331AID | | |
| Cell line | ENCLB542OXH | ENCFF336KJH | | |
| Cell line | ENCLB542OXH | ENCFF585HYM | | |
| Cell line | ENCLB890RFU | ENCFF211RZY | | |
| Cell line | ENCLB890RFU | ENCFF717MDZ | | |
| Cell line | ENCLB443JJF | ENCFF563QAT | | |
| Cell line | ENCLB443JJF | ENCFF954LFD | | |
| Stem cell | ENCLB098BGY | ENCFF677BSB | | |
| Stem cell | ENCLB098BGY | ENCFF800KIP | | |
| Stem cell | ENCLB167QQW | ENCFF311PSV | | |
| Stem cell | ENCLB167QQW | ENCFF335TUD | | |
| Tissue | ENCLB353RJB | ENCFF781BRM | | |
| Tissue | ENCLB353RJB | ENCFF535VCB | | |
| Tissue | ENCLB585SDT | ENCFF283GDL | | |
| Tissue | ENCLB506AYR | ENCFF978EJO | | |
| Tissue | ENCLB760KHX | ENCFF348XNA | | |
Fig. 3 The procedure of telomeric DNA bisulfite conversion and paired-end sequencing. The treatments and their products are labeled in orange and green, respectively. The methylated and unmethylated cytosines in the sequences are colored red and blue, respectively. The labels “centromere” and “telomere” indicate the direction of the sequences or reads in the genome. The numbers above each C indicate the indexes of the three non-CpG cytosines in telomeric DNA blocks.
Fig. 4 Diagram of the procedure for calculating UCRs at three non-CpG cytosine sites using C-strand original reads. In the patterns, N represents G or C.
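The idea behind the per-site calculation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the BCREval implementation: it assumes the C-strand telomeric block is CCCTAA, that bisulfite treatment turns each unconverted C into T in the read, and that a read counts as telomeric when it contains at least `min_blocks` such blocks. The regular expression, the function name `site_ucrs`, the threshold, and the toy reads are all hypothetical and do not reproduce the patterns shown in the figure.

```python
import re

# Assumed C-strand block: CCCTAA, where each of the three cytosines may
# appear as C (unconverted) or T (converted) after bisulfite treatment.
BLOCK = re.compile(r"([CT])([CT])([CT])TAA")


def site_ucrs(reads, min_blocks=3):
    """Return the fraction of unconverted C at each of the three sites.

    Reads with fewer than `min_blocks` telomeric blocks are ignored.
    Returns None when no telomeric block is found at all.
    """
    unconverted = [0, 0, 0]
    total_blocks = 0
    for read in reads:
        blocks = BLOCK.findall(read.upper())
        if len(blocks) < min_blocks:
            continue  # treat the read as non-telomeric
        total_blocks += len(blocks)
        for block in blocks:
            for site, base in enumerate(block):
                if base == "C":
                    unconverted[site] += 1
    if total_blocks == 0:
        return None
    return [count / total_blocks for count in unconverted]


# Toy reads: two telomeric reads with one unconverted C each, one other read.
reads = [
    "TTTTAATTTTAACTTTAATTTTAA",
    "TTTTAATCTTAATTTTAATTTTAA",
    "ACGTACGT",
]
ucrs = site_ucrs(reads)
print(ucrs)
```

If the cytosines at these sites are assumed to be unmethylated, one minus the per-site value gives a rough estimate of the conversion ratio at that site.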