Lijun Liu, Victor Missirian, Matthew Zinkgraf, Andrew Groover, Vladimir Filkov.
Abstract
BACKGROUND: One of the great advantages of next-generation sequencing is the ability to generate large genomic datasets for virtually all species, including non-model organisms. It should be possible, in turn, to apply advanced computational approaches to these datasets to develop models of biological processes. In a practical sense, working with non-model organisms presents unique challenges. In this paper we discuss some of these challenges for ChIP-seq and RNA-seq experiments using the undomesticated tree species of the genus Populus.
Year: 2014 PMID: 25081589 PMCID: PMC4120141 DOI: 10.1186/1471-2164-15-S5-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the challenges we faced in experimental design and data analysis.
Mapping Pt_input short sequencing reads to different P. trichocarpa genome assembly versions.
| Genome reference version | Genome length (Mb) | # of scaffolds | % | # of genes | # of transcripts | Gene model overlap with v3 | % of uniquely mapping reads | % of multiply mapping reads |
|---|---|---|---|---|---|---|---|---|
| Pt_v3 | 434.13 | 1,446 | 32.88 | 41,335 | 73,013 | 100% | 45.02 | 28.97 |
| Pt_v2 | 417.14 | 2,518 | 32.57 | 40,668 | 45,033 | 82.46% | 47.38 | 19.28 |
| Pt_v1 | 485.51 | 22,012 | 29.68 | 45,555 | 45,555 | 46.93% | 47.28 | 34.77 |
- Data retrieved from: Pt_v3 - phytozome/v9.1; Pt_v2 - phytozome/v8.0; Pt_v1 - phytozome/v4.1/1.1.
Comparison of cross-species mapping efficiency of short sequencing reads.
| File | % of uniquely mapping reads | % of multiply mapping reads | % of unmapped reads |
|---|---|---|---|
| Pt_input | 45.04 | 28.98 | 25.98 |
| Aspen_input | 29.79 | 24.41 | 45.8 |
| Pt_RNA pol II_r1 | 43.38 | 13.38 | 43.24 |
| Pt_RNA pol II_r2 | 61.69 | 27.72 | 10.59 |
| Aspen_RNA pol II | 29.44 | 22.20 | 48.36 |
| Pt_RNA-seq | 46.51 | 45.60 | 7.89 |
| Aspen_RNA-seq | 51.83 | 24.82 | 23.35 |
Distribution of short sequencing reads mapping in genic vs. non-genic regions.
| Region | Size (bp) | Size % in whole genome | % of uniquely mapped reads | % of multiply mapped reads |
|---|---|---|---|---|
| Genic region | 123,959,649 | 28.55 | 37.56 | 30.21 |
| Intergenic region | 310,175,213 | 71.45 | 62.44 | 69.79 |
| Whole genome | 434,134,862 | 100 | 100 | 100 |
% of uniquely mapped reads was calculated by counting total uniquely mapped reads as 100%.
% of multiply mapped reads was calculated by counting total multiply mapped reads as 100%.
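The genic vs. intergenic split is easier to read when each read share is divided by the region's share of the genome. The sketch below does only that arithmetic, using the values from the table above; the helper name `density_ratio` is illustrative and does not come from the source. A ratio above 1 would mean a region receives more reads than its size alone predicts.

```python
# Values copied from the genic vs. non-genic table (Pt_v3 genome).
GENOME_BP = 434_134_862
REGIONS = {
    # name: (size in bp, % of uniquely mapped reads, % of multiply mapped reads)
    "genic": (123_959_649, 37.56, 30.21),
    "intergenic": (310_175_213, 62.44, 69.79),
}


def density_ratio(read_pct: float, size_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Read share divided by genome share (hypothetical helper, not from the paper)."""
    size_pct = 100.0 * size_bp / genome_bp
    return read_pct / size_pct


for name, (size_bp, uniq_pct, multi_pct) in REGIONS.items():
    print(
        name,
        "unique:", round(density_ratio(uniq_pct, size_bp), 2),
        "multiple:", round(density_ratio(multi_pct, size_bp), 2),
    )
```

With these numbers, genic regions receive roughly 1.3 times the uniquely mapped reads expected from their size, while the multiply mapped reads track region size closely (ratios near 1).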
Effects of varying ratio of input control (C) to RNA pol II ChIP-seq (S) in MACS1.4 ChIP-seq peak calling.
| C/S ratio | 0 : 1 | 0.5 : 1 | 1 : 1 | 1.5 : 1 |
|---|---|---|---|---|
| #reads (S) | 13,075,907 | 13,075,907 | 13,075,907 | 13,075,907 |
| #reads (C) | 0 | 6,222,374 | 12,443,567 | 18,673,093 |
| #peaks | 11474 | 9322 | 13350 | 13292 |
| mean width (bp) | 933.5906 | 1106.532 | 877.0872 | 881.8897 |
| mean score | 320.9228 | 457.1019 | 486.15 | 486.0935 |
#reads was the total number of filtered reads passed into Bowtie2, "S" was the ChIP-seq sample, and "C" was the input control.
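The nominal C/S ratios in the column headers are approximate. A quick check, using only the read counts in the table above, gives the realized control-to-sample ratios; `realized_ratio` is an illustrative helper name and not part of any tool used in the paper.

```python
SAMPLE_READS = 13_075_907  # Pt_RNA pol II ChIP-seq reads (S), from the table
CONTROL_READS = {          # nominal C/S ratio -> input control reads (C)
    0.0: 0,
    0.5: 6_222_374,
    1.0: 12_443_567,
    1.5: 18_673_093,
}


def realized_ratio(control: int, sample: int) -> float:
    """Control reads per sample read."""
    return control / sample


for nominal, control in CONTROL_READS.items():
    ratio = realized_ratio(control, SAMPLE_READS)
    print(f"nominal {nominal:.1f} : 1 -> realized {ratio:.3f} : 1")
```

For this sample the realized ratios come out slightly below the nominal ones (about 0.48, 0.95 and 1.43).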
Effects of varying ratio of input control (C) to ARK1_3738 ChIP-seq (S) in MACS1.4 ChIP-seq peak calling.
| C/S ratio | 0 : 1 | 0.5 : 1 | 1 : 1 | 1.5 : 1 |
|---|---|---|---|---|
| #reads (S) | 67,680,143 | 67,680,143 | 67,680,143 | 67,680,143 |
| #reads (C) | 0 | 34,232,969 | 68,470,396 | 103,734,004 |
| #peaks | 19198 | 13961 | 15683 | 16526 |
| mean width (bp) | 704.9477 | 821.5005 | 794.4457 | 779.2845 |
| mean score | 498.1456 | 447.5715 | 535.8703 | 528.0994 |
#reads was the total number of filtered reads passed into Bowtie2, "S" was the ChIP-seq sample, and "C" was the input control.
Overlap of peaks returned in different ratios of input control to RNA pol II ChIP-seq.
| C/S ratio | 0 : 1 | 0.5 : 1 | 1 : 1 | 1.5 : 1 |
|---|---|---|---|---|
| 0 : 1 | 100 | 80.05 | 79.21 | 79.28 |
| 0.5 : 1 | 80.05 | 100 | 99.55 | 99.57 |
| 1 : 1 | 79.21 | 99.55 | 100 | 94.12 |
| 1.5 : 1 | 79.28 | 99.57 | 94.12 | 100 |
Numbers in the table are the percentage of peaks in the smaller set that overlap at least one peak in the larger set.
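The overlap measure described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes peaks are given as `(chromosome, start, end)` tuples with half-open coordinates and that sharing a single base pair counts as overlap, neither of which is stated in the source. The function name `pct_smaller_overlapping` is hypothetical.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end); half-open is an assumption


def pct_smaller_overlapping(a: list[Peak], b: list[Peak]) -> float:
    """Percent of peaks in the smaller set that overlap >= 1 peak in the larger set."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if not small:
        return 0.0

    # Group the larger set by chromosome so only same-chromosome peaks are compared.
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in large:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in small:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, ())):
            hits += 1
    return 100.0 * hits / len(small)


# Toy example with made-up coordinates.
set_a = [("chr1", 100, 200), ("chr1", 500, 600)]
set_b = [("chr1", 150, 250), ("chr1", 1000, 1100), ("chr2", 100, 200)]
print(pct_smaller_overlapping(set_a, set_b))
```

The pairwise scan is quadratic per chromosome, which keeps the sketch short; for peak sets of the size reported here, a sorted sweep or an interval tree would be the more practical choice.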
Overlap of peaks returned in different ratios of input control to ARK1_3738 ChIP-seq.
| C/S ratio | 0 : 1 | 0.5 : 1 | 1 : 1 | 1.5 : 1 |
|---|---|---|---|---|
| 0 : 1 | 100 | 97.93 | 95.75 | 95.76 |
| 0.5 : 1 | 97.93 | 100 | 99.68 | 99.79 |
| 1 : 1 | 95.75 | 99.68 | 100 | 98.78 |
| 1.5 : 1 | 95.76 | 99.79 | 98.78 | 100 |
Numbers in the table are the percentage of peaks in the smaller set that overlap at least one peak in the larger set.
Figure 2. Effects of p-value on ChIP-seq peak calling.
Effects of RNA pol II ChIP-seq coverage levels on peak calling.
| % of coverage | 25% | 40% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| #reads (S) | 3,270,142 | 5,230,586 | 6,536,212 | 9,805,292 | 13,075,907 |
| #reads (C) | 3,111,220 | 4,976,697 | 6,222,374 | 9,332,876 | 12,443,567 |
| #peaks | 5890 | 7912 | 9170 | 11508 | 13350 |
| mean width (bp) | 715.0667 | 702.0507 | 773.6713 | 729.8757 | 877.0872 |
| mean score | 341.7044 | 381.6625 | 409.0407 | 442.2795 | 486.15 |
#reads was the total number of filtered reads passed into Bowtie2, "S" was the ChIP-seq sample, and "C" was the input control.
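The read counts in the table are close to, but not exactly, the nominal fractions of the full dataset (for example, 25% of 13,075,907 would be about 3,268,977 reads, while the table lists 3,270,142). The source shown here does not say how the subsets were drawn. The sketch below shows one common way to produce such subsets, independent per-read random sampling, which is an assumption made for illustration; `downsample` is a hypothetical helper.

```python
import random


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (assumed scheme)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return [read for read in reads if rng.random() < fraction]


# Toy stand-in for a read set; real data would come from a FASTQ or BAM file.
reads = [f"read_{i}" for i in range(100_000)]
for fraction in (0.25, 0.40, 0.50, 0.75, 1.00):
    print(f"{fraction:.2f} -> {len(downsample(reads, fraction))} reads kept")
```

Under this scheme the number of reads kept varies slightly around the nominal fraction from run to run, and fixing the seed makes a given subset repeatable.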
Effects of ARK1_3738 ChIP-seq coverage levels on peak calling.
| % of coverage | 25% | 40% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| #reads (S) | 16,922,615 | 27,069,263 | 33,843,184 | 50,761,689 | 67,680,143 |
| #reads (C) | 17,112,375 | 27,388,564 | 34,232,969 | 51,341,520 | 68,470,396 |
| #peaks | 9991 | 12222 | 12959 | 14687 | 15683 |
| mean width (bp) | 552.914 | 686.6993 | 710.9952 | 854.1241 | 794.4457 |
| mean score | 349.4032 | 422.6096 | 458.1307 | 503.2893 | 535.8703 |
#reads was the total number of filtered reads passed into Bowtie2, "S" was the ChIP-seq sample, and "C" was the input control.
Figure 3. Effect of sequencing coverage on number of ChIP-seq peaks. The X-axis represents the number of detected peaks, while the Y-axis represents the percentage of reads from the 100% coverage dataset.
Figure 4. Effect of sequencing coverage on ChIP-seq peak robustness. The X-axis represents the percentage of sequencing reads, while the Y-axis represents the percentage of ChIP-seq peaks detected from the 100% coverage dataset.
Summary of ChIP-seq sequencing datasets.
| File | # of raw reads | # of reads after scythe and sickle | % of uniquely mapped reads | % of multiply mapped reads |
|---|---|---|---|---|
| Pt_input | 114,312,040 | 103,734,004 | 45.04 | 28.98 |
| IgG_r1 | 9,649,641 | 9,439,338 | 41.63 | 28.70 |
| IgG_r2 | 29,580,452 | 28,238,882 | 52.56 | 33.13 |
| Pt_RNA pol II_r1 | 13,436,432 | 13,075,907 | 43.38 | 13.38 |
| Pt_RNA pol II_r2 | 19,877,305 | 15,313,223 | 61.69 | 27.72 |
| ARK1_3738_r1 | 25,221,831 | 22,778,470 | 42.05 | 21.69 |
| ARK1_3738_r2 | 75,397,188 | 67,680,143 | 53.61 | 25.47 |
| ARK1_3940_r1 | 47,192,418 | 36,767,421 | 12.50 | 11.28 |
| ARK1_3940_r2 | 30,058,970 | 27,579,626 | 36.06 | 32.72 |
Summary of ChIP-seq MACS1.4 peaks.
| File | # of MACS1.4 peaks | mean MACS1.4 score | mean peak width (bp) |
|---|---|---|---|
| IgG_r1 | 197 | 141.4154 | 995.4619 |
| IgG_r2 | 493 | 122.3166 | 600.7343 |
| Pt_RNA pol II_r1 | 13350 | 486.15 | 877.0872 |
| Pt_RNA pol II_r2 | 8688 | 523.0676 | 815.1068 |
| ARK1_3738_r1 | 12155 | 382.06 | 555.9356 |
| ARK1_3738_r2 | 15683 | 535.8703 | 794.4457 |
| ARK1_3940_r1 | 2049 | 318.0036 | 513.0278 |
| ARK1_3940_r2 | 9076 | 193.6104 | 348.9233 |
Figure 5. Overview of IgG and RNA pol II ChIP-seq peak distribution in .
Overlap study of ChIP-seq peaks.
| File | IgG_r1 | IgG_r2 | Pt_RNA pol II_r1 | Pt_RNA pol II_r2 | ARK1_3738_r1 | ARK1_3738_r2 | ARK1_3940_r1 | ARK1_3940_r2 |
|---|---|---|---|---|---|---|---|---|
| IgG_r1 | 197 | | | | | | | |
| IgG_r2 | 12 | 493 | | | | | | |
| Pt_RNA pol II_r1 | 9 | 2 | 13350 | | | | | |
| Pt_RNA pol II_r2 | 16 | 69 | 6944 | 8688 | | | | |
| ARK1_3738_r1 | 118 | 39 | 3210 | 1610 | 12155 | | | |
| ARK1_3738_r2 | 104 | 87 | 4610 | 2512 | 11075 | 15683 | | |
| ARK1_3940_r1 | 152 | 122 | 127 | 168 | 741 | 871 | 2049 | |
| ARK1_3940_r2 | 157 | 344 | 1111 | 896 | 3253 | 3611 | 1361 | 9076 |
Numbers indicate the number of overlapping peaks between two ChIP-seq datasets; diagonal entries give the total number of peaks in each dataset.
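The raw overlap counts can be turned into a rough replicate concordance figure. The sketch below divides each replicate-pair overlap by the size of the smaller replicate's peak set, a normalisation borrowed from the footnotes of the C/S ratio tables and applied here as an assumption; the source does not report these percentages. `pct_of_smaller` is an illustrative name.

```python
# (peaks in replicate 1, peaks in replicate 2, overlapping peaks),
# all taken from the overlap table above (diagonal = totals).
REPLICATES = {
    "IgG": (197, 493, 12),
    "Pt_RNA pol II": (13350, 8688, 6944),
    "ARK1_3738": (12155, 15683, 11075),
    "ARK1_3940": (2049, 9076, 1361),
}


def pct_of_smaller(n1: int, n2: int, overlap: int) -> float:
    """Overlap as a percentage of the smaller peak set (assumed normalisation)."""
    return 100.0 * overlap / min(n1, n2)


for name, (n1, n2, overlap) in REPLICATES.items():
    print(f"{name}: {pct_of_smaller(n1, n2, overlap):.1f}%")
```

Under this normalisation the ARK1_3738 replicates agree most closely (about 91%), followed by Pt_RNA pol II (about 80%) and ARK1_3940 (about 66%), while the two IgG controls share only about 6% of the smaller set.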
Figure 6. Gene expression boxplots showing the difference in transcript levels between genes having a Pt_RNA pol II ChIP peak within 500 bp of the TSS (right) and genes without such a peak (left).
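The grouping behind this figure, genes with or without a peak within 500 bp of the TSS, can be sketched as below. The caption does not say whether distance is measured to the peak edge or to the peak summit; this sketch assumes the nearest peak edge, treats peak coordinates as inclusive, and assumes all inputs lie on the same chromosome. The function name and the toy coordinates are hypothetical.

```python
def peak_near_tss(tss: int, peaks: list[tuple[int, int]], window: int = 500) -> bool:
    """True if any peak (start, end) lies within `window` bp of the TSS.

    Distance is 0 when the TSS falls inside a peak, otherwise the distance
    to the nearest peak edge (an assumption; the source does not specify).
    """
    for start, end in peaks:
        if start <= tss <= end:
            distance = 0
        else:
            distance = min(abs(tss - start), abs(tss - end))
        if distance <= window:
            return True
    return False


# Toy example: split genes into the two groups compared in the boxplots.
peaks = [(900, 1100), (5400, 5600)]
genes = {"geneA": 1000, "geneB": 5000, "geneC": 9000}  # gene -> TSS position
with_peak = [g for g, tss in genes.items() if peak_near_tss(tss, peaks)]
without_peak = [g for g in genes if g not in with_peak]
print(with_peak, without_peak)
```

Expression values for the two resulting gene lists could then be compared, for example with side-by-side boxplots as in the figure.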