Lauren M McIntyre, Kenneth K Lopiano, Alison M Morse, Victor Amin, Ann L Oberg, Linda J Young, Sergey V Nuzhdin.
Abstract
BACKGROUND: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on its magnitude. The size of technical variance and the role of sampling are examined in this manuscript.
Year: 2011 PMID: 21645359 PMCID: PMC3141664 DOI: 10.1186/1471-2164-12-293
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Experimental Design. A figure showing the design of the three experiments evaluated here. Biological replicates are separate individuals used for library construction. Technical replicates for the D. melanogaster female heads and D. simulans male heads data are a single library run on multiple lanes. For D. melanogaster c167 cell lines the exact nature of the technical replication is uncertain.
Figure 2. Library construction and sequencing. Beginning with 100 ng of mRNA, the manufacturer's protocol is used to estimate a sampling fraction.
Mappable reads per lane in each of the three experiments.
| Experiment | BR | TR | Mappable Reads | Exons detected | Exons with an average coverage of more than 5 reads per nucleotide | Contigs present in all samples of each experiment |
|---|---|---|---|---|---|---|
| c167 | 1 | 1 | 5888686 | 39156 | 13432 | 19248 |
| c167 | 1 | 2 | 5951769 | 39202 | 13517 | 19248 |
| c167 | 1 | 3 | 7146461 | 39954 | 15684 | 19248 |
| c167 | 1 | 4 | 7544117 | 40201 | 16355 | 19248 |
| c167 | 1 | 5 | 7377032 | 40120 | 16089 | 19248 |
| D. sim. | 1 | 1 | 5174398 | 45878 | 14517 | 20339 |
| D. sim. | 1 | 2 | 4979485 | 45808 | 13912 | 20339 |
| D. sim. | 2 | 1 | 27595266 | 51701 | 35303 | 20339 |
| D. sim. | 2 | 2 | 28691914 | 51857 | 35942 | 20339 |
| D. sim. | 3 | 1 | 27601233 | 51834 | 34968 | 20339 |
| D. sim. | 3 | 2 | 27748704 | 51822 | 35008 | 20339 |
| D. mel. | 2 | 1 | 10584341 | 48114 | 13396 | 17864 |
| D. mel. | 2 | 2 | 13399722 | 49073 | 19916 | 17864 |
| D. mel. | 3 | 1 | 12065885 | 48281 | 14794 | 17864 |
| D. mel. | 3 | 2 | 11794255 | 48319 | 17961 | 17864 |
| D. mel. | 4 | 1 | 10375138 | 47812 | 15718 | 17864 |
| D. mel. | 4 | 2 | 9283979 | 47460 | 14344 | 17864 |
BR indicates a biological replicate and TR indicates a technical replicate. The experiments are described in Figure 1. There are a total of 60,277 exons corresponding to distinct genomic regions in Flybase 5. For D. melanogaster BR2 (TR1 versus TR2), 14,972 exons disagree by at least 1 log and 204 disagree by 2 or more logs.
Agreement between technical replicates and biological replicates for RPKM measured on FB 5.4 exons (n = 60,277).
| Experiment | Comparison | Number of exons in common | Exons detected in only one of the two replicates | Kappa for detection | Detected in one replicate, RPKM > 20 in the other replicate | Kappa on a 3 level scale | Kappa on a 9 level log scale | Number of exons where disagreement is greater than 2 logs | Number of exons that disagree 1 log or more |
|---|---|---|---|---|---|---|---|---|---|
| c167 | TR1-TR2 | 36602 | 5154 | 0.812 | 0 | 0.854 | 0.886 | 128 | 11596 |
| c167 | TR1-TR3 | 36937 | 5236 | 0.808 | 0 | 0.853 | 0.888 | 119 | 11395 |
| c167 | TR1-TR4 | 37102 | 5153 | 0.810 | 0 | 0.854 | 0.887 | 111 | 11562 |
| c167 | TR1-TR5 | 37037 | 5202 | 0.808 | 0 | 0.853 | 0.885 | 112 | 11686 |
| c167 | TR2-TR3 | 36974 | 5208 | 0.808 | 0 | 0.855 | 0.889 | 104 | 11285 |
| c167 | TR2-TR4 | 37102 | 5199 | 0.808 | 1 | 0.853 | 0.886 | 95 | 11600 |
| c167 | TR2-TR5 | 37039 | 5244 | 0.807 | 2 | 0.851 | 0.884 | 102 | 11779 |
| c167 | TR3-TR4 | 37514 | 5127 | 0.809 | 0 | 0.857 | 0.892 | 76 | 11026 |
| c167 | TR3-TR5 | 37470 | 5134 | 0.809 | 0 | 0.856 | 0.891 | 98 | 11051 |
| c167 | TR4-TR5 | 37626 | 5069 | 0.811 | 0 | 0.858 | 0.893 | 58 | 10869 |
| D. mel. | BR2:TR1-TR2 | 46123 | 4941 | 0.738 | 67 | 0.779 | 0.801 | 204 | 14972 |
| D. mel. | BR3:TR1-TR2 | 45942 | 4716 | 0.754 | 46 | 0.783 | 0.798 | 297 | 15122 |
| D. mel. | BR4:TR1-TR2 | 45310 | 4652 | 0.767 | 110 | 0.814 | 0.848 | 105 | 12206 |
| D. sim. | BR1:TR1-TR2 | 43614 | 4458 | 0.797 | 343 | 0.834 | 0.861 | 317 | 14590 |
| D. sim. | BR2:TR1-TR2 | 49941 | 3676 | 0.748 | 0 | 0.864 | 0.909 | 2 | 8530 |
| D. sim. | BR3:TR1-TR2 | 49983 | 3690 | 0.746 | 2 | 0.861 | 0.905 | 6 | 8648 |
| D. mel. | BR2-BR3 | 45675 | 5045 | 0.739 | 62 | 0.803 | 0.843 | 58 | 11266 |
| D. mel. | BR2-BR3 | 45715 | 5003 | 0.741 | 70 | 0.776 | 0.796 | 289 | 15304 |
| D. mel. | BR2-BR3 | 46274 | 4806 | 0.744 | 41 | 0.779 | 0.797 | 271 | 15149 |
| D. mel. | BR2-BR3 | 46381 | 4630 | 0.753 | 50 | 0.815 | 0.852 | 70 | 11775 |
| D. mel. | BR3-BR4 | 45612 | 4967 | 0.748 | 84 | 0.774 | 0.787 | 444 | 15891 |
| D. mel. | BR3-BR4 | 45387 | 4869 | 0.750 | 96 | 0.773 | 0.785 | 446 | 15938 |
| D. mel. | BR3-BR4 | 45723 | 4685 | 0.759 | 88 | 0.803 | 0.831 | 176 | 13434 |
| D. mel. | BR3-BR4 | 45450 | 4879 | 0.752 | 108 | 0.797 | 0.828 | 201 | 13718 |
| D. mel. | BR2-BR4 | 45459 | 5008 | 0.744 | 113 | 0.774 | 0.789 | 405 | 15754 |
| D. mel. | BR2-BR4 | 45200 | 5174 | 0.739 | 108 | 0.771 | 0.790 | 401 | 15808 |
| D. mel. | BR2-BR4 | 46067 | 4751 | 0.750 | 75 | 0.801 | 0.834 | 154 | 13104 |
| D. mel. | BR2-BR4 | 45846 | 4841 | 0.749 | 89 | 0.799 | 0.832 | 152 | 13312 |
| D. sim. | BR1-BR2 | 45442 | 6695 | 0.645 | 362 | 0.640 | 0.654 | 4936 | 29729 |
| D. sim. | BR1-BR2 | 45440 | 6855 | 0.635 | 346 | 0.640 | 0.658 | 4687 | 29659 |
| D. sim. | BR1-BR2 | 45341 | 6827 | 0.639 | 395 | 0.639 | 0.654 | 4909 | 29709 |
| D. sim. | BR1-BR2 | 45375 | 6915 | 0.633 | 402 | 0.640 | 0.658 | 4665 | 29629 |
| D. sim. | BR1-BR3 | 45444 | 6824 | 0.637 | 326 | 0.640 | 0.655 | 4835 | 29452 |
| D. sim. | BR1-BR3 | 45464 | 6772 | 0.640 | 346 | 0.642 | 0.655 | 4814 | 29486 |
| D. sim. | BR1-BR3 | 45368 | 6906 | 0.634 | 355 | 0.640 | 0.656 | 4761 | 29404 |
| D. sim. | BR1-BR3 | 45368 | 6894 | 0.635 | 343 | 0.640 | 0.656 | 4704 | 29438 |
| D. sim. | BR2-BR3 | 49844 | 3847 | 0.737 | 1 | 0.828 | 0.865 | 104 | 12328 |
| D. sim. | BR2-BR3 | 49835 | 3853 | 0.737 | 0 | 0.830 | 0.865 | 98 | 12327 |
| D. sim. | BR2-BR3 | 49955 | 3781 | 0.739 | 2 | 0.829 | 0.865 | 113 | 12376 |
| D. sim. | BR2-BR3 | 49891 | 3897 | 0.731 | 0 | 0.828 | 0.865 | 95 | 12413 |
All technical replicates were compared between biological replicates. For example, BR1 TR1 was compared to BR2 TR1, BR2 TR2, and BR2 TR3 for all possible pairwise comparisons. Agreement was assessed on three scales: whether exons are detected (defined as at least one read mapping to the exon); a 3 level scale of not detected, detected at a low level (0 < RPKM < 20), and detected at a high level (RPKM > 20); and a 9 level ordinal scale as follows: zero reads, RPKM less than 10, 20, 40, 80, 160, 320, 1000 and greater than 1000. Agreement was measured using a kappa coefficient. The number of exons where disagreement is greater than 2 logs and greater than one log are also given. Agreement for common contigs is in Additional file 9, Supplementary Table S3.
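The kappa columns can be illustrated with a minimal sketch. It assumes the unweighted Cohen's kappa (the source does not say whether a weighted variant was used), places RPKM exactly equal to 20 in the high category (the note leaves that boundary undefined), and uses made-up RPKM values; the function names are hypothetical.

```python
from collections import Counter


def rpkm_level3(rpkm):
    """Map RPKM to the 3 level scale: 0 = not detected, 1 = low, 2 = high.

    Assumption: RPKM exactly equal to 20 is treated as high.
    """
    if rpkm == 0:
        return 0
    return 1 if rpkm < 20 else 2


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed agreement: fraction of exons placed in the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both replicates used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical RPKM values for six exons in two technical replicates.
tr1 = [0.0, 3.2, 18.0, 25.0, 140.0, 0.0]
tr2 = [0.0, 4.1, 22.0, 30.0, 120.0, 1.5]
kappa3 = cohen_kappa([rpkm_level3(x) for x in tr1],
                     [rpkm_level3(x) for x in tr2])
```

The same `cohen_kappa` function applies to the detection scale (two categories) or the 9 level scale once each RPKM value has been mapped to its category.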
Figure 3. Coefficient of variation (CV) plotted on the Y axis and average depth per nucleotide (APN) on the X axis. Points with an average depth of greater than 1000 are not displayed. Panel A is D. simulans BR2 TR2. Panel B is D. melanogaster female heads BR2 TR1. Panel C is TR1 for cell line c167. Note that despite the difference in the number of mappable reads, the pattern of CV against the mean remains the same. CVs are very large when the average expression is low. Individual points represent exonic regions (Flybase 5.4). The cubic smoothing line was fit using R's smooth.spline function.
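As a rough illustration of the two quantities plotted in Figure 3, the sketch below computes APN and CV for a single exon. The caption does not define how CV was calculated, so this assumes CV is the standard deviation of per-nucleotide read depth within the exon divided by its mean; the depth vectors are invented and `apn_and_cv` is a hypothetical helper name.

```python
import statistics


def apn_and_cv(depths):
    """Return (APN, CV) for one exon.

    depths: read depth at each nucleotide of the exon.
    Assumption: CV = population standard deviation of depth / mean depth.
    CV is undefined (None) when no reads cover the exon.
    """
    apn = statistics.mean(depths)
    if apn == 0:
        return 0.0, None
    return apn, statistics.pstdev(depths) / apn


# Hypothetical exons: one sparsely covered, one deeply covered.
low_apn, low_cv = apn_and_cv([0, 1, 0, 2, 0, 0])
high_apn, high_cv = apn_and_cv([48, 52, 50, 47, 53, 50])
```

Under this assumption a sparsely covered exon yields a much larger CV than a deeply covered one, which is the pattern the caption describes.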
Figure 4. Scatterplot of technical replicates. (A) Points where RPKM is 1000 or less are displayed. The red line is the 45 degree line. Left panel is D. simulans male heads BR2, middle panel is D. melanogaster female heads BR2 and right panel is D. melanogaster cell line c167 TR3 vs TR4. Spearman correlation values are (0.95, 0.99, 0.96), respectively. (B) Scatterplot of technical replicates on the log scale (log(RPKM+1)) for RPKM values of less than 1000. The red line is the 45 degree line. Left panel is D. simulans male heads BR2, middle panel is D. melanogaster female heads BR2 and right panel is D. melanogaster cell line c167 TR3 vs TR4. Spearman correlation values are (0.95, 0.99, 0.96), respectively.
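The caption reports the same Spearman values for the raw and log-scale panels. One reason this can happen is that Spearman correlation depends only on ranks, and log(RPKM+1) is a strictly increasing transform that leaves ranks unchanged. The sketch below demonstrates this with a plain-Python Spearman implementation on invented RPKM values; it is an illustration, not the authors' code.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical RPKM values for six exons in two technical replicates.
tr1 = [0.0, 2.5, 2.5, 14.0, 90.0, 410.0]
tr2 = [0.3, 1.9, 3.0, 11.0, 120.0, 380.0]
rho_raw = spearman(tr1, tr2)
rho_log = spearman([math.log(v + 1) for v in tr1],
                   [math.log(v + 1) for v in tr2])
```

Here `rho_raw` and `rho_log` coincide because the transform preserves the ordering of every value, including ties.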
Agreement between technical replicates for biological replicate 2 D. melanogaster female heads.
| RPKM category | Diagonal: exons in the same category in TR1 and TR2 | TR1 total (row total) | TR2 total (column total) |
|---|---|---|---|
| Zero reads | 9213 | 12163 | 11204 |
| < 10 | 24845 | 31254 | 28628 |
| < 20 | 4884 | 7999 | 9560 |
| < 40 | 3293 | 4741 | 5889 |
| < 80 | 1669 | 2264 | 2797 |
| < 160 | 749 | 1039 | 1201 |
| < 320 | 357 | 500 | 583 |
| < 1000 | 241 | 261 | 352 |
| > 1000 | 54 | 56 | 63 |
| Total | | 60277 | 60277 |
RPKM was grouped into 9 categories on an approximate log10 scale. The categories were: zero reads, average less than 10, 20, 40, 80, 160, 320, 1000 and greater than 1000. Technical replicates 1 and 2 were compared. Values on the diagonal are where the ordinal categories agree and off-diagonal values are disagreements.
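The diagonal counts and marginal totals in this table can be cross-checked against the other tables with a few lines of arithmetic. The sketch below copies the values from the table; the variable names are illustrative only.

```python
TOTAL_EXONS = 60277

# Diagonal counts (same category in TR1 and TR2), zero reads through > 1000.
diagonal = [9213, 24845, 4884, 3293, 1669, 749, 357, 241, 54]
# Marginal totals per category for each technical replicate.
tr1_totals = [12163, 31254, 7999, 4741, 2264, 1039, 500, 261, 56]
tr2_totals = [11204, 28628, 9560, 5889, 2797, 1201, 583, 352, 63]

agree = sum(diagonal)             # exons in the same category in both replicates
disagree = TOTAL_EXONS - agree    # exons differing by at least one category

# Exons with zero reads should equal total exons minus exons detected
# (48114 for D. mel. BR2 TR1 and 49073 for BR2 TR2 in the mappable reads table).
undetected_tr1 = TOTAL_EXONS - 48114
undetected_tr2 = TOTAL_EXONS - 49073
```

With these inputs, `disagree` evaluates to 14972, matching the count of exons that disagree by 1 log or more reported for D. mel. BR2:TR1-TR2, and the undetected counts match the zero-reads totals 12163 and 11204.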
Figure 5. Bland-Altman plot showing the level of agreement between technical replicates for natural log transformed RPKM. On the Y axis is the difference between technical replicates and on the X axis is the average between technical replicates. Green lines are the average of all differences +/- 1.96 (standard deviation of the differences). The red line is drawn at zero. The blue line is a loess fit. The discrepancy between technical replicates is a function of the estimated expression level. The horizontal line is drawn at an average coverage per nucleotide of 5. Bland-Altman plots for all the remaining comparisons among technical replicates are in Additional file 11.
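The quantities drawn in a Bland-Altman plot can be sketched as follows. This assumes only exons with RPKM greater than zero in both replicates are passed in (the caption does not say how zeros were handled before the natural log transform) and uses the sample standard deviation; `bland_altman` is a hypothetical helper, and the loess line is not reproduced.

```python
import math
import statistics


def bland_altman(rep1, rep2):
    """Return (means, diffs, bias, (lower, upper)) on the natural log scale.

    rep1, rep2: positive RPKM values for the same exons in two replicates.
    bias is the average difference (centre of the green lines); lower and
    upper are bias -/+ 1.96 standard deviations of the differences.
    """
    logs1 = [math.log(v) for v in rep1]
    logs2 = [math.log(v) for v in rep2]
    means = [(a + b) / 2.0 for a, b in zip(logs1, logs2)]  # X axis
    diffs = [a - b for a, b in zip(logs1, logs2)]          # Y axis
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical RPKM values chosen so the log differences are 1, 0 and -1.
rep1 = [math.exp(2), math.exp(4), math.exp(6)]
rep2 = [math.exp(1), math.exp(4), math.exp(7)]
means, diffs, bias, limits = bland_altman(rep1, rep2)
```

Plotting `diffs` against `means` and adding horizontal lines at zero, `limits[0]` and `limits[1]` gives the basic layout described in the caption.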
Figure 6. Bland-Altman plot for simulated data. The data were log transformed and the average of the two technical replicates is on the X axis and the difference between technical replicates is on the Y axis. (A) Simulated replicates 1 versus 2. (B) Simulated replicates 1 versus 3. (C) Simulated replicates 2 versus 3. Green lines are the average of all differences +/- 1.96 (standard deviation of the differences). The red line is drawn at zero. The blue line is a lowess fit.