Yan Guo, Shilin Zhao, Quanhu Sheng, David C Samuels, Yu Shyr.
Abstract
BACKGROUND: High throughput sequencing technology enables both the human genome and the transcriptome to be screened at single-nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine the single nucleotide differences among five sources of high throughput sequencing data generated from the same individual: exome sequencing from blood, tumor and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue.
Keywords: DNA-RNA difference; RNA editing; Single nucleotide variant
Year: 2017 PMID: 28984205 PMCID: PMC5629567 DOI: 10.1186/s12864-017-4022-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Alignment summary
| Sample | Total reads | Mapped reads | Unmapped reads |
|---|---|---|---|
| TCGA-A7-A0D9-DNA_NB | 142,860,012 | 136,174,786 | 6,685,226 |
| TCGA-A7-A0D9-DNA_NT | 158,844,243 | 155,372,460 | 3,471,783 |
| TCGA-A7-A0D9-DNA_TP | 138,383,452 | 136,764,896 | 1,618,556 |
| TCGA-A7-A0D9-RNA_NT | 141,376,864 | 134,152,483 | 7,224,381 |
| TCGA-A7-A0D9-RNA_TP | 149,200,610 | 141,630,167 | 7,570,443 |
| TCGA-BH-A0B3-DNA_NB | 211,311,809 | 209,315,407 | 1,996,402 |
| TCGA-BH-A0B3-DNA_NT | 170,360,878 | 165,875,755 | 4,485,123 |
| TCGA-BH-A0B3-DNA_TP | 159,731,541 | 158,253,223 | 1,478,318 |
| TCGA-BH-A0B3-RNA_NT | 164,452,329 | 156,498,369 | 7,953,960 |
| TCGA-BH-A0B3-RNA_TP | 164,079,925 | 155,833,920 | 8,246,005 |
| TCGA-BH-A0B8-DNA_NB | 171,951,966 | 170,021,858 | 1,930,108 |
| TCGA-BH-A0B8-DNA_NT | 143,464,049 | 140,389,068 | 3,074,981 |
| TCGA-BH-A0B8-DNA_TP | 216,218,230 | 213,713,105 | 2,505,125 |
| TCGA-BH-A0B8-RNA_NT | 152,562,120 | 143,571,886 | 8,990,234 |
| TCGA-BH-A0B8-RNA_TP | 128,002,634 | 122,243,065 | 5,759,569 |
| TCGA-BH-A0BJ-DNA_NB | 147,410,868 | 145,768,369 | 1,642,499 |
| TCGA-BH-A0BJ-DNA_NT | 162,172,150 | 158,678,527 | 3,493,623 |
| TCGA-BH-A0BJ-DNA_TP | 143,442,013 | 141,770,778 | 1,671,235 |
| TCGA-BH-A0BJ-RNA_NT | 138,807,984 | 131,847,427 | 6,960,557 |
| TCGA-BH-A0BJ-RNA_TP | 149,966,756 | 144,440,232 | 5,526,524 |
| TCGA-BH-A0BM-DNA_NB | 159,310,853 | 156,835,192 | 2,475,661 |
| TCGA-BH-A0BM-DNA_NT | 165,501,253 | 162,838,285 | 2,662,968 |
| TCGA-BH-A0BM-DNA_TP | 119,192,355 | 117,149,967 | 2,042,388 |
| TCGA-BH-A0BM-RNA_NT | 149,007,565 | 138,576,725 | 10,430,840 |
| TCGA-BH-A0BM-RNA_TP | 117,977,848 | 100,498,089 | 17,479,759 |
| TCGA-BH-A0C0-DNA_NB | 176,208,298 | 173,440,163 | 2,768,135 |
| TCGA-BH-A0C0-DNA_NT | 177,261,968 | 172,796,230 | 4,465,738 |
| TCGA-BH-A0C0-DNA_TP | 143,339,652 | 141,217,919 | 2,121,733 |
| TCGA-BH-A0C0-RNA_NT | 189,543,380 | 180,211,183 | 9,332,197 |
| TCGA-BH-A0C0-RNA_TP | 125,992,620 | 118,740,948 | 7,251,672 |
| TCGA-BH-A0DK-DNA_NB | 160,749,783 | 158,782,935 | 1,966,848 |
| TCGA-BH-A0DK-DNA_NT | 158,654,513 | 155,523,188 | 3,131,325 |
| TCGA-BH-A0DK-DNA_TP | 178,103,631 | 175,051,156 | 3,052,475 |
| TCGA-BH-A0DK-RNA_NT | 191,328,391 | 184,115,083 | 7,213,308 |
| TCGA-BH-A0DK-RNA_TP | 143,488,953 | 136,143,128 | 7,345,825 |
| TCGA-BH-A0DP-DNA_NB | 157,712,348 | 155,347,716 | 2,364,632 |
| TCGA-BH-A0DP-DNA_NT | 167,557,587 | 163,348,435 | 4,209,152 |
| TCGA-BH-A0DP-DNA_TP | 168,097,321 | 165,554,381 | 2,542,940 |
| TCGA-BH-A0DP-RNA_NT | 169,655,182 | 159,641,615 | 10,013,567 |
| TCGA-BH-A0DP-RNA_TP | 136,210,380 | 129,171,483 | 7,038,897 |
| TCGA-BH-A0E0-DNA_NB | 151,357,163 | 141,201,519 | 10,155,644 |
| TCGA-BH-A0E0-DNA_NT | 159,040,104 | 156,614,176 | 2,425,928 |
| TCGA-BH-A0E0-DNA_TP | 130,825,456 | 129,444,757 | 1,380,699 |
| TCGA-BH-A0E0-RNA_NT | 146,561,149 | 136,899,519 | 9,661,630 |
| TCGA-BH-A0E0-RNA_TP | 111,749,610 | 105,126,617 | 6,622,993 |
| TCGA-BH-A0H7-DNA_NB | 170,784,467 | 168,285,144 | 2,499,323 |
| TCGA-BH-A0H7-DNA_NT | 173,665,210 | 169,363,318 | 4,301,892 |
| TCGA-BH-A0H7-DNA_TP | 156,659,296 | 154,959,185 | 1,700,111 |
| TCGA-BH-A0H7-RNA_NT | 154,651,936 | 146,599,114 | 8,052,822 |
| TCGA-BH-A0H7-RNA_TP | 186,962,558 | 179,990,244 | 6,972,314 |
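The mapped fraction per sample follows directly from the columns above. The short Python sketch below copies three rows from the table and computes that fraction; `mapping_rate` is a hypothetical helper name introduced here for illustration, not something from the study.

```python
# Rows copied from the alignment summary table:
# (sample, total reads, mapped reads, unmapped reads).
ROWS = [
    ("TCGA-A7-A0D9-DNA_NB", 142_860_012, 136_174_786, 6_685_226),
    ("TCGA-A7-A0D9-RNA_TP", 149_200_610, 141_630_167, 7_570_443),
    ("TCGA-BH-A0BM-RNA_TP", 117_977_848, 100_498_089, 17_479_759),
]


def mapping_rate(total, mapped):
    """Fraction of all reads that were mapped (hypothetical helper)."""
    return mapped / total


for sample, total, mapped, unmapped in ROWS:
    # Sanity check: mapped and unmapped reads should add up to the total.
    assert mapped + unmapped == total
    print(f"{sample}\t{mapping_rate(total, mapped):.4f}")
```

The same loop can be extended to every row of the table to compare mapped fractions across the DNA and RNA samples.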
Fig. 1 a The probability of detecting the alternative allele at a given depth under the binomial distribution. b The distribution of the alternative allele frequency. The expected value is 0.5; the observed median (red dotted line) is shifted a few percent to the left, caused by reference-preferential bias
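The detection probability in panel a can be sketched as follows, assuming a heterozygous site where each read carries the alternative allele independently with probability 0.5. The function name and the `min_alt_reads` threshold are illustrative assumptions; the record does not state which threshold the authors used.

```python
from math import comb


def prob_detect_alt(depth, min_alt_reads=1, alt_fraction=0.5):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, alt_fraction).

    depth         -- number of reads covering the site
    min_alt_reads -- assumed number of alt reads needed to call the allele
    alt_fraction  -- assumed chance that a read carries the alt allele
    """
    # Probability mass of seeing fewer alt reads than the threshold.
    p_fewer = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


for depth in (1, 5, 10, 20):
    print(depth, round(prob_detect_alt(depth, min_alt_reads=2), 4))
```

Lowering `alt_fraction` slightly below 0.5 would be one way to model the reference-preferential bias described in panel b.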
Fig. 2 Genotype consistency between any pair of sequencing data sets
Heterozygous genotype consistencies
| Sample A | Sample B | Strict (a): Consistency A (c) | Strict (a): Consistency B (d) | Loose (b): Consistency A (c) | Loose (b): Consistency B (d) |
|---|---|---|---|---|---|
| DNA-NB | DNA-NT | 0.99 | 0.97 | 0.99 | 0.98 |
| DNA-NB | DNA-TP | 0.98 | 0.99 | 0.99 | 0.99 |
| DNA-NB | RNA-NT | 0.90 | 0.80 | 0.91 | 0.83 |
| DNA-NB | RNA-TP | 0.84 | 0.83 | 0.87 | 0.85 |
| DNA-NT | DNA-TP | 0.96 | 0.98 | 0.97 | 0.99 |
| DNA-NT | RNA-NT | 0.89 | 0.79 | 0.90 | 0.82 |
| DNA-NT | RNA-TP | 0.82 | 0.83 | 0.86 | 0.85 |
| DNA-TP | RNA-NT | 0.90 | 0.79 | 0.91 | 0.82 |
| DNA-TP | RNA-TP | 0.84 | 0.83 | 0.87 | 0.85 |
| RNA-NT | RNA-TP | 0.82 | 0.87 | 0.86 | 0.91 |
(a) Strict: two genotypes are considered consistent only if their genotype calls from UnifiedGenotyper agree
(b) Loose: two genotypes are considered consistent if their alternative alleles are supported by at least 1 read with base quality (BQ) > 20 at that position
(c) The genotype consistency is computed with the number of heterozygous genotypes of sample A as the denominator
(d) The genotype consistency is computed with the number of heterozygous genotypes of sample B as the denominator
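The two denominators described in the footnotes can be illustrated with a minimal sketch. It assumes each sample's heterozygous calls are held as a set of `(chromosome, position, genotype)` tuples, so that a shared tuple means the calls agree (the strict criterion). The function name and the example sites are hypothetical.

```python
def het_consistency(het_a, het_b):
    """Return (consistency_A, consistency_B) for two sets of heterozygous calls.

    A call is shared when the same (chrom, pos, genotype) tuple appears in
    both sets. Consistency A divides by the number of calls in sample A,
    consistency B by the number of calls in sample B.
    """
    shared = len(het_a & het_b)
    cons_a = shared / len(het_a) if het_a else float("nan")
    cons_b = shared / len(het_b) if het_b else float("nan")
    return cons_a, cons_b


# Hypothetical calls, for illustration only.
sample_a = {("chr1", 100, "A/G"), ("chr1", 250, "C/T"),
            ("chr2", 40, "G/T"), ("chr3", 77, "A/C")}
sample_b = {("chr1", 100, "A/G"), ("chr1", 250, "C/T"),
            ("chr2", 40, "G/T"), ("chr4", 12, "C/G"), ("chr5", 9, "A/T")}

print(het_consistency(sample_a, sample_b))
```

Because the denominators differ, the two values are generally unequal, which is why the table reports both for every pair.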
Fig. 3 Background dinucleotide distribution computed from GRCh37
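A background dinucleotide distribution of this kind can be approximated by counting overlapping two-base windows in a reference sequence. The sketch below is one possible way to do so, not the authors' procedure: it assumes overlapping windows and skips any window containing a non-ACGT character such as N.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq.

    Windows containing anything other than A, C, G or T are skipped
    (an assumption made here to handle N and other ambiguity codes).
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if set(seq[i:i + 2]) <= VALID_BASES
    )
    total = sum(counts.values())
    return {dn: n / total for dn, n in counts.items()} if total else {}


# Toy sequence for illustration; a real run would stream GRCh37 chromosomes.
print(dinucleotide_frequencies("ACGTNACGGCC"))
```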
Fig. 4 Clustering and heatmaps based on upstream and downstream dinucleotide patterns. The comparisons separate clearly according to whether RNA is included
Upstream dinucleotide distribution
| Dinucleotide | DNA-NB vs DNA-NT | DNA-NB vs DNA-TP | DNA-NT vs DNA-TP | DNA-NB vs RNA-NT | DNA-NB vs RNA-TP | DNA-NT vs RNA-NT | DNA-NT vs RNA-TP | DNA-TP vs RNA-NT | DNA-TP vs RNA-TP | RNA-NT vs RNA-TP |
|---|---|---|---|---|---|---|---|---|---|---|
| CC | 9.37% | 9.45% | 9.33% | 10.05% | 10.29% | 9.84% | 10.07% | 10.00% | 10.17% | 10.33% |
| AA | 5.57% | 5.78% | 5.37% | 3.77% | 3.65% | 3.89% | 3.82% | 3.88% | 3.73% | 4.14% |
| CG | 6.54% | 7.71% | 7.04% | 10.51% | 11.27% | 10.11% | 10.39% | 10.16% | 10.69% | 9.03% |
| GC | 6.97% | 7.73% | 7.14% | 9.53% | 10.06% | 9.47% | 9.78% | 9.54% | 10.17% | 9.82% |
| TT | 5.29% | 4.60% | 5.11% | 3.41% | 3.21% | 3.57% | 3.43% | 3.47% | 3.32% | 3.99% |
| TA | 4.87% | 4.29% | 5.00% | 3.06% | 2.97% | 3.15% | 3.08% | 3.06% | 2.99% | 3.48% |
| AC | 6.24% | 7.32% | 6.58% | 7.52% | 7.33% | 7.49% | 7.26% | 7.69% | 7.31% | 8.11% |
| CA | 5.25% | 5.89% | 5.37% | 4.79% | 5.30% | 4.86% | 5.29% | 4.90% | 5.36% | 5.15% |
| TG | 6.20% | 5.25% | 6.15% | 5.54% | 5.22% | 5.72% | 5.34% | 5.52% | 5.26% | 5.02% |
| AT | 3.14% | 3.83% | 3.22% | 3.06% | 3.03% | 2.97% | 2.98% | 3.08% | 3.04% | 3.37% |
| TC | 6.14% | 6.83% | 6.26% | 6.57% | 6.86% | 6.44% | 6.78% | 6.47% | 6.84% | 7.10% |
| CT | 4.21% | 5.26% | 4.34% | 3.94% | 4.43% | 3.91% | 4.34% | 4.06% | 4.48% | 4.62% |
| GG | 15.93% | 10.01% | 14.95% | 13.14% | 11.31% | 13.42% | 12.41% | 12.94% | 11.34% | 10.51% |
| GA | 5.20% | 5.77% | 5.12% | 5.55% | 5.43% | 5.53% | 5.44% | 5.58% | 5.56% | 5.39% |
| AG | 5.09% | 5.62% | 5.04% | 5.34% | 5.34% | 5.42% | 5.33% | 5.41% | 5.40% | 5.23% |
| GT | 4.00% | 4.67% | 3.97% | 4.23% | 4.31% | 4.22% | 4.27% | 4.25% | 4.36% | 4.72% |
Downstream dinucleotide distribution
| Dinucleotide | DNA-NB vs DNA-NT | DNA-NB vs DNA-TP | DNA-NT vs DNA-TP | DNA-NB vs RNA-NT | DNA-NB vs RNA-TP | DNA-NT vs RNA-NT | DNA-NT vs RNA-TP | DNA-TP vs RNA-NT | DNA-TP vs RNA-TP | RNA-NT vs RNA-TP |
|---|---|---|---|---|---|---|---|---|---|---|
| AA | 5.15% | 5.11% | 5.18% | 3.17% | 3.23% | 3.34% | 3.41% | 3.26% | 3.33% | 3.67% |
| TA | 4.35% | 4.31% | 4.31% | 2.45% | 2.54% | 2.53% | 2.70% | 2.51% | 2.59% | 2.89% |
| GG | 10.55% | 10.20% | 10.50% | 15.58% | 14.40% | 15.24% | 14.17% | 15.56% | 14.50% | 17.17% |
| CG | 6.55% | 7.40% | 6.60% | 13.10% | 12.98% | 12.62% | 11.97% | 12.74% | 12.62% | 11.28% |
| TT | 5.25% | 5.05% | 5.19% | 3.27% | 3.24% | 3.45% | 3.46% | 3.33% | 3.29% | 3.33% |
| TG | 5.65% | 5.98% | 5.82% | 4.37% | 5.04% | 4.50% | 5.11% | 4.49% | 5.06% | 4.73% |
| TC | 5.34% | 5.57% | 5.61% | 4.73% | 5.24% | 4.81% | 5.27% | 4.78% | 5.30% | 4.17% |
| GC | 6.95% | 7.68% | 7.14% | 8.97% | 9.27% | 8.79% | 8.77% | 8.99% | 9.16% | 9.22% |
| AT | 3.47% | 4.00% | 3.51% | 2.43% | 2.51% | 2.45% | 2.52% | 2.44% | 2.50% | 2.73% |
| CT | 4.95% | 5.65% | 5.05% | 4.20% | 4.64% | 4.29% | 4.63% | 4.24% | 4.62% | 3.99% |
| CA | 5.76% | 5.05% | 5.59% | 5.12% | 4.67% | 5.21% | 4.86% | 5.13% | 4.71% | 4.82% |
| AG | 4.50% | 5.29% | 4.61% | 4.11% | 4.23% | 4.05% | 4.27% | 4.14% | 4.27% | 5.04% |
| GT | 6.71% | 7.62% | 7.04% | 6.44% | 7.00% | 6.46% | 6.78% | 6.48% | 7.10% | 6.34% |
| CC | 14.57% | 9.65% | 13.68% | 11.98% | 10.15% | 12.20% | 11.42% | 11.84% | 10.15% | 9.18% |
| GA | 6.18% | 6.70% | 6.33% | 6.20% | 6.73% | 6.26% | 6.63% | 6.18% | 6.67% | 7.23% |
| AC | 4.06% | 4.73% | 3.86% | 3.87% | 4.13% | 3.82% | 4.02% | 3.88% | 4.14% | 4.22% |
Fig. 5 Clustering and heatmap results based on nucleotide differences between any pair of samples. The comparisons separate clearly according to whether RNA is involved. The nucleotide changes can be categorized as transitions or transversions. Transitions (A-G and C-T) are heavily favored over transversions
Nucleotide differences between any pair of samples from the same subject
| Nucleotide difference | DNA-NB vs DNA-NT | DNA-NB vs DNA-TP | DNA-NT vs DNA-TP | DNA-NB vs RNA-NT | DNA-NB vs RNA-TP | DNA-NT vs RNA-NT | DNA-NT vs RNA-TP | DNA-TP vs RNA-NT | DNA-TP vs RNA-TP | RNA-NT vs RNA-TP |
|---|---|---|---|---|---|---|---|---|---|---|
| A-C | 16.52% | 11.98% | 15.64% | 11.70% | 10.34% | 12.06% | 11.42% | 11.52% | 10.42% | 7.25% |
| A-G | 25.52% | 29.60% | 26.36% | 27.93% | 30.00% | 27.87% | 29.34% | 28.21% | 30.19% | 32.46% |
| A-T | 6.52% | 7.70% | 6.53% | 5.21% | 4.86% | 5.14% | 4.82% | 5.25% | 4.88% | 4.72% |
| C-G | 9.83% | 9.87% | 9.83% | 8.15% | 8.57% | 8.39% | 8.77% | 8.09% | 8.48% | 6.33% |
| C-T | 24.36% | 28.73% | 25.31% | 35.70% | 36.01% | 35.04% | 34.86% | 35.81% | 35.92% | 41.61% |
| G-T | 17.25% | 12.11% | 16.34% | 11.30% | 10.22% | 11.51% | 10.80% | 11.12% | 10.12% | 7.63% |
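Under the standard definition, a change between two purines (A, G) or between two pyrimidines (C, T) is a transition and any other change is a transversion. The sketch below groups one column of the table above (DNA NB compared with RNA NT) on that basis. The helper name is hypothetical and the percentages are copied from the table.

```python
PURINES = {"A", "G"}


def is_transition(pair):
    """True for purine-purine or pyrimidine-pyrimidine pairs such as 'A-G'."""
    first, second = pair.split("-")
    return (first in PURINES) == (second in PURINES)


# Percent shares for the DNA NB / RNA NT comparison, copied from the table.
SHARES = {
    "A-C": 11.70, "A-G": 27.93, "A-T": 5.21,
    "C-G": 8.15, "C-T": 35.70, "G-T": 11.30,
}

transitions = sum(v for pair, v in SHARES.items() if is_transition(pair))
transversions = sum(v for pair, v in SHARES.items() if not is_transition(pair))

print(f"transitions:   {transitions:.2f}%")
print(f"transversions: {transversions:.2f}%")
```

Swapping in another column of the table gives the split for any other pair of samples.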
Fig. 6 a Boxplots showing the number of differences between DNA and RNA. b There were 41,529 DNA-RNA differences, of which 14,876 were reported in dbSNP and 877 in RNA editing databases. c Regional categorization of the DNA-RNA differences. d Functional categorization of the DNA-RNA differences
Regional Categories of DNA-RNA differences
| Categories | Number |
|---|---|
| downstream | 160 |
| exonic | 27,073 |
| intergenic | 3,623 |
| intronic | 2,839 |
| ncRNA_exonic | 958 |
| ncRNA_intronic | 663 |
| ncRNA_splicing | 7 |
| splicing | 42 |
| upstream | 160 |
| UTR3 | 5,319 |
| UTR5 | 644 |
Functional Categories of DNA-RNA differences
| Categories | Number |
|---|---|
| Nonsynonymous | 16,611 |
| Stopgain | 485 |
| Stoploss | 30 |
| Synonymous | 9,716 |
| Unknown | 249 |