Ryan Koehler, Hadar Issac, Nicole Cloonan, Sean M Grimmond.
Abstract
SUMMARY: Quantification applications of short-tag sequencing data (such as CNVseq and RNAseq) depend on knowing the uniqueness of specific genomic regions at a given threshold of error. Here, we present the 'uniqueome', a genomic resource for understanding the uniquely mappable proportion of genomic sequences. Pre-computed data are available for human, mouse, fly and worm genomes in both color-space and nucleotide-space, and we demonstrate the utility of this resource as applied to the quantification of RNAseq data. AVAILABILITY: Files, scripts and supplementary data are available from http://grimmond.imb.uq.edu.au/uniqueome/; the ISAS uniqueome aligner is freely available from http://www.imagenix.com/.
Year: 2010 PMID: 21075741 PMCID: PMC3018812 DOI: 10.1093/bioinformatics/btq640
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Color-space (CS-50-5) and nucleotide-space (NS-50-2) uniqueome plots visualized alongside RNAseq data. The same 50mer RNAseq tags were aligned using several specialized short-read aligners in both nucleotide-space (red) and color-space (green). The yellow region highlights an area with no uniqueome coverage (confirmed by BLAT as a multimapping region), where tags have been falsely declared as ‘uniquely mapping’ by the various aligners. No repetitive elements were detected by RepeatMasker. See Supplementary Material for details.
Proportions of unique start sites for nucleotide-space short tag alignments
| Species | 25 (1) (%) | 30 (1) (%) | 35 (1) (%) | 50 (2) (%) | 60 (3) (%) | 75 (4) (%) | 90 (5) (%) |
|---|---|---|---|---|---|---|---|
| Human (a) | 66.0 | 70.9 | 74.1 | 76.9 | 77.5 | 79.3 | 80.8 |
| Mouse (b) | 69.9 | 74.4 | 77.1 | 79.1 | 79.4 | 80.7 | 81.7 |
| Worm (c) | 85.3 | 87.7 | 89.0 | 89.8 | 89.9 | 90.6 | 91.1 |
| Fly (d) | 67.5 | 68.4 | 69.0 | 69.2 | 69.2 | 69.5 | 69.8 |
Columns shown are length of tag matched; numbers in parentheses represent the number of mismatches allowed.
(a) Build hg19.
(b) Build mm9.
(c) Build ce6.
(d) Build dm3.
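The idea of a "unique start site" can be illustrated with a minimal Python sketch. This is not the ISAS uniqueome aligner or the authors' pipeline: it assumes exact matching on the forward strand only (no mismatches, no reverse complement, no color-space), and the function name `unique_start_fraction` is hypothetical. It simply counts how many tag start positions in a sequence yield a tag that occurs nowhere else in that sequence.

```python
from collections import Counter


def unique_start_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer start sites whose k-mer occurs exactly once.

    Simplifying assumptions (not from the source): exact matches only,
    forward strand only, nucleotide-space only.
    """
    genome = genome.upper()
    n_sites = len(genome) - k + 1  # number of possible tag start positions
    if n_sites <= 0:
        return 0.0
    # Count every k-mer once, then check each start site against the counts.
    counts = Counter(genome[i:i + k] for i in range(n_sites))
    unique = sum(1 for i in range(n_sites) if counts[genome[i:i + k]] == 1)
    return unique / n_sites


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # toy sequence, purely illustrative
    for k in (4, 6):
        print(k, unique_start_fraction(toy, k))
```

On the toy sequence, the 4-mer `ACGT` occurs twice, so only 5 of 7 start sites are unique, while every 6-mer is unique. This mirrors the trend in the table, where longer tags give a larger uniquely mappable proportion. Allowing mismatches, as the published uniqueomes do, would require an approximate-matching search rather than exact counting.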
Fig. 2. A mirror image plot showing the relationship between the length of a gene and the unique length of a gene for color-space (red) and nucleotide-space (blue). The uniqueomes of human RefSeq genes (release 39) using hg19 coordinates were investigated for 50mer tags using two mismatches in nucleotide-space and five mismatches in color-space.
Strategies to deal with multimapping tags and their correlation to microarray data from the same RNA sample
| Method | Pearson correlation | 95% confidence interval |
|---|---|---|
| Raw tag counts (RPKM) | 0.38 | 0.35–0.41 |
| Non-unique tag rescue counts (RPKM) | 0.46 | 0.43–0.49 |
| Uniqueome normalized tag counts (RPKM) | 0.50 | 0.47–0.52 |
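The record does not state the exact formula behind "uniqueome normalized" counts. One plausible reading, sketched below as an assumption, is that the standard RPKM calculation is kept but the annotated gene length is replaced by the number of uniquely mappable start sites in the gene, with only uniquely mapped tags counted. The function names `rpkm` and `uniqueome_rpkm` and the example numbers are hypothetical.

```python
from typing import Optional


def rpkm(tag_count: int, length_bp: int, total_mapped_tags: int) -> Optional[float]:
    """Standard RPKM: tags per kilobase of length per million mapped tags."""
    if length_bp <= 0 or total_mapped_tags <= 0:
        return None  # undefined when there is no length or no mapped tags
    return tag_count * 1e9 / (length_bp * total_mapped_tags)


def uniqueome_rpkm(unique_tag_count: int, unique_start_sites: int,
                   total_mapped_tags: int) -> Optional[float]:
    """ASSUMED normalization: use the uniquely mappable length as the denominator.

    A gene with no uniquely mappable start sites returns None, because its
    expression cannot be estimated from uniquely mapping tags.
    """
    return rpkm(unique_tag_count, unique_start_sites, total_mapped_tags)


if __name__ == "__main__":
    # Illustrative numbers only: a 2000 bp gene of which 1000 start sites are unique.
    print(rpkm(100, 2000, 1_000_000))            # whole-length denominator
    print(uniqueome_rpkm(100, 1000, 1_000_000))  # unique-length denominator
```

Under this assumption, a gene that is only half uniquely mappable has its expression estimate doubled relative to the whole-length calculation, which compensates for tags that could never have been assigned to it uniquely.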