Abstract
Initial high-throughput RNA sequencing (RNA-Seq) experiments have revealed a complex and dynamic transcriptome, but because RNA-Seq samples transcripts in proportion to their abundances, assessing the extent and nature of low-level transcription with this technique has been difficult. A new assay, RNA CaptureSeq, addresses this limitation of RNA-Seq by enriching for low-level transcripts with cDNA tiling arrays prior to high-throughput sequencing. This approach reveals a plethora of transcripts that were previously dismissed as 'noise', and hints at single-cell transcription fingerprints that may be crucial in defining cellular function in normal and disease states.
Year: 2011 PMID: 22113004 PMCID: PMC3308029 DOI: 10.1186/gm290
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. To demonstrate the difficulty of accurate isoform-level abundance estimation for low-abundance genes, we simulated an RNA CaptureSeq experiment on the 18 isoforms of the dystrophin gene as annotated in RefSeq (hg19), shown in (a). For each number of 76 bp paired-end fragments that aligned to the gene, we estimated the abundance of each isoform using the online EM algorithm (for details of the model used see [8] and for details of the implementation see [11]). (b) The accuracy of isoform abundance estimation, measured as the Pearson correlation coefficient (r) between the logged relative abundance estimates and the true abundances used to generate the simulated data. The results are averaged over four simulations with different random abundance distributions. Because the isoforms are so similar, only 2.5% of fragments on average aligned uniquely to a single isoform, making the deconvolution particularly difficult. The bottom x-axis shows how many alignable paired-end fragments would be required in a genome-wide experiment to achieve the same r as in the CaptureSeq simulation. Here we assume 3.17 fragments per kilobase per million mapped reads (FPKM) for the gene, which is what we estimated from a sample ENCODE dataset [accession SRR065495].
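The quantities in the caption above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain batch EM rather than the online EM cited in [11], it ignores isoform length and fragment-position effects, and every function name, the toy compatibility lists, and the `transcript_length_bp` value are assumptions introduced here for illustration. The only formula taken as given is the definition of FPKM, FPKM = n / (length in kb × N / 10^6), rearranged to give the genome-wide fragment count N that would yield n fragments on the gene.

```python
import math


def em_isoform_abundances(compat, n_isoforms, n_iter=200):
    """Batch EM over fragment-to-isoform compatibilities (illustrative only).

    compat[f] lists the isoform indices that fragment f aligns to.
    Simplification: isoform lengths and fragment positions are ignored.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # uniform starting point
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        for isoforms in compat:
            total = sum(theta[i] for i in isoforms)
            if total == 0.0:
                continue  # fragment has no isoform with non-zero abundance
            for i in isoforms:
                # E-step: split the fragment among its compatible isoforms
                counts[i] += theta[i] / total
        assigned = sum(counts)
        if assigned == 0.0:
            break
        # M-step: renormalise expected counts into relative abundances
        theta = [c / assigned for c in counts]
    return theta


def pearson_of_logs(estimated, true, floor=1e-12):
    """Pearson r between logged abundances; `floor` avoids log(0)."""
    x = [math.log(max(v, floor)) for v in estimated]
    y = [math.log(max(v, floor)) for v in true]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def equivalent_genome_wide_fragments(n_gene_fragments, fpkm, transcript_length_bp):
    """Total mapped fragments N at which a gene with this FPKM yields n fragments.

    From FPKM = n / (length_kb * N / 1e6), so N = n * 1e6 / (fpkm * length_kb).
    """
    length_kb = transcript_length_bp / 1000.0
    return n_gene_fragments * 1e6 / (fpkm * length_kb)


if __name__ == "__main__":
    # Toy data: 5 fragments over 3 hypothetical isoforms, 2 of them ambiguous.
    toy_compat = [[0], [0], [0, 1], [1, 2], [2]]
    print(em_isoform_abundances(toy_compat, n_isoforms=3))
    # 14000 bp is a placeholder length, not a value from the source.
    print(equivalent_genome_wide_fragments(1000, fpkm=3.17, transcript_length_bp=14000))
```

With only a small share of fragments aligning uniquely, most fragments take the ambiguous branch of the E-step, which is why the estimates need many fragments before the correlation with the true abundances becomes high.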