
FRT-seq: amplification-free, strand-specific transcriptome sequencing.

Lira Mamanova, Robert M Andrews, Keith D James, Elizabeth M Sheridan, Peter D Ellis, Cordelia F Langford, Tobias W B Ost, John E Collins, Daniel J Turner.

Abstract

We report an alternative approach to transcriptome sequencing for the Illumina Genome Analyzer, in which the reverse transcription reaction takes place on the flowcell. No amplification is performed during the library preparation, so PCR biases and duplicates are avoided, and because the template is poly(A)(+) RNA rather than cDNA, the resulting sequences are necessarily strand-specific. The method is compatible with paired- or single-end sequencing.

Year:  2010        PMID: 20081834      PMCID: PMC2861772          DOI: 10.1038/nmeth.1417

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


Analysis of complementary DNA by next-generation sequencing (RNA-seq) enables us to build an accurate picture of active transcriptional patterns within an organism 1. The ideal RNA-seq protocol would be accurate, strand-specific, quantitative across a wide dynamic range, compatible with paired-end sequencing, and would detect antisense transcripts unambiguously 2,3. Some, but not all, of these requirements are met by existing methodologies. Neither polydeoxythymine priming nor random hexamer priming yields the strand-specific information that is essential for comprehensive annotation of the transcriptome 4 and identification of antisense transcription 5,6. Consequently, several strand-specific approaches to RNA-seq have been developed 3,7-11, but with the exception of Helicos’ ‘Direct RNA Sequencing’ approach 3, in each case the cDNA is amplified by the polymerase chain reaction (PCR), an inherently biased procedure 12. PCR-amplified libraries can have reduced complexity compared to the total mRNA pool, because different fragments tend to amplify with unequal efficiency. This causes drop-out of some RNA species and excessive amplification of others; such PCR duplicates are difficult to distinguish from genuinely abundant RNA species. To overcome these limitations, it is preferable to avoid library amplification altogether 3,13. Here we report an RNA-seq approach for the Illumina Genome Analyzer in which reverse transcription takes place on the flowcell surface (‘FRT-seq’; Supplementary Fig. 1, Supplementary Table 1 and Methods). The method is strand-specific, amplification-free, compatible with paired-end sequencing, and avoids any ambiguities that might arise from the addition of non-templated nucleotides by the reverse transcriptase 14: in our method, these will occur at the 3′ end of the adapter sequence and are therefore not sequenced.
To evaluate the performance of reverse transcriptase in the flowcell environment, we exploited the ability of this enzyme to use DNA as well as RNA as a template, and performed first strand synthesis on a PCR-amplified PhiX DNA library (Illumina, USA, cat. no. CT-901-1001). We then completed cluster generation and sequencing following the standard protocols. We calculated sequence coverage in ten-base bins, and compared it to that obtained from the same library following the standard protocol, in which Taq polymerase performs first strand synthesis. The two enzymes performed similarly (Supplementary Fig. 2a). We then divided the PhiX genome (mean % G+C = 44.7%) into low (< 44.7%) and high (> 44.7%) % G+C bins and calculated Spearman correlations between sequence coverage and % G+C for both bins using window sizes from 20 to 210 bp, at 10 bp intervals (Supplementary Table 2). We found a moderate positive correlation for both enzymes with the low % G+C bin, indicating underrepresentation of low % G+C sequences in the mapped sequence data, and a much weaker correlation at high % G+C. The correlation at low % G+C was stronger for Taq polymerase than for reverse transcriptase. Additionally, we found a moderate negative correlation between coverage difference for the two enzymes and % G+C content (Supplementary Table 2 and Supplementary Fig. 2b). Together, these results confirm that the reverse transcriptase is no less efficient at seeding clusters than Taq polymerase. There was no discernible difference in the percentage of reads mapping to the PhiX genome, or in the read quality of the sequences produced with either enzyme (data not shown).
We prepared two FRT-seq libraries using a human placental poly A+ RNA sample (Clontech, USA, cat. no. 636103), and prepared one paired-end flowcell for each library. We sequenced each for 2 × 37 cycles on an Illumina Genome Analyzer, generating 3.3 and 3.5 Gb of sequence.
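The % G+C analysis of the PhiX coverage data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the use of non-overlapping windows, and the handling of windows that fall exactly on the threshold are all hypothetical choices, and the per-base `depth` list is assumed to have been computed elsewhere.

```python
from statistics import mean

def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def gc_coverage_correlations(genome, depth, window, gc_threshold):
    """Correlate window GC fraction with mean depth, separately for
    windows below the threshold ("low") and at or above it ("high").
    Assumption: non-overlapping windows; the source does not say."""
    bins = {"low": [], "high": []}
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_fraction(genome[start:start + window])
        cov = mean(depth[start:start + window])
        bins["low" if gc < gc_threshold else "high"].append((gc, cov))
    out = {}
    for name, pairs in bins.items():
        if len(pairs) < 3:
            out[name] = None  # too few windows to correlate
        else:
            gcs, covs = zip(*pairs)
            out[name] = spearman(gcs, covs)
    return out
```

To mirror the window sweep in the text, one would call `gc_coverage_correlations(genome, depth, w, 0.447)` for `w` in `range(20, 211, 10)` and tabulate the results per enzyme.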
For comparison, we prepared two standard RNA-seq libraries from the same sample, using Illumina’s random priming protocol, and generated 1.6 Gb and 0.5 Gb of sequence. We mapped all reads to annotated genes from the ENSEMBL 15 database, normalized read counts and calculated Pearson correlations between libraries and between all lanes (Supplementary Table 3). FRT-seq was highly reproducible, with a Pearson correlation of 0.993 between the datasets obtained from separate libraries (Fig. 1). Correlations between individual lanes from the same FRT-seq library were close in value to this figure (0.998-1.000), indicating that the slight discrepancy that exists is due to sampling bias, rather than stochastic systematic biases in the library preparation and RT reactions. The correlation between standard RNA-seq libraries was very high between lanes from the same library (approximately 1.000), but lower between libraries (0.866), presumably reflecting stochastic amplification biases incurred during the library preparation PCR (Supplementary Fig. 3a-f). The comparatively poor technical reproducibility is not necessarily representative of the Illumina standard RNA-seq library preparation method per se, but indicates that care must be taken to ensure consistent results throughout the library preparation. Alternative approaches to RNA-seq have been reported 8,11, in which very good technical reproducibility was demonstrated (Pearson correlations = 0.98-0.99), but against which our FRT-seq method still compares favorably.
Figure 1

Correlation plots for FRT-seq libraries

We plotted sequence data obtained from two FRT-seq libraries prepared from the same poly A+ RNA sample. We mapped all reads to annotated genes from the ENSEMBL database, normalized read counts and calculated Pearson correlations between the libraries. RPKM = reads per kilobase of sequence per million reads.
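The normalization and correlation behind this comparison can be sketched as follows. This is a minimal illustration, assuming per-gene read counts and gene lengths are already available as dictionaries; the function names are hypothetical, and whether the authors correlated raw or log-transformed values is not stated in the text (raw RPKM is used here).

```python
from statistics import mean

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def rpkm_table(counts, lengths):
    """RPKM for every gene in one library; the library total is taken
    as the sum of its per-gene counts (an assumption of this sketch)."""
    total = sum(counts.values())
    return {gene: rpkm(n, lengths[gene], total) for gene, n in counts.items()}

def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def library_correlation(counts_a, counts_b, lengths):
    """Pearson correlation of RPKM over genes present in both libraries."""
    genes = sorted(set(counts_a) & set(counts_b))
    a = rpkm_table(counts_a, lengths)
    b = rpkm_table(counts_b, lengths)
    return pearson([a[g] for g in genes], [b[g] for g in genes])
```

Because RPKM divides by the library total, two libraries that differ only in sequencing depth give identical RPKM values and a correlation of 1, which is the property that makes libraries of different size (here 3.3 Gb and 3.5 Gb) comparable.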

The percentage of duplicate reads is low for the FRT-seq libraries (6.1% and 7.2% for libraries FRT1 and FRT2 respectively; Supplementary Table 4), but is higher and varies appreciably between standard libraries (94.1% and 39.7% for libraries STD1 and STD2 respectively). Regardless of the causative mechanism, duplicate sequences will be more prevalent for more abundant transcripts. Calculating the frequency of positions at which one or more duplicate sequences are observed, we obtained 2.2% for each FRT-seq library and 74.2% and 13.9% for the two standard RNA-seq libraries respectively. The fragmentation methods are identical between standard and FRT-seq libraries, indicating that the observed difference in duplication frequency between library types is largely due to PCR bias.
To evaluate the influence of template % G+C on read depth, we divided sequences obtained by both methods into bins of % G+C, for the entire mapped fragment. Sequences generated by the PCR-based standard method appear to be biased away from lower % G+C towards a more neutral % G+C, compared to the FRT-seq data (Supplementary Fig. 4a, b). This mirrors the effect of PCR on genomic DNA 12. For both methods, we assessed the evenness of sequence coverage along the length of genes, both in their entirety (Supplementary Fig. 5a, b) and across individual exons (Supplementary Fig. 5c-g). Representation was observed to be more even in the FRT-seq libraries compared to standard libraries.
To determine how closely the FRT-seq data correlated with microarray-derived expression data, we ran the poly A+ RNA sample on Human Expression BeadChips (Illumina) in triplicate, and compared the results to transcript counts obtained from FRT-seq and standard RNA-seq libraries (Supplementary Fig. 6).
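The two duplication measures quoted above (percentage of duplicate reads, and percentage of positions carrying one or more duplicates) can be illustrated with a short sketch. The definition of a duplicate used here is an assumption: mapped fragments with identical coordinates, represented as hypothetical `(chromosome, strand, start, end)` tuples. The source does not spell out the exact criterion.

```python
from collections import Counter

def duplication_stats(fragments):
    """Return (percent duplicate reads, percent positions with duplicates).

    fragments: iterable of hashable fragment keys, for example
    (chromosome, strand, start, end) tuples for mapped read pairs.
    A read is counted as a duplicate if an identical key was already seen.
    """
    counts = Counter(fragments)
    total = sum(counts.values())
    distinct = len(counts)
    if total == 0:
        return 0.0, 0.0
    duplicate_reads = total - distinct
    duplicated_positions = sum(1 for c in counts.values() if c > 1)
    return (100.0 * duplicate_reads / total,
            100.0 * duplicated_positions / distinct)
```

The two numbers diverge when a few positions are heavily over-sampled: many duplicate reads concentrated at few positions raise the first measure far more than the second, which is one way to read the gap between 94.1% and 74.2% for library STD1.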
The Pearson correlation between transcription levels derived from array data and those obtained from FRT-seq (0.676) was substantially better than that between array data and the standard RNA-seq libraries (0.482), indicating that FRT-seq is the more quantitative approach. Correlations between individual standard RNA-seq libraries and array data differed slightly, reflecting differences in library quality (0.423 and 0.493 for libraries STD1 and STD2 respectively), whereas those for libraries FRT1 and FRT2 were in close agreement (0.676 and 0.674 respectively). These correlations are lower than has been reported previously for standard libraries 16. The arrays used in our study, Illumina HumanWG-6 v3 Expression BeadChips, were designed to detect mainly the 3′ end of transcripts, whereas the FRT-seq data represents their entirety, making the two types of data difficult to normalize, and hindering direct comparison. Additionally, the background signal of arrays may contribute to the failure of sequence and array data to correlate perfectly 16. Nevertheless, our results reveal that PCR amplification bias is a major cause of discordance between array and sequence data. Tables of called genes and read counts from both FRT-seq and standard libraries are available at ftp://ftp.sanger.ac.uk/pub/transseq.
Sequences obtained using FRT-seq are necessarily strand-specific. To demonstrate this, we mapped all reads to the NCBI build 36 version of the human genome and created forward and reverse strand .wig files, for viewing in the Integrated Genome Browser (IGB, http://www.affymetrix.com/partners_programs/programs/developer/tools/igbsource_terms.affx; Fig. 2). The majority of reads produced by FRT-seq mapped with +− orientation, the first read corresponding to the sense strand and the second read corresponding to the antisense strand. For the standard, non-directional libraries, reads map to both strands with similar frequency (Supplementary Figures 7a, b and Supplementary Table 5).
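The orientation labels used above can be made concrete with a small sketch that expresses each read pair relative to the annotated gene strand. The function names and the input representation (strand characters for read 1, read 2 and the overlapping gene) are hypothetical, and ASCII `-` stands in for the minus sign used in the text.

```python
from collections import Counter

FLIP = {"+": "-", "-": "+"}

def pair_orientation(read1_strand, read2_strand, gene_strand):
    """Orientation of a read pair relative to the gene annotation.

    Returns "+-" when read 1 matches the sense strand and read 2 the
    antisense strand, "-+" for the opposite (antisense) arrangement.
    """
    if gene_strand == "-":
        # Express both reads in the gene's own frame of reference.
        read1_strand, read2_strand = FLIP[read1_strand], FLIP[read2_strand]
    return read1_strand + read2_strand

def orientation_percentages(pairs):
    """Percentage of pairs in each orientation class.

    pairs: iterable of (read1_strand, read2_strand, gene_strand) tuples.
    """
    counts = Counter(pair_orientation(*p) for p in pairs)
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()} if total else {}
```

For a strand-specific library most pairs should fall in the `+-` class, whereas a non-directional library would be expected to split roughly evenly between `+-` and `-+`.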
Figure 2

Strand specificity of FRT-seq

Sequences generated by FRT-seq were mapped against the human genome. The .wig files are displayed in IGB, though the colours were modified for clarity (dark red). For comparison, sequences made using the standard RNA-seq library preparation protocols and flowcell amplification are also shown (blue). Below is a representation of the region of human chromosome 1p36, and beneath this genes are shown in Ensembl together with the strands from which the transcript is produced.

An appreciable percentage of reads mapped in the −+ orientation (2.55%) relative to the gene annotation. This is the least likely combination to arise from chimaerism, but would be expected for antisense transcripts. The value is highly consistent between the different libraries and between different lanes within the same library. Approximately 40% of sequences mapping within the 1 kb upstream regions are in the antisense orientation, compared to < 3% overall, indicating significant enrichment of antisense reads in the promoter regions (2-tailed p < 0.0001, Fisher’s exact test), consistent with their being genuine antisense transcripts 6 (Supplementary Table 6). A reasonably high proportion of sequences mapped to intergenic regions, both for FRT-seq and standard RNA-seq libraries. When FRT-seq was performed on zebrafish ovary poly A+ RNA, mapping to Zv8, very few intergenic sequences were evident (Supplementary Figure 8). It is possible that the commercial human placental poly A+ RNA sample may have been contaminated with DNA or unspliced RNA, or that the human gene annotations in the ENSEMBL database are incomplete 16.
To conclude, FRT-seq enables amplification-free RNA-seq, generates sequences that are strand-specific and compatible with paired-end sequencing, and presents no opportunity for the formation of intermolecular priming artifacts. We anticipate that this method will prove to be the method of choice for transcriptome sequencing in the future.
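The enrichment test mentioned above (two-tailed Fisher's exact test on antisense versus sense reads, upstream regions versus elsewhere) can be sketched with the standard library alone. This is a generic textbook implementation, not the authors' code; the function name is hypothetical and the actual read counts are in Supplementary Table 6, which is not reproduced here.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    For example: a = antisense reads upstream, b = sense reads upstream,
    c = antisense reads elsewhere, d = sense reads elsewhere.
    The p-value sums the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * tolerance)
    return min(1.0, p_value)
```

As a purely illustrative call with made-up counts (not data from the study), `fisher_exact_two_tailed(40, 60, 3, 97)` compares 40% antisense in one group with 3% in another and returns a p-value far below 0.0001. The loop visits every possible table, so this sketch suits modest counts; for millions of reads a library routine would be the practical choice.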
  22 in total

1.  Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors:  Heng Li; Jue Ruan; Richard Durbin
Journal:  Genome Res       Date:  2008-08-19       Impact factor: 9.043

2.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors:  John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal:  Genome Res       Date:  2008-06-11       Impact factor: 9.043

3.  Stem cell transcriptome profiling via massive-scale mRNA sequencing.

Authors:  Nicole Cloonan; Alistair R R Forrest; Gabriel Kolle; Brooke B A Gardiner; Geoffrey J Faulkner; Mellissa K Brown; Darrin F Taylor; Anita L Steptoe; Shivangi Wani; Graeme Bethel; Alan J Robertson; Andrew C Perkins; Stephen J Bruce; Clarence C Lee; Swati S Ranade; Heather E Peckham; Jonathan M Manning; Kevin J McKernan; Sean M Grimmond
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

4.  Direct RNA sequencing.

Authors:  Fatih Ozsolak; Adam R Platt; Dan R Jones; Jeffrey G Reifenberger; Lauryn E Sass; Peter McInerney; John F Thompson; Jayson Bowers; Mirna Jarosz; Patrice M Milos
Journal:  Nature       Date:  2009-09-23       Impact factor: 49.962

5.  A high-resolution map of transcription in the yeast genome.

Authors:  Lior David; Wolfgang Huber; Marina Granovskaia; Joern Toedling; Curtis J Palm; Lee Bofkin; Ted Jones; Ronald W Davis; Lars M Steinmetz
Journal:  Proc Natl Acad Sci U S A       Date:  2006-03-28       Impact factor: 11.205

6.  Highly integrated single-base resolution maps of the epigenome in Arabidopsis.

Authors:  Ryan Lister; Ronan C O'Malley; Julian Tonti-Filippini; Brian D Gregory; Charles C Berry; A Harvey Millar; Joseph R Ecker
Journal:  Cell       Date:  2008-05-02       Impact factor: 41.582

Review 7.  RNA-Seq: a revolutionary tool for transcriptomics.

Authors:  Zhong Wang; Mark Gerstein; Michael Snyder
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

8.  Quantification of the yeast transcriptome by single-molecule sequencing.

Authors:  Doron Lipson; Tal Raz; Alix Kieu; Daniel R Jones; Eldar Giladi; Edward Thayer; John F Thompson; Stan Letovsky; Patrice Milos; Marie Causey
Journal:  Nat Biotechnol       Date:  2009-07-05       Impact factor: 54.908

9.  The transcriptional landscape of the mammalian genome.

Authors:  P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal:  Science       Date:  2005-09-02       Impact factor: 47.728

10.  Transcriptome analysis by strand-specific sequencing of complementary DNA.

Authors:  Dmitri Parkhomchuk; Tatiana Borodina; Vyacheslav Amstislavskiy; Maria Banaru; Linda Hallen; Sylvia Krobitsch; Hans Lehrach; Alexey Soldatov
Journal:  Nucleic Acids Res       Date:  2009-07-20       Impact factor: 16.971

