
HTqPCR: high-throughput analysis and visualization of quantitative real-time PCR data in R.

Abstract

MOTIVATION: Quantitative real-time polymerase chain reaction (qPCR) is routinely used for RNA expression profiling, validation of microarray hybridization data and clinical diagnostic assays. Although numerous statistical tools are available in the public domain for the analysis of microarray experiments, this is not the case for qPCR. Proprietary software is typically provided by instrument manufacturers, but these solutions are not amenable to the tandem analysis of multiple assays. This is problematic when an experiment involves more than a simple comparison between a control and treatment sample, or when many qPCR datasets are to be analyzed in a high-throughput facility.
RESULTS: We have developed HTqPCR, a package for the R statistical computing environment, to enable the processing and analysis of qPCR data across multiple conditions and replicates.
AVAILABILITY: HTqPCR and user documentation can be obtained through Bioconductor or at http://www.ebi.ac.uk/bertone/software.
CONTACT: bertone@ebi.ac.uk

Year: 2009 PMID: 19808880 PMCID: PMC2788924 DOI: 10.1093/bioinformatics/btp578

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Quantitative real-time polymerase chain reaction (qPCR) is widely used for the detection of specific nucleic acids, measurement of RNA transcript abundance and validation of high-throughput experimental results. qPCR is often performed in standard 96-well plates, and newer instruments can utilize higher density formats. These include the Roche LightCycler, which can accommodate 384-well thermocycler blocks, and the Applied Biosystems TaqMan machines employing 384-well Low Density Array (TLDA) micro-fluidic cards. The technology relies on fluorescence data as a measure of RNA or DNA template concentration, represented by cycle threshold (Ct) values determined at the initial phase of exponential amplification. The calculation of fold changes between genes often entails only limited comparisons of Ct values across two conditions, and omits statistical testing of the significance of observed differences. We have developed a package for high-throughput analysis of qPCR data (HTqPCR) within the R/Bioconductor framework (Gentleman et al., 2004). The software performs quality assessment, normalization, data visualization and statistical significance testing for Ct values between features (genes and microRNAs) across multiple biological conditions, such as different cell culture treatments, comparative expression profiles or time-series experiments.

2 SOFTWARE FEATURES

HTqPCR is developed for the R statistical computing environment (www.r-project.org), will run on all major platforms and is available as open source. Core R and Bioconductor packages are the only software dependencies and the package includes a detailed tutorial.

2.1 Data input requirements

The input data format consists of tab-separated text files containing Ct values, feature identifiers (genes, microRNAs, etc.) and other (optional) information. Data files can be user-formatted plain text or the direct output of Sequence Detection Systems (SDS) software. Internally, this information is embodied as instances of the qPCRset class, which are analogous to the ExpressionSet objects typically used to represent fluorescence data in microarray analyses.
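As a rough illustration of the kind of tab-separated input described above, the following Python sketch reads feature identifiers and cycle threshold values into a dictionary. It is not the package's own reader (HTqPCR is an R package); the column names `feature` and `Ct`, the function name `read_ct_table` and the example rows are all assumptions made for this example.

```python
import csv
import io


def read_ct_table(handle, feature_col="feature", ct_col="Ct"):
    """Read a tab-separated table and return {feature: [Ct, ...]}.

    Column names are hypothetical. Non-numeric entries such as
    'Undetermined' are stored as None so they can be flagged later.
    """
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            ct = float(row[ct_col])
        except ValueError:
            ct = None
        table.setdefault(row[feature_col], []).append(ct)
    return table


# Made-up example content standing in for a real input file.
example = "feature\tCt\nGene1\t24.5\nGene2\tUndetermined\nGene1\t25.1\n"
ct_table = read_ct_table(io.StringIO(example))
print(ct_table)
```

Keeping missing measurements as explicit `None` entries, rather than dropping the rows, preserves the information needed for later quality flagging.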

2.2 Visualization features

HTqPCR contains multiple functions for data visualization. Subsets of genes across one or more samples can be represented in bar plots, displaying either absolute Ct values or fold changes compared with a calibrator sample (Fig. 1). Data quality control across samples can be assessed via diagnostic aids such as density distributions, box plots, scatterplots and histograms, some of which can be stratified according to various attributes of the features (Fig. 2). When qPCR assays are performed in multiwell plates or another spatially defined layout, the Ct values can be plotted accordingly to visualize any spatial artifacts such as edge effects (Fig. 3). Clustering of samples or genes can be performed using principal component analysis, heatmaps or dendrograms.
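The spatial view of a plate can be sketched numerically as well as graphically. The Python example below is an illustration only, not code from the package: it assumes the wells of a 384-well plate (16 rows by 24 columns) are listed in row-major order, and the helper names `to_plate` and `edge_vs_interior` and the synthetic data are hypothetical. It compares the mean cycle threshold of the outermost wells with that of the interior wells as a crude check for edge effects.

```python
from statistics import mean

ROWS, COLS = 16, 24  # standard 384-well plate geometry


def to_plate(values, rows=ROWS, cols=COLS):
    """Reshape a flat, row-major list of Ct values into a rows x cols grid."""
    if len(values) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(values)))
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def edge_vs_interior(plate):
    """Return (mean Ct of outermost wells, mean Ct of all other wells)."""
    rows, cols = len(plate), len(plate[0])
    edge, interior = [], []
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            (edge if on_edge else interior).append(plate[r][c])
    return mean(edge), mean(interior)


# Synthetic plate: interior wells at Ct 25, edge wells shifted by one cycle.
flat = [
    26.0 if (i // COLS in (0, ROWS - 1) or i % COLS in (0, COLS - 1)) else 25.0
    for i in range(ROWS * COLS)
]
edge_mean, interior_mean = edge_vs_interior(to_plate(flat))
print(edge_mean, interior_mean)
```

A clear gap between the two means on real data would suggest inspecting the plate layout plot more closely before normalization.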

Fig. 1. Log2 ratios between the normalized Ct values for four different sample groups, relative to the calibrator (Group 1; ratio=0.0). Error bars indicate the 90% confidence interval compared with the average calibrator Ct.

Fig. 2. Box plot of Ct values across all samples, stratified based on the class membership of each gene (A) and the distribution of Ct values across samples after normalization using three different methods (B).

Fig. 3. Ct values for a typical qPCR assay performed in 384-well format. Gray wells overlaid with crosses were flagged as ‘Undetermined’.

2.3 Ct quality control

Individual Ct values are a principal source of uncertainty in qPCR results. This uncertainty can arise from inherent bias in the amplification conditions (variable primer annealing, amplicon sequence content, suboptimal reaction temperature or salt concentration, etc.), or when initial template concentrations are insufficient to generate copy numbers exceeding the minimum detection threshold. In HTqPCR, the reliability of Ct values can be assessed either individually or across replicates. Through user-adjustable parameters, each value is flagged as one of ‘OK’, ‘Undetermined’ or ‘Unreliable’, and this information is propagated throughout the analysis. Non-specific filtering can be applied to remove genes that are marked ‘Undetermined’ and/or ‘Unreliable’ across samples, or those having low variability (i.e. not differentially expressed) after normalization.
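The flagging scheme can be illustrated with a minimal Python sketch. The rules and thresholds below (`ct_min`, `ct_max`, `max_sd`) are assumptions chosen for the example and should not be read as the package's defaults or its exact logic; the function name `flag_replicates` is likewise hypothetical.

```python
import statistics


def flag_replicates(cts, ct_min=10.0, ct_max=35.0, max_sd=1.0):
    """Return one flag per replicate Ct value.

    All three thresholds are illustrative assumptions. A value of None
    stands for a well in which no amplification was detected.
    """
    flags = []
    for ct in cts:
        if ct is None or ct > ct_max:
            flags.append("Undetermined")
        elif ct < ct_min:
            flags.append("Unreliable")
        else:
            flags.append("OK")
    # Replicate-level check: if the usable values disagree too much,
    # downgrade them from 'OK' to 'Unreliable'.
    ok_values = [ct for ct, flag in zip(cts, flags) if flag == "OK"]
    if len(ok_values) >= 2 and statistics.stdev(ok_values) > max_sd:
        flags = ["Unreliable" if flag == "OK" else flag for flag in flags]
    return flags


print(flag_replicates([24.1, 24.3, 24.0]))  # consistent replicates
print(flag_replicates([24.1, 29.8, None]))  # discordant and missing values
```

Returning a flag for every value, instead of discarding poor measurements immediately, mirrors the idea that quality information is carried through the rest of the analysis.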

2.4 Data normalization

The qPCR data are often normalized by subtracting the average Ct value of predetermined housekeeping genes from the Ct value of each feature, producing a ΔCt readout (Livak and Schmittgen, 2001). More sophisticated normalization procedures are also implemented in HTqPCR, for use when housekeeping genes are not present or not reliably expressed. Three such options are available: (i) rank-invariant features across the experiment can be used to scale each sample; (ii) quantile normalization can be performed to produce a uniform empirical distribution of Ct values across samples; and (iii) a pseudo-mean or pseudo-median reference can be defined, rank-invariant features for each sample are identified, and a normalization curve is generated by smoothing the corresponding Ct values (Fig. 2B). For the rank-invariant methods, low-quality Ct values can also be excluded when calculating a scaling factor or normalization curve, thereby avoiding additional bias.
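Two of the normalization ideas above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the package's implementation: the function names, gene labels and values are made up, every sample is assumed to contain the same features with no missing values, and ties in the quantile step are broken by position.

```python
from statistics import mean


def delta_ct(sample, housekeeping):
    """Subtract the mean housekeeping Ct from every feature in one sample."""
    ref = mean(sample[gene] for gene in housekeeping)
    return {gene: ct - ref for gene, ct in sample.items()}


def quantile_normalize(samples):
    """Give every sample the same empirical distribution of Ct values.

    `samples` is a list of equal-length lists, one per sample, with
    features in the same order in each list.
    """
    n = len(samples[0])
    sorted_cols = [sorted(col) for col in samples]
    # Reference distribution: mean across samples at each rank.
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]
    normalized = []
    for col in samples:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


sample = {"GeneA": 26.0, "GeneB": 30.0, "HK1": 20.0, "HK2": 22.0}
dct = delta_ct(sample, ["HK1", "HK2"])
qn = quantile_normalize([[20.0, 25.0, 30.0], [24.0, 22.0, 32.0]])
print(dct)
print(qn)
```

After quantile normalization each sample contains the same set of values, assigned according to the rank of each feature within its own sample. The rank-invariant methods are not shown here because they depend on choices (invariance criterion, smoothing method) that the text does not specify.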

2.5 Statistical testing

Assuming normally distributed Ct values and equal variance across sample groups being compared, fold-change significance can be assessed in two ways: applying a t-test between two conditions, or using methods from the limma package (Smyth, 2005) for more sophisticated comparisons. Information about the quality of each feature (‘OK’, ‘Undetermined’ or ‘Unreliable’) across biological and technical replicates is summarized in the final results.
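For a two-condition comparison, the quantities involved can be sketched as follows. The Python example computes an equal-variance (Student) two-sample t statistic with its degrees of freedom, and a fold change using the 2^(-ΔΔCt) convention of Livak and Schmittgen (2001). It is a sketch only: the function names and input values are made up, and the p-value lookup is omitted because it needs a t distribution that the Python standard library does not provide. The limma-based approach is not shown.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic (equal variances) and its df."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def fold_change(treated, control):
    """Relative expression 2^(-ddCt) from normalized (delta) Ct values."""
    ddct = mean(treated) - mean(control)
    return 2 ** (-ddct)


# Made-up normalized Ct values for one gene, three replicates per group.
control = [5.0, 5.2, 4.8]
treated = [3.0, 3.2, 2.8]
t_stat, df = pooled_t(treated, control)
fc = fold_change(treated, control)
print(round(t_stat, 3), df, round(fc, 3))
```

Because a lower Ct corresponds to more template, a negative difference in normalized Ct between treated and control translates into a fold change greater than one.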

3 CONCLUSIONS

Efficient data processing is required for the use of high-throughput qPCR applications. HTqPCR is a software package amenable to the analysis of high-density qPCR assays, either for individual experiments or across sets of replicates and biological conditions. Methods are implemented to handle all phases of the analysis, from raw Ct values through quality control and normalization to final results. As the software is R based, it runs on different operating systems and is easy to incorporate into an analysis pipeline or to use in conjunction with other tools available through the Bioconductor project.

REFERENCES

1. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method.

Authors: K J Livak; T D Schmittgen
Journal: Methods Date: 2001-12 Impact factor: 3.608

2. Bioconductor: open software development for computational biology and bioinformatics.

Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol Date: 2004-09-15 Impact factor: 13.583
