
DAG expression: high-throughput gene expression analysis of real-time PCR data using standard curves for relative quantification.

María Ballester, Rubén Cordón, Josep M Folch.

Abstract

BACKGROUND: Real-time quantitative PCR (qPCR) is still the gold-standard technique for gene-expression quantification. Recent technological advances allow high-throughput gene-expression analysis with this method, without the limitations of sample space and reagent use. However, non-commercial, user-friendly software for the management and analysis of these data is not available.
RESULTS: The recently developed commercial microarrays allow for the drawing of standard curves of multiple assays using the same n-fold diluted samples. Data Analysis Gene (DAG) Expression software has been developed to perform high-throughput gene-expression data analysis using standard curves for relative quantification and one or multiple reference genes for sample normalization. We discuss the application of DAG Expression in the analysis of data from an experiment performed with Fluidigm technology, in which 48 genes and 115 samples were measured. Furthermore, the quality of our analysis was tested and compared with other available methods.
CONCLUSIONS: DAG Expression is freely available software that permits the automated analysis and visualization of high-throughput qPCR data. A detailed manual and a demo experiment are provided with the DAG Expression software at http://www.dagexpression.com/dage.zip.

Year:  2013        PMID: 24260380      PMCID: PMC3832397          DOI: 10.1371/journal.pone.0080385

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Real-time quantitative PCR (qPCR) analysis is the most common method to analyze gene expression due to the excellent sensitivity and specificity of PCR. It generates high-quality data without the requirement of additional validation. In fact, this methodology is applied to validate data obtained by higher throughput technologies such as microarray or RNA-Seq experiments [1]. Due to the technological advances produced in recent years, this methodology can also be used to perform high-throughput gene-expression quantification [2], [3]. In parallel to these technological advances, several methodologies have been developed to calculate the relative fold-change expression taking into account the efficiency of PCR [4]–[8]. Among them, the comparative threshold cycle method [4], [8] is one of the most commonly used methods and has recently been implemented in R packages for users with experience in this statistical environment [9], [10]. However, this method requires the PCR efficiencies of target and control genes to be approximately equal, and close to 100%, requirements that are not always achieved with high-throughput gene-expression measurements with qPCR. Other methods using different mathematical models, in which target-specific amplification efficiencies are introduced into equations to calculate relative expression data normalized to one [5] or multiple [7] reference genes, have also been developed. The latter is now available through the commercial software qBasePlus (Biogazelle) [7]. It is well established that the best method to calculate PCR efficiency is through the construction of standard curves [7]. One of the advantages of the commercial microarrays currently available, such as the OpenArray® (Applied Biosystems) or the microfluidic dynamic Arrays (Fluidigm), is the feasibility of constructing standard curves of multiple assays using the same n-fold, serial diluted samples. 
Hence, this allows for the use of relative standard quantification to compare the relative concentrations among multiple samples and the analysis of gene-expression profiling among multiple assays. In the linear regression analysis method, the relative concentrations of unknown samples are calculated accounting for unequal efficiencies of target and control genes. Here, we have developed user-friendly software for the automated analysis of high-throughput gene-expression data by drawing relative standard curves for relative quantification, allowing the use of one or multiple-reference genes for sample normalization. To the best of our knowledge, there is no available software to perform an interactive analysis with standard curves with multiple genes and samples.

Methods

Implementation

Data Analysis Gene (DAG) Expression was developed in Visual Basic .NET and runs under the major Microsoft operating systems (Microsoft® Windows® 7 or XP). A detailed manual and a demo experiment (example results data), consisting of a microfluidic dynamic array™ IFC (48.48) containing 48 assays (44 target and 4 reference genes) and 48 samples, are available in the Help menu so that users can familiarize themselves with the DAG Expression software.

Installation

The program does not need installation: it is distributed as a zip file that the user can extract with a standard program to a folder or to the desktop. On most systems the .NET Framework is already installed; however, if the program gives an error message, the user must download and install .NET Framework 4 or above (http://www.microsoft.com/en-us/download/details.aspx?id=17851).

Software features

The diagrams for processing analysis are presented in Figures S1 and S2. At present, the DAG Expression workflow can be summarized as follows:

Data import

Output files (.csv or .txt files) from different software (Fluidigm Real-Time PCR analysis software, SDS, etc.) or user-formatted input data containing the assay name (gene being measured), sample name and Ct (threshold cycle) value, separated by semicolons, can be imported (Figure 1A). The program can import multiple files, allowing for the analysis of more than 96 genes and hundreds of samples; the name of each file is shown in the ‘File Name’ column of the sample data table (main work area).
Figure 1

DAG Expression; (A) Import window; (B) Assay data table (left) and sample data table (right) work area; (C) Standard curve with a four-fold dilution series (1/4, 1/16, 1/64, 1/256, 1/1024) used to extrapolate the quantity values of Unknown samples; (D) Control-gene stability analysis: M-values for 4 selected control genes; (E) Results table with different parameters presented for each assay.

Once imported, the DAG Expression software converts these files into a standard internal format containing the assay name, the sample name (which the user can modify), the Ct value, and the Mean Ct value (the arithmetic mean of the raw Ct values of the technical replicates of a given gene; the program treats samples with the same name as technical replicates). Furthermore, when output files are imported into DAG Expression, the software directly omits the data points that failed during qPCR amplification. This makes the missing data points easy to identify and lets users select further data points to exclude from subsequent analysis. Two output files (Fluidigm or generic csv) of the same demo experiment are included in the zip file so that users can practice the import action and become familiar with the formats accepted by DAG Expression.
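The import step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DAG Expression code: the three-column `assay;sample;Ct` layout without a header row, the function names, and the use of `999` as the marker for a failed data point are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_ct_table(text, failed_marker="999"):
    """Parse 'assay;sample;Ct' rows and group Ct values by (assay, sample).

    Rows whose Ct equals the (assumed) failure marker are omitted,
    mirroring the omission of failed data points at import time.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        assay, sample, ct = (field.strip() for field in row)
        if ct == failed_marker or ct == "":
            continue  # failed amplification: leave the point out
        groups[(assay, sample)].append(float(ct))
    return groups


def mean_ct(groups):
    """Mean Ct per (assay, sample); same-named samples are technical replicates."""
    return {key: mean(values) for key, values in groups.items()}


demo = """Gene03;S1;18.2
Gene03;S1;18.4
Gene17;S1;24.0
Gene17;S1;999
"""
groups = read_ct_table(demo)
means = mean_ct(groups)
```

In this toy input the second Gene17 replicate is dropped, so its Mean Ct is based on a single value.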

Setting parameters

To start data analysis, different parameters such as the assay type (Selected Control or Target), the sample type (Standard, Unknown and Non-Template Control-NTC) and the quantity values in the Standard samples (serial dilutions) have to be set (Figure 1B).

Calculation of relative quantities

Once the raw qPCR data have been imported and the parameters have been set, the software draws the standard curve for each assay interactively by plotting the Ct values (dependent variable, y) against the log input amount (serial dilution values; independent variable, x) of the Standard samples (Figure 1C). The software then uses the standard curves to calculate the relative Quantity of the Unknown samples (Step 1) by applying the linear regression equation (y = mx + b) of the best-fit line, where the slope m and the intercept b are calculated as:

$$m = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - m\sum_{i=1}^{n} x_i}{n}$$

where y is the Ct of the Standard sample, x is the log quantity of the Standard sample, and n is the number of technical replicates i.

Step 1 (equations 1 and 2): Conversion of Ct values into relative quantities

$$x = \frac{y - b}{m} \qquad (1)$$

$$Q_{s,g} = 10^{x} \qquad (2)$$

where y is the Ct of the Unknown sample, m is the slope of the standard curve, x is the log quantity of the Unknown sample, b is the y-intercept of the standard curve, and $Q_{s,g}$ is the quantity of an Unknown sample (s) for a given assay (g): Selected Control or Target.

Step 2 (equations 3 and 4): For all technical replicates, i, of an Unknown sample for a given assay, the software calculates the average quantity (Mean Quantity, $\bar{Q}_{s,g}$) and the standard deviation (SD) of the average (Quantity SD, $SD_{s,g}$).

$$\bar{Q}_{s,g} = \frac{1}{n}\sum_{i=1}^{n} Q_{s,g,i} \qquad (3)$$

where $Q_{s,g,i}$ is the Quantity value of an Unknown sample for a given assay and n is the number of replicates i.

$$SD_{s,g} = \sqrt{\frac{\sum_{i=1}^{n}\left(Q_{s,g,i} - \bar{Q}_{s,g}\right)^2}{n - 1}} \qquad (4)$$

where $\bar{Q}_{s,g}$ is the Mean Quantity value for all replicates of an Unknown sample for a given assay and n is the number of technical replicates i.
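The regression and the conversion of Ct values into quantities can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the program's implementation: the log is taken as base 10, the SD is the sample SD (n − 1 denominator), and the function names and the synthetic Ct values (slope −3.5, intercept 20) are invented for the illustration.

```python
import math
from statistics import mean, stdev


def fit_standard_curve(log_quantities, cts):
    """Least-squares fit of Ct = m * log10(quantity) + b over the Standard points."""
    n = len(cts)
    sx = sum(log_quantities)
    sy = sum(cts)
    sxy = sum(x * y for x, y in zip(log_quantities, cts))
    sxx = sum(x * x for x in log_quantities)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b


def ct_to_quantity(ct, m, b):
    """Invert the standard curve: x = (Ct - b) / m, then quantity = 10**x."""
    return 10 ** ((ct - b) / m)


# Four-fold dilution series: 1/4, 1/16, 1/64, 1/256, 1/1024
dilutions = [4.0 ** -k for k in range(1, 6)]
log_q = [math.log10(q) for q in dilutions]

# Synthetic Standard Ct values lying exactly on a line (assumed for the demo)
cts = [-3.5 * x + 20.0 for x in log_q]
m, b = fit_standard_curve(log_q, cts)

# Two technical replicates of one Unknown sample
replicate_cts = [23.0, 23.2]
quantities = [ct_to_quantity(ct, m, b) for ct in replicate_cts]
mean_q = mean(quantities)   # Mean Quantity
sd_q = stdev(quantities)    # Quantity SD (sample SD, an assumption)
```

Because the slope is negative, the replicate with the lower Ct maps to the higher quantity, which is a quick sanity check on the sign conventions.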

Identification and selection of the most stable expressed reference genes

Single or multiple genes can be used as a normalizer. The multiple-gene normalization method is based on the principles and formulas described in [11], in which the lowest gene-stability measure (M) value indicates the genes with the most stable expression (Figure 1D). Once the user has selected the reference genes (Selected Control), the program calculates the arithmetic mean of the mean quantity values of the Selected Controls (Mean Q. S. Control) to produce a normalization factor (NF), step 3.

Step 3 (equation 5): Calculation of the NF

$$NF_{s} = \frac{1}{n}\sum_{j=1}^{n} \bar{Q}_{s,j} \qquad (5)$$

where $\bar{Q}_{s,j}$ is the Mean Quantity value for all replicates of an Unknown sample (s) for a given selected control (j) and n is the number of selected controls j.
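As an illustration of this step, the sketch below computes a gene-stability measure in the usual geNorm style (for each candidate gene, the average over all other candidates of the SD of the log2 expression ratios across samples) and then the NF as the arithmetic mean of the selected controls, as stated above. The function names and the toy quantities are hypothetical, and the exact M-value computation inside DAG Expression may differ in detail.

```python
import math
from statistics import mean, stdev


def stability_m_values(quantities):
    """quantities: {gene: [mean quantity per sample]} -> {gene: M value}.

    M for a gene is the mean pairwise variation against every other
    candidate, where pairwise variation is the SD of log2 ratios.
    """
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [
                math.log2(a / b) for a, b in zip(quantities[j], quantities[k])
            ]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


def normalization_factors(quantities, selected):
    """Per-sample NF: arithmetic mean of the selected controls' mean quantities."""
    n_samples = len(quantities[selected[0]])
    return [mean(quantities[g][s] for g in selected) for s in range(n_samples)]


# Toy data: three candidate controls measured in three samples
controls = {
    "RefA": [1.0, 2.0, 4.0],
    "RefB": [2.0, 4.0, 8.0],  # co-varies perfectly with RefA
    "RefC": [1.0, 4.0, 2.0],  # less stable relative to the others
}
m_values = stability_m_values(controls)
nf = normalization_factors(controls, ["RefA", "RefB"])
```

Here RefA and RefB obtain the lowest M values and would be the natural choice of Selected Controls.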

Normalization of relative quantity values

Once the NF is obtained, the program calculates the normalized quantity (Norm. Q., $NQ_{s,g}$) and the SD of the normalized quantity (Norm. Q. SD, $SD_{NQ_{s,g}}$) of all Unknown samples for each assay (Figure 1E), step 4.

Step 4 (equations 6 and 7): Calculation of the normalized value and its SD

$$NQ_{s,g} = \frac{\bar{Q}_{s,g}}{NF_{s}} \qquad (6)$$

where $\bar{Q}_{s,g}$ is the Mean Quantity value for all replicates of an Unknown sample for a given target assay and $NF_{s}$ is the normalization factor (NF) of that sample. The SD of the normalized quantity (equation 7) is calculated from $SD_{s,g}$, the standard deviation of the average quantity of an Unknown sample for a given target assay; $SD_{s,j}$, the standard deviation of the average quantity of an Unknown sample for a given selected control; and n, the total number of selected controls.
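The normalization itself is a per-sample division, which the following Python sketch illustrates. The function name and the numbers are hypothetical, and the SD of the normalized quantity is deliberately left out because its formula is not restated here.

```python
def normalize_quantities(mean_quantities, nf_per_sample):
    """Divide each assay's Mean Quantity by the NF of the same sample.

    mean_quantities: {assay: [Mean Quantity per sample]}
    nf_per_sample:   [NF per sample], in the same sample order
    """
    return {
        assay: [q / nf for q, nf in zip(values, nf_per_sample)]
        for assay, values in mean_quantities.items()
    }


# Toy values: one target assay in three samples, with per-sample NFs
targets = {"Gene17": [0.30, 0.90, 3.00]}
nf = [1.5, 3.0, 6.0]
norm_q = normalize_quantities(targets, nf)
```

Keeping the NF as a per-sample list makes it explicit that every assay of a given sample is divided by the same factor.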

Visualization of results

DAG Expression normalized data can be visualized via bar plots (NQ Plot; Figure 2) displaying normalized data (Linear, Log10 or Log2) vs target or sample (Figure 2A). Groups of samples can be also created to visualize the data via bar plots; the software plots the arithmetic mean of the normalized data (Linear, Log10 or Log2) of each group vs target or group (Figure 2B).
Figure 2

Bar chart example. (A) NQ plot displaying linear normalized quantity (NQ) vs sample. (B) NQ plot by group displaying logarithmic (Log2) normalized quantity (NQ) of each group (1 and 2) vs target.

Other utilities such as the “coefficient of variation inter-run” or the “efficiency between genes” are also available in the DAG Expression software (see the user manual for more information).
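The values behind a grouped bar plot can be sketched as follows. This is an assumption-laden illustration: the text does not say whether the scale transformation is applied before or after averaging, so the sketch transforms each sample first and then takes the arithmetic mean; the names `SCALES` and `group_means` are invented.

```python
import math
from statistics import mean

# Supported display scales (Linear, Log10, Log2)
SCALES = {
    "linear": lambda value: value,
    "log10": math.log10,
    "log2": math.log2,
}


def group_means(norm_q, groups, scale="linear"):
    """Arithmetic mean of the (transformed) normalized quantities per group.

    norm_q: {sample: normalized quantity for one target}
    groups: {group name: [sample names]}
    """
    transform = SCALES[scale]
    return {
        name: mean(transform(norm_q[sample]) for sample in members)
        for name, members in groups.items()
    }


# Toy normalized quantities for one target in four samples
norm_q = {"S1": 1.0, "S2": 4.0, "S3": 2.0, "S4": 8.0}
groups = {"1": ["S1", "S2"], "2": ["S3", "S4"]}

linear_means = group_means(norm_q, groups, "linear")
log2_means = group_means(norm_q, groups, "log2")
```

Averaging after a log transform is equivalent to a geometric mean on the linear scale, which is why the order of the two operations matters.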

Results and Discussion

DAG Expression was created to analyze the expression of 48 genes (44 target genes and 4 reference genes) in 115 samples, using a 48.48 microfluidic dynamic array™ on the BioMark™ system (Fluidigm) (Ballester et al., manuscript in preparation, 2013). However, the software can be used for the analysis of data obtained from different instruments (see Data import under Software features). Data from our experiment were collected using the Fluidigm Real-Time PCR analysis software 3.0.2 (Fluidigm), and the output csv files were imported into DAG Expression, as shown in Figure 1A. The demo experiment (Help menu: Example results data) shows the first experimental 48.48 dynamic array, in which a seven-point four-fold dilution series (1/4, 1/16, 1/64, 1/256, 1/1024, 1/4096, 1/16384) in triplicate and 11 unknown samples in duplicate were run. Following “The MIQE Guidelines for Real-Time PCR Experiments” [12], standard curves covering at least 3 orders of magnitude were constructed for each gene (Figure 1C). PCR efficiencies were almost 90% for all of the assays except for Gene35, which was expressed at low levels and was excluded from further analysis. Reference genes Gene03 and Gene28 showed the lowest M values [11] (Figure 1D) and were selected as reference controls for subsequent analysis. To calculate the inter-run coefficient of variation, the same unknown sample was included in all the independent runs. We obtained a very small coefficient of variation (1.9%), indicating good reproducibility of the microfluidic dynamic arrays (data not shown) [2]. Next, standard curves were used to extrapolate the quantities of the 115 unknown samples using linear regression analysis, and data were normalized using the previously calculated normalization factor (see calculation step 3 under Software features).
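The quality checks mentioned in this paragraph can be reproduced with a few standard formulas, sketched here in Python. The relation between slope and efficiency, E = 10^(−1/slope) − 1, is the usual one for a standard curve of Ct against log10 quantity; the coefficient of variation is taken as sample SD over mean; the function names and the example values are assumptions, not output of DAG Expression.

```python
import math
from statistics import mean, stdev


def pcr_efficiency(slope):
    """Efficiency from the standard-curve slope (Ct vs log10 quantity).

    Returns a fraction: 1.0 means 100% (the product doubles every cycle).
    """
    return 10 ** (-1.0 / slope) - 1.0


def dilution_span_orders(quantities):
    """Orders of magnitude (log10) covered by a dilution series."""
    return math.log10(max(quantities) / min(quantities))


def coefficient_of_variation(values):
    """CV in percent: sample SD divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Seven-point four-fold series: 1/4 ... 1/16384
series = [4.0 ** -k for k in range(1, 8)]
span = dilution_span_orders(series)  # log10(4**6), about 3.6 orders

# A slope of -1/log10(2) (about -3.32) corresponds to 100% efficiency
perfect = pcr_efficiency(-1.0 / math.log10(2))

# Hypothetical quantities of one sample repeated in three runs
cv = coefficient_of_variation([9.0, 10.0, 11.0])
```

A seven-point four-fold series therefore covers more than the three orders of magnitude mentioned above.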
At this point, the user can visualize the results using the Bar Plots tool to compare the relative expression levels among multiple samples or to view the expression profile of multiple genes. Furthermore, depending on the main aim of the study, data can be easily exported for further statistical and biological interpretation using other specialized programs. To assess the quality of the analysis performed by our software, we compared it with other methods [5], [8]. First, we compared our results against results obtained using conventional qPCR on microliter-volume samples and the comparative CT (ΔΔCT) method [8]. In a previous study, the expression profile of the pig ELOVL fatty acid elongase 6 (ELOVL6) gene was evaluated by qPCR using an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems). Results were analyzed using the RQ manager v1.2.1 and the DataAssist™ v3.0 software (Applied Biosystems) (for more details, see [13]). The same set of primers (ELOVL6 and reference genes) was added to the 48.48 microfluidic dynamic array™ to be used as a control. When the 2−ΔΔCT values (DataAssist™) and the normalized quantity values (DAG Expression) of the same 94 samples, corrected by the same calibrator sample, were compared, a high correlation coefficient (r = 0.928) was obtained (Figure 3A). The subtle differences observed between the two methods are most probably due to variations in the qPCR amplification efficiencies between experiments (conventional qPCR vs microfluidic array), although slight differences between measurement platforms cannot be ruled out [2]. In the former experiment, the qPCR assays for ELOVL6 and the endogenous genes were optimized to obtain PCR efficiencies close to 2 (i.e., 100%) and equal between target and endogenous genes.
This optimization is time-consuming and expensive when profiling many genes using high-throughput technologies such as 48.48 microfluidic dynamic arrays. For that reason, the method used by DAG Expression, based on standard curves and linear regression analysis, corrects for differences in PCR efficiencies between target and endogenous genes without the need for further qPCR optimization. In addition, our demo results were compared with the results obtained using the mathematical model developed by Pfaffl [5]. For this comparison, we used the target gene Gene17, the reference gene Gene03 and the control sample 11, and again obtained a high correlation coefficient (r = 0.999) (Figure 3B). In this case, both methods adjust for differences in PCR efficiency between target and internal control, using different equations. Therefore, our software represents a reliable and accurate tool to perform relative quantification of high-throughput gene-expression data.
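For readers who want to repeat this kind of comparison, the two reference calculations and the correlation coefficient can be sketched in Python. The formulas are the standard published forms of the 2−ΔΔCT method and of the efficiency-corrected ratio; the function names and Ct values are hypothetical, and efficiencies are expressed as amplification factors (2.0 = 100%).

```python
import math


def ddct_ratio(ct_target_sample, ct_ref_sample, ct_target_cal, ct_ref_cal):
    """2^-ΔΔCt relative expression; assumes ~100% efficiency for both genes."""
    ddct = (ct_target_sample - ct_ref_sample) - (ct_target_cal - ct_ref_cal)
    return 2.0 ** (-ddct)


def pfaffl_ratio(e_target, e_ref,
                 ct_target_sample, ct_ref_sample, ct_target_cal, ct_ref_cal):
    """Efficiency-corrected ratio with gene-specific amplification factors."""
    numerator = e_target ** (ct_target_cal - ct_target_sample)
    denominator = e_ref ** (ct_ref_cal - ct_ref_sample)
    return numerator / denominator


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# With both amplification factors equal to 2, the two methods coincide
r_ddct = ddct_ratio(24.0, 18.0, 25.0, 18.0)
r_pfaffl = pfaffl_ratio(2.0, 2.0, 24.0, 18.0, 25.0, 18.0)

# Lower efficiency of the target gene changes the corrected ratio
r_pfaffl_90 = pfaffl_ratio(1.9, 2.0, 24.0, 18.0, 25.0, 18.0)
```

The two ratios agree only when both efficiencies are 100%, which is the situation the ΔΔCT method presupposes and which standard-curve quantification does not require.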
Figure 3

Comparisons of data.

(A) Comparison of data obtained with the 2−ΔΔCT method (DataAssist™) vs the relative standard quantification (DAG Expression). (B) Comparison of data obtained with the Pfaffl model vs the relative standard quantification (DAG Expression). The coefficient of correlation (r) is shown above each plot.


Conclusions

In conclusion, recent advances in large-scale RT-qPCR platforms have allowed for the generation of a great amount of gene-expression data. Several methods to determine the relative gene-expression levels have been developed over the years. DAG Expression allows for the management and analysis of high-throughput gene-expression data-sets obtained by RT-qPCR using standard curves for relative quantification and one or multiple genes for sample normalization.

Supporting Information

Figure S1. DAG expression flow chart. Workflow diagram for the general processing analysis of DAG expression. (TIFF)

Figure S2. Control gene stability flow chart. Workflow diagram for the ‘find control gene stability’ tool. (TIFF)
References

1.  A new mathematical model for relative quantification in real-time RT-PCR.

Authors:  M W Pfaffl
Journal:  Nucleic Acids Res       Date:  2001-05-01       Impact factor: 16.971

2.  Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method.

Authors:  K J Livak; T D Schmittgen
Journal:  Methods       Date:  2001-12       Impact factor: 3.608

3.  Improved real-time RT-PCR method for high-throughput measurements using second derivative calculation and double correction.

Authors:  Van Luu-The; Nathalie Paquet; Ezequiel Calvo; Jean Cumps
Journal:  Biotechniques       Date:  2005-02       Impact factor: 1.993

4.  The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments.

Authors:  Stephen A Bustin; Vladimir Benes; Jeremy A Garson; Jan Hellemans; Jim Huggett; Mikael Kubista; Reinhold Mueller; Tania Nolan; Michael W Pfaffl; Gregory L Shipley; Jo Vandesompele; Carl T Wittwer
Journal:  Clin Chem       Date:  2009-02-26       Impact factor: 8.327

5.  HTqPCR: high-throughput analysis and visualization of quantitative real-time PCR data in R.

Authors:  Heidi Dvinge; Paul Bertone
Journal:  Bioinformatics       Date:  2009-10-06       Impact factor: 6.937

6.  qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data.

Authors:  Jan Hellemans; Geert Mortier; Anne De Paepe; Frank Speleman; Jo Vandesompele
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

7.  ReadqPCR and NormqPCR: R packages for the reading, quality checking and normalisation of RT-qPCR quantification cycle (Cq) data.

Authors:  James R Perkins; John M Dawes; Steve B McMahon; David L H Bennett; Christine Orengo; Matthias Kohl
Journal:  BMC Genomics       Date:  2012-07-02       Impact factor: 3.969

8.  Polymorphism in the ELOVL6 gene is associated with a major QTL effect on fatty acid composition in pigs.

Authors:  Jordi Corominas; Yuliaxis Ramayo-Caldas; Anna Puig-Oliveras; Dafne Pérez-Montarelo; Jose L Noguera; Josep M Folch; Maria Ballester
Journal:  PLoS One       Date:  2013-01-14       Impact factor: 3.240

9.  High throughput gene expression measurement with real time PCR in a microfluidic dynamic array.

Authors:  Sandra L Spurgeon; Robert C Jones; Ramesh Ramakrishnan
Journal:  PLoS One       Date:  2008-02-27       Impact factor: 3.240

10.  Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.

Authors:  Jo Vandesompele; Katleen De Preter; Filip Pattyn; Bruce Poppe; Nadine Van Roy; Anne De Paepe; Frank Speleman
Journal:  Genome Biol       Date:  2002-06-18       Impact factor: 13.583
