Valeriia Sherina, Helene R McMurray, Winslow Powers, Harmut Land, Tanzy M T Love, Matthew N McCall.
Abstract
BACKGROUND: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference.
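The bias from substituting the limit of detection can be illustrated with a small simulation. This is a minimal sketch and not the authors' model: it assumes normally distributed Cq values and a hard cutoff above which every reaction is a non-detect, whereas the keywords describe the mechanism only as missing not at random. The function name `simulate_lod_substitution` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_lod_substitution(n=10_000, true_mean=33.0, sd=2.0, limit=35.0, seed=0):
    """Compare the true mean Cq with two naive treatments of non-detects.

    Assumption (for illustration only): a reaction is a non-detect
    exactly when its true Cq exceeds `limit`.
    """
    rng = random.Random(seed)
    true_cq = [rng.gauss(true_mean, sd) for _ in range(n)]

    # Treatment 1: replace every non-detect with the limit of detection.
    substituted = [min(cq, limit) for cq in true_cq]

    # Treatment 2: discard non-detects and average the detected values only.
    detected = [cq for cq in true_cq if cq <= limit]

    return {
        "true_mean": statistics.fmean(true_cq),
        "substituted_mean": statistics.fmean(substituted),
        "detected_only_mean": statistics.fmean(detected),
        "non_detect_rate": 1 - len(detected) / n,
    }


result = simulate_lod_substitution()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions both naive treatments report a mean Cq below the true mean, because the largest Cq values are either capped or dropped; a lower Cq reads as higher expression, so expression is overstated.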
Keywords: Direct estimation; Gene expression; Missing not at random (MNAR); Multiple imputation; Non-detects; Quantitative real-time PCR (qPCR)
Year: 2020 PMID: 33243147 PMCID: PMC7693525 DOI: 10.1186/s12859-020-03807-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Comparison of methods to handle missing data based on 100 simulated data sets. The left and right panels show the bias and the standard error of the estimates, respectively, calculated with mean imputation (Mean), single imputation (SI), direct estimation (DirEst), and multiple imputation (MI)
Performance assessments of direct estimation and multiple imputation under misspecification of the missing data mechanism based on 100 simulated data sets with 16 genes and 6 replicates per dataset
| Link | DirEst bias, 25th | DirEst bias, 50th | DirEst bias, 75th | DirEst MSE, 25th | DirEst MSE, 50th | DirEst MSE, 75th | MI bias, 25th | MI bias, 50th | MI bias, 75th | MI MSE, 25th | MI MSE, 50th | MI MSE, 75th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Logit | −0.024 | 0.003 | 0.025 | 0.072 | 0.114 | 0.156 | −0.024 | 0.003 | 0.023 | 0.072 | 0.114 | 0.155 |
| | −0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.020 | −0.001 | 0.014 | 0.000 | 0.001 | 0.004 |
| Probit | −0.022 | 0.003 | 0.024 | 0.071 | 0.114 | 0.153 | −0.024 | −0.003 | 0.023 | 0.072 | 0.113 | 0.155 |
| | −0.002 | −0.001 | 0.000 | 0.000 | 0.001 | 0.001 | −0.020 | −0.001 | 0.014 | 0.000 | 0.001 | 0.004 |
| Cloglog | −0.025 | 0.003 | 0.026 | 0.007 | 0.114 | 0.156 | −0.025 | 0.002 | 0.022 | 0.072 | 0.114 | 0.156 |
| | −0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | −0.022 | 0.000 | 0.014 | 0.000 | 0.001 | 0.004 |
The 25th (left), 50th (center), and 75th (right) quantiles of the bias and mean squared error (MSE) are reported
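Quartile summaries of bias and MSE of this kind can be computed as sketched below. This is a hedged illustration, not the authors' code: it assumes that bias and MSE are first computed per gene across simulated estimates and that the quartiles are then taken across genes, which the source does not state. The helper names and the toy numbers are hypothetical.

```python
import statistics


def bias(estimates, truth):
    """Mean signed error of the estimates around the true value."""
    return statistics.fmean([e - truth for e in estimates])


def mse(estimates, truth):
    """Mean squared error of the estimates around the true value."""
    return statistics.fmean([(e - truth) ** 2 for e in estimates])


def quartiles(values):
    """25th, 50th, and 75th percentiles with linear interpolation."""
    return statistics.quantiles(values, n=4, method="inclusive")


def summarize(estimates_by_gene, truth_by_gene):
    """Quartiles, taken across genes, of the per-gene bias and MSE."""
    biases = [bias(est, truth_by_gene[g]) for g, est in estimates_by_gene.items()]
    mses = [mse(est, truth_by_gene[g]) for g, est in estimates_by_gene.items()]
    return {"bias": quartiles(biases), "mse": quartiles(mses)}


# Toy input: two simulated estimates for each of three hypothetical genes.
estimates_by_gene = {
    "geneA": [1.0, 3.0],
    "geneB": [2.0, 4.0],
    "geneC": [5.0, 5.0],
}
truth_by_gene = {"geneA": 2.0, "geneB": 2.0, "geneC": 5.0}

summary = summarize(estimates_by_gene, truth_by_gene)
print(summary)
```

The `inclusive` method treats the data as the full population of interest; the default `exclusive` method would give slightly different cut points on small inputs.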
Performance assessments of direct estimation and multiple imputation for a varying number of replicates (k) based on 100 simulated data sets
| Replicates | DirEst bias, 25th | DirEst bias, 50th | DirEst bias, 75th | DirEst MSE, 25th | DirEst MSE, 50th | DirEst MSE, 75th | MI bias, 25th | MI bias, 50th | MI bias, 75th | MI MSE, 25th | MI MSE, 50th | MI MSE, 75th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k=4 | −0.024 | 0.005 | 0.031 | 0.112 | 0.171 | 0.225 | −0.024 | 0.004 | 0.029 | 0.113 | 0.170 | 0.232 |
| | −0.002 | 0.000 | 0.003 | 0.001 | 0.003 | 0.006 | −0.045 | −0.019 | 0.030 | 0.001 | 0.002 | 0.005 |
| k=6 | −0.024 | 0.003 | 0.025 | 0.072 | 0.114 | 0.156 | −0.024 | 0.003 | 0.023 | 0.072 | 0.114 | 0.155 |
| | −0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | −0.02 | −0.001 | 0.014 | 0.000 | 0.001 | 0.004 |
| k=10 | −0.014 | 0.003 | 0.020 | 0.044 | 0.069 | 0.089 | −0.015 | 0.002 | 0.017 | 0.044 | 0.069 | 0.090 |
| | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | −0.008 | 0.003 | 0.024 | 0.000 | 0.000 | 0.002 |
The 25th (left), 50th (center), and 75th (right) quantiles of the bias and MSE are reported
Summary statistics for the difference between estimates of within-replicate variance in three real data sets
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Dataset 1 | 0.024 | 0.026 | 0.033 | 0.095 | 0.109 | 0.429 |
| Dataset 2 | 0.006 | 0.009 | 0.027 | 1.014 | 0.110 | 35.380 |
| Dataset 3 | 0.010 | 0.019 | 0.040 | 0.056 | 0.042 | 0.199 |
Fig. 2 Comparison of methods for non-detects based on values masked at Cq=30. The left panel shows the difference between a parameter estimated from the complete data and the estimate of the same parameter obtained from the truncated data by applying PEMM, DirEst, and MI, respectively. The right panel shows the difference between a second parameter calculated from the complete data and four estimates of it: from the truncated data, PEMM, DirEst, and MI. Black solid lines represent the means of the differences between parameter estimates from the complete data and parameter estimates from the masked data