Literature DB >> 25535453

Propensity score method for partially matched omics studies.

Abstract

This paper focuses on the problem of partially matched samples in the presence of confounders. We propose using propensity score matching to adjust for confounding factors for the subset of data with incomplete pairs, followed by integrating the P-values computed from the complete and incomplete paired samples, respectively. Several simulations and a case study on DNA methylation are considered to evaluate the operating characteristics of the proposed method.

Entities: Chemical Disease Gene Species

Keywords: confounders; full matching; microarray; observational studies; regression

Year: 2014 PMID： 25535453 PMCID： PMC4267441 DOI： 10.4137/CIN.S16352

Source DB: PubMed Journal: Cancer Inform ISSN： 1176-9351

Introduction

The advancement in biotechnologies has revolutionized numerous disciplines including biology and medicine. Several high-throughput platforms including whole genome arrays and the next-generation sequencing instruments are available for profiling large-scale omics data. These cutting edge biotechnologies have spurred rapid biomarker discovery and personalized medicine approach in multiple diseases, in particular, cancer research. In recent years, genomewide profiling utilizing these technologies has been carried out to identify biomarkers associated with cancer development and progression. In this paper, we consider the matched pairs samples for identifying differentially expressed biomarkers between two groups. Matched/paired study design is commonly used in omics/biomarkers profiling because it automatically accounts for confounding factors. Examples of matched/paired designs include profiling n (1) tumor and adjacent normal lesions, (2) pre- and post-drug treatment samples, or (3) one-to-one matching of patients by demographic covariates (eg, age, gender, race, etc.) from the two groups of interest. We will use the tumor versus normal samples hereafter for expository purpose. Ideally, one expects a total of 2n samples from such matched/paired design. However, in practice, circumstances such as RNA degradation, array failure, or insufficient resources could result in a subset of patients missing in either the tumor or matched normal biomarker profiles. For example, n1(patients have both the tumor and matched normal profiles, whereas n2 and n3 patients have only tumor and normal samples, respectively. Such incomplete or missing paired samples are also known as partially matched samples. Several methods have been developed to analyze partially matched data generated from the Gaussian distribution.1–4 Recently, Kuan and Huang5 and Yu et al.6 extended the approach to non-parametric setting, which does not require the Gaussian assumption. Specifically in Kuan and Huang,5 we introduced a simple and robust method for analyzing partially matched samples based on the weighted Z-test to combine the P-values computed using (1) paired sample tests (eg, paired t-test or Wilcoxon sign rank test) on the n1 matched pairs and (2) two-sample tests (eg, two-sample t-test or Mann–Whitney test) on the incomplete n2 and n3 pairs. The P-value pooling approach has been shown to achieve good operating characteristics compared to existing methods. As alluded earlier, matched/paired design is an appealing approach to avoid confounding. However, when a subset of samples has incomplete pairs in partially matched samples scenarios, this can result in unbalanced covariates between the tumor and normal groups. Nonetheless, the above-mentioned methods assume that the confounding factors among the n2 and n3 incomplete matched pairs are absent or negligible. If this assumption does not hold, the conclusions drawn from these methods are no longer valid. In this paper, we introduce an approach to adjust for potential confounders based on propensity score matching method in partially matched samples. Our paper is organized into five sections. In Section 2, we describe the proposed method, followed by Sections 3 and 4, which demonstrate the operating characteristics of the proposed approach in simulations and case study, respectively. We conclude with a discussion in Section 5.

Method

Let (Xi, Yi) be a matched pair for subject i, i = 1, …, n, where X and Y are the tumor and normal measurements, respectively. Without loss of generality, we assume that (X, Y) are complete matched pairs for i = 1, …, n1; Y’s are missing for i = n1 + 1, …, n1 + n2, and X’s are missing for i = n1 + n2 + 1, …, n1 + n2 + n3. That is, n1 patients have both the tumor and matched normal profiles, whereas n2 and n3 patients have only tumor or normal samples, respectively. Let = (Z1, …, Z) denote the p covariates for subject i, for instance, Z1 = age, Z2 = gender, etc. The first step is to create pseudo-pairs between the n2 and n3 incomplete pairs by matching the covariate information. We will use propensity score method to accomplish this step. To simplify the notation, we introduce subscript j to denote sample j, j = 1, …, n2 + n3 among the incomplete pairs, and let denote the corresponding covariate information. Let O denote the measurement for subject j. Note that O = X for j = 1, …, n2 and O = Y for j = n2 + 1, …, n2 + n3. We also let G denote the group indicator for sample j, ie, G = 1 and 2 for normal and tumor samples, respectively.

Propensity score method

The propensity score method, introduced by Rosenbaum and Rubin,7 is a popular approach in observational studies to create balance in multiple confounding covariates between the two groups. The propensity score is defined as There are several approaches for estimating e, including logistic regression and machine learning techniques such as boosted regression,8 classification trees (CART), and random forests. A comparison of these methods is provided in Lee et al.9 There are four main methods for removing confounding effects based on e, namely (1) propensity score matching, (2) stratification on propensity score, (3) covariate adjustment using propensity score, and (4) inverse probability weighting by propensity score. We refer the readers to Austin10 for a review on these different approaches. In this paper, we consider two approaches based on propensity scores to account for confounding effects. The first approach is covariate adjustment using propensity score via a linear model where ε ~ N(0, 1) if the biomarker measurements are approximately Gaussian distributed after appropriate normalization and transformation. Otherwise, one can use the generalized linear model11 with appropriate link function for non-Gaussian data. One can then evaluate if the expression of tumor is significantly different from normal by testing the hypothesis H0: β = 0. The second approach is based on Mahalanobis distance on covariate ranks with propensity score caliper12 for matching the covariates between the n2 tumor and n3 normal samples from incomplete pairs. Let be the vectors of covariate ranks for sample j. The Mahalanobis distance between sample j in the tumor group and sample k in the normal group is defined as where is the estimated pooled covariance matrix for the ranks. On the other hand, the propensity score caliper c is defined as the maximum propensity score distance between sample j and k allowed within a match. In other words, The choice of caliper width is related to bias–variance trade-off where small caliper width results in bias reduction but at the expense of increasing variance, and vice versa.13 A few studies have been conducted to investigate the optimal caliper width in propensity score matching, including the work of Austin13 and Wang et al.14 Based on these works and our own experience in propensity score matching, we recommend using caliper width equal to 0.2 of the standard deviation of the logit of the propensity score, which tends to have better performance, ie, where is the variance of logit of the propensity score in the Gth group. The samples are matched using the optimal full matching algorithm.15–17 Matching algorithm aims to group tumor and normal samples that have similar covariates, ie, small d. Optimal full matching subdivides the samples into collection of matched sets , where each set consists of a tumor with any number of normal samples or a normal sample with any number of tumors by minimizing the net discrepancy Σ, 15,17 The Olsen’s algorithm is used to create optimal matching (see Hansen15 and Hansen and Klopfer16 for details).

Test statistics for matched set

The tumor and normal samples within each matched set tend to be correlated since they have comparable baseline covariates.7,18 For one-to-one pairing, one usually uses paired sample t-test or Wilcoxon signed-rank test to test if the expression level of tumors is significantly different from the normal samples. However, in the full matching scenario, each tumor is paired with several normal samples and vice versa; thus, the paired sample t-test or Wilcoxon signed-rank test needs to be generalized to such one-to-many pairing. In this paper, we consider a generalization of the paired sample t-test under the scenario that the biomarker measurements are approximately Gaussian distributed. Following Rosner,19 a generalized paired sample t-test can be derived based on a one-way random effects ANOVA model given by where α is the overall within-pair mean difference between tumor and normal samples, is the random effect for the sth pairing, is the random error, and S is the total number of matched sets. In addition, σ = σ2 (1/m1 + 1/m2) where m1 and m2 are the number of tumors and normals in matched set s, respectively. The hypothesis for testing if the expression of tumor is different from normal samples translates into testing α = 0. The test statistic is given by where and σ2 is estimated using the usual unbiased estimator, whereas α and are estimated using numerical methods (see Rosner19 for details). For large samples, the P-value of can be obtained from the asymptotic distribution N(0, 1). For small samples, the P-value can be computed from the permutation test, by permuting the labels of tumor and normal samples within each matched set. Suppose there are m1 tumors and a total of N samples in matched set s, then the total number of possible permutations is . On the other hand, the generalized non-parametric test for one-to-many pairing can be carried out via the aligned rank test of Hodges and Lehmann20 if the data are non-Gaussian. We refer the readers to Hodges and Lehmann,20 and Heller et al.21 for additional details on implementing the aligned rank test.

P-values pooling

We follow the idea of our earlier work in Kuan and Huang5 to test if the biomarker is significantly up or down regulated in tumor compared to normal samples by pooling the P-values from the n1 complete and (n2, n3) incomplete pairs. The P-value for the n1 complete matched pairs is computed using either the paired sample t-test or Wilcoxon signed-rank test, denoted as p1. On the other hand, the P-value for the incomplete pairs p2 is computed based on the linear model using propensity score as covariate (equation (1) of Section 2.1), the generalized t-test, or aligned signed rank test (Section 2.2). The next step is to pool the two P-values by borrowing the idea of meta-analysis. Several methods are available for pooling P-values including the inverse normal and Fisher’s methods. In Kuan and Huang,5 we showed that pooling P-values based on weighted Z-test has good operating characteristics compared to other methods. The weighted Z-test for combining the P-values is based on transforming the P-values into Z-score Z = Φ−1(1 − p), k = 1, 2. The combined P-value by the weighted Z-test5,22 is given by where w’s are the corresponding weights. Although different choices of weights have been proposed in the literature, Kuan and Huang,5 Zaykin23 showed that setting the weights as the square root of the sample sizes works well in practice. Thus, we set and . In addition, pooling P-values is only meaningful if p1 and p2 are computed from one-sided hypothesis tests to avoid directional conflict. One can obtain a two-sided combined P-value as follows. Let p1 and p2 be the one-sided P-value for the same alternative (eg, “greater”) hypothesis, and pc be the combined one-sided P-value from equation (2). The two-sided P-value is given by

Simulation

We carry out simulation to evaluate the performance of propensity score method to adjust for potential confounders in partially matched samples. n paired sample measurements (X, Y) of a biomarker i for the tumor and matched normal group are generated from bivariate Gaussian distribution, where Z1 and Z2 are confounders, and μ and μ are the true mean expressions for tumor and normal groups, respectively. We consider (n1, n2, n3) = (70, 15, 15), (50, 25, 25), and (30, 35, 35) and set σ = σ = 1, whereas ρ ~ U(0, 1) to capture various degrees of correlation between tumor and normal matched pairs. In addition, we set μ = 0 and μ = 0, 0.1, 0.2, …, 0.5 for different effect sizes, and β = 0, 0.5, 1, and 2 for zero, moderate, strong, and very strong confounding effects. To simulate unbalanced confounders arising from incomplete matched pairs, we generate Z1, Z2 ~ N(0, 1) for i = 1, …, n1, Z1 ~ N(−0.2, 1), Z2 ~ N(0.2, 1) for i = n1 + 1, …, n1 + n2, and Z1 ~ (0.2, 1), Z2 ~ N(−0.2, 1) for i = n1 + n2 + 1, …, n1 + n2 + n3. We compare the performance of the following methods in our simulation studies: Gold standard (Gold-std): The P-value was computed from paired sample t-test on n1 + n2 + n3 original matched pairs assuming complete data set. This is the reference test. Paired only: The P-value was computed from paired sample t-test on the n1 complete matched pairs only, and discarding the n2 and n3 incomplete pairs. Two sample: Combining the P-value from paired sample t-test on the complete n1 matched pairs and the P-value from two-sample t-test on the incomplete n2 and n3 samples using the weighted Z-test approach.5 Propensity score with full matching (FM-PS): Combining the P-value from paired sample t-test on the complete n1 matched pairs and the P-value from generalized t-test on full matched data by Mahalanobis distance with propensity score caliper c = 0.2 on the incomplete n2 and n3 samples. Propensity score with regression adjustment (Reg-PS): Combining the P-value from paired sample t-test on the complete n1 matched pairs and the P-value from linear regression model using propensity score as covariate on the incomplete n2 and n3 samples.

Single biomarker

We first evaluate the performance of propensity score methods in adjusting for unbalanced covariates in single biomarker setting. Table 1 reports the average empirical Type I error at nominal α = 0.05 over 10,000 replications. When there is no confounding effect, ie, β = 0, all the methods control the Type I error. However, for β ≠ 0, the two-sample method exhibits the largest Type I error inflation. On the other hand, paired only, FM-PS, and Reg-PS methods control the Type I error under all the scenarios considered in the simulation. Figure 1 shows the average power for different combinations of n1, n2, n3, and β for methods that control the Type I error (empirical Type I error ≤ 0.055). As expected, missing samples in incomplete matched pairs reduce the power compared to complete data set (Gold-std). However, incorporating the incomplete matched pairs with proper adjustment for confounders via FM-PS or Reg-PS methods exhibit increased statistical power compared to using only the n1 paired samples when n2 and n3 are substantially large. When n2 and n3 are small relative to n1, using only the n1 paired samples is comparable to methods that incorporate n2 and n3. Both FM-PS and Reg-PS methods show comparable performance in this simulation study.

Table 1

Average empirical Type I error at nominal α = 0.05. Italicized values indicate that the empirical Type I error is greater than 0.055.

METHOD	β= 0	β= 0.5	β= 1	β= 2
n₁ = 70, n₂ = 15, n₃ = 15
Gold-std	0.0455	0.0473	0.0503	0.0501
Paired only	0.0479	0.0494	0.0490	0.0518
Two-sample	0.0475	0.0643	0.0794	0.0922
FM-PS	0.0462	0.0485	0.0469	0.0476
Reg-PS	0.0480	0.0502	0.0441	0.0442
n₁ = 50, n₂ = 25, n₃ = 25
Gold-std	0.0527	0.0541	0.0495	0.0541
Paired only	0.0486	0.0536	0.0534	0.0531
Two-sample	0.0476	0.1044	0.1460	0.1809
FM-PS	0.0483	0.0467	0.0465	0.0485
Reg-PS	0.0494	0.0476	0.0443	0.0416
n₁ = 30, n₂ = 35, n₃ = 35
Gold-std	0.0522	0.0543	0.0483	0.0489
Paired only	0.0507	0.0494	0.0499	0.0465
Two-sample	0.0522	0.1712	0.2805	0.3663
FM-PS	0.0449	0.0454	0.0422	0.0460
Reg-PS	0.0524	0.0479	0.0452	0.0446

Figure 1

Average power at nominal α = 0.05. Sample size (n1, n2, n3) and effect of confounding β are indicated in the header of each plot. Power curves for methods that did not control Type I error (ie, empirical Type I error > 0.055) are not shown.

Notes: ○: Gold-std, ∆: FM-PS, +: Reg-PS, ×: two-sample, and ⋄: paired only.

Multiple biomarkers

As omics data involve testing multiple biomarkers simultaneously (within a multiple hypothesis testing framework), we also simulate the observations from multiple biomarkers setting. We consider 1000 biomarkers and repeat each simulation setting over 100 replications. We use the false discovery rate (FDR) procedure of Benjamini and Hochberg24 to adjust for multiple hypothesis testing. Figure 2 reports the average empirical FDR for the different methods at nominal FDR = 0.05. Similar to the single biomarker case, two-sample method exhibits the largest inflated empirical FDR for β ≠ 0. On the other hand, FM-PS, Reg-PS, and paired only methods control the FDR across the different scenarios. Figure 3 shows the empirical false nondiscovery rate (FNR) for the methods under comparison. FNR is an analog of Type II error in multiple hypothesis testing settings, and is defined as the proportion of false negatives among the total number of non-rejection. Empirical FNR is large when the number of incomplete matched pairs is large. On the other hand, both FM-PS and Reg-PS methods result in lower FNR compared to paired only method.

Figure 2

Average empirical FDR at nominal FDR = 0.05. Sample size (n1, n2, n3) and effect of confounding β are indicated in the header of each plot.

Notes: ○: Gold-std, ∆: FM-PS, +: Reg-PS, ×: two-sample, and ⋄: paired only.

Figure 3

Average empirical FNR at nominal FDR = 0.05. Sample size (n1, n2, n3) and effect of confounding β are indicated in the header of each plot. FNR values for methods that did not control FDR (ie, empirical FDR > 0.055) are not shown.

Notes: ○: Gold-std, ∆: FM-PS, +: Reg-PS, ×: two-sample, and ⋄: paired only.

Case Study

We illustrate the proposed propensity score adjustment for partially matched samples on a publicly available DNA methylation data from Selamat et al.25 (downloaded from Gene Expression Omnibus (GEO) under accession number GSE32861). The data set consists of 58 matched pairs of lung adenocarcinoma and adjacent non-tumor lung tissue after removing paired sample 3023_T/N.25 Methylation for these samples was profiled using the Illumina HumanMethylation27 BeadChip, which covers 27,578 CpGs. We use a subset of baseline covariates measured for each sample (ie, age, smoking status, stage, recurrence, KRAS mutation, EGFR mutation, and LKB1 mutation) to illustrate the performance of the different methods. Age and stage are continuous and ordinal variables, respectively, whereas the other covariates are binary variables. We randomly choose n1 out of 58 matched pairs to be complete matched pairs. Next, we generate 58 − n1 indicator variables, ie, δ, k = 1, …, 58 − n1, where and after standardizing each covariate. This function generates approximately equal number of 0s and 1s on average. Among the remaining 58 – n1 pairs, we set n2 pairs to be missing in non-tumor lung tissue corresponding to those with δ = 1, and the remaining to be missing in lung adenocarcinoma. We consider n1 = 10, 20, 30, 40, and for each n1, the process is repeated 50 times. We apply FM-PS, Reg-PS, two-sample, and paired only methods on the logit-transformed methylation β-values, ie, log(β/(1 − β))26,27 of each CpG. Since CpGs that are truly differentially methylated between lung adenocarcinoma and non-tumor lung tissues are unknown in the case study, we use the results from paired sample t-test on the full 58 matched pairs as Gold-std. We define the true positive CpGs as the subset of CpGs that are significant at the Benjamini and Hochberg24 FDR of 0.05 for the Gold-std method. We compare the list of significant CpGs identified by FM-PS, Reg-PS, two-sample, and paired only methods at FDR = 0.05 to the true positive CpGs. In Table 2, we report the average empirical FDR, FNR, and average true positive (ATP) CpGs identified by each method. The ATP CpG is also the number of overlapping CpGs identified by each method and the Gold-std method. Two-sample method declares a larger number of false positives as indicated by the inflated empirical FDR. In this case study, the effect of confounding is moderate; thus, FM-PS, Reg-PS, and paired only methods are able to control the FDR. However, both FM-PS and Reg-PS methods have lower FNR and larger ATP compared to paired only method. This shows that the propensity score method for partially matched samples is able to adjust for confounders and improve the power of detecting differentially methylated CpGs.

Table 2

Average empirical FDR, FNR, and ATP at nominal FDR = 0.05.

METHOD	PAIRED ONLY	TWO-SAMPLE	FM-PS	Reg-PS
n₁ = 10, n₂ = 24, n₃ = 24
FDR	0.0304	0.0822	0.0285	0.0256
FNR	0.5439	0.2121	0.4649	0.4478
ATP	3457	13722	6908	7527
n₁ = 20, n₂ = 19, n₃ = 19
FDR	0.0322	0.0625	0.0256	0.0247
FNR	0.4443	0.1897	0.4053	0.3810
ATP	7690	14014	8941	9568
n₁ = 30, n₂ = 14, n₃ = 14
FDR	0.0352	0.0537	0.0393	0.0333
FNR	0.3512	0.1632	0.3241	0.2948
ATP	10422	14395	11183	11715
n₁ = 40, n₂ = 9, n₃ = 9
FDR	0.0354	0.0648	0.0404	0.0534
FNR	0.2404	0.1193	0.2269	0.1803
ATP	12955	15028	13201	13973

We carry out a gene ontology (GO) analysis to provide biological insights into the list of significant CpGs at FDR = 0.05 identified from the Gold-std method using the Bioconductor package topGO.28 We consider both the elim Fisher’s exact test (elim.Fisher) and elim Kolmogorov–Smirnov test (elim.KS) implemented in topGO based on Alexa et al.29 The elim method has been shown to improve interpretation of the GO analysis by integrating GO graph topology and iteratively removing genes that map to significant GO terms from a higher level GO terms.29 The P-values from each of the test are adjusted via the Benjamini–Hochberg method.24 Tables 3–5 report the GO terms corresponding to biological process (BP), molecular function (MF), and cellular component (CC) that exhibit adjusted P-values <0.05 by both the elim.Fisher and elim.KS test, respectively. For example, the BP GO analysis identifies a GO term related to positive regulation of ERK1 and ERK2 cascade, which has been shown to be implicated in lung adenocarcinomas.30,31

Table 3

Significant BP GO terms for the CpGs identified by the Gold-std method in the lung adenocarcinoma case study. The reported P-values are adjusted via the Benjamini–Hochberg FDR control.24

BIOLOGICAL PROCESS
GO ID	TERM	ANNOTATED	SIGNIFICANT	EXPECTED	elim.Fisher	elim.KS
GO:0007268	Synaptic transmission	1153	803	685.54	2.18e-07	2.54e-17
GO:0048704	Embryonic skeletal system morphogenesis	159	120	94.54	0.0122	1.94e-ll
GO:0031424	Keratinization	58	53	34.49	0.00019	2.97e-ll
GO:0007155	Cell adhesion	1577	1067	937.64	0.0122	7.27e-08
GO:0048265	Response to pain	54	46	32.11	0.0139	9.22e-08
GO:0009952	Anterior/posterior pattern specification	350	246	208.1	0.0283	1.21e-07
GO:0007156	Homophilic cell adhesion	177	133	105.24	0.0102	1.29e-07
GO:0007186	G-protein coupled receptor signaling pat.	1035	721	615.38	0.000349	6.24e-06
GO:0007193	Adenylate cyclase-inhibiting G-protein c.	91	73	54.11	0.0122	6.24e-06
GO:0007267	Cell-cell signaling	1923	1319	1143.36	0.0122	6.24e-06
GO:0070374	Positive regulation of ERK1 and ERK2 cas.	0.174	128	103.46	0.0201	3.26e-05
GO:0050911	Detection of chemical stimulus involved.	19	19	11.3	0.0161	4.15e-05
GO:0001755	Neural crest cell migration	80	65	47.57	0.0122	4.15e-05
GO:0023019	Signal transduction involved in regulati.	33	30	19.62	0.0201	4.15e-05
GO:0007204	Elevation of cytosolic calcium ion conce.	326	240	193.83	0.0322	9.08e-05
GO:0048484	Enteric nervous system development	30	28	17.84	0.0139	9.73e-05
GO:0030198	Eextracellular matrix organization	537	375	319.28	0.0102	9.73e-05
GO:0021527	Spinal cord association neuron different.	27	25	16.05	0.0322	0.000155
GO:0042742	Defense response to bacterium	209	150	124.27	0.0313	0.000266
GO:0006954	Inflammatory response	845	564	502.41	0.0217	0.000556
GO:0030855	Epithelial cell differentiation	905	611	538.09	0.0139	0.00112
GO:0045666	Positive regulation of neuron differenti.	112	86	66.59	0.0217	0.00185
GO:0030335	Positive regulation of cell migration	428	294	254.48	0.0139	0.00265
GO:0019233	Sensory perception of pain	137	105	81.46	0.0122	0.00301
GO:0048485	Sympathetic nervous system development	45	40	26.76	0.0122	0.00496
GO:0045165	Cell fate commitment	417	299	247.94	0.034	0.00999
GO:0007215	Glutamate receptor signaling pathway	88	75	52.32	0.0122	0.0181
GO:0021846	Cell proliferation in forebrain	43	38	25.57	0.0139	0.0194
GO:0007631	Feeding behavior	155	117	92.16	0.0122	0.0273

Table 5

Significant CC GO terms for the CpGs identified by the Gold-std method in the lung adenocarcinoma case study. The reported P-values are adjusted via the Benjamini–Hochberg FDR control.24

CELLULAR COMPONENT
GO ID	TERM	ANNOTATED	SIGNIFICANT	EXPECTED	elim.Fisher	elim.KS
GO:0005887	Integral to plasma membrane	2079	1443	1242.77	3.14e-14	3.64e-35
GO:0005576	Extracellular region	3134	2161	1873.41	1.35e-09	1.82e-27
GO:0005615	Extracellular space	1398	964	835.68	5.49e-ll	4.19e-25
GO:0005886	Plasma membrane	6289	4079	3759.38	2.08e-05	2.84e-14
GO:0005578	Proteinaceous extracellular matrix	547	402	326.98	1.32e-07	1.39e-13
GO:0016021	Integral to membrane	6898	4421	4123.42	0.000363	1.86e-ll
GO:0 04 5211	Postsynaptic membrane	294	209	175.74	0.00296	4.46e-10
GO:0008076	Voltage-gated potassium channel complex	117	92	69.94	0.0012	7.64e-10
GO:0030054	Cell junction	11 9 7	776	715.53	0.0105	2.57e-06
GO:0034774	Secretory granule lumen	105	79	62.77	0.0 419	0.00149

Discussion

Partially matched samples could give rise to unbalanced covariate distribution among the incomplete matched pairs in large-scale matched pair omics studies. This paper extends the P-value pooling method of Kuan and Huang5 to a framework based on propensity score for adjusting unbalanced covariate distribution among the incomplete matched pairs. We consider two approaches using propensity score, namely, (1) full matching followed by generalized t-test (FM-PS) and (2) propensity score as covariate in regression model (Reg-PS). Both methods are able to reduce the number of false positives by accounting for the confounders. Currently, we use the full matching approach based on Mahalanobis distance with propensity score calipers15–17 and the one-way random effects ANOVA model19 for deriving the generalized paired t-test. One can also use other matching algorithms based on propensity score.10 In this paper, we assume that the biomarker measurements are properly transformed such that they are approximately Gaussian distributed. If Gaussian assumption is violated, one can replace the generalized paired t-test with the generalized non-parametric aligned rank test of Hodges and Lehmann20 and Heller et al.21 in FM-PS method, and replace regular linear regression with generalized linear models.11 For instance, in our case study on DNA methylation, the analysis is carried out on the logit transformed beta values (also known as M values). An alternative approach is to analyze the untransformed beta values using beta regression in the Reg-PS method. The choice of analyzing DNA methylation data on either beta values or M values is an ongoing active research.32,33 Both the FM-PS and Reg-PS methods exhibit comparable performance in both our simulations and case study. In this paper, we assume a linear propensity score–outcome relationship that enables us to apply direct adjustment with a linear propensity score term in Reg-PS. In such cases, Reg-PS method is computationally more efficient and easier to implement compared to FM-PS method. However, if the propensity score–outcome relationship is non-linear, one will need to consider more complicated models, for instance, the generalized additive model (GAM) as proposed in Myers and Louis.34 In such cases, the FM-PS method may be a better alternative as this approach does not require specification of the propensity score–outcome relationship. Thus, we recommend that the users compare the results from both Reg-PS and FM-PS methods in practice.

Table 4

Significant MF GO terms for the CpGs identified by the Gold-std method in the lung adenocarcinoma case study. The reported P-values are adjusted via the Benjamini–Hochberg FDR control.24

MOLECULAR FUNCTION
GO ID	TERM	ANNOTATED	SIGNIFICANT	EXPECTED	elim.Fisher	elim.KS
GO:0004930	G-protein coupled receptor activity	741	566	440.86	2.51e-10	1.67e-22
GO:0004984	Olfactory receptor activity	67	62	39.86	7.72e-07	2.26e-15
GO:0005509	Calcium ion binding	977	654	581.27	0.000243	1.29e-13
GO:0043565	Sequence-specific DNA binding	1129	735	671.71	0.00888	5.34e-12
GO:0005201	Extracellular matrix structural constitu.	123	95	73.18	0.00634	1.81e-08
GO:0005234	Extracellular-glutamate-gated ion channe.	29	26	17.25	0.0283	6.5e-07
GO:0005125	Cytokine activity	332	234	197.53	0.00604	1.33e-05
GO:0005230	Extracellular ligand-gated ion channel a.	122	99	72.58	0.0147	3.79e-05
GO:0004890	GABA-A receptor activity	33	29	19.63	0.0283	6e-05
GO:0015269	Calcium-activated potassium channel acti.	27	25	16.06	0.0192	0.000132
GO:0004888	Transmembrane signaling receptor activit.	1333	979	793.08	0.0363	0.000591
GO:0005540	Hyaluronic acid binding	32	29	19.04	0.0179	0.00185
GO:0015293	Symporter activity	213	152	126.73	0.0216	0.00232
GO:0020037	Heme binding	201	144	119.59	0.0216	0.00558
GO:0005506	Iron ion binding	238	169	141.6	0.0192	0.00558
GO:0016918	Retinal binding	28	25	16.66	0.0363	0.0104
GO:0005242	Inward rectifier potassium channel activ.	37	32	22.01	0.0283	0.0126
GO:0015279	Store-operated calcium channel activity	16	16	9.52	0.0229	0.0212
GO:0008227	G-protein coupled amine receptor activit.	67	58	39.86	0.0216	0.0315

19 in total

1. A statistical framework for Illumina DNA methylation arrays.

Authors: Pei Fen Kuan; Sijian Wang; Xin Zhou; Haitao Chu
Journal: Bioinformatics Date: 2010-09-29 Impact factor: 6.937

2. Propensity score estimation with boosted regression for evaluating causal effects in observational studies.

Authors: Daniel F McCaffrey; Greg Ridgeway; Andrew R Morral
Journal: Psychol Methods Date: 2004-12

3. Matching methods for observational microarray studies.

Authors: Ruth Heller; Elisabetta Manduchi; Dylan S Small
Journal: Bioinformatics Date: 2008-12-19 Impact factor: 6.937

Review 4. Analysing and interpreting DNA methylation data.

Authors: Christoph Bock
Journal: Nat Rev Genet Date: 2012-10 Impact factor: 53.242

5. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.

Authors: Pan Du; Xiao Zhang; Chiang-Ching Huang; Nadereh Jafari; Warren A Kibbe; Lifang Hou; Simon M Lin
Journal: BMC Bioinformatics Date: 2010-11-30 Impact factor: 3.169

6. Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer.

Authors: Byung Soo Kim; Inyoung Kim; Sunho Lee; Sangcheol Kim; Sun Young Rha; Hyun Cheol Chung
Journal: Bioinformatics Date: 2004-09-16 Impact factor: 6.937

7. ERK1/2 mediates lung adenocarcinoma cell proliferation and autophagy induced by apelin-13.

Authors: Li Yang; Tao Su; Deguan Lv; Feng Xie; Wei Liu; Jiangang Cao; Irshad Ali Sheikh; Xuping Qin; Lanfang Li; Linxi Chen
Journal: Acta Biochim Biophys Sin (Shanghai) Date: 2013-12-29 Impact factor: 3.848

8. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies.

Authors: Peter C Austin
Journal: Pharm Stat Date: 2011 Mar-Apr Impact factor: 1.894

9. Comparing paired vs non-paired statistical methods of analyses when making inferences about absolute risk reductions in propensity-score matched samples.

Authors: Peter C Austin
Journal: Stat Med Date: 2011-02-21 Impact factor: 2.373

10. Genome-scale analysis of DNA methylation in lung adenocarcinoma and integration with mRNA expression.

Authors: Suhaida A Selamat; Brian S Chung; Luc Girard; Wei Zhang; Ying Zhang; Mihaela Campan; Kimberly D Siegmund; Michael N Koss; Jeffrey A Hagen; Wan L Lam; Stephen Lam; Adi F Gazdar; Ite A Laird-Offringa
Journal: Genome Res Date: 2012-05-21 Impact factor: 9.043

3 in total

1. A systematic review on metabolomics-based diagnostic biomarker discovery and validation in pancreatic cancer.

Authors: Nguyen Phuoc Long; Sang Jun Yoon; Nguyen Hoang Anh; Tran Diem Nghi; Dong Kyu Lim; Yu Jin Hong; Soon-Sun Hong; Sung Won Kwon
Journal: Metabolomics Date: 2018-08-10 Impact factor: 4.290

2. Optimal weighted two-sample t-test with partially paired data in a unified framework.

Authors: Xu Guo; Yan Wang; Niwen Zhou; Xuehu Zhu
Journal: J Appl Stat Date: 2020-04-20 Impact factor: 1.416

3. Circulating Plasma Biomarkers of Survival in Antifibrotic-Treated Patients With Idiopathic Pulmonary Fibrosis.

Authors: Ayodeji Adegunsoye; Shehabaldin Alqalyoobi; Angela Linderholm; Willis S Bowman; Cathryn T Lee; Janelle Vu Pugashetti; Nandini Sarma; Shwu-Fan Ma; Angela Haczku; Anne Sperling; Mary E Strek; Imre Noth; Justin M Oldham
Journal: Chest Date: 2020-05-22 Impact factor: 9.410

3 in total