
Bahadur representations of M-estimators and their applications in general linear models.

Abstract

Consider the linear regression model $y_i = x_i^T\beta + e_i$, $i = 1, 2, \ldots, n$, where $e_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i)$ are generally dependent errors. We give Bahadur representations of M-estimators of the parameter β, which asymptotically unify the theory of M-estimation in linear regression models. As applications, we investigate asymptotic normality and rates of strong convergence when $\{\varepsilon_i, i \in \mathbb{Z}\}$ are m-dependent, martingale differences, or (ε, ψ)-weakly dependent.


Keywords: Bahadur representation; Linear regression models; M-estimate; Normal distribution; Rate of strong convergence

Year: 2018 PMID: 30137866 PMCID: PMC5978921 DOI: 10.1186/s13660-018-1715-x

Source DB: PubMed Journal: J Inequal Appl ISSN: 1025-5834 Impact factor: 2.491

Introduction

Consider the following linear regression model:
$$y_i = x_i^T\beta + e_i, \quad i = 1, 2, \ldots, n, \qquad (1.1)$$
where $\beta \in \mathbb{R}^p$ is an unknown parametric vector, $x_i^T$ denotes the ith row of an $n \times p$ design matrix X, and $\{e_i\}$ are stationary dependent errors with a common distribution. An M-estimate of β is defined as any value of β minimizing
$$\sum_{i=1}^{n} \rho(y_i - x_i^T\beta) \qquad (1.2)$$
for a suitable choice of the function ρ, or any solution for β of the estimating equation
$$\sum_{i=1}^{n} \psi(y_i - x_i^T\beta)\, x_i = 0 \qquad (1.3)$$
for a suitable choice of ψ. There is a large body of statistical literature dealing with linear regression models with independent and identically distributed (i.i.d.) random errors; see, e.g., Babu [1], Bai et al. [2], Chen [7], Chen and Zhao [8], He and Shao [24], Gervini and Yohai [23], Huber and Ronchetti [28], Xiong and Joseph [50], Salibian-Barrera et al. [44]. Recently, linear regression models with serially correlated errors have attracted increasing attention from statisticians; see, for example, Li [33], Wu [49], Maller [38], Pere [41], Hu [25, 26]. Over the last 40 years, M-estimators in linear regression models have been investigated by many authors. Let be i.i.d. random variables. Koul [30] discussed the asymptotic behavior of a class of M-estimators in the model (1.1) with long-range dependent errors . Wu [49] and Zhou and Shao [52] discussed the model (1.1) with and derived strong Bahadur representations of M-estimators and a central limit theorem. Zhou and Wu [53] considered the model (1.1) with , and obtained some asymptotic results, including consistency of robust estimates. Fan et al. [20] investigated the model (1.1) with the errors and established moderate deviations and strong Bahadur representations for M-estimators. Wu [47] discussed strong consistency of an M-estimator in the model (1.1) for negatively associated samples. Fan [19] considered the model (1.1) with φ-mixing errors and obtained moderate deviations for the M-estimators. In addition, Berlinet et al. [4], Boente and Fraiman [5], Chen et al. [6], Cheng et al. [9], Gannaz [22], Lô and Ronchetti [37], Valdora and Yohai [45] and Yang [51] have studied asymptotic properties of M-estimators in nonlinear models. However, no unified theory of M-estimation in linear regression models with more general errors has yet been developed. In this paper, we assume that
$$e_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i), \qquad (1.4)$$
where g is a measurable function such that $e_i$ is a proper random variable, and $\{\varepsilon_i, i \in \mathbb{Z}\}$ (where $\mathbb{Z}$ is the set of integers) are very general random variables, including m-dependent sequences, martingale differences, (ε, ψ)-weakly dependent sequences, and so on. We aim to develop a unified theory of M-estimation in the linear regression model. Following the idea of Wu [49], we study the Bahadur representation of M-estimators and extend some of his results to general errors. The paper is organized as follows. In Sect. 2, we present weak and strong linear representations of an M-estimate of the vector regression parameter β in the model (1.1). Section 3 contains applications of our results to m-dependent, (ε, ψ)-weakly dependent, and martingale difference errors. Section 4 gives the proofs of the main results.
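To make the definition of an M-estimate concrete (a minimizer of Σρ(y_i − x_i^T β), or a root of Σψ(y_i − x_i^T β) x_i = 0), here is a minimal sketch of one such estimator. Several choices are assumptions not taken from this paper: ρ is Huber's loss with tuning constant c = 1.345, the error scale is treated as known and equal to 1, and the estimating equation is solved by iteratively reweighted least squares (IRLS), a standard algorithm that the paper itself does not discuss. Huber's ρ is convex, so it is compatible with the convexity assumption used in Sect. 2.

```python
# Sketch: Huber M-estimate of beta in y_i = x_i^T beta + e_i via IRLS.
# Assumptions (not from the paper): Huber's psi with c = 1.345, known unit scale.

def huber_psi(r, c=1.345):
    """Huber's psi = rho', i.e. the identity clipped to [-c, c]."""
    return max(-c, min(c, r))


def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def weighted_ls(X, y, w):
    """Weighted least squares: solve sum_i w_i x_i x_i^T beta = sum_i w_i x_i y_i."""
    p = len(X[0])
    A = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)] for j in range(p)]
    b = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y)) for j in range(p)]
    return solve(A, b)


def huber_m_estimate(X, y, c=1.345, tol=1e-10, max_iter=500):
    """Solve sum_i psi(y_i - x_i^T beta) x_i = 0 by iteratively reweighted least squares."""
    beta = weighted_ls(X, y, [1.0] * len(y))  # start from ordinary least squares
    for _ in range(max_iter):
        r = [yi - sum(a * b for a, b in zip(xi, beta)) for xi, yi in zip(X, y)]
        # weight psi(r)/r turns the estimating equation into a weighted LS problem
        w = [huber_psi(ri, c) / ri if ri != 0 else 1.0 for ri in r]
        new = weighted_ls(X, y, w)
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    return beta


# Illustrative data: y = 1 + 2x plus small alternating noise and one gross outlier.
X = [[1.0, float(i)] for i in range(20)]
y = [1 + 2 * i + 0.1 * (-1) ** i for i in range(20)]
y[19] += 100.0
beta_ls = weighted_ls(X, y, [1.0] * len(y))
beta_m = huber_m_estimate(X, y)
print("least squares:", beta_ls)
print("Huber M-estimate:", beta_m)
```

With these data the single outlier pulls the least-squares slope well above 2, while the Huber slope stays close to 2, because the bounded ψ caps the outlier's contribution to the estimating equation at c.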

Main results

In this section, we investigate weak and strong linear representations of an M-estimate of the vector regression parameter β in the model (1.1). Without loss of generality, we assume that the true parameter β = 0. We start with some notation and assumptions. For a vector , let . A random vector V is said to be in , if . Let , , and assume that is positive definite for sufficiently large n. Let . Then the model (1.1) can be written as with , where is the identity matrix of order p. Assume that ρ has derivative ψ. For and a function f, write if f has derivatives up to lth order and is continuous. Define the function where , let be an i.i.d. copy of , and . Throughout the paper, we use the following assumptions (A1)–(A7). is a convex function, . has a strictly positive derivative at . is continuous at . . There exists a such that Let . For some and
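The reparametrization described above normalizes the design so that the transformed regressors have an identity Gram matrix. Because the symbols are missing from this text, the sketch below assumes the usual convention Σ_n = Σ_i x_i x_i^T, and, for simplicity, uses a Cholesky factor L (Σ_n = L L^T) in place of a symmetric square root. Any such factor gives z_i = L^{-1} x_i with Σ_i z_i z_i^T = I_p, and with θ = L^T β every regression function x_i^T β = z_i^T θ is unchanged. All function names are hypothetical.

```python
import math


def gram(X):
    """Sigma_n = sum_i x_i x_i^T for rows x_i of X (assumed definition)."""
    p = len(X[0])
    return [[sum(x[j] * x[k] for x in X) for k in range(p)] for j in range(p)]


def cholesky(S):
    """Lower-triangular L with S = L L^T; fails if S is not positive definite."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("Sigma_n is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def whiten_design(X):
    """Return normalized regressors z_i = L^{-1} x_i and the factor L."""
    L = cholesky(gram(X))
    return [forward_solve(L, x) for x in X], L


# Quadratic design on five points (illustrative only).
X = [[1.0, float(t), float(t * t)] for t in range(5)]
Z, L = whiten_design(X)
G = gram(Z)  # should be (numerically) the 3 x 3 identity matrix
beta = [0.5, -1.0, 2.0]
theta = [sum(L[k][j] * beta[k] for k in range(3)) for j in range(3)]  # theta = L^T beta
print(G)
```

A symmetric square root would give the same identity Gram matrix; the Cholesky factor is used here only because it is short to implement without external libraries.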

Remark 1

Conditions (A1)–(A5) and (A6) are standard in M-estimation theory for linear regression models with dependent errors (Wu [49]; Zhou and Shao [52]). Condition (2.6) is similar to (7) of Wu [49]. measures the difference between the contributions of and its copy in predicting , whereas measures the contribution of in predicting given the copy of : . If are i.i.d., then (A6) and (A7) hold. In many other settings, (A6) and (A7) are also easy to verify; the following proposition provides some sufficient conditions for them.

Proposition 2.1

Let and be the conditional distribution and density function of at u given , respectively. Let and be the density function of and , respectively. Let , and . If , then (A6) holds. Let and . If and , then assumption (A7) holds.

Proof

(1) By the conditions of (1), we have Namely (A6) holds. (2) (A7) follows from and Hence, the proposition is proved. □ Define the M-processes where

Theorem 2.1

Let be a sequence of positive numbers such that and . If (A1)–(A5), and (A6) and (A7) with hold, then where

Corollary 2.1

Assume that (A1)–(A5), and (A6) and (A7) with hold. If as , , then, for , Moreover, if, as , for some , then

Remark 2

If are i.i.d., then follows from (3.2) of Rao and Zhao [42]. If are i.i.d., then follows from Theorem 1 of Wu [49] and Zhou and Shao [52]. If , where the function satisfies certain conditions and are i.i.d., then follows from Theorem 2.2 of Fan et al. [20]. If are negatively associated (NA), then follows from Theorem 1 of Wu [47]. Therefore this condition is not restrictive, and we do not discuss it further in this paper.

Theorem 2.2

Assume that (A1)–(A3), (A5), and (A6) and (A7) with hold. Let be the minimum eigenvalue of , , , and . If and , then where , and

Corollary 2.2

Assume that and as , and . Under the conditions of Theorem 2.2, we have: where is the minimizer of (1.2). ; ,

Remark 3

The corresponding conclusions of Wu [49] follow easily from the above results. The corollary below gives only convergence rates of ; unfortunately, we cannot establish a law of the iterated logarithm for , which remains an open problem.

Corollary 2.3

Under the conditions of Corollary 2.2, we have Note that and as ; we have and By Corollary 2.2, we have and Thus the conclusion follows from (2.11) and (2.12). □

Applications

In the following three subsections, we investigate some applications of our results. In Sect. 3.1, we consider the case where the sequence in (1.4) is m-dependent; in Sect. 3.2, the case where it is (ε, ψ)-weakly dependent; and in Sect. 3.3, martingale difference errors.

m-dependent process

In this subsection, we first show that m-dependent sequences satisfy conditions (A6) and (A7), and then obtain the asymptotic normality and strong convergence rates of M-estimators of the parameter. Koul [30] discussed the asymptotic behavior of a class of M-estimators in the model (1.1) with long-range dependent errors , where are i.i.d. Here we assume that is an m-dependent sequence, as defined in Example 2.8.1 of Lehmann [32]. For results on m-dependent sequences and processes, see, e.g., Hu et al. [27], Romano and Wolf [43] and Valk [46].
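A standard example of an m-dependent sequence is a finite moving average of i.i.d. variables, X_i = Σ_{j=0}^{m} a_j η_{i−j}: each X_i depends only on η_{i−m}, …, η_i, so X_i and X_k are independent whenever |i − k| > m. The sketch below simulates such a sequence and compares sample autocorrelations with the theoretical ones, which vanish beyond lag m. Gaussian η and the coefficients a_j are illustrative assumptions.

```python
import random


def ma_sequence(coeffs, n, seed=0):
    """X_t = sum_j coeffs[j] * eta_{t-j} with eta i.i.d. N(0, 1); m = len(coeffs) - 1."""
    rng = random.Random(seed)
    m = len(coeffs) - 1
    eta = [rng.gauss(0.0, 1.0) for _ in range(n + m)]
    # eta[t + m] plays the role of eta_t, so X_t uses eta_{t-m}, ..., eta_t only
    return [sum(coeffs[j] * eta[t + m - j] for j in range(m + 1)) for t in range(n)]


def sample_autocorr(x, h):
    """Sample autocorrelation of x at lag h."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
    return ch / c0


def theoretical_autocorr(coeffs, h):
    """rho(h) = gamma(h) / gamma(0); gamma(h) = 0 for h > m, which is m-dependence at work."""
    def gamma(k):
        if k >= len(coeffs):
            return 0.0
        return sum(coeffs[j] * coeffs[j + k] for j in range(len(coeffs) - k))
    return gamma(h) / gamma(0)


coeffs = [1.0, 0.8, 0.5]  # m = 2
x = ma_sequence(coeffs, 100_000, seed=1)
for h in range(5):
    print(h, round(sample_autocorr(x, h), 3), round(theoretical_autocorr(coeffs, h), 3))
```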

Proposition 3.1

Let in (1.4) be an m-dependent sequence. Then (A6) and (A7) hold.

Proof

Since is an m-dependent sequence, we have and Therefore, (A6) and (A7) follow from (3.1), (3.2) and . □

Corollary 3.1

Assume that (A1)–(A5) hold. If and for some as , and , then In order to prove Corollary 3.1, we give the following lemmas.

Lemma 3.1

(Lehmann [32]) Let be a stationary m-dependent sequence of random variables with and , and . Then where . Using the argument of Lemma 3.1, we easily obtain the following result. Here we omit the proof.
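The displayed limit in Lemma 3.1 is not legible here. In the usual form of this central limit theorem for stationary m-dependent sequences, the asymptotic variance is τ² = Var(X_1) + 2 Σ_{h=1}^{m} Cov(X_1, X_{1+h}), i.e. the autocovariances up to lag m. The sketch below (assuming that form) computes τ² for a moving average X_i = Σ_{j=0}^{m} a_j η_{i−j} and checks it against the closed form σ²(Σ_j a_j)².

```python
def ma_autocov(coeffs, h, sigma2=1.0):
    """gamma(h) = sigma2 * sum_j a_j a_{j+|h|} for a moving average; zero beyond lag m."""
    h = abs(h)
    if h >= len(coeffs):
        return 0.0
    return sigma2 * sum(coeffs[j] * coeffs[j + h] for j in range(len(coeffs) - h))


def long_run_variance(autocov, m):
    """tau^2 = gamma(0) + 2 * sum_{h=1}^{m} gamma(h) for a stationary m-dependent sequence."""
    return autocov(0) + 2.0 * sum(autocov(h) for h in range(1, m + 1))


coeffs = [1.0, 0.5, 0.25]  # an MA(2), hence 2-dependent, sequence
tau2 = long_run_variance(lambda h: ma_autocov(coeffs, h), m=2)
print(tau2, sum(coeffs) ** 2)  # both equal (1 + 0.5 + 0.25)^2
```

The positive lag covariances make τ² (here 3.0625) much larger than Var(X_1) (here 1.3125), which is why an i.i.d.-based variance would understate the spread of the sample mean.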

Lemma 3.2

Let be a stationary m-dependent sequence of random variables with and , and . Then where .

Proof of Corollary 3.1

By (2.10), we have Since is a stationary m-dependent sequence, so is . Let , . Then and Therefore, by and , we have Thus the corollary follows from Lemma 3.2, (3.3) and (3.4). □

Corollary 3.2

Assume that (A1)–(A5) hold. If and as , and , then The corollary follows from Proposition 3.1 and Corollary 2.2. □

(ε, ψ)-weakly dependent process

In this subsection, we assume that $\{\varepsilon_i\}$ are (ε, ψ)-weakly dependent random variables (Doukhan and Louhichi [14]; Dedecker et al. [11]). In 1999, Doukhan and Louhichi proposed the notion of (ε, ψ)-weak dependence, which is based on covariances rather than on the total variation distance between joint distributions and the product of the corresponding marginals. This concept has been shown to be more general than mixing and to include, under natural conditions on the process parameters, essentially all classes of processes of interest in statistics. Consequently, many researchers have studied (ε, ψ)-weakly dependent and related processes, and many sharp results have been obtained; see, for example, Doukhan and Louhichi [14], Dedecker and Doukhan [10], Dedecker and Prieur [12], Doukhan and Neumann [16], Doukhan and Wintenberger [17], Bardet et al. [3], Doukhan and Wintenberger [18], Doukhan et al. [13]. However, only a few authors (Hwang and Shin [29], Nze et al. [40]) have investigated regression models with (ε, ψ)-weakly dependent errors, and robust estimation for such models has not yet been studied. To define (ε, ψ)-weak dependence, consider a process with values in a Banach space . For , , we define the Lipschitz modulus of h, where we use the -norm, i.e., .
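The Lipschitz modulus controls how fast a test function can vary, and it enters the covariance bounds below. The norm in the text is lost; this sketch assumes the convention Lip(h) = sup_{x≠y} |h(x) − h(y)| / ‖x − y‖₁ with ‖x‖₁ = Σ_k |x_k| for real-valued components. It only produces a Monte Carlo lower bound on Lip(h) by sampling random pairs; it does not compute the supremum itself. The function names are hypothetical.

```python
import random


def l1_norm(v):
    """||v||_1 = sum_k |v_k| (assumed norm on R^u)."""
    return sum(abs(t) for t in v)


def lipschitz_lower_bound(h, dim, n_pairs=2000, scale=1.0, seed=0):
    """Largest ratio |h(x) - h(y)| / ||x - y||_1 over random pairs in [-scale, scale]^dim."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        y = [rng.uniform(-scale, scale) for _ in range(dim)]
        d = l1_norm([a - b for a, b in zip(x, y)])
        if d > 0:
            best = max(best, abs(h(x) - h(y)) / d)
    return best


# h(x) = x_1 + x_2 + x_3 has Lip(h) = 1 under the l1 norm.
est_sum = lipschitz_lower_bound(sum, 3)
# A bounded, 1-Lipschitz function of the first coordinate only.
est_clip = lipschitz_lower_bound(lambda v: max(-1.0, min(1.0, 2.0 * v[0])) / 2.0, 3)
print(est_sum, est_clip)
```

Sampling can only certify a lower bound; an upper bound needs an analytic argument, as in the definition above.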

Definition 1

(Doukhan and Louhichi [14]) A process with values in is called an (ε, ψ)-weakly dependent process if, for some classes of functions : as . According to this definition, mixing sequences (-mixing), associated sequences (positively or negatively associated), Gaussian sequences, Bernoulli shifts, and Markovian models or time series bootstrap processes with discrete innovations are (ε, ψ)-weakly dependent (Doukhan et al. [15]). From now on, assume that the classes of functions contain functions bounded by 1. Different choices of the function Ψ yield the η- and λ-weak dependence coefficients as follows (Doukhan et al. [15]): In Corollaries 3.3 and 3.4, we consider only λ- and η-weak dependence. Let be λ- or η-weakly dependent, and assume that g satisfies: for each , if satisfy for each index
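The displayed formulas for the two coefficients are missing here. To our understanding, the weight functions commonly used for them in this literature are Ψ_η(f, g, u, v) = u Lip(f) + v Lip(g) and Ψ_λ(f, g, u, v) = u Lip(f) + v Lip(g) + uv Lip(f) Lip(g); treat these as assumptions. The sketch evaluates the resulting covariance bound |Cov(f(X_{i_1}, …, X_{i_u}), g(X_{j_1}, …, X_{j_v}))| ≤ Ψ(f, g, u, v) ε(r) for given Lipschitz moduli and a coefficient ε(r) at gap r. The geometric coefficient 0.5^r is a hypothetical choice.

```python
def psi_eta(lip_f, lip_g, u, v):
    """Assumed weight for eta-weak dependence."""
    return u * lip_f + v * lip_g


def psi_lambda(lip_f, lip_g, u, v):
    """Assumed weight for lambda-weak dependence (adds a product term)."""
    return u * lip_f + v * lip_g + u * v * lip_f * lip_g


def covariance_bound(psi, lip_f, lip_g, u, v, eps_r):
    """Upper bound Psi(f, g, u, v) * eps(r) on the covariance between past and future blocks."""
    return psi(lip_f, lip_g, u, v) * eps_r


# Hypothetical example: u = 3 past coordinates, v = 4 future ones, gap r = 3, eps(r) = 0.5**r.
r = 3
print(covariance_bound(psi_eta, 1.0, 2.0, 3, 4, 0.5 ** r))
print(covariance_bound(psi_lambda, 1.0, 2.0, 3, 4, 0.5 ** r))
```

Because Ψ_λ ≥ Ψ_η for the same f and g, a λ-type bound is weaker for a given coefficient value; the two notions differ in which processes they cover, not only in the constant.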

Lemma 3.3

(Dedecker et al. [11]) Assume that g satisfies the condition (3.7) with and some sequence such that . Assume that with for some . Then: If the process is λ-weakly dependent with coefficients , then is λ-weakly dependent with coefficients If the process is η-weakly dependent with coefficients , then is η-weakly dependent and there exists a constant such that

Lemma 3.4

(Bardet et al. [3]) Let be a sequence of -valued random variables. Assume that there exists some constant such that . Let h be a function from to R such that and for , there exist a in and such that Define the sequence by . Then: (1) If the process is λ-weakly dependent with coefficients , then so is , with coefficients (2) If the process is ζ-weakly dependent with coefficients , so is with coefficients .

Lemma 3.5

(Dedecker et al. [11]) Let be a centered and stationary real-valued sequence with , , and . If for , then as .

Corollary 3.3

Let be λ-weakly dependent with coefficients for some , and for some . Assume that , and, for , there exists a constant such that Under the conditions of Corollary 2.1, we have where . Note that is λ-weakly dependent. By Lemma 3.3, we find that is λ-weakly dependent with coefficients from (3.8) and Proposition 3.1 in Chap. 3 (Dedecker et al. [11]). Let , , and . Then . Choose , in (3.9), and by (3.11), we have for and . Therefore, by Lemma 3.4, is λ-weakly dependent with coefficients By Corollary 2.1, we have By (3.13) and (3.15), there exist and for some such that for sufficiently large r and with . By Lemma 3.5 and (3.16)–(3.17), we have where . Using the Cramér–Wold device, we complete the proof of Corollary 3.3. □

Lemma 3.6

(Dedecker et al. [11]) Suppose that are stationary real-valued random variables with and for all . Let be one of the following functions: for some . We assume that there exist constants and a nonincreasing sequence of real coefficients such that, for all u-tuples and all v-tuples with the following inequality is fulfilled: where Let and . If , then

Corollary 3.4

Let be η-weakly dependent with coefficients for some , and for some . Assume that and (3.11) hold. Under the conditions of Corollary 2.2 with replaced by , and , we have: for ; for . Let . Then for as Therefore, there exists some such that Similar to the proofs of (3.13) and (3.15), we easily obtain where By (3.24) and (3.25), we have Let and Thus (3.19) holds. Since , there exist and for some and such that Thus By Lemma 3.6 and Corollary 2.3, we have Therefore, by Corollary 2.3, (3.23) and (3.31), we complete the proof of Corollary 3.4. □

Linear martingale difference processes

In this subsection, we investigate martingale difference errors . We provide some sufficient conditions for (A6) and (A7) and establish a central limit theorem and strong convergence rates. Let be a martingale difference sequence, and let be real numbers such that exists. It is well known that martingale theory provides a natural unified method for dealing with limit theorems, and martingale differences have accordingly attracted great interest. Liang and Jing [34] studied the partial linear model with errors that are linear combinations of martingale differences and obtained asymptotic normality of the least squares estimator of the parameter. Nelson [39] gave conditions for the pointwise consistency of weighted least squares estimators in multivariate regression models with martingale difference errors. Lai [31] investigated stochastic regression models with martingale difference errors and obtained strong consistency and asymptotic normality of the least squares estimate of the parameter. Let be the distribution function of and let be its density.
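A simple martingale difference sequence that is not independent is ε_i = η_i η_{i−1} with η_i i.i.d. N(0, 1): given the past, E(ε_i | η_{i−1}, η_{i−2}, …) = η_{i−1} E η_i = 0, so the ε_i are uncorrelated, yet ε_i² and ε_{i−1}² are positively correlated. The sketch below (an illustrative choice, not the model of this paper) checks both facts empirically and forms a truncated linear process e_i = Σ_{j=0}^{q} a_j ε_{i−j} with hypothetical finite coefficients a_j.

```python
import random


def md_sequence(n, seed=0):
    """eps_i = eta_i * eta_{i-1} with eta i.i.d. N(0, 1): a dependent martingale difference."""
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eta[i + 1] * eta[i] for i in range(n)]


def corr(x, y):
    """Sample correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5


def linear_process(eps, a):
    """e_i = sum_{j=0}^{q} a_j eps_{i-j}, returned for i = q, ..., len(eps) - 1."""
    q = len(a) - 1
    return [sum(a[j] * eps[i - j] for j in range(q + 1)) for i in range(q, len(eps))]


eps = md_sequence(100_000, seed=3)
sq = [e * e for e in eps]
print("lag-1 corr of eps:  ", round(corr(eps[1:], eps[:-1]), 3))  # close to 0
print("lag-1 corr of eps^2:", round(corr(sq[1:], sq[:-1]), 3))    # clearly positive
e = linear_process(eps, [1.0, 0.5, 0.25])
```

This is why martingale difference errors sit strictly between i.i.d. errors and general dependent errors: second moments can be dependent even though first-order correlations vanish.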

Proposition 3.2

Suppose that , and , where . If , then and . Let , and where . By the Schwarz inequality, we have Note that and Let . By the Schwarz inequality, we have By and Chatterji’s inequality (Lin and Bai [35]), we have By (3.33)–(3.37) and the Schwarz inequality, we have Note that implies and , and by (3.33) and (3.39), we have The general case follows similarly. Similarly to the proof of (3.39), we easily prove the other results. □ From Propositions 2.1 and 3.2, (A6) and (A7) hold. Hence, we can obtain the following two corollaries from Corollaries 2.1 and 2.2. To prove them, we first give the following lemma.
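Chatterji's inequality, used in the proof above, is usually stated as follows (we assume this form): for martingale differences D_1, …, D_n and 1 ≤ p ≤ 2, E|Σ_i D_i|^p ≤ 2^{2−p} Σ_i E|D_i|^p. The sketch checks it exactly, without simulation, for i.i.d. Rademacher signs (a simple martingale difference sequence with E|D_i|^p = 1) by summing over the binomial distribution of S_n.

```python
from math import comb


def rademacher_abs_moment(n, p):
    """Exact E|S_n|^p where S_n is a sum of n i.i.d. +/-1 signs (k plus signs => S_n = 2k - n)."""
    return sum(comb(n, k) * abs(2 * k - n) ** p for k in range(n + 1)) / 2 ** n


def chatterji_bound(n, p):
    """2^(2-p) * sum_i E|D_i|^p, with E|D_i|^p = 1 for Rademacher D_i."""
    return 2 ** (2 - p) * n


for n in (1, 5, 50, 200):
    for p in (1.0, 1.5, 2.0):
        print(n, p, round(rademacher_abs_moment(n, p), 3), chatterji_bound(n, p))
```

At p = 2 the bound is attained (E S_n² = n by orthogonality of martingale differences); for p < 2 it is loose by a growing factor, but it needs only p-th moments of the individual D_i.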

Lemma 3.7

(Liptser and Shiryayev [36]) Let be a strictly stationary sequence on a probability space , and be a σ-algebra of invariant sets of the sequence ξ and . For a certain , let and , where . Then where the random variable Z has the characteristic function , and .

Corollary 3.5

Assume that (A1)–(A5) hold, and for some as , . Under the conditions of Proposition 3.2, and , we have where the random variable Z has the characteristic function , and . By Proposition 2.1, Proposition 3.2 and Corollary 2.1, we have By and , we have and By Proposition 2.1, Proposition 3.2 and Corollary 2.2, we easily obtain the following result. Here we omit the proof. □

Corollary 3.6

Assume that (A1)–(A5) hold, and as , . Under the conditions of Proposition 3.2, we have

Proofs of the main results

For the proofs of Theorem 2.1 and Theorem 2.2, we need some lemmas as follows.

Lemma 4.1

(Freedman [21]) Let τ be a stopping time, and K a positive real number. Suppose that , where are measurable random variables and . Then, for all positive real numbers a and b,
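The displayed bound of Lemma 4.1 is missing here. In its usual form, for martingale differences D_i ≤ K with partial sums S_n and predictable quadratic variation T_n = Σ_{i≤n} E(D_i² | F_{i−1}), Freedman's inequality reads P(S_n ≥ a and T_n ≤ b for some n ≤ τ) ≤ exp(−a² / (2(Ka + b))). The sketch assumes that form and compares it with a simulation for ±1 steps (so K = 1 and T_n = n), using the deterministic horizon τ = N and b = N.

```python
import math
import random


def freedman_bound(a, b, K):
    """exp(-a^2 / (2 (K a + b))), the assumed form of Freedman's bound."""
    return math.exp(-a * a / (2.0 * (K * a + b)))


def simulate_crossing_prob(a, N, trials=5000, seed=0):
    """Fraction of +/-1 random walks of length N whose running sum reaches a.

    Here D_i = +/-1, so K = 1 and T_n = n <= N; with b = N the event in
    Freedman's inequality reduces to max_{n <= N} S_n >= a.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0
        for _ in range(N):
            s += 1 if rng.random() < 0.5 else -1
            if s >= a:
                hits += 1
                break
    return hits / trials


a, N = 30, 100
bound = freedman_bound(a, N, 1.0)
p_hat = simulate_crossing_prob(a, N)
print("Freedman bound:", round(bound, 4), "simulated probability:", p_hat)
```

The bound is conservative for this symmetric walk, but it holds uniformly up to a stopping time and uses only the conditional variances, which is what makes it useful for the bounded martingale differences in the proof of Lemma 4.2.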

Lemma 4.2

Let Assume that (A5) and (A6) hold. Then Note that and ; hence . For any positive sequence , let and By the monotonicity of ψ and , we have By (4.3), the -inequality and (A3), we have Thus By the Chebyshev inequality, Similarly, Let . For , define Since , it suffices to prove that Lemma 4.2 holds with replaced by . Let and Note that By (4.9), for sufficiently large n, we have Define the projections . Since Note that are bounded martingale differences. By Lemma 4.1 and (4.10), for , we have Let and . Then , where the symbol # denotes the number of elements of the set . It is easy to show that By (4.12) and (4.13), for , we have By (4.5), (4.6) and (4.14), we have For a, let and . For a vector , let . By (A5), for and large n, we have Let . By condition (A5), the Markov inequality and , we have Note that , which implies . Thus Without loss of generality, assume that in the following proof. Let . Then and . Since ψ is nondecreasing, Note that Namely Therefore By (4.17) and (4.18), we have Since , (4.2) follows immediately from (4.15) and (4.19). □

Lemma 4.3

Assume that the processes . Let . Then where . Since we have By the Jensen inequality, we have By (4.21) and (4.22), we have That is, Note that and By (4.24), (4.25) and the Jensen inequality, we have □

Remark 4

If i.i.d., then . In this case, the above lemma becomes Theorem 1 of Wu [48].

Lemma 4.4

Let be a sequence of positive numbers such that and . If (A6)–(A7) hold, then where Let be a nonempty set and , and , with vector . Write In the following, we prove that uniformly over . In fact, let and Then , and are martingale differences. By the orthogonality of martingale differences, the stationarity of , and Lemma 4.3, we have By Lemma 4.3 and the -inequality, for , we have where Since , we have By the conditions (A6), (A7) and (4.29)–(4.31), we have Let . By . Note that and . By (4.28), we have Since , the result (4.27) follows from (4.32) and (4.33). □

Lemma 4.5

Let be a sequence of bounded positive numbers, and suppose there exists a constant such that holds for all large n. Let and . Assume that (A5) and hold. Then as where . Let and Since and . By the argument of Lemma 4.2 and the Borel–Cantelli lemma, we have Similarly to the proof of (4.12), we have Let and . Then . By (4.34) and (4.35), for , we have Therefore, Since and in (4.17) can be replaced by , and the lemma follows from . □

Lemma 4.6

Let be a sequence of bounded positive numbers, and suppose there exists a constant such that and hold for all large n. Let . Assume that (A6), (A7) and hold. Then and, as , for any , where . Let , and Note that The argument in the proof of Lemma 4.4 shows that there exists a positive constant such that holds uniformly over . Therefore (4.38) holds. Let , where For a positive integer , write its dyadic expansion , where , and . By the Schwarz inequality, we have Thus Since and , (4.42) implies that By the Borel–Cantelli lemma, (4.39) follows from (4.46). □
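The dyadic expansion used in this chaining argument writes a positive integer as a sum of distinct powers of two, n = 2^{r_1} + ⋯ + 2^{r_j} with r_1 > ⋯ > r_j ≥ 0. The notation in the proof is elided, so the exponent names here are ours. A minimal sketch that reads the exponents off the binary representation:

```python
def dyadic_exponents(n):
    """Return [r_1, ..., r_j], strictly decreasing, with n == sum(2**r for r in result)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [r for r in range(n.bit_length() - 1, -1, -1) if (n >> r) & 1]


print(dyadic_exponents(13))  # 13 = 8 + 4 + 1
```

The number of terms is at most ⌊log₂ n⌋ + 1, which is what such chaining bounds typically exploit: a maximum over 1 ≤ k ≤ 2^d is controlled by at most d + 1 dyadic blocks.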

Lemma 4.7

Under the conditions of Theorem 2.2, we have: ; for and , where . Observe that . Since , (1) follows from Lemmas 4.5 and 4.6. □ As with the argument in (4.29), we have .

Proof of Theorem 2.1

Observe that By (4.47), Lemma 4.2 and Lemma 4.4, we have This completes the proof of Theorem 2.1. □

Proof of Corollary 2.1

Take an arbitrary sequence , which satisfies the assumption of Theorem 2.1. Note that and for . By Theorem 2.1 and (4.49), we have By (4.50) and (4.51), we have By (4.52), as , and , we have and Namely By for some , we have Then it follows from (4.53) and (4.54) that for any , which implies □

Proof of Theorem 2.2

By Lemma 4.7, we have Theorem 2.2. □

Proof of Corollary 2.2

(1) By Lemma 4.7, we have where . Let and Note that By (4.57)–(4.60), we have It is easy to show that . By , we have By as , we have . Thus By the convexity of the function , we have Therefore the minimizer satisfies . (2) Let . By a Taylor expansion, we have Therefore (2) follows from Theorem 2.2 and (1). □

1 in total

1. Nonlinear system theory: another look at dependence.

Authors: Wei Biao Wu
Journal: Proc Natl Acad Sci U S A Date: 2005-09-22 Impact factor: 11.205
