Generalized partially functional linear model.

Weiwei Xiao¹, Yixuan Wang², Haiyan Liu³,⁴.

Abstract

In this paper, a generalized partially functional linear regression model is proposed and the asymptotic properties of the estimated coefficients are established. Extensive simulation results are consistent with the theoretical results. Finally, two applications of the model are given. The first is a sleep quality study of 22 healthy people, in which we examine the effects of heart rate, the percentage of sleep time out of total time in bed, wake after sleep onset and the number of awakenings during the night on sleep quality. The second is a mortality study across 80 major cities in China, in which we examine the effects of the air quality index, temperature, relative humidity, GDP per capita and the number of beds per thousand people on the mortality rate.

Year: 2021 PMID: 34873245 PMCID: PMC8648855 DOI: 10.1038/s41598-021-02896-7

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Introduction

Nowadays, more and more data in fields such as medicine, economics, biology and computer science are recorded in the form of curves or images. For such high-dimensional data, traditional multiple regression analysis is insufficient. Ramsay[23] proposed the concept of functional data. Functional data can be thought of as real-valued functions defined on a compact interval; in other words, the sampled data can be represented as a function by interpolation or fitting. Functional data differ from scalar data in that they are infinite-dimensional. Ramsay and Dalzell[24] proposed functional data analysis, which has since become very popular among researchers. Many statistical methods, such as principal component analysis, functional regression and cluster analysis, have been developed and widely used. For details on functional data analysis, see the monographs of Ramsay and Silverman[25], Ferraty and Vieu[10], Horvath and Kokoszka[14], and Hsing and Eubank[15].

In functional regression, functional and scalar data can each appear in the model as predictors or as responses, and estimating the regression coefficients is a central task. The case of a scalar response with functional predictors is well developed. For example, Cardot et al.[5] studied two estimation methods, principal component estimation and penalized splines; Cai and Hall[3] used a principal components approach to estimate the regression coefficients; and Kou and Liu[19] used a wavelet basis. For the case of a scalar response with both functional and scalar predictors, Shin[31] used the least squares method to estimate the regression parameters; Shin and Lee[32] based parameter estimation on functional principal components regression (FPCR), with functional ridge regression (FRR) based on Tikhonov regularization as an alternative; Kong et al.[18] characterized the effects of regularization on the resulting estimators; and Wang et al.[33] used a two-step estimation method based on functional effective dimension reduction, sliced inverse regression and kernel estimation for partial functional linear models and gave the convergence rate of the estimators.

Generalized functional regression models have attracted growing interest. Müller[21] developed generalized functional regression with a single functional predictor and proposed a functional estimating equation obtained by maximizing a functional quasi-likelihood. Goldsmith et al.[11] proposed a generalized linear mixed model and compared likelihood-based and Bayesian estimation. However, the response in their simulation study and application is continuous, and the asymptotic properties of the estimated parameters are not obtained. Crainiceanu et al.[8] introduced generalized multilevel functional linear models and proposed and compared two methods for inference: a two-stage frequentist approach and a joint Bayesian analysis. They studied the generalized functional linear model with both functional and scalar predictors and a scalar response, but did not give corresponding results for binary or Poisson responses. Ieva and Paganoni[16] proposed a generalized functional linear regression model for a binary response and applied it to ECG signals; however, they did not give the asymptotic properties of the estimated parameters, and the predictors in their application are functional only. For general responses such as binary or Poisson responses, the theory is not well developed. In this paper, we establish a generalized partially functional regression model that can handle general responses and multiple scalar and functional predictors, and we develop a quasi-likelihood method to estimate the regression coefficients.

Mental health problems reduce quality of life, especially during the current pandemic, and poor sleep quality can result in severe mental health problems. The study of sleep quality has therefore received extensive attention from researchers; see Zhang[36] and Kadoya[17]. Our work is motivated by a sleep quality study that investigates the relationship between a binary sleep quality outcome and functional and scalar covariates. The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset provides 24 hours of continuous beat-to-beat heart data, sleep quality, physical activity and psychological characteristics (i.e., anxiety status, stress events and emotions) for 22 healthy participants. Saliva biomarkers (i.e., cortisol and melatonin) and an activity log are also provided. The MMASH database contains curves such as heart rate and inter-beat intervals; scalar data such as wake after sleep onset, the number of awakenings during the night and hormone concentrations in saliva; and a binary response, namely good or bad sleep quality.

Our work is also motivated by a public health study investigating the impact of the air quality index (AQI), temperature, relative humidity (RH), GDP per capita and the number of beds per thousand people on the mortality rate across different cities in China. In particular, we are interested in the effect of AQI on mortality. AQI reflects the degree of air pollution and is determined by the concentrations of pollutants in the air, the main ones being PM2.5, PM10, carbon monoxide, nitrogen dioxide, sulfur dioxide and ozone.

The paper is organized as follows. The generalized partially functional linear model is proposed in Section “Generalized partially functional linear regression”. The estimation of the regression coefficients within the model is discussed in the section on estimation that follows it. In Section “Asymptotic inference”, the asymptotic normality of the estimators is derived and the required metrics are given. Simulation results are reported in Section “Simulation study”. Two real data examples, the sleep quality study in healthy people and the mortality rate study in major Chinese cities, are given in Section “Application”. We conclude in Section “Conclusion”. Proofs and other supplementary materials are given in the Appendix.

Generalized partially functional linear regression

For the i-th subject we observe the functional predictors, a vector of scalar predictors and a response, and we assume that these observations are independent and identically distributed (i.i.d.). Each functional predictor is a random curve observed for subject i and corresponds to a square-integrable stochastic process on a real interval. The scalar predictor vector is a q-dimensional random vector. The dependent variable, or response, is a real-valued random variable that may be continuous or discrete (e.g. binary or count). We assume there is a known link function that is monotone and twice continuously differentiable with bounded derivatives, and is thus invertible. We further assume there is a variance function that is defined on the range of the link function and is strictly positive.

Model (1) relates the response to the predictors through an intercept, a regression function for each functional predictor and a regression coefficient vector for the scalar predictor vector Z, with a random component from an exponential family. The generalized partially functional linear model is therefore determined by the coefficient functions, the coefficient vector, the link function and the variance function. The conditional mean and conditional variance of the response are determined by the linear predictor through the link function and the variance function. For simplicity, assume that both the functional predictors X(t) and the scalar predictors Z are centered.

To handle the infinite dimension of functional data, we reduce the dimension by functional principal component analysis. By the Karhunen–Loève (KL) expansion and Mercer’s theorem, each functional predictor can be expanded as a series in the functional principal components, which are obtained from the covariance structure of the predictor, with random coefficients called functional principal component scores. The functional principal components form an orthonormal basis of the function space, so the coefficient functions can be expanded in the same basis. Plugging these two expansions into (1) gives (3). In (3), to overcome the difficulty caused by the infinite dimension of the functional predictors, we truncate the expansions, and the truncation level increases asymptotically with the sample size n.
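
As a compact restatement of the model described in this section, the following sketch writes it out under assumed notation: $Y$ for the response, $X_k(t)$, $k = 1, \dots, d$, for the functional predictors on an interval $T$, $\alpha$ for the intercept, $\beta_k(t)$ and $\gamma$ for the coefficients, $g$ for the link, $\sigma^2$ for the variance function, $\varepsilon_{kj}$ and $\rho_{kj}$ for the scores and principal components, and $p_n$ for a common truncation level. These symbols, and the convention that $g$ maps the linear predictor to the mean, are assumptions chosen for illustration and may differ from the authors' displays.

```latex
\begin{align*}
\eta &= \alpha + \sum_{k=1}^{d}\int_{T} \beta_k(t)\,X_k(t)\,dt + \gamma^{\top} Z, \\
E(Y \mid X, Z) &= g(\eta), \qquad
\operatorname{Var}(Y \mid X, Z) = \sigma^2\bigl(g(\eta)\bigr), \\
X_k(t) &= \sum_{j=1}^{\infty} \varepsilon_{kj}\,\rho_{kj}(t), \qquad
\beta_k(t) = \sum_{j=1}^{\infty} \beta_{kj}\,\rho_{kj}(t), \\
\eta &\approx \alpha + \sum_{k=1}^{d}\sum_{j=1}^{p_n} \beta_{kj}\,\varepsilon_{kj} + \gamma^{\top} Z .
\end{align*}
```

The last line uses the orthonormality of the $\rho_{kj}$, which turns each integral into a sum of products of coefficients and scores before truncation.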

Estimation of the regression coefficients

Collect the intercept, the truncated coefficients of the regression functions and the scalar regression coefficients in a single parameter vector. The maximum likelihood estimator (MLE) of this vector is obtained by solving the score equation (4), and its components give the estimators of the intercept, the regression functions and the scalar coefficient vector. Written in matrix form, the score equation (4) is usually solved iteratively by the method of iteratively reweighted least squares. Under our basic conditions in Section “Generalized partially functional linear regression”, the matrix involved is a fixed positive definite matrix.
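
The estimation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a binary response with the logit link, one functional predictor observed on an equally spaced grid (quadrature weights are ignored), and synthetic data. The function names `fpc_scores` and `irls_logistic` and all numerical values are hypothetical.

```python
import numpy as np

def fpc_scores(curves, n_components):
    """Project centered curves (rows) onto the leading eigenvectors
    of their sample covariance matrix (a discretized FPCA)."""
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / curves.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues ascending
    leading = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, leading]

def irls_logistic(design, y, n_iter=50, tol=1e-10):
    """Solve the logistic score equation by iteratively reweighted least squares."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        eta = design @ coef
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)     # GLM working weights
        z = eta + (y - mu) / w                        # working response
        new_coef = np.linalg.solve(design.T @ (design * w[:, None]),
                                   design.T @ (w * z))
        done = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if done:
            break
    return coef

# Synthetic illustration (all values below are made up for the sketch).
rng = np.random.default_rng(0)
n, grid = 400, np.linspace(0.0, 1.0, 50)
curves = (rng.normal(size=(n, 1)) * np.sin(2 * np.pi * grid)
          + 0.5 * rng.normal(size=(n, 1)) * np.cos(2 * np.pi * grid)
          + 0.1 * rng.normal(size=(n, grid.size)))
scalars = rng.normal(size=(n, 2))
scores = fpc_scores(curves, n_components=2)
design = np.column_stack([np.ones(n), scores, scalars])
true_coef = np.array([0.5, 0.2, -0.1, 0.8, -0.5])
prob = 1.0 / (1.0 + np.exp(-(design @ true_coef)))
y = (rng.uniform(size=n) < prob).astype(float)

coef_hat = irls_logistic(design, y)
mu_hat = 1.0 / (1.0 + np.exp(-(design @ coef_hat)))
score_residual = design.T @ (y - mu_hat)              # near zero at the MLE
```

At convergence the vector `score_residual` is numerically zero, which is the logistic score equation evaluated at the estimate. An estimated coefficient function would then be rebuilt by multiplying the score coefficients by the corresponding eigenvectors.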

Asymptotic inference

Generalized auto-covariance operator

Given an integrable kernel function, one can define a linear integral operator on the Hilbert space of square-integrable functions. Such an operator is a compact self-adjoint Hilbert–Schmidt operator when its kernel is symmetric and square integrable, and it can then be diagonalized. The integral operators of special interest are the auto-covariance operator of the functional predictor and the generalized auto-covariance operator, whose eigenvalues form a non-increasing sequence. Hilbert–Schmidt operators generate a metric on the function space, and, given an arbitrary orthonormal basis, a Hilbert–Schmidt kernel can be expanded in that basis with suitable coefficients. Thus, for each functional predictor, the corresponding Hilbert–Schmidt operator produces a metric $d_G$ between the estimated and the true regression function. In the sequence of truncated models (3), this metric is expressed through the estimated and true coefficients and a symmetric positive definite matrix, with elements given by the eigenvalues of the generalized auto-covariance operator, whose inverse exists. We note that higher-order oscillations contribute to the norm of the parameter functions relative to the oscillations of the predictor processes; the proof follows from Corollary 4.1 of Müller (2005).

To derive the asymptotic properties of the estimators, in addition to the basic assumptions in Section “Generalized partially functional linear regression” and the usual conditions on the variance and link functions, we require some technical conditions that restrict the growth of the truncation level and impose independence between the functional predictors and Z. The assumptions are as follows:

(A1) The link function g is monotone, invertible and has two continuous bounded derivatives. The variance function has a continuous bounded derivative and is bounded below by a positive constant.
(A2) The fourth moments involved are finite.
(A3) The predictors are independent of each other.
(A4) The number of terms retained in the sequence of approximating truncated models (3) tends to infinity with the sample size n at a restricted rate.

Remark 1

In assumption (A1), the first derivative of the link function g is required to be bounded, and its second derivative is bounded to ensure that the Hessian matrix is well defined when we prove the asymptotic properties. Assumptions (A2) and (A3) simplify the proof, since the interactions between the predictors need not be considered. As assumption (A4) shows, the truncation level goes to infinity, but for the lemmas to hold and for the rate of convergence to be faster, we have to control the rate at which it does so. Here d and q denote finite numbers of predictors.

Asymptotic convergence of the estimators

Lemma 1

As , assumptions (A1)–(A4)

Remark 2

Lemma 1 essentially implies .

Lemma 2

Under the assumptions (A1)–(A4), we have thatwhere are matricesand are diagonal matrices.

Theorem 1

If the basic conditions in Section “Generalized partially functional linear regression” and assumptions (A1)–(A4) are satisfied, thenwhere and is a identity matrix.

Remark 3

Each $n d_G^2(\hat{\beta}_k, \beta_k)$ is asymptotically normally distributed, with the mean and variance given in Theorem 1, and each $\sqrt{n}(\gamma_l - \hat{\gamma}_l)$ is asymptotically normally distributed with mean 0. Because the predictors are assumed to be independent of each other, the covariance matrix of the joint asymptotic normal distribution is diagonal.

Corollary 1

Denote the eigenvectors and eigenvalues of the matrix byand letwhere is an orthonormal basis, which is mentioned in Section “Generalized partially functional linear regression”. Then for large n and an approximate simultaneous confidence band is given bywhere . And when , then .

Simulation study

We consider the case with two functional predictors and three scalar predictors and the response is binary. For the functional predictors and , we first generate which satisfyandDefining orthonormal basis and which satisfyTherefore and satisfywhere are principal component base. Figure 1 plots part of the and .

Figure 1

Part of the and .

The three scalar predictors are simulated from specified distributions, and the true regression coefficients are fixed in advance. The binary response is generated as follows. We define the success probability p(X, Z) through the logit link and then generate the response as a pseudo-Bernoulli random variable with probability p(X, Z), set to 1 when a condition on p(X, Z) holds and to 0 otherwise. This gives a sample of size n. To study the asymptotic properties of our estimators we take n = 50, 500 and 1000, and for each functional predictor we retain the number of functional principal components that explains 90% of the cumulative variation. We run 100 simulations. Table 1 reports McFadden’s pseudo-$R^2$ (McFadden) and the maximum likelihood pseudo-$R^2$ (r2ML) for each n. Both statistics increase with the sample size, so the fit of the model improves as n grows, which is consistent with our expectations.
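
The response-generation step can be sketched as follows. The rule that turns p(X, Z) into a 0/1 response is not spelled out here, so this sketch assumes the common choice of comparing p with an independent uniform draw; the function name `simulate_binary_response` and the linear predictor values are hypothetical.

```python
import numpy as np

def simulate_binary_response(eta, rng):
    """Return (y, p): success probabilities from the logit link and
    Bernoulli draws made by comparing uniform variates with p."""
    p = 1.0 / (1.0 + np.exp(-eta))
    y = (rng.uniform(size=p.shape) < p).astype(int)
    return y, p

rng = np.random.default_rng(1)
eta = rng.normal(size=20000)        # hypothetical linear predictor values
y, p = simulate_binary_response(eta, rng)
```

With this rule each response equals 1 with probability p, so the sample proportion of ones should be close to the average of p.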

Table 1

Statistics and statistical criteria under different sample sizes.

n	McFadden	r2ML
50	0.736	0.976
500	0.816	0.984
1000	0.925	0.996
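
The two fit statistics in Table 1 can be computed from the fitted and null log-likelihoods. The sketch below uses the usual definitions, McFadden's $1 - \ell_{\text{model}}/\ell_{\text{null}}$ and the maximum likelihood (Cox and Snell) form $1 - \exp\{2(\ell_{\text{null}} - \ell_{\text{model}})/n\}$; that the table was produced with exactly these definitions is an assumption, and the data and the names `bernoulli_loglik` and `pseudo_r2` are hypothetical.

```python
import numpy as np

def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under success probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def pseudo_r2(y, p_hat):
    """Return (McFadden, r2ML) for fitted probabilities p_hat."""
    n = len(y)
    ll_model = bernoulli_loglik(y, p_hat)
    ll_null = bernoulli_loglik(y, np.full(n, y.mean()))   # intercept-only fit
    mcfadden = 1.0 - ll_model / ll_null
    r2ml = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return mcfadden, r2ml

y_demo = np.array([0, 0, 1, 1, 1, 0, 1, 0], dtype=float)
p_demo = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.6, 0.4])
mcfadden_demo, r2ml_demo = pseudo_r2(y_demo, p_demo)
mcfadden_null, r2ml_null = pseudo_r2(y_demo, np.full(8, y_demo.mean()))
```

Both statistics are zero for the intercept-only fit and grow as the fitted probabilities move toward the observed outcomes.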

To study the performance of $\hat{\beta}_1$ and $\hat{\beta}_2$, Fig. 2 shows their 95% confidence bands under different sample sizes. The red curves are the true $\beta_1$ and $\beta_2$, the black curves are the corresponding estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, and the gray regions are the 95% confidence bands. As the sample size increases, the confidence band becomes narrower and the estimate (black curve) gets closer to the true function (red curve). The simulation therefore indicates that the larger the sample size, the closer the estimates are to the true values.

Figure 2

To verify Theorem 1, we calculate the mean and variance of $n d_G^2(\hat{\beta}_k, \beta_k)$, $k = 1, 2$, and of $\sqrt{n}(\gamma_l - \hat{\gamma}_l)$, $l = 1, 2, 3$. Specifically, the theoretical mean and variance of $n d_G^2(\hat{\beta}_k, \beta_k)$ are 3 and 6 for $k = 1$ and 5 and 10 for $k = 2$, and the theoretical mean of each $\sqrt{n}(\gamma_l - \hat{\gamma}_l)$ is 0. Table 2 shows the corresponding sample means and variances. As n increases, the sample mean and variance of $n d_G^2(\hat{\beta}_k, \beta_k)$ tend to 3 and 6 for $k = 1$ and to 5 and 10 for $k = 2$, and the sample mean of $\sqrt{n}(\gamma_l - \hat{\gamma}_l)$ tends to 0, which is consistent with Theorem 1. The simulation results show that the estimation of the regression coefficient functions improves as the sample size increases.
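
The theoretical values quoted above, 3 and 6 and then 5 and 10, have the pattern "mean p, variance 2p" with p = 3 and p = 5, which are the first two moments of a chi-square variable with p degrees of freedom. Reading the targets this way is an assumption made for illustration; the limiting law itself is the one stated in Theorem 1. The following sketch checks the pattern by Monte Carlo, with the hypothetical helper `chi_square_moments`.

```python
import numpy as np

def chi_square_moments(df, n_draws, rng):
    """Sample mean and variance of sums of df squared standard normals."""
    draws = np.sum(rng.standard_normal((n_draws, df)) ** 2, axis=1)
    return draws.mean(), draws.var(ddof=1)

rng = np.random.default_rng(2)
mean3, var3 = chi_square_moments(3, 200_000, rng)
mean5, var5 = chi_square_moments(5, 200_000, rng)
```

The sample moments land close to (3, 6) and (5, 10), the same targets that the rows of Table 2 approach as n increases.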

Table 2

The mean and variance of $n d_G^2(\hat{\beta}_k, \beta_k)$ and $\sqrt{n}(\gamma_l - \hat{\gamma}_l)$ in Theorem 1.

	n	Mean	Variance
$n d_G^2(\hat{\beta}_1, \beta_1)$	50	4.15	7.95
	500	3.31	6.75
	1000	3.03	6.08
$n d_G^2(\hat{\beta}_2, \beta_2)$	50	6.06	13.42
	500	5.51	11.49
	1000	5.06	10.63
$\sqrt{n}(\gamma_1 - \hat{\gamma}_1)$	50	0.19	1.26
	500	0.16	1.11
	1000	0.09	1.08
$\sqrt{n}(\gamma_2 - \hat{\gamma}_2)$	50	0.07	0.15
	500	0.04	0.13
	1000	0.03	0.11
$\sqrt{n}(\gamma_3 - \hat{\gamma}_3)$	50	0.10	5.01
	500	0.05	4.02
	1000	0.04	3.93

Figure 2. 95% confidence bands for the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ under different sample sizes. The red curves are the true $\beta_1$ and $\beta_2$, the black curves are the corresponding estimates from one simulation run, and the gray regions are the 95% confidence bands.

Table 3 shows the estimated values and corresponding standard deviations of $\hat{\gamma}_1$, $\hat{\gamma}_2$ and $\hat{\gamma}_3$ under different sample sizes. As the sample size increases, the standard deviations become smaller and the estimates tend to the true values of $\gamma_1$, $\gamma_2$ and $\gamma_3$, which are 2, 3 and 5.

Table 3

Estimated values of $\hat{\gamma}_1$, $\hat{\gamma}_2$ and $\hat{\gamma}_3$, with standard deviations in brackets.

n	$\hat{\gamma}_1$	$\hat{\gamma}_2$	$\hat{\gamma}_3$
50	1.84 (0.16)	2.97 (0.06)	4.89 (0.34)
500	2.06 (0.04)	2.99 (0.02)	4.95 (0.09)
1000	2.01 (0.03)	2.99 (0.01)	5.03 (0.05)

We use generalized cross-validation (GCV) to assess the predictive accuracy of the estimators. For n = 50, 500 and 1000, the corresponding GCV values are 0.104, 0.005 and 0.001, which implies that the prediction becomes more accurate as the sample size increases. For details of GCV, see Roozbeh et al.[27], Roozbeh et al.[28], Roozbeh[29], and Amini and Roozbeh[1].

Application

Sleep quality

Data on the activity and sleep of healthy people from the PhysioNet databases were analyzed in this section. The data were collected and provided by BioBeats (biobeats.com)[34] in collaboration with researchers from the University of Pisa. Data were recorded on 22 healthy males and cover various aspects of their daily lives, such as cardiovascular responses, psychological perception, sleep quality and exercise. We investigated the effects of hourly heart rate (HR), the percentage of sleep time out of total time in bed (Efficiency), wake after sleep onset (WASO) and the number of awakenings during the night (Number) on sleep quality. The volunteers’ Pittsburgh Sleep Quality Questionnaire index was used as the response, coded as a Bernoulli variable with 1 for good sleep quality and 0 for bad sleep quality. In our study, 7 men had bad sleep quality and 15 had good sleep quality. Efficiency, WASO and Number were used as scalar predictors, and the volunteers’ hourly heart rate was used as the functional predictor. A link function suitable for the binary response was used, and we retained the number of functional principal components that explains 80% of the cumulative variation. Fitting the model to the data gives estimates of the functional coefficient and of the scalar coefficients. For prediction accuracy, the GCV value is 0.358. The results of the estimation are shown in Table 4 and Fig. 3.
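
The rule of retaining enough functional principal components to reach a target share of the cumulative variation can be sketched as below. The eigenvalues are hypothetical and the helper name `n_components_for_variance` is an assumption for illustration, not a function from the study.

```python
import numpy as np

def n_components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose eigenvalues account
    for at least `threshold` of the total variance."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative_share = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(cumulative_share, threshold) + 1)

demo_eigenvalues = [6.0, 3.0, 1.0]          # cumulative shares 0.6, 0.9, 1.0
k_80 = n_components_for_variance(demo_eigenvalues, 0.80)
k_50 = n_components_for_variance(demo_eigenvalues, 0.50)
k_95 = n_components_for_variance(demo_eigenvalues, 0.95)
```

With these eigenvalues an 80% target keeps two components, since the first alone explains only 60%.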

Table 4

Parameter coefficient estimation and significance levels.

	Estimate	SE	t value	Pr(>|t|)	Significance
$\hat{\gamma}_{\text{Efficiency}}$	0.106	0.213	0.497	0.063	–
$\hat{\gamma}_{\text{WASO}}$	0.059	0.063	0.938	0.037	–
$\hat{\gamma}_{\text{Number}}$	−0.143	0.007	−0.07	0.006	**
r2ML = 0.917

Figure 3

Regression coefficient function and its 95% confidence band.

From the estimated scalar coefficient vector shown in Table 4, we can see that Efficiency and WASO are positively correlated with sleep quality. In other words, the greater the proportion of sleep time, or the longer the first waking time after falling asleep, the better the corresponding sleep quality. In contrast, Number is negatively correlated with sleep quality: the more awakenings during the night, the poorer the sleep quality. Figure 3 shows the estimated regression coefficient function and its 95% confidence band. We can tentatively conclude that heart rate is positively correlated with sleep quality between 8:00 and 11:00 and mainly negatively correlated with sleep quality at other times. This suggests that proper exercise and hard work that raise the heart rate in the morning may help improve the quality of sleep at night. Some people may think that exercising at night improves sleep quality because it makes them tired, but our results point in the opposite direction. Sajjadieh (2020) used time-domain spectral analysis and found that heart rate is negatively correlated with sleep quality; our analysis gives more insight into the dynamic and temporal behaviour of the influence of heart rate on sleep quality.

Mortality

Our dataset includes daily temperature, daily relative humidity (RH), daily air quality index (AQI), GDP per capita, the number of beds per thousand people and mortality for 80 cities in China, collected during 2019. A main goal of the study was to investigate the impact of temperature, RH, AQI, GDP per capita and the number of beds per thousand people on the mortality rate. We apply the proposed generalized partially functional linear model to these data. There are three functional predictors: daily AQI, daily temperature and daily RH from 1 January 2019 to 31 December 2019. The two scalar predictors are GDP per capita and the number of beds per thousand people. The response is the mortality rate of each city in 2019, as documented in the statistical bulletin of each city. Mortality is coded as a Bernoulli variable: a mortality rate greater than 6‰ is considered high and coded as 1; otherwise the mortality rate is considered low and coded as 0. In our study, there are 30 cities with high mortality and 50 with low mortality. Figure 4 shows daily AQI, temperature and RH in some cities in 2019.

Figure 4

Daily AQI, temperature and RH.

A link function suitable for the binary response was used. We retained the functional principal components that explain 75% of the cumulative variation as the orthonormal basis. Fitting the model to the data gives the regression coefficients of the functional predictors and of the scalar predictors, which are shown in Table 5 and Fig. 5. The prediction accuracy is measured by the GCV, whose value is 0.037.

Table 5

Regression coefficient estimation and significance levels.

	Estimate	SE	t value	Pr(>|t|)	Significance
$\hat{\gamma}_{\text{GDP per capita}}$	−3.181e−06	1.659e−06	−2.165	0.035	*
$\hat{\gamma}_{\text{Number of beds}}$	−3.048e−01	4.718e−02	−1.917	0.061	–

Figure 5

Regression coefficient function and its 95% confidence band.

From the estimated regression coefficient vector shown in Table 5, we can see that GDP per capita and the number of beds per thousand people are negatively correlated with the mortality rate. In other words, the greater the GDP per capita or the number of beds per thousand people, the lower the mortality rate. Figure 5 shows the estimated regression coefficient functions and their 95% confidence bands. From Fig. 5, we can conclude that AQI is positively correlated with mortality in winter, early spring and fall; that is, the higher the AQI (the worse the air quality), the higher the death rate. Kong[18] studied the effect of PM2.5 on mortality in US cities from 1 April 2000 to 31 August 2000, and the effect of AQI on mortality from April to August in our study is consistent with their results. Temperature is negatively correlated with mortality from March to May (spring) and positively correlated with mortality for the rest of the year. RH is positively correlated with mortality from August to November (autumn) and negatively correlated with mortality for the rest of the year.

The effects of temperature and RH on mortality can be readily interpreted in terms of traditional Chinese medicine (TCM) theory. TCM emphasizes Yin and Yang, and there is a saying that spring gives birth, summer grows, autumn converges and winter stores. In spring, a gradually rising temperature indicates that the climate is normal and that Yang arises normally; if the temperature is low in spring, Yang arises abnormally, which results in health problems. In autumn, Yang is converging, and both temperature and RH should gradually decrease. If RH remains high, Yang cannot descend into internal storage and floats on the outside, which excessively consumes human health.

Conclusion

In this paper, we propose a generalized partially functional linear regression model with multiple scalar and functional predictors and develop maximum likelihood estimators for the regression coefficients. For the functional predictors, we use functional principal component analysis to reduce their dimension. We then introduce the generalized auto-covariance operator, on the basis of which we establish an appropriate measure of the difference between the estimators and their true values, and we prove the asymptotic joint distribution of the estimated regression functions. For the scalar predictors, we establish a distance between the estimated and true values and prove the asymptotic properties of the estimated regression coefficients.

The model is applied to two examples, a sleep quality study and a mortality rate study, and the results indicate that the predictors in the model explain the responses well and reveal the influence of the predictors on the response. In the sleep quality study, we find that Efficiency and WASO are positively correlated with sleep quality and that Number is negatively correlated with sleep quality. We also find that heart rate is positively correlated with sleep quality between 8:00 and 11:00 and mainly negatively correlated with sleep quality at other times. In the mortality rate study, we conclude that GDP per capita and the number of beds per thousand people are negatively correlated with the mortality rate. AQI is positively correlated with mortality in winter, early spring and fall. Temperature is negatively correlated with mortality from March to May (spring). Relative humidity is positively correlated with mortality from August to November (autumn).

The applications of the generalized partially functional linear model have thus been extended, which lays a foundation for further research on generalized partially functional linear models with unknown link and variance functions, predictors with interactions, and variable selection with high-dimensional predictors.

7 in total

1. Generalized Multilevel Functional Regression.

Authors: Ciprian M Crainiceanu; Ana-Maria Staicu; Chong-Zhi Di
Journal: J Am Stat Assoc Date: 2009-12-01 Impact factor: 5.033

2. Longitudinal Penalized Functional Regression for Cognitive Outcomes on Neuronal Tract Measurements.

Authors: Jeff Goldsmith; Ciprian M Crainiceanu; Brian Caffo; Daniel Reich
Journal: J R Stat Soc Ser C Appl Stat Date: 2012-01-05 Impact factor: 1.864

3. Risk prediction for myocardial infarction via generalized functional regression models.

Authors: Francesca Ieva; Anna M Paganoni
Journal: Stat Methods Med Res Date: 2013-07-18 Impact factor: 3.021

4. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research.

Authors: D J Buysse; C F Reynolds; T H Monk; S R Berman; D J Kupfer
Journal: Psychiatry Res Date: 1989-05 Impact factor: 3.222

5. Serum Macro TSH Level is Associated with Sleep Quality in Patients with Cardiovascular Risks - HSCAA Study.

Authors: Manabu Kadoya; Sachie Koyama; Akiko Morimoto; Akio Miyoshi; Miki Kakutani; Kae Hamamoto; Masafumi Kurajoh; Takuhito Shoji; Yuji Moriwaki; Masahiro Koshiba; Tetsuya Yamamoto; Masaaki Inaba; Mitsuyoshi Namba; Hidenori Koyama
Journal: Sci Rep Date: 2017-03-13 Impact factor: 4.379

6. Prevalence and socio-demographic correlates of poor sleep quality among older adults in Hebei province, China.

Authors: Yun-Shu Zhang; Yu Jin; Wen-Wang Rao; Yuan-Yuan Jiang; Li-Jun Cui; Jian-Feng Li; Lin Li; Gabor S Ungvari; Chee H Ng; Ke-Qing Li; Yu-Tao Xiang
Journal: Sci Rep Date: 2020-07-23 Impact factor: 4.379

7. The Association of Sleep Duration and Quality with Heart Rate Variability and Blood Pressure.

Authors: Amirreza Sajjadieh; Ali Shahsavari; Ali Safaei; Thomas Penzel; Christoph Schoebel; Ingo Fietze; Nafiseh Mozafarian; Babak Amra; Roya Kelishadi
Journal: Tanaffos Date: 2020-11
