Héctor Echavarría-Heras (1), Cecilia Leal-Ramírez (1), Enrique Villa-Diharce (2), Abelardo Montesinos-López (3)
(1) Centro de Investigación Científica y de Estudios Superiores de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Código Postal 22860, Apdo. Postal 360 Ensenada, B.C., Mexico.
(2) Centro de Investigación en Matemáticas, A.C. Jalisco s/n, Mineral Valenciana, Guanajuato Gto., Código Postal 36240, Mexico.
(3) Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430 Guadalajara, Jalisco, Mexico.
Abstract
Conservation of eelgrass relies on transplants, and evaluating their success depends on nondestructive measurements of average leaf biomass in shoots, among other variables. Allometric proxies offer a convenient route to such assessments. Identifying these surrogates through log transformation and linear regression can yield biased results. Some views nevertheless hold this approach to be meaningful and assert that curvature in geometrical space explains the bias. An inappropriate correction factor for retransformation bias could also explain the inconsistencies. Here, we account for nonlinearity of the log-transformed response by means of a generalized allometric model in which the scaling parameters depend continuously on the descriptor. The accompanying correction factor is conceived as the partial sum of the series expansion of the mean of the retransformed residuals that yields the highest reproducibility strength. Fits of particular characterizations of the generalized curvature model conveyed outstanding reproducibility of average eelgrass leaf biomass in shoots. Although nonlinear heteroscedastic regression also proved suitable, only log transformation approaches can unmask a size-related differentiation in the growth form of the leaf. Generally, whenever the structure of the regression error is undetermined, choosing a suitable form of the retransformation correction factor becomes elusive. Compared with customary nonparametric characterizations of this correction factor, the present form proved more efficient. We expect that the generalized allometric model offered here, along with the proposed form of the correction factor, provides a suitable analytical arrangement for the general settings of allometric examination.
1. Introduction
The model of relative growth of Huxley [1] is formally stated by means of a scaling relationship of the form
$$w = \beta a^{\alpha}, \tag{1}$$
where w and a are measurable traits, the parameter α is designated as the allometric exponent, and β is identified as the normalization constant. This model, also termed the equation of simple allometry, has been extensively used in research problems in biology [1-5], physics [6], economics [7], earth sciences [8], and resource management and conservation [9, 10], among other fields.
Eelgrass provides nursery habitat for waterfowl and fish species. By trapping sediment and damping wave energy, this seagrass promotes shoreline stabilization. Eelgrass services also include nutrient recycling, water filtration, and carbon dioxide removal. Current anthropogenic influences threaten eelgrass permanence. Conservation efforts rely fundamentally on plot transplanting. Monitoring their effectiveness depends on measurements of standing stock and productivity through time, which makes the assessment of average leaf biomass in shoots a necessary input. But traditional estimation of eelgrass leaf biomass relies on destructive methods, which could alter shoot density in a developing transplant. Evaluation therefore requires indirect assessment methods [10, 11]. Results show that an allometric scaling of the form of (1) for eelgrass leaf biomass w and associated area a is consistent [11]. Derived projections of individual leaf biomass convey useful surrogates for mean leaf biomass in shoots. Moreover, estimates of the parameters α and β are invariant within a given geographical region [10-12]. Hence, estimates fitted at one site can provide suitable projections of leaf biomass values currently observed at other places in the region. This endows the referred allometric projections with a convenient nondestructive feature [11, 13].
The simplicity of (1) makes allometric projection of eelgrass leaf biomass convenient. But there are caveats on dependability.
For instance, even though α and β are invariant, environmental influences can induce a relative extent of variability in local estimates [12, 14]. Besides, the response in the power function-like scaling of (1) is very sensitive to variation in parameter estimates. The accuracy of derived proxies is therefore subject to propagation of the error in the estimates. In addition, there are factors of biological scaling that can influence precision of estimates (e.g., [10, 11, 13, 15]). Packard [16] questioned results of Mascaro et al. [17] on allometric examination, and Mascaro et al. [18] responded to the criticism. Going over this exchange highlights the relevance of procedural factors in determining precision of parameter estimates of allometric scaling. It also offers a convenient framework for the aims of the present research.
An important factor influencing precision of estimates of allometric parameters is the analysis method. A widespread approach is the traditional analysis method of allometry (TAMA hereafter). It relies on a log transformation of data in arithmetical scale in order to fit a linear regression model in geometrical scale. The fitted line is then back-transformed to yield a two-parameter power function in the original scale. But embracing this procedure fuels a vivid, unresolved debate. Some views assert that this protocol can lead to biased results (e.g., [16, 19–29]), whereas other practitioners remain wedded to the idea that logarithmic transformations are necessary (e.g., [18, 30–39]). An alternative to the TAMA approach is using nonlinear regression methods in the direct scale of the data [16]. Echavarría-Heras et al. [11] concluded that producing allometric projections of average leaf biomass in eelgrass shoots must rely on this protocol. Yet a direct nonlinear regression approach in allometry is not infallible either. For instance, inadequate identification of the inherent error structure can lead to significant bias [18]. Besides, Lai et al.
[30] found that estimates of allometric parameters fitted by nonlinear regression can exhibit high sensitivity to the largest values of the covariate. Therefore, the suitability of the analysis method for acquiring consistent eelgrass leaf biomass proxies needs to be re-examined.
The adoption of methods of curvature in geometrical space could offer a way to overcome the inadequacy of the TAMA procedure [27, 40–42]. In particular, it is pertinent to examine whether taking curvature into account leads to improved accuracy of eelgrass leaf biomass proxies. But, according to Mascaro et al. [18], curvature could manifest because of methodological factors of data gathering. Thus, an examination of the effect of curvature in eelgrass leaf biomass allometry must also take into account a possible contribution of data quality effects. Mascaro et al. [18] recall three ways of handling curvilinearity in geometrical space. One is separating the data to fit different local linear models that account for heterogeneity of effects of the covariate [43-45]. A second is fitting a polynomial model [46-49]. A third endorses direct nonlinear regression assuming a heteroscedastic error structure, as contemplated by Mascaro et al. [17]. Each of these approaches bears complexity beyond the linear model in geometrical space associated with the customary bivariate power function of allometry. This suggested putting forward a generalized allometric model intended to deal with curvature in geometrical space. This paradigm incorporates parameters that change as continuous functions of the log-transformed covariate [27, 39, 46]. As we explain further on, the curvature arrangements recommended by Mascaro et al. [18] can all be derived from the offered formalization.
Moreover, a nonzero intercept power function that Packard [50] recommends to handle curvilinearity in geometrical space also derives from the presented generalized scaling model.
But any scheme addressing curvature in geometrical space depends on a factor for correction of the bias of retransformation of the regression error. In the general settings, if ϵ stands for the regression error, then this correction factor, hereafter denoted by the symbol δ(ϵ), is given by the mean of the exponentiated error random variable; that is, $\delta(\epsilon) = E(e^{\epsilon})$ [51-53]. Furthermore, the TAMA approach relies on the essential assumption that ϵ is additive, normally distributed, and homoscedastic [33]. When this holds, δ(ϵ) takes its lognormal-mean form [51-53]. But if ϵ fails to be normally distributed, there are two possibilities. If the distribution of ϵ is known, we could derive a closed form for δ(ϵ). If, in turn, the error distribution is not identified a priori, a widespread approach is taking δ(ϵ) in the nonparametric form given by the smearing estimator of Duan [54]. Still, there are reservations about this. A smearing estimate can fail to compensate for the downward retransformation bias of logged data [53, 55, 56]. Thus, when the distribution of ϵ is unspecified, characterizing δ(ϵ) seems elusive. Here, we put forward an arrangement for δ(ϵ) aimed at getting around this circumstance. Zeng and Tang [57] proposed a nonparametric alternative to the smearing form. It matches the partial sum of the first three terms of a power series expression of $E(e^{\epsilon})$, assuming E(ϵ) = 0. The form suggested here corresponds to a generalization of this construct. It does not require the restriction E(ϵ) = 0 and matches an n-term partial sum approximation of the exponential series representation of $E(e^{\epsilon})$.
The value of n is chosen as that of the partial sum maximizing the reproducibility strength of the retransformed mean response.
Present results show that a consideration of curvature in geometrical space, together with a suitable characterization of the correction factor for retransformation bias, offers consistent allometric proxies of observed mean leaf biomass in eelgrass shoots. Hence, contrary to views asserting that direct nonlinear regression is mandatory in allometric examination, our findings validate a comparable reliability of log transformation based methods. This is well in line with the claims of Mascaro et al. [18] and many others who object to blaming the use of logarithms for incongruent results in allometric analysis. Moreover, keeping the analysis in geometrical space unraveled heterogeneity in the inherent leaf biomass scaling pattern. This could not have been achieved by clinging to direct nonlinear regression in arithmetic space as the only valid approach to allometric examination. The analytical arrangement offered here is expected to be applicable in the general settings of allometry.
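The forms of the correction factor discussed in this section can be sketched numerically. The Python fragment below is a minimal illustration and not the routine used in this study: it assumes that the residuals of a fit in geometrical space are available as a list, it estimates E(ϵ^k) by raw sample moments, and all function names are hypothetical. Under these assumptions, the partial sum with n = 2 and zero-mean residuals reproduces the Zeng and Tang form, and for large n it approaches the smearing estimate computed from the same residuals.

```python
import math


def baskerville(residuals):
    """Lognormal-mean form exp(sigma^2 / 2), using the sample variance."""
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((e - mean) ** 2 for e in residuals) / (n - 1)
    return math.exp(variance / 2.0)


def duan_smearing(residuals):
    """Smearing estimate: sample mean of the exponentiated residuals."""
    return sum(math.exp(e) for e in residuals) / len(residuals)


def zeng_tang(residuals):
    """1 + sigma^2 / 2, with sigma^2 taken as the second raw moment
    (assumption: residuals have zero mean)."""
    second_moment = sum(e * e for e in residuals) / len(residuals)
    return 1.0 + second_moment / 2.0


def partial_sum(residuals, n_terms):
    """Sum over k = 0..n_terms of E(eps^k) / k!, with E(eps^k) estimated
    by the k-th raw sample moment (assumption)."""
    m = len(residuals)
    total = 0.0
    for k in range(n_terms + 1):
        moment = sum(e ** k for e in residuals) / m
        total += moment / math.factorial(k)
    return total
```

A choice of n would then follow by evaluating the retransformed mean response for several values of n and keeping the one with the largest concordance with observed data.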
2. Materials and Methods
2.1. Data
For the aims of the present research, we relied on an extensive eelgrass data set collected in San Quintin Bay, a coastal lagoon on the Pacific side of the Baja California Peninsula, México (30°30′ N, 116°00′ W), over a 13-month sampling period covering a whole-year cycle. The data comprise measurements of length (mm), width (mm), and dry weight w (g) of a total of 10412 individual eelgrass leaves taken from 20 randomly thrown 400 cm² quadrats at every monthly visit to the site. A sampling visit is referred to hereafter as a “sampling time.” The length times width proxy [11] provided estimates of leaf area a (mm²). In order to test for methodological influences of data gathering, we processed the raw data set according to a mean plus or minus two standard deviations outlier removal procedure [58]. Appendix A presents results of an exploratory analysis of the data.
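The outlier removal rule mentioned above can be illustrated as follows. This is only a sketch under stated assumptions: the text does not specify the variable or grouping to which the rule was applied, so the hypothetical function below filters a single list of values and keeps those within k sample standard deviations of the mean, with k = 2.

```python
import statistics


def remove_outliers(values, k=2.0):
    """Keep values within mean +/- k sample standard deviations.
    The variable being filtered is an assumption of this sketch."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - k * sd, mean + k * sd
    return [x for x in values if lower <= x <= upper]
```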
2.2. Models
As specified above, the symbols w and a stand for the biomass of an individual eelgrass leaf and its area, respectively. Echavarría-Heras et al. [11] assert that these variables can be related through the bivariate allometric model of (1). One procedure to acquire estimates for the parameters α and β is fitting a nonlinear homoscedastic regression model directly in arithmetical scale. Alternatively, we can use a TAMA approach, that is, fitting the linear regression model
$$v = \ln\beta + \alpha u + \epsilon, \tag{2}$$
where v = ln w, u = ln a, and ϵ is a random error term assumed to be normally distributed with zero mean and variance σ², that is, ϵ ~ N(0, σ²).
We conceive curvature in geometrical space as a circumstance where fitting results of the regression model of (2) are inconsistent. Dealing with this situation amounts to considering complexity beyond that incorporated by (1). One possible approach to address curvature is assuming that the scaling parameters α and β in (2) depend continuously on the covariate [27, 39, 46]. This is consistent with the generalized allometric model
$$w = \beta(a)\,a^{\alpha(a)}, \tag{3}$$
with β(a) and α(a) intended to be continuous and differentiable functions defined on R+ and with β(a) being positive. A log transformation v = ln w, u = ln a of (3) establishes the regression model
$$v = v_C(a, u) + \epsilon, \tag{4}$$
where
$$v_C(a, u) = \ln\beta(a(u)) + \alpha(a(u))\,u, \tag{5}$$
and ϵ, a residual error term, is conceived as a random variable that in the general settings is ψ-distributed with mean μ and variance set by a function σ²(u) of the covariate u, that is, ϵ ~ ψ(μ, σ²(u)).
Setting β(a(u)) = β and α(a(u)) = α, with α and β constants, reduces (4) to the regression model of (2). In Appendix B, we explain that (3) accommodates all curvature paradigms suggested by Mascaro et al. [18]. These include a biphasic and a polynomial model in geometrical space, as well as the nonlinear heteroscedastic model referred to by Mascaro et al. [17] in direct arithmetical space. Moreover, as shown in Appendix B, the three-parameter power function chosen as an alternative standard for curvature [59] also derives from (3).
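As an illustration of the TAMA protocol described above, the following Python sketch fits the line of v = ln w on u = ln a by ordinary least squares and back-transforms it to the power function of (1). It is a minimal example rather than the fitting code of this study; the function names are hypothetical, and the argument `delta` stands for whichever correction factor δ(ϵ) is adopted, with `delta=1.0` meaning no correction.

```python
import math


def fit_tama(areas, weights):
    """Ordinary least squares of v = ln(w) on u = ln(a).
    Returns (alpha, ln_beta, residuals)."""
    u = [math.log(a) for a in areas]
    v = [math.log(w) for w in weights]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxx = sum((x - u_bar) ** 2 for x in u)
    sxy = sum((x - u_bar) * (y - v_bar) for x, y in zip(u, v))
    alpha = sxy / sxx
    ln_beta = v_bar - alpha * u_bar
    residuals = [y - (ln_beta + alpha * x) for x, y in zip(u, v)]
    return alpha, ln_beta, residuals


def project_weight(area, alpha, ln_beta, delta=1.0):
    """Back-transformed mean response beta * a^alpha * delta."""
    return math.exp(ln_beta) * area ** alpha * delta
```

The residuals returned by `fit_tama` are the quantities from which a correction factor for retransformation bias would be estimated.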
2.2.1. Biphasic Model in Geometrical Space
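In order to characterize the model of (3) in a biphasic mode, we let $\beta(a) = \beta_B(a)$ and $\alpha(a) = \alpha_B(a)$, such that
$$\beta_B(a) = \sum_{i=1}^{2}\vartheta_i(u(a))\,\beta_i, \qquad \alpha_B(a) = \sum_{i=1}^{2}\vartheta_i(u(a))\,\alpha_i, \tag{6}$$
including parameters $\beta_i$ and $\alpha_i$ and the functions $\vartheta_i(u(a))$, for i = 1, 2, given by
$$\vartheta_1(u(a)) = 1 - H(u - u_b), \qquad \vartheta_2(u(a)) = H(u - u_b). \tag{7}$$
Here $H(u - u_b)$ is a Heaviside function H(z) [60] evaluated at $z = u - u_b$, and correspondingly $u_b = \ln a_b$, with $a_{\min} \le a_b \le a_{\max}$ being a point separating the growth phases $u \le u_b$ and $u > u_b$. Then, denoting by $v_B(a, u)$ the resulting form of $v_C(a, u)$ from (4), we get the biphasic regression model
$$v = v_B(a, u) + \epsilon, \tag{8}$$
where ϵ is a random error term as defined in (4) and
$$v_B(a, u) = \sum_{i=1}^{2}\vartheta_i(u(a))\,f_i(u(a)), \tag{9}$$
with
$$f_i(u(a)) = \ln\beta_i + \alpha_i u, \tag{10}$$
where $\beta_i$ and $\alpha_i$, for i = 1, 2, are parameters to be estimated from data.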
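The biphasic mean response can be sketched in Python as two straight lines in geometrical space switched by a Heaviside function. The fragment below is an assumption-laden illustration, not the estimation procedure of Appendix D: the break point `u_b` is taken as given, each phase is fitted by ordinary least squares, the convention H(0) = 0 is chosen so that u = u_b belongs to the first phase, and all function names are hypothetical.

```python
def heaviside(z):
    """Heaviside step with H(0) = 0, so u == u_b belongs to the first phase."""
    return 1.0 if z > 0 else 0.0


def ols_line(points):
    """Least-squares (intercept, slope) for a list of (u, v) pairs."""
    n = len(points)
    u_bar = sum(u for u, _ in points) / n
    v_bar = sum(v for _, v in points) / n
    sxx = sum((u - u_bar) ** 2 for u, _ in points)
    sxy = sum((u - u_bar) * (v - v_bar) for u, v in points)
    slope = sxy / sxx
    return v_bar - slope * u_bar, slope


def fit_biphasic(u_values, v_values, u_b):
    """Fit one line per growth phase for a given break point u_b."""
    pairs = list(zip(u_values, v_values))
    lower = [p for p in pairs if p[0] <= u_b]
    upper = [p for p in pairs if p[0] > u_b]
    return ols_line(lower), ols_line(upper)


def biphasic_mean_response(u, phases, u_b):
    """Phase 1 line for u <= u_b and phase 2 line for u > u_b."""
    (ln_beta_1, alpha_1), (ln_beta_2, alpha_2) = phases
    h = heaviside(u - u_b)
    return (1.0 - h) * (ln_beta_1 + alpha_1 * u) + h * (ln_beta_2 + alpha_2 * u)
```

If the break point is unknown, one option is to repeat `fit_biphasic` over a grid of candidate values of `u_b` and keep the one with the smallest residual sum of squares; this is a suggestion of the sketch, not a statement of the method used here.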
2.2.2. Polynomial Model in Geometrical Space
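Similarly, assume that $\alpha(a) = \alpha_P(a)$ and $\beta(a) = \beta_P(a)$, with both functions specified through coefficients $\alpha_k$ and $\beta_k$, for 1 ≤ k ≤ m, as detailed in Appendix B. One can then acquire a polynomial representation $v_P(a, u)$ for the generalized mean response function in geometrical space $v_C(a, u)$. This way, the polynomial form of regression (4) becomes
$$v = v_P(a, u) + \epsilon, \tag{12}$$
where ϵ is a random error term as defined in (4), with
$$v_P(a, u) = \sum_{k=0}^{m} p_k u^{k}, \tag{13}$$
and $p_k$, for k = 0, 1, …, m, being parameters.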
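For illustration, the polynomial mean response in geometrical space and its retransformation can be evaluated as in the Python sketch below. The coefficient list `p` (with `p[k]` multiplying u^k), the function names, and the argument `delta` for the correction factor are assumptions of this example. With m = 1 and p = [ln β, α] the sketch reduces to the power function of (1).

```python
import math


def poly_mean_response(u, p):
    """Horner evaluation of sum over k of p[k] * u**k."""
    result = 0.0
    for coefficient in reversed(p):
        result = result * u + coefficient
    return result


def project_weight_poly(area, p, delta=1.0):
    """Retransformed mean response exp(v_P(u)) * delta, with u = ln(area)."""
    return math.exp(poly_mean_response(math.log(area), p)) * delta
```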
2.2.3. Nonlinear Heteroscedastic Model in Arithmetical Space
As we explain ahead, direct algebraic manipulation of (3) leads to the nonlinear heteroscedastic regression model addressed by Mascaro et al. [17]; namely,
$$w = \beta a^{\alpha} + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 a^{2\theta}), \tag{14}$$
with α, β, and θ being parameters and ϵ being a zero-mean, normally distributed error term with a covariate-dependent variance $\sigma^2(a) = \sigma^2 a^{2\theta}$.
A nonlinear homoscedastic form derives from (14) by setting θ = 0; that is,
$$w = \beta a^{\alpha} + \epsilon, \qquad \epsilon \sim N(0, \sigma^2), \tag{15}$$
where, again, ϵ is an additive error term assumed to be normally distributed with zero mean and variance σ².
Appendix A deals with exploratory analysis of the data. Appendix B presents the notation convention and also explains how all addressed paradigms derive from the generalized model of (3). Appendix C explains the addressed forms of the correction factor for bias of retransformation of the regression error. Fitting results of the geometrical space based models appear in Appendix D. Those corresponding to the nonlinear heteroscedastic and homoscedastic models appear in Appendix E.
Agreement between observed and projected values is commonly evaluated by means of Lin's concordance correlation coefficient (CCC) [61], commonly denoted by the symbol ρ. Agreement is defined as poor whenever ρ < 0.90, moderate for 0.90 ≤ ρ < 0.95, good for 0.95 ≤ ρ < 0.99, and excellent for ρ ≥ 0.99 [62]. Besides CCC values, we assessed reproducibility by comparing goodness-of-fit statistics, namely, the coefficient of determination (CD), standard error of estimate (SEE), mean prediction error (MPE), total relative error (TRE), average systematic error (ASE), and mean percent standard error (MPSE) [63-65]. For statistical tasks, we relied on R, release 3.5.
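The concordance correlation coefficient and the agreement classes quoted above can be computed as in the following Python sketch. It uses the standard definition of Lin's coefficient with moments divided by n; the function names are hypothetical, and the thresholds are those stated in the text.

```python
def concordance_correlation(observed, predicted):
    """Lin's CCC: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with variances and covariance computed with divisor n."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    s_xx = sum((x - mean_x) ** 2 for x in observed) / n
    s_yy = sum((y - mean_y) ** 2 for y in predicted) / n
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(observed, predicted)) / n
    return 2.0 * s_xy / (s_xx + s_yy + (mean_x - mean_y) ** 2)


def agreement_label(rho):
    """Agreement classes as defined in the text."""
    if rho < 0.90:
        return "poor"
    if rho < 0.95:
        return "moderate"
    if rho < 0.99:
        return "good"
    return "excellent"
```

Unlike the Pearson correlation, this index penalizes a systematic shift between observed and projected values, which is why a uniformly biased projection scores below 1 even when it is perfectly correlated with the data.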
3. Results
Exploratory analysis in Appendix A identifies maximum, minimum, and sample mean values for observed leaf areas a and associated dry weights w. We also describe the distribution of these variables in terms of the quantiles of probability 0.10, 0.25, 0.50, 0.75, and 0.90, for both crude and processed data. Statistical exploration extends to log-transformed values of these variables. We present quantile-quantile (Q-Q) plots for comparison of distribution patterns, as well as boxplots for the 13-month sampling scheme, for both crude and processed data. From month 2 to month 6, a reduction in the values of weight and area occurred, perhaps explained by an increase in temperature during the period. A similar pattern of variation over time appears in both the raw and the processed data sets.
3.1. Fitting Results of Geometrical Space Models
In order to validate curvature in geometrical space, we compared the linear model derived from (1) with the biphasic and polynomial alternatives derived from the generalized model of (3). Appendix B explains formal matters. Tables 1 and 2 summarize the notation convention. Equations numbered beyond (15) belong to the appendices. Appendix D explains the corresponding regression protocols.
Table 1
Summary of notation convention of the bivariate scaling model of (1) and its generalization to account for curvature as given by (3). GS stands for geometrical space, AS stands for arithmetical space, and CF means correction factor of retransformation bias.
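| | Typical bivariate | Eq. | Generalized for curvature | Eq. |
|---|---|---|---|---|
| Basic form | $w = \beta a^{\alpha}$ | (1) | $w = \beta(a)\,a^{\alpha(a)}$ | (3) |
| Regression equation in GS | $v = \ln\beta + \alpha u + \epsilon$ | (2), (D.1) | $v = v_C(a, u) + \epsilon$ | (4) |
| Mean response GS | $E_T(v \mid u) = \ln\beta + \alpha u$ | (B.4) | $E_C(v \mid u) = \ln\beta(a) + \alpha(a)\,u(a)$ | (B.1) |
| Back transformation AS | $w = e^{E_T(v \mid u)} e^{\epsilon}$ | (B.5) | $w = e^{E_C(v \mid u)} e^{\epsilon}$ | (B.2) |
| Mean response AS | $E_{T\delta}(w \mid a) = \beta a^{\alpha}\delta(\epsilon)$, $\delta(\epsilon) = E(e^{\epsilon})$ | (B.6) | $E_{C\delta}(w \mid a) = \beta(a)\,a^{\alpha(a)}\delta(\epsilon)$, $\delta(\epsilon) = E(e^{\epsilon})$ | (B.3) |
| CF Baskerville | $E_{TB}(w \mid a) = \beta a^{\alpha}\delta_B(\epsilon)$, $\delta_B(\epsilon) = e^{\sigma^2/2}$ | (B.6), (C.1) | $E_{CB}(w \mid a) = \beta(a)\,a^{\alpha(a)}\delta_B(\epsilon)$ | (B.3), (C.1) |
| CF Duan | $E_{TD}(w \mid a) = \beta a^{\alpha}\delta_D(\epsilon)$, $\delta_D(\epsilon) = \frac{1}{m}\sum_{j=1}^{m} e^{\epsilon_j}$ | (B.6), (C.2) | $E_{CD}(w \mid a) = \beta(a)\,a^{\alpha(a)}\delta_D(\epsilon)$ | (B.3), (C.2) |
| CF Zeng and Tang | $E_{TZT}(w \mid a) = \beta a^{\alpha}\delta_{ZT}(\epsilon)$, $\delta_{ZT}(\epsilon) = 1 + \sigma^2/2$ | (B.6), (C.3) | $E_{CZT}(w \mid a) = \beta(a)\,a^{\alpha(a)}\delta_{ZT}(\epsilon)$ | (B.3), (C.3) |
| CF n-partial sum | $E_{Tn}(w \mid a) = \beta a^{\alpha}\delta_n(\epsilon)$, $\delta_n(\epsilon) = \sum_{k=0}^{n} E(\epsilon^{k})/k!$ | (B.6), (C.4) | $E_{Cn}(w \mid a) = \beta(a)\,a^{\alpha(a)}\delta_n(\epsilon)$ | (B.3), (C.4) |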
Table 2
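Notation convention for curvature models in geometrical space. We include the biphasic and polynomial characterizations of the generalized allometric model of (3). GS stands for geometrical space, AS stands for arithmetical space, and CF means correction factor of retransformation bias of the regression error.
| | Biphasic model | Eq. | Polynomial model | Eq. |
|---|---|---|---|---|
| Basic form | $w = \beta_B(a)\,a^{\alpha_B(a)}$ | (B.11) | $w = \beta_P(a)\,a^{\alpha_P(a)}$ | (B.27) |
| Regression equation | $v = v_B(a, u) + \epsilon$ | (8), (D.4) | $v = v_P(a, u) + \epsilon$ | (B.32), (D.8) |
| Mean response GS | $E_B(v \mid u) = \sum_{i=1}^{2}\vartheta_i(u(a))\,f_i(u(a))$ | (B.18) | $E_P(v \mid u) = \sum_{k=0}^{m} p_k\,u(a)^{k}$ | (B.33) |
| Back transformation AS | $w = \beta_B(a)\,a^{\alpha_B(a)} e^{\epsilon}$ | (B.21) | $w = \beta_P(a)\,a^{\alpha_P(a)} e^{\epsilon}$ | (B.34) |
| Mean response AS | $E_{B\delta}(w \mid a) = \beta_B(a)\,a^{\alpha_B(a)}\delta(\epsilon)$, $\delta(\epsilon) = E(e^{\epsilon})$ | (B.22) | $E_{P\delta}(w \mid a) = \beta_P(a)\,a^{\alpha_P(a)}\delta(\epsilon)$, $\delta(\epsilon) = E(e^{\epsilon})$ | (B.35) |
| CF Baskerville | $E_{BB}(w \mid a) = \beta_B(a)\,a^{\alpha_B(a)}\delta_B(\epsilon)$ | (B.22), (C.1) | $E_{PB}(w \mid a) = \beta_P(a)\,a^{\alpha_P(a)}\delta_B(\epsilon)$ | (B.35), (C.1) |
| CF Duan | $E_{BD}(w \mid a) = \beta_B(a)\,a^{\alpha_B(a)}\delta_D(\epsilon)$ | (B.22), (C.2) | $E_{PD}(w \mid a) = \beta_P(a)\,a^{\alpha_P(a)}\delta_D(\epsilon)$ | (B.35), (C.2) |
| CF Zeng and Tang | $E_{BZT}(w \mid a) = \beta_B(a)\,a^{\alpha_B(a)}\delta_{ZT}(\epsilon)$ | (B.22), (C.3) | $E_{PZT}(w \mid a) = \beta_P(a)\,a^{\alpha_P(a)}\delta_{ZT}(\epsilon)$ | (B.35), (C.3) |
| CF n-partial sum | $E_{Bn}(w \mid a) = \beta_B(a)\,a^{\alpha_B(a)}\delta_n(\epsilon)$ | (B.22), (C.4) | $E_{Pn}(w \mid a) = \beta_P(a)\,a^{\alpha_P(a)}\delta_n(\epsilon)$ | (B.35), (C.4) |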
Fitting results of the TAMA arrangement of (2) appear in Appendix D. Figure 1 shows the spread about TAMA's linear mean response function $E_T(v \mid u)$. Deviations from this linear mean response (red dots) suggest curvature. Data processing removed inconsistent replicates, yet the spread shown still deviates from a linear mean response. This suggests that curvature in geometrical space cannot be explained by methodological factors related to data gathering.
Figure 1
Spread about TAMA's linear mean response function $E_T(v \mid u)$. Deviations about $E_T(v \mid u)$, shown by red dots, suggest curvature.
Fitting results of the biphasic protocol of (8) are summarized in Appendix D. Figure 2 displays the spread about the mean response function $E_B(v \mid u)$ in geometrical space. Compared with Figure 1, the biphasic fit provides a consistent account of the different variation patterns of smaller and larger leaves. This supports the judgement that the identified curvature might be due to intrinsic factors of leaf growth rather than methodological influences related to data gathering.
Figure 2
Spread about the biphasic mean response function $E_B(v \mid u)$. Compared with the plot in Figure 1, the biphasic choice offers a better account of the variability of the log-transformed response than the TAMA alternative.
Appendix D presents fitting results of the polynomial model (m = 6). Figure 3 displays dispersion about the polynomial mean response function in geometrical space, $E_P(v \mid u)$. The polynomial representation also exhibits higher consistency than the TAMA arrangement. In agreement with the biphasic scheme, the polynomial fit suggests a smooth transition between two growth phases.
Figure 3
Spread about the polynomial mean response function $E_P(v \mid u)$. Compared with the plot in Figure 1, the polynomial scheme, as opposed to the TAMA protocol, offers a consistent account of curvature.
3.2. Model Selection in Geometrical Space
Assessment of models fitted in geometrical space relied on goodness-of-fit statistics, that is, the coefficient of determination, standard error of estimate, mean prediction error, total relative error, average systematic error, and mean percent standard error [63-65]. Besides, we took into account the concordance correlation coefficient [61] and Akaike's information index [66]. Table 3 presents the results. Goodness-of-fit statistics and ρ and AIC values disfavored the TAMA protocol. On the contrary, comparison indices favored the biphasic model. Moreover, except for TRE and ASE, differences between the indices of this scheme and those of the polynomial (m = 6) are slight. In particular, the highest AIC is associated with the TAMA protocol (AIC = 15069.9, ∆AIC = 2491.9), so this model has the least support. The biphasic choice delivered the smallest AIC value (AIC = 12578.0, ∆AIC = 0). Nevertheless, the difference in AIC relative to the m = 6 polynomial model is comparatively small (AIC = 12615.0, ∆AIC = 37). Thus, model comparison shows that the TAMA protocol is unsuited, backing the assertion that any model aiming to be consistent with the present data ought to be nonlinear in geometrical space.
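The ∆AIC values quoted in this paragraph follow from subtracting the smallest AIC from each model's AIC. The short Python check below reproduces that arithmetic from the AIC values reported in the text; the dictionary keys are labels chosen for this example.

```python
# AIC values as reported in the text for the geometrical space models.
aic = {"TAMA": 15069.9, "Biphasic": 12578.0, "Polynomial (m = 6)": 12615.0}

best = min(aic.values())
# Difference of each model's AIC from the smallest one, rounded to one decimal.
delta_aic = {name: round(value - best, 1) for name, value in aic.items()}
# Models ordered from smallest to largest AIC.
ranking = sorted(aic, key=aic.get)
```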
Table 3
Assessment of geometrical space fitted models. Comparison took into account goodness-of-fit statistics, that is, the coefficient of determination (R²), standard error of estimate (SEE), mean prediction error (MPE), total relative error (TRE), average systematic error (ASE), and mean percent standard error (MPSE) [63–65]. Besides, we considered the concordance correlation coefficient (ρ) [61] and Akaike's information index (AIC) [66].
| Method | AIC | ρ | R² | SEE | TRE | ASE | MPE | MPSE |
|---|---|---|---|---|---|---|---|---|
| TAMA | 15069.9 | 0.9505 | 0.9045 | 0.5135 | -0.0286 | -0.2462 | -0.1879 | 5.7877 |
| Biphasic | 12578.0 | 0.9614 | 0.9256 | 0.4530 | 0.0037 | 0.0286 | -0.1658 | 5.1446 |
| Polynomial | 12615.0 | 0.9614 | 0.9241 | 0.4576 | 0.8497 | 1.2008 | -0.1675 | 5.3187 |
3.3. Retransformation Results
The TAMA protocol was not supported by the model selection criteria. For comparison, however, the corresponding retransformation results are included in Appendix D. Relative to the TAMA protocol, fitting results of the biphasic model display a relatively improved distribution of residuals about the zero line. Nevertheless, the normal Q-Q plot still shows heavier tails than those expected for a normal distribution. And, again, both the test statistic and the p value of an Anderson-Darling test [67] provide evidence against normality of residuals. This justifies choosing the nonparametric forms $\delta_D(\epsilon)$, $\delta_{ZT}(\epsilon)$, or $\delta_n(\epsilon)$ to compensate for the downward bias induced by retransformation of the regression error [11, 52, 53]. Table 4 displays comparison statistics for the reproducibility strength of the biphasic mean response $E_B(w \mid a)$ as shaped by the different forms of δ(ϵ).
Table 4
Comparison of reproducibility strength statistics for the biphasic mean response $E_B(w \mid a)$ as calculated by different forms of the correction factor for bias of retransformation of the regression error ϵ. The ρ(δ(ϵ)) symbol stands for the concordance correlation value associated with δ(ϵ).
| CF | ρ(δ(ϵ)) | R² | SEE | TRE | ASE | MPE | MPSE |
|---|---|---|---|---|---|---|---|
| $\delta_D(\epsilon)$ | 0.9664 | 0.9256 | 0.0049 | -9.2196 | 4.48e-13 | 0.8133 | 31.2839 |
| $\delta_{ZT}(\epsilon)$ | 0.9687 | 0.9319 | 0.0047 | -7.5530 | 1.8359 | 0.7780 | 31.4388 |
| $\delta_n(\epsilon)$ | 0.9727 | 0.9464 | 0.0042 | 1.9516 | 12.3058 | 0.6903 | 33.9525 |
Agreement between the biphasic mean response and leaf biomass data is best for $\delta_n(\epsilon)$. Figure 4 shows the spread of processed leaf biomass values about the biphasic mean response function $E_B(w \mid a)$ as shaped by the considered forms of δ(ϵ). We can observe that both $\delta_D(\epsilon)$ and $\delta_{ZT}(\epsilon)$ overcompensate relative to the bias correction by $\delta_n(\epsilon)$. Moreover, results show that, as opposed to the TAMA, a biphasic protocol along with the $\delta_n(\epsilon)$ form offers consistent proxies of individual leaf biomass. It is worth mentioning that, although model selection favored the biphasic scheme, examination of the polynomial model output reveals predictive strength similar to that of the biphasic alternative.
Figure 4
Spread about the biphasic mean response function $E_B(w \mid a)$ in arithmetical space. Line colors (black, green, and red) distinguish the three forms of δ(ϵ) compared in Table 4.
3.4. Assessing Curvature by Direct Nonlinear Regression
As suggested by Mascaro et al. [18], effects of curvature in geometrical space can be analyzed by means of the direct nonlinear heteroscedastic regression model of (14). In Appendix B, we explain that such a protocol also derives from the generalized bivariate allometric model of (3). Table 5 presents pertinent notation convention. For comparison, we also present results for the associated homoscedastic case.
Table 5
Notation convention for the nonlinear heteroscedastic and homoscedastic models of (14) and (15), respectively.
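| Nonlinear model | Heteroscedastic | Eq. | Homoscedastic | Eq. |
|---|---|---|---|---|
| Regression equation | $w = \beta_\theta a^{\alpha_\theta} + \epsilon$, $\epsilon \sim N(0, a^{2\theta}\sigma^2)$ | (14), (E.1) | $w = \beta_o a^{\alpha_o} + \epsilon$, $\epsilon \sim N(0, \sigma^2)$ | (15), (E.5) |
| Mean response function | $E_\theta(w \mid a) = \beta_\theta a^{\alpha_\theta}$ | (B.45) | $E_o(w \mid a) = \beta_o a^{\alpha_o}$ | (B.46) |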
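The two regression equations of Table 5 differ only in how the error variance depends on leaf area. As a hedged illustration, the Python sketch below evaluates the Gaussian negative log-likelihood of a power function mean response with variance σ²a^{2θ}, which reduces to the homoscedastic case for θ = 0, and computes AIC as 2k plus twice the negative log-likelihood. The function names, the parameterization by σ, and this particular AIC formula are assumptions of the sketch; the text does not state how the reported AIC values were obtained.

```python
import math


def negative_log_likelihood(areas, weights, beta, alpha, sigma, theta=0.0):
    """Gaussian negative log-likelihood for w = beta * a**alpha + eps,
    with Var(eps) = sigma**2 * a**(2 * theta); theta = 0 is homoscedastic."""
    total = 0.0
    for a, w in zip(areas, weights):
        variance = sigma ** 2 * a ** (2.0 * theta)
        residual = w - beta * a ** alpha
        total += 0.5 * math.log(2.0 * math.pi * variance)
        total += residual ** 2 / (2.0 * variance)
    return total


def aic(nll, n_params):
    """AIC = 2k + 2 * negative log-likelihood (assumed convention)."""
    return 2.0 * n_params + 2.0 * nll
```

Minimizing `negative_log_likelihood` over (β, α, σ, θ) with a numerical optimizer would give the heteroscedastic fit; fixing θ = 0 gives the homoscedastic one.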
Fitting results of the heteroscedastic and homoscedastic models appear in Appendix E. Estimates of the normalization constant and of the scaling exponent are very similar for both models, and the corresponding 95% confidence intervals display some overlap. As a result, we can expect similar reproducibility features for both models. Table 6 presents comparison statistics.
Table 6
Assessment of models fitted in arithmetical space. This includes the nonlinear homoscedastic and heteroscedastic protocols.
| Method | AIC | ρ | R² | SEE | TRE | ASE | MPE | MPSE |
|---|---|---|---|---|---|---|---|---|
| Heteroscedastic | -92761.13 | 0.972 | 0.9467 | 0.0041 | 0.0311 | 26.1073 | 0.6882 | 50.2781 |
| Homoscedastic | -81386.51 | 0.972 | 0.9467 | 0.0041 | 0.2954 | 28.4527 | 0.6881 | 51.7400 |
Model assessment backs the heteroscedastic model. But selection here is mainly on qualitative grounds: it concerns the ability of the heteroscedastic model to identify an expected dependence of the variance on the covariate. The reproducibility strengths of both paradigms are equivalent. Indeed, Figure 5 shows that the mean response curves $E_\theta(w \mid a)$ and $E_o(w \mid a)$ barely differ.
Figure 5
Spread about the mean response curves $E_\theta(w \mid a)$ and $E_o(w \mid a)$ associated with the nonlinear heteroscedastic and homoscedastic regression models of (14) and (15), respectively. The mean response $E_\theta(w \mid a)$ is shown in black and $E_o(w \mid a)$ in red.
Results show that, as occurred for the models fitted in geometrical space, data cleaning failed to correct a heavy-tails problem for the nonlinear fits. This can be ascertained from the normal Q-Q plot of residuals. This strengthens our point on the need to consider an error structure different from the one assumed here. Exploring the effects of error structure on the fitting of the curvature models addressed here will be a matter of further research. Interestingly, both the homoscedastic and heteroscedastic models seem to yield the same reproducibility strengths.
3.5. Model Assessment in Arithmetical Space
The model selection assay in geometrical space summarized in Table 3 favored the biphasic protocol. Correspondingly, statistics in Table 6 support the nonlinear heteroscedastic model. Results in Table 4 support $\delta_n(\epsilon)$ as the form required for the largest reproducibility of retransformation output. Table 7 allows assessment of these models. Half of the comparison indices coincide (ρ, R², SEE, and MPE). In addition, the biphasic model is favored by AIC, ASE, and MPSE. This sets a criterion for selecting curvature in geometrical space as a consistent paradigm for the present data. Accordingly, the biphasic model is adequate.
Table 7
Assessment of models in arithmetical space. This includes the nonlinear heteroscedastic and biphasic protocols.
| Method | AIC | ρ | R² | SEE | TRE | ASE | MPE | MPSE |
|---|---|---|---|---|---|---|---|---|
| Heteroscedastic | -92761.13 | 0.972 | 0.946 | 0.004 | 0.031 | 26.107 | 0.690 | 50.278 |
| Biphasic | -96879.30 | 0.972 | 0.946 | 0.004 | 1.951 | 12.305 | 0.690 | 33.952 |
3.6. Implications for Allometric Proxies of Mean Leaf Biomass in Eelgrass Shoots
We in turn consider allometric proxies for average leaf biomass in eelgrass shoots. To obtain these surrogates, we aggregate the allometric projections of biomass of the individual leaves composing a shoot. For comparison, we consider individual leaf biomass surrogates produced by the different projection methods. Table 8 compares the resulting reproducibility strengths.
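The aggregation step can be sketched as follows. This Python fragment rests on an assumption that the text does not spell out: that the average leaf biomass in shoots is the mean, over shoots, of the summed projected biomass of the leaves in each shoot. The record layout `(shoot_id, leaf_area)`, the function name, and the `project` callable (any of the allometric projection methods) are hypothetical.

```python
from collections import defaultdict


def mean_leaf_biomass_in_shoots(leaves, project):
    """leaves: iterable of (shoot_id, leaf_area) records.
    project: function mapping a leaf area to a projected leaf biomass.
    Returns the mean over shoots of the summed projected leaf biomass
    (assumed definition of the shoot-level average)."""
    per_shoot = defaultdict(float)
    for shoot_id, area in leaves:
        per_shoot[shoot_id] += project(area)
    return sum(per_shoot.values()) / len(per_shoot)
```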
Table 8
Comparison of reproducibility strengths of proxies of average leaf biomass in shoots resulting from the biphasic, polynomial, or nonlinear heteroscedastic protocols.
Method            ρ      R2     SEE        TRE     ASE     MPE    MPSE
Biphasic          0.997  0.994  8.729e-04  2.408   2.229   3.486  5.080
Polynomial (m=6)  0.996  0.992  0.001      -2.456  -1.033  4.807  5.366
Heteroscedastic   0.997  0.994  7.722e-04  0.983   -0.726  3.084  5.645
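The aggregation step that produces the shoot-level proxies compared in Table 8 can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that "average leaf biomass in a shoot" means the arithmetic mean of the projected weights of the leaves in that shoot, and that each leaf projection has the retransformed power form δ·β·a^α. The function names, the `delta` argument, and the `(shoot_id, area)` input layout are hypothetical. The default parameter values are the straight-line estimates of Table 12, used only to make the example concrete.

```python
import math
from collections import defaultdict

# Straight-line estimates reported in Table 12, used here only as
# illustrative defaults (assumption: any fitted model could be swapped in).
ALPHA = 1.028222
LN_BETA = -11.270569


def leaf_biomass(area_mm2, delta=1.0, alpha=ALPHA, ln_beta=LN_BETA):
    """Retransformed projection of one leaf's dry weight: delta * beta * a**alpha."""
    return delta * math.exp(ln_beta + alpha * math.log(area_mm2))


def mean_leaf_biomass_per_shoot(leaves, delta=1.0):
    """Average the projected leaf weights within each shoot.

    leaves: iterable of (shoot_id, area_mm2) pairs (hypothetical layout).
    Returns a dict mapping shoot_id to the mean projected leaf weight.
    """
    totals = defaultdict(lambda: [0.0, 0])  # shoot_id -> [sum, count]
    for shoot_id, area in leaves:
        acc = totals[shoot_id]
        acc[0] += leaf_biomass(area, delta)
        acc[1] += 1
    return {sid: s / n for sid, (s, n) in totals.items()}
```

A call such as `mean_leaf_biomass_per_shoot([("s1", 100.0), ("s1", 300.0), ("s2", 50.0)])` returns one proxy per shoot. These proxies can then be compared with the observed shoot averages through the indices of Table 8.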
Results in [11] established that proxies derived from the TAMA protocol are inconsistent with observed values. This endorsed nonlinear regression in the direct scale as a requirement for reliable allometric projections of mean leaf biomass in eelgrass shoots. But Table 8 shows that a curvature model fitted in geometrical space can offer proxies with predictive power similar to that of a nonlinear regression protocol. The plots in Figure 6 illustrate this assertion.
Figure 6
Average leaf biomass in shoots calculated from observed-processed data compared with their allometric projections. Projection lines resulting from the biphasic and polynomial models relied on the δ(ϵ) form.
4. Discussion
The customary bivariate allometric model of (1) offers nondestructive surrogates for average leaf biomass in eelgrass shoots [11]. But methodological factors can influence its dependability. Some views assert that parameter identification based on logarithmic transformation leads to biased projections [20-29]. Other practitioners hold that this approach is meaningful and necessary in allometric examination [30-39]. The present review suggests that resolving this controversy amounts to considering curvature in geometrical space. To that end, we proposed the generalized model of (3). Approaches such as direct nonlinear heteroscedastic regression, as well as biphasic and polynomial protocols in geometrical space [18], follow as particular cases of this construct. For the present data, model selection supported keeping the analysis in geometrical space. Nevertheless, at an empirical level, the addressed protocols produced allometric projections of individual leaf biomass of comparable precision. This was also verified for the corresponding projections of average leaf biomass in shoots. From a qualitative standpoint, the main contribution of the nonlinear regression protocol was identifying the expected dependence of the variance on the covariate. Moreover, Figure 7(a) depicts manifest differences in mean response trends between the polynomial fit and the nonlinear heteroscedastic model. In contrast, the trends in Figure 7(b) for the biphasic and the nonlinear heteroscedastic models differ only barely. Thus, the nonlinear regression scheme at best provided a reasonable approximation of the mean response function resulting from curvature methods.
Figure 7
Trends of mean response functions calculated from the retransformed outputs of the polynomial (a) and biphasic (b) models involving the δ(ϵ) form of the correction factor. We can observe a differentiation in trends relative to that of a power function fitted directly by means of the nonlinear heteroscedastic protocol of (14). Although relative deviations also appear for the biphasic model, differences in mean response patterns are more clearly depicted for the polynomial choice.
Differences in the patterns of the biphasic and polynomial mean response functions relative to the nonlinear protocol show that adhering to the latter paradigm could impair detection of the true allometric relationship. Moreover, relying on direct nonlinear regression hinders identification of heterogeneity in the log transformed response as the covariate changes. This further stresses the limitations of this device as a tool for allometric examination ([18, 30]). In contrast, the output of the selected biphasic model shown in Figure 2 suggests a differentiation of growth patterns between smaller and larger leaves. Besides, the polynomial mean response in Figure 3 suggests a gradual transition between different growth phases. Thus, as opposed to direct nonlinear regression, considering curvature in geometrical space could elucidate an inherent leaf growth pattern. This strengthens the judgement that the log transformation step, essential to traditional allometric examination, cannot be discarded without losing relevant information ([33, 37]).
Mascaro et al. [18] conceived curvature in geometrical space as related to methodological factors of data gathering. But the present examination corroborated consistency of curvature models for processed data. This suggests that the manifestation of curvature is rather explained by intrinsic factors in leaf growth. Additionally, data processing failed to amend the heavy tails problem detected on Q-Q plots. This indicates a departure of residuals from the assumed error structure. As a result, numerical values of the addressed correction factor forms turned out to be different, which conveys ambiguity in the selection of a suitable mean response for models fitted in geometrical space. This could be the only advantage of the nonlinear regression method over log-transformation-curvature paradigms. But for this analysis method too, an inadequate postulation of the inherent error structure can lead to significant bias [18]. It seems reasonable, then, to consider that a suitable characterization of error structure could lead to robustness of the built allometric proxies, even when they derive from crude data. Exploring an error structure different from the one assumed here is a worthwhile subject of further research.
When dealing with similar data we suggest taking into account the recommendations that arise from this examination. First, it is highly advisable to perform a preliminary examination of the spread around the straight line in geometrical space resulting from the model of (1). If further statistical exploration confirms that linearity and the assumed error structure are consistent with the data, Huxley's bivariate allometric model could suit. Otherwise, an arrangement of curvature, error structure, and correction factor form such as the one proposed here could be called upon for the analysis. The use of data cleaning procedures in order to achieve a better fit is controversial [11]. Instead of performing data processing a posteriori, it is highly advisable to rely on standardized data gathering procedures. This will prevent the proliferation of inconsistent replicates that could exacerbate a heavy tails problem on Q-Q plots.
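The ambiguity among correction factor forms noted in the discussion can be illustrated by computing several candidate factors from the same log-scale residuals. The sketch below rests on stated assumptions and is not the authors' implementation. `normal_factor` is the usual exp(s²/2) form, valid under normal homoscedastic errors. `smearing_factor` is the familiar nonparametric mean of retransformed residuals. `series_factor` is one possible reading of "partial sum of the series expansion of mean retransformed residuals" from the abstract. All function names and the `order` and `n_params` arguments are hypothetical.

```python
import math


def smearing_factor(residuals):
    """Nonparametric form: sample mean of exp(residual)."""
    return sum(math.exp(e) for e in residuals) / len(residuals)


def normal_factor(residuals, n_params=2):
    """exp(s^2 / 2), appropriate when log-scale errors are normal
    with constant variance; s^2 uses n - n_params degrees of freedom."""
    n = len(residuals)
    s2 = sum(e * e for e in residuals) / (n - n_params)
    return math.exp(s2 / 2.0)


def series_factor(residuals, order):
    """Partial sum  sum_{k=0}^{order} mean(e**k) / k!  of the power
    series of mean(exp(e)) (an assumed reading of the proposed form)."""
    n = len(residuals)
    total = 0.0
    for k in range(order + 1):
        total += sum(e ** k for e in residuals) / n / math.factorial(k)
    return total
```

With heavy-tailed residuals the three values can differ noticeably, which is the kind of ambiguity the discussion refers to. As `order` grows, `series_factor` approaches `smearing_factor`.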
5. Conclusion
Failure to perform both a preliminary exploration of the spread of log transformed allometric data and a sound evaluation of model adequacy could impair detection of a possible manifestation of curvature. As a consequence, the output of a traditional analysis method could yield biased predictions of observed values. This circumstance could result in dismissal of the log transformation step in the analysis, giving way to direct nonlinear regression as the only protocol deemed to provide reliable parameter estimates [11]. Results of this examination suggest that consideration of curvature in geometrical space, as set by the model of (3), could offer dependable allometric proxies of average leaf biomass in eelgrass shoots.
From a general perspective, complexity as encompassed by the model of (3) can stand for curvature as conceived in allometric examination. In particular, biphasic or polynomial protocols in geometrical space, as well as a direct nonlinear heteroscedastic regression model, derive as particular characterizations of this paradigm. Moreover, all statistical models for accurate estimates of relative growth contemplated by Bervian et al. [68] can also be accommodated by the present generalization of Huxley's model of simple allometry. But empirical convenience on its own does not validate adoption of this paradigm as a general tool. Certainly, the Weierstrass approximation theorem [69] backs a polynomial regression model as a reasonable identification device for the generalized allometric model expressed in geometrical space. But suitability of retransformation results will depend sensitively on the correction factor form. And a mean function resulting from a polynomial fitted in geometrical space will not enable separate characterization of the functions α(a) and β(a). Furthermore, the complexity of (3) could pose significant difficulties for its identification through direct nonlinear regression methods. Needless to say, biological interpretation of the scaling functions α(a) and β(a) is also pending. The quest for efficient tools for nondestructive assessment of plant biomass units justifies addressing these questions in further research.
Table 9
Values of minimum, maximum, sample mean, and quantiles for measurements of eelgrass dry weight (w[gr]) and area (a[mm2]) (and their logarithms) of a sample of 10412 leaves before applying a data-cleaning procedure aimed to eliminate outliers.
Variable  Min      0.10     0.25     0.50     Mean     0.75     0.90      Max
w[gr]     0.00001  0.00042  0.00154  0.00564  0.01293  0.01477  0.035443  0.38058
a[mm2]    2.00     32.50    127.5    355.2    690.5    836.0    1859.00   7868.0
lnw       -11.513  -7.7753  -6.4760  -5.1779  -5.4001  -4.2152  -3.33983  -0.9661
lna       0.6931   3.48124  4.8481   5.8728   5.6729   6.7286   7.527794  8.9706
Table 10
Values of minimum, maximum, sample mean, and quantiles for measurements of eelgrass dry weight (w[gr]) and area (a[mm2]) (and their logarithms) of a sample of 10023 leaves remaining after applying a data-cleaning procedure aimed to eliminate outliers.
Variable  Min       0.10     0.25     0.50     Mean     0.75     0.90     Max
w[gr]     0.00001   0.00041  0.00144  0.00529  0.01211  0.01373  0.03381  0.15096
a[mm2]    2.00      30.00    124.00   352.50   672.83   828.00   1835.80  6240.00
lnw       -11.5129  -7.7994  -6.5431  -5.2419  -5.4593  -4.2878  -3.3868  -1.8907
lna       0.6931    3.4012   4.8203   5.8651   5.6518   6.7190   7.5152   8.7387
Table 11
Parameter estimates of straight lines fitted on raw (10412) and processed data (10023) and shown in the Q-Q plots of Figure 8.
Data            Parameter  Estimate  Standard Error
Raw data        Intercept  141.22    3.88
Raw data        Slope      42493.46  163.29
Processed data  Intercept  111.20    2.50
Processed data  Slope      46360.00  114.90
Table 12
Fitting results of the regression model of the TAMA protocol (cf. (D.1) through (D.3)). CI stands for confidence interval, RSE stands for standard error of residuals, MRS means multiple R-squared, and ARS is adjusted R-squared.
Statistics of residuals
Minimum  1Q       Median  3Q      Maximum
-4.7088  -0.2344  0.0281  0.2244  4.7169

Parameters  Estimate    Std. Error  t value  Pr(>|t|)  CI (95%)
α           1.028222    0.003335    308.27   <2e-16    (1.021684, 1.03476)
lnβ         -11.270569  0.019534    -576.94  <2e-16    (-11.308861, -11.23228)

RSE σ        0.5131 on 10021 df
MRS          0.9046
ARS          0.9046
F-statistic  9.504e+04 on 1 and 10021 df
p-value      < 2.2e-16
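As a consistency check on Table 12, the reported 95% confidence intervals can be reproduced approximately from the estimates and standard errors. The calculation below assumes a Wald-type interval with the normal quantile 1.96. With 10021 degrees of freedom this is practically the same as the t quantile, and small discrepancies in the last digits are expected from rounding of the printed standard errors.

```latex
\begin{align*}
\hat{\alpha} \pm 1.96\,\mathrm{SE}(\hat{\alpha})
  &= 1.028222 \pm 1.96 \times 0.003335
   = 1.028222 \pm 0.006537
   \;\Rightarrow\; (1.021685,\ 1.034759) \\
\ln\hat{\beta} \pm 1.96\,\mathrm{SE}(\ln\hat{\beta})
  &= -11.270569 \pm 1.96 \times 0.019534
   = -11.270569 \pm 0.038287
   \;\Rightarrow\; (-11.308856,\ -11.232282)
\end{align*}
```

Both intervals agree with the tabulated ones to about five decimal places.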
Table 13
Fitting results of the biphasic regression model of (D.4) through (D.7) to processed data. CI stands for confidence interval and RSE for standard error of residuals.
Residual statistics
Minimum  1Q       Median  3Q      Maximum
-4.4406  -0.1948  0.0240  0.2072  4.4626

Parameters  Estimate   Std. Error  t value  Pr(>|t|)  CI (95%)
lnβ1        -9.190321  0.052812    -174.0   <2e-16    (-9.2938, -9.0868)
lnβ2        -11.9381   0.093895    -127.1   <2e-16    (-12.1221, -11.7541)
ub          3.550709   0.032744    108.4    <2e-16    (3.4865, 3.6149)
α1          0.336339   0.020042    16.8     <2e-16    (0.2971, 0.3756)
α2          1.164558   0.004248    274.1    <2e-16    (1.1562, 1.1729)

RSE σ       0.452947 on 10018 df
Table 14
Fitting results of the polynomial regression model. CI stands for confidence interval, RSE stands for standard error of residuals, MRS means multiple R-squared, and ARS is adjusted R-squared.
Residual statistics
Minimum  1Q       Median  3Q      Maximum
-4.4207  -0.1947  0.0248  0.2072  4.5703

Parameters  Estimate    Std. Error  t value  Pr(>|t|)  CI (95%)
p0          -11.080     0.4136      -26.783  <2e-16    (-11.889, -10.267)
p1          4.602       0.7411      6.209    5.54e-10  (3.1489, 6.0543)
p2          -3.2120     0.5027      -6.390   1.74e-10  (-4.1976, -2.2267)
p3          1.0620      0.1671      6.355    2.17e-10  (0.7344, 1.3894)
p4          -0.1678     0.02911     -5.765   8.43e-09  (-0.22483, -0.11072)
p5          0.01287     0.002549    5.047    4.57e-07  (0.0078689, 0.017863)
p6          -3.854e-04  8.854e-05   -4.353   1.36e-05  (-0.00055897, -0.00021186)

RSE σ        0.4538 on 10016 df
MRS          0.9254
ARS          0.9254
F-statistic  2.071e+4 on 6 and 10016 df
p-value      < 2.2e-16
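To show how the estimates of Table 14 could be used, the following sketch evaluates the fitted polynomial in geometrical space and retransforms it. It assumes that the model has the form ln w = p0 + p1 (ln a) + ... + p6 (ln a)^6 and that a multiplicative correction factor `delta` is applied on retransformation. The function names and the `delta` argument are hypothetical. The coefficients are the rounded printed values, so results at large ln a are only indicative, because rounding in p5 and p6 is amplified by the high powers.

```python
import math

# Rounded estimates p0..p6 as printed in Table 14.
P = [-11.080, 4.602, -3.2120, 1.0620, -0.1678, 0.01287, -3.854e-04]


def ln_w_poly(ln_a, coeffs=P):
    """Evaluate sum_k coeffs[k] * ln_a**k with Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * ln_a + c
    return result


def leaf_weight(area_mm2, delta=1.0, coeffs=P):
    """Retransformed projection: delta * exp(polynomial in ln(area))."""
    return delta * math.exp(ln_w_poly(math.log(area_mm2), coeffs))
```

For instance, `leaf_weight(352.5)` gives the projection at the median leaf area of the processed data (Table 10) before any correction factor is applied.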
Table 15
Maximum likelihood estimates of parameters for the heteroscedastic and homoscedastic regression models ((E.1) and (E.5), respectively). CI stands for confidence interval.