History index model

In statistical analysis, the standard framework of varying coefficient models (also known as concurrent regression models) models the current value of a response process as depending on the current value of a predictor process. This framework is inadequate when both past and present values of the predictor process are assumed to influence the current response. In contrast, the history index model includes the effect of recent past values of the predictor through the history index function. Specifically, the influence of past predictor values is modeled by a smooth history index function, while the effects on the response are described by smooth varying coefficient functions.

Definition
In functional data analysis, functional data are considered as realizations of a stochastic process $$X(t), t\in \mathcal{I}$$ that is an $$L^{2}$$ process on a bounded and closed interval $$\mathcal{I}$$.

Suppose the current functional response process $$Y(t)$$ at time $$t$$ depends on the recent history of the predictor process $$X$$ in a sliding window of length $$\Delta$$.

Then the history index model is defined as

$$ \mathrm{E}\{Y(t)|X(t)\}=\beta_{0}+\beta_{1}(t)\int_{0}^{\Delta}\gamma(u)X(t-u)du, $$             (1)

for $$t \in [\Delta,T]$$ with a suitable $$T>0$$. Here $$\gamma(\cdot)$$ is the history index function, which defines the history index factor at $$\beta_{1}(\cdot)$$ by quantifying the influence of the recent history of the predictor values on the response. In most cases, $$\gamma(\cdot)$$ is assumed to be smooth. For identifiability, $$\gamma(\cdot)$$ is normalized by requiring that $$\int_{0}^{\Delta} \gamma^{2}(u) du = 1$$ and that $$\gamma(0)>0$$. The sign constraint is no real restriction, since $$\{-\beta_{1}(t)\}\{-\gamma(u)\}=\beta_{1}(t)\gamma(u)$$.
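The integral in (1) and the normalization of $$\gamma$$ can be illustrated numerically. The following is a minimal sketch, assuming that both the predictor and $$\gamma$$ are sampled on an equally spaced grid that starts at time 0 and that the trapezoidal rule is an acceptable approximation; the function names `trapezoid`, `normalize_gamma` and `history_index` are hypothetical and chosen only for this illustration.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for samples on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def normalize_gamma(gamma, step):
    """Rescale gamma so that int_0^Delta gamma^2 = 1 and gamma(0) > 0.

    Flipping the sign of gamma is harmless because beta_1 can absorb
    the same sign flip.
    """
    norm = math.sqrt(trapezoid([g * g for g in gamma], step))
    sign = -1.0 if gamma[0] < 0 else 1.0
    return [sign * g / norm for g in gamma]


def history_index(x, gamma, step, k):
    """Approximate int_0^Delta gamma(u) X(t_k - u) du.

    Assumption: x[k - j] holds X(t_k - j * step) and gamma[j] holds
    gamma(j * step), so that Delta = (len(gamma) - 1) * step.
    """
    m = len(gamma) - 1
    if k < m:
        raise ValueError("t_k must be at least Delta")
    return trapezoid([gamma[j] * x[k - j] for j in range(m + 1)], step)
```

For example, with a constant predictor and a constant $$\gamma$$ on a window of length 1, the normalized $$\gamma$$ equals 1 everywhere and the history index equals the constant value of the predictor.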

Estimation of the history index function
At each fixed time point $$ t $$, the model in (1) reduces to a functional linear model between the scalar response $$Y(t)$$ and the functional predictor $$X(s), t-\Delta \leq s \leq t.$$ Let $$X^{C}(s)=X(s)-\mathrm{E}\{X(s)\}$$ denote the centered functional covariate and $$Y^{C}(s)=Y(s)-\mathrm{E}\{Y(s)\}$$ the centered response process. Writing the model as

$$ \mathrm{E}\{Y^{C}(t)|X^{C}(t)\}=\beta_{1}(t)\int_{0}^{\Delta}\gamma(s)X^{C}(t-s)ds=\int_{0}^{\Delta}\alpha_{t}(s)X^{C}(t-s)ds, $$             (2)

with regression parameter functions $$ \alpha_{t}(s)=\beta_{1}(t)\gamma(s), $$ the functions $$ \alpha_{t}(s) $$ contain the common factor $$ \gamma(s) $$ for every $$ t $$. To satisfy the constraint $$ \int_{0}^{\Delta}\gamma^{2}(u)du=1 $$ and to stabilize the resulting estimators, $$ \gamma $$ can be expressed in terms of the $$ \alpha_{t_{r}} $$ over an equidistant grid of time points $$ (t_{1},\ldots,t_{R}) $$ in $$ [\Delta,T] $$ as

$$ \gamma(s)=\frac{\sum_{r=1}^{R}\alpha_{t_{r}}(s)}{[\int_{0}^{\Delta}\{\sum_{r=1}^{R}\alpha_{t_{r}}(s)\}^{2}ds]^{1/2}}. $$             (3)

Once the history index function has been recovered, model (1) reduces to a varying coefficient model.
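The pooling step in (3) can be sketched as follows. This is an illustration only: it assumes that estimates of $$\alpha_{t_{r}}(s)$$ are already available on an equally spaced grid in $$s$$ (how they are obtained from the functional linear models is not shown), that the integral is approximated by the trapezoidal rule, and that the pooled sum is not identically zero. The names `trapezoid` and `gamma_from_alpha` are hypothetical.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for samples on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def gamma_from_alpha(alpha, step):
    """Pool the functions alpha_{t_r} and normalize, as in equation (3).

    alpha[r][j] is assumed to hold alpha_{t_r}(j * step) for the r-th
    grid time t_r and the j-th lag in [0, Delta].
    """
    # Numerator of (3): sum over the time grid at each lag s_j.
    summed = [sum(column) for column in zip(*alpha)]
    # Denominator of (3): L2 norm of the pooled function over [0, Delta].
    norm = math.sqrt(trapezoid([v * v for v in summed], step))
    return [v / norm for v in summed]
```

If every row of `alpha` is an exact multiple $$\beta_{1}(t_{r})\gamma(s)$$ of a normalized $$\gamma$$ and the multipliers have a positive sum, the function returns $$\gamma$$ itself; with noisy rows, the pooling averages out part of the estimation error.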

Estimation of the varying coefficient function
Once the estimate of $$ \gamma(s) $$ has been obtained, the remaining unknown component in model (2) is the varying coefficient function $$ \beta_{1} $$. Define $$ \tilde{X}(t)=\int_{0}^{\Delta}\gamma(s)X^{C}(t-s)ds. $$ From (2),

$$ \mathrm{cov}\{X(t),Y(t)\}=\mathrm{cov}[\mathrm{E}\{X^{C}(t)|X\},\mathrm{E}\{Y^{C}(t)|X\}]+\mathrm{E}[\mathrm{cov}(X^{C}(t),Y^{C}(t)|X)]=\beta_{1}(t)\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds, $$

$$ \mathrm{cov}\{X(t),\tilde{X}(t)\}=\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds, $$

and therefore $$\beta_{1}(t)=\mathrm{cov}\{X(t),Y(t)\}/\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds.$$
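The ratio above suggests a simple plug-in estimate of $$\beta_{1}(t)$$ from a sample of curves. The sketch below is a hedged illustration rather than the estimator used in the literature: it assumes densely and regularly observed, noise-free curves, replaces population covariances by sample covariances, and uses the identity $$\mathrm{cov}\{X(t),\tilde{X}(t)\}=\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds$$ for the denominator. The names `sample_cov` and `beta1_at` are hypothetical.

```python
def trapezoid(values, step):
    """Trapezoidal rule for samples on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def beta1_at(xs, ys, gamma, step, k):
    """Plug-in estimate of beta_1(t_k) = cov{X, Y} / cov{X, X_tilde}.

    xs[i][k] and ys[i][k] are assumed to hold the i-th predictor and
    response curves at grid time t_k; gamma[j] holds gamma(j * step).
    """
    m = len(gamma) - 1
    x_now = [x[k] for x in xs]
    y_now = [y[k] for y in ys]
    # History index per subject. Centering is skipped because shifting
    # by a mean function does not change a covariance.
    x_tilde = [
        trapezoid([gamma[j] * x[k - j] for j in range(m + 1)], step)
        for x in xs
    ]
    return sample_cov(x_now, y_now) / sample_cov(x_now, x_tilde)
```

The estimate is unstable when the denominator is close to zero, so in practice some smoothing of the resulting values over $$t$$ would likely be needed.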

Application of the history index model
Applications of varying coefficient models that account for both past and present information have received increasing attention in recent years. For example, Sentürk et al. propose a time-varying lagged regression model to assess the association between predictors, such as cognitive and functional impairment scores, and the frequency of clinic visits of older adults. Zemplenyi et al. suggest a function-on-function regression model that leverages data from nearby DNA methylation probes to identify epigenetic regions that exhibit windows of susceptibility to ambient particulate matter smaller than 2.5 microns (PM2.5). In line with this trend, the history index model has also been applied in a variety of settings.

Delay differential equation
The modeling of time-dynamic systems is of interest in multiple scientific fields. A delay differential equation (DDE) is a natural extension of various types of differential equations, such as ordinary, random and stochastic differential equations, to settings in which the observed processes have an aftereffect.

For dynamic learning of random differential equations with a delay (RDED), Dubey et al. use functional linear regression with a history index to learn the distributed delay; the regression parameter function then corresponds to a history index function for the process of interest.

Let $$ (X(\cdot),\mathbf {U}(\cdot)) $$ denote a multivariate stochastic process, where $$ X(\cdot) $$ is a continuously differentiable process of interest, $$ \mathbf {U}(\cdot)=(U_{1}(\cdot),\ldots,U_{J}(\cdot))^{T} $$ is a vector function of additional covariates, and $$ [t_{0},T] $$ is a time window of interest. The model is defined as

$$ \frac{dX(t)}{dt}=\alpha(t)+\int_{0}^{\tau_{0}}\gamma(s,t)X(t-s)ds + \int_{0}^{\tau_{1}} \gamma_{1}(s,t)U(t-s)ds+Z(t), t\in [t_{0},T], $$

$$ X(t)=g(t), t\in[t_{0}-\tau_{0},t_{0}],$$

where $$g$$ is an initial condition process, $$\tau_{0}$$ and $$\tau_{1}$$ are delays, $$\alpha(t)$$ is a smooth function, $$\gamma(s,t)$$ and $$\gamma_{1}(s,t)$$ are history index functions, and $$Z(\cdot)$$ is a random drift process that is independent of $$ (X(\cdot),\mathbf {U}(\cdot)) $$. For the purpose of illustration and technical derivations, $$ U(\cdot) $$ is assumed to be a univariate process; the corresponding multivariate generalization is straightforward. The RDED described above has been used to predict the growth rate of COVID-19 cases in the United States.
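To make the role of the distributed delay concrete, the following sketch integrates a simplified, deterministic version of the equation with a forward Euler scheme. It is not the estimation procedure of Dubey et al.: the covariate term and the random drift $$Z$$ are omitted, the delay $$\tau_{0}$$ and the time window are assumed to be multiples of the step size, and the delay integral is approximated by the trapezoidal rule. The names `trapezoid` and `euler_distributed_delay` are hypothetical.

```python
def trapezoid(values, step):
    """Trapezoidal rule for samples on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def euler_distributed_delay(alpha, gamma, g, t0, t_end, tau, step):
    """Forward Euler for dX/dt = alpha(t) + int_0^tau gamma(s, t) X(t - s) ds.

    alpha(t), gamma(s, t) and the initial condition g(t) are callables.
    Simplification: no covariate term and no random drift.
    """
    m = round(tau / step)            # grid points spanning one delay window
    n = round((t_end - t0) / step)   # number of Euler steps
    times = [t0 + (i - m) * step for i in range(m + n + 1)]
    # Initial condition on [t0 - tau, t0].
    x = [g(t) for t in times[: m + 1]]
    for i in range(m, m + n):
        t = times[i]
        # Distributed delay: x[i - j] approximates X(t - j * step).
        integrand = [gamma(j * step, t) * x[i - j] for j in range(m + 1)]
        drift = alpha(t) + trapezoid(integrand, step)
        x.append(x[i] + step * drift)
    # Return only the part of the trajectory on [t0, t_end].
    return times[m:], x[m:]
```

With $$\gamma\equiv 0$$ and $$\alpha\equiv 1$$ the scheme reproduces the linear solution $$X(t)=g(t_{0})+(t-t_{0})$$, which gives a quick sanity check before trying a non-trivial history index function.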