Hypertabastic survival models

Hypertabastic survival models were introduced in 2007 by Mohammad Tabatabai, Zoran Bursac, David Williams, and Karan Singh. This distribution can be used to analyze time-to-event data in biomedical and public health areas and normally called survival analysis. In engineering, the time-to-event analysis is referred to as reliability theory and in business and economics it is called duration analysis. Other fields may use different names for the same analysis. These survival models are applicable in many fields such as biomedical, behavioral science, social science, statistics, medicine, bioinformatics, medical informatics, data science especially in machine learning, computational biology, business economics, engineering, and commercial entities. They not only look at the time to event, but whether or not the event occurred. These time-to-event models can be applied in a variety of applications for instance, time after diagnosis of cancer until death, comparison of individualized treatment with standard care in cancer research, time until an individual defaults on loans, relapsed time for drug and smoking cessation, time until property sold after being put on the market, time until an individual upgrades to a new phone, time until job relocation, time until bones receive microscopic fractures when undergoing different stress levels, time from marriage until divorce, time until infection due to catheter, and time from bridge completion until first repair.

Hypertabastic cumulative distribution function
The Hypertabastic cumulative distribution function or simply the hypertabastic distribution function $$F(t)$$ is defined as the probability that random variable $$T$$ will take a value less than or equal to $$t$$. The hypertabastic distribution function is defined as


 * $$F(t) =

\begin{cases} 1 - \operatorname{sech}(\frac{\alpha(1-t^{\beta} \operatorname{coth}(t^{\beta}))}{\beta}) & t > 0 \\ 0 & t\leq 0 \end{cases} $$, where $$\operatorname{sech}$$ represents the hyperbolic secant function and $$\alpha$$, $$\beta$$ are parameters. The parameters $$\alpha$$ and $$\beta$$ are both positive with $$\operatorname{sech}$$ and $$\operatorname{coth}$$ as hyperbolic secant and hyperbolic cotangent respectively. The Hypertabastic probability density function is


 * $$f(t) =

\begin{cases} \operatorname{sech}(W(t))(\alpha t^{2 \beta - 1}\operatorname{csch}^{2}(t^{\beta})-\alpha t^{\beta-1}\operatorname{coth}(t^{\beta}))\operatorname{tanh}(W(t))& t > 0 \\ 0 & t < 0 \end{cases} $$,

where $$\operatorname{csch}$$ and $$\operatorname{tanh}$$ are hyperbolic cosecant and hyperbolic tangent respectively and


 * $$W(t)=\frac{\alpha(1-t^{\beta} \operatorname{coth}(t^{\beta}))}{\beta} $$

Hypertabastic survival function
The Hypertabastic survival function is defined as


 * $$S(t)=\operatorname{sech}[\frac{\alpha(1-t^\beta \operatorname{coth}(t^\beta))}{\beta}]$$,

where $$S(t)$$ is the probability that waiting time exceeds $$t$$.

For $$t>0$$, the Restricted Expected (mean) Survival Time of the random variable $$T$$ is denoted by $$REST(t)$$, and is defined as


 * $$REST(t)=\int_{0}^{t}{S(u)} du$$.

Hypertabastic hazard function


For the continuous random variable $$T$$ representing time to event, the Hypertabastic hazard function $$h(t)$$, which represents the instantaneous failure rate at time $$t$$ given survival up to time $$t$$, is defined as


 * $$h(t) = \lim_{\Delta(t) \to 0^{+}} \frac{P(t \leq T < t + \Delta(t) | T \geq t)}{\Delta(t)}= \alpha (t^{2 \beta-1} \operatorname{csch}^{2} (t^{\beta}) -t^{\beta-1}\operatorname{coth} (t^{\beta})) \operatorname{tanh} (W(t))$$.

The Hypertabastic hazard function has the flexibility to model varieties of hazard shapes. These different hazard shapes could apply to different mechanisms for which the hazard functions may not agree with conventional models. The following is a list of possible shapes for the Hypertabastic hazard function: For $$0 < \beta \leq 0.25$$, the Hypertabastic hazard function is monotonically decreasing indicating higher likelihood of failure at early times. For $$0.25 < \beta < 1$$, the Hypertabastic hazard curve first increases with time until it reaches its maximum failure rate and thereafter the failure decreases with time (unimodal). For $$\beta = 1$$, the Hypertabastic hazard function initially increases with time, then it reaches its horizontal asymptote $$\alpha$$. For $$1 < \beta < 2$$, the Hypertabastic hazard function first increases with time with an upward concavity until it reaches its inflection point and subsequently continues to increase with a downward concavity. For $$\beta = 2$$, the Hypertabastic hazard function initially increases with an upward concavity until it reaches its point of inflection, thereafter becoming a linear asymptote with slope $$\alpha$$. For $$\beta > 2$$, the Hypertabastic hazard function increases with an upward concavity. The Hypertabastic cumulative hazard function is


 * $$H(t)= \int_{0}^{t}h(v)dv = -ln(S(t))$$

Hypertabastic proportional hazards model
The hazard function $$h(t|\mathbf{x},\mathbf{\theta})$$ of the Hypertabastic proportional hazards model has the form


 * $$h(t|\mathbf{x},\mathbf{\theta})=h(t)g(\mathbf{\theta}|\mathbf{x})$$,

where $$\mathbf{x}$$ is a p-dimensional vector of explanatory variables and $$\theta$$ is a vector of unknown parameters. The combined effect of explanatory variables $$g(\mathbf{\theta}|\mathbf{x})= e^{-\theta_0 - \sum_{k=1}^{p}{\theta_kx_k}}$$ is a non-negative function of $$x$$ with $$g(\mathbf{\theta}|\mathbf{0})=e^{-\theta_0}$$. The Hypertabastic survival function $$S(t|\mathbf{x},\mathbf{\theta})$$ for the proportional hazards model is defined as:


 * $$S(t|\mathbf{x},\mathbf{\theta})=[S(t)]^{g(\mathbf{\theta}|\mathbf{x})}$$

and the Hypertabastic probability density function for the proportional hazard model is given by


 * $$f(t|\mathbf{x},\mathbf{\theta})=f(t)[S(t)]^{g(\mathbf{\theta}|\mathbf{x})-1}g(\mathbf{\theta}|\mathbf{x})$$.

Depending on the type of censoring, the maximum likelihood function technique along with an appropriate log-likelihood function may be used to estimate the model parameters. If the sample consists of right censored data and the model to use is Hypertabastic proportional hazards model, then, the proportional hazards log-likelihood function is


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}{(ln[{\operatorname{sech}(W(t_i))}]g(\mathbf{\theta}|\mathbf{x}_i)+\delta_i ln{[(\alpha {t_i}^{-1+2 \beta}\operatorname{csch}^{2}{({t_i}^{\beta})}-\alpha {t_i}^{-1+\beta}\operatorname{coth}({t_i}^{\beta}))\operatorname{tanh}(W(t_i))g(\mathbf{\theta}|\mathbf{x}_i)]})}$$.

Hypertabastic accelerated failure time model
When the covariates act multiplicatively on the time-scale, the model is called accelerated failure time model. The Hypertabastic survival function for the accelerated failure time model is given by


 * $$S(t|\mathbf{x},\mathbf{\theta})=S(tg(\mathbf{\theta}|\mathbf{x}))$$.

The Hypertabastic accelerated failure time model has a hazard function $$h(t|\mathbf{x},\mathbf{\theta})$$ of the form


 * $$h(t|\mathbf{x},\mathbf{\theta})=h(tg(\mathbf{\theta}|\mathbf{x}))g(\mathbf{\theta}|\mathbf{x})$$.

The Hypertabastic probability density function for the accelerated failure time model is


 * $$f(t|\mathbf{x},\mathbf{\theta})= f(tg(\mathbf{\theta}|\mathbf{x}))g(\mathbf{\theta}|\mathbf{x})$$.

For the right censored data, the log-likelihood function for the Hypertabastic accelerated failure time model is given by


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}{(ln{[\operatorname{sech}(\frac{\alpha{(Z(t_i))}^{\beta}\operatorname{coth}({Z(t_i)}^{\beta})}{\beta})]}+\delta_i ln{[(\alpha {(Z(t_i))}^{-1+2 \beta}\operatorname{csch}^{2}{[{Z(t_i)}^{\beta}]}-\alpha {Z(t_i)}^{\beta}\operatorname{tanh}(\frac{\alpha (1-{(Z(t_i))}^{\beta}\operatorname{coth}{(Z(t_i))}^\beta)}{\beta}))]}g(\mathbf{\theta}|\mathbf{x}_i))} $$,

where $$Z(t_i) = t_i g(\mathbf{\theta}|\mathbf{x}_i)$$.

A modified chi-squared type test, known as Nikulin-Rao-Robson statistic is used to test the goodness-of-fit for Hypertabastic accelerated failure time models and its comparison with unimodal hazard rate functions. Simulation studies have shown that the Hypertabastic distribution can be used as an alternative to log-logistic and log-normal distribution because of its flexible shape of hazard functions. The Hypertabastic distribution is a competitor for statistical modeling when compared with Birnbaum-Saunders and inverse Gaussian distributions

Likelihood functions for survival analysis
Consider a sample of survival times of n individuals $$t_1,t_2,\ldots,t_n$$ with associated p-dimensional covariate vectors $$\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$$ and an unknown parameter vector $$\mathbf{\theta}=(\theta_0,\theta_1,\ldots,\theta_p)$$. Let $$f(t_i|\mathbf{x}_i,\mathbf{\theta}), F(t_i|\mathbf{x}_i,\theta), S(t_i|\mathbf{x}_i,)$$ and $$h(t_i|\mathbf{x}_i,\mathbf{\theta})$$ stand for the corresponding probability density function, cumulative distribution function, survival function and hazard function respectively. In the absence of censoring (censoring normally occurs when the failure time of some individuals cannot be observed), the likelihood function is


 * $$L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}f(t_i|\mathbf{x}_i,\mathbf{\theta})$$

and the log-likelihood $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})$$ is


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}ln{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]}$$

For the right censored data, the likelihood function is


 * $$L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]^{\delta_i}[S(t_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\delta_i}}$$

or equivalently,


 * $$L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}{[h(t_i|\mathbf{x}_i,\mathbf{\theta})]^{\delta_i}S(t_i|\mathbf{x}_i,\mathbf{\theta})}$$,

and the log-likelihood function is


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}(\delta_i ln{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]}+(1-\delta_i) ln{[S(t_i|\mathbf{x}_i,\mathbf{\theta})]})$$

or equivalently,


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}(\delta_i ln{[h(t_i|\mathbf{x}_i,\mathbf{\theta}]}+ln{[S(t_i|\mathbf{x}_i,\mathbf{\theta})]})$$

where


 * $$\delta_i =

\begin{cases} 0 & t_i \text{is a right censored observation} \\ 1 & otherwise \end{cases} $$,

In the presence of left censored data, the likelihood function is


 * $$L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}[f(t_i|\mathbf{x}_i,\mathbf{\theta})]^{\gamma_i}[F(t_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\gamma_i}$$

and the corresponding log-likelihood function is


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}(\gamma_i ln{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]}+(1-\gamma_i)ln{[F(t_i|\mathbf{x}_i,\mathbf{\theta})]})$$

where


 * $$\gamma_i =

\begin{cases} 0 & t_i \text{is a left censored observation} \\ 1 & otherwise \end{cases} $$,

In the presence of interval censored data, the likelihood function is


 * $$L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}([f(t_i|\mathbf{x}_i,\mathbf{\theta})]^{\xi_i}[F(v_i|\mathbf{x}_i,\mathbf{\theta})-F(u_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\xi_i})$$

and the log-likelihood function is


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}(\xi_i ln{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]}+(1-\xi_i) ln{[F(v_i|\mathbf{x}_i,\mathbf{\theta})-F(u_i|\mathbf{x}_i,\mathbf{\theta})]})$$

where $$u_i \le t_i \le v_i$$ for all interval censored observations and


 * $$\xi_i =

\begin{cases} 0 & t_i \text{is an interval censored observation} \\ 1 & otherwise \end{cases} $$,

If the intended sample consists of all types of censored data (right censored, left censored and interval censored), then its likelihood function takes the following form


 * $$ L(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\prod_{i=1}^{n}([S(t_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\delta_i}[F(t_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\gamma_i}[F(v_i|\mathbf{x}_i,\mathbf{\theta})-F(u_i|\mathbf{x}_i,\mathbf{\theta})]^{1-\xi_i}[f(t_i|\mathbf{x}_i,\mathbf{\theta})]^{\delta_i+\gamma_i+\xi_i-2})$$

and its corresponding log-likelihood function is given by


 * $$LL(\mathbf{\theta},\alpha,\beta:{\mathbf{x}})=\sum_{i=1}^{n}{(1-\delta_i) ln{[S(t_i|\mathbf{x}_i,\mathbf{\theta})]}+(1-\gamma_i)ln{[F(t_i|\mathbf{x}_i,\mathbf{\theta})]}(1-\xi_i)ln{[F(v_i|\mathbf{x}_i,\mathbf{\theta})-F(u_i|\mathbf{x}_i,\mathbf{\theta})]}+ (\delta_i+\gamma_i+\xi_i-2)ln{[f(t_i|\mathbf{x}_i,\mathbf{\theta})]}}$$

Cutaneous or mucosal melanoma
The Hypertabastic Accelerated Failure Time model was used to analyze a total of 27,532 patients regarding the impact of histology on the survival of patients with cutaneous or mucosal melanoma. Understanding patients’ histological subtypes and their failure rate assessment would enable clinicians and healthcare providers to perform individualized treatment, resulting in a lower risk of complication and higher survivability of patients.

Oil field quantities
The quantities of 49 locations of the same area of an oil field was examined to identify its underlying distribution. Using generalized chi-squared, the distribution of oil field quantities was represented by the Hyperbolastic distribution and compared with the lognormal (LN), log-logistic (LL), Birnbaum-Saunders (BS) and inverse Gaussian (IG) distributions.

Remission duration for acute leukemia
The times of remission from clinical trial for acute leukemia of children study were used to analyze the remission duration of acute leukemia data for two groups of patients controlling for log of white blood cell counts. The Hypertabastic accelerated failure time model was used to analyze the remission duration of acute leukemia patient.

Brain tumor study of malignant glioma patients
A randomized clinical trial comparing two chemotherapy regimens for 447 individuals with malignant glioma. A total of 293 patients died within a five-year time period and the median survival time was about 11 months. The overall model fit, in comparison with other parametric distributions, was performed using the generalized chi-square test statistics and proportional hazards model.

Analysis of breast cancer patients
The Hypertabastic proportional hazard model was used to analyze numerous breast cancer data including the survival of breast cancer patients by exploring the role of a metastasis variable in combination with clinical and gene expression variables.

Analysis of hypertensive patients
One hundred five Nigerian patients who were diagnosed with hypertension from January 2013 to July 2018 were included in this study, where death was the event of interest. Six parametric models such as; exponential, Weibull, lognormal, log-logistic, Gompertz and Hypertabastic distributions were fitted to the data using goodness of fit tests such as S.E., AIC, and BIC to determine the best fit model. The parametric models were considered because they are all lifetime distributions. S.E., AIC, and BIC measures were used to compare these parametric models.

Analysis of cortical bone fracture
Stress fractures in older individuals are very important due to the growing number of elderly. Fatigue tests on 23 female bone samples from three individuals were analyzed. Hypertabastic survival and hazard functions of the normalized stress level and age were developed using previously published bone fatigue stress data. The event of interest was the number of cycles until the bone gets microscopic fracture. Furthermore, Hypertabastic proportional hazard models were used to investigate tensile fatigue and cycle-to-fatigue for cortical bone data.

Analysis of unemployment
Hypertabastic survival models have been used in the analysis of unemployment data and its comparison with the cox regression model.

Analysis of kidney carcinoma patients
Using National Cancer Institute data from 1975 to 2016, the impact of histological subtypes on the survival probability of 134,150 kidney carcinoma patients were examined. The study variables were a race/ethnicity, age, sex, tumor grade, type of surgery, geographical location of patient and stage of disease. The Hypertabastic proportional hazards model was used to analyze the survival time of patients diagnosed with kidney carcinoma to explore the effect of histological subtypes on their survival probability and assess the relationship between the histological subtypes, tumor stage, tumor grade, and type of surgery.

Kidney carcinoma SAS example code
Sample code in SAS:

Applications of hypertabastic survival models in bridge engineering
Although survival analysis tools and techniques have been widely used in medical and biomedical applications over the last few decades, their applications to engineering problems have been more sporadic and limited. The probabilistic assessment of service life of a wide variety of engineering systems, from small mechanical components to large bridge structures, can substantially benefit from the well-established survival analysis techniques. Modeling of time-to-event phenomena in engineering applications can be performed under the influence of numerical and categorical covariates using observational or test data. The "survival" of an engineering component or system is synonymous with the more commonly used term "reliability". The term "hazard rate" or "conditional failure rate" (defined as probability of survival per unit time assuming survival up to that time) is an important measure of the change in the rate of failure over time. In this context, failure is defined as reaching the target event in the time-to-event process. This could be defined as reaching a particular serviceability condition state, localized/partial structural failure, or global/catastrophic failure applied the Hypertabastic parametric accelerated failure time survival model to develop probabilistic models of bridge deck service life for Wisconsin. Bridge decks are typically concrete slabs on which traffic rides as seen in the Marquette Interchange bridge. The authors used the National Bridge Inventory (NBI) dataset to obtain the needed data for their study. NBI records include discrete numerical ratings for bridge decks (and other bridge components) as well as other basic information such as Average Daily Traffic (ADT) and deck surface area (obtained by multiplying the provided bridge length with bridge deck width). The numerical ratings range from 0 to 9 with 9 corresponding to brand new condition and 0 being complete failure. A deck condition rating of 5 was selected as the effective end of service life of bridge deck. The numerical covariates used were the ADT and deck surface area, while the categorical covariate was the superstructure material (structural steel or concrete).

The hypertabastic Proportional Hazards and Accelerated Failure Time models are useful techniques in analyzing bridge-related structures due to its flexibility of hazard curves, which can be monotonically increasing or decreasing with upward or downward concavity. It can also take the shape of a single mound curve. This flexibility in modeling various hazard shapes makes the model suitable for a wide variety of engineering problems.

Tabatabai et al. extended the Hypertabastic bridge deck models developed for Wisconsin bridges to bridges in six northern US states and then to all 50 US states. The study of bridge decks in all 50 states indicated important differences in reliability of bridge decks in different states and regions. Stevens et al. discuss the importance of survival analyses in identifying key bridge performance indicators and discuss the use of Hypertabastic survival models for bridges. and Nabizadeh et al. further extended the use of Hypertabastic survival models to bridge superstructures. The covariates used were ADT, maximum bridge span length and superstructure type. The survival function can be used to determine the expected life using the following equation (area under the entire survival curve)


 * $${EL}_0=\int_{0}^{\infty}S(t)dt$$

It is important to note that both the survival function and the expected life would change as the time passes by. The conditional survival function $$C_S$$ is a function of time $$t$$ and survival time $$t_s$$ and is defined as
 * $$CS(t,t_s) =

\begin{cases} 1 & 0 \le t \le t_s \\ \frac{S(t)}{S(t_s)} & t > t_s \end{cases} $$,

Nabizadeh et al. used the Hypertabastic survival functions developed for Wisconsin to analyze conditional survival functions and conditional expected service lives $$(EL_c(t_s))$$


 * $${EL}_c(t_s)=\int_{0}^{\infty}CS(t)dt=t_s+\int_{t_s}^{\infty}{CS(t)dt=t_s+\int_{t_s}^{\infty}\frac{S(t)}{S(t_s)}}dt$$

The conditional expected life would continue to increase as the survival time $$t_s$$ increases. Nabizadeh et al. term this additional expected life as "survival dividend.”

An important mode of failure in bridge engineering is metal fatigue, which can result from repetitive applications of stress cycles to various details and connections in the structure. As the number of cycles $$(n_c)$$ increase, the probability of fatigue failure increases. An important factor in fatigue life $$(N_c)$$ is the stress range (Sr)(maximum minus minimum stress in a cycle). The probabilistic engineering fatigue problem can be treated as a "time"-to-event survival analysis problem if the number of cycles $$(n_c)$$ is treated as a fictitious time variable $$(t)$$

This would facilitate the application of well-established survival analysis techniques to engineering fatigue problems and Tabatabai et al. The survival function $$S(n_c)$$, probability density function $$f(n_c)$$, hazard rate $$h(n_c)$$, and cumulative probability of failure $$F(n_c)$$ can then be defined as


 * $$S(n_c)=P(N_c>n_c)=1-F(n_c)$$
 * $$f(n_c)=\lim_{\delta n_c \to 0} \frac{P(n_cn_c)}{\delta n_c}$$

The hypertabastic accelerated failure time model was used to analyze probabilistic fatigue life for various detailed categories in steel bridges.