User:Neg99/sandbox

Singular spectrum analysis (SSA) combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing. The origin of SSA lies in Principal component analysis, the classical Karhunen–Loeve theorem, which provides spectral decomposition of time series and random fields, and in the embedding theorem.

The areas where SSA can be applied are very broad: climatology, marine science, geophysics, engineering, image processing, medicine, econometrics among them. Hence different modifications of SSA have been proposed and different methodologies of SSA are used in practical applications. The two main directions are SSA for general purpose (Golyandina et all, 2011) such as trend extraction, periodicity detection, seasonal adjustment, smoothing, noise reduction and SSA for spectral analysis of stationary time series (Vautard and Ghil, 1989).

SSA as a model-free tool (Basic SSA)
SSA can be used as a model-free technique so that it can be applied to arbitrary time series including non-stationary time series. The basic aim of SSA is to decompose the time series into the sum of interpretable components such as trend, periodic components and noise with no a-priori assumptions about the parametric form of these components.

Consider a real-valued time series $$\mathbb{X}=(x_1,\ldots,x_{N})$$ of length $$N$$. Let $$L$$ $$\ (1<L < N)$$ be some integer called the window length and $$K=N-L+1$$.

Main algorithm of SSA
1st step: Embedding.

Form the trajectory matrix of the series $$\mathbb{X}$$, which is the $$L\!\times\! K$$ matrix



\mathbf{X}=[X_1:\ldots:X_K]=(x_{ij})_{i,j=1}^{L,K}= \begin{bmatrix} x_1&x_2&x_3&\ldots&x_{K}\\ x_2&x_3&x_4&\ldots&x_{K+1}\\ x_3&x_4&x_5&\ldots&x_{K+2}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_{L}&x_{L+1}&x_{L+2}&\ldots&x_{N}\\ \end{bmatrix} $$

where $$ X_i=(x_{i},\ldots,x_{i+L-1})^\mathrm{T} \; \quad (1\leq i\leq K) $$ are lagged vectors of size $$L$$. The matrix $$\mathbf{X}$$ is Hankel which means that $$\mathbf{X}$$ has equal elements $$x_{ij}$$ on the anti-diagonals $$i+j =\,{\rm const}$$.

2nd step: Singular value decomposition (SVD).

Perform the singular value decomposition (SVD) of the trajectory matrix $$\mathbf{X}$$. Set $$\mathbf{S}=\mathbf{X} \mathbf{X}^\mathrm{T}$$ and denote by $$\lambda_1, \ldots,\lambda_L$$ the eigenvalues of $$\mathbf{S}$$ taken in the decreasing order of magnitude ($$\lambda_1\geq \ldots \geq \lambda_L\geq 0$$) and by $$U_1,\ldots,U_L$$ the orthonormal system of the eigenvectors of the matrix $$\mathbf{S}$$ corresponding to these eigenvalues.

Set $$d= \mathop{\mathrm{rank}} \mathbf{X} =\max\{i, \mbox{such that} \lambda_i >0\}$$ (note that $$d=L$$ for a typical real-life series) and $$V_i=\mathbf{X}^\mathrm{T} U_i/\sqrt{\lambda_i}$$ $$(i=1,\ldots,d)$$. In this notation, the SVD of the trajectory matrix $$\mathbf{X}$$ can be written as



\mathbf{X} = \mathbf{X}_1 + \ldots + \mathbf{X}_d, $$

where the matrices $$\mathbf{X}_i=\sqrt{\lambda_i}U_i V_i^\mathrm{T}$$ have rank 1 and are called elementary matrices. The collection $$(\sqrt{\lambda_i},U_i,V_i)$$ will be called $$i$$th eigentriple (abbreviated as ET) of the SVD. Vectors $$U_i$$ are the left singular vectors of the matrix $$\mathbf{X}$$, numbers $$\sqrt{\lambda_i}$$ are called the singular values and provide the singular spectrum of $$\mathbf{X}$$; this gives the name to SSA. Vectors $$\sqrt{\lambda_i}V_i=\mathbf{X}^\mathrm{T} U_i$$ are called vectors of principal components (PCs).

3rd step: Eigentriple grouping.

Partition the set of indices $$\{1,\ldots,d\}$$ into $$m$$ disjoint subsets $$I_1,\ldots,I_m$$.

Let $$I=\{i_1,\ldots,i_p\}$$. Then the resultant matrix $$\mathbf{X}_I$$ corresponding to the group $$I$$ is defined as $$\mathbf{X}_I=\mathbf{X}_{i_1}+\ldots+\mathbf{X}_{i_p}$$. The resultant matrices are computed for the groups $$I=I_1, \ldots, I_m$$ and the grouped SVD expansion of $$\mathbf{X}$$ can now be written as



\mathbf{X}=\mathbf{X}_{I_1}+\ldots+\mathbf{X}_{I_m}. $$

4th step: Diagonal averaging.

Each matrix $$\mathbf{X}_{I_j}$$ of the grouped decomposition is hankelized and then the obtained Hankel matrix is transformed into a new series of length $$N$$ using the one-to-one correspondence between Hankel matrices and time series. Diagonal averaging applied to a resultant matrix $$\mathbf{X}_{I_k}$$ produces a reconstructed series $$\widetilde{\mathbb{X}}^{(k)}=(\widetilde{x}^{(k)}_1,\ldots,\widetilde{x}^{(k)}_N)$$. In this way, the initial series $$x_1,\ldots,x_N$$ is decomposed into a sum of $$m$$ reconstructed subseries:



x_n = \sum\limits_{k=1}^m \widetilde{x}^{(k)}_n \ \ (n=1,2, \ldots, N). $$

This decomposition is the main result of the SSA algorithm. The decomposition is meaningful if each reconstructed subseries could be classified as a part of either trend or some periodic component or noise.

Theory of SSA
The two main questions which the theory of SSA attempts to answer are: (a) what time series components can be separated by SSA, and (b) how to choose the window length $$L$$ and make proper grouping for extraction of a desirable component. Many theoretical results can be found in Golyandina et al (2001, Ch. 1 and 6).

Trend (which is defined as a slowly varying component of the time series), periodic components and noise are asymptotically separable as $$N\rightarrow \infty$$. In practice $$N$$ is fixed and one is interested in approximate separability between time series components. A number of indicators of approximate separability can be used, see Golyandina et al (2001, Ch. 1). The window length $$L$$ determines the resolution of the method: larger values of $$L$$ provide more refined decomposition into elementary components and therefore better separability. The window length $$L$$ determines the longest periodicity captured by SSA. Trends can be extracted by grouping of eigentriples with slowly varying eigenvectors. A sinusoid with frequency smaller than 0.5 produces two approximately equal eigenvalues and two sine-wave eigenvectors with the same frequencies and $$\pi/2$$-shifted phases.

Separation of two time series components can be considered as extraction of one component in the presence of perturbation by the other component. SSA perturbation theory is developed in Nekrutkin (2010).

SSA for stationary time series
For stationary time series, SSA can be considered as a nonparametric spectral estimation method. It has some specifics including different terminology and methodology of application. In particular, centering is conventionally used as preprocessing before application of SSA to stationary time series.

The main difference in the algorithm is related to the use of the lag-covariance matrix $$\mathbf{C}$$ of $$\mathbb{X}$$ instead of $$\mathbf{S}$$ to obtain spectral information on the time series, assumed to be stationary in the weak sense. The matrix $${\mathbf{C}}$$ can be estimated directly from the data as a Toeplitz matrix with constant diagonals (Vautard and Ghil, 1989):



c_{ij} = \frac{1}{N-|i-j|} \sum_{t=1}^{N-|i-j|} x_t x_{t+|i-j|}. $$

The $$L$$ eigenvectors $$U_k$$ of the lag-covariance matrix $$\mathbf{C}$$ are called temporal empirical orthogonal functions (EOFs).

The use of the matrix $$\mathbf{S}$$ in Basic SSA was introduced in Broomhead and King (1986a,1986b) and therefore Basic SSA is sometimes called `BK-SSA’, The use of matrix $$\mathbf{C}$$ was introduced in Vautard and Ghil (1987) what gives the name `VG-SSA’ (another name for this version of SSA is ‘Toeplitz SSA’ (Golyandina et al, 2001, Sect. 1.7)).

Signal-to-noise separation can be obtained by merely inspecting the slope break in a &quot;scree diagram&quot; of eigenvalues $$\lambda_k$$ or singular values $$\lambda^{1/2}_k$$ vs. $$k$$. The point $$k^*$$ at which this break occurs should not be confused with a ``dimension&quot; of the underlying deterministic dynamics (Vautard and Ghil, 1989).

A Monte-Carlo test (Allen and Robertson, 1996; Allen and Smith, 1996) can be applied to ascertain the statistical significance of the oscillatory pairs detected by SSA (usually `VG’ version of SSA), in presence of red noise.

Forecasting by SSA
If for some series $$\mathbb{X}$$ the SVD step in Basic SSA gives $$d<L$$, then this series is called time series of rank $$d$$ (Golyandina et al, 2001, Ch.5). The subspace spanned by the $$d$$ leading eigenvectors is called signal subspace. This subspace is used for estimating the signal parameters in signal processing, e.g. ESPRIT for high-resolution frequency estimation. Also, this subspace determines the linear homogeneous recurrence relation (LRR) governing the series, which can be used for forecasting. Continuation of the series by the LRR is similar to forward linear prediction in signal processing.

Let the series be governed by the minimal LRR $$x_{n}=\sum_{k=1}^d b_k x_{n-k}$$. Let us choose $$L>d$$, $$U_1,\ldots,U_d$$ be the eigenvectors (left singular vectors of the $$L$$-trajectory matrix), which are provided by the SVD step of SSA. Then this series is governed by an LRR $$x_{n}=\sum_{k=1}^{L-1} a_k x_{n-k}$$, where $$(a_{L-1},\ldots,a_1)^\mathrm{T}$$ are expressed through $$U_1,\ldots,U_d$$ (Golyandina et al, 2001, Ch.5), and can be continued by the same LRR.

This provides the basis for SSA recurrent and vector forecasting algorithms (Golyandina et al, 2001, Ch.2). In practice, the signal is corrupted by a perturbation, e.g., by noise, and its subspace is estimated by SSA approximately. Thus, SSA forecasting can be applied for forecasting of a time series component that is approximately governed by an LRR and is approximately separated from the residual.

Multivariate extension
Multi-channel SSA (or M-SSA) is a natural extension of SSA to for analyzing multivariate time series, where the size of different univariate series does not have to be the same. The trajectory matrix of multi-channel time series consists of stacked trajectory matrices of separate times series. The rest of the algorithm is the same as in the univariate case. System of series can be forecasted analogously to SSA recurrent and vector algorithms (Golyandina and Stepanov, 2005).

In the meteorological literature, extended EOF (EEOF) analysis is often assumed to be synonymous with M-SSA. The two methods are both extensions of classical principal component analysis (PCA) but they differ in emphasis: EEOF analysis typically utilizes a number $$S$$ of spatial channels much greater than the number $$L$$ of temporal lags, thus limiting the temporal and spectral information. In M-SSA, on the other hand, one usually chooses $$L \geq S$$. Often M-SSA is applied to a few leading PCs of the spatial data, with $$L$$ chosen large enough to extract detailed temporal and spectral information from the multivariate time series (Ghil et al., 2002).

Other multivariate extension is 2D-SSA that can be applied to two-dimensional data like digital images (Golyandina and Usevich, 2010). The analogue of trajectory matrix is constructed by moving 2D windows of size $$L_x \times L_y$$.

Spatio-temporal gap filling
The gap-filling versions of SSA can be used to analyze data sets that are unevenly sampled or contain missing data (Schoellhamer, 2001; Kondrashov and Ghil, 2006; Golyandina and Osipov, 2007).

Schoellhamer (2001) shows that the straightforward idea to formally calculate approximate inner products omitting unknown terms is workable for long stationary time series.

In the algorithm described in Kondrashov and Ghil (2006) the estimates of missing data points are produced iteratively, and are then used to compute a self-consistent lag-covariance matrix $${\textbf C}_X$$ and its EOFs $${\textbf U}_k$$; moreover, cross-validation is used to optimize the window length $$L$$ and the number of leading SSA modes to fill the gaps with the iteratively estimated &quot;signal&quot;, while the noise is discarded. For a stationary univariate time series, the SSA gap filling procedure utilizes temporal correlations to fill in the missing points. For a stationary multivariate data set, gap filling by M-SSA takes advantage of both spatial and temporal correlations.

Golyandina and Osipov (2007) uses the idea of filling in missing entries in vectors taken from the given subspace. The recurrent and vector SSA forecasting can be considered as particular cases of filling in algorithms described in the paper.

Detection of structural changes
SSA can be effectively used as a non-parametric method of time series monitoring and change detection. To do that, SSA performs the subspace tracking in the following way. SSA is applied sequentially to the initial parts of the series, constructs the corresponding signal subspaces and checks the distances between these subspaces and the lagged vectors formed from the few most recent observations. If these distances become too large, a structural change is suspected to have occurred in the series (Golyandina et al, 2001, Ch.3; Moskvina and Zhigljavsky, 2003).

In this way, SSA could be used for change detection not only in trends but also in the variability of the series, in the mechanism that determines dependence between different series and even in the noise structure. The method have proved to be useful in different engineering problems (e.g. Mohammad and Nishida (2011) in robotics).

Relation between SSA and other methods
SSA and Autoregression. Typical model for SSA is $$x_n=s_n+e_n$$, where $$s_n=\sum_{k=1}^r a_k s_{n-k}$$ (signal satisfying an LRR) and $$e_n$$ is noise. The model of AR is $$x_n=\sum_{k=1}^r a_k x_{n-k}+ e_n$$. Despite these two models look similar they are very different. SSA considers AR as a noise component only. AR(1), which is red noise, is typical model of noise for Monte-Carlo SSA (Allen and Smith,1996 ).

SSA and spectral Fourier analysis. In contrast with Fourier analysis with fixed basis of sine and cosine functions, SSA use an adaptive basis generated by the time series itself. As a result, the underlying model in SSA is more general and SSA can extract amplitude-modulated sine wave components with frequencies different from $$k/N$$. SSA-related methods like ESPRIT can estimate frequencies with higher resolution than spectral Fourier analysis.

SSA and linear recurrence relations. Let the signal be modeled by a series, which satisfies a linear recurrence relation $$s_{n}=\sum_{k=1}^r a_k s_{n-k}$$; that is, a series that can be represented as sums of products of exponential, polynomial and sine wave functions. This includes the sum of dumped sinusoids model whose complex-valued form is $$s_n=\sum_k C_k \rho_k^n e^{i 2\pi \omega_k n}$$. SSA-related methods allow estimation of frequencies $$\omega_k$$ and exponential factors $$\rho_k$$ (Golyandina and Zhigljavsky, 2013, Sect 2.8). Coefficients $$C_k$$ can be estimated by the least squares method. Extension of the model, where $$C_k$$ are replaced by polynomials of $$n$$, can be also considered within the SSA-related methods (Badeau et al, 2008).

SSA and signal subspace methods. SSA can be considered as a subspace-based method, since it allows estimation of the signal subspace of dimension $$r$$ by $$\mathop{\mathrm{span}}(U_1,\ldots,U_r)$$.

SSA and State space models. The main model behind SSA is $$x_n=s_n+e_n$$, where $$s_n=\sum_{k=1}^r a_k s_{n-k}$$ and $$e_n$$ is noise. Formally, this model belongs to the general class of state space models. The specifics of SSA is in the facts that parameter estimation is a problem of secondary importance in SSA and the data analysis procedures in SSA are nonlinear as they are based on the SVD of either trajectory or lag-covariance matrix.

SSA and Independent Component Analysis (ICA). SSA is used in blind source separation by ICA as a preprocessing step (Pietilä et al, 2006). On the other hand, ICA can be used as a replacement of the SVD step in the SSA algorithm for achieving better separability (Golyandina and Zhigljavsky, 2013, Sect. 1.5.4).

SSA and regression. SSA is able to extract polynomial and exponential trends. However, unlike regression, SSA does not assume any parametric model which may give significant advantage when an exploratory data analysis is performed with no obvious model in hand (Golyandina et al, 2001, Ch.1).

SSA and linear filters. The reconstruction of the series by SSA can be considered as adaptive linear filtration. If the window length $$L$$ is small, then each eigenvector $$U_i=(u_1, \ldots, u_L)^\mathrm{T}$$ generates a linear filter of width $$2L-1$$ for reconstruction of the middle of the series $$\widetilde{x}_s$$, $$L\le s\le K$$. The filtration is non-causal. However, the so-called Last-point SSA can be used as a causal filter (Golyandina and Zhigljavsky 2013, Sect. 2.9).

SSA and density estimation. Since SSA can be used as a method of data smoothing it can be used as a method of non-parametric density estimation (Golyandina et al, 2012).

Brief history
The first publication, which can be considered as one of the origins of SSA (and more generally of the subspace-based methods of signal processing), can be traced back to the eighteenth century (Prony's method).

Broomhead and King (1986a, b) and Fraedrich (1986) proposed to use SSA and M-SSA in the context of nonlinear dynamics for the purpose of reconstructing the attractor of a system from measured time series. These authors provided an extension and a more robust application of the idea of reconstructing dynamics from a single time series based on the embedding theorem.

Ghil, Vautard and their colleagues (Vautard and Ghil, 1989; Ghil and Vautard, 1991; Vautard et al., 1992) noticed the analogy between the trajectory matrix of Broomhead and King, on the one hand, and the Karhunen–Loeve decomposition (Principal component analysis in the time domain), on the other. Thus, SSA can be used as a time-and-frequency domain method for time series analysis — independently from attractor reconstruction and including cases in which the latter may fail.

Note also the so-called ‘Caterpillar’ methodology, see Danilov and Zhigljavsky (Eds) (1997) and Golyandina et al (2001). This methodology is a version of SSA that was developed in the former Soviet Union independently of the main-stream SSA. The main difference between the main-stream SSA and the ‘Caterpillar-SSA’ is in the emphasis of the study of SSA properties. The main concept behind theoretical studies and methodological principles of ‘Caterpillar-SSA’ is the concept of separability. Careful consideration of separability requirements the ‘Caterpillar-SSA’ leads, for example, to specific recommendations concerning the choice of SSA parameters.

At present, there are several dozen papers dealing with methodological aspects of SSA and many more with applications of SSA. A very basic introduction to SSA can be found in Elsner and Tsonis (1996). More advanced texts are the monograph Golyandina et al. (2001), the survey Ghil et al. (2002), a recent special issue of ‘Statistics and Its Interface’ (Zhigljavsky, 2010, Ed.) and the recent book Golyandina and Zhigljavsky (2013).