
Introduction to multifractals
Multifractal theory generalises and extends the fractal sets and processes which were introduced in the previous section. While fractals are characterized primarily by a single number, the fractal dimension, multifractals are characterized primarily by a function.

Multifractals have proven useful in numerous applications across many fields of study, including seismology, network traffic modelling, astronomy, geology, fluid dynamics and meteorology. They are typically used to represent the distribution of physical quantities, such as the distribution of minerals in the earth's crust or the distribution of energy in turbulent flow. More recently multifractal processes have been applied to the field of finance.

A complete understanding of multifractal theory requires the reader to be familiar with several advanced mathematical concepts; in particular measure theory. In this discussion we will keep the measure theoretic concepts to a minimum and focus more on providing a general intuition about what multifractal processes are, and why they are relevant to finance.

It is convenient to group multifractals into three categories:
 * deterministic multifractal measures
 * random multifractal measures
 * multifractal processes

We will start by briefly focusing on the theory behind deterministic and random multifractal measures, before moving the discussion on to multifractal processes.

Multifractal measures
Fractals are defined as sets for which the Hausdorff (or fractal) dimension is greater than the standard Euclidean dimension. The key property of a set is that elements either belong to the set, or don't belong to the set.

Multifractals are conceptually quite different, in that they are not sets. Instead, imagine that we have a set, and for each element of the set we assign a number or weight. If we assign weight or mass to the set in certain ways, the distribution of the weight is said to be a multifractal measure. The set that we assign weight to does not have to be a fractal set; more often than not we will be assigning weight to a smooth set, such as the real number line between 0 and 1, [0,1]. It is the way in which the weight is assigned that produces the multifractal characteristics.

Constructing a simple multifractal measure
A simple multifractal can be constructed using a recursive procedure, similar to the way we constructed simple fractals.

We will start with a very simple set, the real number line between 0 and 1, and will apply a weight to each element of the set. Initially the weight is assigned evenly across the set. The first step involves partitioning the set into two equal halves and then modifying the weight attached to each half in such a way that the overall average weight across [0,1] remains the same. In this example we will modify the weight associated with the subregion [0,0.5] by multiplying it by 0.7, and will multiply the weight on the subregion [0.5,1] by 1.3 (this ensures that the average weight is still equal to one). This is illustrated in the first plot in figure 1.

For the second iteration we split each subsection into two equal halves again, and multiply the left sub-subsection by 0.7, and the right side by 1.3. This can be seen in the second plot in figure 1. This simple process of splitting the x-axis into smaller and smaller pieces, and multiplying the weight according to this simple rule, is repeated again and again. The 3rd, 4th and 10th iterations are shown in the remaining panels in figure 1. If this process is repeated infinitely, the distribution of the weight becomes a multifractal measure. (Note that this particular construction is known as the binomial multifractal measure.)
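The recursion above can be sketched in a few lines of NumPy (a minimal illustration; the function name and keyword defaults are our own):

```python
import numpy as np

def binomial_measure(n_iter, m_left=0.7, m_right=1.3):
    """Weights of the binomial multifractal measure on [0,1] after
    n_iter splits. Each cell is halved, the left half is multiplied
    by m_left and the right half by m_right; m_left + m_right = 2
    keeps the average weight equal to one."""
    weights = np.array([1.0])
    for _ in range(n_iter):
        # Interleave the two children so the array stays in spatial order.
        weights = np.ravel(np.column_stack([m_left * weights, m_right * weights]))
    return weights

# After two iterations the four cell weights are 0.49, 0.91, 0.91, 1.69.
print(binomial_measure(2))
```

Each iteration doubles the number of cells, so after ten iterations we have 1024 cells whose average weight is still exactly one.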

Fractals vs multifractal measures
Consider a very simple fractal, the Cantor set, which is illustrated in figure 2. The Cantor set is constructed by taking a line (say [0,1] on the real number line), splitting it into three equal sections and then removing the middle section. This process is then repeated for each subsection, until the limit is reached. In Euclidean terms, Cantor dust has a dimension of 0. However, the Hausdorff (fractal) dimension is $$\log(2)/\log(3)$$. If we were to alter the procedure used to construct the Cantor set, splitting it into unequal parts for example, we would end up with a fractal set with a different dimension.

If we now look at the binomial measure after 10 iterations, it can be seen that the measure takes only a finite number of discrete values. Figure 3 illustrates this by drawing lines across the graph which coincide with some of the values that the measure $$\mu(x)$$ takes. For each red horizontal line, the points at which the measure coincides with the line constitute a set, which is quite similar in appearance to the Cantor set. If we were to calculate the fractal dimension of each of these sets, we would find that it was different for each line. The higher and lower lines (higher and lower values of $$\mu(x)$$) have lower fractal dimensions, while the lines in the middle tend to have higher fractal dimensions.

As the number of iterations of the binomial measure increases, $$\mu(x)$$ takes on more values and therefore produces more sets of points with differing fractal dimension. In the limit, we end up with a full spectrum of sets with differing fractal dimensions, hence the term multifractal.

Random multifractal measures
We have seen how to construct a deterministic multifractal measure using a binomial iteration. To randomize the construction, we use the exact same procedure, except that at each split we flip a coin: if we get a head we apply the weight 0.7 to the left-hand subset, and if we get a tail we multiply the right-hand subset by 0.7 instead. Figure 4 shows several iterations in the construction of a randomized binomial measure.
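The coin-flip variant needs only a small change to the deterministic recursion. The sketch below is one possible implementation (the function name and seed handling are our own); whichever half loses the coin flip receives the complementary multiplier 2 - m, so the average weight is still preserved:

```python
import numpy as np

def random_binomial_measure(n_iter, m=0.7, seed=42):
    """Randomized binomial measure on [0,1]: at every split a fair coin
    decides which half receives the multiplier m; the other half
    receives 2 - m, so the average weight still equals one."""
    rng = np.random.default_rng(seed)
    weights = np.array([1.0])
    for _ in range(n_iter):
        heads = rng.random(len(weights)) < 0.5
        left = np.where(heads, m, 2.0 - m) * weights
        right = np.where(heads, 2.0 - m, m) * weights
        # Interleave children to keep the array in spatial order.
        weights = np.ravel(np.column_stack([left, right]))
    return weights
```

Because every split distributes the multipliers m and 2 - m between the two halves, the overall mean is exactly one regardless of the coin flips; only the spatial arrangement of the weight is random.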

Random multifractal measures share the same properties as their deterministic counterparts, but offer a much more realistic description of many real-world phenomena, such as the geographic distribution of minerals in the earth's crust.

Properties of multifractal measures
The most important property of a multifractal measure is that the moments of the measure $$\mu(x)$$ scale in the following way:

 * $$E(\mu[x,x+\Delta x]^q) \ \sim \ c(q)(\Delta x)^{\tau(q)+1} \quad \text{as} \quad \Delta x \to 0.$$

A multifractal measure is also singular everywhere, but not discontinuous.
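We can check this moment-scaling law numerically for the 0.7/1.3 binomial measure. The sketch below (function names are our own) coarse-grains the measure at a sequence of scales and regresses $$\log E(\mu[x,x+\Delta x]^q)$$ on $$\log \Delta x$$; for this particular measure, averaging the cell masses level by level gives the closed form $$\tau(q)+1 = q - \log_2((0.7^q + 1.3^q)/2)$$, which the fitted slope should reproduce:

```python
import numpy as np

def estimate_scaling_slope(weights, q):
    """Estimate tau(q) + 1 as the slope of log E(mu[dx]^q) against
    log(dx), by coarse-graining a measure given as fine-cell weights."""
    n = int(np.log2(len(weights)))
    measure = weights / len(weights)          # mass of each finest cell
    log_dx, log_moment = [], []
    for level in range(2, n + 1):
        # Sum fine cells into 2**level contiguous coarse cells of width dx.
        cells = measure.reshape(2**level, -1).sum(axis=1)
        log_dx.append(np.log(2.0**-level))
        log_moment.append(np.log(np.mean(cells**q)))
    slope, _ = np.polyfit(log_dx, log_moment, 1)
    return slope

# Build the 0.7/1.3 binomial measure with 12 iterations.
weights = np.array([1.0])
for _ in range(12):
    weights = np.ravel(np.column_stack([0.7 * weights, 1.3 * weights]))

est = estimate_scaling_slope(weights, q=2)
theory = 2 - np.log2((0.7**2 + 1.3**2) / 2)   # tau(2) + 1 in closed form
```

For the deterministic binomial measure the moments are exactly log-linear in the scale, so the fitted slope matches the closed-form exponent to numerical precision.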

Multifractal processes
Introduced by Calvet, Fisher and Mandelbrot (1997), multifractal processes combine the moment scaling properties of multifractal measures with the continuous time diffusion properties of stochastic processes.

Multifractal processes are constructed by subordinating a Brownian motion with a multifractal measure. More simply, a multifractal process can be thought of as a stochastic volatility model, where the volatility process is described by a multifractal measure.

Properties of multifractal processes
Multifractal processes possess several important features which differentiate them from other classes of stochastic process.
 * Flexible moment scaling behaviour
 * Finite variance
 * Semi-martingale
 * Strong long-range autocorrelation in volatility
 * High skewness and kurtosis

Mathematical definition
A multifractal process is defined as a process which has stationary increments and satisfies the moment scaling rule:


 * $$E(|X(t+\Delta t)-X(t)|^q )\ \sim c_X(q)(\Delta t)^{\tau_X (q)+1}$$

as $$\Delta t$$ converges to zero, where $$q$$ represents the moment of interest. The function $$\tau_X(q)$$ is called the scaling function, and is weakly concave.

Multifractal models
Multifractal diffusions were introduced to the literature in a series of papers by Calvet, Fisher and Mandelbrot (1997). In these papers, a general method for constructing multifractal processes was described, and a special case of the method, the Multifractal Model of Asset Returns (MMAR), was studied.

A host of new multifractal models have subsequently been developed, which draw on the strengths of the MMAR and overcome some of its shortcomings. Two notable models are the Multifractal Random Walk (MRW) and the Markov Switching Multifractal. Both of these models will be explored in more detail in future articles.

Forecasting Volatility with the MSM model
The Markov switching multifractal (MSM) is a model of asset returns which is able to capture many of the important stylized features of the data, including long-memory in volatility, volatility clustering, and return outliers. The model delivers strong performance both in- and out-of-sample, and is relatively simple to implement. As such, the MSM model can be applied to a wide range of financial time-series applications as an alternative to GARCH-type models. In this tutorial we will look at how to use the simplest Binomial-MSM model to make forecasts about asset return volatility. The tutorial makes use of the maximum likelihood estimation method to calibrate the model. Details about how to calculate the likelihood function are given in Calvet and Fisher (2004, 2008), and a MATLAB implementation, the ‘MSM MLE toolkit’, can be downloaded from multifractal-finance.com under an academic license.

Model overview
The MSM model is built around the concept that asset return volatility can be modelled using a set of $$\bar{k}$$ components which are multiplied together to give the total volatility level. Each component is a stochastic process that switches randomly between two values; the components differ only in the frequency at which they switch. In continuous time, the switching frequency of component $$k$$ is given by the formula $$\gamma_k = \gamma_1 b^{k-1}$$, where $$\gamma_1$$ is the frequency of the lowest-frequency component, and $$b$$ is a parameter which controls the spacing between the frequencies.
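The multiplicative structure can be illustrated with a short simulation. The sketch below is our own (the function name, default parameter values, and the per-step switching probabilities $$\gamma_k = \min(\gamma_1 b^{k-1}, 1)$$ as a simple discretisation of the continuous-time rule are all assumptions, not the toolkit's API):

```python
import numpy as np

def simulate_msm(T, kbar=3, m0=1.4, sigma=0.01, gamma1=0.005, b=3.0, seed=1):
    """Simulate T returns from a sketch of the Binomial-MSM model.

    Each of the kbar components takes the value m0 or 2 - m0 (so its
    mean is one); with probability gamma_k per step a component is
    redrawn fresh, landing on either value with equal probability.
    Volatility is sigma * sqrt(product of the components)."""
    rng = np.random.default_rng(seed)
    gammas = np.minimum(gamma1 * b ** np.arange(kbar), 1.0)
    M = rng.choice([m0, 2.0 - m0], size=kbar)
    returns, vols = np.empty(T), np.empty(T)
    for t in range(T):
        switch = rng.random(kbar) < gammas
        fresh = rng.choice([m0, 2.0 - m0], size=kbar)
        M = np.where(switch, fresh, M)      # only switching components change
        vols[t] = sigma * np.sqrt(M.prod())
        returns[t] = vols[t] * rng.standard_normal()
    return returns, vols
```

Because the low-frequency components change value only rarely, the simulated series displays the persistent high- and low-volatility regimes that the model is designed to capture.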

Understanding the parameters
We are free to choose the number of frequency components we want to use in the MSM model. In practice, the optimal number depends upon the length and type of data we are modelling, and this is discussed in more detail in the following section. Irrespective of the number of components, the model is specified by four parameters.

State-space
In the MSM model, the volatility level at any given moment is calculated by multiplying together all of the frequency components, each of which takes either a high or a low value. Since each component can take one of two values, there are $$2^\bar{k}$$ different possible combinations of multiplier values. We call each combination a state. Many of the different states will produce the same volatility level, due to the multiplication, but each state will have a unique probability of switching to a different state, based on the probability of each individual component switching. This is illustrated in table 1. Note that there is no correct order for listing the states, but it is convenient to order the states in the manner shown, as this has a natural binary interpretation.
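The binary interpretation of the states, and the transition matrix implied by independent component switching, can be sketched as follows (function names are our own; the rule that a redrawn component lands on either value with probability 1/2, so that P(flip) = gamma_k / 2, is an assumption consistent with the switching description above):

```python
import numpy as np

def msm_states(kbar, m0):
    """Enumerate the 2**kbar multiplier combinations in binary order:
    bit k of the state index selects the value of component k."""
    states = np.empty((2**kbar, kbar))
    for i in range(2**kbar):
        for k in range(kbar):
            states[i, k] = m0 if (i >> k) & 1 else 2.0 - m0
    return states

def msm_transition_matrix(kbar, gamma1, b):
    """One-step transition matrix as the Kronecker product of the
    per-component 2x2 chains (components switch independently)."""
    A = np.array([[1.0]])
    # Process high-frequency components first so that component k ends
    # up as bit k of the state index, matching msm_states above.
    for k in reversed(range(kbar)):
        g = min(gamma1 * b**k, 1.0)
        Ak = np.array([[1 - g/2, g/2], [g/2, 1 - g/2]])
        A = np.kron(A, Ak)
    return A
```

For $$\bar{k}=3$$ this yields 8 states but only 4 distinct volatility levels, illustrating the point above that different states can share the same volatility while having different switching probabilities.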

Table 1. All of the different possible volatility states in the MSM(3) model.

Maximum likelihood estimation
If we assume that asset return observations have been generated by an MSM process, we can use these observations to estimate the model parameters. The likelihood function for the MSM model is available in closed form, which means that we can use maximum likelihood estimation to obtain estimates of the parameters. The maximum likelihood estimator, if available, is generally the preferred method for estimating a parametric statistical model. This is mainly due to the MLE's attractive asymptotic properties: it is a consistent estimator, asymptotically normal, and efficient. Only when the likelihood function is unavailable, or is computationally inefficient to calculate, would we turn to other methods to estimate the model. The method for calculating the likelihood function is detailed in Calvet & Fisher (2004, 2008). The likelihood function must then be optimized numerically in order to find the parameter values that maximise the function.
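The heart of the closed-form likelihood is a standard discrete-state (Hamilton) filter recursion. The sketch below is a generic version of that recursion, not the toolkit's implementation: it takes the per-state return standard deviations and the transition matrix as given, and returns the log-likelihood together with the filtered state probabilities used later for forecasting:

```python
import numpy as np

def msm_loglik(returns, state_vols, A):
    """Hamilton-filter log-likelihood for a discrete-state stochastic
    volatility model. state_vols[i] is the return standard deviation
    in state i; A is the one-step (row-stochastic) transition matrix.

    Returns the log-likelihood and the filtered probabilities
    P(M_t = m^i | r_1, ..., r_t) for every t."""
    n = len(state_vols)
    p = np.full(n, 1.0 / n)                    # start from a uniform prior
    loglik = 0.0
    filtered = np.empty((len(returns), n))
    for t, r in enumerate(returns):
        # Gaussian density of the observed return in each state.
        dens = np.exp(-0.5 * (r / state_vols)**2) / (np.sqrt(2*np.pi) * state_vols)
        joint = (p @ A) * dens                 # predict one step, then update
        step_lik = joint.sum()
        loglik += np.log(step_lik)
        p = joint / step_lik                   # normalised filtered probabilities
        filtered[t] = p
    return loglik, filtered
```

In practice this function would be wrapped in a numerical optimizer that searches over the model parameters (which determine `state_vols` and `A`) to maximise the returned log-likelihood.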

Testing
The best way to test whether a likelihood function implementation is working correctly is to generate a set of simulated data using the MSM model with a particular set of parameter values. We can then perform MLE on the simulated data, and check that the estimated parameters are close to the true parameter values. (They will not be exact since we are using a finite set of data).

MLE techniques with the MSM model
Although the likelihood function for the MSM model is available in closed form, the calculations can become slow when we have a large number of frequency components. It is therefore often more efficient to start by estimating parameter values on the data using a low number of frequency components. Due to the speed of the algorithm, we can perform a rigorous global optimization over the whole parameter space, checking that different starting points converge to the same optimum. Once an estimate of the parameter values has been obtained, we can increase the number of components and use the previously estimated parameter values as the starting values for the new maximisation. These starting values are likely to be quite close to the global optimum, which reduces the time spent optimizing.

For example, imagine that we have a set of equity return data. We start by choosing a set of starting parameter values and then estimate the MSM model with 1 frequency component (denoted MSM(1)) to obtain a set of estimated parameter values $$\bar{\theta}_1$$. We repeat this exercise using several different starting values and check that we obtain the same estimates. (If we don't, we should use the set of values that produces the highest likelihood value.) We then use the estimated values from MSM(1) as starting values for MSM(2). The estimation for MSM(2) should now be quicker, as the starting parameters are likely to be closer to the global optimum.

Table 2. Maximum likelihood estimates on S&P 500 daily returns from 03/01/1985 to 31/12/2004.

Local maxima
When performing any sort of numerical optimization, there is always a risk that the maximum found will be a local maximum, and not the global maximum. The severity of this problem depends on the nature of the objective function (how non-linear the function is), and on the optimization algorithm used. Gradient-based optimizers tend to be more prone to finding local maxima than global optimizers, but they are normally much quicker as well. For many applications, gradient-based optimizers appear to work well with the MSM model. One way of checking for local maxima is to try several different sets of starting values, and check that they all converge to the same answer.

Simulation and forecasting
The MSM model is a martingale model of asset returns: it assumes that the market is efficient, and that today's price is the expected value of tomorrow's price. This means that the model cannot be used to predict future prices or returns. However, the model can be used to forecast the volatility of future returns. It is quite simple to extend the likelihood function algorithm of the MSM model to generate volatility forecasts over any horizon. The likelihood function implementation recursively calculates a filtered state vector at each time step. Each element $$i$$ of the filtered state vector is defined as $$\mathbb{P}(M_t=m^i|\phi_t)$$, where $$\phi_t = r_1,...,r_t$$; that is, the probability of being in state $$i$$ conditional on all the available information. This filtered state vector is very similar in form to the forecast vector that we need. Each element $$i$$ of the forecast vector is defined as $$\mathbb{P}(M_{t+h}=m^i|\phi_t)$$, where $$h$$ is the number of days ahead to forecast. Using the Markov property of the latent states, we can combine the transition matrix with the filtered state vector to generate the forecast vector:
 * $$\Pi_{t+h|t} = \mathbf{A}^h \, \Pi_t,$$

where A is the one day ahead transition matrix. Now we have the forecasted probability of being in each state conditional upon information up to time t. We can convert this probability distribution into a forecasted volatility by multiplying each forecast element by the state volatility value, and then summing, i.e. taking the expected value of $$V_{t+h|t}$$.
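The forecasting step amounts to a matrix power and two dot products. A minimal sketch (our own function; here the filtered probabilities are treated as a row vector and the transition matrix as row-stochastic, which is equivalent to the column-vector convention in the equation above):

```python
import numpy as np

def forecast_volatility(pi_t, A, state_vols, h):
    """h-step-ahead volatility forecast: push the filtered state
    probabilities pi_t through the transition matrix h times, then
    take the expected state volatility."""
    pi_h = pi_t @ np.linalg.matrix_power(A, h)   # P(M_{t+h} = m^i | phi_t)
    return float(pi_h @ state_vols)

# Toy two-state example: currently in the low-volatility state.
A = np.array([[0.95, 0.05], [0.20, 0.80]])
vols = np.array([0.01, 0.04])
pi_t = np.array([1.0, 0.0])
one_day = forecast_volatility(pi_t, A, vols, 1)   # close to 0.0115
```

As the horizon $$h$$ grows, the forecast vector converges to the stationary distribution of the chain, so long-horizon volatility forecasts revert to the model's unconditional expected volatility.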

Example
We start with our set of 5000 returns (so T=5000), and perform MLE to generate parameter estimates for the data. We then use these parameter estimates and run the likelihood function again, this time to generate the filtered state vector for each day of the data. We then take the filtered state vector for date T, and use equation 1 to generate our forecasted state vector at date T+h. Finally, we convert this vector into a volatility forecast by taking the dot product of the forecast vector with the vector of volatility values.