Kalman Filtering for the Heston model with Matlab code, Part 2

By on March 23rd, 2015

In this post I will go into more detail on the application of the Kalman filter to the Heston model specifically. If you missed the first part of this post series and you are not familiar with the Kalman filter, you might benefit from reading this blog post.

Now, in a stochastic volatility setting the Kalman filter is used to estimate the parameters of the model in question. The idea of course is that the volatility of the process is latent and only observable through the stock price process, which contains extra noise.

Consider the Heston model:


Explicitly this means we are trying to find and which are unobservable (generally you should be able to estimate separately, if not then only have to include that as well in the optimization routine as well).

First let us define

and using the Cholesky decomposition

where such that:

With some abuse of notation (think of the changes as small, discrete incremental time-steps, not as the short-hand notation for the stochastic integral):

which is inserted into the variance-equation:

And that’s really it. These functions can now be used directly in the algorithm developed in the first part of this blog series.

Below I provide example code written in Matlab to perform exactly this algorithm (press the + to see the code):

%% Unscented Kalman Filter for the Heston Model
% Takes one vector of inputs:
% parameters = [kappa, theta, eta, rho];
% and the stock drift
% plus a vector of log-stock prices, logS

%mu = stock drift
%kappa = reversion speed
%theta = long run variance
%eta = volatility of sqrt(var) in varianace equation
%rho = correlation

kappa = parameters(1);
theta = parameters(2);
eta = parameters(3);
rho = parameters(4);

n = length(logS);   %Length of stock-price vector

L = 3;              %Dimension

%Default values
alpha = 0.001;      %Default
k = 3-L;            %Deafult
beta = 2;           %Optimal for Gaussian dist
eps = 0.00001;
lambda = alpha^2*(L+k)-L;

X = ones(1,7)*9999;
Xa = ones(3,7)*9999;
Y = ones(1,7)*9999;
estimates = ones(1,n)*9999999;
u(1) = 0;
v(1) = 1;
vol = 0;

estimates(1) = logS(1) + eps;
estimates(2) = logS(1) + eps;

%%Initial parameter guesses
x = 0.1;
xa = [x, zeros(1,2)];
Pa = diag(ones(1,3));
Pa(1,1) = 0.1;

Wm = [lambda/(L+lambda), 0.5/(L+lambda)+zeros(1,2*L)];      %Mean weights

Wc(1)= Wc(1) + (1-alpha^2+beta);                            %Covariance weights

for t = 2:n-1 

    vol(t-1) = sqrt(x); 

    Xa(:,1) = xa';

    for i = 1:L
        for j = 1:L
            if i == j
                if Pa(i,j) < 0.000001
                    Pa(i,j) = 0.000001;
                if Pa(i,j) < 0.000001
                    Pa(i,j) = 0;

    [~,eigPa] = chol(Pa);
    while eigPa ~= 0    %Checking for positive definiteness
        Pa = Pa + 0.001*diag(ones(1,length(Pa)));
        [~,eigPa] = chol(Pa);

    Pa_chol = chol(Pa);


    for l = 2:L+1
        for i = 1:L
            Xa(i,l) = xa(i) + (sqrt(L+lambda)*Pa_chol(i,l-1));

    for l = 2+L:2*L+1
        for i = 1:L
            Xa(i,l) = xa(i) - (sqrt(L+lambda)*Pa_chol(i,l-L-1));

    % Cholesky
    for l = 1:2*L+1
        if Xa(1,l) < 0
            Xa(1,l) = 0.0001; %Ensuring positivity

        X(l) = Xa(1,l) + (kappa*theta - (mu*rho*eta) - (kappa - 0.5*rho*eta)*Xa(1,l))*dt...
            + rho*eta*(logS(t) - logS(t-1))...
            + eta*sqrt((1-(rho^2))*dt*Xa(1,l))*Xa(2,l);

    x1 = 0;
    for l = 1:2*L+1
        x1 = x1 + Wm(l)*X(l);

    P1 = 0;
    for l = 1:2*L+1
        P1 = P1 + Wc(l)*((X(l)-x1)^2);

    yhat = 0;
    for l = 1:2*L+1
        if X(l) < 0
            X(l) = 0.00001;
        Y(l) = logS(t) + (mu - 0.5*X(l))*dt + sqrt(X(l)*dt)*Xa(3,l);   

        yhat = yhat + Wm(l)*Y(l);


    %Transform cross-covariance
    Pyy = 0;
    for l = 1:2*L+1
        Pyy = Pyy + Wc(l)*((Y(l)-yhat)^2);

    %Transform covariance
    Pxy = 0;
    for l = 1:2*L+1
        Pxy = Pxy + Wc(l)*(X(l)-x1)*(Y(l)-yhat);

    K = Pxy/Pyy;                %Kalman Gain

    u(t) = logS(t+1) - yhat;    %u(count)   %Mean observation errors
    v(t) = Pyy;                 %v(count)   %Variance of observation errors
    estimates(t+1) = yhat;      %estimates(count)

    x = x1 + K*(logS(t+1)-yhat);    %State update
    P = P1 - ((K^2)*Pyy);           %Covariance update

    xa(1) = x;
    Pa(1,1) = P;

    if x < 0

    Pa(2,1) = 0;
    Pa(1,2) = 0;
    Pa(2,3) = 0;
    Pa(3,2) = 0;
    Pa(1,3) = 0;
    Pa(3,1) = 0;


plot((1:1:length(vol)),vol); drawnow;
legend('Filtered vol')

Note that this is a Matlab adaptation of the code found in Javaheri (2006).

The output from this code is the following figure, plotting the estimate of the volatility over time.


This is the strategy suggested by Javaheri, Lautier & Galli (2003) and it generally works ok. However, many practitioners argue that it is better to take this even a step further and instead of using the Unscented Kalman Filter one should use the so-called Particle Filter. I have not had any extensive experience with this, but instead I would suggest a simple way of increasing the accuracy of the filter as it stands.

The technique above uses a direct Euler-discretisation but a better approach is to use the Milstein scheme, which I personally have had much greater success with.

Note that for the log-stock price the Milstein scheme does not improve on the Euler scheme and the functions are identical for both techniques. Lastly, we can insert for the error term as done above, and we get a code that is a little more stable than the outline given earlier.

With respect to the parameter estimation we can now minimize the log-likelihood of our estimates, after a burn-in period (see for example Javaheri (2006) for more info on this). The burn-in period is simply a number of estimates that are discarded to ensure that our initial guess of the state is not contaminating the filtering and equally the log-likelihood estimation. This can be problematic in finance, since the model-parameters may be time-dependent and we therefore need a quite short time-series in order to ensure the data is representative of the current price process. We might instead require data sampled at higher frequencies, but this carries with it a range of other problems that I will discuss more in a future post.

What if the data does not fit well? Generally I would say that in the case your data does not fit well with the model outlined above, you have two choices:

  1. Increase the number of data points and the burn-in of the filter.
  2. Consider another model, usually including jumps.

There are a whole range of other discretisation schemes that might be used, but I have not really found any of them performing noticeably better (at least for my applications). Therefore, usually you will either have too short a time-series to produce a reliable parameter estimation (of course assuming your optimizer is working properly) or the stock price follows another process than the Heston model. The most likely situation is that the stock price frequently makes discrete jumps, in which case other filtering techniques will have to be implemented to estimate the jump intensity.


Relevant sources:

Haykin, S. S. (2001). Kalman filtering and neural networks. Wiley.

Javaheri, A. (2006). Inside Volatility Arbitrage: The Secrets of Skewness. Wiley.

Javaheri, A., Lautier, D., & Galli, A. (2003). Filtering in Finance. Wilmott Magazine, 2003(3), 67-83.

Petris, G., Petrone, S., & Campagnoli, P. (2009). Dynamic linear models with R. Springer.

Leave a Reply

Your email address will not be published. Required fields are marked *