A Guide to Trading Systems

Lesson -> Pair Stats - PTM1, C1

4.1 -Correlation and its different types

This is something I must mention at this point. Mark Whistler wrote a book called "Trading Pairs", which discusses the pair trading strategy we are currently discussing. This book is a favorite of mine because it introduced me to pair trading. As my interest grew, I began to explore other strategies beyond Mark Whistler's. These techniques will be discussed later in the module, it is obvious. My intention is to show you the exact path that I took when learning pair trading.

We introduced the concept and methods of analyzing correlation values towards the end of the last chapter. Now we will continue the discussion and learn how to calculate correlation between two stocks using excel. You may already know that the key to pair trading is the calculation of correlation between two stocks.

Axis Bank (and ICICI Bank) were both considered for this example. They are both Private sector banks with similar business backgrounds. Therefore, intuition dictates that they should be closely related.

Here is the final price for Axis Bank & ICICI Bank, as of December 2015 (roughly 2 years) and December 2017 (about 4 thDec 2017). This is approximately 496 data points.

Before we proceed, a quick note on data -

  1. You should ensure that you have the same number data points. If you have 400 data points for Stock A, you must ensure that you have the same number for Stock B.
  2. Make sure that the data is clean for corporate actions like bonus/splits, etc. (IMAGE 1)
  3. You can see in the above image that I have downloaded data for BPCL and HPCL as well as HDFC Bank. This data can be used to test and build other correlations.

    However, we only have the date and closing price for the stock at this point. Let's now calculate the daily returns. You are probably familiar with the daily returns calculation. We have already discussed it in previous modules.

    You can calculate the daily return as

    = [today's close price / previous days closing price] - 1.

    This is how I calculated ICICI and Axis Bank -
    (IMAGE 2

    Two parameters are required to calculate correlation.

    1. The daily closing price
    2. The daily return series

    Calculating the daily closing price correlation requires that you use the closing prices for two stocks to determine the correlation. Although I don't like the idea of calculating correlation based on closing prices, let's do it for now.

    You can do this in Excel by simply using the '=Correl ()', function to get the daily closing prices. This calculation is being done on a new sheet. It's called 'Pair Data'.

    Here's a snapshot
    (IMAGE 3).

    Correlation between closing prices for ICICI Bank (and Axis Bank) is 0.51. This is not a very strong correlation, but it's something we can accept for now. Remember, we felt that the two banks might be closely related because they had similar business backgrounds.Whereas the numbers depicts different picture.

    Now we will run the correlation of the daily % returns series for both stocks. I have already calculated the % return. I will just need to run the correl function.
    (IMAGe 4).

    Although this is not a great number, it's still a good start.

    Some traders use the correlation method to calculate the absolute daily change. This can be calculated Throgh " Stock Price (Today)-  Stock Price(Yesterday)". This is something I don't like. Let me tell you, though.

    (IMAGE 5)

    All the calculations above have been run to determine the correlation between Axis Bank and ICICI Bank. The results will be the same if I had chosen to calculate the correlation between ICIC Bank and Axis. The correlation between A, B and C is generally the same as the Correlation between A and B.

    The correlation number is sacred in this trading method. The number should be at least 0.75. This is clearly not true for ICICI or Axis. However, as I said earlier, it's possible to live with it.

    4.2 -A Database Settup

    We discussed in the previous chapter three variables that are related to pairs: the spread, differential and ratios. Let's now calculate these variables for the stocks we are currently studying. This will be done on a separate sheet in the same workbook. We'll call it the "Data Sheet". This is the snapshot.
    (IMAGE 6).

    These variables are easy to calculate, as I have explained in the previous chapter.

    Different types of Pair Trading work at different levels of complexity. This section will cover basic statistics for pair trading. We will now discuss the 3 most frequently used statistics variables.

    4.3 -Core Stats

    At this point, I will discuss three basic statistical terms. These terms are fundamental and crucial for pair trading. These terms are essential in pair trading. I am fairly certain you would have learned them in high school math.

    This is a list of random runs that batsmen score in 10 cricket matches.

    Mean Also known as the arithmetic mean , this is the average of a number. The average can be calculated by multiplying the sum of all observations by the total number.

    If I wanted to find the average score in the example above, I would add all scores together and divide it by 10, which is the total number of observations.

    Average (Mean) = 626/10


    Excel allows you to simply use the "=Average" function to calculate the average value of any number.

    Median The median number is the middle number in a data series that has been arranged numerically. To calculate the mean, we must take the average of two numbers (which is what this example shows). If there is an odd number data points, we can simply use the median data point.

    Let me reorder the data points according to their numerical order.

    23, 34, 44. 51, 55. 65. 72. 82. 100. 100

    Because there are many observation numbers, I will take the middle numbers, 55 and 65. Their average is the median.

    Median = (55 +65)/2


    To calculate the median, use excel's function '=Median ()'.

    When viewed together, the median and mean give an indication of the trend. We'll get to this more later.

    Mode – The mode of a data sequence is the data point that occurs the most often in the series. It is obvious that 100 repeats twice with no other numbers appearing more than once. This makes it the mode for the data series.

    Excel's function to calculate Mode is "=Mode ()'."

    These functions will be used in Excel and explained in the next chapter.


    Keep watching.

     Key points 

    1. It is important to take care to ensure that data are accurate and adapted for corporate actions.
    2. Close correlation can be defined as the correlation calculated from the closing stock prices
    3. The correlation between the stock's daily returns and the % return is called the % correlation.
    4. The arithmetic mean of the data series is called the Mean
    5. The middle observation is the median in a data set.
    6. If there are even numbers of observations in a data set, the median is the average between the middle two observations
    7. If there are odd numbers of observations in a data series, the median is the middle observation
    8. Mode refers to the data series that repeats the most often.
    9. When viewed together, the median and mean provide great insight into data trends.