This chapter is going to be a bit more complicated, even though we will only be scratching the surface of higher-order statistical theory. I will do my best to keep it simple, avoid unnecessary detail, and explain things from a trading perspective. However, I have to admit that some theory is necessary.
With a long road ahead, I believe it is important to review our lessons so far and put them in order. Let me summarize the journey thus far.
I trust you have read and understood everything we have discussed to this point. If something is unclear, I suggest that you go back and reread the earlier chapters before you proceed.
Remember, we discussed the residuals in the previous chapter. In fact, I mentioned that the main focus of this chapter would be the residuals. It's time to study them more closely and figure out how they behave. Along the way, we will learn two new terms: stationarity and cointegration.
If two time series (stock X and stock Y) are "cointegrated", it means that they move together. Any deviation from this joint movement is either temporary or attributable to a stray event. One can expect the two series to return to their normal orbit, i.e. converge, and then move together again. This is exactly what we want when pair trading, which means that the pair we trade should be cointegrated.
The question is: How do we determine if these stocks are cointegrated or not?
To check if two stocks are cointegrated, first run a linear regression on them, then take the residuals from the regression and check whether they are stationary.
If the residuals are stationary, the two stocks are cointegrated. If the two stocks are cointegrated, they move together, and the pair is ripe for tracking pair trading opportunities.
Here is another way to look at it: one can take any two time series and run a regression on them. The regression algorithm will always produce an output. How can one tell whether that output is reliable? This is where stationarity comes in. The regression equation is valid only if the residuals are stationary. If the residuals aren't stationary, the regression relationship should not be used.
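As a rough illustration of the first half of this check, here is a minimal Python sketch that regresses one series on another and extracts the residuals. The prices are simulated, and the function names (fit_line, residuals) are made up for this example, so treat them as assumptions. Checking the residuals for stationarity is a separate step.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return intercept, beta

def residuals(x, y):
    """Actual y minus the y predicted by the fitted line."""
    intercept, beta = fit_line(x, y)
    return [yi - (intercept + beta * xi) for xi, yi in zip(x, y)]

# Simulated prices (an assumption for illustration): stock X is a random walk,
# stock Y follows 1.5 * X plus noise that does not build up over time.
random.seed(42)
stock_x = [100.0]
for _ in range(499):
    stock_x.append(stock_x[-1] + random.gauss(0, 1))
stock_y = [20 + 1.5 * px + random.gauss(0, 2) for px in stock_x]

intercept, beta = fit_line(stock_x, stock_y)
resid = residuals(stock_x, stock_y)
print(f"intercept = {intercept:.2f}, beta = {beta:.2f}")
print(f"mean of residuals = {sum(resid) / len(resid):.6f}")
```

The regression always returns an intercept and a beta, whatever two series are fed in. Whether that relationship can be trusted depends on how the residual series behaves.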
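Generally speaking, setting up trades on a cointegrated pair of time series is far more meaningful than speculating on market direction, since the trade relies on the relationship between the two series rather than on where the market is headed.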
This boils down to the question of whether residuals are stationary.
This is where I can show you how to determine if residuals are stationary. There is an easy test called the "ADF test" to verify this. It is really all you need. But, it is worth taking a few minutes to learn what "Stationarity" really means without actually diving into the numbers.
If you're interested in learning more, then read the next section. Otherwise, go to the section that discusses ADF testing.
A time series is considered "stationary" if it satisfies three simple statistical conditions. If it satisfies these conditions only partially, say 1 or 2 out of the 3, the stationarity is considered weak. If none of the conditions are met, the time series is considered "non-stationary".
These are the three basic statistical conditions: the mean of the series should be the same or within a tight range; the standard deviation of the series should be within a range; and there should be no autocorrelation within the series.
Pair trading is limited to pairs that exhibit complete stationarity. We will not accept non-stationary or weakly stationary series.
It is best to use an example, such as a time series sample, to understand what these conditions mean.
This example uses two time series with 9000 data points each, named Series A and Series B. Using this data, we will evaluate the stationarity conditions.
Condition 1: The mean of the series should be the same or within a tight range
To evaluate this, I will divide each time series into three parts and calculate the mean of each part. The means of all three parts should be approximately the same. If they are, I can conclude that the mean is likely to stay the same even as new data comes in.
Let's start by splitting the Series A data into three parts and computing their respective means. It looks like this:
(Image 1)
As I said, I have 9000 datapoints in Series A and B. As you can see, the beginning and end cells have been highlighted.
The means of all three parts are similar, which clearly fulfills the first condition.
Here's how Series B looks.
As you can see, Series B's mean swings quite wildly from one part to the next, so it does not satisfy the first condition of stationarity.
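For readers who prefer code to spreadsheets, the split-and-compare check for Condition 1 can be sketched in a few lines of Python. The two series below are simulated stand-ins, not the actual Series A and Series B data, and the helper names are hypothetical.

```python
import random
from statistics import mean

def split_in_three(series):
    """Split a series into three consecutive parts of (nearly) equal size."""
    third = len(series) // 3
    return series[:third], series[third:2 * third], series[2 * third:]

def part_means(series):
    """Mean of each of the three parts."""
    return [mean(part) for part in split_in_three(series)]

random.seed(1)

# Stand-in for Series A (assumption): independent noise around a fixed level.
series_a = [random.gauss(50, 15) for _ in range(9000)]

# Stand-in for Series B (assumption): a random walk, which tends to drift.
series_b = [50.0]
for _ in range(8999):
    series_b.append(series_b[-1] + random.gauss(0, 15))

means_a = part_means(series_a)
means_b = part_means(series_b)
print("Series A part means:", [round(m, 2) for m in means_a])
print("Series B part means:", [round(m, 2) for m in means_b])
```

If the three means sit close together, as they should for the first stand-in series, Condition 1 is taken as met. How close is close enough remains a judgment call in this informal check.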
Condition 2: The standard deviation should be within a range.
I will use the same approach as before: calculate the standard deviation of each part of the series and observe the results.
This is the Series A result.
The standard deviation oscillates between 14% and 19%, which is a tight range and satisfies the 2nd stationarity condition.
This is how Series B's standard deviation looks.
See the difference? The standard deviation of Series B is quite random from one part to the next, so Series B is clearly not a stationary series. Series A, at this point, still appears to be stationary. We have yet to assess the last condition, i.e. the autocorrelation bit. Let's get on with it.
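The same three-part check for the standard deviation can be sketched in Python as follows. Again, the data is simulated: the second series is deliberately built with a different volatility in each third, purely as an assumed stand-in for a series that fails Condition 2.

```python
import random
from statistics import stdev

def part_stdevs(series):
    """Standard deviation of each of three consecutive, equal-sized parts."""
    third = len(series) // 3
    parts = (series[:third], series[third:2 * third], series[2 * third:])
    return [stdev(part) for part in parts]

random.seed(2)

# Stand-in for Series A (assumption): the same volatility throughout.
series_a = [random.gauss(0, 15) for _ in range(9000)]

# Stand-in for Series B (assumption): volatility changes in every third.
series_b = (
    [random.gauss(0, 5) for _ in range(3000)]
    + [random.gauss(0, 40) for _ in range(3000)]
    + [random.gauss(0, 15) for _ in range(3000)]
)

stdevs_a = part_stdevs(series_a)
stdevs_b = part_stdevs(series_b)
print("Series A part standard deviations:", [round(s, 2) for s in stdevs_a])
print("Series B part standard deviations:", [round(s, 2) for s in stdevs_b])
```

A tight spread across the three parts is what Condition 2 asks for. A spread as wide as the second series shows would rule it out.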
Condition 3: There should not be any autocorrelation in the series
The absence of autocorrelation, in layman's terms, means that a value in the time series does not depend on the values that came before it.
Take a look at this example:
The ninth value is 29. If there is no autocorrelation, the value 29 does not depend on any of the values before it, i.e. the values from Cell 1 to Cell 8.
The question is, how can we establish this?
There is a way.
Let's say there are 10 data points. I will take the data from Cell 1 to Cell 9 and call it Series X. Then I will take the data from Cell 2 to Cell 10 and call it Series Y. Now calculate the correlation between Series X and Series Y. This is called the 1-lag correlation. If there is no autocorrelation, it should be near 0.
The same works for 2 lags, i.e. the correlation between Cell 1 to Cell 8 and Cell 3 to Cell 10. Again, the correlation should be near 0. If it is, it is safe to assume that the series is not autocorrelated, and the 3rd condition of stationarity is satisfied.
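The lag correlation described above can be sketched in Python as follows. The function names are hypothetical and the two series are simulated stand-ins: one of independent noise, and one random walk in which each value is built from the previous one.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lag_correlation(series, lag=1):
    """Correlate a series with a copy of itself shifted by `lag` cells."""
    series_x = series[:-lag]  # lag 1 on 10 points: Cell 1 to Cell 9
    series_y = series[lag:]   # lag 1 on 10 points: Cell 2 to Cell 10
    return pearson(series_x, series_y)

random.seed(3)

# Independent noise: each value ignores the ones before it (assumption).
noise = [random.gauss(0, 1) for _ in range(9000)]

# Random walk: each value is the previous value plus a shock (assumption).
walk = [0.0]
for _ in range(8999):
    walk.append(walk[-1] + random.gauss(0, 1))

for lag in (1, 2):
    print(f"lag {lag}: noise = {lag_correlation(noise, lag):.3f}, "
          f"walk = {lag_correlation(walk, lag):.3f}")
```

The noise series should give lag correlations near 0 and the random walk should give values near 1, which mirrors the contrast the chapter draws between the two series.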
Here is the lag correlation calculation for Series A.
Remember that I am subdividing Series A into two subseries, i.e. Series X and Series Y, and calculating the correlation between them. The correlation is very close to zero, so we can conclude that the third condition holds and Series A is stationary.
Let's also do it for Series B.
I have used a similar approach and the correlation is very close to 1.
As you can see, all the conditions for stationarity are met for Series A, which means that the series is stationary, while Series B is not.
I know my approach to explaining cointegration and stationarity is unconventional; a conventional statistical explanation would be full of formulae. Leaving them out is deliberate, as I felt this was a better way to discuss the topic. Our goal, after all, is to pair trade efficiently, not to dive deep into statistics.
You might be wondering if you really need to do all this to determine whether the time series (the residuals) is actually stationary. As I have said, this is not necessary.
To determine if the time series is stationary, we only need to look at the result of the "ADF test".
Perhaps the most effective technique to determine the stationarity in a time series is the augmented Dickey–Fuller test or the ADF test. In our example, the residuals series is the time series to be considered.
The ADF test basically does all that we have discussed, plus a multi-lag check of the autocorrelation within the series. The output of the ADF test is not a definitive yes or no on whether the series is stationary. It is more like a probability: loosely speaking, it tells us how likely it is that the series is not stationary.
If the ADF output for a time series is 0.25, it means that the series has a 25% chance of not being stationary. In other words, it has a 75% chance of being stationary. This number is also called the "P value".
For a time series to be considered stationary, the P value should be lower than 0.05 (5%). This basically means that the probability of the time series being stationary is 95% or higher.
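The arithmetic behind this cut-off is simple enough to write down. The sketch below follows the chapter's simplified reading of the P value (chance of being stationary = 1 minus the P value); the function names are made up for illustration, and the 0.03 input is a hypothetical value.

```python
def stationarity_chance(p_value):
    """Chapter's simplified reading: chance of being stationary = 1 - P value."""
    return 1.0 - p_value

def passes_adf(p_value, threshold=0.05):
    """True when the P value is below the cut-off (5% by default)."""
    return p_value < threshold

# 0.25 is the chapter's example; 0.03 is a made-up value that passes.
for p in (0.25, 0.03):
    print(f"P value {p:.2f}: about {stationarity_chance(p):.0%} chance of being "
          f"stationary, acceptable = {passes_adf(p)}")
```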
Okay, so how do you run an ADF test?
This is quite a complex process. Unfortunately, I have not found a free online source that lets you run an ADF test. I do have an Excel sheet with a paid plugin that runs the ADF test, but I cannot share it here. If I could, I would.
If you're a programmer, there are also Python plugins that can run the ADF test.
If, like me, you are not a programmer, you will be stuck at this point. So here's what I'll do: once every 15 days, I'll upload a "Pair Data" sheet with information on the best combination of pairs, such as which stock is X and which is Y, the intercept, the beta, and the ADF P value.
The lookback period is 200 trading days. For now this covers banking stocks only; I hope to include other sectors in the future. Here is a snapshot of the most recent Pair Data sheet for banking stocks, to help you understand the output better.
The first line indicates that Federal Bank as Y and PNB as X is a viable pair. This means that both regressions were run, Federal Bank as Y with PNB as X and Federal Bank as X with PNB as Y, and the combination of Federal Bank as Y and PNB as X had the lowest error ratio.
Once the order is determined (as in which of the two is Y and which is X), the intercept and beta for the combination are calculated. The ADF test is then run on the residuals and the P value is calculated. As you can see, the P value for Federal Bank as Y and PNB as X is 0.365.
This is not a combination you should be considering, as the chance of the residuals being stationary is only 63.5%, far short of the 95% we need.
You will see that only 2 pairs have the desired P-value, HDFC and PNB, respectively, in the above snapshot.
The p values rarely change overnight. Therefore, I make sure to check the p-value every 15 or 20 working days.
This chapter has taught us a lot. Many of the topics discussed in this chapter could be unfamiliar to most readers. This is why I'm going to summarize everything you need to know about Pair Trading at this point.
If you're not sure about any of the points, I suggest that you read Chapter 7 again.
We will be looking at a pair trade as an example and trying to understand its dynamics in the next chapter.