I hope you have gained a good understanding of linear regression, and of how to run a linear regression on two sets of data using MS Excel. We are dealing with two variables, X and Y.
X is the independent variable and Y is the dependent variable. If you spend some time thinking about this, you will realize that X and Y are interchangeable.
Let's run a linear regression on two stocks, HDFC Bank and ICICI Bank, and see what we get.
I am setting ICICI Bank as X and HDFC Bank as Y. Before we proceed, let's take a quick note -
Here's how the data looks (IMAGE 1)
I will run linear regression on these stocks (I have explained how to do it in the previous chapter). Also, please note that I am only focusing on stock prices and not stock returns.
The linear regression results are as follows:
With ICICI as the independent variable and HDFC as the dependent variable, the equation is -
HDFC = Price of ICICI * 7.613 – 663.677
I assume you are familiar with this equation. If you are not, I recommend that you read the preceding chapters. Here's a quick summary: the equation attempts to predict HDFC's price using ICICI's price.
In other words, we are trying to 'express' the HDFC price in terms of the ICICI price.
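As a quick illustration, the equation can be evaluated directly. The sketch below hard-codes the slope and intercept quoted above; the function name predict_hdfc and the ICICI price of 300 are made up for the example and do not come from the data set.

```python
# Slope and intercept as quoted in the regression result above
# (ICICI as X, HDFC as Y).
SLOPE = 7.613
INTERCEPT = -663.677


def predict_hdfc(icici_price):
    """Return the HDFC price implied by the fitted line (hypothetical helper)."""
    return SLOPE * icici_price + INTERCEPT


# 300.0 is an illustrative ICICI price, not a value from the data set.
predicted = predict_hdfc(300.0)
print(f"Predicted HDFC price: {predicted:.3f}")
```

The number this prints is only what the line implies; the actual HDFC price on any given day will differ from it.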
Let's reverse this: I will make ICICI dependent and HDFC independent.
These are the results.
The equation is -
ICICI = Price of HDFC * 0.09 + 142.4677
You can therefore regress the two stocks in two different ways, simply by swapping which stock is the dependent variable and which is the independent variable.
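To see the two orderings side by side outside Excel, here is a minimal sketch in plain Python. The price lists are invented for illustration (they are not the ICICI/HDFC data used in this chapter), and fit_line is a hypothetical helper that applies the usual least-squares formulas for slope and intercept.

```python
def fit_line(x, y):
    """Least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, y_mean - m * x_mean


# Made-up closing prices, NOT the actual ICICI/HDFC data.
icici = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0, 268.0, 275.0]
hdfc = [1240.0, 1285.0, 1310.0, 1302.0, 1350.0, 1395.0, 1370.0, 1432.0]

m1, c1 = fit_line(icici, hdfc)  # ICICI independent, HDFC dependent
m2, c2 = fit_line(hdfc, icici)  # HDFC independent, ICICI dependent

print(f"HDFC  = Price of ICICI * {m1:.3f} + ({c1:.3f})")
print(f"ICICI = Price of HDFC * {m2:.3f} + ({c2:.3f})")
```

Note that the second equation is not simply the first one solved for ICICI. Each fit minimizes the errors in its own dependent variable, so the two lines coincide only when the relationship is perfectly linear.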
The question is: how do you decide which stock should be dependent and which independent? In other words, how do you decide which order makes sense?
Three things are required to answer this question:
The linear regression equation above essentially expresses the variation in the price of ICICI in terms of HDFC. Expressing the price variation of one stock using the price of another stock as a reference can never be 100% accurate. If it were, there would be no play here at all.
That said, the equation must be strong enough to explain as much of the variation in the price of the dependent variable as possible, keeping the independent variable in context. The stronger the equation, the better.
This leads to the obvious question: how do we determine how strong the linear regression equation is? This is where the ratio
Standard Error of Intercept / Standard Error comes into play. Before we talk about the ratio itself, it is important to understand both the numerator and the denominator.
Below is the linear regression equation for ICICI (as independent) and HDFC (as dependent).
HDFC = Price of ICICI * 7.613 – 663.677
This means that if I know the price of ICICI, I should be able to predict the price of HDFC. In reality, however, the actual HDFC price differs from the predicted price. The difference between the actual price and the predicted price is called the 'residual'.
Here is a snapshot of the residuals when the price of HDFC is explained with ICICI as the independent variable.
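The residual calculation can be sketched in a few lines of Python. The prices below are invented for illustration, not the chapter's data, and fit_line and residuals are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def residuals(x, y):
    """Actual y minus the y predicted by the fitted line, one value per day."""
    m, c = fit_line(x, y)
    return [yi - (m * xi + c) for xi, yi in zip(x, y)]


# Made-up closing prices, NOT the actual ICICI/HDFC data.
icici = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0, 268.0, 275.0]
hdfc = [1240.0, 1285.0, 1310.0, 1302.0, 1350.0, 1395.0, 1370.0, 1432.0]

res = residuals(icici, hdfc)
for actual, r in zip(hdfc, res):
    print(f"actual={actual:8.2f}  predicted={actual - r:8.2f}  residual={r:7.2f}")
```

Because the fit includes an intercept, the residuals always add up to zero (apart from rounding), so some days sit above the line and others below it.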
A common question I get when I discuss the regression equation and residuals is this: what is the point of using regression if there are residuals every time? In other words, how can we rely on an equation that fails to predict accurately even once?
It is a fair question.
It was never about predicting the price of the dependent stock from the price of the independent stock. It was always about the residuals!
The residuals exhibit a certain behavior. Once we understand the pattern and can identify it, we can work backwards to construct a trade. Since the trade involves buying one stock and simultaneously selling the other, it is a pair trade.
We will delve into this more in the next chapters. For now, however, let's focus on the 'Standard Error', the denominator of the ratio Standard Error of Intercept / Standard Error.
When you perform a linear regression operation, the standard error is one variable that gets reported. The snapshot below shows the same.
The standard error is the standard deviation of residuals. The residuals are a time series array. If you calculate the standard deviation from the residuals, you will get the standard error.
Let me manually calculate the standard deviation of the residuals. I'm doing this with X = ICICI and Y = HDFC.
Excel tells me that the standard deviation is 152.665, while the standard error reported in the summary output is 152.819. The minor difference comes from the degrees of freedom used in the two calculations (the regression standard error divides by n – 2), and it can be ignored.
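The two figures can be reproduced on any data set with a short sketch like the one below. It assumes the manual figure is a sample standard deviation (dividing by n – 1, as Excel's STDEV.S does), which the text does not state explicitly. The prices are invented, and the helper names are hypothetical.

```python
import math
import statistics


def fit_line(x, y):
    """Least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def residual_spread(x, y):
    """Return (sample std dev of residuals, regression standard error)."""
    m, c = fit_line(x, y)
    res = [yi - (m * xi + c) for xi, yi in zip(x, y)]
    n = len(res)
    std_dev = statistics.stdev(res)  # divides by n - 1
    std_err = math.sqrt(sum(r * r for r in res) / (n - 2))  # divides by n - 2
    return std_dev, std_err


# Made-up closing prices, NOT the actual ICICI/HDFC data.
icici = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0, 268.0, 275.0]
hdfc = [1240.0, 1285.0, 1310.0, 1302.0, 1350.0, 1395.0, 1370.0, 1432.0]

std_dev, std_err = residual_spread(icici, hdfc)
print(f"Std dev of residuals: {std_dev:.3f}")
print(f"Standard error:       {std_err:.3f}")
```

With only eight made-up data points the gap between the two numbers is noticeable; with a few hundred trading days it shrinks to the kind of small difference seen in the Excel output.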
The 'Standard Error of the Intercept' is not as easy to calculate by hand, but it is reported in the regression output. Here is the standard error of the intercept, with X = ICICI and Y = HDFC.
Remember, the regression equation is -
y = M*x + C
where,
M = Slope
C = Intercept
Both M and C are estimates. How are they calculated? They are based on the historical data supplied to the regression algorithm. The data may contain noise and a few outliers, which means the estimates themselves can be off.
The 'Standard Error of the Intercept' measures how much the estimated intercept can vary. You can think of it as the intercept's counterpart to the 'Standard Error' itself. To summarize -
Standard Error of the Intercept: the variability of the estimated intercept
Standard Error: the standard deviation of the residuals
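For readers curious about where the regression report's figure comes from, here is a sketch of the standard textbook formula for the standard error of the intercept in a one-variable regression: the standard error multiplied by sqrt(1/n + mean(x)^2 / Sxx), where Sxx is the sum of squared deviations of x from its mean. The function name and the prices are made up for illustration, and this is not necessarily how Excel computes it internally.

```python
import math


def intercept_standard_error(x, y):
    """Standard error of the intercept when y is regressed on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx            # slope (M)
    c = y_mean - m * x_mean  # intercept (C)
    sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2))  # standard error of the regression
    return std_err * math.sqrt(1.0 / n + x_mean ** 2 / sxx)


# Made-up closing prices, NOT the actual ICICI/HDFC data.
icici = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0, 268.0, 275.0]
hdfc = [1240.0, 1285.0, 1310.0, 1302.0, 1350.0, 1395.0, 1370.0, 1432.0]

se_intercept = intercept_standard_error(icici, hdfc)
print(f"Standard error of the intercept: {se_intercept:.3f}")
```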
Now that we have defined these variables, let's bring back the "Error Ratio". "Error Ratio" is not a standard term; I coined it to make things easier to understand.
As we know, the error ratio is -
Error Ratio = The Standard Error Of Intercept / The Standard Error
I calculated the same as -
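The error ratio decides which stock is designated X and which is designated Y. We calculate the error ratio for both combinations, and the lower the error ratio, the better. In this example, we will designate HDFC as X and ICICI as Y, since that combination has the lower error ratio.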
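The whole selection step can be sketched as follows: compute the error ratio for both orderings and keep the ordering with the lower value. The prices are invented for illustration, and error_ratio and choose_roles are hypothetical helper names. The standard error of the intercept uses the textbook one-variable formula, which is an assumption about what the regression report contains.

```python
import math


def error_ratio(x, y):
    """Standard error of the intercept / standard error, with y regressed on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = y_mean - m * x_mean
    sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2))
    se_intercept = std_err * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    return se_intercept / std_err


def choose_roles(name_a, prices_a, name_b, prices_b):
    """Return (x_name, y_name, ratio) for the ordering with the lower error ratio."""
    ratio_a_as_x = error_ratio(prices_a, prices_b)
    ratio_b_as_x = error_ratio(prices_b, prices_a)
    if ratio_a_as_x <= ratio_b_as_x:
        return name_a, name_b, ratio_a_as_x
    return name_b, name_a, ratio_b_as_x


# Made-up closing prices, NOT the actual ICICI/HDFC data.
icici = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0, 268.0, 275.0]
hdfc = [1240.0, 1285.0, 1310.0, 1302.0, 1350.0, 1395.0, 1370.0, 1432.0]

x_name, y_name, ratio = choose_roles("ICICI", icici, "HDFC", hdfc)
print(f"X (independent) = {x_name}, Y (dependent) = {y_name}, error ratio = {ratio:.3f}")
```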
Although I would love to explain why we use the error ratio to designate X and Y, I won't do so here. I will return to this topic when I take up pair trading.
For now, just calculate the error ratio to determine which stock is dependent and which is independent.