A Guide to Trading Systems

Lesson -> The Error Ratio - PTM2, C3

10.1 – Who is X and who is Y?

I hope you now have a good understanding of linear regression and of how to run a linear regression on two sets of data using MS Excel. Remember that a regression involves two variables, X and Y.

X is the independent variable and Y is the dependent variable. If you think about this for a while, you arrive at the obvious question: how do we know which one is X and which one is Y?

Let's run a linear regression on two stocks, HDFC Bank and ICICI Bank, and see what we get.

I am setting ICICI Bank as X and HDFC Bank as Y. Before we proceed, a quick note -

  1. Make sure your data is clean: adjust for bonuses, splits, and other corporate actions
  2. Make sure both series cover exactly the same dates. For example, I have data for both stocks from the 4th of December 2015 to the 4th of December 2017.
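
The date check in the second point can be automated. Here is a minimal sketch, assuming each price series is held as a dictionary keyed by date. The function name `align_on_common_dates`, the variable names, and the prices are all made up for illustration and are not taken from the data used in this chapter.

```python
from datetime import date

def align_on_common_dates(series_a, series_b):
    """Keep only the dates present in both series, in chronological order."""
    common = sorted(set(series_a) & set(series_b))
    return common, [series_a[d] for d in common], [series_b[d] for d in common]

# Made-up closing prices, for illustration only
icici = {date(2015, 12, 4): 260.0, date(2015, 12, 7): 262.5, date(2015, 12, 8): 258.0}
hdfc = {date(2015, 12, 4): 1070.0, date(2015, 12, 8): 1065.0}

dates, x, y = align_on_common_dates(icici, hdfc)
print(dates)  # only the dates on which both stocks have a price
```

Dropping the unmatched dates keeps the two columns the same length, which is what the regression expects.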

Here's how the data looks:
(Image 1)

I will run a linear regression on these two stocks (I explained how to do this in the previous chapter). Note that I am running it on stock prices and not on stock returns.
(Image 2)

The linear regression results are as follows:
(Image 3)

With ICICI as the independent variable and HDFC as the dependent variable, the equation is -

HDFC = Price of ICICI * 7.613 – 663.677

I assume you are familiar with this equation. If you are not, I recommend reading the preceding chapters. As a quick summary, the equation attempts to predict HDFC's price using ICICI's price.

In other words, we are trying to 'express' the price of HDFC in terms of ICICI.
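
To make the equation concrete, here is a small sketch that plugs an ICICI price into it. The slope and intercept are the ones reported above. The function name `predict_hdfc` and the ICICI price of 300 are hypothetical and chosen only for illustration.

```python
SLOPE = 7.613         # slope from the regression output above
INTERCEPT = -663.677  # intercept from the regression output above

def predict_hdfc(icici_price):
    """Predicted HDFC price for a given ICICI price."""
    return SLOPE * icici_price + INTERCEPT

# 300 is a hypothetical ICICI price, not a value taken from the data
print(round(predict_hdfc(300.0), 3))
```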

Let's reverse this: I will make ICICI dependent and HDFC independent.

These are the results:
(Image 4)

The equation is -

ICICI = Price of HDFC * 0.09 + 142.4677

So, for any two stocks, you can regress in two different ways simply by switching which stock is the dependent variable and which is the independent variable.
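
The sketch below runs the regression both ways on a tiny data set. It is not the Excel procedure used in this chapter: `ols` is a hypothetical helper that applies the usual least-squares formulas, and the two price lists are invented for illustration.

```python
def ols(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up prices for two hypothetical stocks
stock_a = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
stock_b = [50.0, 51.5, 50.5, 52.0, 54.0, 53.0]

print(ols(stock_a, stock_b))  # stock_b expressed in terms of stock_a
print(ols(stock_b, stock_a))  # stock_a expressed in terms of stock_b
```

The second equation is not just the first one rearranged. Unless the fit is perfect, the two slopes are not reciprocals of each other, which is why the order matters.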

The question is: how do you decide which stock should be dependent and which independent? In other words, how do you decide which order makes the most sense?

Three things are required to answer this question:

    1. Standard Error
    2. Standard Error of the Intercept
    3. The ratio of the two variables above

Look at the equation above: it essentially expresses the price variation of ICICI in terms of HDFC. However, the price of one stock can never explain 100% of the price variation in another stock. If it did, there would be no play here.

That said, the equation should be strong enough to explain as much of the variation in the dependent variable as possible, keeping the independent variable in perspective. The stronger the equation, the better.

This leads to the obvious question: how do we determine how strong the linear regression equation is? This is where the ratio Standard Error of the Intercept / Standard Error comes into play. Before we talk about the ratio itself, it is important to understand its numerator and its denominator.

10.2 – Return to residuals

Below is the linear regression equation for ICICI (as independent) and HDFC (as dependent).

HDFC = Price of ICICI * 7.613 – 663.677

This means that if I know the price of ICICI, I should be able to predict the price of HDFC. In reality, however, the actual price of HDFC differs from the predicted price. This difference is called the 'residual'.

Here is a snapshot of the residuals obtained when explaining the price of HDFC with ICICI as the independent variable.
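
As a rough illustration of how each residual is obtained, the sketch below uses the slope and intercept reported above. The price pairs are hypothetical, and the sign convention (actual minus predicted) is an assumption, although it is the usual one.

```python
SLOPE = 7.613         # from the regression with ICICI as X and HDFC as Y
INTERCEPT = -663.677

def residual(icici_price, actual_hdfc_price):
    """Actual HDFC price minus the price predicted from ICICI (assumed sign convention)."""
    predicted = SLOPE * icici_price + INTERCEPT
    return actual_hdfc_price - predicted

# Hypothetical (ICICI, HDFC) price pairs, for illustration only
pairs = [(300.0, 1650.0), (310.0, 1690.0), (305.0, 1640.0)]
residuals = [residual(x, y) for x, y in pairs]
print([round(r, 3) for r in residuals])
```

A positive residual means HDFC traded above the predicted price, and a negative one means it traded below.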

When I discuss the regression equation and residuals, I often get asked the same question: if there are residuals every time, why use the regression equation at all? In other words, how can we rely on an equation that fails to predict the price accurately even once?

It is a fair question.

However, this was never about predicting the price of the dependent stock from the price of the independent stock. It was always about the residuals!

The residuals exhibit a certain behavior. Once we understand that behavior and can identify its pattern, we can work backwards to construct a trade. That trade is a pair trade, because it involves buying one stock and selling the other at the same time.

We will delve into this in the coming chapters. For now, let's focus on the 'Standard Error', the denominator of the Standard Error of the Intercept / Standard Error ratio.

When you run a linear regression, the standard error is one of the values that gets reported, as the snapshot below shows.
(Image 6)

The standard error is the standard deviation of residuals. The residuals are a time series array. If you calculate the standard deviation from the residuals, you will get the standard error.

Let me calculate the standard error manually from the residuals. I am doing this with X = ICICI and Y = HDFC:
(Image 7)

Excel gives a standard deviation of 152.665, while the summary output reports a standard error of 152.819. The minor difference can be ignored.
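
One likely source of that small gap is the divisor. A sample standard deviation (such as Excel's STDEV) divides the sum of squared deviations by n - 1, while the standard error of a simple regression divides by n - 2, because two parameters (slope and intercept) were estimated. The sketch below shows the two calculations side by side. The function names and the residual values are made up for illustration.

```python
import math

def sample_std(residuals):
    """Standard deviation with an n - 1 divisor."""
    n = len(residuals)
    mean = sum(residuals) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))

def regression_standard_error(residuals):
    """Standard error of a simple linear regression: n - 2 divisor."""
    n = len(residuals)
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Made-up residuals that sum to zero, as regression residuals with an intercept do
residuals = [12.0, -7.5, 3.0, -9.0, 6.5, -5.0]

print(sample_std(residuals))
print(regression_standard_error(residuals))
```

With only six points the two figures differ noticeably. With roughly two years of daily prices the divisors differ by about 0.1%, which is in line with the gap between 152.665 and 152.819.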

The 'Standard Error of the Intercept' is a little trickier to calculate, but it is reported in the regression output. Here is the standard error of the intercept with X = ICICI and Y = HDFC:
(Image 8)

Remember, the regression equation is -

y = M * x + C

where,

M = Slope

C = Intercept

Note that both M and C are estimates. How are they estimated? They are based on the historical data supplied to the regression algorithm. That data can contain noise and a few outliers, which means the estimates themselves can be off.

The Standard Error of the Intercept measures how much the estimated intercept can vary. In that sense it is similar to the 'Standard Error' itself. To summarize -

  • Standard Error of the Intercept - the variability of the intercept estimate
  • Standard Error - the variability of the residuals
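
For reference, both quantities have standard textbook formulas in simple linear regression. These formulas are not given in the original text, and the symbols are introduced here only for illustration: $e_i$ are the residuals, $n$ is the number of observations, $x_i$ are the values of the independent variable and $\bar{x}$ is their mean. A regression report such as Excel's is expected to show values consistent with them.

```latex
\text{Standard Error} = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n-2}}
\qquad
\text{Standard Error of the Intercept} = \text{Standard Error} \times \sqrt{\frac{1}{n} + \frac{\bar{x}^{2}}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}}
```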

Now that we have defined these two variables, let's bring back the 'Error Ratio'. Note that "Error Ratio" is not a standard term. I coined it to make things easier to understand.

As we know, the error ratio is -

Error Ratio = The Standard Error Of Intercept / The Standard Error

I calculated it for both orderings -

  1. ICICI as X, HDFC as Y = 0.401
  2. HDFC as X, ICICI as Y = 0.227

The error ratio is the deciding factor in designating a stock as X or Y: the lower the error ratio, the better. Since the second ordering has the lower error ratio, we designate HDFC as X (independent) and ICICI as Y (dependent).
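
The whole decision can be sketched in a few lines. This is an illustration under stated assumptions rather than the Excel workflow used in this chapter: the function names `error_ratio` and `pick_x_and_y` are hypothetical, the standard errors are computed with the usual simple-regression formulas, and the prices are made up.

```python
import math

def error_ratio(x, y):
    """Standard error of the intercept divided by the standard error,
    for the regression y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # n - 2 divisor: two parameters (slope, intercept) were estimated
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    std_error_intercept = std_error * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return std_error_intercept / std_error

def pick_x_and_y(name_a, prices_a, name_b, prices_b):
    """Return (x_name, y_name) for the ordering with the lower error ratio."""
    ratio_a_as_x = error_ratio(prices_a, prices_b)
    ratio_b_as_x = error_ratio(prices_b, prices_a)
    return (name_a, name_b) if ratio_a_as_x <= ratio_b_as_x else (name_b, name_a)

# Made-up prices for two hypothetical stocks
stock_a = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
stock_b = [50.0, 51.5, 50.5, 52.0, 54.0, 53.0]

x_name, y_name = pick_x_and_y("A", stock_a, "B", stock_b)
print("X (independent):", x_name, "| Y (dependent):", y_name)
```

The ratios from such a tiny made-up sample are not meaningful in themselves. The point is only the comparison between the two orderings.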

I would love to explain why the error ratio is used to designate X and Y, but I won't at this point. I will return to it when we take up pair trading examples.

For now, calculate the error ratio to determine which stock is dependent and which is independent.

Key takeaways from this chapter



  1. X is the independent stock, Y the dependent stock
  2. Which stock should be X and which should be Y is decided by the 'Error Ratio'
  3. The slope and the intercept of the linear regression equation are both estimates
  4. Error Ratio = Standard Error of the Intercept / Standard Error
  5. The standard error is the standard deviation of the residuals
  6. The standard error of the intercept indicates how much the intercept estimate can vary
  7. Regress Stock 1 on Stock 2 and Stock 2 on Stock 1; the ordering with the lower error ratio determines which stock is dependent and which is independent
  8. Certain properties of residuals can be used to identify pairs trading patterns.