The previous chapter provided a basic understanding about a straight-line equation. We used a simple example to show how variables can be related. The examples were chosen in such a way that even casual eyes could see the relationship. We posted a table with two numbers in it towards the end of the chapter. The task was to determine if there was any relationship between them and if so, how can one express that relationship as a straight-line equation. What was the intercept and constant, more precisely?
In this chapter, we will learn how to create a relationship and move closer to the relative value trading method. Let me once more post the table with both number arrays -
|X Variable||Y variable|
Casual eyeballing doesn't reveal anything about the relationship between these two sets of numbers. It might work if you're a mutant, but it doesn't for me.
In such situations, the Linear Regression technique is used. Linear regression can be described as a statistical operation in which the input is an array with two sets of numbers, and the output is made up of many parameters including the constant and intercept that are necessary to construct the straight-line equation.
Excel is the best tool to perform linear regression operations. This is the complete guide for performing a simple linear regression using two numbers. You will see many screenshots and instructions.
Step 1 Installing the Plugin
Input the values of X & Y into an Excel sheet.I did the same as picturised below-
|X variable||Y variable|
This is our data collection. Remember that Y is the dependent variable, whose value is dependent on X. For the linear regression operation, both X and Y are the input variables.
Click on the Data ribbon, highlighted in red.
Now, the data ribbon will display the "Data Analysis" option. This option is highlighted in blue. Some people may not be able to see this option. If so, don't panic. I'll tell you what you need to do.
Click on "File" -
You can open another new window. On your left-hand side panel you will see the option to choose 'option'.
You will find a variety of options available by clicking on the Options. Select 'Add-Ins" from the left-hand panel. Click on it, then click on the "Analysis Tool Pack". Next, click on "Go" and then on "Ok". This will add the "Data Analysis" option to your data ribbon.
Close the excel sheet, restart your system, and you're good to go.
Step 2: Enter the values
We will assume that you have the data analysis package in your excel sheet. Next, you will need to invoke the linear regression function from the data analysis package. Click on the Data ribbon and choose the Data Analysis option. A pop-up will appear with a list of statistical operations that you can do on data sets. Choose the one that says "Regression".
There are many fields in this section, as you can see. The input section is the most important section. You will find two fields in this section: 'Input Y Range’ and 'Input X Range’. As you might have guessed, Y stands for dependent variable and X for dependent variable.
Here is where you feed the X series and Y series data. Click on the input channel to select Y or X range.
Please also notice that I have checked the label box. This implies that the first cell value i.e. A2 and B2 contain series label i.e X & Y, respectively.
For now, I suggest that you disregard the input values.
You should ensure that you have clicked the following items on the output side.
By selecting 'New sheet', the output data will be printed on a different sheet. I also clicked on two additional variables, Residuals or Standardized Residuals. These two variables will be discussed later. You can just make sure they are chosen.
Now you can perform the linear regression operation. You can click on the "OK" button in the right-hand upper corner.
Excel will now accept these inputs and run the linear regression operation. The results will be posted to a new sheet in the same workbook.
Here is the output of linear regression. As expected, the summary is shown in a new sheet.
Yes, it is scary at first glance. In this output one can gather a lot of information. As we go, we will dissect this output in pieces.
Let's now focus on finding our slope or intercept. This is what I highlighted in the following snapshot.
The red data points indicate the coefficients that we are trying to find, i.e. the intercept (or constant), and the slope (denoted with x).
Some may be confused by the slope represented by x. I get it. However, M would have been better as it would match straight-line equation. But then, we'll have to accept x for slope.
This means that the straight-line equation to calculate the random set of data is -
y = 1.885*x + (7.859813)
y = 1.85xx - 7.859813
What does all this actually mean?
This equation, as you may recall from the previous chapter helps us to predict the value or dependent variable of a given x. I will repost the table for your convenience.
|X variable||Y variable|
I have added a new point for x, i.e 15. Now using the slope or intercept, we can determine the value of the y.
y = 1.885 * 15, - 7.859813
= 28.275 - 8.859813
If x is 15, the most likely value for y is 20.415.
You may be wondering how accurate this prediction is.
It's not exact. It's only an estimate. Consider, for example, that x has a value of 18 (refer the last data point). The straight line equation will then calculate y to be 18.
y = 1.885*18 + 7.859813
= 33.93 - 8.859813
The actual value of y, however, is 22.
We have two values for y.
The residuals are the differences between the two values for y. The residual for y is the difference between predicted and actual y when x = 18.
22.067019 - 26.07019
If you do linear regression, the summary output also includes residuals. I have highlighted this in the snapshot below.
I have also highlighted the residual when 18 is chosen, which is what we calculated previously.
Let me give you a heads-up: The residuals are the main focus of relative value trades. Keep watching!