Quick and Dirty
Regression TutorialThe statistical procedure known as linear regression basically involves drawing and analyzing trend-lines through data points. Economists use regression analysis to test hypotheses, derived from economic theory, against real-world data. In your first microeconomics class you saw theoretical demand schedules (Figure 1) showing that if price increases, the quantity demanded ought to decrease. But when we collect market data to actually test this theory, the data may exhibit a trend, but they are "noisy" (Figure 2).
To analyze the empirical relationship between price and
quantity, download and open the
Although it is standard practice to graph supply and demand
with Price on the Y-axis and Quantity on the X-axis,
economists more often consider demand Quantity to be
the "dependent" variable influenced by the "independent" variable Price.
To obtain a more conventional demand equation, invert your equation,
solving for intercept and slope coefficients
where Quantity = a + b*Price + ee is the residual "unexplained" variation in the Quantity
variable (the deviations of the actual Quantity data points from
the estimated regession line that you drew through them).
That's basically what linear regression is about: fitting trend lines through data to analyze relationships between variables. Since doing it by hand is imprecise and tedious, most economists and statisticians prefer to...
MS-Excel provides two methods for
fitting the best-fitting trend-line through data points, and calculating
that line's slope and intercept coefficients.
The standard criterion for "best fit" is the trend line that minimizes the
sum of the squared vertical deviations of the data points from
the fitted line.
This is called the The easiest way to plot a trend line and calculate a single-variable regression equation is to right-click on the data points in an Excel XY plot and select "Add Trendline." Under the "Options" tab check "Display equation on chart" and click "OK." How well do this trend line and calculated slope and intercept coefficients match the line you drew and the slope and intercept that you calculated?
Excel also includes a formal regression utility in its Analysis
ToolPak that provides statistics indicating goodness-of-fit and
confidence intervals for slope and intercept coefficients.
This utility lets you regress one dependent
"left-hand-side" (of the equal sign) variable against
one Excel's Regression procedure is one of the Data Analysis tools. If you don't see it, you need to activate the Analysis ToolPak. Click the Windows symbol or the File menu, choose Options--Add-Ins, select Analysis ToolPak (not Analysis ToolPak VBA) and click "Go..." Check the Analysis TookPak checkbox and "OK." You will find "Data Analysis" on the right end of the "Data" menu.
The only things you are required to specify are...
Here is output from Excel's regression utility
replicating the regression of Price (Y range) against Quantity (X range).
At the bottom of the output you can see the same Intercept and Quantity
slope coefficients that are shown for the trend line in the XY plot
above.
This empirical inverse demand model,
written out in equation form, is
Try specifing Quantity as the dependent variable and Price as
the independent variable, and estimating the conventional demand
regression model
Now try
regressing Quantity (Y range) against both Price and
Income (the X range is both the Price and Income columns).
This will yield coefficient estimates for the
multivariate demand model
Written out in equation form, this empirical demand model is Q = 49.18 - 3.118*P + 0.510*I + e.
Multivariate models such as this don't lend themselves to easy
graphing, but they are much more interesting.
In this example an increase in Income shifts the conventional
Q vs. P demand schedule to the right, while an increase in
Price shifts the Q vs. Income curve (aka Engel curve) to the
left.
When analyzing your regression output, first check the signs of the model coefficients: are they consistent with your hypotheses? Is the Price coefficient negative as theory predicts? Does the Income coefficient indicate this is a normal good, or an inferior good? Try calculating the price and income elasticities using these slope coefficients and the average values of Price and Quantity.
The next thing you should check is the statistical
significance of your model coefficients.
Because the data are noisy and the
regression line doesnt fit the data points exactly, each reported
coefficient is
really a point The standard error is the square root of the variance
of the regression coefficient. The
If that last paragraph is just statistical gibberish for you,
don't worry--most people just check the P-values.
These are the probabilities that the coefficients are The R-Square statistic near the top of the output represents the
percent of the total variation in the dependent variable that is explained
by the independent variables, i.e., the model's overall goodness of
fit."
But whether a model is really a "good" fit or not depends on context.
R-squares for cross-sectional models are typically much lower than
R-squares for time-series models.
You can always increase R-square by throwing another independent variable
( Those are all the diagnostics you really need to worry about.
The classical OLS model assumes that the residuals
Keep in mind that a regression actually analyzes the statistical
You should never force the regression line through the origin (the "Constant is zero" check-box in the Excel utility) without a clear theoretical justification for doing so. It makes your model diagnostics unreliable. |