1. UNIVARIATE REGRESSION
It's generally easier (and neater)
to graph data points and fit lines through them
using a package like MS-Excel. You have probably used Excel
to create graphs
before. This exercise also introduce Excel's regression
utility.
First, copy the following demand
data into two columns of an MS-Excel spreadsheet
(don't type in the $ signs for P; just format this column to currency
if you want).
Q(demand) P ($)
15 1.40
21 1.25
35 1.00
24 1.10
22 1.30
19 1.20
11 1.40
16 1.45
26 1.05
29 0.90
38 0.90
28 1.05
Highlight both data columns in
the spreadsheet and use the chart wizard to create
an XY (scatter) plot of these data. Q should be on the horizontal
axis; P on the
vertical axis.
Now use Excel's regression utility to calculate the slope and intercept
of the
equation that best fits these data. Most datapoints in a
scatterplot have some
vertical deviation from any trendline drawn through the scatter.
Excel will
estimate the slope and intercept of the line that minimizes the
sum of the squared
deviations (or "errors") of all the data points from that line.
Here you will be determining the
values of the coefficients a and b that minimize
the sum of the squared e's (errors):
Q = a + bP + e
You may need to load Excel's Analysis
ToolPak first: from the menu, use
Tools--Add-Ins and click on the Analysis TookPak check-box.
To do a regression, from the menu,
use Tools--Data Analysis and select
"Regression" from the list of tools.
In the Regression window, you need
to input the Y-Range (click on the red select
icon at the right of that field, highlight the column in the spreadsheet,
then
click on the red icon to okay the selection), then the X-range.
The Y-Range is
the "dependent variable" on the left-hand side of the equal sign;
the X-range
contains one or more "independent variables" on the right-hand
side of the equal
sign that are supposed to explain the variation in the dependent
variable. Note
that the input options allow you to force the intercept to zero,
but you don't
want to do that here!
Once you have input the X and Y
ranges, you need to specify where Excel should
direct your output: it's usually most convenient to specify the
upper left corner
of a blank output range within the same spreadsheet.
Here's a portion of the regression output you should get if you do this correctly:
-----------------------------------------------------
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.910124427
R Square
0.828326473
Adjusted R Square 0.811159121
Standard Error 3.497799308
Observations 12
ANOVA
df
SS MS
F
Regression 1 590.3206667 590.3206667 48.25009944
Residual 10 122.346 12.2346
Total 11 712.6666667
Coeffs StdError t-Stat
Intercept 67.58 6.402023899
10.55603682
X Variable1 -37.64 5.418767388 -6.94622915
------------------------------------------------------
It looks messy, but there are only
a few things here you need to worry about.
First, the Intercept and X-Variable1 coefficients are the slope
and intercept
parameters you're after:
Q = 67.58 - 37.64P
Simple, huh? This is the demand
equation that fits the market data best.
If you plotted the points by hand, and drew a trendline with a
straightedge
through the points, and solved for the slope and intercept of the
line,
you would get values very close to these.
The R-Square coefficient tells you
what percent of the total variation in Q
is explained by your demand model. It ranges from zero to
one. This model,
using just price, explains 82.8% of the variation in Q. We
could increase
the R-Square coefficient by including a second variable in the
X-range, such
as the price of a substitute good.
The estimated Intercept and X-Variable
coefficients are just the most likely
coefficient values. Since the data are noisy, there is a
range of possible
values for each model coefficient. The Standard Errors and
t-Statistics are
indicate how wide this range is. They can be used to determine
if a
coefficient is in fact significantly different from zero.
The t-Statistic is simply the coefficient
divided by its standard error.
I will give you a t-Distribution
table and explain how to use it in class.
In general, the critical t-value thresholds to look for are--
2.0 or greater means there is a
95% probability that the reported sign of
your coefficient is correct, i.e., the coefficient
is significantly
different from zero with 95% confidence.
2.7 or greater means there is a 99% probability that the reported
sign of
your coefficient is correct.
With small datasets (<30 datapoints) such as this one, the critical
t-values
are actually a bit higher.
(Ignore the sign on your t-Statistic; it just matches the sign
on the
coefficient.)
In a blank column adjacent to the
Q column, use the estimated coefficients to
calculate predicted values of Q. (Multiply each value of
P by the slope
coefficient and add the intercept.) Do the predicted Q values
match the
actual Q values fairly closely? Create an XY (scatter) plot
with Price on the
X-axis and both actual and predicted Q's in the Y-Range.
Format the actual
Q series as points only; format the predicted Q series as points
with a line
running through them. Make sure the predicted Q series trace
out a straight
line.
2. MULTIVARIATE REGRESSION
As hinted above, the Excel regression
utility can handle multiple right-hand
side variables in a regression model. Here's an exercise
to demonstrate this:
Suppose we want to relate consumer
demand for coffee Q(coffee) to four
variables:
the price of coffee P(coffee)
the price of a substitute drink, tea P(tea)
the price of a complement good, non-dairy creamer P(creamer)
consumer income I
In this exercise, we specify and test a linear demand model
Q(coffee) = A + B1*P(coffee) + B2*P(tea) + B3*P(creamer) + B4*I
We hypothesize that B1 will be negative
(meaning the demand for coffee is
negatively related to its own price).
What sign do you hypothesize for B2? Why?
What sign to you hypothesize for B3? Why?
What sign to you hypothesize for B4? Why?
Your research assistant interviewed
22 households in different regions of the
US and obtained the following data (price in dollars, income in
hundreds of
dollars). Copy it (ACCURATELY!) into an Excel spreadsheet.
---------------------------------------------------------
| Obs | Q(coffee) | P(coffee) | P(tea) | P(creamer) | I(ncome) |
| 1 | 3 | 1.04 | 1.77 | 0.29 | 113 |
| 2 | 6 | 0.95 | 2.15 | 0.26 | 105 |
| 3 | 5 | 0.88 | 1.66 | 0.23 | 147 |
| 4 | 4 | 0.95 | 1.58 | 0.22 | 266 |
| 5 | 6 | 0.90 | 2.07 | 0.24 | 132 |
| 6 | 4 | 1.16 | 1.99 | 0.28 | 807 |
| 7 | 4 | 0.95 | 1.49 | 0.20 | 456 |
| 8 | 5 | 1.04 | 2.35 | 0.28 | 281 |
| 9 | 8 | 0.87 | 2.17 | 0.28 | 186 |
| 10 | 4 | 0.97 | 1.63 | 0.26 | 444 |
| 11 | 10 | 1.08 | 1.80 | 0.27 | 771 |
| 12 | 6 | 0.93 | 1.96 | 0.28 | 508 |
| 13 | 2 | 1.15 | 1.88 | 0.28 | 134 |
| 14 | 0 | 1.11 | 1.48 | 0.25 | 128 |
| 15 | 3 | 1.08 | 1.99 | 0.20 | 329 |
| 16 | 7 | 1.00 | 2.20 | 0.26 | 325 |
| 17 | 3 | 1.11 | 1.52 | 0.23 | 606 |
| 18 | 3 | 1.15 | 1.74 | 0.20 | 139 |
| 19 | 2 | 1.11 | 1.68 | 0.25 | 475 |
| 20 | 8 | 0.96 | 1.88 | 0.26 | 541 |
| 21 | 10 | 0.91 | 2.16 | 0.26 | 610 |
| 22 | 2 | 1.16 | 1.64 | 0.26 | 249 |
--------------------------------------------------------.
Now regress Q(coffee) as the Y-Range against P(coffee), P(tea),
P(creamer)
and I all included in the X-Range. Direct the output to empty
cells to the
right of your data.
Is the sign of the own-price coefficient
B1 negative as hypothesized?
Does its t-statistic indicate it is statistically significant at
the 95%
confidence level? At the 99% confidence level?
Does the sign of the cross-price
coefficient B2 confirm that tea is a
substitute for coffee? Is it statistically significant?
At what confidence
level? Explain.
Does the sign of the cross-price
coefficient B3 confirm that creamer is a
complement to coffee? Is it statistically significant?
At what level?
Explain.
Does the sign of the income coefficient
B4 confirm that coffee is a normal
good? Is it statistically significant? At what level?
Explain.
In a blank column adjacent to your
Q(coffee) column, calculate predicted
values for Q(coffee) from the regression coefficients: write a
formula to
add A + B*P(coffee) + B2*P(tea) + B3*P(creamer) + B4*I).
Create an XY
plot of actual and predicted Q(coffee) on the Y-axis versus P(coffee)
on
the X-axis. In this case, the predicted Q(coffee) series
does not trace
out a straight line. Why not?
Using your estimated demand equation,
how much coffee would you expect a
household to consume if P(coffee)=$1.00, P(tea)=$1.75,
P(creamer)=$0.25
and I=350?
Calculate the average values of
Q(coffee), P(coffee), P(tea), P(creamer)
and I (use Excel's AVERAGE function). At these average values,
calculate--
elasticity of demand for coffee
with respect to P(coffee):
B1*P(coffee)/Q(coffee)
elasticity of demand for coffee with respect to P(tea):
B2*P(tea)/Q(coffee)
elasticity of demand for coffee with respect to P(creamer):
B3*P(creamer)/Q(coffee)
income elasticity of demand for coffee
B4*I/Q(coffee).