FREC 240 Homework #3                               Name_____________________________

1.  UNIVARIATE REGRESSION

It's generally easier (and neater) to graph data points and fit lines through them
using a package like MS-Excel.  You have probably used Excel to create graphs
before. This exercise also introduce Excel's regression utility.

First, copy the following demand data into two columns of an MS-Excel spreadsheet
(don't type in the $ signs for P; just format this column to currency if you want).

Q(demand)  P ($)
15         1.40
21         1.25
35         1.00
24         1.10
22         1.30
19         1.20
11         1.40
16         1.45
26         1.05
29         0.90
38         0.90
28         1.05

Highlight both data columns in the spreadsheet and use the chart wizard to create
an XY (scatter) plot of these data.  Q should be on the horizontal axis; P on the
vertical axis.

Now use Excel's regression utility to calculate the slope and intercept of the
equation that best fits these data.  Most datapoints in a scatterplot have some
vertical deviation from any trendline drawn through the scatter.  Excel will
estimate the slope and intercept of the line that minimizes the sum of the squared
deviations (or "errors") of all the data points from that line.

Here you will be determining the values of the coefficients a and b that minimize
the sum of the squared e's (errors):

   Q = a + bP + e

You may need to load Excel's Analysis ToolPak first: from the menu, use
Tools--Add-Ins and click on the Analysis TookPak check-box.

To do a regression, from the menu, use Tools--Data Analysis and select
"Regression" from the list of tools.

In the Regression window, you need to input the Y-Range (click on the red select
icon at the right of that field, highlight the column in the spreadsheet, then
click on the red icon to okay the selection), then the X-range.  The Y-Range is
the "dependent variable" on the left-hand side of the equal sign; the X-range
contains one or more "independent variables" on the right-hand side of the equal
sign that are supposed to explain the variation in the dependent variable.  Note
that the input options allow you to force the intercept to zero, but you don't
want to  do that here!

Once you have input the X and Y ranges, you need to specify where Excel should
direct your output: it's usually most convenient to specify the upper left corner
of a blank output range within the same spreadsheet.

Here's a portion of the regression output you should get if you do this correctly:

-----------------------------------------------------

SUMMARY OUTPUT

Regression Statistics
Multiple R        0.910124427
R Square          0.828326473
Adjusted R Square 0.811159121
Standard Error    3.497799308
Observations      12

ANOVA
           df SS          MS          F
Regression 1  590.3206667 590.3206667 48.25009944
Residual  10  122.346      12.2346
Total     11  712.6666667

             Coeffs StdError     t-Stat
Intercept    67.58  6.402023899  10.55603682
X Variable1 -37.64  5.418767388  -6.94622915

------------------------------------------------------

It looks messy, but there are only a few things here you need to worry about.
First, the Intercept and X-Variable1 coefficients are the slope and intercept
parameters you're after:

   Q = 67.58 - 37.64P

Simple, huh?  This is the demand equation that fits the market data best.
If you plotted the points by hand, and drew a trendline with a straightedge
through the points, and solved for the slope and intercept of the line,
you would get values very close to these.

The R-Square coefficient tells you what percent of the total variation in Q
is explained by your demand model.  It ranges from zero to one.  This model,
using just price, explains 82.8% of the variation in Q.  We could increase
the R-Square coefficient by including a second variable in the X-range, such
as the price of a substitute good.

The estimated Intercept and X-Variable coefficients are just the most likely
coefficient values.  Since the data are noisy, there is a range of possible
values for each model coefficient.  The Standard Errors and t-Statistics are
indicate how wide this range is.  They can be used to determine if a
coefficient is in fact significantly different from zero.

The t-Statistic is simply the coefficient divided by its standard error.
I will give you a t-Distribution table and explain how to use it in class.
In general, the critical t-value thresholds to look for are--

2.0 or greater means there is a 95% probability that the reported sign of
    your coefficient is correct, i.e., the coefficient is significantly
    different from zero with 95% confidence.
2.7 or greater means there is a 99% probability that the reported sign of
    your coefficient is correct.
With small datasets (<30 datapoints) such as this one, the critical t-values
are actually a bit higher.
(Ignore the sign on your t-Statistic; it just matches the sign on the
coefficient.)

In a blank column adjacent to the Q column, use the estimated coefficients to
calculate predicted values of Q.  (Multiply each value of P by the slope
coefficient and add the intercept.)  Do the predicted Q values match the
actual Q values fairly closely?  Create an XY (scatter) plot with Price on the
X-axis and both actual and predicted Q's in the Y-Range.  Format the actual
Q series as points only; format the predicted Q series as points with a line
running through them.  Make sure the predicted Q series trace out a straight
line.
 

2. MULTIVARIATE REGRESSION

As hinted above, the Excel regression utility can handle multiple right-hand
side variables in a regression model.  Here's an exercise to demonstrate this:

Suppose we want to relate consumer demand for coffee Q(coffee) to four
variables:
the price of coffee P(coffee)
the price of a substitute drink, tea P(tea)
the price of a complement good, non-dairy creamer P(creamer)
consumer income I

In this exercise, we specify and test a linear demand model

  Q(coffee) = A + B1*P(coffee) + B2*P(tea) + B3*P(creamer) + B4*I

We hypothesize that B1 will be negative (meaning the demand for coffee is
negatively related to its own price).
What sign do you hypothesize for B2?  Why?
What sign to you hypothesize for B3?  Why?
What sign to you hypothesize for B4?  Why?

Your research assistant interviewed 22 households in different regions of the
US and obtained the following data (price in dollars, income in hundreds of
dollars).  Copy it (ACCURATELY!) into an Excel spreadsheet.

---------------------------------------------------------
 
Obs Q(coffee) P(coffee) P(tea) P(creamer) I(ncome)
1 3 1.04 1.77 0.29 113
2 6 0.95 2.15 0.26 105
3 5 0.88 1.66 0.23 147
4 4 0.95 1.58 0.22 266
5 6 0.90 2.07 0.24 132
6 4 1.16 1.99 0.28 807
7 4 0.95 1.49 0.20 456
8 5 1.04 2.35 0.28 281
9 8 0.87 2.17 0.28 186
10 4 0.97 1.63 0.26 444
11 10 1.08 1.80 0.27 771
12 6 0.93 1.96 0.28 508
13 2 1.15 1.88 0.28 134
14 0 1.11 1.48 0.25 128
15 3 1.08 1.99 0.20 329
16 7 1.00 2.20 0.26 325
17 3 1.11 1.52 0.23 606
18 3 1.15 1.74 0.20 139
19 2 1.11 1.68 0.25 475
20 8 0.96 1.88 0.26 541
21 10 0.91 2.16 0.26 610
22 2 1.16 1.64 0.26 249

--------------------------------------------------------.

Now regress Q(coffee) as the Y-Range against P(coffee), P(tea), P(creamer)
and I all included in the X-Range.  Direct the output to empty cells to the
right of your data.

Is the sign of the own-price coefficient B1 negative as hypothesized?
Does its t-statistic indicate it is statistically significant at the 95%
confidence level?  At the 99% confidence level?

Does the sign of the cross-price coefficient B2 confirm that tea is a
substitute for coffee?  Is it statistically significant?  At what confidence
level?  Explain.

Does the sign of the cross-price coefficient B3 confirm that creamer is a
complement to coffee?  Is it statistically significant?  At what level?
Explain.

Does the sign of the income coefficient B4 confirm that coffee is a normal
good?  Is it statistically significant?  At what level?  Explain.

In a blank column adjacent to your Q(coffee) column, calculate predicted
values for Q(coffee) from the regression coefficients: write a formula to
add A + B*P(coffee) + B2*P(tea) + B3*P(creamer) + B4*I).  Create an XY
plot of actual and predicted Q(coffee) on the Y-axis versus P(coffee) on
the X-axis.  In this case, the predicted Q(coffee) series does not trace
out a straight line.  Why not?

Using your estimated demand equation, how much coffee would you expect a
household to consume if P(coffee)=$1.00, P(tea)=$1.75, P(creamer)=$0.25
and I=350?

Calculate the average values of Q(coffee), P(coffee), P(tea), P(creamer)
and I (use Excel's AVERAGE function).  At these average values, calculate--

elasticity of demand for coffee with respect to P(coffee):
   B1*P(coffee)/Q(coffee)
elasticity of demand for coffee with respect to P(tea):
   B2*P(tea)/Q(coffee)
elasticity of demand for coffee with respect to P(creamer):
   B3*P(creamer)/Q(coffee)
income elasticity of demand for coffee
   B4*I/Q(coffee).