Regression Exercises
  1. Every year the College Board publishes average SAT scores for college-bound seniors by state, and every year a chorus of morons point out that states which spend a lot per pupil on K-12 public education generally have lower average SAT scores that states that spend relatively little per pupil. For example, New York spends more than twice as much per pupil as Mississippi, and their kids average more than 200 points lower on the SAT (sum of reading, math and writing), which just goes to show you how our wasteful, bloated public education system has been ruined by all those incompetent overpaid bureaucrats, lazy teachers protected by corrupt unions, negligent parents, dope-addled sex-crazed kids and their rap music, somebody oughta do something, blah blah harrumph harrumph!

    Check out some recent data for yourself. (Note the College Board's cautionary note in the second line of the spreadsheet.) Use Excel's Regression utility to test the following model:

    SUM08 = B0 + B1EXPPUPIL06

    Are the critics onto something? Is the negative B1 coefficient statistically significant?
    How much of the total variation in SAT scores does this model explain?

    Use Excel's Correlation utility to evaluate the pairwise correlations between states' mean SAT scores, per-pupil spending and SAT test participation rates. Explain why test participation is positively correlated with per-pupil spending. Explain why a higher rate of test participation would reduce a state's average SAT score.

    This is a textbook example of an omitted variable problem. Maybe per-pupil spending has something to do with who takes the SAT and who doesn't. Estimate and interpret an alternative model that accounts for variable test participation rates:

    SUM08 = B0 + B1EXPPUPIL06 + B2PARTICIP08

    Explain the logic behind these model results.
    How much of the total variation in SAT scores does this model explain?


  2. Download crime_by_state.xls. The variable definitions for this spreadsheet are as follows:
    VIOCRIM06	Violent Crimes per 100K Persons, 2006
    MURDER06	Murders per 100K Persons, 2006
    POP1864_06	Adult Population (ages 18-64) in 2006
    COPS06		Public Law Enforcement Officers, 2006
    INPRISON06	Incarcerated Population, 2006
    DEATHPNLTY	=1 if State has Death Penalty; =0 Otherwise
    DEATHROW06	Prisoners on Death Row, 2006
    EXECUTED	Number of Executions between 1976 and 2006
    PCTINCARC06	Percent of Adult Population Incarcerated, 2006
    PCTHSGRAD	Percent of Adult Population with High School Diploma or equivalent
    UNEMPL06	Statewide Unemployment Rate, 2006
    MEDHHINC	Median Household Income, 2006
    <200%POV	Percent of Population Below 200% of Poverty Level, 2006
    <100%POV	Percent of Population Below 100% of Poverty Level, 2006
    <50%POV		Percent of Population Below 50% of Poverty Level, 2006
    GINI06		Gini Coefficient of Household Income Inequality, 2006
                    (higher means more unequal)
    
    Use these data to create any additional variables that you think may affect murder rates and overall violent crime rates. You are free to look for other variables on the web and add them into your analysis if you wish.

    Estimate a series of multivariate regression models to determine the set of explanatory variables that most significantly affect murder rates.

    Does jailing more criminals reduce violent crime rates?
    Is the death penalty an effective deterrent to murder?
    Does the percentage of prisoners on death row affect murder rates?
    Does the rate at which states execute their death row inmates affect murder rates?
    Will hiring additional police reduce violent crime rates or murder rates?
    What policies would you recommend to reduce murder and overall violent crime rates?