Demographics, Resources and Development

The CIA maintains the World Factbook, a database at https://www.cia.gov/cia/publications/factbook/index.html that contains up-to-date cross-sectional data for more than 220 nations (including some you've never heard of), covering their geography, demographics, politics, economics, infrastructure and military.  The US Dept. of Energy's Energy Information Administration maintains similarly detailed information on countries' energy reserves, production and consumption. I imported data from these sites into an Excel spreadsheet, omitting a few small nations with missing data.  The worksheet tabs contain:
  1. GDP and demographic data for 219 nations,
  2. Gini coefficients representing the degree of income inequality for 121 nations,
  3. Fossil fuel data for 163 nations, and
  4. Indices of political corruption for 155 nations.
Download the spreadsheet and use Excel's regression utility to perform the following analyses:

(For each of your regression analyses, you can copy and paste the appropriate data columns into a separate worksheet tab, then delete any rows with missing data.)

1. How does income (per-capita GDP) affect fertility rates?

A population's fertility rate is the average number of live births per female over her lifetime.  Create an XY-plot of Fertility rates on the Y-axis against Incomes (GDP/capita) on the X-axis.  Include a trendline with the regression equation and R-square in your plot. Then regress Fertility (Y range) against Income (X range) to analyze the statistical significance of the regression coefficients.  

Since the trend in the plotted datapoints is obviously curved rather than linear, you can obtain a better linear regression model by transforming the data. Calculate the natural logarithms of fertility and income, then create an XY-plot of ln(F) against ln(I), including the regression trendline, equation and R-square. Then regress ln(F) vs. ln(I). Compare this log-log model against the original model: which fits the data better?
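
For readers who want to check their Excel output, the comparison between the linear and log-log models can be sketched in a few lines of Python. This is only an illustration: the income and fertility values below are made up (generated from an exact power law), not taken from the spreadsheet, and the helper name ols_fit is hypothetical.

```python
import math

def ols_fit(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = mean_y - b1 * mean_x            # intercept
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation
    return b0, b1, r_squared

# Made-up illustrative values, NOT taken from the spreadsheet.
income = [500, 1000, 2000, 5000, 10000, 20000, 40000]
fertility = [40 * i ** -0.3 for i in income]  # exact power law by construction

# Model 1: F regressed on I.
lin_b0, lin_b1, lin_r2 = ols_fit(income, fertility)

# Model 2: ln(F) regressed on ln(I).
ln_income = [math.log(i) for i in income]
ln_fertility = [math.log(f) for f in fertility]
log_b0, log_b1, log_r2 = ols_fit(ln_income, ln_fertility)

print(f"linear : slope={lin_b1:.6f}  R-square={lin_r2:.4f}")
print(f"log-log: slope={log_b1:.4f}  R-square={log_r2:.4f}")
```

Because the made-up data follow a power law exactly, the log-log fit recovers the exponent and fits better than the straight line. Real data will be noisier, so compare the two R-square values from your own regressions.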

The negative coefficient for Income implies that children are inferior goods: poor nations have significantly higher birthrates than rich nations. Explain why children are inferior goods.  

[It turns out that the log-log model implies constant elasticities, and the coefficient on the income variable represents the global income elasticity of demand for children. If:
          ln(F) = B0 + B1 ln(I)
then exponentiating both sides yields
          F = e^(B0 + B1 ln(I)) = e^(B0) e^(B1 ln(I)) = e^(B0) I^(B1)
and if income elasticity E = (dF/dI)(I/F) then for this model
          E = (e^(B0) B1 I^(B1 - 1)) (I / [e^(B0) I^(B1)]) = B1           (all the other terms cancel out!)
So the income coefficient of the log-log model is the income elasticity of demand for children.]
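
The constant-elasticity result can also be checked numerically. The sketch below evaluates E = (dF/dI)(I/F) with a finite-difference derivative at several income levels. The coefficient values b0 and b1 are hypothetical placeholders, not estimates from the spreadsheet, and the function names are illustrative.

```python
import math

def fertility(income, b0, b1):
    """Log-log model in levels: F = e^(b0) * I^(b1)."""
    return math.exp(b0) * income ** b1

def elasticity(income, b0, b1, h=1e-6):
    """Income elasticity (dF/dI)(I/F), using a central finite difference."""
    step = income * h
    d_f = fertility(income + step, b0, b1) - fertility(income - step, b0, b1)
    d_f_d_i = d_f / (2 * step)
    return d_f_d_i * income / fertility(income, b0, b1)

# Hypothetical coefficients for illustration only.
b0, b1 = math.log(40), -0.3

for level in (500, 5000, 50000):
    print(level, round(elasticity(level, b0, b1), 6))
```

The printed elasticity should match b1 at every income level, up to finite-difference error, which is what "constant elasticity" means.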

At what level of per-capita GDP does the log-log model predict a zero-population-growth fertility rate of 2.10?
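
To answer this, set ln(2.10) = B0 + B1 ln(I) and solve for I, which gives I = exp((ln(2.10) - B0) / B1). A minimal Python sketch of that inversion follows. The coefficients are hypothetical placeholders, so substitute the B0 and B1 from your own log-log regression.

```python
import math

def income_for_fertility(b0, b1, target_fertility):
    """Invert ln(F) = b0 + b1*ln(I) to get the income I at a target F."""
    if b1 == 0:
        raise ValueError("b1 must be nonzero to solve for income")
    return math.exp((math.log(target_fertility) - b0) / b1)

# Hypothetical coefficients for illustration only; use your own estimates.
b0, b1 = math.log(40), -0.3

zpg_income = income_for_fertility(b0, b1, 2.10)

# Round-trip check: plugging the income back in should return 2.10.
predicted = math.exp(b0 + b1 * math.log(zpg_income))
print(f"income = {zpg_income:.0f}, predicted fertility = {predicted:.2f}")
```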

2. How does age structure affect per-capita GDP?

Per-capita GDP depends on the percentage of working-age people in the population. The graph on the right shows male and female percentages of total population by age: children (<15), working-age adults (15-64) and elderly (65 and older) for the US, Mali and Japan. Note the better survival rates of elderly females, and the relative dominance of youth in Mali's population and elderly in Japan's population.

The "youth effect" hypothesis states that countries with large proportions of children (14 and younger) are likely to have lower per-capita GDP.  Likewise, the "elderly effect" hypothesis states that countries with larger proportions of elderly (65 and older, presumably non-working) people may also have lower per-capita GDP.

Create an XY-plot of percent <15, percent 15-64 and percent 65+ (Y-axis) versus the natural log of per-capita GDP.
Regress the natural logarithm of per-capita GDP against both percent <15 and percent 65+ to test the youth and elderly effects.  (If you regress per-capita GDP against all three percent Age variables the regression will fail because the three percentages sum to 100, so percent 15-64 is an exact linear function of percent <15 and percent 65+. This is perfect multicollinearity.)
Does your regression model support the youth effect hypothesis?
Does it support the elderly effect hypothesis?
Why might the elderly effect be insignificant in this model?
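
The reason the three-variable regression fails can be illustrated by computing the rank of the regressor matrix. In the sketch below the age shares are made-up values that each sum to 100 (not rows from the spreadsheet), and matrix_rank is a small helper written for this example using exact fractions.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Made-up age shares (percent <15, percent 15-64, percent 65+).
shares = [(48, 49, 3), (20, 67, 13), (13, 64, 23), (30, 63, 7), (25, 66, 9)]

# Regressors with an intercept column and all three age shares.
full = [[1, youth, working, elderly] for youth, working, elderly in shares]
# Regressors with the working-age share dropped.
reduced = [[1, youth, elderly] for youth, working, elderly in shares]

full_rank = matrix_rank(full)
reduced_rank = matrix_rank(reduced)
print("full:   ", len(full[0]), "columns, rank", full_rank)
print("reduced:", len(reduced[0]), "columns, rank", reduced_rank)
```

The full matrix has four columns but only rank 3, so its coefficients are not uniquely determined. Dropping one age share leaves three columns with rank 3, which is why the exercise uses only percent <15 and percent 65+.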

3. How do income inequality (Gini coefficient), literacy rate and fertility rate affect per-capita GDP?

The Gini coefficient is a measure of income inequality, calculated from the Lorenz curve, which plots the cumulative share of income against the cumulative share of the population ranked by income.  It is the area between the Lorenz curve and the 45-degree line (which represents a perfectly equal income distribution), divided by the total area under the 45-degree line.   A nation with a low Gini coefficient (<0.3) will typically have a large middle class and relatively few very poor or very rich people.  A nation with a very high Gini coefficient (>0.6) will typically have extensive poverty, little or no middle class, and a small economic elite.   US income inequality has increased.  The Census Bureau has reported rising Gini coefficients:  0.394 in 1970, 0.403 in 1980, 0.428 in 1990, 0.462 in 2000 and 0.469 in 2005.
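
The area-ratio definition above can be sketched in Python. This version approximates the area under the Lorenz curve with trapezoids, treating each income in the list as one equal-sized population group. It is a simplified illustration with made-up incomes, not the method used to produce the figures in the spreadsheet.

```python
def gini(incomes):
    """Gini coefficient from a list of incomes, via the Lorenz curve."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    area_under_lorenz = 0.0
    previous_share = 0.0
    running = 0.0
    for value in values:
        running += value
        share = running / total  # cumulative income share
        # Trapezoid of width 1/n between consecutive Lorenz points.
        area_under_lorenz += (previous_share + share) / (2 * n)
        previous_share = share
    # The area under the 45-degree line is 0.5.
    return (0.5 - area_under_lorenz) / 0.5

# Made-up examples: perfect equality, then one person holding everything.
print(gini([5, 5, 5, 5]))
print(gini([0, 0, 0, 1]))
```

With perfectly equal incomes the Lorenz curve coincides with the 45-degree line and the coefficient is 0. With only four people, the most unequal case gives 0.75 rather than 1, because the coefficient approaches 1 only as the population grows.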

Use the data in the second worksheet tab (151 nations) to regress per-capita GDP against the Gini coefficient, overall literacy rate and fertility rate.
Explain the statistical significance of the regression model. 

Explain the economic development policy implications of this regression model.



Browsing through the other data in this spreadsheet may suggest other research questions that you could answer by testing statistical relationships between variables in this dataset.
What is the income elasticity of demand for fossil fuels?
Is higher per-capita GDP correlated with higher scores on Transparency International's Corruption Perception Index (higher scores imply less corruption)?
What is the relationship between income inequality and corruption?

These are the types of questions that undergraduates pursuing a Degree with Distinction have researched.