# A Few Datasets

## Preface

I have created this webpage to provide you with a few datasets that you might use for your course project and to explain what I hope you will learn by conducting an econometric analysis.

...

## Organizing Information, Identifying Trends

I want you to learn econometrics and the best way to learn econometrics is to do it. But more broadly, I hope that conducting an econometric analysis will teach you how to organize information and identify trends.

In the specific case of an econometric analysis, each column in your spreadsheet must represent a variable and each row must represent an observation. So your very first task in an econometric analysis is to arrange your information properly in your spreadsheet.
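As a minimal sketch of that layout, the Python snippet below reshapes a "wide" table (one column per year) into one row per observation. The numbers and the names `wide_rows`, `wide_to_long` and `unemp_rate` are made up for illustration and do not come from any of the datasets on this page.

```python
# Hypothetical "wide" layout: one row per country, one column per year.
# The values are invented for illustration only.
wide_rows = [
    {"country": "Austria", "2013": 4.1, "2014": 4.3},
    {"country": "Belgium", "2013": 5.0, "2014": 5.2},
]

def wide_to_long(rows, id_col, value_name):
    """Return one dict per (id, year) pair, i.e. one row per observation."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is not a measurement
            long_rows.append(
                {id_col: row[id_col], "year": int(key), value_name: value}
            )
    return long_rows

long_rows = wide_to_long(wide_rows, "country", "unemp_rate")
for r in long_rows:
    print(r)
```

After reshaping, each column (`country`, `year`, `unemp_rate`) is a variable and each row is a single country-year observation, which is the form that regression software generally expects.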

If you carefully construct your spreadsheet from reliable sources of data (and if you choose a good set of variables to test your null hypothesis), then you should observe some clear trends in your data. Your next task then is to describe those trends, test a series of null hypotheses and report your findings.

Gretl, of course, will help you run regressions and calculate statistics for your analysis. But Gretl is a tool. It is not the tool that is important. It is the quality of your input that is important.

Your spreadsheet is what's important. How you organize information is what's important.

In a more general case, your information might not be the numeric data that we work with in econometrics. It might be names, addresses or whole documents and files. Your data might not even fit into a spreadsheet at all.

But some principles, like functions and variables, will remain the same. And, once again, what will be important is how you organize information. If your information is well-organized, you should observe some clear trends in your data.

To get you started, I have assembled the datasets below. Your task is to identify the most important trends in one of those datasets, test a series of null hypotheses and report your findings.

...

For the project proposal, please submit a written description of:

• the null hypotheses that you wish to test
• the dataset that you plan to test them with

For the final project, please submit a formal paper, in which you describe:

• the null hypotheses that you tested
• the dataset that you tested them with
• summary statistics
• how you manipulated the data
• the regressions that you ran
• your conclusions: should we reject or fail to reject the null hypothesis?
• if we fail to reject it, why might the explanatory variable not have any effect on the dependent variable?
• if we reject it, how strong is the effect of the explanatory variable on the dependent variable?

...

## OECD Financial Data

| variable | description | units |
|---|---|---|
| CCRETT01 | relative consumer price indices | |
| CCUS | currency exchange rates | monthly average |
| IR3TIB | short-term interest rates | percent per annum |
| IRLT | long-term interest rates | percent per annum |
| IRSTCI | immediate interest rates, call money, interbank rate | percent per annum |
| MANM | narrow money (M1) | index, seasonally adjusted |
| SP | share prices | index |

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States

Non-OECD Economies

Argentina, Brazil, China (People's Republic of), Colombia, Costa Rica, India, Indonesia, Lithuania, Russia, Saudi Arabia, South Africa

Monthly Data

Jan. 1950 to Nov. 2017

### description

This dataset contains financial market data from stats.oecd.org covering 35 OECD countries and 11 non-OECD countries. You may use this data to explore the effect that interest rates, exchange rates, inflation rates and the money supply have on share prices.

Specifically, you may test the null hypotheses that:

• a change in interest rates is not associated with a change in share prices
• a change in exchange rates is not associated with a change in share prices
• a change in money supply is not associated with a change in share prices
• a change in inflation rates is not associated with a change in share prices

Then focus on the variables for which you reject the null hypothesis. In those cases, the evidence favors the alternative hypothesis that there is a relationship, so now you want to know:

• which variables have the strongest effect on share prices?
• how large is that effect?

For example, if we think that interest rates will rise one percentage point next month, then how much will share prices fall in response to that change?

As you conduct your analysis, remember that one of the Gauss-Markov assumptions is that the error terms, which your residuals estimate, must not be correlated with each other. Related to this assumption is the concept of "stationary residuals" -- the mean and variance of your residuals must be constant over time.

Taking the difference in value between one time period and the next will usually make a series stationary, so if you difference each variable in your regression model, your residuals will usually be stationary as well.
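To make the idea concrete, here is a small Python sketch of first differencing. The series `level` is an invented index with a steady upward trend, not OECD data, and the helper names `difference` and `mean` are my own.

```python
def difference(series):
    """First difference: x[t] - x[t-1]. The result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def mean(xs):
    return sum(xs) / len(xs)

# Made-up index: a steady upward trend plus a small alternating wobble.
level = [100 + 2 * t + (1 if t % 2 else -1) for t in range(12)]
change = difference(level)

# Compare the mean of the first and second half of each series.
half = len(level) // 2
print("level means :", mean(level[:half]), mean(level[half:]))
half_d = len(change) // 2
print("change means:", mean(change[:half_d]), mean(change[half_d:]))
```

The mean of the level drifts upward from the first half of the sample to the second, while the mean of the differenced series stays roughly the same. Note that differencing costs you the first observation of each series.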

The alternative is to find a co-integrating relationship among your variables that makes the residuals stationary. In practice, however, such co-integrating relationships are difficult to find, so I encourage you to work with the differenced variables.

But also -- from the perspective of an investor -- the share price itself is not important. What is important to the investor is the change in share price (i.e. the difference in share price).

So from an investment perspective, you want to develop a model that predicts changes in share price. What predicts those changes?

• change in interest rate?
• change in exchange rate?
• change in inflation rate?
• change in money supply?

And how large is the effect of those changes on the change in share price?
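The questions above can be sketched as a simple regression of the change in share price on the change in one explanatory variable. The Python below is only an illustration: the two series are invented (they are not OECD data), the function `ols_simple` is my own helper, and in your project Gretl will do this work for you with several explanatory variables at once.

```python
import math

def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, se_b, t_b): intercept, slope, standard error of the
    slope, and the t-statistic for the null hypothesis that b = 0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
    se_b = math.sqrt(s2 / sxx)
    return a, b, se_b, b / se_b

# Made-up monthly changes, for illustration only.
d_rate = [0.25, -0.10, 0.00, 0.50, -0.25, 0.10, -0.50, 0.30]  # pct. points
d_share = [-1.2, 0.6, 0.1, -2.1, 1.0, -0.3, 2.2, -1.4]        # index points

a, b, se_b, t_b = ols_simple(d_rate, d_share)
print(f"slope = {b:.2f}, std. error = {se_b:.2f}, t = {t_b:.2f}")
print(f"predicted change in share index for a +1 point rate change: {a + b:.2f}")
```

The t-statistic speaks to the first question (is there an association at all?), while the slope speaks to the second (how large is the effect of a one-point change?).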

...

## OECD Labor Market Data

| variable | description | units |
|---|---|---|
| gwagegap | gender wage gap | percentage |
| minwage | minimum wage in 2014 constant prices | 2014 USD PPPs |
| rgdpcap | GDP per head (expenditure approach) at constant prices, constant PPPs, OECD base year = 2010 | US dollar |
| dln_cpi | CPI inflation rate (no food, no energy) | percentage |
| uniondens | union density -- percentage of wage and salary earners that are trade union members | percentage |
| lrem25fe | Employment rate, Aged 25-54, Females | percentage |
| lrem25ma | Employment rate, Aged 25-54, Males | percentage |
| lrem25tt | Employment rate, Aged 25-54, All Persons | percentage |
| lrem64fe | Employment rate, Aged 15-64, Females | percentage |
| lrem64ma | Employment rate, Aged 15-64, Males | percentage |
| lrem64tt | Employment rate, Aged 15-64, All Persons | percentage |
| lfwa25fe | Working age population, Aged 25-54, Females | percentage |
| lfwa25ma | Working age population, Aged 25-54, Males | percentage |
| lfwa25tt | Working age population, Aged 25-54, All Persons | percentage |
| lfwa64fe | Working age population, Aged 15-64, Females | percentage |
| lfwa64ma | Working age population, Aged 15-64, Males | percentage |
| lfwa64tt | Working age population, Aged 15-64, All Persons | percentage |
| eprc_v1 | employment protection -- individual and collective dismissals (regular contracts), version 1 | 0 to 6 |
| eprc_v2 | employment protection -- individual and collective dismissals (regular contracts), version 2 | 0 to 6 |
| eprc_v3 | employment protection -- individual and collective dismissals (regular contracts), version 3 | 0 to 6 |
| epr_v1 | employment protection -- individual dismissals (regular contracts), version 1 | 0 to 6 |
| epr_v3 | employment protection -- individual dismissals (regular contracts), version 3 | 0 to 6 |
| epc | employment protection -- collective dismissals (additional provisions) | 0 to 6 |
| ept_v1 | employment protection -- temporary employment, version 1 | 0 to 6 |
| ept_v3 | employment protection -- temporary employment, version 3 | 0 to 6 |

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States

Annual Data

1985 to 2014

### description

This dataset contains labor market data from stats.oecd.org covering 34 OECD countries. It is the same dataset that I use in class with the addition of "gender wage gap" -- the percentage difference between male and female wages.

In class, we discuss the effect of labor market regulation on male and female employment rates. You might extend that analysis by exploring the effect of labor market regulation on the gender wage gap.

If you do, you will need to account for the simultaneity that arises in the supply of and demand for male and female labor. You would then look for an instrumental variable with which to estimate the effect of labor market regulation on the male/female employment rates and on the gender wage gap.
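To illustrate why an instrument helps, here is a toy Python sketch of the simple instrumental-variable estimator with one regressor and one instrument. Everything in it is an assumption made for illustration: the data are constructed by hand so that the true effect of `x` on `y` is 2.0, `u` stands for an unobserved factor, and `z` is an instrument that is uncorrelated with `u` by construction. In your own analysis you would have to argue that your chosen instrument has that property.

```python
def cross(a, b):
    """Sum of products of deviations from the mean (covariance times n - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Toy data built so that the true effect of x on y is 2.0.
z = [-1.0, -1.0, 1.0, 1.0]  # instrument: moves x, unrelated to u
u = [-1.0, 1.0, -1.0, 1.0]  # unobserved factor affecting both x and y
x = [zi + ui for zi, ui in zip(z, u)]              # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # outcome

b_ols = cross(x, y) / cross(x, x)  # biased: x is correlated with u
b_iv = cross(z, y) / cross(z, x)   # simple IV estimator with one instrument

print("OLS slope:", b_ols)
print("IV slope :", b_iv)
```

Because `x` moves together with the unobserved factor, the ordinary least squares slope overstates the effect, while the instrumental-variable slope uses only the variation in `x` that comes from `z`.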

...

## NYC Vision Zero

| variable | description | units |
|---|---|---|
| NODEID | intersection identifier | |
| Casualties | sum of "Fatalities" and "Injuries" during month | count |
| Fatalities | total number of fatalities during month | count |
| PedFatalit | number of pedestrian fatalities during month | count |
| BikeFatali | number of bicyclist fatalities during month | count |
| MVOFatalit | number of motorist fatalities during month | count |
| Injuries | total number of injuries during month | count |
| PedInjurie | number of pedestrian injuries during month | count |
| BikeInjuri | number of bicyclist injuries during month | count |
| MVOInjurie | number of motorist injuries during month | count |
| CasualBefore | total "Casualties" from 2009 to 2013 | sum |
| CasualAfter | total "Casualties" from 2014 to 2017 | sum |
| InjurBefore | total "Injuries" from 2009 to 2013 | sum |
| InjurAfter | total "Injuries" from 2014 to 2017 | sum |
| FatalBefore | total "Fatalities" from 2009 to 2013 | sum |
| FatalAfter | total "Fatalities" from 2014 to 2017 | sum |

Monthly Data

Jan. 2009 to Dec. 2017

nyc-dot_by-mon_with-zeroes.csv.zip (Zipped CSV file)

lib_crosstabs.r (R library)

nyc-dot_crosstabs_v3.r (R script)

### description

This NYC DOT dataset contains information on traffic fatalities and injuries at 30,754 New York City intersections over 9 years.

During the most recent 4 years of data (from Jan. 2014 to Dec. 2017), New York City pursued an initiative called "Vision Zero," which set a goal of eliminating traffic fatalities and injuries. Vision Zero reduced the default speed limit throughout the city from 30 to 25 miles per hour and changed traffic rules at many intersections.

You may use this dataset to test the null hypothesis that Vision Zero did not reduce fatalities or injuries. You might conduct that test with regression analysis, but because this dataset is so large (3,347,028 observations), we can also create cross-tabulations that directly examine the empirical distribution.

The one requirement is a computer with plenty of memory. Running the R script listed above on the full dataset took 5 minutes on my computer with 8 GB of RAM.
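The cross-tabulation idea mentioned above can be sketched in a few lines of Python. This is not a reproduction of the R script: the four intersections are invented, the `bucket` cutoffs are hypothetical, and only the column names (`NODEID`, `CasualBefore`, `CasualAfter`) come from the variable table. Because the "before" period covers 5 years and the "after" period covers 4, the sketch compares casualties per year rather than raw totals.

```python
from collections import Counter

YEARS_BEFORE = 5  # 2009 to 2013
YEARS_AFTER = 4   # 2014 to 2017

# Made-up intersections, for illustration only (not NYC DOT data).
rows = [
    {"NODEID": 1, "CasualBefore": 15, "CasualAfter": 4},
    {"NODEID": 2, "CasualBefore": 0, "CasualAfter": 0},
    {"NODEID": 3, "CasualBefore": 5, "CasualAfter": 12},
    {"NODEID": 4, "CasualBefore": 10, "CasualAfter": 8},
]

def bucket(per_year):
    """Assign a casualties-per-year rate to a coarse category.

    The cutoffs are hypothetical; choose ones that suit the real data.
    """
    if per_year == 0:
        return "0"
    return "up to 2" if per_year <= 2 else "over 2"

# Count intersections in each (before, after) cell of the table.
crosstab = Counter(
    (bucket(r["CasualBefore"] / YEARS_BEFORE),
     bucket(r["CasualAfter"] / YEARS_AFTER))
    for r in rows
)

for (before, after), count in sorted(crosstab.items()):
    print(f"before: {before:8s} after: {after:8s} intersections: {count}")
```

Cells above the diagonal of such a table (a higher category before than after) are intersections where casualties per year fell. Cells below it are intersections where they rose.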