A Few Datasets

Preface

I have created this webpage to provide you with a few datasets that you might use for your course project and to explain what I hope you will learn by conducting an econometric analysis.

Organizing Information, Identifying Trends

I want you to learn econometrics, and the best way to learn econometrics is to do it. But more broadly, I hope that conducting an econometric analysis will teach you how to organize information and identify trends.

In the specific case of an econometric analysis, each column in your spreadsheet must represent a variable and each row must represent an observation. So your very first task in an econometric analysis is to properly align that information in your spreadsheet.
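As a small illustration of that layout, the Python sketch below reshapes a "wide" table (one column per year, as source data is often published) into rows where each row is one country-year observation. The table, the `wide_to_long` helper and the numbers are all invented for illustration; only the variable name `emprate` is borrowed from the datasets described later on this page.

```python
# Hypothetical "wide" source table: one entry per country, one value per year.
# The numbers are invented purely for illustration.
wide = {
    "Italy": {2018: 58.5, 2019: 59.0},
    "Japan": {2018: 76.8, 2019: 77.7},
}


def wide_to_long(table, value_name):
    """Return one dict per (country, year) observation.

    Each key of the returned dicts is a variable (a spreadsheet column)
    and each dict is an observation (a spreadsheet row).
    """
    rows = []
    for country, by_year in sorted(table.items()):
        for year, value in sorted(by_year.items()):
            rows.append({"country": country, "year": year, value_name: value})
    return rows


long_rows = wide_to_long(wide, "emprate")
for row in long_rows:
    print(row)
```

Each printed row has the same three variables, which is the shape that a regression package expects.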

If you carefully construct your spreadsheet from reliable sources of data (and if you choose a good set of variables to test your null hypothesis), then you should observe some clear trends in your data. Your next task then is to describe those trends, test a series of null hypotheses and report your findings.

Gretl, of course, will help you run regressions and calculate statistics for your analysis. But Gretl is a tool. It is not the tool that is important. It is the quality of your input that is important.

Your spreadsheet is what's important. How you organize information is what's important.

In a more general case, your information might not be the numeric data that we work with in econometrics. It might be names, addresses or whole documents and files. Your data might not even fit into a spreadsheet at all.

But some principles, like functions and variables, will remain the same. And, once again, what will be important is how you organize information. If your information is well-organized, you should observe some clear trends in your data.

To get you started, I have assembled the datasets below. Your task is to identify the most important trends in one of those datasets, test a series of null hypotheses and report your findings.

Your Assignment

For the project proposal, please submit a written description of:

  • the null hypotheses that you wish to test
  • the dataset that you plan to test them with

For the final project, please submit a formal paper, in which you describe:

  • the null hypotheses that you tested
  • the dataset that you tested them with
  • summary statistics
  • how you manipulated the data
  • the regressions that you ran
  • your conclusions: should we reject or fail to reject the null hypothesis?
    • if we fail to reject it, why might the explanatory variable not have any effect on the dependent variable?
    • if we reject it, how strong is the effect of the explanatory variable on the dependent variable?

OECD Financial Data

| variable | code | description | units |
|---|---|---|---|
| country | | country name | |
| time | | time period | |
| cons_prices | CP | consumer prices | index |
| employment | EMP | employment | persons |
| interest_interbank | IRSTCI | interbank interest rate | percent per annum |
| interest_long | IRLT | long-term interest rate | percent per annum |
| interest_short | IR3TIB | short-term interest rate | percent per annum |
| money_broad | MABM | broad money (M3) | index |
| money_narrow | MANM | narrow money (M1) | index |
| nom_exch_rate | CC | nominal exchange rate | national currency per US dollar |
| share_grow | SHARE | share prices | growth rate |
| share_index | SHARE | share prices | index |
| oecd | | OECD country | dummy variable |

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Colombia, Costa Rica, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkiye, United Kingdom, United States

Non-OECD Economies

Argentina, Brazil, China (People's Republic of), Croatia, India, Indonesia, Russia, Saudi Arabia, South Africa

Monthly Data

Jan. 1949 to Nov. 2024

notes

The dataset also contains dummy variables for each country. And for each variable, the Gretl data file also contains differences and log differences (as appropriate).

download

CSV file

Gretl data

description

This dataset contains financial market data from data-explorer.oecd.org covering 38 OECD countries and 9 non-OECD countries. You may use this data to explore the effect that interest rates, exchange rates, inflation rates and the money supply have on share prices.

Specifically, you may test the null hypotheses that:

  • a change in interest rates is not associated with a change in share prices
  • a change in exchange rates is not associated with a change in share prices
  • a change in money supply is not associated with a change in share prices
  • a change in inflation rates is not associated with a change in share prices

Then focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

  • which variables have the strongest effect on share prices?
  • how large is that effect?

For example, if we think that interest rates will rise one percentage point next month, then how much will share prices fall in response to that change?
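To make the "how large is that effect?" question concrete, here is a minimal Python sketch of a one-variable least-squares regression of the change in share prices on the change in interest rates. It is not a substitute for Gretl's output: the `simple_ols` helper is hypothetical, the data are invented, and a real project would use the full dataset and more explanatory variables.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    return intercept, slope, se_slope


# Invented data: monthly change in the interest rate (percentage points)
# and monthly change in share prices (percent).
d_interest = [-0.5, -0.25, 0.0, 0.25, 0.5]
d_share = [1.2, 0.4, 0.1, -0.6, -1.1]

intercept, slope, se_slope = simple_ols(d_interest, d_share)
t_stat = slope / se_slope
print(f"slope = {slope:.2f}, t-statistic = {t_stat:.2f}")
```

In this invented sample, the slope is the estimated change in share prices associated with a one-percentage-point rise in the interest rate, and the t-statistic is what you would compare against a critical value to test the null hypothesis that the slope is zero.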

stationarity

As you conduct your analysis, you must remember that one of the Gauss-Markov assumptions is that your residuals ("error terms") must not be correlated with each other. Related to this assumption is the concept of "stationary residuals" -- the mean and variance of your residuals must be constant over time.

Taking the difference in value between one time period and the next will usually make a series stationary, so if you difference each variable in your regression model, your residuals will usually be stationary too.

The alternative is to find a co-integrating relationship among your variables that makes the residuals stationary. In practice, however, it is difficult to find such co-integrating relationships, so I encourage you to work with the differenced variables.
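The differencing described above is simple arithmetic. The Python sketch below computes first differences and log differences for two short, invented series; the helper names `diff` and `log_diff` are my own, and the Gretl data file already contains these transformed variables, so this is only meant to show what they are.

```python
import math


def diff(series):
    """First difference: x[t] - x[t-1]. The result is one observation shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def log_diff(series):
    """Log difference: ln(x[t]) - ln(x[t-1]).

    For small changes this is approximately the growth rate.
    """
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


# Invented values, for illustration only.
interest_long = [3.0, 3.25, 3.25, 3.5]      # percent per annum
share_index = [100.0, 102.0, 101.0, 105.0]  # index

d_interest_long = diff(interest_long)
ld_share_index = log_diff(share_index)

print(d_interest_long)
print([round(v, 4) for v in ld_share_index])
```

Differences suit variables that are already rates (such as interest rates), while log differences suit index variables (such as share prices), because they express the change in proportional terms.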

investor's perspective

From the perspective of an investor, the share price itself is not important. What's important is the change in share price (i.e. the difference in share price).

So from an investment perspective, you want to develop a model that predicts changes in share price. What predicts those changes?

  • change in interest rate?
  • change in exchange rate?
  • change in inflation rate?
  • change in money supply?

And how large is the effect of those changes on the change in share price?

forecasting

As another possible project, you could forecast changes in share prices and changes in interest rates. In this case, you would not use the full panel of countries. Instead, you would perform a time-series analysis for a single country. (Or, if you want to forecast financial variables for more than one country, you would perform a separate time-series analysis for each country.)

To isolate a single country, go to Gretl's "Sample" menu, select "Restrict based on criterion ..." and then, in the pop-up box, enter the boolean condition:  united_states == 1

For your own convenience, you should make the restriction permanent and save the restricted data to a new file.

Then go to Gretl's "Data" menu, select "Dataset structure ..." and, in the pop-up box, select "Time series", select "Monthly" frequency and set the start to:  1949:01  (January 1949). Finally, save the dataset one more time.
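If you prefer to see the same restriction outside of Gretl, the Python sketch below filters a panel down to one country using the country dummy. The in-memory CSV is a stand-in for the downloaded file: I am assuming that the CSV carries the `united_states` dummy column and that periods are written like `2024-10`, and the values are invented.

```python
import csv
import io

# Stand-in for the downloaded CSV file. The column layout, the period
# format and the values are assumptions made for this illustration.
sample_csv = io.StringIO(
    "country,time,share_index,united_states\n"
    "United States,2024-11,112.0,1\n"
    "Japan,2024-10,95.0,0\n"
    "United States,2024-10,110.0,1\n"
)

rows = list(csv.DictReader(sample_csv))

# Same idea as the Gretl condition: united_states == 1
us_rows = [row for row in rows if row["united_states"] == "1"]

# Put the remaining observations in time order, as a time series requires.
us_rows.sort(key=lambda row: row["time"])

for row in us_rows:
    print(row["time"], row["share_index"])
```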

When forecasting, you need to evaluate your model's predictions out-of-sample. In other words, because you're predicting future changes in financial variables, you need to simulate predictions of the future. Then you evaluate your model's forecasting performance on the simulated future.

For example, the simulated future might be the last two years of data. In that case, you would estimate the model's parameters on the data from January 1949 to December 2022. Then you would evaluate its forecasts out-of-sample, from January 2023 to November 2024. To do so, go to Gretl's "Sample" menu, select "Set range ..." and set the sample range to end in 2022:12  (December 2022).
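The split itself can be sketched in a few lines of Python. The `split_sample` helper, the `YYYY-MM` period format and the values below are assumptions for illustration; in Gretl, setting the sample range does this for you.

```python
def split_sample(observations, last_estimation_period):
    """Split (period, value) pairs into an estimation sample and a holdout.

    Periods are 'YYYY-MM' strings, which sort correctly as plain text.
    """
    estimation = [(p, v) for p, v in observations if p <= last_estimation_period]
    holdout = [(p, v) for p, v in observations if p > last_estimation_period]
    return estimation, holdout


# Invented monthly changes in share prices.
observations = [
    ("2022-11", 0.4),
    ("2022-12", -0.2),
    ("2023-01", 0.1),
    ("2023-02", 0.3),
]

estimation, holdout = split_sample(observations, "2022-12")
print("estimate on:", [p for p, _ in estimation])
print("evaluate on:", [p for p, _ in holdout])
```

The key design point is that the holdout observations play no part in estimating the model's parameters.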

Then, after estimating your model, go to the "Analysis" menu (in the model's window), select "Forecasts" and select the variable that you wish to forecast out-of-sample. When you do, you'll have two options: "dynamic forecast" and "static forecast." Dynamic forecasts chain together the model's own forecasts of lagged variables in the out-of-sample period. Static forecasts instead use the actual values of lagged variables, so they are computed like fitted values, even though they're computed out-of-sample.
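To see the difference between the two options, consider a model with one lag of the dependent variable, y[t] = a + b * y[t-1]. The Python sketch below is my reading of the distinction, not Gretl's actual code: the coefficients are assumed to have been estimated already, and the function names and numbers are invented.

```python
def static_forecasts(a, b, last_in_sample, actual_holdout):
    """One-step-ahead forecasts: each uses the ACTUAL previous value."""
    lagged = [last_in_sample] + actual_holdout[:-1]
    return [a + b * y_prev for y_prev in lagged]


def dynamic_forecasts(a, b, last_in_sample, horizon):
    """Chained forecasts: each uses the PREVIOUS FORECAST as the lagged value."""
    forecasts = []
    y_prev = last_in_sample
    for _ in range(horizon):
        y_hat = a + b * y_prev
        forecasts.append(y_hat)
        y_prev = y_hat  # feed the forecast back in
    return forecasts


# Assumed coefficients and invented data.
a, b = 0.0, 0.5
last_in_sample = 2.0
actual_holdout = [4.0, 1.0, 3.0]

static = static_forecasts(a, b, last_in_sample, actual_holdout)
dynamic = dynamic_forecasts(a, b, last_in_sample, len(actual_holdout))
print("static: ", static)
print("dynamic:", dynamic)
```

The two sets of forecasts agree in the first out-of-sample period and then diverge, because only the static forecast gets to see what actually happened in the previous period.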

Finally, to evaluate your model's forecasts, Gretl provides several metrics at the bottom of the forecasts window. If you assume that forecast error causes a securities trader to lose money, you could evaluate your forecasts with the "Mean Absolute Error," which measures the average size of the loss, or with the "Root Mean Squared Error," which gives more weight to large losses.
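Both metrics are easy to compute by hand, which helps to see why the second one penalizes large errors more heavily. The Python sketch below uses the standard definitions with invented numbers; Gretl reports these values for you.

```python
import math


def mean_absolute_error(actual, forecast):
    """Average of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def root_mean_squared_error(actual, forecast):
    """Square root of the average squared forecast error."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse)


# Invented out-of-sample values: three small errors and one large one.
actual = [1.0, -2.0, 0.5, 3.0]
forecast = [0.0, -2.0, 0.5, -1.0]

mae = mean_absolute_error(actual, forecast)
rmse = root_mean_squared_error(actual, forecast)
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

Here the single large miss pulls the Root Mean Squared Error well above the Mean Absolute Error, because squaring the errors gives that miss extra weight.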

OECD Labor Market Data

| variable | code | description | units |
|---|---|---|---|
| country | | country name | |
| year | | year of observation | |
| gender_wage_gap_median | GWP | gender wage gap – diff. between median wages rel. to men's median wage | percentage |
| emprate_fe | EMP_WAP | employment rate of females, age 25-54 | percentage |
| emprate_ma | EMP_WAP | employment rate of males, age 25-54 | percentage |
| wkapop_fe | WAP | working age females, age 25-54 | persons in thousands |
| wkapop_ma | WAP | working age males, age 25-54 | persons in thousands |
| eprc_v1 | EPL_OV | employment protection – dismissals | from 0 to 6 |
| ept_v1 | EPL_T | employment protection – temporary | from 0 to 6 |
| cpi_inflation | CPI | growth rate of consumer prices, all items non-food non-energy | percent per annum |
| real_min_wage | SM_WG | statutory real minimum wages at constant prices | US dollars, PPP converted |
| real_gdp_per_capita | B1GQ_R_POP | real gross domestic product per capita | US dollars per person, PPP converted |
| union_density | TUD | trade union density | percentage of employees |

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Colombia, Costa Rica, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkiye, United Kingdom, United States

Annual Data

1990 to 2019

notes

The dataset also contains dummy variables for each country and year. And for the real minimum wage and real GDP per capita variables, the Gretl data file also contains the natural log of the variables.

download

CSV file

Gretl data

description

This dataset contains labor market data from data-explorer.oecd.org covering 38 OECD countries. It's similar to the dataset that we use in class to discuss employment protection. There are two differences. The first is that it covers a different set of years. The second is that this dataset also contains the "gender wage gap" – the percentage difference between median male and female wages relative to the men's median wage.
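As a quick check on that definition, the Python sketch below computes the gap from two median wages. The `gender_wage_gap` function and the wage figures are invented for illustration; the dataset already contains the computed variable.

```python
def gender_wage_gap(median_male_wage, median_female_wage):
    """Percentage difference between median wages, relative to the men's median."""
    return 100.0 * (median_male_wage - median_female_wage) / median_male_wage


# Invented wages: men's median of 50,000 and women's median of 44,000.
gap = gender_wage_gap(50000, 44000)
print(f"gender wage gap = {gap:.1f} percent")
```

Because the men's median wage is the denominator, a gap of 12 percent means that the women's median wage is 12 percent below the men's median wage.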

In class, we discuss the effect of labor market regulation on male and female employment rates. You might extend that analysis by exploring the effect of labor market regulation on the gender wage gap.

If you do, you would need to account for the simultaneity issues that arise in the supply and demand for male/female labor. Then you would need to find an instrumental variable to estimate the effect of labor market regulation on the gender wage gap.

In other words, you want to properly specify a model that you can use to test the null hypotheses that:

  • the minimum wage rate is not associated with the gender wage gap
  • employment protection is not associated with the gender wage gap
  • union density is not associated with the gender wage gap

Then you would focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

  • which variables have the strongest effect on the gender wage gap?
  • how large is the effect of those variables on the gender wage gap?
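To illustrate why the instrumental variable mentioned above matters, here is a stylized Python sketch with one regressor and one instrument. Everything in it is an assumption made for illustration: the data are constructed so that the true effect of x on y is 2, an unobserved factor u contaminates x, and the instrument z moves x but is unrelated to u. Real data will never be this clean, and Gretl's two-stage least squares command is the tool you would actually use.

```python
def covariance(a, b):
    """Sample covariance (dividing by n, which cancels in the ratios below)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


# Constructed data (not real): z is the instrument, u is an unobserved factor.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]                        # unrelated to z by construction
x = [zi + ui for zi, ui in zip(z, u)]             # regressor, contaminated by u
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 2

# Ordinary least squares slope: biased, because x is correlated with u.
ols_slope = covariance(x, y) / covariance(x, x)

# Simple instrumental-variable estimator: uses only the variation in x driven by z.
iv_slope = covariance(z, y) / covariance(z, x)

print(f"OLS slope = {ols_slope:.2f}, IV slope = {iv_slope:.2f}")
```

In this constructed example, ordinary least squares overstates the effect, while the instrumental-variable estimate recovers the value of 2 that was built into the data.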

OECD Migration Data

| variable | code | description | units |
|---|---|---|---|
| country | | country name | |
| year | | year of observation | |
| edu_lesssecond | | below upper secondary education | percentage of population |
| edu_secondary | | upper secondary or post-secondary non-tertiary education | percentage of population |
| edu_tertiary | | tertiary education | percentage of population |
| emprate | EMP_WAP | employment rate, age 25-54, both sexes | percentage |
| gini | INC_DISP_GINI | Gini coefficient (based on disposable income) – measure of income inequality | from 0 to 100 |
| inflows_pct | INMIG | new residents in the region coming from another country | percentage of population |
| inflows_total | INMIG | new residents in the region coming from another country | total persons |
| netmigration_total | NETMIG | net international migration | total inflows minus total outflows |
| outflows_pct | OUTMIG | persons who left the region to reside in another country | percentage of population |
| outflows_total | OUTMIG | persons who left the region to reside in another country | total persons |
| pop | POP | population | total persons |
| popgrow | POP | population growth rate | percentage change from previous year |
| povrate | PR_INC_DISP | poverty rate – less than 50% of national median disposable income | percentage |
| rgdpcap | B1GQ_R_POP | real GDP per capita | USD per person, PPP converted |

OECD Economies

Austria, Belgium, Czechia, Denmark, Estonia, Finland, Germany, Italy, Japan, Latvia, Lithuania, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Switzerland, Turkiye

Annual Data

1990 to 2023

notes

The dataset also contains dummy variables for each country and year. The Gretl data file also contains net migration as a percentage of the population and the natural log of the real GDP per capita variable.
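If you work from the CSV file instead, those two derived variables are easy to rebuild. The Python sketch below shows the arithmetic under the assumption that they are computed from `netmigration_total`, `pop` and `rgdpcap` in the obvious way; the function names and the numbers are invented.

```python
import math


def net_migration_pct(netmigration_total, pop):
    """Net migration as a percentage of the population."""
    return 100.0 * netmigration_total / pop


def log_rgdpcap(rgdpcap):
    """Natural log of real GDP per capita."""
    return math.log(rgdpcap)


# Invented country-year observation.
netmigration_pct = net_migration_pct(netmigration_total=25000, pop=5000000)
ln_rgdpcap = log_rgdpcap(40000.0)

print(f"net migration = {netmigration_pct:.2f} percent of population")
print(f"ln(real GDP per capita) = {ln_rgdpcap:.4f}")
```

Scaling by population makes net migration comparable across large and small countries, and taking the log of GDP per capita lets you read its regression coefficient in terms of proportional changes.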

download

CSV file

Gretl data

description

This dataset contains migration data from data-explorer.oecd.org covering 21 OECD countries. It also contains data on educational attainment, employment rates, income inequality (as measured by Gini coefficient), poverty rates, population growth rates and real GDP per capita.

If you use this dataset, you can perform an analysis of the economic factors that affect net migration. Specifically, you can test the null hypotheses that:

  • educational attainment is not associated with net migration
  • the degree of income inequality is not associated with net migration
  • the poverty rate is not associated with net migration
  • the population growth rate is not associated with net migration

Then you would focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

  • which variables have the strongest effect on net migration?
  • how large is the effect of those variables on net migration?

Copyright © 2002-2025 Eryk Wdowiak