A Few Datasets

Preface

I have created this webpage to provide you with a few datasets that you might use for your course project and to explain what I hope you will learn by conducting an econometric analysis.

Organizing Information, Identifying Trends

I want you to learn econometrics, and the best way to learn econometrics is to do it. But more broadly, I hope that conducting an econometric analysis will teach you how to organize information and identify trends.

In the specific case of an econometric analysis, each column in your spreadsheet must represent a variable and each row must represent an observation. So your very first task in an econometric analysis is to properly align that information in your spreadsheet.
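As a rough illustration of that layout, here is a minimal Python sketch that turns a "wide" table (one column per year) into one row per observation. The variable name emprate comes from the tables on this page, but the numbers and the to_long helper are made up for this example.

```python
# Hypothetical "wide" layout: one entry per country, one column per year.
# The values are invented for illustration; they are not OECD data.
wide = {
    "Austria": {1990: 80.1, 1991: 80.4},
    "Belgium": {1990: 74.5, 1991: 74.9},
}

def to_long(wide_table, value_name):
    """Return one dict per observation (country-year), one key per variable."""
    rows = []
    for country in sorted(wide_table):
        for year in sorted(wide_table[country]):
            rows.append({
                "country": country,
                "year": year,
                value_name: wide_table[country][year],
            })
    return rows

long_rows = to_long(wide, "emprate")
for row in long_rows:
    print(row)  # each printed row is one observation
```

Each dict is one row of the spreadsheet and each key is one column, which is the shape that Gretl expects.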

If you carefully construct your spreadsheet from reliable sources of data (and if you choose a good set of variables to test your null hypothesis), then you should observe some clear trends in your data. Your next task then is to describe those trends, test a series of null hypotheses and report your findings.

Gretl, of course, will help you run regressions and calculate statistics for your analysis. But Gretl is a tool. It is not the tool that is important. It is the quality of your input that is important.

Your spreadsheet is what's important. How you organize information is what's important.

In a more general case, your information might not be the numeric data that we work with in econometrics. It might be names, addresses or whole documents and files. Your data might not even fit into a spreadsheet at all.

But some principles, like functions and variables, will remain the same. And, once again, what will be important is how you organize information. If your information is well-organized, you should observe some clear trends in your data.

To get you started, I have assembled the datasets below. Your task is to identify the most important trends in one of those datasets, test a series of null hypotheses and report your findings.

Your Assignment

For the project proposal, please submit a written description of:

the null hypotheses that you wish to test
the dataset that you plan to test them with

For the final project, please submit a formal paper, in which you describe:

the null hypotheses that you tested
the dataset that you tested them with
summary statistics
how you manipulated the data
the regressions that you ran
your conclusions: should we reject or fail to reject the null hypothesis?

if we fail to reject it, why might the explanatory variable not have any effect on the dependent variable?
if we reject it, how strong is the effect of the explanatory variable on the dependent variable?

OECD Financial Data

variable	code	description	units
country		country name
time		time period
cons_prices	CP	consumer prices	index
employment	EMP	employment	persons
interest_interbank	IRSTCI	interbank interest rate	percent per annum
interest_long	IRLT	long-term interest rate	percent per annum
interest_short	IR3TIB	short-term interest rate	percent per annum
money_broad	MABM	broad money (M3)	index
money_narrow	MANM	narrow money (M1)	index
nom_exch_rate	CC	nominal exchange rate	national currency per US dollar
share_grow	SHARE	share prices	growth rate
share_index	SHARE	share prices	index
oecd		OECD country	dummy variable

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Colombia, Costa Rica, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkiye, United Kingdom, United States

Non-OECD Economies

Argentina, Brazil, China (People's Republic of), Croatia, India, Indonesia, Russia, Saudi Arabia, South Africa

Monthly Data

Jan. 1949 to Nov. 2024

notes

The dataset also contains dummy variables for each country. And for each variable, the Gretl data file also contains differences and log differences (as appropriate).

download

CSV file

Gretl data

description

This dataset contains financial market data from data-explorer.oecd.org covering 38 OECD countries and 9 non-OECD countries. You may use this data to explore the effect that interest rates, exchange rates, inflation rates and the money supply have on share prices.

Specifically, you may test the null hypotheses that:

a change in interest rates is not associated with a change in share prices
a change in exchange rates is not associated with a change in share prices
a change in money supply is not associated with a change in share prices
a change in inflation rates is not associated with a change in share prices

Then focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

which variables have the strongest effect on share prices?
how large is that effect?

For example, if we think that interest rates will rise one percentage point next month, then how much will share prices fall in response to that change?
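To make that question concrete, here is a minimal Python sketch of a one-variable regression of the change in share prices on the change in interest rates. The numbers are invented for illustration, and simple_ols is a hypothetical helper, not a Gretl function. In practice Gretl reports the same quantities (coefficient, standard error, t-ratio) in its regression output.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope, slope standard error) for y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)  # residual variance
    slope_se = math.sqrt(sigma2 / sxx)
    return intercept, slope, slope_se

# Made-up numbers for illustration only (not taken from the OECD file).
d_interest = [-1.0, 0.0, 1.0, 2.0]  # change in interest rate, percentage points
d_share = [2.1, -0.1, -1.9, -4.1]   # change in share price index, index points

intercept, slope, slope_se = simple_ols(d_interest, d_share)
t_stat = slope / slope_se  # compare with a critical value from the t distribution
predicted_change = intercept + slope * 1.0  # response to a one-point rate rise
print(slope, t_stat, predicted_change)
```

With these invented numbers the slope is about -2.04, so a one percentage point rise in the interest rate is associated with a fall of about two index points in share prices. The t-ratio tests the null hypothesis that the slope is zero.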

stationarity

As you conduct your analysis, you must remember that one of the Gauss-Markov assumptions is that your residuals ("error terms") must not be correlated with each other. Related to this assumption is the concept of "stationary residuals" -- the mean and variance of your residuals must be constant over time.

Taking the difference in value between one time period and the next will usually make a series stationary. So if you difference each variable in your regression model, your residuals will usually be stationary too.
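The two transformations are simple to compute by hand. Here is a minimal Python sketch, assuming a short made-up share price index. The function names diff and log_diff are chosen for this example only, and the Gretl data file already contains these series for you.

```python
import math

def diff(series):
    """First difference: value in period t minus value in period t-1."""
    return [b - a for a, b in zip(series, series[1:])]

def log_diff(series):
    """Log difference: approximately the growth rate between periods."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

# Invented index values for illustration.
share_index = [100.0, 102.0, 101.0, 105.0]

print(diff(share_index))      # [2.0, -1.0, 4.0]
print(log_diff(share_index))  # one fewer observation than the original series
```

Note that differencing costs you the first observation of each series.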

The alternative is to find a co-integrating relationship among your variables that makes the residuals stationary. In practice, however, such co-integrating relationships are difficult to find, so I encourage you to work with the differenced variables.

investor's perspective

From the perspective of an investor, the share price itself is not important. What's important is the change in share price (i.e. the difference in share price).

So from an investment perspective, you want to develop a model that predicts changes in share price. What predicts those changes?

change in interest rate?
change in exchange rate?
change in inflation rate?
change in money supply?

And how large is the effect of those changes on the change in share price?

forecasting

As another possible project, you could forecast changes in share prices and changes in interest rates. In this case, you would not use the full panel of countries. Instead, you would perform a time-series analysis for a single country. (Or, if you want to forecast financial variables for more than one country, you would perform a separate time-series analysis for each country.)

To isolate a single country, go to Gretl's "Sample" menu, select "Restrict based on criterion ..." and then, in the pop-up box, enter the boolean condition:

united_states == 1

For your own convenience, you should make the restriction permanent and save the restricted data to a new file. Then you should go to Gretl's "Data" menu, select "Dataset structure ..." and then, in the pop-up box, select "Time series", select "Monthly" frequency and start on: 1949:01 (January 1949). Then, finally, save the dataset one more time.
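If you want to see what that restriction does to the data, here is a minimal Python sketch that applies the same condition to a tiny made-up CSV. The column names country, time and share_index follow the variable table above, and united_states is the dummy named in the condition. The date format and the values are assumptions for this example, so check them against the actual CSV file.

```python
import csv
import io

# A tiny invented sample in the assumed layout of the panel CSV.
sample_csv = """country,time,united_states,share_index
United States,1949-01,1,10.0
Japan,1949-01,0,5.0
United States,1949-02,1,10.2
Japan,1949-02,0,5.1
"""

def restrict(rows, dummy):
    """Keep only the rows where the given country dummy equals 1."""
    return [row for row in rows if row[dummy] == "1"]

rows = list(csv.DictReader(io.StringIO(sample_csv)))
us_rows = restrict(rows, "united_states")
us_rows.sort(key=lambda row: row["time"])  # time-series order

for row in us_rows:
    print(row["time"], row["share_index"])
```

What remains is one observation per month for one country, which is why you can then tell Gretl to treat the restricted dataset as a monthly time series.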

When forecasting, you need to evaluate your model's predictions out-of-sample. In other words, because you're predicting future changes in financial variables, you need to simulate predictions of the future. Then you evaluate your model's forecasting performance on the simulated future.

For example, the simulated future might be roughly the last two years of data. In that case, you would estimate the model's parameters on the data from January 1949 to December 2022. Then you would evaluate its forecasts out-of-sample, from January 2023 to the end of the data (November 2024). To do so, go to Gretl's "Sample" menu, select "Set range ..." and set the sample range to end in 2022:12 (December 2022).
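The split itself is just a cut-off date. Here is a minimal Python sketch, assuming each observation is stored with a (year, month) label. split_sample is a hypothetical helper, and the values are placeholders because only the dates matter here.

```python
def split_sample(observations, last_in_sample):
    """Split time-ordered ((year, month), value) pairs at a cut-off date."""
    estimation = [obs for obs in observations if obs[0] <= last_in_sample]
    holdout = [obs for obs in observations if obs[0] > last_in_sample]
    return estimation, holdout

# Monthly labels from January 2022 through November 2024 (the end of the data).
months = [(year, month) for year in range(2022, 2025) for month in range(1, 13)]
months = [label for label in months if label <= (2024, 11)]
observations = [(label, None) for label in months]  # placeholder values

estimation, holdout = split_sample(observations, (2022, 12))
print(len(estimation), len(holdout))  # 12 23
print(holdout[0][0], holdout[-1][0])  # first and last hold-out months
```

The model never sees the hold-out months during estimation, so its errors on those months mimic the errors it would make on a real future.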

Then, after estimating your model, go to the "Analysis" menu (in the model's window), select "Forecasts" and select the variable that you wish to forecast out-of-sample. When you do, you'll have two options: "dynamic forecast" and "static forecast." Dynamic forecasts are chained: in the out-of-sample period, they plug earlier forecasts into the lagged variables. Static forecasts are computed one step at a time from the observed values of the lagged variables, just like fitted values, even though they're computed out-of-sample.
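The difference between the two options is easiest to see in a model with one lag. Here is a minimal Python sketch, assuming a model of the form y(t) = intercept + slope * y(t-1). The coefficients and the series are made up rather than estimated, and the function names are chosen for this example only.

```python
def static_forecasts(intercept, slope, actual, start):
    """One step ahead: each forecast uses the observed previous value."""
    return [intercept + slope * actual[t - 1] for t in range(start, len(actual))]

def dynamic_forecasts(intercept, slope, actual, start):
    """Chained: after the first step, each forecast uses the previous forecast."""
    forecasts = []
    previous = actual[start - 1]  # last observed value before the forecast period
    for _ in range(start, len(actual)):
        previous = intercept + slope * previous
        forecasts.append(previous)
    return forecasts

series = [1.0, 2.0, 4.0, 8.0]  # invented data; forecast period starts at index 2

static = static_forecasts(0.0, 0.5, series, 2)
dynamic = dynamic_forecasts(0.0, 0.5, series, 2)
print(static)   # [1.0, 2.0]
print(dynamic)  # [1.0, 0.5]
```

The first forecast is the same in both cases. After that, the dynamic forecast drifts away because it never sees the observed values again, which is the situation you face when forecasting more than one period into a real future.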

Finally, to evaluate your model's forecasts, Gretl provides several metrics at the bottom of the forecasts window. If you assume that forecast error causes a securities trader to lose money, then you could evaluate your forecasts with the "Mean Absolute Error," which measures the average size of the loss, or with the "Root Mean Squared Error," which gives more weight to large losses.
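Both metrics are easy to compute by hand. Here is a minimal Python sketch with made-up numbers, in which three forecasts are perfect and one misses badly. The function names are chosen for this example, and Gretl computes the same statistics for you.

```python
import math

def mean_absolute_error(actual, forecast):
    """Average of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def root_mean_squared_error(actual, forecast):
    """Square root of the average squared forecast error."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))

# Invented values: three perfect forecasts and one large miss.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 8.0]

mae = mean_absolute_error(actual, forecast)
rmse = root_mean_squared_error(actual, forecast)
print(mae, rmse)  # 1.0 2.0
```

The single large miss counts for twice as much under the Root Mean Squared Error as under the Mean Absolute Error, which is what "more weight to large losses" means.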

OECD Labor Market Data

variable	code	description	units
country		country name
year		year of observation
gender_wage_gap_median	GWP	gender wage gap – diff. between median wages rel. to men's median wage	percentage
emprate_fe	EMP_WAP	employment rate of females, age 25-54	percentage
emprate_ma	EMP_WAP	employment rate of males, age 25-54	percentage
wkapop_fe	WAP	working age females, age 25-54	persons in thousands
wkapop_ma	WAP	working age males, age 25-54	persons in thousands
eprc_v1	EPL_OV	employment protection – dismissals	from 0 to 6
ept_v1	EPL_T	employment protection – temporary	from 0 to 6
cpi_inflation	CPI	growth rate of consumer prices, all items non-food non-energy	percent per annum
real_min_wage	SM_WG	statutory real minimum wages at constant prices	US dollars, PPP converted
real_gdp_per_capita	B1GQ_R_POP	real gross domestic product per capita	US dollars per person, PPP converted
union_density	TUD	trade union density	percentage of employees

OECD Economies

Annual Data

1990 to 2019

notes

The dataset also contains dummy variables for each country and year. And for the real minimum wage and real GDP per capita variables, the Gretl data file also contains the natural log of the variables.

download

CSV file

Gretl data

description

This dataset contains labor market data from data-explorer.oecd.org covering 38 OECD countries. It's similar to the dataset that we use in class to discuss employment protection. There are two differences. The first is that it covers a different set of years. The second is that this dataset also contains the "gender wage gap" – the percentage difference between median male and female wages relative to the men's median wage.

In class, we discuss the effect of labor market regulation on male and female employment rates. You might extend that analysis by exploring the effect of labor market regulation on the gender wage gap.

If you do, you would need to account for the simultaneity issues that arise in the supply and demand for male/female labor. Then you would need to find an instrumental variable to estimate the effect of labor market regulation on the gender wage gap.
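To show the mechanics only, here is a minimal Python sketch of the simplest instrumental variable estimator: one explanatory variable x, one instrument z, and the ratio of their covariances with the dependent variable y. The data are invented, the names are generic, and the sketch does not suggest which instrument is valid for the gender wage gap. Choosing and defending the instrument is your job.

```python
def centered_cross_product(a, b):
    """Sum of (a - mean(a)) * (b - mean(b)): the covariance up to a scale factor."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return centered_cross_product(x, y) / centered_cross_product(x, x)

def iv_slope(z, x, y):
    """Simple IV slope: cov(z, y) / cov(z, x), using z as the instrument for x."""
    return centered_cross_product(z, y) / centered_cross_product(z, x)

# Invented data for illustration only.
z = [0.0, 0.0, 1.0, 1.0]  # instrument
x = [1.0, 2.0, 3.0, 4.0]  # endogenous explanatory variable
y = [2.0, 3.0, 7.0, 8.0]  # dependent variable

print(ols_slope(x, y))    # about 2.2
print(iv_slope(z, x, y))  # 2.5
```

The two estimates differ because the IV estimator uses only the variation in x that moves together with the instrument. In Gretl you would estimate this kind of model with two-stage least squares rather than by hand.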

In other words, you want to properly specify a model that you can use to test the null hypotheses that:

the minimum wage rate is not associated with the gender wage gap
employment protection is not associated with the gender wage gap
union density is not associated with the gender wage gap

Then you would focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

which variables have the strongest effect on the gender wage gap?
how large is the effect of those variables on the gender wage gap?

OECD Migration Data

variable	code	description	units
country		country name
year		year of observation
edu_lesssecond		below upper secondary education	percentage of population
edu_secondary		upper secondary or post-secondary non-tertiary education	percentage of population
edu_tertiary		tertiary education	percentage of population
emprate	EMP_WAP	employment rate, age 25-54, both sexes	percentage
gini	INC_DISP_GINI	Gini coefficient (based on disposable income) – measure of income inequality	from 0 to 100
inflows_pct	INMIG	new residents in the region coming from another country	percentage of population
inflows_total	INMIG	new residents in the region coming from another country	total persons
netmigration_total	NETMIG	net international migration	total inflows minus total outflows
outflows_pct	OUTMIG	persons who left the region to reside in another country	percentage of population
outflows_total	OUTMIG	persons who left the region to reside in another country	total persons
pop	POP	population	total persons
popgrow	POP	population growth rate	percentage change from previous year
povrate	PR_INC_DISP	poverty rate – less than 50% of national median disposable income	percentage
rgdpcap	B1GQ_R_POP	real GDP per capita	USD per person, PPP converted

OECD Economies

Austria, Belgium, Czechia, Denmark, Estonia, Finland, Germany, Italy, Japan, Latvia, Lithuania, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Switzerland, Turkiye

Annual Data

1990 to 2023

notes

The dataset also contains dummy variables for each country and year. The Gretl data file also contains net migration as a percentage of the population. And the Gretl data file also contains the natural log of the real GDP per capita variable.
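If you work from the CSV file, you can build those two extra variables yourself. Here is a minimal Python sketch, assuming that net migration as a percentage of the population is net migration (inflows minus outflows) divided by population and multiplied by 100. That formula is my reading of the variable descriptions above, and the numbers are invented.

```python
import math

def net_migration_pct(inflows_total, outflows_total, pop):
    """Net migration (inflows minus outflows) as a percentage of population."""
    return 100.0 * (inflows_total - outflows_total) / pop

def log_rgdpcap(rgdpcap):
    """Natural log of real GDP per capita."""
    return math.log(rgdpcap)

# Invented values for illustration only.
pct = net_migration_pct(inflows_total=30000, outflows_total=10000, pop=2000000)
print(pct)                   # 1.0
print(log_rgdpcap(40000.0))  # natural log of a hypothetical GDP per capita
```

Expressing net migration as a percentage makes large and small countries comparable, and a coefficient on the logged GDP variable can be read in terms of percentage changes in GDP per capita.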

download

CSV file

Gretl data

description

This dataset contains migration data from data-explorer.oecd.org covering 21 OECD countries. It also contains data on educational attainment, employment rates, income inequality (as measured by Gini coefficient), poverty rates, population growth rates and real GDP per capita.

If you use this dataset, you can perform an analysis of the economic factors that affect net migration. Specifically, you can test the null hypotheses that:

educational attainment is not associated with net migration
the degree of income inequality is not associated with net migration
the poverty rate is not associated with net migration
the population growth rate is not associated with net migration

Then you would focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know:

which variables have the strongest effect on net migration?
how large is the effect of those variables on net migration?
