Heteroscedasticity in the Logit Model

The purpose of these notes is to provide an example of heteroscedasticity and to explain how we can account for the unequal variances of our residuals in a weighted least squares model. The specific example of heteroscedasticity that we will use is an analysis of proportions data in a logit model.

Suppose that we observe our dependent variable as a proportion or percentage:

0 \leq p_{i} \leq 1

which reflects an underlying "true" probability:

0 \leq π_{i} \leq 1

If we used the raw percentage as our dependent variable, then our predictions of the dependent variable would not be bounded by zero and one. Instead, it is possible that some of our predictions of the dependent variable would be less than zero and some would be greater than one.

Converting the raw percentage to log odds gives us an unbounded dependent variable:

\ln\left(\frac{p_{i}}{1 - p_{i}}\right) = α + β X_{i} + ϵ_{i}

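but the residuals in such a logit model do not have constant variance. Their variance is approximately:

\mathrm{var}(ϵ_{i}) \approx \frac{1}{n_{i} π_{i} (1 - π_{i})}

where n_i is the number of cases on which the proportion p_i is based. The residual variance is therefore larger when the true probability is closer to zero or one.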

Assuming that our explanatory variable is correlated with the true probability, the residual variance will be larger at extreme values of our explanatory variable.
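A short Python sketch of the two formulas above: the log-odds transform and the approximate residual variance. The function names and the group size n = 100 are hypothetical, chosen only for illustration; the transform is undefined when a proportion is exactly 0 or 1.

```python
import math

def log_odds(p):
    """ln(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("log odds are undefined at p = 0 or p = 1")
    return math.log(p / (1.0 - p))

def logit_residual_variance(n, pi):
    """Approximate variance 1 / (n * pi * (1 - pi)) of the log odds of a proportion."""
    return 1.0 / (n * pi * (1.0 - pi))

# Hypothetical group size; only the true probability varies.
n = 100
for pi in (0.05, 0.2, 0.5, 0.8, 0.95):
    # The log odds spread out in both directions, and the variance is
    # smallest at pi = 0.5 and grows toward zero and one.
    print(pi, round(log_odds(pi), 3), round(logit_residual_variance(n, pi), 4))
```

Holding n fixed isolates the effect of π; a smaller n raises the variance at every π.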

Because we know the nature of the heteroscedasticity, we can modify our regression model to account for it. All we need is an unbiased estimate of the true probability for each observation.

OLS will provide such an unbiased estimate if there is no correlation between the residual and the explanatory variable (i.e., if the explanatory variable affects only the variance of the residual, not its mean).

So our first step is to estimate the regression coefficients with OLS:

\ln\left(\frac{p_{i}}{1 - p_{i}}\right) = α + β X_{i} + ϵ_{i}

and then use the estimated coefficients to predict the true log odds:

\ln\left(\frac{\hat{π_{i}}}{1 - \hat{π_{i}}}\right) = \hat{α} + \hat{β} X_{i}
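The first step can be sketched with the closed-form OLS estimates for a single explanatory variable. This is a minimal illustration, not a full regression routine; the helper name `ols_simple` and the X values and proportions below are hypothetical.

```python
import math

def ols_simple(x, y):
    """Closed-form OLS intercept and slope for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data: one X value and one observed proportion per group.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [0.28, 0.36, 0.52, 0.61, 0.74]

# Step 1: OLS of the observed log odds on X.
y = [math.log(pr / (1.0 - pr)) for pr in p]
alpha_hat, beta_hat = ols_simple(x, y)

# Predicted "true" log odds for each observation.
log_odds_hat = [alpha_hat + beta_hat * xi for xi in x]
print(alpha_hat, beta_hat)
print(log_odds_hat)
```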

But what we really want is an unbiased prediction of the true probability:

\hat{π_{i}} = \frac{1}{1 + \exp(-\hat{α} - \hat{β} X_{i})}

which we can use to weight each observation:

w_{i} = \sqrt{n_{i} \hat{π_{i}} (1 - \hat{π_{i}})}
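Inverting the predicted log odds and building the weight are each one line. A sketch, assuming `alpha_hat` and `beta_hat` stand in for whatever the first step produced (the values below are hypothetical) and `n` holds the number of cases behind each proportion:

```python
import math

def predicted_probability(alpha_hat, beta_hat, x):
    """Inverse logit: 1 / (1 + exp(-alpha_hat - beta_hat * x))."""
    return 1.0 / (1.0 + math.exp(-alpha_hat - beta_hat * x))

def logit_weight(n, pi_hat):
    """Weight sqrt(n * pi_hat * (1 - pi_hat)) for a proportion based on n cases."""
    return math.sqrt(n * pi_hat * (1.0 - pi_hat))

# Hypothetical first-step estimates and data.
alpha_hat, beta_hat = -1.0, 0.5
x = [0.0, 2.0, 4.0]
n = [50, 120, 50]
pi_hat = [predicted_probability(alpha_hat, beta_hat, xi) for xi in x]
w = [logit_weight(ni, ph) for ni, ph in zip(n, pi_hat)]
print(pi_hat)
print(w)
```

Here the middle observation, with π̂ = 0.5 and the most cases, receives the largest weight, because its log odds are measured most precisely.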

In the second step, we multiply every term by the weight, including the constant (so α becomes the coefficient on w_i and there is no separate intercept), and estimate the regression coefficients:

w_{i} \ln\left(\frac{p_{i}}{1 - p_{i}}\right) = α w_{i} + β w_{i} X_{i} + w_{i} ϵ_{i}
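The second step is an ordinary regression on the transformed variables with no separate constant. A minimal Python sketch that solves the two normal equations directly; the function name `wls_logit`, the data, and the first-step estimates passed in are hypothetical.

```python
import math

def wls_logit(x, p, n, alpha_hat, beta_hat):
    """Second step: OLS of w*logit(p) on w and w*x, with no separate constant.

    alpha_hat and beta_hat are first-step OLS estimates used to build the weights.
    """
    z1, z2, ys = [], [], []
    for xi, pr, ni in zip(x, p, n):
        pi_hat = 1.0 / (1.0 + math.exp(-alpha_hat - beta_hat * xi))
        w = math.sqrt(ni * pi_hat * (1.0 - pi_hat))
        z1.append(w)          # the constant column, multiplied by w
        z2.append(w * xi)     # the X column, multiplied by w
        ys.append(w * math.log(pr / (1.0 - pr)))
    # Normal equations (Z'Z) b = Z'y for two regressors, solved by Cramer's rule.
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    t1 = sum(a * y for a, y in zip(z1, ys))
    t2 = sum(b * y for b, y in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Hypothetical data and first-step estimates.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [50, 80, 120, 80, 50]
p = [0.28, 0.36, 0.52, 0.61, 0.74]
alpha_wls, beta_wls = wls_logit(x, p, n, alpha_hat=-0.99, beta_hat=0.50)
print(alpha_wls, beta_wls)
```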

Weighting each observation in this way accounts for the heteroscedasticity, so the residuals in our regression model should now have constant variance.
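The reason can be written out in one line. Treating w_i as a fixed constant (it is itself estimated in the first step, so the result holds only approximately), the variance of the transformed residual is:

```latex
\mathrm{var}(w_{i} ϵ_{i}) = w_{i}^{2} \, \mathrm{var}(ϵ_{i})
  \approx \frac{n_{i} \hat{π_{i}} (1 - \hat{π_{i}})}{n_{i} π_{i} (1 - π_{i})}
  \approx 1
```

which is close to one, and hence the same for every observation, whenever π̂_i is close to π_i.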

Important note: to implement this technique properly, check how your software's documentation defines the weights. For example, when using the weighted least squares feature of Gretl, we do not take the square root:

weight = num * p_hat * ( 1 - p_hat )

because Gretl weights the squared residuals.
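The two conventions give the same estimates: minimizing the sum of v_i e_i^2 with v_i = n_i π̂_i (1 - π̂_i) is algebraically the same problem as running OLS, without a constant, on variables multiplied by w_i = √v_i. A small check of that equivalence, written from scratch in Python rather than calling Gretl; the helper names and all data values are hypothetical.

```python
import math

def solve2(s11, s12, s22, t1, t2):
    """Solve the symmetric 2x2 system [[s11, s12], [s12, s22]] b = [t1, t2]."""
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def wls_variance_weights(x, y, v):
    """Minimize sum v_i (y_i - a - b x_i)^2: the weight multiplies the squared residual."""
    return solve2(sum(v), sum(vi * xi for vi, xi in zip(v, x)),
                  sum(vi * xi * xi for vi, xi in zip(v, x)),
                  sum(vi * yi for vi, yi in zip(v, y)),
                  sum(vi * xi * yi for vi, xi, yi in zip(v, x, y)))

def ols_on_sqrt_weighted(x, y, v):
    """OLS without a constant on w and w*x, with w = sqrt(v) applied to every variable."""
    w = [math.sqrt(vi) for vi in v]
    z1, z2 = w, [wi * xi for wi, xi in zip(w, x)]
    ys = [wi * yi for wi, yi in zip(w, y)]
    return solve2(sum(a * a for a in z1), sum(a * b for a, b in zip(z1, z2)),
                  sum(b * b for b in z2), sum(a * c for a, c in zip(z1, ys)),
                  sum(b * c for b, c in zip(z2, ys)))

# Hypothetical data: X values, cases per group, observed and predicted proportions.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [50, 80, 120, 80, 50]
p = [0.28, 0.36, 0.52, 0.61, 0.74]
pi_hat = [0.27, 0.38, 0.50, 0.62, 0.73]
y = [math.log(pr / (1.0 - pr)) for pr in p]
v = [ni * ph * (1.0 - ph) for ni, ph in zip(n, pi_hat)]  # no square root
print(wls_variance_weights(x, y, v))
print(ols_on_sqrt_weighted(x, y, v))
```

Passing √v_i instead of v_i to a routine that weights the squared residuals would effectively apply the square root twice, which is why the square root is dropped when the software weights the squared residuals.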
