Tuesday, May 1, 2012

The Variance Inflation Factor (VIF)

By Kirk Harrington, with special thanks to Benjamin Nutter (my biostats friend and partner in SAEG)

To check and correct for multicollinearity in a linear-type regression, a great tool to use is the variance inflation factor.  Here's how to do it.

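First, regress each predictor on all of the other predictors in your model (the predictor becomes the dependent variable, and the remaining predictors explain it).  After you do this, you will get an R-Squared for each predictor, which tells you how much of that predictor is already explained by the others.  Then, calculate a VIF for each R-Squared (which corresponds to a given predictor).  The formula for the VIF is VIF = 1 / (1 - R-Squared).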
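The calculation above can be sketched in a few lines of Python.  This is only an illustration, assuming NumPy is available and that your predictors sit in a numeric array with one column per predictor; the function name vif and the made-up columns x1, x2, and x3 are mine, not part of any particular package.

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X (rows = observations, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    result = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress predictor j on all the other predictors, plus an intercept.
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coefs
        r_squared = 1.0 - (residuals @ residuals) / np.sum((target - target.mean()) ** 2)
        # VIF = 1 / (1 - R-Squared)
        result[j] = 1.0 / (1.0 - r_squared)
    return result

# Made-up data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print(dict(zip(["x1", "x2", "x3"], vifs.round(2))))
```

In this made-up data, x1 and x2 should come out with very large VIFs because each is almost fully explained by the other, while x3 should sit close to 1.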

Then determine a suitable target VIF.  For example, a good target for your industry might be 5, or 3 if you need more refinement.  A biostats friend of mine said you can even go up to 10, depending on the level of precision that is required in your field.  An engineer or pharmaceutical scientist may want 3, whereas someone in credit risk may want 5.  Someone in marketing may be willing to accept 10.
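To get a feel for what these targets mean, you can turn the VIF formula around: R-Squared = 1 - 1/VIF.  The short sketch below (the function name r_squared_at is just my own label) shows how much of a predictor has to be explained by the other predictors before it reaches each target.

```python
def r_squared_at(vif):
    """Invert VIF = 1 / (1 - R-Squared) to get the R-Squared behind a given VIF."""
    return 1.0 - 1.0 / vif

for target in (3, 5, 10):
    print(f"VIF {target:>2} <-> R-Squared {r_squared_at(target):.3f}")
```

So a target of 3 flags a predictor once about two-thirds of it is explained by the others, a target of 5 waits until 80%, and a target of 10 waits until 90%.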

To correct for multicollinearity, start by taking out the variable with the highest VIF, provided it is above your target VIF.  Then recalculate the VIF for each remaining predictor (you should see the VIFs declining).  Repeat this process, removing one variable at a time, until your variables are all within your target VIF.
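Here is a rough sketch of that removal loop in Python, again assuming NumPy.  The names vif and prune_by_vif, the default target of 5, and the made-up data are all my own choices for illustration; in practice you would also want to think about which variable makes more business sense to keep, rather than letting the numbers decide alone.

```python
import numpy as np

def vif(X):
    """VIF per column: regress each column on the others, then 1 / (1 - R-Squared)."""
    n_rows, n_cols = X.shape
    result = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coefs
        r_squared = 1.0 - (residuals @ residuals) / np.sum((target - target.mean()) ** 2)
        result[j] = 1.0 / (1.0 - r_squared)
    return result

def prune_by_vif(X, names, target_vif=5.0):
    """Drop the highest-VIF predictor, one at a time, until all are within target_vif."""
    X = np.asarray(X, dtype=float)
    kept, dropped = list(names), []
    while X.shape[1] > 1:
        scores = vif(X)                 # recalculate after every removal
        worst = int(np.argmax(scores))
        if scores[worst] <= target_vif:
            break                       # everything is within the target
        dropped.append(kept.pop(worst))
        X = np.delete(X, worst, axis=1)
    return kept, dropped

# Made-up data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
kept, dropped = prune_by_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
print("kept:", kept, "dropped:", dropped)
```

Only one variable is removed per pass because taking out one member of a correlated pair usually brings the other one's VIF back down, so dropping both at once would throw away more than you need to.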

------------------

An Effective Analyst Thought

A good tool that you can build for use in your modeling adventures is an inventory of the types of predictors that you have found useful (or interesting) as you have built or validated models.  You can build a spreadsheet that lists the name of the predictor, what type of model it was used in, any formulas associated with it, and what type of effect it was trying to explain in the model.  Once you build this inventory up enough, it is great for getting ideas as you build other models, for suggesting improvements to models after validating them, for talking about during job interviews, or for sharing with other analysts to help their modeling efforts.
