Monday, April 7, 2014

Some Model Validation Thoughts

Some Model Validation Thoughts - Part 1: A Structured Approach

As I mentioned, I wanted to share some model validation tips I have picked up over the years from working with financial services regulators, reading regulatory guidelines, and developing my own approach.  Below, I mainly focus on OCC Bulletin 2000-16 (which can be found at http://ithandbook.ffiec.gov/media/resources/3676/occ-bl2000-16_risk_model_validation.pdf).  I will likely share more on this subject, but I hope you find this a good start (and useful if you are in the position of validating a model).

Kirk Harrington, SAEG Partner

If I had to outline my model validation approach, it would be as follows:

DATA PIECE → METHODOLOGY PIECE → MODEL STRENGTH PIECE → CODING/INPUT CHECKS → ISSUE RESOLUTION

Why DATA PIECE?

In OCC 2000-16, “Validating the Model Inputs Component” is its own section.  It says “It is possible that data inputs contain major errors while the other components of the model are error free.  When this occurs, the model outputs become useless, but even an otherwise sound validation will not necessarily reveal the errors.  Hence, auditing of the data inputs is an indispensable and separate element of a sound model-validation process, and should be explicitly included in the bank’s policy.”

In my validation of the DATA, I focus on replicating/understanding:

·         Summary and key result tables.  By key result tables I mean that if a variable used in the model has a specific population proportion stated in the documentation, I validate that proportion.

·         Figures with key rates, e.g. percentages, averages, etc.  Mainly I try to focus on figures that, if off, would affect the model’s results.

·         Exclusions.  Are the exclusions appropriate, and do the counts of exclusions match the documentation?  If too many records are excluded, the model’s effectiveness can erode.

In my mind, it is crucial to understand the population that is used prior to running the model.  If the inputs are inaccurate, anything run after them is highly questionable.
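To make this concrete, here is a minimal Python/pandas sketch of how one might replicate a documented population proportion and a key average from the raw modeling data.  The file name, column names, documented figures, and tolerance below are hypothetical placeholders, not values from any real documentation.

```python
import pandas as pd

# Illustrative sketch: the file name, column names, and documented figures are
# hypothetical placeholders, not values from actual model documentation.
TOLERANCE = 0.005  # acceptable absolute difference; agree the real threshold with the model owner

# Load the modeling population as delivered to the model
population = pd.read_csv("modeling_population.csv")

# Figures stated in the model documentation that we want to replicate
documented = {
    "share_delinquent": 0.042,   # documented proportion of delinquent accounts
    "avg_utilization": 0.37,     # documented average credit utilization
}

# Recompute the same figures directly from the raw data
recomputed = {
    "share_delinquent": (population["delinquent_flag"] == 1).mean(),
    "avg_utilization": population["utilization"].mean(),
}

# Flag any figure that drifts beyond the tolerance
for name, doc_value in documented.items():
    diff = abs(recomputed[name] - doc_value)
    status = "OK" if diff <= TOLERANCE else "FLAG FOR ISSUES LOG"
    print(f"{name}: documented={doc_value:.4f} recomputed={recomputed[name]:.4f} ({status})")
```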

Why METHODOLOGY PIECE?

Models can be complex, so it is critical to understand which formulas are used, how they are translated into the implementation, and whether a given approach is appropriate.

OCC 2000-16 says this:  “Implementing a computer model usually requires the modeler to resolve several questions in statistical and economic theory.  Generally, the answer to those theoretical questions is a matter of judgment, though the theoretical implementation is also prone to conceptual and logical error.”  Later, it continues: “Regardless of the qualifications of the model developers, an essential element of model validation is independent review of the theory that the bank uses.”  It also goes on to say that comparing against other models, either at the bank or publicly available, is useful.

My approach to this piece typically involves:

·         Doing outside research to see how and where a given methodology is used.  For this, I may go through a digital database containing papers on financial models (for example) or talk to friends I have in the industry who may be using similar approaches (to get their feedback).  If I can’t find anything, I might also go through the exercise of breaking down the formula to understand its various components and whether those components are reasonable.

·         Understanding whether the variables used in prediction are reasonable (and whether the coefficients and their signs make sense; see the sketch after this list)

·         Understanding the dependent variable, how it is calculated, and whether it is appropriate for the type of model used

·         Understanding how the model is translated into its implementation and whether that translation is accurate given the type of model it is.
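As a rough illustration of the coefficient and sign check mentioned above, here is a small sketch in Python.  The variable names, coefficient values, and expected signs are hypothetical; in practice the expectations come from the model documentation and from economic reasoning about each predictor.

```python
# Illustrative sketch of a coefficient sign check. Variable names, coefficient
# values, and expected signs are hypothetical examples only.
estimated_coefficients = {
    "loan_to_value": 0.85,     # higher LTV -> higher default risk, expect positive
    "borrower_income": -0.12,  # higher income -> lower default risk, expect negative
    "utilization": 0.40,       # higher utilization -> higher default risk, expect positive
}

expected_signs = {
    "loan_to_value": +1,
    "borrower_income": -1,
    "utilization": +1,
}

for variable, coef in estimated_coefficients.items():
    sign_ok = coef * expected_signs[variable] > 0
    expected = "+" if expected_signs[variable] > 0 else "-"
    verdict = "OK" if sign_ok else "REVIEW - sign contradicts expectation"
    print(f"{variable}: coef={coef:+.2f} expected sign={expected} {verdict}")
```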

Why MODEL STRENGTH PIECE?

While not explicitly stated in OCC 2000-16, this piece helps to understand key matters mentioned there, like model results, code, and mathematics.  Various metrics and tests can be created for a given model, and those metrics can speak to a model’s fit, specification (whether it includes enough variables to give a complete picture of what drives the dependent variable), and strengths (or weaknesses).  For example, a low R-squared can be a sign of weak predictive power.  However, if it is a pseudo R-squared, this is harder to interpret.  In that case, it is helpful to look at similarly built models to judge the pseudo R-squared more accurately.  Here is an article that speaks to this matter:  http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
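As a quick illustration of the pseudo R-squared point, here is a minimal sketch using statsmodels on synthetic data; McFadden’s pseudo R-squared is what statsmodels reports for a logit fit.  A real validation would use the model’s actual data and compare the figure against similarly specified models.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative sketch on synthetic data: fit a logistic model and report
# McFadden's pseudo R-squared, which statsmodels exposes as `prsquared`.
rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 2))
logit_p = 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

result = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print("McFadden pseudo R-squared:", round(result.prsquared, 3))
# Equivalent by hand: 1 - (log-likelihood of fitted model / log-likelihood of intercept-only model)
print("By hand:", round(1 - result.llf / result.llnull, 3))
```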

More important than judging a model solely on its R-squared and fit statistics is looking at out-of-sample tests or backtests performed at the time the model was developed.  OCC 2000-16 also suggests that “model developers and validators should compare its results against those of comparable models, market prices, or other available benchmarks.”  I typically look to see whether these kinds of tests are in the model documentation and analyze them when they are.  I am especially wary of results that look to be ‘in-sample’ with an exact fit.
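Below is a rough sketch of the kind of out-of-sample comparison I look for, using synthetic data and scikit-learn.  The split, sample sizes, and metric (AUC) are illustrative choices rather than anything prescribed by OCC 2000-16; a large drop from the development sample to the holdout is what I would flag.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Illustrative sketch: fit on a "development" window and score a later "holdout"
# window, then compare discriminatory power (AUC) between the two.
rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * X[:, 0] - 0.4 * X[:, 1]))))

dev_X, dev_y = X[:7_000], y[:7_000]        # development sample
hold_X, hold_y = X[7_000:], y[7_000:]      # pseudo out-of-sample holdout

model = LogisticRegression().fit(dev_X, dev_y)
print("Development AUC:", round(roc_auc_score(dev_y, model.predict_proba(dev_X)[:, 1]), 3))
print("Holdout AUC:   ", round(roc_auc_score(hold_y, model.predict_proba(hold_X)[:, 1]), 3))
```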

Why CODING/INPUT Checks?

This is synonymous with checking the ‘Model Processing Component’.  For me, this entails line-by-line proofreading of the model’s code (see the Code and Mathematics section of 2000-16) and checking the correctness of the mathematics and formulas used.  Something new I’ve added to my process (if feasible) is to construct an identical model to check coefficients and significance levels against those stated.  Because I’ve already prepared the data to check against tables and key figures, this part is not very difficult to accomplish.  OCC 2000-16 does state that constructing an identical model is useful, especially if the model is simple (e.g. constructed from spreadsheets).  For more complex models, the alternative approaches it suggests are line-by-line reading of the code and benchmarking against available models.
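Here is a minimal sketch of what constructing an identical (simple) model and comparing it to the documentation might look like in Python with statsmodels.  The file name, predictor names, documented coefficients, and the choice of a logit specification are all hypothetical assumptions for illustration.

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative sketch: rebuild a simple logistic model and compare the re-estimated
# coefficients to those stated in the documentation. File name, variable names,
# and documented coefficients are hypothetical.
data = pd.read_csv("modeling_population.csv")
predictors = ["loan_to_value", "borrower_income", "utilization"]

documented_coefficients = pd.Series(
    {"const": -2.10, "loan_to_value": 0.85, "borrower_income": -0.12, "utilization": 0.40}
)

X = sm.add_constant(data[predictors])
refit = sm.Logit(data["default_flag"], X).fit(disp=False)

comparison = pd.DataFrame({
    "documented": documented_coefficients,
    "re-estimated": refit.params,
    "p-value": refit.pvalues,
})
comparison["abs_diff"] = (comparison["documented"] - comparison["re-estimated"]).abs()
print(comparison.round(4))
```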

ISSUE RESOLUTION

If any issues are discovered in my process, I will bring these to the attention of the model owner first, then the model creator.  I will typically only go to the creator if I am unsatisfied with the answers I receive from the owner or if the matter is one that I know only the model creator can answer.  Here are some general rules I go by:

·         With data, if there is a discrepancy above X% (a threshold agreed upon with the model owner), I will flag it as an issue

·         With model strength, I will raise findings more for informational purposes, so that the model owner understands the strengths and weaknesses of the model

·         With coding and input, I will check if signs, formulas, and coefficients are correct.  I also like to focus on whether variables are being ‘prepped’ properly before going into the main model formula

·         I will typically put any issue I find (that is not resolved immediately between the model owner, creator, and myself) on an issues log.  I use this log to track the resolution of issues throughout the validation process (a small sketch of such a log follows this list)
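For what it’s worth, here is a minimal sketch of the kind of issues log I keep, using pandas.  The issue text, parties named, and figures are hypothetical; the actual discrepancy threshold (the ‘X%’ above) should be agreed with the model owner before validation begins.

```python
from datetime import date
import pandas as pd

# Illustrative sketch of a simple issues log. The issue text and parties named are
# hypothetical; the real discrepancy threshold should be agreed with the model owner.
issues = []

def log_issue(area, description, raised_with):
    """Record a new open issue with today's date."""
    issues.append({
        "date_raised": date.today(),
        "area": area,
        "description": description,
        "raised_with": raised_with,
        "status": "Open",
    })

log_issue(
    area="Data",
    description="Documented delinquency rate 4.2% vs recomputed 4.9%, above the agreed threshold",
    raised_with="Model owner",
)

# Review the log (and update 'status' as issues are resolved during validation)
print(pd.DataFrame(issues))
```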
