As I mentioned, I wanted to share some model validation tips I have picked up over the years from working with financial services regulators, reading regulatory guidelines, and developing my own approach. Below, I focus mainly on OCC Bulletin 2000-16 (available at http://ithandbook.ffiec.gov/media/resources/3676/occ-bl2000-16_risk_model_validation.pdf). I will likely share more on this subject, but I hope you find this a good start (and useful if you are in the position of validating a model).
Kirk Harrington, SAEG Partner
If I had to outline my model validation approach, it would be as follows:
DATA PIECE → METHODOLOGY PIECE → MODEL STRENGTH PIECE → CODING/INPUT CHECKS → ISSUES RESOLVE
Why DATA PIECE?
In OCC 2000-16, “Validating the Model Inputs Component” is its own section. It says: “It is possible that data inputs contain major errors while the other components of the model are error free. When this occurs, the model outputs become useless, but even an otherwise sound validation will not necessarily reveal the errors. Hence, auditing of the data inputs is an indispensable and separate element of a sound model-validation process, and should be explicitly included in the bank’s policy.”
In my validation of the DATA, I focus on replicating/understanding (see the sketch after this list):
· Summary and key result tables. By key result tables, I mean that if a variable used in the model has a specific population proportion stated in the documentation, I validate that proportion.
· Figures with key rates, e.g. percentages, averages, etc. I mainly focus on figures that, if off, would affect the model’s results.
· Exclusions. Are exclusions appropriate, and do the numbers of exclusions match? If too many exclusions occur, this could erode the model’s effectiveness.
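To make these DATA checks concrete, here is a minimal sketch in Python (pandas assumed; the file name, column names, and the documented value are all hypothetical placeholders):

import pandas as pd

# Load the model development dataset (path and columns are hypothetical).
df = pd.read_csv("model_dev_data.csv")

# 1. Re-check a documented population proportion, e.g. the documentation
#    states that 18.2% of accounts fall in the "subprime" segment.
documented_proportion = 0.182  # taken from the model documentation
recomputed_proportion = (df["risk_segment"] == "subprime").mean()
print(f"Documented: {documented_proportion:.3f}, Recomputed: {recomputed_proportion:.3f}")

# 2. Re-check key rates and averages that, if off, would affect results.
print(df["balance"].describe())  # compare against the documentation's summary table

# 3. Re-check exclusion counts: do they match, and is the excluded share
#    small enough not to erode the model's effectiveness?
excluded = df["exclusion_flag"] == 1
print(f"Excluded records: {excluded.sum()} ({excluded.mean():.1%} of population)")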
In my mind, it is crucial to understand the population used before running the model. If this input piece is inaccurate, anything run after it is highly questionable.
Why METHODOLOGY PIECE?
Models can be complex; which formulas are used, how they are translated into code, and whether a given approach is appropriate are all critical questions. OCC 2000-16 says this:
“Implementing a computer model usually requires the modeler to resolve several questions in statistical and economic theory. Generally, the answer to those theoretical questions is a matter of judgment, though the theoretical implementation is also prone to conceptual and logical error.”
Later, it continues: “Regardless of the qualifications of the model developers, an essential element of model validation is independent review of the theory that the bank uses.” It also goes on to discuss how comparing against other models, either at the bank or publicly available, is useful.
My approach to this piece typically involves (see the sketch after this list):
· Doing outside research to see how/where a given methodology is used. For this, I may go through a digital database containing papers on financial models (for example) or talk to friends in the industry who may be using similar approaches (to get their feedback). If I can’t find anything, I might also go through the exercise of breaking down the formula to understand its various components and whether those components are reasonable.
· Understanding whether the variables used in prediction are reasonable (and whether the coefficients and their signs make sense)
· Understanding the dependent variable, how it is calculated, and whether it is appropriate for the type of model used
· Understanding how the model is translated into code and whether that translation is accurate given the type of model it is.
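As an illustration of the coefficient/sign checks above, here is a minimal sketch (Python with statsmodels; the expected signs, file, and column names are hypothetical, and I assume a logistic default model purely for illustration):

import pandas as pd
import statsmodels.api as sm

# Expected signs from economic reasoning / the documentation (hypothetical):
# higher utilization should raise default risk (+), higher income lower it (-).
expected_signs = {"utilization": 1, "income": -1, "delinquencies": 1}

df = pd.read_csv("model_dev_data.csv")  # hypothetical file and columns
X = sm.add_constant(df[list(expected_signs)])
fit = sm.Logit(df["default_flag"], X).fit(disp=0)

# Flag any coefficient whose estimated sign disagrees with expectations.
for var, sign in expected_signs.items():
    coef = fit.params[var]
    agrees = (coef > 0) == (sign > 0)
    print(f"{var}: coef={coef:+.4f}, expected {'+' if sign > 0 else '-'}, {'OK' if agrees else 'REVIEW'}")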
Why MODEL STRENGTH PIECE?
While not explicitly stated in OCC 2000-16, this piece helps to understand key matters it mentions, like model results, code, and mathematics. Various metrics and tests can be computed for a given model, and those metrics speak to a model’s fit, specification (whether it has enough variables to give a complete picture of prediction on the dependent variable), and strengths (or weaknesses). For example, a low R-squared can be a sign of weak predictive power. A pseudo R-squared, however, is harder to interpret; in that case, it helps to look at similarly built models to judge the pseudo R-squared more accurately. Here is an article that speaks to this matter: http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
More important than judging a model on its R-squared and fit statistics alone is looking at out-of-sample tests or backtests performed at the time the model was developed. OCC 2000-16 also suggests that “model developers and validators should compare its results against those of comparable models, market prices, or other available benchmarks.” I typically look to see if these kinds of tests are in the model documentation and analyze them. I am especially wary of results that look to be ‘in-sample’ with exact fit.
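Here is a minimal sketch of how such checks might look (Python with statsmodels and scikit-learn; file, columns, and the logistic model are hypothetical). The statsmodels Logit result exposes McFadden’s pseudo R-squared as prsquared, and the in-sample vs. out-of-sample comparison illustrates the backtest idea:

import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("model_dev_data.csv")  # hypothetical file and columns
features = ["utilization", "income", "delinquencies"]

# Hold out data the model never sees during fitting.
train, test = train_test_split(df, test_size=0.3, random_state=42)

fit = sm.Logit(train["default_flag"], sm.add_constant(train[features])).fit(disp=0)
print(f"McFadden pseudo R-squared (in-sample): {fit.prsquared:.3f}")

# Compare in-sample vs. out-of-sample discrimination; a large drop is a red
# flag, and near-exact in-sample fit is a warning sign in itself.
auc_in = roc_auc_score(train["default_flag"], fit.predict(sm.add_constant(train[features])))
auc_out = roc_auc_score(test["default_flag"], fit.predict(sm.add_constant(test[features])))
print(f"AUC in-sample: {auc_in:.3f}, out-of-sample: {auc_out:.3f}")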
Why CODING/INPUT CHECKS?
This is synonymous with checking the ‘Model Processing Component’. For me, this entails line-by-line proofreading of the model’s code (see the Code and Mathematics section of 2000-16) and checking the correctness of the mathematics and formulas used. Something new I’ve added to my process (where feasible) is to construct an identical model to check coefficients and significance levels against those stated. Because I’ve already prepared the data to check against tables and key figures, this part is not very difficult to accomplish. OCC 2000-16 does state that constructing an identical model is useful, especially if the model is simple (e.g. built from spreadsheets). For more complex models, the alternative approaches it suggests are line-by-line reading of the code and benchmarking against available models.
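As a sketch of the "construct an identical model" check (again Python with statsmodels; the documented coefficient values, file, and columns are hypothetical stand-ins for what the model documentation would state):

import pandas as pd
import statsmodels.api as sm

# Coefficients as stated in the model documentation (hypothetical values).
documented = {"const": -3.21, "utilization": 1.85, "income": -0.42}

df = pd.read_csv("model_dev_data.csv")  # hypothetical file and columns
X = sm.add_constant(df[["utilization", "income"]])
refit = sm.Logit(df["default_flag"], X).fit(disp=0)

# Compare refit coefficients and significance against the documentation.
for var, doc_coef in documented.items():
    print(f"{var}: documented={doc_coef:+.2f}, refit={refit.params[var]:+.2f}, p={refit.pvalues[var]:.4f}")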
If any issues are discovered in my process, I bring them to the attention of the model owner first, then the model creator. I will typically only go to the creator if I am unsatisfied with the answers I receive from the owner, or if the matter is one that I know only the model creator can answer. Here are some general rules I go by (a sketch of the discrepancy rule and issues log follows the list):
· With data, if there is a discrepancy above X% (a threshold agreed upon with the model owner), I will flag it as an issue
· With model strength, I will raise issues more for informational purposes, so that the model owner understands the weaknesses and strengths of the model
· With coding and input, I will check whether signs, formulas, and coefficients are correct. I also like to focus on whether variables are being ‘prepped’ properly before going into the main model formula
· I will typically put any issue I find (that is not resolved immediately between the model owner, creator, and myself) on an issues log. I use this log to track the resolution of issues throughout the validation process
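Here is a minimal sketch of the discrepancy rule and issues log (Python; the structure and the 5% threshold are my own illustration, not anything prescribed by OCC 2000-16):

# The 5% threshold stands in for the "X%" agreed upon with the model owner.
THRESHOLD = 0.05

def flag_discrepancy(item, documented, recomputed, issues_log):
    """Append an entry to the issues log if the relative discrepancy exceeds the threshold."""
    diff = abs(recomputed - documented) / abs(documented)
    if diff > THRESHOLD:
        issues_log.append({
            "item": item,
            "documented": documented,
            "recomputed": recomputed,
            "discrepancy": f"{diff:.1%}",
            "status": "open",  # updated as issues are resolved with the owner/creator
        })

issues_log = []
flag_discrepancy("subprime proportion", 0.182, 0.171, issues_log)
print(issues_log)  # one open entry: the ~6.0% discrepancy exceeds the 5% threshold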