Friday, May 15, 2015

Know what you know

 by Kirk Harrington, for SAEG (The Statistical Analyst Effectiveness Group)

Something that has helped me be an effective analyst is keeping a record of the trends, insights, and key analytic observations I have come across over the course of my career.  Such a record can 1) remind you of things you learned in the past, 2) provide talking points when you interview for your next position, and/or 3) help other analysts you work with or who are in your field.  As an exercise, I have done this based on my own experience and share the result here.  My background as a modeler and analyst has been in Credit Risk, Marketing, and Asset/Liability Management.  Here are some examples of things I have learned over the years from working in these three areas:

Credit Risk

On scoring models
Scoring models should be checked regularly to ensure their output is reasonable.  I once discovered that a model was producing appraisal values higher than expected.  This knowledge came from line-by-line proofreading of the code and from comparing the model's output to actual appraisal values.  In looking through the code, I found several errors that led to inaccurate predictions and brought them to the attention of the vendor so that the model's accuracy could be improved.
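
As a rough illustration of comparing model output to actual values, here is a minimal Python sketch.  The function name `prediction_bias` and the appraisal figures are hypothetical and only for illustration; this is not the vendor's model or the check I actually ran.

```python
from statistics import mean

def prediction_bias(predicted, actual):
    """Summarize how far predictions sit above or below actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    # Signed error: positive means the model predicts too high.
    errors = [p - a for p, a in zip(predicted, actual)]
    # Ratio of predicted to actual, skipping zero actuals.
    ratios = [p / a for p, a in zip(predicted, actual) if a != 0]
    return {"mean_error": mean(errors), "mean_ratio": mean(ratios)}

# Hypothetical appraisal values, for illustration only.
predicted = [210_000, 330_000, 158_000, 420_000]
actual = [200_000, 300_000, 150_000, 400_000]
report = prediction_bias(predicted, actual)
print(report)
```

A mean ratio consistently above 1 would be a signal to go back and proofread the scoring code.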

Multivariate modeling lends itself to more flexibility
Forecasting roll rates and default dollars based on money movements alone assumes that dollars will move the same way they did in the previous year.  Modeling these behaviors (say, using logistic, linear, or decision tree methods) creates more flexible forecasts that are based on the underlying populations and, if included in the model, can reflect the economic environment, credit quality, and pricing environments relevant to those populations.  Simple forecasts can be useful when the dependent variable does not shift much over time and the population is relatively stable and not diverse (for example, if most of the population consists of conservative, high-FICO customers).  These forecasts break down, however, when this is not the case.

 The balance between risk and profitability
Marketing and Credit Risk working together is essential to ensure offers strike a balance between risk and profitability.  Here is a diagram which illustrates this:


If the circle depicted represents the path of a consumer and how they affect an organization, this diagram illustrates that the more risk is taken (more weight of decisions on the risk side), the higher the potential profitability.  There can come a point, though, where greater risk leads to risk loss (associated with events like default, foreclosure, and loss of credit quality).  The diagram also shows that the less risk is taken (movement of the fulcrum up on the left, down on the right), the more the organization moves into the zone where profitability and potential cash flows are lost.  It's important to state, though, that this diagram inherently assumes that an organization engages in risk-based pricing, which prices customers higher the riskier they are found to be (a positive correlation between interest rate and presumed credit risk).

 Zones leading to a loss event
Three factors which can be examined leading up to a loss event are credit quality, line utilization (in the case of open ended credit products like equity lines and credit cards), and payment behavior.  Models which predict a credit loss event can be gauged at different periods around the event.  The closer to the event, the more deterioration is evident in all three of these factors;  whereas the farther from the event, the less deterioration there is with these three factors.  Illustrating this on a time diagram is useful:










What this shows is that the closer to a loss event, the more obvious the deterioration in the factors leading up to it.  For the factors I mentioned, deterioration of credit quality, payment behavior, and line utilization becomes more evident.  The farther you go back in time, the more sporadic and less evident the deterioration is.  Models that look for 'clues' to a potential future loss event, and that try to identify customers who could use help to avoid moving into the zone of obvious deterioration, could be built around the zone of sporadic deterioration (e.g. Triggers models, models to place consumers in relief call groups).  Models that predict the inevitable loss event and make allowances for loss (e.g. ALLL models) would be built on the zone of obvious deterioration.  This zone is characterized by consumers who have reached a point of 'no return' in terms of deterioration in their loss factors: loss is inevitable and the chance of loss is highest.

Marketing

 On control groups
Control groups created during experimental design need only be created with the underlying populations in mind, not with treatments in mind.  To illustrate what I mean, consider the following diagram:











The diagram on the left assumes a control is needed for each treatment group.  This would only be needed if the population for each treatment were different (typically not the case).  The diagram on the right shows a more appropriate design for when treatments A-C are drawn from the same population.  While this sounds basic enough, I have seen designs done by treatment group.  That approach complicates the design and makes it harder to work with for tracking and measurement purposes on the back end.
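
To make the single-control design concrete, here is a minimal Python sketch that randomly splits one population into one shared control group and treatments A-C.  The function name `assign_groups`, the 10% control share, and the seed are assumptions made for illustration, not figures from a real campaign.

```python
import random

def assign_groups(customer_ids, treatments=("A", "B", "C"),
                  control_share=0.1, seed=42):
    """Randomly assign one population to a single control plus treatments."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(customer_ids)
    rng.shuffle(ids)
    n_control = round(len(ids) * control_share)
    # One shared control group for all treatments.
    assignment = {cid: "control" for cid in ids[:n_control]}
    # Remaining customers are dealt out evenly across the treatments.
    for i, cid in enumerate(ids[n_control:]):
        assignment[cid] = treatments[i % len(treatments)]
    return assignment

assignment = assign_groups(range(1000))
counts = {}
for group in assignment.values():
    counts[group] = counts.get(group, 0) + 1
print(counts)
```

Because every group comes from the same shuffled population, each treatment can be measured against the same control on the back end.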

On finding consumers who would open a check card
The check card (aka debit card) is a popular way for consumers to pay for transactions.  This card typically carries a Visa or Mastercard logo and is tied directly to a consumer's checking account.  To determine which type of customer is likely to open a check card (say, when creating a marketing campaign to encourage people to open check cards), I have found it useful to look at check writing behavior and use of online banking.  In studies I have done, the more checks people write and the less they use online banking, the less likely they are to open a check card.  Conversely, the fewer checks they write and the more they use online banking, the more likely they are to open a check card.  This implies that consumers who don't yet have a check card, and who newly open one, view it as a useful technology for improving their transaction experience.

On striking a balance between response rate and approval rate
When creating marketing campaigns for new credit products, consideration should be given to striking a balance between response rate and approval rate.  Typically, consumers with lower credit scores, who are riskier as a whole, will seek more credit.  This 'seeking' increases their chances of applying for a credit offer that is sent to them, which in turn increases the response rate.  However, it does not mean (except in the case of a pre-approved offer) that they will be approved for the offer and that the credit product will make it onto the financial institution's books.  This implies finding a group of consumers likely to apply AND likely to be approved once they apply.  If creating a model, the dependent variable could thus be defined as someone who not only responds, but is approved after they respond.
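
A minimal Python sketch of that dependent variable follows.  The field names `responded` and `approved`, the helper `booked_flag`, and the records themselves are hypothetical.

```python
def booked_flag(responded, approved):
    """Target is 1 only when the consumer responded AND was approved."""
    return 1 if responded and approved else 0

# Hypothetical campaign records, for illustration only.
records = [
    {"id": 1, "responded": True, "approved": True},
    {"id": 2, "responded": True, "approved": False},
    {"id": 3, "responded": False, "approved": False},
    {"id": 4, "responded": True, "approved": True},
]

targets = [booked_flag(r["responded"], r["approved"]) for r in records]
response_rate = sum(r["responded"] for r in records) / len(records)
booked_rate = sum(targets) / len(records)
print(targets, response_rate, booked_rate)
```

The gap between the response rate and the booked rate is the share of responders who applied but never made it onto the books.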

Find the 'warm' prospect
Prospect Marketing can be tough.  It is usually associated with a lower response rate, a higher cost per account, and more difficulty in finding the population highly likely to respond.  What makes it even more difficult is knowing less about a prospect before you mail or advertise to them, which makes modeling on the population a further challenge.  One population that I found in my prospect marketing work to be highly likely to respond is what I like to call the 'warm' prospect.  This kind of prospect already knows something about (or has had some experience with) the product or service being offered.  One such group consists of prospects who live at the same address as current customers.  These prospects have likely heard of the product before through interactions with the customer at their residence (and hopefully those interactions were positive).  To find these 'warm' prospects, I have created a match key containing the address and compared it to the addresses on record for current customers.  Care must be taken to ensure that the match keys are standardized the same way in both sources so that the process of flagging these prospects is accurate.
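
Here is a minimal Python sketch of the match key idea.  The standardization rules (uppercasing, stripping punctuation, a small suffix table, 5-digit ZIP) are simplified assumptions for illustration; real work would normally rely on proper address standardization, and the names `match_key` and `SUFFIXES` are hypothetical.

```python
import re

# Tiny, illustrative suffix table; a real one would be far larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}

def match_key(street, zip_code):
    """Build a standardized address key so both sources compare cleanly."""
    # Uppercase, then drop anything that is not a letter, digit, or space.
    text = re.sub(r"[^A-Z0-9 ]", "", street.upper())
    tokens = [SUFFIXES.get(tok, tok) for tok in text.split()]
    return " ".join(tokens) + "|" + zip_code.strip()[:5]

# Hypothetical addresses, for illustration only.
customers = [("123 Main Street", "84101"), ("9 Oak Ave.", "84102-1234")]
prospects = [("123 main st.", "84101"), ("77 Elm Road", "84103")]

# Apply the SAME function to both sources before comparing.
customer_keys = {match_key(street, z) for street, z in customers}
warm = [p for p in prospects if match_key(*p) in customer_keys]
print(warm)
```

The important design point is that one function standardizes both the customer file and the prospect file, so differences in case, punctuation, or suffix spelling do not hide a match.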

On measuring 'new money effect' for a deposit campaign
Let me start by saying that this is by no means a perfect way of measuring new money effect; however, it will get you close to what needs to be measured.  If the campaign is for current deposit customers (i.e. they already have a checking, savings, and/or certificate of deposit account) and the purpose is to increase 'new money' to the bank, how do you measure it?  One way I found to do so is to first group products and customers by household.  Household relationships (based on how I define them or have seen them defined) share the following characteristics:  1) the customer is at the same address as someone in the same household (it is assumed that money is shared within that household), and 2) the customer is on the same account as another customer (regardless of address).
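
One way to sketch these two household rules in Python is with a small union-find structure, shown below.  This is an illustration under assumptions, not the exact process I used: it assumes addresses are already standardized, that every account holder appears in the address table, and that links chain together (if A shares an address with B, and B shares an account with C, all three land in one household).  All names and data are hypothetical.

```python
def build_households(customer_address, account_holders):
    """Map each customer to a household key using address and account links."""
    parent = {c: c for c in customer_address}

    def find(c):
        # Follow parent links to the household representative.
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps lookups short
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Rule 1: customers at the same address share a household.
    first_at_address = {}
    for cust, addr in customer_address.items():
        if addr in first_at_address:
            union(first_at_address[addr], cust)
        else:
            first_at_address[addr] = cust

    # Rule 2: customers on the same account share a household.
    for holders in account_holders.values():
        holders = list(holders)
        for other in holders[1:]:
            union(holders[0], other)

    return {c: find(c) for c in customer_address}

# Hypothetical data, for illustration only.
customer_address = {"c1": "12 ELM ST", "c2": "12 ELM ST",
                    "c3": "5 OAK AVE", "c4": "9 PINE RD"}
account_holders = {"acct1": ["c1"], "acct2": ["c2", "c3"], "acct3": ["c4"]}
households = build_households(customer_address, account_holders)
print(households)
```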

Once customers are placed into a given household, associated products and account balances should then be appended to these households.  For each household, total deposit balance should be calculated.  This is appended at two different points in time based on the marketing event that occurred.   To illustrate...


Once total balance is appended for each time period, calculate the change in total balance as Total Deposit Balance at T2 minus Total Deposit Balance at T1.  Then, if the campaign was set up using experimental design (test and control groups), you can perform a hypothesis test on the average change in total balance of the test group vs. the control group.  If the test group change is significantly different from the control group change, the campaign has had some new money effect.  It's important to recognize that this effect can be positive or negative (calculated as the difference between the average changes).  To calculate the estimated total effect, multiply this difference in average balance change by the number of incremental (test less control) 'new' accounts.
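
The calculation can be sketched in Python as follows.  This is a minimal illustration, not a prescribed test: it uses an unequal-variance (Welch-style) standard error with a normal approximation for the p-value, which is only reasonable for large samples, and the tiny household balances shown are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def new_money_test(test_changes, control_changes):
    """Compare average balance change of test vs. control households."""
    diff = mean(test_changes) - mean(control_changes)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(test_changes) / len(test_changes)
              + variance(control_changes) / len(control_changes))
    z = diff / se
    # Two-sided p-value from a normal approximation (large-sample assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value

# Hypothetical (T1, T2) total deposit balances per household.
test = [(1000, 1600), (2000, 2500), (1500, 2300), (3000, 3400), (500, 1200)]
control = [(1000, 1100), (2000, 2000), (1500, 1700), (3000, 2900), (500, 800)]

# Change in total balance = balance at T2 minus balance at T1.
test_changes = [t2 - t1 for t1, t2 in test]
control_changes = [t2 - t1 for t1, t2 in control]

diff, z, p_value = new_money_test(test_changes, control_changes)
print(diff, z, p_value)
```

With real campaign sizes, a t-test from a statistics package would be the more usual choice; the point here is only the order of operations: change per household first, then the comparison of averages.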

Other considerations:  As I mentioned, this method is not perfect in measuring new money effect, but it can be useful.  Any results presented should be stated as 'estimated', not exact.  Further, it is possible to have customers switch households from one period to another.  To adjust for this in measuring the effect, I use only households that have not changed household key from one period to the other.  Also, in measuring average differences, it is wise to check and adjust for outliers prior to performing any hypothesis testing.  This will provide a more refined result in the end.

Asset and Liability Management

Modeling on a down rate environment creates problematic forecasts in an up rate environment
I once validated a prepay model that was built during a down rate environment, specifically during the period leading up to the most recent mortgage crisis and during the crisis itself.  More time periods could not be included because of data availability.  By the time the model was complete, interest rates had already been rising for some time.  When predictions vs. actuals reports were requested for a more current environment, it was noticed that the model over-predicted prepay on the more current population.  This was because the model had not included a period of rising rates (to balance out the coefficient estimates).  The question is, how was this problem of over-prediction dealt with?  One could not simply create more data (since it wasn't available), and it was not in the company's best interest to wait several more years to rebuild the model to include an up rate environment.  The solution used (which I consulted on) was a prepay multiplier that deflates the over-stated prepay rate.  This multiplier was applied to all model predictions to bring them more in line with prepay behavior in the current rate environment.
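
As an illustration only, here is one simple way such a multiplier could be derived and applied in Python.  Defining it as the ratio of actual to predicted prepay on a recent period is my assumption for this sketch, not necessarily how the multiplier described above was calculated, and the rates are hypothetical.

```python
def prepay_multiplier(predicted_rates, actual_rates):
    """Ratio of actual to predicted prepay over a recent comparison period."""
    total_predicted = sum(predicted_rates)
    if total_predicted == 0:
        raise ValueError("predicted rates sum to zero; multiplier undefined")
    return sum(actual_rates) / total_predicted

def apply_multiplier(predictions, multiplier):
    """Scale raw model predictions by the multiplier."""
    return [p * multiplier for p in predictions]

# Hypothetical prepay rates from a recent up-rate period.
recent_predicted = [0.20, 0.25, 0.15]
recent_actual = [0.10, 0.125, 0.075]

multiplier = prepay_multiplier(recent_predicted, recent_actual)
adjusted = apply_multiplier([0.30, 0.10], multiplier)
print(multiplier, adjusted)
```

A multiplier below 1 deflates the predictions; it should be revisited as the rate environment changes, since it is a patch rather than a refit.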

~~~~~~~~ end of examples

In conclusion, these are just a few examples of things I've learned over my career.  As evidenced, these lessons are a product of trial and experimentation and come through experience working with specific data for a specific purpose.  Further, such findings will represent either innovations in your field or knowledge that is supported by general trends.  Regardless, the findings you write down should be important to you and to the company and industry you serve.