Monday, September 8, 2014

Weighted Average Node Group Comparison Statistic

When you build a CHAID decision tree, you have probably noticed that the split at the top divides the tree into a left and a right side, and that the probabilities at the end nodes on one side differ from those on the other. This implies that the population separates into a group with a generally low probability of the event and a group with a high one.

A useful measure of how each side of the tree compares to the other can be derived using weighted averages. To compute it, you first need to determine which end nodes belong to the high-probability group and which belong to the low-probability group. Below is an illustration.

As illustrated, the primary split can help you decide which end nodes to group together based on high or low observed probabilities. Among the nodes that branch off each side of the primary split, you should see probabilities that fall within a similar range. For example, let's say the end node probabilities of group 1 above (without considering population at this point) are 11.4, 0.8, 2.4, and 1.2, and the end node probabilities of group 2 are 76.1 and 42.9. Here, there is clearly a large separation between the observed probabilities of groups 1 and 2, so it makes sense to keep the nodes within each group together.
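
A quick way to check that two candidate groups really are separated is to confirm that their ranges of end node probabilities do not overlap. The snippet below is only a sketch: it uses the probabilities quoted above (presumably percentages), and the helper name ranges_overlap is made up for illustration.

```python
# End node probabilities quoted in the text (presumably percentages).
group_1 = [11.4, 0.8, 2.4, 1.2]  # end nodes on one side of the primary split
group_2 = [76.1, 42.9]           # end nodes on the other side


def ranges_overlap(a, b):
    """Return True if the min-max ranges of two lists of values overlap."""
    return max(min(a), min(b)) <= min(max(a), max(b))


# Gap between the highest group 1 value and the lowest group 2 value.
gap = min(group_2) - max(group_1)

print(ranges_overlap(group_1, group_2))  # False
print(round(gap, 1))                     # 31.5
```

If the ranges did overlap, the primary split alone might not be a good guide for grouping the end nodes.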

Once you have grouped the nodes into a left and a right side of the tree, a weighted average is an appropriate way to see how well the two groups separate from each other in terms of average probability.

First, within each node group, sum the ‘0 event’ populations and the ‘1 event’ populations across the nodes. Then, divide each node's n population for an event type by the group sum for that event type; this gives the node's weight for that event type. Next, multiply each node's n population by its weight for the same event type to get the node's weighted average population. Once you have the weighted average populations for each node, they can be put into a formula for analysis. For example:

Sum(weighted average 1 events) / (Sum(weighted average 1 events) + Sum(weighted average 0 events))
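
The steps above can be sketched in Python as follows. This follows one reading of the procedure (weight = node count divided by the group total for that event type, weighted population = count times weight). The function name weighted_event_rate and the counts are hypothetical and are not the figures from this post.

```python
def weighted_event_rate(nodes):
    """Weighted average event rate for one node group.

    nodes: list of (n0, n1) pairs, the '0 event' and '1 event'
    populations of each end node in the group.
    Assumes the group contains at least one observation.
    """
    # Step 1: group totals per event type.
    total_0 = sum(n0 for n0, _ in nodes)
    total_1 = sum(n1 for _, n1 in nodes)

    def weighted_sum(counts, total):
        # Steps 2 and 3: weight = n / total, weighted population = n * weight.
        if total == 0:
            return 0.0
        return sum(n * (n / total) for n in counts)

    weighted_0 = weighted_sum([n0 for n0, _ in nodes], total_0)
    weighted_1 = weighted_sum([n1 for _, n1 in nodes], total_1)

    # Step 4: share of weighted '1 event' population in the weighted total.
    return weighted_1 / (weighted_0 + weighted_1)


# Hypothetical counts for a two-node group, for illustration only.
hypothetical_group = [(80, 20), (20, 20)]
print(weighted_event_rate(hypothetical_group))  # about 0.227 (20 / 88)
```

Running the function once per node group gives one weighted average for each side of the tree, which can then be compared directly.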

Here is an example with numbers…
 
Weighted average event 1% (nodes 7-10):  0.019394
Weighted average event 1% (nodes 5-6):  0.588547

Other notes:
· These weighted average node percentages can be useful as summary statistics when presenting your tree results.
· Comparing the weighted average node percentages from one time period with those from another can help you understand how the probability of a given event varies between periods (assuming you have good separation between the averages).
