Friday, April 11, 2014

The Value of an Index


I would put this topic in the category of pretty basic, but useful to remember.  Computing an index value is simple, as the calculation below shows; interpreting it is where the value of an index gets more interesting.

The way to calculate an index is as follows:

Index = (value / base value) * 100
The main thing you need before calculating an index is a data set.  Let's use population by state as an example.  Let's say I have a sample of 10 states and I want to prepare an index value based on the mean of the population (so if there is a new value, I want to know where it stands compared to the other states in my sample).  Here is a sample of 10 states, with some basic descriptive statistics.  All figures are real data from the April 1, 2010 US Census Bureau stats (Annual Estimates of the Population for the United States, Regions, States, and Puerto Rico, NST-EST2013-01; link:  http://www.census.gov/popest/data/state/totals/2013/):

Alabama:  4,779,739
Florida:  18,801,310
Idaho:  1,567,582
Utah:  2,763,885
New York:  19,378,102
California:  37,253,956
Ohio:  11,536,504
Vermont:  625,741
West Virginia:  1,852,994
Pennsylvania:  12,702,379

Mean:  11,126,219
Median:  8,158,121.5
Minimum:  625,741
Maximum:  37,253,956

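Now, if you want to create an index value based on this sample, using, say, its mean as the ‘base or comparison value’ (and see how new states you bring in compare to that index), take these steps: divide each value in your sample by the mean, then multiply by 100.  You will get the following results:

Alabama:  42.96
Florida:  168.98
Idaho:  14.09
Utah:  24.84
New York:  174.17
California:  334.83
Ohio:  103.69
Vermont:  5.62
West Virginia:  16.65
Pennsylvania:  114.17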
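The steps above can be sketched in a few lines of Python.  This is only a minimal sketch: the `populations` dictionary and the `index_value` helper are names I made up for illustration, and the base is simply the mean of the ten populations listed earlier.

```python
from statistics import mean

# 2010 Census populations for the ten sample states listed above.
populations = {
    "Alabama": 4_779_739,
    "Florida": 18_801_310,
    "Idaho": 1_567_582,
    "Utah": 2_763_885,
    "New York": 19_378_102,
    "California": 37_253_956,
    "Ohio": 11_536_504,
    "Vermont": 625_741,
    "West Virginia": 1_852_994,
    "Pennsylvania": 12_702_379,
}


def index_value(value, base):
    """Express value as an index against base (the base itself scores 100)."""
    return value / base * 100


# Use the sample mean as the base (comparison) value.
base = mean(populations.values())

# Index every state against that base, rounded to two decimal places.
indexes = {state: round(index_value(pop, base), 2)
           for state, pop in populations.items()}

for state, idx in indexes.items():
    print(f"{state}: {idx:.2f}")
```

Swapping in a different base only means changing the one line that sets `base`.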
 

It is also helpful to plot the distribution of your index values in a graph.

As you can tell, your index ranges from 5.62 (Vermont) on the low end to 334.83 (California) on the high end.  To interpret this, take Ohio at 103.69: subtract 100 from the index value and you can say that Ohio is 3.69% higher than the average (11,126,219).  Math-ing it out: 103.69 - 100 = 3.69, then 11,126,219 + (0.0369 * 11,126,219) = est. 11,536,776.  This is not exact because I only went out two decimal places.  The further you go out in decimal places, the closer you will get to the 11,536,504 actual (you get the picture).  If your state is below the mean, this also works.  Take Vermont, for example: 5.62 - 100 = -94.38, meaning that its population is 94.38 percent LESS than the average.  Math-ing that out comes to 11,126,219 - (11,126,219 * 0.9438) = 11,126,219 - 10,500,925.49 = 625,293.51.  Again, this is not exactly the actual value of 625,741 because I went out only two decimal places (it's close enough for example purposes).
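Here is a small sketch of that interpretation step in Python, going from an index value back to a percent difference and an estimated population.  The function names are my own, and the base is the sample mean (11,126,219.2) worked out from the ten states above.

```python
def percent_vs_base(index):
    """Percent above (positive) or below (negative) the base value."""
    return index - 100


def value_from_index(index, base):
    """Undo the index calculation to recover an estimated original value."""
    return base * index / 100


# Mean of the ten sample populations (assumed base for this example).
base = 11_126_219.2

for state, index in [("Ohio", 103.69), ("Vermont", 5.62)]:
    pct = percent_vs_base(index)
    est = value_from_index(index, base)
    print(f"{state}: {pct:+.2f}% vs. the mean, estimated population {est:,.0f}")
```

Because the index values are rounded to two decimals, the estimates land close to, but not exactly on, the actual populations.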

Now let's say you want to bring in a new state, say Texas, population 25,145,561.  If you divide its value by your average and multiply by 100, its index value is 226.00 (or 126.00% higher than the average).  That is somewhere between New York and California in your sample.  Thus, any new state can be brought in and measured against your original sample average of 11,126,219.
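To check a new value against the original sample, something like the following would do.  It is a sketch, not the only way to do it: `sample` is just the ten populations from the list above, and the variable names are mine.

```python
from statistics import mean

# The ten sample populations (2010 Census), in the order listed earlier.
sample = [4_779_739, 18_801_310, 1_567_582, 2_763_885, 19_378_102,
          37_253_956, 11_536_504, 625_741, 1_852_994, 12_702_379]

# The base stays fixed at the original sample mean; the new state
# is measured against it and does not change it.
base = mean(sample)

texas = 25_145_561
texas_index = texas / base * 100

# How many states in the sample have a smaller population than the new one.
smaller = sum(1 for pop in sample if pop < texas)

print(f"Texas index: {texas_index:.2f}")
print(f"Texas is larger than {smaller} of the {len(sample)} sample states")
```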

Note that you can really use any ‘base’ value to compare against, depending on your purpose.  If you use the minimum, you create an index that compares against Vermont.  If you use the median, you create an index against a midpoint value for the states in your sample.  If you wanted to, you could take an average of all states and use that as the base value for comparing against the population of a region within another country (for example).  Really, your choice of base depends on your purpose, and you will know what that is.
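As a rough illustration of how the choice of base changes the picture, this sketch indexes Ohio against three different bases taken from the same sample.  The `bases` dictionary and its labels are my own naming.

```python
from statistics import mean, median

# The ten sample populations (2010 Census), in the order listed earlier.
sample = [4_779_739, 18_801_310, 1_567_582, 2_763_885, 19_378_102,
          37_253_956, 11_536_504, 625_741, 1_852_994, 12_702_379]

ohio = 11_536_504

# Three candidate base values drawn from the same sample.
bases = {
    "mean": mean(sample),
    "median": median(sample),
    "minimum (Vermont)": min(sample),
}

# The same state gets a very different index depending on the base.
ohio_indexes = {name: ohio / base * 100 for name, base in bases.items()}

for name, idx in ohio_indexes.items():
    print(f"Ohio indexed against the {name}: {idx:.2f}")
```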

Now, let's say you want to simplify things and make index values that have less variation (especially since in our sample, California is so high and Vermont is so low).  To create a less varied index, you can transform the population first, then calculate your base value and create an adjusted index (understanding that if you want to come out with a value that can be interpreted, you have to un-transform afterwards).  For simplicity's sake, I will only look at the square root and log transformations.  The results and adjusted indexes come out as follows:
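A sketch of the transform-then-index idea in Python follows.  The `adjusted_index` and `spread` helpers are hypothetical names, and I am assuming the natural log here (the index itself comes out the same whichever log base you pick, since the base cancels in the ratio).

```python
import math
from statistics import mean

# 2010 Census populations for the ten sample states listed above.
populations = {
    "Alabama": 4_779_739, "Florida": 18_801_310, "Idaho": 1_567_582,
    "Utah": 2_763_885, "New York": 19_378_102, "California": 37_253_956,
    "Ohio": 11_536_504, "Vermont": 625_741, "West Virginia": 1_852_994,
    "Pennsylvania": 12_702_379,
}


def adjusted_index(values, transform):
    """Transform each value, then index it against the mean of the transformed values."""
    transformed = {name: transform(v) for name, v in values.items()}
    base = mean(transformed.values())
    return {name: t / base * 100 for name, t in transformed.items()}


def spread(index):
    """Distance between the highest and lowest index value."""
    return max(index.values()) - min(index.values())


raw_index = adjusted_index(populations, float)      # no transformation
sqrt_index = adjusted_index(populations, math.sqrt)  # square root
log_index = adjusted_index(populations, math.log)    # natural log (assumption)

for label, index in [("raw", raw_index), ("sqrt", sqrt_index), ("log", log_index)]:
    print(f"{label}: Ohio = {index['Ohio']:.2f}, spread = {spread(index):.2f}")
```

The spread shrinks as the transformation gets stronger, which is the whole point of transforming first.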

 





 



You can also tell from graphs of the adjusted indexes that there is less variation (notice also that the means are closer to the medians with the transformations).
 

Interpreting the differences from your base is a little different with a transformation.  Let's say you use the square root transformation and your state is Ohio.  Its index value for the square root transformation is 116.90.  That means it is about 16.90% higher than the average of the square roots of the populations.  Math-ing that would be 2906 + (2906 * 0.169) = 3397.114.  Notice how closely this matches the actual square root of Ohio's population (about 3396.54).  To convert this back to the original population figure, of course you square it: about 11,540,384 (again, the rounding gives you an estimated number, not the exact one).
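The un-transform step can be sketched like this.  The function name is made up for the example, and the inputs are the rounded figures from the paragraph above (index 116.90, square root average 2906), so the result is an estimate.

```python
def value_from_sqrt_index(index, sqrt_base):
    """Recover an estimated original value from a square-root-adjusted index."""
    # Step 1: undo the index to get back to the square root scale.
    estimated_sqrt = sqrt_base * index / 100
    # Step 2: undo the square root transformation by squaring.
    return estimated_sqrt ** 2


ohio_estimate = value_from_sqrt_index(116.90, 2906)
ohio_actual = 11_536_504

print(f"Estimated: {ohio_estimate:,.0f}")
print(f"Actual:    {ohio_actual:,}")
print(f"Off by:    {abs(ohio_estimate - ohio_actual) / ohio_actual:.3%}")
```

For a log transformation the same idea applies, with the exponential taking the place of squaring.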
