Saturday, December 14, 2013

Interview with Keith Woolner, Director of Baseball Analytics, Cleveland Indians

Welcome to our first official interview for the Statistical Analyst Effectiveness Group (SAEG).  You can find us on LinkedIn and here at this blog.  This interview was with Keith Woolner, Head Analyst at the Cleveland Indians (yes, THE Cleveland Indians professional baseball team!!!).  I met Keith at an R Users meeting here in town (I live in Cleveland) and met up with him again when I had lunch at Progressive Field one afternoon.  He was more than happy to do this interview for us.  I have to admit that when I first met Keith, I was a little star struck.  It's not every day you get to meet someone who analyzes something as exciting as baseball (and for one of my favorite baseball teams)!  I hope you enjoy this interview and that it can be the first of many that SAEG will share with you.

Kirk Harrington, SAEG partner

Kirk, SAEG:  How long have you been an analyst with the Cleveland Indians?  I understand that you are the Lead analyst, correct?  How many analysts report to you?

KW: I've been with the Indians for a little over 6 years.  My title is Director of Baseball Analytics.  When I started, I was a one-person department, but I now have a couple of analysts working directly for me.

Kirk, SAEG:  How did you come to be an analyst for baseball?  What do you find most rewarding about the work you do?

KW: I started my professional career after college as a software developer in Silicon Valley.  I pursued a Master's degree while working, and studied baseball statistics as a hobby.  In the late 1990s, I got involved with a group of people on the Internet who were publishing a baseball research book along the lines of the old Bill James Baseball Abstracts.  That group, Baseball Prospectus, grew to become a well-known presence in the baseball industry, and I ended up co-authoring about 10 books, developing statistical reports for the web site, and pursuing new baseball research.  That body of work eventually came to the attention of the Indians, and I was able to change my hobby into my career.

Kirk, SAEG:  What type of models do you/your group run regularly?  How often and what would trigger the creation of a new model?

KW: Linear and logistic regression are still the stalwarts of most of the analyses we do, but we've also tackled hierarchical models, local regression and other nonparametric statistics, as well as more machine learning approaches such as neural nets, SVMs, etc.  My group works exclusively for the baseball operations group, so all of our work is directed towards decisions our front office and coaching staff have to make about the team on the field.  A lot of effort is spent trying to improve our forecasts of player performance in future years, but we also research in-game strategies (e.g. the best time to try to steal a base, or bring in a reliever), evaluate college and high school players for the amateur draft, assess the impact of changes in the Collective Bargaining Agreement, and so on.

Kirk, SAEG:  What and how are you collecting the information used for your models?
 
KW: We get data from a variety of sources.  From Major League Baseball and some other vendors, we receive a data feed containing detailed play by play information about every game played in the majors and minors every night.  That includes the names of the pitcher, batter, and every fielder, baserunner, and umpire on the field, the inning, number of outs, score, and the result of each play.  In many cases we also get information about the full pitch sequence leading up to the play, the exact location where a batted ball was hit to on the field, the speed of every fastball, the location of every pitch in the zone, and so on.  We also get information from MLB on the contract status, service time, and transaction history of every professional player.  Within our own organization, we have a database of scouting reports on players (major leaguers, minor leaguers, college and high school players) that goes back many years, plus medical information from our training staff, and reports from our coaches and instructors.
 
Kirk, SAEG:  This question comes from Greg, one of our SAEG members:  How much information are you using about player's attitude, lifestyle, and etc. for player evaluation?
 
KW: A player’s personality, habits, work ethic, attitudes, and competitiveness are all aspects of what commonly is called a player’s “makeup”.  Our scouts and player development staff spend a lot of time evaluating a player’s makeup – his ability to adapt to the schedules and rhythm of a baseball life, to receive instruction, to cope with failure, and to work hard at maximizing his skill set.  It’s very hard to quantify, but too important to ignore.  We consider a player’s makeup to be a significant part of who he is, and what he can become.
 
Kirk, SAEG:  Keith...I am amazed by all you're able to learn on baseball!  It's amazing how many data points can be gathered from a single game.  I especially enjoyed knowing that you take into account a player's makeup.  It makes sense to me that this dimension of a player would affect their performance.  This, combined with offensive and defensive performance, would make for an interesting predictive modeling exercise for sure.
 
Exactly what decisions are being made using data analysis?  I know you talked about this a bit in your last set of questions....could you elaborate?  Perhaps it would help to categorize the answer into micro and macro level decisions...just a thought.
 
KW: Data analysis is just one of several inputs being used in decision making.  Chris Antonetti, our GM, takes multiple perspectives into account before making a decision, including input from our field manager Terry Francona and his coaches, our scouting department and player development staff.   I don’t think there’s any decision that is made solely on the basis of data analysis.  But the kinds of things we are asked to analyze would include questions like: How many runs would we save over the course of a season by playing a better defensive player at a certain position?    What kind of offensive production can we expect from player X five years from now?  Which prospects in team Y’s farm system might be worth targeting in a trade?
 
Kirk, SAEG:  Here are some additional questions from some of our SAEG members...

From Mike...
"I would like to know some examples of the kind of revenue generating or cost cutting analytics you do. Sports as a business is something I'm unfamiliar with."  
 
KW: I don’t have much direct involvement with our business analytics, but we do have people working on it.  A couple of articles related to a presentation one of my colleagues gave last spring might give you some insight:  http://www.baseballprospectus.com/article.php?articleid=19854 http://www.fangraphs.com/blogs/sabr-analytics-teams-going-deep-to-attract-new-fans/
 

From Sam...
"How do you get to be an analyst for a professional baseball team?"
 
KW: That’s probably the most common question I get asked.   The bottom line is that it’s very hard to get paid to do baseball analysis, and there are a lot of people interested in doing it (cheaply or even for free).  There are only 30 teams, and not all of them hire even one analyst.
 
There wasn’t really a defined career path to get into baseball analysis, although as sports analysis becomes more mainstream, that’s changing somewhat.  A strong quantitative background, with coursework in probability, statistics, and computer science is certainly helpful.  Becoming proficient with at least one statistical software package such as R, SPSS, SAS, etc. is a must.  I also recommend that people become comfortable with databases and writing SQL queries so they can be self-sufficient with data extraction and preparation.

 
 "Analysis often involves observing the trends of groups like a team or a franchise. How does the baseball analyst make decisions based on individual players when most statistics involve averages for a team (or teams)?"
 

KW: Actually, we collect a great deal of information at the individual player level.  Although we win or lose as a team, baseball is perhaps more separable into individual efforts than most other team sports.  When a batter is at the plate, the outcome is largely determined by his own ability and his opponents’, rather than that of his teammates.  Most fielding opportunities can only be handled by a single player, and so on.  We know which players were involved on every play that occurs during a game.  We have data on where and how hard each batter tends to hit balls. We know what pitches a pitcher throws.  We can count how often a runner takes an extra base on a hit, or attempts a steal.  We have an idea (through modeling) how these individual performances come together to produce team-level results, so we can estimate the effect of a single player on the overall team-level outcomes.
 
What do you think of the movie Moneyball? Has the concept actually changed baseball or was it just hype?
 
The Moneyball movie was enjoyable, albeit exaggerated in places for dramatic effect.  I think they captured the feel of the book pretty well, but wouldn't pretend it's a documentary on how front offices work (then or now).

I think Moneyball (the book) shed light on something that was already starting to happen in baseball.  It made it more prominent, and turned analytics into a catchphrase, but didn't create the change itself.  The industry is clearly very different than it was 15-20 years ago.  Few, if any, teams had full time analysts on staff.  There wouldn't have been an opportunity for someone like me.  Now the majority of teams have at least one, and some have several analysts on board.
 
Kirk, SAEG:  What has been your greatest success as a baseball analyst for the Indians?  What motivates you to do what you do?
 
KW: I think the greatest success has simply been analytics becoming an integrated part of the Indians’ decision making -- knowing that Chris Antonetti or Mark Shapiro relied in part on my work to make potential multimillion dollar baseball decisions.  The fact that they continue to ask for more information and invest in the infrastructure and analysts speaks to the value the organization places  in what we contribute.  And seeing the kind of work that we do spread from baseball operations to other parts of the company (marketing, ticketing, customer service) emphasizes that they buy into data-driven decision making in a lot of different ways.

 Kirk, SAEG:  Tell us of a model that you really enjoyed working on (that perhaps you found to be innovative for your field).  Think about what made it special for you.
 
KW: One of the most exciting developments in recent years has been the availability of PITCHf/x data.  There are multiple cameras installed at every major league park that track a pitched ball in flight between the pitcher’s mound and home plate.  We can measure how much each curveball breaks, how much a sinker sinks, how consistent a pitcher’s release point is, the speed of every pitch thrown, and the location of where every pitch crosses the plate.  From that data, we can create models that automatically classify the type of pitch thrown, track whether a pitcher starts to lose velocity the deeper he goes into games, even how well the catcher frames pitches to get a few more called strikes from the umpire.  Rather than just measuring outcomes of plays, we’re gathering data that relates to the actual physics of the game of baseball.

Kirk, SAEG:  Do you get the chance to meet and talk to any of the players as part of your analysis work?  If so, what kind of data gathering do you do that involved direct contact?
 
KW: For the most part, we interact with other members of the front office, or with Terry Francona and the coaching staff.  The information we create isn’t typically of a form that can be turned into operational knowledge by a player – p-values, inference, and cross-validation don’t help Carlos Santana recognize pitches, develop an approach at the plate, or call a game.  We need to communicate what our work implies for game strategy, roster management, and impact on the field, and then let the staff decide how best to convey what players need to know to try to make it happen.

Kirk, SAEG:  Are you disappointed that Choo left the team?  How do you handle the loss of major players in the lineup?
 
KW: Anyone who works in a front office is, first and foremost, a baseball fan.  And, as fans, we all have favorite players, and guys we like to root for.  But when it comes to making decisions, it’s the team, both present and future, that comes first.  Sometimes that means parting ways with a guy you really like to help in other areas of the team. Roster turnover is a fact of life in baseball, and often times for reasons out of your control.  You handle it by being adaptable, creative, and focused on the larger goal.
 

Kirk, SAEG:  Thank you Keith for your time.  I know this interview will be a home run for our group (pun intended).

A picture I took at Progressive Field, Cleveland, Ohio at a game I attended, Summer 2011

Tuesday, December 3, 2013

My Management Style

I wrote this up recently.  I have managed people before and enjoy doing it.  I have found my style of managing to be quite effective.  I share this with you because some of you (if you don't already) will manage and oversee other analysts.  Perhaps my style will inspire you to manage better in your given role.

My Management Style, By Kirk Harrington

I like to lead by example
I don’t ask those I work with to do things that I would not be willing to do myself.

I have a collaborative style
I like having regular team meetings to understand where everyone is at.  I use this as a way to get people together, to put minds together, and to create an atmosphere of mutual benefit.

I enjoy having fun
I have been known to have special promotions, offering prizes to associates that want a challenge.  I enjoy taking my associates outside the banking office sometimes…to lunch or an outing.  I enjoy rewarding those I work with as I can and as I see it to be appropriate and meaningful to them.

I like to encourage my associates to learn their jobs well
This may involve extra research as necessary to make sure they understand how to do their jobs well.  This activity I would call ‘sharpening the saw’.  This not only benefits the associate, but also the organization.  The better the quality of an associate, the better they can produce for the organization.

I enjoy challenging those I work with…so they can learn and grow in their positions.

If there is a dispute or misunderstanding, I tend to address those personally with whoever the problem is with.  
If there is a matter that can't be solved casually (with non-direct focus) I typically take it to an undisclosed conference room.  I think it’s important to have direct eye contact in those kinds of situations…so that motives can be seen more clearly.  I tend to take the approach of listening and trying to understand the concern, and then try to resolve it.

Sometimes I have noticed that friction can occur because someone feels they can do more but are not being given more opportunities to excel.  These opportunities may not even be offered traditionally by the department.  In these cases, I am creative enough to provide opportunities for the associate to take on more responsibilities.

If there are disputes between 2 (or more) individuals I like to talk to each of them separately…to listen to what each has against the other.  I will then bring them together and act as a neutral party to resolve their dispute.  For example, I might repeat back to each party what the other has said so they can talk it out between themselves.  I might present my own point of view in a matter, but I do not press it.  Typically I have found that taking a side is the wrong thing to do.  After I have taken my approach, people have tended to resolve their own problems and come away happier and more content for doing so.

I try to live by the motto of ‘let him who is greater among you be least’
In other words, I don’t like to think of myself in a management position as ‘better’ than someone that I work with.  I tend to take the approach of ‘serving alongside’ those I work with to help them succeed.  I consider everyone’s opinion just as important as mine, though I do take the liberty to make important business decisions as necessary.  Finally, I have no qualms about taking the side of someone I manage in front of a business line, if I feel that their views are valid.   Not to say that I am combative with the business lines my group serves…quite the opposite.  I hope that as my associates see me treat the business line, they will treat them the same (establishing with them trust and respect).

Tuesday, October 15, 2013

Topics from our LinkedIn group discussions, 1st edition

Since we regularly have discussions of topics on our LinkedIn page, I thought it would be good to post some of those discussions here, to share with our blog readers.  Feel free to discuss below as I post these.  I figure we can have an edition every 3-6 months or so as more of these are released through the Group's LinkedIn group.

SAEG Thought: It's all about fit and reducing error. It's also about accepting imperfection.

There are many models available to try and explain a behavior or a result. The key is to find a method that 'fits' well to your given problem, and reduces your error in predictions. When it comes down to it, if you understand that any result or behavior can be explained mathematically, with a function or line (for example), you can use that knowledge to create the means to understand and explain it. You must also accept that no model you do will be perfect. Understanding the imperfection can help you improve it in the future...but even then it won't be perfect...error will always exist. Minimizing that error is truly key.

PR lunch meetings are a great way to break the ice

Sometimes tensions can rise between analyst and line of business (or management). This may be more due to lack of communication and the natural separation that exists between the two parties (not necessarily due to behavioral problems). I recommend inviting your line of business manager to lunch as a way to 'break the ice' and to create a more 'accepting' bond between you. You could use lunches to talk about business or just to get to know each other better.

Always answer the 'so what'.

Something that has been ingrained in me as an analyst over the years is the importance of answering the 'so what'. 'What does this mean?', you might ask. You may present findings that you know are important, but they may fall flat if you don't tell your audience the importance of 'what' you are showing. In other words, answering the 'so what' lets your audience know why the 'what' of what you are showing is important for them to realize. Are you answering the 'so what?'

Animosity towards analysts from managers with Business Administration degrees. How do you cope?

One thing I have noticed as an analyst is the animosity that managers (typically with Business Administration degrees) have towards analysts. I think this has been seen throughout history too... people with power tend to think themselves better than someone smarter who is under them. Have you seen this in your workplace? How do you handle it? Hmmm...this sounds like a great article idea...be prepared for more writing on this.

This may sound trivial and something you already know, but this really saved me today when trying to dissect and understand a complex formula...PEMDAS...want to guess what it stands for?

Please Excuse My Dear Aunt Sally. It's basically the order of operations when working with any mathematical formula. P is for Parentheses, E is for Exponents, M is for Multiplication, D is for Division, A is for Addition, S is for Subtraction. If you remember this, any complex formula you are confronted with will bow to your mastery!
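
As a quick illustration, here is a small sketch in Python, whose arithmetic operators follow the same precedence rules. The numbers are made up. One detail worth keeping in mind is that multiplication and division share the same rank and are worked left to right, and the same goes for addition and subtraction.

```python
# Order of operations: parentheses, exponents, then multiplication/division
# (left to right), then addition/subtraction (left to right).
no_parens = 2 + 3 * 4 ** 2      # exponent first (16), then 3 * 16, then add 2
with_parens = (2 + 3) * 4 ** 2  # parentheses first (5), then 5 * 16

# Multiplication and division share a rank, so this reads left to right:
left_to_right = 8 / 4 * 2       # (8 / 4) * 2, not 8 / (4 * 2)

print(no_parens, with_parens, left_to_right)
```

When a formula is hard to read, adding your own parentheses to spell out the intended grouping is usually the safest move.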

SAEG Tip for the Day--Formatting, having someone validate your report before sending it to a decision maker

It's the small things that can separate an effective analyst from 'just an analyst'. Before you send an important report to a decision maker at your company, I recommend that you have it validated by someone else prior to final submission. If that is not feasible, validate it yourself against a past report. Further, make it easier for the person on the other end by formatting the report for easy printing. The extra effort will help you save face and assist in helping you come off as an analyst that cares about the end user.

SAEG Tip for the Day--Drawing out your process to avoid analysis paralysis

Sometimes you are analyzing something completely new and you might just rush into the data forest, without having an idea of where you're headed. Before doing this and risking the waste of time and analysis paralysis, try drawing out exactly what you want to solve first and organizing the thoughts and ideas you have or want to prove. It might also help to try and understand the limitations of your data before 'going in'. Once you have a map that you think you want to use, go into your data and use this as a guide to get you where you want to go more

SAEG Tip for the day--making data entry enjoyable

Sometimes data comes in a form that needs to be manually entered. You just can't avoid this sometimes. Besides using cutting and pasting to make date entry easier (if you're working with end of month data, don't forget your leap years!)...having music handy to listen to while you work can make the chore more enjoyable. Of course, your employer needs to allow it and you need to make sure you have good headphones that contain the sound (so you don't bother your co-workers). What is your favorite music to listen to?
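
On the leap year reminder: if you end up keying in end-of-month dates, one way to avoid working them out by hand is to let a standard library do it. The sketch below uses Python's `calendar` module; the helper name `month_end` is made up for this example.

```python
import calendar
from datetime import date

def month_end(year: int, month: int) -> date:
    """Return the last calendar day of the given month."""
    # monthrange returns (weekday of the first day, number of days in month)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)

print(month_end(2012, 2))  # February in a leap year
print(month_end(2013, 2))  # February in a common year
```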




Thursday, March 7, 2013

Questions From You, Episode 1

Welcome to something new at SAEG, a 'Questions From You' ongoing series.  Basically, when you ask us a question whether through LinkedIn or at our address (enduranalyst@yahoo.com), we will do all we can to answer your questions in a professional and timely way.  And the rule is...you can feel safe and assured to ask us any of your analysis questions, no matter the topic and no matter how silly you may think the question is.  So you know, one of the values we seek to uphold at SAEG is respect for you, respect for where you are on the learning curve in your career, and respect for you as a professional.  In no way should you feel demeaned by asking us questions and we hope that in no way we make you feel uncomfortable asking them.  Statistics can be a complicated science and I've seen some who are at very high levels treat those at lower levels in a disrespectful and sometimes dehumanizing way.  Rest assured, you will be safe at SAEG...it is our promise to you.

Also, the caveat that we have at SAEG is that if we don't readily know the answer to your question, we will research it for you (and point you in the direction as appropriate).   Further, let us know if you allow us to use your name in the blog or if you'd rather remain anonymous.  Thank you!

Without further ado, I am happy to start the 'Questions From You', with Episode 1

Questions From You, Episode 1:  On C-STAT and our thoughts on Multinomial Logistic Models

These questions came in from Jing Li, one of our followers:

1. C-STAT is a very commonly used criterion to evaluate how well a predictive model performs. But I sometimes had faced the situation that some variables improve the model fit statistics (for example, Hosmer-Lemeshow lack of fit statistic). But they don't improve the C-STAT or even worse, decreases a little bit. What would we say for these variables? Should they be included in the model or not.

2. Do you think it is a good idea or a bad idea to consider multinomial logistic regression as predictive models? I have come across some strong criticism against that. 


Our answer:
 
Kirk:  Hi Jing. As far as #1 goes, I have not used C-STAT, but Ben has (Ben Nutter is one of our Partners at SAEG). Here was his answer to you:

Ben:  The c-statistic is a great tool, but it isn’t one I consider when evaluating a model’s fit. The purpose of the c-statistic is to measure how well a model discriminates between an event and a non-event. Sometimes, you can improve the c-statistic by degrading the traditional fit of the model, but if your goal is prediction, that’s a sacrifice worth making. This isn’t to say that we should ignore model fit when attempting to predict outcomes—for instance, we definitely don’t want to overfit the model—but we might tolerate some multicollinearity for the sake of improving our prediction.

Ultimately, AIC, Hosmer-Lemeshow, and the number of degrees of freedom relative to the number of events are better methods for evaluating fit than the C-statistic itself.
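
For readers who have not worked with the c-statistic, here is a minimal sketch of how it can be computed by hand. It takes every event/non-event pair and counts how often the event got the higher predicted probability, with ties counting as half. The function name and the data are made up for illustration, and this is not taken from Ben's own code.

```python
def c_statistic(y_true, y_prob):
    """Concordance (c) statistic: share of event/non-event pairs in which
    the event received the higher predicted probability (ties count 0.5)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Made-up outcomes and predicted probabilities, for illustration only
y = [0, 0, 1, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(c_statistic(y, p))
```

A value of 0.5 means the model discriminates no better than chance, and 1.0 means every event was ranked above every non-event. Because only the ranking matters, a variable can improve calibration-type fit statistics without moving this number.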


Kirk:  As far as #2--Can I ask in what context you are using the multinomial logistic model? Ben has not done them; however, I have seen them used before for Asset/Liability models. In my readings on them, I understand that they are very similar to logistic models, except that they predict multiple outcomes. Let me know what your context is and I can better answer your question.

Jing's reply...

Kirk,
Thanks for your reply.

The predicting modelling that I am doing is related to hospital admission. So the dependent variables would be a certain threshold of days. For example, whether the patient would get admitted within 30 days or not.

By multinomial, I originally was thinking to create the dependent variables as 0-30 days, 30-60 days, etc. But someone has criticized this approach by pointing out that the nature of time is not suitable for this type of categorization. Please let me know what you think.

Thanks a lot for your help and feel free to post this question to the forum if you feel more appropriate.

Best regards,
Jing  


Our reply...

Jing--here's a start--am also checking with other people and will get back to you...

I was reading this article about Multinomial logit models and it suggests that the dependent variable has 'no natural order'. In your case, I think it does, because someone can go from one time group to another and there is an order to the time.
http://kurt.schmidheiny.name/teaching/multinomialchoice2up.pdf

Further, it puts it into the terms of someone 'choosing' based on characteristics (or categories). Other things I've read talk about 'placement' to a membership. From what you are doing, it seems these are day groups that people 'fall into', vs. by choice or placement into.

Further, something I find useful whenever I'm trying to decide whether or not to use a given model or how to prepare its variables is to look at the assumptions. Are any of the assumptions violated if you pursue a certain path? If you look at the assumptions of the multinomial logistic regression for example (check out this article:
http://www.unt.edu/rss/class/Jon/Benchmarks/MLR_JDS_Aug2011.pdf ), one is the assumption of independence among the dependent variable choices. It says that "This assumption states that the choice of or membership in one category is not related to the choice or membership of another category (i.e., the dependent variable). The assumption of independence can be tested with the Hausman-McFadden test."

I found a nice article about this Hausman-McFadden test:
http://home.comcast.net/~alan.clayton-matthews/pp745/IIA_Hausman_Test.pdf

Basically it's a test that "tests the null hypothesis that the inclusion of the unskilled occupation category does not change the odds ratio of the other pairs of choices". I would recommend you try this test. My guess is that since someone having been in 0-30 would affect someone going into 30-60, and would also affect someone going into 60-90, this would go against the assumption required for the dependent variable for the multinomial logit model.

You might consider using a different modeling technique that does not have this stringent assumption...for example, I would suggest going the decision tree route (CHAID, CRT, etc.) as you may have better luck.


Jing's reply...

Kirk,
Thank you very much for your comments. They are definitely helpful.
Regarding the multi-nomial regression, you are right that our categories regarding the membership of categories are NOT independent because they are following a time sequence. Survival analysis or something similar might be a more appropriate option. 


Aside--upon further research, I was able to find an academic article through a library database for Jing, that looked to be about the same exact problem she was trying to model.  Here is the reference for that article, for those interested...









This leads me to an Effective Analyst Thought...

If you're modeling something new or have questions about your approach, try delving into academic articles!  If you find a good source of vetted articles (i.e. I really like JSTOR), chances are that you can find someone that has done something similar to what you are doing and it can provide you enlightenment on your path forward.


 
 

Saturday, January 26, 2013

Using the AIC to Fit a Continuous Variable for a Logistic Model

One of the assumptions that must be met for continuous variables in a logistic model is linearity with the transformed dependent variable*.  A useful method that was shared with me by one of our SAEG partners (Benjamin Nutter) and tested by myself (and found to be very useful) is using the AIC to test various linear transformations against the dependent variable in order to find a best relationship fit.  Here is the method we worked out together...

First run a simple logistic regression of the continuous variable against the dependent 1,0 variable.
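
For those who want to try this step outside of a statistics package, here is a minimal sketch in plain Python. It fits a one-variable logistic regression by Newton-Raphson and returns the predicted probabilities. The function names and the small dataset are made up for illustration, and a real analysis would normally use a tested library routine instead.

```python
import math

def fit_simple_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g) and information matrix (h)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system h * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

def predict(x, b0, b1):
    """Predicted probability of the event for each value of x."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Made-up continuous variable and 1/0 outcome, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_simple_logistic(x, y)
for xi, pi in zip(x, predict(x, b0, b1)):
    print(xi, round(pi, 3))
```

The printed pairs are what you would plot: the variable on one axis and its predicted probability on the other. Note that this sketch assumes the outcome is not perfectly separated by the variable, since the estimates would not converge in that case.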

Then, graph the predicted probabilities against the variable.  Here are some examples of shapes (drawn by connecting the observations with a line) my output came out with:




As you can see, the shape of the relationship between the variable and the predicted probability  (which you obtain from the simple logistic regression) can vary.  Further, what is not shown here is that the concentration of the observations along a given shape can differ.  So, for example, one of these shapes may look like this if looking at the individual observations along the shape (I've included the connecting line just as a visual):



Here, most of the observations form a fairly straight line up to a point where there is a deviation from the most observed pattern.  If you do a Q-Q plot and distribution of the variable, this deviation from a normal value can also be seen (which is evidence of an outlier).  While removing outliers to improve a variable's linearity is not discussed in this article, taking out the outliers prior to running the next steps I will discuss is a consideration that could be made.

The main thing to observe is that if the shape of the relationship between the predicted probability and the continuous variable is not a straight line (most I saw were bowed, even if slightly), a linear transformation can be used to improve the fit.

After running several diagnostics to understand the range of the given continuous variable, I would create a package of transformations to try.  For variables with all positive observations (even with some zeroes) I might try 'square root', 'log base 10', 'natural log', 'reciprocal', 'second order polynomial' and '3rd order polynomial'.  Just remember when doing these that if you have 0's with the logs and reciprocals you have to set the result to zero (your software will likely make the result missing).  For variables that had negative and positive values (with some 0's) I would choose a more limited package, say 'reciprocal', 'second order polynomial', and '3rd order polynomial'.  Similarly, for the reciprocal transformation, I would make sure that any zero set to missing after the calculation ran was set to 0.
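
The package of transformations and the zero-handling rule described above could be sketched roughly as follows. The function name and dictionary keys are made up for this example. The 'squared' and 'cubed' entries are the extra terms that would be entered alongside the original variable to form the polynomial models.

```python
import math

def build_transformations(values):
    """Return candidate transformations of one continuous variable.

    Zeros are mapped to 0 for the logs and the reciprocal, following the
    rule described in the text (software would otherwise return missing).
    """
    has_negative = any(v < 0 for v in values)
    candidates = {
        "untransformed": list(values),
        "reciprocal": [1.0 / v if v != 0 else 0.0 for v in values],
        "squared": [v ** 2 for v in values],  # extra term for 2nd order polynomial
        "cubed": [v ** 3 for v in values],    # extra term for 3rd order polynomial
    }
    if not has_negative:
        # Roots and logs are only offered when every value is zero or positive
        candidates["square root"] = [math.sqrt(v) for v in values]
        candidates["log base 10"] = [math.log10(v) if v > 0 else 0.0 for v in values]
        candidates["natural log"] = [math.log(v) if v > 0 else 0.0 for v in values]
    return candidates

print(sorted(build_transformations([0, 1, 4, 100])))
print(sorted(build_transformations([-2, 0, 2])))
```

One thing to watch with this rule is that a 0 and a 1 both come out as 0 under the log transformations, so it may be worth checking how many zeros the variable has before relying on them.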

Next, after creating the transformations, I would run each through separate simple logistic regressions (one for each transformation), this time keeping my residuals.  With the residuals, you can calculate the AIC using this formula**:

AIC = n * Ln(RSS/n) + 2k

Where...
n=number of total observations in your dataset
RSS is derived by just squaring each residual and coming up with its sum (the residual sum of squares)
k=the number of parameters in your model (in this testing case it is two, the independent transformation variable and the constant)
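
As a rough sketch, the formula above can be computed from saved residuals like this. The function name and the residual values are made up for illustration. Keep in mind that this is the residual-sum-of-squares form of the AIC given above; statistics packages usually report a likelihood-based AIC for logistic models, so their numbers will not match this one and could rank candidates differently.

```python
import math

def aic_from_residuals(residuals, k):
    """AIC = n * ln(RSS / n) + 2k, with RSS the residual sum of squares."""
    n = len(residuals)
    rss = sum(r ** 2 for r in residuals)
    return n * math.log(rss / n) + 2 * k

# Made-up residuals from two candidate models, for illustration only
candidates = {
    "untransformed": [0.42, -0.38, 0.51, -0.47, 0.33, -0.29],
    "natural log": [0.35, -0.31, 0.44, -0.40, 0.30, -0.27],
}
aics = {name: aic_from_residuals(res, k=2) for name, res in candidates.items()}
best = min(aics, key=aics.get)  # the lowest AIC marks the preferred candidate
print(aics)
print(best)
```

Here k is 2 for each candidate (one variable plus the constant). A polynomial model has one more parameter per added term, so k would need to go up accordingly.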

Once you have run an AIC for each transformation, the one with the best fit is where the AIC is minimized.  You will also want to run the AIC for the non-transformed variable as a way of comparison.  Here are some sample results and the transformation chosen based on the results:




 Transformation chosen:  2nd order polynomial***



Useful References
* Discovering Statistics Using SPSS 3rd edition by Andy Field, P273--assumptions of logistic regression-- 1) Linearity 2) Independence of errors 3) Multicollinearity or rather non multicollinearity of your data
**If you want a source for this formula, I found it in this presentation, page 9:
http://www4.ncsu.edu/~shu3/Presentation/AIC.pdf

*** A second order polynomial transformation is simply X + X^2 where X=the observed variable.  A third order polynomial is X+X^2+X^3.  Note that I do not know how to write the notation for squared and cubed in this blog, so the ^ implies 'raised to the power of'