[Q] Logistic Regression and Collinearity in League of Legends


I’ve been researching how to use statistics to predict wins and losses in League of Legends, mainly for an individual player on whom I’ve collected quite a bit of data: 22 variables’ worth. With that much information, many of the variables have relatively little predictive power, and some are collinear with each other.

When I first built logistic models for this, I used AIC and VIF to eliminate non-significant and collinear variables. This left me with a 6-variable logistic model that performed pretty well on my test data but left out some highly important variables, in particular two that have very strong predictive power but are collinear with each other. Even though they are collinear, each affects the game in its own right, since both provide more economy to the team that gets more of them.
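For reference, here is a minimal sketch of the kind of VIF check described above, run on simulated data rather than my real games. It assumes NumPy is available. The columns `gold_lead` and `towers` are hypothetical stand-ins for the two collinear variables, and `deaths` stands in for an unrelated predictor. It relies on the fact that the VIF of each predictor equals the matching diagonal entry of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    X has one row per game and one column per predictor. VIF_j equals
    1 / (1 - R_j^2), which is the j-th diagonal entry of the inverse
    of the predictors' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated data only: these names and numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n = 500
gold_lead = rng.normal(size=n)
towers = gold_lead + 0.3 * rng.normal(size=n)  # built to track gold_lead closely
deaths = rng.normal(size=n)                    # independent of the other two

X = np.column_stack([gold_lead, towers, deaths])
vifs = vif(X)
for name, value in zip(["gold_lead", "towers", "deaths"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

In this toy setup the two linked columns should come out with clearly inflated VIFs while the independent one stays near 1, which is the pattern I would expect to see if the two variables were as collinear as I think they are.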

However, when I include these two variables in my model, almost every other predictor becomes non-significant by p-value. This is a problem because all of the other variables do impact the game to some degree. They are things like the number of deaths, kills, and assists a player has and how many objectives they took, and all of them have decent predictive power when run in a single-variable logistic regression for win/loss prediction. None of them is individually collinear with either of the two powerful collinear variables mentioned earlier.

and linear models which showed a strong linear relationship so I don’t know why this isn’t showing up in the VIF.)

**Tl;dr** I am trying to create logistic models for win/loss prediction but don’t know how to address collinearity and the non-significance of previously significant predictors when the collinear variables are included.