Every classification model example I've seen uses a binary target for whether the home team wins or loses, so that's how I've always built my classification models. The target is "home_win" or some variation of that.
I started thinking today about cutting my feature set in half and only using data related to the home team. Partly as an efficiency exercise. Reduce the number of variables in play, don't throw the kitchen sink at the model.
What prompted this was my collection of "differential" variables. For example, if I have "away average points" and "home average points," I'll turn that into "home average points differential" and use that engineered feature instead of the two individual features. But my differential features are always keyed to the home team, and those features are consistently among the best performers.
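For anyone who wants to picture the differential step, here's a rough sketch in plain Python. It assumes the differential is home minus away, and the column names (`home_avg_points`, `away_avg_points`, `home_win`) and the helper `add_home_differentials` are made up for illustration, not taken from any real pipeline.

```python
def add_home_differentials(game, stats):
    """Return a copy of one game row with each home/away stat pair
    collapsed into a single home-keyed differential (home minus away)."""
    row = dict(game)
    for stat in stats:
        # Remove the two raw columns and keep only their difference.
        home = row.pop(f"home_{stat}")
        away = row.pop(f"away_{stat}")
        # Positive means the home team is ahead on this stat.
        row[f"home_{stat}_diff"] = home - away
    return row


# Hypothetical rows; the numbers are invented for the example.
games = [
    {"home_avg_points": 110.0, "away_avg_points": 105.0, "home_win": 1},
    {"home_avg_points": 102.5, "away_avg_points": 108.5, "home_win": 0},
]
features = [add_home_differentials(g, ["avg_points"]) for g in games]
```

One thing this makes visible: the differential still carries the away team's number inside it, so a model built only on home-keyed differentials isn't really ignoring away data, it's just seeing it in compressed form.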
I'll backtest my theory of course, but I was curious how many of y'all use models that emphasize home team data vs a mix of both home and away.
In theory, that would be reflected in the different ranges of rolling average windows. So in your example, a player gets 2-3 games against terrible teams -- that would show up in the "Last5" rolling average, but not as much in the "Last40" rolling average.
The same thing applies to rollover years. XYZ team now has new players & coaches, and they're 10 games into the new season. So for a little while your "Last40" rolling average will be influenced by last year's stats, but the "Last5" and "Last10" averages should balance that out.
I agree with you that using only small windows like Last5 and Last10 could really skew the results.
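To put rough numbers on the window idea, here's a small sketch in plain Python. The helper `trailing_means` and the point totals are invented for illustration: a player sitting at a steady 20 points who then gets three 35-point games against weak teams.

```python
def trailing_means(history, windows=(5, 10, 40)):
    """Average of the most recent `w` values for each window size.
    If fewer than `w` games exist, averages whatever is available."""
    means = {}
    for w in windows:
        recent = history[-w:]
        means[f"Last{w}"] = sum(recent) / len(recent) if recent else None
    return means


# Hypothetical scoring log: 40 ordinary games, then 3 inflated ones.
history = [20.0] * 40 + [35.0] * 3
means = trailing_means(history)
print(means)  # {'Last5': 29.0, 'Last10': 24.5, 'Last40': 21.125}
```

The short window jumps to 29 while the long one barely moves off 20, which is the balancing effect described above. The same mechanics apply in reverse early in a new season, when the long window is still mostly last year's games.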
Stats Considering Number of Games
algobetting