blog.kanedo.net

Notiz vom 05.02.2025

Flavors of overfitting

However, I still think we can use the Soviet Tank Problem as a machine-learning Aesop fable. Is this a parable of overfitting? What word would we use here to describe that the machine learning algorithm is latching on to the wrong “concepts” in the train/test corpus? The issue here seems to be the data is not fully representative of how we will evaluate the algorithm in the field. […] The problem is you collected data that was insufficient to pin down the prediction problem for a machine learning system. Because pattern recognition is atheoretical, the only way we can articulate our evaluation expectations is to declare data is representative and sufficient for statistical pattern recognition. In other words, the Soviet Tank Problem is an evaluation problem.

This! This happens all the time.