I recently found out that overtraining is a thing that can happen.
(I know it's in the guide, but when you (I mean, me) read the guide for the first time, you filter out everything that doesn't answer the question "how do you make this thing work at all?")
I searched the forum and, unless I'm using the search wrong, there are 10 mentions of this topic.
I figured that it can happen if you train the model for too long: once the model has learned more or less everything it can, it starts ignoring what the loss values tell it and gets worse? I have no idea how it works (or even whether it works the way I described).
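To picture what I mean, here is a minimal Python sketch of the usual general machine-learning check for this: the loss on images the model never trains on stops improving while training continues. This is my assumption, not something from the guide. The function name `overfit_onset`, the `patience` parameter and all the numbers are made up, and I don't know whether the tool reports a held-out loss at all.

```python
def overfit_onset(held_out_losses, patience=3):
    """Return the index of the best checkpoint once the held-out loss has
    failed to improve for `patience` checkpoints in a row, else None."""
    best_idx = 0
    for i, loss in enumerate(held_out_losses):
        if loss < held_out_losses[best_idx]:
            best_idx = i  # new best checkpoint so far
        elif i - best_idx >= patience:
            return best_idx  # no improvement for `patience` checkpoints
    return None


# Made-up losses, one value per saved checkpoint (hypothetical data).
held_out = [0.92, 0.65, 0.52, 0.50, 0.53, 0.57, 0.61]
print(overfit_onset(held_out))  # prints 3
```

If something like this applied, the checkpoint at the returned index would be the one to keep.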
I also figured there's no silver bullet for telling whether you've reached the point where overtraining starts. You need to watch the preview, and I'm not good at that. The random changes between iterations are way bigger than the incremental improvement over thousands of iterations. Sometimes the preview in the timeline looks worse than it did 100k iterations earlier, and then it gets better again. At least to my eyes.
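One thing that might help with the noise, at least for the loss numbers rather than the preview: average the values over a window and compare windows instead of single iterations. A minimal Python sketch, assuming the loss values can be exported as a plain list (that is an assumption, and `moving_average` and `trend` are names I made up):

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def trend(values, window):
    """Mean of the last window minus the mean of the window before it.
    Negative means the loss is still falling on average."""
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window values")
    recent = sum(values[-window:]) / window
    previous = sum(values[-2 * window:-window]) / window
    return recent - previous


# Made-up noisy losses: each step jumps around, but the average drops.
losses = [4.0, 2.0, 4.0, 2.0, 3.0, 1.0, 3.0, 1.0]
print(moving_average(losses, 4))
print(trend(losses, 4))  # prints -1.0
```

The window size is a judgment call: too small and the noise still dominates, too large and a real change shows up late.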
I also figured that, to somewhat prevent overtraining, you can add new data so that the model has valid things to learn. That is a separate question from whether you already have enough data for a decent model.
So I guess if you are worried about overtraining, you could keep some of your training data in a stash, start with less (but enough), and add batches of images to the model's training data gradually, say, one batch every few hundred thousand iterations? It would probably make the process a bit less efficient, but it would protect against overtraining, because there would more or less always be some new valid data for the model to learn. Something like that?
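To make the stash idea concrete, here is a minimal Python sketch of the schedule I have in mind. It only decides which images would be in the training folder at a given iteration; it does not talk to any training tool, and the names `staged_batches`, `active_images` and `release_every` are made up for illustration.

```python
def staged_batches(images, initial, batch_size):
    """Split `images` into a starting set and a list of stashed batches."""
    if initial <= 0 or batch_size <= 0:
        raise ValueError("initial and batch_size must be positive")
    start = images[:initial]
    stash = [images[i:i + batch_size]
             for i in range(initial, len(images), batch_size)]
    return start, stash


def active_images(start, stash, iteration, release_every):
    """Images available at `iteration` when one stashed batch is released
    every `release_every` iterations."""
    released = min(iteration // release_every, len(stash))
    active = list(start)
    for batch in stash[:released]:
        active.extend(batch)
    return active


# Hypothetical file names and numbers.
images = [f"img_{i:04d}.png" for i in range(10)]
start, stash = staged_batches(images, initial=4, batch_size=3)
for it in (0, 300_000, 600_000):
    print(it, len(active_images(start, stash, it, release_every=300_000)))
```

Whether releasing data this way actually protects against overtraining is exactly what I'm asking; the sketch only shows the bookkeeping.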
I guess my questions are:
- Did I get everything right?
- Are there any more pointers on when overtraining can happen? For example, "You don't need to worry about overtraining until you are way beyond 10M iterations".
- Are there good ways to prevent overtraining other than "don't train it after it's already good enough"? I mean, it looks like a good suggestion, but I have trouble judging whether the model is already good enough (as I described above).