Overtraining question

Post by **Fed** » Thu Apr 27, 2023 12:23 pm

I recently found out that overtraining is a thing that can happen.
(I know it's in the guide, but when you (I mean, me) read the guide for the first time, you filter out everything that doesn't answer the question "how do you make this thing work at all?")
I searched the forum and, unless I'm using the search wrong, there are 10 mentions of this topic.

I figured that it could happen if you train the model for too long. Is it when the model has learned more or less everything it can, then starts ignoring the loss values and gets worse? I have no idea how it works (or even if it works the way I described).

Also, I figured there's no silver bullet for deciding whether you've reached the point where overtraining starts. You need to watch the preview, and I'm not good at that. The random changes between iterations are way bigger than the incremental improvement over thousands of iterations. Sometimes the preview in the timeline looks worse than it did 100k iterations before, and then it gets better again. At least to my eyes.
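
The preview is hard to smooth by eye, but the same "noise bigger than progress" problem shows up in the loss numbers, and a common general trick there is an exponential moving average (EMA). The sketch below is a minimal illustration under stated assumptions: the loss values are synthetic, `ema` is a hypothetical helper (not a Faceswap function), and `alpha=0.01` is an arbitrary choice. Smaller `alpha` means heavier smoothing but more lag behind the real trend.

```python
import random


def ema(values, alpha=0.01):
    """Exponential moving average: each output is alpha * new value
    + (1 - alpha) * previous smoothed value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for x in values:
        # The first point seeds the average; later points only nudge it.
        current = x if current is None else alpha * x + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# Synthetic loss curve (made-up numbers): a slow downward drift buried
# under per-iteration noise about as large as the drift over 2000 iterations.
random.seed(0)
raw = [0.05 - 0.00001 * i + random.uniform(-0.01, 0.01) for i in range(2000)]
smooth = ema(raw, alpha=0.01)

print(f"raw:      first {raw[0]:.4f}, last {raw[-1]:.4f}")
print(f"smoothed: first {smooth[0]:.4f}, last {smooth[-1]:.4f}")
```

On the raw values the first and last points can look almost equally good; the smoothed curve makes the slow downward trend visible.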

I also figured that to somewhat prevent overtraining you can add new data, so that the model has valid things to learn. And that's a separate question from whether you already have enough data for a decent model.
So I guess if you are worried about overtraining, you could keep some of your training data in a stash, start with less (but enough) and add batches of images to the model's training data gradually, like, a batch every several 100k iterations? It would make the process a bit less effective probably, but will protect against overtraining because there will +/- always be some new valid data for the model to learn. Something like that?
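
To make the proposed schedule concrete, here is a small sketch of how many images would be in play at a given iteration. Everything in it is hypothetical: `images_available` is a made-up helper, the numbers are invented, and this is not a Faceswap option; it only restates the proposal as arithmetic.

```python
def images_available(iteration, initial, batch_size, every, total):
    """How many training images the staged schedule would allow at `iteration`:
    start with `initial`, add `batch_size` more every `every` iterations,
    and never exceed the `total` you actually have."""
    if min(initial, batch_size, every, total) <= 0:
        raise ValueError("all schedule parameters must be positive")
    batches_added = iteration // every
    return min(total, initial + batches_added * batch_size)


# Hypothetical numbers: 5000 faces in total, start with 2000,
# release 500 more every 200k iterations.
for it in (0, 199_999, 200_000, 450_000, 2_000_000):
    print(it, images_available(it, initial=2000, batch_size=500,
                               every=200_000, total=5000))
```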

I guess my questions are:

Did I get everything right?
Are there any more pointers on when overtraining can happen? Like for example "You don't need to worry about overtraining until you hit way beyond 10m iterations".
Are there good ways to prevent overtraining other than "don't train it after it's already good enough"? I mean, it looks like a good suggestion, but I have problems evaluating whether the model is already good enough (as I described above).

Post by **torzdf** » Thu Apr 27, 2023 12:39 pm

Thanks for this, it's a good post.

Overtraining is definitely a thing. However, I have never seen it happen in Faceswap (and I have trained models a VERY long way). That is not to say it isn't a thing, just that I've never hit it (I train with a LOT of data).

Overtraining is generally when the model performs well on data it has seen, but badly on new data. The steps you have shown will help mitigate this.
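
That definition suggests the usual general machine-learning check: keep a few faces the model never trains on, periodically measure loss on them as well as on the training faces, and watch for the two curves moving apart. The sketch below assumes you can obtain such a held-out loss (Faceswap is not claimed to report one this way); `looks_overtrained`, `patience`, and all the loss values are hypothetical. Real loss values are noisy, so in practice you would smooth them before comparing.

```python
def looks_overtrained(train_losses, holdout_losses, patience=3):
    """Heuristic flag: True when, over the last `patience` checks, training
    loss fell every time while loss on held-out faces rose every time.

    Both lists hold one (ideally smoothed) loss value per check, e.g. one
    check every few thousand iterations."""
    if len(train_losses) != len(holdout_losses):
        raise ValueError("loss histories must be the same length")
    if len(holdout_losses) <= patience:
        return False  # not enough history to judge yet
    recent_train = train_losses[-(patience + 1):]
    recent_hold = holdout_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    hold_rising = all(b > a for a, b in zip(recent_hold, recent_hold[1:]))
    return train_falling and hold_rising


# Made-up histories for illustration.
healthy_train = [0.08, 0.06, 0.05, 0.045, 0.040]
healthy_hold = [0.09, 0.07, 0.06, 0.058, 0.057]
overfit_train = [0.08, 0.06, 0.05, 0.040, 0.030]
overfit_hold = [0.09, 0.07, 0.072, 0.075, 0.080]

print(looks_overtrained(healthy_train, healthy_hold))  # False
print(looks_overtrained(overfit_train, overfit_hold))  # True
```

The trade-off is that the held-out faces are never trained on, so this costs a little data in exchange for a signal that does not depend on judging previews by eye.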

Post by **MaxHunter** » Thu Apr 27, 2023 8:32 pm

Great post and questions.

I've been at this for about a year so I am far from an expert.

I've had one recent model turn out terrible, and the only thing I could attribute it to is "overtraining." I use a specific B model a lot, trying to get it down to under 0.02 loss, and I think you're right: adding more pics/faces in increments might help.

I wish there was a way to implement an alarm noting that loss isn't dropping, or "your loss drop is slowing, consider adding more data to avoid overtraining," etc.
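
An alarm like that could be as simple as comparing the average loss over the most recent window of iterations with the window before it. This is a hypothetical sketch, not an existing Faceswap feature: `plateau_warning`, the 1000-iteration window, and the 1% threshold are all arbitrary assumptions. Note that a plateau on its own is not proof of overtraining; it only says the loss has stopped improving much.

```python
def plateau_warning(losses, window=1000, min_rel_drop=0.01):
    """Compare mean loss over the last `window` iterations with the mean over
    the `window` iterations before that. Return a warning string if the
    relative improvement is below `min_rel_drop`, otherwise None."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return None  # not enough history yet
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return None
    rel_drop = (previous - recent) / previous
    if rel_drop < min_rel_drop:
        return (f"Loss improved by only {rel_drop:.2%} over the last {window} "
                "iterations; consider adding more data.")
    return None


# Made-up loss logs.
still_learning = [0.1 - 0.00001 * i for i in range(4000)]
flat = [0.02] * 4000
print(plateau_warning(still_learning))  # None
print(plateau_warning(flat))
```

Averaging over whole windows, rather than comparing single iterations, keeps the per-iteration noise from triggering the alarm.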

Post by **bryanlyon** » Thu Apr 27, 2023 8:36 pm

Fed wrote: ↑Thu Apr 27, 2023 12:23 pm
So I guess if you are worried about overtraining, you could keep some of your training data in a stash, start with less (but enough) and add batches of images to the model's training data gradually, like, a batch every several 100k iterations? It would make the process a bit less effective probably, but will protect against overtraining because there will +/- always be some new valid data for the model to learn. Something like that?

No, you shouldn't. The fix for overtraining is giving it new data, but the prevention is to give it that data all along. In other words, by keeping that data in you're keeping the model from overtraining from the start. You should not restrict data for overtraining reasons, just give it all to the model so it can do the best job it can getting you a quality deepfake.

FaceSwap has been thoroughly engineered to minimize the chance of overtraining. This is why things like augmentation, flip, and other options were created and added to FS.
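
The idea behind flip and augmentation is that the model sees slightly different versions of the same faces each time, which makes them harder to memorise. Below is a minimal sketch of one such transform, a random horizontal flip, on an image stored as a list of pixel rows. It only illustrates the concept; `random_flip` is a hypothetical helper, not how Faceswap implements its flip option, and Faceswap's other augmentations are not shown.

```python
import random


def random_flip(image, probability=0.5, rng=random):
    """Return a left-right mirrored copy of `image` with the given probability,
    otherwise an unmodified copy. `image` is a list of pixel rows."""
    if rng.random() < probability:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


face = [[1, 2, 3],
        [4, 5, 6]]
rng = random.Random(42)  # seeded so the run is reproducible
batch = [random_flip(face, rng=rng) for _ in range(4)]
print(batch)
```

Returning a copy in both branches means the original image is never mutated, so the same source face can be augmented differently on every pass.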

Post by **Fed** » Fri Apr 28, 2023 6:46 am

torzdf wrote: ↑Thu Apr 27, 2023 12:39 pm
I have never seen it happen in Faceswap (and I have trained models a VERY long way). That is not to say it isn't a thing, just that I've never hit it (I train with a LOT of data).

bryanlyon wrote: ↑Thu Apr 27, 2023 8:36 pm
The fix for overtraining is giving it new data, but the prevention is to give it that data all along. In other words, by keeping that data in you're keeping the model from overtraining from the start. You should not restrict data for overtraining reasons, just give it all to the model so it can do the best job it can getting you a quality deepfake.

FaceSwap has been thoroughly engineered to minimize the chance of overtraining. This is why things like augmentation, flip, and other options were created and added to FS.

Ah. Good to know.

Faceswap Forum
