Searching for Optimum Training Settings

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Searching for Optimum Training Settings

Post by tomward16 »

Hi All,

Firstly, thank you for creating this easy-to-use application and GUI. Really grateful.

I've been playing around using different trainers and mask options, but I still don't seem to be getting the same sort of results as shown by the creator of this. My input A is 720p HD filmed from an iPhone and my input B is 720p from YouTube. I'm using approximately 2000 images per input.

The training loss plateaus at about 0.03 and I get fairly decent results, but you can still quite obviously see where the face has been swapped, as it is slightly blurrier than the rest of the frame. I'm using the extended mask option whilst training and have tried both the Original and the IAE trainer with inputs of 512px which I set when doing the initial extract from the videos.

Is there anything I can do to increase the quality/lower the loss rate even more?

What combination of settings from the extract through to conversion have people found to give the best results?

Any suggestions would be most appreciated!

Thanks again!

Tom

[mod]Edited the topic to make more sense to viewers.[/mod]

Re: Optimum Training Settings

Post by torzdf »

Welcome to the Forum!

From what you have said, you are probably already aware, but the absolutely most important thing is quality of data.

Unfortunately, YouTube isn't a great source of data, as they perform quite aggressive compression on their video files. If you can get your target from other HD sources then you should, but I appreciate that isn't always possible.

That said, this:

with inputs of 512px which I set when doing the initial extract from the videos.

really isn't necessary and most likely will slow down training. For the models you are using, the input is downsized to fit into a 64px square, so the default extract size should be fine. The size parameter is really only there to support future development.
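
To illustrate the point, here is a minimal sketch (random pixels standing in for a hypothetical 512px extract, nothing from the Faceswap code itself): whatever size you extract at, a 64px-input model only ever sees a downscaled copy, so the extra resolution is discarded before training even starts.

[code]
# Conceptual sketch only: random pixels stand in for a hypothetical 512px extract.
import cv2
import numpy as np

face_512 = (np.random.rand(512, 512, 3) * 255).astype("uint8")  # stand-in for a 512px extracted face
face_64 = cv2.resize(face_512, (64, 64), interpolation=cv2.INTER_AREA)  # what a 64px-input model actually sees
print(face_512.shape, "->", face_64.shape)  # (512, 512, 3) -> (64, 64, 3): the extra detail is gone
[/code]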

Original and IAE are two of the oldest models in the application. Depending on your GPU, you are likely to get better results using a newer model. The Dfaker model strikes a decent balance between quality and GPU requirements.

Close-ups will always be tricky, due to the aforementioned model input size. Dfaker (and others) output at 128px+, which helps mitigate this a bit, but until there is either consistently more VRAM available, or models become more efficient so that more data can fit into the same space, this is likely to remain an issue.

Another cause of blurry output (outside of close-ups and quality of data) is undertraining. Try not to focus on the loss values too much, and concentrate on the previews, and to a lesser extent when loss has "converged" (that is, it has stopped dropping). As a general rule, when you can see individual teeth, that tends to indicate it is getting towards being done... but there are no hard and fast rules, and it varies from model to model and from data to data.
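
If you do want a rough numerical sanity check to go alongside the previews, something like the sketch below works on whatever loss values you have logged yourself. This is purely illustrative and is not how Faceswap judges anything; the window size and tolerance are arbitrary assumptions.

[code]
# Rough "has the loss stopped dropping?" check over a list of logged loss values.
# Window size and tolerance are arbitrary; tune to taste.
def has_converged(losses, window=1000, tolerance=1e-4):
    """Return True if the mean of the last window is no longer meaningfully
    lower than the mean of the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return (previous - recent) < tolerance

print(has_converged([0.0300] * 2000))                          # True: flat-lined around 0.03
print(has_converged([0.05 - i * 1e-5 for i in range(2000)]))   # False: still falling
[/code]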


Re: Optimum Training Settings

Post by tomward16 »

Thank you so much for your reply. It really does help. If I can ask one more question?

I've been trying to read up about batch size. I'm currently using the default of 64. According to what I've read, if I reduce the size I am likely to get better detail. In your experience, would you agree?

Thanks Again,

Tom

Re: Optimum Training Settings

Post by torzdf »

I would.

Higher batch sizes average each update over more of the dataset, so they may normalize out little details.

There is still some debate over what is best, but generally, if I am training a model where I can have a high batch size, I will normally start on a high batch size, then reduce it once the face starts forming.
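
As a toy illustration of that averaging effect (made-up numbers, NumPy only, not anything from the training code): the update applied each step is the mean gradient over the batch, so a signal carried by only a handful of faces gets diluted as the batch grows.

[code]
# Toy example: a strong "fine detail" signal present in only 4 of the samples
# is progressively averaged away as the batch size increases.
import numpy as np

rng = np.random.default_rng(0)
per_sample = rng.normal(0.0, 1.0, size=256)  # pretend per-face gradients
per_sample[:4] += 10.0                       # rare detail carried by just 4 faces

for batch_size in (8, 64, 256):
    update = per_sample[:batch_size].mean()  # what a batch of this size would apply
    print(f"batch {batch_size:>3}: mean update {update:+.3f}")
[/code]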


Re: Optimum Training Settings

Post by tomward16 »

Thanks once again, greatly appreciated! I just wish it didn't take 12+ hours to find out if my settings are correct. There are just so many variables, and with just a single GTX 1070 it's slow progress!

Any way of sharing the computation over a network?!

Re: Optimum Training Settings

Post by bryanlyon »

Latency really kills any gains that sharing training would give. Synchronizing the two systems would be so slow that, in many cases, it would effectively be slower than a single system.
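
Some rough back-of-envelope numbers make the point. Every figure below is an assumption purely for illustration (parameter count, link speed, step time), not a measurement of any real model.

[code]
# Back-of-envelope cost of naively syncing weights between two machines each step.
model_params = 60e6          # assume ~60M trainable parameters
bytes_per_param = 4          # float32
gigabit_lan = 125e6          # ~1 Gbit/s expressed in bytes per second
step_time_one_gpu = 0.15     # assume ~150 ms of compute per training step

sync_bytes = model_params * bytes_per_param
sync_time = sync_bytes / gigabit_lan

print(f"data per sync: {sync_bytes / 1e6:.0f} MB")
print(f"sync time    : {sync_time:.2f} s vs {step_time_one_gpu:.2f} s of compute")
# ~1.9 s of network transfer for ~0.15 s of work: the second machine costs
# far more time than it saves, which is why low-latency links matter.
[/code]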

Re: Optimum Training Settings

Post by torzdf »

Not easily, no, given the sheer amount of data that needs to be transferred.

The slow going is one of the more frustrating parts of it, and is what makes model development take so long. If you have to wait 12-48 hours to find out whether the model you are developing has improved at all, it somewhat burns out motivation.

It's one of those things where you kinda learn over time what works and what doesn't, but despite its technical nature, it is something which seems to develop with "feel" rather than any hard and fast rules about what will make a good swap.

Hell, quite often I think I have everything perfect, and my final result is disappointing. Other times I will do some half-hearted effort and it will come out great.


Re: Optimum Training Settings

Post by tomward16 »

Fair point. I just wondered if, like network rendering, you could portion out, say, Model A to one machine and Model B to another. As you can tell, I'm new to this but find it all fascinating, so I'm still exploring options that have probably well and truly been tried, tested and quashed by experienced people like yourselves!

Re: Optimum Training Settings

Post by tomward16 »

torzdf wrote: Fri Jul 19, 2019 3:47 pm

Not easily, no, given the sheer amount of data that needs to be transferred.

The slow going is one of the more frustrating parts of it, and is what makes model development take so long. If you have to wait 12-48 hours to find out whether the model you are developing has improved at all, it somewhat burns out motivation.

It's one of those things where you kinda learn over time what works and what doesn't, but despite its technical nature, it is something which seems to develop with "feel" rather than any hard and fast rules about what will make a good swap.

Hell, quite often I think I have everything perfect, and my final result is disappointing. Other times I will do some half-hearted effort and it will come out great.

Exactly that. And because I'm finding that one of the greatest variables is the input data, a 'perfect recipe' doesn't really exist: you have to create a new recipe for each set of input data. Ah well... I'm still enjoying it so far!


Re: Optimum Training Settings

Post by bryanlyon »

tomward16 wrote: Fri Jul 19, 2019 3:48 pm

Fair point. I just wondered if, like network rendering, you could portion out, say, Model A to one machine and Model B to another.

Unfortunately, in order to swap, the two sides require a shared encoder. If we could eliminate that, then we could train on separate machines. Right now my best suggestion is to just put multiple cards in one system. There, latency is low, so you can use them together, though you are still limited by diminishing returns.
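
For anyone curious what the shared encoder looks like in practice, here is a rough Keras sketch of the layout. The layer sizes are arbitrary placeholders and this is not the actual Faceswap model code; it just shows why A and B cannot be split cleanly across machines.

[code]
# Sketch of the shared-encoder / two-decoder layout (arbitrary sizes, illustrative only).
from tensorflow import keras
from tensorflow.keras import layers

def build_encoder():
    inp = layers.Input(shape=(64, 64, 3))
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(512, activation="relu")(x)
    return keras.Model(inp, latent, name="shared_encoder")

def build_decoder(name):
    latent = layers.Input(shape=(512,))
    x = layers.Dense(16 * 16 * 128, activation="relu")(latent)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return keras.Model(latent, out, name=name)

encoder = build_encoder()               # ONE encoder, trained on both A and B faces
decoder_a = build_decoder("decoder_a")  # reconstructs person A from the shared latent
decoder_b = build_decoder("decoder_b")  # reconstructs person B from the shared latent

face_in = layers.Input(shape=(64, 64, 3))
autoencoder_a = keras.Model(face_in, decoder_a(encoder(face_in)))
autoencoder_b = keras.Model(face_in, decoder_b(encoder(face_in)))
# Both autoencoders update the *same* encoder weights every step, so splitting
# A and B across machines would mean constantly shipping those weights back
# and forth -- which is exactly the latency problem described above.
[/code]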
