Comparing model types with examples

Post by **abigflea** » Thu Jul 16, 2020 3:31 am

Since I started with Faceswap I've heard many questions over and over.
Which is the best model? How many iterations? Why does my swap look blurry?

The real answer is somewhere between having good data, lots of patience, some artistry and creativity.
I feel different models work best in different situations, and you will have to figure out what/how that is.

Postprocessing in Adobe After Effects could make them look professional, and way beyond my capacity, and not the point of these examples.
Thought I could help by showing y'all some models I made out of the exact same data.
Will update these post as I do more. Suggestions welcome.

Model A is a youtuber.
She had lots of data available. Closeups, different expressions, makeup, lighting, to make reasonably good training data.
YouTube is NOT the best place to find good data, but this person had much to choose from, with clear views of her face.

Model B is Karen Gillan.
Used a few YouTube videos, and several episodes of Dr Who, and high quality photos.

The video for converting was chosen to show different lighting, makeup and distance from the camera. So you can see the models react differently to different situations.

These models have not been trained to perfection, also called convergence. In fact, I didn't try very hard to make them perfect.
Really just wanted to show the raw output. Every one could be trained better, with better data, and made to look better by tweaking (I didn't do). I am by no means a professional at this, but I have time, I am curious and have 4 video cards.

All in all, I think the dataset is at least good/mediocre... not amazing. About ₄₅₀₀ pics for each.
I did manually adjust the alignments and mask for about 40% for each set to near perfect, very time consuming, likely unnecessary. "Close" would have been fine for training.

Most of these were trained on a Nvidia 1070. Smaller models were trained on 1060s.

What have I learned:
Close-ups are really hard. Something at the distance of the Jennifer Lawrence-Buscemi video are very possible to make flawless.
You can see this in the example video I've used.
Screen filling faces, cause troubles.

Training higher quality models really do add lots of time. About 160-172 pixels per side seem to be the sweetspot for my 8gb card, but takes ages.
Currently I'm trying to train the best possible at 192... a low batch of 6 ...and it is still getting better at 500K iterations ₁₈₀ hours. Plus side, I actually don't need a super computer to do what studios were doing in the early 2000's, just a single 4 year old video card and patience. What I also know, is 128-160 would be fine for some pretty convincing swaps.

Thanks to the mods and supporters continuing to help me with this.
There will be more examples, edits, and advice added as time goes on.

Some requested stats.
Original : 140 EGs/sec B:100
IAE: 115EG/s B:64
LightWeight: 50EGs/sec B:64
Dfaker: 47.5EGs/sec B:100
DFL-H128 : 29.5 EGs/sec B:80
DFL-SAE-DF @128 : 9.2 EGs/sec
DFL-SAE-Liae @128 : 8.8 EGs/sec B:16
DFL-SAE-Liae @192 : 3.9 EGs/sec B:6
Dlight: 15.7EGs/sec B:14
Realface: 11.2EGs/sec B:8
villain: 7.8EGs/sec B:10

Post by **abigflea** » Thu Jul 16, 2020 4:04 am

64pixel in/out models
These are Lightweight, Original, and IAE

Differences I noticed.
Lightweight: Eyes not exactly looking in the correct direction. Seems to have the double eyebrow thing going on more.
This is mainly because of coverage, but these were done exactly the same so there seems to be some degree of another factor.
IAE: Eyes jitter around a bit more. Teeth look slightly better. (may be solved by using spacial & temporal filtering found in alignments tab)
Original: Shape of the face and mouth seem slightly off, although MAYBE a bit sharper than the other two.

Close-ups : Learned closeups are hard to get clear, especially with just 64pix in/out. If the face is just in the background or otherwise not taking up the entire screen, the slight pixelation will be reduced greatly.
Double eyebrows : Can be fixed with more coverage and shouldn't be due to the models, although there is some variance from run to run on any model on how/what it learns. This likely explains the differences and would be different again if i ran them again.

Post by **abigflea** » Thu Jul 16, 2020 4:14 am

Dfaker From our forum post on training.

(64px input, 128px output) - This model leverages some different techniques from the original model, and also focuses on upscaling an input to a higher resolution output. Despite being around for a while, this model still achieves great results, whilst its lack of customization options makes it an easy 'fire and forget' model.

Dfaker: Double eyebrows again ( Can be fixed with more coverage viewtopic.php?f=6&t=146#config) Her left eye looks odd in the first section, but that seems to not be a issue in the rest.
Otherwise it looks a bit better!

Post by **abigflea** » Thu Jul 16, 2020 4:55 am

Now these are where things started getting good and interesting.
DFL-H128 , DFL-SAE (DF and Liae architecture), and Unbalanced set to 128/128

DFL-H128: Looks pretty good, Lets call this the base quality for 128 in/out
DFL-SAE DF architecture: Looks even better. Teeth and eyes look fairly realistic.
DFL-SAE Liae architecture : Looks basically the same quality as the DF architecture but the teeth look different. Eyes MAY be not as clear.
Unbalanced: Teeth not as defined, a bit of 'googly eyes' going on. Not looking the same direction. May be fixed with more training, but curious because it was trained basically the same number of Iterations and batch.

You may have noticed at this point a red smudge seems to appear by her eyebrows sometimes. Karen Gillan likes to have hair in her face covering one eye, suppose that's her thing. I did not mask every one out and now it appears in the trained model. Lesson for this is, have good data, mask and alignments.

For fun I ran this model again at 192in/128 out (smaller batch so needed more iterations) and got this.
Looks even better, even the closeups.

Post by **abigflea** » Thu Jul 16, 2020 5:15 am

Dlight viewtopic.php?f=6&t=146#choose

(128px input, 128-384px output) - A higher resolution model based on the dfaker variant, focusing on upscaling the faces, with custom upscalers. This is the newest model and is very easily configurable.

Dlight : Dlight seems to be popular with many people although this one was the only one that crashed, twice, and and to use a previous save. Edit: don't get me wrong, this is a good model, it was just finicky.
Teeth didn't look as defined. This may be a indication it needs more training, It is a 256 out model and may need 2x-4x more than the 128out from above. Same thoughts about the eyes.

Users have informed me that Dlight is great but can be a bit touchy. I saw spikes, but those are transient and not usually an issue.
If your model starts to look terrible, or starts crashing , try turning down the learning rate in Training options. Try reducing it to 4.5e-5 or as low as 4.0e-5.
If things are just fine, leave it at default 5.0e-5.
Any lower and it may not help or effectively stops learning ( ya dumbed it up too much?). Refer to the training documentation.

I only trained this one for 80 hours. The 2 DFL were at 120 hours trained too get to the same number of iterations.
Will revisit this and post.

On the previous video I used "warp to landmarks", so I ran the model again without warping and got this... unsure why the anomaly below her right eye appeared, and only in some scenes. Googly eyes all over. I am assuming this corruption happened early in the model and didn't disappear even with fit training. Suppose if I run it again, it would look different/ better.

Post by **abigflea** » Thu Jul 16, 2020 5:36 am

Villain (128px input, 128px output) - Villain is likely the most detailed model but very VRAM intensive and can give sub-par color matching when training on limited sources. Is the source of the viral Steve Buscemi/Jennifer Lawrence Deepfake. As this model does not have any customization options (beyond a low memory variant) it is a decent choice if you want a higher resolution model without having to adjust any settings.
Realface (64-128px input, 64-256px output) - The successor to the Unbalanced model. Takes learnings from that model and Dfaker, whilst looking to develop them further. This model is highly customizable, but it's best to tweak the options when you have some idea of what you are doing and what impact the settings will have.

I had a heck of a time with these 2 models. I KNOW Villian can look amazing, and suspect realface is as good, or nearly so.
Here I think I just ran out of patience and time. These models would likely run better on a Nvidia 2080 11gb or Tesla 16-32gb. Maybe you will have more patience or luck than I did.
To be clear these seem to be undertrained, even villian at 440k and 154 hours!
You can tell they look like they are heading in the right direction, and would likely look amazing if I could train them properly.
I may finish these some day

Post by **bryanlyon** » Thu Jul 16, 2020 4:09 pm

Great work, thanks for doing this.

calipheron · Post by **calipheron** » Fri Jul 24, 2020 9:19 pm

Fantastic work, it's nice to have a more detailed result to examine.

Personally, I have done a MILLION iterations with both DFL-SAE and Realface, and am unable to get results as detailed as my model B source files provide. (5400~ faces) I'm quite disheartened at this point. I'm certain that my source files are good enough, even after extraction the 256x256 faces have obvious detail in the eyes and teeth. I'm simply not getting that with my swaps, though.

Post by **abigflea** » Sat Jul 25, 2020 6:16 pm

I'm trying some ideas currently to see how to help exactly what your talking about.
After some thought and time I think the liae architecture may be a bit better.
Surely haven't run Realface enough to o have an opinion. Need more hardware!
What I have also found curious is how well some swaps will work and others won't with the same model, same 'actor' , and nearly identical scenes.
Suppose I'm trying to quantify some degree of artistry and creativity.... Not so easy to do

Post by **torzdf** » Sun Jul 26, 2020 9:09 am

abigflea wrote: ↑Sat Jul 25, 2020 6:16 pm
Suppose I'm trying to quantify some degree of artistry and creativity.... Not so easy to do

That's certainly true, however your attempts are admirable!

Post by **abigflea** » Mon Aug 03, 2020 9:38 am

DFL vs Dlight
Notice details. lipsync, eye blinks, expressions. See the differences around the 40 sec mark.

Edit I ran the Dlight model another 2X with a higher batch and the images improved, but not the conversion of expressions.

MIZOWIJO · Post by **MIZOWIJO** » Thu Aug 06, 2020 5:16 pm

On average, DFL-SAE seems to be able to achieve fast and reasonable results.

Post by **abigflea** » Thu Aug 06, 2020 6:13 pm

Actually split between dfl-sae and DLight.

cosmico · Post by **cosmico** » Wed Aug 19, 2020 5:41 pm

Reposting because I have a little more to say plus a request

--Could you post how many iterations you did for all of those examples? You mentioned the number in some but not all (the DFL, the 64 in and out, and the 64 in 128 out)

Very interesting post. just like you mentioned, Dlight has also been giving me lots of problems. I really liked your dfl-sae at 192. I think that will be my next project will be a DFL-sae at 208 pixels. If I had to just throw a guess out there, I think Dlights current popularity may be the result of the people wanting something as "fast" and "lightweight" as original and lightweight but they want higher resolution size than 64. They want a "original 2.0". And since Dlight is the newest, supports up to a massive 384, and has the word "light" in its name it may have seduced some people.

If you ever do an experiment like this again, I'd be curious to see what all the models look like at an exact same point -whether that point is 100 hours or 300k iterations as well as each taken to their max for max comparison.

Post by **abigflea** » Wed Aug 19, 2020 6:30 pm

Dfaker 221K Iter Batch 32

Most models I ran to around 180K - 220K , Batch size needs to also be factored in as well.
Can only say is I used 'max batch' size for each model.

I feel you are asking some of the questions I had, but realized some of it was a bit more complicated.
Iterations+batch and time is needed, but when does it become visually pleasing?
How does it change depending on the quality of your data?
When is it "finished" , when is it becoming "over trained"... Those answers become somewhat ambiguous.

So, I ran them all till they were good enough. With so many models I had to put a hard stop at some point.
I think all these took about 70 days to finish, including some false starts and a partition-geddon event.

Dlight can be a bit flaky but is is good.

The DFL-sae 192 version ran near 800K iterations and I don't think it was done. That is something to consider. An exponential increase in time needed with the increase of training resolution. There is a degree of up-scaling as well, so best to take a guess and see how they work out. A little post-processing the mask directly (beyond my creative capacity) can do wonders.

Unless you are dealing with faces that just about take up the entire screen at 2K - 4K I don't see it useful to go beyond 192.
I've done several at 128 in and out that just looked great. All depends on the scene.

Currently Im running a different set of models to show how the different loss functions affect an otherwise identical setup. They will take a while because I need to get them all closer to convergence. I expect the differences to be somewhat subtle , if noticeable at all.

Maybe I should do a short set to show how different in/out resolutions appear? I need a super computer. 4 gpu aren't enough to keep up with my own curiosity.

cosmico · Post by **cosmico** » Thu Aug 20, 2020 8:29 pm

So abigflea gave me permission to use his videos and try to make them play sidebyside or overlap for better comparison, but I quickly found out that my poor little computer can't handle running faceswap and video editing at the same time.
So I popped open photoshop and made a gif.
Gifs are low quality, and bad at fine details, so I did my best to compensate for this by doing things like stretching the images by an additional 60% and so on.
.
Also I wanted it to move fast so you can better compare them, but that might make it hard to determine which model is each face, So I added a big number next to the faces so you can easily just remember that.

Post by **abigflea** » Thu Aug 20, 2020 8:34 pm

Thanks!
Keeping in mind there is some randomness, you can surely see the differences in a different way.
FYI I kept the coverage and everything else as identical as possible.
Once I quit fooling around with these loss functions, Ill do another comparison like this for specifically trained resolutions .

cosmico · Post by **cosmico** » Thu Aug 20, 2020 9:27 pm

Yeah I noticed the randomness thats so weird!! I assume if they were all trained to perfection most of that randomness would go away. Also I updated the coverage %

I got a bunch more to post so be ready for the spam.
I stretched them again to compensate for being a gif, but I also went with much larger gifs
.
.
This one is a test of a much larger face, intended to show shortcomings of the 64/64's

.
This one was a face slightly in motion

.
This one was a test of teeth quality

.
Another big face test

.
And one more for good measure since I liked how it came out here for abigflea

You said dflsae@192 got a lot more training time I believe, so it makes sense why it's so good. It seems like your 2 dflsae@128 were the best. Df or Liae really just seems to be a preference for what look you are going for. Dfaker and Unbalanced did pretty good when it came to the eyes, but both struggled with the mouth and teeth. Dlight sucked in all of these, I'm guessing it needs a lot more training than the dfl's. The three 64 by 64's did exactly like you'd imagine them to. However IAE and Lightweight seemed to do a decent job every once and a while.

Post by **abigflea** » Fri Aug 21, 2020 1:03 am

That is very curious. You have shown me some differences I didn't notice
I did notice Dlights structure didn't seem to work as well, but frames themselves looked amazing.
Meaning the visual quality of Dlight looked great on any frame, although in a temporal context looks sub par in this example..

Dlight /DFL/ villain/ Dfaker all have pros and cons.. and maybe if I ran them all longer they may all even out as far as quality. May look better in a different swap. Was just showing trying to show the differences.
"Better" depends on what you're doing. I've done a few with low resolution and looked amazing.

Iae is interesting, but need to be just a touch higher res to be interesting compared to what's currently available.
I have some other ideas I need to run, suppose I will go buy two more 2070 and put them to work for a month or 3. Maybe a few $$K for cloud computing

cosmico · Post by **cosmico** » Sun Sep 13, 2020 4:55 pm

This post has seriously helped me alot. I've got a DFLH128 model I'm super excited about now thats 93% perfect with in 50 hours of training.

When starting new models/projects, now I refer to those gifs I made of your videos all the time.

Faceswap Forum

Comparing model types with examples

Comparing model types with examples

64pix in/out

64 in / 128 out.

128px in/out

dlight 128 in / 256 out

Villiain and Realface

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing models with examples

Re: Comparing model types with examples

Re: Comparing model types with examples

Re: Comparing model types with examples

Re: Comparing model types with examples

Re: Comparing model types with examples

Re: Comparing model types with examples

Re: Comparing model types with examples