Since I started with Faceswap, I've seen the same questions come up over and over.
Which is the best model? How many iterations? Why does my swap look blurry?
The real answer is a combination of good data, lots of patience, and some artistry and creativity.
Different models work best in different situations, and you will have to figure out which and how for yourself.
Post-processing in Adobe After Effects could make these look professional, but that is beyond my capacity and not the point of these examples.
I thought I could help by showing y'all some models I made from the exact same data.
I will update this post as I do more. Suggestions welcome.
Model A is a youtuber.
She had lots of data available: close-ups, varied expressions, makeup, and lighting, enough to make reasonably good training data.
YouTube is NOT the best place to find good data, but this person had a lot to choose from, with clear views of her face.
Model B is Karen Gillan.
I used a few YouTube videos, several episodes of Doctor Who, and some high-quality photos.
The video for converting was chosen to show different lighting, makeup, and distances from the camera, so you can see how the models react to different situations.
These models have not been trained to convergence, and in fact I didn't try very hard to make them perfect.
I really just wanted to show the raw output. Every one of them could be trained better, with better data, and made to look better with tweaking (which I didn't do). I am by no means a professional at this, but I have time, curiosity, and four video cards.
All in all, I think the dataset is at least good-to-mediocre... not amazing. About 4,500 pics for each.
I did manually adjust the alignments and masks for about 40% of each set to near perfect, which was very time-consuming and likely unnecessary; "close" would have been fine for training.
Most of these were trained on an NVIDIA GTX 1070; the smaller models were trained on GTX 1060s.
What I have learned:
Close-ups are really hard. Something at the distance of the Jennifer Lawrence/Buscemi video is very possible to make flawless.
You can see this in the example video I've used.
Screen-filling faces cause trouble.
Training higher-quality models really does add a lot of time. About 160-172 pixels per side seems to be the sweet spot for my 8 GB card, but it takes ages.
Currently I'm trying to train the best model I can at 192px, with a low batch size of 6, and it is still getting better at 500K iterations (about 180 hours). On the plus side, I actually don't need a supercomputer to do what studios were doing in the early 2000s, just a single four-year-old video card and patience. What I also know is that 128-160px would be fine for some pretty convincing swaps.
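As a rough back-of-the-envelope check on that 192px run (using only the numbers I gave above: 500K iterations, batch size 6, ~180 hours), you can multiply iterations by batch size to get total face examples seen, then divide by wall-clock time to get effective throughput:

```python
# Sanity check of the 192px run: 500K iterations, batch 6, ~180 hours.
iterations = 500_000
batch_size = 6
hours = 180

examples_seen = iterations * batch_size        # total face examples processed
egs_per_sec = examples_seen / (hours * 3600)   # effective throughput

print(f"{examples_seen:,} examples, ~{egs_per_sec:.1f} EGs/sec")
# → 3,000,000 examples, ~4.6 EGs/sec
```

That ~4.6 EGs/sec is in the same ballpark as the 3.9 EGs/sec I measured for that model in the stats below, with the gap explained by pauses, previews, and saving.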
Thanks to the mods and supporters continuing to help me with this.
There will be more examples, edits, and advice added as time goes on.
Some requested stats (EGs/sec = examples per second; B = batch size):
Original: 140 EGs/sec, B:100
IAE: 115 EGs/sec, B:64
Lightweight: 50 EGs/sec, B:64
Dfaker: 47.5 EGs/sec, B:100
DFL-H128: 29.5 EGs/sec, B:80
DFL-SAE-DF @128: 9.2 EGs/sec
DFL-SAE-LIAE @128: 8.8 EGs/sec, B:16
DFL-SAE-LIAE @192: 3.9 EGs/sec, B:6
Dlight: 15.7 EGs/sec, B:14
Realface: 11.2 EGs/sec, B:8
Villain: 7.8 EGs/sec, B:10
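If you want to turn those throughput numbers into rough training-time estimates, divide EGs/sec by batch size to get iterations per second. This little helper (my own hypothetical function, not part of Faceswap) estimates hours for a given iteration count; the 500K default just mirrors the run I described above:

```python
# Hypothetical helper: estimate wall-clock hours to reach a given
# iteration count from the stats above. EGs/sec / batch = iterations/sec.
def hours_for_iterations(egs_per_sec, batch, iterations=500_000):
    iters_per_sec = egs_per_sec / batch
    return iterations / iters_per_sec / 3600

# Examples using figures from the list above:
print(f"Original: {hours_for_iterations(140, 100):.0f} h")  # → Original: 99 h
print(f"Villain:  {hours_for_iterations(7.8, 10):.0f} h")   # → Villain:  178 h
```

These are idealized numbers; real runs take longer once you factor in previews, saves, and restarts.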