Hello faceswap community,
I want to surprise a friend of mine who plays basketball by swapping Michael Jordan's face onto his head for his next game. I've never done deepfakes before, hence me posting here and showing you my progress. This is my first post, and I bet you'll see at least a few more detailing my progress because I want to do things right.
What was my goal for this first attempt:
The goal of this first deepfake was to understand the workflow, the concepts, and so on, but most importantly to test something very specific: whether I can get away with very little footage of MKBHD/my friend.
Why? Because if it works well enough, it means I can film my friend with a simple lighting setup and get the "shoot" over with in under half an hour, so as not to raise any suspicion on his part.
Here's what I did for my extract:
For the MKBHD footage, I used a simple video of just him talking about a product. Here is that video.
For the Michael Jordan footage, I used snippets of "The Last Dance" as well as footage from "Space Jam (1996)". Here is that video.
If memory serves, I extracted every third frame from these two .MP4s, which yielded 5353 frames of MKBHD and 5095 frames of Michael Jordan after cleanup.
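As a sanity check on these numbers (and on whether a sub-half-hour shoot buys enough material), the frame count from an every-Nth-frame extraction is easy to estimate. A quick back-of-envelope sketch in Python, assuming 30 fps footage (an assumption on my part; your sources may differ):

```python
def frames_kept(total_frames, step=3):
    """Number of frames kept when extracting every `step`-th frame."""
    return len(range(0, total_frames, step))

# Working backwards: 5353 kept frames at step 3 implies ~16k source frames,
# i.e. roughly 9 minutes of footage at an assumed 30 fps.
source_frames = 5353 * 3            # 16059
minutes = source_frames / 30 / 60   # ~8.9 minutes

# A 30-minute shoot at 30 fps, extracting every third frame:
shoot_frames = frames_kept(30 * 60 * 30, step=3)  # 18000 kept frames
```

So even a short shoot yields several times more frames than either of my current datasets; the question is quality, not quantity.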
As the guide suggests, I cleaned up my extraction:
For the Michael Jordan footage, I sorted and deleted unwanted faces and cleaned the alignments, then went into manual mode and cleaned some more, removing duplicate faces and faces that were too far gone to be salvaged. I also repositioned the bounding box wherever I felt it was needed. I didn't do much masking at all; in the case of the MJ footage I felt it was mostly good enough, so I left some misaligned and obstructed faces in there. Here is an overview of what my alignments looked like, captured from the Manual tool (play it at 0.25x speed):
The MKBHD footage aligned properly from the get-go; I just had to remove some unwanted faces and clean the alignments file accordingly. That's why I didn't video-capture its alignments from the Manual tool.
Here's what I did for my training:
I'm running two RTX 2060 Supers in my computer. From what I understand, VRAM doesn't pool across GPUs, so even with two cards the model still has to fit in a single card's 8 GB of VRAM.
I tried running Villain with default settings, but Faceswap ran out of VRAM.
Then I (foolishly) tried running Phaze-A with default settings, but Faceswap ran out of VRAM too.
Then my finger slipped and I ran the Original model with default settings for 250k iterations. That seemed to hold up; it did crash after 12 hours of running, but progress was saved, so all was good.
Here's an overview of my graph when all was said and done, along with a timelapse of the training (nice feature):
Here's what the end swap looks like:
"This is the best deepfake that's ever been created", said no one. But hey, I wasn't going to ace it anyway. This is a very iterative process we're talking about. Right?
Now onto my observations and conclusions, which I hope somebody can help me validate/invalidate:
Observations:
It just kinda looks like a blurry MKBHD. They don't bear much resemblance IRL, but in my deepfake I feel like I'm looking at Marques' weird blurry cousin; I don't really "see" Michael Jordan that much. (I did train it the right way around, trust me.)
The image is very mushy, even from afar.
The hair is distracting as hell. I think that if MKBHD didn't have that little beard and that hair, it would have been a little more convincing.
There is some flickering going on in the top corners of the deepfake.
Conclusions:
I should have used a bald guy for my second face. In some shots, if I cover up Marques' hair with my thumb on my screen, it starts to resemble MJ.
I should probably only select the sharpest, crispest footage of MJ, even if it means I only have ~2,000 images to feed Faceswap for training.
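On keeping only the sharpest MJ frames: Faceswap's sort tool can order faces by blur, but if you want to script your own filter, the usual heuristic is the variance of the Laplacian (higher = sharper). A minimal NumPy-only sketch; the threshold here is something you'd tune by eye, not a value from my actual run:

```python
import numpy as np

# 3x3 Laplacian kernel, the standard edge operator behind blur detection
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def sharpness(img):
    """Variance of the Laplacian of a 2-D grayscale array; higher = sharper."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += LAPLACIAN[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out.var()

def keep_sharp(images, threshold):
    """Return only the images whose sharpness score clears the threshold."""
    return [img for img in images if sharpness(img) > threshold]
```

In practice you'd load each extracted face with OpenCV or PIL, convert it to grayscale, and score it; sorting by score and eyeballing a cutoff beats any fixed threshold.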
Checking the "No-warp" option helped towards the end.
I should find a way to output something bigger than 64px. In that regard, will lowering the batch size of, for example, the Villain model enable me to use it?
Having only a little bit of footage of MKBHD seemed to work well enough for what I wanted to do. EDIT: Never mind, it seems that having so little footage of MKBHD might be one of the reasons (if not THE reason) my swap looks like crap.
Here's what I'm going to do for my next test (open to suggestions):
Find a Black basketball player who happens to have no hair and no facial hair to source footage from.
Trim down my source footage of Michael Jordan, while trying to keep only the best and sharpest footage.
Better understand the Manual tool, so I can clean up my MJ dataset more thoroughly.
Try another model (open to suggestions) with a lower batch size, if that's what it takes to run one.