Live Deep Fakes — you can now swap your face with someone else's in real-time video applications.
TL;DR: What happens if Deep Fakes become real time? Let's try it!
Edit: 9/4/2020 — Due to popular demand, I've now updated my code with a simpler version that only needs a single picture and doesn't require training. Check out the latest version here: https://github.com/alew3/faceit_live3
Deep fakes are a technique that uses deep learning to swap one person's face onto someone else's. With it we can create a very realistic "fake" video or picture — hence the name.
This became a reality after researchers started using an auto-encoder neural architecture to accomplish it. The basic idea is pretty simple: we train an encoder together with a separate decoder for each face (in practice the encoder is shared, so both faces get mapped into the same latent space). The trick happens when we encode a picture of the first person but decode it with the second person's decoder!
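To make that concrete, here is a minimal sketch of the shared-encoder / two-decoder idea in PyTorch. The layer sizes, loss, and optimizer settings are illustrative assumptions, not the actual faceswap architecture.

```python
# Minimal sketch (PyTorch): one shared encoder, one decoder per person.
# Sizes and training details are illustrative only.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

encoder = Encoder()    # shared between both people
decoder_a = Decoder()  # learns to reconstruct person A
decoder_b = Decoder()  # learns to reconstruct person B

loss_fn = nn.L1Loss()
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters()),
    lr=5e-5,
)

def train_step(faces_a, faces_b):
    # Each decoder only ever sees its own person's faces during training.
    opt.zero_grad()
    loss = loss_fn(decoder_a(encoder(faces_a)), faces_a) + \
           loss_fn(decoder_b(encoder(faces_b)), faces_b)
    loss.backward()
    opt.step()
    return loss.item()

def swap_a_to_b(face_a):
    # The actual trick: encode person A, decode with person B's decoder.
    with torch.no_grad():
        return decoder_b(encoder(face_a))
```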
For the training part we need to gather a few hundred pictures (the more the better) of each person in different poses (very easy to do for celebs) so the network can learn their faces. We can even extract them from existing videos, which makes the task very straightforward. After the neural network has trained and learnt the characteristics of each individual's face, it can then begin "dreaming" up what the person would look like in poses it has never seen before.
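Harvesting those training faces from a video can be automated. Here is a rough sketch assuming OpenCV and its bundled Haar cascade face detector; the real faceswap tooling uses more accurate detectors and face alignment.

```python
# Sketch: pull face crops out of an existing video for training data.
import cv2
import os

def extract_faces(video_path, out_dir, every_n_frames=10, size=256):
    os.makedirs(out_dir, exist_ok=True)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    frame_idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
                face = cv2.resize(frame[y:y + h, x:x + w], (size, size))
                cv2.imwrite(os.path.join(out_dir, f"face_{saved:05d}.jpg"), face)
                saved += 1
        frame_idx += 1
    cap.release()
    return saved

# e.g. extract_faces("john_oliver_interview.mp4", "data/john_oliver")
```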
Before this technology came along, face swaps were done manually in Photoshop. That takes a skilled person and a lot of time, so it isn't practical for video, where the swap would have to be redone convincingly in every single frame.
But all of this changed at the beginning of the year, when a piece of software called "deep fakes" was anonymously released that let anyone try it out, and it spread like wildfire. The first use was a controversial one: putting celebrity faces onto porn videos! This non-consensual use of people's faces and bodies made Reddit, normally a very open-minded community, decide it was too much and ban the /r/deepfakes subreddit.
The second big use case came right after and is a funnier one: placing Nicolas Cage in hundreds of different movies he was never originally cast in.
The internet loves Nic Cage for some unknown reason, and this reminded me of another funny spurious (aka fake) correlation involving him.
The 3rd big use case: Live Deep Fakes
The first thing that came to mind when I heard about Deep Fakes was: what would happen if we could create them in real time, and not just for existing videos or photos? Suppose we could go online wearing someone else's face: would it just be funny, or would it push the ethical boundaries even further? I decided to see how much effort it would take to try it out.
I have a pretty fast computer (a quad-core i7 6700K with a Titan X (Pascal) GPU running Ubuntu) and reckoned it might just work. I got the idea up and running pretty well in a single weekend! This was possible because I started by forking the excellent https://github.com/deepfakes/faceswap (a community project based on the deep fakes idea, not the original software) and Gaurav Oberoi's https://github.com/goberoi/faceit, which made it very easy to extract training images from YouTube videos.
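As a rough illustration of the YouTube step (faceit automates this part; the options below are my own assumptions, not its actual settings), downloading source footage with the youtube-dl Python API might look like this:

```python
# Hedged sketch: fetch source footage from YouTube with youtube-dl,
# ready to be fed into the face extraction step above.
import youtube_dl

def download_video(url, out_template="downloads/%(title)s.%(ext)s"):
    options = {
        "format": "mp4",          # grab an mp4 stream for easy frame extraction
        "outtmpl": out_template,  # where to save the file
    }
    with youtube_dl.YoutubeDL(options) as ydl:
        ydl.download([url])

# e.g. download_video("https://www.youtube.com/watch?v=...")
```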
Results
After training the model for just over 48 hours we get some really good results. The real-time render on the webcam runs at a very good frame rate and looks normal in a video conference, which usually isn't butter smooth anyway.
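The live loop itself is conceptually simple: grab a webcam frame, find the face, push it through the trained model, and paste the result back. Here is a hedged sketch that reuses the hypothetical swap_a_to_b function from the autoencoder sketch above; the real project additionally streams the output to a virtual webcam device so video-chat apps can pick it up.

```python
# Sketch of the real-time swap loop (assumes swap_a_to_b from the
# earlier autoencoder sketch; not the project's actual pipeline).
import cv2
import numpy as np
import torch

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        # Crop the face, normalize, and run it through the swap model.
        crop = cv2.resize(frame[y:y + h, x:x + w], (64, 64))
        tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        swapped = swap_a_to_b(tensor)[0].permute(1, 2, 0).numpy()
        # Paste the swapped face back into the original frame.
        frame[y:y + h, x:x + w] = cv2.resize(
            (swapped * 255).astype(np.uint8), (w, h))
    cv2.imshow("live deep fake", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```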
It was very amusing to get online as John Oliver and invite friends just to see their reaction. I can imagine how this could get even funnier on sites like Chat Roulette (not sure if it's still a thing), or if someone tries to fake their identity over live video.
Even though it looks very realistic, you can still spot that something is strange, and the person won't have the real person's voice... so for now we are safe! But as these technologies get better and we can also copy someone's voice, things will begin to get scary!
How to try it out yourself
Just head to https://github.com/alew3/faceit_live and follow the instructions! I've provided a Dockerfile to make things easier.