A Deep Dive Into Deepfakes—How They Are Created & How to Detect Them


Deepfakes are manipulated media that use machine learning to create realistic images or videos of people doing or saying things they never actually did or said. The term “deepfake” was coined in 2017, and since then the use of deepfakes has grown rapidly, with both positive and negative consequences.

The term “deepfake” is a combination of “deep learning” and “fake.” Deep learning is a subset of machine learning that uses multi-layer neural networks, loosely inspired by the structure of the human brain, to learn from large amounts of data. Deepfake technology leverages this capability to manipulate media content convincingly.

One of the most common applications of deepfakes is in creating fake videos of celebrities, politicians, or other public figures. These videos can be used to spread misinformation, damage reputations, or incite discord. With the advancement of this technology, detecting deepfakes has become increasingly challenging.

In this article, we will explore the current state of deepfakes, how they are created, and how to detect them.

Creating Deepfakes

Fig 1: Tom Cruise Deepfake

Creating a basic deepfake generally involves the following steps:

Data Collection: Gather a large amount of video footage of the target person (the person whose face you want to replace) and the source person (the person whose face will be used to replace the target’s face). The more high-quality data you have, the better the deepfake will be.

Preprocessing: Use video editing software to trim the footage and extract the frames that will be used to train the deepfake model.

Training the Model: Use a deep learning framework like TensorFlow or PyTorch to train a generative adversarial network (GAN). This network consists of two parts: the generator, which creates the fake images, and the discriminator, which tries to distinguish between real and fake images. Through iterative training, the generator learns to create more realistic images.

Face Alignment and Swapping: Use facial recognition and alignment techniques to match the source person’s face to the target person’s face in each frame of the video. Then, replace the target person’s face with the source person’s face.

Post-processing: Use video editing software to refine the deepfake video, adjust the colors and lighting, and make any other necessary adjustments to improve the overall quality of the deepfake.

Ethical Considerations: Before creating a deepfake, consider the potential consequences of your actions. Deepfakes can be used to spread misinformation or create non-consensual explicit content, which can have serious implications for individuals and society as a whole.
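To make the adversarial training step above more concrete, here is a minimal sketch of the two competing GAN objectives in plain Python. It assumes the discriminator outputs a probability that an image is real and uses the common non-saturating generator loss; the numbers are made-up discriminator outputs, not results from a trained model.

```python
import math

def bce(prediction, label, eps=1e-7):
    """Binary cross-entropy for a single probability in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """The discriminator is rewarded for scoring real images near 1 and fakes near 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """The generator is rewarded when the discriminator scores its fakes near 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Hypothetical discriminator scores (probability "real") for illustration only.
real_scores = [0.90, 0.95, 0.85]
early_fake_scores = [0.05, 0.10, 0.08]   # fakes are easy to spot early in training
late_fake_scores = [0.45, 0.55, 0.50]    # fakes are harder to spot later on

print("generator loss early:", generator_loss(early_fake_scores))
print("generator loss late: ", generator_loss(late_fake_scores))
print("discriminator loss early:", discriminator_loss(real_scores, early_fake_scores))
print("discriminator loss late: ", discriminator_loss(real_scores, late_fake_scores))
```

As the fakes become more convincing, the generator loss falls while the discriminator loss rises, which is the tug-of-war that iterative training relies on.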

Applications of Deepfakes

Let’s look at both the positive and negative use cases of deepfakes.

Positive Uses of Deepfakes

Deepfakes have a range of potential positive uses, such as:

1. Entertainment and Media Production: Deepfakes can be used in the entertainment industry to create realistic special effects and improve the production quality of movies, TV shows, and video games. For example, deepfake technology could be used to seamlessly integrate actors into scenes that would otherwise be too dangerous or costly to film.

2. Education and Training: Deepfakes can be used in educational settings to create realistic simulations for training purposes. For example, deepfake technology could be used to create virtual patient scenarios for medical students to practice diagnosing and treating various conditions.

3. Historical and Cultural Preservation: Deepfakes can be used to recreate historical figures and events, preserving them for future generations. For example, deepfake technology could be used to bring historical speeches to life or recreate ancient civilizations in virtual reality.

4. Accessibility: Deepfakes can be used to improve accessibility for individuals with disabilities. For example, deepfake technology could be used to create sign language interpreters for online videos or to generate audio descriptions for visually impaired individuals.

5. Art and Creativity: Deepfakes can be used as a tool for artistic expression. Artists can use deepfake technology to create surreal and imaginative artworks that push the boundaries of traditional media.

Negative Uses of Deepfakes

All of that said, deepfakes also have the potential to be misused in a number of ways, such as:

1. Misinformation and Fake News: One of the most significant concerns is the use of deepfakes to create and spread misinformation. Deepfakes can be used to create realistic-looking videos of public figures saying or doing things they never actually did, leading to the spread of false information and the manipulation of public opinion.

2. Privacy Violations: Deepfakes can be used to create non-consensual explicit content by superimposing someone’s face onto explicit images or videos. This can have serious consequences for the individuals whose likeness is used, leading to harassment, blackmail, and damage to their reputation.

3. Political Manipulation: Deepfakes can be used to create videos of political figures saying or doing things to manipulate public opinion or undermine trust in institutions. This could have serious implications for the democratic process and political stability.

4. Impersonation and Fraud: Deepfakes can be used for impersonation and fraud, such as creating fake videos to impersonate someone in order to gain access to sensitive information or commit financial fraud.


5. Erosion of Trust: The widespread use of deepfakes could lead to a general erosion of trust, as people may become increasingly skeptical of the authenticity of videos and images they see online.

6. Security Concerns: Deepfakes could be used for malicious purposes, such as creating realistic-looking videos to bypass security measures or to create convincing phishing attacks.

Detecting Deepfakes

Fig 2. Indicators of Deepfake

Given the potential negative uses of deepfakes, it is important to be able to detect them. However, detecting deepfakes is a challenging task, as they are designed to be highly realistic.

Here are some techniques that can be used to detect deepfakes:

1. Lip-Syncing and Facial Movement Analysis: Deepfakes often struggle with accurately syncing the lip movements of the subject with the audio. Analyzing the consistency of lip movements with speech can help detect discrepancies that suggest a video may be a deepfake. Similarly, analyzing facial expressions and movements for unnatural or inconsistent behavior can also indicate a deepfake.

2. Inconsistencies in Blinking and Eye Movement: Deepfake videos may exhibit unnatural blinking patterns or lack of blinking altogether. Analyzing the frequency and timing of blinking, as well as the movement of the eyes, can help identify deepfakes.

3. Artifacts and Blur Analysis: Deepfake videos often contain artifacts or blur around the face, especially in areas where the face has been manipulated. Analyzing these artifacts using image processing techniques can help identify deepfakes.

4. Facial Geometry and Texture Analysis: Deepfake videos may exhibit inconsistencies in facial geometry and texture. Analyzing the geometry of facial features, such as the distance between the eyes or the shape of the nose, can help detect deepfakes. Similarly, analyzing the texture of the face for unnatural smoothness or blurriness can also indicate a deepfake.

5. Contextual Analysis: Deepfakes often lack contextual details that would be present in a real video. Analyzing the background, lighting, and other contextual cues can help determine if a video is a deepfake.

6. Consistency with Known Data: Comparing the deepfake video with known data, such as other videos of the same person, can help identify inconsistencies that suggest the video is a deepfake.

7. Forensic Analysis: Forensic analysis techniques, such as examining the metadata of the video file or analyzing the compression artifacts, can also help detect deepfakes.

8. Machine Learning and AI Algorithms: Machine learning models can be trained to detect deepfakes by learning patterns and inconsistencies in video data. These models can be highly effective, although they typically need to be retrained as deepfake generation techniques evolve.
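As an illustration of the blinking check in item 2, the sketch below counts blinks from the eye aspect ratio (EAR), a measure from the blink-detection literature that drops sharply when the eye closes. It assumes six (x, y) landmarks per eye from some facial landmark detector, which is not shown here. The threshold of 0.2 and the two-frame minimum are illustrative assumptions that would need tuning on real footage.

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks p1..p6 around one eye; p1 and p4 are the corners."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with EAR below threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the clip
        blinks += 1
    return blinks

def blinks_per_minute(ear_series, fps):
    """Normalize the blink count by the clip duration."""
    minutes = len(ear_series) / fps / 60.0
    return count_blinks(ear_series) / minutes

# Made-up landmarks for an open eye and a made-up per-frame EAR series.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear_series = [0.3] * 10 + [0.1] * 3 + [0.3] * 10 + [0.15] * 1 + [0.3] * 6

print("EAR of open eye:", eye_aspect_ratio(open_eye))
print("blinks in clip:", count_blinks(ear_series))
print("blinks per minute at 30 fps:", blinks_per_minute(ear_series, fps=30))
```

A blink rate far outside what other footage of the same person shows does not prove manipulation, but it is a cheap signal that can flag a clip for closer inspection.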

Research on Deepfake Detection

Fig 3. Deepfake Detection Challenge (DFDC) dataset

Over the years, the deep learning community has tried to formulate governance policies for deepfakes. The technology is powerful and needs to be used safely and responsibly so that it does not harm society.

The first step in preparing a strong regulatory structure for deepfakes is to design machine learning systems capable of detecting them. This section reviews recent research on deepfake detection.

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

One of the most intuitive solutions for detecting deepfakes was proposed by D. Coccomini et al., who combine Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs).

They argue that CNNs capture spatially local features, which is critical for detecting abnormalities within image patches.

The paper proposes two architectures that combine EfficientNet and a Vision Transformer. Both operate on face image patches extracted by a state-of-the-art face detector, MTCNN.

The first architecture, Efficient ViT, combines EfficientNet and ViT sequentially: EfficientNet works as a feature extractor and is followed by a ViT encoder.

EfficientNet produces a visual feature map, which is split into 7×7 chunks. The chunks are linearly projected and then processed by the ViT encoder. The CLS token is used to predict a binary classification score.
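The tokenization step described above can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: it assumes a single-channel feature map, non-overlapping patches of side 7, and a toy projection matrix. The names `split_into_patches`, `linear_projection`, and `build_sequence` are hypothetical.

```python
def split_into_patches(feature_map, patch_size):
    """Split an H x W grid (list of rows) into non-overlapping square patches,
    each flattened row by row into one list (one token per patch)."""
    height, width = len(feature_map), len(feature_map[0])
    if height % patch_size or width % patch_size:
        raise ValueError("feature map size must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [feature_map[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            tokens.append(patch)
    return tokens

def linear_projection(token, weights):
    """Project a flattened patch; weights is a list of output rows."""
    return [sum(w * x for w, x in zip(row, token)) for row in weights]

def build_sequence(feature_map, patch_size, weights, cls_token):
    """Prepend the CLS token to the projected patch tokens, as a ViT encoder expects."""
    tokens = split_into_patches(feature_map, patch_size)
    return [cls_token] + [linear_projection(t, weights) for t in tokens]

# Toy 14 x 14 feature map whose value encodes its position (row * 14 + column).
feature_map = [[float(r * 14 + c) for c in range(14)] for r in range(14)]
# Toy projection to a 2-dimensional embedding: patch sum, and a constant zero.
weights = [[1.0] * 49, [0.0] * 49]
cls_token = [0.0, 0.0]

sequence = build_sequence(feature_map, 7, weights, cls_token)
print("sequence length:", len(sequence))   # 1 CLS token + 4 patch tokens
```

In a real model the projection weights and the CLS token are learned, and the encoder's output at the CLS position feeds the binary classifier.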

The second architecture proposed in the paper is Cross Convolution ViT. The motivation is that restricting the model to small patches may not be ideal, because a deepfake can introduce artifacts both locally and globally.

The Cross Convolution ViT is a multi-branch model. It uses an S branch to process small 7×7 patches and an L branch to process larger 64×64 patches. Both branches contain an EfficientNet feature extractor followed by a ViT encoder. The features from the S branch and the L branch are combined using a cross-attention mechanism.

Each branch produces a CLS token, and the two are summed to produce the final real-or-fake classification of the face.
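A rough sketch of how two branches might be fused is shown below. It assumes a CrossViT-style scheme in which each branch's CLS token attends over the other branch's patch tokens, and it sums per-branch logits before a sigmoid. Learned query, key, and value projections are omitted, both branches share an embedding size, and all numbers are made up, so treat this as an illustration of the idea rather than the paper's exact computation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, tokens):
    """Single-head scaled dot-product attention with one query vector.
    The tokens act as both keys and values (no learned projections)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def fuse_and_classify(cls_s, patches_s, cls_l, patches_l, head_s, head_l):
    # Each CLS token gathers information from the other branch's patches.
    fused_s = [a + b for a, b in zip(cls_s, cross_attend(cls_s, patches_l))]
    fused_l = [a + b for a, b in zip(cls_l, cross_attend(cls_l, patches_s))]
    # One logit per branch, summed, then squashed to a probability of "fake".
    logit = dot(fused_s, head_s) + dot(fused_l, head_l)
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up 2-dimensional tokens for the small-patch (S) and large-patch (L) branches.
cls_s, patches_s = [0.2, -0.1], [[0.5, 0.1], [0.3, -0.4]]
cls_l, patches_l = [-0.3, 0.4], [[0.1, 0.9], [-0.2, 0.2], [0.7, 0.0]]
head_s, head_l = [1.0, -1.0], [0.5, 0.5]    # hypothetical classifier weights

probability = fuse_and_classify(cls_s, patches_s, cls_l, patches_l, head_s, head_l)
print("probability of fake:", probability)
```

The two branches can have different numbers of patch tokens, since attention only requires that the query and the tokens share the same embedding size.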

FaceForensics++: Learning to Detect Manipulated Facial Images

An effective way to detect deepfakes is to run forensic analysis on facial images and look for inconsistencies. Domain experts can identify such anomalies, which then serve as hand-crafted features for training machine learning models to detect spatial and temporal inconsistencies in video streams.

The authors of this paper benchmark several detectors, from steganalysis-based hand-crafted features to learned CNN features, and report their best results with an XceptionNet model. Transfer learning is used to train the network: the final fully connected layer of XceptionNet is removed and replaced by a layer with two outputs.

The layers are initialized with ImageNet weights. All weights except those of the final layer are frozen, and only the final layer is pre-trained for 3 epochs. Subsequently, the model is trained for a further 15 epochs.
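The two-phase schedule above can be sketched without any deep learning framework. The layer names below are hypothetical placeholders, and unfreezing every layer in the second phase is an assumption, since the description only says the model is trained further. In a real framework, the `trainable` flag would correspond to enabling or disabling gradient updates for a layer's parameters.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = True

def set_phase(layers, head_name, phase):
    """'head_only' freezes everything except the new output layer;
    'full' makes every layer trainable."""
    for layer in layers:
        layer.trainable = (phase == "full") or (layer.name == head_name)

def training_plan(layers, head_name, warmup_epochs=3, full_epochs=15):
    """Return, for each epoch, the names of the layers that would be updated."""
    plan = []
    for epoch in range(warmup_epochs + full_epochs):
        phase = "head_only" if epoch < warmup_epochs else "full"
        set_phase(layers, head_name, phase)
        plan.append((epoch, [layer.name for layer in layers if layer.trainable]))
    return plan

# Hypothetical layer groups; "fc" stands for the new two-output layer.
layers = [Layer("entry_flow"), Layer("middle_flow"), Layer("exit_flow"), Layer("fc")]
plan = training_plan(layers, head_name="fc")

for epoch, names in plan[:4]:
    print(epoch, names)
```

Warming up only the new layer first keeps its randomly initialized weights from sending large, noisy gradients through the pretrained layers.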


As deepfake technology continues to evolve, detecting deepfakes will remain a critical challenge. By combining forensic analysis, biometric verification, machine learning algorithms, blockchain technology, and collaborative verification, researchers and developers can work towards more effective deepfake detection methods.

Understanding the complexities of deepfake creation and detection is essential in mitigating the potential risks posed by this technology and safeguarding the integrity of digital content.