Deepfake detection built with the big data engine Apache Spark and the neural network library Keras.
PySpark is used to distribute the preprocessing of a labelled dataset containing more than 3000 videos, while Keras is used to build and train a Convolutional Neural Network that classifies videos as real or fake.
The project is done as a part of the module "CS4225/CS5425 Big Data Systems for Data Science" at the National University of Singapore.
The trained model can be used on a local video or a YouTube video through a command-line interface. The script outputs whether the video is classified as real or fake, along with the confidence of that decision.
python predict.py --url <youtube-video-url>
python predict.py --path <path-to-video>
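As a rough illustration of what the prediction step involves, the sketch below loads a video with OpenCV, detects faces with cvlib, and averages the CNN's per-face scores. The model file name, face size, and label encoding are assumptions for illustration and may differ from what predict.py actually does.

```python
# Minimal sketch of the prediction flow; the model file name, face size and
# label encoding (1 = fake) are assumptions, not necessarily what predict.py uses.
import cv2
import cvlib
import numpy as np
from keras.models import load_model

def predict_video(video_path, model_path="model.h5", face_size=(96, 96)):
    model = load_model(model_path)            # trained CNN saved after training
    capture = cv2.VideoCapture(video_path)
    faces = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        detections, _ = cvlib.detect_face(frame)           # face bounding boxes
        for (x1, y1, x2, y2) in detections:
            face = frame[max(y1, 0):y2, max(x1, 0):x2]
            if face.size == 0:
                continue
            faces.append(cv2.resize(face, face_size) / 255.0)  # normalise to [0, 1]
    capture.release()
    if not faces:
        return None
    scores = model.predict(np.array(faces))   # per-face probability of being fake
    confidence = float(np.mean(scores))       # average over all detected faces
    return ("fake", confidence) if confidence > 0.5 else ("real", 1 - confidence)
```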
- OpenCV:
pip install opencv-python
- CVLib:
pip install cvlib
- PySpark:
pip install pyspark
(and some additional configuration)
- NumPy:
pip install numpy
- Keras:
pip install keras
- TensorFlow:
pip install tensorflow
- Matplotlib:
pip install matplotlib
- Pydot:
pip install pydot
- Pafy:
pip install pafy
- YouTube-DL:
pip install youtube_dl
Or simply:
pip install opencv-python cvlib pyspark numpy keras tensorflow matplotlib pydot youtube_dl pafy
The dataset should be stored in a folder named data and gathered by following these instructions: https://github.com/ondyari/FaceForensics/blob/master/dataset/README.md
The preprocessing currently assumes that the dataset was downloaded with the following commands:
python download-FaceForensics.py path/to/project/data -d DeepFakeDetection -c c23 -t videos
and
python download-FaceForensics.py path/to/project/data -d DeepFakeDetection_original -c c23 -t videos
The preprocessing is distributed with PySpark. It can be run locally, in standalone mode, or on a cluster.
The following command can be used in standalone mode (after setting up the master and workers):
spark-submit \
  --conf spark.driver.memory=8g \
  --conf spark.executor.memory=8g \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=8g \
  --conf spark.driver.maxResultSize=8g \
  --master local[*] spark_preprocess.py
This runs the preprocessing with the default parameters and stores the result in two NumPy arrays, X and y. X contains the sampled faces and y contains their corresponding labels (real or fake).
A higher number of samples is taken from the real videos in order to balance the labels.
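As a rough sketch of how this distributed preprocessing could be structured, the example below parallelises the list of video paths, extracts faces on the workers, and collects the results into X and y. The directory layout, frame counts per video, face size, and output file names are assumptions for illustration and may differ from spark_preprocess.py.

```python
# Minimal sketch of distributed face extraction with PySpark; directory layout,
# frame counts, face size and output file names are illustrative assumptions.
import glob
import cv2
import cvlib
import numpy as np
from pyspark.sql import SparkSession

def extract_faces(video_path, label, n_frames=10, face_size=(96, 96)):
    """Sample n_frames frames from a video and return cropped, resized faces."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    samples = []
    for idx in np.linspace(0, max(total - 1, 0), n_frames, dtype=int):
        capture.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = capture.read()
        if not ok:
            continue
        detections, _ = cvlib.detect_face(frame)
        for (x1, y1, x2, y2) in detections:
            face = frame[max(y1, 0):y2, max(x1, 0):x2]
            if face.size:
                samples.append((cv2.resize(face, face_size), label))
    capture.release()
    return samples

spark = SparkSession.builder.appName("deepfake-preprocess").getOrCreate()

# Label 1 = fake, 0 = real; more frames are sampled from the real videos
# to balance the labels (the frame counts here are illustrative).
videos = [(p, 1, 10) for p in glob.glob("data/manipulated_sequences/**/*.mp4", recursive=True)]
videos += [(p, 0, 30) for p in glob.glob("data/original_sequences/**/*.mp4", recursive=True)]

results = (spark.sparkContext
           .parallelize(videos)
           .flatMap(lambda v: extract_faces(v[0], v[1], n_frames=v[2]))
           .collect())

X = np.array([face for face, _ in results])
y = np.array([label for _, label in results])
np.save("X.npy", X)
np.save("y.npy", y)
```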
The model is a CNN and can be trained with CNN.py or with the notebook train_model.ipynb (which includes plotting).
It has the following architecture: