FLODA: FLorence-2 Optimized for Deepfake Assessment
Model Description
FLODA (FLorence-2 Optimized for Deepfake Assessment) is a deepfake detection model built on a Vision-Language Model (VLM). It integrates image captioning and authenticity assessment into a single end-to-end architecture, with the goal of surpassing existing deepfake detection models.
Key Features
- Utilizes Florence-2 as the base VLM for both caption generation and deepfake detection
- Reframes deepfake detection as a Visual Question Answering (VQA) task
- Incorporates image caption information for enhanced contextual understanding
- Employs rsLoRA (rank-stabilized Low-Rank Adaptation) for efficient fine-tuning
- Demonstrates strong generalization across diverse scenarios
- Shows robustness against adversarial attacks
Model Architecture
FLODA is based on the Florence-2 model and consists of two main components:
- Vision Encoder: Uses DaViT (Dual Attention Vision Transformer)
- Multi-modality Encoder-Decoder: Based on a standard transformer architecture
The model is fine-tuned using rsLoRA, with the following configuration:
- Rank (r): 8
- Alpha (α): 8
- Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, out_proj, lm_head
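As a rough illustration of what this configuration implies, the sketch below compares the adapter scaling factor under standard LoRA (α / r) with the rank-stabilized variant (α / √r) for the listed values. The dictionary mirrors keyword arguments accepted by PEFT's `LoraConfig` (including `use_rslora`), but it is an assumption that training was configured exactly this way; the helper and variable names are hypothetical.

```python
import math

# Values taken from the configuration listed above.
RANK = 8
ALPHA = 8


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


# Keyword arguments one might pass to peft.LoraConfig(**floda_lora_kwargs).
# Assumption: this mirrors the listed settings; it is not the authors' script.
floda_lora_kwargs = {
    "r": RANK,
    "lora_alpha": ALPHA,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "out_proj", "lm_head"],
    "use_rslora": True,
}

standard = lora_scale(ALPHA, RANK)
stabilized = rslora_scale(ALPHA, RANK)
```

With r = 8 and α = 8, standard LoRA scales the low-rank update by 1.0, while rsLoRA scales it by roughly 2.83.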
Performance
FLODA achieves state-of-the-art performance in deepfake detection:
- Average accuracy across all datasets: 97.14%
- Strong performance on both real and fake image datasets
- 100% accuracy on several fake datasets and all attacked datasets
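For readers who want to reproduce this kind of summary, here is a minimal sketch of computing per-dataset accuracy and an unweighted average across datasets. Whether the reported average is unweighted is an assumption, the function names are hypothetical, and the toy inputs are illustrative only (they are not FLODA results).

```python
from statistics import mean


def accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def average_accuracy(results):
    """Return per-dataset accuracies and their unweighted mean.

    `results` maps a dataset name to a (labels, predictions) pair.
    """
    per_dataset = {name: accuracy(y, p) for name, (y, p) in results.items()}
    return per_dataset, mean(list(per_dataset.values()))


# Toy inputs for illustration only; these are NOT FLODA's evaluation results.
toy_results = {
    "toy_real_set": (["Real"] * 4, ["Real", "Real", "Fake", "Real"]),
    "toy_fake_set": (["Fake"] * 4, ["Fake"] * 4),
}
per_dataset, avg = average_accuracy(toy_results)
```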
Usage
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Load the model and processor
model_path = "path/to/floda/model"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to("cuda").eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

def detect_deepfake(image_path):
    # Load the image and build the VQA-style prompt
    image = Image.open(image_path).convert("RGB")
    task_prompt = "<DEEPFAKE_DETECTION>"
    text_input = "Is this photo real?"
    inputs = processor(text=task_prompt + text_input, images=image, return_tensors="pt").to("cuda")

    # Generate the answer without tracking gradients
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )

    # Decode and post-process the generated answer
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    result = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))[task_prompt]

    # "yes" means the model judges the photo to be real; strip stray whitespace first
    return "Real" if result.strip().lower() == "yes" else "Fake"

# Example usage
result = detect_deepfake("path/to/image.jpg")
print(f"The image is: {result}")
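Note that `detect_deepfake` above maps every answer other than "yes" to "Fake", so an empty or malformed generation is also reported as fake. The helper below is a hypothetical sketch of a stricter mapping that returns "Unknown" for unexpected answers. It assumes the model answers "yes" or "no" and that leftover special tokens look like `<s>`, `</s>`, or `<pad>`; neither assumption is confirmed by this model card.

```python
import re

# Matches angle-bracket special tokens such as <s>, </s>, or <pad>.
_SPECIAL_TOKEN = re.compile(r"</?[A-Za-z_]+>")


def parse_verdict(generated_text: str) -> str:
    """Map a raw model answer to "Real", "Fake", or "Unknown"."""
    # Drop special tokens, surrounding whitespace, and trailing punctuation.
    cleaned = _SPECIAL_TOKEN.sub("", generated_text).strip().lower()
    cleaned = cleaned.rstrip(".!")
    if cleaned == "yes":
        return "Real"
    if cleaned == "no":
        return "Fake"
    # Anything else is surfaced explicitly instead of being counted as fake.
    return "Unknown"
```

Callers could route "Unknown" results to manual review rather than treating them as detections.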
Training Data
FLODA was trained on a dataset including:
- Real images: MS COCO
- Fake images: Generated by SD2 and LaMa
Evaluation Data
The model was evaluated on 16 datasets:
- 2 real image datasets: MS COCO, Flickr30k
- 14 fake image datasets generated by various models (e.g., SD2, SDXL, DeepFloyd IF, DALLE-2, SGXL)
- Includes datasets with stylized images, inpainting, resolution changes, and face-swapping
- Adversarial, backdoor, and data poisoning attack datasets
Limitations
- Accuracy on the ControlNet dataset (77.07%) is lower than that of some competing models
- The model's effectiveness on images from generation techniques not represented in the training or evaluation datasets, including very recent or future ones, is uncertain
Ethical Considerations
While FLODA shows promising results in deepfake detection, it's important to consider:
- The potential for false positives or negatives, which could have significant implications depending on the use case
- The need for continuous updating as new image generation techniques emerge
- Privacy considerations when processing user-submitted images
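To make the first point concrete, the sketch below separates the two error types so they can be weighed against a given use case. It assumes "Fake" is the positive class, so a false positive is a real image flagged as fake and a false negative is a fake image passed as real. The function name and toy inputs are hypothetical and are not FLODA results.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    Assumed convention: "Fake" is the positive class.
    """
    pairs = list(zip(labels, predictions))
    false_pos = sum(1 for y, p in pairs if y == "Real" and p == "Fake")
    false_neg = sum(1 for y, p in pairs if y == "Fake" and p == "Real")
    n_real = sum(1 for y in labels if y == "Real")
    n_fake = sum(1 for y in labels if y == "Fake")
    # Guard against division by zero when a class is absent.
    fpr = false_pos / n_real if n_real else 0.0
    fnr = false_neg / n_fake if n_fake else 0.0
    return fpr, fnr


# Toy inputs for illustration only.
labels = ["Real", "Real", "Real", "Real", "Fake", "Fake"]
predictions = ["Real", "Fake", "Real", "Real", "Fake", "Real"]
fpr, fnr = error_rates(labels, predictions)
```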
Model Card Authors
- Youngho Bae (Hanyang University)
- Gunhui Han (Yonsei University)
- Seunghyeon Park (Yonsei University)
Model Card Contact
For inquiries about this model card or the FLODA model, please contact:
Youngho Bae (email: byh711@gmail.com)
Framework versions
- PEFT 0.12.0
Model tree for byh711/FLODA-deepfake
Base model: microsoft/Florence-2-base-ft