Pain Detection via Knowledge Distillation

Project information

Category: Research
Client: Brandenburg Technical University (Research Module in AI)
Project date: February 02, 2026
Project URL: https://github.com/HunainRaza/Pain-Detection-Knowledge-Distillation

Overview

Research project investigating how teacher network architecture affects knowledge distillation quality for automated pain detection from facial expressions. Implemented and extended the DeiT-PNP architecture from El Morabit & Rivenq (2022), adapting it to the SynPain synthetic dataset and introducing a comparative teacher study — ResNet-50 vs. Swin Transformer — not present in the original paper.

Research Question: How do different teacher network architectures (ResNet-50 and Swin Transformer) compare in terms of accuracy and knowledge transfer efficiency when distilling to DeiT for binary pain/no-pain classification from facial expressions?

Problem & Motivation

Automated pain detection from facial expressions has real clinical applications — assessing pain in patients who cannot self-report (post-surgery, neonates, or patients with cognitive impairments). Deep learning models trained for this task tend to be large and computationally expensive. Knowledge distillation addresses this by training a smaller, efficient student model to mimic a larger, more capable teacher enabling deployment in resource-constrained clinical settings.

The architectural choice of teacher model is an open research question. CNN-based teachers (like ResNet-50) encode spatial features hierarchically through convolutions. Transformer-based teachers (like Swin Transformer) encode global attention patterns. Whether this architectural difference affects what the student learns, and how well was the core question explored.

Technical Implementation

Dataset: SynPain

The SynPain dataset contains AI-generated synthetic facial images in a side-by-side format, each image shows two faces, with the labeling determined by the filename:

Filename contains "Pain" → left half = NoPain, right half = Pain
Filename contains "NoPain" → both halves = NoPain

A custom "prepare_dataset.py" script splits each image and applies this labeling rule, producing a structured "Pain/" and "NoPain/" directory with a manifest CSV. The synthetic dataset was generated using Ideogram and Runway, covering diverse demographics (age, gender, ethnicity).

Preprocessing: MTCNN Face Alignment

All faces were processed using MTCNN (Multi-Task Cascaded CNN) from facenet-pytorch:

Face detection with 5 facial landmark points
Alignment based on eye positions
Cropping with margin
Resize to 256×256
Fallback to center-crop if MTCNN fails

Training augmentations included random resized crops, horizontal flip, rotation (±10°), and color jitter, all normalized with ImageNet statistics.

Model Architecture: DeiT with Knowledge Distillation

Student - DeiT-Base

"deit_base_distilled_patch16_224" from "timm", adapted for 256×256 input
Binary classification head (Pain / NoPain)
Distillation token alongside the class token - the architectural feature that enables knowledge distillation natively

Teacher - Two Configurations Compared

ResNet-50: ImageNet-pretrained CNN, frozen during distillation. Provides hard labels (argmax) to supervise the student's distillation token
Swin Transformer: Transformer-based teacher, fine-tuned on SynPain before being used for distillation

Distillation Loss

Hard distillation combines two Binary Cross Entropy terms:

L_total = L_BCE(class_token, ground_truth) + L_BCE(distill_token, teacher_hard_label)

The class token is supervised by the ground truth label. The distillation token is supervised by the teacher's prediction (hard argmax), not a soft probability distribution.

Training Setup

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning rate | 1e-5 |
| LR schedule | StepLR (×0.5 every 10 epochs) |
| Epochs | 30 |
| Batch size | 64 |
| Split | 70 / 15 / 15 (train / val / test) |
| Mixed precision | Optional (AMP) |
| Reproducibility | Seeded + deterministic mode |

Key Features

Unified training entrypoint (train_multi_teacher.py) with "--teacher resnet50" or "--teacher swin" for side-by-side comparison under identical conditions
Comprehensive evaluation: accuracy, precision, recall, F1-score, confusion matrix, ROC/AUC
Reproducible pipeline: seeded operations, deterministic mode, stratified splits
Full inference support: single image (auto-detects side-by-side format) and batch inference with confidence scores
Research posters included in the repository (Posters/ directory)

Tech Stack

Deep Learning: PyTorch, timm (DeiT, Swin Transformer), torchvision (ResNet-50)
Face Detection: facenet-pytorch (MTCNN)
Data & Evaluation: Scikit-learn, Pandas, NumPy
Visualization: Matplotlib, Seaborn
Tooling: Python 3.8+, CUDA, AMP (mixed precision)

Paper Reference

El Morabit, S., & Rivenq, A. (2022). Pain Detection From Facial Expressions Based on Transformers and Distillation. 2022 11th International Symposium on Signal, Image, Video and Communications (ISIVC), IEEE.

Hunain Raza