Source stage
88.88%
DHVT clean CIFAR-10 accuracy, the strongest source-stage model in the study.
Computer Vision · April 2026
Explainable and budget-aware transfer-learning study comparing CNN, ViT, and DHVT across CIFAR-10, EuroSAT, and Brain Tumor MRI.
Unified training pipeline: one configurable framework for CNN, ViT, and DHVT across source and downstream datasets.
Frugal-learning protocol: CIFAR-10 data-efficiency runs plus downstream comparison between scratch, frozen-backbone linear probing, and full fine-tuning.
Checkpoint-first evaluation: saved model weights, per-run histories, plot regeneration, and canonical result export through a master results table.
Explainability workflow: Grad-CAM for CNN, attention rollout for ViT, head-token influence for DHVT, plus confusion matrices, per-class diagnostics, and analysis of misclassified examples.
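As an illustration of the attention-rollout step named in the explainability workflow, here is a minimal pure-Python sketch of the standard rollout recipe: average the heads in each layer, mix in the identity to account for residual connections, renormalize the rows, and multiply the layers together. The function names, the `residual_weight` parameter, and the nested-list input layout are assumptions made for this example; the study's own implementation is not shown in the source and may differ.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def attention_rollout(layers, residual_weight=0.5):
    """Combine per-layer attention into one token-to-token relevance matrix.

    layers: list ordered from first to last layer; each entry has shape
            [heads][tokens][tokens] with rows that sum to 1 (assumed layout).
    residual_weight: share given to the identity (skip connection); 0.5 is
            the usual choice, treated here as an assumption.
    """
    n = len(layers[0][0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for heads in layers:
        # Average the attention maps over heads.
        avg = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
               for i in range(n)]
        # Mix with the identity to model the residual connection.
        mixed = [[(1.0 - residual_weight) * avg[i][j]
                  + (residual_weight if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        # Renormalize so every row is a distribution again.
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Later layers are applied on the left.
        rollout = matmul(mixed, rollout)
    return rollout


# Toy input: 2 layers, 1 head, 3 tokens (token 0 plays the role of [CLS]).
uniform_layer = [[[1.0 / 3.0] * 3 for _ in range(3)]]
relevance = attention_rollout([uniform_layer, uniform_layer])
cls_to_patches = relevance[0][1:]  # relevance of each patch to the class token
```

In a ViT, the class-token row with the class token itself dropped (`cls_to_patches` above) is what would be reshaped to the patch grid and overlaid on the input image.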
EuroSAT
Best downstream EuroSAT result with DHVT trained from scratch.
Brain MRI
Best downstream Brain Tumor MRI result with pretrained DHVT and full fine-tuning.
Architecture comparison: DHVT was strongest on clean CIFAR-10, CNN remained competitive in the low-data regime, and vanilla ViT was the most robust to texture corruption.
Budget-aware transfer: frozen-backbone linear probing reduced training cost for transformer-style models, but the accuracy drop was too large to make it the preferred transfer strategy.
XAI insight: the explanation maps often showed that failures came from semantically plausible confusion rather than attention drifting to unrelated background.
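To make the cost side of the budget-aware transfer finding concrete, the sketch below encodes the three downstream regimes compared in the study (scratch, frozen-backbone linear probing, full fine-tuning) as a small plan object and counts how many parameters each regime would update. All names (`TransferPlan`, `plan_for`, `trainable_params`) and the parameter counts in the usage lines are hypothetical placeholders, not values or code from the experiment repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPlan:
    load_pretrained: bool  # start from source-stage weights?
    train_backbone: bool   # update backbone weights during training?
    train_head: bool       # update the classification head?


def plan_for(strategy):
    """Map a strategy name (hypothetical labels) to what gets loaded and trained."""
    if strategy == "scratch":
        return TransferPlan(load_pretrained=False, train_backbone=True, train_head=True)
    if strategy == "linear_probe":
        return TransferPlan(load_pretrained=True, train_backbone=False, train_head=True)
    if strategy == "full_finetune":
        return TransferPlan(load_pretrained=True, train_backbone=True, train_head=True)
    raise ValueError(f"unknown strategy: {strategy!r}")


def trainable_params(strategy, backbone_params, head_params):
    """Count the parameters that receive gradient updates under a strategy."""
    plan = plan_for(strategy)
    total = 0
    if plan.train_backbone:
        total += backbone_params
    if plan.train_head:
        total += head_params
    return total


# Placeholder sizes for illustration only; not measured from any model in the study.
BACKBONE, HEAD = 1_000_000, 1_000
for name in ("scratch", "linear_probe", "full_finetune"):
    print(name, trainable_params(name, BACKBONE, HEAD))
```

Linear probing updates only the head, which is where its cost saving comes from; the forward pass through the frozen backbone is still paid on every batch, so the saving applies to the backward pass and optimizer state rather than to total compute.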
Selected report-ready panels from the experiment repository.