Align and Distill: Unifying and Improving Domain Adaptive Object Detection

ALDI: A Unified Framework for DAOD

We propose a new framework, Align and Distill (ALDI), that unifies two common themes within domain adaptive object detection: feature alignment and self-distillation.
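To make the self-distillation theme concrete, here is a minimal sketch of one common way it is realized in teacher-student training: the teacher's weights track an exponential moving average (EMA) of the student's. This page does not specify ALDI's update rule, so the function name `ema_update`, the `decay` value, and the dict-of-floats weight representation are all assumptions made for illustration. Feature alignment is not shown.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` are dicts mapping parameter names to floats,
    a toy stand-in (assumption) for real model parameters.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy usage: the teacher moves a small step toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which is what lets it serve as a stable source of training targets for the student.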

ALDI is architecture-agnostic: it supports Faster R-CNN, DETR, and YOLO detectors with ResNet, ConvNeXt, and ViT backbones out of the box.

[Figure: the ALDI framework and its instantiations as SADA, UMT, Adaptive Teacher, MIC, Probabilistic Teacher, and ALDI++.]

Many existing methods can be viewed as special cases of ALDI.

ALDI++: A State-of-the-Art Method for DAOD

Within the ALDI design space, we propose a new method, ALDI++. We improve upon prior work via a robust burn-in procedure (left & center) and multi-task soft distillation (right).
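As a rough illustration of what soft distillation means, the sketch below contrasts soft targets (the teacher's full class distribution) with hard pseudo-labels (the teacher's argmax) for a single classification output. It is a generic example, not ALDI++'s actual loss: the helper names, the toy logits, and the use of plain cross-entropy are assumptions, and only a classification term is shown, whereas the multi-task formulation is described in the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, student_logits):
    """Cross-entropy between a target distribution and the student's prediction."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, student_probs) if t > 0
    )


def hard_targets(teacher_logits):
    """One-hot distribution on the teacher's most confident class."""
    best = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(teacher_logits))]


# Hypothetical logits for one detection over three classes.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]

# Soft distillation keeps the teacher's uncertainty across classes.
soft_loss = cross_entropy(softmax(teacher_logits), student_logits)
# Hard pseudo-labeling discards it and keeps only the top class.
hard_loss = cross_entropy(hard_targets(teacher_logits), student_logits)
```

The soft variant passes the teacher's relative confidence across classes to the student, while the hard variant treats every pseudo-label as equally certain.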

Please see our paper for more implementation details.

Experimental Results

ALDI++ achieves state-of-the-art results on all benchmarks and architectures studied without additional hyperparameter tuning.

The CFC-DAOD Dataset

We introduce an extension to the Caltech Fish Counting Dataset—a domain generalization benchmark sourced from a real-world environmental monitoring application—with new data to enable DAOD. We call our new benchmark CFC-DAOD.

CFC-DAOD focuses on detecting fish (white bounding boxes) in sonar imagery under domain shift caused by environmental differences between the training location (Kenai) and testing location (Channel). With 168k bounding boxes in 29k frames from 150 videos, the dataset is substantially larger than existing DAOD benchmarks while targeting a real-world challenge.

A New Experimental Protocol for DAOD

We find that in prior work, source-only and oracle models are consistently constructed in a way that does not properly isolate domain-adaptation-specific components, leading to misattribution of performance improvements.

We show that including these components significantly improves both source-only and oracle model performance (+7.2 and +2.6 AP50 on Foggy Cityscapes, respectively). Practically, this means that source-only and oracle models are significantly stronger than previously thought, setting more challenging performance targets for algorithm developers.
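The idea behind the protocol can be sketched as a set of training configurations in which the baselines differ from the adapted model only in their adaptation-specific parts. Everything here is hypothetical: the `TrainConfig` class, its field names, and the placeholder component names are assumptions for illustration, and "oracle" is assumed to mean supervised training on labeled target data. The actual components are specified in the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical description of one training run."""
    labeled_domains: tuple
    unlabeled_domains: tuple = ()
    shared_components: frozenset = frozenset()      # not adaptation-specific
    adaptation_components: frozenset = frozenset()  # adaptation-specific


# Placeholder names: the real shared components are listed in the paper.
SHARED = frozenset({"shared_component_1", "shared_component_2"})

daod = TrainConfig(
    labeled_domains=("source",),
    unlabeled_domains=("target",),
    shared_components=SHARED,
    adaptation_components=frozenset({"feature_alignment", "self_distillation"}),
)

# Fair source-only baseline: drop target data and adaptation components,
# but keep every shared component.
source_only = replace(daod, unlabeled_domains=(), adaptation_components=frozenset())

# Fair oracle: same shared components, supervised on target labels instead.
oracle = replace(source_only, labeled_domains=("target",))
```

With configurations built this way, any gap between `daod` and `source_only` can be attributed to the adaptation components rather than to differences in the shared training recipe.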

Experimental Results

Our protocol lets researchers compare rigorously against strong baselines. Not all existing methods outperform a fair source-only baseline. For those that do, there is still room for improvement, especially with stronger backbones like ViT.

BibTeX

@article{kay2025align,
        title={Align and Distill: Unifying and Improving Domain Adaptive Object Detection},
        author={Justin Kay and Timm Haucke and Suzanne Stathatos and Siqi Deng and Erik Young and Pietro Perona and Sara Beery and Grant Van Horn},
        journal={Transactions on Machine Learning Research},
        issn={2835-8856},
        year={2025},
        url={https://openreview.net/forum?id=ssXSrZ94sR},
        note={Featured Certification}
}