TL;DR: We introduce a new framework (ALDI), a state-of-the-art method (ALDI++), a dataset (CFC-DAOD), and a benchmarking protocol for domain adaptive object detection (DAOD).
Published in Transactions on Machine Learning Research, 2025, with a Featured Certification (Spotlight).
We propose a new framework, Align and Distill (ALDI), that unifies two common themes within domain adaptive object detection: feature alignment and self-distillation.
ALDI is architecture-agnostic, supporting Faster R-CNN, DETR, and YOLO detectors, as well as ResNet, ConvNeXt, and ViT backbones, out of the box.
Many existing methods can be viewed as special cases of ALDI.
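To make the framework concrete, below is a minimal PyTorch-style sketch of the unified training loop. The helper names (`detection_loss`, `backbone`, `predict`, `align_loss`, `distill_loss`, and the augmentation functions) and the loss weights are illustrative assumptions, not the reference implementation; the actual components and schedules are detailed in the paper and codebase.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.9996):
    """Exponential moving average (EMA) of student weights into the teacher."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def aldi_step(student, teacher, optimizer,
              src_imgs, src_labels, tgt_imgs,
              align_loss, distill_loss, strong_aug, weak_aug,
              w_align=0.5, w_distill=1.0):
    """One ALDI training step: supervised loss on labeled source data,
    feature alignment across domains, and self-distillation from an
    EMA teacher on unlabeled target data."""
    src_strong = strong_aug(src_imgs)
    tgt_strong = strong_aug(tgt_imgs)

    # 1) Standard supervised detection loss on source images.
    sup = student.detection_loss(src_strong, src_labels)

    # 2) Feature alignment (e.g., an adversarial domain discriminator)
    #    pulls source and target feature distributions together.
    align = align_loss(student.backbone(src_strong),
                       student.backbone(tgt_strong))

    # 3) Self-distillation: the teacher predicts on weakly augmented
    #    target images; the student learns to match those predictions
    #    on the strongly augmented views.
    with torch.no_grad():
        teacher_preds = teacher.predict(weak_aug(tgt_imgs))
    distill = distill_loss(student, tgt_strong, teacher_preds)

    loss = sup + w_align * align + w_distill * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return float(loss)
```

Setting `w_align` or `w_distill` to zero, and swapping in different alignment or distillation losses, recovers many prior methods within this single loop.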
Within the ALDI design space, we propose a new method, ALDI++, which improves upon prior work via a robust burn-in procedure and multi-task soft distillation.
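The robust burn-in amounts to pre-training the student with the same strong augmentations and EMA weight averaging before teacher-student training begins, so the teacher starts from a stronger checkpoint. The sketch below illustrates the other component, multi-task soft distillation for a two-stage detector: the student matches the teacher's soft per-task outputs (RPN objectness and ROI classification here) rather than hard, confidence-thresholded pseudo-labels. The `heads` interface and the assumption that teacher and student score the same proposals are illustrative simplifications; see the paper for the exact per-task losses.

```python
import torch
import torch.nn.functional as F

def soft_distill_losses(student, teacher, tgt_imgs):
    """Multi-task soft distillation sketch for a two-stage detector:
    the student matches the teacher's soft output distributions for
    each task instead of hard, thresholded pseudo-labels."""
    with torch.no_grad():
        t_obj, t_cls = teacher.heads(tgt_imgs)  # teacher soft outputs
    s_obj, s_cls = student.heads(tgt_imgs)      # assumes shared proposals

    # RPN objectness: binary cross-entropy against teacher probabilities.
    l_rpn = F.binary_cross_entropy_with_logits(s_obj, torch.sigmoid(t_obj))

    # ROI classification: cross-entropy against the teacher's full class
    # distribution (soft targets, no argmax or confidence threshold).
    l_roi = F.cross_entropy(s_cls, F.softmax(t_cls, dim=-1))

    return l_rpn + l_roi
```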
Please see our paper for more implementation details.
ALDI++ achieves state-of-the-art results on all benchmarks and architectures studied without additional hyperparameter tuning.
We introduce an extension to the Caltech Fish Counting Dataset—a domain generalization benchmark sourced from a real-world environmental monitoring application—with new data to enable DAOD. We call our new benchmark CFC-DAOD.
CFC-DAOD focuses on detecting fish (white bounding boxes) in sonar imagery under domain shift caused by environmental differences between the training location (Kenai) and testing location (Channel). With 168k bounding boxes in 29k frames from 150 videos, the dataset is substantially larger than existing DAOD benchmarks while targeting a real-world challenge.
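As a quick-start sketch, and assuming COCO-formatted annotations with hypothetical file paths, the dataset can be inspected with the standard `pycocotools` API:

```python
from pycocotools.coco import COCO

# Hypothetical paths: point these at your CFC-DAOD download.
coco = COCO("cfc_daod/annotations/kenai_train.json")

img_ids = coco.getImgIds()
print(f"{len(img_ids)} frames, {len(coco.getAnnIds())} bounding boxes")

# Each annotation is a standard COCO box: [x, y, width, height].
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0])):
    print(ann["bbox"], coco.loadCats(ann["category_id"])[0]["name"])
```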
We find that in prior work, source-only and oracle models are consistently constructed in a way that does not properly isolate domain-adaptation-specific components: generic training improvements such as strong data augmentation and exponential moving average (EMA) weight updates are used by adaptive methods but omitted from their baselines, leading to misattribution of performance improvements.
We show that including these components significantly improves both source-only and oracle model performance (+7.2 and +2.6 AP50 on Foggy Cityscapes, respectively). Practically, this means that source-only and oracle models are significantly stronger than previously thought, setting more challenging performance targets for algorithm developers.
Our protocol offers researchers the opportunity to rigorously compare to strong baselines. Not all existing methods outperform a fair source-only baseline. For those that do, there is still room for improvement, especially with stronger backbones like ViT.
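To make the protocol concrete, here is a minimal sketch of a fair source-only baseline (reusing the hypothetical `ema_update` helper from the training-loop sketch above): it keeps the domain-agnostic ingredients of the adaptive method, strong augmentation and EMA weight averaging, and drops only the target data and adaptation-specific losses.

```python
def train_source_only(student, teacher, optimizer, src_loader,
                      strong_aug, steps):
    """Fair source-only baseline: identical strong augmentation and EMA
    weight averaging as the adaptive method, but no target data and no
    alignment or distillation losses. Evaluate the EMA (teacher)
    weights, mirroring how the adaptive method is evaluated."""
    for _, (imgs, labels) in zip(range(steps), src_loader):
        loss = student.detection_loss(strong_aug(imgs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        ema_update(teacher, student)  # same EMA as the adaptive method
    return teacher
```

An oracle baseline is constructed identically, but trained on labeled target data, so that any remaining gap is attributable to adaptation itself.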
@article{kay2025align,
  title={Align and Distill: Unifying and Improving Domain Adaptive Object Detection},
  author={Justin Kay and Timm Haucke and Suzanne Stathatos and Siqi Deng and Erik Young and Pietro Perona and Sara Beery and Grant Van Horn},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=ssXSrZ94sR},
  note={Featured Certification}
}