[CVPR 2025] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
This is an official PyTorch implementation of "OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels".
Top-down attention plays a crucial role in the human vision system, wherein the brain initially obtains a rough overview of a scene to discover salient cues (i.e., overview first), followed by a more careful finer-grained examination (i.e., look closely next). However, modern ConvNets remain confined to a pyramid structure that successively downsamples the feature map for receptive field expansion, neglecting this crucial biomimetic principle. We present OverLoCK, the first pure ConvNet backbone architecture that explicitly incorporates a top-down attention mechanism. Unlike pyramid backbone networks, our design features a branched architecture with three synergistic sub-networks: 1) a Base-Net that encodes low/mid-level features; 2) a lightweight Overview-Net that generates dynamic top-down attention through coarse global context modeling (i.e., overview first); and 3) a robust Focus-Net that performs finer-grained perception guided by top-down attention (i.e., look closely next). To fully unleash the power of top-down attention, we further propose a novel context-mixing dynamic convolution (ContMix) that effectively models long-range dependencies while preserving inherent local inductive biases even when the input resolution increases, addressing critical limitations in existing convolutions. Our OverLoCK exhibits a notable performance improvement over existing methods.
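For intuition, here is a minimal, self-contained PyTorch sketch of the overview-first / look-closely-next idea. It is an illustration only: every layer choice below (the plain conv stems, the sigmoid gate, the 7x7 depthwise "focus" conv) is a placeholder, not the actual OverLoCK or ContMix implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OverviewThenFocusSketch(nn.Module):
    """Toy illustration of top-down guidance, not the OverLoCK architecture."""
    def __init__(self, dim=64):
        super().__init__()
        # Base-Net stand-in: encodes low/mid-level features
        self.base = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1), nn.GELU(),
        )
        # Overview-Net stand-in: lightweight, heavily downsampled global context
        self.overview = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=4, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )
        # Focus-Net stand-in: finer-grained perception on the guided features
        self.focus = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)

    def forward(self, x):
        feat = self.base(x)                      # shared low/mid-level features
        ctx = self.overview(feat)                # "overview first": coarse context
        ctx = F.interpolate(ctx, size=feat.shape[-2:], mode='bilinear',
                            align_corners=False)
        guide = torch.sigmoid(ctx)               # top-down attention map
        return self.focus(feat * guide)          # "look closely next"

out = OverviewThenFocusSketch()(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```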
We strongly recommend using the dependency versions listed below to ensure reproducibility:
```bash
# Environments:
cuda==12.1
python==3.10

# Dependencies:
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
pip install natten==0.17.1+torch230cu121 -f https://shi-labs.com/natten/wheels/
pip install timm==0.6.12
pip install mmengine==0.2.0
```
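After installation, a quick sanity check can confirm that the environment matches; this is just a sketch, and the expected versions in the comments are simply the ones pinned above.

```python
# Verify that the installed packages match the pinned versions above.
import torch, torchvision, timm, mmengine, natten

print("torch      :", torch.__version__)        # expect 2.3.1+cu121
print("torchvision:", torchvision.__version__)  # expect 0.18.1+cu121
print("timm       :", timm.__version__)         # expect 0.6.12
print("mmengine   :", mmengine.__version__)     # expect 0.2.0
print("natten     :", natten.__version__)       # expect 0.17.1
print("CUDA available:", torch.cuda.is_available())
```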
💡 To accelerate training and inference, we utilize the efficient large-kernel convolution proposed in RepLKNet. Please follow this guideline to install the depthwise_conv2d_implicit_gemm function.

💡 If you encounter network issues during the installation of natten, please download this package and install it locally.
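If the optimized kernel cannot be installed, a common pattern is to fall back to PyTorch's native depthwise convolution. The sketch below assumes the package exports a DepthWiseConv2dImplicitGEMM class, as in the RepLKNet codebase; verify the name against your installation.

```python
import torch.nn as nn

try:
    # Optimized implicit-GEMM kernel for large depthwise convolutions (from RepLKNet)
    from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

    def get_dwconv(channels, kernel_size, bias=False):
        return DepthWiseConv2dImplicitGEMM(channels, kernel_size, bias=bias)
except ImportError:
    def get_dwconv(channels, kernel_size, bias=False):
        # Plain PyTorch depthwise convolution: same result, slower for large kernels
        return nn.Conv2d(channels, channels, kernel_size,
                         padding=kernel_size // 2, groups=channels, bias=bias)
```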
Prepare ImageNet with the following folder structure; you can extract ImageNet using this script.
```
imagenet/
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val/
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```
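This is the standard torchvision ImageFolder layout, so a quick check (replace /path/to/imagenet with your own path) can verify that both splits are discovered correctly:

```python
# Both splits follow the standard torchvision ImageFolder layout.
from torchvision.datasets import ImageFolder

train_set = ImageFolder('/path/to/imagenet/train')
val_set = ImageFolder('/path/to/imagenet/val')
print(len(train_set.classes), len(train_set))  # 1000 classes, ~1.28M training images
print(len(val_set.classes), len(val_set))      # 1000 classes, 50,000 validation images
```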
| Models      | Input Size | FLOPs (G) | Params (M) | Top-1 (%) | Download |
|-------------|------------|-----------|------------|-----------|----------|
| OverLoCK-XT | 224x224    | 2.6       | 16         | 82.7      | model    |
| OverLoCK-T  | 224x224    | 5.5       | 33         | 84.2      | model    |
| OverLoCK-S  | 224x224    | 9.7       | 56         | 84.8      | model    |
| OverLoCK-B  | 224x224    | 16.7      | 95         | 85.1      | model    |
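To use a downloaded checkpoint outside the provided scripts, the snippet below shows one possible way to build a model and load the weights. It assumes the repository's models package registers the overlock_* variants with timm, and that the checkpoint is either a raw state dict or a dict with a state_dict/model key; adjust the names and paths to match the actual files.

```python
import torch
from timm import create_model
import models  # the repository's model definitions (register overlock_* with timm)

model = create_model('overlock_xt', num_classes=1000)         # overlock_{xt, t, s, b}
ckpt = torch.load('overlock_xt.pth', map_location='cpu')      # downloaded checkpoint path
state_dict = ckpt.get('state_dict', ckpt.get('model', ckpt))  # unwrap if nested
model.load_state_dict(state_dict)
model.eval()
print(sum(p.numel() for p in model.parameters()) / 1e6, 'M parameters')
```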
To train OverLoCK models on ImageNet-1K with 8 GPUs (single node), run:

```bash
bash scripts/train_xt_model.sh  # train OverLoCK-XT
bash scripts/train_t_model.sh   # train OverLoCK-T
bash scripts/train_s_model.sh   # train OverLoCK-S
bash scripts/train_b_model.sh   # train OverLoCK-B
```
To evaluate OverLoCK on ImageNet-1K, run:

```bash
MODEL=overlock_xt  # overlock_{xt, t, s, b}
python3 validate.py \
  /path/to/imagenet \
  --model $MODEL -b 128 \
  --pretrained  # or --checkpoint /path/to/checkpoint
```
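For single-image inference, the following sketch uses timm's standard preprocessing utilities. It assumes pretrained weights can be resolved for the model name (otherwise load a downloaded checkpoint as shown above), and example.jpg is a placeholder path.

```python
import torch
from PIL import Image
from timm import create_model
from timm.data import resolve_data_config, create_transform
import models  # the repository's model definitions

model = create_model('overlock_xt', pretrained=True).eval()
config = resolve_data_config({}, model=model)   # eval-time preprocessing config
transform = create_transform(**config)

img = transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=-1)
print(int(probs.argmax(dim=-1)))  # predicted ImageNet-1K class index
```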
If you find this project useful for your research, please consider citing:
```bibtex
@inproceedings{lou2025overlock,
  title={OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels},
  author={Meng Lou and Yizhou Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful works.
If you have any questions, please feel free to open an issue or contact me by email.