PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization
Hasib Zunair1
A. Ben Hamza1
1Concordia University, Montreal, QC, Canada.
British Machine Vision Conference (BMVC) 2024
[Demo]
[GitHub]
[Paper]

Visual comparison of PEEKABOO and the state-of-the-art FOUND on the ECSSD, DUT-OMRON and DUTS-TE datasets. Across all datasets, PEEKABOO excels at localizing salient objects, particularly when they are small, reflective, or when the background is poorly illuminated. PEEKABOO also avoids over-segmenting salient objects, segmenting non-salient regions, and producing noisy predictions. Zoom in to observe the results more closely.

TL;DR: A segmentation model that generalizes zero-shot to unfamiliar images and to objects that are small, reflective, or poorly illuminated, without the need for additional training. Our approach explicitly models contextual relationships among pixels in a self-supervised procedure through image masking for unsupervised object localization.



How to interpret the results

While unsupervised object localization algorithms work well in many cases, they often fail to segment salient objects that are small, reflective, or set against a poorly illuminated background. These real-world characteristics cause methods to over-segment, segment non-salient regions, and produce noisy segmentation maps. Although this problem is far from solved, we believe our work is a significant step toward unsupervised object localization in the aforementioned scenarios. There has also been concurrent work on this subject; see in particular FOUND.

Our approach explicitly models contextual relationships among pixels through image masking for unsupervised object localization. In a self-supervised procedure (i.e., a pretext task) without any additional training (i.e., no downstream task), context-based representation learning is performed at the pixel level, by making predictions on masked images, and at the shape level, by matching the predictions on the masked input to those on the unmasked one.
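To make the idea concrete, here is a minimal sketch (not the official implementation; see the GitHub code for the real training setup) of how a masked-vs-unmasked consistency objective could be wired up. The model, patch size, mask ratio, and loss choice below are illustrative assumptions: we assume a network that outputs a per-pixel foreground logit map.

```python
import torch
import torch.nn.functional as F

def random_patch_mask(images, patch_size=16, mask_ratio=0.5):
    """Hide random patches of the input by zeroing them out.

    images: (B, C, H, W) tensor; H and W are assumed divisible by patch_size.
    patch_size and mask_ratio are illustrative values, not taken from the paper.
    """
    B, _, H, W = images.shape
    gh, gw = H // patch_size, W // patch_size
    # One keep/hide decision per patch, then upsample back to pixel resolution.
    keep = (torch.rand(B, 1, gh, gw, device=images.device) > mask_ratio).float()
    keep = F.interpolate(keep, size=(H, W), mode="nearest")
    return images * keep

def masked_consistency_loss(model, images):
    """Context-based objective: predict objects from a masked image (pixel level)
    and keep that prediction consistent with the unmasked prediction (shape level)."""
    masked = random_patch_mask(images)
    pred_masked = model(masked)           # (B, 1, H, W) foreground logits
    with torch.no_grad():
        pred_full = model(images)         # target: prediction on the unmasked image
    # Shape-level consistency between masked and unmasked predictions.
    return F.binary_cross_entropy_with_logits(pred_masked, torch.sigmoid(pred_full))
```

The point of the sketch is the structure of the objective: the network never sees labels, only its own prediction on the full image, which it must reproduce from a partially hidden view, forcing it to rely on surrounding context.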

We explored this technique in supervised settings in our previous works: for recognition of small and occluded objects (WACV 2024), and for segmentation of ambiguous regions with a shape-aware model that is simple and compute-efficient (BMVC 2022, Oral).

Do enjoy our results, and definitely try the model yourself!


Results on unfamiliar images and objects

We show results of PEEKABOO on unfamiliar objects that are out-of-domain for ImageNet and DUTS-TR and come in different scales and shapes, all of which are correctly localized. Specifically, octopuses, dinosaurs and spaceships are not present in ImageNet/DUTS-TR, yet PEEKABOO can still detect them. These results demonstrate the ability of PEEKABOO to discover multiple and diverse objects, essentially anything that is "not background".



Try the Interactive Demo


[🤗 Spaces] Demo


Abstract

Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information such as the appearance, type and number of objects, as well as the lack of labeled object classes typically available in supervised settings. While recent approaches to unsupervised object localization have demonstrated significant progress by leveraging self-supervised visual representations, they often require computationally intensive training processes, resulting in high resource demands in terms of computation, learnable parameters, and data. They also lack explicit modeling of visual context, potentially limiting their accuracy in object localization. To tackle these challenges, we propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization by learning context-based representations at both the pixel- and shape-level of the localized objects through image masking. The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision. The experimental results, both quantitative and qualitative, across various benchmark datasets, demonstrate the simplicity, effectiveness and competitive performance of our approach compared to state-of-the-art methods in both single object discovery and unsupervised salient object detection tasks.


Our Method


[GitHub] Code


Paper and Supplementary Material

Hasib Zunair and A. Ben Hamza
PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization.
In BMVC, 2024.
(hosted on arXiv)



Acknowledgements

This website template was originally made by Phillip Isola and Richard Zhang; the code can be found here.