PEEKABOO2: Adapting Peekaboo with Segment Anything Model for Unsupervised Object Localization in Images and Videos
Hasib Zunair
This is a work in progress. 🛠️
[GitHub]

Clip from the movie 1917, directed by Sam Mendes.

Formula 1 race videos.

Gameplay clips from different games.

Choreographies of Loco by ITZY (2021) and Easy by Le Sserafim (2023).

Clips from the movie Jurassic World and an underwater video by Simon Gingins.

Disclaimer: The videos included in this demonstration are the property of their respective owners. They are used only for academic and research purposes.


Abstract

Recent progress in promptable visual segmentation, such as the Segment Anything Model 2 (SAM2), has shown a strong ability to segment and track user-defined objects in images and videos. However, SAM2 cannot discover salient objects on its own: it has no mechanism to determine which objects in a scene are salient, and because it relies on human-provided prompts, it cannot segment and track salient objects automatically. So how can we automatically segment and track salient objects without knowing anything about them in advance? This work introduces PEEKABOO2, designed specifically for unsupervised salient object detection and tracking. PEEKABOO2 first localizes the salient object in the first frame; this prediction then serves as a prompt that is propagated through the video to produce spatio-temporal masks. As a result, PEEKABOO2 can discover, consistently segment, and track diverse salient objects in images and videos without retraining, fine-tuning, or human intervention.
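To make the first-frame-prompt idea concrete, below is a minimal sketch built on SAM2's public video predictor API. The `get_saliency_mask` function is a hypothetical placeholder standing in for the unsupervised salient object detector, and the config/checkpoint paths and frame directory are assumptions; this illustrates how a first-frame mask can be promoted to a video-level prompt, not the exact PEEKABOO2 implementation.

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Hypothetical stand-in for the unsupervised saliency model that
# predicts a binary mask for the salient object in a single frame.
from my_saliency_model import get_saliency_mask  # placeholder, not a real package

# Config/checkpoint paths below are assumptions; substitute your own.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode():
    # SAM2's video predictor expects a directory of JPEG frames (or an MP4 path).
    state = predictor.init_state(video_path="frames/")

    # Step 1: discover the salient object on the first frame, with no
    # human input, and use its mask as the prompt.
    first_frame_mask = get_saliency_mask("frames/00000.jpg")  # HxW boolean array
    predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=first_frame_mask)

    # Step 2: propagate the prompt through the video to obtain
    # spatio-temporal masks for every frame.
    video_masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        video_masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```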


Acknowledgements

This website template was originally made by Phillip Isola and Richard Zhang; the code can be found here.