Recent progress in promptable visual segmentation, such as the Segment Anything Model 2 (SAM2),
has demonstrated a strong ability to segment and track user-defined objects in images and videos.
However, SAM2 has no mechanism for deciding which objects are salient, and it relies on
human-provided prompts. As a result, it cannot discover, segment, or track salient objects
automatically.
How, then, can salient objects be segmented and tracked automatically, without prior knowledge about them? This work
introduces PEEKABOO2, which is designed for unsupervised salient object detection and tracking.
PEEKABOO2 finds the salient object in the first frame and uses it as a prompt, which is propagated
to obtain spatio-temporal masks across the entire video. As a result, PEEKABOO2 can
discover, consistently segment, and track diverse salient objects in images and videos without retraining,
fine-tuning, or human intervention.
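
The discover-then-propagate flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PEEKABOO2 implementation: the abstract does not say what form the prompt takes, so a bounding box is assumed here, and `mask_to_box_prompt`, `segment_video`, `toy_discover`, and `toy_propagate` are hypothetical names. The two toy functions are simple stand-ins for an unsupervised saliency model and a promptable video segmenter; neither calls SAM2.

```python
import numpy as np


def mask_to_box_prompt(mask):
    """Tight box (x_min, y_min, x_max, y_max) around a binary mask, or None if empty.

    Assumption: the first-frame saliency mask is turned into a box prompt.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def segment_video(frames, discover_salient_mask, propagate_prompt):
    """Discover the salient object on frame 0, then propagate it as a prompt.

    Both callables are hypothetical placeholders supplied by the caller.
    """
    first_mask = discover_salient_mask(frames[0])
    box = mask_to_box_prompt(first_mask)
    if box is None:
        # Nothing salient was found: return empty masks for every frame.
        return [np.zeros(f.shape[:2], dtype=bool) for f in frames]
    return propagate_prompt(frames, box)


def toy_discover(frame):
    """Stand-in saliency: pixels brighter than the frame mean."""
    return frame > frame.mean()


def toy_propagate(frames, box):
    """Stand-in tracker: repeats the same box region on every frame."""
    x0, y0, x1, y1 = box
    masks = []
    for f in frames:
        m = np.zeros(f.shape[:2], dtype=bool)
        m[y0:y1 + 1, x0:x1 + 1] = True
        masks.append(m)
    return masks
```

In this sketch no human input enters the loop: the only prompt is derived from the first frame itself, which is the property the abstract emphasizes. A real system would replace the two toy callables with actual models.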