Recognizing multiple objects in an image is challenging due to occlusions, and becomes even more
so when the objects are small. While promising, existing multi-label image recognition models do not
explicitly learn context-based representations, and hence
struggle to correctly recognize small and occluded objects. Intuitively, recognizing occluded
objects requires knowledge of partial input, and hence context. Motivated by this intuition, we
propose Masked Supervised Learning (MSL), a single-stage, model-agnostic
learning paradigm for multi-label image recognition. The key idea is to learn context-based representations
using a masked branch and to model label co-occurrence using label consistency.
Experimental results demonstrate the simplicity, applicability, and,
more importantly, the competitive performance of MSL against previous state-of-the-art methods
on standard multi-label image recognition benchmarks. In addition, we show that MSL is robust to random
masking and demonstrate its effectiveness in recognizing non-masked objects.
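
To make the key idea concrete, the following is a minimal sketch of a two-branch objective in the spirit of MSL, not the exact formulation used in this work. It assumes a binary cross-entropy classification loss on both the original and the masked input, and a squared-error consistency term between the two sets of predicted label probabilities. The names `random_mask`, `msl_style_loss`, and `toy_model`, as well as the patch size, masking ratio, and loss weight, are hypothetical choices made for illustration only.

```python
import math
import random


def random_mask(image, patch=4, ratio=0.5, rng=None):
    """Return a copy of `image` (list of rows) with random patches zeroed.

    Assumption: masking is done on non-overlapping patch x patch blocks,
    each dropped independently with probability `ratio`.
    """
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    masked = [row[:] for row in image]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if rng.random() < ratio:
                for y in range(top, min(top + patch, h)):
                    for x in range(left, min(left + patch, w)):
                        masked[y][x] = 0.0
    return masked


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(logits, labels):
    """Mean binary cross-entropy over labels (one sigmoid per label)."""
    eps = 1e-7
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(labels)


def consistency(logits_a, logits_b):
    """Mean squared difference between the two branches' probabilities."""
    diffs = [(sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_a, logits_b)]
    return sum(diffs) / len(diffs)


def msl_style_loss(model, image, labels, ratio=0.5, weight=1.0, rng=None):
    """Classification loss on both branches plus a label-consistency term.

    Assumption: both branches share the same `model`, and the three terms
    are simply summed with a single hypothetical `weight`.
    """
    full_logits = model(image)
    masked_logits = model(random_mask(image, ratio=ratio, rng=rng))
    return (
        bce(full_logits, labels)
        + bce(masked_logits, labels)
        + weight * consistency(full_logits, masked_logits)
    )


def toy_model(image):
    """Hypothetical stand-in for a network: two logits from mean intensity."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return [mean - 0.5, 0.5 - mean]


if __name__ == "__main__":
    image = [[1.0] * 8 for _ in range(8)]
    labels = [1.0, 0.0]
    print(msl_style_loss(toy_model, image, labels, rng=random.Random(0)))
```

In this sketch the masked branch sees only part of the input, so matching its predictions to the labels and to the unmasked branch encourages the model to rely on surrounding context; any backbone that maps an image to per-label logits could be substituted for `toy_model`.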