Learning to Recognize Occluded and Small Objects with Partial Inputs

Hasib Zunair¹

A. Ben Hamza¹

¹Concordia University, Montreal, QC, Canada.

IEEE/CVF Winter Conference on Applications of Computer Vision WACV 2024

[Demo] [GitHub] [Paper] [Video] [Poster]

Visual comparison of MSL (Ours) and CSRA on the VOC2007 and MS-COCO datasets. The first two rows show samples from VOC2007: the first row shows cases where MSL better recognizes small objects, and the second row shows cases of heavy occlusion. The last two rows show samples from MS-COCO. On both datasets, MSL recognizes small and occluded objects more effectively than the baseline.

How to interpret the results
Aloha! While computer vision algorithms work well on some images, they often fail on others, largely because objects are small or occluded. This problem is far from solved, but we believe our work is a significant step toward recognizing small and occluded objects. There has also been some concurrent work on this subject; see, in particular, MCAR.

Our approach explicitly focuses on context from the regions neighboring an object. This also enables the model to learn how classes are associated with one another (label co-occurrence). Ideally, this helps in in-the-wild situations where only part of an object is visible, but where we humans might readily use context to infer that the class is present.
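To make "partial inputs" concrete, here is a minimal Python sketch of one way to hide parts of an image: a hypothetical `mask_random_patches` helper zeroes out a random fraction of non-overlapping square patches, so a model trained on the result has to lean on the surrounding context. The patch size, masking ratio, zero-filling and plain nested-list image are illustrative assumptions, not necessarily the masking scheme used in the paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def mask_random_patches(
    image: Image, patch: int, ratio: float, seed: int = 0
) -> Tuple[Image, List[Tuple[int, int]]]:
    """Hypothetical helper: zero out a random fraction of patch x patch squares.

    Assumptions (illustrative only): square non-overlapping patches,
    masked pixels set to 0.0, and the number of masked patches equal to
    round(ratio * number_of_patches).
    """
    h, w = len(image), len(image[0])
    # Top-left corners of every full patch that fits inside the image.
    cells = [
        (r, c)
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]
    rng = random.Random(seed)
    chosen = rng.sample(cells, round(ratio * len(cells)))

    masked = [row[:] for row in image]  # copy so the input image is left intact
    for r0, c0 in chosen:
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                masked[r][c] = 0.0
    return masked, sorted(chosen)


# Usage: mask half of the 2x2 patches in a 4x4 image of ones.
img = [[1.0] * 4 for _ in range(4)]
masked_img, masked_cells = mask_random_patches(img, patch=2, ratio=0.5, seed=1)
for row in masked_img:
    print(row)
```

Returning the masked patch coordinates alongside the image makes it easy to inspect which regions were hidden; the seeded `random.Random` instance keeps the masking reproducible.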

We also explored this technique in our previous work, Masked Supervised Learning for Semantic Segmentation (BMVC 2022, Oral), where we found that MSL-trained models are significantly more compute-efficient, better segment ambiguous and small regions, and are shape-aware, as they can segment heavily masked regions quite well.

Do enjoy our results, and definitely try the model yourself!

Abstract

Recognizing multiple objects in an image is challenging due to occlusions, and becomes even more so when the objects are small. While promising, existing multi-label image recognition models do not explicitly learn context-based representations, and hence struggle to correctly recognize small and occluded objects. Intuitively, recognizing occluded objects requires knowledge of partial input, and hence context. Motivated by this intuition, we propose Masked Supervised Learning (MSL), a single-stage, model-agnostic learning paradigm for multi-label image recognition. The key idea is to learn context-based representations using a masked branch and to model label co-occurrence using label consistency. Experimental results demonstrate the simplicity, applicability and more importantly the competitive performance of MSL against previous state-of-the-art methods on standard multi-label image recognition benchmarks. In addition, we show that MSL is robust to random masking and demonstrate its effectiveness in recognizing non-masked objects.
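The two ingredients named in the abstract, a masked branch and label consistency, can be sketched as a single training objective. The Python below is a minimal illustration under stated assumptions: each branch (full image and masked image) is scored with multi-label binary cross-entropy on raw logits, and the consistency term is taken to be the mean squared error between the two branches' per-class probabilities, weighted by a hypothetical `lam`. These choices are ours for illustration; the exact loss and weighting in the paper may differ.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean multi-label binary cross-entropy, computed stably from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Equals -(y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))) without overflow.
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def consistency_mse(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Assumed consistency term: mean squared gap between per-class probabilities."""
    return sum((sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_a, logits_b)) / len(logits_a)


def two_branch_loss(
    logits_full: Sequence[float],
    logits_masked: Sequence[float],
    targets: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Hypothetical objective: supervise both branches with the same labels
    and penalize disagreement between their predictions."""
    return (
        bce_with_logits(logits_full, targets)
        + bce_with_logits(logits_masked, targets)
        + lam * consistency_mse(logits_full, logits_masked)
    )


# Usage with made-up logits for three classes (1.0 = present, 0.0 = absent).
targets = [1.0, 0.0, 1.0]
full = [2.0, -1.5, 0.8]
masked = [1.2, -0.5, -0.3]  # the masked view is less confident
print(two_branch_loss(full, masked, targets))
```

Sharing the labels across both branches means the masked branch is supervised even for objects it can only partly see, while the consistency term pulls its predictions toward those made on the full image.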

Our Method

Paper and Supplementary Material

Hasib Zunair and A. Ben Hamza
Learning to Recognize Occluded and Small Objects with Partial Inputs.
In WACV, 2024.
(hosted on arXiv)

Acknowledgements

This website template was originally made by Phillip Isola and Richard Zhang, code can be found here.