Deng-Ping Fan; Jing Zhang; Gang Xu; Ming-Ming Cheng; Ling Shao
We identify a serious design bias in existing salient object detection (SOD) datasets, which unrealistically assume that each image contains at least one clear, uncluttered salient object. This design bias has led to performance saturation for state-of-the-art SOD models when evaluated on existing datasets, even though these models remain far from satisfactory when applied to real-world scenes. To address this, we propose a new dataset and update the previous saliency benchmark. Specifically, our dataset includes images with both salient and non-salient objects from several common object categories. Each salient image is annotated with attributes that reflect challenges frequently encountered in real-world scenes, which can help provide deeper insight into the SOD problem. Further, given a saliency encoder, existing saliency models are trained to learn a mapping from the training image set to the ground-truth set. We argue that improving the dataset can yield larger performance gains than focusing only on decoder design. We therefore investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy for learning from small datasets. Extensive results demonstrate the effectiveness of these tricks.
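To make the label-smoothing strategy concrete, the sketch below shows one standard way to smooth a binary saliency ground-truth mask: each pixel's label is shifted toward the opposite class by a factor ε, so the loss penalizes over-confident predictions and implicitly pays more attention to uncertain (boundary) regions. The function name, the ε value, and the two-class smoothing formula are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def smooth_labels(mask, eps=0.1):
    """Two-class label smoothing for a binary saliency mask.

    Foreground pixels (1) become 1 - eps/2 and background pixels (0)
    become eps/2, softening the hard targets near object boundaries.
    Illustrative sketch; eps=0.1 is an arbitrary example value.
    """
    mask = mask.astype(np.float32)
    return (1.0 - eps) * mask + eps / 2.0

# Example: with eps=0.1, a 1 maps to 0.95 and a 0 maps to 0.05
gt = np.array([[0, 1],
               [1, 0]])
print(smooth_labels(gt))
```

The smoothed mask can then be used as the regression target for a standard pixel-wise loss (e.g., binary cross-entropy) in place of the hard 0/1 ground truth.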