Abstract
Introduction Existing saliency detection algorithms in the literature have ignored the importance of time: they create a single static saliency map for the entire recording period. However, bottom-up and top-down attention continuously compete, and the salient regions change over time. In this paper, we propose an unsupervised algorithm to predict the dynamic evolution of bottom-up saliency in images.
Method We compute the variation of low-level features within non-overlapping patches of the input image; a patch with higher variation is considered more salient. A threshold is applied to discard the less salient patches, yielding an intermediate map. The saliency map is then obtained as a weighted sum of this map and its center of mass, with the threshold and weights set dynamically. For evaluation, we use the MIT1003 and DOVES datasets and divide each recording into multiple 100 ms or 500 ms intervals, creating a separate ground truth for each interval. The predicted dynamic saliency map is then compared to the ground truth using the Normalized Scanpath Saliency, Kullback-Leibler divergence, Similarity, and Linear Correlation Coefficient metrics.
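A minimal sketch of this pipeline follows, assuming a grayscale input, intensity variance as the measure of low-level feature variation, and fixed illustrative values for the patch size, threshold quantile, and combination weights (the actual method sets the threshold and weights dynamically; those parameter names are ours, not the paper's):

```python
import numpy as np

def dynamic_saliency_sketch(image, patch=16, threshold_q=0.7,
                            w_map=0.8, w_com=0.2):
    """Illustrative patch-variation saliency map.

    `patch`, `threshold_q`, `w_map`, and `w_com` are hypothetical
    defaults chosen for this sketch; the described method adapts the
    threshold and weights dynamically over time.
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    var_map = np.zeros((gh, gw))

    # Variation of low-level features within each non-overlapping patch;
    # plain intensity variance stands in for the feature variation here.
    for i in range(gh):
        for j in range(gw):
            block = image[i * patch:(i + 1) * patch,
                          j * patch:(j + 1) * patch]
            var_map[i, j] = block.var()

    # Threshold to ignore the less salient patches.
    thr = np.quantile(var_map, threshold_q)
    sal = np.where(var_map >= thr, var_map, 0.0)

    # Center of mass of the thresholded map, rendered as a Gaussian blob
    # so it can be combined with the map itself.
    total = sal.sum()
    if total > 0:
        ys, xs = np.mgrid[0:gh, 0:gw]
        cy = (ys * sal).sum() / total
        cx = (xs * sal).sum() / total
        sigma = max(gh, gw) / 4.0  # assumed spread for the blob
        com = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    else:
        com = np.zeros_like(sal)

    # Weighted sum of the thresholded map and its center of mass.
    if sal.max() > 0:
        sal = sal / sal.max()
    return w_map * sal + w_com * com
```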
Results The proposed method outperformed the competing methods on the DOVES dataset. It also performed acceptably on MIT1003, especially within 0–400 ms after stimulus onset.
Conclusion This dynamic algorithm can predict an image’s salient regions better than static methods, as saliency detection is inherently a dynamic process. The method is biologically plausible and in line with recent findings on the creation of a bottom-up saliency map in the primary visual cortex or superior colliculus.
Competing Interest Statement
The authors have declared no competing interest.