Proceedings:
No. 1: Thirty-First AAAI Conference On Artificial Intelligence
Volume Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 31
Track:
AAAI Technical Track: Vision
Abstract:
In this paper, we propose a biologically plausible model that uses unsupervised learning to explain the emergence of motion tracking behaviour in early development. The model's training is biased by retinal constancy, a measure of how similar the visual content is between successive frames. This bias resembles a reward in reinforcement learning but is less explicit: it modulates the model's learning rate rather than acting as a learning signal itself. The model is a two-layer deep network. The first layer learns to encode visual motion, and the second layer learns to relate that motion to gaze movements, which it both perceives and generates through bi-directional nodes. As randomly generated gaze movements traverse the local visual space, the model develops correlations between visual motion and the gaze movement that nullifies that motion, so that retinal constancy is maximised. Biologically, this is similar to using saccades to look around, learning from moments when a target and a saccade move together so that the image stays the same on the retina, and then developing smooth pursuit behaviour to reproduce this action in the future. Restricted Boltzmann machines are used to implement the model because they can form a deep belief network, perform online learning, and act generatively. These properties all have biological equivalents and are consistent with the biological plausibility of using saccades as leverage to learn smooth pursuit. The method is unique in that it uses general machine learning algorithms, and their inherent generative properties, to learn from real-world data. It also implements a biological theory, relies on motion rather than on recognition via local searches, uses no temporal filtering, and learns in a fully unsupervised manner. The model's tracking performance after training on real-world images with simulated motion is compared with its tracking performance after training on natural video. Results show that the model successfully follows targets in natural video despite partial occlusions, scale changes, and nonlinear motion.
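
The idea that retinal constancy scales the learning rate, rather than serving as the learning signal, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: retinal constancy is taken to be one minus the mean absolute pixel difference between two frames scaled to [0, 1], the restricted Boltzmann machine is trained with one step of contrastive divergence (CD-1), and the names `retinal_constancy`, `TinyRBM`, and `cd1_update` are hypothetical. The sketch covers a single layer only and does not model the gaze layer or its bi-directional nodes.

```python
import numpy as np


def retinal_constancy(prev_frame, next_frame):
    """Assumed similarity measure in [0, 1] for frames with pixels in [0, 1].

    1.0 means the two frames are identical; 0.0 means maximally different.
    The paper's actual measure is not given in the abstract.
    """
    prev = np.asarray(prev_frame, dtype=float)
    nxt = np.asarray(next_frame, dtype=float)
    return 1.0 - float(np.mean(np.abs(prev - nxt)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Minimal binary RBM (hypothetical helper, not the authors' code)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def cd1_update(self, v0, base_lr, constancy):
        """One online CD-1 step whose step size is scaled by constancy.

        The constancy value never enters the gradient; it only decides
        how strongly this sample is allowed to change the weights.
        """
        lr = base_lr * constancy

        # Positive phase: hidden activations driven by the data.
        h0_prob = sigmoid(v0 @ self.W + self.b_h)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)

        # Negative phase: one reconstruction step.
        v1_prob = sigmoid(h0 @ self.W.T + self.b_v)
        h1_prob = sigmoid(v1_prob @ self.W + self.b_h)

        # Standard CD-1 gradient estimate, applied with the modulated rate.
        self.W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
        self.b_v += lr * (v0 - v1_prob)
        self.b_h += lr * (h0_prob - h1_prob)
        return lr
```

Under these assumptions, a training sample observed while the retinal image changes completely (constancy 0) leaves the weights untouched, while a sample observed while the image stays fixed (constancy 1) is learned at the full base rate. This is one way to read the abstract's statement that the bias is less explicit than a reward.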
DOI:
10.1609/aaai.v31i1.11235