Driven by deep neural networks (DNN), the recent development of computer vision makes vision sensors such as stereo cameras and Lidars ubiquitous in autonomous cars, robotics and traffic monitoring. However, a traditional DNN-based data fusion pipeline like object tracking has to hard-wire an engineered set of DNN models to a fixed processing logic, which makes it difficult to infuse new models to that pipeline. To overcome this, we propose a novel neural-symbolic stream reasoning approach realised by semantic stream reasoning programs which specify DNN-based data fusion pipelines via logic rules with learnable probabilistic degrees as weights. The reasoning task over this program is governed by a novel incremental reasoning algorithm, which lends itself also as a core building block for a scalable and parallel algorithm to learn the weights for such program. Extensive experiments with our first prototype on multi-object tracking benchmarks for autonomous driving and traffic monitoring show that our flexible approach can considerably improve both accuracy and processing throughput compared to the DNN-based counterparts.