Maritime surveillance is essential for preventing illegal activities and for protecting the environment. However, the available data are unlabeled, noisy, irregular time series, and the area to be covered is large, which makes illegal activities challenging to detect. Existing solutions focus only on trajectory reconstruction and probabilistic models that ignore context, such as neighboring vessels. We propose a novel representation learning method that considers both temporal and spatial contexts and is trained in a self-supervised manner, using a selection of pretext tasks that require no manual labeling. The underlying model encodes maritime vessel data compactly and effectively. The output of this generic encoder can then be used as input for more complex tasks that lack labeled data.