Object recognition systems can be unreliable when run in isolation depending on only image based features, but their performance can be improved when taking scene context into account. In this paper, we present techniques to model and infer object labels in real scenes based on a variety of spatial relations — geometric features which capture how objects co-occur — and compare their efficacy in the context of augmenting perception based object classification in real-world table-top scenes. We utilise a long-term dataset of office table-tops for qualitatively comparing the performances of these techniques. On this dataset, we show that more intricate techniques, have a superior performance but do not generalise well on small training data. We also show that techniques using coarser information perform crudely but sufficiently well in standalone scenarios and generalise well on small training data. We conclude the paper, expanding on the insights we have gained through these comparisons and comment on a few fundamental topics with respect to long-term autonomous robots.