We consider the problem of disentangling the representations of the relevant attributes of the data (content) from all other factors of variation (style) using Variational Autoencoders. Some recent works have addressed this problem by utilizing grouped observations, where the content attributes are assumed to be common within each group, while no supervision is available for the style factors. In many cases, however, these methods fail to prevent the models from using the style variables to encode content-related features as well. This work supplements these algorithms with a method that eliminates the content information in the style representations. To that end, the training objective is augmented to minimize an appropriately defined mutual information term in an adversarial way. Experimental results and comparisons on image datasets show that the resulting method can efficiently separate the content- and style-related attributes and generalizes to unseen data.