Scene understanding addresses the issue of "what a scene contains?" Existing research on scene understanding is typically focused on classifying a scene into classes that are of the same category type. These approaches, although they solve some scene-understanding tasks successfully, in general fail to address the semantics in scene understanding. For example, how does an agent learn the concept label "red" and "ball" without being told that it is a color or a shape label in advance? To cope with this problem, we have proposed a novel research called semantic scene concept learning. Our proposed approach models the task of scene understanding as a "multilabeling" classification problem. Each scene instance perceived by the agent may receive multiple labels coming from different concept categories, where the goal of learning is to let the agent discover the semantic meanings, i.e., the set of relevant visual features, of the scene labels received. Our preliminary experiments have shown the effectiveness of our proposed approach in solving this special intra- and inter- category mixing learning task.