Unsupervised domain Adaptation (UDA) aims to learn and transfer generalized features from a labelled source domain to a target domain without any annotations. Existing methods only aligning high-level representation but without exploiting the complex multi-class structure and local spatial structure. This is problematic as 1) the model is prone to negative transfer when the features from different classes are misaligned; 2) missing the local spatial structure poses a major obstacle in performing the fine-grained feature alignment. In this paper, we integrate the valuable information conveyed in classifier prediction and local feature maps into global feature representation and then perform a single mini-max game to make it domain invariant. In this way, the domain-invariant feature not only describes the holistic representation of the original image but also preserves mode-structure and fine-grained spatial structural information. The feature integration is achieved by estimating and maximizing the mutual information (MI) among the global feature, local feature and classifier prediction simultaneously. As the MI is hard to measure directly in high-dimension spaces, we adopt a new objective function that implicitly maximizes the MI via an effective sampling strategy and a discriminator design. Our STructure-Aware Feature Fusion (STAFF) network achieves the state-of-the-art performances in various UDA datasets.