Multi-view unsupervised feature selection (MV-UFS) aims to select a feature subset from multi-view data without using the labels of samples. However, we observe that existing MV-UFS algorithms do not well consider the local structure of cross views and the diversity of different views, which could adversely affect the performance of subsequent learning tasks. In this paper, we propose a cross-view local structure preserved diversity and consensus semantic learning model for MV-UFS, termed CRV-DCL briefly, to address these issues. Specifically, we project each view of data into a common semantic label space which is composed of a consensus part and a diversity part, with the aim to capture both the common information and distinguishing knowledge across different views. Further, an inter-view similarity graph between each pairwise view and an intra-view similarity graph of each view are respectively constructed to preserve the local structure of data in different views and different samples in the same view. An l2,1-norm constraint is imposed on the feature projection matrix to select discriminative features. We carefully design an efficient algorithm with convergence guarantee to solve the resultant optimization problem. Extensive experimental study is conducted on six publicly real multi-view datasets and the experimental results well demonstrate the effectiveness of CRV-DCL.