Large-scale multi-label learning (LMLL) aims to annotate relevant labels from a large number of candidates for unseen data. Due to the high dimensionality in both feature and label spaces in LMLL, the storage overheads of LMLL models are often costly. This paper proposes a POP (joint label and feature Parameter OPtimization) method. It tries to filter out redundant model parameters to facilitate compact models. Our key insights are as follows. First, we investigate labels that have little impact on the commonly used LMLL performance metrics and only preserve a small number of dominant parameters for these labels. Second, for the remaining influential labels, we reduce spurious feature parameters that have little contribution to the generalization capability of models, and preserve parameters for only discriminative features. The overall problem is formulated as a constrained optimization problem pursuing minimal model size. In order to solve the resultant difficult optimization, we show that a relaxation of the optimization can be efficiently solved using binary search and greedy strategies. Experiments verify that the proposed method clearly reduces the model size compared to state-of-the-art LMLL approaches, in addition, achieves highly competitive performance.