Most existing facial landmark detection algorithms treat manually annotated landmarks as precise hard labels, so accurately annotated landmarks are essential to their training. In many cases, however, manual annotations contain deviations, and the landmarks marked on occluded facial parts or on faces in large poses are not always accurate, which means that the “ground truth” landmarks are usually not annotated precisely. In such cases, it is more reasonable to use soft labels than explicit hard labels. This paper therefore proposes to associate a bivariate label distribution (BLD) with each landmark of an image. A BLD covers the neighboring pixels around the original manually annotated point, alleviating the problem of inaccurate landmarks. After generating a BLD for each landmark, the proposed method first learns the mapping from an image patch to the BLD of each landmark; the predicted BLDs are then used in a deformable model fitting process to obtain the final facial shape for the image. Experimental results show that the proposed method performs better than the compared state-of-the-art facial landmark detection algorithms. Furthermore, the proposed method appears to be much more robust to landmark noise in the training set than the compared baselines.
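To make the idea of a soft label concrete, the following minimal sketch builds one possible BLD for a single landmark. The abstract does not specify the form of the distribution, so the isotropic discretized Gaussian used here is an assumption, as are the function name `make_bld`, the `sigma` parameter, and the (row, column) patch layout; none of these should be read as the method's actual definition.

```python
import math


def make_bld(center, size, sigma=1.5):
    """Build a soft label over a pixel grid around an annotated landmark.

    Assumption: the distribution is an isotropic Gaussian sampled on the
    pixel grid and normalized to sum to 1. The paper may use another form.

    center: (row, col) of the manually annotated point in patch coordinates.
    size:   (height, width) of the patch.
    sigma:  hypothetical spread in pixels; larger values tolerate more
            annotation error.
    """
    cy, cx = center
    height, width = size
    # Unnormalized Gaussian weight for every pixel in the patch.
    weights = [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
    # Normalize so the grid is a valid probability distribution.
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# A 7x9 patch with the annotated landmark at row 3, column 4.
bld = make_bld(center=(3, 4), size=(7, 9))
```

Under this assumption the annotated pixel keeps the highest probability while its neighbors receive nonzero mass, so a prediction a pixel or two away from a noisy annotation is penalized less than it would be under a hard label.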