Face parsing assigns a semantic label to every pixel in a facial image, with applications including face recognition, facial beautification, affective computing, and animation. Although much progress has been made in this field, current state-of-the-art methods still fail to extract truly effective features and to restore accurate score maps, especially for facial parts that deform widely and look fairly similar, e.g., the mouth, eyes, and thin eyebrows. In this paper, we propose a novel pixel-wise face parsing method called Residual Encoder Decoder Network (RED-Net), which combines a feature-rich encoder-decoder framework with an adaptive prior mechanism. Our encoder-decoder framework extracts features with ResNet and decodes them by fusing residual architectures into deconvolution. This framework learns more effective features than those learned by decoding with interpolation or classic deconvolution operations. To overcome the appearance ambiguity between facial parts, we propose an adaptive prior mechanism based on the decoder's prediction confidence, which allows the final result to be refined. Experimental results on two public datasets demonstrate that our method significantly outperforms the state of the art, improving the F-measure from 0.854 to 0.905 on the Helen dataset and pixel accuracy from 95.12% to 97.59% on the LFW dataset. In particular, qualitative examples show that our method parses eye, eyebrow, and lip regions more accurately.
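The two reported metrics, pixel accuracy and F-measure, can be illustrated with a minimal sketch on toy label maps. This uses the standard textbook definitions only; the abstract does not state the exact evaluation protocol (for example, which classes are averaged or whether background is counted), so the function names `pixel_accuracy` and `f_measure` and the toy data below are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence

# A label map is a 2-D grid of integer class labels, one per pixel.
LabelMap = Sequence[Sequence[int]]


def pixel_accuracy(pred: LabelMap, truth: LabelMap) -> float:
    """Fraction of pixels whose predicted label equals the ground truth."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


def f_measure(pred: LabelMap, truth: LabelMap, label: int) -> float:
    """Per-class F-measure: harmonic mean of precision and recall.

    Computed as 2*TP / (2*TP + FP + FN), which is algebraically the same
    as 2*P*R / (P + R) whenever precision and recall are both defined.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 0 = background, 1 = eyebrow, 2 = eye.
truth = [[0, 1, 1],
         [0, 2, 2]]
pred = [[0, 1, 2],
        [0, 2, 2]]

print(pixel_accuracy(pred, truth))   # 5 of 6 pixels match
print(f_measure(pred, truth, 1))     # one eyebrow pixel missed
print(f_measure(pred, truth, 2))     # one extra eye pixel predicted
```

In this toy case a single mislabeled pixel lowers the per-class F-measure of both small classes while pixel accuracy stays high, which is why small regions such as eyes and eyebrows are usually judged by a per-class score rather than by pixel accuracy alone.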
Published Date: 2018-02-08
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2018, Association for the Advancement of Artificial Intelligence All Rights Reserved.