Most Named Entity Recognition (NER) systems use additional features like part-of-speech (POS) tags, shallow parsing, gazetteers, etc. Adding these external features to NER systems have been shown to have a positive impact. However, creating gazetteers or taggers can take a lot of time and may require extensive data cleaning. In this work instead of using these traditional features we use lexicographic features of Chinese characters. Chinese characters are composed of graphical components called radicals and these components often have some semantic indicators. We propose CNN based models that incorporate this semantic information and use them for NER. Our models show an improvement over the baseline BERT-BiLSTM-CRF model. We present one of the first studies on Chinese OntoNotes v5.0 and show an improvement of + .64 F1 score over the baseline. We present a state-of-the-art (SOTA) F1 score of 71.81 on the Weibo dataset, show a competitive improvement of + 0.72 over baseline on the ResumeNER dataset, and a SOTA F1 score of 96.49 on the MSRA dataset.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved