AAAI Publications, The Twenty-Eighth International FLAIRS Conference

Leveraging Paraphrase Labels to Extract Synonyms from Twitter
Maria Antoniak, Eric Bell, Fei Xia

Last modified: 2015-04-07

Abstract


We present an approach for automatically learning synonyms from a corpus of paraphrased tweets. Shallow parse chunks are used to create candidate synonyms and their context windows, and the learned synonyms are substituted back into a paraphrase detection system that uses machine translation metrics as features for a classifier. We find a 2.29% improvement in F1 when we train and test on the paraphrase training set, demonstrating the importance of discovering high-quality synonyms. We also find 9.8% better coverage of the paraphrase corpus using our synonyms rather than larger, existing synonym resources, demonstrating the power of extracting synonyms that are representative of the topics in the test set.
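
The candidate-generation idea can be illustrated with a minimal sketch. The abstract does not specify the matching criteria, so everything here is an assumption for illustration: the function name `candidate_synonyms`, the `window` parameter, and the rule that two chunks are paired only when their surrounding chunks match exactly. Each tweet is assumed to have already been split into shallow parse chunks by an external chunker.

```python
def candidate_synonyms(chunks_a, chunks_b, window=1):
    """Pair differing chunks from two paraphrases that share a context window.

    chunks_a, chunks_b: lists of chunk strings for two tweets labeled as
    paraphrases (chunking is assumed to be done upstream).
    window: number of neighboring chunks on each side used as context
    (a hypothetical parameter, not taken from the paper).
    """
    pairs = []
    for i, a in enumerate(chunks_a):
        left_a = tuple(chunks_a[max(0, i - window):i])
        right_a = tuple(chunks_a[i + 1:i + 1 + window])
        for j, b in enumerate(chunks_b):
            if a == b:
                continue  # identical chunks are not synonym candidates
            left_b = tuple(chunks_b[max(0, j - window):j])
            right_b = tuple(chunks_b[j + 1:j + 1 + window])
            # Require identical context on both sides, and at least one
            # neighboring chunk so single-chunk tweets yield no pairs.
            if left_a == left_b and right_a == right_b and (left_a or right_a):
                pairs.append((a, b))
    return pairs


tweet_a = ["the game", "was", "awesome", "tonight"]
tweet_b = ["the game", "was", "amazing", "tonight"]
print(candidate_synonyms(tweet_a, tweet_b))
```

Exact context matching is deliberately strict and only keeps the example short; a real system would likely score partial context overlap and aggregate evidence across many paraphrase pairs.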

Keywords


Twitter, synonyms, paraphrase
