An Information-Based Approach to Punctuation

Bilge Say

Punctuation marks play a special role in bringing out the meaning of a text. Recent computational work on punctuation in Natural Language Processing (NLP) has largely followed Nunberg's pioneering work (Nunberg 1990). That work bridged the gap between descriptive linguistic treatments of actual punctuation usage and prescriptive accounts by setting out the features of a "text grammar" for the orthographic sentence. NLP researchers subsequently showed that syntactic parsing grammars incorporating punctuation reduce both parse failures and parse ambiguity (Briscoe 1996). Nunberg's account of punctuation (and of other formatting devices) has also been partially incorporated into Natural Language Generation systems. However, little work has examined how punctuation marks contribute semantic and discourse-based cues to a text, or whether those cues can be exploited computationally. The aim of this thesis is to analyze the semantic and discourse aspects of punctuation in an information-based framework and to draw out their computational implications for NLP systems. This analysis should not only enable NLP software developers to use punctuation marks effectively but may also reveal interesting linguistic phenomena associated with punctuation.


This page is copyrighted by AAAI. All rights reserved.