RSS (Rich Site Summary) is becoming an invaluable format for news feeds. More and more news publishing organizations are realizing its benefits, and content publishers keep joining the already crowded RSS club. In an era of information explosion and peer-to-peer sharing, RSS is a strong format for content publishing, archiving, sharing and more. However, it arrived late. Ideally, it would have been adopted when the Internet first became popular and news organizations were making their online debut. Over the last decade, an enormous number of news articles were published and, for lack of a flexible and widely accepted archival format, improperly archived. Still, better late than never. As we now explore the possibilities of RSS, it is time to make the transition smooth for old, unformatted news articles and to make the format uniform across all news articles, new and old. One way to do this is to extract metadata from old news articles and use it to create their RSS versions. In this paper we report our progress in extracting news metadata with support vector classifiers, and show that applying the classifiers in a particular order is more useful than applying them in random order. We also present preliminary results on using TIMEX tags to extract news events, which could help go beyond RSS by building individual event timelines instead of placing a whole story under a single timeline.
Subjects: 11. Knowledge Representation
Submitted: Feb 13, 2006