Abstract:
Due to the huge amount of data produced on large social media, capturing useful content usually implies to focus on subsets of data that fit with a pre-specified need. Considering the usual API restrictions of these media, we formulate this task of focused capture as a dynamic data sources selection problem. We then propose a machine learning methodology, named WhichStreams, which is based on an extension of a recently proposed combinatorial bandit algorithm. The evaluation of our approach on various Twitter datasets, with both offline and online settings, demonstrates the relevance of the proposal for leveraging the real-time data streaming APIs offered by most of the main social media.
DOI:
10.1609/icwsm.v9i1.14587