Riley Crane, Didier Sornette
The emergence of the internet as a vehicle for news, commerce, and social activity has created a wealth of information and content. While Google and others have successfully exploited the web's static structure to identify relevance, the proliferation of user-generated content on sites like YouTube and Flickr has created a landscape in which quality is not easily identifiable. Here we show how to identify relevant content using information revealed by collective human behavior. We study the dynamics of the daily viewing activity for nearly 5 million videos on YouTube and find a ubiquitous power-law relaxation governing the timing of views. After simple filters are applied, the relaxation exponents cluster into three distinct classes, which correspond naturally to the labels of viral, quality, and junk. These results are consistent with an epidemic model on a social network containing two ingredients: a power-law distribution of waiting times between cause and action, and an epidemic cascade of actions becoming the causes of future actions.
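As a rough illustration of what measuring a power-law relaxation can look like, the sketch below estimates an exponent p from a series of daily view counts by a least-squares fit in log-log space, assuming views decay as A * t^(-p) after a peak. The function name `fit_relaxation_exponent`, the synthetic data, and the value 1.4 are hypothetical choices made for this example. They are not taken from the paper, and this is not necessarily the fitting procedure or the filtering the authors used.

```python
import math


def fit_relaxation_exponent(views):
    """Estimate p in views(t) ~ A * t**(-p), with t = 1, 2, ... days after the peak.

    Hypothetical helper for illustration: an ordinary least-squares line is
    fitted to (log t, log views), and the negated slope is returned.
    """
    # Days with zero views are skipped because log(0) is undefined.
    points = [
        (math.log(t), math.log(v))
        for t, v in enumerate(views, start=1)
        if v > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two positive daily counts")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    return -slope  # a decaying series has a negative slope, so p is positive


# Synthetic, noise-free series with an arbitrarily chosen exponent of 1.4.
synthetic_views = [1000.0 * t ** -1.4 for t in range(1, 31)]
estimated_p = fit_relaxation_exponent(synthetic_views)
print(f"estimated exponent: {estimated_p:.3f}")
```

On real, noisy counts a plain log-log regression can be biased, so a more careful estimator would likely be needed before assigning a video to a class.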
Subjects: 15.7 Search; 12.2 Scientific Discovery
Submitted: Jan 24, 2008