The semantic interpretation of text remains a hard AI problem, and it becomes particularly relevant when large archives must be built and searched. To aid in this process, advanced Web-based publication services (that is, document servers with surrounding infrastructure) adapt methods from traditional archiving, employing bibliographic metadata to improve data quality. However, most services cannot afford to pay for ex-post annotation, so the document authors themselves must provide metadata content and markup, which is additional work that may deter them. Services therefore have to acknowledge that while many authors are potential contributors, both publishing with the service and delivering high-quality metadata are voluntary steps; authors who take them are, in effect, volunteer contributors. To win contributors, a service must therefore understand its potential contributors' concerns and evaluate its capability to address them. We present a case study of a large university document and publication server. Surveys and Web usage mining identified which kinds of knowledge can or cannot easily be gathered from volunteer contributors. We also describe a tool that aims to improve incentives through better human-computer interaction, employing text mining methods and presenting an easy-to-use interface to ensure correct markup. We expect that the recommendations, technology, and interface concepts can be generalised to the needs of a range of other volunteer-based services with similar incentive structures.