An important problem for the Internet is how to provide a guaranteed quality of service to users, in contrast to the current "best-effort" service. A key aspect of this problem is how routers should share network capacity between different classes of traffic. This decision needs to be made for each incoming packet, and is known as the packet scheduling problem. A major challenge in packet scheduling is that the behaviour of each traffic class may not be known in advance, and can vary dynamically. In this paper, we describe how we have modelled the packet scheduling problem as an application for reinforcement learning (RL). We demonstrate how our RL approach can learn scheduling policies that satisfy the quality of service requirements of multiple traffic classes under a variety of conditions. We also present an insight into the effectiveness of two different RL algorithms in this context. A major benefit of this approach is that we can help network providers deliver a guaranteed quality of service to customers without manual fine-tuning of the network routers.