We consider dynamic route planning for a fleet of Autonomous Mobile Robots (AMRs) performing fetch-and-carry tasks on a shared factory floor. We propose Stochastic Work Graphs (SWGs) as a formalism for capturing the semantics of such distributed and uncertain planning problems. We encode SWGs as a Euclidean Markov Decision Process (EMDP) in the tool Uppaal Stratego, which employs Q-learning to synthesize near-optimal plans. Furthermore, we deploy the tool in an online and distributed fashion to facilitate scalable, rapid replanning: while executing its current plan, each AMR generates a new plan that incorporates updated information about the other AMRs' positions and plans. We propose a two-layer Model Predictive Controller structure, consisting of a waypoint-planning layer and a station-planning layer, each solved individually by the Q-learning-based solver. We evaluate our approach in the ARGoS3 large-scale robot simulator, where we simulate the AMR movement and observe up to a 27.5% improvement in makespan over a greedy planning approach. To this end, we have implemented the full software stack, which translates observations into SWGs and solves them with our proposed method. In addition, we construct a benchmark platform for comparing planning techniques in a reasonably realistic physical simulation and release it under the MIT open-source license.