The paper considers solving the multiple sequence alignment, a combinatorial challenge in computational biology, where several DNA RNA, or protein sequences are to be arranged for high similarity. The proposal applies randomized Monte-Carlo tree search with nested rollouts and is able to improve the solution quality over time. Instead of learning the position of the letters, the approach learns a policy for the position of the gaps. The Monte-Carlo beam search algorithm we have implemented has a low memory overhead and can be invoked with constructed or known initial solutions. Experiments in the BAliBASE benchmark show promising results in improving state-of-the-art alignments.