In this paper optimal state space planning is parallelized by exploiting the processing power of a graphics card. The two exploration steps, namely selecting the actions to be applied and generating the successors, are performed on a graphics processing unit. Duplicate detection, however, is delayed to be executed on the central processing unit. Multiple cores are employed to bypass main memory latency. To increase processing speed for exact duplicate detection, the hash tables are lock-free. Moreover, a bucket-based representation enhances the concurrent distribution of frontier states. The planner supports cost-first exploration and is able to deal with a considerable fraction of current PDDL, including numerical state variables, complex objective functions, and goal preferences. It can maximize the net-benefit. Experimental findings show visible performance gains especially for larger benchmark problems.