In recent work we showed that models constructed from planner performance data over a large suite of benchmark problems are surprisingly accurate: 91-99% accuracy for predicting success and 3-496 seconds RMSE for predicting runtime. In this paper, we examine the underlying causes of this accuracy. We deconstruct the learned models to assess how the features, the planners, the search-space topology, and the amount of training data contribute to predicting planner performance. We find that the models can be learned from relatively little training data (e.g., performance on 10% of the problems in some cases). Generally, having more features improves accuracy, but the effect is often planner-dependent: in some cases, adding features degrades performance. The most prominent features in the models are domain features, though the runtime models would still benefit from better features. In the last part of the paper, we examine explanatory models to refine the planner dependencies and to identify links between problem structure and specific planners' performance.
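The modeling setup the abstract describes can be sketched roughly as follows. This is a minimal illustrative example, not the paper's actual pipeline: the two features, the synthetic "planner," and the 1-nearest-neighbour learner are all invented stand-ins, and the paper's models use far richer domain features and learners. It does mirror the low-data setting of training on 10% of the benchmark problems and predicting success on the rest:

```python
# Hypothetical sketch: learning a planner-success model from a small
# fraction of benchmark problems. Features, data, and the "planner"
# are all synthetic; only the train/evaluate structure is meaningful.

import random

random.seed(0)

def run_planner(num_objects, goal_count):
    # Stand-in for running a real planner: this fake planner
    # fails on sufficiently large instances.
    return num_objects + 2 * goal_count < 40

# Synthetic benchmark suite: each problem is a feature vector
# (num_objects, goal_count), labeled with planner success.
problems = [(random.randint(1, 30), random.randint(1, 10)) for _ in range(200)]
labels = [run_planner(n, g) for n, g in problems]

# Train on 10% of the problems, as in the paper's low-data setting.
split = len(problems) // 10
train = list(zip(problems[:split], labels[:split]))
test = list(zip(problems[split:], labels[split:]))

def predict(x):
    # 1-nearest-neighbour over the two features.
    _, label = min(train,
                   key=lambda t: (t[0][0] - x[0]) ** 2 + (t[0][1] - x[1]) ** 2)
    return label

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"held-out success-prediction accuracy: {accuracy:.2f}")
```

Even this crude learner recovers much of the success boundary from 20 training problems, which illustrates why the low-data result in the abstract is plausible; the runtime (regression) models in the paper follow the same train/evaluate structure with RMSE in place of accuracy.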