This paper presents details on planning and designing human studies in Human-Robot Interaction. It discusses the importance of using large sample sizes, which better represent the populations being investigated and improve the chance of obtaining statistically significant results for small to medium effects. It covers the four primary methods of evaluation: (1) self-assessments, (2) behavioral observations, (3) psychophysiological measures, and (4) task performance metrics. It also discusses the importance of using multiple methods of evaluation to obtain reliable and accurate results and to establish convergent validity. Recommendations for planning and designing a large-scale, complex human study are detailed, along with lessons learned from a recent study conducted with 128 participants, four methods of evaluation, and a high-fidelity simulated disaster site.
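
The link between sample size and the ability to detect small to medium effects can be illustrated with a rough power calculation. The sketch below is not taken from the study. It assumes a two-sided, two-group comparison of means, uses the standard normal approximation, and adopts Cohen's conventional effect sizes (d = 0.2, 0.5, 0.8) with alpha = 0.05 and power = 0.80. The function name `n_per_group` is hypothetical, and an exact t-test calculation would give slightly larger values.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size is Cohen's d; alpha and power are assumed defaults,
    not values reported in the paper.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


# Cohen's conventional small, medium, and large effect sizes.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d={d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, a small effect requires roughly sixteen times as many participants per group as a large one, which is why studies with only a handful of participants tend to detect large effects at best.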