In recent years, several tools based on statistical methods and machine learning have been incorporated into security-related classification tasks, such as intrusion detection systems (IDSs), fraud detection, spam filters, biometrics, and multimedia forensics. Measuring the security performance of these classifiers is essential for supporting decision making, determining the viability of a product, and comparing multiple classifiers. There are, however, considerations specific to security-related problems that traditional evaluation schemes sometimes ignore. In this paper we identify two pervasive problems in security-related applications. The first is the usually large class imbalance between normal events and attack events. This problem has been addressed by evaluating classifiers with cost-sensitive metrics and by introducing Bayesian Receiver Operating Characteristic (B-ROC) curves. The second is that the classifier or learning rule will be deployed in an adversarial environment. This implies that good average-case performance may not be an adequate measure of quality; instead, we seek good performance under the worst type of adversarial attack. To make this notion precise, we provide a framework for modeling an adversary and define security notions based on evaluation metrics.
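To give a feel for both problems, the following minimal sketch computes (i) the probability that an alarm corresponds to a real attack under balanced and heavily imbalanced base rates, using Bayes' rule, and (ii) the gap between average-case and worst-case detection rates. All numbers, the function name `posterior_attack_given_alarm`, and the attack strategy labels are hypothetical and chosen for illustration only; they are not results or definitions from this paper.

```python
def posterior_attack_given_alarm(p_attack, p_detect, p_false_alarm):
    """P(attack | alarm) by Bayes' rule.

    p_attack:      base rate of attacks, P(attack)
    p_detect:      detection probability, P(alarm | attack)
    p_false_alarm: false alarm probability, P(alarm | no attack)
    """
    true_alarms = p_detect * p_attack
    false_alarms = p_false_alarm * (1.0 - p_attack)
    return true_alarms / (true_alarms + false_alarms)


# Problem 1: class imbalance. Same (hypothetical) classifier, two base rates.
ppv_balanced = posterior_attack_given_alarm(0.5, 0.99, 0.01)
ppv_imbalanced = posterior_attack_given_alarm(1e-4, 0.99, 0.01)

# Problem 2: adversarial setting. Hypothetical detection rates per attack
# strategy; an adversary is free to pick the strategy that is detected least.
detection_by_strategy = {"naive": 0.99, "obfuscated": 0.80, "mimicry": 0.10}
average_case = sum(detection_by_strategy.values()) / len(detection_by_strategy)
worst_case = min(detection_by_strategy.values())

print(f"P(attack | alarm), balanced classes:   {ppv_balanced:.4f}")
print(f"P(attack | alarm), imbalanced classes: {ppv_imbalanced:.4f}")
print(f"average-case detection rate: {average_case:.2f}")
print(f"worst-case detection rate:   {worst_case:.2f}")
```

With these illustrative values, a classifier that detects 99% of attacks at a 1% false alarm rate still produces alarms that are real attacks less than 1% of the time once attacks make up one event in ten thousand, and a detector that looks acceptable on average can perform far worse against the strategy an adversary would actually choose.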