In many applications, such as in the financial and medical domains, it is valuable to have probability estimates of binary outcomes. In this work we study the robustness of evaluation metrics for these estimates using strictly proper scoring rules. We test on both simulated and real-world data using the following probabilistic binary classification methods: (robust) logistic regression (Cantoni and Ronchetti, 2001), support vector machine with Platt scaling, and random forest with isotonic regression or Platt scaling. In the simulation study the focus is on the effect of outliers and/or heavy-tailed error contamination on the (asymptotic) mean and variance of the strictly proper scoring rules, as well as on the reliability curve of the associated model, where outlier contamination occurs either in the training sample alone or in both the training and test samples. From the simulation study we confirm that, at the model (i.e., in the absence of contamination), the scores of classic and robust logistic regression are close. In the application we conclude that robust logistic regression is reliable in many applications and gives scores competitive with the machine learning methods, whilst enabling statistical inference.

Keywords: Prediction, Binary Outcome, Strictly Proper Scoring Rules, Calibration, Robustness, Generalized Linear Models, Asymptotic Relative Efficiency, Sensitivity Analysis, von Mises expansion.
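The kind of evaluation pipeline described in the abstract can be illustrated with a minimal sketch in Python (not taken from the thesis), assuming scikit-learn: fit several probabilistic classifiers, score their predicted probabilities with strictly proper scoring rules (Brier score and log loss), and compute the points of a reliability curve. The robust logistic regression of Cantoni and Ronchetti (2001) is not available in scikit-learn, so classic logistic regression stands in for it here; the dataset is synthetic.

```python
# Minimal sketch (assumptions: scikit-learn, synthetic data) of scoring probability
# estimates with strictly proper scoring rules and a reliability curve.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    # SVC(probability=True) produces probabilities via internal Platt scaling.
    "SVM + Platt": SVC(probability=True),
    # Random forest recalibrated with Platt scaling (sigmoid) or isotonic regression.
    "RF + Platt": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0), method="sigmoid", cv=5
    ),
    "RF + isotonic": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0), method="isotonic", cv=5
    ),
}

for name, model in models.items():
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # Strictly proper scoring rules: lower is better for both.
    print(f"{name:20s} Brier={brier_score_loss(y_te, p):.4f} log-loss={log_loss(y_te, p):.4f}")
    # Reliability curve: observed frequency vs. mean predicted probability per bin
    # (these points can be plotted to assess calibration).
    frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)
```

In the thesis the analogous comparison is carried out with contaminated training (and test) samples; this sketch only shows how the scores and reliability-curve points themselves are obtained.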

Zhelonkin, M.
hdl.handle.net/2105/50609
Econometrie
Erasmus School of Economics

Leeuw, E. de. (2019, December 3). Robustness of Evaluation Metrics for Predicting Probability Estimates of Binary Outcomes. Econometrie. Retrieved from http://hdl.handle.net/2105/50609