A standard method for evaluating new features and changes to, e.g., websites is A/B testing. A common pitfall in A/B testing is the habit of looking at a test while it is running and stopping it early. Due to the implicit multiple testing, the p-value is no longer trustworthy and is usually too small. We investigate the claim that Bayesian methods, unlike frequentist tests, are immune to this “peeking” problem. We demonstrate that two regularly used measures, namely the posterior probability and the value remaining, are severely affected by repeated testing. We further show a strong dependence on the prior distribution of the parameters of interest.
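
As a rough illustration of the peeking effect described above (a minimal sketch, not code from the paper; the batch size, stopping threshold, and Beta(1, 1) priors are assumptions for the example), the following Python simulation runs A/A tests with no true difference between the arms, checks the posterior probability that one arm beats the other after every batch of visitors, and stops as soon as that probability crosses a decision threshold. The fraction of runs that declare a "winner" shows how repeated looks inflate the false positive rate even for a Bayesian measure.

import numpy as np

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_sims=2000, n_max=10000, peek_every=200,
                                p=0.05, threshold=0.95, a0=1.0, b0=1.0):
    """Simulate A/A tests (identical true conversion rate p) with Beta(a0, b0)
    priors, peeking every `peek_every` visitors per arm, and stopping as soon
    as the posterior probability that one arm beats the other exceeds
    `threshold`. Returns the fraction of runs that stop with a 'winner'."""
    false_positives = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        while n < n_max:
            # Collect another batch of visitors per arm; both arms share rate p.
            conv_a += rng.binomial(peek_every, p)
            conv_b += rng.binomial(peek_every, p)
            n += peek_every
            # Posterior P(rate_B > rate_A) via Monte Carlo draws from the Betas.
            draws_a = rng.beta(a0 + conv_a, b0 + n - conv_a, size=4000)
            draws_b = rng.beta(a0 + conv_b, b0 + n - conv_b, size=4000)
            prob_b_better = np.mean(draws_b > draws_a)
            if prob_b_better > threshold or prob_b_better < 1 - threshold:
                false_positives += 1
                break
        # Runs that reach n_max without crossing the threshold count as no decision.
    return false_positives / n_sims

print(peeking_false_positive_rate())

With a fixed sample size and a single final look, the same 95% threshold would yield roughly a 5% false positive rate; with frequent peeking the simulated rate is substantially higher, which is the phenomenon the paper examines.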