Thursday Poster Symposium

On the Epistemic Limits of Personalized Prediction

Carol Long

Abstract:

Predictive models often include group attributes that encode personal characteristics such as sex, blood type, or HIV status. Personalized models must ensure fair use: groups who provide personal data should receive a tailored improvement in accuracy compared to a non-personalized model. In this paper, we derive conditions under which fair use violations in predictive models can be detected, and we characterize when estimating fair use is impossible. We propose the benefit of personalization (BoP), a metric that measures the worst-case accuracy gain across groups. Given finite samples, we bound the error probability of testing whether the BoP exceeds a target threshold. Remarkably, our bounds yield an information-theoretic limit on the number of group attributes a model can use while remaining verifiable: beyond this limit, it is impossible to reliably detect whether personalization harms or benefits all groups. We also derive statistical limits on the minimax mean-squared error of estimating the BoP. Our results show that there is no reliable way to determine whether a personalized model with k >= 19 attributes benefits every group that provides personal data, even given a dataset with N = 8 * 10^9 samples (i.e., one observation for each person in the world).
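
To make the worst-case flavor of the BoP concrete, the sketch below shows one plausible way to compute it empirically as the minimum, over groups, of the accuracy gain of a personalized classifier over a generic one. The function name, variable names, and this specific accuracy-gain formulation are illustrative assumptions, not the paper's exact definitions or estimators.

import numpy as np

def empirical_bop(y_true, pred_generic, pred_personalized, groups):
    """Worst-case (minimum over groups) accuracy gain from personalization."""
    y_true = np.asarray(y_true)
    groups = np.asarray(groups)
    gains = []
    for g in np.unique(groups):
        mask = groups == g
        acc_generic = np.mean(np.asarray(pred_generic)[mask] == y_true[mask])
        acc_personal = np.mean(np.asarray(pred_personalized)[mask] == y_true[mask])
        gains.append(acc_personal - acc_generic)
    # A negative value means at least one group is harmed by personalization.
    return min(gains)

# Toy example (fabricated data): two groups, where group 1 is hurt.
y   = np.array([1, 0, 1, 1, 0, 1])
h0  = np.array([1, 0, 1, 1, 0, 0])  # generic model predictions
hp  = np.array([1, 0, 1, 0, 1, 0])  # personalized model predictions
grp = np.array([0, 0, 0, 1, 1, 1])
print(empirical_bop(y, h0, hp, grp))  # negative: personalization harms group 1

The paper's testing question is whether such a quantity exceeds a threshold; the plug-in estimate above is only a sketch of the underlying quantity, not the finite-sample test the abstract analyzes.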