Detecting Correlated Gaussian Databases
Zeynep Kahraman
Abstract:
This work considers the problem of detecting whether two databases, each consisting of $n$ users with $d$ Gaussian features, are correlated. Under the null hypothesis, the databases are independent. Under the alternate hypothesis, the features are correlated across databases, under an unknown row permutation. Upper and lower bounds are developed to show that for fixed $n$, a phase transition occurs at roughly $rho approx 1/d$ where $rho$ is the correlation coefficient (per feature). On the other hand, the bounds are not tight for $n$. Our results show a lower detection boundary for $rho$ compared to the corresponding recovery problem, where the goal is to decode the unknown permutation, and the required correlation coefficient is $rho approx 1 – n^{-4/d}$.