Thursday Poster Symposium

Seeded Database Matching Under Synchronization Errors

Serhat Bakirtas

Abstract:

The re-identification or de-anonymization of users
from anonymized data through matching with publicly available
correlated user data has raised privacy concerns, leading to the
complementary measure of obfuscation in addition to anonymization.
Recent research provides a fundamental understanding
of the conditions under which privacy attacks are successful,
in the presence of either obfuscation or synchronization errors
stemming from the sampling of time-indexed databases. This
paper presents a unified framework considering both obfuscation
and synchronization errors and investigates the matching of
databases under noisy column repetitions. By devising replica
detection and seeded deletion detection algorithms, and using
information-theoretic tools, sufficient conditions for successful
matching are derived. It is shown that a seed size logarithmic
in the row size is enough to guarantee the detection of all
deleted columns. It is also proved that this sufficient condition
is necessary, thus characterizing the database matching capacity
under noisy column repetitions.
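
For readers unfamiliar with the setting, the following is a minimal
sketch of what "noisy column repetitions" could look like on a small
binary database. It is an illustration only, not the algorithm or the
exact model of the paper: the function name, the binary alphabet, and
the independent bit-flip noise are all assumptions made here. Each
column is emitted a given number of times (zero means the column is
deleted, two or more means it is replicated), and every emitted entry
is perturbed independently.

```python
import random


def apply_noisy_column_repetitions(database, repetitions, flip_prob, rng):
    """Hypothetical helper: repeat column j `repetitions[j]` times.

    A count of 0 deletes the column; a count above 1 replicates it.
    Every emitted binary entry is flipped independently with
    probability `flip_prob` (an assumed obfuscation model).
    """
    if any(len(row) != len(repetitions) for row in database):
        raise ValueError("each row needs one entry per repetition count")

    output = []
    for row in database:
        new_row = []
        for value, count in zip(row, repetitions):
            for _ in range(count):
                # Flip the bit with probability flip_prob, else copy it.
                noisy = value ^ 1 if rng.random() < flip_prob else value
                new_row.append(noisy)
        output.append(new_row)
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    database = [[0, 1, 1], [1, 0, 1]]
    # Column 0 is replicated twice, column 1 is deleted, column 2 is kept.
    print(apply_noisy_column_repetitions(database, [2, 0, 1], 0.1, rng))
```

In this toy setting, a matcher would see only the output and would have
to infer which columns were deleted or replicated, which is the role the
abstract assigns to the replica detection and seeded deletion detection
steps.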