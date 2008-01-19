[...] Differential privacy, first described in 2006, isn't a substitute for swapping and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will "leak" from a public data set to open the door to reconstruction.

"Any time you release a statistic, you're leaking something," explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. "The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary" on that risk.

A database can be considered differentially private if the information it yields about someone is essentially the same whether or not that person is part of the database. Differential privacy was originally designed to apply to situations in which outsiders make a series of queries to extract information from a database. In that scenario, each query consumes a little bit of what the experts call a "privacy budget." After that budget is exhausted, queries are halted in order to prevent database reconstruction.
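
To make the budget idea concrete, here is a minimal Python sketch of the interactive scenario described above. It is an illustration only: the class name `PrivateCounter`, the exception `BudgetExhausted`, and the choice of Laplace noise for counting queries are assumptions made for this example, not details given in the article or a description of any Census Bureau system. Each query spends part of a fixed budget, conventionally called epsilon, and the database refuses to answer once the budget is gone.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivateCounter:
    """Hypothetical database that answers counting queries with Laplace noise."""

    def __init__(self, records, total_epsilon, seed=None):
        self.records = list(records)
        self.remaining = total_epsilon  # privacy budget still available
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent exponential draws with mean
        # `scale` follows a Laplace distribution with that scale.
        rate = 1 / scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def count(self, predicate, epsilon):
        """Return a noisy count of matching records, spending `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted("privacy budget exhausted; query refused")
        self.remaining -= epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # Adding or removing one person changes a count by at most 1,
        # so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / epsilon)


# Usage: two queries use up the whole budget; a third would be refused.
db = PrivateCounter([25, 40, 67, 71], total_epsilon=1.0, seed=0)
seniors = db.count(lambda age: age >= 65, epsilon=0.5)
young = db.count(lambda age: age < 30, epsilon=0.5)
```

In this sketch a smaller epsilon per query means more noise in each answer but more queries before the budget runs out, which is the trade-off the budget is meant to expose.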

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction.
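
A rough sense of that calculation can be had from a short Python sketch. It assumes, purely for illustration, that the release consists of a fixed number of count tables, that a total budget (epsilon) is split evenly among them, and that Laplace noise is used. The article does not say which noise mechanism or budget split the Census Bureau uses, and the function name `laplace_scale` is hypothetical.

```python
def laplace_scale(total_epsilon, num_tables, sensitivity=1.0):
    """Noise scale per table when the budget is split evenly across tables.

    Assumes budgets add up across releases (sequential composition), so
    each of the `num_tables` tables gets total_epsilon / num_tables.
    """
    if total_epsilon <= 0 or num_tables <= 0:
        raise ValueError("total_epsilon and num_tables must be positive")
    per_table_epsilon = total_epsilon / num_tables
    # For Laplace noise the scale is sensitivity / epsilon, and the
    # expected absolute error of each published count equals that scale.
    return sensitivity / per_table_epsilon


# Usage: the same tables released under a tighter and a looser budget.
tight = laplace_scale(total_epsilon=1.0, num_tables=4)
loose = laplace_scale(total_epsilon=2.0, num_tables=4)
```

Under these assumptions, halving the total budget doubles the typical error in every published count. Once the noisy tables are out, anyone can query them as often as they like without spending more budget, because no further access to the underlying records is involved.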