Every ten years the U.S. Census Bureau conducts a nationwide survey that sets the terms for the country’s democracy. The questionnaire yields rich data, including people’s names, street addresses, ages, races, ethnicities, and other details. People’s responses help determine dynamics of power, such as how seats are apportioned in the House of Representatives, where voting districts get divided, and which communities receive federal funds.
But the bureau, tasked with releasing summaries of the results while simultaneously protecting people’s privacy, faces a Catch-22. “Every time you publish a statistic you leak information about that confidential database,” as Simson Garfinkel, a computer scientist with the bureau, told a Census advisory committee in May.
If people believe their responses will not be kept private and secure, they may opt not to respond. And with the proposed addition of a sensitive question to the 2020 Census (whether a respondent is an American citizen), heeding the privacy mandate becomes paramount.
There’s a problem, though: the usual methods for preserving people’s privacy no longer afford sufficient protection. In November 2016, a team of researchers reconstructed an alarming portion of the most recent Census’s confidential database. Using public 2010 Census data and the team’s statistical tools, they reassembled records for 46% of the 308,745,538 respondents; allowing for a year’s wiggle room on age, the proportion jumped to 71%. By combining the bureau’s published tables with other datasets, the researchers found they could re-identify 17% of the population.
John Abowd, chief scientist at the U.S. Census Bureau and leader of the 2016 study, says the old privacy safeguards are ineffective. Swapping respondent information between different geographic blocks, for instance, won’t cut it. “Turns out nobody is well enough buried in the haystack,” he says.
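To see why published statistics can give away individual records, consider a toy illustration. This is not the bureau's method or the researchers' actual tooling; the function name `consistent_blocks`, the three-person block, the age cap of 115, and the particular statistics are all hypothetical. The sketch assumes a block with an odd number of residents and simply lists every combination of ages that agrees with the published figures.

```python
from itertools import combinations_with_replacement

MAX_AGE = 115  # assumed upper bound on age, for illustration only


def consistent_blocks(count, mean_age, median_age, adult_mean_age=None):
    """Return every sorted tuple of ages that matches the published statistics.

    Assumes `count` is odd, so the median is the middle value.
    """
    matches = []
    # combinations_with_replacement yields tuples in sorted order,
    # so each candidate set of residents is considered exactly once.
    for ages in combinations_with_replacement(range(MAX_AGE + 1), count):
        if sum(ages) != mean_age * count:
            continue
        if ages[count // 2] != median_age:
            continue
        if adult_mean_age is not None:
            adults = [a for a in ages if a >= 18]
            if not adults or sum(adults) != adult_mean_age * len(adults):
                continue
        matches.append(ages)
    return matches


# Hypothetical published figures: 3 residents, mean age 30, median age 30.
print(len(consistent_blocks(3, 30, 30)))   # 31 possible sets of ages

# One more published figure, the mean age of adults (41), leaves a single answer.
print(consistent_blocks(3, 30, 30, 41))    # [(8, 30, 52)]
```

Two statistics leave 31 possibilities; a third pins down every resident's age. Real reconstruction works on far larger tables, but the principle is the same: each additional published number shrinks the haystack.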