The last half of this year saw vigorous debates on the efficiency and reliability of de-identification techniques for large data sets. There remains disagreement about how effective and accessible different de-identification techniques are, how much risk is acceptable, and how feasible it is to use auxiliary data sets to re-identify individuals. These questions are important and should be discussed rigorously and at length by experts. But most people using data in the service of social good objectives are not experts: they are not statisticians or computer scientists, and many are not even very familiar with managing large data sets.
For organizations and advocacy initiatives leveraging data for transparency and accountability, the basic question remains: What can I do to make sure that the data I'm using cannot be used to harm people or violate their privacy?
It’s a simple question without a very simple answer, but one which needs to be posed and discussed at the expertise level of project managers and campaigners, not only security gurus.
To help normalize some of these concerns and provide an introduction to basic de-identification concerns and strategies, the Responsible Data Forum is hosting a 90-minute online discussion in mid-January. A small group of experts will present basic de-identification strategies, such as perturbation, trimming and pseudonymization, discussing their limitations, the contexts in which they're most appropriate, and the expertise and tools required to use them.
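As a rough illustration of what these strategies can look like in practice, here is a minimal Python sketch. The field names, salt, noise scale, and age bins are all invented for the example, and "trimming" is interpreted here as coarsening a value into a range; real de-identification choices depend heavily on context and should be reviewed by someone with relevant expertise.

```python
import hashlib
import random

def pseudonymize(value, salt):
    """Pseudonymization: replace a direct identifier with a salted hash.
    Note: a leaked salt lets anyone re-derive pseudonyms, so guard it."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def perturb(value, scale, rng):
    """Perturbation: add bounded random noise to a numeric value."""
    return value + rng.uniform(-scale, scale)

def coarsen_age(age, bins=(0, 18, 30, 45, 60, 120)):
    """Generalization (one reading of 'trimming'): report a range, not an exact age."""
    for lo, hi in zip(bins, bins[1:]):
        if lo <= age < hi:
            return f"{lo}-{hi - 1}"
    return "unknown"

# Example record with invented fields, transformed before sharing.
rng = random.Random(0)
record = {"name": "Jane Doe", "age": 34, "income": 52000}
safe = {
    "id": pseudonymize(record["name"], salt="project-secret"),
    "age_range": coarsen_age(record["age"]),
    "income": round(perturb(record["income"], scale=500, rng=rng)),
}
print(safe)
```

Even a sketch like this shows the trade-offs the discussion will cover: hashed pseudonyms can still be linked across data sets, noise must be large enough to matter, and coarse ranges reduce the usefulness of the data.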
We’re currently coordinating with potential experts; if you know someone who’d be a good fit, please pass this along or put us in touch.
Contact Kristin Antin, the engine room's Community Catalyst, if you're interested in joining this event: kristin [at] theengineroom [dot] org.
Tool: Appear.in or Jitsi meet + recordMyDesktop + YouTube + blog post (or a similar formula)
Duration: One hour
Date/Time: TBD
Join discussion and planning on this and other events: http://lists.theengineroom.org/lists/info/responsible_data