From Responsible Data Wiki
Revision as of 20:16, 17 March 2016 by Willowbl00 (Talk | contribs)

For years, disaster and humanitarian response technologists, field agents, and policy makers have been pushing big data as a way to be self-reflexive and open data as a way to break down the silos rife throughout the sector. While folk tend to talk a lot about ethics and responsibility in response and data, we rarely talk about law (which is those things codified). The Ebola response was one more occasion for organizations, funders, and those affected to wonder what role data would play in their response, and what the repercussions of those choices would be after the response was declared complete. Sean McDonald posted his paper Ebola: A Big Data Disaster to the Responsible Data list, and it spurred a deep conversation about what data should be shared and how practitioners who are also open data advocates make hard choices in pressing circumstances. We were all so excited that it made sense to have a real-time conversation about the topic.

Call Details

March 15th 8 a.m. PST, 11 a.m. EDT, 3 p.m. GMT, 5 p.m. CET, 6 p.m. EAT
Hangout on air broadcast link: https://plus.google.com/events/cekgergp643cet3u834qfcf3qu4
Youtube link (live and recorded access): http://youtu.be/xcczOMLQBJM


  • Danna, engine room
  • Jen, HHI, NetHope
  • John, HHI, Global Pulse
  • Sean, Frontline SMS
  • Tim, O'Reilly
  • Zara, engine room
  • Willow, Aspiration
  • Paul C

Suggested reading

Review of the paper, how it instigated a conversation

Ebola: A Big Data Disaster is a brief history of the role tech played in the Ebola crisis. It raises institutional concerns, particularly around how information systems became a central actor. A perceived lack of good information led some intelligent, well-meaning people to release call-detail records, some of our most sensitive data. These vary in accuracy depending on source and external circumstances.

An attempt to use data to scale contact tracing: valuable, but difficult to represent digitally. danah boyd asks "what is the direction of this space?" Fighting the ecosystem of prediction is one thing when tools are designed to do harm, but these tools are designed to be helpful. Positive uses exist alongside terrifying uses.

danah boyd: "Many activists will be looking to fight the ecosystem of prediction - and to regulate when and where prediction can be used. This is all fine and well when we’re talking about how these technologies are designed to do harm. But more often than not, these tools will be designed to be helpful, to increase efficiency, to identify people who need help. Their positive uses will exist alongside uses that are terrifying. What do we do?"

"Technology is neither good nor bad... nor is it neutral." Mandated quarantine. That's scary! A practical and terrifying outcome is predictive quarantine.

We talk a lot about ethics and responsibility, but what we rarely talk about is law. We have ways to police and prevent. Those are being categorically ignored. Innovation + disruption = HOORAY. But wait. Lots of experimentation, transparency, accountability needs to go into this practice area.


  • Danna: Big gap: understanding the legal framework, understanding how to act within the law as well as "ethically" and "responsibly". Looking across legal jurisdictions becomes difficult.
  • Jen: Understanding and knowing the legal frameworks are super important. Staff face being pulled in so many directions! Legal frameworks are 45 pages long. How do I take something, hand it off to the team to bring it back, make it practical. Make it feel more feasible to integrate it. Open up the hood - we don't have a good playbook, but take the first steps to do so.
  • John: did work with World Bank on GeoNode. Including a legal person rarely happens. Learned that lesson the hard way - leading research into UAVs, etc. Bringing lawyers into this function is something that enables the tech to find a lane that is possible to do, and legal. Other framing: lawyers need to know the exact use case and how it will be applied in practice. Many technologies - eg. release of CDRs - don't get narrowed down enough. We need to be able to explain to a lawyer how exactly the data will contribute to a specific function (eg. contact tracing to send people a message on a phone?) - the modality matters.
  • Tim: Two things are pushing the groups to ignore the law: "disruption" and the state of emergency. The culture of disruption. See also "Can innovators build a future that is both disruptive and just?" The idea that law is hard doesn't mean that it's slow (or not useful!). Orgs are competing for resources - looking to say "we're the go to org" for the next project, so it's imperative to come out well. Practicalities of laying out that we're getting resource X and it's going to lead to outcome Y - having a realistic notion that we're going to use things for this particular use. Purpose specificity isn't new but it's just as important.
  • Zara: The rush to do something, and the blind faith in "more data is better" in a situation where it's hard to argue against it (like Ebola). You're not going to make it worse, so... Implementing agencies versus just doing it with what you've got.
  • Willow: Lots of people who have been working on data sharing etc for YEARS. Worried that this will become a feather in the hat of people pushing against 'data sharing'

What did implementing in the Ebola context look like? Where and why were exceptions made?

If you look at individual medical records, quality metrics, it's decades of HIPAA policies and training. Patient-provider level. Policy, privacy, people. People are the central part. You can see why it's so complicated; we still make mistakes. Decades of training at the national level.

Existing open data efforts

Many people have been working on the open data and coordination issues of disaster response for a while. Why is this important? Are those efforts not enough? Are they incorrect? What's happening here? The notion of sharing call details in general is terrifying.

Worked with them extensively at Global Pulse - element of creating a space to analyse CDR is usually related to analysing them for a specific purpose. Usually adding some kind of noise in the data to prevent re-identification (or to make it a bit more difficult) - and usually, if it's going to be combined with other data sets, there's a review: how do we protect the people reflected in those datasets.

That's just for research. For epidemiology, we have to re-identify people for contact tracing. That makes the datasets radioactive from day one. Not something that will ever be shared. There are other mechanisms to protect the individual, the data, the country or operators with the data. How do we create the frameworks for the specifically articulated tasks needed to notify folk they may be at risk? Can they choose, or is someone going to choose for them? Sharing CDRs with no anonymization is a non-starter in a

Important to note that anonymization is delicate protection - even in research, as John notes. CDRs are some of the most reidentifiable data sets that exist, meaning that once they're released in the wild it can be very difficult to limit their use, re-identification, and potential harm. From a legal perspective, there are very few protections for organizations or mobile network operators that share this data without user consent. In Liberia, even when mobile networks share data because they've been compelled by a government using emergency powers, they're not immune to prosecution.
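The "adding noise" step mentioned above for research use can be sketched roughly as follows. This is a minimal illustration, not the method Global Pulse or anyone on the call actually used: the record shape `(subscriber_id, tower_id)`, the function names, the Laplace noise, and the `scale` and `min_count` values are all assumptions made for the example. It only releases aggregate counts per tower, which fits the point that individual-level CDRs remain highly re-identifiable even after this kind of treatment.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_tower_counts(records, scale=2.0, min_count=5, seed=None):
    """Return noisy distinct-subscriber counts per tower.

    records   -- iterable of (subscriber_id, tower_id) pairs (hypothetical shape)
    scale     -- spread of the added noise (illustrative value)
    min_count -- towers with fewer distinct subscribers are suppressed entirely
    """
    rng = random.Random(seed)
    subscribers_by_tower = defaultdict(set)
    for subscriber_id, tower_id in records:
        subscribers_by_tower[tower_id].add(subscriber_id)

    released = {}
    for tower_id, subscribers in subscribers_by_tower.items():
        true_count = len(subscribers)
        if true_count < min_count:
            continue  # small cells are the easiest to re-identify, so drop them
        noisy = round(true_count + laplace_noise(scale, rng))
        released[tower_id] = max(0, noisy)  # a count cannot be negative
    return released


if __name__ == "__main__":
    sample = [(f"sub{i}", "tower_A") for i in range(10)]
    sample += [("sub_x", "tower_B"), ("sub_y", "tower_B")]
    print(noisy_tower_counts(sample, seed=1))
```

Only aggregate, perturbed counts leave the function; no subscriber identifiers are returned. Whether any particular noise level is adequate depends on the review described above, especially when the output is combined with other data sets.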

Paul: two questions come up for me. First: at what level does the responsibility lie? Within NGO communities there is very little legal expertise, especially in managing the data and what you do with it. Second: how to incorporate this into the work flow - the policies and practices that we have are the result of years and years of training. It's unlikely that we're going to get a system as robust as we'd like. So, if we assume that, how do we minimise what goes wrong - what does that look like from a legal/org perspective?

Legality of data sharing - not the ethics - often we're working in areas/regimes where government/'the powers that be' are looking for reasons to STOP the work that's going on. If UN/whoever, isn't abiding by those laws, it can give them fuel to restrict civil society actions. Not experimenting in a way that could, in the long run, clamp down.

Willow - I totally worry about this as well.

Also: some national laws are legitimately abrogated during emergencies - this could work in favour of / against responsible data management

Education focused on industry is focused on data silos, combining your data to make more out of it. Then come the problems of anonymizing: what are the technical solutions? There is also the idea of knowledge silos: the data people not talking to the bio people, not talking to the law people.

Zara: People working in the field have low to zero awareness of these issues

John: Knowledge silos reflect organisational structures in field response

Signals in bureaucracies flow up and down channels - but most of what we need to do in these cases is more flexible, informal, ad hoc. These channels form based on personal relationships, and yet they're the main way that things happen. Right now we're working against the grain of how orgs get stuff done.

How can we enable these informal channels? Have the data flow in legal mechanisms TO the problem solvers. Ideally in real time - not a small problem!

Knowledge silos

Data literacy? What does that mean for people in these contexts? (what do they think of as data/what not?) A certain data type might mean something to someone at the humanitarian level, something totally different to someone at the village level. (eg. GPS coordinates). In the will to data share, the process of data sharing and the people, can get lost. Understanding people, process + environment. Between agencies, local govt, national government, etc.

There are a bunch of things which have been brought up. Specifically to silos, the informal mechanisms John talked about were really pervasive in Ebola response in Liberia. Social groups or friendly orgs determining who was participating in sharing. Rarely did that data make it back to the ministry of health. No culture of reporting. Even when there was, the format, commonality of definitions, etc made it very difficult to decipher or to get value out of the data that made it back.

On the paper, two points to touch on

  1. There are a lot of data literate people in humanitarian orgs - even if people don't have the right terminology, they know that the things they are doing are somewhat concerning. To operationally manage this data, you'd have to fundamentally re-organise the way an org manages their data (??) - even moving from one territorial jurisdiction to another is a violation of two types of data law (? -- @Sean please check this!)
  2. Silos can be damaging when they're barriers to skills. But what data are you sharing, for what purpose? Who needs to see what? That kind of mapping very rarely gets done at the outset. Coordination has been an endemic problem of disaster response for a long time - data has potential to make that harder.

Paul: capacity of 'clusters' is completely variable & can't be predicted. Informality of coordination: useful coordination is built out of informal coordination, not the formal structures.

Informal / formal sharing standards

John: HXL: http://hxlstandard.org/ is an acknowledgement that there will be no formal data standard to describe how we're all supposed to work. Let's describe how we all do it now, and create a mechanism to help us aggregate it a bit. Simple idea - put another # on columns you want to share (#s allow you to specify what that data type is), and the data that doesn't have a # doesn't get shared.

Share, using the common tool of a spreadsheet - which humanitarians generally use, over email. HXL generally dealing with aggregate numbers, less ~individual/privacy
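The "only tagged columns get shared" idea John describes can be sketched roughly like this. It is an illustration of the notes above, not the HXL tooling itself: it assumes the hashtags sit in the row directly under the human-readable headers, the function name `shareable_columns` is hypothetical, and the sample sheet and its tags are made up for the example.

```python
import csv
import io


def shareable_columns(csv_text):
    """Keep only the columns whose tag cell starts with '#'.

    Assumes row 0 holds human-readable headers and row 1 holds the hashtags;
    every column left untagged is dropped before sharing.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        raise ValueError("expected a header row and a hashtag row")
    tags = rows[1]
    keep = [i for i, tag in enumerate(tags) if tag.strip().startswith("#")]
    # Rows shorter than the tag row are padded with empty cells
    return [
        [row[i] if i < len(row) else "" for i in keep]
        for row in rows
    ]


if __name__ == "__main__":
    # Illustrative spreadsheet: the phone column is deliberately left untagged
    sheet = (
        "District,Cases,Contact phone\n"
        "#adm1,#affected,\n"
        "Montserrado,12,555-0100\n"
        "Bong,3,555-0101\n"
    )
    for row in shareable_columns(sheet):
        print(row)
```

The design choice mirrors the note above: sharing is opt-in per column, so anything a team has not explicitly tagged (here, the phone numbers) stays out of the shared spreadsheet by default.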

What next??

Sean: - a much more public, less technical review of the component factors that make different algorithms/different data models applicable to different things. eg. Flowminder was also one of the most cited bodies of work for why we should be releasing CDRs, but all of the examples used were based on migration, where you can build probabilistic models much more easily than you can for eg. Ebola. There are commonalities between the factors of diseases that make a big difference. Quality control is a big, open, public interest area. Starting to see eg. Open AI (not the right answer for this community!) - take a step towards that. Having a public discussion about what makes a data model 'good', applicable - what are the bounds, what are the factors, being vocal about there being legal risks. Ethics are absolutely important, but if we're not talking about the legal risks, we're trading on the legal disempowerment of the people we are working to serve.

Harmonising data systems particularly in health systems, with humanitarian systems - investment is hard to find, but easy to do.

One fundamental tension is that WHO doesn't have emergency funds. When there is an emergency they have to respond to, they are dependent entirely on donations - so they are subject to all of the fundraising & practical concerns around making sure the whole world knows a disaster is happening. Understanding humanitarian response as a market is another conversation we should be having more publicly.

Q and A

Q: Is there guidance on how to assess whether big data is really going to answer your question?

A: connecting between data people/biology people.. as per Sean's answer above.

Q. Is there a risk assessment procedure that guides orgs to ....

A: There is, but Tim feels (being in the data world rather than the humanitarian world) that people could be more open about it. You have to go through it, be working on an international scale, and be more public about how we look at projects before we decide to take them on.
A: Opt-in data donor (like an organ donor) when you sign up for a phone plan. Solves consent problem, but not the demand problem. There are low-hanging fruits in this space.

Q: Are there any existing ethical guidelines and standards for responsible data collection, storage, dissemination, and destruction? Would having some be helpful?

A: there are at least a dozen frameworks which say 'this is how we should be handling personal/public/ data'. They have varying levels of "toothiness".
A: there are good principles, and almost no really good practice -- at least when you get into operational contexts. We have a definitional set of problems around what kind of data does work, what are the cost trade offs, etc. We're in one of those situations where there are big, big ethical questions (necessity, limitation in use) - but they are all very difficult to assess at present, and even more difficult to enforce. Practical guidance would go a long way.

Danna: consent is a place we go to mitigate risk, but it's really complicated, and it's used to coat over issues. For yourself or an organization you're working with, what could a consent policy look like? How would it affect the way consent is given, taken away, etc? In a humanitarian setting, are you able to actually get consent? If you're not, then we probably shouldn't be talking about consent. It's about the ethical - legal. Consent isn't always the right framework. Consent without the ability to enforce limitations of use is absolute whitewash. If you're using data from providers and ICTs, if they didn't have a choice in using it (and thus consenting) At present, outside of research settings, there really isn't any means of protection or limitation. Violation of consent is a great way to punish provable abuse - which I think is a useful first step, but there aren't technical solutions, we're a long way from organizational solutions, and unfortunately the market (economic incentives) pulls everyone in the exact opposite direction.

Q: There's data.un.org - but is there already an umbrella org who has both political clout, and substantive knowledge to be able to say - let's all come to the table and come up with something to at least set some outer bounds and resolve some of the hardest issues?

A: The UN is not a monolith - the agencies have very different attitudes towards data, eg. UNHCR, DESA, OCHA, Global Pulse - their approach to managing it is fundamentally different. As an 'institution' it's not set up to answer the question in that way. Can it set a standard, or function as a mediating org? (In many countries, the UN is not perceived as neutral.) Work can't be pushed off to one 'mediator' organisation, it's going to require all of us to do it - and it's going to be a long hard slog!! Mechanisms for convening the UN are powerful, but the mechanisms for implementing and putting into practice are a little bit more work.
A: in some contexts, INGOs, local NGOs, local gov't offices - that part of the data sharing can be fraught with more challenges.

Closing statements

  • Danna: looking at the instruction. How do we make a legal analysis of this work, functional. Looking at a framework for legal + ethical data (not necessarily based in a consent framework) - is something I'd love to keep working on. Places where we can experiment a bit - we need to figure out different ways of working.
  • Jen: practical transformations of legal frameworks would be GREAT. NetHope could bring them to our 45 members. Want to talk about the practical aspects of this. We'll be setting up these practical systems (for better or worse) in June at Humanitarian Tech Conference. We'll implement anyway.
  • John: Lots of practical steps that need to get taken: if we're dealing with CDRs (or other highly sensitive data) - we need to work out ways of dealing with this data in compartmentalised ways. Those black boxes can't fully be black, we need to know the data has been protected. We need to talk with the people who the data describes, and understand what we need to do in different contexts, in different legal regimes, different cultural contexts. Practical > theoretical. (doesn't jibe well with very academic frameworks of international aid) - we need to take a different approach, get a lot closer to the ground.
  • Paul: John talking about coming from the ground up, that's a big task but part of the solution. From the top-down, it's a relatively simple landscape we've got. Limited number of orgs doing all this, limited units, limited people in those units. We could build that list (departments & individuals).
  • Sean: A thought from the future, and one from the past. Spoiler, both from the past. In South Korea, a MERS outbreak meant they seized the CDRs for the whole country, then imposed quarantine on 17k people. That messed up tourism and subsequent income, requiring a $9B recovery package. These things are happening. They have a real tangible life impact. Important that these next steps keep pace with practice in the other direction. 100 pieces of US legislation successfully ?drafted/introduced re chemicals being used in terrible ways, it was difficult to keep things in check and hold people accountable. Strong advocacy! Seeing everyone step up is exciting.
  • Tim: One of my favorite sayings about data from Deirdre McCloskey "one of the problems with the discourse around data (Latin for "things given") when we should be talking about capta (things seized)." (e.g. from http://ejpe.org/pdf/7-2-art-4.pdf#page=2) Means different things to different people. Want to step back, take the data, make slow use of it. But that's not right for this. We've developed models for protecting research subjects, which works when it's "everyone," but we don't. (I didn't transpose that right).
  • Zara: So many people interested in this from so many different angles, baffling about where we can take this. Let's do events, calls. We might break this out into various working groups.


Questions to Ask Frequently (QAFs) when working with Data and Marginalised Communities http://www.fabriders.net/qafs/

See: responsible data wiki for consent framework guidance. https://wiki.responsibledata.io/Main_Page

Geeks Without Bounds did a disaster data lifecycle doc (PDF) - http://gwob.org/


Humanitarian Tech in Boston in June http://www.humanitariantechnology.org/

HumTechFest (sibling event) also in June! humtechfest@aspirationtech.org June 4th & 5th MIT Cambridge, MA


RightsCon Silicon Valley 3/30-4/1/2016 https://www.rightscon.org/

Where to from here?

  1. The Ebola context specifically. It seems like there's a great blend of first-person expertise and research interest (Stefaan, Chris?, me). This might be a book club kind of chat about the paper where people talk about their experience/research and ways that it compares and contrasts? If we can, it might be great to get someone to write a blog post on the back of this toward a RDF community history of the crisis?
  2. Domain expertise. There's interest in exploring the intersection (and/or divergence) of data analysis and privacy - particularly in health/humanitarian contexts. It sounds like there are a lot of people who are better versed on the comparative methodologies around this subject than I am, so I'm happy to join, but would love it if someone else could frame and organize this one.
  3. Strategic advocacy. This would focus on building an effective advocacy movement to channel current momentum into safer practice. Again, a list full of experts, so my preference would be to start with framing the pressures, markets, and institutional actors involved, and then maybe explore points of leverage/influence, and see whether there's a critical mass of folks interested in taking next steps. I'd love to help lead on this one, though there are a lot of experts here, so I'm sure it'd be a group effort.
  4. Clear legal questions. Lawyers all over that would likely be willing to engage with us. Want a followup call around that. Danna is into it! Basic overview of legal frameworks for responders, as per JenC's request. Theories of litigation. We hear one hook, but we don't get to explore that hook so we get stuck. Each jurisdiction is going to be SO different.

Glossary of terms and acronyms

  • CDR : Call Detail Records are a blanket term that refers to the data that mobile network operators collect in the course of providing telecommunications services. Although practice varies between operators, in order to both provide services and present billing statements, network operators store a customer’s name, address, billing status, account identifiers, payment information, log of incoming and outgoing phone calls, phone numbers attached to each call, length of each call, number of text messages sent and received, log of data use (including web sites visited, information transferred, apps used), and, where applicable, mobile money transactions history. Many mobile network operators record and store more information, but the above is typical. One other piece of information that mobile network operators require is location data.
  • ETC : Ebola Treatment Center
  • PII : Personally Identifying Information
  • Predictive quarantine:
  • ICTs: Information and Communications Technology/ies, umbrella term for any communication device/application: radio, television, cell phones, computer/ network hardware/software, satellite systems