Data in the project lifecycle

From Responsible Data Wiki

How are responsible data considerations distributed across the different steps of the project lifecycle?

Result of the Responsible Data Forum Resource Sprint, Oct 2014, Budapest


A lightweight tool for data project implementers to help them plan their projects and check whether they are handling their data responsibly at every step, for both planned and emerging risks.

The pipeline was designed for civic data projects with a social mission. We specifically excluded government-run and commercial projects, though it would be interesting to try it with those too.

Journalism was an interesting source of debate. In the end we decided that there were two sorts of journalism:

1. Journalism by conscientious journalists, who want to change the world with their work.
2. Journalism designed to sell newspapers.

We figured that the second might be a lost cause in terms of uptake, so we considered only projects that we thought fitted into the first category.

Connection to previous RDFs

Primer for Responsible Data for Development Practitioners (RDF NYC)

Geeks without Bounds Whitepaper (RDF Oakland)


We separated the project lifecycle into three main phases, with multiple steps within those phases.


Getting to Go

  • Is there a theory of change?
    • Does that theory of change cover both utopian and dystopian versions of reality?
  • Are there potential legal risks?
  • Have we done a stakeholder map and a power analysis map?
  • Are there clear user stories? (Note: we believe this is a different exercise from stakeholder mapping.)
  • Have we specified how lessons learned will be shared?
  • Are we the best people to collect / manage / work with the data?
  • Who in the team will internalise data responsibility? Who will not?
    • Do we need to give anyone any training?
  • Have we specified a retirement plan? Or is our plan to live forever realistic?
  • Are we targeting anyone specific with our project?
    • If yes, what is the risk of collateral damage?


1. Define your question

  • Do I need to collect this data? (What is the minimum data I could collect?)
  • Who do I want to help / who could I harm?
  • Am I biased?
    • Would it be in my favour to presume a particular answer?
    • Does my question assume a particular answer?
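One concrete way to act on the minimum-data question is to write down, before collection starts, exactly which fields the question needs, and to drop everything else at the point of intake. The Python sketch below is a hypothetical illustration, not part of the original pipeline; the field names and the `minimise` helper are invented for the example.

```python
from typing import Any

# Hypothetical allowlist: the only fields our (illustrative) research question needs.
REQUIRED_FIELDS = {"district", "age_band", "response"}


def minimise(record: dict[str, Any], allowed: set[str] = REQUIRED_FIELDS) -> dict[str, Any]:
    """Return a copy of `record` containing only the allowlisted fields."""
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "name": "A. Person",      # identifying, not needed for the question
    "phone": "+00 000 0000",  # identifying, not needed for the question
    "district": "North",
    "age_band": "25-34",
    "response": "yes",
}

print(minimise(raw))  # {'district': 'North', 'age_band': '25-34', 'response': 'yes'}
```

Using an allowlist rather than a blocklist of known-sensitive fields means that an unexpected new field is dropped by default instead of slipping through.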

2. (Option a) Collect / Create Data

  • Have I obtained consent? [Link to consent checklist]
  • Am I following the 'do no harm' principle?
  • Am I using secure methods to collect & store data?
  • Is there anyone else who could use or benefit from the data?
  • In developing the data collection / acquisition methodology, have representativeness and bias been taken into account?
  • Does collection strengthen digital literacy? How about digital autonomy?
  • Can a community benefit from participating in data collection?
  • What is my plan for disposing of the data?

2. (Option b) Find Data

  • Are my methods for finding data random or targeted?
  • Am I examining my methods for possible bias?

3. Store the data

  • Who ultimately owns the data?
  • Is my data stored securely?

4. Extract the data

  • Have I kept the original files?
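A simple way to show that the original files have been kept unchanged is to record a cryptographic checksum of each file when it is first acquired, and to re-check those checksums later. This is a minimal sketch using only the Python standard library; the manifest format and the function names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or whose contents have changed."""
    current = build_manifest(folder)
    return [name for name, digest in manifest.items() if current.get(name) != digest]


# Small self-contained demo using a temporary folder of "original" files.
with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp)
    (originals / "survey.csv").write_text("id,response\n1,yes\n")
    manifest = build_manifest(originals)
    unchanged = changed_files(originals, manifest)
    (originals / "survey.csv").write_text("id,response\n1,no\n")
    after_edit = changed_files(originals, manifest)

print(unchanged)   # []
print(after_edit)  # ['survey.csv']
```

The manifest itself should be stored separately from the working copy, so that an accidental edit to the data cannot also overwrite the record of what the originals looked like.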

5. Clean and transform the data

  • Am I sure that I understand the context and the structure of the data?
    • Will anything get lost or obscured?
  • Is the data to be shared with other parties? (Do I need to document it?)
  • Where will I store the data for further use? (See 3.)

6. Verify Data

  • Is there a way to cross check the data?
  • Do you trust / understand the methodology of the data collection? (particularly if you are working with data collected by others)
    • Do you know who collected the data?
    • Do you know how they collected it and what assumptions they were making?
  • Are you an expert in the subject matter or do you need help with verification?
    • Are the subjects of the data research contributing to verification?
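Where a second, independent source exists, one way to cross-check is to match the two datasets on a shared key and list every record where they disagree or where one source has no match. A minimal Python sketch; the district names, figures and the 5% tolerance are hypothetical.

```python
def cross_check(
    primary: dict[str, float],
    reference: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, str]:
    """Compare values keyed by the same identifier in two sources.

    Returns a mapping from key to a short description of each problem found:
    a key present in only one source, or values differing by more than
    `tolerance` (as a fraction of the reference value).
    """
    problems: dict[str, str] = {}
    for key in sorted(primary.keys() | reference.keys()):
        if key not in reference:
            problems[key] = "missing from reference"
        elif key not in primary:
            problems[key] = "missing from primary"
        else:
            ref = reference[key]
            diff = abs(primary[key] - ref)
            # Relative difference; fall back to the absolute difference when the reference is zero.
            relative = diff / abs(ref) if ref else diff
            if relative > tolerance:
                problems[key] = f"differs by {relative:.0%}"
    return problems


# Hypothetical figures: our collected counts vs. a published reference.
ours = {"North": 1200, "South": 950, "East": 400}
published = {"North": 1180, "South": 1300, "West": 700}

for district, issue in cross_check(ours, published).items():
    print(district, issue)
```

A tolerance is needed because independent sources rarely agree exactly; what counts as a meaningful disagreement depends on the data and is worth deciding with subject-matter experts.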

7. Data Analysis

  • Are you a subject matter expert or data scientist?
  • Are you trying to prove your thesis, or are you looking at your data objectively?
  • Can you find a way to look at data from a completely new perspective?
  • What type of analysis are you engaging in:
    • descriptive
    • exploratory
    • predictive
    • prescriptive
  • Has the community been consulted in your analysis?
  • Do the results show correlation or cause and effect?
    • How do you know that?

Data After Life

Meta Zone

DISCLAIMER: This pipeline may be reductive, and it may be more useful in some contexts than others. For cases where we are not yet sure whether the methodology will work, or for which it was not designed, we have created a parking lot below so that we can work out later whether the methodology needs tweaking.

Free Parking

The following items were 'parking lotted' because we have not yet tested whether this pipeline would catch them. This section also includes issues with the pipeline that people flagged during the review stage, which we are still working through. Please add yours here!

Categories of projects not tested
  • 'Big Data Projects' (yes, we know Big Data is not a thing - but the question was raised)
  • [Add yours here]
Outstanding issues to address with the pipeline
  • Where are the loops in the timeline? Projects are not so linear...
  • Concerning store, extract, clean:
    • You could be storing data as you collect it, and cleaning might come before extracting
    • Storing analogue & digital materials - how does that affect the timeline?
    • What about storing raw and cleaned data in different locations, such as a preservation copy vs. an accessible copy (or data to be released)?


This is for the implementers of a data project. (We considered breaking down specific roles and defining who should ask which question when, but ran out of time.)

Next steps

  • Define who should be asking each question
  • Define whether there are any checkpoints or gates for projects run by NGOs, volunteers or lone wolves, and for projects with or without funders.


Food for thought

  • concepts, problems
  • questions to ask frequently
  • preventions: what do you actually do in concrete terms to prevent these things from happening
  • reactions: responsible responses for when things go wrong

Resources (we <3 links!)

Feel free to link any and all background material, additional info, useful resources, etc. The more the merrier!