People — not Data — drive Data for Good

Editor’s Note: This article originally appeared on the Cloudera Foundation website. In April 2021, the Cloudera Foundation merged with the Patrick J. McGovern Foundation.

by Claudia Juech
May 16, 2019

At last week’s Data Science for Social Good Workshop, the speakers shared some great examples of how expanded availability of data and improved analytical capabilities have helped make progress on issues ranging from addressing the spread of disease and earthquake recovery to countering fake news. Leo Ferres suggested how shopping malls could be used to foster social inclusion based on an example from Santiago de Chile. There, the team had analysed close to 400,000 phone records of mall customers to understand whether malls with higher social mixing attract more people. Ting Wei-Lin spoke on how the maturity of sensor technology has opened up the opportunity for community-based air quality monitoring. The AirBox project has encouraged citizens to install more than 4,000 sensors to measure fine particulate matter, an air pollutant that at high levels can cause respiratory discomfort and diseases.

The value new data sources and data science approaches have to offer to the development sector is indisputable. Much is still needed to realize that value, such as the ability to enforce data ethics that protect the rights of vulnerable populations, and improved access to proprietary private sector data for nonprofits and development actors—a challenge that ‘data collaboratives’ seek to address.

However, what the day’s discussions brought home for me was that while good data science and access to diverse data sets are important, using data for impact requires more. It requires focusing on solving data questions that are core to the complex problems we want to tackle. Without clearly defining the problem upfront, data science becomes a solution in search of a problem.

While there are many methodologically and technically exciting applications, we must always ask whether the use of data adds critical value. For example, does the use of Facebook photos to help identify snake species make a real difference in reducing the number of deaths from snakebites? To what extent does the survival of someone bitten in a rural village depend on knowing the type of snake instead of the distance to the nearest doctor or the availability of antivenom? These types of questions won’t always be easy to answer (and I’m curious to hear where you fall on the snakebite example).

It also requires the improved capacity of organizations, governments, and people to make decisions informed by data (i.e., who are the users, the decision makers, the most vulnerable people, etc.). Building on the snakebite example from above, who benefits most from being able to access or add to that photo set? In rural areas, internet connections may be unstable or totally unavailable, so how do we ensure that doctors using this data are able to access it and translate it into action for their patients?

Finally, it requires the recognition that the barriers to a more advanced use of data (within ethical boundaries) are largely cultural. Nonprofit leaders need to think of the use of data for impact as a change management process that will fail without executive support, commitment from across the organization, and a deliberate end-to-end approach (from collection to decision and action).

Claudia Juech is the Vice President of the Data and Society program.