Empowering global insights with geospatial data
Editor’s Note: This article was originally published in AidData’s Blog: The First Tranche and the Cloudera Foundation website. In April 2021, the Cloudera Foundation merged with the Patrick J. McGovern Foundation and we are continuing to work with AidData.
by Soren Patterson, Parker Kim
Feb. 24, 2021
AidData’s new video series helps you get started with GeoQuery, our platform for creating customized geospatial datasets
Three years ago, an article ran in Forbes with an eye-catching headline: “Geospatial Analytics Will Eat the World, and You Won’t Even Know It.” The title was not hyperbole. Today, terabytes upon terabytes of data rich with location information are being unlocked from a dizzying array of sources, from satellites to surveys, and the value of that data, along with the promise of the challenges it could solve, is only growing. In 2012, geospatial applications for agriculture and construction generated more than $1.4 trillion in revenues and $1.6 trillion in cost savings in the US alone. Since then, those figures have more than doubled.
Knowing who is doing what, where, and when is powerful, which is why an increasingly diverse range of groups, including nonprofits and grassroots organizations in the Global South, are looking to use geospatial data to produce meaningful insights that may help solve some of the world’s toughest development challenges. But organizations without significant computing power or data science expertise often don’t have the capacity to collect and process their own geospatial data, let alone find the right external data amidst a firehose of information.
Our new video series on GeoQuery, AidData’s flagship platform for creating customized geospatial datasets, aims to bridge this gap. GeoQuery is a free, ground-breaking tool that provides effortless access to open-source geospatial data in a simple format. Behind the scenes, it uses a High-Performance Computing Cluster to process terabytes of data, allowing anyone to find and aggregate dozens of high-quality datasets into a single spreadsheet. Users can select from over 70 satellite, economic, health, conflict, and other datasets that span decades and are available for over 195 countries and territories.
AidData researchers first created GeoQuery as an internal tool to help us wrangle massive amounts of geospatial data for our impact evaluations of development programs that spanned continents. Realizing what a powerful public good it could be, we made GeoQuery free and open source in perpetuity. And thanks to generous funding from our partner, the Cloudera Foundation, and foundational investments from William & Mary, USAID, and the William & Flora Hewlett Foundation, we’ve been able to improve GeoQuery’s infrastructure, enhance accessibility, and make more datasets available.
Since launching in 2017, GeoQuery has fulfilled more than 20,000 requests for customized datasets from users at over 1,000 organizations. We’ve now produced a short video series that introduces GeoQuery and its features to new users. We hope to reach a wider range of users, from large international organizations to small grassroots nonprofits, who are curious about the potential of geospatial data but may not be familiar with how to use it or how it can help their work.
Why use geospatial data?
Geospatial data is everywhere; ubiquitous yet overlooked, it powers everything from the UberEats app you use to order pizza to the weather radar systems NASA uses to plan rocket launches. The “great enabler,” geospatial data (simply data that has location information associated with it, like an address or coordinates) unlocks insights that would otherwise remain hidden. Placing data on a map reveals important relationships between measured observations and geographic features such as cities, roads, and water. What’s more, statistical analyses of these spatial relationships can help find patterns in events that seem random to the human eye.
Many organizations do not currently use geospatial data but could greatly benefit from it, especially nonprofits that could use it to better understand who might benefit from their services or which services need to be adjusted. Geospatial data enables these groups to explore the relationships between their projects and impacts on the ground or identify precisely where help is most needed—ultimately making better, more evidence-based decisions.
In the video below, we explore some specific use cases of how international development organizations have derived insights from the kinds of granular geospatial data that users can get from GeoQuery. One example is how the Institute for Health Metrics and Evaluation (IHME) combined disparate data sources (including health surveys, agricultural and weather reports, satellite imagery of light pollution, and conflict reports) into a single measure to track the progress of African countries on the World Health Organization’s global nutritional targets for 2025. With this data, a country like Kenya, which had very uneven progress subnationally, was able to pinpoint exactly where child malnutrition was worst and adjust its focus to areas most in need.
How does GeoQuery help unlock the potential of geospatial data?
GeoQuery saves time and lowers the barrier to working with geospatial data for first-time users by handling the processing of large-scale geospatial datasets. Users choose fully customizable geographic boundaries (administrative zones, urban areas, etc.), and GeoQuery then uses supercomputing capabilities to aggregate massive amounts of geospatial data to those boundaries and compute zonal statistics (measurements like the sum, average, minimum, and maximum values). This process would otherwise require custom code or GIS software, as well as computing power that far exceeds that of the average laptop.
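To make the idea of zonal statistics concrete, here is a minimal sketch in Python. It is not GeoQuery's implementation. It assumes the data has already been gridded and that each grid cell has been labeled with the boundary it falls in; the function name, the rainfall values, and the district labels are all made up for illustration.

```python
from collections import defaultdict


def zonal_statistics(values, zones, nodata=None):
    """Summarize gridded values by zone.

    `values` and `zones` are equally sized 2D lists. zones[r][c] is the ID of
    the boundary that cell (r, c) falls in, or None if the cell lies outside
    every boundary. Cells equal to `nodata` are skipped.
    """
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue
            cells[zone].append(value)

    return {
        zone: {
            "sum": sum(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for zone, vals in cells.items()
    }


# Hypothetical 3x4 rainfall grid (mm); -1 marks a missing measurement.
rainfall = [
    [10, 12, 30, 34],
    [11, 13, 32, 36],
    [9, 14, -1, 38],
]
# Hypothetical zone labels: which district each cell belongs to.
districts = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", None],
]

stats = zonal_statistics(rainfall, districts, nodata=-1)
for zone, summary in sorted(stats.items()):
    print(zone, summary)
```

On real data the grid may hold billions of cells and the boundaries are irregular polygons, which is where the custom code, GIS software, and computing power come in.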
Downloading data from GeoQuery is simple, as shown in the step-by-step tutorial below. First, you select a country and then a geographic boundary type. GeoQuery contains state-level, district-level and municipality-level boundaries for almost every country in the world, as well as boundaries for 200 major cities. These are provided through geoBoundaries, which is the world’s largest open-source, free, and research-ready database of political administrative boundaries. A project of geoLab, the geoBoundaries dataset currently tracks 300,000 boundaries across over 195 countries and territories.
Once you choose a geographic boundary, you can then select from a list of available datasets. You can add multiple datasets and variables of interest, all in one request. The result is a simple spreadsheet, where each column is a variable (e.g., mean rainfall in 2010) and each row is a geographic area (e.g., districts in Kenya). With every request you make, you receive an email with an easy-to-use package containing your spreadsheet of data, documentation, and a permanent page you can use to access and share your data forever.
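Because the result is a plain spreadsheet, it can be explored with very little code. The sketch below assumes a CSV laid out as described above, with one row per geographic area and one column per variable. The column names and values are invented for illustration and will differ from what an actual GeoQuery request returns.

```python
import csv
import io

# Made-up extract: one row per area, one column per variable.
SAMPLE = """district,rainfall_2010_mean,nightlights_2010_mean
District A,1350.2,4.1
District B,210.5,0.3
District C,925.0,38.7
"""


def load_rows(text, id_column="district"):
    """Parse CSV text into dicts, converting every non-ID column to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (val if key == id_column else float(val)) for key, val in row.items()}
        for row in reader
    ]


def lowest(rows, column):
    """Return the row with the smallest value in `column`."""
    return min(rows, key=lambda row: row[column])


rows = load_rows(SAMPLE)
driest_district = lowest(rows, "rainfall_2010_mean")["district"]
print(driest_district)
```

For a downloaded file, the same functions would apply after reading the file's contents into a string; the same spreadsheet can equally be opened in Excel or any statistics package.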
Going forward, we’ll continue to release content exploring how those new to geospatial data can incorporate GeoQuery into their workflows and use the data from GeoQuery to achieve their goals. We’ll also be hosting a workshop at NYC OpenDataWeek on March 9th at 5:30pm EST that will dive deeper into innovative sources and applications of open geospatial data at a global scale and how GeoQuery can be used to visualize data and produce meaningful insights. The event will be recorded and made available online, and you can register at: https://rebrand.ly/odw21_aiddata_event.
Soren Patterson is AidData’s Communications Associate, working to increase the lab’s reach and visibility by publicizing and promoting its unique activities. He writes for and edits AidData’s blog, The First Tranche, and manages its social media and web presence. He holds a BA in Government with a secondary in East Asian Studies from Harvard University.
Parker Kim is a Communications Associate at AidData. They design and create innovative graphic assets in support of the Policy Analysis Unit’s research, and provide design support for the unit’s publications. They also work to create cutting-edge design for AidData’s website.