Recent years have witnessed considerable enthusiasm over the opportunities offered by open data. Across sectors, it is widely believed today that we are entering a new era of information openness and transparency, and that this has the potential to spur economic innovation, social transformation, and fresh forms of political and government accountability. To take economic impact alone, in 2013 the consulting firm McKinsey estimated the possible global value of open data to be more than $3 trillion per year.1 A study commissioned by Omidyar Network has likewise calculated that open data could result in an extra $13 trillion over five years in the output of G20 nations.2
Yet despite the evident potential of open data, and despite the growing amounts of information being released by governments and corporations, little is actually known about its use and impact. What kind of social and economic transformations has open data brought about, and what transformations might it effect in the future? How—and under what circumstances—has it been most effective? How have open-data practitioners mitigated risks (e.g., to privacy) while maximizing social good?
As long as such questions remain unanswered, the field risks suffering from a mismatch between the supply (or availability) of data and its actual demand (and subsequent use). This mismatch limits the impact of open data and inhibits its ability to produce social, economic, political, cultural, and environmental change. This report begins from the premise that in order to fully grasp the opportunities offered by open data, a fuller and more nuanced understanding of its workings is necessary.
Our knowledge of how and when open data actually works in practice is lacking because there have been so few systematic studies of its actual effect and workings. The field is dominated by conjectural estimates of open data’s hypothetical influence; those attempts that have been made to study concrete, real-world examples are often anecdotal or suffer from a paucity of information. In this report, we seek to build a more systematic study of open data and its effect by rigorously examining 19 case studies from around the world. These case studies are chosen for their geographic and sectoral representativeness. They are built not simply from secondary sources (e.g., by rehashing news reports) but from extensive interviews with key actors and protagonists who possess valuable and thus far untapped on-the-ground knowledge. They go beyond the descriptive (what happened) to the explanatory (why it happened, and what is the wider relevance or impact).
To provide these explanations, we have assembled an analytical framework that applies across the 19 case studies and lets us present some more widely applicable principles for the use and impact of open data. Impact—a better understanding of how and when open data really works—is at the center of our research. Our framework seeks to establish a taxonomy of impact for open-data initiatives, outlining various dimensions (from improving government to creating economic opportunities) in which open data has been effective. In addition, the framework lays out some key conditions that enable impact, as well as some challenges faced by open-data projects.
It is useful to begin with an understanding of what we mean by open data. Like many technical terms, open data is a contested concept. There exists no single, universally accepted definition. The GovLab recently undertook an analysis of competing meanings, with a view to reaching a working definition. The Appendix contains nine widely used definitions and our matrix of analysis.
Based on this matrix, we reached the following working definition, which guides our research and discussion throughout this report:
Open data is publicly available data that can be universally and readily accessed, used, and redistributed free of charge. It is structured for usability and computability.
It is important to recognize that this is a somewhat idealized version of open data. In truth, few forms of data possess all the attributes included in this definition. The openness of data exists on a continuum, and although many forms of information we discuss here might not be strictly open in the sense just described, they can nonetheless be shareable, usable by third parties, and capable of effecting wide-scale transformation. The 19 case studies presented here therefore cover a variety of kinds of data, each of which is open in a different way, and to a different degree. Here are some examples:
Brazil’s Open Budget Transparency Portal is an example of the most “traditional” type of open-data project: a downloadable set of open government data accessible to the public.
Mexico’s Mejora Tu Escuela is the result of a nongovernmental organization compiling and presenting data (including open government data) in easily digestible forms.
The Global Positioning System (GPS) is arguably not an “open data” system at all, but rather a means for providing access to a government-operated signal.
The United Kingdom Ordnance Survey offers a combination of free and paid spatial data, suggesting the possibilities (and limitations) of a mixed model of open and closed data.
In each of these cases, “open” has different meanings and connotations. Many—but not all—of the cases, however, demonstrate the importance of shared and disseminated information, and highlight open data’s potential to enhance the social, economic, cultural, and political dimensions of our lives.
To select our case studies, we undertook a multistep process that involved several variables and considerations. To begin with, we examined existing repositories of open-data cases and examples in order to develop an initial universe of known open-data projects (see http://odimpact.org/resources.html). This initial scan allowed us to identify gaps in representation, that is, sectors or geographies that often remain underrepresented in existing descriptions of open data and its effect (or lack thereof). To fill in some of these gaps (and more generally widen our list of case study candidates), we also reached out to a number of experts in relevant subject areas, such as open data, open governance, civic technology, and other related fields. We also attended and conducted outreach at a number of open-data-related events, notably the 2015 International Open Data Conference in Ottawa, Canada and ConDatos in Santiago, Chile.
Based on this process, we identified a long list of approximately 50 case studies from around the world. These included examples from the private sector, civil society, and government, and spanned the spectrum of openness just mentioned. The next step was to conduct a certain amount of preliminary research to arrive at our final list of 19 case studies. To do this, we took into account several factors: the availability and type of evidence in existence; the need for sectoral and geographic representativeness; and the type of impact demonstrated by the case study in question (if any). We also considered whether previous, detailed case studies existed; as much as possible, our goal was to develop case studies for previously unexplored and undocumented examples.
Having selected our 19 cases, we then began a process of more in-depth research. This involved a combination of desk research (e.g., using existing media and other reports) and interviews (usually by telephone). For many of our examples, very little prior research existed; the bulk of our evidence, and certainly the most useful part, came from a series of in-depth interviews we conducted with key participants and observers who had been involved in our various cases.
Upon completing drafts of each case study, and in the spirit of openness that defines the field under examination, we open-sourced the peer review process for each case and this paper. Rather than sharing drafts only with a select group of experts, we made our report and each of the case studies openly accessible for review in the interest of gaining broad input on our findings and collaboratively producing a common resource on open data’s effects for the field. Through broad outreach at events like the 2015 Open Government Partnership Summit in Mexico City, Mexico, and through social media, more than 50 individuals from around the world signed up to peer review at least one piece.
During the month-long open-peer-review process, more than two dozen of those who signed up shared their input as Recognized Peer Reviewers through in-line comments and in-depth responses to the ideas and evidence presented in this report. Additionally, each element of the report was made openly accessible to the public, allowing anyone to share suggestions, clarifications, notes on potential inaccuracies and any other useful input prior to publishing. Much of this input was integrated into the final version of this report.
The standalone impact case studies (see Parts II through V) include detailed descriptions and analyses of the initiatives listed later in the report. In addition, the following table gives a brief summary of each example, including its main features and key findings:
Brazil: Open Budget Transparency Portal
Impact: Tackling corruption and transparency
Description: A tool that aims to increase fiscal transparency of the Brazilian Federal Government through open government budget data. As the quality and quantity of data on the portal have improved over the past decade, the Transparency Portal is now one of the country’s primary anti-corruption tools, registering an average of 900,000 unique visitors each month. Local governments throughout Brazil and three other Latin American countries have modeled similar financial transparency initiatives after Brazil’s Transparency Portal.
Sweden: Openaid.se
Sector: Philanthropy and aid
Impact: Tackling corruption and transparency
Description: A data hub created by the Swedish Ministry of Foreign Affairs and the Swedish International Development Cooperation Agency (Sida) built on open government data. The website visualizes when, to whom, and why aid funding was paid out and what the results were. The reforms are seen to be an important force for enhanced transparency and accountability in development cooperation at an international level and increased cooperation and involvement of more actors in Swedish development policy.
Slovakia: Open contracting projects
Sector: Public sector
Impact: Tackling corruption and transparency
Description: In January 2011, Slovakia introduced a regime of unprecedented openness, requiring that all documents related to public procurement (including receipts and contracts) be published online, and making the validity of public contracts contingent on their publication. More than two million contracts have now been posted online, and these reforms appear to have had a dramatic effect on both corruption and, equally important for the business climate, perceptions of corruption.
Indonesia: Kawal Pemilu
Sector: Politics and elections
Impact: Tackling corruption and transparency
Description: A platform launched in the immediate aftermath of the contentious 2014 Indonesian presidential elections. Kawal Pemilu’s organizers assembled a team of more than 700 volunteers to compare official vote tallies with the original tabulations from polling stations and to digitize the often handwritten forms, making the data more legible and accessible. Assembled in a mere two days, with a total budget of just $54, the platform enabled citizen participation in monitoring the election results, increased public trust in official tallies, and helped ease an important democratic transition.
Denmark: consolidation and sharing of address data
Sector: Geospatial services
Impact: Improving services
Description: In 2005, the Building and Dwelling Register of Denmark started to release its address data to the public free of charge. Prior to that date, each municipality charged a separate fee for access, rendering the data practically inaccessible. There were also significant discrepancies between the content held across different databases. A follow-up study commissioned by the Danish government estimated the direct financial benefits alone for the period 2005–2009 at €62 million, at a cost of only €2 million.
Canada: T3010 charity information return data
Sector: Philanthropy and aid
Impact: Improving services
Description: In 2013, the Charities Directorate of the Canada Revenue Agency (CRA) opened all T3010 Registered Charity Information Return data since 2000 via the government’s data portal under a commercial open-data license. The resulting data set has been used to explore the state of the nonprofit sector, improve advocacy by creating a common understanding between regulators and charities, and create intelligence products for donors, fundraisers and grant-makers.
Tanzania: Shule and Education Open Data Dashboard
Impact: Social mobilization
Description: Two recently established portals providing the public with more data on examination pass rates and other information related to school performance in Tanzania. Education Open Data Dashboard is a project established by the Tanzania Open Data Initiative; Shule was spearheaded by Arnold Minde, a programmer, entrepreneur, and open-data enthusiast. Despite the challenges posed by Tanzania’s low Internet penetration rates, these sites are slowly changing the way citizens access information and make decisions. They are encouraging citizens to demand greater accountability from their school system and public officials.
Kenya: Open Duka
Sector: Public sector
Impact: Informed decision-making
Description: A platform developed by the civil society organization the Open Institute that aims to address opacity in governance in the private and public sectors. It promotes corporate accountability and transparency by providing citizens, journalists, and civic activists with insight into the relationships and connections (and, to some extent, the dynamics) of those in and around the public arena. As a case study, it exemplifies the challenge open-data initiatives face in generating sufficient awareness and using the methods necessary to achieve impact.
Mexico: Mejora Tu Escuela
Impact: Informed decision-making
Description: A platform created by the Mexican Institute for Competitiveness (IMCO) that provides citizens with information about school performance. It helps parents choose the best option for their children, empowers them to demand higher-quality education, and gives them tools to get involved in their children’s schooling. It also provides school administrators, policymakers, and NGOs with data to identify hotbeds of corruption and areas requiring improvement. Data available on the site was used in a report that uncovered widespread corruption in the Mexican education system and stirred national outrage.
Uruguay: A Tu Servicio
Impact: Informed decision-making
Description: A platform that lets users select their location and then compare local health care providers based on a wide range of parameters and indicators, such as facility type, medical specialty, care goals, wait times and patient rights. A Tu Servicio has introduced a new paradigm of patient choice into Uruguay’s health care sector, not only enabling citizens to navigate a range of options but also generating a healthy and informed debate on how to improve the country’s health care sector more generally.
Great Britain’s Ordnance Survey
Sector: Geospatial services
Impact: Economic growth
Description: Data from Ordnance Survey (OS), Britain’s mapping agency, supports essentially any UK industry or activity that uses a map: urban planning, real estate development, environmental science, utilities, retail, and much more. OS is required to be self-financing and, despite the launch of its OS OpenData platform in 2010, uses a mixed-cost model, with some data open and some data paid. OS OpenData products are estimated to deliver a net increase in GDP of between £13 million and £28.5 million over the first five years.
United States: New York City Business Atlas
Impact: Economic growth
Description: Developed by the Mayor’s Office of Data Analytics (MODA), the Business Atlas is a platform designed to alleviate the market research information gap between small and large businesses in New York City. The tool provides small businesses with access to high-quality data on the economic conditions in a given neighborhood to help them decide where to establish a new business or expand an existing one.
US: NOAA: Opening up global weather data in collaboration with businesses
Impact: Economic growth
Description: Opening up weather data through NOAA has significantly lowered the economic and human costs of weather-related damage through forecasts; enabled the development of a multibillion dollar weather derivative financial industry dependent on seasonal data records; and catalyzed a growing million-dollar industry of tools and applications derived from NOAA’s real-time data.
US: Opening GPS data for civilian use
Sector: Geospatial services
Impact: Economic growth
Description: Over the past 20 years, Global Positioning System (GPS) technology has led to a proliferation of commercial applications across industries and sectors, including agriculture, construction, transportation, aerospace and, especially with the spread of portable devices, everyday life. Were the system to be somehow discontinued, losses would be an estimated $96 billion. In addition to creating new efficiencies and reducing operating costs, the adoption of GPS technology has improved safety, emergency response times and environmental quality, and has delivered many other less readily quantifiable benefits.
Sierra Leone: Battling Ebola
Impact: Data-driven engagement
Description: In 2014, the largest Ebola outbreak in history occurred in West Africa. At the beginning, information on Ebola cases and response efforts was dispersed across a diversity of data collectors, and there was little ability to get relevant data into the hands of those who could make use of it. Three projects—Sierra Leone’s National Ebola Response Centre (NERC), the United Nations’ Humanitarian Data Exchange (HDX), and the Ebola GeoNode—significantly improved the quality and accessibility of information used by humanitarians and policymakers working to address the crisis.
New Zealand: Christchurch earthquake GIS clusters
Sector: Emergency services
Impact: Data-driven engagement
Description: In February 2011, Christchurch was struck by a severe earthquake that killed 185 people and caused significant disruption and damage to large portions of a city already weakened by an earlier earthquake. In the response to the quake, volunteers and officials at the recovery agencies used open data, open source tools, trusted data sharing, and crowdsourcing to develop a range of products and services required to respond successfully to emerging conditions. These included a crowdsourced emergency information web app that generated 70,000 visits within the first 48 hours after the earthquake.
Singapore: Dengue cluster map
Impact: Data-driven engagement
Description: In 2005, the Singapore National Environment Agency (NEA) began sharing information on the location of dengue clusters as well as disease information and preventive measures online through a website now commonly known as the “Dengue Website.” Since then, the NEA’s data-driven cluster map has evolved, and it became an integral part of the campaign against a dengue epidemic in 2013.
US: Eightmaps
Sector: Politics and elections
Impact: Data-driven engagement
Description: A tool, launched anonymously in 2009, that provided detailed information on supporters of California’s Proposition 8, which sought to bar same-sex couples from marrying. The site collected information made public through state campaign finance disclosure laws and overlaid that information onto a Google map of the state. Users could find the names, approximate locations, amount donated, and, where available, employers of individuals who donated money to support Prop 8. Eightmaps demonstrates how the increased computability and reusability of open data could be acted upon in unexpected ways that not only create major privacy concerns for citizens, but could also lead to harassment and threats based on political disagreements.
US: Kennedy v. the City of Zanesville
Impact: Data-driven assessment
Description: For more than 50 years, even though access to clean water from the City of Zanesville, Ohio, was available throughout the rest of Muskingum County, residents of a predominantly African American area of Zanesville were only able to use contaminated rainwater or drive to the nearest water tower. One of the key pieces of evidence used during the court case was a map derived from open data that showed a significant correlation between the houses occupied by the white residents of Zanesville and the houses hooked up to the city water line. The case was decided in favor of the African American plaintiffs, who were awarded a $10.9 million settlement.
What lessons can we learn from these examples of open-data applications, platforms, and websites? In this and the following sections, we outline some overarching insights derived from our 19 case studies. First, we focus on impact. What is the effect of open data on people’s lives? What are the real, measurable, and tangible results of our case studies? And, just as important, who (which individuals, institutions, demographic groups) are most affected?
Determining impact requires taking certain nuances into account. In many cases, open-data projects show results in more than one dimension of impact. In addition, the effect of our case studies on people’s lives is often indirect (and thus somewhat more subtle), mediated by changes in the way decisions are made or other broad social, political, and economic factors. Nonetheless, despite these nuances, our analysis suggests that there exist four main ways in which open data is having an influence on people’s lives (Figure 1-1):
First, open data is improving government around the world. It is doing so in various ways, but in particular by (a) making governments more accountable, especially by helping tackle corruption and adding transparency to a host of government responsibilities and functions (notably budgeting), and (b) making government more efficient, especially by enhancing public services and resource allocation.
Improvements in governance are evident in six of our 19 case studies. Notable examples include the Brazil Open Budget Transparency Portal, which brings accountability and citizen oversight to the country’s budget processes; Slovakia’s Central Registry, which is a global model for the open-contracting movement; and Canada’s opening of tax return data submitted by charities, the first move in a broader global effort to increase the transparency and accountability of philanthropies.
Open data is empowering citizens to take control of their lives and demand change by enabling more informed decision-making and new forms of social mobilization, both in turn facilitated by new ways of communicating and accessing information.
This dimension of impact plays a role in four case studies. Some notable examples in this category include Uruguay’s A Tu Servicio, which empowers citizens to make more informed decisions about health care, and education dashboards in Mexico (Mejora Tu Escuela) and Tanzania (Shule and Education Open Data Dashboard), each of which enables parents to make more evidence-based decisions about their children’s schools.
Open data is creating new economic opportunities for citizens and organizations. Around the world, in cities and countries, greater transparency and more information are stimulating economic growth, opening up new sectors, and fostering innovation. In the process, open data is creating new jobs and new ways for citizens to prosper in the world.
This category of impact often follows from applications and platforms built using government data. It is evident in four of our case studies, each of which relies for its underlying data on information released by governments. Two notable examples are New York’s Business Atlas, which lets small businesses use data to identify the best neighborhoods in which to open or grow their companies; and the various platforms and companies built around data released by the National Oceanic and Atmospheric Administration (NOAA) in the US.
Finally, open data’s effect is evident in the way it is helping solve several big public problems, many of which have until recently seemed intractable. Although most of these problems have not been entirely solved or eliminated, we are finally seeing pathways to improvements. Through open data, citizens and policymakers can analyze societal problems in new ways and engage in new forms of data-driven assessment and engagement.
Open data has created notable impacts during public-health crises and other emergencies. In Sierra Leone, open data helped to inform the actions of people working on the ground to fight Ebola. The government and citizens of Singapore are using a Dengue Fever Cluster Map to try to limit the spread of dengue fever during outbreaks like that experienced in 2013. The efforts to rebuild following devastating earthquakes in Christchurch, New Zealand were also aided by open data. It is important to recognize, however, that attempts to solve problems can also have unintended consequences. We see this, for example, in the case of Eightmaps, where efforts to address discrimination and other issues unintentionally created new privacy (and even personal security) problems.
Although our initial analysis told us what types of change open data was creating, a further round of analysis was required to understand how change comes about. In examining open-data projects around the world, we are struck by the wide variability in outcomes. Some work better than others, and some simply fail. Eightmaps is an example of how open data can lead to unintended consequences, but there are many more examples that the GovLab did not select for this group of case studies because they have shown no meaningful, measurable effect to date. Some projects do well in a particular dimension of success while failing in others. If we are to realize the potential widely attributed to open data and scale the impact of the individual case studies included here, we need a better, more granular understanding of the enabling conditions that lead to success.
Based on our research, we identified four key enabling conditions, each of which allows us to articulate a specific “premise” for success:
The power of collaboration was evident in many of the most successful open-data projects we studied. Effective projects were built not from the efforts of a single organization or government agency, but rather from partnerships across sectors and sometimes borders. Two forms of collaboration were particularly important: partnerships with civil society groups, which often played an important role in mobilizing and educating citizens; and partnerships with the media, which informed citizens and also played an invaluable role in analyzing and finding meaning in raw open data. In addition, we saw an important role played by so-called “data collaboratives,” which pooled data from different organizations and sectors.
Virtually all the case studies we examined were the products of some form of partnership. Uruguay’s A Tu Servicio was an important example of how civil society can work with government to craft more effective open-data initiatives. NOAA’s many offshoots and data initiatives are an equally important example of collaboration between the private and public sectors. New York City’s Business Atlas was similarly an illustration of a public-private partnership; its data set, built both from government and private-sector information (supplied by the company Placemeter), is an example of an effective data collaborative.
Premise 1: Intermediaries and data collaboratives allow for enhanced matching of supply and demand of data.
Several of the most effective projects we studied emerged on the back of what we might think of as an open-data public infrastructure; that is, the technical backend and organizational processes necessary to enable the regular release of potentially impactful data to the public. In some cases, this infrastructure takes the form of an “open by default” system of government data generation and release. The team behind Kenya’s Open Duka, for example, is responding to the project’s limited impact to date by working with county-level governments to build such an infrastructure, strengthening the counties’ internal data capacity and, as a result, improving the data available on Open Duka.
An open-data public infrastructure does not, however, only involve technical competencies. As part of the push around Brazil’s Open Budget Transparency Portal, for example, organizers not only developed an interoperable infrastructure for publishing a wide variety of data formats, but also launched a culture-building campaign complete with workshops seeking to train public officials, citizens and reporters to create value from the open data.
Premise 2: Developing open data as a public infrastructure enables a broader impact across issues and sectors.
Another key determinant in the success of open-data projects is the existence of clear open data policies, including well-defined performance metrics. The need for clear policies (and more generally an enabling regulatory framework) is a reminder that technology does not exist in a vacuum. Policymakers and political leaders have an essential role to play in creating a flexible, forward-looking legal environment that, among other things, encourages the release of open data and technical innovation; and that spurs the creation of fora and mechanisms for project assessment and accountability.
High-level political buy-in is also critical. It is not sufficient simply to pass enabling laws that look good on paper. Policymakers and politicians must also ensure that the letter of the law is followed, that vested interests are adequately combated, and that there are consequences for working against openness and transparency.
Among the many case studies that benefited from a conducive policy environment, a few stand out. In Mexico, we can see how an open-data initiative (in this case, the Mejora Tu Escuela project) can benefit from high-level government commitments to opening data that trickle down to, and empower, local and regional governments. Slovakia’s Central Registry is another good example; it shows how laws can be redesigned, in this case to encourage transparency by default in contracting, and in the process greatly increase openness. The openness of GPS, though ingrained in daily life for many, was the subject of questions following the terrorist attacks of September 11, 2001; those questions were put to rest with the enactment of a new policy commitment in 2004 to maintain unfettered global access to the geospatial system.
Premise 3: Clear policies regarding open data, including those promoting regular assessments of open-data projects, provide the necessary conditions for success.
We have repeatedly seen how the most successful open-data projects are those that address a well-defined problem or issue. It is very challenging for open-data projects to try to change user behavior or convince citizens of a previously unfelt need. Effective projects identify an existing—ideally widely recognized—need, and provide new solutions or efficiencies to address that need.
Singapore’s Dengue Fever Cluster Map is a good example in this regard. Its core area of activity (public health) has clear, tangible benefits; it seeks to limit the spread of an illness that policymakers widely recognize as a problem, and that citizens dread. Uruguay’s A Tu Servicio is another good example: it provides clear, tangible benefits to citizens, giving them the means to take action that improves their health care. It is perhaps no coincidence that both of these examples are in the health sector: The most successful projects often touch on the most basic human needs (health, pocketbook needs, etc.). In a case involving one of the most essential human needs, the use of open data in Kennedy v. the City of Zanesville accomplished its singular goal: demonstrating beyond a reasonable doubt that water access decisions were being made on the basis of citizens’ race.
Premise 4: Open data initiatives that have a clear target or problem definition have more effect.
The success of a project is also determined by the obstacles and challenges it confronts. The challenges are themselves a function of numerous social, economic, and political variables. In addition, some regions might face more obstacles than others.
As with the enabling conditions, we found widespread geographic and sectoral variability in our analysis of challenges. Broadly, we identified four challenges that recurred most frequently across our 19 case studies:
Perhaps unsurprisingly, countries or regions with overall low technical and human capacity or readiness often posed inhospitable environments for open-data projects. The lack of technical capacity could be indicated by several variables, including low Internet penetration rates, a wide digital divide, or overall poor technical literacy. Technical readiness can also be indicated by the existence of a group of individuals or entities that are technically sophisticated and that believe in the transformative potential of technology, particularly of open data. Repeatedly, we have seen that such “data champions” or “technological evangelists” play a critical role in ensuring the success of projects.
Low technical capacity did not necessarily result in outright project “failures.” Rather, it often stunted the potential of projects, making them less impactful and successful than they could otherwise have been. In Tanzania, for instance, the Shule and Education Open Data Dashboard portals were limited by low Internet penetration rates and by a general low awareness about open data. Slovakia’s Central Registry was in many ways very successful; yet it, too, was restricted by a lack of technical capacity among government officials and others (particularly at the lower level). In these projects and others, we see that success is relative, and that even the most successful projects could be enhanced by greater attention to the overall technical environment or ecosystem.
Premise 5: The lack of readiness or capacity at both the supply and demand side of open data hampers its impact.
Success is also limited when projects are unresponsive to feedback and user needs. As we saw in the previous section, the most successful projects address a clear and well-defined need. A corollary to this finding is that project sponsors and administrators need to be attuned to user needs; they need to be flexible enough to recognize and adapt to what their users want.
For Sweden’s OpenAid project, for example, user experience was not a core priority at launch, and much of the information found on the site is too complex for most citizens to digest. Despite this high barrier to entry, the site offers only limited engagement opportunities: namely, a button for reporting bugs on the site. Moreover, project titles found on the site often contain cryptic terms interpretable only to those with close familiarity with the project at hand.
NOAA, on the other hand, has some of the most mature and wide-reaching open-data efforts of any of the cases studied here. But given that breadth, the agency needs to place greater focus on customer analytics and user behaviors if its essential information is to remain useful to the evolving needs of its users. The UK’s Ordnance Survey has very sophisticated user analytics and prioritizes customer satisfaction; however, the separation of OS OpenData from its other data sets and products is potentially limiting.
Premise 6: Open data could be significantly more impactful if its release would be complemented with a responsiveness to act upon insights generated.
A major challenge arises from the trade-offs between the potential of open data and the risks posed by privacy and security violations. These risks are inherent to any open-data project—by its very nature, greater transparency exists in tension with privacy and security. When an initiative fails to take steps to mitigate this tension, it risks not only harming its own prospects, but more broadly the reputation of open data in general.
Concerns about privacy and security dogged many of the projects we studied. In Brazil, more than 100 legal actions were brought against the Open Budget Transparency Portal when it inadvertently published the salaries of public servants. In New York, despite steps being taken to mitigate such harms, there has been concern that citizen privacy might be violated as cameras collect data for the Business Atlas project in public spaces.
Without question, the clearest example of open data leading to privacy concerns (and even outright violations) can be found in the Eightmaps case study. Eightmaps used data released under public campaign finance disclosure laws to publish identifying information, including home addresses, for donors to California’s Proposition 8, leading to instances of intimidation and harassment.
For all the very real—and legitimate—concerns, our case studies also show that the scope for privacy and security abuses can be mitigated. For example, NOAA stood out for its creation of a dedicated Cyber Security Division to address data security challenges when collecting and releasing data (the sole instance of such a dedicated division in our 19 case studies). Singapore, too, took proactive steps to anonymize data to protect the privacy of citizens. Addressing risks to privacy and security, though important, can also work against the goals of openness and transparency. For example, in the city of Zanesville, Ohio, security concerns have been raised (controversially) to restrict access to data that has proven essential in addressing decades-old civil rights violations. Such examples are an important reminder of the tensions that exist between openness and security/privacy, and of the need for careful, judicious policymaking to achieve a balance.
Premise 7: Open data does pose a certain set of risks, notably to privacy and security; a greater, more nuanced understanding of these risks will be necessary to address and mitigate them.
Finally, we found that inadequate resource allocation was one of the most common reasons for limited success or outright failure. Many of the projects we studied were “hackable”—that is, easily put together on a very limited budget, often created by idealistic volunteers. Indonesia’s Kawal Pemilu, for example, was assembled with a mere $54. Over time, though, projects require resources to succeed; although they might emerge on the backs of committed (and cheap) idealists, they are fleshed out and developed with real financial backing.
The limited success of Kenya’s Open Duka is a good example. Although the project was well conceived and based on a sound premise, it has been limited by the unanticipated effort involved in data collection. More resources would almost certainly have helped address this challenge. In addition, Mexico’s Mejora Tu Escuela is just one project that relies on foundation funding to operate, leading to some level of uncertainty about the long-term sustainability of such projects should any of those funding streams be discontinued. The UK’s Ordnance Survey, meanwhile, is required to be self-financing, forcing the agency to rely heavily on private sector customers paying to access the more sophisticated data products not included in OS OpenData.
Even an initiative as central and widely used as GPS experiences funding challenges. In a government climate focused on budget cuts at every corner, new features and capabilities, even for a “global public utility,” can be difficult to finance through public money.
Premise 8: Even though open-data projects can often be launched cheaply, those projects that receive generous, sustained, and committed funding have a better chance of success over the medium and long term.
The following is a compilation of our eight premises:
Intermediaries and data collaboratives allow for enhanced matching of supply and demand of data.
Developing open data as a public infrastructure enables a broader impact across issues and sectors.
Clear policies regarding open data, including those promoting regular assessments of open-data projects, provide the necessary conditions for success.
Open data initiatives that have a clear target or problem definition have more effect.
The lack of readiness or capacity at both the supply and demand side of open data hampers its impact.
Open data could be significantly more impactful if its release would be complemented with a responsiveness to act upon insights generated.
Open data does pose a certain set of risks, notably to privacy and security; a greater, more nuanced understanding of these risks will be necessary to address and mitigate them.
Even though open-data projects can often be launched cheaply, those projects that receive generous, sustained, and committed funding have a better chance of success over the medium and long term.
Our case studies clearly indicate the tremendous potential and possibilities offered by open data. Around the world, open data has improved governments, empowered citizens, contributed solutions to complex public problems, and created new economic opportunities for companies, individuals, and nations.
But despite this clear potential, the hurdles are also apparent. We outlined several of the particular issues faced by open-data projects in the preceding sections. In addition to these specific challenges, there is the more general problem of scaling: How do we move beyond a “points of light” narrative that celebrates individual case studies to a broader narrative about the social, economic, and political transformation that could result from a far broader deployment of open data? In this section, we outline 10 steps or recommendations for policymakers, advocates, users, funders, and other stakeholders in the open-data community that we believe could usher in such wholesale transformation (Figure 1-2). For each step, we describe a few concrete methods of implementation—ways to translate the broader recommendation into meaningful impact.
Together, these 10 recommendations and their means of implementation amount to a Next Generation Open Data Roadmap. They let us better understand how the potential of open data can be fulfilled, across geographies, sectors, and demographics.
A core premise offered by our case studies is that the impact of open data is often dependent on how well the problem it seeks to address is defined and understood. It is therefore essential for open-data advocates and practitioners to clearly define their goals, the problem they are seeking to address, and the steps they plan to take. Here are some possibilities for how this focus can be achieved:
Set up a crowdsourced “Problem Inventory” to which users can contribute specific questions and answers, both of which can help define open-data projects. The UK Ordnance Survey’s GeoVation Hub is an interesting model focusing on the latter. It poses very specific questions (e.g., “How can we improve transport?” and “How can we feed Britain?”) for users to answer using OS OpenData.
Facilitate user-led design exercises to help define important public and social problems and how open data can help solve them.
To guide such exercises, it can be useful to establish “Problem and Data Definition toolkits”—potentially modeled on and informed by Freedom of Information requests—that help formulate clearly defined public issues and connect them with potentially useful open-data streams.
Large public problems are by definition cross-sectoral and interdisciplinary. They defy boundaries and require a variety of expertise, knowledge, and data to be successfully addressed. It therefore stands to reason that the most successful open-data projects will similarly be collaborative and work across sectors and disciplines. Working in a collaborative manner can help draw on a diverse pool of talent and can also lead to innovative, out-of-the-box solutions. Perhaps most important, by allowing data users and data suppliers to work together and interact, collaborative approaches can improve the match between data demand and supply, thus enhancing the overall efficiency of the demand-use-impact value chain for open data.
Here are some pathways to achieving the required collaborative and cross-sectoral approaches:
Create data collaboratives to improve the efficiency and effectiveness of the demand-use-impact cycle. The value of data collaboratives is clearly illustrated by New Zealand’s Canterbury Earthquake Recovery Authority’s data sharing with construction companies, which is projected to deliver NZ$40 million in savings. NOAA’s Big Data Partnership, which formalized a partnership with five leading private-sector data and cloud technology companies, is another good example.
Engage and nurture data intermediaries, especially from civil society, to help spread awareness and disseminate data (and their findings) more widely. Data intermediaries play a particularly important role in countries with low technical capacity (e.g., as is evident in our Tanzanian case study); they offer a vital link between technology and society, helping citizens maximize and make real, effective use of data in their everyday lives.
Too often, policymakers and decision-makers focus solely on opening up data, as if open data on its own provides a silver bullet for a society’s problems. In fact, as repeatedly evidenced in our case studies, data in its raw form needs to be supplemented by a host of other commitments: sustained and sustainable funding, skills training among those charged with data collection and use, and effective governance structures for every step of the data collection and use cycle. Approaching data in this broader, more holistic way means treating it as a vital form of public infrastructure: one that is at the heart of a society or nation, essential for its success, and embedded within wider social, economic, and political structures.
There are several steps policymakers can take to advance a “data-as-infrastructure” approach, including the following:
Developing a systems design and mapping methodology. Mapping the public and private sector data infrastructure as well as local, national, and global data infrastructures that can affect the value creation of open data is a first and necessary step to approach data as infrastructure. A systems map could enable the more targeted, coordinated, and collaborative development of open-data technical standards and best practices across sectors.
Embracing and implementing the Open Data Charter,3 which seeks to “foster greater coherence and collaboration” around open-data standards, practices, and, in particular, the following principles:
Open by default
Timely and comprehensive
Accessible and usable
Comparable and interoperable
Developed for improved governance and citizen engagement
Designed for inclusive development and innovation
Taking advantage of existing public infrastructure, such as libraries, schools, and other cultural and education institutions, so that data is more firmly embedded into other forms of public investment and public life. Open Referral, for example, is creating a data backend for the social safety net, allowing pilot partners, including libraries, to tap into a wide, interconnected range of potentially impactful data on civic and social services.
Developing skills and capacity around data collection, cleaning, and standardization to ensure better quality data is being released. This is especially important within agencies and organizations releasing data (to ensure its quality), but also, to the extent possible, within the community of users.
Viewing and treating open data as a public good, something to which citizens and taxpayers are entitled. Moving toward a view of open data as a public good requires as much of a cultural change as a policy change. As our case studies have repeatedly shown, the success of open data initiatives depends crucially on government stakeholders accepting that citizens, whether they be researchers, journalists, or just average individuals, have a right to demand access to government data.
Our research illustrates the vital enabling role played by a national legal and regulatory framework that supports open data. Well-articulated internal rules and priorities are equally important when the releasing entity is a company or other organization. In both cases, clarity is essential: open data thrives when there is an unambiguous commitment to its cause. Importantly, open-data policies should include provisions to measure the success (or otherwise) of an initiative; systems for measurement and assessment are vital to ensuring accountability.
There are several steps policymakers can take to ensure the necessary clarity of open data policies. These include the following:
Cocreating open-data policies with citizen and other groups, which can be an important way not only of drafting inclusive (and thus more legitimate) policies, but also of ensuring that policies are responsive to actual conditions and needs. Our research repeatedly shows that policies drafted without adequate public input and participation are less effective than those that draw on a wider range of experiences and expertise. Of course, attention must be paid to knowledge and power asymmetries involved in such cocreation processes.
Engaging the public in defining and monitoring metrics of success: citizen participation in measuring the results of open-data initiatives is as important as in drafting policies, and for the same reasons. It is a vital part of ensuring accountability and enhancing the legitimacy and effectiveness of open-data projects.
Creating a “Metrics Bank” of important indicators, with input from stakeholders, researchers, and experts in the field. Such a Metrics Bank could be built around the variety of categories of open data’s effects, such as economic concerns (like return on investment or private sector economic revenues generated), public problem solutions (lives saved, increases in the efficiency of service delivery), and others. In line with the previous suggestion, the Metrics Bank should be reviewed on a regular basis by a citizens’ group or panel created specifically for that purpose.
Repeatedly, we have seen how open data initiatives are limited by a lack of capacity and preparedness among those who could potentially benefit most. Often, this manifests quite simply as a lack of awareness: those who do not know about the potential of open data are likely to use it less and benefit less from it. It is important to recognize that low capacity is a problem on both the demand side and the supply side of the open-data value chain: policymakers and those tasked with releasing data are often as unprepared as intended beneficiaries.
Several steps can be taken to increase capacity and preparedness:
Set up coaching and training centers to teach policymakers and key stakeholders among citizens about the potential benefits and applications of open data. Brazil’s Open Budget Transparency Portal, for instance, benefited tremendously from TV campaigns and regular workshops designed to train citizens, reporters, and public officials on how to use it. In addition, a combined overview or searchable directory of coaching opportunities already in place and provided by, for instance, the GovLab Academy and the Open Data Institute, could enable easier navigation and matching of interests and needs worldwide.
Establish mentor and expert networks for those seeking to use open data. Such networks can serve as valuable resources, providing guidance on the optimal uses of open data and helping citizens and policymakers overcome hurdles or navigate obstacles.
Invest in and promote user-friendly data tools such as data visualizations and other analytic tools. Raw data can often be overwhelming for novice users; platforms and apps that include analytics and visualizations are often far more accessible. Notable examples from our case studies include the UK Ordnance Survey’s OS OpenMap, NYC’s Business Atlas, and Mexico’s Mejora Tu Escuela.
Use online and offline meet-ups and similar tools to create a culture that encourages knowledge sharing and collaboration. Many off-the-shelf tools already exist. If they are integrated within open-data initiatives or data labs (like the Justice Data Lab in the United Kingdom), they can provide a helpful online supplement to the types of training efforts and expert-mentor networks mentioned above.
As our case studies have shown, open data can be a force for good, but it is not without risks. Two of the most important risks involve potential violations of privacy and security that can result from widespread releases of data. Such risks were apparent in a number of our case studies, notably Eightmaps, Brazil’s Open Budget Transparency Portal, and New York’s Business Atlas. Mitigating such risks is essential not only for its inherent value, but also because privacy and security violations undermine trust in open data and, over the long run, limit its potential.
Several steps can be taken to mitigate risks:
Develop data governance “decision trees” to help decision-makers track the potential risks and opportunities around certain types of data releases. These decision trees can also help weigh the pros and cons and relative risks of data releases.
Create innovative, collaborative open-data risk-management frameworks so that governments and other institutions releasing data can draw on a clear, structured, step-by-step process to respond strategically to breaches of privacy or security and to other risks. NOAA, for example, is working with outside experts to crowdsource new frameworks for data management.
Involve all stakeholders (including citizen groups) in developing data quality and risk standards. A participatory, collaborative approach to mitigating risks can build trust and help achieve the right balance between social goods like innovation, on the one hand, and risks like privacy and security, on the other hand. Crowdsourcing can be a valuable tool here, giving policymakers a way to solicit a wide range of responses from diverse stakeholder groups.
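The decision-tree idea in the first step above can be made concrete with a small sketch. The Python below is illustrative only: the `Dataset` fields, the order of the questions, and the outcome labels are assumptions introduced here, not a framework drawn from any of the case studies. A real decision tree would also weigh the public-interest benefits of release and would be developed with stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of a data set being considered for release."""
    name: str
    contains_personal_data: bool
    can_be_anonymized: bool
    security_sensitive: bool


def release_decision(ds: Dataset) -> str:
    """Walk a small, assumed decision tree and return a recommendation label."""
    # Security-sensitive data is routed to controlled access, not open release.
    if ds.security_sensitive:
        return "restricted_access"
    # No personal data and no security concern: release openly.
    if not ds.contains_personal_data:
        return "release"
    # Personal data that can be anonymized is released only after that step.
    if ds.can_be_anonymized:
        return "release_after_anonymization"
    # Personal data that cannot be anonymized is withheld.
    return "withhold"


if __name__ == "__main__":
    # Hypothetical examples, not records from the case studies.
    examples = [
        Dataset("weather_observations", False, False, False),
        Dataset("disease_case_locations", True, True, False),
        Dataset("individual_home_addresses", True, False, False),
    ]
    for ds in examples:
        print(ds.name, "->", release_decision(ds))
```

Writing the questions down in this form makes the trade-offs explicit and reviewable, which is the main point of the recommendation; the specific branches would differ from one jurisdiction or agency to another.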
We have seen that public participation is essential in the drafting of open-data policies and in decisions about what data to release. It is equally important in understanding the impact of open data and in taking advantage of the opportunities it offers. For example, open data can generate insights that require government action; open data can likewise reveal inefficiencies that need concrete steps in order to be addressed. And as we have seen in the Brazilian case study on preventing government corruption, meaningful responsiveness requires the ability to take such steps and actions; what’s required are communities focused on problem solving, not simply on releasing data.
Meaningful responsiveness can be achieved through the following methods:
Develop open and online feedback mechanisms, including Q&As, ratings, and feedback tools to gauge public opinion and solicit insights from citizens. For example, Denmark’s Open Address Initiative has a single portal for users to correct data errors across all agencies. Simplified mechanisms such as this help establish a virtuous open-data cycle, allowing open data to generate insights and ensuring meaningful action on those insights.
Designate an open-data ombudsman function to consistently track the usefulness of open data and whether necessary follow-up actions are being taken. This ombudsman should itself be open and transparent, and ideally include a wide range of stakeholder inputs.
As noted, open-data initiatives are often cheap to get off the ground, but require resources and investment over time. Goals such as increased participation and transparency are laudable, but without resource commitments, they might remain unachievable. Kenya’s Open Duka project is a good example of a laudable open-data initiative that has been limited by a lack of resources. Similarly, as of late 2015, Canada’s Open Charity Initiative T3010 has not been updated since its original 2013 release, in part due to a lack of funding. This means that anyone seeking recent data on Canadian charities must now scrape information independently.
Adequate resource allocations can be achieved by doing the following:
Implementing participatory budgeting initiatives, which let citizens choose their priorities and how public funds are allocated. Such initiatives can ensure that the most useful open-data initiatives receive the most funding.
Undertaking more rigorous cost/benefit analyses of open-data initiatives, which would give policymakers and other stakeholders the means to assess the relative opportunities offered by projects against their costs and possible risks. Among our case studies, NOAA and the UK Ordnance Survey both commissioned cost/benefit studies before launching their projects. This played a vital role in bolstering support and long-term commitments from policymakers and government stakeholders.
Exploring innovative avenues for funding, especially crowdsourcing, which can offer the public (and other interested parties) an avenue not only for funding initiatives, but also for establishing and ensuring the sustainability of their priorities.
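As a rough illustration of the cost/benefit analyses suggested above, the following Python sketch computes two common summary figures, net present value and the benefit-cost ratio, from yearly cost and benefit streams. The function names, the discounting approach, and all figures in the usage example are hypothetical assumptions; they do not reproduce the studies commissioned by NOAA or the UK Ordnance Survey.

```python
def present_value(amounts, discount_rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(amount / (1 + discount_rate) ** year
               for year, amount in enumerate(amounts))


def cost_benefit_summary(costs, benefits, discount_rate):
    """Return net present value and benefit-cost ratio for two yearly streams."""
    pv_costs = present_value(costs, discount_rate)
    pv_benefits = present_value(benefits, discount_rate)
    if pv_costs == 0:
        raise ValueError("costs must have a non-zero present value")
    return {
        "net_present_value": pv_benefits - pv_costs,
        "benefit_cost_ratio": pv_benefits / pv_costs,
    }


if __name__ == "__main__":
    # Hypothetical figures (arbitrary currency units) for a three-year project.
    yearly_costs = [100, 20, 20]
    yearly_benefits = [0, 80, 120]
    summary = cost_benefit_summary(yearly_costs, yearly_benefits, discount_rate=0.05)
    print(summary)
```

A ratio above 1 (or a positive net present value) suggests that estimated benefits exceed costs under the chosen discount rate. In practice the difficult part is estimating the benefit stream, since many of the social benefits of open data are hard to quantify.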
The most effective avenue to understanding how open data works, and how to achieve maximum positive effect, is collaboration. Our knowledge of open data today is in many ways fragmentary, spread across organizations and individuals who are themselves scattered across the globe. There is a need for more communication and pooling of analysis (and resources). To achieve the potential of open data, we need a common research agenda, based on a wider evidential foundation. Importantly, this research framework should integrate a better understanding of impact into its core agenda. Too often, open data research focuses simply on the best ways of releasing data, with its effect, positive or negative, being simply an afterthought.
To achieve this common research agenda, we should do the following:
Set up mechanisms for communication and interaction among various stakeholders (individuals and organizations) currently working in the field of open data. Such mechanisms could include annual meetings or conferences, listservs, monthly hangouts, and other offline and online tools. The goal of these interactions would be to trade insights and ideas, to share evidence, and to collaboratively develop best practices. Events like the Open Data Research Summit within the context of the International Open Data Conference can provide, for instance, the impetus toward improved exchange and collaboration among researchers in this field.
Build on the taxonomy of impact developed through these 19 case studies and have other researchers test the premises we identified earlier. In addition, the open-data research community could consider further fine-tuning the open-data common assessment framework4 that GovLab developed together with the Web Foundation and others, in order to create a standardized tool for evaluating every stage of the open-data value chain.
Create a directory (perhaps in wiki format) of various assessment frameworks (in addition to our own), spread across countries and sectors. Such a directory would also include a list of key contacts and organizations, and would help facilitate discussion by establishing a baseline of sorts toward achieving a common research agenda.
Open data fuels innovation, but how can we innovate open data? We need to recognize different forms and models of open data, including big and small data and text-based data, and encourage stakeholders to think broadly about what data is and what open really means. Even while we work to better understand open data and its effect (for example, through exercises such as this one), we should foster a culture of proactive experimentation and innovation.
There are many ways to foster such a culture. Here are a few:
Institutionally, we can look at creating new entities or intermediaries, for example, a global open-data innovation lab whose explicit purpose would be to think outside the box and research new models of open data that can be tested across sectors, regions, and use cases.
The need for collaborative research mentioned earlier can also be institutionally developed into a cross-border and interdisciplinary open-data innovation network. Such a network would draw on global expertise and ideas.
Perhaps most important, we need to be open to new ideas and insights, and always remain in question mode. This report has outlined several recommendations and suggestions for how to maximize the value of open data. But we recognize that this is just a beginning. Our research has raised as many questions as it has suggested answers.
We end, therefore, with what we believe to be some of the most important questions we should be asking ourselves about open data: questions that can help direct future research, but, perhaps most important, fuel a culture of innovation and flexibility when it comes to how we think about open data.
The preceding findings and recommendations for policymakers and stakeholders in the open-data community are based on the examination of 19 case studies of open-data initiatives from around the world. Though this effort enabled a major step forward in our understanding of open data and its real and potential impacts, key questions remain, including the following:
What are the optimal value propositions (e.g., fighting corruption, spurring economic activity, citizens’ right to government information) to highlight in order to spur open-data activity in different contexts based on local priorities and needs?
What are the conditions to scale the effect of open data?
How can open data initiatives be made sustainable?
What comparative insights are transferable in a universal manner?
What is the optimal internal data infrastructure for enabling impactful open-data initiatives?
1 Manyika, James, Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi. “Open Data: Unlocking Innovation and Performance with Liquid Information.” McKinsey Global Institute. November 12, 2013.
2 Gruen, Nicholas, John Houghton, and Richard Tooth. “Open for Business: How Open Data Can Help Achieve the G20 Growth Target.” Omidyar Network. June 2014.