Data collaboratives

Data collaboratives (sometimes called “corporate data philanthropy”) are a form of collaboration in which participants from different sectors—including private companies, research institutions, and government agencies—can exchange data and data expertise to help solve public problems.

Types
Data collaboratives can take many forms. They can be organized as:
 * Public Interfaces: Private firms make select data assets publicly available for use by external parties. Firms typically provide this information through Application Programming Interfaces (APIs) or data platforms.
 * Trusted Intermediary: Private sector firms share data with partners from the public sector, civil society, and academia. Data can be brokered by third parties, who provide valuable data to non-private organizations under fixed terms and time limits. It can also be routed through third-party analytics, in which data is shared with data stewards who run the analysis and share the findings with external actors, providing the outcomes of the analysis without exposing the sensitive information.
 * Data Pooling: Multi-sectoral stakeholders join “data pools” to share data resources. Public data pools allow partners to openly access and independently use the data, while private data pools limit who can access and contribute to the information.
 * Research and Analysis Partnership: Organizations share data and “proprietary data assets” with public and academic institutions to analyze and advance a public objective. Through these data transfers and data fellowships, access to and terms for use of data are highly controlled.
 * Prizes and Challenges: Organizations make data available to qualified applicants through competitions for innovative use or platform design that adds value to the firm. Open innovation competitions, like LinkedIn’s Economic Graph Challenge, allow for open and broader use of data by many independent users, while selective innovation challenges give limited data access, narrowing the scope of its application to a specific situation. Participants in these competitions are often bound by data responsibility guidelines.
 * Intelligence Generation: Companies use data to build shareable tools and release them for public use. Although no formal, direct cross-sector sharing occurs, this approach lays the foundation for knowledge transfer and a culture of open, data-driven analysis.

Reasons for data collaboratives
The big data boom has demonstrated the power of data to inform and design public projects in an accountable and iterative manner. However, unequal access to certain data across sectors limits the ability of groups to find, access, or be made aware of valuable information, hindering social innovation. Data collaboratives create networks that bridge access and knowledge gaps by bringing different sectors together to share data to address social challenges.

The GovLab argues that data collaboratives in which a private sector data holder shares data with other groups tend to be motivated by a desire for:
 * Reciprocity: Sharing data with others can guide mutually beneficial business decisions.
 * Research and Insights: Sharing data can spark new and innovative approaches to issues.
 * Reputation and Public Relations: Sharing data, especially to advance public issues, can bolster the image and reputation of a firm, attracting new socially conscious clients, talent, and followers.
 * Revenue Generation: Corporate data can be sold to data collaboratives, generating novel revenue streams.
 * Regulatory Compliance: Data collaboratives can help corporations advance transparency and trust by establishing and following data sharing protocols.
 * Responsibility and Corporate Philanthropy: Data collaboratives allow businesses to drive meaningful corporate social responsibility programs.

Data collaboratives can help address problems in service delivery, emergency preparedness, and disaster response. Robert Kirkpatrick, Director of UN Global Pulse, noted that “the lack of innovation [in these sectors have] resulted in a failure to protect the public from what turns out to be preventable harms.”

Incentives for private sector participation
According to The GovLab, data collaboratives can provide five main benefits for public problems:
 * Situational awareness and response: Recent, robust, and quality data from private or public sectors can help governments and civil society better mobilize in crisis and emergency situations. For instance, the Mobile Data, Environmental Extremes, and Population Project (MDEEP) is a collaboration between international organizations and telecommunications companies in Bangladesh to build “large-scale population displacement models to understand population movement related to natural disasters.”
 * Public service design and delivery: Access to previously inaccessible datasets can enable more accurate modelling of public service design and guide service delivery in a targeted, evidence-based manner. For example, collaborative use of datasets by governments, international organizations, aid groups, and private telecommunications carriers during the 2014 Ebola outbreak helped track and trace the virus.
 * Knowledge creation and transfer: Using more, and more diverse, datasets can fill knowledge gaps to better respond to the problem at hand. The All of Us Research Program, created by the Obama administration in 2015, allows participants to share their health data with a secure system, where it is aggregated and anonymized for researchers to study and advance medical science.
 * Prediction and forecasting: Data from the past allows for informed prediction of the future, allowing groups to identify problems and respond more quickly. For example, using search engine query data, researchers identified search terms, times, and demographics that correlated with suicidal ideation among Indian youth.
 * Impact assessment and evaluation: Access to additional datasets can help organizations monitor and evaluate the effectiveness of policies and iteratively adapt programs for better service delivery. For example, the US Food and Drug Administration’s Sentinel Initiative used anonymized patient information sourced through the TriNetX Live USA Network to assess how many adults hospitalized for COVID-19 experienced or succumbed to thrombosis-related complications.

Examples
From 2017 to 2019, the percentage of companies entering data-related partnerships rose from 21% to 40%. The share of business competitors deciding to connect their data also grew, from 7% to 17%. In a 2019 report, the World Economic Forum and McKinsey estimated that connecting data across institutional and geographic boundaries could create roughly $3 trillion annually in economic value by 2020.

The following is an illustrative (but not exhaustive) list of some data collaboratives:
 * AI4BetterHearts: A global data cooperative established by the Novartis Foundation and Microsoft to improve cardiovascular health by using AI and data analytics to tackle heart disease.
 * The Chicago Data Collaborative: An effort by newsrooms, academics, and non-profit organizations to source data from public agencies, organize and document the data, and link it for a better and comprehensive understanding of the criminal justice system.
 * The Counter Trafficking Data Collaborative: A data collaborative working to curb human trafficking through data contributed by various countries. It is maintained by the International Organization for Migration (IOM) and Polaris.
 * Cuebiq: An offline intelligence and measurement company that helps marketers understand the impact of their cross-channel advertising in the offline world. Its “Data For Good” program provides access to anonymous, privacy-compliant location data for academic research and humanitarian initiatives related to human mobility.
 * Data Collaborative for Justice: A project at John Jay College that leverages community data to research the operations of the criminal justice system and create informed and transparent frameworks for criminal justice reform.
 * The Health Data Collaborative: A multi-agency, multilateral effort active in five African countries that provides a collaborative platform to leverage technical and financial resources at all levels alongside country-owned strategies and plans for collecting, storing, analyzing, and using data to improve health outcomes, with specific focus on UN SDG targets and communities that are left behind.
 * International Network for Data on Impact and Government Outcomes (INDIGO): An initiative of the Government Outcomes Lab (GO Lab) at the Blavatnik School of Government at the University of Oxford that builds an interdisciplinary network of data stewards to address social problems collaboratively.
 * InfoSum: A UK-based company that provides a decentralized and trusted data ecosystem so that companies can do more with customer data without actually sharing the data.
 * The Mobility Data Collaborative: A partnership among mobility operators, data aggregators, public agencies, academia, and others to provide solutions and a common framework to ensure safe, equitable, and livable streets for all.
 * Water Data Collaborative: A collaborative that works to grow and maintain an inclusive community of water scientist data generators, providing data that enable the protection and restoration of the nation’s waterways.

Risks, challenges, and ethical considerations
Data collaboratives face significant challenges related to data security, data privacy, commercial risk, reputational concerns, and regulatory uncertainty. In addition, there are concerns about a lack of trust among individuals, institutions, and governments.

Risks

 * Commercial Risks: “Corporations are concerned about brand reputation, data rights and the disclosure of proprietary or commercially sensitive information.”
 * Security Risks: Vulnerable data structures and a lack of security expertise and processes can put all members of a data collaborative at risk.
 * Regulatory Risks: Fragmented legal and regulatory frameworks hinder data sharing across sectors and sovereign borders. Varying definitions of privacy and data holder rights expose data holders to significant compliance risks and liabilities.
 * Privacy and Ethical Risks: Collaborative data use can expose individual identities, infringing on privacy and security. Additionally, sharing non-personal but demographically identifiable data can expose vulnerable populations to discrimination and human rights violations, so protecting these groups is often a major issue.

Mitigating privacy protection issues
Privacy-preserving computation (PPC) presents data in forms that can be shared, analyzed, and operated on by multiple stakeholders without exposing the raw information. To do so, PPC can control the environment within which the data is processed (Trusted Execution Environments) or add calibrated statistical noise to results so that no individual's contribution can be singled out (Differential Privacy). Homomorphic Encryption allows users to execute operations on encrypted data and see the outcomes without exposing the source data. Through secure Multi-Party Computation, different groups can jointly compute on their combined data in a decentralized and collaborative manner without revealing their individual inputs to one another.
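
As a rough illustration of the Differential Privacy idea mentioned above, the following Python sketch releases a noisy count using the Laplace mechanism. It is a minimal example under stated assumptions, not a production implementation: the function names (laplace_noise, private_count), the sample records, and the choice of epsilon are all hypothetical and do not come from any particular data collaborative.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the Laplace mechanism uses
    noise with scale 1 / epsilon. Smaller epsilon means more noise
    and stronger privacy.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records held by one member of a collaborative.
    people = [{"age": age} for age in range(100)]
    noisy = private_count(people, lambda p: p["age"] >= 50, epsilon=1.0)
    print(f"noisy count of people aged 50+: {noisy:.2f}")
```

Each call returns a different value near the true count, so a partner receiving the result learns the approximate aggregate but cannot tell with confidence whether any single individual was in the dataset. A real deployment would also need to track the total privacy budget spent across repeated queries, which this sketch omits.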

PPC techniques are already being leveraged by governments and large corporations. In 2015, the Estonian government worked with the private firm Sharemind to analyze tax and education records through Multi-Party Computation for the Private Statistics Project. An external audit by the European Commission PRACTICE project found that the Private Statistics Project did not expose any personal data.
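
To give a sense of how Multi-Party Computation can combine records without exposing them, the sketch below uses additive secret sharing to compute a sum across several parties. This is a simplified, single-process simulation and is not the protocol used in the Private Statistics Project; the names (MODULUS, share, secure_sum) and the example values are assumptions made for illustration, and the scheme shown only protects against parties that follow the protocol honestly.

```python
import secrets

# Hypothetical modulus; all arithmetic on shares is done modulo this value.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` (0 <= value < MODULUS) into n additive shares.

    The first n - 1 shares are uniformly random; the last one is chosen
    so that all shares sum to `value` modulo MODULUS. Any subset of
    fewer than n shares reveals nothing about the value.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum in which no party sees another's input."""
    n = len(private_values)
    # Each party i splits its value and sends share j to party j.
    distributed = [share(value, n) for value in private_values]
    # Each party j adds up the shares it received and publishes only
    # that partial sum, which looks random on its own.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    # Combining the published partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical private inputs held by three organizations.
    print(secure_sum([120, 45, 300]))
```

In a real deployment each party would run on its own infrastructure and exchange shares over secure channels, and the total must stay below the modulus for the result to be exact.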

In 2019, Google open-sourced its Private Join and Compute protocol, allowing users to apply Homomorphic Encryption and Multi-Party Computation. In the same year, ten pharmaceutical companies formed the Melloddy consortium to use blockchain technology to train a drug discovery algorithm via shared data.

Mitigating power asymmetries
Power imbalances can occur when stronger parties manipulate, exclude, or pressure weaker members of the data collaborative. From a classical viewpoint, power refers to the influence a person or group has over another. Examining collaborative governance, Dave Egan, Evan E. Hjerpe, and Jesse Abrams suggest a three-phased approach to power: “power over” refers to the ability to control the behavior of others, “power for” looks at the ability to authorize the participation of stakeholders, and “power to” considers the ability to measure another entity’s ability to realize its goals.

Power imbalances can arise from disparities in authority, resources, legitimacy, or trust between parties. The more actors in a data collaborative, or the more incentives for data use, the greater the likelihood of conflicting interests. Oftentimes, data is viewed as an organizational asset, and opening it up to new uses by others means relinquishing control over the data and ceding this autonomy to the collaborative, resulting in the “control and generativity challenge.” Data stewards can help reduce power imbalances by reducing the influence of bias, following operating procedures, and providing issue resolution and remediation.