Potential unreached: challenges in accessing data for socially beneficial research
An opportunity to enable access to platform data with the Data (Use and Access) Bill
24 April 2025
Reading time: 11 minutes
Digital platforms have improved access to services and simplified life for many people by offering food delivery options, tools to apply for jobs or ways to book a doctor’s appointment. However, their material power and significant reach into people’s lives can have detrimental effects on both individuals and society.
Social media’s power to amplify extremist messages can translate into alleged complicity in real-world violence. In 2022, Amnesty International reported that Meta’s engagement-maximising algorithms promoted extremist content in Myanmar, fuelling ethnic violence against the Rohingya population. UN human rights investigators suggested that, in this case, Facebook played a role in the violence because of the extent of its social reach.
Alleged complicity in real harm can also be found in more commonplace cases, such as food delivery applications, which grew in popularity during the COVID-19 pandemic and have raised concerns about public health and the environment. A 2021 report by the World Health Organization (WHO) highlighted issues with the nutritional value of delivery foods, which tend to contain more calories, sugar and salt, and with the additional plastic packaging and large portions that contribute to waste. More recently, the Behavioural Insights Team at Cancer Research UK has shown how marketing strategies – such as offering special deals or enabling users to order from different businesses within a short amount of time – can encourage overspending and the consumption of low-quality food.
These issues have very real and harmful consequences, which policymakers must prevent with timely and adequate interventions. To design the right policies, however, regulators need access to relevant platform data, which is often not publicly available. For example, access to data about social media misinformation and disinformation would enable researchers to understand the volume and spread of false information – where and how it is amplified – and to test technical interventions on platform design and new policies to prevent these phenomena. However, platform companies systematically reject data access applications, and misreport or provide incomplete or inaccurate datasets. Policy interventions are likely to fall short if they are based on only a limited understanding of how platforms work.
This blog is the second and last instalment of our series surveying examples of data access practices in different societal domains and sharing emerging findings from our project Private-sector data for public good: modelling data access mandates.
In this post, we draw on two case studies of access to digital services’ data: Meta’s industry-academia data sharing partnerships and instances of (limited) access to data from food delivery applications.
Both examples support two insights. First, enabling effective access to platform data could be instrumental to supporting research that is beneficial to society, developing adequate policy and preventing harms. Second, addressing insufficient or incomplete access to data held by private companies requires putting in place an institutional and procedural layer of governance. This could take many forms – for example, establishing an independent body to mediate between policy researchers and platform companies and develop rules and procedures for timely, safe and rights-preserving access to relevant data.
The UK Data (Use and Access) Bill (DUA), which is entering the last phases of parliamentary discussion, offers an important opportunity to put researcher access to data on a statutory footing. However, it will need to be complemented by secondary legislation to address some of the obstacles to effective data access. With online safety regulation reported to be on the table in international trade negotiations, it remains to be seen if and how policymakers will balance their commitment to protecting people online with navigating the ‘new era of global trade’.
Meta’s industry-academia partnerships
Independent researchers and academics have repeatedly called for access to data held by private social media platforms to understand how the platforms work and what effects they have on people. For example, researchers have asked to access platform data to understand how social media affects children and young people’s mental health – a question at the forefront of public debate and policymaking.
However, companies have historically limited such access, citing privacy concerns and the risk of revealing trade secrets and proprietary content. In the aftermath of the 2018 Cambridge Analytica scandal – when the company harvested Facebook users’ data to build voter profiles, which were then sold to political campaigns – Facebook established the Social Science One (SSO) initiative to test models for research collaborations between industry and academia.
SSO, incubated at Harvard’s Institute for Quantitative Social Science, was meant to provide researchers with access to data to study the relationship between social media, elections and democracy, while protecting companies’ proprietary content. However, the initiative has so far faced various challenges.
An SSO blog post from December 2019 stated that Facebook had not delivered the data it initially promised, causing delays. This resulted in some philanthropic funders withdrawing from the project. In September 2021, SSO-associated researchers revealed that the dataset initially provided by Facebook did not include roughly half of its US users, undermining ongoing research.
These issues cemented the view that, while social media data is more relevant than ever to questions about political discourse and democratic engagement, insufficient access is hampering research efforts. As a 2021 Brookings Institution article put it: ‘there may be more politically relevant data than ever before, but a smaller share of it is now accessible to outside researchers’.
Academics working with the SSO initiative argued that only internal, non-public platform data – which companies like Meta are not inclined to share – can support policymakers to address pressing online harms such as misinformation, disinformation, manipulation and election interference.
The interviews with experts we conducted during our project, as well as a 2022 workshop convened by the Center for Democracy and Technology (CDT), uncovered similar findings. If researchers do not know what types of data platform companies collect and store, and in what quantities, even working out how to use that data to study harms and develop policy becomes hard. In such circumstances, pinpointing specific research questions is difficult and, as a consequence, study programmes remain limited.
The same CDT workshop report mentions specific areas, such as advertising, recommendation algorithms and automated content moderation systems, about which data is insufficient or inaccessible.
When companies withhold important data or share it in unusably poor form, technically experienced researchers can still collect datasets themselves using custom browser extensions or web-scraping scripts. But these methods come with inherent challenges: certain types of data may remain inaccessible, and platforms can shut down data-collection operations at any time.
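To make this concrete, here is a minimal Python sketch of the kind of web-scraping script researchers fall back on. The URL, page markup and CSS selectors are hypothetical placeholders rather than any real platform’s endpoints, and a real collection effort would also need to respect robots.txt, rate limits and applicable terms of service.

```python
# A sketch of independent data collection via scraping, of the kind
# researchers fall back on when platforms withhold data. The URL and
# CSS selectors are hypothetical placeholders, not real endpoints.
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "research-scraper/0.1 (academic study)"}

def collect_public_posts(listing_url: str) -> list[dict]:
    """Fetch a public listing page and extract post text and links."""
    response = requests.get(listing_url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    posts = []
    for node in soup.select("article.post"):  # hypothetical page markup
        link = node.find("a")
        posts.append({
            "text": node.get_text(strip=True),
            "permalink": link.get("href") if link else None,
        })
    return posts

# Example usage, against a hypothetical public page:
# dataset = collect_public_posts("https://example.com/public-feed")
```

As the cases below show, even collection methods as simple as this can be blocked the moment a platform decides it no longer tolerates them.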
In one case, again involving Facebook data, ProPublica developed a database showing how the social media platform was targeting users from specific backgrounds with political ads. ProPublica relied on individual users who installed a browser extension to collect the relevant information, but Facebook blocked the extension, claiming it breached the platform’s terms of service by collecting users’ data in unexpected ways. Facebook also argued that the extension merely replicated its in-house advertisement-tracking tool, whose data is publicly available. However, according to ProPublica, the independent extension recorded more instances of advert distribution than Facebook’s own tool.
This is not an isolated instance: researchers at New York University had a similar experience when their social media accounts were disabled while they were collecting data to research how Facebook can be used to spread misinformation.
Food delivery apps
Data availability and data quality have also been a major concern when examining the effects of online food delivery services, like Just Eat or Deliveroo, on people’s health and food consumption.
In this case, the only data that is publicly available is purely observational: information on the food and beverages sold through the app and the geographical coverage of meal deliveries. This offers no real insight into how customers are affected by recommendation systems, advertising or application design.
Food delivery platforms do collect large volumes of information about both customers (browsing history, orders, payment information and so on) and restaurants (descriptions and menus) – but they do not share it.
To work around the current lack of online information on the nutritional value of food sold through delivery applications, WHO and Kingston University researchers have developed a data dashboard to monitor the out-of-home food environment, covering both external factors (such as availability, price, marketing and regulation) and internal factors (such as accessibility, affordability and convenience). However, their study argues, achieving a sufficient understanding of the digital food delivery sector requires delivery platform companies to be legally required to provide transparent nutritional information. That information could provide the foundation for interventions like food labelling, pricing policies and national dietary guidelines.
As a WHO report notes, public APIs as well as mandated data sharing for research could help collect data beyond nutritional values – including portion sizes, allergens, food certifications (e.g. gluten-free, halal, kosher) and hygiene ratings – to study issues of public health and nutrition in comprehensive ways. This would offer important insights for policymakers on the impact of food delivery apps on eating habits and on the factors (economic, psychological, physical) behind certain food choices. Besides supporting policies that promote healthy eating, better data access may help improve regulation in areas like gig economy labour conditions, road safety and the environment.
However, mandating the use of APIs to share data is not enough. The same WHO study identified additional challenges: a lack of data standardisation, data recorded using inconsistent terminology, and the inclusion of unverified information.
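As an illustration of the standardisation problem, the Python sketch below shows two hypothetical platforms describing the same menu item with different field names and units, and a normalisation step that maps them onto a shared schema. All record structures and values are invented for illustration; only the unit conversions (kilojoules to kilocalories, sodium to salt) are standard.

```python
# Two hypothetical platforms describing the same menu item with
# different field names and units. All records are invented.
RAW_RECORDS = [
    {"platform": "A", "item": "Margherita pizza", "kcal": 850, "salt_g": 2.4},
    {"platform": "B", "item": "Pizza Margherita", "energy_kj": 3557, "sodium_mg": 960},
]

KJ_PER_KCAL = 4.184      # standard energy conversion
SALT_PER_SODIUM = 2.5    # salt content is roughly sodium x 2.5

def normalise(record: dict) -> dict:
    """Map platform-specific fields onto a single shared schema."""
    kcal = record.get("kcal") or round(record["energy_kj"] / KJ_PER_KCAL)
    salt = record.get("salt_g") or round(record["sodium_mg"] * SALT_PER_SODIUM / 1000, 1)
    return {"item": record["item"].lower(), "kcal": kcal, "salt_g": salt}

print([normalise(r) for r in RAW_RECORDS])
# Both records now report kcal=850 and salt_g=2.4, despite different source formats.
```

Even this toy example hides real difficulty: matching ‘Margherita pizza’ to ‘Pizza Margherita’ requires entity resolution across platforms, and none of it helps if the underlying values are unverified.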
What now? Future regulation and emerging findings for an effective researcher access framework
In a welcome legislative development, the draft of the UK’s DUA Bill introduces provisions that amend the Online Safety Act 2023 to enable independent data access for the study of online safety matters.
The new legislation presents an opportunity to put researcher access to data on a statutory footing. However, it will not be enough on its own. Secondary legislation is needed to complement the DUA provisions and establish the details of its framework, including addressing the challenges of insufficient access to platform data and poor data quality. Without it, the new provisions will lack the operational rules needed to enable access in practice.
Existing research shows that platforms systematically provide incomplete or inaccurate data and raise significant barriers for researchers even when they are mandated by law to do so. In a recent example in Germany, a civil society organisation studying the influence of social media platforms on upcoming elections took the step of suing one platform. The company repeatedly refused to provide access to data (e.g. reach, number of likes, shares), which it was legally required to share via easy-to-use APIs. This is not what effective access looks like.
So how will policymakers enable effective access to data? How could they establish safeguards and processes to ensure data quality?
Emerging findings from our case studies and interviews with experts indicate that overcoming issues like insufficient access and poor data quality would be difficult without dedicated institutional support. An independent intermediary body serving as the central governance hub for researcher access to data may be necessary to facilitate effective access without burdening either researchers or companies. Through quality audits or assessments, the independent body could ensure that shared data is truthful, complete and formatted according to set quality standards. It could analyse and compare the data collected and generated by platforms with the data companies already share via APIs, or with the non-public datasets released to researchers. Complementing this measure, researchers could be given the right to contest data that does not meet quality standards or does not allow them to carry out their research as intended.
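As a rough illustration, the sketch below shows one form such a quality assessment could take: checking a platform-supplied dataset for missing fields and comparing its coverage against an independently collected reference sample. The field names and the completeness threshold are assumptions for the sake of the example, not an established standard.

```python
# One form a quality assessment could take: checking a platform-supplied
# dataset for incomplete records and measuring its coverage against an
# independently collected reference sample. Field names and the
# completeness threshold are illustrative assumptions.
REQUIRED_FIELDS = {"post_id", "timestamp", "reach", "shares"}
COMPLETENESS_THRESHOLD = 0.95  # assumed quality standard

def audit(platform_rows: list[dict], reference_ids: set[str]) -> dict:
    """Flag incomplete records and measure coverage against a reference."""
    incomplete = [r.get("post_id", "<missing id>") for r in platform_rows
                  if not REQUIRED_FIELDS.issubset(r)]
    supplied_ids = {r["post_id"] for r in platform_rows if "post_id" in r}
    coverage = len(supplied_ids & reference_ids) / len(reference_ids)
    return {
        "incomplete_records": incomplete,
        "coverage_vs_reference": coverage,
        "meets_standard": coverage >= COMPLETENESS_THRESHOLD and not incomplete,
    }
```

A check like this would have surfaced the SSO problem described above – a dataset silently missing roughly half of US users – before years of research were built on it.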
Experience from researchers working with the EU’s Digital Services Act points to two other core functions that an independent intermediary body could have. The first is developing and overseeing a streamlined process for researchers to apply for and obtain data access. This includes clear rules for application and review, and appeal mechanisms standardised across platforms – in other words, a formalised process, the opposite of what exists now. The second is providing guidance on privacy protection, which is necessary to access data in a rights-preserving way. Legislative proposals in other countries have opted for a tiered system of data access, with greater safeguards and restrictions depending on how sensitive or granular the shared datasets are.
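One way to picture such a tiered system is as a mapping from data granularity to required safeguards, as in the sketch below. The tier names, granularity levels and safeguards are illustrative assumptions, not drawn from any specific legislative proposal.

```python
# A tiered access model encoded as data: each tier pairs a level of data
# granularity with the safeguards required before release. Tier names,
# granularity levels and safeguards are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessTier:
    name: str
    granularity: str
    required_safeguards: tuple[str, ...]

TIERS = (
    AccessTier("open", "aggregate statistics",
               ("published methodology",)),
    AccessTier("vetted", "pseudonymised records",
               ("researcher accreditation", "data access agreement")),
    AccessTier("restricted", "granular user-level data",
               ("ethics approval", "secure research environment",
                "output checking before publication")),
)
```

The design intuition the tiering reflects is simple: the more re-identifiable the data, the more the safeguards shift from procedural commitments to technical and institutional controls.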
Beyond setting access procedures, an intermediary body could also monitor and analyse the data access projects it supports: tracking which data is most used in policy research, the main areas of study, the reasons for access denials, and which safeguards and data protection measures work, which don’t, and which need to change as platforms change.
Establishing an intermediary third-party organisation is one way to ensure that a researcher access framework remains effective, adaptable and aligned with the evolving nature of online harms and platform technologies. Whichever mechanisms policymakers choose, the DUA Bill provisions, with the necessary secondary legislation, offer an opportunity to address these challenges. Missing it would leave us with a limited understanding of, and reduced capacity to address, pressing societal phenomena. At the present juncture, and considering how far the legislation has come, it is important that the UK does not trade away its ability to act in such a key area of digital policy.