Modelling access-to-data frameworks in the UK
What policymakers can learn from data sharing practice to enable research into online harms
17 January 2025
There are many benefits to facilitating public access to privately held data in areas such as transport, energy use and food consumption. Data sharing can contribute to developing public infrastructure, inform policy in different domains, advance research, and ensure that services, like those offered by online platforms, are safe and helpful for individual users.
At present, such data access is limited. Private companies tend to hold data in a proprietary fashion, impeding the ability to understand social impact and design appropriate interventions. Policymakers need to establish effective access-to-data frameworks that are geared towards projects that are socially beneficial.
But how can challenges to data access be overcome and how can we move from a system that favours the interests of private data holders to one that works in service of people and society?
In the UK, the Office of Communications (Ofcom) launched a public consultation on how and to what extent researchers access online platform data, for example to independently monitor services such as social media. The consultation will feed into a report the regulator must produce under the Online Safety Act 2023, and is part of an ecosystem of policy tools [1] grappling with the constraints on and possibilities offered by data sharing.
The consultation could not be more timely. In the case of online safety, researcher access to data is not just desirable but necessary to enable public scrutiny. If researchers cannot analyse platform datasets, then independent auditing becomes impossible. And companies that are beginning to lower the standards of their own internal safeguards risk becoming less accountable for the harms they cause.
Both public and private organisations are currently testing different types of data access mechanisms, including technical environments, legal mandates and voluntary schemes.
To understand which ways of accessing private data are most effective and useful, and how legal mandates can help implement them, our project Private-sector data for public good: modelling data access mandates has been exploring:
- the main purposes of and incentives for access to private data
- the common technical and non-technical obstacles
- the friction points between different actors
- and the role of legal mandates – direct legal obligations that require data holders to share data – in rebalancing existing power dynamics.
Our project includes four case studies that explore friction points and power dynamics in the access-to-data landscape. The case studies will be published in full in a forthcoming report and the insights from them have served as a basis for our submission to the Ofcom consultation.
In this blog post, the first of a two-part series, we summarise the lessons from two of these case studies – transport and water pollution management – which highlight how access to private data could improve physical services.
While the case studies are contextual, they provide fundamental insights that are transferable to other domains. They draw the attention of policymakers towards the value of having organisations and actors with different competences accessing the same datasets; the need to balance the competing interests of these actors, which may range from local governments to academia and civil society advocacy groups; and the forms of friction (due, for instance, to incomplete or unrepresentative datasets) that emerge in the sharing process.
The Online Safety Act offers a first, high-stakes test for effective data access. Policymakers’ success in enabling independent research into online harms also depends on how they navigate specific data access issues. Academic and non-academic researchers working with civil society organisations and communities bring different skills to the process and can check for online safety according to the interests of underrepresented groups. How can policymakers ensure that these researchers can access the data they need? How can policymakers, independent researchers and private companies holding the data understand and address dataset limitations from the outset? How can policymakers move from mandating data access to requiring companies to enable the effective use of the shared datasets?
Two examples of data-sharing mechanisms
Strava Metro data for infrastructural development
The first case study concerns access and sharing frameworks for data held by Strava, a private company which develops the popular fitness tracking application of the same name. Strava allows users to track and share information about their physical exercise routines, such as cycling or running.
Strava prepares an aggregated dataset via its Strava Metro initiative, and shares the data with organisations involved in supporting active travel infrastructure like pavements, crossroads, cycle paths, bike parking facilities, etc. The data has been repurposed by various public offices at local and national levels.
In the United States, local governments have used Strava data to plan infrastructure investments or design safety features on public trails. In the UK, the Office for National Statistics, together with the Department for Environment, Food and Rural Affairs (Defra), repurposed Strava data to measure public engagement with natural environments and evaluate the success of government policy on access to green spaces.
Strava data has been cited as a valuable resource by multiple Local Cycling and Walking Infrastructure Plans in the UK, such as the plan published by Milton Keynes Council. At a city level, Transport for London has been a consistent client of Strava Metro, using its data to assess the impact of bike safety lanes.
Academics have also used Strava data, for example to study the use of cycling infrastructure during COVID-19-related lockdowns.
However, researchers using accessible Strava datasets in the context of these public interest projects have run into various issues, as reported in the interviews with experts we conducted for our project.
Some local authorities could not pay the subscription initially necessary to access the data, due to budget constraints. When the platform became free for urban planners and local administration, the reduced resources meant that Strava made its data available through a self-service interface rather than as a customisable dataset. In turn, this limited the degree to which the data could be tailored to the needs of specific partners.
Additional problems emerged because Strava users are not representative of the general population: young men make up a higher proportion of the app's users. To overcome this issue, researchers and policymakers supplemented Strava data with other datasets.
Sewage spill data to monitor water pollution
The second case study examines the data sharing obligations of water and sewage companies in England and Wales. While water companies are private entities, they have a public function and are obliged to make the environmental information they hold publicly available.
Water and sewage companies are permitted to spill sewage into public waters during extreme weather conditions. However, when this happens, they must capture data on the length and frequency of the spills, and share it with the public upon request under the Environmental Information Regulations 2004.
In 2021, researchers at the environmental advocacy group Windrush Against Sewage Pollution (WASP) used the data provided to show that several sewage spills were unauthorised and in breach of water company permits, and therefore potentially caused environmental damage. In August 2024, this led the Water Services Regulation Authority (Ofwat) to fine three water companies for a total of £168 million.
In this case, accessing and using the data was not as straightforward a process as it should have been. Though the sharing of sewage spill data is mandated, several factors interfered with its acquisition and use, and meant the process lasted well beyond the standard 20-day response time. Research by WASP had to be paused when the Environment Agency and Ofwat put the water companies under investigation for the inadequate treatment of sewage and illegal sewage discharge. The water companies did not want to comply with data access requests, arguing that this would prejudice the investigation. Only when the tribunal settled the matter was WASP able to access the data. However, the delay resulted in research being paused during a period when public scrutiny was key.
The experts we interviewed reported that even when the data was provided, there were several limitations. The dataset had large gaps, was inconsistently formatted and not user-friendly, and included impossible values.
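To make these limitations concrete, the following is a minimal sketch of the kind of checks a receiving organisation might run before analysing spill records. The record layout (site identifier, start time, end time), the example values and the 30-day plausibility threshold are all assumptions made for illustration; they do not describe the actual format of the water companies' data or the checks carried out by WASP.

```python
from datetime import datetime, timedelta


def check_spill_records(records, max_duration=timedelta(days=30)):
    """Return (index, problem) pairs for records that look unusable.

    Each record is assumed to be a (site_id, start, end) tuple with
    ISO 8601 timestamps. This layout is hypothetical.
    """
    problems = []
    for i, (site_id, start, end) in enumerate(records):
        # Gaps: any empty or missing field makes the record unusable.
        if not site_id or not start or not end:
            problems.append((i, "missing field"))
            continue
        # Inconsistent formatting: timestamps that are not ISO 8601.
        try:
            start_dt = datetime.fromisoformat(start)
            end_dt = datetime.fromisoformat(end)
        except ValueError:
            problems.append((i, "unparseable timestamp"))
            continue
        # Impossible or implausible values: negative or very long durations.
        duration = end_dt - start_dt
        if duration < timedelta(0):
            problems.append((i, "spill ends before it starts"))
        elif duration > max_duration:
            problems.append((i, "implausibly long spill"))
    return problems


# Made-up example records, not real spill data.
records = [
    ("SITE-A", "2021-03-01T10:00:00", "2021-03-01T14:30:00"),  # plausible
    ("SITE-A", "2021-03-02T09:00:00", "2021-03-01T09:00:00"),  # negative duration
    ("SITE-B", "01/03/2021 10:00", "2021-03-01T12:00:00"),     # inconsistent format
    ("SITE-C", "2021-03-05T08:00:00", None),                   # gap
]
issues = check_spill_records(records)
for index, problem in issues:
    print(index, problem)
```

Checks like these do not repair a dataset, but they show how much effort falls on the recipient when quality is not addressed by the data holder at the point of sharing.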
These two case studies make clear how private sector data being shared with public actors can be used to achieve a goal of public interest – improving active transport infrastructure and controlling water pollution, respectively – albeit according to very different mechanisms. Strava shares its data voluntarily and for purposes that do not conflict with its business model, while English and Welsh water companies are forced to do so for transparency and accountability purposes.
Insights for designing effective forms of private data access
The case of Strava Metro shows how making data available allows for it to be used by different actors with a variety of skills, perspectives and motivations. Local and national authorities, as well as research communities, are able to work with Strava data to make inferences and advance infrastructural projects that can benefit entire communities.
However, different actors may have different objectives and incentives. In the case of English and Welsh water companies, the interests of the data holder can be at odds with those of the communities they are supposed to serve. Water companies are disincentivised from using and sharing their data due to the legal, financial and reputational consequences of exposing potential permit breaches. External independent scrutiny by an advocacy group like WASP, which has no financial incentives and seeks to protect the environment, is key.
When the interests of parties involved in data sharing are not aligned, even though there is a data access framework in place, competing interests risk jeopardising scrutiny by obfuscating the data and blocking research.
A data sharing process is also likely to introduce frictions, either at the stage of data acquisition or of data use. Frictions are likely to dissuade or slow down research, ultimately delaying projects. To benefit people and communities, it is important that access frameworks are designed with likely frictions in mind. As seen in the example of sewage spill data, data request processes, the permitted response time and the appeal processes can negatively affect research.
In the cases of both Strava Metro and the English and Welsh water companies, we also see the importance of data quality. When datasets are incomplete, or are formatted or made available in ways that make them unusable or that require the receiving organisation to invest substantial resources to use them, data sharing becomes a strenuous effort.
Additionally, for projects of public interest, such as health-related research, the improvement of public services, or product or online service safety audits, datasets will have to be representative of the general population. But private sector data is unlikely to be generalisable as it is typically collected for narrow business purposes, requiring that those receiving the data either correct or work around these limitations. Failing to do so may skew research results and lead to services that are biased towards certain demographics or to overlooking harms against specific communities or groups.
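One common way of correcting for an unrepresentative sample is reweighting, sometimes called post-stratification. The sketch below is illustrative only: the demographic groups and all the numbers are invented, and it is not the method used in our case studies, where researchers supplemented the private data with other datasets instead.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Compute one weight per group so the weighted sample matches the population.

    sample_counts: number of records per group in the shared dataset.
    population_shares: each group's share of the general population (sums to 1).
    Both inputs are hypothetical and would need to come from trusted sources.
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            # Reweighting cannot recover a group that is absent from the data.
            raise ValueError(f"no records for group {group!r}")
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


# Made-up numbers: a dataset dominated by younger men.
sample_counts = {
    "men_under_40": 600,
    "men_40_plus": 200,
    "women_under_40": 150,
    "women_40_plus": 50,
}
population_shares = {
    "men_under_40": 0.25,
    "men_40_plus": 0.25,
    "women_under_40": 0.25,
    "women_40_plus": 0.25,
}
weights = post_stratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(group, round(weight, 2))
```

Reweighting has limits: groups that are entirely missing from a dataset cannot be recovered, and very large weights on small groups make estimates unstable, which is one reason supplementing with other data sources may still be necessary.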
Overall, policymakers need to consider ways to ensure that data is not only accessible, through obligatory mandates where independent intervention such as safety scrutiny is indispensable, but also usable for the many purposes it can contribute to.
The Ofcom consultation is a step in the right direction towards ensuring an access-to-data framework that prioritises independent scrutiny. Our case studies show how data sharing can improve independent monitoring and help to achieve a much-needed business-to-society feedback loop, one in which researchers raise transparency and accountability issues, regulators investigate these issues in a timely fashion, and platforms adequately act upon investigation results and implement redress.
[1] The recently launched AI Opportunities Action Plan also talks of the need to incentivise industry to curate and unlock private datasets.