ODI Leeds

Launching our open data health blog

"Open data is data that's available to everyone to access, use and share." NHS Digital has published open data as part of 194 publications, 100 of which are still active and publish monthly, quarterly or annually. This data, which spans everything from dentistry to hospital admissions to substance abuse statistics, has the capacity to drive benefits in three key areas:

  1. Improving health and social care, by providing policy makers with the information they need and academics with data to guide their research.
  2. Informing the public debate, by providing public facts and explanations that enable journalists to tell stories.
  3. Driving innovation and developing the UK economy, by supporting research and enabling services and apps to be built using freely available data.

Because the data is freely accessible to anyone on our website, understanding who our users are and what they need is a challenge. Consequently, in late 2019 we carried out some user research. We knew a lot of the data is used to support policy work, but the research also identified academics, the charity sector and data journalists as key user groups. In interviews spanning more than 10 organisations, users consistently told us they found open data:

Theme: What users told us
Hard to find

"I find myself getting lost in the pages about data. It's not all in one place. I found some stuff and Karen found some other stuff in another place."
"Where have you hidden the information?"

Hard to understand

"How are variables decided each year? In HES outpatient data in the last 5 years there are 5 different spellings of the word speciality between years."
"It was surprising how the format of these data sets varied over time. It's an absolute nightmare and it is written for people 5-10 years ago when you downloaded the thing, scroll down the spreadsheet, find the number that you want, and write it down."

Hard to manipulate

"It took 4 months to clean the data to use and analyse."
Analysts spend up to 80% of their time wrangling our data.

These findings tallied with our own internal audit of open data, which highlighted the patchy meta-data that underpinned many of our issues: the lack of meta-data hampered users' ability to find data via search engines and on our website, and made the data challenging to interpret once it was found!

The user research also shed light on the complex eco-system that is open health data. We discovered organisations building services based on our data (ranging from internal data warehouses to fully external services), and found that the data was extensively re-used by many mainstream media outlets.

This prompted the question of how best to improve the situation. Given the complexity of the system, small-scale experiments and collaboration were key. So in the latter half of 2019 we:

  • Published our data sets so that they are now available on Google Dataset Search
  • Invested time in capturing and standardising our open meta-data, which will then be loaded into a meta-data catalogue (a lengthy but necessary bit of work!)
  • Created an early attempt to automate the creation of generation (thanks to an internal hackathon)
  • Carried out a PoC to push open meta-data into the new meta-data catalogue, which will enable to in future versions of the NHS Dictionary
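
To make the first two bullets a little more concrete: Google Dataset Search discovers data sets by reading schema.org `Dataset` markup embedded in a web page, usually as JSON-LD. The sketch below shows roughly what generating that markup from a standardised meta-data record could look like. It is an illustration only, not a description of NHS Digital's actual tooling; the function names, the example record and every URL in it are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, url, licence_url, publisher, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    The fields chosen here are a small, assumed subset of what a real
    meta-data catalogue would hold.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": licence_url,
        "keywords": list(keywords),
        "creator": {"@type": "Organization", "name": publisher},
    }


def as_script_tag(record):
    """Wrap a JSON-LD record in the script tag a publication page would embed."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record: none of these values describe a real publication.
example = dataset_jsonld(
    name="Example outpatient activity statistics",
    description="Illustrative record only; not a real publication.",
    url="https://example.org/data/outpatient-activity",
    licence_url="https://example.org/licence",
    publisher="Example Health Publisher",
    keywords=["outpatients", "hospital activity"],
)

print(as_script_tag(example))
```

Keeping the record as a plain dictionary means the same standardised meta-data could, in principle, feed both the page markup and a catalogue load, which is why consistent meta-data matters for findability.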

As well as exploring the data and improving things internally, we have also focused on investing in collaboration:

  • Founding a cross-organisation theme group focused on open data in health (let us know if you're interested!)
  • Collaborating with government bodies via the GSS COGs platform
  • Sponsoring hackathons (one with BMJ and Nuffield, the other with Imperial College) that use open health data to raise awareness of what's there (and get us closer to our users!)
  • Attending events, with invaluable support at ODI Leeds
  • Joining the GSS open data group

This is just the start of our journey, and we are now focusing on five high-level goals and six projects that aim to address the user challenges.

Understanding user challenges
Credit: Alistair Bullward, NHS Digital

The technology that underpins open data is key, but talk of RDF, graph databases and W3C standards can become unnecessarily complex. Keen readers may have noticed that I haven't mentioned technology until this point. Although the technology is vital, it isn't what motivates us. The amount of data will only increase, and high-quality open data that can be easily found, interpreted and analysed will open up the NHS's challenges to even more enquiring minds, support informed public debate and enable the UK to be at the forefront of developing innovative data solutions. NHS Digital is uniquely placed to build out the foundations of an exciting and innovative eco-system. No doubt there will be challenges ahead, but the journey should be an exciting one!

Over the coming months we will be publishing updates on our work on open data, initially focusing on capturing requirements for an open data portal and improving our meta-data. If anything is of interest or you have any questions, please get in touch!

Cheers,

Alistair