Geography - battle lines are drawn
On Thursday 4 June 2020, we hosted another excellent Open Data Saves Lives session, this time to discuss geography in the UK and how that relates to Covid-19 data, as well as the wider data community. Through experience of working with all sorts of data, we've learnt that UK geography is hard! When you scratch the surface, it's actually a very difficult topic, and is all too often a barrier to understanding datasets at a granular level. So we wanted to bring people together to swap tips, share knowledge, and discuss the pitfalls and flaws of the current system.
We had an excellent group of speakers, including Stuart from ODI Leeds, Dan Cookson, a GIS specialist, and Dr Robert Barr from the University of Liverpool. Stuart kicked off the session by giving us a brief overview of how UK administrative geographies work - all the way from Output Areas and LSOAs through to Local Authorities and Clinical Commissioning Groups (CCGs). The diagram below shows just how many different types of geography there are! He also shared the geographies page, put together by ODI Leeds with useful tips and resources we have collated.
Using maps to visualise data is an incredibly powerful tool, but can be difficult. At ODI Leeds, whenever we come across geographical data, we like to add it to our Data Mapper so that we can spot patterns in the data and share findings in a visually easy to understand way. The mapper allows the user to overlay layers, and we've added layers with boundaries for common types of geography such as wards and Local Authorities. It's a simple but powerful tool, and we've made it easy for others to visualise their data as well. Calderdale Council (one of our sponsors) regularly use the Data Mapper. They simply have to upload a dataset in GeoJSON format to Calderdale Dataworks and it will be added to the Data Mapper as a layer.
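For readers who have not met GeoJSON before, here is a minimal sketch of what a dataset in that format might look like, built with Python's standard library. The coordinates and the "name" property are placeholders invented for illustration, and we are not showing the exact fields the Data Mapper or Calderdale Dataworks expect. The one detail worth remembering is that GeoJSON lists longitude before latitude.

```python
import json

# A minimal GeoJSON FeatureCollection holding a single point.
# The coordinates and the "name" property are illustrative placeholders,
# not real data and not a schema required by any particular tool.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude]
                "coordinates": [-1.86, 53.72],
            },
            "properties": {"name": "Example site"},
        }
    ],
}

# Serialise to text, ready to be saved as a .geojson file
geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```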
Our next speaker, Dan Cookson, is a long-time friend of ODI Leeds, and an expert in geospatial analysis and data visualization who is using his expertise to map COVID-19 data. To understand the spread of the virus, we first need to understand the geography so that we can make sense of the data. What's the easiest way to understand geographical data? Put it on a map! Dan has taken this one step further and developed a fully interactive online map of COVID-19 data at MSOA level, which can be overlaid with layers including population density, deprivation, household size, and care home locations. This is a really powerful tool as it allows us to quickly see relationships and distributions which would be difficult to spot in a set of spreadsheets, for example.
One of the hardest types of geography to deal with is postcodes - they may seem simple, but are often misleading. A postcode is not a point, nor an area: it is a set of points (i.e. a set of addresses), and postcode area boundaries often cross Local Authority boundaries, which can be confusing. For example, 'LS' is the postcode area code for Leeds, so you might expect LS29 6AB to be in Leeds - it's actually in Bradford. Converting to other geographies is normally done by treating each postcode as a single point using the ONS Postcode Centroids table. However, this is an assumption, and will always introduce some degree of inaccuracy.
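To make the centroid assumption concrete, here is a small sketch using entirely synthetic data: four address points for one hypothetical postcode, and a made-up boundary (the line x = 0) between two invented authorities, "West" and "East". The centroid here is a simple mean of the points, which may not be how the ONS table is actually built. The point is only that a single point cannot represent addresses on both sides of a boundary.

```python
from statistics import mean

# Synthetic address points (x, y) for one hypothetical postcode.
# The boundary between two made-up authorities is the line x = 0.
addresses = [(-3.0, 1.0), (-2.0, 2.0), (-1.0, 1.5), (2.0, 1.0)]


def authority_for(point):
    """Assign a point to the hypothetical authority 'West' or 'East'."""
    x, _ = point
    return "West" if x < 0 else "East"


def centroid(points):
    """Return the mean x and mean y of a set of points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


# Treat the whole postcode as a single point...
centre = centroid(addresses)
assigned = authority_for(centre)

# ...then check which individual addresses that assignment gets wrong.
misplaced = [p for p in addresses if authority_for(p) != assigned]

print(f"Centroid {centre} puts the postcode in {assigned}")
print(f"{len(misplaced)} of {len(addresses)} addresses are actually elsewhere")
```

In this toy case the postcode is assigned to "West", even though one of its four addresses sits in "East". Any data attached to that address would be counted against the wrong authority.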
In the UK, the database of addresses is called the Postcode Address File (PAF), but unfortunately this is not available as open data. The ODI worked on a project exploring the concept of 'open addresses' but encountered considerable challenges when it came to establishing who owned what and if you were allowed to use it.
This brings us onto the next talk, from Dr Robert Barr from the University of Liverpool, another GIS expert who has previously worked on projects for Ordnance Survey, Royal Mail, and the National Census Service. He had an excellent question - why is the obvious solution being ignored? The solution he was referring to is UPRNs (Unique Property Reference Numbers). As the name suggests, these are unique identifiers for every single property in the UK, which Ordnance Survey hold, along with the exact geographical location for each one. So why aren't we using those instead of postcodes and addresses? The answer is complex. UPRNs are going to be released as open data soon, but addresses (and the link between UPRNs and addresses) will remain as closed data. It's the addresses that make UPRNs human-readable, and in most situations it's quite hard to use UPRNs to build human-facing services unless you buy AddressBase, which is managed by Ordnance Survey.
Hopefully things will change in the future, and Ordnance Survey (and other organisations like them) will become 'open by default', rather than the occasional release of limited excerpts of licensed data. At ODI Leeds, we've seen first-hand how releasing open data creates far more surplus benefits and values than trying to keep everything behind a paywall. These benefits are not as tangible as figures in a bank account but they are full of much more potential and can be far more practical. The act of letting more people see and use your data will make it better. They will find the errors, the ways to improve it, etc, and they will give you feedback. You save time and effort trying to do this internally, plus you can move forward with improvements that make your life better too. Why would people offer this help for free? Because they ultimately want to use the data as well. Everyone benefits.
Geography is a persistent challenge when working with data, and the stories from people who have joined us at #OpenDataSavesLives just emphasise the need for open data and collaborative approaches. We aim to connect people, share work, and get things done through these sessions. You can join us at any upcoming #OpenDataSavesLives session by visiting the mini-site or registering via the Eventbrite page.