ODI Leeds

How long is the coast? How many dimensions does it have?

Is the coast poor? Part 2

In the previous blog post of this project I explained how and why we’d spent two months creating maps like this of the island of Ireland.

Small areas of the whole island of Ireland with the income quintile (poorest fifth to richest fifth) within their countries. Colours are not strictly comparable between the two countries, but are reasonably close.

Since then we’ve been fixing the problems in our work, expanding it to more places, and expanding it to more datasets. That will be another blog post.

Here I want to explain what we learned from the first blog post and what we’re doing to fix things.

The most common reaction we had was “wow that’s cool, how can I use it?”. And our answer was “you can’t, the file is 500MB and we don’t know how to display that in a web browser”.

The second most common reaction we had was “wow that’s cool, but what is it showing me?”.

Both great questions, and in both cases the problem is the same. Not enough bandwidth.

Bandwidth

Those maps just have too much information on them. They are too complicated and too nuanced to extract safe insights from. They are also too complicated and too nuanced to display on the web easily.

So we need to simplify our maps. If you’ve ever used the fantastic mapshaper.org to simplify maps you’ll probably know about the Douglas-Peucker algorithm, which does exactly that. It’s built into almost every GIS library, and this is what it does to Ireland when you ask it to reduce the number of lines defining the coast from about 25,000 to 1,000.

The coast of Ireland can be simplified from 25,000 lines (left) to 1,000 lines (right) without losing its recognisable shape and key features.
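For readers who haven't met the algorithm, here is a minimal pure-Python sketch of Douglas-Peucker. It is not the code we ran, and it makes some assumptions: the coast is an open list of planar `(x, y)` points, the function names are ours, and the amount of simplification is set by a distance `tolerance` rather than a target number of lines (to hit a target like 1,000 lines you would adjust the tolerance until you get there).

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify an open polyline, dropping points within `tolerance` of the result."""
    if len(points) < 3:
        return list(points)
    # Find the interior point furthest from the chord joining the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    # If every interior point is close enough, the chord alone will do.
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep the furthest point and simplify each half separately.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

# Made-up coordinates, purely to exercise the function.
coast = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(coast, tolerance=0.5)
```

A bigger tolerance gives fewer lines. The recursion is fine for a toy example, but a real coast with tens of thousands of points is better handled by the version in a GIS library.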

With a simplified coast we can start projecting our data onto it. Specifically, for each of the about 1,000 lines that define the simplified coast we can ask what the average income quintile is within 10km.

Projecting extremely detailed small area income data onto the simplified coast retains key information while shrinking file size by over 1000 times.
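The projection step can be sketched in a few lines. This is an illustration under assumptions, not our production code: each small area is reduced to a single centroid point, coordinates are planar and in kilometres, the average is a plain unweighted mean, and the function names are made up for this post.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def project_onto_coast(coast, areas, radius):
    """For each coast line, average the values of areas whose centroid is within `radius`."""
    averages = []
    for a, b in zip(coast, coast[1:]):
        nearby = [value for centroid, value in areas
                  if point_segment_distance(centroid, a, b) <= radius]
        # A line with no small areas nearby gets None rather than a made-up number.
        averages.append(sum(nearby) / len(nearby) if nearby else None)
    return averages

# Made-up coast (in km) and made-up quintiles, purely to exercise the function.
coast = [(0, 0), (20, 0), (40, 0)]
areas = [((5, 3), 1), ((10, 8), 2), ((30, 5), 5), ((30, 50), 3)]
averages = project_onto_coast(coast, areas, radius=10)
```

With these toy numbers the first line averages the two areas near it and the second line picks up only one, so the output has one value per line of coast however many small areas went in. That is where the saving in file size comes from.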

This process leaves us with coastal income maps that are under 100kb and can easily be displayed in a browser. And because everything is calculated from raw data we can change the datasets we’re looking at, and expand the analysis to more places.

But we haven’t done that yet.

What we did instead is consider how far we could go with this simplification.

What if we reduce Ireland’s coast to 200 lines? What if we reduce it to 12?

Even with just 12 lines the coast of Ireland with income data projected onto it tells a correct and useful story.

The result is quite beautiful, and probably quite powerful. The North-West coast is poor. The East coast around Dublin is richest. Belfast generates prosperity in the North-East, and Cork and Limerick do the same in the South-West. Even with such simplicity we can tell much of the story of Ireland’s coastal economy.

But can we go even further? Can we reduce the coast to just one line?

Let’s look only at Northern Ireland, because when we’re developing new algorithms it is sensible to first work on the easiest case possible.

Of the four nations of the UK, Northern Ireland has the shortest coast. From Derry/Londonderry to Newry via Belfast is 27 lines at the level of simplification we used for the second map on this post. It’s a good level of simplification because it keeps Strangford Lough.

27 lines describe the Northern Ireland coast's key features and income characteristics well enough for many purposes.

When we draw these lines on a map we draw in two dimensions. This means that we have to draw our third dimension of data in another way.

Throughout this blog post and the last I’ve used colour to do that. It’s usually better than width of line, but it has many limitations. Humans can’t distinguish between colours well, especially if the colours aren’t next to each other. So we are limited to a traffic light style representation of the data – red, orange, yellow, green, and maybe blue. Those of us with limited colour perception can extract even less information.

But we can stretch out the lines of the coast into a single straight line. From Derry/Londonderry to Newry via Belfast is a line.

So we can draw the line along the x-axis of a graph, and instead of using colours to show where is richer and where is poorer along the coast, we can use bars on a graph.

Once we consider the coast as a line we can stretch it out into a straight line and put it on the x-axis of a graph.
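Stretching the coast out is just a running total of line lengths. Here is a small sketch of the idea, assuming planar coordinates and one value per line of coast; the function name and the numbers are made up for illustration.

```python
import math

def unroll_coast(coast, values):
    """Return (start, end, value) per line, with positions measured along the coast."""
    bars = []
    position = 0.0
    for (a, b), value in zip(zip(coast, coast[1:]), values):
        length = math.dist(a, b)  # straight-line length of this piece of coast
        bars.append((position, position + length, value))
        position += length
    return bars

# Made-up coast and made-up average quintiles, purely to exercise the function.
coast = [(0, 0), (3, 4), (3, 10), (11, 16)]
values = [2.0, 3.5, 4.0]
bars = unroll_coast(coast, values)
```

Each tuple is one bar on the graph: where it starts on the x-axis, where it ends, and how tall it is. Note that the widths, and so the total length of the x-axis, depend on how much the coast was simplified in the first place.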

Simplifying the coast in this way makes lots of things easier.

Once we start working with graphs instead of maps a much bigger set of questions becomes answerable in a much easier way. We can use a much bigger and more familiar toolkit of statistical tests to answer questions that we’ve been receiving like,

  • Is coastal deprivation linked to access to jobs?
  • Is coastal prosperity linked to the weather?
  • Is coastal happiness related to age profile?
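As a taste of what that toolkit looks like, once each line of coast has a value for two datasets we can ask how strongly they move together. The sketch below computes a Pearson correlation by hand. It is only an illustration: the numbers are invented, the second dataset (`jobs_within_reach`) is a hypothetical measure we have not built, and for ordinal data like quintiles a rank-based test may well be the better choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values, one per line of coast, purely to exercise the function.
income_quintile = [1.5, 2.0, 3.0, 4.5, 4.0]
jobs_within_reach = [10, 14, 22, 40, 31]  # hypothetical dataset
r = pearson(income_quintile, jobs_within_reach)
```

A value of `r` near 1 or -1 would suggest a strong link between the two along the coast, and a value near 0 would suggest little or none. It says nothing about cause.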

Most importantly we can share our data and our understanding more easily and generate more questions from more people.

We’ll probably need another month to tidy up our work from this month, share it, and expand it to more places and more datasets. We need to do a lot of thinking about how long the coast is. Once we've figured that out we'll share everything. The next blog post will have all of this stuff ready for you to play with.