Project Cygnus - I EPC what you did there
Heating
After working our way through the most straightforward research areas, we decided to take a closer look at the documentation of heating systems. We chose this focus not only because it is getting colder by the day, but mostly because heating is one of the four key aspects used to calculate EPC ratings. Because of this, it is one of the few mandatory fields to be filled in during an EPC inspection, making it relatively well documented.
We say relatively because the definition of heating systems and the resultant records vary from a focus
on overarching supply systems or fuels, to the actual fixtures like radiators installed in the
properties. For now we chose to direct our attention towards the former. We made this decision with
England's carbon-neutrality goals in mind, but also (selfishly) in the interest of exploring the
opportunities of the Leeds District Heating Scheme. But of
course that was easier said than done.
First, we revelled in the beauty of a mandatory field: no missing values! We did of course realise that
this was partly down to the extra fixture information recorded in the field. After dropping it, we found
that the number of unique values decreased significantly, leaving us with 32. Some of these were simply different versions
of the same thing and could be grouped together. However, there were a few fundamental issues with the
data. The first was that some of the observations only included information on fixtures without giving
any information about fuel or overarching heating system. These were grouped as 'unknown.' Then cases
that provided fixture information with a hint towards overarching systems (like electric storage
heaters) complicated grouping. While these were aggregated by main fuel type groupings in the ONS
report on EPCs, we decided to keep them separate to be able to explore their individual impact
on energy efficiency in the future. While this decision left us with a somewhat messy visualisation,
the mess appropriately highlights the need for standardisation throughout EPC data
creation. The final 14 groupings of heating systems can be seen in the following graph and can
be found on our
data mapper.

(from October 2008, last updated 30 June 2020).
When reduced to the groupings chosen in the ONS report, this graph gets a lot clearer, but some extra
assumptions need to be made. For example, all the cases that only include
electrical fixtures have to be grouped in with electric heating.

(from October 2008, last updated 30 June 2020).
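The two levels of grouping described above could be sketched roughly as follows. This is only an illustration: the keyword rules, the group names, and the example descriptions are assumptions made for the sketch, not the rules actually used for the graphs.

```python
import re
from collections import Counter

# Hypothetical keyword rules, checked in order. Word boundaries matter here:
# without them "oil" would also match inside "boiler".
DETAILED_RULES = [
    (r"\bcommunity\b", "community scheme"),
    (r"\bstorage heaters?\b", "electric storage heaters"),
    (r"\bmains gas\b", "mains gas"),
    (r"\blpg\b", "LPG"),
    (r"\boil\b", "oil"),
    (r"\belectric\b", "electric"),
]

# Collapsing to broader groups relies on extra assumptions, for example
# that fixture-style electric entries count as electric heating.
BROAD_GROUPS = {"electric storage heaters": "electric"}


def detailed_group(description):
    """Return the first matching detailed group, or 'unknown'."""
    text = description.lower()
    for pattern, group in DETAILED_RULES:
        if re.search(pattern, text):
            return group
    # Fixture-only descriptions carry no fuel or system information.
    return "unknown"


def broad_group(description):
    """Collapse a detailed group into its broader group where one is defined."""
    group = detailed_group(description)
    return BROAD_GROUPS.get(group, group)


# Made-up example descriptions, not taken from the Leeds data.
examples = [
    "Boiler and radiators, mains gas",
    "Electric storage heaters",
    "Boiler and radiators",
    "Room heaters, electric",
]
detailed_counts = Counter(detailed_group(d) for d in examples)
broad_counts = Counter(broad_group(d) for d in examples)
```

Keeping the two mappings separate makes the trade-off visible: the detailed counts stay messy but honest, while the broad counts are tidier at the price of the assumption written into `BROAD_GROUPS`.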
These two graphs show the trade-offs that need to be made when working with EPC data in its current form. Especially when it comes to the column associated with heating system descriptions, this can lead to uncertainties and potentially misleading generalisations. An option to avoid this may be to split the heating system and fixture descriptions into separate columns.
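As a rough idea of what separate columns could look like, here is a minimal sketch. It assumes, purely for illustration, that a description puts the fixture first and the fuel or system after a comma; real descriptions will not always follow that pattern, and the function and key names are hypothetical.

```python
def split_description(description):
    """Split one free-text description into fixture and system parts.

    Assumes a "fixture, system" layout. Anything without a comma is
    treated as fixture-only, so the system part stays empty (None).
    """
    fixture, _, system = description.partition(",")
    return {
        "fixture": fixture.strip() or None,
        "system": system.strip() or None,
    }


# Made-up example descriptions.
rows = [split_description(d) for d in (
    "Boiler and radiators, mains gas",
    "Room heaters",
)]
```

With two columns, a missing system shows up as an explicit gap rather than being hidden inside a filled-in mandatory field.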
Given the necessary time, resources, and ultimately usable data, this could be connected with heating
cost to deliver some valuable insights into the cost of carbon neutrality. Especially in the context of
local geography there may be promise in exploring the potential of upgrading heating systems.
Green Deal
Given our interest in environmental efforts, we decided to take a closer look at the documentation of
government incentives in the EPC data, to paint a picture of how well they actually work. To do this, we
first cleaned and graphed the transaction types column of the EPC data set for Leeds. This is also
available on the data mapper.

(from October 2008, last updated 30 June 2020).
The visualisation above shows that the most prominent reasons for EPC inspections in Leeds were property rentals and marketed sales. But it also includes a range of assessments for government incentives like ECO and FiT, which have been explored by IPPR and Ofgem before. For us, the Green Deal related data is especially interesting since it captures EPC assessments on the same dwellings before and after its implementation, in the timeframe of 2013 to 2015. To compare these, we filtered the data to retain only reasonable observations of duplicated building reference numbers that were collected right before and after the implementation of the Green Deal. This left us with 490 observations. While this is only a small subset of the openly available EPC data for Leeds, there are still some observations that can be made:

This first visualisation shows that the energy efficiency of buildings generally increases after the
implementation of the Green Deal, on average by about one band. However, it is interesting to
note that there are some cases where the energy efficiency has decreased following Green Deal measures.
These and the cases where energy efficiency has increased dramatically invite further exploration to
uncover if these are merely outliers or opportunities to inform future policy.
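A comparison like this could be sketched as below: pair up inspections that share a building reference, then measure the change in band between the earliest and latest one. The records, field layout, and function name are invented for illustration and are not the actual Leeds data or filtering rules.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Bands ordered worst to best, so a higher index means a better rating.
BANDS = "GFEDCBA"

# Hypothetical records: (building reference, inspection date, current band).
records = [
    ("b1", date(2013, 3, 1), "E"),
    ("b1", date(2014, 9, 1), "D"),
    ("b2", date(2013, 6, 1), "D"),
    ("b2", date(2015, 1, 1), "E"),
    ("b3", date(2014, 2, 1), "F"),
    ("b3", date(2015, 4, 1), "D"),
    ("b4", date(2014, 5, 1), "C"),  # only one inspection, so it is dropped
]


def band_changes(records):
    """Return the band change (in whole bands) per repeated building reference."""
    by_building = defaultdict(list)
    for ref, inspected, band in records:
        by_building[ref].append((inspected, band))

    changes = {}
    for ref, visits in by_building.items():
        if len(visits) < 2:
            continue  # no before/after pair to compare
        visits.sort()  # earliest inspection first
        before, after = visits[0][1], visits[-1][1]
        changes[ref] = BANDS.index(after) - BANDS.index(before)
    return changes


changes = band_changes(records)
average_change = mean(changes.values())
declined = [ref for ref, change in changes.items() if change < 0]
```

Listing the dwellings in `declined` separately is one way to pick out the cases worth a closer look.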


The next two graphs show the difference between the original and the achieved energy efficiency and the potential proposed prior to the implementation of Green Deal measures. They underline the positive effect of the Green Deal, since the distance to the past potential halved from roughly two bands to only one, and the minimum energy efficiency present has increased.
Here it would again be interesting to look at those cases that have surpassed the old potential to find out what exactly they did right. It would also be useful to determine whether the cases at the lower end of the spectrum have improved at all or if their improvement was insignificant.
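The distance to the old potential, and the two groups singled out above, could be computed along these lines. The dwellings and names here are made up for the sketch; only the idea of counting the gap in bands comes from the graphs.

```python
# Bands ordered worst to best, so a higher index means a better rating.
BANDS = "GFEDCBA"


def band_gap(current, potential):
    """Number of bands between a rating and the potential (negative if surpassed)."""
    return BANDS.index(potential) - BANDS.index(current)


# Hypothetical dwellings: (band before, potential stated before, band after).
dwellings = {
    "b1": ("E", "C", "D"),
    "b2": ("F", "D", "C"),
    "b3": ("D", "B", "D"),
}

summary = {}
for ref, (before, potential, after) in dwellings.items():
    summary[ref] = {
        "gap_before": band_gap(before, potential),
        "gap_after": band_gap(after, potential),
        # Ended up better than the potential proposed beforehand.
        "surpassed": band_gap(after, potential) < 0,
        # Moved up at least one band.
        "improved": BANDS.index(after) > BANDS.index(before),
    }

surpassed = [ref for ref, row in summary.items() if row["surpassed"]]
unchanged = [ref for ref, row in summary.items() if not row["improved"]]
```

Working in whole bands hides small changes within a band, so deciding whether an improvement was insignificant would need the underlying numeric scores.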
These initial insights show that while the EPC data can be difficult to tame, it holds potential for the exploration of the effectiveness of government incentives. This exploration, however, could be much more detailed and useful given additional information.
For example, the final set of Leeds-based Green Deal related data used here is only a small subset of the already limited openly available EPC data. This means that there is little room to make representative assumptions about spatial and temporal factors affecting the discovered energy efficiency differences. If the entirety of the EPC data were to be used, not only for Leeds but for all of England and Wales, there might be potential to reliably pinpoint such trends in Green Deal efficiency. On top of this, the research would benefit from more detailed insight into the local allocation and amount of the money spent on dwellings that passed the Green Deal assessment. This might offer some explanation for the differences in energy efficiency improvements.
So what happens next?
Making a data-driven tool or visualisation is always a great output. It's shareable, it tells a story, it engages people. But sometimes it isn't possible because the underlying data can't support it. As frustrating as that can be, it is not entirely fruitless. Instead of highlighting an interesting discovery, we can highlight the problems that have been barriers to discovery and then work towards removing those barriers. It might require big, sweeping changes in methods or the introduction of proper standardisation, but the benefits of better quality data are bigger and long-lasting. The dive into EPC data is a great example of this - there is masses of potential in the data, and much to gain from it being better. As the Climate Crisis has brought Net Zero goals forward, now is the time to reevaluate EPC data in a greener and more open context, for the benefit of everyone.

