This is a very niche data gremlin but is something that turns up time and time again when I'm working with open datasets. It can seem unimportant but it has a surprising impact that I'll come to later. My complaint is about significant figures. No, not privacy and anonymisation issues. I'm not referring to people; this is about numbers. Too many numbers.
Open datasets - particularly geographic ones - often list latitudes and longitudes to an excessive number of decimal places. What do I mean by excessive? It is very common to see 15 decimal places given (e.g. 53.796782814910291, -1.533757130238925) but I've seen as many as 51 decimal places! None of this is because a human has decided to provide that many digits or because the measurements are that accurate. It is always down to the default export options of the software generating the dataset. Most of those decimal places are artefacts of the way computers store floating-point numbers internally and aren't 'real'. How do I know that? Well, let's work it out.
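As a rough sketch of that working-out, the snippet below estimates how much ground distance one unit in the last decimal place of a latitude represents. It assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification for illustration only; the constant and function names are my own.

```python
import math

# Assumption: a spherical Earth with a mean radius of 6,371 km.
# Real geodesy uses an ellipsoid, but this is close enough for scale.
EARTH_RADIUS_M = 6_371_000
METRES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # roughly 111 km


def resolution_in_metres(decimal_places: int) -> float:
    """Ground distance covered by one unit in the last decimal place of a latitude."""
    return METRES_PER_DEGREE * 10 ** -decimal_places


for places in (5, 6, 15):
    print(f"{places} decimal places: about {resolution_in_metres(places):.2g} m")
```

Under that assumption, 5 decimal places resolve to about a metre and 6 to about 10 cm, while 15 decimal places correspond to roughly a tenth of a nanometre. Longitude spacing shrinks further as you move away from the equator, so these figures are upper bounds.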
...if a particle or dot about 0.005mm in size (which is the same size as a small grain of silt) were magnified in size to be as large as the observable universe, then inside that universe-sized "dot", the Planck length would be roughly the size of an actual 0.005mm dot.
This overzealous quoting of decimal places is amusing but it has a practical effect. Those unnecessary, insignificant digits increase the sizes of files. How much depends on the amount of other data included but I often find that I can reduce the size by 20-40% just by limiting the coordinates to 5 or 6 decimal places. This ONS dataset of ward boundaries (accurate to 20 metres) could be reduced from 46.5 MB to 24.8 MB if they truncated their decimal places. As many of the things I do are web visualisations, it is good to save bandwidth for both me and the end user (especially if they are paying per MB on mobile devices).
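If your export tool doesn't offer a precision option, the coordinates can be trimmed afterwards. Here is a minimal sketch in Python, assuming the data is a GeoJSON FeatureCollection already loaded as a dictionary. It rounds rather than truncates, the helper names (`round_coords`, `trim_geojson`) are hypothetical, the sample feature is made up, and GeometryCollection geometries are not handled.

```python
import json


def round_coords(coords, places=6):
    """Recursively round a (possibly nested) GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, places)
    return [round_coords(c, places) for c in coords]


def trim_geojson(feature_collection, places=6):
    """Return a copy of a FeatureCollection with rounded coordinates."""
    features = []
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry")
        if geometry and "coordinates" in geometry:
            geometry = {
                **geometry,
                "coordinates": round_coords(geometry["coordinates"], places),
            }
        features.append({**feature, "geometry": geometry})
    return {**feature_collection, "features": features}


# Made-up sample using the coordinates quoted earlier (GeoJSON order is lon, lat).
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "example"},
            "geometry": {
                "type": "Point",
                "coordinates": [-1.533757130238925, 53.796782814910291],
            },
        }
    ],
}

trimmed = trim_geojson(sample, places=6)
before = len(json.dumps(sample))
after = len(json.dumps(trimmed))
print(f"{before} bytes before, {after} bytes after")
```

The saving on a single point is tiny, but boundary files are almost entirely coordinates, which is where the larger reductions come from.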
In summary, dear data publishers, please check your export options and give a precision that matches the accuracy of your data. Also, please check out our Open Data Tips for more suggestions (and add to them).
ODI Leeds Data Projects