Post Hoc — Analysing OSM Postbox and Post Office Data in the UK

OpenStreetMap records the location of postboxes and Post Offices using the tags amenity=post_box and amenity=post_office respectively. The Post Hoc tool provides statistics, reports and maps to compare the data in OpenStreetMap with Matthew Somerville's Dracos: Locating Postboxes webapp and a list of Post Office locations compiled by Edward Betts from Royal Mail's Post Office finder.

Postbox Summary Statistics

Source                        Postboxes  % of Royal Mail list (old)  Import date
Royal Mail list (old)         115684     -                           2011-08-11 00:00 UTC
Royal Mail list (new)         117538     -                           2012-07-18 00:00 UTC
OSM                           42794      37.0%                       2013-05-18 00:00 UTC
OSM (with valid-looking ref)  26579      23.0%                       2013-05-18 00:00 UTC
Dracos                        56310      48.7%                       2013-05-18 11:35 UTC

Post Office Summary Statistics

Source                   Post Offices  % of Royal Mail list  Import date
Royal Mail List          10989         -                     1970-01-01 00:00 UTC
OpenStreetMap            7653          69.6%                 2013-05-08 00:00 UTC
OpenStreetMap (matched)  0             0.0%                  2013-05-08 00:00 UTC

General Postbox Reports

Too Far Apart
List of box references found in both OSM and Dracos data, with positions that differ by a significant amount.
OSM Reference Issues
Results of various checks on the values in 'ref' fields of post box nodes on OpenStreetMap.
OSM Collection Times Issues
Postboxes with possible errors in the collection_times key.
OSM Royal Cypher Issues
Postboxes with possible errors in the royal_cypher key.
OSM Box Type Report
Report on usage of tags to describe box types.
OSM Fixme Report
List of all post boxes with a fixme=* tag.
Dracos Duplicates
Pairs of postboxes located by Dracos from different postcode districts but with almost identical locations.
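
As an illustration of the kind of test behind the OSM Reference Issues report and the "valid-looking ref" count, here is a minimal sketch in Python. It assumes a box reference consists of a postcode district, a space, and a box number with an optional trailing letter. That pattern, the function name and the problem messages are my assumptions for the example, not the checks Post Hoc actually runs.

```python
import re

# Assumed shape: postcode district, a space, then a box number with an optional
# trailing letter, e.g. "XX1 23" or "XX12 345D". This is a guess at the format,
# not Royal Mail's specification.
REF_PATTERN = re.compile(r"[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9]{1,4}[A-Z]?")


def check_ref(ref):
    """Return a list of problems found in one OSM ref value (empty if none)."""
    problems = []
    if ref != ref.strip():
        problems.append("leading or trailing whitespace")
    value = ref.strip()
    if ";" in value:
        # Two boxes mapped as one node, or a tagging mistake.
        problems.append("multiple values in one ref")
    elif not REF_PATTERN.fullmatch(value.upper()):
        problems.append("does not look like a box reference")
    elif value != value.upper():
        problems.append("lower-case letters")
    return problems


if __name__ == "__main__":
    # Made-up refs, purely for demonstration.
    for sample in ["XX1 23", "xx1 23", "XX1 23;XX1 24", "postbox"]:
        print(repr(sample), check_ref(sample))
```

A ref that returns an empty list would count as "valid-looking" under these assumptions; anything else would be listed in the report with its problems.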

Local maps showing potential issues to survey

Generate a map showing possible issues with OSM postbox or post office data in a limited area. The area will be centred on the postcode entered, but may cover more or less than that postcode region. The data is fixed from the initial postcode, and so won't be reloaded as you move around the map, but if you click the 'permalink' the refreshed map will show data centred on the current location. Once on the map, you can also obtain a GPX file of the postbox issues for download, which may be useful to those wanting to survey the problems.

Enter first half of postcode:
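
For anyone who wants to build a similar survey file from their own list of issues, the sketch below writes waypoints as GPX 1.1 using only the Python standard library. The issue dictionaries (keys lat, lon, ref, problem), the function name and the creator string are hypothetical; this is not the code that produces Post Hoc's downloads.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"


def issues_to_gpx(issues):
    """Serialise a list of issue dicts as a GPX 1.1 document (a string).

    Each issue is assumed to have 'lat', 'lon', 'ref' and 'problem' keys;
    that layout is made up for this example.
    """
    # Emit GPX as the default namespace rather than with an ns0: prefix.
    ET.register_namespace("", GPX_NS)
    gpx = ET.Element(
        f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": "post-hoc-sketch"}
    )
    for issue in issues:
        wpt = ET.SubElement(
            gpx,
            f"{{{GPX_NS}}}wpt",
            {"lat": f"{issue['lat']:.6f}", "lon": f"{issue['lon']:.6f}"},
        )
        ET.SubElement(wpt, f"{{{GPX_NS}}}name").text = issue["ref"]
        ET.SubElement(wpt, f"{{{GPX_NS}}}desc").text = issue["problem"]
    return ET.tostring(gpx, encoding="unicode")


if __name__ == "__main__":
    # Made-up issue, purely for demonstration.
    sample = [{"lat": 52.2, "lon": 0.12, "ref": "XX1 23",
               "problem": "collection_times looks malformed"}]
    print(issues_to_gpx(sample))
```

Each waypoint carries the box reference as its name and the problem as its description, so the issue is visible on a GPS unit while surveying.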

Can we import Dracos' postbox data to OSM?

There are three issues to consider here: first the legal question of whether the data is sufficiently free to use, secondly the question of data quality, and thirdly the merging procedure.

On the legal side, Matthew Somerville says that, as far as he is concerned, the data is public domain. However, the positions have been derived from the OSM map (which isn't a problem for us to re-import, but may affect other uses), and more importantly, other data such as the box number and collection times come from a list provided by Royal Mail. Even if the individual items are ineligible for copyright as they are facts, there is still a Crown Database Right over the collection of data. My conclusion is that we would be free to import box locations to OSM, but not to systematically use any of the other data.

On the data quality side, these scripts have revealed quite a number of errors in the Dracos data, and also in the Royal Mail list itself, with missing boxes, duplicate boxes, and mis-located boxes. Many box locations are more than 50 m from where OSM contributors have independently placed them. Since most of the independent Dracos data comes from people clicking on a map, we should probably expect greater accuracy from OSM, where many will use GPS waypoints.
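
To make the distance comparison concrete, here is a minimal sketch of how boxes present in both datasets could be flagged when their positions disagree. It uses the haversine great-circle formula with a mean Earth radius; the 50 m default threshold, the function names and the ref-to-coordinate dictionaries are assumptions for this example, not necessarily what the Too Far Apart report does.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, good enough at this scale


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def too_far_apart(osm, dracos, threshold_m=50.0):
    """Compare two {ref: (lat, lon)} dicts.

    Returns (ref, distance_m) pairs for refs present in both datasets whose
    positions differ by more than threshold_m, largest distance first.
    """
    flagged = []
    for ref in osm.keys() & dracos.keys():
        d = haversine_m(*osm[ref], *dracos[ref])
        if d > threshold_m:
            flagged.append((ref, d))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up refs and coordinates, purely for demonstration.
    osm = {"XX1 1": (52.0, 0.0), "XX1 2": (52.0, 0.0)}
    dracos = {"XX1 1": (52.0001, 0.0), "XX1 2": (52.001, 0.0)}
    for ref, dist in too_far_apart(osm, dracos):
        print(f"{ref}: {dist:.0f} m apart")
```

Only refs found in both inputs are compared, so boxes missing from one side need a separate check.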

Finally, if we were to attempt an import, we would need to consider how to merge the new data with existing data. Points where coordinates of a box exactly coincide are easy to discount, but what do we do with nearby locations that may or may not be different in reality? If we're not allowed to use the box numbers from the Royal Mail data, things will be very difficult indeed.
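
To show why this is awkward without box numbers, here is a hedged sketch of a purely positional merge test. Each candidate point is classified by its distance to the nearest existing OSM postbox: "duplicate" if it is within a few metres, "ambiguous" if it is close enough that it might be the same box, and "new" otherwise. The 5 m and 50 m thresholds and the function names are illustrative assumptions, and the distance uses an equirectangular approximation, which is adequate over tens of metres.

```python
import math

EARTH_RADIUS_M = 6371000.0


def approx_distance_m(a, b):
    """Equirectangular approximation of the distance in metres between two
    (lat, lon) points; fine for points that are close together."""
    lat1, lon1 = a
    lat2, lon2 = b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def classify_candidate(candidate, existing, same_m=5.0, nearby_m=50.0):
    """Classify one candidate (lat, lon) against existing OSM positions.

    The thresholds are illustrative guesses, not tuned values.
    """
    nearest = min(
        (approx_distance_m(candidate, p) for p in existing), default=math.inf
    )
    if nearest <= same_m:
        return "duplicate"   # almost certainly the same box
    if nearest <= nearby_m:
        return "ambiguous"   # same box mis-placed, or a second box nearby?
    return "new"


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    existing = [(52.0, 0.0)]
    for point in [(52.0, 0.0), (52.0002, 0.0), (52.001, 0.0)]:
        print(point, classify_candidate(point, existing))
```

Everything that falls in the "ambiguous" band would need a human to look at it, or a box number to settle it, which is exactly the data we may not be allowed to use.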

In conclusion, I believe it would be better to maintain the two datasets in parallel and use them for cross-checking, with analysis tools like the ones I've developed here.