Messy Data

We are planning a hiking vacation in a new area, so it is time to generate some topographic hiking maps. On one of the days we will be visiting a National Historic Site that has the ruins of a cavalry fort from the late 1800s.

At first glance the area seems to be fairly well mapped. But when I rendered a hiking map for the area, the ruins, which appear to be not much more than foundations in aerial imagery, were rendered as buildings. So it is time to look into the data.

Documented Tagging of Ruins

The OpenStreetMap Wiki has a page on ruins. It suggests two ways of tagging summarized as:

Tagging with ruins=yes              Alternative tagging
historic=castle and ruins=yes       historic=ruins and ruins=castle
building=yes and ruins=yes          ruins:building=yes

Of these choices, I prefer the ruins:building=* form, as it means a single tag can be looked at when rendering the object. It is also consistent with the lifecycle prefix tagging form, and ruins: is actually listed as one of the lifecycle prefixes used for stages of decay.

Conclusion: My canonical form will be ruins:building=*

Tagging in the Wild

But the buildings at that historic site were tagged as building=collapsed. For the purpose of rendering a hiking map I don’t care what type of building exists, so I don’t look at the value, only at the existence of the building=* tag. No wonder those ruins were rendered as existing buildings.

Looking at the building=* values in TagInfo, I see some other values that probably imply a ruin rather than a building:

  • building=ruin
  • building=ruins
  • building=ruined
  • building=collapsed
  • building=partially_collapsed

And, of course, there are some variations on the prefix scheme. For example, I found a ruined:building=* tag in use.
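To make the mapping concrete, here is a minimal Python sketch of folding these variants into the ruins:building=* form. It is an illustration only, not the code I actually run: the function name canonicalize_ruins and the plain dict of tags are assumptions, and the value list is just the one from TagInfo above. When the building=* value only says "ruin", the original building type is unknown, so the sketch falls back to ruins:building=yes.

```python
# building=* values that (per the list above) probably mean a ruin.
RUIN_BUILDING_VALUES = {"ruin", "ruins", "ruined", "collapsed", "partially_collapsed"}

# Prefix variants that should be folded into the canonical "ruins:" prefix.
VARIANT_PREFIXES = ("ruined:",)


def canonicalize_ruins(tags):
    """Return a copy of tags with ruined buildings expressed as ruins:building=*."""
    out = dict(tags)

    # ruined:building=X -> ruins:building=X (an existing canonical tag wins).
    for prefix in VARIANT_PREFIXES:
        key = prefix + "building"
        if key in out:
            value = out.pop(key)
            out.setdefault("ruins:building", value)

    building = out.get("building")
    is_ruin_value = building in RUIN_BUILDING_VALUES
    if building is not None and (is_ruin_value or out.get("ruins") == "yes"):
        # Drop building=* so a renderer that only tests for the key skips it.
        del out["building"]
        # ruins=yes is left in place; it may also describe other tags
        # on the same object, such as historic=castle.
        out.setdefault("ruins:building", "yes" if is_ruin_value else building)

    return out
```

With this, a renderer that only checks for the existence of building=* no longer sees the collapsed fort buildings, while ordinary buildings pass through unchanged.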

Canonical Form

When dealing with variations in tagging there are three places it can be done:

  • In the rendering stylesheet (Mapnik XML)
  • When rendering the database contents by either fancy select statements or stored procedures
  • In the OpenStreetMap data before loading it into the database

In my early endeavors I simply put larger and larger “filters” on each rendering rule in the Mapnik XML. That often required some additional tweaking of the database select statements as well.

When I got to the point of rendering names, it became infeasible to deal with the complexity in the Mapnik stylesheet, so I started using stored procedures in the database.

And once I decided to add information from hiking routes into the trail names it was easier to pre-process the OpenStreetMap data before putting it into the database.

For now, I have decided to put the logic into my pre-processing. That makes the database select statements easy and means I don’t have to re-learn the database stored procedure dialect each time I want to make a change.
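As a rough idea of what such a pre-processing pass could look like, the sketch below rewrites the tags of every node, way and relation in a small OSM XML document using only the Python standard library. It is not my actual pipeline: rewrite_tags and collapsed_to_ruins are hypothetical names, the input is assumed to be OSM XML held in a string, and a real extract would probably need a streaming parser rather than loading everything into memory.

```python
import xml.etree.ElementTree as ET


def rewrite_tags(osm_xml, normalize):
    """Apply normalize(tags) -> tags to each node, way and relation.

    osm_xml is an OSM XML document as a string; the rewritten document
    is returned as a string.
    """
    root = ET.fromstring(osm_xml)
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        tag_children = element.findall("tag")
        old = {t.get("k"): t.get("v") for t in tag_children}
        new = normalize(old)
        if new == old:
            continue  # nothing to change for this object
        # Replace the old <tag> children with the normalized set.
        for t in tag_children:
            element.remove(t)
        for key, value in new.items():
            ET.SubElement(element, "tag", {"k": key, "v": value})
    return ET.tostring(root, encoding="unicode")


def collapsed_to_ruins(tags):
    """Hypothetical single-rule normalizer, for illustration only."""
    tags = dict(tags)
    if tags.get("building") == "collapsed":
        del tags["building"]
        tags["ruins:building"] = "yes"
    return tags


sample = (
    '<osm version="0.6">'
    '<way id="1"><nd ref="10"/><tag k="building" v="collapsed"/></way>'
    '</osm>'
)
result = rewrite_tags(sample, collapsed_to_ruins)
```

Keeping the tag rules in a plain function that takes and returns a dict means the rules can be changed and tested without touching the database or the stylesheet.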