California Hiking Routes in OpenStreetMap

Introduction

A fairly recent changeset to OpenStreetMap let me know that the San Clemente Beach Trail is apparently part of the California Coastal Trail system. There were a number of things I did not agree with on the changeset:

  • Renaming a signed city trail to something that is not signed or marked on the ground.
  • Extending an official city trail along an “informal” trail.
  • Placing that informal extension in the middle of a steep hillside (basically a cliff).
  • Not using the existing California Coastal Trail hiking route relation to identify the ways as part of the long distance trail.

I cleaned that up and along the way also learned that apparently some of the same OSM ways are part of the California Missions Trail, for which you can download KML files and which seems to have an organization making a profit from it.

A bit later I decided it was time to run my scripts to generate a new set of PDF topographic hiking maps and realized that a couple of my maps cover sections of the Pacific Crest Trail but it was not obvious which trail in an area was the official route. It seemed a good idea to annotate the local trail name with something indicating it was part of a long distance hiking route. For my local beach walk I could label the trail with “San Clemente Beach Trail (California Coastal Trail)” or “San Clemente Beach Trail (CCT)”. For the part of the Pacific Crest Trail coming out of the San Jacinto Mountains, Falls Creek Road could be labeled as “Falls Creek Road (PCT)”.

Hiking Routes

My first guess was I could find the names of any long distance routes a trail was part of and append them to the name. Something like:

“Local Hill Trail (Scenic Trail, Other Scenic Trail)”

Looking at a number of California hiking routes I became very discouraged. For example, using this scheme there is an OSM hiking trail that would be labeled as:

“Seaview Trail (American Discovery Trail, ADT – California – T – Seg 8, Mokelumne Coast to Crest Trail, East Bay Skyline National Recreation Trail, Bay Area Ridge Trail – 02 – East Bay)”

Pretty ugly and probably not useful for the map user at all.

Digging deeper, there are inconsistencies in the use of route=hiking and route=foot: some routes follow the wiki guidance and use route=hiking while others use route=foot. There are also inconsistencies in the network identifier for the nested route relation hierarchies. Is the Pacific Crest Trail (PCT):

There seems to be a tendency to use the name tag value as a description. The naming of the various Pacific Crest Trail relations shows evidence of this. But there are much more obvious examples like the “California Mission Trail – day 05 Oceanside to San Clemente” route relation. (In this case I haven’t yet found evidence that the guide book apparently used to come up with this has licensing compatible with OpenStreetMap. I need to get back to the mapper who added it to clarify.)

And there seems to be a pattern of using the descriptive route name for the member way’s name value. For example there are many ways in the California PCT relation with name=“Pacific Crest National Scenic Trail #2000”. I have only hiked a very small fraction of the PCT system but every sign I have seen on the ground has either “Pacific Crest Trail” or “PCT”, so it seems exceedingly unlikely to me that the “Pacific Crest National Scenic Trail #2000” names are correct. Perhaps they could or should have official_name=“Pacific Crest National Scenic Trail #2000” (though the “#2000” part seems wrong regardless). But I am very sure they shouldn’t have that as a name tag value.

Hiking Route Consumers

The next question that came to mind is how do other consumers of hiking routes deal with this.

If there are lots of consumers for a set of tags and/or a type of relation then there is a lot of feedback on how good the tagging/relation scheme is and how well mappers are following the scheme. And it is much more likely that editing tools will be developed to assist in assuring consistent mapping. For example, plug-ins or native editor support exist for navigation relations (turn restrictions, etc.) and driving instructions (turn lanes, etc.). Hiking route relations have not had this level of attention.

In the case of hiking routes, the only widespread consumer/renderer I am aware of is Waymarked Trails (WMT), which does a great job of allowing you to explore hiking route relations, including showing the tagging and giving links into OpenStreetMap for the relations, etc. It has given me a few ideas on rendering this very messy data.

WMT may actually be too good at what it does: for example, it hides mapping errors like using a descriptive sentence in the name field. A mapper who just looks at the overall map display in WMT will see their hiking or foot route with a short string that looks like a ref value regardless of whether the route has a network tag, whether the network tag is more of a descriptive sentence, or even whether there is a ref tag at all.

I imagine mappers looking at WMT to check their work, seeing their trail system at the top level with a nice three letter designation, and not realizing that they have bad things in there that don’t follow the hiking relation or even the name tagging conventions given in the OpenStreetMap wiki.

Setting a Goal

I think I should:

  • Display the ref=* tag value for the route rather than the name.
  • A lot of ref=* values seem to also be descriptive in form with the first portion being the real route reference. (e.g. “ADT – California – T – Seg 8” has a ref of “ADT T8” rather than just “ADT” for “American Discovery Trail”.) Some of this can be taken care of by keeping only the first token of the ref value.
  • If a way is a member of more than one version of the route (e.g. “American Discovery Trail” and “ADT – California – T – Seg 8”) they likely have the same ref value. So only use unique values.
  • Some local hiking “routes” seem to be simply a collection of local trails in the same park. And some of those have no ref=* and/or no network=* tags. So ignore things that are tagged as network=lcn or have no network tag and/or ignore routes that lack a ref tag.

With these tweaks the unwieldy “Seaview Trail (American Discovery Trail, ADT – California – T – Seg 8, Mokelumne Coast to Crest Trail, East Bay Skyline National Recreation Trail, Bay Area Ridge Trail – 02 – East Bay)” would become a much more displayable “Seaview Trail (ADT, BAR)”. (Neither the “East Bay Skyline National Recreation Trail” nor the “Mokelumne Coast to Crest Trail” have ref tags at this time.)
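The rules in the list above can be sketched in a few lines of Python. This is only an illustration, not the code my maps actually use: the function names, the dictionary layout for route tags, and the network values and the “BAR” ref in the sample data are all assumptions made for the example.

```python
def short_refs(routes, ignored_networks=frozenset({"lcn"})):
    """Return the unique, shortened ref values for the routes a way is in.

    `routes` is a list of dicts, one per route relation, holding that
    relation's tags (a hypothetical layout chosen for this sketch).
    """
    refs = []
    for tags in routes:
        ref = tags.get("ref", "").strip()
        network = tags.get("network")
        # Ignore routes with no ref, no network, or an ignored network.
        if not ref or network is None or network in ignored_networks:
            continue
        token = ref.split()[0]   # keep only the first token: "ADT T8" -> "ADT"
        if token not in refs:    # only unique values, in first-seen order
            refs.append(token)
    return refs


def trail_label(way_name, routes):
    """Append the shortened route refs, if any, to the local trail name."""
    refs = short_refs(routes)
    return f"{way_name} ({', '.join(refs)})" if refs else way_name


# Sample data: the network values and the "BAR" ref are assumed, not
# copied from the actual relations.
seaview_routes = [
    {"name": "American Discovery Trail", "ref": "ADT", "network": "nwn"},
    {"name": "ADT – California – T – Seg 8", "ref": "ADT T8", "network": "nwn"},
    {"name": "Mokelumne Coast to Crest Trail", "network": "rwn"},
    {"name": "East Bay Skyline National Recreation Trail", "network": "rwn"},
    {"name": "Bay Area Ridge Trail – 02 – East Bay", "ref": "BAR", "network": "rwn"},
]
print(trail_label("Seaview Trail", seaview_routes))
```

Keeping the first-seen order rather than sorting means the label stays stable as long as the relations are read in the same order.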

How to Implement?

Dealing With Bad Data

I have neither the desire nor the time to clean up the hiking route relations and their member ways in their entirety. I can and will do it for the small areas that I actually hike in. And, once the current US Forest Service closures due to extreme fire conditions are over, getting out to hike some of the sections near me to verify names, etc. will be a welcome activity.

I feel an on-the-ground survey to verify names before they are changed is a good thing. But I am not in a position to hike the entirety of any one of the long distance routes that pass through my area, much less hike the entirety of all of the routes.

So my implementation will have to deal with bad data. My plan, at present, is to catch things I don’t like on my printed/PDF maps and then, with string matching and hard coded logic in my mapping scripts, clean up the OSM data when I load it into my rendering SQL database. I expect that the clean up part will be an ongoing effort.
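As a rough idea of what “string matching and hard coded logic” could look like, here is a minimal Python sketch of a rule table applied to ref values. The table name, the function name and the single rule shown are hypothetical; the real list would grow as problems show up on the printed maps.

```python
import re

# Hypothetical rule table: (pattern matched against the whole raw ref,
# replacement to use instead). New rules get appended as bad data is found.
REF_RULES = [
    (re.compile(r"^CMT\d+$"), "CMT"),  # per-day refs such as "CMT05"
]


def clean_ref(raw_ref):
    """Return the cleaned ref, or the stripped input if no rule matches."""
    ref = raw_ref.strip()
    for pattern, replacement in REF_RULES:
        if pattern.match(ref):
            return replacement
    return ref
```

Keeping the rules in one table means each new oddity costs one line rather than another branch of logic.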

Many to Many

Route relations are “many to many”. Each route relation contains multiple ways. And any given way may be a member of more than one route relation.

When rendering a trail name, a list of the hiking routes the way is part of is needed to get each route’s ref value. That is not efficient with the SQL table structure set up by osm2pgsql. In the past there was at least one experiment for rendering highway route shields and I considered using that method.

But I decided a pre-processing step using a scripting language I am more comfortable with would be easier. I had already scripted the update of my PostgreSQL database with a pre-processing step to identify/differentiate walkways from trails, so extending that seemed like the least work.

In the PBF extract files, relations are at the end of the extract and so are found only after all the ways have been processed. And the OSM PBF extract files for my area are way too big to be held in the RAM of my little rendering computer, so two passes over the PBF file are needed.

I added a new script which reads .osm data from stdin and builds a table of ways with the route refs for each way. The route refs might be modified by custom string matching logic as desired. For example, the “California Mission Trail – day 05 Oceanside to San Clemente” has a ref=CMT05 which is cleaned up to “CMT”. The results of this scan are written to a temporary JSON file which is then read by a modified version of the script written to detect if a way is a walkway or trail. That older script now imports the JSON file and then, for each way, checks to see if the way is in any hiking or foot routes and, if so, adds a custom route_relations=* tag to the way.
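The first pass described above might look roughly like the following Python sketch. It is not the actual script: the function names, the JSON file name and the “;” separator used for the route_relations value are assumptions, and it only looks at way members, ignoring nested relation hierarchies.

```python
import json
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

ROUTE_TYPES = {"hiking", "foot"}
TOP_LEVEL = {"node", "way", "relation"}


def scan_route_refs(stream):
    """Map each way id to the refs of the hiking/foot routes it is in."""
    way_refs = defaultdict(list)
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)            # the enclosing <osm> element
    for event, elem in context:
        if event != "end" or elem.tag not in TOP_LEVEL:
            continue
        if elem.tag == "relation":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            ref = tags.get("ref")
            is_route = tags.get("type") == "route"
            if is_route and tags.get("route") in ROUTE_TYPES and ref:
                for member in elem.findall("member"):
                    if member.get("type") != "way":
                        continue       # nested relations are not followed
                    refs = way_refs[member.get("ref")]
                    if ref not in refs:
                        refs.append(ref)
        root.clear()                   # drop parsed elements to keep RAM low
    return dict(way_refs)


def route_relations_value(way_id, table, separator=";"):
    """Second pass helper: value for the custom route_relations tag."""
    refs = table.get(str(way_id))
    return separator.join(refs) if refs else None


def main(json_path="way_route_refs.json"):   # hypothetical file name
    """Read .osm XML from stdin and write the way -> refs table as JSON."""
    table = scan_route_refs(sys.stdin.buffer)
    with open(json_path, "w") as out:
        json.dump(table, out)
```

Calling main() at the bottom of the file would turn this into a stdin filter. Any ref clean up would be applied to each ref just before it is stored in the table.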

Finally, the overly complex SQL script for building a display name string from all the various name related OSM tags has been modified to take the route_relations tag value into consideration.