Names are hard in OpenStreetMap

When rendering a map using data from the OpenStreetMap (OSM) project the cookbook implementations end up using Mapnik XML to specify the presentation (you may get to the Mapnik XML by passing CartoCSS through a preprocessor). Seems simple enough, for each feature you want to map your XML looks something like:

<Style name="natural_point_names">
    <Rule>
        <Filter>
            ([natural] = 'peak') and
            (not ([name] = null))
        </Filter>
        <TextSymbolizer
            fontset-name="sans-fonts"
            size="6 * 4.16666666667"
            fill="black"
            justify-alignment="center"
            dy="-5 * 4.16666666667"
            dx="0"
            placement-type="list"
        >
        [name]
            <Placement dy="-6 * 4.16666666667" dx="0" /> <!-- north -->
            <Placement dy="0" dx="7 * 4.16666666667" />  <!-- east -->
            <Placement dy="2 * 4.16666666667" dx="0" />  <!-- south -->
            <Placement dy="0" dx="-7 * 4.16666666667" /> <!-- west -->
        </TextSymbolizer>
    </Rule>
    .
    .
    .
</Style>

The “[name]” part says take the value of the name tag. The other stuff in the <TextSymbolizer> part specifies the styling. And all the stuff in <Filter> specifies what this whole <Rule> is applied too.

Again, in the cookbook case, OSM data is put into a Postgresql database for use. So to get the name out you need a select query. Something like:

SELECT way, "natural", "place", "tourism", "amenity",
       name
FROM planet_osm_point WHERE ("natural" is not null) or ("place" is not null)) as foo

Easy peasy.

You may want to have the elevation shown with the name and an easy way to do that is to make it part of the name that is passed to the styling information. At least I found it easier to append the elevation, if it exists, to the name in the SQL query than to get everything lined up and spaced properly in the XML style. But there are places where the primary name is not the only one that is used. So we should check if there is an alt_name value. In SQL with a little work we can return a synthetic name value of the form:

Name (alt_name)
 elevation

We can even convert the elevation in meters to feet. Looking pretty good:

But what is this “Los Angeles County (Los Angeles)” bit? That doesn’t look good. The second “Los Angeles” is from the alt_name. Maybe we shouldn’t display the alt_name if it is a leading substring of the name. This looks better:

We will be trekking in Nepal so I’d like to make a offline/paper map of the area around where we will be. But the primary names there may not be in English. So we need to extend our logic to cover the “name:en” and “alt_name:en” tags. But it turns out that in many cases in Nepal the “name” and “name:en” fields have the same value so we need to check for and deal with that. We might be adequately handling the names now:

(Note the peak icon is not at the high point of the contours. Maybe the SRTM data is off. On the other hand the most official data I can find for the location of Mount Everest has a slightly different location than is in the OSM database. Since I don’t know where the error lies I am not making a change. Just noting that there is an inconsistency.)

Oddly, at least to me, the mappers in Kathmandu have decided to put a “Romanized” version of their local names in the name field rather than in their local language. Looking at the trekking areas around Lukla and Namche Bazar, it seems this practice extends beyond the Kathmandu Valley. This makes creating a map with Nepalese and English names impossible without resorting to special casing the logic for Nepal. That is we can’t just look at the name and name:en tags, we need to figure out if the feature is in Nepal and if so look at the name:ne tag instead of or in addition to the name tag.

This does not scale well. Every country or region that uses Romanized (transliterated) names in the name tag will need to be special cased. OSM has no standardized way of specifying a default language code on a national or regional boundary so this will need to be in some sort of separate geographical database. Requiring a separate geographical database to render OSM, itself a geographical database, is not a good thing.

If you are defining mapping standards and practices for your country, please just follow the wiki for normal name tag practice and definitely don’t transliterate the name.

Unfortunately, combining the name, alt_name, name:en, alt_name:en, ref and ele fields into one synthetic “name” is a lot of logic to put into each and every SQL select statement. So put everything into a stored procedure:

-- Internal routine for consistent formatting of names of all types.

CREATE OR REPLACE FUNCTION display_name(
    pname       text DEFAULT NULL::text,        -- value of name field
    alt_name    text DEFAULT NULL::text,        -- value of alt_name field
    en_name     text DEFAULT NULL::text,        -- value of name:en field
    alt_en_name text DEFAULT NULL::text,        -- value of alt_name:en field
    elevation   text DEFAULT NULL::text,        -- value of ele field
    reference   text DEFAULT NULL::text)        -- value of ref field
RETURNS text AS $$
BEGIN
    -- Ignore alt_name in cases like the one where
    -- name='Riverside County' and alt_name='Riverside'
    -- i.e. where alt_name is a leading substring of name

    IF (LEFT(pname,LENGTH(alt_name)) = alt_name) THEN
        alt_name = null;
    END IF;

    -- If no alt_name then use ref as the alternative name
    IF (alt_name is null) THEN
        alt_name = reference;
    END IF;

    -- Odd case, but maybe we have an alt_name with no name
    -- If we have both, then alt_name in parens after name
    IF (pname is null) THEN
        pname = alt_name;
    ELSE
        IF (alt_name IS NOT null) THEN
            pname = pname || ' (' || alt_name || ')';
        END IF;
    END IF;

    -- If no name:en but there is a alt_name:en then set name:en to
    -- the alt_name:en value
    IF (en_name is null) THEN
        en_name = alt_en_name;
    ELSE
        IF (alt_en_name IS NOT null) THEN
            en_name = en_name || ' (' || alt_en_name || ')';
        END IF;
    END IF;

    -- Use elevation value for name or alt name as appropriate
    IF (elevation::text ~ '^[0123456789\.]+$') THEN
        IF (pname is null) THEN
            IF (en_name is null) THEN
                RETURN round(elevation::real * 3.28084)::text;
            ELSE
                RETURN en_name || chr(10) || round(elevation::real * 3.28084)::text;
            END IF;
        ELSE
            IF (en_name is null) THEN
                RETURN pname || chr(10) || round(elevation::real * 3.28084)::text;
            ELSE
                IF (pname <> en_name) THEN
                    RETURN pname || chr(10) || en_name || chr(10) || round(elevation::real * 3.28084)::text;
                ELSE
                RETURN pname || chr(10) || round(elevation::real * 3.28084)::text;
                END IF;
            END IF;
        END IF;
    ELSE
        IF (pname is null) THEN
            IF (en_name is null) THEN
                RETURN null;
            ELSE
                RETURN en_name;
            END IF;
        ELSE
            IF (en_name is null) THEN
                RETURN pname;
            ELSE
                IF (pname <> en_name) THEN
                    RETURN pname || chr(10) || en_name;
                ELSE
                    RETURN pname;
                END IF;
            END IF;
        END IF;
    END IF;

RETURN 'Huh?';
END; $$
LANGUAGE PLPGSQL;

And you change the SQL select to

SELECT way, "natural", "place", "tourism", "amenity",
       display_name(planet_osm_point.name,
                    planet_osm_point.alt_name,
                    planet_osm_point."name:en",
                    planet_osm_point."alt_name:en",
                    planet_osm_point.ele,
                    planet_osm_point.ref) as name
FROM planet_osm_point WHERE ("natural" is not null) or ("place" is not null)) as foo

We haven’t even considered the possibility of the other name related tags in OSM:

  • International name (int_name)
  • Local name (loc_name)
  • National name (nat_name)
  • Official name (official_name)
  • Old name (old_name)
  • Regional name (reg_name)
  • Short name (short_name)
  • And all their international language equivalents.

Ugh.

There is a set of Postgresql extensions to help transliterate names. And maybe it is the way to go. But its focus is on creating a German (or possibly other language) transliteration from languages that use non-Latin scripts. I don’t think it covers all the possibilities of alt_name, loc_name, old_name, using the ref if there is no name, etc.

Edit

It is probably better to gather the various name bits needed from the tag field which is a Postgresql “hstore” (key->value hash) type. This allows the stored function to have fewer parameters:

CREATE OR REPLACE FUNCTION display_name(
    pname       text DEFAULT NULL::text,        -- value of name field
    elevation   text DEFAULT NULL::text,        -- value of ele field
    reference   text DEFAULT NULL::text,        -- value of ref field
    tags        hstore DEFAULT NULL::hstore)    -- all other tags
RETURNS text

And makes the call in each SQL select statement easier:

display_name(name, ele, ref, tags) as name

An advantage of this, in addition to being shorter to code, is that the logic can be more easily revised to use more fields (loc_name, official_name, etc.) without affecting the call parameters. And if automatic transliteration is attempted using something like the German library mentioned above, the tags field needed for that is already being passed.