I’ve been using the database export CSV file from OpenCellID to generate the SQLite3 database for my on-phone network based location service. But in poking around to see if that is the best freely available database I’ve just found out that the Mozilla Location Services database uses the same CSV format file.
So it is not a question of which is better for me but how can I use both simultaneously. Playing around a bit, it looks like OpenCellId has more towers worldwide. But for any given area neither is distinctly better. OpenCellId has many towers Mozilla doesn’t and vice versa. And there are a large number of towers duplicated in both databases. In the case of duplicate towers, it is pretty evenly split between which one has more samples (and thus presumably a better estimate of tower location).
My SQL skills are basic and it is pretty easy to create a database with both sets of towers in it as long as you are willing to live with duplicates. Or if you are willing to always use one source over the other to resolve the duplicates.
But I want to actually merge the conflicts. That is I want create a new replacement record with a weighted average of the latitude and longitude based on the number of samples each source reports, etc. In some dialects of SQL there is a merge statement that looks like it might do the trick. But that does not exist in sqlite3. I found some cook book examples that showed how to do similar things but I could not get the merge to work as I wanted.
So an ugly hack: Use sqlite3 to merge the two CVS files keeping duplicate towers, remove columns and rows not needed for my area or for my network location plug in. Then output a new CSV file sorted by mcc, mnc, lac and cid.
With a sorted CSV file it is now pretty easy and fast to do a single scan over the file to look for duplicate records and recompute the latitude, longitude, accuracy (tower range) and number of samples for the towers that are duplicated. PHP is my language of choice for that as I know it and it handles strings and CSV files fairly well.
Then it is back into sqlite to generate the actual database needed on the phone.
All of the above machinations are, of course, wrapped up in a bash script that uses wget to pull the CSV files from Mozilla and OpenCellId.
Seems to work okay but it is an ugly hack and I am not sure I want to show the pieces here much less actually publish them to some place like github.