How do you define the hierarchy technically

Avatar
  • updated
  • Answered

Image 34

I also compute relations between admin boundaries of osm data and compare my results with yours. I realize that you do not apply raw containment relations to create a hierarchical order but also some quantitative reasoning in addition.

Example: In Austria -> Vienna -> Favoriten has the child "KG Oberlaa Land (osm relation5241912)"

But actually there is a tiny minor area (top left) where it reaches out of Favoriten, so it actually overlaps.

Your addition of quantitive refinement is very good, so areas that are 99.9% contained are considered contained.

I wonder if you can say something about how you compute containment = parent-child relations.

(Maybe you can relax the robot check in repatcha, it is tedious to ceck images over a minute)

Avatar
Magnus
  • Answered

You are correct that we are doing a "sloppy" comparison when building the hierarchy. We do an intersect between all polygons. Then we check that at least 98.5% of a polygon is inside another. If there are multiple matches the smallest polygon is chosen as a parent. So basically we require a 98.5% overlap.

Regarding captcha, this is the first time we hear about that issue. It's basically controlled by Cloudflare, though we can set it to different defensive levels. But if you are having issues with it there is something that Cloudflare doesn't like. Most likely your web browser or external IP. We never get a captcha ourselves. We are also using the default Security Level, which is medium. It's within the companies policy to not use a lower setting than that to protect our services. Especially for a site like OSM-Boundaries which has quite a few quite heavy/slow api calls.

Avatar
snupo

Thank you so much. Computing the intersection and then the area proportion is more work than computing plain containment. How long does it take to compute the world, and as I understand you do this with PostGis right?

Avatar
Magnus

A few hours since we upgraded to a new PostgreSQL/PostGIS (v16). And yes, we are using PostgreSQL/PostGIS.

A full planet.osm import used to take us ~3 weeks. After upgrading PostgreSQL this was lowered to ~24 hours. During our rewrite of the site we might have done some minor changes in some SQL statements as well, but that should be the small part of it.

But there was also quite a few new polygons added when we rewrote things (and upgraded related tools). We went from ~817,000 boundaries to ~866,000 boundaries between planet-20240401.osm and planet-20240506.osm (which was the first planet file imported with our new code base).

To create the hierarchy we run one SQL statements for every boundary (so 850-900k queries). No CTE expressions, nothing weird at all. We just join our boundaries table with itself, a few where statements, an ST_Intersects() and an ST_Area(), and then finally an ORDER BY.

Avatar
snupo

Very interesting, the speedup for the planet osm import, have not heard elsewhere about this improvement. Thank you so much for your detailed account.