Exclude large waterbodies

  • Completed

Hello Magnus,

Perhaps you remember my e-mail about using Google Charts for maps (it was more than seven years ago, so I don't blame you if you don't). Since then I've developed an alternative using Mapbox and shapefiles; see http://mapboxutil.technetium.be. While searching for up-to-date shapefiles I found this site, and I'd like to compliment you on it.

So much for the praise; now the idea.


You already have an option to exclude water. As far as I can tell, only the sea gets excluded and not the large lakes. This gives an unfamiliar polygon for the Dutch provinces bordering the IJsselmeer. Likewise, on maps of Sweden, Vänern and Vättern are often not included. See for example: https://cdn2.project-gc.com/dimages/ps_map.php?mapHash=9b416a72a820d1118671b2b090b70185


So my request is an option to also remove large water bodies from the border polygons. I understand this will lead to more complex polygons and more complex calculations to generate them. If you need any help with this, please ask.

Magnus
  • Under review

Hi,

I must be honest, I sadly don't remember. I wish I did though. I checked out your site 2-3 weeks ago and you have done an excellent job. This site is a part of improving the maps of Project-GC's Profile stats. You can see some of the rendered results here. I am hoping for a release in about a month, but it's affecting a lot at Project-GC, so it has been a huge task.

You are correct that it's actually the sea that's excluded today. We use already existing OSM tools to do this, or actually, to produce the coastline to which the polygons are then cut.

We are far from experts on OSM; we have only learned from using it ourselves, and to be honest I don't really know what approach I would need to take to cut out lakes of a certain size. I am also unsure whether there is a good way to determine the size above which lakes should be excluded. I have a feeling that number would differ between users and, not least, depend on what the maps are used for, including the zoom level.


I like the idea itself, but I'm not sure how to implement it, and to be honest I have a feeling that it will require more hours than we can spend on such a task.

technetium

Hello Magnus,

Since I've found somebody who is willing to pay for my software engineering work, I don't have much time for side projects, so it took some time to make a little progress. You might already know what I've accomplished, but I still wanted to share it.

I've created a Python script that is able to subtract polygons. I don't know whether a Python script is faster than PostGIS at subtracting polygons; I'm just more familiar with Python tools than with PostGIS.

I'm using shapefiles that are freely available on the Internet. You might be able to extract the polygons from the OSM data the same way you do with the borders. The tag you are looking for is water=lake; preferably you would only select lakes with at least a certain area.

So far I've discovered that the extended collection at https://hub.arcgis.com/datasets/0abb136c398942e080f736c8eb09f5c4 is too large for my computer to load. I had to resort to GADM data for the Netherlands, which can be handled without a problem but does not line up with the OSM borders.

Another problem I found is that in OSM the border between Friesland and Groningen crosses itself, causing the subtract method to throw an exception, so I had to exclude these provinces. The error is now corrected: https://www.openstreetmap.org/changeset/105163331#map=15/53.1068/6.2663 so I have to wait for you to process a new database.
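A ring that crosses itself can be detected before attempting the subtraction. Below is a minimal sketch in plain Python, assuming a ring is a list of (x, y) tuples with the first point repeated at the end. The function names are hypothetical, and the check only finds proper crossings (not edges that merely touch), so it is a rough pre-check rather than a full validity test such as shapely's is_valid property.

```python
def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 cross at a point interior to both."""
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0
            and _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)


def ring_crosses_itself(ring):
    """Compare every pair of non-adjacent edges of a closed ring (O(n^2))."""
    edges = list(zip(ring, ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge share the closing vertex
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False


# Illustrative rings only, not real border data
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
print(ring_crosses_itself(square), ring_crosses_itself(bowtie))
```

For a border with many thousands of points the quadratic loop gets slow, so this is meant for spotting the problem on a single suspicious ring, not for scanning a whole database.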

For your information, my Python code:

import geojson
import shapefile  # provided by the pyshp package
from shapely.geometry import shape

# GADM level-1 shapes for the Netherlands (these include the water bodies)
sf = shapefile.Reader('data/gadm36_NLD_shp/gadm36_NLD_1')
# Administrative borders as GeoJSON
with open('data/OSMB-NL-l-2-4.geojson') as json_file:
    data = geojson.load(json_file)

# Keep only the GADM shapes that are marked as water bodies
lake_shapes = []
for lake_shape, lake_record in zip(sf.shapes(), sf.records()):
    if lake_record.ENGTYPE_1 == 'Water body':
        lake_shapes.append(shape(lake_shape))

features = []
for i, feature in enumerate(data['features']):
    geometry = shape(feature['geometry'])
    properties = feature['properties']
    # The border between these provinces crosses itself, which makes
    # difference() raise an exception, so copy them unchanged
    if properties['name'] in ('Friesland', 'Groningen'):
        features.append(geojson.Feature(geometry=geometry, properties=properties))
        continue
    print(i, ')', properties['name'], properties['admin_level'])
    print(' Bounds:', geometry.bounds)
    # Subtract every lake from the province polygon
    for lake_shape in lake_shapes:
        geometry = geometry.difference(lake_shape)
    features.append(geojson.Feature(geometry=geometry, properties=properties))

feature_collection = geojson.FeatureCollection(features)
with open('no-lakes.geojson', 'w') as f:
    geojson.dump(feature_collection, f)
Magnus
  • Planned

Thank you very much for your long answer and the code (even though I won't actually use it).


I have been experimenting a bit and I have found ways to do it. I started by extracting all polygons in OSM where the tag water was set to lake or river, but I have decided not to use rivers; they didn't look too good.

Here is a first version from a region in Sweden, with and without lakes.

https://www.facebook.com/ProjectGC/posts/4212676525456571

Basically it's done like this:

1) Join together all lakes with less than ~10 meters between them.

2) Remove all (joined) lakes with area less than X.

This is all done in PostGIS. My first compilation of the whole world took 1000 minutes and 28 seconds. :)
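The two steps can be illustrated outside PostGIS as well. The following is a rough sketch in plain Python and not the PostGIS implementation described above. It assumes each lake is a closed ring of (x, y) coordinates in metres, it groups nearby lakes instead of geometrically merging them, and it approximates the gap between two lakes by their closest pair of vertices. All names and numbers are hypothetical.

```python
from math import hypot


def ring_area(ring):
    """Shoelace area of a closed ring [(x, y), ...] given in metres."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


def vertex_gap(a, b):
    """Smallest vertex-to-vertex distance; a coarse stand-in for polygon distance."""
    return min(hypot(ax - bx, ay - by) for ax, ay in a for bx, by in b)


def group_and_filter(lakes, max_gap, min_area):
    """Step 1: group lakes closer than max_gap. Step 2: drop groups below min_area."""
    parent = list(range(len(lakes)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(lakes)):
        for j in range(i + 1, len(lakes)):
            if vertex_gap(lakes[i], lakes[j]) < max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, lake in enumerate(lakes):
        groups.setdefault(find(i), []).append(lake)
    return [group for group in groups.values()
            if sum(ring_area(lake) for lake in group) >= min_area]


# Illustrative data: two lakes 5 m apart and one small lake far away
lake_a = [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]          # 10000 m2
lake_b = [(105, 0), (155, 0), (155, 50), (105, 50), (105, 0)]      # 2500 m2
lake_c = [(1000, 1000), (1020, 1000), (1020, 1020), (1000, 1020), (1000, 1000)]
kept = group_and_filter([lake_a, lake_b, lake_c], max_gap=10, min_area=12000)
print(len(kept), [len(group) for group in kept])
```

In this example neither lake_a nor lake_b reaches the threshold alone, but together they do, which is the reason for joining before filtering.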

I felt that there was too much water in some places, so I doubled the size requirement. That, and the removal of rivers, reduced the processing time by 55% or so. But I wasn't satisfied with that either; it felt like too little water. I am currently working on an extract with the first area requirement again, but without the rivers. When that is in place I think I will try a more scientific approach to decide on three different detail levels (three different area requirements). Then I will choose which level of detail to use based on the area the map covers: the more zoomed in, the more water; when viewing the world map, less water. I am trying to avoid getting too many very small lakes (down to 1-10 pixels); those only make it cluttered.

Regardless, it seems like I have a working solution in progress, but it might take a few days. I was hoping not to spend too much time, but the processing time is what it is; it's hard to compile the data faster than this.
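One possible way to tie the detail level to the map area is to ask how large a lake must be to cover a minimum number of pixels in the rendered image. This is only a sketch of that idea under assumed numbers; the function names, the three thresholds and the 10-pixel minimum are all hypothetical and not the values used for the site.

```python
def min_lake_area_km2(map_area_km2, image_pixels, min_lake_pixels=10):
    """Area a lake needs in order to cover min_lake_pixels in the rendered image."""
    return map_area_km2 / image_pixels * min_lake_pixels


def pick_detail_level(map_area_km2, image_pixels, levels_km2, min_lake_pixels=10):
    """Pick the smallest precomputed area threshold that avoids tiny lakes.

    levels_km2 holds the area requirements of the precomputed extracts.
    If even the coarsest extract is too detailed, fall back to the coarsest.
    """
    needed = min_lake_area_km2(map_area_km2, image_pixels, min_lake_pixels)
    for threshold in sorted(levels_km2):
        if threshold >= needed:
            return threshold
    return max(levels_km2)


# Illustrative numbers only: three detail levels and a 1000 x 1000 pixel image
levels = (1, 10, 100)
pixels = 1000 * 1000
print(pick_detail_level(10_000, pixels, levels))        # a single region
print(pick_detail_level(450_000, pixels, levels))       # a whole country
print(pick_detail_level(510_000_000, pixels, levels))   # the whole world
```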

The up-side of using OSM as a source for everything is that borders always match. I don't want to end up with a river (if I use those) that stops just before the ocean and leaves a short land strip in between.

// Magnus

technetium

Hello Magnus,

That's great news. I don't think 1000 minutes is bad for all the lakes in the world, considering Chile takes 21 days.

I've seen that you have also added the new maps to Project-GC. I really like the improved accuracy, even if it has cost me some diamond country badges (but then again, I've also gained a few because you reduced the number of regions for some countries).

I hope the lakes will also be removed from the maps at Project-GC, because the Dutch map now looks weird. I (and probably many Dutchies) don't recognize the municipalities around the IJsselmeer and on the isles of Zeeland.

Below is a map of how "we" expect a map of the Netherlands to look, although we would not mind some extra lakes in Friesland, the Vinkeveense Plassen near Hilversum and some lakes in Holland. That's also the order in which we would expect them to appear on the map.

I've used the polygons from GADM. Since they don't include the latest municipal reorganizations, more municipalities are left blank than in the maps you have generated for my cache finds. I've stolen your idea to leave areas with zero finds blank; for the rest I do prefer my own colour scheme, going from green to red and based on the maximum number of finds, but I do understand your choice of earth colours and the desire not to have to "lower" the colour of an area.

Greetings,
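To make the green-to-red idea concrete, here is a minimal sketch. It assumes a simple linear blend in RGB scaled by the maximum number of finds; the function name and the exact blend are assumptions for illustration and not necessarily the scheme used on either site.

```python
def gradient_colour(finds, max_finds):
    """Return '#rrggbb' from green (few finds) to red (max_finds), or None for zero finds."""
    if finds <= 0 or max_finds <= 0:
        return None  # area stays blank
    fraction = min(finds / max_finds, 1.0)
    red = round(255 * fraction)
    green = round(255 * (1 - fraction))
    return f"#{red:02x}{green:02x}00"


print(gradient_colour(0, 1000))     # blank area
print(gradient_colour(1, 1000))     # close to pure green
print(gradient_colour(1000, 1000))  # pure red
```

Because the scale depends on the maximum, an area's colour can move back towards green when another area gains finds, which is the "lowering" effect mentioned above.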

technetium

Magnus

Project-GC can't use GADM since it's not for commercial use. Licenses have in general been an issue for us, since local map sources have been used for some countries before. From what I know they have been free to use, but the agreements are not always very simple to read through. It feels better to rely on fewer sources. Even before, though, most of Europe was created using OSM, but it was old code, old tech and not the best implementation in every way. Rewriting it all has opened up a lot of opportunities.

I am getting ready to release the update of maps with lakes and rivers. I actually brought back rivers, but I have a much higher size requirement on those and don't include them at all for larger areas such as the world and continents, and in many cases not even when viewing a whole country. I feel that they can often help the user understand and see what the map is a map of. But when you see a map of Europe you really don't need that; then it's actually causing more issues than it's solving.

The end result is not very far from what you sent me yesterday, apart from the colors. Talking about colors, it could very well be that we will allow other algorithms to determine colors in the future. I have two other variants in mind: one would be white for zero finds and one of the brown colors for at least 1 find, and the other would be a variant of "gradient coloring". I don't think we will provide two color themes though; either we will keep these colors, which I personally think look good even though many hate them, or we will switch entirely. The reason is that it requires a LOT of resources to compile these, which forces us to cache them. A new color theme would double the cache storage, and I am already expecting a terabyte of cache storage, which will need to be stored on enterprise SSDs in our virtual host environment.

Attaching three maps rendered in your name in the DEV environment. As you can see, the lakes in Friesland don't appear until you open that region only. I have tried to find a balance of detail level that works for most of the world; I do not wish to handcraft tens of thousands of maps. :)

Magnus
  • Completed