Results now show a table of how long each step took, as well as the change in the PG database size.
* use `time` to measure the duration of each step
* query Postgres for the database size (see the sketch below)
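For the size check, a minimal sketch of the kind of query involved (the exact call in the script may differ):

```sql
-- Human-readable size of the current database, e.g. "2340 MB"
SELECT pg_size_pretty(pg_database_size(current_database()));
```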
Builds on top of PR #755, which should be merged first.
Since we want everything from `osm_building_polygon` (both `osm_id >= 0` and `osm_id < 0`), we can merge the two queries.
Note: the `obp.osm_id >= 0` condition on the LEFT JOIN applies only to the joined rows, not to the whole result.
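To illustrate that LEFT JOIN behavior, here is a generic sketch (`some_table` is hypothetical, not the actual layer query):

```sql
-- Because obp.osm_id >= 0 sits in the ON clause, it only restricts which
-- obp rows may match; rows of t without a match are still returned, with
-- NULLs in the obp columns. The same test in WHERE would drop those rows.
SELECT t.osm_id, obp.osm_id AS matched_id
FROM some_table AS t
LEFT JOIN osm_building_polygon AS obp
    ON t.osm_id = obp.osm_id AND obp.osm_id >= 0;
```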
It turned out that some update jobs failed with:
```
{
"message": "Bad credentials",
"documentation_url": "https://developer.github.com/v3"
}
```
This is probably due to credentials expiring (long workflow startup?),
or to some internal GitHub issue.
For now, this removes authentication from the `curl` calls, because most
of them can be made anonymously; authenticated calls are kept only where needed.
* delete output escaping (a leftover from the older system that was never removed)
* stop early if there are no pull requests (e.g. in case this is a fork)
A cron-based approach to find pull requests, possibly from forks,
that have finished profiling, and to post their results as comments.
See in-depth explanation of how this works at
https://github.com/nyurik/auto_pr_comments_from_forks
* On pull request and on commit, run the base test followed by the test of the change, compare the results, and publish them to the pull request.
If the pull request is updated, the resulting comment will be updated.
* also save quickstart.log as an artifact
Note that due to GitHub workflow security restrictions, it is not possible to post PR comments if the change originated from a fork. I am still looking for workarounds.
To view what would have been posted, open the `PR performance` details at the bottom of the build results, and expand the `Comment on Pull Request` step (and its subitem).
Optimizations: the process keeps two caches -- one for the test data file, and one for the results of the performance run for the "base" revision. If this or another PR has already been executed for the same revision and the same test data, the performance test runs only for the proposed changes, not for the base.
Co-authored-by: Tomas Pohanka <TomPohys@gmail.com>
* allow the Postgres image to be overridden with an env var
* allow the DIFF_MODE var to be overridden with an env var
* add /mapping and /cache dirs into tools image
* make `build-sql` target explicit rather than relying on a filename
* `tools-dev` opens a shell in a Docker container for experimenting and debugging (replaces `import-sql-dev` and `import-osm-dev`)
* separated `start-maputnik` from `start-postserve`
* renamed `clean-docker` to `db-destroy` to make it more explicit
* cleaner `db-start`, `db-stop`, `db-destroy` targets
* better output messages
Reorder the POI data update and trigger creation, so the materialized view is not refreshed after the update done by the initial import.
I checked the other updates and they are OK.
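A minimal sketch of the reordering idea, using hypothetical names (`osm_poi_point`, `refresh_poi_view()`):

```sql
-- 1. Run the one-time data update from the initial import while no
--    trigger exists yet, so no materialized view refresh is fired.
UPDATE osm_poi_point SET subclass = 'information' WHERE subclass = '';

-- 2. Only then create the trigger that refreshes the materialized view
--    on subsequent diff updates. refresh_poi_view() is a placeholder
--    for a function that runs REFRESH MATERIALIZED VIEW.
CREATE TRIGGER trigger_refresh
    AFTER INSERT OR UPDATE OR DELETE ON osm_poi_point
    FOR EACH STATEMENT
    EXECUTE PROCEDURE refresh_poi_view();
```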
This is a partial migration of https://github.com/openmaptiles/openmaptiles/pull/785
* Use `import-data` instead of `import-lakelines`, `import-water`, and `import-natural-earth`
* Upgrade docker-compose.yml to version 2.3 (allows some extra env var usage in yaml file itself)
* Remove `openmaptiles-tools:latest` usage -- no longer needed, can use current version 4.1
* `db-start` does not recreate the container even if the docker-compose.yml definition has changed.
* a few minor cleanups in quickstart.sh
- Add `download-osmfr` and `download-bbbike` targets
- Port `DC_OPTS` to Windows
- Use make conditions instead of shell. Also simplifies `make -n` output
- Use remote Docker machine's IP, if defined, rather than `localhost` in `http://...` messages. Also applicable for Docker Toolbox for Windows.
- Align texts in `make help` output
- Update and fix a bug in `make remove-docker-images`
Mark waterway and transportation_name as having a dependency on another layer.
This parameter is currently unused, but tools will use it later for faster
SQL code generation.
Closes #796
@zstadler, thank you for the PR.
Add `boundary=protected_area` parks
No performance impact and a very slight increase in size, but a very good improvement in feature completeness.
Thanks
The implementation adds the `DC_PROJECT` parameter. It can also be set by an environment variable in the hosting shell. The environment variable can be overridden by a make parameter, including `DC_PROJECT=`, which restores the automatic project name.
#### _NOTE:_
It may be worthwhile to review the following make targets that are currently not impacted by the PR:
- `make start-postserve`
- `make list-docker-images`
- `make remove-docker-images`
- `make docker-unnecessary-clean`
This PR was suggested by @nyurik:
> **zstadler** 3:30 PM
> Is it possible to run two instances of openmaptiles, and the postgis container in particular, on the same Linux host?
>
> **nyurik** 4:17 PM
> use `docker-compose --project-name` -- that should allow you to run everything in parallel
> might need to update make file
> btw, that would be a good PR for makefile -- to specify `--project-name` based on the current DIR name, but so that it can be overwritten by a makefile param
Co-authored-by: Yuri Astrakhan <yuriastrakhan@gmail.com>
`import-wikidata` should run before `import-sql`, and v4.1 allows for that.
It also optimizes the `wd_names` table to be a simple Wikidata ID -> labels lookup with a proper index (a sketch follows below).
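A sketch of the resulting shape of the lookup (column names are illustrative; the actual schema is generated by the tools):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- wd_names as a plain Wikidata ID -> labels lookup
CREATE TABLE wd_names (
    id     varchar(20) PRIMARY KEY,  -- Wikidata ID, e.g. 'Q42'
    labels hstore                    -- language code -> label
);
-- the primary key provides the proper index for lookups by Wikidata ID
```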
Minor other changes:
* `test-perf-null` target is now part of the Makefile
* `./data/osmstat.txt` is no longer created
* area download files are now named in the format `${osm_area}-latest.osm.pbf`
* Switch OMT to use the new tools v4.0.0
* borders are dynamically generated from the PBF file instead of downloading a prepared CSV file
* all tools are executed as current user instead of root, thus files are easier to modify/delete if needed
* all data is stored in the local file system instead of Docker volumes (Docker currently has a limitation with non-root operation for internal volumes). This also makes the data easier to examine and test.
* New `init-dirs` make target creates all the needed dirs: `build`, `data`, `cache`
* `make clean` deletes the whole `build` dir instead of individual files.
* for backward compatibility, `clean-docker` also deletes the `cache` dir (it used to be a volume)
* all `psql` calls are now done with `ON_ERROR_STOP=1`
* got rid of the `pgclimb-*` targets -- the same results can be achieved with `psql` (`pgclimb-list-views` and `pgclimb-list-tables` were renamed to `list-views` and `list-tables`)
Add a one-liner script that takes the tile lists from the dated subfolders, merges and deduplicates them, and writes the result to `tiles.txt`.
Also update the docs to reflect the current behavior of `docker-compose run update-osm`.
This PR allows queries to be parallelized on recent versions of Postgres. The `PARALLEL SAFE` modifier has been added to the layer functions, and a PL/pgSQL function that converts strings into numbers has been replaced.
`PARALLEL SAFE` is a modifier for `CREATE FUNCTION` available since Postgres 9.6, so this change does not break the database versions currently supported by OpenMapTiles. More details about this topic [here](https://www.postgresql.org/docs/current/parallel-safety.html) and in the reference documentation for [`CREATE FUNCTION`](https://www.postgresql.org/docs/current/sql-createfunction.html).
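For illustration, a sketch of what the modifier looks like, with a simplified signature (the real layer functions return more columns):

```sql
-- Declared at creation time...
CREATE OR REPLACE FUNCTION layer_water(bbox geometry, zoom_level int)
RETURNS TABLE(geometry geometry, class text) AS $$
    SELECT geometry, class
    FROM osm_water_polygon
    WHERE geometry && bbox;
$$ LANGUAGE SQL STABLE PARALLEL SAFE;

-- ...or retrofitted onto an existing function
ALTER FUNCTION layer_water(geometry, int) PARALLEL SAFE;
```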
### Testing procedure
The procedure to test this was:
* Imported `spain.pbf` in a clean environment
* Dumped the OpenMapTiles database from the Postgres Docker image
* Created a clean Postgres 12 database using the default Docker image
* Installed `postgis` 3 from the default Debian package and `osml10n` 2.5.8 built from its repository (`make`, etc.)
* Restored the dump
* Lowered the Postgres planner parameters to encourage parallel plans:
```sql
set parallel_setup_cost = 5;
set parallel_tuple_cost = 0.005;
```
* Manually added the `PARALLEL SAFE` modifier to each function involved in layer queries (not to the update or insert functions).
* For each layer, ran a test query to confirm parallel workers were created, something like this:
```sql
explain analyze
select * from layer_aerodrome_label(tilebbox(8,128,95),10,null)
union all
select * from layer_aerodrome_label(tilebbox(8,128,97),10,null);
```
* After all the layers were processed and confirmed to start parallel executions, a more complete example was run. It retrieves the geometries for all the layers of the same tile, but without using any MVT-related function.
<details><summary>Testing query</summary>

```sql
-- Retrieve geometries from every layer for the same tile
explain analyze
select geometry from layer_water(tilebbox(14,8020,6178),14)
union all
select geometry from layer_waterway(tilebbox(14,8020,6178),14)
union all
select geometry from layer_landcover(tilebbox(14,8020,6178),14)
union all
select geometry from layer_landuse(tilebbox(14,8020,6178),14)
union all
select geometry from layer_mountain_peak(tilebbox(14,8020,6178),14)
union all
select geometry from layer_park(tilebbox(14,8020,6178),14)
union all
select geometry from layer_boundary(tilebbox(14,8020,6178),14)
union all
select geometry from layer_aeroway(tilebbox(14,8020,6178),14)
union all
select geometry from layer_transportation(tilebbox(14,8020,6178),14)
union all
select geometry from layer_building(tilebbox(14,8020,6178),14)
union all
select geometry from layer_water_name(tilebbox(14,8020,6178),14)
union all
select geometry from layer_transportation_name(tilebbox(14,8020,6178),14)
union all
select geometry from layer_place(tilebbox(14,8020,6178),14)
union all
select geometry from layer_housenumber(tilebbox(14,8020,6178),14)
union all
select geometry from layer_poi(tilebbox(14,8020,6178),14)
union all
select geometry from layer_aerodrome_label(tilebbox(14,8020,6178),14);
```
</details>
You can inspect the execution plan and results on [this page](https://explain.dalibo.com/plan/3z). Also [attaching](https://github.com/openmaptiles/openmaptiles/files/3951822/explain-tile-simple.tar.gz) the query and JSON output for future reference. The website gives a ton of detail; you may want to search for nodes mentioning `workers` or `parallel`, such as those referring to the `osm_border` or `osm_aeroway_linestring` entities.

### Next steps
Since the execution plan does not show a parallel append at the top level, meaning the layers themselves are not run in parallel with each other, I want to continue experimenting with parameters and queries to see whether the request can be parallelized even further.
I will post my findings here, even if they result in no code change.
cc. @nyurik
Co-authored-by: Yuri Astrakhan <yuriastrakhan@gmail.com>
Buildings from ways and multipolygons are loaded into the `osm_building_polygon` table. But an `osm_building_multipolygon` table is also loaded, whose content is not used except to verify that an `osm_id` comes from a multipolygon. To check whether an object comes from a multipolygon, we only need to check whether its `osm_id` is negative. This is the counterpart of e0c8ece375/layers/building/building.sql (L89)
I checked that the objects are the same after this change.
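A sketch of the simplification (the queries are illustrative, not the exact layer SQL):

```sql
-- Before: join against osm_building_multipolygon only to detect which
-- buildings came from multipolygon relations
SELECT obp.osm_id
FROM osm_building_polygon AS obp
JOIN osm_building_multipolygon AS obm ON obp.osm_id = obm.osm_id;

-- After: imposm stores relation-derived rows with negative osm_id,
-- so a sign check suffices and the extra table is no longer needed
SELECT osm_id
FROM osm_building_polygon
WHERE osm_id < 0;
```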