Layer Stats
===========

This page describes how to generate and analyze layer stats data to find ways to optimize tile size.

### Generating Layer Stats

Run planetiler with `--output-layerstats` to generate an extra `<output>.layerstats.tsv.gz` file with a row for each layer in each tile that can be used to analyze tile sizes. You can also get stats for an existing archive by running:

```bash
java -jar planetiler.jar stats --input= --output=layerstats.tsv.gz
```

The output is a gzipped tsv with a row per layer on each tile and the following columns:

| column              | description |
|---------------------|-------------|
| z                   | tile zoom |
| x                   | tile x |
| y                   | tile y |
| hilbert             | tile hilbert ID (defines [pmtiles](https://protomaps.com/docs/pmtiles) order) |
| archived_tile_bytes | stored tile size (usually gzipped) |
| layer               | layer name |
| layer_bytes         | encoded size of this layer on this tile |
| layer_features      | number of features in this layer |
| layer_attr_bytes    | encoded size of the [attribute key/value pairs](https://github.com/mapbox/vector-tile-spec/tree/master/2.1#44-feature-attributes) in this layer |
| layer_attr_keys     | number of distinct attribute keys in this layer on this tile |
| layer_attr_values   | number of distinct attribute values in this layer on this tile |

### Analyzing Layer Stats

Load a layer stats file in [duckdb](https://duckdb.org/):

```sql
CREATE TABLE layerstats AS SELECT * FROM 'output.pmtiles.layerstats.tsv.gz';
```

Then get the biggest layers:

```sql
SELECT * FROM layerstats ORDER BY layer_bytes DESC LIMIT 2;
```

| z  | x     | y    | hilbert   | archived_tile_bytes | layer       | layer_bytes | layer_features | layer_attr_bytes | layer_attr_keys | layer_attr_values |
|----|-------|------|-----------|---------------------|-------------|-------------|----------------|------------------|-----------------|-------------------|
| 14 | 13722 | 7013 | 305278258 | 1261474             | housenumber | 2412464     | 108384         | 30764            | 1               | 3021              |
| 14 | 13723 | 7014 | 305278256 | 1064044             | housenumber | 1848990     | 83038          | 26022            | 1               | 2542              |

To get a table of biggest layers by zoom:

```sql
PIVOT (
  SELECT z, layer, (max(layer_bytes)/1000)::int size
  FROM layerstats
  GROUP BY z, layer
  ORDER BY z ASC
) ON printf('%2d', z) USING sum(size);
-- duckdb sorts columns lexicographically, so left-pad the zoom so 2 comes before 10
```

| layer               | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14   |
|---------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| boundary            | 10  | 75  | 85  | 53  | 44  | 25  | 18  | 15  | 15  | 29  | 24  | 18  | 32  | 18  | 10   |
| landcover           | 2   | 1   | 8   | 5   | 3   | 31  | 18  | 584 | 599 | 435 | 294 | 175 | 166 | 111 | 334  |
| place               | 116 | 314 | 833 | 830 | 525 | 270 | 165 | 80  | 51  | 54  | 63  | 70  | 50  | 122 | 221  |
| water               | 8   | 4   | 11  | 9   | 15  | 13  | 89  | 114 | 126 | 109 | 133 | 94  | 167 | 116 | 91   |
| water_name          | 7   | 19  | 25  | 15  | 11  | 6   | 6   | 4   | 3   | 6   | 5   | 4   | 4   | 4   | 29   |
| waterway            |     |     |     | 1   | 4   | 2   | 18  | 13  | 10  | 28  | 20  | 16  | 60  | 66  | 73   |
| park                |     |     |     |     | 54  | 135 | 89  | 76  | 72  | 82  | 90  | 56  | 48  | 19  | 50   |
| landuse             |     |     |     |     | 3   | 2   | 33  | 67  | 95  | 107 | 177 | 132 | 66  | 313 | 109  |
| transportation      |     |     |     |     | 384 | 425 | 259 | 240 | 287 | 284 | 165 | 95  | 313 | 187 | 133  |
| transportation_name |     |     |     |     |     |     | 32  | 20  | 18  | 13  | 30  | 18  | 65  | 59  | 169  |
| mountain_peak       |     |     |     |     |     |     |     | 13  | 13  | 12  | 15  | 12  | 12  | 317 | 235  |
| aerodrome_label     |     |     |     |     |     |     |     |     | 5   | 4   | 5   | 4   | 4   | 4   | 4    |
| aeroway             |     |     |     |     |     |     |     |     |     |     | 16  | 26  | 35  | 31  | 18   |
| poi                 |     |     |     |     |     |     |     |     |     |     |     |     | 35  | 18  | 811  |
| building            |     |     |     |     |     |     |     |     |     |     |     |     |     | 94  | 1761 |
| housenumber         |     |     |     |     |     |     |     |     |     |     |     |     |     |     | 2412 |

To get biggest tiles:

```sql
CREATE TABLE tilestats AS SELECT
  z, x, y,
  any_value(archived_tile_bytes) gzipped,
  sum(layer_bytes) raw
FROM layerstats
GROUP BY z, x, y;

SELECT
  z, x, y,
  format_bytes(gzipped::int) gzipped,
  format_bytes(raw::int) raw,
FROM tilestats
ORDER BY gzipped DESC
LIMIT 2;
```

NOTE: this `GROUP BY` uses a lot of memory, so run duckdb in file-backed mode (`duckdb analysis.duckdb`) rather than in-memory mode.

| z  | x    | y    | gzipped | raw  |
|----|------|------|---------|------|
| 13 | 2286 | 3211 | 9KB     | 12KB |
| 13 | 2340 | 2961 | 9KB     | 12KB |

To make it easier to look at these tiles on a map, you can define the following macros that convert z/x/y coordinates to lat/lons:

```sql
CREATE MACRO lon(z, x) AS (x/2**z) * 360 - 180;
CREATE MACRO lat_n(z, y) AS pi() - 2 * pi() * y/2**z;
CREATE MACRO lat(z, y) AS degrees(atan(0.5*(exp(lat_n(z, y)) - exp(-lat_n(z, y)))));
-- latitude comes from the tile's y and longitude from its x; + 0.5 targets the tile center
CREATE MACRO debug_url(z, x, y) as concat(
  'https://protomaps.github.io/PMTiles/#map=',
  z + 0.5, '/',
  round(lat(z, y + 0.5), 5), '/',
  round(lon(z, x + 0.5), 5)
);

SELECT z, x, y, debug_url(z, x, y), layer, format_bytes(layer_bytes) size
FROM layerstats
ORDER BY layer_bytes DESC
LIMIT 2;
```

| z  | x     | y    | debug_url(z, x, y)                                               | layer       | size  |
|----|-------|------|------------------------------------------------------------------|-------------|-------|
| 14 | 13722 | 7013 | https://protomaps.github.io/PMTiles/#map=14.5/25.05575/121.51978 | housenumber | 2.4MB |
| 14 | 13723 | 7014 | https://protomaps.github.io/PMTiles/#map=14.5/25.03584/121.54175 | housenumber | 1.8MB |

Drag and drop your pmtiles archive to the pmtiles debugger to see the large tiles on a map. You can also switch to the "inspect" tab to inspect an individual tile.

#### Computing Weighted Average Tile Sizes

If you compute a straight average tile size, it will be dominated by ocean tiles that no one looks at. You can compute a weighted average based on actual usage by joining with a `z, x, y, loads` tile source.
For convenience, [top_osm_tiles.tsv.gz](top_osm_tiles.tsv.gz) has the top 1 million tiles from 90 days of [OSM tile logs](https://planet.openstreetmap.org/tile_logs/) from summer 2023. You can load these sample weights using duckdb's [httpfs module](https://duckdb.org/docs/extensions/httpfs.html):

```sql
INSTALL httpfs;
CREATE TABLE weights AS SELECT z, x, y, loads FROM 'https://raw.githubusercontent.com/onthegomap/planetiler/main/layerstats/top_osm_tiles.tsv.gz';
```

Then compute the weighted average tile size:

```sql
SELECT
  format_bytes((sum(gzipped * loads) / sum(loads))::int) gzipped_avg,
  format_bytes((sum(raw * loads) / sum(loads))::int) raw_avg,
FROM tilestats JOIN weights USING (z, x, y);
```

| gzipped_avg | raw_avg |
|-------------|---------|
| 81KB        | 132KB   |

If you are working with an extract, then the low-zoom tiles will dominate, so you can make the weighted average respect the per-zoom weights that appear globally:

```sql
WITH zoom_weights AS (
  SELECT z, sum(loads) loads FROM weights GROUP BY z
),
zoom_avgs AS (
  SELECT
    z,
    sum(gzipped * loads) / sum(loads) gzipped,
    sum(raw * loads) / sum(loads) raw,
  FROM tilestats JOIN weights USING (z, x, y)
  GROUP BY z
)
SELECT
  format_bytes((sum(gzipped * loads) / sum(loads))::int) gzipped_avg,
  format_bytes((sum(raw * loads) / sum(loads))::int) raw_avg,
FROM zoom_avgs JOIN zoom_weights USING (z);
```
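
To see how the per-zoom query above differs from the plain weighted average, here is a minimal Python sketch of both calculations on made-up numbers. The `tile_sizes` and `loads` dictionaries and both function names are hypothetical stand-ins for the `tilestats` and `weights` tables; they are not part of planetiler or duckdb, and the values do not come from any real archive or tile log.

```python
from collections import defaultdict

# Hypothetical toy data (assumption: not from a real archive or tile log).
# tile_sizes plays the role of tilestats.gzipped: (z, x, y) -> bytes
tile_sizes = {
    (0, 0, 0): 100_000,
    (14, 1, 1): 20_000,
    (14, 1, 2): 40_000,
}
# loads plays the role of weights.loads: (z, x, y) -> request count.
# (14, 9, 9) lies outside the extract but still counts toward zoom 14 globally.
loads = {
    (0, 0, 0): 10,
    (14, 1, 1): 30,
    (14, 1, 2): 10,
    (14, 9, 9): 50,
}


def plain_weighted_average(tile_sizes, loads):
    """Load-weighted average over tiles present in both tables (inner join)."""
    joined = [(size, loads[tile]) for tile, size in tile_sizes.items() if tile in loads]
    return sum(size * n for size, n in joined) / sum(n for _, n in joined)


def per_zoom_weighted_average(tile_sizes, loads):
    """Average each zoom on its own, then weight zooms by their global loads."""
    # Like zoom_weights: total loads per zoom over every tile in the weights table.
    zoom_weights = defaultdict(int)
    for (z, _x, _y), n in loads.items():
        zoom_weights[z] += n

    # Like zoom_avgs: load-weighted average size per zoom over the joined tiles.
    size_sum = defaultdict(float)
    load_sum = defaultdict(int)
    for tile, size in tile_sizes.items():
        if tile in loads:
            size_sum[tile[0]] += size * loads[tile]
            load_sum[tile[0]] += loads[tile]
    zoom_avgs = {z: size_sum[z] / load_sum[z] for z in load_sum if load_sum[z] > 0}

    # Final step: weight each zoom's average by that zoom's global load total.
    total = sum(zoom_avgs[z] * zoom_weights[z] for z in zoom_avgs)
    return total / sum(zoom_weights[z] for z in zoom_avgs)


print(plain_weighted_average(tile_sizes, loads))     # 40000.0
print(per_zoom_weighted_average(tile_sizes, loads))  # 32500.0
```

In this toy extract the single zoom-0 tile accounts for 10 of the 50 joined loads, but only 10 of the 100 global loads, so the per-zoom version gives it less influence and the average drops from 40000 to 32500 bytes.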