
gdal2tiles optimization or acceleration? #10803

Open
3DTOPO opened this issue Sep 15, 2024 · 17 comments

@3DTOPO

3DTOPO commented Sep 15, 2024

Feature description

I have an extensive imagery processing pipeline, including computationally intensive histogram matching, many image adjustments, and many compositing steps, but running gdal2tiles takes far longer than any other part of my processing. I can create the base image from scratch much faster than I can tile it.

I'm not complaining per se, but I'm wondering why it is slow and whether there is room for optimization. I'm a developer but don't have any experience profiling and optimizing code on other platforms. The machines that do the bulk of my processing run Ubuntu.

Has profiling been done on gdal2tiles to identify the most expensive operations?

Besides any obvious optimizations, it would be really nice to have hardware acceleration for the most expensive processes, which I guess is the scaling operation.

I am willing to work on it if I have an idea of what would make the biggest improvement.

Additional context

No response

@jratike80
Collaborator

Please give a reproducible example of what you do and what timings you get, or at least the gdal2tiles command with the source data.

@3DTOPO
Author

3DTOPO commented Sep 16, 2024

It took about 62 hours to tile a .vrt that is 255145 x 255145 pixels. The VRT is composed of 64 subtiles that are each 31900 x 31900 pixels. I can't make the imagery available, nor would it be practical, but I don't think the actual pixel content makes any difference. When running, all 12 cores are fully saturated. The machine has 128 GB of RAM.

Here is the command I ran:

gdal2tiles.py --exclude --xyz --processes=12 --tiledriver=PNG --resampling=lanczos --zoom=18-18 --tilesize=512 image_z18.vrt z18_tiles

I know lanczos is slower than the other resampling options, but it is only slightly slower than bilinear, and nearest isn't suitable.

@jratike80
Collaborator

jratike80 commented Sep 17, 2024

It is often (usually) possible to do reasonable tests with datasets smaller than everything we have in production, using images that resemble the real ones and are available under a free license. Images can also be artificial, created for example with https://gdal.org/en/latest/programs/gdal_create.html.
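For instance, an artificial RGBA subtile of the same 31900x31900 size could be created along these lines (a sketch; the georeferencing and burn values are illustrative):

gdal_create -of GTiff -outsize 31900 31900 -bands 4 -ot Byte -burn 128 -a_srs EPSG:3857 -a_ullr -13267024.5 4598453.5 -13247451.5 4578880.5 fake_subtile.tif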

Please show what gdalinfo prints for one of the 31900 x 31900 pixel subtiles. You can hide the coordinates; what interests me is whether they are tiled and whether they have overviews.

Please help us get a view of the scale of the issue by calculating some statistics: how many tiles there will be at zoom level 18, how many tiles gdal2tiles is generating now, and what tiles/second or tiles/minute rate would satisfy you. Perhaps also give the output rate as megabytes of image data per minute (or some other appropriate unit) that gdal2tiles can handle, and what your optimized pipeline achieves.

@3DTOPO
Author

3DTOPO commented Sep 17, 2024

Below is the gdalinfo output for one of the subtiles. They are PNGs with transparency.

I once tried first creating a TIFF of the entire .vrt using gdal_translate with creation options set to use tiling. That pre-pass took a long time, and it didn't seem to speed up the tiling enough to make the overall process faster.

I haven't tried creating tiled TIFFs of the individual 31.9K images, though I suppose I should try.

I am working on a prototype area. Eventually, I will create tiles at Z18 for the entire United States. The prototype area produces 955,221 512x512 tiles, which works out to roughly 4.3 tiles per second over the 62-hour run. If I recall correctly, the US will require 668 million 512x512 tiles.

I don't really have a rate in mind; I just thought I would ask because, out of all the intensive image processing I am doing, this is by far the slowest part of production, which is counterintuitive to me. I can generate this imagery from many layers in less than 24 hours; achieving something similar or better for tiling would be fantastic.

Driver: PNG/Portable Network Graphics
Files: z11_x346_y789_clear.png
       z11_x346_y789_clear.png.aux.xml
Size is 31900, 31900
Coordinate System is:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["unnamed",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",3857]]
Data axis to CRS axis mapping: 1,2
Origin = (-13267024.500000000000000,4598453.500000000000000)
Pixel Size = (0.613573667711599,-0.613573667711599)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-13267024.500, 4598453.500) (119d10'46.95"W, 38d 8' 4.45"N)
Lower Left  (-13267024.500, 4578880.500) (119d10'46.95"W, 37d59'46.10"N)
Upper Right (-13247451.500, 4598453.500) (119d 0'13.97"W, 38d 8' 4.45"N)
Lower Right (-13247451.500, 4578880.500) (119d 0'13.97"W, 37d59'46.10"N)
Center      (-13257238.000, 4588667.000) (119d 5'30.46"W, 38d 3'55.40"N)
Band 1 Block=31900x1 Type=Byte, ColorInterp=Red
  Mask Flags: PER_DATASET ALPHA
Band 2 Block=31900x1 Type=Byte, ColorInterp=Green
  Mask Flags: PER_DATASET ALPHA
Band 3 Block=31900x1 Type=Byte, ColorInterp=Blue
  Mask Flags: PER_DATASET ALPHA
Band 4 Block=31900x1 Type=Byte, ColorInterp=Alpha

@jratike80
Collaborator

Large PNG files are very unsuitable for gdal2tiles, and in general for anything that needs to read data from a small area inside the image. The whole PNG image, in this case 31900x31900 pixels, must be decoded even when all that is needed is the source data for one 256x256 tile.

The first thing to do is to convert the source images into a more suitable format; tiled GeoTIFFs are fine. I am not sure whether gdal2tiles can benefit from overview levels or whether it only accesses the full-resolution image, with the other zoom levels then sampled from the base tiles. Any hardware acceleration or micro-optimization is pointless before this is fixed. Here is a base command for the format conversion:

gdal_translate -of GTiff -co tiled=yes -co compress=LZW z11_x346_y789_clear.png z11_x346_y789_clear.tif
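After the conversion, the Band lines from gdalinfo should report a block size like Block=256x256 instead of the one-row strips (Block=31900x1) shown above, which confirms the internal tiling took effect; for example:

gdalinfo z11_x346_y789_clear.tif | grep Block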

@3DTOPO
Author

3DTOPO commented Sep 17, 2024

Thanks. That is what I've done before on the whole VRT. I'll experiment, though, and actually record some benchmarks.

@3DTOPO
Author

3DTOPO commented Sep 18, 2024

I just ran two benchmarks: (A) using PNGs and (B) using tiled TIFFs. They took nearly the same time. The tiled TIFF run was slightly faster, but I think that was only because I was copying a file during the first test, which slowed down disk access.

For (A) I created a .vrt from four of the 31.9K PNGs. For (B) I created tiled TIFFs of the same four PNGs (using the command you provided) and built a .vrt of the TIFFs.

(A) took 3 hours 59 minutes
(B) took 3 hours 46 minutes

Since it seems the only difference came from copying a file, I think gdal2tiles must not be using the tile features of the TIFF, which would explain why it's taking so long for me and why tiling did not help when I tried it before.

The good news is that creating a tiled TIFF is quite fast when run on the 31.9K PNGs, which was not the case when I ran it on a VRT of all the PNGs.

@jratike80
Collaborator

jratike80 commented Sep 18, 2024

I did some benchmarking with my laptop. The source image is an RGB image with 12000x12000 pixels.

gdal2tiles --xyz --processes=4 --exclude --tiledriver=PNG --resampling=lanczos P4433H.tif timing_tiles
15.4 tiles per second (256x256); total time for 2677 tiles: 180 seconds

gdal2tiles --xyz --processes=4 --exclude --tiledriver=PNG --tilesize 512 --resampling=lanczos P4433H.tif timing_tiles
3.3 tiles per second (512x512); total time for 741 tiles: 225 seconds

That a program is slow is not a bug in itself, but to me the speed really does not feel good. I believe that, for example, MapServer can perform much better.

I can't say whether there is much to improve in gdal2tiles as a Python script, or whether it should be rewritten entirely. The original author chose the latter approach and created MapTiler.

@jratike80
Collaborator

I started to think that the "PNG files in a directory tree" layout feels somehow old-fashioned. I made some more tests with MBTiles and GeoPackage output, which do fundamentally the same thing: cut the source raster into tiles and save the PNG blobs somewhere on disk. Both options were at least 3 times faster than gdal2tiles and wrote the output in 50 seconds. Yes, I know that a tile directory is simple on the server side.

gdal_translate -of MBTiles P4433H.tif timing.mbtiles
gdal_translate -of GPKG -co tile_format=PNG -co tiling_scheme=GoogleMapsCompatible P4433H.tif timing.gpkg

What I don't understand is that the following gdalwarp command runs in less than 7 seconds and writes warped tiles into a GeoPackage at a rate of 315 tiles/second or more:
gdalwarp -of gpkg -t_srs epsg:3857 P4433H.tif timing.gpkg

I noticed that:

  • without the tiling_scheme option a custom scheme is used; GoogleMapsCompatible is much slower
  • without selecting a tile format, tiles are written as JPEG; PNG is much slower

But anyway, 7 seconds is amazingly fast compared to 50 seconds for gdal_translate into MBTiles and 180 seconds for gdal2tiles. Even generating a Cloud Optimized GeoTIFF is slower, at 20 seconds with:

gdalwarp -of COG -t_srs epsg:3857 P4433H.tif timing_cog.tif

I know that this COG has overviews while the GPKG does not; sorry for the lazy test plan. The source file was a 12000x12000 RGB image in EPSG:3067.

@rouault
Member

rouault commented Sep 18, 2024

I believe one of the major sources of slowdown in gdal2tiles is that it reads the data corresponding to a tile into a temporary buffer whose dimensions are 4 times the target size, and then applies downscaling with the resampling algorithm.

Cf

self.querysize = 4 * self.tile_size

This is a very inefficient (and dubious) way of proceeding. It should directly use the warping/resampling code to fill a buffer at the expected dimensions, without doing that prior upsampling. I guess the rationale for the odd behavior is that gdal2tiles dates back to a time when the warping code didn't properly take the scaling factor into account.
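To illustrate, here is a minimal Python sketch of the two read strategies (not the actual gdal2tiles code; the file name and source window are illustrative):

from osgeo import gdal

ds = gdal.Open("source.tif")
tile_size = 512
xoff, yoff, xsize, ysize = 0, 0, 2048, 2048  # source window for one tile

# gdal2tiles today: read a buffer 4x the target size, then shrink it
# to tile_size in a separate resampling step.
query = 4 * tile_size
oversized = ds.ReadRaster(xoff, yoff, xsize, ysize,
                          buf_xsize=query, buf_ysize=query)

# Suggested: let GDAL resample while reading, so the buffer comes back
# already at the final tile dimensions.
direct = ds.ReadRaster(xoff, yoff, xsize, ysize,
                       buf_xsize=tile_size, buf_ysize=tile_size,
                       resample_alg=gdal.GRIORA_Lanczos)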

@3DTOPO
Author

3DTOPO commented Sep 19, 2024

Very interesting, thanks guys!

Question: is gdalwarp needed at all if the image is already in the target projection? Or is that what scales the tiles?

I wasn't aware gdal_translate did tiling. Is it possible to create a TILING_SCHEME for Slippy Tiles? Does it create the tile pyramid, or just tiles at the native resolution of the image?

@jratike80
Collaborator

Warping may be needed to get the right pixel size, but depending on the output driver gdalwarp may not be needed at all. In my first two examples (MBTiles and GeoPackage with GoogleMapsCompatible) the data gets warped because the output scheme requires it, but plain gdalwarp with -t_srs EPSG:3857 does not work there because the resulting resolution would not match what the fixed tiling scheme requires. gdal_translate works because GDAL then automatically knows to warp into EPSG:3857 and to change the pixel size to match the well-known tiling scheme.

gdal_translate does tiling if the output format supports or requires it. Database-backed formats do (MBTiles, GeoPackage, SpatiaLite, TileDB; there may be more). Some file formats also support tiling (GeoTIFF/COG, HFA, etc.), but I guess those tiles are not as concrete as the database tiles, rather well-defined sections of the code stream.

There is no "slippy tiles" format (https://gdal.org/en/latest/drivers/raster/index.html), so my thoughts are more theoretical. But if you look at the tile table of a GeoPackage you can see that it has done the same thing as gdal2tiles. Maybe it would not make sense to extract the PNG blobs and save them into files named by zoom_level, tile_column, and tile_row, but it would certainly be possible.

[Screenshot: contents of the GeoPackage tile table, with zoom_level, tile_column, tile_row, and tile_data columns]
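For what it's worth, extracting the blobs into a z/x/y tree would only take a few lines of Python; a sketch, assuming the tile table is named timing to match the gdal_translate example above:

import os
import sqlite3

conn = sqlite3.connect("timing.gpkg")
# In a GeoPackage the tile table is named after the layer, and tile_row 0
# is the top row, which matches the XYZ convention of gdal2tiles --xyz.
for z, x, y, blob in conn.execute(
        "SELECT zoom_level, tile_column, tile_row, tile_data FROM timing"):
    os.makedirs(f"tiles/{z}/{x}", exist_ok=True)
    with open(f"tiles/{z}/{x}/{y}.png", "wb") as f:
        f.write(blob)
conn.close()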

Overviews in GPKG are documented at https://gdal.org/en/latest/drivers/raster/gpkg.html#overviews. Overviews in MBTiles are created in the same way.
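In practice that means running gdaladdo on the tiled output, for example (resampling method and levels are illustrative):

gdaladdo -r average timing.gpkg 2 4 8 16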

@mdsumner
Contributor

mdsumner commented Sep 19, 2024

FWIW, I wrote a version in R that is almost feature complete and parallelizes well:

https://github.com/hypertidy/filearchy

I think it would probably be easier to rewrite gdal2tiles from scratch, and that's vaguely on my todo list as a way to learn. It's really about lining up every target tile (an abstraction that could be as simple as a table with xmin, xmax, ymin, ymax, file, zoom, and that is also useful for other tasks) and warping the source to that list. It's very amenable to parallelization. Writing the index html could be separated out too as a handy general tool.

@mdsumner
Contributor

the meat of the task is here

https://github.com/hypertidy/filearchy/blob/main/R/gdal_tiles.R#L32-L40

I could probably flesh out a basic go at that with osgeo.gdal, if others were interested in helping with the broader Python scaffolding.
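As a rough sketch of the per-tile warp (the function, the z/x/y layout, and the bounds argument are mine, not an agreed design; PNG has no Create() support, so it warps into an in-memory dataset first):

import os
from osgeo import gdal

def write_tile(src_path, z, x, y, bounds, tile_size=512):
    """Warp the source into one XYZ tile extent and save it as PNG.
    bounds is (xmin, ymin, xmax, ymax) in EPSG:3857."""
    mem = gdal.Warp("", src_path, format="MEM",
                    dstSRS="EPSG:3857",
                    outputBounds=bounds,
                    width=tile_size, height=tile_size,
                    resampleAlg="lanczos")
    os.makedirs(f"tiles/{z}/{x}", exist_ok=True)
    gdal.GetDriverByName("PNG").CreateCopy(f"tiles/{z}/{x}/{y}.png", mem)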

@mdsumner
Contributor

What's the pythoniest way to generate a table of -te (target extent) and zoom values from a bbox and shape? That's the kind of scaffolding I mean; it's a basic reusable component.
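One possible shape for that component, as a plain-Python sketch (the function name and the XYZ/EPSG:3857 assumptions are mine):

WEB_MERC = 20037508.342789244  # half-width of the EPSG:3857 world in metres

def tile_extents(xmin, ymin, xmax, ymax, zoom):
    """Yield (zoom, x, y, te_xmin, te_ymin, te_xmax, te_ymax) for every
    XYZ tile intersecting the bbox (coordinates in EPSG:3857)."""
    n = 2 ** zoom
    span = 2 * WEB_MERC / n  # width and height of one tile in metres
    tx0 = max(int((xmin + WEB_MERC) // span), 0)
    tx1 = min(int((xmax + WEB_MERC) // span), n - 1)
    ty0 = max(int((WEB_MERC - ymax) // span), 0)  # y counts down from the top
    ty1 = min(int((WEB_MERC - ymin) // span), n - 1)
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            ulx = -WEB_MERC + tx * span
            uly = WEB_MERC - ty * span
            yield (zoom, tx, ty, ulx, uly - span, ulx + span, uly)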

@3DTOPO
Author

3DTOPO commented Sep 20, 2024

I ran my test again on four 31.9K images, with the query size set to the actual tile size:

self.querysize = self.tile_size

It created 62,364 512x512 tiles from the TIFFs in 7 minutes! (148 tiles per second)

Just for good measure, I ran the same test with PNGs and it took 21 minutes, 3x as long, so gdal2tiles is clearly using the TIFF's tiling to its advantage.

I only created tiles at zoom level 18 for the test, but as far as I can tell, they look perfect. Is there any reason why I would need to make other changes?

@3DTOPO
Author

3DTOPO commented Sep 22, 2024

So setting querysize to the actual tile_size definitely breaks the overview tiles. I tried using 4x for the overview tiles, but something is still not right, and I don't understand why.
