Hi,
I know that this importer is being integrated into GeoNode itself, but I think the problem is the same: the import process gets slower and slower as the GeoServer catalog grows (with my setup, roughly 1000 layers results in about 60 seconds per layer import, no SSD).
From the GeoServer log I see that a request is made for each store in the GeoServer workspace.
I think this is due to the sanity_check phase of the publisher, as it makes requests across all GeoServer stores, trying every possible name, to check the SRID:
https://github.com/GeoNode/geonode-importer/blob/master/importer/publisher.py#L75
Since publish_resources calls the geoserver-restconfig function create_coveragestore, which returns the created resource, maybe we can avoid the sanity check over all the stores?
https://github.com/GeoNode/geoserver-restconfig/blob/ffbcbb175e9df37dbbd4bf0240058d79fb92eca0/src/geoserver/catalog.py#L689
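As a minimal sketch of that idea (every name below is an illustrative stub, not the real geoserver-restconfig API): if create_coveragestore already hands back the created resource, the CRS can be read off the returned object, with no catalog-wide scan needed.

```python
# Hypothetical stand-ins for the geoserver-restconfig objects; the point
# is only that the returned resource already carries the CRS.

class FakeResource:
    """Stub for the coverage resource returned on creation."""
    def __init__(self, name, srid):
        self.name = name
        self.projection = srid

class FakeCatalog:
    """Stub catalog that counts REST requests."""
    def __init__(self):
        self.requests = 0

    def create_coveragestore(self, name, data, workspace):
        self.requests += 1  # one request to create the store
        return FakeResource(name, "EPSG:4326")

def publish_and_check(catalog, name, data, workspace):
    # Read the SRID from the resource we just got back instead of
    # re-scanning every store in the catalog.
    resource = catalog.create_coveragestore(name, data, workspace)
    return resource.projection

catalog = FakeCatalog()
srid = publish_and_check(catalog, "ortho_1", b"...", "geonode")
print(srid, catalog.requests)  # EPSG:4326 1
```

With this shape the sanity check costs one request total, independent of how many stores already exist in the catalog.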
I tested by removing the sanity checks, and the import time is halved. But the GeoServer logs still show a request for every store...
I tracked it down to create_geonode_resource, which calls (geonode) resource_manager.create -> (geonode) sync_instance_with_geoserver -> (geonode) fetch_gs_resource -> (geoserver-restconfig) get_resource -> (geoserver-restconfig) get_resources.
The get_resources function is costly when no store is given, as it loops over every store in GeoServer. In my case I have 1000 raster images, so there are at least 1000 requests each time get_resources is called without a store.
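A toy cost model of that behaviour (StubCatalog is a made-up stand-in, not the real Catalog class): without a store, one REST call is issued per store in the workspace; with a store, a single call suffices.

```python
class StubCatalog:
    """Made-up catalog stub that counts simulated REST requests."""
    def __init__(self, n_stores):
        self.stores = [f"store_{i}" for i in range(n_stores)]
        self.requests = 0

    def get_resources(self, store=None):
        if store is None:
            # Catalog-wide lookup: one request per store in the workspace.
            for _ in self.stores:
                self.requests += 1
            return []
        # Store-scoped lookup: a single request.
        self.requests += 1
        return []

cat = StubCatalog(n_stores=1000)
cat.get_resources()                 # unscoped scan
print(cat.requests)                 # 1000
cat.get_resources(store="store_7")  # scoped lookup
print(cat.requests)                 # 1001
```

This is why the per-layer cost grows linearly with catalog size: each upload repeats the unscoped scan.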
So when uploading a raster there are 3 calls to geoserver-restconfig's get_resources() without a store given.
I think the second one is cached, so we effectively loop over all the stores twice.
geoserver-restconfig makes the get_resources call without passing a store, even though a store parameter is available.
My path to faster imports would be:
1. Modify geoserver-restconfig's create_coveragestore to call get_resources with a store.
2. Use the store for the sanity checks in geonode-importer; there is already a function get_geoserver_store_name that returns the store name based on the resource name, so we can use it.
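The second step could be sketched as follows. This is hypothetical wiring, not the actual importer code: get_geoserver_store_name is the helper's real name in geonode-importer, but its body here is a placeholder, and RecordingCatalog is a stub standing in for the geoserver-restconfig Catalog.

```python
def get_geoserver_store_name(resource_name):
    # Placeholder for the geonode-importer helper mentioned above;
    # the real implementation derives the store name differently.
    return resource_name

def sanity_check(catalog, resource_name, workspace):
    # Derive the store name up front, then keep the lookup scoped to
    # that one store instead of scanning the whole catalog.
    store = get_geoserver_store_name(resource_name)
    return catalog.get_resource(resource_name, store=store, workspace=workspace)

class RecordingCatalog:
    """Stub that records how each lookup was scoped."""
    def __init__(self):
        self.calls = []

    def get_resource(self, name, store=None, workspace=None):
        self.calls.append((name, store, workspace))
        return object()

cat = RecordingCatalog()
sanity_check(cat, "ortho_1", "geonode")
print(cat.calls)  # [('ortho_1', 'ortho_1', 'geonode')]
```

The stub confirms the lookup is always store-scoped, which is the property that would keep import time flat as the catalog grows.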