Description
Describe the bug
Under a low-concurrency test, entity updates are rejected with HTTP 500 errors. This affects catalog updates, namespace creation, table updates, ...
To Reproduce
- Check out this commit: pingtimeout@fbd3b90
- Run the server using the getting-started docker compose file:
docker compose -f getting-started/eclipselink/docker-compose.yml up
- Export the client ID and secret as environment variables:
export CLIENT_ID=root CLIENT_SECRET=s3cr3t
- Run
./gradlew :polaris-benchmarks:gatlingRun
The simulation runs 5 concurrent users. Each user creates its own catalog (named C_0, C_1, ...), then sequentially creates 5 namespaces in that catalog (named NS_0, NS_1, ...).
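The following is a hypothetical Java model of the request plan described above. It is not the actual Gatling simulation in polaris-benchmarks, and the class and method names are illustrative. Each of the 5 users authenticates once, creates its catalog, then creates 5 namespaces in it. The model makes two properties explicit: the run issues 5 × (1 + 1 + 5) = 35 requests, matching the Global total in the Gatling output, and no two users ever target the same catalog or namespace.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the benchmark workload: not the actual Gatling simulation. */
public class WorkloadPlan {
    static final int USERS = 5;
    static final int NAMESPACES_PER_USER = 5;

    /** One logical request issued by a simulated user. */
    record Request(int user, String kind, String target) {}

    /** Builds the requests issued by every user, in per-user order. */
    static List<Request> plan() {
        List<Request> requests = new ArrayList<>();
        for (int u = 0; u < USERS; u++) {
            String catalog = "C_" + u;
            requests.add(new Request(u, "Authenticate", ""));
            requests.add(new Request(u, "Create Catalog", catalog));
            for (int n = 0; n < NAMESPACES_PER_USER; n++) {
                // Namespaces are identified by their catalog-qualified path.
                requests.add(new Request(u, "Create Namespace", catalog + ".NS_" + n));
            }
        }
        return requests;
    }

    /** Returns true if no catalog or namespace is targeted more than once across all users. */
    static boolean targetsAreDisjoint(List<Request> requests) {
        Set<String> seen = new HashSet<>();
        for (Request r : requests) {
            if (r.target().isEmpty()) continue; // authentication has no target entity
            if (!seen.add(r.target())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Request> requests = plan();
        System.out.println("total requests: " + requests.size());
        System.out.println("disjoint targets: " + targetsAreDisjoint(requests));
    }
}
```

Running it prints `total requests: 35` and `disjoint targets: true`.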
Actual Behavior
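The Gatling output consistently shows that some catalogs and namespaces could not be created. In the output below, only 1 of the 5 catalogs was created; the other 4 creation requests were rejected with HTTP 500 errors. The 20 failed namespace creations returned HTTP 404, which matches the 5 namespaces of each of the 4 catalogs that were never created.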
========================================================================================================================
2025-03-05 16:42:50 UTC 0s elapsed
---- Requests -----------------------------------------------------------------------|---Total---|-----OK----|----KO----
> Global | 35 | 11 | 24
> Authenticate | 5 | 5 | 0
> Create Catalog | 5 | 1 | 4
> Create Namespace | 25 | 5 | 20
---- Errors ------------------------------------------------------------------------------------------------------------
> status.find.is(200), but actually found 404 20 (83.33%)
> status.find.is(201), but actually found 500 4 (16.67%)
The attached file is the server log of the Polaris instance. It contains numerous errors like the one below:
2025-03-05 16:35:20 INFO [org.apache.polaris.service.exception.IcebergExceptionMapper] (executor-thread-1) Handling runtimeException Exception [EclipseLink-4002] (Eclipse Persistence Services - 4.0.5.v202412231137-a96b873527f305f932543045c8679bb1de8d3a43): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
Hint: The transaction might succeed if retried.
Error Code: 0
Call: UPDATE ENTITIES SET GRANTRECORDSVERSION = ?, VERSION = ? WHERE (((CATALOGID = ?) AND (ID = ?)) AND (VERSION = ?))
bind => [5 parameters bound]
Query: UpdateObjectQuery(org.apache.polaris.jpa.models.ModelEntity@2da9cea8)
These errors are not caught and result in an HTTP 500 response being sent to the client. Here is the payload received on the Gatling side:
{"error":{"message":"Exception [EclipseLink-4002] (Eclipse Persistence Services - 4.0.5.v202412231137-a96b873527f305f932543045c8679bb1de8d3a43): org.eclipse.persistence.exceptions.DatabaseException\nInternal Exception: org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update\nError Code: 0\nCall: UPDATE ENTITIES SET GRANTRECORDSVERSION = ?, VERSION = ? WHERE (((CATALOGID = ?) AND (ID = ?)) AND (VERSION = ?))\n\tbind => [5 parameters bound]\nQuery: UpdateObjectQuery(org.apache.polaris.jpa.models.ModelEntity@1123186d)","type":"PersistenceException","code":500}}
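PostgreSQL reports "could not serialize access" errors with SQLSTATE 40001 (serialization_failure), and the log's hint ("The transaction might succeed if retried") points at the usual remedy: rerun the whole transaction. The sketch below shows that pattern in plain Java under stated assumptions. It is not Polaris code; `SerializationRetry`, `isSerializationFailure` and `runWithRetry` are hypothetical names, and it assumes the retried `Callable` wraps one complete transaction. Because EclipseLink wraps the driver exception (as in the log), the check walks the cause chain looking for a `java.sql.SQLException` with that SQLSTATE.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a transactional unit of work on PostgreSQL serialization failures. */
public class SerializationRetry {
    /** PostgreSQL SQLSTATE for serialization_failure. */
    static final String SERIALIZATION_FAILURE = "40001";

    /** Walks the cause chain, since JPA/EclipseLink wrap the driver's SQLException. */
    static boolean isSerializationFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException sql && SERIALIZATION_FAILURE.equals(sql.getSQLState())) {
                return true;
            }
        }
        return false;
    }

    /** Reruns the whole transaction, up to maxAttempts times, while it fails with a serialization error. */
    static <T> T runWithRetry(Callable<T> transaction, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isSerializationFailure(e)) {
                    throw e; // give up: not retryable, or out of attempts
                }
                // A real implementation would typically roll back and back off with jitter here.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Simulates a driver error wrapped by the persistence layer, as in the log.
                throw new RuntimeException("DatabaseException",
                        new SQLException("could not serialize access", SERIALIZATION_FAILURE));
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `committed after 3 attempts`. Only serialization failures are retried; any other error is rethrown immediately, so it would still surface to the exception mapper.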
Expected Behavior
Given that there is no overlap between the catalogs and namespaces created by different users, all requests should succeed.
Additional context
This behavior can still be reproduced after #1092 was merged into main.