Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

dennishuo · 2025-03-07T09:37:11Z

Fully differentiating between e.g. writeEntity an writeEntityInCurrentTxn allows all subclasses of AbstractTransactionalPersistence now to automatically inherit the ability to also behave as one-shot durable BasePersistence implementations.

This also allows now incrementally pulling shared logic out of PolarisMetaStoreManagerImpl into BaseMetaStoreManager which can be reused by other non-transactional implementations of PolarisMetaStoreManager; the BaseMetaStoreManager will only ever interact with BasePersistence while PolarisMetaStoreManagerImpl will interact with TransactionalPersistence.

…ManagerImpl into BaseMetaStoreManager

…r they are one-shot atomic or if they are intended to be run as part of a caller-managed transaction, in which case the action will not be flushed/durable until the transaction is committed. Delegate all the BasePersistence methods to subclass variations of *InCurrentTxn so that the TransactionalPersistence implementations also happen to fully implement the real semantics of BasePersistence.

dimas-b · 2025-03-07T23:52:18Z

...rc/main/java/org/apache/polaris/core/persistence/transactional/TransactionalPersistence.java

+  // Every method of BasePersistence will have a related *InCurrentTxn method here; the semantics
+  // being that transactional implementations of a PolarisMetaStoreManager may choose to
+  // self-manage outer transactions to perform all the persistence calls within that provided
+  // transaction, while the basic implementation will only use the "durable in a single-shot"
+  // methods from BasePersistence.


the new methods seem to match methods in BasePersistence 1:1, which seems like a redundant copy.

How about we use BasePersistence to define API methods, parameters and version check expectations, but not attach any atomicity or durability guarantees to them?

Then, AtomicPersistence extends BasePersistence <- here we define that each method call is durable and atomic.
TransactionPersistence extends BasePersistence <- allows explicitly managing Tx boundaries and defines that changes are durable on Tx commit. WDYT?

The old behavior was already effectively that, except that the semantics were just implied in javadocs rather than an interface; do you mean you'd want AtomicPersistence to basically be just an indicator interface that doesn't necessarily add new methods?

public interface AtomicPersitence extends BasePersistence {}

I had tried to make it work initially without renaming the methods, but I kept running into a few issues

If anything is calling things through the BasePersistence interface, such as BaseMetaStoreManager we don't know if the method call is intended to be inside an outer transaction or not, since the helpers in BaseMetaStoreManager could also be used by atomic or transactional impls of PolarisMetaStoreManager

Even within a "transactional" implementation, in particular when inheriting shared methods from BaseMetaStoreManager, we have a situation where we want to use atomic methods sometimes, and "InCurrentTxn" other times with the same instance of a TransactionalPersistence. To achieve this the other way we'd need a composition/delegator pattern where an AtomicPersistence impementation implements all the "atomic" versions by explicit runInTransaction just like the AbstractTransactionalPersistence does here, but would have to provide the handle to the underlying raw TransactionalPersistence anyways when a caller wants to call in a transaction.

With this PR's approach, we ensure very explicit semantics for each method definition, regardless of whether the methods were from the base interface or the TransactionalPersistence interface.

Also importantly, this approach allowed all current TransactionalPersistence impls to be able to function as a AtomicPersistence as you describe, but without any ambiguity, which made it easy to test.

Another thing to consider is that even for the transactional implementations, the "silent runInTransaction approach is pretty suboptimal, because it's not possible to manage transaction state explicitly, and depends on a ThreadLocal which isn't great (if someone tries to get cute and parallelize some validation fan-out requests against the EclipseLink impl right now, transactions break).

This refactoring could be a step towards the more idiomatic approach of adding a Transaction argument to the *InCurrentTxn versions of the method calls, where the Session/Transaction intended to be used in a given method call is unambiguous.

It looks like our EclipseLink impl actually could've been done that way, since it has an explicit EntityManager session which represents the live transaction, but only had to use ThreadLocal because the interfaces were not well-suited for "runInTransaction" semantics.

Oh another nice benefit I noticed when doing this was even for methods that aren't 1:1 from BasePersistence, I found it much easier to reason about what was happening by still following the naming convention, e.g. writeToEntitiesChangeTrackingInCurrentTxn so that it's easy to see at a glance whether all the method calls indeed abide by being "in a current transaction". This is also something that will benefit from actually providing an actual Transaction state object for these variations (or in that case, it's not as strictly necessary for the methods to be named as InCurrentTxn, but instead having (Transaction txn... in the method signature would serve the same purpose of disambiguation).

If we want to reduce the diffs on the existing PolarisMetaStoreManagerImpl class we could instead invert the semantics, as in unqualified writeEntity could already always mean "in current transaction" since the current implementations all have that semantic, and instead, we introduce the new methods in AtomicPersistence to have writeEntityAtomically, dropEntityAtomically, etc.

The problem with doing it inverted like that though is that the BasePersistence methods are automatically the "InCurrentTxn" variants, so a subclass that only implements AtomicPersistence would I suppose just throw UnsupportedOperationException for all the methods from BasePersistence which kind of defeats the purpose.

Alternatively, the doFooAtomically methods could be the methods declared in BasePersistence.

In general it makes sense to have the "Atomic" operations in the base interface and "Transaction" ones in the subinterface, because as AbstractTransactionalPersistence.java shows, all Transactional implementations can also achieve the semantics of "Atomic Persistence", but not all implementations of Atomic Persistence" can support transactional styles.

…ence-phase2

…of "new" methods defined in `BasePersistence` while the original "doFoo" will mean that it expects to be run in a current transaction. Removed all the InCurrentTxn suffixes.

dennishuo · 2025-03-08T04:48:23Z

@dimas-b Okay, I inverted the method naming so that BasePersistence defines all-new doFooAtomically methods, while TransactionalPersistence keeps the existing doFoo method names. This way there is no change in which method needs to be called from PolarisMetaStoreManagerImpl and indeed no changes to the TreeMap/EclipseLink implementations.

The nice thing here is the AbstractTransactionalPersistence just provides an adapter layer to make all those existing implementations also able to provide all the Atomically methods without them having to change at all.

dennishuo · 2025-03-09T06:24:04Z

If it helps to reason about how the the method renaming structure fits in and why things like writeEntityAtomically will actually diverge more, see the next phase that diffbases on top of this one: #1139

If it's preferred we could just close this PR and try to merge the "phase 3" one all as one shot instead.

dennishuo added 2 commits March 4, 2025 08:56

Initial sketch of how shared code can be lifted from PolarisMetaStore…

806caae

…ManagerImpl into BaseMetaStoreManager

dennishuo requested review from adutra, ashvina, dimas-b, eric-maynard, jackye1995, jbonofre, vvcephei, collado-mike, snazy, RussellSpitzer, takidau, MonkeyCanCode, flyrain and ebyhr as code owners March 7, 2025 09:37

dimas-b reviewed Mar 7, 2025

View reviewed changes

dennishuo added 2 commits March 8, 2025 02:41

Merge branch 'main' of github.com:dennishuo/polaris into dhuo-persist…

831c552

…ence-phase2

Invert the method naming semantics; "doFooAtomically" is now the set …

26be511

…of "new" methods defined in `BasePersistence` while the original "doFoo" will mean that it expects to be run in a current transaction. Removed all the InCurrentTxn suffixes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

dennishuo commented Mar 7, 2025

dimas-b Mar 7, 2025

dennishuo Mar 8, 2025

dennishuo Mar 8, 2025

dennishuo Mar 8, 2025 •

edited

Loading

dennishuo commented Mar 8, 2025

dennishuo commented Mar 9, 2025

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

Are you sure you want to change the base?

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

Conversation

dennishuo commented Mar 7, 2025

dimas-b Mar 7, 2025

Choose a reason for hiding this comment

dennishuo Mar 8, 2025

Choose a reason for hiding this comment

dennishuo Mar 8, 2025

Choose a reason for hiding this comment

dennishuo Mar 8, 2025 • edited Loading

Choose a reason for hiding this comment

dennishuo commented Mar 8, 2025

dennishuo commented Mar 9, 2025

dennishuo Mar 8, 2025 •

edited

Loading