Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dennishuo
Copy link
Contributor

Fully differentiating between e.g. writeEntity an writeEntityInCurrentTxn allows all subclasses of AbstractTransactionalPersistence now to automatically inherit the ability to also behave as one-shot durable BasePersistence implementations.

This also allows now incrementally pulling shared logic out of PolarisMetaStoreManagerImpl into BaseMetaStoreManager which can be reused by other non-transactional implementations of PolarisMetaStoreManager; the BaseMetaStoreManager will only ever interact with BasePersistence while PolarisMetaStoreManagerImpl will interact with TransactionalPersistence.

…r they are

one-shot atomic or if they are intended to be run as part of a caller-managed
transaction, in which case the action will not be flushed/durable until the
transaction is committed.

Delegate all the BasePersistence methods to subclass variations of *InCurrentTxn
so that the TransactionalPersistence implementations also happen to fully
implement the real semantics of BasePersistence.
Comment on lines 110 to 114
// Every method of BasePersistence will have a related *InCurrentTxn method here; the semantics
// being that transactional implementations of a PolarisMetaStoreManager may choose to
// self-manage outer transactions to perform all the persistence calls within that provided
// transaction, while the basic implementation will only use the "durable in a single-shot"
// methods from BasePersistence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new methods seem to match methods in BasePersistence 1:1, which seems like a redundant copy.

How about we use BasePersistence to define API methods, parameters and version check expectations, but not attach any atomicity or durability guarantees to them?

Then, AtomicPersistence extends BasePersistence <- here we define that each method call is durable and atomic.
TransactionPersistence extends BasePersistence <- allows explicitly managing Tx boundaries and defines that changes are durable on Tx commit. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old behavior was already effectively that, except that the semantics were just implied in javadocs rather than an interface; do you mean you'd want AtomicPersistence to basically be just an indicator interface that doesn't necessarily add new methods?

public interface AtomicPersitence extends BasePersistence {}

I had tried to make it work initially without renaming the methods, but I kept running into a few issues

  1. If anything is calling things through the BasePersistence interface, such as BaseMetaStoreManager we don't know if the method call is intended to be inside an outer transaction or not, since the helpers in BaseMetaStoreManager could also be used by atomic or transactional impls of PolarisMetaStoreManager
  2. Even within a "transactional" implementation, in particular when inheriting shared methods from BaseMetaStoreManager, we have a situation where we want to use atomic methods sometimes, and "InCurrentTxn" other times with the same instance of a TransactionalPersistence. To achieve this the other way we'd need a composition/delegator pattern where an AtomicPersistence impementation implements all the "atomic" versions by explicit runInTransaction just like the AbstractTransactionalPersistence does here, but would have to provide the handle to the underlying raw TransactionalPersistence anyways when a caller wants to call in a transaction.

With this PR's approach, we ensure very explicit semantics for each method definition, regardless of whether the methods were from the base interface or the TransactionalPersistence interface.

Also importantly, this approach allowed all current TransactionalPersistence impls to be able to function as a AtomicPersistence as you describe, but without any ambiguity, which made it easy to test.

Another thing to consider is that even for the transactional implementations, the "silent runInTransaction approach is pretty suboptimal, because it's not possible to manage transaction state explicitly, and depends on a ThreadLocal which isn't great (if someone tries to get cute and parallelize some validation fan-out requests against the EclipseLink impl right now, transactions break).

This refactoring could be a step towards the more idiomatic approach of adding a Transaction argument to the *InCurrentTxn versions of the method calls, where the Session/Transaction intended to be used in a given method call is unambiguous.

It looks like our EclipseLink impl actually could've been done that way, since it has an explicit EntityManager session which represents the live transaction, but only had to use ThreadLocal because the interfaces were not well-suited for "runInTransaction" semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh another nice benefit I noticed when doing this was even for methods that aren't 1:1 from BasePersistence, I found it much easier to reason about what was happening by still following the naming convention, e.g. writeToEntitiesChangeTrackingInCurrentTxn so that it's easy to see at a glance whether all the method calls indeed abide by being "in a current transaction". This is also something that will benefit from actually providing an actual Transaction state object for these variations (or in that case, it's not as strictly necessary for the methods to be named as InCurrentTxn, but instead having (Transaction txn... in the method signature would serve the same purpose of disambiguation).

Copy link
Contributor Author

@dennishuo dennishuo Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to reduce the diffs on the existing PolarisMetaStoreManagerImpl class we could instead invert the semantics, as in unqualified writeEntity could already always mean "in current transaction" since the current implementations all have that semantic, and instead, we introduce the new methods in AtomicPersistence to have writeEntityAtomically, dropEntityAtomically, etc.

The problem with doing it inverted like that though is that the BasePersistence methods are automatically the "InCurrentTxn" variants, so a subclass that only implements AtomicPersistence would I suppose just throw UnsupportedOperationException for all the methods from BasePersistence which kind of defeats the purpose.

Alternatively, the doFooAtomically methods could be the methods declared in BasePersistence.

In general it makes sense to have the "Atomic" operations in the base interface and "Transaction" ones in the subinterface, because as AbstractTransactionalPersistence.java shows, all Transactional implementations can also achieve the semantics of "Atomic Persistence", but not all implementations of Atomic Persistence" can support transactional styles.

…of "new" methods defined in `BasePersistence`

while the original "doFoo" will mean that it expects to be run in a current transaction. Removed all the InCurrentTxn
suffixes.
@dennishuo
Copy link
Contributor Author

@dimas-b Okay, I inverted the method naming so that BasePersistence defines all-new doFooAtomically methods, while TransactionalPersistence keeps the existing doFoo method names. This way there is no change in which method needs to be called from PolarisMetaStoreManagerImpl and indeed no changes to the TreeMap/EclipseLink implementations.

The nice thing here is the AbstractTransactionalPersistence just provides an adapter layer to make all those existing implementations also able to provide all the Atomically methods without them having to change at all.

@dennishuo
Copy link
Contributor Author

If it helps to reason about how the the method renaming structure fits in and why things like writeEntityAtomically will actually diverge more, see the next phase that diffbases on top of this one: #1139

If it's preferred we could just close this PR and try to merge the "phase 3" one all as one shot instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants