Publishable Changes vs Brain #5459

mhsdesign opened this issue Feb 5, 2025 · 1 comment

mhsdesign commented Feb 5, 2025

After (re)-discovering this bug: #4997 (comment)

ANY unpublished changes within a hierarchy will prevent that part of the tree from being deleted via the Neos UI.

I think it's time again to take a closer look at the change projection.

requirements for the tracked changes

The current requirements for the change projection are the following:

  1. count the total changes of a workspace
  2. count the changes in a site or document
  3. get all node ids needed to publish changes within a site or document
    a. when publishing a document, changes on the document itself (like the title) should be published, as well as all nested content within. Other child documents or tethered documents are not in scope of this publish.
    b. get the node ids of deleted nodes, so that deletions of documents can be published via "publish all (in site)" and deletions of content elements via "publish document"
    c. detect that changed nodes whose parent has been deleted also have to be included in the publish operation (BUG: Publishing individual nodes is impossible when contents were created on a deleted document #4997)
  4. show information like label and node type in the workspace review module
    a. also show information for deleted nodes
  5. track if a node was modified in multiple dimensions, which should be shown in the publishing dialog
    a. track which dimensions an aggregate scope change affected
    b. show differences of properties and subtree tags

Additional optimisations to ignore redundant changes caused by a subsequent deletion (this probably doesn't have to be optimised, as it is a rare case):

  1. In the best case, a creation followed by an immediate deletion should not be shown as a change (4.a) - not even in the counts (1. and 2.) - and should be detected as a redundant operation to be dropped on the next rebase, publish, or discard. The same should apply if descendant nodes were changed inside.
  2. Changes to descendants whose parent existed in the base workspace but was deleted should be ignored in a similar way and could theoretically be dropped/optimised after rebase, publish, or discard.

current implementation

The change projection tracks changes in a flat table:

| contentStreamId | created | changed | moved | nodeAggregateId | originDimensionSpacePoint | deleted | removalAttachmentPoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| user-cs | 0 | 1 | 0 | content-node-id | {"language":"en_US"} | 0 | null |

Information like the closest parent document is evaluated via the publishing service, while the content diffing is done during evaluation against the base workspace in the workspace review controller.
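For illustration, this is roughly what that per-change evaluation looks like; a minimal sketch, where $changeFinder, $subgraph and findByContentStreamId are illustrative rather than the actual service wiring:

```php
// Rough sketch of counting changes per document scope (2.) today: every flat
// change row needs a findClosestNode walk up to its document, which is what
// makes the count proportionally slower with the number of changes.
$count = 0;
foreach ($changeFinder->findByContentStreamId($contentStreamId) as $change) {
    $closestDocument = $subgraph->findClosestNode(
        $change->nodeAggregateId,
        FindClosestNodeFilter::create('Neos.Neos:Document')
    );
    if ($closestDocument?->aggregateId->equals($documentNodeAggregateId)) {
        $count++;
    }
}
```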

The requirements can almost be covered by the current implementation, with the following exceptions/downsides:

  • counting all changes for a scope (2.) gets proportionally slower with the amount of changes, as the changes are flat without hierarchy and we check the parents of each change -> this slows down the Neos UI a lot with a big workspace, as the publish button state is always fetched
  • to allow publishing deletions of nodes (3.b) we track their document or site; this is done via the removalAttachmentPoint hack, which obscures our CR API (RemoveNodeAggregate command should not specify removalAttachmentPoint #4487)
  • we cannot include previous changes made to a removed node's descendants in the change set (3.c). The hierarchy cannot be resolved at the point of publishing, as the nodes are deleted, and the changes will fail to apply as remainders: BUG: Publishing individual nodes is impossible when contents were created on a deleted document #4997 (comment)
  • the workspace module's review page has to interpret all changes at the time of display (4.); this could be sped up if cached/gathered at the time of the change. To show information about deleted nodes, we attempt to access the node in the base workspace (4.a), which does not always succeed.
  • the changes don't contain information about their affected dimensions (5.a) for aggregate changes (like changing the node type), which has to be fetched during evaluation (related to !!! BUGFIX: Aggregate changes must know their affected origin dimension space points #5395)
  • the diffing of properties and subtree tags (5.b) is done during evaluation in the workspace controller, which involves a lot of core logic

proposed solution

catchup hook

All the requirements should be solvable - without facing the problems of the current implementation - by implementing the change tracking as a catchup hook. That allows getting the actual hierarchy at the point of the change, which can later be used to easily determine the count (2.) and prepare a working change set for publishing (3.), especially (3.c).

The evaluation that is currently slow at display time would instead be done upfront, when making the change and when rebasing it. That means that when rebasing or publishing with huge remaining changes - or when publishing into a workspace - the expensive findClosestNode queries still run during that publishing time span and don't shorten the button press itself. Only gathering the information when booting the Neos UI becomes faster.
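A rough sketch of the idea follows; the hook interface, its method signature and the table layout only approximate the CR core API and are assumptions, not verbatim core code:

```php
// Hypothetical sketch: change tracking as a catch-up hook. Because the hook
// runs while the graph is already caught up for the current event, the
// closest document can be resolved and persisted at the time of the change -
// counting (2.) and building the publish set (3., especially 3.c) then become
// cheap lookups instead of per-change graph walks.
final class ChangeTrackingCatchUpHook // would implement the core's CatchUpHookInterface
{
    public function __construct(
        // assumed to be injectable here - exactly what plain projections cannot do today
        private readonly ContentGraphInterface $contentGraph,
        private readonly Connection $dbal,
    ) {
    }

    public function onAfterEvent(EventInterface $event, EventEnvelope $eventEnvelope): void
    {
        if (!$event instanceof EmbedsNodeAggregateId) {
            return;
        }
        // deriving the change's dimension space point from the event is elided
        // here; a dimensionless point stands in for the sketch
        $dimensionSpacePoint = DimensionSpacePoint::createWithoutDimensions();
        $subgraph = $this->contentGraph->getSubgraph(
            $dimensionSpacePoint,
            VisibilityConstraints::withoutRestrictions()
        );
        $closestDocument = $subgraph->findClosestNode(
            $event->getNodeAggregateId(),
            FindClosestNodeFilter::create('Neos.Neos:Document')
        );
        // persist the pre-evaluated hierarchy next to the usual change flags,
        // so counting (2.) and scoping (3.) become simple lookups later
        $this->dbal->insert('neos_change', [
            'nodeAggregateId' => $event->getNodeAggregateId()->value,
            'documentNodeAggregateId' => $closestDocument?->aggregateId->value,
            // ... created/changed/moved/deleted flags as in the current table
        ]);
    }
}
```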

alternative solutions

keeping it a projection

It would be interesting to have the ability to write dependent projections that can access the content graph. That would allow accessing the hierarchy while being a little bit simpler to replay and reset. Though this is currently not possible.

Also, tracking the whole hierarchy ourselves is not a good idea, as this leaves us with a huge implementation. In contrast to the uri path projection, which also tracks hierarchy, we would need to track every workspace and also every node, whether changed or not. This is an immense overhead and neither desirable nor maintainable.

mhsdesign commented Feb 5, 2025

Another alternative solution we thought about briefly was to move complexity to the core:

One of the core problems is the hard-to-use API of PublishIndividualNodesFromWorkspace, which requires a list of node ids to select. The intention of publishing a document is partially lost, and the command just requires the node ids of the changes that are actually to be published. This is especially complex for (3.c), as descendants are deleted and unknown to the user at this point. A simpler API could look like the following. To fulfil (3.a) and keep it Neos-agnostic, we need to avoid recursive publishing of nested documents. This can be expressed in the following ways:

```php
$cr->handle(PublishNodesFromWorkspace::create(
    workspace: 'user-ws',
    startingNodeAggregateId: 'document-node-id',
    nodeTypeScope: 'Neos.Neos:Document' // stops when encountered as child
));
```

or

```php
$cr->handle(PublishNodesFromWorkspace::create(
    workspace: 'user-ws',
    selection: [
        NodeToPublish::create('document-node-id'),
        NodeToPublishRecursively::create('document-main-content-collection-node-id'),
        // ... resolve all other children that are not of type "Neos.Neos:Document" from outside.
    ]
));
```

Both of the expressed ways would require the core to select the events to publish based on the required hierarchy. In the happy case, the information can be fetched from the graph by finding all descendant node aggregate ids and publishing the events that target those.
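In the happy case that could look roughly like this; a sketch where the event filtering and the descendant filter are illustrative:

```php
// Hypothetical sketch: resolve the publish scope from the graph, then keep
// only events that target nodes inside it. A real implementation would stop
// descending at child documents to fulfil (3.a).
$idsInScope = [$startingNodeAggregateId->value => true];
foreach ($subgraph->findDescendantNodes($startingNodeAggregateId, FindDescendantNodesFilter::create()) as $descendant) {
    $idsInScope[$descendant->aggregateId->value] = true;
}

$eventsToPublish = [];
foreach ($workspaceEvents as $event) {
    // EmbedsNodeAggregateId is assumed here as the marker for events that
    // reference a single node aggregate
    if ($event instanceof EmbedsNodeAggregateId
        && isset($idsInScope[$event->getNodeAggregateId()->value])
    ) {
        $eventsToPublish[] = $event;
    }
}
```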

For (3.b) and (3.c), the content graph currently does not provide information for deleted nodes, but we can work around this in the following ways:

soft removal of nodes in the graph projection

We should be able to internally keep removed nodes as soft removals in the graph directly, or track them in a separate table. That way a purely internal API findAllChangedDescendantNodeAggregateIds() could be implemented using the soft-deleted nodes, as in the sketch below.
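A sketch of what that internal API could look like on the database level; the table and column names are illustrative (not the actual schema) and assume the removed column variant discussed below:

```php
// Hypothetical sketch: find all descendant node aggregate ids - including
// soft-removed ones - via a recursive CTE over the hierarchy relation table.
// Soft-removed relations are deliberately NOT excluded, so the publish scope
// for (3.b) and (3.c) can still be resolved after a deletion.
private function findAllChangedDescendantNodeAggregateIds(
    string $contentStreamId,
    string $entryNodeAggregateId,
): array {
    return $this->dbal->fetchFirstColumn(
        <<<'SQL'
        WITH RECURSIVE descendants AS (
            SELECT childnodeaggregateid AS id
            FROM hierarchyrelation
            WHERE contentstreamid = :contentStreamId
              AND parentnodeaggregateid = :entryNodeAggregateId
            UNION ALL
            SELECT h.childnodeaggregateid
            FROM hierarchyrelation h
            JOIN descendants d ON h.parentnodeaggregateid = d.id
            WHERE h.contentstreamid = :contentStreamId
        )
        SELECT DISTINCT id FROM descendants
        SQL,
        ['contentStreamId' => $contentStreamId, 'entryNodeAggregateId' => $entryNodeAggregateId]
    );
}
```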

Christian and I made the following points when discussing the option of soft deletion of nodes:

  • soft deletions only happen in the workspace where the changes were made
  • in the root workspace all deletions are permanent (hard deletions)
  • when forking a workspace we don't want to clone the soft deletions (effectively dropping those)
  • when publishing deletions in a workspace, the soft deletions are moved down: we fork anew and apply the remaining deletion changes again as soft deletions
  • leftover node rows must eventually be cleaned up when the deletion is published to live. As the remove executed on live happens first, it does not clean up the node rows, because the workspace still holds a soft-removal reference; when the content stream is removed, we must ensure that cleaning up hierarchies with soft deletions also drops the node rows.
  • database implementation:
    • introduce a removed column in the hierarchy relations table (all queries need to exclude removed=1 -> could be done in addSubtreeTagConstraints, see the sketch after this list)
      • we might be really close here to reintroducing a removedContentShown flag like in 8.3 :O
    • a separate deleted_hierarchy_relations table could also work; it introduces more complexity on the write side but makes reading easier.
  • edge case of creating a node, deleting it, and recreating it
    • Without soft deletions, recreating a node using the same node aggregate identifier is possible. To prevent corruption with soft deletions, we would need constraint checks that prevent the recreation of such a node. But then again, the user cannot possibly be aware of that. So first of all the user would need an API to check for that, and second, the user would expect a way to restore the deletion (technically a discard of the node without a rebase). We considered automagically restoring that node and merging in the initial properties and child nodes of the to-be-created node, but that would let all hell break loose - though it could work to actually delete the node in that case and just override it.
  • the query of findAllChangedDescendantNodeAggregateIds will be super slow, like findDescendantNodes, so publishing the site will take a moment
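For the removed column variant, the read-side exclusion could piggyback on the existing subtree-tag constraint building, roughly like this (a sketch; the actual method in the graph adapter differs in detail):

```php
// Hypothetical sketch: hide soft-removed hierarchy relations in the same
// central place where subtree tag constraints are already appended to every
// public read query.
private function addSubtreeTagConstraints(QueryBuilder $queryBuilder, string $hierarchyRelationAlias = 'h'): void
{
    // existing behaviour: exclude relations carrying restricting subtree tags
    // ... (unchanged subtree tag SQL) ...

    // new: exclude soft-removed relations from all public reads; internal
    // APIs like findAllChangedDescendantNodeAggregateIds() skip this constraint
    $queryBuilder->andWhere($hierarchyRelationAlias . '.removed = 0');
}
```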

gather nodes to publish during simulation

-> poc implementation: #5461

The cruelest case for reliable publishing is (3.c). To gather events that target nodes which are eventually deleted, we have to ask the graph while it still holds this information.

Before the node is removed, we have that information. So when handling the remove command, we could decide to gather information about all its children that are about to be removed (slow because of the CTE), or even visit all previously made events and check if they are in the scope of the deletion. This information could then be attached to NodeAggregateWasRemoved.

But instead of introducing the complexity there and bloating up the events, I had the insight that we can do the same during the simulation in CommandSimulator::run - before we handle a remove node - and leverage the in-memory transaction graph that is being built up:

```php
$commandSimulator->run(
    static function ($handle, $contentGraph) use ($rebaseableCommands, $startingNodeAggregateId) {
        $remainingCommands = [];
        foreach ($rebaseableCommands as $rebaseableCommand) {
            // is this command in the scope of the publish? The in-memory graph still
            // knows the hierarchy at this point, even for nodes that a later command
            // will remove (3.c). To optimise: extract into PublishingScopeCriteria and
            // cache `findClosestNode` if no moves happened in the meantime.
            if ($contentGraph->findClosestNode($rebaseableCommand->getAffectedNodeAggregateId(), FindClosestNodeFilter::create('Neos.Neos:Document'))?->aggregateId->equals($startingNodeAggregateId)) {
                $handle($rebaseableCommand);
                continue;
            }
            $remainingCommands[] = $rebaseableCommand;
        }
        // commands outside the publish scope are replayed afterwards and remain unpublished
        foreach ($remainingCommands as $remainingCommand) {
            $handle($remainingCommand);
        }
    }
);
```

Each $rebaseableCommand is checked for whether it is in the scope of the publish before it is run. That allows us, during publish or discard, to keep removals in scope together with changes previously made below them. Later we could extend the mechanism to also drop certain events if they would have been overruled, but that is just an optimisation (see the additional optimisations above).
