Retrigger operation and data index feedback #1857

@fjtirado

fjtirado commented Mar 4, 2025

Kogito currently supports the retrigger operation.
This operation allows a user to resume execution of a process instance that failed at a particular node (either after updating the model or after fixing the external service that failed).

On the other hand, Data Index stores a list of executed node instances associated with a particular process instance. There is also an error property associated with the process instance, which contains the error message but not a pointer to the node instance that failed.

When the node is retriggered, a new list of nodes, produced by dispatching the retrigger request, is added to the existing list.

This is a problem for users who want to know which nodes are part of the retrigger execution. To cope with it, two alternatives come to mind:

  1. Add a new resume property to the Data Index process instance object. It will contain a list of items (a workflow instance can be resumed many times), each item containing the id of the node instance that was retriggered and the date of the retrigger attempt.
  2. Add a new retriggered flag to the node instance that was retriggered.
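
The two alternatives above can be compared with a minimal sketch. Everything here is hypothetical: the class and field names (`ResumeEntry`, `resume`, `retriggered`, `enter`) are illustrative and not the actual Data Index schema. The sketch also assumes that each node instance records when it was entered, and, for the second alternative, that the flag is set on the node instance created by the retrigger.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class NodeInstance:
    id: str
    name: str
    enter: datetime            # assumed: time the node instance was entered
    retriggered: bool = False  # alternative 2: flag on the node instance


@dataclass
class ResumeEntry:
    node_instance_id: str  # node instance that was retriggered
    date: datetime         # date of the retrigger attempt


@dataclass
class ProcessInstance:
    id: str
    nodes: List[NodeInstance] = field(default_factory=list)
    resume: List[ResumeEntry] = field(default_factory=list)  # alternative 1


def nodes_of_last_retrigger_by_resume(pi: ProcessInstance) -> List[NodeInstance]:
    """Alternative 1: nodes entered at or after the latest retrigger date."""
    if not pi.resume:
        return []
    last = max(pi.resume, key=lambda r: r.date)
    return [n for n in pi.nodes if n.enter >= last.date]


def nodes_of_last_retrigger_by_flag(pi: ProcessInstance) -> List[NodeInstance]:
    """Alternative 2: nodes from the latest flagged node instance onwards."""
    ordered = sorted(pi.nodes, key=lambda n: n.enter)
    flagged = [i for i, n in enumerate(ordered) if n.retriggered]
    return ordered[flagged[-1]:] if flagged else []
```

Under these assumptions both alternatives answer the same question. The resume list additionally keeps the history of retrigger attempts on the process instance itself, while the flag requires scanning the node instance list.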

Related to that, Data Index is fed by events published by the runtimes. Runtimes only publish the events that feed Data Index when a unit of work is completed. A unit of work is completed when the process instance state is completed, error or waiting (if the workflow is waiting for an event).
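
The publishing rule described above can be illustrated with a small sketch. It is not the Kogito implementation: `UnitOfWork`, `State` and the event dictionaries are hypothetical names used only to show events being buffered until the process instance reaches a completed, error or waiting state.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    ACTIVE = "active"
    WAITING = "waiting"
    COMPLETED = "completed"
    ERROR = "error"


# States that complete a unit of work, per the description above.
CLOSING_STATES = {State.COMPLETED, State.ERROR, State.WAITING}


class UnitOfWork:
    """Buffers events and publishes them only when the unit of work completes."""

    def __init__(self, publish: Callable[[List[dict]], None]):
        self._publish = publish
        self._pending: List[dict] = []

    def record(self, event: dict) -> None:
        # Events are only buffered here; nothing reaches Data Index yet.
        self._pending.append(event)

    def on_state(self, state: State) -> None:
        self.record({"type": "state", "state": state.value})
        if state in CLOSING_STATES:
            # Hand over the whole batch and start a fresh buffer.
            batch, self._pending = self._pending, []
            self._publish(batch)
```

With this model, an observer of the published batches never sees the intermediate active state on its own. It only sees it as part of the batch that ends in completed, error or waiting.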

For workflows that basically invoke synchronous operations, this means that once the retrigger is executed, the state that the user will see in Data Index will be either completed or error. If the retrigger was executed when the process instance was in failed state (the most typical situation) and the retrigger failed again (which can happen), the user won't see any change in the process state (they will see more node instances added to the process instance, though). Therefore, it would be convenient to add an error message to the node instance that failed. This way, a user can query for node instances whose error message is not null to find out how many times the process instance has failed.
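
As a rough illustration of that query, assuming a hypothetical `error_message` field on the node instance (the real field name and the Data Index query syntax may differ), the failure count could be derived like this:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInstance:
    id: str
    name: str
    error_message: Optional[str] = None  # proposed field, hypothetical name


def failed_nodes(nodes: List[NodeInstance]) -> List[NodeInstance]:
    """Node instances whose error message is not null."""
    return [n for n in nodes if n.error_message is not None]


# Example: the same node failed on the first run and again on the retrigger.
history = [
    NodeInstance("n1", "start"),
    NodeInstance("n2", "callService", error_message="503 Service Unavailable"),
    NodeInstance("n3", "callService", error_message="503 Service Unavailable"),
]
failure_count = len(failed_nodes(history))
```

Here `failure_count` is the number of times the process instance has failed, even though the process instance state stayed in error throughout.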

Also, it is probably sound (although I am not sure it is really required) to send a process instance state event before dispatching the retrigger operation, to set the status to active in Data Index. This might be useful when the workflow contains long operations and the event with the completed/error/waiting state resulting from the retrigger operation takes a while to be published.
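
A sketch of that ordering, with hypothetical names (`retrigger`, `run_from_failed_node`, `publish`) that do not correspond to any existing Kogito API:

```python
from typing import Callable


def retrigger(run_from_failed_node: Callable[[], str],
              publish: Callable[[dict], None]) -> str:
    # Hypothetical: tell Data Index right away that the instance is running again.
    publish({"type": "state", "state": "active"})
    # The retrigger itself may take a long time if the workflow has slow operations.
    final_state = run_from_failed_node()  # "completed", "error" or "waiting"
    # The usual event, published once the unit of work is completed.
    publish({"type": "state", "state": final_state})
    return final_state
```

Between the two `publish` calls, a consumer querying Data Index would see the instance as active rather than still in its previous error state.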

@ydayagi

ydayagi commented Mar 5, 2025

RHDH orchestrator is a consumer of SonataFlow operator over OCP. It executes workflows and queries ProcessInstances via Data Index. It also requires the retrigger feature. The ProcessInstance fails and is in 'ERROR' state. Orchestrator calls retrigger to have another execution of the instance starting from the point of failure. The Orchestrator's requirements/needs from this feature (retrigger) are:

  1. Query the ProcessInstance and know it is currently executing due to a call to retrigger.
  2. Query the ProcessInstance and know the reason for the last failed execution.
  3. Query the ProcessInstance and know which nodes are part of the last execution, either ongoing or completed.

@fjtirado
Author

fjtirado commented Mar 5, 2025

@ydayagi Regarding 1), the most usual scenario, as I tried to explain, is that when Data Index is updated the retriggered workflow will be in either completed or error state. So, after this issue is merged, to find out whether there has been a retrigger attempt you will have to check the node instance list, as for 2) and 3).

@ydayagi

ydayagi commented Mar 5, 2025

> @ydayagi Regarding 1), the most usual scenario, as I tried to explain, is that when Data Index is updated the retriggered workflow will be in either completed or error state. So, after this issue is merged, to find out whether there has been a retrigger attempt you will have to check the node instance list, as for 2) and 3).

that is not good

fjtirado added a commit to fjtirado/drools that referenced this issue Mar 5, 2025
To ProcessNodeTriggeredEvent interface
fjtirado added a commit to fjtirado/kogito-runtimes that referenced this issue Mar 5, 2025
Adds triggerCount to ProcessInstanceNode events
fjtirado added a commit to fjtirado/kogito-runtimes that referenced this issue Mar 6, 2025
Adds triggerCount to ProcessInstanceNode events
fjtirado added a commit to fjtirado/kogito-apps that referenced this issue Mar 6, 2025
fjtirado added a commit to fjtirado/kogito-runtimes that referenced this issue Mar 7, 2025
Adds triggerCount to ProcessInstanceNode events
fjtirado added a commit to fjtirado/kogito-apps that referenced this issue Mar 7, 2025