The documentation you're currently reading is for version 3.2.0. Click here to view documentation for the latest stable version.
Pausing and Resuming Workflow Execution¶
An execution of a Mistral workflow can be paused by running
st2 execution pause <execution-id>.
An execution must be in a running state in order for pause to be successful. The execution will
initially go into a
pausing state, and will go into a
paused state when no more tasks are
in an active state such as
canceling. When a workflow execution
is paused, it can be resumed by running
st2 execution resume <execution-id>.
resume operation will cascade down to subworkflows, whether it’s another
workflow defined in a workbook or it’s another EWC action that is a Mistral workflow or Action
Chain. If the
pause operation is performed from a subworkflow or subchain, then the
will cascade up to the parent workflow or parent chain. However, if the
resume operation is
performed from a subworkflow or subchain, the
resume will not cascade up to the parent workflow
or parent chain. This allows users to resume and troubleshoot branches individually.
Canceling Workflow Execution¶
An execution of a Mistral workflow can be cancelled by running
st2 execution cancel <execution-id>. Workflow tasks that are still running will not be
canceled and will run to completion. No new tasks for the workflow will be scheduled.
Re-running Workflow Execution¶
An execution of a Mistral workflow can be re-run on error. The execution either can be re-run from
the beginning or from the task(s) that failed. The latter is useful for long running workflows with
temporary service or network outages. Re-running the workflow execution from the beginning is
exactly like re-running any EWC execution with the command
st2 execution re-run <execution-id>.
The re-run is a completely separate execution with a new execution ID in both EWC and Mistral.
Re-running the workflow from where it errored is slightly different. To retain context, the
original workflow execution is reused in Mistral but a new EWC execution will be created to stay
consistent in EWC. The re-run command has a new
--tasks option that takes a list of task
names to re-run.
For example, given a workflow that fails at task3 and task4 on separate parallel branches, the
st2 execution re-run <execution-id> --tasks task3 task4 will resume the Mistral
workflow execution and re-run both task3 and task4 using original inputs. Both the workflow and
task execution in Mistral have to be in an
errored state for re-run.
If using a Mistral workbook, tasks of subworkflows can also be re-run. For example, if the main
workflow has a task1 that calls subflow1, then to re-run subtask1 of subflow1, the syntax for the
st2 execution re-run command would be
st2 execution re-run <execution-id> --tasks task1.subtask1.
If the task to re-run is a “with-items” task, there is an option to re-run only failed iterations.
For example, task1 is a with-items task with 5 items. Let’s say 2 of the items failed. By
st2 execution re-run --tasks task1 task2 --no-reset task1 option, task1 will
only re-run the 2 items that failed. If the
--no-reset option is not provided, then all 5
items will be re-run.
Re-running workflow execution from the task(s) that failed is currently an experimental feature and subject to bug(s) and change(s). Please also note that re-running a subtask nested in another EWC action is not currently supported.
Task Timeout vs Action Timeout¶
Mistral supports a task
timeout: parameter. This sets the maximum amount of time Mistral will
wait before marking a task as failed. However, EWC actions implement their own timeouts.
The default value for each action timeout depends upon the action runner used. Typically this is
60s for SSH-based actions, and 600s for Python actions. The default can be changed on a per-action
basis, and can be over-ridden for each execution.
This can cause confusion when you need to extend the timeout for some tasks. Setting a longer Mistral timeout does not extend the underlying action timeout. For example, if you have a long-running command, this will not achieve the desired result:
version: '2.0' examples.task-timeout: type: direct input: - command tasks: task1: timeout: 120 action: core.local input: cmd: <% $.command %>
Instead, set the timeout on the underlying action. Note the indentation here:
version: '2.0' examples.action-timeout: type: direct input: - command tasks: task1: action: core.local input: cmd: <% $.command %> timeout: 120