The go-live test is a set of carefully composed steps that can be used to check whether the system is ready for use. Each step briefly explains the potential problem being targeted, along with how to check it (Measurements), the tools to use (Tools) and the potential impact if something is wrong (Impact). A similar approach to resolving workflow problems is described in Resolving workflow problems - 10 steps and Resolving workflow problems - 6 more steps.
The workflow system is set up as a very rigid solution, which takes care of the problems that could occur. The one exception to this rule is workflows for which a dump is produced. Where a method (read: ABAP coding) dumps, the workflow log of the respective flow "hangs". Therefore the first thing to do in a health check is: check for dumps from user WF-BATCH.
Note: Workflow-related dumps could also be thrown by users other than WF-BATCH. In SRM these dumps do not have to be visible to the end user...
- Measurements: Dumps should not be found
- Tools: ST22 Dump analysis
- Impact: Instances of workflows or single step tasks are stopped, even though the workflow logs report e.g. "In process"... forever!
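As a rough illustration of the measurement above, the sketch below filters a dump list for workflow-related entries. It assumes the ST22 list has been copied by hand into plain records; the field names (`user`, `program`), the program names and the sample rows are hypothetical and do not represent an SAP export format.

```python
from collections import Counter

# Hypothetical records, e.g. typed over from the ST22 list.
# Field names and values are assumptions made for this sketch.
SAMPLE_DUMPS = [
    {"user": "WF-BATCH", "program": "ZWF_APPROVE"},
    {"user": "JSMITH", "program": "ZREPORT_01"},
    {"user": "WF-BATCH", "program": "ZWF_APPROVE"},
]


def workflow_dumps(dumps, workflow_users=("WF-BATCH",)):
    """Return the dumps raised by users that run workflow steps."""
    return [d for d in dumps if d["user"] in workflow_users]


def dumps_per_program(dumps):
    """Count dumps per program, most frequent first."""
    return Counter(d["program"] for d in dumps).most_common()


if __name__ == "__main__":
    found = workflow_dumps(SAMPLE_DUMPS)
    print(len(found), "workflow dump(s)")
    print(dumps_per_program(found))
```

Because workflow-related dumps can also come from users other than WF-BATCH, additional user IDs can be passed in through `workflow_users`.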
Workflow operates via the RFC queue, which needs to be up and running. There are many possible problems with this queue which can easily be checked. Two problems I've come across so far:
The password of user WF-BATCH is no longer valid: on some systems this password can be reset. The RFC queue behind transaction SM58 uses an RFC user, which is normally user WF-BATCH. Resetting the password of WF-BATCH also means this password needs to be reset in the RFC settings (which can be done via transaction SWU3, see test 5).
Workflows that threw an error message: If an error message (statement MESSAGE E123(V0)) is thrown in a report, it will be stopped. If the report ran in the background,
Make sure that workflows don't go to everyone because there is no workflow administrator. When errors occur in workflow, the workflow administrator is addressed. If no such administrator is set up, the system will address all users.
- Measurements: A workflow administrator must be supplied, either as a user ID (type US) or as one of the other types (preferred), against which at least one user ID should be available
- Tools: Menu: Tools - Business Workflow - Development - Administration - Basic settings - Workflow runtime - Maintain System Administrator for Runtime System (transaction SWDC_RUNTIME)
- Impact: Irrelevant work items could be placed in the inbox of users that should not process them.
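The two conditions in the measurement above (an administrator is maintained, and it resolves to at least one user ID) can be sketched as a small check. This is only an illustration: `resolve_users` is a hypothetical stand-in for whatever lookup you use to find the users behind an organizational object, and the object IDs in the sample data are made up.

```python
def check_administrator(admin_type, admin_id, resolve_users):
    """Return a list of findings; an empty list means the check passed.

    resolve_users is a caller-supplied function mapping (type, id) to
    the user IDs behind it. It is an assumption of this sketch.
    """
    if not admin_type or not admin_id:
        return ["no administrator maintained: errors may go to all users"]
    users = resolve_users(admin_type, admin_id)
    if not users:
        return [f"{admin_type} {admin_id} resolves to no user ID"]
    return []


# Hypothetical assignments: a user (US) and an empty position (S).
SAMPLE_ORG = {
    ("US", "WFADMIN"): ["WFADMIN"],
    ("S", "50000123"): [],
}


def sample_resolver(admin_type, admin_id):
    """Look up the sample data; unknown objects resolve to nobody."""
    return SAMPLE_ORG.get((admin_type, admin_id), [])


if __name__ == "__main__":
    print(check_administrator("US", "WFADMIN", sample_resolver))
    print(check_administrator("S", "50000123", sample_resolver))
    print(check_administrator("", "", sample_resolver))
```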
Where IDocs are processed, R/3 allows workflow to be used to notify agents of faulty IDocs. For this setup an IDoc administrator is required, to avoid work items being sent to all users (e.g. when a partner profile is not known).
- Measurements: Make sure an IDoc administrator is set up (much like the workflow administrator)
- Tools: WE46 (menu: Tools - Business communication - IDoc Basis - Administration - IDoc administration)
- Impact: Irrelevant work items could be placed in the inbox of users that should not process them.
Practically everything that can be found in this overview is also covered by the automatic customizing settings: a transaction that tells you what needs to be done and which batch jobs should be up and running. This is SAP's go-live checklist, and it is also linked to the actual transactions you need to execute when steps are missing or incomplete.
- Measurements: Make sure you don't find any unexplained red ticks!
- Tools: SWU3 Automatic workflow customizing
- Impact: Anything from missing flows, through workflows not responding, to a halted system. If the system was not prepared to run workflow at all, this is the easiest way to find out.
The event log is a fabulous tool to find out what events were triggered and whether the triggering actually started a flow (or single step task). However, it also consumes system resources, which can slow down the system considerably, especially when the event log is active without restrictions (transaction SWELS).
- Measurements: The event log should be switched OFF on a production system (check SWELS). When a lot of workflow is triggered while the event log is switched on, the WF-BATCH user will update table SWFREVTLOG in a DIA (dialog) process. So a clear symptom of this problem is that the process overview SM50 shows a lot of WF-BATCH-occupied DIA processes...
- Tools: SWEL Event log; SWELS Event log settings
- Impact: System performance (it could well grind to a halt!). Interruption of updates.
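The symptom described above (many DIA processes occupied by WF-BATCH) can be quantified from a snapshot of the process overview. The sketch below assumes the SM50 list has been copied into plain records; the field names and the 50% alert threshold are assumptions of this example, not SAP recommendations.

```python
# Hypothetical snapshot of the SM50 process overview.
SAMPLE_PROCESSES = [
    {"no": 0, "type": "DIA", "user": "WF-BATCH"},
    {"no": 1, "type": "DIA", "user": "WF-BATCH"},
    {"no": 2, "type": "DIA", "user": "JSMITH"},
    {"no": 3, "type": "DIA", "user": "WF-BATCH"},
    {"no": 4, "type": "BGD", "user": "BATCHUSER"},
]


def wf_batch_dialog_share(processes, user="WF-BATCH"):
    """Return the fraction of dialog processes occupied by the given user."""
    dialog = [p for p in processes if p["type"] == "DIA"]
    if not dialog:
        return 0.0
    occupied = sum(1 for p in dialog if p["user"] == user)
    return occupied / len(dialog)


if __name__ == "__main__":
    share = wf_batch_dialog_share(SAMPLE_PROCESSES)
    # The 0.5 threshold is an arbitrary value chosen for illustration.
    if share > 0.5:
        print(f"{share:.0%} of DIA processes held by WF-BATCH: check SWELS")
```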
You have set up a range of workflows which you expect to be triggered. These workflows should have triggering events which are active or "which have an active linkage".
- Measurements: An overview of active linkages should reveal which events will actually trigger workflows
- Tools: SWETYPV Overview of event type linkages; PFTC Workflow template - tab: Triggering events
- Impact: Missing workflow activity. Workflows that you expect to be triggered may not be triggered at all
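One way to make this comparison repeatable is to write down the linkages you expect and diff them against what the system shows. The sketch below assumes the SWETYPV overview has been copied into plain records; the field names and the workflow IDs (`WS99900001`, `WS99900002`) are hypothetical.

```python
# Linkages the project expects to be active: (object type, event, receiver).
EXPECTED = [
    ("BUS2012", "CREATED", "WS99900001"),
    ("BUS2012", "CHANGED", "WS99900002"),
]

# Hypothetical rows copied from the linkage overview.
SAMPLE_LINKAGES = [
    {"object_type": "BUS2012", "event": "CREATED",
     "receiver": "WS99900001", "active": True},
    {"object_type": "BUS2012", "event": "CHANGED",
     "receiver": "WS99900002", "active": False},
]


def missing_linkages(expected, linkages):
    """Return expected linkages that are absent or not active."""
    active = {
        (row["object_type"], row["event"], row["receiver"])
        for row in linkages
        if row["active"]
    }
    return sorted(set(expected) - active)


if __name__ == "__main__":
    for object_type, event, receiver in missing_linkages(EXPECTED, SAMPLE_LINKAGES):
        print(f"{receiver} will not start on {object_type}.{event}")
```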
In your system setup you're likely to have used start conditions for your workflows. Especially where standard SAP workflows are implemented, start conditions are the way to control whether a workflow should really be started. Start conditions have to be put on a transport separately, which could easily have been forgotten...
- Measurements: Check the availability of the start conditions you expect to be active
- Tools: SWB_COND Start conditions; SWB_PROCUREMENT Start conditions
- Impact: Too much workflow activity. Workflows that you expect NOT to be triggered may be triggered anyway
Workflows and single step tasks get triggered via the triggering event (or several triggering events), which should be active. This test will make sure that all relevant tasks and flows do get started.
- Measurements: Proof of triggered flows (the results); visual check for inactive triggering events
- Tools: SWI1 Selection report for work items; SWI2_FREQ Work Items Per Task
- Impact: Missing workflow activity. Some workflows or single step tasks will not be triggered. For the given workflow or single step task NO occurrences will appear (ergo: all or nothing).
This test looks for events that are thrown without triggering any workflow or single step task, either because the relevant event has been set up as a triggering event but is inactive (see Test "Triggering events - positive") or because the event has not been set up as a triggering event (or terminating event) at all.
- Measurements: The event log should not list any triggered events that influence no flows or tasks
- Tools: SWEL The event log (will reveal which events are not "listened to")
- Impact: Performance (little impact). Where problems are looked at in the future, unused events can cause very time-consuming confusion.
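To summarize which events nobody listens to, the event log entries can be grouped by object type and event. The sketch below assumes the SWEL list has been copied into plain records and that an entry without a receiver has an empty `receiver` field; both the layout and the sample rows are assumptions of this example.

```python
from collections import Counter

# Hypothetical event log rows; an empty receiver means nothing was started.
SAMPLE_LOG = [
    {"object_type": "BUS2012", "event": "CHANGED", "receiver": ""},
    {"object_type": "BUS2012", "event": "CREATED", "receiver": "WS99900001"},
    {"object_type": "BUS2012", "event": "CHANGED", "receiver": ""},
    {"object_type": "BUS2032", "event": "CREATED", "receiver": ""},
]


def unheard_events(log_rows):
    """Count events raised without a receiver, most frequent first."""
    counts = Counter(
        (row["object_type"], row["event"])
        for row in log_rows
        if not row.get("receiver")
    )
    return counts.most_common()


if __name__ == "__main__":
    for (object_type, event), count in unheard_events(SAMPLE_LOG):
        print(f"{object_type}.{event}: raised {count}x without a receiver")
```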
Work items (dialog work items, potential inbox entries for end users) that are created should have a valid list of "Possible agents". Possible agents represent the group of work item receivers that the system will choose from. The most commonly used setting is "General task", which makes all system users possible agents for the task.
- Measurements: All foreground tasks (dialog work items) should be set to "General task" (or have their possible agents arranged in another way)
- Tools: PFTC Display task (menu: Additional data - Agent assignment - Maintain - Button Attributes)
- Impact: Where work items are created without possible agents, the work item will show up in the Error diagnosis list. These errors are unavoidable, as a work item that has to choose agents out of an empty agents basket will always fail. Work items will go missing and the error list will build up.
Rule resolution determines who should get work items, acting like a fully automated postman.
- Measurements: Check whether the right users receive their work items. Agents that should not handle certain work items should not become agents of them (negative test).
- Tools: Workflow log - list agents; ZU21 Maintain rule resolution settings; SWI2_ADM1 Work items without agents
- Impact: Work items are not delivered to the right agents, work items go into error because no agents could be determined, or work items are addressed to all users on the system.
Where workflow is implemented, triggering events (and terminating events) control what starts and stops. In some cases the actual trigger is handled or processed immediately, in other cases the event queue is used. The event queue will queue events that are thrown and process them en masse at set times, reducing the overall system pressure.
- Measurements: Check if the queue is active and whether the related background job is up and running. Also check for "linkage with errors" events, which will need to be fixed. Is the number of events being queued the same on the development system and the production system?
- Tools: SWEQADM Event queue administration
- Impact: If the event queue is not switched on, the events that allow event queue usage will be triggered immediately, resulting in poor performance. If it is active but the batch job is not running (or not scheduled to run), the events that allow event queue usage will not be delivered (until the batch job is run). And linkages with errors will keep the respective events from being processed.
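The cases listed under Impact can be written out as a small decision function, which makes it easier to reason about what a given combination of settings will do. This is a sketch of the behaviour as described in this article, not of SAP's implementation; the parameter names and result labels are invented for the example.

```python
def delivery_outcome(linkage_uses_queue, queue_active, job_scheduled, linkage_in_error):
    """Describe what happens to an event for a given setup.

    Follows the cases described in the text; names are illustrative.
    """
    if linkage_in_error:
        # Linkages with errors keep their events from being processed.
        return "blocked"
    if not linkage_uses_queue or not queue_active:
        # Without an active queue the event is handled right away.
        return "immediate"
    if not job_scheduled:
        # Queued, but nothing picks it up until the batch job runs.
        return "waiting"
    return "queued"


if __name__ == "__main__":
    cases = [
        (True, False, False, False),
        (True, True, False, False),
        (True, True, True, False),
        (True, True, True, True),
    ]
    for case in cases:
        print(case, "->", delivery_outcome(*case))
```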
In the article Workflow in transport, common transport problems are outlined. For workflows that have a dent in their version management, a simple check can be done on the receiving system.
- Measurements: All workflows should have the latest available version as active version, and no Runtime version should be active. Runtime version effectively means there is a newer version available but it is not used
- Tools: SWDD Workflow builder
- Impact: Errors on specific workflows. Your workflow will trigger an older version or simply go into error (if the older version cannot be started, or if no older version is available). "Node 1 could not be found".
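If you note the active and latest version of each workflow while walking through the Workflow builder, the comparison itself is trivial to automate. The sketch below assumes such a hand-made list; the workflow IDs and version numbers are hypothetical.

```python
# Hypothetical notes taken per workflow on the receiving system.
SAMPLE_WORKFLOWS = [
    {"id": "WS99900001", "active_version": "0003", "latest_version": "0003"},
    {"id": "WS99900002", "active_version": "0001", "latest_version": "0002"},
]


def outdated_workflows(workflows):
    """Return IDs of workflows whose active version is not the latest."""
    return [
        wf["id"]
        for wf in workflows
        if wf["active_version"] != wf["latest_version"]
    ]


if __name__ == "__main__":
    for workflow_id in outdated_workflows(SAMPLE_WORKFLOWS):
        print(f"{workflow_id}: a newer version exists but is not active")
```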
One of the most commonly seen errors is authorization. When end users get a work item in their inbox for which they are not authorized, the system will put the work item in the ERROR status and show a message to the user. When other things go wrong in workflows at runtime, the respective work item will be set to ERROR as well. The diagnose errors report will list them, and the respective flows/tasks can be restarted from there as well.
- Measurements: No unexplained errors should be reported
- Tools: SWI2_DIAG Diagnosis of Workflows with Errors; SWPR Workflow restart after error; RSWWERRE Execute Work Item Error Monitoring (sets work items to status ERROR)
- Impact: Some workflow instances will not be (fully) processed
Many background tasks will be executed by workflow, and these tasks are executed by user WF-BATCH, which is visible in the process overview. Tasks are always executed on a Business Object like a Purchase Order, which implies that whatever the task is doing should be done for a single "item" only. As a general rule, a task should never take longer than, say, 10 seconds.
- Measurements: No background task run by user WF-BATCH should run longer than 10 seconds
- Tools: SM50 Process overview; SWI2_DURA Report: average processing time of tasks; Workflow logs (of the flows you expect to be slow)
- Impact: System performance. A badly built background task can effectively stop the system, because WF-BATCH tasks are run with high priority...!
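The 10-second rule of thumb can be applied to a list of measured run times per task. The sketch below assumes the durations have been collected by hand (for instance from the workflow logs) into a mapping of task ID to run times in seconds; the task IDs and numbers are made up.

```python
# Hypothetical run times in seconds, per background task.
SAMPLE_DURATIONS = {
    "TS99900001": [1.2, 3.4, 0.8],
    "TS99900002": [8.0, 42.5],
    "TS99900003": [10.0],
}


def slow_tasks(durations, limit_seconds=10.0):
    """Return (task, worst run time) pairs exceeding the limit, slowest first."""
    worst = {task: max(runs) for task, runs in durations.items() if runs}
    offenders = [(task, secs) for task, secs in worst.items() if secs > limit_seconds]
    return sorted(offenders, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for task, seconds in slow_tasks(SAMPLE_DURATIONS):
        print(f"{task}: slowest run took {seconds:.1f}s")
```

The check uses the worst run rather than the average, on the assumption that a single very slow run is already enough to block a process.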