SAP Test Automation: What to Test Matters More Than How — 4 Criteria
You've adopted the tool. Your team can build test scenarios, and execution is fast. Yet before the next scheduled upgrade, whoever sits down to test is still stuck on the same question:
"So… what do I actually test?"
Thousands of transactions, hundreds of condition combinations, changes piling up every release. The automation tool doesn't answer this. It solved how to test — and left what to test completely untouched.
This post is about the what.
'How' and 'What' Are Different Problems
Almost every conversation about SAP test automation has pointed in one direction: how to run faster, how to survive UI changes, how to read results accurately. All of it is about execution — the how.
Tools handle this well. Build a scenario once, and they run it faster and more consistently than any person.
But the faster the tool gets, the clearer one thing becomes: you can test the wrong things quickly and accurately.
Pick the wrong targets and flawless execution still leaves the real failure points unverified. The real difference is decided before execution even starts — at the step where you choose what to verify out of countless candidates. That's the gap tools don't fill.
Why 'What' Is Hard: Three Common Traps
Choosing what to test isn't hard for lack of skill. It's hard because there's no basis for choosing. Without one, teams usually fall into one of three traps.
Test everything. The safest-looking option. But the window between upgrade and go-live is short: a few days, perhaps one to two weeks. Put everything in scope and you won't finish in time, so you end up cutting midway. A plan to test everything becomes a plan that finishes nothing.
Cut by gut. "Just cover the essentials" is realistic — but without a basis for what counts as essential, you lean on a senior person's instinct. When they're out or they leave, the basis leaves with them.
Run what you ran last time. The most common choice. But this release's changes weren't in last release's regression pack. Repeat the familiar and the newly risky areas slip through unverified.
All three share one root cause: no method for deciding what to test. What's needed isn't a faster tool — it's a way to narrow the candidates.
4 Criteria for Deciding What to Test
There are four criteria for narrowing an overwhelming whole down to what matters. One note first: each criterion only works if you actually have the input it needs. Know the criterion but lack the input, and you're back to guessing. So below, each criterion comes with what it requires.
Criterion 1. What Changed This Time
Start with "what does this change actually touch in our system?" SAP release notes run hundreds of pages, semi-annual releases bring thousands of changes, and only some affect you. Checking all of it by hand isn't realistic.
SAP knows this. For SAP S/4HANA Cloud Public Edition (GROW with SAP), there's a tool called RASD (Release Assessment and Scope Dependency). Ahead of an upgrade, it compares the upcoming release against your actual system usage and gives you a personalized list of deleted, deprecated, new, and changed objects — apps, APIs, CDS views, scope items. It shows which custom CDS views and apps are hit by deletions and deprecations (where-used), suggests where to focus testing based on degree and impact of change, and integrates with SAP Cloud ALM so you can turn findings into upgrade tasks.
Here's the key distinction: RASD tells you what changed and what to look at — it doesn't test those items for you. It hands you the input for Criterion 1: the list of what this change touches in your system. Turning that list into actual verification scenarios is still on you. And outside Public Edition, you'll need another way to get the same change-to-usage mapping.
In short, this criterion needs a mapping of what the release changed against where it lands in your system. For turning release changes into test scope, see SAP S/4HANA Upgrade Testing Strategy.
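The mapping this criterion asks for can be sketched as a simple intersection. The example below is a minimal illustration, not RASD's export format or API: the `Change` record, the object names, and the severity order are all hypothetical, and it assumes you already hold a list of release changes and a set of objects your system actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    object_id: str  # hypothetical identifier for an app, API, or CDS view
    kind: str       # "deleted", "deprecated", "changed", or "new"


# Assumed ordering: changes most likely to break something come first.
SEVERITY = {"deleted": 0, "deprecated": 1, "changed": 2, "new": 3}


def changes_that_touch_us(changes, used_objects):
    """Keep only changes to objects we actually use, most severe first."""
    hits = [c for c in changes if c.object_id in used_objects]
    return sorted(hits, key=lambda c: (SEVERITY[c.kind], c.object_id))


# Illustrative data only; none of these names come from a real release.
release_changes = [
    Change("APP_MANAGE_SALES_ORDERS", "changed"),
    Change("CDS_OLD_BILLING_VIEW", "deleted"),
    Change("APP_NEVER_USED", "deprecated"),
]
used = {"APP_MANAGE_SALES_ORDERS", "CDS_OLD_BILLING_VIEW"}

scope = changes_that_touch_us(release_changes, used)
for change in scope:
    print(change.kind, change.object_id)
```

The deprecated app drops out because nothing in the system uses it, which is the point of the criterion: the release changed it, but it does not land anywhere in your environment.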
Criterion 2. What Stops the Business If It Breaks
Not every process carries the same weight. Order-to-cash (O2C) and month-end close halt revenue and closing the moment they break; a report run once a quarter can wait a few days. The same defect lands very differently depending on where it hits.
So rank by "what happens if this stops?": how many downstream steps stall, how many users are affected, and whether revenue, closing, or audit are directly on the line. The tighter the timeline, the more it pays to fill the test scope from the top of that ranking down.
The starting point is knowing your core business flows. Standard processes like O2C, procure-to-pay (P2P), and record-to-report (R2R) usually form that backbone, and having those flows and their business priority mapped in advance makes the "what first" decision far clearer.
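One way to make the "what happens if this stops?" ranking explicit is a sort key built from the three questions above. This is a rough sketch under stated assumptions: the `Process` fields, the priority order (revenue, closing, or audit exposure first, then downstream steps, then users), and every number in the sample data are illustrative, not measured values.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    downstream_steps: int           # steps that stall if this process breaks
    users_affected: int
    hits_revenue_closing_or_audit: bool


def impact_key(p):
    # Assumed priority: business-critical exposure first,
    # then blast radius downstream, then number of users.
    return (p.hits_revenue_closing_or_audit, p.downstream_steps, p.users_affected)


def rank_by_impact(processes):
    """Return processes ordered from highest to lowest business impact."""
    return sorted(processes, key=impact_key, reverse=True)


# Illustrative figures only.
processes = [
    Process("Quarterly report", 0, 5, False),
    Process("Month-end close", 3, 25, True),
    Process("O2C", 4, 120, True),
]

ranked = rank_by_impact(processes)
for p in ranked:
    print(p.name)
```

With a fixed test window, you would fill scope from the top of `ranked` and stop when time runs out, so whatever gets cut is the lowest-impact work.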
Criterion 3. Where Data Hands Off
Many real failures happen not on individual screens but where one module hands data to the next. The order is created fine in VA01, then breaks somewhere along delivery → goods issue → billing → accounting. Each transaction passes on its own; the end-to-end flow stalls.
So choose by flow, not by screen. Map where data hands off to the next module and the points worth verifying appear. A screen-by-screen checklist never surfaces those handoff points at all.
What you need here isn't a screen list but a process flow map — the steps showing which module passes data to which. With standard E2E flows already laid out, you adapt the map to your environment instead of drawing it from scratch. See Scenario-Based E2E Test Design.
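Given a flow map, the handoff points fall out mechanically: they are the adjacent steps whose owning area differs. The sketch below assumes a deliberately simplified order-to-cash flow; the step names follow the example above, but the area labels ("Sales", "Logistics", "Finance") are assumptions for illustration, and your real module boundaries may differ.

```python
# Flow as ordered (step, owning area) pairs. Area assignments are simplified.
flow = [
    ("Create order", "Sales"),
    ("Delivery", "Logistics"),
    ("Goods issue", "Logistics"),
    ("Billing", "Sales"),
    ("Accounting", "Finance"),
]


def handoff_points(flow):
    """Return adjacent step pairs where data crosses from one area to another."""
    return [
        (from_step, from_area, to_step, to_area)
        for (from_step, from_area), (to_step, to_area) in zip(flow, flow[1:])
        if from_area != to_area
    ]


handoffs = handoff_points(flow)
for from_step, from_area, to_step, to_area in handoffs:
    print(f"{from_step} ({from_area}) -> {to_step} ({to_area})")
```

Delivery to goods issue stays inside one area, so it is not flagged; the three crossings that remain are the candidates for end-to-end verification that a screen-by-screen checklist would not list.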
Criterion 4. Exceptions That Only Exist in Production
Clean sample data only exercises the happy path. Real failures hit the combined-condition exceptions: overseas sales with special discounts, multi-currency payments, tax-exempt transactions. A broken standard order is caught fast; a defect that fires only on a specific condition combination spreads quietly in production and erupts all at once at month-end close.
Looking at which exceptions actually occur in production — and how often — surfaces the "cases you can't afford to miss." The problem: sample data can't produce that list, because which combinations really happen lives only in production data. So this criterion depends on whether you can see real operational data. For using production data in verification, see SAP Data Migration Testing Strategy.
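If you can see production transactions, listing the exceptions and their frequency is essentially a counting exercise. The following is a minimal sketch assuming each transaction has already been reduced to a few condition fields; the field names, values, and counts are hypothetical and stand in for whatever your extracted data actually contains.

```python
from collections import Counter

FIELDS = ("region", "discount", "currency", "tax")
HAPPY_PATH = ("domestic", "none", "single", "standard")


def row(region, discount, currency, tax):
    """Build one hypothetical transaction record."""
    return {"region": region, "discount": discount, "currency": currency, "tax": tax}


def exception_combinations(rows, fields, happy_path):
    """Count condition combinations, excluding the happy path, most frequent first."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    return [(combo, n) for combo, n in counts.most_common() if combo != happy_path]


# Illustrative data: three standard orders and two kinds of exception.
transactions = (
    [row(*HAPPY_PATH)] * 3
    + [row("overseas", "special", "multi", "standard")] * 2
    + [row("domestic", "none", "single", "exempt")]
)

exceptions = exception_combinations(transactions, FIELDS, HAPPY_PATH)
for combo, n in exceptions:
    print(n, combo)
```

Run against sample data, the same function would return an empty or near-empty list, which is why this criterion depends on access to real operational data.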
The Inputs Behind the Judgment
Each criterion needs its own input — and those inputs come from different places.
Criterion 1's input — the list of what a change touches in your system — is increasingly covered by tools like SAP's RASD. The other three are the problem. A standard-process backbone, a map of where modules hand off data, a list of production-only exceptions — release notes and impact-analysis tools don't give you these.
PerfecTwin's role here isn't to decide what to test for you — it's to put those inputs in your hands. Extracting real transaction data from the production DB produces Criterion 4's list of production-only exceptions; SAP standard process templates lay out the business backbone and E2E flows that Criteria 2 and 3 start from. With RASD covering Criterion 1 and PerfecTwin supplying the rest, all four criteria run on data instead of gut. The judgment is still yours — it just stands on firmer ground.
Where AI Fits In
Deciding what to test is exactly the kind of judgment the industry is now turning to. Across the SAP ecosystem, the move toward analyzing data to support human decisions is spreading fast, and testing is no exception. Analyzing test data and SAP master data to suggest "start with this combination of scenarios this time" is a natural next step — helping people apply the four criteria above faster and more accurately.
To be clear: AI doesn't decide the 'what' for you; it assists the judgment. Responsibility for what to verify still sits with the people who know the business. What's shifting is that data is making the starting point of that judgment far clearer.
In the End, the Difference Is in the 'What'
Tools solved the how. Scenario creation, fast execution, reliable repetition — that's the tool's job now.
What's left is the what: choosing what to verify out of an overwhelming whole — what changed, what can't stop, where data hands off, what only exists in production. A team that weighs these four on data catches the defects before go-live; a team that cuts scope by gut meets the same defects after go-live, as production incidents. That's why the same upgrade ends differently for two teams.
If you've adopted a test automation tool and still feel stuck, it's time to change the question — from "how do we run it?" to "what should we run?"
Want to see how PerfecTwin supports this judgment?
→ Request a demo