What happens on your team when a check (what some call an
“automated test”) fails?
If you follow the common practice of automating manual test
cases, the authors of the automation are expected to follow up and
diagnose the problem. Usually the failure is in the automation itself; everybody
on the team knows that, so the follow-up doesn’t get much priority or respect.
Even when it does, resolving the failure is time-consuming and labor-intensive.
Alternatively, the author of the automation watches the
automation run to see whether something goes wrong. That shortens the
communication chain, but it is time-consuming, expensive, and doesn’t scale
at all.
Regression tests or checks that are effective at
managing quality risk must be able to send action items outside the test/QA team quickly. False positives, i.e., messages
about quality issues that ultimately turn out not to concern the product at all,
are wasteful and corrode trust in the test/QA team. For test/QA to be effective,
quality communications must therefore be both quick and trustworthy.
On check failure, people can look at flight-recorder logs
leading up to the point of failure, but logs tend to be uneven in quality,
verbose, and poorly suited to automated parsing. A person has to study them for
them to have any value, so the onus is on test/QA again to follow up. Bottom-up
testing, or testing at the service or API layer, helps, but the problem of
uneven log quality remains. Mixing presentation with the data, e.g., English
grammar or HTML, bloats the logs.
Imagine, instead, an artifact of pure structured data, dense
and succinct, whether the check passes or not. Each step is self-documenting in a
hierarchy that reflects the code, and records whether it passed, failed, or was
blocked by an earlier failure.
MetaAutomation puts all of this information in efficient,
pure data with a schema, even if the check needs to be run across multiple machines
or application layers.
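To make the idea concrete, here is a minimal sketch in Python of such a self-documenting artifact. This is not MetaAutomation’s own schema or API; the field names ("step", "status", "detail", "ms") and the sample step names are hypothetical, chosen only to show a hierarchy of steps recorded as pure data with a pass, fail, or blocked status on each.

    import json
    import time

    class CheckArtifact:
        """Records check steps as pure structured data: a hierarchy that
        mirrors the code, with a pass/fail/blocked status on every step."""

        def __init__(self, name):
            self.root = {"step": name, "status": "pass", "children": []}
            self._stack = [self.root]
            self._failed = False

        def step(self, name, action):
            """Run one named step and record its outcome as data."""
            node = {"step": name, "status": "blocked", "children": []}
            self._stack[-1]["children"].append(node)
            if self._failed:
                return  # an earlier failure blocks this step; it stays "blocked"
            self._stack.append(node)
            start = time.perf_counter()
            try:
                action()
                node["status"] = "pass"
            except Exception as exc:
                node["status"] = "fail"
                node["detail"] = repr(exc)  # data only: no prose, no HTML
                self._failed = True
            finally:
                node["ms"] = round((time.perf_counter() - start) * 1000, 1)
                self._stack.pop()

        def to_json(self):
            self.root["status"] = "fail" if self._failed else "pass"
            return json.dumps(self.root, indent=2)

    # Usage: the hierarchy of recorded steps mirrors the structure of the check.
    def log_in(): pass
    def add_item(): raise TimeoutError("cart service did not respond")
    def check_out(): pass

    artifact = CheckArtifact("PlaceOrder end-to-end")
    artifact.step("Log in", log_in)
    artifact.step("Add item to cart", add_item)
    artifact.step("Check out", check_out)  # never runs; recorded as "blocked"
    print(artifact.to_json())

Because the result is plain data rather than prose, it is equally readable by a person and by the automated analysis described next.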
A failed check can be retried immediately, and on a second failure,
the new result can be compared in detail to the first. Transient failures are
filtered out, and persistent failures are reproduced. Automated analysis can
determine whether the failure is internal or external to the project, and can even
find a responsible developer in the product or test role as needed.
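A rough sketch of the retry-and-compare idea, building on the artifact sketch above: here a failure “signature” is simply the path of the first failed step plus its recorded detail, which is my simplification, not a claim about how MetaAutomation defines equivalence of failures.

    def failure_signature(node, path=()):
        """Walk an artifact (a root node dict as in the sketch above) and return
        the path and detail of the first failed step, or None if the check passed."""
        path = path + (node["step"],)
        if node["status"] == "fail":
            return ("/".join(path), node.get("detail"))
        for child in node.get("children", []):
            found = failure_signature(child, path)
            if found:
                return found
        return None

    def run_with_retry(run_check):
        """run_check() executes the check and returns its artifact root node."""
        first = failure_signature(run_check())
        if first is None:
            return {"verdict": "pass"}
        second = failure_signature(run_check())
        if second is None:
            return {"verdict": "transient", "failure": first}  # did not reproduce
        if first == second:
            return {"verdict": "reproduced", "failure": first}  # actionable
        return {"verdict": "unstable", "failures": [first, second]}

Because the comparison runs over structured data rather than log text, it needs no human in the loop to decide whether the second failure matches the first.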
If so configured, a product developer would receive an email if a)
the exact failure was reproduced, and
b) the check step, stack trace, and any other data added by check code
indicate ownership.
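Routing could then be as simple as the following sketch; the ownership map, the sample addresses, and the notify() hook are hypothetical stand-ins for whatever directory and messaging a team actually uses.

    OWNERS = {
        # Hypothetical mapping from a failed check step to the owning team.
        "Add item to cart": "cart-team@example.com",
        "Check out": "payments-team@example.com",
    }

    def route_failure(result, notify):
        """Notify an owner only when the exact failure was reproduced and
        the failed step maps to someone outside test/QA."""
        if result["verdict"] != "reproduced":
            return  # transient or unstable failures stay with test/QA
        step_path, detail = result["failure"]
        owner = OWNERS.get(step_path.split("/")[-1])
        if owner:
            notify(owner, f"Reproduced check failure at '{step_path}': {detail}")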
Atomic Check shows how to run end-to-end regression tests so
fast and reliably that they can serve as check-in gates in large numbers. Check
failures are so detailed that the probability a problem needs to be reproduced
by hand is small.
With MetaAutomation, communications outside test/QA are both
quick and trustworthy. See parts 1 and 3 of this series for more information.