Wednesday, August 5, 2015

Stronger Quality with MetaAutomation: Part 2 of 3, Handling a Check Failure


What happens on your team when a check (what some call “automated test”) fails?

If you follow the common practice of automating manual test cases, then the authors of the automation are expected to follow up and diagnose the problem. Usually the failure is in the automation itself, and everybody on the team knows that, so the follow-up generally gets little priority or respect. Even when it does get attention, diagnosing and resolving the failure is time-consuming and labor-intensive.

Alternatively, the author of the automation watches the automation run to see whether anything goes wrong. That shortens the communication chain, but it is very time-consuming and expensive, and it doesn't scale at all.

Regression tests or checks that are effective at managing quality risk must be able to send action items outside the test/QA team quickly. False positives, i.e., messages about quality issues that ultimately turn out not to concern the product at all, are wasteful and corrode trust in the test/QA team. Therefore, quality communications must be both quick and trustworthy for test/QA to be effective.

On check failure, people can look at flight-recorder logs leading up to the point of failure, but logs tend to be uneven in quality, verbose, and impractical to parse automatically. A person has to study them before they have any value, so the onus is on test/QA again to follow up. Bottom-up testing, i.e., testing at the service or API layer, helps, but the problem of uneven log quality remains. Mixing presentation with the data, e.g., English prose or HTML markup, bloats the logs.

Imagine, instead, an artifact of pure structured data, dense and succinct, whether the check passes or not. Steps are self-documenting in a hierarchy that reflects the check code, and each step records whether it passed, failed, or was blocked by an earlier failure.

MetaAutomation puts all of this information in efficient, pure data with a schema, even if the check needs to be run across multiple machines or application layers.
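
The exact artifact shape is defined by MetaAutomation's schema; purely as an illustration of the idea (the step names, fields, and JSON serialization here are assumptions, not the actual format), here is a minimal Python sketch of a hierarchical, pure-data check result:

    import json

    # A minimal sketch, not MetaAutomation's actual artifact format: each check step
    # is a pure-data record, nested to mirror the structure of the check code, and
    # each step carries a status of "pass", "fail", or "blocked".

    def step(name, status, substeps=None, **detail):
        # Build one self-documenting step record.
        record = {"name": name, "status": status, **detail}
        if substeps is not None:
            record["steps"] = substeps
        return record

    artifact = step("Place order end-to-end", "fail", [
        step("Log in as test user", "pass", durationMs=412),
        step("Add item to cart", "fail", durationMs=95,
             error="ElementNotFound: id='addToCart'"),
        step("Check out", "blocked"),   # never ran; blocked by the earlier failure
    ])

    print(json.dumps(artifact, indent=2))

Because the artifact is nothing but data, machines can parse, compare, and aggregate it, and the same structure works whether the steps ran on one machine or several.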

A failed check can be retried immediately, and if it fails again, the second result can be compared in detail to the first. Transient failures are screened out, and persistent failures are reproduced. Automated analysis can determine whether the failure is internal or external to the project, and can even identify a responsible developer in the product or test role as needed.
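
As a rough sketch of that retry-and-compare logic (run_check and the artifact shape are hypothetical placeholders, not MetaAutomation APIs):

    # A minimal sketch of immediate retry and detailed comparison. run_check() is a
    # hypothetical callable that runs the check and returns a pure-data artifact.

    def handle_failure(run_check):
        first = run_check()
        if first["status"] == "pass":
            return first                    # nothing to report

        second = run_check()                # retry immediately
        if second["status"] == "pass":
            return second                   # transient failure: screened out, not escalated

        # Persistent failure: compare the two artifacts in detail.
        reproduced = first_failing_step(first) == first_failing_step(second)
        return {"status": "fail", "reproduced": reproduced,
                "first": first, "second": second}

    def first_failing_step(artifact):
        # Name of the first failing step in a (hypothetical) hierarchical artifact.
        for s in artifact.get("steps", []):
            if s["status"] == "fail":
                return s["name"]
        return None

In a real system the comparison would cover the whole step hierarchy, stack traces, and any other data the check code added, but the principle is the same: a second, matching failure is strong evidence that the problem is real and reproducible.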

If so configured, a product developer would receive an email if a) the exact failure was reproduced, and b) the failed check step, stack trace, and any other data added by the check code indicate that developer's ownership.
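
Continuing the hypothetical names above, the notification rule might look like this (maybe_notify, owner_lookup, and send_email are illustrative, not part of MetaAutomation):

    def maybe_notify(result, owner_lookup, send_email):
        # Email a product developer only when (a) the exact failure was reproduced
        # and (b) the artifact data (failed step, stack trace, etc.) indicates an owner.
        if result["status"] != "fail" or not result["reproduced"]:
            return                          # transient or unconfirmed: no action item yet
        owner = owner_lookup(result["first"])
        if owner is not None:
            send_email(to=owner,
                       subject="Reproduced check failure",
                       body=str(result["first"]))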

The Atomic Check pattern shows how to run end-to-end regression checks so fast and reliably that they can be run in large numbers as check-in gates. Check failure data is so detailed that the probability a problem ever needs to be reproduced is small.

With MetaAutomation, communications outside test/QA are both quick and trustworthy. See parts 1 and 3 of this series for more information.
