Tuesday, September 24, 2013

Smart Retry Your Automated Tests for Quality Value

If you automate a graphical user interface (GUI) or a web browser, you’re very familiar with this problem: there are many sporadic, one-off failures in the tests. Race conditions that are tricky or impossible to synchronize and failures from factors beyond your control or ownership break your tests, and the solution too often is to run the test again and see if it passes the 2nd time.

The result is dissonance and distraction for whoever’s running the tests: there’s another test failure. Does it matter? Do I just have to try it again? I’ll try it again, and hope the failure goes away.

Imagine transitioning your job from one where most issues that come to your attention are not actionable (e.g. “just ignore it, or try it again and hope the issue goes away”), to one where most issues that come to your attention are actionable. That sure would help your productivity, wouldn’t it?

I wrote about this topic here in some detail:

Now’s a good time for your organization to bring it up again. Smart Retry is an aspect of 2nd-order MetaAutomation:

Smart Retry is very valuable for your productivity and communication around the organization, but if you want to get there, you need two things which each have significant value in themselves:

2.       Tests that fail fast with good reporting http://metaautomation.blogspot.com/2011/09/fail-soon-fail-fast.html


3.       A process with some programmability to run your tests for you and make decisions based on the results

On item 3: If you are running your tests in parallel on different machines or virtual machines or in the cloud, you will have this already, and if you don’t have this, you will because the business value makes it inevitable.

For a distributed system, you will need also a non-trivial solution for this:

4.       A service that provides users for given roles from a user pool, for time-bound use with an automated test

A Smart Retry system is an automated solution to substitute for a big piece of human judgment: whether to just run the test again, vs. taking a significant action item on it. It adds a lot of business value in itself, and it also complements other systems that scale and strengthen the Quality story of your organization.

How to Find the Right Size for your Automated Tests

Here are some reasons you might do some automation for your Quality efforts:

1.       It might save a lot of time and effort, because it means manual tests that don’t have to be run again and again by humans

2.       The results of the tests can be more reliable and consistent than those of manual testers

3.       Done right, it will bring Quality information to your team much faster than with manual testing, which reduces uncertainty and wasted effort, and can help you ship faster

You want to automate the most important stuff first, of course. You know what the scenarios are. But, should you automate it as one huge run-on test, or a set of smaller ones, or something else? How do you plan your automation effort for greatest value?

Atomic tests are important. See http://metaautomation.blogspot.com/2011/09/atomic-tests.html for a previous post.

But, how atomic do you have to be? If you need the right user and you need to log in etc. isn’t it faster to just go through all the verifications you want in one big test, so you don’t have to log in multiple times etc.?

It might be faster to write the automation if the system is well-understood and stable, and it might be faster to run it as one huge scenario, too, assuming all goes well and the test passes. But, what if part of the test fails for any reason? What if you ever want to look at the test results, or even further, automate triage or do smart retry?

Smart Retry is the topic of my next post, here http://metaautomation.blogspot.com/2013/09/smart-retry-your-automated-tests-for.html

A failure in an automated test should end the test immediately (see http://metaautomation.blogspot.com/2011/09/fail-soon-fail-fast.html) if you’ve chosen your verifications wisely – otherwise, any remaining results in that automated test might be invalid anyway, and you’re wasting resources as well as burying the original failure i.e. making it more difficult to turn that into an action item. Automated tests often fail due to failures in the test code, or race conditions, or failures in external dependencies. When they do fail, and if significant product verifications aren’t being run because of the early failure, that means that significant parts of the product are not being tested by automation, and if you don’t figure out what parts are missing and run them manually, significant parts of the product aren’t getting tested at all!

Shorter, atomic tests scale better, because

·         You can retry them more quickly

·         They have focused, actionable results

·         You can run each one in parallel on a VM (or can in future, when the infrastructure is there) which means the whole set of tests can be run much faster

Atomic tests need actionable verifications, i.e. verifications that can fail with a specific action item. You never want a test to fail with a null-reference exception, even if it might be possible to work backwards in the logic to guess at root cause of the failure. The actionable verifications happen as the atomic test runs, so that in case of failure, the failure is descriptive (actionable) and unique for the root cause.

But, skip doing verifications that aren’t necessary for the success of the scenario. For example, there’s no need to verify that all PNG images came up on a web page; you need manual tests for that and many other aspects of product quality, and anything, you don’t need another source of potential intermittent failures to gum up your tests. Limit your verifications to things that, if they fail, the whole atomic test is headed for failure anyway. It’s those verifications that help point out the root cause of the failure, which in turn help find an action item to fix the test.

This might seem like a conflict: tests are more effective if they are shorter (atomic), but they need lots of actionable verifications anyway, so doesn’t that make them long?

I’ll clarify with two examples, Test A and Test B.

Test A is an automation of a simple scenario that starts from a neutral place (e.g. the login page to a web site), does some simple operation and checks the results including these two tests:

1.       Verification that a certain icon is loaded and displayed

2.       Verification that a certain icon has the correct aspect ratio (e.g. from something in the business logic)

Test A looks like an atomic test, because an important goal of the scenario is that icon and that’s what the verifications focus on. It does not make sense to break A into smaller tests because in order to do verification 2, the test will do verification 1.

Test B is similar but is aimed at a different result: some important text that accompanies the table that includes the famous icon of Test A. Test B does these verifications:

1.       Verify that the icon is displayed

2.       Verify that the text accompanying the table in which the icon appears is correct according to business rules

Test B is NOT an atomic test, because the verification 1 isn’t necessary for verification 2. Test B is better broken up into two tests or, better, as part of a suite with test A just remove verification 1 from test B because that verification happens in test A anyway. Verification 1 in test B might fail and therefore block the really important verification 2. Note that a failure of verification 1 could happen for lots of reasons, including

·         Product design change

·         Test code failure

·         Transient server failure (timeout)

·         Deployment failure

·         Product bug

So Test B is better off without verification 1, given that Test A is run as part of the same suite.

The “right size” for an automated test is:

·         Long enough to test an important property of a scenario, but no longer

·         Contain actionable verifications for the steps of the test, so failure is actionable

·         Short enough that no verifications are present that are unnecessary to the basic flow of the test