Thursday, September 24, 2015

Manual Test to MetaAutomation, Continuous Testing and DevOps

A software test is some procedure with intent to measure quality.

When the test procedure is documented, the expected result is usually a pass or fail, but it could be some other basic measurement as well, e.g. time to completion in the case of a simple performance test.

With exploratory testing, the result could be anything that the person doing the exploration perceives as an issue, usually limited to issues that the team could act on at some point in time.

There are many other kinds of test as well, and many people who occasionally fulfill the testing role by being open to recording, for the benefit of the team and the product, some issue of quality. Anybody with responsibility for the product, aka the system under test or SUT, can fill this role.

There are many tests where the procedure and the pass/fail criteria are both automated, i.e. “scripted.” These can be fast and cheap to run, but the only information recorded is the result (the “artifact”) of the test. Humans could potentially observe many other issues while they are testing the product, but when an issue is difficult, risky, or expensive to detect in code, or just isn’t anticipated by whoever is scripting the test, it is never recorded.

For example, while testing a REST service, a human tester might notice that performance is usually quite good but periodically slows down significantly. The tester can record this as a “bug” for investigation and possible correction. A scripted test of that same service, though, would not record any such pattern. At most, a well-set time limit would sometimes fail the test, which captures only a subset of the information that a human would perceive. A human could always decide “well, let me time this” and get out a stopwatch app to record the timing of the service response, but no scripted test could make that decision on its own.
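As a rough illustration of the REST example above, here is a minimal Python sketch of a scripted check with a time limit. The names `run_timed_check` and `fake_service` are hypothetical, and the sleep stands in for a real HTTP call. The point is that the artifact holds only what was explicitly coded: a verdict and one timing.

```python
import time


def run_timed_check(call_service, limit_seconds):
    """Call the service once and compare the elapsed time to a fixed limit."""
    start = time.perf_counter()
    call_service()
    elapsed = time.perf_counter() - start
    # The artifact records only what this script measures. A pattern such as
    # "usually fast, but slow every so often" is not captured anywhere.
    return {"passed": elapsed <= limit_seconds, "elapsed_seconds": elapsed}


def fake_service():
    # Hypothetical stand-in for a REST call; a real check would send a request.
    time.sleep(0.01)


result = run_timed_check(fake_service, limit_seconds=1.0)
print(result["passed"])
```

A periodic slowdown would show up here only as an occasional failure, with nothing in the artifact to say why.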

For another example, suppose a tester is working on a credit union portal, and notices that on the login page, the page layout is not acceptable in some way. A scripted test is very unlikely to include a measurement that would perceive this, and for good reason; such a script would be complex, slow and flaky.

Such tests are often classified as either “manual” or “scripted.” (There is also the term “automated” but that term is problematic, as I’ll describe below.) With the manual tests, testers can perceive, characterize and record a wide range of issues. This is true whether or not a stopwatch app or some other tool is used (in addition to the SUT, which of course is also a kind of tool); the human tester perceives issues. People are smart that way.

With the scripted tests, the quality issues that are noted are limited to what is explicitly coded as a measurement (aside from “the SUT crashed,” which fortunately is less common these days). The number of measurements is best limited for many reasons, e.g.

·         If one quality measurement fails, the measurements that would have come after it are never made, so any issues they would have found stay hidden

·         The script must be fast

·         The script must be as simple and robust as possible
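The first reason in the list above can be sketched in a few lines of Python. This is only an illustration under assumed names (`run_script` and the three measurement names are made up): a script that stops at its first failure never makes the measurements that follow.

```python
def run_script(measurements):
    """Run named measurements in order, stopping at the first failure."""
    results = {}
    for name, measure in measurements:
        results[name] = measure()
        if not results[name]:
            break  # every later measurement is never made
    return results


# Hypothetical measurements for a credit union portal.
measurements = [
    ("login_succeeds", lambda: True),
    ("balance_is_shown", lambda: False),  # this one fails
    ("logout_succeeds", lambda: False),   # also broken, but never measured
]

results = run_script(measurements)
hidden = [name for name, _ in measurements if name not in results]
print(hidden)  # ['logout_succeeds']
```

The more measurements a single script carries, the more of them one early failure can hide.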


Scripts written from manual tests tend to be brittle when up against complexities that human testers barely need to think about, e.g. race conditions or design or implementation changes.

The term “check” describes a type of test that is entirely scripted, including the quality measurement that is the target verification of the check. The term is useful for many reasons, and it is especially important for avoiding a business risk that results from a misunderstanding:

Many tests are written as manual tests, i.e. the original intent, and usually the original “run” or execution of the test, is by a person on an interface that is designed for humans, e.g. a GUI or a web page or site. A very common practice is to take these tests and “automate” them, i.e. write a script that drives the SUT as if a human were running the tests, or as close to that as possible. Management and the team then assume that the quality risks that were covered by manual testers running the manual tests are now covered by the “automation,” and conclude that there is no need to run the tests manually any more … but they are mistaken.

Using the word “check” avoids this misunderstanding, because it’s clear (at least, more clear) that the only quality measurements happening are the ones explicitly coded into the newly-scripted procedure. This is similar to the distinction between “manual” and “scripted” or “automated” testing: “manual” testing means that it’s a person who perceives issues or decides how the test result is reported. People are smart! Computers and scripted procedures: not smart! (This leaves artificial intelligence aside for the moment, because it doesn’t apply here and won’t in the near future.)

This is a good thing for clarity: manual testing will always be important for what that role is good at (e.g. finding bugs), and scripted testing is good for what it’s good at (checking for regressions and making sure it all still works). Actually, scripted testing can be an extremely powerful driver of quality, as I explain below.

Here is another good reason to not use the word “test” when we’re really talking about scripted verifications: the common phrases “test automation,” “automated test,” and “automated testing” are all oxymorons.

The term “automation” comes from industrial automation. Automation is all about producing a product or products faster, more accurately, or more cheaply, and with fewer people than before. IT departments use automation as well. There is no physical product like a factory would have, but in that case too, the end product is the goal of the automation.

“Test automation” is about making the SUT do stuff, but nobody cares about the product output (unless that is needed for a quality measurement); the product is still in development. Actually, people do care about quality measurements of the product related to what the product does, but that often has nothing to do with the by-design results of the product. If data is used to drive the product, then even when it looks like real-world data, it’s usually faked or anonymized for business and quality reasons. The word “automation” is a poor fit for software quality, so disassociate it from the modifying word “test.”

“Check” is a more accurate, and less risky, term to use. “Scripted verification” works too, and emphasizes that mostly what one is doing with scripted checks is simply verifications.

It might seem now that there’s a downside to checks or scripted verifications; since we know that humans notice stuff that checks do not, and that checks need to be short and simple to be useful, why write any scripting for quality measurements at all?

The answer is well-known: the scripted checks are highly repeatable, they can be fast, and they can (sometimes) be made very reliable. The results of the scripts are recorded for record-keeping and simple analysis and follow-up.

There’s another answer that builds on that, and as yet is not well-known: if the checks are written in a certain way, they can scale better and be more robust and reliable, and analysis and communication around the team can be vastly more effective.

The foundational, least dependent (and the most innovative) pattern of MetaAutomation is “Atomic Check.”

“Check” simply means that it’s a scripted (or, automated, if you will) verification.

“Atomic” means that it’s as simple as possible while still achieving measurement of the targeted behavior of the SUT.

Atomic Check describes exactly how to design the optimal check for each of your prioritized business requirements. It has these benefits as well:

·         Checks are all independent of each other, for scaling

·         Check artifacts are pure data, e.g. XML, providing for efficient and effective analysis across the product and SDLC, and for customized presentations for different roles across the team

·         Check steps, and their results (pass, fail, blocked), are self-documenting

·         The artifacts from check failures are highly actionable, so they minimize the need to reproduce errors, including transient errors

·         They are generally written in the same language as the SUT, so there is no need to learn (or invent) a new language
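To make the artifact idea in the list above more concrete, here is a minimal Python sketch. It is not the MetaAutomation implementation; the function `run_atomic_check`, the XML element and attribute names, and the login steps are all assumptions made up for illustration. It shows one independent check whose steps record pass, fail, or blocked in an XML artifact.

```python
import xml.etree.ElementTree as ET


def run_atomic_check(name, steps):
    """Run steps in order; once a step fails, mark the remaining ones blocked."""
    root = ET.Element("check", name=name)
    failed = False
    for step_name, action in steps:
        step = ET.SubElement(root, "step", name=step_name)
        if failed:
            step.set("status", "blocked")
            continue
        try:
            action()
            step.set("status", "pass")
        except Exception as error:
            # Keep the failure detail in the artifact so it can be acted on.
            step.set("status", "fail")
            step.set("message", str(error))
            failed = True
    root.set("status", "fail" if failed else "pass")
    return root


def reject_login():
    # Hypothetical failing step.
    raise RuntimeError("login rejected")


artifact = run_atomic_check("login", [
    ("open_page", lambda: None),
    ("submit_credentials", reject_login),
    ("read_balance", lambda: None),
])
print(ET.tostring(artifact, encoding="unicode"))
```

Because the artifact is plain data, another script could summarize it for different roles on the team without rerunning the check.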

DevOps is powerful in what it can do for shipping quality software faster. Continuous testing is the test/QA side of DevOps. MetaAutomation is the optimal way to do continuous testing.

Oh, and remember how important the manual testing role is? Atomic Check self-documents exactly what is working, so it relieves manual testers of repetitive drudgery, allowing them to focus on what they’re good at. Duplicative work goes away.

More information on the MetaAutomation pattern language is here:

and, in other posts on this blog.