Friday, September 30, 2011

Test code quality: readability and maintainability

Next week I’ll get back into metaautomation, but for now, more on test code quality:

One of those great big life lessons I’ve picked up: in order to understand some system really well, you have to be able to teach it to someone else and have that person understand.

In terms of code, there’s a very high probability that others with job descriptions similar to yours will be maintaining the code at some point in the future. So, nurture the habit of checking yourself: what would these future code-viewers think? Would they be able to understand my high-level coding decisions if I’m not there to walk them through them?

If not, put in a code comment at a higher level of abstraction than the code itself. Brevity is good.

This is especially important to consider when writing methods that are intended as top-level, i.e. no other code in that language in the system depends on them. For example, a method that walks the system under test through some steps to achieve some atomic scenario probably includes a number of steps that might not make sense to another tester, even given that (s)he knows the product. Self-documenting symbol names are good, but a higher-level comment might add teaching value.

I like to look at this as “beer truck” self-management: If I get hit by a beer truck on the way home today, could one of my colleagues step in and continue with work that is shared with the team? (Don’t worry, it’s just hypothetical, and I don’t like beer anyway.)

A colleague has to – in theory – step in and understand what’s going on. Consider that when writing stuff, and your code (with comments, if necessary) will be better for the team.

Thursday, September 29, 2011

Beware Duplicated Code

Test code has a self-esteem problem: it thinks it’s there as a one-off, to be thrown away. It exercises the product once, and that’s good enough to check the code in. Sometimes, hardly any thought goes into test code quality.

For games about sliding letters around on a touch screen, this makes some sense in that quality issues are low-impact anyway. But personally, that’s not why I like IT: I want to help computers solve important, impactful problems for people. Doing quality well becomes a real challenge.

Everybody knows about object-oriented (OO) principles, but when documentation is weak and pressures of productivity are high, code gets copied around and common problems get re-implemented. When this happens, cost and risk go up: bugs and oracle changes have to be addressed in more than one place. Sometimes code even gets abandoned, so it might look like a work item but it’s really not.

Find a way to use OO principles to minimize code duplication and keep the code relevant, just as with product code.

Either code is part of the product quality, or it’s not. If code isn’t used, get rid of it (it can be recovered; that’s what source code control is for) or comment any unusual status of the code for the product (e.g. class/method/member isn’t used right now, but kept around for some specific feature/milestone). Stray code can be expensive, because it can create maintenance cost. Commented-out code should be unusual, and if it’s needed, the commenting-out needs to be commented as to why it’s needed.

If it’s not contributing, get it off the team :)

Duplicated code can easily become nearly-duplicated code, which rapidly creates a lot of cost and risk. True, you’re not shipping it, but keeping it readable and maintainable is very important. See my last post on the importance of this.

Do you have any fun (fun, in retrospect) bad-code stories?

Wednesday, September 28, 2011

If product quality is important, test code quality is too

I mean no disrespect to devs when I write this: test code can get even more complex than product code because of the larger number of dependencies to manage from test’s perspective. If a product has N deployment modules and M external dependencies, then Test has N + M external dependencies! Changes in any one of these dependencies can create a maintenance work item in the test code. The product is just being developed, too, so many changes are going on which sometimes cascade down the stack to test code. Even subtle changes in business rules might require changes to test oracles or simpler verification checks.

Readability and maintainability are therefore even more important to test code than they are to product code.

As with product code, test code should be self-documenting where possible, but that’s not sufficient: comments are good to have, where they help, at a level of abstraction higher than the code or pattern itself. This is important for product code of course, but even more so for test code, because of the increased complexity and because coding skills tend to vary widely across the members of the test team who write code.

With test code, the executables won’t ever be far from the source code, so information hiding (e.g. not showing a stack trace to end-users) isn’t an issue. Performance isn’t nearly the issue that it is with product code for many applications (other than maybe perf test code). So, log details on what’s going on! Strongly-typed log entries, e.g. using enumerated types or XML, are precise enough for automated parsing. Exceptions are a great tool, and you can make them even better.

There are also lots of great tools out there to help with logging to one or more streams in a structured way. Consistency and structure make logged information more valuable and more extensible.
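As a minimal sketch of strongly-typed log entries, assuming Python's standard library: the `Area` and `Severity` enumerations and the `make_entry` helper are hypothetical names for illustration, not part of any particular logging tool.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from enum import Enum


class Area(Enum):
    # Hypothetical labels; a real team would define its own.
    SETUP = "setup"
    LOGIN = "login"
    CHECKOUT = "checkout"


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


def make_entry(area, severity, message, **context):
    """Return one log entry as an XML string; labels come only from the enums."""
    if not isinstance(area, Area) or not isinstance(severity, Severity):
        raise TypeError("area and severity must be enum members, not free text")
    entry = ET.Element("entry", {
        "time": datetime.now(timezone.utc).isoformat(),
        "area": area.value,
        "severity": severity.value,
    })
    ET.SubElement(entry, "message").text = message
    # Extra name-value pairs give context around the logged event.
    for name, value in sorted(context.items()):
        ET.SubElement(entry, "context", {"name": name}).text = str(value)
    return ET.tostring(entry, encoding="unicode")


line = make_entry(Area.LOGIN, Severity.ERROR, "auth timed out", attempt=2)
parsed = ET.fromstring(line)  # the same entry, read back by a parser
```

Because the labels are enum members rather than free text, a typo in a label fails at the call site instead of producing a log line that a parser silently misses.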

Do you have a favorite logging technique or tool?

Do you have a favorite strategy for test code?

Tuesday, September 27, 2011

Metaautomation: Is this a failure that we’re better off ignoring for the moment?

Yesterday’s post was about automated analysis of test failures. Today I post about accelerating business value from automated tests even more: moving past transient failures outside of team ownership, before even the results are reported.

In the age of information, systems are highly interconnected, and external dependencies are everywhere. Systems fail sometimes or just take too long for all sorts of reasons. Other systems (if done well) handle these hiccups gracefully.

Suppose the system that your team is working on is a high-impact system with external dependencies. It’s almost inevitable that there will be failures in the external systems that you really can’t do anything about (except maybe choose to make adjustments to your locally-defined timeouts). Will your system report on these failures to make a potential action item? In the interest of generating business value from the software quality team, should it?

IMO this is a great opportunity to skip past common non-actionable failures to increase the chances of finding something that really is actionable. People are smart, but they’re expensive to keep around, so we don’t want them to spend time on something that can be automated away.

Notice I didn’t say to toss out the artifacts from failures that can be quickly judged non-actionable. These artifacts could be useful later, and as noted earlier, storage space is cheap :)

Suppose the team’s system is doing a financial transaction. An external authentication system (call it EAS) does a robust authentication for us. Sometimes, this external system times out, e.g. on the big shopping days after Thanksgiving or before Christmas. The system that the team owns handles this for external customers as gracefully as can be expected, and suggests trying again.

From the point of view of automated tests running on a version of the team’s system in a lab (or maybe, a live system with some automated tests as well) the EAS fails maybe several times an hour. The quality team doesn’t want to spend any time on these failures if it can avoid it, and it also doesn’t want to slow down the rest of the automated testing just because of a simple timeout; the team wants to get right to the more interesting test results if it can.

Is the engine that schedules and runs your tests capable of repeating a test, or of running sets of atomic tests (see this post) and reporting on the set? If so, then on a failed test, try again and see if the error is reproducible.

1.       Stop retry after N sequential failures that correlate with each other within C (and, report the reproduced error as a single test failure linked to the other failures in the set)

2.       Stop retry after M sequential failures (and, report on the failure set, at a lower priority than a reproduced error. All failures are linked together for potential follow-up)

3.       Stop retry on success (and, report the success, linked to any failed tries)

N could be configured as 2, and M might be 5. C could be 80%, or could be a set of correlation strengths depending on the correlation type, if multiple correlation types are used.
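The three stop rules can be sketched roughly as follows. This is an illustration under assumptions, not the engine described in the post: `run_with_retry`, `RetryReport`, and the `correlate` callback are hypothetical names, and "correlate with each other" is simplified to comparing each failure with the one just before it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RetryReport:
    outcome: str  # "pass", "reproduced" or "unreproduced"
    failures: List[str] = field(default_factory=list)  # every failed try, kept linked


def run_with_retry(
    test: Callable[[], Optional[str]],
    correlate: Callable[[str, str], float],
    n: int = 2,
    m: int = 5,
    c: float = 0.8,
) -> RetryReport:
    """Run test() until one of the stop rules applies.

    test() returns None on success, or a description of the failure.
    correlate(a, b) returns a correlation strength between 0.0 and 1.0.
    """
    failures: List[str] = []
    streak = 0  # length of the current run of correlated failures
    while len(failures) < m:
        failure = test()
        if failure is None:
            return RetryReport("pass", failures)        # rule 3: stop on success
        if failures and correlate(failures[-1], failure) >= c:
            streak += 1
        else:
            streak = 1
        failures.append(failure)
        if streak >= n:
            return RetryReport("reproduced", failures)  # rule 1: reproduced error
    return RetryReport("unreproduced", failures)        # rule 2: M failures, lower priority
```

A transient EAS timeout followed by a pass would come back as `"pass"` with one linked failure, so nobody has to look at it right away, while the artifacts remain available for a later batch review.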

All artifacts get saved. All individual test tries get analyzed by the system, because here the engine that distributes and runs the tests is also looking at correlations between the tests.

One-off failures don’t require immediate attention from anybody, although it would be useful to go back and examine them in a batch to make sure that the failure pattern doesn’t show some larger problem that the team needs to take action on.

Failures that end up on testers’ plates are now fewer. People are much less likely to get bored and preoccupied with repeated test failures on external dependencies. The team is more productive now, and looks more responsive and productive to internal customers. Product quality is more effectively measured and monitored, and for a high-impact product, that’s where you want to be!

Can this be applied to your project?

Monday, September 26, 2011

Reality Check ...

On my last post:

Is this too dense? Would you rather eat 5 lbs of peanut butter in 60 seconds or less?

Please let me know what you think. I'm especially curious where people think this is applicable, or "I did something like this back at ABC Software ... "


Metaautomation: Have we seen this failure before? If yes, what’s the actionable?

In previous posts I’ve talked about quality artifacts:

·         Detailed info around the failure

·         Strongly typed info using XML, or at least formatted with a system for group use (e.g. labels from enumerated types to give consistency and avoid typos)

·         Context info around the failure

With consistent datetime formatting (or, even better, detailed-schema valid XML) and sufficiently real-time logging, logs can easily be correlated across threads.
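For example, if every log line starts with an ISO 8601 timestamp, per-thread logs can be interleaved with a few lines of code. This is a sketch assuming Python's standard library; the line format and the `merge_logs` name are assumptions for illustration.

```python
import heapq
from datetime import datetime


def timestamp(line):
    """Parse the leading ISO 8601 timestamp of a log line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def merge_logs(*logs):
    """Interleave logs by time; each input log must already be in time order."""
    return list(heapq.merge(*logs, key=timestamp))


thread_a = [
    "2011-09-26T10:00:00.100 A: start transaction",
    "2011-09-26T10:00:00.300 A: send request",
]
thread_b = [
    "2011-09-26T10:00:00.200 B: listener ready",
]
merged = merge_logs(thread_a, thread_b)
```

This only works when the formatting really is consistent: one line with a different date format would make `timestamp` raise, which is arguably the right outcome for a log meant for automated parsing.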

The information needs to be formatted and clear in intent. Error messages or other information in text needs to be clear; too verbose is much better than too succinct, because if the message is too brief, it doesn’t help.

Consistent logs can be interpreted by a new member of the team, with a little product knowledge. Really consistent logs can be compared in an automated way, post-test run. For metaautomation, that’s what we need! With logging APIs that use team-specific enumerated types and custom classes, the constraints are expressed in intellisense and by the compiler, and this frees people to exercise their human intelligence on the important stuff that is being logged. (This may be the subject of a future post…)

Logs (XML, text stream, or something in between e.g. name-value pairs) are best stored somewhere on a WebDAV server (or, an FTP server) for later use, along with dumps, legacy logs, etc. A person or automated process can reference all the artifacts with read-only access for analysis after the test run!

If a tester does the analysis, it will be after the test run. Automated analysis is less risky after the test run, too, because otherwise the test can get delayed or just more fragile because of the additional points of failure from the analysis process.

I’ve implemented an automated analysis system where the artifacts are reduced to N properties per test run. The number and the names of properties and the order of evaluation are configurable. The methods of property value derivation are supplied with an executable module that only depends on the artifacts, and does internal error handling. In my case, this is a .Net assembly that exports public methods with names that map to the “Name” attribute in the XML template that determines the properties. This system is extensible by updating the XML template and the .Net assembly (and restarting the server process if necessary to pick up changes).
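The template-driven idea can be approximated in a few lines. The sketch below is not the .Net system described above; it is a Python stand-in in which the template contents, the `Extractors` class, and the artifact dictionary layout are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: the Name attributes choose which extractors run, in order.
TEMPLATE = """<properties>
  <property Name="exception_type"/>
  <property Name="failed_step"/>
</properties>"""


class Extractors:
    """Each method derives one property value from the artifacts alone."""

    @staticmethod
    def exception_type(artifacts):
        return artifacts.get("exception", "").split(":")[0]

    @staticmethod
    def failed_step(artifacts):
        steps = artifacts.get("steps", [])
        return steps[-1] if steps else ""


def reduce_run(artifacts, template=TEMPLATE):
    """Reduce one test run's artifacts to the properties named in the template."""
    result = ET.Element("run")
    for prop in ET.fromstring(template).iter("property"):
        name = prop.get("Name")
        try:
            value = getattr(Extractors, name)(artifacts)
        except Exception as error:
            # Internal error handling: a bad extractor must not stop the analysis.
            value = "extractor error: %s" % error
        ET.SubElement(result, "property", {"Name": name}).text = str(value)
    return ET.tostring(result, encoding="unicode")
```

Adding a property then means adding one template entry and one method with the same name, which mirrors the extensibility described above.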

Every test run that meets certain criteria (in my case, I did this for one type of test only) gets reduced to valid XML stored in a row in a table in the database of automated analysis. Any failure can be associated with an actionable, and after that association, the actionable can be correlated with a later test failure by the automated system! This answers the question “have we seen this failure before?” with a decimal or floating point value for correlation strength.

This takes human input, at least for a while: people have to enter the action item association with a test failure. No system that I know is smart enough to do this reliably. I quote Michael Bolton on this:  “Sapient activity by [a human] … is needed to determine the response.”

How the different properties are weighted can be varied depending on the test failure and the actionable. For example, if a specific test resolution action item is specific to the (virtual) machine that runs the test, then the identity of that virtual machine is significant to the correlation strength, otherwise it’s insignificant and should not be factored into the correlation strength.
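One simple way to express such weighting is a weighted fraction of matching properties. This is a sketch under assumptions: the property names, the weights, and the `correlation_strength` function are invented for illustration and are not the formula used by the system described above.

```python
def correlation_strength(known, new, weights):
    """Weighted fraction of properties on which two reduced failures agree."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for name, weight in weights.items()
        if name in known and known[name] == new.get(name)
    )
    return matched / total


known_failure = {"exception": "TimeoutException", "step": "authenticate", "machine": "vm-07"}
new_failure = {"exception": "TimeoutException", "step": "authenticate", "machine": "vm-12"}

# The action item is not tied to a machine: machine identity gets zero weight.
general = correlation_strength(
    known_failure, new_failure, {"exception": 2, "step": 1, "machine": 0})

# The action item is specific to vm-07: machine identity dominates.
machine_specific = correlation_strength(
    known_failure, new_failure, {"exception": 2, "step": 1, "machine": 3})
```

With these made-up weights the same pair of failures correlates fully in the first case and only halfway in the second, which is the distinction the paragraph above is after.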

This system might seem a bit heavy for some applications, because after all… we’re “just” testing, right? I agree with James Bach and the Context-Driven school of testing: “The value of any practice depends on its context.” And, if you’re creating a game about exploding birds, this approach isn’t appropriate. However, if you’re writing high-impact applications e.g. involving financial transactions, I submit that it is appropriate for getting value out of your automated tests and making real, significant failures quickly actionable.

In practice, very few test failures in such a system would point to product failures; most failures point to other actionables, or can be ignored temporarily. (This is the subject of a future post.) But, in order to draw attention to the important failures and act on them quickly, your system (including all automated systems and the testers that work with them) needs to be able to tell the difference, even more quickly.

With automated tests running often, it’s very easy for humans to get overwhelmed with results, and they respond with general strategies in order to cope. People are smart, but they’re not very good at dealing with repetitive things. That’s why you need an automated system.

Friday, September 23, 2011

Atomic Tests

Atomic tests are good. Not the Dr. Strangelove kind of atomic tests, I mean automated tests.

Atomic tests test one thing about the product. An atomic test can’t be broken up into smaller tests and still test the same thing, hence the “atomic” quality of the test. Atoms can’t be broken up into smaller pieces.

Atomic tests also must be independent of each other. They can be run individually or in groups, in any order, so they can be distributed and run faster than if they were run in a sequential set. Failures can be known, reported and acted on sooner (by humans or by an automated process).

A test that tests two things, A and B, risks not testing B at all because the test of A failed. So, to maximize test coverage, test B separately from A if possible. If it’s not possible to test B without testing A first (e.g. an end-to-end or E2E test) then be sure to test A without testing B as well (in an independent atomic test) to make the artifacts and reporting more clear and to eliminate any possibility of latent dependency of the test of A on the test of B.

So, if an E2E test requires running through A before testing B (e.g. “A” is creating an account, and “B” is doing some operation with that account) then there are at least two tests:

1.       Test A: Create an account

2.       Test B: Create an account, then do some operation with that account
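As a rough illustration of the two tests, here is a sketch using Python's `unittest`. `FakeAccounts` is a hypothetical in-memory stand-in for the system under test; a real suite would drive the actual product.

```python
import unittest


class FakeAccounts:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self._balances = {}

    def create(self, name):
        if name in self._balances:
            raise ValueError("account %r already exists" % name)
        self._balances[name] = 0
        return name

    def deposit(self, name, amount):
        self._balances[name] += amount
        return self._balances[name]


class AccountTests(unittest.TestCase):
    def setUp(self):
        # Fresh state for every test, so no test depends on another.
        self.sut = FakeAccounts()

    def test_a_create_account(self):
        # Test A: create an account, and check only that.
        self.assertEqual(self.sut.create("alice"), "alice")

    def test_b_deposit_into_account(self):
        # Test B: creates its own account first, then checks the operation.
        account = self.sut.create("bob")
        self.assertEqual(self.sut.deposit(account, 25), 25)
```

Test B repeats the account creation instead of reusing the account from Test A, so either test can run alone, in any order, or on a different machine.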

Realizing business value with the automated tests includes some follow-up on the results, to ask questions like “have we seen this failure before?” (see this post). The next favorite question of good testers is “can we do it again, to reproduce the same failure?” With atomic tests, failures are more reproducible because the tested actions are as simple as possible. Re-runs are quicker, too!

Adam Goucher writes about this here: “… a test case should only be measuring one, and only one thing.” and “Test cases should not be dependent on other test cases.” This isn’t a new idea in itself, but it’s a new thing to use the memorable term “atomic” as a simple rule for the team to follow.

… and if “atomic” isn’t memorable enough for you, stream the movie.

Thursday, September 22, 2011

Persisting the Artifacts from Automated Tests

Storage space is cheap. Check that, storage space is very cheap, and getting cheaper all the time. Moore’s law reigns here. I see a 1TB internal hard drive for $55, for example, on my favorite online retailer of computer components. Next year I’ll probably buy a bunch of 4TB drives for personal use, even though my family’s data only amounts to 1.4TB so far (mostly digital photos).

Given a reasonably stable product with detailed logging and/or other reporting, businesses have a great incentive to save that stuff for a while, within constraints of confidentiality, privacy and personally identifiable information (PII). Anybody looking at quality will want to ask this question sometimes: has our product seen that behavior before? Is there a pattern? Patterns of behavior are great because they help characterize and prioritize product quality issues.

Test infrastructure that exercises the product has a strong causal relationship with these product artifacts, and helps illuminate them as well, e.g. “initiate process A, start transaction B …” in a temporal relationship with one or more threads / processes / cloud deployments of the product. In a controlled test environment in a lab (which could even be in the cloud) certain scenarios would be exercised repeatedly on a schedule, which can make correlation between artifacts even more valuable; given that some things stay the same e.g. the way the product is exercised or even the starting-point data, what changes becomes more significant, and more likely actionable.

Long-term management of artifacts of product behavior works well in conjunction with the artifacts generated by automated tests. Saving product artifacts (logs, errors, etc.) with the automated test artifacts (logs, exceptions, etc.) simplifies storage and analysis.

Tomorrow: atomic tests. Next week: more on automated parsing of logs, and optimizing logs for automated parsing!

Wednesday, September 21, 2011

Using custom exception types

A potentially costly item that comes up on automation failure (if you care, which you should) is delegating/triage: who, what, or how to follow up? There’s a need to add value to the product quality knowledge and preserve work that’s already been done, so if you don’t have time to follow up, the team has bigger problems :(

Custom exception types are great for this, because they clearly and unambiguously denote problem source. They avoid the synchronization problems, typos, other omissions that logs can have, and become a programmable aspect of product quality. Strong typing means that the compiler is your friend, so a class of implementation problems is found earlier rather than later.

For products that ship to run remotely on client OS’s or in the cloud, custom exceptions can be configured, and turned on or off if performance requires it or if information must be limited for end users’ benefit.

The “is-a” idiom works very well for handlers with an inheritance hierarchy of exception types. For example, suppose the product or project name is “Widget”:

Note too that “oracle” refers to the predictive agent, not the database company.

Each class can have its own data, accessors, serialization, constructors etc. Information about a failure is now encapsulated for the thread. The catch(…) clause specification uses the is-a idiom, so can be used to handle these exceptions in an organized way.
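One way such a hierarchy might look is sketched below, in Python rather than the .Net the post has in mind. Every class name is hypothetical, and the `triage` function and its routing strings are only an illustration of is-a handling.

```python
class WidgetError(Exception):
    """Root of everything raised by Widget or its tests (hypothetical names)."""

    def __init__(self, message, **details):
        super().__init__(message)
        self.details = details  # each type can carry its own data


class WidgetProductError(WidgetError):
    """The product itself misbehaved."""


class WidgetTestError(WidgetError):
    """Something went wrong in test code or test infrastructure."""


class WidgetOracleError(WidgetTestError):
    """The oracle (the predictive agent) could not produce an expectation."""


def triage(error):
    """Route a failure by type; handlers are ordered most specific first."""
    try:
        raise error
    except WidgetOracleError:
        return "test team: review the oracle"
    except WidgetTestError:
        return "test team"
    except WidgetProductError:
        return "product developers"
    except WidgetError:
        return "Widget team: unclassified"
    except Exception:
        return "unknown source: manual follow-up"
```

Since a `WidgetOracleError` is-a `WidgetTestError`, a handler written for the general type still catches the specific one; the ordering of the handlers decides which one wins.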

If there are other threads in the module or different modules, and they might be blocking on a synchronous communication, they can be interrupted through a back door for the lab environment, and information there encapsulated as well for packaging with the primary thread. This is more reliable than e.g. depending on log timestamp synchronization, although that can work as well.

Tomorrow: more thoughts on packaging and presenting information for humans (manual analysis) and automated processes.

Tuesday, September 20, 2011

Put actionable information in the automated test failure

This post follows up on yesterday’s post.

Metaautomation is about maximizing the value of automation to the business.

When automation fails, you don’t want to have to do the test again with a debugger or other diagnostic tool, for at least these reasons:

·         It’s time consuming i.e. expensive

·         It might demand expertise an individual tester might not have, so would pull in another person

·         On reproducing the failure, there’s a good chance that behavior will be different, maybe even in a way that isn’t obvious, so now you’ve complicated the issue and/or lost information

·         Automated analysis (a Metaautomation topic to come later) wouldn’t be useful in this hypothetical case because it’s already established that important information is missing

There will always be some un-anticipated or unprecedented failures that fail the tests and require detailed follow-up. The detailed follow-up becomes affordable if most failures don’t require a detailed, time-consuming follow-up. So, let me optimistically say that I think if the correct information is included with the failure, 90% of failures won’t require a manual repro or a debugging session.

How to get there?

Logging helps, but it’s not sufficient. Important context information for the failure can be logged successfully. If the steps executed in pursuit of the use case are logged, then all but the last one before the failure represent the happy path for the automated test, and they could be useful. Log entries must also be constructed well (a timestamp, and at least one level of context for execution) to be useful. Consistency in labels and structures is very important for readability and for automated failure analysis.

Exceptions are key, because done well, they provide a single object to the test code or framework that contains all of the information you care to throw into it. They’re thrown by the framework at failure time, or a custom exception is created and thrown at the time a failure/exceptional condition is encountered. Custom exceptions are typically written in product code, and can be written for test code as well.

Using the same language or framework for the test automation as you use for the product is important, because if there’s a change, information will be lost across the boundary and/or you don’t have the same power in handling, wrapping and throwing exceptions. Java will work for this, but I prefer .Net Framework with C# (or any of the other .Net language implementations e.g. VB.Net which compile to the same IL anyway).

Think about a failure condition being measured (or just encountered e.g. in the case of a network exception) and rolling up the stack. At some point, a stack trace, exception type and message will be reported and recorded. Maybe that’s enough, but information could be lost too, for example if there’s some state that pertains to the failure that just got lost as it fell off the stack. In that case, you could improve failure reporting by:

1.       Catch the exception. catch (System.Exception) might work here, depending on the needs of the calling code …

2.       Create a string with context information for the error, in the context of the scope of the try block corresponding to the catch of step 1

3.       Create an instance of a custom exception type, where the type communicates the current context/location e.g. CompanyAbcModuleDefTestException

4.       For the constructor of the custom exception, pass the exception of step 1 (to become the InnerException of the custom exception) and the string of step 2

5.       Throw the custom exception instance

6.       Repeat steps 1-5 in as many different contexts as helps nail down the root cause of the failure, hence the action item to respond to the test failure
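The steps above can be sketched as follows. This is an approximation in Python, where `raise ... from ...` sets `__cause__`, which plays roughly the role of .Net's InnerException; `CheckoutTestError`, `charge_card`, and `checkout` are hypothetical names.

```python
class CheckoutTestError(Exception):
    """Custom type whose name communicates where the failure was caught."""


def charge_card(amount):
    # Stand-in for a call into the product or an external dependency.
    raise TimeoutError("payment gateway did not respond")


def checkout(order_id, amount):
    try:
        charge_card(amount)
    except Exception as error:                                   # step 1: catch
        context = "checkout failed for order %s, amount %s" % (order_id, amount)  # step 2
        raise CheckoutTestError(context) from error              # steps 3 to 5


def describe(error):
    """Walk the chain from the outermost exception to the root cause."""
    parts = []
    while error is not None:
        parts.append("%s: %s" % (type(error).__name__, error))
        error = error.__cause__
    return parts


try:
    checkout(42, 10)
except CheckoutTestError as failure:
    chain = describe(failure)
```

The order id and amount would have fallen off the stack with the bare `TimeoutError`; wrapping keeps them, and the original exception stays attached for whoever follows up.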

If this sounds like a lot of overhead, remember what it’s like to struggle to reproduce an error at 1AM just prior to code freeze for ship… and also this is a very nice kind of functional test documentation, to benefit all the different testers who might need to maintain the code for other reasons.

Now, when the test code does final reporting of the failure, all that information is available in an object to be serialized in e.g. XML or HTML to create a highly stable stream, or even objects that lend themselves very nicely both to human readability and (for the future) automated test failure analysis. Clarity into the test code base or harness helps at refactor or extension time, too.
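A small sketch of that final reporting step, assuming Python's standard library and a hypothetical `failure_to_xml` helper; the element and attribute names are invented for illustration.

```python
import traceback
import xml.etree.ElementTree as ET


def failure_to_xml(error):
    """Serialize an exception and its chain of causes, outermost first."""
    root = ET.Element("failure")
    while error is not None:
        node = ET.SubElement(root, "exception", {"type": type(error).__name__})
        ET.SubElement(node, "message").text = str(error)
        frames = traceback.extract_tb(error.__traceback__)
        if frames:
            last = frames[-1]  # innermost recorded frame: where it was raised
            ET.SubElement(node, "raisedAt",
                          {"function": last.name, "line": str(last.lineno)})
        error = error.__cause__
    return ET.tostring(root, encoding="unicode")
```

The same XML can be rendered for a human reader or fed to an automated comparison later, since each exception in the chain keeps its type, message, and location as separate fields.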

Monday, September 19, 2011

Fail soon, fail fast

This and the next 6 or so posts will focus on metaautomation, which is defined by the  post.

The most basic part of metaautomation is to maximize the chances that an automation failure is actionable, without necessarily even running the automated test again. The post linked above lists some example cases of this.

There will always be cases where more analysis requires loading up test code and SUT, but that’s time consuming (expensive!) and metaautomation wouldn’t help much either, except possibly for any automated triage (which I’ll address in a week).

At some point in a failed test, a failure condition occurs. The sooner the failure condition can be measured the better, because (assuming a single thread of execution) some information at that point that is important for analysis might be lost if the thread rolls up the stack with an exception or execution continues in vain.

Also, the faster the test ends after that point, the better, because

a)      (depending on the SUT) any further steps probably won’t add any useful information to the artifacts

b)       Resources to continue the test – including time - probably aren’t free and might be limited
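A minimal sketch of the idea, with hypothetical names (`StepFailed`, `run_steps`): each step is checked as soon as it runs, the state is captured at the moment of failure, and the remaining steps are skipped.

```python
class StepFailed(Exception):
    """Raised at the first failed check, carrying a snapshot of the state."""

    def __init__(self, step, state):
        super().__init__("step %r failed" % step)
        self.step = step
        self.state = dict(state)  # copy now, before anything else changes it


def run_steps(steps, state):
    """steps is a list of (name, action, check) tuples run in order."""
    for name, action, check in steps:
        action(state)
        if not check(state):
            # Fail soon: measure right after the step.
            # Fail fast: nothing after this point runs.
            raise StepFailed(name, state)
    return state
```

Copying the state inside the exception is the point of "fail soon": the information is captured before the stack unwinds and before any later step can overwrite it.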

Tomorrow, I’ll write about strategies to maximize relevant information, hence minimize the chances someone in test will have to load everything up with source code and step through to see what’s going on.

Friday, September 16, 2011

The Value of Automation

There’s a nice paper by Brian Marick called “When Should a Test Be Automated?”

But, I emphatically disagree with one of the assertions used in the paper: “Bugs are the value of automation…” When creating and running some piece of automation the first time, bugs are part, or maybe even most, of the value of automation. When the automation is running in the lab on a regular basis, bugs could show up and those are very valuable to the team because they might mean that some change broke a part of the product. It’s very important to find such issues quickly.

IMO the greatest value of automation is to measure the quality of the product in a reliable, stable way, often and quickly. Without some good measure of the quality of the product, you can’t (or shouldn’t) ship.

Code coverage is one view of quality. Scenarios run is another. Perf and load testing are others. Automation helps with all of these.

On Monday, I’ll start with a daily series of posts on metaautomation.

Thursday, September 15, 2011

Manual testing will never go away

What do you do for a living? Test?

When people ask “So, what do you do?” is that what you tell them? It’s the generally used professional term – “Test” – but someone outside the profession probably has a greatly simplified view of what that means, which means it sounds like a very simple job. The inquirer might think “oh, that sounds easy, I can do that” but be too polite to say anything.

I have a bit of “dev envy” because everybody and their boss knows what developers do – they make computers do stuff. That sounds intimidating to someone who doesn’t work with computers, as it should – devs have a very challenging job. Devs tend to be smaa-art people, and others know it!

Testers OTOH, they just try a few things and see if it works, right? How hard can that be?

Since you’ve read this far, I presume you know that testers have very challenging jobs too. Sometimes I think testers have a job that requires even more smarts than devs – because good ones have to know almost as much as the devs do, plus they keep up on many different aspects of product development and quality, plus they tend at times to be much more interrupt-driven than devs, plus they have to interface with team members in many different roles as well as represent the customers’ interests. I’ve done both dev and test, and IMO test is more challenging.

At a large software company in Redmond WA, testers are expected to automate all of their tests (at least, that was my experience), which becomes even more challenging after running automated tests for a while because they tend to fail. James Bach offers an interesting perspective on this phenomenon here. For example, Bach notes that automated tests don’t catch bugs by default; manual testers do.

Adam Yuret (his blog is here) reminded me of a great perspective on the process of automating tests: if testers define a test “case” exercising the product, a good tester executing the test case will find bugs by default. People are generally smart and observant, and they’ll spot things that are out of whack.

Common testing wisdom says that if that test case is automated, the benefit of the manual test is amplified because a) it’s run more often b) more quickly and c) more reliably. Unfortunately, that’s not generally the case; as Yuret points out, what a manual test generates in terms of a measurable result for product quality is greater than that same test run as an automated test, especially for graphical user interface (GUI) tests. It gets worse than that for automated tests: automated tests have to measure a result somehow, and by default they don’t. Your automated test has zero or more “checks” of product behavior.

If the automated test has zero checks, the result of the automated test will always be PASS no matter what happens. The value of the automated test case is now negative, because the product team is operating on information that the test case is being “covered” by the test part of the team, whereas actually it’s not.

Bach notes an example of this in his article (linked above): “The dBase team at Borland once discovered that about 3,000 tests in their suite were hard-coded to report success…” I’ve seen even worse: I’ve reviewed an automated test to discover that the result was always success, despite the fact that the product wasn’t driven to do anything at all.

The challenge is not just to make sure that the checks of product behavior are in there – i.e. the test fails if the product does not behave as expected – but that in case of failure, the automated test result is descriptive. A Boolean result of FAIL with no other information is not very useful other than as an indication that the test team has work to do on that automated test.
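To make the difference concrete, here is a small sketch with a deliberately buggy, hypothetical `apply_discount` function standing in for the product.

```python
def apply_discount(price, percent):
    """Hypothetical product code with a bug: percent is treated as an amount."""
    return price - percent


def run_without_checks():
    """Drives the product but checks nothing, so it can only ever pass."""
    apply_discount(200, 10)
    return "PASS"


def run_with_descriptive_check():
    """Checks the behavior and says what went wrong when it fails."""
    expected = 180  # 10% off 200
    actual = apply_discount(200, 10)
    if actual != expected:
        return "FAIL: apply_discount(200, 10) returned %s, expected %s" % (actual, expected)
    return "PASS"
```

The first test reports PASS against a broken product. The second fails, and its message already says which call, which inputs, and which values were involved, so nobody has to rerun it under a debugger to find out.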

The value of manual test never goes away, at any point in the software development lifecycle (SDLC). Automation can be a powerful tool, but it’s not at all easy to do well. Testers have a very difficult and challenging job.

Next time (tomorrow), I’ll post about the value of automation, and then next week, move into metaautomation.

Wednesday, September 14, 2011


How does one define “quality?”

I like the definition that is used by Adam Goucher, Jerry Weinberg and others: “Quality is value to some person that matters.” OTOH, this needs to be more detailed to be meaningful to complex or high-impact software projects.

Given that the quality bar of software is getting higher (OK, except maybe for some social media applications and games about exploding birds) and our reliance on IT is increasing, we really need to create a reasonably complete picture of the product quality before releasing it to the end-user. Actually, we must do it even earlier than that: to manage risk, we need to have some handle on quality while developing the product.

Quality is defined with input from the customer (“I want the software to do this…”) but it’s also complex and difficult, and it wouldn’t work to lean on the customer for all aspects of what constitutes quality, especially non-functional quality. So, the team has to define quality in all its gory detail: the product owner, developers, and test / QA people collaborate and flesh out a quality document or documents. The definition of quality can’t be 100% complete; it will (or should!) evolve before product release as the product develops and market context changes, and it must include things like scale, perf, failure scenarios, security issues, discoverability and other things which the end-user can’t be bothered with.

Notice that “some person that matters” is still the end-user customer in the end, but the software development team members are now proxies for the end-user customer. In agile methodology, the product owner (PO) is formally the “voice of the customer” but really, all team members take on variations of that role in different ways. Robust quality requires leadership from the PO and it requires contributions from the other roles as well.

Test isn’t just about finding bugs. It has to measure quality through the whole process, and bugs are just part of the total quality picture. The rest of the team depends on this. Shipping without a complete picture of quality gives a high risk of the end-user finding a high-impact bug (i.e. a bug that the team would never have let out the door, had they known about it), which could be damaging to the business.

For a high-impact application – say, medical or financial software – if the measurement of quality at any point in the SDLC isn’t reasonably complete, the extent of the incompleteness represents unacceptable risk and is similar to a lack of quality. The risk of unmeasured quality can increase during the SDLC due to runtime dependencies or design dependencies.

A quick takeaway is: test early and often, and measure Test productivity with breadth and detail of the quality picture, not just by number of bugs found or fixed!

Tomorrow: Test
Friday: starts a series of posts on metaautomation

Tuesday, September 13, 2011

Intro to MetaAutomation

It’s the MetaAutomation Blog!
The first thing you might be wondering is: what’s metaautomation?

Automation of course is getting the software product or system under test (SUT) to do stuff without a person driving the SUT directly, e.g. while the team sleeps or parties, and with zero or more checks of product behavior with Boolean results that get reported somehow. I’m thinking of E2E or scenario tests, not e.g. unit, perf or stress tests. A failure in basic automation is often useful only as an indicator that the test team has more work to do.

Metaautomation is taking it to the next level: basically making automation failures actionable, up to and including automated triage.

The second thing you might be wondering: why is this important to me, the SDET and advocate for product quality?

In a typical software creation scenario there are two prominent metrics in use by the test team: number of test cases and number of test cases automated. The test automaters are under some pressure to deliver as many automated tests as they can, but within a week or a day, some or most automated steps begin to fail. The report shows the flipped bits – the automated test cases that are now failing – but doesn’t give much more info than that, so at some point the test automater has to stop automating new test cases and go back and try to fix the failures. This can take as long as the original automation for that test effort, so sometimes it isn’t done at all – the failed automation continues failing as before, and has little, no or even negative value to any useful measure of product quality.

When you’re the owner and you get word of an automation failure, a variety of questions might come to mind:

Is the root cause a product failure, an automation failure, a dependency failure, a race condition beyond team ownership, or something even more complex?

Has this root cause failure happened before?

Would it happen again? Is it even reproducible, reliably or at all?

If it looks like a product failure, does it require debugging the product, or just the test code, to get more information before you pass the issue on as a bug?

If the root cause is or might be beyond team control (but it’s not an issue for the product end-user), will you just try again to see if it passes the 2nd time?

Metaautomation addresses all of these issues. Note that, unlike automation, metaautomation depends on knowledge of product implementation, e.g. the source code.
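As a rough illustration of the “has this happened before?” question above, a failure could be reduced to a stable fingerprint and looked up in a record of earlier failures. Everything in this sketch is an assumption made for illustration: the function names, the choice of fields, and the idea of using a hash at all.

```python
import hashlib


def failure_fingerprint(test_name, step, error_type):
    """Build a stable ID from where and how a test failed (not when)."""
    key = "|".join([test_name, step, error_type])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def seen_before(fingerprint, known_failures):
    """Return the earlier record for this fingerprint, or None if it is new."""
    return known_failures.get(fingerprint)
```

Here `known_failures` is just a dictionary keyed by fingerprint; a real team would need to decide where such records live and which details belong in the key.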

You could think of it like this: the relationship of metaautomation to automation is analogous to the relationship of automation to manual test. Manual test is one of the first things you do to measure product quality, and the importance of manual test never goes away, but the power of the team in maintaining and promoting product quality is significantly enhanced with good product automation (including at least minimal reporting of the automation results).

Manual test is costly in terms of people and time, and it’s never really reliable because people are human.

Automation addresses the time cost (by running automation), the boredom (by having a computer, lab, or cloud processing do the work for you) and some other issues as well (e.g. frequency of the measurement of the product, consistency, reliability…). That’s why automation is so big these days. Everybody in software knows the value, so everybody’s doin’ it.

Unfortunately, when automation fails, manual analysis can be very costly in terms of time. It might mean loading the test code, and the product, hooking up or simulating external dependencies and doing a debugging session. A complex or difficult bug can take hours or days of follow-up. Sometimes, in consideration of these costs, it’s not even done at all.

Metaautomation addresses these costs and, if done well, can increase team efficiency and make product quality stronger.

If you’re doing good automation, you’re probably already doing the most basic kind of metaautomation: logging, as in having your test code and/or product stream at least hard-coded messages to a text-based file. This is a kind of metaautomation because it’s orthogonal to having the product do stuff and to test runtime verifications. It’s the first piece of post-run test analysis, and it can be used to feed other pieces of metaautomation that I’ll be getting into with future posts, although there are better sources of information (which will be addressed in future posts as well).
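For the basic logging described above, a minimal sketch using Python’s standard `logging` module might look like this. The logger name, the message format and the `make_test_logger` helper are my own assumptions, not a prescribed setup.

```python
import logging


def make_test_logger(log_path):
    """Return a logger that appends timestamped messages to a text file."""
    logger = logging.getLogger("scenario_test")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Test code would then call something like `log.info("step 1: signed in")` as it drives the product, leaving a text file that can be read after the run.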

Metaautomation isn’t the answer to all scenarios. There will still be some manual analysis, just like there will still be some manual testing. But metaautomation speeds analysis by reducing the need to examine the product or test code (especially by avoiding the need to debug through the code), and it can speed triage of issues as a direct result (depending on team process). Automated automated (yep, 2nd-order automated) test re-runs and automated test failure analysis are other pieces of metaautomation that can speed human-based triage further, or even bypass it.
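An automated re-run can be sketched in a few lines. This is a simplified illustration, assuming a test is a plain function that raises an exception on failure; `run_with_retry` is a hypothetical helper, not an existing API.

```python
def run_with_retry(test_fn, max_attempts=2):
    """Run test_fn, re-running on failure; return the outcome of every attempt."""
    outcomes = []
    for attempt in range(1, max_attempts + 1):
        try:
            test_fn()
            outcomes.append((attempt, "PASS", ""))
            break  # stop re-running once the test passes
        except Exception as exc:
            # Keep the failure detail so a later pass does not hide it.
            outcomes.append((attempt, "FAIL", f"{type(exc).__name__}: {exc}"))
    return outcomes
```

Recording every attempt, rather than only the final result, is a deliberate choice in this sketch: a test that fails once and then passes is still telling you something about the product or its dependencies.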

The third thing you’re probably wondering is: why do we need a new word for this? Are you kidding me - six @#! syllables?

Automation is summarized nicely here:

Note that the article briefly discusses reporting, but not analysis. Most of the value of automation comes after the automation is created: that’s where metaautomation can help a team shine!

The plan for this blog:

I plan to post every day or so for as long as it takes to (re)define:

· Quality

· Test

· Metaautomation, the what and the how (this will take roughly a dozen posts)

· Values in automation

· … I have other ideas, and the blog should have momentum by this time …

Following posts will tend to be shorter and might be less dense.

Oh, yeah, note too:
This blog is not associated in any way with the industrial and marine company, although I do like boats, especially sea kayaks.