Monday, October 17, 2011

Automate Business Logic First

At PNSQC 2011 last week, I met some very interesting and smart characters. One of them was Douglas Hoffman, current president of the Association for Software Testing (link: http://www.softwarequalitymethods.com/).

One of Douglas’ observations is that API-level E2E testing is 10 times as fast as graphical user-interface (GUI) level testing. He knows this from a very impressive array of industry experience.

Alan Page writes: “For 95% of all software applications, automating the GUI is a waste of time.” http://blogs.msdn.com//b/alanpa/archive/2008/09/18/gui-schmooey.aspx

I agree. For a software product in production or a project under development, assuming that there is some degree of separation between the GUI and the business logic it depends on, it’s quicker and more effective to automate the business logic, not the GUI. Some would call this automating the application programming interface (API).

I currently have the privilege of working on a team that does this right: the GUI layer is thin in terms of logic. The business happens below the API.

Here are some things that make automation that includes the GUI expensive:

·         GUI design can change often, because it’s what is displayed to end users. GUI design is complex and laden with personal preferences. Whenever there’s a change in the GUI, any automation that depends on it must be fixed.

·         GUIs must be localized, and this usually means much more than just changing the displayed strings, introducing an additional level of instability to GUI automation.

·         GUI automation is rife with race conditions, due to the asynchronous nature of rendering and display.

·         Brian Marick: “Graphical user interfaces are notoriously unstable.” http://www.stickyminds.com/getfile.asp?ot=XML&id=2010&fn=XDD2010filelistfilename1%2Edoc

·         Automating the GUI takes many more dependencies on the product than automating the API, because the GUI is much more complex than the API. A result of this is that the GUI automation is much less atomic than API automation (see an earlier post http://metaautomation.blogspot.com/2011/09/atomic-tests.html), therefore riskier.

·         GUI automation requires an entire display mechanism, at least in software, even if the window is hidden. This involves significant overhead.

I’ve never seen stable GUI automation. What I’ve seen instead is that it’s expensive to keep going with GUI automation, if the team cares about keeping it running.

I’ve seen this many times: automation which includes the GUI fails, and it’s understood – usually correctly, but not always – that it’s just a GUI problem causing instability. This can have the effect of hiding deeper product problems, which can introduce a lot of risk to a project.

Here are some reasons to automate the business logic instead:

·         API automation is simpler and more transparent to write

·         API automation is much more stable

·         API automation runs 10 times as fast as GUI automation (from Douglas Hoffman again) (although, in my experience the difference is even larger)

·         There is no display overhead

·         With API automation, the compiler is your friend (assuming you’re using a strongly-typed language, which you should be. See this post http://metaautomation.blogspot.com/2011/10/patterns-and-antipatterns-in-automation_03.html)

·         Failures in API automation are much more likely to be actionable and important …

This last point is huge: If API automation fails for some reason other than for dependency failures, timeouts or race conditions (e.g. as mentioned here http://metaautomation.blogspot.com/2011/09/intro-to-metaautomation.html ) then it’s due to some change in business layers of the product and this instantly becomes a quality issue. It might be due to some intentional design change that the developers are making without first telling Test, but just as often it’s a significant quality issue that is worth filing a bug – and in that case, the team just found a quality failure essentially instantly, so it can be fixed very quickly and downstream risk is minimized. If it’s your automation that found the bug, you’re a hero!

Here’s another reason to focus on the business logic:

I’ve heard it said that it’s better to automate the GUI, because then you’re automating the whole product. At a shallow level, this has the ring of truth, but consider this perspective instead: suppose your team focuses on automating business logic, and therefore gets much more quality information, quicker regression, better coverage etc. Then a GUI bug is found, later in the software development life cycle (SDLC) than it otherwise would have been, but no worries: the risk of deferring a GUI bug fix is very low, because if the application is at all well-designed, none of the business logic depends on the GUI flow.

Manual test will never go away, and people are smart and able to spot all sorts of GUI issues that automation can’t catch without huge investment in automated GUI checks. Therefore, the GUI bugs are likely to be found by manual testers anyway, and they’re still relatively low risk because the rest of the product doesn’t depend on the GUI.

This is why I’m happy to focus on business logic automation.

Wednesday, October 12, 2011

No, you can't outsource quality (detour from antipatterns topic)

Due to illness and travel and the desire to put more attention into this, I'm not ready to continue the series of posts on antipatterns at the moment.

Twitter (140 characters?) and my available hardware didn't allow posting at the time, plus I was paying attention rather than multi-tasking, so here's my discussion two days after the fact.

It was satisfying to skewer Julian Harty in the auditorium this morning, though, if a little bit scary (... do people really believe what he's talking about?).

Harty's theme was "The Death of Testing." To be fair, I think the title and theme may have been influenced by simple business considerations of PNSQC, the conference at which this took place: the organizers want to attract people who do software quality professionally by scaring them into fearing for their jobs. If so, it worked, and attendance was high.

I want to give due credit to Harty's presentation skills; he's very good at engaging the audience.

The main thesis of his talk seemed sincere. He was talking about Google practices, and honestly qualified his comments by pointing out that he left Google in June of last year. (hmm, wonder how that happened...)

The idea is that "testing" in the broad sense of measuring and monitoring the overall quality of the product can be outsourced for free. Google does this with the "Give us feedback" functionality on their sites. Each of the many, many end users of Google's products has the opportunity to tell somebody on the appropriate internal team that there's some problem, and to communicate with some individual at Google about the process of fixing it.

This works rarely, but often enough given that there are so many Google users.

Harty's thesis: this is free for Google, the quality is better because there are more eyeballs, and Google appears to respect customers and strengthen loyalty. Google has successfully outsourced quality.

... Yeah?  Copious steaming bovine excrement.

If I find a good bug this way, and go through the Google-prescribed process of getting it fixed, I could receive a cash prize of a few grand (according to Harty).

Now, suppose this is a security flaw. (There will always, always be security flaws, known or unknown.) Suppose this involves personally identifiable information (PII) i.e. most of Google functionality. Suppose I'm the first to find and characterize it. Suppose it's exploitable, e.g. I can use it to see the PII of anybody I want. Suppose I'm not the most ethical person...

I have a choice: do I report it to Google as they would like me to do, and chance getting a few grand as a reward? Or, do I report it to blackhats, and try to get $ a few million?

Of course I'd go to the blackhats! When I do this, all users of Google are exposed to the risk of identity theft. Identity theft is the worst thing that can happen to you on the internet.

Meanwhile, Google thinks that it has successfully outsourced product quality! Great deal, huh? The stockholders love it. Conference speakers talking about the latest trends LOVE it. But the end result is identity theft for large numbers of Google customers.

Outsourcing quality can't possibly work for a company in the long run.

Testing is not dead.

Monday, October 3, 2011

Patterns and antipatterns in automation (part 2 of 3-4 parts)

For a product with significant impact, a pattern is writing test automation in the same language as the product (e.g. Java for a Java product) or the same framework as the product (e.g. .Net for a .Net product). An antipattern is scripting the automation in some other, lighter-weight language.

I know many perceive that a scripting language, e.g. Python, Perl or JavaScript, is better suited to writing test automation because it may be quicker. But, with a strongly-typed compiled language, the compiler is your friend, finding all sorts of errors early. With a decent IDE, intellisense is your friend as well, hinting at what you can do and backing that up with the compiler.

If test automation is written in a different language than the product, then testers are distanced from the product code enough that they don’t have anything to say about it; it’s not their domain. But product code is usually a good example to follow when writing good, robust, extensible test code. Today at work I filed two significant bugs against product code, which I wouldn’t have been able to do if I weren’t continually up to speed with the languages used in the product (a grammar of XML, and C#).

Another reason for having the test code in the same language (or framework) as the product is that you know that no information is lost with remote calls or exceptions rolling up the stack: the stack is the same through product and test code, and the test code can handle product exceptions if that’s called for in the test.
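For example, here is a minimal C# sketch of a test that catches a product exception type directly; the product class, method and exception type (WidgetAccountService, CreateAccount, WidgetValidationException) are hypothetical names for illustration only.

    using System;

    // Hypothetical product exception type.
    public class WidgetValidationException : Exception
    {
        public WidgetValidationException(string message) : base(message) { }
    }

    // Hypothetical product class under test.
    public static class WidgetAccountService
    {
        public static void CreateAccount(string name)
        {
            if (string.IsNullOrEmpty(name))
                throw new WidgetValidationException("Account name must not be empty.");
            // ... real account-creation logic would go here ...
        }
    }

    public static class WidgetAccountTests
    {
        // Because the test is written in the same language/framework as the product,
        // the product's own exception type, message and stack trace are available
        // directly to the test code; nothing is lost crossing a scripting boundary.
        public static void CreateAccount_EmptyName_ThrowsValidationException()
        {
            try
            {
                WidgetAccountService.CreateAccount(string.Empty);
                throw new Exception("FAIL: expected WidgetValidationException was not thrown.");
            }
            catch (WidgetValidationException ex)
            {
                Console.WriteLine("PASS: product rejected the empty name: " + ex.Message);
            }
        }
    }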

The barrier to entry with C# is actually quite low; a dev can help, and testers don’t need to get fancy with lambda expressions, delegates and partial class declarations. To create robust, extensible, OO test code, though, a powerful language is needed.

I’ve seen many problems with interfacing a script language or GUI test framework with a compiled OO language: awkwardness of error handling or verification, dropped information, limited design choices…

What do you think? Do you have a favorite language or framework for test automation, that’s different than the product language?

Topic to be continued tomorrow …

Patterns and antipatterns in automation (part 1 of 2)

Patterns are common ways of solving frequent problems in software.

From the introduction to the famous “Gang of Four” patterns book “Design Patterns” (Gamma, Helm, Johnson, Vlissides): “Design Patterns make it easier to reuse successful designs and architectures.”

In Test, patterns are used to follow well-established best practices. An antipattern is a pattern of action “… that may be commonly used but is ineffective and/or counterproductive in practice” (http://en.wikipedia.org/wiki/Anti-pattern).

For example, one pattern of action is to automate as many test cases as possible, as quickly as possible, because that’s the performance metric that management is using, and we all want to look productive for the business. When the test cases start failing, ignore them and continue automating test cases, because that’s the metric that management is using.

This is actually an antipattern because it’s counterproductive practice. The business value of automating the product (or parts of the product) is to exercise and regress behavior (make sure it’s still working per design) for parts of the product on a regular schedule. If that can’t be done (because the automation broke) the value of those cases goes away or even worse: the quality team proceeds as if quality was being measured for that part of the product, when it’s not because those relatively high-priority test cases are failing.

There’s a closely-associated antipattern which I’ve addressed in practice for years but for which I credit my friend Adam Yuret (his blog is here http://contextdrivenagility.com/) for helping crystallize: test cases are mostly written as if they were manual test cases, and they have value when a human tester runs through them and observes the results with human perception and intelligence. If such a manual test case is automated, the verifications (aka “checks”) are typically many fewer than what a human might perceive, and they might even be zero, i.e. nothing is getting verified and the value of running that automated test is also zero.

Adam maintains that what was a (manual) test case is no longer a test case because the entire flow of the user experience (UX) is no longer being measured; there are zero or more checks that are being measured, but they must explicitly be written, coded, and tested by the person doing the test automation. By default, they’re not.

I disagree with the “test case is no longer a test case” judgment, but this does point out an important truth about quality practice: the value of manual testing never goes away (unless maybe if the product has no graphical user-interface aka GUI).

The antipattern here shows up in two common and related team management practices: the idea that automating a GUI test case means that a certain part of the product never has to be visited manually ever again, or that the practice of manual testing from the GUI (and the manual testers) can simply be automated away.

I’ll continue this post tomorrow with more patterns vs. antipatterns…

Saturday, October 1, 2011

Duplication of information is anathema to quality

I drive an electric car: a Nissan Leaf. It's fun, reliable, zippy, quiet and very comfortable, there's no stink and I never buy gas. I have few complaints, but here's one:

When I'm driving, I often want to know the time. I can look above the steering wheel, to see a digital clock, or at the navigation system, to see another digital clock. Unfortunately, these times are almost never the same! They disagree by a minute or more. So, what time is it?


The software development life cycle (SDLC) gets very complicated and involves many engineer-hours of work, plus other investments. There's a lot of information involved.

Yesterday I attended an excellent seminar by Jim Benson (personal blog here http://ourfounder.typepad.com/ ) about Personal Kanban (Jim's book of that title is e.g. here http://www.amazon.com/Personal-Kanban-Mapping-Work-Navigating/dp/1453802266/ref=as_li_ss_mfw) and it was very enjoyable, but I was bothered by the reliance on sticky notes for designing and communicating information. Those sticky notes show up across all sorts of agile methodologies. I can see the advantages: it's very social to be collaborative with your team in a room with the sticky notes, and the muscle memory and physicality of moving the stickies around helps communicate and absorb information.

But, I asked (OK, pestered, but Jim was a good sport about it) several questions along these lines: do we really need the sticky notes? There's cost and risk in relying on people to manually translate information from plans or some virtual document to the stickies on the board, then after the meeting with the stickies or at some other time (depending on how the team does it) this has to be carried over to the IT workspaces or documents. There are many potential points of failure, and many potential points of information duplication.

The problem with duplication of information is the same with the two clocks in my car: they can easily get out of sync, and then which do you believe? Information can get lost too, if I reorganize the stickies thinking that someone has already done the maintenance step of writing the information to the appropriate document, where actually that hasn't happened for some reason.

I predict that in a few years, the stickies will be gone because there will be a sufficient hardware and software solution to solve all of the problems that the stickies do, without the cost and risk. Team communication around work items will be more robust and efficient. There won't be any stickies to flutter to the floor (as a few did, during Jim's talk).



Worse than the clocks in my car, and even worse than the stickies, are superfluous docs that get out of sync. By "superfluous" I mean people ignore them, so they get out of sync and/or out of date, so they can cause confusion; for example, a doc that lists test cases that also live in a database. The test cases are probably going to change, there's a very good chance that the doc will get out of sync, and there's also a good chance that someone will rely on the doc when it represents incorrect information.

Better to limit a document to information that doesn't exist elsewhere, and link docs to other docs and resources (databases, source code repositories) so it's clear where to get and where to update information.


Even worse than all of the above: duplicated logic in code, or duplicated code. See posts here http://metaautomation.blogspot.com/2011/09/beware-duplicated-code.html and here http://metaautomation.blogspot.com/2011/09/test-code-quality-readability-and.html .

Use team best practices when writing, and don't be afraid to clean up unused code! Duplicated logic can haunt the team and the product quality.

There are times when information duplication is by design and there's a robust system to keep it so, e.g. databases that are de-normalized or replicated, or diagrams that visually represent a coded system... the latter is OK so long as the diagrams are frequently used and updated by stakeholders on the team.

Beware the hazard of things getting out of sync. If the clocks disagree, they both look wrong!

Friday, September 30, 2011

Test code quality: readability and maintainability

Next week I’ll get back into metaautomation, but for now, more on test code quality:

One of those great big life lessons I’ve picked up: in order to understand some system really well, you have to be able to teach it to someone else and have that person understand.

In terms of code, there’s a very high probability that others with job descriptions similar to yours will be maintaining the code at some point in the future. So, nurture the habit of checking yourself: what would these future code-viewers think? Would they be able to understand my high-level coding decisions, if I’m not there to walk them through it?

If not, put in a code comment at a higher level of abstraction than the code itself. Brevity is good.

This is especially important to consider when writing methods that are intended as top-level, so no other code in that language in the system depends on them. For example, something that walks the system under test through some steps to achieve some atomic scenario probably includes a number of steps that might not make sense to another tester, even given that (s)he knows the product. Self-documenting symbol names are good, but a higher-level comment might add teaching value.
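As a quick sketch of what that can look like (all names below are hypothetical, and the TestSession class is just a stand-in for whatever drives the product):

    using System;

    // Minimal stand-in for a test session against the system under test
    // (hypothetical; a real one would drive the product's API).
    public class TestSession
    {
        public static TestSession SignIn(string user) { Console.WriteLine("sign in: " + user); return new TestSession(); }
        public void OpenAccountPage()           { Console.WriteLine("open account page"); }
        public void SelectExpiredSubscription() { Console.WriteLine("select expired subscription"); }
        public void Renew()                     { Console.WriteLine("renew"); }
        public void VerifyReceiptIssued()       { Console.WriteLine("verify receipt issued"); }
    }

    public static class SubscriptionScenarios
    {
        // Scenario: a returning customer signs in and renews an expired subscription.
        // The comment states the intent at a higher level of abstraction than the
        // steps below, so a colleague can follow the "why" without stepping
        // through every call.
        public static void RenewExpiredSubscription()
        {
            TestSession session = TestSession.SignIn("returning-user@example.com");
            session.OpenAccountPage();
            session.SelectExpiredSubscription();
            session.Renew();
            session.VerifyReceiptIssued();
        }
    }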


I like to look at this as “beer truck” self-management: If I get hit by a beer truck on the way home today, could one of my colleagues step in and continue with work that is shared with the team? (Don’t worry, it’s just hypothetical, and I don’t like beer anyway.)

A colleague has to – in theory – step in and understand what’s going on. Consider that when writing stuff, and your code (with comments, if necessary) will be better for the team.

Thursday, September 29, 2011

Beware Duplicated Code


Test code has a self-esteem problem: it thinks it’s there as a one-off, to be thrown away. It exercises the product once, and that’s good enough to check the code in. Sometimes, hardly any thought goes into test code quality.

For games about sliding letters around on a touch screen, this makes some sense in that quality issues are low-impact anyway. But personally, that’s not why I like IT: I want to help computers solve important, impactful problems for people. Doing quality well becomes a real challenge.

Everybody knows about object-oriented (OO) principles, but when documentation is weak and pressures of productivity are high, code gets copied around and common problems get re-implemented. When this happens, cost and risk go up: bugs and oracle changes have to be addressed in more than one place. Sometimes code even gets abandoned, so it might look like a work item but it’s really not.

Find a way to use OO principles to minimize code duplication and keep the code relevant, just as with product code.
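Here is a minimal sketch of that idea in C#, with hypothetical names; the shared setup lives in one place so a bug fix or oracle change happens exactly once:

    using System;

    // Shared steps live in a base class instead of being copied into every test.
    public abstract class WidgetTestBase
    {
        protected string AccountId { get; private set; }

        // Shared setup used by many tests -- one implementation, one place to fix.
        protected void CreateTestAccount()
        {
            AccountId = Guid.NewGuid().ToString("N");
            Console.WriteLine("created test account " + AccountId);
        }
    }

    public class TransferTests : WidgetTestBase
    {
        public void Transfer_SmallAmount_Succeeds()
        {
            CreateTestAccount();                  // reused, not re-implemented
            Console.WriteLine("exercise transfer for " + AccountId);
        }
    }

    public class StatementTests : WidgetTestBase
    {
        public void Statement_NewAccount_IsEmpty()
        {
            CreateTestAccount();                  // same shared step
            Console.WriteLine("verify empty statement for " + AccountId);
        }
    }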

Either code is part of the product quality, or it’s not. If code isn’t used, get rid of it (it can be recovered; that’s what source code control is for) or comment any unusual status of the code for the product (e.g. class/method/member isn’t used right now, but kept around for some specific feature/milestone). Stray code can be expensive, because it can create maintenance cost. Commented-out code should be unusual, and if it’s needed, the commenting-out needs to be commented as to why it’s needed.

If it’s not contributing, get it off the team :)

Duplicated code can easily become nearly-duplicated code, which rapidly creates a lot of cost and risk. True, you’re not shipping it, but keeping it readable and maintainable is very important. See my last post http://metaautomation.blogspot.com/2011/09/if-product-quality-is-important-test.html on the importance of this.

Do you have any fun (fun, in retrospect) bad-code stories?

Wednesday, September 28, 2011

If product quality is important, test code quality is too

I mean no disrespect to devs when I write this: test code can get even more complex than product code because of the larger number of dependencies to manage from the test perspective. If a product has N deployment modules and M external dependencies, then Test has N + M external dependencies! Changes in any one of these dependencies can create a maintenance work item in the test code. The product is being developed too, so many changes are going on, which sometimes cascade down the stack to test code. Even subtle changes in business rules might require changes to test oracles or simpler verification checks.

Readability and maintainability are therefore even more important to test code than they are to product code.

As with product code, test code should be self-documenting where possible, but that’s not sufficient: comments are good to have, where they help, at a level of abstraction higher than the code or pattern itself. This is important for product code of course, but even more so for test code, because of increased complexity, and because coding skills tend to vary widely across members of the test team who write code.

With test code, the executables won’t ever be far from the source code, so information hiding (e.g. not showing a stack trace to end-users) isn’t an issue. Performance isn’t nearly the issue that it is with product code for many applications (other than maybe perf test code). So, log details on what’s going on! Strongly-typed log entries e.g. using enumerated types or XML are precise enough for automated parsing. Exceptions are a great tool, and you can make them even better http://metaautomation.blogspot.com/2011/09/using-custom-exception-types.html.
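As a sketch of what strongly-typed logging can look like in C# (the enum values and XML shape here are made up for illustration):

    using System;
    using System.Xml.Linq;

    // The allowed step labels are an enum, so a typo becomes a compile error
    // instead of a string that an automated parser can't match.
    public enum TestStep
    {
        StartTransaction,
        AuthenticateUser,
        VerifyBalance
    }

    public static class TestLog
    {
        // Each entry is a small XML element: precise enough for automated parsing
        // after the run, still readable by a person.
        public static void Write(TestStep step, string detail)
        {
            var entry = new XElement("step",
                new XAttribute("name", step),
                new XAttribute("utc", DateTime.UtcNow.ToString("o")),
                detail);
            Console.WriteLine(entry);
        }
    }

    // Example use:
    //   TestLog.Write(TestStep.AuthenticateUser, "token accepted for test user 42");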

There are also lots of great tools out there to help with logging to one or more streams in a structured way. Consistency and structure make logged information more valuable and more extensible.

Do you have a favorite logging technique or tool?

Do you have a favorite strategy for test code?

Tuesday, September 27, 2011

Metaautomation: Is this a failure that we’re better off ignoring for the moment?


Yesterday’s post was about automated analysis of test failures. Today I post about accelerating business value from automated tests even more: moving past transient failures outside of team ownership, before even the results are reported.

In the age of information, systems are highly interconnected, and external dependencies are everywhere. Systems fail sometimes or just take too long for all sorts of reasons. Other systems (if done well) handle these hiccups gracefully.

Suppose the system that your team is working on is a high-impact system with external dependencies. It’s almost inevitable that there will be failures in the external systems that you really can’t do anything about (except maybe choose to make adjustments to your locally-defined timeouts). Will your system report on these failures to make a potential action item? In the interest of generating business value from the software quality team, should it?

IMO this is a great opportunity to skip past common non-actionable failures to increase the chances of finding something that really is actionable. People are smart, but they’re expensive to keep around, so we don’t want them to spend time on something that can be automated away.

Notice I didn’t say to toss out the artifacts from failures that can be quickly judged non-actionable. These artifacts could be useful later, and as noted earlier, storage space is cheap :)

Suppose the team’s system is doing a financial transaction. An external authentication system (call it EAS) does a robust authentication for us. Sometimes, this external system times out, e.g. on the big shopping days after Thanksgiving or before Christmas. The system that the team owns handles this for external customers as gracefully as can be expected, and suggests trying again.

From the point of view of automated tests running on a version of the team’s system in a lab (or maybe, a live system with some automated tests as well) the EAS fails maybe several times an hour. The quality team doesn’t want to spend any time on these failures if it can avoid it, and it also doesn’t want to slow down the rest of the automated testing just because of a simple timeout; the team wants to get right to the more interesting test results if it can.

What if the engine that schedules and runs your tests is capable of repeating a test, or of running sets of atomic tests (see this post http://metaautomation.blogspot.com/2011/09/atomic-tests.html) and reporting on the set? In that case, on a failed test, try again and see if the error is reproducible:

1.       Stop retry after N sequential failures that correlate with each other within C (and, report the reproduced error as a single test failure linked to the other failures in the set)

2.       Stop retry after M sequential failures (and, report on the failure set, at a lower priority than a reproduced error. All failures are linked together for potential follow-up)

3.       Stop retry on success (and, report the success, linked to any failed tries)

N could be configured as 2, and M might be 5. C could be 80%, or could be a set of correlation strengths depending on the correlation type, if multiple correlation types are used.
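Here is a minimal sketch of that retry policy in C#; the test runner delegate, the correlation function, and the default values for N, M and C are all assumptions for illustration:

    using System;
    using System.Collections.Generic;

    public static class RetryPolicy
    {
        public enum Outcome { Passed, ReproducedFailure, UnreproducedFailures }

        public static Outcome Run(
            Func<string> runTest,                    // returns null on pass, a failure artifact on fail
            Func<string, string, double> correlate,  // correlation strength between two failures, 0..1
            int n = 2,                               // stop when N sequential failures correlate
            int m = 5,                               // hard stop after M sequential failures
            double c = 0.8)                          // required correlation strength
        {
            var failures = new List<string>();

            while (failures.Count < m)
            {
                string failure = runTest();
                if (failure == null)
                    return Outcome.Passed;           // success: report it, linked to any failed tries

                failures.Add(failure);

                // Reproduced error: the last N failures all correlate within C.
                if (failures.Count >= n)
                {
                    bool reproduced = true;
                    for (int i = failures.Count - n; i < failures.Count - 1; i++)
                        if (correlate(failures[i], failures[i + 1]) < c) { reproduced = false; break; }
                    if (reproduced)
                        return Outcome.ReproducedFailure;  // report as one failure, linked to the set
                }
            }
            return Outcome.UnreproducedFailures;     // report the whole set, at a lower priority
        }
    }

The caller supplies the correlation function, so the same engine works whether the correlation is a simple string comparison of failure signatures or something richer.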

All artifacts get saved. All individual test tries get analyzed by the system, because here the engine that distributes and runs the tests is also looking at correlations between the tests.

One-off failures don’t require immediate attention from anybody, although it would be useful to go back and examine them in a batch to make sure that the failure pattern doesn’t show some larger problem that the team needs to take action on.

Failures that end up on testers’ plates are now fewer. People are much less likely to get bored and preoccupied with repeated test failures on external dependencies. The team is more productive now, and looks more responsive and productive to internal customers. Product quality is more effectively measured and monitored, and for a high-impact product, that’s where you want to be!



Can this be applied to your project?

Monday, September 26, 2011

Reality Check ...

On my last post: http://metaautomation.blogspot.com/2011/09/metaautomation-have-we-seen-this.html

Is this too dense? Would you rather eat 5 lbs of peanut butter in 60 seconds or less?

Please let me know what you think. I'm especially curious where people think this is applicable, or "I did something like this back at ABC Software ... "

Thanks! 

Metaautomation: Have we seen this failure before? If yes, what’s the actionable?

In previous posts I’ve talked about quality artifacts:

·         Detailed info around the failure

·         Strongly typed info using XML, or at least formatted with a system for group use (e.g. labels from enumerated types to give consistency and avoid typos)

·         Context info around the failure

With consistent datetime formatting (or, even better, detailed-schema valid XML) and sufficiently real-time logging, logs can easily be correlated across threads.

The information needs to be formatted and clear in intent. Error messages or other information in text needs to be clear; too verbose is much better than too succinct, because if the message is too brief, it doesn’t help.

Consistent logs can be interpreted by a new member of the team, with a little product knowledge. Really consistent logs can be compared in an automated way, post-test run. For metaautomation, that’s what we need! With logging APIs that use team-specific enumerated types and custom classes, the constraints are expressed in intellisense and by the compiler, and this frees people to exercise their human intelligence on what important information gets logged. (This may be the subject of a future post…)

Logs (XML, text stream, or something in between e.g. name-value pairs) are best stored somewhere on a WebDAV server (or, an FTP server) for later use, along with dumps, legacy logs, etc. A person or automated process can reference all the artifacts with read-only access for analysis after the test run!

If a tester does the analysis, it will be after the test run. Automated analysis is less risky after the test run, too, because otherwise the test can get delayed or just more fragile because of the additional points of failure from the analysis process.

I’ve implemented an automated analysis system where the artifacts are reduced to N properties per test run. The number and the names of properties and the order of evaluation are configurable. The methods of property value derivation are supplied with an executable module that only depends on the artifacts, and does internal error handling. In my case, this is a .Net assembly that exports public methods with names that map to the “Name” attribute in the XML template that determines the properties. This system is extensible by updating the XML template and the .Net assembly (and restarting the server process if necessary to pick up changes).
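A rough sketch of that idea in C# (not the original implementation; the template shape, method names and property names are assumed) might look like this:

    using System;
    using System.Reflection;
    using System.Xml.Linq;

    // An XML template names the properties to derive from the test artifacts;
    // each "Name" attribute maps to a public static method on a supplied type.
    public static class PropertyDerivers
    {
        public static string FailingStep(string artifacts) => "AuthenticateUser"; // stub deriver
        public static string MachineName(string artifacts) => Environment.MachineName;
    }

    public static class ArtifactReducer
    {
        public static XElement Reduce(string artifacts, XElement template, Type deriverType)
        {
            var result = new XElement("testRun");
            foreach (XElement property in template.Elements("property"))
            {
                string name = (string)property.Attribute("Name");
                MethodInfo method = deriverType.GetMethod(name, BindingFlags.Public | BindingFlags.Static);
                string value = (string)method.Invoke(null, new object[] { artifacts });
                result.Add(new XElement(name, value));
            }
            return result;   // one piece of valid XML per test run, ready for the analysis database
        }
    }

    // Example template:
    //   <properties>
    //     <property Name="FailingStep" />
    //     <property Name="MachineName" />
    //   </properties>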

Every test run that meets certain criteria (in my case, I did this for one type of test only) gets reduced to valid XML stored in a row in a table in the database of automated analysis. Any failure can be associated with an actionable, and after that association, the actionable can be correlated with a later test failure by the automated system! This answers the question “have we seen this failure before?” with a decimal or floating point value for correlation strength.

This takes human input, at least for a while: people have to enter the action item association with a test failure. No system that I know is smart enough to do this reliably. I quote Michael Bolton on this: http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing/  “Sapient activity by [a human] … is needed to determine the response.”

How the different properties are weighted can be varied depending on the test failure and the actionable. For example, if a specific test resolution action item is specific to the (virtual) machine that runs the test, then the identity of that virtual machine is significant to the correlation strength, otherwise it’s insignificant and should not be factored into the correlation strength.
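A minimal sketch of a weighted correlation between a new failure and a known, actionable failure, with hypothetical property names and weights:

    using System;
    using System.Collections.Generic;

    public static class FailureCorrelation
    {
        // Each property match contributes its weight; weights can be tuned per
        // actionable (e.g. weight the machine name only when the known fix is
        // machine-specific).
        public static double Strength(
            IDictionary<string, string> newFailure,
            IDictionary<string, string> knownFailure,
            IDictionary<string, double> weights)
        {
            double total = 0, matched = 0;
            foreach (KeyValuePair<string, double> w in weights)
            {
                total += w.Value;
                string a, b;
                if (newFailure.TryGetValue(w.Key, out a) &&
                    knownFailure.TryGetValue(w.Key, out b) &&
                    string.Equals(a, b, StringComparison.OrdinalIgnoreCase))
                {
                    matched += w.Value;
                }
            }
            return total == 0 ? 0 : matched / total;   // 0..1 correlation strength
        }
    }

    // Example weights: { "FailingStep": 0.6, "ExceptionType": 0.3, "MachineName": 0.1 }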

This system might seem a bit heavy for some applications, because after all… we’re “just” testing, right? I agree with James Bach and the Context-Driven school of testing (http://www.context-driven-testing.com/): “The value of any practice depends on its context.” And, if you’re creating a game about exploding birds, this approach isn’t appropriate. However, if you’re writing high-impact applications, e.g. involving financial transactions, I submit that it is appropriate for getting value out of your automated tests and making real, significant failures quickly actionable.

In practice, very few test failures in such a system would point to product failures; most failures point to other actionables, or can be ignored temporarily. (This is the subject of a future post.) But, in order to draw attention to the important failures and act on them quickly, your system (including all automated systems and the testers that work with them) needs to be able to tell the difference, even more quickly.

With automated tests running often, it’s very easy for humans to get overwhelmed with results, and they respond with general strategies in order to cope. People are smart, but they’re not very good at dealing with repetitive things. That’s why you need an automated system.

Friday, September 23, 2011

Atomic Tests


Atomic tests are good. Not the Dr. Strangelove kind of atomic tests, I mean automated tests.

Atomic tests test one thing about the product. An atomic test can’t be broken up into smaller tests and still test the same thing, hence the “atomic” quality of the test. Atoms can’t be broken up into smaller pieces.

Atomic tests also must be independent of each other. They can be run individually or in groups, in any order, so they can be distributed and run faster than if they were run in a sequential set. Failures can be known, reported and acted on sooner (by humans or by an automated process).

A test that tests two things, A and B, risks not testing B at all because the test of A failed. So, to maximize test coverage, test B separately from A if possible. If it’s not possible to test B without testing A first (e.g. an end-to-end or E2E test) then be sure to test A without testing B as well (in an independent atomic test) to make the artifacts and reporting more clear and to eliminate any possibility of latent dependency of the test of A on the test of B.

So, if an E2E test requires running through A before testing B (e.g. “A” is creating an account, and “B” is doing some operation with that account) then there are at least two tests:

1.       Test A: Create an account

2.       Test B: Create an account, then do some operation with that account
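In C#, those two atomic tests might look something like the sketch below; the account API is a hypothetical stand-in:

    using System;

    public static class AccountScenarios
    {
        // Test A: create an account -- nothing else.
        public static void CreateAccount_Succeeds()
        {
            string accountId = FakeAccountApi.CreateAccount("test-user");
            if (string.IsNullOrEmpty(accountId))
                throw new Exception("FAIL: account was not created.");
        }

        // Test B: create an account, then do one operation with it. The account
        // creation is repeated here so this test never depends on Test A having run.
        public static void Deposit_NewAccount_UpdatesBalance()
        {
            string accountId = FakeAccountApi.CreateAccount("test-user");
            decimal balance = FakeAccountApi.Deposit(accountId, 10m);
            if (balance != 10m)
                throw new Exception("FAIL: expected balance 10, got " + balance);
        }
    }

    // Minimal stand-in for the product API (hypothetical).
    public static class FakeAccountApi
    {
        public static string CreateAccount(string user) => Guid.NewGuid().ToString("N");
        public static decimal Deposit(string accountId, decimal amount) => amount;
    }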

Realizing business value with the automated tests includes some follow-up on the results, to ask questions like “have we seen this failure before?” (see this post http://metaautomation.blogspot.com/2011/09/persisting-artifacts-from-automated.html) The next favorite question of good testers is “can we do it again, to reproduce the same failure?” With atomic tests, failures are more reproducible because the tested actions are as simple as possible. Re-runs are quicker, too!

Adam Goucher writes about this here: http://adam.goucher.ca/?cat=3 “… a test case should only be measuring one, and only one thing.” and “Test cases should not be dependent on other test cases.” This isn’t a new idea in itself, but it’s a new thing to use the memorable term “atomic” as a simple rule for the team to follow.

… and if “atomic” isn’t memorable enough for you, stream the movie http://www.bing.com/search?q=%22dr+strangelove

Thursday, September 22, 2011

Persisting the Artifacts from Automated Tests


Storage space is cheap. Check that, storage space is very cheap, and getting cheaper all the time. Moore’s law reigns here. I see a 1TB internal hard drive for $55, for example, on my favorite online retailer of computer components. Next year I’ll probably buy a bunch of 4TB drives for personal use, even though my family’s data only amounts to 1.4TB so far (mostly digital photos).

Given a reasonably stable product with detailed logging and/or other reporting, businesses have a great incentive to save that stuff for a while, within constraints of confidentiality, privacy and personally identifiable information (PII). Anybody looking at quality will want to ask this question sometimes: has our product seen that behavior before? Is there a pattern? Patterns of behavior are great because they help characterize and prioritize product quality issues.

Test infrastructure that exercises the product has a strong causal relationship with these product artifacts, and helps illuminate them as well, e.g. “initiate process A, start transaction B …” in a temporal relationship with one or more threads / processes / cloud deployments of the product. In a controlled test environment in a lab (which could even be in the cloud) certain scenarios would be exercised repeatedly on a schedule, which can make correlation between artifacts even more valuable; given that some things stay the same e.g. the way the product is exercised or even the starting-point data, what changes becomes more significant, and more likely actionable.

Long-term management of artifacts of product behavior works well in conjunction with the artifacts generated by automated tests. Saving product artifacts (logs, errors, etc.) with the automated test artifacts (logs, exceptions, etc.) simplifies storage and analysis.

Tomorrow: atomic tests. Next week: more on automated parsing of logs, and optimizing logs for automated parsing!

Wednesday, September 21, 2011

Using custom exception types



A potentially costly item that comes up on automation failure (if you care, which you should) is delegation and triage: who, what, or how to follow up? There’s a need to add value to the product quality knowledge and preserve work that’s already been done, so if you don’t have time to follow up, the team has bigger problems :(

Custom exception types are great for this, because they clearly and unambiguously denote problem source. They avoid the synchronization problems, typos, other omissions that logs can have, and become a programmable aspect of product quality. Strong typing means that the compiler is your friend, so a class of implementation problems is found earlier rather than later.

For products that ship to run remotely on client OSs or in the cloud, custom exceptions can be configured and turned on or off if performance requires it, or if information must be limited for the end users’ benefit.

The “is-a” idiom works very well for handlers with an inheritance hierarchy of exception types. For example, suppose the product or project name is “Widget”:
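A minimal sketch of what such a hierarchy might look like, with assumed type names:

    using System;

    // Base type for all test exceptions in the (hypothetical) Widget project.
    public class WidgetTestException : Exception
    {
        public WidgetTestException(string message, Exception inner = null)
            : base(message, inner) { }
    }

    // The oracle (the predictive agent) disagreed with the product's behavior.
    public class WidgetOracleException : WidgetTestException
    {
        public WidgetOracleException(string message, Exception inner = null)
            : base(message, inner) { }
    }

    // A dependency outside team ownership failed or timed out.
    public class WidgetExternalDependencyException : WidgetTestException
    {
        public WidgetExternalDependencyException(string message, Exception inner = null)
            : base(message, inner) { }
    }

    // Because WidgetOracleException is-a WidgetTestException, a handler written
    // for the base type catches every member of the hierarchy:
    //   catch (WidgetTestException ex) { /* report with full context */ }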



Note too that “oracle” refers to the predictive agent, not the database company.

Each class can have its own data, accessors, serialization, constructors etc. Information about a failure is now encapsulated for the thread. The catch(…) clause specification uses the is-a idiom, so it can be used to handle these exceptions in an organized way.

If there are other threads in the module or different modules, and they might be blocking on a synchronous communication, they can be interrupted through a back door for the lab environment, and information there encapsulated as well for packaging with the primary thread. This is more reliable than e.g. depending on log timestamp synchronization, although that can work as well.

Tomorrow: more thoughts on packaging and presenting information for humans (manual analysis) and automated processes.

Tuesday, September 20, 2011

Put actionable information in the automated test failure


This post follows up on yesterday’s post http://metaautomation.blogspot.com/2011/09/fail-soon-fail-fast.htm.

Metaautomation is about maximizing the value of automation to the business.

When automation fails, you don’t want to have to do the test again with a debugger or other diagnostic tool, for at least these reasons:

·         It’s time consuming i.e. expensive

·         It might demand expertise an individual tester might not have, so would pull in another person

·         On reproducing the failure, there’s a good chance that behavior will be different, maybe even in a way that isn’t obvious, so now you’ve complicated the issue and/or lost information

·         Automated analysis (a Metaautomation topic to come later) wouldn’t be useful in this hypothetical case because it’s already established that important information is missing

There will always be some un-anticipated or unprecedented failures that fail the tests and require detailed follow-up. The detailed follow-up becomes affordable if most failures don’t require a detailed, time-consuming follow-up. So, let me optimistically say that I think if the correct information is included with the failure, 90% of failures won’t require a manual repro or a debugging session.

How to get there?

Logging helps, but it’s not sufficient. Important context information for the failure can be logged successfully. If the steps executed in pursuit of the use case are logged, and all but the last one before the failure represent the happy path for the automated test, they could be useful. They also must be constructed well (with a timestamp, and at least one level of context for execution) to be useful. Consistency in labels and structures is very important for readability and for automated failure analysis.

Exceptions are key, because done well, they provide a single object to the test code or framework that contains all of the information you care to throw into it. They’re thrown by the framework at failure time, or a custom exception is created and thrown at the time a failure/exceptional condition is encountered. Custom exceptions are typically written in product code, and can be written for test code as well.

Using the same language or framework for the test automation as you use for the product is important, because if there’s a change, information will be lost across the boundary and/or you don’t have the same power in handling, wrapping and throwing exceptions. Java will work for this, but I prefer .Net Framework with C# (or any of the other .Net language implementations e.g. VB.Net which compile to the same IL anyway).

Think about a failure condition being measured (or just encountered e.g. in the case of a network exception) and rolling up the stack. At some point, a stack trace, exception type and message will be reported and recorded. Maybe that’s enough, but information could be lost too, for example if there’s some state that pertains to the failure that just got lost as it fell off the stack. In that case, you could improve failure reporting by:

1.       Catching the exception. Catch (System.Exception) might work here, depending on the needs of the calling code …

2.       Create a string with context information for the error, in the context of the scope of the try block corresponding to the catch of step 1

3.       Create an instance of a custom exception type, where the type communicates the current context/location e.g. CompanyAbcModuleDefTestException

4.       For the constructor of the custom exception, pass the exception of step 1 (to become the innerexception of the custom exception) and the string of step 2

5.       Throw the custom exception instance

6.       Repeat steps 1-5 in as many different contexts as helps nail down the root cause of the failure, hence the action item to respond to the test failure
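Here is a minimal C# sketch of steps 1 through 5; the module, client and exception names are hypothetical:

    using System;

    public class CompanyAbcModuleDefTestException : Exception
    {
        public CompanyAbcModuleDefTestException(string message, Exception inner)
            : base(message, inner) { }
    }

    public static class ModuleDefSteps
    {
        public static void SubmitOrder(string orderId)
        {
            try
            {
                // Step that exercises the product and may throw.
                ProductClient.Submit(orderId);
            }
            catch (Exception ex)                                         // step 1: catch
            {
                // step 2: context from the scope of this try block
                string context = "Submitting order '" + orderId +
                                 "' against host " + ProductClient.Host;

                // steps 3-5: wrap and re-throw; the original exception becomes
                // InnerException, so its stack trace and message are preserved.
                throw new CompanyAbcModuleDefTestException(context, ex);
            }
        }
    }

    // Minimal stand-in for the product-facing client (hypothetical).
    public static class ProductClient
    {
        public static string Host => "test-lab-01";
        public static void Submit(string orderId)
        {
            throw new TimeoutException("no response from order service");
        }
    }

Repeating the same pattern at each enclosing context (step 6) builds up a chain of inner exceptions that reads like a narrative of the failure, from the lowest-level symptom to the scenario that was being attempted.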

If this sounds like a lot of overhead, remember what it’s like to struggle to reproduce an error at 1AM just prior to code freeze for ship… and also this is a very nice kind of functional test documentation, to benefit all the different testers who might need to maintain the code for other reasons.

Now, when the test code does final reporting of the failure, all that information is available in an object to be serialized in e.g. XML or HTML to create a highly stable stream, or even objects that lend themselves very nicely both to human readability and (for the future) automated test failure analysis. Clarity into the test code base or harness helps at refactor or extension time, too.

Monday, September 19, 2011

Fail soon, fail fast


This and the next 6 or so posts will focus on metaautomation, which is defined by the http://metaautomation.blogspot.com/2011/09/intro-to-metaautomation.html  post.

The most basic part of metaautomation is to maximize the chances that an automation failure is actionable, without necessarily even running the automated test again. The post linked above lists some example cases of this.

There will always be cases where more analysis requires loading up test code and SUT, but that’s time consuming (expensive!) and metaautomation wouldn’t help much either, except possibly for any automated triage (which I’ll address in a week).

At some point in a failed test, a failure condition occurs. The sooner the failure condition can be measured the better, because (assuming a single thread of execution) some information at that point that is important for analysis might be lost if the thread rolls up the stack with an exception or execution continues in vain.

Also, the faster the test ends after that point, the better, because

a)      (depending on the SUT) any further steps probably won’t add any useful information to the artifacts

b)       Resources to continue the test – including time - probably aren’t free and might be limited

Tomorrow, I’ll write about strategies to maximize relevant information, hence minimize the chances someone in test will have to load everything up with source code and step through to see what’s going on.

Friday, September 16, 2011

The Value of Automation

There’s a nice paper by Brian Marick called “When Should a Test Be Automated?” http://www.stickyminds.com/getfile.asp?ot=XML&id=2010&fn=XDD2010filelistfilename1%2Edoc

But, I emphatically disagree with one of the assertions used in the paper: “Bugs are the value of automation…” When creating and running some piece of automation for the first time, bugs are part, or maybe even most, of the value of automation. When the automation is running in the lab on a regular basis, bugs could show up and those are very valuable to the team because they might mean that some change broke a part of the product. It’s very important to find such issues quickly.

IMO the greatest value of automation is to measure the quality of the product in a reliable, stable way, often and quickly. Without some good measure of the quality of the product, you can’t (or shouldn’t) ship.

Code coverage is one view of quality. Scenarios run is another. Perf and load testing are others. Automation helps with all of these.

On Monday, I’ll start with a daily series of posts on metaautomation.

Thursday, September 15, 2011

Manual testing will never go away

What do you do for a living? Test?

When people ask “So, what do you do?” is that what you tell them? It’s the generally used professional term – “Test” – but someone outside the profession probably has a greatly simplified view of what that means, which means it sounds like a very simple job. The inquirer might think “oh, that sounds easy, I can do that” but be too polite to say anything.

I have a bit of “dev envy” because everybody and their boss knows what developers do – they make computers do stuff. That sounds intimidating to someone who doesn’t work with computers, as it should – devs have a very challenging job. Devs tend to be smaa-art people, and others know it!

Testers OTOH, they just try a few things and see if it works, right? How hard can that be?

Since you’ve read this far, I presume you know that testers have very challenging jobs too. Sometimes I think testers have a job that requires even more smarts than devs – because good ones have to know almost as much as the devs do, plus they keep up on many different aspects of product development and quality, plus they tend at times to be much more interrupt-driven than devs, plus they have to interface with team members in many different roles as well as represent the customers’ interests. I’ve done both dev and test, and IMO test is more challenging.

At a large software company in Redmond WA, testers are expected to automate all of their tests (at least, that was my experience), which becomes even more challenging after running automated tests for a while because they tend to fail. James Bach offers an interesting perspective on this phenomenon (see http://www.satisfice.com/articles/test_automation_snake_oil.pdf). For example, Bach notes that automated tests don’t catch bugs by default; manual testers do.

Adam Yuret (his blog is here http://contextdrivenagility.com ) reminded me of a great perspective on the process of automating tests: if testers define a test “case” exercising the product, a good tester executing the test case will find bugs by default. People are generally smart and observant, and they’ll spot things that are out of whack.

Common testing wisdom says that if that test case is automated, the benefit of the manual test is amplified because a) it’s run more often b) more quickly and c) more reliably. Unfortunately, that’s not generally the case; as Yuret points out, what a manual test generates in terms of a measurable result for product quality is greater than that same test run as an automated test, especially for graphical user interface (GUI) tests. It gets worse than that for automated tests: automated tests have to measure a result somehow, and by default they don’t. Your automated test has zero or more “checks” of product behavior.

If the automated test has zero checks, the result of the automated test will always be PASS no matter what happens. The value of the automated test case is now negative, because the product team is operating on information that the test case is being “covered” by the test part of the team, whereas actually it’s not.

Bach notes an example of this in his article (linked above): “The dBase team at Borland once discovered that about 3,000 tests in their suite were hard-coded to report success…” I’ve seen even worse: I’ve reviewed an automated test to discover that the result was always success, despite the fact that the product wasn’t driven to do anything at all.

The challenge is not just to make sure that the checks of product behavior are in there – i.e. the test fails if the product does not behave as expected – but that in case of failure, the automated test result is descriptive. A Boolean result of FAIL with no other information is not very useful other than as an indication that the test team has work to do on that automated test.
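As a small illustration (with a hypothetical sign-in API), compare a check-free automated test with one that makes an explicit, descriptive check:

    using System;

    public static class SignInChecks
    {
        // Anti-example: drives the product but checks nothing, so it always "passes".
        public static void SignIn_NoChecks()
        {
            FakeSignInApi.SignIn("test-user", "password");
            // no verification -- this test can never fail
        }

        // Better: an explicit check, and a failure message that says what was
        // expected, what happened, and for which user.
        public static void SignIn_ValidUser_GetsSessionToken()
        {
            string token = FakeSignInApi.SignIn("test-user", "password");
            if (string.IsNullOrEmpty(token))
                throw new Exception(
                    "FAIL: expected a non-empty session token for 'test-user', but got none.");
        }
    }

    // Minimal stand-in for the product sign-in API (hypothetical).
    public static class FakeSignInApi
    {
        public static string SignIn(string user, string password) => "token-123";
    }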

The value of manual test never goes away, at any point in the software development lifecycle (SDLC). Automation can be a powerful tool, but it’s not at all easy to do well. Testers have a very difficult and challenging job.

Next time (tomorrow), I’ll post about the value of automation, and then next week, move into metaautomation.

Wednesday, September 14, 2011

Quality


How does one define “quality?”

I like the definition that is used by Adam Goucher http://adam.goucher.ca/?cat=3, Jerry Weinberg http://thetesteye.com/blog/tag/jerry-weinberg/ and others: “Quality is value to some person that matters.” OTOH, this needs to be more detailed to be meaningful to complex or high-impact software projects.

Given that the quality bar of software is getting higher (OK, except maybe for some social media applications and games about exploding birds) and our reliance on IT is increasing, we really need to create a reasonably complete picture of the product quality before releasing it to the end-user. Actually, we must do it even earlier than that: to manage risk, we need to have some handle on quality while developing the product.

Quality is defined with input from the customer (“I want the software to do this…”) but it’s also complex and difficult and it wouldn’t work to lean on the customer for all aspects of what constitutes quality, especially non-functional quality. So, the team has to define quality in all its gory detail: the product owner, developers, and test / QA people collaborate and flesh out a quality document or documents. The definition of quality can’t be 100% complete, it will (or should!) evolve before product release as the product develops and market context changes, and it must include things like scale, perf, failure scenarios, security issues, discoverability and other things which the end-user can’t be bothered with.

Notice that “some person that matters” is still the end-user customer in the end, but the software development team members are now proxies for the end-user customer. In agile methodology, the product owner (PO) is formally the “voice of the customer” but really, all team members take on variations of that role in different ways. Robust quality requires leadership from the PO and it requires contributions from the other roles as well.

Test isn’t just about finding bugs. It has to measure quality through the whole process, and bugs are just part of the total quality picture. The rest of the team depends on this. Shipping without a complete picture of quality gives a high risk of the end-user finding a high-impact bug (i.e. a bug that the team would never have let out the door, had they known about it), which could be damaging to the business.

For a high-impact application – say, medical or financial software – if the measurement of quality at any point in the SDLC isn’t reasonably complete, the extent of the incompleteness represents unacceptable risk and is similar to lack in quality. The risk of unmeasured quality can increase during the SDLC due to runtime dependencies or design dependencies.

A quick takeaway is: test early and often, and measure Test productivity with breadth and detail of the quality picture, not just by number of bugs found or fixed!

Tomorrow: Test
Friday: starts a series of posts on metaautomation

Tuesday, September 13, 2011

Intro to MetaAutomation

It’s the MetaAutomation Blog!
The first thing you might be wondering is: what’s metaautomation?

Automation of course is getting the software product or system under test (SUT) to do stuff without a person driving the SUT directly, e.g. while the team sleeps or parties, and with zero or more checks of product behavior with Boolean results that get reported somehow. I’m thinking of E2E or scenario tests, not e.g. unit, perf or stress tests. A failure in basic automation is often useful only as an indicator that the test team has more work to do.

Metaautomation is taking it to the next level: basically making automation failures actionable, up to and including automated triage.

The second thing you might be wondering: why is this important to me, the SDET and advocate for product quality?

In a typical software creation scenario there are two prominent metrics in use by the test team: number of test cases and number of test cases automated. The test automaters are under some pressure to deliver as many automated tests as they can, but in a week or a day, some or most automated tests begin to fail. The report shows the flipped bits – the automated test cases that are now failing – but doesn’t give much more info than that, so at some point the test automater has to stop automating new test cases and go back and try to fix the failures. This can take as long as the original automation for that test effort, so sometimes it isn’t done at all – the failed automation continues failing as before, and has little, no or even negative value to any useful measure of product quality.

When you’re the owner and you get word of an automation failure, a variety of questions might come to mind:

Is the root cause a product failure, an automation failure, a dependency failure, a race condition beyond team ownership, or something even more complex?

Has this root cause failure happened before?

Would it happen again? Is it even reproducible, reliably or at all?

If it looks like a product failure, does it require debugging the product, or just the test code, to get more information before you pass the issue on as a bug?

If root cause is or might be beyond team control (but, it’s not an issue for the product end-user) will you just try again to see if it passes the 2nd time?

Metaautomation addresses all of these issues. Note that, unlike automation, metaautomation depends on knowledge of product implementation, e.g. the source code.

You could think of it like this: the relationship of metaautomation to automation is analogous to the relationship of automation to manual test. Manual test is one of the first things you do to measure product quality, and the importance of manual test never goes away, but the power of the team in maintaining and promoting product quality is significantly enhanced with good product automation (including at least minimal reporting of the automation results).

Manual test is costly in terms of people and time, and it’s never really reliable because people are human.

Automation addresses the time cost (by running automation) the boredom (by having a computer, lab, or cloud processing, do the work for you) and some other issues as well (e.g. frequency of the measurement of the product, consistency, reliability…). That’s why automation is so big these days. Everybody in software knows the value, so everybody’s doin’ it.

Unfortunately, when automation fails, manual analysis can be very costly in terms of time. It might mean loading the test code, and the product, hooking up or simulating external dependencies and doing a debugging session. A complex or difficult bug can take hours or days of follow-up. Sometimes, in consideration of these costs, it’s not even done at all.

Metaautomation addresses these costs and, if done well, can increase team efficiency and make product quality stronger.

If you’re doing good automation, you’re probably already doing the most basic kind of metaautomation: logging, as in having your test code and/or product stream, at least, hard-coded messages to a text-based file. This is a kind of metaautomation because it’s orthogonal to having the product do stuff and test runtime verifications; it’s the first piece of post-run test analysis, and it can be used to feed other pieces of metaautomation that I’ll be getting into with future posts, although there are better sources of information (which will be addressed in future posts as well).

Metaautomation isn’t the answer to all scenarios. There will still be some manual analysis, just like there will still be some manual testing. But, metaautomation speeds analysis by reducing the need to examine the product or test code (especially by saving the need to debug through the code) and it can speed triage of issues as a direct result (depending on team process). Automated automated (yep, 2nd-order automated) test re-runs and automated test failure analysis are other pieces of metaautomation that can speed human-based triage further, or even bypass it.

The third thing you’re probably wondering is: why do we need a new word for this? Are you kidding me - six @#! syllables?

Automation is summarized nicely here:


Note that the article briefly discusses reporting, but not analysis. Most of the value of automation comes after the automation is created: that’s where metaautomation can help a team shine!


The plan for this blog:

I plan to post every day or so for as long as it takes to (re)define

·         Quality

·         Test

·         Metaautomation, the what and the how (this will take roughly a dozen posts)

·         Values in automation

·          … I have other ideas, and the blog should have momentum by this time …

Following posts will tend to be shorter and might be less dense.


Oh, yeah, note too:
This blog is not associated in any way with the industrial and marine company http://www.meta-automation.com/ although, I do like boats, especially sea kayaks.