Thursday, November 14, 2013

MetaAutomation: The Book

MetaAutomation is moving from a blog to a book.

I've been writing, pulling ideas together, and synthesizing to create a package with a stronger value proposition for people concerned with automated software testing that includes any or all of these:
  • regression tests
  • functional tests
  • all tests that are fully repeatable, or are intended to become so
  • positive and negative tests
The package (and the book) is for tests that include functional dependencies (internal and external services etc.) and generally don't have fakes or shims. So, it does NOT address any of these:

  • security tests
  • performance tests
  • stress tests
  • code metrics
  • model-based testing
  • fault injection
  • accessibility
  • discoverability
  • suitability or validation
The book doesn't address every topic of this blog, but it does create a valuable big-picture synthesis that isn't possible in the blog format.


Tuesday, September 24, 2013

Smart Retry Your Automated Tests for Quality Value

If you automate a graphical user interface (GUI) or a web browser, you’re very familiar with this problem: there are many sporadic, one-off failures in the tests. Race conditions that are tricky or impossible to synchronize and failures from factors beyond your control or ownership break your tests, and the solution too often is to run the test again and see if it passes the 2nd time.

The result is dissonance and distraction for whoever’s running the tests: there’s another test failure. Does it matter? Do I just have to try it again? I’ll try it again, and hope the failure goes away.

Imagine transitioning your job from one where most issues that come to your attention are not actionable (e.g. “just ignore it, or try it again and hope the issue goes away”), to one where most issues that come to your attention are actionable. That sure would help your productivity, wouldn’t it?

I wrote about this topic here in some detail:

Now’s a good time for your organization to bring it up again. Smart Retry is an aspect of 2nd-order MetaAutomation:

Smart Retry is very valuable for your productivity and communication around the organization, but if you want to get there, you need two things which each have significant value in themselves:

1.       Tests that fail fast with good reporting

2.       A process with some programmability to run your tests for you and make decisions based on the results

On item 2: If you are running your tests in parallel on different machines or virtual machines or in the cloud, you will have this already, and if you don’t have this, you will because the business value makes it inevitable.

For a distributed system, you will also need a non-trivial solution for this:

3.       A service that provides users for given roles from a user pool, for time-bound use with an automated test
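As a rough illustration of the user-pool idea, here is a minimal in-memory sketch in Python. The class name `UserPool`, its methods, and the lease-expiry behavior are all assumptions made for this example; a real service would be shared across machines and would need persistence and locking.

```python
import time
from typing import Dict, List, Optional, Tuple


class UserPool:
    """Hypothetical in-memory stand-in for a service that leases test users by role."""

    def __init__(self, users_by_role: Dict[str, List[str]]):
        self._free = {role: list(users) for role, users in users_by_role.items()}
        self._leases: Dict[str, Tuple[str, float]] = {}  # user -> (role, expiry time)

    def _reclaim_expired(self, now: float) -> None:
        # A test that crashed never releases its user, so leases are time-bound.
        for user, (role, expires) in list(self._leases.items()):
            if expires <= now:
                del self._leases[user]
                self._free[role].append(user)

    def acquire(self, role: str, seconds: float, now: Optional[float] = None) -> Optional[str]:
        """Lease a user with the given role for `seconds`; None if nobody is free."""
        now = time.monotonic() if now is None else now
        self._reclaim_expired(now)
        if not self._free.get(role):
            return None
        user = self._free[role].pop(0)
        self._leases[user] = (role, now + seconds)
        return user

    def release(self, user: str) -> None:
        """Return a leased user to the pool before the lease expires."""
        role, _ = self._leases.pop(user)
        self._free[role].append(user)
```

The `now` parameter exists only so the lease logic can be exercised without waiting on a real clock.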

A Smart Retry system is an automated solution to substitute for a big piece of human judgment: whether to just run the test again, vs. taking a significant action item on it. It adds a lot of business value in itself, and it also complements other systems that scale and strengthen the Quality story of your organization.
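The retry decision itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Outcome`, `smart_retry`, and the `transient` flag are hypothetical names, and the sketch assumes each test already fails fast and reports whether the failure came from outside the product (a timeout, a race condition).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    passed: bool
    failure_id: str = ""     # descriptive ID that is unique for the root cause
    transient: bool = False  # assumed flag: failure is outside the product's control


def smart_retry(run_test: Callable[[], Outcome], max_attempts: int = 3) -> dict:
    """Run a test, retrying only when the failure is marked transient."""
    history: List[Outcome] = []
    for _ in range(max_attempts):
        outcome = run_test()
        history.append(outcome)
        if outcome.passed:
            break
        if not outcome.transient:
            # A non-transient failure is actionable now; retrying would only bury it.
            break
    last = history[-1]
    return {
        "passed": last.passed,
        "attempts": len(history),
        "actionable": not last.passed,  # a human is needed only if it still fails
        "failures": [o.failure_id for o in history if not o.passed],
    }
```

Transient failures that disappear on retry are still recorded in `failures`, so they can be counted and reviewed later without interrupting anyone.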

How to Find the Right Size for your Automated Tests

Here are some reasons you might do some automation for your Quality efforts:

1.       It might save a lot of time and effort, because it means manual tests that don’t have to be run again and again by humans

2.       The results of the tests can be more reliable and consistent than those of manual testers

3.       Done right, it will bring Quality information to your team much faster than with manual testing, which reduces uncertainty and wasted effort, and can help you ship faster

You want to automate the most important stuff first, of course. You know what the scenarios are. But, should you automate it as one huge run-on test, or a set of smaller ones, or something else? How do you plan your automation effort for greatest value?

Atomic tests are important, as discussed in a previous post.

But, how atomic do you have to be? If you need the right user and you need to log in etc. isn’t it faster to just go through all the verifications you want in one big test, so you don’t have to log in multiple times etc.?

It might be faster to write the automation if the system is well-understood and stable, and it might be faster to run it as one huge scenario, too, assuming all goes well and the test passes. But, what if part of the test fails for any reason? What if you ever want to look at the test results, or even further, automate triage or do smart retry?

Smart Retry is the topic of my next post, here

A failure in an automated test should end the test immediately, provided you’ve chosen your verifications wisely. Otherwise, any remaining results in that automated test might be invalid anyway, and you’re wasting resources as well as burying the original failure, i.e. making it more difficult to turn that into an action item. Automated tests often fail due to failures in the test code, race conditions, or failures in external dependencies. When they do fail, and significant product verifications aren’t run because of the early failure, significant parts of the product are not being tested by automation; if you don’t figure out which parts are missing and run them manually, those parts of the product aren’t getting tested at all!

Shorter, atomic tests scale better, because

·         You can retry them more quickly

·         They have focused, actionable results

·         You can run each one in parallel on a VM (or can in future, when the infrastructure is there) which means the whole set of tests can be run much faster

Atomic tests need actionable verifications, i.e. verifications that can fail with a specific action item. You never want a test to fail with a null-reference exception, even if it might be possible to work backwards in the logic to guess at the root cause of the failure. The actionable verifications happen as the atomic test runs, so that in case of failure, the failure is descriptive (actionable) and unique for the root cause.
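To make the contrast concrete, here is a small Python sketch. The helper `verify`, the exception `VerificationFailure`, the failure IDs, and the order-service response are all invented for illustration. In Python the equivalent of a null-reference exception is a bare `TypeError` or `KeyError` on a missing value.

```python
from typing import Optional


class VerificationFailure(Exception):
    """Raised when a verification fails; the message carries the action item."""


def verify(condition: bool, failure_id: str, action: str) -> None:
    # Fail fast with an ID that is unique to the root cause, plus what to do about it.
    if not condition:
        raise VerificationFailure(f"{failure_id}: {action}")


def check_order_total(response: Optional[dict]) -> float:
    # Without these checks, a missing response would surface later as a
    # TypeError or KeyError, far from the root cause.
    verify(response is not None, "ORDER_NO_RESPONSE",
           "order service returned nothing; check the service deployment")
    verify("total" in response, "ORDER_TOTAL_MISSING",
           "response has no 'total' field; check the order API contract")
    verify(response["total"] >= 0, "ORDER_TOTAL_NEGATIVE",
           "total is negative; file a product bug against pricing")
    return response["total"]
```

Each verification is one the scenario cannot succeed without, so none of them adds a new source of intermittent failure.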

But, skip doing verifications that aren’t necessary for the success of the scenario. For example, there’s no need to verify that all PNG images came up on a web page; you need manual tests for that and many other aspects of product quality, and anyway, you don’t need another source of potential intermittent failures to gum up your tests. Limit your verifications to things that, if they fail, the whole atomic test is headed for failure anyway. It’s those verifications that help point out the root cause of the failure, which in turn helps find an action item to fix the test.

This might seem like a conflict: tests are more effective if they are shorter (atomic), but they need lots of actionable verifications anyway, so doesn’t that make them long?

I’ll clarify with two examples, Test A and Test B.

Test A is an automation of a simple scenario that starts from a neutral place (e.g. the login page to a web site), does some simple operation and checks the results, including these two verifications:

1.       Verification that a certain icon is loaded and displayed

2.       Verification that a certain icon has the correct aspect ratio (e.g. from something in the business logic)

Test A looks like an atomic test, because an important goal of the scenario is that icon and that’s what the verifications focus on. It does not make sense to break A into smaller tests because in order to do verification 2, the test will do verification 1.

Test B is similar but is aimed at a different result: some important text that accompanies the table that includes the famous icon of Test A. Test B does these verifications:

1.       Verify that the icon is displayed

2.       Verify that the text accompanying the table in which the icon appears is correct according to business rules

Test B is NOT an atomic test, because verification 1 isn’t necessary for verification 2. Test B could be broken up into two tests, but if it runs as part of a suite with Test A, it is better to just remove verification 1 from Test B, because that verification happens in Test A anyway. Verification 1 in Test B might fail and therefore block the really important verification 2. Note that a failure of verification 1 could happen for lots of reasons, including

·         Product design change

·         Test code failure

·         Transient server failure (timeout)

·         Deployment failure

·         Product bug

So Test B is better off without verification 1, given that Test A is run as part of the same suite.
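Here is how the two tests might look after that change, as a Python sketch. The page is modeled as a plain dictionary, and `EXPECTED_TEXT`, `EXPECTED_RATIO`, and the function names are hypothetical stand-ins for whatever the real product and business rules provide.

```python
EXPECTED_TEXT = "Prices include tax"  # hypothetical business rule for the table text
EXPECTED_RATIO = (4, 3)               # hypothetical width:height for the icon


def verify(condition: bool, message: str) -> None:
    if not condition:
        raise AssertionError(message)


def run_test_a(page: dict) -> None:
    """Atomic: verification 2 cannot run unless verification 1 holds."""
    icon = page.get("icon")
    verify(icon is not None, "icon is not displayed")
    w, h = EXPECTED_RATIO
    verify(icon["width"] * h == icon["height"] * w, "icon aspect ratio is wrong")


def run_test_b(page: dict) -> None:
    """Atomic once the icon check is removed: only the table text matters here."""
    verify(page.get("table_text") == EXPECTED_TEXT,
           "table text violates the business rule")
```

With a page whose icon is missing but whose text is correct, `run_test_a` fails with a specific message while `run_test_b` still runs its important verification and passes.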

The “right size” for an automated test is:

·         Long enough to test an important property of a scenario, but no longer

·         Equipped with actionable verifications for the steps of the test, so failure is actionable

·         Short enough that it contains no verifications that are unnecessary to the basic flow of the test

Friday, May 3, 2013

The Software Quality Process, part 3 of 3: Quality Characterization, Bugs, and When to Ship

The software business requires shipping at some point, and your team and business probably started out with a ship date or target for the product.

Here’s where quality considerations become really important: you do NOT want your customers to discover serious issues with your software before you do. If they do, it could be very damaging to your business. Depending on your priorities, you might consider delaying ship for adequate testing and QA... of course, if you read and can follow my previous posts, you won’t need to delay :)

All software ships with unfixed bugs. (Show me software that has no bugs, and I’ll show you software that hasn’t been tested enough.) You can’t be 100% certain that end-users (or blackhats) won’t find serious issues with your software that would cause you regret, but you can do some things to minimize your risk:

Ensure that you hire good people for Test early in the product cycle, and give them the opportunity to do their best work.

Have Test present at requirement and design meetings from product start. Their role is to make sure that the product is testable, and to minimize the many risks of building and shipping a product.

Make sure that the Test Plan is addressed completely, and updated as needed along the way.

When development is complete, all significant bugs have been addressed, and you’re approaching ship time, take a week or three to exercise the product thoroughly and make sure that all product issues that might need fixing or that impact product quality from any perspective are addressed with bugs, and the bugs get triaged. Probably, most or all bugs will be deferred to a patch or service pack, but the important thing is that you have confidence that there aren’t serious issues that might impact customers but that are unknown to the team. Go through the test plan and make sure that all areas of product quality have been measured, as completely as you can in a timeframe that’s reasonable for your business.

… if after that, there are no bad surprises, it’s ship time!

Links to previous installments of this short series:


Wednesday, May 1, 2013

The Software Quality Process, part 2 of 3: Triaging bugs, and bug flow

Think of bugs as atoms of actionable communication that flow around the group with their messages about product quality. They speak only of the product, the customer, engineering and design details.

Triage is about prioritizing the bugs and assigning them to people as needed to ensure that the bugs keep flowing. Triage can be done on all bugs that haven’t been triaged yet, or all bugs assigned to certain people, or all new and active bugs that aren’t assigned to anybody.

Triage is done by a leader in the group, or by a meeting, with enough knowledge of the product, engineering issues around the product, end-users, any intermediate customers and other context to edit, if needed, the severity (how important is the issue?) and priority (which issues should be addressed first?). Bugs get assigned to developers for fixing, or program managers for design considerations, or back to testers for more information, or get marked as “won’t fix” or postponed as needed.

Test and dev need to work more closely when Test finds blocking bugs that prevent major scenarios from working, or regression bugs that block scenarios that were working before.

Test is always responsible for removing bugs from the flow by verifying and closing them. In some groups, testers are the only ones that can create bugs, but it’s generally OK for anybody to create bugs as long as they’re good bugs.

See part 1 for what makes a good bug:

Tuesday, April 30, 2013

The Software Quality Process, part 1 of 3: Creating docs and bugs

This is post 1 of a series on software QA and Test, seen from the process perspective. Links to parts 2 and 3 will be added here as I finish writing and posting those parts. I use the term “Test” with a capital T to mean the test and QA org, the person or people responsible for measuring and communicating quality.

Software is about telling computers what to do with information. The scope of these posts is about the pure information part of that, so I’m skipping over hardware-related issues, but methods described here could be applied to hardware + software systems as well e.g. the mobile-device business.

Early in the software development life cycle (SDLC) Test needs to be involved, to listen and learn, but also to influence. Important questions for Test to address include: is the product testable? Can the team arrive at a good-enough measure of quality soon enough to ship? Where are the biggest risks? Where would the customer pain points likely be, and can these be mitigated with design decisions that are made early in the SDLC?

One product of these meetings is the Test Plan. The Test Plan needs to include either a link or links to a readable, maintainable, high-level view of the product, probably graphical for efficiency, or include a product description itself at a high level – but not both! The goal here is to have enough information that a new team member can quickly figure out what’s going on without being a burden, and there’s something that people can quickly refer to, but to minimize duplication of information, be as agile as makes sense for the product space, and not to spend too much time documenting or making pretty diagrams.

The Test Plan would continue with a strategy for characterizing the quality of the product with sufficient confidence that it becomes OK to ship. There’s much more to this than just finding bugs. The Test Plan must address ALL aspects of the product that can affect quality in any way, including security issues, integration, installation, deployment, scale, stress, usability, discoverability, and so on. More on this “characterizing quality” thing in part 3 of this series, to follow.

The Test Plan should contain scenarios to describe the product from the perspective of the end-user, or the client business, or services on which it depends, etc. It could contain test cases, too, but a more agile approach is to have the test cases exist as self-documenting entities in the source code. Modern build systems can generate simple docs from the code that are accessible to decision makers on the team even if they don’t have (or choose to build for themselves) access to the actual Test source code.

The Test Plan is generally the first important product from Test for communicating around the software team what Test is up to, and provides a framework for the characterization of the product quality I’ll address in part 3. The rest of this post is about bugs…

Bugs are created as necessary to communicate quality issues around the team. Here are some qualities of good bugs:

·         The title is succinct, descriptive, and includes keywords (to be searchable)

·         The bug is atomic, so addresses just one fix in one code change or change set

·         The bug is clear and has enough detail for the intended audiences, primarily program managers and developers, but also other people in Test and executives

·         The bug has screenshots if that helps at all with making the bug understandable, e.g.

o   A screenshot of a GUI as seen by product end-user

o   A screenshot of a section of an XML document, formatted and colorized per the IDE used in the developer team, if the bug relates to that XML

o   A screenshot of some specific code as seen in the IDE

·         Links to related, dependent bugs, or bugs on which this bug depends

·         Links to test cases and/or test case failures if those are maintained in the same database

·         Links to documents

That could potentially add up to a lot of work from Test, and very big and detailed bugs. Watch out for too much detail, though; one of the risks of creating these bugs is that they could be too specific, when the problem that the bug reports is part of a larger problem. Addressing the bug as a sub-problem while the bigger issue goes unrecognized risks losing track of the issue, which could create rework and/or product quality risk.

Bugs are work items that have state, e.g.

·         New

·         Active

·         In progress

·         Fixed

·         Postponed

·         Duplicate

·         Won’t fix

·         Closed

Bugs move between these states as they “bounce” around the group to be handled by team members asynchronously, i.e. when the time is most efficient for them. But, they also create a searchable record of product quality issues and a record of which areas of the product Test has been working on, and they inform the decision of when to ship the product.
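The states above can be modeled as a small state machine. The Python sketch below is only an illustration: the allowed transitions are assumptions (every team defines its own flow), and the role check reflects the rule from part 2 of this series that Test verifies and closes bugs.

```python
from enum import Enum


class BugState(Enum):
    NEW = "New"
    ACTIVE = "Active"
    IN_PROGRESS = "In progress"
    FIXED = "Fixed"
    POSTPONED = "Postponed"
    DUPLICATE = "Duplicate"
    WONT_FIX = "Won't fix"
    CLOSED = "Closed"


S = BugState
# Assumed transitions, for illustration only.
ALLOWED = {
    S.NEW: {S.ACTIVE, S.POSTPONED, S.DUPLICATE, S.WONT_FIX},
    S.ACTIVE: {S.IN_PROGRESS, S.POSTPONED, S.DUPLICATE, S.WONT_FIX},
    S.IN_PROGRESS: {S.FIXED, S.ACTIVE},
    S.FIXED: {S.CLOSED, S.ACTIVE},  # Test verifies the fix, or reactivates the bug
    S.POSTPONED: {S.ACTIVE},
    S.DUPLICATE: {S.CLOSED, S.ACTIVE},
    S.WONT_FIX: {S.CLOSED, S.ACTIVE},
    S.CLOSED: set(),
}


class Bug:
    def __init__(self, title: str):
        self.title = title
        self.state = S.NEW
        self.history = [S.NEW]  # record of the bug's flow through the team

    def move(self, new_state: BugState, role: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        if new_state is S.CLOSED and role != "test":
            raise PermissionError("only Test verifies and closes bugs")
        self.state = new_state
        self.history.append(new_state)
```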

The product will ship with unfixed bugs! (If not, it hasn’t been tested, and it probably shouldn’t ship at all.) This will be addressed in the following posts.

There are two more posts to this series:

The Software Quality Process, part 2 of 3: Triaging bugs, and bug flow
The Software Quality Process, part 3 of 3: Quality Characterization, Bugs, and When to Ship

Wednesday, April 10, 2013

For Your Quality Customers, Add Value with Every Change

Who are your customers?

Of course, you’re developing software for the end user. The other members of your team are the first customers though.

I’ve written about many techniques of advanced software quality that can reduce risk and strengthen your quality story. This is how the Test team can best add value and make the Devs more productive, meaning that you can ship faster and with lower risk!

To inspire trust and reduce risk to the team, every change set that goes into the product must add quality value, that is, information about the quality of the product.

The problem is, very few software projects are starting anew. Most have some quality infrastructure, maybe some copy/pasted scripts, maybe a set of test cases that are run manually. The team members are customers of this existing infrastructure.

So, existing quality assets such as these must be maintained or replaced. At every change to documents or code, value is added, never taken away.

This is important for the same reason that failures in test code, or failures that are perceived as being due to test failures, must be fixed ASAP: the quality knowledge of the product must always advance and improve. If it does not advance, due to dropped coverage from “old” test infrastructure or tests that fail so often they’re perceived as not worth fixing, then parts of the product are not tested anymore, and knowledge of and stability of the product is lost. This kind of project rot must be avoided.

Every change and every addition to product quality infrastructure, no matter how sophisticated, agile, actionable, self-reporting etc. must add to existing knowledge of the product.
This makes a strong and productive team: mutual respect and attention to keeping the quality moving forward.

Tuesday, April 9, 2013

An Organization and Structure for Data-Driven Testing

This post follows up on the one from yesterday:

So, data-driven testing is the way to go for a huge return on finding and regressing product issues and measuring the whole quality picture. How to start?

I like XML files to do this. Here are some reasons:

1.       Your favorite text editor will work for reviewing, editing, and extending your test set.

2.       If given values for a test are optional and you provide defaults as needed, the XML can be “sparse” and even easier to read and edit. The data that drives the test also expresses the focus and reason for the test, in the data itself!

3.       You can be as constrained or as loose with the data schema (the layout of the XML data) as you want.

4.       Extending your data engine can be as simple as allowing and parsing different values. For example, for testing with objects that include pointers or references, you can put “null” as a value in your XML and have your engine parse and use that for the test, in the context as defined in the XML.
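Points 2 and 4 can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names, the `DEFAULTS` table, and the sample tests are invented for this example; only the parsing calls are real library API.

```python
import xml.etree.ElementTree as ET

# Defaults let each <test> element state only what makes it different.
DEFAULTS = {"user": "standard", "quantity": "1", "coupon": "null"}

SAMPLE = """
<tests>
  <test id="default-order" expect="pass"/>
  <test id="bulk-order" quantity="500" expect="pass"/>
  <test id="zero-quantity" quantity="0" expect="fail" note="Bug 12345"/>
</tests>
"""


def parse_value(text: str):
    """Turn "null" into None and digits into int; leave other text alone."""
    if text == "null":
        return None
    try:
        return int(text)
    except ValueError:
        return text


def load_tests(xml_text: str) -> list:
    tests = []
    for element in ET.fromstring(xml_text.strip()).iter("test"):
        merged = {**DEFAULTS, **element.attrib}  # explicit values override defaults
        tests.append({key: parse_value(value) for key, value in merged.items()})
    return tests
```

Because the XML is sparse, the second test reads as "the default order, but with a quantity of 500", which is exactly the focus of that test.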

There are many engines that help with data-driven tests, or with some time and skill, you can write your own.

To make the tests more readable and extensible, use different XML files to drive different kinds of tests – e.g. positive vs. negative tests, scenario A vs. scenario B, vs. scenario C. With appropriate labels, comments, error messages and bug numbers inline with the data for the individual test, all your tests can be self-documenting and even self-reporting, freeing you from maintaining documents with details about the tests and removing that source of errors and potential conflicts.

A relational database is a more powerful way of handling large amounts of structured data. This would be a better choice for example if you were doing fuzz testing by generating large numbers of tests, according to your randomization scheme, and then saving to and executing from a SQL database. Even with fuzz testing, it’s very important that tests be as repeatable as possible!
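One way to keep fuzz tests repeatable is to generate them from a recorded seed and store the generated cases. The sketch below uses Python's standard `random` and `sqlite3` modules with an in-memory database; the table name `fuzz_case`, its columns, and the generated fields are assumptions for illustration.

```python
import random
import sqlite3


def generate_fuzz_cases(seed: int, count: int) -> list:
    """Generate (case_id, quantity, name) tuples; the same seed gives the same cases."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb it
    cases = []
    for case_id in range(count):
        quantity = rng.randint(-1000, 1000)
        name = "".join(rng.choice("abc<>'\" ") for _ in range(rng.randint(0, 8)))
        cases.append((case_id, quantity, name))
    return cases


def save_cases(conn: sqlite3.Connection, seed: int, cases: list) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fuzz_case ("
        "seed INTEGER, case_id INTEGER, quantity INTEGER, name TEXT, "
        "PRIMARY KEY (seed, case_id))"
    )
    conn.executemany(
        "INSERT INTO fuzz_case VALUES (?, ?, ?, ?)",
        [(seed, *case) for case in cases],
    )
    conn.commit()


def load_cases(conn: sqlite3.Connection, seed: int) -> list:
    """Read back the stored cases for one seed, in their original order."""
    rows = conn.execute(
        "SELECT case_id, quantity, name FROM fuzz_case WHERE seed = ? ORDER BY case_id",
        (seed,),
    )
    return rows.fetchall()
```

Executing from the stored rows rather than regenerating means a failing case can be rerun exactly, even if the randomization scheme changes later.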


Monday, April 8, 2013

The Power of Data-Driven Testing

This post assumes a focus on integration and end-to-end testing of the less-dependent parts of a product, where the greatest quality risks are found: in the business logic, data or cloud layers. See this post for a discussion of why this is most effective for a product that has important information:

Automated testing usually involves some inline code in a class method. A common pattern is to copy and paste code, or create test libs with some shared operations and call the libs from the test method. The tests correspond to the methods 1:1, so 50 automated tests look like 50 methods on a class with minor hard-coded variations between repeated patterns in code.

For repeated patterns like this, there’s a much better way: data-driven testing.

Data-driven tests use a data source to drive the tests. Within the limits of a pattern of testing as defined by the capabilities of the system reading the data to drive the test, each set of data for the pattern drives an individual test. The set of data for each test could be a row in a relational database table or view, or an XML element of a certain type in an XML document or file.

Why is this better?

For one, agility. The test set can be modified to fit product changes with changes in the test-driving data, at very low risk. It can also be extended as far as you want, within limits described by how the data is read.

Alongside agility comes readability, meaning that it’s easy for anyone to see what is tested and what is not for a given test set. It’s easy to verify that the equivalence classes you want covered are represented, that the pairwise sets are there, that boundaries are checked with positive and negative tests, etc.

To help readability, you can put readable terms into your test-driving data. Containers can have “null” or an integer count or something else. Enumerated types can be a label used in the type, say “Green,” “Red” or “Blue”, or the integer -1 or 4 for negative limit tests.

Best of all, failure of a specific test can be tracked with a bug number or a note, for example, “Fernando is following up on whether this behavior is by-design” or “Bug 12345” or a direct link to the bug as viewed in a browser. When a test with a failure note like this fails, the test artifacts will include a note, bug number, link or other vector that can significantly speed triage and resolution.
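A compact Python sketch of the whole pattern follows: one piece of test code, many data rows, and a note that travels with the failure. The function under test (`discount`, with a deliberate off-by-one bug), the case fields, and the runner are all hypothetical.

```python
def discount(quantity: int) -> float:
    """Hypothetical product code; the '>' should be '>=' (deliberate bug for the demo)."""
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return 0.1 if quantity > 100 else 0.0


# Each row drives one test; "note" carries triage information inline with the data.
CASES = [
    {"id": "small-order", "quantity": 1, "expected": 0.0},
    {"id": "bulk-boundary", "quantity": 100, "expected": 0.1, "note": "Bug 12345"},
    {"id": "bulk-order", "quantity": 500, "expected": 0.1},
    {"id": "negative", "quantity": -1, "raises": ValueError},  # negative test
]


def run_cases(func, cases: list) -> list:
    results = []
    for case in cases:
        try:
            actual = func(case["quantity"])
            passed = "raises" not in case and actual == case["expected"]
            detail = f"got {actual!r}"
        except Exception as exc:
            # An exception passes only if this row expected that exception type.
            passed = isinstance(exc, case.get("raises", ()))
            detail = f"raised {type(exc).__name__}"
        results.append({
            "id": case["id"],
            "passed": passed,
            "detail": detail,
            "note": "" if passed else case.get("note", ""),  # shown only on failure
        })
    return results
```

When the boundary case fails, its result already names the bug, so whoever triages the run does not have to rediscover it.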

The next post has some notes on organization, structure and design for data-driven tests.