Monday, September 26, 2011
Metaautomation: Have we seen this failure before? If yes, what’s the actionable?
In previous posts I’ve talked about quality artifacts:
· Detailed info around the failure
· Strongly typed info using XML, or at least formatted with a system for group use (e.g. labels from enumerated types to give consistency and avoid typos)
· Context info around the failure
With consistent datetime formatting (or, even better, XML that validates against a detailed schema) and sufficiently real-time logging, logs can easily be correlated across threads.
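As a minimal sketch of that idea: if every log line starts with the same sortable timestamp format, merging per-thread logs into one timeline is just a sort. The function names, field order, and tab-separated layout here are illustrative choices, not the author's actual implementation (which was in .NET):

```python
import datetime
import threading

def log_event(component, message, stream):
    """Write one log line with an ISO-8601 UTC timestamp and thread id.

    The fixed field order and timestamp format are illustrative; any
    consistent convention allows post-run correlation across threads.
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    thread_id = threading.get_ident()
    stream.write(f"{stamp}\t{thread_id}\t{component}\t{message}\n")

def correlate(*per_thread_lines):
    """Merge log lines from several threads into one timeline.

    This works only because every line begins with the same
    lexicographically sortable timestamp format.
    """
    return sorted(line for lines in per_thread_lines for line in lines)
```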
The information needs to be consistently formatted and clear in intent. Err on the side of verbosity in error messages and other text: a message that is too brief doesn't help anyone.
Consistent logs can be interpreted by a new member of the team with a little product knowledge. Really consistent logs can be compared in an automated way after the test run. For metaautomation, that's what we need! With logging APIs that use team-specific enumerated types and custom classes, the constraints are expressed in IntelliSense and enforced by the compiler, which frees people to apply their human intelligence to deciding what important information gets logged. (This may be the subject of a future post…)
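A sketch of an enum-constrained logging API, translated to Python for illustration (the component and severity labels are hypothetical; in the author's .NET version these would be C# enums, so a typo fails at compile time rather than at runtime):

```python
from enum import Enum

class Component(Enum):
    # Hypothetical team-specific labels; add one entry per subsystem.
    LOGIN = "Login"
    CHECKOUT = "Checkout"

class Severity(Enum):
    INFO = "Info"
    ERROR = "Error"

def log(component: Component, severity: Severity, message: str) -> str:
    """Format a log line using only the shared enumerated labels.

    Accepting only enum members keeps labels consistent across the team;
    a free-form string is rejected rather than silently misspelled.
    """
    if not isinstance(component, Component) or not isinstance(severity, Severity):
        raise TypeError("use the shared enumerated types, not raw strings")
    return f"[{severity.value}] {component.value}: {message}"
```

Because the labels come from one shared type, a post-run comparison tool can match on them exactly instead of fuzzy-matching free text.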
Logs (XML, text stream, or something in between, e.g. name-value pairs) are best stored on a WebDAV server (or an FTP server) for later use, along with dumps, legacy logs, etc. A person or automated process can then reference all the artifacts with read-only access for analysis after the test run!
If a tester does the analysis, it happens after the test run. Automated analysis is less risky after the test run too; doing it inline can delay the test, or make it more fragile by adding points of failure from the analysis process.
I’ve implemented an automated analysis system in which the artifacts are reduced to N properties per test run. The number and names of the properties, and the order of evaluation, are configurable. The property values are derived by an executable module that depends only on the artifacts and does its own internal error handling. In my case, this is a .NET assembly that exports public methods whose names map to the “Name” attribute in the XML template that defines the properties. The system is extensible by updating the XML template and the .NET assembly (and restarting the server process, if necessary, to pick up changes).
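The template-to-method mapping can be sketched as follows. This is a Python stand-in for the author's .NET design, with reflection (`getattr`) playing the role of the assembly's exported public methods; the template contents, property names, and artifact keys are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML template: each <Property Name="..."/> names a derivation
# method, and document order is the order of evaluation.
TEMPLATE = """
<Properties>
  <Property Name="FailureType"/>
  <Property Name="MachineName"/>
</Properties>
"""

class ArtifactAnalyzer:
    """Derives one value per configured property from a run's artifacts.

    Each Name in the template maps, by reflection, to a public method;
    extending the system means editing the template and adding a method.
    """
    def __init__(self, artifacts):
        self.artifacts = artifacts  # e.g. parsed logs, dump paths

    def FailureType(self):
        return self.artifacts.get("failure_type", "Unknown")

    def MachineName(self):
        return self.artifacts.get("machine", "Unknown")

    def derive_properties(self, template_xml):
        results = {}
        for prop in ET.fromstring(template_xml).iter("Property"):
            name = prop.get("Name")
            method = getattr(self, name, None)
            try:
                results[name] = method() if method else "NoDerivation"
            except Exception:
                # Internal error handling: one bad derivation must not
                # break the whole analysis pass.
                results[name] = "DerivationError"
        return results
```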
Every test run that meets certain criteria (in my case, I did this for one type of test only) gets reduced to valid XML stored as a row in a table in the automated-analysis database. Any failure can be associated with an actionable, and after that association, the automated system can correlate the actionable with a later test failure! This answers the question “have we seen this failure before?” with a decimal or floating-point value for correlation strength.
This takes human input, at least for a while: people have to enter the action item association with a test failure. No system that I know is smart enough to do this reliably. I quote Michael Bolton on this: http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing/ “Sapient activity by [a human] … is needed to determine the response.”
How the different properties are weighted can vary depending on the test failure and the actionable. For example, if a resolution action item is specific to the (virtual) machine that runs the test, then the identity of that machine is significant to the correlation strength; otherwise it’s insignificant and should not be factored in.
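One way to compute such a per-actionable weighted score, again as an illustrative Python sketch (the property names and weight values are assumptions, not the author's schema):

```python
def correlation_strength(failure_props, actionable_props, weights):
    """Weighted match score between a new failure and a known actionable.

    `weights` is chosen per actionable: e.g. give "MachineName" weight 0
    unless the action item is specific to the machine that ran the test.
    Returns a float in [0, 1].
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        w for key, w in weights.items()
        if failure_props.get(key) == actionable_props.get(key)
    )
    return matched / total
```

With machine-agnostic weights, a failure on a different VM still correlates fully with the known actionable; with machine-specific weights, the VM mismatch drags the score down.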
This system might seem a bit heavy for some applications, because after all… we’re “just” testing, right? I agree with James Bach and the Context-Driven school of testing (http://www.context-driven-testing.com/): “The value of any practice depends on its context.” If you’re creating a game about exploding birds, this approach isn’t appropriate. However, if you’re writing high-impact applications, e.g. involving financial transactions, I submit that it is appropriate for getting value out of your automated tests and making real, significant failures quickly actionable.
In practice, very few test failures in such a system will point to product failures; most point to other actionables, or can be ignored temporarily. (This is the subject of a future post.) But in order to draw attention to the important failures and act on them quickly, your system (including all automated systems and the testers who work with them) needs to be able to tell the difference even more quickly.
With automated tests running often, it’s very easy for humans to get overwhelmed with results, and they respond with coarse general strategies in order to cope. People are smart, but they’re not very good at repetitive work. That’s why you need an automated system.