Wednesday, April 10, 2013

For Your Quality Customers, Add Value with Every Change


Who are your customers?

Of course, you’re developing software for the end user. But the other members of your team are your first customers.

I’ve written about many advanced software quality techniques that can reduce risk and strengthen your quality story. This is how the Test team can best add value and make the Devs more productive, meaning that you can ship faster and with lower risk!

To inspire trust and reduce risk to the team, every change set that goes into the product must add quality value, that is, information about the quality of the product.

The problem is, very few software projects are starting anew. Most have some quality infrastructure, maybe some copy/pasted scripts, maybe a set of test cases that are run manually. The team members are customers of this existing infrastructure.

So, existing quality assets such as these must be maintained or replaced. At every change to documents or code, value is added, never taken away.

This is important for the same reason that failures in test code, or failures that are perceived as being caused by the test code, must be fixed ASAP: the quality knowledge of the product must always advance and improve. If it does not advance, due to dropped coverage from “old” test infrastructure or tests that fail so often they’re perceived as not worth fixing, then parts of the product are not tested anymore, and knowledge and stability of the product are lost. This kind of project rot must be avoided.

Every change and every addition to product quality infrastructure, no matter how sophisticated, agile, actionable, self-reporting, etc., must add to existing knowledge of the product.
This makes a strong and productive team: mutual respect and attention to keeping the quality moving forward.

Tuesday, April 9, 2013

An Organization and Structure for Data-Driven Testing


This post follows up on the one from yesterday:


So, data-driven testing is the way to go for a huge return on finding and regressing product issues and measuring the whole quality picture. How to start?

I like XML files to do this. Here are some reasons:

1. Your favorite text editor will work for reviewing, editing, and extending your test set.

2. If some of the values for a test are optional and you provide defaults as needed, the XML can be “sparse” and even easier to read and edit. The data that drives the test also expresses the focus and reason for the test, in the data itself!

3. You can be as constrained or as loose with the data schema (the layout of the XML data) as you want.

4. Extending your data engine can be as simple as allowing and parsing different values. For example, for testing with objects that include pointers or references, you can put “null” as a value in your XML and have your engine parse and use that for the test, in the context as defined in the XML.

There are many engines that help with data-driven tests, or with some time and skill, you can write your own.
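For a concrete flavor of the “write your own” option, here is a minimal sketch in Python. Everything in it is invented for illustration: the element and attribute names, the default values, and the add_under_test stand-in for the real product code. A real engine would target your own product and schema.

```python
# Minimal sketch of a home-grown XML-driven test engine; names are illustrative only.
import xml.etree.ElementTree as ET

SAMPLE = """
<tests>
  <!-- Sparse data: each test spells out only the values that matter to it. -->
  <add a="2" b="3" expected="5" label="simple positive case" />
  <add a="-1" expected="-1" label="b falls back to the default of 0" />
  <add a="null" expected="error" label="negative test with a null input" bug="12345" />
</tests>
"""

def parse_value(text, default=0):
    """Turn an attribute string into a test input; missing means default, 'null' means None."""
    if text is None:
        return default
    if text == "null":
        return None
    return int(text)

def add_under_test(a, b):
    """Placeholder for the real functionality being driven by the data."""
    if a is None or b is None:
        raise ValueError("null input")
    return a + b

def run_all(xml_text):
    for case in ET.fromstring(xml_text).findall("add"):
        a = parse_value(case.get("a"))
        b = parse_value(case.get("b"))
        expected = case.get("expected")
        try:
            outcome = "pass" if str(add_under_test(a, b)) == expected else "FAIL"
        except ValueError:
            outcome = "pass" if expected == "error" else "FAIL"
        bug = case.get("bug")
        note = f" (see bug {bug})" if bug else ""
        print(f'{outcome}: {case.get("label", "")}{note}')

run_all(SAMPLE)
```

Note how the second, sparse test relies on a default for b, and how the third test carries its bug number right in the data.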

To make the tests more readable and extensible, use different XML files to drive different kinds of tests – e.g. positive vs. negative tests, or scenario A vs. scenario B vs. scenario C. With appropriate labels, comments, error messages and bug numbers inline with the data for the individual test, all your tests can be self-documenting and even self-reporting, freeing you from maintaining documents with details about the tests and removing that source of errors and potential conflicts.

A relational database is a more powerful way of handling large amounts of structured data. This would be a better choice, for example, if you were doing fuzz testing by generating large numbers of tests according to your randomization scheme, and then saving to and executing from a SQL database. Even with fuzz testing, it’s very important that tests be as repeatable as possible!
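As a hedged sketch of what repeatable fuzz testing could look like (SQLite stands in for whatever relational database you use, and the generated inputs and final check are placeholders): generate the cases from a fixed seed, persist them, and execute from the stored rows so exactly the same tests can be re-run later.

```python
# Sketch: seeded fuzz-case generation persisted to a relational store for repeatable runs.
import random
import sqlite3

def generate_cases(seed, count):
    rng = random.Random(seed)  # a fixed seed makes the generated set reproducible
    for case_id in range(count):
        yield (case_id, seed, rng.randint(-1000, 1000), rng.randint(-1000, 1000))

conn = sqlite3.connect("fuzz_cases.db")
conn.execute("""CREATE TABLE IF NOT EXISTS cases
                (id INTEGER, seed INTEGER, a INTEGER, b INTEGER)""")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)",
                 generate_cases(seed=42, count=100))
conn.commit()

# Later, or on another machine: execute exactly the tests that were generated and stored.
for case_id, seed, a, b in conn.execute("SELECT id, seed, a, b FROM cases"):
    assert isinstance(a + b, int)  # placeholder for the real product check
conn.close()
```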

 

Monday, April 8, 2013

The Power of Data-Driven Testing


This post assumes a focus on integration and end-to-end testing of the less-dependent parts of a product, where the greatest quality risks are found: in the business logic, data or cloud layers. See this post for a discussion of why this is most effective for a product that has important information: http://metaautomation.blogspot.com/2011/10/automate-business-logic-first.html

Automated testing usually involves some inline code in a class method. A common pattern is to copy and paste code, or create test libs with some shared operations and call the libs from the test method. The tests correspond to the methods 1:1, so 50 automated tests look like 50 methods on a class with minor hard-coded variations between repeated patterns in code.

For repeated patterns like this, there’s a much better way: data-driven testing.

Data-driven tests use a data source to drive the tests. Within the limits of a pattern of testing as defined by the capabilities of the system reading the data to drive the test, each set of data for the pattern drives an individual test. The set of data for each test could be a row in a relational database table or view, or an XML element of a certain type in an XML document or file.
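To make the contrast with the 1:1 method pattern concrete, here is a small sketch using pytest’s parametrize as the data-reading mechanism; the rows and the is_valid_email function are made up, and the same idea applies to rows from a database table or elements from an XML file. One test body, many rows of driving data, instead of fifty near-identical methods.

```python
# Sketch: one test body, many rows of driving data (pytest).
import pytest

# Each tuple is one test: the input plus the expected result.
ROWS = [
    ("alice@example.com", True),
    ("no-at-sign",        False),
    ("",                  False),
]

def is_valid_email(text):
    """Stand-in for the real product logic under test."""
    return "@" in text and len(text) > 2

@pytest.mark.parametrize("address,expected", ROWS)
def test_email_validation(address, expected):
    assert is_valid_email(address) == expected
```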

Why is this better?

For one, agility. The test set can be modified to fit product changes with changes in the test-driving data, at very low risk. It can also be extended as far as you want, within limits described by how the data is read.

Along with agility comes readability, meaning that it’s easy for anyone to see what is and is not tested in a given test set. It’s easy to verify that the equivalence classes you want covered are represented, that the pairwise sets are there, that boundaries are checked with positive and negative tests, and so on.

To help readability, you can put readable terms into your test-driving data. Containers can be given “null”, an integer count, or something else. Enumerated types can be given as a label used in the type, say “Green,” “Red” or “Blue”, or as an out-of-range integer like -1 or 4 for negative limit tests.

Best of all, failure of a specific test can be tracked with a bug number or a note, for example, “Fernando is following up on whether this behavior is by-design” or “Bug 12345” or a direct link to the bug as viewed in a browser. When a test with a failure note like this fails, the test artifacts will include a note, bug number, link or other vector that can significantly speed triage and resolution.
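A minimal sketch of that idea (the data layout, the bug number, and the clamp_to_zero stand-in are all invented): the note travels with the row and gets folded into the assertion message, so it lands directly in the failure artifacts.

```python
# Sketch: a bug note travels with the test data and shows up in the failure output.
CASES = [
    {"input": -1, "expected": 0,
     "note": "Bug 12345 - Fernando is checking whether this is by design"},
    {"input": 4,  "expected": 4, "note": ""},
]

def clamp_to_zero(value):
    """Stand-in for the product behavior under test."""
    return max(value, 0)

def run(case):
    actual = clamp_to_zero(case["input"])
    # If this assertion fails, the note (bug number, link, or follow-up owner)
    # is printed as part of the failure, speeding triage.
    assert actual == case["expected"], (
        f"expected {case['expected']}, got {actual}; {case['note']}"
    )

for case in CASES:
    run(case)
```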

The next post has some notes on organization, structure, and design for data-driven tests.

Tuesday, November 6, 2012

MetaAutomation Grows Up: New, Refined Definition

I'm preparing for my talk tonight at the SeaSPIN meeting in Bothell, WA, with details on the site here:
 
 
This is the first time I've presented the material in this medium (a one-hour talk), and in preparing, I have a new and more refined definition:
 
Metaautomation is a meme of well-known practices and interrelated software technologies, used to communicate, plan, and execute on scaling automation up in strength and effectiveness, and on integrating software quality more effectively into the SDLC.
First-order metaautomation describes technologies applied at automation runtime.
Second-order metaautomation includes techniques applied to artifacts of one or more automated tests at some time after a test run is complete.

If you're in the neighborhood, be sure to vote, then come on by!
 

Wednesday, October 3, 2012

Metaautomation and the Death of Test, part 2: the Quality Profession



One of the reasons you want to keep testers around is that their motivations are very different from the devs’. Devs want to check in working code, to earn their magnetic ball toys and the esteem of their peers. Testers want to write highly actionable automation – the topic of this blog – and measure quality so the team can ship the code, but especially to find good bugs, to earn their wooden puzzle toys and the esteem of their peers.

Here’s my post on the Metaautomation meme http://metaautomation.blogspot.com/2012/08/the-metaautomation-meme.html for describing how automation can provide the best value for the SDLC (software development lifecycle).

Automation is just part of the quality picture, but an important one. Many years ago, all STEs at Microsoft were encouraged to become SDETs – i.e., to learn how to automate the product – because Microsoft recognized the importance of having quickly and accurately repeatable regression of product behavior.

Now, if automating the product – make it do stuff repeatedly! – is all there is, then it’s reasonable to suppose that devs can take a little time out of their normal duties to automate the product. But of course, that takes time – sometimes a surprising amount of time – and they have to maintain the automation as well, or just ignore it when it starts failing, which makes it worse than useless.

The idea that all you have to do is simple automation, with no care towards making failures actionable, is myopic IMO (although perhaps attractive to the business analyst). I address this in more detail here http://metaautomation.blogspot.com/2011/09/intro-to-metaautomation.html.

This post addresses the importance of testing to the SDLC http://metaautomation.blogspot.com/2012/01/how-to-ship-software-faster.html.

This post is about managing for strong, actionable automation and looking forward to second-order metaautomation http://metaautomation.blogspot.com/2012/08/managing-metaautomation.html.


Not all of these techniques are completely new. Some are practiced in corners of Microsoft, and (I’m told) Amazon. The metaautomation meme just makes it easier to describe and communicate how far the team can (and in some cases, should) go to make the quality process more powerful.

Metaautomation is the part of the test profession that is expressed in automation.

 

Are there other labels that people use to describe what I call first- and second-order metaautomation? Please comment below and I will respond.

Metaautomation and the Death of Test, part 1: No Actually, you need Test


There’s a meme going around, mostly out of Google it seems, that “Test is Dead.”

The prototypical example of this is gmail. The special SDLC (software development lifecycle) of gmail, for purposes of this meme, goes like this: devs write some feature. Devs do unit testing and some E2E testing, check in, and move on. The code is deployed to live on some (not all) web servers on the farm. End users notice issues; end users have nothing else to do, so they use the “Report a bug” control on the page to send a bug back to Google; Google receives the bug, and the report is well-written with sufficient detail, but not too much, so the bug can be prioritized and potentially fixed. Tada! Testing has been outsourced to customers.

… except that the conditions that must be true for such a scenario to work tightly limit the applicability of this technique. See for example this link, which discusses the security implications of this approach:  http://metaautomation.blogspot.com/2011/10/no-you-cant-outsource-quality-detour.html. The end-users must know exactly what to expect from whatever product it is, and they’re not going to read a manual or requirements spec, so the functionality must be a reworking of some well-known idea, say, an in-browser email client or an online game of checkers. No automation is available, so regressions might go undetected for a while and be more expensive to fix than otherwise, and fixing a regression might even break the feature added by the code changes that caused the regression in the first place. Clearly, this technique is much too risky for important or mission-critical data, e.g. financial or medical data.

But, there’s one idea here that does work and is worth elaborating: devs are accountable to do some degree of E2E testing.

Why is E2E testing important for devs? Can’t they just toss code over the wall, after unit tests pass, and let testers deal with it? After all, that’s their job… but testers have better things to do, which is the topic of part 2 http://metaautomation.blogspot.com/2012/10/metaautomation-and-death-of-test-part-2.html 

Imagine if a dev implements a feature, sees that unit tests pass, thinks “good enough” and checks it in. Assume that E2E tests are not written for this feature, because hey, it’s a brand-new feature. Build of the product in the repository succeeds. (Assume the team is not doing branch development.) Whoever tests that build first finds some issues, writes bugs, and puts them on the developer’s plate. The dev eventually sees the bugs, theatrically slaps his/her own forehead, repros the bug, and with minimal research, fixes it. If the bug isn’t tended to for a week, this is even more expensive, because the code associated with the bug might not be so familiar to the dev, so it would require more research to fix the issue.

It would be MUCH better if the dev tested out the feature first with some E2E scenarios before the checkin, or had the tester take the changeset (using Visual Studio’s TFS, this is a “shelveset”) and do some testing of the changes, to find the issues before checkin. Why better? Because a) the fix will be quicker to do, b) there’s no need to enter bugs for the record, and c) nobody need be hindered by the broken functionality of the issues, because they’re never checked in. Oh, and d) triage doesn’t have to look at the bugs, because there aren’t any reported bugs.

Another useful way to address this is to check in tests for the feature at the same time that the feature is checked in, which means that whoever wrote the E2E tests (probably a tester) combines that changeset with the product feature change. This can save a lot of churn, and the symmetry of checking in the combined feature and quality tests looks simple and robust. The problem comes when the feature automation is not ready at the same time as the feature, so checkin of the feature would be held back. That might slow the dev down, and for a complex product, there are likely other stakeholders (dev, test, and PMs) waiting on the changes, so the cost of waiting must be compared to the value of doing a unified dev + test checkin.

Therefore, the dev should be expected by team convention to do some amount of E2E testing of a new feature. How much?

For simplicity of argument, assume for the moment that nobody from Test is involved before checkin.

Too little testing on the dev’s part, and the dev won’t find his/her obvious, blocking bugs. (“Blocking” means that functionality is broken and breaks a scenario or two around the new feature, so some testing and other use of the product is blocked.) Too much, and the feature is delayed, along with other features and other work that depends on the feature.

I asked this question – how much testing by the devs? – of James Whittaker, when he gave a talk last month at SASQAG in Bellevue, Washington.

(Inline references: James’ blog is here http://blogs.msdn.com/b/jw_on_tech/. SASQAG is here http://www.sasqag.org/. )

James’ answer was that it depends on the dev’s reputation for quality. Fair enough, but I’d prefer to start out with formal, uniform expectations and relax them for individuals as they earn the team’s trust:

First, have the test team define repeatable E2E test cases for the new feature being implemented. These test cases are to be used through the SDLC and beyond, so you might as well write them earlier in the cycle than they normally are written. Give the test cases sufficient detail that anybody who knows the product can run them, and make them precise enough that distinct bugs are always correlated with different test cases.

Then, have the devs execute the test cases when they think the feature is ready. If the feature is non-GUI (e.g. an SDK API) then maybe the E2E test can be implemented easily too, and the test run that way, before checkin and then afterwards for regression. If it’s a GUI feature e.g. in a web browser, probably the feature can’t be automated before implementation is complete.

I recommend a minimum of two happy-path test cases, one edge case if applicable, and one negative case. It’s expected at project outset that a) a tester writes the test cases before the feature is implemented, and b) the dev (and maybe the tester too) runs the test cases before checkin.
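For a non-GUI feature such as an SDK call, that minimum set might look something like this sketch. The transfer_funds API and its rules are hypothetical, just to show the shape of two happy-path cases, one edge case, and one negative case:

```python
# Sketch of the minimum pre-checkin E2E set for a hypothetical SDK call:
# two happy-path cases, one edge case, one negative case.
from decimal import Decimal

def transfer_funds(balance, amount):
    """Stand-in for the real SDK API under test."""
    if amount <= 0 or amount > balance:
        raise ValueError("invalid transfer amount")
    return balance - amount

def test_happy_path_small_transfer():
    assert transfer_funds(Decimal("100.00"), Decimal("25.00")) == Decimal("75.00")

def test_happy_path_repeated_transfer():
    remaining = transfer_funds(Decimal("100.00"), Decimal("40.00"))
    assert transfer_funds(remaining, Decimal("40.00")) == Decimal("20.00")

def test_edge_case_entire_balance():
    assert transfer_funds(Decimal("100.00"), Decimal("100.00")) == Decimal("0.00")

def test_negative_case_overdraw():
    try:
        transfer_funds(Decimal("100.00"), Decimal("200.00"))
        assert False, "expected a ValueError for an overdraw"
    except ValueError:
        pass
```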

This will save the whole team a lot of time, but especially the testers… for the good of the product, they should be extremely busy anyway, which is the topic of part 2 of this post. http://metaautomation.blogspot.com/2012/10/metaautomation-and-death-of-test-part-2.html 

Thursday, August 30, 2012

The MetaAutomation Meme


The word “Meme” was coined by British evolutionary biologist Richard Dawkins to describe the spread of ideas and cultural phenomena, including cultural patterns and technologies.

Metaautomation describes a set of techniques and technologies that enable a view of software quality that is both deeper and broader than is possible with traditional software automation alone. Given sufficient investment, this can be taken further to do smart automated test retries, and even automated triage and pattern detection that wouldn’t be possible with traditional techniques.

For the more advanced metaautomation concepts, the investment and risk are greater, and the potential reward in terms of team productivity is much greater. So, I’m dividing the meme into two parts:

· First-order metaautomation: making test failures actionable, and minimizing the chances that a debugging session is necessary to find out what happened

· Second-order metaautomation: creating some degree of automated triage, automated failure resolution, and automated smart test retry
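As a rough, hedged illustration of the split (none of this is a real harness; the helper names and the looks_environmental predicate are invented), first-order work enriches what a failing test records at runtime, and second-order work consumes those artifacts after the run, for example to retry only failures that look environmental:

```python
# Rough sketch only: first order = richer failure artifacts at runtime,
# second order = acting on those artifacts after the run (e.g. a smart retry).
import time

def run_with_artifacts(test_fn):
    """First order: record what happened, step by step, so a failure is actionable."""
    artifacts = {"steps": [], "error": None}
    try:
        test_fn(lambda step: artifacts["steps"].append((time.time(), step)))
    except Exception as exc:
        artifacts["error"] = repr(exc)
    return artifacts

def smart_retry(test_fn, looks_environmental, max_retries=1):
    """Second order: inspect the artifacts after the run and retry only likely flakiness."""
    artifacts = run_with_artifacts(test_fn)
    retries = 0
    while artifacts["error"] and looks_environmental(artifacts) and retries < max_retries:
        retries += 1
        artifacts = run_with_artifacts(test_fn)
    return artifacts, retries
```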

 

Metaautomation is an innovation analogous to the simple structural arch: before arches, the span and strength of bridges were limited by the tensile strength (resistance to bending) of the spanning material. A famous example of this technology is the North Bridge in Concord, Massachusetts.


But with arches, the span and strength are limited by the compressive strength of the material used. This works well with another common building material – stone – so the technology allows much more impressive and long-lasting results, for example, the Alcantara Bridge in Spain.


The techniques of metaautomation did not originate with me, but in defining the term and establishing a meme for the pattern, I hope to make such techniques more understandable and easier to communicate, easier to cost and to express benefits for within the software process, and therefore more common.

The first order of metaautomation will become very commonly used as the value is more widely understood. The second order of metaautomation is good for large, distributed and long-lived projects, or where data has high impact e.g. health care or aviation systems.

Wednesday, August 29, 2012

Managing MetaAutomation


“If you can’t measure it, you can’t manage it.”

This quote has been attributed to Peter Drucker, Andy Grove, Robert Kaplan, and who knows who else. Oh, and me. I said it, so put me down on the list too.

The common measurement of automation is the number of test cases automated. Since what management measures is what management gets, one result of this practice can be an antipattern:


a product scenario is exercised, probably to completion, but confidence about that completion can be elusive. In case of any kind of failure, a very significant investment is required of the test developers to follow up and resolve the failure to an action item. That can cause team members to procrastinate on resolving the failure, because that’s not what’s being measured, and the behaviors addressed by the failing automated tests get ignored for a time, which in turn creates project risk because the product quality measurement provided by test automation is disrupted.

How does one encourage the correct behaviors to get robust automation with strong, scalable value towards measuring and regressing product quality – and positively measure the team members’ behaviors, too? I’m talking about metaautomation, of course, and how to encourage progress towards metaautomation in output from the team. Here are some thoughts on useful performance metrics towards that end.

Some goals for your team:

· advance the effectiveness of test automation to achieve quick and effective regression detection

· achieve quicker and more accurate triage to keep needless work off people’s plates

· reduce wasted time for everybody on poorly-defined failures

(that is first-order metaautomation, the topic of a future post)

… and beyond that, where a deeper investment in quality is warranted, look forward to

· smart automated test retry

· some degree of automated triage

(this is the second order of metaautomation, to be covered in more detail, also in a future post)

I think improving team spirit and cohesion, and improving technical learning in your individual contributors, can be achieved at the same time. In order to get there, measurement of performance in these areas must be combined with other management metrics used for assessing individual performance.

Metaautomation-friendly practices accelerate the test automation rate during the automation project as classes, coding patterns and other structures are put into place. For example: Given two projects, one doing simple minimal automation (call it project A) and the other doing metaautomation to the first order (project B), project A will start out faster but will suffer over time from failed tests that are either neglected, causing blind spots in software quality, or failed tests that take significant investment to get them working again. Project B will eventually overtake project A in rate of successfully running automation, and probably eventually in raw numbers of tests automated. In project B, the quality value of running tests is much greater because the test failures won’t be perceived by the team as time-sucking noise. I covered this topic pretty well in previous posts. All team members need to understand this foundational concept.

So, how do we make metaautomation qualities (in performance of test team members) measurable at test automation time?

First, you can bring the team up to agreed-on code standards. Most projects have preexisting code, so defining and implementing the standards is probably going to be an iterative process.

This can also be a team-strengthening collaborative process. For a large project, have everybody read existing code standards (if they exist) and propose additions or changes - offline to save time. Minimally, everyone will learn the code standards, but much better, they have some ownership in improving the standards, through an email thread or wiki. This shouldn’t take a lot of time, and is a great opportunity for team members to learn team practices and show their ability to contribute to the team while learning how to write more effective, readable, maintainable, metaautomation-friendly code themselves. In Test, this allows them to feel more ownership than testers normally have AND emphasizes team contribution and learning.

Peer code reviews are an even better opportunity for team members to communicate, learn from and influence each other with respect to these coding practices and standards. Just as it’s important for testers to learn the whole project, they benefit from learning the whole team as well, and I advocate that everybody get chances to review others’ work as an optional or required reviewer. This is another opportunity to bring out team players, bring the team together, and give introverts opportunities to reach out with two-way communication and learning. Testers should be encouraged to push for testability in the product code, and qualities of metaautomation – per the earlier team agreement – in test code. Suggestions must be followed up on, not necessarily in the code itself, but it’s important for everybody on the team to recognize that they are all learning and teaching at the same time. No cowboy code allowed!

For example: in the case of discussing a topic for which developer Foo is much more knowledgeable than developer Bar, developer Foo is expected to provide some educational assist to Bar, e.g. a link and some context. Foo and Bar will both benefit from a respectful transfer of information: Foo from the greater understanding that comes through the teaching process (however minimal), Bar from the learning, and both of them from team cohesion.

See what techniques testers can come up with to improve visibility into the root cause of any one failure – i.e., if a test fails for some specific reason, is it easy to find the root cause by inspecting the output, the artifacts of the failed test case run?

Encouraging everybody to communicate with each other in terms of the code will accelerate learning and improvement all around, and if done right, will improve team cohesion as well. It will also bring out the value of the individual contributors as team players, and since team members will all figure out that this is one thing that management is noticing, they’ll do their best to help each other out and not default to isolation.

I think this is a great opportunity for positive reinforcement from the test lead or manager; not singling out an individual for praise, which can have negative effects on morale, but rather noting and raising the visibility of ways in which the team can achieve things through teamwork, that none of the individuals on the team could achieve. Positive reinforcement is appropriate here because the encouraged behaviors are associated with learning, collaboration, and innovation.

Here are summary steps to strengthen your team using principles of metaautomation:

1. Establish that the pro-metaautomation behaviors described here are expected

2. Encourage and give positive reinforcement at a team level

3. Make measurements of contributions and integrate these measurements with other metrics and expectations used in evaluating performance

Using these as a guide, you can make metaautomation manageable, and lead your team to new strengths in promoting a quality software product.

Thursday, July 19, 2012

The Legacy of Stephen R. Covey: testing with Character, not Personality


Stephen R. Covey, author of the hugely successful book "The 7 Habits of Highly Effective People," died July 16th 2012.

I picked up his book because I realized that I need to nurture my own leadership skills in order to promote product quality at the next level. Working as an individual contributor doesn't cut it anymore; I need to influence the big picture, not patch up automation projects that are already established according to the patterns of software quality as it's generally practiced today.

Serendipity! There it is, in the first full chapter of his book. Covey pithily describes his journey of discovery through American self-help literature as a tension between two approaches toward being personally effective: The "Character Ethic" and the "Personality Ethic."

The Character Ethic is deep and foundational, but not always obvious on the surface behaviors of a person. The Personality Ethic displays on the surface, but doesn’t necessarily have any depth.

Covey dismisses the Personality Ethic approach to increasing personal effectiveness as beneficial in some settings, but weak in the long term and perhaps even damaging, whereas the Character Ethic flows from a person's basic values and practices.

He describes his own paradigm shift, triggered by a parenting challenge with one of his sons. Covey used the personality ethic as taught in his day to guide his parenting style, but he transitioned to the character ethic (as described by Benjamin Franklin) with good effect. The Character Ethic requires taking responsibility, and Covey shows that his paradigm shift is complete when he takes responsibility for his actions regarding his son.

I've seen this happen at many software teams: people do software quality by emphasizing a basic automation of the product. The result is N test cases automated, and M of them pass. What happens with the N-M test cases automated that do not pass is not considered important, except that the team recognizes the need to get these test cases running again at some point. The highest priority - I've seen this priority be placed even higher than fixing "broken" test cases - is automating more test cases. The count of test cases automated is a superficial measure of success and productivity in measuring the quality of the product, but it's what they are measured on. Management needs a metric, and this is the best or most usable one they know of.

Covey's shift from personality ethic to character ethic is a pretty good analogy to the shift away from the automate-as-many-test-cases-as-you-can focus, towards the deep automation that really tests the product, and attempts to maximize the actionability of test failures. (The first approach might use a minimal positive-result verification like this: the automated test didn't throw an exception, therefore it's good.)

Focusing on automated test count – that is, the traditional way of automating the product – has value in the near term because it really does make the product do stuff and you can find a lot of product bugs through the process of creating automation. But, in the longer term with this approach, the inevitable failures of these tests are costly to follow up on and often the team gets an inaccurate picture of how completely the quality of the product has been measured because the automated test might hide failures or even be testing the wrong thing. I’ve even seen a very expensive 3rd-party test harness report success on a test, when on detailed follow-up, I found that the test didn’t do anything meaningful.

Test automation with attention to metaautomation takes some up-front test harness design and more careful test automation with attention to patterns of failure reporting, but in the longer run a better and more complete measurement of product quality happens; test failures due to product issues are more actionable and are likely to be followed up on quickly. Trust in test automation code is higher, which enables the team to be more productive. Most importantly: failures due to regressions in the product are fixed more quickly, which keeps quality moving forward and reduces risk associated with product changes.

A test lead might ask these questions of someone automating a test: In how many different ways might this automated test fail, and what happens when it does fail?

Covey's character ethic, applied to software quality, makes for stronger quality and stronger product character. 

Here is a related post on the need for test code quality: http://metaautomation.blogspot.com/2011/09/if-product-quality-is-important-test.html 

Wednesday, February 1, 2012

The Risk of Agile, and How to Mitigate

The agile software development process is all the rage these days, and for good reason: it manages the complexity inherent in software projects, minimizes WIP (work in progress), and embodies a bunch of other great values outlined here


I’d like to address the principles individually in a following post, but for now, I’ve experienced some real risks that follow directly from the agile process as commonly conceived and realized. I will address some of those risks here.

I’ve seen this in multiple workplaces: the project has been divided into close-knit teams, and the teams work in sprints of e.g. two weeks.  The sprint goals and costs are agreed on at the beginning of each sprint. Each team member takes on the sprint goals as they have bandwidth, and at the end of the sprint, the team does some simple demo to show the team’s work.

The general problem is, although these teams are ultimately highly interdependent - they all fit together to make the product! – they each end up developing local solutions to common problems.

Class types, XSD schemas, logging solutions, configuration files – either these are ultimately shared or they should be. If the former, software development risk is pushed back to the end of the cycle when everybody has to figure out how to integrate and work together. If the latter, there’s duplicated work with resulting cost and risk. (Such risk could show up late in the cycle: “I thought you added the streaming log option!” “I did, but over here. You guys developed your own logger, so you’re on your own.”)

The solution: Share resources early. Sorry, agilists, this requires a bit of planning. Decide early on what the shared resources are, and be aware of when the need might arise for a resource that might be shared. Put shared resources in a common location…

I’ve seen cases where this is not done, and it’s very expensive. Agile dislikes planning, but without a plan, for a project that involves engineers in the double digits, people eventually don’t know what’s going on, so they must guess, and often guess incorrectly. Confused and frustrated engineers can create an amusing setting (for those with a sense of humor about it), but it doesn’t ship quality software on schedule.

And remember: Duplication of information or resources is anathema to quality!