12 Practical Steps to Avoid Stabilization Sprints
It’s all about building quality in. A stabilization sprint is only needed if the product isn’t meeting your quality requirements.
These are practical steps that you can take as a project manager or scrum master. They don’t require you to launch a company-wide, top-down initiative. You can follow these steps one at a time, and see improvement with each step you take. And then others will see your results and ask you how you did it.
#1 Recognize That You Have a Problem
You’re creating poor quality, and having to rework it. It’s just like a poor manufacturing process. You build something, inspect it, find that it doesn’t meet requirements, and either rework it to meet requirements, or scrap it. You’ve written code, found that it’s unstable, and now you’re reworking it with a stabilization sprint.
We learned this lesson in manufacturing many years ago. Unfortunately, we had to learn it the hard way. Deming and other quality leaders taught these principles. The Japanese listened, and turned their manufacturing industries into the best in the world. Only then did the rest of us take notice and start to listen.
And now it’s happening all over again in software development. We argue that stabilization sprints (rework) are necessary. We rely too much on testing (inspection).
Software (the US) is different, we say. Bugs (defects) are unavoidable. Technical debt (inventories, delays) grows no matter what. Stabilization sprints (rework) are to be expected.
Can you hear the 20th-century echoes of the manufacturing industry?
We don’t ask ourselves what we need to change in how we write code in the first place. We can hardly imagine doing it any other way.
And when someone does come along and start to show us the way, we dismiss them. What they’re proposing is too disruptive, too costly, it just won’t work. We’re different. Deja vu. They ignored Deming for decades.
It’s time to apply the hard-fought lessons we’ve learned about quality in other industries to the development of quality software.
#2 Don’t Allow the System to be Unstable
First of all, even the skeptics have to admit that you don’t have to allow your production system to become unstable. You have a choice.
Even if your tests (inspection) don’t catch stability problems before they’re released, you should roll back as soon as you discover stability problems.
This simple suggestion launches a classic debate.
Immature organizations think that it’s OK (or sometimes even unavoidable) to fix broken systems in place. They allow the production system to be broken while they scramble and tinker. They’re in a hurry, and they think it’s better to fix it as fast as they can than to take the time to roll back. And it’s just too embarrassing to admit that you have to roll back.
Mature organizations, on the other hand, prioritize customer service and the stability of the system. They know that the customers are better served by having a system that’s working, even if they do have to wait a bit longer for those new features. This is incident management, and your priority is to get the system working again as quickly as possible.
If your release breaks the system, roll back to your previous release. Don’t try to fix it in place. And don’t just live with it until the next release. You need to discipline yourself to not tolerate the release of unstable systems.
You’ll need this discipline in the next steps that take you to a level of quality that will allow you to avoid stabilization sprints.
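The roll-back rule can be sketched in a few lines. This is only an illustration, not a real deployment tool: `ReleaseManager`, `deploy`, and the `is_healthy` callback are hypothetical names, and the health check stands in for whatever stability signal you actually trust.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReleaseManager:
    """Hypothetical tracker of deployed releases, newest last."""
    history: List[str] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def deploy(self, version: str, is_healthy: Callable[[str], bool]) -> str:
        """Deploy `version`; if it is unstable, restore the previous release.

        Returns the version that is live afterwards.
        """
        previous = self.current
        self.history.append(version)
        if is_healthy(version):
            return version
        # Unstable release: don't fix it in place, roll back.
        self.history.pop()
        if previous is None:
            raise RuntimeError("no previous release to roll back to")
        return previous


manager = ReleaseManager()
stable_versions = {"1.0"}  # assumed health-check results, for illustration
manager.deploy("1.0", lambda v: v in stable_versions)
live = manager.deploy("1.1", lambda v: v in stable_versions)
print(live)  # 1.0
```

The point of the sketch is the order of operations: the system goes back to a known good state first, and the fix happens afterwards, away from production.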
#3 Find the Root Cause
After you roll back, you’re naturally going to focus on finding and fixing the immediate issue so that you can re-release. Yes, this is rework. But we’re taking this a step at a time. If you’re having stability issues, you have to put out the fires first.
So go ahead, find and fix the issue, and then re-release the system.
Now comes the important part. To make progress toward having a more stable system in the long run, and avoiding these kinds of problems in the future, you can’t stop here. If you do, they’ll pile up until you do need a stabilization sprint.
Now you have to find and fix the root cause of the issue. What is the underlying problem that allowed this unstable code to be written, tested, and released without discovering it?
Notice that I’m not recommending you add more inspections and more testing to catch this type of problem in the future. You may need some of this, but you need to be careful not to add too much checking and testing.
This is what happened in the manufacturing industry. When defects weren’t found until the customer found them, we’d increase the depth and breadth of our quality inspections. More testing, more inspections, more items on the checklists. It’s a knee-jerk reaction that creates too much inspection, and gives quality control a bad name.
“Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. As Harold F. Dodge said, ‘You can not inspect quality into a product.’” (W. Edwards Deming, Out of the Crisis, p. 29)
Instead of adding more testing and inspections, go further up the value chain, and find the root cause of the problem.
Be careful not to focus on who caused the problem. Instead, focus on what caused it. What about the environment and processes could be changed to improve the quality and avoid this type of problem in the future?
#4 Don’t Interrupt the Developers
If you’ve worked as a software developer, you know about “the flow” or “the zone”. It requires a lot of concentration to write good code. There are so many variables and ripple effects to consider. It takes a few minutes to get into this state of concentration.
Break this concentration, and you dramatically increase the chances of bugs. Interrupt a software developer every 15 minutes, and they’ll get almost nothing done all day, and you’ll end up with lots of bugs, poor design decisions, technical debt, and so on.
Do everything you can to encourage and protect their ability to be in the zone. You need to minimize distractions. Out of the corner of their eye, they see someone walk past their cubicle. That’s a distraction. They hear a conversation down the hall. Another distraction.
Have you ever wondered why software developers prefer to work in dark rooms with no windows and away from everyone? It’s not that they’re anti-social. (Well, maybe they are, but that’s another issue.) They’re instinctively attracted to environments that minimize distractions.
If you put them in cubicles or open spaces, they’ll wear headphones to block out the noise. But the music can be a distraction, too. If I put on music, it helps to drown out the noise, but I don’t hear the music. After a few minutes, I’m in the zone. Hours can go by, and I won’t remember what’s played.
It’s like when you get home and realize that you don’t remember the drive. You had something on your mind, and you drove home on autopilot.
On the other hand, a new song starting up can interrupt your train of thought, and break the zone.
Headphones are just an attempt to compensate for a bad environment. It’s better to fix the environment.
Give developers their own offices. If you can’t do that, get the cubicle walls as high as possible. Let them put up their own visual barriers if necessary. Let them wear their headphones.
And don’t interrupt them when they’re working.
You could reserve the afternoons for the zone. Hold all meetings in the mornings. Collaborate in the mornings. No meetings and no interruptions are allowed in the afternoon.
Or you could flip it and reserve the mornings for the zone. Whatever you do, the idea is to reserve at least half the day for uninterrupted work.
#5 Do Code Reviews
Something magical happens when one developer explains their code to another developer. As they’re describing it, they suddenly realize something. You can see the “aha” in their face. A better way to do it. Or a ripple effect they hadn’t considered. Or a mistake they made (probably when they were interrupted).
This is even more common than the other developer finding something that could be improved. The simple act of explaining it to someone else causes you to see things you didn’t see before.
And, you’ll also get plenty of good input and advice from the other developer as well. They’ll see things differently, and help to catch defects and technical debt.
Code review will help you avoid the very problems that cause the need for stabilization sprints.
#6 Do Design Reviews
And why not do design reviews as well as code reviews? The same magic happens when two or three developers review proposed designs, explore options, and brainstorm ideas.
Better designs create less technical debt, are more flexible, and more stable.
#7 Develop Automated Regression Tests
Don’t overdo this one. The key is to identify the most critical paths through the system. For an eCommerce site, they would be:
- Browsing products
- Adding items to the cart
- Making a purchase
- Signing up
Create regression tests for these paths. Make sure your system passes these tests before you release. More than that, run these tests with every code check-in.
It’s far better to catch something that breaks a critical path right in the moment, when the details of the change are still fresh in the developer’s mind. And you won’t waste time searching through a sprint’s worth of changes to figure out which one broke it.
Don’t worry about the other paths. Regression tests need to be fast – like 5 minutes – so that you don’t have a barrier to running them with every code check-in.
If you want, you can have more regression tests, and you can run them at night. It won’t matter if these take an hour to run. In the morning, if the tests found a problem, you only have 1 day of check-ins to go through to find the source of the problem. That’s much better than waiting until the end of the sprint. Or worse, until it’s deployed and it breaks.
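Here is a minimal sketch of the fast, per-check-in suite, assuming each critical path can be wrapped in a function that returns True when the path works. The four check functions and `run_suite` are hypothetical placeholders; in practice each one would drive the real system.

```python
import time
from typing import Callable, List, Sequence, Tuple


# Hypothetical critical-path checks for the eCommerce example.
# Each placeholder would exercise the real system and report success.
def browse_products() -> bool:
    return True


def add_to_cart() -> bool:
    return True


def make_purchase() -> bool:
    return True


def sign_up() -> bool:
    return True


CRITICAL_PATHS: List[Callable[[], bool]] = [
    browse_products, add_to_cart, make_purchase, sign_up,
]


def run_suite(tests: Sequence[Callable[[], bool]],
              budget_seconds: float) -> Tuple[List[str], bool]:
    """Run every test; return failed test names and whether the
    whole run stayed within the time budget."""
    start = time.monotonic()
    failures = [test.__name__ for test in tests if not test()]
    elapsed = time.monotonic() - start
    return failures, elapsed <= budget_seconds


# Run on every check-in with a 5-minute budget.
failures, on_time = run_suite(CRITICAL_PATHS, budget_seconds=300)
print(failures, on_time)
```

Checking the time budget alongside the failures keeps the suite honest: if it drifts past 5 minutes, move the slower tests to the nightly run.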
#8 Consider Pair Programming
This may sound extreme, but in some cases it may be worth it. Years ago a friend and I designed and built an animation graphics library for educational software for IBM. We spent 6 weeks designing it in detail, coded it up, tested it, found and fixed 2 typos during 1 day of testing, and deployed it.
10 years later, it was still in use. In all that time, they never found a single bug. Not one.
And they never added a line of code to it. It did everything it was supposed to do – flawlessly.
#9 Create a Staging Environment
Some issues don’t show up until the system is under load in a production-like environment. Instead of waiting for these to show up in production, be proactive and create a staging environment that has the same setup as your production environment.
Deploy your release candidate to your staging environment, and use some testing tools to simulate a heavy production load. Lots of users logged in at once. Lots of transactions going on. Make sure that it will handle your busiest peaks of traffic.
Then, push it until it breaks. Know how much it can take. Investigate any limits that you think may become a problem. Fix them before they become a problem and require a stabilization sprint.
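“Push it until it breaks” can be sketched as a simple ramp, assuming you have some way to measure response time at a given number of simulated users. `find_breaking_point` and `measure_latency_ms` are hypothetical names, and the latency model in the usage lines is made up purely for illustration; a real run would call your load-testing tool against staging.

```python
from typing import Callable, Optional


def find_breaking_point(measure_latency_ms: Callable[[int], int],
                        max_latency_ms: int,
                        start_users: int = 100,
                        step: int = 100,
                        limit: int = 10_000) -> Optional[int]:
    """Ramp up simulated users until latency exceeds the threshold.

    Returns the first user count that breaks the threshold, or None
    if the system held up all the way to `limit`.
    """
    users = start_users
    while users <= limit:
        if measure_latency_ms(users) > max_latency_ms:
            return users
        users += step
    return None


# Made-up model: 200 ms base latency plus 1 ms per concurrent user.
def fake_latency_ms(users: int) -> int:
    return 200 + users


print(find_breaking_point(fake_latency_ms, max_latency_ms=2000))  # 1900
```

Compare the number you get with your busiest expected peak. If the margin is thin, that is a limit to investigate now.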
#10 Monitor Production
You need to spot and solve production problems before they happen. Monitor response time, memory usage, CPU usage, bandwidth usage, disk usage, disk faults, transactions per second/minute/hour, traffic, and so on.
Replace hardware before it fails. Increase memory and bandwidth before they’re exhausted.
Investigate episodes of poor response time before they spread.
Discover problems while they’re still small – before they need a whole stabilization sprint to deal with them.
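One way to act before a resource is exhausted is to project the trend forward. The sketch below assumes you already collect one usage reading per day (as a percentage of capacity) and that growth is roughly linear; `days_until_full` is a hypothetical helper, not part of any monitoring product.

```python
from typing import Optional, Sequence


def days_until_full(daily_usage_pct: Sequence[float],
                    capacity_pct: float = 100.0) -> Optional[float]:
    """Estimate days until capacity is reached, using the average
    daily growth across the readings. Returns None if there are too
    few readings or usage is not growing."""
    if len(daily_usage_pct) < 2:
        return None
    days_observed = len(daily_usage_pct) - 1
    growth_per_day = (daily_usage_pct[-1] - daily_usage_pct[0]) / days_observed
    if growth_per_day <= 0:
        return None
    return (capacity_pct - daily_usage_pct[-1]) / growth_per_day


# Illustrative disk usage readings over four days.
print(days_until_full([70, 72, 74, 76]))  # 12.0
```

An alert on “fewer than 30 days left” gives you time to add capacity calmly, where an alert on “95% full” starts an emergency.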
#11 Select the Right People for the Team
If your team doesn’t have the skills to do all this, then add people to the team who do. Or train the people you have.
If they aren’t willing, then you need to convince them, or replace them. It sounds harsh, but if your team isn’t willing to do things differently, then you’ll never get different results.
However, keep in mind that a lot of these concepts are initially counter-intuitive. After all, the US manufacturing industry resisted these quality concepts for decades. Even then, they only started adopting modern quality practices when they were forced to by the competition. Give people some time.
Some people will never give up on the idea that they need stabilization sprints. Others will get it right away. Most will take a bit of coaching and mentoring from you.
#12 Sharpen the Saw
Never stop learning and improving. Adopt continuous improvement as a personal habit. Help your team and your organization adopt a culture of quality. The benefits will go far beyond just avoiding stabilization sprints.
Learn more about quality principles, and implement what you learn. Nothing sells quite so well as proven results. The more success you have by applying sound quality principles, the more people will want to know how you did it.
For more ideas on how to have more success with your Scrum projects, check out Why Scrum Fails: The 2 Main Reasons.