Roman on Software Engineering: Scrum - Part IV: Predicting load and overhead

Continuing my series on Scrum and iterative planning, I'd like to touch upon a topic rarely mentioned in Agile talks and courses: disruptions and unplanned work.

Imagine the situation: you have a perfectly planned sprint, the burndown chart is flying towards its target at the bottom right like an arrow from Robin Hood's bow, and then you get a call from TechSupport: a critical, widespread defect is affecting several of your key accounts.
Immediate triage shows that the fix is not trivial and you need to divert two of your best engineers for at least a couple of days to provide a resolution.

The team and sprints are both small, so the Robin Hood analogy no longer applies. Actually, the chance of the team completing what they've committed to is fairly close to zero, especially once you're done with all the activities: triaging, fixing, building, releasing and dealing with the various e-mails.

Now, someone might say that a sprint is a sprint: yes, we may have incoming work, but it goes into the backlog rather than dropping straight into our lap. In fact, in a previous company of mine there was even an official line management goal derived by smart and well-meaning people: "There shall not be instances of working on non-sprint tasks".

Well, here's the news - sprints do not pay us money. Customers do. If I'm a customer of British Gas (fortunately, I'm not), my electricity supply stops, I call them, and get an answer such as: "your outage is not in our current sprint, so you go on the backlog", I won't be their customer for much longer.

(As a side point, this did happen to me with a major mobile network. Fortunately, I was still in the cooling-off period, and promptly cancelled the contract.)

So, we can take it as a given - if you're working on an active product that people pay money for, you can and will get tasks that are more important than what you have pre-planned.

How do we deal with that then? The most natural solution is contingency: if your team's capacity is X, then just plan half/two-thirds/five-eighths of X, and leave the rest for emergencies.
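To make the arithmetic concrete, here is a minimal sketch of that split. The function name, the use of person-hours as the unit and the capacity figure of 120 are all assumptions made up for illustration; use whatever unit your team already plans in.

```python
from fractions import Fraction


def split_capacity(total_hours: int, planned_fraction: Fraction) -> tuple[float, float]:
    """Split a sprint's capacity into planned work and contingency.

    total_hours is the team's capacity X (hypothetical unit: person-hours);
    planned_fraction is the share of X that gets committed to sprint tasks.
    """
    if not 0 < planned_fraction <= 1:
        raise ValueError("planned_fraction must be in the range (0, 1]")
    planned = Fraction(total_hours) * planned_fraction
    contingency = Fraction(total_hours) - planned
    return float(planned), float(contingency)


# Hypothetical team capacity of 120 person-hours per sprint.
for fraction in (Fraction(1, 2), Fraction(2, 3), Fraction(5, 8)):
    planned, contingency = split_capacity(120, fraction)
    print(f"plan {fraction} of X: {planned:.0f}h planned, {contingency:.0f}h contingency")
```

The fraction is the only knob here, and it is exactly the number that should not stay fixed from sprint to sprint.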

Now, that's a good start, but there is still scope for improvement. Under-planning a sprint is as bad as over-committing, since predictability is lost in both cases. If you're managing a busy product team, then your disruptions can vary depending on many factors, and contingency will shift with them. Here are a few examples of what can affect contingency:

Sales cycles. End of Q4 and mid-June are not born equal.
Major releases. If you've just put a big feature out, the likelihood of ricochets flying is high.
Initiatives in adjacent teams. A team near you might be starting on a big project where they'll need your team's module or domain expertise.
VLEs (very large enterprises) rolling out your products. Expect defects and TechSupport escalations - enough said.
Release cycles and regression testing.
Infrastructural changes. For example, IT might be changing the routing in your Dev network where your automation equipment sits.
Organisational changes. Technical Support or Pre-Sales might be reorganising so you might be more exposed to customer issues.
Hiring and arranging job interviews.

This is not an attempt to provide an exhaustive list, and of course different organisations have different types of disruptions. However, I hope that by now you've been nodding your head - especially if you've been in the middle of more than a few sprints.

Teams do not operate in a vacuum. On the contrary, we operate in a state of Brownian motion where various problems bump into us and we similarly bring our problems to others.
This all takes us back to the same leitmotif that is running throughout all of these posts: planning is an art.
When I'm planning a sprint, one of my tasks is to predict the possible forces that might affect us and adjust contingency accordingly. It is not the easiest job, and most of the time I get it slightly wrong; moreover, there's no recipe - this can be done well only by knowing your organisation and its surroundings, and experience is a must.
Perhaps the best analogy that comes to mind is guitar soloing (you can guess my hobby right there). If you're soloing over a particular chord progression, you might just stick to the song's key, and be mostly ok. If you want it to sound original and fresh, you need to be familiar with the chord progression inside out, and switch your solo key dynamically as the chords underneath it change. It is hard, but that is the way to get the best results.

So, there were many words but a distinct lack of specific advice. I'll try to compensate for that with three points:

Always have a contingency.
Avoid automatically using the same contingency over and over. Be aware of what is going on around you.
Record what you used your contingency for, and use that record for feedback and learning.

Before closing off, let's just dwell a bit on that last bullet point. When any unexpected task comes, it is easy just to do it, chalk it off on the mental contingency blackboard and move along. However, there's no feedback in it; if I want to learn and improve and understand why there was too much or too little contingency, knowing and reporting on what it was used for is the only way.
Of course, I'm not advocating recording tasks such as: Talked with Tech Support engineer X about customer Y - 5'43''. However, registering tasks that took at least a few hours helps both with tuning the planning of future sprints and with showing the team's unplanned work to upper management.
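One possible shape for such a record, as a minimal sketch rather than a prescription: the class, the category labels, the two-hour cut-off and all the figures below are hypothetical, and a spreadsheet would do the job just as well.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: "at least a few hours" is read here as two hours or more.
MIN_LOGGED_HOURS = 2.0


@dataclass(frozen=True)
class UnplannedTask:
    description: str
    category: str  # free-form label, e.g. "support escalation"
    hours: float


def summarise_contingency(tasks: list[UnplannedTask], reserved_hours: float) -> dict:
    """Total the unplanned work worth recording and compare it with the reserve."""
    by_category: dict[str, float] = defaultdict(float)
    for task in tasks:
        if task.hours >= MIN_LOGGED_HOURS:  # skip the five-minute chats
            by_category[task.category] += task.hours
    used = sum(by_category.values())
    return {
        "reserved_hours": reserved_hours,
        "used_hours": used,
        "unused_hours": reserved_hours - used,  # negative means the reserve ran out
        "by_category": dict(by_category),
    }


# Made-up sprint: 40 hours of contingency were reserved.
sprint_log = [
    UnplannedTask("Critical defect affecting key accounts", "support escalation", 32.0),
    UnplannedTask("Quick call with Tech Support", "support escalation", 0.1),
    UnplannedTask("Job interviews", "hiring", 4.0),
]
print(summarise_contingency(sprint_log, reserved_hours=40.0))
```

Comparing the used and reserved figures over several sprints, broken down by category, is what turns the contingency from a guess into something you can tune and show to others.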

In the next post, I'll stick my neck out and share my experience on the eternal debate of story points versus real time.

Friday, 23 January 2015