Saturday, 31 January 2015

Python profiling with cProfile - Part I: Intro


In this post I'd like to venture outside of the Agile territory for a change. There's still of course enough left to be said on that subject, but a bit of variety never hurt any blog.

So, I'll jump to "something completely different", i.e. Python profiling. It makes sense to start small, so today I'll go over the basics: take a simple programming task, implement a non-optimised solution, use a profiler to find the bottlenecks, fix them, and then move on to grander things.

The task for today is simple: take a text file as an input and count the number of sentences in it.


Attempt 1

Here's my first stab at the task. It's obviously non-optimised (as we want to show off profiling), but it also happens to be a bit incorrect.

import re, sys

SENTENCE_DELIMITER_PATTERN = r'[.!?]'

def countSentences(fileName):
   inputFile = open(fileName, "r")
   result = 0

   for line in inputFile:
      splitRes = re.split(SENTENCE_DELIMITER_PATTERN, line.strip())
      result += len(splitRes) - 1

   return result

if __name__ == "__main__":
   sentencesCount = countSentences(sys.argv[1])
   print sentencesCount

An input like that ?! Or maybe like that!! Or even like that...could easily throw it off. I could pull out a proper regex that gets most sentences right, but our focus today is optimisation, so I hope that the great regex spirit in the sky will forgive me for now.
(And yes, the error handling is non-existent, and output is not very descriptive etc etc)
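To make the miscount concrete, here is a minimal sketch (written for Python 3, unlike the listings in this post). The sample string and the `[.!?]+` variant are my own illustrative assumptions; the rest of the post sticks with the simple pattern.

```python
import re

# The simple pattern used throughout this post: every mark is a delimiter.
NAIVE_PATTERN = re.compile(r'[.!?]')
# Assumption for illustration only: treat a run of marks as one delimiter.
GROUPED_PATTERN = re.compile(r'[.!?]+')

# Hypothetical sample containing three sentences.
sample = "Like that ?! Or maybe like that!! Or even like that..."

# Counting matches gives the same number as len(re.split(...)) - 1.
naive_count = len(NAIVE_PATTERN.findall(sample))
grouped_count = len(GROUPED_PATTERN.findall(sample))

print(naive_count)    # 7: "?!" and "!!" count twice, "..." three times
print(grouped_count)  # 3
```

Even the grouped variant is far from a proper sentence matcher (abbreviations and decimal numbers would still trip it up), which is why I leave it aside here.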

In any case, running it over a sample input below gives the right answer of "6", so at least we do not have basic defects and can move on:

Test sentence 1.
Test sentence
2 ? Test
sentence 3!
Test sentence 4. Test sentence
5. Finished.

Now, let's unleash this upon War and Peace and do some timing:
    
    roman@localhost:~/blog$ time python countSentences.py WarAndPeace.txt
    37866
    real    0m0.810s
    user    0m0.388s
    sys     0m0.004s


The result looks about right; Count Leo Tolstoy could sure crank out a few sentences in his prime.
But the timing isn't too good, is it? Yes, I did not use a top-spec laptop for this, but still, we can do far, far better. Let's add some profiling.

Attempt 1 - Profiled

To profile, we need to make just one change, replacing:
   sentencesCount = countSentences(sys.argv[1])
with
   import cProfile
   cProfile.run('sentencesCount = countSentences(sys.argv[1])')

Disclaimer: There are more elegant ways of introducing profiling which do not involve source changes. This was the quickest way to zoom in.

These are the results:
         390168 function calls in 3.206 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.206    3.206 <string>:1(<module>)
        1    0.838    0.838    3.206    3.206 countSentences.py:5(countSentences)
    65007    0.703    0.000    2.012    0.000 re.py:164(split)
    65007    0.551    0.000    0.753    0.000 re.py:226(_compile)
        2    0.000    0.000    0.003    0.002 sre_compile.py:178(_compile_charset)
        2    0.000    0.000    0.003    0.002 sre_compile.py:207(_optimize_charset)
        6    0.000    0.000    0.000    0.000 sre_compile.py:24(_identityfunction)
        2    0.003    0.002    0.003    0.002 sre_compile.py:258(_mk_bitmap)
        1    0.000    0.000    0.003    0.003 sre_compile.py:32(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:359(_compile_info)
        2    0.000    0.000    0.000    0.000 sre_compile.py:472(isstring)
        1    0.000    0.000    0.003    0.003 sre_compile.py:478(_code)
        1    0.000    0.000    0.004    0.004 sre_compile.py:493(compile)
        1    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
        1    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
        8    0.000    0.000    0.000    0.000 sre_parse.py:182(__next)
        5    0.000    0.000    0.000    0.000 sre_parse.py:195(match)
        7    0.000    0.000    0.000    0.000 sre_parse.py:201(get)
        1    0.000    0.000    0.000    0.000 sre_parse.py:301(_parse_sub)
        1    0.000    0.000    0.000    0.000 sre_parse.py:379(_parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        1    0.000    0.000    0.000    0.000 sre_parse.py:675(parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
        1    0.000    0.000    0.000    0.000 {_sre.compile}
        3    0.000    0.000    0.000    0.000 {isinstance}
    65030    0.158    0.000    0.158    0.000 {len}
       41    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
    65007    0.198    0.000    0.198    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
    65007    0.556    0.000    0.556    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
    65007    0.198    0.000    0.198    0.000 {method 'strip' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {min}
        1    0.000    0.000    0.000    0.000 {open}
        3    0.000    0.000    0.000    0.000 {ord}

So what does this all mean?

The tottime column shows the time spent inside a function itself, not counting the functions it calls. The cumtime column shows the same, but including the time spent in those calls.
For example, we spent 0.556 seconds in regex split calls; as it does not have any profiled invocations, both columns show the same value. On the other hand, our entry point, countSentences, has 0.838 as tottime and 3.206 as cumtime. It means that we spent most of the time in the various regex calls, and about 25% was spent on file iteration and other primitive calls.
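The default "standard name" ordering makes the expensive entries hard to spot. As a rough sketch (Python 3; `count_delimiters` is a hypothetical in-memory stand-in for countSentences, not the code profiled above), the standard pstats module can sort the same kind of report by cumulative time:

```python
import cProfile
import io
import pstats
import re


def count_delimiters(lines):
    # Hypothetical stand-in for countSentences, fed from memory, not a file.
    total = 0
    for line in lines:
        total += len(re.split(r'[.!?]', line)) - 1
    return total


profiler = cProfile.Profile()
profiler.enable()
count = count_delimiters(["One. Two? Three!"] * 1000)
profiler.disable()

# Collect the report into a string, most expensive call chains first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(8)  # show only the top 8 rows
report = buffer.getvalue()
print(report)
```

If you would rather not touch the source at all, running `python -m cProfile -s cumulative countSentences.py WarAndPeace.txt` should give a similarly sorted report for the whole script.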

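ncalls is straightforward: it shows the number of times a particular function got called, and it is our first hint in optimising - we went through re's _compile 65007 times for the very same regex! The real compilation happened only once (sre_compile.py:493(compile) has an ncalls of 1) and every later call was served from re's internal cache, but _compile still cost us 0.753 seconds in total.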

    65007    0.551    0.000    0.753    0.000 re.py:226(_compile)

Let's try and get rid of that.

Attempt 2


import re, sys

SENTENCE_DELIMITER_REGEX = re.compile(r'[.!?]')

def countSentences(fileName):
   inputFile = open(fileName, "r")
   result = 0

   for line in inputFile:
      splitRes = SENTENCE_DELIMITER_REGEX.split(line.strip())
      result += len(splitRes) - 1

   return result

if __name__ == "__main__":
   sentencesCount = countSentences(sys.argv[1])
   print sentencesCount

The main difference is that we are pre-compiling the regex this time.
Let's do a bit of timing, and see how far it took us:

   roman@localhost:~/blog$ time python countSentencesRoundTwo.py WarAndPeace.txt
   37866


   real    0m0.511s
   user    0m0.228s
   sys     0m0.016s


Not bad! We went from 0.81 sec down to 0.51 sec - almost 40% speed-up. We also have the same result, which is always good to verify when optimising.
We can still do better though; let's crank up the profiler once more (the technique is the same as in the first round):

         195025 function calls in 1.607 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.607    1.607 <string>:1(<module>)
        1    0.763    0.763    1.607    1.607 countSentencesRoundTwo.py:5(countSentences)
    65007    0.147    0.000    0.147    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    65007    0.507    0.000    0.507    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
    65007    0.189    0.000    0.189    0.000 {method 'strip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {open}

An immediate observation - there are far fewer lines: the regex compilation machinery was cluttering our profiling result. Of course, we still have one regex compiled outside of the profiling call, however considering that the single real compilation took about 4 milliseconds (cumtime of sre_compile.py:493(compile) in the first attempt), we can safely ignore it.
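To look at the module-level call versus the pre-compiled pattern in isolation, without any file reading or profiler overhead, a small timeit sketch can help. It is written for Python 3, the sample line and iteration count are arbitrary assumptions, and the absolute numbers will differ per machine and Python version.

```python
import re
import timeit

LINE = "Test sentence 4. Test sentence"
PATTERN = r'[.!?]'
COMPILED = re.compile(PATTERN)


def split_via_module():
    # Goes through re.split, i.e. a cache lookup on every call.
    return re.split(PATTERN, LINE)


def split_via_compiled():
    # Calls the already compiled pattern object directly.
    return COMPILED.split(LINE)


module_time = timeit.timeit(split_via_module, number=100000)
compiled_time = timeit.timeit(split_via_compiled, number=100000)

print("re.split:       %.3f s" % module_time)
print("compiled.split: %.3f s" % compiled_time)
```

Both functions return the same list, so any difference in the timings comes down to the extra lookup work in the re module.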

So, where do we go from here? Split and strip take their time - maybe it is possible to get rid of them?
Strip is definitely easy; just removing it should not affect the result.
Split is less obvious, but come to think of it: we do not need the actual sentences - we just need to count them. This means that we can iterate over the input and count, rather than split it - this should also save the {len} call.
Another idea - why do we process line-by-line? There is nothing in our code that forces that: we could easily switch to chunk-based processing, which would spare some of the parsing.

With all of that, we come to

Attempt 3


import re, sys

SENTENCE_DELIMITER_REGEX = re.compile(r'[.!?]')
CHUNK_SIZE = 32 * 1024

def countSentences(fileName):   
   inputFile = open(fileName, "r")
   result = 0

   while True:
      inputChunk = inputFile.read(CHUNK_SIZE)
      if len(inputChunk) == 0:
         break

      for _ in SENTENCE_DELIMITER_REGEX.finditer(inputChunk):
         result += 1
   return result

if __name__ == "__main__":
   sentencesCount = countSentences(sys.argv[1])
   print sentencesCount

A few items worthy of note:
  • We are using finditer to go over each sentence separator. As we're not interested in the actual matches, we use a throwaway variable (underscore as per common convention).
  • We read chunk-by-chunk and break out of the loop as soon as the read comes back empty.
  • CHUNK_SIZE is 32 KB. I cheated here a bit, and tuned it behind the scenes; of course it does not take a great effort to automate it and run comparative tests with various chunk sizes.
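The comparative test mentioned in the last bullet could look roughly like the sketch below. It is Python 3 and uses a synthetic temporary file rather than War and Peace; the candidate sizes and the `count_sentences` helper are assumptions for illustration, not the tuning I actually ran.

```python
import os
import re
import tempfile
import time

SENTENCE_DELIMITER_REGEX = re.compile(r'[.!?]')


def count_sentences(file_name, chunk_size):
    # Same idea as Attempt 3, with the chunk size passed in as a parameter.
    result = 0
    with open(file_name, "r") as input_file:
        while True:
            chunk = input_file.read(chunk_size)
            if not chunk:
                break
            result += sum(1 for _ in SENTENCE_DELIMITER_REGEX.finditer(chunk))
    return result


# Synthetic input: 20000 lines with three delimiters each.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as sample:
    sample.write("Test sentence. Another one? Yes!\n" * 20000)
    sample_name = sample.name

timings = {}
counts = set()
for chunk_kb in (1, 4, 32, 256):
    started = time.perf_counter()
    counts.add(count_sentences(sample_name, chunk_kb * 1024))
    timings[chunk_kb] = time.perf_counter() - started
    print("%4d KB: %.4f s" % (chunk_kb, timings[chunk_kb]))

os.remove(sample_name)
```

A single run per size is noisy, so in practice it is worth repeating each measurement a few times and checking that every size returns the same count.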

How does Attempt 3 perform then?

   roman@localhost:~/blog$ time python countSentencesRoundThree.py WarAndPeace.txt
   37866

   real    0m0.248s
   user    0m0.112s
   sys     0m0.000s


A twofold speedup compared to the second attempt, and more than three times faster than our starting point!

Can we make it even faster?
Possibly - for example, by replacing the regex with bespoke string matching. However, that would be too much cheating: if you recall, at the start of the post I conceded the point that we are very far from matching sentences accurately. If I wanted to do it properly, we'd need regexes, so by removing them we'd stray too far from our original intent. Tolstoy would not approve.

Let's have one final glance at profiling:

          303 function calls in 0.212 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.212    0.212 <string>:1(<module>)
        1    0.175    0.175    0.212    0.212 countSentencesRoundThree.py:6(countSentences)
      100    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       99    0.000    0.000    0.000    0.000 {method 'finditer' of '_sre.SRE_Pattern' objects}
      100    0.037    0.000    0.037    0.000 {method 'read' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}


Not too much here, is there? It's still possible to draw a few conclusions though - for example, that increasing the chunk size can't save more than 37 msec.
However, far more important is what is not there. 
For example, looping over the iterator must be doing the heavy lifting, so where is it? Here we come to idiosyncrasies of how Python profiling works; it does not play that well with iterators and generators (amongst other things, which I hope to come to in due time). As a side note, it is possible though to gather profiling data for the hidden .next() iterator calls, but I'll skim over that to save a bit of space for now.

Summary


This all might seem like basic stuff, especially to those who have been doing Python development for a while. Yet, it is surprising how many times I've seen in the past all the inefficiencies covered in this post, and more. Partially this is because many of us make the journey into Python from other languages, which means that even senior developers may fall into a language-specific trap.

More specifically, it shows that profiling is not that hard, at least when we deal with serial algorithms, and that a combination of human+profiler can go a long way.

In later posts, I'd like to take this further and explore profiling of concurrent and asynchronous software with Twisted.

Acknowledgment

I've used SyntaxHighlighter for the coding examples - impressive and very easy to set up.

Resources

For more related resources, see:

- https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code 
- http://developerblog.redhat.com/2014/02/17/profiling-python-programs 
- https://docs.python.org/2/library/profile.html  

Monday, 26 January 2015

Scrum - Part V: Story points


This is the post where I need to don my flameproof vest, since for one reason or another the subject of story points versus plain time estimates is one of the most opinionated battlefields.
I'll start off with two seemingly contradictory statements:
  • I find story points (let's call them SPs to save on blog space from now on) useful.
  • I've actually employed them in only 2 out of the 100-odd sprints I've managed so far, and do not intend to use them in the foreseeable future.
The contradiction should get reconciled at the end; in the meantime let's do the obligatory intro, which as always is better described elsewhere.
SPs is/are a comparative work estimation technique, whereby the team uses imaginary units to compare tasks to one another rather than give an absolute time measure.
For example, I might say that writing this specific blog entry is three story points, since the topic is mid-sized, and it's about three times bigger than the smallest post that's going to grace this series. (As opposed to an absolute measure of four hours which is roughly how long it would take me from concept to pressing the Publish button).

There are different variants of SPs, with Fibonacci sequences being commonly used, but in fact, there's nothing revolutionary about any of them. Comparative techniques such as T-Shirt sizes, function points and so on have existed and been in use for years, long before Scrum arrived on the scene.
Steve McConnell goes over a number of such techniques in his book, which is by the way a must in my opinion for anyone serious about software project estimation. Also, while not being an estimation pioneer by a long shot, I was already employing some of those back in 2004, before the onset of Scrum's popularity.

All comparative techniques are essentially based on one single premise: the person doing the estimation is likely to get it wrong. So, instead of coming up with false precision (e.g. this refactoring job is going to take me 2 weeks, 3 days, and 4 hours), we force the estimate to coarser units that better represent our understanding.

I agree with that. The units of weeks, days, hours and yes, minutes, are usually far more precise than what we, mere mortals, know about our upcoming tasks.
The keyword here though is "usually", and this is where I need to go into long-term versus short-term planning.

Let's go back to my long-running example of a multimedia player, and pick up a single task (not at random!):

Integrate Chinese UI localisation provided by an external agency

Now, unless I've integrated Far East l10n in very similar circumstances before, I simply cannot/should not/must not provide a time estimate! It will be somewhere between wrong and inaccurate, and the right way to tackle this would be to use a comparative technique: XL T-Shirt size, 34 SPs, you name it.

But this is where the crux of the message comes in: I would never contemplate putting such a task into any sprint. Yes, I'll estimate it if my boss comes over and would like to have a general off the cuff idea. I will estimate it if we do roadmap planning from afar, and need to figure out what in general the team would be working on. Still, I'll never put it as-is into my next sprint.

Instead, here are typical tasks that will be sprint candidates:

Append Chinese UTF-16 LE localisation files to the multilingual directory
Review and run QA for localised menu items

Granularity is at a completely different level here. It is vital that by the time we reach sprint planning and commitment we do not have monolithic tasks; the work is already broken into fine-grained units.

Yes, we may get the breakdown wrong, but this is where the law of large numbers comes in. Let's say I have 25 small tasks in the sprint: each and every one may have a wrong estimate, but they are extremely unlikely to err in the same direction. Statistics and probabilities are on my side, and they are very good allies to have.
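For the statistically curious, the effect can be sketched with a small simulation (Python 3). The assumptions are mine and deliberately generous: every task is estimated at one unit, and its actual effort lands uniformly between 50% and 150% of the estimate, independently of the other tasks and with no systematic optimism.

```python
import random


def relative_sprint_error(task_count, trials=2000, seed=42):
    """Average relative error of the sprint total for a given number of tasks."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total_error = 0.0
    for _trial in range(trials):
        estimated = 0.0
        actual = 0.0
        for _task in range(task_count):
            estimate = 1.0
            # Assumption: unbiased, independent error of up to +/-50% per task.
            actual += estimate * rng.uniform(0.5, 1.5)
            estimated += estimate
        total_error += abs(actual - estimated) / estimated
    return total_error / trials


single_task_error = relative_sprint_error(1)
many_tasks_error = relative_sprint_error(25)

print("1 task:   %.1f%% off on average" % (100 * single_task_error))
print("25 tasks: %.1f%% off on average" % (100 * many_tasks_error))
```

Under these assumptions the sprint total for 25 tasks comes out several times closer to plan than a single task does. If the estimates are all biased in the same direction, the errors no longer cancel out, so the sketch says nothing about that case.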

And here we come to the main moral of this story: with sprint planning, the small guys always win. Or, to paraphrase, having many small tasks is a must for a successful sprint.
If I have that, then it does not matter much whether I use SPs or straight time estimates. The fact is that previous analysis and law of large numbers do the heavy lifting. If all of my tasks are within say 5 SPs, or two days, then the likelihood of success is high.

So, with all things being equal, I slightly prefer time estimates. Why? Because they are more natural, easier to track, and I do not need the extra complexity from comparative estimation if the tasks are small and well understood.

If a person takes two days holiday in a three week sprint, I simply subtract two days from our sprint allowance. I do not need to run a formula which applies a factor of 13/15 to their nominal SP bucket, and then apply another multiplier to account for load factor (see my previous post). This is simply a matter of convenience - nothing else.
Of course, all this convenience would be null and void with long-term planning, which I'll come to in later posts.

To summarise: comparative estimation works great when tasks are not well understood. This is juxtaposed with short-term sprint planning, where tasks should be small and well defined.

Now, we're ready for the reconciliation I promised back at the start.
SPs is a great technique, and I've used it a number of times informally for looking beyond the next month or two - however, that was always outside of sprints. 

While the vest is still upon my person, I'd like to mention that all this is born of experience. Apart from two exploratory attempts, we have been stubbornly using real-time estimates in our sprints. Our prediction ability was usually reasonably accurate; I'd be the last person to enter a competition, but we more often than not did what and when we promised, and after all, this is what good estimation is all about.

Of course, experiences may differ, and it's quite likely that we'll enter a situation where SPs are a must. For example, we might have to jump right into several poorly planned tasks due to business forces, and need to at least slightly mitigate the unknowns. Fortunately, I haven't been quite there yet, but this may happen. As with all techniques, religiously avoiding one is even worse than unconditionally employing it.

In the next instalment of this series, I'll move on to the slightly less controversial topic of workflows and task lifecycles.

Friday, 23 January 2015

Scrum - Part IV: Predicting load and overhead

Continuing my series on Scrum and iterative planning, I'd like to touch upon a topic rarely mentioned in the various Agile talks and courses: that is the subject of disruptions and unplanned work.

Imagine a situation: you have a perfectly planned sprint, the burndown chart is flying towards its target at the bottom right like an arrow from Robin Hood's bow, and then you get a call from TechSupport: a critical, widespread defect is affecting several of your key accounts.

Immediate triage shows that the fix is not trivial and you need to divert two of your best engineers for at least a couple of days to provide a resolution.

The team and sprints are both small, so the Robin Hood analogy no longer holds. Actually, the chance of the team completing what they've committed to is fairly close to zero, especially once you're done with all the activities: triaging, fixing, building, releasing and dealing with the various e-mails.


Now, someone might say that a sprint is a sprint: yes, we may have incoming work, but it goes into the backlog rather than dropping straight into our lap. In fact, in a previous company of mine there was even an official line management goal derived by smart and well-meaning people: "There shall not be instances of working on non-sprint tasks".


Well, here's the news - sprints do not pay us money. Customers do. If I'm a customer of British Gas (fortunately, I'm not), my electricity supply stops, I call them, and get an answer such as: "your outage is not in our current sprint, so you go on the backlog", I won't be their customer for much longer.


(As a side point, this did happen to me with a major mobile network. Fortunately, I was still in the cooling-off period, and promptly cancelled the contract.)

So, we can take it as a given - if you're working on an active product that people pay money for, you can and will get tasks that are more important than what you have pre-planned.

How do we deal with that then? The most natural solution is contingency: if your team's capacity is X, then just plan half/two-thirds/five-eighths of X, and leave the rest for emergencies.


Now, that's a good start, but there is still scope for improvement. Underplanning a sprint is as bad as over-committing, since predictability is missing in both cases. If you're managing a busy product team, then your disruptions can vary depending on many factors, and contingency will shift with them. Let's throw in a few examples of what can affect contingency:



  1. Sales cycles. End of Q4 and mid-June are not born equal.
  2. Major releases. If you've just put a big feature out, the likelihood of ricochets flying is high.
  3. Initiatives in adjacent teams. A team near you might be starting on a big project where they'll need your team's module or domain expertise.
  4. VLE (Very large enterprises) rolling out your products. Expect defects and TechSupport escalations - enough said.
  5. Release cycles and regression testing.
  6. Infrastructural changes. For example, IT might be changing the routing in your Dev network where your automation equipment sits.
  7. Organisational changes. Technical Support or Pre-Sales might be reorganising so you might be more exposed to customer issues.
  8. Hiring and arranging job interviews.
This is not an attempt to provide an exhaustive list, and of course different organisations have different types of disruptions. However, I hope that by now you've been nodding your head - especially if you've been in the middle of more than a few sprints.


Teams do not operate in a vacuum. On the contrary, we operate in a state of Brownian motion where various problems bump into us and we similarly bring our problems to others.

This all takes us back to the same leitmotif that is running throughout all of these posts: planning is an art.
When I'm planning a sprint, one of my tasks is to predict the possible forces that might affect us and adjust contingency accordingly. It is not the easiest job, and most of the time I get it slightly wrong; moreover, there's no recipe - this can be done well only by knowing your organisation and its surroundings, and experience is a must.
Perhaps the best analogy that comes to mind is guitar soloing (you can guess my hobby right there). If you're soloing over a particular chord progression, you might just stick to the song's key, and be mostly ok. If you want it to sound original and fresh, you need to be familiar with the chord progression inside out, and switch your solo key dynamically as the chords underneath it change. It is hard, but that is the way to get the best results.

So, there were many words but a distinct lack of specific advice. I'll try compensating for that with three points:



  • Always have a contingency.
  • Avoid automatically using the same contingency over and over. Be aware of what is going on around you.
  • Record what you used your contingency for, and use it as a feedback and learning process.

Before closing off, let's just dwell a bit on that last bullet point. When any unexpected task comes, it is easy just to do it, chalk it off on the mental contingency blackboard and move along. However, there's no feedback in it; if I want to learn and improve and understand why there was too much or too little contingency, knowing and reporting on what it was used for is the only way.
Of course, I'm not advocating recording tasks such as: Talked with Tech Support engineer X about customer Y - 5'43''. However, registering tasks that took at least a few hours helps both with tuning the planning of future sprints and with showing the team's unplanned work to upper management.

In the next post, I stick my head out and share my experience on the eternal debate of story points versus real time.

Sunday, 18 January 2015

Scrum - Part III: Estimation

This is a continuation of my series on Agile/Scrum, where I'm trying to dissect some of the standard ceremonies, and go over their trade-offs on typical teams and tasks.

This time I'd like to talk a bit about task estimation. I'll leave aside the units for now, be they story points or plain time estimates, as this merits a discussion of its own (my examples will use story points though). I'll also park factors such as long-term planning, as they also justify their own post.

The question we want to answer here is simply this: if I'm at sprint X, how can I understand well the effort required for candidate tasks in sprint X+1?

Scrum provides a number of ways; the most popular tends to be planning poker. These tend to be a part of backlog grooming, and involve participation from the entire team. As always, these are described better elsewhere, but the upshot is that the entire team estimates each task (usually independently), and any major differences are discussed and reconciled.


In principle, this is a great approach, as it helps remove personal bias and gives more people a chance to comment and uncover potential pitfalls and challenges.

However, when applying it to diverse teams, I've been hitting a variety of challenges:


  • Subject matter experts. If we're looking at a specialised task (say, "design an SNMP trap processor for an IPS device"), then we can safely assume that not everybody can understand the effort involved. If we have one guy in the team who has done SNMP trap processors in the past, then his estimate would carry more value than the rest of the team combined.
  • Wide range of expertise. Even assuming we deal with a fairly common task (e.g., "add auto-tests to increase module A's coverage to 95%"), we still need to account for different expertise levels. A senior developer might have written hundreds of auto-tests in the past and probably has worked on this specific module. A junior developer might not.
  • Different cost per individual. By the same token, even if both the senior and the junior developers understand well what it would take to implement the task, they still might have different figures and both be right. In my auto-tests/code coverage example, the senior developer can estimate the effort at X, the junior at X * 2, and both will be perfectly correct. After all, individual developer productivity can vary up to a factor of 10, and there are reasons why people get paid differently at different career levels and capabilities.
    So, there is no such thing as an absolute task estimate, in the same way that there is no absolute weight for an object. Weight depends on the force of gravity, and a task estimate depends on who is going to work on it.
  • Module teams. When a team consists of people working on different projects over the course of multiple sprints, we can hardly expect each and every one to have strong knowledge over all the projects involved. The estimate from someone in the midst of a given project tends to be more valuable.

Let's take my multimedia player example from the previous post. We eventually arrived at this backlog, in the order given:


  • Prepare high-level effort estimate for porting to the MacOS platform.
  • Address 3 visual defects.
  • Integrate Chinese UI localisation provided by an external agency - start development and QA planning
  • Support two more codecs - start development and QA planning
  • Enhance skin support for more customisation.

We still have the same team of 6 guys on a project: two UI engineers, two QA and two back-end/middleware developers.

Now let's do a bit of planning poker simulation. First, everyone looks at this task:


  • Prepare high-level effort estimate for porting to the MacOS platform.

One person in the team has ported to Mac before, so he puts an estimate of 13 story points. Everyone else raises a '?', QA abstains.

Now, we move on to:

  • Address 3 visual defects.

The UI guys and QA show 2 story points, the middleware developers abstain.

  • Integrate Chinese UI localisation provided by an external agency - start development and QA planning

Here we might get any spread from UI developers depending on whether they had to do Far East l10n, and same with QA. We might get the same number if and only if everyone has a similar level of experience.

We won't go through the rest of them, but this should give the general idea.


So, how would I go about planning something like this? 

The key goal is: give a chance for everyone who can contribute to the estimate to do so.

This is not necessarily the entire team! If it's almost that, then that's fine - an "official" practice would work. However, even in the most "Scrum"-friendly scenario, as above, this is rarely so.

Hence, in that specific case, I'd probably ask the middleware developers to estimate their tasks, the UI developers to do the same, similarly for QA, get both UI and QA to plan defect fixes, and then review/discuss each in their own small forum.
As we have different levels of experience, we'd already decide at that point who is likely to work on what and which estimate we take.
Finally, I'd present the overall estimates and plan to the entire team, not to do the initial planning, but as a sanity check in case we omitted to consult with someone.

With all of that, we still maintain the main principle: the people who are going to work on the tasks do their own estimates. On the other hand, we also have efficiency: we minimise overhead from the estimation process and gather it from the people who are best placed to contribute.


Lastly, you might be viewing my example with a healthy dose of skepticism. Five user stories across six people, including big unknowns (e.g. port to Mac OS)? It does not matter what tactics you use, the sprint will go sideways!

All true, and I'd like to tackle this later on by talking about long-term planning. The purpose so far was to keep examples comprehensible; if I were taking real-life small tasks, we'd spend too much time explaining their technical background.

Saturday, 10 January 2015

Scrum - Part II: Sprint planning

Having talked in my first post about daily (or not-so-daily) stand-ups, I'd like to share some of my experience around sprint planning.

Let's start with the classic approach, which I'll quickly glide over, as it's far better described elsewhere.


A Product Owner comes with a pre-sorted backlog of items that have been previously estimated by the team, and together with the ScrumMaster and team members agrees on which of those can enter the next sprint.

Sounds easy enough, so it's time to throw a couple of spanners in the works!

Let's say we have a team of 6 guys on a project: two UI engineers, two QA and two back-end/middleware developers. We are implementing a media player, and the project is nearing its Alpha stage.

The product owner has the following list of items in the order given:

a. Integrate Chinese UI localisation provided by an external agency.
b. Enhance skin support for more customisation.
c. Address 3 visual defects.
d. Support two more codecs.
e. Prepare high-level effort estimate for porting to the MacOS platform.

UI engineers can do only (a) and (b) in the sprint, so where do we end up? Do we let the middleware guys work on (d) even though it's less important than (c)?

What if we have another dependent platform team that needs input from us on MacOS? For sure, it's not the top priority, but if we do not inform them in advance on what we require, we might not be able to start working on (e) for a while.
Lastly, even if UI engineers can work on the first two items, they are unlikely to complete either of them before the end of the sprint, which might leave QA with nothing to do. If we worked on (c), we could keep the pipeline to QA going and keep everyone busy.

Of course, my example was very much Scrum-friendly: what if we have a module team of eleven people working on six projects, most already in flight, all with a complex network of dependencies?

Inevitably, the backlog order becomes just one of the many factors in deciding what enters a sprint, and the mythical line that defines a sprint's contents becomes a bit similar to the Netherlands-Belgium border.

It's always easy to say what does not work, but it's harder to provide an alternative. Usually, the pattern I've been following has been along these lines:



  1. Figure out priority order. (Yes, it's still there!)
  2. Balance tasks within the team
  3. Check dependencies to other teams
  4. Consider longer-term planning
  5. Ensure pipeline of developed features to QA, and of tested features to release
Coming back to the media player example, our priority order was:


  • Integrate Chinese UI localisation provided by an external agency.
  • Enhance skin support for more customisation.
  • Address 3 visual defects.
  • Support two more codecs.
  • Prepare high-level effort estimate for porting to the MacOS platform.
Balancing tasks within the team (i.e. giving the middleware guys something to do), we end up with:
  • Integrate Chinese UI localisation provided by an external agency.
  • Support two more codecs.
  • Enhance skin support for more customisation.
  • Address 3 visual defects.
  • Prepare high-level effort estimate for porting to the MacOS platform.
Accounting for dependencies and the longer-term plan (i.e. figuring out dependencies with the platform team and getting a better understanding of MacOS for the following sprints), we get:
  • Prepare high-level effort estimate for porting to the MacOS platform.
  • Integrate Chinese UI localisation provided by an external agency.
  • Support two more codecs.
  • Enhance skin support for more customisation.
  • Address 3 visual defects.
Finally, we need to give QA something to do. They could do planning for some of the bigger tasks, but it seems worthwhile to get the defects out of our way as well and provide at least some customer value within the sprint. So this is the final result:
  • Prepare high-level effort estimate for porting to the MacOS platform.
  • Address 3 visual defects.
  • Integrate Chinese UI localisation provided by an external agency - start development and QA planning.
  • Support two more codecs - start development and QA planning.
  • Enhance skin support for more customisation.
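
For the programmatically minded, the reshuffling above can be sketched in a few lines of Python. This is only an illustration: the `promote` helper and the shortened item names are made up for this example, and the target positions are hand-picked judgment calls rather than anything computed.

```python
def promote(backlog, item, position):
    """Return a copy of backlog with item moved to the given 0-based position."""
    reordered = [entry for entry in backlog if entry != item]
    reordered.insert(position, item)
    return reordered

# Step 1: the Product Owner's priority order (names shortened for the example).
priority_order = ["localisation", "skins", "defects", "codecs", "macos estimate"]

# Step 2: balance tasks so the middleware developers have something to do.
balanced = promote(priority_order, "codecs", 1)

# Steps 3 and 4: dependencies on the platform team and longer-term planning.
with_dependencies = promote(balanced, "macos estimate", 0)

# Step 5: keep the pipeline to QA going by pulling the defects forward.
sprint_plan = promote(with_dependencies, "defects", 1)

print(sprint_plan)
# ['macos estimate', 'defects', 'localisation', 'codecs', 'skins']
```

Each step only nudges one item, which is the point: the priority order remains the starting position, and every other factor is applied as a correction on top of it.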

In short, planning is an art, and simply using a descending priority order is a gross over-simplification. In future posts, I'll attempt to go over various other factors that affect planning, such as expected load, planning spikes, and pinning tasks to future sprints.

Thursday, 8 January 2015

Scrum - Part I: Daily stand-ups


I'd like to open this series of posts and the blog by sharing my thoughts around Agile, and specifically, Scrum.

Over the past 7-odd years, I've been through a number of workshops in all roles imaginable: Product Owner, ScrumMaster, individual contributor, and manager - sometimes representing a schizophrenic amalgamation of all four in a single session.
My main issue with each and every one of those sessions was that the coaches always presented a set menu: i.e. you shall have to do steps A, B, C, D up to Z, and success will ensue.
Now, each of the individual steps made a lot of sense, but life tends to present a never-ending series of trade-offs: i.e. you can do step A, but then you need to be wary of specific conditions where it might not work great, and this is exactly what I felt was the gap.

As software developers, we are expected to wield a large set of tools, and then apply a specific tool here, and another tool there - this is what we are being paid for. If a software engineer went to a talk on Monday and learned what a State design pattern is, and then started using it from Tuesday onwards on every "if" statement, he/she wouldn't be providing a whole lot of value (this is, in fact, precisely what happened to an ex-colleague of mine).
To take an even closer parallel, an experienced project manager would apply a different set of tactics depending on what project is being managed.
Why not do the same for running teams and sprint iterations?

So, what I'll attempt to do is dissect a few Scrum practices and look at the situations in which they worked well for my teams, and those in which they did not.

Let's start with the holy of holies: daily stand-ups. I'm sure that most Agile adepts will be shaking their heads in indignation: surely daily stand-ups are a must for each Scrum team.
Well, I've been a ScrumMaster for a group of 12 engineers with varying expertise levels: the team owned a set of modules (rather than projects), and they worked on different tasks. If we take the classic guideline of no more than fifteen minutes per stand-up, we end up with 75 seconds per person. 
I'd challenge anyone (even a great speaker) to condense an unfamiliar subject, progress, plan and showstoppers included, into 75 seconds. The team consisted of great guys, but even their best admirer would not call each and every one a brilliant speaker. Hence, these stand-ups devolved into either interrupting the chattier folk, or 1-to-1 conversations between the two guys who did share a project, while many people were trying to stare a hole in the ceiling, get rid of their slot and move on with their lives.
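
The arithmetic behind that figure is easy to check. Here is a minimal sketch; the function name and the default fifteen-minute time-box parameter are my own choices for illustration.

```python
def seconds_per_person(team_size, timebox_minutes=15):
    """Speaking time each person gets if the time-box is split evenly."""
    return timebox_minutes * 60.0 / team_size

print(seconds_per_person(12))  # 75.0
print(seconds_per_person(4))   # 225.0
```

A team of four gets almost four minutes each within the same time-box, which is a far more realistic slot for progress, plan and showstoppers.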

Now, I'm sure that if this were a team of 4 people of the same expertise level, working on the same project, then it would have worked out great. The problem happens to be that it wasn't.

Of course, the solution suggested itself: re-define the notion of team, and have stand-ups only between the people who are collaborating daily. Considering that in this case, this would have been no more than 2-3 people at once, we devolved to the "speak to each other" practice.

One can raise a valid risk of going the siloed way, and ending up with people becoming unaware of what is going on around them. So far, the solution we gravitated to was having a weekly sit-down (time-boxed, half an hour long) involving the entire team.
Would I recommend this for everyone? Of course not. 
Did it work? In my opinion, yes, for these specific teams. At least, the team members in both cases were greatly in favour of moving away from the all-inclusive stand-up.

With all of that, here's my set of pros and cons for the daily stand-up.

Pros: Works well for project teams, especially efficient when engineers have similar expertise. 

Cons: Not suitable for large module teams, especially if they work on disparate areas. Becomes a burden on teams larger than 6.