Saturday, 28 March 2015

Substring matching: battle royale

At the end of the recent optimisation series, I concluded (amongst other things) that C++ regex matching was faster than Python's, but that conclusion did not sit well with me. It was unclear whether the gap came down to inherent differences between the languages or to the algorithms.
Back then, I said I would explore it further, and there's no time like the present. 

To understand things a bit better, I went all the way back - before regexes made their appearance. If you recall, the naive approach simply took a set of patterns and searched those one by one in the text.
I made this journey back because such an algorithm (or lack thereof) would apparently serve as the perfect boxing ring for comparing two languages, without different algorithms playing the role of illegal substances.

You have probably noticed the words "apparently" and "would". Yes, I was proven wrong: these "illegal substances" played an even larger role than with regexes. However, sometimes it is better to follow a bumpy road than to drive down a straight motorway; you find out a few more things about the car you're driving.

Let's shake the dust off the Python snippet from one month ago, and convert it from URLs to files (asynchronous optimisation is not a goal for us today).
import sys

def getMatchingPatterns(patterns, text):
   return filter(text.__contains__, patterns)

def serialScan(filenames, patterns):
   return zip(filenames, [getMatchingPatterns(patterns, open(filename).read()) for filename in filenames])

if __name__ == "__main__":
   with open(sys.argv[1]) as filenamesListFile:
      filenames = filenamesListFile.read().split()
   with open(sys.argv[2]) as patternsFile:
      patterns = patternsFile.read().split()

   resultTuple = serialScan(filenames, patterns)
   # A distinct name for each file's matches avoids shadowing the pattern list
   for filename, matches in resultTuple:
      print ': '.join([filename, ','.join(matches)])

Now it is time to take a deep breath, since its C++ sibling is not as compact:

#include <iostream>
#include <iterator>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <utility>

using namespace std;
using MatchResult = unordered_map<string, vector<string>>;
static const size_t PATTERN_RESERVE_DEFAULT_SIZE = 5000;

MatchResult serialMatch(const vector<string> &filenames, 
                        const vector<string> &patterns)
   {
   MatchResult res;
   for (auto &filename : filenames)
      {
      ifstream file(filename);
      const string fileContents((istreambuf_iterator<char>(file)),
                                 istreambuf_iterator<char>());
      vector<string> matches;
      std::copy_if(patterns.cbegin(), patterns.cend(), 
                    back_inserter(matches),
            [&fileContents] (const string &pattern) 
            { return fileContents.find(pattern) != string::npos; } );
      res.insert(make_pair(filename, std::move(matches)));
      }
   return res;
   }

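int main(int argc, char **argv)
   {
   if (argc < 3)
      {
      cerr << "Usage: " << argv[0] << " <filenames list> <patterns file>" << endl;
      return 1;
      }
   vector<string> filenames;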
   ifstream filenamesListFile(argv[1]);
   std::copy(istream_iterator<string>(filenamesListFile), 
             istream_iterator<string>(),
             back_inserter(filenames));

   vector<string> patterns;
   patterns.reserve(PATTERN_RESERVE_DEFAULT_SIZE);
   ifstream patternsFile(argv[2]);
   std::copy(istream_iterator<string>(patternsFile), 
             istream_iterator<string>(),
             back_inserter(patterns));

   auto matchResult = serialMatch(filenames, patterns);
    
   for (const auto &matchItem : matchResult)
       {
       cout << matchItem.first << ": ";
       for (const auto &matchString : matchItem.second)
            cout << matchString << ",";
       cout << endl;
       }
   }

Well, I did warn you that C++11 would be back!
Just in case something looks unfamiliar:
  • The MatchResult line uses an alias declaration. Here I could equally have used the trusty old typedef, but using looks a bit more elegant.
  • The std::copy_if call takes a capturing lambda. For now, it relies on the straightforward std::string::find to identify substrings (not for very long, though).
  • We move rather than copy each file's matches into the result map (the std::move inside res.insert).
  • A hash map also pays a visit (look for unordered_map). To be fair, it was available as std::tr1::unordered_map well before 2011; C++11 simply made it part of the standard library.
There are a few other bits and pieces sprinkled about, e.g. auto and insert iterators, but they are beside the point; there are far better resources available if you wish to know more. 
Regardless, it can't escape one's attention that not all languages are born equal. The C++ sibling is three times longer, despite aggressive use of stream shortcuts and lambda functions.

But surely it must be faster? When I set out on this road, I hoped the answer would be a cautious 'yes'.

Not so:
roman@localhost:~/blog/Python-C++ comparison$ time C++/stlMatch filenamesList.txt twoThousandWords.txt
yahoo.html:


real    0m5.084s
user    0m4.716s
sys     0m0.060s


roman@localhost:~/blog/Python-C++ comparison$ time python Python/serialMatch.py filenamesList.txt twoThousandWords.txt

real    0m4.051s
user    0m3.260s
sys     0m0.040s



Python wins the first round of this battle royale fair and square. The million-dollar question is why.

For one, Python more often than not relies on optimised C implementations in its core libraries, and that is undoubtedly the case here. It is also quite possible that handcrafted C is more efficient than the C++ I came up with, even when the latter is encouraged with -O3 and move semantics.

But a 20% difference? That can't be explained by any of the above, so I turned to StackOverflow for expert wisdom.

Said wisdom arrived within a couple of hours (aren't specialist sites great!). It turns out that Python 2.5 and later use a bespoke string-matching algorithm based on Boyer-Moore-Horspool.
libstdc++, on the other hand, uses a simple comparison: std::string::find is templated for arbitrary character types and tries to be as generic as possible. It could specialise the template for 8-bit characters, but it doesn't.
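To make the contrast concrete, here is a minimal Python sketch of the two approaches: a naive scan that tries every alignment and compares character by character, and textbook Boyer-Moore-Horspool. It is an illustration only, with hypothetical function names; CPython's real search is a tuned variant written in C, not this code.

```python
def naive_find(text, pattern):
    """Index of the first occurrence of pattern in text, or -1.

    Tries every alignment and compares character by character.
    """
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


def horspool_find(text, pattern):
    """Index of the first occurrence of pattern in text, or -1 (Boyer-Moore-Horspool)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Bad-character table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Characters not in it allow a jump of m.
    shift = {}
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    i = 0
    while i <= n - m:
        # Compare right to left within the current window.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Slide by the shift of the text character under the pattern's last position.
        i += shift.get(text[i + m - 1], m)
    return -1


if __name__ == "__main__":
    print(horspool_find("hello world", "world"))  # 6, same as "hello world".find("world")
```

The key difference: on a mismatch, Horspool looks at the text character under the last slot of the window and, if that character does not occur in the pattern, skips a whole pattern length. The naive scan always advances by one.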

Along with the excellent answer, I also received a piece of advice: try the more efficient string-matching algorithms included in Boost. They belong to the same Boyer-Moore family that Python employs by default; in C++, they have to be used explicitly.

Good advice ought to be followed, so I amended my C++ example slightly by replacing the lambda in the copy_if with the following variant (for the third algorithm, call boyer_moore_horspool_search instead, and don't forget the includes: boost/algorithm/searching/boyer_moore.hpp and boost/algorithm/searching/boyer_moore_horspool.hpp).
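[&fileContents] (const string &pattern) 
   {
   // Boost 1.62 and later return a std::pair of iterators delimiting the match
   // (both equal fileContents.end() when nothing is found); older releases
   // returned a single iterator, in which case drop the .first below.
   return boost::algorithm::boyer_moore_search(
                 fileContents.begin(), fileContents.end(),
                 pattern.begin(), pattern.end()).first
          != fileContents.end();
   } );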

As a baseline/worst case, I also measured strstr. Time for the results!
Searched string set size | strstr | std::string::find | boyer_moore_search | boyer_moore_horspool_search | Python (str.__contains__)
20 | 0.149s | 0.087s | 0.082s | 0.069s | 0.085s
200 | 1.004s | 0.537s | 0.402s | 0.277s | 0.380s
2000 | 9.502s | 4.975s | 3.537s | 2.305s | 3.420s
20000 | 133.449s | 57.084s | 34.896s | 22.369s | 33.969s

As usual, a few observations:

a) With small set sizes, C++ has a slight edge. With only a handful of strings to look for, algorithms are not as dominant, and various micro-optimisations matter.

b) Don't use strstr!

c) If string matching is in your critical path, use one of the Boost algorithms. I haven't measured the memory usage of the Horspool algorithm (it trades memory for speed), but it was unnoticeable with the 175 KB pattern set and the 300 KB text set.
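If you want to repeat this kind of comparison on your own data, a small harness helps. The sketch below is hypothetical (it is not the script behind the table above): it times any number of (text, pattern) -> bool predicates over a pattern list, keeps the best of several runs, and refuses to report timings if the predicates disagree about which patterns matched.

```python
import time


def match_all(text, patterns, contains):
    """Patterns found in text, in input order, according to the given predicate."""
    return [p for p in patterns if contains(text, p)]


def time_matchers(text, patterns, matchers, repeat=3):
    """Best-of-`repeat` wall-clock seconds for each named (text, pattern) -> bool predicate."""
    timings = {}
    reference = None
    for name, contains in matchers.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            found = match_all(text, patterns, contains)
            best = min(best, time.perf_counter() - start)
        # A faster matcher that finds different patterns is not a fair comparison.
        if reference is None:
            reference = found
        elif found != reference:
            raise ValueError(f"{name} disagrees with the first matcher")
        timings[name] = best
    return timings


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog " * 5000
    patterns = ["fox", "cat", "lazy", "zebra"]
    matchers = {
        "in operator": lambda t, p: p in t,
        "str.find": lambda t, p: t.find(p) != -1,
    }
    for name, seconds in time_matchers(text, patterns, matchers).items():
        print(f"{name}: {seconds:.6f}s")
```

Taking the best of several runs rather than the average filters out some of the noise from other processes; for serious measurements, the timeit module is the more thorough option.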

This is all very educational, but where does it leave us? My original idea of using string matching as an arena for comparing languages, and nothing but the languages, was deeply flawed. Ninety percent, if not more, of the difference was down to algorithms, and language specifics played very little role.

However, as with many other roads, this one was far more interesting than the destination. We gathered some stats on string matching and found good optimisation opportunities in C++, so the time was still well spent.

But, before closing the post on that philosophical note, you might ask - what about regexes? How does the insight into substring search performance help us there?

It does not help a lot, or, you might say, at all. Languages are not inherently fast or slow (not unless their name starts with an "R", ends with "y" and has a couple more letters in the middle). Their comparative performance is down to what functions you use, which algorithms they hide, and how optimised these algorithms are. 
This suggests that the regex performance delta, too, was down to algorithms, rather than C++ being faster as a rule.

Of course, whenever the critical path is entirely within your own business logic, generic comparison becomes valid. Could be a good topic for yet another performance post...

Saturday, 21 March 2015

Scrum - Part IX: Task and release pinning

Time to go back to Agile/Scrum. The last post talked about long-term planning of projects that cross several teams and sprints; however, Scrum scheduling has even smaller facets.

Often, we get into a situation where people - customers, other development groups, execs (yes, despite what you might think, they are all people!) - do not need a particular feature next sprint, or tomorrow. They just need a commitment that a feature will be there by next February.

When we mix that into an Agile backlog, a dilemma appears. These features need not be at the top. However, if we stick them in the middle of the list, they will remain there permanently, as higher-priority items keep jumping above them. We need a way to ensure these tasks get done at a certain point in the future, and classic sprint planning won't quite cut it.

Wait, why do we need to commit to next February?

Thanks for interrupting - indeed, I'm racing ahead. Let's look at a few examples:

Renewal. A customer's renewal is coming up in a few months. They do not need a specific feature right now, but they are adamant about having it by the time of the renewal, to see commitment from us and/or to facilitate their internal migration.

Cross-team dependency. Recall my example from the last post. We had several teams, and one of them, system QA, was not involved in the project for a couple of sprints. However, when Sprint 44 comes, it's critical that they have time to work on our feature and understand well what needs to be done.

Marketing. An analyst review is scheduled for date X, and we need to run a set of activities, such as getting a demo environment in place, tidying up specific issues from the last review etc.

There's more: an upcoming hardware upgrade from a vendor that will require performance testing, exec reviews, release regression testing etc.
These cases might or might not be frequent depending on what your team does, and whether the product is well established, but hopefully by now we agree that they exist.

Ok, how do we deal with future commitments?

One approach is pinning a task to a future sprint. Take the system QA team as a case in point: even though they are currently in Sprint 41, nothing prevents us from pre-populating a few tasks in Sprint 44, such as end-to-end testing of telemetry.

Of course, reality can always change. In fact, if we go back to the previous post, it did change - the feature got delayed for valid reasons, and it couldn't have been tested before Sprint 45. This is fine; having the telemetry test there was just a declaration of intent and a reminder-slash-placeholder, not a commitment.

In the same spirit, our QA group might become overloaded in Sprint 44 and, despite the dependency, have to push back the telemetry test. This is also fine; at least the placeholder will remind us that this choice affects other groups who need to be informed.

By the way, QA is just an example; I realise that many Scrum teams incorporate the QA function. If this specific case bothers you, feel free to swap it with front-end/analytics/back-end/DB/UX/"anything-else-that-comes-to-mind" team.


But surely we cannot pre-plan sprints for next February?

Yep, this technique works well enough for dependencies in projects just around the corner, but does not cover the renewal, roadmap or marketing examples. Nobody will relish seeing thirty future sprints in their backlog; just the thought of managing and staring at them is a migraine trigger.

A more benign technique is tagging future tasks. Let's say that a marquee customer is switching to Mac OS X next March. We do not have a strategic requirement to support this OS, but we absolutely have to add it for them, otherwise there will be disaster and chaos.

We could put this at the top of our backlog, but we won't cover ourselves in glory by supporting the OS today. It would just gather dust until next March, and we could have spent those cycles on something that people need here and now.
So, tagging OS X support with something like "February 2016" will serve as a reminder. (Or we could tag with whatever version id we plan to release in Feb-16, e.g. Ver 6.0)
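Neither pinning nor tagging needs special tooling, but if your tracker can be scripted, the logic is tiny. A hypothetical Python sketch (the field names and items are invented for illustration and are not any tool's API): pinned items that have come due float to the top of the planning view, and tags simply group items by target release.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BacklogItem:
    title: str
    priority: int                        # lower number = higher up the backlog
    pinned_sprint: Optional[int] = None  # hypothetical field: sprint the item is pinned to
    tag: Optional[str] = None            # hypothetical field: e.g. "February 2016" or "Ver 6.0"


def planning_view(backlog: List[BacklogItem], current_sprint: int) -> List[BacklogItem]:
    """Pinned items that have come due go first; everything else keeps backlog order."""
    due = [i for i in backlog
           if i.pinned_sprint is not None and i.pinned_sprint <= current_sprint]
    rest = [i for i in backlog
            if i.pinned_sprint is None or i.pinned_sprint > current_sprint]
    return sorted(due, key=lambda i: i.priority) + sorted(rest, key=lambda i: i.priority)


def tagged(backlog: List[BacklogItem], tag: str) -> List[BacklogItem]:
    """Everything tagged for a given future release or date, regardless of priority."""
    return [i for i in backlog if i.tag == tag]


if __name__ == "__main__":
    backlog = [
        BacklogItem("Fix login timeout", priority=1),  # hypothetical item
        BacklogItem("End-to-end telemetry test", priority=7, pinned_sprint=44),
        BacklogItem("Mac OS X support", priority=9, tag="February 2016"),
    ]
    print([i.title for i in planning_view(backlog, 41)])
    print([i.title for i in planning_view(backlog, 44)])
    print([i.title for i in tagged(backlog, "February 2016")])
```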

None of this is revolutionary by any stretch of the imagination; these are just a few small techniques to make sprints more manageable.

Origins of species sprints


Perhaps the only remaining point is the gradual evolution of tasks, more formally called backlog grooming. The idea is that as we get closer to starting on a user story, we zoom in on it and derive better estimates.

The reason I brought it up here is that with tasks pre-planned in the future, we know a bit better what should be groomed/evolved/designed (circle out your favourite verb).

In many cases, this evolution is gradual and very low-key. Let's take our system QA task as an example; it is currently pinned to Sprint 44, and we are at Sprint 41.
Current sprint | Target sprint | Estimate | What we know
41 | 44 | One week | We need to verify and potentially automate a few global metrics
42 | 44 | Six working days: three days manual, three days automation | There are two global reports, both are automatable, and we can use some of our existing UI scraping code
43 | 45 | Five working days: manual testing of per-client and per-codec reports (half a day each), augmenting the UI scraping code (one day), writing automation test cases (two days), preparing the test exit report and retesting bug fixes (one day) | Full list of use cases, low-level design of the automation code, and a test plan draft

In case you're wondering why there are days instead of story points, have a look at this post.
Also, the usage of one week versus five days is completely intentional. Coarse units reflect poor understanding of a task; 160 working hours and one person-month mean the same thing in theory, but in practice people read them quite differently.
Coming back to the table: in each sprint, our understanding of the testing to be done evolved gradually. In Sprint 41 we had a placeholder that assisted future capacity planning. By the time Sprint 43 rolled in, we knew fairly well what we had on our plate. (If you're asking who provided the numbers, have a look here.)

Just to make things clear: there are more focussed and faster planning techniques - we do not always have the luxury of having tasks set up three sprints in advance. One of them is a planning spike, and I'll cover it in a future post.


Sunday, 15 March 2015

Python optimisation - Part IV: Another look at concurrency

The last optimisation post has left a few threads hanging. 
Yes, there was parallelism, but the performance gain was, shall we say, underwhelming. Back then I alluded to the I/O element of the task, but did not show what happens if we get rid of it.
Also, I moved the test bed to a different machine, but never said what would have happened had we stayed on the single-core one.

So, while this post might be a bit light on code and heavy on numbers, it will add more background on parallelism.


What happens when multiple threads run on a single core?

Let's take the multi-threaded example from last time and run it on a single core machine.

time python pythonWithoutGIL.py urls.txt words.txt

real    0m9.721s
user    0m8.004s
sys     0m0.216s


And now let's remove the magic GIL-release line from the C++ code, recompile and run again:

time python pythonWithGIL.py urls.txt words.txt

real    0m9.158s
user    0m7.268s
sys     0m0.208s


The locking solution is actually faster: by about 6% in wall-clock time and by about 9% in user CPU time! When you think about it, this is not surprising: a single core can execute only one thread at a time anyway, while having multiple active threads increases the number of context switches. With the GIL released, the OS spends CPU cycles preempting and switching between threads; with the GIL held, the other threads are blocked waiting for it, so the OS has far less switching to do.

The possibility of having too much concurrency occasionally takes people by surprise; even fairly experienced developers sometimes hardcode the thread pool size rather than deriving it from the number of available vcores. 
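Deriving the pool size instead of hardcoding it takes only a couple of lines. A minimal Python 3 sketch with hypothetical helper names: it asks the OS which CPUs the process may actually run on and sizes a ThreadPoolExecutor accordingly. Bear in mind that pure-Python CPU-bound work still runs one thread at a time under the GIL, so this pays off when the work releases the GIL, as the C++ extension does.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def usable_cpus():
    """CPUs this process may run on; falls back to the machine total, then to 1."""
    # sched_getaffinity respects CPU pinning (e.g. taskset, containers) but is not
    # available on every platform, hence the fallback.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def run_in_pool(func, items):
    """Map func over items with one worker thread per usable CPU."""
    with ThreadPoolExecutor(max_workers=usable_cpus()) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    print(usable_cpus(), run_in_pool(len, ["a", "bb", "ccc"]))
```

On the single-core machine above, this would yield a pool of one, which is exactly the configuration that ran faster.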

What performance gain do we get from parallelism when applied to pure processing tasks?

Let's take the networking element out. I'll slightly amend our long running example by fetching scanned content from the local filesystem. Here's an updated Python wrapper:
from twisted.internet import reactor, defer, threads
import sys, re
import contentMatchPattern

def stopReactor(ignore_res):
   reactor.stop()

def printResults(filename, matchingPatterns):
   print ': '.join([filename, ','.join(matchingPatterns)])

def scanFile(filename, pattern_regex):
   pageContent = open(filename).read()
   matchingPatterns = contentMatchPattern.matchPatterns(pageContent, pattern_regex)
   printResults(filename, matchingPatterns)

def parallelScan(filenames, patterns):
   patterns.sort(key = lambda x: len(x), reverse = True)
   pattern_regex = '(' + '|'.join(patterns) + ')'

   deferreds = []
   for filename in filenames:
      d = threads.deferToThread(scanFile, filename, pattern_regex)
      deferreds.append(d)

   defer.DeferredList(deferreds).addCallback(stopReactor)

if __name__ == "__main__":
   with open(sys.argv[1]) as filenamesListFile:
      filenames = filenamesListFile.read().split()
   with open(sys.argv[2]) as patternsFile:
      patterns = patternsFile.read().split()

   parallelScan(filenames, patterns)
   reactor.run()
Very similar to what we had before - the only difference is that we take filenames rather than URLs as input. The content is exactly the same: I pre-downloaded the Alexa top sites. In fact, everything else is exactly the same too; the C++ module consumes content as a string, so it could not care less whether it arrived from the filesystem or the network.
Now, we will do a few runs on the same 6-core system from the previous post while increasing the set of words we search for (which further emphasises the CPU-bound part of this exercise).

There will be three programs: Python-only, C++ extension with GIL removed, C++ extension with GIL left in place.
For the Python-only program, we will recompile the regexes for each input file, the same way the C++ extension does. This levels the playing field and makes sure that the skew from long regex compilation (especially with longer sets of search words) does not affect us.

I've already shown the Python wrapper for C++ extension above, and the Python-only code is below:
from twisted.internet import reactor, defer, threads
import sys, re, copy

def stopReactor(ignore_res):
   reactor.stop()

def printResults(filename, matchingPatterns):
   print ': '.join([filename, ','.join(matchingPatterns)])

def scanFile(filename, patterns):
   pageContent = open(filename).read()
   matchingPatterns = set()
   patterns.sort(key = lambda x: len(x), reverse = True)
   pattern_regex = re.compile('|'.join(patterns))
   
   for matchObj in pattern_regex.finditer(pageContent):
      matchingPatterns.add(matchObj.group(0))

   printResults(filename, matchingPatterns)

def parallelScan(filenames, patterns):
   deferreds = []
   for filename in filenames:
      d = threads.deferToThread(scanFile, filename, copy.deepcopy(patterns))
      deferreds.append(d)

   defer.DeferredList(deferreds).addCallback(stopReactor)

if __name__ == "__main__":
   with open(sys.argv[1]) as filenamesListFile:
      filenames = filenamesListFile.read().split()
   with open(sys.argv[2]) as patternsFile:
      patterns = patternsFile.read().split()

   parallelScan(filenames, patterns)
   reactor.run()
As promised, we are recompiling the regexes for each file.
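One detail in both wrappers deserves a note: the patterns are sorted by length, longest first, before being joined into an alternation. Python's re tries alternatives from left to right and takes the first one that matches at a given position, not the longest one, so without sorting, a word would hide any longer word that it is a prefix of. A small Python 3 demonstration; re.escape is my addition (the original joins the raw words), in case a pattern contains regex metacharacters.

```python
import re


def build_alternation(patterns):
    """Compile an alternation of literal words, longest first so a word beats its own prefix."""
    ordered = sorted(patterns, key=len, reverse=True)
    # re.escape is an addition: the original joins the words unescaped.
    return re.compile("|".join(re.escape(p) for p in ordered))


def matching_patterns(text, regex):
    """Set of distinct words the compiled alternation found in text."""
    return {m.group(0) for m in regex.finditer(text)}


if __name__ == "__main__":
    # Unsorted: at position 0 "foo" is tried first and wins, so "foobar" is never reported.
    print(re.findall("foo|foobar", "foobar"))                                 # ['foo']
    print(matching_patterns("foobar", build_alternation(["foo", "foobar"])))  # {'foobar'}
```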
So, without further ado, here are the results:
Pattern set size | Python only | Python with C++ (GIL locked) | Python with C++ (GIL released)
500 | 1.44s | 1.19s | 1.15s
4000 | 4.97s | 2.69s | 1.63s
8000 | 8.82s | 4.99s | 2.36s
16000 | 16.65s | 10.23s | 4.48s


This puts things nicely into perspective:

Observation #1: As soon as we hit sizable sets of patterns, the execution time starts growing linearly, regardless of whether we use C++ or Python for the heavy-duty matching.
This is as expected: our algorithm is driven by the complexity of the regex. In turn, the regex grows linearly with the pattern set size.

Observation #2: Releasing the GIL on multi-core is paramount. Yes, we arrived at the same conclusion last time, but back then the gain was 14%. Not any longer: even comparing apples to apples (or C++ to C++), we get a speed-up of more than 2x for the two largest pattern sets.
Here you might ask: "why not 6x?" Indeed, why not - aren't there six cores? Well, we use inputs of various sizes, and the longest ones form the critical path. Had I run all the tests on exactly the same input, we would have seen an even higher speed-up factor.
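The critical-path argument is easy to check with a back-of-the-envelope model. The sketch below is a simplification assumed for illustration, not a model of Twisted's thread pool: each file becomes a task, tasks are handed out in order to whichever worker frees up first, and the speed-up is the total serial time divided by the time the last worker finishes. The task durations are made up.

```python
import heapq


def makespan(task_times, workers):
    """Finish time when each task, in the given order, goes to whichever worker frees up first."""
    loads = [0.0] * workers          # a list of equal values is already a valid heap
    for t in task_times:
        earliest_free = heapq.heappop(loads)
        heapq.heappush(loads, earliest_free + t)
    return max(loads)


def speedup(task_times, workers):
    """Serial time divided by parallel finish time."""
    return sum(task_times) / makespan(task_times, workers)


if __name__ == "__main__":
    print(speedup([1] * 6, 6))               # 6.0: equal tasks keep all six cores busy
    print(speedup([10, 1, 1, 1, 1, 1], 6))   # 1.5: the long task is the critical path
```

Nothing can finish before the longest task does, so the longest input caps the speed-up no matter how many cores are available.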

Observation #3: C++ provides value irrespective of parallelism. Yes, you might say that we hobbled Python by recompiling the regex each time. But then, we could say that it is C++ that was hampered; with some effort we could have cached the compiled regex in static memory.
Going back to cold, hard data: C++ gives us a speed-up of roughly 60 to 85% for the larger pattern sets, even before the GIL is gone.
Before the flames start shooting in my direction: I'm not saying that Python is always slower than C++. Understanding how performance compares between the two in various cases is important, and the answer never fits into one syllable or paragraph. Definitely worth exploring further.
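For completeness, the caching mentioned in Observation #3 could look like the following Python 3 sketch. It is an assumption, not code from these benchmarks: functools.lru_cache compiles the alternation once per distinct pattern set, and the compiled pattern object can then be shared between threads. lru_cache keeps its internal state consistent under concurrent calls, but two threads may still both compile the same regex the first time round.

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def compiled_alternation(patterns):
    """Compile once per distinct tuple of words; later calls return the cached object."""
    ordered = sorted(patterns, key=len, reverse=True)   # longest first, as in the wrappers
    return re.compile("|".join(re.escape(p) for p in ordered))


def scan_text(text, patterns):
    """Distinct words from patterns found in text, reusing the cached regex."""
    # lru_cache needs a hashable key; sorting makes the key independent of input order.
    regex = compiled_alternation(tuple(sorted(patterns)))
    return {m.group(0) for m in regex.finditer(text)}


if __name__ == "__main__":
    words = ["foo", "foobar", "baz"]
    print(scan_text("foobar and foo", words))
    print(scan_text("no match here", words))
    print(compiled_alternation.cache_info())  # one miss, then a hit
```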

Time to wrap things up

It was a long road from basic sentence counting to parallel, asynchronous performance tuning. We found Python's GIL, navigated around it, and looked at cases when parallel execution does more harm than good.
There's always more to say about Python/C++ performance comparison, so the wider topic is very likely to remain on this blog for a while.

Saturday, 7 March 2015

Scrum - Part VIII : Long-term planning

 

Introduction


This is the eighth part of the Agile blog saga. This time I'd like to turn to long-term planning and talk about how it can be reconciled with constantly evolving Agile priorities.

For the sake of this post, let's assume we're all project managers; if not by title, then by responsibility.

We have a requirement to deliver (I'll dig out examples shortly) that crosses several teams and has many technical dependencies.
As expected, all teams run Scrum, and duly perform all the ceremonies. However, if we look at the list of these ceremonies (to refresh your memory: it's backlog grooming, demos, sprint planning, retrospective and daily stand-ups), none help that much with our task.

The challenge here is that Scrum, as a discipline, tends to focus on iterations. There is a gap, however, between iterations and individual stories on one side and the things customers, sales, marketing and execs care about on the other.

You might say - but these are user stories, and they have acceptance criteria! True, but they also have to fit into a sprint, so they need to be quite small, and a customer-facing story that small tends to be fairly trivial. Here are a few examples of mini-projects that would not fit into a single sprint/team:

  • Add telemetry on customer usage of a widget
  • Implement two-factor authentication for a web service
  • Allow resellers to co-brand the product's UI
  • Add periodic back-ups of blogs to cloud storage (this one is personal wishful thinking for the platform I'm using to type this blog)
None of these are huge, multi-person-year projects. Nevertheless, I'd be surprised if any of them fitted into a single team and sprint. They cross front-end and back-end, and involve a few weeks or more of development and QA, so at least some coordination and framework on top of the Agile ceremonies is required.

Specifically, with our project manager hat on, we need to answer at least these basic questions:
  1. How do I draw the line from sprint items to customer deliverables?
  2. How do I track dependencies across teams' backlogs and ensure they stay in sync?
  3. Is the project on track? What is the ETA?
  4. What is the impact on commitments from changes in Agile backlog priority order?
The last question is especially important, since we cannot go back to drafting Gantt charts and doing waterfall-like plans. The world has changed and keeps moving under our feet, so we need to anticipate priority shifts in all the teams involved and work out their impact quickly. 

Of course, I'm under no illusion that I have the ultimate answer. Moreover, there is probably no such answer to start with - different organisations and teams will find different techniques effective.
The rest of this write-up goes over techniques that worked for our teams, with no guarantees or illusions of grandeur (which, actually, can be said about the entire series).

Epics

Let's start with question 1. To save you scrolling up and down, here it is copy/pasted:
How do I draw the line from sprint items to customer deliverables?

Well, I gave away the answer in the heading, but there's more to it than just the word epic. There are any number of long and short definitions of the term. I'd go for:

Epic is a project that crosses several backlogs and/or several sprints.

To make things less theoretical, let's pick the first example above, i.e. adding telemetry.
At a very high level, we want to track usage of our online multimedia converter, e.g. which countries/IPs users come from, which formats they convert to, and which browsers they use.
When we break it down to teams and user stories, we get something like this:

  • Middleware team - Record IPs (1), client-side headers (2) and formats (3) requested via a new API.
  • Back-end team - Create a REST-ful service for adding new telemetry records (4). Create a new table for storing telemetry data (5). Add another API for summarising and fetching. (6)
  • Front-end team - Provide a new internal page which fetches telemetry data from back-end (7) and displays it in a format that can be sorted (8) and exported (9). 

There are other (and better) breakdowns, but our main topic here is epics and not architecting internal reports, so I'll cut a few corners.
Each number in parentheses maps to a single user story, and each can have well-defined acceptance criteria.

In fact, there may well be more than nine user stories, for example "Define a REST-ful interface for posting telemetry records", a precursor to (4).


Combining all these into a single Telemetry epic at the very least lets each team see the big picture, and allows us (remember, we're all project managers) to quickly figure out the work involved.


Dependencies

Let's move on to the next question: How do I track dependencies across teams' backlogs and ensure they stay in sync?

This is one area where, in my opinion, "classical" project management techniques still outperform Scrum. It's easy to ridicule huge Gantt charts with an incestuous labyrinth of dependency arrows, but the ability to prod a dependency at the bottom and see how far the house of cards flies is useful.

JIRA and other Scrum management tools do provide relationships between tasks. However, these are more akin to documentation, and deriving critical paths and impact from delays is, for better or for worse, left to us, humans.


A technique I've been using on some projects is sprint pre-planning (yes, it's an oxymoron and against classic Scrum practice, but bear with me). 

Taking our sample project, we could tentatively define the following:

Sprint 42:
  • Back-end team: Create a REST-ful service for data provisioning (4)
  • Front-end team: Design internal page user interface and prototype using mock interfaces
 Sprint 43:
  • Middleware team: Record IPs (1) and client-side headers (2) using the REST-ful service
  • Back-end team: Create table for storing telemetry data (5), and create API for fetching statistics (6)
  • Front-end team: Develop sorting (8) and data export (9).
 Sprint 44:
  • Middleware team: Record requested data formats (3)
  • Front-end team: Integrate with retrieval API (7), and validate end-to-end functionality.
  • System QA and release. 

To avoid all this complexity, you might suggest taking a developer from each group and creating a virtual feature team. Definitely an option, but it comes with strings attached, and I'll cover it in a future post.

Of course we operate in an Agile environment, and any of these tasks can be pushed back by the respective Product Owner(s). Even for such a minor project, the likelihood of everything happening as desired is small.
However, having this plan allows the (virtual) project manager to re-plan and see what the new release/target sprint would be.


Tracking could be done in a similar way: talk to the teams and agree on an "ideal" plan, and then track and adjust progress.

For example, let's say the back-end team has been diverted to production issues in Sprint 43, and could only create the table (5), but not the API (6). 
This is a perfectly legitimate situation and reason, so we re-adjust (assuming sprint 42 went to plan):

 Sprint 43:
  • Middleware team: Record IPs (1) and client-side headers (2) using the REST-ful service
  • Back-end team: Create table for storing telemetry data (5); the statistics API (6) moves to Sprint 44
  • Front-end team: Develop sorting (8) and data export (9).
 Sprint 44:
  • Middleware team: Record requested data formats (3)
  • Back-end team: Create API for fetching statistics (6)
  • Front-end team: Integrate with retrieval API (7); end-to-end validation moves to Sprint 45
  • System QA and release: moves to Sprint 45

 Sprint 45:

  • Front-end team: Validate end-to-end functionality.
  • System QA and release. 

Yes, it is all painfully obvious, and yes, it is a poor-man's version of traditional project management. Nevertheless, I've seen many people rely entirely on backlogs and spontaneous coordination, with less than great results.
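Since the tools leave impact analysis to us, even a few lines of code can do the "prod a dependency at the bottom" part. A hypothetical Python 3 sketch: each task has the sprint the team tentatively agreed to, plus prerequisites with a lag (0 means it may share the prerequisite's sprint, 1 means it must come at least a sprint later). The lags are my reading of the plan above, not something the teams stated, and the dependency graph is assumed to have no cycles.

```python
def propagate(planned, depends_on):
    """Earliest feasible sprint for every task, given tentative sprints and dependencies.

    planned:    task -> sprint the owning team tentatively agreed to
    depends_on: task -> list of (prerequisite, lag) pairs
    """
    feasible = {}

    def earliest(task):
        if task not in feasible:
            candidates = [planned[task]]
            for prereq, lag in depends_on.get(task, []):
                candidates.append(earliest(prereq) + lag)
            feasible[task] = max(candidates)
        return feasible[task]

    for task in planned:
        earliest(task)
    return feasible


# Story numbers follow the telemetry epic; "E2E" and "QA" are validation and release.
planned = {"4": 42, "1": 43, "2": 43, "5": 43, "6": 43, "8": 43, "9": 43,
           "3": 44, "7": 44, "E2E": 44, "QA": 44}
depends_on = {
    "1": [("4", 1)], "2": [("4", 1)], "3": [("4", 1)],
    "7": [("6", 0)],                 # assumed: integration can happen alongside the API
    "E2E": [("6", 1), ("7", 0)],     # assumed: validation needs the API a sprint earlier
    "QA": [("E2E", 0), ("3", 0), ("8", 0), ("9", 0)],
}

if __name__ == "__main__":
    print(propagate(planned, depends_on)["QA"])   # 44 with the ideal plan
    slipped = {**planned, "6": 44}                # back-end only delivered (5) in Sprint 43
    print(propagate(slipped, depends_on)["QA"])   # 45
```

Moving (6) to Sprint 44 pushes end-to-end validation and QA to Sprint 45, which matches the re-adjusted plan; the value is in seeing that knock-on effect without redoing the whole plan by hand.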

When drawing up an ideal plan, I'd always budget for eventualities such as the one above. It depends on what is happening around the teams, but it would take a lot of convincing before I committed to Sprint 44 in front of important people.

What epics are not

Occasionally epics are used just as themes or work buckets. For example, the team might have epics called: Refactoring, widget A, operational support etc.
These are not projects: they can go on indefinitely, and there is no target. But, hey - who said that my definition of epic is the right one, and the bucket definition is the wrong one?
Some explanation is due, so here are my reasons why this is better done elsewhere.

1) Such buckets are hard to track. Let's say I have a mini-project: "Enhance widget A to provide telemetry". 
The widget A epic does not have a target release, so I can't track it there. I also can't create a new epic, since a single user story can't be assigned to multiple epics.
The widget A epic leaves me without any attractive options.

2) These buckets are like the undead: their numbers only ever grow, and they never die. The team's board becomes a mausoleum of epics, and it becomes harder to see the projects/epics that do matter.

3) Figuring out how much of team's work falls in each area (a noble goal) can be done using other means such as labels.

Summary

Scrum does not prescribe how projects are coordinated and managed. Some form of framework is required, and there is a variety of those. I went over epics, their pre-planning, adjustments and tracking. We also went over what epics are not.


In the next post, I'd like to talk about another long-term planning technique: task pinning.