| Roman on Software Engineering

Thursday, 21 April 2016

Software engineer goals: when SMART is not smart

This time around I'd like to talk about creating goals for software engineers.
First things first, let's get the well-known body of work behind us. You probably know about SMART goals. If you do not, the link to Wikipedia is in the previous sentence. One way or another, by the time we went past this paragraph, we're all on the same page with regards to specific, measurable, achievable, relevant, and time-bound (or finite, if smarf were a good acronym) goals.

However, once we leave this familiar territory behind us, we get into the big lawless desert that separates the SMART ideal from real, flesh and blood, people. As managers, we intuitively know what we'd like to get from our team, and we know what SMART means. What we don't always know is how to relate one to another.
In this mini-article I'll try dispelling the notion that goals can be generated in an algorithmic way, by ticking specific boxes.

Let's kick things off with a few counter-examples:

SMART goals that aren't smart

Counter Example #1

Resolve at least 10 bugs in the next 3 months

This goal is specific, measurable, achievable, relevant, and time-bound. It still has one big problem that should banish it into neverland.

Here's the thing: bugs are not born equal. A typo is a bug, and a subtle kernel packet routing problem is a bug. Expertise and time required for dealing with these two issues are incomparable. By setting such a goal, we encourage the reviewee to seek out the easier bugs: after all, they have to hit a number. And, by extension, we nudge their career into technical comfort zone, and inhibit progress.

Counter Example #2

Raise 2 patents in the next year

This goal also ticks all the 5 boxes, though one can argue that the "A" (achievable) is a bit shaky. However, just to make our discussion simpler, let's assume that the reviewee is working on projects that have potential for patents. There's still a big chasm however, one which is happily supplied by the legal system. It takes years to fully approve a patent, and until those have elapsed, it's hard to tell whether the patent brought benefit to the business. Internally, there are few people who can serve as the quality gate before the patent goes out to the bureau. So, we might get 2 patents, but those may waste our internal resources, employee's time or both - it's hard to control and assess their value.

Counter Example #3

Find 15 defects

I've seen this one a number of times, and it is one of the worse sins to be stamped on a QA engineer's goals' spreadsheet. Firstly, it creates antagonism between developers and testers: if a developer does their job well, then there are fewer defects to find. Hence, maybe the tester now is less interested in having strong quality before getting their hands on the new feature. Also, as per the first counter-example, bugs are not born equal. Often we'll value a single non-trivial, high impact defect detection over 10 typo findings.

Thinking about side effects

In each of these three counter-examples, we had a particular outcome in mind, dressed it in SMART clothing, but the result was full of side-effects that subverted our original aim far and wide. The intents are hard to argue with: we want developers to fix defects at a reasonable pace, we want them to be innovative, and we want our testers to release quality software. The problem is not with them, but with how they were expressed. The obvious question now comes: how should they be expressed in a smart (lowercase) way?

Revised examples

Counter Example #1

Resolve at least 10 bugs in the next 3 months

What was the original goal here? To be blunt, we want the developer to be productive, and not sit on a single work item for weeks while his teammates are sweating through the backlog. However, as described above, we solved that problem by creating several others.
The sad truth is that the number of defects resolved do not codify productivity. What does codify productivity is the total value or complexity of tasks completed. If we operate in an Agile-like environment, we could express this via percentage of tasks completed versus committed (which can be more than 100%):

Complete at least 90% of committed tasks each sprint

Of course, the word sprint can be replaced with month, or any other recurring time unit. Note that the goal does not prevent people from under-committing. The good news is that we are neither lawyers nor sportsmen. We don't have to legislate everything, and we can prevent under-commitment when planning our work, and ensuring that everyone gets their fair share.

Counter Example #2

Raise 2 patents in the next year

Here, we were obviously after innovation. There's a senior engineer on the payroll, and we'd like him/her to generate sound ideas, and not just implement what they're told. However, we well know that innovation does not come via patents, and the value of those patents can be hardly measured come review time. Perhaps it's time to detach ourselves from numbers and go for something like this:

Proactively suggest design improvements and/or feature enhancements

Now this goal is not SMART, as it is not measurable. However, given the right description, it sets clear expectations, and it provides enough to talk about. We probably should say that suggestions accepted by the team, including product management, count more, and that we're going after quality rather than number. Ideas, among other concepts, are hard to measure and count, and it's more honest and productive to accept that rather than retrofit measurable goals for the sake of measurable goals.

Counter Example #3

Find 15 defects

The intent here is simple enough: we don't want to get defects out. However, setting a goal of finding X defects is akin to setting a goal of catching Y criminals. For sure, the bugs/criminals will be found, but they are likely to be the wrong ones, while the right ones will escape.
And, this is exactly what matters: how many ~~criminals~~ bugs escape, not how many are detected. Also, bugs are not born equal, so a fair goal would go along these lines:

Have no more than 3 critical, 5 high, 7 medium and 9 trivial defects escape under area of your responsibility

Summary

This mini-article had examples, counter-examples, numbers and absence of numbers, but little in a way of recipes and templates. That is fully intentional: you know your team better than I do, so I can't tell what goals you should set. This is exactly why I used the word crafting in the title, and not generating: people are unique, and so are their goals. SMART is not a rule or an axiom, but a guideline, and guidelines always have exceptions.

Saturday, 23 January 2016

Should managers code?

Intro

Let's start with a question: is there a point in the career ladder where we stop contributing individually, and only coordinate (or facilitate) others?

My view is that this point is like the proverbial asymptotic curve; it can get as close as we like to zero, but will never quite reach it. Or, coming back to plain terms - everyone should and can contribute; it's only a question of how much and what for. The line between a manager and an individual contributor is not the same as a geographical border, which draws a distinct line between "us" and "them". Rather, it is a long blur, and throughout that blur we should care about what we bring to the team. As careers traverse these shades and their accompanying titles, personal contributions follow along that curve getting close to a zero, but - the main point - there are still good reasons to code/test/review. They become indirect: a manager rolls up his/her sleeve not to add manpower, but for other reasons, which I'll go into very shortly.

Matching personal experience and ability to influence

Let's go back (or forward) in time to that moment where you (will have) got the desired promotion letter. Presumably, you reached that moment by doing your job well, and gaining the recognition of your peers and managers, and again, presumably, you have strong experience with the technology and ability to influence. (You might have met managers who answer none of the conditions in this sentence, but they are neither the topic nor the audience for this article). With all of that, it makes little sense to go and abandon all of that useful experience on the next day.
So, there's a dilemma: on one hand, you want to and can code, on another, you're no longer paid to do that. Or, is it actually true? Let's start with:

Leadership by example

Most respected, though not necessarily the longest-lived, generals lead their troops and take part in battles. IT managers benefit from a quieter lifestyle; apart from an occasional coffee overdose, or a minor heart attack due to a misdirected e-mail, they are usually safe from workplace hazards. However, there is one parallel: both soldiers and software engineers respect leaders who from time to time can stand shoulder to shoulder with them and demonstrate that they can pull their own weight.

To start with, it breaks some of that barrier between "us", the workers in the trenches, and "them", the generals who command from above. There's more of that important team feeling, especially if the manager helps personally during crunch time. On the other hand, personal experience makes it easier to understand software engineers' grievances and requests for refactoring, switching toolsets and so on.
In my experience, there were a number of instances where personally seeing and maintaining particularly bad legacy code gave me the right mindset and conviction to push upper management for the right investment in refactoring. There was also a case or two where using development toolset helped lobbying for easier processes, and resisting introduction of cumbersome workflows that always seem fine to a manager, but not to someone who has to endure them every day.

The good news is that it's enough to do a few tasks every month to get all the benefit from the above. People remember the general that led them to a single battle; they don't necessarily expect him to do it on each and every one. Solving 3-4 small bugs every month shouldn't take more than a couple of work days; i.e. not more than ten percent of office time. This small investment can buy respect of your reports, as well as the right mindset and awareness to help them with their day-to-day job. However, it can also help you personally:

Experience and career continuity

Let's become egoistical for a moment, and think of our livelihood, i.e. ensuring that paycheck continues arriving month after month. Those that have been in the industry for long enough, know that nothing in this world is forever. Companies' fortunes turn around, bubbles pop, and financial crises appear every decade or so. Of course, the more indispensable you are, the less likely you are to be affected by the cataclysms and storms that rage outside, but at the end of the day the mantra is the same: nothing is forever, and nobody is untouchable. We must always think of our skillset and consider how we'll fare in the greater beyond should the worst happen.

Now, here's the thing: managers have it tough out there. The ratio of individual contributors to managers vacancies is 1-to-10 (or thereabouts), and software developers/testers simply have more openings to apply for. Moreover, a large proportion of employers look nowadays for managers with strong technical chops. In other words, a strategy of shying away from hands-on work is a dangerous one. In today's climate, it can leave one unemployed for a while should a disaster strike.
Of course, "hands-on" and coding is not one of the same. There are great leaders out there who never code, but yet understand and speak technology every day. However, if you're in middle management, you can count on being asked coding questions during interviews (thanks Google), and those 3-4 defects a month can make a big difference in keeping your toolset sharpened.
Just in case you haven't been fully convinced, let me throw another argument:

Fair reviews and career development

Providing team members with fair reviews and gently nudging their careers in the right direction is one of main responsibilities that we, managers, have. What are developers doing most of their time? Right, they code, design and review software. So, when the time comes, the bulk of their assessment should be based on how well they've done just that.

Unfortunately, "should" is often a word removed from reality; usually we review best only what we observe, and managers that distance themselves from coding can't observe. Instead, they review using peer feedback coupled with their personal interactions with the team, which tend to be e-mails, meeting participation, and general visibility. These are all fine, but remind slightly of an art critic that forms her opinion by not seeing the art in question, but reading second-hand opinions.
Hence, those personal coding tasks, the occasional code review, if chosen judiciously, can play an important role in observing and providing fair performance viewpoint. And, before I depart onward, let's not forget about the art of giving praise. It's one thing to praise using generic phrases: "hey, you've delivered feature X one week ahead of time, attaboy", and another to do so while being very specific: "you've delivered feature X week ahead of time by reusing module Y, which didn't occur to any of us, and then you had another nice shortcut with introducing an adaptor layer on top of existing interface framework". This difference can mean a lot to a developer who genuinely and rightfully takes pride in their work.

Avoiding time-critical tasks

It's not all plain sailing though. Though it's not too common, managers can err on the other side, and do too much hands-on work. The one rule to remember is: your role is to help others perform. Coding is often easier, and more gratifying than answering a long e-mail, preparing a budget, or crafting a personal review. The unfortunate reality is that these "dull" activities always must take priority; they are what we signed up for when the promotion came through. An addendum to that unfortunate reality is unpredictability; we can never tell when a critical e-mail will drop in the mailbox, or when we have a crisis to resolve.
Continuing the same train of thought: coding and context switches are not friends. Everyone in the team gets interrupted, but a developer usually has the mandate to park questions for an hour or two while they are putting finishing touches on a task. A manager cannot play this card; his/her official job description is to facilitate others, not to code. Hence, when a manager looks for tasks to take over for all the aforementioned reasons, these should not be time-critical. Ideally, if they are put on ice for a day or two, nobody will raise an alarm, and team's performance won't get affected.

Margin of error and personal responsibility

So far, I've been lumping coding and code reviews together, as both are technical, personal contribution, tasks. However, reviews are special in a way, so I'll tease them apart for the course of a couple of paragraphs.
While I was going on above about coordination, e-mails, budgets etc. etc, the truth is that management is first and foremost about responsibility; for R&D managers, responsibility over software their teams produce. Here, there is a whole spectrum of options of ensuring quality: from trusting the team all the way to personally reviewing each and every output.

As always in life, extremes have pronounced weaknesses. Reviewing each semicolon typed by the team does not scale. Trusting the team is fine up to a point where a particularly bad decision misfires. It's not hard to find core parts of the product where mistakes simply cost too much: think something along the lines of Facebook's personal wall, JavaScript inside google.com, packet management inside a web proxy etc. I'm sure if you give it a bit of thought, you can find similar business-sensitive zones in your product(s).
Personally watching over these areas has both the effect of adding another pair of eyeballs, and technical awareness which would allow you to react quicker if something does eventually go wrong.

Summary

Hopefully, this article has put an argument for managers not to divorce themselves from source code. Or, if you're not a manager, it may explain why your boss keeps looking over your should from time to time, and makes an appearance in your SCM tool, and why it's not necessarily a bad thing.

Sunday, 6 September 2015

Time Management

Last time around I've spoken about taming meeting invites, and in passing mentioned that this technique is one of the cornerstones upon which effective time management resides.

Time management is a bit like playing a musical instrument: many people try it, few persevere with it, and even fewer get it right.
However, this is yet another case where mastery pays off. As a superficial example, let's take two imaginary technical leads: each gets, say, 20 daily e-mails/instant messages/mailing posts with requests for help. Due to efficient schedule organisation, one manages to respond to each and every one in the same day; the other responds on average within a couple of days.

What happens is a positive feedback loop. People, seeing that the first guy provides more immediate help, start flocking to him. More importantly, he gains more visibility, and more popularity points. And, eventually, when the time comes to move up the ladder, he will have the edge over the other one, even if the latter is as good on pure professional merit.

Here's a bit of visualisation:

Of course, it's ~~a little bit~~ idealistic, and no deity has cancelled office politics, or pure element of luck, but it's undeniable that strong time management skills are an asset for both individual and their company.
Since you're hopefully convinced by now that the topic is not negligible, we can take a look at a few helpful practices.

Time as currency

When we buy items (soft drinks, cars, houses, curiously shaped lava lamps), we always attach intrinsic value: i.e. how much we're willing to spend on them. However, dollars/pounds/euros/yuans (insert your monetary unit of choice) is not the only currency we hold. In today's hectic IT environment, another important and finite asset we've got is time.

Let me stress and underline the word finite here. Spend an hour on an obscure design question, and that will be an hour you cannot spend on helping others with their challenges, brushing on new technologies, or spending with your family or hobby.

Now, with material objects, we usually know how much we're willing to spend. I'm happy to invest thousand pounds in a high-end guitar, and would not depart from more than a pound on a bicycle (since I don't ride them). For another person, it will be completely the other way around: both approaches are individual and perfectly valid.

Importantly, this analogy still holds true if we replace pounds (or your currency of choice) with minutes, hours and days.

Defining a set of responsibilities for a new micro-service is something that as an architect I'd be willing to several hours for. Vice versa, as a manager, I'd be unwilling to examine an interface in microscopic detail, and will restrict my time there to 30 minutes at most; the goal will be just to ensure that I understand it at a high level, and can tie it to business goals.

Same task - different intrinsic values.

Just to avoid displaying managers as superficial - there are plenty of opposite examples. An engineer does not have to spend more than 15 minutes drafting a daily update e-mail. Manager can and should pay more than that as their role demands accurate and unequivocal messaging.

All in all, there are little mental price tags attached to all that we do; we just normally don't notice them, Being oblivious to those little tags can easily cause to over- or under-spend; it is as if we were pulling random wads of cash from the wallet when paying over the counter in a supermarket. Understanding how many minutes we're willing to pay on X, and staying true to that is a cornerstone technique for efficient time management.

Context switches

Despite what philosophers and humanists may say, people do resemble machines in more ways than one, and a prime example of that are context switches. You're probably familiar with the topic, but let me mention it anyway - context switches occur whenever we move from one unfinished task to another. Both humans and machines do not perform any useful work when they swap contexts; at worst, if we constantly switch back and forth between tasks rather than completing each one in turn, we're thrashing in a state of unproductive semi-panic.

Of course, it's not hard to arrive to the solution: stop thrashing and do your work in a strict order. Or, as one Russian writer pointed out - "if you want to be happy, be so".

The problem is that things don't always go our way, and we cannot always complete all our work sequentially. If a developer starts writing a module that would take 8 hours to complete, they ~~can't~~ should not ignore the world around them for an entire day, and not deal with important code reviews, cries for help etc.

The trick is quick prioritisation. When a new request comes in, I tend to take a quick priority call among the following response times:

Drop everything and answer immediately. E.g. a widespread outage, an onsite sales engineer needing help, or a immediate call for action from my boss.
30 minutes. For example, someone else in the company is stuck with their (non-critical) task, and could be unblocked with my help.
2-4 hours. Typical examples are: a general request for information on a non-urgent customer case, or a code review.
Next working day. Planned document review, request for a weekly report etc.

The idea is that unless the environment is extremely chaotic, and most incoming transmissions do not fall in the first bucket, I can at least manage my interruptions, and reach a certain point where it's easier to switch to something else.

But, things can sure fall in the second and third bucket a lot, and quick and dirty prioritisation does not bring full salvation.

Forecasting interruptions

If your role does not include a lot of coordination, i.e. your title does not have the magic words Manager, Lead or Architect in it, then you can safely stop reading here. In many ways, you can count your blessings: people and priorities don't pull in different directions, and you get to create something new without being constantly tapped on the shoulder. So, if you're still with me (either by the virtue of your role, or due to sheer curiosity), let's say that you get a lot - and I mean, a lot - of interruptions that demand response within the same day or sooner.

Here, the challenge of being able to get on with the day-to-day job is still pretty much there. Yes, we just mitigated it a bit by doing a quick pre-filter, but mitigating does not mean solving. It is still very easy to get bogged down in a constant reactive cycle, where you just respond to requests rather than generate new value.

Yet, there is one little factor though that gives IT professionals a little edge towards solving this, and distinguishes software professionals and, say, football goalkeepers. The latter do not get any pattern in how often balls come their way. It might be once every 5 minutes, a cluster of ten balls towards the end of the game, or never. With software professionals, there's always a pattern. Normally, emails and questions come mostly during the first hour of the working day, towards the end of the day, and whenever any time zones we work with kick in.
If we zoom out for a moment, we can also observe a seasonal flow: end of fiscal year tends to be way more active than mid-July.

Understanding the organisation and its industry brings in more insight into when it's safer to plunge into long, focused, endeavours, and when it's better to deal with superficial tasks.
With all of that, I tend to pre-plan time slots which are open for involved, uninterruptible engagements; for example, it might be 10-12 AM, and 2-3 PM on a typical mid-week, mid-fiscal quarter day. Both slots avoid the time needed to answer morning and last night's questions, and also late afternoon when other time zones become active.

All that builds into day pre-planning. Before the day starts, I tend to know what meetings are scheduled, when I need to tend the goalposts (that goalkeeper analogy keeps on working!), and when I can switch into uninterruptible tasks, such as long-term planning or architecture design, without great risk of forsaking something important.

Summary

Altogether, there's no magic bullet. Yes, there's plenty of books about 7, 10, or 1024 habits of highly effective people, and no, you or me won't become highly efficient overnight by reading them, or indeed, this blog post.
A large part of being efficient stems from good knowledge of your company, industry, people around you, and your own strength and weaknesses, including ability to organise the workday, as well as concentrate and switch from one topic to another. Moreover, time management employed by a CEO might not be right for a developer, or a mid-level manager.

As with the other posts, my goal was not to proclaim a world cure, but to share the little tips and tricks that worked for me, with the hope that they might help a few people in their careers and work/life balance.

P.S. Interestingly enough, it took over a month to write a blog on time management. That's life: it just loves irony.

Sunday, 26 July 2015

Taming meeting invites

Introduction

As I was doing the detour into various programming topics, something was nagging me a bit in the background, something that was left unsaid in the previous series on Scrum.
After a short while, I realised that many of the recommendations in those series were underpinned by the invisible hand of time management, and this is the one meta-subject that haven't got the attention it deserves yet - at least on these pages.

For me, effective time management is a preliminary condition to success. Technical leaders have their attention drawn in many directions in many different ways, and many times within each day and each hour. Moreover, the more they lead the higher the frequency. At some point, we face a non-exclusive choice:

Work long hours
Ignore some of the ~~detractors~~ pleas for help
Actively decide how much time each particular subject merits

For years, I've been actively employing (3), though of course (1) is also unavoidable when the situation demands it. In a separate post, I'll bring up a few examples and guidelines on how one can quickly prioritise and allocate the right time slots, but first let's circle back: how all of this related to the subject of this post, namely "meetings"?

I can answer thus: meetings is whenever someone else decides for you how long you should spend on X. Hence, time management and meetings are deeply intertwined.

Now, before there is another step further, let me add a big disclaimer: I'm not saying that meetings are bad, and all of us should permanently sit in our rooms/cubicles/open spaces and lonely type at the keyboard.
The art consists of organising efficient and effective meetings on stuff that matters and do not make them longer than they should be. After all, if you spent an hour in a meeting that helped neither human nor beast, you'll have either to ignore an hour-worth of something else, something that matters, or go home one hour later.

Anyhow, let's be a bit more positive: here's the list of

Three meeting types that work

The brainstorm

Few attendees: definitely less than 10, but ideally less than 5. Everyone participates, and new decisions get formulated.

Examples include a production incident war room, or interactive technical design.

The review

Also fewer than 10 attendees, with one main presenter and a preliminary material distribution. There are all kinds of reviews: deployment, design, planning, project etc. You can also add an internal product demo in the same bucket.

The update

This meetings tend to have a big number of attendees: anything from 5 up to 50000. Their purpose is to convey information, and not kick off discussions, or take new decisions.

Company-wide quarterly update, Scrum stand-up, product training, architecture overview all belong to this group.

So far, so good, so what?

Ok, so this all beyond obvious, so why even bother listing it? As with many other phenomena, the trick is not realising what works, but what doesn't. Our enemy in this particular post are inefficient meetings, and those exist in the big galactic void between those three constellations. In not so many words, they happen whenever someone does not have a clear picture of what type of meeting they are going to call in.

The organiser might go for a brainstorm with twenty people, where only two speak, and eighteen wish for earth to open underneath their feet and swallow them whole. They might call in a review where nobody knows in advance what this is about and end up spending eons discussing mundane topics. They might go as far as call in an update with 50 people who don't care about what they are going to be updated on.

Taking out a page from a diagnostician's book, let's see how we recognize an inefficient meeting once it's upon us, and how we prevent those from occurring in the first place.

How to recognize an inefficient meeting?

This section seems a bit redundant - criteria such as "am I bored out of my mind?" and "am I answering my e-mail while it's going along?" come to mind. But they are not precise: if you and the other guy are engaged and are chatting away, while the rest of the room is doing stress testing for Facebook, the meeting is still ineffective.

My criterion has been surveying the room from time to time: if 20%+ of the attendees are mentally elsewhere, then the meeting isn't going great. You can tweak the number up to derive more adjectives: for example, with 80%+ it can be safely called "waste of time and money".

This occasional visual inspection is not a meaningless exercise. It is important lesson and feedback, since some meetings are not inherently bad; it is just that the speaker needs to do a better job of engaging. One example I covered at length in the past was the status update meeting.

However some meetings cannot be made whole even with the most eloquent presenter; and it's possible to detect those in advance.

How to recognize an inefficient meeting before it starts?

Here, the meeting types come to the fore again:

The brainstorm

The most common fallacy in my book is lack previous material or preliminary discussion.

I had the pleasure of receiving generic invites in the past; for example, "Design widget A", or "How do we make our customers happy?".
The cause is noble, but then what helps everyone to be productive? What if one guy might have been thinking about widget A all of his wakeful hours, and everyone else would not consider widget A until you served it with watercress on a plate?

This means that rather than having a brainstorm, we are going to have an update, while we won't really know whether there are other alternatives to design the widget, and/or whether we've got the right one.

Organiser's job is making sure that all the invitees:

Care about widget A
Have something to say about widget A
Know what others think about widget A

Brainstorming is much closer to playing chess than playing poker. The more we know which pieces/cards everyone else holds, the better we can choose what topics we spend time discussing, and the better we can build on each other's ideas.

In not so many words, brainstorms should not happen unless people were encouraged to think, contemplate and feed back offline first, so that we know exactly which points we are going to solve.

Note: The one exception to that rule are war-room discussions, where there's no time for niceties, but these should be rare in a healthy organisation.

Of course, there are many other reasons why these meetings might not go well - more vocal people dominating the proceedings, political agenda etc. These are already covered to a great extent elsewhere, and are more about running rather than organising a meeting.

The review

Here, as with the brainstorm, more often than not people come to the meeting from different starting points.
For example, let's say that you're reporting on your team's progress with creating widget A. Two managers/architects in the room collaborated on said widget, two more know just what it's supposed to do, and two more got tacked on to the meeting and are vaguely aware that said widget exists.

Do you cater for the last group, and explain in minute details what this widget is for and how it came to be? Then 5.5 guys will be wasting their time (the 0.5 is you).
Do you cater for the first group and do technical update only on what happened in the last 2 weeks? Then only three people will be actually interested and get to ask meaningful questions.
Even with the best of intent, it's hard to make a great meeting out of this.

Another typical fallacy is people being undecided between review and brainstorm. I.e. they do a presentation, ask open questions, and let others have a discussion. That doesn't tend to succeed for two reasons:

Too many people in the meeting: brainstorms should have a small group, while reviews spread themselves out.
Lack of preparation: as per above, brainstorms need more thinking done in advance.

In short, this kind of meeting works when all attendees know the subject, care about it to a reasonable degree, and already have a few questions in mind. The best way of getting there is pre-distributing the slides, answering basic questions offline if need be, and allowing attendants to be ready with the hard questions for the meeting itself.

The update

Updates have the most attendees, and tend to live or die by the skills of their presenter. Thus, they are a bit of a special case - it's harder to tell from a meeting invite how they are going to pan out. The only realistic warning sign are the words "discuss", "decide" or their siblings - they imply that the organiser is going for an open forum in a meeting least suited for that purpose.

What to do when an inefficient meeting lands in your calendar?

Rejecting is easy: Outlook has a convenient button for that very purpose. However, this is neither safe nor very helpful - especially if done without an accompanying note.
Explaining why this meeting might not work is less fraught with political peril, though also requires caution.

For example, it is legitimate to ask for a crisp agenda definition in a brainstorm, but demanding slides in advance for a review might be construed as unnecessary pressure. Here, we are both feet within the minefield of office politics, and it has to be tread very carefully.

The important point though is that it's you who is going to go home extra three hours later, and it's you whose priorities were forced by the meeting invite. If you firmly believe that your presence there will not provide any value, then it's best to state that and bail out using the best political tactics at your disposal - or help the organiser do a better job.
Obviously, you'll also be spawning meetings yourself; here it's best to keep to the saying: "Do unto others as you would have them do unto you".

Saturday, 11 July 2015

Regex engine and parallel string matching

The last post on Cython ended up with a surprising revelation on regexes - it turned out that Python matching of patterns OR-ed within a single regex was orders of magnitude faster than its regex.h cousin.

However, there were a few why-s, if-s and but-s left:

What is so different between the two engines? I alluded to backtracking specifics unearthed in a StackOverflow post, but that explanation wasn't truly satisfactory, as there was no other public source that referred to the same disparity.
Is this unique to regex.h? What about boost::regex or std::regex?
Does the size of the matched string set matter? Remember, I've fixed the set of needles to be searched in the haystack, but could it be that one engine performs better on smaller/larger sets?

These are exactly the questions I'd like to answer today, and here's how we are going to proceed:

Prepare three extensions that implement pattern matching: one for boost::regex, one for C++11 regex, and one for regex.h.
Run a set of tests for 2, 20, 200 and 2000 patterns which go over the same set of inputs.
Compare, deduce and write a blog post.

All of these shall be unveiled shortly, with an important correction that had to be performed between steps b) and c). However, I'm jumping ahead: let's step back and look at step a. As usual, I'll share the sources, but for compile/build/embed specifics will refer to previous posts.

Here's the boost::regex one:

#include <boost/python.hpp>
#include <iostream>
#include <set>
#include <boost/regex.hpp>

using namespace std;

boost::python::list matchPatternsImpl(const string &content, const string &pattern_regex)
   {
   set<string> tempSet;
   boost::regex regex(pattern_regex);
   boost::match_results<std::string::const_iterator> what;
   boost::match_flag_type flags = boost::match_default;

   auto searchIt = content.cbegin();
   while (boost::regex_search(searchIt, content.cend(), what, regex, flags))
      {
      tempSet.emplace(what[1].first, what[1].second);
      searchIt = what[1].second;
      }
  
   boost::python::list ret;
   for (auto item : tempSet)
      ret.append(std::move(item));
   return ret;
   }

BOOST_PYTHON_MODULE(contentMatchPattern)
{
    using namespace boost::python;
    def("matchPatternsImpl", matchPatternsImpl);
}

Now, it all looks as same old, but in fact, there are a couple of novelties. First, note the usage of Boost::Python. Rather than doing complicated marshalling of C++ data structures to Python and back via Python.h, we use a nice shortcut on lines 28-32. Who said that writing C++ extensions for Python was hard?

To follow up on that, we're not returning a primitive type here, but rather a Python list. This is where boost::python::list does its little magic by defining a C++ type that can be automatically coerced to Pythonic structures.
On lines 22-25 we take our C++ set and create such a list while using C++11 move semantics to avoid unnecessary copy constructors. Another minor landmark is the usage of emplace which avoids additional copies and constructs a string directly in the container. As always, my advice is to read and experiment with those if you're serious about C++11.

However, let's get back to business. We have the Boost example sorted, so let's move on to std::regex. Here, there is a bit of an anti-climax, as it is almost a carbon copy of the code above, which is entirely unsurprising - std::regex was modelled after Boost. To get the code. just replace boost:: with std:: in the right places.

All we have remaining is regex.h, which is a throwback to my previous post. The main difference though is that we're back to conventional weapons; Cython for sure looked unorthodox, but we want to do a like-for-like comparison, so this will be a pure C++ wrapper.

#include <boost/python.hpp>
#include <iostream>
#include <set>
#include <regex.h>

using namespace std;

boost::python::list matchPatternsImpl(const string &content, const string &pattern_regex)
   {
   set<string> tempSet;
   regex_t regex_obj;
   regcomp(&regex_obj, pattern_regex.c_str(), REG_EXTENDED);
   
   size_t current_str_pos(0);
   regmatch_t regmatch_obj[1];
   int regex_res = regexec(&regex_obj, content.c_str(), 1, regmatch_obj, 0);

   while (regex_res == 0)
      {
      tempSet.emplace(content.begin() + current_str_pos + regmatch_obj[0].rm_so, content.begin() + current_str_pos + regmatch_obj[0].rm_eo);
      current_str_pos += regmatch_obj[0].rm_eo;
      regex_res = regexec(&regex_obj, content.c_str() + current_str_pos, 1, regmatch_obj, 0);
      }
  
   boost::python::list ret;
   for (auto item : tempSet)
      ret.append(std::move(item));
   return ret;
   }
   
BOOST_PYTHON_MODULE(contentMatchPattern)
{
    using namespace boost::python;
    def("matchPatternsImpl", matchPatternsImpl);
}

The Boost::Python wrapper is one and the same, and the main difference is with the regcomp, and regexec.

Right, we have step a) complete, so it's time to run and compare.

Searched string set size	Python	regex.h	boost::regex	std::regex
2	100	140	151	120
20	110	400	440	410
200	130	2310	2372	2355
2000	248	31270	31263	31278

Note: All measurements are in milliseconds, and the platform is Python 2.7.9, Cygwin/Windows, 4-core 2.10 Ghz CPU

Erm - what's going on here? It's reasonable that Python would perform similarly to the Boost regex engine, but having two orders of magnitude is entirely unexpected and counter-intuitive. This points to a defect at the user's end (i.e. my code) rather than a radical difference between the libraries. After a bit of head scratching, the problem was found - this line:

boost::match_flag_type flags = boost::match_default;

needed to be replaced with this:

boost::match_flag_type flags = boost::match_default | boost::match_any;

We have to find any match, and not necessarily scan for all of them; this is taken care of by the outer loop.
After changing the one line, we get a saner set of results:

Searched string set size	Python	boost::regex
2	100	90
20	110	100
200	130	130
2000	248	270

How much of a difference does one line make! Python terminates the regex search as soon as it finds a match, while the other libraries press on. This is why regex.h was non-performant: it simply does not have such an option (std::regex does, with very similar metrics to boost).

Here are my takeaways from this exercise:

Boost::Python is a very convenient way of exposing C++ APIs. Unless you do highly customized parameters management or do not have access to Boost, it should be preferred to manual Python_ function calls.
Whenever you hit a non-performant library, blame yourself first.
Whenever doing parallel string matching via regexes, always look for match_any parameter, or variant thereof.
Avoid regex.h, unless the powers to be force development in pure C.

Sunday, 28 June 2015

Cython and regular expressions

Recap

I'm returning yet again to the long suffering text matching example from a couple of months back. The goal is/was to take a text document, a set of patterns, and see which of these patterns could be found. We went through a variety of techniques: Python micro-optimisations, regexes, C++ extensions, Twisted, reactors and improved string matching algorithms. My intent this time around is to try improvements brought out by Cython.

In case you're unfamiliar with Cython, I'll do a two sentences' introduction: its aim is to preserve the ease of development inherent to Python while extending it with static type definitions and easy C/C++ bindings. Considering that dynamic typing comes at a great cost, it's perfectly suited for optimising hotspots in Python code and striking a balance between productivity and performance.

As before, the end result turned out to be surprising (at least to me), but what matters is the journey. Let's embark on it.

Firstly, I'll shake the dust off the pure Python example:

from twisted.internet import reactor, defer, threads
import sys, re

def stopReactor(ignore_res):
   reactor.stop()

def printResults(filename, matchingPatterns):
   print ': '.join([filename, ','.join(matchingPatterns)])

def scanFile(filename, pattern_regex):
   pageContent = open(filename).read()
   matchingPatterns = set()
   for matchObj in pattern_regex.finditer(pageContent):
      matchingPatterns.add(matchObj.group(0))

   printResults(filename, matchingPatterns)

def parallelScan(filenames, patterns):
   patterns.sort(key = lambda x: len(x), reverse = True)
   pattern_regex = re.compile('|'.join(patterns))

   deferreds = []
   for filename in filenames:
      d = threads.deferToThread(scanFile, filename, pattern_regex)
      deferreds.append(d)

   defer.DeferredList(deferreds).addCallback(stopReactor)

if __name__ == "__main__":
   with open(sys.argv[1]) as filenamesListFile:
      filenames = filenamesListFile.read().split()
   with open(sys.argv[2]) as patternsFile:
      patterns = patternsFile.read().split()

   parallelScan(filenames, patterns)
   reactor.run()

This was just a copy/paste to save you the bother of clicking through to an older post. I still preserved the reactor and multi-threading, but they won't be playing a major role today.

We know that it's scanFile where we spend most of the time, and this is the hotspot that needs optimisation treatment with Cython. Since the matching is done via regexes, we cannot in good faith claim that we're optimising without also switching their library; here, I'm going to use regex.h.

Writing Cython code is not a common skill, and we're not dealing with Hello, World here, so as the narrator, I have two choices: jump to the end result, and go for an autopsy, or build it piece by piece. Taking the mantra that it's the journey that matters, I'll go for the second option.

Cython

Going for the easy bit, here's the updated scanFile function:

def scanFile(filename, pattern_regex):
   pageContent = open(filename).read()
   matchingPatterns = contentMatchPatternCython.matchPatterns(pageContent, pattern_regex)
   printResults(filename, matchingPatterns)

Nothing glamorous here, we just defer matching to the Cython extension that will be built.
Let's tick the build checkbox too:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(name='contentMatchPatternCython',
      cmdclass={'build_ext': build_ext},
      ext_modules = [Extension('contentMatchPatternCython', 
                     sources = ['contentMatchPatternCython.pyx'])])

Still nothing exciting, as this is just a small bootstrapping script that will build the Cython extension.

Time to take a deep breath, and look at how we write the actual algorithm.
This is the skeleton:

def matchPatterns(bytes pageContent, bytes regex):
   cdef set matchingPatterns = set()
   return matchingPatterns

Baby steps here. We just return an empty set to satisfy the function signature, however it already differs from vanilla Python code by defining types for pageContent, regex and matchingPatterns.
Why? We'd like to use Cython's memory and performance optimisations by pre-defining the variable types which will get us optimised variable storage and access. In this specific case, there is no requirement to support unicode, hence the bytes type definition.

Now, let's import regex.h functions:

cdef extern from "regex.h" nogil:
    ctypedef struct regmatch_t:
       int rm_so
       int rm_eo
    ctypedef struct regex_t:
       pass
    int REG_EXTENDED
    int regcomp(regex_t* preg, const char* regex, int cflags)
    int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)
    void regfree(regex_t* preg)

Note that we only import what we need later on, and that we tell Cython to release the GIL when executing the imported functions via the magic nogil keyword.

Ok, so we have the interface and the required C library function imported into PCython. All that's left is the algorithm, and tying it all together:

cdef extern from "regex.h" nogil:
    ctypedef struct regmatch_t:
       int rm_so
       int rm_eo
    ctypedef struct regex_t:
       pass
    int REG_EXTENDED
    int regcomp(regex_t* preg, const char* regex, int cflags)
    int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)
    void regfree(regex_t* preg) 

def matchPatterns(bytes pageContent, bytes regex):
   cdef set matchingPatterns = set()
   cdef regex_t regex_obj
   cdef regmatch_t regmatch_obj[1]
   cdef int regex_res = 0
   cdef int current_str_pos = 0
   
   regcomp(&regex_obj, regex, REG_EXTENDED)
   regex_res = regexec(&regex_obj, pageContent[current_str_pos:], 1, regmatch_obj, 0)
   while regex_res == 0:
      matchingPatterns.add(pageContent[current_str_pos + regmatch_obj[0].rm_so: current_str_pos + regmatch_obj[0].rm_eo])
      current_str_pos += regmatch_obj[0].rm_eo
      regex_res = regexec(&regex_obj, pageContent[current_str_pos:], 1, regmatch_obj, 0)

   regfree(&regex_obj)
   return matchingPatterns

The interesting stuff happens at lines 19-24 where the usual compilation and regex execution takes place. The code is not one-to-one to pure Python since the underlying C library does not support the notion of generators, and we have to do string slicing (which is cousin once removed to pointer arithmetic).

Cython is relatively new to me as well, and even though I've written the code above, it caused a certain degree of cognitive dissonance: a bit akin to someone seeing platypus for the first time.

The code looks like C, but semicolons are nowhere to be found, while there are Python indentations and weird brackets around strings. If you've been developing in both languages for a while, seeing them unified in matrimony within a single function is a new experience.

Anyhow, time to cut the nostalgia, stop coding and start measuring! At this point, my intent was to try this on a single <file, pattern> pair, and move on to doing further Cython tweaks to ensure we can release the GIL on the entire regex matching loop. Reality, however, turned to be different...

Performance

From here, and until the end of the post, the inputs are fixed: we use a single haystack (17KB source of google.html) and a fixed set of needles (first 2000 tokens of /usr/share/dict/words).

Making sure to run a few iterations and take the fastest one, here's the timing using the Python-only example:

$ time python pythonOnly.py shortFilenamesList.txt shortWords.txt

real 0m0.868s

user 0m0.546s

sys 0m0.312s

All right, so how does Cython compare? Drum roll please...

$ time python pythonWithCython.py shortFilenamesList.txt shortWords.txt

real 0m2.396s

user 0m2.012s

sys 0m0.374s

Ouch - 2.7 times slower! Of course, such a difference has to be explained. In my mind, there are two possible sources: different regex library or Cython itself. The former is far more likely, but we should never make fast assumptions with performance tuning. Hence, the natural next step was to write the same in C.

I'll omit the C code for brevity, as it mirrors the Cython code with the semicolons and pointer arithmetics thrown in. Let's jump to the result:

$ time ./a.exe

real 0m1.547s

user 0m1.513s

sys 0m0.031s

No joy - it's still way slower than Python. Time for

Conclusions

So, we know by now that it's the difference in the regex library that matters; no low-level optimisations in Cython or C can overcome that. In hindsight, there was little basis to suppose that Cython will matter on this specific task, as Python's re implementation is also native C.

However, this was not time wasted, and here's why:

This is yet another tangible example of why blanket statements such as Language X is slow, and Y is fast are untrue. In this case, Python came up faster; in another it may well be slower.
Thanks to user @nhahtdh on StackOverflow, I gained insight into the difference in performance. regex.h implements a backtracking engine, and does not optimise non-backtracking regexes, which is almost certainly something that Python re does.
Good Cython practice. There is a reasonable template in place to import C libraries into Cython, link with it, build etc. (There are of course plenty of other resources that show the same, but it's always helpful to do and gradually demonstrate it by yourself)

What next?

Coming back to the task at hand: can we truly say that Python's re engine is simply better? My answer is a definite 'no' since engines shine on different patterns. Ours has been a long sequence of OR expressions, and it is just one possibility among many.

So, there are many options to take it further: try other matching patterns, compare with C++ boost::regex, or go for examples that are less dependent on supporting libraries. While Cython has not been of much help with optimising our venerable string matching example, it yet has a future in this blog.

Subscribe to: Posts ( Atom )