Sunday 24 May 2015

Code comments: explaining the 'why'

While I'm on the subject on code reviews. it's hard to get past one of my favourite subjects: code comments.

You might wonder here whether comments justify having a full blog post. After all, it's quite simple: add a line in plain English (or whichever language tickles your fancy at a given moment), and move on to doing real coding.

My main bugbear with review comments is not so much their absence or presence, but rather their purpose: in many cases, they explain the obvious and leave the complicated as an exercise for the reader (to borrow a phrase from one of my university tutors).

Let me dig out an example:
#include <iostream>
#include <string>
 
using namespace std;
 
bool TestPalindrome(string::const_iterator forwardIt, string::const_iterator backIt)
    {
    for (; forwardIt < backIt; ++forwardIt, --backIt)
       if (*forwardIt != *backIt)
         return false;
    
    return true;
    }
 
bool ProcessTestCase()
    {
    string inputStr;
    cin >> inputStr;
 
    auto forwardIt = inputStr.cbegin();
    auto backIt = --inputStr.cend();
    
    for (;forwardIt < backIt; ++forwardIt, --backIt)
       {
       if (TestPalindrome(forwardIt + 1, backIt) || TestPalindrome(forwardIt, backIt - 1))
          return true;
       if (*forwardIt != *backIt)
          return false;
       }
    return true;
    }
 
int main(int argc, char *argv[])
    {
    ios_base::sync_with_stdio(false);
    size_t testCaseCount(0);
    cin >> testCaseCount;
    for (size_t i = 0; i < testCaseCount; ++i)
      {
      if (ProcessTestCase())
         cout << "YES";
      else
         cout << "NO";
      cout << endl;
      }
    return 0;
    } 

Now, at this point, you don't know what this code is for and how it does it. Of course, a bit of reverse engineering is a nice past-time for a lonely afternoon, but let's say that we'd really like to take a shortcut and see some comments.

So, here they are:
/* Determine whether a set of inputs can be converted to 
   a palindrome by removing a character
*/

#include <iostream>
#include <string>
 
using namespace std;
 
bool TestPalindrome(string::const_iterator forwardIt, string::const_iterator backIt)
    {
    for (; forwardIt < backIt; ++forwardIt, --backIt)
    // If character mismatch, return false
       if (*forwardIt != *backIt)
         return false;
    
    // Loop finished, return true
    return true;
    }
 
bool ProcessTestCase()
    {
    string inputStr;
    cin >> inputStr;
 
    // Set iterators to the start and end of string
    auto forwardIt = inputStr.cbegin();
    auto backIt = --inputStr.cend();
    
    for (;forwardIt < backIt; ++forwardIt, --backIt)
       {
       // Check for palindromes
       if (TestPalindrome(forwardIt + 1, backIt) || TestPalindrome(forwardIt, backIt - 1))
          return true;
       // If iterators are not equal, exit the loop
       if (*forwardIt != *backIt)
          return false;
       }
    return true;
    }
 
int main(int argc, char *argv[])
    {
    ios_base::sync_with_stdio(false);
    size_t testCaseCount(0);
    // Read test case count
    cin >> testCaseCount;
    for (size_t i = 0; i < testCaseCount; ++i)
      {
      // Print out result
      if (ProcessTestCase())
         cout << "YES";
      else
         cout << "NO";
      cout << endl;
      }
    return 0;
    }

The top level comment should take us forward - it at least explains what the purpose of the code is. But what about the others? It's hard to argue with their truthfulness, but they are kind of obvious.

Here, I'm approaching the crux of the message: often, people put Mathematician's answers as comments. I.e. the information is correct, it is in the right place, and yet, it does not help whatsoever.
The right comments add information that's absent in the code itself, and in our example, the code already tells that we read the test case count, print out the results, and exit the loop.

Of course, it's easier said than done. Not everyone can just flick a switch, get an out-of-body experience, and know what everyone else doesn't know. My personal approach to that was adding comments in two phases:
  1. While writing, when knowingly adding complex code.
  2. After finishing coding, and before submitting it for review. There's always a natural break between those two when automated tests are doing their worst. This is the right time to take a step back, read the code again, and pick up quirks that seemed natural on the first pass, but do not longer look that way after a few minutes away from the screen.

Here's the same code example with better comments:
/* Determine whether a set of input can be converted to 
   a palindrome by removing a character
*/

#include <iostream>
#include <string>
 
using namespace std;
 
bool TestPalindrome(string::const_iterator forwardIt, string::const_iterator backIt)
    {
    for (; forwardIt < backIt; ++forwardIt, --backIt)
       if (*forwardIt != *backIt)
         return false;
    
    return true;
    }
 
bool ProcessTestCase()
    {
    string inputStr;
    cin >> inputStr;
 
    auto forwardIt = inputStr.cbegin();
    auto backIt = --inputStr.cend();

    // Run two counters: from the start and end of string. 
    // If they point to different characters,
    // check whether removing one of them yields a palindrome. 
    // If not, we cannot convert the input
    // to palindrome by a single character removal
    for (;forwardIt < backIt; ++forwardIt, --backIt)
       {
       if (TestPalindrome(forwardIt + 1, backIt) || TestPalindrome(forwardIt, backIt - 1))
          return true;
       if (*forwardIt != *backIt)
          return false;
       }
    // If inputStr itself is a palindrome, 
    // we can simply remove a character from the middle,
    // so return true here.
    return true;
    }
 
int main(int argc, char *argv[])
    {
    ios_base::sync_with_stdio(false);
    size_t testCaseCount(0);
    cin >> testCaseCount;
    for (size_t i = 0; i < testCaseCount; ++i)
      {
      if (ProcessTestCase())
         cout << "YES";
      else
         cout << "NO";
      cout << endl;
      }
    return 0;
    } 

Comparing the two examples one against the other:
  • No more comments which tell what the line below does, such as "Set iterators to the start and end of string". If the reader knows C++, they won't need it. If the reader does not know C++, then they'll need far more than that.
  • New comment explaining the main algorithm (lines 27-31).
  • Comment highlighting a specific edge condition (lines 39-41).
The last one exemplifies the difference quite well. If someone annotates code as per the first example, they would never consider commenting what return true does. But, the important bit is not what it does, but why it's there, and why it does not return false.

Coming back to the two-phase approach: if I need more than a minute to think of a specific code line, then it is complex, and probably merits a comment. In this specific case, the return true took me a couple of minutes, so it got a comment as its chaperon.
And, to close a circle, let's mention code reviews. If someone needs clarification during review, then nine times of ten the answer should manifest itself within the code.

So, here we are: comments should explain 'why' rather than 'what', and they should be read and written by taking a 3rd person view.

However, it's also possible to take things into the other extreme. If you find that comments overwhelm the content, then it's quite possible that the content itself is the problem.

For example, I could add a comment for the line below:
res = itertools.product([x for x in myList if x in y else calcValue(x)], anotherList)
like this:
# For all values not in the intersection of x and y, recalculate value, and then create a product with another list
res = itertools.product([x for x in myList if x in y else calcValue(x)], anotherList)
or I could just rewrite the code to make it more explicit:
newList = []
for x in myList:
   newValue = x if x in y else calcValue(x)
   newList.append(newValue)
res = itertools.product(newList, anotherList)
The last variant is more readable despite not having a single comment.

No comments :

Post a Comment