Sunday, 3 May 2015

Reviewing Code: Theory in practice

As promised last time, I'll apply some of the review practices on a victim code sample.

The task at hand is generating a few pseudo-random passwords, and the code we're going to review is below (yep, it's C++ - home away from home):

#include <iostream>
#include <cstdlib>
#include <ctime>

const int MAX = 90;
const int MIN = 65;

char * createPassword();

int main()
{
    char * p = createPassword();    
    return 0;
}

char * createPassword()
{
    unsigned seed = time(0);

    srand(seed);

    char x = ' ';
    int passwordLength = 0;
    int numOfPasswords = 0;

    std::cout << "How many chars in password?";
    std::cin >> passwordLength;
    char * pwptr = new char[passwordLength];

    std::cout << "How many passwords should be generated?";
    std::cin >> numOfPasswords;

    int passwordcount = 0;

    do{
        for(int cnt = 0; cnt < passwordLength; cnt++)
        {
            x = (rand() % (MAX - MIN + 1)) + MIN;
            pwptr[cnt] = x;
            std::cout << pwptr[cnt];
        }
        std::cout << std::endl;
        passwordcount++;
        } while(passwordcount != numOfPasswords);

        return pwptr;
}
I've specifically selected a reasonable and functional fragment, so that we remain attached to reality. Let's recall the review order, striking out the stages that do not apply here:


  • Code reuse (or lack thereof)
  • Encapsulation, interfaces and self explanatory variable names. 
  • Correctness, i.e. no obvious defects with inverted conditions, missed basic use cases etc.
  • Performance and efficiency
  • More esoteric errors, such as off-by-ones, imprecise variable types, signed/unsigned, proper variable scope and so on
  • Auto-tests and code coverage
  • Correct merges to release branches and external dependencies.

Well, auto-tests might well apply, as they are conspicuous by their absence, but I'll play the usual card of brevity in the name of clarity.
Otherwise, there's not much room for reuse (we are not a part of a bigger project), or performance (pseudo-random number generation takes the lion share of the processing).

Now, if you have C++ baggage and a bit of time to spare, try and do your own review; we'll compare notes as we go along. 


Encapsulation and interfaces

Immediate observation: the function is called createPassword , but in fact it performs queryUserAndCreatePasswords.
It makes a lot of sense to separate creation of a single password, and remove user interaction. After all, if someone uses this function externally, they would not expect it to strike up a conversation with the user. 

This needs to be fixed before we move on, as lack of encapsulation might hide other issues. Rotating the table, and becoming the coder for 5 minutes, I end up with iteration number two:

#include <iostream>
#include <cstdlib>
#include <ctime>

void generatePasswords(int passwordLength, int numOfPasswords);
void generatePassword(int passwordLength, char *pwptr);

int main()
    {
    int passwordLength = 0;
    int numOfPasswords = 0;
    std::cout << "How many chars in password?";
    std::cin >> passwordLength;
 
    std::cout << "How many passwords should be generated?";
    std::cin >> numOfPasswords; 
 
    generatePasswords(passwordLength, numOfPasswords);
    return 0;
    }

void generatePasswords(int passwordLength, int numOfPasswords)
    {
    unsigned seed = time(0);
    srand(seed);
 
    char * pwptr = new char[passwordLength];
    for (int i = 0; i < numOfPasswords; ++i)
       {
       generatePassword(passwordLength, pwptr);
       std::cout << pwptr << std::endl;
       }
    }

void generatePassword(int passwordLength, char *pwptr)
    {
    const int MAX = 90;
    const int MIN = 65;
 
    char x = ' ';
    for(int cnt = 0; cnt < passwordLength; cnt++)
        {
        x = (rand() % (MAX - MIN + 1)) + MIN;
        pwptr[cnt] = x;
        }
    }
User interaction is moved to main(), and we now have a passable interface. People may rightfully frown upon such similar names as generatePassword and generatePasswords. My justification is that only generatePassword (in singular) is a candidate for an external interface, so we do not need to dwell too much on the other function's name.

Now, that we have a proper interfaces in place, let's look for 

Correctness

Jumping right into the crux of that matter: it might not look obvious, but the random passwords are not that random! 

Rand() creates predictable sequences if seeded with the same timestamp. So, if we invoke generatePasswords multiple times within the same second, we'll get the same sequence, and thus, the same set of passwords. Not good.

You might point out that with user interaction in place, there's not much chance of multiple password sets being requested within the same second. 
True, however we hope to reuse password generation in other, more automated, contexts by providing it as an interface. Naturally, we can't expect the callers to avoid calling the function twice in the same second.

This provides a nice opportunity for using random device facilities in C+11. Not only are they using hardware for generating entropy, they also have a convenient interface for specifying ranges and thus, casting the modulo function into oblivion. Without further ado, here's the third iteration: 
#include <iostream>
#include <random>

void generatePasswords(int passwordLength, int numOfPasswords);
void generatePassword(int passwordLength, char *pwptr);

int main()
    {
    int passwordLength = 0;
    int numOfPasswords = 0;
    std::cout << "How many chars in password?";
    std::cin >> passwordLength;
 
    std::cout << "How many passwords should be generated?";
    std::cin >> numOfPasswords; 
 
    generatePasswords(passwordLength, numOfPasswords);
    return 0;
    }

void generatePasswords(int passwordLength, int numOfPasswords)
    {
    char * pwptr = new char[passwordLength];
    for (int i = 0; i < numOfPasswords; ++i)
        {
        generatePassword(passwordLength, pwptr);
        std::cout << pwptr << std::endl;
        }
    }

void generatePassword(int passwordLength, char *pwptr)
    {
    const int MAX = 90;
    const int MIN = 65;
    std::random_device randomDevice;
    std::uniform_int_distribution<int> distribution (MIN, MAX);
 
    for(int cnt = 0; cnt < passwordLength; cnt++)
        pwptr[cnt] = distribution(randomDevice);
    }

General code quality

Now that we have both encapsulation and correctness sorted, let's look for more intricate defects.

The memory leak in line 23 is a case in point, and should be eliminated (just to state the obvious, pwptr is not freed).
Moreover, it's not just a memory leak - each subsequent password generation wipes out the previous token.

Additionally, we cannot have negative password lengths or counts, and the passwords themselves contain chars.

With all of that, we get iteration no. 4:
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
#include <string>
#include <algorithm>

using PasswordList = std::vector<std::string>;
std::string generatePassword(size_t passwordLength);

int main()
    {
    size_t passwordLength = 0;
    size_t numOfPasswords = 0;
    std::cout << "How many chars in password?";
    std::cin >> passwordLength;
 
    std::cout << "How many passwords should be generated?";
    std::cin >> numOfPasswords; 
 
    PasswordList passwordList;
    std::generate_n( std::back_inserter(passwordList), numOfPasswords,
      [passwordLength] () { return generatePassword(passwordLength); });

    std::copy(passwordList.begin(), passwordList.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
 
    return 0;
    }

std::string generatePassword(size_t passwordLength)
    {
    static const char MAX = 'Z', MIN = 'A';
    std::random_device randomDevice;
    std::uniform_int_distribution<int> distribution (MIN, MAX);
 
    std::string res(passwordLength, 0);
 
    for(size_t i = 0; i < passwordLength; ++i)
        res[i] = distribution(randomDevice);
 
    return res;
    }

There are several code improvements:

  • As promised, generatePasswords is gone due to C++ conveniently providing generate_n.
  • generatePassword now returns an std::string, hence we need not worry about memory leaks, allocations etc. If you're concerned about unnecessary string copes, have a look at RVO.
  • Generated passwords are printed using std::copy to an ostream_iterator.
  • Various declarations and definitions are tidied up: int became size_t; rather than having a magic number 65, I'm using 'A' and so on.

Hitting the Approval button


At this point, I finally pass the review. This was a fairly easy task, and the starting point wasn't even half bad. 
Nevertheless, my chances of hitting all the issues including memory leaks, wrong variable types, time sensitivity in a single pass would have been low.
So, having separate incremental review and rewriting stages helped a lot in flushing the issues out one by one.

1 comment :