Wednesday, May 29, 2013

How to understand undocumented code

New developers are often overwhelmed by the amount of undocumented or unfamiliar code they have to contend with on a project.  Left unchecked, they'll often start reimplementing things, either because they don't know what's already been done or because they're uncomfortable depending on it.

An incredibly valuable strategy for helping new developers understand a large, unfamiliar codebase is to have them document it.  The team will appreciate the heck out of the work, and the new developers will learn the entire codebase in the process.

Here's a workable approach to documenting a large codebase:

Start at the bottom.  List all of the files and categorize them: "main.c" and friends are "top dogs", a bunch of stuff will be "supporting", and utils.c, db.c, and such will be "low level".

Document the lowest level routines first.  These have NO dependencies on other modules in the code base (but may have system dependencies of course).

What do they do?  What do they do that's surprising?  What do they do that makes you wonder why?  Document the heck out of them.
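If the codebase is too big to sort by eye, a small script can suggest which modules are "low level".  Here's a minimal sketch in Python, assuming a C project where quoted includes (#include "db.h") point at project files and angle-bracket includes are system dependencies.  It also assumes that utils.c and utils.h together form one module named "utils".  The function names and the sample files are made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Matches quoted includes such as: #include "db.h"
# Angle-bracket includes (<stdio.h>) are treated as system dependencies and skipped.
LOCAL_INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def module_of(path: str) -> str:
    # Assumption: utils.c and utils.h belong to one module called "utils".
    return PurePosixPath(path).stem


def module_deps(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the other project modules it includes."""
    modules = {module_of(path) for path in sources}
    deps: dict[str, set[str]] = {m: set() for m in modules}
    for path, text in sources.items():
        me = module_of(path)
        for included in LOCAL_INCLUDE.findall(text):
            target = module_of(included)
            # Ignore a module including its own header, and unknown files.
            if target in modules and target != me:
                deps[me].add(target)
    return deps


def lowest_level(deps: dict[str, set[str]]) -> list[str]:
    """Modules with no dependencies on other project modules."""
    return sorted(m for m, d in deps.items() if not d)


# Hypothetical file contents, keyed by path.
sources = {
    "utils.h": "",
    "utils.c": '#include <stdio.h>\n#include "utils.h"\n',
    "db.h": "",
    "db.c": '#include <stdlib.h>\n#include "db.h"\n',
    "accounts.c": '#include "db.h"\n#include "utils.h"\n',
    "main.c": '#include "accounts.h"\n',
}

print(lowest_level(module_deps(sources)))  # ['db', 'utils']
```

In a real project you would fill `sources` by reading the files from disk.  The script only tells you where to start; the documenting is still up to you.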

Then work up one level.  For each low-level file, find all the files that include it.  Sort those out, and find ones that ONLY include it and maybe a few other low-level files.   Document those.  What are they doing?  What encapsulations do they provide, turning the low-level stuff into something more domain specific?

Don't hesitate to dive back into the low-level files and refine their documentation as you learn more about what their consumers are doing and how they're using the low-level stuff.

Loop over all lowest-level files and do that work-up-one-level thing.
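The "work up one level" step can be sketched in Python as two lookups over a dependency map.  This assumes you already have a dictionary from each module to the project modules it includes; the module names below are hypothetical, and so are the function names.

```python
def consumers(deps: dict[str, set[str]], target: str) -> list[str]:
    """All modules that include the given module."""
    return sorted(m for m, d in deps.items() if target in d)


def next_level_up(deps: dict[str, set[str]], documented: set[str]) -> list[str]:
    """Undocumented modules that include ONLY already-documented modules."""
    return sorted(
        m for m, d in deps.items()
        if m not in documented and d and d <= documented
    )


# Hypothetical dependency map: module -> project modules it includes.
deps = {
    "utils": set(),
    "db": set(),
    "accounts": {"db", "utils"},
    "report": {"accounts", "utils"},
    "main": {"report", "accounts"},
}

print(consumers(deps, "utils"))               # ['accounts', 'report']
print(next_level_up(deps, {"db", "utils"}))   # ['accounts']
```

Here "report" includes utils, but it also includes "accounts", which isn't documented yet, so it waits for a later pass.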

Once that's done, move your focus up one level, and repeat the process.   Loop over each "supporting" file, documenting it better, based on what you've learned from what you've done so far.

Reach up a level for each file - find the files that include it, and start to document what they're doing as consumers of that file.  Document the rest of what's in each of those files.

Then, just as you did before... loop over all supporting files and do that work-up-one-level thing.  Find all the files that are including them, and document what they're doing as they act as consumers of those supporting files.

If you treat the entire codebase as a dependency graph, with main.c at the top, and work up from the bottom as I've described, you'll end up with extremely good documentation of what every file is doing and why.
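The whole bottom-up order can be worked out ahead of time by peeling the dependency graph into layers.  This is a rough sketch in Python, assuming a hypothetical dependency map in which every dependency is itself a key; the `layers` function is my own name for it, not a standard tool.

```python
def layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into layers, lowest level first.

    A module lands in a layer once everything it includes
    sits in an earlier layer.
    """
    remaining = {m: set(d) for m, d in deps.items()}
    done: set[str] = set()
    order: list[list[str]] = []
    while remaining:
        ready = sorted(m for m, d in remaining.items() if d <= done)
        if not ready:
            # Nothing can be placed: modules include each other in a
            # cycle, or depend on something missing from the map.
            raise ValueError(f"cannot layer: {sorted(remaining)}")
        order.append(ready)
        done.update(ready)
        for m in ready:
            del remaining[m]
    return order


# Hypothetical dependency map: module -> project modules it includes.
deps = {
    "utils": set(),
    "db": set(),
    "accounts": {"db", "utils"},
    "report": {"accounts", "utils"},
    "main": {"report", "accounts"},
}

for depth, names in enumerate(layers(deps)):
    print(depth, names)
# 0 ['db', 'utils']
# 1 ['accounts']
# 2 ['report']
# 3 ['main']
```

Each printed layer is one pass of the process: document layer 0, then layer 1 as consumers of layer 0, and so on until you reach main.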

...and THAT is how you learn an undocumented code base - you document it.
