How to document code

Documentation is secondary to good design.

Well-designed code needs less documentation; it already expresses important ideas using the features of the programming language. For example, here are some important aspects of code design:

Names: Clear and descriptive names illuminate the function and operation of the code.
Access control: Properly defined access levels indicate what can be known and modified by users, and implicitly, what is safe to modify or intended to be modified.
Single path to success: Providing only one correct way to accomplish a task ensures consistent and correct implementation by engineers.
Consistency: Maintaining consistency throughout the system simplifies its use and understanding. For instance, if you extend functionality through functional programming in one part of the system, avoid using inheritance in another.
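
As a rough illustration of the first two points, here is a minimal sketch in Python. The `Inventory` class and its members are invented for this example. Python has no enforced access levels, so the leading underscore is only a convention that signals "internal".

```python
class Inventory:
    """Tracks how many of each item a player carries (hypothetical example)."""

    def __init__(self) -> None:
        # Leading underscore: internal state, not meant to be touched by users.
        self._counts = {}

    def add_item(self, name: str, count: int = 1) -> None:
        """Add `count` units of the item called `name`."""
        self._counts[name] = self._counts.get(name, 0) + count

    def count_of(self, name: str) -> int:
        """Return how many units of `name` are held (0 if none)."""
        return self._counts.get(name, 0)
```

The names `add_item` and `count_of` already say most of what a doc comment would, and the underscore tells users which part they are not expected to modify.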

But the design cannot capture everything there is to know about code: a name cannot express all the constraints of a complex method; the architecture does not tell you why it was designed like that. This is where documentation comes in. Here, I will discuss the documentation that is included with the code:

Doc comments
Error messages
Internal comments
Asserts

Who will read your documentation?

Like all writing, documentation is about transmitting ideas from your head to your readers. This requires that you think about them: What do they know? What do they want to know?

Broadly, your readers fall in two categories: users of your code, and maintainers of your code (the latter often includes yourself). Below we break these groups down further into different roles, with different needs. Thinking about these types of readers can help you write better documentation.

Users

Assessors

Assessors are people trying to figure out if they want to use your code (especially if it is a library).
They are interested in:

Library purpose
Feature overview
Performance
Dependencies
Compatibility

On a meta level, they are also interested in the quality of the documentation itself.

Integrators

Integrators are people trying to figure out how to use a system from your code.
They are interested in:

Entry points
Integration steps
Examples
Troubleshooting

Consumers

Consumers are people trying to figure out how a specific thing works or is meant to be used. They need to use it, but not necessarily integrate it – either because it has been done by someone else already, or it is not required. All C# programmers are consumers of the List type, for example.
They are interested in:

What a method does, or a type or other entity represents
Contracts
Examples

Customizers

Customizers are people trying to extend your system to fit their needs.
They are interested in:

Customization hooks

Maintainers

Fixers

Fixers are people trying to fix a bug in the code.
They are interested in:

Details of how algorithms work
Links to authoritative sources of information
Invariants
Justification of choices
Pitfalls to avoid

Extenders

Extenders are people trying to add a new feature to your code.

They are interested in:

Customization hooks
Intended way to extend a system internally
Justification of choices
Alternatives considered (but not implemented)

Architects

Architects are people trying to improve the design of the code.

They are interested in:

Justification of choices
Alternatives considered (but not implemented)

It is a mistake to think of the readers of your documentation simply as “users of your code”.

The two sets of readers correspond to two types of code documentation: doc comments and internal comments. But each of the specific use cases needs different types of documentation. When you write documentation, you need to consider all of them. If you don’t, your documentation will be lacking.

There may well be other reader types; you can develop guidelines specific to them following the same line of thought as we use here.

The roles above are not always different people. As a programmer, you probably take on all of these roles on a regular basis.

How will they read your documentation?

They won’t.
If they do, they will read in a non-linear order: they visit the documentation in direct response to a problem they are having, not because they are drilling down from the top-level content.
They’ll skim, looking for the exact piece they need to solve their problem. Therefore, they may well miss important information.
They’ll read in a state of stress or annoyance. Since they are visiting the documentation in response to a problem, they are even more likely to be “poor” readers.

One can argue that engineers should read documentation carefully, in the intended order, but the truth is that many don’t, and while it may hurt them, it also hurts you. Nobody who rates an asset one star adds a disclaimer that they did not read the docs and that their rating should be taken with a pinch of salt. Players are not more tolerant of bugs caused by programmers not reading the docs. So, engineers not reading documentation is your problem.

Therefore:

Design the code to minimize the need for documentation.
Lead engineers to important top-level documentation from specific documentation.
Make the content easy to skim. Use bullets, tables, and examples; carefully structure the content.
Important information should be easy to find from the perspective of something going wrong. If a user encounters an argument exception, the error should tell them what they did wrong, and not require them to read the library's error handling documentation to find out what the overall rules for arguments are. (If such rules exist, the error message should also point to them.)
Write well.
Proofread carefully, have it proofread, and test it on engineers.
Not everything is for everyone. We said that assessors will read your purpose; consumers typically won’t. Therefore, do not bother with information for consumers in your purpose doc. This will keep it more focused and relevant to the assessor that will* read it.

* might 😉

Pieces of Documentation

Purpose and Feature Overview

Summarize your library’s main features and reason for existence. This description should allow a reader to make an educated guess about whether what they may be looking for will be present.

If there are competing libraries, say what the specific advantage (or tradeoff) of using your library is.

Briefly list the main features of your library, expanding on the summary mentioned above. This could introduce the entry points discussed below.

Include your design philosophy if you believe it will help the user choose or better understand your library.

Document next steps: how should users start learning the library? In smaller libraries, this may simply be pointing them to the main namespaces for them to drill down further.

Entry Points

In top-level documentation, list the key types of a system.

How to create the system.
The core classes the user will interact with.

All types that form part of a system need links (directly or indirectly) to the relevant entry points. Types without constructors should say how instances of them can be obtained.

The entry points are usually where you document aspects of the system the user needs to know, including the other pieces of information required for integration discussed below: integration steps, examples, and troubleshooting.

Integration steps

Document integration steps at the entry point. (If there is more than one, choose a principal one, and link to it from the others.)

If there are typical integration points or patterns to use, list them.

List the steps a user must take to integrate the system. Say:

What types they need to define, if any.
What configuration objects they need to construct.
What central objects they need to construct.
What auxiliary objects they need to construct.

Identify optional steps, and make it clear when the user should take them.

If there is more than one way to integrate the system, list them all. Explain when to use which.

Examples

Examples are very helpful to show how and when to use code.

Examples give you the opportunity to show terminology and best practices related to the feature. For example, if your API is designed to be used fluently, examples will make this clear.

Examples should be concrete and drawn from the domain the library is used in. Where possible, base them on scenarios that could actually occur.

If a feature was designed with specific use cases in mind, use those as examples. This is particularly important if your code abstracts a concept; engineers may not realize their concrete problem can be cast in your abstractions.
Use concrete examples that are easy to visualize. For example, if the domain is game development, then use types such as Player, Enemy, Monster.

In some cases the code may apply broadly, and not to a specific domain. Even then, use concrete examples. I like to use animals for objects; they make for vivid images, and are easy to put in hierarchies to show inheritance relationships through words such as Animal, Mammal, etc. It is also easy to come up with names for the start of the alphabet if you need something that can be sorted.

Avoid abstract entities such as MyClass or Class1, or single-letter names.
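
To make the contrast concrete, here is a small sketch in Python of a doc comment whose example uses animals. The `Animal` type and the `heaviest_first` function are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    name: str
    weight_kg: float


def heaviest_first(animals: list[Animal]) -> list[Animal]:
    """Return the animals ordered from heaviest to lightest.

    Example:
        heaviest_first([Animal("Cat", 4.0), Animal("Bear", 300.0)])
        puts the bear before the cat.
    """
    return sorted(animals, key=lambda animal: animal.weight_kg, reverse=True)
```

A reader can picture a bear and a cat immediately; the same example written with `Class1` and `x` would force them to work out what is being compared.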

If you have lots of examples in your code, you should define standards and apply them consistently.

One possibility is to use the same standards as applied to the code-base itself. This may be especially appropriate for internally used libraries.

It is, however, often desirable to have examples that are more compact than real code, so you may opt for a more permissive standard.

Internal standards may also be obscure; in this case you may choose a standard that is widely used (for example, Microsoft’s standard for .NET code).

Use real code compiled against the library when possible. This way the compiler can help ensure the documentation always has code that compiles and is up to date. Most documentation systems allow you to do this. It is usually tricky to get to work, but well worth the effort.
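
Python has no compile step, but its standard `doctest` module offers a comparable safety net: examples written in a doc comment are executed and their output is compared with what the documentation claims. The sketch below assumes a hypothetical `clamp` function; only the `doctest` calls are real library API.

```python
import doctest


def clamp(value: float, low: float, high: float) -> float:
    """Limit `value` to the range [low, high].

    >>> clamp(1.5, 0.0, 1.0)
    1.0
    >>> clamp(-2.0, 0.0, 1.0)
    0.0
    >>> clamp(0.25, 0.0, 1.0)
    0.25
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs every example in this module's docstrings and reports mismatches.
    print(doctest.testmod())
```

If someone later changes `clamp` so that an example no longer holds, the documentation check fails instead of silently going stale.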

Troubleshooting

It is frustrating to use code that does not work as expected because of a configuration problem. Your design can help avoid incorrect configurations and carefully enforce contracts, but it may be impossible to avoid issues completely. How you report issues to the user is crucial in making your code useful and pleasant to work with.

In some cases, providing helpful error messages requires changing the code design. For example, a parser might encounter various error states that only make sense within the context of how the parser works. Instead of just saying a bracket is missing, it’s more useful to tell the user that the system expects an “if-statement” and the syntax is incorrect. To provide this level of detail, the parser must be structured to have this context available when detecting errors.
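
One possible way to structure this, sketched in Python: the parser keeps a stack of the constructs it is currently inside, and every error message is built from that stack. The toy grammar, `ParseError`, `expect`, and `parse_if_statement` are all assumptions made for this example, not a description of any real parser.

```python
class ParseError(Exception):
    """Raised when the (hypothetical) toy parser rejects its input."""


def expect(tokens: list[str], position: int, wanted: str, context: list[str]) -> int:
    """Check that tokens[position] is `wanted`; return the next position.

    `context` is the stack of constructs currently being parsed. It is what
    lets the message say more than "missing bracket".
    """
    found = tokens[position] if position < len(tokens) else "end of input"
    if found != wanted:
        where = " inside ".join(reversed(context))
        raise ParseError(
            f"While parsing {where}: expected '{wanted}' at token {position}, "
            f"but found '{found}'."
        )
    return position + 1


def parse_if_statement(tokens: list[str], position: int, context: list[str]) -> int:
    """Parse `if ( NAME ) { }` and return the position after it."""
    context.append("if-statement")
    position = expect(tokens, position, "if", context)
    position = expect(tokens, position, "(", context)
    position += 1  # Skip the condition; a real parser would parse an expression.
    position = expect(tokens, position, ")", context)
    position = expect(tokens, position, "{", context)
    position = expect(tokens, position, "}", context)
    context.pop()
    return position
```

Calling `parse_if_statement(["if", "(", "ready", "{", "}"], 0, ["function body"])` raises a `ParseError` that names the if-statement and the enclosing function body, rather than only reporting an unexpected token.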

Help the user deal with problems:

List common problems.
Explain how they can program defensively to flag common issues.
Link to mechanisms a user can use to debug issues with the system.
Explain techniques a user can use to troubleshoot your system.

Exception error messages

Logical errors should give information about the code that could help a user understand what they should change, and how. Application errors should have the information that could help them craft a suitable response, especially information that they may pass on to the user when their involvement is required.

Many errors are of the form: “expected something to have some condition, but instead it had a different condition”, for example, “Expected animal to be a cat, but it was a dog instead.” Messages for this error type should name all three things: what was inspected, what was expected, and what was observed.

In many cases you may well guess what happened: if you expect color values between 0 and 1, and see a value bigger than 1, the engineer probably forgot to normalize a byte. Similarly, if you see a large value when an angle in radians is expected, the engineer probably passed in an angle in degrees. Your message should refer to these common situations to speed things up:

“Angle in ‘angle’ is larger than 2 * Math.PI. Did you pass in an angle in degrees?”
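
A minimal Python sketch of such a check follows. The function name `check_angle_in_radians` and the accepted range of 0 to 2π are assumptions for the example. The message names what was inspected, what was expected, and what was observed, and adds the hint only when the value is too large.

```python
import math

TWO_PI = 2 * math.pi


def check_angle_in_radians(angle: float) -> float:
    """Return `angle` unchanged if it lies in [0, 2π]; otherwise raise ValueError."""
    if 0.0 <= angle <= TWO_PI:
        return angle
    message = (
        f"Expected 'angle' to be in radians between 0 and {TWO_PI:.4f} (2π), "
        f"but it was {angle}."
    )
    if angle > TWO_PI:
        # A large value is most likely degrees; say so to save the reader time.
        message += " Did you pass in an angle in degrees?"
    raise ValueError(message)
```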

Customization hooks

For customizers

There are a few standard ways in which a system can be customized, for example, using inheritance, supplying a configuration object, supplying functions or functors, and so on. List all these mechanisms at the entry point of the system, linking to other parts of the system if necessary.
Provide practical use-cases as examples.
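
As an illustration, the Python sketch below documents three hooks at the entry point of an invented `EnemySpawner` system: a configuration object, a supplied function, and a method meant to be overridden. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpawnerConfig:
    """Configuration object for EnemySpawner (hypothetical)."""
    max_enemies: int = 3


class EnemySpawner:
    """Entry point: produces the enemy names for one wave.

    Customization hooks:
      1. Pass a `SpawnerConfig` to change limits.
      2. Pass `name_enemy` to control how each enemy is named.
      3. Subclass and override `should_spawn` to skip some spawns.
    """

    def __init__(
        self,
        config: Optional[SpawnerConfig] = None,
        name_enemy: Optional[Callable[[int], str]] = None,
    ) -> None:
        self.config = config or SpawnerConfig()
        self.name_enemy = name_enemy or (lambda index: f"Enemy {index}")

    def should_spawn(self, index: int) -> bool:
        """Hook 3: return False to skip the enemy at `index`."""
        return True

    def spawn_wave(self) -> list[str]:
        return [
            self.name_enemy(index)
            for index in range(self.config.max_enemies)
            if self.should_spawn(index)
        ]
```

A customizer who only reads the class doc comment already knows every supported way to change the behavior, and which one fits their case.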

For extenders

Extenders have all the hooks available that customizers have, but because they can modify or add to the code base, they have additional mechanisms available. The additional mechanisms should be documented in an internal comment.

Algorithm details

Some details of the algorithm should be given in the API documentation; the rest should appear as internal documentation. Where to put what is part of the design, but nothing should be omitted.

Specify the name of the algorithm if it is a standard algorithm (for example, MergeSort).
Specify the contract the algorithm adheres to.
Describe specific features or optimizations (for example, using selection sort to sort small lists).
Give a high-level overview of the algorithm and its steps.
Use asserts throughout to test assumptions and invariants.
Document any “cleverness”, especially optimizations that rely on properties that are not immediately obvious.
Give the time and space complexity.
Provide links to algorithm descriptions (usually Wikipedia).
If you adapted source code, link to it, or if it is from a book, reference it.
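
The Python sketch below shows how several of these items might look together in one doc comment. It is an illustrative merge sort written for this purpose. Note that it uses insertion sort, not selection sort, for small lists so that the sort stays stable, and the cutoff value is an arbitrary assumption.

```python
SMALL_LIST_THRESHOLD = 8  # Assumed cutoff; tune by measurement.


def merge_sort(items: list[int]) -> list[int]:
    """Return a new list with `items` in ascending order.

    Algorithm: top-down merge sort
        (https://en.wikipedia.org/wiki/Merge_sort).
    Contract: does not modify `items`; stable (equal items keep their order).
    Optimization: lists shorter than SMALL_LIST_THRESHOLD use insertion sort,
        which has less overhead on small inputs.
    Steps: split in half, sort each half recursively, merge the halves.
    Complexity: O(n log n) time, O(n) extra space.
    """
    if len(items) < SMALL_LIST_THRESHOLD:
        return _insertion_sort(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def _insertion_sort(items: list[int]) -> list[int]:
    """Internal helper: stable insertion sort into a new list."""
    result = []
    for item in items:
        position = len(result)
        # Walk left past strictly larger items only, preserving stability.
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result
```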

Invariants

An invariant is a property or condition that remains true throughout the execution of an algorithm or a portion of code. For example, in selection sort, the invariant at the end of the outer loop with index i is that the elements from 0 to i are in sorted order.

Invariants are often used to prove the correctness of algorithms, but they are also very handy debugging aids.

The best way to add an invariant is through an assert statement. That way, your invariants are automatically checked when you run your code in debug mode. It is usually good to define validation methods for complex invariants, for example: Assert(BeginningOfListIsSorted(list, i)).
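
Here is the selection sort invariant described above as a Python sketch, with the check moved into a named validation function. The function names are chosen for this example.

```python
def beginning_of_list_is_sorted(items: list[int], end: int) -> bool:
    """Validation method: True if items[0..end] (inclusive) is in ascending order."""
    return all(items[k] <= items[k + 1] for k in range(end))


def selection_sort(items: list[int]) -> None:
    """Sort `items` in place using selection sort."""
    for i in range(len(items)):
        # Find the smallest remaining element and move it to position i.
        smallest = min(range(i, len(items)), key=lambda index: items[index])
        items[i], items[smallest] = items[smallest], items[i]
        # Invariant at the end of each outer iteration.
        assert beginning_of_list_is_sorted(items, i)
```

Python removes `assert` statements when run with the `-O` flag, so the check costs nothing in optimized runs.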

Sometimes you need to do extra work to test a series of invariants. The neatest way to do this and have the code stripped out from Release builds is to define a method to check all the invariants, and make the method execute conditionally (in C#, by using the Conditional attribute).
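
A rough Python analogue, for comparison: the built-in constant `__debug__` is false when Python runs with `-O`, so a check method guarded by it does no work in optimized runs. Unlike the C# attribute, the call itself is not removed, only the body is skipped. The `Wallet` class is invented for this sketch.

```python
class Wallet:
    """Hypothetical class with invariants that are checked together."""

    def __init__(self) -> None:
        self._coins = 0
        self._history = []

    def deposit(self, amount: int) -> None:
        self._coins += amount
        self._history.append(amount)
        self._check_invariants()

    def _check_invariants(self) -> None:
        # Skipped entirely under `python -O`, where __debug__ is False.
        if __debug__:
            assert self._coins == sum(self._history), "balance must match history"
            assert self._coins >= 0, "balance must never be negative"
```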

Many coding standards require that you add a message to every assert. I am not sure this is that useful, as long as you use methods for complex assertions so the name can serve the purpose a message would.

Postconditions are usually better tested with unit tests. However, in some cases you can perform a more precise or informative check using data internal to the method. In this case, asserting conditions on the return value is appropriate.

Preconditions are usually checked explicitly so that you can throw appropriate exceptions. However, you may choose to limit these checks to public methods, and assume that they hold for private methods. In this case, it is appropriate to assert conditions on the parameters of a method. You may also choose not to do explicit checks for performance reasons. In this case, I would still add explicit tests but make them conditional on the mode (for example, only run them in debug builds).
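
The split between public and private methods might look like this in Python. Both functions are hypothetical; the public one raises a descriptive exception, while the private helper only asserts.

```python
def average_brightness(channels: list[float]) -> float:
    """Public entry point: validates its input explicitly."""
    if not channels:
        raise ValueError(
            "Expected 'channels' to contain at least one value, but it was empty."
        )
    for value in channels:
        if not 0.0 <= value <= 1.0:
            raise ValueError(
                f"Expected every value in 'channels' to be between 0 and 1, "
                f"but found {value}."
            )
    return _mean(channels)


def _mean(values: list[float]) -> float:
    """Private helper: callers are trusted, so the precondition is only asserted."""
    assert values, "values must not be empty"
    return sum(values) / len(values)
```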

Design Choices

Document the reasons for your design choices. This will help others understand the intent of the code, and prevent them from making changes that could make the code worse in some way. For example, you may have chosen to use an array instead of a List because it allows you to use some other method without conversion. Documenting this will prevent someone from changing your method just to discover this fact later.

Document design alternatives considered. This is helpful to prevent engineers from replicating that work when their intuition leads them to consider changing the system to one of these alternatives. List the alternatives you considered, how you evaluated them, and what criteria you used to make the final choice.

A series of these design considerations exhibits the underlying design philosophy or overarching goals of the library. Although thoughtful designers will document this at the outset, many principles and techniques are revealed or discovered only over time.

Implementation Pitfalls

The first implementation of a method is often incorrect, because you missed some detail or made a wrong assumption. As you fix these problems, document them in the code. Again, this will keep a fixer who shares the wrong assumption from making changes that could break the code. For example: // I thought this list would never be empty, but it can be empty when...

This is particularly useful when you do checks to skip code. Why are the checks necessary? Why is it OK to skip the code?
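
A short Python sketch of such a comment on a skip check. The function and the scenario in the comment are invented for illustration.

```python
def average_damage(hits: list[float]) -> float:
    """Return the mean damage of `hits`, or 0.0 if there were none."""
    # Pitfall: I assumed `hits` would never be empty, but it is empty when
    # every attack in the round missed (invented scenario).
    # Skipping the division is OK because "no hits" means no damage was dealt,
    # and callers only add this value to a running total.
    if not hits:
        return 0.0
    return sum(hits) / len(hits)
```

The comment answers both questions: why the check is there, and why returning early is safe.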

Entities

When documenting entities (types and members), it’s crucial to address the needs of various readers. Documentation of entities commonly focuses on consumers but often neglects aspects important to customizers and extenders:

For Consumers: Documentation typically covers what an entity means and how to use it. For example, when documenting an exception, describe what it signifies and how consumers can handle it.
For Extenders: Extenders need to know whether they should throw a specific exception or another one when implementing related features. Highlight the differences between similar exceptions to guide their decision, a point often overlooked.
For Customizers: Customizers look for whether an exception is suitable to extend from. Emphasize differences between exceptions to aid their choice, which is frequently neglected.

As mentioned before, types that cannot be instantiated directly (abstract types, interfaces, types with private constructors) should specify how instances should be obtained. Clarify whether engineers are expected to create their own types or use provided ones. If both scenarios apply, describe what is typical or outline both explicitly.
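
For instance, a type that users should not construct themselves could say so in its doc comment and point to the method that provides instances. The `SaveSlot` and `SaveManager` names below are hypothetical, and Python cannot enforce a private constructor, so the rule is stated in the documentation only.

```python
class SaveSlot:
    """A handle to one saved game.

    Do not construct directly. Obtain instances from `SaveManager.open_slot`.
    """

    def __init__(self, index: int) -> None:
        self.index = index


class SaveManager:
    """Entry point of the (hypothetical) save system."""

    def open_slot(self, index: int) -> SaveSlot:
        """Return the `SaveSlot` for `index`; the supported way to obtain one."""
        return SaveSlot(index)
```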

Consider the user’s journey. Engineers often read about types because they encounter them as return or parameter types. For return types, explain what can be done with them. For parameter types, explain how to obtain instances.

Conclusion

Good design reduces the need for extensive documentation, but it cannot replace it completely. Design and architecture lay the foundation, but documentation fills in the gaps by giving context, constraints, and use cases.

By considering the diverse needs of different readers, you can greatly enhance the usability, maintainability, and overall quality of your code.