Code and Cognition

Some years ago, I had the idea that understanding code must depend on how our brain works. I had heard of George Miller’s “magical number seven” and tried to apply it. A friend mentioned having heard of the number 4, not 7. At first, I was only interested in which number was correct, but the research ended up lasting many years and led to my own model of understanding. What was the outcome?

First of all: we often talk of “the brain,” but strictly speaking this is only the name for the organ in our head. Biologists talk of the brain; psychologists talk of memory, which covers all processes involved in recognizing, storing, transforming, and expressing information. Psychology divides memory into parts with different responsibilities; we will focus on long-term memory and working memory. The limitations of these two directly influence our understanding.

Fortunately, knowing how memory works does not sort people into “understanding” and “not understanding”; instead it gives us valuable insight into how to present information so that many people can understand it.

Long Term Memory

The responsibility of long-term memory is storing information. The units that are stored are called chunks. Every concept we know (i.e., have a mental image of) is represented by a chunk. Since we may have no mental image of a real-world concept, a weak one, or a strong one, chunks can be missing, quite superficial, or detailed. Examples of chunks:

A pixel is almost always recognized as a chunk. But blind people do not have a mental image of a dot, or at least have a different one.
A character A consists of pixels. All people who can read it will store it in a similar chunk. But cultures with non-Latin character sets will have a different chunk, and people who cannot read Latin characters will not have this chunk at all.
The word observer is a chunk for most English-speaking people, but not for speakers of other languages.
The concept of the design pattern observer is a chunk for many software engineers. People in other professions will not associate the same aspects with this word.

So long-term memory can help us with understanding only if every piece of information we read is associated with the correct chunk, and only if this chunk is sufficiently detailed.

Working Memory

The responsibility of working memory is combining and transforming information. According to the work of Nelson Cowan, working memory can combine about 4 chunks at a time. But why should we combine chunks at all?

If storing and retrieving information were all we needed, long-term memory would probably be sufficient. But this would mean that our cognition would always be limited to concepts we already know. Working memory expands our knowledge to scenarios that are new to us or simply too special to store.

Most of us know the result of 1+1, and most will not apply the chunk addition to 1 and 1 but will activate the chunk 1+1 directly. But we all know that addition can be applied to infinitely many combinations of numbers. Long-term memory is quite large, but not infinite, so adding arbitrary numbers is a common task for working memory.

Now consider code. We know that there are infinitely many Turing machines, so it is not possible to store every possible line of code in long-term memory either. Instead, we process a line of code in our working memory. The limit of 4 chunks means that we cannot easily understand a line of code that contains more than 4 chunks.

At this point, it gets a bit tricky. If we cannot assume a common understanding (the same chunks), we cannot really evaluate whether a line of code is easy to understand or not. A line of code with 80 words may be perfectly understandable for someone who has a chunk for exactly these 80 words. The typical reader will recognize far too many chunks to process in one step and will get confused.

So working memory limits our understanding because we cannot evaluate arbitrarily complex operations in one step. Code must be written so that every logical step uses at most 4 chunks.
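As a small illustration of this rule, the sketch below computes the same value twice: once in a single dense expression, and once in steps that each relate one new name to two known values with one operator. All names and numbers (price, quantity, rebate, shipping, taxFactor) are invented for the example, and how many chunks a reader actually perceives still depends on what that reader already knows.

```javascript
// Hypothetical input values, chosen only for illustration.
const price = 20;
const quantity = 3;
const rebate = 10;
const shipping = 5;
const taxFactor = 1.2;

// Dense: five values and four operators must be combined in one step.
const totalDense = (price * quantity - rebate + shipping) * taxFactor;

// Stepwise: each line combines a result name, two operands, and one operator.
const goods = price * quantity;
const reduced = goods - rebate;
const shipped = reduced + shipping;
const totalStepwise = shipped * taxFactor;
```

Both variants perform the same operations in the same order, so they produce the same value; the difference lies only in how much the reader has to hold in mind per line.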

Summary

We now know that:

working memory limits us to processing at most 4 chunks of information at a time
long-term memory limits our chunks to the concepts we have learned in the past

Consequently, understandable code must:

combine at most 4 chunks of information in one expression/statement/structure
use only chunks that are available to the intended readers

If we do so, we can derive most of the clean code rules:

the rule of using only a few attributes in classes (working memory can only combine 4 in one abstraction)
the rule of using at most 2 arguments in methods (working memory can combine 4 chunks: this, the method, and the two arguments)
the rule of using good names (because bad names are associated with the wrong chunks)
the rule of self-explaining method names (if we cannot find a self-explaining name for a method, it certainly needs more than 4 chunks; in that case, documentation will not be understandable either)
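The argument rule can be sketched with a small example. The Interval class and the function overlapsPositional are hypothetical and exist only for this illustration; the point is that grouping four loose numbers into two named objects reduces the call site to a receiver, a method, and one argument.

```javascript
// Four positional arguments: the reader must remember which number is which.
function overlapsPositional(startA, endA, startB, endB) {
  return startA < endB && startB < endA;
}

// Hypothetical value class: the constructor takes two arguments,
// and the method takes one besides the receiver.
class Interval {
  constructor(start, end) {
    this.start = start;
    this.end = end;
  }

  // True if the two half-open intervals [start, end) share any point.
  overlaps(other) {
    return this.start < other.end && other.start < this.end;
  }
}

const meeting = new Interval(9, 11);
const lunch = new Interval(10, 12);
const dinner = new Interval(18, 20);

const positionalAnswer = overlapsPositional(9, 11, 10, 12);
const meetingHitsLunch = meeting.overlaps(lunch);
const meetingHitsDinner = meeting.overlaps(dinner);
```

The call meeting.overlaps(lunch) also relies on good names: it only works if Interval and overlaps are chunks the intended readers already have.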

Applied

The following examples all describe the same program, but they vary in style and level of detail:

var result = doIt(3,2,4,2,2);

This is an example of chunks that do not contain sufficient information. The variable name result offers no clue about the operation being performed, and the function name doIt is equally uninformative. The sequence of numbers gives no hint about the semantics of the line.

The primary issue here lies in the absence of high-level concepts associated with these elements. The long-term memory cannot form associations due to the abstract and vague nature of the code. (To be fair, the real culprit here is the writer.)

var result = powerSumAndRoot(3, 2, 4, 2, 2);

In this version, the writer has made an attempt to make the code comprehensible. The function name powerSumAndRoot suggests an operation involving powers, sums, and roots (the inverse of powers). However, it still falls short of clarity. The relationship between the words and how the numeric parameters correspond to the operations remain ambiguous.

Here, the problem shifts to the working memory. The reader struggles to combine the 8 chunks (operations and numbers), far exceeding the working memory’s limit of approximately 4 chunks.
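To make the ambiguity concrete, here is one possible definition of such a function. The parameter order (base1, exponent1, base2, exponent2, rootDegree) is purely an assumption; the call itself does not reveal it, which is exactly the problem. Math.pow is the standard JavaScript power function.

```javascript
// Assumed reading of the five numbers (not stated anywhere in the call):
// (base1, exponent1, base2, exponent2, rootDegree)
function powerSumAndRoot(base1, exponent1, base2, exponent2, rootDegree) {
  const firstPower = Math.pow(base1, exponent1);
  const secondPower = Math.pow(base2, exponent2);
  const sum = firstPower + secondPower;
  // The n-th root is the power with exponent 1/n.
  return Math.pow(sum, 1 / rootDegree);
}

// Under this reading: the square root of (3 squared plus 4 squared).
const result = powerSumAndRoot(3, 2, 4, 2, 2);
```

Only the parameter names in the definition resolve the meaning of the numbers; a reader who sees just the call still has to combine all eight chunks.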

var exp = 3;
exp *= 3;
exp += 4 * 4;
exp = pow(exp, 1 / 2);

var result = exp;

This example appears to inline the logic of the earlier function call. While it breaks down the process into smaller steps, it is still not easy to follow. The chunks *, +, and pow are familiar, and each line is straightforward in isolation. However, the imperative nature of the code poses challenges:

Each line assigns a new value to exp.
Each assignment overwrites the previous value of exp.
To understand result, the reader must mentally simulate the entire sequence of operations.

This approach taxes both the working memory (with multiple operations) and the long-term memory (with frequent writes and retractions). As neural networks suggest, learning new information is relatively inexpensive, but unlearning or overwriting old information is much costlier. The constant reassignments add unnecessary cognitive load. While this version is not completely unreadable, it is still quite challenging.

var aa = 3 * 3;
var bb = 4 * 4;
var sum = aa + bb;
var result = sqrt(sum);

This declarative approach improves upon the previous example. The chunks *, + and sqrt (square root) are familiar, and there are only declarations and no reassignments. Each line represents a distinct relation, and together these relations lead to result.

Additionally, the code can be read in reverse (from the result to its sources), allowing readers to stop when they have enough understanding. The last two lines provide a strong contextual hint about the preceding lines, aiding comprehension.

This example effectively balances the load on both working memory and long-term memory. Each line introduces at most 4 chunks, and these chunks are commonly understood.
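If the snippets are read as JavaScript, the no-reassignment property can also be stated in the code itself by declaring the names with const instead of var. The following sketch uses the standard Math.sqrt for sqrt; the flag reassignmentRejected is a hypothetical helper that only demonstrates what happens on an attempted reassignment.

```javascript
const aa = 3 * 3;
const bb = 4 * 4;
const sum = aa + bb;
const result = Math.sqrt(sum);

// Assigning to a const binding throws a TypeError at runtime,
// so the reader can rely on each name keeping its first value.
let reassignmentRejected = false;
try {
  sum = 0;
} catch (error) {
  reassignmentRejected = error instanceof TypeError;
}
```

With const, the reader no longer has to scan the following lines to check whether a value is overwritten later.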

var vec = new Vec(3,4);
var result = vec.norm();

This example is somewhat polarizing. Some readers may find it the most readable, while others might see it as only a slight improvement over the earlier examples. Why?

Developers with a background in linear algebra will recognize the terms Vec (vector) and norm (vector magnitude). For them, this code is highly readable. Others without this background will need to consult documentation, although they will at least recognize the knowledge gap.

In this case, the chunk stored in long-term memory is domain-specific: it is familiar to one group of developers but unfamiliar to another. Apart from that, the working memory is not strained, as each line contains at most four chunks.
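For readers who lack the chunk norm, the following sketch shows what such a class might contain. Vec is not a built-in JavaScript type; the two-dimensional fields x and y and the choice of the Euclidean norm are assumptions made for this illustration.

```javascript
// Hypothetical two-dimensional vector class.
class Vec {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  // Euclidean norm: the length of the vector,
  // i.e. the square root of the sum of the squared components.
  norm() {
    return Math.sqrt(this.x * this.x + this.y * this.y);
  }
}

const vec = new Vec(3, 4);
const result = vec.norm();
```

The body of norm repeats the multiply, add, and square-root steps of the previous example; the class merely packs them into one domain-specific chunk.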

More

For examples and more details, see for example:

talk “Warum ist Code so schwer zu verstehen” at Entwicklertag
- with a brief debunking of Miller’s magic number 7
- and some smaller inaccuracies, especially regarding short-term memory
talk “Readable Code” at Java Forum Stuttgart
- with a brief debunking of Miller’s magic number 7
- and some smaller inaccuracies, especially regarding short-term memory
talk “Warum ist Code so schwer zu verstehen” at XP Days Stuttgart
- with visualized cognitive processes
talk “Code in your Brain” at XP Days Stuttgart
- with the cute pictures of hedgehogs
- with some references to Daniel Kahneman’s behavioral economics
article “Code and Cognition” on entwickler.de
podcast “Warum ist Code so schwer zu verstehen”
