Failing Early for Nicer Error Messages
The Principle
Let us say we have a chain of function calls or, more generally speaking, a chain of code parts, each of which gives some command to the next part of the chain.
Now let us assume that some part of the chain, w.l.o.g. part 1, contains a bug. This bug makes part 1 give a command to part 2 that is impossible to fulfill. This command makes part 2 give another, different but also impossible command to part 3. This command in turn makes part 3 give yet another, different and impossible command to part 4. Part 4 then realizes that it was given an impossible command, prints an error message and aborts.
My argument is that the shorter the chain between giving an impossible command and realizing that the command is impossible, the easier the debugging will be. There are two reasons for this:
- We know that every part of the chain that comes after the abort is not the buggy part. Here, we know that part 5 is not the problem; if part 3 had aborted, we would also have known that part 4 is not the problem.
- Since each part of the chain after the bug, here part 2 and part 3, transforms an impossible command into another impossible command, the command that is finally detected as impossible, here the one given to part 4, will be quite distant from the initial problem, i.e. the command that part 1 gives to part 2.
In other words, assertions are a good thing.
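To make the principle concrete, here is a minimal Python sketch. All names in it (part1, part2, part2_checked, part3, describe_failure) and the swapped-operands bug are made up for illustration. It compares a chain that only notices the impossible command at the very end with one that asserts on it as soon as it arrives.

```python
def part3(values):
    # Late failure: max() raises ValueError when given an empty list.
    return max(values)


def part2(count):
    # A negative count silently becomes an empty list,
    # i.e. a different but still impossible command for part3.
    return part3(list(range(count)))


def part2_checked(count):
    # Early failure: reject the impossible command where it arrives.
    assert count > 0, f"count must be positive, got {count}"
    return part3(list(range(count)))


def part1(next_part):
    count = 2 - 5  # hypothetical bug: the operands are swapped
    return next_part(count)


def describe_failure(next_part):
    """Run the chain and return the error type name and its message."""
    try:
        part1(next_part)
    except (ValueError, AssertionError) as error:
        return type(error).__name__, str(error)
    return None
```

Without the assertion, the chain ends in a ValueError about an empty sequence, which says nothing about a negative count. With the assertion, the message names the bad value directly. Note that Python skips assert statements when run with the -O flag, so this is a debugging aid rather than input validation.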
Case Study 1: Dynamic Typing vs Static Typing
Statically typed languages are an example of early failure, in contrast to dynamically typed languages which are an example of late failure. Consider this C++ code.
class MyType {
public:
    void member() {}
};
void part3(MyType b) { b.member(); }
void part2(MyType a) { part3(a); }
void part1() { part2(123); }
void part0() { part1(); }
int main() { part0(); }
The bug is in part1: it should call part2 with a variable of type MyType, but it calls it with an integer. Because C++ is a statically typed language, the compiler checks at the call to part2 whether an impossible command was given, i.e. an argument of the wrong type, and the error message correctly pinpoints the problem:
main.cpp:7:16: error: no matching function for call to 'part2'
void part1() { part2(123); }
               ^~~~~
main.cpp:6:6: note: candidate function not viable: no known conversion from 'int' to 'MyType' for 1st argument
void part2(MyType a) { part3(a); }
     ^
Just from looking at this error message, we can see that the bug has to be in the body of part1 or the signature of part2. In contrast, consider this Python code:
class MyType:
    def member(self):
        pass
def part3(b):
    b.member()
def part2(a):
    part3(a)
def part1():
    part2(123)
def part0():
    part1()
part0()
It fails with
Traceback (most recent call last):
  line 12, in <module>
    part0()
  line 11, in part0
    part1()
  line 9, in part1
    part2(123)
  line 7, in part2
    part3(a)
  line 5, in part3
    b.member()
AttributeError: 'int' object has no attribute 'member'
Because Python is dynamically typed, the error message only says that the bug is in part0, part1, part2 or part3. This is much less specific than the C++ version, and it means much more work in finding the bug. In a statically typed language, compilation fails as soon as a type is wrong, i.e. it fails early. In a dynamically typed language, the program only fails when you actually try to do something with a value that its type does not support, i.e. it fails late.
That is why statically typed languages are easier to debug and therefore better than dynamically typed languages.
Running away from the angry mob.
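The late failure in the Python example can be moved earlier by hand. The following is only a sketch: the isinstance check in part2 and the helper innermost_frame are additions made up for illustration and are not part of the original code.

```python
import traceback


class MyType:
    def member(self):
        pass


def part3(b):
    b.member()


def part2(a):
    # Hypothetical early check, not part of the original example:
    # reject the impossible command before passing it on to part3.
    if not isinstance(a, MyType):
        raise TypeError(f"part2 expects a MyType, got {type(a).__name__}")
    part3(a)


def part1():
    part2(123)  # the bug: an integer instead of a MyType


def innermost_frame(func):
    """Call func and return (name of the function that raised, message)."""
    try:
        func()
    except Exception as error:
        frames = traceback.extract_tb(error.__traceback__)
        return frames[-1].name, str(error)
    return None
```

With the check in place, the traceback ends in part2 instead of part3, so the bug has to be in the body of part1 or in the check of part2, much like in the C++ error message. Unlike in C++, the check only fires when the faulty call is actually executed.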
Case Study 2: C++ Templates vs Rust Generics
The statically typed language C++ was a positive example in the previous case study, but it will be a negative example in this one. Consider what happens once we turn part2 and part3 from our previous C++ code into templates:
class MyType {
public:
    void member() {}
};
template <typename T> void part3(T b) { b.member(); }
template <typename T> void part2(T a) { part3(a); }
void part1() { part2(123); }
void part0() { part1(); }
int main() { part0(); }
Clang rejects this code with
main.cpp:5:42: error: member reference base type 'int' is not a structure or union
template <typename T> void part3(T b) { b.member(); }
                                        ~^~~~~~~
main.cpp:6:41: note: in instantiation of function template specialization 'part3<int>' requested here
template <typename T> void part2(T a) { part3(a); }
                                        ^
main.cpp:7:16: note: in instantiation of function template specialization 'part2<int>' requested here
void part1() { part2(123); }
               ^
Clang is only able to tell us that the bug has to be in part1, part2 or part3. Let us compare this to the best programming language, Rust:
trait MyPoly {
    fn member(&self);
}
fn part3<T: MyPoly>(b: T) {
    b.member();
}
fn part2<T:MyPoly>(a: T) {
    part3(a);
}
fn part1() {
    part2(123);
}
fn part0() {
    part1();
}
fn main() {
    part0();
}
Compilation fails with
error[E0277]: the trait bound `{integer}: MyPoly` is not satisfied
  --> src/main.rs:11:11
   |
11 |     part2(123);
   |     ----- ^^^ the trait `MyPoly` is not implemented for `{integer}`
   |     |
   |     required by a bound introduced by this call
   |
note: required by a bound in `part2`
  --> src/main.rs:7:12
   |
7  | fn part2<T:MyPoly>(a: T) {
   |            ^^^^^^ required by this bound in `part2`
The Rust compiler sees that the bug has to be in the body of part1 or the signature of part2.
Rust does not even attempt to use a type that does not fulfill the trait bound, i.e. it fails early, while C++ attempts to instantiate the template with a bad type. This is why Rust generics are nicer to debug than C++ templates.
Other Opinions
Postel’s law, commonly stated as "be conservative in what you do, be liberal in what you accept from others", states the opposite of this post. I think Postel’s law is usually wrong.