Null, as implemented in Java, JavaScript, and other similar languages, is… troublesome. But why? It doesn’t seem so bad to hear it described. If a variable is a container, a null value just means that the container is empty. This is a perfectly natural state of affairs, so what’s the big deal?
There are a couple of ways to look at this issue. The one that speaks to me most clearly is to think about the impact of null values on the types used by our programs and what they mean.
Let’s say we have the following class and function:
class Person {
    String firstName;
    String lastName;
}

void sayHello(Person p) {
    print("hello there, " + p.firstName + "!");
}
The types here represent a contract between the programmer who wrote the function and the programmers who will call the function. The contract says that the caller can pass an instance of Person into the sayHello function and the computer will print a friendly, personalized greeting. If the caller passes anything other than a Person, the function isn’t guaranteed to work right. In most statically typed languages it won’t even compile / run.
Since the function will only ever see instances of Person, there’s nothing wrong with immediately referencing the firstName field. After all, a Person has a firstName, that’s part of what it means to be a Person!
But if our language allows null values, the caller could also pass null into this function. This is where the trouble begins: null doesn’t have a firstName field (even though it is a perfectly legal value in any place where a Person is accepted) and the function above will crash if it receives a value with no firstName field!
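The crash is easy to reproduce. The sketch below uses TypeScript rather than the Java-like pseudocode above, and the double cast that smuggles null past the type checker is my own device for imitating a language with no null checking. The names greeting, nobody, and failure are illustrative, and the function returns its greeting instead of printing it so the result is easy to inspect.

```typescript
class Person {
  constructor(public firstName: string, public lastName: string) {}
}

// The contract says "give me a Person"; the body trusts it completely.
function greeting(p: Person): string {
  return "hello there, " + p.firstName + "!";
}

// Assumption: imitate a null-permissive language by casting null to Person.
const nobody = null as unknown as Person;

let failure = "";
try {
  greeting(nobody); // reads firstName on null
} catch (e) {
  failure = e instanceof TypeError ? "TypeError" : "other";
}

console.log(failure); // TypeError
```

The type annotation promised a Person, but nothing stopped a null from arriving, so the failure only shows up at run time.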
Effectively, the function above doesn’t accept a Person, it accepts a Person | null (which can be read “person or null”). This is called a union type (or, in its tagged form, a sum type). So every type in this hypothetical language is actually itself or null.
But we’re not finished yet; it gets worse. In languages that support actual union types, the compiler / interpreter forces the programmer to handle each of the possible types that compose the union. In our hypothetical language, by contrast, the author of the function above was allowed to totally ignore the possibility that the function might receive a null value. A similar function that handles null appropriately is shown below:
void sayHello(Person p) {
    if (p == null) {
        print("hello there!");
    } else {
        print("hello there, " + p.firstName + "!");
    }
}
The problem with this solution is twofold. First, languages like Java don’t require this sort of defensive programming, and this has a tendency to cause bugs. Second, it’s impossible to tell if the null check is actually necessary. What if the caller has already checked for a null value?
// ...
Person person = getPerson();
if (person == null) {
    // ...
} else {
    sayHello(person);
}
// ...
In this case, the value passed to sayHello will never be null, so one of the checks is unnecessary. But which one should be removed? That’s unclear.
To get around this problem, we can define a new type called Person?. We will give the new type the exact same behavior as the current version of Person, that is, we will allow variables of this type to contain null. Then, we will change the meaning of Person to exclude null as a possible value. I’ve summarized the rules below.
Person p0 = null; // ERROR
Person p1 = new Person(); // allowed
Person? p2 = null; // allowed
Person? p3 = new Person(); // allowed
Now, when someone calls the sayHello function, they must provide an instance of the Person class, and they may not provide a null value. This means that we can safely revert to the original version of the function without the null check. It also means that any null check, if one is necessary, will have to be performed by the caller.
If we would prefer to allow null values then we can change the type to Person?. However, since we have now declared that we accept null values, the compiler or interpreter can force us to check for null, just like it would if we declared the type to be Person | null in a language with union types.
void sayHello(Person? p) {
    print(p.firstName); // ERROR - forgot to check for null
}
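For comparison, this is roughly what the checked version looks like in TypeScript, again assuming strictNullChecks is turned on and again returning the greeting rather than printing it. With that option enabled, deleting the null test below makes the compiler reject the access to p.firstName, which is the enforcement described above.

```typescript
// Assumes "strictNullChecks" (or "strict") is enabled in the compiler options.
class Person {
  constructor(public firstName: string, public lastName: string) {}
}

// The parameter type admits null, so both cases must be handled.
function greeting(p: Person | null): string {
  if (p === null) {
    return "hello there!";
  }
  // The check above narrows p to Person, so this access is allowed.
  return "hello there, " + p.firstName + "!";
}

console.log(greeting(null));                         // hello there!
console.log(greeting(new Person("Ada", "Lovelace"))); // hello there, Ada!
```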
So why does this matter? Programming is mostly a human problem. We are almost always the weakest link. This means that anything we can do to make our code easier to write, easier to read, and easier to use is likely to prevent bugs today and help us add new features tomorrow. By clearly specifying the contracts implied by our code, we can produce better software, more easily.