Why Typing Erlang is Hard: Standard Erlang

Table of Contents


I heard before that Erlang and the BEAM did not pack a static type system because their creators “couldn’t build it”.

That is nonsense.

While building a sophisticated, reliable and scalable type checker is absolutely not a walk in the park, building one that works is not that hard.

If it could have been done, it would’ve been done, and it would’ve been hardened by the same process OTP was hardened and turned into what it is today: a reliability gold standard.

So I decided to write an Erlang type-checker to really understand what is hard about typing Erlang. I know Dialyzer exists, and its great, I just wanted to get a good understanding from scratch.

It hit me that there is a split in here, that is sometimes hard to notice and it makes a world of a difference. There is an Erlang language, called Standard Erlang, and there is an Erlang Runtime, the BEAM. Along the same lines as there is a Java language, and then there’s the JVM.

We tend to speak of these 2 things as if they were one, but if we isolate the language from the runtime, we find that they exhibit very different behaviors.

In particular the runtime brings to the table distribution and hot code reloading, which makes typing things a distributed-systems problem. Fun, yes, but very impractical.

On following essays I’ll do a more thorough and formal treatise of the operational semantics of the language and how to go about implementing your own type-checker for it, the distributed and dynamic nature of the runtime, and how they interact with each other.

Here I’ll go over what I’ve learned takes to type the Standard Erlang in a more superficial way, and what are some thorny parts that we will encounter.

First, lets scope what actually is Standard Erlang.

Standard Erlang

Erlang is a dynamically, strongly typed language. It includes just a little extra over your typical functional programming language. Overall, Erlang isn’t that big a language.

My eye-ball estimation is that it is about a 3rd of the size of OCaml, and about a 5th of the size of Java.

Here’s some Erlang:

% an erlang module

-export([hi/1]).  % <-- this a compiler annotation

hi(noone) -> sad;
hi(Name) -> <<"Hello, ", (Name)/binary, "!">>.

An Erlang program consists of a number of modules, each module is just a file with a bunch of functions in it and a few annotations used to tell the compiler what’s exported, who the module author is, what a function type is supposed to be, etc.

There’s quite a few things you can write in Erlang that are relatively common:

  • literals like booleans, numbers or strings
  • lists, maps, tuples, lambdas
  • ifs, try/catch, case
  • function calls and references

And then there’s a few things that aren’t common at all:

  • atoms, which are just constants
  • receive expressions
  • all sorts of pattern matching
  • everything is done through recursion

But in general, the language expressivity comes from it being rather simple. Most problems are solved by making functions that call each other, most functions pattern match on exactly the data they need, loops are achieved through recursion, and we’re done.

Exceptions are very common, since failing to pattern match will result in an exception. For this the language supports the catch, and try..catch..after expressions, but it is highly unidiomatic to use them.

The superb uptime through supervision that Erlang is known for is actually baked into the runtime, not the language. The language itself has no notion of processes, links, or sending messages. The only thing it does have is receive expressions for consuming messages.

Armed with this understanding, lets dig into how we could type Erlang.

Typing the Erlang Language

In writing the type-checker, I’ve found a few things that are worth bringing up:

  1. Functions on Untagged Unions
  2. Functions on Records
  3. Functions on Tagged Unions (Variants)
  4. Receive Expressions
  5. Dynamic Dispatch and M:F(A) syntax

There may be other language constructs that are not friendly to static analysis. I’ll try to write and cover more in the future.

1. Functions on Untagged Unions

An Erlang value can have a single type but we expect that type to be known only at runtime. This means that both the input and returning types of a function are actually unknown until we run the function.

So whenever you see a function like hi/1 above, calling it could actually return a value of different types, depending on how the function was written.

For example, calling hi(noone) returns the atom sad. Instead, calling hi(<<"Joe">>) returns the string <<"Hello, Joe!">>. The atom sad and the string have different types. Calling hi(0) will result in an exception.

If we were to write down the type of this function we’d have to say something like:

hi(atom("noone") | string) -> atom("sad") | string

This is present in every other dynamic language: Lisp, Lua, Javascript, Python, Ruby, Smalltalk, you name it.

Writing a type-checker that deals with this is certainly possible, as it has been done before in tools like Flow and TypeScript, but it certainly isn’t obvious how to.

To better understand how this can be dealt with, we need to talk about Union Types.

Union types allow us to express that a value may belong to one of many different types at the same time. E.g, 1 is a number, but it is also a (number | bool). In the same way, true is a bool, but it is also a (number | bool).

We read (number | bool) as “number or bool”.

The key here is that the value itself doesn’t have explicit information about which type from the union it actually belongs to, and we’re bound to find this out through refinements.

We can do these refinements in 2 ways through the use of guards and pattern matching.

Through the use of guards we can refine the actual type of a value into a single type. For example:

% the infamous left_pad function
left_pad(A, B) when is_binary(A) -> <<(A)/binary, (B)/binary>>;
left_pad(A, B) when is_number(A) -> left_pad(binary:copy(<<" ">>, A), B).

This function is has the following signature:

left_pad( string | number, string ) -> string

But it is thanks to the guards is_binary/1 and is_number/1 that we refine the type of the inputs to be just string or just number.

This makes it possible to say that binary:copy(<<" ">>, A) is safe, because at that point we know that is_number(A) was true, and so A’s type has been refined from string | number to just number. Otherwise the guard would have been false.

However, we can’t use any function as a guard. We must use only built-in functions that already come with the Erlang runtime.

This makes it impossible to write arbitrary refinement logic, which makes the approach rather limited.

Through pattern matching we can do something very similar, and establish our expectations of the values. For example:

% a good old list length counter
count(A=[_|_]) -> count(A, 0).

count([], C) -> C;
count([_|T], C) -> count(T, C+1).

In this case, we are guaranteed that if our function pattern matches, then A will be a list.

We don’t really know what the list elements are, but of course we can try and continue pattern matching and using guards until we refine it “enough”.

These two approaches, while composable, have the unfortunate property that they don’t generalize to all kinds of data. They should work well tho for: atoms, literals, and anything that there is a guard available for.

Anything where our data is one of type or another, would fall into this category.

Maps, tuples, and records can work with this, but it doesn’t scale very well. You essentially have to add the new map/tuple/record name to the union every time you want it supported. Even if you need a single field out of them.

In the next section we’ll see a different approach for dealing with this data that scales much better.

2. Functions on Record Types

When working with functions that take as inputs many values packed into one, be it with records, maps, or tuples, we’re faced with a different issue: we don’t care what the input type is as long as it has what we need in it.

For example, if we have two possible maps that share some things, both of them could be thought of as their own separate type, and these 2 types would have something in common.

% this has type 'character'
Character = #{ name => "Alexander Hamilton", play => "Hamilton" }.

% this has type 'performer'
Performer = #{ name => "Lin-Manuel Miranda" }.

Now if we make a function to get the name of a Performer:

name(#{ name := Name }) -> Name.

and we give it a type:

name(performer) -> string.

Would we expect this function to work on Characters?

My answer is yes, since Characters also have a name fields.

In fact, I’d say that it will work just fine for any map as long as it has a name key that is a string.

So the type of our function isn’t quite (character | performer). It actually is “Any map as long as it has a name key that is a string”. I’ll use this notation for that: #{ name }

name(#{ name : A }) -> A

Some type-systems support this with Structural Subtyping, some others use Row Polymorphism. The main difference isn’t obvious in this getter function, but becomes obvious in the next one, a setter:

with_name(Name, M=#{ name := _ }) -> M#{ name => Name }.

The structural subtyping approach would say that whatever you put in, it is now known to only have a name key. So all your other keys have disappeared (to the eyes of the type-system) and if you want to recover them you have to do some sort of casting. This is due to a subtyping behavior called subsumption and its the way that TypeScript handles this.

The row polymorphic approach has no information loss, and instead carries around the rest of the fields that M had. This is how OCaml and PureScript handles this. In PureScript this works for record types and in OCaml this is implemented for the object-oriented system.

Either one of this approaches extends to the case where we use all of the fields of the input tuple, record, or map.

3. Functions on Tagged Unions (Variants)

Whenever a function takes in a parameter, or returns a result that looks like this:

{tag, Value}

We are in the presence of a tagged union, or variant.

This is an easier thing to deal with than #1, because now we have more information to know what is the actual type of the value: the tag.

For example, take this function that may or may not succeed:

% our fake work function
work(ok) -> {ok, success};
work(not_ok) -> {error, <<"something went wrong">>}.

We can tell that the signature of this function is going to be:

% the type of our work function
work( atom("ok") | atom("not_ok") ) -> { atom("ok"), atom("success") }
                                     | { atom("error"), string } 

Which now has these constant values (ok, error) on the left side of the tuple, which we can use to further narrow down what the expected types should be.

This is a well established pattern in the Erlang ecosystem, where most functions that can fail return this result tuples with either an ok or an error tag.

To be fair, this still suffers from the problems of #1. We can make a function that returns an ok-tagged value that is an untagged union of types as well.

% a more complex version of our work function
work(ok) -> {ok, go_on};
work(done) -> {ok, <<"great job!">>};
work(not_ok) -> {error, <<"something went wrong">>}.

The signature of this function would be:

% we're omitting the In type because we're focusing on the output types
work(In) -> { atom("ok"), atom("go_on") | string }
          | { atom("error"), string } 

If you get an {ok, Value}, typing Value has the exact same issues we discussed above.

However, this pattern helps us in typing things because we can always tag our data all the way down:

% same as last work function but with tags!
work(ok) -> {ok, go_on};
work(done) -> {ok, {message, <<"great job!">>}};
work(not_ok) -> {error, <<"something went wrong">>}.

Which is a little more verbose, but a lot more amenable to type-checking too, since the type-checker can associate a specific tag with a specific type.

4. Receive Expressions

There is an asymmetry in the language: at the language level we can only receive messages. There is no special constructs for sending messages. To send a message we just use the erlang:send/2 function, or its operator version !.

So in any function, when we encounter a receive expression we are really saying 2 things:

  • Here be side-inputs — this function will receive a value that is not a direct argument, and it can use it as if it was one.

  • If there is a timeout (the after part), then there is also a deadline. Literally, since if this was a process, it’d be killed.

We won’t bother too much with the deadline right now, but we’ll look into these side-inputs we receive as messages. Here’s an example of this expressions:

adder() -> receive {X, Y} -> X + Y end.

We are very interested in the pattern matching that just happened up there, since that pattern matching essentially tells us what messages we can receive.

So much so, that we could think of this receive expression as a function built into the language that we can call with a function of our own.

% transforming a receive, into a function call
adder() -> receive      {X, Y}  -> X + Y end .
adder() -> receive(fun ({X, Y}) -> X + Y end).

And now we can analyze this internal function in the same ways we analyzed points 1, 2, and 3 above.

This reframing of receive expressions was actually key for the type-checker I’m writing to be able to understand what messages a process could receive, if we used the adder/0 to start it.

So that’s how we can work with these receive expressions.

5. Dynamic Dispatch and M:F(A) syntax

Another beast entirely is the dynamic parts of the language that allow us to make decisions at runtime about what code to run, through special syntax.

Erlang supports specifying what module, and what function you want to call in that module, completely dynamically.

% M is our module name
M = hello.
% F is our function name
F = world.
M:F() =:= hello:world().

But how can we tell what type will M:F() have here if we don’t keep track of exactly what the value of M and F are?

We’d have to evaluate M to pick up the actual module name, and evaluate F to know what function to look up, and then look up the actual type of the function.

If M is anything slightly more complex than just a variable binding like in the example above, we may just not be able to know what it is. Imagine the following scenario:

% We will receive the module name as a message
M = receive Name when is_atom(Name) -> Name end.

We’d need to wait on receiving a message to know what module to look up. This just isn’t plausible because we can’t even tell who is sending the message, or whether a message will be sent at all.

Of course Real World Erlang tends to rely on behaviors to ensure that certain types are there, so perhaps annotating what the expected module behavior is would help:

% Same as above, but with a type-annotation
M : runnable behavior = receive Name when is_atom(Name) -> Name end.

This way we’d be able to analyze this and know what type run should have, or complain immediately that run isn’t actually implemented in that behavior and thus can’t be guaranteed to exist.

Nonetheless this isn’t a trivial thing to type-check.

What’s next?

I hope these few things I managed to cover today give you an idea of what are the hard parts of typing the Erlang language. It is not impossible, in fact some parts are rather straightforward to type, but its definitely got some thorny bits that we should address with care.

In the future I’ll write more about typing the language, in particular parts like binary string pattern matching, and the runtime and its properties, such as hot-code reloading, figuring out process types, and how to check for type-safe message passing.

Until then, Let it crash!

References & More Resources

Thanks to Pontus Nagy, Manuel Rubio, Malcolm Matalka, and Calin Capitanu for taking the time to review earlier drafts of this essay.

What's next?

Make sure to follow me on twitter to get them as soon as they're out.