The secrets of Type Inference in Clojurescript

After watching this excellent talk by Mike Fikes about Type Inference in Clojurescript, I couldn’t resist the urge to share with the community what I have learned about this fascinating topic.

The Clojurescript compiler has been able to infer types since version 1.10.516. The benefits for us, the cljs devs, come both at compile time and at run time:

At compile time, our Clojurescript code is type checked automatically
At run time, the generated Javascript code runs faster

In a nutshell, type inference makes our code run faster and with fewer bugs.

It saves time both for us developers (we catch our bugs earlier) and for our users (our code runs faster).


Automatic type checking

Currently, most of the automatic type checking occurs when we call a function that expects a number with a value that is certainly not a number. This check happens at compile time, which means that we can catch some bugs without even running our code.

For instance, when we add two strings, we get a compile-time warning saying that all the arguments to + must be numbers:

(+ "hello " "my dear")

Or if we try to compare “apples” and “oranges”:

(<= "apples" "oranges")
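Why do these warnings matter? At run time, Javascript does not complain about either expression: + concatenates strings and <= compares them lexicographically, so without the warning the bug could go unnoticed. A quick check in plain Javascript, assuming (as a simplification) that the two Clojurescript forms above compile down to the native operators:

```javascript
// Plain Javascript: the native operators accept strings silently.
const sum = "hello " + "my dear";  // string concatenation, not numeric addition
const cmp = "apples" <= "oranges"; // lexicographic comparison of strings

console.log(sum); // hello my dear
console.log(cmp); // true ("a" sorts before "o")
```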

Code optimization around str

In general, when a compiler knows that a value is of a certain type, it can generate optimized code that runs faster; for a traditional compiler, that usually means optimized machine code.

The Clojurescript compiler applies the same idea to the Javascript code it generates. The cool thing about the Clojurescript compiler is that we can easily read the code it generates, and we can even compile Clojurescript code right in the browser.

For instance, let’s take a look at the code generated by the str macro in two different situations:

when the compiler knows that the arguments are all strings
when the compiler doesn’t know that the arguments are strings

When the compiler knows that the arguments are all strings, it uses this information to generate Javascript code that is as straightforward (and fast) as you can imagine. The arguments are joined with the empty string:

(let [first-name "Kelly"
      last-name "Kapowski"]
  (str first-name last-name))

However, when the compiler doesn’t know that the arguments are all strings, the str macro is forced to generate code that calls the str function from the cljs.core namespace:

(defn my-name [first-name last-name]
  (str first-name last-name))

This additional function call has a performance cost. We can avoid it by letting the compiler know that the arguments to our my-name function are strings. This is called type hinting, and it is done by placing ^string metadata before each argument. Now the compiler generates optimized code without the additional function call:

(defn my-name [^string first-name ^string last-name]
  (str first-name last-name))
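To make the difference concrete, here is a hand-written Javascript approximation of the two shapes of code described above. It is not the compiler's literal output: strOne is a hypothetical stand-in for the single-argument cljs.core/str (nil becomes the empty string, anything else is converted to a string), and the function names are illustrative.

```javascript
// Hypothetical stand-in for cljs.core/str applied to one value:
// null/undefined (Clojurescript nil) become "", anything else goes through String().
function strOne(x) {
  return x == null ? "" : String(x);
}

// Shape of the code when the argument types are unknown:
// every argument goes through an extra function call.
function myNameGeneric(firstName, lastName) {
  return [strOne(firstName), strOne(lastName)].join("");
}

// Shape of the code when both arguments are known to be strings:
// the arguments are joined directly, with no extra calls.
function myNameHinted(firstName, lastName) {
  return [firstName, lastName].join("");
}

console.log(myNameGeneric("Kelly", "Kapowski")); // KellyKapowski
console.log(myNameHinted("Kelly", "Kapowski"));  // KellyKapowski
console.log(myNameGeneric(null, 42));            // 42
```

Both versions agree on strings; the generic one simply pays for one extra call per argument on every invocation, which is the cost the ^string hints let the compiler skip.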

This type hinting is valuable, but it comes at a price: it requires extra effort from the developer and it makes the code more verbose.

In some situations, the compiler is able to infer the type of a value without any type hint. For example, the value returned by the str macro is always a string. As a consequence, if for some reason we call str on the value returned by my-name, the compiler removes this superfluous call:

(defn my-name [^string first-name ^string last-name]
  (str first-name last-name))

(str (my-name "Kelly" "Kapowski"))

The type inference mechanism is smart enough to handle less trivial situations. For instance, when we call str on an if expression where both branches are strings, this superfluous str call is also removed from the generated Javascript code:

(defn kelly-or-jessie [x]
  (str (if x 
         "Kelly"
         "Jessie")))
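Why is dropping the outer str safe? Because str applied to a value that is already a string returns that same string, so the call cannot change the result. A small Javascript check, again using a hypothetical strOne helper as a stand-in for the single-argument cljs.core/str (a sketch, not the generated code):

```javascript
// Hypothetical stand-in for cljs.core/str applied to one value.
function strOne(x) {
  return x == null ? "" : String(x);
}

// Both branches are string literals, so the result is always a string.
// (Only booleans are passed below, so Javascript's native test is safe here.)
function kellyOrJessie(x) {
  return x ? "Kelly" : "Jessie";
}

// Wrapping that result in strOne cannot change it: the call is pure overhead.
for (const x of [true, false]) {
  const s = kellyOrJessie(x);
  console.log(s, strOne(s) === s); // "Kelly true", then "Jessie true"
}
```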

Code optimization around truth

Another area where type inference enables code optimization is checking whether a value is truthy. To work around Javascript's peculiar notion of truth, Clojurescript wraps the test of an if expression in a call to the truth_ function.

This additional truth_ call is what makes 0 truthy in Clojurescript:

(if 0 "ok" "bad")

In Javascript, by contrast, 0 is falsy:

var a = 0 ? "ok" : "bad";
a // "bad"

(This truth lesson is explained in greater detail here.)
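The rule itself is short: in Clojurescript only nil and false are falsy, and everything else (including 0, "" and NaN) is truthy. A minimal Javascript sketch of that rule follows; truthCljs is a hypothetical name, and the real helper in cljs.core may be emitted differently in the generated code.

```javascript
// Clojurescript truthiness: only nil (null/undefined) and false are falsy.
function truthCljs(x) {
  return x != null && x !== false;
}

// Values on which Javascript and Clojurescript disagree.
console.log(Boolean(0), truthCljs(0));     // false true
console.log(Boolean(""), truthCljs(""));   // false true
console.log(Boolean(NaN), truthCljs(NaN)); // false true

// Values that are falsy in both languages.
console.log(Boolean(null), truthCljs(null));   // false false
console.log(Boolean(false), truthCljs(false)); // false false
```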

On the one hand, the truth_ call makes our code more reliable; on the other hand, it adds a performance cost. When the compiler knows for sure that the test of an if expression is a boolean, this cost can be avoided.

Take a look at this function, where we type hint the argument as a boolean so that the compiler can test it directly, without calling truth_:

(defn to-be-or-not-to-be [^boolean x]
  (if x 
    "To be"
    "Not to be"))

Sometimes, the compiler can infer on its own that a value is a boolean. For instance, the value returned by not is known to be a boolean. As a consequence, when the test of an if expression is a not expression, the truth_ call is omitted:

(defn not-to-be-or-to-be [x]
  (if (not x) 
    "Not to be"
    "To be"))
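Why is it safe to skip truth_ for a boolean? On true and false, the Clojurescript rule and the native Javascript test always agree; they only diverge on values like 0. A hand-written Javascript sketch (hypothetical names, not compiler output) comparing the two shapes of code:

```javascript
// Clojurescript truthiness: only nil (null/undefined) and false are falsy.
function truthCljs(x) {
  return x != null && x !== false;
}

// Generic shape: the test goes through the truth helper.
function toBeGeneric(x) {
  return truthCljs(x) ? "To be" : "Not to be";
}

// Shape allowed by a ^boolean hint: the native Javascript test is used directly.
function toBeHinted(x) {
  return x ? "To be" : "Not to be";
}

// On booleans, both shapes agree.
for (const b of [true, false]) {
  console.log(toBeGeneric(b) === toBeHinted(b)); // true
}

// On a non-boolean such as 0, they do not.
console.log(toBeGeneric(0), "/", toBeHinted(0)); // To be / Not to be
```

The last line is exactly why the optimization is only valid when the compiler is sure the value is a boolean.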

Code optimization around boolean algebra

For reasons similar to those we saw in the previous section about truth in Javascript, boolean algebra in Clojurescript produces fairly complicated Javascript code. Take, for instance, the Javascript code generated for this function:

(defn boolean-stuff [x y z]
  (or (and x y) (and y z)))

However, when the compiler knows for sure that all the arguments are booleans, it can rely on the native Javascript boolean operators. For example, if we type hint the arguments as booleans, the generated Javascript code is much more compact and runs faster:

(defn boolean-stuff [^boolean x ^boolean y ^boolean z]
  (or (and x y) (and y z)))
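Part of the complexity comes from the fact that Clojurescript's and and or return one of their operands (not necessarily a boolean) and must decide using Clojurescript truthiness rather than Javascript's. Here is a hedged Javascript sketch of those semantics; cljsAnd and cljsOr are hypothetical helpers, and unlike the real short-circuiting macros they evaluate both operands eagerly.

```javascript
// Clojurescript truthiness: only nil (null/undefined) and false are falsy.
function truthCljs(x) {
  return x != null && x !== false;
}

// Hypothetical two-operand versions of Clojurescript's and/or: they return
// one of their operands and decide with Clojurescript truthiness.
// Unlike the real macros, both operands are evaluated eagerly here.
function cljsAnd(a, b) { return truthCljs(a) ? b : a; }
function cljsOr(a, b) { return truthCljs(a) ? a : b; }

// Generic shape of (or (and x y) (and y z)).
function booleanStuffGeneric(x, y, z) {
  return cljsOr(cljsAnd(x, y), cljsAnd(y, z));
}

// Shape allowed once x, y and z are known to be booleans.
function booleanStuffHinted(x, y, z) {
  return (x && y) || (y && z);
}

// The two shapes agree on all 8 boolean combinations...
const bools = [true, false];
for (const x of bools)
  for (const y of bools)
    for (const z of bools)
      console.log(booleanStuffGeneric(x, y, z) === booleanStuffHinted(x, y, z)); // true

// ...but not when a non-boolean such as 0 sneaks in.
console.log(booleanStuffGeneric(0, true, false)); // true
console.log(booleanStuffHinted(0, true, false));  // false
```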

Limitations

At the time of writing (May 2019), type inference is a fairly new feature in Clojurescript. There is still a long way to go, and the Clojurescript core team is experimenting with many cool ideas. Be sure to watch this excellent talk by Mike Fikes if you want to know more about the history and the future of type inference in Clojurescript.

There are two main limitations related to type inference:

It makes compilation a bit slower
In some edge cases, it leads to incorrect behaviour

At the end of his talk, Mike shares more details about the impact of type inference on the compilation time.

Let’s look at an example of an edge case where type inference makes the generated code buggy.

Imagine you have a function foo that always returns a boolean:

(defn foo [x]
  (not (not x)))

And a function bar that uses foo as a predicate inside an if expression:

(defn bar [x]
  (if (foo x) 1 2))

Now, when we call (bar 0) we get 1 because in Clojurescript 0 is truthy:

(bar 0) ;; => 1

The problem occurs if we later rewrite foo, getting rid of the double not, like this:

(defn foo [x] x)

This redefinition of foo makes the code generated for bar buggy: at the time that code was generated, foo was guaranteed to return a boolean, and the compiler relied on that fact. Now foo returns its argument as is, which may not be a boolean. Unless bar is recompiled, passing it a value whose truthiness differs between Clojurescript and Javascript breaks it (as we saw earlier, 0 is falsy in Javascript):

(bar 0) ;; => 2, instead of 1
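The mechanism can be simulated in plain Javascript. The sketch below is a hand-written model, not compiler output, and all names are hypothetical: barCompiled stands for the code generated for bar while foo was known to return a boolean, so it uses a native Javascript test instead of the truth check, and it keeps calling whatever foo currently is.

```javascript
// Clojurescript truthiness: only nil (null/undefined) and false are falsy.
function truthCljs(x) {
  return x != null && x !== false;
}

// First definition of foo: (not (not x)), which always returns a boolean.
let foo = (x) => !(!truthCljs(x));

// bar was compiled while foo returned a boolean,
// so its test relies on native Javascript truthiness.
function barCompiled(x) {
  return foo(x) ? 1 : 2;
}

console.log(barCompiled(0)); // 1

// foo is redefined as (defn foo [x] x); barCompiled is NOT regenerated.
foo = (x) => x;

console.log(barCompiled(0)); // 2, although Clojurescript semantics say 1
```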

Conclusion

Historically, the main focus of the development of Clojurescript as a language was to make it reliable even though it runs on top of an unreliable language like Javascript. Now that Clojurescript is mature enough, there is room for new initiatives like type inference that are beneficial both in terms of performance and in terms of type checking.

Happy Clojurescript!

Yehonathan Sharvit
