After watching this excellent talk by Mike Fikes about Type Inference in Clojurescript, I couldn’t resist the urge to share with the community what I have learned about this fascinating topic.
Since version 1.10.516, the Clojurescript compiler has been able to infer types. The benefits for us Clojurescript developers come both at compile time and at run time:
- At compile time, some type errors in our Clojurescript code are caught automatically
- At run time, the generated Javascript code runs faster
In a nutshell, type inference makes our code run faster with fewer bugs. It saves time both for us developers (we catch our bugs earlier) and for our users (our code runs faster).
Automatic type checking
Currently, most of the automatic type checking happens when we call a function that expects a number with a value that is definitely not a number. This checking happens at compile time, which means that we can catch some bugs without even running our code.
For instance, when we add two strings, we get a compile-time warning saying that all the arguments to `+` must be numbers:
```clojure
(+ "hello " "my dear")
```
Or if we try to compare “apples” and “oranges”:
```clojure
(<= "apples" "oranges")
```
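Why is a compile-time warning so valuable here? As far as I know, for two arguments Clojurescript compiles `+` and `<=` down to the native Javascript operators, and Javascript happily accepts strings for both. Without the warning, these bugs would slip through silently at run time. The snippet below is plain, hand-written Javascript (not compiler output) showing what those operators do with strings:

```javascript
// Plain Javascript: the native operators do not reject strings.
// Assumption: (+ a b) and (<= a b) compile to these native operators.
const sum = "hello " + "my dear";  // string concatenation, no error
const cmp = "apples" <= "oranges"; // lexicographic comparison, no error

console.log(sum); // → hello my dear
console.log(cmp); // → true
```

Neither line throws, which is exactly why catching these calls in the compiler is helpful.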
Code optimization around str
In general, when a compiler knows that a value is of a certain type, it can generate more optimized (faster) code; for compilers that target native code, this usually means more efficient machine code.
The Clojurescript compiler applies the same kind of optimization to the Javascript code it generates. The cool thing about the Clojurescript compiler is that we can easily read the code it generates, and we can even compile Clojurescript code right in the browser.
For instance, let’s take a look at the code generated by the `str` macro in two different situations:
- when the compiler knows that the arguments are all strings
- when the compiler doesn’t know that the arguments are strings
When the compiler knows that the arguments are all strings, it uses this information to generate Javascript code that is as straightforward (and fast) as you can imagine. The arguments are joined with the empty string:
```clojure
(let [first-name "Kelly"
      last-name "Kapowski"]
  (str first-name last-name))
```
However, when the compiler doesn’t know that the arguments are all strings, the `str` macro is forced to generate code that calls the `str` function from the `cljs.core` namespace:
```clojure
(defn my-name [first-name last-name]
  (str first-name last-name))
```
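To get a feel for the difference, here is a rough, hand-written Javascript approximation of the two shapes of code. It is not actual compiler output: `genericStr`, `myNameUntyped` and `myNameTyped` are hypothetical names, and `genericStr` is a simplified stand-in for the `cljs.core.str` function (the real one handles more cases). When argument types are unknown, every argument goes through a function call; when they are all known strings, as in the `let` example, a plain `join` is enough:

```javascript
// Hypothetical, simplified stand-in for cljs.core.str on a single value:
// nil (null or undefined) becomes "", anything else is converted to a string.
function genericStr(x) {
  return x == null ? "" : String(x);
}

// Shape when the argument types are unknown: one function call per argument.
function myNameUntyped(firstName, lastName) {
  return [genericStr(firstName), genericStr(lastName)].join("");
}

// Shape when both arguments are known strings: a plain join, no extra calls.
function myNameTyped(firstName, lastName) {
  return [firstName, lastName].join("");
}

console.log(myNameTyped("Kelly", "Kapowski"));  // → KellyKapowski
console.log(myNameUntyped(null, "Kapowski"));   // → Kapowski
```

For two strings both versions return the same result; the only difference is the per-argument call, which is the cost that type hinting removes.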
This additional function call has a performance cost. We can avoid it by letting the compiler know that the arguments to our `my-name` function are strings. This is called type hinting, and it is done by adding `^string` metadata before each argument. Now the compiler generates optimized code without the additional function call:
```clojure
(defn my-name [^string first-name ^string last-name]
  (str first-name last-name))
```
Type hinting is valuable, but it comes at a price: it requires extra effort from the developer and makes the code more verbose.
In some situations, the compiler is able to infer the type of a value without any type hinting. For example, the value returned by the `str` macro is definitely a string. As a consequence, if for some reason we call `str` on the value returned by `my-name`, the compiler removes this superfluous call:
```clojure
(defn my-name [^string first-name ^string last-name]
  (str first-name last-name))

(str (my-name "Kelly" "Kapowski"))
```
The type inference mechanism is smart enough to handle less trivial situations. For instance, when we call `str` on an `if` expression where both branches are strings, the superfluous `str` call is also removed from the generated Javascript code:
```clojure
(defn kelly-or-jessie [x]
  (str (if x
         "Kelly"
         "Jessie")))
```
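The rules at play can be modeled with a toy inferrer over a tiny expression tree. This is a hypothetical illustration of the idea, not the compiler’s actual analyzer, and the expression shapes (`op`, `thenBranch`, `elseBranch`, ...) are made up for the example: a string literal is tagged `"string"`, a `str` call always yields `"string"`, and an `if` gets a known tag only when both branches agree. A one-argument `str` call whose argument is already a string can then be dropped:

```javascript
// Toy expression forms (hypothetical, for illustration only):
//   { op: "const", val: "Kelly" }
//   { op: "local", name: "x" }   -> type unknown
//   { op: "if", test, thenBranch, elseBranch }
//   { op: "str", args: [...] }

// Infer a tag ("string" or "any") for an expression.
function inferTag(expr) {
  switch (expr.op) {
    case "const":
      // Only string constants get a precise tag in this toy.
      return typeof expr.val === "string" ? "string" : "any";
    case "str":
      // str always returns a string.
      return "string";
    case "if": {
      // The type of an if is known only when both branches agree.
      const t = inferTag(expr.thenBranch);
      const e = inferTag(expr.elseBranch);
      return t === e ? t : "any";
    }
    default:
      return "any";
  }
}

// Drop a one-argument str call whose argument is already a string.
function simplify(expr) {
  if (expr.op === "str" && expr.args.length === 1 &&
      inferTag(expr.args[0]) === "string") {
    return expr.args[0];
  }
  return expr;
}

// (str (if x "Kelly" "Jessie"))
const ifExpr = {
  op: "if",
  test: { op: "local", name: "x" },
  thenBranch: { op: "const", val: "Kelly" },
  elseBranch: { op: "const", val: "Jessie" },
};
const kellyOrJessie = { op: "str", args: [ifExpr] };

console.log(inferTag(ifExpr));                   // → string
console.log(simplify(kellyOrJessie) === ifExpr); // → true (the str call is dropped)
```

The same `simplify` step covers the earlier `(str (my-name ...))` case, provided the inferrer also knows the return type of `my-name`.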
Code optimization around truth
Another area where type inference enables code optimization is checking whether a value is truthy. In order to fix the weird conception of truth in Javascript, Clojurescript wraps the test of an `if` expression in a call to the `truth_` function.
This additional `truth_` function call is what makes `0` truthy in Clojurescript:
```clojure
(if 0 "ok" "bad") ;; => "ok"
```
In Javascript, on the other hand, `0` is falsy:
```javascript
var a = 0 ? "ok" : "bad";
a // "bad"
```
(This truth lesson is explained in greater detail here.)
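As far as I know, the check performed by `cljs.core.truth_` boils down to comparing against `nil` and `false`. Here is a hypothetical Javascript helper, `cljsTruthy` (my own name, not a library function), that mirrors that rule, next to native Javascript truthiness for comparison:

```javascript
// Hypothetical helper mirroring the rule applied by cljs.core.truth_:
// only nil (null or undefined) and false are falsy in Clojurescript.
function cljsTruthy(x) {
  return x != null && x !== false;
}

// Native Javascript truthiness, for comparison:
// 0, "", NaN, null, undefined and false are all falsy.
function jsTruthy(x) {
  return Boolean(x);
}

console.log(cljsTruthy(0), jsTruthy(0));         // → true false
console.log(cljsTruthy(""), jsTruthy(""));       // → true false
console.log(cljsTruthy(null), jsTruthy(null));   // → false false
console.log(cljsTruthy(false), jsTruthy(false)); // → false false
```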
On the one hand, the `truth_` function call makes our code more reliable; on the other hand, it has a performance cost. When the compiler knows for sure that the test of an `if` is a boolean, this extra cost can be avoided.
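The decision described above can be sketched as a tiny code emitter: wrap the test in `cljs.core.truth_` unless its tag (inferred or hinted) is boolean. `emitIf` and the tag strings are hypothetical, and the emitted strings only approximate real compiler output:

```javascript
// Hypothetical emitter for an if expression. It receives the already-compiled
// Javascript for the test and both branches, plus the test's inferred or hinted tag.
function emitIf(testJs, testTag, thenJs, elseJs) {
  // Only a value known to be a boolean can be used directly as a JS condition;
  // anything else goes through truth_ so that 0, "" and NaN stay truthy.
  const cond = testTag === "boolean" ? testJs : `cljs.core.truth_(${testJs})`;
  return `(${cond} ? ${thenJs} : ${elseJs})`;
}

// (if x "To be" "Not to be") with x of unknown type: truth_ is kept.
console.log(emitIf("x", "any", '"To be"', '"Not to be"'));
// → (cljs.core.truth_(x) ? "To be" : "Not to be")

// Same expression with ^boolean x: the hint lets the emitter drop truth_.
console.log(emitIf("x", "boolean", '"To be"', '"Not to be"'));
// → (x ? "To be" : "Not to be")

// (if (not x) ...): not is known to return a boolean, so truth_ is dropped.
console.log(emitIf("cljs.core.not(x)", "boolean", '"Not To be"', '"to be"'));
// → (cljs.core.not(x) ? "Not To be" : "to be")
```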
Take a look at this function, where we type hint the argument as a boolean:
```clojure
(defn to-be-or-not-to-be [^boolean x]
  (if x
    "To be"
    "Not to be"))
```
Sometimes, the compiler can infer on its own that a value is a boolean. For instance, the value returned by `not` is known to be a boolean. As a consequence, when the test of an `if` is a `not` expression, the `truth_` function call is omitted:
```clojure
(defn not-to-be-or-to-be [x]
  (if (not x)
    "Not To be"
    "to be"))
```
Code optimization around boolean algebra
For reasons similar to those we saw in the previous section about truth in Javascript, boolean algebra in Clojurescript produces fairly complicated Javascript code. Take a look, for instance, at the Javascript code generated for this boolean stuff:
```clojure
(defn boolean-stuff [x y z]
  (or (and x y) (and y z)))
```
However, when the compiler knows for sure that all the arguments are booleans, it can rely on the native Javascript boolean operators. For example, if we type hint the arguments as booleans, the generated Javascript code is much more compact and runs faster:
```clojure
(defn boolean-stuff [^boolean x ^boolean y ^boolean z]
  (or (and x y) (and y z)))
```
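Why can’t the compiler always use `&&` and `||`? Clojurescript’s `and` and `or` return one of their operands and follow the Clojurescript truth rule, so the native operators give different answers as soon as a value like `0` is involved. The helpers below (`cljsAnd`, `cljsOr` and the two `booleanStuff` variants are hypothetical names) are an eager, simplified emulation of those semantics, written only to compare results; they are not the code the compiler emits:

```javascript
// Hypothetical helper: Clojurescript truthiness (only null/undefined and false are falsy).
function cljsTruthy(x) {
  return x != null && x !== false;
}

// Eager emulation of (and a b) and (or a b): each returns one of its operands.
// The real macros short-circuit, which makes no difference for plain values.
function cljsAnd(a, b) {
  return cljsTruthy(a) ? b : a;
}

function cljsOr(a, b) {
  return cljsTruthy(a) ? a : b;
}

// (or (and x y) (and y z)) with Clojurescript semantics.
function booleanStuffCljs(x, y, z) {
  return cljsOr(cljsAnd(x, y), cljsAnd(y, z));
}

// The same expression with native operators: only equivalent for real booleans.
function booleanStuffNative(x, y, z) {
  return (x && y) || (y && z);
}

console.log(booleanStuffCljs(true, false, true), booleanStuffNative(true, false, true)); // → false false
console.log(booleanStuffCljs(0, 5, "zz"), booleanStuffNative(0, 5, "zz"));            // → 5 zz
```

With booleans the two versions always agree, which is what makes the compact native code safe once every argument is known to be a boolean.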
Limitations
At the time of writing (May 2019), type inference is a fairly new feature in Clojurescript. There is still a long way to go, and the Clojurescript core team is experimenting with many cool ideas. Be sure to watch this excellent talk by Mike Fikes if you want to know more about the history and the future of type inference in Clojurescript.
There are two main limitations related to type inference:
- It makes compilation a bit slower
- There are some edge cases where it causes a misbehaviour
At the end of his talk, Mike shares more details about the impact of type inference on compilation time.
Here is an example of an edge case where type inference causes the generated code to be buggy.
Imagine you have a function `foo` that definitely returns a boolean:
```clojure
(defn foo [x]
  (not (not x)))
```
And a function `bar` that uses `foo` as the test of an `if` expression:
```clojure
(defn bar [x]
  (if (foo x) 1 2))
```
Now, when we call `(bar 0)`, we get `1`, because in Clojurescript `0` is truthy:
```clojure
(bar 0) ;; => 1
```
The problem occurs if we rewrite `foo` by getting rid of the double `not`, like this:
```clojure
(defn foo [x] x)
```
This redefinition of `foo` (for instance at the REPL, without recompiling `bar`) makes the code generated for `bar` buggy. When that code was generated, `foo` definitely returned a boolean, and the compiler relied on this fact by omitting the `truth_` call. But now `foo` returns its argument as is, which may not be a boolean. If we pass `bar` an argument that Javascript treats differently as a boolean, our code breaks (as we saw earlier, `0` is falsy in Javascript):
```clojure
(bar 0) ;; => 2
```
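The mechanism behind this bug can be reproduced in plain Javascript. The sketch below is a hand-written approximation, not actual compiler output: `ns` is a hypothetical stand-in for the namespace object holding `foo`, `barAsCompiled` mimics the code emitted while `foo` was known to return a boolean (no `truth_` check), and `barSafe` mimics code that keeps the check:

```javascript
// Hypothetical helper: Clojurescript truthiness (only null/undefined and false are falsy).
function cljsTruthy(x) {
  return x != null && x !== false;
}

// Hypothetical stand-in for the namespace holding foo. Callers look foo up
// at call time, so redefining ns.foo changes what existing callers see.
const ns = {
  foo: (x) => cljsTruthy(x), // (not (not x)): always returns a boolean
};

// Approximates code emitted while foo was known to return a boolean:
// the result is used directly as a Javascript condition, with no truth_ check.
function barAsCompiled(x) {
  return ns.foo(x) ? 1 : 2;
}

// Approximates code emitted without that assumption.
function barSafe(x) {
  return cljsTruthy(ns.foo(x)) ? 1 : 2;
}

const before = barAsCompiled(0); // 1
ns.foo = (x) => x;               // (defn foo [x] x), bar is not recompiled
const after = barAsCompiled(0);  // 2, because 0 is falsy in Javascript
const safe = barSafe(0);         // 1, because 0 is truthy in Clojurescript
console.log(before, after, safe); // → 1 2 1
```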
Conclusion
Historically, the main focus of Clojurescript’s development as a language has been to make it reliable even though it runs on top of an unreliable language like Javascript. Now that Clojurescript is mature enough, there is room for new initiatives like type inference, which benefit both performance and type checking.
Happy Clojurescript!