After watching this excellent talk by Mike Fikes about Type Inference in Clojurescript, I couldn’t resist the urge to share with the community what I have learned about this fascinating topic.
The Clojurescript compiler has been able to infer types since version 1.10.516. The benefits for us, the cljs devs, come both at compile time and at run time:
- At compile time, our Clojurescript code is type checked automatically
- At run time, the generated code is optimized and runs faster
In a nutshell, type inference makes our code run faster with fewer bugs.
Type inference saves time both for us, the developers (we catch our bugs earlier), and for our users (our code runs faster).
Automatic type checking
Currently, most of the automatic type checking occurs when we call a function that expects a number with a value that is for sure not a number. This type checking occurs at compile time, which means that we can catch some bugs without the need to even run our code.
For instance, when we add two strings, we get a warning at compile time saying that all the arguments to + must be numbers:
(+ "hello " "my dear")
Or if we try to compare “apples” and “oranges”:
(<= "apples" "oranges")
Code optimization around str
In general, when a compiler knows that a value is of a certain type, it can generate optimized code (that runs faster), usually machine code.
For instance, let’s take a look at the code generated by the str macro in two different situations:
- when the compiler knows that the arguments are all strings
- when the compiler doesn’t know that the arguments are strings
(let [first-name "Kelly" last-name "Kapowski"] (str first-name last-name))
However, when the compiler doesn’t know that the arguments are all strings, the str macro is forced to generate code that calls the str function at run time:
(defn my-name [first-name last-name] (str first-name last-name))
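As a rough illustration of these two situations, here is a hand-written JavaScript sketch of the two shapes the generated code can take. It is an approximation, not actual compiler output: strFn is a hypothetical, simplified stand-in for the run-time str function, and the other function names are made up for the example.

```javascript
// Hypothetical, simplified stand-in for the run-time str function:
// nil-like values become "", everything else is converted to a string.
function strFn(x) {
  return x == null ? "" : String(x);
}

// Shape of the code when every argument is known to be a string:
// the values are joined directly.
function nameWithKnownStrings(firstName, lastName) {
  return [firstName, lastName].join("");
}

// Shape of the code when the types are unknown:
// each argument goes through an extra function call first.
function nameWithUnknownTypes(firstName, lastName) {
  return [strFn(firstName), strFn(lastName)].join("");
}

console.log(nameWithKnownStrings("Kelly", "Kapowski")); // prints KellyKapowski
console.log(nameWithUnknownTypes("Kelly", "Kapowski")); // prints KellyKapowski
```

Both versions return the same string for string inputs; the second one simply pays for one extra call per argument.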
This additional function call has a performance cost. We can avoid it by letting the compiler know that the arguments to our my-name function are strings. This is called type hinting, and it is done by putting ^string metadata before each argument. Now, the compiler generates optimized code without the additional function call:
(defn my-name [^string first-name ^string last-name] (str first-name last-name))
This type hinting is valuable, but it comes at a price: it requires extra effort from the developer and it makes the code more verbose.
In some situations, the compiler is able to infer the type of a value without any type hinting. For example, the value returned by the str macro is for sure a string. As a consequence, if for some reason we call str on the value returned by my-name, this superfluous call is eliminated by the compiler:
(defn my-name [^string first-name ^string last-name] (str first-name last-name))
(str (my-name "Kelly" "Kapowski"))
The type inference mechanism is smart enough to handle less trivial situations. For instance, when we call str on an if expression where both branches are strings, this superfluous call is eliminated as well:
(defn kelly-or-jessie [x] (str (if x "Kelly" "Jessie")))
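To make the idea concrete, here is a toy model of this kind of inference, written in JavaScript. It is only a sketch under stated assumptions: the expression shape ({op, ...}) and the inferType and isRedundantStr functions are invented for this example and do not reflect how the Clojurescript analyzer is actually implemented.

```javascript
// Toy type inference over a tiny, hypothetical expression tree.
function inferType(expr) {
  switch (expr.op) {
    case "const":
      return typeof expr.value; // "string", "number" or "boolean"
    case "str":
      return "string"; // str always produces a string
    case "not":
      return "boolean"; // not always produces a boolean
    case "if": {
      // An if expression has a known type only when both branches agree.
      const thenType = inferType(expr.then);
      const elseType = inferType(expr.else);
      return thenType === elseType ? thenType : "any";
    }
    default:
      return "any"; // e.g. a function argument without a type hint
  }
}

// A call to str is redundant when its single argument is already a string.
function isRedundantStr(expr) {
  return (
    expr.op === "str" &&
    expr.args.length === 1 &&
    inferType(expr.args[0]) === "string"
  );
}

// Model of (str (if x "Kelly" "Jessie"))
const kellyOrJessie = {
  op: "str",
  args: [
    {
      op: "if",
      test: { op: "arg", name: "x" },
      then: { op: "const", value: "Kelly" },
      else: { op: "const", value: "Jessie" },
    },
  ],
};

console.log(isRedundantStr(kellyOrJessie)); // prints true
```

In this toy model, the type of x does not matter: only the two branches of the if expression determine the type of the result.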
Code optimization around truth
By default, the Clojurescript compiler wraps the predicate of an if expression in a truth_ function call. This truth_ function call is what makes it possible to consider 0 as truthy:
(if 0 "ok" "bad")
In JavaScript, on the other hand, 0 is falsy:
var a = 0 ? "ok" : "bad"; a
(This truth lesson is explained in greater detail here.)
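The difference between the two notions of truth can be sketched in plain JavaScript. The truth_ function below is a simplified stand-in written for this example (only nil-like values and false are falsy); it is not copied from the Clojurescript runtime.

```javascript
// Simplified stand-in for a Clojurescript-style truth test:
// everything is truthy except null, undefined and false.
function truth_(x) {
  return x != null && x !== false;
}

// Clojurescript-style test: 0 is truthy.
const cljsStyle = truth_(0) ? "ok" : "bad";

// Native JavaScript test: 0 is falsy.
const jsStyle = 0 ? "ok" : "bad";

console.log(cljsStyle); // prints ok
console.log(jsStyle); // prints bad
```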
On one hand, the truth_ function call makes our code more reliable; on the other hand, it incurs an extra cost in terms of performance. When the compiler knows for sure that the if predicate is a boolean, we can save this extra cost.
Take a look at this function, where we type hint the argument as a boolean:
(defn to-be-or-not-to-be [^boolean x] (if x "To be" "Not to be"))
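Here is a hand-written JavaScript sketch of the two shapes the code for this function can take, without and with the hint. It is an approximation, not actual compiler output: truth_ is a simplified stand-in and the function names are hypothetical.

```javascript
// Simplified stand-in for a Clojurescript-style truth test.
function truth_(x) {
  return x != null && x !== false;
}

// Without the hint: the predicate is wrapped in a truth_ call.
function toBeUnhinted(x) {
  return truth_(x) ? "To be" : "Not to be";
}

// With the ^boolean hint: the predicate is tested directly.
function toBeHinted(x) {
  return x ? "To be" : "Not to be";
}

console.log(toBeUnhinted(true), "/", toBeHinted(true)); // prints To be / To be
console.log(toBeUnhinted(false), "/", toBeHinted(false)); // prints Not to be / Not to be
```

For actual booleans both versions agree; the hinted one simply skips the extra call.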
Sometimes, the compiler can infer on its own that a value is a boolean. For instance, the value returned by not is known to be a boolean. As a consequence, when the if predicate is a not expression, the truth_ function call is omitted:
(defn not-to-be-or-to-be [x] (if (not x) "Not To be" "to be"))
Code optimization around boolean algebra
(defn boolean-stuff [x y z] (or (and x y) (and y z)))
(defn boolean-stuff [^boolean x ^boolean y ^boolean z] (or (and x y) (and y z)))
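One way to picture the difference between these two definitions is the following JavaScript sketch. It assumes that, without hints, and and or have to go through a Clojurescript-style truth test and return one of their operands, while with boolean hints they can map onto the native && and || operators. The code is hand-written for illustration, not actual compiler output; truth_ is a simplified stand-in and the function names are hypothetical.

```javascript
// Simplified stand-in for a Clojurescript-style truth test.
function truth_(x) {
  return x != null && x !== false;
}

// Without hints: every test goes through truth_.
function booleanStuffUnhinted(x, y, z) {
  // (and x y): x if x is falsy, otherwise y
  const andXY = truth_(x) ? y : x;
  // (or a b): a if a is truthy, otherwise b
  if (truth_(andXY)) {
    return andXY;
  }
  // (and y z): y if y is falsy, otherwise z
  return truth_(y) ? z : y;
}

// With ^boolean hints: native operators are enough.
function booleanStuffHinted(x, y, z) {
  return (x && y) || (y && z);
}

// For real booleans, both versions always agree.
const bools = [true, false];
let allAgree = true;
for (const x of bools) {
  for (const y of bools) {
    for (const z of bools) {
      if (booleanStuffUnhinted(x, y, z) !== booleanStuffHinted(x, y, z)) {
        allAgree = false;
      }
    }
  }
}

console.log(allAgree); // prints true
```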
At the time of writing (May 2019), type inference is a pretty new feature in Clojurescript. There is still a long way to go, and the Clojurescript core team is experimenting with many cool ideas. Be sure to watch this excellent talk by Mike Fikes if you want to know more about the history and the future of type inference in Clojurescript.
There are two main limitations related to type inference:
- It makes the compilation a bit slower
- There are some edge cases where it causes a misbehaviour
At the end of his talk, Mike shares more details about the impact of type inference on the compilation time.
Let’s look at an example of an edge case where type inference causes the generated code to be buggy.
Imagine you have a function foo that is guaranteed to return a boolean:
(defn foo [x] (not (not x)))
And a function bar that uses foo as a predicate inside an if expression:
(defn bar [x] (if (foo x) 1 2))
Now, when we call (bar 0), we get 1, because in Clojurescript 0 is truthy.
The problem occurs if we rewrite foo by getting rid of the double not, like this:
(defn foo [x] x)
This redefinition of foo makes the code generated for bar buggy, because at the time that code was generated, foo was guaranteed to return a boolean. The compiler relied on this fact. But now, foo returns its argument as is, which may not be a boolean. If we pass 0 to bar, the stale code tests it the JavaScript way and we get 2 instead of 1 (in JavaScript, 0 is falsy).
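This scenario can be replayed in plain JavaScript. The sketch below is hand-written for illustration and is not actual compiler output: truth_ is a simplified stand-in, and foo and bar are modeled as ordinary JavaScript functions.

```javascript
// Simplified stand-in for a Clojurescript-style truth test.
function truth_(x) {
  return x != null && x !== false;
}

// First version of foo, modeling (not (not x)): always a real boolean.
let foo = (x) => truth_(x);

// bar as generated while foo was known to return a boolean:
// the predicate is tested directly, with no truth_ wrapper.
const bar = (x) => (foo(x) ? 1 : 2);

// foo(0) is true, so bar returns 1.
const before = bar(0);

// foo is redefined to return its argument, but bar is not regenerated.
foo = (x) => x;

// foo(0) is now 0, which is falsy in JavaScript, so bar returns 2.
const after = bar(0);

console.log(before, after); // prints 1 2
```

Had bar been regenerated after the redefinition, its predicate would have been wrapped in a truth test again and the result would still be 1.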