Language choices, do they matter? Also, what's this OCaml?
Nothing is a more significant distraction than choosing a programming language. With that in mind, I was recently made aware of the following exciting fact about OCaml boxed types.
So let's discuss how OCaml represents integers.
The example that triggered me
Using Godbolt, let's start by looking at how OCaml compiles the following:
```ocaml
let square (x : int) : int = x * x
```
The square function compiles into the following x86-64 assembly (with hints on what the instructions are doing):
```asm
camlExample__square_284:
        movq  %rax, %rbx     // copy rax into rbx
        sarq  $1, %rbx       // arithmetic shift rbx right by one bit
        decq  %rax           // rax <- rax - 1
        imulq %rbx, %rax     // rax <- rbx * rax
        incq  %rax           // rax <- rax + 1
        ret
```
This is interesting for many reasons, mainly the `decq` and `incq` instructions, which decrement and increment a register. In plain English, the square function works by:
- Copying `%rax` to `%rbx`
- Shifting `%rbx` right by one bit
- Decrementing `%rax`
- Multiplying `%rbx` and `%rax`, storing the result in `%rax`
- Finally incrementing `%rax` by one and returning it
(Note/exercise: the multiplication is not `x*x`, but rather `x*(x<<1)`. Why is that?)
Why is this? If I do the same in something like C, I get this:
```c
int square(long num) {
    return num * num;
}
```
Which compiles to:
```asm
square:                      # @square
        mov   rax, rdi       // move the argument (rdi) into rax
        imul  eax, eax       // eax <- eax * eax
        ret
```
This is as expected: the compiled code moves the operand into place and does the multiplication.
So, what's up, OCaml?
Why is OCaml doing strange shifting, incrementing, and decrementing when multiplying an integer with itself?
The answer is in the representation of values. OCaml uses the least significant bit of a word as a flag to indicate whether a value is an integer. This is clear from the OCaml source code, and especially anywhere it has to communicate with the outside world.
If the bit is set, OCaml considers the value an integer. If not, it thinks the value is a pointer to a larger object. This is a neat trick used in lots of places. The V8 engine in Chrome famously did this for numbers as well.
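You can observe this reserved bit from OCaml itself. As a quick illustration (assuming a standard 64-bit OCaml; the `Obj` module exposes runtime internals and is only used here to peek at the representation):

```ocaml
(* One bit of every machine word is reserved for the tag, so on a
   64-bit system an OCaml int has 63 usable bits: max_int is 2^62 - 1. *)
let () = Printf.printf "max_int = %d\n" max_int

(* Obj.is_int reports whether a value is an immediate (tagged) integer
   or a pointer to a heap block. *)
let () =
  Printf.printf "42 immediate? %b\n" (Obj.is_int (Obj.repr 42));     (* true *)
  Printf.printf "\"hi\" immediate? %b\n" (Obj.is_int (Obj.repr "hi")) (* false *)
```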
However, it does seem silly to do this when the type system already knows that the value is a simple integer.
Why does this matter?
Well, it probably doesn't, but it's interesting. Because of this uniform value representation, OCaml ends up using five instructions rather than two for a simple integer multiplication, even though the type is statically known.
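To see where those five instructions come from, here is a minimal sketch of the tag arithmetic written in OCaml itself. The helper names (`tag`, `square_tagged`) are mine, not anything the compiler emits; the convention is that an int n is stored as the machine word 2n + 1:

```ocaml
(* An int n is stored as the machine word 2n + 1, i.e. with the tag bit set. *)
let tag n = (n lsl 1) lor 1

(* What the compiled square computes on the tagged word r: copy, shift,
   decrement, multiply, increment -- the same steps as the assembly above. *)
let square_tagged r = ((r asr 1) * (r - 1)) + 1

(* If r = 2x + 1, then r asr 1 = x and r - 1 = 2x, so the result is
   2*x*x + 1: exactly the tagged representation of x * x. *)
let () = assert (square_tagged (tag 7) = tag 49)
```

In other words, the extra shift, decrement, and increment are the cost of adjusting the tag bit around the multiplication.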