Ajour Software

Language choices, do they matter? Also, whats this ocaml?

Nothing is a more significant distraction than choosing a programming language. With that in mind, I was recently made aware of the following exciting fact about OCaml boxed types.

So let's discuss how OCaml represents integers.

The example that triggered me

Using Godbolt, Let's start by looking at how OCamls represents the following:

let square (x : int) : int =  x*x

in Godbolt

The square function compiles into the following assembly on x86 ( with hints on what the instructions are doing.

.
camlExample__square_284:
        movq    %rax, %rbx // move/copy rax to rbi
        sarq    $1, %rbx        // right shift rbx 
        decq    %rax.            // rax <- rax - 1
        imulq   %rbx, %rax  // rax <- rbx * rax 
        incq    %rax.             //  rax <- rax + 1 
        ret
.

This is interesting for many reasons, mainly the decq and incq instructions. These are increment and decrement instructions. That is, in plain English" to say, that the square function is performed by first:

(note/exersize: The mutiplication is not x*x, but rather x*(x<<1) why is that?)

Why is this? If I do the same in something like C, I get this:

int square(long  num) {
    return num * num;
}

in Godbolt

Which compiles to :

.
square:                                 # @square
        mov     rax, rdi   // move rax into place 
        imul    eax, eax  // multiply rax and eax and store in eax 
        ret
.

This is as expected. The compiled code move operant into place and does the multiplication.

So, what's up, OCaml?

Why is Ocaml doing strange shifting, increments, and decrements when trying to multiply an integer with itself?

The answer is in the representation of types. OCaml uses the least significant bit as a flag to indicate if a type is an integer. It's clear from the OCaml source code and especially when it has to communicate with the outside world

If the bit is set, OCaml considers the value an integer. If not, it thinks the value is a pointer to a larger object. This is a neat trick used in lots of places. The V8 engine in Chrome famously did this for numbers as well.

However, it does seem silly to do when the type system knows that the type is a simple integer.

Why does this matter?

Well, it probably doesn't, but it's interesting. Statically typed languages like OCaml will use 5 instructions rather than 2 when doing simple integer arithmetic.

#ocaml #optimization #programming #types