Dmitriy Kubyshkin
As I'm closing in on 1 year of working on Mass, I was quite curious to see if my approach to compile-time execution via JIT makes sense and how it compares with other languages that are capable of compile-time execution, namely C++ and Zig. I do not have access to Jai, so I cannot measure that.

For now I just did two test programs. The first one increments a counter 1 000 000 times at compile time and stores the result in a compile-time constant. The counter is then printed at runtime to verify the output value.
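To give an idea of the shape of the first test, here is a minimal C++ sketch. It is not the code from the repository: the names `run_counter` and `kCounter` are made up for illustration, and the iteration count is reduced to 10000 because compilers cap the amount of work allowed in constant evaluation, so the full 1 000 000 iterations may require raising that limit.

```cpp
#include <cstdint>

// Hypothetical stand-in for the benchmark's counter loop; the name and the
// reduced iteration count are illustrative, not taken from the repository.
constexpr std::int64_t run_counter(std::int64_t iterations) {
    std::int64_t counter = 0;
    for (std::int64_t i = 0; i < iterations; ++i) {
        counter += 1;
    }
    return counter;
}

// Initializing a constexpr variable forces the loop to run in the compiler.
constexpr std::int64_t kCounter = run_counter(10000);

// Fails to compile if the value was not produced at compile time.
static_assert(kCounter == 10000, "counter must be computed at compile time");
```

Printing `kCounter` from `main` at runtime then only has to load a constant, which is what inspecting the generated assembly is meant to confirm.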

The goal of the second test program is to constant-fold 1 000 definitions, each computing a sum of the integer `1`. We then sum all the definitions at compile time as well, so that the compilers cannot skip the computation for unreferenced constants. Because of the large amount of source code (2 MB), this test measures not only the speed of constant folding itself but also parsing.
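Source like this is easiest to produce with a small generator. The sketch below is an assumption about what such a generator could look like for the C++ variant, not the script from the repository; the function name and the emitted identifiers (`value_0`, `total`) are hypothetical.

```cpp
#include <string>

// Emits C++ source with `definitions` constants, each the sum of `terms`
// literal 1s, followed by one constant that references all of them so that
// no definition can be skipped as unreferenced.
// Assumes terms >= 1. All emitted names are hypothetical.
std::string generate_folding_source(int definitions, int terms) {
    std::string out;
    for (int d = 0; d < definitions; ++d) {
        out += "constexpr long long value_" + std::to_string(d) + " = 1";
        for (int t = 1; t < terms; ++t) {
            out += " + 1";
        }
        out += ";\n";
    }
    out += "constexpr long long total = 0";
    for (int d = 0; d < definitions; ++d) {
        out += " + value_" + std::to_string(d);
    }
    out += ";\n";
    return out;
}
```

With this layout each extra term costs 4 bytes of source, so 1 000 definitions of roughly 500 terms each land at about 2 MB. That term count is my estimate from the file size, not a number taken from the test itself.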

To measure just the compile-time execution and not the rest of the compiler, the code is first compiled with the constant hard-coded, then with the compile-time execution, and I take the difference between the two times.

All code is compiled without any optimization to minimize the non-relevant time spent in the compiler.

I have also inspected the generated assembly (x86_64) to verify that the value is indeed computed at compile time. All tests are performed on Windows 10 (WSL2 for Clang). All times are provided in milliseconds.

Here are the results:


Language     | Hardcoded | Compile Eval | Delta (ms) | X Times Slower
------------ | ----------|--------------|------------|----------------
Mass         | 12        | 16           | 4          | baseline
C++ (MSVC)   | 330       | 2270         | 1940       | 485x
C++ (Clang)  | 1065      | 1874         | 809        | 202x
Zig          | 1220      | 11714        | 10494      | 2623x
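
For reference, the last two columns are derived from the first two as follows. This is a small sketch with the numbers copied from the table; the type and function names are mine.

```cpp
// Timings in milliseconds, copied from the table above.
struct Timing {
    int hardcoded_ms;
    int compile_eval_ms;
};

// Time attributed to compile-time execution alone.
constexpr int delta_ms(Timing t) {
    return t.compile_eval_ms - t.hardcoded_ms;
}

// How many times larger the delta is compared to the baseline's delta.
constexpr double times_slower(Timing t, Timing baseline) {
    return static_cast<double>(delta_ms(t)) / delta_ms(baseline);
}

constexpr Timing kMass{12, 16};
constexpr Timing kMsvc{330, 2270};
constexpr Timing kClang{1065, 1874};
constexpr Timing kZig{1220, 11714};

static_assert(delta_ms(kMass) == 4, "Mass delta");
static_assert(delta_ms(kMsvc) == 1940, "MSVC delta");
static_assert(delta_ms(kClang) == 809, "Clang delta");
static_assert(delta_ms(kZig) == 10494, "Zig delta");
```

Dividing by the Mass delta of 4 ms gives 485, 202.25 and 2623.5, which the table shows with the fraction dropped.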

The results are pretty much what you would expect considering that both C++ and Zig do interpretation while Mass does a single-pass JIT. Clang seems to do reasonably well for an interpreter although doing anything computationally expensive would still slow down your compilation time dramatically.

Doing complex things with Zig comptime does not seem feasible at this time.

Constant Folding

Language     | Hardcoded | Constant Folding | Delta (ms) | X Times Slower
------------ | ----------|------------------|------------|----------------
Mass         | 12        | 4362             | 4350       | 11.54x
C++ (MSVC)   | 330       | 1190             | 860        | 2.28x
C++ (Clang)  | 1065      | 1442             | 377        | baseline
Zig          | 1220      | 3818             | 2598       | 6.89x

Clang unsurprisingly is the fastest here as constant folding is its bread and butter. MSVC is slightly behind and Zig is almost 7x slower.

Mass is more than an order of magnitude slower than Clang. After poking a bit under the hood I can see that the majority of the time is actually spent in parsing, as it is currently O(n^2) in complexity. The actual JIT part takes around 500ms. There is definitely a lot of room for improvement.


You can see the full source code and a more detailed description in the repository.
Dmitriy Kubyshkin
I started developing the Mass compiler in April 2020, with the majority of the early work captured in a series of YouTube videos. Due to the limitations of doing the work on video, progress in the first half-year or so was quite slow. Still, by September 2020 the language got sufficiently powerful to run FizzBuzz both in JIT mode and by compiling it to a Windows executable.

Four months later FizzBuzz remains the most complex program in the test suite; however, that does not mean there was no progress. My focus throughout this time has been mainly on two things: robustness and meta-programming capabilities. There is not much to be said about robustness work - it is very important but rather uninteresting. Meta-programming on the other hand is very important and core to this language.

There are two main parts to the meta-programming: macros and compile-time execution. It may seem that one does not need macros when the ability to run arbitrary code at compile time is there. This does not seem to be the case in practice for a couple of reasons. Firstly, macros very often provide a succinct way to express complex transformations that would be awkward to represent in straight code. Secondly, all compiled languages suffer tremendous penalties in project compilation times when doing compile-time execution. Solving this is no easy feat.

The core issue with making compile-time execution fast is the same one that interpreted languages with a JIT, such as LuaJIT or JavaScript, suffer from. If a certain piece of code is only ever executed once, it is fastest to do a straight-up interpretation. On the second call, you probably would have been better off with a crappy JIT version. By the 10th time you have spent a couple of orders of magnitude more time than you should have due to the inherent overhead of the interpretation.
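
The trade-off can be written down as a tiny cost model. This is only a sketch to make the argument concrete: the struct, the function names, and any numbers plugged into it are made up for illustration and are not measurements of any real interpreter or JIT.

```cpp
#include <cstdint>

// Costs in arbitrary time units. Any values plugged in are hypothetical.
struct CostModel {
    std::int64_t interpret_per_call;  // cost of interpreting the code once
    std::int64_t jit_compile_once;    // one-time cost of JIT compiling it
    std::int64_t native_per_call;     // cost of one call to the JIT output
};

constexpr std::int64_t interpreter_total(CostModel m, std::int64_t calls) {
    return m.interpret_per_call * calls;
}

constexpr std::int64_t jit_total(CostModel m, std::int64_t calls) {
    return m.jit_compile_once + m.native_per_call * calls;
}

// Smallest number of calls at which the JIT is no slower than interpreting.
// Assumes interpret_per_call > native_per_call.
constexpr std::int64_t break_even_calls(CostModel m) {
    const std::int64_t saved_per_call = m.interpret_per_call - m.native_per_call;
    return (m.jit_compile_once + saved_per_call - 1) / saved_per_call;  // ceiling division
}
```

For example, with an invented model of `{100, 150, 1}` the interpreter wins for a single call (100 vs 151 units), while from the second call onward the JIT is ahead (200 vs 152 units), and the gap keeps growing with every further call.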

Out of mainstream compiled languages, only C++ and Rust provide facilities for compile-time evaluation. From my understanding C++ constexpr stuff is interpreted, not compiled. Rust has compile-time procedural macros, but because Rust compilation is dead slow in general, using them carries a significant penalty, to the degree that some projects stop using the feature in certain cases in favor of offline code generation.

Out of newly developed languages, Zig and Jai both use a form of byte-code interpretation. Although the evaluation approach differs a bit, if used extensively, compile-time execution will destroy your compile times on either of them. The Mass language aims to use compile-time execution to implement most of its features. This leaves no choice but to do JIT right away. As usual, this is way trickier than it sounds. Here are some of the things that need to be solved:

  1. Incremental JIT. Adding new code or data should not require any copying, recompilation, or interaction with the OS. I have a good plan of action here and am already making some progress.
  2. JIT Compilation Speed. Mass has no real AST or IR, so the speed isn't too bad, but there is currently an assembly step. It should be removed at some point in the future.
  3. Compile-time / Runtime Boundary. Nailing the semantics for which values can be shared, and how, is tough. I expect this to be in flux until the very late stages of the project.
  4. Cross compilation. A different processor architecture or even a calling convention requires a separate version of the compiled code. Big-endian vs little-endian is also something to think about.
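
To illustrate the endianness part of the last point: a value computed at compile time on the host has to be written into the output in the byte order of the target, not of the machine running the compiler. The following is a generic sketch of that idea, not how Mass handles it; `Endian` and `encode_u32` are names I made up.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte order of the machine the compiled program will run on.
enum class Endian { Little, Big };

// Serializes a 32-bit value in the target's byte order. Shifts operate on
// the numeric value, so the result does not depend on the host's byte order.
inline std::array<std::uint8_t, 4> encode_u32(std::uint32_t value, Endian target) {
    std::array<std::uint8_t, 4> bytes{};
    for (std::size_t i = 0; i < 4; ++i) {
        // Little-endian stores the lowest byte first, big-endian the highest.
        const std::size_t shift = (target == Endian::Little) ? 8 * i : 8 * (3 - i);
        bytes[i] = static_cast<std::uint8_t>((value >> shift) & 0xFF);
    }
    return bytes;
}
```

The same constant `0x11223344` ends up as the bytes `44 33 22 11` for a little-endian target and `11 22 33 44` for a big-endian one.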

The topics above are what I plan on working on in the coming months. Of course, there are smaller tasks that need to be done.

You can follow the project progress and support my work by subscribing to my YouTube channel, starring the project on GitHub or catching an occasional live stream on Twitch.