Robustness Month

Dmitriy Kubyshkin
Up until about a month ago, most of the development was about figuring out the basic features of the compiler. As you start to combine them, new and unexpected cases need to be solved. Besides common language issues such as signed / unsigned integer handling, Mass has a lot of its own problems to solve. The trickiest one is handling the boundary between compile-time execution and runtime code. So far there is no production language with the same power as what I aim for, so there is no real way to know the correct way to do it, and this is what I spend a lot of time on.

Besides robustness, here are the things that have been added to the compiler in the last month:

  • basic embedded debugger REPL
  • uniform and typed errors in the compiler
  • conditional constant definition via "using" and "if" expression
  • compile-time function definitions
  • static_assert() implemented using meta-programming
  • function default argument type inference


There are some exciting things I plan for the coming months, and I am already looking forward to the next update.
If you define a subset of functions that can only have fully defined operations, with neither global variables nor undefined behaviour, it shouldn't matter whether a function executes at compile time or which machine it runs on. One can still write bit-wise operations on unsigned integers and just avoid reinterpreting pointer casts and such.
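
As a rough illustration of that idea in C++ (not in Mass or any other language discussed here), the sketch below uses `constexpr` as a stand-in for such a restricted subset. The function name `rotate_left` and the helper `check_rotate_at_runtime` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: rotate a 32-bit value left by n bits.
// Only fully defined operations on unsigned integers are used:
// no globals, no pointer reinterpretation, no signed overflow.
constexpr std::uint32_t rotate_left(std::uint32_t value, unsigned n) {
    n &= 31u;                  // keep the shift count in range
    if (n == 0) return value;  // avoid a shift by 32, which is undefined
    return (value << n) | (value >> (32u - n));
}

// Evaluated by the compiler.
static_assert(rotate_left(0x80000001u, 1) == 0x00000003u,
              "compile-time evaluation");

// Evaluated at run time: the volatile read keeps the input from being
// treated as a compile-time constant.
void check_rotate_at_runtime() {
    volatile std::uint32_t input = 0x80000001u;
    assert(rotate_left(input, 1) == 0x00000003u);
}
```

Both checks run the same function body, once inside the compiler and once on the target machine, and they have to agree because nothing in the function depends on outside state.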

The thing to be careful about when adding purity types is that it's an all or nothing scenario, like using or not using const in C++. Making purity types implicit (used automatically when possible) allows calling pure functions even if they aren't explicitly marked as pure. The drawback of implicit purity is that the interface then depends on the hidden implementation, so backward compatibility breaks if a new version of a library is no longer functionally pure.
@Dawoodoz I'm not exactly sure I'm getting what you are trying to say, so please let me know if I am off the mark. The compile-time/runtime boundary issue I am struggling with is exactly about global variables / memory. I like to think I understand pretty well how to make a system with a pure compile-time part, and in a sense that has been studied pretty well.

One of the questions I'm trying to answer with this project is what if you do allow mutable global state and side effects at compile time and not limit the interaction between compile-time and runtime. It is pretty obvious that it would not be 100% robust especially with 3rd party dependencies. The question is whether it can be useful and reliable enough for practical usage.
Global side-effects cannot be used for constant evaluation during compile time because they have neither a fixed execution order nor a known point in time. Reading from global constants such as tables, however, is fine because reading write-protected data is not a side-effect. You can also allocate memory on the heap and stack in a virtual machine, which I did for Steamroller.
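
A small C++ sketch of the difference between reading a constant table and touching mutable global state. The names `kNibbleBitCount`, `g_call_counter`, `bit_count` and `bit_count_tracked` are invented for this illustration, and C++ `constexpr` rules are only an approximation of the restriction described above.

```cpp
#include <cassert>
#include <cstdint>

// Write-protected table: reading it has no side effects.
constexpr std::uint8_t kNibbleBitCount[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                              1, 2, 2, 3, 2, 3, 3, 4};

// Mutable global: what a reader sees depends on when the read happens.
int g_call_counter = 0;

// Pure: the result depends only on the argument and the constant table.
constexpr unsigned bit_count(std::uint8_t byte) {
    return kNibbleBitCount[byte & 0x0Fu] + kNibbleBitCount[byte >> 4];
}

static_assert(bit_count(0xFF) == 8u, "table lookup at compile time");
// constexpr int bad = g_call_counter;  // rejected: not a constant expression

// Run-time only: it has a side effect on the mutable global.
unsigned bit_count_tracked(std::uint8_t byte) {
    ++g_call_counter;
    return bit_count(byte);
}

void check_bit_count_at_runtime() {
    assert(bit_count_tracked(0xA5) == 4u);
    assert(g_call_counter > 0);
}
```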

The "purity" feature exists in the D language and I use something similar in my language.
https://dlang.org/spec/function.html#pure-functions

Purity analysis/labeling allows knowing at compile time that a function will return the same output for the same input, which allows executing it at compile time as well. Explicit purity labels can isolate modules so that you don't need full program analysis every time. A random generator would then have to be passed as a reference and have its state as the input when claiming functional determinism. Purity allows evaluating if and loop conditions without quirky rules about which side-effect is evaluated first, thereby eliminating undefined or hard to understand behavior in places where side-effects are confusing and dangerous.
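
The random generator point could look roughly like this in C++. The sketch passes the state by value and returns the new state instead of using a reference, which is one of several ways to make the state an explicit input. `RandomStep` and `next_random` are hypothetical names; the multiplier and increment are the linear congruential constants published in Numerical Recipes.

```cpp
#include <cassert>
#include <cstdint>

// Result of one generator step: the output plus the state for the next call.
struct RandomStep {
    std::uint32_t state;
    std::uint32_t value;
};

// Pure step function: no hidden global seed, so the same state always
// produces the same result.
constexpr RandomStep next_random(std::uint32_t state) {
    const std::uint32_t next = state * 1664525u + 1013904223u;  // wraps mod 2^32
    return RandomStep{next, next >> 16};
}

// Because the state is an explicit input, the compiler can evaluate a step.
static_assert(next_random(0u).state == 1013904223u, "deterministic step");

void check_random_is_deterministic() {
    const RandomStep first = next_random(12345u);
    const RandomStep again = next_random(12345u);
    assert(first.state == again.state && first.value == again.value);
    // Chaining is explicit: the caller threads the state through.
    const RandomStep second = next_random(first.state);
    assert(second.state == next_random(first.state).state);
}
```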

A deterministic and well defined language should not allow things like:
// Anyone unfamiliar with our time's languages would struggle to port this
if (x++==methodWithSideEffect(moreSideEffects()) && areTheseSideEffectsEvaluatedLater()) { ... }


When one can simply place side-effects before conditions:
x++;
p = methodWithSideEffect(moreSideEffects()); // Inner goes first of course
q = areTheseSideEffectsEvaluatedLater(); // On a separate line without dangerous lazy evaluation
if ((x == p) && q) { ... }


Purity is useful for static analysis because it guarantees that only the supplied parameters (including pointer targets as children) can be affected, so other variables passed by reference from outside will meet the same conditions before and after the call. The most common use case is to find out if run-time bound checks are needed for an array element access, by propagating which range of values each variable may hold after each assignment based on all possible input paths combined.
uint8_t myArray[512];
void foo(uint8_t x) {
	// Know that 0 <= x <= 255 from the type
	int y = x * 2 + 4; // For linear formulas, we only have to evaluate for the extreme ends, x==0 and x==255 in this case
	// Know that 4 <= y <= 514 (no overflow for signed 32-bit integers as we expect as a minimum size)
	int z = pureFunction(y);
	// Only z is affected because we passed y by value to a pure function
	myArray[y] = x; // x is within the range, but not y! 514 might be above 511. Adding runtime checks and warning the developer.
}
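
The range propagation in the comments above can be sketched as a tiny interval calculation. This is only an illustration of the idea for linear formulas, not how any particular compiler does it; `Range`, `linear` and `always_in_bounds` are hypothetical names.

```cpp
#include <cassert>

// Inclusive range of values a variable may hold.
struct Range {
    long lo;
    long hi;
};

// Range of a * x + b for x in r. A linear formula is monotonic, so only
// the two ends of r need to be evaluated.
constexpr Range linear(Range r, long a, long b) {
    return (a >= 0) ? Range{a * r.lo + b, a * r.hi + b}
                    : Range{a * r.hi + b, a * r.lo + b};
}

// True when every value in the range is a valid index into an array of
// `length` elements, so no run-time bound check is needed.
constexpr bool always_in_bounds(Range index, long length) {
    return index.lo >= 0 && index.hi < length;
}

constexpr Range kX{0, 255};             // uint8_t x
constexpr Range kY = linear(kX, 2, 4);  // y = x * 2 + 4

static_assert(kY.lo == 4 && kY.hi == 514, "range of y");
static_assert(always_in_bounds(kX, 512), "myArray[x] needs no check");
static_assert(!always_in_bounds(kY, 512), "myArray[y] needs a check");

void check_ranges_at_runtime() {
    const Range y = linear(Range{0, 255}, 2, 4);
    assert(y.lo == 4 && y.hi == 514);
    assert(!always_in_bounds(y, 512));
}
```
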
If I understand correctly, your argument is that a "good" compile-time system can only be pure, potentially using range arithmetic for indexing and conditions. As I mentioned above, I do agree that such a system would work and it would be nice to have a language that does this. But again, this is not my goal.

You are right that there is a problem with side effects and order of execution if the compiler can do things lazily or out of order. The thing to consider here is that these restrictions are no different from those of any other concurrent code, and even the same synchronization primitives like mutexes could likely be used to manage reads and writes to the global mutable data. The compiler then takes the role of the OS that schedules "threads" to be executed and manages critical sections.

Finally, even for a pure system, read-only mode is too restrictive. If the global data is only written to and never read until the end of compilation, it is safe for the writes to come in any order as long as the data type is a commutative monoid, meaning the combining operation is associative, order-independent and has an identity value. This can be very useful for things like pre-warmed caches or more broadly any kind of maps, sets or sorted lists.
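
A minimal C++ sketch of such write-only global data, assuming the merge is set union (associative and commutative, with the empty set as identity). `Registry`, `apply_writes` and `order_does_not_matter` are hypothetical names, and nothing here reflects how Mass actually stores compile-time state.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical write-only global: names registered during compilation.
using Registry = std::set<std::string>;

// Apply the writes in the order given, starting from the identity (empty set).
Registry apply_writes(const std::vector<std::string>& writes) {
    Registry registry;
    for (const std::string& name : writes) registry.insert(name);
    return registry;
}

// Try every ordering of the writes and confirm the final set is the same.
bool order_does_not_matter(std::vector<std::string> writes) {
    std::sort(writes.begin(), writes.end());
    const Registry expected = apply_writes(writes);
    do {
        if (apply_writes(writes) != expected) return false;
    } while (std::next_permutation(writes.begin(), writes.end()));
    return true;
}

void check_registry_example() {
    assert(order_does_not_matter({"parse", "typecheck", "emit"}));
}
```

A plain "last write wins" variable would fail this check, which is why the merge operation matters more than the container type.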