background book propbackground computer mouse prop

The RiceLang Language Definition

Contents

Introduction

RiceLang is a simple C/Java like programming language created by yours truly: RiceL123. From source code to an AST, RiceLang programs can compile to Java byte code or transpile to JavaScript.

An example of a simple RiceLang program is shown below.

int main() {
	putStringLn("T-T");
	byebye 0;
}

Grammar

The following conventions are adopted for defining grammar rules for syntax.

Program Structure

A RiceLang program is a collection of function and variable declarations in a single file.

The entry point of the program is the main function which must have a return type of int. Due to scope rules, it will usually be the last function in a program. main cannot call itself recursively.

Comments

RiceLang supports single line comments and multi-line comments. It does not support nesting of multi line comments.

// this is a single line comment
/*
this is a multi line comment
*/

All comments are ignored by the compiler.

Separators

White space (like new lines, tabs or spaces) as well as the following can be used as separators

Identifiers

Identifiers are used to define both variables and function names and must be 1 or more characters long. They start with a letter or underscore and end with a letter, number or underscore.

Operators

There are 14 operators. Ordered from highest to lowest precedence with their associativity they are:

  1. +, -, ! (right-associative) // + and - as unary operators
  2. *, / (left-associative)
  3. +, - (left-associative) // as binary operators
  4. <, <=, >, >= (left-associative)
  5. ==, != (left-associative)
  6. && (left-associative)
  7. || (left-associative)
  8. = (right-associative)

Basic Types

RiceLang programs operate on 3 primitive data types with operators to form expressions.

int

An is a decimal number of at least 1 digit. They are of type int.

The value of an int type is a 32-bit signed integer. They can be operated on by

int i = 3;
int j = -2;
int k = i / j; // integer division: -2
boolean b = i > j; // true

float

A is made up of a whole-number, decimal point, a fractional part and an exponent. It is of type float.

The value of a float type is a single-precision 32-bit IEEE 754 floating point. They can be operated on by

Coercion on an int to a float will automatically occur for in expressions including for byebye statements, declarations and functions arguments.

float i = 3.;
int j = -2;
float k = i / j; // -1.5 (j converted to -2.0)
boolean b = i > j; // true (j is converted to -2.0)

boolean

A is either true or false and is of type boolean.

Although technically boolean only needs 1 bit, they will typically use a whole byte. They can be operated on by

When && or || are used, they are evaluated left to right and will try to short-circuit.

boolean i = true;
boolean j = !true;
boolean k = i && j; // false
boolean l = false && boolFunc(); // shortcircuit: boolFunc wont be called

string literals

A is zero or more characters surrounded by quotation marks.

refers to ASCII characters. If non-ASCII / UTF-8 characters are used, they may be read as single bytes. Escape sequences like \n and \" are also supported. RiceLang has no String type to use; string literals can only be used in the built-in functions putString and putStringLn. Strings cannot span more than 1 line.

putString("Hewwo world\n");
putStringLn("Byebye world");

Arrays

Ricelang only supports 1-dimensional arrays of type int, float and boolean. Arrays have a fixed size determined by an in the subscript or by the length of an array initialiser (). Arrays are filled with default values meaning int and float arrays will be filled with zeros and boolean arrays filled with false.

int a[2];                   // default array: [ 0, 0 ]
float a[2];                 // default array: [ 0.0, 0.0 ]
boolean a[2];               // default array: [ false, false ]
int b[5] = { 1, 2 };        // zeroed array: [ 1, 2, 0, 0, 0 ]
int c[] = { 1, 2 };         // empty subscript: [ 1, 2 ]
int d[];                    // Error: requires either size or initialiser
int d[1] = { 1, 2 };        // Error: initialiser > size
float e[] = { 1, 2, 3.14 }; // coercion to declared type: [ 1.0, 2.0, 3.14 ]

Arrays themselves can be passed as an argument to a function call (one will typically also pass in the array size). Arrays are passed to functions as pointers so modifications on them by the callee can be observed by the caller. Only element access with a subscript allows for valid manipulation of the array.

int increment_all(int x[], int size) {
	int i; // defaults to zero
	for(; i < size; i = i + 1) {
		x[i] = x[i] + 1;
	}
}

int main() {
	int x[] = { 1, 2 };
	increment_all(x, 2); // x is now [ 2, 3 ]
	putIntLn(x[1] + 1);  // 4
	x + 1;               // Error
}

Variables

There are 3 types of variables being, global variables, local variables and function parameters. Global and local variables are similar in that they are both declared by a , and optionally a list of declarations.

Local variable declarations in compound statements must come before any statements.

Function parameters act like local variable declarations at the beginning of a function's compound statement.

int a = 1;       // global variable
int fun(int b) { // function parameter - same scope as c
	int c;       // local variable
	int d, e = 2, f[] = { 1, 2 }; // int d; int e = 2; int f[] = { 1, 2 };
}

Similar to arrays, variables are also initialised to default values if unspecified; int and float default to 0 and booleans default to false.

Statements

Statements can either be just a single statement or a compound statement that contains zero or more variable declarations followed by zero or more statements.

Because there is no hoisting one may be inclined to add compound statements to have increased locality of variable declaration and use. (Or you could just make a deal with it)

int main() {
	int a = 1;
	// int b = 2; // decl is further away
	for (;a < 10; a = a + 1)
		putIntLn(a);

	{ // introduce a new compound statement
		// b decl is closer
		int b = 2;
		for (; b < 10; b = b + 1)
			putIntLn(b);
	}
}

If

If statements control the flow of a program based on the evaluation of its expression.

When multiple if statements have a single else statement, the else is attached to the innermost if.

// the following nested if statements with a single else are equivalent
if (1 < 2) if (3 == 5) putString("nani"); else putString("hello");
if (1 < 2) {
	if (3 == 5) putString("nani");
	else putString("hello");
}

While

If a while statement's is , it will continuously execute its and re-evaluate its until it is .

while (i < 5) {
	putIntLn(i);
	i = i + 1;
}

For

For statements are equivalent to while statements with executing once before entering and executing every loop after the . There is an exception for the behaviour of ; control passes to instead of straight to the conditional.

If is omitted, it is decorated with resulting in an infinite loop.

for (i = 0; i < 5; i = i + 1) {
	putIntLn(i);
	if (i == 4) continue; // will not loop infinitely as expr3 is executed
}
for (;;) {} // infinite loop

Break

statements exit the control of the current loop.

while (true) {
	break;
}
// control is now here

Continue

statements pass the control back to the start of the loop or to in the case of for loops.

while (true) {
	continue; // pass control back to evaluate condition
	putStringLn("hello"); // will not execute
}

Byebye

acts as a return statement which transfers control back to the caller of the function that contains it.

without an must be in a void function. with an must have the assignable to the function type.

RiceLang does not do data-flow analysis and as such, puts the burden of ensuring all possible branches have a byebye onto the user. A simple solution would be to include a byebye at the end of the function.

int fun(boolean b) {
    if (b) {
        byebye 1; // no run time error
    } else {
        putStringLn("no byebye"); // run time error due to no byebye int;
    }
    // could just have a byebye here
}

Expression Statements

An expression statement is just an expression followed by a semicolon. This will most typically be used for expressions that are assignments or function calls.

myfunc();
i = 0;

Scope rules

Scope rules govern declarations and their uses.

int main() {
	int main = 1;
	{
		int main = 2;
		putIntLn(main); // prints 2
	}
	putIntLn(main); // prints 1
}

Functions

Functions in RiceLang require that they be declared before they are called. This means that one will typically find main at the bottom of a program. Formally, a declaration is as follows.

A function call is an identifier followed by some brackets. Formally as follows.

While functions support recursion, they do not support overloading.

Functions must return one of int, float, boolean, void and have a corresponding byebye statement. Functions cannot return arrays.

Built-in

RiceLang consists of 11 built-in functions for I/O.

Input functions will block and parse a line from stdin and its value if valid. In the case of the RiceLang playground, any program that calls the built-in input functions that

int i = getInt(); // read and a parse a line of stdin to an int
float f = getFloat(); // as above but for float

Output functions will print a particular data type to stdout. These functions all return void.

putInt(int x);         // prints value of x to stdout
putIntLn(int x);       // prints value of x to stdout + "\n"
putFloat(float x);     // prints value of x to stdout
putFloatLn(float x);   // prints value of x to stdout + "\n"
putBool(boolean x);    // prints value of x to stdout
putBoolLn(boolean x);  // prints value of x to stdout + "\n"
putString("string literal");   // prints the string to stdout
putStringLn("string literal"); // prints the string + "\n" to stdout

All arguments are passed by value. This means they are copied into temporary variables and thus cannot modify the caller's arguments from within the function. As for arrays as arguments, an array pointer is passed and copied with a function call which allows for modification of the same array however doesn't allow changing of the caller's array pointer which is consistent with the pass by value behaviour.

by Eric L May 2025