The RiceLang Language Definition

Introduction
Grammar
Program Structure
Operators
Basic Types
- int
- float
- boolean
- string literals
Arrays
Variables
Statements
- If
- While
- For
- Break
- Continue
- Byebye
- Expression Statements
Scope rules
Functions
- Built-in

Introduction

RiceLang is a simple C/Java-like programming language created by yours truly: RiceL123. Source code is parsed into an AST, from which RiceLang programs can be compiled to Java bytecode or transpiled to JavaScript.

An example of a simple RiceLang program is shown below.

int main() {
	putStringLn("T-T");
	byebye 0;
}

Grammar

The following conventions are adopted for defining grammar rules for syntax.

Terminal symbols: $bold$
Nonterminal symbols: $italics$
Symbols can be grouped with brackets ( ) (e.g. $(A B)$ )
$A *$ is a sequence of 0 or more iterations of $A$
$A +$ is a sequence of 1 or more iterations of $A$
$A ?$ is an optional occurrence of $A$
$A ∣ B$ represents two possible productions being $A$ or $B$
Productions are written $A \to B_{1} ∣ ... ∣ B_{n}$

program \to (func-decl ∣ var-decl) *

func-decl \to type identifier para-list compound-stmt
para-list \to ( proper-para-list ? )
proper-para-list \to para-decl (, para-decl) *
para-decl \to type declarator

var-decl \to type init-declarator-list ;
init-declarator-list \to init-declarator (, init-declarator) *
init-declarator \to declarator (= initialiser)?
declarator \to identifier
    ∣ identifier [ INTLITERAL ? ]
initialiser \to expr
    ∣ { expr (, expr) * }

type \to void ∣ boolean ∣ int ∣ float
identifier \to ID

compound-stmt \to { var-decl * stmt * }
stmt \to compound-stmt
    ∣ if-stmt
    ∣ for-stmt
    ∣ while-stmt
    ∣ break-stmt
    ∣ continue-stmt
    ∣ return-stmt
    ∣ expr-stmt
if-stmt \to if ( expr ) stmt (else stmt)?
for-stmt \to for ( expr ? ; expr ? ; expr ? ) stmt
while-stmt \to while ( expr ) stmt
break-stmt \to break ;
continue-stmt \to continue ;
return-stmt \to byebye expr ? ;
expr-stmt \to expr ? ;

expr \to assignment-expr
assignment-expr \to (cond-or-expr =) * cond-or-expr
cond-or-expr \to cond-and-expr
    ∣ cond-or-expr || cond-and-expr
cond-and-expr \to equality-expr
    ∣ cond-and-expr && equality-expr
equality-expr \to rel-expr
    ∣ equality-expr == rel-expr
    ∣ equality-expr != rel-expr
rel-expr \to additive-expr
    ∣ rel-expr < additive-expr
    ∣ rel-expr <= additive-expr
    ∣ rel-expr > additive-expr
    ∣ rel-expr >= additive-expr
additive-expr \to multiplicative-expr
    ∣ additive-expr + multiplicative-expr
    ∣ additive-expr - multiplicative-expr
multiplicative-expr \to unary-expr
    ∣ multiplicative-expr * unary-expr
    ∣ multiplicative-expr / unary-expr
unary-expr \to primary-expr
    ∣ + unary-expr
    ∣ - unary-expr
    ∣ ! unary-expr
primary-expr \to identifier arg-list ?
    ∣ identifier [ expr ]
    ∣ ( expr )
    ∣ INTLITERAL
    ∣ FLOATLITERAL
    ∣ BOOLEANLITERAL
    ∣ STRINGLITERAL

arg-list \to ( proper-arg-list ? )
proper-arg-list \to arg (, arg) *
arg \to expr

The parentheses in para-list and arg-list, around the expr of if-stmt and while-stmt, around the header of for-stmt, and in the ( expr ) alternative of primary-expr are terminal symbols; all other parentheses are grouping brackets. Square brackets and braces are always terminal symbols.

Program Structure

A RiceLang program is a collection of function and variable declarations in a single file.

The entry point of the program is the main function which must have a return type of int. Due to scope rules, it will usually be the last function in a program. main cannot call itself recursively.
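
As a rough sketch of this layout (the names counter and next are made up for illustration), a global variable and a helper function are declared first, and main comes last so that it can refer to both:

int counter = 0; // global variable

int next() {
	counter = counter + 1;
	byebye counter;
}

int main() { // entry point: returns int and is declared last
	putIntLn(next()); // 1
	putIntLn(next()); // 2
	byebye 0;
}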

Comments

RiceLang supports single line comments and multi-line comments. It does not support nesting of multi line comments.

// this is a single line comment
/*
this is a multi line comment
*/

All comments are ignored by the compiler.

Separators

White space (such as new lines, tabs or spaces) as well as the following tokens can be used as separators:

{, }, (, ), [, ], ;, ,

When the AST is generated, all separator tokens and white space are omitted.

Identifiers

Identifiers are used to name both variables and functions and must be 1 or more characters long. They start with a letter or underscore, which is followed by zero or more letters, digits or underscores.

ID \to start-char end-char *
start-char \to A ∣ ... ∣ Z ∣ a ∣ ... ∣ z ∣ _
end-char \to start-char ∣ 0 ∣ 1 ∣ ... ∣ 9
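
As a quick illustration of the identifier rule, the declarations below use names chosen only for this sketch; forms that do not match the rule are left as comments.

int x;         // a single letter
int _tmp;      // leading underscore
int rice_123;  // letters, an underscore and digits after the first character
// int 2fast;  // not an identifier: starts with a digit
// int my-var; // not a single identifier: - is an operator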

Operators

There are 14 operators. Ordered from highest to lowest precedence with their associativity they are:

+, -, ! (right-associative) // + and - as unary operators
*, / (left-associative)
+, - (left-associative) // as binary operators
<, <=, >, >= (left-associative)
==, != (left-associative)
&& (left-associative)
|| (left-associative)
= (right-associative)
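
The precedence and associativity rules above can be seen in a short sketch. The variable names are arbitrary, and the groupings in the comments are read directly off the table above.

int main() {
	int a = 1 + 2 * 3;                   // * binds tighter than +: 1 + (2 * 3) = 7
	int b = 10 - 4 - 3;                  // left-associative: (10 - 4) - 3 = 3
	int c = -2 * 3;                      // unary - binds tighter than *: (-2) * 3 = -6
	boolean d = 1 < 2 == true;           // < binds tighter than ==: (1 < 2) == true
	boolean e = !false && false || true; // ((!false) && false) || true = true
	a = b = 5;                           // right-associative: a = (b = 5)
	byebye 0;
}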

Basic Types

RiceLang programs operate on 3 primitive data types with operators to form expressions.

int

An $INTLITERAL$ is a decimal number of at least 1 digit. It is of type int.

INTLITERAL \to (0 ∣ 1 ∣ ... ∣ 9) (0 ∣ 1 ∣ ... ∣ 9) *

The value of an int type is a 32-bit signed integer. They can be operated on by

+, -, *, / to produce int values
<, >, <=, >=, ==, != to produce boolean values

int i = 3;
int j = -2;
int k = i / j; // integer division: -2
boolean b = i > j; // true

float

A $FLOATLITERAL$ is made up of a whole-number part, a decimal point, a fractional part and an exponent; the grammar below specifies which of these parts may be omitted. It is of type float.

FLOATLITERAL \to digit * fraction exponent ?
    ∣ digit * .
    ∣ digit * . ? exponent
digit \to 0 ∣ 1 ∣ ... ∣ 9
fraction \to . digit +
exponent \to (E ∣ e) (+ ∣ -)? digit +

The value of a float type is a single-precision 32-bit IEEE 754 floating point. They can be operated on by

+, -, *, / to produce float values
<, >, <=, >=, ==, != to produce boolean values

Coercion of an int to a float occurs automatically in expressions, including in byebye statements, declarations and function arguments.

float i = 3.;
int j = -2;
float k = i / j; // -1.5 (j converted to -2.0)
boolean b = i > j; // true (j is converted to -2.0)

boolean

A $BOOLEANLITERAL$ is either true or false and is of type boolean.

BOOLEANLITERAL \to true ∣ false

Although a boolean technically needs only 1 bit, it will typically occupy a whole byte. Booleans can be operated on by

!, !=, ==, &&, || to produce boolean values

When && or || are used, the operands are evaluated left to right and evaluation short-circuits: the right operand is not evaluated if the left operand already determines the result.

boolean i = true;
boolean j = !true;
boolean k = i && j; // false
boolean l = false && boolFunc(); // short-circuit: boolFunc won't be called

string literals

A $STRINGLITERAL$ is zero or more characters surrounded by quotation marks.

STRINGLITERAL \to " character * "

$character$ refers to ASCII characters. If non-ASCII / UTF-8 characters are used, they may be read as single bytes. Escape sequences like \n and \" are also supported. RiceLang has no String type to use; string literals can only be used in the built-in functions putString and putStringLn. Strings cannot span more than 1 line.

putString("Hewwo world\n");
putStringLn("Byebye world");

Arrays

RiceLang only supports 1-dimensional arrays of type int, float and boolean. Arrays have a fixed size determined by an $INTLITERAL$ in the subscript or by the length of an array initialiser ( ${expr (, expr) *}$ ). Arrays are filled with default values: int and float arrays are filled with zeros and boolean arrays with false.

int a[2];                   // default array: [ 0, 0 ]
float af[2];                // default array: [ 0.0, 0.0 ]
boolean ab[2];              // default array: [ false, false ]
int b[5] = { 1, 2 };        // zeroed array: [ 1, 2, 0, 0, 0 ]
int c[] = { 1, 2 };         // empty subscript: [ 1, 2 ]
int d[];                    // Error: requires either size or initialiser
int d[1] = { 1, 2 };        // Error: initialiser > size
float e[] = { 1, 2, 3.14 }; // coercion to declared type: [ 1.0, 2.0, 3.14 ]

Arrays themselves can be passed as an argument to a function call (one will typically also pass in the array size). Arrays are passed to functions as pointers so modifications on them by the callee can be observed by the caller. Only element access with a subscript allows for valid manipulation of the array.

void increment_all(int x[], int size) {
	int i; // defaults to zero
	for (; i < size; i = i + 1) {
		x[i] = x[i] + 1;
	}
	byebye;
}

int main() {
	int x[] = { 1, 2 };
	increment_all(x, 2); // x is now [ 2, 3 ]
	putIntLn(x[1] + 1);  // 4
	x + 1;               // Error
	byebye 0;
}

Variables

There are 3 kinds of variables: global variables, local variables and function parameters. Global and local variables are similar in that both are declared with a $type$ followed by one or more comma-separated declarators, each of which is an $identifier$ with an optional initialiser.

var-decl \to type init-declarator-list ;
init-declarator-list \to init-declarator (, init-declarator) *
init-declarator \to declarator (= initialiser)?
declarator \to identifier

Local variable declarations in compound statements must come before any statements.

Function parameters act like local variable declarations at the beginning of a function's compound statement.

int a = 1;       // global variable
int fun(int b) { // function parameter - same scope as c
	int c;       // local variable
	int d, e = 2, f[] = { 1, 2 }; // int d; int e = 2; int f[] = { 1, 2 };
}

Similar to arrays, variables are also initialised to default values if unspecified; int and float default to 0 and booleans default to false.
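
A small sketch of these defaults, assuming putBoolLn prints the word false for a false value (the exact output formats are not specified here):

int main() {
	int i;
	boolean b;
	float f;
	putIntLn(i);   // 0
	putBoolLn(b);  // false
	putFloatLn(f); // zero, in whatever format putFloatLn uses
	byebye 0;
}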

Statements

Statements can either be just a single statement or a compound statement that contains zero or more variable declarations followed by zero or more statements.

stmt \to compound-stmt
compound-stmt \to { var-decl * stmt * }

Because there is no hoisting, and local declarations must come before any statements in a compound statement, one may be inclined to introduce nested compound statements so that a variable is declared closer to where it is used. (Or you could just deal with it.)

int main() {
	int a = 1;
	// int b = 2; // decl is further away
	for (;a < 10; a = a + 1)
		putIntLn(a);

	{ // introduce a new compound statement
		// b decl is closer
		int b = 2;
		for (; b < 10; b = b + 1)
			putIntLn(b);
	}
	byebye 0;
}

If

An if statement controls the flow of a program based on the evaluation of its expression.

if-stmt \to if (expr) stmt (else stmt)?

When multiple if statements have a single else statement, the else is attached to the innermost if.

// the following nested if statements with a single else are equivalent
if (1 < 2) if (3 == 5) putString("nani"); else putString("hello");
if (1 < 2) {
	if (3 == 5) putString("nani");
	else putString("hello");
}

While

While a while statement's $expr$ is $true$ , it repeatedly executes its $stmt$ and re-evaluates its $expr$ , stopping once the $expr$ is $false$ .

while-stmt \to while (expr) stmt

while (i < 5) {
	putIntLn(i);
	i = i + 1;
}

For

For statements are equivalent to while statements with $expr1$ executing once before entering and $expr3$ executing every loop after the $stmt$ . There is an exception for the behaviour of $continue$ ; control passes to $expr3$ instead of straight to the conditional.

for-stmt \to for (expr1 ?; expr2 ?; expr3 ?) stmt

If $expr2$ is omitted, it is treated as $true$ , resulting in an infinite loop unless the loop is exited some other way (for example with $break$ or $byebye$ ).

for (i = 0; i < 5; i = i + 1) {
	putIntLn(i);
	if (i == 4) continue; // will not loop infinitely as expr3 is executed
}
for (;;) {} // infinite loop
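
Ignoring continue, the first loop above behaves like the following while loop. This is only a sketch of the equivalence described in this section, and i is assumed to be an int declared earlier.

i = 0; // expr1: runs once before the loop
while (i < 5) { // expr2: checked before every iteration
	putIntLn(i);
	i = i + 1; // expr3: runs after the body
}

A continue inside this while body would jump straight to the condition and skip i = i + 1, which is exactly the case the for statement treats differently.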

Break

A $break$ statement exits the current loop; control continues at the statement after that loop.

break-stmt \to break;

while (true) {
	break;
}
// control is now here
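
As a sketch of how this plays out with nested loops, and assuming (as in C and Java) that break applies only to the innermost enclosing loop, the outer loop below still runs all three of its iterations:

int main() {
	int i;
	int j;
	for (i = 0; i < 3; i = i + 1) {
		for (j = 0; j < 3; j = j + 1) {
			if (j == 1) break; // leaves the inner loop only
			putIntLn(j);       // reached only when j is 0
		}
	}
	byebye 0;
}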

Continue

$continue$ statements pass the control back to the start of the loop or to $expr3$ in the case of for loops.

continue-stmt \to continue;

while (true) {
	continue; // pass control back to evaluate condition
	putStringLn("hello"); // will not execute
}

Byebye

$byebye$ acts as a return statement which transfers control back to the caller of the function that contains it.

return-stmt \to byebye expr ?;

$byebye$ without an $expr$ must be in a void function. $byebye$ with an $expr$ must have the $expr$ assignable to the function type.

RiceLang does not do data-flow analysis and as such, puts the burden of ensuring all possible branches have a byebye onto the user. A simple solution would be to include a byebye at the end of the function.

int fun(boolean b) {
    if (b) {
        byebye 1; // no run time error
    } else {
        putStringLn("no byebye"); // run time error due to no byebye int;
    }
    // could just have a byebye here
}
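
One way to repair the function above, following the suggestion of a final byebye, is sketched below. The extra functions one and done are hypothetical and only illustrate the two typing rules for byebye.

int fun(boolean b) {
	if (b) {
		byebye 1;
	} else {
		putStringLn("no byebye in this branch");
	}
	byebye 0; // reached only when b is false, so every path now ends in a byebye
}

float one() {
	byebye 1; // the int 1 is coerced to the float return type
}

void done() {
	byebye; // no expr: only allowed in a void function
}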

Expression Statements

An expression statement is just an expression followed by a semicolon. This will most typically be used for expressions that are assignments or function calls.

expr-stmt \to expr ?;

myfunc();
i = 0;

Scope rules

Scope rules govern declarations and their uses.

No identifier can be defined more than once in the same block (this means function parameters must not collide with the local variable declarations in a function's body)
For every occurrence of an identifier, there must be some declaration in the same or an outer scope
An occurrence of an identifier uses the declaration in the innermost scope that encloses it, which is either its own scope or an outer one (this produces the possibility of scope holes)
Every compound statement (and thus every function) forms a nested scope
Functions (including the built-ins) and global variables are all defined in the outermost scope

int main() {
	int main = 1;
	{
		int main = 2;
		putIntLn(main); // prints 2
	}
	putIntLn(main); // prints 1
	byebye 0;
}
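
The remaining rules can be sketched in the same style. The function f and the names n and m are made up for this example, and the lines that would be rejected are left as comments.

int f(int n) {
	// int n;        // Error: n is already declared as a parameter of this block
	{
		int n = 5;   // allowed: the nested compound statement is a new scope
		putIntLn(n); // prints 5 (the parameter n is hidden here: a scope hole)
	}
	// m = 1;        // Error: m has no declaration in this or an outer scope
	byebye n;        // the parameter n again
}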

Functions

Functions in RiceLang require that they be declared before they are called. This means that one will typically find main at the bottom of a program. Formally, a declaration is as follows.

func-decl \to type identifier para-list compound-stmt
para-list \to ( proper-para-list ? )
proper-para-list \to para-decl (, para-decl) *
para-decl \to type declarator

A function call is an identifier followed by a parenthesised, possibly empty, list of arguments. Formally, it is as follows.

function-call \to identifier arg-list
arg-list \to ( proper-arg-list ? )
proper-arg-list \to arg (, arg) *
arg \to expr

While functions support recursion, they do not support overloading.

Functions must return one of int, float, boolean, void and have a corresponding byebye statement. Functions cannot return arrays.
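
A minimal sketch of these rules (fact is a made-up example function): fact is declared before main so that main can call it, and it calls itself recursively.

int fact(int n) {
	if (n <= 1) byebye 1;
	byebye n * fact(n - 1);
}

int main() {
	putIntLn(fact(5)); // 120
	byebye 0;
}

Swapping the two declarations would make the call in main refer to a function that has not been declared yet.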

Built-in

RiceLang provides 11 built-in functions for I/O.

Input functions block until a line is available on stdin, parse it and return its value if it is valid. In the RiceLang playground, a program that calls the built-in input functions and

uses the legacy run command will time out as stdin isn't available
uses vanilla JavaScript transpilation will use the browser's prompt() function

int i = getInt(); // read and parse a line of stdin to an int
float f = getFloat(); // as above but for float
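
A short usage sketch, assuming both lines typed on stdin are valid integers:

int main() {
	int a = getInt();
	int b = getInt();
	putIntLn(a + b); // for the inputs 2 and 3 this prints 5
	byebye 0;
}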

Output functions will print a particular data type to stdout. These functions all return void.

putInt(int x);         // prints value of x to stdout
putIntLn(int x);       // prints value of x to stdout + "\n"
putFloat(float x);     // prints value of x to stdout
putFloatLn(float x);   // prints value of x to stdout + "\n"
putBool(boolean x);    // prints value of x to stdout
putBoolLn(boolean x);  // prints value of x to stdout + "\n"
putString("string literal");   // prints the string to stdout
putStringLn("string literal"); // prints the string + "\n" to stdout

All arguments are passed by value. This means they are copied into temporary variables, so a function cannot modify the caller's variables. For array arguments, the array pointer is what gets copied by the call: the callee can modify the elements of the same array but cannot change which array the caller's variable refers to, which is consistent with pass-by-value behaviour.
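
The difference between the two cases can be sketched as follows. The function set and its parameter names are hypothetical, chosen only for this example.

void set(int n, int arr[]) {
	n = 99;      // changes only the callee's copy of n
	arr[0] = 99; // changes an element of the caller's array
	byebye;      // plain byebye: allowed because set is void
}

int main() {
	int n = 1;
	int arr[] = { 1, 2 };
	set(n, arr);
	putIntLn(n);      // 1
	putIntLn(arr[0]); // 99
	byebye 0;
}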