As I thought about making a homepage, I
read a little
about PHP and decided that I do not want to learn that language,
but instead want to invent my own language. Inventing languages is fun anyways.
Now, in the following, some features of BHP
.
BHP supports a mix of functional and imperative programming. There's also a bit of object orientation in it. BHP is syntactically similar to C++/Java. Further concepts have been borrowed from Python, Scala, HTML and Haskell.
BHP has a built-in multipurpose data type Dom
.
A Dom
-object consists essentially of a tag (a string), a collection of attributes
(A map from strings to objects) and a list of elements (any objects).
There are two syntactic variants of generating a Dom
-object.
The first variant looks like this, for example:
def domObject = [:div, .class="someClass", .attr=3+2, "Hello", [:br], "world"]
Here, a Dom
-object with the tag div
is generated that has two attributes (namely
with the value class
"someClass"
and
with the value attr
5
) and contains three elements with the values "Hello"
,
[:br]
(a Dom
-object without any attributes or elements) and "world"
.
Using the second variant, which is strongly based on XML/HTML, one would define the same object like this:
def domObject = <div class="someClass" attr=(2+3)>Hello<br/>world</div>
The attribute values are given by BHP expressions. Everything that is between the tags is treated like the contents of a string literal.
If a Dom
-object which has a tag different form null
is converted to a string, HTML/XML code is generated.
The object defined above thus becomes the string
"<div class="someClass" attr="5">Hello<br/>world</div>"
If you want to include calculated data or the contents of variables in the element list, you have several possibilities
(which work in a very similar way also inside string literals):
With $variable
you can insert the value of a variable that has been defined before.
For more complex expressions one may write \(expression)
.
If several elements are to be inserted, you can write \{statements}
. The statements
are program code that calculates
values and inserts them into the current context, that is here into the element list (Or, in the case of string literals, into the generated string).
The preferred way to insert data into the current context is the proxy object named context
, for which the operator <<
has been overloaded (borrowed from C++).
With this, one may write, for example:
def domObject2 = <div>
\{
for i in 0..3:
context << '\n' <<
<p>
2*$i = \(2*i)
</p>
}
</div>
That results in the Dom
data structure
[:div, "\n ",
'\n', [:p, "\n 2*", 0, " = ", 0, "\n "]
'\n', [:p, "\n 2*", 1, " = ", 2, "\n "]
'\n', [:p, "\n 2*", 2, " = ", 4, "\n "]
"\n"
]
If this data structure is converted to a string and printed, one gets the output:
<div>
<p>
2*0 = 0
</p>
<p>
2*1 = 2
</p>
<p>
2*2 = 4
</p>
</div>
The correct indentation of the <p>
tags is achieved by an additional trick which I won't describe here.
Every nonempty combination of a certain set of characters can by used as an operator.
Additionally,
and in
are operators. Some operators, for example ..
,
in
or =
, are also used as keywords in some
syntactical situations.
..
Binary infix operators are defined by specifying the operator name, the precedence, the associativity and a function.
For the associativity, besides left-, right- and non-associative operators, also comparison operators and multi-operators are available.
Applications of comparison operators can be combined, so that, for example, the expression 1 < a + b <= c == 10
is equivalent (except for the fact that a + b
is evaluated only once) to the expression
1 < a + b && a + b <=c && c == 10
.
Multi-operators allow several applications of the same operator without parentheses to be evaluated by a single call to a variadic function.
For example, the join of three values can be calculated using a | b | c
.
Partial application of unparenthesized operators is also possible (and similarly partial function application, by the way).
This works by replacing one or more operands by a .
character.
The result of such an expression is a function with as many parameters as operands have been replaced with periods.
Thus, for example, the expression .+1
is equivalent to the successor function fn(x){return x+1}
(except for parameter names) and
a*.+1/.
corresponds to the function fn(x, y){return a*x+1/y}
.
Most unary operators can be applied both in prefix and in postfix form.
The postfix form is more compatible with function calls and attribute accesses, whereas the prefix form is tradition especially with
minus, but also with pre-increment, pre-decrement and address operators.
In postfix form, all operators except post-increment (++
) and post-decrement (--
) are preceded by a period, which
further intensifies the similarity to method calls.
Except for <
, every postfix operator written with a period is also available as a prefix operator (without the period).
Prefix <
does not exist because that would conflict with the notation for XML expressions.
Also: Who needs a unary <
operator.
Whenever a programmer-defined unary operator is to be applied, the runtime first checks if the operand has a method of that name, which in that case is called. Otherwise the current scope is searched for a correspondingly named function definition. This way, it is possible to define unary operators that work with all data types, but can be overridden with specialized implementations for some objects.
Once a local variable has been defined, its value cannot be changed anymore due to safety concerns. If you want to do that anyway, you must define the variable as a mutable memory cell:
def a = 0
def b = var 0
The name b
is bound to a mutable memory cell that has been initialized to 0
but can be changed later,
whereas the name a
is directly bound to the value 0
.
If a read-access to b
occurs anywhere in an expression, the memory cell associated with b
is dereferenced.
When accessing a
it is not dereferenced, as its value is not a memory cell.
So you can write def c = a + b
without thinking about the fact that b
is not actually a number, but only a pointer to a number.
The difference becomes apparent when the variables are used on the left side of an assignment:
b <- 3 + b
a <- 3 + a
The first line works like this: The value of the name b
is looked up, but not dereferenced even though it is a memory cell/reference.
So the result of the left side is the memory cell itself. Then the right side is evaluated as usual.
Then the method set
of the memory cell is called in order to write the new value.
The second line generates an error message because a
does not have a set
method.
The non-user-definable unary operator &
evaluates its operand as if it was located on the left side of an assignment,
that is it suppresses the automatic dereferencing. With this one can define aliases:
def d = &b
The variables b
and d
are now bound to the same reference.
If there is a tuple-, list-, or Dom-expression on the left side of an assignment, all elements (and attributes) are evaluated without
automatic dereferencing. Also, tuples, lists, and Dom
-objects have a set
method, which takes apart its argument
and forwards the parts to the set
methods of the corresponding elements and attributes.
With this, for example a function for swapping the values of two references may be defined and used as follows:
def swap(a, b){
(a, b) <- (b, a)
}
def x = var 1
def y = var 2
swap(&x, &y)
The attributes of Dom
-objects and the elements of lists and Dom
-objects are also wrapped in mutable memory cells.
Hence you can give alias names to elements or attributes, which you may use to change them:
def list=[0,1,2,3]
for n in varIterator(list):
++n
This results in the list [1,2,3,4]
.
The standard library function varIterator
is necessary here because the normal iteration over a list would bind the variable
n
only to the values contained in the memory cells, not to the memory cells themselves.
varIterator
is defined by
def varIterator(l)[yield &l[i] : i in 0 .. l.size()]
and makes from a list an iterable object that, when iterated over, yields with constant memory overhead the references to the list elements.
def a = 3
a
is a pattern that matches everything and binds the matched object to the name a
.
In the statement
def (b, c) = f(a)
(b, c)
is a pattern that matches only pairs of values
and that binds the elements of the pair to the names b
and c
.
If the function f
here returns something other than a pair, an exception PatternMatchFailure
is thrown.
def f = fn(a@ !null, list@[head@in Int && positive(), .. rest@in Int]){
return (a.toString()*head, rest)
}
This function expects as its first argument any Object, as long as it is not null
.
The second argument has to be a list containing at least one element, where the first element is called head
and has to be a
positive integer, and the remaining elements are aggregated in a list rest
and have to be integers.
Should a parameter pattern not match the corresponding argument, an exception
ArgMismatchException
is thrown.
case
expressions in conjunction with the built-in function match
:
If you want to analyze an object and perform different actions or evaluate different expressions, depending on its structure,
you can use the match
function.
match(x,
case null:
context << "x is null",
case in Int && modulo(y: 3):
context << "x and $y are congruent modulo 3",
case [y]:
context << "x is a list containing one $y",
case _:
context << "x is not an integer or a one element list",
)
def collatz=fn(n@in Int&&positive()) match(n,
case even(): n//2,
case odd(): 3*n+1,
)
As you can see, case
expressions consist of a pattern and an expression. The function match
tries the patterns
of the passed case
objects in order and evaluates the expression belonging to the first matching pattern.
If none of the patterns matched, a PatternMatchFailure
exception is thrown.
A specialty of BHP is that case
expressions are not, as with other languages, syntactically
restricted to occur in the context of a match
-like construct.
For example, one may store case
objects in dynamically modifiable lists
which are the passed to match
.
Using this mechanism, extensible operator overloading is realized in BHP: So that an operator becomes able to process new operand types,
a case
object is added to the corresponding list.
for
loops:
Like with definitions of variables and parameters, for
loops bind local variable names to values, only that the values
are taken from an iterable object.
for (index, element) in withIndex(list):
context << "list[$index] = $element\n"
Here the library function withIndex
generates from a list an iterable object that, when iterated, not only
gives the list element but also the index.
The index and the element are then bound by the pattern (index, element)
to local variables, which are valid for the duration
of one iteration of the loop.
In case that an object provided by the iterator does not match the pattern, the iteration for this object is skipped.
Thus you can for example iterate over all non-null elements of a list:
for element@ !null in list:
element.doSth()
for
loop.
The second possibility can be used for binding intermediate results to variables and for pattern-based filtering of the generated list.
The following list comprehension turns a list list1
into a list of string representations of the elements of
list1
, where only those elements are considered which are not null
and whose string representations are not empty.
def list2=[elemStr:
element@ !null in list1, #1. possibility
elemStr@ !"" = element.toString(), #2. possibility
]
let
expressions. let
expressions are just another way to define local variables.
catch
clauses: Whether the current exception matches the pattern of a catch
-clause decides if the clause
is used for handling the exception. Parts of the exception can be extracted and bound to local variables:
try:
readSettings()
catch IOException(msg):
printErr("I/O error: $msg")
modulo(y: 3)
, even()
and IOException(msg)
.
These are applications of so-called pattern abstractions.
Similar to Scala's extractor objects, pattern abstractions serve the purpose of defining your own pattern building blocks.
One may for example define the pattern abstraction twice
like this:
def case twice():(n//2) <- n@even()
If this abstraction is applied (for example in the pattern twice(odd())
),
the pattern to the right of <-
is matched against the matched object.
If it matches (Which here is the case if the object is an even number; As a side effect the object is called n
),
the expression n//2
(integral division by 2) is evaluated and matched against the pattern in the argument listof the abstraction applications (that is
odd()
in the example here). So the pattern twice(odd())
matches all numbers that are twice an odd number.
In the pattern modulo(y: 3)
after the colon, an ordinary argument list of expressions is passed to the
pattern abstraction. These arguments are bound to the parameters that are declared before the colon of the pattern abstraction definition:
def case modulo(m):(o%m) <- o
def case even():() <- modulo(0:2)
def case odd():() <- modulo(1:2)
_; then nothing happens). If the variable was already bound, then depending on the syntactic context of the pattern either this is a syntax error or the matched object is compared to the bound value; the pattern matches only if the values are equal.
null
, true
, false
, a number or a string constant.
It matches only those objects that are equal to the constant.
$
followed by a variable name or an parenthesized expression.
The expression or variable name is evaluated and, like with the ConstPattern, compared to the matched object.
&&
or @
.
It matches only if all operands match. All variable bindings made by the operands are kept.
||
.
It matches only if one of the operand patterns matches.
Only those variable bindings made by the first matching operand are kept which would have been bound by all other operands, too.
!
character followed by a pattern. It matches only if the other pattern does not match.
No variable bindings are kept.
in
followed by an expression which must be inside parentheses if it contains infix operators.
It matches if the matched object is contained in the value of the expression (as determined by its contains
method).
In particular, this can be used to check whether the matched value belongs to a certain type.
var
followed by a pattern. It matches mutable memory cells whose current content matches the other pattern.
if
followed by an expression.
It matches if the value of the expression, converted to a boolean, is true
.
let
followed by a pattern, an =
character and an expression.
It matches if the value of the expression matches the pattern; the resulting variable bindings are kept.
In the following example it is used to define a pattern abstraction that matches strings which represent integers and that allows to match the parsed
number against a further pattern:
def case intString():(value) <- o @ let (value @ !null) = tryParseInt(o)
def tryParseInt(n){
try:
return parseInt(n)
catch _:
return null
}
context << match(str,
case intString(n@even()): "String representing an even number $n",
case intString(n@odd()): "String representing an odd number $n",
case _: "Not a string representation of an integer",
)
argument list, as described above, or it is the application of a user defined pattern operator to one or two patterns. Instead of names of pattern abstractions, names of extractor objects may be used, but I don't describe those here.
%pattern
.
This extracts the length of the tuple and matches it against the given pattern.
As its last element, a TuplePattern may contain the keyword ..
, optionally followed by a pattern.
This indicates that the tuple may have any number of additional elements. If a pattern was specified after the ..
, the remaining elements must
all individually match this pattern.
The variables bound while matching the rest elements are combined into lists and bound to the corresponding names by the TuplePattern.
Example: The pattern
((fst0, snd0), %odd(), .. (fsts, snds))
matches all tuples consisting of pairs and having odd length.
The elements of the first pair are bound to fst0
and snd0
.
The first elements of the remaining pairs are combined into a list named fsts
and the second elements into a list named snds
.
*
character in order to match it against the underlying mutable memory cell of the
list element and not against its value.
Dom
-objects.
Such a specifier may be:
Dom
-objects has to agree with one of them.
=
character and a pattern. The tag of the Dom
-object is matched against the pattern.
=
or ->
and a pattern.
This matches the specified attribute of the Dom
-object against the pattern.
In front of the pattern there may again be an *
in order to match against the mutable memory cell belonging to the attribute.
. * =
or . * ->
followed by a pattern. This extracts the list of elements and matches it
against the specified pattern.
fn
followed by a parameter list in parentheses.
Thus it looks exactly like a lambda function head.
It matches a Dom
-object that, if treated as an argument list, can be bound to the parameters
in the parameter list without throwing an ArgMismatchException from this.
The child elements of the Dom
-object provide the positional arguments and the attributes provide the named arguments.
ParamPatterns are primarily meant to be matched with argument lists that arise from variadic functions heads, so as to implement polymorphic functions:
def f(args*.){
# Positional and named arguments are
# bundled in the Dom-object 'args'
match(args,
case fn(c, d)&&let []=d: printOut("c=$c, d=$d (special case)"),
case fn(h, i): printOut("h=$h, i=$i"),
case fn(c, d): printOut("c=$c, d=$d"),
case fn(a =100, b=200): printOut("a=$a, b=$b"),
case fn(e, f, g): printOut("e=$e, f=$f, g=$g"),
case _: printOut("Nothing matches $args")
)
}
f() # Writes "a=100, b=200"
f(1, 2) # Writes "h=1, i=2"
f(1, []) # Writes "c=1, d=[] (special case)"
f(1, .d=2) # Writes "c=1, d=2"
f(1,2,3) # Writes "e=1, f=2, g=3"
f(1,2,3,.x=4) # Writes "Nothing matches [:(null), .x = 4, 1, 2, 3]"
In its basic form, a QuantifiedPattern is a Pattern followed by a quantifier, which is a specification in curly braces.
The matched object should represent a finite sequence and for this purpose provide methods size
, prefix
and
suffix
that can be used to take it apart. Thus it may for example be a String, a List or a Tuple.
The QuantifiedPattern matches if the matched Object is the concatenation of several subsequences, each of which matches the
pattern in front of the quantifier.
The variables that a bound during matching of the subsequence pattern are combined into lists.
The quantifier specifies how many subsequences it may be. It can take carious forms:
{expression}
: The expression must evaluate to a nonnegative integer. For the QuantifiedPattern to match, there must be exactly as many
subsequences as this number.{minExpression,}
: The expression must evaluate to a nonnegative integer. For the QuantifiedPattern to match, there must be at least as many
subsequences as this number.{minExpression,maxExpression}
: The expressions must evaluate to nonnegative integers (if maxExpression
yields null
,
it is as if there was nothing after the comma). For the QuantifiedPattern to match, there must be at least as many
subsequences as minExpression
says and at most as many as maxExpression
says.case
followed by a pattern that is matched against the number of subsequences
in order to decide if the QuantifiedPattern matches. Variable bindings from this matching are not kept.
A QuantifiedPattern with the quantifier {0,}
, {1,}
or {0,1}
can also be written using the unary
pattern operator *
, +
or ?
, respectively.
The matching algorithm for QuantifiedPattern works by splitting off prefixes of various sizes from the sequence
in order. If one of them matches the subsequence pattern, it continues recursively on the corresponding suffix. The algorithm terminates successfully if the suffix
is empty and the number of found subsequences matches the quantifier.
Lest the algorithm run into an endless loop by repeatedly splitting off empty prefixes, under appropriate circumstances two empty prefixes are never split off in direct
succession. But except for this restriction, the algorithm tries all decompositions of the sequence until a matching one is found.
Together with the OrPattern and the pattern operators +
and <++
for splitting of sequences (from the standard library),
BHP patterns thus reach the expressiveness of regular expressions. This implementation, however, is not very efficient.
If one also uses AbstractionApplicationPatterns, it should be possible to recognize context free languages.
But without caching of matching results (which I did not implement), this gets really inefficient, and it also is not useful for serious parsing
because in case of syntax errors in the input the pattern simply does not match instead of producing an informative error message.
Here is an example for the usage of QuantifiedPatterns:
match("----|Hello|world|",
case "-"{case odd(),4} <+ (b + "|").* :
printOut("b=$b"), # Writes "b=[-, Hello, world]"
)
The pattern operator <+
matches a sequence if its operands match the sequence in order, while
trying to match as long as possible a part of the sequence against the first operand.
Here is an extensive example from the constructor of my TreeMap implementation:
def dispatch = fn(args*)match(args,
# Make empty TreeMap
case []: ctor(var null),
# Make empty TreeMap with custom Comparator
case [comp@in Comparator]: ctor(var null, comp),
# Copy a TreeMap
case [map@in TreeMap]: ctor(var map.getTree(), map.comparator()),
# Initialize TreeMap with key-value-pairs from an iterable
case [it@iterable()]: dispatch(it, stdComparator),
# Make TreeMap with custom Comparator, initialized from an iterable
case [it@iterable(), comp@in Comparator]: {
def r = ctor(var null, comp);
r.putAll(it);
r;
},
case [comp@in Comparator, it@iterable()]: dispatch(it, comp),
# Make TreeMap with custom Comparator, initialized with
# key-value-pairs from the remaining arguments
case [comp@in Comparator, .. kv@(_,_)]: dispatch(kv, comp),
# Initialize TreeMap with key-value-pairs from the argument list
case [.. kv@(_,_)]: dispatch(kv, stdComparator),
# Make TreeMap with custom Comparator, initialized with
# keys and values from the remaining arguments, which are not
# yet combined into pairs
case [comp@in Comparator,%odd() , .. kv]:
dispatch([yield (key, value): key, value in kv], comp),
# Initialize TreeMap with
# keys and values from the argument list, which are not
# yet combined into pairs
case [%even(), .. kv]:
dispatch([yield (key, value): key, value in kv], stdComparator),
#Invalid argument list
case _: error(
"TreeMap() does not understand these argument types",
ArgMismatchException()
)
)
The variadic function dispatch
understands diverse argument list formats and is called by the outwardly visible constructor.
I wrote it before ParamPatterns existed, so it uses positional arguments only.
The function head fn(args*)
binds the entire argument list to the name args
,
which is then matched against various ListPatterns. Depending on which pattern matches, the arguments are transformed
and used for building a TreeMap by means of the actual (non-variadic) constructor function ctor
and the method putAll
.
The penultimate case
expression for example allows us to write:
TreeMap(key1, value1, key2, value2, key3, value3)
In order for every key to have a value, the argument count must be even.
The pattern of the penultimate case
expression matches (because of %even()
) only if
there actually is an even number of arguments.
These are then combined into pairs using the iterable comprehension [yield (key, value): key, value in kv]
.