Inhalt

As I thought about making a homepage, I read a little about PHP and decided that I do not want to learn that language, but instead want to invent my own language. Inventing languages is fun anyways. Now, in the following, some features of BHP.

BHP supports a mix of functional and imperative programming. There's also a bit of object orientation in it. BHP is syntactically similar to C++/Java. Further concepts have been borrowed from Python, Scala, HTML and Haskell.

XML expressions

BHP has a built-in multipurpose data type Dom. A Dom-object consists essentially of a tag (a string), a collection of attributes (A map from strings to objects) and a list of elements (any objects). There are two syntactic variants of generating a Dom-object. The first variant looks like this, for example:


def domObject = [:div, .class="someClass", .attr=3+2, "Hello", [:br], "world"]

Here, a Dom-object with the tag div is generated that has two attributes (namely class with the value "someClass" and attr with the value 5) and contains three elements with the values "Hello", [:br] (a Dom-object without any attributes or elements) and "world".

Using the second variant, which is strongly based on XML/HTML, one would define the same object like this:


def domObject = <div class="someClass" attr=(2+3)>Hello<br/>world</div>

The attribute values are given by BHP expressions. Everything that is between the tags is treated like the contents of a string literal.

If a Dom-object which has a tag different form null is converted to a string, HTML/XML code is generated. The object defined above thus becomes the string


"<div class="someClass" attr="5">Hello<br/>world</div>"

If you want to include calculated data or the contents of variables in the element list, you have several possibilities (which work in a very similar way also inside string literals): With $variable you can insert the value of a variable that has been defined before. For more complex expressions one may write \(expression). If several elements are to be inserted, you can write \{statements}. The statements are program code that calculates values and inserts them into the current context, that is here into the element list (Or, in the case of string literals, into the generated string). The preferred way to insert data into the current context is the proxy object named context, for which the operator << has been overloaded (borrowed from C++). With this, one may write, for example:


def domObject2 = <div>
  \{
    for i in 0..3:
      context << '\n' <<
        <p>
          2*$i = \(2*i)
        </p>
  }
</div>

That results in the Dom data structure


[:div, "\n  ", 
  '\n', [:p, "\n    2*", 0, " = ", 0, "\n  "]
  '\n', [:p, "\n    2*", 1, " = ", 2, "\n  "]
  '\n', [:p, "\n    2*", 2, " = ", 4, "\n  "]
  "\n"
]

If this data structure is converted to a string and printed, one gets the output:


<div>
  
  <p>
    2*0 = 0
  </p>
  <p>
    2*1 = 2
  </p>
  <p>
    2*2 = 4
  </p>
</div>

The correct indentation of the <p> tags is achieved by an additional trick which I won't describe here.

Operators

Every nonempty combination of a certain set of characters can by used as an operator. Additionally, in and .. are operators. Some operators, for example in, = or .., are also used as keywords in some syntactical situations.

Binary infix operators are defined by specifying the operator name, the precedence, the associativity and a function.

For the associativity, besides left-, right- and non-associative operators, also comparison operators and multi-operators are available. Applications of comparison operators can be combined, so that, for example, the expression 1 < a + b <= c == 10 is equivalent (except for the fact that a + b is evaluated only once) to the expression 1 < a + b && a + b <=c && c == 10 . Multi-operators allow several applications of the same operator without parentheses to be evaluated by a single call to a variadic function. For example, the join of three values can be calculated using a | b | c.

Partial application of unparenthesized operators is also possible (and similarly partial function application, by the way). This works by replacing one or more operands by a . character. The result of such an expression is a function with as many parameters as operands have been replaced with periods. Thus, for example, the expression .+1 is equivalent to the successor function fn(x){return x+1} (except for parameter names) and a*.+1/. corresponds to the function fn(x, y){return a*x+1/y}.

Most unary operators can be applied both in prefix and in postfix form. The postfix form is more compatible with function calls and attribute accesses, whereas the prefix form is tradition especially with minus, but also with pre-increment, pre-decrement and address operators. In postfix form, all operators except post-increment (++) and post-decrement (--) are preceded by a period, which further intensifies the similarity to method calls. Except for <, every postfix operator written with a period is also available as a prefix operator (without the period). Prefix < does not exist because that would conflict with the notation for XML expressions. Also: Who needs a unary < operator.

Whenever a programmer-defined unary operator is to be applied, the runtime first checks if the operand has a method of that name, which in that case is called. Otherwise the current scope is searched for a correspondingly named function definition. This way, it is possible to define unary operators that work with all data types, but can be overridden with specialized implementations for some objects.

References

Once a local variable has been defined, its value cannot be changed anymore due to safety concerns. If you want to do that anyway, you must define the variable as a mutable memory cell:


def a = 0
def b = var 0

The name b is bound to a mutable memory cell that has been initialized to 0 but can be changed later, whereas the name a is directly bound to the value 0. If a read-access to b occurs anywhere in an expression, the memory cell associated with b is dereferenced. When accessing a it is not dereferenced, as its value is not a memory cell. So you can write def c = a + b without thinking about the fact that b is not actually a number, but only a pointer to a number. The difference becomes apparent when the variables are used on the left side of an assignment:


b <- 3 + b
a <- 3 + a

The first line works like this: The value of the name b is looked up, but not dereferenced even though it is a memory cell/reference. So the result of the left side is the memory cell itself. Then the right side is evaluated as usual. Then the method set of the memory cell is called in order to write the new value. The second line generates an error message because a does not have a set method.

The non-user-definable unary operator & evaluates its operand as if it was located on the left side of an assignment, that is it suppresses the automatic dereferencing. With this one can define aliases:


def d = &b

The variables b and d are now bound to the same reference.

If there is a tuple-, list-, or Dom-expression on the left side of an assignment, all elements (and attributes) are evaluated without automatic dereferencing. Also, tuples, lists, and Dom-objects have a set method, which takes apart its argument and forwards the parts to the set methods of the corresponding elements and attributes. With this, for example a function for swapping the values of two references may be defined and used as follows:


def swap(a, b){
  (a, b) <- (b, a)
}
def x = var 1
def y = var 2
swap(&x, &y)

The attributes of Dom-objects and the elements of lists and Dom-objects are also wrapped in mutable memory cells. Hence you can give alias names to elements or attributes, which you may use to change them:


def list=[0,1,2,3]
for n in varIterator(list):
  ++n

This results in the list [1,2,3,4]. The standard library function varIterator is necessary here because the normal iteration over a list would bind the variable n only to the values contained in the memory cells, not to the memory cells themselves. varIterator is defined by


def varIterator(l)[yield &l[i] : i in 0 .. l.size()]

and makes from a list an iterable object that, when iterated over, yields with constant memory overhead the references to the list elements.

Pattern matching

With pattern matching one can test whether an object conforms to a given structure, and as a side effect the constituents of the object are bound to names.

Usage

BHP employs pattern matching in the following situations:

Available types of patterns

Patterns can be composed from elementary patterns using various (possibly user defined) operators. In detail, the operators and elementary patterns are:
BindPattern
A BindPattern is simply a variable name. If the variable has not yet been defined in the current scope, the pattern matches everything and binds the matched object to the variable (Except if the variable name is _; then nothing happens). If the variable was already bound, then depending on the syntactic context of the pattern either this is a syntax error or the matched object is compared to the bound value; the pattern matches only if the values are equal.
ConstPattern
A ConstPattern is a constant like null, true, false, a number or a string constant. It matches only those objects that are equal to the constant.
EqualsPattern
An EqualsPattern consists of a $ followed by a variable name or an parenthesized expression. The expression or variable name is evaluated and, like with the ConstPattern, compared to the matched object.
AndPattern
An AndPattern consists of several patterns that are connected using the operators && or @. It matches only if all operands match. All variable bindings made by the operands are kept.
OrPattern
An OrPattern consists of several patterns that are connected using the operator ||. It matches only if one of the operand patterns matches. Only those variable bindings made by the first matching operand are kept which would have been bound by all other operands, too.
NotPattern
A NotPattern consists of a ! character followed by a pattern. It matches only if the other pattern does not match. No variable bindings are kept.
InPattern
An InPattern consists of the keyword in followed by an expression which must be inside parentheses if it contains infix operators. It matches if the matched object is contained in the value of the expression (as determined by its contains method). In particular, this can be used to check whether the matched value belongs to a certain type.
VarPattern
A VarPattern consists of the keyword var followed by a pattern. It matches mutable memory cells whose current content matches the other pattern.
ConditionalPattern
A ConditionalPattern consists of the keyword if followed by an expression. It matches if the value of the expression, converted to a boolean, is true.
LetPattern
A LetPattern consists of the keyword let followed by a pattern, an = character and an expression. It matches if the value of the expression matches the pattern; the resulting variable bindings are kept. In the following example it is used to define a pattern abstraction that matches strings which represent integers and that allows to match the parsed number against a further pattern:

def case intString():(value) <- o @ let (value @ !null) = tryParseInt(o)
def tryParseInt(n){
  try:
    return parseInt(n)
  catch _: 
    return null
}
context << match(str,
  case intString(n@even()): "String representing an even number $n",
  case intString(n@odd()): "String representing an odd number $n",
  case _: "Not a string representation of an integer",
)
AbstractionApplicationPattern
An AbstractionApplicationPattern either is the name of a pattern abstraction followed by an argument list, as described above, or it is the application of a user defined pattern operator to one or two patterns. Instead of names of pattern abstractions, names of extractor objects may be used, but I don't describe those here.
TuplePattern
A TuplePattern essentially consists of 0 or more comma separated patterns enclosed in parentheses. If it is just a single pattern in parentheses, a comma should follow it so it is parsed as a 1-tuple and not as a simple parenthesized pattern. Optionally, a tuple pattern may contain a specification of the form %pattern. This extracts the length of the tuple and matches it against the given pattern. As its last element, a TuplePattern may contain the keyword .., optionally followed by a pattern. This indicates that the tuple may have any number of additional elements. If a pattern was specified after the .., the remaining elements must all individually match this pattern. The variables bound while matching the rest elements are combined into lists and bound to the corresponding names by the TuplePattern. Example: The pattern

((fst0, snd0), %odd(), .. (fsts, snds))
matches all tuples consisting of pairs and having odd length. The elements of the first pair are bound to fst0 and snd0. The first elements of the remaining pairs are combined into a list named fsts and the second elements into a list named snds.
ListPattern
A ListPattern works similar to a TuplePattern with brackets instead of parentheses. Additionally an element pattern may be preceded by a * character in order to match it against the underlying mutable memory cell of the list element and not against its value.
DomPattern
A DomPattern consists of several comma separated specifiers in brackets and matches only Dom-objects. Such a specifier may be:
  • A colon followed by a tag name. If one or more of these specifiers are present, the tag of the Dom-objects has to agree with one of them.
  • A colon followed by an = character and a pattern. The tag of the Dom-object is matched against the pattern.
  • A period, followed by an attribute name, an = or -> and a pattern. This matches the specified attribute of the Dom-object against the pattern. In front of the pattern there may again be an * in order to match against the mutable memory cell belonging to the attribute.
  • The sequence . * = or . * -> followed by a pattern. This extracts the list of elements and matches it against the specified pattern.
ParamPattern
A ParamPattern consists of the Keyword fn followed by a parameter list in parentheses. Thus it looks exactly like a lambda function head. It matches a Dom-object that, if treated as an argument list, can be bound to the parameters in the parameter list without throwing an ArgMismatchException from this. The child elements of the Dom-object provide the positional arguments and the attributes provide the named arguments. ParamPatterns are primarily meant to be matched with argument lists that arise from variadic functions heads, so as to implement polymorphic functions:

def f(args*.){ 
  # Positional and named arguments are
  # bundled in the Dom-object 'args'
  match(args,
    case fn(c, d)&&let []=d: printOut("c=$c, d=$d (special case)"),
    case fn(h, i): printOut("h=$h, i=$i"),
    case fn(c, d): printOut("c=$c, d=$d"),
    case fn(a =100, b=200): printOut("a=$a, b=$b"),
    case fn(e, f, g): printOut("e=$e, f=$f, g=$g"),
    case _: printOut("Nothing matches $args")
  )
}
f()           # Writes "a=100, b=200"
f(1, 2)       # Writes "h=1, i=2"
f(1, [])      # Writes "c=1, d=[] (special case)"
f(1, .d=2)    # Writes "c=1, d=2" 
f(1,2,3)      # Writes "e=1, f=2, g=3"
f(1,2,3,.x=4) # Writes "Nothing matches [:(null), .x = 4, 1, 2, 3]"
QuantifiedPattern

In its basic form, a QuantifiedPattern is a Pattern followed by a quantifier, which is a specification in curly braces. The matched object should represent a finite sequence and for this purpose provide methods size, prefix and suffix that can be used to take it apart. Thus it may for example be a String, a List or a Tuple. The QuantifiedPattern matches if the matched Object is the concatenation of several subsequences, each of which matches the pattern in front of the quantifier. The variables that a bound during matching of the subsequence pattern are combined into lists. The quantifier specifies how many subsequences it may be. It can take carious forms:

  • {expression}: The expression must evaluate to a nonnegative integer. For the QuantifiedPattern to match, there must be exactly as many subsequences as this number.
  • {minExpression,}: The expression must evaluate to a nonnegative integer. For the QuantifiedPattern to match, there must be at least as many subsequences as this number.
  • {minExpression,maxExpression}: The expressions must evaluate to nonnegative integers (if maxExpression yields null, it is as if there was nothing after the comma). For the QuantifiedPattern to match, there must be at least as many subsequences as minExpression says and at most as many as maxExpression says.
  • Instead of the first expression, there can also be the keyword case followed by a pattern that is matched against the number of subsequences in order to decide if the QuantifiedPattern matches. Variable bindings from this matching are not kept.
  • If there is a colon after the opening brace, the matching algorithm tries to split off as long as possible suffixes first instead of as short as possible suffixes, which is its standard behavior.

A QuantifiedPattern with the quantifier {0,}, {1,} or {0,1} can also be written using the unary pattern operator *, + or ?, respectively.

The matching algorithm for QuantifiedPattern works by splitting off prefixes of various sizes from the sequence in order. If one of them matches the subsequence pattern, it continues recursively on the corresponding suffix. The algorithm terminates successfully if the suffix is empty and the number of found subsequences matches the quantifier. Lest the algorithm run into an endless loop by repeatedly splitting off empty prefixes, under appropriate circumstances two empty prefixes are never split off in direct succession. But except for this restriction, the algorithm tries all decompositions of the sequence until a matching one is found. Together with the OrPattern and the pattern operators + and <++ for splitting of sequences (from the standard library), BHP patterns thus reach the expressiveness of regular expressions. This implementation, however, is not very efficient. If one also uses AbstractionApplicationPatterns, it should be possible to recognize context free languages. But without caching of matching results (which I did not implement), this gets really inefficient, and it also is not useful for serious parsing because in case of syntax errors in the input the pattern simply does not match instead of producing an informative error message.

Here is an example for the usage of QuantifiedPatterns:


match("----|Hello|world|", 
  case "-"{case odd(),4} <+ (b + "|").* :   
    printOut("b=$b"),   # Writes "b=[-, Hello, world]"
)

The pattern operator <+ matches a sequence if its operands match the sequence in order, while trying to match as long as possible a part of the sequence against the first operand.

Usage example

Here is an extensive example from the constructor of my TreeMap implementation:


  def dispatch = fn(args*)match(args, 
    
    # Make empty TreeMap
    case []: ctor(var null),
    
    # Make empty TreeMap with custom Comparator
    case [comp@in Comparator]: ctor(var null, comp),
   
    # Copy a TreeMap
    case [map@in TreeMap]: ctor(var map.getTree(), map.comparator()),

    # Initialize TreeMap with key-value-pairs from an iterable
    case [it@iterable()]: dispatch(it, stdComparator),
    
    # Make TreeMap with custom Comparator, initialized from an iterable
    case [it@iterable(), comp@in Comparator]: {
      def r = ctor(var null, comp);
      r.putAll(it);
      r;
    },
    case [comp@in Comparator, it@iterable()]: dispatch(it, comp),
    
    # Make TreeMap with custom Comparator, initialized with 
    #  key-value-pairs from the remaining arguments
    case [comp@in Comparator, .. kv@(_,_)]: dispatch(kv, comp),
    
    # Initialize TreeMap with key-value-pairs from the argument list
    case [.. kv@(_,_)]: dispatch(kv, stdComparator),
    
    # Make TreeMap with custom Comparator, initialized with 
    #  keys and values from the remaining arguments, which are not
    #  yet combined into pairs
    case [comp@in Comparator,%odd() , .. kv]: 
      dispatch([yield (key, value): key, value in kv], comp),

    # Initialize TreeMap with
    #  keys and values from the argument list, which are not
    #  yet combined into pairs
    case [%even(), .. kv]: 
      dispatch([yield (key, value): key, value in kv], stdComparator),    
      
    #Invalid argument list
    case _: error(
      "TreeMap() does not understand these argument types", 
      ArgMismatchException()
    )
  )

The variadic function dispatch understands diverse argument list formats and is called by the outwardly visible constructor. I wrote it before ParamPatterns existed, so it uses positional arguments only. The function head fn(args*) binds the entire argument list to the name args, which is then matched against various ListPatterns. Depending on which pattern matches, the arguments are transformed and used for building a TreeMap by means of the actual (non-variadic) constructor function ctor and the method putAll.

The penultimate case expression for example allows us to write:


TreeMap(key1, value1, key2, value2, key3, value3)

In order for every key to have a value, the argument count must be even. The pattern of the penultimate case expression matches (because of %even()) only if there actually is an even number of arguments. These are then combined into pairs using the iterable comprehension [yield (key, value): key, value in kv].

Share:  E-Mail Twitter Facebook LinkedIn Reddit
Contact and legal info | © 2016-08-07 siquod.org

Discussion

No entries yet

Add your comment


:
-- Enter the preceding text: