Chapter 4
The Standard Library
4.1 General Purpose Exceptions and Functions
The following exceptions are defined:
- exception Bind is raised when a value binding (val or val rec) fails to match. The Definition of
standard ML stipulates that Bind may be raised only in situations where a match is not exhaustive, and that the
compiler must warn the user of this. Map pattern matching complicates a lot static analysis of pattern matching,
so the HimML compiler does not warn the user for now. HimML does not give any warning about redundancy
of patterns either.
A typical example of a non-exhaustive pattern is:
let val a::l = r in a end
which raises Bind if r is the empty list.
- exception Match is raised when a function cannot be applied to its argument, though they agree on types. A typical
example is:
(fn (a::l) => a) r
or the equivalent:
case r of a::l => a
which raises Match if r is the empty list.
- exception ParSweep is raised when a comprehension (or quantification, or iteration) using the || separator is
meant to sweep through domains of different cardinalities (the ni in the explanation of comprehensions in section
2.1.1).
However, the moment where this exception is raised is voluntarily left unspecified. For example, the current interpreter
checks the cardinalities before sweeping, but it would be easier to check them after sweeping in compiled code, for
example.
- exception NoMem is raised when the garbage collector fails to get enough space to complete a computation. This
exception cannot be handled in production code, because trying to correct the problem usually implies that we allocate
some memory, which only makes the situation worse.
- exception Stack was intended to be raised whenever the stack overflowed. However, in HimML, it never overflows,
because when the stack becomes full, the system reifies the current continuation, empties the stack and proceeds with the
computation.
- exception NonSense is raised when evaluating a non-sensical expression. In theory, whether it is in Standard ML
or in HimML, this can never happen, because all expressions that type-check are well-formed. However, when some
expression fails to type-check, it is usually bad practice to stop compiling immediately. The usual solution is to allow for
several errors to occur before stopping compilation. In HimML, we have chosen not to stop compilation at all,
and instead to compile a special expression raise NonSense in place of all non-sensical (ill-typed)
expressions.
This exception can also be raised when reloading a module from a file, and then executing some code that contains
continuations coming from that file. Continuations cannot be saved to disk, so NonSense is then raised. If this happens,
the system should have given a warning when loading the module, saying that it was attempted to load a continuation
object. If this happens, it is probably a bug in HimML.
- exception Catch is raised when a continuation captured by catch (not by callcc) is thrown outside of its
defining dynamic scope (i.e, thrown twice, including the implicit throw that happens at the end of the evaluation of the
body of the function given to catch or callcc). A continuation reified by callcc will never give rise to such a
problem, however they are slower to handle. The exception Catch is not systematically raised each time a continuation
captured by catch is thrown twice. Sometimes, such continuations may assume the behavior of callcc
continuations.
- exception ReturnToTheFuture is an exception that should very rarely be raised. Its purpose is to correct a
subtle problem with continuations. At each instant, there is a notion of age in HimML, which counts the number of
toplevel declarations that have been evaluated. Continuations record the age where they were created. Then, throw is
illegal on future continuations (with an age higher than the current one). If throw is applied on such continuations, the
exception ReturnToTheFuture is raised. This can happen because we can create a toplevel reference r pointing
to a continuation, capture the current continuation (at age n), then evaluate some declarations, capture
the current continuation (at age n′ > n), store it in r, then relaunch the old age n continuation. If now,
we evaluate some other declarations and then throw the age n′ continuation stored in r, we would get
into an inconsistent state, because different time lines cannot cohabit in the current implementation (the
only way we know to do it would be to slow down the whole implementation drastically). So we forbid
this.
General functions are:
4.2 Lists
The exception:
exception Nth
is defined. It is raised whenever it is attempted to access an element outside of a list. Recall that lists are defined by the
datatype declaration:
datatype ’a list = nil — :: of ’a * ’a list
:: : 'a ⋆ 'a list -> 'a list is the list constructor, and is declared infix, right associative of precedence
5.
- null : 'a list -> bool returns true if the list in argument is empty. This is equivalent to
fn nil => true | _ => false
- hd : 'a list -> 'a returns the first element of the list in argument, or raises Match if the list is empty. This is
equivalent to:
fun hd (x :: _) = x
- tl : 'a list -> 'a list returns the list composed of all but the first element of the list in argument, or raises
Match if the list is empty. This is equivalent to:
fun tl (_ :: x) = x
- len : 'a list -> int computes the length of a list.
- nth : 'a list ⋆ int -> 'a gets the n + 1th element of the list l when applied to the arguments l and n. If n is
out of range (n < 0 or n >=len l), Nth is raised. Note that elements are indexed starting at 0. nth is declared infix,
left associative of precedence 9.
- nthtail : 'a list ⋆ int -> 'a list gets the nth tail of the list l when applied to the arguments l and n.
If n out of range (n < 0 or n >len l), Nth is raised. Note that tails are indexed starting at 0. When
n <len l, op nth (l,n) is the first element of nthtail l,n); if n =len l, nthtail l,n) is
nil.
- @ : 'a list ⋆ 'a list -> 'a list concatenates two lists. It could have been defined as:
fun op @ (nil,l) = l | op @ (a::l1,l2) = a:: op @(l1,l2)
@ is declared infix, right associative (as in Standard ML of New Jersey; the Definition of Standard ML defines it to be left
associative) of precedence 5.
- append : 'a list list -> 'a list is the distributed concatenation of lists. It could be defined
as:
fun append nil = nil | append (l::r) = l @ append r
The application of append to a list comprehension is compiled in an optimized form, where the concatenations are done
on the fly, without building the list comprehension first.
- revappend : 'a list ⋆ 'a list -> 'a list appends the reversion of the first list to the second list. We
could define:
fun revappend (nil,l) = l | revappend (a::l1,l2) = revappend (l1,a::l2)
- rev : 'a list -> 'a list reverses a list. It could be defined as:
fun rev l = revappend (l,nil)
- map : ('a -> 'b) -> 'a list -> 'b list applies its first argument to each element in turn of the list in
second argument, and return the list of all results. This is equivalent to:
fun map f l = [f x | x in list l]
- app : ('a -> 'b) -> 'a list -> unit applies its first argument to each element in turn of the list in second
argument. This is used purely for side-effects.
fun app f l = iterate f x | x in list l end
- fold : ('a ⋆ 'b -> 'b) -> 'b -> 'a list -> 'b combines the elements of the list in third argument
by applying the binary operation in first argument on all elements of the list. The second argument is assumed to be the
neutral element of the binary operation. For example, fold (op +) 0 l computes the sum of the
elements of the list l. fold computes from the right end of the list towards the left end. It can be defined
by:
fun fold f b =
let fun f2 nil = b
| f2 (e :: r) = f(e,f2 r)
in
f2
end
- revfold : ('a ⋆ 'b -> 'b) -> 'b -> 'a list -> 'b combines the elements of the list in third
argument by applying the binary operation in first argument on all elements of the list. The second argument is assumed
to be the neutral element of the binary operation. For example, revfold (op +) 0 l computes the sum of the
elements of the list l. revfold computes from the left end of the list towards the right end. It can be defined
by:
fun revfold f b =
let fun f2 b nil = b
| f2 b (e::r) = f2 (f (e,b)) r
in
f2
end
4.3 Sets and Maps
Exceptions related to sets and maps are:
- exception MapGet, raised when a map is applied to an element outside of its domain (with the ? function).
- exception Empty, raised when taking the distributed intersection of the empty set, or choosing an element
in the empty map.
- exception Range, raised when trying to build a range {m,…,n} with non-integer bounds (with the to
operator)
Functions on sets and maps are:
- empty : (''a -m> 'b) -> bool tests whether a map is empty. It is equivalent to fn m => m={}.
- ? : (''a -m> 'b) -> ''a -> 'b applies a map to an element. It is currified, so that the expression
?m(a) retrieves the element associated with a in m. If there is no element associated with a in m, the exception
MapGet is raised.
- inset : ''a ⋆ (''a -m> 'b) -> bool returns true if its first argument belongs to the domain of
the map in second argument. It can be defined as:
fun op inset (a,m) = (?m(a); true) handle MapGet => false
It is declared infix, of precedence 4.
? and inset share a one-entry cache, where the last maplet is stored, so that testing inset then using ? incurs almost
no speed penalty.
- inmap : (''a ⋆ ''b) ⋆ (''a -m> 'b) -> bool returns true if its first argument is a pair (x,y) such that
the maplet x => y is in the map in second argument. It can be defined as:
fun op inset (a,m) = (?m(a)=b) handle MapGet => false
It is declared infix, of precedence 4.
? and inset share a one-entry cache, where the last maplet is stored, so that testing inset then using ? incurs almost
no speed penalty.
- subset : (''a -m> 'b) ⋆ (''a -m> 'b) -> bool tests the inclusion of the domains of maps in
argument. It may be defined as:
fun op subset (m1,m2) = all x inset m2 | x in set m1 end
It is declared infix, of precedence 4.
- submap : (''a -m> ''b) ⋆ (''a -m> ''b) -> bool tests the inclusion of maps: it returns true if the first
map is a submap of the second. It may be defined as:
fun op submap (m1,m2) =
all x inset m2 andalso ?m1(x) = ?m2(x) | x in set m1 end
It is declared infix, of precedence 4.
- dom : (''a -m> 'b) -> ''a set computes the domain of a map; it is the identity on sets. It is syntactic sugar
for:
fun dom m = {x | x in set m}
- rng : (''a -m> ''b) -> ''b set computes the range of a map. The range of a non-empty set is always
{()}. rng is syntactic sugar for:
fun rng m = {y | _ => y in map m}
- card : (''a -m> 'b) -> int computes the cardinal of a map. It may be defined as:
fun card {} = 0 | card {_ => _} U m' = 1+card m'
- <| : (''a -m> 'c) ⋆ (''a -m> 'b) -> (''a -m> 'b) is the “domain restrict to” function. If s is a map
and m is another map, then s <| m is the map m, whose domain is restricted to the elements in the domain of s. This
may be defined by:
fun op <| (s,m) = {x => y | x => y in map m such that x inset s}
It is declared infix, right associative of precedence 7.
- <-| : (''a -m> 'c) ⋆ (''a -m> 'b) -> (''a -m> 'b) is the “domain restrict by” function. If s is a
map and m is another map, then s <-| m is the map m, whose domain is restricted to the elements outside the domain
of s. This may be defined by:
fun op <-| (s,m) = {x => y | x => y in map m such that not (x inset s)}
It is declared infix, right associative of precedence 7.
- |> : (''a -m> ''b) ⋆ (''b -m> 'c) -> (''a -m> ''b) if the “range restrict to” function. If s is a
map and m is another map, the m |> s is the map m, where only the maplets x => y such that y is in the domain of s
are considered. This may be defined by:
fun op |> (m,s) = {x => y | x => y in map m such that y inset s}
It is declared infix, left associative of precedence 7.
- |-> : (''a -m> ''b) ⋆ (''b -m> 'c) -> (''a -m> ''b) if the “range restrict by” function. If s is a
map and m is another map, the m |-> s is the map m, where only the maplets x => y such that y is outside the
domain of s are considered. This may be defined by:
fun op |-> (m,s) = {x => y | x => y in map m such that not (y inset s)}
It is declared infix, left associative of precedence 7.
- ++ : (''a -m> 'b) ⋆ (''a -m> 'b) -> (''a -m> b) is the map overwriting operator. It takes two maps
m and m′, and returns a map m′′, whose domain is the union of the domains of m and m′, and which maps every
element x of the domain of m′ to ?m′(x), and every other element x to ?m(x). Thus m′′ is m, over which the
associations of m′ have been written.
To underwrite instead of overwriting, write m′ ++ m instead of m ++ m′. The only difference is that m′ will be
evaluated before m.
++ is declared infix, left associative of precedence 6.
- overwrite : (''a -m> 'b) list -> ''a -m> 'b is the distributed overwriting function. It returns the first
map in the list, overwritten by the second, the third, etc. It is equivalent to:
fun overwrite nil = {}
| overwrite (m1::rest) = m1 ++ overwrite rest
The application of overwrite to a list comprehension is compiled in an optimized form, where the overwritings are
done on the fly, without building the list comprehension first.
- underwrite : (''a -m> 'b) list -> ''a -m> 'b is the distributed underwriting function. It returns the
last map in the list, overwritten by the next to last, the penultimate, etc. It is equivalent to:
fun underwrite nil = {}
| underwrite (m1::rest) = underwrite rest ++ m1
The application of underwrite to a list comprehension is compiled in an optimized form, where the underwritings are
done on the fly, without building the list comprehension first.
Note that overwrite o rev=underwrite, and underwrite o rev=overwrite.
- delta : (''a -m> 'b) ⋆ (''a -m> 'b) -> (''a -m> 'b) computes the symmetric difference of two
maps. This is the overwriting of one map by another, restricted by the intersection of the domains. It may be defined
by:
fun op delta (m,m') = (m' <-| m) ++ (m <-| m')
(or equivalently, (m <-| m') ++ (m' <-| m)). It generalizes the classical notion of symmetric difference of sets.
It is declared infix, left associative of precedence 7.
- choose : (''a -m> 'b) -> ''a chooses an element in the domain of the map in argument. It raises Empty if
the map is empty. It is syntactic sugar for:
fun choose {} => raise Empty
| choose {x => _,...} = x
or for:
fun choose m = case (some x | x in set m end) of
SOME x => x
| NONE => raise Empty
- choose_rng : (''a -m> 'b) -> 'b chooses an element in the range, and raises Empty if the map is empty.
The element chosen in the range is precisely the image by the map of the one chosen by choose, as shows the
equivalent form:
fun choose_rng m = ?m(choose m)
- split : (''a -m> 'b) -> (''a -m> 'b) ⋆ (''a -m> 'b) splits a map in two disjoint maps, whose
union is the original one. In general, the splitting won’t yield maps of equal (or nearly equal) cardinalities. However, the
splitting has the following properties:
- Splitting a map of cardinality greater than or equal to 2 yields two non empty maps. Hence, recursive
procedures that decompose their map arguments with split until its cardinality goes down to 1 or 0 will
always terminate.
- If (m1,m2)=split m, then all elements in the domain of m1 are less than elements in the domain of
m2 in the system order.
- Splitting depends only on the domain, not on the range. That is, assume if dom m=dom m′,
and both (m1,m2)=split m and (m′1,m′2)=split m′, then dom m1=dom m′1 and
dom m2=dom m′2.
- Splitting has the statistical property that if two maps have similar domains (that is, their domains differ
for a small number of elements), recursively splitting both domains will on average unearth an optimum
number of common subdomains. For a more detailed description of this property, refer to [11].
As a consequence, recursive memo functions with map arguments that recurse through splittings of their
arguments should be good incremental functions, recomputing quickly their results on data similar to
previous similar data.
- U : ''a set ⋆ ''a set -> ''a set is the union of two sets. It may be defined as a special case of map
overwriting, because sets are special cases of maps:
fun op U (s : ''a set,s' : ''a set) = s ++ s'
(or equivalently, s' ++ s). It is declared infix, left associative, of precedence 6.
- & : ''a set ⋆ ''a set -> ''a set is the intersection of two sets. It may be defined as a special case of map
domain restriction, because sets are special cases of maps:
fun op & (s : ''a set,s' : ''a set) = s <| s'
It is declared infix, left associative, of precedence 7.
- intersects : (''a -m> 'b) ⋆ (''a -m> 'c) -> bool returns true if its two arguments have domains
with a non-empty intersection. It can be defined as:
fun op intersects (m,m') = not (empty (m <| m'))
It is declared infix, of precedence 4.
- \ : ''a set ⋆ ''a set -> ''a set is the difference of two sets. It may be defined as a special case of the
domain restrict by operator, because sets are special cases of maps:
fun op & (s : ''a set,s' : ''a set) = s' <-| s
It is declared infix, left associative, of precedence 7.
- union : (''a set -m> 'b) -> ''a set is the distributed union of the sets in the domain
of the argument, it is quite similar to the overwrite and underwrite functions. It can be defined
as:
fun union {} = {}
| union {s => _} = s
| union ss = let val (s1,s2) = split ss
in
union s1 U union s2
end
The application of union to a set or map comprehension is compiled in an optimized form, where the unions are done
on the fly, without building the set comprehension first.
- inter : ((''a -m> ''b) -m> 'c) -> (''a -m> ''b) is the distributed intersection of maps (and hence
of sets, too) in the domain of its arguments. It is mostly used when ''b and 'c are both unit, in which case it
computes the distributed intersection of a set of sets. It raises Empty on the empty set. It can be defined
as:
fun inter {} = raise Empty
| inter {s => _} = s
| inter ss = let val (s1,s2) = split ss
in
inter s1 & inter s2
end
The application of inter to a set or map comprehension is compiled in an optimized form, where the intersections are
done on the fly, without building the set comprehension first.
- mapadd : (''a ⋆ 'b) ⋆ (''a -m> 'b) -> (''a -m> 'b) adds one maplet to a map, overwriting the
map. It may be defined as:
fun mapadd ((x,y),m) = m ++ {x => y}
It is only a bit faster than writing m ++ {x => y}.
- mapaddunder : (''a ⋆ 'b) ⋆ (''a -m> 'b) -> (''a -m> 'b) adds one maplet to a map,
underwriting the map. It may be defined as:
fun mapaddunder ((x,y),m) = {x => y} ++ m
It is only a bit faster than writing {x => y} ++ m (and the order of evaluation is different).
- mapremove : ''a ⋆ (''a -m> 'b) -> (''a -m> 'b) removes a maplet from a map. It may be defined
as:
fun mapremove (x,m) = {x} <-| m
It is only a bit faster than writing {x} <-| m.
- inv : (''a -m> ''b) -> (''b -m> ''a) inverses a map. Its definition is the same as:
fun inv m = {y => x | x => y in map m}
so in case m is not invertible, inv returns a map that maps y to the largest x in the system order such that m maps x to
y.
- O : (''b -m> 'c) ⋆ (''a -m> ''b) -> (''a -m> 'c) is the composition of maps. It is precisely
defined by:
fun op O (m,m') = {x => ?m(y) | x => y in map m' such that y inset m}
It is declared infix, left associative of precedence 3. It is the map version of the o function composition
operator.
- to : int ⋆ int -> int set is the range function. If a and b are the first and second argument respectively, to
returns the set of all integers x such that a ≤ x ≤ b. We could have defined to by:
fun op to (a,b) =
let fun f x = if x>b then {} else {x} U f(x+1)
in
f a
end
to is declared infix, left associative of precedence 9, so that we may write a to b.
- mapoflist : 'a list -> (int -m> 'a) converts a list to a map from its indices to its elements. It may be
defined by:
fun mapoflist l =
let fun f(_,nil) = {}
| f(n,a::l) = {n => a} ++ f(n+1,l)
in
f(0,l)
end
- inds : 'a list -> int set computes the set of indices of a list l, i.e, {0,…,len l-1}. It may be defined
by:
fun inds l = 0 to (len l-1)
or by val inds = dom o mapoflist.
- elems : ''a list -> ''a set computes the set of elements of the list in argument. It may be defined
by:
fun elems l = {x | x in list l}
or by val elems = rng o mapoflist.
There is now an alternative data type for maps in HimML, the table type. This implements imperative maps: unlike
the map type, objects of type table are updated in a destructive way, just like hash-tables. Objects of type
table, which we shall just call tables, are not really faster than using applicative references, but they put less
stress on the sharing mechanism and the garbage collector of HimML, which may result in space and even time
savings.
The type of tables mapping objects of type ''a to 'b is
type (''a, 'b) table
You can think as a kind of equivalent of the type (''a -m> 'b) ref. Associated functions are:
- table : unit -> (''_a, '_b) table creates a fresh, empty table. If tables were implemented as objects of
type (''a -m> 'b) ref, this would be equivalent to:
fun table () = ref {};
- t_get : (''a, 'b) table -> ''b -> 'a option reads an element off a table. Precisely, t_get t x
returns SOME y if x is mapped to y by the table t, and NONE if x has no entry in t. If tables were implemented as
objects of type (''a -m> 'b) ref, this would be equivalent to:
fun t_get t x =
SOME (?(!t) x) handle MapGet => NONE;
- t_put : (''a, 'b) table -> ''a ⋆ 'b -> unit, called as t_put t (x, y) adds a new binding from
x to y to the table t, erasing any previously existing binding for x. If tables were implemented as objects of type
(''a -m> 'b) ref, this would be equivalent to:
fun t_put t (x, y) =
t := !t ++ {x => y};
- t_put_behind : (''a, 'b) table -> ''a ⋆ 'b -> unit, called as t_put_behind t (x, y)
adds a new binding from x to y to the table t, except if any binding for x already existed. If tables were implemented as
objects of type (''a -m> 'b) ref, this would be equivalent to:
fun t_put_behind t (x, y) =
t := {x => y} ++ !t;
- t_remove : (''a, 'b) table -> ''a -> unit, removes any entry associated with its second argument
from the table given in first argument. If tables were implemented as objects of type (''a -m> 'b) ref, this would
be equivalent to:
fun t_remove t x =
t := {x} <-| !t;
- t_iter : (''a, 'b) table -> (''a ⋆ 'b -> bool) -> boolis the standard iterator over tables. This
is meant to implement a form of existential quantification, but can be used to loop over the table, and doing some
computation on each entry, just like the iterate quantifier. Calling t_iter t f iterates over all elements of the
table t, in some indefinite order, calls f (x,y) for each entry x => y in t. This stops when the table has been fully
traversed, and then returns false, or after the first call to f returns true, in which case iter t f returns
true.
If tables were implemented as objects of type (''a -m> 'b) ref, this would be equivalent to:
fun t_iter t f =
exists
f (x, y)
| x => y in map !t
end;
This can be used to implement iterate-style loops, by writing
t_iter t (fn (x, y) => (f(x,y); false)); ()
Warning: it is definitely a bad idea to modify the table t while you iterate on it.
- t_collect : (''a, 'b) table -> (''a ⋆ 'b -> (''c -m> 'd)) -> (''c -m> 'd) is an
iterator over tables, just like t_iter. This one is rather meant to implement set and map comprehensions:
t_collect t f iterates over all elements of the table t, in some indefinite order (although it is guaranteed to be the
same as the one used by t_iter), calls f (x,y) for each entry x => y in t, and computes the overwrite of all
maps returned by each call to f.
If tables were implemented as objects of type (''a -m> 'b) ref, this would be equivalent to:
fun t_collect t f =
overwrite [f (x, y)
| x => y in map !t];
For example, getting the contents of the table t as a map can be effected as:
t_collect t (fn (x, y) => {x => y})
Building the map of all x => y in t such that the predicate P (x,y) holds can be done by:
t_collect t (fn (x, y) => if P (x,y) then {x => y} else {})
Warning: it is definitely a bad idea to modify the table t while you iterate on it.
- t_reset : (''a, 'b) table -> unit resets the table in argument. If tables were implemented as objects of
type (''a -m> 'b) ref, this would be equivalent to:
fun t_reset t = (t := {});
4.4 Refs and Arrays
Refs and arrays are the fundamental mutable data structures of HimML, as in Standard ML of New Jersey (only refs exist in the
definition of Standard ML). Ref and array types always admit equality, even if their argument types do not. Equality is decided
according to the following rules:
- Every ref or array is equal to itself.
- Any newly created ref (by ref), or newly created array (by array or arrayoflist or iarray or
iarrayoflist), is different from any other ref or array.
In short, refs and arrays are not shared, and are compared by their
addresses.
Some syntax extensions are defined to allow you to write a more readable code to handle arrays. In particular:
There is however no similar syntactic sugar for non-polymorphic integer arrays (of type intarray), only for polymorphic
arrays (of type 'a array, for any 'a).
The following exception is defined:
exception Subscript
It is raised when accessing an element outside of an array, by sub or update, or by isub or iupdate.
The following functions are provided:
- ref : '1a -> '1a ref creates a new mutable reference.
- := : 'a ref ⋆ 'a -> unit modifies the value of a reference, that is, assigns the value of the second
argument to the reference in first argument. It is declared infix, of precedence 3.
- ! : 'a ref -> 'a is the dereferencing function. It gets the value stored inside a reference. It could have
been defined as:
fun !(ref v) = v
because ref is understood in ML as a (fake) datatype constructor.
- array : int ⋆ '1a -> '1a array creates an array with n elements equal to e, where n and e
are the arguments. If n is negative, the exception Subscript is raised. n is called the length of the
array.
- sub : 'a array ⋆ int -> 'a dereferences the nth element of the array a (indices start at 0), where a and n are
the arguments. If n is negative or greater than or equal to the length of the array, the exception Subscript is
raised.
A more readable notation for sub (a,n) is a.(n).
- update : 'a array ⋆ int ⋆ 'a -> unit modifies the nth element of the array a, replacing it by the value e,
where a, n and e are the arguments. If n is negative or greater than or equal to the length of the array, the exception
Subscript is raised.
A more readable notation for update (a, n, e) is a.(b) .:= e.
- length : 'a array -> int gets the length of the array in argument.
- arrayoflist : '1a list -> '1a array converts a list into an array with the same elements in the same
order. The length of the resulting array is exactly the length of the argument list.
- iarray : int ⋆ int -> intarray creates an array with n integer elements equal to e, where n and e
are the arguments. If n is negative, the exception Subscript is raised. n is called the length of the
array.
- isub : intarray ⋆ int -> int dereferences the nth element of the integer array a (indices start at 0), where a
and n are the arguments. If n is negative or greater than or equal to the length of the array, the exception Subscript is
raised.
- iupdate : intarray ⋆ int ⋆ int -> unit modifies the nth element of the array a, replacing it by the
integer value e, where a, n and e are the arguments. If n is negative or greater than or equal to the length of the array, the
exception Subscript is raised.
- ilength : intarray -> int gets the length of the array of integers in argument.
- iarrayoflist : int list -> intarray converts a list into an array with the same integer elements in the
same order. The length of the resulting array is exactly the length of the argument list.
4.5 Strings
Strings are ordered sequences of characters (we call size of the string the length of the sequence). ML does not provide a type
of characters, though one-character strings may serve this purpose. Strings are assumed coded according to the 7-bit ASCII
standard. However, characters are usually 8 bits wide: the characters of code greater than 127 are interpreted in a
system-dependent fashion. The string length is encoded separately from the sequence of characters that composes it: no special
character is reserved to represent the end of a string, in particular, any string may contain the NUL character
(\000).
The following exceptions are defined:
- exception Ascii, raised when using a number that is not the code of an ASCII code (with 8-bit characters,
a number that is not an integer between 0 and 255).
- exception StringNth, raised when accessing an element outside a string.
- exception RE of int, raised with an integer error code when the regexp function encounters an error.
The remsg function can be used to get a plain text description of the error.
The functions are:
- explode : string -> string list converts a string into the list of its characters, where characters
are represented as one element strings. For instance explode "abc" is ["a","b","c"].
- implode : string list -> string is the inverse of explode. If given a list of characters
(one-character strings), it produces the corresponding string. Actually, it accepts any list of strings, whatever
their size, and concatenates them, as a generalization of the intended semantics.
- ^ : string ⋆ string -> string concatenates two strings. It could have been defined as:
fun op ^ (s,s') = implode (explode s @ explode s')
(or implode [s,s'], noticing the generalization of implode though ^ is more efficient. ^ is declared infix, right
associative of precedence 6.
- concat : string list -> string is the distributed concatenation of strings. Because of the general character
of implode, implode and concat are synonymous.
- size : string -> int computes the size of a string. This may be defined by:
fun size s = len (explode s)
although it does not build the exploded list.
- chr : int -> string builds the string having as only character the character with code n given as argument. If n
is not the code of a character, the exception Ascii is raised.
- ord : string -> int gets the code of the first character of the string. The exception StringNth is raised if the
string is empty ("").
- ordof : string ⋆ int -> int gets the code of the n + 1th character in the string s, where s and n are the
arguments. The index n starts from 0. If n < 0 or n is greater than or equal to the size of s, the exception StringNth is
raised. This is equivalent to:
fun ordof (s,n) = ord (explode s nth n) handle Nth => raise StringNth
- substr : string ⋆ int ⋆ int -> string extracts a substring from the string s, given as first argument.
Call i the second argument, and j the third. substr(s,i,j) returns the string of all characters from indices i included
to j excluded. substr raises StringNth if i < 0 or j >len s, or i or j is not integer. It may be defined
as:
fun substr (s,i,j) =
let fun f k =
if k>=j
then ""
else chr (ordof (s,k)) ^ f(k+1)
in
f i
end
- strless : string ⋆ string -> bool compares strings in lexicographical order, the order on characters
being defined by the order on their codes. Equivalently:
fun op strless (s,s') =
let fun less (_,nil) = false
| less (nil,_) = true
| less (c::l,c'::l') =
ord c<ord c' orelse
ord c=ord c' andalso less(l,l')
in
less (explode s,explode s')
end
strless is declared infix, of precedence 4.
-
regexp : string ->
|[match : string -> (int ⋆ int) option,
matches : string -> bool,
matchsub : string ⋆ int ⋆ int -> (int ⋆ int) option,
subst : string -> string]|
builds a regular expression matcher and substitution engine, on the model of the Unix grep utility. This is based on Harry
Spencer’s regexp(3) package.
regexp re builds a non-deterministic finite automaton with some optimizations for recognizing the
language defined by the regular expression re. It returns a record with four methods, match, matches and
matchsub for matching, and subst for substituting substrings matched previously by match in a given
string.
The match function, applied on a string s to be searched, looks for the first (leftmost) occurrence of a substring of s that
is matched by the regular expression re. It returns SOME (i,j) if it has found one, and then substring (s,i,j) is the
matched substring. If it cannot find one, it returns NONE.
For example:
val |[match,...]| = regexp "yp";
builds a match function for finding the first occurrence of "yp" in a string. Then, to look for an occurrence of the latter
in various strings:
returns SOME (1,3).
The syntax of regular expressions is mostly as in any other regexp package. The special characters are:
- ^ matches the empty string, but only at the beginning of the string s to be matched.
- $, similarly, matches the empty string, but at the end of the string s to be matched. Therefore, to test whether a
string s is in the language defined by re, it is enough to test:
case #match (regexp "^re$") (s,0,size s) of SOME _ => true | _ => false
- \\< matches the empty string at the beginning of a word (any sequence of letters, digits and underscores).
- \\> matches the empty string at the end of a word.
- \\b matches the empty string at the beginning or end of a word.
- \\B matches the empty string everywhere except at the beginning or end of a word.
- \\w matches any word-constituent character (a letter, a digit, or an underscore).
- \\W matches any non word-constituent character (anything but a letter, a digit, or an underscore).
- \\ (one backslash character) can be used to quote the next character. This is useful when you wish to include a
special character, such as ., as an ordinary character.
- . matches any single character.
- […] defines a character group, so that [a-z] matches any letter in small caps, [^A-Z] matches any character that
is not a capital letter, and so on.
- (…) defines a group, on which operators like ⋆ or + for example can be applied as a whole. Groups can also be used
to mark matched substrings for future reference by the subst function. For example:
val |[match,subst,...]| = regexp "^(foo)⋆(a⋆)(bar)⋆$";
match "foofooaaaaaabar";
subst "\\0;\\1;\\2;\\3";
returns "foofooaaaaaabar;foo;aaaaaa;bar".
- \\1, …, \\9 can also be used to match whatever was last recognized by the corresponding parenthesized group. For
example:
val |[subst,match,...]| = regexp "^(a⋆)b\\1$";
match "aaabaaa";
returns SOME (0,7), and indeed "aaabaaa" is a group of n as (here, n = 3), followed by a b, and then the
same group of n as. This can be used to recognize sets of words that are non regular. (This feature does not work in
Harry Spencer’s version of the regexp package on which the HimML version is based, but works in
HimML.)
- ⋆ is a suffix. If g is a letter, a character group or a parenthesized group, g⋆ matches any sequence of zero or more
words, each one matched by g.
- + is as ⋆, except that it matches a sequence of at least one word.
- ? is as ⋆, except that it matches a sequence of at most one word.
The matches function is a trimmed down version of match, which only returns whether there is a match or none. So
#matches (regexp re) s is equivalent to:
(case #match (regexp re) s of SOME _ => true | NONE => false)
The matchsub function is, on the contrary, a puffed up version of match, which does not examine the whole string to
be matched, but only a substring: #matchsub (regexp re) (s,i,j) looks for a substring of s that would be
matched by re, but only between positions i and j. It then returns SOME (i′,j′) if it has found a match: i′ and j′ are the
bounds for the leftmost match between positions i and j. Otherwise, it returns NONE. Notice that i′ and j′ are positions in
s, not in the substring substr (s,i,j). In fact, #matchsub (regexp re) (s,i,j) is equivalent
to:
| (case | #match (regexp re) (substr (s,i,j)) of
|
| | SOME (i',j') => SOME (i+i',i+j') |
As substr, matchsub raise the exception StringNth if the positions i, j are out of bounds.
Any of these functions may raise the exception RE n, where n is an error code. regexp itself may raise it, when the
regular expression re contains a syntax error, or contains too deeply nested parentheses (the current limit is 10 levels),
etc. The remsg function provides a hopefully legible error message.
A possible bug is that NUL characters (of code 0) may be recognized erratically. It is better not to include any NUL
characters in either the regular expression or the string to be matched.
- remsg : int -> string translates the regular expression error code in argument (as returned as argument of a RE
exception) into a string in human-readable form.
- intofstring : string -> int converts the input string into an integer. This assumes the integer is written in
decimal; minus signs may be written in standard notation (-) or as in ML (~). No error is returned, even if the input
string is not the decimal representation of an integer. To check this, call regexp "[~-]?[0-9]+"
first.
Note: to convert back an integer i to a string, use
let val f as |[convert, ...]| = outstring ""
in
print f (pack (i:int));
convert ()
end
- numofstring : string -> num converts the input string into a real number. This assumes the real is written in
decimal; minus signs may be written in standard notation (-) or as in ML (~). No error is returned, even if the input
string is not the decimal representation of a real. To check this, call
regexp "[~-]?[0-9]+(\\.[0-9]⋆)?([eEfFgG][~-]?[0-9]+)?"
first.
Note: to convert back a number x to a string, use
let val f as |[convert, ...]| = outstring ""
in
print f (pack (x:num));
convert ()
end
4.6 Integer Arithmetic
HimML integers are machine integers, that is, integers in the range [min_int,max_int], and are represented with nbits
bits.
Exceptions on integer operations can be:
- exception Arith, which is raised whenever an arithmetic error occurs, for example division by 0.
(Overflow in addition or multiplication is normally not dealt with, and just produces wrong results—more
precisely, results modulo the word size—on most machines. This is implementation-dependent.)
Non-negative integers are 0, 1, 2, …, 42, …Negative integers are written with the ~ negation operator in
front.
The following operations are defined. Basic operations like +, -, etc. are not overloaded as in Standard ML. For example, +
is always an integer function. For analogous numerical (floating-point) functions, see Section 4.7. For example, floating-point
addition is #+, not +.
- + : int ⋆ int -> int is integer addition. It is declared infix, left associative of precedence 6.
- - : int ⋆ int -> int is integer subtraction. It is declared infix, left associative of precedence 6.
- ⋆ : int ⋆ int -> int is integer multiplication. It is declared infix, left associative of precedence 7.
- ~ : int -> int is integer negation.
- sqr : int -> int squares its argument.
- abs : int -> int computes the absolute value of its argument.
- < : int ⋆ int -> bool returns true if its first argument is less than its second argument.
- > : int ⋆ int -> bool returns true if its first argument is greater than its second argument.
- <= : int ⋆ int -> bool returns true if its first argument is less than or equal to its second argument.
- >= : int ⋆ int -> bool returns true if its first argument is greater than or equal to its second argument.
- min : '#a ⋆ '#a -> '#a returns the least of its two arguments.
- max : int ⋆ int -> int returns the greatest of its two arguments.
- div : int ⋆ int -> int is quotient extraction. It is declared infix, left associative of precedence 7. The
definition conforms to the definition of Standard ML, where the quotient has the smallest possible absolute value
(the remainder has the same sign as x).
- mod : int ⋆ int -> int is the remainder operation.
It is syntactic sugar for fn (x,y) => x - y⋆(x div y). It is declared infix, left associative of
precedence 7.
- divmod : int ⋆ int -> int ⋆ int returns both the quotient and remainder of the integers in
argument. This could be defined as fn (x,y) => (x div y,x mod y), but is normally faster.
- bor : int ⋆ int -> int computes the binary inclusive or (the disjunction) of the two integers in
argument. There is a one bit at some position in the result if there was a one at the same position in at least one
of the argument integers.
- bxor : int ⋆ int -> int computes the binary exclusive or of the two integers in argument. There is
a one bit at some position in the result if there was a one at the same position in exactly one of the argument
integers.
- band : int ⋆ int -> int computes the binary and (the conjunction) of the two integers in argument.
There is a one bit at some position in the result if there was a one at the same position in both of the argument
integers.
- bnot : int -> int computes the binary negation of the integer in argument. There is a one but at some
position in the result if there was a zero at the same position in the argument.
- lsl : int ⋆ int -> int computes the logical shift left of the first integer by the number of bits specified
in the second argument. Its behavior is unspecified if the number of bits is negative.
- lsr : int ⋆ int -> int computes the logical shift right of the first integer by the number of bits
specified in the second argument. Its behavior is unspecified if the number of bits is negative.
- asr : int ⋆ int -> int computes the arithmetic shift right of the first integer by the number of bits
specified in the second argument (arithmetic means that the sign bit is kept as is, and propagates through the
shifting). Its behavior is unspecified if the number of bits is negative.
- inc : int ref -> unit increments the integer pointed to by the reference in argument.
- dec : int ref -> unit decrements the integer pointed to by the reference in argument.
- rand : unit -> int draws a pseudo-random equidistributed number in the integer interval [min_int,
max_int]. The method used is based on a generalized feedback shift register [8], with generating polynomial
x55 + x24 + 1. The period is 255 - 1, which means that, practically, the generator never loops back to any
previous state.
- irand : int -> int draws a pseudo-random equidistributed number in the integer interval [0,x[,
where x is the given argument. If x ≤ 0, it instead draws an equidistributed pseudo-random integer in
[0,max_int] ∪ [min_int,x[ (think of integers as two’s complement integers). So if x = 0, this is equivalent
to rand. This calls zrand anyway.
This can be used to draw equidistributed integers in [a,b[ (with a < b) by calling a + irand(b - a).
4.7 Numerical Operations
The real and imaginary parts of numbers are floating-point numbers, which are classified according to the IEEE754 standard. A
floating-point number may be:
The fact that an infinity or a Nan is produced (in IEEE754 systems), or an Arith exception is raised (in non-IEEE754
systems) by a computation means usually that the computation is wrong. In this case, the exception system is
better suited to find numerical bugs, provided we have a means of locating the place where the exception is
raised.
However, in production code, these error conditions may still crop up for limit situations. It is then unacceptable for code to
stop functioning because of this: so infinities and NaNs should be used. In general, users had better use a system
obeying the IEEE754 standard (or its successors). The IEEE754 standard defines trapping mechanisms to be
used for debugging numerical codes; they are not used in HimML now, but may be in a future version of the
debugger.
Classification of numbers is accomplished with the auxiliary types:
datatype numClass = MNaN of int | MInf | MNormal | MDenormal
| Zero | PDenormal | PNormal | PInf | PNaN of int
numClass encodes both the class and the sign of a real number. In the case of a NaN, the NaN code is also provided as an
integer. NaN codes are system-dependent; only NaN codes from 1 to 7 are recognized in HimML; unrecognized codes are
represented by 0. NaN codes and the constructors MNaN, MInf, MDenormal, PDenormal, PInf, PNan are never used in a
non-IEE754 system. The exception:
exception Arith
is raised for all arithmetical errors in non-IEEE754 systems, and is never used in IEEE754 systems.
On a different subject, remember that testing complex numbers for equality is hazardous. Therefore, = is not to be used
lightly on numbers (for example, 0.2 ⋆ 5 may print as 1 but differ from 1 internally). Normally, though,
computations on sufficiently small integers should be exact (at least on real IEEE754 systems). This warning applies
not only to the explicit = equality function, but also to all operations that use internally the equality test, in
particular the test for being inside a set inset or the application of a map to an object ?, when this object contains
numbers.
Moreover, following the principle, valid in HimML, that an object is always equal to itself, +inf = +inf,
~inf = ~inf, and all NaNs are equal to themselves, though they are incomparable with any number (for example,
+NAN(0) <= +NAN(0) is false, but +NAN(0) = +NAN(0) is true).
Numerical constants are entered and printed in HimML as x or x:y, where x and y represent real numbers, possibly
followed by a scale. The notation x:y denotes the complex number x + i.y. There can be no confusion with the : character
used in type constraints, because no type may begin with a character lying at the start of a number. A real number is defined by
the regular expression (in lex syntax):
where DIG = [0-9] and INT = (\~?{DIG}+), with the obvious interpretations. Infinities and NaNs cannot be entered
in a non-IEEE754 system.
The exception:
is raised whenever an operation defined only on real values is applied on a complex with a non-zero imaginary part (even if
it is denormalized).
The numerical functions are:
- classify : '#a -> numClass ⋆ numClass classifies the real and imaginary parts of the numerical
value in argument.
- denormal : '#a -> bool returns true if and only if its argument is zero or a denormalized number. That
is, it returns true if and only if classify returns a pair of classifications, which are both MDenormal, Zero
or PDenormal. denormal is useful because dividing by a number is in fact legal only when this number
is not denormal, in the IEEE754 representation system. In short, being denormalized is the right criterion for
testing whether a number if invertible (and not comparison with 0).
- overflown : '#a -> bool returns true if and only if its argument is infinite or a NaN. That is, it returns
true if and only if its real part or its imaginary part is infinite or a NaN. This function is useful to detect when
a computation has produced an infinite number; usually, when doing some operations on infinite numbers, like
subtraction, the results are NaNs, hence the test for being either infinite or a NaN.
- #+ : '#a ⋆ '#a -> '#a is addition. It is declared infix, left associative of precedence 6.
- #- : '#a ⋆ '#a -> '#a is subtraction. It is declared infix, left associative of precedence 6.
- #⋆ : '#a ⋆ '#b -> '#a ‘ '#b is multiplication. It is declared infix, left associative of precedence 7.
- #/ : '#a ⋆ '#b -> '#a ‘ '#b^ ~1 is division; it does not raise an exception when dividing by 0 or
a denormalized, except on non-IEEE754 systems (the Arith exception). It is declared infix, left associative of
precedence 7.
- #^ : num ⋆ num -> num is exponentiation (as in FORTRAN); the typing engine knows a special case
about this one: when n is a constant number and e is an expression, then #^ in the expression e #^ n is given
type '#a ⋆ num -> '#a^n. It is declared infix, right associative of precedence 8.
We could almost define #^ as fn (a,b) => exp (b #⋆ log a), except for rounding mechanisms and
type inference.
- #~ : '#a -> '#a is negation.
- pi : num is the famous number π.
- fsqr : #a -> #a^2 squares its argument.
- fsqrt : '#a -> '#a^0.5 takes the square root of its argument. It is the principal determination of the
square root, defined by
=
.ei.θ∕2
for r ≥ 0, -π < θ ≤ π.
- fabs : '#a -> '#a computes the norm of its argument. The norm of z is |z| =
, where z is the
complex conjugate of z. When z is real, the norm of z is simply its absolute value.
- conj : '#a -> '#a computes the conjugate of a complex numerical value. The complex conjugate of
x + i.y is x + i.y = x - i.y. When z is real, the conjugate of z is z itself.
- re : '#a -> '#a computes the real part of its argument. It is always a real.
- im : '#a -> '#a computes the imaginary part of its argument. It is always a real.
- #< : '#a ⋆ '#a -> bool returns true if its first argument is less than its second argument, considered as
real quantities. The exception Complex is raised if one of the numbers was complex. Note that comparison of
NaNs may yield surprising results. For quantities with dimensions, the arguments are scaled to the default scale
of the dimension '#a, and then compared. Confusing results can ensue if you are using negative scaling factors.
- #> : '#a ⋆ '#a -> bool returns true if its first argument is greater than its second argument, considered
as real quantities. The exception Complex is raised if one of the numbers was complex. Note that comparison
of NaNs may yield surprising results. For quantities with dimensions, the arguments are scaled to the default
scale of the dimension '#a, and then compared. Confusing results can ensue if you are using negative scaling
factors.
- #<= : '#a ⋆ '#a -> bool returns true if its first argument is less than or equal to its second argument,
considered as real quantities. The exception Complex is raised if one of the numbers was complex. Note that
comparison of NaNs may yield surprising results. For quantities with dimensions, the arguments are scaled
to the default scale of the dimension '#a, and then compared. Confusing results can ensue if you are using
negative scaling factors.
- #>= : '#a ⋆ '#a -> bool returns true if its first argument is greater than or equal to its second
argument, considered as real quantities. The exception Complex is raised if one of the numbers was complex.
Note that comparison of NaNs may yield surprising results. For quantities with dimensions, the arguments are
scaled to the default scale of the dimension '#a, and then compared. Confusing results can ensue if you are
using negative scaling factors.
- fmin : '#a ⋆ '#a -> '#a returns the least of its two arguments, considered as real quantities. The
exception Complex is raised if one of the numbers was complex. Comparison of NaNs may give some surprises,
as well as the use of negative scaling factors. (See <.)
- fmax : '#a ⋆ '#a -> '#a returns the greatest of its two arguments, considered as real quantities. The
exception Complex is raised if one of the numbers was complex. Comparison of NaNs may give some surprises,
as well as the use of negative scaling factors. (See >.)
- exp : num -> num is the exponential, or natural anti-logarithm.
- log : num -> num is the principal determination of the natural logarithm (base e). In the complex plane,
log(r.ei.θ) = log r + i.θ for r ≥ 0 and -π < θ ≤ π.
- exp1 : num -> num is almost syntactic sugar for fn x => exp x #- 1.0, but gives precise results
even when the argument tends to zero.
- log1 : num -> num is almost syntactic sugar for fn x => log (1.0 #+ x), but gives precise
results even when the argument tends to zero.
- sin : num -> num is the (trigonometric) sine.
The default angle scale is the radian (defined as scale 1‘r = 1), but of course the scale system may be used
to input angles in any other unit.
- cos : num -> num is the (trigonometric) cosine. The default angle scale is the radian.
- tan : num -> num is the (trigonometric) tangent, quotient of sine by cosine.
- asin : num -> num is the principal determination of the arc sine. In the complex plane, asinx is defined
as -i.log(i.z +
).
- acos : num -> num is the principal determination of the arc cosine. In the complex plane, acosx is defined
as -i.log(z +
).
- atan : num -> num is the principal determination of the arc tangent. In the complex plane, atanx is
defined as i∕2.log i+z
i-z .
- sh : num -> num is the hyperbolic sine.
Equivalently, it might be defined as fn z => 0.5 #⋆ (exp z #- exp (#~ z)), except that this
would not be as precise near 0.
- ch : num -> num is the hyperbolic cosine function.
It could have been defined as fn z => 0.5 #⋆ (exp z #+ exp (#~ z)).
- th : num -> num is the hyperbolic tangent, quotient of the hyperbolic sine by the hyperbolic cosine.
- ash : num -> num is the principal determination of the argument hyperbolic sine. We have ashz =
log(z +
).
- ach : num -> num is the principal determination of the argument hyperbolic cosine. We have achz =
log(z +
).
- ath : num -> num is the principal determination of the argument hyperbolic tangent. We have athz =
1∕2.log 1+z
1-z .
- arg : num -> num is the principal determination of the argument. This is always a real in the semi-closed
interval ] - π,π]. It could be defined as fn x => im(log x).
- floor : num -> num computes the lower integer part, or floor, of a real: this is the greatest integer less
than or equal to the argument. It raises Complex if the argument is not real.
- ceil : num -> num computes the upper integer part, or ceiling, of a real: this is the smallest integer
greater than or equal to the argument. It raises Complex if the argument is not real. It is syntactic sugar for
fn x => #~ (floor (#~ x)).
- fdiv : '#a ⋆ '#a -> num is quotient extraction, it always returns an integer-valued real. This is
syntactic sugar for fn (x,y) => floor(re(x/y)). It is declared infix, left associative of precedence 7.
- fmod : '#a ⋆ '#a -> '#a is the remainder operation.
It is syntactic sugar for fn (x,y) => x #- y #⋆ (x fdiv y). It is declared infix, left associative of
precedence 7.
- fdivmod : '#a ⋆ '#a -> num ⋆ '#a returns both the quotient and remainder of the numerical
quantities in argument. This could be defined as fn (x,y) => (x fdiv y,x fmod y). fdivmod is
not especially quicker than computing fdiv and fmod separately, because the three functions maintain a
common one-entry cache for quotients and remainders.
- ldexp : '#a ⋆ int -> '#a applied to x and n computes x.2n.
- frexp : num -> num ⋆ int applied to a real number x≠0 returns a real number y of the same sign as x
and an integer n such that 0.5 ≤|y| < 1 and x = y.2n. If x = 0.0, returns (0.0,0). In general, if x is complex
and ℜx≠0, then it returns (y,n) such that 0.5 ≤|ℜy| < 1 and x = y.2n; if x is complex and ℜx = 0, then it
returns (x,0).
- modf : num -> num ⋆ num splits its real argument into integer and fractional part. If its argument is not
real, then it raises Complex. This function might be defined as fn x => (floor x, x #- floor x)
on non-negative numbers, but it differs on negative numbers; e.g., modf ~3.15 is (~3, ~0.15), not
(~4, 0.85).
- real : '#a -> bool tests whether the argument is real, that is, if it has a zero imaginary part. This is the
same as fn z => im z=0.0.
- integer : num -> bool tests whether the argument is an integer (in particular, real). This is the same as
fn z => floor(re z)=z.
- natural : num -> bool tests whether the argument is a natural number.
This is the same as fn z => integer z andalso z>=0.0.
- int : num -> int returns the integer value of the number in argument, as an integer. It raises Complex
if the argument is a complex number. If the argument is real, but not in the range [min_int,max_int], the
result is unspecified. If the argument is not an integer, rounding direction is unspecified.
- num : int -> num converts an integer to a real number having the same value.
- random : unit -> num draws a pseudo-random equidistributed number in the real interval [0,1[. The
method used is based on a generalized feedback shift register [8], with generating polynomial x55 + x24 + 1.
The period is 255 - 1, which means that, practically, the generator never loops back to any previous state.
- zrandom : unit -> num draws a pseudo-random equidistributed number in the complex square [0,1[×
[0,1[. The method used is the same as for random.
- maybe : unit -> bool draws a pseudo-random equidistributed boolean. The method used is the same as
for random and zrandom, and is quicker than the semantically equivalent
fn () => random() >= 0.5.
4.8 Large Integer Arithmetic
HimML includes a port of Arjen Lenstra’s large integer package, which provides arbitrary precision arithmetic, i.e., arithmetic
over a type Int that represents actual integers, without any size limitation as with the int type.
The exception:
exception Lip of int
is raised whenever an error occurs in any of the functions of this section. The numbers as arguments to the Lip exception are
as follows:
- 1: division by zero;
- 2: modulus is zero in modular arithmetic functions;
- 3: bad modulus in Montgomery arithmetic functions—this means the modulus is 0, negative or positive but
even;
- 4: Montgomery arithmetic modulus is undefined—use zmontstart to define it first;
- 5: moduli in the chinese remaindering function zchirem are not coprime;
- 6: a function was called that expected a positive argument, and got a zero or negative argument; this can be
raised with zln, zsqrt, and a few other functions like zchirem;
- 7: zbezout was called with both arguments zero, or zroot was called to compute an nth root with n = 0;
- 8: a bug occurred in zbezout, please report it to goubault@lsv.ens-cachan.fr, see MAINTENANCE
at the end of the OPTIONS file;
- 9: zchirem was called with identical moduli but different remainders;
- 10: an exponentiation function was called with a negative exponent;
- 11: the square root or some nth root with n even of a negative number was attempted;
- 12: a bug occurred in zpollardrho, please report it to goubault@lsv.ens-cachan.fr, see
MAINTENANCE at the end of the OPTIONS file;
- 13: the second argument to zjacobi or zsjacobi is even;
- 14: zrandompprime was given a non-positive value for q;
- 15: a bug occurred in zecm, please report it to goubault@lsv.ens-cachan.fr, see MAINTENANCE at
the end of the OPTIONS file.
- 16: a wrong base was supplied to one of the functions converting integers to a list of digits (e.g., zstobas); a
wrong base is one that is < 2.
- 17: too many small primes were requested.
- 18: one of the random prime generator functions failed.
- 19: zecm failed to factor the input argument, but did not estimate it was prime with any sufficiently high
probability.
The functions are classified into several subgroups.
Conversions
- Int : int -> Int converts a machine integer to an actual integer.
- zint : Int -> int converts an actual integer to a machine integer. In case of overflow, no exception is
raised: on machines with integers in two’s complement arithmetic on k bits (which is most common), zint n
the unique m, -2k-1 ≤ m < 2k-1, such that n = m mod 2k.
- znum : Int -> num converts an integer to an approximate floating-point value. Overflows are not checked,
and may yield unspecified results.
- zsbastoz : int ⋆ int list -> Int converts a list of digits in a given base to an integer. More
precisely, zsbastoz (b,[an,…,a1,a0]) = anbn + … + a1b + a0.
- zbastoz : Int ⋆ Int list -> Int converts a list of digits in a given base to an integer. More
precisely, zbastoz (b,[an,…,a1,a0]) = anbn + … + a1b + a0.
- zstobas : int ⋆ Int -> int list converts an integer to a list of digits in the given base. More
precisely, given a base b and an integer k, returns a list [an,…,a1,a0] such that 0 ≤ ai < b for every i, 0 ≤ i ≤ n,
an≠0 if k≠0, and anbn + … + a1b + a0 = |k|. Converts k = 0 to nil, and raises Lip 16 if b < 2.
- zstosymbas : int ⋆ Int -> int list converts an integer to a list of digits in the given base. More
precisely, given a base b and an integer k, returns a list [an,…,a1,a0] such that |ai|≤ b∕2 for every i, 0 ≤ i ≤ n,
an≠0 if n≠0, and anbn + … + a1b + a0 = |k|. Converts k = 0 to nil, and raises Lip 16 if b < 2.
- ztobas : Int ⋆ Int -> Int list converts an integer to a list of digits in the given base. More
precisely, given a base b and an integer k, returns a list [an,…,a1,a0] such that 0 ≤ ai < n for every i, 0 ≤ i ≤ n,
an≠0 if n≠0, and anbn + … + a1b + a0 = |k|. Converts k = 0 to nil, and raises Lip 16 if b < 2.
- ztosymbas : Int ⋆ Int -> Int list converts an integer to a list of digits in the given base. More
precisely, given a base b and an integer k, returns a list [an,…,a1,a0] such that |ai|≤ b∕2 for every i, 0 ≤ i ≤ n,
an≠0 if n≠0, and anbn + … + a1b + a0 = |k|. Converts k = 0 to nil, and raises Lip 16 if b < 2.
Large Integer Arithmetic
- zcompare : Int ⋆ Int -> int compares two integers, returns -1 if the first is less than the second, 0
if they are equal, and 1 if the first is greater than the second.
- zneg : Int -> Int returns the opposite of the integer in argument.
- zabs : Int -> Int returns the absolute value of the argument.
- zsign : Int -> int returns the sign of the argument, -1 if negative, 0 if zero, 1 if positive.
- zsadd : Int ⋆ int -> Int adds two integers. The first is a large integer, the second a machine integer.
- zadd : Int ⋆ Int -> Int adds two large integers.
- zsub : Int ⋆ Int -> Int takes two large integers m and n, and returns m - n.
- zsmul : Int ⋆ int -> Int multiplies two integers. The first is a large integer, the second a machine
integer.
- zmul : Int ⋆ Int -> Int multiplies two large integers. This uses Karatsuba’s algorithm, which
computes the product of two n-digit numbers in O(n1.58) steps. This is faster than naive multiplication (O(n2)
steps), and in theory is slower than Toom-Cook multiplication (O(n23.5
) steps) or Fast Fourier Transform
algorithms (O(nlog n) steps). But the latter are more complex, and start to be faster in practice for rather large
integers only.
- zsqr : Int -> Int takes the square of the large integer given as argument. This is faster than multiplying
the argument by itself, and uses a variant of Karatsuba’s algorithm.
- zsdivmod : Int ⋆ int -> Int ⋆ int computes the quotient and remainder of two integers given as
argument. The dividend m is a large integer, while the divisor n is a machine integer. This returns (q,r), where
m = nq+r, |r| < |n|, and the sign of r is the same as n. Note that this is not the same condition as for divmod.
Raises Lip 1 if n = 0.
- zsmod : Int ⋆ int -> int computes the remainder of two integers given as argument. The dividend
m is a large integer, while the divisor n is a machine integer. This returns r, where m = nq + r, |r| < |n|, and
the sign of r is the same as n. Note that this is not the same condition as for mod. Raises Lip 1 if n = 0.
- zdivmod : Int ⋆ Int -> Int ⋆ Int computes the quotient and remainder of two integers given as
argument. The dividend m and the divisor n are large integers. This returns (q,r), where m = nq +r, |r| < |n|,
and the sign of r is the same as n. Note that this is not the same condition as for divmod. Raises Lip 1 if
n = 0.
- zmod : Int ⋆ Int -> Int computes the remainder of two integers given as argument. The dividend m
and the divisor n are large integers. This returns r, where m = nq + r, |r| < |n|, and the sign of r is the same
as n. Note that this is not the same condition as for mod. Raises Lip 1 if n = 0.
- zsexp : Int ⋆ int -> Int, applied to (a,e), computes ae; note that, although a is a large integer, e is
a machine integer. Raises Lip 10 if e < 0 and |a|≠1. Note that this function may take some time and build a
huge result: time and space is O(elog a), i.e., proportional to the size of a times e itself, that is, an exponential
of its size. By convention, 00 = 1.
- zexp : Int ⋆ Int -> Int, applied to (a,e), computes ae; both a and e are large integers. Raises
Lip 10 if e < 0 and |a|≠1. Note that this function may take some time and build a huge result: time and space
is O(elog a), i.e., proportional to the size of a times e itself, that is, an exponential of its size. In short, although
e is a large integer, it should not be too large for zexp to return at all. By convention, 00 = 1.
- zlsl1 : Int -> Int multiplies the argument by 2, that is, shifts it left 1 bit.
- zasl : Int ⋆ int -> Int, called on (a,n), shifts a left n bits (if n ≥ 0), or right -n bits (if n < 0).
That is, it returns the greatest integer in absolute value that is less in absolute value than a.2n (this is exactly
a.2n unless n < 0).
- zasr1 : Int -> Int divides the argument by 2, that is, shifts it right 1 bit. For negative arguments, this
does not work as zdivmod, rather as divmod; this does not work as asr either. That is, while asr (~3,1)
returns ~2, zasr1 (Int ~3) returns ~1.
- zasr : Int ⋆ int -> Int, called on (a,n), shifts a right n bits (if n ≥ 0), or left -n bits (if n < 0).
That is, it returns the greatest integer in absolute value that is less in absolute value than a.2-n.
- zodd : Int -> bool returns true if and only if the argument is odd. Testing zodd (n) is equivalent to
zbit (n,0).
- zmakeodd : Int -> int ⋆ Int returns (k,a) with k highest such that the argument equals 2k.a. Note
that k is a machine integer, while a is a large integer. Raises Lip 7 if argument is 0.
- sqrt : int -> int returns the floor of the square root of the argument, which is a machine integer. Raises
Lip 11 if argument is negative.
- zsqrt : Int -> Int ⋆ Int returns (r,d), where r is the floor of the square root of the argument a, and
d is the remainder a - r2. Raises Lip 11 if argument is negative.
- zroot : Int ⋆ int -> Int returns the floor of the nth root of a, where (a,n) is provided as argument.
If a is negative and n is odd, returns the opposite of the nth root of -a. Raises Lip 7 if n = 0, Lip 11 if a
is negative but n is even, Lip 1 if a = 0 and n < 0.
- zlog : Int -> num returns the natural logarithm of the argument, or at least a good approximation (on
32-bit architectures, if computes it from the upper 60 bits, for a 56-bit result). If the argument x is negative, the
principal branch of the logarithm is chosen, and log(-x) + iπ is returned. If x = 0, then Lip 7 is raised.
- zjacobi : Int ⋆ Int -> intapplied to (m,n), returns the Jacobi symbol
. This is an extension
of the Legendre symbol (defined when n is prime), defined whenever n > 0 is odd, and which can also be
computed in polynomial time.
If n = ∏
i=1npi, where pi is prime, the Jacobi symbol
equals ∏
i=1n
, a product of Legendre
symbols. If p is prime and odd, the Legendre symbol
is 0 if m and p are not coprime, otherwise it is 1 if
m is a quadratic residue modulo p (i.e., m = x2 mod p for some x), otherwise it is -1.
This implies that in general, zjacobi (m,n) returns 0 is m and n are not coprime, and if it returns -1 then m
is not a quadratic residue modulo n.
Computation is based on the identities (where n > 0):
= 0;
= 1;
= (-1)(n-1)∕2
when
n is odd;
= (-1)(n2-1)∕8
when n is odd;
= (-1)(m-1)(n-1)∕4
when both m and n
are odd.
Raises Lip 6 if n ≤ 0, Lip 13 if n is not odd.
- zsjacobi : int ⋆ int -> int computes the Jacobi symbol of the integers in argument. See
zjacobi for details.
Bit Manipulation
Large integers also serve as bit vectors, of varying size. Every non-negative integer represents a finite set of bits, those that are
set in the binary representation of the integer; while every negative integer represents a cofinite set of bits (i.e., a set of bits
whose complementary is finite), those that are set in the two’s complement binary representation of the integer. For example, 23
is …00010111 in binary, and represents the set {0,1,2,4}, since 23 = 24 + 22 + 21 + 20. And -23 is …11101001, and
represents the set {0,3,5,6,7,…}.
- zsnbits : int -> int returns the number of bits needed to represent the argument x, which is a machine
integer. This is ⌈log 2(|x| + 1)⌉, where log 2 is base 2 logarithm.
- znbits : Int -> int returns the number of bits needed to represent the argument x, which is a large
integer. This is ⌈log 2(|x| + 1)⌉, where log 2 is base 2 logarithm.
- zsweight : int -> int returns the number of bits set to one in the binary representation of the machine
integer in argument. When the argument is negative, this is also the number of 0 bits in the binary negation of
the argument (see bnot), on two’s complement machines.
- zweight : Int -> int returns the number of bits set to one in the binary representation of |x|, where x
is the large integer in argument. In particular, for negative n, zweight (Int n) does not return the same
number as zsweight n.
- zlowbits : Int ⋆ int -> Int, applied to the large integer m and the machine integer n, returns the n
lowest bits of m as a large integer. Works as though m is in two’s complement, and always returns a non-negative
number.
- zhighbits : Int ⋆ int -> Int, applied to the large integer m and the machine integer n, returns the
n highest bits of |m| as a large integer. zhighbits (m,n) is equivalent to zasr (|m|,zsnbits (m) - n)
if zsnbits (m) > n, |m| otherwise.
- zcat : Int ⋆ Int -> Int concatenates the bit vectors in argument. Calling zcat (m,n) is equivalent
to zor (zasl (m,zsnbits (n)),n). Concatenating two bit vectors m and n, where n is instead known to
contain k significant bits, should instead be done by calling zor (zasl (m,k),n).
- zsreverse : int ⋆ int -> int returns the bit reversal of the machine integer m in first argument.
Second argument k is the number of bits in m that should be considered, and is truncated to 0 if negative, and
to the number of bits in a machine integer if greater.
- zreverse : Int ⋆ int -> Int returns the bit reversal of the large integer m in first argument. The
second argument k is the number of bits in m that should be considered, and is truncated to 0 if negative. The
integer m is viewed in two’s complement, and even if m < 0, zreverse (m,k) is ≥ 0. Just reversing m is
done by calling zreverse (m, znbits (m)), provided m ≥ 0.
- setofInt : Int -> int set returns the set of bits set to 1 in the integer argument. Raises Lip 10 if
the argument is negative, in which case the set would be infinite and hence not representable of type int set.
- Intofset : (int -m> 'a) -> Int returns the integer whose 1 bits are at positions specified by the
set in argument. To be precise, it returns ∑
i2i, where i ranges over all non-negative machine integers in the
domain of the map in argument. This function is not intended for sets of machine integers that are too large,
otherwise there is a risk of memory overflow.
- znot : Int -> Int returns the binary negation of the argument n. This is exactly the same as computing
-n - 1. Seeing integers as sets of machine integers, this computes the complement of the given set.
- zand : Int ⋆ Int -> Int return the bitwise conjunction of the integers in argument. As sets of machine
integers, zand computes the intersection of the sets described by the integers in argument.
- zor : Int ⋆ Int -> Int return the bitwise disjunction of the integers in argument. As sets of machine
integers, zor computes the union of the sets described by the integers in argument.
- zxor : Int ⋆ Int -> Int return the bitwise exclusive or of the integers in argument. As sets of machine
integers, zxor computes the symmetric difference of the sets described by the integers in argument.
- zbit : Int ⋆ int -> bool, applied to (m,n), returns bit number n of the large integer m. One bits are
returned as true, zero bits as false. Returns false if n < 0. The large integer m is taken to be in two’s
complement. This is equivalent to zand (m,Intofset{n})≠Int 0.
- zaddbit : Int ⋆ int -> Int, applied to (m,n), returns the large integer m with bit n set. Does
nothing if n < 0. The large integer m is taken to be in two’s complement. This is equivalent to zor
(m,Intofset{n}).
- zrembit : Int ⋆ int -> Int, applied to (m,n), returns the large integer m with bit n cleared. Does
nothing if n < 0. The large integer m is taken to be in two’s complement. This is equivalent to zand (m,znot
(Intofset{n})).
Euclidean Algorithms
- zgcd : Int ⋆ Int -> Int computes the greatest common divisor of the arguments by the binary
method. Raises Lip 7 if both arguments are 0.
- zgcde : Int ⋆ Int -> Int computes the greatest common divisor of the arguments, by the classical
Euclidean algorithm. This might be faster than zgcd in special cases. Raises Lip 7 if both arguments are 0.
- zbezout : Int ⋆ Int -> (Int ⋆ Int) ⋆ Int : zbezout (a,b) computes the gcd d of a and b,
and two coefficients xa, xb as in Bezout’s Theorem, i.e., such that a.xa + b.xb = d. This is also sometimes
called the extended Euclidean algorithm. The gcd is always ≥ 1.
This raises Lip 7 if both a and b are zero, in which case the gcd is undefined.
This might raise Lip 8, but should not. If this happens, this is a bug, and this should be reported to the author
goubault@lsv.ens-cachan.fr, see MAINTENANCE at the end of the OPTIONS file.
- zchirem : Int ⋆ Int ⋆ Int ⋆ Int -> Int applied to (ma,a,mb,b), returns d such that d = a
modulo ma, and d = b modulo mb, and 0 ≤ d < ma.mb—unless ma = mb, in which case a and b should be
equal, and d = a = b.
Raises Lip 6 if ma ≤ 0 or mb ≤ 0. Raises Lip 9 if ma = mb but a≠b, and Lip 5 if ma≠mb but ma and
mb are not coprime.
Standard Modular Arithmetic
- zmadd : Int ⋆ Int ⋆ Int -> Int adds modulo m: zmadd (a,b,m) computes (a + b) mod m. It is
assumed that 0 ≤ a,b < m; result is also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2 if
m = 0.
- zmsub : Int ⋆ Int ⋆ Int -> Int subtracts modulo m: zmsub (a,b,m) computes (a-b) mod m.
It is assumed that 0 ≤ a,b < m; result is also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2
if m = 0.
- zmmul : Int ⋆ Int ⋆ Int -> Int multiplies modulo m: zmmul (a,b,m) computes ab mod m. All
arguments are large integers. It is assumed that 0 ≤ a,b < m; result is also ≥ 0 and < m. Produces erratic
results otherwise. Raises Lip 2 if m = 0.
- zsmmul : Int ⋆ int ⋆ Int -> Int multiplies modulo m: zsmmul (a,b,m) computes ab mod m.
The first argument a is a large integer, while b is a machine integer. It is assumed that 0 ≤ a,b < m; result is
also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2 if m = 0.
- zmsmul : int ⋆ int ⋆ int -> int, called on (a,b,n), returns ab mod n; raises Lip 2 if n = 0.
All arguments are machine integers.
- zmsqr : Int ⋆ Int -> Int squares modulo m: zmsqr (a,m) computes a2 modm. It is assumed that
0 ≤ a < m; result is also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2 if m = 0.
- zmdiv : Int ⋆ Int ⋆ Int -> Int divides modulo m: zmdiv (a,b,m) computes a∕b mod m. It is
assumed that 0 ≤ a,b < m; result is also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2 if
m = 0, Lip 1 if b = 0. If m is not coprime with b, it might also be the base that the quotient is undefined, in
which case no exception is raised, rather some factor of m is returned.
- zminv : Int ⋆ Int -> Int computes inverses modulo m: zminv (a,m) computes a-1 mod m. It is
assumed that 0 ≤ a < m; result is also ≥ 0 and < m. Produces erratic results otherwise. Raises Lip 2 if
m = 0, Lip 1 if a = 0. If m is not coprime with a, it might also be the base that the inverse is undefined, in
which case no exception is raised, rather some factor of m is returned—namely gcd(a,m).
- zmsexp : int ⋆ int ⋆ int -> int, applied to (a,e,m), computes a|e| mod m. All arguments are
machine integers. Raises Lip 2 if m = 0. By convention, 00 = 1.
- zsmexp : Int ⋆ int ⋆ Int -> Int, applied to (a,e,m), computes ae mod m. The exponent e is a
machine integer; however, this is not faster than calling zmexp, since zsmexp calls zmexp. Raises Lip 2 if
m = 0, Lip 10 if e is negative and a and m are not coprime. By convention, 00 = 1.
- zmexp : Int ⋆ Int ⋆ Int -> Int, applied to (a,e,m), computes ae mod m. The exponent e is a
large integer. Raises Lip 2 if m = 0, Lip 10 if e is negative and a and m are not coprime. By convention,
00 = 1.
- zmexpm : Int ⋆ Int ⋆ Int -> Int, applied to (a,e,m), computes ae mod m, just like zmexp.
However, it uses the m-ary method, which is faster than zmexp if a is not too small. Raises Lip 2 if m = 0,
Lip 10 if e is negative and a and m are not coprime. By convention, 00 = 1.
- zm2exp : Int ⋆ Int -> Int, applied to (e,m), computes 2e mod m. Raises Lip 2 if m = 0,
Lip 10 if e is negative and m is even.
- zmexp2 : Int ⋆ Int ⋆ Int ⋆ Int ⋆ Int -> Int, called on (a1,e1,a2,e2,m), computes
a1e1.a2e2 mod m. Raises Lip 2 if m = 0, Lip 10 if e1 or e2 is negative. This uses Shamir’s method with
sliding window of size 1, 2 or 3, depending on the maximal size of e1 or e2. This should be used if e1 and e2 are
approximately of the same size.
Montgomery Modular Arithmetic
Modular multiplications can be done division free and therefore somewhat faster (about 20%), if the Montgomery
representation is used [10]. Converting to and from Montgomery representation takes one Montgomery multiplication each, so
it only pays to use Montgomery representation if many multiplications have to be carried out modulo a fixed odd
modulus.
To use Montgomery arithmetic, first initialize the modulus N by using zmontstart, and convert all operands to their
Montgomery representation by ztom, but do not convert exponents. Use the addition, subtraction, multiplication, squaring,
division, inversion, and exponentiation functions, which all start with zmont, below on the converted operands, just as you
would use the ordinary modular functions (starting with zm). The results can be converted back from Montgomery
representation to ordinary numbers modulo N using zmonttoz.
- zmontstart : Int -> unit initializes Montgomery arithmetic mod N, the argument. Raises Lip 3 if
N is not both positive and odd.
If no exception is raised, all subsequent computations using Montgomery arithmetic will use modulus N. Mixing
ordinary large integers and Montgomery numbers, or Montgomery numbers based on different moduli yields
surprising results; this should be left to experimented users only.
- ztomont : Int -> Int converts an ordinary large integer to the corresponding Montgomery number mod
N. Raises Lip 3 if N is undefined.
- zmonttoz : Int -> Int converts a Montgomery number mod N to the corresponding ordinary large
integer. Raises Lip 3 if N is undefined.
- zmontadd : Int ⋆ Int -> Int adds two Montgomery numbers mod N. This actually just calls zmadd
with N as modulus. Raises Lip 3 if N is undefined.
- zmontsub : Int ⋆ Int -> Int subtracts two Montgomery numbers mod N. This actually just calls
zmsub with N as modulus. Raises Lip 3 if N is undefined.
- zsmontmul : Int ⋆ int -> Int takes one Montgomery number mod N and an ordinary machine
integer, multiplies them and returns the result as a Montgomery number mod N. This actually just calls zsmmul
with N as modulus. Raises Lip 3 if N is undefined.
- zmontmul : Int ⋆ Int -> Int multiplies two Montgomery numbers mod N. This is essentially the
only function, apart from zmontsqr and zmontinv, that justifies using Montgomery arithmetic. Raises
Lip 3 if N is undefined.
- zmontsqr : Int -> Int squares the Montgomery number mod N in argument. This is essentially the
only function, apart from zmontmul and zmontinv, that justifies using Montgomery arithmetic. Raises
Lip 3 if N is undefined.
- zmontdiv : Int ⋆ Int -> Int divides two Montgomery numbers mod N. Raises Lip 3 if N is
undefined, Lip 1 if second argument is 0 mod N. If N is not coprime with the second argument, it might also
be the case that the quotient is undefined. In this case, no exception is raised, rather some factor of N is returned
(as a regular integer, not as a Montgomery number).
- zmontinv : Int -> Int computes the inverse of a Montgomery number mod N. Raises Lip 3 if N is
undefined, Lip 1 if argument is 0 mod N. If N is not coprime with the argument, it might also be the case that
the inverse is undefined. In this case, no exception is raised, rather some factor of N is returned (as a regular
integer, not as a Montgomery number).
This is one of the rare functions, apart from zmontmul and zmontsqr, that justifies using Montgomery
arithmetic. Indeed, computing the inverse mod N in Montgomery representation means doing just one
Montgomery multiplication by a constant.
- zmontexp : Int ⋆ Int -> Int applied to (a,e), computes ae mod N. Although a is a Montgomery
number, e is not, and is a regular large integer. Raises Lip 3 if N is undefined, Lip 10 if e is negative and a
and N are not coprime. By convention, 00 = 1.
- zmontexpm : Int ⋆ Int -> Int, called on (a,e), computes ae mod N, just like zmontexp.
However, it uses the m-ary method, which is faster than zmontexp if a is not too small. Raises Lip 3 if N is
undefined, Lip 10 if e is negative and a and N are not coprime. By convention, 00 = 1.
- zmontexp2 : Int ⋆ Int ⋆ Int ⋆ Int -> Int, called on (a1,e1,a2,e2), computes a1e1.a2e2 mod
N. Raises Lip 2 if m = 0, Lip 10 if e1 or e2 is negative. This uses Shamir’s method with sliding window
of size 1, 2 or 3, depending on the maximal size of e1 or e2. This should be used if e1 and e2 are approximately
of the same size.
Primes, Factoring
- primes : int -> unit -> int is a small prime enumerator generator. That is, first call primes n for
some integer n; this returns a function, call it nextprime. Then calling nextprime () repeatedly returns
2, then 3, 5, 7, 11, …, i.e., all primes that hold in a machine integer (small primes). After nextprime has
exhausted all small primes, raises Lip 17.
Note that each new call to primes generates a new prime enumerator. That is, just calling primes n ()
repeatedly will just return 2 each time, by computing a new enumerator each time, and calling it only once.
The small prime number enumerator primes is basically an implementation of Eratosthenes’ sieve. The
number n is an estimation of the largest prime you will need. It serves to allocate a table of m elements, where m
is the largest integer such that 2m(2m+1) ≤n. On 32 machines, calling primes max_int is recommended.
On 64 machines, doing so will generate a huge table, which will take a few seconds just to initialize, so a smaller
value of n, say, 5 000 000 000, is recommended.
- zpollardrho : Int ⋆ int -> (Int ⋆ Int) option is an implementation of Pollard’s ρ
algorithm. zpollardrho (n,k) tries to factor n, in k iterations or less. The value k = 0 is special and is
meant to denote no bound on the number of iterations.
If the algorithm succeeds, it returns SOME (r,c), where r is a non-trivial factor of n, c = n∕r and r ≤ c. If
n < 0, returns SOME (-1,n). If the algorithm fails, returns NONE.
Raises Lip 12 if a bug occurred in zpollardrho. If this happens, this should be reported to the author
goubault@lsv.ens-cachan.fr, see MAINTENANCE at the end of the OPTIONS file.
- ztrydiv : Int ⋆ int ⋆ int -> (int ⋆ Int) option applies to a large integer n, and two
machine integers a and b, and computes the smallest (positive) prime divisor p of n that is ≥ a and ≤ b. If p
exists, it returns SOME (p,n∕p), otherwise NONE.
This works by calling primes b to get a function that enumerates all small primes. All primes < a are first
discarded, then all primes between a and b are tested. This is only intended for large n, as it does not stop as
soon as it goes past
, for small a, as enumerating all primes < a takes some time for large a, and for b not
too large (see remark on the efficiency of primes on 64 bit machines).
- zprime : Int ⋆ int -> bool takes a large integer m and a machine integer k, and tests whether m is
prime. This runs a probabilistic tests for at most k + 1 runs. It first uses ztrydiv to find small factors first. If
none is found, the Miller-Rabin test is run k + 1 times: when zprime returns false, then m is definitely not
prime, otherwise it is prime with probability at least 1 - (1∕4)k+1.
- zrandl : int ⋆ (int -> int) -> Int draws an equidistributed large integer of |k| bits, where k
is the first argument. If k < 0, returns opposites of such numbers. The second argument is a pseudo-random
generator function, like irand, which takes a bound x and returns an equidistributed random integer in
[0,x[. Using irand is not recommended for cryptographic applications, instead cryptographically secure
pseudo-random number generators should be used.
The second argument should preferrably be a function that does not raise any exception, otherwise memory
leaks may occur.
- zrandl1 : int -> Int is equivalent to fn nbits => zrandl (nbits, irand), but faster.
- zrand : Int ⋆ (int -> int) -> Int draws an equidistributed large integer of absolute value < m,
where m is the first argument (or returns 0 if m = 0), and of the same sign as m. The second argument is a
pseudo-random generator function, like irand, which takes a bound x and returns an equidistributed random
integer in [0,x[. Using irand is not recommended for cryptographic applications, instead cryptographically
secure pseudo-random number generators should be used.
The second argument should preferrably be a function that does not raise any exception, otherwise memory
leaks may occur.
- zrand1 : Int -> Int is equivalent to fn nbits => zrand (nbits, irand), but faster.
- zrandprime : int ⋆ int ⋆ (int -> int) -> Int applied to k, n and f, draws random primes
of n bits. This calls f to draw random numbers, which are then tested for primality.
More precisely, if |n|≥ 2, then this returns a random probable prime of |n| bits, where prime testing is done by
calling zprime with argument k; if |n| < 2, then it raises Lip 18. If n < 0, in addition, the returned prime
number is congruent to 3 modulo 4.
This works by picking odd numbers of the right size, keeping adding two until it is probably prime; or it
is too large, in which case we pick again, and start adding again, in accordance with NIST DSS Appendix.
Using f =irand is not recommended for cryptographic applications, instead cryptographically secure
pseudo-random number generators should be used.
The third argument should preferrably be a function that does not raise any exception, otherwise memory leaks
may occur.
- zrandprime1 : int ⋆ int -> Int is equivalent to
fn (nbits, ntries) => zrandprime (nbits, ntries, irand)
- zrandpprime : int ⋆ int ⋆ Int ⋆ (int -> int) -> Int ⋆ Int is used to generate a probable
prime number p of exactly |k| bits such that q divides p- 1, where q is given. The arguments are k, a machine integer n
(used in calling zprime, so that p is prime with probability ≥ 1 - (1∕4)n+1), the large integer q, and a
pseudo-random generator function f taking a bound x and returning an equidistributed random integer in
[0,x[, e.g., irand. Raises Lip 14 if q ≤ 0, Lip 18 if failed to generate p, typically because q already
uses ≥|k| bits. Otherwise, returns (p,⌊p∕q⌋). If k < 0, in addition, p will be congruent to 3 modulo
4.
This works by generating p at random amongst |k|-bit numbers such that q divides p - 1, until p is probably prime, in
accordance with NIST DSS Appendix. This only works if n is substantially larger than znbits (q). Using f =irand
is not recommended for cryptographic applications, instead cryptographically secure pseudo-random number generators
should be used.
zrandpprime is used to generate pairs (p,q) of prime numbers such that q divides p- 1, where p and q have specified
numbers of bits. To do this, first generate q using zrandprime, then call zrandpprime to generate
p.
The fourth argument should preferrably be a function that does not raise any exception, otherwise memory leaks may
occur.
- zrandpprime1 : int ⋆ int ⋆ Int -> Int ⋆ Int is equivalent to
fn (k, n, q) => zrandpprime (k, n, q, irand)
- zrandfprime : int ⋆ int ⋆ Int ⋆ (int -> int) -> Int ⋆ Int generates a pair of probable prime
numbers p and q such that q has |k| bits and p = qr + 1, where k and r are given. The arguments are (k,n,r,f), where n
is given to zprime to test whether both p and q are probably prime, r is the machine integer above, and f is a
pseudo-random generator taking a bound x and returning an equidistributed random integer in [0,x[, e.g., irand. If
k < 0, in addition q will be congruent to 3 modulo 4. Raises Lip 18 if fails, typically if r < 2 or r is
odd.
Using f =irand is not recommended for cryptographic applications, instead cryptographically secure pseudo-random
number generators should be used.
The fourth argument should preferrably be a function that does not raise any exception, otherwise memory leaks may
occur.
- zrandfprime1 : int ⋆ int ⋆ Int -> Int ⋆ Int is equivalent to
fn (k, n, r) => zrandfprime (k, n, r, irand)
- zrandgprime : int ⋆ int ⋆ bool ⋆ (int -> int) -> Int ⋆ Int ⋆ Int takes four
arguments k, n, first, and f, and returns p,q,g where p and q random probable primes (tested with zprime
with argument n; if k < 0, in addition, q is congruent to 3 modulo 4) such that p = 2q + 1 (this uses
zrandfprime), and a generator g of the multiplicative group of all numbers prime to p less than p. If first is
true, then g will be the smallest generator, otherwise g will be selected at random. Raises Lip 18 if it
fails.
Using f =irand is not recommended for cryptographic applications, instead cryptographically secure pseudo-random
number generators should be used.
The fourth argument should preferrably be a function that does not raise any exception, otherwise memory leaks may
occur.
- zrandgprime1 : int ⋆ int ⋆ bool -> Int ⋆ Int ⋆ Int is equivalent to
fn (k, n, first) => zrandfprime (k, n, first, irand)
- zecm : Int ⋆ int ⋆ int ⋆ int -> (Int ⋆ bool) option, called on (n,ncurves,ϕ1,ntries), tries to
factor the large integer n. It returns NONE if n was found to be probably prime after ntries primality tests. Otherwise, it
returns SOME (f,b): if b =true, then f is a non-trivial factor of n; if b =false (which should only happen in very
exceptional circumstances), then a non-trivial factor of n can be obtained by looking at the factorization of
fn - f.
This runs by using Arjen Lenstra’s elliptic curve algorithm. Up to ncurves random elliptic curves are drawn with first
phase bound ϕ1 (this increases by 1.02 at each new elliptic curve).
This may raise Lip 19 if zecm fails to reach any conclusion, or if n is too small (on 32-bit machines,
n ≤ 215 = 32768; in general, n >
is never too small: in these cases, use ztrydiv first).
Raises Lip 15 if a bug occurred in zecm. In case this happens, this should be reported to the author
goubault@lsv.ens-cachan.fr, see MAINTENANCE at the end of the OPTIONS file.
4.9 Input/Output
Two types are defined, first that of output streams outstream:
type 'rest outstream = |[put:string -> unit,... : 'rest]|
means that output streams are extensible records (classes) with at least a put method, to output a string to the stream. For
example, to make a stream that prints to a string stored inside a ref sr:string ref:
|[put = fn text => (sr := !sr ^ text)]|
The type of input streams instream is:
type 'rest instream : |[get:int -> string,... : 'rest]|
providing at least a get method, that reads up to n characters from the stream, n being the number in argument. If less than
n characters have been read, it usually means that an end of stream has been reached. (After the end of stream is reached,
repeated calls to get will return the empty string.) For example, an input stream reading from a string s:string, with
current position stored in a ref posr:num ref is:
|[get = fn n => let val goal = !posr+n
val newpos = (if goal<size s
then goal
else size s)
in
substr(s,!posr,newpos)
before (posr := newpos)
end]|
Note that if you want to read from a string, the instring primitive does this faster.
The following exception is defined:
exception IO of int
is raised when the system was unable to open a file or any other I/O-related operation; it returns as argument the error code
returned by the operating system. (Note that this is OS-dependent.)
The other methods that may be present in predefined streams are:
- flush : unit -> unit flushes the stream, when it is buffered, emptying the buffer.
- seek : int -> unit sets the current position in the file to the integer in argument; if this argument is
negative, it does as if it were 0, that is it goes to the beginning of the file; if it is larger than the file size, it sets
the position to the end of the file.
- advance : int -> unit does a seek from the current position (given by the tell method).
- seekend : int -> unit does a seek from the end of file. Notice that interesting argument values in this
case are negative.
- tell : unit -> int returns the current position in the file, starting at the beginning, for which it is 0.
- getline : unit -> string reads a whole line, until and including the newline character. If instead the
end of file is reached, there won’t be any newline character in the resulting string. After the end of file is reached,
repeated calls to getline will return the empty string.
- truncate : unit -> unit truncates a file open for writing (or for appending) at the current position.
This is useful notably when seeking back into the file, and writing there, to discard any spurious data that might
remain following the current position in the file.
- close : unit -> unit closes the current file. However, although the file is physically flushed and closed
(so that, on systems with exclusive access to files, others can afterwards access the file), it will only be
definitively closed when the garbage collector has determined that the associated stream was not used any more.
If any method is invoked on a closed file, the file is first automatically reopened in the state it was in when it was
last closed (except if it was not needed, like for the tell method). This way, even with strange control flows,
coming for example from uses of callcc and throw, everything should go smoothly (no need to forecast
when to reopen the file).
The input/output values and functions are:
- print : 'a outstream -> dynamic -> unit prints the value stored in the dynamic on the
specified output stream, in the same format that is used by the toplevel loop. The type is not
printed, though. Strings are printed with quotes, and with control characters escaped. If it is desired
to print strings merely as the sequence of their characters, apply the put method of the stream. Try:
#put stdout "Hello, world\n". (And don’t forget to #flush stdout () afterwards to see your
result printed.)
- pretty : 'a outstream -> dynamic -> unit pretty-prints the value stored in the dynamic on the
specified output stream, as done by the toplevel loop. The type is not printed, and the right margin is the system
default (usually, 80).
- stdout : |[put:string -> unit,flush:unit -> unit]| is the standard output stream, that
prints on the console if not redirected. This stream is buffered, the flush method flushes the buffer.
- stderr : |[put:string -> unit,flush:unit -> unit]| is the standard error stream; it prints
on the console if not redirected. This stream is buffered, the flush method flushes the buffer (on most operating
systems, an end-of-line also flushes the buffer).
-
outfile : string -> |[put:string -> unit,
flush:unit -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
close:unit -> unit]|
opens the file whose name is given, creating it if it does not exist, reinitializing it to zero length, and returns an output
stream. If the file could not be opened, the IO exception is raised, applied to the error code.
The stream is buffered, the flush method flushes the buffer.
-
appendfile : string -> |[put:string -> unit,
flush:unit -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
close:unit -> unit]|
opens the file whose name is given, creating it if it does not exist, setting the current position to the end of file, and
returns an output stream. If the file could not be opened, the IO exception is raised, applied to the error
code.
The stream is buffered, the flush method flushes the buffer.
-
outprocess : string ⋆ string list -> |[put:string -> unit,
flush:unit -> unit,
kill:unit -> unit]|
creates a process which will execute the shell command given in argument in parallel with the current HimML process.
This shell command is given in the form (command name, arguments). The command name is searched for, using the
PATH shell variable, with arguments as given.
Strings can be sent to the standard input of this process by using the put method (followed by flush to really send the
message), and the process can be terminated by calling the kill method.
Contrarily to files, which can be closed and then revived when necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to write to a dead process.
If the process could not be created, then an IO exception is raised (normally, IO 2, “no such file or directory”).
The kill method raises an IO exception when the process exited with a non-zero exit code (as returned by the C
function exit). Then, if n is this code, IO n is raised. (To be fair, n is first taken modulo 256, and only if the result is
non-zero is the exception raised, with n mod 256 as argument.)
The child process can exit by itself, and this can be detected by the fact that putting a string to the child (then
flushing, to be sure that the message has really been sent) will raise an IO error (normally, IO 32, “broken pipe”). It
is then good policy to kill the process, as it allows the operating system to reclaim process structures allocated for the
child (at least on Unix, where this is necessary).
- stdin : |[get:int -> string,getline:unit -> string]| is the standard input stream, that reads
from the console if not redirected. This stream is usually buffered, so that characters cannot be read until a newline
character \n is typed as input.
-
infile : string -> |[get:int -> string,
getline:unit -> string,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
close:unit -> unit]|
opens the file whose name is given, and returns an input stream. If the file could not be opened, the IO exception
is raised, applied to the error code. The stream is buffered for speed, but no flushing method should be
necessary.
-
inprocess : string ⋆ string list -> |[get:int -> string,
getline:unit -> string,
kill:unit -> unit]|
creates a process which will execute the shell command given in argument in parallel with the current HimML process.
This shell command is given in the form (command name, arguments). The command name is searched for, using the
PATH shell variable, with arguments as given.
All text that is printed on the parallel process’s standard output can be read by the HimML process by using the get and
getline methods, and the process can be terminated by calling the kill method.
Contrarily to files, which can be closed and then revived when necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to read from a dead process.
If the process could not be created, then an IO exception is raised (normally, IO 2, “no such file or directory”).
The kill method raises an IO exception when the process exited with a non-zero exit code (as returned by the C
function exit). Then, if n is this code, IO n is raised. (To be fair, n is first taken modulo 256, and only if the result is
non-zero is the exception raised, with n mod 256 as argument.)
The child process can exit by itself, and this can be detected by the fact that the get and getline methods will all
return empty strings (end of file, at least after the internal buffer is emptied). It is then good policy to kill the process,
as it allows the operating system to reclaim process structures allocated for the child (at least on Unix, where this is
necessary).
-
instring : string -> |[get:int -> string,
getline:unit -> string,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int]|
opens a stream for reading on the string given as argument . This is a souped up version of the input stream example at
the beginning of the section.
-
outstring : string -> |[put:string -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
convert:unit -> string]|
opens a stream to write on, as if it were a file. The convert method is used to get the current contents of the stream in
the form of a string. This is useful notably to print data to a string. The stream is initialized with the string given as
argument to outstring.
This is a souped up version of the output stream example at the beginning of the section.
-
inoutprocess : string ⋆ string list -> |[get:int -> string,
getline:unit -> string,
put:string -> int,
flush:unit -> unit,
kill:unit -> unit]|
creates a process which will execute the shell command given in argument in parallel with the current HimML process.
This shell command is given in the form (command name, arguments). The command name is searched for, using the
PATH shell variable, with arguments as given.
All text that is printed on the parallel process’s standard output can be read by the HimML process by using the get and
getline methods; moreover, HimML can sent data, as text, by writing to its standard input with the put method,
while flush empties the output buffers to really send the text to the process; and the latter can be terminated by calling
the kill method.
Contrarily to files, which can be closed and then revived when necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to read from a dead process. For other remarks, see items inprocess and
outprocess.
- delete : string -> unit deletes the file whose name is given in argument. If any I/O error occurs, then the
exception IO is raised, applied to the corresponding error code. In particular, if the argument is the name of a directory,
rmdir should be used instead.
- rename : string ⋆ string -> unit renames the file whose name is given as first argument to the name given
as second argument. If any I/O error occurs, then the exception IO is raised, applied to the corresponding error code.
Note that the capabilities of rename vary greatly from system to system. For example, rename can move files from any
place to any other place on BSD Unix; this is restricted on Unix System V, Amiga and Mac systems to move files only
inside the same volume (file system).
- filetype : string -> string set takes the name of a file and returns a set of properties of this file as
character strings. If the file does not exist or any other I/O error occurs, then the exception IO is raised, applied to the
corresponding error code. Otherwise, the following strings are used for properties:
- "a" means the file is an archive (Amiga only);
- "b" means this is a block special file (Unix only);
- "c" means this is a character special file (Unix only);
- "d" means the file is a directory;
- "e" means the file is erasable, either by delete or by rmdir (by the owner, on Unix systems; by all,
on other systems); on Unix, "eg" means the file is writable by users in the same group, "eo" means the
file is writable by all other users.
- "f" means this is a fifo, a.k.a. a pipe (Unix only);
- "g" means the file has the setgid bit set (Unix only);
- "l" means the file is a symbolic link (Unix only);
- "n" means this is a regular (normal) file;
- "p" means the file is pure (Amiga only);
- "r" means the file is readable (by the owner, on Unix systems; by all, on other systems); on Unix, "rg"
means the file is readable by users in the same group, "ro" means the file is readable by all other users.
- "s" means the file is a script file (Amiga only);
- "S" means the file is a socket (Unix only);
- "u" means the file has the setuid bit set (Unix only);
- "v" means the file has the “save swapped text after use” option (Unix only, should be obsolete);
- "w" means the file is writable (by the owner, on Unix systems; by all, on other systems); on Unix systems,
if a file is writable, it is also erasable; on Unix again, "wg" means the file is writable by users in the same
group, "wo" means the file is writable by all other users.
- "x" means the file is executable (by the owner, on Unix systems; by all, on other systems); on Unix again,
"xg" means the file is writable by users in the same group, "xo" means the file is writable by all other
users.
- dir : string -> string set takes a path as input, which should be a correct path name for a directory, and
returns the set of all file names (except . and .. on Unix systems) inside this directory. If any I/O error occurs, then the
exception IO is raised, applied to the corresponding error code.
- cd : string -> unit changed the current directory to the one specified by the path given as input. If any I/O error
occurs, then the exception IO is raised, applied to the corresponding error code.
- pwd : unit -> string returns a name for the current directory, as changed by cd. If any I/O error occurs, then the
exception IO is raised, applied to the corresponding error code. This may notably be the case if the path name is too
long.
- mkdir : string -> unit creates a new directory by the name given in argument. If any I/O error occurs, then the
exception IO is raised, applied to the corresponding error code.
- rmdir : string -> unit creates a new directory by the name given in argument. If any I/O error occurs, then the
exception IO is raised, applied to the corresponding error code. In particular, on most operating systems, rmdir can be
used on empty directories only. To delete files, use delete.
- system : string -> int issues the shell command given as argument. On Unix systems, this calls the
system(3S) function, which calls a sh-compatible shell to execute the command. system launches the command,
waits for it to return, and returns the exit code of the command (0 if all went well). If an error occurs (not enough
memory, not enough processes available, command interrupted by a signal, typically), then an IO exception is
raised.
- getenv : string -> string option reads the value of the environment variable whose name is given, and
returns NONE if it has not been defined, and SOME of its value, otherwise.
- args : unit -> string list returns the list of command-line options given after the -- switch on HimML’s
command-line.
- iomsg : int -> string translates the I/O error code in argument (as returned as argument of a IO exception) into
a string in human-readable form. This is the same message as the one printed by the C function perror() on Unix
systems.
- leftmargin : int ref defines the left margin for printing, as a number of spaces to print at the beginning of a
new line when pretty-printing. It is ref 0 by default.
- rightmargin : int ref defines the right margin for printing, as a number of columns (counted as characters)
from the beginning of the line where a new line has to be forced by the pretty-printing functions. It is ref 80 by
default.
- maxprintlines : int ref defines the limit on the number of lines printed by pretty-printing functions, mainly to
avoid looping while printing infinite structures, or to avoid printing structures of humongous sizes fully. This is also valid
for printing values defined on the toplevel, since pretty is used for this purpose. The default value is ref 100. To
suppress the limit in practice, write maxprintlines := max_int.
- numformat : string ref is a reference to the string that is used to format floating-point values for printing. Any
C-style format for printing doubles may be used, i.e. it is %[-∣+∣ ∣#] * [0 - 9] * (\.[0 - 9]+)?[feEgG], where - forces
left adjustment, + forces a sign in front of the value, a blank puts a space instead of a plus sign in front of the value, a
hash sign (#) forces a radix character (i.e, a dot in general; besides, trailing zeroes are not removed in the case of g and G
conversions); the following optional decimal number specifies the minimum field width (with the left adjustment flag -,
the number is padded on the right if it is not large enough); if this is followed by a dot, and a decimal number, this
specifies the number of digits to appear after the radix character for e and f conversions, the maximum number of
significant digits for the g conversion. The possible conversion characters are: f prints in the format [ ] ddd.ddd
with 6 digits by default (this can be modified by precision specifications; 0 says not to output the radix
character); e or E print in the format [ ] d.dde[+ ]dd (or with E instead of e, respectively), with one digit
before the radix character, and 6 digits by default; g or G choose between f and e (resp. E) conversions:
the latter is chosen if the exponent is less than -4 or greater than or equal to the precision (by default,
6).
The default is "%G".
4.10 Miscellaneous
Interesting values are:
- it : unit is a special variable containing the last expression evaluated at toplevel. It is implicitly redefined
every time an expression (not a declaration) is input at toplevel. Indeed, an expression e at toplevel is
conceptually equivalent to writing the declaration:
-
features : |[OS : string,
reftyping : string,
continuations : string set,
numbers : string set,
structures : string set,
profiling : string set,
debugging : string set,
floatformat : string,
version : int ⋆ int ⋆ string ⋆ int,
recompiled : string,
started : string,
maintenance : string,
creator : string]|
(may be extended in future versions) is a description of the features in the implementation and the session. Its use is
informative, but it may also serve for conditional compilation. The fields currently defined are:
- OS defines the operating system for which this HimML compiler was compiled; it may be "Mac",
"Amiga", "Unix System V" or "Unix BSD" for now. Currently, The Mac port is lagging behind,
due to the demise of my old Mac II. A port for PCs should be possible, though there seems to be some
major hurdles to overcome (mainly because of segmenting schemes).
- reftyping defines the scheme used to handle types of imperative constructs in HimML. Its value
may be "imperative tyvars", meaning the Standard ML typing scheme [6], "weak tyvars",
meaning the Standard ML of New Jersey typing scheme with weak type variables (assumed throughout
this document), or "effect tyvars", meaning typing with more precise effect descriptions[14], or
yet another, not yet implemented.
Frankly, the latter was never fully implemented, and while the weak tyvars discipline remained the default
for years, and was buggy and not really required. Today, HimML uses the imperative tyvars discipline.
- continuations defines the style of continuation reifying mechanism. It is a set of strings, which may
contain "callcc" (callcc reifies the whole functional state of evaluation, i.e. the whole stack) and
"catch" (catch does the same as callcc, but the continuation it reifies is valid only while we are
still evaluating the catch call).
- numbers is a set of strings defining the styles of numerical objects we may handle: it may
contain "complex floating point", which defines complex floating point values with attached
dimensions and scales, and also "big rationals" which defines infinite precision rational numbers.
Only the first of these options has been implemented.
- structures defines the structuring mechanism used in HimML (modules). Its value is a set of strings
that may contain "Standard ML structures", indicating the structure system described in [6]
and [9], and "separate compilation a la CaML", indicating we have a separate compilation
module system a la CaML. The first (not yet implemented) provides a sound way of structuting name
spaces for types and values. The second only provides separate compilation facilities. It is described in
Section 5.5.
- profiling defines the profiling possibilities. It is a set that may contain:
- "call counts", meaning that the number of times each function is called is tallied,
- "proper times", meaning that a statistical estimation of the time spent in each function
(excluding any other profiled function it has called) is computed,
- "total times" (same, but including time spent in every called function),
- "proper allocations" and "total allocations". The last two options mean that
memory consumption is recorded, as well, but have not been implemented yet. See Section 5.4 for a
detailed description of the profiler.
- debugging defines the debugging support. It may contain:
- "standard", which means we can set breakpoints and read values at breakpoints, and we can
consult the call stack;
- and "replaying debugger", if the debugger may reverse execution up to specified control
points to show the origins of a bug. Only the last option has not been implemented in HimML. (See
Section 5.3 for a description of the debugger.)
- floatformat defines the format of floating point numbers. This is important since certain numerical features are
not provided in certain modes, and since the management of rounding may differ with a format or another. The
floatformat may be "IEEE 754" or "unknown".
- version defines the current version of HimML. It is composed of:
- a major version (currently 1), incremented each time a fundamental decision is made by the creator of
the language which affects deeply the way we program in it (deletions or modifications of values or
constructions in the language may qualify if they are important enough)
- a minor version (currently 0), incremented for any minor revision (addition of functionalities qualify;
deletion or modification of minor functions qualify too)
- the code status, a string which may be:
-
"alpha"
- , if only the creator (or close friends) have access to the implementation;
-
"beta"
- , if released for beta-testing to a group of people;
-
"gamma"
- , if deemed enough stable and bug-free to be used by final users;
-
"release"
- , if distributable.
- the revision (currently 11), incremented each time a bug is corrected or something is made better inside the
language, without changing its overall behavior.
- recompiled is the last date of recompilation of this revision of HimML.
- started is the date the current session was started. This may be used inside a HimML program to see if it has
been restarted from disk or if it has just been transferred to another session.
- maintenance is a description of the person or service the user should contact if there is any problem with
HimML.
- creator is the name of the creator of the language, that is, "Jean Goubault".
- times : unit -> time ⋆ time computes the user time (first component of the returned pair) and the system
time (second component). On Unix systems, this time takes its origin at the moment the current HimML process was
launched. On other systems, the origin may be the time at which the machine was last booted. These times are in
seconds, the default scale of dimension time, defined as:
dimension time(s)
- force : 'a promise -> 'a forces evaluation of a promise (created with the delay construct), and returns the
result. A delayed expression is evaluated at most once: the result of forcing is memoized.
- callcc : ('1a cont -> '1a) -> '1a reifies the current continuation, and passes it to the function in
argument. This continuation includes a copy of the current stack of exception handlers, so that triggering a continuation
reinstalls the exception handlers in use when reifying the continuation: this is the reason why imperative types are
used.
The extent of the continuation is indefinite, i.e. it can be thrown at any time, even after the call to callcc is
completed.
- catch : ('1a cont -> '1a) -> '1a reifies the current continuation, and passes it to the function in argument.
This continuation includes a copy of the current stack of exception handlers, so that triggering a continuations reinstalls
the exception handlers in use when reifying the continuation: this is the reason why imperative types are
used.
Contrarily to callcc, the extent of the continuation is not indefinite, that is, throwing the reified continuation is valid
only until the call to catch is completed. Afterwards, any attempt to throw the continuation may raise the exception
Catch. (This is not systematic, however.)
catch is the basic mechanism for implementing threads, a.k.a. coroutines, i.e. separate computations sharing the
same process space, and which can be suspended or resumed at will. The current implementation actually
builds catch as basically a thread mechanism, where the process stack is split up in several different
stack zones, called thread caches, and thread switching is implemented as changing caches. callcc is
implemented in the same way, except that thread caches are copied back to the heap for possible future
reuse. Thread caches may also be copied to the heap to make room for a new thread when there is no
remaining available cache in catch or callcc, and they are copied back from the heap when executing
throw
- throw : 'a cont -> 'a -> 'b reactivates a continuation, by passing it the return value of the
continuation.
- gc : unit -> unit forces a garbage collection—not necessarily a major one—to take place. This is usually not
needed. It may be used in alpha status to scramble the heap and see if it incurs any nasty memory management bug
in the HimML run-time. It may also be used prior to doing some benchmarks, but as of yet, there is no
guarantee that the garbage collection will indeed collect every free objects (this does not force a major
collection).
- major_gc : string -> unit forces a major garbage collection. This can be used in theory to get back some
memory that you know can be freed right away, but it takes time. Its main use is in conjunction with the
-gctrace <file-name> command-line option: then each garbage collection will write some statistics to <file-name>,
which can be used to detect space leaks. Typically, when you wish to see how much memory is allocated or freed in a call
to some function f, call major_gc "before f" before calling f, and major_gc "after f" afterwards. Then
consult <file-name>. There should be some information on the status of memory before f, beginning with a line of the
form:
==[before f]=========================
then sometime later another block of information on the status of memory after the call to f, beginning with a line of the
form:
==[after f]=========================
- abort : unit -> 'a aborts the current computation, and returns to toplevel without completing the current
toplevel declaration (actually, it invokes the toplevel continuation). This is the function that is internally
used when syntax errors, type errors or uncaught exceptions are triggered. Following the definition of
Standard ML, the current toplevel declaration is abandoned, so that its effect are nil, except possibly on the
store.
- quit : int -> 'a aborts all computations, and returns to the operating system with the return code in
argument.
- stamp_world : unit -> unit internally records a checksum and a copy of the current HimML environment. All
modules (see Section 5.5) are compiled in the latest stamped copy of the HimML environment, and are saved to disk with
the corresponding checksum. This way, when a module is reloaded, its checksum is compared to the checksums of all
stamped environments in memory, and an error is triggered if no environment exists with the same checksum (meaning
that we cannot safely use the module, either because it needed a richer or incompatible environment, or because the
version of HimML has changed too much). The environment is automatically stamped just before HimML starts
up.
- system_less : ''a ⋆ ''a -> bool is the system order. It returns true if the first argument is less than the
second in the system order. You may use this ordering as a total ordering on all data of equality admitting
types.
Another total ordering on equality admitting types is
fun system_less_2 (x,y) =
case {x,y} of {z,...} => z=x
but this is slower. It is also a different order.
- inline_limit : int ref is a reference to a limit on the size of functions that the compiler will accept to inline.
Its default value is 30. (Units are unspecified, but are roughly the numbers of basic operations and of variables in the
function.) In version 1.0 alpha 20, there is no compiler yet, but one that produces C source code to feed to your favorite C
compiler is in the works.