Chapter 5
Running the System

5.1 Starting up

HimML runs on Unix systems and Amigas. Previous versions also worked on Apple Macintoshes, but this one lacks some functions. On a Mac, the only way to launch a HimML session is to double-click on the HimML icon; a text window opens, asking you to enter Unix command-style arguments: enter the arguments to the HimML command, without the command’s name itself. From then on, all work happens in this console, at toplevel, as on Amiga and Unix systems.

On Amigas and Unix boxes, type himml followed by a list of arguments. The legal arguments are obtained by typing himml ?, to which HimML should answer:

Usage: himml [-replay replay-file] [-mem memory-size]  
  [-cmd ML-command-string] [-init ML-init-string] [-path path]  
  [-col number-of-columns]  
  [-grow memory-grow-factor] [-maxgrow max-memory-grow-factor]  
  [-nthreads max-cached-threads] [-threadsize thread-cache-size]  
  [-maxcells max-cells] [?] [-gctrace file-name]  
  [-pair-hash-size #entries] [-int-hash-size #entries]  
  [-real-hash-size #entries]  
  [-string-hash-size #entries] [-array-hash-size #entries]  
  [-pwd-prompt format-string] [-core-trace] [-data-hash-size #entries]  
  [-c source-file-name] [-inline-limit max-inlined-size] [-nodebug]  
  [-- arguments...]

and exit. Launching HimML without any arguments is fine. There are other HimML tools, used to compile, link and execute bytecode compiled files; they are listed at the end of this section.

To load a file, the use keyword may be used; it begins a declaration, just like val or type, that asks HimML to load a file and interpret it as if it were input at the keyboard (except that it does not use stdin). The search path used by use can be extended on the command line by the -path switch, or inside HimML by changing the contents of usepath : string list ref, which is a reference to the list of volumes or directories in which to search for files, from left to right.
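As a small sketch of the above, the following toplevel input first extends the search path and then loads a file. The directory name "/home/me/himml-lib" and the file name "utils.ml" are placeholders invented for this illustration; only use and usepath come from the text.

```sml
(* Prepend a (hypothetical) directory to the search path. Since usepath
   is scanned from left to right, this directory is searched first. *)
usepath := "/home/me/himml-lib" :: !usepath;

(* Load the (hypothetical) file utils.ml as if it had been typed
   at the keyboard. *)
use "utils.ml";
```

Prepending rather than appending is only a design choice here: it gives the new directory priority over the existing entries.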

The various options are explained as follows:

5.2 Compiling, Linking, Finding Dependencies

As said earlier, the HimML distribution includes other tools to compile, link and run bytecode compiled files:

5.3 Debugger

HimML contains a debugger, as shown by consulting the set #debugging features, which should be non-empty. It can be called by the break function:

Another way of entering the debugger is when an exception is raised but not caught by any handler.

There are two ways of entering the debugger. These are shown on entry by a message: either stop on break (we entered the debugger through break, or by typing control-C or DEL when evaluating an expression), or stop at… (we entered the debugger at a breakpoint located just before the execution of an expression).

In any case, the debugger enters a command loop, under which you can examine the values of expressions, see the call stack, step through code, set breakpoints, resume or abort execution. The debugger presents a prompt, normally (debug). It then waits for a line to be typed, followed by a carriage return, and executes the corresponding command. These commands are:

The way that the interpreter gives control to the debugger is by means of code points, which are points in the code where the compiler adds extra instructions. These instructions usually do nothing. When you set a breakpoint, they are patched to become the equivalent of break. Alternatively, these instructions also enter the debugger when we are single-stepping through some code.

These instructions are added by default by the compiler, but they tend to slow the interpreter. If you wish to dispense with debugging information, you may issue the directive:

(*$D-*)

which turns off generation of debugging information (of code points). If you wish to reinclude debugging information, type:

(*$D+*)

These directives are seen as declarations by the compiler, just like val or type declarations. As such, they obey the same scope rules. It is recommended to use them in a properly scoped fashion, either inside a let or local expression, or confined in a module.
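A properly scoped use might look like the following sketch. It assumes that the directive really does follow the scoping of ordinary declarations, as stated above, so that a directive placed in the first part of a local declaration applies up to the matching end. The function fact is a hypothetical example.

```sml
local
  (*$D-*)   (* no code points are generated inside this local block *)
in
  (* fact is still visible after the block, but carries no
     debugging information *)
  fun fact 0 = 1
    | fact n = n * fact (n - 1)
end;
```

Code compiled after the closing end is again compiled with whatever setting was in force before the block.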

5.4 Profiler

The way that the interpreter records profiling information is by means of special instructions that do the tallying.

These instructions are not added by default by the compiler, since they tend to slow the interpreter by roughly a factor of 2, and you may not wish to gather profiling information of every piece of code you write. To use the profiler, you first have to issue the directive:

(*$P+*)

which turns generation of profiling instructions on. The functions that will be profiled are exactly those that were declared with the fun or the memofun keyword.

If you wish to turn it off again, type:

(*$P-*)

These directives are seen as declarations by the compiler, just like val or type declarations. As such, they obey the same scope rules. It is recommended to use them in a properly scoped fashion, either inside a let or local expression, or confined in a module. Usually, you will want to profile a collection of modules. It is then advised to add (*$P+*) at the beginning of each. Time spent in non-profiled functions will be taken into account as though it had been spent in their profiled callers.
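For instance, a module to be profiled could start as in the following sketch. The functions fib and fibplus are hypothetical; the comments restate what the text above implies about which of them gets profiled.

```sml
(*$P+*)

(* Declared with fun, hence profiled. *)
fun fib n = if n < 2 then n else fib (n - 1) + fib (n - 2);

(* Bound with val rather than fun, hence presumably not profiled;
   its time would then be charged to its profiled callers. *)
val fibplus = fn n => fib n + 1;
```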

Then, the HimML system provides the following functions to help manage profiling data:

What can you do with profile information? The main goal is to detect what takes up too much time in your code, so as to focus your optimization efforts on what really needs them. A good strategy to do this is the following:

5.5 Separate Compilation and Modules

5.5.1 Overview

The main goal of the HimML module system is to implement separate compilation, where you can build your program as a collection of modules that you can compile independently from each other, and then link them together.

The HimML module system was designed to integrate well with the rest of the core language, while remaining simple and intuitive. For the time being, the HimML module system does not provide the other feature that modules are useful for, namely management of name spaces. The module system of Standard ML seems best for this purpose, although it is much more complex than the HimML module system.

Consider the following example. Assume that your program consists naturally of three files, a.ml, b.ml and c.ml. The most natural way of compiling it would be to type:

use "a.ml";  
use "b.ml";  
use "c.ml";

But b.ml will probably use some types and values that were defined in a.ml, and similarly c.ml will probably use some types and values defined in a.ml or b.ml. In particular, if you want to modify a definition in a.ml, you will have to reload b.ml and c.ml to be sure that everything has been updated.

This is not a serious problem when you have only a few files, provided they are not too long. But if they are long or many, this will take a lot of time. Separate compilation is the cure: with it, you can compile a.ml, b.ml, and c.ml separately, without having to reload other files first.

The paradigm that has been implemented in HimML is close to that used in CaML, and even closer in spirit to the C language. In particular, modules are just source files, as in C. Two new keywords are added to HimML: extern and open. Note that the Standard ML module system also has an open keyword, but there is no ambiguity as it is followed by a structure identifier like Foo in Standard ML, and by a module name like "foo" in HimML.

The extern keyword specifies some type or some value that we need to compile the current file, telling the type-checker and compiler that it is defined in some other file. Otherwise, if you say, for example, val y=x+1 in b.ml, but x is defined in a.ml, the type-checker would complain that x is undefined when compiling b.ml. To avoid this, just precede the declaration for y by:

extern val x : int

This tells the compiler that x has to be defined in some other file, and that it will know its value only when linking all files together. This is called importing the value of x from another module.

Not only values, but also datatypes can be imported:

extern datatype foo

imports a datatype foo. The compiler will then know that some other module defines a datatype (or an abstype) of this name. However, it won’t know whether this datatype admits equality, i.e. whether you can compare objects of this datatype by =. If you wish to import foo as an equality-admitting datatype, then you should write:

extern eqtype foo

Of course, if foo is a parameterized datatype, you have to declare it with its arity, for example:

extern datatype 'a foo

for a unary (not necessarily equality-preserving) datatype, or

extern eqtype ('a, 'b) foo

for an equality-preserving datatype with two type parameters.

Finally, dimensions can be imported as well:

extern dimension foo

imports foo as a dimension (type of a physical quantity, typically).

Given this, what does the following mean? We write a file "foo.ml", containing:

extern val x:int;  
val y = x+1;

Then this defines a module that expects to import a value named x, of type int (alternatively, to take x as input), and will then define a new value y as x+1 and export it.

Try the following at the toplevel (be sure to place the file "foo.ml" above somewhere on the load path, as referenced by the variable usepath):

val x = 4;  
open "foo";

You should then see something like:

x : int  
y : int  
x = 4  
y = 5

Opening "foo" by the open declaration above proceeded along the following steps:

A variant on open is open*, which does just the same, except it does not try to recompile the source file "foo.ml": it just assumes that "foo.mlx" is up to date, or fails. This is useful when shipping compiled bytecode modules, and is used internally in the himmlpack and himmllnk tools.

Assume now that we didn’t have any value x handy; then open would still have precompiled and opened the resulting object module "foo.mlx". Only, it would have failed to link it to the rest of the system. If you wish to just compile "foo.ml" without loading it and linking it, issue the directive:

#compile "foo.ml"

at the toplevel. (The # sign must be at the start of the line.) This compiles, or re-compiles, "foo.ml" and writes the result to "foo.mlx".

5.5.2 Header Files

Another problem pertaining to separate compilation is how to share information between separate modules. For example, you might want to define again three modules a.ml, b.ml and c.ml, where a.ml would define some value f (say, a function from string to int), and b.ml and c.ml would use it.

A first way to do this would be to write:

but this approach suffers from several defects. First, no check is done that the type of f is the same in all three files; in fact, the check will eventually be performed at link time, that is, when doing:

open "a";  
open "b";  
open "c";

but we would rather be warned when first precompiling the modules.

Then, whenever the type of f changes in a.ml, we would have to change the extern declarations in all other files, which can be tedious and error-prone.
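The first approach described above can be sketched as follows. The body of f and the values b and c are hypothetical, chosen only so that f has type string -> int; the point is that the type of f is repeated by hand in every importing file.

In a.ml:

```sml
(* a.ml: defines f : string -> int (hypothetical body) *)
fun f "" = 0
  | f _ = 1;
```

In b.ml:

```sml
(* b.ml: imports f by restating its type *)
extern val f : string -> int;

val b = f "hello";
```

In c.ml:

```sml
(* c.ml: restates the type of f once more *)
extern val f : string -> int;

val c = f "";
```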

The idea is then to do as in the C language, namely to use one header file common to all three modules. (This approach still has one defect, and we shall see later on how it should really be done.) That is, we would define an auxiliary file "a_h.ml" (although the name is not meaningful, the convention in HimML is to add _h to a module name to get the name of a corresponding header file), which would contain only extern declarations. This file, which contains in our case:

extern val f : string -> int;

is then called a header file.

We then write the files above as:

This way, there is only one place where we have to change the type of f in case we wish to do it: the header file a_h.ml.

What is the meaning of using a_h.ml in a.ml, then? Well, this is the way that type checks are effected across modules. The meaning of extern then changes: in a.ml, f is defined after having been declared extern in a_h.ml, so that f is understood by HimML not as being imported, rather as being exported to other modules. This allows HimML to type-check the definition of f against its extern declaration, and at the same time to resolve the imported symbol f as the definition in a.ml. This is more or less the way it is done in C.
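With the header file a_h.ml containing the extern declaration of f, the modules might then look like the following sketch. As before, the body of f and the value b are hypothetical; c.ml would follow the same pattern as b.ml.

In a.ml:

```sml
(* a.ml: f is declared extern in the header, then defined here,
   so it is understood as exported and checked against the header *)
use "a_h.ml";

fun f "" = 0
  | f _ = 1;
```

In b.ml:

```sml
(* b.ml: f is declared extern in the header and never defined here,
   so it is understood as imported *)
use "a_h.ml";

val b = f "hello";
```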

One thing that still does not work with this scheme, however, is how we can share datatypes. This is because datatype declarations are generative. Try the following. In a_h.ml, declare a new datatype:

datatype foo = FOO of int;  
extern val x:foo;

In a.ml, define the datatype and the value x:

use "a_h.ml";  
 
val x = FOO 3;

Now in b.ml, write:

use "a_h.ml";  
 
val y = x : foo;

Then, open "a", then "b". This does not work: why? The reason is that the definition of the datatype foo in a_h.ml is read twice, once when compiling a.ml, then when compiling b.ml, and that both definitions created fresh datatypes (which just happen to have the same name foo). These datatypes are distinct, hence in val y = x : foo, x has the old foo type, whereas the cast to foo is to the new foo type.

The remedy is to avoid loading header files with use, and to open them instead. So write the following in a.ml:

open "a_h";  
 
val x = FOO 3;

and in b.ml:

open "a_h";  
 
val y = x : foo;

Opening a_h produces a compiled module a_h.mlx, which holds the definition for foo and the declaration for x. In the compiled module, the datatype declaration for foo is precompiled, so that opening a_h does not re-generate a new datatype foo each time a_h is opened; rather, it re-imports the same one.

Technically, imagine that fresh datatypes are produced by pairing their name foo with a counter, so that each time we type datatype foo = FOO of int at the toplevel, we generate a type (foo,1), then (foo,2), and so on. This process is slightly changed when compiling modules, and the datatype name is paired with the name of the module instead, say, (foo,a_h). Opening a_h twice then reimports the same datatype.

The same works for exceptions, except there is no extern exception declaration. The reason is just that it would do exactly the same as what exception already does in a module. If you declare:

exception Bar of string;

in a_h.ml, and import a_h as above, by writing open "a_h" in a.ml and b.ml, then both a.ml and b.ml will be able to share the exception Bar. Typing the following in a_h.ml would not work satisfactorily, since Bar would not be recognized as a constructor in patterns:

extern val Bar : string -> exn;

That is, it would then become impossible to write expressions such as:

f(x) handle Bar message => #put stdout message

in a.ml. However, if you don’t plan to use pattern matching on Bar, then the latter declaration is perfectly all right.

5.5.3 Summary

The following commands are available in HimML:

It is easier to compile modules by typing the following under the shell:

himml -c foo.ml

which does exactly the same as launching HimML, and typing #compile "foo"; quit 0; under the HimML toplevel.

You can then use himml as a HimML standalone compiler, and compile each of your modules with himml -c. This is especially useful when using the make utility. A typical makefile would then look like:

%.mlx : %.ml  
        himml -c $<  
 
a_h.mlx: a_h.ml  
a.mlx: a.ml a_h.mlx  
b.mlx: b.ml a_h.mlx  
pack.mlx: pack.ml a.mlx b.mlx

The first lines define a rule for making compiled HimML modules from source files ending in .ml. It has a syntax specific to GNU make. If your make utility does not support it, replace it by:

.SUFFIXES: .mlx .ml  
.ml.mlx:  
        himml -c $<

The last lines of the above makefile represent dependencies: that a.mlx depends on a.ml and a_h.mlx means that make should rebuild a.mlx (from a.ml, then) whenever it is older than a.ml or a_h.mlx. Such dependencies can be found automatically by the himmldep utility. For example, the dependency line for a.mlx was obtained by typing:

himmldep a.ml

at the shell prompt.

There is no specific way to link compiled modules together, since open already does a link phase. To link a.mlx and b.mlx, write a new module, say pack.ml, containing:

open "a";  
open "b";

then compile pack.ml. The resulting pack.mlx file can also be executed, provided it has no pending imported identifiers. One way is to launch HimML, open pack, and run main (); (provided pack.ml exports such a function). It is even easier to type the following from the shell:

himmlrun pack

Under Unix, every compiled module starts with the line:

#!/usr/local/bin/himmlrun

assuming that /usr/local/bin is the directory where himmlrun was installed, so that you can even make pack.mlx have an executable status:

chmod a+x pack.mlx

and then run it as though it were a proper executable file:

pack.mlx

This will launch himmlrun on module pack.mlx, find a function main and run it.

5.6 Editor Support

Any ASCII text editor can be used to write HimML sources. But an editor can also be used as an environment for HimML. In GNU Emacs, there is a special mode for Standard ML, called ‘sml-mode.el’, which comes with the Standard ML of New Jersey distribution and can be adapted to deal with HimML: this is the ‘ml-mode.el’ file. However, it was felt that it did not indent properly in all cases, because of the complicated nature of the ML syntax. A replacement version is in the works, called ‘himml-mode.el’; it is not yet operational.

5.7 Bugs

Remember: a feature is nothing but a documented bug! You may therefore consider the following as features :-).

5.8 Common Problems

5.8.1 Problems When Installing HimML

P:
When I type make, nothing happens except that I get a message telling me to type a sequence of commands.

This is normal. The installation procedure needs to make configuration files, for interpreting your favorite options (in file OPTIONS) or for determining system or compiler behaviours. So, just do as indicated.

P:
I don’t understand the meaning of an option in file OPTIONS.

Then leave it alone. Most options have reasonable default values.

P:
When I run HimML, it just stops on abort: attempt to longjmp() to lower stack or a similar message.

See next question.

P:
After typing make, I get messages such as:
mksyscc: 20847 Abort - core dumped  
longjmp() is brain-damaged (won't allow you to jump to a lower stack)  
trying to find a standard patch...

Some operating systems (mostly BSD systems, although the only example I know is AIX) implement a “smart” longjmp() routine that first checks whether the current stack pointer is lower than the one it is trying to restore, and aborts if this is not the case. HimML needs to be able to do just that, in order to implement continuations (and continuations are heavily used internally, even if you don’t plan to use them). The best solution I’ve come up with on AIX is to write a small patching utility (dpxljhak) that hunts for a specific piece of code in the prologue of the longjmp() function and puts no-ops instead. A better solution would be to rewrite the function in assembler, but I’ve been unable to do this.

If this happens to you, try to rewrite longjmp() so that it does not check for stack levels and link your new definition. Or write a patch, just like me; you’ll need to experiment a bit.

Please also contribute your modification so that I can include it in the next HimML release. (See MAINTENANCE at the end of the OPTIONS file to know whom to write to.)

P:
My machine is a Cray/VMS machine/PC-Dos machine, and I cannot manage to make the darn thing compile or execute.

Cray machines have a weird stack format, and my scheme for capturing continuations has no hope of working on these machines. If it’s absolutely necessary for you, I’ll see what I can do, provided you promise to tell me whether it works or not. (See MAINTENANCE at the end of the OPTIONS file to know my address.)

I don’t have any VMS machine handy, so I cannot test HimML on it. The HimML implementation is pretty much centered around Unix, so I would be surprised if it worked without changes. Please tell me what you have been forced to do to make it work.

PC-Dos machines won’t do. 640K is not enough for HimML, and HimML has no knowledge of extended or expanded memory. HimML must run in one segment only, lest its sharing mechanism be defeated by one physical address having two distinct representations (from two different segments). This may work on 486’s or higher, which can use large segments, but the operating system (Dos or Windows, any version until now) is the stumbling block. Your best bet is to switch to Linux or any other Unix for PCs. Windows/NT and OS/2 are expected not to pose any problem.

P:
When I run HimML, it just core dumps.

Check the OPTIONS file: there is no safeguard against illegal values there (in particular stack values). Put back the default values; if this does not work, try to increase the stack parameters (notably SAFETY_SIZE and SECURITY). See also previous questions; it is quite likely that this is due to stack problems. If nothing works, mail me (goubault@lsv.ens-cachan.fr, see MAINTENANCE at the end of the OPTIONS file).

5.8.2 Problems When Running HimML

P:
I have typed a command line at the toplevel prompt, then typed return, but nothing happens.

Most probably, you have not terminated your command line with a semicolon (;). Although the syntax of Standard ML makes semicolons optional between declarations, the toplevel parser has no way of knowing that input is complete unless it finds a terminating semicolon (or an end of file). Consider also all the ways to complete input such as, say, 1: if you write a semicolon afterwards, then this is an abbreviation of val it=1;, but if you write +2;, even on the following line, then you really meant val it=1+2;, and if you type return just after 1, the parser has no way to know which possibility you intended.

It may happen that typing a semicolon does not cure the problem. This may happen if you have not closed all parentheses and brackets. Consider (frozzle (): if you type a semicolon afterwards, then your input is still incomplete, as you may want to write, say, (frozzle (); foo). The semicolon is not only a declaration separator, but also the sequencing operator.
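The cases discussed above can be tried at the toplevel with the following sketch. The definitions of frozzle and foo are hypothetical placeholders, added only so that the last input can be evaluated.

```sml
(* Hypothetical definitions, for illustration only. *)
fun frozzle () = ();
val foo = 0;

1;              (* complete: abbreviates val it = 1; *)

1
+ 2;            (* complete only at the semicolon: val it = 1 + 2; *)

(frozzle ();    (* still incomplete: the parenthesis is open, and this
                   semicolon is the sequencing operator *)
 foo);          (* the closing parenthesis and semicolon complete it *)
```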

P:
When opening modules that open header modules, I keep getting type errors, and the explanation is that some datatypes are not the same in each type?

First, check that you are not defining or declaring datatypes (or dimensions) in header files that you use instead of opening. Each time you use a given file, it creates new versions of the datatypes or dimensions inside it. To avoid it, open the file instead; this creates unique stamps for the datatype (or dimension), which it records in a file of the same name, with .mlx at the end. This will work only if your header file can be compiled separately, so be prepared to modularize your code.

If the above does not apply, it may happen that your .ml files have inconsistent modification dates. The module system always tries to recompile a .ml file when the .ml file appears to be newer than the corresponding .mlx file. Therefore, if the last modification date of the .ml file is some future date, it will always recompile it, as many times as it is opened; and this leads to the same problem as above. A quick fix is to set the modification date manually (with touch on Unix, or setdate on Amigas; there’s probably a public-domain utility to fix this on Macintoshes, but I don’t know). In any case, there’s probably something wrong with the way the date is set up on your system, and it’s worth having a look at it.

5.9 Reporting Bugs, Making Suggestions

This is an alpha revision of HimML. This means that I do not consider it as a distributable version. This means that I deem the product robust enough to be given only to my friends, counting on their comprehensive support, mostly as far as bugs are concerned. This also means that I want some feedback on the usability of the language, and on reasonable ways to improve the implementation.

To help me improve the implementation (and possibly the language, though I am not eager to), you can submit a note to the person in charge of maintaining the system (type #maintenance features at the toplevel to know who, where and when). The preferred communication means is electronic mail, but others (snail-mail notably) are welcome. If you think you have found a bug in HimML, or if you want something changed in HimML, you should send the person in charge a message that should contain: