The h1 Tool Suite

Jean Goubault-Larrecq
LSV/UMR 8643, CNRS, ENS Cachan & INRIA Futurs projet SECSI
61 avenue du président-Wilson, F-94235 Cachan Cedex
goubault@lsv.ens-cachan.fr
Phone: +33-1 47 40 75 68   Fax: +33-1 47 40 75 21

Abstract: This describes the theoretical basis and practical usage of the h1 tool suite. This is a set of tools that allow one to handle tree-regular languages in various formats, including deterministic, non-deterministic, and alternating finite tree automata, but also various fairly general clausal formats, the central one being the H1 class due to Nielson, Nielson, and Seidl. Alternatively, this can also be seen as a terminating automated theorem prover for the H1 class; or as an automated theorem prover for general clause sets, which however makes some controlled mistakes in the spirit of abstract interpretation: this is notably useful in proofs of security protocols. Other aspects of the h1 tool suite include producing certain forms of automated proofs by induction in the Coq proof assistant, deciding Presburger arithmetic, and displaying tree automata.

1  Introduction

The h1 tool suite is a toolchest for handling finite tree automata, in various forms. There are basically three forms, from most constrained to least constrained:
  1. as deterministic bottom-up tree automata;
  2. as alternating tree automata;
  3. as pure Prolog programs (sets of Horn clauses) in the H1 class [2002  (Nielson et al.)].
The global architecture is given in the following figure. Some things are still missing from it, and we will add them progressively later. The inner dashed box forms the core of the h1 tool suite. The outer dashed box is the h1 tool suite itself.

The main thing to understand is that the h1 tool suite includes tools to convert between all three formats of tree automata, forming the core of the h1 tool suite. The tools h1, pldet, auto2pl, pl2tptp are used to navigate between all three formats.

In the h1 tool suite, deterministic tree automata are represented in files in XML format.

Alternating tree automata are represented as Horn clauses, in Prolog notation; in particular, you can run them in any Prolog implementation (whatever the use of this may be).

Finally, H1 clause sets are represented in TPTP format. TPTP (a Thousand Problems for Theorem Provers) is a publicly available repository used to test automated theorem provers, due to [2002  (Suttner and Sutcliffe)]. The h1 tool suite handles TPTP input files that contain only clauses. This allows you to use any automated, clausal theorem prover in place of the h1 prover if you so wish. The h1 prover, one of the tools of the h1 tool suite, is an automated theorem prover. In addition to searching for proofs, it is also able to produce counter-models and describe them as alternating tree automata; h1 is described in Section 4.

Additional tools operate on such counter-models and proofs. They form the rest of the h1 tool suite. As far as proofs are concerned, the h1 tool keeps a trace of all proof steps it did in a trace log. This trace log is in a proprietary format that the h1trace and h1logstring tools can work on. (Don't assume anything on this format, it may change in the future.) This is pictured in the following figure.



The purpose of h1trace is to explain the proofs found by h1, both to humans, as a proof in natural deduction in text format, and to machines, as a formal proof, in Coq notation. Coq is a proof assistant developed in the LogiCal team at INRIA Futurs [1999–2003  (Barras et al.)].

The h1 tool is also able to work on general clause sets, not just in the H1 format. In this case, h1 will output a proof candidate, which however may fail to be a proof. In case this is a wrong proof, h1trace will do its best to explain the wrong proof to humans. This can be used to design a true proof.

In case a prover does not find a proof of some given query, you usually have no choice but to trust it that there is indeed no proof. In case h1 does not find a proof of some given query, it outputs a model. This model can be independently checked by the h1mc model-checking tool; h1mc takes as input both a description of the model M, as an alternating tree automaton, and a trace log obtained from some given clause set S, and checks whether the clauses in S all hold in the model M. This is a way of getting confident that there is indeed no proof of the initial query, i.e., that your query is wrong. More useful is the fact that h1mc can give you an explanation, based on M, why you query is wrong. As with h1trace, this explanation can be made for humans, in text format, or in Coq format, to be checked independently by the Coq proof assistant. This is important in security protocols for instance, where a proof of secrecy consists in showing that there is no proof of the fact that the intruder can get hold of a given secret. This was shown by [2001  (Selinger)], see [2004  (Goubault-Larrecq)]. Adding h1mc gives you the following picture of the h1 tool suite.



The h1mc tool is described in Section 6.

Additionally, various tools are provided. First, there are two conversion utilities, auto2pl, which converts deterministic tree automata in XML format into Prolog format (Section 8); and pl2tptp, which converts alternating tree automata in Prolog format into TPTP format (Section 10). Since every deterministic tree automaton is a particular case of an alternating tree automaton, and every alternating tree automaton is a particular case of an H1 clause set (see Section 2), these two utilities entail no loss of information. In particular, to convert deterministic tree automata into H1 clause sets, just run auto2pl on the former, getting an intermediate file which you then convert to H1 clause sets using pl2tptp.

Then, there are miscellaneous utilities. The first, plpurge, extracts the part of a tree automaton that is relevant to certain final states. In other words, it reads a tree automaton in Prolog format, and eliminates all states and transitions that do not reach any final state. This cleaning step is useful to help understand the structure of automata computed by h1, in particular as a preparation to calling pl2gastex. The plpurge tool is described in Section 9. The second, tptpmorph, applies certain kinds of morphisms to languages represented as H1 clause sets. This is described in Section 11. The need for it should become apparent in conjunction with linauto, which implements [1996  (Boudet and Comon)]'s algorithm for deciding quantifier-free Presburger formulae by converting them to automata. While linauto only deals with quantifier-free formulae, existential quantification can be dealt with by using tptpmorph, and universal quantification can be implemented using existential quantification and complementation. Complementation can be implemented by determinization (using pldet) and using auto2pl with the -negate option. The linauto tool is described in Section 12.

Finally, pl2gastex is a utility that converts alternating tree automata to a sequence of gasTEX macros, which can then be used in conjunction with LATEX to display them on screen or include them in documents. A former attempt of a tree automaton visualizer is provided with autodot. The latter converts deterministic tree automata to files in dot format; dot can then be applied to these files to display them graphically. This is only useful for relatively small automata, and is far from perfect for tree automata that are not just word automata. Don't expect too much from pl2gastex and autodot on anything else but small automata. The autodot utility is obsolescent: basically, it never gives satisfying output. The pl2gastex utility is more satisfactory, but not perfect. See Section 13.

At this point the picture of the h1 tool suite is complete—up to the fact that I did not talk about some minor extensions yet, and that more extensions may be added in the future.



2  Tree Automata, Clauses

Automata on finite words are a very classical data structure to represent sets of words—possibly infinite sets. Not all sets of words can be described this way; such special sets are called regular languages. Regular languages are reasonably expressive, and all the basic operations are computable on them: testing for membership, for vacuity, computing unions, intersections, complements notably.

An example of automaton is shown below. This is just a directed graph. Vertices are traditionally called states, and edges are called transitions. Two other components are needed. First, we need an initial state: in the example, we assume this is qinit. Second, we need final states. The convention is that final states are circled; here, the only final state is qinit.


Let us play a game. You start at the initial state qinit, then you must follow some transitions (whichever you wish, you may even repeat the same transition as long as you wish), until you reach a final state. While traveling around, collect the letters that label the transitions: this gives you a sequence of letters, i.e., a word. (One exception, though: the є symbols means “no letter”, and you should just go through the corresponding transition without collecting any letter at all.) The fact that you reached the final state is how you decide that the automaton recognized the word.

For example, starting from qinit, you may go up the a transition, then come back along the є transition. Since you've reached a final state, namely qinit (which is both initial and final), the word a is recognized by the automaton. Or you may have decided to start from qinit, go up the a transition, turn once inside the b loop, then come back through the є-transition: the word ab is recognized. In fact, you might have looped as many times as you wished, so any word abn, n in |N, is recognized, too.

There is no need to stop when you reach a final state. For example, we may travel along a, then b twice, then along є, then again along a, then b three times, then є, showing that ab2ab3 is recognized. While we are at it, since qinit is already final, the empty word (with no letter) is also recognized. We write є for the empty word, as is traditional.

To wrap up the example, the above automaton recognizes exactly the language (a b*)*, that is the set of all words on the alphabet {a,b} which are concatenations of words a bn, n in |N. It turns out that this is just the set of words that, if non-empty, start with a.

Alternatively, we can describe this same set of words as a set of Horn clauses, i.e., as a very simple Prolog program. To this end, create a fresh predicate symbol q for each state q. The meaning of q(t) is that the word t should be recognized at q, i.e., there is a trip along the transitions in the automaton, starting from the initial state, and ending exactly on the state q, along which the letters collected form the word t. We also encode words as terms: є will be a constant denoting the empty word, and we add each letter ℓ at the end of word t by writing the term ℓ (t); this means in particular that each letter is now viewed as a unary function symbol. The automaton above then gets described as a Prolog program with one clause for each transition, plus one to say that the empty word is recognized at the initial state:
qinit (є)   q1 (a (X)) :- qinit (X)
q1 (b (X)) :- q1 (X)   qinit (X) :- q1 (X)
You may ask Prolog whether the word ab2 is recognized at qinit by submitting the query
?qinit (b (b (a (є))))
and it will answer “yes”. Prolog will also answer the query
?qinit (b (b (a (b (є)))))
by “no”, meaning that bab2 is not in the language.

Prolog, as a notation, is fine, as we shall see. Prolog, as a tool to check properties, is ill-suited: on the more complex clauses we shall encounter below, Prolog would loop infinitely in general; h1 on the other hand is meant to return, always, on its input clauses.

The notion of automata, on words, can be generalized to automata on first-order terms, a.k.a., tree automata. These are very similar structures, except they recognize sets of ground terms. The automaton of Figure 1, to take an example, recognizes the set of all lists of even natural numbers at state qlisteven. To be precise, it recognizes the set of all terms cons (t1, cons (t2, …, cons (tn, nil) …)), where each ti is of the form Sni (O), ni even. Note that the transition O (up left) starts from no state, while the transition cons (_, _) (middle) starts from a pair of states, qeven and qlisteven.



Figure 1: Recognizing the lists of even natural numbers



To define the semantics of tree automata, the simplest is just to describe their translation to Horn clauses. Each transition again gives rise to exactly one clause:
qeven (O) qeven (S (X)) :- qodd (X) qodd (S (X)) :- qeven (X)
  qlisteven (cons (X, Y)) :- qeven (X), qlisteven (Y) qlisteven (nil)
There is no need to define initial states in tree automata; e.g., O is recognized at qeven, using the transition qeven (O), of arity 0.

In general, a tree automaton is any finite set S of clauses of the form
  P (f (X1, …, Xn)) :- P1 (X1), …, Pn (Xn)     (1)
where X1, ..., Xn are pairwise distinct variables. When n=0, we retrieve initial clauses such as qeven (O). When n is restricted to be at most 1, tree automata are just ordinary, word automata (without є-transitions).

A set of ground terms is called regular if and only if it is exactly the set of terms t such that t is recognized at P in some tree automaton S (i.e., such that P (t) follows logically from the clauses in S).

The format of clauses (1) is very particular. First, clauses (1) are definite clauses. Formally, definite clause are implications of the form
  P (t) :- P1 (t1), …, Pn (tn)     (2)
where P (t), P1 (t1), ..., Pn (tn) is an unordered set of atoms. (An atom is just a predicate P applied to some term t; read “P holds of t”, or “t is recognized at P”.) If n=0, this is called a fact, and is often written just P (t).

Any set S of definite clauses has a least Herbrand model. A Herbrand model is just a collection of ground atoms P (t). The least Herbrand model H (S) of S can be described as follows. First, it contains all ground instances of the facts in S. Then, while there is a ground instance P (t σ) :- P1 (t1 σ), …, Pn (tn σ) of a clause P (t) :- P1 (t1), …, Pn (tn) in S, and P1 (t1 σ), ..., Pn (tn σ) are in H (S), then add P (t σ) to H (S). This procedure does not terminate in general, but enumerates H (S).

In particular, any tree automaton has a least Herbrand model. It turns out that the ground atom P (t) is in the least Herbrand model H (S) if and only if t is recognized at P in the tree automaton S. Therefore, we may generalize the notion of a term t being recognized at some predicate P in any set of definite clauses, by requiring that P (t) is in H (S).

A goal clause is an implication of the form
  false :- P1 (t1), …, Pn (tn)     (3)
where false is meant to denote false. A Horn clause is a definite or a goal clause. We shall also write P (t) instead of P (t) :- when n=0, and false instead of false :- .

Not any set S of Horn clauses has a model. If it has one, that is, if S is satisfiable, then it again has a least Herbrand model H (S). Again, we say that t is recognized at P in S if and only if P (t) is in H (S).

Then it can be shown that the ground term t is recognized at P in the set S of definite clauses if and only if S plus the goal clause false :- P (t) is unsatisfiable (i.e., not satisfiable); that P is empty in S, i.e., that P recognizes no term in S, if and only if S plus the clause false :- P (X) is unsatisfiable, where X is a variable. And there are automated means, called automated theorem provers, to check the unsatisfiability of clause sets. Unfortunately, they do not always terminate. The h1 tool always terminates, but only deals with so-called H1 clauses [2002  (Nielson et al.)].

We have said above that tree automata clauses (1) were very particular, because they were definite clauses, and in particular they always have a least Herbrand model. They are also particular in that the head (the atom at the left of :- ) is restricted to be of the form P (t) with t itself of the form f (X1, …, Xn), where X1, ..., Xn must be distinct variables; and in that there is no function symbol at all in the body (the set of atoms at the right of :- ).

If you do not restrict the form of Horn clauses (2) and (3), then any prover that operates on them is forced to relinquish either termination, soundness, or completeness. This is because the satisfiability of Horn clauses is undecidable [1996  (Devienne et al.)]. Most automated theorem provers in existence are sound, i.e., if they deduce a contradiction from S then S is unsatisfiable, and complete, i.e., if S is unsatisfiable then they can derive a contradiction from S. Therefore they have to fail to terminate sometimes. On the other hand, h1 is complete and terminates, but is only sound on the subset of so-called H1-clauses.

The H1-clauses are exactly the Horn clauses, except that definite clauses (1) are restricted to have a head of the form P (X), where X is a variable, or P (f (X1, …, Xn)), where X1, ..., Xn are distinct variables. While this is not the definition [2002  (Nielson et al.)], this is equivalent to it, see [2005  (Goubault-Larrecq)].

On general Horn clauses, h1 applies an abstraction function which makes it still a terminating and complete prover, but one which is unsound in general. So you might want to see h1 as a counter-model finder rather than a prover. In some cases, though, h1 produces unsound “proofs” that may be indicative of actual proofs.

Remember that h1 is sound, complete, and terminating on H1 clauses. Note that automata clauses are a special case of H1 clauses. In fact, when h1 terminates, starting from a set S of H1 clauses, there are two possible outcomes: In the latter case, h1 also produces a model of S, in the form of an alternating tree automaton, i.e., a set of clauses of the form
  P (f (X1, …, Xn)) :- B1 (X1), …, Bn (Xn)     (4)
where Bi (Xi) is a block, i.e., a list of atoms Pi1 (Xi), …, Pini (Xi). Note that tree automata clauses are a special case of alternating tree automata clauses (take ni=1 for each i), while alternating tree automata clauses are special H1 clauses (with no function symbol in the body, the head is of the form P (f (X1, …, Xn)) and not P (X), and every variable free in the body is among X1, …, Xn).

From a theoretician's perspective, this means that H1 clauses are not more expressive than alternating tree automata clauses. And it is well-known that alternating tree automata are not more expressive than plain tree automata: the languages defined by satisfiable H1 clause sets are just, again, the regular tree languages. However, H1 offers considerably more freedom in describing such languages than just using tree automata, because of the general form of H1 clauses.

The paper that introduced H1 is [2002  (Nielson et al.)]. For some background theory on the way h1 works, deciding H1 and converting H1 clause sets to tree automata by resolution techniques, and abstracting general clause sets to H1 clause sets, see [2005  (Goubault-Larrecq)].

3  The h1 Tool Suite through Examples

The h1 prover is the core of the h1 tool suite, and we shall explain the tool suite by running h1 on several examples.

The h1 prover is invoked by calling h1 with a sequence of flags, ended by a file name. The file name should contain a set of clauses in TPTP clause format. Such files conventionally end with the .p extension—but there is no obligation. Also, giving a single dash - as file name forces h1 to read the input clause set from standard input.

3.1  A Toy, Introductory Example

Examples are given in the distribution package. Here is a very small one (file test1.p):



This contains three clauses. Each clause is introduced by the keyword input_clause. The first argument, clause1, clause2, or clause3 above, is the name of the clause. Names are used in sundry ways, mainly for explanation and documentation purposes. It is good practice to give each clause a different name, but the tools of the h1 tool suite should work even when several clauses have the same name.

The second argument can be the keyword conjecture or axiom; h1 just does not care: write what you prefer here.

Finally, the third argument, enclosed between square brackets, is the clause itself. It is a list of literals, separated by commas. Each literal starts with a sign, ++ for positive literals, -- for negative literals. The clauses above are, in a more traditional notation:
  p(a)  
  ¬ p (X) or p (f (X))  
  ¬ p (f (f (X)))
This example is in fact a set of Horn clauses, which would be written in Prolog notation:
p (a)    
p (f (X)) :- p (X)
false :- p (f (f (X)))


Launch h1 on this file, test1.p, by typing
h1 test1.p
You will get the answer
*** Derived: clause3 ***
which means that a contradiction was found, using the clause named clause3 in the last step. In other words, in the present example, it says that a fact of the form p (f (f (t))), matching the body of clause3, can be deduced from the definite clauses (here, clause1 and clause2)

So far, so good, this is typically the least you could expect from a theorem prover.

However, h1 also produced two more files, test1.log and test1.model.pl. If you're curious, look at test1.log, by running
less test1.log
Oh well, it is long, but it is not meant to be read by a human reader! If you're perspicuous enough, you'll find some meaning buried inside this. However, this is really meant as a log file, from which h1trace can extract a (mostly) readable proof (Section 5), and which h1mc can use to get some essential information it needs (Section 6).

3.2  The Dreadsbury Mansion Murder Mystery Example

Here is a more complicated example, due to Len Schubert. This is problem 55 of [1986  (Pelletier)].
Someone in Dreadsbury Mansion killed Aunt Agatha. Agatha, the butler, and Charles live in Dreadsbury Mansion, and are the only ones to live there. A killer always hates, and is no richer than his victim. Charles hates noone that Agatha hates. Agatha hates everybody except the butler. The butler hates everyone not richer than Aunt Agatha. The butler hates everyone whom Agatha hates. Noone hates everyone. Who killed Agatha?
The problem is formalized in file butler-puzzle.p. The clauses are as follows. First, “Agatha, the butler, and Charles live in Dreadsbury Mansion, and are the only ones to live there.”:
agatha_in_mansion in_mansion (agatha)    
charles_in_mansion in_mansion (charles)    
butler_in_mansion in_mansion (butler)  
Then, “A killer always hates, and is no richer than his victim.”, that is, if X killed Y, then it must be the case that X hates Y, and on the other hand that X is not richer than Y:
killer_hates_victim hates (X,Y) :- killed (X,Y)
killer_no_richer not_richer (X,Y) :- killed (X,Y)
Next, “Charles hates noone that Agatha hates.” In other words, it is impossible that Charles and Agatha hate the same person X:
charles_hates_noone_agatha_hates false :- hates (charles,X), hates (agatha,X)
To write “Agatha hates everybody except the butler.”, we just say that Agatha hates herself and Charles:
Agatha_hates_herself hates (agatha, agatha)    
Agatha_hates_charles hates (agatha, charles)  
Now, “The butler hates everyone not richer than Aunt Agatha.”:
butler_hates_everyone_not_richer_than_agatha hates (butler, X) :- not_richer (X, agatha)
Then, “The butler hates everyone whom Agatha hates.”:
butler_hates_everybody_agatha_hates hates (butler, X) :- hates (agatha, X)
Next, “Noone hates everyone.”, which we formulate as “noone hates Agatha, Charles, and the butler”:
noone_hates_everyone false :- hates (X,agatha), hates (X,butler), hates (X, charles)
Finally, we explore who may have killed Aunt Agatha. To do this, we shall enumerate the potential murderers, and use the C preprocessor cpp to replace the macro identifier WHO in the following clause by each resident of Dreadsbury Mansion:
WHO_killed_agatha killed (WHO, agatha)
Let us test whether the butler killed Agatha. Run butler-puzzle.p through cpp with WHO equal to butler, and feed the output to h1; h1 reads from standard input when given - as file name:
cpp -P -DWHO=butler butler-puzzle.p | h1 -
You get the output:
*** Derived: noone_hates_everyone ***
In other words, the butler cannot have killed Agatha (contrarily to proper conventions in popular whodunnit mysteries), because this would contradict the fact that noone hates everyone.

More information can be gotten from the trace file h1out.log. (Running h1 on file <file>.p produces a trace file <file>.log. If no input file is given, as here, the trace file is called h1out.log.) We shall explain how to use this trace file in Section 5. For now, just run
h1trace h1out.log >dummy
and open the file dummy. You'll see that the first lines say:
. #false(noone_hates_everyone) [noone_hates_everyone]. using assumption #false(noone_hates_everyone) :- hates(X1,charles), hates(X1,butler), hates(X1,agatha). {X1=butler}
In other words, assuming the butler killed Agatha would involve that the butler hates everyone, which is impossible. The rest of file dummy is a tree-like proof that indeed the butler hates everyone, i.e., hates Agatha, Charles, and himself in this case.

So did Charles kill Agatha instead?
cpp -P -DWHO=charles butler-puzzle.p | h1 -
No, this would contradict the fact that Charles hates noone that Agatha hates. In this case, Charles hates Agatha, and Agatha hates herself, whence the contradiction.
*** Derived: charles_hates_noone_agatha_hates ***


There is only one possibility remaining: that Agatha killed herself.
cpp -P -DWHO=agatha butler-puzzle.p | h1 -
and indeed, h1 does not complain: assuming Agatha killed herself leads to no contradiction. The file h1out.model.pl describes the least model of this clause set.

This could have been found automatically by a simple sh script, listing all possible murderers of Aunt Agatha.
for who in butler charles agatha
do
  if (cpp -P -DWHO=$who butler-puzzle.p | h1 - 2>&1\
                                        | grep -q Derived)
  then echo $who did not kill agatha.
  else echo $who may have killed agatha.
  fi
done
Since somebody killed Agatha, it must be herself.
butler did not kill agatha. charles did not kill agatha. agatha may have killed agatha.


The point of this example is that, first, the clauses are clearly not (alternating tree) automata clauses; and second, that they are H1 clauses. Check this by running h1 with the -check-h1 2 option:
cpp -P -DWHO=agatha butler-puzzle.p | h1 -check-h1 2 -
This makes h1 run as above, except it would have failed if any of the input clauses where not in H1. By default, h1 runs as h1 -check-h1 0, which does not check anything, but computes an approximation, see Section 3.4.

That the clauses of butler-puzzle.p are in the H1 class may seem surprising. After all, H1 clauses are required to use only unary predicates, i.e., all predicate letters P can only take one argument. This is certainly not the case of the hates, killed, and not_richer predicates above, which are all binary!

The trick here is that, when h1 sees an n-ary predicate P (t1, …, tn) in its input, it converts it first to P (fP (t1, …, tn)) for some fresh n-ary function symbol fP. (Different occurrences of P correspond to the same symbol fP.) This makes P unary, and does not change the semantics of clauses in any essential way. Under the hood, h1 typically builds fP by prepending a sharp sign # in front of the name of P, guaranteeing that no clash occurs with any function symbol you may have used. Don't count on it, though, as this may change in future releases. Also, the h1 tools try to hide this kludge as much as they can, and will happily parse and print n-ary predicate symbols.

We shall return to this example in Section 3.3.8.

3.3  Computing with Tree Automata

Assume that we wish to compute the intersection of the languages L1 of all lists of even natural numbers, and L2 of all trees with binary nodes labeled with cons, and whose leaves are either nil or natural numbers of the form 3n+2, n in |N.

We have already seen what an automaton recognizing L1 looked like, see Figure 1. In TPTP format, this is file listeven.p:



The language L2 is described by the predicate (final state) tree_3n_plus_2. Look at file tree3plus2.p:



By the way, we can convert any automaton in TPTP format into Prolog format by running h1 with the -no-trim option:
h1 -no-trim tree3plus2.p
Normally, h1's default is to use the -trim option, which trims away all clauses that are obviously not needed for deriving a contradiction. (See Section 4.1 for more information on -trim and -no-trim.) In this case, trimming would just eliminate all clauses! Since we are not looking for a contradiction, we just run h1 without trimming, and get a model in tree3plus2.model.pl:



In other words, h1 can be used to convert any set of H1 clauses into an equivalent alternating tree automaton by running it with the -no-trim option and looking into the generated model file, ending in .model.pl.

3.3.1  Visualizing Tree Automata

Before we compute the intersection of L1 and L2, let us visualize the automaton defining L2. This is accomplished using pl2gastex, see Figure 2. For more information on pl2gastex, and how to read such pictures precisely, see Section 13.



Figure 2: Trees with leaves equal to nil or to 3n+2, n in |N



3.3.2  Computing Intersections of Tree Automata

Now compute the intersection. Build a file, say list_even_inter_tree3plus2.p, by concatenating the clauses from listeven.p and from tree3plus2.p, and add the so-called intersection clause
q (X) :- list_even (X), tree_3n_plus_2 (X)
meaning that q holds of all terms that are both lists of even numbers, and trees as recognized at tree_3n_plus_2. Just run the following commands:
OUT=list_even_inter_tree3plus2
cat listeven.p tree3plus2.p >$OUT.p
echo "input_clause ($OUT, axiom, \
        [++q(X), --list_even(X),\
        --tree_3n_plus_2(X)])" >>$OUT.p
Because of the intersection clause above, the resulting clause set is not an alternating tree automaton as we have defined it. However, run h1 -no-trim on it:
h1 -no-trim list_even_inter_tree3plus2.p
and look at the generated alternating tree automaton. This is obtained, as usual, in a file named list_even_inter_tree3plus2.model.pl:



Graphically, this is the alternating tree automaton of Figure 3.



Figure 3: Trees with leaves equal to nil or to 3n+2, which are lists of even natural numbers at the same time



Note the presence of new nodes labeled ∧ in Figure 3. They represent intersections of languages. Indeed, there are two ways one can construct an element of q. First, there is nil. Second, there are terms of the form cons applied to two arguments X1 and X2, where X1 is recognized both at tree_3n_plus_2 and at even, and X2 is recognized both at tree_3n_plus_2 and at list_even.

The alternating tree automaton above recognizes (at q) the terms in the intersection of L1 and L2. One should observe that computing intersections of two languages by concatenating the clause sets defining each and adding an intersection clause is rather cavalier. It is only correct here because the two files listeven.p and tree3plus2.p share no predicate symbol. In general, one might allow shared predicate symbols, provided they have the same semantics in each file. For example, it is legal for the two files to both use the predicate even provided it denotes the set of even natural numbers in both. Otherwise, strange things may happen (an over-approximation will be computed).

3.3.3  Checking Tree Automata for Emptiness, Testing Membership

It may not be completely obvious whether such an alternating tree automaton is empty or not. (To say the least. The problem is DEXPTIME-complete.) Let us see whether the intersection state q is empty or not. In general, given a satisfiable set S of Horn clauses (e.g., definite clauses, in particular alternating tree automata clauses), the language of terms recognized at state P in S is empty if and only if S plus the clause false :- P (X) is satisfiable. Running:
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(q_is_not_empty, conjecture, [--q(X)]).")\
  | h1 -log-out - >list_even_inter_tree3plus2.log
yields
*** Derived: q_is_not_empty ***
meaning that there are indeed terms recognized at state q in the intersection.

We have kept a trace of the derivation in the log file list\_even\_inter\_tree3plus2.log. We can then use h1trace to get a mostly readable proof of the fact that q is non-empty; in particular, to have an example of a term recognized at q:
%-*-mode:outline;outline-regexp:"[0-9a-z.]+"-*-
. #false(q_is_not_empty).
  using assumption #false(q_is_not_empty) :- q(X1).
   {X1=nil}
1. q(nil).
    using assumption q(X1) :- tree_3n_plus_2(X1), list_even(X1).
     {X1=nil}
1.1. tree_3n_plus_2(nil) by assumption.
1.2. list_even(nil) by assumption.
In fact, the empty list nil is recognized at q ({X1=nil} at line 4 above). How to read such proofs will be explained in Section 5.

Let us test membership of some ground term. Is the list cons (s (s (o)), nil) consisting of just the natural number 2 in the intersection? In general, given S as above, a ground term t is recognized at state P if and only if S plus the clause false :- P (t) is satisfiable.
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(q_rec_cons_2_nil, conjecture,\
         [--q(cons(s(s(o)),nil))]).")\
  | h1 -
yields
*** Derived: q_rec_cons_2_nil ***
So cons (s (s (o)), nil) is in the intersection.

On the other hand, the list cons (s (s (s (s (o)))), nil) containing just the natural number 4 is not in the intersection. Run
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(q_rec_cons_4_nil, conjecture,\
         [--q(cons(s(s(s(s(o)))),nil))]).")\
  | h1 -
and you'll get (no message at all). Indeed, cons (s (s (s (s (o)))), nil) is a list of even natural numbers, but not a tree whose numeric leaves are of the form 3n+2, n in |N.

We can do more this way. Is there a term of the form cons (s (X), X), with the same X, in the intersection? Run
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(q_rec_cons_sX_X, conjecture,\
         [--q(cons(s(X),X))])") \
  | h1 -
and you'll get (no message at all). So there is none.

Is there a list whose first element is at least 3 in the intersection? Run
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(q_rec_cons_sssX_Y, conjecture,\
         [--q(cons(s(s(s(X))),Y))])") \
  | h1 -
and you'll get
*** Derived: q_rec_cons_sssX_Y ***
So there is one. We don't know which. However, we may use h1trace as above to have an idea (left as an exercise!).

Another possibility to have an idea of which lists whose first element is at least 3 in the intersection is to build the automaton recognizing all solutions, by running
(cat list_even_inter_tree3plus2.p;\
 echo "input_clause(what_q_rec_cons_sssX_Y, conjecture,\
         [++r(X,Y), --q(cons(s(s(s(X))),Y))])") \
  | h1 -no-trim -
mv h1out.model.pl rinter.model.pl

The resulting automaton, in file rinter.model.pl, is:



And graphically, this is the automaton of Figure 4.



Figure 4: Lists starting with a number at least 3 as recognized in state q of Figure 3



3.3.4  Converting Alternating to Non-Deterministic Tree Automata

All right, this starts being a tad intricate. In general, alternating tree automata are not that easy to read. We may eliminate intersection nodes ∧, and get a non-deterministic tree automaton instead by using the -no-alternation option to h1. Either use
h1 -no-trim -no-alternation
instead of h1 -no-trim (this will produce a non-deterministic, i.e., not an alternating tree automaton, in rinter.model.pl), or simply run
h1 -no-trim -model
on rinter.model.pl to eliminate intersection nodes. Just as
h1 -no-trim
can be used to conv ert an H1 clause set into an equivalent alternating tree automaton,
h1 -no-trim -no-alternation
converts an H1 clause set, or an alternating tree automaton, into an equivalent non-deterministic tree automaton. Run
pl2tptp rinter.model.pl >rinter_nd.p
h1 -no-trim -no-alternation rinter_nd.p
This yields the following automaton in file rinter_nd.model.pl.



Using pl2gastex on the output rinter_nd.model.pl, we arrive at the non-deterministic tree automaton of Figure 5. This should be more readable. The final state is q. Note that the result still contains two copies of the automata recognizing respectively all lists of even numbers, and all trees with leaves of the form nil or 3n+2, which are not needed any longer.



Figure 5: Eliminating intersection nodes from Figure 4



The automaton of Figure 5, i.e., in file rinter_nd.model.pl, uses new states such as __inter_even_and_one__mod__3 (which recognizes all terms which are both even numbers and numbers of the form 3n+1). In general, these new states are named
__inter_P1_P2__Pn
and are meant to recognize all terms that are recognized at P1 and at P2 and ... and at Pn at the same time. They appear as P1P2 ∩ … ∩ Pn under pl2gastex.

3.3.5  Purging Tree Automata

Well, Figure 5 should be more readable... but there is some junk here. First, there are two sub-automata, disconnected from the rest, defining the predicates zero_mod_3, one_mod_3, two_mod_3, tree_3n_plus_2, and odd, even, list_even. They do not contribute at all to the definition of the language of r. Second, there are also spurious states such as q, or __aux_1 (drawn as a small state or ). Use plpurge to purge the automaton of Figure 5 from all spurious states, by running
plpurge -final r rinter_nd.model.pl >rinter_nd.purged.pl



Figure 6: Purging the automaton of Figure 5



Hence we see that the relation r is simply the relation relating all numbers that are both odd and equal to 2 modulo 3 (state oddtwo_mod_3) to all objects that are both lists of even numbers and trees of with leaves equal to nil or 3n+2 (state list_eventree_3n_plus_2).

Looking a bit more in depth, the numbers that are both odd and equal to 2 modulo 3 are 5, 11, 17, ..., in other words the numbers that are equal to 5 modulo 6. And the objects that are both lists of even numbers and trees of with leaves equal to nil or 3n+2 are just lists of numbers equal to 2 modulo 6.

3.3.6  Determinizing Tree Automata

Looking at Figure 6, we realize that taking the successor, i.e., applying the s function to a term recognized at oddone_mod_3, yields a term that is recognized both at eventwo_mod_3 and at eventree_3n_plus_2. This is a form of non-determinism: we may want to travel to either state, not knowing which will eventually lead to acceptance.

To cater for this, we may determinize our tree automata. This produces an equivalent deterministic tree automaton, i.e., a set of Horn clauses of the form
  P (f (X1, …, Xn)) :- P1 (X1), …, Pn (Xn)     (5)
where X1, ..., Xn are pairwise distinct variables, and where there is at most one such clause for each (n+1)-tuple (f, P1, …, Pn). The automaton of Figure 6 is not deterministic because it contains the two clauses



By definition, a deterministic tree automaton can also be seen as a partial function If from tuples of predicates to predicates, one for each f. I.e., If (P1, …, Pn) = P if there is a, necessarily unique, clause of the form (5).

To determinize the automaton rinter_nd.purged.pl of Figure 6, run
pldet rinter_nd.purged.pl >rinter_d.xml
This produces a deterministic tree automaton in file rinter_d.xml:



The format will be explained in more detail in Section 7.

We can then convert this deterministic tree automaton into Prolog notation, since every deterministic tree automaton is a particular case of a non-deterministic tree automaton (itself a particular case of an alternating tree automaton). Use auto2pl this way:
auto2pl rinter_d.xml >rinter_d.pl
This produces a file rinter\_d.pl, which is probably slightly more readable than the XML file above:



Now draw the resulting automaton in Figure 7, using pl2gastex:
pl2gastex rinter_d.pl >rinterd.tex



Figure 7: Determinizing the automaton of Figure 6



The automaton of Figure 7 is not too big. But beware: determinizing tree automata may produce automata that are exponentially larger in the general case. In fact, pldet may just take forever on some alternating, or even non-deterministic tree automata.

3.3.7  Computing Unions, Transitive Closures

We have seen how to compute intersections of tree automata in Section 3.3.2. Computing unions is just as easy. Say that you want to compute the union of the sets of lists of even numbers (listeven.p) and of the trees whose leaves are nil or 3n+2, n in |N. That is, instead of computing the intersection of the languages L1 and L2 introduced at the beginning of Section 3.3, we compute their union. As before, build a file, say list_even_union_tree3plus2.p, by concatenating the clauses from listeven.p and from tree3plus2.p, but this time add the two clauses
q (X) :- list_even (X)
q (X) :- tree_3n_plus_2 (X)
so that the fresh state q recognizes the terms that are recognized by either the state list_even or by tree_3n_plus_2. Concretely, run the following commands:
OUT=list_even_union_tree3plus2
cat listeven.p tree3plus2.p >$OUT.p
echo "input_clause ("$OUT"_1, axiom, \
        [++q(X), --list_even(X)])." >>$OUT.p
echo "input_clause ("$OUT"_2, axiom, \
        [++q(X), --tree_3n_plus_2(X)])" >>$OUT.p
Now run h1 -no-trim as above:
h1 -no-trim list_even_union_tree3plus2.p
We get an equivalent alternating tree automaton in the file list_even_union_tree3plus2.model.pl:



Graphically, this is the alternating tree automaton of Figure 8, again obtained using pl2gastex.



Figure 8: Trees with leaves equal to nil or to 3n+2, or which are lists of even natural numbers



This can be determinized again. Run pldet and auto2pl :
pldet list_even_union_tree3plus2.model.pl \
  | auto2pl - >list_even_union_tree3plus2_d.pl
and you get



As you may see, the determinized automaton is much larger this time, and pl2gastex now has real trouble trying to draw it. (See Figure 9.) We use twopi as graph layout engine here instead of neato and run:
pl2gastex -v -layout twopi \
  -overlap false \
  list_even_union_tree3plus2_d.pl \
   >list_even_union_tree3plus2_model_d.tex



Figure 9: Determinizing the automaton of Figure 8



The trick we have used to compute intersections and unions, namely concatenating files and adding new clauses (provided the concatenated files agree on the semantics of predicates), can also be used to compute other combinations of tree languages. A particularly interesting one is the computation of transitive closures of relations defined by tree automata (or, more generally, by H1 clause sets).

For the purpose of illustration, imagine you want to compute the transitive closure of the binary relation r defined in rinter_d.pl (drawn in Figure 7, Section 3.3.6). Just create a fresh binary predicate symbol r+, and add the clauses
r+ (X, Y) :- r (X, Y)
r+ (X, Z) :- r (X, Y), r (Y, Z)
Concretely, run
OUT=rinter_d_plus
pl2tptp rinter_d.pl >$OUT.p
echo "input_clause (r_plus_r, axiom, \
        [++r_plus(X,Y), --r(X,Y)])." >>$OUT.p
echo "input_clause (r_plus_tc, axiom, \
        [++r_plus(X,Z), --r_plus(X,Y), --r_plus(Y,Z)])" >>$OUT.p
Compute an equivalent non-deterministic tree automata by
h1 -no-trim -no-alternation rinter_d_plus.p
You get



Since the clauses defining r+, i.e., r_plus, are the same as those defining r, the transitive closure of r is r+ itself here: r was already transitive. (Exercise: why?)

3.3.8  Complete Deterministic Tree Automata, Taking Complements

We observed in Section 3.3.6 that a deterministic tree automaton could be seen as a partial function If from tuples of predicates to predicates, one for each function symbol f.

A complete deterministic tree automaton is a set of Horn clauses of the form (5), i.e.,
P (f (X1, …, Xn)) :- P1 (X1), …, Pn (Xn)
where X1, ..., Xn are pairwise distinct variables, and where there is exactly one such clause for each (n+1)-tuple (f, P1, …, Pn) (instad of at most one such clause for incomplete automata).

Any deterministic tree automaton can be completed to a complete one, by adding a catch-all state __all (shown as ⊤ by pl2gastex), to which all missing transitions are directed. Precisely, if there is an (n+1)-tuple (f, P1, …, Pn) such that there is not clause as above, then add the clause
__all (f (X1, …, Xn)) :- P1 (X1), …, Pn (Xn)
This must also be done whenever any one of P1, ..., Pn is the catch-all state __all.

The function If is then total. The collection of all such If defines a finite model, whose set of values is that of all predicates. A value satisfies a predicate P if and only if it is P, seen as a value.

In fact, finite models and complete deterministic tree automata are exactly the same notion. You might want to ponder this.

As an example, let us return to the Dreadsbury mansion murder mystery (Section 3.2). As we have seen, the only to have possibly killed Aunt Agatha is Aunt Agatha herself. We have proved this by showing that the set of clauses in butler-puzzle.p (and explained in Section 3.2) with WHO defined as agatha was satisfiable.

Since this set is satisfiable, it has a model. Well, h1 computes such a model, in the guise of an alternating tree automaton. Run
cpp -P -DWHO=agatha butler-puzzle.p >agatha.p
h1 -no-trim agatha.p
and you'll get it in file agatha.model.pl (see Figure 10):




Figure 10: Why Agatha killed herself



All right, this does not look like a model at all (much less an explanation!), but remember there is a complete deterministic tree automaton that is equivalent to it. We know how to compute an equivalent deterministic tree automaton, using pldet and auto2pl, viz.
pldet agatha.model.pl | auto2pl - >agatha_d.pl
pl2gastex -v agatha_d.pl >agathad.tex
Using pl2gastex, as usual, produces a visual representation of it, see Figure 11;



Figure 11: Why Agatha killed herself, deterministically



So Agatha killed herself, Agatha is the only one not to be richer than Agatha, Agatha hates herself and Charles, the butler hates Agatha and Charles, and Charles hates noone.

Let us now produce the corresponding complete deterministic tree automaton. This can be done using auto2pl, which we have already seen, using the -complete 1 option. Be warned, though, that complete deterministic tree automata are large: with n states (included the catch-all states), any function taking k arguments will contribute nk clauses, never less—and do not forget that any k-ary predicate symbol P with k≥ 2 creates an invisible function symbol fP, which will contribute nk clauses as well. Anyway, run
pldet agatha.model.pl | auto2pl -complete 1 - | sort >agatha_c.pl
and you'll get the resulting complete deterministic tree automaton in file agatha_c.pl.

3.4  The Needham-Schroeder Symmetric Key Protocol Example

Try another example, nspriv.p, an encoding of the Needham-Schroeder symmetric key protocol, together with three queries. We do not mean that this is the only possible description of this protocol, and only use this example as a motivation for using h1 with more complex clause sets.

Here are the clauses of nspriv.p, in Prolog notation. We have shown the name of the clause on the left. First, we define a predicate knows that is meant to recognize all messages that a malevolent intruder is able to build. The first clauses say that attackers can build any list of messages provided it knows each message in the list, and conversely that it can build any message that appears at any position in a list it knows:
intruder_knows_nil knows (nil)    
intruder_can_take_first_components knows (M1) :- knows (cons (M1, M2))
intruder_can_take_second_components knows (M2) :- knows (cons (M1, M2))
intruder_can_build_pairs knows (cons (M1, M2)) :- knows (M1), knows (M2)
Lists such as [M1, M2, …, Mn] are encoded, very classically, as terms cons (M1, cons (M2, … cons (Mn, nil) …)), whence the clauses above. Using a binary symbol crypt to denote encryption, i.e., crypt (M, K) denotes the result of encrypting M using key K, we may also write the following two important rules, stating that the intruder can always encrypt any message it knows using any message it knows, used as a key; and conversely, that the intruder can always decrypt a message if he has the right key.
intruder_can_encrypt knows (crypt (M, K)) :- knows (M), knows (K)
intruder_can_decrypt_if_has_private_key    
  knows (M) :- knows (crypt (M, key (pub, K))),
      knows (key (prv, K))
intruder_can_decrypt_if_has_public_key    
  knows (M) :- knows (crypt (M, key (prv, K))),
      knows (key (pub, K))
intruder_can_decrypt_if_has_symmetric_key    
  knows (M) :- knows (crypt (M, key (sym, X))),
      knows (key (sym, X))
The last three clauses state how the intruder may decrypt a message of the form crypt (M, Z). We assume that keys come into three varieties, public keys of the form key (pub, K) where K is typically the name of the agent holding this public key; private keys of the form key (prv, K) where K is the name of the agent holding this private key; and symmetric keys of the form key (sym, X), where X is arbitrary. The last three clauses state that you may decrypt a message encrypted with a public key key (pub, K) provided you know the corresponding private key key (prv, X); that you may decrypt a message encrypted with a private key provided you know the corresponding public key; and that you may decrypt a message encrypted with a symmetric key provided you know the latter.

We also assume that there is an operation s that builds a new message s (M) from an old M, in bijection with M; while we can compute s (M) from M and recover M from s (M), the point is that M and s (M) always differ.
intruder_can_compute_successors knows (s (M)) :- knows (M)
intruder_can_compute_predecessors knows (M) :- knows (s (M))


In the Needham-Schroeder symmetric key protocol, Alice and Bob communicate with a trusted server to get a common private key that only they know, not the intruder. Alice can always start a session of the protocol and send the server a triple containing Alice's identity alice, Bob's identity bob, and a nonce, that is, a fresh message for this session. Following [2001  (Blanchet)], we encode this nonce as a function symbol applied to all parameters currently known, say noncea (alice, bob)—the function symbol noncea applied to Alice's identity alice and Bob's identity bob. Now we assume a worst-case intruder model, where any communication can be diverted by the intruder. The net effect is that, from a security viewpoint, what Alice does by sending a message consists exactly in making it known to the intruder:
alice_sends_message_1_to_server knows ( cons ( alice,
    cons ( bob,
    cons ( noncea (alice, bob),
      nil))))
If the server ever receives such a message, i.e., a message of the form cons (A, cons (B, cons (Na, nil))) for some arbitrary messages A, B, and Na (the server has no way of checking that Alice indeed sent the right message, and can only check the message it receives contains three fields), then it should send out (to Alice, but we have already seen this was irrelevant from a security point of view) the message crypt ([Na, B, Kab, crypt ([Kab, A], Kbs)], Kas), where Kab is some fresh key to be used by Alice and Bob, Kas is a long-term key shared between Alice and the server, and Kbs is a long-term key shared between Bob and the server.

Just like sending a message consists exactly in making it known to the intruder, receiving a message is modeled by stating that the intruder was able to build this message. We shall therefore write a clause saying that, if knows [A, B, Na] then knows  crypt ([Na, B, Kab, crypt ([Kab, A], Kbs)], Kas). Note that we are effectively saying that, from the angle of security, the actions of the server amount to adding new capabilities to the intruder: if the intruder knows a message matching what the server expects, it can build the message that the server will answer, even though it may not know the long-term keys Kas and Kbs.

Now we encode Kas as the term key (sym, cons (A, cons (server, nil))), and Kbs as the term key (sym, cons (B, cons (server, nil))); note that A and B are variables here, representing the fact that the server will find these keys by looking up tables by the identities of A, resp. B. In current sessions, we encode Kab by the term key (sym, current_session (A, B, Na)). The key Kab as generated during older sessions is encoded by the term key (sym, old_session (A, B, Na)). Separating current from old sessions means we have to write two clauses:
server_answers_A_with_encrypted_packet
  knows (crypt ( cons ( Na,
    cons ( B,
    cons ( key (sym, current_session (A, B, Na)),
    cons ( crypt (cons (key (sym, current_session (A, B, Na)),
                cons (A, nil)),
                key (sym, cons (B, cons (server, nil)))),
      nil)))),
    key ( sym, cons (A, cons (server, nil)))))
              :-    knows (cons (A, cons (B, cons (Na, nil))))
intruder_knows_previous_server_messages
  knows (crypt ( cons ( Na,
    cons ( B,
    cons ( key (sym, old_session (A, B, Na)),
    cons ( crypt (cons (key (sym, old_session (A, B, Na)),
                cons (A, nil)),
                key (sym, cons (B, cons (server, nil)))),
      nil)))),
    key ( sym, cons (A, cons (server, nil)))))
              :-    knows (cons (A, cons (B, cons (Na, nil))))


All right, now, when Alice receives the message from the server, she should send the part encrypted with Kbs to Bob. The idea is that while Alice can decrypt the whole message, which is encrypted with Kas, only Bob can decrypt the sub-message that is encrypted with Kbs. The message that Alice receives from the server should be crypt ([Na, B, Kab, crypt ([Kab, A], Kbs)], Kas), however she can only check that it is of the form crypt ([Na, B, Kab, Msg) for some sub-messages Kab and Msg (which may be totally unrelated to what the server actually sent). She can however check that Na is the nonce noncea (alice, bob) that she created earlier (see above, clause alice_sends_message_1_to_server), and that B really is Bob's identity bob. So Alice expects a message of the form crypt ([noncea (alice, bob), bob, Kab, Msg], Kas). As we have said before, for Alice to receive this message, the intruder must send it to Alice, so the intruder must be able to build it. Once Alice receives this, it extract the sub-message Msg and forwards it to Bob—in fact adding it to the set of messages known to the intruder:
alice_gets_server_message_and_forwards_submessage_to_bob
  knows (Msg) :-    knows (crypt ( cons (noncea (alice, bob),
      cons (bob,
      cons (Kab,
      cons (Msg, nil)))),
      key (sym, cons (alice, cons (server, nil)))))
We use an auxiliary predicate alice_key meant to recognize all possible values of Kab above. This is the key as seen by Alice.
alice_gets_server_message_and_stores_current_session_key
     alice_key (Kab)
    :-    knows (crypt ( cons (noncea (alice, bob),
      cons (bob,
      cons (Kab,
      cons (Msg, nil)))),
      key (sym, cons (alice, cons (server, nil)))))


Let's see what Bob does in this protocol. First, Bob expects to receive the sub-message Msg above. Cutting it short, Bob decrypts it, gets Kab, then sends a confirmation challenge crypt (Nb, Kab), where Nb is a fresh nonce. As before, Nb is modeled as a function symbol nonceb applied to all relevant variables.
agent_B_gets_forwarded_submessage_and_sends_confirmation_challenge
   knows (crypt (nonceb (Kab, A, B), Kab) :- knows (crypt (cons (Kab, cons (A, nil)),
                 key (sym, cons (B, server, nil))))))
On receiving this challenge, Alice tries to decrypt it with its own version of the key Kab, and sends back Nb+1:
alice_answers_confirmation_challenge
     knows (crypt (s (Nb), Kab)) :- alice_key (Kab), knows (crypt (Nb, Kab))
Bob then checks that it indeed gets the confirmation message above with the right value for Nb. If so, we store the key Kab in the predicate bob_key:
agent_B_checks_confirmation_from_A
     bob_key (Kab) :- knows (crypt (s (nonceb (Kab, A, B)), Kab))
This terminates the description of the protocol. Let us now describe additional things the intruder know. First, the intruder is assumed to know the identities of all agents. We also list who we think are agents. Note that the intruder itself is considered an agent, and has its own identity intruder.
alice_is_an_agent agent (alice)    
bob_is_an_agent agent (bob)    
server_is_an_agent agent (server)    
intruder_is_an_agent agent (intruder)    
intruder_knows_all_agents knows (X) :- agent (X)
We also posit that the intruder knows all public keys (... because this is what we mean for them to be public!), and its own private key. We also assume that older sessions are so old that the intruder eventually managed to crack all old sessions key. This is slightly pessimistic. But it is precisely the purpose of changing session keys to prevent intruders from gaining anything from cracking old session keys.
intruder_knows_every_public_key knows (key (pub, X))    
intruder_knows_own_private_key knows (key (prv, intruder))    
intruder_knows_all_previous_session_keys knows (key (sym, old_session (A, B, Na)))  
This is all, at last.

Let us now ask a few queries. The first asks whether the intruder may know the key Kab as it was generated by the server. The second asks whether the intruder may know any key that Alice accepted as being a key Kab at the end of the protocol. The third asks whether the intruder may know any key that Bob accepted as being Kab at the end of the protocol.
intruder_knows_session_key_generated_by_server
     false :- knows (key (sym, current_session (alice, bob, Na)))
intruder_knows_session_key_as_seen_by_alice
     false :- alice_key (key (Mode, current_session (X, Y, N))),
      knows (key (Mode, current_session (X, Y, N)))
intruder_knows_session_key_as_seen_by_B
     false :- knows (crypt (s (nonceb (Kab, A, B)), Kab)), knows (Kab)


Now launch h1 on nspriv.p:
h1 nspriv.p
You should get
*** Derived: intruder_knows_session_key_as_seen_by_B ***
This means in short that h1 believes that this clause set is unsatisfiable; in terms of protocols, that there is an attack. You cannot actually be sure that this is indeed an attack, i.e., that this clause set is indeed unsatisfiable, because this clause set is not in the H1 format. (You should not conclude from this that cryptographic protocols always produce clause sets outside H1, see [2002  (Nielson et al.)].)

In this case, and contrarily to the example of Section 3.2, h1 computes an approximation of the input clause set S as a clause set S1, and reasons on S1 instead. Check it using the -check-h1 2 option:
h1 -check-h1 2 nspriv.p
and you'll get the list of all clauses in nspriv.p that are not in H1:
Warning: clause server_answers_A_with_encrypted_packet has non-linear head, variable A occurs repeatedly. Warning: clause intruder_knows_previous_server_messages has non-linear head, variable A occurs repeatedly. Warning: clause agent_B_gets_forwarded_submessage_and_\ \sends_confirmation_challenge has non-linear head, variable Kab occurs repeatedly. Warning: clause alice_answers_confirmation_challenge has two non-sibling variables in the head that are connected in the body, Nb and Kab. Stop.
Note also that h1 gives you an explanation of what's going wrong. For further explanation of what linear terms, sibling variables and connected variables are, see [2002  (Nielson et al.)] or [2005  (Goubault-Larrecq)].

The -check-h1 1 option runs the same checks. However, it still proceeds with checking the (un)satisfiability of the approximated clause set S1, even when the approximation is not exact, i.e., when the original clause set S is not in H1.

The approximated clause set S1 is always in the class H1, and always implies S logically. In particular, if h1 does not find any contradiction, i.e., if S1 is satisfiable, then S is satisfiable, too. In terms of cryptographic protocols, if h1 tells you that your protocol has no attack, then you can be sure of it. (That is, up to the accuracy of your model, written as clauses.)

In our case, h1 found a purported attack, i.e., a purported contradiction. It turns out that nspriv.p is indeed contradictory. In fact, in most cases where h1 thinks a clause set is contradictory, it is indeed contradictory. (Although, as usual, your mileage may vary.)

We terminate this example by mentioning the -all option.

The answer
*** Derived: intruder_knows_session_key_as_seen_by_B ***
above means that h1 found a (purported) contradiction first, and second that, in order to derive the empty clause false, h1 required the clause intruder_knows_session_key_as_seen_by_B. Is this the only query that fails? Remember we have asked three queries. One way of checking this is to remove the intruder_knows_session_key_as_seen_by_B clause from nspriv.p and run h1 again. There is a simpler option: run h1 with the -all option. Once a contradiction has been found, instead of stopping just like any other prover, h1 -all will continue, trying to find contradictions using other queries.

On the example, running
h1 -all nspriv.p
you will get the same answer as above:
*** Derived: intruder_knows_session_key_as_seen_by_B ***
In other words, no other query fails. In terms of security, this means that while Bob's key Kab is not secure, those of Alice and the server definitely are. This may seem paradoxical, however the attack that h1 just found is one where the intruder deceives Bob into accepted an old, cracked key from an older session. This key is not the key that the server produced. It is also old, meaning that it will be detected as not being new by Alice's use of the nonce Na.

We have just proved that Alice and the server were in fact safe in this protocol, while Bob is not.

The -no-resolve option can be used to disable the reasoning facilities of h1 altogether. Then h1 -no-resolve will just compute an approximation of the input clause set, and store it into the log file. For example,
h1 -no-resolve nspriv.p
runs without any output, and creates the log file nspriv.log.gz. As we have already said, it is not really instructive to look at this file directly. However, you can extract from it the approximated clause set by running:
h1getlog processed nspriv.log | pl2tptp -
This will output the approximated clause set on stdout: h1getlog with the processed option will extract the approximate clauses and print them in Prolog format. Then pl2tptp with the - argument (meaning stdin) will convert these from Prolog format to TPTP format.

All this can be done in a simpler way, by using the -log-out option to h1. Then h1 will output its log file to stdout. You can then pipe it to h1getlog and pl2tptp, as follows.
h1 -log-out -no-resolve nspriv.p | h1getlog processed | pl2tptp -

4  The h1 Prover

Most of the features of h1 are explained in Section 3. We explain the structure of the h1 command line in Section 4.1, then turn to the theoretical basis of h1 in Section 4.2. We refine this description in Section 4.3, by stating the precise rules used by h1.

4.1  How to Use h1

The h1 tool recognizes the following options. As in all Unix tools, options start with a minus sign -. Some options are flags, which can be toggled between an active state and a deactivated state. In this case, the convention used in h1 and other tools of the h1 suite is to have -<option> activate the option, while -no-<option> deactivates it.

The options recognized by the h1 prover are:

4.2  Theoretical Background

The essentials of the algorithmic underpinnings of h1 are described, rather tersely, in [2005  (Goubault-Larrecq)]: h1 implements a form of ordered resolution with selection

4.3  Principle of Operation

5  Explaining and Checking Proofs: h1trace, h1logstrip

6  Model-Checking Clause Sets and Explaining the Absence of Proofs with h1mc

6.1  Theoretical Background

7  Determinizing Tree Automata with pldet

8  Converting XML Deterministic Tree Automata to Prolog Notation with auto2pl

9  Cleaning and Extracting Automata with plpurge

10  Converting Tree Automata and Prolog Programs to TPTP Files

11  Applying Morphisms to H1 Clause Sets with tptpmorph

12  Solving Presburger Arithmetic Formulas with linauto

13  Displaying Automata with pl2gastex

Sometimes, pl2gastex gives pretty good results.



On some more complex examples, pl2gastex gives pretty awful results.





14  Log Files and h1getlog

15  Bugs

There is no DTD for deterministic tree automata in XML format.

Yes, I know. Sorry. I don't know how to write DTDs. If anybody wants to volunteer, the format should be clear from examples produced by h1. Anyone?

h1 works for a long time, then gets “Killed”

A typical case is that you have exceeded some system quotas. One case I had was that h1 was producing a log file which turned out to exceed 2 Gb, and Linux could not handle this. (Or, it does, but using non-portable system calls, from what I understood.)

If you don't need the log file, use the -no-log option to h1. Otherwise, use the -log-out option to h1, and redirected the standard output. E.g., instead of h1 blah.p, use h1 -log-out blah.p > blah.log.

Clause names are lost in converting clauses to Prolog format.

Yes, this is unfortunate. A few cases have been corrected, by using special comments. But more has to be done.

References

1999–2003  (Barras et al.)
B. Barras, S. Boutin, C. Cornes, J. Courant, J.-C. Filliâtre, E. Giménez, H. Herbelin, G. Huet, C. Muñoz, C. Murthy, C. Parent, C. Paulin-Mohring, A. Saibi, and B. Werner. The Coq proof assistant reference manual: Version 7.4. Rapport technique, INRIA, France, 1999–2003. http://coq.inria.fr/doc/main.html.

2001  (Blanchet)
B. Blanchet. An efficient cryptographic protocol verifier based on Prolog rules. In 14th IEEE Computer Security Foundations Workshop (CSFW-14), pages 82–96. IEEE Computer Society Press, 2001.

1996  (Boudet and Comon)
A. Boudet and H. Comon. Diophantine equations, Presburger arithmetic and finite automata. In H. Kirchner, editor, Colloquium on Trees in Algebra and Programming (CAAP'96), pages 30–43. Springer Verlag LNCS 1059, 1996.

1996  (Devienne et al.)
P. Devienne, P. Lebègue, A. Parrain, J.-C. Routier, and J. Würtz. Smallest Horn clause programs. Journal of Logic Programming, 27 (3): 227–267, 1996. URL citeseer.nj.nec.com/devienne94smallest.html.

1991  (Frühwirth et al.)
T. Frühwirth, E. Shapiro, M. Y. Vardi, and E. Yardeni. Logic programs as types for logic programs. In Proc. 6th Symp Logic in Computer Science, pages 300–309, 1991.

2005  (Goubault-Larrecq)
J. Goubault-Larrecq. Deciding H1 by resolution. Information Processing Letters, 95 (3): 401–408, Aug. 2005. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/Goubault-h1.pdf.

2004  (Goubault-Larrecq)
J. Goubault-Larrecq. Une fois qu'on n'a pas trouvé de preuve, comment le faire comprendre à un assistant de preuve ? In Actes 15èmes journées francophones sur les langages applicatifs (JFLA'04), Sainte-Marie-de-Ré, France, Jan. 2004, pages 1–20. INRIA, collection didactique, 2004. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/JGL-JFLA2004.ps.

2005  (Goubault-Larrecq and Parrennes)
J. Goubault-Larrecq and F. Parrennes. Cryptographic protocol analysis on real C code. In R. Cousot, editor, Proceedings of the 6th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'05), volume 3385 of Lecture Notes in Computer Science, Paris, France, Jan. 2005. Springer. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/GouPar-VMCAI2005.pdf. To appear.

2002  (Nielson et al.)
F. Nielson, H. R. Nielson, and H. Seidl. Normalizable Horn clauses, strongly recognizable relations and Spi. In 9th Static Analysis Symposium (SAS). Springer Verlag LNCS 2477, 2002.

1986  (Pelletier)
F. J. Pelletier. Seventy-five problems for testing automatic theorem provers. Journal of Automated Reasoning, 2: 191–216, 1986.

2001  (Selinger)
P. Selinger. Models for an adversary-centric protocol logic. Electronic Notes in Theoretical Computer Science, 55 (1): 73–87, 2001. Proceedings of the 1st Workshop on Logical Aspects of Cryptographic Protocol Verification (LACPV'01), J. Goubault-Larrecq, ed.

2002  (Suttner and Sutcliffe)
C. B. Suttner and G. Sutcliffe. The TPTP problem library v2.5.0, 2002. URL http://www.cs.miami.edu/~tptp/TPTP/TR/TPTPTR.shtml.

Concept Index

Command Index



This document was translated from LATEX by HEVEA.