Using an index automaton

Major speedups in rewriting can be obtained by using a specialized data structure for the rewriting system. The structure is known in the context of Knuth-Bendix completion as an index automaton [Sims1994], but a more widely used name for it is the Aho-Corasick automaton [Aho1975]. This automaton provides optimal complexity for string searching in a corpus: once the automaton is built, searching for the left-hand sides of the rules (and hence rewriting) takes O(length(w)), where w is the word being rewritten. Thus we can obtain rewrites in time independent of the size of the rewriting system.
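
To make the linear-time claim concrete, here is a minimal, self-contained Aho-Corasick implementation in Julia. It is independent of KnuthBendix.jl (the names Node, build and search are made up for this sketch) and shows the two-phase construction, a trie followed by failure links, and the single pass over the searched word.

mutable struct Node
    children::Dict{Char,Node}
    fail::Union{Node,Nothing}
    matches::Vector{String}   # patterns ending at this state
    Node() = new(Dict{Char,Node}(), nothing, String[])
end

function build(patterns::Vector{String})
    root = Node()
    for p in patterns                     # phase 1: a trie of all patterns
        node = root
        for c in p
            node = get!(Node, node.children, c)
        end
        push!(node.matches, p)
    end
    queue = Node[]                        # phase 2: failure links via BFS
    for child in values(root.children)
        child.fail = root
        push!(queue, child)
    end
    while !isempty(queue)
        node = popfirst!(queue)
        for (c, child) in node.children
            f = node.fail
            while f !== nothing && !haskey(f.children, c)
                f = f.fail
            end
            child.fail = f === nothing ? root : f.children[c]
            append!(child.matches, child.fail.matches)
            push!(queue, child)
        end
    end
    return root
end

function search(root::Node, text::String)
    hits = Tuple{Int,String}[]
    node = root
    for (i, c) in enumerate(text)         # a single pass: O(length(text))
        while node !== root && !haskey(node.children, c)
            node = node.fail              # follow failure links on mismatch
        end
        node = get(node.children, c, root)
        for p in node.matches
            push!(hits, (i, p))           # pattern p ends at position i
        end
    end
    return hits
end

search(build(["ab", "ba", "aba"]), "ababa")   # 6 hits, found in one pass

Note that search never revisits a letter of text: following failure links has constant amortized cost per letter, which is exactly why rewriting time does not depend on the number of rules.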

To learn more about fast rewriting using the automaton see Index automaton rewriting.

KnuthBendix.KBIndex — Type
KBIndex <: KBS2Alg <: CompletionAlgorithm

The Knuth-Bendix completion algorithm that (if successful) yields the reduced, confluent rewriting system generated by the rules of rws.

KBIndex uses an index automaton to perform rewrites and delays the incorporation of newly found rules by maintaining a stack of them, whose size is controlled by the reduce_delay field of the Settings passed to the knuthbendix call.

Additionally, confluence checks by backtrack search on the index automaton are performed after periods when no new rules have been discovered.

source

You can run it on a rws by calling

knuthbendix(KnuthBendix.Settings(KnuthBendix.KBIndex()), rws)

By default the size of the stack is 100 rules and the algorithm terminates early after 10_000 rewriting rules have been found. To control this behaviour the Settings struct has many knobs and dials.
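
For example, to raise both thresholds, something along the following lines should work. This is a sketch: reduce_delay is the field described above, while max_rules is assumed here to be the name of the 10_000-rule early-termination threshold; consult the Settings docstring for the authoritative keyword list.

sett = KnuthBendix.Settings(
    KnuthBendix.KBIndex();
    max_rules = 50_000,   # assumed name for the early-termination threshold
    reduce_delay = 500,   # allow a larger stack before reducing
)
knuthbendix(sett, rws)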

Performance

The 8th quotient of the (2,3,7) triangle group from Knuth-Bendix completion using a stack poses no challenge to KBIndex, which solves it in a fraction of a second. The performance bottleneck for larger examples is the incorporation of the stack of newly found rules into the rewriting system while maintaining its reducedness (the construction of the index automaton, but more importantly the confluence check, requires a reduced rewriting system).


Keeping the construction of the automaton as a black box, below is Julia-flavoured pseudocode describing the completion procedure.

function knuthbendix!(settings::Settings{KBIndex}, rws::RewritingSystem)
    stack = Vector{Tuple{Word,Word}}()
    rws = reduce!(rws) # this is assumed by IndexAutomaton!
    idxA = Automata.IndexAutomaton(rws)

    # ... initialize more temporary structures here
    for ri in rules(rws)
        for rj in rules(rws)
            find_critical_pairs!(stack, idxA, ri, rj, ...)
            ri === rj && break
            find_critical_pairs!(stack, idxA, rj, ri, ...)
        end
        if length(stack) > settings.reduce_delay || ri == last(rules(rws))
            # reduce the rws and then rebuild idxA
            rws, ... = reduce!(KBPrefix(), rws, stack)
            idxA, ... = Automata.rebuild!(idxA, rws)
        end

        if time_to_check_confluence(...)
            rws, ... = reduce!(KBPrefix(), rws, stack)
            idxA, ... = Automata.rebuild!(idxA, rws)
            stack = check_confluence!(stack, rws, idxA)
            if isempty(stack)
                return rws
            end
        end
    end
    return rws
end

The main difference in the procedure is that instead of immediately forcing reducedness with every discovered rule, we delay this until the stack is sufficiently large. Only then

  1. we invoke reduce! which
    • constructs PrefixAutomaton from rules in rws and the stack.
    • reduces the automaton (with complexity Ω(k²), where k is the number of rules in rws)
    • updates rules of rws
  2. we re-sync the index automaton idxA with rws.

This allows us to amortize the time needed for the reduction of rws (the dominating cost!) and the construction of idxA (rather cheap in comparison) across many different pairs of rules (ri, rj).
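
To see why the delay pays off, here is a toy cost model, not KnuthBendix.jl code (total_cost and its parameters are invented for this sketch): we charge k² units for every reduction of a system with k rules and compare flushing the stack after every new rule with flushing it every delay rules.

# charge k^2 per reduction; k0 is the initial number of rules
function total_cost(nrules, delay; k0 = 100)
    cost, k, pending = 0, k0, 0
    for _ in 1:nrules
        pending += 1                # a newly discovered rule goes on the stack
        if pending >= delay
            k += pending            # incorporate the stack into the system
            pending = 0
            cost += k^2             # one quadratic reduction per flush
        end
    end
    return cost
end

total_cost(10_000, 1)    # immediate reduction: ≈ 3.4e11 units
total_cost(10_000, 100)  # delayed reduction:  ≈ 3.5e9 units

The total charge drops by roughly a factor of delay, since the quadratic reduction now runs once per delay discovered rules.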

Backtrack search and the test for (local) confluence

Another very important feature of IndexAutomaton is that it allows us to test for confluence cheaply. Naively, we need to check all pairs of rules for their suffix-prefix intersections and resolve the potentially critical pairs. However, if we have an idxA::IndexAutomaton at our disposal, checking for confluence becomes much easier: given a rule lhs → rhs from our rewriting system, we need to see how we can extend lhs[2:end] to a word w which ends with the left-hand side lhs' of some other rule.

Here we don't need to search for all possible completions, since if w can be written as w = lhs[2:end]*X*lhs', then it satisfies local confluence w.r.t. the rewrites with the given rules trivially. If you think of the index automaton as a tree with some (plenty of) additional "shortcut" edges, then the answer is right in front of you: for every lhs we need to perform a single backtrack search on idxA. The search starts at the last state of trace(idxA, lhs[2:end]). We then backtrack

  • when the depth of search becomes too large, i.e. equal to length(lhs[2:end]), or
  • when an edge leads us to a vertex closer to the origin than we were (i.e. we took a shortcut).

This mechanism is implemented by Automata.ConfluenceOracle, see Backtrack searches.
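
As a toy illustration of this search, independent of KnuthBendix.jl (TrieNode, trie, completions! and overlaps are invented names), the sketch below enumerates such completions on a plain trie of left-hand sides. The real ConfluenceOracle achieves this with a single backtrack search per rule, using the shortcut edges and the two conditions above to prune; here we simply try every nonempty proper suffix of lhs as a starting path and walk the subtree below it.

mutable struct TrieNode
    children::Dict{Char,TrieNode}
    lhs::Union{Nothing,String}          # left-hand side ending at this node
    TrieNode() = new(Dict{Char,TrieNode}(), nothing)
end

function trie(lhses)
    root = TrieNode()
    for w in lhses
        node = root
        for c in w
            node = get!(TrieNode, node.children, c)
        end
        node.lhs = w
    end
    return root
end

# depth-first walk: every terminal node below `node` completes `suffix`
function completions!(found, node, suffix)
    node.lhs === nothing || push!(found, (suffix, node.lhs))
    for child in values(node.children)
        completions!(found, child, suffix)
    end
    return found
end

# all pairs (s, lhs') with s a nonempty proper suffix of lhs and a prefix of lhs'
function overlaps(root, lhs)
    found = Tuple{String,String}[]
    for i in 2:lastindex(lhs)           # ASCII assumed for simplicity
        node = root
        for c in lhs[i:end]
            node = get(node.children, c, nothing)
            node === nothing && break
        end
        node === nothing || completions!(found, node, lhs[i:end])
    end
    return found
end

overlaps(trie(["ab", "ba", "bb"]), "ab")   # [("b", "ba"), ("b", "bb")]

Each returned pair (s, lhs') corresponds to the word lhs * lhs'[length(s)+1:end], on which the two rules may produce different rewrites, i.e. a candidate critical pair.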


With this, instead of examining pairs of rules for all possible suffix-prefix intersections, it is enough to perform k backtrack searches (where k is the number of rules of our rws). In pseudocode

function check_confluence(rws::RewritingSystem, idxA::IndexAutomaton)
    stack = Vector{Tuple{Word,Word}}()
    backtrack = Automata.BacktrackSearch(idxA, Automata.ConfluenceOracle())

    for ri in rules(rws)
        stack = find_critical_pairs!(stack, backtrack, ri, ...)
        isempty(stack) || break
        # i.e. we found some honest failures to local confluence
    end
    return stack
end

If you inspect the actual code, you will be surprised how close it is to the listing above.

KnuthBendix.find_critical_pairs! — Method
find_critical_pairs!(stack, bts, rule, work)

Find critical pairs by completing lhs of rule by backtrack search on index automaton.

If rule can be written as P → Q, this function performs a backtrack search on bts.automaton to find possible completions of P[2:end] to a word which ends with P' where P' → Q' is another rule. The search backtracks when its depth is greater than or equal to the length of P' to make sure that P and P' share an overlap (i.e. a nonempty suffix of P is a prefix of P').

source
  • [Sims1994] Charles C. Sims, Computation with finitely presented groups, Cambridge University Press, 1994.
  • [Aho1975] Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search, Commun. ACM 18, 6 (June 1975), 333–340. doi:10.1145/360825.360855.