Using an index automaton
Major speedups in rewriting can be obtained by using a specialized data structure for the rewriting system. The structure is known in the context of Knuth-Bendix completion as an index automaton [Sims1994], but its more widely used name is the Aho-Corasick automaton [Aho1975]. This automaton provides optimal complexity for string searching in a corpus: once the automaton is built, searching for the left-hand sides of the rules (and hence rewriting) takes O(length(w)), where w is the word being rewritten. Thus we can perform rewrites in time independent of the size of the rewriting system.
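To illustrate why the search time depends only on the word being scanned, here is a minimal, self-contained sketch of an Aho-Corasick automaton in plain Julia (the names `ACAutomaton` and `matches` are hypothetical, not the KnuthBendix.jl API): a trie over the patterns with breadth-first "failure" links, so that scanning a word visits each letter exactly once regardless of how many patterns there are.

```julia
struct ACAutomaton
    goto::Vector{Dict{Char,Int}}   # trie edges per state (state 1 = root)
    fail::Vector{Int}              # failure ("shortcut") links
    out::Vector{Vector{String}}    # patterns recognized at each state
end

function ACAutomaton(patterns::Vector{String})
    goto = [Dict{Char,Int}()]
    out = [String[]]
    for p in patterns                      # build the trie
        s = 1
        for c in p
            s = get!(goto[s], c) do
                push!(goto, Dict{Char,Int}())
                push!(out, String[])
                length(goto)
            end
        end
        push!(out[s], p)
    end
    fail = ones(Int, length(goto))
    queue = Int[]
    for (_, s) in goto[1]                  # depth-1 states fail to the root
        push!(queue, s)
    end
    while !isempty(queue)                  # breadth-first failure links
        s = popfirst!(queue)
        for (c, t) in goto[s]
            f = fail[s]
            while f != 1 && !haskey(goto[f], c)
                f = fail[f]
            end
            fail[t] = get(goto[f], c, 1)
            append!(out[t], out[fail[t]])  # inherit matches via the link
            push!(queue, t)
        end
    end
    return ACAutomaton(goto, fail, out)
end

function matches(ac::ACAutomaton, w::String)
    found = Tuple{Int,String}[]
    s = 1
    for (i, c) in enumerate(w)             # one transition per letter of w
        while s != 1 && !haskey(ac.goto[s], c)
            s = ac.fail[s]
        end
        s = get(ac.goto[s], c, 1)
        for p in ac.out[s]
            push!(found, (i - length(p) + 1, p))  # (start index, pattern)
        end
    end
    return found
end
```

The failure links play the same role as the "shortcut" edges of the index automaton discussed below: they let the scan continue without ever re-reading a letter.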
To learn more about fast rewriting using the automaton see Index automaton rewriting.
KnuthBendix.KBIndex — Type

KBIndex <: KBS2Alg <: CompletionAlgorithm

The Knuth-Bendix completion algorithm that (if successful) yields the reduced, confluent rewriting system generated by the rules of rws.
KBIndex uses an index automaton to perform rewrites and delays the incorporation of newly found rules by maintaining a stack of them, whose size is controlled by the reduce_delay field of the Settings passed to the knuthbendix call.
Additionally, confluence checks by backtrack search on the index automaton are performed after periods when no new rules have been discovered.
You can run it on a rws by calling
knuthbendix(KnuthBendix.Settings(KnuthBendix.KBIndex()), rws)
By default the size of the stack is 100 rules and the algorithm terminates early after 10_000 rewriting rules have been found. To control this behaviour the Settings struct has many knobs and whistles.
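A tuning sketch might look as follows; reduce_delay is the field documented above, while the max_rules keyword is an assumption inferred from the quoted default of 10_000 rules:

```julia
# Hypothetical tuning: `max_rules` is an assumed keyword name.
settings = KnuthBendix.Settings(
    KnuthBendix.KBIndex();
    reduce_delay = 500,    # batch more rules before each reduction pass
    max_rules = 50_000,    # raise the early-termination threshold
)
knuthbendix(settings, rws)
```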
The 8th quotient of the (2,3,7) triangle group from Knuth-Bendix completion using a stack poses no challenge to KBIndex, which solves it in a fraction of a second. The performance bottleneck for larger examples is the incorporation of the stack of newly found rules into the rewriting system while maintaining its reducedness (the construction of the index automaton, but more importantly the confluence check, requires a reduced rewriting system).
Keeping the construction of the automaton as a black box, below is Julia-flavoured pseudocode describing the completion procedure.
function knuthbendix!(settings::Settings{KBIndex}, rws::RewritingSystem)
    stack = Vector{Tuple{Word,Word}}()
    rws = reduce!(rws) # this is assumed by IndexAutomaton!
    idxA = Automata.IndexAutomaton(rws)
    # ... initialize more temporary structures here
    for ri in rules(rws)
        for rj in rules(rws)
            find_critical_pairs!(stack, idxA, ri, rj, ...)
            ri === rj && break
            find_critical_pairs!(stack, idxA, rj, ri, ...)
        end
        if length(stack) > settings.reduce_delay || ri == last(rules(rws))
            # reduce the rws and then rebuild idxA
            rws, ... = reduce!(KBPrefix(), rws, stack)
            idxA, ... = Automata.rebuild!(idxA, rws)
        end
        if time_to_check_confluence(...)
            rws, ... = reduce!(KBPrefix(), rws, stack)
            idxA, ... = Automata.rebuild!(idxA, rws)
            stack = check_confluence!(stack, rws, idxA)
            if isempty(stack)
                return rws
            end
        end
    end
    return rws
end
The main difference in the procedure is that instead of immediately forcing reducedness with every discovered rule, we delay this until stack is sufficiently large. Only then

- we invoke reduce! which
  - constructs a PrefixAutomaton from the rules of rws and the stack,
  - reduces the automaton (with complexity Ω(k²), where k is the number of rules in rws),
  - updates the rules of rws,
- we re-sync the index automaton idxA with rws.
This allows us to amortize the time needed for the reduction of rws (dominating!) and the construction of idxA (rather cheap in comparison) across many different pairs of rules (ri, rj).
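The amortization can be illustrated with a toy model on plain strings (a simplistic stand-in, not the package's reduce! or its data structures): new rules are parked on a stack, and the quadratic reduction pass runs once per batch instead of once per rule.

```julia
# Toy model of delayed rule incorporation; a rule is a (lhs, rhs) pair.

# Rewrite w with the given rules until no left-hand side occurs in it.
function rewrite(w::String, rules)
    changed = true
    while changed
        changed = false
        for (l, r) in rules
            if occursin(l, w)
                w = replace(w, l => r)
                changed = true
            end
        end
    end
    return w
end

# Reduce the system: drop rules whose lhs is rewritable by the remaining
# rules and normalize the right-hand sides. This pass is quadratic in the
# number of rules -- exactly the cost worth amortizing.
function reduce_rules(rules)
    out = Tuple{String,String}[]
    for (i, (l, r)) in enumerate(rules)
        rest = [rules[j] for j in eachindex(rules) if j != i]
        l == rewrite(l, rest) || continue  # redundant rule: drop it
        push!(out, (l, rewrite(r, rest)))
    end
    return out
end

# Incorporate the stack only once it outgrows `delay` (cf. reduce_delay).
function incorporate!(rules, stack, delay)
    length(stack) > delay || return rules
    append!(rules, stack)
    empty!(stack)
    copy!(rules, reduce_rules(rules))      # one expensive pass per batch
    return rules
end
```

With delay = 1, pushing the redundant rule "aaaa" → "" and the new rule "bb" → "x" onto the stack and calling incorporate! leaves only the two irredundant rules in the system.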
Backtrack search and the test for (local) confluence
Another very important feature of IndexAutomaton is that it allows us to test cheaply for confluence. Naively, we need to check all pairs of rules for their suffix-prefix intersections and resolve the potentially critical pairs. However, if we have an idxA::IndexAutomaton at our disposal, checking for confluence becomes much easier: given a rule lhs → rhs from our rewriting system, we need to see how we can extend lhs[2:end] to a word w which ends with the left-hand side lhs' of some other rule.
Here we don't need to search for all possible completions: if w can be written as w = lhs[2:end]*X*lhs' (with X possibly empty), then the two left-hand sides do not overlap, so w satisfies local confluence with respect to the rewrites with the given rules trivially. If you think of the index automaton as a tree with some (plenty of) additional "shortcut" edges, then the answer is right in front of you: for every lhs we need to perform a single backtrack search on idxA. The search starts at the last state of trace(idxA, lhs[2:end]). We then backtrack

- when the depth of the search becomes too large, i.e. equal to length(lhs[2:end]), or
- when an edge leads us to a vertex closer to the origin than we were (i.e. we took a shortcut).
This mechanism is implemented by Automata.ConfluenceOracle
, see Backtrack searches.
With this, instead of examining k² pairs for all possible suffix-prefix intersections, it is enough to perform k backtrack searches (where k is the number of rules of our rws). In pseudocode:
function check_confluence(rws::RewritingSystem, idxA::IndexAutomaton)
    stack = Vector{Tuple{Word,Word}}()
    backtrack = Automata.BacktrackSearch(idxA, Automata.ConfluenceOracle())
    for ri in rules(rws)
        stack = find_critical_pairs!(stack, backtrack, ri, ...)
        # a non-empty stack means we found some honest failures of local confluence
        isempty(stack) || break
    end
    return stack
end
If you inspect the actual code, you will be surprised how close it is to the listing above.
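For contrast, the naive k² check that the backtrack search replaces can be spelled out directly (a toy on plain ASCII strings, not the package's code): two left-hand sides overlap when a nonempty proper suffix of one is a prefix of the other, and each overlap produces a word that can be rewritten in two different ways.

```julia
# Naive suffix-prefix intersection of two left-hand sides.
# Returns the words w = lhs[1:i-1] * lhs′ witnessing each overlap; such a
# w contains both lhs and lhs′ and is the source of a critical pair.
function overlap_words(lhs::String, lhs′::String)
    ws = String[]
    for i in 2:length(lhs)                 # nonempty proper suffixes of lhs
        s = lhs[i:end]
        if length(s) ≤ length(lhs′) && lhs′[1:length(s)] == s
            push!(ws, lhs[1:i-1] * lhs′)
        end
    end
    return ws
end

# All k² ordered pairs must be examined this way; the index automaton
# replaces this with one backtrack search per rule, i.e. k searches.
all_overlaps(lhss) =
    [(l, l′, w) for l in lhss for l′ in lhss for w in overlap_words(l, l′)]
```

For instance, "aba" overlaps "ab" in the word "abab", and overlaps itself in "ababa"; both words admit two distinct rewrites and must be checked for confluence.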
KnuthBendix.find_critical_pairs! — Method

find_critical_pairs!(stack, bts, rule, work)

Find critical pairs by completing the lhs of rule via a backtrack search on an index automaton.

If rule can be written as P → Q, this function performs a backtrack search on bts.automaton to find possible completions of P[2:end] to a word which ends with P', where P' → Q' is another rule. The search backtracks when its depth is greater than or equal to the length of P' to make sure that P and P' share an overlap (i.e. a nonempty suffix of P is a prefix of P').
- [Sims1994] Charles C. Sims, Computation with Finitely Presented Groups, Cambridge University Press, 1994.
- [Aho1975] Alfred V. Aho and Margaret J. Corasick, "Efficient string matching: an aid to bibliographic search", Commun. ACM 18, 6 (June 1975), 333–340. doi:10.1145/360825.360855