On rewriting
KnuthBendix.rewrite
— Functionrewrite(u::AbstractWord, rewriting)
Rewrites word u
using the rewriting
object. The object must implement rewrite!(v::AbstractWord, w::AbstractWord, rewriting)
.
Example
julia> alph = Alphabet([:a, :A, :b], [2,1,3])
Alphabet of Symbol
1. a (inverse of: A)
2. A (inverse of: a)
3. b (self-inverse)
julia> a = Word([1]); A = Word([2]); b = Word([3]);
julia> rule = KnuthBendix.Rule(a*b => a)
1·3 ⇒ 1
julia> KnuthBendix.rewrite(a*b^2*A*b^3, rule) == a*A*b^3
true
julia> KnuthBendix.rewrite(a*A*b^3, alph) == b
true
Internals
Implementing rewriting procedure naively could easily lead to quadratic complexity due to unnecessary moves of parts of the rewritten word. The linear-complexity algorithm uses two stacks v
and w
:
v
is initially empty (represents the trivial word), whilew
contains the content ofu
(the word to be rewritten with its first letter on its top).
- we pop the first letter
l = popfirst!(w)
fromw
, - we push
l
to the end ofv
, - we determine a rewritng rule (if it exists)
lhs → rhs
withv = v'·lhs
(i.e.lhs
is equal to a suffix ofv
), and we setv ← v'
i.e. we remove the suffix fomv
, andw ← rhs·w
i.e.rhs
is prepended tow
- if no such rule exists we go back to 1.
These four steps are repeated until w
becomes empty.
In julia flavoured pseudocode the rewrite procedure looks as follows:
function rewrite!(v::AbstractWord, w::AbstractWord, rewriting; kwargs...)
while !isone(w)
push!(v, popfirst!(w))
res = find_rule_with_suffix(rewriting, v)
isnothing(res) && continue # no rule matching was found
(lhs, rhs) = res
@assert v[end-length(lhs)+1:end] == lhs # i.e. v = v'·lhs
resize!(v, length(v)-length(lhs)) # v ← v'
prepend!(w, rhs) # w ← rhs·w
end
return v
end
In practice our implementations in place of stacks use RewritingBuffer
which consists of a pair of BufferWord
s (a special implementation of the AbstractWord
API).
Particular implementations of rewriting.
Here are some examples of the internal rewriting function already defined:
KnuthBendix.rewrite!
— Methodrewrite!(v::AbstractWord, w::AbstractWord, rule::Rule)
Rewrite word w
storing the result in v
by using a single rewriting rule
.
Example
julia> a = Word([1]); A = Word([2]); b = Word([3]);
julia> rule = KnuthBendix.Rule(a*b => a)
1·3 ⇒ 1
julia> v = one(a); KnuthBendix.rewrite!(v, a*b^2*A*b^3, rule);
julia> v == a*A*b^3
true
KnuthBendix.rewrite!
— Methodrewrite!(v::AbstractWord, w::AbstractWord, A::Alphabet)
Rewrite word w
storing the result in v
by applying free reductions as defined by the inverses present in alphabet A
.
Example
julia> alph = Alphabet([:a, :A, :b], [2,1,3])
Alphabet of Symbol
1. a (inverse of: A)
2. A (inverse of: a)
3. b (self-inverse)
julia> a = Word([1]); A = Word([2]); b = Word([3]);
julia> v = one(a); KnuthBendix.rewrite!(v, a*b^2*A*b^3, alph);
julia> v == b
true
Let a rewriting system rws
is given, and let lhsₖ → rhsₖ
denote its k
-th rule. Let ℒ = {lhsₖ}ₖ
denote the language of left-hand-sides of rws
and let N = Σₖ length(lhsₖ)
be the total length of ℒ
.
Naive rewriting
KnuthBendix.rewrite!
— Methodrewrite!(v::AbstractWord, w::AbstractWord, rws::RewritingSystem)
Rewrite word w
storing the result in v
using rewriting rules of rws
.
The naive rewriting with RewritingSystem
- moves one letter from the beginning of
w
to the end ofv
- checks every rule
lhs → rhs
inrws
untilv
containslhs
as a suffix, - if found, the suffix is removed from
v
andrhs
is prepended tow
.
Those steps repeat until w
is empty.
The complexity of this rewriting is Ω(length(w) · N)
, where N
is the total size of left hand sides of the rewriting rules of rws
.
See procedure REWRITE_FROM_LEFT
from Section 2.4[Sims1994], p. 66.
The naive rewriting with a rewriting system is therefore in the worst case proportional to the total size of the whole rws
which makes it a very inefficient rewriting strategy.
Index automaton rewriting
KnuthBendix.rewrite!
— Methodrewrite!(v::AbstractWord, w::AbstractWord, idxA::Automata.IndexAutomaton)
Rewrite word w
storing the result in v
using index automaton idxA
.
Rewriting with an IndexAutomaton
traces (i.e. follows) the path in the automaton determined by w
(since the automaton is deterministic there is only one such path). Whenever a terminal (i.e. accepting) state is encountered
- its corresponding rule
lhs → rhs
is retrived, - the appropriate suffix of
v
(equal tolhs
) is removed, and rhs
is prepended tow
.
Tracing continues from the first letter of the newly prepended word.
To continue tracing w
through the automaton we need to backtrack on our path in the automaton and for this rewrite
maintains a vector of visited states of idxA
(the history of visited states of idxA
). Whenever a suffix is removed from v
, the path is rewinded (i.e. shortened) to the appropriate length and the next letter of w
is traced from the last state on the path. This maintains the property that signature of the path is equal to v
at all times.
Once index automaton is build the complexity of this rewriting is Ω(length(w))
which is the optimal rewriting strategy.
See procedure INDEX_REWRITE
from Section 3.5[Sims1994], p. 113.
In practice the complexity of building and maintaining idxA
synchronized with ℒ
overwhelms gains made in rewriting (to construct idxA
one need to fully reduce rws
first which is a very costly operation).
Prefix automaton rewriting
Rewriting with a prefix automaton pfxA
traces w
through a non non-deterministic automaton which is pfxA
with a ε
-labeled loop added at its (unique) initial state. In tracing we follow all possible paths determined by w
, i.e. in this case each path begins at every letter of w
. Whenever a non-accepting state is encountered on any of the paths, the corresponding rule lhs → rhs
is used for rewriting.
This rewriting can be also understood differently: given the non-deterministic automaton one could determinize it through power-set construction and then trace deterministicaly in the automaton whose states are subsets of states of the initial automaton. Here we do it without realizing the power-set explicitly and we are tracing through a procedure described in Sims book as the accessible subset construction.
KnuthBendix.rewrite!
— Methodrewrite!(v::AbstractWord, w::AbstractWord, pfxA::PrefixAutomaton[; history, skipping])
Rewrite word w
storing the result in v
using prefix automaton idxA
. As rewriting rules are stored externally, they must be passed in the rules
keyword argument.
Rewriting with a PrefixAutomaton
traces (i.e. follows) simultanously all paths in the automaton determined by w
. To be more precise we trace a path in the power-set automaton (states are the subsets of states of the original automaton) via a prodedure described by Sims as the lazy accessible subset construction. Since the non-deterministic part of the automaton consists of ε
-loop at the initial state there are at most length(w)-1
such paths.
Whenever a non-accepting state is encountered on any of those paths
- its corresponding rule
lhs → rhs
is retrived, - the appropriate suffix of
v
(equal tolhs
) is removed, and rhs
is prepended tow
.
Tracing continues from the first letter of the newly prepended word.
To continue tracing w
through the automaton we need to backtrack on our path in the automaton and for this rewrite
maintains a history consisting of subsets of states of pfxA
. Whenever a suffix is removed from v
, the path is rewinded (i.e. shortened to the appropriate length) and the next letter of w
is traced from the last state on the path. This maintains the property that signature of the path is equal to v
at all times.
Once prefix automaton pfxA
is build the complexity of this rewriting is Ω(length(w) · max(length(w), m))
, where m
is the number of states of pfxA
.
In practice the history consists of the subsets of states (of pfxA
) which are stored in PackedVector
, a contiguous array of states with separators. A new path is started whenever a new letter is pushed onto v
by simply adding the the initial state of pfxA
to the current subset. The rewinding of the history then happens simultanuously for all paths without additional book-keeping.
Once prefix automaton is build the complexity of this rewriting is achievable in realistically quadratic time with respect to the length of the rewritten word. This rewriting strikes a balance between the complexity of rewriting and the synchronization of pfxA
and ℒ
, as the insertion of a rule into pfxA
has linear complexity and can be accomplished without reducing the automaton.
Even more internals of rewriting
On RewritingBuffer
s
Rewriting is a crucial part of the Knuth-Bendix completion. In particular we do care plenty not only about the theoretical complexity, but also the practical speed of rewriting. You may be surprised then that a simple rewrite
allocates 6
times:
julia> alph = Alphabet([:a, :A, :b], [2,1,3])
Alphabet of Symbol
1. a (inverse of: A)
2. A (inverse of: a)
3. b (self-inverse)
julia> a = Word([1]); A = Word([2]); b = Word([3]);
julia> w = a*A*b^3;
julia> @time KnuthBendix.rewrite(w, alph)
0.000015 seconds (7 allocations: 336 bytes)
Word{UInt16}: 3
This is because the system is tuned towards re-using the storage in the process of the completion. In particular a RewritingBuffer
consisting of two Words.BufferWord
and the final returned word (of the same type as w
) are allocated in the process:
function rewrite(
w::W,
rewriting,
rwbuffer::RewritingBuffer = RewritingBuffer{T}(rewriting),
kwargs...,
) where {T,W<:AbstractWord{T}}
# first copy the content of w into rwbuffer
Words.store!(rwbuffer, w)
# then rewrite reusing the internal BufferWords
rewrite!(rwbuffer, rewriting; kwargs...)
# finally take ownership of the result
return W(rwbuffer.output)
end
KnuthBendix.RewritingBuffer
— TypeRewritingBuffer{T}()
RewritingBuffer{T}(history::AbstractVector)
A pair of BufferWords
used for efficient rewriting of words with letters of type {T}.
Additional history
may be passed to construct a buffer for rewriting with an appropriate automaton. See methods of rewrite!
.
Words.BufferWord
is an implementation of AbstractWord
tuned for the purpose of this form of rewriting (with O(1)
complexity for popfirst!
, push!
and prepend!
). In the Knuth-Bendix completion these RewritingBuffer
s are allocated only once per run and re-used later, so that destructive rewriting is as free from allocations and memory copy as possible.