javascript - 前端多语言




Ukkonen的后缀树算法在普通英语中 (4)

这一点我感觉有点厚。 我已经花了好几天的时间来试图完全包裹后缀树的构造,但是因为我没有数学背景,所以很多解释都是因为我们开始过度使用数学符号系统。 最接近我找到的一个很好的解释是使用后缀树的快速字符串搜索 ,但是他掩盖了各个点,并且该算法的某些方面仍然不清楚。

我相信,Stack Overflow对这个算法的逐步解释对于除了我之外的许多其他人是无价的。

作为参考,这里是Ukkonen关于算法的论文: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf : http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

到目前为止,我的基本理解是:

  • 我需要迭代给定字符串T的每个前缀P
  • 我需要遍历前缀P中的每个后缀S并将其添加到树中
  • 为了给树添加后缀S,我需要遍历S中的每个字符,迭代包括沿着一个现有的分支走,该分支以S中的相同字符集C开始,并且可能在将I分割为后代节点时在后缀中达到不同的字符,或者如果没有匹配的边缘向下走。 当发现没有匹配的边沿向下走向C时,为C创建新的叶边。

正如我们在大多数解释中指出的那样,基本算法似乎是O(n 2 ),因为我们需要遍历所有的前缀,然后我们需要遍历每个前缀的每个后缀。 由于使用了后缀指针技术,Ukkonen的算法显然是独一无二的,但我认为是我无法理解的。

我也无法理解:

  • 确切地何时以及如何分配,使用和更改“活动点”
  • 算法的标准化方面正在发生什么
  • 为什么我看到的实现需要“修复”他们正在使用的变量的边界

这里是完整的C#源代码。 它不仅工作正常,而且支持自动封装,并呈现出更好看的输出文本图形。 源代码和示例输出位于:

https://gist.github.com/2373868

更新2017-11-04

多年后,我发现了后缀树的新用法,并在JavaScript中实现了该算法。 Gist在下面。 它应该没有缺陷。 将它转储到js文件中, npm install chalk从同一位置npm install chalk ,然后使用node.js运行以查看一些丰富多彩的输出。 在同一个Gist中有一个精简版,没有任何调试代码。

https://gist.github.com/axefrog/c347bf0f5e0723cbd09b1aaed6ec6fc6


@jogojapan you brought awesome explanation and visualisation. But as @makagonov mentioned it's missing some rules regarding setting suffix links. It's visible in nice way when going step by step on http://brenden.github.io/ukkonen-animation/ through word 'aabaaabb'. When you go from step 10 to step 11, there is no suffix link from node 5 to node 2 but active point suddenly moves there.

@makagonov since I live in Java world I also tried to follow your implementation to grasp ST building workflow but it was hard to me because of:

  • combining edges with nodes
  • using index pointers instead of references
  • breaks statements;
  • continue statements;

So I ended up with such implementation in Java which I hope reflects all steps in clearer way and will reduce learning time for other Java people:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ST {

  public class Node {
    private final int id;
    private final Map<Character, Edge> edges;
    private Node slink;

    public Node(final int id) {
        this.id = id;
        this.edges = new HashMap<>();
    }

    public void setSlink(final Node slink) {
        this.slink = slink;
    }

    public Map<Character, Edge> getEdges() {
        return this.edges;
    }

    public Node getSlink() {
        return this.slink;
    }

    public String toString(final String word) {
        return new StringBuilder()
                .append("{")
                .append("\"id\"")
                .append(":")
                .append(this.id)
                .append(",")
                .append("\"slink\"")
                .append(":")
                .append(this.slink != null ? this.slink.id : null)
                .append(",")
                .append("\"edges\"")
                .append(":")
                .append(edgesToString(word))
                .append("}")
                .toString();
    }

    private StringBuilder edgesToString(final String word) {
        final StringBuilder edgesStringBuilder = new StringBuilder();
        edgesStringBuilder.append("{");
        for(final Map.Entry<Character, Edge> entry : this.edges.entrySet()) {
            edgesStringBuilder.append("\"")
                    .append(entry.getKey())
                    .append("\"")
                    .append(":")
                    .append(entry.getValue().toString(word))
                    .append(",");
        }
        if(!this.edges.isEmpty()) {
            edgesStringBuilder.deleteCharAt(edgesStringBuilder.length() - 1);
        }
        edgesStringBuilder.append("}");
        return edgesStringBuilder;
    }

    public boolean contains(final String word, final String suffix) {
        return !suffix.isEmpty()
                && this.edges.containsKey(suffix.charAt(0))
                && this.edges.get(suffix.charAt(0)).contains(word, suffix);
    }
  }

  public class Edge {
    private final int from;
    private final int to;
    private final Node next;

    public Edge(final int from, final int to, final Node next) {
        this.from = from;
        this.to = to;
        this.next = next;
    }

    public int getFrom() {
        return this.from;
    }

    public int getTo() {
        return this.to;
    }

    public Node getNext() {
        return this.next;
    }

    public int getLength() {
        return this.to - this.from;
    }

    public String toString(final String word) {
        return new StringBuilder()
                .append("{")
                .append("\"content\"")
                .append(":")
                .append("\"")
                .append(word.substring(this.from, this.to))
                .append("\"")
                .append(",")
                .append("\"next\"")
                .append(":")
                .append(this.next != null ? this.next.toString(word) : null)
                .append("}")
                .toString();
    }

    public boolean contains(final String word, final String suffix) {
        if(this.next == null) {
            return word.substring(this.from, this.to).equals(suffix);
        }
        return suffix.startsWith(word.substring(this.from,
                this.to)) && this.next.contains(word, suffix.substring(this.to - this.from));
    }
  }

  public class ActivePoint {
    private final Node activeNode;
    private final Character activeEdgeFirstCharacter;
    private final int activeLength;

    public ActivePoint(final Node activeNode,
                       final Character activeEdgeFirstCharacter,
                       final int activeLength) {
        this.activeNode = activeNode;
        this.activeEdgeFirstCharacter = activeEdgeFirstCharacter;
        this.activeLength = activeLength;
    }

    private Edge getActiveEdge() {
        return this.activeNode.getEdges().get(this.activeEdgeFirstCharacter);
    }

    public boolean pointsToActiveNode() {
        return this.activeLength == 0;
    }

    public boolean activeNodeIs(final Node node) {
        return this.activeNode == node;
    }

    public boolean activeNodeHasEdgeStartingWith(final char character) {
        return this.activeNode.getEdges().containsKey(character);
    }

    public boolean activeNodeHasSlink() {
        return this.activeNode.getSlink() != null;
    }

    public boolean pointsToOnActiveEdge(final String word, final char character) {
        return word.charAt(this.getActiveEdge().getFrom() + this.activeLength) == character;
    }

    public boolean pointsToTheEndOfActiveEdge() {
        return this.getActiveEdge().getLength() == this.activeLength;
    }

    public boolean pointsAfterTheEndOfActiveEdge() {
        return this.getActiveEdge().getLength() < this.activeLength;
    }

    public ActivePoint moveToEdgeStartingWithAndByOne(final char character) {
        return new ActivePoint(this.activeNode, character, 1);
    }

    public ActivePoint moveToNextNodeOfActiveEdge() {
        return new ActivePoint(this.getActiveEdge().getNext(), null, 0);
    }

    public ActivePoint moveToSlink() {
        return new ActivePoint(this.activeNode.getSlink(),
                this.activeEdgeFirstCharacter,
                this.activeLength);
    }

    public ActivePoint moveTo(final Node node) {
        return new ActivePoint(node, this.activeEdgeFirstCharacter, this.activeLength);
    }

    public ActivePoint moveByOneCharacter() {
        return new ActivePoint(this.activeNode,
                this.activeEdgeFirstCharacter,
                this.activeLength + 1);
    }

    public ActivePoint moveToEdgeStartingWithAndByActiveLengthMinusOne(final Node node,
                                                                       final char character) {
        return new ActivePoint(node, character, this.activeLength - 1);
    }

    public ActivePoint moveToNextNodeOfActiveEdge(final String word, final int index) {
        return new ActivePoint(this.getActiveEdge().getNext(),
                word.charAt(index - this.activeLength + this.getActiveEdge().getLength()),
                this.activeLength - this.getActiveEdge().getLength());
    }

    public void addEdgeToActiveNode(final char character, final Edge edge) {
        this.activeNode.getEdges().put(character, edge);
    }

    public void splitActiveEdge(final String word,
                                final Node nodeToAdd,
                                final int index,
                                final char character) {
        final Edge activeEdgeToSplit = this.getActiveEdge();
        final Edge splittedEdge = new Edge(activeEdgeToSplit.getFrom(),
                activeEdgeToSplit.getFrom() + this.activeLength,
                nodeToAdd);
        nodeToAdd.getEdges().put(word.charAt(activeEdgeToSplit.getFrom() + this.activeLength),
                new Edge(activeEdgeToSplit.getFrom() + this.activeLength,
                        activeEdgeToSplit.getTo(),
                        activeEdgeToSplit.getNext()));
        nodeToAdd.getEdges().put(character, new Edge(index, word.length(), null));
        this.activeNode.getEdges().put(this.activeEdgeFirstCharacter, splittedEdge);
    }

    public Node setSlinkTo(final Node previouslyAddedNodeOrAddedEdgeNode,
                           final Node node) {
        if(previouslyAddedNodeOrAddedEdgeNode != null) {
            previouslyAddedNodeOrAddedEdgeNode.setSlink(node);
        }
        return node;
    }

    public Node setSlinkToActiveNode(final Node previouslyAddedNodeOrAddedEdgeNode) {
        return setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, this.activeNode);
    }
  }

  private static int idGenerator;

  private final String word;
  private final Node root;
  private ActivePoint activePoint;
  private int remainder;

  public ST(final String word) {
    this.word = word;
    this.root = new Node(idGenerator++);
    this.activePoint = new ActivePoint(this.root, null, 0);
    this.remainder = 0;
    build();
  }

  private void build() {
    for(int i = 0; i < this.word.length(); i++) {
        add(i, this.word.charAt(i));
    }
  }

  private void add(final int index, final char character) {
    this.remainder++;
    boolean characterFoundInTheTree = false;
    Node previouslyAddedNodeOrAddedEdgeNode = null;
    while(!characterFoundInTheTree && this.remainder > 0) {
        if(this.activePoint.pointsToActiveNode()) {
            if(this.activePoint.activeNodeHasEdgeStartingWith(character)) {
                activeNodeHasEdgeStartingWithCharacter(character, previouslyAddedNodeOrAddedEdgeNode);
                characterFoundInTheTree = true;
            }
            else {
                if(this.activePoint.activeNodeIs(this.root)) {
                    rootNodeHasNotEdgeStartingWithCharacter(index, character);
                }
                else {
                    previouslyAddedNodeOrAddedEdgeNode = internalNodeHasNotEdgeStartingWithCharacter(index,
                            character, previouslyAddedNodeOrAddedEdgeNode);
                }
            }
        }
        else {
            if(this.activePoint.pointsToOnActiveEdge(this.word, character)) {
                activeEdgeHasCharacter();
                characterFoundInTheTree = true;
            }
            else {
                if(this.activePoint.activeNodeIs(this.root)) {
                    previouslyAddedNodeOrAddedEdgeNode = edgeFromRootNodeHasNotCharacter(index,
                            character,
                            previouslyAddedNodeOrAddedEdgeNode);
                }
                else {
                    previouslyAddedNodeOrAddedEdgeNode = edgeFromInternalNodeHasNotCharacter(index,
                            character,
                            previouslyAddedNodeOrAddedEdgeNode);
                }
            }
        }
    }
  }

  private void activeNodeHasEdgeStartingWithCharacter(final char character,
                                                    final Node previouslyAddedNodeOrAddedEdgeNode) {
    this.activePoint.setSlinkToActiveNode(previouslyAddedNodeOrAddedEdgeNode);
    this.activePoint = this.activePoint.moveToEdgeStartingWithAndByOne(character);
    if(this.activePoint.pointsToTheEndOfActiveEdge()) {
        this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
    }
  }

  private void rootNodeHasNotEdgeStartingWithCharacter(final int index, final char character) {
    this.activePoint.addEdgeToActiveNode(character, new Edge(index, this.word.length(), null));
    this.activePoint = this.activePoint.moveTo(this.root);
    this.remainder--;
    assert this.remainder == 0;
  }

  private Node internalNodeHasNotEdgeStartingWithCharacter(final int index,
                                                         final char character,
                                                         Node previouslyAddedNodeOrAddedEdgeNode) {
    this.activePoint.addEdgeToActiveNode(character, new Edge(index, this.word.length(), null));
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkToActiveNode(previouslyAddedNodeOrAddedEdgeNode);
    if(this.activePoint.activeNodeHasSlink()) {
        this.activePoint = this.activePoint.moveToSlink();
    }
    else {
        this.activePoint = this.activePoint.moveTo(this.root);
    }
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private void activeEdgeHasCharacter() {
    this.activePoint = this.activePoint.moveByOneCharacter();
    if(this.activePoint.pointsToTheEndOfActiveEdge()) {
        this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
    }
  }

  private Node edgeFromRootNodeHasNotCharacter(final int index,
                                             final char character,
                                             Node previouslyAddedNodeOrAddedEdgeNode) {
    final Node newNode = new Node(idGenerator++);
    this.activePoint.splitActiveEdge(this.word, newNode, index, character);
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, newNode);
    this.activePoint = this.activePoint.moveToEdgeStartingWithAndByActiveLengthMinusOne(this.root,
            this.word.charAt(index - this.remainder + 2));
    this.activePoint = walkDown(index);
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private Node edgeFromInternalNodeHasNotCharacter(final int index,
                                                 final char character,
                                                 Node previouslyAddedNodeOrAddedEdgeNode) {
    final Node newNode = new Node(idGenerator++);
    this.activePoint.splitActiveEdge(this.word, newNode, index, character);
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, newNode);
    if(this.activePoint.activeNodeHasSlink()) {
        this.activePoint = this.activePoint.moveToSlink();
    }
    else {
        this.activePoint = this.activePoint.moveTo(this.root);
    }
    this.activePoint = walkDown(index);
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private ActivePoint walkDown(final int index) {
    while(!this.activePoint.pointsToActiveNode()
            && (this.activePoint.pointsToTheEndOfActiveEdge() || this.activePoint.pointsAfterTheEndOfActiveEdge())) {
        if(this.activePoint.pointsAfterTheEndOfActiveEdge()) {
            this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge(this.word, index);
        }
        else {
            this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
        }
    }
    return this.activePoint;
  }

  public String toString(final String word) {
    return this.root.toString(word);
  }

  public boolean contains(final String suffix) {
    return this.root.contains(this.word, suffix);
  }

  public static void main(final String[] args) {
    final String[] words = {
            "abcabcabc$",
            "abc$",
            "abcabxabcd$",
            "abcabxabda$",
            "abcabxad$",
            "aabaaabb$",
            "aababcabcd$",
            "ababcabcd$",
            "abccba$",
            "mississipi$",
            "abacabadabacabae$",
            "abcabcd$",
            "00132220$"
    };
    Arrays.stream(words).forEach(word -> {
        System.out.println("Building suffix tree for word: " + word);
        final ST suffixTree = new ST(word);
        System.out.println("Suffix tree: " + suffixTree.toString(word));
        for(int i = 0; i < word.length() - 1; i++) {
            assert suffixTree.contains(word.substring(i)) : word.substring(i);
        }
    });
  }
}

Hi i have tried to implement the above explained implementation in ruby , please check it out. it seems to work fine.

the only difference in the implementation is that , i have tried to use the edge object instead of just using symbols.

its also present at https://gist.github.com/suchitpuri/9304856

    require 'pry'


class Edge
    attr_accessor :data , :edges , :suffix_link
    def initialize data
        @data = data
        @edges = []
        @suffix_link = nil
    end

    def find_edge element
        self.edges.each do |edge|
            return edge if edge.data.start_with? element
        end
        return nil
    end
end

class SuffixTrees
    attr_accessor :root , :active_point , :remainder , :pending_prefixes , :last_split_edge , :remainder

    def initialize
        @root = Edge.new nil
        @active_point = { active_node: @root , active_edge: nil , active_length: 0}
        @remainder = 0
        @pending_prefixes = []
        @last_split_edge = nil
        @remainder = 1
    end

    def build string
        string.split("").each_with_index do |element , index|


            add_to_edges @root , element        

            update_pending_prefix element                           
            add_pending_elements_to_tree element
            active_length = @active_point[:active_length]

            # if(@active_point[:active_edge] && @active_point[:active_edge].data && @active_point[:active_edge].data[0..active_length-1] ==  @active_point[:active_edge].data[[email protected]_point[:active_edge].data.length-1])
            #   @active_point[:active_edge].data = @active_point[:active_edge].data[0..active_length-1]
            #   @active_point[:active_edge].edges << Edge.new(@active_point[:active_edge].data)
            # end

            if(@active_point[:active_edge] && @active_point[:active_edge].data && @active_point[:active_edge].data.length == @active_point[:active_length]  )
                @active_point[:active_node] =  @active_point[:active_edge]
                @active_point[:active_edge] = @active_point[:active_node].find_edge(element[0])
                @active_point[:active_length] = 0
            end
        end
    end

    def add_pending_elements_to_tree element

        to_be_deleted = []
        update_active_length = false
        # binding.pry
        if( @active_point[:active_node].find_edge(element[0]) != nil)
            @active_point[:active_length] = @active_point[:active_length] + 1               
            @active_point[:active_edge] = @active_point[:active_node].find_edge(element[0]) if @active_point[:active_edge] == nil
            @remainder = @remainder + 1
            return
        end



        @pending_prefixes.each_with_index do |pending_prefix , index|

            # binding.pry           

            if @active_point[:active_edge] == nil and @active_point[:active_node].find_edge(element[0]) == nil

                @active_point[:active_node].edges << Edge.new(element)

            else

                @active_point[:active_edge] = node.find_edge(element[0]) if @active_point[:active_edge]  == nil

                data = @active_point[:active_edge].data
                data = data.split("")               

                location = @active_point[:active_length]


                # binding.pry
                if(data[0..location].join == pending_prefix or @active_point[:active_node].find_edge(element) != nil )                  


                else #tree split    
                    split_edge data , index , element
                end

            end
        end 
    end



    def update_pending_prefix element
        if @active_point[:active_edge] == nil
            @pending_prefixes = [element]
            return

        end

        @pending_prefixes = []

        length = @active_point[:active_edge].data.length
        data = @active_point[:active_edge].data
        @remainder.times do |ctr|
                @pending_prefixes << data[-(ctr+1)..data.length-1]
        end

        @pending_prefixes.reverse!

    end

    def split_edge data , index , element
        location = @active_point[:active_length]
        old_edges = []
        internal_node = (@active_point[:active_edge].edges != nil)

        if (internal_node)
            old_edges = @active_point[:active_edge].edges 
            @active_point[:active_edge].edges = []
        end

        @active_point[:active_edge].data = data[0..location-1].join                 
        @active_point[:active_edge].edges << Edge.new(data[location..data.size].join)


        if internal_node
            @active_point[:active_edge].edges << Edge.new(element)
        else
            @active_point[:active_edge].edges << Edge.new(data.last)        
        end

        if internal_node
            @active_point[:active_edge].edges[0].edges = old_edges
        end


        #setup the suffix link
        if @last_split_edge != nil and @[email protected]_point[:active_edge].data 

            @last_split_edge.suffix_link = @active_point[:active_edge] 
        end

        @last_split_edge = @active_point[:active_edge]

        update_active_point index

    end


    def update_active_point index
        if(@active_point[:active_node] == @root)
            @active_point[:active_length] = @active_point[:active_length] - 1
            @remainder = @remainder - 1
            @active_point[:active_edge] = @active_point[:active_node].find_edge(@pending_prefixes.first[index+1])
        else
            if @active_point[:active_node].suffix_link != nil
                @active_point[:active_node] = @active_point[:active_node].suffix_link               
            else
                @active_point[:active_node] = @root
            end 
            @active_point[:active_edge] = @active_point[:active_node].find_edge(@active_point[:active_edge].data[0])
            @remainder = @remainder - 1     
        end
    end

    def add_to_edges root , element     
        return if root == nil
        root.data = root.data + element if(root.data and root.edges.size == 0)
        root.edges.each do |edge|
            add_to_edges edge , element
        end
    end
end

suffix_tree = SuffixTrees.new
suffix_tree.build("abcabxabcd")
binding.pry

感谢@jogojapan提供的解释良好的教程,我在Python中实现了该算法。

@jogojapan提到的一些小问题比我预期的要复杂得多,需要非常仔细地处理。 我花了几天的时间让我的实现足够强大 (我想)。 下面列出了问题和解决方案:

  1. Remainder > 0结束原来,这种情况也可能发生在展开步骤中 ,而不仅仅是整个算法的结束。 当发生这种情况时,我们可以保持余数,actnode,acttedge和actlength 不变 ,结束当前的展开步骤,并根据原始字符串中的下一个字符是否位于当前路径上,或者保持折叠或展开来开始另一个步骤,或者不。

  2. 跨越节点:当我们遵循后缀链接时,更新活动点,然后发现其active_length组件对新的活动节点不起作用。 我们必须前往正确的地方分割,或插入一片叶子​​。 This process might be not that straightforward because during the moving the actlength and actedge keep changing all the way, when you have to move back to the root node , the actedge and actlength could be wrong because of those moves. We need additional variable(s) to keep that information.

The other two problems have somehow been pointed out by @managonov

  1. Split Could Degenerate When trying to split an edge, sometime you'll find the split operation is right on a node. That case we only need add a new leaf to that node, take it as a standard edge split operation, which means the suffix links if there's any, should be maintained correspondingly.

  2. Hidden Suffix Links There is another special case which is incurred by problem 1 and problem 2 . Sometimes we need to hop over several nodes to the right point for split, we might surpass the right point if we move by comparing the remainder string and the path labels. That case the suffix link will be neglected unintentionally, if there should be any. This could be avoided by remembering the right point when moving forward. The suffix link should be maintained if the split node already exists, or even the problem 1 happens during a unfolding step.

Finally, my implementation in Python is as follows:

Tips: It includes a naive tree printing function in the code above, which is very important while debugging . It saved me a lot of time and is convenient for locating special cases.


我尝试使用jogojapan的答案中给出的方法来实现后缀树,但由于用于规则的措辞,在某些情况下它不起作用。 此外,我提到过,没有人设法使用这种方法来实现绝对正确的后缀树。 下面我将对jogojapan的答案进行“概述”,并对规则进行一些修改。 我还会介绍一些我们忘记创建重要后缀链接的情况。

使用其他变量

  1. 活动点 - 一个三元组(active_node; active_edge; active_length),显示我们必须从哪里开始插入新后缀。
  2. 余数 - 显示我们必须明确添加的后缀数量。 例如,如果我们的词是'abcaabca',剩余部分= 3,这意味着我们必须处理3个最后的后缀: bcacaa

让我们使用内部节点的概念 - 除根叶子之外的所有节点都是内部节点

观察1

当我们需要插入的最后一个后缀已经存在于树中时,树本身根本不会改变(我们只更新active pointremainder )。

观察2

如果在某点active_length大于或等于当前边的长度( edge_length ),我们将我们的active point向下移动,直到edge_length严格大于active_length

现在,让我们重新定义规则:

规则1

如果在从活动节点插入= root后活动长度大于0,则:

  1. 活动节点不会更改
  2. 活动长度递减
  3. 活动边向右移动(到我们必须插入的下一个后缀的第一个字符)

规则2

如果我们创建一个新的内部节点 或者内部节点创建一个插入器,并且这不是当前步骤中的第一个SUCH 内部节点 ,那么我们通过后缀链接将前一个SUCH节点与一个节点链接起来

Rule 2定义与jogojapan不同,因为这里我们不仅考虑新创建的内部节点,而且还考虑内部节点,我们从中进行插入。

规则3

在从不是根节点活动节点插入后,我们必须遵循后缀链接并将活动节点设置为它指向的节点。 如果没有后缀链接,请将活动节点设置为根节点。 无论哪种方式, 有效边缘有效长度保持不变。

Rule 3这个定义中,我们还考虑了叶节点的插入(不仅是分割节点)。

最后,观察3:

当我们想要添加到树中的符号已经在边上时,根据Observation 1 ,我们只更新active pointremainder ,使树保持不变。 但是如果有一个标记为需要后缀链接内部节点 ,我们必须通过后缀链接将该节点与我们当前的active node连接起来。

让我们看看cdddcdc后缀树的例子,如果我们在这种情况下添加后缀链接,并且我们不:

  1. 如果我们通过后缀链接连接节点:

    • 在添加最后一个字母c之前

    • 在添加最后一个字母c

  2. 如果我们通过后缀链接连接节点:

    • 在添加最后一个字母c之前

    • 在添加最后一个字母c

似乎没有显着差异:在第二种情况下,还有两个后缀链接。 但是,这些后缀链接是正确的 ,其中的一个 - 从蓝色节点到红色节点 - 对于我们采用积极点的方法非常重要 。 问题是,如果我们不在这里放置后缀链接,稍后当我们向树添加一些新的字母时,由于Rule 3 ,我们可能省略向树添加一些节点,因为根据它,如果存在没有后缀链接,那么我们必须把active_node放到根目录下。

当我们将最后一个字母添加到树中时,红色节点已经存在,然后我们从蓝色节点(边缘标记为'c' )插入。 由于有蓝色节点的插入,我们将其标记为需要后缀链接 。 然后,依靠主动点方法, active node被设置为红色节点。 但是我们不会从红色节点插入,因为字母'c'已经在边缘。 这是否意味着蓝色节点必须没有后缀链接? 不,我们必须通过后缀链接连接蓝色节点和红色节点。 为什么它是正确的? 因为积极点方法保证我们到达正确的地方,即到下一个我们必须处理插入一个较短后缀的地方。

最后,这里是我的后缀树的实现:

  1. Java
  2. C++

希望这个“概述”结合jogojapan的详细答案将有助于某人实施他自己的后缀树。







suffix-tree