javascript - tree教學 - ukkonen's algorithm




Ukkonen的後綴樹算法在普通英語中 (4)

@jogojapan you brought awesome explanation and visualisation. But as @makagonov mentioned it's missing some rules regarding setting suffix links. It's visible in nice way when going step by step on http://brenden.github.io/ukkonen-animation/ through word 'aabaaabb'. When you go from step 10 to step 11, there is no suffix link from node 5 to node 2 but active point suddenly moves there.

@makagonov since I live in Java world I also tried to follow your implementation to grasp ST building workflow but it was hard to me because of:

  • combining edges with nodes
  • using index pointers instead of references
  • breaks statements;
  • continue statements;

So I ended up with such implementation in Java which I hope reflects all steps in clearer way and will reduce learning time for other Java people:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ST {

  public class Node {
    private final int id;
    private final Map<Character, Edge> edges;
    private Node slink;

    public Node(final int id) {
        this.id = id;
        this.edges = new HashMap<>();
    }

    public void setSlink(final Node slink) {
        this.slink = slink;
    }

    public Map<Character, Edge> getEdges() {
        return this.edges;
    }

    public Node getSlink() {
        return this.slink;
    }

    public String toString(final String word) {
        return new StringBuilder()
                .append("{")
                .append("\"id\"")
                .append(":")
                .append(this.id)
                .append(",")
                .append("\"slink\"")
                .append(":")
                .append(this.slink != null ? this.slink.id : null)
                .append(",")
                .append("\"edges\"")
                .append(":")
                .append(edgesToString(word))
                .append("}")
                .toString();
    }

    private StringBuilder edgesToString(final String word) {
        final StringBuilder edgesStringBuilder = new StringBuilder();
        edgesStringBuilder.append("{");
        for(final Map.Entry<Character, Edge> entry : this.edges.entrySet()) {
            edgesStringBuilder.append("\"")
                    .append(entry.getKey())
                    .append("\"")
                    .append(":")
                    .append(entry.getValue().toString(word))
                    .append(",");
        }
        if(!this.edges.isEmpty()) {
            edgesStringBuilder.deleteCharAt(edgesStringBuilder.length() - 1);
        }
        edgesStringBuilder.append("}");
        return edgesStringBuilder;
    }

    public boolean contains(final String word, final String suffix) {
        return !suffix.isEmpty()
                && this.edges.containsKey(suffix.charAt(0))
                && this.edges.get(suffix.charAt(0)).contains(word, suffix);
    }
  }

  public class Edge {
    private final int from;
    private final int to;
    private final Node next;

    public Edge(final int from, final int to, final Node next) {
        this.from = from;
        this.to = to;
        this.next = next;
    }

    public int getFrom() {
        return this.from;
    }

    public int getTo() {
        return this.to;
    }

    public Node getNext() {
        return this.next;
    }

    public int getLength() {
        return this.to - this.from;
    }

    public String toString(final String word) {
        return new StringBuilder()
                .append("{")
                .append("\"content\"")
                .append(":")
                .append("\"")
                .append(word.substring(this.from, this.to))
                .append("\"")
                .append(",")
                .append("\"next\"")
                .append(":")
                .append(this.next != null ? this.next.toString(word) : null)
                .append("}")
                .toString();
    }

    public boolean contains(final String word, final String suffix) {
        if(this.next == null) {
            return word.substring(this.from, this.to).equals(suffix);
        }
        return suffix.startsWith(word.substring(this.from,
                this.to)) && this.next.contains(word, suffix.substring(this.to - this.from));
    }
  }

  public class ActivePoint {
    private final Node activeNode;
    private final Character activeEdgeFirstCharacter;
    private final int activeLength;

    public ActivePoint(final Node activeNode,
                       final Character activeEdgeFirstCharacter,
                       final int activeLength) {
        this.activeNode = activeNode;
        this.activeEdgeFirstCharacter = activeEdgeFirstCharacter;
        this.activeLength = activeLength;
    }

    private Edge getActiveEdge() {
        return this.activeNode.getEdges().get(this.activeEdgeFirstCharacter);
    }

    public boolean pointsToActiveNode() {
        return this.activeLength == 0;
    }

    public boolean activeNodeIs(final Node node) {
        return this.activeNode == node;
    }

    public boolean activeNodeHasEdgeStartingWith(final char character) {
        return this.activeNode.getEdges().containsKey(character);
    }

    public boolean activeNodeHasSlink() {
        return this.activeNode.getSlink() != null;
    }

    public boolean pointsToOnActiveEdge(final String word, final char character) {
        return word.charAt(this.getActiveEdge().getFrom() + this.activeLength) == character;
    }

    public boolean pointsToTheEndOfActiveEdge() {
        return this.getActiveEdge().getLength() == this.activeLength;
    }

    public boolean pointsAfterTheEndOfActiveEdge() {
        return this.getActiveEdge().getLength() < this.activeLength;
    }

    public ActivePoint moveToEdgeStartingWithAndByOne(final char character) {
        return new ActivePoint(this.activeNode, character, 1);
    }

    public ActivePoint moveToNextNodeOfActiveEdge() {
        return new ActivePoint(this.getActiveEdge().getNext(), null, 0);
    }

    public ActivePoint moveToSlink() {
        return new ActivePoint(this.activeNode.getSlink(),
                this.activeEdgeFirstCharacter,
                this.activeLength);
    }

    public ActivePoint moveTo(final Node node) {
        return new ActivePoint(node, this.activeEdgeFirstCharacter, this.activeLength);
    }

    public ActivePoint moveByOneCharacter() {
        return new ActivePoint(this.activeNode,
                this.activeEdgeFirstCharacter,
                this.activeLength + 1);
    }

    public ActivePoint moveToEdgeStartingWithAndByActiveLengthMinusOne(final Node node,
                                                                       final char character) {
        return new ActivePoint(node, character, this.activeLength - 1);
    }

    public ActivePoint moveToNextNodeOfActiveEdge(final String word, final int index) {
        return new ActivePoint(this.getActiveEdge().getNext(),
                word.charAt(index - this.activeLength + this.getActiveEdge().getLength()),
                this.activeLength - this.getActiveEdge().getLength());
    }

    public void addEdgeToActiveNode(final char character, final Edge edge) {
        this.activeNode.getEdges().put(character, edge);
    }

    public void splitActiveEdge(final String word,
                                final Node nodeToAdd,
                                final int index,
                                final char character) {
        final Edge activeEdgeToSplit = this.getActiveEdge();
        final Edge splittedEdge = new Edge(activeEdgeToSplit.getFrom(),
                activeEdgeToSplit.getFrom() + this.activeLength,
                nodeToAdd);
        nodeToAdd.getEdges().put(word.charAt(activeEdgeToSplit.getFrom() + this.activeLength),
                new Edge(activeEdgeToSplit.getFrom() + this.activeLength,
                        activeEdgeToSplit.getTo(),
                        activeEdgeToSplit.getNext()));
        nodeToAdd.getEdges().put(character, new Edge(index, word.length(), null));
        this.activeNode.getEdges().put(this.activeEdgeFirstCharacter, splittedEdge);
    }

    public Node setSlinkTo(final Node previouslyAddedNodeOrAddedEdgeNode,
                           final Node node) {
        if(previouslyAddedNodeOrAddedEdgeNode != null) {
            previouslyAddedNodeOrAddedEdgeNode.setSlink(node);
        }
        return node;
    }

    public Node setSlinkToActiveNode(final Node previouslyAddedNodeOrAddedEdgeNode) {
        return setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, this.activeNode);
    }
  }

  private static int idGenerator;

  private final String word;
  private final Node root;
  private ActivePoint activePoint;
  private int remainder;

  public ST(final String word) {
    this.word = word;
    this.root = new Node(idGenerator++);
    this.activePoint = new ActivePoint(this.root, null, 0);
    this.remainder = 0;
    build();
  }

  private void build() {
    for(int i = 0; i < this.word.length(); i++) {
        add(i, this.word.charAt(i));
    }
  }

  private void add(final int index, final char character) {
    this.remainder++;
    boolean characterFoundInTheTree = false;
    Node previouslyAddedNodeOrAddedEdgeNode = null;
    while(!characterFoundInTheTree && this.remainder > 0) {
        if(this.activePoint.pointsToActiveNode()) {
            if(this.activePoint.activeNodeHasEdgeStartingWith(character)) {
                activeNodeHasEdgeStartingWithCharacter(character, previouslyAddedNodeOrAddedEdgeNode);
                characterFoundInTheTree = true;
            }
            else {
                if(this.activePoint.activeNodeIs(this.root)) {
                    rootNodeHasNotEdgeStartingWithCharacter(index, character);
                }
                else {
                    previouslyAddedNodeOrAddedEdgeNode = internalNodeHasNotEdgeStartingWithCharacter(index,
                            character, previouslyAddedNodeOrAddedEdgeNode);
                }
            }
        }
        else {
            if(this.activePoint.pointsToOnActiveEdge(this.word, character)) {
                activeEdgeHasCharacter();
                characterFoundInTheTree = true;
            }
            else {
                if(this.activePoint.activeNodeIs(this.root)) {
                    previouslyAddedNodeOrAddedEdgeNode = edgeFromRootNodeHasNotCharacter(index,
                            character,
                            previouslyAddedNodeOrAddedEdgeNode);
                }
                else {
                    previouslyAddedNodeOrAddedEdgeNode = edgeFromInternalNodeHasNotCharacter(index,
                            character,
                            previouslyAddedNodeOrAddedEdgeNode);
                }
            }
        }
    }
  }

  private void activeNodeHasEdgeStartingWithCharacter(final char character,
                                                    final Node previouslyAddedNodeOrAddedEdgeNode) {
    this.activePoint.setSlinkToActiveNode(previouslyAddedNodeOrAddedEdgeNode);
    this.activePoint = this.activePoint.moveToEdgeStartingWithAndByOne(character);
    if(this.activePoint.pointsToTheEndOfActiveEdge()) {
        this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
    }
  }

  private void rootNodeHasNotEdgeStartingWithCharacter(final int index, final char character) {
    this.activePoint.addEdgeToActiveNode(character, new Edge(index, this.word.length(), null));
    this.activePoint = this.activePoint.moveTo(this.root);
    this.remainder--;
    assert this.remainder == 0;
  }

  private Node internalNodeHasNotEdgeStartingWithCharacter(final int index,
                                                         final char character,
                                                         Node previouslyAddedNodeOrAddedEdgeNode) {
    this.activePoint.addEdgeToActiveNode(character, new Edge(index, this.word.length(), null));
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkToActiveNode(previouslyAddedNodeOrAddedEdgeNode);
    if(this.activePoint.activeNodeHasSlink()) {
        this.activePoint = this.activePoint.moveToSlink();
    }
    else {
        this.activePoint = this.activePoint.moveTo(this.root);
    }
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private void activeEdgeHasCharacter() {
    this.activePoint = this.activePoint.moveByOneCharacter();
    if(this.activePoint.pointsToTheEndOfActiveEdge()) {
        this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
    }
  }

  private Node edgeFromRootNodeHasNotCharacter(final int index,
                                             final char character,
                                             Node previouslyAddedNodeOrAddedEdgeNode) {
    final Node newNode = new Node(idGenerator++);
    this.activePoint.splitActiveEdge(this.word, newNode, index, character);
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, newNode);
    this.activePoint = this.activePoint.moveToEdgeStartingWithAndByActiveLengthMinusOne(this.root,
            this.word.charAt(index - this.remainder + 2));
    this.activePoint = walkDown(index);
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private Node edgeFromInternalNodeHasNotCharacter(final int index,
                                                 final char character,
                                                 Node previouslyAddedNodeOrAddedEdgeNode) {
    final Node newNode = new Node(idGenerator++);
    this.activePoint.splitActiveEdge(this.word, newNode, index, character);
    previouslyAddedNodeOrAddedEdgeNode = this.activePoint.setSlinkTo(previouslyAddedNodeOrAddedEdgeNode, newNode);
    if(this.activePoint.activeNodeHasSlink()) {
        this.activePoint = this.activePoint.moveToSlink();
    }
    else {
        this.activePoint = this.activePoint.moveTo(this.root);
    }
    this.activePoint = walkDown(index);
    this.remainder--;
    return previouslyAddedNodeOrAddedEdgeNode;
  }

  private ActivePoint walkDown(final int index) {
    while(!this.activePoint.pointsToActiveNode()
            && (this.activePoint.pointsToTheEndOfActiveEdge() || this.activePoint.pointsAfterTheEndOfActiveEdge())) {
        if(this.activePoint.pointsAfterTheEndOfActiveEdge()) {
            this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge(this.word, index);
        }
        else {
            this.activePoint = this.activePoint.moveToNextNodeOfActiveEdge();
        }
    }
    return this.activePoint;
  }

  public String toString(final String word) {
    return this.root.toString(word);
  }

  public boolean contains(final String suffix) {
    return this.root.contains(this.word, suffix);
  }

  public static void main(final String[] args) {
    final String[] words = {
            "abcabcabc$",
            "abc$",
            "abcabxabcd$",
            "abcabxabda$",
            "abcabxad$",
            "aabaaabb$",
            "aababcabcd$",
            "ababcabcd$",
            "abccba$",
            "mississipi$",
            "abacabadabacabae$",
            "abcabcd$",
            "00132220$"
    };
    Arrays.stream(words).forEach(word -> {
        System.out.println("Building suffix tree for word: " + word);
        final ST suffixTree = new ST(word);
        System.out.println("Suffix tree: " + suffixTree.toString(word));
        for(int i = 0; i < word.length() - 1; i++) {
            assert suffixTree.contains(word.substring(i)) : word.substring(i);
        }
    });
  }
}

這一點我感覺有點厚。 我已經花了好幾天的時間來試圖完全包裹後綴樹的構造,但是因為我沒有數學背景,所以很多解釋都是因為我們開始過度使用數學符號系統。 最接近我找到的一個很好的解釋是使用後綴樹的快速字符串搜索 ,但是他掩蓋了各種不同的點,並且算法的某些方面仍然不清楚。

我相信,Stack Overflow對這個算法的逐步解釋對於除了我之外的許多其他人是無價的。

作為參考,這裡是Ukkonen關於算法的論文: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf : http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

到目前為止,我的基本理解是:

  • 我需要迭代給定字符串T的每個前綴P
  • 我需要遍歷前綴P中的每個後綴S並將其添加到樹中
  • 為了給樹添加後綴S,我需要遍歷S中的每個字符,迭代包括沿著一個現有的分支走,該分支以S中的相同字符集C開頭,並且可能將I分割為後代節點在後綴中達到不同的字符,或者如果沒有匹配的邊緣向下走。 當發現沒有匹配的邊沿向下走向C時,為C創建新的葉邊。

正如我們在大多數解釋中指出的那樣,基本算法似乎是O(n 2 ),因為我們需要遍歷所有的前綴,然後我們需要遍歷每個前綴的每個後綴。 由於使用了後綴指針技術,Ukkonen的算法顯然是獨一無二的,但我認為是我無法理解的。

我也無法理解:

  • 確切地何時以及如何分配,使用和更改“活動點”
  • 算法的標準化方面正在發生什麼
  • 為什麼我看到的實現需要“修復”他們正在使用的變量的邊界

這裡是完整的C#源代碼。 它不僅工作正常,而且支持自動封裝,並呈現更好看的輸出文本圖形。 源代碼和示例輸出位於:

https://gist.github.com/2373868

更新2017-11-04

多年後,我發現了後綴樹的新用法,並在JavaScript中實現了該算法。 Gist在下面。 它應該沒有缺陷。 將它轉儲到js文件中, npm install chalk從同一位置npm install chalk ,然後使用node.js運行以查看一些豐富多彩的輸出。 在同一個Gist中有一個精簡版,沒有任何調試代碼。

https://gist.github.com/axefrog/c347bf0f5e0723cbd09b1aaed6ec6fc6


Hi i have tried to implement the above explained implementation in ruby , please check it out. it seems to work fine.

the only difference in the implementation is that , i have tried to use the edge object instead of just using symbols.

its also present at https://gist.github.com/suchitpuri/9304856

    require 'pry'


class Edge
    attr_accessor :data , :edges , :suffix_link
    def initialize data
        @data = data
        @edges = []
        @suffix_link = nil
    end

    def find_edge element
        self.edges.each do |edge|
            return edge if edge.data.start_with? element
        end
        return nil
    end
end

class SuffixTrees
    attr_accessor :root , :active_point , :remainder , :pending_prefixes , :last_split_edge , :remainder

    def initialize
        @root = Edge.new nil
        @active_point = { active_node: @root , active_edge: nil , active_length: 0}
        @remainder = 0
        @pending_prefixes = []
        @last_split_edge = nil
        @remainder = 1
    end

    def build string
        string.split("").each_with_index do |element , index|


            add_to_edges @root , element        

            update_pending_prefix element                           
            add_pending_elements_to_tree element
            active_length = @active_point[:active_length]

            # if(@active_point[:active_edge] && @active_point[:active_edge].data && @active_point[:active_edge].data[0..active_length-1] ==  @active_point[:active_edge].data[[email protected]_point[:active_edge].data.length-1])
            #   @active_point[:active_edge].data = @active_point[:active_edge].data[0..active_length-1]
            #   @active_point[:active_edge].edges << Edge.new(@active_point[:active_edge].data)
            # end

            if(@active_point[:active_edge] && @active_point[:active_edge].data && @active_point[:active_edge].data.length == @active_point[:active_length]  )
                @active_point[:active_node] =  @active_point[:active_edge]
                @active_point[:active_edge] = @active_point[:active_node].find_edge(element[0])
                @active_point[:active_length] = 0
            end
        end
    end

    def add_pending_elements_to_tree element

        to_be_deleted = []
        update_active_length = false
        # binding.pry
        if( @active_point[:active_node].find_edge(element[0]) != nil)
            @active_point[:active_length] = @active_point[:active_length] + 1               
            @active_point[:active_edge] = @active_point[:active_node].find_edge(element[0]) if @active_point[:active_edge] == nil
            @remainder = @remainder + 1
            return
        end



        @pending_prefixes.each_with_index do |pending_prefix , index|

            # binding.pry           

            if @active_point[:active_edge] == nil and @active_point[:active_node].find_edge(element[0]) == nil

                @active_point[:active_node].edges << Edge.new(element)

            else

                @active_point[:active_edge] = node.find_edge(element[0]) if @active_point[:active_edge]  == nil

                data = @active_point[:active_edge].data
                data = data.split("")               

                location = @active_point[:active_length]


                # binding.pry
                if(data[0..location].join == pending_prefix or @active_point[:active_node].find_edge(element) != nil )                  


                else #tree split    
                    split_edge data , index , element
                end

            end
        end 
    end



    def update_pending_prefix element
        if @active_point[:active_edge] == nil
            @pending_prefixes = [element]
            return

        end

        @pending_prefixes = []

        length = @active_point[:active_edge].data.length
        data = @active_point[:active_edge].data
        @remainder.times do |ctr|
                @pending_prefixes << data[-(ctr+1)..data.length-1]
        end

        @pending_prefixes.reverse!

    end

    def split_edge data , index , element
        location = @active_point[:active_length]
        old_edges = []
        internal_node = (@active_point[:active_edge].edges != nil)

        if (internal_node)
            old_edges = @active_point[:active_edge].edges 
            @active_point[:active_edge].edges = []
        end

        @active_point[:active_edge].data = data[0..location-1].join                 
        @active_point[:active_edge].edges << Edge.new(data[location..data.size].join)


        if internal_node
            @active_point[:active_edge].edges << Edge.new(element)
        else
            @active_point[:active_edge].edges << Edge.new(data.last)        
        end

        if internal_node
            @active_point[:active_edge].edges[0].edges = old_edges
        end


        #setup the suffix link
        if @last_split_edge != nil and @[email protected]_point[:active_edge].data 

            @last_split_edge.suffix_link = @active_point[:active_edge] 
        end

        @last_split_edge = @active_point[:active_edge]

        update_active_point index

    end


    def update_active_point index
        if(@active_point[:active_node] == @root)
            @active_point[:active_length] = @active_point[:active_length] - 1
            @remainder = @remainder - 1
            @active_point[:active_edge] = @active_point[:active_node].find_edge(@pending_prefixes.first[index+1])
        else
            if @active_point[:active_node].suffix_link != nil
                @active_point[:active_node] = @active_point[:active_node].suffix_link               
            else
                @active_point[:active_node] = @root
            end 
            @active_point[:active_edge] = @active_point[:active_node].find_edge(@active_point[:active_edge].data[0])
            @remainder = @remainder - 1     
        end
    end

    def add_to_edges root , element     
        return if root == nil
        root.data = root.data + element if(root.data and root.edges.size == 0)
        root.edges.each do |edge|
            add_to_edges edge , element
        end
    end
end

suffix_tree = SuffixTrees.new
suffix_tree.build("abcabxabcd")
binding.pry

感謝@jogojapan提供的解釋良好的教程,我在Python中實現了該算法。

@jogojapan提到的一些小問題比我預期的要復雜得多,需要非常小心的處理。 我花了幾天的時間讓我的實現足夠強大 (我想)。 下面列出了問題和解決方案:

  1. Remainder > 0結束原來,這種情況也可能發生在展開步驟中 ,而不僅僅是整個算法的結束。 當發生這種情況時,我們可以保持餘數,actnode,acttedge和actlength 不變 ,結束當前的展開步驟,並根據原始字符串中的下一個字符是否位於當前路徑上,或者繼續折疊或展開來開始另一個步驟,或者不。

  2. 跨越節點:當我們遵循後綴鏈接時,更新活動點,然後發現其active_length組件對新的活動節點不起作用。 我們必須前往正確的地方分割,或插入一片葉子。 This process might be not that straightforward because during the moving the actlength and actedge keep changing all the way, when you have to move back to the root node , the actedge and actlength could be wrong because of those moves. We need additional variable(s) to keep that information.

The other two problems have somehow been pointed out by @managonov

  1. Split Could Degenerate When trying to split an edge, sometime you'll find the split operation is right on a node. That case we only need add a new leaf to that node, take it as a standard edge split operation, which means the suffix links if there's any, should be maintained correspondingly.

  2. Hidden Suffix Links There is another special case which is incurred by problem 1 and problem 2 . Sometimes we need to hop over several nodes to the right point for split, we might surpass the right point if we move by comparing the remainder string and the path labels. That case the suffix link will be neglected unintentionally, if there should be any. This could be avoided by remembering the right point when moving forward. The suffix link should be maintained if the split node already exists, or even the problem 1 happens during a unfolding step.

Finally, my implementation in Python is as follows:

Tips: It includes a naive tree printing function in the code above, which is very important while debugging . It saved me a lot of time and is convenient for locating special cases.


我嘗試使用jogojapan的答案中給出的方法來實現後綴樹,但由於用於規則的措辭,在某些情況下它不起作用。 此外,我提到過,沒有人設法使用這種方法來實現絕對正確的後綴樹。 下面我將對jogojapan的答案進行“概述”,並對規則進行一些修改。 我還會介紹一些我們忘記創建重要後綴鏈接的情況。

使用其他變量

  1. 活動點 - 一個三元組(active_node; active_edge; active_length),顯示我們必須從哪裡開始插入新後綴。
  2. 餘數 - 顯示我們必須明確添加的後綴數量。 例如,如果我們的單詞是'abcaabca',餘數= 3,這意味著我們必須處理3個最後的後綴: bcacaa

讓我們使用內部節點的概念 - 除根葉子之外的所有節點都是內部節點

觀察1

當我們需要插入的最後一個後綴已經存在於樹中時,樹本身根本不會改變(我們只更新active pointremainder )。

觀察2

如果在某點active_length大於或等於當前邊的長度( edge_length ),我們將我們的active point向下移動,直到edge_length嚴格大於active_length

現在,讓我們重新定義規則:

規則1

如果在從活動節點插入= root後活動長度大於0,則:

  1. 活動節點不會更改
  2. 活動長度遞減
  3. 活動邊向右移動(到我們必須插入的下一個後綴的第一個字符)

規則2

如果我們創建一個新的內部節點 或者內部節點創建一個插入器,並且這不是當前步驟中的第一個SUCH 內部節點 ,那麼我們通過後綴鏈接將前一個SUCH節點與這個節點鏈接

Rule 2定義與jogojapan不同,因為這裡我們不僅考慮新創建的內部節點,還考慮內部節點,我們從中進行插入。

規則3

在從不是根節點活動節點插入後,我們必須遵循後綴鏈接並將活動節點設置為它指向的節點。 如果沒有後綴鏈接,請將活動節點設置為根節點。 無論哪種方式, 有效邊緣有效長度保持不變。

Rule 3這個定義中,我們還考慮了葉節點的插入(不僅是分割節點)。

最後,觀察3:

當我們想要添加到樹中的符號已經在邊上時,根據Observation 1 ,我們只更新active pointremainder ,使樹保持不變。 但是如果有一個標記為需要後綴鏈接內部節點 ,我們必須通過後綴鏈接將該節點與我們當前的active node連接起來。

讓我們看看cdddcdc後綴樹的例子,如果我們在這種情況下添加後綴鏈接,並且我們不:

  1. 如果我們通過後綴鏈接連接節點:

    • 在添加最後一個字母c之前

    • 在添加最後一個字母c

  2. 如果我們通過後綴鏈接連接節點:

    • 在添加最後一個字母c之前

    • 在添加最後一個字母c

似乎沒有顯著差異:在第二種情況下,還有兩個後綴鏈接。 但是這些後綴鏈接是正確的 ,其中的一個 - 從藍色節點到紅色節點 - 對於我們的積極點的方法非常重要 。 問題是,如果我們在這裡沒有後綴鏈接,那麼當我們向樹中添加一些新的字母時,由於Rule 3 ,我們可能會省略添加一些節點到樹中,因為根據它,如果有沒有後綴鏈接,那麼我們必須把active_node放到根目錄下。

當我們將最後一個字母添加到樹中時,紅色節點已經存在,然後我們從藍色節點(邊標為'c' )進行插入。 由於有藍色節點的插入,我們將其標記為需要後綴鏈接 。 然後,依靠主動點方法, active node被設置為紅色節點。 但是我們不會從紅色節點插入,因為字母'c'已經在邊緣。 這是否意味著藍色節點必須沒有後綴鏈接? 不,我們必須通過後綴鏈接連接藍色節點和紅色節點。 為什麼它是正確的? 因為活動點方法可以保證我們到達正確的地方,即到下一個必須處理較短後綴插入的地方。

最後,這裡是我的後綴樹的實現:

  1. Java
  2. C++

希望這個“概述”結合jogojapan的詳細答案將有助於某人實施他自己的後綴樹。







suffix-tree