javascript html - Does nodejs have a working diff library or algorithm?





I wrote a diff algorithm implementation in JavaScript, available at the following page.

https://github.com/cubicdaiya/onp

It runs on node.js. Furthermore, there is a C++ addon version for node.js at the following page.

https://github.com/cubicdaiya/node-dtl

You can install this with npm.

$ npm install -g dtl

I'm looking for a JavaScript diff algorithm implementation or library that has been tested on, and works with, arbitrary UTF-8 text files.

All of the ones I have found so far (for example, http://ejohn.org/projects/javascript-diff-algorithm/) fail on corner cases.

(Try using a file which contains the string '__proto__' in my example library.)
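To illustrate the kind of corner case meant here, below is a minimal sketch of one plausible way a JavaScript diff can trip over '__proto__': using a plain object as a lookup table keyed by line or word. The helper names (countWithObject, countWithMap) are made up for this example and are not taken from any of the libraries mentioned; whether a given library fails for exactly this reason is an assumption.

```javascript
// Hypothetical helper: count occurrences of each line using a plain object.
// The key "__proto__" hits the inherited accessor instead of creating an
// own property, so that line is silently lost.
function countWithObject(lines) {
  const counts = {};
  for (const line of lines) {
    counts[line] = (counts[line] || 0) + 1;
  }
  return counts;
}

// Same bookkeeping with a Map, which treats every string as an ordinary key.
function countWithMap(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return counts;
}

const sampleLines = ["a", "__proto__", "a"];
const objectCounts = countWithObject(sampleLines);
const mapCounts = countWithMap(sampleLines);

// The plain object has no own entry for "__proto__"; the Map does.
console.log(Object.keys(objectCounts));
console.log(mapCounts.get("__proto__"));
```

A Map (or an object created with Object.create(null)) avoids the problem because no key is treated specially.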




Maybe this will help you: jsdiff


Have a look at the JavaScript library wikEd diff. It has Unicode and multilingual support. It also detects and highlights block moves and is word/character based. You can also use the online tool/demo to test different settings and to look at the internal data structures. The library's code is fully commented.


My intuition is as follows:

After k iterations of the main loop you have constructed a suffix tree which contains all suffixes of the complete string that start in the first k characters.

At the start, this means the suffix tree contains a single root node that represents the entire string (this is the only suffix that starts at 0).

After len(string) iterations you have a suffix tree that contains all suffixes.
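The invariant above can be checked on a toy example. The sketch below is not the linear-time algorithm being discussed; it is a naive, quadratic suffix trie (one node per character, children stored in a Map) that simply inserts the suffix starting at position k on iteration k. The names buildNaiveSuffixTrie and hasPath are invented for this illustration.

```javascript
// Naive suffix trie: after `steps` iterations it contains every suffix of
// `text` that starts within the first `steps` characters.
// This is an O(n^2) illustration, not the linear-time construction.
function buildNaiveSuffixTrie(text, steps = text.length) {
  const root = new Map();
  for (let k = 0; k < steps; k++) {
    let node = root;
    for (let i = k; i < text.length; i++) {
      const ch = text[i];
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
  }
  return root;
}

// Returns true if `s` can be read along some path starting at the root.
function hasPath(root, s) {
  let node = root;
  for (let i = 0; i < s.length; i++) {
    node = node.get(s[i]);
    if (node === undefined) return false;
  }
  return true;
}

const partialTrie = buildNaiveSuffixTrie("abcabc", 2); // suffixes starting at 0 and 1
const fullTrie = buildNaiveSuffixTrie("abcabc");       // all suffixes

console.log(hasPath(partialTrie, "bcabc")); // suffix starting at index 1
console.log(hasPath(partialTrie, "cabc"));  // suffix starting at index 2
console.log(hasPath(fullTrie, "cabc"));
```

With only two iterations done, the suffix starting at index 2 is not yet in the trie; after len(string) iterations every suffix is.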

During the loop the key is the active point. My guess is that this represents the deepest point in the suffix tree that corresponds to a proper suffix of the first k characters of the string. (I think proper means that the suffix cannot be the entire string.)

For example, suppose you have seen characters 'abcabc'. The active point would represent the point in the tree corresponding to the suffix 'abc'.

The active point is represented by (origin, first, last). This means that you are currently at the point in the tree that you reach by starting at node origin and then feeding in the characters in string[first:last].

When you add a new character, you look to see whether the active point is still in the existing tree. If it is, then you are done. Otherwise you need to add a new node to the suffix tree at the active point, fall back to the next shortest match, and check again.

Note 1: The suffix pointers give a link to the next shortest match for each node.

Note 2: When you add a new node and fall back, you add a new suffix pointer for the new node. The destination for this suffix pointer will be the node at the shortened active point. This node will either already exist, or be created on the next iteration of this fallback loop.

Note 3: The canonization part simply saves time in checking the active point. For example, suppose you always used origin=0, and just changed first and last. To check the active point you would have to follow the suffix tree each time along all the intermediate nodes. It makes sense to cache the result of following this path by recording just the distance from the last node.
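Here is a rough sketch of what I understand canonization to do, under several assumptions of mine: edges are stored in a Map keyed by "node:firstChar", each edge carries a half-open label range [first, last) into the text plus a target node, and the active point uses the same half-open string[first:last] convention as above. The toy edges below are hand-built to show the walk and do not form a real suffix tree.

```javascript
// Move the origin of an active point as far down the tree as possible,
// so that what remains of text[first:last] lies strictly inside one edge
// (or is empty, meaning the point sits exactly on a node).
function canonize(edges, text, point) {
  let { origin, first, last } = point;
  while (first < last) {
    const edge = edges.get(`${origin}:${text[first]}`);
    if (edge === undefined) break;        // no such edge: leave point as is
    const span = edge.last - edge.first;  // number of characters on the edge
    if (span > last - first) break;       // point ends inside this edge
    origin = edge.target;                 // consume the whole edge
    first += span;
  }
  return { origin, first, last };
}

// Hand-built toy path: node 0 --"ab"--> node 1 --"cd"--> node 2.
const toyText = "abcd";
const toyEdges = new Map([
  ["0:a", { first: 0, last: 2, target: 1 }],
  ["1:c", { first: 2, last: 4, target: 2 }],
]);

// "abc" measured from the root becomes "c" measured from node 1.
const canonical = canonize(toyEdges, toyText, { origin: 0, first: 0, last: 3 });
console.log(canonical);
```

After canonization, checking the active point only needs one edge lookup from origin instead of a walk from the root.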


Health warning: I also found this algorithm particularly hard to understand so please realise that this intuition is likely to be incorrect in all important details...







javascript node.js diff