Computing the data diff between two hash maps

This article is an excerpt from my book about Data-Oriented Programming.

More excerpts are available on my blog.

Our implementation of the 3-way merge resolution algorithm, where the system state is represented as a nested hash map, relies on the ability two compare two versions of the system state via the DataDiff class that implements:

The computation of a semantic diff between two hash maps (a.k.a data diff)
The detection of empty intersection between two hash maps

The internals of the DataDiff class are now revealed.

The data diff algorithm

Computing a semantic diff between two hash maps is the most challenging part of the reconciliation algorithm.

But at the end of the day, it deals only with data manipulation.

Let’s define exactly what we mean by a semantic diff between two hash maps a and b:

The semantic diff between a and b is a hash map d that contains all the nested fields of b whose value differ from their value in a.

For instance,

DataDiff.diff(
    {
        x: {
            y: 2,
            z: 3
        }
    },
    {
        x: {
            y: 2,
            z: 4
        }
    }
)

should return


{
    x: {
        z: 4
    }
}

Our implementation uses immutable functions from Lodash. By default, Lodash functions are not immutable. In order to use a immutable version of the functions, we need to use Lodash FP module (Functional Programming), as it is explained in the Lodash FP guide.^[1]. With this piece of code the signature of the immutable functions is exactly the same as the mutable functions.

_ = fp.convert({
    "cap": false,
    "curry": false,
    "fixed": false,
    "immutable": true,
    "rearg": false
});

And now to the implementation! The core of the code is inside _.reduce() where we make the recursive call to DataDiff.diff().

class DataDiff {
    static diffObjects(data1, data2) {
        var emptyObject = _.isArray(data1) ? [] : {};
        if(data1 == data2) {
            return emptyObject;
        }
        var keys = _.union(_.keys(data1), _.keys(data2));
        return _.reduce(keys,
                        function (acc, k) {
                            var res = DataDiff.diff(_.get(data1, k),
                                                    _.get(data2, k));
                            if((_.isObject(res) && _.isEmpty(res)) ||
                               (res == "data-diff:no-diff")) {
                                return acc;
                            }
                            return _.set(acc, k, res);
                        },
                        emptyObject);
    }

    static diff(data1, data2) {
        if(_.isObject(data1) && _.isObject(data2)) {
            return DataDiff.diffObjects(data1, data2);
        }
        if(data1 !== data2) {
            return data2;
        }
        return "data-diff:no-diff";
    }
}

window.DataDiff = DataDiff

Performance of calculating semantic diff

In the general case, calculating the semantic diff of two hash maps is not efficient as we have to go over all the leaves of both maps.

But when the two maps are created via structural sharing from the same map, the implementation in efficient as most of the nodes are shared between the two hash maps.

In DO, the versions of the system state are always created via structural sharing: that’s why the code in the conflict resolution phase is efficient.

Playing with data diff

Even if you don’t grasp all the details of the implementation, feel free to play with DataDiff and see how it calculates the semantic diff between two hash maps.

var data1 = {
    g: {
        c: 3
    },
    x: 2,
    y: {
        z: 1
    },
    w: [5]
}

var data2 = {
    g: {
        c: 3
    },
    x: 2,
    y: {
        z: 2
    },
    w: [4]
}

DataDiff.diff(data1, data2);

Intersection of maps

DataDiff also provides a DataDiff.isEmptyIntersection() that we used in the reconciliation algorithm to detect a conflict.

DataDiff.leaves = function(obj, prefix = '') {
    return _.reduce(obj,
                    function(acc, v, k) {
                        if (_.isObject(v)) {
                            return _.concat(acc,
                                            DataDiff.leaves(v,
                                                            prefix + k + "."))
                        }
                        return _.concat(acc, [prefix + k]);
                    },
                    []);
}

DataDiff.isEmptyIntersection = function(delta1, delta2) {
    return _.isEmpty(_.intersection(DataDiff.leaves(delta1),
                                    DataDiff.leaves(delta2)));
}

Feel free to play with it also!

var diff1 = {
    g: {
        c: 3
    }
}

var diff2 = {
    g: {
        c: 4
    }
}

DataDiff.isEmptyIntersection(diff1, diff2);

var diff3 = {
    g: {
        c: 3
    }
}

var diff4 = {
    g: {
        d: 4
    }
}

DataDiff.isEmptyIntersection(diff3, diff4);

This article is an excerpt from my book about Data-Oriented Programming.

More excerpts are available on my blog.

1. https://github.com/lodash/lodash/wiki/FP-Guide

Yehonathan Sharvit

Computing the data diff between two hash maps

The data diff algorithm

Performance of calculating semantic diff

Playing with data diff

Intersection of maps

Read Next

Applying Git and Optimistic Concurrency Control principles to Data Oriented Programming

A new way of blogging about Python

The data diff algorithm

Performance of calculating semantic diff

Playing with data diff

Intersection of maps

Subscribe to Yehonathan Sharvit newsletter

Read Next

Tags