Computing the data diff between two hash maps
This article is an excerpt from my upcoming book about Data Oriented Programming. The book will be published by Manning, once it is completed (hopefully in 2021).
More excerpts are available on my blog.
Enter your email address below to get notified when the book is published.
Our implementation of the 3way merge resolution algorithm, where the system state is represented as a nested hash map, relies on the ability two compare two versions of the system state via the DataDiff
class that implements:

The computation of a semantic diff between two hash maps (a.k.a data diff)

The detection of empty intersection between two hash maps
The internals of the DataDiff
class are now revealed.
The data diff algorithm
Computing a semantic diff between two hash maps is the most challenging part of the reconciliation algorithm.
But at the end of the day, it deals only with data manipulation.
Let’s define exactly what we mean by a semantic diff between two hash maps a
and b
:
The semantic diff between
a
andb
is a hash mapd
that contains all the nested fields ofb
whose value differ from their value ina
.
For instance,
DataDiff.diff(
{
x: {
y: 2,
z: 3
}
},
{
x: {
y: 2,
z: 4
}
}
)
should return
{
x: {
z: 4
}
}
Our implementation uses immutable functions from Lodash. By default, Lodash functions are not immutable. In order to use a immutable version of the functions, we need to use Lodash FP module (Functional Programming), as it is explained in the Lodash FP guide.^{[1]}. With this piece of code the signature of the immutable functions is exactly the same as the mutable functions.
_ = fp.convert({
"cap": false,
"curry": false,
"fixed": false,
"immutable": true,
"rearg": false
});
And now to the implementation! The core of the code is inside _.reduce()
where we make the recursive call to DataDiff.diff()
.
class DataDiff {
static diffObjects(data1, data2) {
var emptyObject = _.isArray(data1) ? [] : {};
if(data1 == data2) {
return emptyObject;
}
var keys = _.union(_.keys(data1), _.keys(data2));
return _.reduce(keys,
function (acc, k) {
var res = DataDiff.diff(_.get(data1, k),
_.get(data2, k));
if((_.isObject(res) && _.isEmpty(res)) 
(res == "datadiff:nodiff")) {
return acc;
}
return _.set(acc, k, res);
},
emptyObject);
}
static diff(data1, data2) {
if(_.isObject(data1) && _.isObject(data2)) {
return DataDiff.diffObjects(data1, data2);
}
if(data1 !== data2) {
return data2;
}
return "datadiff:nodiff";
}
}
window.DataDiff = DataDiff
Performance of calculating semantic diff
In the general case, calculating the semantic diff of two hash maps is not efficient as we have to go over all the leaves of both maps.
But when the two maps are created via structural sharing from the same map, the implementation in efficient as most of the nodes are shared between the two hash maps.
In DO, the versions of the system state are always created via structural sharing: that’s why the code in the conflict resolution phase is efficient.
Playing with data diff
Even if you don’t grasp all the details of the implementation, feel free to play with DataDiff
and see how it calculates the semantic diff between two hash maps.
var data1 = {
g: {
c: 3
},
x: 2,
y: {
z: 1
},
w: [5]
}
var data2 = {
g: {
c: 3
},
x: 2,
y: {
z: 2
},
w: [4]
}
DataDiff.diff(data1, data2);
Intersection of maps
DataDiff
also provides a DataDiff.isEmptyIntersection()
that we used in the reconciliation algorithm to detect a conflict.
DataDiff.leaves = function(obj, prefix = '') {
return _.reduce(obj,
function(acc, v, k) {
if (_.isObject(v)) {
return _.concat(acc,
DataDiff.leaves(v,
prefix + k + "."))
}
return _.concat(acc, [prefix + k]);
},
[]);
}
DataDiff.isEmptyIntersection = function(delta1, delta2) {
return _.isEmpty(_.intersection(DataDiff.leaves(delta1),
DataDiff.leaves(delta2)));
}
Feel free to play with it also!
var diff1 = {
g: {
c: 3
}
}
var diff2 = {
g: {
c: 4
}
}
DataDiff.isEmptyIntersection(diff1, diff2);
var diff3 = {
g: {
c: 3
}
}
var diff4 = {
g: {
d: 4
}
}
DataDiff.isEmptyIntersection(diff3, diff4);
This article is an excerpt from my upcoming book about Data Oriented Programming. The book will be published by Manning, once it is completed (hopefully in 2021).
More excerpts are available on my blog.
Enter your email address below to get notified when the book is published.