How to compare JSON values, objects and arrays
Comparing two JSON files is fairly straightforward, though there are a few areas where it is not quite as simple as it seems. The three literal names, true, false and null, are not a problem, though note that they must be lower case. Numbers, however, should be normalised before they are compared, so that 1 and 1.0 are not reported as a change; similarly, 100 and 1e2 should be deemed equal. For example:
JSON A
{
"a": 1,
"b": 100.0
}
=
JSON B
{
"a": 1.0,
"b": 1e2
}
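A minimal sketch of number normalisation in Python, assuming the standard json and decimal modules are acceptable tools. Every number is parsed as a Decimal instead of int or float, so values compare numerically without float rounding. The helper names load_normalised and canonical_number are hypothetical.

```python
import json
from decimal import Decimal

def load_normalised(text: str):
    # Parse every JSON number as a Decimal rather than int/float.
    # Decimal compares by numeric value, so 1 == 1.0 and 100.0 == 1e2,
    # and (unlike float) large or very precise numbers are not rounded.
    return json.loads(text, parse_int=Decimal, parse_float=Decimal)

def canonical_number(d: Decimal) -> str:
    # One textual form per value, useful when reporting or hashing:
    # Decimal("100.0") and Decimal("1e2") both become "1E+2".
    return str(d.normalize())

a = load_normalised('{"a": 1, "b": 100.0}')
b = load_normalised('{"a": 1.0, "b": 1e2}')
print(a == b)                    # True
print(canonical_number(b["b"]))  # 1E+2
```

One Python-specific trap: bool is a subclass of int, so True == 1 evaluates to True. A comparison that must keep JSON's true distinct from the number 1 needs an explicit type check.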
Strings may also need some normalisation to handle escaped characters, so that, for example, a literal apostrophe and its escape sequence \u0027 are treated as the same character:
JSON A
{
"a": "Here are some apostrophes ( ' and ' and \u0027 )"
}
=
JSON B
{
"a": "Here are some apostrophes ( \u0027 and \u0027 and ' )"
}
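Python's json module already decodes escape sequences such as \u0027 while parsing, so the two strings above compare equal once both documents are parsed. The sketch below adds one further, optional step: Unicode NFC normalisation via the standard unicodedata module. This step is an assumption about what else "special character encodings" might cover, for example an é stored as one code point versus an e followed by a combining accent. The helper name normalise_strings is hypothetical.

```python
import json
import unicodedata

def normalise_strings(value):
    """Recursively apply Unicode NFC normalisation to every string and member name."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [normalise_strings(item) for item in value]
    if isinstance(value, dict):
        return {normalise_strings(k): normalise_strings(v) for k, v in value.items()}
    return value  # numbers, booleans and None are left unchanged

# The two documents from the example; r"""...""" keeps \u0027 as a JSON escape.
json_a = r"""{"a": "Here are some apostrophes ( ' and ' and \u0027 )"}"""
json_b = r"""{"a": "Here are some apostrophes ( \u0027 and \u0027 and ' )"}"""

# json.loads decodes \u0027 to ', so the parsed strings are identical.
print(normalise_strings(json.loads(json_a)) == normalise_strings(json.loads(json_b)))  # True
```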
Objects also compare well, because each member is identified by a name (a string) which should be unique within the object. The JSON specification does not strictly require names to be unique, but behaviour is unpredictable if they are not! Corresponding members can therefore be matched without ambiguity, even if the members appear in a different order.
JSON A
{
"firstname": "Andrew",
"lastname" : "Other"
}
=
JSON B
{
"lastname" : "Other",
"firstname": "Andrew"
}
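Python dictionaries already compare equal regardless of member order, so parsing both documents and using == is enough for a yes/no answer. To report which members differ, a recursive walk that matches members by name is a natural next step. This is a sketch under stated assumptions: the names reject_duplicates and diff_objects are hypothetical, the "$.name" path notation is just an illustrative choice, and nested arrays are compared with plain == rather than any of the array strategies discussed below. The object_pairs_hook argument of json.loads is used to refuse duplicate member names, since by default the last duplicate silently wins.

```python
import json

def reject_duplicates(pairs):
    # object_pairs_hook receives each object's members in document order,
    # so a repeated name can be caught instead of silently overwriting.
    obj = {}
    for name, value in pairs:
        if name in obj:
            raise ValueError(f"duplicate member name: {name!r}")
        obj[name] = value
    return obj

def diff_objects(a, b, path="$"):
    """Yield (kind, path) for each member that differs, matching members by name."""
    for name in sorted(a.keys() - b.keys()):
        yield ("removed", f"{path}.{name}")
    for name in sorted(b.keys() - a.keys()):
        yield ("added", f"{path}.{name}")
    for name in sorted(a.keys() & b.keys()):
        va, vb = a[name], b[name]
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_objects(va, vb, f"{path}.{name}")
        # In Python True == 1, so compare bool-ness explicitly as well.
        elif va != vb or isinstance(va, bool) != isinstance(vb, bool):
            yield ("changed", f"{path}.{name}")

json_a = '{"firstname": "Andrew", "lastname" : "Other"}'
json_b = '{"lastname" : "Other", "firstname": "Andrew"}'
a = json.loads(json_a, object_pairs_hook=reject_duplicates)
b = json.loads(json_b, object_pairs_hook=reject_duplicates)
print(a == b, list(diff_objects(a, b)))  # True []
```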
Where a collection of objects each has a unique key member, the collection should ideally be represented as an object in which each key is pulled out as the member name; this leads to unambiguous comparison. See the example below.
Arrays present more of a problem for comparison, because they are used for different purposes. For example, if an array is used to represent an (x, y) coordinate, then [ 34, 56 ] is expected to differ from [ 56, 34 ]. However, if the array is being used as an unordered set of numbers, then the two arrays should be considered equal. Comparing by position and comparing as unordered items are therefore alternative approaches, to be chosen according to how the array data is interpreted.
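Both interpretations are easy to express once the intended one is known. In this minimal sketch (the function names are hypothetical), equal_by_position treats the array like a tuple, while equal_unordered treats it as a multiset: order is ignored, but the number of repeats still counts. Items are keyed by json.dumps(..., sort_keys=True) so that nested objects, which are unhashable in Python, can still be counted.

```python
import json
from collections import Counter

def equal_by_position(a: list, b: list) -> bool:
    # Tuple-like arrays, e.g. an (x, y) coordinate: order is meaningful.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

def equal_unordered(a: list, b: list) -> bool:
    # Set-like arrays: compare as multisets, so order is ignored but
    # duplicate counts still matter. A canonical JSON string is used as
    # the key so nested objects/arrays can be counted too.
    def key(item):
        return json.dumps(item, sort_keys=True)
    return Counter(map(key, a)) == Counter(map(key, b))

print(equal_by_position([34, 56], [56, 34]))  # False
print(equal_unordered([34, 56], [56, 34]))    # True
```

Note that this key does not apply the number normalisation discussed earlier: json.dumps writes 1 as "1" and 1.0 as "1.0", so numbers would need normalising first if that distinction matters.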
Furthermore, comparing by position is not always what is needed even when an array is used as a list, where the item order is significant. In this case, comparing [1,3,2,4,5] with [1,3,4,5] by position gives three differences: 2 != 4, 4 != 5, and 5 is a deleted item.
[ 1, 3, 2, 4, 5 ]
  |  |  |  |  x
[ 1, 3, 4, 5 ]
A more intelligent ordered comparison would report a single difference: 2 has been deleted.
[ 1, 3, 2, 4, 5 ]
  |  |  x  |  |
[ 1, 3,    4, 5 ]
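The "more intelligent" ordered comparison rests on the same idea as text diff tools: find a longest common subsequence (LCS) of the two arrays and report every item outside it as a deletion or an insertion. Below is a textbook O(n·m) dynamic-programming sketch; the name lcs_diff and its " "/"-"/"+" output format are hypothetical. Python's standard difflib.SequenceMatcher is an alternative, though it uses a different matching heuristic and does not guarantee a minimal edit sequence.

```python
def lcs_diff(a: list, b: list) -> list:
    """Return (op, item) pairs turning list a into list b.

    op is " " for an item kept, "-" for one deleted from a,
    and "+" for one inserted from b.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table, keeping matches and taking whichever step
    # preserves the longer remaining common subsequence.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops += [("-", x) for x in a[i:]]
    ops += [("+", x) for x in b[j:]]
    return ops

print(lcs_diff([1, 3, 2, 4, 5], [1, 3, 4, 5]))
# [(' ', 1), (' ', 3), ('-', 2), (' ', 4), (' ', 5)]
```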
So it is arrays that cause most problems in comparing JSON data.
When JSON is generated, arrays are often used where the data could be represented as objects. Converting such an array into an object may therefore be a sensible pre-comparison step, so that only ‘real’ changes are identified.
For example:
{"contacts": [ { "id": "324", "first_name": "AN", "last_name": "Other" }, { "id": "127", "first_name": "John", "last_name": "Doe" } ]}
would be much better represented for comparison purposes as:
{"contacts": { "324": { "first_name": "AN", "last_name": "Other" }, "127": { "first_name": "John", "last_name": "Doe" } }}
It may not look quite so natural, but corresponding contacts are now matched by their id rather than by their position in the array.
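The conversion itself is a small pre-processing pass. A minimal sketch, assuming every item carries the key member (here "id", as in the example) and that its values are unique; index_by_key is a hypothetical helper name. A missing key raises KeyError and a duplicate key raises ValueError rather than silently overwriting an earlier item.

```python
import json

def index_by_key(items, key="id"):
    """Turn an array of objects into an object keyed by each item's `key` member."""
    result = {}
    for item in items:
        name = str(item[key])  # member names must be strings; KeyError if missing
        if name in result:
            raise ValueError(f"duplicate {key} value: {name!r}")
        # Keep every other member; the key itself becomes the member name.
        result[name] = {k: v for k, v in item.items() if k != key}
    return result

doc = json.loads('{"contacts": [ { "id": "324", "first_name": "AN", "last_name": "Other" },'
                 ' { "id": "127", "first_name": "John", "last_name": "Doe" } ]}')
converted = {"contacts": index_by_key(doc["contacts"])}

# Reordering the array no longer changes the comparison result.
print(index_by_key(doc["contacts"][::-1]) == converted["contacts"])  # True
```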