Posted 15 March 2021
by Robin La Fontaine

XML and JSON: horses for courses or a one-horse race?

Both XML and JSON are now well established, but a decision still has to be made as to which is better for a given job – or can we manage with only one of them? As an engineer with heavy exposure to structured data, particularly in data interchange, I have found it interesting to watch the two develop.

Why we have XML and JSON

It is worth remembering that XML, when it appeared in 1998, was a very significant simplification of its parent, SGML. SGML was widely used in publishing, but the entry cost was high because of its complexity. XML appeared as a 20-page specification which ruthlessly eliminated many of SGML's complexities – but many thought it still too complex. So JSON was born to be simple and easy to use.

It is always attractive to make things simpler, and we all applaud that trend. The problem with simplicity, though, can be a lack of power and flexibility, and sometimes the simple solution soon sprouts many add-ons to meet these needs. We see it in other areas: for example, Scheme was a simple version of Lisp, but to use it in earnest you needed additional libraries of functions. But, like JSON, it was an appropriate solution to many problems.

How do we choose?

So how do we approach a choice between XML and JSON? I view XML as a run of text with markup added to provide computer-sensible semantics. I view JSON as a data structure with string and number values added at the appropriate places. Therefore, as a document comes to contain more data, e.g. a catalogue, the best representation might move from XML to JSON. Similarly, as a set of, for example, financial data gains more explanatory text to become an annual report, the most appropriate structure might change from JSON to XML.
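To make the two views concrete, here is a minimal Python sketch using only the standard library. The sample sentence, the element names (`p`, `amount`, `year`) and the JSON keys are invented for illustration and come from no real report or schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sentence from an annual report: text first, markup layered on top.
xml_text = (
    '<p>Revenue rose to <amount currency="GBP">12</amount> million '
    "in <year>2020</year>.</p>"
)
root = ET.fromstring(xml_text)

# Stripping the markup still leaves a readable run of text.
plain = "".join(root.itertext())
print(plain)  # Revenue rose to 12 million in 2020.

# The markup adds machine-readable meaning to parts of that text.
amount = root.find("amount")
print(amount.text, amount.get("currency"))  # 12 GBP

# Hypothetical catalogue-style record: structure first, values at the leaves.
json_text = '{"revenue": {"amount": 12, "currency": "GBP"}, "year": 2020}'
record = json.loads(json_text)
print(record["revenue"]["amount"], record["year"])  # 12 2020
```

In the XML case the sentence survives when the tags are removed. In the JSON case there is no sentence to recover, only values reached by name.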

Our interest is in finding changes to data or documents in both these formats, and the challenges have similarities and differences. XML is inherently an ordered sequence of information, whereas JSON objects have named members whose order is not relevant. So we find that we need a different approach to get the best results from each. The good news is that discovering the best approach to one leads to improvements in the other. So certainly we need them both!
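The difference in ordering can be seen in a small Python sketch using only the standard library. It is an illustration under simple assumptions, not a description of how any real comparison tool works: the helper names (`json_equal`, `xml_children`, `xml_equal`) and the sample data are hypothetical, and the XML comparison looks only at the immediate children of the root element.

```python
import json
import xml.etree.ElementTree as ET


def json_equal(a: str, b: str) -> bool:
    """Compare two JSON texts as data; object member order is ignored, array order is not."""
    return json.loads(a) == json.loads(b)


def xml_children(text: str) -> list:
    """Return (tag, text) pairs for the root's child elements, in document order."""
    root = ET.fromstring(text)
    return [(child.tag, (child.text or "").strip()) for child in root]


def xml_equal(a: str, b: str) -> bool:
    """Compare two small XML documents as ordered sequences of child elements."""
    return xml_children(a) == xml_children(b)


# Same members, different order: equal as JSON objects.
json_a = '{"title": "Report", "year": 2021}'
json_b = '{"year": 2021, "title": "Report"}'

# Same paragraphs, different order: a different document.
xml_a = "<doc><para>First point.</para><para>Second point.</para></doc>"
xml_b = "<doc><para>Second point.</para><para>First point.</para></doc>"

print(json_equal(json_a, json_b))  # True
print(xml_equal(xml_a, xml_b))     # False
```

Note that JSON arrays are still ordered, so `json_equal('[1, 2]', '[2, 1]')` is false. It is only the members of an object whose order carries no meaning.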