Thoughts on XML Summer School 2018
XML to JSON and JSON to XML – easy?
Automatic conversion of data between XML and JSON – how difficult can this be? In one sense it is quite easy, but the devil is in the detail once a proper round-trip capability is needed. JSON does not have attributes, and XML does not have arrays – these are just two of the areas where the two formats differ.
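As a small illustration of the attribute mismatch, here is a rough Python sketch using one common (but by no means universal) convention: attributes become JSON keys with an ‘@’ prefix so they can be told apart from child elements later. The payment element and its values are made up for the example.

    import json
    import xml.etree.ElementTree as ET

    # Hypothetical payment element: one attribute, one child element.
    elem = ET.fromstring('<payment currency="GBP"><amount>10.00</amount></payment>')

    # "@" prefix marks what was an attribute; plain keys were child elements.
    as_json = {"@" + name: value for name, value in elem.attrib.items()}
    as_json.update({child.tag: child.text for child in elem})

    print(json.dumps({elem.tag: as_json}))
    # {"payment": {"@currency": "GBP", "amount": "10.00"}}

Other converters use different conventions, which is exactly why a round trip between two arbitrary tools cannot be taken for granted.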
But there are some more interesting differences too, seen from the perspective of Jason Polis, who works in financial services. Converting the text representation of a payment amount into a JSON number can result in loss of information… which of course could result in loss of money! This happens simply because a limited binary representation cannot represent all decimal numbers precisely.
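A quick Python sketch of the problem, with a made-up amount; the exact digits that survive depend on the parser, but the point is that a plain JSON number usually ends up as binary floating point:

    import json
    from decimal import Decimal

    # The amount arrives as text, e.g. from <amount>12345678.90123456789</amount>.
    text_amount = "12345678.90123456789"

    # Naive route: most JSON parsers turn numbers into binary floating point,
    # so the trailing decimal digits are silently lost.
    naive = json.loads('{"amount": ' + text_amount + '}')["amount"]

    # Safer route: keep the lexical value, or parse numbers into a decimal type.
    exact = json.loads('{"amount": ' + text_amount + '}', parse_float=Decimal)["amount"]

    print(naive)   # a float that no longer matches the original text
    print(exact)   # Decimal('12345678.90123456789')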
Doing a good conversion from XML to JSON needs knowledge of the schema because, for example, repeating elements (maxOccurs > 1) need to go into an array in JSON. Round-tripping back from JSON has the fundamental problem that the members of a JSON object have no defined order – only arrays are ordered – so some convention needs to be adopted to handle this. Unfortunately, XML has no way to indicate whether or not the order matters: often it does not matter for data, but by definition element order is significant in XML.
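The sketch below shows the schema-knowledge point in Python; the set of repeating element names is invented for the example and would normally be derived from the XSD:

    import json
    import xml.etree.ElementTree as ET

    # Stand-in for real schema knowledge: names declared with maxOccurs > 1
    # must always become JSON arrays, even when only one occurrence is present.
    REPEATING = {"item"}

    def element_to_json(elem):
        # Rough sketch only: ignores attributes, namespaces and mixed content.
        result = {}
        for child in elem:
            value = element_to_json(child) if len(child) else child.text
            if child.tag in REPEATING:
                result.setdefault(child.tag, []).append(value)
            else:
                result[child.tag] = value
        return result

    doc = ET.fromstring("<order><id>42</id><item>pen</item><item>ink</item></order>")
    print(json.dumps(element_to_json(doc)))
    # {"id": "42", "item": ["pen", "ink"]}
    # A JSON object gives no guarantee about the relative order of "id" and
    # "item", so the original document order cannot be recovered reliably.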
So what looks like a fairly simple conversion becomes very difficult if you want to get everything back again in the original JSON or XML representation.
XForms – a well-kept secret
More and more new languages that started life with a restricted focus have been developed well beyond their original purpose and are ‘Turing complete’. In principle, that means you should be able to do anything in them – write code equivalent to code in any other language. Advocates of those languages delight in tackling difficult problems such as writing a compiler in XQuery or in XSLT – not the first language of choice for a compiler, but it can be done.
XForms does not seem an intuitive name for a declarative programming language that is Turing complete. Steven Pemberton sang its praises and demonstrated some impressive results for work that was done ‘during the lunch hour’. Two programmers were asked how long a project would take and were given two days to produce the estimate. One came back and said he needed more time to estimate it; the other, the XForms programmer, came back and said he had already implemented it! That sounds impressive, so surely this is worth looking at, especially if you already know XML, XPath, REST and CSS.
A debate in the Oxford Union
“This house believes that open technology and standards have widened social injustice” was the motion, and we were swayed first one way then the other by experts in various areas. We all retired to the bar while the result was worked out, and it is a credit to both sides that the ayes to the right were 38 and the noes to the left numbered 38 also!
Invisible XML
We all have strong preferences for seeing data or a programming language in a syntax that is familiar to us. The goal of Invisible XML is to take over the world – by converting anything that can be parsed into an XML representation. Why? So that the XML technology stack can be applied to it, and the result can then be converted back into the original syntax so that no-one will ever know that XML was used – invisible XML.
It is a great idea, and we use the same approach ourselves to convert JSON into XML, compare and merge it, and then push it back out again as JSON. It would be good to be able to do that with other formats and languages too, and the technology to do this is improving. As always, it is the ability to round-trip without any loss of information that is the tricky part.
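To make the idea concrete, here is a minimal Python sketch of a JSON-to-XML-and-back mapping – not the Invisible XML specification itself, and not any particular product, just an illustration of why recording enough information (here, a type attribute) matters for a lossless round trip:

    import json
    import xml.etree.ElementTree as ET

    # Each JSON value becomes an element whose "type" attribute records what it
    # was, so the structure can be rebuilt exactly. Keys that are not valid XML
    # names would need escaping, one of the round-tripping wrinkles glossed
    # over here.
    def json_to_xml(value, name="value"):
        if isinstance(value, dict):
            elem = ET.Element(name, type="object")
            for key, child in value.items():
                elem.append(json_to_xml(child, key))
        elif isinstance(value, list):
            elem = ET.Element(name, type="array")
            for child in value:
                elem.append(json_to_xml(child, "item"))
        else:
            elem = ET.Element(name, type="scalar")
            elem.text = json.dumps(value)   # keep the JSON lexical form
        return elem

    def xml_to_json(elem):
        kind = elem.get("type")
        if kind == "object":
            return {child.tag: xml_to_json(child) for child in elem}
        if kind == "array":
            return [xml_to_json(child) for child in elem]
        return json.loads(elem.text)

    original = {"name": "invoice", "lines": [1, 2, 3]}
    assert xml_to_json(json_to_xml(original)) == original

Once the data is in XML, the usual stack – XSLT, XQuery, comparison and merge tools – can be applied before the result is pushed back out in the original format.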
Schemas for XML content: More is good but less is better
In the days of SGML, rolling your own schema was the accepted wisdom. More and more schemas were developed, but the recent trend in XML is to consolidate down to just a few – why re-invent the wheel? Some industrial application areas have spawned schemas that have reached a high level of maturity, and it makes much more sense to adopt one of these than to develop yet another one that will probably be very similar to something that already exists.
One of the key questions, Debbie Lapeyre said, is whether to take a subset of an existing schema or even go for a superset. A subset usually makes more sense, because the main advantage of adopting an existing schema is being able to use all of its tools, for example the publishing pipeline that turns the content into HTML or print. As some of these schemas become more complex, there is always a group who will resist the complexity and subset the language, which can make it much easier to understand and lowers the cost of entry.