XML: The What, Why, Where and Hows Explained by a Newbie

If you’re looking for an in-depth, super-technical, ultra-descriptive piece about what XML is and what it does, then (if I were you) I’d probably keep searching. However, if (like me) you’ve just found out that XML is not just a representation of someone who doesn’t know their alphabet, then let me enlighten you on the very basics.

Firstly, XML stands for eXtensible Markup Language. The key word there is ‘Extensible’. This means it is a markup language that can be expanded or added to by its users and still be usable by the application that is displaying it. Okay, so I’m getting ahead of myself. I should explain that XML doesn’t actually do anything. It is just information surrounded by tags. Software is needed to store, display, send or receive it.

So you’re probably asking, “If it doesn’t do anything, then what’s the point of it?” Well, let me explain. One of XML’s main features is that it is self-descriptive: the tag names say what the data is, so it can be read by both humans and machines. Many systems today hold data in incompatible formats, so large amounts of data need converting whenever they exchange information, which is time consuming and often loses some of the data along the way. Because XML stores data as plain text, new, old and upgraded systems can all read the same information, which makes conversion much quicker and far less likely to lose data.

The Components of XML

So what does this magical XML look like? Below is an example of some simple XML code:

<email>
   <date> 26/07/2012 </date>
   <time> 17:07 </time>
   <from> Janet </from>
   <to> James </to>
   <subject> Just saying hi </subject>
   <body> Hi James, had a lovely chat with you today. You must come over soon. </body>
</email>

From the example above, it is easy to see that this is an email to James from Janet, sent on 26th July 2012 at 5:07pm. The subject and body of the email are also included. As you can see, the code is made up of different parts.

[Figure: arrows pointing to the XML tags and to the whole XML element]
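To see those parts from the software side, here is a minimal sketch that reads the email example with Python’s built-in `xml.etree.ElementTree` module. Python is my own choice here, and the variable names (`email_xml`, `fields`) are made up for illustration; any language with an XML parser would do the same job.

```python
import xml.etree.ElementTree as ET

email_xml = """
<email>
   <date> 26/07/2012 </date>
   <time> 17:07 </time>
   <from> Janet </from>
   <to> James </to>
   <subject> Just saying hi </subject>
   <body> Hi James, had a lovely chat with you today. You must come over soon. </body>
</email>
"""

# Parse the text into a tree; the root element here is <email>.
email = ET.fromstring(email_xml)

# Each child element has a tag (its name) and text (its content).
# strip() removes the spaces that sit just inside the tags in the example.
fields = {child.tag: child.text.strip() for child in email}

print(email.tag)          # email
print(fields["from"])     # Janet
print(fields["subject"])  # Just saying hi
```

The parser never needed to be told what an email is. It only relied on the tags being opened and closed properly.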

As in HTML, attributes can also be used in XML (see below).

<person gender="female">
   <name> Janet Smith </name>
   <age> 43 </age>
</person>

Hopefully, you could work out that the text above describes Janet Smith and gives her gender and age. Here, however, the gender is given as an attribute. Attributes are designed to contain data related to a specific element. Attribute values must always be quoted, with either double or single quotes.

<person gender="female">
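As a rough illustration of how software tells attributes and elements apart, the sketch below reads the person example, again with Python’s standard `xml.etree.ElementTree` (my choice of tool, not something XML requires). The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

person_xml = """
<person gender="female">
   <name> Janet Smith </name>
   <age> 43 </age>
</person>
"""

person = ET.fromstring(person_xml)

# Attributes live on the element itself and are read with get().
gender = person.get("gender")

# Child elements are looked up by tag name; their content is in .text.
name = person.find("name").text.strip()
age = int(person.find("age").text)

print(gender, name, age)  # female Janet Smith 43

# Single quotes around the attribute value work just as well.
same_person = ET.fromstring("<person gender='female'/>")
print(same_person.get("gender"))  # female
```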

Unlike HTML, which relies on predefined tags, XML’s tags are created entirely by the user. However, the two markup languages often work in partnership. Put simply: XML stores and transports the data, while HTML formats and presents it.
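Here is a small sketch of that partnership, assuming a little Python program sits in the middle: it takes the data out of the XML and wraps it in HTML tags for display. The helper `person_to_html` is hypothetical and written just for this post; real sites often use other tools (such as XSLT) for the same job.

```python
import xml.etree.ElementTree as ET
from html import escape

person_xml = '<person gender="female"><name>Janet Smith</name><age>43</age></person>'
person = ET.fromstring(person_xml)


def person_to_html(person):
    """Build a small HTML fragment from a <person> element (hypothetical helper)."""
    # escape() keeps characters like < and & in the data from breaking the HTML.
    name = escape(person.findtext("name").strip())
    age = escape(person.findtext("age").strip())
    gender = escape(person.get("gender", "unknown"))
    return f"<h1>{name}</h1>\n<p>Age: {age}, gender: {gender}</p>"


html_fragment = person_to_html(person)
print(html_fragment)
```

The XML says nothing about headings or paragraphs. Those decisions belong to the HTML side, so the same data could be presented in a completely different way without touching the XML.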

The Rules of XML

Like all things, XML has rules. These are known as the syntax rules. Let’s go through three of the most common:

1. All XML documents must contain a single root element, which is the parent of all the other elements.

<root>
   <child>
      <subchild>...</subchild>
   </child>
</root>

2. All XML elements must have a closing tag.

<yes> Am I doing it right </yes>
<no> Am I doing it right

3. XML tags are case sensitive, so the tag <email> is different from <Email>, and both are different from <EMAIL>. Opening and closing tags must be written in exactly the same case.

<yes> Am I doing it right </yes>
<Yes> Am I doing it right </Yes>
<no> Am I doing it right </No>

Of course there are many more, but like the ones above they are all pretty simple.
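If you want to check the three rules for yourself, one way (a sketch, not the only way) is to hand some snippets to a parser and see which ones it rejects. This uses Python’s built-in `xml.etree.ElementTree`; the function name `is_well_formed` and the sample labels are my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "one root element": "<root><child><subchild>...</subchild></child></root>",
    "two root elements": "<yes>first</yes><yes>second</yes>",
    "missing closing tag": "<no> Am I doing it right",
    "matching case": "<Yes> Am I doing it right </Yes>",
    "mismatched case": "<no> Am I doing it right </No>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the "one root element" and "matching case" snippets are accepted. The other three each break one of the rules, so the parser refuses them outright instead of guessing what was meant.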

XML Schemas

However, as mentioned, XML’s tags and elements are made up by the user. Because of this, XML can lack a common structure, and it may be hard to find software that translates the code into the format you want. Content models such as DocBook help with this. DocBook is a collection of standards and tools for technical publishing. Originally created by software companies as a standard for computer documentation, it is now used for other kinds of content and has been adapted for many purposes. DocBook provides a set of predefined tags that allow the user to publish documents easily in other formats such as PDF and HTML. Other content models include DITA (Darwin Information Typing Architecture), S1000D, XBRL and more.

To Conclude

All in all, XML may seem a bit complex at first, but it is used in many different areas because of its simplicity. It chooses brains over beauty, focusing on the logical information rather than on how it is formatted. That is why it is used in many different professions and industries, including finance, publishing, medicine, science and many more.
