XML: The What, Why, Where and Hows Explained by a Newbie
If you’re looking for an in-depth, super-technical, ultra-descriptive piece about what XML is and what it does then (if I were you) I’d probably keep searching. However, if (like me) you’ve just found out that XML is not just a representation of someone who doesn’t know their alphabet, then let me enlighten you on the very basics.
Firstly, XML stands for eXtensible Markup Language. The key word there is ‘Extensible’. This means it is a markup language that can be expanded or added to by its users and still be usable by the application that is reading it. Okay, so I’m getting ahead of myself. I should explain that XML doesn’t actually do anything. It is just information wrapped in tags. Software is needed to store, display, send and receive it.
So you’re probably asking, “If it doesn’t do anything, then what’s the point of it?” Well, let me explain. One of XML’s main features is that it is self-descriptive: the tags name the data they contain, so it can be read by both humans and machines. Many systems today hold data in incompatible formats, so large amounts of data need converting between them, which is time-consuming and often loses some data along the way. Because XML stores data as plain text, new, old and upgraded systems can all read the same information, which makes exchanging it quicker and far less likely to lose anything.
The Components of XML
So what does this magical XML look like? Below is an example of some simple XML code:
<email>
<date> 26/07/2012 </date>
<time> 17:07 </time>
<from> Janet </from>
<to> James </to>
<subject> Just saying hi </subject>
<body> Hi James, had a lovely chat with you today. You must come over soon. </body>
</email>
From the example above, it is easy to see that this is an email to James from Janet, sent on 26th July 2012 at 5:07pm. The subject and body of the email are also included. As you can see, the code is made up of different parts: each piece of information sits between an opening tag such as <date> and a matching closing tag such as </date>, and together they form an element.
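Since XML needs software to do anything with it, here is a minimal sketch of a program reading the email above. I've picked Python and its built-in xml.etree.ElementTree module purely for illustration. The function name read_email and the extra <cc> tag are my own made-up additions, not part of any standard.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """<email>
  <date> 26/07/2012 </date>
  <time> 17:07 </time>
  <from> Janet </from>
  <to> James </to>
  <subject> Just saying hi </subject>
  <body> Hi James, had a lovely chat with you today. You must come over soon. </body>
</email>"""


def read_email(xml_text):
    """Return the sender, recipient and subject of an <email> document."""
    root = ET.fromstring(xml_text)  # parse the text into a tree of elements
    return {
        "from": root.findtext("from").strip(),
        "to": root.findtext("to").strip(),
        "subject": root.findtext("subject").strip(),
    }


summary = read_email(EMAIL_XML)
print(summary)

# The 'extensible' part: add a made-up <cc> tag that read_email has never
# heard of. The function only looks for the tags it knows, so it still works.
EXTENDED_XML = EMAIL_XML.replace("</to>", "</to>\n  <cc> Jill </cc>")
print(read_email(EXTENDED_XML) == summary)
```

The second print shows the same result for the extended email, which is the idea behind 'extensible': new tags can be added without breaking a program that only reads the old ones.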
Like HTML, XML can also use attributes (see below).
<person gender="female">
<name> Janet Smith </name>
<age> 43 </age>
</person>
Hopefully, you could work out that the text above describes Janet Smith and gives information about her gender and age. Here, however, the gender is shown as an attribute. Attributes are designed to contain data related to a specific element. Attribute values must always be quoted, with either double or single quotes.
<person gender="female">
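To a program, an attribute looks slightly different from a child element. Here is a small sketch, again assuming Python's built-in xml.etree.ElementTree module (my choice for illustration). The variable names are my own.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<person gender="female">
  <name> Janet Smith </name>
  <age> 43 </age>
</person>"""

person = ET.fromstring(PERSON_XML)

# Attributes are read from the element itself...
gender = person.get("gender")

# ...while child elements are looked up by their tag name.
name = person.findtext("name").strip()
age = int(person.findtext("age"))

print(name, gender, age)

# Single quotes around the attribute value are just as valid as double quotes.
single_quoted = ET.fromstring("<person gender='female'/>")
print(single_quoted.get("gender") == gender)
```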
Unlike HTML, which relies on predefined tags, XML lets the user create their own tags. However, the two markup languages often work in partnership. Put simply: XML stores and transports the data, while HTML formats and presents it.
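To make that partnership a bit more concrete, here is a rough sketch of a program that takes the data from the XML and wraps it in HTML for display. The function person_to_html and the HTML layout are entirely my own invention. Real sites would more likely use a templating tool or XSLT for this.

```python
import html
import xml.etree.ElementTree as ET

PERSON_XML = """<person gender="female">
  <name> Janet Smith </name>
  <age> 43 </age>
</person>"""


def person_to_html(xml_text):
    """Turn a <person> document into a small HTML snippet (hypothetical layout)."""
    person = ET.fromstring(xml_text)
    # html.escape keeps characters like < and & from breaking the page.
    name = html.escape(person.findtext("name").strip())
    age = html.escape(person.findtext("age").strip())
    gender = html.escape(person.get("gender", "unknown"))
    return f"<h1>{name}</h1>\n<p>Age: {age}, gender: {gender}</p>"


page = person_to_html(PERSON_XML)
print(page)
```

The XML says nothing about headings or paragraphs. That decision lives in the program that presents it.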
The Rules of XML
Like all things, XML has rules. These are known as the syntax rules. Let’s go through three of the most common:
1. All XML documents must contain exactly one root element, which is the parent of all the other elements.
<root>
<child>
<subchild>...</subchild>
</child>
</root>
2. All XML elements must have a closing tag.
<yes> Am I doing it right </yes>
<no> Am I doing it right
3. XML tags are case sensitive. So the tag <email> is different to <Email> and these are both different to <EMAIL>. Opening and closing tags must be written in the same case.
<yes> Am I doing it right </yes>
<Yes> Am I doing it right </Yes>
<no> Am I doing it right </No>
Of course there are many more, but like the above they are all pretty simple.
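You don't have to take my word for the three rules above, because an XML parser will enforce them for you. Here is a quick sketch, assuming Python's built-in xml.etree.ElementTree module, that feeds it the right and wrong examples from the list. The helper name is_well_formed and the labels are my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "rule 1, one root element": "<root><child><subchild>...</subchild></child></root>",
    "rule 1, two root elements": "<yes>one</yes><yes>two</yes>",
    "rule 2, closing tag present": "<yes> Am I doing it right </yes>",
    "rule 2, closing tag missing": "<no> Am I doing it right",
    "rule 3, case matches": "<Yes> Am I doing it right </Yes>",
    "rule 3, case mismatch": "<no> Am I doing it right </No>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```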
XML Schemas
However, as mentioned, XML’s tags and elements are made up by the user. Because of this, documents can lack a shared structure, and it may be hard to find software that translates the code into the format one wants. Content models such as DocBook help with this. DocBook is a collection of standards and tools for technical publishing. It was originally created by software companies as a standard for computer documentation, but it can now be used for other kinds of content and has been adapted for many purposes. DocBook provides a set of predefined tags that allow the user to easily publish documents in other formats such as PDF and HTML. Other content models include DITA (Darwin Information Typing Architecture), S1000D, XBRL and more.
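To get a feel for what a content model buys you, here is a toy sketch in Python. It is emphatically not how DocBook or any real validator works. Real content models are written in dedicated schema languages and checked by proper validation tools. The ALLOWED_CHILDREN table and the unexpected_tags function are hypothetical names I've made up, and the 'model' simply lists which tags may appear inside an <email>.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy content model: which child tags each parent may contain.
ALLOWED_CHILDREN = {
    "email": {"date", "time", "from", "to", "subject", "body"},
}


def unexpected_tags(xml_text, allowed=ALLOWED_CHILDREN):
    """List every child element that the toy model does not permit."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():  # visit the root and every element below it
        permitted = allowed.get(parent.tag, set())
        for child in parent:
            if child.tag not in permitted:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems


good = unexpected_tags("<email><to>James</to><subject>Hi</subject></email>")
bad = unexpected_tags("<email><to>James</to><mood>happy</mood></email>")
print(good)
print(bad)
```

With an agreed list like this, software knows in advance which tags to expect, and that is what lets tools built around a content model turn documents into other formats reliably.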
To Conclude
All in all, XML may seem a bit complex at first, but it is used in many different areas because of its simplicity. It chooses brains over beauty, focusing on the logical information rather than how it is formatted. That is why it is used in so many professions and industries, including finance, publishing, medicine, science and many more.