File Formats and ConversionQA Functionality
What is ConversionQA?
ConversionQA is a new tool from DeltaXML that provides assurance that your content conversion project is successful. It enables you to compare inputs of any two XML formats, ignoring the XML structure, to determine any differences in the content that may have occurred during conversion. Differences are provided in a simple HTML report that helps you to identify where the transformation is going wrong.
File Types
ConversionQA handles any XML input (with some caveats), allowing you to check conversions between any two formats. The base input type is ‘XML’, which covers any single-file XML input, e.g., DocBook or XHTML. The initial version offers special handling for two other XML input types: Microsoft Word .docx and DITA maps. If you have other requirements for inputs that need special handling, please let us know, and we’ll do our best to accommodate them.
Single-file XML
These are treated as-is, with no special handling. Each input is parsed and then passed directly into ConversionQA, where content is extracted from the XML structure for comparison.
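To make the idea of extracting content from structure concrete, here is a minimal Python sketch. It is an illustration only, not ConversionQA's implementation: the function name `extract_words` and both sample documents are invented for this example, and the word-level tokenisation is an assumption.

```python
import xml.etree.ElementTree as ET


def extract_words(xml_text: str) -> list[str]:
    """Return the words of an XML document in document order, ignoring markup."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text node in document order. Joining with a
    # space stops element boundaries from gluing neighbouring words together.
    return " ".join(root.itertext()).split()


# The same sentence in two different (made-up) XML structures.
docbook = "<article><para>Hello <emphasis>world</emphasis> again</para></article>"
xhtml = "<html><body><p>Hello <b>world</b> again</p></body></html>"

same_content = extract_words(docbook) == extract_words(xhtml)
print(same_content)
```

Because only the text nodes are compared, the two inputs count as equal even though their element names differ. One limitation of this simple approach is that inline markup in the middle of a word would split that word in two.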
Microsoft Word .docx
XML-based Word documents are passed in their plain form (which is a zip file) to ConversionQA. The relevant parts of the zip are extracted, parsed and processed before passing into the content comparison engine.
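If you are curious what "extracting the relevant parts of the zip" can look like, the sketch below reads paragraph text from a .docx using only the Python standard library. It assumes the main document part lives at `word/document.xml`, which is the conventional location, and the helper name `docx_paragraphs` is hypothetical. ConversionQA's own processing may differ.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        # Assumption: the main part is stored at its conventional path.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


# Build a tiny stand-in archive in memory so the example is self-contained.
sample_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", sample_xml)

print(docx_paragraphs(buffer.getvalue()))
```

The stand-in archive contains only the one part needed for the demonstration, so it is not a complete Word document.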
DITA maps
The whole file structure of DITA maps must be packaged as a zip before passing to ConversionQA. As well as the zip file, you need to specify a path within the zip that points to the top-level map. Once passed to ConversionQA, the zip file is unpacked, and the map is published using the DITA-OT to a single-file DITA document. This single file is then treated in the same way as a normal single-file XML input.
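Packaging a DITA project and working out the map's path inside the zip can be scripted. The following Python sketch shows one way to do it. The helper name `package_dita` and the file names are made up for illustration, and how you then hand the zip and the path to ConversionQA is not shown here.

```python
import tempfile
import zipfile
from pathlib import Path


def package_dita(project_dir: Path, zip_path: Path, map_file: Path) -> str:
    """Zip project_dir and return the map's path inside the archive.

    zip_path should live outside project_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Store entries relative to the project root, with forward slashes.
                archive.write(path, path.relative_to(project_dir).as_posix())
    return map_file.relative_to(project_dir).as_posix()


# Demonstration with a throwaway project containing a map and one topic.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "topics").mkdir(parents=True)
    (project / "guide.ditamap").write_text("<map/>", encoding="utf-8")
    (project / "topics" / "intro.dita").write_text("<topic id='intro'/>", encoding="utf-8")

    zip_path = Path(tmp) / "guide.zip"
    map_in_zip = package_dita(project, zip_path, project / "guide.ditamap")
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())

print(map_in_zip, names)
```

Keeping the entries relative to the project root means the returned path is exactly the path within the zip that points to the top-level map.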
Functionality
The primary purpose of ConversionQA is to check that a document conversion has been successful, i.e., it hasn’t lost any content in the process. This means that changes that don’t affect the content itself are ignored. For example:
- Merging two paragraphs together
- Changing the text formatting of one or more words
- Converting a bulleted list into a numbered list
There are occasions where these changes might result in a content change as well, e.g., adding a space in between two words when merging paragraphs or adding a period at the end of each sentence when converting a bulleted list into a paragraph. These changes will be highlighted in the report.
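The distinction between structural and content changes can be illustrated with a small word-level comparison in Python. This is a simplified sketch under our own assumptions, not the algorithm ConversionQA uses: the names `words` and `content_changes` are hypothetical, and whitespace is treated as insignificant here, so an added space on its own would not be detected by this example.

```python
import difflib
import xml.etree.ElementTree as ET


def words(xml_text: str) -> list[str]:
    """Flatten an XML document to its words, ignoring all markup."""
    return " ".join(ET.fromstring(xml_text).itertext()).split()


def content_changes(before_xml: str, after_xml: str) -> list[tuple]:
    """Return (operation, old_words, new_words) for every word-level difference."""
    a, b = words(before_xml), words(after_xml)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Merging two paragraphs changes the structure but not the words.
split_paras = "<doc><p>Check the inputs.</p><p>Then run it.</p></doc>"
merged_para = "<section><para>Check the inputs. Then run it.</para></section>"
print(content_changes(split_paras, merged_para))

# Turning a list into a paragraph and adding periods does change the words.
bulleted = "<ul><li>Fast delivery</li><li>Simple setup</li></ul>"
paragraph = "<p>Fast delivery. Simple setup.</p>"
print(content_changes(bulleted, paragraph))
```

The first comparison produces no differences, while the second reports the two words that gained a period.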
Other significant changes, like added or deleted paragraphs, will also be included in the report. The changed content is included alongside an XPath showing where to find the change in the input.
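To give a feel for how changed content can be tied to a location, the Python sketch below pairs each piece of text with a positional XPath and then lists the text that appears only in the converted document. It is an assumption-laden illustration: the function `text_locations` is invented, it ignores text that follows a child element, and the real report format may look different.

```python
import xml.etree.ElementTree as ET


def text_locations(xml_text: str) -> list[tuple[str, str]]:
    """Pair each element's leading text with a positional XPath to that element."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, path):
        text = (element.text or "").strip()
        if text:
            found.append((path, text))
        counts = {}
        for child in element:
            # Number siblings per tag name so the path is unambiguous.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}")
    return found


before = "<doc><p>One</p><p>Two</p></doc>"
after = "<doc><p>One</p><p>Extra</p><p>Two</p></doc>"

# Report paragraphs whose text appears only in the converted document.
before_texts = {text for _, text in text_locations(before)}
added = [(path, text) for path, text in text_locations(after) if text not in before_texts]
print(added)
```

Here the added paragraph is reported together with the path `/doc/p[2]`, which tells you where to look in the input.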
Running ConversionQA
We’ve designed ConversionQA to fit into the larger content conversion workflow. This means that you can build it into your applications either through a Java API or by hosting it as a REST service, which you can call from any programming language. Each call returns a Boolean indicating whether the content conversion was successful, and where there are differences, the HTML report is available.
If you need to check content after conversion, get started with a ConversionQA trial today.