Introducing Character By Character Comparison
April 2024 – We are excited to announce the latest feature to our document comparison solutions: character by character comparison.
When even the smallest change can have a ripple effect across documentation, systems and processes, it’s important that differences are found quickly and efficiently. Whether it’s a minor typo correction, a nuanced capitalisation adjustment, or a single-character update in critical data fields like product codes, these changes can hold immense significance.
Character by character comparison is now available for our XML Compare, DITA Compare and DocBook Compare products, providing a level of granularity that lets users detect even the smallest differences within their documents. The feature is designed to pinpoint these subtle alterations, giving users clearer insight into their data and documents.
Why is character by character comparison so great?
Character by character analysis provides users with a more precise method for analysing text and data.
Enhanced precision and efficiency
Word by word comparison has long been a favourite feature of our XML Comparison solutions. However, relying solely on word-level comparison can miss subtle alterations within words. With character by character comparison, users can now zoom in on individual characters for a much closer look at textual modifications. This saves both time and resources, especially when working under tight deadlines. Take, for instance, legal contracts, where even the slightest amendments hold significant legal weight; character-level analysis ensures that no detail escapes scrutiny.
Likewise, in technical documents, consistent capitalisation of specific terms is often required for standardisation and clarity. With character-level analysis, users can swiftly identify and correct any differences, ensuring uniformity across the document. In industries like manufacturing, where product tracking is required, character-level analysis becomes a necessity. Consider changes to product codes: even minor alterations can disrupt inventory management and traceability. By integrating character-level analysis into the XML Comparison process, such changes are promptly detected, so products can be tracked reliably across the entire supply chain.
Customisable analysis for varied document sections
One of the standout features of DeltaXML’s character by character comparison is its flexibility. By adding a control attribute to the input, either by hand or through an XSLT step, users can toggle this feature on or off for different parts of their documents. For example:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:character-by-character="false">some input</root>
This level of customisation lets users tailor their analysis to their own requirements. For example, document headings or product codes may require character-level scrutiny to ensure precision, whereas word-level comparison may be sufficient for the body of the document or for product descriptions. In a software development setting, code documentation often needs character-level analysis to track changes precisely, whereas user manuals or instructional texts may be better served by word-level comparison, which keeps the output readable. With this update, users can adapt their comparison to meet the distinct demands of each document section.
Context-sensitive output
While character-level analysis offers great detail, it can sometimes result in cluttered output, especially with large text blocks. DeltaXML has addressed this challenge by implementing context-sensitive output. By applying thresholding, the comparison avoids the fragmented display that results when otherwise unrelated words happen to share a few characters, improving readability without compromising on detail.
For example, let’s imagine we are comparing two sentences:
- “The quick brown fox jumps over the lazy dogs.”
- “The dirty brown fox jumps over the lazy dogs.”
Without thresholding, the shared letter ‘i’ between the words “quick” and “dirty” could lead to a confusing output:
However, with the intelligent application of thresholding behaviour, DeltaXML separates out the differences for easy readability:
Try out character by character today
Customers of XML Compare, DITA Compare and DocBook Compare already have access to this feature. Update to the latest version and, within the API configuration settings, set the setCharacterByCharacterEnabled parameter to true.
If you want to use character by character comparison within certain sections but not the whole document, make sure the parameter is set to true in the API settings, add a characterByCharacter attribute set to false on the root node, and then add a characterByCharacter attribute set to true on each element where you would like character-level comparison to apply. And as always, you can create your own rules for when characterByCharacter should be applied automatically, using the power of XSLT pipelines.
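As a rough illustration of the per-section setup described above, the input might look something like the sketch below. It assumes the control attribute takes the namespaced form used in the earlier sample (deltaxml:character-by-character) and that a value set on an element also applies to its descendants unless overridden; neither assumption is confirmed here, so check the documentation for the exact attribute name and scoping rules. The element names and content are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: element names and values are illustrative only. -->
<!-- Assumption: the attribute name and namespace match the earlier sample. -->
<catalogue xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
           deltaxml:character-by-character="false">
  <product>
    <!-- Character-level comparison switched on for the product code only. -->
    <code deltaxml:character-by-character="true">AB-10482-X</code>
    <!-- No attribute here, so this is assumed to follow the root setting
         (false) and be compared word by word. -->
    <description>Stainless steel hinge, 40 mm</description>
  </product>
</catalogue>
```

In a setup like this, a one-character change in the code element would be reported at character level, while edits to the description would still be reported as whole-word changes.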
Our documentation provides you with everything you need to get up to date with this new feature.
For those new to DeltaXML, you can trial our products today and use our free samples to start finding those small but significant differences in no time.
For any questions, or if you’d prefer a demo, don’t hesitate to get in touch.
We’d love to hear your feedback on this feature or any ideas you may have for future improvements, so please share your thoughts in the comments section below. Your input is important in helping us make our solutions even better for you. Thank you for your continued support and collaboration. To make sure you never miss a new feature, sign up to our newsletter.