Introducing Character By Character Comparison

April 2024 – We are excited to announce the latest addition to our document comparison solutions: character by character comparison.

When even the smallest change can have a ripple effect across documentation, systems and processes, it’s important that differences are found quickly and efficiently. Whether it’s a minor typo correction, a nuanced capitalisation adjustment, or a single-character update in critical data fields like product codes, these changes can hold immense significance.

Character by character comparison is now available for our XML Compare, DITA Compare and DocBook Compare products, providing a level of granularity that lets users detect even the smallest differences within their documents. The feature is designed to pinpoint these subtle alterations, giving users the clarity and insight they need into their data and documents.

Benefits of character by character comparison

  • Identifying and rectifying typos is simplified as even the smallest changes are highlighted.
  • Users can pinpoint changes in capitalisation, ensuring consistency and adherence to style guidelines.
  • Identifying single-character updates, such as changes in product codes or serial numbers, becomes more straightforward.
  • A deeper level of granularity is provided, facilitating more accurate comparisons and insights into textual changes.
  • The configurability of character by character comparison allows users to tailor the analysis to different parts of the document, optimising accuracy and efficiency.

Why is character by character comparison so great?

Character by character analysis provides users with a more precise method for analysing text and data.

Enhanced precision and efficiency

Word by word comparison has long been a favourite feature of our XML Comparison solutions. However, relying solely on word-level comparison can miss subtle alterations within words. With character by character comparison, users can now zoom in on individual characters for a much closer look at textual modifications. This saves both time and valuable resources, especially when working under tight deadlines. Take, for instance, legal contracts, where even the slightest amendments hold significant legal weight; character-level analysis ensures that no detail escapes scrutiny.

Likewise, in technical documents, consistent capitalisation of specific terms is often required for standardisation and clarity. By leveraging character-level analysis, users can swiftly identify and rectify any differences, ensuring uniformity across the document. In industries like manufacturing, where product tracking is required, character-level analysis becomes a necessity. Consider changes to product codes: even minor alterations can disrupt inventory management and traceability. By integrating character-level analysis into the XML Comparison process, such changes are promptly detected, ensuring products can be tracked reliably across the entire supply chain.

Customisable analysis for varied document sections

One of the standout features of DeltaXML’s character by character comparison is its flexibility. By adding a control attribute to the input, for example through XSLT, users can switch this feature on or off for different parts of their documents.

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:character-by-character="false">some input</root>

This level of customisation lets users tailor their analysis to their own requirements. For example, document headings or product codes may require character-level scrutiny to ensure precision, whereas word-level comparison may suffice for the body of the document or product descriptions. In a software development setting, code documentation often needs character-level analysis to track changes precisely, whereas user manuals or instructional texts may be better served by word-level comparison to preserve readability. With this new update, users can adapt their comparison to meet the distinct demands of each document section.
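As a minimal sketch of the per-section control described above, the input below switches character-level comparison off at the root and back on for a single element. The attribute spelling and namespace are taken from the example earlier in this post; the element names and values are hypothetical, and the sketch assumes that an element without the attribute follows the setting of its nearest ancestor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Element names (product, code, description) are hypothetical. -->
<product xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
         deltaxml:character-by-character="false">
  <!-- Character-level comparison switched on for the product code only. -->
  <code deltaxml:character-by-character="true">AB-10432-X</code>
  <!-- No attribute here: this sketch assumes the root setting applies. -->
  <description>Stainless steel bracket, 40 mm</description>
</product>
```

With input shaped like this, a single-character change in the code element would be reported at character level, while edits to the description would still be reported word by word.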

Context-sensitive output

While character-level analysis offers great detail, it can sometimes result in cluttered output, especially with large text blocks. DeltaXML has addressed this challenge by implementing context-sensitive output. By applying thresholding techniques, the output avoids the fragmented display caused by incidental character matches, improving overall readability without compromising on detail.

For example, let’s imagine we are comparing two sentences:

  1. “The quick brown fox jumps over the lazy dogs.”
  2. “The dirty brown fox jumps over the lazy dogs.”

Without thresholding, the shared letter ‘i’ in the words “quick” and “dirty” could lead to a confusing output (deleted text is shown as [-text-] and added text as {+text+}):

The [-qu-]{+d+}i[-ck-]{+rty+} brown fox jumps over the lazy dogs.

However, with the intelligent application of thresholding behaviour, DeltaXML separates out the differences for easy readability:

The [-quick-]{+dirty+} brown fox jumps over the lazy dogs.

Try out character by character today

Customers of XML Compare, DITA Compare and DocBook Compare already have access to this feature. Update to the latest version and, in the API configuration settings, set the setCharacterByCharacterEnabled parameter to true.

To use character by character comparison in certain sections rather than across the whole document, make sure the parameter is set to true in the API settings. Then add a characterByCharacter attribute set to false on the root node, and a characterByCharacter attribute set to true on each element where you would like it to apply. And as always, you can use the power of XSLT pipelines to create your own rules for when characterByCharacter should be applied automatically.
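A rule of that kind could be sketched as a small XSLT stylesheet that marks up the input before comparison. This is an illustrative assumption rather than a stylesheet shipped with the products: the productCode element name is hypothetical, the attribute spelling and namespace follow the XML example earlier in this post, and how the stylesheet is wired into a pipeline is product-specific and not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy everything through unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Root element: switch character-level comparison off by default. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:character-by-character">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical rule: switch it back on for every productCode element. -->
  <xsl:template match="productCode">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:character-by-character">true</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Existing attributes are copied first and the control attribute is written afterwards, so the rule takes precedence over any value already present on the element.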

Our documentation provides you with everything you need to get up to date with this new feature.

For those new to DeltaXML, you can trial our products today, take advantage of our free samples and have a play with finding those small, but significant, differences in no time.

For any questions, or if you’d prefer a demo, don’t hesitate to get in touch.

We’d love to hear your feedback on this feature or any ideas you may have for future improvements, so please share your thoughts in the comments section below. Your input is really important in helping us make our solutions even better for you. Thank you for your continued support and collaboration. To make sure you never miss a new feature, sign up to our newsletter.
