Posted 26 September 2024
by Sasha Hayden

Introducing Subtree Processing Mode for Greater Flexibility

September 2024: We’re excited to introduce Subtree Processing Mode, a major new feature that enhances how you can compare and manage content.

At DeltaXML, we’re constantly looking for ways to improve how our customers handle their document and data comparisons. This powerful addition offers more precise control, boosting performance and helping you achieve even better comparison results.

Benefits of using Subtree Processing Mode:

Processing content as either text or data based on its type can significantly speed up comparison times, especially for large files.
By distinguishing between content types, Subtree Processing Mode ensures that text-heavy sections are compared with attention to structure and style, while data-centric sections are processed more efficiently.
The ability to use XPath-based rules means you can easily customise which sections should be processed as text or data, giving you complete control over how your content is handled.

What is Subtree Processing Mode?

Simply put, Subtree Processing Mode allows you to process specific sections, or subtrees, within your content as either text or data. This distinction is key, especially when dealing with mixed content files where both narrative (text) and structured data co-exist. For example, a technical document might include written instructions alongside tables of specifications—two very different types of content, but often both present in the same file.

With Subtree Processing Mode, you can tailor how each part of your file is processed, improving both speed and accuracy.

Why This Feature is so Great

Subtree Processing Mode is especially beneficial for our existing customers who regularly deal with complex, large-scale XML documents containing a mix of content types. Previously, comparison tools could struggle to differentiate between data-centric and text-heavy sections, potentially slowing down performance or yielding unexpected results.

Now, you have the flexibility to process XML subtrees based on XPath configurations that define whether a section should be treated as text or data. The impact? Faster processing, more meaningful comparison results, and the ability to customise how you handle intricate files.
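
To make the idea of XPath-based rules more concrete, here is a minimal Python sketch. It is an illustration only and does not show XML Compare's configuration syntax or API: the rule table, the classify_subtrees helper, and the sample manual document are all hypothetical, and the XPath expressions are limited to the subset supported by Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: (XPath expression, processing mode).
# This format is an assumption made for illustration; it is not
# the configuration format used by XML Compare.
RULES = [
    ("./table", "data"),
    ("./description", "text"),
]

# Hypothetical hybrid document: narrative text next to a table.
SAMPLE = """<manual>
  <description>Fit the bracket before tightening the bolts.</description>
  <table>
    <row><cell>Torque</cell><cell>12 Nm</cell></row>
  </table>
</manual>"""


def classify_subtrees(root, rules):
    """Return (tag, mode) pairs for every element selected by a rule."""
    matches = []
    for xpath, mode in rules:
        for element in root.findall(xpath):
            matches.append((element.tag, mode))
    return matches


root = ET.fromstring(SAMPLE)
for tag, mode in classify_subtrees(root, RULES):
    print(f"<{tag}> would be processed as {mode}")
```

In this sketch the table subtree is selected for data processing and the description for text processing, which mirrors the kind of per-section decision the XPath configuration is described as making.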

A Closer Look: Data vs Text Processing

A helpful way to think about this distinction is to consider the purpose of the content:

Text: Information designed for human consumption, like content in a book, article, or a block of descriptive text. It’s free-flowing and often requires attention to detail.
Data: More structured and often repetitive, data is typically used to store information. Think contact lists, specifications, or reports. These sections don’t require the same human-readable nuance as text.

By identifying which content is better treated as text or data, Subtree Processing Mode improves the accuracy and efficiency of the comparison, especially in large, complex XML files.

A simple example to visualise the difference

[Figure: Text Content Processing]

[Figure: Data Content Processing]

Use Case: Hybrid Content Processing

One of the most exciting aspects of Subtree Processing Mode is its ability to handle hybrid content effectively. Customers sometimes faced obstacles in achieving optimal comparisons when working with documents that contained both narrative and structured data. For instance, a product manual that combines detailed descriptions with technical tables could produce confusing results when treated uniformly.

With this new mode, you can define sections like the tables as data, allowing XML Compare to treat them accordingly, while still processing the descriptive text with the detail it needs. This separation leads to more insightful, accurate comparisons and a more streamlined workflow.

Example

In the following example, both A and B contain the same items, but in B, the items for Daniel and Edith are swapped.

When using Subtree Processing Mode as Data, the item for Daniel is matched with the item for Edith because they share common PCData nodes in the same order (Edith, Fred, 1960, 1980):

Item for Daniel: Daniel, Edith, Fred, 1940, 1960, 1980
Item for Edith: Edith, Fred, Adam, 1960, 1980, 1990

Document A:
<?xml version="1.0" encoding="UTF-8"?>
<container>
    <item >
        <name>Adam</name>
        <spouse>Beth</spouse>
        <children>
            <child>Chris</child>
        </children>
        <date_of_birth>1950</date_of_birth>
        <date_of_marriage>1970</date_of_marriage>
        <member_since>1970</member_since>
    </item>
    <item >
        <name>Daniel</name>
        <spouse>Edith</spouse>
        <children>
            <child>Fred</child>
        </children>
        <date_of_birth>1940</date_of_birth>
        <date_of_marriage>1960</date_of_marriage>
        <member_since>1980</member_since>
    </item>
    <item >
        <name>Edith</name>
        <spouse>Fred</spouse>
        <children>
            <child>Adam</child>
        </children>
        <date_of_birth>1960</date_of_birth>
        <date_of_marriage>1980</date_of_marriage>
        <member_since>1990</member_since>
    </item>
    <item >
        <name>Ottie</name>
        <spouse>Phil</spouse>
        <children>
            <child>Rich</child>
        </children>
        <date_of_birth>1990</date_of_birth>
        <date_of_marriage>2010</date_of_marriage>
        <member_since>2011</member_since>
    </item>
</container>

Document B:
<?xml version="1.0" encoding="UTF-8"?>
<container>
    <item >
        <name>Adam</name>
        <spouse>Beth</spouse>
        <children>
            <child>Chris</child>
        </children>
        <date_of_birth>1950</date_of_birth>
        <date_of_marriage>1970</date_of_marriage>
        <member_since>1970</member_since>
    </item>
    <item >
        <name>Edith</name>
        <spouse>Fred</spouse>
        <children>
            <child>Adam</child>
        </children>
        <date_of_birth>1960</date_of_birth>
        <date_of_marriage>1980</date_of_marriage>
        <member_since>1990</member_since>
    </item>
    <item >
        <name>Daniel</name>
        <spouse>Edith</spouse>
        <children>
            <child>Fred</child>
        </children>
        <date_of_birth>1940</date_of_birth>
        <date_of_marriage>1960</date_of_marriage>
        <member_since>1980</member_since>
    </item>
    <item >
        <name>Ottie</name>
        <spouse>Phil</spouse>
        <children>
            <child>Rich</child>
        </children>
        <date_of_birth>1990</date_of_birth>
        <date_of_marriage>2010</date_of_marriage>
        <member_since>2011</member_since>
    </item>
</container>
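
The statement that the items for Daniel and Edith share PCData nodes in the same order can be checked with a short Python sketch. It computes the longest common subsequence of the two lists of text values quoted in the example. This is only an illustration of that overlap, using a textbook dynamic-programming approach; it is not a description of how XML Compare matches elements internally, and the function name is our own.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two lists."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover the subsequence itself.
    result = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Text values of the two items, in document order, as listed in the example.
daniel_item = ["Daniel", "Edith", "Fred", "1940", "1960", "1980"]
edith_item = ["Edith", "Fred", "Adam", "1960", "1980", "1990"]

shared = longest_common_subsequence(daniel_item, edith_item)
print(shared)  # ['Edith', 'Fred', '1960', '1980']
```

Four of the six values in each item appear in both, in the same order, which is why two different records can look alike to a comparison that focuses on the sequence of text values.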

By using Subtree Processing Mode, you can specify that items like these should be treated as data, while narrative sections are processed as text. This ensures that XML Compare delivers the most relevant differences, saving time and producing more accurate results.

[Figure: Using Text Content Processing]

[Figure: Using Data Content Processing]

Try It Today!

Our Subtree Processing Mode is now available, and we encourage all our customers to explore its benefits. Whether you’re managing complex technical documents, legal contracts, or large datasets, this feature will help you fine-tune your comparisons, improve performance, and ultimately make your workflow more efficient.

To make the most of Subtree Processing Mode, we’ve provided sample files to help you see the feature in action. Don’t hesitate to reach out to our support team if you have any questions or need assistance in configuring your pipeline for this exciting new capability.

We’d love to hear your feedback on this feature or any ideas you may have for future improvements, so please share your thoughts in the comments section below. Your input is important in helping us make our solutions even better for you. Thank you for your continued support and collaboration. To make sure you never miss a new feature, sign up to our newsletter.
