Comparing XQuery with DeltaXML Core
Typically, the DeltaXML Core product is used to compare XML content, but have you ever considered using the Core product to compare non-XML documents? In this blog post, I experiment with using Core to compare (non-XML) XQuery code.
The objective here is to use a DXP Pipeline Configuration file to define a pipeline for the Core comparator. Two simple XSLT filters are used: one for input, the other for output. The input filter converts XQuery to XML with one element for each token; it relies on an imported XQuery tokenizer function from the open source XMLSpectrum project (for which I’m currently the sole contributor).
A high-level view of the XQuery pipeline I developed is shown below:
A view of the configured pipeline
Running the XQuery comparison
For this experiment I’m using the Java version of Core and invoking the comparison from an Ant build file. Within this, the run target invokes the DeltaXML command.jar with 5 command-line arguments:
- compare: the Core method to invoke
- xquery: the id attribute of the Pipeline Configuration file
- input-file1.xml: input XML file 1, which holds the URI of the 1st XQuery file to compare
- input-file2.xml: input XML file 2, which holds the URI of the 2nd XQuery file to compare
- result.html: the destination file
<project name="compare-xquery" default="run" basedir=".">
<target name="run">
<java jar="../../command.jar" fork="yes" failonerror="yes">
<arg value="compare"/>
<arg value="xquery"/>
<arg value="input-file1.xml"/>
<arg value="input-file2.xml"/>
<arg value="result.html"/>
</java>
</target>
<target name="clean">
<delete>
<fileset dir="." includes="result.html"/>
</delete>
</target>
</project>
The Ant build file: build.xml
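Each of the two input files named in the Ant arguments is only a small XML wrapper around the location of an XQuery file. A minimal sketch of what input-file1.xml might contain, based on the comment in the input filter shown later in this post (the file name xqdoc-display1.xqy comes from that comment; any other path or URI would work the same way):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of input-file1.xml: a single text-file element whose text
     is the path or URI of the first XQuery file to compare. -->
<text-file>xqdoc-display1.xqy</text-file>
```

The second input file, input-file2.xml, would have the same shape and point at the modified XQuery file.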
The Comparison Result
Before looking at how the comparison pipeline is defined, let’s first have a look at the result of an XQuery comparison performed on 2 small test files. Each file defines the same XQuery function, but three minor changes were made to the second file. The HTML output from the comparison of these files is rendered below:
declare function display:print-modules($local as xs:boolean) as element()+ {
(
<div class="homehomeyyy">
{
if (fn:exists(fn:collection($display:XQDOC_COLLECTION)/xq:xqdocxq:xqdocxq:/modulexq:module[@type="library"])) then
(
<h4>Library ModulesLibrary Module</h4>,
<br/>,
<br/>,
for $x in fn:collection($display:XQDOC_COLLECTION)/xq:xqdoc[xq:module/@type="library"]
order by $x/xq:module/xq:uri
return
(
display:build-link("get-module",
$local,
(fn:string($x/xq:module/xq:uri)),
display:decode-uri(fn:string($x/xq:module/xq:uri))
),
<br/>
)
)
else
()
}
</div>
)
};
Syntax highlighted result with differences
The output, as shown above, is HTML that renders a syntax-highlighted version of the result of comparing the two XQuery files, with the background color indicating changes – deletions are in red and additions in green. A couple of things can be observed from this: 1) the granularity for the changes is at the ‘token’ level, and 2) the tokens are syntax-highlighted as they would have been in the two input XQuery files.
CSS used to style the HTML is also generated by the output filter. The HTML produced has class attributes that allow the CSS to render the background and foreground colors as required. To illustrate this, here’s a small part of the rendered HTML:
/xq:xqdocxq:xqdocxq:/modulexq:module
Extracted part of the HTML output
And here is the HTML code used to render the above:
<pre>
<span class="step">/</span>
<span class="partA qname">xq:xqdoc</span>
<span class="partB qname">xq:xqdocxq:</span>
<span class="partA step">/</span>
<span class="partB qname">module</span>
<span class="partA qname">xq:module</span>
</pre>
HTML code with class attributes used for CSS styling
Note: The only change in the XQuery for this extract was the deletion of the ‘step’ operator. This change made the XQuery invalid because we’re left with an invalid QName, ‘xq:xqdocxq:module’. Unsurprisingly, XMLSpectrum doesn’t do too well tokenizing XQuery that won’t compile, hence the unexpected output where the invalid QName is split into two.
Every span element in the HTML source represents an XQuery token. Each span element has a class attribute that holds up to 2 space-separated values:
- Token Type – always present; the type of XQuery token. For example, step denotes an XQuery step operator.
- Part Identifier – possible values are partA or partB, indicating the A or B origin of the token; only present when no match is found for the token in the other file.
Now we’ve previewed the output, it’s time to look at the pipeline configuration and filters used to produce it:
The Pipeline Configuration
The Pipeline Configuration file, referenced in the Ant file using its ‘xquery’ id attribute, is used to declare the input and output filters for the comparison. In this case there is just one input filter and one output filter:
<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">
<comparatorPipeline description="compare xquery" id="xquery">
<inputFilters>
<filter>
<file path="xquery2xml.xsl" relBase="dxp"/>
</filter>
</inputFilters>
<outputFilters>
<filter>
<file path="xquery-tokens2html.xsl" relBase="dxp"/>
</filter>
</outputFilters>
<outputProperties>
<property name="indent" literalValue="no"/>
</outputProperties>
<comparatorFeatures>
<feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
<feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
</comparatorFeatures>
</comparatorPipeline>
The Core Pipeline Configuration file: compare-xquery.xml
The Input Filter
The Input Filter first fetches the XQuery file content as a string by invoking the unparsed-text XPath function, using the URI contained within the input XML file. The resulting string is then passed as an argument to the xqf:show-xquery function (imported from XMLSpectrum), which returns a sequence of span elements. These are wrapped in a pre element, which is output as the principal result of the filter.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xqf="urn:xq.internal-function"
xmlns:f="internal"
version="2.0"
exclude-result-prefixes="f xs xqf">
<xsl:import href="xmlspectrum-xsl/xq-spectrum.xsl"/>
<!-- Input XML is a single 'text-file' element containing the file URI: eg.
<text-file>
xqdoc-display1.xqy
</text-file>
-->
<xsl:template match="/">
<xsl:variable name="text-file-uri" select="f:path-to-uri(normalize-space(text-file))"/>
<xsl:message>xquery2xml transform on: <xsl:value-of select="$text-file-uri"/></xsl:message>
<xsl:variable name="file-content" as="xs:string" select="unparsed-text($text-file-uri)"/>
<xsl:variable name="tokens" as="element()*" select="xqf:show-xquery($file-content)"/>
<pre xmlns="http://www.w3.org/1999/xhtml">
<xsl:sequence select="$tokens"/>
</pre>
</xsl:template>
<xsl:function name="f:path-to-uri">
<xsl:param name="path"/>
<xsl:choose>
<xsl:when test="matches($path, '^[A-Za-z]:.*')">
<xsl:value-of select="concat('file:/', $path)"/>
</xsl:when>
<xsl:when test="starts-with($path, '/')">
<xsl:value-of select="concat('file://', $path)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$path"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>
The Input Filter: xquery2xml.xsl
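To make the filter’s output more concrete, here is a hand-written sketch of what it might emit for the short XQuery path fragment ‘/xq:xqdoc/xq:module’. The class values step and qname are the ones that appear in the HTML extract earlier in this post; the exact tokens and class names produced by XMLSpectrum for a real file may differ, so treat this as an illustration rather than captured output.

```xml
<!-- Illustrative only: one span per XQuery token, wrapped in an XHTML pre.
     Spans are kept adjacent because whitespace inside pre is significant. -->
<pre xmlns="http://www.w3.org/1999/xhtml"><span class="step">/</span><span class="qname">xq:xqdoc</span><span class="step">/</span><span class="qname">xq:module</span></pre>
```

Because every token is its own element, Core can align the two inputs token by token instead of line by line.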
The Output Filter
The output from Core conforms to the deltaV2 format. From the point of view of our XSLT filter, the most significant part of this is the deltaxml:deltaV2 attribute, which indicates the origin of the content within the associated element. The filter must also take into account additional elements in the deltaxml namespaces that represent changes within a single element and changes to attributes. The XSLT for handling the deltaV2 format is relatively straightforward, requiring just a handful of short templates.
The only remaining goal for the XSLT is to enclose the pre element within HTML content to make a valid HTML document, and to generate the CSS file (linked to from the HTML) by calling f:get-css, which uses the color-theme parameter to generate the appropriate CSS for rendering.
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
xmlns:f="internal"
xmlns:dxx="http://www.deltaxml.com/ns/xml-namespaced-attribute"
exclude-result-prefixes="xs deltaxml dxa dxx f">
<xsl:import href="xmlspectrum-xsl/highlight-file.xsl"/>
<xsl:output method="html"/>
<xsl:param name="title" select="'HTML Result'"/>
<xsl:param name="color-theme" select="'pg-light'"/>
<xsl:variable name="css-name" select="'theme.css'"/>
<xsl:template match="/">
<html>
<head>
<title>
<xsl:value-of select="$title"/>
</title>
<!-- for dark background style:
<style type="text/css">
span.partA {background-color:#501010} span.partB {background-color:#105010}
</style>
-->
<style type="text/css">
span.partA {background-color:#ffdada} span.partB {background-color:#daffda}
</style>
<link rel="stylesheet" type="text/css" href="{$css-name}"/>
<xsl:if test="$font-name eq 'scp' and $css-inline eq 'yes'">
<style>
@import url(https://fonts.googleapis.com/css?family=Source+Code+Pro);
</style>
</xsl:if>
</head>
<body>
<div>
<pre class="spectrum">
<xsl:apply-templates select="xhtml:pre/*"/>
</pre>
</div>
</body>
</html>
<xsl:result-document href="{$css-name}" method="text" indent="no">
<xsl:sequence select="f:get-css()"/>
</xsl:result-document>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="xhtml:span[not(contains(@deltaxml:deltaV2, '!='))]">
<span>
<xsl:apply-templates select="@* | node()"/>
</span>
</xsl:template>
<xsl:template match="xhtml:span[@deltaxml:deltaV2 = ('A','B')]">
<xsl:variable name="part" select="if (@deltaxml:deltaV2 eq 'A') then 'partA' else 'partB'"/>
<span class="{$part, @class}">
<xsl:value-of select="."/>
</span>
</xsl:template>
<xsl:template match="xhtml:span[contains(@deltaxml:deltaV2, '!=')]">
<xsl:apply-templates select="@* | node()" mode="not-equal"/>
</xsl:template>
<xsl:template match="@deltaxml:deltaV2"/>
<xsl:template match="@deltaxml:deltaV2" mode="not-equal"/>
<xsl:template match="@class" mode="not-equal"/>
<xsl:template match="deltaxml:attributes" mode="#default not-equal"/>
<xsl:template match="deltaxml:textGroup" mode="not-equal">
<xsl:apply-templates mode="group"/>
</xsl:template>
<xsl:template match="deltaxml:text" mode="group">
<xsl:variable name="part" select="if (@deltaxml:deltaV2 eq 'A') then 'partA' else 'partB'"/>
<xsl:variable name="class" select="f:get-class(.)"/>
<span class="{$part, $class}">
<xsl:value-of select="."/>
</span>
</xsl:template>
<xsl:template match="text()" mode="not-equal">
<span class="{../@class}">
<xsl:value-of select="."/>
</span>
</xsl:template>
<xsl:function name="f:get-class">
<xsl:param name="text-element" as="element(deltaxml:text)"/>
<xsl:variable name="class" select="$text-element/../../@class"/>
<xsl:variable name="part" select="$text-element/@deltaxml:deltaV2"/>
<xsl:value-of select="if (exists($class)) then
$class
else $text-element/../preceding-sibling::deltaxml:attributes/
dxa:class/deltaxml:attributeValue[@deltaxml:deltaV2 eq $part]"/>
</xsl:function>
</xsl:stylesheet>
The Output Filter: xquery-tokens2html.xsl
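It may help to see the kind of delta the templates above are written for. The fragment below is a hand-written sketch, not real Core output: it assumes the usual deltaV2 conventions (a deltaxml:deltaV2 value of A=B for unchanged content, A or B for content found in only one input, and A!=B for modified content) and reuses the step and qname classes from the earlier extract.

```xml
<!-- Hand-written illustration of a deltaV2 result; line breaks between spans
     are for readability only. -->
<pre xmlns="http://www.w3.org/1999/xhtml"
     xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
     deltaxml:deltaV2="A!=B">
<!-- Unchanged token: the deltaV2 attribute is dropped, the span is kept. -->
<span class="step" deltaxml:deltaV2="A=B">/</span>
<!-- Token only in file A: re-emitted with class "partA step". -->
<span class="step" deltaxml:deltaV2="A">/</span>
<!-- Token whose text differs: each deltaxml:text becomes its own span,
     classed "partA qname" or "partB qname" via f:get-class. -->
<span class="qname" deltaxml:deltaV2="A!=B"><deltaxml:textGroup deltaxml:deltaV2="A!=B"><deltaxml:text deltaxml:deltaV2="A">xq:module</deltaxml:text><deltaxml:text deltaxml:deltaV2="B">module</deltaxml:text></deltaxml:textGroup></span>
</pre>
```

The modified span is kept on one line deliberately: in not-equal mode the filter wraps every text node in a span, so stray indentation inside it would show up as extra tokens.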
Conclusion
I’ve shown here that Core can perform a ‘token by token’ comparison of XQuery and return the result as syntax-highlighted XQuery rendered with HTML and CSS. All this takes is 2 very simple XSLT filters and some simple pipeline configuration. The tokenisation of XQuery was handled separately by an XSLT function that was imported by the input filter.
This experiment does show that converting a non-XML language, in this case XQuery, into XML to achieve a more semantic/intelligent display of changes is not too difficult. These initial results show that it is worth the effort. Because the resultant differences are represented in XML it would also be possible to generate reports on changes rather than just display a red-lined document. Such reports could be useful for documentation or audit.
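As a rough idea of what such a report could look like, here is a minimal sketch of an alternative output filter that counts changed tokens instead of rendering them. It is hypothetical and not part of the pipeline described above: it assumes it receives the raw deltaV2 result (an XHTML pre element containing span tokens, as produced by the input filter) and that the deltaxml:deltaV2 attribute takes the values A, B and A!=B in the way the output filter above relies on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical report filter: a sketch, not part of the original pipeline. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Plain-text summary rather than HTML -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- All token spans in the delta -->
    <xsl:variable name="spans" select="xhtml:pre/xhtml:span"/>
    <xsl:text>Tokens only in file A: </xsl:text>
    <xsl:value-of select="count($spans[@deltaxml:deltaV2 eq 'A'])"/>
    <xsl:text>&#10;Tokens only in file B: </xsl:text>
    <xsl:value-of select="count($spans[@deltaxml:deltaV2 eq 'B'])"/>
    <xsl:text>&#10;Tokens modified in place: </xsl:text>
    <xsl:value-of select="count($spans[contains(@deltaxml:deltaV2, '!=')])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

A filter like this could be referenced from a second Pipeline Configuration file in place of xquery-tokens2html.xsl, giving a short audit summary alongside the red-lined HTML view.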
Future Enhancements
As it stands we’ve produced an XQuery code comparator that provides much better granularity than a text-based, line-by-line comparison tool. However, we could enhance functionality considerably with some fairly simple updates to the input and output filters. I hope to look at these enhancements in future blog posts, but here are some that I’ve identified:
- Ignore differences in whitespace tokens used only for XQuery formatting
- Ignore changes to the location of variable and function definitions – provided they remain in scope
- Provide more granular matching (using the ‘word-by-word’ feature) within certain token types, such as literal-text tokens