Wednesday, January 25, 2006

Some thoughts on partial xdiff

In our current project we receive data from several sources and need to compare parts of it. We store all the data as XML, but because it originates from different sources, the files may differ in structure and ordering. The elements we need to compare may also carry different attributes, of which only a subset is relevant. Two elements are therefore considered equal if a specified set of attributes or subelements is equal; they need not be exact copies of each other.

We've written a small XML diff tool to solve the structural issues. On the command line we specify which attributes are to be compared.

But what about ordering? We can't compare elements based on their position in their files, and in our case an element may have attributes that do not exist in its counterpart. So for each element type we had to define a unique identifier that we can sort by. We do this by normalizing the XML data files with an XSL sort script. The script needs to be adapted for every XML document type. The following incomplete example shows what it may look like:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
...
    >

<xsl:template match="/">
  <xsl:apply-templates select="*"/>
</xsl:template>
...
<xsl:template match="myElement">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="mySubElement">
      <xsl:sort select="@id" data-type="text"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>

<!-- copy all other elements and attributes -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
These scripts are used to transform both files into a normalized form. In the same script we can also select which attributes to copy, so attributes that are allowed to differ between the elements can simply be omitted.

Posted by jct (Jan 25 2006, 03:14:50 PM CET)
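
The partial comparison described in this entry (two elements count as equal when a chosen set of attributes matches, regardless of their position in the file) could be sketched in Python roughly as follows. This is not our actual tool: the function names `index_by_key` and `partial_diff`, the sample data, and the attribute names `value`, `source` and `importedAt` are made up for illustration. The sketch also looks elements up by their key in memory instead of sorting both files with XSL first.

```python
import xml.etree.ElementTree as ET


def index_by_key(root, tag, key_attr):
    """Map the key attribute of each <tag> element to the element itself.

    Elements without the key attribute are skipped, because they cannot
    be matched against a counterpart in the other file.
    """
    return {
        el.get(key_attr): el
        for el in root.iter(tag)
        if el.get(key_attr) is not None
    }


def partial_diff(xml_a, xml_b, tag, key_attr, compare_attrs):
    """Compare two XML strings, looking only at the attributes in compare_attrs.

    Returns a list of (key, message) tuples, ordered by key.
    """
    a = index_by_key(ET.fromstring(xml_a), tag, key_attr)
    b = index_by_key(ET.fromstring(xml_b), tag, key_attr)
    diffs = []
    for key in sorted(a.keys() | b.keys()):
        if key not in a:
            diffs.append((key, "only in second file"))
        elif key not in b:
            diffs.append((key, "only in first file"))
        else:
            for attr in compare_attrs:
                left, right = a[key].get(attr), b[key].get(attr)
                if left != right:
                    diffs.append((key, f"{attr}: {left!r} != {right!r}"))
    return diffs


# Hypothetical sample data: different ordering and different extra attributes.
FIRST = """<myElement>
  <mySubElement id="b" value="2" source="feedA"/>
  <mySubElement id="a" value="1" source="feedA"/>
</myElement>"""

SECOND = """<myElement>
  <mySubElement id="a" value="1" importedAt="2006-01-25"/>
  <mySubElement id="b" value="3"/>
  <mySubElement id="c" value="4"/>
</myElement>"""

# Only "value" is compared; "source" and "importedAt" are ignored.
DIFFS = partial_diff(FIRST, SECOND, "mySubElement", "id", ["value"])
for key, message in DIFFS:
    print(key, message)
```

Passing the list of attributes to compare plays the same role as the command line option of our tool, and the `id` key corresponds to the `xsl:sort` key in the stylesheet.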
Thursday, January 19, 2006

Hello World!

My first blog entry. In good tradition, it has to be a HelloWorld example!

Posted by jct (Jan 19 2006, 05:15:09 PM CET)