Ignoring Duplicate Elements

Problem

You want to select all nodes that are unique within a given context, according to some uniqueness criterion.

Solution

Selecting unique nodes is a common application of the preceding and preceding-sibling axes. If the elements you select are not all siblings, then use preceding. The following code produces a unique list of products from SalesBySalesperson.xml:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
<products>
     <xsl:for-each select="//product[not(@sku=preceding::product/@sku)]">
          <xsl:copy-of select="."/>
     </xsl:for-each>
</products>
</xsl:template>     
   
</xsl:stylesheet>
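
If the elements are all siblings, use preceding-sibling instead. For example, every product in the following document is a child of the same products element, and duplicates happen to be adjacent:
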
<products>
     <product sku="10000" totalSales="10000.00"/>
     <product sku="10000" totalSales="990000.00"/>
     <product sku="10000" totalSales="1110000.00"/>
     <product sku="20000" totalSales="50000.00"/>
     <product sku="20000" totalSales="150000.00"/>
     <product sku="20000" totalSales="150000.00"/>
     <product sku="25000" totalSales="920000.00"/>
     <product sku="25000" totalSales="2920000.00"/>
     <product sku="30000" totalSales="5500.00"/>
     <product sku="30000" totalSales="115500.00"/>
     <product sku="70000" totalSales="10000.00"/>
</products>

The following stylesheet selects the unique products from this document:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
<products>
     <xsl:for-each select="/products/product[not(@sku=preceding-sibling::product/@sku)]">
          <xsl:copy-of select="."/>
     </xsl:for-each>
</products>
</xsl:template>     
   
</xsl:stylesheet>

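The preceding axis can be inefficient because it may have to examine every node that comes before the context node in the document. To avoid it, travel up to the ancestors that are siblings, use preceding-sibling from there, and then travel back down to the nodes you want to test: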

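<!-- The first test catches a duplicate under the same parent; the second
     catches a duplicate under any of the parent's preceding siblings -->
<xsl:for-each select="//product[not(@sku=preceding-sibling::product/@sku or
                                    @sku=../preceding-sibling::*/product/@sku)]">
     <xsl:copy-of select="."/>
</xsl:for-each>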

If you are certain that the elements are sorted so that duplicate nodes are adjacent (as in the products document shown earlier), then you only have to consider the immediately preceding sibling:

<xsl:for-each 
     select="/products/product[not(@sku=preceding-sibling::product[1]/@sku)]">
     <!-- do something with each unique product -->
</xsl:for-each>

Discussion

In XSLT 2.0 (or in XSLT 1.0 with a node-set() extension function, such as EXSLT's exsl:node-set()), you can also copy the nodes into a sorted temporary tree and compare each node with the one immediately before it:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
     
<!-- Copy every product into a temporary tree, sorted so duplicates are adjacent -->
<xsl:variable name="products">
     <xsl:for-each select="//product">
          <xsl:sort select="@sku"/>
          <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:variable>
     
<products>
     <xsl:for-each select="$products/product">
          <xsl:variable name="pos" select="position()"/>
          <!-- Keep a product unless the previous product in sorted order has the same sku -->
          <xsl:if test="$pos = 1 or 
          not(@sku = $products/product[$pos - 1]/@sku)">
               <xsl:copy-of select="."/>
          </xsl:if>
     </xsl:for-each>
</products>
     
</xsl:template>

</xsl:stylesheet>
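For XSLT 1.0, the same approach could look like the following minimal sketch. It assumes a processor that implements the EXSLT common module's exsl:node-set() function (namespace http://exslt.org/common); the variable name products-rtf is hypothetical. Because the sorted copies are siblings in their own tree, only the immediately preceding sibling needs to be checked:

```xml
<!-- Sketch: XSLT 1.0 plus EXSLT exsl:node-set(); assumes processor support for EXSLT common -->
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:exsl="http://exslt.org/common"
     exclude-result-prefixes="exsl">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/">

<!-- Result tree fragment holding a copy of every product, sorted by sku -->
<xsl:variable name="products-rtf">
     <xsl:for-each select="//product">
          <xsl:sort select="@sku"/>
          <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:variable>

<!-- Convert the fragment to a node set so XPath can navigate into it -->
<xsl:variable name="products" select="exsl:node-set($products-rtf)"/>

<products>
     <!-- Duplicates are now adjacent siblings, so one look back is enough -->
     <xsl:for-each select="$products/product">
          <xsl:if test="not(@sku = preceding-sibling::product[1]/@sku)">
               <xsl:copy-of select="."/>
          </xsl:if>
     </xsl:for-each>
</products>

</xsl:template>

</xsl:stylesheet>
```

The first product has no preceding sibling, so the comparison is false and the not() keeps it without a separate position test.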

However, I have never found this technique to be faster than using the preceding axis. Its advantage appears when the duplicate test is not trivial, for example, when duplicates are determined by the concatenation of two attributes:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
     
<!-- Sort people by the combined name so duplicates are adjacent -->
<xsl:variable name="people">
     <xsl:for-each select="//person">
          <xsl:sort select="concat(@lastname,@firstname)"/>
          <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:variable>
     
<people>
     <xsl:for-each select="$people/person">
          <xsl:variable name="pos" select="position()"/>
          <!-- Compare the combined name with that of the previous person in sorted order -->
          <xsl:if test="$pos = 1 or 
               concat(@lastname,@firstname) != 
                          concat($people/person[$pos - 1]/@lastname,
                                 $people/person[$pos - 1]/@firstname)">
               <xsl:copy-of select="."/>
          </xsl:if>
     </xsl:for-each>
</people>
     
</xsl:template>

</xsl:stylesheet>
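If XSLT 2.0 is available, xsl:for-each-group can replace the sort-and-compare pattern altogether. The following is a sketch assuming the same person elements with lastname and firstname attributes. The '|' separator in the grouping key is an addition of this sketch, not part of the example above; it keeps names such as ("ab", "c") and ("a", "bc") from being treated as duplicates:

```xml
<!-- Sketch: XSLT 2.0 grouping on a combined name key -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/">
<people>
     <!-- Each group holds every person sharing the same last and first name.
          The '|' separator is an assumption added to avoid accidental key collisions. -->
     <xsl:for-each-group select="//person"
                         group-by="concat(@lastname, '|', @firstname)">
          <!-- Sort the groups; each sort key is evaluated on the group's first person -->
          <xsl:sort select="@lastname"/>
          <xsl:sort select="@firstname"/>
          <!-- The context item is the first person of the group -->
          <xsl:copy-of select="."/>
     </xsl:for-each-group>
</people>
</xsl:template>

</xsl:stylesheet>
```

Grouping makes the sort optional: drop the two xsl:sort elements and the groups come out in order of first appearance.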

The following attempts to remove duplicates do not work:

<xsl:template match="/">
<products>
     <xsl:for-each select="//product[not(@sku=preceding::product[1]/@sku)]">
          <xsl:sort select="@sku"/>
          <xsl:copy-of select="."/>
     </xsl:for-each>
</products>
</xsl:template>

You cannot sort the nodes so that only the immediately preceding element needs to be checked. xsl:sort changes only the order in which xsl:for-each processes the nodes, while axes such as preceding always follow the nodes' original document order. The same applies to preceding-sibling. The following code is also sure to fail:

<xsl:template match="/">
     
<xsl:variable name="products">
     <xsl:for-each select="//product">
     <!-- sort removed from here -->
          <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:variable>
     
<products>
     <xsl:for-each select="$products/product">
          <xsl:sort select="@sku"/>
          <xsl:variable name="pos" select="position()"/>
          <xsl:if test="$pos = 1 or 
               @sku != $products/product[$pos - 1]/@sku">
               <xsl:copy-of select="."/>
          </xsl:if>
     </xsl:for-each>
</products>
</xsl:template>

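This code fails because position() returns the node's position in the sorted order, but $products itself is still in document order. The sort affects only the sequence that xsl:for-each processes, and that sorted sequence cannot be addressed as $products/product[$pos - 1].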
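Both failures come from sorting before the duplicate test. A sketch that does work in XSLT 1.0 keeps the filter from the Solution, which compares each product with every earlier product in document order, and sorts only the nodes that survive it:

```xml
<xsl:template match="/">
<products>
     <!-- Filter first: the predicate looks at all earlier products in document order -->
     <xsl:for-each select="//product[not(@sku=preceding::product/@sku)]">
          <!-- Then sort: xsl:sort merely reorders the unique products for output -->
          <xsl:sort select="@sku"/>
          <xsl:copy-of select="."/>
     </xsl:for-each>
</products>
</xsl:template>
```

Here sorting is harmless because the predicate no longer depends on the order in which nodes are processed.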

See Also

The XSLT FAQ (http://www.dpawson.co.uk/xsl/sect2/N2696.html) describes a key-based solution, as well as solutions to related problems.
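For comparison, here is a sketch of the key-based approach usually called the Muenchian method. It is not necessarily the FAQ's exact code, and the key name products-by-sku is hypothetical. Each product is indexed by sku, and a product is kept only if it is the first node the key returns for that sku:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Hypothetical key name: indexes every product element by its sku -->
<xsl:key name="products-by-sku" match="product" use="@sku"/>

<xsl:template match="/">
<products>
     <!-- key() returns nodes in document order, so [1] is the first product with this sku -->
     <xsl:for-each select="//product[generate-id() =
                                     generate-id(key('products-by-sku', @sku)[1])]">
          <xsl:copy-of select="."/>
     </xsl:for-each>
</products>
</xsl:template>

</xsl:stylesheet>
```

Because key() looks nodes up through an index, this typically avoids rescanning all earlier nodes for every product, which is what makes the preceding axis slow on large documents.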
