The “Michael Kay / Jeni Tennison / XML Summer School” Top XSLT Performance Tips

Michael Kay led a final afternoon Application Development Workshop at XML Summer School 2010, with Jeni Tennison in the front row.

The delegates were treated to a series of performance improvement tips from two of our leading XSLT practitioners.

I thought they were worth sharing more widely.

1. Use Keys

– No performance problem has ever been solved by NOT using keys.
– Learn to use <xsl:key> and key().
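As a rough illustration of the tip above, here is a minimal sketch of a keyed lookup. The element and attribute names (customer, order, @id, @customer-ref, name) and the key name customer-by-id are made up for the example and do not come from the workshop.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Declare an index of customer elements, keyed on their id attribute
       (hypothetical names) -->
  <xsl:key name="customer-by-id" match="customer" use="@id"/>

  <xsl:template match="/">
    <xsl:for-each select="//order">
      <!-- Keyed lookup, in place of a filter such as
           //customer[@id = current()/@customer-ref] evaluated per order -->
      <xsl:value-of select="key('customer-by-id', @customer-ref)/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Without the key, a processor may rescan all customer elements for every order; with it, the processor is free to build the index once and reuse it.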

2. Don’t use preceding:: when you mean preceding-sibling::

– And don't forget an index position, e.g. preceding-sibling::p[1]
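A small sketch of the difference, assuming a document made of p elements (the output text is purely illustrative):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="p">
    <!-- preceding::p would select every earlier non-ancestor p in the
         whole document. preceding-sibling::p[1] selects only the nearest
         earlier sibling p, so the search can stop at the first hit. -->
    <xsl:variable name="prev" select="preceding-sibling::p[1]"/>
    <xsl:if test="$prev">
      <xsl:text>Follows: </xsl:text>
      <xsl:value-of select="$prev"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Suppress the built-in copying of text outside p elements -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Because preceding-sibling:: is a reverse axis, the predicate [1] picks the closest sibling rather than the first one in document order.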

3. <xsl:variable/> – select vs value-of

– If you use <xsl:variable name="a"><xsl:value-of select="node-a"/></xsl:variable> then stop. This is a very expensive and unnecessary node-creation operation.
– Use <xsl:variable name="a" select="node-a"/> instead.

4. Arithmetic

– In XSLT 2.0 think about using DOUBLE arithmetic, instead of DECIMAL arithmetic, especially on joins.
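One way this might look in practice is sketched below, assuming a hypothetical input with item elements that carry a price attribute. In XPath 2.0 a literal such as 0.175 is an xs:decimal, while the exponent form 0.175e0 is an xs:double, so the choice of literal decides which arithmetic is used.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- 0.175 would be an xs:decimal literal; 0.175e0 is an xs:double -->
    <xsl:variable name="rate" select="0.175e0"/>
    <!-- Cast each price explicitly so the sum is computed in xs:double
         (item and @price are hypothetical names) -->
    <xsl:variable name="net" select="sum(//item/xs:double(@price))"/>
    <xsl:value-of select="$net * $rate"/>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that xs:double results are binary floating point and can carry small rounding errors, so this suits calculations where exact decimal results are not required.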

5. On-the-fly stylesheets

– If generating on-the-fly stylesheets, e.g. in XProc pipelines, consider compile-time performance issues, which in other situations are probably not an issue.

6. Small Changes x Multiple Iterations = Poor Performance

– XSLT can have poor performance in a situation where multiple transforms, or iterations of the same transform, are changing small parts of a large source document. The overhead of multiple copying of large quantities of unchanged nodes may mean it would be preferable to choose a different technology. XQuery Update might be more suitable.
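The pattern in question usually looks something like the following sketch: an identity template plus one small override. The order, @id and status names are invented for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every node that no other template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The only real change: the status text of one order
       (hypothetical names) -->
  <xsl:template match="order[@id = '42']/status/text()">shipped</xsl:template>
</xsl:stylesheet>
```

However small the override, every other node in the source is still visited and copied to the result on each pass, which is where the cost builds up when the transform is run repeatedly over a large document.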

7. Profiling in XSLT (1)

– XSLT profiling is available in Saxon and is used by tools like Oxygen.

8. Profiling in XSLT (2)

– Subtractive Measurement can be used in profiling. If you are concerned about a particular part of your transform, you can measure the time cost by removing the operation and measuring again.

9. Benefits of typing

– Where possible you should use data typing in your schemas. It speeds up validation no end.

10. minOccurs/maxOccurs

– If you use non-zero values for minOccurs/maxOccurs in your schemas, then the larger the values, the slower the validation will be, because the parser has to count the elements.
– So it's much, much quicker to validate minOccurs=1 than it is to validate minOccurs=101.
