This chapter will take you into the mysterious and sometimes puzzling world of database indexes. As soon as your dataset starts growing and performance starts degrading as a result, indexes become a necessity.
Just for a moment, let’s imagine that there are no indexes. This would mean that every XPath expression must be resolved by brute force. For a query such as //line[@author eq "erik"], the entire node tree of every document in scope must be traversed to find the line elements with an author attribute whose value is erik. You can probably see that on a large dataset this is an intensive, and ultimately slow, operation. If you further imagine many of these queries being run on demand by your users in parallel, things can only get worse!
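To make the contrast concrete, here is a minimal sketch in Python, not eXist's actual implementation: it resolves the equivalent of //line[@author eq "erik"] once by walking every element of every document, and once through a hypothetical in-memory index mapping each author value to its line elements. The sample documents and the function names brute_force, build_index and indexed_lookup are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents, keyed by document name.
DOCS = {
    "a.xml": "<poem><line author='erik'>One</line><line author='adam'>Two</line></poem>",
    "b.xml": "<poem><stanza><line author='erik'>Three</line></stanza></poem>",
}

def brute_force(docs, author):
    """Resolve //line[@author eq author] by visiting every element of every document."""
    hits, visited = [], 0
    for xml in docs.values():
        for elem in ET.fromstring(xml).iter():  # root plus all descendants
            visited += 1
            if elem.tag == "line" and elem.get("author") == author:
                hits.append(elem.text)
    return hits, visited

def build_index(docs):
    """Build the index once (e.g. at store time): author value -> line texts."""
    index = defaultdict(list)
    for xml in docs.values():
        for line in ET.fromstring(xml).iter("line"):
            index[line.get("author")].append(line.text)
    return index

def indexed_lookup(index, author):
    """Answer the same query with a single dictionary lookup, no traversal."""
    return index.get(author, [])

hits, visited = brute_force(DOCS, "erik")
print(hits, visited)
print(indexed_lookup(build_index(DOCS), "erik"))
```

The brute-force version touches every element on every query, so its cost grows with the total size of the data; the indexed version pays the traversal cost once, when the index is built, and each query afterwards is a single lookup.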
Of course, indexes come with a cost of their own: when XML documents are created or updated, the corresponding indexes must be updated too. However, this is generally not a problem. For most (but not all) applications, updating is a much rarer event than querying, and the short time lags created by updating the indexes go unnoticed.
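The write-side cost can be sketched the same way. The following Python class is a hypothetical illustration, not eXist's index code: every time a document is stored or replaced, its old index entries are removed and new ones are added, so each write does extra work in exchange for fast reads. The class name AuthorIndex and its methods are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

class AuthorIndex:
    """Hypothetical index: @author value -> set of (document name, line text)."""

    def __init__(self):
        self.entries = defaultdict(set)
        self.keys_by_doc = defaultdict(set)  # which keys each document contributed

    def store(self, name, xml):
        """Create or update a document; the index is maintained as part of the write."""
        self.remove(name)  # drop stale entries if this is an update
        for line in ET.fromstring(xml).iter("line"):
            key = line.get("author")
            if key is None:
                continue
            self.entries[key].add((name, line.text))
            self.keys_by_doc[name].add(key)

    def remove(self, name):
        """Delete every index entry that came from document `name`."""
        for key in self.keys_by_doc.pop(name, set()):
            self.entries[key] = {e for e in self.entries[key] if e[0] != name}
            if not self.entries[key]:
                del self.entries[key]

    def lookup(self, author):
        """Read path: a single lookup, returned in sorted order."""
        return sorted(self.entries.get(author, set()))

idx = AuthorIndex()
idx.store("a.xml", "<poem><line author='erik'>One</line></poem>")
idx.store("a.xml", "<poem><line author='adam'>One</line></poem>")  # an update
print(idx.lookup("erik"), idx.lookup("adam"))
```

Because updates are usually far rarer than queries, the extra work done in store is normally a good trade for the cheap lookup.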
Large databases, XML or otherwise, rarely scale well without indexes. Without them, performance degrades at least linearly as the dataset grows, and often worse. Therefore, defining and tuning indexes is well worth the effort, and often a necessity.
Besides the indexes mentioned here and in Chapter 12, there is also an index that supports explicit ordering, known as the sort index. Since this works differently ...