Chapter 4. Hierarchies in SQL

Hierarchical structures have a sort of nondeterministic nature in that the exact structure is determined only when you populate the hierarchy with data. This makes them appealing for various sorts of applications. An employment hierarchy is a classical example of such a structure. A company will have employees. Supervisors will be assigned to lead groups of employees. Those supervisors, in turn, will report to managers. Low-level managers will report to higher-level managers. Eventually, you get to the top where you often find a chief executive officer (CEO). If you sketch it out, the typical company organization chart will look like an upside down tree, with some branches in the tree being wider and narrower than others. Hierarchical structures are widely used in procedural languages, such as C, but are rather underutilized in SQL, because they conflict with the inherent table-shaped data layout of relational databases.

Almost any type of complex application can make use of a hierarchical model to some degree or another. The recipes in this chapter show you how to manipulate hierarchical data using Transact-SQL. This chapter will discuss three major topics:

  • Specialized hierarchies

  • General hierarchies

  • Efficiency extensions

Some vendors, Oracle Corporation being among the most notable, have extended their SQL syntax with additional functionality to support querying hierarchical data. Unfortunately, neither Sybase nor Microsoft have chosen to implement such extensions in Transact-SQL. Although the Transact-SQL language hasn’t been designed to support dynamic structures such as trees, programmers have come up with some reasonably efficient design patterns that address the common operations that you need to perform on hierarchical data. As an added bonus, the lack of vendor-specific functionality tends to make these solutions more easily portable across database platforms.

Types of Hierarchies

There are two types of hierarchies that we recognize in this chapter: specialized hierarchies and general hierarchies. In a specialized hierarchy, you are limited in the maximum depth of the hierarchy. As you’ll see in the recipes, SQL is quite unique since you can implement proper general hierarchies with unlimited depth. In a general hierarchy, you cannot define a maximum depth.

In the first part of this chapter, we are going to show you how specialized hierarchies can be implemented in a SQL Server database. We’ll also show you how you can make use of their special property. Our recipe solutions are based on a real-world example of a hierarchical structure used to manage permissions in an online trading system.

Next, our discussion will turn to general hierarchies. Because general hierarchies allow nodes to be nested to any depth, manipulating them requires using generalized solutions. Often, these solutions involve recursion. We’ll begin by showing you what you can do using a pure hierarchical model, and we’ll show both recursive and nonrecursive algorithms. Then, we’ll show you how you can improve the efficiency of a hierarchical model by adding what we term service information to it.

Before going further into the chapter, we’d like to review some of the terminology that we use when we discuss hierarchies. Figure 4-1 illustrates a hierarchical structure, and many of the elements are labeled with the terms that we use to refer to them.

A hierarchical structure
Figure 4-1. A hierarchical structure

The following list contains our definitions for the terms shown in Figure 4-1, as well as for some other terms that we use in this chapter:

Hierarchy

Refers, generally, to a hierarchical structure. For example, employee relationships in a business represent a hierarchy.

Tree

Refers to a specific manifestation of a hierarchical structure. A specific organizational chart would be an example of a tree. As Figure 4-1 shows, hierarchically structured data may consist of more than one tree.

Subtree

Refers to a tree that is known to not start at the root node. The term branch is sometimes used as a synonym for subtree.

Node

Refers to an element in a tree.

Leaf node

Refers to an element in a tree that does not have any child nodes. A leaf node, sometimes just referred to as a leaf, is a special case of the more generic node.

Root

Refers to the node having no parents. Any given tree will have exactly one root node.

Parent

Refers to a node that has other nodes (child nodes) underneath it.

Child

Refers to a node that falls underneath another node (a parent node).

Vertex

Is a synonym for node.

In a hierarchy, child nodes are subordinate to, and belong to, their parents. If you delete a node, or move a node, that operation affects not only the node on which you are operating, but also all of its children. For example, if you delete node 2, as shown in Figure 4-1, you are also implicitly deleting nodes 3 and 4.

Specialized Hierarchies

Specialized hierarchies can be found in almost any real-world application. However, they aren’t always manifested in an obvious way, and programmers often don’t even realize that what they are actually doing is programming a hierarchy. When programming for a specialized hierarchy, you can take advantage of limitations such as a limited number of nodes, a limited number of levels, the fact that only leaves hold data, or some other specialized aspect of the hierarchy in question.

As a basis for the recipes in this chapter, we’ve used a permission-hierarchy example that is taken from an online trading system that is actually in production use. We’ve simplified our example somewhat for demonstration purposes. The purpose of the mechanism is to implement support for an order-routing system in which traders have different permissions relating to trades that they can make. An automatic system processes orders according to given permission rules. If a trader sends an order through a system for which he is not authorized, the system rejects the order before submitting it to the market. The following data is an example of some of the rules that the system must enforce:

Trader   Type    Limit
---------------------------
Alex     Shares    100
Alex     Futures    10
Alex     Bond       10
Cindy    Shares   1000
Cindy    Bonds    1000
Michael  Shares    200
...

As you can see, there are several types of permissions granted to each trader. Alex, for example, can trade shares of stock in blocks of no more than 100 shares. Cindy, on the other hand, is allowed to trade in blocks of up to 1,000 shares. If you were designing a database structure to hold rules like these, the simplistic approach would be simply to define permissions for each specific trader using the system. However, it’s likely to be more practical to group those permissions so that supervisors can easily grant a set of permissions to a particular trader. For example, you could have permission groups for commodity traders, future traders, and spot market traders. The resulting hierarchy would be the limited hierarchy shown in Figure 4-2.

The limited-trading permission hierarchy
Figure 4-2. The limited-trading permission hierarchy

Some might look at this scheme and see two distinct entities: groups and permissions. That’s probably a valid view, but we would argue that, at a higher level of abstraction, this structure represents a limited hierarchy of only two levels. The highest level is the group level, and the level below that is the permission level. When writing code to query and manage this hierarchy, you can take advantage of the fact that it’s limited to only two levels of depth. You can use solutions that are specific for that case that are more efficient than those that you would need to navigate a more generalized and unrestricted hierarchy. The recipes we have in this chapter highlight the considerations that you need to take into account, as well as the efficiencies that you gain, when working with a specialized hierarchy such as that shown in Figure 4-2.

General Hierarchies

The standard model for representing general hierarchies is to set up a parent/child relationship. The implementation in Transact-SQL, however, is different from tree-structure implementations in other programming languages. Hierarchical structures implemented in Transact-SQL have two characteristics that distinguish them from hierarchical structures implemented in a programming language such as C:

  • A lack of pointers

  • An unlimited number of children

Programmers using C, or other procedural languages, to implement hierarchies usually deal with fixed-size structures in which the parent and child nodes are linked with pointers. Moving from one node to another is fairly inexpensive in terms of the resources that are consumed, because such a move involves only the swapping of a couple of pointers in memory. With SQL, on the other hand, a move between nodes usually involves disk I/O, because each node ends up being a row in a table. Disk I/O is expensive in terms of the time and resources consumed, so SQL programmers need to be very careful about the number of visits they make to each node. An advantage of SQL, though, is that it’s easy to allow for an unlimited number of children. Unlike the fixed structures often used in procedural languages, database tables have no arbitrary limit on the number of rows they may contain.

The design of a hierarchy is a good example of a case in which how you model your data is much more important than the actual algorithms that you use to manipulate your data. The following CREATE TABLE statement shows the structure most often used to represent general hierarchies in a relational database:

CREATE TABLE Hierarchy(
   VertexId INTEGER,
   Parent INTEGER,
   ...
   PRIMARY KEY(VertexId)
)

In this structure, each row in the Hierarchy table represents one node. The VertexId field contains a unique address for each node. Each node then points back to its parent by storing the parent’s VertexId value in the Parent field. Using such a structure, you can create an unlimited number of children for each parent.

SQL programmers have been using this type of structure for some time, and it looks great from a design point of view. However, SQL has strangely weak support for hierarchies. If you aren’t careful, the code you write to manipulate a hierarchical structure, such as the one shown here, can be inefficient and lead to a host of performance problems.

We have several recipes in this chapter that show reasonably efficient solutions for common operations that you’ll need to perform on generalized hierarchical data. These solutions are based on the example of a project-tracking database that you manage. In this project-tracking database, you store data about projects and their subprojects. Any given project can be broken up into an arbitrary number of subprojects for planning purposes. Those subprojects can be further broken up into even smaller subprojects. This process can continue for any number of levels of depth. Eventually, you get down to the bottom-level in which the subprojects have no children. A subproject with no children is referred to as a task. Your job is to build SQL queries, procedures, and functions in support of a project-tracking application.

Data about projects and subprojects is stored in the Projects table, which is defined as follows:

CREATE TABLE Projects(
   Name VARCHAR(20),
   Cost INTEGER,
   Parent INTEGER,
   VertexId INTEGER,
   Primary key(VertexId)
)

The first two columns — the Name and Cost columns — represent the actual project and subproject data. In a real application, you would have more information than that, but these two columns will suffice for the purposes of example. The remaining two columns in the table are used to define the hierarchy. The VertexId column contains a unique identifier for each project, subproject, and task. The Parent column in any given row points to the parent project or subproject for that row. Figure 4-3 provides a visual illustration of this hierarchy.

The general project-tracking hierarchy
Figure 4-3. The general project-tracking hierarchy

The scripts named ch04.*.objects.sql (downloadable from this book’s web site) will create and populate all the tables used in this chapter’s recipes. When you run the script, you’ll get the following data in the Projects table:

Name                 Cost        Parent      VertexId    
-------------------- ----------- ----------- ----------- 
New SW               0           0           1
Specifications       0           1           2
Interviews           5           2           3
Drafts               10          2           4
Consolidations       2           2           5
Final document       15          2           6
Presentation         1           6           7
Prototype            0           1           8
UI Design            10          8           9
Calculations         10          8           10
Correctness Testing  3           10          11
Database             10          8           12
Development          30          1           13
UI Implementation    10          13          14
Coding               20          13          15
Initial testing      40          13          16
Beta testing         40          1           17
Final adjustments    5           17          18
Production testing   20          1           19

As you can see, the Parent column is used to maintain the hierarchical structure of the project data. Each child has only one parent, so only one Parent column is needed for each row in the table. Consequently, an unlimited number of children can be created for any given project or subproject. Figure 4-4 shows a graphical view of our example project.

The sample project
Figure 4-4. The sample project

Creating a Permission Hierarchy

Problem

You need to create a system for maintaining a simple permission hierarchy for a trading system. This hierarchy needs to be similar to the one described earlier in Section 4.1.1. Permissions should be manageable in groups so a set of permissions can be granted to a trader easily. In addition, your solution needs to allow for permission types to be added dynamically while the system is in production.

Solution

Define two tables. In the first table, define groups of permissions; in the second table, link those groups to particular traders. Each trader can belong to more than one permission group. The table defining the permission groups can be defined as follows:

CREATE TABLE GroupPermissions(
   GroupId VARCHAR(20) NOT NULL,
   ProductType VARCHAR(10) NOT NULL,
   Status CHAR(1) CHECK(status in ('V','S')) DEFAULT('V'),
   Limit NUMERIC(10,2) NULL, 
   PRIMARY KEY(GroupId,ProductType)
)

In our GroupPermissions table, there are two attributes describing trading permissions for each product group. One attribute is a status attribute. The status of each trading permission can be set to valid ('V') or suspended ('S'). The second attribute is an order limit that defines the maximum permitted size for a trade involving the product group. If a trader is authorized to trade bonds and has a trading limit of 1,000, that trader is not allowed to initiate a trade involving more than 1,000 bonds. The following is a representative sample of the type of data that you could store in this table:

GroupId              ProductType Status Limit        
-------------------- ----------- ------ ------------ 
Debt                 Bill        V      10000.00
Debt                 Bond        V      10000.00
Derivatives          Future      V      200.00
Derivatives          Option      V      100.00
Equities             Share       V      1000.00

The second table needs to keep track of the permission groups assigned to each trader. Its definition can be as follows:

CREATE TABLE GroupMembership(
   AccountId VARCHAR(20) NOT NULL,
   GroupId VARCHAR(20) NOT NULL
   PRIMARY KEY(AccountId,GroupId)
)

This table simply links accounts with groups. Whenever a trader is assigned to a new group or removed from an existing group, the table must be updated to reflect that fact. The following sample shows the group assignments for three traders:

AccountId            GroupId              
-------------------- -------------------- 
Alex0001             Debt
Alex0001             Derivatives
Alex0001             Equities
Betty0002            Derivatives
Charles0003          Debt

You can see from this example that a single trader can belong to multiple groups. In this case, Alex0001 belongs to three groups: Debt, Derivatives, and Equities. In the next section, we’ll describe one way to handle cases where a trader’s permissions from one group conflict with his permissions from another group.

Discussion

The main goal of our solution is to separate the definition of permissions and groups from the assignment of groups to traders. Please note that the GroupPermissions table is not fully normalized. To normalize it, you would have to separate the definition of groups from the definition of products and then use an intersection table to define the Limit and Status attributes. The lack of normalization here simplifies the example somewhat, but it doesn’t affect the overall behavior of our design nor does it affect the queries that you’ll see in this section.

Tip

The goal of normalization is to provide specific benefits to database design — reduced redundancy being the foremost benefit. However, there are many situations in which reduced redundancy is less valuable than the performance gains you get from a denormalized structure.

Checking permissions for a trader

Queries regarding trading authorizations can easily be performed on the model that we’ve designed. For example, to list the trading permissions for the account Alex0001, all you need is the following simple join:

SELECT m.AccountId, g.ProductType, MIN(g.Limit) Limit
FROM GroupMembership m JOIN GroupPermissions g
   ON m.groupId=g.groupId
WHERE  Status='V' AND AccountId='Alex0001'
GROUP BY m.AccountId, g.ProductType

AccountId            ProductType Limit        
-------------------- ----------- ------------ 
Alex0001             Bill        10000.00
Alex0001             Bond        10000.00
Alex0001             Future      200.00
Alex0001             Option      100.00
Alex0001             Share       1000.00

While the results of this query are straightforward, the use of the GROUP BY clause together with the aggregate function MIN deserves some additional explanation. The grouping allows us to deal with cases where more than one of a trader’s permission groups defines a limit for the same product. In our example data, both the Debt and Equities groups define a trading limit for bonds. Alex0001 is a member of both groups, so an ungrouped query would return the following two records:

AccountId            ProductType Limit        
-------------------- ----------- ------------ 
Alex0001             Bond        10000.00
Alex0001             Bond        2000.00

With two limits for the same product, the question is which limit takes precedence. In our example query, we used the MIN function for the lower limit to take precedence over the higher limit. If you wanted the higher limit to take precedence, you would make that happen simply by using MAX instead of MIN. The GROUP BY clause is a requirement when using aggregate functions in this manner and ensures that only one permission is ultimately returned for each product that the trader is authorized to trade.

Revoking permissions for a group

The Status column in our design allows you to quickly and efficiently revoke a specific permission from a group or to suspend all permissions for a given group. For example, to suspend all trading permissions for the Debt group, you could execute the following statement:

UPDATE GroupPermissions SET Status='S' WHERE GroupId='Debt'

The previously shown query for checking permissions only looks at valid permissions (Status = 'V'); those you’ve just suspended (Status = 'S') will automatically be excluded. Their definitions, however, will still exist in the database, so you can easily enable them again when you want to do so.

The solution shown in this recipe is a general solution and can be used for almost any type of permission hierarchy with a limited number of levels. It can be used for security hierarchies where different levels of security authorization are defined, for storing information regarding types of users and their authorizations in a web-based authorization system, or for any other type of permission hierarchy where a large number of users or accounts is used.

Changing Individual Permissions

Problem

You’ve implemented the permission solution described in the previous section, and you are happy with the ability to assign permissions to groups of users, but you also want the ability to grant specific exceptions directly to specific traders.

Solution

The solution is twofold. First, create an additional table to record exceptional permissions granted directly to a trader. Second, modify the query that you use to obtain the permissions that are currently valid for a given trader. The additional table could look like the following AccountPermissions table:

CREATE TABLE AccountPermissions(
   AccountId VARCHAR(20) NOT NULL,
   ProductType VARCHAR(20) NOT NULL,
   Status CHAR(1) CHECK(Status in ('V','S')) DEFAULT('V'), 
   Limit NUMERIC(10,2) NULL,
   PRIMARY KEY(AccountId, ProductType)
)

In this table, insert records to record all the permissions that you want to assign to traders that are in addition to the permissions those users inherit from their groups. For example, suppose that Alex0001 had a share limit of 1,000 because he belonged to the equities group. Further suppose that you wanted to make Alex0001 an exception to the rule for that group and raise his share limit to 5,000. To do this, insert the following row into the AccountPermissions table:

AccountId            ProductType          Status Limit        
-------------------- -------------------- ------ ------------ 
Alex0001             Share                V      5000.00

With the AccountPermissions table in place, you need to modify the query that you use to retrieve permissions for a given user. That query now needs to take into account this new table. The query in the following example, which returns permissions for Alex0001, will do that:

SELECT m.AccountId, g.ProductType, 
   CASE WHEN isnull(MIN(a.Limit),0) > MIN(g.Limit) 
      THEN MIN(a.Limit) 
      ELSE MIN(g.Limit) 
   END Limit
FROM GroupMembership m JOIN GroupPermissions g 
      ON (m.GroupId=g.GroupId)
   LEFT OUTER JOIN AccountPermissions a 
      ON (m.AccountId=a.AccountId AND g.ProductType=a.ProductType)
WHERE  m.AccountId='Alex0001' 
GROUP BY m.AccountId, g.ProductType
HAVING MIN(g.status)='V' AND isnull(MIN(a.status),MIN(g.status))='V'

AccountId            ProductType Limit        
-------------------- ----------- ------------ 
Alex0001             Bill        10000.00
Alex0001             Bond        2000.00
Alex0001             Future      200.00
Alex0001             Option      100.00
Alex0001             Share       5000.00

Note the emphasized line in the output. That line represents the permission to trade 5,000 shares that came from the AccountPermissions table.

Discussion

The idea behind the permissions model described in this chapter is to have permissions set through group memberships whenever possible. However, in practice, you’ll often find that it is useful to have a tuning facility of sorts that allows you to directly change a specific permission for a specific user. To that end, we created the additional table named AccountPermissions in which such exceptions are recorded. Our new query joins the GroupPermissions table with the GroupMembership table to return permissions set at the group level. Those results are then joined with the AccountPermissions table, which adds account-specific exceptions to the result set. An outer join is used for this third table to make the account-specific exceptions optional.

The GROUP BY clause is again used in this new version of the query, for the same reason as before — we only want one permission to be returned for each product type. There is one difference, though: this time, the GROUP BY function includes more columns in its column list:

GROUP BY m.AccountId, g.ProductType

The CASE statement that you see in the query is used to decide which value to take if both individual and group permissions for the same account and product are present. It checks both values and reports just one:

CASE WHEN isnull(MIN(a.Limit),0) > MIN(g.Limit) 
      THEN MIN(a.Limit) 
      ELSE MIN(g.Limit) 
   END Limit

In our case, our authorization policy is that account-specific permissions only take precedence over permissions granted at the group level when the account-specific limit is greater than the group-specific limit. The isnull( ) function takes care of cases where individual permissions are not set. It does this by supplying a zero value for those cases. Using a CASE statement like this is a very flexible approach, because you can easily implement different authorization policies. It would be trivial, for example, to change the CASE statement such that account-specific permissions always took precedence over group-level permissions, regardless of whether the account-specific permission specified a higher or lower limit.

Unlike the query shown previously in this chapter in which the Status flag was checked in the WHERE clause, this query takes the different approach of checking the Status flag in the HAVING clause. In fact, the query checks to be sure that both flags — for the group-level permission and for the account-specific permission — are set:

HAVING MIN(g.status)='V' AND isnull(MIN(a.status),MIN(g.status))='V'

In this case, if only one of the two applicable flags is set to 'S', the permission is revoked.

The solution shown in this recipe is very useful when you need to make exceptions for existing permissions set at the group level. However, it has one significant problem: you cannot define an account-specific permission for a trader in cases where a permission for the same product has not also been granted at the group level.

Adding New Individual Permissions

Problem

The previous recipe’s solution allows you to define individual exceptions to permissions that a trader already has at the group level. However, you wish to grant permissions to specific traders without regard to whether permissions for the same products have been granted at the group level.

Solution

In the previous recipe, you could define an exception to a permission that a trader had already been granted at the group level. For example, you could create the following two rows in AccountPermissions:

AccountId            ProductType          Status Limit        
-------------------- -------------------- ------ ------------ 
Alex0001             Share                V      5000.00
Betty0002            Share                V      8000.00

The query in the previous recipe, however, will only respect these account-specific permissions in cases where the traders in question have also been granted permission to trade shares at the group level. This limitation, if you wish to call it that, comes about as a result of the three-way join used in the query.

You may wish account-specific permissions to take effect all the time. To do that, you can use the same data model as shown in the previous recipe, but with a different query. The query shown in the following example is a union query and correctly returns both group-level and account-specific permissions for Betty0002:

SELECT m.AccountId, g.ProductType, MIN(g.Limit) Limit
FROM GroupMembership m JOIN GroupPermissions g
   ON m.groupId=g.groupId 
WHERE Status='V' AND AccountId='Betty0002' 
   AND NOT EXISTS(SELECT * FROM AccountPermissions  a
      WHERE m.AccountId=a.AccountId AND g.ProductType=a.ProductType) 
GROUP BY m.AccountId, g.ProductType
UNION
SELECT a.AccountId, a.ProductType,a.Limit 
FROM AccountPermissions a
WHERE a.AccountId='Betty0002' AND a.Status='V'

AccountId            ProductType          Limit        
-------------------- -------------------- ------------ 
Betty0002            Share                8000.00
Betty0002            Future               200.00
Betty0002            Option               100.00

As you can see, even though Betty0002 has not been granted permission to trade shares at the group level, this query still picked up the account-level share permissions. The query in the previous recipe won’t do that.

Discussion

The key to this query is that it is a union query. The first query in the union reports all those permissions that are defined only at the group level. The subquery in that query’s WHERE clause ensures that group-level permissions for products are excluded when account-specific permissions exist for those same products. The second query in the union then returns account-specific permissions. The results from the two queries are combined as a result of the UNION clause.

There are two drawbacks to the solution shown in this recipe. One is that this solution is less efficient than the one shown in the previous recipe. This is because there are three SELECT statements involved instead of just one. Another drawback is that this solution is inflexible in terms of which permission to use when permission for the same product is granted at both the group and the account level. In such cases, the account-specific permission always takes precedence over the group-level permission. A way to overcome this latter limitation is to create a more general UNION query that includes both group- and account-level permissions, embed that query in a view, and then manipulate the view with an appropriate query. The following statement creates such a view:

CREATE VIEW Permissions 
AS
SELECT m.AccountId, g.ProductType, MIN(g.Limit) Limit
   FROM GroupMembership m JOIN GroupPermissions g
      ON m.groupId=g.groupId 
   WHERE Status='V' 
   GROUP BY m.AccountId, g.ProductType
UNION
   SELECT a.AccountId, a.ProductType,a.Limit 
   FROM AccountPermissions a
   WHERE  a.Status='V'

This VIEW returns the group permissions expanded for each account and also includes any account-specific permissions that may exist. To list all permissions for a particular account, query the view and apply your interpretation policy in the query. For example:

SELECT ProductType, MIN(Limit) Limit FROM permissions 
WHERE AccountId='Alex0001'
GROUP BY ProductType

ProductType          Limit        
-------------------- ------------ 
Bill                 10000.00
Bond                 2000.00
Future               200.00
Option               100.00
Share                1000.00

This query lists permissions for Alex0001. The MIN function resolves cases where multiple permissions exist for the same product type. When such cases occur, MIN ensures that only the lowest applicable limit is returned. To always return the highest-applicable limit, you could use the MAX function.

The solutions shown in this recipe are flexible, because they allow for easy addition of new levels to the permission structure. All you need is to add UNION clauses to your query or to your view.

Centralizing Authorization Logic

Problem

You wish to centralize the logic for your authorization policy. You want to be able to query for an individual trader’s permissions, but you don’t want that query to contain the logic that handles conflicting permissions and individual exceptions. Your goal is to be able to make reasonable changes to your permissions policies and model without forcing a change to queries already running in production programs.

Solution

Create a view into which you embed an authorization query. The benefit of this is that you can easily change the underlying query of the view without having to change the code in all the programs that query the view. The following view implements the authorization logic shown in the first recipe titled “Creating a Permission Hierarchy”:

CREATE VIEW orderAuthorization 
AS
SELECT AccountId, ProductType, MIN(Limit) Limit
FROM GroupMembership m JOIN GroupPermissions g
   ON m.groupId=g.groupId 
WHERE Status='V'
GROUP BY AccountId, ProductType

Using this view, you can now obtain the permissions for a specific trader by executing a simple query such as the one shown in the following example:

SELECT AccountId, ProductType, Limit
FROM OrderAuthorization
WHERE AccountId = 'Alex0001'

AccountId            ProductType Limit        
-------------------- ----------- ------------ 
Alex0001             Bill        10000.00
Alex0001             Bond        2000.00
Alex0001             Future      200.00
Alex0001             Option      100.00
Alex0001             Share       1000.00

You can also issue a query against this view to check for a specific permission. For example, suppose an order to trade 3,000 shares for the account Alex0001 comes into the system. You can use the simple query shown in the following example to retrieve the maximum limit for this type of order:

SELECT Limit 
FROM orderAuthorization 
WHERE AccountId='Alex0001' AND productType='Share'

Limit        
------------ 
1000.00

By keeping your queries simple and writing them against a view such as this, you can easily modify your permission model and logic. In the event that you make such modifications, the view will insulate your existing queries from the changes.

Discussion

This solution demonstrates a good separation of business logic from the data model. Imagine that these permissions can be any type of permissions for a large amount of accounts. Having the interpretation of the data model embedded within a view is very likely to pay off for you. Changes in real-life systems are frequent, and often unexpected, so it is usually not wise to hand-code the interpretation (embedding the query) of a data model into your application system. With the view-based solution shown in this recipe, you are free to change the underlying permission model and the interpretation policy during production without going through the additional trouble of recompiling the system or changing the code.

Implementing General Hierarchies

Problem

You want to implement a project-tracking system based on a general hierarchy, and you want to run some basic, single-level hierarchical queries against that hierarchy.

Solution

Use the general parent/child model described earlier in the project-tracking example in Section 4.1 as the basis for the project-tracking system. In this model, a project is composed of subprojects or tasks, tasks may be broken into subtasks, and subtasks may themselves be broken up into subtasks. There is no limit to the level of nesting that may occur, and it all occurs within one database table named Projects. Conceptually, a project is simply the highest-level task. The Projects table is defined as follows:

CREATE TABLE Projects(
   Name VARCHAR(20),
   Cost INTEGER,
   Parent INTEGER,
   VertexId INTEGER,
   Primary key(VertexId)
)

With this solution in mind, let’s look at a few problems that are common to hierarchical structures, such as the one used here to define projects:

  • List all leaf nodes

  • List all nodes that are not leaf nodes

  • Find the most expensive vertices

  • List the immediate children of a node

  • Find the root

The next few sections present SQL Server solutions to each of these problems.

List all leaf nodes

Leaf nodes are nodes without children. The subquery in the following SELECT statement checks each node to see if any other nodes claim it as a parent. If none do, then that node is a leaf node:

SELECT Name FROM Projects p 
WHERE NOT EXISTS(
   SELECT * FROM Projects 
   WHERE Parent=p.VertexId)

The result of executing this query is a list of all subprojects (or tasks if you prefer) that have no children:

Name                 
-------------------- 
Interviews
Drafts
Consolidations
Presentation
UI Design
Correctness Testing
Database
UI Implementation
Coding
Initial Testing
Final Adjustments
Production Testing

List all nodes that are not leaf nodes

To list all nodes that are not leaf nodes — in other words, all nodes that have children — begin with the same query used to list leaf nodes and just convert the NOT EXISTS into an EXISTS:

SELECT Name FROM Projects p 
WHERE EXISTS(
   SELECT * FROM Projects 
   WHERE Parent=p.VertexId)

This query lists all nodes with children:

Name                 
-------------------- 
New SW
Specifications
Final Document
Prototype
Calculating
Development
Beta Testing

Find the most expensive vertices

Each task in the project has a cost associated with it. Each task is also a vertex. To find the N most expensive tasks, simply query the table, order the results by cost, and use the TOP operator to limit the results to the top N rows. For example:

SELECT TOP 5 Name, Cost 
FROM Projects 
ORDER BY cost DESC

Name                 Cost        
-------------------- ----------- 
Inital testing       40
Beta testing         40
Development          30
Coding               20
Production testing   20

This is a good example of a case where it is more productive to issue straight SQL queries, rather than attempt to navigate the hierarchy.

Find the immediate children of a node

Given a particular node, you can write a query to return that node’s immediate children. The following query, for example, returns the subprojects of the project named Specifications:

SELECT Name FROM Projects 
WHERE Parent=(
   SELECT VertexId FROM Projects 
   WHERE Name='Specifications' )

Name                 
-------------------- 
Interviews
Drafts
Consolidations
Final document

Only immediate children are returned. The parent record for each of the four records shown here is the record for the “Specifications” task. These records may, themselves, have children, but those children won’t show up in the results for this query.

Find the root

The root is the vertex with no parents. The query to return it is simple, because all you need is to return the one node with no parent pointer. For example:

SELECT Name FROM Projects WHERE Parent=0

Name                 
-------------------- 
New SW

In our example, we use a value of zero to indicate that a node does not have a parent. You could just as easily use a NULL to indicate the same thing, in which case you would specify Parent IS NULL as your WHERE clause condition.

Discussion

As you can see, these queries work on a single level where, at most, they refer to one level above (to parents) or below (to children). For these types of queries, you don’t need any additional algorithms or data structures to retrieve information. The queries are pretty straightforward, and you just have to remember the definition of your hierarchy and its elements to write them.

Traversing Hierarchies Recursively

Problem

You need to step through a hierarchy and print out a report listing all vertices. You want the report to show the hierarchical nature of the data by indenting children under their parent node.

Solution

One solution is to use a recursive algorithm to traverse the tree. To do that, encapsulate the algorithm in a stored procedure. The following code creates a stored procedure named TraversProjectsRecursive. The stored procedure takes one argument, specifying from which node to begin. The procedure then works its way down through that node’s children, grandchildren, and so forth.

CREATE PROCEDURE TraverseProjectsRecursive
@VertexId INTEGER
AS
   /* to change action on each vertex, change these lines */
   DECLARE @Name VARCHAR(20)
   SELECT @Name=(SELECT Name 
                    FROM Projects WHERE VertexId=@VertexId) 
   PRINT SPACE(@@NESTLEVEL*2)+STR(@VertexId)+' '+@Name
   /* ****** */

   DECLARE subprojects CURSOR LOCAL FOR
      SELECT VertexId FROM Projects WHERE Parent=@VertexId      

   OPEN subprojects
      FETCH NEXT FROM subprojects INTO @VertexId
      WHILE @@FETCH_STATUS=0 BEGIN
         EXEC TraverseProjectsRecursive @VertexId
         FETCH NEXT FROM subprojects INTO @VertexId
      END
   CLOSE subprojects
   DEALLOCATE subprojects

When calling the procedure, you need to specify the initial node from which you want the procedure to start traversing. If you want to print out the entire hierarchy, then specify the root node (in our case 1). For example:

 TraverseProjectsRecursive 1
 
 1 New SW
   2 Specifications
     3 Interviews
     4 Drafts
     5 Consolidations
     6 Final document
       7 Presentation
   8 Prototype
     9 UI Design
     10 Calculations
       11 Correctness Testing
     12 Database
   13 Development
     14 UI Implementation
     15 Coding
     16 Inital testing
   17 Beta testing
     18 Final adjustments
   19 Production testing

Discussion

The algorithm used by the TraverseProjectsRecursive procedure is a classical tree-traversal method adapted to the specifics of SQL. You can easily adapt this procedure for use with other hierarchical structures besides the projects structure shown in this chapter. To adapt this procedure, change only the SELECT and PRINT statements.

Tip

When you create a recursive procedure such as the one shown here, MS SQL Server will warn you with a message such as the following: “Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object ‘TraverseProjectsRecursive.’ The stored procedure will still be created.'' Do not worry about this message. Recursive procedures are a special case, and you can safely ignore the warning.

The first SELECT statement in the procedure retrieves the name associated with the vertex that is passed in as a parameter. That name is placed into the @Name variable, which is then used in the subsequent PRINT statement:

PRINT SPACE(@@NESTLEVEL*2)+STR(@VertexId)+' '+@Name

In this PRINT statement, we use the @@NESTLEVEL variable. That variable is maintained automatically by SQL Server and returns the current nesting level. This information is a great advantage when working with stored procedures. In our case, we want to indent every project proportionally to its depth. To do that, we multiply the nesting level by two to indent each child two spaces underneath its parent.

After we print the name of the current vertex, we need to go forward to its children. A vertex can have more than one child, so we define a cursor to select these:

DECLARE subprojects CURSOR LOCAL FOR
   SELECT VertexId FROM Projects WHERE Parent=@VertexId

This cursor definition uses the LOCAL clause, ensuring that the cursor is defined for the current procedure only. By default, cursors are global, which means that any cursor defined is visible from all code executed within the current connection. Since this procedure calls itself repeatedly within the context of one connection, we must specify that this cursor definition be local to each invocation of the procedure.

After opening the subprojects cursor, we simply step through the list of subprojects and recursively invoke the TraverseProjectsRecursive procedure on each one:

FETCH NEXT FROM subprojects INTO @VertexId
   WHILE @@FETCH_STATUS=0 BEGIN
      EXEC TraverseProjectsRecursive @VertexId
      FETCH NEXT FROM subprojects INTO @VertexId
   END

Even though the recursive mechanisms are very elegant, they are not very efficient. Furthermore, in SQL Server they have a limit of only 32 nesting levels. If you query your hierarchy infrequently, then the overhead of recursion may be acceptable, and you need not bother with other mechanisms. However, if you do this type of thing often, some further optimizations might be necessary to give you adequate performance. Sometimes it’s useful to use recursion for prototyping, because recursive solutions are simple to code. Later, you can optimize your code with a nonrecursive solution before you go to production.

Manipulating Hierarchies Recursively

Problem

You want to add and delete vertices and their subtrees. When you hire a new employee or open a new project in your company, you need to add this information into your system. To maintain the hierarchical structure of your employee information, special procedures must be followed.

Solution

To add a vertex to a hierarchy, you simply insert the row into the table with the proper parent pointer. No other action is necessary. If the vertex you just added has children, you insert those children in the same manner. No additional manipulations are needed.

Things become more complex when you need to delete a vertex, because you not only need to delete the row for the vertex, you need to delete any rows that represent children as well. For this purpose, you can use a modified version of the traversing algorithm shown in the previous recipe. The following RemoveProjectsRecursive procedure will delete a specified vertex and all vertices that fall underneath it:

CREATE PROCEDURE RemoveProjectsRecursive
@VertexId INTEGER
AS
   SET NOCOUNT ON
   DECLARE @LocalVertex INTEGER
   SELECT @LocalVertex=@VertexId

   DECLARE subprojects CURSOR LOCAL FOR
      SELECT VertexId FROM Projects WHERE Parent=@VertexId      

   OPEN subprojects
      FETCH NEXT FROM subprojects INTO @LocalVertex
      WHILE @@FETCH_STATUS=0 BEGIN
         EXEC RemoveProjectsRecursive @LocalVertex
         FETCH NEXT FROM subprojects INTO @LocalVertex
      END
    
   CLOSE subprojects
   DEALLOCATE subprojects

   DELETE FROM Projects WHERE VertexId=@VertexId

   PRINT 'Vertex ' +CONVERT(VARCHAR(8),@VertexId) + ' removed!'

To delete a vertex, simply invoke the RemoveProjectsRecursive procedure and pass in the vertex ID as a parameter. In the following example, the procedure is used to delete the vertex for Specifications, which happens to be vertex 2:

RemoveProjectsRecursive 2

Vertex 3 removed!
Vertex 4 removed!
Vertex 5 removed!
Vertex 7 removed!
Vertex 6 removed!
Vertex 2 removed!

Six rows were deleted as a result of deleting the Specifications vertex. Four of the rows are for the four children, one row is for a child of a child, and the sixth row is for the Specifications vertex, itself. This procedure always deletes the parent nodes last to avoid any foreign-key violations.

Discussion

Obviously, adding new members to a hierarchy is fairly easy since a hierarchical structure does not have any limitations on the number of children or levels of hierarchy that you can have. Inserting a new row under a parent is easy.

For deleting, we used the traversing algorithm from the previous recipe. When we adapted that algorithm for deleting a vertex, we were careful to have it remove all children before finally removing a parent node. If a parent is removed before its children, the table would contain nodes with no parents. Such nodes would need to be collected and eliminated using a specially coded garbage-collection mechanism. While such a mechanism is possible, you have the problem of how to differentiate valid parent-free nodes (project roots) from the orphan nodes that remain because you deleted their parents.

The algorithm shown in this recipe is fairly efficient; it removes leaf nodes and parent nodes with the same procedure, and it does not result in significant overhead if you are removing just a leaf vertex.

Aggregating Hierarchies

Problem

You want to aggregate hierarchical data. For example, you wish to sum the cost of a project or task, beginning from a specific vertex and working your way down through all levels of the hierarchy underneath that vertex.

Solution

With respect to the projects example that we have been using for these general hierarchy recipes, you can use the following stored procedure to summarize the costs of all subprojects for a specific project. You specify the project by passing its vertex ID as a parameter to the procedure.

CREATE PROCEDURE AggregateProjects
@VertexId INTEGER
AS
SET NOCOUNT ON
   DECLARE @lvl INTEGER
   DECLARE @Name VARCHAR(20) 
   DECLARE @Cost INTEGER
   DECLARE @Sum INTEGER

   CREATE TABLE #stack (
      VertexId INTEGER, 
      Name VARCHAR(20),
      Cost INTEGER,
      Lvl INTEGER
   )
   
   SELECT @Sum=0
   SELECT @Lvl = 1
   
   INSERT INTO #stack 
      SELECT VertexId, Name, Cost, 1 FROM Projects 
      WHERE VertexID=@VertexId  
   
   WHILE @Lvl > 0 BEGIN
      IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN

         SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost 
            FROM #stack WHERE lvl = @lvl
            ORDER BY VertexId

         /* to change action each vertex, change this line */
         SELECT @Sum=@Sum+@Cost
         /* ******* */

         DELETE FROM #stack WHERE vertexId = @VertexId

         INSERT #stack
            SELECT VertexId, Name, Cost, @lvl + 1 
            FROM Projects WHERE parent = @VertexId

         IF @@ROWCOUNT > 0
            SELECT @lvl = @lvl + 1
      END ELSE 
         SELECT @lvl = @lvl - 1
      
   END
   PRINT 'Sum of project costs is '+STR(@Sum)+'.'
   DROP TABLE #stack
SET NOCOUNT OFF

The following examples show the results of executing this procedure for all projects (VertexId=1) and for the Specifications project (VertexId=2).

AggregateProjects 1
Sum of project costs is         231.

AggregateProjects 2
Sum of project costs is         33.

Discussion

The algorithm shown here is adapted from the SQL Server Books Online recommendation for expanding hierarchies. Those familiar with algorithms and data structures will notice that it is a nonrecursive implementation of a recursive traversing algorithm. The code uses an internal stack implemented as a temporary table. The stack holds interesting information from the hierarchical Projects table, plus an additional column named lvl. The lvl column records the level that each entry holds in the hierarchy. The stack definition is as follows:

CREATE TABLE #stack (
      VertexId INTEGER, 
      Name VARCHAR(20),
      Cost INTEGER,
      Lvl INTEGER
   )

After the #stack table is created, two variables are initialized. The @lvl variable tracks the current level on which the code is operating, while the @sum variable accumulates the sum of all costs. Next, an INSERT statement is used to store the root onto the stack. Note that the root in our case is the vertex identified by @VertexId.

The code then loops through each element on the stack. The loop begins by popping one element from the stack. The retrieved data contains a cost, which is added to the total cost being accumulated in @sum:

SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost 
   FROM #stack WHERE lvl = @lvl
   ORDER BY VertexId

SELECT @Sum=@Sum+@Cost

After accumulating the cost for the vertex just pulled from the stack, the code deletes that read vertex and adds all of its children to the stack:

DELETE FROM #stack WHERE vertexId = @VertexId

INSERT #stack
   SELECT VertexId, Name, Cost, @lvl + 1 
   FROM Projects WHERE parent = @VertexId

IF @@ROWCOUNT > 0
   SELECT @lvl = @lvl + 1

The IF statement that you see in this code ensures that if the vertex just deleted from the stack has any children, the @lvl value is increased to indicate that the code is moving one level down in the hierarchy.

The IF EXISTS clause at the beginning of the loop ensures that so long as there are some candidates available on the current level, the loop repeats and browses through them all. The SET NOCOUNT ON directive at the beginning of the procedure just limits the number of messages displayed from the procedure. It does not affect the logic of the algorithm. Without SET NOCOUNT ON, you’ll see a steady stream of “(1 row(s) affected)” messages as the code executes.

This algorithm is general enough that it can be used for any kind of operation for which you prefer traversing the hierarchy in a nonrecursive manner. If you want to change the operation that is performed on each node, change the code where current cost is added to the total sum.

The code demonstrates why the general model for hierarchies has a limited application in SQL. When you need to traverse over more than one level efficiently, the code starts to expand to an almost unreadable size.

Preparing Multilevel Operations

Problem

Your system performs many multilevel, hierarchical operations, and the performance of those operations has not been satisfactory. You need to improve that poor performance.

Solution

One solution is to store additional accessibility information into a service table, and then use that information when querying your hierarchical table. For example, the following ProjectPaths table records the path to, and the depth of, each vertex in the Projects table:

CREATE TABLE ProjectPaths(
   VertexId INTEGER,
   Depth INTEGER,
   Path VARCHAR(300)
)

After creating the ProjectPaths table, you can use the following procedure to fill the table with the depth and path information for each vertex in the Projects table:

CREATE PROCEDURE BuildProjectPathsRecursive
@VertexId INTEGER
AS
SET NOCOUNT ON
   DECLARE @Path VARCHAR(300)
   DECLARE @Depth INTEGER

   SELECT @Depth=a.Depth,@Path=a.Path
   FROM ProjectPaths a JOIN Projects p ON p.parent=a.vertexId
   WHERE @vertexId=p.vertexId  
     
   DELETE FROM ProjectPaths WHERE VertexId=@VertexId
   INSERT INTO ProjectPaths VALUES(
         @VertexId, 
         isnull(@Depth,0)+1, 
         isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

   DECLARE subprojects CURSOR LOCAL FOR
      SELECT VertexId FROM Projects p WHERE Parent=@VertexId        
   OPEN subprojects

      FETCH NEXT FROM subprojects INTO @VertexId
      WHILE @@FETCH_STATUS=0 BEGIN

         EXEC BuildProjectPathsRecursive @VertexId
         FETCH NEXT FROM subprojects INTO @VertexId
      END
   CLOSE subprojects
   DEALLOCATE subprojects
SET NOCOUNT OFF

This procedure takes one parameter, which tells the procedure with which node to start. The procedure then works its way down the hierarchy. To process all nodes in the Projects table, invoke this procedure and pass a value of 1 as follows:

BuildProjectPathsRecursive 1

The procedure fills the ProjectPaths table with additional information for every vertex. The Depth column records the depth of each vertex. The Path column records the path to each vertex. In the path, the vertex numbers are separated by dots. The ProjectPaths table that will be built contains the following rows:

VertexId    Depth       Path
----------- ----------- ----------- 
1           1           .1.
2           2           .1.2.
3           3           .1.2.3.
4           3           .1.2.4.
5           3           .1.2.5.
6           3           .1.2.6.
7           4           .1.2.6.7.
8           2           .1.8.
9           3           .1.8.9.
10          3           .1.8.10.
11          4           .1.8.10.11.
12          3           .1.8.12.
13          2           .1.13.
14          3           .1.13.14.
15          3           .1.13.15.
16          3           .1.13.16.
17          2           .1.17.
18          3           .1.17.18.
19          2           .1.19.

Discussion

The idea for this recipe has been taken from an article published by Itzik Ben-Gan (SQL Server Magazine, June, 2000). His development of this technique is a recent achievement resulting from his search for an ultimate support structure to improve the efficiency of the classical hierarchical model. Although it was originally promoted as an add-on to an existing hierarchy table, we see no reason why you shouldn’t normalize properly and separate the hierarchy and its data from the support structure.

The path leading to every vertex is stored in the ProjectPaths table. This represents the work of traversing the hierarchy, and because it is stored in the ProjectPaths table, it only needs to be done once. Please note that the length of the Path field can be changed according to your needs. It does, however, make sense to keep it reasonably small, especially if you want to index it.

The stored procedure named BuildProjectPathsRecursive fills the ProjectPaths table with the paths to each vertex in the subtree. It uses the recursive traversal algorithm introduced earlier in this chapter and runs the following code for each vertex:

SELECT @Depth=a.Depth,@Path=a.Path
FROM ProjectPaths a JOIN Projects p ON p.parent=a.vertexId
WHERE @vertexId=p.vertexId  
 
DELETE FROM ProjectPaths WHERE VertexId=@VertexId
INSERT INTO ProjectPaths VALUES(
      @VertexId, 
      isnull(@Depth,0)+1, 
      isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

The SELECT statement reads the depth and path data from the parent. Next, any old information for the vertex is deleted from the ProjectPaths table, and new data is inserted. If the @Depth or @Path variables are null, indicating that no access path for the parent exists, then an initial value of 0 is set for the depth, and an initial value of a dot (.) is set for the path. Regardless of how the depth gets set, it is increased by one. That’s because the @Depth variable represents the depth of the current node’s parent. You have to increment that depth by 1 to get the current node’s depth. Similarly, the @Path variable contains the path to the parent. The current vertex ID is appended onto that path to yield the path to the current node. These new depth and path values are then inserted into the ProjectPaths table.

If you prefer nonrecursive algorithms, you can rewrite the recursive BuildProjectPathsRecursive procedure as a nonrecursive procedure. This code is as follows and uses the stack-based technique shown earlier in the recipe titled Section 4.9:

CREATE PROCEDURE BuildProjectsPaths 
@VertexId INTEGER
AS
SET NOCOUNT ON

   DECLARE @lvl INTEGER

   CREATE TABLE #stack (
      VertexId INTEGER, 
      Lvl INTEGER
   )

   SELECT @Lvl = 1
 
   INSERT INTO #stack 
      SELECT VertexId,1 FROM Projects WHERE VertexId=@VertexID
   
   WHILE @Lvl > 0 BEGIN
      IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN

         SELECT TOP 1 @VertexId = VertexId FROM #stack 
            WHERE lvl = @lvl
            ORDER BY VertexId
   
          DELETE FROM ProjectPaths WHERE VertexId=@VertexId
          INSERT INTO ProjectPaths
            SELECT p.vertexId, 
               isnull(a.Depth,0)+1,
               isnull(a.Path,'.')+CAST(p.VertexId AS VARCHAR(15))+'.'
               FROM ProjectPaths a,Projects p
               WHERE @vertexId=p.vertexId AND p.parent*=a.vertexId

         DELETE FROM #stack WHERE vertexId = @VertexId

         INSERT #stack
            SELECT VertexId, @lvl + 1 FROM Projects 
            WHERE parent = @VertexId

         IF @@ROWCOUNT > 0
            SELECT @lvl = @lvl + 1
      END ELSE 
         SELECT @lvl = @lvl - 1
      
   END


SET NOCOUNT OFF

Aggregating Hierarchies Revised

Problem

You want to perform aggregation on your hierarchy. As before, you wish to sum the cost of a project or task by beginning from a specific vertex and working your way through all levels of the hierarchy. This time, though, you wish to enhance the efficiency of your aggregation by using the ProjectPaths service table created in the previous recipe. In addition to summarizing the cost, you also wish to list the hierarchy in an indented format.

Solution

Recall that the previous aggregation procedure was fairly complex and made use of a temporary table named #stack. With the ProjectsPaths table, that same aggregation process becomes simple enough that you can perform it with the SQL query shown in the following example:

SELECT SUM(cost) Total 
FROM ProjectPaths a JOIN Projects p 
   ON a.VertexId=p.VertexId 
WHERE
   Path LIKE (SELECT Path FROM ProjectPaths WHERE VertexId=1)+'%'

Total       
----------- 
231

The query in this example summarizes the costs of all projects and tasks under VertexId 1. As you can see, the result of 231 was obtained without the need for recursion and without the need for a temporary stack table. It was obtained with only a four-line SELECT statement, as opposed to the 48 lines of procedural code required for the earlier version of the aggregation solution.

You can also make use of the ProjectPaths table to list the project in a hierarchical manner:

SELECT Space(Depth*2)+Name Project
FROM ProjectPaths a JOIN Projects p 
ON a.VertexId=p.VertexId 
WHERE
   Path LIKE (SELECT Path FROM ProjectPaths WHERE VertexId=1)+'%'
ORDER BY a.Path

Project
--------------------------------- 
  New SW
    Development
      UI Implementation
      Coding
      Initial testing
    Beta testing
      Final adjustments
    Production testing
    Specifications
      Interviews
      Drafts
      Consolidations
      Final document
        Presentation
    Prototype
      Calculations
        Correctness Testing
      Database
      UI Design

Again, the ProjectPaths table enabled the desired result to be generated using only a short SQL query, as opposed to the rather long procedure that would otherwise be required. Please note that the order in which tasks are listed in the result might not be the same as you get with the TraverseProjectsRecursive procedure. However, the hierarchical structure of the information is still preserved.

Discussion

The first query joins the ProjectPaths and Projects tables. This is a one-to-one join, since both tables have an equal number of rows. The secret to the query lies in the second part of the WHERE clause:

Path LIKE (SELECT Path FROM ProjectPaths WHERE VertexId=1)+'%'

The WHERE clause gathers all vertices for which the beginning of the path string is equal to the root vertex (in our case, it is .1.). The summation of all costs is then just a simple matter of applying the SUM function to those rows.

Multilevel operations can now be performed efficiently using the ProjectPaths table. Once you know the path to the parent node, you know the paths to all of that node’s children. Had you wished to summarize the cost for the Specifications subproject, you could modify the second part of the WHERE clause as follows:

Path LIKE (SELECT Path FROM ProjectPaths WHERE VertexId=2)+'%'

When writing queries using a table like the ProjectPaths table, you need to remember two rules. First, if you wish to perform an operation on a parent vertex together with all its children, you should use the % pattern match operator at the end of the search string in your LIKE predicate. Second, if you wish to exclude the parent from the result set, you should use the _% pattern. The additional underscore in the pattern match string requires that a character be present. Thus, if the parent’s path is .1., it will not match a pattern of .1._%. Any children, however, will have a character following the second dot, so they will match the pattern.

The Depth column in the ProjectPaths table allows you to zero in easily on vertices of a given depth. For example, the following query will return a list of all level two projects in the Projects table:

SELECT SUM(cost) Total 
FROM ProjectPaths a JOIN Projects p 
   ON a.VertexId=p.VertexId
WHERE a.Depth=2

The Depth column can also be used to compute indention, as you saw earlier in this recipe’s second query:

SELECT Space(Depth*2)+Name Project

In this case, the Depth column value was multiplied by two to determine the correct number of leading spaces for each line of output.

Get Transact-SQL Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.