
Page 1: Drsql.org How to Implement a Hierarchy in SQL Server Louis Davidson (drsql.org) drsql@hotmail.com

drsql.org

How to Implement a Hierarchy in SQL Server

Louis Davidson (drsql.org) drsql@hotmail.com


Who am I?

• Been in IT for over 19 years
• Microsoft MVP for 10 years
• Corporate Data Architect
• Written five books on database design
  – OK, so they were all versions of the same book. They at least had slightly different titles each time


Hierarchies


Hierarchies

• Trees – Single-Parent Hierarchies

• Graphs – Multi-Parent Hierarchies

  – Note: Graphs can be complex to deal with as a whole, but often you can deal with them as a set of trees

[Figure: parts graph with the nodes Screw, Piece of Wood, Tape, Wood with Tape, and Screw and Tape]


Cycles in Hierarchies


• “I’m my own grandpa” syndrome
• Must be understood, or it can cause an infinite loop in processing

• Generally disallowed in trees
• May be supported in graphs, particularly for establishing relationships

[Figure: Grandparent, Parent, and Child nodes linked in a cycle]
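In an adjacency list, a cycle is usually created by reparenting, not by inserting. The foreign key alone cannot stop it. The following is a minimal sketch of one guard, written with Python's built-in sqlite3 module rather than T-SQL, against the adjacency list table used later in this deck, with column types simplified. Before moving a node, it walks the node's subtree with a recursive CTE and refuses the move if the proposed parent is the node itself or one of its descendants. The function names and sample rows are hypothetical. SQL Server spells the CTE as WITH rather than WITH RECURSIVE.

```python
import sqlite3

def make_db():
    """Build a tiny adjacency list: Grandparent -> Parent -> Child (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE CompanyHierarchy (
        Organization TEXT NOT NULL PRIMARY KEY,
        ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
        Name TEXT NOT NULL)""")
    conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
        ("Grandparent", None, "Grandparent"),
        ("Parent", "Grandparent", "Parent"),
        ("Child", "Parent", "Child"),
    ])
    return conn

def would_create_cycle(conn, node, new_parent):
    """True if new_parent is node itself or anywhere in node's subtree.

    Assumes the existing data is already cycle-free, so the walk terminates.
    """
    (hits,) = conn.execute("""
        WITH RECURSIVE Subtree(Organization) AS (
            SELECT :node
            UNION ALL
            SELECT c.Organization
            FROM CompanyHierarchy AS c
            JOIN Subtree AS s ON c.ParentOrganization = s.Organization
        )
        SELECT COUNT(*) FROM Subtree WHERE Organization = :new_parent
        """, {"node": node, "new_parent": new_parent}).fetchone()
    return hits > 0

def reparent(conn, node, new_parent):
    """Move node under new_parent, refusing any move that would form a cycle."""
    if would_create_cycle(conn, node, new_parent):
        raise ValueError(f"moving {node!r} under {new_parent!r} would create a cycle")
    conn.execute(
        "UPDATE CompanyHierarchy SET ParentOrganization = ? WHERE Organization = ?",
        (new_parent, node))
```

In this sketch, moving Grandparent under Child is rejected, while moving Child up under Grandparent is allowed.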


Hierarchy Uses

• Trees
  – Species
  – Jurisdictions
  – “Simple” organizational charts (or at least the base manager-employee part of the organization)
  – Directory folders

• Graphs
  – Bill of materials
  – Complex organization charts (all those dotted lines!)
  – Genealogies
    • Biological (typically with the cardinality of parents limited to 2)
    • Family tree (sky is the limit)
  – Social networking relationships
    • Example: Bob is connected to Sue, Sue is connected to Fred, Fred is connected to Bob


Implementation of a Hierarchy

• “There is more than one way to shave a dog”
  – None of which are pleasant for the dog or the shaver
  – And the doctor who orders it only asks for a bald dog

• Hierarchies are not at all natural to manipulate/query using relational code
  – And the natural, recursive processing of a node at a time is horribly difficult and slow in relational code
  – So, multiple methods of processing them have arisen through the years

• The topic (much like the topic of how cruel it is to shave a dog) inspires religious-like arguments

• I find all of the implementation possibilities fascinating, so I set out to do an overview of them all…


Working with Trees - Background

• Node recursion

• Relational Recursion
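The two bullets above can be contrasted against an adjacency list. The following is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server; the sample organizations and function names are hypothetical. Node recursion issues one query per node and does the recursion in client code. Relational recursion hands the whole walk to the database as a single recursive common table expression, spelled WITH RECURSIVE in SQLite and plain WITH in SQL Server.

```python
import sqlite3

# Hypothetical sample tree in the adjacency list shape (types simplified for SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization))""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?)", [
    ("Company", None), ("HQ", "Company"), ("Warehouse", "Company"),
    ("BranchA", "HQ"), ("BranchB", "HQ"),
])

def descendants_node_recursion(conn, org):
    """Node recursion: one query per node, with the recursion in client code."""
    result = []
    children = conn.execute(
        "SELECT Organization FROM CompanyHierarchy WHERE ParentOrganization = ?",
        (org,)).fetchall()
    for (child,) in children:
        result.append(child)
        result.extend(descendants_node_recursion(conn, child))
    return result

def descendants_relational_recursion(conn, org):
    """Relational recursion: a single recursive CTE; the engine does the iterating."""
    rows = conn.execute("""
        WITH RECURSIVE Descendants(Organization) AS (
            SELECT Organization FROM CompanyHierarchy WHERE ParentOrganization = ?
            UNION ALL
            SELECT c.Organization
            FROM CompanyHierarchy AS c
            JOIN Descendants AS d ON c.ParentOrganization = d.Organization
        )
        SELECT Organization FROM Descendants""", (org,)).fetchall()
    return [name for (name,) in rows]
```

Both functions return the same set of rows. The difference is that the node-recursive version makes one round trip for every node in the subtree.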


Tree Processing Algorithms

• There are several methods for processing trees in SQL
• We will look at:
  – Fixed Levels
  – Adjacency List
  – HierarchyId
  – Path Technique
  – Nested Sets
  – Kimball Helper Table

• Without giving away too much, pretty much all of the methods have some use…


Coding for trees

• Manipulation:
  – Creating a new node
  – Moving/reparenting a node
  – Deleting a node (without children)
  – Note: No tree algorithm allows for “simple” SQL solutions to all of these problems

• Usage:
  – Getting the children of a node
  – Getting the parent of a node
  – Aggregating along the tree

• We will have demos of all of these operations, or at least they will be available


Reparenting Example

• Starting with:
  [Figure: the original tree]

• Perhaps ending with:
  [Figure: the same tree after one node is moved to a new parent, dragging all of its child nodes along with it]


Implementing a tree – Fixed Levels

CREATE TABLE CompanyHierarchy
(
    Company varchar(100) NOT NULL,
    Headquarters varchar(100) NOT NULL,
    Branch varchar(100) NOT NULL,
    PRIMARY KEY (Company, Headquarters, Branch)
)

Very limited, but very fast and easy to work with. I will not demo this structure today because its use is both extremely obvious and limited.


Implementing a tree – Adjacency List

• Every row includes the key value of its parent
• Parent-less rows have a NULL parent value
• Code is the most complex to write (though not as inefficient as it might seem)

CREATE TABLE CompanyHierarchy
(
    Organization varchar(100) NOT NULL PRIMARY KEY,
    ParentOrganization varchar(100) NULL
        REFERENCES CompanyHierarchy (Organization),
    Name varchar(100) NOT NULL
)


Adjacency List – Adding a Node

[Figure: a new node added beneath an existing parent]

Simply set the parent, and you are done!
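Both adjacency list manipulations fit in one statement each. The following is a minimal sketch with Python's built-in sqlite3 module, using the deck's table shape with SQLite types; the city rows and the `children` helper are hypothetical. Adding a node is one INSERT that names its parent. Reparenting is one UPDATE, because the moved node's children still reference it and come along automatically.

```python
import sqlite3

# Adjacency list table from the deck, types simplified for SQLite; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
    Name TEXT NOT NULL)""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    ("Company", None, "Company"),
    ("Nashville", "Company", "Nashville Branch"),
    ("Memphis", "Company", "Memphis Branch"),
    ("Nashville-East", "Nashville", "Nashville East Office"),
])

# Adding a node: insert it with its parent's key, and that is all.
conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
             ("Nashville-West", "Nashville", "Nashville West Office"))

# Reparenting: one UPDATE. Nashville's children still reference Nashville,
# so the whole subtree moves under Memphis along with it.
conn.execute("UPDATE CompanyHierarchy SET ParentOrganization = ? WHERE Organization = ?",
             ("Memphis", "Nashville"))

def children(conn, org):
    """Direct children of org, sorted by key."""
    return [name for (name,) in conn.execute(
        "SELECT Organization FROM CompanyHierarchy "
        "WHERE ParentOrganization = ? ORDER BY Organization", (org,))]
```

This UPDATE does not check for cycles. Moving a node under one of its own descendants would need a guard like the subtree check discussed under cycles.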


Implementing a tree – Path Method

Every row includes a representation of its path from the root (the parent's path plus the row's own key)

Processing makes use of LIKE and string manipulation (I have seen a case that used fixed-length binary values)

There is a limit on path size for string manipulation/indexing

CREATE TABLE CompanyHierarchy
(
    OrganizationId int NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    Path varchar(900)
)

A 900-byte path fits within the index key size limit, which allows for indexed manipulations
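Querying with the path method is mostly prefix matching. The following is a minimal sketch with Python's built-in sqlite3 module, assuming paths made of ids separated by '/' with a trailing '/' (the deck does not show its separator). The sample rows and function names are hypothetical. Descendants are rows whose path starts with the node's path. Children are descendants whose path has exactly one more separator.

```python
import sqlite3

# Path method table from the deck (types simplified). The '/' separator and
# trailing '/' are assumptions for this sketch; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    OrganizationId INTEGER NOT NULL PRIMARY KEY,
    Name TEXT NOT NULL,
    Path TEXT)""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    (1, "Company", "/1/"),
    (2, "HQ", "/1/2/"),
    (3, "Warehouse", "/1/3/"),
    (4, "Branch A", "/1/2/4/"),
    (5, "Branch B", "/1/2/5/"),
])

def descendants(conn, org_id):
    """Every row whose path starts with org_id's path, excluding org_id itself."""
    return [i for (i,) in conn.execute("""
        SELECT c.OrganizationId
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c
          ON c.Path LIKE p.Path || '%' AND c.OrganizationId <> p.OrganizationId
        WHERE p.OrganizationId = ?
        ORDER BY c.Path""", (org_id,))]

def children(conn, org_id):
    """Descendants exactly one level deeper (one more '/' in the path)."""
    return [i for (i,) in conn.execute("""
        SELECT c.OrganizationId
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c
          ON c.Path LIKE p.Path || '%'
         AND length(c.Path) - length(replace(c.Path, '/', ''))
           = length(p.Path) - length(replace(p.Path, '/', '')) + 1
        WHERE p.OrganizationId = ?
        ORDER BY c.Path""", (org_id,))]
```

The trailing separator matters. Without it, a prefix such as '/1' would also match '/10'.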


Path Method – Adding a Node

[Figure: a new node added beneath an existing parent; the new node gets New Id = 9]

The new node's path is the path from the parent, plus the new Id
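Adding a node and moving a subtree can be sketched as follows, using Python's built-in sqlite3 module with the same '/'-separated path convention (an assumption). The data, the `path_of`, `add_node`, and `move_subtree` helpers, and the reuse of 9 as the new Id are illustrative. Adding a node touches only the new row. Moving a node must rewrite the path of every descendant, because each one embeds the old prefix.

```python
import sqlite3

# Hypothetical path-method data; '/' separators with a trailing '/' are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    OrganizationId INTEGER NOT NULL PRIMARY KEY,
    Name TEXT NOT NULL,
    Path TEXT)""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    (1, "Company", "/1/"), (2, "HQ", "/1/2/"),
    (3, "Warehouse", "/1/3/"), (4, "Branch A", "/1/2/4/"),
])

def path_of(conn, org_id):
    (path,) = conn.execute(
        "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?", (org_id,)).fetchone()
    return path

def add_node(conn, new_id, name, parent_id):
    """New path = the path from the parent, plus the new Id."""
    conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
                 (new_id, name, f"{path_of(conn, parent_id)}{new_id}/"))

def move_subtree(conn, org_id, new_parent_id):
    """Reparent org_id by swapping the old path prefix for the new one in every descendant."""
    old_prefix = path_of(conn, org_id)
    new_parent_path = path_of(conn, new_parent_id)
    if new_parent_path.startswith(old_prefix):
        raise ValueError("new parent is inside the subtree being moved")
    new_prefix = f"{new_parent_path}{org_id}/"
    # substr is 1-based, so :cut points just past the old prefix.
    conn.execute("""
        UPDATE CompanyHierarchy
        SET Path = :new_prefix || substr(Path, :cut)
        WHERE Path LIKE :old_prefix || '%'""",
        {"new_prefix": new_prefix, "cut": len(old_prefix) + 1, "old_prefix": old_prefix})
```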


Implementing a tree – HierarchyId

A somewhat unnatural method for the typical SQL programmer

Similar to the Path Method, and has some of the same limitations when moving nodes around

The node path does not use data natural to the table, but rather positional location

CREATE TABLE CompanyHierarchy
(
    OrganizationId int NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    OrgNode hierarchyid NOT NULL
)


Implementing a tree – Nested Sets

• Query processing is done using range queries
• The structure is quite slow to maintain because it is fragile
• Can produce excellent performance for queries

CREATE TABLE CompanyHierarchy
(
    Organization varchar(100) NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    [Left] int NOT NULL,   -- LEFT and RIGHT are reserved words in T-SQL, so they must be delimited
    [Right] int NOT NULL
)
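The range queries mentioned above can be sketched with Python's built-in sqlite3 module. This version renames Left/Right to the hypothetical Lft/Rgt to avoid the reserved words, and uses a small made-up tree. A node's descendants are exactly the rows whose Left/Right range nests inside its own. Its ancestors are the rows whose range encloses it.

```python
import sqlite3

# Nested sets table with Left/Right renamed Lft/Rgt (hypothetical names).
# Sample tree: Company(1,10) > HQ(2,7) > Branch A(3,4), Branch B(5,6);
#              Company > Warehouse(8,9)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    Name TEXT NOT NULL,
    Lft INTEGER NOT NULL,
    Rgt INTEGER NOT NULL)""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)", [
    ("Company", "Company", 1, 10),
    ("HQ", "Headquarters", 2, 7),
    ("BranchA", "Branch A", 3, 4),
    ("BranchB", "Branch B", 5, 6),
    ("Warehouse", "Warehouse", 8, 9),
])

def descendants(conn, org):
    """Descendants: rows whose range nests strictly inside org's range."""
    return [o for (o,) in conn.execute("""
        SELECT c.Organization
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c ON c.Lft > p.Lft AND c.Rgt < p.Rgt
        WHERE p.Organization = ?
        ORDER BY c.Lft""", (org,))]

def ancestors(conn, org):
    """Ancestors: rows whose range strictly encloses org's range."""
    return [o for (o,) in conn.execute("""
        SELECT p.Organization
        FROM CompanyHierarchy AS c
        JOIN CompanyHierarchy AS p ON p.Lft < c.Lft AND p.Rgt > c.Rgt
        WHERE c.Organization = ?
        ORDER BY p.Lft""", (org,))]
```

Neither query recurses, which is why reads are fast. The cost moves to maintenance, where inserts must renumber the ranges.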


Nested Sets – Adding a Node

[Figure: a new node to be added to the tree]

• Update the Right values
• And the one Left value to the right of the new node
• Renumber, leaving a gap for the child
• The new node
• Set the new node's Left/Right values
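The steps above can be sketched for the common case of adding a node as the last child of an existing parent. This uses Python's built-in sqlite3 module, the hypothetical Lft/Rgt column names, and a made-up tree. Every Right value at or beyond the parent's Right shifts by 2 (this includes the parent). Every Left value beyond that point shifts by 2. The new node then takes the two-slot gap.

```python
import sqlite3

# Hypothetical nested sets sample; Lft/Rgt stand in for the deck's Left/Right.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    Name TEXT NOT NULL,
    Lft INTEGER NOT NULL,
    Rgt INTEGER NOT NULL)""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)", [
    ("Company", "Company", 1, 10),
    ("HQ", "Headquarters", 2, 7),
    ("BranchA", "Branch A", 3, 4),
    ("BranchB", "Branch B", 5, 6),
    ("Warehouse", "Warehouse", 8, 9),
])

def add_child(conn, org, name, parent):
    """Add org as the last (rightmost) child of parent."""
    with conn:  # commit the renumbering and the insert together, or roll back on error
        (gap,) = conn.execute(
            "SELECT Rgt FROM CompanyHierarchy WHERE Organization = ?", (parent,)).fetchone()
        # Update the Right values at or beyond the parent's Right (including the parent).
        conn.execute("UPDATE CompanyHierarchy SET Rgt = Rgt + 2 WHERE Rgt >= ?", (gap,))
        # Update the Left values of nodes to the right of the new node.
        conn.execute("UPDATE CompanyHierarchy SET Lft = Lft + 2 WHERE Lft > ?", (gap,))
        # The renumbering left a two-slot gap; the new node takes it.
        conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
                     (org, name, gap, gap + 1))
```

Each insert can touch roughly every row to its right in the tree. That fits the deck's point that the structure is slow to maintain.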


Implementing a tree – Kimball Helper

• Developed initially for data warehousing, since there the data is modified all at once, at a fixed cost

• Basically explodes the hierarchy into a table that turns all hierarchy manipulations into relational queries

• Maintenance can be slightly costly, but using the data is extremely fast


Implementing a tree – Kimball Helper

• For the nodes highlighted in yellow (1, 2, 4, and 5), the hierarchy expands to the table shown:

ParentId  ChildId  Distance  ParentRootNode  ChildLeafNode
1         1        0         1               0
1         2        1         1               0
1         4        2         1               1
1         5        2         1               1
2         2        0         0               0
2         4        1         0               1
2         5        1         0               1
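The deck does not show how the helper table is loaded. The following is a minimal sketch with Python's built-in sqlite3 module that builds the Kimball-style helper table from an adjacency list. It assumes a hypothetical Node(NodeId, ParentId) source table, plus a node 3 as a sibling of node 2. A recursive CTE emits every (ancestor, descendant, distance) pair, including each node paired with itself at distance 0. The root and leaf flags are then computed per row. Filtering to nodes 1, 2, 4, and 5 reproduces the table above.

```python
import sqlite3

# Hypothetical adjacency-list source: 1 is the root; 2 and 3 are its children;
# 4 and 5 are children of 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (NodeId INTEGER PRIMARY KEY, ParentId INTEGER REFERENCES Node (NodeId));
INSERT INTO Node VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2);

CREATE TABLE HierarchyHelper (
    ParentId INTEGER NOT NULL,
    ChildId INTEGER NOT NULL,
    Distance INTEGER NOT NULL,
    ParentRootNode INTEGER NOT NULL,
    ChildLeafNode INTEGER NOT NULL,
    PRIMARY KEY (ParentId, ChildId));

WITH RECURSIVE Expanded(ParentId, ChildId, Distance) AS (
    SELECT NodeId, NodeId, 0 FROM Node              -- each node paired with itself
    UNION ALL
    SELECT e.ParentId, n.NodeId, e.Distance + 1      -- extend every pair one level down
    FROM Expanded AS e
    JOIN Node AS n ON n.ParentId = e.ChildId
)
INSERT INTO HierarchyHelper
SELECT e.ParentId, e.ChildId, e.Distance,
       CASE WHEN p.ParentId IS NULL THEN 1 ELSE 0 END,             -- ParentRootNode
       CASE WHEN EXISTS (SELECT 1 FROM Node AS k WHERE k.ParentId = e.ChildId)
            THEN 0 ELSE 1 END                                       -- ChildLeafNode
FROM Expanded AS e
JOIN Node AS p ON p.NodeId = e.ParentId;
""")

def descendants(conn, node_id):
    """All levels below node_id with one indexed lookup; no recursion at query time."""
    return [c for (c,) in conn.execute(
        "SELECT ChildId FROM HierarchyHelper WHERE ParentId = ? AND Distance > 0 "
        "ORDER BY ChildId", (node_id,))]
```

Aggregating along the tree then becomes an ordinary join from HierarchyHelper.ChildId to the fact rows, grouped by ParentId.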


Performance Examples and Limitations

• The following tests were run multiple times, and the results were taken from one such run.
• Clearly the results are not scientific, and were produced with random data.
• However, they very much match my expectations from my research.
• Load times were captured loading one row at a time.

• Test machine (this laptop I am using tonight):
  – Lenovo Yoga Pro 2, Haswell ULT i7 (4th Gen Intel Mobile Processor), 2.4 GHz dual core (hyperthreaded), 8 GB RAM, 256 GB SSD

• Note: All load times include the time to load 5 transactions per node


Performance Example Explanation

• For each performance test (I will show the code later), I ran three query sets on each data set:

1. Load the tree (until my computer couldn’t do it in a reasonable number of hours)
2. Fetch all children from the root node
3. Aggregate data for all children at all levels


Performance Comparisons

[Chart: 3400 nodes, 5 levels; load time (seconds) per method (Adjacency List, HierarchyId, Path Method, Nested Sets, Kimball Helper)]


Performance Comparisons

[Chart: 3400 nodes, 5 levels; fetch children of root node, total time (ms) by method]


Performance Comparisons

[Chart: 3400 nodes, 5 levels; aggregate all children, total time and CPU time (ms) by method]


Performance Comparisons

[Chart: 55301 nodes, 5 levels; load time (minutes) by method]


Performance Comparisons

[Chart: 55301 nodes, 5 levels; fetch children of root node, total time (ms) by method]


Performance Comparisons

[Chart: 55301 nodes, 5 levels; aggregate all children, total time and CPU time (ms) by method]


Performance Comparisons

[Chart: 42101 nodes, 15 levels; load time (minutes) by method]


Performance Comparisons

[Chart: 42101 nodes, 15 levels; fetch children of root node, total time (ms) by method]


Performance Comparisons

[Chart: 42101 nodes, 15 levels; aggregate all children, total time and CPU time (ms) by method]


Performance Comparisons

[Chart: 298001 nodes, 25 levels; aggregate all children, total time and CPU time (ms) by method]


Performance Comparisons

[Chart: 1274001 nodes, 50 levels; aggregate all children, total time and CPU time (ms) by method]


Method Comparison

[Chart: aggregate all children (ms) across data sets, nodes (levels): 3400 (5), 55301 (5), 42101 (15), 298001 (25), 1274001 (50); series: Adjacency List, HierarchyId, Path Method, Nested Sets, Kimball]


Demo Code

• Code for all of the examples is available for download. I will demo hierarchies and graphs.


Method Applicability (Method -> Applicability)

Methods, in column order: Adjacency List | HierarchyId | Path Method | Nested Sets | Kimball Helper

• General Purpose Hierarchies: *** *** *
• VERY Large Hierarchy Queries: * * ** *** ***
• Offline Reporting: * * ** ** (cost of maintaining limits use) ***
• OLTP Use: *** ** ** ** (perhaps slower to load nodes)
• Highly Concurrent Modification: *** ** * *
• Highly Concurrent Queries: * ** *** *** ***
• Unlimited Hierarchy Size: ** ** * (width unlimited; effective depth limited by the 900-byte index limit) *** ***


Future Improvements

• Use SQL Server 2014 In-Memory OLTP to help with locking and brute-force operations

• Adjust Nested Sets to use fractional numbers to reduce load time costs
• Load an order of magnitude more data
• Try these examples on a “real” computer!


Graphs

• Generally implemented in the same manner as an adjacency list
  – Can be processed in the same manner as an adjacency list
  – The primary difference is that a child can have more than one parent node
  – Cycles are generally acceptable

• The graph structure will always be external to the data structure
• Graphs are even more natural data structures than trees
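The social networking example from earlier (Bob, Sue, Fred) is a graph with a cycle. The following is a minimal sketch, using Python's built-in sqlite3 module, of processing it "in the same manner as an adjacency list" while surviving the cycle. The Connection table, its columns, the extra person Ann (who has two "parents"), and the function name are all hypothetical. The recursive CTE carries the path walked so far and stops extending a path when the next person is already on it. `instr` is SQLite; SQL Server would use CHARINDEX.

```python
import sqlite3

# Hypothetical edge table for a directed graph of connections.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Connection (
    FromPerson TEXT NOT NULL,
    ToPerson TEXT NOT NULL,
    PRIMARY KEY (FromPerson, ToPerson))""")
conn.executemany("INSERT INTO Connection VALUES (?, ?)", [
    ("Bob", "Sue"), ("Sue", "Fred"), ("Fred", "Bob"),  # the cycle from the example
    ("Sue", "Ann"), ("Fred", "Ann"),                    # Ann has more than one parent
])

def reachable(conn, start):
    """Everyone reachable from start, without looping forever around cycles."""
    rows = conn.execute("""
        WITH RECURSIVE Walk(Person, Path) AS (
            SELECT :start, '/' || :start || '/'
            UNION ALL
            SELECT c.ToPerson, w.Path || c.ToPerson || '/'
            FROM Walk AS w
            JOIN Connection AS c ON c.FromPerson = w.Person
            -- do not step onto anyone already on this path
            WHERE instr(w.Path, '/' || c.ToPerson || '/') = 0
        )
        SELECT DISTINCT Person FROM Walk WHERE Person <> :start ORDER BY Person
        """, {"start": start}).fetchall()
    return [person for (person,) in rows]
```

Because this enumerates every cycle-free path, it can become expensive on dense graphs. It also assumes names never contain the '/' separator.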


Graphs are Everywhere

• Almost any many-to-many relationship can be treated as a graph


[Figure: Movie related to Actor through ActingCast, and to Director through MovieDirector]
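As an illustration of a many-to-many relationship read as a graph, the following sketch uses Python's built-in sqlite3 module to derive "co-star" edges from the ActingCast table in the figure. Two actors are connected if they share at least one movie. The column names, sample rows, and function name are hypothetical; the figure shows only the table names.

```python
import sqlite3

# Hypothetical columns for the Movie / Actor / ActingCast tables named in the figure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (MovieId INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Actor (ActorId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ActingCast (
    MovieId INTEGER NOT NULL REFERENCES Movie (MovieId),
    ActorId INTEGER NOT NULL REFERENCES Actor (ActorId),
    PRIMARY KEY (MovieId, ActorId));
INSERT INTO Movie VALUES (1, 'First Movie'), (2, 'Second Movie');
INSERT INTO Actor VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO ActingCast VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

def costars(conn, name):
    """Graph neighbors derived from the many-to-many: actors sharing a movie with name."""
    return [n for (n,) in conn.execute("""
        SELECT DISTINCT other.Name
        FROM Actor AS me
        JOIN ActingCast AS mine ON mine.ActorId = me.ActorId
        JOIN ActingCast AS theirs
          ON theirs.MovieId = mine.MovieId AND theirs.ActorId <> mine.ActorId
        JOIN Actor AS other ON other.ActorId = theirs.ActorId
        WHERE me.Name = ?
        ORDER BY other.Name""", (name,))]
```

The self-join on ActingCast produces an edge list. That edge list could be fed to the same cycle-aware traversal used for any other graph.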


Contact info

• Louis Davidson – drsql@hotmail.com
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql

• SQL Blog – http://sqlblog.com/blogs/louis_davidson

• Simple Talk Blog – What Counts for a DBA – http://www.simple-talk.com/community/blogs/drsql/default.aspx


Thank you. That’s all folks!
