drsql.org
Who am I?
• Been in IT for over 19 years
• Microsoft MVP for 10 years
• Corporate Data Architect
• Written five books on database design
– OK, so they were all versions of the same book. They at least had slightly different titles each time
Hierarchies
• Trees - Single Parent Hierarchies
• Graphs – Multi Parent Hierarchies
– Note: Graphs can be complex to deal with as a whole, but often you can deal with them as a set of trees
[Figure: example graph with nodes Screw, Piece of Wood, Tape, Wood with Tape, and Screw and Tape]
Cycles in Hierarchies
• “I’m my own grandpa” syndrome
• Must be understood or can cause an infinite loop in processing
• Generally disallowed in trees
• May be supported in graphs, particularly for establishing relationships
[Figure: cycle among Grandparent, Parent, and Child nodes]
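To make the infinite-loop risk concrete, here is a minimal Python sketch (Python is used for the added examples in this deck; the `has_cycle` helper and the sample data are hypothetical, not taken from the demo code). It walks up the parent links from a node and reports a cycle as soon as it revisits one.

```python
def has_cycle(parent_of, start):
    """Return True if following parent links from `start` revisits a node."""
    seen = set()
    node = start
    while node is not None:
        if node in seen:
            return True          # we have been here before: a cycle
        seen.add(node)
        node = parent_of.get(node)
    return False

# A well-formed tree: Child -> Parent -> Grandparent -> (no parent)
tree = {"Child": "Parent", "Parent": "Grandparent", "Grandparent": None}

# "I'm my own grandpa": Grandparent's parent is Child, closing the loop
cyclic = {"Child": "Parent", "Parent": "Grandparent", "Grandparent": "Child"}

print(has_cycle(tree, "Child"))    # False
print(has_cycle(cyclic, "Child"))  # True
```

Without the `seen` set, the loop over the cyclic data would never terminate, which is the processing hazard the slide warns about.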
Hierarchy Uses
• Trees
– Species
– Jurisdictions
– “Simple” organizational charts (or at least the base manager-employee part of the organization)
– Directory folders
• Graphs
– Bill of materials
– Complex organization chart (all those dotted lines!)
– Genealogies
  • Biological (typically with the cardinality of parents limited to 2)
  • Family tree (sky is the limit)
– Social networking relationships
  • Example: Bob is connected to Sue, Sue is connected to Fred, Fred is connected to Bob
Implementation of a Hierarchy
• “There is more than one way to shave a dog”
– None of which are pleasant for the dog or the shaver
– And the doctor who orders it only asks for a bald dog
• Hierarchies are not at all natural to manipulate/query using relational code
– And the natural, recursive processing of a node at a time is horribly difficult and slow in relational code
– So, multiple methods of processing them have arisen through the years
• The topic (much like the topic of how cruel it is to shave a dog) inspires religious-like arguments
• I find all of the implementation possibilities fascinating, so I set out to do an overview of them all…
Working with Trees - Background
• Node recursion
• Relational Recursion
Tree Processing Algorithms
• There are several methods for processing trees in SQL
• We will look at
– Fixed Levels
– Adjacency List
– HierarchyId
– Path Technique
– Nested Sets
– Kimball Helper Table
• Without giving away too much, pretty much all of the methods have some use…
Coding for trees
• Manipulation:
– Creating a new node
– Moving/Reparenting a node
– Deleting a node (without children)
– Note: No tree algorithms allow for “simple” SQL solutions to all of these problems
• Usage:
– Getting the children of a node
– Getting the parent of a node
– Aggregating along the tree
• We will have demos of all of these operations…available at least
Reparenting Example
• Starting with: [Figure: tree before reparenting]
• Perhaps ending with: [Figure: tree after reparenting, with the moved node dragging all of its child nodes along with it]
Implementing a tree – Fixed Levels
CREATE TABLE CompanyHierarchy
(
    Company varchar(100) NOT NULL,
    Headquarters varchar(100) NOT NULL,
    Branch varchar(100) NOT NULL,
    PRIMARY KEY (Company, Headquarters, Branch)
)
Very limited, but very fast and easy to work with. I will not demo this structure today because its use is both extremely obvious and limited.
Implementing a tree – Adjacency List
• Every row includes the key value of the parent in the row
• Parent-less rows have a NULL parent value
• Code is the most complex to write (though not as inefficient as it might seem)
CREATE TABLE CompanyHierarchy
(
    Organization varchar(100) NOT NULL PRIMARY KEY,
    ParentOrganization varchar(100) NULL REFERENCES CompanyHierarchy (Organization),
    Name varchar(100) NOT NULL
)
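As a rough illustration of why adjacency list code is the most involved to write, the sketch below fetches all descendants with a recursive common table expression. It is an assumption-laden stand-in for the demo code: it runs on SQLite through Python's `sqlite3` module rather than on SQL Server (which writes the same kind of query without the `RECURSIVE` keyword), and the sample organizations and the `descendants` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization       TEXT NOT NULL PRIMARY KEY,
        ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
        Name               TEXT NOT NULL
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
    [
        ("HQ", None, "Headquarters"),        # root: NULL parent
        ("East", "HQ", "East Region"),
        ("West", "HQ", "West Region"),
        ("Boston", "East", "Boston Branch"),
    ],
)

def descendants(conn, root):
    """Return (Organization, Depth) for every node below `root`."""
    return conn.execute("""
        WITH RECURSIVE Tree (Organization, Depth) AS (
            SELECT Organization, 0
            FROM CompanyHierarchy
            WHERE Organization = ?
            UNION ALL
            SELECT c.Organization, t.Depth + 1
            FROM CompanyHierarchy AS c
            JOIN Tree AS t ON c.ParentOrganization = t.Organization
        )
        SELECT Organization, Depth
        FROM Tree
        WHERE Depth > 0
        ORDER BY Depth, Organization""", (root,)).fetchall()

print(descendants(conn, "HQ"))    # [('East', 1), ('West', 1), ('Boston', 2)]

# Reparenting is a single-row update; the subtree under East follows it.
conn.execute(
    "UPDATE CompanyHierarchy SET ParentOrganization = 'West' "
    "WHERE Organization = 'East'")
print(descendants(conn, "West"))  # [('East', 1), ('Boston', 2)]
```

The trade-off shown here is the one the slide describes: modifications touch one row, while reads need recursion.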
Adjacency List – Adding a Node
[Figure: the new node to be added to the tree]
Simply set the parent and done!
Implementing a tree – Path Method
• Every row includes a representation of the path to its parent
• Processing makes use of LIKE and string processing (I have seen a case that used fixed-length binary values)
• Limitation on path size for string manipulation/indexing
CREATE TABLE CompanyHierarchy
(
    OrganizationId int NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    Path varchar(900)
)
900 bytes allows for indexed manipulations
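The following sketch shows one way the path technique can work, under several assumptions that are not in the slides: the path is stored as `/`-delimited ids with a trailing delimiter, each row's path includes its own id, and SQLite (via Python's `sqlite3`) stands in for SQL Server. The `add_node` and `descendants` helpers are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        OrganizationId INTEGER NOT NULL PRIMARY KEY,
        Name           TEXT NOT NULL,
        Path           TEXT
    )""")

def add_node(conn, org_id, name, parent_id=None):
    """Store the node with Path = path from the parent, plus the new id."""
    parent_path = "/"
    if parent_id is not None:
        (parent_path,) = conn.execute(
            "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?",
            (parent_id,)).fetchone()
    conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
                 (org_id, name, f"{parent_path}{org_id}/"))

def descendants(conn, org_id):
    """Return ids of all nodes below `org_id` using a LIKE prefix match."""
    (path,) = conn.execute(
        "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?",
        (org_id,)).fetchone()
    rows = conn.execute(
        "SELECT OrganizationId FROM CompanyHierarchy "
        "WHERE Path LIKE ? AND OrganizationId <> ? "
        "ORDER BY OrganizationId",
        (path + "%", org_id)).fetchall()
    return [row[0] for row in rows]

add_node(conn, 1, "Company")
add_node(conn, 2, "East", parent_id=1)
add_node(conn, 10, "West", parent_id=1)
add_node(conn, 9, "Boston", parent_id=2)  # path from the parent "/1/2/" plus "9/"

print(descendants(conn, 1))  # [2, 9, 10]
print(descendants(conn, 2))  # [9]
```

The trailing delimiter matters: without it, the prefix for node 1 would also match node 10's own path.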
Path Method – Adding a Node
[Figure sequence: a new node is added and given New Id = 9; its path is the path from the parent plus the new Id]
Implementing a tree – HierarchyId
• A somewhat unnatural method for the typical SQL programmer
• Similar to the Path Method, and has some of the same limitations when moving nodes around
• Node path does not use data natural to the table, but rather positional location
CREATE TABLE CompanyHierarchy
(
    OrganizationId int NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    OrgNode hierarchyId NOT NULL
)
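To illustrate what "positional" means here, the Python sketch below imitates only the string form of a hierarchyid value (such as `/1/2/`), where each number is a sibling position rather than a key from the table. It is not the SQL Server implementation: the real type is binary with its own methods, and the `next_child_path` and `is_descendant` helpers are hypothetical. The sketch also assumes whole-number positions only.

```python
def next_child_path(parent_path, existing_paths):
    """Return a positional path for a new last child of `parent_path`."""
    child_depth = parent_path.count("/") + 1   # children have one more "/"
    positions = [
        int(path[len(parent_path):-1])         # the child's sibling position
        for path in existing_paths
        if path.startswith(parent_path) and path.count("/") == child_depth
    ]
    return f"{parent_path}{max(positions, default=0) + 1}/"

def is_descendant(path, ancestor_path):
    """A node is below another when its path extends the ancestor's path."""
    return path != ancestor_path and path.startswith(ancestor_path)

# Root is "/", with two children and one grandchild
paths = ["/", "/1/", "/2/", "/1/1/"]

print(next_child_path("/", paths))     # /3/
print(next_child_path("/1/", paths))   # /1/2/
print(next_child_path("/2/", paths))   # /2/1/
print(is_descendant("/1/1/", "/1/"))   # True
```

Because the path encodes position rather than key values, moving a node means assigning new paths to it and everything beneath it, which is the limitation shared with the Path Method.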
Implementing a tree – Nested Sets
• Query processing is done using range queries
• Structure is quite slow to maintain due to fragile structure
• Can produce excellent performance for queries
CREATE TABLE CompanyHierarchy
(
    Organization varchar(100) NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    [Left] int NOT NULL,  -- LEFT and RIGHT are reserved words, so the names are delimited
    [Right] int NOT NULL
)
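A small sketch of the range-query idea, run on SQLite through Python's `sqlite3` as a stand-in for SQL Server. The sample rows, their Left/Right numbering, and the `descendants` helper are assumptions made for the example: a node's descendants are exactly the rows whose Left and Right values fall inside its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization TEXT NOT NULL PRIMARY KEY,
        Name         TEXT NOT NULL,
        "Left"       INTEGER NOT NULL,
        "Right"      INTEGER NOT NULL
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
    [
        ("HQ", "Headquarters", 1, 8),      # spans every other node
        ("East", "East Region", 2, 5),
        ("Boston", "Boston Branch", 3, 4),  # leaf: Right = Left + 1
        ("West", "West Region", 6, 7),
    ],
)

def descendants(conn, organization):
    """Return all nodes nested inside `organization`, with no recursion."""
    rows = conn.execute("""
        SELECT child.Organization
        FROM CompanyHierarchy AS parent
        JOIN CompanyHierarchy AS child
          ON child."Left" > parent."Left"
         AND child."Right" < parent."Right"
        WHERE parent.Organization = ?
        ORDER BY child."Left" """, (organization,)).fetchall()
    return [row[0] for row in rows]

print(descendants(conn, "HQ"))    # ['East', 'Boston', 'West']
print(descendants(conn, "East"))  # ['Boston']
```

The whole subtree comes back from one join on two integer comparisons, which is where the query performance comes from.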
Nested Sets – Adding a Node
[Figure sequence: adding a new node]
• Update the Right values
• And the one Left value to the right of the new node
• Renumber, leaving a gap for the child
• Set the new node’s Left/Right
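The renumbering steps for adding a node can be sketched as follows. This is an illustrative version only: it uses SQLite via Python's `sqlite3` instead of SQL Server, always adds the new node as the last child of its parent, and the `add_last_child` helper and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE CompanyHierarchy (Organization TEXT NOT NULL PRIMARY KEY, '
    'Name TEXT NOT NULL, "Left" INTEGER NOT NULL, "Right" INTEGER NOT NULL)')
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
    [("HQ", "Headquarters", 1, 8), ("East", "East Region", 2, 5),
     ("Boston", "Boston Branch", 3, 4), ("West", "West Region", 6, 7)])

def add_last_child(conn, parent, organization, name):
    """Open a gap of two at the parent's Right value and place the node in it."""
    (parent_right,) = conn.execute(
        'SELECT "Right" FROM CompanyHierarchy WHERE Organization = ?',
        (parent,)).fetchone()
    # Shift every Right value at or past the insertion point...
    conn.execute(
        'UPDATE CompanyHierarchy SET "Right" = "Right" + 2 WHERE "Right" >= ?',
        (parent_right,))
    # ...and every Left value to the right of the new node.
    conn.execute(
        'UPDATE CompanyHierarchy SET "Left" = "Left" + 2 WHERE "Left" > ?',
        (parent_right,))
    # The new node takes the gap that was just opened.
    conn.execute(
        "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
        (organization, name, parent_right, parent_right + 1))

add_last_child(conn, "East", "Albany", "Albany Branch")

rows = conn.execute(
    'SELECT Organization, "Left", "Right" FROM CompanyHierarchy '
    'ORDER BY "Left"').fetchall()
print(rows)
# [('HQ', 1, 10), ('East', 2, 7), ('Boston', 3, 4), ('Albany', 5, 6), ('West', 8, 9)]
```

Adding one leaf changed three existing rows in a four-row tree; on a large tree the same insert can touch a large share of the table, which is why the structure is slow to maintain.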
Implementing a tree – Kimball Helper
• Developed initially for data warehousing since data is modified all at once with a fixed cost
• Basically explodes the hierarchy into a table that turns all hierarchy manipulations into a relational query
• Maintenance can be slightly costly, but using the data is extremely fast
• For the rows in yellow, expands to the table shown:
ParentId  ChildId  Distance  ParentRootNode  ChildLeafNode
1         1        0         1               0
1         2        1         1               0
1         4        2         1               1
1         5        2         1               1
2         2        0         0               0
2         4        1         0               1
2         5        1         0               1
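The expansion can be sketched in a few lines of Python. The tree shape used here (1 is the root, 2 is its child, 4 and 5 are children of 2) is inferred from the distances in the table above rather than stated in the slides, and the `build_helper` and `subtree_total` helpers and the sample amounts are hypothetical.

```python
def build_helper(parent_of):
    """Explode a child -> parent map into helper rows:
    (ParentId, ChildId, Distance, ParentRootNode, ChildLeafNode)."""
    has_children = {p for p in parent_of.values() if p is not None}
    rows = []
    for child in parent_of:
        ancestor, distance = child, 0
        while ancestor is not None:          # one row per ancestor, self included
            rows.append((
                ancestor,
                child,
                distance,
                int(parent_of[ancestor] is None),   # ancestor is the root
                int(child not in has_children),     # child is a leaf
            ))
            ancestor = parent_of[ancestor]
            distance += 1
    return sorted(rows)

def subtree_total(helper_rows, amounts, parent_id):
    """Aggregate along the tree with a plain filter and sum, no recursion."""
    return sum(amounts[child] for parent, child, *_ in helper_rows
               if parent == parent_id)

parent_of = {1: None, 2: 1, 4: 2, 5: 2}   # assumed tree shape
helper = build_helper(parent_of)

for row in helper:
    if row[0] in (1, 2):
        print(row)

amounts = {1: 10, 2: 20, 4: 40, 5: 50}    # made-up values per node
print(subtree_total(helper, amounts, 1))  # 120
print(subtree_total(helper, amounts, 2))  # 110
```

The rows printed for parents 1 and 2 match the table; the full helper also holds the self rows for nodes 4 and 5. Once the table exists, every hierarchy question becomes an ordinary join or filter.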
Performance Examples and Limitations
• The following tests were run multiple times, and the results were taken from one such run.
• Clearly the results are not scientific, and were produced with random data.
• However, they very much match my expectations from my research.
• Load times were captured loading one row at a time.
• Test machine (this laptop I am using tonight) was a:
– Lenovo Yoga Pro 2, Haswell ULT i7 (4th Gen Intel Mobile Processor), 2.4 GHz Dual Core (Hyperthreaded), 8 GB RAM, 256 GB SSD
• Note: All load times include time to load 5 transactions per node
Performance Example Explanation
• For each performance test (for which I will show the code later), I ran three query sets on each data set:
1. Load the tree (until my computer couldn’t do it in a reasonable number of hours)
2. Fetch all children from the root node
3. Aggregate data for all children at all levels
Performance Comparisons
[Charts: Adjacency List, HierarchyId, Path Method, Nested Sets, and Kimball Helper compared on each of the following]
• 3400 nodes, 5 levels: Load Time (seconds)
• 3400 nodes, 5 levels: Fetch Children of Root Node (total time, ms)
• 3400 nodes, 5 levels: Aggregate All Children (total time and CPU, ms)
• 55301 nodes, 5 levels: Load Time (minutes)
• 55301 nodes, 5 levels: Fetch Children of Root Node (total time, ms)
• 55301 nodes, 5 levels: Aggregate All Children (total time and CPU, ms)
• 42101 nodes, 15 levels: Load Time (minutes)
• 42101 nodes, 15 levels: Fetch Children of Root Node (total time, ms)
• 42101 nodes, 15 levels: Aggregate All Children (total time and CPU, ms)
• 298001 nodes, 25 levels: Aggregate All Children (total time and CPU, ms)
• 1274001 nodes, 50 levels: Aggregate All Children (total time and CPU, ms)
Method Comparison
[Chart: Aggregate All Children (ms) for each method across the data sets 3400 (5), 55301 (5), 42101 (15), 298001 (25), and 1274001 (50)]
Demo Code
• Example code for all examples available for download. Will demo hierarchies and graphs.
Method Applicability
Methods (columns): Adjacency List, HierarchyId, Path Method, Nested Sets, Kimball Helper
Applicability (rows):
• General Purpose Hierarchies: *** *** *
• VERY Large Hierarchy Queries: * * ** *** ***
• Offline Reporting: * * ** ** (Cost of maintaining limits use) ***
• OLTP Use: *** ** ** ** (Perhaps slower to load nodes)
• Highly Concurrent Modification: *** ** * *
• Highly Concurrent Queries: * ** *** *** ***
• Unlimited Hierarchy Size: ** ** * (Width unlimited, effective depth limited by 900 byte index limit) *** ***
Future Improvements
• Use SQL Server 2014 In-Memory Database to help with locking and brute force operations
• Adjust Nested Sets to use fractional numbers to reduce load time costs
• Load an order of magnitude more data
• Try these examples on a “real” computer!
Graphs
• Generally implemented in the same manner as an adjacency list
– Can be processed in the same manner as an adjacency list
– Primary difference is that a child can have > 1 parent node
– Cycles are generally acceptable
• Graph structure will always be external to data structure
• Graphs are even more natural data structures than trees
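As a sketch of a graph held outside the data rows, the example below stores connections in a separate edge table and walks them with a recursive query. Everything here is an assumption for illustration: the `Connection` table, the extra person Ann, the `reachable` helper, and the use of SQLite through Python's `sqlite3`. SQLite drops duplicate rows when the recursive query uses `UNION`, so the Bob, Sue, Fred cycle terminates; SQL Server's recursive CTEs require `UNION ALL`, so there a visited-path column or a loop would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Connection (
        FromPerson TEXT NOT NULL,
        ToPerson   TEXT NOT NULL,
        PRIMARY KEY (FromPerson, ToPerson)
    )""")
conn.executemany(
    "INSERT INTO Connection VALUES (?, ?)",
    [
        ("Bob", "Sue"), ("Sue", "Fred"), ("Fred", "Bob"),  # a cycle
        ("Sue", "Ann"), ("Fred", "Ann"),                   # Ann has two parents
    ],
)

def reachable(conn, start):
    """Return everyone reachable from `start`, visiting each person once."""
    rows = conn.execute("""
        WITH RECURSIVE Reach (Person) AS (
            SELECT ToPerson FROM Connection WHERE FromPerson = ?
            UNION            -- UNION (not UNION ALL) discards repeats
            SELECT c.ToPerson
            FROM Connection AS c
            JOIN Reach AS r ON c.FromPerson = r.Person
        )
        SELECT Person FROM Reach ORDER BY Person""", (start,)).fetchall()
    return [row[0] for row in rows]

print(reachable(conn, "Bob"))  # ['Ann', 'Bob', 'Fred', 'Sue']
print(reachable(conn, "Ann"))  # []
```

Bob appears in his own result because the cycle leads back to him, which is acceptable in a graph but would be an error in a tree.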
Graphs are Everywhere
• Almost any many-to-many relationship can be a graph
[Figure: Movie related to Actor through ActingCast, and to Director through MovieDirector]
Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql
• SQL Blog – http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA: http://www.simple-talk.com/community/blogs/drsql/default.aspx
Thank you. That’s all folks!