
KD-Trees for .NET « Noldorin's Blog


01 Mar 2011 | 33 Comments »

The kd-tree is a well-known data structure which, alongside quadtrees and octrees, is commonly used for the task of space partitioning. That is, it aims to represent a set of points in k-dimensional space in a way that is efficient to access. Kd-trees, like other balanced binary search trees, have the notable advantage of an O(log n) average search time, far superior to the O(n) of a naive linear search.

While such trees perhaps have their best known application in computer graphics, I discovered them here in relation to a computational physics project on which I am currently working. For those who are curious about the context, I have recently been doing some statistical analysis on the distribution of stars using the Hipparcos catalogue. Unfortunately, due to the sheer number of stars (100,000+), performing a nearest neighbour search or range search on each star in 3D (Euclidean) space is simply infeasible without the use of a clever algorithm. This is how I discovered the wonder of kd-trees, reducing a task that would otherwise take days to several minutes.

The software I am writing for this project happens to be in C# 4.0, and upon a bit of investigation, I discovered that there exists no (decent) implementation of the kd-tree structure and its algorithms for .NET. Thus, with a bit of help from the Wikipedia article and a paper I dug up, I decided to have a go at a generic implementation. Now that it is complete and tested, I can gladly declare that the job was not half as easy as I expected (but somehow quite rewarding)!

The following features have been implemented (and fully documented with XML comments).

Tree construction

Dynamic node addition

Dynamic node removal

Nearest neighbour search

Range search
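The first and fourth features in the list above (tree construction and nearest neighbour search) can be sketched in a few dozen lines. This is only a minimal illustration under stated assumptions, not the KdTree class offered for download: points are plain double[] arrays rather than Math.NET vectors, and the names SketchNode, KdSketch, Build and Nearest are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical minimal kd-tree node; not the KdTreeNode class from this post.
public sealed class SketchNode
{
    public double[] Point;
    public SketchNode Left, Right;
}

public static class KdSketch
{
    // Builds a balanced tree by splitting on the median along axis (depth mod k).
    public static SketchNode Build(double[][] points, int depth = 0)
    {
        if (points.Length == 0) return null;
        int axis = depth % points[0].Length;
        double[][] sorted = points.OrderBy(p => p[axis]).ToArray();
        int median = sorted.Length / 2;
        return new SketchNode
        {
            Point = sorted[median],
            Left = Build(sorted.Take(median).ToArray(), depth + 1),
            Right = Build(sorted.Skip(median + 1).ToArray(), depth + 1)
        };
    }

    public static double DistanceSquared(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Returns the stored point closest to target (null for an empty tree).
    public static double[] Nearest(SketchNode node, double[] target,
                                   int depth = 0, double[] best = null)
    {
        if (node == null) return best;
        if (best == null ||
            DistanceSquared(node.Point, target) < DistanceSquared(best, target))
            best = node.Point;

        int axis = depth % target.Length;
        double diff = target[axis] - node.Point[axis];
        SketchNode near = diff < 0 ? node.Left : node.Right;
        SketchNode far = diff < 0 ? node.Right : node.Left;

        best = Nearest(near, target, depth + 1, best);
        // Visit the far side only if the splitting plane is closer than the best match so far.
        if (diff * diff < DistanceSquared(best, target))
            best = Nearest(far, target, depth + 1, best);
        return best;
    }
}
```

Building the tree from the six points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) and calling KdSketch.Nearest with the target (9,2) returns (8,1). The pruning test on the far subtree is what gives the speed-up over comparing against every point.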

I’ve now uploaded the complete code for the implementation, which is available below as several C# code files. Note that you will also need the (wonderful) Math.NET Numerics library to compile things – it provides the generic vector type and a few other handy things. (You can probably adapt the code easily enough if you don’t fancy having this dependency.)

Downloads

(All code is licensed under the MIT license. Copyright 2011 Alex Regueiro.)

KdTree.cs

KdTreeNode.cs

Arithmetic.cs

Tests (MSTest framework)

KdTreeTests.cs

The idea is that the KdTree class is available to use as a black box, so let me know if anything is unclear. (If you’re wondering though, the arithmetic stuff is there to provide generic arithmetic support complementing the generic vector support. You’ll probably just want to use double most of the time.) Finally, if you’re interested in the unit tests, you can grab them over at the project repository. Enjoy.

CATEGORY(S): Maths & Science, Programming, Projects

TAG(S): .net, .net 4.0, algorithms, astronomy, binary trees, bsp trees, c#, c# 4.0, computational physics, data structures, generic arithmetic, hipparcos catalogue, kd trees, math.net, math.net numerics, nearest neighbour, octrees, physics, quadtrees, range search, space partitioning, star catalogues, stars, tree structures

33 RESPONSES TO “KD-TREES FOR .NET”

Dave says: July 23, 2011 at 5:09 pm

Hey – Noldorin – this looks great – I have been using the Numerical Recipes impl but calling it back and forth from a C# harness, which is SLOWWW. Two things are not clear to me however. Am I able to retrieve items which are within a radius r of, say, point P? (Is that the public IEnumerable FindInRange(Vector location, TField range) method?) Also, TField and TValue?? I have a list of Point structures which I want to put into the tree. What is the signature for my KDtree initialisation? Many thanks for this (my project is a toy Lego robot navigation project). Many thanks again
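The kind of query asked about here (all items within a radius r of a point P) can be illustrated with a small stand-alone sketch. It does not use the FindInRange method or the TField/TValue type parameters of the downloadable class, whose exact signatures are not shown in this post; RangeNode, RangeSearchSketch, Insert and FindInRadius are hypothetical names, and points are assumed to be plain double[] arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for illustration only.
public sealed class RangeNode
{
    public double[] Point;
    public RangeNode Left, Right;
}

public static class RangeSearchSketch
{
    // Inserts a point: strictly smaller coordinates go left, everything else goes right.
    public static RangeNode Insert(RangeNode node, double[] point, int depth = 0)
    {
        if (node == null) return new RangeNode { Point = point };
        int axis = depth % point.Length;
        if (point[axis] < node.Point[axis]) node.Left = Insert(node.Left, point, depth + 1);
        else node.Right = Insert(node.Right, point, depth + 1);
        return node;
    }

    // Adds every stored point within 'radius' of 'centre' (inclusive) to 'results'.
    public static void FindInRadius(RangeNode node, double[] centre, double radius,
                                    List<double[]> results, int depth = 0)
    {
        if (node == null) return;
        double sum = 0;
        for (int i = 0; i < centre.Length; i++)
            sum += (node.Point[i] - centre[i]) * (node.Point[i] - centre[i]);
        if (sum <= radius * radius) results.Add(node.Point);

        int axis = depth % centre.Length;
        double diff = centre[axis] - node.Point[axis];
        // A subtree is visited only if the ball around 'centre' reaches its side of the plane.
        if (diff - radius < 0) FindInRadius(node.Left, centre, radius, results, depth + 1);
        if (diff + radius >= 0) FindInRadius(node.Right, centre, radius, results, depth + 1);
    }
}
```

With the points (0,0), (1,1), (3,3), (-1,0) and (0,2.5) inserted in that order, a query centred on (0,0) with radius 2 collects (0,0), (1,1) and (-1,0).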

Dave says: July 24, 2011 at 1:15 pm

Hi – I posted earlier – please ignore – having looked at the unit tests all is clear – this is a wonderful piece of code!! Many thanks.

I was going to port NRC, but this has saved a whole pile of aggravation.

Great job.

Best


KD-Trees for .NET « Noldorin's Blog http://blog.noldorin.com/2011/03/kd-trees-for-dotnet/

1 of 6 22/05/2013 04:29


Dave

Noldorin says: July 26, 2011 at 6:48 pm

Glad to hear that. And yeah, hopefully the unit tests act as the documentation! I use Math.NET for the vector-based stuff in code, but it should be easy enough to substitute that dependency.

Good luck with your project; sounds fun!

Qua says: July 30, 2011 at 2:38 am

It looks great, it’s really sad that it depends on Math.NET Numerics though. I’m not quite sure whether it would be easiest to modify this to avoid this dependency or write it from scratch

Noldorin says: July 30, 2011 at 6:30 am

Yeah, the Math.NET Numerics dependency is purely because I originally wrote this code for a computational physics project, and we tied into that dependency anyway. You can abstract it easily enough by writing your own simple Vector classes, I think. Give it a go and let me know if you run into any specific problems.

Qua says: August 1, 2011 at 9:16 am

I ended up writing the whole data structure myself. Your version is built to be very flexible (working on any number of dimensions and data types) and could as such be used in a data structure framework, whereas I built mine very specifically for 2D using the “vector” class that was present in my project.

Thanks for the inspiration though

Noldorin says: August 1, 2011 at 9:37 am

Well, I’m glad it helped at least. The algorithmic implementations are there, for sure. But I agree, it is meant to be a pretty generic class, so I can understand a rewrite for your specific needs…

Yak Fatzko says: August 3, 2011 at 8:05 am

Hey Qua – any chance of you sharing that code? I need a 2D implementation for a mapping project. If you can contribute it would be deeply appreciated. Thanks. yakadum @ yahoo . com

Noldorin says: August 3, 2011 at 8:31 am

No, you *don’t* need a 2D implementation. This is a generic n-d implementation already. If you simply want to remove the Math.NET Numerics dependency, then that’s another matter – a trivial one in fact.

Hass says: August 3, 2011 at 1:14 pm

Noldorin, thanks for the excellent implementation. I would like to implement a specialised version but I was wondering what license, if any, is required to use this code. For instance GNU GPL?

Noldorin says: August 4, 2011 at 3:19 am

Hi Hass. I’m glad you find it useful. You’re right, the code really needs a license (especially now that it’s garnering a lot of third-party interest). I’ve updated the post to indicate that everything’s licensed under the MIT license – you should find this fairly unrestrictive.

Hass says: August 8, 2011 at 9:07 am

Thanks Noldorin, that’s great

Curious says: September 27, 2011 at 7:07 pm

Hi!

Would it be possible to put up some sample code in C# illustrating the use of the tree? Just a few lines if possible, please. I should also say that it is an excellent implementation, the quality of which exceeds many – if not all – of the code snippets available on the internet in terms of C# examples.

Noldorin says: September 27, 2011 at 11:44 pm

You make a fair point. I just realised when I read your comment that the post lacked examples. For the sake of simplicity, I’ve simply made the KdTree unit tests (for the project I was working on) available for download. Hopefully this (with the comments) should be sufficient to get you going, but let me know if you need any further clarification.

Curious says: September 28, 2011 at 2:32 pm

Hi Noldorin,

I thank you very much for your quick response. I am sorry I forgot in my previous post to mention that I had already checked the unit tests. I should say that I did not get much information, partly because Math.NET Numerics is also completely new to me. However, I figured out how it “might” work. Yet, I do not have full confidence in my interpretation of your data structure.

If it is not too much work, I kindly ask for a very simple example (just a few lines) showing how you initialize the structure and how/what you get at the end.

Regards

Qua says: October 31, 2011 at 12:11 am

Are you sure that your Remove method works correctly? My implementation took its origin in yours, but now that I am starting to use Remove it seems strange to me. As far as I can see you only compare a single dimension, and then if this single dimension value is identical you proclaim the two elements equal and remove them. There is no guarantee that the elements are equal, however, and this introduces a bug.

It might just be my implementation, but it looks like yours has the same flaw.

Noldorin says: October 31, 2011 at 12:48 am

Hi Qua.

You could well be right. The unit tests for Remove are not extensive. Indeed, I did not actually use the Remove functionality actively in my scientific project.

However, I’m tempted to believe that the fix required is small. Perhaps you could provide a test case (just an extra few lines in the unit tests file) to demonstrate the problem? Would be happy to fix it then.

Cheers, Alex

Qua says: October 31, 2011 at 6:22 pm

Try adding the nodes (0,0) followed by (0,1) and then remove (0,1) and see if the tree is what you expected.

Ian says: November 16, 2011 at 2:26 am

Hi Noldorin, thank you very much for your work on this. I’m attempting to leverage your code and ran into a problem compiling KdTreeTests.cs. In the TestFindNearestNNeighbours method it looks like the test is calling a method that doesn’t appear to be implemented in the KdTree class, one that would allow you to find the nearest N neighbours instead of only the nearest neighbour. Is there a version of your work that includes an overloaded FindNearestNeighbours method that will search for multiple neighbours?

Noldorin says: November 17, 2011 at 2:07 am

Hi Ian.

You’re absolutely right. I added the CS file for the unit tests at a later date, having updated the core KdTree implementation in the meanwhile (for my FermiSim project, linked above). I’ve now updated the KdTree.cs and KdTreeNode.cs files to reflect the latest revision of my project (now inactive). It provides the requisite n-nearest-neighbours implementation.

Hope that helps.

César says: January 8, 2012 at 11:59 pm

Hey Noldorin, looking at FindNearestNNeighbors I see that you only insert values into the list when they are within the hypersphere of smallest distance. However, this only makes sense when the size of the list is numNeighbors; before that, you should probably add to the list every value you come across. Otherwise, you might have the closest value to your location at the root, which would result in a small hypersphere and no more elements would be picked. Am I missing something?

Also, why would you remove from the output a node just because it is the search location? As with any other value in the tree, I would expect it to go into the output. Is there any restriction I’m not aware of?

In any case, thanks for the code. It’s very useful!

Noldorin says: February 4, 2012 at 5:59 pm

Hi Cesar. Sorry for the delayed response. If you’re still interested: yes, you are right that the check to see whether a node is within the hypersphere should only be made if the list is already “full”. I threw together this method quite quickly, and since I don’t believe I ever used it in my project, it was never thoroughly tested. (It passed the basic unit tests.) Thanks for pointing it out though.

I exclude nodes at the identical location to the search location for semantic reasons. In the application of my project, the input location was that of a star (arbitrary node), and we never wanted to return it back, i.e. it’s not technically one of the “nearest neighbours” of a star, but rather the star itself. This should be a trivial change, however, if your application demands it.

Glad the implementation was of help. Also, I’d certainly appreciate a patch/fix for the first issue (FindNearestNNeighbors) if you have one.
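The correction agreed in this exchange (accept every point until the list holds the requested number of neighbours, and only then test against the hypersphere of the worst kept candidate) might look roughly like the following. This is a stand-alone sketch rather than a patch to FindNearestNNeighbors: NnNode, KNearestSketch, Insert and FindNearest are hypothetical names, points are assumed to be double[] arrays, and unlike the implementation discussed above it does not exclude a point located exactly at the search location.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for illustration only.
public sealed class NnNode
{
    public double[] Point;
    public NnNode Left, Right;
}

public static class KNearestSketch
{
    public static NnNode Insert(NnNode node, double[] point, int depth = 0)
    {
        if (node == null) return new NnNode { Point = point };
        int axis = depth % point.Length;
        if (point[axis] < node.Point[axis]) node.Left = Insert(node.Left, point, depth + 1);
        else node.Right = Insert(node.Right, point, depth + 1);
        return node;
    }

    static double DistanceSquared(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Collects up to 'count' points nearest to 'target', closest first.
    public static List<double[]> FindNearest(NnNode root, double[] target, int count)
    {
        var best = new List<double[]>();
        if (count > 0) Search(root, target, count, 0, best);
        return best;
    }

    static void Search(NnNode node, double[] target, int count, int depth, List<double[]> best)
    {
        if (node == null) return;
        double distance = DistanceSquared(node.Point, target);
        // Until the list is full, accept every point; afterwards compare with the worst kept one.
        if (best.Count < count || distance < DistanceSquared(best[best.Count - 1], target))
        {
            int index = 0;
            while (index < best.Count && DistanceSquared(best[index], target) <= distance) index++;
            best.Insert(index, node.Point);
            if (best.Count > count) best.RemoveAt(best.Count - 1);
        }

        int axis = depth % target.Length;
        double diff = target[axis] - node.Point[axis];
        Search(diff < 0 ? node.Left : node.Right, target, count, depth + 1, best);
        // The far side may be skipped only once the list is full and the plane lies outside it.
        if (best.Count < count || diff * diff < DistanceSquared(best[best.Count - 1], target))
            Search(diff < 0 ? node.Right : node.Left, target, count, depth + 1, best);
    }
}
```

In the scenario César describes, with the closest point sitting at the root, the list still fills up: inserting (0,0), (1,0), (5,5), (2,2), (0,3) and asking for the three nearest neighbours of (0,0) yields (0,0), (1,0), (2,2) in that order.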

maaschn says: February 27, 2012 at 1:39 pm

Hello Noldorin,

thanks for that code. I am currently trying to port a little program to .NET which I wrote in Matlab a year ago. I used a KD-tree to speed up the determination of closest triangles. I was very glad when I found your code, but now I have stumbled over the fact that the indices from the unsorted raw data array get lost after the construction of the tree. Is that correct or did I miss something?

Thanks again for your work

maaschn

Noldorin says: February 29, 2012 at 12:32 am

Hi maaschn. That’s quite how it’s meant to work. Are these indices ones of an array you input? If so, they are implicit and hence not meant to be stored. Try making the actual values of the array/collection structs or tuples that contain the relevant indices.
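The suggestion above (store values that carry their own index) can be sketched as follows. How such tuples would be passed to the KdTree class is not shown in this post, so the example stops short of that: IndexedPointsSketch, WithIndices and IndexOfNearest are hypothetical names, and a simple linear scan stands in for the tree query.

```csharp
using System;
using System.Linq;

public static class IndexedPointsSketch
{
    // Pairs each point with its position in the original, unsorted array.
    public static Tuple<double[], int>[] WithIndices(double[][] rawPoints)
    {
        return rawPoints.Select((point, index) => Tuple.Create(point, index)).ToArray();
    }

    // Stand-in for a tree query: whichever element comes back still knows its original index.
    public static int IndexOfNearest(Tuple<double[], int>[] items, double[] target)
    {
        return items
            .OrderBy(item => item.Item1.Zip(target, (a, b) => (a - b) * (a - b)).Sum())
            .First()
            .Item2;
    }
}
```

Because the index travels with the point, any reordering done during tree construction no longer loses it.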

Bernie says: March 6, 2012 at 11:52 pm

There is indeed a bug in Remove(). I’ve attached a fixed version – which I haven’t tested exhaustively but should work better

BROKEN CODE:

    if (node.LeftChild == null && node.RightChild == null)
        return null;

    if (node.RightChild != null)
    {
        node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
    }
    else
    {
        node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
        node.LeftChild = null;
    }

    node.RightChild = Remove(node.Value, node.RightChild, depth + 1);

FIXED CODE:

    if (node.RightChild != null)
    {
        node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
        node.RightChild = Remove(node.Value, node.RightChild, depth + 1);
    }
    else if (node.LeftChild != null)
    {
        node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
        node.RightChild = Remove(node.Value, node.LeftChild, depth + 1);
        node.LeftChild = null;
    }
    else
    {
        node = null;
    }

Noldorin says: March 7, 2012 at 12:09 am

Thanks for the suggestion. Would you mind briefly summarising what you observed as the problem with the original code?

Bernie says: March 7, 2012 at 10:42 am

The original code finds the minimum value in node.LeftChild and copies it up to the pivot node. It then proceeds to remove it from node.RightChild (which is always null) instead of node.LeftChild. So you end up with duplicate nodes in the tree.

Bernie says: March 7, 2012 at 10:50 am

There is actually another bug in FindMinimum that causes Remove to corrupt the tree. I don’t know how to translate the fix to your code – I’ll leave that up to you.

You need to make sure you return the SMALLEST of leftMinValue and rightMinValue. So you need to compare them to each other. Right now, you favor leftMinValue.

This can corrupt the tree because you may copy a node up to the pivot that does not respect the splittingDimension (return node.Value).
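Taken together, the points raised in this thread (compare every coordinate before declaring a match, remove the copied-up value from the subtree it came from, and have FindMinimum compare the node with both subtree minima) might be sketched as below. This is an illustrative stand-alone version, not the corrected KdTree.cs: RmNode, KdRemoveSketch and the parameter order are assumptions, and points are plain double[] arrays.

```csharp
using System;

// Hypothetical node type for illustration only.
public sealed class RmNode
{
    public double[] Point;
    public RmNode Left, Right;
    public RmNode(double[] point) { Point = point; }
}

public static class KdRemoveSketch
{
    // Strictly smaller coordinates go left; equal or larger go right.
    public static RmNode Insert(RmNode node, double[] point, int depth = 0)
    {
        if (node == null) return new RmNode(point);
        int axis = depth % point.Length;
        if (point[axis] < node.Point[axis]) node.Left = Insert(node.Left, point, depth + 1);
        else node.Right = Insert(node.Right, point, depth + 1);
        return node;
    }

    // Smallest point along 'dimension' in the subtree: the node and BOTH subtree minima compete.
    public static double[] FindMinimum(RmNode node, int dimension, int depth)
    {
        if (node == null) return null;
        int axis = depth % node.Point.Length;
        double[] min = node.Point;
        double[] leftMin = FindMinimum(node.Left, dimension, depth + 1);
        if (leftMin != null && leftMin[dimension] < min[dimension]) min = leftMin;
        // When this level splits on 'dimension', the right subtree cannot hold a smaller value.
        if (axis != dimension)
        {
            double[] rightMin = FindMinimum(node.Right, dimension, depth + 1);
            if (rightMin != null && rightMin[dimension] < min[dimension]) min = rightMin;
        }
        return min;
    }

    static bool SamePoint(double[] a, double[] b)
    {
        for (int i = 0; i < a.Length; i++) if (a[i] != b[i]) return false;
        return true;
    }

    public static RmNode Remove(RmNode node, double[] point, int depth = 0)
    {
        if (node == null) return null;
        int axis = depth % point.Length;
        if (SamePoint(node.Point, point)) // every coordinate must match, not just the splitting one
        {
            if (node.Right != null)
            {
                node.Point = FindMinimum(node.Right, axis, depth + 1);
                node.Right = Remove(node.Right, node.Point, depth + 1);
            }
            else if (node.Left != null)
            {
                // No right subtree: promote the left minimum and re-hang the rest on the right.
                node.Point = FindMinimum(node.Left, axis, depth + 1);
                node.Right = Remove(node.Left, node.Point, depth + 1);
                node.Left = null;
            }
            else return null;
        }
        else if (point[axis] < node.Point[axis]) node.Left = Remove(node.Left, point, depth + 1);
        else node.Right = Remove(node.Right, point, depth + 1);
        return node;
    }
}
```

Qua's earlier test case works as expected with this sketch: after inserting (0,0) and then (0,1), removing (0,1) leaves a single node holding (0,0).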

Noldorin says: March 7, 2012 at 6:37 pm

Okay, so re: the first suggestion I think you’re right actually. You changed my code more than necessary, but I get the idea.

Maybe you could post a reproduction case for the second possible bug? One for the first would be appreciated too, for unit tests, but I should be able to do that anyway.

Noldorin says: March 10, 2012 at 9:37 pm

Bernie,

Regarding both bugs, I think I’ve managed to fix them now. Your suggested change for the first issue seems to make sense, so I’ve included that verbatim. The second wasn’t too tricky, but you can have a look at that too.

Cheers, A.

Bernie says: March 14, 2012 at 5:40 am

Hi Noldorin, apologies for being slow to reply – I’ve been busy with work recently. I have a different KdTree implementation and my FindMinimum looks quite different to yours, so it’s a bit hard for me to evaluate your implementation.

It’s not very scientific – but I’d suggest writing a unit test that creates a list of items with random locations, adds those items to your KdTree, and then removes one item at a time from both the list and the tree. Then you can ensure that removal works using CollectionAssert. It does not guarantee correctness, but it will quickly catch big issues.

My implementation is also hardened to support two items with the same location (and removal of a specific one). But I have simple equality rules – so that was easy to implement for me.
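The randomised removal check suggested here could be sketched along the following lines. To stay self-contained it avoids MSTest and the KdTree class entirely: the collection under test is supplied through three delegates, and RemovalCheckSketch and RemovalsStayConsistent are hypothetical names. In a real MSTest project the comparison would more naturally use CollectionAssert.AreEquivalent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemovalCheckSketch
{
    // Adds random 2D points to a collection under test, then removes them one at a time,
    // comparing the survivors against a plain reference list after every removal.
    public static bool RemovalsStayConsistent(
        Action<Tuple<double, double>> add,
        Action<Tuple<double, double>> remove,
        Func<IEnumerable<Tuple<double, double>>> enumerate,
        int itemCount, int seed)
    {
        var random = new Random(seed);
        var reference = new List<Tuple<double, double>>();
        for (int i = 0; i < itemCount; i++)
        {
            var item = Tuple.Create(random.NextDouble(), random.NextDouble());
            reference.Add(item);
            add(item);
        }

        while (reference.Count > 0)
        {
            var victim = reference[random.Next(reference.Count)];
            reference.Remove(victim);
            remove(victim);
            // Order-insensitive comparison of what should remain with what actually remains.
            var expected = reference.OrderBy(p => p.Item1).ThenBy(p => p.Item2);
            var actual = enumerate().OrderBy(p => p.Item1).ThenBy(p => p.Item2);
            if (!expected.SequenceEqual(actual)) return false;
        }
        return true;
    }
}
```

Passing delegates that wrap a tree's add, remove and enumeration operations exercises removal under many shapes of tree. As the comment above notes, a passing run does not prove correctness, but a failing one pinpoints a problem quickly.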

Eric Rini says: April 27, 2012 at 5:31 am

Very nice library! I’m gonna give the author a great big thank you. This is really great.
