Upload
raheem
View
48
Download
0
Embed Size (px)
DESCRIPTION
A A R H U S U N I V E R S I T E T Department of Computer Science. Efficient Handling of Massive (Terrain) Datasets. Lars Arge. Massive Data Algorithmics. Massive data being acquired/used everywhere Storage management software is billion-$ industry - PowerPoint PPT Presentation
Citation preview
Lars Arge
1/13
Efficient Handling ofMassive (Terrain) Datasets
Lars Arge
A A R H U S U N I V E R S I T E T Department of Computer Science
Lars Arge
2/13
Massive Data Algorithmics
Massive data being acquired/used everywhere Storage management software is billion-$ industry
Science is increasingly about mining massive data (Nature 2/06)
Examples (2002): Phone: AT&T 20TB phone call database, wireless tracking Consumer: WalMart 70TB database, buying patterns WEB: Google index 8 billion web pages Geography: NASA satellites generate Terrabytes each day
Lars Arge
3/13
Terrain Data
New technologies: Much easier/cheaper to collect detailed data Previous ‘manual’ or radar based methods
−Often 30 meter between data points−Sometimes 10 meter data available
New laser scanning methods (LIDAR)−Less than 1 meter between data points−Centimeter accuracy (previous meter)
Denmark: ~2 million points at 30 meter (<<1GB) ~18 billion points at 1 meter (>>1TB) COWI (and other) now scanning DK NC scanned after Hurricane Floyd in 1999
Lars Arge
4/13
Massive data = I/O-Bottleneck
I/O is often bottleneck when handling massive datasets Disk access is 106 times slower than main memory access!
Disk systems try to amortize large access time transferring large contiguous blocks of data
Need to store and access data to take advantage of blocks!
track
magnetic surface
read/write armread/write head
“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in
sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a
sharpener on someone else’s desk.” (D. Comer)
Lars Arge
5/13
I/O-efficient Algorithms
Taking advantage of block access important Traditionally algorithms developed
without block considerations I/O-efficient algorithms leads to large
runtime improvements
data size
runnin
g t
ime
Normal algorithm
I/O-efficient algorithm
Main memory size
Lars Arge
6/13
R
A
M
Scalability: Hierarchical Memory
Block access not only important on disk level Machines have complicated memory hierarchy
Levels get larger and slower Block transfers on all levels
L
1L
2
data size
runnin
g t
ime
Lars Arge
7/13
My Research Work
Theoretically I/O- (and cache-) efficient algorithms work Data structures, computational geometry, graph theory, … Focus on Geographic Information Systems problems
Algorithm engineering work, e.g TPIE system for simple, efficient,
and portable implementation ofI/O-efficient algorithms
Software for terrain data processing LIDAR data handling Terrain flow computations
Lars Arge
8/13
Example: Terrain Flow
Terrain water flow has many important applications Predict location of streams, areas susceptible to floods…
Conceptually flow is modeled using two basic attributes Flow direction: The direction water flows at a point Flow accumulation: Amount of water flowing through a point
Flow accumulation used to compute other hydrological attributes: drainage network, topographic convergence index…
7 am 3pm
Lars Arge
9/13
Terrain Flow Accumulation
Collaboration with environmental researchers at Duke University Appalachian mountains dataset:
800x800km at 100m resolution a few Gigabytes On ½GB machine:
ArcGIS: Performance somewhat unpredictable Days on few gigabytes of data Many gigabytes of data…..
Appalachian dataset would be Terabytes sized at 1m resolution
14 days!!
Lars Arge
10/13
Terrain Flow Accumulation: TerraFlow We developed theoretically I/O-optimal algorithms TPIE implementation was very efficient
Appalachian Mountains flow accumulation in 3 hours!
Developed into comprehensive software package for flow computation on massive terrains: TerraFlow Efficient: 2-1000 times faster than existing software Scalable: >1 billion elements! Flexible: Flexible flow modeling (direction) methods
Extension to ArcGIS
Lars Arge
11/13
LIDAR Terrain Data Work: TerraStream
Now TerraStream software “pipeline” for handling terrain data Points to DEM (incl. breaklines) DEM flow modeling (incl “flooding”, “flat” routing, “noise” reduce) DEM flow accumulation (incl river extraction) DEM hierarchical watershed computation
All work for both grid and TIN DEM’s Capable of handling massive datasets
Test dataset: 400M point Neuse river basin (1/3 NC) (>17GB)
Lars Arge
12/13
Examples of Ongoing Terrain Work
Terrain modeling, e.g “Raw” LIDAR to point conversion (LIDAR point classification)
(incl feature, e.g. bridge, detection/removal) Further improved flow and erosion modeling (e.g. carving) Contour line extraction (incl. smoothing and simplification) Terrain (and other) data fusion (incl format conversion)
Terrain analysis, e.g Choke point, navigation, visibility, change detection,…
Major grand goal: Construction of hierarchical (simplified) DEM where
derived features (water flow, drainage, choke points)are preserved/consistent
Lars Arge
13/13
Thanks
Lars [email protected]
Work supported in part by US National Science Foundation ESS grant EIA–98070734, RI grant EIA–
9972879, CAREER grant EIA–9984099, and ITR grant EIA–0112849 US Army Research Office grants W911NF-04-1-0278
and DAAD19-03-1-0352 Ole Rømer Scholarship from Danish Science Research Council NABIIT grant from the Danish Strategic Research Council Danish National Research foundation (MADALGO center)