Spatial Big Data Challenges Intersecting Cloud Computing and Mobility. Shashi Shekhar, McKnight Distinguished University Professor, Department of Computer Science and Engineering, University of Minnesota, www.cs.umn.edu/~shekhar.
1
Spatial Big Data Challenges Intersecting Cloud Computing and Mobility
Shashi Shekhar
McKnight Distinguished University Professor
Department of Computer Science and Engineering
University of Minnesota
www.cs.umn.edu/~shekhar
2
Spatial Databases: Representative Projects
[Figure legend: Only in old plan; Only in new plan; In both plans]
• Evacuation Route Planning
• Parallelize Range Queries
• Storing graphs in disk blocks
• Shortest Paths
3
Why cloud computing for spatial data?
• Geospatial Intelligence [Dr. M. Pagels, DARPA, 2006]
• Estimated at 140 terabytes per day, 150 petabytes annually
• Annual volume is 150x historical content of the entire internet
• Analyze daily data as well as historical data
4
Eco-Routing
U.P.S. Embraces High-Tech Delivery Methods (July 12, 2007) By “The research at U.P.S. is paying off. ……..— saving roughly three million gallons of fuel in good part by mapping routes that minimize left turns.”
• Minimize fuel consumption and GHG emission
  – rather than proxies, e.g., distance, travel-time
  – avoid congestion, idling at red lights, turns, elevation changes, etc.
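The difference between routing on a proxy (distance) and routing on fuel can be illustrated with a small sketch. It runs Dijkstra's algorithm over the same road graph twice, once per cost attribute. The graph `ROADS`, the attribute names `km` and `liters`, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import heapq

def cheapest_path(graph, source, target, cost_key):
    """Dijkstra's algorithm minimizing the edge attribute named by cost_key."""
    best = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, attrs in graph.get(node, {}).items():
            new_cost = cost + attrs[cost_key]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[target]

# Hypothetical road network: the route via "a" is shorter, the route via "b" burns less fuel.
ROADS = {
    "depot": {"a": {"km": 2.0, "liters": 0.9}, "b": {"km": 3.0, "liters": 0.4}},
    "a": {"customer": {"km": 2.0, "liters": 0.8}},
    "b": {"customer": {"km": 3.0, "liters": 0.5}},
    "customer": {},
}

shortest = cheapest_path(ROADS, "depot", "customer", "km")
greenest = cheapest_path(ROADS, "depot", "customer", "liters")
```

In this toy network the two objectives pick different routes, which is the point of the slide: a distance-optimal route need not be fuel-optimal. Real fuel costs would also depend on time of day, which a static edge weight does not capture.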
5
Real-time and Historic Travel-time, Fuel Consumption, GPS Tracks
6
Eco-Routing Research Challenges
• Frames of Reference
  – Absolute to moving-object-based (Lagrangian)
• Data model of Lagrangian graphs
  – Conceptual – generalize time-expanded graph
  – Logical – Lagrangian abstract data types
  – Physical – clustering, index, Lagrangian routing algorithms
• Flexible Architecture
  – Allow inclusion of new algorithms, e.g., GPS-track mining
  – Merge solutions from different algorithms
• Geo-sensing of events
  – e.g., volunteered geographic information (e.g., OpenStreetMap)
  – social unrest (Ushahidi), flash-mob, …
• Geo-Prediction
  – e.g., predict track of a hurricane or a vehicle
  – Challenges: auto-correlation, non-stationarity
• Geo-privacy
7
Cloud Computing and Spatial Big Data
• Motivation
• Case Study 1: Simpler to Parallelize
• Case Study 2 – Harder
• Case Study 3 – Hardest
• Wrap up
8
Simpler: Land-cover Classification
• Multiscale Multigranular Image Classification into land-cover categories
[Figure: inputs; output at 2 scales]
$$\widehat{Model} = \arg\max_{Model} \{\, quality(Model) \,\}$$
where
$$quality(M) = likelihood(observation \mid M) - 2\, penalty(M)$$
9
Parallelization Choice
    Initialize parameters and memory
    for each Spatial Scale
        for each Quad
            for each Class
                Calculate Quality Measure
            end for Class
        end for Quad
    end for Spatial Scale
    Post-processing
[Figure: Efficiency (0 to 1) vs. number of processors (1, 2, 4, 8), class-level vs. quad-level parallelization]
[Figure: Speedup (0 to 7) vs. number of processors (2, 4, 8), class-level vs. quad-level parallelization]
Input:
• 64 x 64 image (Plymouth County, MA)
• 4 classes (All, Woodland, Vegetated, Suburban)
Language: UPC
Platform: Cray X1, 1-8 processors
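The loop nest on this slide (scale, then quad, then class) can be sketched as follows, with the quad loop as the parallel one. This is only an illustration of quad-level parallelism in Python, not the UPC code used in the experiments. The `quality` function is a hypothetical stand-in (the fraction of pixels that agree with a label) for the likelihood-minus-penalty measure, and a 1-D list of labels stands in for the image.

```python
from concurrent.futures import ThreadPoolExecutor

CLASSES = ["Woodland", "Vegetated", "Suburban"]  # hypothetical label set

def quality(quad, label):
    """Toy quality measure (assumption): fraction of pixels in the quad equal to label."""
    return sum(1 for pixel in quad if pixel == label) / len(quad)

def best_class_for_quad(quad):
    """Inner 'for each Class' loop: pick the label with the highest quality."""
    return max(CLASSES, key=lambda label: quality(quad, label))

def split_into_quads(pixels, quad_size):
    """Cut a 1-D list of pixels into consecutive blocks of quad_size."""
    return [pixels[i:i + quad_size] for i in range(0, len(pixels), quad_size)]

def classify(pixels, scales, workers=4):
    """Quad-level parallelism: at each scale, quads are spread over the workers."""
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for quad_size in scales:                      # for each Spatial Scale
            quads = split_into_quads(pixels, quad_size)
            # for each Quad, evaluated concurrently; map preserves quad order
            result[quad_size] = list(pool.map(best_class_for_quad, quads))
    return result

pixels = ["Woodland"] * 3 + ["Suburban"] + ["Vegetated"] * 4
labels = classify(pixels, scales=[4, 8])
```

Parallelizing the class loop instead would give only as many independent tasks as there are classes (4 on this slide), whereas the quad loop offers many more, which is one plausible reason the two choices scale differently. In CPython, threads will not speed up this CPU-bound loop; `ProcessPoolExecutor` has the same `map` interface and could be swapped in.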
10
Harder: Parallelizing Vector GIS
• (1/30) second response-time constraint on range query
• Parallel processing is necessary since the best sequential computer cannot meet the requirement
• Blue rectangle = a range query; polygon colors show processor assignment
[Figure: system diagram. Components: Display, Graphics Engine, High Performance GIS Component, Local Terrain Database, Remote Terrain Databases. Labels: set of polygons; 30 Hz view graphics; 2 Hz, 8 km x 8 km bounding box; 25 km x 25 km bounding box]
11
Data-Partitioning Approach
• Initial static partitioning
• Run-time dynamic load-balancing (DLB)
• Platforms: Cray T3D (Distributed), SGI Challenge (Shared Memory)
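The two ingredients above can be compared with a small simulation. It is a sketch under simplifying assumptions: each polygon has a known processing cost, static partitioning is round-robin, and the DLB pool is a shared queue from which the processor that becomes idle first takes the next chunk. Communication overhead is ignored. The function names, the `pool_fraction` and `chunk` parameters, and the cost values are hypothetical.

```python
import heapq

def static_makespan(costs, processors):
    """Round-robin static partitioning; returns the load of the busiest processor."""
    loads = [0.0] * processors
    for i, cost in enumerate(costs):
        loads[i % processors] += cost
    return max(loads)

def dlb_makespan(costs, processors, pool_fraction, chunk=1):
    """Static partitioning for the first items, a shared pool for the last pool_fraction.

    The processor with the smallest accumulated load (the first to go idle)
    takes the next `chunk` items from the pool.
    """
    pool_start = len(costs) - int(len(costs) * pool_fraction)
    loads = [0.0] * processors
    for i, cost in enumerate(costs[:pool_start]):
        loads[i % processors] += cost
    heap = [(load, p) for p, load in enumerate(loads)]
    heapq.heapify(heap)
    pool = costs[pool_start:]
    for start in range(0, len(pool), chunk):
        load, p = heapq.heappop(heap)
        load += sum(pool[start:start + chunk])
        heapq.heappush(heap, (load, p))
    return max(load for load, _ in heap)

# Hypothetical skewed workload: every fourth polygon is eight times as expensive.
costs = [8, 1, 1, 1] * 4
static_time = static_makespan(costs, processors=4)
dlb_time = dlb_makespan(costs, processors=4, pool_fraction=0.5)
```

With this workload, round-robin sends every expensive polygon to the same processor, and holding half of the work back in a pool halves the finishing time. A larger pool balances better but, on a real machine, costs more run-time transfers, which this sketch does not model.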
12
DLB Pool-Size Choice is Challenging!
13
Hardest – Location Prediction
[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]
14
Ex. 3: Hardest to Parallelize
Name                           Model
Classical Linear Regression    $y = x\beta + \varepsilon$
Spatial Auto-Regression        $y = \rho W y + x\beta + \varepsilon$

$\rho$: the spatial auto-regression (auto-correlation) parameter
$W$: $n$-by-$n$ neighborhood matrix over spatial framework
• Maximum Likelihood Estimation
$$\ln(L) = \ln\lvert I - \rho W \rvert - \frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^2)}{2} - SSE$$
• Need cloud computing to scale up to large spatial datasets.
• However, computing the determinant of a large matrix is an open problem!
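To make the cost of the determinant term concrete, here is a minimal sketch that evaluates a log-likelihood of the form ln|I − ρW| − n ln(2π)/2 − n ln(σ²)/2 − SSE. It assumes that form, takes σ² and SSE as given inputs rather than estimating them, and computes the log-determinant by dense Gaussian elimination. The function names are hypothetical, and this is not the estimation procedure used in the talk.

```python
import math

def log_abs_det(matrix):
    """ln|det(matrix)| by Gaussian elimination with partial pivoting, O(n^3) time."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total

def sar_log_likelihood(rho, W, sigma2, sse):
    """Log-likelihood under the assumed form; sigma2 and sse are supplied by the caller."""
    n = len(W)
    i_minus_rho_w = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
                     for i in range(n)]
    return (log_abs_det(i_minus_rho_w)
            - n * math.log(2.0 * math.pi) / 2.0
            - n * math.log(sigma2) / 2.0
            - sse)

# Two neighboring locations: W links each one to the other.
W = [[0.0, 1.0], [1.0, 0.0]]
value = sar_log_likelihood(rho=0.5, W=W, sigma2=1.0, sse=0.0)
```

The elimination needs O(n²) memory and O(n³) time for a dense W, and a maximum-likelihood search re-evaluates it for every candidate ρ, which is why the determinant dominates the cost for large n.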
15
Cloud Computing and Spatial Big Data
• Motivation: Spatial Big Data in National Security & Eco-routing
• Case Study 1: Simpler to Parallelize
  – Map-reduce is okay
  – Should it provide spatial declustering services?
  – Can query-compiler generate map-reduce parallel code?
• Case Study 2 – Harder
  – Need dynamic load balancing beyond map-reduce
• Case Study 3 – Hardest
  – Need new computer science, e.g.,
    • Eco-routing algorithms
    • Determinant of large matrix
    • Parallel formulation of evacuation route planning
16
Acknowledgments
• HPC Resources, Research Grants
  – Army High Performance Computing Research Center (AHPCRC)
  – Minnesota Supercomputing Institute (MSI)
• Spatial Database Group Members
  – Mete Celik, Sanjay Chawla, Vijay Gandhi, Betsy George, James Kang, Baris M. Kazar, QingSong Lu, Sangho Kim, Sivakumar Ravada
• USDOD
  – Douglas Chubb, Greg Turner, Dale Shires, Jim Shine, Jim Rodgers
  – Richard Welsh (NCS, AHPCRC), Greg Smith
• Academic Colleagues
  – Vipin Kumar
  – Kelley Pace, James LeSage
  – Junchang Ju, Eric D. Kolaczyk, Sucharita Gopal