LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server (WSV205)
Saptak Sen, Senior Product Manager, Microsoft Technical Computing


Page 1: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server
WSV205

Saptak Sen
Senior Product Manager
Microsoft Technical Computing

Page 2: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Session Objectives and Takeaways

Session Objective(s):
• Understand the Microsoft solution for Big Data
• How to use and develop LINQ to HPC applications
• Demo of LINQ to HPC/DSC on HPC, Microsoft’s solutions for unstructured Big Data

Key Takeaways:
1. LINQ to HPC/DSC provide a highly productive stack for writing big data applications.
2. Demos
   1. Data management with DSC
   2. Application development in LINQ to HPC
   3. Application management of LINQ to HPC applications

Page 3: LINQ to HPC: Developing Big Data Applications on Windows HPC Server
Page 4: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Characteristics of Big Data

Large Data Volume
• 100s of TBs to 10s of PBs

New Economics
• Large-scale processing and analytics at unprecedented low cost (hardware and software)

New Technologies
• Distributed parallel processing frameworks
• Easy to scale on commodity hardware
• MapReduce-style programming models

Non-Traditional Data Types
• Unstructured
• Weak relational schema
• Text, images, videos, logs

New Data Sources
• Sensors
• Devices
• Traditional applications
• Web servers
• Public data

New Questions & New Insights
• How popular is my product?
• What is the best ad to serve?
• Is this a fraudulent transaction?

Page 5: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Example: Traditional e-commerce data flow


Page 6: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

New exploratory e-commerce data flow


Page 7: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Introduction to LINQ to HPC

Developing Big Data applications for HPC Server

Page 8: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Example: find web pages from many log files

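// Fragment: logs, LogEntry, and UserPageCount are defined elsewhere.

// Drop comment lines and parse each remaining line.
var logentries = from line in logs
                 where !line.StartsWith("#")
                 select new LogEntry(line);

// Keep only the accesses made by user "sen".
var user = from access in logentries
           where access.user.EndsWith(@"\sen")
           select access;

// Count accesses per page.
var accesses = from access in user
               group access by access.page into pages
               select new UserPageCount("sen", pages.Key, pages.Count());

// Keep .htm pages, most frequently accessed first.
var htmAccesses = from access in accesses
                  where access.page.EndsWith(".htm")
                  orderby access.count descending
                  select access;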

LINQ query transformed into computation graph

[Figure: computation graph with stages numbered 1 to 5 on the slide. Stage labels: Input, Compute, Compute and resort, Compute and resort, Output.]
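To make the stages concrete, here is a minimal local sketch of the same four query steps. It is written in Python so it runs without a cluster, and it is not the LINQ to HPC API. The two-field `user page` log format, the function name `top_htm_pages`, and the sample data are assumptions made for illustration only.

```python
from collections import Counter

def top_htm_pages(log_lines, user_suffix="\\sen"):
    """Return (page, count) pairs for .htm pages hit by one user, busiest first."""
    # Stage 1: drop comment lines and parse "user page" records (assumed format).
    entries = [line.split() for line in log_lines
               if line and not line.startswith("#")]
    # Stage 2: keep only the accesses made by the user of interest.
    by_user = [(user, page) for user, page in entries
               if user.endswith(user_suffix)]
    # Stage 3: group by page and count accesses.
    counts = Counter(page for _, page in by_user)
    # Stage 4: keep .htm pages and sort by descending count.
    htm = [(page, n) for page, n in counts.items() if page.endswith(".htm")]
    return sorted(htm, key=lambda pc: pc[1], reverse=True)

# Hypothetical sample data.
logs = [
    "# fields: user page",
    r"CORP\sen /index.htm",
    r"CORP\sen /index.htm",
    r"CORP\sen /about.htm",
    r"CORP\sen /logo.png",
    r"CORP\lee /index.htm",
]
print(top_htm_pages(logs))  # [('/index.htm', 2), ('/about.htm', 1)]
```

The filter stages work on one record at a time, while grouping and sorting need all records for a key, which is consistent with the "Compute and resort" stages in the graph.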

Page 9: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

LINQ to HPC Job Directed Acyclic Graph (DAG) of vertices

[Figure: DAG diagram. Labels: Inputs, Processing vertices, Edges (files), Outputs.]

Page 10: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Executes DAGs by mapping vertices to Distributed Vertex Hosts

[Figure: DAG diagram mapped onto free compute resources. Labels: Inputs, Processing vertices, Edges (files), Outputs, Free Compute Resources.]
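The idea of running a DAG by handing ready vertices to free hosts can be sketched as follows. This is a toy model, not the LINQ to HPC scheduler: the function `run_dag`, the vertex names, and the host names are hypothetical, and every vertex is assumed to take one "wave" to finish.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(dependencies, hosts):
    """Assign ready vertices to hosts, wave by wave.

    dependencies maps a vertex to the set of vertices whose output it reads.
    Returns a list of waves; each wave is a list of (vertex, host) pairs.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    pending, waves = [], []
    while sorter.is_active():
        # Vertices whose inputs are all complete become ready.
        pending.extend(sorted(sorter.get_ready()))
        # Hand out at most one vertex per host; the rest wait for the next wave.
        batch, pending = pending[:len(hosts)], pending[len(hosts):]
        wave = list(zip(batch, hosts))
        waves.append(wave)
        for vertex, _host in wave:
            sorter.done(vertex)  # pretend the vertex ran to completion
    return waves

# Hypothetical graph: one input, two parallel computes, a resort, an output.
deps = {
    "compute1": {"input"},
    "compute2": {"input"},
    "resort": {"compute1", "compute2"},
    "output": {"resort"},
}
for wave in run_dag(deps, ["node1", "node2"]):
    print(wave)
```

With two hosts, the two compute vertices run in the same wave; with one host, they would run one after the other.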

Page 11: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

HPC + LINQ to HPC Job Overview

[Figure: an application that calls LINQ to HPC APIs, the HPC Head Node, DSC, and the HPC Compute Nodes (one Graph Manager, the others Vertex Hosts). Numbered steps from the figure:]

1. Submit LINQ to HPC job.
2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM (Graph Manager).
2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH (Vertex Hosts).
3a. Graph Manager starts/stops vertices.
3b. LINQ to HPC vertices read and write files.

Page 12: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

HPC + LINQ to HPC Job Overview

[Figure: HPC Compute Nodes with one Graph Manager and several Vertex Hosts, next to the vertices in the logical computation graph.]

3a. Graph Manager starts/stops Dryad vertices.
3b. Vertices read and write files.

• Graph Manager starts vertices on Vertex Hosts
• Preferentially schedules vertices near input files
  When input is already on the cluster, local I/O can be the common case
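A rough sketch of what "preferentially schedules vertices near input files" can mean in practice: given the hosts that are currently free, prefer one that already stores the vertex's input, and fall back to a remote read otherwise. The function `pick_host`, the file names, and the node names are hypothetical, and the real Graph Manager policy is not described on the slide.

```python
def pick_host(input_file, free_hosts, file_locations):
    """Choose a free host for a vertex, preferring one that stores its input.

    file_locations maps a file name to the set of hosts holding a copy.
    Returns (host, is_local).
    """
    if not free_hosts:
        raise ValueError("no free hosts")
    holders = file_locations.get(input_file, set())
    local = [host for host in free_hosts if host in holders]
    if local:
        return local[0], True       # input can be read from local disk
    return free_hosts[0], False     # fall back to reading over the network

# Hypothetical placement of two input partitions.
file_locations = {"part0": {"node1", "node2"}, "part1": {"node3"}}
print(pick_host("part1", ["node1", "node3"], file_locations))  # ('node3', True)
print(pick_host("part1", ["node1", "node2"], file_locations))  # ('node1', False)
```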

Page 13: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

More on HPC + LINQ to HPC mechanics

[Figure: an application that calls LINQ to HPC APIs, the HPC Head Node, DSC, and the HPC Compute Nodes (one LINQ to HPC Graph Manager, the others LINQ to HPC Vertex Hosts). Annotations from the figure:]

1. Publish to share:
   1. binaries for LINQ to HPC job
   2. XML description of LINQ to HPC graph
2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM.
2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH.

• DGM reads XML description of graph from share, calls DSC to locate files referenced in XML.
• DVH loads binaries for this LINQ to HPC job from share, executes them according to commands from DGM.

Page 14: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Deployment Steps

DSC NODE ADD sen-cn1 /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:sen-hn

Page 15: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Demo: Adding a new node

Using the HPC Management Tool

Page 16: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

LINQ to HPC Object Model

Page 17: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Hello World!

using System;
using System.Linq;
using Microsoft.Hpc.Linq;

namespace MyProgram
{
    class Program
    {
        static void Main(string[] args)
        {
            // Connect to the cluster through its head node.
            var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
            var context = new HpcLinqContext(config);

            // Read the DSC file set "MyTextData" as lines and project each line to its length.
            var lengths = context.FromDsc<LineRecord>("MyTextData")
                                 .Select(r => r.Line.Length);

            Console.WriteLine("The maximum line length is {0}", lengths.Max());
        }
    }
}
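One way to picture how a query like this can run across many nodes is to compute a maximum per partition and then merge the small partial results. The sketch below does that locally in Python. It is an illustration of the decomposition only, the function name and sample partitions are hypothetical, and the execution plan LINQ to HPC actually generates may differ.

```python
def max_line_length(partitions):
    """Compute the longest line length across partitions of text lines."""
    # Each partition could be handled by a separate vertex close to its data.
    partial = [max((len(line) for line in part), default=0)
               for part in partitions]
    # A final step merges the per-partition maxima into one answer.
    return max(partial, default=0)

# Hypothetical data: three partitions, one of them empty.
partitions = [["alpha", "be"], ["gamma rays"], []]
print("The maximum line length is", max_line_length(partitions))  # 10
```

Only one integer per partition has to move between nodes, which is why this kind of aggregate suits data that is already spread across the cluster.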

Page 18: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Demo: Analyzing data using LINQ to HPC

Page 19: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Managing data and HPC cluster

HPC Server administration basics:
• Managing the job queue
• How to identify the user that submitted jobs
• Canceling a runaway job

Distributed Storage Catalog (DSC) specific tasks:
• Monitor disk usage tracked by DSC on each node
• View how the DSC file set maps to NTFS across nodes
• Identify the nodes where files are replicated
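The DSC tasks above amount to queries over a mapping from file sets to files to the nodes that hold replicas. The toy model below shows that shape. It is not the DSC data model or API: the dictionary layout, sizes, node names, and function names are all hypothetical, and only the file set name `MyTextData` comes from the Hello World slide.

```python
from collections import defaultdict

# Hypothetical catalog: file set -> file -> (size in bytes, nodes holding a replica).
catalog = {
    "MyTextData": {
        "part0": (400, ["node1", "node2"]),
        "part1": (250, ["node2", "node3"]),
    }
}

def usage_per_node(catalog):
    """Sum the bytes stored on each node, counting every replica."""
    usage = defaultdict(int)
    for files in catalog.values():
        for size, nodes in files.values():
            for node in nodes:
                usage[node] += size
    return dict(usage)

def replica_nodes(catalog, file_set, file_name):
    """Return the nodes that hold a copy of one file in a file set."""
    return catalog[file_set][file_name][1]

print(usage_per_node(catalog))                        # {'node1': 400, 'node2': 650, 'node3': 250}
print(replica_nodes(catalog, "MyTextData", "part1"))  # ['node2', 'node3']
```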

Page 20: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Quick overview of the software components that made this possible.

[Figure: layered software stack, top to bottom.]

Programming models
• LINQ to HPC (NEW)

Distributed runtimes
• MPI
• SOA
• LINQ to HPC runtime

Cluster and cloud services
• HPC provisioning, management, etc.
• DSC (Distributed Storage Catalog): binds individual NTFS shares together to support the LINQ to HPC distributed runtime

Platform
• Windows Server
• Azure*

* Future support planned

Page 21: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

How LINQ to HPC and Parallel Data Warehouse complement each other

Customer needs for Big Data lie on a spectrum.
• One extreme is analytics targeting a traditional data warehouse. The analyst knows the cube he or she wants to build, and the analyst knows the data sources.
• Another extreme is analyzing raw unstructured data. The analyst does not know exactly what the data contains, nor what cube would be justified. The analyst needs to do ad-hoc analyses that may never be run again.

HPC Server targets the raw unstructured data extreme.

Page 22: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Microsoft already has great data platform assets

• PowerPivot, SQL Server Integration Services (SSIS), Parallel Data Warehouse (PDW), …
• HPC + LINQ to HPC’s focus on raw unstructured data analytics enables new solutions that incorporate multiple assets.
  E.g., analyze raw unstructured data using HPC + LINQ to HPC, then pipe it to SSIS and apply the rest of the BI stack.

Page 24: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

For more information

Download HPC Server 2008 R2 Evaluation Copy Today – microsoft.com/hpc

Download Service Pack 2 Beta - connect.microsoft.com

HPC Server Hands-on Labs – microsoft.com/hpc -> Technical Resources

Product Demo Station – in the Server and Cloud Section

HPC Server Certification Exam - microsoft.com/learning/en/us/exam.aspx?ID=70-690

Find Me Later At… twitter: @saptak

Page 25: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Resources

Connect. Share. Discuss.
http://northamerica.msteched.com

Learning
• Sessions On-Demand & Community: www.microsoft.com/teched
• Microsoft Certification & Training Resources: www.microsoft.com/learning
• Resources for IT Professionals: http://microsoft.com/technet
• Resources for Developers: http://microsoft.com/msdn

Page 26: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Complete an evaluation on CommNet and enter to win!

Page 27: LINQ to HPC: Developing Big Data Applications on Windows HPC Server

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 28: LINQ to HPC: Developing Big Data Applications on Windows HPC Server