Microsoft Big Data Platform, Luka Lovošević, Marko Tošić, Microsoft Croatia

(Windays 13) Microsoft Big Data Platform

DESCRIPTION

Microsoft Big Data Platform Big Data Cloud Azure Hadoop HDInsight Mahout

Page 1: (Windays 13) Microsoft Big Data Platform

Microsoft Big Data Platform

Luka Lovošević, Marko Tošić

MICROSOFT CROATIA

Page 2: (Windays 13) Microsoft Big Data Platform

Please mute your phones

Page 3: (Windays 13) Microsoft Big Data Platform

Contents
• Introduction to Big Data
• Overview of the Microsoft platform
• Hadoop
• Demo

Page 4: (Windays 13) Microsoft Big Data Platform

What is Big Data?

Page 5: (Windays 13) Microsoft Big Data Platform


Page 6: (Windays 13) Microsoft Big Data Platform

What is Big Data?

Data that matters to you, but that you cannot process with traditional tools.

VOLUME (quantity)

VARIETY (structure)

VELOCITY (speed)

Page 7: (Windays 13) Microsoft Big Data Platform

Data sources
• Telematics
• Text
• Smart-Grid
• Sensor
• Time and Place
• RFID
• Telemetry
• Social Networks

Page 8: (Windays 13) Microsoft Big Data Platform

What is Big Data?

Advanced analytics

Real-time data

Social media analytics

How can I improve my business depending on weather conditions or gossip from social networks, …?

What is being said about my product on social networks?

How can I spot trends better and react to them?

Page 9: (Windays 13) Microsoft Big Data Platform

Big Data algorithms
• Mining Social-Network Graphs
• Finding Similar Items
• Mining Data Streams
• Frequent Item Sets
• Advertising on the Web
• Link Analysis
• Recommendation Systems
• Clustering

Page 10: (Windays 13) Microsoft Big Data Platform

Microsoft Big Data Platform

Page 11: (Windays 13) Microsoft Big Data Platform

Microsoft Big Data Platform
• SQL Server StreamInsight
• Hadoop – HDInsight (Windows or Azure)
• SQL Server 2012 Parallel Data Warehouse
• Self-service BI tools

Page 12: (Windays 13) Microsoft Big Data Platform

Microsoft Big Data Platform

[Diagram: SQL Server, PDW, HDInsight, and StreamInsight placed along three axes. Axis labels: Volume (small, big), Variety (fk/pk, k/v), Velocity (pull, push).]

Page 13: (Windays 13) Microsoft Big Data Platform

A bit more about Hadoop…

Page 14: (Windays 13) Microsoft Big Data Platform

What is Hadoop?
• A platform for processing large volumes of data.
• Apache, open source.
• Based on Google's GFS and the MapReduce algorithm.
• Highly scalable and distributed.
• Runs on inexpensive hardware.

[Timeline, 2004 to 2013 (marks at 2004, 2006, 2008, 2010, 2012, 2013), with the labels: Apache project, Yahoo!, Enterprise Hadoop.]

Page 15: (Windays 13) Microsoft Big Data Platform

Hadoop architecture

Page 16: (Windays 13) Microsoft Big Data Platform

MapReduce (i)

[Diagram: files spread across several servers.]

Page 17: (Windays 13) Microsoft Big Data Platform

MapReduce (ii)

// MapReduce word count in JavaScript

// map: split each input line into words and emit (word, 1) for every word
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// reduce: add up all the counts emitted for one word
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

[Diagram: the code distributed to four servers.]

Page 18: (Windays 13) Microsoft Big Data Platform

MapReduce example
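
To make the word-count job from the MapReduce (ii) slide concrete, here is a minimal sketch that runs it in memory on two made-up input lines. The map and reduce functions are repeated so that the snippet runs on its own. The `runLocally` driver and its `context` and iterator objects are assumptions written for this illustration; they only imitate the interface the slide's functions expect and are not part of Hadoop or HDInsight.

```javascript
// Word-count map and reduce functions, in the style of the slide.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical in-memory driver (not a Hadoop API): runs map over every
// input line, groups the intermediate pairs by key, then runs reduce per key.
function runLocally(lines) {
  var groups = Object.create(null);
  var mapContext = {
    write: function (k, v) {
      if (!groups[k]) groups[k] = [];
      groups[k].push(v);
    }
  };
  lines.forEach(function (line, offset) {
    map(offset, line, mapContext);
  });

  var result = Object.create(null);
  var reduceContext = {
    write: function (k, v) { result[k] = v; }
  };
  Object.keys(groups).sort().forEach(function (k) {
    var vals = groups[k];
    var i = 0;
    // Minimal iterator exposing the hasNext/next calls that reduce uses.
    var iterator = {
      hasNext: function () { return i < vals.length; },
      next: function () { return vals[i++]; }
    };
    reduce(k, iterator, reduceContext);
  });
  return result;
}

var counts = runLocally(["Big Data on Hadoop", "Hadoop and big data"]);
console.log(counts);
```

On a real cluster the grouping step is done by the framework across many servers; the sketch collapses it into a single object only to show how data flows from map to reduce.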

Page 19: (Windays 13) Microsoft Big Data Platform

HDInsight

Hadoop

• Programming in .NET
• Security, HA & management
• Virtualization support
• Integration with Microsoft BI tools
• Same experience on-premises and in the cloud

Hadoop for Windows Server

Hadoop for Windows Azure

Page 20: (Windays 13) Microsoft Big Data Platform

Technologies around HDInsight

Page 21: (Windays 13) Microsoft Big Data Platform

Mahout

A library of scalable machine learning algorithms based on MapReduce. It runs on Hadoop infrastructure.

Usage scenarios:
• Recommendation mining
• Clustering
• Classification

Page 22: (Windays 13) Microsoft Big Data Platform

Demo

Mahout song recommendation
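
The idea behind a song recommender can be sketched with a simple item co-occurrence count: songs that often appear together in users' listening histories are recommended to each other's listeners. The snippet below is only an illustration of that idea under stated assumptions. It does not use Mahout's API, it is not the code from the demo, and the `plays` data and the function names `buildCooccurrence` and `recommend` are made up for this example.

```javascript
// Hypothetical listening history: user -> songs played (made-up data).
var plays = {
  ana:   ["songA", "songB", "songC"],
  ivan:  ["songA", "songB"],
  marta: ["songB", "songC", "songD"]
};

// Count how often each pair of songs appears in the same user's history.
function buildCooccurrence(history) {
  var co = Object.create(null);
  Object.keys(history).forEach(function (user) {
    var songs = history[user];
    songs.forEach(function (a) {
      songs.forEach(function (b) {
        if (a === b) return;
        co[a] = co[a] || Object.create(null);
        co[a][b] = (co[a][b] || 0) + 1;
      });
    });
  });
  return co;
}

// Score songs the user has not heard by their summed co-occurrence
// with the songs the user has heard; return the top `howMany`.
function recommend(history, user, howMany) {
  var co = buildCooccurrence(history);
  var seen = history[user];
  var scores = Object.create(null);
  seen.forEach(function (s) {
    var row = co[s] || {};
    Object.keys(row).forEach(function (candidate) {
      if (seen.indexOf(candidate) === -1) {
        scores[candidate] = (scores[candidate] || 0) + row[candidate];
      }
    });
  });
  return Object.keys(scores)
    .sort(function (x, y) {
      // Highest score first; ties broken alphabetically.
      return scores[y] - scores[x] || (x < y ? -1 : 1);
    })
    .slice(0, howMany);
}

var ivanPicks = recommend(plays, "ivan", 2);
console.log(ivanPicks);
```

At Big Data scale the pair counting would be distributed as a MapReduce job rather than done in nested loops on one machine, which is the kind of work a library such as Mahout takes on.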

Page 23: (Windays 13) Microsoft Big Data Platform

Questions and answers