Big Data: Microsoft Has an Elephant in the Race, Too
Luka Lovošević, Antonio Faletar
Microsoft Hrvatska

(ATD 9) Microsoft Big Data Platform


Page 1: (ATD 9) Microsoft Big Data Platform


Page 2: (ATD 9) Microsoft Big Data Platform

Contents

Introduction to Big Data

Overview of the Microsoft platform

Hadoop

Demo

Page 3: (ATD 9) Microsoft Big Data Platform

What is Big Data?

Page 4: (ATD 9) Microsoft Big Data Platform


Page 5: (ATD 9) Microsoft Big Data Platform

What is Big Data?

Data that matters to you but that traditional tools cannot process.

VOLUME (quantity)

VARIETY (structure)

VELOCITY (speed, real-time)

Page 6: (ATD 9) Microsoft Big Data Platform

Data sources

Logs, text

Smart homes, sensors

Time and location, RFID

Telemetry, social networks

Page 7: (ATD 9) Microsoft Big Data Platform

Big Data algorithms

Social network analysis

Similar items (e.g. in a web shop)

Real-time analysis

Frequent itemsets

Web advertising

Analysis of related terms

Recommender systems

Clustering (grouping)
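To make one item from the list above concrete, here is a minimal sketch of frequent itemsets restricted to pairs of items. It is a single-machine illustration, not code from the talk. The function name `frequentPairs`, the sample baskets and the `minSupport` threshold are all assumptions made for the example.

```javascript
// Counts how often each unordered pair of items appears in the same basket
// and returns the pairs seen in at least minSupport baskets.
// Assumption: item names never contain the "|" character used as a separator.
function frequentPairs(baskets, minSupport) {
    const counts = new Map();
    for (const basket of baskets) {
        // Remove duplicates and sort so that (a, b) and (b, a) share one key.
        const items = Array.from(new Set(basket)).sort();
        for (let i = 0; i < items.length; i++) {
            for (let j = i + 1; j < items.length; j++) {
                const pair = items[i] + "|" + items[j];
                counts.set(pair, (counts.get(pair) || 0) + 1);
            }
        }
    }
    return Array.from(counts)
        .filter(([, n]) => n >= minSupport)
        .map(([pair, n]) => ({ pair: pair.split("|"), count: n }));
}

// Hypothetical shopping baskets.
const baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"]
];
const pairs = frequentPairs(baskets, 2);
console.log(pairs);
```

With these baskets, the three pairs built from bread, butter and milk each occur twice and are kept, while the eggs and milk pair occurs once and is dropped.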

Page 8: (ATD 9) Microsoft Big Data Platform

Microsoft Big Data platform

Page 9: (ATD 9) Microsoft Big Data Platform

Microsoft Big Data platform

Hadoop – HDInsight

(Windows or Azure)

SQL Server 2012 Parallel Data Warehouse

SQL Server StreamInsight

Self-service BI tools

Page 10: (ATD 9) Microsoft Big Data Platform

A bit more about Hadoop

Page 11: (ATD 9) Microsoft Big Data Platform

What is Hadoop?

A platform for processing large volumes of data

Apache, open source

Modeled on Google's GFS and MapReduce

Highly scalable and distributed

Commodity hardware

[Timeline with year marks 2004, 2006, 2008, 2010, 2012, 2013 and the labels: Apache project, Yahoo!, Enterprise Hadoop]

Page 12: (ATD 9) Microsoft Big Data Platform

Hadoop architecture

Page 13: (ATD 9) Microsoft Big Data Platform

[Diagram: MapReduce, with the data spread across four nodes]

Page 14: (ATD 9) Microsoft Big Data Platform

// MapReduce functions in JavaScript (word count)

var map = function (key, value, context) {
    // Split the input line on every non-letter character.
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            // Emit (word, 1) for each non-empty token.
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Sum all counts emitted for this word.
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

[Diagram: MapReduce, with the program sent to four nodes]

Page 15: (ATD 9) Microsoft Big Data Platform

MapReduce example
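The word-count job from the MapReduce slide can be traced on a single machine. The sketch below repeats the slide's `map` and `reduce` functions and adds a hypothetical driver, `runLocally`, that imitates what the cluster does between the two phases by grouping emitted pairs by key. The driver and the sample input are assumptions for illustration only; they are not part of Hadoop or HDInsight.

```javascript
// Word-count map and reduce, in the same shape as on the slide.
const map = function (key, value, context) {
    const words = value.split(/[^a-zA-Z]/);
    for (let i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

const reduce = function (key, values, context) {
    let sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical single-process driver: runs map over every input line,
// groups the emitted pairs by key (the "shuffle"), then runs reduce per key.
function runLocally(lines, mapFn, reduceFn) {
    const groups = new Map();
    const mapContext = {
        write: function (k, v) {
            if (!groups.has(k)) groups.set(k, []);
            groups.get(k).push(v);
        }
    };
    lines.forEach(function (line, offset) { mapFn(offset, line, mapContext); });

    const result = {};
    const reduceContext = { write: function (k, v) { result[k] = v; } };
    Array.from(groups.keys()).sort().forEach(function (k) {
        const vals = groups.get(k);
        let i = 0;
        // Iterator with the hasNext()/next() interface the reduce function expects.
        const iterator = {
            hasNext: function () { return i < vals.length; },
            next: function () { return vals[i++]; }
        };
        reduceFn(k, iterator, reduceContext);
    });
    return result;
}

const counts = runLocally(["Big data is big", "Data about data"], map, reduce);
console.log(counts); // { about: 1, big: 2, data: 3, is: 1 }
```

On a real cluster the map calls run on the nodes that hold the data and the grouping happens over the network; only the two user-supplied functions stay the same.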

Page 16: (ATD 9) Microsoft Big Data Platform

Tools for successful Hadooping

Page 17: (ATD 9) Microsoft Big Data Platform

Pig

Processing and shaping data

ETL tool

MapReduce
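Pig scripts describe a data flow (load, filter, group, aggregate) that Pig turns into MapReduce jobs. The sketch below is a rough plain-JavaScript analogue of such an ETL flow, not Pig Latin and not a Pig API. The log format, the sample lines and the function names `parseLine` and `errorsByMessage` are assumptions made for the example.

```javascript
// Hypothetical raw input: "<timestamp> <level> <message>" per line.
const rawLines = [
    "2013-05-01T10:00:00 INFO service started",
    "2013-05-01T10:00:05 ERROR disk full",
    "garbage line",
    "2013-05-01T10:01:00 ERROR disk full",
    "2013-05-01T10:02:00 WARN slow response",
    "2013-05-01T10:03:00 ERROR timeout"
];

// Extract: parse one line into a record, or return null if it does not match.
function parseLine(line) {
    const m = /^(\S+) (INFO|WARN|ERROR) (.+)$/.exec(line);
    return m ? { time: m[1], level: m[2], message: m[3] } : null;
}

// Transform: drop malformed lines, keep errors only, count them per message.
// Load: return rows sorted by message, ready to be written to a table or file.
function errorsByMessage(lines) {
    const totals = new Map();
    lines
        .map(parseLine)
        .filter((rec) => rec !== null && rec.level === "ERROR")
        .forEach((rec) => {
            totals.set(rec.message, (totals.get(rec.message) || 0) + 1);
        });
    return Array.from(totals.keys())
        .sort()
        .map((message) => ({ message: message, count: totals.get(message) }));
}

const report = errorsByMessage(rawLines);
console.log(report);
```

Each step in the chain corresponds loosely to one statement in a data-flow script, which is the appeal of this style over hand-written map and reduce functions.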

Page 18: (ATD 9) Microsoft Big Data Platform

Hive

Structuring data

SQL syntax

ODBC, Excel …

MapReduce
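Hive accepts SQL-like queries and runs them as MapReduce jobs. As a sketch of the general idea only (not of how Hive actually plans a query), the example below shows how a GROUP BY aggregate splits into a map step that emits the grouping key and a reduce step that applies the aggregate. The table `weblog`, its columns and all function names are hypothetical.

```javascript
// Query being imitated (HiveQL-style, hypothetical table and columns):
//   SELECT page, SUM(bytes) AS total FROM weblog GROUP BY page;
const weblog = [
    { page: "/home", bytes: 1200 },
    { page: "/about", bytes: 300 },
    { page: "/home", bytes: 800 },
    { page: "/about", bytes: 200 },
    { page: "/contact", bytes: 50 }
];

// Map phase: emit (GROUP BY key, value to aggregate) for every row.
const mapRow = (row) => [row.page, row.bytes];

// Reduce phase: apply the aggregate (here SUM) to all values of one key.
const reduceGroup = (values) => values.reduce((a, b) => a + b, 0);

function groupBySum(rows) {
    const groups = new Map(); // shuffle: collect the values for each key
    for (const row of rows) {
        const [key, value] = mapRow(row);
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(value);
    }
    return Array.from(groups, ([page, values]) => ({
        page: page,
        total: reduceGroup(values)
    }));
}

const totals = groupBySum(weblog);
console.log(totals);
```

The result rows are the per-page sums (2000 for /home, 500 for /about, 50 for /contact), which is what the query in the comment would return for the same data.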

Page 19: (ATD 9) Microsoft Big Data Platform

Mahout

A library of ready-made algorithms

Machine learning (e.g. clustering, recommendation, …)

MapReduce
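Clustering is one of the algorithm families named on this slide. To show what such an algorithm does, here is a minimal single-machine k-means sketch in plain JavaScript. It is not Mahout's API and it does not run on Hadoop. The function name `kMeans`, the choice of the first k points as starting centroids and the sample points are assumptions made so that the run is small and deterministic.

```javascript
// Minimal k-means on 2-D points.
// Assumption: the first k points serve as the initial centroids.
function kMeans(points, k, maxIterations = 100) {
    let centroids = points.slice(0, k).map((p) => p.slice());
    let assignment = new Array(points.length).fill(-1);

    for (let iter = 0; iter < maxIterations; iter++) {
        // Assignment step: attach each point to its nearest centroid.
        const next = points.map((p) => {
            let best = 0;
            let bestDist = Infinity;
            centroids.forEach((c, idx) => {
                const d = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2;
                if (d < bestDist) {
                    bestDist = d;
                    best = idx;
                }
            });
            return best;
        });

        const changed = next.some((a, i) => a !== assignment[i]);
        assignment = next;
        if (!changed) break; // converged: no point switched cluster

        // Update step: move each centroid to the mean of its members.
        centroids = centroids.map((c, idx) => {
            const members = points.filter((_, i) => assignment[i] === idx);
            if (members.length === 0) return c; // leave an empty cluster in place
            const sx = members.reduce((s, p) => s + p[0], 0);
            const sy = members.reduce((s, p) => s + p[1], 0);
            return [sx / members.length, sy / members.length];
        });
    }
    return { centroids: centroids, assignment: assignment };
}

// Hypothetical data: two visibly separate groups of points.
const points = [[1, 1], [9, 9], [1, 2], [8, 9], [2, 1], [9, 8]];
const clusters = kMeans(points, 2);
console.log(clusters.assignment); // [ 0, 1, 0, 1, 0, 1 ]
```

The value of a library such as Mahout is that the assignment and update steps are already written to run as distributed jobs over data too large for one machine.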

Page 20: (ATD 9) Microsoft Big Data Platform

HDInsight

Hadoop

Programming in .NET

Security, HA & management

Virtualization support

Integration with Microsoft BI tools

The same experience on-premises and in the cloud

Hadoop for Windows Server

Hadoop for Windows Azure

Page 21: (ATD 9) Microsoft Big Data Platform

Demo

Windows Azure HDInsight

Page 22: (ATD 9) Microsoft Big Data Platform

Hadoop 2.0

Hortonworks Stinger initiative

Tez (interactive) vs. batch

Streaming (Storm project), etc.

Page 23: (ATD 9) Microsoft Big Data Platform

Conclusion

Big Data is a trend

Hadoop is the de facto standard

Windows Azure HDInsight

Open source

Page 24: (ATD 9) Microsoft Big Data Platform

Questions?

Page 25: (ATD 9) Microsoft Big Data Platform

Thank you!