GOTO 2011 preso: 3x Hadoop

iso va

n Voll

86.88.37.142 - - [26/Jul/2011:00:01:46 +0200] "GET /nl/index.html?Referrer=ADVNLGOO22901030000bsl HTTP/1.1" 200 15551 "http://www.google.nl/search?sourceid=navclient&aq=0h&oq=b&hl=nl&ie=UTF-8&q=bol.com.nl" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "DYN_USER_ID=12660142780; DYN_USER_CONFIRM=8bc25ea623423bae5c4ce970faf1b13f4; BOL_RFID=ADVNLGOO1322090000bsl; BUI=86.55.31.109.1278181451852406" 0 "Ti3nysCoEI4AAGMfqZAAAAPD" "-" "325886" "ps316"

Millions of these, each day

Egypt @ Jan 27, 2011

BGP4MP|980099497|A|193.148.15.68|3333|192.37.0.0/16|3333 5378 286 1836|IGP|193.148.15.140|0|0||NAG||

Hundreds of millions of these, each day

the internet works because of these (and cables and routers and money and people and stuff)

Date Node

Name Node

/some/file /foo/bar

HDFS client create file

write data

read data

replicate

Node localHDFS client

read data

Why ?scalable

open sourcecost-efficient

storage and processing

in one

good for analytics: schema-less, unstructured

Not for me...

I don’t have a lot of data.

I surely don’t have a cluster of machines to spare.

I just read the paper.

It’d be cool if I could try this stuff sometime, though...

Free data...

Getting it...

curl -u fzk:secret \https://stream.twitter.com/1/statuses/sample.json \> tweets.json

8 weeks == ~1/4 TB

Tens of millions of these

Good, now the cluster...

http://whirr.apache.org/

Step 1: Configure

Step 2: Launch

Step 3: ?

Step 4: Pay

whirr.service-name=hadoopwhirr.cluster-name=my-clusterwhirr.instance-templates=\1 hadoop-jobtracker+hadoop-namenode, \19 hadoop-datanode+hadoop-tasktracker

whirr.provider=aws-ec2whirr.identity=SECRETwhirr.credential=EVEN-MORE-SECRETwhirr.private-key-file=${sys:user.home}/.ssh/id_rsawhirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub

whirr.hadoop-install-function=install_cdh_hadoopwhirr.hadoop-configure-function=configure_cdh_hadoop

whirr.hardware-id=c1.xlarge

Step 1: Configure

whirr launch-cluster --config cluster.properties

Step 2: Launch

bash .whirr/my-cluster/hadoop-proxy.sh

wait about 20 minutes...

Twitter mentions

What’s up with Microsoft?

Step 3:

“Hello, Oracle”

“Google vs. Microsoft vs. Apple”

“Apache rocks! Oracle not so much...”

“Apple == iAwesome”

Oracle, 1Google, 1Microsoft, 1Apple, 1Apache, 1Oracle, 1Apple, 1

input: text

split words

emit:$WORD, 1for ‘interesting’ words

MAGIC!

map(input record) => (key, value)

ORDER BY key GROUP BY key

reduce(key, values) => (key, value)

Apache: [1]

Apple: [1,1]

Google: [1]

Microsoft: [1]

Oracle: [1,1]

REDUCE

Apache: 1Apple: 2Google: 1Microsoft: 1Oracle: 2

input: text, count

sum values

emit:$KEY, $SUM for all keys

https://github.com/xebia/BigData-University

hadoop jar bigdata-twitter-0.1-SNAPSHOT-job.jar \-Dxebia.twitter.terms=oracle,google,microsoft,apache \s3://training-hdfs/twitter-sample/* /job-output

wait another 20 minutes...

mvn clean install

export HADOOP_CONF_DIR=$HOME/.whirr/my-cluster

hadoop fs -get /job-output/part-r-00000 .

whirr destroy-cluster --config cluster.properties

20110807 apache 220110807 google 42220110807 microsoft 4420110807 oracle 1120110808 apache 2520110808 google 134120110808 microsoft 16020110808 oracle 3720110809 apache 1720110809 google 143120110809 microsoft 18420110809 oracle 4020110810 apache 1220110810 google 168820110810 microsoft 17920110810 oracle 51

From: no-reply-aws@amazon.comSubject: AWS Billing Statement Available

Greetings from Amazon Web Services,

This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:

Total: $218.02

Thank you for using Amazon Web Services.

Sincerely,Amazon Web Services

Step 4: Pay

Q&A@fzk

fvanvollenhoven@xebia.com

GOTO 2011 preso: 3x Hadoop

Technology

GOTO Fast Delivery - GOTO Conferencegotocon.com/dl/goto-aar-2014/slides/AdrianCockcroft_FastDelivery.pdfNon-Destructive Production Updates “Immutable Code” Service Pattern Existing

Goto Market Strategy

Goto night elasticsearch

kennansibusoftball.kounosusoftball.comkennansibusoftball.kounosusoftball.com/2020ibentodankaikanwame… · Y 700 Y 700 * GoTo* Ck (50%) * GoTo* IC Z (50008k X It 500/0) * GoTo* IC

Sam Goto (goto@google.com) Majid Valipour (majidvp@google

Goto devoxx

, Shin Goto , Ichiei

H15.01 goto furoku01_sahalin

DESIGNING FOR RAPID RELEASE - GOTO Blog – GOTO ...gotocon.com/.../slides/SamNewman_DesigningForRapidRelease.pdf · GOTO: Aarhus 2012 @samnewman Manual Circuit Breaker MusikShop

Krok Round... 1x 1x 1x 12x 3x 3x 3x 3x 12x3x 1x KROK ROUND

scoop.or.jp...8 Go To 7 H 22 To 10 1 fGoTo GOTO 1 *350010B, JTB Go To GO TO (10/1 GOTO -to GOTO GOTO GOTO

Summary Goto

Goto Stage

Goto chicago

GoTo USA Focused Delivery Program Assembly · PDF fileUSL00015/04.2013 | Aluminum Framing GoTo Bosch Rexroth Corporation 3 GoTo Focused Delivery Program The GoTo Focused Delivery Program

Goto islands

GOTO Berlin 2016

IS -M R 15 /1 · 2020. 7. 7. · e2 1x 2x f2 1x a2 b2 c2 a2 f2 e2 8 is-pl2-klz 19/10. 6 j2 3x 6x 3x 3x 3x 3x 3x 3x 3x 1x y j2-5 j2-3 j2-1 j2-4 j2-6 3x 3x j2-2 6x j2-5 9 is-pl2-klz

Cardiology PTL for the quiz: goto WiFi and join dlink goto ...€¦ · Cardiology PTL for the quiz: goto WiFi and join dlink goto webpage 192.168.0.191:8080 Dr Justin Ghosh Clinical

Final ppt GOTO