2012. 12. 07 http://kse.kaist.ac.kr KAIST

bigdata talk uichin v2 - KAIST · kseworkshop.kaist.ac.kr/2012/data/2012_KSE_workshop_lecture2.pdf


Page 1

2012. 12. 07

http://kse.kaist.ac.kr

KAIST

Page 2

Page 3

– data.gov (data.or.kr), recovery.gov, challenge.gov

– (Polarization; balkanization)

Page 4

What Can We Do?

2011.11

2012.12

Page 5

What Can We Do?

Page 6

• (Doug Schuler, 1999)

• MMORPG

Page 7

[Figure: Applications, Content provider, Content, Networking, Fixed access, Radio access]

Page 8

References

Schuler 1994. Social computing, CACM, 1994

Wikipedia. http://wikipedia.org/wiki/Social_computing

Wang et al. 2007. Social computing: From Social Informatics to Social Intelligence, IEEE Intelligent Systems

Page 9

Social computing: From social informatics to social intelligence, Wang et al., IEEE Intelligent Systems, 22(2), 79-83 2007

Page 10

Page 11

• (sentiment analysis/opinion mining)

Page 12

• LinkedIn’s People You May Know (PYMK)

• Facebook’s PYMK

• Google/Amazon’s web log analytics

DJ Patil, Building Data Science Teams, 2012

Page 13

“Acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products and navigate the competitive landscape”

DJ Patil, Building Data Science Teams, 2012

Product delivery (e.g., leveraging smarts in product features)

Page 14

Page 15

Page 16

– API

Page 17

• Structured data

• Semi-structured data

– “self-describing”

– XML, JSON

• Unstructured data

http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/notes.html
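
The difference between structured and semi-structured data can be illustrated with a small sketch. The sample values, field names, and the `field_names` helper are made up for illustration: the CSV rows share one fixed schema, while each JSON record is "self-describing" and carries its own field names.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (the header row).
csv_text = "id,name,city\n1,Kim,Daejeon\n2,Lee,Seoul\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: each record names its own fields, and records
# may differ from one another (hypothetical sample data).
json_text = (
    '[{"id": 1, "name": "Kim", "tags": ["a", "b"]},'
    ' {"id": 2, "name": "Lee", "city": "Seoul"}]'
)
records = json.loads(json_text)


def field_names(record):
    """Return the sorted field names that one record describes for itself."""
    return sorted(record)


for record in records:
    print(field_names(record))
```

Code that reads the JSON records has to check which fields are present, whereas the CSV reader can rely on the header.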

Page 18

• HTTP (Hypertext Transfer Protocol)

• GET request, response

• GET /parking/space.asp

200 OK … data data
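
The GET exchange on this slide can be sketched as plain text, without touching the network. The host name `example.org`, the helper names, and the canned response are assumptions for illustration; only the path `/parking/space.asp` and the `200 OK` status come from the slide.

```python
def build_get_request(path, host):
    """Build the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_response(raw):
    """Split a raw HTTP response into (status, reason, headers, body).

    Assumes the status line carries a reason phrase, as in "200 OK".
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


request_text = build_get_request("/parking/space.asp", "example.org")
canned = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\ndata data"
print(parse_response(canned))
```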

Page 19

– IP

– Request method (GET, POST, etc.)

– Status code (OK: 200, Not Found: 404)

– Referrer URL

– User agent

64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] "GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" 404 211
64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] "GET /java/FixFontHeadline.class HTTP/1.0" 200 2898
64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] "GET /graphics/trombone.gif HTTP/1.0" 200 1050
64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] "GET /images/joinband.jpg HTTP/1.0" 200 13457
64.12.97.9 - - [16/Feb/2001:07:00:30 -0800] "GET /images/parade.jpg HTTP/1.0" 200 22754
128.93.11.53 - - [16/Feb/2001:10:20:53 -0800] "GET /schedule.shtml HTTP/1.0" 200 7103
128.93.11.53 - - [16/Feb/2001:10:26:48 -0800] "GET /index.shtml HTTP/1.0" 200 8650
128.93.11.53 - - [16/Feb/2001:10:21:18 -0800] "GET /about.shtml HTTP/1.0" 200 9151
128.93.11.53 - - [16/Feb/2001:10:26:25 -0800] "GET /communty.shtml HTTP/1.0" 200 5731
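
A minimal sketch of how such entries could be parsed, assuming they follow the Common Log Format (host, identity, user, timestamp, request line, status, size). The regular expression and the function name are illustrative, and the two sample lines assume that each entry ends with a byte count right before the next entry's IP address.

```python
import re
from collections import Counter

# One Common Log Format entry: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = [
    '64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] '
    '"GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" 404 211',
    '64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] '
    '"GET /java/FixFontHeadline.class HTTP/1.0" 200 2898',
]

entries = [e for e in map(parse_line, SAMPLE) if e is not None]
status_counts = Counter(e["status"] for e in entries)
print(status_counts)
```

Counting by `host` instead of `status` gives a rough per-client view, although one IP address does not necessarily correspond to one user.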

Page 20

– NAT (Network Address Translation)

Page 21

API

• REST: REpresentational State Transfer

– HTTP (client/server + stateless server)

– URL

– State transfer

– JSON, XML

GET http://search.twitter.com/trends.json

Returns the top ten topics that are currently trending on Twitter.

GET Read

POST Create

PUT Update

DELETE Delete

How to access top ten Twitter topics?
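
The verb table above maps directly onto request objects. The following is a sketch only: the `CRUD_TO_HTTP` table and `build_request` helper are illustrative names, the request is built but never sent, and the trends URL is simply the one shown on the slide.

```python
import json
import urllib.request

# CRUD operation -> HTTP method, as listed on the slide.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Build (but do not send) a request for one CRUD operation on a URL."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        url, data=data, headers=headers, method=CRUD_TO_HTTP[operation]
    )


# Reading the trending topics is a GET on the resource's URL.
trends = build_request("read", "http://search.twitter.com/trends.json")
print(trends.get_method(), trends.full_url)
```

Because the server is stateless, each request has to carry everything needed to process it, which is why the resource is fully identified by the URL.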


Page 22

API

• 140

• REST API

Page 23

API

• Twitter REST API v1.1

– https://dev.twitter.com/docs/api/1.1

• My Applications (OAuth + Access tokens)

– https://dev.twitter.com/apps

use Net::Twitter::Lite;

# Credentials issued under "My Applications"
my $nt = Net::Twitter::Lite->new(
    consumer_key        => 'my consumer key',
    consumer_secret     => 'my consumer secret',
    access_token        => 'my access token',
    access_token_secret => 'my access token secret',
);

# Search for "google"; eval traps any error raised by the API call
my $r = eval { $nt->search({ q => "google" }) };

# Print the text of each returned status
for my $status ( @{ $r->{results} } ) {
    print "$status->{text}\n";
}

Karok smpai lebam with @SyarhDinie @MyoArieff @muhd_google @DzuLHarith
Google Sources Say Company Didn't Buy ICOA Wireless (Arik Hesseldahl/AllThingsD) http://t.co/neX31U01
Wide character in print at test.pl line 14.
RT @NMB_gplus: [高野祐衣] 寝 !! http://t.co/p4p9P8vh #nmb
LG Optimus G2 Allegedly Packs 2GHz Quad-Core Processor, 5-inch Display http://t.co/txmdmKel #tekfalke
RT @NancyGraceHLN: Did #TotMom detectives overlook Google search on Anthony home computer for suffocation methods?
Google compra un proveedor de redes wifi por 308 millones http://t.co/L97itVep

http://search.cpan.org/~mmims/Net-Twitter-3.18004/lib/Net/Twitter.pod


Page 24

www.data.go.kr

Page 25

• wget, curl

• URL

• Nutch, Heritrix

Page 26

[Figure: Web crawling. Seed pages, URLs crawled and parsed, URLs frontier, unseen Web]
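
The crawling loop in the figure (seed pages, a URL frontier, and the set of URLs already crawled and parsed) can be sketched as a breadth-first traversal. To keep the example runnable without network access, the `LINKS` table and `fetch_links` function are hypothetical stand-ins for downloading a page and extracting its links.

```python
from collections import deque

# Hypothetical link graph standing in for the Web.
LINKS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}


def fetch_links(url):
    """Stand-in for 'download the page and parse out its links'."""
    return LINKS.get(url, [])


def crawl(seeds, max_pages=100):
    """Breadth-first crawl: pop a URL from the frontier, parse it, queue unseen links."""
    frontier = deque(seeds)   # URLs waiting to be crawled
    seen = set(seeds)         # URLs already queued, so none is added twice
    crawled = []              # URLs crawled and parsed, in order
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled


print(crawl(["seed"]))
```

A real crawler would also need politeness delays, robots.txt handling, and URL normalization, which this sketch leaves out.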

Page 27

– GPS

Page 28

Personal Sensing

Public Sensing

Social Sensing

Page 29

Mobile Sensing Architecture

SENSE

LEARN

INFORM, SHARE, PERSUASION

Mobile Computing Cloud

Page 30

Nexus One, Galaxy Nexus, iPhone4/5, Samsung Galaxy S3, HTC Incredible, Galaxy Tab / iPad2

(GPS)

Page 31

– (GPS)

Page 32

• Reliability

– availability, fault-tolerance

• Scalability

• Extensibility

• Manageability

http://www.slideshare.net/cloudera/flume-intro100715

Page 33

[Figure: agents, Real-time Aggregators, Collection Manager, Collection Planning]

Flume, Scribe, Chukwa

Page 34

Review Spotlight: A User Interface for Summarizing User-generated Reviews Using Adjective-Noun Word Pairs, Koji Yatani, Michael Novati, Andrew Trusty, Khai N. Truong, CHI 2011

Page 35

Page 36

• (Rivadeneira et al.)

– (impression formation)

– (recognizing)

Page 37

Page 38

Review Spotlight: A User Interface for Summarizing User-generated Reviews Using Adjective-Noun Word Pairs, Koji Yatani, Michael Novati, Andrew Trusty, Khai N. Truong, CHI 2011

Page 39

• 8

• 4– , – 30

Page 40

– (Chinese food)

– (great steak)

Page 41

Page 42

Review Spotlight

Review Spotlight, Tag-cloud-like interface

Review Spotlight: A User Interface for Summarizing User-generated Reviews Using Adjective-Noun Word Pairs, Koji Yatani, Michael Novati, Andrew Trusty, Khai N. Truong, CHI 2011

Page 43

Review Spotlight

Page 44

Review Spotlight

1.

– Part-of-speech tagging

– “The food is great” => great food

2.

– 10~30
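
The adjective-noun pairing in step 1 can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a tiny hand-written word list (`ADJECTIVES`, `NOUNS`) stands in for a real part-of-speech tagger, and each adjective is paired with the nearest noun in the same sentence, which is a simplifying heuristic.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a part-of-speech tagger.
ADJECTIVES = {"great", "good", "slow", "friendly"}
NOUNS = {"food", "steak", "service", "staff"}


def extract_pairs(sentence):
    """Pair every adjective with the nearest noun in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    noun_positions = [i for i, tok in enumerate(tokens) if tok in NOUNS]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in ADJECTIVES and noun_positions:
            # Closest noun wins; on a tie, prefer the noun that follows.
            nearest = min(noun_positions, key=lambda j: (abs(j - i), j < i))
            pairs.append(f"{tok} {tokens[nearest]}")
    return pairs


def count_pairs(sentences):
    """Count how often each adjective-noun pair occurs across sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_pairs(sentence))
    return counts


reviews = ["The food is great", "Great food and slow service", "Great steak"]
print(count_pairs(reviews).most_common())
```

The resulting counts are the kind of input a tag-cloud-like display needs, since more frequent pairs can be drawn larger.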

Page 45

Review Spotlight

3.

– SentiWordNet
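
A sketch of how a sentiment lexicon could be used to label word pairs. SentiWordNet assigns positivity and negativity scores to word senses; the `SCORES` table below holds made-up values for illustration rather than real SentiWordNet entries, and the threshold and three-way labels are assumptions.

```python
# Hypothetical (positive, negative) scores per adjective; not real SentiWordNet data.
SCORES = {
    "great": (0.75, 0.0),
    "slow": (0.0, 0.5),
    "chinese": (0.0, 0.0),
}


def polarity(pair):
    """Positive minus negative score of the pair's adjective (0.0 if unknown)."""
    adjective = pair.split()[0]
    positive, negative = SCORES.get(adjective, (0.0, 0.0))
    return positive - negative


def label(pair, threshold=0.1):
    """Map a word pair to 'positive', 'negative', or 'neutral'."""
    score = polarity(pair)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for pair in ["great steak", "slow service", "chinese food"]:
    print(pair, label(pair))
```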

Page 46

Review Spotlight

Page 47

Review Spotlight

Page 48

– API