Latest Trends in Technology: BigData Analytics, Virtualization, Cloud Computing, Internet of Things (IoT)
Assoc. Prof. Abzetdin ADAMOV
Chair of Computer Engineering Department
http://ce.qu.edu.az/~aadamov
Shahdag, 29 November 2014
Latest Trends in Technology:
BigData Analytics, Virtualization, Cloud
Computing, Internet of Things (IoT)
Content
• Why Data Mining in BigData?
• Internet Statistics
• BigData Infrastructure
• Web Crawlers for Web Analytics
• Natural Language Processing (NLP)
• Virtualization
• Introduction to Cloud Computing
• Introduction to Internet of Things (IoT)
Digital Universe
Volume of digital data:
• 2008 – 480,000 petabytes (PB)
• 2009 – 800,000 PB
• 2010 – 1,200,000 PB, or 1.2 zettabytes (ZB)
• 2011 – 1.8 ZB
• 2012 – 2.7 ZB
• 2014 – about 6.2 ZB
• Expected to reach 35 ZB by 2020
IDC's Digital Universe Study
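The figures above imply a growth rate that is easy to quantify. The following is a minimal sketch, assuming the listed volumes are converted to zettabytes (480,000 PB = 0.48 ZB, and so on); the names `DIGITAL_UNIVERSE_ZB`, `annual_growth` and `doubling_time` are hypothetical helpers introduced here for illustration and are not part of the IDC study.

```python
import math

# Volumes from the slide, converted to zettabytes (1 ZB = 1,000,000 PB).
DIGITAL_UNIVERSE_ZB = {2008: 0.48, 2009: 0.8, 2010: 1.2,
                       2011: 1.8, 2012: 2.7, 2014: 6.2}


def annual_growth(v_start, v_end, years):
    """Compound annual growth rate between two volumes (0.5 means +50% per year)."""
    return (v_end / v_start) ** (1.0 / years) - 1.0


def doubling_time(v_start, v_end, years):
    """Years needed for the volume to double at the observed growth rate."""
    return years * math.log(2) / math.log(v_end / v_start)


if __name__ == "__main__":
    first, last = 2008, 2014
    span = last - first
    v0, v1 = DIGITAL_UNIVERSE_ZB[first], DIGITAL_UNIVERSE_ZB[last]
    print(f"{first}-{last}: growth per year = {annual_growth(v0, v1, span):.1%}")
    print(f"{first}-{last}: doubling time  = {doubling_time(v0, v1, span):.2f} years")
```

Running the script prints the average yearly growth and the doubling time for the 2008 to 2014 span; other pairs of years from the table can be substituted in the same way.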
Big Measures for Big Data
• kilobyte (kB)   10^3   2^10
• megabyte (MB)   10^6   2^20
• gigabyte (GB)   10^9   2^30
• terabyte (TB)   10^12   2^40
• petabyte (PB)   10^15   2^50
• exabyte (EB)   10^18   2^60
• zettabyte (ZB)   10^21   2^70
• yottabyte (YB)   10^24   2^80
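The two columns above (powers of ten and powers of two) are close but not equal, and the gap widens with each prefix. A small sketch of that arithmetic follows; the function name `prefix_table` is an assumption made for this example.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def prefix_table():
    """Return (prefix, decimal value, binary value, gap in percent) for each prefix."""
    rows = []
    for step, name in enumerate(PREFIXES, start=1):
        decimal = 10 ** (3 * step)   # e.g. kilo = 10^3
        binary = 2 ** (10 * step)    # e.g. kilo = 2^10
        gap_percent = (binary - decimal) / decimal * 100
        rows.append((name, decimal, binary, gap_percent))
    return rows


if __name__ == "__main__":
    for name, decimal, binary, gap in prefix_table():
        print(f"{name:>5}: 10-based {decimal:.0e}, 2-based {binary:.3e}, gap {gap:.1f}%")
```

The gap starts at 2.4% for the kilobyte and grows at every step, which is why storage sizes quoted in decimal units look smaller when an operating system reports them in binary units.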
Why Does Data Grow So Fast?
Data sets are gathered by ubiquitous devices:
• Information-sensing mobile devices
• Aerial sensory technologies (remote sensing)
• Software logs
• Cameras
• Microphones
• Radio-frequency identification (RFID) readers
• Wireless sensor networks
The Internet Is the Biggest "Country"
Population (million): USA 311, India 1,155, China 1,339, Internet 3,016
Area (million square km): USA 9.62, Canada 9.97, Russia 17.07, Internet 514.45
Internet Penetration
[Bar chart: Internet penetration (%) by country Internet code, highest to lowest: EE, LV, RU, LT, BY, UA, KG, AM, KZ, UZ, GE, MD, AZ, TM, TJ]
Note: Internet stats for December 2001. Average Internet usage in the world: 8% (500 million), 2001.
Foundations of the Web
[Bar chart: Internet penetration (%) by country Internet code, highest to lowest: EE, LV, RU, LT, BY, UA, KG, AZ, AM, KZ, UZ, GE, MD, TM, TJ]
Note: Internet stats for December 2004.
Foundations of the Web
Internet penetration (%) by country Internet code: EE 65.6, LV 59.4, LT 59.2, BY 29.1, RU 27.1, AZ 18.2, MD 16.2, UA 14.7, KG 13.8, KZ 12.3, UZ 8.8, GE 7.8, TJ 6.6, AM 5.8, TM 1.4
Note: Internet stats for September 2009. Average Internet usage in the world: 21.9%.
Foundations of the Web
[Bar chart: Internet penetration (%) by country Internet code, highest to lowest: EE, LV, LT, AM, BY, AZ, RU, KG, KZ, UA, MD, GE, UZ, TJ, TM]
Note: Internet stats for March 2011. Average Internet usage in the world: 30.2%.
Foundations of the Web
[Bar chart: Internet penetration (%) by country Internet code]
Note: Internet stats for June 2012. Average Internet usage in the world: 34.3%.
http://www.internetworldstats.com
Internet Penetration
Internet penetration (%) by country Internet code: EE 79, LV 74, LT 68, AZ 54.2, RU 53.3, KZ 53.3, BY 46.9, GE 45.5, MD 43.4, AM 39.2, UZ 36.5, UA 33.7, KG 21.7, TJ 14.5, TM 7.2
Note: Internet stats for March 2013. Average Internet usage in the world: 39% (2.7 billion), 2013.
Top 15 Most Popular Social Networking Sites | January 2014
• 1,310,000,000 estimated unique monthly visitors | Compete rank: 2
• 313,000,000 estimated unique monthly visitors | Compete rank: 24
• 277,000,000 estimated unique monthly visitors | Compete rank: 44
• 70,500,000 estimated unique monthly visitors | Compete rank: 51
• 740,000,000 estimated unique monthly visitors
• 25,500,000 estimated unique monthly visitors | Compete rank: 346
• 20,500,000 estimated unique monthly visitors | Compete rank: 605
• 19,500,000 estimated unique monthly visitors | Compete rank: 447
• 17,500,000 estimated unique monthly visitors | Compete rank: *NA*
• 12,500,000 estimated unique monthly visitors | Compete rank: 127
• 12,000,000 estimated unique monthly visitors | Compete rank: 617
• 7,500,000 estimated unique monthly visitors | Compete rank: 838
• 5,400,000 estimated unique monthly visitors | Compete rank: 122
• 3,000,000 estimated unique monthly visitors | Compete rank: 451
• 2,500,000 estimated unique monthly visitors | Compete rank: 1,596
Social Networking
Why Did the Internet Become So Popular?
Foundations of the WEB
[Diagram: DNS servers]
- Countries, Cities, User Groups, …
Problem with Moore’s Law
• The number of transistors that can be placed on an integrated circuit doubles every 18 months to two years
• It is predicted to reach its limit with existing technology in 2020
• Shrinking a transistor to the size of a single atom may put an end to that trend
• The Digital Universe is growing much faster than processing power
BigData Infrastructure
Google’s First Data Centers
Google’s first data center
Google's New Data Centers
Map of Google Data Centers Worldwide
450,000 servers draw upwards of 20 megawatts, costing on the order of US$2 million per month in electricity.
Google Data Centers
Google data center in Belgium
Google Data Centers
Google data center in Finland
Everything as a Service
• Utility computing = Infrastructure as a Service (IaaS)
– Why buy machines when you can rent cycles?
– Examples: Amazon’s EC2 (Elastic Compute Cloud), Rackspace, Microsoft Azure
• Platform as a Service (PaaS)
– Give me a nice API and take care of the maintenance, upgrades, …
– Example: Google App Engine
• Software as a Service (SaaS)
– Just run it for me!
– Examples: Gmail, Salesforce
Web Crawlers for Web Analytics
• Indexing
• Searching
• Ranking
• Analysis
• Crawling is an essential job for all Internet giants: Google, Yahoo, Facebook, etc.
Some available open-source crawlers: Apache Nutch, Crawler4j, Bixo, Heritrix, etc.
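At the core of any crawler is a step that pulls the outgoing links from a fetched page so they can be queued for later visits. The sketch below shows only that step, using the Python standard library; it is not how Nutch, Crawler4j, Bixo or Heritrix are implemented, and the names `LinkExtractor` and `extract_links` as well as the example URLs are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all absolute link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="http://other.example/x">X</a>'
    for link in extract_links(page, "http://example.com/"):
        print(link)
```

A full crawler would add fetching, a queue of URLs still to visit, duplicate detection and politeness delays around this step.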
Web Crawlers for Web Analytics
• Thanks to crawlers, any website can appear in search results without doing any extra work.
• Crawling can be customized with META tags and “robots.txt”
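A well-behaved crawler checks a site's robots.txt rules before fetching a page. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the rules in `ROBOTS_TXT`, the crawler name `ExampleBot` and the example URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is barred from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text (already downloaded) into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/data.html"):
        allowed = rules.can_fetch("ExampleBot", url)
        print(url, "->", "allowed" if allowed else "blocked")
```

In a live crawler the rules would be downloaded from the site itself (for example with `set_url` and `read`) instead of being passed in as a string.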
Natural Language Processing
(NLP)
• Natural Language Processing (NLP)
• Computational Linguistics (CL)
• Machine Translation (MT)
Natural Language Processing
(NLP)
• Multilingual NLP
• Text Mining in Multimedia Networks
• Mining Text Streams
• Text Mining in Social Media
• Cross-Lingual Mining of Text Data
• Contextual analysis of text data
Some available NLP tools: NLTK, Apache OpenNLP, MontyLingua, VisualText, etc.
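Most text-mining tasks listed above start with the same two steps: splitting text into tokens and counting them. The sketch below illustrates those steps with the Python standard library only; it is a simplified stand-in, not the tokenizer of NLTK or any other tool named here, and the names `tokenize`, `top_terms` and `STOPWORDS` are assumptions for this example.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; real toolkits ship much larger ones.
STOPWORDS = frozenset({"the", "of", "and", "in", "a", "is", "to"})


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_terms(text, n=3):
    """Return the n most frequent tokens that are not stopwords."""
    counts = Counter(token for token in tokenize(text) if token not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Big Data is the fuel of analytics, and big data keeps growing."
    print(tokenize(sample))
    print(top_terms(sample, n=2))
```

This regular expression only handles unaccented Latin text; multilingual NLP, as listed on the slide, needs language-aware tokenization.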
Data Mining and Knowledge
Discovery
Data-driven Decision Making Model
Virtualization as an
Infrastructure
Traditional Stack (bottom to top): Hardware, Operating System, Apps (App, App, App)
Virtualized Stack (bottom to top): Hardware, Hypervisor, Guest OSes (OS, OS, OS), Apps (App, App, App)
Cloud Computing
IP Address – Dotted decimal notation
• 32-bit binary
• Four 8-bit octets
Ex: 11100011010100101001110110110001
11100011 - 01010010 - 10011101 - 10110001 (binary octets)
E3 - 52 - 9D - B1 (hexadecimal)
227 . 82 . 157 . 177 (dotted decimal)
• What’s a subnet?
– device interfaces with same subnet part of IP address
– can physically reach each other without intervening router
Example: 122.97.211.200
We can view these values in their binary form.
122 97 211 200
01111010 01100001 11010011 11001000
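The conversions above, and the subnet test described earlier on this slide, can be reproduced in a few lines. This is a minimal sketch using Python's standard `ipaddress` module; the helper names and the /24 prefix length are assumptions chosen for illustration, since the slide does not state a subnet mask.

```python
import ipaddress


def to_binary_octets(address):
    """Convert dotted decimal notation to a list of 8-bit binary strings."""
    return [format(int(part), "08b") for part in address.split(".")]


def to_hex_octets(address):
    """Convert dotted decimal notation to a list of two-digit hex strings."""
    return [format(int(part), "02X") for part in address.split(".")]


def same_subnet(address_a, address_b, prefix_length):
    """True if both addresses share the same subnet part (first prefix_length bits)."""
    network = ipaddress.ip_network(f"{address_a}/{prefix_length}", strict=False)
    return ipaddress.ip_address(address_b) in network


if __name__ == "__main__":
    print(to_binary_octets("122.97.211.200"))
    print(to_hex_octets("227.82.157.177"))
    # Assumed /24 subnet: the first three octets form the subnet part.
    print(same_subnet("122.97.211.200", "122.97.211.17", 24))
    print(same_subnet("122.97.211.200", "122.97.212.1", 24))
```

Two interfaces for which `same_subnet` returns true can reach each other directly; otherwise traffic between them has to pass through a router.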
Internet of Things (IoT)
CeDAWI Center
Applied Research Center for Data Analytics
and Web Insights (CeDAWI)
Data Mining and Knowledge
Discovery
• Text Mining from Web
• Natural Language Processing
• Web Crawling
• Large-Scale Data Management and Processing
• Internet Structure Research and Visualization
• Knowledge visualization
• Cluster and Distributed Computing
Thank you
Questions
http://ce.qu.edu.az/~aadamov