Big Data
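LIU Peng, Ph.D. in Management, Professor
School of Information Management and Engineering, Shanghai University of Finance and Economics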
Big Data Visualization — Why a Picture is Worth a Thousand Words
Prof./Dr. LIU Peng
Why a Picture is Worth a Thousand Words
How Big Data Helps Tell the Story
Trend Spotting
The Many Types of Visualizations
How to Create Visualizations
Using Visualization to Compress Knowledge
Real-Time Visualization
Why Understanding Images Is Easy for Us and Hard for Computers
The Visualization Multiplier Effect
How Big Data Helps Tell the Story
It's your first visit to Shanghai.
You're excited to visit ……
To get from one place to another, you need to take the
local public transit system, the Metro.
There's just one problem: you don't have a map.
Instead of a map, imagine that the person in the
information booth hands you
an alphabetized list of stations, train line names, and
geographic coordinates.
In theory, you have all the information you need.
But in reality, navigating with it would be a nightmare.
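To see why the list is so painful, consider what it takes to answer even one basic question from it: where can you change trains? The following is a minimal sketch, assuming a tiny made-up network. The station names, line names, and coordinates are hypothetical and are not real Shanghai Metro data.

```python
from collections import defaultdict

# Hypothetical records in the form the information booth hands you:
# (station, line, latitude, longitude), sorted alphabetically by station.
records = [
    ("Station A", "Line 1", 31.10, 121.40),
    ("Station B", "Line 1", 31.15, 121.42),
    ("Station B", "Line 2", 31.15, 121.42),
    ("Station C", "Line 2", 31.18, 121.50),
    ("Station D", "Line 1", 31.20, 121.45),
    ("Station D", "Line 3", 31.20, 121.45),
    ("Station E", "Line 3", 31.25, 121.48),
]

def lines_by_station(rows):
    """Group the flat list so each station maps to the set of lines serving it."""
    index = defaultdict(set)
    for station, line, _lat, _lon in rows:
        index[station].add(line)
    return index

def transfer_stations(rows):
    """Return stations served by more than one line, i.e. where you can change trains."""
    return sorted(s for s, lines in lines_by_station(rows).items() if len(lines) > 1)

print(transfer_stations(records))
```

On a map, transfer stations are visible at a glance. From the list, they have to be computed.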
Now consider another representation of the same data: a subway map.
All of a sudden, navigating the Metro is easy.
Image taken from Chinese Wikipedia, http://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5 (the same applies to the images that follow)
The subway map doesn't just give you data;
it gives you knowledge:
- which line to take
- roughly how long it'll take to get to your destination
- how to recognize each of the lines by name or color
……
See pp. 4-5 of Julie Steele, Beautiful Visualization, O'Reilly, 2010.
This simple example illustrates the compelling nature of
visualization.
With a mix of color, layout, markings, and other elements,
a visualization can show us in a few seconds
what plain numbers or text might take minutes or hours to convey,
if we can draw a conclusion from them at all.
Shanghai has 14 metro lines and 337 stations in operation.
A network with that many stations would be virtually impossible to
navigate without a map.
Trend Spotting
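If you've ever used a spreadsheet,
you've experienced first-hand
how hard it can be to spot trends
in a mass of number-filled cells.
Example: data from the Fifth and Sixth National Population Censuses
Compiled from data on the National Bureau of Statistics website
Website: http://www.stats.gov.cn/tjsj/pcsj/
Special note:
The data and charts I have compiled or adopted are used solely for technical demonstration, serve no other purpose, and do not reflect my own views.
Some of the data and charts were collected during my visit to the University of Minnesota.
This note applies to the data and charts in all of my course slides.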
Population of China by age group and sex, 2010 and 2000 (unit: persons)
Age group     No.     2010 total    2010 male  2010 female     2000 total    2000 male  2000 female
Total           0  1,332,810,869  682,329,104  650,481,765  1,242,612,226  640,275,969  602,336,257
0-4             1     75,532,610   41,062,566   34,470,044     68,978,374   37,648,694   31,329,680
5-9             2     70,881,549   38,464,665   32,416,884     90,152,587   48,303,208   41,849,379
10-14           3     74,908,462   40,267,277   34,641,185    125,396,633   65,344,739   60,051,894
15-19           4     99,889,114   51,904,830   47,984,284    103,031,165   52,878,170   50,152,995
20-24           5    127,412,518   64,008,573   63,403,945     94,573,174   47,937,766   46,635,408
25-29           6    101,013,852   50,837,038   50,176,814    117,602,265   60,230,758   57,371,507
30-34           7     97,138,203   49,521,822   47,616,381    127,314,298   65,360,456   61,953,842
35-39           8    118,025,959   60,391,104   57,634,855    109,147,295   56,141,391   53,005,904
40-44           9    124,753,964   63,608,678   61,145,286     81,242,945   42,243,187   38,999,758
45-49          10    105,594,553   53,776,418   51,818,135     85,521,045   43,939,603   41,581,442
50-54          11     78,753,171   40,363,234   38,389,937     63,304,200   32,804,125   30,500,075
55-59          12     81,312,474   41,082,938   40,229,536     46,370,375   24,061,506   22,308,869
60-64          13     58,667,282   29,834,426   28,832,856     41,703,848   21,674,478   20,029,370
65-69          14     41,113,282   20,748,471   20,364,811     34,780,460   17,549,348   17,231,112
70-74          15     32,972,397   16,403,453   16,568,944     25,574,149   12,436,154   13,137,995
75-79          16     23,852,133   11,278,859   12,573,274     15,928,330    7,175,811    8,752,519
80-84          17     13,373,198    5,917,502    7,455,696      7,989,158    3,203,868    4,785,290
85-89          18      5,631,928    2,199,810    3,432,118      3,030,698    1,056,941    1,973,757
90-94          19      1,578,307      530,872    1,047,435        783,594      229,758      553,836
95-99          20        369,979      117,716      252,263        169,756       51,373      118,383
100 and over   21         35,934        8,852       27,082         17,877        4,635       13,242
Spreadsheets full of numbers aren't easy to interpret.
That's one reason
programs like Microsoft Excel and Apple Numbers
come with built-in capabilities for creating charts.
The census data shown in the table is a lot easier to understand
when we see it in graphical form,
as the charts that follow show.
When we see a graph like a pie or bar chart,
it's often a lot easier to see how things are changing
over time or on a relative basis.
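Even a crude chart makes the point. The sketch below renders the 2010 totals from the census table as a text bar chart using only the Python standard library. The function name `text_bar_chart` and the 50-character bar width are illustrative choices, not part of any charting tool mentioned in these slides.

```python
# 2010 population by age group, copied from the census table
# (group number -> persons). Group 1 is ages 0-4, group 21 is 100 and over.
population_2010 = {
    1: 75_532_610, 2: 70_881_549, 3: 74_908_462, 4: 99_889_114,
    5: 127_412_518, 6: 101_013_852, 7: 97_138_203, 8: 118_025_959,
    9: 124_753_964, 10: 105_594_553, 11: 78_753_171, 12: 81_312_474,
    13: 58_667_282, 14: 41_113_282, 15: 32_972_397, 16: 23_852_133,
    17: 13_373_198, 18: 5_631_928, 19: 1_578_307, 20: 369_979, 21: 35_934,
}

def text_bar_chart(data, width=50):
    """Render one line per key, with a bar scaled so the largest value fills `width`."""
    largest = max(data.values())
    lines = []
    for key, value in data.items():
        bar = "#" * round(value / largest * width)
        # The trailing number is the population in units of 10,000 persons.
        lines.append(f"{key:>2} | {bar} {value / 10_000:,.0f}")
    return lines

for line in text_bar_chart(population_2010):
    print(line)
```

The peak at group 5 and the second peak around groups 8 and 9 stand out immediately in the bars, while in the table they have to be found by reading nine-digit numbers.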
What do you observe?
[Figure: Population of China by age group, 2010 (unit: 10,000 persons); x-axis: age group number 1 to 21; y-axis: 0 to 14,000]
[Figure: Population of China by age group and sex, 2010 (unit: 10,000 persons); series: male, female; x-axis: age group number 1 to 21; y-axis: 0 to 7,000]
Male-to-female ratio: 102.4 : 97.6
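The ratio on this slide can be checked against the census table. The sketch below assumes the figure expresses the male and female shares scaled so that they sum to 200. That reading is an assumption, but it reproduces 102.4 : 97.6 from the 2010 totals. For comparison it also computes the more conventional "males per 100 females" for the youngest age group. The function names are illustrative.

```python
# Totals and the youngest age group, copied from the 2010 census table.
male_total, female_total = 682_329_104, 650_481_765
male_0_4, female_0_4 = 41_062_566, 34_470_044

def split_per_200(male, female):
    """Express the male/female split so that the two parts sum to 200 (assumed convention)."""
    total = male + female
    return 200 * male / total, 200 * female / total

def males_per_100_females(male, female):
    """Conventional sex ratio: number of males for every 100 females."""
    return 100 * male / female

m, f = split_per_200(male_total, female_total)
print(f"overall split: {m:.1f} : {f:.1f}")
print(f"ages 0-4: {males_per_100_females(male_0_4, female_0_4):.1f} males per 100 females")
```

The imbalance in the 0-4 group, roughly 119 boys per 100 girls, is far larger than the overall split suggests, which is the kind of pattern the per-sex chart makes visible.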
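On October 14, 2015, Xie Zuoshi, a professor at the School of Economics and International Trade of Zhejiang University of Finance and Economics, applied the tools of economic analysis in a blog post titled "'30 Million Bachelors' Is a Groundless Worry" and put forward a series of "shocking" views. After the media reported on the post on October 21, it quickly set off a storm of public debate.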
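Yao Meixiong, a demographer and senior statistician, recently wrote in China Business News that a "bachelor crisis" may break out in full after 2020. Taking ages 22 to 26 as representative of women's age at first marriage and ages 24 to 28 as representative of men's, by 2020 there will be 49 million men aged 24 to 28 but only 39 million women aged 22 to 26. That is 10 million more men than women, so those men can only look for partners among younger women. Because the sex imbalance in the younger age groups keeps arising and stays at a high level, the problem will be very hard to resolve. It may lead to more early marriages, betrothals, and buying and selling of marriages, and to an increase in social phenomena such as mismatched marriages, competition for partners across generations, extramarital affairs, third-party involvement, births outside marriage, and sex crimes.
If adult men divide into a class with spouses and a class of "leftover men" without them, then even after society as a whole has moved up a level of development, a large group of "leftover men" will be unable to start families or to marry when they should. This will greatly harm their vital interests, greatly reduce this group's sense of identification with society, and may even become a trigger for antisocial behavior among a small number of them.
In response to the "bachelor crisis" that the above data point to, Professor Xie Zuoshi of Zhejiang University of Finance and Economics wrote the article "'30 Million Bachelors' Is a Groundless Worry". At its opening he states: "I do not deny the fact of 30 million bachelors, but I do deny that such serious social problems will necessarily arise." While analyzing the issue from an economic perspective, the article includes the passage: "What about low-income men? One option is for several men to jointly find one wife. This is not a flight of fancy on my part. In some remote, poor areas there are brothers who marry one wife together, and they get along quite happily."
All of the text above is excerpted from the Internet and does not represent my own views!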
How things change over time is critical when making
decisions.
A single data point, by itself, is often insufficient to tell
you how things are going, regardless of whether you're
looking at sales trends or health data.
Next page’s figure shows the U.S. Census Bureau data on new
home sales starting in the year 2000.
If we were to look just at the value for January 2000,
which is 873,000, that wouldn't tell us much by itself.
But when we look at new home sales over time, the story
is crystal clear.
Chart generated via www.census.gov
U.S. Census new home sales data visualized over time
Source: National Bureau of Statistics of China, http://data.stats.gov.cn/workspace/index?m=hgnd
[Figure: National average sale price of residential commercial housing (yuan per square meter), by year, 2000 to 2013; y-axis: 0 to 7,000]
Using this kind of visual trend analysis is a key way
to understand data.
For example,
investors often evaluate a company's performance
over time.
A company's sales and profits might seem good in the abstract.
In reality, sales may have been growing less and less each quarter,
and the company could be headed for bankruptcy
if it doesn't find a way to increase profits.
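The difference between "sales look good" and "sales growth is slowing" is easy to show with a little arithmetic. The sketch below uses hypothetical quarterly figures, not data from any real company, and the function names are illustrative.

```python
# Hypothetical quarterly sales figures (in millions); not taken from any real company.
quarterly_sales = [100.0, 120.0, 132.0, 138.6, 141.4]

def growth_rates(values):
    """Percentage change from each period to the next."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]

def is_decelerating(rates):
    """True when every growth rate is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(rates, rates[1:]))

rates = growth_rates(quarterly_sales)
for quarter, rate in enumerate(rates, start=2):
    # One '#' per percentage point gives a rough visual of the slowdown.
    print(f"Q{quarter}: {rate:+.1f}% " + "#" * max(0, round(rate)))
print("growth slowing every quarter:", is_decelerating(rates))
```

Every figure in the series is higher than the one before, so any single quarter looks healthy. Only the sequence of growth rates, or a chart of it, shows the slowdown.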
Internal context is one of the key indicators
managers and investors use to figure out how business is
trending.
External context tells them how they're doing relative to others.
Suppose that sales are down for a given quarter.
Managers might conclude that their company isn't
executing well.
In reality, however, sales might be off due to larger
industry issues.
Without external context, managers have very little insight
into what's really causing their business to suffer.
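A small sketch of the same idea: the company's own change is the internal context, the industry's change is the external context, and the gap between them is what tells managers how they are really doing. Both percentages below are hypothetical.

```python
# Hypothetical quarter-over-quarter sales changes, in percent.
company_change = -8.0    # internal context: our own sales fell 8%
industry_change = -12.0  # external context: the industry as a whole fell 12%

def relative_performance(company, industry):
    """Percentage-point gap between the company and its industry (positive = outperforming)."""
    return company - industry

gap = relative_performance(company_change, industry_change)
verdict = "outperforming" if gap > 0 else "underperforming" if gap < 0 else "matching"
print(f"company {company_change:+.1f}%, industry {industry_change:+.1f}%, "
      f"gap {gap:+.1f} points: {verdict} the industry")
```

With only the internal number, the quarter looks like a failure. With the external number alongside it, the company is holding up better than its peers.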
Even when managers have both internal and external context,
it’s still hard for them to tell what’s going on
just by looking at numbers in the abstract.
That’s where
visualizations
can really help.
The Many Types of Visualizations
Nearly every business user is familiar with the well-known pie
chart, bar chart, or line graph.
These forms of visualization are just the tip of the iceberg
when it comes to converting data into its visual equivalent.
There are many other types of visualizations as well.
Geographic visualizations are useful for displaying location
information.
Geographic visualizations often have additional
information layered into them.
For example,
they can show population densities, store locations,
income distributions, weather patterns, and other
kinds of data that are helpful to see visually.
The following figure combines geographic information
with weather data.
A visualization that combines geographic and weather data
Maps can show routing information,
telling sales people which locations to visit and when
or showing drivers the optimal route to take from one
location to another.
Network diagrams show connections and interconnections.
Network diagrams can illustrate the way information flows
in an organization by showing the relationships between
people.
Network diagrams can also show connections in a social
network or connections between different machines in a
computer network.
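Behind every network diagram is a simple data structure: a list of connections. The sketch below, using hypothetical names, turns such a list into an adjacency map, finds the most connected person, and counts the fewest connections between two people with a breadth-first search.

```python
from collections import deque

# Hypothetical "who talks to whom" edges in a small organization.
edges = [
    ("Ana", "Ben"), ("Ben", "Chen"), ("Chen", "Dina"),
    ("Ana", "Eli"), ("Eli", "Dina"), ("Dina", "Fay"),
]

def build_adjacency(pairs):
    """Turn an undirected edge list into a node -> set of neighbors mapping."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def hops_between(adjacency, start, goal):
    """Breadth-first search: fewest connections from start to goal, or None if unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            queue.append((neighbor, distance + 1))
    return None

graph = build_adjacency(edges)
most_connected = max(graph, key=lambda person: len(graph[person]))
print(most_connected, len(graph[most_connected]))
print(hops_between(graph, "Ana", "Fay"))
```

A drawn diagram lets a reader spot the hub and trace the path by eye. The code shows how much bookkeeping the picture is doing for us.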
Computer network diagram
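Six degrees of separation: the theory that, through chains of personal relationships, you can be connected to any other person in the world through no more than six intermediaries.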
Time series visualizations illustrate how things change over
time.
A time series chart might show the consumption of natural
resources such as gas, oil, and coal over a period of many
years.
Or it could show sources of revenue.
Time series visualizations can be combined with geographic
visualizations to show how the density of populations, or the
earning power of certain populations, changes over time.
Source: http://complexdatavisualized.com/time-series-visualizations-an-overview/
Infographics are frequently used for marketing purposes, and
they don't just show data in visual form but they also
incorporate drawings, text, and graphics that tell a story about
the data.
Word maps are useful ways to visualize the most frequently
mentioned words in large quantities of text.
Such visualizations make it easy to determine what a
particular body of text is all about.
You can create word maps using a variety of tools.
One easily accessible web tool is called Wordle,
located at wordle.net.
A word map of the Constitution of the United States
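The counting step behind a word map can be sketched in a few lines. This is not how Wordle itself is implemented; it is a minimal illustration that uses the Preamble to the U.S. Constitution as sample text and a tiny stop-word list chosen for this example.

```python
import re
from collections import Counter

# Preamble to the U.S. Constitution, used here as a small sample text.
text = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranquility, provide for the common defence, "
    "promote the general Welfare, and secure the Blessings of Liberty to ourselves "
    "and our Posterity, do ordain and establish this Constitution for the United "
    "States of America."
)

# A tiny, illustrative stop-word list; real tools ship much longer ones.
STOP_WORDS = {"a", "and", "do", "for", "in", "of", "our", "the", "this", "to", "we"}

def word_frequencies(source):
    """Count lowercase words, skipping common stop words."""
    words = re.findall(r"[a-z]+", source.lower())
    return Counter(word for word in words if word not in STOP_WORDS)

counts = word_frequencies(text)
for word, count in counts.most_common(5):
    # A word map would scale font size by count; here the bar length stands in for it.
    print(f"{word:<12} {'#' * count}")
```

A word map tool takes counts like these and draws each word at a size proportional to its frequency.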
More and more visualizations are being created that are
dynamic in nature.
Rather than the static,
fixed visualizations of the past,
today's interactive visualizations enable you to
- change the time period viewed,
- zoom in on certain geographic areas for more detail,
- or change the combinations of variables included in the
  visualizations to look at the data in a different way.
Interactive visualizations combine
the best characteristic of traditional visualizations
(the power of seeing data presented in graphical form)
with access to modern, dynamic analytical capabilities
that are easy to use.
Note
Many sites now showcase the incredible range of visualizations being
created on a daily basis.
Two such sites are
visualizing.org and
www.informationisbeautiful.net.
The visualizations on these sites can serve as an excellent source of
inspiration for creating your own compelling visualizations.
How to Create Visualizations
A number of easy-to-use tools are available
to help you create your own visualizations,
both online and in desktop and mobile versions.
Google Public Data Explorer is one great way to get started
with creating visualizations.
The site, at https://www.google.com/publicdata/directory,
comes loaded with lots of different types of publicly
available data.
Without installing any software,
you can experiment with a variety of different
visualizations and view changes in various data sets
over time.
There are also online tools available for creating specific types
of visualizations.
For example,
CartoDB (cartodb.com) is a useful tool for creating
geographic visualizations.
HighCharts (www.highcharts.com) is another
visualization resource available online.
You can also use a desktop software application like
Tableau Desktop or QlikTech's QlikView
to access data stored in a data repository like Hadoop,
Microsoft SQL Server, Oracle, Teradata, or other data
sources.
These programs can also connect to
file-based data sources like Excel files and text files.
This means you can access
a wide range of different data sources,
as well as data sources stored in multiple data repositories,
and easily visualize the data contained there.
Some screenshots of Tableau Desktop appear on the next page.
Regardless of the tool you choose,
visualizations make complex data easy to understand.
You can not only use visualizations in your presentations,
but you can also embed them directly into web sites and applications.
Tableau Desktop
The Show Me popup in Tableau Desktop allows users to switch
easily between different kinds of visualizations
Sample sales forecast data shown in Tableau Desktop with a
corresponding visualization shown to the right
Using Visualization to Compress Knowledge
As the saying goes, a picture is worth a thousand words.
But that raises the question of why visualization is so
powerful.
As visualization expert David McCandless puts it,
"visualization is a form of knowledge compression."
Shorthand also compresses information:
it reduces the size of the data and makes it more efficient to store,
but not easier to understand.
A picture can take a large quantity of information and
represent it in a form that's easy to understand.
In Big Data, such pictures are referred to as visualizations.
Subway maps, pie charts, and bar graphs are all forms of
visualization.
Although visualization might seem like an easy problem at
first, it's hard for a few reasons.
First, it's frequently hard to get all the data that people
want to visualize into one place and in a consistent format.
Data from two different sources might come in slightly
different formats.
Alternatively, the names given to particular pieces of data
might differ from one source to another.
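Both problems, inconsistent formats and inconsistent names, show up as soon as two sources are combined. The sketch below is a minimal illustration with two hypothetical sources. The field names, date formats, and mapping tables are all assumptions made for this example.

```python
from datetime import datetime

# Two hypothetical sources describing the same kind of record with
# different field names and different date formats.
source_a = [{"customer": "Acme", "sale_date": "2014-03-01", "amount": "1200.50"}]
source_b = [{"Client Name": "Globex", "Date": "03/02/2014", "Total": 980}]

# Assumed mapping from each source's field names to one shared schema.
FIELD_MAPS = {
    "a": {"customer": "customer", "sale_date": "date", "amount": "amount"},
    "b": {"Client Name": "customer", "Date": "date", "Total": "amount"},
}
DATE_FORMATS = {"a": "%Y-%m-%d", "b": "%m/%d/%Y"}

def normalize(record, source):
    """Rename fields to the shared schema and convert values to consistent types."""
    renamed = {FIELD_MAPS[source][key]: value for key, value in record.items()}
    renamed["date"] = datetime.strptime(renamed["date"], DATE_FORMATS[source]).date()
    renamed["amount"] = float(renamed["amount"])
    return renamed

combined = [normalize(r, "a") for r in source_a] + [normalize(r, "b") for r in source_b]
for row in combined:
    print(row)
```

Only after this kind of cleanup do the records share one schema that a chart can be drawn from.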
The good news is that
modern visualization products can connect directly to a variety of data
sources,
from local files to databases to data stores like Hadoop.
By taking all that data and creating a picture of it,
the data can become more than data.
It can become knowledge that we can act on.
Visualization is a form of knowledge compression
because a seemingly simple image can
take vast amounts of structured or unstructured data
and compress it into a few lines and colors
that communicate the meaning of all that data
quickly and efficiently.
Real-Time Visualization
The information most infographics provide
is static in nature
and consists of historical data.
Frequently, infographics take a long time and a lot of hard
work to create:
they require data,
an interesting story to tell,
and a graphics designer
who can present the data in a compelling way.
The work doesn't stop there;
once the graphic is created,
the graphic has real value only if
it's distributed, promoted, shared, and viewed.
By then, of course, the data itself may be
weeks or months old.
So what about presenting compelling visualizations of data
that are real-time in nature?
For data to be valuable in real time, three things must happen.
The data itself must be available,
there must be sufficient storage and computer processing
power
to store and analyze the data,
and there must be a compelling way to visualize the data
that doesn't require days or weeks of work.
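The third requirement, a visualization that refreshes itself without manual work, can be sketched with a rolling window. The example below assumes the data is already arriving as a stream. The readings are hypothetical, and a text sparkline stands in for a real chart.

```python
from collections import deque

BLOCKS = " ▁▂▃▄▅▆▇█"  # nine levels, from empty to full

def sparkline(values):
    """Map each value to a block character scaled between the window's min and max."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    return "".join(BLOCKS[round((v - low) / span * (len(BLOCKS) - 1))] for v in values)

def stream_sparklines(readings, window_size=8):
    """Yield a refreshed sparkline after each new reading, keeping only recent values."""
    window = deque(maxlen=window_size)
    for reading in readings:
        window.append(reading)
        yield sparkline(window)

# Hypothetical readings standing in for a live feed (for example, orders per minute).
incoming = [3, 4, 6, 5, 9, 12, 8, 7, 10, 15]
for frame in stream_sparklines(incoming):
    print(frame)
```

Each new reading produces a fresh picture immediately, and the fixed-size window keeps storage and processing bounded no matter how long the feed runs.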
But such visualization doesn't stop at
displaying real-time information in web sites.
In the future,
not only will we be able to see visual representations of
data on our computers and mobile phones,
we'll also be able to visualize and understand the physical
world better as we move around it.
Google Glass
Augmented visualization may
very well become part of
our daily lives
Why Understanding Images Is Easy for Us and Hard for Computers
Ironically,
while computers excel at processing large amounts of
textual information,
they still struggle with analyzing visual information.
What about having a computer automatically weed out the bad photos and group
related photos together?
Or automatically figure out who is in the photos and
share copies of those photos with them?
Today, human beings still perform many of these recognition
and matching tasks.
Images can have a lot of variability,
making it difficult to tell
when different images contain the same objects or people.
What's more, pattern detection is more difficult in images than in text.
For example,
the word "president" is easy to find in a sentence or a set of sentences,
but it's much harder to recognize the person holding that
title in images.
Having an individual human being characterize images is one
thing.
But what about trying to do it with millions of images?
Today, computers are good at helping us create visualizations.
But tomorrow, as products like Google Glass continue to
evolve, they may also help us better understand visual
information in real time.
The Visualization Multiplier Effect
As we've seen, visualization and data go hand in hand.
In some cases, computers can act on data with no human involvement.
But there remain any number of situations in which humans
are trying to make better decisions based on data.
Just because we have more data available does not mean
that it's easier to produce better insights from that data.
In fact, the opposite may be true.
The more data we have, the more important it becomes
to be able to distill that data into meaningful insights
that we can act on.
Visualizing such data is one of the most powerful
mechanisms we have for doing so.
Visualization is effective
because our eyes have ultra high throughput into our brains,
as much as a hundred times greater throughput than
some of our other senses.
Visualization can trigger emotional responses.
It can compress vast amounts of data into knowledge we
can use.
Combine the knowledge compression of visualization
with the high throughput of visual delivery
and you get the visualization multiplier effect
—more data absorbed faster.
Big Data isn't just about the data itself
but about how we communicate it
and what we do with it.
Tools like visualization also mean that Big Data isn't just the
domain of scientists, data analysts, or engineers.
In the form of visualization,
Big Data is everywhere around us,
from the charts we use to make critical business
decisions
to the advertisements we create to communicate our
messages more effectively.
Social media platforms
are changing the way we communicate
and are enabling the broader distribution
not just of textual information
but of high-impact visual knowledge.
With the right visualization,
data is more than just text or numbers.
The right visualization can tell a story
that has a very real impact,
not just in business but in broader contexts
such as education and global health as well.
Visualization is just one of the many areas in which Big Data
is creating exciting new opportunities.
Discussion
Analyze the special significance of data visualization for Big Data analysis.
Thank you!!!