Evolution of Big Data in USA

YANG, Haiqin
2013-04-22

2

Outline

• Birth: 1880 US census
• Adolescence: Big Science
• Modern Era: Big Business
• Future Landscape
• Conclusion

3

The First Big Data Challenge

• 1880 census
• 50 million people
• Age, gender (sex), occupation, education level, no. of insane people in household

4

The First Big Data Solution

• Hollerith Tabulating System

• Punched cards – 80 variables

• Used for 1890 census
• 6 weeks instead of 7+ years

5

Manhattan Project (1942 - 1946)

• $2 billion (approx. $26 billion in 2013 dollars)

• Catalyst for “Big Science”

6

Space Program (1960s)

• Began in late 1950s

• An active area for Big Data nowadays

7

Adolescence: Big Science

8

Big Science

• The International Geophysical Year
– An international scientific project
– Lasted from Jul. 1, 1957 to Dec. 31, 1958
• A synoptic collection of observational data on a global scale
• Implications
– Big budgets, Big staffs, Big machines, Big laboratories

9

Summary of Big Science

• Laid foundation for ambitious projects
– International Biological Program
– Long Term Ecological Research Network
• The International Biological Program ended in 1974
• Many participants viewed it as a failure
• Nevertheless, it was a success
– Transformed the way data are processed
– Realized original incentives
– Provided a renewed legitimacy for synoptic data collection

10

Lessons from Big Science

• Spawned new Big Data projects
– Weather prediction
– Physics research (supercollider data analytics)
– Astronomy images (planet detection)
– Medical research (drug interaction)
– …

• Businesses latched onto its techniques, methodologies, and objectives

11

Modern Era: Big Business

12

Big Science vs. Big Business

• Common
– Need technologies to work with data
– Use algorithms to mine data
• Big Science
– Source: experiments and research conducted in controlled environments
– Goals: to answer questions, or prove theories
• Big Business
– Source: transactional in nature, with little control
– Goals: to discover new opportunities, measure efficiencies, uncover relationships

13

Current Status

• IDC reports
– 2.7 billion terabytes in 2012, up 48 percent from 2011
– 8 billion terabytes projected for 2015
• Sources
– Structured corporate databases
– Unstructured data from webpages, blogs, social networking messages, …
– Countless digital sensors
• Business sectors
– Retailers: Walmart, Kohl's
– Logistics companies: UPS
– Telecommunication: AT&T, T-Mobile
– …
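
The IDC figures above imply a 2011 baseline and an average growth rate that the slide does not state. A minimal sketch of that arithmetic follows, assuming "up 48 percent" means year-over-year growth and that growth from 2012 to 2015 compounds annually. The function and variable names are illustrative and do not come from the source.

```python
# Illustrative arithmetic on the IDC figures quoted on this slide.
# Assumption: "up 48 percent from 2011" means volume_2012 = volume_2011 * 1.48.
# Assumption: growth between 2012 and 2015 compounds at a constant yearly rate.


def implied_previous(current: float, growth: float) -> float:
    """Value one period earlier, given the current value and fractional growth."""
    return current / (1.0 + growth)


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


volume_2012 = 2.7  # billion terabytes, as quoted from IDC
volume_2015 = 8.0  # billion terabytes, IDC figure for 2015 as quoted

volume_2011 = implied_previous(volume_2012, 0.48)
cagr = compound_annual_growth(volume_2012, volume_2015, 2015 - 2012)

print(f"Implied 2011 volume: {volume_2011:.2f} billion TB")
print(f"Implied 2012-2015 annual growth: {cagr:.1%}")
```

Under these assumptions the 2011 volume comes out at about 1.8 billion terabytes, and reaching 8 billion terabytes by 2015 would need roughly 44 percent growth per year, close to the 48 percent reported for 2012.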

14

Understanding of Big Data (1)

• An avalanche of available data, increasing exponentially
• Google CEO Eric Schmidt said: “Every two days we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
• Farnam Jahanian kicked off a May 1, 2012 briefing, calling data “a transformative new currency for science, engineering, education, and commerce.”

15

Understanding of Big Data (2)

• Farnam Jahanian (NSF): “Big Data is characterized not only by the enormous volume of data but also by the diversity and heterogeneity of the data and the velocity of its generation.”
• Nuala O’Connor Kelly (GE): “it’s the volume and velocity and variety of data… to achieve new results for …”
• Nick Combs (EMC): “It’s needle in a haystack or connecting the dots.”
• Arvind Krishna (IBM) added the fourth V:
– Veracity: data in doubt
– Describes 'contradictory data,' or noisy data

16

Implications

• Big Science?
– Big budgets, Big staffs, Big machines, Big laboratories
• Farnam Jahanian (NSF)
– To drive the creation of new IT products and services
– To accelerate the pace of discovery in almost every science and engineering (SE) discipline
– To solve the nation’s most pressing challenges
• Response: $200 million Big Data R&D initiative in 2012
– Advances in foundational techniques and technologies
– Cyberinfrastructure to manage, curate, and serve data to SE research and education communities
– New approaches to education and workforce development
– Nurturing of new types of collaborations

17

Future Landscape

18

Databases’ View

• DB space 2000 - 2010
– OLTP / operational
– BI / reporting
• After
– Scalable nonrelational (“nosql”)
– OLTP / operational
– BI / reporting

19

Big Medicine

• Information
– Related people: patients, service providers, nurses, physicians, hospital administrators, government, insurance agencies
– A mixture of structured and unstructured data
• Technologies
– Dashboard technologies and analytics, business intelligence, clinical intelligence, revenue cycle management intelligence
• Other factors
– Decision support, ease of information accessibility, quality of care, physician-patient relationship

20

Changes in Algorithms

• Efficiency vs. Effectiveness
• Flexible learning algorithms to remove bias
• Big Data is at an evolutionary juncture to improve/replace human judgment
• Businesses are seeing the value, but are thwarted by the cost of storage, slower processing speeds, and the flood of the data themselves

21

Big Data at NASA

• NASA Open Government Plan ver. 2
– Managing and processing
– Storage
– Archiving and Distribution
– Analysis
– Visualization
– Commercial cloud computing services

• Strategy: push from top down and bottom up

22

Conclusions

• The first challenge
• The first solution
• What is the adolescent age?
• What is the modern era?
• What are the characteristics?
• What is the future landscape?
• What does NASA do?

Census, Big Science, Big Business

23

References

• Frank J. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money, Wiley, 2012.
• 1880 census: http://www.1880census.com/
• Herman Hollerith: http://en.wikipedia.org/wiki/Herman_Hollerith
• Manhattan Project: http://en.wikipedia.org/wiki/Manhattan_Project
• Space exploration: http://en.wikipedia.org/wiki/Space_exploration
• Big Science: http://en.wikipedia.org/wiki/Big_Science
• IBM Research: http://ibmresearchalmaden.blogspot.hk/2011/09/ibm-research-almaden-centennial.html
• NASA: http://open.nasa.gov/blog/2012/10/04/what-is-nasa-doing-with-big-data-today/
